Volunteers needed to run regression test reporting
We have had a long-term problem with regression test reporting going offline unexpectedly. The process used XSLT processing, which was hard to configure, horribly slow, and very fragile, and those factors made it a pain to run.
Last year, Steven Watanabe rewrote the report generation in C++, and those issues have pretty much disappeared, so the process is now far easier to run. It runs in five minutes or less.
Hooray C++! Hooray Steven! And Hooray Tom Kent who runs the reports!
The remaining reliability factors are local ones like system crashes. Since the reports run as cron jobs, it can be quite a while before anyone is inconvenienced enough to report an outage to the Boost list, and Murphy's Law ensures that it is always the day after the person running the reports left for a week-long back country hike.
We can mitigate that by having several people running the reports. I've written some docs, so it shouldn't be too hard, although right now it does require a *nix-like system. I've been running them on a Windows host using a VirtualBox virtual Ubuntu system, and have found it to be easy.
One possibility would be for Tom to continue to run the reports on the hour, a second person to run them twenty minutes after the hour, and a third to run them forty minutes after the hour.
Please volunteer to help make regression reporting more timely and reliable!
--Beman
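The staggered schedule suggested above (on the hour, twenty past, forty past) can be sketched as a small script that spreads any number of report runners evenly across the hour and prints one crontab-style line per runner. This is only an illustration: the runner names other than Tom and the script path `build_reports.sh` are placeholders, not part of the actual report tooling.

```python
def staggered_minutes(runners):
    """Spread the runners evenly across one hour; returns {runner: minute}."""
    if not runners:
        return {}
    step = 60 // len(runners)
    return {name: index * step for index, name in enumerate(runners)}


def crontab_line(minute, command):
    """Format an hourly crontab entry that fires at the given minute."""
    return f"{minute} * * * * {command}"


# "runner-2", "runner-3" and the script path are hypothetical placeholders.
schedule = staggered_minutes(["Tom", "runner-2", "runner-3"])
for name, minute in schedule.items():
    print(f"# {name}")
    print(crontab_line(minute, "$HOME/boost-reports/build_reports.sh"))
```

With three runners this yields minutes 0, 20 and 40, so if one machine is down the published reports are at most an hour stale from the others' point of view.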
Hi Beman,
Beman Dawes wrote:
Please volunteer to help make regression reporting more timely and reliable!
For a while I have been wondering about volunteering an ARM Linux machine to run tests; please correct me if I'm wrong, but I believe that the current testing is done on x86 and one POWER system.
I have a box with a dual-core Cortex-A15 (Samsung Exynos 5) and 2GB of RAM running Debian that is mostly idle. It currently has g++ 4.6 installed. But could it successfully run the tests in a sane period of time without choking? Has anyone observed what the peak RAM requirement is? What is the typical run time on an x86 system?
I do have longer term plans for this hardware so it might not be available permanently - or it might melt - but I would be willing to give it a try.
Are there more up-to-date instructions than those at http://beta.boost.org/development/running_regression_tests.html ? I have very limited experience with Boost.Build, Boost.Test, git or Python so I anticipate needing some hand-holding!
Regards, Phil.
Hi Phil,
For a while I have been wondering about volunteering an ARM Linux machine to run tests; please correct me if I'm wrong, but I believe that the current testing is done on x86 and one POWER system.
AFAIK, that's correct, and it is certainly correct for the reliable testers. Out of curiosity, does your Linux on ARM system run big endian or little endian? I would personally love to have at least one regular Linux tester running big endian, but that's no big thing.
I have a box with a dual-core Cortex-A15 (Samsung Exynos 5) and 2GB of RAM running Debian that is mostly idle. It currently has g++ 4.6 installed. But could it successfully run the tests in a sane period of time without choking? Has anyone observed what the peak RAM requirement is? What is the typical run time on an x86 system?
On my desktop system (Win7, 16GB RAM, Intel i5-3570K CPU @ 3.40GHz, 4 cores, 4 logical processors), a full run takes roughly 6 hours the first time, but that drops on subsequent runs. If nothing much has changed, it can be as low as 2 hours. If it helps, we could set up a light test mode that only tests core libraries, where core is defined as those that a lot of other libraries depend on. That would eliminate a lot of tests, including many that take a long time to compile or run. I don't know what the peak memory use is, but I have run the tests on a Linux virtual machine with 2 gigs in the past.
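To make the "light test mode" idea concrete, here is a minimal sketch of one way to pick "core" libraries: count how many other libraries depend on each one and keep those above a threshold. No such mode exists in the regression scripts as far as this thread says; the library names, the dependency map and the threshold below are invented purely for illustration.

```python
from collections import Counter


def core_libraries(dependencies, min_dependents):
    """Return the libraries that at least `min_dependents` others depend on."""
    dependents = Counter()
    for library, deps in dependencies.items():
        for dep in set(deps):
            if dep != library:  # ignore self-references
                dependents[dep] += 1
    return sorted(lib for lib, count in dependents.items() if count >= min_dependents)


# Hypothetical dependency map: library -> libraries it depends on.
example_deps = {
    "lib_a": ["lib_x", "lib_y", "lib_z"],
    "lib_b": ["lib_x", "lib_y", "lib_z"],
    "lib_c": ["lib_x", "lib_y"],
    "lib_y": ["lib_x"],
    "lib_x": [],
}

light_set = core_libraries(example_deps, min_dependents=3)
print(light_set)
```

A real version would need the actual inter-library dependency data, and the cut-off would be a judgment call about how much run time the light mode should save.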
I do have longer term plans for this hardware so it might not be available permanently - or it might melt - but I would be willing to give it a try.
Are there more up-to-date instructions than those at http://beta.boost.org/development/running_regression_tests.html ? I have very limited experience with Boost.Build, Boost.Test, git or Python so I anticipate needing some hand-holding!
Those are the latest. Note that I updated them as recently as this morning. Good to hear from you, --Beman
Phil Endecott wrote:
I have a box with a dual-core Cortex-A15 (Samsung Exynos 5) and 2GB of RAM running Debian that is mostly idle. It currently has g++ 4.6 installed. But could it successfully run the tests in a sane period of time without choking?
I seem to have run the regression tests successfully and the results are shown as "exynos5" at http://www.boost.org/development/tests/develop/developer/summary.html There are a handful of test failures that do not seem to happen on other platforms. Library maintainers are welcome to ask me if they think there is anything platform-specific that needs investigation.
It took about 16 hours to run.
The only issues that I encountered were:
- I set LANG=C to suppress some messages from Perl; I've had to do this on other occasions on this machine, so it's probably not an issue with the scripts.
- I tried toolset=gcc-4.6.3 and got an incomprehensible error; I used toolset=gcc in the end. I suspect I should have just used toolset=gcc-4.6. It might be worthwhile to explicitly validate the toolset arg and give a better error message.
- It only seemed to be using one CPU. Is there something that I can do to make it use both?
I will look into doing this from a cron job. How often is it actually useful to run them?
Some other notes:
- This is a little-endian system. I haven't seen anyone using ARM in big-endian mode for quite a few years now.
- It would be great to run tests on iOS. The way the devices are locked down makes this difficult. The best solution is probably to use a jailbroken device, and to use ssh to copy the test executables over and run them.
- For Android, I'd also suggest cross-compiling and copying the test executables onto a device - and in this case it can be done without having to crack the hardware. Ideally much of this could be shared between Android and iOS.
- I will probably install llvm as well, not least because that is rather closer to Apple's iOS cross compiler.
- ARMv8, i.e. 64-bit ARM, will be the next challenge - once I have suitable hardware!
Regards, Phil.
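The suggestion to validate the toolset argument and give a better error message could look roughly like the sketch below. It does not reflect how run.py or Boost.Build actually resolve toolsets; the list of configured toolsets is a hypothetical input, and the function name is made up for this example.

```python
import difflib


def validate_toolset(requested, configured):
    """Return `requested` if it is configured, else raise a helpful ValueError."""
    if requested in configured:
        return requested
    # Offer near-misses, e.g. "gcc-4.6" for a mistyped "gcc-4.6.3".
    suggestions = difflib.get_close_matches(requested, configured, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(
        f"Unknown toolset '{requested}'. Configured toolsets: "
        f"{', '.join(sorted(configured))}.{hint}"
    )


# Hypothetical list of toolsets known on the test machine.
configured_toolsets = ["gcc", "gcc-4.6", "clang"]

try:
    validate_toolset("gcc-4.6.3", configured_toolsets)
except ValueError as err:
    message = str(err)
    print(message)
```

Failing early with the list of known toolsets is cheap compared with discovering the problem partway into a run that takes many hours.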
Hi,
I'm running the regression test on my personal box (i5-3337U 1.8GHz running Windows 8/cygwin) before moving to a bigger one at work, and it has been working for 24 hours and has not finished yet. As Phil noted, the fact that the regression test is using one core does not help. Is mono-core the expected policy?
I've also run into a small problem: when launching "python run.py" alone, the script failed while fetching files: http://pastebin.com/62fgtzff
But when I ran it using "python run.py --tag=trunk" (i.e. the supposed default value), it successfully fetched a whole repo and ran the tests: http://pastebin.com/cnnSrjsS
Another question: what would be the most interesting configuration to test?
* msvc10
* msvc9
* cygwin_4.8
* or maybe msvc8 if I can still have it on my box
And what tag? trunk or another one?
Cheers
-- Johan
On Jan 19, 2014, at 4:29 PM, Johan Baltié wrote:
Hi,
I'm running the regression test on my personal box (i5-3337U 1.8GHz running Windows 8/cygwin) before moving to a bigger one at work, and it has been working for 24 hours and has not finished yet. As Phil noted the fact that the regression test is using one core does not help. Is mono-core the expected policy ?
By default everything will run a single job at a time. You can pass the argument --bjam-options="-jN" to the run.py script. This will tell bjam to use N parallel jobs. This at least will parallelize the build process. Not sure if the running of the tests themselves will happen in parallel or not; maybe someone else can chime in. Jason
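As a rough illustration of the advice above, the sketch below assembles a run.py command line that passes `-jN` through `--bjam-options`, defaulting N to the number of CPUs Python reports. Only `--tag` and `--bjam-options` come from this thread; the helper function itself is hypothetical, and it only builds the argument list rather than launching anything.

```python
import os


def run_py_command(tag, jobs=None):
    """Build the argument list for run.py with N parallel bjam jobs."""
    if jobs is None:
        # os.cpu_count() may return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["python", "run.py", f"--tag={tag}", f"--bjam-options=-j{jobs}"]


# Example for a dual-core machine such as the one discussed in this thread.
command = run_py_command("trunk", jobs=2)
print(" ".join(command))
```

On a machine with limited RAM it may be worth choosing fewer jobs than cores, since each parallel compile uses its own memory.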
AMDG On 01/19/2014 04:17 PM, Jason Roehm wrote:
On Jan 19, 2014, at 4:29 PM, Johan Baltié wrote:
I'm running the regression test on my personal box (i5-3337U 1.8GHz running Windows 8/cygwin) before moving to a bigger one at work, and it has been working for 24 hours and has not finished yet. As Phil noted the fact that the regression test is using one core does not help. Is mono-core the expected policy ?
By default everything will run a single job at a time. You can pass the argument --bjam-options="-jN" to the run.py script. This will tell bjam to use N parallel jobs. This at least will parallelize the build process. Not sure if the running of the tests themselves will happen in parallel or not; maybe someone else can chime in.
Running of the tests is integrated into the build system. It's not a separate step. In Christ, Steven Watanabe
On Fri, Jan 17, 2014 at 11:34 AM, Phil Endecott wrote:
For a while I have been wondering about volunteering an ARM Linux machine to run tests; please correct me if I'm wrong, but I believe that the current testing is done on x86 and one POWER system.
This would be great, I've been trying (without success) to get an ARM test runner up for several months now.
I have a box with a dual-core Cortex-A15 (Samsung Exynos 5) and 2GB of RAM running Debian that is mostly idle. It currently has g++ 4.6 installed. But could it successfully run the tests in a sane period of time without choking? Has anyone observed what the peak RAM requirement is? What is the typical run time on an x86 system?
What hardware is this? Is it publicly available for purchase? I've been looking high and low for something with 2GB+ of RAM (I wouldn't try running the regression tests with less than this) that I could get Debian onto. I tried some cheap Android stick that was made for putting Android on a TV via an HDMI port, but I couldn't find a good kernel for the Allwinner CPU. Tom
Tom Kent wrote:
On Fri, Jan 17, 2014 at 11:34 AM, Phil Endecott wrote:
I have a box with a dual-core Cortex-A15 (Samsung Exynos 5) and 2GB of RAM running Debian that is mostly idle. It currently has g++ 4.6 installed. But could it successfully run the tests in a sane period of time without choking? Has anyone observed what the peak RAM requirement is? What is the typical run time on an x86 system?
What hardware is this? Is it publicly available for purchase? I've been looking high and low for something with 2GB+ of RAM (I wouldn't try running the regression tests with less than this) that I could get Debian onto. I tried some cheap Android stick that was made for putting Android on a TV via an HDMI port, but I couldn't find a good kernel for the Allwinner CPU.
It's this: http://chezphil.org/india/ Regards, Phil.
On 1/17/2014 11:56 AM, Beman Dawes wrote:
Please volunteer to help make regression reporting more timely and reliable!
I could try running regression tests on an older, slower 32-bit computer I have, on which I can try installing some updated Linux distros with gcc, clang, Intel C++, and Sun C++. Would this be valuable? Is there a clear and straightforward explanation of what is needed to run regression tests using modular-boost? I know how to set up modular-boost and now use git fairly well.
On 1/17/2014 11:56 AM, Beman Dawes wrote:
We can mitigate that by having several people running the reports. I've written some docs, so it shouldn't be too hard, although right now it does require a *nix-like system. I've been running them on a Windows host using a VirtualBox virtual Ubuntu system, and have found it to be easy.
So running regression tests for Windows compiler, ie msvc, must be done on a virtual machine hosted under Linux/Unix ? Or do you mean that your docs only apply to running regression tests under a *nix system ?
On 01/17/2014 05:25 PM, Edward Diener wrote:
So running regression tests for Windows compiler, ie msvc, must be done on a virtual machine hosted under Linux/Unix ? Or do you mean that your docs only apply to running regression tests under a *nix system ?
I think that the process that he's referring to isn't the running of regression tests itself. Instead, it's the generation of reports based upon the regression test results uploaded by each tester. Those currently run periodically, but I think Beman is looking to find a more robust and frequent scheme for publishing of the human-readable reports. Jason
AMDG On 01/17/2014 02:25 PM, Edward Diener wrote:
On 1/17/2014 11:56 AM, Beman Dawes wrote:
We can mitigate that by having several people running the reports. I've written some docs, so it shouldn't be too hard, although right now it does require a *nix-like system. I've been running them on a Windows host using a VirtualBox virtual Ubuntu system, and have found it to be easy.
So running regression tests for Windows compiler, ie msvc, must be done on a virtual machine hosted under Linux/Unix ? Or do you mean that your docs only apply to running regression tests under a *nix system ?
Beman isn't talking about the tests themselves, but about the tool that collects the results from the test runners and generates the html pages that you see on the Boost website. In Christ, Steven Watanabe
On Fri, Jan 17, 2014 at 5:34 PM, Steven Watanabe wrote:
Beman isn't talking about the tests themselves, but about the tool that collects the results from the test runners and generates the html pages that you see on the Boost website.
Steven and Jason (see prior message) are right. We need a volunteer or two to run reports. It limits the usefulness of lots of people running the tests if we can't publish the test results in a timely manner. --Beman
We can mitigate that by having several people running the reports. I've written some docs, so it shouldn't be too hard, although right now it does require a *nix-like system. I've been running them on a Windows host using a VirtualBox virtual Ubuntu system, and have found it to be easy.
i wonder: has anyone ever tried to integrate the boost regression tests with a cross-compiler and/or qemu? some years ago, i tested some altivec/neon code on x86_64 hardware by redirecting ctest to call the binaries via qemu. this would have two advantages:
* being able to test some architectures without having the hardware
* not having to compile natively on slow hardware
also it would be wonderful to be able to have testers for ios and android. would be nice if apple or google could provide the infrastructure for this ... one can dream ...
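The qemu idea above amounts to prefixing each cross-compiled test binary with a user-mode emulator. A minimal sketch of such a wrapper follows; it only builds the command line and does not execute it. It assumes QEMU's user-mode `qemu-arm` binary and its `-L` option for pointing at the target's library prefix, while the test binary path and sysroot path are placeholders. Nothing here reflects how the Boost regression scripts or ctest are actually configured.

```python
def qemu_command(test_binary, qemu="qemu-arm", sysroot=None, args=()):
    """Build the argument list that runs a cross-compiled test under QEMU."""
    command = [qemu]
    if sysroot:
        # -L tells user-mode QEMU where the target's dynamic loader and libs live.
        command += ["-L", sysroot]
    command.append(test_binary)
    command.extend(args)
    return command


# Placeholder paths, for illustration only.
emulated = qemu_command(
    "./bin/some_test",
    sysroot="/usr/arm-linux-gnueabihf",
    args=["--log_level=test_suite"],
)
print(" ".join(emulated))
```

The resulting list could be handed to `subprocess.run` by whatever launcher normally executes the tests, which keeps the emulation detail in one place.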
On Sat, Jan 18, 2014 at 6:07 AM, Mostafa wrote:
On Sat, 18 Jan 2014 02:31:56 -0800, Tim Blechmann wrote:
also it would be wonderful to be able to have testers for ios and android. would be nice if apple or google could provide the infrastructure for this ... one can dream ...
One can also ask them ...
+1 It would also be helpful to know exactly what infrastructure is required. If it comes down to a machine and a bit of money, we might be able to find contributors without too much trouble. Thanks, --Beman
2014/1/18 Tim Blechmann
We can mitigate that by having several people running the reports. I've written some docs, so it shouldn't be too hard, although right now it does require a *nix-like system. I've been running them on a Windows host using a VirtualBox virtual Ubuntu system, and have found it to be easy.
i wonder: has anyone ever tried to integrate the boost regression tests with a cross-compiler and/or qemu? some years ago, i tested some altivec/neon code on x86_64 hardware by redirecting ctest to call the binaries via qemu.
I run regression tests on Windows, cross-compiling the tests to Android (ARM) and running them on the Android emulator (qemu). You may see some info at https://github.com/apolukhin/regression_android -- Best regards, Antony Polukhin
On Sat, Jan 18, 2014 at 6:19 AM, Antony Polukhin wrote:
...
I run regression tests on Windows, cross-compiling the tests to Android (ARM) and running them on the Android emulator (qemu). You may see some info at https://github.com/apolukhin/regression_android
Is it possible to add these tests to the regular reporting mix? Would that require changes to the regression script? While we don't have resources for complex changes, we could make simple changes if it would help. --Beman
2014/1/18 Beman Dawes
On Sat, Jan 18, 2014 at 6:19 AM, Antony Polukhin wrote:
...
I run regression tests on Windows, cross-compiling the tests to Android (ARM) and running them on the Android emulator (qemu). You may see some info at https://github.com/apolukhin/regression_android
Is it possible to add these tests to the regular reporting mix?
Would that require changes to the regression script? While we don't have resources for complex changes, we could make simple changes if it would help.
No changes need to be made to the regression scripts. Some changes may need to be applied to the build system (it should not link against -lrt on Android, and specifying "<threadapi>pthread" in user-config.jam does not work). Results will appear in the regular reporting mix automatically (at least the previous run.py reported them). I'm porting my scripts to Linux and soon I'll try to run run.py cross-compiling on a Linux host (run.py on Windows currently fails even without cross-compiling). -- Best regards, Antony Polukhin
On 17.01.14 17:56, Beman Dawes wrote:
Please volunteer to help make regression reporting more timely and reliable!
Any chance for the test environment to be made available as an RPM and Debian package on the Linux side? I believe you would find a rather large number of testers willing to contribute resources (e.g. at universities) if all they have to do is an "apt-get install" of the package. Best Regards, Beet
On Jan 17, 2014, at 9:56 AM, Beman Dawes wrote:
Please volunteer to help make regression reporting more timely and reliable!
I can help with the reporting if you still need some volunteers. This would be on RHEL in case that matters. -- Noel
participants (12)
- Antony Polukhin
- beet
- Belcourt, Kenneth
- Beman Dawes
- Edward Diener
- Jason Roehm
- Johan Baltié
- Mostafa
- Phil Endecott
- Steven Watanabe
- Tim Blechmann
- Tom Kent