[testing] Proposal - regression tests results display upgrade
Hi,

Currently, test failures of a new feature are always marked with a yellow color and the word 'fail'. Would it make sense to use a more descriptive color/naming scheme? In particular, it would be nice to distinguish between actual failures and situations where the compilation of a test took too much time or an output file was too big. Would it also make sense to distinguish between compilation, linking and run failures? E.g. instead of a plain 'fail', use 'fail (c)', 'fail (l)' and 'fail (r)'. The color for actual failures could be orange, while the "others" could stay yellow.

Furthermore, it's currently impossible to see the compilation results if a test passed. Would it make sense to change that? It would allow us to view and fix warnings even when a test passes. A test passed with warnings could be marked e.g. with 'pass (w)', or its color could be set to something between yellow and the light green.

If you pointed me to where this should be done, I could try to change it. I'm guessing it would require modifying some script that generates the page, and some CSS?

Regards,
Adam
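For illustration, here is a hypothetical sketch (in C++, since the report generator is a C++ program) of how the proposed scheme could be modeled: an extended result kind mapped to a display label and a CSS class. None of the names below come from the actual report tool.

// Hypothetical sketch only, not the actual report tool's types: an extended
// result kind mapped to a display label plus a CSS class that the page
// generator would emit and the stylesheet would color (orange for actual
// failures, yellow for "others", etc.).
#include <string>

enum class result_kind {
    pass,
    pass_with_warnings,  // proposed 'pass (w)'
    compile_fail,        // proposed 'fail (c)'
    link_fail,           // proposed 'fail (l)'
    run_fail,            // proposed 'fail (r)'
    unfinished           // time limit exceeded, output file too big, ...
};

struct cell_display { std::string label, css_class; };

cell_display display_for(result_kind k)
{
    switch (k) {
    case result_kind::pass:               return { "pass",     "pass" };
    case result_kind::pass_with_warnings: return { "pass (w)", "pass-warn" };   // between yellow and green
    case result_kind::compile_fail:       return { "fail (c)", "fail-actual" }; // orange
    case result_kind::link_fail:          return { "fail (l)", "fail-actual" }; // orange
    case result_kind::run_fail:           return { "fail (r)", "fail-actual" }; // orange
    case result_kind::unfinished:         return { "fail",     "fail-other" };  // yellow
    }
    return { "fail", "fail-other" };
}

The stylesheet would then only need one rule per class, e.g. orange for "fail-actual" and yellow for "fail-other".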
On Wed, May 28, 2014 at 9:08 AM, Adam Wulkiewicz wrote:
Hi,
Currently, test failures of a new feature are always marked with a yellow color and the word 'fail'. Would it make sense to use a more descriptive color/naming scheme? In particular, it would be nice to distinguish between actual failures and situations where the compilation of a test took too much time or an output file was too big. Would it also make sense to distinguish between compilation, linking and run failures? E.g. instead of a plain 'fail', use 'fail (c)', 'fail (l)' and 'fail (r)'. The color for actual failures could be orange, while the "others" could stay yellow.

Furthermore, it's currently impossible to see the compilation results if a test passed. Would it make sense to change that? It would allow us to view and fix warnings even when a test passes. A test passed with warnings could be marked e.g. with 'pass (w)', or its color could be set to something between yellow and the light green.
Makes sense to me. IIRC, Marshall Clow has given some thought to regression test reporting needs, so you might want to get input from him.
If you pointed me to where this should be done, I could try to change it. I'm guessing it would require modifying some script that generates the page, and some CSS?
See boost-root\tools\regression\src\report

Thanks,
--Beman
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Adam Wulkiewicz
Sent: 28 May 2014 14:09
To: boost@lists.boost.org
Subject: [boost] [testing] Proposal - regression tests results display upgrade
Hi,
Currently, test failures of a new feature are always marked with a yellow color and the word 'fail'. Would it make sense to use a more descriptive color/naming scheme? In particular, it would be nice to distinguish between actual failures and situations where the compilation of a test took too much time or an output file was too big. Would it also make sense to distinguish between compilation, linking and run failures? E.g. instead of a plain 'fail', use 'fail (c)', 'fail (l)' and 'fail (r)'. The color for actual failures could be orange, while the "others" could stay yellow.

Furthermore, it's currently impossible to see the compilation results if a test passed. Would it make sense to change that? It would allow us to view and fix warnings even when a test passes. A test passed with warnings could be marked e.g. with 'pass (w)', or its color could be set to something between yellow and the light green.
+1 definitely.

This is particularly a problem for Boost.Math - the largest library in Boost, by far, in both code and tests, with several tests that often time out.

Various summary counts of passes and fails would be good as well.

It takes a while to scan the current (nice) display, e.g. http://www.boost.org/development/tests/develop/developer/math.html

Paul

---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 01539 561830
Paul A. Bristow wrote:
+1 definitely.
This is particularly a problem for Boost.Math - the largest library in Boost, by far, in both code and tests, with several tests that often time out.
Various summary counts of passes and fails would be good as well.
It takes a while to scan the current (nice) display, e.g. http://www.boost.org/development/tests/develop/developer/math.html
I thought about a list of phrases/regexps which, if found/matched in the compiler output or even in the output of the regression tool, would result in different coloring of cells and names. E.g. something like:

fail     yellow  "compile with /bigobj"
fail     yellow  "300 second time limit exceeded"
fail(c)  orange  ^"Compile".+"fail"$
# ...

Or something similar in XML. It could also be set per-library if needed.

As much as I'd like to play with it, I'm afraid I won't find enough time.

Regards,
Adam
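As a rough illustration of the rule-table idea above, here is a hypothetical C++ sketch (the names and the rule set are invented, not taken from the regression tools): each rule pairs a regular expression, matched against the compiler or regression-tool output, with the label and color for the result cell, and the first matching rule wins.

// Hypothetical sketch of the rule-table idea, not code from the report tool.
#include <regex>
#include <string>
#include <utility>
#include <vector>

struct rule
{
    std::regex  pattern;  // matched against the tool output
    std::string label;    // text shown in the cell, e.g. "fail (c)"
    std::string color;    // or a CSS class name
};

// Example rule set, roughly following the phrases suggested above.
const std::vector<rule> default_rules = {
    { std::regex("compile with /bigobj"),           "fail",     "yellow" },
    { std::regex("300 second time limit exceeded"), "fail",     "yellow" },
    { std::regex("^Compile.+fail$"),                "fail (c)", "orange" },
};

// Returns the label and color for a test result, first matching rule wins.
std::pair<std::string, std::string>
classify(const std::string& output, const std::vector<rule>& rules = default_rules)
{
    for (const rule& r : rules)
        if (std::regex_search(output, r.pattern))
            return { r.label, r.color };
    return { "fail", "orange" };  // default: treat as an actual failure
}

A per-library rule set would then simply be a different vector passed in for that library.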
Paul A. Bristow wrote:
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Adam Wulkiewicz

Would it make sense to use a more descriptive color/naming scheme? In particular, it would be nice to distinguish between actual failures and situations where the compilation of a test took too much time or an output file was too big. Would it also make sense to distinguish between compilation, linking and run failures?
+1 definitely.
This is particularly a problem for Boost.Math - the largest library in Boost, by far, in both code and tests, with several tests that often time out.
Various summary counts of passes and fails would be good as well.
It takes a while to scan the current (nice) display, eg
http://www.boost.org/development/tests/develop/developer/math.html
Ok, I managed to find some time to play with it.

AFAIU the reports are generated using the code from https://github.com/boostorg/boost/tree/develop/tools/regression/src/report, is that correct?

The first change is simple, since I'd like to see if I'm doing everything right. I changed the way those specific failures are displayed. If the compilation fails and one of the following strings can be found at the end of the compiler output (the last 25 characters):

- "File too big"
- "time limit exceeded"

then the test is considered an "unfinished compilation" and displayed on the library results page as a yellow cell with a link named "fail?", so it's distinguishable from a normal "fail".

Here is the PR: https://github.com/boostorg/boost/pull/25

I only tested it locally on results from one runner for the Geometry and Math libraries. Is there a way I could test it on the results sent by all of the runners? How is this program executed? Is there some script which e.g. checks some directory and passes all of the tests as command arguments?

Regards,
Adam
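For reference, a minimal sketch of the check described in the message above, assuming the report tool hands the compiler output to a helper like this (the function name is invented; the real change is in the PR):

// Look at the last 25 characters of the compiler output and treat the two
// known phrases as an "unfinished compilation" rather than a regular failure.
#include <string>

bool is_unfinished_compilation(const std::string& compiler_output)
{
    const std::string::size_type tail_len = 25;
    const std::string tail = compiler_output.size() > tail_len
        ? compiler_output.substr(compiler_output.size() - tail_len)
        : compiler_output;

    return tail.find("File too big") != std::string::npos
        || tail.find("time limit exceeded") != std::string::npos;
}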
On Wed, Aug 13, 2014 at 10:42 AM, Adam Wulkiewicz wrote:

AFAIU the reports are generated using the code from https://github.com/boostorg/boost/tree/develop/tools/regression/src/report, is that correct?

Yes, I believe so.

Here is the PR: https://github.com/boostorg/boost/pull/25

I skimmed it and didn't see any red flags.

I only tested it locally on results from one runner for the Geometry and Math libraries. Is there a way I could test it on the results sent by all of the runners? How is this program executed?

IIRC, Noel Belcourt at Sandia runs the tests. Thus he would be a good person to merge your pull request, since he will likely be the first to notice any problems. Noel, are you reading this :-?

Is there some script which e.g. checks some directory and passes all of the tests as command arguments?

He would be the best person to answer that.

Thanks for working on this!

--Beman
On Aug 13, 2014, at 1:51 PM, Beman Dawes wrote:

Noel, are you reading this :-?

A little delayed but yes.

Is there some script which e.g. checks some directory and passes all of the tests as command arguments?

I'll have to investigate, don't know off the top of my head.

-- Noel
OK, I only now have time to read email. Is there something I can help answer? (Not sure what the question refers to.)
Rene.
--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
Rene Rivera wrote:
OK, I only now have time to read email. Is there something I can help answer? (Not sure what the question refers to.)
I prepared a PR (https://github.com/boostorg/boost/pull/25) changing the way the library results page is generated. It works locally, in the sense that it generates the results page, but I'm not really sure whether this is the code of the program actually used to generate the regression summary pages, or how it is executed. Am I playing with the correct program?

What does the procedure for contributing to Regression look like? In this particular case I'm unable to test what I've done on a "living organism", since AFAIU the tests are processed on an external server. Assuming that my contribution is desired, should someone with sufficient access rights test the code from my fork, or should the code be merged and then tested as "official"? Or should I request the access rights required to test it myself?

Another thing: I'd like to investigate why the results are missing for the Geometry and Spirit master branch. I'm guessing that if I weren't able to somehow figure it out locally, then some form of access to all of the test results sent by the runners would be needed, for checking the XMLs and debugging the program generating the pages. Would this be possible, or do you have a better idea?

Regards,
Adam
Another thing: I'd like to investigate why the results are missing for the Geometry and Spirit master branch. I'm guessing that if I weren't able to somehow figure it out locally, then some form of access to all of the test results sent by the runners would be needed, for checking the XMLs and debugging the program generating the pages. Would this be possible, or do you have a better idea?
I am not sure how the summary is generated, but I looked into the XML and as far as I can tell spirit has been moved to spirit/test. As for why it still shows up in the summary, I don't know enough to say. Similarly, geometry seems to have moved to geometry/test but for some reason does not appear in the summary.

Hope that helps point you in the right direction.

Regards,
Thomas
Thomas Suckow wrote:
I am not sure how the summary is generated but I looked into the XML and as far as I can tell spirit has been moved to spirit/test. As for why it still shows up in the summary, I don't know enough to say. Similarly geometry seems to have moved to geometry/test but for some reason does not appear in the summary.
Yes, AFAIK everything is ok with the libraries, though I have an idea what may be the cause of the problem.

On the page http://www.boost.org/development/tests/master/developer/geometry_.html the empty cells have their class set to "library-missing". I can't find anything in the program I modified that could generate this HTML. However, I can find it in the XSL belonging to the scripts that are probably also involved in generating the reports, in the tree https://github.com/boostorg/regression/tree/develop/xsl_reports.

So the question remains: how exactly are the results generated? Are they generated the same way for the develop and master branches? Is the C++ program I modified used in any of these cases?

Regards,
Adam
participants (6)
- Adam Wulkiewicz
- Belcourt, Kenneth
- Beman Dawes
- Paul A. Bristow
- Rene Rivera
- Thomas Suckow