Hi,

Recently the regression summary was upgraded. The cells corresponding to failing tests can now be labelled comp, link, run or fail, so we can see the nature of a failure without going into the details. See, e.g., the tests for Math: http://www.boost.org/development/tests/develop/developer/math.html

The last category (the generic "fail") covers the other or unknown failures. Currently the "Lib" errors fall into this category. There are some in Math, e.g.: http://www.boost.org/development/tests/develop/developer/output/CrystaX-NET-...

So we could mark them explicitly as "lib" failures, and then the other/unknown failures could be marked as "unkn" or left as they are now - "fail". This makes sense if there can be other "types" of failures, e.g. caused by some error in the testing scripts, Build or Regression, for which the reason is somehow unknown. Though I haven't seen any.

I propose to go further and mark the compilation failures for which we know the reason in a special way:
- "file" - file too big (reported on Windows, e.g. http://www.boost.org/development/tests/develop/developer/output/MinGW-w64-4-...)
- "time" - time limit exceeded (e.g. http://www.boost.org/development/tests/develop/developer/output/teeks99-03b-...)
I saw these in many libraries and I think it would be very useful for libraries with many tests, like Math or Geometry.

The other types of failures I could think of aren't that important for me personally, but maybe someone could appreciate them, e.g. compiler developers:
- "ierr" or "cerr" - internal compiler error or segmentation fault during compilation (e.g. http://www.boost.org/development/tests/develop/developer/output/PNNL-RHEL6_6... or http://www.boost.org/development/tests/develop/developer/output/teeks99-03h-...)
- "segf" - segmentation fault during run (http://www.boost.org/development/tests/develop/developer/output/BenPope%20x8...)

Furthermore I propose to use various colors for different failure reasons. You can see examples here:
https://github.com/awulkiew/summary-enhancer
https://github.com/awulkiew/summary-enhancer/tree/master/example/pages

Of course the colors could be changed; we should be able to modify them in the master.css file. In the examples above they vary a lot, to make a clear distinction between them. But if you like the yellow color, then all of the new failure kinds could be yellowish with a slight accent of another color, e.g. compilation errors could be yellow with an orange accent and "time"/"file" errors yellow with a green accent (see the sketch below).
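To give a rough idea, the rules in master.css could look something like this - the class names and colors below are made up, just to illustrate:

    /* hypothetical class names - one rule per failure reason */
    .comp-fail { background-color: #ffcc66; } /* compile error: yellow with orange accent */
    .link-fail { background-color: #ffff66; } /* link error */
    .time-fail { background-color: #ccff66; } /* time limit exceeded: yellow with green accent */
    .file-fail { background-color: #99ff66; } /* file too big */
    .cerr-fail { background-color: #ff9966; } /* internal compiler error */
    .unkn-fail { background-color: #ffff00; } /* unknown reason */

Regards, Adam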
On 11/05/2015 16:46, Adam Wulkiewicz wrote:
Hi,
Recently the regression summary was upgraded. The cells corresponding to failing tests can now be labelled comp, link, run or fail, so we can see the nature of a failure without going into the details.
This is great: many thanks!
See, e.g. the tests for Math: http://www.boost.org/development/tests/develop/developer/math.html
The last category (the generic "fail") covers the other or unknown failures. Currently the "Lib" errors fall into this category. There are some in Math, e.g.: http://www.boost.org/development/tests/develop/developer/output/CrystaX-NET-...
That's a weird one - if you follow the links it actually says that linking the lib succeeded - which leaves me wondering what actually went wrong?
So we could mark them explicitly as "lib" failures, and then the other/unknown failures could be marked as "unkn" or left as they are now - "fail". This makes sense if there can be other "types" of failures, e.g. caused by some error in the testing scripts, Build or Regression, for which the reason is somehow unknown. Though I haven't seen any.
I propose to go further and mark the compilation failures for which we know the reason in a special way:
- "file" - file too big (reported on Windows, e.g. http://www.boost.org/development/tests/develop/developer/output/MinGW-w64-4-...)
- "time" - time limit exceeded (e.g. http://www.boost.org/development/tests/develop/developer/output/teeks99-03b-...)
I saw these in many libraries and I think it would be very useful for libraries with many tests, like Math or Geometry.
+1, time limit exceptions are a big issue for the Math and Multiprecision libs... and yes, I've already spent a lot of time refactoring to make them smaller, but they still persist. I suspect many of these are caused by virtual machines with insufficient RAM allocated, that then end up thrashing the HD: many of the tests that time out compile really pretty quickly here (even on obsolete hardware).
The other types of failures I could think of aren't that important for me personally, but maybe someone could appreciate them, e.g. compiler developers:
- "ierr" or "cerr" - internal compiler error or segmentation fault during compilation (e.g. http://www.boost.org/development/tests/develop/developer/output/PNNL-RHEL6_6... or http://www.boost.org/development/tests/develop/developer/output/teeks99-03h-...)
- "segf" - segmentation fault during run (http://www.boost.org/development/tests/develop/developer/output/BenPope%20x8...)
+1, internal compiler errors aren't really our fault, are they? It would be good to be able to screen them out easily.
Furthermore I propose to use various colors for different failure reasons.
You can see the examples here: https://github.com/awulkiew/summary-enhancer https://github.com/awulkiew/summary-enhancer/tree/master/example/pages
Of course the colors could be changed; we should be able to modify them in the master.css file. In the examples above they vary a lot, to make a clear distinction between them. But if you like the yellow color, then all of the new failure kinds could be yellowish with a slight accent of another color, e.g. compilation errors could be yellow with an orange accent and "time"/"file" errors yellow with a green accent.
+1 on the use of color, again it helps us screen out what's explained and what's not.
John.
On 5/11/2015 12:09 PM, John Maddock wrote:
On 11/05/2015 16:46, Adam Wulkiewicz wrote:
Hi,
Recently the regression summary was upgraded. The cells corresponding to failing tests can now be labelled comp, link, run or fail, so we can see the nature of a failure without going into the details.
This is great: many thanks!
See, e.g. the tests for Math: http://www.boost.org/development/tests/develop/developer/math.html
The last category (the generic "fail") covers the other or unknown failures. Currently the "Lib" errors fall into this category. There are some in Math, e.g.: http://www.boost.org/development/tests/develop/developer/output/CrystaX-NET-...
That's a weird one - if you follow the links it actually says that linking the lib succeeded - which leaves me wondering what actually went wrong?
So we could mark them explicitly as "lib" failures, and then the other/unknown failures could be marked as "unkn" or left as they are now - "fail". This makes sense if there can be other "types" of failures, e.g. caused by some error in the testing scripts, Build or Regression, for which the reason is somehow unknown. Though I haven't seen any.
I propose to go further and mark the compilation failures for which we know the reason in a special way:
- "file" - file too big (reported on Windows, e.g. http://www.boost.org/development/tests/develop/developer/output/MinGW-w64-4-...)
- "time" - time limit exceeded (e.g. http://www.boost.org/development/tests/develop/developer/output/teeks99-03b-...)
I saw these in many libraries and I think it would be very useful for libraries with many tests, like Math or Geometry.
+1, time limit exceptions are a big issue for the Math and Multiprecision libs... and yes, I've already spent a lot of time refactoring to make them smaller, but they still persist. I suspect many of these are caused by virtual machines with insufficient RAM allocated, that then end up thrashing the HD: many of the tests that time out compile really pretty quickly here (even on obsolete hardware).
I see some time-out errors in VMD testing. The code makes heavy use of the preprocessor, but no single test should last more than 5 minutes (300 seconds). I would also love to see time limit exceeded marked differently, and maybe some way for regression testers to increase the time limit when those failures occur. I think it is great work to have the regression test matrix marked more specifically.
Edward Diener wrote:
On 5/11/2015 12:09 PM, John Maddock wrote:
On 11/05/2015 16:46, Adam Wulkiewicz wrote:
I propose to go further and mark the compilation failures for which we know the reason in a special way:
- "file" - file too big (reported on Windows, e.g. http://www.boost.org/development/tests/develop/developer/output/MinGW-w64-4-...)
- "time" - time limit exceeded (e.g. http://www.boost.org/development/tests/develop/developer/output/teeks99-03b-...)
I saw these in many libraries and I think it would be very useful for libraries with many tests, like Math or Geometry.
+1, time limit exceptions are a big issue for the Math and Multiprecision libs... and yes, I've already spent a lot of time refactoring to make them smaller, but they still persist. I suspect many of these are caused by virtual machines with insufficient RAM allocated, that then end up thrashing the HD: many of the tests that time out compile really pretty quickly here (even on obsolete hardware).
I see some time-out errors in VMD testing. The code makes heavy use of the preprocessor, but no single test should last more than 5 minutes (300 seconds).
I would also love to see time limit exceeded marked differently, and maybe some way for regression testers to increase the time limit when those failures occur.
I think it is great work to have the regression test matrix marked more specifically.
I created a PR with the change: https://github.com/boostorg/regression/pull/13
Feel free to post your suggestions.
Regards, Adam
Edward Diener wrote:
On 5/11/2015 12:09 PM, John Maddock wrote:
+1, time limit exceptions are a big issue for the Math and Multiprecision libs... and yes, I've already spent a lot of time refactoring to make them smaller, but they still persist. I suspect many of these are caused by virtual machines with insufficient RAM allocated, that then end up thrashing the HD: many of the tests that time out compile really pretty quickly here (even on obsolete hardware).
I see some time-out errors in VMD testing. The code makes heavy use of the preprocessor, but no single test should last more than 5 minutes (300 seconds).
I would also love to see time limit exceeded marked differently, and maybe some way for regression testers to increase the time limit when those failures occur.
I think it is great work to have the regression test matrix marked more specifically.
Done. Failures are now marked according to the known reason. There are of course some time (time limit exceeded) failures reported for Multiprecision and VMD. But there are also cerr (internal compiler error) failures, in particular many for VMD on GCC 4.8.

Please check if everything is ok and don't hesitate to report bugs. Additional suggestions are also welcome.

E.g. I'm thinking about upgrading the Summary page http://www.boost.org/development/tests/develop/developer/summary.html to also display the specific failure type, i.e. the most significant one. Right now it's all yellow. Would it be useful for someone? The hierarchy could be: comp > link > run > fail > cerr > time > file

The colors aren't displayed in the regression matrices for the master branch because the old styles are used: http://www.boost.org/development/tests/master/master.css while this one: http://www.boost.org/development/tests/develop/master.css was updated properly. Why?

Regards, Adam
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Adam Wulkiewicz Sent: 12 May 2015 19:47 To: boost@lists.boost.org Subject: Re: [boost] [all][testing] Regression summary upgrades
I would also love to see time limit exceeded marked differently, and maybe some way for regression testers to increase the time limit when those failures occur.
I think it is great work to have the regression test matrix marked more specifically.
+1 :-)
Done. Failures are marked according to the known reason. There are of course some time (time limit exceeded) failures reported for Multiprecision and VMD. But there are also cerr (internal compiler error), in particular many for VMD on GCC 4.8.
Additional suggestions are also welcome.
E.g. I'm thinking about upgrading the Summary page http://www.boost.org/development/tests/develop/developer/summary.html to also display the specific failure type, i.e. the most significant one. Right now it's all yellow. Would it be useful for someone? The hierarchy could be: comp > link > run > fail > cerr > time > file
Yellow is a bit discouraging, for example for Boost.Math where the details are much more encouragingly green. Is it easily possible to show the fraction of tests and/or platforms that pass? (However, I accept that the requirements are different for libraries with zillions of tests compared to those with few.)
Paul
--- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
Paul A. Bristow wrote:
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Adam Wulkiewicz
E.g. I'm thinking about upgrading the Summary page http://www.boost.org/development/tests/develop/developer/summary.html to also display the specific failure type, i.e. the most significant one. Right now it's all yellow. Would it be useful for someone? The hierarchy could be: comp > link > run > fail > cerr > time > file
Yellow is a bit discouraging, for example for Boost.Math where the details are much more encouragingly green.
Exactly. After seeing it I personally get the impression that Boost is not working in general, which of course is not true.
Is it easily possible to show the fraction of tests and/or platforms that pass?
(However, I accept that the requirements are different for libraries with zillions of tests compared to those with few.)
For now I prepared a PR adding the above proposal: https://github.com/boostorg/regression/pull/16 If it's merged we'll see if it helps and how much.

When the summary is generated, all of the tests from all libraries are checked anyway. This is done to generate the statistics shown at the top of the Summary page (Unusable, Regressions and New failures), and later when the table itself is generated. So it wouldn't be a problem to gather some additional statistics for a library per toolset. But I don't have a clear view of what should be presented and how. Well, I have a few ideas:

1. In a cell, put the percentage of failing tests, with the color of the most significant failure type. I don't know if this would be clear enough, because numbers are in general hard to analyse at first sight.
2. Instead of numbers, there could be some character representation of the percentage of failing tests, like: [||| ] [--- ] or something similar.
3. Use some weird unicode characters for the indicator, or explicitly write the fraction as: ¼, ½, ¾.
4. Or go more graphical. The cells for failures could be divided into 2 colors vertically. The height of the yellow/orange bar inside a cell could indicate the percentage of failures (with 4 possible heights: <=25%, <=50%, <=75% and <=100%). With this we could put the name of the most significant failure in the cell (see the sketch below). This could be nice actually, but it's hard to predict how clear it would be. It requires some testing.

Do you have some specific idea?
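Just to illustrate idea 4, a rough CSS sketch - the class name is made up, and one such class would be generated per height step:

    /* hypothetical: cell with <=75% failures - yellow bar over the green rest */
    td.fail-75 {
        background: linear-gradient(to bottom, #ffff66 75%, #66cc66 75%);
        text-align: center;
    }

Regards, Adam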
Adam Wulkiewicz wrote:
Paul A. Bristow wrote:
Yellow is a bit discouraging, for example for Boost.Math where the details are much more encouragingly green.
Exactly. After seeing it I personally get the impression that Boost is not working in general, which of course is not true.
Is it easily possible to show the fraction of tests and/or platforms that pass?
(However, I accept that the requirements are different for libraries with zillions of tests compared to those with few.)
<snip>
4. Or go more graphical. The cells for failures could be divided into 2 colors vertically. The height of the yellow/orange bar inside a cell could indicate the percentage of failures (with 4 possible heights: <=25%, <=50%, <=75% and <=100%). With this we could put the name of the most significant failure in the cell. This could be nice actually, but it's hard to predict how clear it would be. It requires some testing.
This could look more or less like the following picture (I hope it's ok to send images embedded in email on the Boost list):
Regards, Adam
On May 13, 2015 5:25:27 PM MDT, Adam Wulkiewicz wrote:
This could look more or less like the following picture (I hope it's ok to send images embedded in email on the Boost list):
In case you haven't noticed, that doesn't work on this list. ___ Rob (Sent from my portable computation engine)
2015-05-14 2:20 GMT+02:00 Rob Stewart wrote:
On May 13, 2015 5:25:27 PM MDT, Adam Wulkiewicz wrote:
This could look more or less like the following picture (I hope it's ok to send images embedded in email on the Boost list):
In case you haven't noticed, that doesn't work on this list.
I haven't, thanks for the info. So once again, the summary showing the graphical percentage of failing tests per library/toolset could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
Regards, Adam
On Wed, May 13, 2015 at 8:42 PM, Adam Wulkiewicz wrote:
So once again, the summary showing the graphical percentage of failing tests per library/toolset could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
I like it, one suggestion: white background instead of green for the ones with some issue? Tom
Tom Kent wrote:
On Wed, May 13, 2015 at 8:42 PM, Adam Wulkiewicz wrote:
So once again, the summary showing the graphical percentage of failing tests per library/toolset could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
I like it, one suggestion: white background instead of green for the ones with some issue?
I'm glad that you like the overall idea. Your suggestion could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...

I considered it too, but thought that since the purpose of this change is to increase the overall greenness of the matrix according to some measure - in this case the percentage of passed tests - the green color should probably be kept.

The above pictures are the result of a quick prototype. E.g. I'd like to allow using various colors for different failures. The colors would be set using styles, so we'd be able to easily change them.

Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...

As I see it, the vertical bar is more library oriented, as it's easier to compare the tests of a library across toolsets because the row height is the same for all cells of the library. On the other hand, the horizontal bar allows comparing various libraries for a single toolset/compiler easily, because the width of a column is the same. So it probably depends on whether the reader of the summary goes through the cells horizontally or vertically: does he want to see the level of portability of some library, or how many libraries are reasonably safe to use on a single platform? I guess the first would be more useful for a library developer and the second for a general user of Boost. So we could use one kind of display for the developer summary and the other for the user summary. The catch is that for now the user summary is not generated.

Does someone have an entirely different idea?

Regards, Adam
On Thu, May 14, 2015 at 8:54 AM, Adam Wulkiewicz wrote:
Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
I was about to suggest horizontal before I got to this :-)
Does someone have an entirely different idea?
I have two ideas, both of which I've experimented with in a cloud based regression reporting system I've worked on over the years (in my not copious spare time):
a. Use a fixed-size icon that fills (either horizontally or vertically). Making it fixed-size reduces the bias toward either the user or the developer view (but doesn't eliminate it completely).
b. Use a regular shape, like a circle or square, that varies in size to show the percentage. This eliminates the bias entirely. Unfortunately the easiest way to do this one is with embedded SVGs. But it is possible, although hard, to do it with plain html+css. For an example of what this type of chart looks like, take a look at the github punch card graph <https://github.com/boostorg/build/graphs/punch-card>.
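For (b), something as small as an inline SVG per cell would do - a sketch with made-up sizes and colors:

    <!-- hypothetical markup: green cell background, yellow disk sized to the failure fraction -->
    <svg width="24" height="24">
      <rect width="24" height="24" fill="#90ee90"/>
      <circle cx="12" cy="12" r="8" fill="#ffff66"/>
    </svg>

HTH.
--
-- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail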
Rene Rivera wrote:
b. Use a regular shape, like a circle or square, that varies in size to show the percentage. This eliminates the bias entirely. Unfortunately the easiest way to do this one is with embedded SVGs. But it is possible, although hard, to do it with plain html+css. For an example of what this type of chart looks like take a look at the github puch card graph < https://github.com/boostorg/build/graphs/punch-card>.
That's a good idea. AFAIU it doesn't even need to be a regular shape. It could as well be a rectangle or an ellipse. For instance:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
or
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
if the browser supported CSS3 rounded corners. On older browsers the user should see a rectangle.
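A rough sketch of the CSS I have in mind - the class name is made up; the key part is border-radius, which degrades gracefully:

    /* hypothetical inner box sized to the failure percentage */
    .fail-marker {
        width: 75%;
        height: 75%;
        background-color: #ffff66;
        border-radius: 50%; /* ellipse on CSS3 browsers, plain rectangle on older ones */
    }

Regards, Adam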
On Thu, May 14, 2015 at 10:31 AM, Adam Wulkiewicz wrote:
Rene Rivera wrote:
b. Use a regular shape, like a circle or square, that varies in size to show the percentage. This eliminates the bias entirely. Unfortunately the easiest way to do this one is with embedded SVGs. But it is possible, although hard, to do it with plain html+css. For an example of what this type of chart looks like, take a look at the github punch card graph <https://github.com/boostorg/build/graphs/punch-card>.
That's a good idea. AFAIU it doesn't even need to be a regular shape. It could as well be a rectangle or an ellipse. For instance:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
or
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
if the browser supported CSS3 rounded corners. On older browsers the user should see a rectangle.
Two things..
It's definitely going to look more pleasing, more natural, and hence easier to understand if it's a regular shape. The human brain is biased to that kind of understanding.
You can't map the percentage linearly to the size of the shape. You need to map it to the surface area of the shape. I know this makes it slightly harder.. But hey, it's humans we are targeting, and they are nasty to deal with ;-)
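For a disk that means taking a square root - for a fraction p of failing tests and maximum radius R:

    area = pi * r^2   =>   r = R * sqrt(p)
    e.g. p = 25%      =>   r = 0.5 * R   (half the maximum radius, not a quarter of it)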
--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
Rene Rivera wrote:
On Thu, May 14, 2015 at 10:31 AM, Adam Wulkiewicz wrote:
Rene Rivera wrote:
<snip>
That's a good idea. AFAIU it doesn't even need to be a regular shape. It could as well be a rectangle or an ellipse. <snip>
Two things..
It's definitely going to look more pleasing, more natural, and hence easier to understand if it's a regular shape. The human brain is biased to that kind of understanding.
You can't map the percentage linearly to the size of the shape. You need to map it to the surface area of the shape. I know this makes it slightly harder.. But hey it's humans we are targeting, and they are nasty to deal with ;-)
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...

Even when the shapes are disks it doesn't look right, because there is a lot of green around them; the cells are rectangular. It would probably be ok if the cells were squares, but that would require getting rid of the runner and toolset names as they are now, and I wanted to avoid major changes.

Furthermore, I'm not sure that would really be required. As I see it, for non-regular cells what matters is the relation of the yellow (failure) area to the green (pass) area in a cell - in other words, how much of the cell is filled with "failures". I don't think it's ideal, since I agree that the regression matrix in general could have a more modern look, but I think that for now it'd be good enough.

Regards, Adam
On Thu, May 14, 2015 at 3:31 PM, Adam Wulkiewicz wrote:
<snip>
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
Even when the shapes are disks it doesn't look right, because there is a lot of green around them; the cells are rectangular. <snip> I don't think it's ideal, since I agree that the regression matrix in general could have a more modern look, but I think that for now it'd be good enough.
Yeah, in the current limited arrangement it's not a good choice - without, of course, changing the arrangement to separate out the percent indicator. I'd say go with the plain horizontal bar, and think about it more later if you really want to.
--
-- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
On Thu, May 14, 2015 at 3:38 PM, Rene Rivera wrote:
<snip>
I'd say go with the plain horizontal bar, and think about it more later if you really want to.
PS. Thanks for giving it a try for my sake :-) -- -- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
On Thu, May 14, 2015 at 8:54 AM, Adam Wulkiewicz wrote:
Tom Kent wrote:
On Wed, May 13, 2015 at 8:42 PM, Adam Wulkiewicz < adam.wulkiewicz@gmail.com> wrote:
So once again, the summary showing the graphical percentage of failing tests per library/toolset could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
I like it, one suggestion: white background instead of green for the ones with some issue?
I'm glad that you like the overall idea. Your suggestion could look like this:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
I considered it too, but thought that since the purpose of this change is to increase the overall greenness of the matrix according to some measure - in this case the percentage of passed tests - the green color should probably be kept.
Yeah, after seeing that, I do think the green is better than the white, ignore my suggestion.
Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
Ooo, nice. I like horizontal much better. Tom
Tom Kent wrote:
Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
Ooo, nice. I like horizontal much better.
This looks pretty good, but I would put the green on the left (the idea being that we go from the undesirable 0% green to our target of 100% green.)
Peter Dimov wrote:
Tom Kent wrote:
Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...
Ooo, nice. I like horizontal much better.
This looks pretty good, but I would put the green on the left (the idea being that we go from the undesirable 0% green to our target of 100% green.)
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percen...

I like it. Ok, I'll give it a try and we'll see how it looks when applied to the full matrix.
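In CSS this could be as simple as a horizontal gradient with a hard stop - hypothetical class name, one class per percentage step:

    /* hypothetical: 50% of tests pass - green (passes) left, yellow (failures) right */
    td.fail-50 {
        background: linear-gradient(to right, #66cc66 50%, #ffff66 50%);
    }

Regards, Adam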
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Tom Kent Sent: 14 May 2015 16:46 To: Boost Developers List Subject: Re: [boost] [all][testing] Regression summary upgrades
On Thu, May 14, 2015 at 8:54 AM, Adam Wulkiewicz wrote:
Tom Kent wrote:
<snip>
Btw, this is how it could look if the bar was horizontal:
https://raw.githubusercontent.com/awulkiew/data-images/master/summary-percent-graphical-h.png
Ooo, nice. I like horizontal much better.
Ooooo Yes! I love it. (Green to the left and yellow on the right is even nicer.) And thanks for your efforts on this.
Paul (recovering-from-holiday-mode)
--- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
On 11/05/15 19:09, John Maddock wrote:
The last category (the generic "fail") covers the other or unknown failures. Currently the "Lib" errors fall into this category. There are some in Math, e.g.: http://www.boost.org/development/tests/develop/developer/output/CrystaX-NET-...
That's a weird one - if you follow the links it actually says that linking the lib succeeded - which leaves me wondering what actually went wrong?
Actually there is an error - "clang: error: cannot specify -o when generating multiple output files" - when building the pre-compiled header. It's unclear to me why linking was done after the compile failure; probably there is a bug with wrong propagation of the exit code or something like that. I'll try to look into it and figure out what's wrong.
-- Dmitry Moskalchuk
Dmitry Moskalchuk wrote:
On 11/05/15 19:09, John Maddock wrote:
The last category (the generic "fail") covers the other or unknown failures. Currently the "Lib" errors fall into this category. There are some in Math, e.g.: http://www.boost.org/development/tests/develop/developer/output/CrystaX-NET-...
That's a weird one - if you follow the links it actually says that linking the lib succeeded - which leaves me wondering what actually went wrong?
Actually there is an error - "clang: error: cannot specify -o when generating multiple output files" - when building the pre-compiled header. It's unclear to me why linking was done after the compile failure; probably there is a bug with wrong propagation of the exit code or something like that. I'll try to look into it and figure out what's wrong.
I saw also "Lib" errors where the compilation suceeded but linking wasn't mentioned, in the past in Geometry, AFAIR for newly tested compilers. Like those currently reported for Iostreams: http://www.boost.org/development/tests/develop/developer/iostreams.html e.g. see http://www.boost.org/development/tests/develop/developer/output/BenPope%20x8... Now it should be easier to track them. I'm wondering, if this is some bug in Build would it be helpful to detect such issues and mark the tests differently? Of course everything according to the test type - run, compile, compile-fail, etc. For instance: - a "Lib" test mentioning only succeeding phases, e.g. only mentioning succeeding compilation when the test should run, - more general, a test mentioning only some of the phases, so not mentioning the desired phase, - a test with failing previous phases but succeeding next phases, e.g. failing compilation but succeeding linking. Could something like this be helpful? Regards, Adam
AMDG On 05/12/2015 04:52 AM, Adam Wulkiewicz wrote:
- a test with failing previous phases but succeeding next phases, e.g. failing compilation but succeeding linking.
This shouldn't ever happen. If it does, it's probably a bug in process_jam_log or the test reporting system. In Christ, Steven Watanabe
On Tue, May 12, 2015 at 9:44 AM, Steven Watanabe wrote:
AMDG
On 05/12/2015 04:52 AM, Adam Wulkiewicz wrote:
- a test with failing previous phases but succeeding next phases, e.g. failing compilation but succeeding linking.
This shouldn't ever happen. If it does, it's probably a bug in process_jam_log or the test reporting system.
Indeed.. And please report such bugs to < https://github.com/boostorg/regression/issues>. And please include links to the problem pages *and* a reference to the git commit in the boost super project that is showing the problem. -- -- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
Rene Rivera wrote:
On Tue, May 12, 2015 at 9:44 AM, Steven Watanabe wrote:
AMDG
On 05/12/2015 04:52 AM, Adam Wulkiewicz wrote:
- a test with failing previous phases but succeeding next phases, e.g. failing compilation but succeeding linking.
This shouldn't ever happen. If it does, it's probably a bug in process_jam_log or the test reporting system.
Indeed.. And please report such bugs to < https://github.com/boostorg/regression/issues>. And please include links to the problem pages *and* a reference to the git commit in the boost super project that is showing the problem.
This is why I thought about detecting them automatically and displaying them in the regression matrix. This way it would be trivial to spot them.
Regards, Adam
On Mon, May 11, 2015 at 6:09 PM, John Maddock wrote:
+1, time limit exceptions are a big issue for the Math and Multiprecision libs... and yes, I've already spent a lot of time refactoring to make them smaller, but they still persist. I suspect many of these are caused by virtual machines with insufficient RAM allocated, that then end up thrashing the HD: many of the tests that time out compile really pretty quickly here (even on obsolete hardware).
I can confirm this. With most other libraries cl.exe uses at most 100MiB of RAM but with Math it's easily over 1GiB. If you use parallel builds (e.g. -j2) this adds up pretty quickly (with bjam.exe using about 600MiB). Also with Math there are usually four or five instances of the compiler running at the same time (even if you specify -j2). Is this normal?
participants (11)
- Adam Wulkiewicz
- Dmitry Moskalchuk
- Edward Diener
- John Maddock
- Paul A. Bristow
- Peter Dimov
- Rene Rivera
- Rob Stewart
- Steven Watanabe
- Thomas Trummer
- Tom Kent