Re: [boost] Boost Library Testing - a modest proposal - was boost.test regression or behavior change (was Re: Boost.lockfree)

9 Oct 2015

      On 10/9/15 10:54 AM, Raffi Enficiaud wrote:

It's hard to tell, but it seems to me that so far we're in agreement.
...
...
b) Testing on other platforms.
We have a system which has worked pretty well for many years. Still it
has some features that I'm not crazy about.
i) it doesn't scale well - as boost gets bigger the testing load gets
bigger.
I suggested a test procedure based on "stages of quality" in my previous post:
- fast feedback by continuous runners, giving a quick status on some
mainstream compilers. Runners may have overlapping configurations/setups,
so that the load is balanced somehow.
- scheduling of less available runners on candidates selected from the
previous stage. The interface could be a git branch that is advanced to
each selected candidate, with these runners picking up only that branch.
This is a pretty elaborate setup, and also fairly ambiguous to me. It 
seems like implementing such a thing would be quite an effort, and by 
whom I don't know.
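For concreteness, here is a minimal sketch of the promotion step between the two stages described above. It assumes each fast runner reports a failure count per candidate revision; `StageResult`, `promote_candidates` and the runner names are hypothetical and not part of any existing Boost tool.

```python
from dataclasses import dataclass

@dataclass
class StageResult:
    revision: str   # candidate commit id (hypothetical format)
    runner: str     # fast continuous runner that produced the result
    failures: int   # number of failing tests reported by that runner

def promote_candidates(results, required_runners):
    """Return revisions for which every required fast runner reported zero failures.

    These are the candidates the second stage (slower, less available runners)
    would pick up, e.g. by advancing a dedicated git branch to them.
    A revision missing a report from any required runner is not promoted.
    """
    clean = {}
    for r in results:
        # If a runner reports the same revision twice, the last report wins.
        clean.setdefault(r.revision, {})[r.runner] = (r.failures == 0)
    return [rev for rev, by_runner in clean.items()
            if all(by_runner.get(name, False) for name in required_runners)]
```

Requiring a report from every runner in the first stage is a deliberately conservative choice; a looser rule (e.g. a majority of runners) would promote candidates faster at the cost of more failures reaching the slow runners.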
...
...
ii) it tests the develop branch of each library against the develop
branch of all the other libraries
...
...
Exactly,
OK - so we're in agreement about this.
...
but not being able to track the history of the versions on the current
dashboard doesn't help either. As a developer, I would like to see a
summary of e.g. the number of failing tests vs. the total number of
tests, *per revision*.
I don't think such information would be useful to me.  But maybe that's 
just me.
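The per-revision summary requested above is a simple aggregation. A minimal sketch, assuming the dashboard could export one `(revision, test_name, passed)` record per test run; that record layout and the function name are assumptions, not an existing dashboard format.

```python
def summarize_by_revision(records):
    """Aggregate (revision, test_name, passed) records into
    {revision: (failing, total)}, keeping revisions in first-seen order."""
    summary = {}
    for revision, _test_name, passed in records:
        failing, total = summary.get(revision, (0, 0))
        summary[revision] = (failing + (0 if passed else 1), total + 1)
    return summary
```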
...
...
iii) it relies on volunteer testers to select compilers/platforms to
test under. So it's not exhaustive and the selection might not reflect
that which people are actually using.
I would say that it would be good if each runner published its setup
(not the runtime, but how it has been deployed), and maybe a script to
reproduce that runner. I am thinking of Docker (and how easy it makes
fully describing a system); there are tools for other platforms too,
though they are more complicated.

...
The idea behind this is to be able to reproduce the runners, so that
they are identified not by name (e.g. teeks99-08) but by property (e.g.
win2012R2-64on64, msvc-12). I am not saying that the current setup
should be abandoned; I am suggesting a way to address the scalability
issue. For that, we can have equivalent runners and balance the load.
Sounds very ambitious and complex.
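To illustrate the idea of identifying runners by property, here is a minimal sketch that groups runners by (platform, toolset) and spreads jobs round-robin across equivalent runners. The property strings reuse the examples above; the second runner name, the job tuples and both functions are hypothetical.

```python
from collections import defaultdict
from itertools import cycle

def group_by_property(runners):
    """Map each (platform, toolset) property to the names of runners providing it."""
    groups = defaultdict(list)
    for name, platform, toolset in runners:
        groups[(platform, toolset)].append(name)
    return dict(groups)

def assign_jobs(jobs, groups):
    """Assign each (job_id, platform, toolset) job round-robin to one of the
    equivalent runners for that property; return (assigned, unassigned)."""
    rotations = {prop: cycle(names) for prop, names in groups.items()}
    assigned, unassigned = [], []
    for job_id, platform, toolset in jobs:
        rotation = rotations.get((platform, toolset))
        if rotation is None:
            unassigned.append(job_id)   # no runner offers this configuration
        else:
            assigned.append((job_id, next(rotation)))
    return assigned, unassigned
```

Round-robin is the simplest possible balancing policy; it ignores runner speed and current load, which a real scheduler would likely need to take into account.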
...
...
I would like to see us encourage our users to test the libraries that
they use. This system would work in the following way.
If by users you mean the post-release /end users/, are you expecting
post-release feedback? I am not sure I understand.
This suggestion doesn't address pre-release issues.  Frankly, except for 
a few issues (develop vs master) cited above, I don't think they are a 
big problem, and I think the current testing setup is adequate.

But this system can really only test the combinations that the testers 
select.  The problem comes up after release when one gets bug reports 
from users of the released library.  I would like to get these sooner 
rather than later and on the platforms that people are actually using.
I often get issues reported which are related to the current configuration, 
but the user hasn't run the latest tests on his current setup, so all 
I get is a complaint.  If the user ran the tests on the libraries which 
he's using (which he should be doing in any case!) I'd have a lot more 
to work with and bugs would get discovered and addressed sooner with 
less effort.

Of course, if users want to switch to the develop branch of the libraries 
they use and run the tests pre-release - that would be great.  But I'm 
not really expecting many people to do that.
...
BTW, do we have numbers on how many people download a release
candidate?
I'm guessing we do.
...
...
a) A user downloads/builds boost.
...
...
So we should expect HTML pages with 10000 columns? Again, I think the
information needs to be digested.
LOL - that would be great!!!  Of course, if such a proposal were to be 
so wildly successful as to create such a problem, we'd have to 
upgrade our archiving and inquiry of test results.  I'm not losing any 
sleep regarding this issue right now.
...
...
f) we would discourage users from just using the boost libraries without
running their own tests. We would do this by exhortation and by
refusing to support users who have been unwilling to run and post local
tests.
Mmmm... sounds bad to me.
LOL - we can't agree on everything.
...
...
This would give us the following:
a) a scalable testing setup which could handle a Boost containing any
number of libraries.
And what about just a randomized test?
I don't see how that would be better.
...
...
c) We would have statistics on libraries being used. Something we are
sorely lacking now.
I am wondering why this would be relevant.
OK - it's not really relevant as far as testing is concerned.  This 
information would become available as a side effect.

But it would be extremely useful to know that library X has N users. 
This would help indicate which libraries might be considered for 
elimination from the standard boost distribution.  If something like 
"boost/shared_ptr" is used by only 10 people - it would be interesting 
to know.  If the serialization library is only used by 10 people, it 
would be very interesting to know.  Etc.
...
...
And best of all - we're almost there!!! We'd only need to:
a) enhance slightly the dependency tools we've crafted but aren't
actually using.
The dependencies are indirectly tested, I would say, so testing the
dependencies is a /nice to have/; if I am using X, which depends on Y,
testing X should in most cases be enough.
Let's suppose I'm going to use some boost library X and Y (through 
dependency) as part of the aircraft control system of the next 
400-passenger plane.  Wouldn't you feel safer if all the code used in 
the system were tested? Would you say it's good enough to test only some 
of it?  And if you can run the tests almost for free, is there any reason 
you would skip them?

Basically if I'm going to deploy X in my product and it depends on Y and 
Z, all those should be tested in my environment.  And there's absolutely 
no reason not to do this.

OK - I didn't explain this well.
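The point about testing X together with everything it depends on amounts to a transitive closure over the dependency graph. A minimal sketch, assuming the dependency map is available as a plain dictionary; the library names X, Y, Z follow the example above, and the function is hypothetical rather than the output of Boost's existing dependency tools.

```python
from collections import deque

def libraries_to_test(used, depends_on):
    """Return every library whose tests should run locally: the libraries the
    user uses plus all of their direct and transitive dependencies.

    Breadth-first walk; the visited set also makes dependency cycles safe."""
    to_test = set()
    queue = deque(used)
    while queue:
        lib = queue.popleft()
        if lib in to_test:
            continue
        to_test.add(lib)
        queue.extend(depends_on.get(lib, ()))
    return to_test
```

For example, with `{"X": ["Y", "Z"], "Y": ["Z"]}`, a user of X would run the tests for X, Y and Z.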
...
...
b) develop a tool to post the local results to a common dashboard
c) enhance the current dashboard to accept these results.
Several tools exist already, e.g. CDash together with CMake. Why spend
that much effort developing our own tools? Our expectations are not that
different from those of many other open- or closed-source software
projects: we want quick and/or wide feedback on the development state of boost.
I totally agree.

But it's not that simple when you get down to details.  I have personal 
experience with CDash.  I've used it as part of the Safe Numerics library 
to be found at www.blincubator.com.  I've recommended its use and 
described how to use it at that same web site.  So I'm more familiar 
with it than most.  It's pretty tightly coupled to CMake and CTest, and I 
don't see an obvious way to use it with our bjam test setup.  How about 
replacing bjam with CMake?  Interesting, but not simple either, as they 
don't really match in capability.  And the test reporting isn't quite up 
to our needs.

Having a bit of experience in all this in the context of Boost, I still 
believe the path I've proposed is the best one.

Robert Ramey