On 09/10/15 18:37, Robert Ramey wrote:
I believe this whole thread started from the changes in Boost.Test such that it can no longer support testing of C++03 compatible libraries. This is totally unrelated to the testing of Boost libraries.
The thread started because boost.test broke something used by other libraries, in a development branch, which raised some misunderstanding about the purpose of this branch and the overall workflow. As a side note, I reverted the changes so that C++11 is not required for the set of features that do not explicitly state this requirement in the documentation of 1.59 (the ones that do are mainly datasets, but also some forms of test declaration and test assertions).
Here is what I would like to see:
a) local testing by library developers.
Of course library developers need this in order to develop and maintain libraries.
Currently we have this and it has worked quite well for many years. Making Boost.Test require C++11+ throws a monkey wrench into things for the libraries which use it. But that's only temporary. Libraries whose developers feel they need to maintain compatibility with C++98 can move to lightweight test with relatively little effort.
I do not think that local testing has ever been an issue. The value of the dashboard lies in how testing scales across platform/compiler combinations, especially for configurations that are hard to find today (e.g. MSVC7) and/or hard to set up (e.g. Android). I would also like to emphasize the difference between the unit testing tool (boost.test or lightweight test) and the test driver (bjam):
- The "API" for running the test bed is bjam. It is used by developers and by the regression testing workflow.
- The API for writing tests can be whatever developers like; boost.test is just one choice, and it is not directly seen by the regression dashboard.
Developers who are concerned that the develop branch is a "soup" can easily isolate themselves from this by testing against the master branch of all the other libraries. The Boost modularization system with git has made this very simple and practical (thank you Beman!).
So - not a problem.
Right: this is trivial locally, yet this is not the current workflow of the regression dashboard. The complaints started because of failures in develop, and because of workflow considerations and safe increments. As a developer, I would like to test my library on many runners (and as fast as possible).
b) Testing on other platforms.
We have a system which has worked pretty well for many years. Still it has some features that I'm not crazy about.
i) it doesn't scale well - as boost gets bigger the testing load gets bigger.
I suggested a test procedure based on "stages of quality" in my previous post:
- fast feedback from continuous runners, giving a quick status on some mainstream compilers. Runners may have overlapping configurations/setups, so that the load is balanced somehow.
- scheduling of less available runners on candidates selected from the previous stage. The interface can be advancing a git branch, with those runners picking up that branch only.
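To make the hand-over between the two stages a bit more concrete, here is a minimal Python sketch of how candidates could be selected from the fast runners. Everything in it (`StageResult`, `select_candidates`, the runner names) is invented for illustration and does not describe the existing regression tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    revision: str  # git revision that was tested
    runner: str    # hypothetical identifier of a fast runner
    passed: bool   # True if the whole run was green


def select_candidates(fast_results, required_runners):
    """Return the revisions that were green on every required fast runner.

    Revisions are returned in the order they first appear in fast_results.
    """
    passed_on = {}   # revision -> set of runners that reported success
    failed = set()   # revisions with at least one failing run
    for result in fast_results:
        if result.passed:
            passed_on.setdefault(result.revision, set()).add(result.runner)
        else:
            failed.add(result.revision)
    return [
        rev for rev, runners in passed_on.items()
        if rev not in failed and set(required_runners) <= runners
    ]
```

The slower, less available runners would then only pick up the revisions returned here, for instance by following a branch that is advanced to the latest candidate.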
ii) it tests the develop branch of each library against the develop branch of all the other libraries - hence we have a testing "soup" where a test might show failure but this failure might not be related to the library under test but some other library. It diminishes the utility of the test results in tracking down problems.
Exactly, but not being able to track the history of versions on the current dashboard does not help either. As a developer, I would like to see a summary of e.g. the number of failing tests vs. the number of tests, and *per revision*.
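As a rough idea of the kind of digest I mean, here is a small Python sketch. The input format (tuples of revision, test name, pass flag) is an assumption made for the example, not what the current dashboard stores.

```python
from collections import defaultdict


def summarize_per_revision(results):
    """Count failing and total tests for each revision.

    results: iterable of (revision, test_name, passed) tuples.
    Returns a dict mapping revision -> (failing, total).
    """
    counts = defaultdict(lambda: [0, 0])  # revision -> [failing, total]
    for revision, _test_name, passed in results:
        counts[revision][1] += 1
        if not passed:
            counts[revision][0] += 1
    return {rev: tuple(c) for rev, c in counts.items()}
```

One line per revision of this form is enough to see when a regression was introduced.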
iii) it relies on volunteer testers to select compilers/platforms to test under. So it's not exhaustive and the selection might not reflect that which people are actually using.
I would say that it would be good if each runner published its setup (not the runtime, but how it has been deployed), and maybe a script for reproducing this runner. I am thinking of Docker (and how easy it makes describing a system in full); there are tools for the other platforms, though they are more complicated. The idea behind that is to be able to reproduce the runners, so that they are not shown by name (e.g. teeks99-08) but by property (e.g. win2012R2-64on64, msvc-12). I am not saying that the current setup should not be followed; I am suggesting a way to address the scalability issue. For that we can have equivalent runners and balance the load.
I would like to see us encourage our users to test the libraries that they use. This system would work in the following way.
If by users you mean the post-release /end users/, are you expecting post-release feedback? I am not sure I understand. BTW, do we have numbers on how many people download a release candidate?
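A small Python sketch of what "by property" and "balance the load" could look like in practice. The runner names and the round-robin policy are assumptions for the example; only the property values are taken from the text above.

```python
from collections import defaultdict
from itertools import cycle


def group_by_properties(runners):
    """Group runner names by their published properties.

    runners: dict mapping runner name -> tuple of properties,
             e.g. ("win2012R2-64on64", "msvc-12").
    """
    groups = defaultdict(list)
    for name, properties in runners.items():
        groups[properties].append(name)
    return dict(groups)


def assign_round_robin(libraries, equivalent_runners):
    """Spread libraries over runners that share the same properties."""
    assignment = defaultdict(list)
    # zip stops when the libraries are exhausted; cycle repeats the runners.
    for library, runner in zip(libraries, cycle(equivalent_runners)):
        assignment[runner].append(library)
    return dict(assignment)
```

The dashboard would then show one column per property tuple, and any runner in the group could fill it.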
a) A user downloads/builds boost.
b) he decides he's going to use library X, and Y
c) he runs a tool which tells him which libraries he has to test. This would be the result of a dependency analysis. We have tools which do similar dependency analysis but they would have to be slightly enhanced to distinguish between testing, deployment, etc. I don't think this would be a huge undertaking given the work that has already been done.
d) he runs the local testing setup on those libraries and their dependents.
e) he uploads the test results to a dashboard similar if not identical to the current one.
So we should expect HTML pages with 10000 columns. Again, I think the information needs to be digested.
f) we would discourage users from just using the boost libraries without running their own tests. We would do this by exhortation and by refusing to support users who have been unwilling to run and post local tests.
Mmmm... sounds bad to me.
This would give us the following:
a) a scalable testing setup which could handle a Boost containing any number of libraries.
And what about just a randomized test? Say we have an ever growing number of tests N (big), but the willingness of users to run all N tests decreases with N. Say we limit each run to M << N tests (say 100), drawn uniformly at random: the feedback would be much faster, the acceptance much higher. On our side, we need some machinery to digest this information based on the environment setup.
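The sampling itself is simple; a minimal Python sketch, where the function name and the `seed` parameter are mine and only serve the illustration:

```python
import random


def sample_tests(all_tests, m, seed=None):
    """Pick m tests uniformly at random, without replacement.

    If fewer than m tests exist, all of them are returned.
    Passing the same seed reproduces the same selection.
    """
    tests = list(all_tests)
    if m >= len(tests):
        return tests
    rng = random.Random(seed)
    return rng.sample(tests, m)
```

If the seed were uploaded together with the results, a failing subset could be re-run exactly; that is a suggestion on my side, not something the current tools do.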
b) All combinations of libraries/platforms/compilers actually being used would be those being tested and vice versa. We would have complete and efficient test coverage.
c) We would have statistics on libraries being used. Something we are sorely lacking now.
I am wondering why this would be relevant.
d) We would be encouraging better software development practices. Some time ago someone posted that he had a problem but couldn't run the tests because "management" wouldn't allocate the time - and this was a critical human life safety app. He escaped before I could wheedle out of him which company he worked for.
And best of all - We're almost there !!!! we'd only need to:
a) enhance slightly the dependency tools we've crafted but aren't actually using.
The dependencies are indirectly tested, I would say, so testing the dependencies is a /nice to have/, but if I am using X that depends on Y, testing X should in most cases be enough. If some breakage goes unnoticed through the tests of X, having tested Y might have helped, but this is not trivial: coverage of X should be improved.
b) develop a tool to post the local results to a common dashboard
c) enhance the current dashboard to accept these results.
Several tools exist already, e.g. CDash together with CMake. Why spend that much effort on developing our own tools? Our expectations are not that different from those of many other open or closed source projects: we want quick and/or wide feedback on the development state of boost.

Raffi