AMDG

On 12/12/2015 08:11 PM, Sergey Sprogis wrote:
> I’m wondering how difficult it would be to add one option to bjam, something like:
> -negative_tests=off
> which would not launch any negative (expected-to-fail) tests.
> For people who use Boost to test newly developed compilers, negative tests are quite a nuisance. I mean those tests which get “compile_fail” or “run_fail” status inside the final *.xml file generated at the end of a Boost regression run. [They are also marked similarly inside the Jamfiles.]
If you're using the xml files, then isn't there enough information to filter out these results automatically?
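For example, here's a minimal sketch of such a filter. It assumes each test is recorded as a <test-log> element carrying "test-type" and "result" attributes; treat those names (and the whole layout) as assumptions about the results schema, not a documented interface:

    import sys
    import xml.etree.ElementTree as ET

    # Test types that are expected to fail.  The names here are an
    # assumption about the regression XML schema, as is everything
    # else about the layout.
    NEGATIVE = {"compile_fail", "link_fail", "run_fail"}

    def positive_logs(path):
        # Yield every test record that is not an expected-failure test.
        for log in ET.parse(path).iter("test-log"):
            if log.get("test-type") not in NEGATIVE:
                yield log

    def pass_rate(path):
        # Pass rate over the positive tests only, as a percentage.
        logs = list(positive_logs(path))
        passed = sum(1 for log in logs if log.get("result") == "success")
        return 100.0 * passed / len(logs) if logs else 0.0

    if __name__ == "__main__":
        print("pass rate excluding negative tests: %.1f%%"
              % pass_rate(sys.argv[1]))

The same filter also gives the adjusted pass rate you ask about below, since it decides which tests count.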
> That’s because newly developed compilers, in their early stages of implementation, normally have bugs which produce a lot of false error messages for correct Boost code. The important task here is to extract those messages, to evaluate them, and to prove that the code is indeed correct but the compiler is wrong.
> And when hundreds of such false error messages are mixed together with the thousands of legitimate error messages produced by negative tests (there are more than 700 such tests in Boost 1.59.0, for example), filtering them out becomes a non-trivial task. So the natural desire is not to launch such tests at all.
If you're using the jam output directly, you can filter by searching for "...failed" or just run the tests a second time, which will only attempt to run tests that failed the first time.
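A trivial sketch of the first approach, run over a saved log (the "...failed" prefix is what bjam actually prints; the rest is plain filtering):

    import sys

    # Keep only bjam's "...failed" summary lines from a saved log,
    # e.g.  python filter_failures.py < bjam.log
    failures = [line for line in sys.stdin if line.startswith("...failed")]
    sys.stdout.writelines(failures)
    print("%d failed actions" % len(failures), file=sys.stderr)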
> Another, slightly unpleasant effect of negative tests is that they are counted in the so-called “test pass rate” calculations made during compiler testing.
> Typically, the managers responsible for compiler development want to know the progress in terms of the test pass rate the new compiler achieves on the whole Boost test suite, or on some specific libraries. Normally such a pass rate is calculated as the ratio of the number of passed tests to the total number of tests. But ideally, the tests in those calculations should be correct code, so that a failure always means a compiler bug. Here again, negative tests are not useful, and should be excluded to make the calculations more accurate.
I'm not sure I follow. Shouldn't the compiler accepting incorrect code also be considered a compiler bug?
> On a side note, I think it could also be useful to add such overall pass rates to the Boost Regression Dashboard, so that the quality of every tested compiler could easily be seen. Many more people might be interested in looking at such a dashboard, I guess.
In Christ,
Steven Watanabe