Negative effect of expected-to-fail tests in compiler development projects.
I’m wondering how difficult it would be to add one option into bjam, something like -negative_tests=off, which would not launch any negative (expected-to-fail) tests.

For people who use Boost to test newly developed compilers, negative tests are quite a nuisance. I mean those tests which get “compile_fail” and “run_fail” status inside the final *.xml file generated at the end of Boost regression testing. [ they are also marked similarly inside Jamfiles ]

That’s because newly developed compilers in their early stages of implementation normally have bugs which produce a lot of false error messages for correct Boost code. An important task here is to extract those messages, evaluate them, and prove that the code is indeed correct but the compiler is wrong. And when hundreds of such false error messages are mixed together with thousands of legitimate error messages produced by negative tests (there are > 700 of them in Boost 1.59.0, for example), it becomes a non-trivial task to filter them out. So the natural desire is not to launch such tests at all.

Another, slightly unpleasant effect of negative tests is that they show up in the so-called “test pass rate” calculations during the compiler testing process. Typically, managers responsible for compiler development want to know the progress in terms of what “test pass rate” the new compiler produces for the whole of Boost testing or for the testing of some specific libraries. Normally such a pass rate is calculated as the ratio between the number of passed tests and the total number of tests. But ideally, the tests in those calculations should be correct, so that if one of them fails, it definitely means a compiler bug. And here again, negative tests are not useful and should be left out to make the calculations more accurate.

On a side note, I think it could also be useful to add such total testing pass rates to the Boost Regression Dashboard, so the quality of every tested compiler could be easily seen. Many more people might be interested in looking at such a dashboard, I guess.
AMDG
On 12/12/2015 08:11 PM, Sergey Sprogis wrote:
I’m wondering how difficult it would be to add one option into bjam, something like:
-negative_tests=off
which would not launch any negative (expected-to-fail) tests.
For people who use Boost to test newly developed compilers, negative tests are quite a nuisance. I mean those tests which get “compile_fail” and “run_fail” status inside the final *.xml file generated at the end of Boost regression testing. [ they are also marked similarly inside Jamfiles ]
If you're using the xml files, then isn't there enough information to filter out these results automatically?
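For example, something along these lines should be enough to strip the expected-failure entries. This is only a rough sketch: the "test-log" element and "test-type" attribute names here are my assumptions about the report layout, not a description of the actual schema, so adjust them to whatever the real file uses.

# Rough sketch: drop expected-failure entries from the regression results XML.
# "test-log" and "test-type" are assumed names; check your actual report.
import xml.etree.ElementTree as ET

NEGATIVE_TYPES = {"compile_fail", "run_fail"}

def strip_negative_tests(in_path, out_path):
    tree = ET.parse(in_path)
    root = tree.getroot()
    for parent in root.iter():
        for entry in list(parent):
            if entry.tag == "test-log" and entry.get("test-type") in NEGATIVE_TYPES:
                parent.remove(entry)
    tree.write(out_path)

if __name__ == "__main__":
    strip_negative_tests("results.xml", "results-positive-only.xml")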
That’s because newly developed compilers in their early stages of implementation normally have bugs which produce a lot of false error messages for correct Boost code. An important task here is to extract those messages, evaluate them, and prove that the code is indeed correct but the compiler is wrong.
And when hundreds of such false error messages are mixed together with thousands of legitimate error messages produced by negative tests (there are > 700 of them in Boost 1.59.0, for example), it becomes a non-trivial task to filter them out. So the natural desire is not to launch such tests at all.
If you're using the jam output directly, you can filter by searching for "...failed" or just run the tests a second time, which will only attempt to run tests that failed the first time.
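The first variant only needs a few lines of script, e.g. (assuming the jam output was redirected to a file called bjam.log; the file name is just an assumption):

# Minimal sketch: print only the "...failed" lines from a captured b2/bjam log.
with open("bjam.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # b2/bjam marks each failed action with "...failed"; keep only those lines.
        if "...failed" in line:
            print(line.rstrip())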
Another, slightly unpleasant effect of negative tests is that they show up in the so-called “test pass rate” calculations during the compiler testing process.
Typically, managers responsible for compiler development want to know the progress in terms of what “test pass rate” the new compiler produces for the whole of Boost testing or for the testing of some specific libraries. Normally such a pass rate is calculated as the ratio between the number of passed tests and the total number of tests. But ideally, the tests in those calculations should be correct, so that if one of them fails, it definitely means a compiler bug. And here again, negative tests are not useful and should be left out to make the calculations more accurate.
I'm not sure I follow. Shouldn't the compiler accepting incorrect code also be considered a compiler bug?
On a side note, I think it could also be useful to add such total testing pass rates to the Boost Regression Dashboard, so the quality of every tested compiler could be easily seen. Many more people might be interested in looking at such a dashboard, I guess.
In Christ, Steven Watanabe
On 12/13/2015 2:13 PM, Steven Watanabe wrote:
AMDG
On 12/12/2015 08:11 PM, Sergey Sprogis wrote:
I’m wondering how difficult it would be to add one option into bjam, something like:
-negative_tests=off
which would not launch any negative (expected-to-fail) tests.
For people who use Boost to test newly developed compilers, negative tests are quite a nuisance. I mean those tests which get “compile_fail” and “run_fail” status inside the final *.xml file generated at the end of Boost regression testing. [ they are also marked similarly inside Jamfiles ]
If you're using the xml files, then isn't there enough information to filter out these results automatically?
Yes, it's possible, and actually I'm doing that. And if I'm the only person on this alias who needs it, then of course it's not worth the effort to do anything else.
That’s because newly developed compilers in their early stages of implementation normally have bugs which produce a lot of false error messages for correct Boost code. An important task here is to extract those messages, evaluate them, and prove that the code is indeed correct but the compiler is wrong.
And when hundreds of such false error messages are mixed together with thousands of legitimate error messages produced by negative tests (there are > 700 of them in Boost 1.59.0, for example), it becomes a non-trivial task to filter them out. So the natural desire is not to launch such tests at all.
If you're using the jam output directly, you can filter by searching for "...failed" or just run the tests a second time, which will only attempt to run tests that failed the first time.
I think '...failed' reports the number of failed targets, which is not always the same as the number of failed tests. But actually, for my purpose I need to filter out all compilation errors produced by negative tests, which are noise to me. I also need to calculate the total number of positive tests and how many of them failed. All that can be done using the *.xml file, but I do not think the jam output allows me to do that accurately.
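For what it's worth, the pass-rate part can be scripted from the same file; a rough sketch only, where the "test-log", "test-type" and "result" names are again my assumptions about the report layout rather than the actual schema:

# Rough sketch of a pass-rate calculation that ignores negative tests.
import xml.etree.ElementTree as ET

NEGATIVE_TYPES = {"compile_fail", "run_fail"}

def positive_pass_rate(path):
    # Count only positive tests; skip the expected-to-fail ones entirely.
    passed = total = 0
    for entry in ET.parse(path).getroot().iter("test-log"):
        if entry.get("test-type") in NEGATIVE_TYPES:
            continue
        total += 1
        if entry.get("result") == "success":
            passed += 1
    return passed, total

if __name__ == "__main__":
    passed, total = positive_pass_rate("results.xml")
    if total:
        print("positive tests passed: %d/%d (%.1f%%)" % (passed, total, 100.0 * passed / total))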
Another, slightly unpleasant effect of negative tests is that they show up in the so-called “test pass rate” calculations during the compiler testing process.
Typically, managers responsible for compiler development want to know the progress in terms of what “test pass rate” the new compiler produces for the whole of Boost testing or for the testing of some specific libraries. Normally such a pass rate is calculated as the ratio between the number of passed tests and the total number of tests. But ideally, the tests in those calculations should be correct, so that if one of them fails, it definitely means a compiler bug. And here again, negative tests are not useful and should be left out to make the calculations more accurate.
I'm not sure I follow. Shouldn't the compiler accepting incorrect code also be considered a compiler bug?
Yes, it's a compiler bug, but for the vast majority of negative tests the compiler does produce error messages. The problem is that those messages are irrelevant for my purpose. I need to filter them out so I can focus only on the false error messages from positive tests, which indicate compiler bugs. My goal is to find compiler bugs, not source code bugs, and I do not need negative tests for that. As for compiler bugs where incorrect code compiles successfully: that type of bug, though important, is pretty rare, and in the earlier stages of compiler implementation it is not the first priority.
On a side note, I think it could also be useful to add such total testing pass rates to the Boost Regression Dashboard, so the quality of every tested compiler could be easily seen. Many more people might be interested in looking at such a dashboard, I guess.
In Christ, Steven Watanabe
Yes, it's a compiler bug, but for the vast majority of negative tests the compiler does produce error messages. The problem is that those messages are irrelevant for my purpose. I need to filter them out so I can focus only on the false error messages from positive tests, which indicate compiler bugs. My goal is to find compiler bugs, not source code bugs, and I do not need negative tests for that. As for compiler bugs where incorrect code compiles successfully: that type of bug, though important, is pretty rare, and in the earlier stages of compiler implementation it is not the first priority.
I understand all that, but remember that this is written by volunteers, and that we have no real need for what you want to do - our aim is to test source code and catch regressions, not to test compilers as such. So I'm afraid you'll probably have to patch things yourself :( How about gutting the contents of the compile-fail rule in tools/build/src/tools/testing.jam? HTH, John.
On Mon, Dec 14, 2015 at 1:23 PM, John Maddock wrote:
The problem is that those messages are irrelevant for my purpose. I need to filter them out to be focused only on false error messages from positive tests which indicate compiler bug.
I'm afraid you'll probably have to patch things yourself :(
How about gutting the contents of the compile-fail rule in tools/build/src/tools/testing.jam?
I must admit, though I'm not testing a compiler myself, I've been bothered before by compile-fail output. We build and test internal Boost packages on a TeamCity instance, with a filter that logs lines that "look like" build errors. But when we hit a /real/ build error for some platform, it can take forever to find it in the log output: our filter produces a dismaying quantity of output because of all the expected failure messages. Suggestion: maybe for compile-fail, capture compiler output separately, and only display it if the termination code isn't as expected? (Maybe also, for test coders, support a switch to display it unconditionally.)
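Roughly the idea, written as a standalone wrapper rather than as anything b2 actually does today (the compiler command line below is only a placeholder):

import subprocess, sys

def run_expected_to_fail(cmd, show_output_anyway=False):
    # Run a compile that is expected to fail and capture its output; only show
    # the output when the exit status is not the expected non-zero one.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    failed_as_expected = proc.returncode != 0
    if not failed_as_expected or show_output_anyway:
        sys.stdout.write(proc.stdout)
        sys.stderr.write(proc.stderr)
    return failed_as_expected

if __name__ == "__main__":
    # Placeholder command; the real tool would substitute the toolset invocation.
    ok = run_expected_to_fail(["g++", "-c", "compile_fail_test.cpp"])
    print("compile-fail test " + ("passed" if ok else "FAILED (the code compiled cleanly)"))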
On 12/14/15 10:23, John Maddock wrote:
I understand all that, but remember that this is written by volunteers, and that we have no real need for what you want to do - our aim is to test source code and catch regressions, not to test compilers as such. So I'm afraid you'll probably have to patch things yourself :(
How about gutting the contents of the compile-fail rule in tools/build/src/tools/testing.jam?
Thanks for the suggestion. This will be very useful for removing the 'noise' generated by the expected-to-fail tests while looking for real compiler issues. Aparna
HTH, John.
If you're using the jam output directly, you can filter by searching for "...failed" or just run the tests a second time, which will only attempt to run tests that failed the first time.
That's the easiest way - build all the tests and then do an incremental build and use the output from that. HTH, John.
participants (5)
- Aparna Kumta
- John Maddock
- Nat Goodspeed
- Sergey Sprogis
- Steven Watanabe