Re: [boost] The Future of Boost - CI

9 May 2023

On Mon, May 8, 2023 at 5:22 PM Andrey Semashev via Boost
<boost@lists.boost.org> wrote:

> ...
> On 5/8/23 20:40, John Maddock via Boost wrote:
>> ...
>> Machine time could well be donated by volunteers and perhaps replace the
>> current test/status matrix, which is fine, but requires you to go off
>> seeking for results, which may or may not have cycled yet.  Plus that
>> matrix relies on a "build the whole of Boost" approach which
>> increasingly simply does not scale.
>
> I'm really grateful to the volunteers that run the tests and maintain
> the official test matrix, but honestly, I'm not paying attention to it
> anymore. I have three main issues with it:
>
> 1. Slow turnaround. From my memory, it could take weeks or more for the
> runners to run the tests over a commit I made. With this order of times,
> it is impossible to perform continued development while maintaining code
> in working state.
> 2. Lack of notifications.
> 3. Problematic debugging. It was not uncommon that a test run failed
> because of some misconfiguration on the runner's side. And it was also
> not uncommon that build logs were unavailable.
>
> So, while, again, I'm most grateful to the people that made public
> testing possible at all before we had the current CI services, today we
> do have CI services with the above problems fixed (more or less), and
> I'm spoiled by them. It is true that the public CI resources are limited
> and insufficient at times, so, IMHO, the way forward would be towards
> fixing this problem without losing the convenience of the CI services
> we've become used to.

As the person responsible for most (currently all) of the test runners in
the official test matrix, I thought I'd throw a couple cents in here.

I currently have three machines running these tests. These machines were
purchased by Boost (at the time, via the SFC) in 2017, and are a bit old
but still running well.

One of the machines runs Windows Server. It cycles through the last six
versions of Visual Studio (plus the latest one with `/std:c++latest`) for
develop + master.

The two other machines are Linux runners. One of them just runs the latest
versions of clang + gcc for develop + master. At about 2hr a pop, there
should never be a commit that goes more than ~8hr without one of these
configurations running.
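
The ~8hr figure is just the round-robin cycle time: two compilers times two
branches at roughly 2hr each. A rough sketch of that arithmetic, assuming the
runs are strictly sequential and each takes a flat 2hr (the function name is
made up for illustration and is not part of the runner scripts):

```python
def cycle_hours(compilers, branches, hours_per_run):
    """Hours for one machine to run every compiler/branch pair once, in sequence.

    Assumption: runs never overlap and each takes the same time, so this is
    also the longest a commit can wait before a given configuration picks it up.
    """
    return len(compilers) * len(branches) * hours_per_run


# Latest gcc + clang, develop + master, about 2hr per run
print(cycle_hours(["gcc", "clang"], ["develop", "master"], 2))
```

The same multiplication is why the big config list takes so much longer: more
configurations on one machine means a proportionally longer cycle.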

The other machine runs a *huge* number (>150) of gcc/clang configs. This
takes approximately a week to get through all of them, but provides a
breadth of testing that isn't available anywhere else. See the table in
this readme [1] for the full list, as well as a bit more about the runners.

To address a bit of #3 above: these run entirely in Docker containers, and
I'd be more than happy to add other users' configs to this list.

I've also got a couple of Raspberry Pi machines running gcc + clang /
develop + master, but they are *very* slow (20hrs?). I've also got a
RISC-V SBC that I want to get this going on, but haven't found the time yet.
Architecture *shouldn't* matter for most of what we do, but there are a
couple edge cases where it can be useful.

I don't spend a lot of time on the care and feeding of this, so I would be
happy to keep it going in the future...but the downside of that is that
there have been (and I'm sure will continue to be) instances where things
break and go unnoticed for weeks. (I just noticed MSVC has been failing
all the develop runs for weeks, almost certainly a config issue on my end.)

All that said, I'm not sure where this fits into the picture going forward.

Andrey's points above (esp. 2 & 3) are very valid. I wouldn't depend on a CI
system like this in my day job.

There are plenty more issues with the system as well...specifically:
*   Tons of time wasted re-building things that haven't changed (sometimes
    for years!)
*   No history saved beyond the most recent build
*   Processing of results is a centralized point of failure

I'm happy to go with whatever the community is looking to do on this front.
Some options as I see it:

*  Keep the current test matrix as a complement to the various CI systems
   starting to roll out.
*  Merge my runner system in with the new C++ Alliance CI system.
*  Shut down the test matrix and move fully to C++ Alliance cloud CI.

Tom

[1] https://github.com/teeks99/boost-build/blob/master/Regression/README.md

Tom Kent