Boost CI Infrastructure

30 Apr 2023

The C++ Alliance has an interest in, and has made ongoing efforts
towards, ensuring the availability and quality of the Continuous
Integration systems that Boost uses.

If you recall, when Travis CI became a paid service we deployed our
own instances of Drone (https://drone.io). Sam submitted pull requests
to every Boost repository offering the option (but not the
requirement) to have builds take place on our Drone infrastructure.

Now I'm not sure if it could handle every single library doing builds
of 20+ targets but it helped some.

GitHub Actions became available, and there was the possibility to
split the load (or make a redundant script) of targets across both
Drone and GHA. However, we have noticed that GHA for the Boost GitHub
organization has poor turnaround because we have 160+ libraries that
are all submitting GHA jobs under the same user (the
https://github.com/boostorg Organization account).

When users submit GHA jobs for their libraries under their own account
(for example if I run a GHA CI job for Boost.URL using my fork at
https://github.com/vinniefalco/url) we have noticed that the
turnaround is much better, probably because the individual account is
not competing with 160+ other jobs.

The C++ Alliance upgraded its GHA account status for free, gaining
elevated access to resources as a certified non-profit organization.
We are in the process of getting the Boost GHA account upgraded as
well. However, since we still have 160+ libraries all competing for
resources, it might still not be enough.

Drone offers the possibility of self-hosted runners, but Alan ran into
a problem with Drone: its scripting language (Starlark), for lack of a
better term, "sucks balls." The Python version isn't much better. The
main problem is that it is very hard to write scripts which are
compositions of other scripts, first because of the object model of
the CI and second because of the security restrictions which attempt
to prevent abuse/spam.

Alan de Freitas has done extensive work writing scripts for both Drone
and GHA and has concluded that the GHA language and ecosystem are far
superior. There is a free marketplace for GHA actions where you can
just "include" a program for anything that you need to do:

https://github.com/marketplace?type=actions

It is very easy to compose new actions out of existing actions, and
having the CI dashboard integrated into the repository and pull
requests is definitely a plus.

The problem still remains that GHA is overloaded during peak times.
Sam Darwin (our CTO) is attempting to deploy a network of self-hosted
GHA runners for the Boost organization. These are costly and we would
like to only activate these runners when GitHub's shared runners are
over some threshold of load. Thus Sam is wrestling with various APIs
(and reaching out to Microsoft) to see how we can achieve this. The
problem is that the GHA workflow file has to specify which runner to
use (our own versus the shared ones), and this complicates things
because this file is committed to the repository.
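
To make the idea concrete, here is a minimal sketch in Python of the
decision we would like to automate. Everything in it is an assumption
for illustration: choose_runner, queued_jobs and threshold are
hypothetical names, and how to actually measure the load on GitHub's
shared runners is exactly the part that is still unsolved. The two
labels are the stock "ubuntu-latest" hosted label and the default
"self-hosted" label.

```python
def choose_runner(queued_jobs, threshold,
                  shared_label="ubuntu-latest",
                  self_hosted_label="self-hosted"):
    """Pick a runner label for the next job.

    queued_jobs is a hypothetical load measurement (jobs waiting on
    the shared runners); threshold is the point past which we are
    willing to pay for our own runners.
    """
    if queued_jobs > threshold:
        # Shared runners are overloaded: spill over to our own.
        return self_hosted_label
    # Normal load: stay on the free shared runners.
    return shared_label


# Hypothetical numbers, purely for illustration.
quiet = choose_runner(queued_jobs=10, threshold=50)
busy = choose_runner(queued_jobs=80, threshold=50)
```

The logic itself is trivial. The hard part is getting a trustworthy
load number and feeding the result into a file that lives in each
repository.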

I guess what I'm trying to say here is that we care deeply about long
turnaround times on CI ("they suck") and have been working on the
problem.

I have even been working on this myself since before The C++ Alliance
became involved, by trying to refactor my libraries to compile faster:
breaking out things common to tests into their own .lib or .o file so
that the same library sources are not compiled more than once,
reducing the amount of stuff in headers, avoiding overly fancy and
complex metaprogramming, and cutting down on the number of
instantiations of various function templates.

Alan and I have also put effort into making the most out of a fixed
number of CI targets. For example, by cleverly arranging the settings
on the various targets so that we get the most combinations of
toolchains and settings out of the fewest targets.
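
As a rough illustration of that kind of arrangement (not our actual
scripts), the Python sketch below rotates each setting's values across
the toolchains instead of building the full cross product. The
function name spread_settings, the toolchain names, and the setting
values are all made up for the example.

```python
from itertools import cycle


def spread_settings(toolchains, settings):
    """Build one CI target per toolchain, rotating through each
    setting's values so that every value lands on some target
    without building the full cross product."""
    rotations = {name: cycle(values) for name, values in settings.items()}
    targets = []
    for toolchain in toolchains:
        target = {"toolchain": toolchain}
        for name, rotation in rotations.items():
            # Take the next value of this setting for this target.
            target[name] = next(rotation)
        targets.append(target)
    return targets


# Hypothetical toolchains and settings.
toolchains = ["gcc-12", "clang-15", "msvc-14.3", "gcc-9"]
settings = {
    "cxxstd": ["11", "14", "17", "20"],
    "variant": ["debug", "release"],
    "address-model": ["64", "32"],
}
targets = spread_settings(toolchains, settings)
```

The full cross product here would be 4 x 4 x 2 x 2 = 64 targets; this
gives 4, and every value of every setting still gets built at least
once as long as there are at least as many toolchains as values in the
longest list. The trade-off is that many specific pairings are never
tested together (in this simple rotation "debug" always travels with
"64"), so the arrangement is worth revisiting when a bug slips through.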

Okay, tl;dr time.

If you are having two-hour CI turnarounds, it means quite frankly that
you have not put any effort into caring about your CI's workload. Just
like you can't expect to keep putting more and more shit into header
files and have compilations stay fast, you also cannot expect to
just load your CI up with everything and have it always be responsive.
You need to continually monitor and adjust your CI scripts to get the
most out of the finite CI resources, just like you have to continually
refactor and adjust your library to keep compile times down and to
keep the number of dependencies on additional Boost libraries down
(yes, this matters, Andrey).

We can throw more resources at the problem but C++ authors (especially
Boost ones) will always be able to write code and scripts that consume
more resources than what is available. We might add a section to the
"Contributor's Guide" outlining strategies for cutting down CI
workloads and turnaround times (Peter Turcan?).

Alan de Freitas has done a considerable amount of good work in this
area, so if you have specific questions, asking on the list or in the
Slack workspace is also a good choice (or you can ask me or Sam).

Thanks

Vinnie Falco
