Boost CI Infrastructure
The C++ Alliance has an ongoing interest in the availability and quality of the Continuous Integration systems that Boost uses, and has put continued effort into both. If you recall, when Travis went paid we deployed our own instances of Drone (https://drone.io). Sam submitted pull requests to every Boost repository offering the option (but not the requirement) to have builds take place on our Drone infrastructure. I'm not sure it could handle every single library doing builds of 20+ targets, but it helped some.

Then GitHub Actions (GHA) became available, and with it the possibility of splitting the load of targets (or running a redundant script) across both Drone and GHA. However, we have noticed that GHA for the Boost GitHub organization kind of sucks, because we have 160+ libraries that all submit GHA jobs under the same account (the https://github.com/boostorg organization account). When users submit GHA jobs for their libraries under their own account (for example, if I run a GHA CI job for Boost.URL using my fork at https://github.com/vinniefalco/url), the turnaround is much better, probably because the individual account is not competing with the jobs of 160+ other libraries.

As a certified non-profit organization, The C++ Alliance was able to upgrade its GHA account status for free, which gives it elevated access to resources. We are in the process of getting the Boost GHA account upgraded as well. However, since we still have 160+ libraries all competing for resources, it might still not be enough.

Drone offers the possibility of self-hosted runners, but Alan ran into a problem with Drone: its scripting language (Starlark), for lack of a better term, "sucks balls." The Python version isn't much better. The main problem is that it is very hard to write scripts which are compositions of other scripts: first because of the object model of the CI, and second because of the security restrictions which attempt to prevent abuse/spam.
Alan de Freitas has done extensive work writing scripts for both Drone and GHA and has concluded that the GHA language and ecosystem are far superior. There is a free marketplace for GHA actions where you can just "include" a program for anything that you need to do: https://github.com/marketplace?type=actions
It is very easy to compose new actions out of existing actions, and having the CI dashboard integrated into the repository and pull requests is definitely a plus.

The problem still remains that GHA is overloaded during peak times. Sam Darwin (our CTO) is attempting to deploy a network of self-hosted GHA runners for the Boost organization. These are costly, and we would like to activate them only when the load on GitHub's shared runners is over some threshold. So Sam is wrestling with various APIs (and reaching out to Microsoft) to see how we can achieve this. The difficulty is that the GHA workflow file has to specify which runner to use (our own versus the shared ones), and since this file is committed to the repository, that complicates things.

I guess what I'm trying to say here is that we care deeply about long turnaround times on CI ("they suck") and have been working on the problem. I was working on this myself even before The C++ Alliance became involved, by trying to refactor my libraries to compile faster: breaking out things common to tests into their own .lib or .o file so that the same library sources are not compiled more than once, reducing the amount of stuff in headers, avoiding overly fancy and complex metaprogramming, and cutting down on the number of instantiations of various function templates.

Alan and I have also put effort into making the most out of a fixed number of CI targets, for example by cleverly arranging the settings on the various targets so that we get the largest number of combinations of toolchains and settings out of the smallest number of targets.

Okay, tl;dr time.
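A minimal sketch of the "most combinations out of the fewest targets" idea mentioned above. This is not the actual script used for any Boost library; the option names and values are hypothetical, and the greedy pairwise-coverage approach is just one assumed way to arrange settings. Instead of running the full matrix, it picks targets until every pair of settings (say, clang together with C++20) shows up in at least one target:

```python
from itertools import combinations, product


def pairs_of(target):
    """Return every pair of (option, value) settings that one target exercises."""
    items = sorted(target.items())
    return set(combinations(items, 2))


def pick_targets(options):
    """Greedily pick targets until each pair of settings appears at least once.

    `options` maps an option name to its list of possible values.
    The result is a list of dicts, one per CI target.
    """
    names = sorted(options)
    # Every combination in the full matrix is a candidate target.
    candidates = [
        dict(zip(names, values))
        for values in product(*(options[n] for n in names))
    ]
    uncovered = set()
    for candidate in candidates:
        uncovered |= pairs_of(candidate)

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


# Hypothetical settings, for illustration only.
options = {
    "compiler": ["gcc", "clang", "msvc"],
    "cxxstd": ["11", "17", "20"],
    "variant": ["debug", "release"],
}

targets = pick_targets(options)
for target in targets:
    print(target)
print(len(targets), "targets instead of the full matrix of 18")
```

Pairwise coverage is a compromise: it will not catch a failure that needs three specific settings at once, but it keeps every two-way interaction tested with far fewer jobs than the full matrix.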
If you are having two-hour CI turnarounds, it means, quite frankly, that you have not put any effort into caring about your CI's workload. Just like you can't expect to keep putting more and more shit into header files and have compilations stay fast, you also cannot expect to just load your CI up with everything and have it always be responsive. You need to continually monitor and adjust your CI scripts to get the most out of the finite CI resources, just like you have to continually refactor and adjust your library to keep compile times down and to keep the number of dependencies on additional Boost libraries down (yes, this matters, Andrey).

We can throw more resources at the problem, but C++ authors (especially Boost ones) will always be able to write code and scripts that consume more resources than are available. Maybe we could have a section in the "Contributor's Guide" outlining strategies for cutting down CI workloads and turnaround times (Peter Turcan?). Alan de Freitas has done a considerable amount of good work in this area, so if you would like to ask specific questions, the list or the Slack workspace is a good choice (or you can ask me or Sam).

Thanks
Vinnie Falco