Boost tarball release that works with CMake
Dear developers,

For a CMake-based project we need to fetch the Boost sources as part of the build process (unless Boost is already installed, in which case that version can be used).

We are using CMake's FetchContent, which recursively clones the repository from GitHub, but that process is extremely slow. On the Windows runner in GitHub Actions it takes roughly 3 extra minutes to "git clone" Boost alone (or 11 minutes for something like "choco install boost-msvc-14.3"). I tried to fetch the .7z file instead, but that one doesn't work because the CMake support gets stripped from the distributed sources when the tarballs are generated.

A single file would result in a much faster and more reliable workflow: the file can be cached on GitHub Actions, on a local computer one doesn't need to repeatedly clone the repository (which also slows down other parts of CMake), etc.

I started the discussion here: https://github.com/boostorg/release-tools/pull/35 where it has been explained to me that:
- either the CMake support needs to be heavily refactored (unlikely to happen soon),
- or the project could generate one tarball using a layout that's consistent with the git repository.

The second option might eventually be fulfilled automatically by GitHub, but that is probably not likely to happen any time soon either.

I can imagine that creating one extra '.tar.xz' with a slightly different layout to support CMake builds might be straightforward to do.

Is there any chance to fulfill my dream of being able to build a Boost tarball with CMake out of the box?

Thank you very much,
Mojca
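PS: For illustration, the fetch logic in our CMakeLists.txt is roughly the following sketch (the version tag and option names are just an example, not our exact code):

    # Prefer an already-installed Boost; only fetch the sources if none is found.
    find_package(Boost 1.80 QUIET)
    if(NOT Boost_FOUND)
      include(FetchContent)
      FetchContent_Declare(boost
        GIT_REPOSITORY https://github.com/boostorg/boost.git
        GIT_TAG        boost-1.80.0
        GIT_SHALLOW    TRUE)   # the submodules are still cloned recursively, which is the slow part
      FetchContent_MakeAvailable(boost)
    endif()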
On Tue, Nov 15, 2022 at 9:03 PM Mojca Miklavec via Boost wrote:
Dear developers,
For a CMake-based project we need to fetch Boost sources as part of the build process (unless boost is already installed, in which case that version can be used).
We are using CMake's FetchContent that recursively clones the repository from GitHub, but that process is extremely slow. On the Windows runner in GitHub actions it takes roughly 3 extra minutes to "git clone" boost alone (or 11 minutes for something like "choco install boost-msvc-14.3"). I tried to fetch the .7z file instead, but that one doesn't work because the CMake support gets stripped from the distributed sources when generating the tarballs.
A single file would result in a much faster and more reliable workflow. The file can be cached on GitHub Actions, on a local computer one doesn't need to repeatedly clone the repository (which also slows down other parts of CMake) etc.
I started the discussion here: https://github.com/boostorg/release-tools/pull/35 where it has been explained to me that:
- either the CMake support needs to be heavily refactored (unlikely to happen soon)
- or the project could generate one tarball using the layout that's consistent with git repository
The second option might eventually be fulfilled automatically by GitHub, but that is probably not likely to happen any time soon either.
I can imagine that creating one extra '.tar.xz' with a slightly different layout to support CMake builds might be straightforward to make.
Is there any chance to fulfill my dream to be able to build a boost tarball with CMake out-of-the-box?
I can't answer the CMake question for you, but... I need to explain one issue that will hopefully dissuade you from using tarballs.

Many months ago we had the default Boost CI pull the regular tarballs to do testing. That had the interesting effect of creating a fair bit of downloads from the jFrog repository that holds those tarballs. Everything was great. CI would download Boost really fast. And get all of it without problems. Or so we thought.

One day everything stopped working with download errors. We contacted jFrog about it. They looked and saw that we had hit a data cap limit. They nicely raised the limit. Everything was great again. Some months later the same thing happened. With the same resolution. Then it happened again (I don't actually remember how many times we repeated that cycle).

At which point we concluded that approach was not viable, as we were using more bandwidth than the entirety of all the other jFrog downloads combined (this may be an exaggeration). At which point we started changing the various CI methods to selectively git clone Boost (there's this great tool that gets just the right projects you need). And we haven't had download problems since then.

In conclusion: please avoid downloading Boost tarballs from your CI processes. Thank you, in advance, for your consideration of the download bandwidth of our gracious release archive files provider, jFrog.

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supone Nada
-- Robot Dreams - http://robot-dreams.net
On 16/11/2022 16:22, René Ferdinand Rivera Morell wrote:
At which point we concluded that approach was not viable. As we were using more bandwidth than the entirety of all the other jFrog downloads combined (this may be an exaggeration). At which point we started changing the various CI methods to selectively git clone Boost (there's this great tool that gets just the right projects you need). And we haven't had download problems since then.

It seems likely that a full clean git clone (even at --depth 1) would consume a significantly larger amount of bandwidth than downloading tarballs would. But GitHub has no specific bandwidth limits other than "if we notice you, we might do something about it", so you may get away with things you shouldn't.
Either method could be improved with caching -- the tarball could be cached for all individual library CI jobs from the same commit, and one set of git clones could be reused similarly.

Git does win a bit more when only building a subset of libraries, or if the CI caches the clone and performs an incremental checkout rather than a full reclone (although that only provides a benefit when *not* using --depth 1, which is otherwise better). (These *could* be supported with tarballs too, but only with more difficulty and complex delta-file management, which isn't really worth the effort when an alternative exists.)

Whichever method is used, it should try to "play nice" and reuse as much as it can to avoid abusing cloud resources. (And this also improves speed, so it's good as a selfish goal too.) Of course, this sort of caching and reuse is very hard to achieve with a generic cloud CI rather than a custom host.
Hi,

See my previous email in the thread; I'm just briefly adding to some of the points.

On Wed, 16 Nov 2022 at 07:20, Gavin Lambert via Boost wrote:
It seems likely that a full clean git clone (even at --depth 1) would consume a significantly larger amount of bandwidth than downloading tarballs would.
Yes.
But github has no specific bandwidth limits other than "if we notice you, we might do something about it", so you may get away with things you shouldn't.
Wouldn't the same apply to releasing the tarball(s) on the GitHub releases page then?
Either method could be improved with caching -- the tarball could be cached for all individual library CI jobs from the same commit, and one set of git clones could be reused similarly.
While I have no problem caching the tarball, I have absolutely no idea how to do the same with git, even on my personal computer. There is a small hack called "FETCHCONTENT_UPDATES_DISCONNECTED" that helps a bit on my local machine, but it seems unlikely that it could be of any use inside a GitHub Actions workflow.
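For completeness, that hack is roughly the following (just a sketch of what I do locally):

    # Either pass -DFETCHCONTENT_UPDATES_DISCONNECTED=ON on the cmake command
    # line, or force the cache variable before the FetchContent calls; once the
    # sources have been populated, CMake then skips the git update step on a
    # plain reconfigure instead of contacting the remote every time.
    set(FETCHCONTENT_UPDATES_DISCONNECTED ON CACHE BOOL "" FORCE)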
Git does win a bit more when only building a subset of libraries, or if the CI caches the clone and performs an incremental checkout rather than a full reclone (although that only provides benefit when *not* using --depth 1, which is otherwise better). (These *could* be supported with tarballs too, but only with more difficulty and complex delta-file management, which isn't really worth the effort when an alternative exists.)
Please note that we have no reason to change the version between two releases (which only seem to happen about three times per year). OK, at the moment I'm eagerly waiting for 1.81.0 because of some blockers that have recently been fixed, but in general I don't care.

So the issue is not really "--depth 1" or incremental builds, but the missing logic in FetchContent that keeps refetching everything every time I touch any single CMakeLists.txt or do a simple rebase. And as I answered regarding the "subset of libraries": I need 71 of them to cater for the two or three that I actually need, which completely defeats the purpose of a modular fetch.
Whichever method is used, it should try to "play nice" and reuse as much as it can to avoid abusing cloud resources. (And this also improves speed, so is good as a selfish goal too.)
I could easily reuse the tarball, though, while reusing a git clone seems problematic in many ways.

Thank you for the super fast responses,
Mojca
On Wed, 16 Nov 2022 at 04:22, René Ferdinand Rivera Morell wrote:
I can't answer the CMake question for you, but... I need to explain one issue that will hopefully dissuade you from using tarballs. Many months ago we had the default Boost CI pull the regular tarballs to do testing. That had the interesting effect of creating a fair bit of downloads from the jFrog repository that holds those tarballs. Everything was great. CI would download Boost really fast. And get all of it without problems. Or so we thought. One day everything stopped working with download errors. We contacted jFrog about it. They looked and saw that we hit a data cap limit.
I'm pretty sure that we use waaaaaaaay more bandwidth with `git clone` than we would by caching the tarball, though: https://github.com/actions/cache

But if jFrog is the weak point and "GitHub just works": what about simply putting the tarball into the releases on GitHub (or at least this special one with CMake support)? Compare https://github.com/boostorg/boost/releases/tag/boost-1.80.0 with https://github.com/wxWidgets/wxWidgets/releases/tag/v3.2.1 - I really like the way wxWidgets does the releases. (And GitHub Actions would already "have all the build resources in house".)
At which point we started changing the various CI methods to selectively git clone Boost (there's this great tool that gets just the right projects you need). And we haven't had download problems since then.
There are a gazillion issues with "git clone" that I see:

* I really like the idea of selectively using just the modules I need. However, I only need "dll" and "uuid", and yet I need to download 71 submodules (maybe one or two fewer after our code cleanup, but at least when we needed "filesystem system date_time regex" we had to download all of those 71), which kind of defeats the whole purpose and doesn't save any bandwidth. See also https://github.com/boostorg/cmake/issues/26#issuecomment-1286919419

* GitHub is well optimized for downloading your own git sources. As an example, a GitHub action needs 2 seconds to clone it all, including all the large files. I tried doing the same on AWS CodeBuild and it takes 5 minutes. Every. Single. Job. That's for our own sources. I have absolutely no idea how to optimize that for something that gets fetched via FetchContent (other than by installing Boost with `apt install`, which is feasible for Linux, but not for Windows). I uploaded a custom Windows image for AWS CodeBuild that has Boost preinstalled, but AWS needs 15 minutes to load the Windows image, making the CI so slow that I find it useless. Fetching Boost via FetchContent takes roughly 2-3 extra minutes on GitHub Actions. Fetching 100 MB from cache, on the other hand, would be instant.

* CMake has a known issue that it keeps fetching everything from git "all the time" after you have already downloaded all the sources: https://gitlab.kitware.com/cmake/cmake/-/issues/21146

Mojca
Mojca Miklavec wrote:
Dear developers,
For a CMake-based project we need to fetch Boost sources as part of the build process (unless boost is already installed, in which case that version can be used).
We are using CMake's FetchContent that recursively clones the repository from GitHub, but that process is extremely slow. On the Windows runner in GitHub actions it takes roughly 3 extra minutes to "git clone" boost alone (or 11 minutes for something like "choco install boost-msvc-14.3"). I tried to fetch the .7z file instead, but that one doesn't work because the CMake support gets stripped from the distributed sources when generating the tarballs.
A single file would result in a much faster and more reliable workflow. The file can be cached on GitHub Actions, on a local computer one doesn't need to repeatedly clone the repository (which also slows down other parts of CMake) etc.
I started the discussion here: https://github.com/boostorg/release-tools/pull/35 where it has been explained to me that:
- either the CMake support needs to be heavily refactored (unlikely to happen soon)
- or the project could generate one tarball using the layout that's consistent with git repository
The second option might eventually be fulfilled automatically by GitHub, but that is probably not likely to happen any time soon either.
I can imagine that creating one extra '.tar.xz' with a slightly different layout to support CMake builds might be straightforward to make.
Is there any chance to fulfill my dream to be able to build a boost tarball with CMake out-of-the-box?
I have added a GitHub Action to the superproject that should generate tarballs with submodules on every tag: https://github.com/boostorg/boost/blob/master/.github/workflows/release.yml

We'll see how well this works once the beta is tagged. (We don't tag RCs.)
On 11/16/22 11:54, Peter Dimov via Boost wrote:
I have added a Github Action to the superproject that should generate tarballs with submodules on every tag:
https://github.com/boostorg/boost/blob/master/.github/workflows/release.yml
We'll see how well this works once the beta is tagged. (We don't tag RCs.)
I think you need to convert the line endings for the .zip archive. And .7z might be desired.
On 11/16/22 13:08, Andrey Semashev wrote:
On 11/16/22 11:54, Peter Dimov via Boost wrote:
I have added a Github Action to the superproject that should generate tarballs with submodules on every tag:
https://github.com/boostorg/boost/blob/master/.github/workflows/release.yml
We'll see how well this works once the beta is tagged. (We don't tag RCs.)
I think you need to convert the line endings for the .zip archive. And .7z might be desired.
As a possible solution, build .zip and .7z on a Windows CI job.
We'll see how well this works once the beta is tagged. (We don't tag RCs.)
I think you need to convert the line endings for the .zip archive. And .7z might be desired.
I enhanced that such that the zip file has DOS/Windows line endings: https://github.com/boostorg/boost/pull/712

I also removed ALL .git/.github files and folders and the CI metadata/configs to further slim it down; the current version would have e.g. ".git" files in each submodule.

If you agree with that, it would be great to merge it before the tagging.
On Wed, Nov 16, 2022 at 11:08 AM Andrey Semashev via Boost wrote:
On 11/16/22 11:54, Peter Dimov via Boost wrote:
I have added a Github Action to the superproject that should generate tarballs with submodules on every tag:
https://github.com/boostorg/boost/blob/master/.github/workflows/release.yml
We'll see how well this works once the beta is tagged. (We don't tag RCs.)
I think you need to convert the line endings for the .zip archive. And .7z might be desired.
Do normal line endings not work on Windows? ;) -- Olaf
On Wed, 16 Nov 2022 at 09:54, Peter Dimov via Boost wrote:
I have added a Github Action to the superproject that should generate tarballs with submodules on every tag:
https://github.com/boostorg/boost/blob/master/.github/workflows/release.yml
We'll see how well this works once the beta is tagged. (We don't tag RCs.)
This is AWESOME, thank you very much!

I tested it on my personal fork and it works really nicely. The fetch phase is now reduced from 3 minutes to a few seconds (well, I need an additional 40-90 seconds to extract the file, but just once anyway), the results can now be easily cached, and CMake doesn't keep re-fetching all the time.

The solution with additional files in the release section on GitHub sounds perfect to me (and you won't be putting any additional pressure on jFrog).

Mojca

PS: unrelated / off-topic; I also need something similar to the following patch to be able to build our sources with "treat warnings as errors" on Visual Studio (in case it's not too late for 1.81.0): https://github.com/boostorg/config/pull/456/files
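PPS: in case it helps anyone else, the FetchContent call can now simply point at the release asset, roughly like this (the asset URL below is only my guess; the actual file name depends on how the workflow names the archives):

    include(FetchContent)
    FetchContent_Declare(boost
      # hypothetical asset URL; adjust to the actual name of the released archive
      URL https://github.com/boostorg/boost/releases/download/boost-1.81.0/boost-1.81.0.tar.xz)
    FetchContent_MakeAvailable(boost)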
participants (7)
- Alexander Grund
- Andrey Semashev
- Gavin Lambert
- Mojca Miklavec
- Olaf van der Spek
- Peter Dimov
- René Ferdinand Rivera Morell