Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list. To summarize what I have repeated in the aforementioned channels:
- Yes, our downloads are not working again because the JFrog account is again not active.
- No, I cannot do anything about the JFrog account.
- Yes, we can update the website to point to downloads hosted elsewhere.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution. We are on borrowed time, but as far as I know we still have time. The deactivation of the JFrog account on December 31 was not expected, and it was reinstated.
Glen
Glen Fernandes wrote:
Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list.
To summarize what I have repeated in the aforementioned channels:
- Yes, our downloads are not working again because the JFrog account is again not active.
- No, I cannot do anything about the JFrog account.
- Yes, we can update the website to point to downloads hosted elsewhere.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution.
We are on borrowed time, but as far as I know we still have time. The deactivation of the JFrog account on December 31 was not expected, and it was reinstated.
One obvious alternative is Github Releases; this has been raised in the past, and Raffi Enficiaud even created a script to do it: https://github.com/boostorg/release-tools/pull/16 but for some reason, there was no interest in using this for releases. Ultimately, the release managers need to decide whether to pursue this as an option. (We already use Github Releases for the CMake archives, e.g. https://github.com/boostorg/boost/releases/tag/boost-1.84.0 )
On Sun, Jan 7, 2024 at 12:19 PM Peter Dimov wrote:
One obvious alternative is Github Releases; this has been raised in the past, and Raffi Enficiaud even created a script to do it: https://github.com/boostorg/release-tools/pull/16 but for some reason, there was no interest in using this for releases.
Ultimately, the release managers need to decide whether to pursue this as an option.
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for. Glen
On Sun, Jan 7, 2024 at 9:23 AM Glen Fernandes via Boost <boost@lists.boost.org> wrote:
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for.
What would the file sizes look like if the release came out as two separate archives: one with the complete HTML documentation, and the other with the source code? Thanks
Glen Fernandes wrote:
On Sun, Jan 7, 2024 at 12:19 PM Peter Dimov wrote:
One obvious alternative is Github Releases; this has been raised in the past, and Raffi Enficiaud even created a script to do it: https://github.com/boostorg/release-tools/pull/16 but for some reason, there was no interest in using this for releases.
Ultimately, the release managers need to decide whether to pursue this as an option.
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for.
https://docs.github.com/en/repositories/releasing-projects-on-github/about-r... says "Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage." The currently hosted archives are comparable in size with the official releases. The official boost_1_84_0.7z is 106 MB, and the corresponding CMake archive is 90.1 MB.
On Sun, Jan 7, 2024 at 12:32 PM Peter Dimov wrote:
Glen Fernandes wrote:
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for.
https://docs.github.com/en/repositories/releasing-projects-on-github/about-r...
says
"Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage."
The currently hosted archives are comparable in size with the official releases.
The official boost_1_84_0.7z is 106 MB, and the corresponding CMake archive is 90.1 MB.
In other words, as long as the GitHub release can be made from our existing repository contents, we should be fine? i.e. We cannot put our current official built releases into a GitHub repository because any file over 100 MB would be rejected: https://docs.github.com/en/repositories/working-with-files/managing-large-fi... "GitHub blocks files larger than 100 MiB. To track files beyond this limit, you must use Git Large File Storage (Git LFS)." Glen
Glen Fernandes wrote:
On Sun, Jan 7, 2024 at 12:32 PM Peter Dimov wrote:
Glen Fernandes wrote:
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for.
https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases
says
"Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage."
The currently hosted archives are comparable in size with the official releases.
The official boost_1_84_0.7z is 106 MB, and the corresponding CMake archive is 90.1 MB.
In other words, as long as the GitHub release can be made from our existing repository contents, we should be fine?
i.e. We cannot put our current official built releases into a GitHub repository because any file over 100 MB would be rejected:
https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github
"GitHub blocks files larger than 100 MiB. To track files beyond this limit, you must use Git Large File Storage (Git LFS)."
https://github.com/boostorg/boost/releases/download/boost-1.84.0/boost-1.84.... is 149 MB. The above probably refers to putting large files in a repository, not to release artifacts.
On Sun, Jan 7, 2024 at 1:15 PM Peter Dimov wrote:
On Sun, Jan 7, 2024 at 12:32 PM Peter Dimov wrote:
Glen Fernandes wrote:
If we change what goes into the distribution, this is an option. As far as I was told, at our current distribution size, this would require LFS which GitHub would charge us for.
https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases
says
"Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage."
The currently hosted archives are comparable in size with the official releases.
The official boost_1_84_0.7z is 106 MB, and the corresponding CMake archive is 90.1 MB.
In other words, as long as the GitHub release can be made from our existing repository contents, we should be fine?
i.e. We cannot put our current official built releases into a GitHub repository because any file over 100 MB would be rejected:
https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github
"GitHub blocks files larger than 100 MiB. To track files beyond this limit, you must use Git Large File Storage (Git LFS)."
https://github.com/boostorg/boost/releases/download/boost-1.84.0/boost-1.84....
is 149 MB.
The above probably refers to putting large files in a repository, not to release artifacts.
Thanks; then leveraging GitHub releases seems like the best solution to me. Maybe the only (free) solution we have. Glen
Thanks; then leveraging GitHub releases seems like the best solution to me. Maybe the only (free) solution we have.
Glen,
While GitHub Releases seems like the best plan, here are a few other details. JFrog is hosting:
1. development snapshots
2. betas and release candidates
3. releases
The GitHub Releases paradigm is well designed for 3, and possibly 2. However, it's not designed for development snapshots and would really be forcing that functionality.
- Developers visit the GitHub releases page, and hopefully the top listing would be the latest actual release. If the main listing is a 'develop' snapshot and a 'master' snapshot, that could be confusing, since it isn't an official release.
- A release is tied to a git tag. While tags can be changed, they are usually expected to be immutable. Publishing a dev snapshot multiple times per day would constantly rewrite the so-called 'snapshot' git tag, or fill up the page with 1000 releases. No.
An idea could be to publish official releases on GitHub, and continue to host development snapshots on JFrog. Bandwidth to the snapshots is probably low enough that a CDN isn't necessary; however, if the pricing is favorable, certainly add the CDN. Directly assume billing for both JFrog and the CDN, so they wouldn't be shut down. Even the official releases could continue to be uploaded to JFrog as a backup storage location, if authentication is enabled on those specific files. With authentication, internal scripts would have access, while the general public would be prevented from downloading releases from JFrog in certain folders.
https://github.com/boostorg/release-tools already has support for GitHub Releases from years ago. It should be reviewed. However, this goes back to the topic of development snapshots: I would not recommend enabling that for daily snapshots; only using github_releases.py, every 3-4 months, for the official releases.
Notice the GitHub CLI 'gh' and its 'gh release' sub-command. It makes sense to leverage the gh CLI as much as possible; the project has 34k stars and 7000 commits.
On 1/8/24 16:09, Sam Darwin via Boost wrote:
Thanks; then leveraging GitHub releases seems like the best solution to me. Maybe the only (free) solution we have.
Glen, While GitHub Releases seems like the best plan, here are a few other details. JFrog is hosting:
1. development snapshots
2. betas and release candidates
3. releases
The GitHub Releases paradigm is well designed for 3, and possibly 2. However it's not designed for "development snapshots" and would really be forcing that functionality.
What's the purpose of development snapshots? Can we drop them in favor of git checkout?
Cloudflare
2.8 Limitation on Serving Non-HTML Content The Services are offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as part of a Paid Service purchased by you, you agree to use the Services solely for the purpose of (i) serving web pages
What's the purpose of development snapshots?
- The existing scripts that publish official releases do that by using the development snapshots, renaming the latest snapshot, which was built using CircleCI.
- Constantly validating the full build process through CircleCI.
- 'develop' and 'master' docs are sourced from the snapshots. Notice the 'develop' in this URL: https://www.boost.org/doc/libs/develop/tools/boostdep/doc/html/
- A 'git checkout' would still need to compile the HTML files. After that has been done, it's convenient to zip the results and upload them to the web server.
On 1/8/24 18:53, Sam Darwin wrote:
Cloudflare
2.8 Limitation on Serving Non-HTML Content The Services are offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as part of a Paid Service purchased by you, you agree to use the Services solely for the purpose of (i) serving web pages
What's the purpose of development snapshots?
- The existing scripts that publish official releases do that by using the development snapshots, renaming the latest snapshot, which was built using CircleCI.
- Constantly validating the full build process through CircleCI.
- 'develop' and 'master' docs are sourced from the snapshots. Notice the 'develop' in this URL: https://www.boost.org/doc/libs/develop/tools/boostdep/doc/html/
- A 'git checkout' would still need to compile the HTML files. After that has been done, it's convenient to zip the results and upload them to the web server.
Boost's internal processes (e.g. that the website uses snapshots to publish docs from develop and master) can be changed, if needed. That is, if snapshots are problematic to serve, and their sole user is Boost itself, we should be able to do without them. For example, update the website from CI itself. I was more interested in whether the actual users need them.
On Mon, Jan 8, 2024 at 8:11 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:
I was more interested in whether the actual users need them.
The thing about the snapshots is that they are not necessarily in continual demand, but when you do need one, you are very glad it is there, because the alternative is to waste a lot of time duplicating the release process. Thanks
Sam Darwin wrote:
Cloudflare
2.8 Limitation on Serving Non-HTML Content The Services are offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as part of a Paid Service purchased by you, you agree to use the Services solely for the purpose of (i) serving web pages
https://blog.cloudflare.com/updated-tos/
"Goodbye, section 2.8 and hello to Cloudflare's new terms of service" (16/05/2023)
"Cloudflare's network became larger and more robust and its portfolio broadened to include services like Stream, Images, and R2. These services are explicitly designed to allow customers to serve non-HTML content like video, images, and other large files hosted directly by Cloudflare. ... we made it clear that customers can serve video and other large files using the CDN so long as that content is hosted by a Cloudflare service like Stream, Images, or R2."
Phil.
On Mon, Jan 8, 2024 at 7:44 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:
What's the purpose of development snapshots? Can we drop them in favor of git checkout?
The development snapshot is effectively a release. It has all the headers collated into the proper root include directory. It has all the sources in the right place for building each lib. It has the rendered HTML. The snapshot allows anyone to efficiently test whether or not an issue is resolved in the tip of a branch for example. "git checkout" doesn't provide anything remotely resembling a proper release. To function as a drop-in replacement for an installed official release of Boost would require a lot of post-processing after checking out the superproject using git. Thanks
On 1/8/24 18:55, Vinnie Falco wrote:
On Mon, Jan 8, 2024 at 7:44 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:
What's the purpose of development snapshots? Can we drop them in favor of git checkout?
The development snapshot is effectively a release. It has all the headers collated into the proper root include directory. It has all the sources in the right place for building each lib. It has the rendered HTML. The snapshot allows anyone to efficiently test whether or not an issue is resolved in the tip of a branch for example.
He could also test a git checkout.
"git checkout" doesn't provide anything remotely resembling a proper release. To function as a drop-in replacement for an installed official release of Boost would require a lot of post-processing after checking out the superproject using git.
Is this functionality actually in high demand by users? I would imagine, the majority of users either consume Boost releases (or betas or RCs), or use Boost from git.
Andrey Semashev wrote:
Is this functionality actually in high demand by users? I would imagine, the majority of users either consume Boost releases (or betas or RCs), or use Boost from git.
When we ask people to test something not yet released, having a master snapshot to download makes it much easier for them, because they can use their existing workflow for integrating Boost.
On 08.01.24 14:09, Sam Darwin via Boost wrote:
While GitHub Releases seems like the best plan, here are a few other details. JFrog is hosting:
1. development snapshots
2. betas and release candidates
3. releases
The GitHub Releases paradigm is well designed for 3, and possibly 2. However, it's not designed for development snapshots and would really be forcing that functionality.
- Developers visit the GitHub releases page, and hopefully the top listing would be the latest actual release. If the main listing is a 'develop' snapshot and a 'master' snapshot, that could be confusing, since it isn't an official release.
- A release is tied to a git tag. While tags can be changed, they are usually expected to be immutable. Publishing a dev snapshot multiple times per day would constantly rewrite the so-called 'snapshot' git tag, or fill up the page with 1000 releases. No.
Possible workaround: put the dev snapshots in a separate GitHub project that only exists to host development snapshots. Let it fill up with thousands of releases. If we hit some kind of GitHub limit, delete the whole project and start a new one. -- Rainer Deyke (rainerd@eldwood.com)
On 1/7/24 19:48, Glen Fernandes via Boost wrote:
Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list.
To summarize what I have repeated in the aforementioned channels:
- Yes, our downloads are not working again because the JFrog account is again not active.
- No, I cannot do anything about the JFrog account.
- Yes, we can update the website to point to downloads hosted elsewhere.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution.
We are on borrowed time, but as far as I know we still have time. The deactivation of the JFrog account on December 31 was not expected, and it was reinstated.
Are the source uploads to SourceForge[1] still considered "official"? If yes, perhaps we could publish those links as mirrors. https://sourceforge.net/projects/boost/files/boost/
On Sun, Jan 7, 2024 at 12:24 PM Andrey Semashev wrote:
Are the source uploads to SourceForge[1] still considered "official"? If yes, perhaps we could publish those links as mirrors. https://sourceforge.net/projects/boost/files/boost/
We don't consider them official. Even though SourceForge management has changed (I'm told) since the time they were caught injecting their own adware into installers, their reputation hasn't recovered. Many of the vendors that ship Boost would refuse to obtain it from SourceForge URLs. Glen
Glen Fernandes wrote:
Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list.
To summarize what I have repeated in the aforementioned channels:
- Yes, our downloads are not working again because the JFrog account is again not active.
- No, I cannot do anything about the JFrog account.
- Yes, we can update the website to point to downloads hosted elsewhere.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution.
We are on borrowed time, but as far as I know we still have time. The deactivation of the JFrog account on December 31 was not expected, and it was reinstated.
Hi Glen,
I have recently been using Cloudflare R2 to serve low-cost high-bandwidth downloads. https://developers.cloudflare.com/r2/
Pricing is:
* Domain registration (required; for non-obvious reasons you need to use a new domain hosted by them): approx $10/year.
* Storage: $0.015 per GB-month.
* HTTP GET operations: $0.36 per million operations.
* Egress bandwidth: free!
With smaller providers, I would assume that "free bandwidth" probably means "free until we decide you're using too much". But in the case of Cloudflare, my impression is that they are huge enough that even Boost's 200 TB/month might be only a drop in the ocean. If any other readers have experience or opinions about that, I'd be interested to hear from you!
Regards, Phil.
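As a rough worked example using the rates quoted above (the storage and traffic figures are hypothetical): storing 500 GB of release archives would cost 500 x $0.015 = $7.50/month, and 10 million GET operations would cost 10 x $0.36 = $3.60/month, with the egress bandwidth itself free.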
On Mon, Jan 8, 2024 at 10:33 AM Phil Endecott wrote:
Hi Glen,
I have recently been using Cloudflare R2 to serve low-cost high-bandwidth downloads. https://developers.cloudflare.com/r2/
Pricing is:
* Domain registration (required; for non-obvious reasons you need to use a new domain hosted by them): approx $10/year.
* Storage: $0.015 per GB-month.
* HTTP GET operations: $0.36 per million operations.
* Egress Bandwidth: free!
With smaller providers, I would assume that "free bandwidth" probably means "free until we decide you're using too much". But in the case of Cloudflare, my impression is that they are huge enough that even Boost's 200 TB/month might be only a drop in the ocean. If any other readers have experience or opinions about that, I'd be interested to hear from you!
Thanks Phil. When I had looked into Cloudflare the estimate was more expensive than Fastly, but maybe I missed something. I'll double check with them to be sure. Glen
On 07/01/2024 17:48, Glen Fernandes via Boost wrote:
Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list.
To summarize what I have repeated in the aforementioned channels:
- Yes, our downloads are not working again because the JFrog account is again not active.
- No, I cannot do anything about the JFrog account.
- Yes, we can update the website to point to downloads hosted elsewhere.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution.
We are on borrowed time, but as far as I know we still have time. The deactivation of the JFrog account on December 31 was not expected, and it was reinstated.
Glen
Hi Glen,
Maybe this issue has already been discussed in the Boost Foundation. According to the minutes, some cost reduction efforts were discussed in the past (mailing list, server costs...). This mainly looks like a cost issue to me (if I understand your description correctly, it's not a storage problem but a network bandwidth problem).
Binary download is the primary Boost distribution method; maybe we should try to get some funding or sponsorship through the Boost Foundation. Does it make sense?
Best, Ion
On Monday, January 8, 2024, Ion Gaztañaga wrote:
Hi Glen,
Maybe this issue has already been discussed in the Boost Foundation. According to the minutes, some cost reduction efforts were discussed in the past (mailing list, server costs...). This mainly looks like a cost issue to me (if I understand your description correctly, it's not a storage problem but a network bandwidth problem).
Binary download is the primary Boost distribution method, maybe we should try to get some funding or sponsorship through the Boost Foundation. Does it make sense?
Hi Ion, Yes, we have been negotiating for an affordable rate, the most competitive of which we will have a formal quote for on Wednesday (01/10). A free solution or free fallback is still preferable even if it only covers releases, RCs, and betas (and not development snapshots etc.). Yes, binary download here being our tarballs hosted on JFrog currently (not the static libraries built by Tom). Glen
On 08/01/2024 23:33, Glen Fernandes via Boost wrote:
Hi Ion,
Yes, we have been negotiating for an affordable rate, the most competitive of which we will have a formal quote for on Wednesday (01/10).
A free solution or free fallback is still preferable even if it only covers releases, RCs, and betas (and not development snapshots etc.).
Yes, binary download here being our tarballs hosted on JFrog currently (not the static libraries built by Tom).
Glen
Thanks Glen,
Let's hope the formal quote arrives soon.
Best, Ion
On 1/7/24 8:48 AM, Glen Fernandes via Boost wrote:
Rather than emails and Slack DMs, I would prefer we have this discussion on the mailing list.
It is correct that JFrog does not charge us yet, but our traffic has increased from 60TB/month to almost 200TB/month, which is no longer supportable for free. We have limited time to find a solution.
Perhaps it's time to "get serious" about Boost "Modularization". Basically this would mean that users download just the libraries (and their dependencies) they actually intend to use. Of course this would be a big project. But we've been working hard to try to move in this direction. I would envision:
a) a user interested in Boost downloads and locally tests Boost "core"
b) for each library that a user is immediately interested in: downloads, builds, and tests the library (and its dependencies)
c) as time moves on, users could update, replace, or delete their set of libraries.
This would in practice eliminate the concept of Boost version 1.84 etc. and replace it with Boost Serialization library version 1, ... Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other Boost libraries, the standard library, or ?).
The fact that we can't do so now is a symptom that our development practices need work.
Robert Ramey
> Perhaps it's time to "get serious" about Boost "Modularization". Basically this would mean that users download just the libraries (and their dependencies) they actually intend to use. Of course this would be a big project. But we've been working hard to try to move in this direction. I would envision:
> a) a user interested in Boost downloads and locally tests Boost "core"
> b) for each library that a user is immediately interested in: downloads, builds, and tests the library (and its dependencies)
> c) as time moves on, users could update, replace, or delete their set of libraries.
> This would in practice eliminate the concept of Boost version 1.84 etc. and replace it with Boost Serialization library version 1, ... Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other Boost libraries, the standard library, or ?).

While I agree that this modularization would be great and helpful, especially for package managers (and their maintainers), I don't think distributing Boost libs as a loose collection of libraries is good for end users/developers:
- Boost libs do and should depend on other Boost libs, especially for compatibility, bug fixes, improvements, etc. over the stdlib/compiler.
- Dependencies of Boost libraries are not obvious, especially the transitive ones, making it hard to keep a working configuration.
- Dependencies might change with or without notice.
- It could make users believe they can mix and match Boost libraries of different versions, while we do (and possibly can) only test a single configuration, i.e. the current state of master/develop/tagX of ALL libraries at that state.
- Although that "monolith" thing is a common complaint, it also has an advantage: once you have Boost already downloaded and set up, you have something close to an "extended standard library" and are encouraged to look whether what you need is already in one of the Boost libs you have available instead of rolling your own. Having to go through the trouble of getting yet another set of dependent libraries is likely off-putting.
Especially for the "with some explicit dependencies (on other Boost libraries)" part: the experience of trying to have a CMakeLists (for CI testing) listing all direct and transitive dependencies of a Boost library "at the bottom" has shown that this is quite fragile and has led to trouble before. Pushing that onto users isn't an idea I'm very fond of. And as a Boost maintainer I don't really want to care whether some other Boost lib did or did not add or remove a dependency on another Boost lib. So having that list of "explicit dependencies" will be hard to keep up to date and valid at all times. Currently we can get away with saying "Boost is the dependency of Boost". Yes, the dependency tool (e.g. used in CI) does a good job already, and we could leverage it to create that list. But the list will keep changing and would need to be much more reliable, which means maintainers need to do more work getting their (direct) dependencies always(!) right and in the format that tool understands.
However, I do like the approach we currently have in CMake (and likely soon in B2) of not requiring a single "include/boost" folder but having the build system figure out (transitive) dependencies and required paths.
Alex
On Tue, Jan 9, 2024 at 11:49 AM Robert Ramey via Boost <boost@lists.boost.org> wrote:
Perhaps it's time to "get serious" about Boost "Modularization".
This statement ignores all the work that has been done and continues to be done in terms of making Boost modular.
Basically this would mean that users download just the libraries (and their dependencies) they actually intend to use. Of course this would be a big project. But we've been working hard to try to move in this direction.
Yes, we have "been serious." (And note when I say "we" I exclude myself, as I have just been busy cranking away at producing individual libraries and supporting Boost Libraries infrastructure).
This would in practice eliminate the concept of Boost version 1.84 etc... and replace with Boost Serialization library version 1, ...
Grouping all the libraries together into a single release version (e.g. 1.85.0), which is all tested against each other as a unit, is the only sane development model. Otherwise we run into the combinatorial explosion of questions like "what version works with what." And furthermore, "eliminating the concept of Boost version ${X}" pushes more testing and documentation work (to explain what versions work with what) onto each individual author instead of centralizing that effort into the release process. I don't like this at all.
Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other Boost libraries, the standard library, or ?).
Boost is already a "group of libraries with some explicit dependencies." It just so happens that they are bundled together into one archive. Whatever it is that you are proposing would be in addition to and not in lieu of what we have.
The fact that we can't do so now is a symptom that our development practices need work.
I think this is overlooking the fact that the Boost release process *works well* right now. Three releases every year like clockwork, and they are pretty high quality in terms of having minimal inter-library defects. Thanks
--
Regards, Vinnie
Follow me on GitHub: https://github.com/vinniefalco
On 1/10/24 6:38 AM, Vinnie Falco via Boost wrote:
On Tue, Jan 9, 2024 at 11:49 AM Robert Ramey via Boost <boost@lists.boost.org> wrote:
Perhaps it's time to "get serious" about Boost "Modularization".
This statement ignores all the work that has been done and continues to be done in terms of making Boost modular.
Just the opposite. It acknowledges the work done and implicitly laments that it has failed to arrive at its logical conclusion. What is the purpose of investing the effort into "Boost Modularization" if it's not this?
Basically this would mean that users download just the libraries (and their dependencies) they actually intend to use. Of course this would be a big project. But we've been working hard to try to move in this direction.
Yes, we have "been serious." (And note when I say "we" I exclude myself, as I have just been busy cranking away at producing individual libraries and supporting Boost Libraries infrastructure).
This would in practice eliminate the concept of Boost version 1.84 etc... and replace with Boost Serialization library version 1, ...
Grouping all the libraries together into a single release version (e.g. 1.85.0) which is all tested against each other as a unit, is the only sane development model. Otherwise we run into the combinatorial explosion of questions like "what version works with what."
Actually, I believe this statement is exactly wrong.
a) Treating Boost "as a unit" and testing on this basis results in an amount of work which increases with the square of the number of libraries.
b) Modifying libraries to pass tests in other libraries makes one library dependent on another, which might not be obvious. Libraries should be tested individually as a unit to prove that the implementation of the library faithfully implements its exported interface.
c) If this is done, it is guaranteed that errors cannot be introduced when libraries are composed.
d) Library interfaces, including type requirements, should be formally documented, and tests should certify that the implementation is consistent with the interface.
e) If the above is done, the amount of testing will increase only linearly with the number of libraries.
f) Adding one more library should not provoke any new errors.
g) Introducing an error into a library should be detected during unit testing, and if in spite of this it goes undetected, it will only affect other libraries which import the header from the erroneous library.
h) Library authors should strive to detect at compile time (preferably) or runtime which interface requirements are not met. Boost testing has special facilities ("test compile fail") specifically to facilitate that.
i) The need to "test all the libraries together" is a red flag/code smell.
In any case, users should be able to download any number and/or combination of libraries (along with their dependencies) and use just that. This will avoid making users' applications more complicated than they already are.
And furthermore, "eliminating the concept of Boost version ${X}" pushes more testing and documentation work (to explain what versions work with what) onto each individual author instead of centralizing that effort into the release process. I don't like this at all.
Right - Wrong! If we need to specify that information, we've already made a mistake. No one can keep all those combinations in their head. If some library/app depends on some other library subject to some dependency on compiler level, etc., and that requirement is unfulfilled, it should result in a compile time error. Our config library - a masterpiece by John Maddock - is designed to address this very problem and does so very well.
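A minimal sketch of the kind of compile-time guard being described, using the real BOOST_VERSION macro from <boost/version.hpp> (the required version number here is only illustrative):

#include <boost/version.hpp>

// BOOST_VERSION encodes the release as major * 100000 + minor * 100 + patch,
// so 107400 corresponds to Boost 1.74.0.
#if BOOST_VERSION < 107400
#error "This library requires Boost 1.74.0 or newer"
#endif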
Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other Boost libraries, the standard library, or ?).
It might. I can't predict the future. Nothing in this proposition precludes a "complete release". But it should be an option - not a requirement.
Boost is already a "group of libraries with some explicit dependencies." It just so happens that they are bundled together into one archive. Whatever it is that you are proposing would be in addition to and not in lieu of what we have.
Sure. But if we finished the "boost modularization", the "global" release would be nothing more than the union of the individual ones, and guaranteed to be correct.
The fact that we can't do so now is a symptom that our development practices need work.
I think this is overlooking the fact that the Boost release process *works well* right now. Three releases every year like clockwork and they are pretty high quality in terms of having minimal inter-library defects.
I don't dispute this. But it doesn't scale and can never scale. That's what started this discussion in the first place. I'm aware that significant effort has been invested into the "boost modularization" effort. I have a couple of questions about this effort:
a) what is the point of this "modularization" effort if not this?
b) what is the desired/expected benefit of this effort if not this?
c) when will we know it's done?
d) how will we know whether or not it's been successful?
Robert Ramey
On 1/11/24 9:26 PM, Alan de Freitas via Boost wrote:
a) Treating Boost "as a unit" and testing on this basis results in an amount of work which increases with the square of the number of libraries.
Sorry. Why exactly the square of the number of libraries?
Suppose you've got one library with 10 cases you want to test and each test takes 1 second to run. Now suppose you've got 2 libraries, each with 10 cases, and you're concerned about one library provoking a failure in the other. Then for each test in the first library, there might be 10 conditions in the second library which you would want to test against, etc.
Actually, a better analysis might conclude that the number of possible cross failure modes grows with the number of subsets of libraries: n + n*(n-1)/2 + n*(n-1)*(n-2)/6 + n*(n-1)*(n-2)*(n-3)/24 + ... Of course it's a crude measure (and argument), but it illustrates that if you're trying to test the cross impacts of libraries, the number of possible failure modes increases disproportionately to the number of libraries to be tested. Actually, when we think we're "cross testing" we're really not, because we aren't really writing tests to consider these kinds of failures. So the whole idea of thinking that we're actually testing anything when we test all at once is very misleading.
A related situation occurs when making a scientific experiment. Typically such an experiment has a control case and a test case which varies from the control case in only one variable. So if the two cases produce different results, we know that that one variable is the source of the difference. Trying to test "all at once" is exactly the opposite of the scientific method.
The whole idea of unit testing is an attempt to make our testing more useful and scientific. In the "old days", we would write the whole program from start to finish before we did any testing. This is comparable to the "cross testing" argument from earlier in this post. This wasn't called "testing"; it was called "debugging". It proved to be a very inefficient and time-consuming operation. In reference to the above, consider how much more time it takes to "debug" the whole program as opposed to testing each function/type individually.
As yet another aside, I worked for years as a freelance developer/consultant. I only got called when things were stuck and they needed someone to take the blame and they had no other choice. Part of this was likely due to my annoying and pedantic personality. I have never had a customer who ever wrote unit tests. When I asked why, the answer was always "we haven't got time". Historically, the idea of unit testing only really became a "thing" around the year 2000. Imagine - 30-40 years of software development with the build and crash method.
Another historical note that I believe I'm repeating correctly: when the first stored program computer was fired up, they tried a program like factoring a number or something. They (including John von Neumann) were astonished that it didn't work the first time!!! Given the mindset of my colleagues, this doesn't amaze me. Another interesting note from the past is that up until ~1960 programmers were almost all female. It didn't take long (~10 years) before most of them were men. I have no idea why this is/was. Make of this whatever you want. I'm sure someone will have a theory.
Robert Ramey
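As a quick worked illustration of that growth (the library counts are only illustrative), looking at the pairwise term alone: 10 libraries give 10 x 9 / 2 = 45 pairs to consider, 100 libraries give 100 x 99 / 2 = 4,950, and 150 libraries give 150 x 149 / 2 = 11,175 - before triples and larger subsets are even counted.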
On 12.01.24 03:16, Robert Ramey via Boost wrote:
a) Treating Boost "as a unit" and testing on this basis results in an amount of work which increases with the square of the number of libraries.
How does introducing another dimension (versions of other libraries) help in this regard? Now you don't only have work of num_libs^2 but also times the number of versions of each library.
b) Modifying libraries to pass tests in other libraries makes one library dependent on another, which might not be obvious. Libraries should be tested individually as a unit to prove that the implementation of the library faithfully implements its exported interface.
Don't we do that (testing libraries individually) already? What else is the purpose of each library's "test" folder? Where does "modifying libraries to pass tests in other libraries" happen? So far I have only observed this when a consuming library exposed a bug in the consumed library, which is totally fine, isn't it? So we actually gain something by testing the whole: not only do we have the unit tests of the library, but the unit tests of the consuming library act as an integration test of the consumed one, increasing test coverage. We "eat our own food", so to say.
In any case, users should be able to download any number and/or combination of libraries (along with their dependencies) and use just that. This will avoid making users' applications more complicated than they already are.
But interfaces do change. See below.
If some library/app depends on some other library subject to some dependency on compiler level, etc., and that requirement is unfulfilled, it should result in a compile time error. Our config library - a masterpiece by John Maddock - is designed to address this very problem and does so very well.
Well, that is a great example of why mixing Boost library versions does not work: Boost.Config has a growing list of macros such as `BOOST_NO_FOO`, and most Boost libraries use them to enable, disable, or change features. If a user now uses a newer Boost.X with an older Boost.Config where the relevant macro didn't exist at all (yet), then Boost.X will fail to compile or run into known bugs at runtime (e.g. when workarounds were implemented depending on whether that defect exists, i.e. whether that macro is defined). Our current CI (and release process) tests each Boost library using a specific (minimum, in the case of CI) version of the other Boost libraries it depends on.
I think this is overlooking the fact that the Boost release process *works well* right now. Three releases every year like clockwork and they are pretty high quality in terms of having minimal inter-library defects.
I don't dispute this. But - it doesn't scale and can never scale. That's what started this discussion in the first place.
But if we finished the "boost modularization", the "global" release would be nothing more than the union of the individual ones, and guaranteed to be correct.
What exactly doesn't scale? The goal of the "modularization" should be to be able to consume a Boost release piecewise, and it looks like this works quite well. Checking the package manager in Ubuntu, I see libboost-regex1.74.0, libboost-thread1.74.0, libboost-filesystem1.74.0, etc., i.e. individual libraries of a single Boost release. If that is what you wanted, then I totally agree. But we already have that, don't we? And if someone doesn't want to download the whole "global release" tarball, they can download the individual libraries from the repos at GitHub using the same tag, as only those of the same tag are "guaranteed to be correct" (as far as possible). The only issue left I see is that all Boost headers need to be in the same include folder. The CMake build already has that solved, and AFAIK B2 will soon follow, if it didn't already.
Alex
On 1/12/24 05:16, Robert Ramey via Boost wrote:
a) Treating Boost "as a unit" and testing on this basis results in an amount of work which increases with the square of the number of libraries.
b) Modifying libraries to pass tests in other libraries makes one library dependent on another, which might not be obvious. Libraries should be tested individually as a unit to prove that the implementation of the library faithfully implements its exported interface.
c) If this is done, it is guaranteed that errors cannot be introduced when libraries are composed.
No, it doesn't. Even disregarding that you can't reasonably test everything, you're forgetting that libraries change (yes, including the public interface) and often affect each other through their usage, sometimes in non-obvious ways.
Suppose a library A provides a unique_ptr implementation (such as the one in Boost.Move) that supports a custom deleter. That library may test that unique_ptr does use the deleter as intended. But that doesn't guarantee that this will still work in another library B that uses the unique_ptr with its custom deleter - for example, because that deleter is defined in B's namespace, and since it is specified in unique_ptr's template parameters, it now affects ADL.
Integration testing exists for a reason. If you're not doing integration testing, you're getting a bunch of disparate components that don't compose well or at all.
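A compressed sketch of the interaction Andrey describes (the names here are invented for illustration). The deleter is a template argument of unique_ptr, so the deleter's namespace becomes an associated namespace of the resulting type, and unqualified calls involving that type can find functions in library B's namespace via ADL - something library A's own tests never exercised:

#include <cstdio>
#include <memory>

namespace lib_b {
    // Custom deleter defined in library B's namespace.
    struct file_closer {
        void operator()(std::FILE* f) const { if (f) std::fclose(f); }
    };
    using file_ptr = std::unique_ptr<std::FILE, file_closer>;

    // lib_b is an associated namespace of file_ptr, so an unqualified
    // call elsewhere can select this overload through ADL.
    void close_all(file_ptr& p) { p.reset(); }
}

int main() {
    lib_b::file_ptr p(std::fopen("demo.txt", "w"));
    close_all(p); // found via ADL, without lib_b:: qualification
}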
On 1/12/24 2:50 AM, Andrey Semashev via Boost wrote:
On 1/12/24 05:16, Robert Ramey via Boost wrote:
a) Treating Boost "as a unit" and testing on this basis results in an amount of work which increases with the square of the number of libraries.
b) Modifying libraries to pass tests in other libraries makes one library dependent on another, which might not be obvious. Libraries should be tested individually as a unit to prove that the implementation of the library faithfully implements its exported interface.
c) If this is done, it is guaranteed that errors cannot be introduced when libraries are composed.
No, it doesn't.
Even disregarding that you can't reasonably test everything, you're forgetting that libraries change (yes, including the public interface)
Changing the public interface is a serious mistake for a library. Ideally, library interfaces implement concepts or compile time asserts on types and run time asserts as pre-conditions. So if this is unavoidable, there is less damage to users who have mistakenly depended on this particular library. I don't think it's fair to expect users to "just deal with it" when a library interface changes.
and often affect each other through their usage, sometimes in non-obvious ways.
Maybe. If so, I think this would be a design mistake.
Suppose, a library A provides a unique_ptr implementation (such as the one in Boost.Move) that supports a custom deleter. That library may test that unique_ptr does use the deleter as intended.
But that doesn't guarantee that this will still work in another library B that uses the unique_ptr with its custom deleter - for example, because that deleter is defined in B's namespace and since it is specified in unique_ptr template parameters, it now affects ADL.
I don't think it's appropriate to expect any code which uses unique_ptr to have to specifically test unique_ptr with a user's particular custom deleter. unique_ptr should specify its type requirements for a custom deleter, which can be guaranteed at compile time, and should work with any implementation of a custom deleter which fulfills these requirements. This will guarantee that unique_ptr cannot fail. Of course this doesn't mean that one's custom deleter should not be tested. It should, but independently of anything else. Personally, I'd qualify a custom deleter with its namespace name to avoid ADL surprises.
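A minimal C++20 sketch of stating such a requirement so it is enforced at compile time (checked_ptr and deleter_for are hypothetical, not Boost's actual code):

// A deleter for T must be callable with a T*.
template <class D, class T>
concept deleter_for = requires(D d, T* p) {
    d(p);
};

// The requirement is checked where the type is composed, so a
// non-conforming deleter is rejected with a clear constraint
// failure instead of a deep instantiation error.
template <class T, deleter_for<T> D>
class checked_ptr {
    T* p_ = nullptr;
    D del_{};
public:
    checked_ptr(T* p, D d) : p_(p), del_(d) {}
    ~checked_ptr() { if (p_) del_(p_); }
};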
Integration testing exists for a reason. If you're not doing integration testing, you're getting a bunch of disparate components that don't compose well or at all.
Exactly wrong. Integration testing tests a very small subset of combinations of cases. It only gives the illusion of correctness. The only way to achieve provably correct composition is to formally and completely define interfaces, and test each implementation to guarantee that it correctly implements that interface. If integration testing fails, it must be because some component has not followed the above rule.
Robert Ramey
Robert Ramey wrote:
This would in practice eliminate the concept of Boost version 1.84 etc... and replace with Boost Serialization library version 1, ...
For most intents and purposes, Boost releases _are_ Boost. If we eliminate the concept of Boost release 1.84, what remains already exists and is called GitHub.
Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other boost librarys, standard library or ?).
Boost would migrate into nothing.
Robert Ramey via Boost said: (by the date of Tue, 9 Jan 2024 11:48:44 -0800)
This would in practice eliminate the concept of Boost version 1.84 etc... and replace with Boost Serialization library version 1, ... Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other boost librarys, standard library or ?).
I have the following code snippet in my serialization header:
// To correctly recognize serialization of Inf and NaN numbers,
// there are different includes for different boost versions
#if BOOST_VERSION>=104700
#include
#else
#include
#endif

How do you plan to not break it with separate version numbers for each library?
On 1/13/24 8:35 AM, Janek Kozicki via Boost wrote:
Robert Ramey via Boost said: (by the date of Tue, 9 Jan 2024 11:48:44 -0800)
This would in practice eliminate the concept of Boost version 1.84 etc... and replace with Boost Serialization library version 1, ... Boost would migrate from being a single/monolithic library to a group of libraries with some explicit dependencies (on other boost librarys, standard library or ?).
I have the following code snippet in my serialization header:
// To correctly recognize serialization of Inf and NaN numbers,
// there are different includes for different boost versions
#if BOOST_VERSION>=104700
#include
#else
#include
#endif
How do you plan to not break it with separate version numbers for each library?
I would expect each library to be versioned individually. So the above would look something like:
// To correctly recognize serialization of Inf and NaN numbers,
// there are different includes for different boost versions
#include
#if BOOST_MATH_VERSION >= 12 (or ?)
#include
#else
#include
#endif
As an (off-topic) aside: the serialization library has an archive version number embedded in every archive. It's incremented every time the archive format is amended. In this way the most recent versions of the serialization library can read archives created by all previous versions of the library. I think this number is up to 20 now.
Robert Ramey
best regards Janek
--
Janek Kozicki, PhD. DSc. Arch. Assoc. Prof.
Gdansk University of Technology (Gdansk Tech)
Faculty of Applied Physics and Mathematics
Institute of Physics and Applied Computer Science
Division of Theoretical Physics and Quantum Information
--
http://yade-dem.org/
http://pg.edu.pl/p/jan-kozicki-19725
http://mostwiedzy.pl/en/jan-kozicki,19725-1
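For comparison, Boost.Serialization exposes the same versioning idea per class via BOOST_CLASS_VERSION. A minimal sketch of the idiom (the type and field names here are invented for illustration):

#include <boost/serialization/version.hpp>

struct record {
    int x;
    double y; // field added in format version 2

    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar & x;
        if (version >= 2) // archives written before version 2 lack this field
            ar & y;
    }
};

// New archives are stamped with version 2; on load, 'version' above
// reflects whatever version the archive was written with.
BOOST_CLASS_VERSION(record, 2)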
participants (12)
- Alan de Freitas
- Alexander Grund
- Andrey Semashev
- Glen Fernandes
- Ion Gaztañaga
- Janek Kozicki
- Peter Dimov
- Phil Endecott
- Rainer Deyke
- Robert Ramey
- Sam Darwin
- Vinnie Falco