On Boost and modules

older
Re: [boost] [release] Boost 1.85.0...

Ruben Perez

8 Apr 2024 8 Apr '24

3:43 p.m.

Hi all, Following a recent conversation on the possibility to somehow enable Boost to be consumed using C++20 modules, I've jumped in and done a small research: https://anarthal.github.io/cppblog/modules René posted the link to r/cpp and it seems to have attracted some interest (https://www.reddit.com/r/cpp/comments/1bxggim/c20_modules_and_boost_an_analy...). It looks like a small fraction of users would consider consuming Boost as modules if there was the possibility (note that considering here is pretty far from actually doing it). Nevertheless, I'd like to know everyone's opinion on the subject. If we find a way to overcome the technical challenges that I manifest in the article, do you think a set of non-intrusive "module bindings" allowing users to consume Boost as a module could add any value to Boost? Thanks, Ruben.

Show replies by date

Stephan T. Lavavej

8 Apr 8 Apr

4:23 p.m.

[Ruben Perez]

...

do you think a set of non-intrusive "module bindings" allowing users to consume Boost as a module could add any value to Boost?

This would be extremely valuable. Not only are modules the long-term future of C++, but they specifically improve one of Boost's biggest weaknesses - the build throughput cost of dragging in a bunch of header-only machinery and doing very little with it. I'd expect to see similar or greater improvements as we observed between #include <iostream> and import std;. If (when) you encounter any compiler or library bugs in MSVC, I can bring the former to the front-end team's attention and personally investigate the latter. All I ask is for command-line repros with as much Boost machinery reduced away as possible, to save time. STL

Andrey Semashev

5:30 p.m.

On 4/8/24 18:43, Ruben Perez via Boost wrote:

...

Hi all,

Following a recent conversation on the possibility to somehow enable Boost to be consumed using C++20 modules, I've jumped in and done a small research: https://anarthal.github.io/cppblog/modules

It would be interesting to see the benchmark numbers for a larger number of CPU cores (e.g. 16). I can see in the table that up to 3 TUs build time with modules is 25-40% higher than with headers and the situation significantly changes for 4 and more TUs. You were using 3 cores for compilation, and I wonder if this is related.

...

René posted the link to r/cpp and it seems to have attracted some interest (https://www.reddit.com/r/cpp/comments/1bxggim/c20_modules_and_boost_an_analy...). It looks like a small fraction of users would consider consuming Boost as modules if there was the possibility (note that considering here is pretty far from actually doing it).

Nevertheless, I'd like to know everyone's opinion on the subject. If we find a way to overcome the technical challenges that I manifest in the article, do you think a set of non-intrusive "module bindings" allowing users to consume Boost as a module could add any value to Boost?

I find the fact that modules, including std, are not redistributable and that they must be built for every project extremely disappointing. I suspect, this will negate the potential benefit from reducing parsing when large projects like Boost are consumed as modules. Remember that we typically use only small portions of Boost, or even small portions of select Boost libraries. It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead. This is also relevant to the standard library. Will we have to build the ever-growing std module every time we need a smallest thing from the standard library? This sounds like a disaster to me. One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

Robert Ramey

6:29 p.m.

On 4/8/24 10:30 AM, Andrey Semashev via Boost wrote:

...

I find the fact that modules, including std, are not redistributable and that they must be built for every project extremely disappointing. I suspect, this will negate the potential benefit from reducing parsing when large projects like Boost are consumed as modules. Remember that we typically use only small portions of Boost, or even small portions of select Boost libraries. It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead.

This is also relevant to the standard library. Will we have to build the ever-growing std module every time we need a smallest thing from the standard library? This sounds like a disaster to me.

One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

The more I read this thread, the more it seems to me that modules are just a bad idea. We already have shared libraries which are redistributable and that's already a hassle given all the compiler switches. shared libraries have the same issue in that if one only want's to use one function, the whole library has to be shipped. The complaints regarding compile times are not even valid as far as I'm concerned. I've been on jobs where the "product" has compile times as long as 12 hours. It seems to always turn out that this is due to lazy programmers just including too much, ("convenience" headers), including the same thing over and over again (header only libraries), not understanding basic ideas like PIMPL, and not understanding the libraries they are already including, avoiding re-factoring when it's called for (they can't do it because they didn't document the libraries they included in the first place), not understanding and using explicit instantiation in the right places, etc. I don't think there is any way to fix all by just including some new already overly complex facility. Compiler vendors aren't helping either. How is it that they can't get it together and agree on a syntax for declaring visibility on library functions and types? How about some system to guarantee that compiler switches are compatible for imported code - be it in shared or unshared libraries? Then there is the committee. Stop with the doo-dads like three way if and stuff like that. Spend some more time thinking about making constexpr more automatic and still backward compatible and maybe more exportable. As for Boost, using our toolset has been a 22 year wrestling match. It looks like I need to rebuild b2 (bootstrap) when I update to the latest boost because there's some coupling somewhere. I could go on - but I'll quit while I'm ahead. Robert Ramey

Vinnie Falco

6:41 p.m.

On Mon, Apr 8, 2024 at 11:30 AM Robert Ramey via Boost < boost@lists.boost.org> wrote:

...

it seems to me that modules are just a bad idea

Modules are a great *idea*, but if no one does the work to explore them it will never become a practical solution. Thanks

Stephan T. Lavavej

7:04 p.m.

[Andrey Semashev]

...

I find the fact that modules, including std, are not redistributable and that they must be built for every project extremely disappointing.

This allows modules to respect compiler options (including Standard modes) and macro definitions.

...

Remember that we typically use only small portions of Boost, or even small portions of select Boost libraries. It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it.

Quite the opposite - building Boost into a module makes subsequent imports lightning-fast, especially when only small portions are used (and must pay instantiation costs).

...

This is also relevant to the standard library. Will we have to build the ever-growing std module every time we need a smallest thing from the standard library? This sounds like a disaster to me.

Again, quite the opposite. You need to build the Standard Library Modules when your compiler options change (of course you can keep separate directories for separate configurations). After that, the built modules can be used until your toolset version changes.

...

One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++". (We're still working on handling having both classic includes and named module imports within a single TU; that's the most challenging case. As of VS 2022 17.10, including an STL header before import std; works, but not the other way around.) [Robert Ramey]

...

It seems to always turn out that this is due to lazy programmers just including too much, ("convenience" headers)

Good news, then - modules make this essentially a non-issue. Importing a module and doing nothing with it is extremely cheap, because a module is a highly structured representation of library code which can be looked up on-demand. That's why we have a monolithic `import std;` that's near-instantaneous to drag in, compared to even a single "fine-grained" `#include <iostream>`.

...

including the same thing over and over again (header only libraries)

This is partially mitigated by modules, since the cost of having to parse (and do the initial phase of lookups etc.) is paid once when building the module. I say "partially" because instantiation costs are still paid per TU. Theoretically, this is one of the few scenarios where PCHes could theoretically have superior throughput to modules, since PCHes (compiler memory snapshots) can capture instantiations along the way. However, PCHes are so inflexible, that giving that up should generally be worthwhile. STL

Ruben Perez

7:21 p.m.

...

...
One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++". (We're still working on handling having both classic includes and named module imports within a single TU; that's the most challenging case. As of VS 2022 17.10, including an STL header before import std; works, but not the other way around.)

I have questions regarding this - maybe you can help me here: * Let's say I build a Boost module. This #include's std headers in its global module fragment. * Now the user compiles their main.cpp, and they import boost and the std modules. No includes here. Is this scenario supposed to work? If it is, is it supposed to be portable (i.e. defined by the standard)? Thanks, Ruben.

Stephan T. Lavavej

7:38 p.m.

[Ruben Perez]

...

I have questions regarding this - maybe you can help me here: * Let's say I build a Boost module. This #include's std headers in its global module fragment. * Now the user compiles their main.cpp, and they import boost and the std modules. No includes here. Is this scenario supposed to work? If it is, is it supposed to be portable (i.e. defined by the standard)?

It's supposed to work portably according to the Standard (N4971 [std.modules]). With our current implementation, this is effectively the include/import mixing that's challenging for MSVC to handle. (May or may not work in VS 2022 17.10. I have a test that ensures that plain #include <meow> before import std; works, but not more exotic scenarios. Basically anything that leads to the std module being dragged in, and then the compiler seeing an #include <meow>, will result in the compiler seeing duplicate machinery that with the current implementation it cannot reconcile. Having #include <meow> be within another module's GMF counts AFAIK.) It would be better for the Boost module to be built on top of the Standard Library Module (both in terms of avoiding current compiler limitations, and for long-term cleanliness and build throughput). STL

Ruben Perez

7:52 p.m.

...

...
I have questions regarding this - maybe you can help me here: * Let's say I build a Boost module. This #include's std headers in its global module fragment. * Now the user compiles their main.cpp, and they import boost and the std modules. No includes here. Is this scenario supposed to work? If it is, is it supposed to be portable (i.e. defined by the standard)?

It's *supposed* to work portably according to the Standard (N4971 [std.modules]).

With our current implementation, this is effectively the include/import mixing that's challenging for MSVC to handle. (May or may not work in VS 2022 17.10. I have a test that ensures that *plain* #include <meow> before import std; works, but not more exotic scenarios. Basically anything that leads to the std module being dragged in, and *then* the compiler seeing an #include <meow>, will result in the compiler seeing duplicate machinery that with the current implementation it cannot reconcile. Having #include <meow> be within another module's GMF counts AFAIK.) It would be better for the Boost module to be built on top of the Standard Library Module (both in terms of avoiding current compiler limitations, and for long-term cleanliness and build throughput).

STL

I see. Is it expected to be fixed in the near future? AFAIK libc++ makes it work by making the std modules a proxy to the standard headers, with the "export using" idiom. I know it'd be better for clean-ness in the long run, but this requires much more maintenance. I see it difficult right now to get enough buy in from library authors as to commit to #ifdef-ing out all their standard includes. If we managed to get a non-intrusive implementation out and this gained some traction, we could then think to do what you proposed. Ruben.

Stephan T. Lavavej

9:51 p.m.

[Ruben Perez]

...

I see. Is it [arbitrary import/include mixing in a TU] expected to be fixed in the near future?

We don't have an ETA at this time (definitely not VS 2022 17.10, unlikely 17.11). The compiler team wants to resolve it by invisibly translating #include <meow> to behave as import std.compat; with additional magic behind the scenes (to handle macros and non-Standard identifiers, which import std.compat; does not emit by itself). That translation mechanism doesn't exist in the compiler yet, and I still need to finish an audit of what CRT stuff the STL drags in (that someone saying #include <meow> might be depending on). They've told me that making extern "C++" (the mechanism I used to make include-before-import mixing work in 17.10) handle import-before-include is not feasible. I don't truly understand that, but I'm a library dev precisely because I don't want to understand deep compiler internals 😹 STL

Boris Kolpackov

9 Apr 9 Apr

6:57 a.m.

Ruben Perez via Boost <boost@lists.boost.org> writes:

...

AFAIK libc++ makes it work by making the std modules a proxy to the standard headers, with the "export using" idiom.

At least up to Clang 18 it "works" to the same level as in MSVC 17.10: include-before-import works, the other way around -- doesn't. It's possible this has changed in Clang 19, though.

Daniela Engert

9:25 a.m.

...

Boris Kolpackov via Boost <boost@lists.boost.org> hat am 09.04.2024 08:57 CEST geschrieben:

Ruben Perez via Boost <boost@lists.boost.org> writes:

...
AFAIK libc++ makes it work by making the std modules a proxy to the standard headers, with the "export using" idiom.

At least up to Clang 18 it "works" to the same level as in MSVC 17.10: include-before-import works, the other way around -- doesn't. It's possible this has changed in Clang 19, though.

Right, and this precludes any non-trivial use of 'import std' when you can't control *all* dependencies in their entirety. You need to guarantee that *all* #includes of header files lexically appear before all imports that directly or indirectly depend on those same header files. In *all* translation units. The standard *mandates* implementers to make the interoperability of #includes of standard C++ headers and the modularized standard library happen, regardless of ordering (with different levels of compliance in current implementations). But there is no such guarantee for other libraries. I see a lot of people that underestimate what it takes to get this right. Experience level 1: look Ma, my module compiles! I know modules! Experience level 2: look Ma, I can import my module into a .cpp file! I know modules! Experience level 3: look Ma, I can import my module into another module and it works! Experience level 4: look Ma, I can import my module and other modules that import my module into the same .cpp files! Experience level 5: that whole stuff actually links and does what it it supposed to do. On one compiler. Experience level 6: you can repeat that, on multiple compilers. The demo application from last year's talks was deliberately constructed such that I could demonstrate what it takes to get there. The hardest part was to deal with Asio, clang, and all the ABI deficiencies that were unveiled along the journey. Dani

Rainer Deyke

11:49 a.m.

On 09.04.24 11:25, Daniela Engert via Boost wrote:

...

Right, and this precludes any non-trivial use of 'import std' when you can't control *all* dependencies in their entirety. You need to guarantee that *all* #includes of header files lexically appear before all imports that directly or indirectly depend on those same header files. In *all* translation units.

I don't see what's so difficult about this requirement. Just put your #includes above your imports, and don't use imports in header files. Header-based code will automatically follow these rules by virtue of not using imports. Module-based code will naturally follow these rules by virtue of not using header files, and by putting the #includes above the imports. Or am I missing something here? -- Rainer Deyke (rainerd@eldwood.com)

Ruben Perez

12:05 p.m.

...

...
Right, and this precludes any non-trivial use of 'import std' when you can't control *all* dependencies in their entirety. You need to guarantee

On 09.04.24 11:25, Daniela Engert via Boost wrote: that *all* #includes of header files lexically appear before all imports that directly or indirectly depend on those same header files. In *all* translation units.

I don't see what's so difficult about this requirement. Just put your #includes above your imports, and don't use imports in header files. Header-based code will automatically follow these rules by virtue of not using imports. Module-based code will naturally follow these rules by virtue of not using header files, and by putting the #includes above the imports. Or am I missing something here?

What I understood (please correct me if I'm wrong) is that importing a module that uses an include in its global module fragment and then importing another module does violate this rule.

...

-- Rainer Deyke (rainerd@eldwood.com)

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Boris Kolpackov

10 Apr 10 Apr

4:33 a.m.

Ruben Perez via Boost <boost@lists.boost.org> writes:

...

...
I don't see what's so difficult about this requirement. Just put your #includes above your imports, and don't use imports in header files. Header-based code will automatically follow these rules by virtue of not using imports. Module-based code will naturally follow these rules by virtue of not using header files, and by putting the #includes above the imports. Or am I missing something here?

What I understood (please correct me if I'm wrong) is that importing a module that uses an include in its global module fragment and then importing another module does violate this rule.

I tend to agree with Rainer, this would only become an issue if you start putting imports into headers (and only for modules that can also be consumed via headers). In particular, I believe this specific issue (ignoring declarations that have already been imported/included) is limited to a single translation unit. But maybe I am wrong. Can you elaborate on the scenario that you think is problematic? In particular, I think the "and then importing another module" part is missing the "that does XXX" (maybe "that uses an import instead")? I've tried the following with Clang 18 and libc++'s std module and everything seems to work without any issues: // a.mxx module; #include <iostream> export module a; export void a (std::ostream& os) {os << "a";} // b.mxx export module b; import std; export void b (std::ostream& os) {os << "b";} // c.mxx // export module c; import a; import b; import std; export void c (std::ostream& os) {a (os); b (os);} I've tried different orders of importation in c.mxx without any changes in the result.

Matt Borland

8 Apr 8 Apr

7:56 p.m.

...

It's supposed to work portably according to the Standard (N4971 [std.modules]).

...

With our current implementation, this is effectively the include/import mixing that's challenging for MSVC to handle. (May or may not work in VS 2022 17.10. I have a test that ensures that plain #include <meow> before import std; works, but not more exotic scenarios. Basically anything that leads to the std module being dragged in, and then the compiler seeing an #include <meow>, will result in the compiler seeing duplicate machinery that with the current implementation it cannot reconcile. Having #include <meow> be within another module's GMF counts AFAIK.) It would be better for the Boost module to be built on top of the Standard Library Module (both in terms of avoiding current compiler limitations, and for long-term cleanliness and build throughput).

...

STL

While likely an uncommon issue have you explored cyclic dependencies between modules? For example we are modularizing boost.math now, and would like to import std. In the MSVC case you have boost.math as a submodule for the implementation of C++17 special math. Would this break our ability to import std? If this is uncharted territory I'll experiment with it, and report back. Matt

Stephan T. Lavavej

9:43 p.m.

[Matt Borland]

...

While likely an uncommon issue have you explored cyclic dependencies between modules? For example we are modularizing boost.math now, and would like to import std. In the MSVC case you have boost.math as a submodule for the implementation of C++17 special math. Would this break our ability to import std?

This is a great question. It shouldn't be a problem because we've isolated our use of Boost.Math to our separately compiled sources (a "satellite DLL" or static lib). Classic STL headers and import std; just see us calling extern "C" functions named __std_smf_riemann_zeta and so forth. (For static linking, where there is no binary isolation, we have a pre-existing problem, unrelated to modules, in that we're dragging in ordinary boost:: symbols which raises the specter of ODR violations. As for modules, we're including Boost.Math classically, and we expect it to include the Standard Library classically in this mode, so no import std; machinery should be sneaking into our static lib, so at least the static lib problem won't get worse.) Thanks, STL

Matt Borland

9 Apr 9 Apr

6:45 a.m.

On Monday, April 8th, 2024 at 11:43 PM, Stephan T. Lavavej <stl@exchange.microsoft.com> wrote:

...

[Matt Borland]

...
While likely an uncommon issue have you explored cyclic dependencies between modules? For example we are modularizing boost.math now, and would like to import std. In the MSVC case you have boost.math as a submodule for the implementation of C++17 special math. Would this break our ability to import std?

...

This is a great question. It shouldn't be a problem because we've isolated our use of Boost.Math to our separately compiled sources (a "satellite DLL" or static lib). Classic STL headers and `import std;` just see us calling `extern "C"` functions named `__std_smf_riemann_zeta` and so forth.

...

(For static linking, where there is no binary isolation, we have a pre-existing problem, unrelated to modules, in that we're dragging in ordinary `boost::` symbols which raises the specter of ODR violations. As for modules, we're including Boost.Math classically, and we expect it to include the Standard Library classically in this mode, so no `import std;` machinery should be sneaking into our static lib, so at least the static lib problem won't get worse.)

...

Thanks, STL

Thanks for the response. I believe it was Casey (or someone from your team) that asked us about allowing the namespace for those functions to be configured via macro. Let me know if that's something we need to look at again to fix those issues. Matt

Stephan T. Lavavej

7:27 p.m.

[STL]

...

For static linking, where there is no binary isolation, we have a pre-existing problem, unrelated to modules, in that we're dragging in ordinary `boost::` symbols which raises the specter of ODR violations.

[Matt Borland]

...

I believe it was Casey (or someone from your team) that asked us about allowing the namespace for those functions to be configured via macro. Let me know if that's something we need to look at again to fix those issues.

Yeah, this is still tracked by the open issue https://github.com/boostorg/math/issues/769 . We haven't asked about it recently since it hasn't been actively causing problems, but we'd use such a macro if support for it were added. Thanks! STL

Andrey Semashev

8 Apr 8 Apr

9:09 p.m.

On 4/8/24 22:04, Stephan T. Lavavej wrote:

...

[Andrey Semashev]

...
I find the fact that modules, including std, are not redistributable and that they must be built for every project extremely disappointing.

This allows modules to respect compiler options (including Standard modes) and macro definitions.

Why changing compiler options affects C++ AST? Which is what the compiled module basically is, or supposed to be, anyway. Yes, the AST could change because of different macro definitions, but that is not an issue. Or not more of an issue than it currently is with different macro definitions between TUs. I don't expect modules to solve this issue.

...

...
Remember that we typically use only small portions of Boost, or even small portions of select Boost libraries. It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it.

Quite the opposite - building Boost into a module makes subsequent imports lightning-fast, especially when only small portions are used (and must pay instantiation costs).

Maybe, but one needs to build it first in its entirety. And there are cases when you *always* build from scratch. For example, in CI. This seems like a deal breaker to me.

...

...
This is also relevant to the standard library. Will we have to build the ever-growing std module every time we need a smallest thing from the standard library? This sounds like a disaster to me.

Again, quite the opposite. You need to build the Standard Library Modules when your compiler options change (of course you can keep separate directories for separate configurations). After that, the built modules can be used until your toolset version changes.

Again, why this is needed? As far as I'm concerned, the standard library is bundled with the compiler, and its module should ship with it, just like headers and compiled library currently are.

...

...
One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++".

Could you give an example? Does this involve some compiler-specific magic (i.e. non-portable), beyond marking symbols exported from the compiled library with __declspec(dllexport)/__attribute__((visibility("default")))?

...

(We're still working on handling having both classic includes and named module imports within a single TU; that's the most challenging case. As of VS 2022 17.10, including an STL header before import std; works, but not the other way around.)

Is the order of includes and imports a fundamental limitation or is this a limitation of current implementations that will be lifted in the future?

Andrey Semashev

9:31 p.m.

On 4/9/24 00:09, Andrey Semashev wrote:

...

On 4/8/24 22:04, Stephan T. Lavavej wrote:

...
[Andrey Semashev]

...
I find the fact that modules, including std, are not redistributable and that they must be built for every project extremely disappointing.

This allows modules to respect compiler options (including Standard modes) and macro definitions.

Why changing compiler options affects C++ AST? Which is what the compiled module basically is, or supposed to be, anyway.

Re. standard versions, there are ways to support multiple standard versions and still ship the compiled std module. For example: 1. Ship everything in the same module std. This will make the module std expose symbols defined by all C++ versions, but that should not be a problem, as conforming user's code will use whatever matches his chosen C++ version. 2. Ship multiple compiled modules for different C++ versions. Pick one during compilation. Will also need to do this to support debug and release versions of the standard library. Maybe there are other ways, those two are just what immediately came to mind.

...

Yes, the AST could change because of different macro definitions, but that is not an issue. Or not more of an issue than it currently is with different macro definitions between TUs. I don't expect modules to solve this issue.

...
...
Remember that we typically use only small portions of Boost, or even small portions of select Boost libraries. It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it.

Quite the opposite - building Boost into a module makes subsequent imports lightning-fast, especially when only small portions are used (and must pay instantiation costs).

Maybe, but one needs to build it first in its entirety. And there are cases when you *always* build from scratch. For example, in CI. This seems like a deal breaker to me.

...
...
This is also relevant to the standard library. Will we have to build the ever-growing std module every time we need a smallest thing from the standard library? This sounds like a disaster to me.

Again, quite the opposite. You need to build the Standard Library Modules when your compiler options change (of course you can keep separate directories for separate configurations). After that, the built modules can be used until your toolset version changes.

Again, why this is needed? As far as I'm concerned, the standard library is bundled with the compiler, and its module should ship with it, just like headers and compiled library currently are.

...
...
One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++".

Could you give an example? Does this involve some compiler-specific magic (i.e. non-portable), beyond marking symbols exported from the compiled library with __declspec(dllexport)/__attribute__((visibility("default")))?

...
(We're still working on handling having both classic includes and named module imports within a single TU; that's the most challenging case. As of VS 2022 17.10, including an STL header before import std; works, but not the other way around.)

Is the order of includes and imports a fundamental limitation or is this a limitation of current implementations that will be lifted in the future?

Stephan T. Lavavej

10:29 p.m.

[Andrey Semashev]

...

Re. standard versions, there are ways to support multiple standard versions and still ship the compiled std module. For example: 1. Ship everything in the same module std. This will make the module std expose symbols defined by all C++ versions, but that should not be a problem, as conforming user's code will use whatever matches his chosen C++ version. 2. Ship multiple compiled modules for different C++ versions. Pick one during compilation. Will also need to do this to support debug and release versions of the standard library. Maybe there are other ways, those two are just what immediately came to mind.

I encourage you to explore these ideas for Boost and discover why they are not viable for production use. STL

Andrey Semashev

11:47 p.m.

On 4/9/24 01:29, Stephan T. Lavavej wrote:

...

[Andrey Semashev]

...
Re. standard versions, there are ways to support multiple standard versions and still ship the compiled std module. For example: 1. Ship everything in the same module std. This will make the module std expose symbols defined by all C++ versions, but that should not be a problem, as conforming user's code will use whatever matches his chosen C++ version. 2. Ship multiple compiled modules for different C++ versions. Pick one during compilation. Will also need to do this to support debug and release versions of the standard library. Maybe there are other ways, those two are just what immediately came to mind.

I encourage you to explore these ideas for Boost and discover why they are not viable for production use.

Well, Boost has been doing #2 for years, at least on Windows. Not wrt. C++ versions, but we do ship binaries for different compiler versions, debug/release, etc.

Stephan T. Lavavej

10:27 p.m.

[Andrey Semashev]

...

Why changing compiler options affects C++ AST?

Increasing /std:c++20 to /std:c++latest (or /std:c++23, /std:c++26, etc. in the future) causes lots of stuff to appear, some stuff to disappear, some stuff to be marked as deprecated, more stuff to be marked as constexpr, and some stuff simply changes form (return types have been changed from void to non-void in the past, classes have gained typedefs, etc.). Changing between static linking and dynamic linking (/MT versus /MD) affects whether things are declared as __declspec(dllimport). Changing between release and debug (/MT or /MD versus /MTd or /MDd) massively affects the representations of classes, and the code that they execute. The calling convention options (/Gd /Gr /Gv /Gz) affect whether functions are treated as __cdecl, __stdcall, __fastcall, __vectorcall, etc. The /Zp option (affecting packing) affects the layout of classes. The STL defends itself against this one, but most code doesn't bother. There are many escape hatches for Standard behavior that affect semantics: The accursed /Zc:wchar_t- affects whether wchar_t is a real type or a fake unsigned short. /Zc:noexceptTypes- affects whether noexcept participates in the type system, which the STL has to occasionally react to by omitting noexcept from function pointer typedefs. /Zc:char8_t- removes char8_t from the type system, and the STL has to react accordingly. And on, and on, and on. I haven't even mentioned the macro modes we support (like controlling deprecations, restoring removed machinery, etc.). Shipping all possible combinations of these settings is impossible.

...

Maybe, but one needs to build it first in its entirety. And there are cases when you *always* build from scratch. For example, in CI. This seems like a deal breaker to me.

Building the Standard Library Modules takes something like 6 seconds and emits less than 40 MB of output (it's about 10x smaller than a PCH). The cost is nonzero, but not massive. Boost's headers are more massive than the Standard Library, but I still expect building all of Boost as a module to be pretty fast - certainly nothing like building Boost's separately compiled components which is extremely expensive.

...

Again, why this is needed? As far as I'm concerned, the standard library is bundled with the compiler, and its module should ship with it, just like headers and compiled library currently are.

The headers are compiled with the user's choice of compiler options and macros - which, as I explained above, can vary dramatically. Modules are an alternative to classic inclusion, so they need to respect those options. The separately compiled sources are a huge headache precisely because they can only ship in a small, finite number of configurations - which is why we've tried to shrink the separately compiled sources over the years, and flatten its surface area to plain old extern "C" functions. We tried shipping MSVC's early experimental, non-Standard modules for the standard library as prebuilt components that were usable with specific compiler options. This wasn't suitable for the Standard, production-level import std; which is why we will never ship prebuilt versions of them, only std.ixx and std.compat.ixx sources.

...

...
If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++".

...

Could you give an example? Does this involve some compiler-specific magic (i.e. non-portable), beyond marking symbols exported from the compiled library with __declspec(dllexport)/__attribute__((visibility("default")))?

I can only speak for the MSVC environment (I don't know what a visibility attribute is). The separately compiled sources are built normally (no modules). The headers declaring separately compiled machinery (whether functions or classes) need to wrap them in either extern "C" (if you want that, with the usual effects) or extern "C++". extern "C++" is interesting because it's valid going back to C++98, but had essentially no effect. Now it means "this stuff is attached to the global module", which allows module code to link with classic code. (That is, in MSVC where modules have strong ownership, we still want any separately compiled machinery to not be owned by the module.) Because classic code isn't affected by extern "C++", non-modules scenarios aren't impacted. (In the STL, we went further and wrapped everything in extern "C++" that wasn't already extern "C". That gave up strong ownership as a workaround for making the include-before-import scenario work. This was an acceptable sacrifice because std is special and already relies on _Ugly names to avoid conflicts with implementation details, so we don't need strong ownership to coexist with user code.)

...

Is the order of includes and imports a fundamental limitation or is this a limitation of current implementations that will be lifted in the future?

It is a current-implementation limitation of MSVC (can't speak for the other toolsets) that will be resolved, somehow, in the future. We know it's a huge headache to widespread use of modules in practice. As I've mentioned, the Standard requires arbitrary mixing to work (and in fact I wrote that Standardese). STL

Ruben Perez

10:34 p.m.

...

extern "C++" is interesting because it's valid going back to C++98, but had essentially no effect. Now it means "this stuff is attached to the global module", which allows module code to link with classic code. (That is, in MSVC where modules have strong ownership, we still want any separately compiled machinery to not be owned by the module.)

Apologies if this is a dumb question, but what does "strong ownership" mean?

...

Stephan T. Lavavej

11:02 p.m.

[Ruben Perez]

...

Apologies if this is a dumb question, but what does "strong ownership" mean?

It's a good question which I'm only partially qualified to answer (I know a lot about what it's taken to modularize the entire Standard Library, but there's a lot of Core Language stuff that I haven't needed to learn for this narrow task). My understanding is that the Standard doesn't specify (i.e. leaves it up to implementations to decide) what happens when two modules internally use the same names for different things. Consider a Cats module and a Dogs module. import Cats; makes Cats::meow() available, and import Dogs; makes Dogs::woof() available, because they've marked Cats::meow() and Dogs::woof() with the export keyword. These modules can be imported by the same TU (or separate TUs in the same program) and used without any conflicts, because they aren't trying to export the same names. But what if the Cats module relies on non-exported, completely internal machinery details::make_noise(), and the (independently written and maintained) Dogs module also happens to have internal machinery named details::make_noise(), that does totally different canine things? In the classic, non-modules world, the answer is clear - this is an ODR violation with undefined behavior, and you'll get a linker error if you're very lucky. (Header-only code, or statically linked separately compiled code, provides no isolation. DLLs do, but the Standard doesn't recognize their existence.) Modules provide more structure because when the Cats module is built, it knows that it's for Cats machinery. So an implementation is allowed to make the Cats module "strongly own" non-exported symbols like details::make_noise(). (My understanding is that this results in details::make_noise() being mangled to reflect the fact that Cats owns it.) If an implementation chooses the strong ownership strategy, then Cats can have its details::make_noise() coexist with Dogs also having its details::make_noise(), and there is no conflict, no ODR violation, and everything is fine. This intuitively makes sense, because each module strictly controls its exported surface area, and its implementation details shouldn't matter to other modules, and users should be able to freely combine modules. For reasons that I completely do not understand with my cat-sized brain that is barely able to print "3.14", I believe that only MSVC has chosen the strong ownership strategy, while Clang and GCC have chosen "weak ownership" (which is perhaps easier to understand - with that strategy, details::make_noise() isn't specially affected by whether it appears in module Cats or module Dogs, so you get the same kind of ODR violation that classic headers would produce). Apparently compiler devs feel really strongly about both sides of this issue and I don't know why. Anyways, this is relevant to making modules code interact with classically compiled code, because the classically compiled code doesn't know anything about modules and isn't attached to any named modules. There's the "global module fragment" (again, something I'm only partially qualified to talk about), which is a structured way to say "hey, no module owns any of this stuff". My understanding is that it can be a good way to deal with entire libraries that are classic and haven't been modularized. However, I found that it wasn't really suited for dealing with a pre-existing mostly-header-only library that occasionally declares separately compiled machinery in the middle of its usual header-only definitions. I ended up using the GMF for UCRT machinery only (since I can enumerate all UCRT headers, include them in the GMF, and I don't want the std module to own anything from the UCRT; this is essentially belt-and-suspenders since everything in the UCRT is already extern "C" but I was advised that it was a good idea to put them in the GMF to be extra sure). Hope this helps. Learning this stuff was difficult for me since (1) modules are so new and (2) a lot of what has been written about modules has been from a completely clean slate perspective, not from the perspective I needed which was (3) continuing to support classic includes and named modules with the same codebase, and (4) having a classic separately compiled component (that accreted over 20+ years and was a big headache even before modules). I'm very eager to see more library authors explore modularization so the community can learn these techniques (and possibly discover superior ones, I don't pretend that I've found the best strategy for all time). STL

René Ferdinand Rivera Morell

11:14 p.m.

On Mon, Apr 8, 2024 at 6:03 PM Stephan T. Lavavej via Boost <boost@lists.boost.org> wrote:

...

For reasons that I completely do not understand with my cat-sized brain that is barely able to print "3.14", I believe that only MSVC has chosen the strong ownership strategy, while Clang and GCC have chosen "weak ownership" (which is perhaps easier to understand - with that strategy, details::make_noise() isn't specially affected by whether it appears in module Cats or module Dogs, so you get the same kind of ODR violation that classic headers would produce). Apparently compiler devs feel really strongly about both sides of this issue and I don't know why.

It's my understanding that both clang and gcc are moving, or have moved, to strong ownership also. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net

Daniela Engert

9 Apr 9 Apr

5:25 a.m.

Am 09.04.2024 um 01:14 schrieb René Ferdinand Rivera Morell via Boost:

...

On Mon, Apr 8, 2024 at 6:03 PM Stephan T. Lavavej via Boost <boost@lists.boost.org> wrote:

...
For reasons that I completely do not understand with my cat-sized brain that is barely able to print "3.14", I believe that only MSVC has chosen the strong ownership strategy, while Clang and GCC have chosen "weak ownership" (which is perhaps easier to understand - with that strategy, details::make_noise() isn't specially affected by whether it appears in module Cats or module Dogs, so you get the same kind of ODR violation that classic headers would produce). Apparently compiler devs feel really strongly about both sides of this issue and I don't know why. It's my understanding that both clang and gcc are moving, or have moved, to strong ownership also.

Right, at least clang has. It was the intent of both compiler teams to leave the weak ownership model behind. Over the course of multiple versions (beginning with clang 16), that transition caused clang to struggle with correctly forming linker symbols and deficiencies in the Itanium ABI which made it impossible (in non-obvious ways) to use modules (like Asio) at scale. Related to the chosen ownership model, there is the notion of "attachment". That's something most developers are unfamiliar with or never have heard about. For a starter, people might want to look here: https://www.reddit.com/r/cpp/comments/1busseu/comment/kxw409i/ . My talks on modules touch on that subject, too, including the consequences. Adding language linkage specifications to non-exported declarations is the lever to "detach" entities from the module they morally belong to. Thanks Dani -- PGP/GPG: 2CCB 3ECB 0954 5CD3 B0DB 6AA0 BA03 56A1 2C4638C5

Andrey Semashev

12:37 a.m.

On 4/9/24 01:27, Stephan T. Lavavej wrote:

...

[Andrey Semashev]

...
Why changing compiler options affects C++ AST?

Increasing /std:c++20 to /std:c++latest (or /std:c++23, /std:c++26, etc. in the future) causes lots of stuff to appear, some stuff to disappear, some stuff to be marked as deprecated, more stuff to be marked as constexpr, and some stuff simply changes form (return types have been changed from void to non-void in the past, classes have gained typedefs, etc.).

Changing between static linking and dynamic linking (/MT versus /MD) affects whether things are declared as __declspec(dllimport).

Changing between release and debug (/MT or /MD versus /MTd or /MDd) massively affects the representations of classes, and the code that they execute.

The calling convention options (/Gd /Gr /Gv /Gz) affect whether functions are treated as __cdecl, __stdcall, __fastcall, __vectorcall, etc.

The /Zp option (affecting packing) affects the layout of classes. The STL defends itself against this one, but most code doesn't bother.

There are many escape hatches for Standard behavior that affect semantics:

The accursed /Zc:wchar_t- affects whether wchar_t is a real type or a fake unsigned short.

/Zc:noexceptTypes- affects whether noexcept participates in the type system, which the STL has to occasionally react to by omitting noexcept from function pointer typedefs.

/Zc:char8_t- removes char8_t from the type system, and the STL has to react accordingly.

And on, and on, and on. I haven't even mentioned the macro modes we support (like controlling deprecations, restoring removed machinery, etc.). Shipping all possible combinations of these settings is impossible.

Many of the things you mentioned above have no effect on the AST. For example, it doesn't matter which calling convention or struct packing the user chooses, it doesn't affect the function or class definitions. It affects code generation, yes, but that does not take place when modules are compiled. Even wchar_t, char8_t, noexcept being part of the type system and deprecation markup (presumably, using __declspec) should not matter since the AST should contain these tokens either way. The different options control *interpretation* of these tokens, and will only affect template instantiations, overload resolution, name mangling and code generation, none of which happens during module compilation. So, of the options you mentioned, only the C++ version, static/shared linking and debug/release version are the options that may in fact affect AST. This seems manageable.

Boris Kolpackov

1:58 p.m.

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...

Many of the things you mentioned above have no effect on the AST.

They may have no effect on some hypothetical AST that was designed especially to facilitate BMI portability, but they have effect on the ASTs that are actually used in GCC/Clang/MSVC. But the conceptually intractable problem here (as others have mentioned) is that many of these options have corresponding compiler-defined macros which means code may be #ifdef'ed in/out depending on these macros. Even if you decide to ignore such directives somehow (which already sounds nuts), there is no guarantee that such conditional code is actually compilable with/without the option in question. I think if you want BMI portability, the most sensible approach is to have fat BMIs where you compile the same interface with a set of option combinations, merge the resulting BMIs into one fat file, and then tell your users that they can only use one of the supported combinations (or compile a custom BMI from source). I don't believe anything in the current approach prevents us from exploring this in the future. It just makes sense to first chew what we have bitten off so far.

Rainer Deyke

11:54 a.m.

On 08.04.24 23:09, Andrey Semashev via Boost wrote:

...

Why changing compiler options affects C++ AST? Which is what the compiled module basically is, or supposed to be, anyway.

Yes, the AST could change because of different macro definitions, but that is not an issue. Or not more of an issue than it currently is with different macro definitions between TUs. I don't expect modules to solve this issue.

Compilers often expose their options through predefined macros. -- Rainer Deyke (rainerd@eldwood.com)

Rainer Deyke

12:31 p.m.

On 08.04.24 20:29, Robert Ramey via Boost wrote:

...

The more I read this thread, the more it seems to me that modules are just a bad idea. We already have shared libraries which are redistributable and that's already a hassle given all the compiler switches. shared libraries have the same issue in that if one only want's to use one function, the whole library has to be shipped.

This does not follow. Shared libraries have problems, therefore modules (which /solve/ many of these problems) are a bad idea? What? I'm picturing a future where modules (in source code form) become /the/ way to distribute C++ libraries. Advantages: - No more messing with build systems for the library author. Just ship the source code. - No more messing with library build systems for the library consumer. - Unified build system does not need to differentiate between user files and library files. - Consistent macro-based configuration for all libraries, not just header-only libraries - No shared libraries = aggressive dead-code elimination at build time, resulting in massive reductions of code size. - No shared libraries = programs can ship as a single executable file. -- Rainer Deyke (rainerd@eldwood.com)

Andrey Semashev

1:14 p.m.

On 4/9/24 15:31, Rainer Deyke via Boost wrote:

...

On 08.04.24 20:29, Robert Ramey via Boost wrote:

...
The more I read this thread, the more it seems to me that modules are just a bad idea. We already have shared libraries which are redistributable and that's already a hassle given all the compiler switches. shared libraries have the same issue in that if one only want's to use one function, the whole library has to be shipped.

This does not follow. Shared libraries have problems, therefore modules (which /solve/ many of these problems) are a bad idea? What?

I'm picturing a future where modules (in source code form) become /the/ way to distribute C++ libraries.

Your system won't run the C++ modules, someone has to compile them first. Build systems aren't going anywhere, as well as compiled libraries.

...

Advantages: - No more messing with build systems for the library author. Just ship the source code.

Shipping the source code is not new. Building libraries from source is also not new.

...

- No more messing with library build systems for the library consumer.

Unless the consumer doesn't want to compile the dependencies all the time.

...

- Unified build system does not need to differentiate between user files and library files. - Consistent macro-based configuration for all libraries, not just header-only libraries

I'm not sure what the above two mean.

...

- No shared libraries = aggressive dead-code elimination at build time, resulting in massive reductions of code size. - No shared libraries = programs can ship as a single executable file.

Again, not going to happen. Monolithic executables, while may be convenient in a portable usage scenario, are a problem in terms of maintenance and resource consumption.

Rainer Deyke

1:43 p.m.

On 09.04.24 15:14, Andrey Semashev via Boost wrote:

...

On 4/9/24 15:31, Rainer Deyke via Boost wrote:

...
On 08.04.24 20:29, Robert Ramey via Boost wrote:

...
The more I read this thread, the more it seems to me that modules are just a bad idea. We already have shared libraries which are redistributable and that's already a hassle given all the compiler switches. shared libraries have the same issue in that if one only want's to use one function, the whole library has to be shipped.

This does not follow. Shared libraries have problems, therefore modules (which /solve/ many of these problems) are a bad idea? What?

I'm picturing a future where modules (in source code form) become /the/ way to distribute C++ libraries.

Your system won't run the C++ modules, someone has to compile them first. Build systems aren't going anywhere, as well as compiled libraries.

I'm fine with one build system. The one I use to compile my own code. I'm not fine with the dozen or so build systems I currently have to use to build my dependencies. Looking at my list of dependencies, I've got to deal with: - Autotools. - CMake. Different versions thereof. - Meson. - Ninja. - SCons. - Hand-written makefiles. - Broken build files for all of the above, requiring manual patching. - Probably a bunch more I forgot about. - And of course Boost's own b2.

...

...
Advantages: - No more messing with build systems for the library author. Just ship the source code.

Shipping the source code is not new. Building libraries from source is also not new.

Shipping plain source that can just be dropped into source tree is far too rare.

...

...
- No more messing with library build systems for the library consumer.

Unless the consumer doesn't want to compile the dependencies all the time.

You don't recompile the dependencies "all the time". You compile once per configuration and recompile when the dependency changes, same as any other source file. Your build system takes care of this.

...

I'm not sure what the [below] two mean.

...
- Unified build system does not need to differentiate between user files and library files.

This means I just need to deal with one build system, not one per library.

...

...
- Consistent macro-based configuration for all libraries, not just header-only libraries

This means that there is just one way of configuring a library, by defining configuration macros. No messing around with CMake variables or command-line arguments for configure scripts or whatever Meson or SCons use. -- Rainer Deyke (rainerd@eldwood.com)

Дмитрий Архипов

2:33 p.m.

вт, 9 апр. 2024 г. в 16:43, Rainer Deyke via Boost <boost@lists.boost.org>: first. Build systems aren't going anywhere, as well as compiled libraries.

...

I'm fine with one build system. The one I use to compile my own code. I'm not fine with the dozen or so build systems I currently have to use to build my dependencies.

This is already solved by package managers. They deal with dependencies' build systems for you and create "exports" for your build system.

...

You don't recompile the dependencies "all the time". You compile once per configuration and recompile when the dependency changes, same as any other source file. Your build system takes care of this.

But you do. I would think, most projects are built in CI these days. CI can get you pre-built binaries for your dependencies. But BMIs are not redistributable, as we've been told it is not even a goal. The best you can do in order to not rebuild every module for every job is CI runner cache.

...

This means that there is just one way of configuring a library, by defining configuration macros. No messing around with CMake variables or command-line arguments for configure scripts or whatever Meson or SCons use.

Nothing stops you from using macros even now. E.g. with CMake: CXXFLAGS=-DMYLIB_MACRO=1 cmake -S. -Bbuild With b2: b2 define=MYLIB_MACRO=1 There must be a reason projects prefer to use configuration options and not macros directly.

Rainer Deyke

10 Apr 10 Apr

11:05 a.m.

On 09.04.24 16:33, Дмитрий Архипов via Boost wrote:

...

вт, 9 апр. 2024 г. в 16:43, Rainer Deyke via Boost <boost@lists.boost.org>: first. Build systems aren't going anywhere, as well as compiled libraries.

...
I'm fine with one build system. The one I use to compile my own code. I'm not fine with the dozen or so build systems I currently have to use to build my dependencies.

This is already solved by package managers. They deal with dependencies' build systems for you and create "exports" for your build system.

I think you mean it /could/ be solved by a /hypothetical/ package manager. Because it's not solved by any actual package manager I know. I tried Conan. I ran into the following problems: - Even getting it to work with a trivial test program is a pain in the ass. - It's based on binary packages, which means very limited configurability for packages to avoid a combinatorial explosion of package files. - It does not come with all of libraries I use, or even a significant subset thereof. - The libraries it does host are often broken. - It wants to fetch packages from the internet, which means that build scripts aren't future proof, because stuff disappears from the internet all the time. To avoid this I would have to host my own repository, which is again a huge pain in the ass. - It tries to be "build system agnostic", which means I need still be fluent in a dozen or so build systems to fix things inevitably go wrong. I did not try any others, but that's because I was turned away from them before I even downloading them. I'm currently taking a second looks at build2. I haven't got very far, but I'm already starting to hate it.

...

...
You don't recompile the dependencies "all the time". You compile once per configuration and recompile when the dependency changes, same as any other source file. Your build system takes care of this.

But you do. I would think, most projects are built in CI these days. CI can get you pre-built binaries for your dependencies. But BMIs are not redistributable, as we've been told it is not even a goal. The best you can do in order to not rebuild every module for every job is CI runner cache.

That sounds like a problem with your build system. I do "from scratch" builds every so often. They take about a week to run. Of course, "from scratch" for me includes compiling the tools I use to build the tools, including the compiler.

...

...
This means that there is just one way of configuring a library, by defining configuration macros. No messing around with CMake variables or command-line arguments for configure scripts or whatever Meson or SCons use.

Nothing stops you from using macros even now. E.g. with CMake: CXXFLAGS=-DMYLIB_MACRO=1 cmake -S. -Bbuild

With b2: b2 define=MYLIB_MACRO=1

And a third syntax for SCons, and a fourth for autotools, and a fifth for Meson, and so on. The whole point is to avoid having to learn a dozen different build tools, each with their own command line syntax.

...

There must be a reason projects prefer to use configuration options and not macros directly.

The main reason is that people are stuck on the library binaries model (whether static or dynamic) - a model that has been obsolete since C++ first gained templates. -- Rainer Deyke (rainerd@eldwood.com)

Boris Kolpackov

5:08 p.m.

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...

I'm currently taking a second looks at build2. I haven't got very far, but I'm already starting to hate it.

I would be interested to hear why (probably best off-list since it will be off-topic here). Having said that, build2 is (unfortunately) complex and it also "reimagines" a lot of fundamentals compared to the existing build systems and package managers (especially compared to meta build systems like CMake). So many people find it too "alien" at first. I wish it could be done in a simpler way and based on the established intuition, but I don't know how, given the general brokenness of the mainstream approaches we have now. One thing I can say in build2's defense is that it works: what other C++ build toolchains can you use that would allow you to build an application that depends on, say, both Boost and Qt (and all their dependencies, recursively) with a single build system reliably and uniformly across all the major platforms and compiler (including Windows)?

Robert Ramey

5:27 p.m.

On 4/10/24 10:08 AM, Boris Kolpackov via Boost wrote:

...

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...

One thing I can say in build2's defense is that it works:

works -> can be made to work. I understand the vision for b2 and have been using it with boost for more than 20 years. It's complex - arguably to complex. It's still under constant development. I only update and rebuild it occasionally. But when I do, there's almost always some sort of issue which requires going to the list or slack/boost. Eventually I get this to work and it's quite satisfactory. Basically, the development process for b2 is not resulting in a reliable product. Addressing this is a job that is difficult and underrated. I would like to see CMake efforts culminate in a result which can replace b2. But that doesn't seem to be progressing either.

...

what otherC++ build toolchains can you use that would allow you to build an application that depends on, say, both Boost and Qt (and all their dependencies, recursively) with a single build system reliably and uniformly across all the major platforms and compiler (including Windows)?

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

5:37 p.m.

On 4/10/24 20:27, Robert Ramey via Boost wrote:

...

On 4/10/24 10:08 AM, Boris Kolpackov via Boost wrote:

...
Rainer Deyke via Boost <boost@lists.boost.org> writes:

...
One thing I can say in build2's defense is that it works:

works -> can be made to work.

I understand the vision for b2 and have been using it with boost for more than 20 years. It's complex - arguably to complex. It's still under constant development. I only update and rebuild it occasionally. But when I do, there's almost always some sort of issue which requires going to the list or slack/boost. Eventually I get this to work and it's quite satisfactory. Basically, the development process for b2 is not resulting in a reliable product.

Addressing this is a job that is difficult and underrated. I would like to see CMake efforts culminate in a result which can replace b2. But that doesn't seem to be progressing either.

I suspect you may be confusing build2 (https://build2.org/) and b2 (a.k.a. Boost.Build, https://www.boost.org/doc/libs/1_84_0/tools/build/doc/html/index.html). Those are different build systems, and I believe, Boris was talking of the former. Yes, the names are confusing.

René Ferdinand Rivera Morell

6:36 p.m.

As Andrey said.. You confused two different build systems. But I'm replying solely for B2. :-) On Wed, Apr 10, 2024 at 12:28 PM Robert Ramey via Boost <boost@lists.boost.org> wrote:

...

I understand the vision for b2 and have been using it with boost for more than 20 years. It's complex - arguably to complex.

It's actually way simpler than most build systems. Which might be a problem as it doesn't perhaps do as high level magic that users might expect.

...

It's still under constant development.

Yes, and that's good. But Boost only sees the bigger incremental changes from releases.

...

I only update and rebuild it occasionally.

Do you have a suggestion to encourage you to update to the up to date releases as they happen?

...

But when I do, there's almost always some sort of issue which requires going to the list or slack/boost. Eventually I get this to work and it's quite satisfactory. Basically, the development process for b2 is not resulting in a reliable product.

Do you have suggestions as to what to change in the development process?

...

Addressing this is a job that is difficult and underrated. I would like to see CMake efforts culminate in a result which can replace b2. But that doesn't seem to be progressing either.

Just because you don't superficially see the progress doesn't mean it's not there. I remember a certain organization making that same mistake of not thinking progress was being made. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net

Robert Ramey

7:19 p.m.

On 4/10/24 11:36 AM, René Ferdinand Rivera Morell via Boost wrote:

...

As Andrey said.. You confused two different build systems. But I'm replying solely for B2. :-)

I'm talking about the version which gets built when one invokes the "bootstrap" script.

...

Do you have a suggestion to encourage you to update to the up to date releases as they happen?

If there were some sort of versioning that might work. Suppose b2 boost 1.85 is all working. Suppose that b2 contains the check - "guaranteed to work on boost builds up to version 1.85". When b2 starts up, it checks the version of boost it's running on. If the current version of b2 can't work with the current version of boost, it emits an helpful error message and shuts down. This actually begs the main question: Why is b2 coupled to the version of Boost in the first place? I don't have time to argue this now. My current version of booststrap genereates a b2 executable which throws a seg fault at startup. Of course this should never occur. It suggests that b2 needs more tests before being released.

...

...
But when I do, there's almost always some sort of issue which requires going to the list or slack/boost. Eventually I get this to work and it's quite satisfactory. Basically, the development process for b2 is not resulting in a reliable product.

Do you have suggestions as to what to change in the development process?

Independent test suite for b2. Don't couple b2 with current version of Boost

...

Just because you don't superficially see the progress doesn't mean it's not there.

I can only rely upon what I see. Robert Ramey

René Ferdinand Rivera Morell

7:37 p.m.

On Wed, Apr 10, 2024 at 2:19 PM Robert Ramey via Boost <boost@lists.boost.org> wrote:

...

On 4/10/24 11:36 AM, René Ferdinand Rivera Morell via Boost wrote:

...
Do you have a suggestion to encourage you to update to the up to date releases as they happen?

If there were some sort of versioning that might work.

There is, as of 5.0. Which, 5.1, is going to ship with Boost 1.85.

...

Suppose b2 boost 1.85 is all working. Suppose that b2 contains the check - "guaranteed to work on boost builds up to version 1.85". When b2 starts up, it checks the version of boost it's running on. If the current version of b2 can't work with the current version of boost, it emits an helpful error message and shuts down.

That's possible. We just need to add "require-b2 5.1 ;" to boost-root/Jamroot (or any other jamfile). Which this PR (https://github.com/boostorg/boost/pull/854/files#diff-ea2703244a4dec1e6f7bfe...) will do once I get the final testing on that done and it gets merged.

...

This actually begs the main question: Why is b2 coupled to the version of Boost in the first place? I don't have time to argue this now.

It's not. I test every change to B2 with Boost versions from 1.66 onward (although I need to add 1.84 and 1.85 when that's released) plus develop and master branches on Windows, Linux, and macOS. For example <https://dev.azure.com/bfgroup/B2/_build/results?buildId=1157&view=logs&s=b4a2ddcd-11cb-5970-1bbf-f636351df511&j=3ad4e34e-ab49-518a-e310-16a369583b97>. Which means you can use the current B2 with every version of Boost released in the past 7 years.

...

My current version of booststrap genereates a b2 executable which throws a seg fault at startup. Of course this should never occur. It suggests that b2 needs more tests before being released.

Please, I implore you, submit a bug with repro steps to <https://github.com/bfgroup/b2/issues>.

...

...
...
But when I do, there's almost always some sort of issue which requires going to the list or slack/boost. Eventually I get this to work and it's quite satisfactory. Basically, the development process for b2 is not resulting in a reliable product.

Do you have suggestions as to what to change in the development process?

Independent test suite for b2.

B2 has had a rather large test suite for close to 2 decades. But testing build systems is challenging. :-(

...

Don't couple b2 with current version of Boost

Already the case. But maybe you mean something other than providing backward compat? -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net

Rainer Deyke

8:11 p.m.

On 10.04.24 19:08, Boris Kolpackov via Boost wrote:

...

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...
I'm currently taking a second looks at build2. I haven't got very far, but I'm already starting to hate it.

I would be interested to hear why (probably best off-list since it will be off-topic here).

I haven't delved deep into Build2 yet (as stated above), so I could be misinterpreting things. I also haven't given up on it yet. That said: - bdep creates a <project_name>/<project_name> hierarchy. I assume the outer <project_name> directory is the package and the inner <project_name> is the project within the package. But wait, there's one more level! I can't put my build directories in the inner /or/ the outer <project_name>, so in practice I've got three levels: <project_name>/<project_name>/<project_name>. - I need to specify the name for each build configuration twice, once for the @name and once for the directory. - It wants me to use git. I strongly prefer fossil as my version control system. - Build configurations appear to be stored in a sqlite database, not in a readable and editable text file. -- Rainer Deyke (rainerd@eldwood.com)

Dominique Devienne

11 Apr 11 Apr

8:49 a.m.

On Wed, Apr 10, 2024 at 10:11 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:

...

- Build configurations appear to be stored in a sqlite database, not in a readable and editable text file.

Some would argue that's a positive. There are tons of GUI SQLite "viewers". A structured normalized data-model for build configuration sounds good to me (haven't looked at this particular one though) If you want text, you can also "dump" the SQL or CSV of the tables using a sqlite3 1-liner, or used sqlite3 to SELECT the subset you want. But I get the resistance to non-text too. Perhaps Boris has an alternate text-form? --DD

...

From https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets <https://www.loc.gov/preservation/resources/rfs/data.html>

Dominique Devienne

8:51 a.m.

On Thu, Apr 11, 2024 at 10:49 AM Dominique Devienne <ddevienne@gmail.com> wrote:

...

On Wed, Apr 10, 2024 at 10:11 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:

...
- Build configurations appear to be stored in a sqlite database, not in a readable and editable text file.

Some would argue that's a positive. There are tons of GUI SQLite "viewers". A structured normalized data-model for build configuration sounds good to me (haven't looked at this particular one though) If you want text, you can also "dump" the SQL or CSV of the tables using a sqlite3 1-liner, or used sqlite3 to SELECT the subset you want. But I get the resistance to non-text too. Perhaps Boris has an alternate text-form? --DD

From https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml

...
The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets <https://www.loc.gov/preservation/resources/rfs/data.html>

Also, it's a bit ironic you complain about SQLite use, when your SCM of choice, Fossil, *is* SQLite-based, from the SQLite author no less! :) --DD

Rainer Deyke

9:21 a.m.

On 11.04.24 10:51, Dominique Devienne via Boost wrote:

...

On Thu, Apr 11, 2024 at 10:49 AM Dominique Devienne <ddevienne@gmail.com> wrote:

...
On Wed, Apr 10, 2024 at 10:11 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:

...
- Build configurations appear to be stored in a sqlite database, not in a readable and editable text file.

Some would argue that's a positive. There are tons of GUI SQLite "viewers". A structured normalized data-model for build configuration sounds good to me (haven't looked at this particular one though) If you want text, you can also "dump" the SQL or CSV of the tables using a sqlite3 1-liner, or used sqlite3 to SELECT the subset you want. But I get the resistance to non-text too. Perhaps Boris has an alternate text-form? --DD

From https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml

...
The Library of Congress Recommended Formats Statement (RFS) includes SQLite as a preferred format for datasets <https://www.loc.gov/preservation/resources/rfs/data.html>

Also, it's a bit ironic you complain about SQLite use, when your SCM of choice, Fossil, *is* SQLite-based, from the SQLite author no less! :) --DD

The difference is that I don't want to hand-edit my version history, but I do want to hand-edit my build configurations. Going through the command line for the latter seems like an unnecessary redirection when it comes to copying build configurations from project to project. (Imagine a dozen projects with a dozen build configurations each that all need an additional command line parameter because a common dependency has changed its requirements.) I have nothing against sqlite per se. It's a great tool. I just don't think it's the right tool in this specific case. -- Rainer Deyke (rainerd@eldwood.com)

Boris Kolpackov

9:44 a.m.

[I am keeping the list CC'ed seeing that there is interest in this subject. But let me know if anyone feels this is veering too much off-topic.] Rainer Deyke via Boost <boost@lists.boost.org> writes:

...

I haven't delved deep into Build2 yet (as stated above), so I could be misinterpreting things. I also haven't given up on it yet.

Glad to hear. There are a few mis-conceptions below (which, I accept, are in large part due to the complexity and "we often do things differently"):

...

- bdep creates a <project_name>/<project_name> hierarchy. I assume the outer <project_name> directory is the package and the inner <project_name> is the project within the package. But wait, there's one more level! I can't put my build directories in the inner /or/ the outer <project_name>, so in practice I've got three levels: <project_name>/<project_name>/<project_name>.

That's the default structure and there is rationale[1] for it, though as most things in this area, it's a trade-off and not everyone agrees with the choices we've made. The good news is that the layout can be customized to generate pretty much any layout you want, split (include/src) or combined, with nested subdirectories or without, etc. We have a section in the documentation that shows how to create over a dozen layouts, including the one used in Boost: https://build2.org/bdep/doc/bdep-new.xhtml#src-layout

...

- I need to specify the name for each build configuration twice, once for the @name and once for the directory.

Hm, not sure where you got this impression. The only time you need to specify both is when creating the configuration. Even in this case the directory is actually optional and if you don't specify it, you will get ../<project>-<name> by default. Here is a typical "create a throw-away project to quickly test something" workflow: $ bdep new hello $ cd hello $ bdep init -C @gcc cc config.cxx=g++ $ bdep init -C @clang cc config.cxx=clang++ $ bdep update @gcc $ bdep update @clang $ ls -1 ../ hello hello-gcc hello-clang

...

- It wants me to use git. I strongly prefer fossil as my version control system.

You can use build2 without git. You will loose quite a bit of "nice to have" functionality, mostly in bdep, but it's doable. Even without git, you will still be able to use bdep-ci and bdep-publish with a bit of effort. We have packages where upstream doesn't use version control at all (or at least it's not publicly visible; for example byacc[2]).

...

- Build configurations appear to be stored in a sqlite database, not in a readable and editable text file.

It's a bit more nuanced that that: some information (usually the one you don't want to edit manually) is stored in the SQLite database, while things like compiler, options, etc that you have configured are stored in plain text files that you are welcome to edit. Continuing with the above example: $ cat ../hello-clang/build/config.build config.cxx = clang++ config.cxx.poptions = [null] config.cxx.coptions = [null] config.cxx.loptions = [null] config.cxx.aoptions = [null] config.cxx.libs = [null] ... You can edit this file and, for example, change config.cxx.coptions to: config.cxx.coptions = -Wall -Wextra -Werror [1] https://build2.org/build2-toolchain/doc/build2-toolchain-intro.xhtml#proj-st... [2] https://github.com/build2-packaging/byacc/

Rainer Deyke

12:57 p.m.

On 11.04.24 11:44, Boris Kolpackov via Boost wrote:

...

...
- bdep creates a <project_name>/<project_name> hierarchy. I assume the outer <project_name> directory is the package and the inner <project_name> is the project within the package. But wait, there's one more level! I can't put my build directories in the inner /or/ the outer <project_name>, so in practice I've got three levels: <project_name>/<project_name>/<project_name>.

That's the default structure and there is rationale[1] for it, though as most things in this area, it's a trade-off and not everyone agrees with the choices we've made.

I have been using the canonical layout for the purpose of following along with the Getting Started Guide. I realize that some aspects of this cane be changed, but others seemingly can't. This is the layout I want: <project_name> + builds <- not in version control + configuration1 + configuration2 + source + <super_project_name> + <project_name> + <project>.hpp + <project>.cpp + <build_configurations_files> <- in version control I find the use of <super_project_name> useful to separate internal libraries and external libraries, and to prevent namespace pollution. I use this include style: #include "super_project_name/project_name/project.hpp" Double quotes instead of angle brackets because the file I'm including is not from the standard library, but always with both super_project and project names. The important bits here that Build2 doesn't seem to like: - Builds go under the main project directory. One project, one top-level directory (with as many subdirectories as needed). - Build configurations are versioned.

...

...
- I need to specify the name for each build configuration twice, once for the @name and once for the directory.

Hm, not sure where you got this impression. The only time you need to specify both is when creating the configuration. Even in this case the directory is actually optional and if you don't specify it, you will get ../<project>-<name> by default.

Yeah, I really don't like these <project>-<configuration> directories at the same level as the main project. Hence the need to manually specify a different directory.

...

...
- It wants me to use git. I strongly prefer fossil as my version control system.

You can use build2 without git. You will loose quite a bit of "nice to have" functionality, mostly in bdep, but it's doable.

I know I can, but I don't like losing "nice to have" functionality. Especially when I'm just getting started with Build2 and I don't know what that functionality is or how nice it is.

...

...
- Build configurations appear to be stored in a sqlite database, not in a readable and editable text file.

It's a bit more nuanced that that: some information (usually the one you don't want to edit manually) is stored in the SQLite database, while things like compiler, options, etc that you have configured are stored in plain text files that you are welcome to edit. > Continuing with the above example:

$ cat ../hello-clang/build/config.build

That's a relief, but the configuration options are stored with the build artifacts? That's awkward if I want the configuration options in version control while keeping the build artifacts outside. -- Rainer Deyke (rainerd@eldwood.com)

Boris Kolpackov

5:03 p.m.

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...

This is the layout I want:

<project_name> + builds <- not in version control + configuration1 + configuration2 + source + <super_project_name> + <project_name> + <project>.hpp + <project>.cpp + <build_configurations_files> <- in version control

So similar to the Boost layout of individual libraries but combined instead of split: foo/ └── libs/ └── boost/ └── foo/ ├── foo.cpp └── foo.hpp

...

I find the use of <super_project_name> useful to separate internal libraries and external libraries, and to prevent namespace pollution. I use this include style:

#include "super_project_name/project_name/project.hpp"

Ok, so this in the Boost analogy: #include "boost/foo/foo.hpp"

...

Double quotes instead of angle brackets because the file I'm including is not from the standard library, but always with both super_project and project names.

Nothing prevents you from using <>-style inclusion for non-standard headers, but ok, every C++ developer has their unique style ;-).

...

The important bits here that Build2 doesn't seem to like: - Builds go under the main project directory. One project, one top-level directory (with as many subdirectories as needed).

Yes, one of the somewhat "hard" rules in build2 is that you either build in source, or out of source, not kind-of-out-of-source (i.e., with the builds/ directory inside your project repository). The main user-facing reason for this is that build2 is multirepo-first, meaning that we assume the user will work with multiple repositories simultaneously and often it will be convenient to share build configurations between them. As an example, consider another repository bar that you develop simultaneously and that depends on foo (pretty much the arrangement of the Boost libraries on GitHub before they are amalgamated). While you don't have to, it's normally convenient to initialize them in a shared build configuration. If your build configuration directories are inside foo/ and bar/, then it becomes asymmetrical and awkward. Having said that, there is an escape hatch if you really want to keep you build directory inside your repository: don't make your repository root a build2 project/package root, instead pushing it one directory level deeper (this is also how we support multi- package repositories). While it's hard to recreate your structure exactly (because of another "hard" rule in build2, which is that a project/package root directory must be the same as the project/package name, so it cannot be source/ or libs/), but you can get pretty close: foo/ ├── builds/ └── libboost-foo/ └── boost/ └── foo/ ├── foo.cpp └── foo.hpp The bdep-new commands that create this would be: $ bdep new --type empty foo $ cd foo $ bdep new --package --lang c++,cpp --type lib,subdir=boost/foo libboost-foo $ tree foo foo/ ├── libboost-foo/ │ ├── boost/ │ │ └── foo/ │ │ ├── boost-foo.cpp │ │ ├── boost-foo.hpp │ │ ├── buildfile │ │ ├── export.hpp │ │ └── version.hpp.in │ ├── build/ │ │ ├── bootstrap.build │ │ ├── export.build │ │ └── root.build │ ├── tests/ │ │ ├── basics/ │ │ │ ├── buildfile │ │ │ └── driver.cpp │ │ ├── build/ │ │ │ ├── bootstrap.build │ │ │ └── root.build │ │ └── buildfile │ ├── buildfile │ ├── manifest │ └── README.md ├── buildfile ├── packages.manifest ├── README.md └── repositories.manifest

...

- Build configurations are versioned.

That's a relief, but the configuration options are stored with the build artifacts? That's awkward if I want the configuration options in version control while keeping the build artifacts outside.

There is a mechanism for that: you can save some pre-canned configurations in the version control and then load them when creating the build configurations. Continuing with the above example: $ cat <<EOF >with-warnings.build config.version = 1 config.cxx.coptions += -Wall -Wextra -Werror EOF $ bdep init -C builds/gcc-with-warnings @gccW cc config.cxx=g++ \ config.config.load=./with-warnings.build For background on config.config.load (there is also .save), see: https://build2.org/release/0.12.0.xhtml#config-export-import

Rainer Deyke

12 Apr 12 Apr

10:49 a.m.

On 11.04.24 19:03, Boris Kolpackov via Boost wrote:

...

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...
Double quotes instead of angle brackets because the file I'm including is not from the standard library, but always with both super_project and project names.

Nothing prevents you from using <>-style inclusion for non-standard headers, but ok, every C++ developer has their unique style ;-).

I am well aware that common practice uses <>-style includes for non-standard headers. I just don't like to think of extension-less standard headers like <vector> as actual physical files on my disk, so I mentally separate standard headers from actual physical header files with a .hpp/.hxx/.h extension. (And in fact, I don't think the standard headers are /required/ to be actual physical files per the C++ standard, although in practice they almost always are.)

...

...
The important bits here that Build2 doesn't seem to like: - Builds go under the main project directory. One project, one top-level directory (with as many subdirectories as needed).

Yes, one of the somewhat "hard" rules in build2 is that you either build in source, or out of source, not kind-of-out-of-source (i.e., with the builds/ directory inside your project repository).

The main user-facing reason for this is that build2 is multirepo-first, meaning that we assume the user will work with multiple repositories simultaneously and often it will be convenient to share build configurations between them. As an example, consider another repository bar that you develop simultaneously and that depends on foo (pretty much the arrangement of the Boost libraries on GitHub before they are amalgamated). While you don't have to, it's normally convenient to initialize them in a shared build configuration. If your build configuration directories are inside foo/ and bar/, then it becomes asymmetrical and awkward.

Wait, build configuration directories can be shared between multiple repositories? The canonical <project_name>-<configuration_name> directory names led me to believe that each configuration build directory is tied to both a specific project and a specific configuration. Of course, it's somewhat common for me to have multiple checkouts of the same project side-by-side, and it would not be safe for them to share build directories.

...

Having said that, there is an escape hatch if you really want to keep you build directory inside your repository: don't make your repository root a build2 project/package root, instead pushing it one directory level deeper (this is also how we support multi- package repositories).

Yes, that's what I was alluding to in an earlier post about a <project_name>/<project_name>/<project_name> directory structure. It works, it's just hard to keep all those same-named directories straight. Especially when using a file browser that only displays the name of the current directory, not the full path.

...

While it's hard to recreate your structure exactly (because of another "hard" rule in build2, which is that a project/package root directory must be the same as the project/package name, so it cannot be source/ or libs/), but you can get pretty close:

That's good to know.

...

$ bdep new --type empty foo $ cd foo $ bdep new --package --lang c++,cpp --type lib,subdir=boost/foo libboost-foo

...

...
That's a relief, but the configuration options are stored with the build artifacts? That's awkward if I want the configuration options in version control while keeping the build artifacts outside.

There is a mechanism for that: you can save some pre-canned configurations in the version control and then load them when creating the build configurations.

Also good to know. -- Rainer Deyke (rainerd@eldwood.com)

Boris Kolpackov

11:20 a.m.

Rainer Deyke via Boost <boost@lists.boost.org> writes:

...

Wait, build configuration directories can be shared between multiple repositories? The canonical <project_name>-<configuration_name> directory names led me to believe that each configuration build directory is tied to both a specific project and a specific configuration.

Yes, that's the sensible default for the simple cases but which you can change if you know ahead of time you will (likely) be working on several repositories at once. For my own work I usually have the builds/ subdirectory next to the repositories I am working on and inside I have a bunch of build configuration directories like builds/gcc13/ and builds/clang18/, etc.

...

Of course, it's somewhat common for me to have multiple checkouts of the same project side-by-side, and it would not be safe for them to share build directories.

They cannot share it (you cannot have the same project name initialized multiple times in the same build configuration, naturally) but you can switch from one checkout to another in the same configuration with relative ease. This becomes especially handy if you have a large number of dependencies. (Though we also have the notion of linked configuration which can be used to address this issue in another way).

...

Yes, that's what I was alluding to in an earlier post about a <project_name>/<project_name>/<project_name> directory structure. It works, it's just hard to keep all those same-named directories straight.

Yes, that's true. The header inclusion scheme like "boost/foo/foo.hpp" is forcing us into these deeply nested and repetitive hierarchies. But we should be able to do better with modules.

John Maddock

9 Apr 9 Apr

3:37 p.m.

On 09/04/2024 13:31, Rainer Deyke via Boost wrote:

...

On 08.04.24 20:29, Robert Ramey via Boost wrote:

...
The more I read this thread, the more it seems to me that modules are just a bad idea. We already have shared libraries which are redistributable and that's already a hassle given all the compiler switches. shared libraries have the same issue in that if one only want's to use one function, the whole library has to be shipped.

This does not follow. Shared libraries have problems, therefore modules (which /solve/ many of these problems) are a bad idea? What?

I'm picturing a future where modules (in source code form) become /the/ way to distribute C++ libraries. Advantages: - No more messing with build systems for the library author. Just ship the source code. I hear you. Regex has been "just a bunch of source files" since day one. It's surprising how many people are unable to cope with that and need a "build and install" to hold their hand. Just saying...

Ruben Perez

8 Apr 8 Apr

7:30 p.m.

...

It would be interesting to see the benchmark numbers for a larger number of CPU cores (e.g. 16). I can see in the table that up to 3 TUs build time with modules is 25-40% higher than with headers and the situation significantly changes for 4 and more TUs. You were using 3 cores for compilation, and I wonder if this is related.

It is definitely related. I ran the benchmark like this intentionally, so the effects of parallelism could be seen without having to run a benchmark with 20TUs. You can expect the modules build to be slower at 7TUs with 16 cores. I will double check shortly that this is the case, though.

...

One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?

I haven't explored this yet as to have a clear mental model here.

...

Boris Kolpackov

9 Apr 9 Apr

2:33 p.m.

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...

It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead.

I actually don't think having a single/only `boost` module would be the right way to modularize Boost. I would suggest having a module per library (at least; maybe even more granular, say for Spirit, which is actually three libraries in one). And a single `boost` module that re-exports them all. Users who want all of Boost, can import `boost`, those like you who want to carefully manage their dependencies can import more granular modules. And, at least in build2, we only build BMIs that are actually used (imported). (This brings an interesting question: if I import `boost`, but only use a small subset of libraries, how do I know which ones I should be linking. Trial and error until there are no more unresolved symbols feels a bit stone age.)

...

One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library?

Yes, from the library perspective, module interfaces are pretty similar to headers: when building the library, the object files produced when compiling module interfaces are linked into the library along with other TU object files. The interfaces are shipped/installed with the library and then compiled by the library consumers.

...

How should it "export" the symbols that are already exported from the compiled library then?

Modules actually make authoring portable shared libraries almost sane. Specifically, with modules, you only need the dllexport part of the dllexport/dllimport dance (and yes, this works for variables, not only functions). That is, you need to compile the BMI for a shared library with dllexport and then, when this BMI is used in import, dllimport happens auto-magically. Which means that this can all be taken care of by the build system without you having to provide the export header (or equivalent) that arranges for the dllexport/dllimport dance. For example, in build2 we have implemented the __symexport keyword-like macro (defined automatically by the build system) which you use like so: export namespace n { __symexport void f (); class __symexport C { ... }; } (At first it seems bizarre that you have to ask to export things twice. But if you meditate on this a bit and also consider that module export deals with C++ names while symbol export deals with symbols, and the mapping is by no means one-to-one, you will likely conclude that this double-export approach is probably for the best.)

René Ferdinand Rivera Morell

2:40 p.m.

On Tue, Apr 9, 2024 at 9:33 AM Boris Kolpackov via Boost <boost@lists.boost.org> wrote:

...

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...
It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead.

I actually don't think having a single/only `boost` module would be the right way to modularize Boost. I would suggest having a module per library (at least; maybe even more granular, say for Spirit, which is actually three libraries in one).

Yes, definitely a module per library. I would avoid the sub-modules. Probably not worth the pain and it would be more confusing to users.

...

And a single `boost` module that re-exports them all. Users who want all of Boost, can import `boost`, those like you who want to carefully manage their dependencies can import more granular modules. And, at least in build2, we only build BMIs that are actually used (imported).

I would strongly discourage a singular `boost` module. There are too many ways to subset boost that would cause confusion as to what `import boost;` means.

...

(This brings an interesting question: if I import `boost`, but only use a small subset of libraries, how do I know which ones I should be linking. Trial and error until there are no more unresolved symbols feels a bit stone age.)

Yeah, let's not corner our way into such issues. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supone Nada -- Robot Dreams - http://robot-dreams.net

Ruben Perez

3:20 p.m.

...

Yes, definitely a module per library. I would avoid the sub-modules. Probably not worth the pain and it would be more confusing to users.

Agree. The only point where they may be necessary is for peer dependencies. For instance, Boost Asio has a peer dependency on OpenSSL, but there are plenty of uses that don't require it (it's not even in the convenience header). I'd advise against the global "boost" module. I don't think it adds anything when you can "import boost.asio" and go.

Peter Dimov

3:26 p.m.

Ruben Perez wrote:

...

I'd advise against the global "boost" module. I don't think it adds anything when you can "import boost.asio" and go.

I suspect that if we provide both per-library and `boost` modules, most people will end up using the latter. That's kind of like the monolithic vs modular debacle, modular is aesthetically more pleasant, but monolithic works and is less hassle.

Boris Kolpackov

5:50 p.m.

Peter Dimov via Boost <boost@lists.boost.org> writes:

...

That's kind of like the monolithic vs modular debacle, modular is aesthetically more pleasant, but monolithic works and is less hassle.

Modular works just fine if you have proper tooling. We've had Boost split into packages at library granularity in build2 for a couple of years now and I've not once heard from our users that it's a hassle or that they wish it was a single package.

John Maddock

3:43 p.m.

On 09/04/2024 15:33, Boris Kolpackov via Boost wrote:

...

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...
It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead. I actually don't think having a single/only `boost` module would be the right way to modularize Boost. I would suggest having a module per library (at least; maybe even more granular, say for Spirit, which is actually three libraries in one). And a single `boost` module that re-exports them all. Users who want all of Boost, can import `boost`, those like you who want to carefully manage their dependencies can import more granular modules. And, at least in build2, we only build BMIs that are actually used (imported).

(This brings an interesting question: if I import `boost`, but only use a small subset of libraries, how do I know which ones I should be linking. Trial and error until there are no more unresolved symbols feels a bit stone age.)

Right, the currently experimental Boost.Regex module support has a hard dependency on ICU if the latter is installed on your system, irrespective of whether you're actually using that feature or not. Stone age indeed. I probably need to split it into 2 modules just because of that. But that leaves you scrabbling around trying to figure out which sources you need to link against and which not... and that's just for one Boost library! There's still a lot left to figure out here... John.

Boris Kolpackov

6:27 p.m.

John Maddock via Boost <boost@lists.boost.org> writes:

...

Right, the currently experimental Boost.Regex module support has a hard dependency on ICU if the latter is installed on your system, irrespective of whether you're actually using that feature or not. Stone age indeed. I probably need to split it into 2 modules just because of that. But that leaves you scrabbling around trying to figure out which sources you need to link against and which not... and that's just for one Boost library! There's still a lot left to figure out here...

The "state of the art" solution for this (e.g., in Rust, build2) is to use the package manager to specify not only the version constraint of your dependencies but also the desired features (in build2 we call it "dependency configuration"). So in your case, projects that wish to use Boost.Regex with ICU support would request that feature and based on that the Boost.Regex's build system will decide whether to link ICU, etc. This is what it looks like in build2: The feature (called "configuration variable" in build2): https://github.com/build2-packaging/boost/blob/master/libboost-regex/build/r... Conditional dependency on ICU packages: https://github.com/build2-packaging/boost/blob/master/libboost-regex/manifes... Conditional importation and linking of ICU libraries: https://github.com/build2-packaging/boost/blob/master/libboost-regex/include...

Andrey Semashev

3:43 p.m.

On 4/9/24 17:33, Boris Kolpackov wrote:

...

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...
It doesn't make sense to have to build the whole Boost into a module only to pull a small part from it. I would much rather include the headers I want instead.

I actually don't think having a single/only `boost` module would be the right way to modularize Boost. I would suggest having a module per library (at least; maybe even more granular, say for Spirit, which is actually three libraries in one). And a single `boost` module that re-exports them all. Users who want all of Boost, can import `boost`, those like you who want to carefully manage their dependencies can import more granular modules. And, at least in build2, we only build BMIs that are actually used (imported).

(This brings an interesting question: if I import `boost`, but only use a small subset of libraries, how do I know which ones I should be linking. Trial and error until there are no more unresolved symbols feels a bit stone age.)

You link every library from Boost. Preferably, with -Wl,--as-needed. But I agree that having a module for the entire Boost doesn't make sense.

...

...
One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library?

Yes, from the library perspective, module interfaces are pretty similar to headers: when building the library, the object files produced when compiling module interfaces are linked into the library along with other TU object files. The interfaces are shipped/installed with the library and then compiled by the library consumers.

...
How should it "export" the symbols that are already exported from the compiled library then?

Modules actually make authoring portable shared libraries almost sane. Specifically, with modules, you only need the dllexport part of the dllexport/dllimport dance (and yes, this works for variables, not only functions). That is, you need to compile the BMI for a shared library with dllexport and then, when this BMI is used in import, dllimport happens auto-magically.

Which means that this can all be taken care of by the build system without you having to provide the export header (or equivalent) that arranges for the dllexport/dllimport dance. For example, in build2 we have implemented the __symexport keyword-like macro (defined automatically by the build system) which you use like so:

export namespace n { __symexport void f ();

class __symexport C { ... }; }

Thanks for the example. So, if I understood correctly, this is the same as what we do now: define a macro for symbol markup that expands to either dllexport or dllimport, depending on whether the library is being compiled or consumed. I'm assuming this is ok to have `f()` and members of `C` not defined in the module? The "attachment" thing that was mentioned before made it sound like it may be problematic. Also, will this work if the compiled library itself is built without modules?

Boris Kolpackov

6:07 p.m.

Andrey Semashev via Boost <boost@lists.boost.org> writes:

...

On 4/9/24 17:33, Boris Kolpackov wrote:

...
export namespace n { __symexport void f ();

class __symexport C { ... }; }

Thanks for the example. So, if I understood correctly, this is the same as what we do now: define a macro for symbol markup that expands to either dllexport or dllimport, depending on whether the library is being compiled or consumed.

It is the same mechanism (a macro) but you only need to ever define it to dllexport, never to dllimport. This allows a build system to define this macro automatically. In the above example, you as the library author never need to worry about defining __symexport or any other macros that would ordinarily be defined during the build to distinguish between static/shared and building/consuming. In other words, you can treat __symexport as a pseudo-keyword for exporting symbols from shared libraries that "just works".

...

I'm assuming this is ok to have `f()` and members of `C` not defined in the module?

Yes, that doesn't matter. Here is an example: https://github.com/build2/cxx20-modules-examples/blob/named-only-import-std/... https://github.com/build2/cxx20-modules-examples/blob/named-only-import-std/... This works with both MSVC and Clang on Windows.

...

Also, will this work if the compiled library itself is built without modules?

I haven't personally tried to dllexport a using-declaration but I assume that's how the MSVC folks modularized their standard library so presumably it works, at least in MSVC. Maybe Stephan can confirm?

Matt Borland

6:35 a.m.

...

Nevertheless, I'd like to know everyone's opinion on the subject. If we find a way to overcome the technical challenges that I manifest in the article, do you think a set of non-intrusive "module bindings" allowing users to consume Boost as a module could add any value to Boost?

Absolutely it would add value to Boost and to the C++ ecosystem as a whole. We have Steve, Dani, and others who write and implement modules encouraging us to do this while answering our loads of questions. This would help put Boost back on the cutting edge as a technical leader, but it would also be a symbiotic relationship with the aforementioned writers and implementers. Now that I know how it works adding native module support via macros is not overly burdensome: https://github.com/cppalliance/decimal/pull/484. You'll find a few big takeaways in there: 1) A macro to make static constexpr variables inline constexpr 2) A macro to export the functions and classes you want to export 3) A macro to #ifdef out all the standard library headers like Steve talked about I am more than happy to help other authors/maintainers who want to pursue this. I have created: https://github.com/cppalliance/boost2 to explore how we can build and install boost library modules. The end result is hopefully a working demonstration of an import boost2 which contains multiple library modules to ease the concerns of viability. It doesn't contain anything interesting yet, but I will come back with reports of success or failure. Matt

457

Age (days ago)

461

Last active (days ago)

List overview

Download

62 comments

14 participants

participants (14)

Andrey Semashev
Boris Kolpackov
Daniela Engert
Dominique Devienne
John Maddock
Matt Borland
Peter Dimov
Rainer Deyke
René Ferdinand Rivera Morell
Robert Ramey
Ruben Perez
Stephan T. Lavavej
Vinnie Falco
Дмитрий Архипов