[Boost.Serialization] Help with PR fixing a memory leak in the current version
Hi,

TLDR: I would like https://github.com/boostorg/serialization/pull/105 to be merged for the next Boost release to fix a memory leak, and need someone to review this or help getting it merge-ready.

Details: Some time ago I analysed a crash introduced in Boost 1.65.0 in the serialization part. The details are quite complicated, as the cause is static initialization/destruction order in shared libraries, which may even depend on the compiler (could not confirm) or the standard library (ditto; observed in all tested versions). It boils down to a violation of the expected destruction order in the case of shared libraries. I observed how two global instances of the same type, but from different shared libs, are destroyed together although the second should not yet be, which causes a use-after-free from another global instance. The singletons involved did have methods to detect whether they are active or destroyed, but these were effectively optimized away in Boost 1.65, leading to a crash which makes the whole library unusable. Not understanding the root cause of the crash led to changing the singleton to use dynamic allocation without ever freeing it, which is the memory leak that is currently in Boost. The current state of the develop branch changed this back to the crash situation we had before, making it unsuitable for the next release.

Another pair of eyes would be great to check the PR and get this finally fixed.

Thanks, Alex
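For illustration, a minimal sketch of the failure mode (invented names, not the Boost code): a global object whose destructor touches another singleton-like global. Within one program, objects with static storage duration are destroyed in the reverse order of their construction; once the globals live in different shared libraries, that ordering is no longer guaranteed and the destructor's access can become a use-after-free.

    #include <iostream>
    #include <map>

    struct registry {                  // stands in for the global type registry
        std::map<int, int> entries;
        static registry & instance() {
            static registry r;         // constructed on first use
            return r;
        }
    };

    struct user {                      // stands in for a serializer that unregisters itself
        user()  { registry::instance().entries[1] = 1; }
        ~user() { registry::instance().entries.erase(1); }  // use-after-free if the
                                                            // registry is already gone
    };

    user global_user;  // in a shared library this may be destroyed after the registry

    int main() { std::cout << "done\n"; }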
On 6/25/18 12:35 AM, Alexander Grund via Boost wrote:
Hi,
TLDR: I would like https://github.com/boostorg/serialization/pull/105 to be merged for the next Boost release to fix a memory leak and need someone to review this or help getting it merge-ready.
Details: Some time ago I analysed a crash introduced in Boost 1.65.0 in the serialization part. The details are quite complicated, as the cause is static initialization/destruction order in shared libraries, which may even depend on the compiler (could not confirm) or the standard library (ditto; observed in all tested versions). It boils down to a violation of the expected destruction order in the case of shared libraries. I observed how two global instances of the same type, but from different shared libs, are destroyed together although the second should not yet be, which causes a use-after-free from another global instance. The singletons involved did have methods to detect whether they are active or destroyed, but these were effectively optimized away in Boost 1.65, leading to a crash which makes the whole library unusable. Not understanding the root cause of the crash led to changing the singleton to use dynamic allocation without ever freeing it, which is the memory leak that is currently in Boost. The current state of the develop branch changed this back to the crash situation we had before, making it unsuitable for the next release.
Another pair of eyes would be great to check the PR and get this finally fixed.
Thanks, Alex
I much appreciate your willingness to invest effort in this issue. But my analysis and conclusions regarding the issue are totally different.

Motivated by your efforts I added new tests for the singleton and spent time enhancing the tests for dll. One test was just too complicated for me to make work with b2, so I suppressed it. Only one test is failing on some platforms - test_export. This is related to DLLs. I added windows + mingw and msvc++ to my machine just to try to replicate the failure in my own environment. I have been unable to do so. I believe that this is due to some issue in libc++ or libstdc++ related to some combination of circumstances involving dynamic loading and possibly visibility. Both of these areas are undefined by the standard, so it's natural that there might be ambiguities here.

Late last year I tried to address this dll/export problem by merging in a fix. This fix entailed dynamically allocating a data structure in the singleton. This fixed the failing test but, unbeknownst to me, resulted in the memory leak. At this point there are two paths: plow forward and make the code ever more complex to navigate around the circumstances which make the test fail, or step back and back out the fix which resulted in the memory leak. I've chosen the latter.

I realize you've invested a lot of effort and it's disheartening not to have it appreciated. Don't think this way. It IS appreciated. The serialization library is still going strong only because persons such as yourself have made these sorts of efforts. I incorporate a significant portion of PRs submitted.

The way forward is to add some more tests which isolate the problem. I'm aware that you've submitted one test which fails. It seems like it's on the right track. But (IIRC) it dynamically loads/unloads DLLs containing user-level serialization code in the same sequence. I think it's reasonable to insist that if a user is going to do that, such DLLs be unloaded in reverse order. I admit that setting up these sorts of tests is difficult and time consuming in any build environment. But that's where I am.

Robert Ramey
Hi,

I much appreciate your willingness to invest effort in this issue. But my analysis and conclusions regarding the issue are totally different. Motivated by your efforts I added new tests for the singleton and spent time enhancing the tests for dll. One test was just too complicated for me to make work with b2, so I suppressed it. Only one test is failing on some platforms - test_export. This is related to DLLs. I added windows + mingw and msvc++ to my machine just to try to replicate the failure in my own environment. I have been unable to do so.

I do have a (Makefile) test which does show the behaviour described. I'm pretty sure, if executed on another machine, it will fail too. I tried to add it to the b2 test suite but failed due to having static boost in dynamic libraries (which is the setup in which I found it crashes for sure). And there is one test case currently in the Boost.Serialization test suite that DOES fail, and I'm pretty sure the issue described is the reason, but of course I cannot really confirm this, as the root cause is undefined behaviour (or ambiguities with shared libs).

Late last year I tried to address this dll/export problem by merging in a fix. This fix entailed dynamically allocating a data structure in the singleton. This fixed the failing test but, unbeknownst to me, resulted in the memory leak.

This is what I tried to describe too in the issue and MR: The leak is simply due to no one freeing the dynamically allocated memory. This is even deterministic and easily reproducible. I think that oversight might be caused by you being so used to the code that you overlook things that might be visible to fresh eyes. This is why I would like to have someone else comment on the PR too.

The way forward is to add some more tests which isolate the problem. I'm aware that you've submitted one test which fails. It seems like it's on the right track. But (IIRC) it dynamically loads/unloads DLLs containing user-level serialization code in the same sequence. I think it's reasonable to insist that if a user is going to do that, such DLLs be unloaded in reverse order. I admit that setting up these sorts of tests is difficult and time consuming in any build environment. But that's where I am.

I can provide a test for b2 (the same as I have the Makefile for) if the boost test suite is built with `-fPIC`. This does not involve unloading DLLs or so but simply links an executable against 2 shared libraries which are linked against static boost. This crashes reproducibly. If anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to add this test to show that it fails with a crash in current code and that my approach fixes this.

In any case, I do have the feeling that the reasons for the changes done in my PR (and the changes themselves) are not fully understood. This is why I want to encourage questions about the changes themselves, as I think those were not (only) meant to fix an issue in some circumstances but to actually correct the code, with test cases to prove this where possible (e.g. the is_destroyed flag). Other test cases just try to run into the problem I described.

Best, Alex

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Grund
Interdisziplinäre Anwendungsunterstützung und Koordination (IAK)
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Chemnitzer Str. 46b, Raum 010
01062 Dresden
Tel.: +49 (351) 463-35982
E-Mail: alexander.grund@tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On 6/26/2018 3:39 AM, Alexander Grund via Boost wrote:
Hi,

I much appreciate your willingness to invest effort in this issue. But my analysis and conclusions regarding the issue are totally different. Motivated by your efforts I added new tests for the singleton and spent time enhancing the tests for dll. One test was just too complicated for me to make work with b2, so I suppressed it. Only one test is failing on some platforms - test_export. This is related to DLLs. I added windows + mingw and msvc++ to my machine just to try to replicate the failure in my own environment. I have been unable to do so.

I do have a (Makefile) test which does show the behaviour described. I'm pretty sure, if executed on another machine, it will fail too. I tried to add it to the b2 test suite but failed due to having static boost in dynamic libraries (which is the setup in which I found it crashes for sure). And there is one test case currently in the Boost.Serialization test suite that DOES fail, and I'm pretty sure the issue described is the reason, but of course I cannot really confirm this, as the root cause is undefined behaviour (or ambiguities with shared libs).

Late last year I tried to address this dll/export problem by merging in a fix. This fix entailed dynamically allocating a data structure in the singleton. This fixed the failing test but, unbeknownst to me, resulted in the memory leak.

This is what I tried to describe too in the issue and MR: The leak is simply due to no one freeing the dynamically allocated memory. This is even deterministic and easily reproducible. I think that oversight might be caused by you being so used to the code that you overlook things that might be visible to fresh eyes. This is why I would like to have someone else comment on the PR too.

The way forward is to add some more tests which isolate the problem. I'm aware that you've submitted one test which fails. It seems like it's on the right track. But (IIRC) it dynamically loads/unloads DLLs containing user-level serialization code in the same sequence. I think it's reasonable to insist that if a user is going to do that, such DLLs be unloaded in reverse order. I admit that setting up these sorts of tests is difficult and time consuming in any build environment. But that's where I am.

I can provide a test for b2 (the same as I have the Makefile for) if the boost test suite is built with `-fPIC`. This does not involve unloading DLLs or so but simply links an executable against 2 shared libraries which are linked against static boost. This crashes reproducibly. If anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to add this test to show that it fails with a crash in current code and that my approach fixes this.

In any case, I do have the feeling that the reasons for the changes done in my PR (and the changes themselves) are not fully understood. This is why I want to encourage questions about the changes themselves, as I think those were not (only) meant to fix an issue in some circumstances but to actually correct the code, with test cases to prove this where possible (e.g. the is_destroyed flag). Other test cases just try to run into the problem I described.
I confirm that the 1.66 version never ever invokes the singleton destructor declared and defined in boost/serialization/singleton.hpp, approximately line 175. It does not matter whether it is a debug or release build, 32-bit or 64-bit, Microsoft or Intel C++ on Windows. A DLL wrapper around Boost.Serialization is used at run time. The actual Boost.Serialization library is statically linked into the DLL. A test executable with a roughly 20-line main function then uses the DLL to invoke the serialization.

I verified the behavior using an old school "printf" approach, with std::cout messages and static ptrdiff_t counting variables inside the get_instance() and ~singleton() methods. I observe eleven independent instances of the singleton class, with ZERO calls to the singleton destructor. The eleven singletons are created with ONE instance of the serialization object, regardless of invoking the serialize method.

The main function:

    int main() {
        {
            HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
        }
        std::cout << "Beginning unit test...";
        std::shared_ptr<thislibrary::myclass> myc = thislibrary::create_myclass();
        {
            std::shared_ptr<std::ofstream> ostream = std::make_shared<std::ofstream>();
            ostream->open("C:\\temp\\leak_test.xml");
            {
                boost::archive::xml_oarchive xml(*ostream);
                myc->serialize(xml, 1); // comment this line out and the leak remains
            }
            ostream->close();
        }
        return 0;
    }

Fundamentally, there is a memory leak of somewhere between 300 and 400 bytes. Yes, it is not much in today's world of GB RAM. However, when you have production code that leaks, then ANY memory leak is unacceptable. Essentially, the leak is due to the eleven singleton instances NOT invoking their respective destructors. Plus, the group I am part of spends tens of hours tracking down why the production product has memory leaks. The final convincing point that must be clearly understood is this: due to the leak, Boost.Serialization has been replaced. If that does not convince a reader that there is a real problem, then nothing will.

Sincerely, Robert
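A sketch of that instrumentation (hypothetical code, not the shipped singleton): static counters bumped in the constructor and destructor make the imbalance visible at program exit.

    #include <cstddef>
    #include <iostream>

    struct instrumented_singleton {
        static std::ptrdiff_t ctor_count;
        static std::ptrdiff_t dtor_count;

        static instrumented_singleton & get_instance() {
            static instrumented_singleton instance;
            return instance;
        }
        instrumented_singleton()  { std::cout << "ctor #" << ++ctor_count << '\n'; }
        ~instrumented_singleton() { std::cout << "dtor #" << ++dtor_count << '\n'; }
    };
    std::ptrdiff_t instrumented_singleton::ctor_count = 0;
    std::ptrdiff_t instrumented_singleton::dtor_count = 0;

    int main() {
        instrumented_singleton::get_instance();
        // a healthy run prints one "ctor" line and, at exit, one matching "dtor" line
    }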
Best, Alex
On 6/27/18 2:19 PM, Robert via Boost wrote:
On 6/26/2018 3:39 AM, Alexander Grund via Boost wrote:
Hi,

I much appreciate your willingness to invest effort in this issue. But my analysis and conclusions regarding the issue are totally different. Motivated by your efforts I added new tests for the singleton and spent time enhancing the tests for dll. One test was just too complicated for me to make work with b2, so I suppressed it. Only one test is failing on some platforms - test_export. This is related to DLLs. I added windows + mingw and msvc++ to my machine just to try to replicate the failure in my own environment. I have been unable to do so.

I do have a (Makefile) test which does show the behaviour described. I'm pretty sure, if executed on another machine, it will fail too. I tried to add it to the b2 test suite but failed due to having static boost in dynamic libraries (which is the setup in which I found it crashes for sure). And there is one test case currently in the Boost.Serialization test suite that DOES fail, and I'm pretty sure the issue described is the reason, but of course I cannot really confirm this, as the root cause is undefined behaviour (or ambiguities with shared libs).

Late last year I tried to address this dll/export problem by merging in a fix. This fix entailed dynamically allocating a data structure in the singleton. This fixed the failing test but, unbeknownst to me, resulted in the memory leak.

This is what I tried to describe too in the issue and MR: The leak is simply due to no one freeing the dynamically allocated memory. This is even deterministic and easily reproducible. I think that oversight might be caused by you being so used to the code that you overlook things that might be visible to fresh eyes. This is why I would like to have someone else comment on the PR too.

The way forward is to add some more tests which isolate the problem. I'm aware that you've submitted one test which fails. It seems like it's on the right track. But (IIRC) it dynamically loads/unloads DLLs containing user-level serialization code in the same sequence. I think it's reasonable to insist that if a user is going to do that, such DLLs be unloaded in reverse order. I admit that setting up these sorts of tests is difficult and time consuming in any build environment. But that's where I am.

I can provide a test for b2 (the same as I have the Makefile for) if the boost test suite is built with `-fPIC`. This does not involve unloading DLLs or so but simply links an executable against 2 shared libraries which are linked against static boost. This crashes reproducibly. If anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to add this test to show that it fails with a crash in current code and that my approach fixes this.

In any case, I do have the feeling that the reasons for the changes done in my PR (and the changes themselves) are not fully understood. This is why I want to encourage questions about the changes themselves, as I think those were not (only) meant to fix an issue in some circumstances but to actually correct the code, with test cases to prove this where possible (e.g. the is_destroyed flag). Other test cases just try to run into the problem I described.
I confirm that the 1.66 version never ever invokes the singleton destructor declared and defined in boost/serialization/singleton.hpp, approximately line 175. It does not matter whether it is a debug or release build, 32-bit or 64-bit, Microsoft or Intel C++ on Windows. A DLL wrapper around Boost.Serialization is used at run time. The actual Boost.Serialization library is statically linked into the DLL. A test executable with a roughly 20-line main function then uses the DLL to invoke the serialization.

I verified the behavior using an old school "printf" approach, with std::cout messages and static ptrdiff_t counting variables inside the get_instance() and ~singleton() methods. I observe eleven independent instances of the singleton class, with ZERO calls to the singleton destructor. The eleven singletons are created with ONE instance of the serialization object, regardless of invoking the serialize method.

The main function:

    int main() {
        {
            HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
        }
        std::cout << "Beginning unit test...";
        std::shared_ptr<thislibrary::myclass> myc = thislibrary::create_myclass();
        {
            std::shared_ptr<std::ofstream> ostream = std::make_shared<std::ofstream>();
            ostream->open("C:\\temp\\leak_test.xml");
            {
                boost::archive::xml_oarchive xml(*ostream);
                myc->serialize(xml, 1); // comment this line out and the leak remains
            }
            ostream->close();
        }
        return 0;
    }

Fundamentally, there is a memory leak of somewhere between 300 and 400 bytes. Yes, it is not much in today's world of GB RAM. However, when you have production code that leaks, then ANY memory leak is unacceptable. Essentially, the leak is due to the eleven singleton instances NOT invoking their respective destructors. Plus, the group I am part of spends tens of hours tracking down why the production product has memory leaks. The final convincing point that must be clearly understood is this: due to the leak, Boost.Serialization has been replaced. If that does not convince a reader that there is a real problem, then nothing will.
I believe this has been fixed for 1.68, which can be found on the current develop and master branches. What are the results of your test when applied to the latest develop or release version?
Sincerely,
Robert
Best, Alex
Hi all again,
I have to stress the importance of a swift action/review here as the
current master now contains the flawed commit again that leads to a
certain crash when using shared libraries linked against static boost.
Reproduce with:
git clone git@github.com:boostorg/boost.git && cd boost && git submodule update --recursive --init
./bootstrap.sh && ./b2 variant=release link=static --with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
mkdir mytest && cd mytest

Create 3 files:

test_multi_singleton.cpp:

    int f();
    int g();
    int main(int argc, char **) {
        // Make sure symbols are used
        if (argc == 8) return f();
        if (argc == 9) return g();
    }

multi_singleton1.cpp:

    #include <boost/serialization/extended_type_info_typeid.hpp>
    int f() {
        return 0 != boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();
    }

multi_singleton2.cpp:

    #include <boost/serialization/extended_type_info_typeid.hpp>
    int g() {
        // Use different(!) type
        return 0 != boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();
    }

Then build and run:

export CPATH=$PWD/..
export LIBRARY_PATH=$PWD/../stage/lib/
g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared -olibmulti_singleton1.so
g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared -olibmulti_singleton2.so
g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
LD_LIBRARY_PATH=. ./a.out

Result: *** Error in `./a.out': double free or corruption (fasttop): 0x0000000001accc20 ***
On 6/29/18 2:07 AM, Alexander Grund via Boost wrote:
Hi all again,
I have to stress the importance of a swift action/review here as the current master now contains the flawed commit again that leads to a certain crash when using shared libraries linked against static boost.
Right now I'm bogged down in other stuff, but I did find a little time to investigate this some more. I reviewed the information on the failures of test_dll_export along with the corresponding config.info output. I've found some useful things: e.g. it doesn't occur on windows; it seems to occur only on linux platforms. I've found it occurs in both libc++ and libstdc++ environments. I can't figure out from the test results whether the tests are linking against static or shared standard libraries. I didn't realize that this occurs (only on?) configurations which create shared libraries which link against static boost. Mixing shared libraries is a known cause of difficulties. In many cases it seems to work, but in many cases it also leads to unfixable failures. I'm still looking into this.
Reproduce with:

git clone git@github.com:boostorg/boost.git && cd boost && git submodule update --recursive --init
./bootstrap.sh && ./b2 variant=release link=static --with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
mkdir mytest && cd mytest

Create 3 files:

test_multi_singleton.cpp:

    int f();
    int g();
    int main(int argc, char **) {
        // Make sure symbols are used
        if (argc == 8) return f();
        if (argc == 9) return g();
    }

multi_singleton1.cpp:

    #include <boost/serialization/extended_type_info_typeid.hpp>
    int f() {
        return 0 != boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();
    }

multi_singleton2.cpp:

    #include <boost/serialization/extended_type_info_typeid.hpp>
    int g() {
        // Use different(!) type
        return 0 != boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();
    }

export CPATH=$PWD/..
export LIBRARY_PATH=$PWD/../stage/lib/
g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared -olibmulti_singleton1.so
g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared -olibmulti_singleton2.so
g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
LD_LIBRARY_PATH=. ./a.out

Result: *** Error in `./a.out': double free or corruption (fasttop): 0x0000000001accc20 ***
I'll look into your test.
This is due to the bug I described and fixed in https://github.com/boostorg/serialization/pull/105. An extract from there only showing the bug with `is_destroyed` is https://github.com/boostorg/serialization/pull/110
In the current state I'm strongly against including the latest master from Boost.Serialization. It would be better to include the Boost 1.67 version of it, which "only" contains a memory leak.
We differ on the cause, the fix and the workaround. Objection noted. Robert Ramey
Regards, Alex
On 30/06/2018 03:20, Robert Ramey wrote:
I didn't realize that this occurs (only on?) configurations which create shared libraries which link against static boost. Mixing shared libraries is a known cause of difficulties. In many cases it seems to work, but in many cases it also leads to unfixable failures.
By general definition, if you have two shared libraries that internally statically link the "same" library, they must *completely hide* that internal usage. Exactly zero of the internal library's symbols are permitted to exist in the shared library's export list, and absolutely never can any of its types appear in the shared library's own API. This includes usage by inheritance or by convention (such as being streamable to a serialization archive via templated operators). Violation of that rule occurs frequently (because it's hard to enforce and most of the time goes unnoticed), but it is still an ironclad rule if you don't want weird and undefined behaviour. Templates are a frequent cause of issues since they're very leaky by nature, and highly susceptible to ODR issues.

In the context of Boost.Serialization, this means that you can pass the serialized representations across shared library module boundaries, but you cannot pass archives or the serializable objects themselves across those boundaries, unless both shared libraries also use Boost.Serialization itself as a shared library.

In general the simplest thing to do is to either keep everything static or everything shared; mixing leads to headaches unless a great deal of care is taken.
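On the GNU toolchain on ELF platforms, one way to approximate that hiding (a sketch; the file name and API are invented, and this is not a complete recipe) is to compile the wrapper library with hidden visibility and keep the statically linked archive's symbols out of its dynamic symbol table:

    // mylib.cpp -- export only the intended C API of the shared library
    #define MYLIB_API __attribute__((visibility("default")))

    extern "C" MYLIB_API int do_serialize(const char * path);

    // build with everything hidden by default and static-archive symbols excluded:
    //   g++ -fvisibility=hidden -fPIC -shared mylib.cpp \
    //       -Wl,--exclude-libs,ALL -lboost_serialization -o libmylib.so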
Mere moments ago, quoth I:
In the context of Boost.Serialization, this means that you can pass the serialized representations across shared library module boundaries but you cannot pass archives or the serializable objects themselves across those boundaries, unless both shared libraries also use Boost.Serialization itself as a shared library.
To put this a simpler way: If you want to be able to use Boost.Serialization as a static library within shared libraries, then you must make absolutely certain that there are exactly zero #includes or references of any kind to Boost.Serialization's header files in any of the public header files of the shared library. If everything compiles after that, then you should be ok. This usually requires making use of PIMPL techniques.
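A minimal PIMPL sketch of that isolation (invented names; the point is that Boost.Serialization headers appear only in the .cpp of the shared library, never in its public header):

    // mylib.hpp -- public header: no Boost includes of any kind
    #include <memory>
    #include <string>

    class document {
    public:
        document();
        ~document();
        void save(const std::string & path) const;  // serialization hidden behind this
    private:
        struct impl;                  // defined only in mylib.cpp
        std::unique_ptr<impl> pimpl;  // no archive or serializable type leaks out
    };

    // mylib.cpp -- only here do the Boost.Serialization headers appear
    #include <fstream>
    #include <boost/archive/text_oarchive.hpp>

    struct document::impl { int data = 0; };

    document::document() : pimpl(new impl) {}
    document::~document() = default;

    void document::save(const std::string & path) const {
        std::ofstream os(path.c_str());
        boost::archive::text_oarchive ar(os);
        ar << pimpl->data;
    }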
To put this a simpler way:
If you want to be able to use Boost.Serialization as a static library within shared libraries, then you must make absolutely certain that there are exactly zero #includes or references of any kind to Boost.Serialization's header files in any of the public header files of the shared library.
If everything compiles after that, then you should be ok. This usually requires making use of PIMPL techniques.

Please have a look at my test: The interfaces of the shared libraries do NOT contain anything boost related - only very simple C functions that are called to make sure the singletons within Boost are instantiated. So your preconditions are met, yet the failure occurs.
Possible use case: You use Boost.Serialization in your library and use another shared library which uses Boost.Serialization itself, but completely independently. Nothing about it in the interface, but it would still crash.
This is due to the bug I described and fixed in https://github.com/boostorg/serialization/pull/105. An extract from there only showing the bug with `is_destroyed` is https://github.com/boostorg/serialization/pull/110
In the current state I'm strongly against including the latest master from Boost.Serialization. It would be better to include the Boost 1.67 version of it, which "only" contains a memory leak.
We differ on the cause, the fix and the workaround. Objection noted.
Ok, let me put this differently and let's see how far we can agree:

1. Destruction order may or may not be as intended. (For single binaries destruction order is the inverse of construction, but my test shows that this does not hold for this mix of shared libraries and static libraries.)
2. Singletons of Boost.Serialization have a method `is_destroyed` which returns whether one can still access the singleton or not.
3. `!is_destroyed` is asserted by the destructors of singletons referring to other singletons.
4. Point 3 will fail if point 1 does not hold; the program will then crash (or hit UB).
5. If point 1 is true, the crash can be avoided by checking `is_destroyed` before accessing the singleton (and not accessing it if destroyed).
6. `is_destroyed` is broken, hence the assertion is unreliable and we get crashes where we would first get assertion failures in debug builds.

Hence the steps to fix this: First and foremost, test and fix `is_destroyed`. Second, implement point 5.

While point 1 is an assumption that is disputable, I think the rest is not. Correct me here if I'm wrong. Hence https://github.com/boostorg/serialization/pull/110, which adds the test for `is_destroyed`. Even if there is another explanation for point 1, what is wrong with implementing point 5? It will always be safe, and code like `if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo) foo->use()`).

Alex Grund
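To make points 2-5 concrete, here is a self-contained sketch of the mechanism (invented names, not the PR itself): a per-type destroyed flag kept in a function-local static, plus the check-before-use in a dependent destructor.

    #include <set>

    template<class T>
    class singleton {
        static bool & destroyed_flag() {
            static bool flag = false;  // plain bool: safe to read during static teardown
            return flag;
        }
        struct guard {                 // flips the flag when statics are torn down
            ~guard() { destroyed_flag() = true; }
        };
    public:
        static T & get_mutable_instance() {
            static T instance;
            static guard g;            // destroyed just before instance
            (void)g;                   // silence unused-variable warnings
            return instance;
        }
        static bool is_destroyed() { return destroyed_flag(); }
    };

    struct serializer_map { std::set<const void *> entries; };

    struct serializer {
        serializer() {
            singleton<serializer_map>::get_mutable_instance().entries.insert(this);
        }
        ~serializer() {
            // point 5: check-before-use instead of asserting liveness
            if (!singleton<serializer_map>::is_destroyed())
                singleton<serializer_map>::get_mutable_instance().entries.erase(this);
        }
    };

    serializer global_serializer;  // within one binary the order works out; the check
                                   // is what saves us when shared libraries break it
    int main() {}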
On 7/2/2018 2:22 AM, Alexander Grund via Boost wrote:
To put this a simpler way:
If you want to be able to use Boost.Serialization as a static library within shared libraries, then you must make absolutely certain that there are exactly zero #includes or references of any kind to Boost.Serialization's header files in any of the public header files of the shared library.
If everything compiles after that, then you should be ok. This usually requires making use of PIMPL techniques.

Please have a look at my test: The interfaces of the shared libraries do NOT contain anything boost related - only very simple C functions that are called to make sure the singletons within Boost are instantiated. So your preconditions are met, yet the failure occurs.
Possible use case: You use Boost.Serialization in your library and use another shared library which uses Boost.Serialization itself, but completely independently. Nothing about it in the interface, but it would still crash.
Although crashing is not observed with the slightly older Boost releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating Boost.Serialization as Gavin describes still fails, with the already described lack of destructor invocations and the memory leak.
This is due to the bug I described and fixed in https://github.com/boostorg/serialization/pull/105. An extract from there only showing the bug with `is_destroyed` is https://github.com/boostorg/serialization/pull/110
In the current state I'm strongly against including the latest master from Boost.Serialization. It would be better to include the Boost 1.67 version of it, which "only" contains a memory leak.
We differ on the cause, the fix and the workaround. Objection noted.
Ok, let me put this differently and let's see how far we can agree:

1. Destruction order may or may not be as intended. (For single binaries destruction order is the inverse of construction, but my test shows that this does not hold for this mix of shared libraries and static libraries.)
2. Singletons of Boost.Serialization have a method `is_destroyed` which returns whether one can still access the singleton or not.
3. `!is_destroyed` is asserted by the destructors of singletons referring to other singletons.
4. Point 3 will fail if point 1 does not hold; the program will then crash (or hit UB).
5. If point 1 is true, the crash can be avoided by checking `is_destroyed` before accessing the singleton (and not accessing it if destroyed).
6. `is_destroyed` is broken, hence the assertion is unreliable and we get crashes where we would first get assertion failures in debug builds.

Hence the steps to fix this: First and foremost, test and fix `is_destroyed`. Second, implement point 5.

While point 1 is an assumption that is disputable, I think the rest is not. Correct me here if I'm wrong. Hence https://github.com/boostorg/serialization/pull/110, which adds the test for `is_destroyed`. Even if there is another explanation for point 1, what is wrong with implementing point 5? It will always be safe, and code like `if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo) foo->use()`).
I completely agree with your approach. I also agree that crashes and/or undefined behavior (UB) are not acceptable for any release, let alone the upcoming 1.68. --Robert
Alex Grund
Although crashing is not observed with the slightly older Boost releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating Boost.Serialization as Gavin describes still fails, with the already described lack of destructor invocations and the memory leak.

Did you test the code I sent around with static boost and those versions? I expect a crash in 1.68 (if the commit is not reverted) and in 1.65.x, so if you don't experience one I'd be interested in details (OS, stdlib, compiler etc.). Also: 1.64 should be fine, and 1.65 should not leak but crash. 1.66/1.67 have the leak.
Alex
On 7/2/18 6:40 AM, Alexander Grund via Boost wrote:
Although crashing is not observed with the slightly older Boost releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating Boost.Serialization as Gavin describes still fails, with the already described lack of destructor invocations and the memory leak.
Did you test the code I sent around with static boost and those versions? I expect a crash in 1.68 (if the commit is not reverted) and in 1.65.x, so if you don't experience one I'd be interested in details (OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should not leak, but crash. 1.66/1.67 have the leak.
Here's what I did. I looked at the serialization test matrix: https://www.boost.org/development/tests/develop/developer/serialization.html

The interesting test is test_dll_export. I compared this table against the config.info output: https://www.boost.org/development/tests/develop/developer/config.html

I also looked at the test command lines and test results to see if I could find a commonality between the test failures and the configuration being tested. I found that:

a) The test fails only on linux platforms. But it doesn't fail on all linux platforms.
b) The test never fails on windows platforms.
c) Tests would fail with both clang and gcc compilers.
d) The test failure seems pretty reliable. Exception: the one test which passes uses gcc 7.3.
e) I can't discern what the build switch settings are from the python command line. That is, I would like to know if it was built with link=shared/static and runtime-link=shared/static. I believe that this information would be significant.
f) A couple of ARM tests pass - one fails.

Much has been made of a memory leak. The current version 1.68 on the develop and master branches removed the dynamic allocation in the singleton module. So if there is a memory leak, it would be more subtle than first meets the eye. I believe this memory leak was introduced in efforts to make this test pass for 1.67 (or maybe 1.66, I don't remember).

I believe the correct way to approach this is to analyze your new test and also test_dll_export in light of Gavin Lambert's comments above. If this works out, I would then add this information to the serialization library documentation. Unfortunately, I'm bogged down in other stuff so I can't address it instantaneously, but I do consider it important to investigate further.

Robert Ramey
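For reference, those switches can be made explicit on the b2 command line, assuming the standard Boost.Build feature names (link= and runtime-link=):

    ./b2 toolset=gcc variant=debug link=static runtime-link=shared --with-serialization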
Alex
I also looked at the test command lines and test results to see if I could find a commonality between the test failures and the configuration being tested. I found that:

a) The test fails only on linux platforms. But it doesn't fail on all linux platforms.
b) The test never fails on windows platforms.
c) Tests would fail with both clang and gcc compilers.
d) The test failure seems pretty reliable. Exception: the one test which passes uses gcc 7.3.
e) I can't discern what the build switch settings are from the python command line. That is, I would like to know if it was built with link=shared/static and runtime-link=shared/static. I believe that this information would be significant.
f) A couple of ARM tests pass - one fails.

Great! I found at least for https://www.boost.org/development/tests/develop/developer/output/BI1-Debian-... that boost was built with link=shared. I don't think runtime-link is tested? At least I'd expect a reference to that in the link command. But I don't think it matters much: if the test I sent is considered formally valid (see below), then there is something in need of fixing.

Much has been made of a memory leak. The current version 1.68 on the develop and master branches removed the dynamic allocation in the singleton module. So if there is a memory leak, it would be more subtle than first meets the eye. I believe this memory leak was introduced in efforts to make this test pass for 1.67 (or maybe 1.66, I don't remember).

FYI: The leak was introduced in 1.66, the crash in 1.65.0 and again in the current master.

I believe the correct way to approach this is to analyze your new test and also test_dll_export in light of Gavin Lambert's comments above. If this works out, I would then add this information to the serialization library documentation.

To make it simple I'd suggest you concentrate on my new test. I'm sure it meets Gavin Lambert's requirements of not leaking Boost "internals" outside of the shared libs used, and the crash is easy to reproduce (at least on my side, on every linux machine). If you can reproduce this even on a single machine on your side, then that is all the argument it needs.

But even without this: Isn't it enough that `is_destroyed` is broken to investigate my fix? Fixing that function is the major part of the PR. The rest is the simple check-before-use stuff.

Alex
On 7/2/18 8:46 AM, Alexander Grund via Boost wrote:
But even without this: Isn't it enough that `is_destroyed` is broken to investigate my fix? Fixing that function is the major part of the PR. The rest is the simple check-before-use stuff.
I will also investigate "is_destroyed". I agree that everything has to come up clean and correct. I'm shooting for "no loose ends". But the issues are complicated because, among other things:

a) DLLs/shared libraries are not defined by the standard.
b) Different vendors have different keywords/requirements.
c) Visibility is not defined by the standard either - but it's almost essential.

We're navigating a minefield here.

Robert Ramey
On 7/2/2018 8:40 AM, Alexander Grund via Boost wrote:
Although crashing is not observed with the slightly older Boost releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating Boost.Serialization as Gavin describes still fails, with the already described lack of destructor invocations and the memory leak.

Did you test the code I sent around with static boost and those versions? I expect a crash in 1.68 (if the commit is not reverted) and in 1.65.x, so if you don't experience one I'd be interested in details (OS, stdlib, compiler etc.). Also: 1.64 should be fine, and 1.65 should not leak but crash. 1.66/1.67 have the leak.

No, I have not tested the 1.68 revision. If my day goes well enough with my primary tasks, I will build 1.68 early this evening or possibly July 5th. This will be strictly on Windows with both Microsoft 15.6.4 and Intel C++ 18.0, Update 3. --Robert
Alex
On 2 July 2018 at 18:39, Robert via Boost wrote:
This will be strictly on Windows with both Microsoft 15.6.4 ...
With the current level of flux in VC development, you really should consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or better, move to 15.8 Preview 3 (that's the advice STL gave me in another issue). Have a good day, degski -- *"If something cannot go on forever, it will stop" - Herbert Stein*
On 7/2/18 8:53 AM, degski via Boost wrote:
On 2 July 2018 at 18:39, Robert via Boost wrote:

This will be strictly on Windows with both Microsoft 15.6.4 ...
With the current level of flux in VC development, you really should consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or better, move to 15.8 Preview 3 (that's the advice STL gave me in another issue).
But many/most users don't have the latest MSVC version - and the library has to work on at least the most recent ones.
Have a good day,
degski
On 2 July 2018 at 19:22, Robert Ramey via Boost wrote:
But many/most users don't have the latest MSVC version
Given how easy the upgrade process has become, I suspect you're wrong here. Also, users (not corporate ones, but they are not going to adopt Boost 1.68 at release time anyway, so that's irrelevant) are now looking forward to the (C++17) stuff that will work (and doesn't yet work with their current version; a minor version difference can and will make all the difference).
- and the library has to work on at least the most recent ones.
Today's Preview is tomorrow's Stable, and will be outdated 2 weeks later; that's how things are if you really "wonna make some progress" (you should see the number of bugs and fixes on vcpkg, it's impressive). It's a wild ride, but it's fun. As you are a dev of an important library, your stable is today's Preview. degski
On 7/2/2018 10:53 AM, degski via Boost wrote:
On 2 July 2018 at 18:39, Robert via Boost wrote:

This will be strictly on Windows with both Microsoft 15.6.4 ...
With the current level of flux in VC development, you really should consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or better, move to 15.8 Preview 3 (that's the advice STL gave me in another issue).
I would love to, but am intentionally delaying for work reasons. Some of the team have updated to 15.7.4. Because I am using the older Microsoft version, it took only a few minutes to track down the root cause of a recent issue.

This may not be known to news group members here, but the Windows Intel C++ compiler is heavily dependent on the Microsoft-supplied standard library header files (e.g. from 2015: https://software.intel.com/en-us/forums/intel-c-compiler/topic/596318). Similarly, in a Mac oriented post, there is a dependency on the gcc header files (e.g. from 2013: https://software.intel.com/en-us/forums/intel-c-compiler/topic/366102).

I have already submitted an issue with Intel that the Visual Studio 15.7.4 headers cause Intel 18.0, Update 3 to produce thousands of spurious errors on std::variant. The 15.6.4 std::variant version does not cause the invalid error output from Intel. Plus, only Intel C++ 19.0 is planned to address the problem(s). --Robert
Have a good day,
degski
On 2 July 2018 at 19:54, Robert via Boost wrote:
Some of the team have updated to 15.7.4. Because of my using the older Microsoft version ...
You can install the preview side-by-side with the stable version; you'd have BOBW (another piece of advice from STL).

This may not be known to news group members here, but the Windows Intel C++ compiler is heavily dependent on the Microsoft-supplied standard library header files (e.g. from 2015: https://software.intel.com/en-us/forums/intel-c-compiler/topic/596318).

You said it, it's a compiler, not an implementation of a STL.

Similarly, in a Mac oriented post, there is a dependency on the gcc header files (e.g. from 2013: https://software.intel.com/en-us/forums/intel-c-compiler/topic/366102).
Luckily I know nothing about Mac (in the same category as fuckbook.com, twitter (I don't own a phone) and smartphones in general (fat fingers, but my wife keeps asking me to debug her bloody Android phone, shit)).
I have already submitted an issue with Intel that the Visual Studio 15.7.4 headers cause Intel 18.0, Update 3 to produce thousands of spurious errors on the std::variant.
Yeah, I know from experience that they are very slow in reacting (if they react at all, which is usually not the case). E.g. the way they detect the compiler in tbb/mkl does not work with Clang/LLVM on windows, because they do the detection (if it should work) in the wrong order (they should check for VC first and Clang later; it can't go wrong on linux/Mac). Seems pretty simple; I filed it, but it is still as it used to be. degski -- *"If something cannot go on forever, it will stop" - Herbert Stein*
On 7/2/2018 12:10 PM, degski via Boost wrote:
On 2 July 2018 at 19:54, Robert via Boost wrote:

Some of the team have updated to 15.7.4. Because I am using the older Microsoft version ...
You can install the preview side-by-side with the stable version, you'd have BOBW (another advice from STL).
Yeah, I am not too excited about that on my primary desktop. I might install the preview on a VM though. Presently, the team continues to find newly released "features" that break stuff. Hence, my now four decades (and growing) distrust...
This may not be known to news group members here, but the Windows Intel C++ compiler is heavily dependent on the Microsoft-supplied standard library header files (e.g. from 2015: https://software.intel.com/en-us/forums/intel-c-compiler/topic/596318).
You said it, it's a compiler, not an implementation of a STL.
Similarly, in a Mac oriented post, there is a dependency on the gcc header files (e.g. from 2013: https://software.intel.com/en-us/forums/intel-c-compiler/topic/366102).
Luckily I know nothing about Mac (in the same category as fuckbook.com, twitter (I don't own a phone) and smartphones in general (fat fingers, but my wife keeps asking me to debug her bloody Android phone, shit)).
I have already submitted an issue with Intel that the Visual Studio 15.7.4 headers cause Intel 18.0, Update 3 to produce thousands of spurious errors on the std::variant.
Yeah, I know from experience that they are very slow in reacting (if they react at all, which is usually not the case). F.e. the way they detect the compiler in tbb/mkl does not work with Clang/LLVM on windows, because they do the detection (if it should work) in the wrong order (they should check for VC first and Clang later, can't go wrong on linux/Mac), seems pretty simple, I filed it, but it is still as it used to be.
I have them fixing a couple other items too. As far as speedily resolving something, that all depends on what the Intel group determines is important. Tea leaf interpretation or flipping a coin 50 times, is about the same prediction accuracy. :) --Robert
degski
On 2 July 2018 at 22:03, Robert via Boost wrote:
Yeah, I am not too excited about that on my primary desktop. I might install the preview on a VM though. Presently, the team continues to find newly released "features" that break stuff. Hence, my now four decades (and growing) distrust...
You are not wrong here.

... Tea leaf interpretation or flipping a coin 50 times, is about the same prediction accuracy. :)
LOL degski -- *"If something cannot go on forever, it will stop" - Herbert Stein*
On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost wrote:
Yeah, I am not too excited about that on my primary desktop. I might install the preview on a VM though. Presently, the team continues to find newly released "features" that break stuff. Hence, my now four decades (and growing) distrust...
In VC or in their code? Either way, the sooner it's found the sooner it can be fixed. -- Olaf
On 7/3/2018 3:14 AM, Olaf van der Spek via Boost wrote:
On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost wrote:

Yeah, I am not too excited about that on my primary desktop. I might install the preview on a VM though. Presently, the team continues to find newly released "features" that break stuff. Hence, my now four decades (and growing) distrust...
In VC or in their code?

Windows SDK, ATL, COM, C++ compiler conformance, standard library header files, ODBC driver to SQL Server, etc.
Either way, the sooner it's found the sooner it can be fixed.
In the group I am part of, as soon as the root cause is determined, an issue is filed. How soon the Visual Studio C++ team responds with a resolution to something that previously was not broken is non-deterministic, however. The tea leaf and coin flipping prediction statement applies here too. :) --Robert
Hi all,

with the release of the 1.68 beta I want to push this again. Have you tried the example I sent around? If so, did it crash on (probably) every linux system too? For me it did, which makes me consider this a showstopper, and I would like to propose again to simply revert the last change to the version with the memory leak, which is way less serious than a crash. Of course merging my fix would be preferred as it fixes all problems, but it seems that the review of that might take too long for 1.68.

Regards, Alex
On 7/11/2018 2:11 AM, Alexander Grund via Boost wrote:
Hi all,
with the release of the 1.68 beta I want to push this again. Have you tried the example I sent around? If so, did it crash on (probably) every linux system too? For me it did, which makes me consider this a showstopper, and I would like to propose again to simply revert the last change to the version with the memory leak, which is way less serious than a crash. Of course merging my fix would be preferred as it fixes all problems, but it seems that the review of that might take too long for 1.68.
I agree with you. IMHO, the PR should be merged. At a minimum, the 1.68 release notes should list where to find the PR. Then, the users decide whether to use it. I am only one concerned professional user though. I am sure others will voice their thoughts too. Sincerely, Robert
Regards, Alex
On 7/11/18 12:11 AM, Alexander Grund via Boost wrote:
Hi all,
with the release of the 1.68 beta I want to push this again. Have you tried the example I sent around?

Sorry, I haven't had time to investigate this.

For me it did, which makes me consider this a showstopper, and I would like to propose again to simply revert the last change to the version with the memory leak, which is way less serious than a crash. Of course merging my fix would be preferred as it fixes all problems, but it seems that the review of that might take too long for 1.68.

The problem is that I don't see any connection between your proposed change and the failed test. I'm not disputing that your change makes a problematic symptom go away. But I haven't been able to invest enough time to really understand why that might be so.

The whole issue of dynamically loaded DLLs/shared libraries has been a festering sore for at least 20 years now. It's undefined by the standard, and the committee has not seen fit to address it. Sorry, it just takes time to get this right.

Robert Ramey
Regards, Alex
The problem is that I don't see any connection between your proposed change and the failed test. I'm not disputing that your change makes a problematic symptom go away. But I haven't been able to invest enough time to really understand why that might be so.

Please do not focus on that currently checked-in test (test_dll_load or whatever it was called), because that one is too complex to reason about. Please check the very basic singleton interface tests I even added as a separate PR (https://github.com/boostorg/serialization/pull/110), which clearly show that the current implementation is flawed. Additionally, I have (empiric) evidence that destruction order is unreliable (see the test I sent around in this ML).

So what my proposed change does is simply: a) fix the bug in the implementation shown by #110 (which should indisputably be the right thing to do) and b) check whether the singleton accessed by a singleton destructor is still alive (the equivalent of `if(foo) foo->bar()`).

I don't see how this fixes a symptom only. It tackles the root causes: a defect in the code, and a use-after-free caused by the unexpected destruction order (although the cause for that is unclear and only known to be related to shared libraries).
I do acknowledge your lack of time, and because the fix is so logical/straightforward I asked for another reviewer to relieve you from that or help out.
The whole issue of dynamically loaded DLLs/shared libraries has been a festering sore for at least 20 years now. It's undefined by the standard, and the committee has not seen fit to address it. Sorry, it just takes time to get this right.

I also do understand this. But I have proof that your latest change causes a (possible) crash instead of a (certain) memory leak, and hence ask you to revert that change for 1.68, or the library will be unusable for some users. Once there is more time, the issue can be fully resolved.
Alex Grund
On 7/11/18 9:32 AM, Alexander Grund via Boost wrote:
The problem is that I don't see any connection between your proposed change and the failed test. I'm not disputing that your change makes a problematic symptom go away. But I haven't been able to invest enough time to really understand why that might be so.

Please do not focus on that currently checked-in test (test_dll_load or whatever it was called), because that one is too complex to reason about.
It's test_dll_export. It's the simplest test I could figure out. There's another one that tests dynamic loading of DLLs, test_dll_plugin. I don't run that test as I would have to dive a lot more into bjam than I can afford to do.
Please check the very basic singleton interface tests I even added as a separate PR (https://github.com/boostorg/serialization/pull/110), which clearly show that the current implementation is flawed.
I think I did that once. On this basis I added a test_singleton to the test suite: https://github.com/boostorg/serialization/blob/master/test/test_singleton.cp... It does check that all singletons are destroyed in the reverse order. Note: this test doesn't appear in any of the "teeks99" tests. I have no idea why that might be. Also note that the current singleton implementation does no allocation from the heap. If there is a memory leak - it's not obvious why singleton might be causing it.

OK - I've looked again at your PR. It replaces my test with your own. Rather than doing that, it's been my practice to add new tests. This helps prevent this turning into a process of whack-a-mole. I'll add these two new tests and see if I can reproduce the results on my own machine.
Additionally, I have (empiric) evidence that destruction order is unreliable (see the test I sent around in this ML).

I looked at one or two of your tests and as a result updated the test suite as above. So I think things are covered. I'll double check though.
So what my proposed change does is simply: a) fix the bug in the implementation shown by #110 (which should indisputably be the right thing to do) and
LOL - sorry, it's not indisputable to me.
b) check if the singleton accessed by a singleton-destructor is still alive (equivalent of `if(foo) foo->bar()`)
I will check this.
I don't see how this fixes a symptom only. It tackles the root causes: a defect in the code, and a use-after-free caused by the unexpected destruction order (although the cause for that is unclear and only known to be related to shared libraries).

I do not believe there is any freeing going on here.

Right. I do not think it's possible to avoid problems if libs are not unloaded in the reverse order that they are loaded. I believe that this will be something that has to be enforced at the application level. It would be nice if it were possible to detect this from the library, but without any standard-defined behavior, this is going to be difficult.
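A sketch of that application-level discipline (POSIX dlopen/dlclose; the plugin names are invented): load the plugins in order and unload them strictly in reverse.

    #include <dlfcn.h>
    #include <vector>

    int main() {
        std::vector<void *> handles;
        const char * libs[] = { "./libplugin1.so", "./libplugin2.so" };
        for (const char * lib : libs) {
            if (void * h = dlopen(lib, RTLD_NOW))  // load in declared order
                handles.push_back(h);
        }
        // unload strictly in reverse order of loading
        for (auto it = handles.rbegin(); it != handles.rend(); ++it)
            dlclose(*it);
    }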
I do acknowledge your lack of time, and because the fix is so logical/straightforward I asked for another reviewer to relieve you from that or help out.
I don't always agree with the PRs and the analysis of other participants. I try to give all such suggestions the attention they deserve and recognize the hard work that they entail - even if I decline to include them. On the other hand, most of the suggestions do eventually get incorporated - perhaps after some back and forth to refine them, add/update tests, etc. It is only because of efforts by persons such as yourself that I have been able to maintain the library after all these years. And it is only because of these efforts that I keep motivated to keep it as perfect as possible. You make me a better man.
The whole issue of dynamically loaded DLLS/shared libraries has been a festering sore for at least 20 years now. It's undefined by the standard and the committee has not seen fit to address it. Sorry it just takes time to get this right.
I also do understand this. But I have proof that your latest change causes a (possible) crash instead of a (certain) memory leak and hence ask you to revert that change for 1.68 or the library will be unusable for some users. Once there is more time, the issue can be fully resolved.
This is the crux. I do not think this can be resolved definitively without a real understanding of the source of the problem.
Alex Grund
it's test_dll_export. It's the simplest test I could figure out. There's another one that tests dynamic loading of DLLs, test_dll_plugin. I don't run that test as I would have to dive a lot more into bjam than I can afford to do.
Did you test what I sent around on this ML? As already explained, I could not get it into bjam either, but it does show the crash(!) on all Linux systems without further fiddling, highlighting the need for action.
Also note that the current singleton implementation does no allocation from the heap. If there is a memory leak, it's not obvious why singleton might be causing it.
The current implementation does NOT have a leak but a CRASH. That's why I want you to reverse that, as the former is less severe (see above).
OK - I've looked again at your PR. It replaces my test with your own. Rather than doing that, it's been my practice to add new tests. This helps prevent the process turning into whack-a-mole. I'll add these two new tests and see if I can reproduce the results on my own machine.
My test was developed before yours, so I had a merge conflict. Looking at your test I found that it tests the wrong thing: instead of testing that the interface of singleton is correct, it tests the construction/destruction order. That order is guaranteed by the compiler, hence you actually test the compiler. As that order only changes in the context of shared libraries I did not see any added value in the test, hence the complete overwrite (yeah, sorry, but it was the easiest way, and as explained a test of the interface is better than a test of the compiler).
Additionally I have (empirical) evidence that destruction order is unreliable (see the test I sent around on this ML).
I looked at one or two of your tests and as a result updated the test suite as above. So I think things are covered. I'll double check though.
If you had run the test I sent around you'd see that this is not covered. Destruction order is guaranteed except in the case of shared libraries, which is hard to get into bjam (well, actually not, but it requires building with -fPIC). If that's possible/allowed, I'll add a test to bjam that will crash on travis although it is very valid.
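To make the interface-vs-order distinction above concrete, a sketch of a pure interface test (the widget type is illustrative): it asserts what is_destroyed() reports, not when destruction happens.

    #include <boost/serialization/singleton.hpp>

    struct widget { int value; widget() : value(42) {} };

    int main() {
        using boost::serialization::singleton;
        // While the program runs, the singleton must report itself alive...
        if (singleton<widget>::is_destroyed())
            return 1;
        // ...and hand out a usable instance. No assumption is made about
        // the order in which singletons are torn down afterwards.
        return singleton<widget>::get_const_instance().value == 42 ? 0 : 1;
    }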
So my proposed change is simply: a) fix the bug in the implementation shown by #110 (should be indisputably the right thing to do) and
LOL - sorry, it's not indisputable to me.
Can you explain that a bit more? #110 shows that `is_destroyed` does return a wrong value. Why would it not be right to fix that?
I don't see how this fixes a symptom only. It tackles the root causes: a defect in the code and a use-after-free caused by the unexpected destruction order
I do not believe there is any freeing going on here.
An object is used after it is destroyed. That object has a map which is freed on destruction but accessed afterwards. Run valgrind on the code I sent around and it will show you the use-after-free.
Right. I do not think it's possible to avoid problems if libs are not unloaded in the reverse order that they are loaded. I believe this is something that has to be enforced at the application level. It would be nice if it were possible to detect this from the library, but without any standard-defined behavior, this is going to be difficult.
How can this be enforced if there is no (manual) dynamic loading going on? You simply link two shared libraries and the application crashes on exit. Nothing a user can do about that.
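A self-contained toy of the pattern just described (not the library code): in a single translation unit the destruction order below is safe, but when the two globals live in different shared libraries the map can be torn down first, and valgrind then flags the erase as a use-after-free.

    #include <map>
    #include <string>

    // Toy stand-in for the singleton's internal map.
    std::map<std::string, int> registry;

    struct entry {
        entry()  { registry["e"] = 1; }
        // If 'registry' lives in another shared library that has already
        // been unloaded, this erase touches freed memory.
        ~entry() { registry.erase("e"); }
    };

    entry e;  // lives in a second shared library in the real scenario

    int main() { return 0; }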
I also do understand this. But I have proof that your latest change causes a (possible) crash instead of a (certain) memory leak and hence ask you to revert that change for 1.68 or the library will be unusable for some users. Once there is more time, the issue can be fully resolved.
This is the crux. I do not think this can be resolved definitively without a real understanding of the source of the problem.
Well, the source is indeterminate destruction order. You could try to find the source of that, or take it as given and handle it properly, which is easily possible.
If you have some time we can discuss this on IRC or another platform directly. But please test what I sent around first. Otherwise we are running in circles, as you don't seem to believe me that there is a crash in a valid use case with nothing a user can do about it. Alex Grund
FWIW I opened another PR to show the crash also on travis: https://github.com/boostorg/serialization/pull/111. See the (expected, as described) failures with LINK=static on Linux. I hope #111 is enough to show that the latest change needs to be reverted for 1.68. #110 together with #111 should show clearly where the implementation is broken. Alex Grund
On 2 July 2018 at 03:48, Gavin Lambert via Boost wrote:
On 30/06/2018 03:20, Robert Ramey wrote:
I didn't realize that this occurs (only on?) configurations which create shared libraries which link against static boost. Mixing shared libraries is a known cause of difficulties. In many cases it seems to work, but in many cases it also leads to unfixable failures.
By general definition, if you have two shared libraries that internally statically link the "same" library, they must *completely hide* that internal usage. Exactly zero of the internal library's symbols are permitted to exist in the shared library's export list, and absolutely never can any of its types appear in the shared library's own API.
This includes usage by inheritance or by convention (such as being streamable to a serialization archive via templated operators).
Violation of that rule occurs frequently (because it's hard to enforce and most of the time goes unnoticed) but it is still an ironclad rule if you don't want weird and undefined behaviour.
Templates are a frequent cause of issues since they're very leaky by nature, and highly susceptible to ODR issues.
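One common way to enforce that hiding on ELF platforms is sketched below; the flag and attribute are GCC/Clang-specific, and PLUGIN_API is a made-up macro, so treat this as an illustration rather than a recipe.

    // Build with: g++ -fvisibility=hidden -fPIC -shared plugin.cpp -o libplugin.so
    // Everything is hidden by default; only explicitly marked symbols
    // appear in the export list, so statically linked internals (e.g. a
    // statically linked Boost.Serialization) stay private to the module.

    #define PLUGIN_API __attribute__((visibility("default")))

    static int internal_helper(int x) { return x + 1; }   // hidden

    PLUGIN_API int plugin_entry_point(const char* request) {  // exported
        return internal_helper(request ? 1 : 0);
    }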
In the context of Boost.Serialization, this means that you can pass the serialized representations across shared library module boundaries but you cannot pass archives or the serializable objects themselves across those boundaries, unless both shared libraries also use Boost.Serialization itself as a shared library.
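A sketch of the safe pattern (my_type and the function names are illustrative): each module keeps its archives internal, and only a plain byte string crosses the module boundary.

    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <sstream>
    #include <string>

    struct my_type {                 // illustrative payload
        int x = 0;
        template<class Archive>
        void serialize(Archive& ar, unsigned /*version*/) { ar & x; }
    };

    // Inside module A: the archive never leaves this module.
    std::string to_bytes(const my_type& v) {
        std::ostringstream os;
        boost::archive::text_oarchive oa(os);
        oa << v;
        return os.str();             // only plain bytes cross the boundary
    }

    // Inside module B: rebuild the object from the bytes.
    my_type from_bytes(const std::string& s) {
        std::istringstream is(s);
        boost::archive::text_iarchive ia(is);
        my_type v;
        ia >> v;
        return v;
    }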
In general the simplest thing to do is to either keep everything static or everything shared; mixing leads to headaches unless a great deal of care is taken.
This is very useful information, in any context, be it Boost or not. degski -- "If something cannot go on forever, it will stop" - Herbert Stein
Fundamentally, there is a memory leak of between 300 and 400 bytes. Yes, it is not much in today's world of GB RAM. However, when you have production code that leaks, ANY memory leak is unacceptable. Essentially, the leak is due to the eleven singleton instances NOT invoking their respective destructors. Plus, the group I am part of spends tens of hours tracking down why the production product has memory leaks. The final point that must be clearly understood is this: due to the leak, Boost.Serialization has been replaced. If that does not convince a reader that there is a real problem, then nothing will.
Thank you Robert. I have also added another PR that does something very similar to your manual test. It tests that singletons are actually destructed and that the is_destructed flag is set. Note that this is an extract from my bigger PR and just contains the tests to prove this point; the tests are obviously failing in both 1.66 and current dev (in different parts though). Maybe we can get this one in first and then fix it with my other PR. Would you maybe add a vote/thumbs-up for the PRs also on GitHub to increase awareness?
To add to this: it is not only a memory leak. The ctor is not called. If one relies on side effects of that, it can cause all sorts of havoc. Thanks, Alex
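A sketch of the kind of check such a test can make (all names are hypothetical, and this toy uses a plain static rather than the library's singleton): count constructions and destructions, then verify at exit that they match.

    #include <cstdio>
    #include <cstdlib>

    static int ctors = 0;
    static int dtors = 0;

    struct tracked {
        tracked()  { ++ctors; }
        ~tracked() { ++dtors; }
    };

    static void report() {
        // With a leaking implementation the instance is allocated but
        // never freed, so dtors stays 0 here.
        std::printf("ctors=%d dtors=%d %s\n", ctors, dtors,
                    ctors == dtors ? "OK" : "LEAK");
    }

    int main() {
        std::atexit(report);   // registered before 't' is constructed,
        static tracked t;      // so ~tracked() runs before report()
        (void)t;
        return 0;
    }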
participants (6)
- Alexander Grund
- degski
- Gavin Lambert
- Olaf van der Spek
- Robert
- Robert Ramey