On 7/11/18 9:32 AM, Alexander Grund via Boost wrote:
The problem is that I don't see any connection between your proposed change and the failed test. I'm not disputing that your change makes a problematic symptom go away. But I haven't been able to invest enough time to really understand why that might be so. Please do not focus on that currently checked in test (test_dll_load or what it was) because that one is to complex to reason about.
it's test_dll_export. It's the simplest test I could figure out. There's another one that tests dynamic loading of DLL test_dll_plugin. I don't run that test as I would have to dive a lot more into bjam than I can aford to do.
Please check the very basic singleton interface tests I even added as a separate PR (https://github.com/boostorg/serialization/pull/110) which clearly shows, that the current implementation is flawed.
I think I did that once. On this basis I added a test_singleton to the test suite. I'm pretty https://github.com/boostorg/serialization/blob/master/test/test_singleton.cp... . I does check that all singletons are destroyed in the reverse order. Note: this test doesn't appear in any of the "teeks99" tests. I have no idea why that might be. Also note that the current singleton implementation does no allocation from the heap. If there is a memory leak - it's not obvious why singleton might be causing it. OK - i've looked again at your pr. It replaces my test with your own. Rather than doing that, it's been my practice to add new tests. This helps prevent turning into a process of whack-a-mole. I'll add these two new tests and see if I can re-produce the results on my own machine.
Additionally I have (empiric) evidence that destruction order is unreliable (see the test I sent around in this ML). I looked as one or two of your tests and as a result updated the test suite as above. So I think things are covered. I'll double check though.
So what my proposed change is, is simply: a) fix the bug in the implementation shown by #110 (should be indisputably the right thing to do) and
LOL - sorry it's not indiputable to me.
b) check if the singleton accessed by a singleton-destructor is still alive (equivalent of `if(foo) foo->bar()`)
I will check this.
I don't see how this fixes a symptom only. It tackles the root causes: A defect in the code and a use-after-free caused by the unexpected destruction order
I do not believe there is any freeing going on here. (although the cause for that is unclear and only know
to be related to shared libraries).
Right. I do not think its possible to avoid problems if libs are not unloaded in the reverse order that they are loaded. I believe that this will be something that has to be enforced at the application level. It would be nice if it were possible to detect this from the library, but without any standard defined behavior, this is going to be difficult.
I do acknowledge your lack of time and because the fix is so logical/straight forward I asked for another reviewer to relieve you from that or help out.
I don't always agree with the PRs and the analysis of other participants. I try to give all such suggestions the attention they deserve and recognize the hard work that they entail - even if I decline to include them. On the other hand. Most of the suggestions do eventually get incorporated - perhaps after some back and forth to refine them, add/update tests, etc. It is only because of efforts by persons such as yourself, that I have been able to maintain the library after all these years. And it only because of these efforts that I keep motivated to keep it as perfect as possible. You make me a better man.
The whole issue of dynamically loaded DLLS/shared libraries has been a festering sore for at least 20 years now. It's undefined by the standard and the committee has not seen fit to address it. Sorry it just takes time to get this right.
I also do understand this. But I have proof that your latest change causes a (possible) crash instead of a (certain) memory leak and hence ask you to revert that change for 1.68 or the library will be unusable for some users. Once there is more time, the issue can be fully resolved.
This is the crux. I do not this can be resolved definitively without a real understanding of the source of the problem.
Alex Grund
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost