[Boost.serialization] unexpected bad_cast exception while deserializing some arbitrary instances (e.g.: ar & data) ...
Hi serializers!

Let me present my problem:

I have a class, named 'io_factory', that can deserialize an arbitrary object from a text (or XML) archive using the following '__load_text' template method:

-8<---------------------------------------------------------------------
class io_factory
{
  ...
  template <typename Data>
  void __load_text (boost::archive::text_iarchive & ar_, Data & data_)
  {
    Data & b = data_;
    std::cerr << "DEVEL: io_factory::__load_text: archive class name='"
              << typeid (ar_).name () << "'" << std::endl;
    std::cerr << "DEVEL: io_factory::__load_text: class name='"
              << typeid (b).name () << "'" << std::endl;
    try
      {
        /* Here I may experience an unexpected exception
         * only in very special circumstances:
         */
        ar_ >> b;
        std::cerr << "DEVEL: io_factory::__load_text: deserialization is ok."
                  << std::endl;
      }
    catch (std::exception & x)
      {
        std::cerr << "DEVEL: io_factory::__load_text: EXCEPTION="
                  << x.what () << std::endl;
      }
    std::cerr << "DEVEL: io_factory::__load_text: done" << std::endl;
  }
  ...
};
-8<---------------------------------------------------------------------

I wrote a shared library that makes use of this template code segment. Using a test program, when I try to deserialize an instance of some 'nemo3::ana_event' class from a Boost text archive file, it prints:

-8<---------------------------------------------------------------------
DEVEL: io_factory::__load_text: archive class name='N5boost7archive13text_iarchiveE'
DEVEL: io_factory::__load_text: class name='N5nemo39ana_eventE' <= ok 'nemo3::ana_event'
DEVEL: io_factory::__load_text: deserialization is ok. <==== IT WORKS!
DEVEL: io_factory::__load_text: done
-8<---------------------------------------------------------------------

This is fine! My object is properly loaded from the file. My test program works as expected.

Well, now, I also have a Python module that wraps some C++ code and the 'io_factory' class. I use Boost.Python to do that.

When I run my Python program, importing this wrapper module, to deserialize the same Boost text archive file from the python shell, it prints:

-8<---------------------------------------------------------------------
...
DEVEL: io_factory::__load_text: archive class name='N5boost7archive13text_iarchiveE'
DEVEL: io_factory::__load_text: class name='N5nemo39ana_eventE'
DEVEL: io_factory::__load_text: EXCEPTION=std::bad_cast <= OOOOPS!
DEVEL: io_factory::__load_text: done
...
-8<---------------------------------------------------------------------

As you can see, a nasty 'std::bad_cast' exception is thrown from the "ar_ >> b;" statement. This code works perfectly in the first case; now something is broken and I cannot figure out what!

I am really confused by this issue. I already used exactly the same approach within another library (also based on Boost.serialization and Boost.Python) and I never met any problem. Moreover, I also have a smaller python script that uses the same wrapper module with the same input archive file and it works: I can read all the instances I stored in my archive file! If I use an XML archive, the problem is exactly the same: it works in the first case, not in the last one.

I do not understand where this 'bad_cast' exception comes from. For me it has nothing to do with:
- the format of the archive (I know the archive file is ok, as the test program stores and loads it properly)
- the wrapping within Python (I can make it run ok within a sample python script)

I really need a hint that could help me to understand this issue. This problem looks like a side-effect (?); anyway, the deep reason is outside the scope of my code and my knowledge.

My config is:
* OS: Ubuntu Linux 8.04
* Linux Kernel:
  $ uname -a
  Linux mauger-laptop 2.6.24-24-generic #1 SMP Wed Apr 15 15:54:25 UTC 2009 i686 GNU/Linux
* Boost:
  $ boost-config --version
  1_38_0
* gcc:
  $ g++ --version
  g++ (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
* Python 2.5

I'd really appreciate your comments. I'm afraid I cannot send a small code sample that reproduces this 'bug' in a simpler way, as it only appears at a quite high level on top of 3 big shared libraries + Boost + Python. My attempts to isolate some critical code to see this side-effect within "Boost.serialization only" or "Boost.Python only" failed. Moreover, they confirm that the code works as expected (hum... or seems to work...).

Thanks a lot for your help.

regards

frc

--
François Mauger
Département de Physique - Université de Caen Basse-Normandie
courriel/e-mail: mauger@lpccaen.in2p3.fr
tél./phone: 02 31 45 25 12 / (+33) 2 31 45 25 12
fax: 02 31 45 25 49 / (+33) 2 31 45 25 49
Adresse/address:
Laboratoire de Physique Corpusculaire de Caen (UMR 6534)
ENSICAEN
6, Boulevard du Marechal Juin
14050 CAEN Cedex
FRANCE
Francois Mauger wrote:
I do not understand where this 'bad_cast' exception comes from. For me it has nothing to do with:
- the format of the archive (I know the archive file is ok, as the test program stores and loads it properly)
- the wrapping within Python (I can make it run ok within a sample python script)
Hi Francois,

You could try adding this to the top of your script:

  import dl
  import sys
  flags = sys.getdlopenflags()
  sys.setdlopenflags(flags | dl.RTLD_GLOBAL)

-t
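A note for readers on newer interpreters: the 'dl' module used above was deprecated in Python 2.6 and is absent from Python 3, where the RTLD_* constants are exposed by the 'os' module (since Python 3.3). Below is a minimal sketch of the same idea, assuming a Unix platform and Python 3.3 or later. The helper name 'global_dlopen_flags' and the commented-out module name 'my_wrapper_module' are hypothetical and do not come from the thread.

```python
import contextlib
import os
import sys

# Flags the interpreter currently uses when it dlopen()s extension modules.
flags_before = sys.getdlopenflags()


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily add RTLD_GLOBAL to the flags used for extension imports.

    Hypothetical helper: it restores the previous flags on exit, so only
    the imports performed inside the 'with' block are affected.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)


with global_dlopen_flags():
    flags_inside = sys.getdlopenflags()
    # The wrapper module would be imported here, for example:
    # import my_wrapper_module   # hypothetical module name
```

Restoring the flags afterwards limits the change to the modules that need it, instead of switching every later import to RTLD_GLOBAL.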
Hi Troy,
You could try adding this to the top of your script:
  import dl
  import sys
  flags = sys.getdlopenflags()
  sys.setdlopenflags(flags | dl.RTLD_GLOBAL)
Great! It works (at least no problem appeared after several trials)! Thank you very much. I was so desperate! It also seems that I should use this 'trick' within my former work: it has probably worked for months only by chance!

Now that I have a solution, may I dare ask you for an explanation of this issue? I'm rather unskilled with this 'dl' stuff. What I don't understand is why my code was working in some cases and not in others, despite the fact that the imported wrapper modules were the same. Does it have to do with some unpredictable dynamic loading of symbols depending on the Python running context? Do you think this issue would also appear under Darwin (the other target OS for my libs)?

Thanks for your comments and many many thanks for this trick.

regards

frc
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Francois Mauger Laboratoire de Physique Corpusculaire de Caen et Universite de Caen ENSICAEN - 6, Boulevard du Marechal Juin, 14050 CAEN Cedex, FRANCE e-mail: mauger@lpccaen.in2p3.fr tel.: (0/+33) 2 31 45 25 12 fax: (0/+33) 2 31 45 25 49
François Mauger wrote:
Great! It works (at least no problem appeared after several trials)! Thank you very much. I was so desperate! It also seems that I should use this 'trick' within my former work: it has probably worked for months only by chance!
We (IceCube) use this combination of serialization and python bindings on a large scale on linux/osx/fbsd with good results. I wouldn't be too afraid.
Now that I have a solution, may I dare ask you for an explanation of this issue? I'm rather unskilled with this 'dl' stuff. What I don't understand is why my code was working in some cases and not in others, despite the fact that the imported wrapper modules were the same. Does it have to do with some unpredictable dynamic loading of symbols depending on the Python running context? Do you think this issue would also appear under Darwin (the other target OS for my libs)?
You can look at what the flags RTLD_GLOBAL and RTLD_LOCAL do in a call to dlopen() (manpage). It is worth writing some test cases and really learning what is happening. By default a python 'import' uses RTLD_LOCAL (use strace and see for yourself); the voodoo above sets the default to RTLD_GLOBAL, as you can probably guess.

There is also a paper called "How To Write Shared Libraries" by Ulrich Drepper with lots of good background information in it.

Another thing to play with: force an RTLD_GLOBAL load of the wrapped shared library by calling dlopen("libwrappedlib.so", RTLD_NOW | RTLD_GLOBAL) inside the BOOST_PYTHON_MODULE(wrappedlib) { ... } function.

-t
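The last suggestion can be tried from the Python side without recompiling, using 'ctypes', whose CDLL constructor accepts a 'mode' argument that is handed to dlopen(). This is a minimal sketch, assuming a Unix platform. The helper 'load_global' is hypothetical, and 'libwrappedlib.so' is only the placeholder name used in the message above. For demonstration the sketch passes None, which opens the running process itself, so it runs without that library being present.

```python
import ctypes


def load_global(library_name):
    """Load a shared library with RTLD_GLOBAL so that its symbols are
    visible to libraries loaded afterwards (hypothetical helper)."""
    return ctypes.CDLL(library_name, mode=ctypes.RTLD_GLOBAL)


# In a real script this would be done before importing the wrapper module:
# load_global("libwrappedlib.so")   # placeholder name from the thread

# Demonstration only: None makes dlopen() open the main program itself.
handle = load_global(None)
```

Compared with changing the interpreter-wide flags, this makes only the one library global, which may be enough if the type information that must be shared lives in that library.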
Hi Troy,

Thanks a lot for these comments! I'll have a look.

Best regards and happy neutrinos!

frc (forever searching for neutrinoless double beta...)
I think this is a manifestation of a bug reported with the 1.39 version of the library. I'm trying to fix this, but so far I haven't been able to implement a satisfactory solution. Stay tuned.

Robert Ramey
participants (4)
- Francois Mauger
- François Mauger
- Robert Ramey
- troy d. straszheim