[serialization] [libstdc++] [detail] utf8_codecvt_facet fixes broke serialization test_array_xml_warchive
The specific crash message is:
*** Error in `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive': double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
This is mainly a heads up to let people know that the serialization problem in develop is being worked on, but it may be a day or two before I have a fix.
--Beman
Beman Dawes wrote
The specific crash message is:
*** Error in `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive': double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
This is mainly a heads up to let people know that the serialization problem in develop is being worked on, but it may be a day or two before I have a fix.
Hmmmm - this looks like new behavior. I don't remember changing anything that might provoke this. Am I wrong or is there some other change (perhaps in another library) which provokes this?
Since C++11 we have had some problems with utf8_codecvt_facet due to confusion between the now "built-in" implementation and the original "home grown" version. It took some time to sort out because it varied according to which combinations of compiler version and compiler switches were selected, and no one has all combinations on their desktop. So fair warning about being too hasty about fixing this or declaring it fixed. I got trapped several times this way.
Also note that it seems that it is only used on wide character strings and lots of other libraries don't require these. So it might be wrong and our tests might not be sufficiently exhaustive to detect this.
This raises another interesting question. For many years we've been relying on Ron Garcia's original codecvt facet, which has worked fine. This in spite of the fact that it was never reviewed and attempts to include it in boost outside of the detail directories were rebuffed. I snuck the documentation and tests of it into the serialization library as I needed it and had no other choice. But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem.
A better solution might be a new library for codecvt facets. There is a rich opportunity here. The codecvt interface is actually quite general, and codecvt facets can be used for translating text from one coding to another even without any i/o involved. This new library would consist of:
a) a codecvt "construction kit" consisting of code from the data flow iterators of the serialization library and/or implementations from the boost range library.
b) This "construction kit" would permit one to compose a "codecvt stack" of conversions at compile time.
c) Any such "codecvt stack" could be used as a stream facet or as a stand-alone way to translate one character stream to another.
Having invested some time learning about how codecvt facets work, I've come to the conclusion that they are largely underappreciated. I'm guessing billions of lines of hand rolled code (BLOCs) which implement conversions on a pair by pair basis could be replaced by such a library. Making such a library would be more or less straightforward, but would require a lot of care with related issues such as its documentation in order to make it more widely used. But the person who does this will likely become as famous as I am.
Robert Ramey
--
View this message in context: http://boost.2283326.n4.nabble.com/serialization-libstdc-detail-utf8-codecvt...
Sent from the Boost - Dev mailing list archive at Nabble.com.
On Thu, Sep 4, 2014 at 11:59 AM, Robert Ramey
Beman Dawes wrote
The specific crash message is:
*** Error in
`../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive':
double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
Although the regression tests are only showing failures on non-Windows systems, the failure is also easy to reproduce using cygwin/gcc on Windows. It occurs in both C++03 and C++11 modes.
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
This is mainly a heads up to let people know that the serialization problem in develop is being worked on, but it may be a day or two before I have a fix.
Hmmmm - this looks like new behavior.
Actually, this is the same problem Marshall ran into a year or so ago when he fixed boost/detail/utf8_codecvt_facet.hpp:
--- C:\Users\Beman\AppData\Local\Temp\TortoiseGit\utf253F.tmp\utf8_codecvt_facet-5ef03bf-left.hpp 2014-09-05 10:23:25.000000000 -0400
+++ C:\boost\modular\develop\libs\detail\include\boost\detail\utf8_codecvt_facet.hpp 2014-09-05 08:43:11.000000000 -0400
@@ -89,13 +89,13 @@
 namespace std { using ::mbstate_t; using ::size_t; }
 #endif
-#if !defined(__MSL_CPP__) && !defined(__LIBCOMO__)
+#if defined(_CPPLIB_VER) && (_CPPLIB_VER < 540)
 #define BOOST_CODECVT_DO_LENGTH_CONST const
 #else
 #define BOOST_CODECVT_DO_LENGTH_CONST
 #endif
 // maximum lenght of a multibyte string
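For readers who have not looked at the header: BOOST_CODECVT_DO_LENGTH_CONST decides whether the std::mbstate_t parameter of do_length is declared const. The following is only a minimal sketch, using hypothetical stand-in classes (base_facet, mismatched_facet, matched_facet) rather than the real std::codecvt, of why picking the wrong variant can compile without error and still leave the base class version in effect. Whether this mechanism is what produces the double free reported in this thread is not established here.

```cpp
#include <cassert>

// Hypothetical stand-in for std::mbstate_t.
struct state {};

// Hypothetical stand-in for the standard facet: a public non-virtual
// entry point forwards to a protected virtual do_length.
struct base_facet {
    virtual ~base_facet() {}
    int length(state& s) const { return do_length(s); }
protected:
    virtual int do_length(state&) const { return 0; }
};

// Wrong constness on the parameter: this declares a NEW virtual function
// that hides the base one instead of overriding it.
struct mismatched_facet : base_facet {
protected:
    virtual int do_length(const state&) const { return 1; }
};

// Same signature as the base: this one really overrides.
struct matched_facet : base_facet {
protected:
    virtual int do_length(state&) const { return 2; }
};

// Usage: calls made through the base interface only reach a true override.
inline void check_override_behaviour() {
    state s;
    mismatched_facet wrong;
    matched_facet right;
    assert(wrong.length(s) == 0);  // base version runs, derived one is ignored
    assert(right.length(s) == 2);  // derived version runs
}
```

The point of the sketch is that the compiler accepts both derived classes, so a wrong macro value would only show up at run time, if at all.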
I don't remember changing anything that might provoke this. Am I wrong or is there some other change (perhaps in another library) which provokes this? Since C++11 we had some problems with utf8_codecvt_facet due to confusion between the now "built-in" implementation and the original "home grown" version.
AFAIK, serialization is the only library that tries to switch between the std:: version and the boost:: version. It is quite clear the bug is in serialization (or even the libstdc++ codecvt) rather than in the boost::detail code.
It took some time to sort out because it varied according to which combinations of compiler version and compiler switches were selected and no one has all combinations on their desktop.
The bug is showing up regardless of the compiler version or switches. It is easy to demonstrate; just switch back and forth between the two versions of the #if line.
So fair warning about being too hasty about fixing this or declaring it fixed. I got trapped several times this way.
The #if bug and several other bugs in boost::detail got introduced while trying to make serialization work, around the time Marshall introduced his original patch. While those changes papered over the problem in serialization, they are causing bug reports to be posted against other libraries, particularly filesystem.
Also note that it seems that it is only used on wide character strings and lots of other libraries don't require these. So it might be wrong and our tests might not be sufficiently exhaustive to detect this.
In filesystem, all BSD-based operating systems (such as Mac OS X) use the boost::detail code.
This raises another interesting question. For many years we've been relying on Ron Garcia's original codecvt facet which has worked fine. This in spite of the fact that it was never reviewed and attempts to include in boost outside of the detail directories were rebuffed. I snuck the documentation and tests of it into the serialization library as I needed it and had no other choice.
But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem.
Does any Boost library other than serialization try to switch between boost:: and std:: versions?
A better solution might be a new library for codecvt facets. There is a rich opportunity here.
Why? Microsoft, for example, ships codecvt facets for 79 character sets, including the difficult Asian character sets. Why should boost try to duplicate the work that vendors have already done, particularly when Unicode has become predominant?
--Beman
Beman Dawes wrote:
Actually, this is the same problem Marshall ran into a year or so ago when he fixed boost/detail/utf8_codecvt_facet.hpp:
...
-#if !defined(__MSL_CPP__) && !defined(__LIBCOMO__)
+#if defined(_CPPLIB_VER) && (_CPPLIB_VER < 540)
 #define BOOST_CODECVT_DO_LENGTH_CONST const
 #else
 #define BOOST_CODECVT_DO_LENGTH_CONST
 #endif
As I already said months ago (I think), there's an easy fix for that: just define do_length two times, one const, one non-const. You won't have to #ifdef then, and it will work everywhere.
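A rough sketch of what that suggestion could look like, assuming the two variants differ only in the constness of the std::mbstate_t parameter. The class name dual_length_facet and the helper utf8_prefix_bytes are hypothetical and are not the Boost implementation; only do_length is customized, and the byte counting is a simplified UTF-8 scan that does not validate continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical facet: declares do_length with both parameter forms, so
// whichever one the standard library's base class declares gets overridden.
class dual_length_facet : public std::codecvt<wchar_t, char, std::mbstate_t> {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> base_type;
public:
    explicit dual_length_facet(std::size_t refs = 0) : base_type(refs) {}

protected:
    // Form used by libraries that pass the state by non-const reference.
    virtual int do_length(std::mbstate_t&, const char* from,
                          const char* from_end, std::size_t max) const {
        return utf8_prefix_bytes(from, from_end, max);
    }
    // Form used by libraries that pass the state by const reference.
    virtual int do_length(const std::mbstate_t&, const char* from,
                          const char* from_end, std::size_t max) const {
        return utf8_prefix_bytes(from, from_end, max);
    }

private:
    // Number of bytes in [from, from_end) that make up at most `max`
    // complete UTF-8 sequences. Stops at an invalid lead byte or a
    // truncated sequence; continuation bytes are not validated.
    static int utf8_prefix_bytes(const char* from, const char* from_end,
                                 std::size_t max) {
        assert(from <= from_end);
        const char* p = from;
        std::size_t chars = 0;
        while (p != from_end && chars < max) {
            const unsigned char lead = static_cast<unsigned char>(*p);
            const std::ptrdiff_t len = lead < 0x80         ? 1
                                     : (lead >> 5) == 0x06 ? 2
                                     : (lead >> 4) == 0x0E ? 3
                                     : (lead >> 3) == 0x1E ? 4
                                                           : 0;
            if (len == 0 || from_end - p < len) break;
            p += len;
            ++chars;
        }
        return static_cast<int>(p - from);
    }
};

// Usage: the public length() member forwards to whichever do_length the
// base class declares virtual.
inline void check_dual_length() {
    dual_length_facet facet(1);  // refs = 1: lifetime managed by the caller
    std::mbstate_t state = std::mbstate_t();
    const char text[] = "a\xC3\xA9z";  // 'a', U+00E9 (two bytes), 'z'
    assert(facet.length(state, text, text + 4, 2) == 3);
    assert(facet.length(state, text, text + 4, 10) == 4);
}
```

On a library where the base class declares only one of the two forms, the other overload is simply an extra virtual function that is never called through the base interface, which is why no #ifdef is needed.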
Beman Dawes wrote
On Thu, Sep 4, 2014 at 11:59 AM, Robert Ramey <ramey@> wrote:
Does any Boost library other than serialization try to switch between boost:: and std:: versions?
I'm just trying to make sure that the serialization library works on both C++03 and C++11 systems. Support for utf8_codecvt differs among the libraries which come with the systems. On many platforms, C++11 includes std support for utf8_codecvt and that's what I use if it exists, while C++03 libraries don't come with such support, and those that did had inconsistent interfaces. I made changes which use config to select the std version if the library supports it and the boost version otherwise. That was the only way I saw to avoid invalidating all existing files created with the serialization library.
A better solution might be a new library for codecvt facets. There is a rich opportunity here.
Why? Microsoft, for example, ships codecvt facets for 79 character sets, including the difficult Asian character sets. Why should boost try to duplicate the work that vendors have already done, particularly when Unicode has become predominant?
The codecvt interface is quite general purpose and could be used for a lot of other things, like translating binary data into base64. This could be used to decouple things like base64 from the user program and just make it a component of the stream buffer. Also there's the possibility of making a codecvt composer such that other codecvt types could be piped together. Basically, it would be a kit including a set of stream transformation primitives which could be arbitrarily composed to generate a more complex stream. FWIW that's what I had in mind.
Robert Ramey
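To illustrate only the narrower point that a codecvt facet can be driven directly, with no stream or stream buffer involved, here is a small sketch. The helper name widen_with_facet is hypothetical, and the example uses the facet of the classic "C" locale, so it is only meant for plain ASCII input; it does not attempt the composition or base64 ideas described above.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a narrow string to a wide string by calling
// the locale's codecvt facet directly on in-memory buffers.
inline std::wstring widen_with_facet(
        const std::string& narrow,
        const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    // Assumes at most one wchar_t is produced per input byte.
    std::wstring wide(narrow.size(), L'\0');
    if (narrow.empty()) return wide;

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;
    const std::codecvt_base::result r = facet.in(
        state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt conversion failed");

    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}

// Usage: ASCII text converted without any i/o.
inline void check_widen() {
    assert(widen_with_facet("hello") == L"hello");
    assert(widen_with_facet("").empty());
}
```

Passing a different locale, for example one carrying a UTF-8 facet, would reuse the same helper unchanged, which is the sense in which the interface is independent of streams.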
On Thursday 04 September 2014 08:59:53 Robert Ramey wrote:
But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem. A better solution might be a new library for codecvt facets.
We already have such a library - Boost.Locale.
Andrey Semashev-2 wrote
On Thursday 04 September 2014 08:59:53 Robert Ramey wrote:
But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem. A better solution might be a new library for codecvt facets.
We already have such a library - Boost.Locale.
I didn't know that. I looked at the documentation and didn't find a drop-in replacement for the utf8_codecvt facet. I admit I didn't spend a lot of time looking though. I'll look later.
Robert Ramey
On Saturday 06 September 2014 17:31:09 Robert Ramey wrote:
Andrey Semashev-2 wrote
On Thursday 04 September 2014 08:59:53 Robert Ramey wrote:
But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem. A better solution might be a new library for codecvt facets.
We already have such a library - Boost.Locale.
I didn't know that. I looked at the documentation and didn't find a drop-in replacement for the utf8_codecvt facet. I admit I didn't spend a lot of time looking though. I'll look later.
It's not a drop-in replacement, but you can generate a locale with the appropriate codecvt facet. http://www.boost.org/doc/libs/1_56_0/libs/locale/doc/html/charset_handling.h...
Now that I'm thinking about it, I'm remembering a bit more. I have a free-standing test of the utf8_codecvt facet as part of the serialization library tests. This test has always passed. I have a test, test_array_warchive, which failed originally (and now again) with a run time error - double deletion of some object. I spent a huge amount of time trying to figure this out. I concluded that, depending on inclusions, sometimes the standard library implementation would be linked with the header from the boost utf8_codecvt (or something like that). So this problem only showed in this one test, which was not specific to utf8_codecvt facets. Maybe I got the fix wrong, but now test_array_warchive is failing again on develop.
Robert Ramey
Beman Dawes wrote
The specific crash message is:
*** Error in `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive': double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
This is mainly a heads up to let people know that the serialization problem in develop is being worked on, but it may be a day or two before I have a fix.
I'm not seeing where this came from. This problem existed several months ago and I spent some time tracking it down, making changes, and fixed it. Problems are not showing up in the master branch. Now my fix was backed out. Presumably something wasn't quite right about it. But now the original problem has reappeared in the develop branch. I presume that whoever backed out my fix is "fixing the fix" so I can just ignore it. I am sort of curious why my fix was backed out. There wasn't any issue with the serialization library until that was done. So there must have been some other issue. What was it? I'm just curious.
Robert Ramey
Beman Dawes wrote
The specific crash message is:
*** Error in `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive': double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
None of the other text_warchive tests (~50) are failing either. It's
a problem which only appears in this one specific case. That's not
to say that test is the problem, it's just that it appears in that one case.
I fixed this months ago.
Now that it's been fixed again - the offending test is failing again on
all platforms using libstdc++.
I suggest rolling back the most recent change then everything
should start passing again.
Robert Ramey
participants (4)
- Andrey Semashev
- Beman Dawes
- Peter Dimov
- Robert Ramey