On Sun, Feb 14, 2010 at 3:06 AM, Lars Viklund wrote:
On Sun, Feb 14, 2010 at 02:50:43AM +0530, Sachin Garg wrote:
My project uses both Boost and wxWidgets, and the Unicode encoding produced by each is different on Mac OSX. Everything works fine on Windows.
Problem: Boost and WX end up encoding strings differently when converting to Unicode on OSX. An example:
WX's encoding is the same on both Windows and OSX, but Boost's encoding differs between the two platforms. It is probably not a bug, but I am unable to figure out the reason, or how to make the two work together. Hex dumps of the Unicode encodings of this string:
Unicode has a number of different Normalization Forms [1]. A normalization form specifies how diacritics and composite codepoints are composed or decomposed in a string's representation.
The choice of NF is up to the OS; notably, OSX and Windows make different choices (OSX's HFS+ stores filenames in a decomposed form, while Windows APIs generally produce precomposed strings). The encoding of your strings seems to be the same; they're just composed differently.
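To make the difference concrete, here is a small Python sketch (Python's stdlib `unicodedata` stands in for whatever conversion layer Boost/WX use; the codepoint sequences themselves are what matter) showing the same character in composed (NFC) and decomposed (NFD) form:

```python
import unicodedata

s = "\u00e9"  # 'é' as a single precomposed codepoint (U+00E9)
nfc = unicodedata.normalize("NFC", s)  # composed form
nfd = unicodedata.normalize("NFD", s)  # decomposed: 'e' (U+0065) + combining acute (U+0301)

print([hex(ord(c)) for c in nfc])  # ['0xe9']
print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']

# Byte-for-byte unequal, yet canonically equivalent:
assert nfc != nfd
assert unicodedata.normalize("NFC", nfd) == nfc
```

This is exactly the kind of mismatch that shows up in hex dumps: both sequences render as the same glyph, but they do not compare equal as raw strings.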
Boost likely uses OS functions to convert between encodings while I assume that WX uses its own internally consistent transcoding.
[1] http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
Thanks, this explains a lot. Is there some std/boost way to specify which encoding/normalization to use, or to find out which one Boost defaults to? I will bring this up on the WX list too, but if there is no 'correct' way to decide which encoding to use, I will still need to make them compatible for my software to work. SG
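Whatever each library defaults to, one robust workaround is to normalize both sides to a single form before comparing or mixing them. A minimal Python sketch of the idea (the function name `canonical` is mine; in C++ the same approach is available via ICU's normalization API):

```python
import unicodedata

def canonical(s: str) -> str:
    """Normalize to composed form (NFC) so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", s)

osx_style = "Ko\u0308nig"  # 'o' + combining diaeresis (decomposed, OSX-style)
win_style = "K\u00f6nig"   # precomposed 'ö' (composed, Windows-style)

assert osx_style != win_style                        # raw strings differ
assert canonical(osx_style) == canonical(win_style)  # normalized, they match
```

NFC is a common choice of target form, but the key point is only that both libraries' output pass through the same normalization step before comparison.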