On 26.10.19 18:41, Zach Laine via Boost-users wrote:
> NFC, very close to FCC, is more popular, due to its compactness. I picked the normalization form with the most readily available time and space optimizations, and then stuck to just that one -- the alternative is many text types with different normalizations having to interoperate, which sounds like hell.

I can understand that, all other things being equal, the more compact form might be preferable. I mean, if you know nothing about Unicode normalization forms other than that one is more compact than the other, then you might as well pick the more compact one, right?

But all other things are clearly /not/ equal, or you would just use NFC. And the difference in compactness between NFC and NFD is completely trivial. I challenge you to find any real-world text where the difference in size between NFC and NFD is big enough that I should care about it, both in absolute and relative terms.

I consider FCC a non-solution to a non-problem. The advantage of NFC over NFD is not compactness, but compatibility with interfaces that expect NFC. Since FCC does not provide that advantage, there is no reason to choose FCC over NFD.

On the other hand, there are several good reasons for choosing NFD over FCC. Aside from the obvious one - compatibility with interfaces that expect NFD - there's also cleaner, simpler code with fewer surprises. For example, it is a completely straightforward operation to replace all acute accents in an NFD text with grave accents or to remove acute accents entirely, whereas the FCC equivalent effectively requires transcoding to NFD.

In summary, I think you should support NFD text types, either in addition to FCC or instead of it.

--
Rainer Deyke (rainerd@eldwood.com)
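
P.S. To make the accent example concrete, here is a minimal sketch in Python, using only the standard `unicodedata` module. It is an illustration, not a proposal for the library's API: the function names are made up for this post, and since `unicodedata` has no FCC form, NFC stands in for "a composed form" in the round-trip helper. The last helper is just a way to measure the NFC/NFD size difference on text of your own choosing.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def replace_acute_with_grave(nfd_text: str) -> str:
    """On NFD input this is a plain code point substitution.

    Both marks have the same canonical combining class (230),
    so the result is still in NFD.
    """
    return nfd_text.replace(ACUTE, GRAVE)


def remove_acute(nfd_text: str) -> str:
    """On NFD input, dropping the mark is a plain deletion."""
    return nfd_text.replace(ACUTE, "")


def edit_composed(text: str, edit) -> str:
    """For composed input the edit has to go through NFD.

    Assumption: NFC is used as the composed form here because
    the standard library does not implement FCC.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", edit(decomposed))


def utf8_sizes(text: str) -> tuple[int, int]:
    """Return (NFC size, NFD size) of text in UTF-8 bytes."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return len(nfc.encode("utf-8")), len(nfd.encode("utf-8"))
```

With NFD text the first two functions are all that is needed. With composed text, a precomposed character such as U+1EA5 (a with circumflex and acute) contains no U+0301 code point at all, so a direct search finds nothing to replace; that is why `edit_composed` has to decompose first and recompose afterwards.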