On Mon, Oct 28, 2019 at 3:35 AM David Demelier via Boost-users <boost-users@lists.boost.org> wrote:
On 26/10/2019 at 03:11, Zach Laine via Boost-users wrote:
About 14 months ago I posted the same thing. There was significant work that needed to be done to Boost.Text (the proposed library), and I was a bit burned out.
Now I've managed to make the necessary changes, and I feel the library is ready for review, if there is interest.
This library, in part, is something I want to standardize.
It started as a better string library for namespace "std2", with minimal Unicode support. Though "std2" will almost certainly never happen now, those string types are still in there, and the library has grown to also include all the Unicode features most users will ever need.
GitHub: https://github.com/tzlaine/text
Online docs: https://tzlaine.github.io/text
I've read the intro on why std::string is so bad, and I have to disagree with many of its points.
1. The Fat Interface
In what way is std::string bloated? Of course some functions are probably there as synonyms, but to call it bloated is kind of false. Just look at the numerous functions of Java's String instead [0].
Comparing std::string to Java's string class is not doing std::string any favors.
And I
2. The Missing Unicode Support
Yes, many newcomers may be surprised to see that a string "é" has a size of 2 bytes (assuming UTF-8). But that is also the case for UTF-16 strings, which may contain surrogate pairs...
UTF-8 is the way to go and is stored efficiently. One could argue that we should have some UTF-8 iterators or things like that. But std::string is still a good candidate for string manipulation.
I agree that UTF-8 is the way to go (and as I think you've seen, the library reflects that). However, UTF-8 encoding is only part of the story. There is also normalization. If you use UTF-8-in-std::strings, normalization will not be enforced. (Neither will UTF-8 encoding, but that's less of a problem if you always intend to produce replacement characters for broken UTF-8.) Most users will want a type that enforces normalization as a class invariant. Those who do not still have the tools -- the algorithms and iterators in the Unicode layer -- to do that in a std::string if they want.
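To make the size and normalization points concrete, here is a minimal sketch using only the standard library (it does not use Boost.Text; the helper names e_acute_nfc, e_acute_nfd, count_code_points and demo are made up for illustration). It shows that "é" occupies 2 bytes in its precomposed form, and that the same character in decomposed form is a different byte sequence, so two std::strings a user would call equal compare unequal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Two UTF-8 encodings of the same user-perceived character "é".
// Precomposed (NFC): U+00E9          -> bytes C3 A9
// Decomposed  (NFD): U+0065 U+0301   -> bytes 65 CC 81
// (Hypothetical helper names, for illustration only.)
inline std::string e_acute_nfc() { return "\xC3\xA9"; }
inline std::string e_acute_nfd() { return "e\xCC\x81"; }

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes well-formed UTF-8; no validation is attempted.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

inline void demo() {
    // size() reports bytes (code units), not characters.
    assert(e_acute_nfc().size() == 2);
    assert(e_acute_nfd().size() == 3);

    // Even the code point counts differ between the two forms.
    assert(count_code_points(e_acute_nfc()) == 1);
    assert(count_code_points(e_acute_nfd()) == 2);

    // std::string compares bytes, so without normalization the two
    // spellings of "é" are simply different strings.
    assert(e_acute_nfc() != e_acute_nfd());
}
```

Calling demo() from main passes all assertions. A type that keeps its contents normalized as an invariant would make that last comparison come out equal, which is the gap a plain std::string leaves to the user.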
3. Miscellaneous Limitations
Not being thread-safe is an issue? Thank god it is not. Imagine the overhead of a threadsafe version of a string. The purpose of a library is not to be threadsafe on every object. That has to be handled on the user side.
I don't think all string types should be threadsafe, but having a threadsafe option is nice. That was not an explicit goal of adding ropes, but it is a nice side-effect of the choice I made for how to implement the ropes in Boost.Text.
That said, I really hope for better Unicode support in std:: in the near future. Your library is well designed and the API is clean; I hope it can be added to Boost :-).
Thanks, me too. :)

Zach