Re: [Boost-users] Interest in a Unicode library for Boost?

29 Oct 2019

On Tue, Oct 29, 2019 at 5:11 AM Leon Mlakar via Boost-users <boost-users@lists.boost.org> wrote:
...
On 26.10.2019 03:11, Zach Laine via Boost-users wrote:
...
About 14 months ago I posted the same thing.  There was significant
work that needed to be done to Boost.Text (the proposed library), and
I was a bit burned out.
Now I've managed to make the necessary changes, and I feel the library
is ready for review, if there is interest.
This library, in part, is something I want to standardize.
It started as a better string library for namespace "std2", with
minimal Unicode support.  Though "std2" will almost certainly never
happen now, those string types are still in there, and the library has
grown to also include all the Unicode features most users will ever need.
Github: https://github.com/tzlaine/text
Online docs: https://tzlaine.github.io/text
If you care about portable Unicode support, or even addressing the
embarrassment of being the only major production language with next to
no Unicode support, please have a look and provide feedback.
Putting the issue of standardization aside, I certainly would love to see
something like that included in Boost. After a quick read of your docs
(about an hour), I'm not sure I'm happy with all the choices you've made
(see some remarks below) but overall I see it as something I would use
in the future. As you wrote, Unicode is hard, even with a library like
this; nearly mission impossible without.
A few remarks, for what they are worth:
- I've never seen std::string and thread (un)safety as an issue
Fair enough.  As stated previously in this thread, the threadsafety feature
is a side effect that comes from the copy-on-write semantics of rope.
*That* is the reason rope is designed the way it is, not the threadsafety
part.  It just happens that the threadsafety part comes for free when you
do the copy-on-write part.
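
To illustrate why thread safety falls out of copy-on-write, here is a minimal sketch. It is a toy type, not Boost.Text's rope: the name cow_string and all of its members are hypothetical, and a real rope shares tree nodes rather than one flat buffer. The idea is the same, though: shared data is never modified in place, so separate copies can be read and "modified" from different threads without locking.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy type for illustration only; not part of Boost.Text.
class cow_string {
public:
    explicit cow_string(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    // Copying shares the immutable buffer; only a reference count changes.
    cow_string(const cow_string&) = default;
    cow_string& operator=(const cow_string&) = default;

    // "Mutation" builds a fresh buffer and rebinds this object alone.
    // Other copies keep pointing at the old, untouched buffer.
    void append(const std::string& tail) {
        data_ = std::make_shared<const std::string>(*data_ + tail);
    }

    const std::string& str() const { return *data_; }

    // True when both objects still refer to the same underlying buffer.
    bool shares_buffer_with(const cow_string& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;
};
```

Because the shared buffer is const and std::shared_ptr updates its reference count atomically, two threads that each own their own cow_string copy never race on the shared data. Concurrent access to one and the same cow_string object would still need external synchronization.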
...
- the pattern if (x == npos) is now so common that it is imho important
to preserve it
The std::string/std::string_view API is the only place in the STL where the
algorithms do not return the end of the half-open input range on failure.
That's really wonky.  I don't care about preserving it.
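
The two conventions can be compared side by side using only the standard library. This is a small sketch; the helper function names are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Member find: failure is signalled by the sentinel index std::string::npos.
inline bool contains_via_npos(const std::string& s, char c) {
    return s.find(c) != std::string::npos;
}

// Generic algorithm: failure is signalled by returning the end of the range.
inline bool contains_via_end(const std::string& s, char c) {
    return std::find(s.begin(), s.end(), c) != s.end();
}

// With the iterator convention the "not found" result is still a valid
// bound for the next algorithm call, so no special case is needed.
inline std::size_t count_before(const std::string& s, char stop, char what) {
    auto last = std::find(s.begin(), s.end(), stop);
    return static_cast<std::size_t>(std::count(s.begin(), last, what));
}
```

In count_before, a missing stop character simply means the whole string is scanned. The npos version of the same function would need an explicit branch before the index could be used as a bound.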
...
- for the sake of completeness the normalization type used at the text
level ought to be a policy parameter; although I do understand your
arguments against it I think it should be there even at the cost of
different text types not being interoperable without conversions
I disagree.  Policy parameters are bad for reasoning.  If I see a
text::text, as things currently stand, I know that it is stored as a
contiguous array of UTF-8, and that it is normalized FCC.  If I add a
template parameter to control the normalization, I change the invariants of
the type.  Types with different invariants should have different names.  To
do otherwise is a violation of the single responsibility principle.
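
A rough sketch of the two designs being contrasted follows. Every name in it (normalization, basic_text, fcc_text, is_fcc) is hypothetical and chosen for the example; none of it is taken from Boost.Text.

```cpp
#include <string>
#include <type_traits>

// All names here are hypothetical and exist only for this illustration.
enum class normalization { nfc, nfd, fcc };

// Policy-parameter design: one template, invariant chosen per instantiation.
template <normalization N>
struct basic_text {
    std::string utf8;
};

// Distinct-name design: the name alone fixes the invariant.
struct fcc_text {
    std::string utf8;
};

// Different instantiations are unrelated types, so mixing them
// requires an explicit conversion either way.
static_assert(!std::is_same<basic_text<normalization::nfc>,
                            basic_text<normalization::fcc>>::value,
              "instantiations with different policies are different types");

// Code written against the template cannot assume a form; it must check.
template <normalization N>
bool is_fcc(const basic_text<N>&) {
    return N == normalization::fcc;
}

// Code written against the named type knows the answer from the name.
inline bool is_fcc(const fcc_text&) {
    return true;
}
```

The template version does not remove the distinction between the types. It only moves it from the type's name into a parameter that every reader and every generic function then has to inspect.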
...
- at the text level I'm not sure I'm willing to cope with different
fundamental text types; I just want to use boost::text::text, pretty
much the same as I use std::string as an alias for a much more complex
class template; heck, even at the string layer I'd probably prefer
rope/contiguous concept to be a policy parameter to the same type template.
That would be like adding a template parameter to std::vector that makes it
act like a std::deque for certain values of that parameter.  Changing the
space and time complexity of a type by changing a template parameter is the
wrong answer.
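
For a concrete reminder of how different the guarantees behind those two names are, here is a short sketch using only the standard containers. The function names are invented for the example.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// std::deque: push_back never invalidates references to existing elements,
// so the address of the first element stays the same as the deque grows.
inline bool deque_front_stays_put(std::size_t extra) {
    std::deque<int> d{1};
    const int* before = &d.front();
    for (std::size_t i = 0; i < extra; ++i) {
        d.push_back(static_cast<int>(i));
    }
    return before == &d.front();
}

// std::vector: elements are guaranteed contiguous, which std::deque does
// not promise; in exchange, growth may move every element.
inline bool vector_is_contiguous(std::size_t n) {
    std::vector<int> v(n + 1);  // at least one element
    return &v.back() == v.data() + (v.size() - 1);
}
```

A caller that relies on either property depends on which container it has. Hiding that choice behind a template parameter would make those guarantees vary under a single name.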
...
- views should be introduced as views and not mixed with rope/contiguous
fundamental types
That does not sound like what I want either, but I don't know what this
refers to.  Could you be specific?

Zach