On Tue, Oct 29, 2019 at 5:11 AM Leon Mlakar via Boost-users <boost-users@lists.boost.org> wrote:
On 26.10.2019 03:11, Zach Laine via Boost-users wrote:
About 14 months ago I posted the same thing. There was significant work that needed to be done to Boost.Text (the proposed library), and I was a bit burned out.
Now I've managed to make the necessary changes, and I feel the library is ready for review, if there is interest.
This library, in part, is something I want to standardize.
It started as a better string library for namespace "std2", with minimal Unicode support. Though "std2" will almost certainly never happen now, those string types are still in there, and the library has grown to also include all the Unicode features most users will ever need.
GitHub: https://github.com/tzlaine/text
Online docs: https://tzlaine.github.io/text
If you care about portable Unicode support, or even about addressing the embarrassment of C++ being the only major production language with next to no Unicode support, please have a look and provide feedback.
Putting the issue of standardization aside, I would certainly love to see something like this included in Boost. After a quick read of your docs (about an hour), I'm not sure I'm happy with all the choices you've made (see some remarks below), but overall I see it as something I would use in the future. As you wrote, Unicode is hard even with a library like this; without one, it is nearly mission impossible.
A few remarks, for what they're worth:
- I've never seen std::string and thread (un)safety as an issue
Fair enough. As stated earlier in this thread, the thread safety is a side effect of rope's copy-on-write semantics. *That* is the reason rope is designed the way it is, not the thread safety. The thread safety simply comes for free once you have copy-on-write.
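To make the copy-on-write point concrete, here is a minimal sketch of the general technique, assuming a hypothetical `cow_string` handle built on `std::shared_ptr<const std::string>`. It is not Boost.Text's rope implementation. Copies share one immutable buffer, and a mutation builds a new buffer instead of modifying the shared one, so readers holding other handles never observe the write.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string handle (illustration only, not Boost.Text).
// All copies share one immutable buffer until one of them is modified.
class cow_string {
public:
    explicit cow_string(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    // Read access never mutates the shared buffer.
    const std::string& str() const { return *data_; }

    // Write access: build a modified copy and rebind only this handle to it.
    // Every other handle keeps pointing at the old, unchanged buffer.
    void append(const std::string& suffix) {
        data_ = std::make_shared<const std::string>(*data_ + suffix);
    }

    // True if both handles currently refer to the same buffer.
    bool shares_buffer_with(const cow_string& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;
};
```

The limit of the guarantee is worth stating: `std::shared_ptr` updates its reference count atomically, so distinct handles can be copied and used in different threads, but a single `cow_string` object that is written in one thread while being read in another still needs external synchronization.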
- the pattern if (x == npos) is now so common that IMHO it is important to preserve it
The std::string/std::string_view API is the only place in the STL where the algorithms do not return the end of the half-open input range on failure. That's really wonky. I don't care about preserving it.
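The two conventions can be compared with a small example that uses only the standard library; `split_npos` and `split_end` are hypothetical helpers written for illustration. `std::string_view::find` signals failure with `npos`, which is not a position in the string, while `std::find` returns the end of the searched range, which is a valid one-past-the-end position.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <utility>

// Split at the first ',' using the index/npos convention of string_view::find.
// npos is not a position in the string, so the failure case needs its own branch.
inline std::pair<std::string, std::string> split_npos(std::string_view s) {
    const auto pos = s.find(',');
    if (pos == std::string_view::npos)
        return {std::string(s), std::string()};
    return {std::string(s.substr(0, pos)), std::string(s.substr(pos + 1))};
}

// Same split using the <algorithm> convention: std::find returns last on failure.
// Because last is a valid position, [first, it) and [it, last) are well-formed
// ranges whether or not the search succeeded.
inline std::pair<std::string, std::string> split_end(std::string_view s) {
    auto first = s.begin();
    auto last = s.end();
    auto it = std::find(first, last, ',');
    std::string head(first, it);
    if (it != last) ++it;  // skip the separator only if one was found
    return {head, std::string(it, last)};
}
```

Both helpers produce the same results; the difference is that the iterator returned by `std::find` can be used directly as a range boundary, while `npos` has to be translated before it means anything.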
- for the sake of completeness, the normalization form used at the text level ought to be a policy parameter; although I do understand your arguments against it, I think it should be there, even at the cost of different text types not interoperating without conversions
I disagree. Policy parameters are bad for reasoning. If I see a text::text, as things currently stand, I know that it is stored as a contiguous array of UTF-8 and that it is normalized to FCC. If I add a template parameter to control the normalization, I change the invariants of the type. Types with different invariants should have different names. To do otherwise is a violation of the single responsibility principle.
- at the text level I'm not sure I'm willing to cope with different fundamental text types; I just want to use boost::text::text, pretty much the way I use std::string as an alias for a much more complex class template; heck, even at the string layer I'd probably prefer the rope/contiguous choice to be a policy parameter of the same class template.
That would be like adding a template parameter to std::vector that makes it act like a std::deque for certain values of that parameter. Changing the space and time complexity of a type by changing a template parameter is the wrong answer.
- views should be introduced as views and not mixed with rope/contiguous fundamental types
That does not sound like what I want either, but I'm not sure what this refers to. Could you be more specific?

Zach