On Sun, Jun 21, 2020 at 9:16 AM JeanHeyd Meneide via SG16
This is a review for the Boost.Text library, submitted a day late (but hopefully not a dollar short! (U.S. colloquialism, don't mind me!)).
The library has 3 somewhat related but (somewhat?) separable sub-libraries. In "building block" order, these are:
- A string layer (a new std::string)
- A unicode layer (algorithms and data)
- A text layer (string, but if it gave a single flying crap about Unicode)
[snip]
====== Layer 2 ======
This is the layer I am -- on a library design level and a personal philosophy level -- the most opposed to.
But my answer is still to accept it (well, modulo it being based on the above string type. Please just use std::string).
That seems to be the consensus.
[[ text ]] [[ rope ]]
While these containers can be evaluated individually, other reviews have already picked them over a great deal, so I won't bother. There was some grumbling that a rope-like data structure is not interesting enough to be included; I will just quietly wave that off as "my use case is the only use case that matters and therefore I don't care about other people's invariants or needs".
There are many implicitly (and explicitly) stated and maintained opinions in this layer:
- UTF-8 is the way, truth, and life.
- Unicode is the only encoding that matters ever, for all time, in perpetuity.
- Allocators are shit!
- NFC is probably the best we can do here for varying reasons.
- Who needs non-contiguous storage anyways?
- Who needs non-SBO storage, anyways?
These are all opinions, many of which are present in the design of the text container. And they allow this text container to ship. But that lack of flexibility -- while okay for Qt or Apple's CoreText or whatever other platform-specific hoo-ha you want to get involved with -- does not help users who already have systems of their own. In fact, it cuts them off: more than one person during Meeting C++ spoke to me of Boost.Text and said it could not meet their needs because it maintained encoding or normalization invariants that did not interoperate with their existing system. Storage is also an issue: while "I use boost::text::string underneath" is fine and dandy, not many systems (next to none, maybe?) are going to speak in "text" or its related string type. They will want the underlying container to speak to. For duck-type purposes, it works. But for everyone else, it fails.
Since the string layer uses an `int` for its size and capacity, it is lopsidedly incompatible with existing STL implementations of string, to the point that a reinterpret_cast -- however evil -- is not suitable for transporting a reference-without-copy into these APIs.
When text::text changes to use std::string, the size_type will naturally have to change to size_t. I intend to change the size_type of the others to be internally consistent. In for a penny, in for a pound.
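To make the size-type mismatch concrete, here is a minimal sketch. The `int_sized_string` struct is a hypothetical stand-in for any string that stores its size and capacity as `int`; it is not the actual boost::text::string definition, and the helper function names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a string whose size and capacity are `int`.
// This is NOT the real boost::text::string layout, only an illustration.
struct int_sized_string {
    char* data;
    int size;
    int capacity;
};

// True when the stand-in's size member has the same type as
// std::string::size_type (it does not: one is signed int, the other
// is an unsigned type).
constexpr bool size_types_match() {
    return std::is_same_v<decltype(int_sized_string::size),
                          std::string::size_type>;
}

// True when the stand-in's size member is a signed type.
constexpr bool stand_in_size_is_signed() {
    return std::is_signed_v<decltype(int_sized_string::size)>;
}

// True when std::string's size_type is an unsigned type.
constexpr bool std_size_is_unsigned() {
    return std::is_unsigned_v<std::string::size_type>;
}

// Small runtime check tying the three facts together.
inline void check_size_type_mismatch() {
    assert(!size_types_match());
    assert(stand_in_size_is_signed());
    assert(std_size_is_unsigned());
}
```

Because the member types differ in signedness (and usually in width), no cast can turn a reference to one string type into a reference to the other; crossing that boundary means a copy or a view.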
God bless string_view and its friends, because it allows us to at least continue to talk to some APIs since the text type guarantees contiguous storage. This means that at the boundaries of an application -- or even as a plugin to a wider ecosystem -- I am paying a (sometimes obscene) cost to interoperate between std::string/llvm::SmallString/unicode_code_unit_sequence and all the other things people have developed to sit between them and what they believe their string needs are. And while it is whack that so many of these classes exist, they do.
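As a rough illustration of the string_view point, the sketch below shows a boundary function that accepts std::string_view and can therefore be fed from any container with contiguous `char` storage without copying. Both `count_code_points` and `as_view` are hypothetical names invented for this example; neither is part of Boost.Text, and the counting logic assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical boundary API: counts code points in a UTF-8 sequence by
// counting every byte that is not a continuation byte (10xxxxxx).
// Assumes the input is valid UTF-8; it performs no validation.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: view any container exposing data() and size()
// over contiguous char storage, without copying the characters.
template <typename Container>
std::string_view as_view(const Container& c) {
    return std::string_view(c.data(), static_cast<std::size_t>(c.size()));
}
```

A std::string, a std::vector<char>, or a third-party small-string type with `data()` and `size()` can all go through `as_view` and reach the same API. The view only borrows the storage, so the owning container has to outlive it.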
That lack of interoperability -- and once again, the lack of an allocator template parameter -- hampers this library from COMPLETELY DOMINATING the string scene. It will always be used as a solution, maybe even 80% of the time. Those seeking more will have to figure out how to build their own UTF-16 containers, or their own special-encoded containers, with very little support from the text library (save for some transcoding functions they can leverage, but only from specific Unicode encodings).
I hope and expect that the idea floated earlier in the review -- of adding Unicode-invariant-maintaining free functions -- will make implementing your own types all but trivial.

Zach