Re: [boost] [review] [text] Text formal review

14 Jun 2020

      On 14.06.20 01:25, Zach Laine via Boost wrote:
...
On Fri, Jun 12, 2020 at 4:15 PM Rainer Deyke via Boost
<boost@lists.boost.org> wrote:
...
A memmapped string_view /is/ a contiguous sequence of char.  I don't see
the difference.
The difference is mutability.  There's no perf concern with erasing
the first element of a string_view, if that's not even a supported
operation.
A /lot/ of strings, probably the vast majority, will never be mutated. 
And for the rest, the majority will only be mutated by appending.

Erasing the first element is a nice-to-have, but it is an expensive and
rarely used feature.  If you find yourself doing that a lot, then you
probably do want a rope.
...
...
Somewhere in the implementation of operator[] and operator(), there has
to be a branch on index < 0 (or >= 0) in order for that negative index
trick to work, which the compiler can't always optimize away.  Branches
are often affordable but they're not free.
Ah, I see, thanks.  Would it make you feel better if negative indexing
were only used when getting substrings?
That does address the performance problem, so yes.
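
A minimal sketch of the compromise agreed on above: plain element access takes an unsigned index and never tests the sign, while only the substring operation accepts negative, count-from-the-end indices. The names `resolve_index`, `at_unchecked`, and `substring` are hypothetical and are not taken from the library under review.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: maps a possibly negative index onto [0, size].
// This is the only place where the sign branch appears.
inline std::size_t resolve_index(std::ptrdiff_t i, std::size_t size) {
    return i < 0 ? size - static_cast<std::size_t>(-i)
                 : static_cast<std::size_t>(i);
}

// Element access takes an unsigned index, so there is no sign branch here.
inline char at_unchecked(std::string_view s, std::size_t i) {
    return s[i];
}

// Substring access pays for the branch twice per call, not once per element.
inline std::string_view substring(std::string_view s,
                                  std::ptrdiff_t lo, std::ptrdiff_t hi) {
    const std::size_t first = resolve_index(lo, s.size());
    const std::size_t last = resolve_index(hi, s.size());
    assert(first <= last && last <= s.size());
    return s.substr(first, last - first);
}
```

With this split, `substring("hello world", -5, 11)` yields the last five characters, and a loop over `at_unchecked` carries no per-element sign test.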
...
...
I hadn't thought through the interface in detail.  I just saw that this
was a feature of the text layer, and thought it would be nice to have in
the unicode layer, because I don't want to use the text layer (in its
current form).
I don't need a detailed interface.  Pseudocode would be fine too.
insert_nfd(string, position, thing_to_insert)
// Insert 'thing_to_insert' into 'string' at 'position'.  Both 'string'
// and 'thing_to_insert' are required to be in NFD.  The area around the
// insertion is renormalized to NFD.
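
One possible reading of the pseudocode above, as a rough sketch rather than a proposed interface. It assumes the string is stored as UTF-32 code points and that the caller supplies a Canonical_Combining_Class lookup (`ccc`, where 0 means a starter). Because both inputs are already fully decomposed, the only repair needed is canonical reordering of the combining marks that meet at the two seams. `toy_ccc` is a hypothetical stand-in that knows just two marks; a real implementation would consult the Unicode Character Database.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy lookup: U+0323 COMBINING DOT BELOW has ccc 220,
// U+0301 COMBINING ACUTE ACCENT has ccc 230; everything else is
// treated as a starter (ccc 0).
inline int toy_ccc(char32_t c) {
    switch (c) {
        case 0x0323: return 220;
        case 0x0301: return 230;
        default:     return 0;
    }
}

// Inserts 'piece' into 's' at code point index 'pos'. Both must already be
// in NFD. Only the area around the insertion is put back into canonical order.
template <typename CccFn>
void insert_nfd(std::u32string& s, std::size_t pos,
                const std::u32string& piece, CccFn ccc) {
    assert(pos <= s.size());
    s.insert(pos, piece);

    // Widen the window over the combining marks adjacent to either seam.
    std::size_t first = pos;
    while (first > 0 && ccc(s[first - 1]) != 0) --first;
    std::size_t last = pos + piece.size();
    while (last < s.size() && ccc(s[last]) != 0) ++last;

    // Stable-sort each maximal run of non-starters in the window by ccc.
    std::size_t i = first;
    while (i < last) {
        if (ccc(s[i]) == 0) { ++i; continue; }
        std::size_t j = i;
        while (j < last && ccc(s[j]) != 0) ++j;
        std::stable_sort(s.begin() + static_cast<std::ptrdiff_t>(i),
                         s.begin() + static_cast<std::ptrdiff_t>(j),
                         [&](char32_t a, char32_t b) { return ccc(a) < ccc(b); });
        i = j;
    }
}
```

The work is proportional to the inserted text plus the neighbouring combining marks, not to the length of the whole string.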
...
...
Having to renormalize at API boundaries can be prohibitively expensive.
Sure.  Anything can be prohibitively expensive in some context.  If
that's the case in a particular program, I think it is likely to be
unacceptable to use text::operator+(string_view) as well, since that
also does on-the-fly normalization.
Hopefully only on the string_view and the area immediately surrounding 
the insertion.
...
Someone, somewhere, has to pay
that cost if you want to use two chunks of text in
encoding/normalization A and B.  You might be able to keep working in
A for some text and keep working in B separately for other text, but I
think code that works like that is going to be hard to reason about,
and will be as common as code that freely mixes wstring and string
(and I mean not only at program boundaries).  That is, not very
common.
Which is why I want to avoid just that.

Your suggestion:

   void f() {
     // renormalizes to fcc
     text::text t = api_function_that_returns_nfd();
     do_something_with(t);
     std::string s;
     text::normalize_to_nfd(t.extract(), std::back_inserter(s));
     api_function_that_accepts_nfd(s);
   }

My suggestion:

   void f() {
     text::text<nfd, std::string> t = api_function_that_returns_nfd();
     do_something_with(t);
     api_function_that_accepts_nfd(t.extract());
   }
...
That's what I don't get.  Could you explain how text<A> and text<B>
are useful in a specific case?
text<deque<char> >, for fast insertion/removal at both ends?

But it's really text::text<std::string> that I'm after, so if 
text::text<std::string> becomes just text::text, then I'm satisfied.
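
To illustrate what a container parameter buys, here is a small sketch of a wrapper whose storage type is a template argument. `basic_text_sketch` and its members are hypothetical and are not the reviewed library's interface; normalization is left out entirely so that only the storage trade-off is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical wrapper: identical operations, caller-chosen storage.
template <typename Container>
class basic_text_sketch {
public:
    void append(std::string_view sv) {
        data_.insert(data_.end(), sv.begin(), sv.end());
    }
    // With std::string this shifts every existing char; with
    // std::deque<char> the cost depends only on the inserted length.
    void prepend(std::string_view sv) {
        data_.insert(data_.begin(), sv.begin(), sv.end());
    }
    // Same trade-off as prepend when removing from the front.
    void erase_front(std::size_t n) {
        assert(n <= data_.size());
        data_.erase(data_.begin(),
                    data_.begin() + static_cast<std::ptrdiff_t>(n));
    }
    const Container& storage() const { return data_; }

private:
    Container data_;
};
```

Instantiated with `std::string`, the storage can be handed to existing APIs without a copy; instantiated with `std::deque<char>`, operations at the front become cheap at the price of non-contiguous storage.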

-- 
Rainer Deyke (rainerd@eldwood.com)