On Sun, Jun 14, 2020 at 7:25 AM Rainer Deyke via Boost
On 14.06.20 01:25, Zach Laine via Boost wrote:
On Fri, Jun 12, 2020 at 4:15 PM Rainer Deyke via Boost wrote:
A memmapped string_view /is/ a contiguous sequence of char. I don't see the difference.
The difference is mutability. There's no perf concern with erasing the first element of a string_view, if that's not even a supported operation.
A /lot/ of strings, probably the vast majority, will never be mutated.
Ok, then those should more appropriately be string_views.
And for the rest, the majority will only be mutated by appending.
That does not help, unless the capacity is so large that a reallocation is unnecessary.
Erasing the first element is a nice to have but expensive and rarely used feature. If you find yourself doing that a lot, then you probably do want a rope.
Any mutation might cause a reallocation. I named one of the worst-case operations rhetorically, but appending is also bad if it causes that reallocation. It's not a question of what kind of mutating operation you're doing, but whether you're mutating or not.
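To make the reallocation point concrete, here is a minimal sketch. It uses std::string as a stand-in (not Boost.Text's own string type), and the helper name count_reallocations is made up for illustration. It counts how often the capacity changes while appending one character at a time, with and without reserving the full size up front.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts how
// often the capacity changes, i.e. how often the buffer was reallocated.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(n);  // capacity is now at least n
    }
    std::size_t reallocations = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != capacity) {
            ++reallocations;
            capacity = s.capacity();
        }
    }
    assert(s.size() == n);
    return reallocations;
}
```

With the capacity reserved up front, appending never reallocates. Without it, appending does reallocate at least once as soon as the string outgrows its initial buffer. The exact count depends on the standard library's growth policy.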
I hadn't thought through the interface in detail. I just saw that this was a feature of the text layer, and thought it would be nice to have in the unicode layer, because I don't want to use the text layer (in its current form).
I don't need a detailed interface. Pseudocode would be fine too.
insert_nfd(string, position, thing_to_insert)
// Insert 'thing_to_insert' into 'string' at 'position'. Both 'string'
// and 'thing_to_insert' are required to be in NFD. The area around the
// insertion is renormalized to NFD.
I see -- no surprises here. As I said, I like this idea a lot! However, see below.
Having to renormalize at API boundaries can be prohibitively expensive.
Sure. Anything can be prohibitively expensive in some context. If that's the case in a particular program, I think it is likely to be unacceptable to use text::operator+(string_view) as well, since that also does on-the-fly normalization.
Hopefully only on the string_view and the area immediately surrounding the insertion.
No, that's why I picked string_view, and not text_view. text_view insertion does not normalize the incoming text, but string_view insertion does. This is in keeping with the philosophy:

- At program I/O boundaries (not all API boundaries), convert to UTF-8 and FCC.
- Internal interfaces that take UTF-8/FCC will not transcode or normalize.
- Internal interfaces that take non-UTF-8/FCC will transcode and normalize as needed.

text::operator+(string_view sv) does not know the normalization of sv, so it normalizes. The alternative is clunky -- you have to make a new string somewhere to normalize into, and then use operator+() on the result.
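As a rough illustration of that split, here is a sketch in which the overload chosen decides whether normalization runs. Everything in it is hypothetical: toy_text, toy_normalize, and g_normalize_calls are invented names, and ASCII lower-casing stands in for real Unicode normalization, which this sketch does not implement. A real implementation would also have to renormalize around the join point, which the toy ignores.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Counts normalization passes so the cost is visible (illustration only).
int g_normalize_calls = 0;

// Stand-in for a real normalizer: ASCII lower-casing, NOT Unicode FCC/NFD.
std::string toy_normalize(std::string_view sv) {
    ++g_normalize_calls;
    std::string out(sv);
    for (char& c : out) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical type whose invariant is "contents are already normalized".
class toy_text {
public:
    toy_text() = default;

    // Construction from raw characters is a boundary: normalize once here.
    explicit toy_text(std::string_view raw) : data_(toy_normalize(raw)) {}

    // Raw input of unknown normalization: must normalize before appending.
    toy_text& operator+=(std::string_view raw) {
        data_ += toy_normalize(raw);
        return *this;
    }

    // Input that already satisfies the invariant: plain append, no pass.
    toy_text& operator+=(const toy_text& other) {
        data_ += other.data_;
        return *this;
    }

    const std::string& str() const { return data_; }

private:
    std::string data_;
};
```

Appending one toy_text to another costs no normalization pass, while appending a std::string_view costs one. That is the trade-off being argued about: the type of the argument, not the caller, decides who pays.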
Someone, somewhere, has to pay that cost if you want to use two chunks of text in encoding/normalization A and B. You might be able to keep working in A for some text and keep working in B separately for other text, but I think code that works like that is going to be hard to reason about, and will be as common as code that freely mixes wstring and string (and I mean not only at program boundaries). That is, not very common.
Which is why I want to avoid just that.
Your suggestion:
void f() {
    // renormalizes to fcc
    text::text t = api_function_that_returns_nfd();
    do_something_with(t);
    string s;
    text::normalize_to_nfd(t.extract(), back_inserter(s));
    api_function_that_accepts_nfd(s);
}
My suggestion:
void f() {
    text::text t = api_function_that_returns_nfd();
    do_something_with(t);
    api_function_that_accepts_nfd(t.extract());
}
Right, I get it. I just think you're leaving out the lack of
interoperability with text::text