On 14.06.20 01:25, Zach Laine via Boost wrote:
On Fri, Jun 12, 2020 at 4:15 PM Rainer Deyke via Boost wrote:
A memmapped string_view /is/ a contiguous sequence of char. I don't see the difference.
The difference is mutability. There's no perf concern with erasing the first element of a string_view, if that's not even a supported operation.
A /lot/ of strings, probably the vast majority, will never be mutated. And for the rest, the majority will only be mutated by appending. Erasing the first element is a nice-to-have, but it is expensive and rarely used. If you find yourself doing that a lot, then you probably do want a rope.
Somewhere in the implementation of operator[] and operator(), there has to be a branch on index < 0 (or >= 0) in order for that negative index trick to work, which the compiler can't always optimize away. Branches are often affordable but they're not free.
Ah, I see, thanks. Would it make you feel better if negative indexing were only used when getting substrings?
That does address the performance problem, so yes.
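To make the trade-off concrete, here is a minimal sketch of the two options. It assumes a hypothetical wrapper called neg_index_view around std::string_view; the names at_signed and slice are invented for illustration and are not Boost.Text's actual interface. The point is only that the sign test moves out of per-element access and into the (much rarer) substring call.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical wrapper for illustration only; not Boost.Text's API.
struct neg_index_view {
    std::string_view sv;

    // Element access that accepts negative indices: every call pays
    // for the sign test, whether or not the caller ever passes i < 0.
    char at_signed(std::ptrdiff_t i) const {
        auto n = static_cast<std::ptrdiff_t>(sv.size());
        if (i < 0) i += n;  // -1 refers to the last element
        return sv[static_cast<std::size_t>(i)];
    }

    // Plain element access: no sign test on the hot path.
    char operator[](std::size_t i) const { return sv[i]; }

    // Substring [lo, hi) where negative offsets count from the end.
    // The branches are paid once per substring, not once per element.
    std::string_view slice(std::ptrdiff_t lo, std::ptrdiff_t hi) const {
        auto n = static_cast<std::ptrdiff_t>(sv.size());
        if (lo < 0) lo += n;
        if (hi < 0) hi += n;
        return sv.substr(static_cast<std::size_t>(lo),
                         static_cast<std::size_t>(hi - lo));
    }
};
```

With this sketch, neg_index_view{"hello"}.slice(1, -1) yields "ell", while operator[] stays a plain unsigned index. Bounds checking is left out to keep the example short.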
I hadn't thought through the interface in detail. I just saw that this was a feature of the text layer, and thought it would be nice to have in the unicode layer, because I don't want to use the text layer (in its current form).
I don't need a detailed interface. Pseudocode would be fine too.
insert_nfd(string, position, thing_to_insert)
// Insert 'thing_to_insert' into 'string' at 'position'.  Both 'string'
// and 'thing_to_insert' are required to be in NFD.  The area around the
// insertion is renormalized to NFD.
Having to renormalize at API boundaries can be prohibitively expensive.
Sure. Anything can be prohibitively expensive in some context. If that's the case in a particular program, I think it is likely to be unacceptable to use text::operator+(string_view) as well, since that also does on-the-fly normalization.
Hopefully only on the string_view and the area immediately surrounding the insertion.
Someone, somewhere, has to pay that cost if you want to use two chunks of text in encoding/normalization A and B. You might be able to keep working in A for some text and keep working in B separately for other text, but I think code that works like that is going to be hard to reason about, and will be as common as code that freely mixes wstring and string (and I mean not only at program boundaries). That is, not very common.
Which is why I want to avoid just that.
Your suggestion:
void f() {
    // renormalizes to fcc
    text::text t = api_function_that_returns_nfd();
    do_something_with(t);
    string s;
    text::normalize_to_nfd(t.extract(), back_inserter(s));
    api_function_that_accepts_nfd(s);
}
My suggestion:
void f() {
text::text
That's what I don't get. Could you explain how text<A> and text<B> are useful in a specific case?
text