On Wed, Jun 3, 2020 at 1:22 AM Emil Dotchevski via Boost-users
On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users
wrote: The Boost formal review of Zach Laine's Unicode library, Text, commences on June 11th and concludes June 20th. We welcome the participation of Boost developers and users, and the C++ community.
Here is my review of Text in the form of a list of notes/questions, in no particular order.
Thanks for the review, Emil.
Reading through the documentation, everything makes sense. The motivation for each type and operation is easy to understand.
I welcome the use of signed int as index/size type, but I wonder if this would interact in some negative way when mixed with similar types from the standard library. I don't think so, but who knows.
In some cases, that is sure to be the case. text::string needs to exist only so that I can experiment with integrating something like text::text into it, to show how we might add to std::string if std::text ever becomes a thing. So, I don't expect the signed/unsigned interop issue to come up much, if at all -- I don't expect people to use text::string much directly.
I don't like the multi-page format of the documentation. I'd prefer a single HTML page for easy searching.
There's a lot of it! In previous Boost reviews, I've gotten the opposite of this feedback, FWIW.
What is the purpose of the throw_on_encoding_error type here? https://github.com/tzlaine/text/blob/b8bb1a1101e7cf53b460671a3ead19516e7636f...
I searched and couldn't find any place in the documentation that explains how ErrorHandler is used. Looking at the source code, it's just a function, but its interface should be documented.
Right! That's an oversight. I'll add documentation on its use and interface.
Generally I dislike interfaces with configurable error handling but it appears that in the case of the conversion iterators this is warranted: both replacement character and throwing are useful. I wonder though if it would be better to delete this template parameter and always output 0xfffd, and in addition provide an iterator adaptor type which can be used to wrap e.g. an output iterator, and which throws if it sees 0xfffd being output.
This is possible, and if others agree I can certainly make this change.
Both .c_str() and .data() should be added to string for compatibility. Also, it is not true that s.begin() is equivalent to s.c_str(), because in debug iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this should not be valid on an empty string (assert), null termination notwithstanding.
Well, I don't have any non-pointer debug iterators to consider, and an empty text::string is null-terminated. .c_str() .data() and .begin() truly are synonyms in this case. I see no need to have more than one.
The note about passing ropes by value, I think, is redundant. Logically, copying a rope gets you an independent copy, the fact that internally the copies share state is an implementation detail. Therefore, the documentation can just specify that a rope is as thread-safe as an int.
I don't know about that. That is, I don't know if it's actually as threadsafe as an int. I'm simply trying to remind folks that if you pass a rope via const& you're not going to get the thread-safety you want. I've seen a lot of shared_ptrs passed that way, and I'm trying to short-circuit the internal decision making that might lead to the same choice when passing a rope.
Speaking of thread-safety, where are the thread-safety guarantees of rope documented? Where are the relevant tests?
The documentation of threadsafety is here: https://tzlaine.github.io/text/doc/html/boost_text__proposed_/the_string_lay... You have to read down a bit. The tests are in rope_threadsafety.cpp, which I have run with and without TSan.
Speaking of tests, I suggest separating the generated test sources in a subdirectory so things like test/string.cpp are easy to find. Not to mention github.com truncates the test directory to 1000 files.
This seems like a good idea -- it would certainly make things easier for me too! I'll do this.
The grammar seems wrong here; 'I don't know if you've ever written an undo system that can do, undo, or redo any command you can think of, in less than 40 lines of code, but there one is."
It parses for me, and apparently Gavin, but I agree it is a bit awkward. I'll reword it.
"Every text editor you've ever used that handles large files gracefully has done something to break the file into chunks" What about the buffer gap algorithm? I think Borland editors used this way back in the day. :)
I have no idea. I *think* this confirms what I was trying to say, though -- that your editor is dealing with sections of your large file (and not the whole thing) for efficiency. The rope approach I use in Boost.Text is just one possible implementation of this high-level approach. Zach