On 09.08.2014 05:44, Richard wrote:
[Please do not mail me a copy of your followup]
For a motivating example, see this gist: https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb
It uses boost::iostreams::mapped_file to map a mail message into memory and then proceeds to parse it. This is simply an example, but I think this hacked-up parser comes fairly close to handling the full RFC 2822 message format, ignoring MIME extensions: https://tools.ietf.org/html/rfc2822
I hacked it up from memory of the mail message format rules and a single example message, so don't consider it production quality :-).
The intention here is to parse a file without copying its input text into any buffers. Notice that this parser builds structure from a buffer by identifying interesting substrings as (b,e) pointer pairs.
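To make the (b,e) idea concrete, here is a minimal sketch of splitting one header line into name and value without copying. It is not taken from the gist; the names Span, Header and parse_header_line are hypothetical, and it assumes the line has already been located inside the mapped buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type: a read-only [b, e) view into a buffer owned elsewhere.
struct Span {
    const char* b;
    const char* e;
    std::size_t size() const { return static_cast<std::size_t>(e - b); }
    // Copies the characters; intended for display or debugging only.
    std::string str() const { return std::string(b, e); }
};

// Hypothetical type: a header field as two views into the same buffer.
struct Header {
    Span name;
    Span value;
};

// Parse one "Name: value" line from [b, e).
// Returns false if the line contains no colon. No characters are copied.
bool parse_header_line(const char* b, const char* e, Header& out) {
    assert(b <= e);
    const char* colon = std::find(b, e, ':');
    if (colon == e) return false;

    // Skip whitespace after the colon.
    const char* v = colon + 1;
    while (v != e && (*v == ' ' || *v == '\t')) ++v;

    // Drop a trailing CR/LF, if present.
    const char* end = e;
    while (end != v && (end[-1] == '\r' || end[-1] == '\n')) --end;

    out.name = Span{b, colon};
    out.value = Span{v, end};
    return true;
}
```

The resulting Header is only valid while the underlying buffer (for example the mapped file) stays alive.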
A more traditional approach would have involved at least 2 more copies: one that gets the data from the file system into the stream buffer and another one that copies the data from the stream buffer into a std::string. The extra copying can take significant amounts of time when processing thousands of mail messages.
Mail messages tend to be short, so mapping them into memory is not such a big deal. Obviously there is a lifetime relationship between the mapped file (the source character buffer) and the associated strings. For my use case, I'm not interested in being able to write to the strings, just read from them. (If you look closely you'll see that I "cheat" and add a few segments into my string that are associated with const char* C-style strings, but they too are read-only.)
I'm interested to know if anyone is aware of a "string chain" library that provides a (read-only) API similar to std::string, but is fundamentally just managing (b,e) pointer pairs into some larger buffer.
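As a rough illustration of what such a "string chain" might look like, here is a minimal sketch. The class StringChain and its members are hypothetical and not from any existing library; it only shows the idea of a read-only string assembled from (b,e) segments, including segments that point at C-style string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "string chain": an ordered list of read-only [b, e) segments.
// It owns none of the characters; the caller keeps the buffers alive.
class StringChain {
public:
    void append(const char* b, const char* e) {
        assert(b <= e);
        if (b != e) segments_.push_back(std::make_pair(b, e));
    }

    // Convenience overload for NUL-terminated strings such as literals.
    void append(const char* cstr) { append(cstr, cstr + std::strlen(cstr)); }

    // Total number of characters across all segments.
    std::size_t size() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < segments_.size(); ++i)
            n += static_cast<std::size_t>(segments_[i].second - segments_[i].first);
        return n;
    }

    // Character at logical index i; linear in the number of segments.
    char at(std::size_t i) const {
        for (std::size_t s = 0; s < segments_.size(); ++s) {
            std::size_t len =
                static_cast<std::size_t>(segments_[s].second - segments_[s].first);
            if (i < len) return segments_[s].first[i];
            i -= len;
        }
        throw std::out_of_range("StringChain::at");
    }

    // Flatten into an owning std::string (this is the one place that copies).
    std::string str() const {
        std::string out;
        out.reserve(size());
        for (std::size_t s = 0; s < segments_.size(); ++s)
            out.append(segments_[s].first, segments_[s].second);
        return out;
    }

private:
    std::vector<std::pair<const char*, const char*> > segments_;
};
```

A real library would presumably add iterators and comparison operators; this sketch stops at size(), at() and str().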
Boost.Test has a funky string class hiding in it called basic_cstring, but it wasn't created for this purpose.
I think boost::string_ref (http://www.boost.org/doc/libs/1_55_0/libs/utility/doc/html/string_ref.html) and std::experimental::basic_string_view (http://en.cppreference.com/w/cpp/experimental/basic_string_view) could help.

Jan Herrmann
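A small sketch of how a string view could be used for this kind of parsing. It uses std::string_view, the C++17 standardization of the experimental class linked above, so it assumes a C++17 compiler; boost::string_ref offers a similar interface. The function name split_lines is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split text into lines. Each element is a view into `text`, not a copy,
// so the result is only valid while the underlying buffer is alive.
std::vector<std::string_view> split_lines(std::string_view text) {
    std::vector<std::string_view> lines;
    while (!text.empty()) {
        std::size_t nl = text.find('\n');
        // substr clamps the count, so npos means "to the end of text".
        std::string_view line = text.substr(0, nl);
        if (nl == std::string_view::npos)
            text = std::string_view();
        else
            text.remove_prefix(nl + 1);
        // Treat CRLF and LF line endings alike.
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
        lines.push_back(line);
    }
    return lines;
}
```

Each view is itself just a pointer and a length, which is essentially the (b,e) pair from the original question behind a std::string-like read-only interface.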