A library for "string chains"?
[Please do not mail me a copy of your followup]
For a motivating example, see this gist: https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb
It uses boost::mapped_file to map a mail message into memory and then proceeds to parse it. This is simply an example, but I think this hacked up parser comes fairly close to handling the full RFC2822 message, ignoring MIME extensions. https://tools.ietf.org/html/rfc2822 I hacked it up from memory of mail message format rules and a single example message, so don't consider it production quality :-).
The intention here is to parse a file without copying its input text into any buffers. Notice that this parser builds structure from a buffer by identifying interesting substrings as (b,e) pointer pairs.
A more traditional approach would have involved at least 2 more copies: one that gets the data from the file system into the stream buffer and another one that copies the data from the stream buffer into a std::string. The extra copying can take significant amounts of time when processing thousands of mail messages.
Mail messages tend to be short, so mapping them into memory is not such a big deal. Obviously there is a lifetime relationship between the mapped file (the source character buffer) and the associated strings. For my use case, I'm not interested in being able to write to the strings, just read from them. (If you look closely you'll see that I "cheat" and add a few segments into my string that are associated with const char* C-style strings, but they too are read-only.)
I'm interested to know if anyone is aware of a "string chain" library that provides a (read-only) API similar to std::string, but is fundamentally just managing (b,e) pointer pairs into some larger buffer.
Boost.Test has a funky string class hiding in it called basic_cstring, but it wasn't created for this purpose.
-- "The Direct3D Graphics Pipeline" free book http://tinyurl.com/d3d-pipeline The Computer Graphics Museum http://computergraphicsmuseum.org The Terminals Wiki http://terminals.classiccmp.org Legalize Adulthood! (my blog) http://legalizeadulthood.wordpress.com
On Saturday, August 09, 2014 11:44 AM, Richard wrote:
I'm interested to know if anyone is aware of a "string chain" library that provides a (read-only) API similar to std::string, but is fundamentally just managing (b,e) pointer pairs into some larger buffer.
Boost.Test has a funky string class hiding in it called basic_cstring, but it wasn't created for this purpose.
I don't know how useful it would be for your purpose, but you could look at joined_range and string_algo:
http://www.boost.org/doc/libs/1_56_0/doc/html/string_algo/design.html#string...
http://www.boost.org/doc/libs/1_56_0/libs/range/doc/html/range/reference/uti...
Ben
On 09.08.2014 05:44, Richard wrote:
I'm interested to know if anyone is aware of a "string chain" library that provides a (read-only) API similar to std::string, but is fundamentally just managing (b,e) pointer pairs into some larger buffer.
Boost.Test has a funky string class hiding in it called basic_cstring, but it wasn't created for this purpose.
I think http://www.boost.org/doc/libs/1_55_0/libs/utility/doc/html/string_ref.html and http://en.cppreference.com/w/cpp/experimental/basic_string_view could help.
Jan Herrmann
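To make the suggestion concrete, here is a minimal sketch of the kind of zero-copy slicing that a string_ref/string_view type allows. It uses C++17 std::string_view as a stand-in for boost::string_ref so that it needs nothing beyond the standard library; the helper name split_header is hypothetical and is not taken from the gist.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper (not from the gist): split "Name: value" into two
// views of the same buffer. Nothing is copied; each view is just a
// (pointer, length) pair into the storage behind `line`.
std::pair<std::string_view, std::string_view> split_header(std::string_view line)
{
    const std::size_t colon = line.find(':');
    assert(colon != std::string_view::npos);  // precondition: a colon is present

    const std::string_view name = line.substr(0, colon);
    std::string_view value = line.substr(colon + 1);

    // Drop leading blanks from the value without touching the buffer.
    const std::size_t first = value.find_first_not_of(" \t");
    value = (first == std::string_view::npos) ? std::string_view{}
                                              : value.substr(first);
    return {name, value};
}
```

Both returned views are only valid while the underlying buffer (for example the mapped file) is alive, which is the lifetime relationship mentioned in the original post.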
Jan Herrmann wrote:
On 09.08.2014 05:44, Richard wrote:
For a motivating example, see this gist: https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb
[...]
I think http://www.boost.org/doc/libs/1_55_0/libs/utility/doc/html/string_ref.html and http://en.cppreference.com/w/cpp/experimental/basic_string_view could help.
Looks like string_ref is really close to what I was asking for; I'm going to see if I can rework that gist with string_ref.
[Please do not mail me a copy of your followup]
legalize+jeeves@mail.xmission.com (Richard) spake the secret code thusly:
Looks like string_ref is really close to what I was asking for; I'm going to see if I can rework that gist with string_ref.
Gist updated to use boost::string_ref:
https://gist.github.com/LegalizeAdulthood/7b67968bd93fbd4f9dbb
So string_ref took over the place of the string_chain_segment that I
had before.
Now the question remains... is there something that already exists
that provides a string_view from a vector of string_ref's?
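As near as I can tell, string_view is a class defined in namespace std that provides a string-like interface over a single char* string. I'm looking to provide an abstraction over a vector<string_ref>. I can, of course, code one myself, but I'd rather not reinvent the wheel....
Thanks to Jan Herrmann for the pointer to string_ref.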
On 14.08.2014 03:37, Richard wrote:
Now the question remains... is there something that already exists that provides a string_view from a vector of string_ref's?
Range with http://www.boost.org/doc/libs/1_56_0/libs/range/doc/html/range/reference/uti... could help.
As near as I can tell, string_view is a class defined in namespace std that provides a string-like interface over a single char* string. I'm looking to provide an abstraction over a vector<string_ref>. I can, of course, code one myself, but I'd rather not reinvent the wheel....
Thanks to Jan Herrmann for the pointer to string_ref.
Jan Herrmann
Jan Herrmann wrote:
On 14.08.2014 03:37, Richard wrote:
Now the question remains... is there something that already exists that provides a string_view from a vector of string_ref's?
Range with http://www.boost.org/doc/libs/1_56_0/libs/range/doc/html/range/reference/uti... could help.
Hmm... looks interesting. Obviously that could be used to get a single iterator over two string_ref's.
If I have a chain of string_ref's r1...rN, it looks like I'd have to do:
join(r1, join(r2, join(r3, join(r4, ..., join(rN-1, rN) ...)))
Which doesn't look promising, but I'll give it a go.
On 15.08.2014 23:52, Richard wrote:
Hmm... looks interesting. Obviously that could be used to get a single iterator over two string_ref's.
If I have a chain of string_ref's r1...rN, it looks like I'd have to do:
join(r1, join(r2, join(r3, join(r4, ..., join(rN-1, rN) ...)))
Which doesn't look promising, but I'll give it a go.
I think you will lose O(1) indexed access, but that might not be a problem. To join a container of string_refs, std::accumulate could help.
Jan Herrmann
On Aug 18, 2014, at 2:46 AM, Jan Herrmann wrote:
I think you will lose O(1) indexed access, but that might not be a problem. To join a container of string_refs, std::accumulate could help.
I believe each join type is unique, and type erasure on containers is usually non-performant, so the best you could do is fusion::accumulate (if your sequences were compile-time, which they clearly are not).
The classic data structure for this stuff is a rope, but I don't know of a rope that refers to an outside buffer... :-(
Cheers
Gordon
On 18.08.2014 10:09, Gordon Woodhull wrote:
I believe each join type is unique, and type erasure on containers is usually non-performant, so the best you could do is fusion::accumulate (if your sequences were compile-time, which they clearly are not).
The classic data structure for this stuff is a rope, but I don't know of a rope that refers to an outside buffer... :-(
Cheers Gordon
OK, my mistake. So I think a rope for string_refs has to be implemented. I found twines (http://llvm.org/docs/doxygen/html/Twine_8h_source.html), which look like a rope of variants of string_ref-like types.
Jan Herrmann
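As a rough illustration of what such a hand-written type could look like, here is a minimal sketch of a read-only chain of views. It is not a rope in the balanced-tree sense and it is not the code from the gist: the class name string_chain and its members are hypothetical, and std::string_view (C++17) stands in for boost::string_ref. It keeps a running total of segment lengths so that indexed access costs a binary search over the segments, O(log N), rather than the O(1) of a contiguous string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical read-only "string chain": a sequence of views into buffers
// owned elsewhere. The chain must not outlive those buffers.
class string_chain
{
public:
    void append(std::string_view segment)
    {
        if (segment.empty())
            return;  // keeps the cumulative offsets strictly increasing
        segments_.push_back(segment);
        ends_.push_back(size() + segment.size());
    }

    std::size_t size() const { return ends_.empty() ? 0 : ends_.back(); }

    // Indexed access: binary search for the segment that holds position i.
    char operator[](std::size_t i) const
    {
        assert(i < size());
        const auto it = std::upper_bound(ends_.begin(), ends_.end(), i);
        const std::size_t seg = static_cast<std::size_t>(it - ends_.begin());
        const std::size_t start = (seg == 0) ? 0 : ends_[seg - 1];
        return segments_[seg][i - start];
    }

    // Copies the characters only when explicitly asked to, e.g. for display.
    std::string str() const
    {
        std::string out;
        out.reserve(size());
        for (std::string_view s : segments_)
            out.append(s);
        return out;
    }

private:
    std::vector<std::string_view> segments_;
    std::vector<std::size_t> ends_;  // ends_[k] = total length through segment k
};
```

A flat vector is enough for the read-only, append-then-read use case described in this thread; a tree-shaped rope would mainly pay off if segments had to be inserted or removed in the middle.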
Gordon Woodhull wrote:
The classic data structure for this stuff is a rope, but I don't know of a rope that refers to an outside buffer... :-(
I always wondered why neither the standard library nor boost has a rope class.
On 8/15/2014 12:10 AM, Jan Herrmann wrote:
On 14.08.2014 03:37, Richard wrote:
Now the question remains... is there something that already exists that provides a string_view from a vector of string_ref's?
Range with http://www.boost.org/doc/libs/1_56_0/libs/range/doc/html/range/reference/uti... could help.
As Richard discovered for himself, join doesn't help here. What Boost.Range needs is an adaptor that flattens a range of ranges into a range. This is essential functionality, and I describe some of the cool stuff that can be implemented with it here: http://ericniebler.com/2014/04/27/range-comprehensions/
Eric
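For readers who want to see what "flattening" means for this particular case, here is a minimal hand-rolled sketch: a forward iterator that walks the characters of every view in a vector as if they were one sequence. This is not a Boost.Range adaptor and not a general range-of-ranges solution; the names flat_iterator, flat_begin and flat_end are hypothetical, and std::string_view (C++17) stands in for boost::string_ref.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// Hypothetical forward iterator over the characters of all segments,
// skipping empty segments. The vector and the buffers must outlive it.
class flat_iterator
{
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    flat_iterator() = default;
    flat_iterator(const std::vector<std::string_view>* segs, std::size_t seg)
        : segs_(segs), seg_(seg)
    {
        skip_exhausted();
    }

    reference operator*() const
    {
        assert(segs_ && seg_ < segs_->size());  // must not dereference end
        return (*segs_)[seg_][pos_];
    }

    flat_iterator& operator++()
    {
        ++pos_;
        skip_exhausted();
        return *this;
    }

    flat_iterator operator++(int)
    {
        flat_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const flat_iterator& a, const flat_iterator& b)
    {
        return a.segs_ == b.segs_ && a.seg_ == b.seg_ && a.pos_ == b.pos_;
    }
    friend bool operator!=(const flat_iterator& a, const flat_iterator& b)
    {
        return !(a == b);
    }

private:
    // Move to the next segment while the current one has no characters left.
    void skip_exhausted()
    {
        while (seg_ < segs_->size() && pos_ == (*segs_)[seg_].size()) {
            ++seg_;
            pos_ = 0;
        }
    }

    const std::vector<std::string_view>* segs_ = nullptr;
    std::size_t seg_ = 0;  // index of the current segment
    std::size_t pos_ = 0;  // offset inside the current segment
};

inline flat_iterator flat_begin(const std::vector<std::string_view>& v)
{
    return flat_iterator(&v, 0);
}

inline flat_iterator flat_end(const std::vector<std::string_view>& v)
{
    return flat_iterator(&v, v.size());
}
```

Because the element type is a single concrete iterator, a chain of N segments does not need N nested join types; the price is a segment-boundary check on every increment.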
On Thu, Aug 21, 2014 at 12:34 AM, Eric Niebler wrote:
[...] What Boost.Range needs is an adaptor that flattens a range of ranges into a range.
FWIW, for a different perspective, this is what XPath2 implicitly does to its sequences, because of rule #4 below. When switching from XPath1 to XPath2, this is something that stuck in my mind, and worked surprisingly well (for me). --DD
From http://en.wikipedia.org/wiki/XPath_2.0 :
1) Every value in XPath 2.0 is a *sequence* of *items*.
2) The items may be *nodes* or *atomic values*.
3) An individual node or atomic value is considered to be a sequence of length one.
4) Sequences may not be nested
participants (6)
- Ben Pope
- Dominique Devienne
- Eric Niebler
- Gordon Woodhull
- Jan Herrmann
- legalize+jeeves@mail.xmission.com