Dear Boost devs,

I'm writing here because I've been coding a tokenizer that I believe, given its generality, could be an addition to Boost. I'm asking for a judgement on whether the idea has a place in Boost or not.

The interface I'm proposing is a function of two arguments, a string (call it text) and a set of strings (call it key-terms), that returns a vector<string> (call it result) which fulfils 4 constraints:

1. a string in the result can only be either a key-term or a string between two key-terms;
2. the concatenation of the result is always the original text;
3. a key-term containing other key-terms has priority over the latter;
4. a key-term overlapping another has priority based on its position in the text (the earlier match wins).

A tokenizer that divides a string by delimiters is a subset of this interface whose key-terms are the delimiters. This is the case covered by Boost.Tokenizer, where the key-terms are *non-overlapping*. The critical addition here is the ability to deal with *overlapping key-terms*.

A common use case of overlapping key-terms is when you have key-terms that you want to treat as single tokens but they overlap with common delimiters. A practical example: tokenize a string (with words separated by spaces) and guarantee that both `"United States of America"` and `"United States of Brazil"` are interpreted as single tokens.

The non-triviality comes from the fact that such a feat requires storing which key-terms are partially matched against a previous sub-string, and knowing how to backtrack when a match fails. (E.g. "United States of " is common to both terms above, but once the next character appears, one or both can be discarded as potential matches.)

Some examples in pseudo-code (note how they fulfil constraints 1-4):

    tokenize("the end", {}) --> {"the end"}
    tokenize("the end", {" "}) --> {"the", " ", "end"}
    tokenize("foo-bar", {"foo", "foo-bar"}) --> {"foo-bar"}
    tokenize("the end", {"the e", " ", "end"}) --> {"the e", "nd"}
    tokenize("foo-tars ds", {"foo", "foo-bar", "-tar"}) --> {"foo", "-tar", "s ds"}

As a proof of concept, I've implemented this interface, with test cases, which you can find at https://github.com/jorgecarleitao/tokenize. Any change is possible to accommodate Boost standards: it can be generalized to arbitrary sequences and arbitrary element types, rewritten to use iterators, documented, better tested, etc. But before anything else, I would like to ask for an opinion on whether this is sufficiently general and useful to be considered for Boost.

Thank you for your time,
Best regards,
Jorge
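P.S. In case it helps the discussion, below is a minimal, self-contained sketch of the semantics above. It is *not* the proof-of-concept code from the repository: it naively re-tries every key-term at every position (longest match first, scanning left to right) instead of tracking partial matches incrementally, and it takes the key-terms as a vector rather than a set, for brevity.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Sketch of the proposed interface: scan left to right and, at each
// position, match the longest key-term starting there. Longest-match
// gives a containing key-term priority over the key-terms it contains
// (constraint 3); left-to-right scanning gives the earlier of two
// overlapping matches priority (constraint 4).
std::vector<std::string> tokenize(const std::string& text,
                                  const std::vector<std::string>& key_terms)
{
    std::vector<std::string> result;
    std::size_t start = 0;  // begin of the current run of non-key-term text
    std::size_t i = 0;
    while (i < text.size()) {
        const std::string* best = nullptr;
        for (const auto& term : key_terms)
            if (!term.empty() &&  // ignore empty key-terms
                text.compare(i, term.size(), term) == 0 &&
                (best == nullptr || term.size() > best->size()))
                best = &term;
        if (best != nullptr) {
            if (i > start)  // flush the text between two key-terms
                result.push_back(text.substr(start, i - start));
            result.push_back(*best);  // emit the key-term itself (constraint 1)
            i += best->size();
            start = i;
        } else {
            ++i;
        }
    }
    if (start < text.size())  // trailing non-key-term text
        result.push_back(text.substr(start));
    return result;  // concatenation of result == text (constraint 2)
}

int main()
{
    // Reproduces the last pseudo-code example above: at position 0,
    // "foo-bar" fails at the 'b' so "foo" is the longest match there,
    // and "-tar" then matches from position 3.
    for (const auto& token : tokenize("foo-tars ds", {"foo", "foo-bar", "-tar"}))
        std::cout << '"' << token << "\" ";
    std::cout << '\n';  // prints: "foo" "-tar" "s ds"
}
```

This sketch costs O(text length x number of key-terms x key-term length); the point of the proof of concept is to get the same result in a single pass by keeping the set of partially matched key-terms and backtracking when one fails, as described above.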