String split behaviour
Hi,
I am getting two empty strings from the following program,
void boost_split_test() {
    const string &text("-");
    vector<string> tokens;
    split(tokens, text, boost::is_any_of("-"), token_compress_on);
    cout << "size of tokens " << tokens.size() << '\n';
    for (auto const &e : tokens)
        cout << e.size() << '\n';
}
Output:
size of tokens 2
0
0
Is this the expected output? I was expecting zero split parts. Could someone clarify?
-- Thanks, :) Venki.
On Wed, May 20, 2015 at 11:32 AM, Venkateswara Rao Sanaka < moderncpp.venki@gmail.com> wrote:
Hi,
I am getting two empty strings from the following program,
void boost_split_test() {
const string &text("-");
vector<string> tokens;
split(tokens, text, boost::is_any_of("-"), token_compress_on);
cout << "size of tokens " << tokens.size() << '\n';
for (auto const &e : tokens)
cout << e.size() << '\n';
}
Output:
size of tokens 2
0
0
Is this the expected output? I was expecting zero split parts. Could someone clarify?
This seems reasonable to me.
You asked it to split the string containing a single dash into parts separated by dashes. The string gets split into an empty string, a dash (which is not returned to you, being the separator), and an empty string.
Consider splitting the input string "Foo-" (or "-Foo") compared to "Foo". One gives two strings (one before the dash, one after the dash), the other gives one string (because there are no dashes).
Given a string with "n" separators, you should get "n+1" strings back (with the proviso that consecutive separators are collapsed together, so "Foo--" is treated the same as "Foo-").
-- Marshall
P.S. Checking the tests, I notice that there's no coverage for this case (separators at the beginning or the end of the input). I'll put it on my list. Thanks!
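The "n separators give n+1 strings" rule above can be sketched without Boost. The function below is a hand-written model of that rule, not Boost's implementation; the name split_compress is hypothetical and chosen only for illustration. It treats a run of adjacent separators as one separator, which is what the reply describes for token_compress_on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written model of the rule described above; NOT Boost's implementation.
// Splits `text` at any character found in `seps`, treating a run of adjacent
// separators as a single separator (in the spirit of token_compress_on).
std::vector<std::string> split_compress(const std::string& text,
                                        const std::string& seps) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        // The current token ends at the next separator or at end of input.
        std::size_t end = text.find_first_of(seps, start);
        if (end == std::string::npos) {
            tokens.push_back(text.substr(start));  // last token, may be empty
            return tokens;
        }
        tokens.push_back(text.substr(start, end - start));
        // Skip the whole run of separators so that "--" counts once.
        start = text.find_first_not_of(seps, end);
        if (start == std::string::npos) {
            tokens.push_back("");  // input ended with a separator run
            return tokens;
        }
    }
}
```

Under this model, split_compress("-", "-") returns two empty strings, split_compress("Foo", "-") returns one string, and both "Foo-" and "Foo--" return "Foo" followed by an empty string.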
Thanks Marshall for the reply.
In our code I hit a strange error when splitting a string. The hyphen symbol was used to represent null data, and on splitting a string containing only a hyphen I expected a result of zero tokens (I was wrong here).
Even dynamic languages behave the same way; see the Python sample below,
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "-Foo"
>>> tokens = s.split("-")
>>> print tokens
['', 'Foo']
An example in the boost documentation would help the user.
Even the following command line example shows the same (using NF, the number of fields, as the loop bound),
$ echo "a-b" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}'
This will print a:b:
$ echo "-" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}'
This will print :: (i.e. two empty strings, each followed by a colon)
In fact the second command line example was the reason behind my confusion :)
Thankful to you all Boost developers. Great work.
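For the null-data case described above, where a field holding only the separator should yield zero tokens, one option is to discard empty strings after the split. This is a minimal sketch, assuming the tokens vector has already been filled by the split call; the helper name drop_empty_tokens is hypothetical and not part of Boost.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost function): removes every empty string
// from `tokens` in place, so an input made only of separators leaves
// zero tokens behind.
void drop_empty_tokens(std::vector<std::string>& tokens) {
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& t) { return t.empty(); }),
                 tokens.end());
}
```

Note that this drops empty tokens anywhere in the result, not only at the ends, so it only fits data where an empty field carries no meaning.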
-- Thanks, :) Venki.
participants (2)
- Marshall Clow
- Venkateswara Rao Sanaka