Glib::ustring and tokenizer
Hi everybody,

I want to use Glib::ustring for UTF-8 support. Until now I have been using boost::tokenizer to break up some strings, which would now be ustrings. In the tokenizer documentation you can find:

    template <
        class TokenizerFunc = char_delimiters_separator<char>,
        class Iterator = std::string::const_iterator,
        class Type = std::string
    >
    class tokenizer

I have never used template programming, so can somebody give me a hint whether and how I can modify this for Glib::ustring? Maybe something like this?

    template <
        class TokenizerFunc = char_delimiters_separator<char>,
        class Iterator = Glib::ustring::const_iterator,
        class Type = Glib::ustring
    >
    class utokenizer

Can I use wchar_t now? I'm a little bit confused. If it does not work I will just have to write some code myself, but I hope to get it working.

Thanks for reading, and maybe for some answers.

Greetings from stormy Germany,
Manuel Jung
On Thu, 18 Jan 2007, Manuel Jung wrote: [snip]
Maybe something like this?

    template <
        class TokenizerFunc = char_delimiters_separator<char>,
        class Iterator = Glib::ustring::const_iterator,
        class Type = Glib::ustring
    >
    class utokenizer

Can I use wchar_t now? I'm a little bit confused. If it does not work I will just have to write some code myself, but I hope to get it working.
Try this:

    typedef boost::tokenizer<
        boost::char_delimiters_separator< Glib::ustring::value_type > ,
        Glib::ustring::const_iterator ,
        Glib::ustring
    > utokenizer ;

--
François Duranleau
LIGUM, Université de Montréal
François Duranleau wrote:
typedef boost::tokenizer< boost::char_delimiters_separator< Glib::ustring::value_type > , Glib::ustring::const_iterator , Glib::ustring > utokenizer ;
Hi,

this is a nice try, but it doesn't work completely. I get these errors:

    src/whale.h:40: Fehler: verirrtes »\194« im Programm
    src/whale.h:40: Fehler: verirrtes »\160« im Programm
    src/whale.h:40: Fehler: verirrtes »\194« im Programm
    src/whale.h:40: Fehler: verirrtes »\160« im Programm
    src/whale.h:40: Fehler: verirrtes »\194« im Programm
    src/whale.h:40: Fehler: verirrtes »\160« im Programm
    src/whale.h:41: Fehler: verirrtes »\194« im Programm
    src/whale.h:41: Fehler: verirrtes »\160« im Programm
    src/whale.h:41: Fehler: verirrtes »\194« im Programm
    src/whale.h:41: Fehler: verirrtes »\160« im Programm
    src/whale.h:41: Fehler: verirrtes »\194« im Programm
    src/whale.h:41: Fehler: verirrtes »\160« im Programm
    src/whale.h:42: Fehler: verirrtes »\194« im Programm
    src/whale.h:42: Fehler: verirrtes »\160« im Programm
    src/whale.h:42: Fehler: verirrtes »\194« im Programm

This means: "Error: stray »\194« in program". These are lines 40-42:

    boost::char_delimiters_separator< Glib::ustring::value_type > ,
    Glib::ustring::const_iterator ,
    Glib::ustring > tokenizer ;

I'm wondering why the error messages are in German. I don't have a clue what they are about; it seems like there are some wrong characters?

Greets,
Manuel Jung
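A note on these messages: read as decimal byte values, 194 and 160 are 0xC2 and 0xA0, and that byte pair is the UTF-8 encoding of U+00A0 (no-break space). Three such characters at the start of a line would give exactly six errors per line. The following is only a minimal sketch, using nothing but the standard library, of how such bytes could be counted in a line of source text and replaced with ordinary spaces. The function names count_nbsp and replace_nbsp are made up for this illustration and are not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count occurrences of the UTF-8 byte pair
// 0xC2 0xA0 (U+00A0, no-break space) in a piece of text.
std::size_t count_nbsp(const std::string& text) {
    const std::string nbsp = "\xC2\xA0";
    std::size_t count = 0;
    for (std::size_t pos = text.find(nbsp); pos != std::string::npos;
         pos = text.find(nbsp, pos + nbsp.size())) {
        ++count;
    }
    return count;
}

// Hypothetical helper: return a copy of the text in which every
// no-break space is replaced by a plain ASCII space.
std::string replace_nbsp(std::string text) {
    const std::string nbsp = "\xC2\xA0";
    std::size_t pos = 0;
    while ((pos = text.find(nbsp, pos)) != std::string::npos) {
        text.replace(pos, nbsp.size(), " ");
        pos += 1;  // continue after the space that was just written
    }
    return text;
}
```

Calling count_nbsp on each line of a header read with std::getline would show which lines carry the stray bytes, and replace_nbsp gives the cleaned line.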
François Duranleau wrote:
typedef boost::tokenizer< boost::char_delimiters_separator< Glib::ustring::value_type > , Glib::ustring::const_iterator , Glib::ustring > utokenizer ;
Okay, I have already fixed that (six errors for every line). It seems this was an encoding issue: deleting the whitespace at the start of each line and replacing it with "new" whitespace solved the errors. It compiles fine now.

I'm using char_separator<> instead of char_delimiters_separator. To get a list of Glib::ustring::value_type, I have a ustring and extract the value_type values with operator[]. That works fine, but the code crashes at runtime with a std::__throw_length_error().

Since it's not worth the work to use boost::tokenizer with ustring, I will write a split function myself for now. Thanks for your help.

Damn UTF-8 and Unicode... damn German^^ (even if I'm from Germany...)

Greets,
Manuel Jung
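For reference, a hand-written split function of the kind mentioned above might look like the following. This is only a sketch under stated assumptions, not the code that was eventually written: it works on the raw UTF-8 bytes in a std::string and assumes that every delimiter is a 7-bit ASCII character. Under that assumption byte-wise splitting is safe, because in UTF-8 a byte below 0x80 never occurs inside a multi-byte sequence. The name split_utf8 is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-8 text at any of the given ASCII
// delimiter characters. Empty tokens are dropped.
std::vector<std::string> split_utf8(const std::string& text,
                                    const std::string& ascii_delims) {
    // Assumption: every delimiter is 7-bit ASCII, so it can never
    // match a byte in the middle of a multi-byte UTF-8 sequence.
    for (unsigned char c : ascii_delims) {
        assert(c < 0x80);
    }

    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(ascii_delims);
    while (start != std::string::npos) {
        // end is npos when the last token runs to the end of the text;
        // substr then simply takes the remainder.
        const std::string::size_type end =
            text.find_first_of(ascii_delims, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(ascii_delims, end);
    }
    return tokens;
}
```

Each token is still valid UTF-8, so it can be handed to the Glib::ustring constructor that takes a std::string. Delimiters outside ASCII would need a code-point-aware approach instead.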
Manuel Jung wrote:
Since it's not worth the work to use boost::tokenizer with ustring, I will write a split function myself for now.
I'd suggest you try string_algorithms instead of tokenizer -- it's more flexible and should be able to handle anything that works like a string: http://www.boost.org/doc/html/string_algo.html

Jeff
participants (3): François Duranleau, Jeff Garland, Manuel Jung