regular expression searching for Glib::ustring??
I am using Gtkmm. I want to do Boost regular expression searching on Glib::ustring.
http://www.gtkmm.org/gtkmm2/docs/reference/html/classGlib_1_1ustring.html

This class represents characters in UTF-8, so each character in the buffer is represented by a variable number of bytes. But it does have a bidirectional iterator.

How would you set up Boost.Regex to search if both the regular expression and the string to be searched are ustrings?

Does one need to override any of the types defined in regex_traits?

Thank you.

BTW, a quote from http://www.boost.org/libs/regex/doc/regex_traits.html:
"Under construction. The current boost.regex traits class design will be migrated to that specified in the regular expression standardization proposal."
This is not very useful to someone trying to use the boost_regex library now!

--
Paul Elliott 1(512)837-1096
pelliott@io.com PMB 181, 11900 Metric Blvd Suite J
http://www.io.com/~pelliott/pme/ Austin TX 78758-3117
I wouldn't do it that way: Boost.Regex works only with character sets where each code point is a single "atom", whereas UTF-8 is a multibyte encoding in which several chars (bytes) have to be treated together as one atom. One way of handling this is to define a conversion iterator that translates on the fly between UTF-8 byte sequences and wide-character atoms, then use boost::wregex and feed it your converting iterator rather than the raw UTF-8 data.

John.
participants (2)
- John Maddock
- Paul Elliott