I am using Gtkmm. I want to do boost regular expression searching on Glib::ustring. http://www.gtkmm.org/gtkmm2/docs/reference/html/classGlib_1_1ustring.html
This class represents characters in UTF-8, so each character in the buffer is represented by a variable number of bytes. But it does have a bidirectional iterator.
How would you set up boost regex to search if both the regular expression and the string to be searched are ustrings?
Does one need to override any of the types defined in regex_traits?
I wouldn't do it that way: Boost.Regex works only with character sets in which each code point is a single "atom", whereas UTF-8 is a multibyte encoding in which several bytes together have to be treated as one atom. One way of handling this is to define a conversion iterator that translates on the fly between UTF-8 byte sequences and wide-character atoms, then use boost::wregex and feed it your converting iterator rather than the raw UTF-8 data. John.
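For what it's worth, here is a rough sketch of what such a converting iterator could look like. It is only an illustration under some loud assumptions: the class name Utf8ToWideIterator and the helper search_utf8 are made up for this example, the input is taken to be well-formed UTF-8 held in a std::string (standing in for the bytes of a Glib::ustring), wchar_t is assumed to be 32 bits wide, and std::wregex is used as a stand-in for boost::wregex so that the sketch builds without Boost. Since the Boost.Regex algorithms also take bidirectional iterators, the same iterator pair should in principle work with boost::regex_search, but I have not tested that.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical adaptor (not part of Boost or Glib): presents UTF-8 bytes as a
// sequence of code points. Assumes well-formed UTF-8 and a 32-bit wchar_t.
class Utf8ToWideIterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = wchar_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const wchar_t*;
    using reference = wchar_t;  // decoded on the fly, so returned by value

    Utf8ToWideIterator() = default;
    explicit Utf8ToWideIterator(const char* p) : p_(p) {}

    const char* base() const { return p_; }  // underlying byte position

    // Decode the code point that starts at the current lead byte.
    wchar_t operator*() const {
        assert(p_ != nullptr);
        const unsigned char lead = byte(p_);
        const int extra = trailing(lead);
        unsigned long cp = extra == 0 ? lead : (lead & (0x3Fu >> extra));
        for (int i = 1; i <= extra; ++i) cp = (cp << 6) | (byte(p_ + i) & 0x3Fu);
        return static_cast<wchar_t>(cp);
    }
    // Step forward over the lead byte and its continuation bytes.
    Utf8ToWideIterator& operator++() { p_ += 1 + trailing(byte(p_)); return *this; }
    Utf8ToWideIterator operator++(int) { auto old = *this; ++*this; return old; }
    // Step backward until a non-continuation (lead or ASCII) byte is reached.
    Utf8ToWideIterator& operator--() {
        do { --p_; } while ((byte(p_) & 0xC0u) == 0x80u);
        return *this;
    }
    Utf8ToWideIterator operator--(int) { auto old = *this; --*this; return old; }

    friend bool operator==(const Utf8ToWideIterator& a, const Utf8ToWideIterator& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const Utf8ToWideIterator& a, const Utf8ToWideIterator& b) {
        return a.p_ != b.p_;
    }

private:
    static unsigned char byte(const char* p) { return static_cast<unsigned char>(*p); }
    // Number of continuation bytes implied by a lead byte (valid input assumed).
    static int trailing(unsigned char lead) {
        return lead < 0x80 ? 0 : lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    }
    const char* p_ = nullptr;
};

// Hypothetical helper: returns the first match as UTF-8 bytes, or an empty
// string when nothing matches. Both arguments are UTF-8.
std::string search_utf8(const std::string& text, const std::string& pattern) {
    // The pattern is compiled once, so it is simply converted up front.
    const std::wstring wide_pattern(
        Utf8ToWideIterator(pattern.data()),
        Utf8ToWideIterator(pattern.data() + pattern.size()));
    const std::wregex re(wide_pattern);

    // The text is never copied: the regex engine reads it through the iterator.
    const Utf8ToWideIterator first(text.data());
    const Utf8ToWideIterator last(text.data() + text.size());
    std::match_results<Utf8ToWideIterator> m;
    if (!std::regex_search(first, last, m, re)) return std::string();
    return std::string(m[0].first.base(), m[0].second.base());
}
```

The point of going through the iterator is that "." and quantifiers then apply to whole characters: searching "café au lait" for "caf." returns all five bytes of "café", whereas a byte-oriented regex would stop after the first byte of the "é". Character classes such as \w still depend on the traits and locale in use, so those may need separate attention.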