I have just discovered the incredible Boost and regex++ libraries, but I have
run into some difficulties...
I read that regex++ can handle Japanese encodings, in particular through
wide characters (wchar_t). The regex++ FAQ describes character classes,
e.g. [[:space:]], which define a set of characters sharing a common
property. I have been looking for some kind of [[:Japanese characters:]]
class. I have a text containing many unusual characters mixed with Japanese
ones (Hiragana, Katakana, Kanji, everything!), and I want to find the
Japanese sentences so that I can translate them and replace them in the
text. So I need a way to identify a Japanese sentence. A function along the
lines of bool isJap(const wchar_t c) const would be fine.
So if anybody has any ideas or some links, I would appreciate it! Thanks!
~~~~~~~~~~~~~~~~~~~~~~~
Two options:
1) You can hack the traits class used by boost.regex:
Create your own traits class that inherits from boost::regex_traits and
which implements the following member functions:
uint32_t lookup_classname(const char_type* first, const char_type* last) const;
bool is_class(char_type c, uint32_t f)const;
The first maps your character-class name to a constant; the second checks
whether a character is a member of that class. Choose a value for your
constant that isn't already in use by regex_traits.
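To make the two member functions concrete, here is a minimal standalone sketch. It does not include Boost: in real code the struct would inherit from boost::regex_traits and defer to the base class for every other class name. The class name "japanese", the constant char_class_japanese, its value, and the helper is_japanese are all assumptions made for illustration. The sketch also assumes wchar_t holds Unicode code units. Note that the Kanji block is shared with Chinese, so the test cannot tell the two languages apart.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if c lies in one of the main Unicode blocks
// used for Japanese text. Assumes wchar_t holds a Unicode code unit.
inline bool is_japanese(wchar_t c)
{
    const std::uint32_t u = static_cast<std::uint32_t>(c);
    return (u >= 0x3040 && u <= 0x309F)   // Hiragana
        || (u >= 0x30A0 && u <= 0x30FF)   // Katakana
        || (u >= 0x4E00 && u <= 0x9FFF)   // CJK Unified Ideographs (Kanji)
        || (u >= 0xFF66 && u <= 0xFF9F);  // Halfwidth Katakana
}

// Placeholder value only: check it against the constants that
// regex_traits already uses before relying on it.
const std::uint32_t char_class_japanese = 0x40000000u;

// Standalone stand-in for a class derived from boost::regex_traits.
struct japanese_traits_sketch
{
    typedef wchar_t char_type;

    // Map the class name between first and last to a constant.
    std::uint32_t lookup_classname(const char_type* first,
                                   const char_type* last) const
    {
        if (std::wstring(first, last) == L"japanese")
            return char_class_japanese;
        return 0;  // real code would call the base-class version here
    }

    // Check whether c belongs to the class identified by f.
    bool is_class(char_type c, std::uint32_t f) const
    {
        if (f & char_class_japanese)
            return is_japanese(c);
        return false;  // real code would call the base-class version here
    }
};
```

If the real traits class is wired up this way, a pattern such as [[:japanese:]]+ should match runs of Japanese characters, but that depends on the regex++ version in use.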
Finally use reg_expression