Re: [Boost-users] find japanese character with boost regex++

14 Dec 2003

      ...
Are the existing character classes following a standard, or are you open to
patches to extend them?
Yes, they follow the POSIX and ECMAScript standards, which give:

"alnum",
"alpha",
"cntrl",
"digit",
"graph",
"lower",
"print",
"punct",
"space",
"upper",
"xdigit",
"blank",
"word",
"unicode",
...
It might be nice to have at least:
 [:hiragana:]
 [:katakana:]
 [:hankaku_katakana:]
Isn't that just [[:hiragana:][:katakana:]]?
...
[:wide_alpha:]
 [:wide_num:]
 [:wide_alphanum:]
There should be no need for those: [[:alpha:]] will match wide-character
alphabetic characters perfectly well (provided the locale isn't "C").
...
Defining the set of Japanese kanji would be harder.
How are they defined?

It might be best to add a facility to add new character classes as a list of
characters and ranges to include, something like:

register_character_class("myname", "d-f");

Then we could add all the Unicode block ranges as standard classes for
wide-character regexes.
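
To make the idea concrete, here is a rough sketch of what such a facility
might look like. It is a standalone illustration only, not part of
Boost.Regex: the class char_class_registry and its members add_range and
contains are hypothetical names, and register_character_class is modelled on
the call suggested above. The code point ranges are the Unicode Hiragana and
Katakana blocks plus the halfwidth katakana range.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, NOT part of Boost.Regex: maps a class name to a
// list of inclusive code point ranges.
class char_class_registry {
public:
    using range = std::pair<char32_t, char32_t>;  // inclusive [first, second]

    // Append one inclusive range to the named class (creating it if needed).
    void add_range(const std::string& name, char32_t lo, char32_t hi) {
        if (hi < lo) throw std::invalid_argument("reversed range");
        classes_[name].push_back(range(lo, hi));
    }

    // spec is a list of single characters and "a-b" ranges, e.g. U"d-fxz".
    // A '-' that is not between two characters is taken literally.
    void register_character_class(const std::string& name,
                                  const std::u32string& spec) {
        for (std::size_t i = 0; i < spec.size(); ++i) {
            char32_t lo = spec[i];
            char32_t hi = lo;
            if (i + 2 < spec.size() && spec[i + 1] == U'-') {
                hi = spec[i + 2];
                i += 2;
            }
            add_range(name, lo, hi);
        }
    }

    // True if c falls in any range of the named class; false for unknown names.
    bool contains(const std::string& name, char32_t c) const {
        auto it = classes_.find(name);
        if (it == classes_.end()) return false;
        for (const range& r : it->second)
            if (r.first <= c && c <= r.second) return true;
        return false;
    }

private:
    std::map<std::string, std::vector<range>> classes_;
};

// Usage example: a user-defined class plus three Unicode ranges.
inline void demo() {
    char_class_registry reg;
    reg.register_character_class("myname", U"d-f");
    reg.add_range("hiragana", 0x3040, 0x309F);          // Hiragana block
    reg.add_range("katakana", 0x30A0, 0x30FF);          // Katakana block
    reg.add_range("hankaku_katakana", 0xFF65, 0xFF9F);  // halfwidth katakana

    assert(reg.contains("myname", U'e'));
    assert(!reg.contains("myname", U'g'));
    assert(reg.contains("hiragana", 0x3042));           // U+3042 HIRAGANA LETTER A
    assert(!reg.contains("katakana", 0x3042));
    assert(reg.contains("hankaku_katakana", 0xFF71));   // HALFWIDTH KATAKANA LETTER A
}
```

Storing ranges rather than individual characters keeps a block such as the
CJK ideographs cheap to register; a real implementation inside the regex
engine would also have to hook the names into the [:name:] parser, which this
sketch does not attempt.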

John.

John Maddock