Do the existing character classes follow a standard, or are you open to patches that extend them?
Yes, they follow the POSIX and ECMAScript standards, giving: "alnum", "alpha", "cntrl", "digit", "graph", "lower", "print", "punct", "space", "upper", "xdigit", "blank", "word", "unicode".
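For comparison, the C++ standard library's `std::regex_traits<char>` exposes the POSIX-style class names through `lookup_classname()` and `isctype()`. The sketch below uses `std::regex`, not the library discussed in this thread, and `in_named_class` is a hypothetical helper name. Extension names such as "unicode" may not be recognized by `std::regex_traits`.

```cpp
#include <regex>
#include <string>

// Returns true if c belongs to the named class as understood by
// std::regex_traits<char>, which uses the global locale ("C" by default).
bool in_named_class(const std::string& name, char c) {
    std::regex_traits<char> traits;
    using Mask = std::regex_traits<char>::char_class_type;
    // lookup_classname returns a default-constructed mask for unknown names.
    Mask mask = traits.lookup_classname(name.begin(), name.end());
    if (mask == Mask()) {
        return false;  // unrecognized class name
    }
    return traits.isctype(c, mask);
}
```

With the default locale, `in_named_class("digit", '7')` is true and `in_named_class("digit", 'a')` is false.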
It might be nice to have at least: [:hiragana:] [:katakana:] [:hankaku_katakana:]
Isn't that just [[:hiragana:][:katakana:]]?
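Until a named class like [:hiragana:] exists, that union can be approximated by writing the Unicode blocks as explicit ranges in one bracket expression. This is a minimal sketch using `std::wregex`, not the library discussed here. It assumes the Hiragana block (U+3040-U+309F) and the Katakana block (U+30A0-U+30FF), and `is_kana` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Matches non-empty strings made only of Hiragana (U+3040-U+309F) or
// Katakana (U+30A0-U+30FF) code points, spelled as explicit ranges.
bool is_kana(const std::wstring& s) {
    static const std::wregex kana(L"[\u3040-\u309F\u30A0-\u30FF]+");
    return std::regex_match(s, kana);
}
```

Without the `collate` flag, the ranges compare raw code-point values, so the match does not depend on the locale.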
[:wide_alpha:] [:wide_num:] [:wide_alphanum:]
There should be no need for those: [[:alpha:]] will detect alphabetic wide characters perfectly well (provided the locale isn't "C").
Defining the set of Japanese kanji would be harder.
How are they defined? It might be best to provide a facility for registering new character classes as a list of characters and ranges to include, something like:

register_character_class("myname", "d-f");

Then we could add all the Unicode block ranges as standard for wide-character regexes.

John.
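The proposed `register_character_class("myname", "d-f")` call could be backed by a simple table that maps class names to ranges. This is a minimal sketch under assumptions. `CharClassRegistry` and its methods are hypothetical, not an existing library API. The spec is taken as a `char32_t` string so that Unicode block ranges can be registered the same way. Wiring the table into a regex compiler is not shown.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry for the proposed register_character_class facility.
class CharClassRegistry {
public:
    using Range = std::pair<char32_t, char32_t>;  // inclusive [lo, hi]

    // Parses a spec such as U"d-f" or U"a-cx" into inclusive ranges.
    // A '-' that does not sit between two characters is taken literally.
    void register_character_class(const std::string& name,
                                  const std::u32string& spec) {
        std::vector<Range> ranges;
        for (std::size_t i = 0; i < spec.size(); ++i) {
            char32_t lo = spec[i];
            if (i + 2 < spec.size() && spec[i + 1] == U'-') {
                char32_t hi = spec[i + 2];
                if (hi < lo) {
                    throw std::invalid_argument("reversed range in class " + name);
                }
                ranges.emplace_back(lo, hi);
                i += 2;  // skip the '-' and the range end
            } else {
                ranges.emplace_back(lo, lo);  // single character
            }
        }
        classes_[name] = std::move(ranges);
    }

    // True if c falls in any range of the named class;
    // throws std::out_of_range for an unregistered name.
    bool is_member(const std::string& name, char32_t c) const {
        for (const Range& r : classes_.at(name)) {
            if (r.first <= c && c <= r.second) {
                return true;
            }
        }
        return false;
    }

private:
    std::map<std::string, std::vector<Range>> classes_;
};
```

For example, after `reg.register_character_class("hiragana", U"\u3040-\u309F");`, the call `reg.is_member("hiragana", U'\u3042')` is true. Lookup here is a linear scan. A fuller version might sort and merge the ranges so that membership can use binary search.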