I have a text containing a lot of strange characters along with Japanese ones (hiragana, katakana, kanji, everything!), and I want to find the Japanese sentences so that I can translate them and replace them in the text. So I need a way to identify a Japanese sentence. Something like a function const bool isJap( const wchar ) const would be fine.
Do you need to use regexes? I've not tried boost.regex yet so cannot help there.
Is your text just ASCII and Japanese? Or do you need to distinguish Japanese from other languages as well?
If it is just ASCII and Japanese, you could define a Japanese char as anything that is not ASCII (beware of Shift-JIS encoding though, as the 2nd byte of a double-byte character can fall in the ASCII range). If your data is Unicode, it should also be easy to treat European characters as non-Japanese.
Darren
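For the Unicode case, a check along the lines of the isJap function asked for could look like the sketch below. It assumes the text has already been decoded to Unicode code points; the name isJapaneseCodePoint and the choice of ranges are illustrative only, not an exhaustive definition of "Japanese". Note that kanji share their block with Chinese characters, so this cannot tell the two languages apart.

```cpp
// Hypothetical helper: true if cp lies in one of the main Unicode
// blocks used for Japanese text. The block list is an assumption
// chosen for illustration and can be extended as needed.
constexpr bool isJapaneseCodePoint(char32_t cp) {
    return (cp >= 0x3040 && cp <= 0x309F)   // Hiragana
        || (cp >= 0x30A0 && cp <= 0x30FF)   // Katakana
        || (cp >= 0x4E00 && cp <= 0x9FFF)   // CJK Unified Ideographs (kanji)
        || (cp >= 0x3000 && cp <= 0x303F)   // CJK Symbols and Punctuation
        || (cp >= 0xFF61 && cp <= 0xFF9F);  // Halfwidth punctuation and katakana
}
```

All of these ranges sit in the Basic Multilingual Plane, so the same comparisons also work on a 16-bit wchar_t value.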
Thanks Darren for your reply. I can avoid using regex, but my text is more than ASCII and Japanese. It is actually a binary file in which some pieces are Japanese sentences and others are control bytes like 0x00 (which adds difficulty, because you cannot parse the data as a string when 0x00 is the string terminator). So I think I have to parse the bytes in pairs and try to identify them as Shift-JIS where that is the case. Any idea of a function or program that does this? Thanks, jschmid
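The pairwise scan described above could be sketched as follows. This is a heuristic under stated assumptions, not a definitive detector: it uses the standard Shift-JIS two-byte layout (lead byte 0x81-0x9F or 0xE0-0xEF, trail byte 0x40-0x7E or 0x80-0xFC), ignores single-byte halfwidth katakana and vendor extensions, and the names isSjisLead, isSjisTrail and findSjisRuns are made up for this example. The buffer is passed as pointer plus size, so embedded 0x00 bytes do not cut the scan short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// True if b can start a two-byte Shift-JIS character.
inline bool isSjisLead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

// True if b is a valid second byte of a two-byte Shift-JIS character.
// Note that 0x40-0x7E overlaps the ASCII range.
inline bool isSjisTrail(unsigned char b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Returns (offset, length in bytes) for each maximal run of
// consecutive two-byte characters found in the buffer.
std::vector<std::pair<std::size_t, std::size_t>>
findSjisRuns(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t i = 0;
    while (i < size) {
        const std::size_t start = i;
        // Consume lead/trail pairs for as long as they stay valid.
        while (i + 1 < size && isSjisLead(data[i]) && isSjisTrail(data[i + 1])) {
            i += 2;
        }
        if (i > start) {
            runs.emplace_back(start, i - start);
        } else {
            ++i;  // Not the start of a pair: skip this byte.
        }
    }
    return runs;
}
```

Arbitrary binary data can contain byte pairs that happen to look like valid Shift-JIS, so it may help to keep only runs above some minimum length, or to check that each pair decodes to an assigned character, before treating a run as a sentence.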