31 Oct
2002
10:52 a.m.
Hi,

We want to use regex++ (version 3.31) with UTF-8 strings. I tried to match a 2-byte UTF-8 character against the regex "." and the match failed. It seems regex++ treats these 2 bytes as two separate characters.

1) Is there a "native" way to use UTF-8 strings in the regex++ library? Can we match UTF-8 strings against a compiled regex (the regex itself is ASCII only)? Can the regex itself contain UTF-8 characters?

2) Is converting to wchar_t our only option? As far as I understand, wchar_t does not cover the entire range of characters covered by UTF-8, so it may not be enough.

Any other ideas?

Thanks,
Gitit.
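
To make the behaviour described above concrete, here is a minimal sketch. It uses the standard C++ <regex> header as a stand-in, not regex++ 3.31 itself, so treat it as an illustration of byte-wise matching in general rather than of that library's API. The helper names (decode_utf8, matches_dot_bytewise, matches_dot_decoded) are hypothetical, and the decoder is deliberately simple: it does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into one wchar_t per code point.
// Assumption: a simplified decoder with no overlong or surrogate checks.
std::wstring decode_utf8(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Where wchar_t is 16 bits wide, code points above U+FFFF do not fit
        // in a single wchar_t; this sketch refuses them instead of splitting.
        if (sizeof(wchar_t) < 4 && cp > 0xFFFF)
            throw std::runtime_error("code point does not fit in wchar_t");
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}

// A char-based regex sees every byte as a separate character.
bool matches_dot_bytewise(const std::string& utf8) {
    return std::regex_match(utf8, std::regex("."));
}

// After decoding, one code point is one wchar_t, so "." can match it.
bool matches_dot_decoded(const std::string& utf8) {
    return std::regex_match(decode_utf8(utf8), std::wregex(L"."));
}

// Small self-check using U+00E9, which is the two bytes C3 A9 in UTF-8.
void self_check() {
    const std::string e_acute = "\xC3\xA9";
    assert(!matches_dot_bytewise(e_acute));  // two bytes, "." wants one
    assert(matches_dot_decoded(e_acute));    // one code point after decoding
}
```

The size check in decode_utf8 reflects the concern in question 2: on platforms where wchar_t is 16 bits wide, a single wchar_t cannot hold every code point that UTF-8 can encode.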