Regex++ and UTF-8

31 Oct 2002

Hi,

We want to use regex++ (version 3.31) with UTF-8 strings.
I tried to match a 2-byte UTF-8 character against the regex "." and the match
failed. It seems regex++ treats these 2 bytes as two separate characters.
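
To make the symptom concrete, here is a minimal sketch of the same experiment.
It uses std::regex from the C++ standard library as a stand-in, which is an
assumption: the actual regex++ 3.31 calls are not shown in this message, and
the helper names full_match and check_byte_matching are hypothetical. The
point is only that a char-based matcher sees a 2-byte UTF-8 character as two
elements.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole byte string matches the pattern.
// std::regex (ECMAScript syntax) is a stand-in for regex++ here.
bool full_match(const std::string& bytes, const std::string& pattern) {
    return std::regex_match(bytes, std::regex(pattern));
}

// Small self-check of the behaviour described above.
void check_byte_matching() {
    const std::string e_acute = "\xC3\xA9";  // U+00E9 encoded as two UTF-8 bytes
    assert(e_acute.size() == 2);             // two char elements, one character
    assert(!full_match(e_acute, "."));       // "." consumes a single byte only
    assert(full_match(e_acute, ".."));       // two "." are needed for two bytes
}
```

Calling check_byte_matching() should pass silently if the matcher works on
bytes, which is consistent with the failed match reported above.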

1) Is there a "native" way in the regex++ library for using UTF-8 strings?
Can we match UTF-8 strings against a compiled regex (where the regex itself
is ASCII only)? Can the regex itself contain UTF-8 characters?

2) Is converting to wchar_t our only option? As far as I understand, wchar_t
does not cover the entire range of characters covered by UTF-8, so it may
not be enough. Any other ideas?
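
For illustration, the wchar_t route in question 2 might look roughly like the
sketch below. Everything in it is an assumption rather than regex++ API:
decode_utf8 and wide_full_match are hypothetical helpers, std::wregex stands
in for a wide-character regex, and the decoder does only minimal validation.
Because the width of wchar_t is implementation-defined, the sketch refuses
code points that do not fit instead of truncating them.

```cpp
#include <cstddef>
#include <limits>
#include <regex>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into code points (hypothetical helper).
// Rejects truncated sequences and bad continuation bytes; it does not
// reject overlong encodings or surrogate code points.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Convert to wchar_t and match with std::wregex (stand-in for a wide regex).
// Throws if a code point does not fit in this platform's wchar_t.
bool wide_full_match(const std::string& utf8, const std::wstring& pattern) {
    const unsigned long long wmax = static_cast<unsigned long long>(
        std::numeric_limits<wchar_t>::max());
    std::wstring wide;
    for (char32_t cp : decode_utf8(utf8)) {
        if (static_cast<unsigned long long>(cp) > wmax)
            throw std::range_error("code point does not fit in wchar_t");
        wide.push_back(static_cast<wchar_t>(cp));
    }
    return std::regex_match(wide, std::wregex(pattern));
}
```

With this, the 2-byte sequence from the experiment above becomes a single
wide character, so a lone "." can match it. Whether every character we need
fits in wchar_t still depends on the platform, which is exactly the concern
raised in question 2.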

Thanks,
Gitit.
