I think it has most of what's needed, though it seems that the type conversion `__builtin_convertvector`, which is needed to expand e.g. a UTF-8 byte to UTF-32 with zero bytes, is only present in newer versions of g++ than I have.

Then it's likely not very useful for now. Maybe later, once that compiler version is more widespread.

// Attempt to decode the subset of UTF-8 with code points < 256.
// Format is either 0xxxxxxx -> 0xxxxxxx
// or 110---xx 10yyyyyy -> xxyyyyyy
// The input mustn't start or finish in the middle of a multi-byte
// character.
// Other inputs produce undefined outputs.

Good code for that special case. But I think "undefined outputs" is not acceptable. The other SIMD UTF-8 conversions I've seen basically all focus on ASCII: they convert as much ASCII as possible and fall back to one-by-one decoding once a non-ASCII byte is found.

That will be quick, but it does lack a few things: it doesn't check whether it has reached the end of the input, and it doesn't do any error checking.
So not really usable either. BUT: compare to Boost.Locale, which has a `decode` and a `decode_valid` function, where the latter assumes valid UTF-8. However, checking for end-of-input is obviously a must.

BTW: Does Boost.Text have functions or overloads where you can specify that the text is already in a specific encoding/normalization? If not, I think this should be added. Sometimes you get text from an internal function and know those things, so you can skip verification and conversion.
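
To make the "ASCII fast path plus one-by-one fallback" idea concrete, here is a rough sketch of what I have in mind. It is not taken from Boost.Text or Boost.Locale; the name `decode_utf8_checked` is made up for illustration, and the fast path uses a plain 8-byte word test rather than real SIMD intrinsics so that it compiles on any compiler. The point is only to show that the end-of-input check and the error checks fit naturally into the fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost API): decode UTF-8 to UTF-32.
// Returns std::nullopt on malformed or truncated input instead of
// producing undefined output.
std::optional<std::u32string> decode_utf8_checked(std::string_view in) {
    std::u32string out;
    out.reserve(in.size());
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        // Fast path: while at least 8 bytes remain and none of them has its
        // high bit set, the block is pure ASCII and is widened directly.
        while (n - i >= 8) {
            std::uint64_t block;
            std::memcpy(&block, in.data() + i, 8);
            if (block & 0x8080808080808080ULL) break;
            for (std::size_t k = 0; k < 8; ++k)
                out.push_back(static_cast<char32_t>(
                    static_cast<unsigned char>(in[i + k])));
            i += 8;
        }
        if (i == n) break;

        // Slow path: decode exactly one code point with full validation.
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        char32_t min;  // smallest code point allowed for this length
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return std::nullopt;  // stray continuation or invalid lead byte

        // End-of-input check: the sequence must not run past the buffer.
        if (len > n - i) return std::nullopt;

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
            cp = static_cast<char32_t>((cp << 6) | (b & 0x3F));
        }
        // Reject overlong encodings, surrogates, and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;

        out.push_back(cp);
        i += len;
        assert(i <= n);  // invariant: we never advance past the end
    }
    return out;
}
```

A `decode_valid`-style variant for input that is known to be valid could drop the continuation-byte and range checks, but I would still keep the `len > n - i` test, since that is the end-of-input check.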