> Now as you have seen there are many possible "non-standard" UTF-8 variants.
>
> What should I accept?
I still strongly suggest you simply call RtlUTF8ToUnicodeN() (https://msdn.microsoft.com/en-us/library/windows/hardware/ff563018(v=vs.85)....) to do the UTF-8 conversion. Do **nothing** else.
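For anyone who has not called it before, here is a minimal sketch of what that could look like from user-mode C++. It is an illustration under stated assumptions, not a reference implementation: `utf8_to_wide` is a hypothetical helper name, the prototype is declared by hand from Microsoft's documented signature, and the function is resolved from ntdll.dll at run time so that no WDK headers are needed. On non-Windows builds the helper simply reports failure.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>

// Prototype written out by hand from the documented signature (assumption of
// this sketch: we resolve the export from ntdll.dll at run time instead of
// linking against an import library).
typedef LONG(NTAPI *RtlUTF8ToUnicodeN_t)(PWSTR dest, ULONG destMaxBytes,
                                         PULONG destActualBytes,
                                         PCCH src, ULONG srcBytes);
#endif

// Hypothetical helper: converts UTF-8 to UTF-16 into `out`.
// Returns false on failure, or always when not built for Windows.
bool utf8_to_wide(const std::string &utf8, std::wstring &out)
{
    out.clear();
#ifdef _WIN32
    if (utf8.empty()) return true;  // nothing to convert

    static const RtlUTF8ToUnicodeN_t fn = reinterpret_cast<RtlUTF8ToUnicodeN_t>(
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "RtlUTF8ToUnicodeN"));
    if (fn == nullptr) return false;  // export not available on this system

    const ULONG srcBytes = static_cast<ULONG>(utf8.size());
    ULONG bytes = 0;

    // First call: a null destination only reports the required size in bytes.
    LONG status = fn(nullptr, 0, &bytes, utf8.data(), srcBytes);
    if (status < 0) return false;  // NTSTATUS failure codes are negative

    // Second call: convert into a buffer of exactly that size.
    out.resize(bytes / sizeof(wchar_t));
    status = fn(&out[0], bytes, &bytes, utf8.data(), srcBytes);
    if (status < 0) { out.clear(); return false; }

    out.resize(bytes / sizeof(wchar_t));
    return true;
#else
    (void)utf8;
    return false;  // RtlUTF8ToUnicodeN only exists on Windows
#endif
}
```

Note that the sketch only treats negative NTSTATUS values as failure. As far as I know, invalid input is reported through a non-negative informational status rather than an error, so a caller who wants to reject such input would need to inspect `status` more closely.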
Niall, could you explain why? I don't know any of the Windows-relevant details.
1. RtlUTF8ToUnicodeN() is what the NT kernel uses and isn't polluted by Win32.

2. RtlUTF8ToUnicodeN() has a well designed API, unlike the awful Win32 MultiByteToWideChar() function.

3. RtlUTF8ToUnicodeN() is close to as fast as any implementation of the same thing, unlike MultiByteToWideChar() and some STL implementations of <codecvt>.

4. RtlUTF8ToUnicodeN() treats invalid input in the way which the rest of the NT kernel is built to expect. I caveat this by saying that Win32 functions can mangle input in a really unhelpful way, so I would only trust the output of RtlUTF8ToUnicodeN() when it is fed directly to NT kernel APIs. That I know works well.

5. I've been using RtlUTF8ToUnicodeN() in my own code for years and have found it unsurprising and unproblematic, unlike MultiByteToWideChar() or C++11's UTF-8 support, which doesn't quite work right on some standard libraries.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/