Re: [boost] UUID design discussion

25 Apr 2024

From: Peter Dimov <pdimov@gmail.com>
Date: Thursday, April 25, 2024 at 9:53 AM
To: Rob Boehne <robb@datalogics.com>, boost@lists.boost.org <boost@lists.boost.org>
Subject: RE: [boost] UUID design discussion
Rob Boehne wrote:
...
* At the moment wide strings are processed by the name generators
  by converting every wchar_t to 32 bit, then hashing the bytes, zeroes
  and all. This doesn't strike me as correct. I think that the string should
  be converted to UTF-8 on the fly (with 16 bit wchar_t assumed UTF-16
  and 32 bit wchar_t assumed UTF-32.)
To my thinking – a string should just be treated as binary data and it should
not have its encoding changed – this should also make less work.
This behavior makes name UUIDs produced by e.g. "www.example.org"
and L"www.example.org" different, which is unlikely to be what one wants
in practice, and is against the recommendation of RFC 4122, which says

   o  Convert the name to a canonical sequence of octets (as defined by
      the standards or conventions of its name space); put the name
      space ID in network byte order.

I don't think anyone can justify the choice of e.g. 0x41 0x00 0x00 0x00 as
the "canonical sequence of octets" for U"A".
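
To make the difference concrete, here is a minimal sketch of the two ways of turning a name into octets before hashing. It is not the Boost.Uuid implementation: the function names (append_utf8, utf8_octets, widened_octets) are hypothetical, the little-endian layout in widened_octets is an assumption chosen to match the 0x41 0x00 0x00 0x00 example above, and unpaired surrogates are passed through without validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Append one Unicode code point to `out` as UTF-8 (no validation).
inline void append_utf8(std::vector<unsigned char>& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
}

// Narrow strings are taken as-is: the bytes are the octets.
inline std::vector<unsigned char> utf8_octets(const std::string& s)
{
    return std::vector<unsigned char>(s.begin(), s.end());
}

// Wide strings are converted to UTF-8 on the fly:
// 16 bit characters are assumed UTF-16, 32 bit characters UTF-32.
template<class Ch>
std::vector<unsigned char> utf8_octets(const std::basic_string<Ch>& s)
{
    typedef typename std::make_unsigned<Ch>::type UCh;
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t cp = static_cast<UCh>(s[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (sizeof(Ch) == 2 && cp >= 0xD800 && cp < 0xDC00 && i + 1 < s.size()) {
            std::uint32_t lo = static_cast<UCh>(s[i + 1]);
            if (lo >= 0xDC00 && lo < 0xE000) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// The behavior being questioned: widen every character to 32 bits and
// use its bytes, zeroes and all (little-endian order assumed here).
template<class Ch>
std::vector<unsigned char> widened_octets(const std::basic_string<Ch>& s)
{
    typedef typename std::make_unsigned<Ch>::type UCh;
    std::vector<unsigned char> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t v = static_cast<UCh>(s[i]);
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
    return out;
}
```

With this sketch, utf8_octets(std::wstring(L"www.example.org")) yields the same 15 octets as utf8_octets(std::string("www.example.org")), so a name generator hashing them would produce the same UUID for both. widened_octets gives 60 octets for the wide string, which is why the two names currently hash differently.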

Ok – I withdraw my comment.