Re: [boost] UUID design discussion

25 Apr 2024


On 4/25/24 18:28, Peter Dimov wrote:
...
Andrey Semashev wrote:
...
On 4/25/24 17:53, Peter Dimov via Boost wrote:
...
This behavior makes name UUIDs produced by e.g. "www.example.org"
and L"www.example.org" different, which is unlikely to be what one
wants in practice, and is against the recommendation of RFC 4122,
which says:
   o  Convert the name to a canonical sequence of octets (as defined by
      the standards or conventions of its name space); put the name
      space ID in network byte order.
I don't think anyone can justify the choice of e.g. 0x41 0x00 0x00
0x00 as the "canonical sequence of octets" for U"A".
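To illustrate the octet sequences being discussed, here is a minimal sketch that dumps the in-memory bytes of a string's characters, which is what hashing the raw character storage would see. The helper name raw_octets is hypothetical and is not part of Boost.UUID; the sketch only shows why "A" and U"A" feed different bytes into a hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost.UUID API): copies the in-memory
// representation of every character of s into a byte vector.
template <class Ch>
std::vector<unsigned char> raw_octets(const std::basic_string<Ch>& s)
{
    std::vector<unsigned char> out(s.size() * sizeof(Ch));
    if (!out.empty())
        std::memcpy(out.data(), s.data(), out.size());
    return out;
}
```

On a typical little-endian platform with a 32-bit char32_t, raw_octets(std::u32string(U"A")) should yield 0x41 0x00 0x00 0x00, while raw_octets(std::string("A")) yields the single octet 0x41. On a big-endian platform the four octets would come out in the opposite order, so the result is not even stable across machines.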
Perhaps we should simply assume that whatever form of the string the user
provided to the generator is the "canonical" form. That is, if the user wants
"www.example.org" and L"www.example.org" to produce the same UUID, it
is the user's responsibility to convert those strings to the same representation
before passing them to the generator.
I think that in some regions Unicode might not be the first encoding of choice,
and there are also incorrectly encoded strings that cannot be converted to UTF-8.
I don't think that Boost.UUID should deal with those issues.
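As a sketch of what "convert those strings to the same representation" could look like on the user's side, the following encodes a UTF-32 string as UTF-8 before it is handed to a name generator. The function name to_utf8 is hypothetical, the choice of UTF-8 as the canonical form is an assumption made for illustration, and invalid code points are rejected with an exception rather than repaired.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical user-side helper (not a Boost.UUID API): encodes UTF-32
// input as UTF-8. Throws on surrogates and out-of-range values, since
// such input has no valid UTF-8 representation.
std::string to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t c : in) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this, to_utf8(U"www.example.org") produces the same octets as the narrow literal "www.example.org", so both would map to the same name UUID once passed to the generator. A wchar_t input would need a platform-specific step first, because its encoding differs between platforms.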
The right way to not deal with these issues is to simply not take wide strings
in the first place. This forces the user to supply "the canonical octet
representation".
Since we do take wide strings, we have implicitly accepted the responsibility
to produce the canonical octet representation for them. And inserting zeroes
randomly is simply wrong.
Ok, so maybe we should simply deprecate the support for wide string inputs?