Re: [boost] Boost.URL: Something Wicked This Way Comes! (Was: Needs Review)

12 Oct 2021

...
On Oct 12, 2021, at 3:15 PM, Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Tue, Oct 12, 2021 at 1:02 PM Alex Christensen <achristensen@apple.com> wrote:
...
at some point you may run into issues with people trying to give
your library input like http://example.com/💩 and expecting the
URL parser to normalize it to http://example.com/%F0%9F%92%A9
Okay, I think what you're saying is that you will have this string literal:
string_view s = "http://example.com/\xf0\x9f\x92\xa9";
Unfortunately, this is not a valid URL and I don't think that the
library should accept this input.
It is perfectly valid input: some URL libraries I work with accept it and percent-encode it, while others reject it as an invalid URL. I think it’s a valid URL parser input that ought to produce a valid URL, but not everyone agrees on this yet.
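For illustration, the "accept and percent encode" behavior can be sketched in plain C++17 without any URL library. This is only a rough sketch under stated assumptions: percent_encode_path and is_unreserved are hypothetical helpers, not part of Boost.URL or any library mentioned here, and the rule is deliberately simplified to "keep RFC 3986 unreserved characters and '/', encode every other byte". It does not treat sub-delims specially, and it would double-encode an existing '%'.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.URL function): true for the RFC 3986
// "unreserved" set, i.e. ALPHA / DIGIT / "-" / "." / "_" / "~".
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper: percent-encode every byte of a path except
// unreserved characters and the '/' segment separator.
// Simplification: an existing '%' is encoded again as "%25".
inline std::string percent_encode_path(std::string_view path) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(path.size());
    for (unsigned char c : path) {
        if (is_unreserved(c) || c == '/') {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);    // high nibble
            out.push_back(hex[c & 0x0F]);  // low nibble
        }
    }
    return out;
}
```

With this sketch, percent_encode_path("/\xf0\x9f\x92\xa9") yields "/%F0%9F%92%A9". The hex digits are uppercase here; lowercase digits denote the same octets.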
...
However, you could write this:
url u = parse_uri( "http://example.com" ).value();
u.set_path( "/\xf0\x9f\x92\xa9" );
This will produce:
assert( u.encoded_url() == "http://example.com/%f0%9f%92%a9" );
Is this what you meant?
Welcome to the crazy world of URL parsing!
...
I'm guessing that the turd emoji is inserted
into the C++ source file as a UTF-8 encoded code point, so that's what
you get in the string literal.
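To make the byte sequence concrete: the emoji is code point U+1F4A9, and its UTF-8 encoding is the four bytes F0 9F 92 A9 used in the string literal earlier in this thread. A minimal sketch of that encoding step follows, assuming a hypothetical helper named encode_utf8 (not from any library discussed here). It does no validation, so surrogates and values above U+10FFFF are not rejected.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// No validation is performed (surrogates and out-of-range values are
// not rejected); this only illustrates the bit layout.
inline std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

Under these assumptions, encode_utf8(0x1F4A9) produces the same bytes as the literal "\xf0\x9f\x92\xa9".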
Thanks