
On Tue, Oct 12, 2021 at 10:57 AM Alex Christensen wrote:
Why did you use RFC 3986 as your specification?
Well, this one seems to be the latest RFC regarding URLs, modulo a bit of HTTP-specific stuff like authority-form appearing in RFC 7230. Is there something newer?
How do you feel about the WhatWG URL specification at https://url.spec.whatwg.org?
Quite frankly, I hate it. This "specification" manages to organize and present the information in the most obtuse manner possible. I find it hostile to implementers like myself.
It has a large body of tests ...that you may consider looking at.
Yep, it does! Integrating them is on my to-do list: https://github.com/CPPAlliance/url/tree/c6c4b433c3b1057161b6ce50bb4fba0b5f59...
Do you have any general plan for a strategy for handling non-ASCII input?
Yes, the plan is to reject such input. Strings containing non-ASCII characters are not valid URLs. And even some ASCII characters are not allowed to appear in a URL, for example all control characters.
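To make the distinction concrete, here is a rough sketch using only the standard library. It is not the library's code: has_only_url_candidate_bytes, hex_value, percent_decode, and example are hypothetical names made up for illustration. The byte check is only a necessary condition, since the RFC 3986 grammar rejects more than this (a space, for instance). The point is that a percent-encoded string is pure ASCII even when it decodes to UTF-8 bytes, while the raw UTF-8 bytes themselves are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of any library: true when every byte is
// ASCII and not a control character. This is a necessary condition only;
// the RFC 3986 grammar rejects further characters, such as space.
bool has_only_url_candidate_bytes(std::string_view s)
{
    for (unsigned char c : s)
    {
        if (c < 0x20 || c >= 0x7F)   // C0 controls, DEL, and non-ASCII
            return false;
    }
    return true;
}

// Hypothetical helper: value of one hex digit, or -1 if c is not one.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: percent-decode s into out.
// Returns false if an escape is truncated or has non-hex digits.
bool percent_decode(std::string_view s, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] != '%')
        {
            out.push_back(s[i]);
            continue;
        }
        if (s.size() - i < 3)        // need '%' plus two hex digits
            return false;
        int const hi = hex_value(s[i + 1]);
        int const lo = hex_value(s[i + 2]);
        if (hi < 0 || lo < 0)
            return false;
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;                      // skip the two digits just consumed
    }
    return true;
}

// Usage: the encoded form passes the byte check, the raw form does not.
void example()
{
    std::string decoded;
    assert(has_only_url_candidate_bytes("/caf%C3%A9"));    // pure ASCII
    assert(percent_decode("/caf%C3%A9", decoded));
    assert(decoded == "/caf\xC3\xA9");                     // UTF-8 bytes after decoding
    assert(!has_only_url_candidate_bytes("/caf\xC3\xA9")); // raw non-ASCII is rejected
    assert(!has_only_url_candidate_bytes("/a\tb"));        // control character is rejected
}
```

A real parser would of course apply the full grammar for each URL component rather than a single byte-range test.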
I haven’t tested it yet, but what do you plan to do if someone passes a UTF-8 encoded non-ASCII string...
...into the constructor?
You can't construct a URL from a string, you have to go through one of the parsing functions. This is because the library recognizes several variations of URL grammar, and does not favor any particular grammar by choosing one to support construction. See Table 1.1: https://master.url.cpp.al/url/parsing.html
I think what you're asking is, what if someone supplies a URL which has escaped characters which, when percent-decoding is applied, become valid UTF-8 code point sequences? That's perfectly fine. Percent-encoded URL parts are in fact "ASCII strings."
Thanks