On Oct 12, 2021, at 12:17 PM, Vinnie Falco wrote:
> On Tue, Oct 12, 2021 at 10:57 AM Alex Christensen wrote:
>> Why did you use RFC 3986 as your specification?
> Well, this one seems to be the latest RFC regarding URLs, modulo a bit of HTTP-specific stuff like authority-form appearing in RFC 7230. Is there something newer?

I would say that the WHATWG URL specification is that something newer, but I sympathize. It is difficult to get started with.

>> Do you have any general plan for a strategy for handling non-ASCII input?
> Yes, the plan is to reject such input. Strings containing non-ASCII characters are not valid URLs. And even some ASCII characters are not allowed to appear in a URL, for example all control characters.

That is certainly a choice you can make, but at some point you may run into issues with people giving your library input like http://example.com/💩 and expecting the URL parser to normalize it to http://example.com/%F0%9F%92%A9 for them, as some other URL libraries do. I see Punycode encoding and decoding doesn’t seem to be in the scope of this library; for your use cases that may be fine, and for others it might not be. It seems like you’re aware of this design choice, though.
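
To make the two behaviors concrete, here is a minimal sketch in Python. It is not the library under discussion; it uses only Python's standard `urllib.parse.quote`. The helper names `has_only_printable_ascii` and `percent_encode_non_ascii` are hypothetical, made up for this illustration. The first helper is only a precondition check for the strict policy (RFC 3986 also excludes some printable ASCII characters, which it does not test), and the second ignores the host entirely, which is where Punycode would come in.

```python
from urllib.parse import quote


def has_only_printable_ascii(text: str) -> bool:
    """Strict-policy precondition: no control characters, no space, no non-ASCII.

    Necessary but not sufficient for RFC 3986 validity.
    """
    return all(0x21 <= ord(ch) <= 0x7E for ch in text)


def percent_encode_non_ascii(text: str) -> str:
    """Lenient policy: leave ASCII alone, and replace each non-ASCII
    character with the percent-encoded form of its UTF-8 bytes."""
    return "".join(
        ch if ord(ch) < 0x80 else quote(ch, safe="")
        for ch in text
    )


raw = "http://example.com/\U0001F4A9"  # the pile-of-poo example from above

print(has_only_printable_ascii(raw))   # False: a strict parser would reject this
print(percent_encode_non_ascii(raw))   # http://example.com/%F0%9F%92%A9
```

A strict parser stops at the first check and reports an error; a lenient one applies something like the second step to the path and query before parsing.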