
On Tue, Oct 12, 2021 at 2:52 PM Alex Christensen wrote:
> It is perfectly valid input that some URL libraries I work with accept and percent encode, and some URL libraries I work with reject it as an invalid URL. I think it’s a valid URL parser input that ought to produce a valid URL, but not everyone agrees on this yet.

Not so fast, I think this can be decided objectively. A URL in the context of Boost.URL refers to a "URI" in the RFC 3986 sense. I use "URL" because most people have never heard of "URI". What you are thinking of as a "valid URL parser input" is actually an Internationalized Resource Identifier, which supports the broader Universal Character Set instead of just ASCII and is abbreviated by the even more obscure acronym "IRI". It is covered by RFC 3987:

https://datatracker.ietf.org/doc/html/rfc3987

Translating your comment, I think you're saying "Boost.URL should support Internationalized Resource Identifiers." That is unfortunately out of scope for the library, as Boost.URL is mostly designed for the exchange of URLs between machines or programs and not necessarily for display to users. Perhaps someday the entire world will have switched to IRIs (maybe after IPv4 is no longer in use), but we are not there yet, and most systems require IRIs to be mapped to their URI equivalent:

https://datatracker.ietf.org/doc/html/rfc3987#section-3

There is some value in IRIs, but not as much as there is in ASCII URLs, which fill a tremendous user need (HTTP/WebSocket clients and servers using Beast or Asio).

Thanks
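
P.S. For what it's worth, the IRI-to-URI mapping described in section 3.1 of RFC 3987 is mostly mechanical: encode the characters as UTF-8, then percent-encode every octet outside the ASCII range. Below is a minimal sketch of that step only. It assumes the input is already valid, NFC-normalized UTF-8, and it ignores the separate IDNA conversion the RFC allows for host names. The name `iri_to_uri_sketch` is made up for illustration and is not a Boost.URL function.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, NOT part of Boost.URL.
// Percent-encodes every non-ASCII octet of a UTF-8 encoded IRI, roughly
// following step 2 of RFC 3987 section 3.1.
// Assumptions: the input is valid, NFC-normalized UTF-8, and the host
// component needs no IDNA (ToASCII) conversion.
std::string iri_to_uri_sketch(std::string_view iri)
{
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(iri.size());
    for (unsigned char c : iri)
    {
        if (c < 0x80)
        {
            // ASCII octets pass through unchanged; whether they form a
            // valid URI is left to the URI parser.
            out.push_back(static_cast<char>(c));
        }
        else
        {
            // Octets of a multi-byte UTF-8 sequence become %HH triplets.
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}
```

Given "http://example.com/caf\xC3\xA9" (the UTF-8 encoding of "café" in the path), this returns "http://example.com/caf%C3%A9", which an ASCII-only URI parser can then accept or reject on its own terms.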