On Sat, Jun 4, 2022 at 2:23 PM Andrzej Krzemienski via Boost
I am trying to do a review of Boost.URI library
Great! But umm.... err..., well - it's called Boost.URL ;)
I tried to use it with the latest MinGW Distro on Windows ( https://nuwen.net/mingw.html), which uses GCC 11.2 and Boost 1.77.
Yeah you need the latest Boost. Until the library is actually accepted, it is written for the tip of the develop branch of the superproject. Note that this goes for all our in-development libraries.
Second, I recommend that Boost.URL docs say that it requires Boost 1.78 or higher.
That's not unreasonable. We develop the documentation as if the library is already accepted, to minimize the changes that must be made post-acceptance. You should open an issue as that is the best way to motivate change: https://github.com/CPPAlliance/url/issues
Aliases for standard types, such as string_view
https://master.url.cpp.al/url/ref/boost__urls__string_view.html, use their Boost equivalents.
After reading this, I expected that Boost.URL would use boost::string_view from Boost.Utility library: https://www.boost.org/doc/libs/1_79_0/libs/utility/doc/html/utility/utilitie...
But instead, it uses boost::core::string_view, which is an implementation detail from Boost.Core library: https://github.com/CPPAlliance/url/blob/master/include/boost/url/string_view...
Yeah, this documentation was written before we started using Core's string_view. It will need to be updated in Boost.URL, Boost.JSON, Boost.Beast, and Boost.HTTP.Proto. Newly opened issues are the best way to motivate change:
https://github.com/boostorg/beast/issues
https://github.com/boostorg/json/issues
https://github.com/CPPAlliance/url/issues
https://github.com/CPPAlliance/http_proto
Again, this is news for me that Boost has two implementations of string_view. Why?
Yeah, so Peter has convinced me that offering two versions of every one of our libraries is not a great idea. By that I mean that offering a macro that lets the user configure the library for either std::string_view or boost::string_view is detrimental, because this produces two distinct linkable libraries that each have their own diverging ABIs (or is it APIs?). This unnecessary friction is a constant source of complaints.

Peter's vision is that Boost evolves so that its types are more compatible with their std equivalents. For example, boost::core::string_view will be more easily converted implicitly in places where the user expects such conversions to take place. We couldn't do this in Boost.Utility's string view because the author is philosophically opposed to making this change. There's some discussion here:
https://github.com/boostorg/utility/issues/40
https://github.com/boostorg/utility/pull/51
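To make the trade-off concrete, here is a minimal sketch of the single-view-type approach. Everything in the `demo` namespace is hypothetical and written only for illustration; it is not Boost.Core's actual string_view. It shows how one library-owned type with implicit conversions lets a function keep a single signature, whatever string type the caller holds, instead of a macro switching the parameter type between builds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

namespace demo {

// Hypothetical stand-in for a library-owned view type (an assumption for
// this sketch, not the real boost::core::string_view).
class string_view {
    const char* data_ = "";
    std::size_t size_ = 0;

public:
    string_view() = default;

    // Implicit construction from the string types callers commonly hold.
    string_view(const char* s)
        : data_(s), size_(std::char_traits<char>::length(s)) {}
    string_view(const std::string& s)
        : data_(s.data()), size_(s.size()) {}
    string_view(std::string_view s)
        : data_(s.data()), size_(s.size()) {}

    // Implicit conversion back to the std type.
    operator std::string_view() const { return {data_, size_}; }

    std::size_t size() const { return size_; }
};

// One signature, so one compiled symbol, regardless of the caller's type.
// Returns the number of characters before the first ':' (0 if none).
inline std::size_t scheme_length(string_view s) {
    std::string_view sv = s;  // uses the implicit conversion
    const auto pos = sv.find(':');
    return pos == std::string_view::npos ? 0 : pos;
}

}  // namespace demo
```

A caller can pass a std::string_view, a std::string, or a literal to `demo::scheme_length` without any configuration macro, which is the kind of conversion-friendliness described above.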
Next, the section on the parsers ( https://master.url.cpp.al/url/parsing/url.html) describes the function parse_uri() which returns result<url_view>. What strikes me is this difference: URI (Identifier) in the function name, and URL (Locator) in the return type. I always used the terms URL and URI interchangeably.
About that. So, the library uses the term "URL" to mean any of the provided containers, e.g. url_view, url, static_url. The term "URI" always refers to the specific BNF syntax found in the relevant RFC.
But now that I see them used in this way in a well-designed library, it looks disturbing. The quoted rfc3986 ( https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3) says that a URL is a subset of URI.
The decision that I have made is to just ignore the RFC's guidance on what URL means, and instead use the term as it has become popularly known. I believe that the distinction between URL and URI is just not recognized by the general public and in particular the wide audience to which Boost.URL applies.

No one asks you for your URI, but everyone asks you for your URL. People put URLs into the address bar. No one says "type this URI into the address bar." The address bar accepts non-http schemes such as mailto and file. These are technically URIs (see: https://en.wikipedia.org/wiki/Mailto). But no one calls them that. A Google search for "URL" produces fifteen times more results than a search for "URI" although you would think that URIs would be more common since they are a superset of URLs. Go figure :)

Therefore I have chosen to use the less technically correct but the more marketable term "URL" in the key places where it matters: the name of the library and the name of the container. Or to put it a different way

    url u;

Looks a hell of a lot better than

    uri u;
The synopsis for parse_uri ( https://master.url.cpp.al/url/ref/boost__urls__parse_uri.html) says:
Exception safety: throws nothing.
And the line below it says that the function throws std::length_error when the input is too long. It looks like a bug in specs. Later we read:
Return value: A result containing the view to the URL, or an error code if
the parsing was unsuccessful.
Yep this needs an open issue :) https://github.com/CPPAlliance/url/issues
Which is not precise enough to give me the answer to the URI-vs-URL question. When can a parsing be non-successful? Is it only because it was not conformant to the grammar? The synopsis says "This function parses a string according to the URI grammar below", but is it a URI grammar or a URL grammar actually?
Actually this is covered by the docs :) see table 1.1: https://master.url.cpp.al/url/parsing/url.html
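As a rough illustration of the "result or error code" contract discussed above, here is a small sketch of a parser that reports grammar failures through its return value instead of throwing. The names `demo::scheme_result` and `demo::parse_scheme` are hypothetical and are not part of Boost.URL; the only grammar used is the scheme rule from RFC 3986 (a letter followed by letters, digits, "+", "-", or "."), terminated by ":".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

namespace demo {

// Hypothetical result type: holds either a view of the parsed scheme
// or an error code explaining why parsing failed.
struct scheme_result {
    std::optional<std::string_view> value;
    std::error_code error;
};

inline bool is_alpha(char c) noexcept {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

inline bool is_digit(char c) noexcept { return c >= '0' && c <= '9'; }

// Parses the leading scheme of `s`. Grammar errors are reported through
// the returned error code; nothing is thrown for malformed input.
inline scheme_result parse_scheme(std::string_view s) noexcept {
    const auto fail = [] {
        return scheme_result{
            std::nullopt, std::make_error_code(std::errc::invalid_argument)};
    };

    const auto pos = s.find(':');
    if (pos == std::string_view::npos || pos == 0) return fail();
    if (!is_alpha(s[0])) return fail();

    for (std::size_t i = 1; i < pos; ++i) {
        const char c = s[i];
        if (!(is_alpha(c) || is_digit(c) || c == '+' || c == '-' || c == '.'))
            return fail();
    }
    // pos is within bounds here, so substr cannot throw.
    return scheme_result{s.substr(0, pos), {}};
}

}  // namespace demo
```

With a contract like this, "unsuccessful" has a precise meaning: the input did not match the stated grammar, and the caller inspects `error` rather than catching an exception.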
Now, there is probably a good explanation to the URI vs URL discrepancy. I think it would be good if it was placed in the docs, so that the users don't get confused.
Yes we could use a blurb which explains that the library settles on the name URL to refer to containers: https://github.com/CPPAlliance/url/issues
While this might look like a list of complaints, I really appreciate the efforts the authors put in creating this library and its documentation. The documentation is really high quality, way higher than the average you will find on GitHub. And it is actually because of this high quality that I am able to spot and report these issues.
Hey thanks!!! Yeah there's of course going to be the usual rogues gallery of doc mistakes, missing explanations, etc... We appreciate your investigation of the library and the accompanying reports as they will help us provide the last bits of polish needed to make this great! Regards