Re: [boost] Boost.URL -- some notes

5 Jun 2022

Hi Andrzej,

Thanks for reviewing the library.

> I recommend that Boost.URL docs say that it requires Boost 1.78 or higher.

Definitely. I'll look into the issue in the next few days.

https://github.com/CPPAlliance/url/issues/184

> A better alternative would be to use the official boost::string_view from
> Boost.Utility. Or is there a good reason not to?

As others have noted, core::string_view is convertible to std::string_view,
which is becoming more and more important.
A string_view that is not convertible to std::string_view is problematic.
Others have already shared some relevant links.
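
To make the convertibility point concrete, here is a minimal sketch of a view type with an implicit conversion to std::string_view. The names `my_string_view` and `count_slashes` are hypothetical and exist only for this illustration; this is not how core::string_view is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a library-specific view type.
class my_string_view {
    const char* data_;
    std::size_t size_;

public:
    constexpr my_string_view(const char* data, std::size_t size) noexcept
        : data_(data), size_(size) {}

    // Implicit conversion: the view can be passed to any API that
    // takes std::string_view without an explicit cast or a copy.
    constexpr operator std::string_view() const noexcept {
        return std::string_view(data_, size_);
    }
};

// Hypothetical user function written against the standard type only.
inline std::size_t count_slashes(std::string_view s) {
    std::size_t n = 0;
    for (char c : s)
        if (c == '/') ++n;
    return n;
}

// Usage: the library view is accepted where std::string_view is expected.
inline void example() {
    my_string_view v("/a/b", 4);
    assert(count_slashes(v) == 2);
}
```

Without the conversion operator, every call like the one in `example` would need an explicit bridge at the call site.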

> Now, the name `parse_uri` implies that it will recognize any URI,

It does. URLs and URIs have the same fields.
The distinction is only relevant for URNs, which would have some
subcomponents we don't consider.

> but on the other hand it is impossible that the result
> will fit into a url_view, because not every URI is a URL.

This is possible because the url_view has all the necessary fields.
Maybe for the same reason, the distinction between URL and URI is becoming
more and more pointless.
For instance, JavaScript calls everything a URL.
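
As an illustration of why the same fields suffice, the component-splitting regex from RFC 3986, Appendix B separates any URI reference, including a URN, into scheme, authority, path, query, and fragment. The sketch below is not Boost.URL code: `uri_parts` and `split_uri` are hypothetical names, and the regex only splits the input without validating it.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the five generic components of RFC 3986.
struct uri_parts {
    std::string scheme, authority, path, query, fragment;
};

// Split a URI reference using the regex from RFC 3986, Appendix B.
// This separates components only; it performs no validation.
// If the regex does not match, all fields are returned empty.
inline uri_parts split_uri(const std::string& s) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    std::regex_match(s, m, re);
    return {m[2].str(), m[4].str(), m[5].str(), m[7].str(), m[9].str()};
}

// Usage: a URN lands in the same fields as a URL.
inline void example() {
    uri_parts urn = split_uri("urn:isbn:0451450523");
    assert(urn.scheme == "urn");
    assert(urn.path == "isbn:0451450523");
}
```

The URN's namespace-specific structure ends up inside the path; nothing in it needs a field that a URL container lacks.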

> The synopsis for parse_uri
> (https://master.url.cpp.al/url/ref/boost__urls__parse_uri.html) says:
> Exception safety: throws nothing.
> And the line below it says that the function throws std::length_error when
> the input is too long. It looks like a bug in the specs.

Definitely.

https://github.com/CPPAlliance/url/issues/185

> When can parsing be unsuccessful? Is it only because the input was
> not conformant to the grammar?

Yes.

> The synopsis says "This function parses a
> string according to the URI grammar below", but is it a URI grammar or a
> URL grammar actually?

We should probably try to better explain the difference between URIs, URLs,
and URNs in the docs.
There's some content but it's probably not enough.

This is naturally confusing because people use URI and URL interchangeably.
But then they see that URLs are a subset of URIs and assume a URL cannot
represent every URI.
But this is incorrect: a URL has every field a URI needs, which is precisely
why people use the two terms interchangeably.

In fact, the distinction between absolute-URI, relative-ref, URI, and
URI-reference is much more relevant.
The distinction between URLs and URIs is not that relevant because a URL
has all fields required by a URI.
Only URNs consider some URI subcomponents to represent extra fields.

So the class is called URL because that's what everyone calls it.
And all algorithms are called parse_<component>, where <component> is
exactly the rule name as it appears in the grammar.
Thus, we have parse_absolute_uri, parse_relative_ref, parse_uri, and
parse_uri_reference, matching the absolute-URI, relative-ref, URI, and
URI-reference rules in the spec.
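
For readers unfamiliar with the four grammar rules, here is a coarse sketch of how they differ according to RFC 3986. The `looks_like_*` helpers are hypothetical names, not part of Boost.URL, and they only check for a leading scheme and a fragment; they do not validate the rest of the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// True if s starts with `scheme ":"`, where (RFC 3986, section 3.1)
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline bool has_scheme(std::string_view s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == ':') return true;
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return false;
}

inline bool has_fragment(std::string_view s) {
    return s.find('#') != std::string_view::npos;
}

// URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
inline bool looks_like_uri(std::string_view s) {
    return has_scheme(s);
}

// absolute-URI = scheme ":" hier-part [ "?" query ]   (no fragment)
inline bool looks_like_absolute_uri(std::string_view s) {
    return has_scheme(s) && !has_fragment(s);
}

// relative-ref = relative-part [ "?" query ] [ "#" fragment ]
inline bool looks_like_relative_ref(std::string_view s) {
    return !has_scheme(s);
}

// Usage: the same input can satisfy URI but not absolute-URI.
inline void example() {
    assert(looks_like_uri("http://example.com/p#f"));
    assert(!looks_like_absolute_uri("http://example.com/p#f"));
}
```

URI-reference is simply the union of URI and relative-ref, so it needs no separate check in this sketch.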

> That is, any other reason for not being successful (if any resources needed
> to be allocated and failed) may still be reported via exceptions.

These algorithms don't allocate memory.

> Now, there is probably a good explanation for the URI vs URL discrepancy. I
> think it would be good if it was placed in the docs, so that the users
> don't get confused.

There are some mentions of that in the docs, but we could create a section
to discuss the distinction between them more explicitly and provide
examples.

> Regards,
> &rzej;

Thanks again!

-- 
Alan Freitas
https://github.com/alandefreitas

Alan de Freitas