[url] Andrzej's notes

22 Aug 2022

      Hi Everyone,
It is somewhat unfortunate that the review happens during the vacation
period, therefore I will not be able to devote the amount of time that the
library deserves for the review. There is a procedure for extending the
review period for a couple of days, but even if it were engaged, I doubt it
would have bought me sufficient time. So let me instead offer some loose
thoughts.

Am I familiar with the problem domain, or do I have a use for this library?

My understanding of an URL is that it is a text that I type into my browser
(or other tools) that starts with either "http" or "https".  I use URLs
indirectly by using Boost.Beast, but Boost.Beast offers a higher-level
interface for providing parts of URL, and even if this library is accepted,
I would use the higher-level interface of Boost.Beast. This is not to claim
that there are no uses for the proposed library. It is just my experience.

Obviously, Boost.URL fits into the trend of providing many types with
strong invariants, which I am very fond of. The fact that you have to set
host, port and path separately avoids many bugs: you just cannot create an
invalid URL.

There is one thing that I do not understand though. I read that:
1. The library is tailored for environments that do not use exceptions
2. url guarantees that it maintains its invariant of always storing a
string that represents a valid url.

So given the following code executed on a platform with exceptions
disabled, what happens?

void test(url my_url)
{
  my_url.set_port("twenty-one");
}

is it going to kill the program? If so, is this considered a satisfactory
solution? I use exceptions in my job, so this is a rather theoretical issue
for me, but if I were using urls in no-exceptions environment, I think I
would expect an interface like:

error_code ec;
my_url.set_port("twenty-one", ec);

Or something similar.

Next, the interface of parse_uri().

This library offers three types for storing (and owing) the characters
representing a url:
* url
* static_url
* an unnamed type returned by url_view::persist().

This is good, as in C++ the way the data is stored and laid out is in fact
part of the contract. For the same reason we have multiple string types
that differ by how memory is allocated, if it is stack allocated, how big
the SSO is, what is guaranteed for swap. As a consequence we have a
practical engineering issue -- not confined to URLs -- if there is N types
storing the same logical data in N different ways, and I have a function
returning this logical data, I do not want to offer N functions with N
names. I may add a template, or I may use a universal type that can cheaply
convert to the N types. How do I do that if I do not want to put templates
in the interface of my library? The choice made by Boost.URL may not be
optimal, but it is certainly a good starting point in the pursuit of the
solution to this problem. I see url_view not as a view but as a universal
type for cheaply constructing different containers storing URLs. Of course,
this approach is incompatible with the "almost always auto" philosophy,
which some people now promote to simply "always auto". My personal opinion
is that this AAA philosophy is harmful from the outset and it begs for
problems and bugs. If someone adapts the AAA one must accept the
responsibility for the problems like the one with dangling references.

Next, classes url_base and url_view_base are part of the library's public
API. Why? You can keep them as bases but not advertise them in the library
documentation. Do you expect any use cases for them?

Next. the design in url classes is that it is a mixture of the generic URL
as defined in the RFC, and the HTTP-speciffic stuff, such as paths and
parameters. It is never the case that I need both of this worlds at the
same time: I either need a generic URL that has a password but doesn't have
path or params, or I need a HTTP-like URL, where I do not need a password.
So if we think about strong types with strong invariants, I never get as
strong a type as I would need. For instance, I would never want to pass the
password in URL when using Boost.Beast, so having member set_password would
only be a potential for bugs.

Am I suggesting anything? The only use case that I have for this library is
HTTP-like URLS. I would appreciate a http::url class that is tailored for
http or even REST protocols.

Regarding the docs, while I agree with others that some examples of use
cases, such as percent encoding, would be desired, I also note that the
documentation stands up to the high standards of  of Boost libraries: not
only could I figure out the architecture and how things work, but I was
able to learn a lot about the problem domain, the URLs, URIs, IRIs, design
trade-offs. I really appreciate the effort the authors have put in making
the documentation this good.

I have two recommendations, though. One, the picture that associates IQs
with facial expressions: I strongly recommend that it is removed. It does
not bring any technical value, but it seems to be sending a hostile message.

Two, the colouring of different parts of URL syntax, as in
https://master.url.cpp.al/url/containers.html#url.containers.generic_syntax:
it help to quickly grasp things, but it also bring some confusion. For
instance, I cannot tell if the double-slash is part of authority, or if the
colon is part of the scheme. The colors suggest that they are, but the
member functions tell otherwise.

I have spent maybe like 30 hours studying the docs, superficially studying
the implementation and tried to find if the two match, played with the toy
examples in godbolt (gcc 12.1): https://godbolt.org/z/Tbsqrd77M
Found a number of bugs and reported them in GitHub. The authors were very
responsive and very quickly addressed practically all of them.

Every implementation of every Boost library is difficult to read, in my
opinion. Boost.URL matches the Boost standard in this respect. I could
understand what the functions do, and what they do wrong.

I found the solution in function url_view::persist() very innovative: you
have one allocation instead of three.

I didn't have time to play with path and param views. They seem fine on the
surface, but I have a feeling that they would be clumsy to use in practice.
I observed that params allow duplicate keys? Is it legal?

I am hesitant to give a recommendation for acceptance. I would rather have
a longer discussion than a fast decision. I also want to see how the
authors respond to these notes.

Finally, I want to thank the authors for writing and sharing this library.
It is clear that a lot of effort and thought has been invested in it.

Regards,
&rzej;

Andrzej Krzemienski

Glen Fernandes

Vinnie Falco

Julien Blanc

Vinnie Falco

Klemens Morgenstern

Glen Fernandes

Vinnie Falco

Vinnie Falco

Seth

tags

participants (6)