boost.url review

13 Aug 2022

Hi all,

Here is my review of Boost.Url.

>  - Does this library bring real benefit to C++ developers for real world
>    use-case? Do you have an application for this library?

I'm happy to see this library being proposed. I strongly believe
we need such functionality in Boost. If it gets accepted, I'm
planning on using it in Boost.Mysql to create type-erased
connections from connection URLs (such as
mysql://user@localhost:3306/database?ssl-mode=disable).
I think URL functionality is a must for the C++ for web environment
we're developing.

 - Does the API match with current best practices?

The API is clear and concise. It allows parsing and modifying
URLs in a pretty nice way.

I've found the naming for view containers confusing (e.g.
params and params_view). When I saw it, I thought params
would take ownership of the query string params buffer, while
params_view would not. If I've understood correctly, the difference
is that params allows for modification (as it's returned from url containers)
and params_view does not (as it's returned from url_view). So it's
more about mutability than ownership. If that's the case, I would
either emphasize it in the docs, or rename the containers to
something that reflects this fact (e.g. const_params and mutable_params,
as asio does).

I'm happy to see percent-encoding standalone functions being exposed,
as this is something present in almost all other languages. I'd consider
exposing higher-level functions here, maybe taking a lvalue reference
to a std::string, allowing the user to percent-encode strings easier
(so they resemble JavaScript encodeURIComponent).

Something I've found when playing with mutating URLs is that modifying
a query parameter to a certain value is a little bit verbose:

url u; // get it from somewhere
auto it = u.params().find("k");
if (it == u.params().end())
{
    u.params().append("k", "value");
}
else
{
    u.params().replace_value(it, "value");
}

I had this use case in the last web application I wrote (although it
wasn't C++).
I don't know if this is a common use case, but I'd consider adding a function
that simplifies it (i.e. params::set_value(string_view key, string_view value).

I'm not very keen on exposing the parsing infrastructure and helper
functions (other than percent encoding/decoding) as public API,
as I think it violates the principle of API surface minimization.
I understand that these parsing facilities may help libraries
such as HTTPProto, but I'm not sure if the general public should
be using them. I may be missing something, but I haven't
found the example on magnet links very compelling - I feel
the same result could be achieved in a cleaner way by using
the higher level API. If the parsing facilities are to be kept
public, I think we need a more compelling example for them.
>From reading this comment
https://github.com/CPPAlliance/url/issues/379#issuecomment-1212026752
I get the impression that some schemes have different reserved
character sets, and that these functions help with that problem
- maybe an example of such an scheme could be helpful.

I can see other public types in the reference such as recycled,
recycled_ptr and aligned_storage, which again have the feeling
should not be exposed as public types.

 - Is the documentation helpful and clear?

The documentation is good and the reference is complete, but I miss
some examples. We've just got two examples, and one of them is
about a very concrete feature that most users will likely not be using
day-to-day.

In particular, I miss an example (and possibly a page) about composing
and mutating URLs (starting with a base URL, add some segments and query
parameters, emphasizing on when percent encoding gets applied).
This is a pretty typical task when you're consuming a third party
REST API.

I'd add a page and an example on percent encoding functions, as they
are only explained in the reference and the explanation assumes
familiarity with the CharSet concept.

As there are several container view types, I would also suggest
creating a comparison table, like in json::value
(https://www.boost.org/doc/libs/1_80_0/libs/json/doc/html/json/dom/value.html)
(I've found that one particularly useful).

The docstring for segments::insert seems to be wrong
(https://master.url.cpp.al/url/ref/boost__urls__segments/insert/overload1.html),
as I've passed a non-percent encoded string and it has accepted it,
encoding the value. It seems an incorrect copy/paste from
segments_encoded::insert. params::insert overloads with key only
and with key/value also seem to have swapped docstrings.

I'm also curious about the security review that is announced
in the main page - when is that going to take place?

 - Did you try to use it? What problems or surprises did you encounter?

I used it in a toy project via CMake FetchContent, with Boost 1.80.0,
Linux and clang12. I had no problems building it. I initially
tried `mkdir build && cmake -DCMAKE_PREFIX_PATH=path/to/boost ..`
but got errors about linking to non existent targets.

I only built the source (not the official examples and tests) and
my toy programs. I focused on composing and mutating URLs,
but also tried parsing and percent encoding. Except for the
mismatched docstrings I've already mentioned, no surprises.

 - What is your evaluation of the implementation?

I haven't looked into it.

- How much effort did you put into your evaluation? A glance?
  A quick reading? In-depth study?

I've dedicated half day to build and play with the library,
and had read most of the docs (and some code) beforehand.

- Are you knowledgeable about the problem domain?

I'm not a URL expert, just a regular URL user. I've written
several web applications where URL handling is a day-to-day task.

My recommendation is to CONDITIONALLY ACCEPT
this library as part of Boost, subject to reviewing which
types/functions in the lower-level API are exposed and which
ones are not.

Thanks Vinnie and Alan for your work.

Regards,
Ruben.

Klemens Morgenstern

Vinnie Falco

Vinnie Falco

Ruben Perez

Vinnie Falco

Alex Christensen

Ruben Perez

Vinnie Falco

Vinnie Falco

Ruben Perez

Andrzej Krzemienski

Vinnie Falco

Rainer Deyke

Vinnie Falco

Rainer Deyke

Peter Dimov

Rainer Deyke

Vinnie Falco

Rainer Deyke

Дмитрий Архипов

Alan de Freitas

tags

participants (9)