On Sun, Aug 21, 2022 at 6:21 AM Phil Endecott via Boost
My review of the proposed Boost.URL library follows.
Thank you for taking the time to have a look at this library.
In previous email discussions a few months ago I complained about the dangerous use of non-owning view types in the library interface,
Yes but... this is how it is. The expectation for this library (and indeed, all my libraries past, present, and future) is that users have a grasp of C++ and are aware of the lifetime rules for both references, and types which have the semantics of references. That's the price of admission for the type-erasure and zero-cost abstraction.
result
r = parse_uri( s );
You could instead write result< url > = rv parse_uri( s ); and then rv.value() would be owning.
auto r = parse_uri( s );
There's no indication here that r is a non-owning view and that the caller is responsible for keeping s alive for the lifetime of r.
From https://master.url.cpp.al/url/ref/boost__urls__parse_uri.html#url.ref.boost_... "Ownership of the string is not transferred; the caller is responsible for ensuring that the lifetime of the character buffer extends until
On this point we agree. But...isn't this always a problem when using `auto`? It deprives the reader of information. When you write "auto" you are opting in for convenience over readability. If your complaint here is that the use of `auto` can make it easy to forget about things then I agree with that too. Still, I'm glad auto is in the language. However if you will note, the documentation meticulously minimizes the usage of auto in the example code, for this reason. The lifetime requirements are documented: the view is no longer being accessed."
A safer API design would, at a minimum, give the parsing function a name with "view" in it. The default, shorter-named function should return a value type.
Yes well, we considered that design and concluded that it wasn't practical. The library has multiple value types for URLs, which one should it return? And a pattern of use is to store the result of parsing in an existing variable which already has storage allocated (to support reuse and avoid unnecessary allocate/copy/free cycles).
More fundamentally, I'm not convinced that this allocation-avoiding view-rich API style is best for a general purpose vocabulary type like URL.
A URL is literally a string which meets the prescribed syntactical requirements of RFC3986. If I receive an HTTP GET request like this: GET /index.htm HTTP/1.1\r\n you think that the library should not let the user parse the substring and return a view to the URL? And instead it should allocate memory, all to protect beginners?
URL should be a "Keep It Simple" type, accessible to C++ beginners, with straightforward lifetime semantics.
std::string_view is C++17, are we saying now that Boost libraries should avoid some C++ features to appease beginners?
Compare with std::filesystem::path, which does not try to optimise by actually being a view.
Not really a good example for bolstering your case, because filesystem::path has significant problems :) Significant enough to motivate std::filesystem::path_view: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1030r4.pdf
Hmm, so url_view doesn't own the underlying storage, except when it does as a result of this mysterious method. What's the rationale for this? (Have I missed some documentation?)
`boost::urls::url` is for mutation and unique ownership. The function persist() is provided for efficient shared ownership of read-only URLs. Users can't write this function themselves, because it relies on implementation details.
What is sizeof(url) or sizeof(url_view)? I think it must be about 160 bytes, since you store offsets to each of the components. I'm not convinced that this is the best approach. ... Making the objects smaller would probably have measurable performance benefits.
This was the choice that we went with, but it can be changed in the future with sufficient experience in the field, without breaking the API. We would need measurements to know for sure.
In your non-view url object, you seem to allocate storage using new char[]. Why not use a std::string? One benefit of a std::string is SBO. Another is that you could provide a move constructor from a string.
No particular reason, this is something we could change in the future with sufficient experience in the field, without breaking the API. Move construction from `std::string` does sound interesting :) I have my doubts that very many URLs are going to be fitting into the small buffer but you never know.
In your query parameter API, you expose something that looks like a standard associative container except that you call the key and value "key" and "value" respectively, rather than "first" and "second". While I think many of us would agree that those are better names, it does make it more difficult to use this container with generic code than it would be otherwise.
This is something we could change, I guess you're proposing: using reference_type = `pair< pct_encoded_view, optional< pct_encoded_view > >` Which generic code uses first and second? Do you have an example you might point us at?
I see that you also have a bool called "has_value" in the value_type. I think this is to handle this:
?a=1 // has_value == true && value == "1" ?a= // has_value == true && value == "" ?a // has_value == false
Is that correct?
Yes
Again having this third member in the container's value_type makes generic code more difficult. I would be inclined to use a std::optional for the value instead.
We are already using boost::optional elsewhere, so we could use it here as well. But, there is a downside. If the user only cares about values which are not empty they could skip checking has_value and just look at key.empty(). If we switch to optional then everyone has to check optional::has_value(). I'm undecided on which is better, hopefully we will get more feedback.
In the past I have had situations where it was necessary to specify the case for hex letters in %-encoding; please consider adding this to pct_encode_opts.
Well percent-encoded escapes always use hexadecimal so I'm not sure what you're asking for here.
The Magnet Link example needs a paragraph at the start to say what a Magnet Link is.
Yes good idea.
Because of these concerns about the basic API design, I believe that the library should be REJECTED.
I remember I was at CppCon, I think it was 2018 and I was at some sort of official dinner - probably the speaker's dinner. I got sat next to this loudmouth who at one point starting going off on string_view and how bad it was, how dangerous it was, all the problems it caused in the jillion-line codebase at his job, and how it never should have been added to the standard. He did make some valid points of course but I couldn't shake the feeling that perhaps a programming language other than C++ would suit him better. That person was Nicolai Josuttis. Anyway the point I'm making here is that string_view in particular, and types that behave like references in general, are a contentious issue for C++. Some people hate them because "beginners" and easy-to-make-mistakes. Some people love them because "muh performance" (and beautiful type-erasure). It seems to me that one of the distinguishing features of C++ is in fact its zero-cost abstractions. How cool is it that the Boost.URL library lets you not only parse a URL in place, which is some part of a larger character buffer, but also lets you seamlessly access all of the parts? This I believe is very much in the spirit of C++. If we start thinking that users need to be coddled, have the useful but sharp edges filed down, pointy ends blunted, and so forth - I feel like we will end up with Rust. For that you can probably just save all the work and start using Rust sooner rather than later. Thanks