Re: [boost] URL Review

22 Aug 2022

      Thank you for the review.
...
auto r = parse_uri( s );
There's no indication here that r is a non-owning view and that
the caller is responsible for keeping s alive for the lifetime of r.
Yes. This is a relevant issue here.
This tends to arise with any function that returns a view and it's not
simple to fix. Reference types are contentious and some people still seem
to be against std::string_view.
But people do need them in practice. I imagine many wouldn't like
allocating more memory dynamically in an async server just to decide what
to do with an URL string for which you already have allocated storage.

On the other hand, it sounds reasonable to have a compromise for other
potential use cases.
It's not impossible to come up with other alternatives like parse_uri_view(
s ), urls::views::parse_uri( s ), etc...
I guess what std::ranges does, in practice to deal with that, is put all
functions that return views in the "views" namespace.
The biggest problem is still deciding which of the URL types to return from
parse_uri( s ).

https://github.com/CPPAlliance/url/issues/435
...
More fundamentally, I'm not convinced that this allocation-avoiding
view-rich API style is best for a general purpose vocabulary type
like URL.
Some defaults where the shorter-named functions return value types are OK.
Not providing the views would be a mistake though.
Most "non-beginner" use cases we know of require views.
We would just be leaving this demand unattended.
...
I see that there is a method called persist():
std::shared_ptr< url_view const >
   persist() const;
This function returns a read-only copy of the URL, with shared
   lifetime. The returned value owns (persists) the underlying string.
   The algorithm used to create the value minimizes the number of
   individual memory allocations, making it more efficient than when
   using direct standard library functions.
Hmm, so url_view doesn't own the underlying storage, except when it
does as a result of this mysterious method. What's the rationale for
this? (Have I missed some documentation?)
Yes. I agree this function might seem confusing.
The types are coherent, but we end up with something that owns the storage
but is still called url_view.

It allocates the memory for the URL and a url_view that references it. The
rationale is to have only one dynamic memory allocation.
It's sort of an advanced function and we have reviewed the documentation on
that a few times.
There's still a lot to improve.

https://github.com/CPPAlliance/url/issues/436
...
I would be inclined to
determine the offsets each time they are needed by the accessor
methods. In the constructor, parse to validate the syntax but don't
store any offsets.
Exploring ideas to make the object smaller is worth it.
Implementation details can change and there might be better/lower upper
limits for some URL components.

I don't think parsing the URL again every time the user calls an accessor
function is a reasonable solution for the general case though.
Each access to an element would have a linear cost and URLs are allowed be
long in many applications.
Even for small URLs, reparsing it 9 times if the user asks for the
components sounds unreasonable.
...
I believe that in the case of the path segments and the query
parameters, you are not storing offsets (because the number is variable,
and that would need allocations); why not adopt that approach for the
other offsets?
The rationale is that these views are accessed through forward or
bidirectional iterators, so the cost is not linear for each access.
...
In your non-view url object, you seem to allocate storage using
new char[]. Why not use a std::string? One benefit of a std::string
is SBO. Another is that you could provide a move constructor from
a string.
Yes. That sounds interesting.

https://github.com/CPPAlliance/url/issues/437
...
I find the names of the encoding/decoding views confusing. I would
expect a "decoded_view" to refer to underlying encoded data and
return equivalent decoded data when it is read from, but you seen to
call that an "encoded_view".
I agree. Some uses might be confusing, like in

pct_encoded_view decoded_host = u.host();

Most people who have looked at this have proposed something like
decode_view, decoded_view, pct_decoded_view, or decoded_view.

If we follow the convention used in std::ranges, then it would be
pct_decode_view,
because it's usually a verb. For instance,

https://en.cppreference.com/w/cpp/ranges/transform_view
https://en.cppreference.com/w/cpp/ranges#Range_adaptors

https://github.com/CPPAlliance/url/issues/438
...
Is that correct? Again having this third member in the container's
value_type makes generic code more difficult. I would be inclined to
use a std::optional for the value instead.
Yes. One option is to use std::optional for the value.
This looks more coherent.

On the other hand, there are applications where an empty string is a
reasonable replacement when the value is empty.
Another is to have a class instead of a struct and so we could encapsulate
the values and return them one way or the other.

https://github.com/CPPAlliance/url/issues/439
...
In the past I have had situations where it was necessary to specify
the case for hex letters in %-encoding; please consider adding this to
...
pct_encode_opts.
Yes. This could be useful.

Uppercase should be the default since this is what URL normalization
requires.

https://github.com/CPPAlliance/url/issues/440

-- 
Alan Freitas
https://github.com/alandefreitas

Re: [boost] URL Review

Alan de Freitas