Thank you for the review.
auto r = parse_uri( s );
There's no indication here that r is a non-owning view and that the caller is responsible for keeping s alive for the lifetime of r.
Yes. This is a relevant issue here. This tends to arise with any function that returns a view and it's not simple to fix. Reference types are contentious and some people still seem to be against std::string_view. But people do need them in practice. I imagine many wouldn't like allocating more memory dynamically in an async server just to decide what to do with an URL string for which you already have allocated storage. On the other hand, it sounds reasonable to have a compromise for other potential use cases. It's not impossible to come up with other alternatives like parse_uri_view( s ), urls::views::parse_uri( s ), etc... I guess what std::ranges does, in practice to deal with that, is put all functions that return views in the "views" namespace. The biggest problem is still deciding which of the URL types to return from parse_uri( s ). https://github.com/CPPAlliance/url/issues/435
More fundamentally, I'm not convinced that this allocation-avoiding view-rich API style is best for a general purpose vocabulary type like URL.
Some defaults where the shorter-named functions return value types are OK. Not providing the views would be a mistake though. Most "non-beginner" use cases we know of require views. We would just be leaving this demand unattended.
I see that there is a method called persist():
std::shared_ptr< url_view const > persist() const;
This function returns a read-only copy of the URL, with shared lifetime. The returned value owns (persists) the underlying string. The algorithm used to create the value minimizes the number of individual memory allocations, making it more efficient than when using direct standard library functions.
Hmm, so url_view doesn't own the underlying storage, except when it does as a result of this mysterious method. What's the rationale for this? (Have I missed some documentation?)
Yes. I agree this function might seem confusing. The types are coherent, but we end up with something that owns the storage but is still called url_view. It allocates the memory for the URL and a url_view that references it. The rationale is to have only one dynamic memory allocation. It's sort of an advanced function and we have reviewed the documentation on that a few times. There's still a lot to improve. https://github.com/CPPAlliance/url/issues/436
I would be inclined to determine the offsets each time they are needed by the accessor methods. In the constructor, parse to validate the syntax but don't store any offsets.
Exploring ideas to make the object smaller is worth it. Implementation details can change and there might be better/lower upper limits for some URL components. I don't think parsing the URL again every time the user calls an accessor function is a reasonable solution for the general case though. Each access to an element would have a linear cost and URLs are allowed be long in many applications. Even for small URLs, reparsing it 9 times if the user asks for the components sounds unreasonable.
I believe that in the case of the path segments and the query parameters, you are not storing offsets (because the number is variable, and that would need allocations); why not adopt that approach for the other offsets?
The rationale is that these views are accessed through forward or bidirectional iterators, so the cost is not linear for each access.
In your non-view url object, you seem to allocate storage using new char[]. Why not use a std::string? One benefit of a std::string is SBO. Another is that you could provide a move constructor from a string.
Yes. That sounds interesting. https://github.com/CPPAlliance/url/issues/437
I find the names of the encoding/decoding views confusing. I would expect a "decoded_view" to refer to underlying encoded data and return equivalent decoded data when it is read from, but you seen to call that an "encoded_view".
I agree. Some uses might be confusing, like in pct_encoded_view decoded_host = u.host(); Most people who have looked at this have proposed something like decode_view, decoded_view, pct_decoded_view, or decoded_view. If we follow the convention used in std::ranges, then it would be pct_decode_view, because it's usually a verb. For instance, https://en.cppreference.com/w/cpp/ranges/transform_view https://en.cppreference.com/w/cpp/ranges#Range_adaptors https://github.com/CPPAlliance/url/issues/438
Is that correct? Again having this third member in the container's value_type makes generic code more difficult. I would be inclined to use a std::optional for the value instead.
Yes. One option is to use std::optional for the value. This looks more coherent. On the other hand, there are applications where an empty string is a reasonable replacement when the value is empty. Another is to have a class instead of a struct and so we could encapsulate the values and return them one way or the other. https://github.com/CPPAlliance/url/issues/439
In the past I have had situations where it was necessary to specify
the case for hex letters in %-encoding; please consider adding this to
pct_encode_opts.
Yes. This could be useful. Uppercase should be the default since this is what URL normalization requires. https://github.com/CPPAlliance/url/issues/440 -- Alan Freitas https://github.com/alandefreitas