On Mon, Aug 22, 2022 at 11:34 AM Andrzej Krzemienski via Boost
I use URLs indirectly by using Boost.Beast, but Boost.Beast offers a higher-level interface for providing parts of URL
huh.. that's news to me :) Beast only gives you the string in the request message: https://github.com/boostorg/beast/blob/76043dec2cf67a2ba33b32bdcc129f5f0027b... it is this string which users of both Beast and URL would want to parse, e.g. void handle_request( beast::request const& req ) { urls::url_view u( req.target() ); ... Just out of curiosity where do you see this higher level interface for providing URL parts that Beast provides?
There is one thing that I do not understand though. I read that: 1. The library is tailored for environments that do not use exceptions 2. url guarantees that it maintains its invariant of always storing a string that represents a valid url.
Yeah, I think that we (the library authors) and the documentation are not communicating things correctly. So I will try to make some statements that are definitely correct, and then perhaps we can sort out what it means for exceptionless environments: * Anything which allocates memory will throw on heap exhaustion (class url mainly) - redefine BOOST_THROW_EXCEPTION to call std::terminate or something * URL and authority parsing free functions and grammar rules do not throw (they use error_code) - they return views, which do not allocate * url_view, url, and static_url constructors will throw on syntax error - simply avoid these constructors and use the free functions instead * modifiers which take un-encoded inputs have a wide contract: all input strings are valid - however the url might need to reallocate memory to encode the result * modifiers which take encoded inputs have a precondition: the input is a valid encoding for the part - you can call the non-encoded function which will do the encoding for you * serialization of a url, url_view, or static_url throws nothing - because the string is always stored serialized * static_url avoids all allocation - but it will throw on overflow, and the caller is expected to size the variable appropriately, and prevent passing large inputs My thinking originally was that parsing into a view, storing a serialized string, and offering the static_url container could be meaningfully called "friendly for environments without exceptions." We should probably reframe this :)
So given the following code executed on a platform with exceptions disabled, what happens?
void test(url my_url) { my_url.set_port("twenty-one"); }
Well the invariant that the container always holds a valid URL will no longer be true.
if I were using urls in no-exceptions environment, I think I would expect an interface like:
error_code ec; my_url.set_port("twenty-one", ec);
Yes this is something we might formalize into the API, although we would use `result<void>` as the return type. But I'm not sure how helpful that is. This needs some thought.
if there is N types storing the same logical data in N different ways, and I have a function returning this logical data, I do not want to offer N functions with N names. I may add a template, or I may use a universal type that can cheaply convert to the N types. How do I do that if I do not want to put templates in the interface of my library?
Ah yeah that's actually quite easy. `url_view_base const&` will capture the read-only part of any URL, while `url_base&` will capture both the read-only and the modifiable part of any URL. No templates needed. You inspired this change when you reported the problem with slicing. Example code: // can be called with url_view, url, or static_url void f( boost::urls::url_view_base const& ); // can be called with url or static_url void f( boost::urls::url_base& ); We should probably document this usage.
Next, classes url_base and url_view_base are part of the library's public API. Why? You can keep them as bases but not advertise them in the library documentation. Do you expect any use cases for them?
They are advertised in the reference: https://master.url.cpp.al/url/ref/boost__urls__url_view_base.html
Next. the design in url classes is that it is a mixture of the generic URL as defined in the RFC, and the HTTP-speciffic stuff, such as paths and parameters. It is never the case that I need both of this worlds at the same time: I either need a generic URL that has a password but doesn't have path or params, or I need a HTTP-like URL, where I do not need a password. So if we think about strong types with strong invariants, I never get as strong a type as I would need. For instance, I would never want to pass the password in URL when using Boost.Beast, so having member set_password would only be a potential for bugs.
Password can in fact appear in HTTP. This is a valid CONNECT request:
CONNECT user:pass@livechat.boost.org HTTP/1.1\r\n\r\n
Since that request-target is not a valid URL, you would need to call
`parse_authority` instead. The return value from that function is
result
Am I suggesting anything? The only use case that I have for this library is HTTP-like URLS. I would appreciate a http::url class that is tailored for http or even REST protocols.
Aside from offering the password, which could be restricted with business logic instead of a new type - I feel that boost::urls::url_view and boost::urls::url are pretty well suited for HTTP and REST. The hierarchical and structured interpretations of path and query and provided as separate containers with reference semantics, so they do not burden users of opaque schemes. There's a balance that is struck here - users are offered containers which works for all URLs, and while some of the API may not be of use to your particular use-case, there is some value having a vocabulary type which can be trafficked universally regardless of its contents.
Regarding the docs, while I agree with others that some examples of use cases, such as percent encoding, would be desired, I also note that the documentation stands up to the high standards of of Boost libraries:
Well thanks for the kind words! But... well.. we still have work to do there. Thank you though!
I have two recommendations, though. One, the picture that associates IQs with facial expressions: I strongly recommend that it is removed. It does not bring any technical value, but it seems to be sending a hostile message.
LMAO... I only put that in for the review, for some levity. It would of course not be in the final library.
Two, the colouring of different parts of URL syntax, as in https://master.url.cpp.al/url/containers.html#url.containers.generic_syntax: it help to quickly grasp things, but it also bring some confusion. For instance, I cannot tell if the double-slash is part of authority, or if the colon is part of the scheme. The colors suggest that they are, but the member functions tell otherwise.
Well it is using the ABNF syntax. We could state that explicitly, and point out that the colon, double slash, question mark, and pound sign are delimiters which separate the parts and are not considered to belong to their respective element.
I didn't have time to play with path and param views. They seem fine on the surface, but I have a feeling that they would be clumsy to use in practice. I observed that params allow duplicate keys? Is it legal?
Yeah, duplicates are allowed. I think the path views are pretty OK.
I'm not sure about params. I mean, we need the container and it should
be iterator based, and it has to provide the key and value. But I
don't know if "bool has_value" is best, or if we should use an
optional, or if we should use `pair
Finally, I want to thank the authors for writing and sharing this library. It is clear that a lot of effort and thought has been invested in it.
Thank you for taking the time to write the review! Regards