Hi Everyone, It is somewhat unfortunate that the review happens during the vacation period, therefore I will not be able to devote the amount of time that the library deserves for the review. There is a procedure for extending the review period for a couple of days, but even if it were engaged, I doubt it would have bought me sufficient time. So let me instead offer some loose thoughts. Am I familiar with the problem domain, or do I have a use for this library? My understanding of an URL is that it is a text that I type into my browser (or other tools) that starts with either "http" or "https". I use URLs indirectly by using Boost.Beast, but Boost.Beast offers a higher-level interface for providing parts of URL, and even if this library is accepted, I would use the higher-level interface of Boost.Beast. This is not to claim that there are no uses for the proposed library. It is just my experience. Obviously, Boost.URL fits into the trend of providing many types with strong invariants, which I am very fond of. The fact that you have to set host, port and path separately avoids many bugs: you just cannot create an invalid URL. There is one thing that I do not understand though. I read that: 1. The library is tailored for environments that do not use exceptions 2. url guarantees that it maintains its invariant of always storing a string that represents a valid url. So given the following code executed on a platform with exceptions disabled, what happens? void test(url my_url) { my_url.set_port("twenty-one"); } is it going to kill the program? If so, is this considered a satisfactory solution? I use exceptions in my job, so this is a rather theoretical issue for me, but if I were using urls in no-exceptions environment, I think I would expect an interface like: error_code ec; my_url.set_port("twenty-one", ec); Or something similar. Next, the interface of parse_uri(). This library offers three types for storing (and owing) the characters representing a url: * url * static_url * an unnamed type returned by url_view::persist(). This is good, as in C++ the way the data is stored and laid out is in fact part of the contract. For the same reason we have multiple string types that differ by how memory is allocated, if it is stack allocated, how big the SSO is, what is guaranteed for swap. As a consequence we have a practical engineering issue -- not confined to URLs -- if there is N types storing the same logical data in N different ways, and I have a function returning this logical data, I do not want to offer N functions with N names. I may add a template, or I may use a universal type that can cheaply convert to the N types. How do I do that if I do not want to put templates in the interface of my library? The choice made by Boost.URL may not be optimal, but it is certainly a good starting point in the pursuit of the solution to this problem. I see url_view not as a view but as a universal type for cheaply constructing different containers storing URLs. Of course, this approach is incompatible with the "almost always auto" philosophy, which some people now promote to simply "always auto". My personal opinion is that this AAA philosophy is harmful from the outset and it begs for problems and bugs. If someone adapts the AAA one must accept the responsibility for the problems like the one with dangling references. Next, classes url_base and url_view_base are part of the library's public API. Why? You can keep them as bases but not advertise them in the library documentation. Do you expect any use cases for them? Next. the design in url classes is that it is a mixture of the generic URL as defined in the RFC, and the HTTP-speciffic stuff, such as paths and parameters. It is never the case that I need both of this worlds at the same time: I either need a generic URL that has a password but doesn't have path or params, or I need a HTTP-like URL, where I do not need a password. So if we think about strong types with strong invariants, I never get as strong a type as I would need. For instance, I would never want to pass the password in URL when using Boost.Beast, so having member set_password would only be a potential for bugs. Am I suggesting anything? The only use case that I have for this library is HTTP-like URLS. I would appreciate a http::url class that is tailored for http or even REST protocols. Regarding the docs, while I agree with others that some examples of use cases, such as percent encoding, would be desired, I also note that the documentation stands up to the high standards of of Boost libraries: not only could I figure out the architecture and how things work, but I was able to learn a lot about the problem domain, the URLs, URIs, IRIs, design trade-offs. I really appreciate the effort the authors have put in making the documentation this good. I have two recommendations, though. One, the picture that associates IQs with facial expressions: I strongly recommend that it is removed. It does not bring any technical value, but it seems to be sending a hostile message. Two, the colouring of different parts of URL syntax, as in https://master.url.cpp.al/url/containers.html#url.containers.generic_syntax: it help to quickly grasp things, but it also bring some confusion. For instance, I cannot tell if the double-slash is part of authority, or if the colon is part of the scheme. The colors suggest that they are, but the member functions tell otherwise. I have spent maybe like 30 hours studying the docs, superficially studying the implementation and tried to find if the two match, played with the toy examples in godbolt (gcc 12.1): https://godbolt.org/z/Tbsqrd77M Found a number of bugs and reported them in GitHub. The authors were very responsive and very quickly addressed practically all of them. Every implementation of every Boost library is difficult to read, in my opinion. Boost.URL matches the Boost standard in this respect. I could understand what the functions do, and what they do wrong. I found the solution in function url_view::persist() very innovative: you have one allocation instead of three. I didn't have time to play with path and param views. They seem fine on the surface, but I have a feeling that they would be clumsy to use in practice. I observed that params allow duplicate keys? Is it legal? I am hesitant to give a recommendation for acceptance. I would rather have a longer discussion than a fast decision. I also want to see how the authors respond to these notes. Finally, I want to thank the authors for writing and sharing this library. It is clear that a lot of effort and thought has been invested in it. Regards, &rzej;
On Monday, August 22, 2022, Andrzej Krzemienski wrote:
I am hesitant to give a recommendation for acceptance. I would rather have a longer discussion than a fast decision. I also want to see how the authors respond to these notes.
The review can be extended, if author and review manager agree, and can schedule with wizard accordingly. Glen
On Mon, Aug 22, 2022 at 11:59 AM Glen Fernandes via Boost
The review can be extended, if author and review manager agree, and can schedule with wizard accordingly.
If there are a meaningful number of folks on the list who can confirm that they would submit a late review if the period were extended, then I'm open to extension. There's no rush here anyway, since we have work to do on the documentation (and addressing all of the issues raised). Thanks
Le lundi 22 août 2022 à 12:42 -0700, Vinnie Falco via Boost a écrit :
On Mon, Aug 22, 2022 at 11:59 AM Glen Fernandes via Boost
wrote: The review can be extended, if author and review manager agree, and can schedule with wizard accordingly.
If there are a meaningful number of folks on the list who can confirm that they would submit a late review if the period were extended, then I'm open to extension. There's no rush here anyway, since we have work to do on the documentation (and addressing all of the issues raised).
I would be in favor of an extension, i've followed the discussions and started reading the docs, but did not find the time to do more testing and write a review. I do have a bit more time this week, so could maybe find the time to make a small review. Not sure i will raise more points than were already, though. Regards, Julien
On Mon, Aug 22, 2022 at 11:28 PM Julien Blanc via Boost
I would be in favor of an extension
Klemens, an extension of the review period doesn't negatively impact us so if you're open to it, and the review wizard(s) are open to it then an extension is fine if it means more feedback. I'll leave it to you to coordinate the logistics. Thanks
I was about to close the review - how long an extension are we talking about? Including next weekend? On Tue, Aug 23, 2022 at 11:59 PM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Mon, Aug 22, 2022 at 11:28 PM Julien Blanc via Boost
wrote: I would be in favor of an extension
Klemens, an extension of the review period doesn't negatively impact us so if you're open to it, and the review wizard(s) are open to it then an extension is fine if it means more feedback. I'll leave it to you to coordinate the logistics.
Thanks
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Mon, Aug 22, 2022 at 11:34 AM Andrzej Krzemienski via Boost
I use URLs indirectly by using Boost.Beast, but Boost.Beast offers a higher-level interface for providing parts of URL
huh.. that's news to me :) Beast only gives you the string in the request message: https://github.com/boostorg/beast/blob/76043dec2cf67a2ba33b32bdcc129f5f0027b... it is this string which users of both Beast and URL would want to parse, e.g. void handle_request( beast::request const& req ) { urls::url_view u( req.target() ); ... Just out of curiosity where do you see this higher level interface for providing URL parts that Beast provides?
There is one thing that I do not understand though. I read that: 1. The library is tailored for environments that do not use exceptions 2. url guarantees that it maintains its invariant of always storing a string that represents a valid url.
Yeah, I think that we (the library authors) and the documentation are not communicating things correctly. So I will try to make some statements that are definitely correct, and then perhaps we can sort out what it means for exceptionless environments: * Anything which allocates memory will throw on heap exhaustion (class url mainly) - redefine BOOST_THROW_EXCEPTION to call std::terminate or something * URL and authority parsing free functions and grammar rules do not throw (they use error_code) - they return views, which do not allocate * url_view, url, and static_url constructors will throw on syntax error - simply avoid these constructors and use the free functions instead * modifiers which take un-encoded inputs have a wide contract: all input strings are valid - however the url might need to reallocate memory to encode the result * modifiers which take encoded inputs have a precondition: the input is a valid encoding for the part - you can call the non-encoded function which will do the encoding for you * serialization of a url, url_view, or static_url throws nothing - because the string is always stored serialized * static_url avoids all allocation - but it will throw on overflow, and the caller is expected to size the variable appropriately, and prevent passing large inputs My thinking originally was that parsing into a view, storing a serialized string, and offering the static_url container could be meaningfully called "friendly for environments without exceptions." We should probably reframe this :)
So given the following code executed on a platform with exceptions disabled, what happens?
void test(url my_url) { my_url.set_port("twenty-one"); }
Well the invariant that the container always holds a valid URL will no longer be true.
if I were using urls in no-exceptions environment, I think I would expect an interface like:
error_code ec; my_url.set_port("twenty-one", ec);
Yes this is something we might formalize into the API, although we would use `result<void>` as the return type. But I'm not sure how helpful that is. This needs some thought.
if there is N types storing the same logical data in N different ways, and I have a function returning this logical data, I do not want to offer N functions with N names. I may add a template, or I may use a universal type that can cheaply convert to the N types. How do I do that if I do not want to put templates in the interface of my library?
Ah yeah that's actually quite easy. `url_view_base const&` will capture the read-only part of any URL, while `url_base&` will capture both the read-only and the modifiable part of any URL. No templates needed. You inspired this change when you reported the problem with slicing. Example code: // can be called with url_view, url, or static_url void f( boost::urls::url_view_base const& ); // can be called with url or static_url void f( boost::urls::url_base& ); We should probably document this usage.
Next, classes url_base and url_view_base are part of the library's public API. Why? You can keep them as bases but not advertise them in the library documentation. Do you expect any use cases for them?
They are advertised in the reference: https://master.url.cpp.al/url/ref/boost__urls__url_view_base.html
Next. the design in url classes is that it is a mixture of the generic URL as defined in the RFC, and the HTTP-speciffic stuff, such as paths and parameters. It is never the case that I need both of this worlds at the same time: I either need a generic URL that has a password but doesn't have path or params, or I need a HTTP-like URL, where I do not need a password. So if we think about strong types with strong invariants, I never get as strong a type as I would need. For instance, I would never want to pass the password in URL when using Boost.Beast, so having member set_password would only be a potential for bugs.
Password can in fact appear in HTTP. This is a valid CONNECT request:
CONNECT user:pass@livechat.boost.org HTTP/1.1\r\n\r\n
Since that request-target is not a valid URL, you would need to call
`parse_authority` instead. The return value from that function is
result
Am I suggesting anything? The only use case that I have for this library is HTTP-like URLS. I would appreciate a http::url class that is tailored for http or even REST protocols.
Aside from offering the password, which could be restricted with business logic instead of a new type - I feel that boost::urls::url_view and boost::urls::url are pretty well suited for HTTP and REST. The hierarchical and structured interpretations of path and query and provided as separate containers with reference semantics, so they do not burden users of opaque schemes. There's a balance that is struck here - users are offered containers which works for all URLs, and while some of the API may not be of use to your particular use-case, there is some value having a vocabulary type which can be trafficked universally regardless of its contents.
Regarding the docs, while I agree with others that some examples of use cases, such as percent encoding, would be desired, I also note that the documentation stands up to the high standards of of Boost libraries:
Well thanks for the kind words! But... well.. we still have work to do there. Thank you though!
I have two recommendations, though. One, the picture that associates IQs with facial expressions: I strongly recommend that it is removed. It does not bring any technical value, but it seems to be sending a hostile message.
LMAO... I only put that in for the review, for some levity. It would of course not be in the final library.
Two, the colouring of different parts of URL syntax, as in https://master.url.cpp.al/url/containers.html#url.containers.generic_syntax: it help to quickly grasp things, but it also bring some confusion. For instance, I cannot tell if the double-slash is part of authority, or if the colon is part of the scheme. The colors suggest that they are, but the member functions tell otherwise.
Well it is using the ABNF syntax. We could state that explicitly, and point out that the colon, double slash, question mark, and pound sign are delimiters which separate the parts and are not considered to belong to their respective element.
I didn't have time to play with path and param views. They seem fine on the surface, but I have a feeling that they would be clumsy to use in practice. I observed that params allow duplicate keys? Is it legal?
Yeah, duplicates are allowed. I think the path views are pretty OK.
I'm not sure about params. I mean, we need the container and it should
be iterator based, and it has to provide the key and value. But I
don't know if "bool has_value" is best, or if we should use an
optional, or if we should use `pair
Finally, I want to thank the authors for writing and sharing this library. It is clear that a lot of effort and thought has been invested in it.
Thank you for taking the time to write the review! Regards
participants (6)
-
Andrzej Krzemienski
-
Glen Fernandes
-
Julien Blanc
-
Klemens Morgenstern
-
Seth
-
Vinnie Falco