Boost.URL review planned for 13-22 August 2022
Folks, Klemens Morgenstern kindly volunteered [1] as the Review Manager (thanks Klemens!) for the Boost.URL [2], so we have planned the review shortly after the upcoming release of the Boost 1.80. Check the Review Schedule [3] for details! [1] https://lists.boost.org/Archives/boost/2022/06/253225.php [2] https://lists.boost.org/Archives/boost/2022/04/252814.php [3] https://www.boost.org/community/review_schedule.html Best regards, -- The Boost Review Wizards Mateusz Loskot
Disclosure: I've followed the evolution of the library with some interest on cppslack. Hence I may be consciously/sub-consciously primed to see some of the choices that went into the library as "sensible". # What is your evaluation of the potential usefulness of the library? It's useful especially for applications dealing with network (Asio, Beast, Mysql) but obviously beyond # What is your evaluation of the design? I won't call it user-friendly, but seems to achieve the goals that it was set out to achieve well, which means focusing on versatility and control, inevitably sacrificing some ease-of-use. Like some of the earlier reviewers, I also have some concerns over introducing non-owning vocabulary type. I do appreciate the design choice though, because I think it ultimately fits the Boost ecosystem best - favoring performance/versatility over ease of use. In contrast to others, I think the choice of Nomenclature, namely URL everywhere _except_ in the grammar domain (the parse function names) is a very elegant one: it avoids the historical naming swamp (by choosing the URL camp) while still making it abundantly clear what parts of RFC are referenced. Other libraries don't always expose accurate terms (`authority`, `URI reference`) and it leads to confusion, e.g. python3: In [1]: from urllib.parse import urlparse In [2]: urlparse('localhost:5555') Out[2]: ParseResult(scheme='', netloc='', path='localhost:5555', params='', query='', fragment='') Whereas Boost URL allows the user to specify the intended semantics - parse_uri("localhost:5555") gives {scheme="localhost", path="5555"} - parse_relative_reference("localhost:5555") gives an error (leftover) - the RFC specifies[1] how to fix the input to allow `:` to appear in the first elements of a relative-path reference: parse_relative_reference("./localhost:5555") gives {path="./localhost:5555"} I do feel that `pct_encoded_view` is an unfortunate choice for a view that actively decodes URL-encoded characters. I'd prefer `pct_decode_view` or `pct_decoding_view`. Views names should probably describe the semantics of the view itself, not what kind of underlying sequence is being adapted/transformed. I found `resolve` a bit mystifying. While I appreciate the docs not trying to duplicate the RFC specifying what the operation means, I think it will be a big help to have inspirational examples, like example one that I came up with a while ago: boost::url make_url(boost::url_view base_api, boost::url_view method) { assert(!method.is_path_absolute()); assert(base_api.data()[base_api.size() - 1] == '/'); boost::urls::error_code ec; boost::url url; resolve(base_api, method, url, ec); if (ec) throw boost::system::system_error(ec); return url; } // example of use: static boost::url_view const base_api{"wss://api.binance.com/api/v3/"}; boost::url method{"depth"}; method.params().append("symbol", "BTCUSDT"); std::cout << make_url(base_api, method); // "wss://api.binance.com/api/v3/depth?symbol=BTCUSDT" On a sidenote, `resolve` is one of those API's where I appreciate that it is a free function (so it applies equally well to all `url_view` compatible containers) as well taking an output argument for the result. Like with `encoded_view::assign_to` I believe it is the most elegant way to give the caller control over allocation and reuse. It would be welcome to _also_ have `resolve()` member functions for discoverability. Regarding the parser interface, I feel this should mature into its own library. I don't see myself using it currently, and would consider it a pure implementation w.r.t. of Boost URL. I have not reviewed this section of the library and only skimmed the docs. # What is your evaluation of the implementation? Didn't look at it too much, though I found some observations I'll list below. # What is your evaluation of the documentation? The documentation can use a little work in the narrative department. On the plus side, some pages already incorporate some "real life" examples. The brevity makes it easy to overlook. Perhaps the examples would be better when fleshed out in long form, then linked to. I like the diagrams and some example tables. I think I might be referring to the diagram in the future when discussing/explaining the behaviour of URLs or Boost URL in particular. # Did you try to use the library? With what compiler? Did you have any problems? I have used the library earlier e.g. when writing Beast code and answering StackOverflow # How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? I've been eyeing the library for some time while it was "baking". I spent ~4 hours gathering these review comments today, using the library building from the boost super-project using both CMake and b2 (linux). In some departments I guess I had the advantage of prior knowledge, making my review more in-depth in places. # Are you knowledgeable about the problem domain? I'm not a notable domain expert, but I have used my share of URL classes in C#, Java and .NET. Also, have implemented parsers for query strings in the past. # Do you think the library should be accepted as a Boost library? I vote to ACCEPT the library. Although there are things to be improved, I wouldn't consider any of them conditional. Thanks Vinnie and Alan for putting time into creating and honing this capable library. Seth Heeren ----- A list of observations encountered when trying out things and reading through the code: - The current implementation trips a fair number of deprecation warnings in Boost Filesystem - I trust those are probably known and easily addressed - In Containers docs: "URL reference. Missing scheme and host." - should say "scheme and authority" - "Although URL query strings are often used to represent key/value pairs, they are not a compound element" - Is confusing to me. It seems to say key/values are not supported, but it goes on to say `url_view::params` extracts "this view of key/value pairs". - Defining BOOST_URL_NO_SOURCE_LOCATION does not compile the tests (possibly more?) - message for `error::success` returns arbitrary value: #4372 error.cpp(53) failed: 'BOOST_URL_ERR(error::success).message() == "success"' ('bad alpha' == 'success') - Running the tests is very noisy with console litter, making it hard to find actual failure info - "Exceptions only thrown on excessive input length" - what is excessive? Is that restriction useful? Since parsing doesn't allocate in the first place, at the time of parsing any "excessive" allocation will already have existed. Any limit would usefully apply on conversion to `urls:: - Throw location on parse error appears to always be the call site if `.value()` is allowed to throw. While technically accurate (the _throw location_ vs _error location_) it does diminish the value of the error information if people opt to handle errors by exceptions. This might be more of a Boost System concern. - That said, source location isn't always set - probably some spots missing BOOST_URL_RETURN_EC magic? auto rr = parse_relative_ref("foo:bar"); if (rr.has_error()) { std::cout << "Error: " << rr.error().location() << "\n"; // "(unknown source location)" } [1] https://datatracker.ietf.org/doc/html/rfc3986#section-4.2:~:text=Such%20a%20...
On Mon, Aug 22, 2022, at 4:57 AM, Seth via Boost wrote:
Whereas Boost URL allows the user to specify the intended semantics
- parse_uri("localhost:5555") gives {scheme="localhost", path="5555"} - parse_relative_reference("localhost:5555") gives an error (leftover) - the RFC specifies[1] how to fix the input to allow `:` to appear in the first elements of a relative-path reference: parse_relative_reference("./localhost:5555") gives {path="./localhost:5555"}
Somehow forgot to add the most obvious match here - since it's not a URL: - parse_authority("localhost:5555") which gives {host="localhost",port="5555"} while also giving additional accessors like port_number, has_port etc.
On Sun, Aug 21, 2022 at 7:58 PM Seth via Boost
...
Thank you for taking the time to review the library.
In contrast to others, I think the choice of Nomenclature, namely URL everywhere _except_ in the grammar domain (the parse function names) is a very elegant one: it avoids the historical naming swamp (by choosing the URL camp) while still making it abundantly clear what parts of RFC are referenced.
Wow thanks!
Other libraries don't always expose accurate terms (`authority`, `URI reference`) and it leads to confusion, e.g. python3:
In [1]: from urllib.parse import urlparse In [2]: urlparse('localhost:5555') Out[2]: ParseResult(scheme='', netloc='', path='localhost:5555', params='', query='', fragment='')
That doesn't look quite right, "localhost" should be the scheme and the path should be "5555" which of course is also not what the user likely intended but.. yeah.
I found `resolve` a bit mystifying. While I appreciate the docs not trying to duplicate the RFC specifying what the operation means, I think it will be a big help to have inspirational examples, like example one that I came up with a while ago:
Yes we need more examples.
It would be welcome to _also_ have `resolve()` member functions for discoverability.
Well, I don't know that I would add a member function that performs an identical operation just as an alternative... BUT, there is a good use-case for a member function and that is to allow resolve where the base is also the destination. This is not currently allowed in the free function, because the base is received as a view. But as a member function of the owning containers (url, static_url) it can be done in-place - one user has already requested this.
Regarding the parser interface, I feel this should mature into its own library. I don't see myself using it currently, and would consider it a pure implementation w.r.t. of Boost URL. I have not reviewed this section of the library and only skimmed the docs.
A shame because I put the most effort into those docs :) I feel they are representative of the quality I would like to see for the rest of the documentation.
The documentation can use a little work in the narrative department. On the plus side, some pages already incorporate some "real life" examples. The brevity makes it easy to overlook. Perhaps the examples would be better when fleshed out in long form, then linked to.
So just to be clear for everyone participating in the review or who has plans to use the library, the docs are not yet up to the level I consider good enough for a Bosot library. We will focus on continuing to improve the docs.
- The current implementation trips a fair number of deprecation warnings in Boost Filesystem - I trust those are probably known and easily addressed
The implementation does not use filesystem but the examples and tests do, and we have already made a change to silence those warnings but it is not in master.
- In Containers docs: "URL reference. Missing scheme and host." - should say "scheme and authority" - "Although URL query strings are often used to represent key/value pairs, they are not a compound element" - Is confusing to me. It seems to say key/values are not supported, but it goes on to say `url_view::params` extracts "this view of key/value pairs".
Yep, doc work.. never done. You already have visibility into our workflow so if you want to weigh in while we are polishing that up feel free :)
- Defining BOOST_URL_NO_SOURCE_LOCATION does not compile the tests (possibly more?)
We need to make one of our CI targets set that macro.
- message for `error::success` returns arbitrary value: #4372 error.cpp(53) failed: 'BOOST_URL_ERR(error::success).message() == "success"' ('bad alpha' == 'success')
That's weird, can you please open an issue?
- "Exceptions only thrown on excessive input length" - what is excessive? Is that restriction useful? Since parsing doesn't allocate in the first place, at the time of parsing any "excessive" allocation will already have existed. Any limit would usefully apply on conversion to `urls::
Yeah I'm not sure what this is about, except for static_url which has a hard limit on construction. Thanks
That doesn't look quite right, "localhost" should be the scheme and the path should be "5555" which of course is also not what the user likely intended but.. yeah.
That's why I like the more explicit parse functions. It makes the casual user aware that you can start your parse with different expectations.
Well, I don't know that I would add a member function that performs an identical operation just as an alternative... BUT, there is a good use-case for a member function and that is to allow resolve where the base is also the destination. This is not currently allowed in the free function, because the base is received as a view. But as a member function of the owning containers (url, static_url) it can be done in-place - one user has already requested this.
That's a use case I hadn't even considered, I think the added discoverability is a nice side-catch. Perhaps the javadoc for one can refer to the other for maximum effect.
A shame because I put the most effort into those docs :) I feel they are representative of the quality I would like to see for the rest of the documentation.
I will get to it some day. I do have a thing for parser combinators/generators, and for simple code :)
- message for `error::success` returns arbitrary value: #4372 error.cpp(53) failed: 'BOOST_URL_ERR(error::success).message() == "success"' ('bad alpha' == 'success')
That's weird, can you please open an issue? Done https://github.com/CPPAlliance/url/issues/445 It's just a missing switch case. Perhaps we shouldleverage some compilers' ability to diagnose missing enumeration members in switches.
participants (3)
-
Mateusz Loskot
-
Seth
-
Vinnie Falco