URL library?


Vinnie Falco

21 Jan 2020

3:43 a.m.

Is there any interest in a URL library for Boost? This is something that has been requested for a while now, and I've finally gotten around to it.

Key features:

* Construct a read-only url::view from a string_view
* Construct a modifiable url::value from a string_view
  - Mutate the parts (e.g. set_scheme)
  - Set encoded or decoded strings:

        url::value u;
        u.set_username("Fr ed");
        u.set_encoded_password("pass%20word");

  - Retrieve encoded or decoded strings:

        u.username();          // returns decoded std::string
        u.encoded_password();  // returns encoded string_view

For servers, execution paths are provided to avoid all dynamic allocation. For example, to retrieve the decoded username:

    url::static_pool<4000> sp;
    std::cout << u.username( sp.allocator() );

The std::basic_string returned by username() uses the specified allocator. A server can handle URLs without allocating any memory.

There's some punycode conversion routines but I haven't figured out if they should be part of the library, or how they would manifest as APIs (for international domain names).

You can perform calculations with URLs using an Allocator (default to std::allocator<char>), or you can use a container with "static storage" (e.g. fixed_string):

    url::static_value<4000> u; // 4000 char capacity

The library is here:

<https://github.com/vinniefalco/url>

This is still a work in progress, and I'm open to feedback that might help me make better remaining design choices.

Thanks
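Since the API above distinguishes encoded from decoded strings, it may help to see what percent-encoding (RFC 3986, section 2) involves. The following is a minimal sketch, not the library's implementation: `pct_encode`, `pct_decode` and `is_unreserved` are hypothetical names, the input is treated as raw bytes (e.g. UTF-8), and every byte outside the RFC's unreserved set is escaped.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helpers (not the library's API) for RFC 3986 percent-encoding.
// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
inline bool is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Escapes every byte outside the unreserved set as "%XX" (uppercase hex).
inline std::string pct_encode(std::string_view in) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (is_unreserved(c)) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}

inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes "%XX" escapes; returns std::nullopt if a '%' is not followed
// by two hex digits.
inline std::optional<std::string> pct_decode(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '%') {
            out.push_back(in[i]);
            continue;
        }
        if (i + 2 >= in.size()) return std::nullopt;
        int hi = hex_value(in[i + 1]);
        int lo = hex_value(in[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;
    }
    return out;
}
```

With these helpers, `"Fr ed"` encodes to `"Fr%20ed"` and `"pass%20word"` decodes to `"pass word"`, mirroring the `set_username` / `set_encoded_password` example. A real library would use a per-component character set (RFC 3986 allows many sub-delimiters unescaped in, for example, the path or query), so this encoder is deliberately conservative.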


Mateusz Loskot

21 Jan

7:12 a.m.

On Tue, 21 Jan 2020 at 04:44, Vinnie Falco via Boost <boost@lists.boost.org> wrote:

...

Is there any interest in a URL library for Boost?

Yes, I'm very much interested.

...

There's some punycode conversion routines but I haven't figured out if they should be part of the library, or how they would manifest as APIs (for international domain names). [...] The library is here:

<https://github.com/vinniefalco/url>

This is still a work in progress, and I'm open to feedback that might help me make better remaining design choices.

There is no reference to any of the URI/URL RFCs in the code or any (documentation) files. Is this deliberate? What's the status of conformance?

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Ggh

9:29 a.m.

Yes, interested. As a matter of fact, I have a URL class for http/https specifically that could act as a starting point.

ty, best
Greg

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Tell me, and I forget. Ask me, and I discover...

Ggh

9:39 a.m.

New subject: URL library? YUP

I have a URL class which could act as a starter... it is someplace...

cheers
Greg

Vinnie Falco

3:37 p.m.

On Mon, Jan 20, 2020 at 11:13 PM Mateusz Loskot via Boost <boost@lists.boost.org> wrote:

...

There is no reference to any of the URI/URL RFCs in the code or any (documentation) files. Is this deliberate?

Thanks for the feedback! Yes there is a link here: <https://github.com/vinniefalco/url/blob/develop/include/boost/url/detail/parse.hpp#L21>

...

What's the status of conformance?

The target is rfc3986 compliance. I believe it is there (modulo bugs). These tests all pass: <https://github.com/vinniefalco/url/blob/cfd09ee8925d596b201fc0502d6bf6a407fb3b27/test/value.cpp> Thanks

Mateusz Loskot

22 Jan

4:25 a.m.

On Tue, 21 Jan 2020, 16:37 Vinnie Falco, <vinnie.falco@gmail.com> wrote:

...

On Mon, Jan 20, 2020 at 11:13 PM Mateusz Loskot via Boost <boost@lists.boost.org> wrote:

...
There is no reference to any of the URI/URL RFCs in the code or any (documentation) files. Is this deliberate?

Thanks for the feedback! Yes there is a link here:

< https://github.com/vinniefalco/url/blob/develop/include/boost/url/detail/par...

...

Thanks! GitHub seemed to fail to find this for me.

...

What's the status of conformance?

The target is rfc3986 compliance.

Sweet!

Mateusz Loskot, mateusz@loskot.net
(Sent from mobile, may suffer from top-posting)

Vinnie Falco

4:27 a.m.

On Tue, Jan 21, 2020 at 8:26 PM Mateusz Loskot via Boost <boost@lists.boost.org> wrote:

...

GitHub seemed to fail to find this for me. ...

...
What's the status of conformance?

Yes, to clarify, I have floated this library a little bit earlier in its development cycle than my other libraries. This is because I have some open design questions, such as how to handle punycode, and what to do with percent-encoding with respect to Unicode. Thus the library and documentation are not quite as well-developed as my other offerings. Although since it is a much smaller library, I'll have it whipped into shape in short order (working on the docs now).

Thanks

Dominique Devienne

21 Jan

10:01 a.m.

On Tue, Jan 21, 2020 at 4:44 AM Vinnie Falco via Boost <boost@lists.boost.org> wrote:

...

Is there any interest in a URL library for Boost?

Yes, interested as well. I typically rely on QUrl (which brings in QtCore) or WebSocketPP's url, but I'd prefer a nice one from you and Boost, Vinnie.

I had a quick look, and the first thing that jumps to my mind though is the sheer number of files, in the repo, and even just the source code, for what is a small library. Do schema and host_type need their own headers, and sometimes impl/.hpp, .ipp ??? I've known people/orgs with rules like 1-class-1-file, which I find overly granular.

I'm a big fan of "amalgamated" libraries, especially those which are header-only, where you can drop just 1 or 2 or 3 files into your project and build them as source with your own code. That lowers the barrier to trying something tremendously. With Boost, the hurdles are high enough that I don't even try before my org updates the full 3rd party, every 2 or 3 years...

I'm probably extreme in doing the opposite of 1-class-1-file, with a pair of .h/.cpp files that is closer to an entire library (worst offender is a 2K .h and a 14K .cpp), but it seems to me that the proposed Boost.URL has an awful lot of source files, "just" for URL parsing. I'd have a .h/.hpp/.ipp only myself :). The .h for declarations and inlines only, with minimal header deps; the .hpp for template stuff with additional includes; the .ipp/.cpp for non-template, non-inline implementations. But I know I'm far from mainstream here :).

---DD

PS: Also saw some references to Boost.Beast in passing.
PPS: Is the allocator support similar to your proposed Boost.JSON? Could that be an independent component?

Vinnie Falco

3:41 p.m.

On Tue, Jan 21, 2020 at 2:03 AM Dominique Devienne via Boost <boost@lists.boost.org> wrote:

...

I had a quick look, and the first thing that jumps to my mind though is the sheer number of files, in the repo, and even just the source code, for what is a small library. Do schema and host_type need their own headers, and sometimes impl/.hpp, .ipp ???

Yes, everything is organized that way for specific reasons. Although the final version of the library may have a slightly different set of files. For example, I might just get rid of scheme.hpp and everything in it.

...

it seems to me that the proposed Boost.URL has an awful lot of source files, "just" for URL parsing.

It isn't "just" URL parsing, it is also encoding and decoding algorithms, custom storage and allocation, and modification of the URL.

...

PS: Also saw some references to Boost.Beast in passing. PPS: Is the allocator support similar to your proposed Boost.JSON? Could that be an independent component.

Boost.JSON has its own special allocator model because of the hierarchical nature of the JSON container. Since a boost::url::value is effectively just a string, the allocator model in this new library is much simpler. A derived class uses the already familiar Allocator parameter. Thanks

Andrey Semashev

10:12 a.m.

On 2020-01-21 06:43, Vinnie Falco via Boost wrote:

...

Is there any interest in a URL library for Boost? This is something that has been requested for a while now, and I've finally gotten around to it.

I'd be more interested in a more generic URI library. Along with a few associated algorithms, e.g. those described in: https://tools.ietf.org/html/rfc3986

...

Key features:

* Construct a read-only url::view from a string_view * Construct a modifiable url::value from a string_view

Why not uri and uri_view?

Vinnie Falco

3:51 p.m.

On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

I'd be more interested in a more generic URI library. Along with a few associated algorithms, e.g. those described in: https://tools.ietf.org/html/rfc3986

Yes, this library does that. I do not use the term "URI" because it is confusing and pointless. They are all URLs now. My library follows the RFC, except that I have renamed the top level production rules to reflect this preference:

    URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    URL-reference = URL / relative-ref
    absolute-URL  = scheme ":" hier-part [ "?" query ]

I didn't invent this idea: deprecating the word "URI" and using "URL" consistently in its place is recommended by WhatWG.
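The top-level rules above can be illustrated with the non-validating regular expression that RFC 3986 itself gives in Appendix B for splitting a reference into scheme, authority, path, query and fragment. The sketch below uses `std::regex`; `split_url` and `url_parts` are hypothetical names, and this is not how the proposed library parses.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Top-level components of a URI reference. std::nullopt means "absent",
// which differs from "present but empty" (e.g. "http://h/?" has an empty query).
struct url_parts {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;  // always present, possibly empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Hypothetical helper: splits a URI reference with the regular expression
// from RFC 3986, Appendix B. It only splits; it does not check each
// component's characters against the ABNF.
inline std::optional<url_parts> split_url(const std::string& s) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    // In the ECMAScript grammar '.' does not match line terminators, so a
    // fragment containing '\n' or '\r' makes the match fail.
    if (!std::regex_match(s, m, re)) return std::nullopt;
    auto opt = [&m](std::size_t i) -> std::optional<std::string> {
        if (m[i].matched) return m[i].str();
        return std::nullopt;
    };
    url_parts p;
    p.scheme = opt(2);     // group 2: scheme without ':'
    p.authority = opt(4);  // group 4: authority without "//"
    p.path = m[5].str();   // group 5: path (may be empty)
    p.query = opt(7);      // group 7: query without '?'
    p.fragment = opt(9);   // group 9: fragment without '#'
    return p;
}
```

For example, `"../x?"` yields no scheme or authority, the path `"../x"` and an empty (but present) query, which is why the optional components are modelled with `std::optional`. A conforming parser would additionally validate each component against the ABNF.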

...

Why not uri and uri_view.

First, I don't use the term "uri" ever. But I think you're asking, why not "url" and "url_view"? Because `url::url` and `url::url_view` look bad: they repeat a word. Thus we have `url::view` and `url::value`, which are sensible.

Thanks

Andrey Semashev

6:39 p.m.

On 2020-01-21 18:51, Vinnie Falco wrote:

...


Yes, this library does that. I do not use the term "URI" because it is confusing and pointless. They are all URLs now. My library follows the RFC, except that I have renamed the top level production rules to reflect this preference:

    URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    URL-reference = URL / relative-ref
    absolute-URL  = scheme ":" hier-part [ "?" query ]

I didn't invent this idea, deprecating the word "URI" and using "URL" consistently in its place is recommended by WhatWG.

There is a semantic difference between URI and URL - the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat locator as an identifier but not the other way around. Using the term URL to refer to a URI is confusing.

The reason I'm interested particularly in URIs is because I have to deal with them, not so much with URLs.

...

...
Why not uri and uri_view.

First, I don't use the term "uri" ever. But i think you're asking, why not "url" and "url_view?" Because `url::url` and `url::url_view` look bad, they repeat a word. Thus we have `url::view` and `url::value`, which are sensible.

Well, no, not really. I know 'using namespace abc;' is not something universally welcome, but it is a valid use case nonetheless. After that, having `view` and `value` is no longer sensible.

I would still prefer `boost::uris::uri` and `boost::uris::uri_view`. Note that the namespace is plural.

Andrey Semashev

6:47 p.m.

On 2020-01-21 21:39, Andrey Semashev wrote:


There is a semantic difference between URI and URL - the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat locator as an identifier but not the other way around. Using the term URL to refer to a URI is confusing.

The reason I'm interested particularly in URIs is because I have to deal with them, not so much with URLs.

Also, I'll add that WhatWG is a web-related working group, and URIs are used in many other areas. In my case it's telephony and media processing.


Vinnie Falco

7:59 p.m.

On Tue, Jan 21, 2020 at 10:41 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

There is a semantic difference between URI and URL - the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat locator as an identifier but not the other way around. Using the term URL to refer to a URI is confusing.

Having both terms is confusing, and WhatWG got this right. The vast majority of users just want to "parse a URL", for example one that comes in from an HTTP request, or one that is specified on the command line. When they go into Google, they type "URL"; they don't type "URI." Hardly anyone knows what a URI is. But even my mother, who is 90, knows what a URL is.

I want my libraries to be popular and have mass appeal, not just satisfy a niche audience of super-experts. When I type "URI" into Google I get:

About 287,000,000 results (0.87 seconds)
www.uri.edu
The University of Rhode Island (top result)

People Also Ask:
What is difference URL and URI?
While they are used interchangeably, there are some subtle differences...

Now if I type "URL" into Google, I get:

About 12,620,000,000 results (0.50 seconds)
en.wikipedia.org › wiki › URL
URL - Wikipedia (top result)

People Also Ask:
What is the URL?
What is an example of a URL address?
How do I find URL?
What is the path in the URL?
What is URL on my phone?
What does WWW stand for?

Yes, not only is "URL" 44 times more popular than "URI" in terms of search results, but the top question about "URI" is "What is difference URL and URI?", while for "URL" no one is asking about the difference.

Another way to think of it: in terms of name recognition, "URI" is to .org what "URL" is to .com. People assume that a domain name is in .com because that's the most popular TLD. That's why .com domains go for so much more money.

It is true that URL is not an exact fit if you adhere to the technical documentation 100%, but I think the overall benefit of just standardizing on the name "URL" outweighs the downsides. It is easier for users, better for Boost, and gives the library more appeal to average folk.

...

The reason I'm interested particularly in URIs is because I have to deal with them, not so much with URLs.

This library should do everything you want with URIs since I take care of parsing all the top-level rules. The library does not make assumptions about the data. For example, if you want to treat the path as just one string and ignore the segments, you can do that. If you want to ignore the distinction between username and password in the userinfo, you can do that too. You can treat the query params as an associative array of key/value pairs if you want, or you can ignore that and just work with the query directly.

If you have specific use-cases, feel free to open an issue or cite them here and I will make sure they are attended to (assuming they are in scope).

Thanks

P.S. "Only snobs call it a URI" :)
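As a sketch of treating the query as an associative array of key/value pairs, the hypothetical `split_query` below splits on `&` and `=`. This structure is a convention (the one used by HTML form submission), not something RFC 3986 defines, which is why a library can reasonably offer both views of the query.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: views a query such as "a=1&b=&c" as an ordered list
// of key/value pairs. Duplicates and order are preserved; keys and values
// are returned still percent-encoded, as views into the input.
using query_param = std::pair<std::string_view, std::string_view>;

inline std::vector<query_param> split_query(std::string_view q) {
    std::vector<query_param> params;
    if (q.empty()) return params;
    std::size_t start = 0;
    while (true) {
        std::size_t amp = q.find('&', start);
        std::size_t len =
            (amp == std::string_view::npos) ? std::string_view::npos : amp - start;
        std::string_view item = q.substr(start, len);
        // Split each item at its first '='; an item without '=' gets an empty value.
        std::size_t eq = item.find('=');
        if (eq == std::string_view::npos)
            params.emplace_back(item, std::string_view{});
        else
            params.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        if (amp == std::string_view::npos) break;
        start = amp + 1;
    }
    return params;
}
```

`"a=1&b=&c"` gives three pairs, with empty values for `b` and `c`; this simple representation cannot distinguish `b=` from a bare `c`, and the values still need percent-decoding before use.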

Andrey Semashev

8:27 p.m.

On 2020-01-21 22:59, Vinnie Falco wrote:

...

On Tue, Jan 21, 2020 at 10:41 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...
There is a semantic difference between URI and URL - the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat locator as an identifier but not the other way around. Using the term URL to refer to a URI is confusing.

Having both terms is confusing, and WhatWG got this right. The vast majority of users just want to "parse a URL", for example one that comes in from an HTTP request, or one that is specified on the command line. When they go into Google, they type "URL" they don't type "URI." Hardly anyone knows what a URI is. But even my mother who is 90 knows what a URL is.

I want my libraries to be popular and have mass appeal, not just satisfy a niche audience of super-experts. When I type "URI" into Google I get:

About 287,000,000 results (0.87 seconds) www.uri.edu The University of Rhode Island (top result)

People Also Ask: What is difference URL and URI? While they are used interchangeably, there are some subtle differences...

Now if I type "URL" into Google, I get:

About 12,620,000,000 results (0.50 seconds) en.wikipedia.org › wiki › URL URL - Wikipedia (top result)

People Also Ask: What is the URL? What is an example of a URL address? How do I find URL? What is the path in the URL? What is URL on my phone? What does WWW stand for?

Yes, not only is "URL" 44 times more popular than "URI" in terms of search results, but the top question about "URI" is "What is difference URL and URI?". While for "URL" no one is asking about the difference.

You get more exposure of the URL term because there are many more people using the web for various reasons than e.g. SIP or email or SDP. For the web, sure, there's the URL bar in your browser and HTTP headers and that's pretty much it. Given this, I can understand WhatWG's decision to standardize URLs *in their specific domain*. That doesn't make that choice valid in other domains. Search through the SIP RFC and you will find the correct term there is URI. If your library targets those other domains, you should speak their language, too. Sorry, but I can't call e.g. an email address a URL, and I don't agree with the proliferation of such confusion. It's MB vs. MiB all over again.

...

Another way to think of it, in terms of name recognition "URI" is to .org what "URL" is to .com. People assume that a domain name is in .com because that's the most popular TLD. That's why .com domains go for so much more money.

It is true that URL is not an exact fit if you adhere to the technical documentation 100%, but I think the overall benefit of just standardizing on the name "URL" outweighs the downsides. It is easier for users, better for Boost, and gives the library more appeal to average folk.

Well, let's agree to disagree then.

Gavin Lambert

22 Jan

1:10 a.m.

On 22/01/2020 07:39, Andrey Semashev wrote:

...


There is a semantic difference between URI and URL - the former is an identifier and the latter is a locator (i.e. a path to a resource location). You can treat locator as an identifier but not the other way around. Using the term URL to refer to a URI is confusing.

Notably, all URLs are URIs, but not all URIs are URLs. Some are URNs, for example, which are structured a bit differently (e.g. "urn:oasis:names:specification:docbook:dtd:xml:4.1.2").

A program only dealing with "locations to download from" generally only needs to worry about URLs, but there are other places where all URIs (including URNs) may be encountered (even by such a program), for example as XML namespace identifiers. (Usually these can be treated as opaque, though.)

Still, given that the same parsing rules can apply to both (URNs usually just have a long opaque path after the "urn" scheme), it doesn't seem unreasonable to call it a "URL library" anyway (despite the recommendation in RFC3986). Some people would be confused by calling them "URIs", and those who know better will know that as well. Having said that, the docs should call out RFC support and URI compatibility explicitly, so that people aren't left wondering.

Vinnie Falco

1:20 a.m.

On Tue, Jan 21, 2020 at 5:10 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:

...

...Some are URNs,...

LOL!! I was hoping to reduce it to one term, but instead now we have three... Fortunately URN has the same syntax; it is just a custom scheme. The way I deal with that is that the user can parse the URN as a URL, check the scheme, and then apply the scheme-specific syntax rules for subdelimiters to the individual parts.
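The approach described here (parse generically, check the scheme, then apply the scheme's own rules) might look like the following for URNs, whose syntax in RFC 8141 is `urn:` NID `:` NSS. This is a hedged sketch with hypothetical names (`parse_urn`, `urn_parts`, `iequals`); it ignores the optional components RFC 8141 allows after `?` or `#`, and it only checks that the NID and NSS are non-empty rather than validating their characters.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: the two parts of "urn:" NID ":" NSS.
struct urn_parts {
    std::string_view nid;  // namespace identifier, e.g. "oasis"
    std::string_view nss;  // namespace-specific string
};

// ASCII case-insensitive comparison.
inline bool iequals(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

inline std::optional<urn_parts> parse_urn(std::string_view s) {
    // Generic step: the scheme is everything before the first ':'.
    std::size_t colon = s.find(':');
    if (colon == std::string_view::npos) return std::nullopt;
    // Scheme names are case-insensitive (RFC 3986, section 3.1).
    if (!iequals(s.substr(0, colon), "urn")) return std::nullopt;
    // Scheme-specific step: split the remainder at its first ':'.
    std::string_view rest = s.substr(colon + 1);
    std::size_t sep = rest.find(':');
    if (sep == 0 || sep == std::string_view::npos || sep + 1 == rest.size())
        return std::nullopt;  // empty NID, missing ':', or empty NSS
    return urn_parts{rest.substr(0, sep), rest.substr(sep + 1)};
}
```

Applied to Gavin's example, the NID is `"oasis"` and the NSS is `"names:specification:docbook:dtd:xml:4.1.2"`; an `https:` URL is simply rejected at the scheme check.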

...

Some people would be confused by calling them "URIs" and those who know better will know that as well. Having said that, the docs should call out RFC support and URI compatibility explicitly, so that people aren't left wondering.

Yes I agree, I added that to the list of tasks. Thanks

Christian Mazakas

1:24 a.m.

It's also worth mentioning that there are alternative URL parser implementations available. For example, here's Furi: https://github.com/LeonineKing1199/furi

It's in essence a port of the URI ABNF written in Boost.Spirit, more specifically X3. There are also routines for percent-encoding and decoding. Unlike the proposed Boost.URL, this lib aims to be low-level but composable. Because the entire ABNF set is exported, one could theoretically re-compose a parser that'd handle any scenario.

The emphasis is on immutability and a functional style of programming. The main structure that users will interact with is really just a POD of `std::string_view`s, and the parser combinators themselves are also very FP-oriented.

Less than desirable aspects of the lib are that it only does Unicode in UTF-32, but it does give you easy methods of converting to it. This was done for the sake of simplicity and also because that's how X3 does it. Fortunately, most URLs are relatively small in practice, so the storage overhead is affordable in most scenarios.

The best way of verifying the parser is the various uri and uri_parts tests. If you can think of a URL that'd break it, I'd love to try it! If it's ABNF-correct, Furi will recognize it too!

- Chris

Andrey Semashev

8:59 a.m.

On 2020-01-22 04:10, Gavin Lambert via Boost wrote:

...


Notably, all URLs are URIs, but not all URIs are URLs. Some are URNs, for example, which are structured a bit differently (eg. "urn:oasis:names:specification:docbook:dtd:xml:4.1.2").

A program only dealing with "locations to download from" generally only needs to worry about URLs, but there are other places where all URIs (including URNs) may be encountered (even by such a program) -- for example, as XML namespace identifiers. (Usually these can be treated as opaque, though.)

Still, given that the same parsing rules can apply to both (URNs usually just have a long opaque path after the "urn" scheme), it doesn't seem unreasonable to call it a "URL library" anyway (despite the recommendation in RFC3986). Some people would be confused by calling them "URIs", and those who know better will know that as well. Having said that, the docs should call out RFC support and URI compatibility explicitly, so that people aren't left wondering.

From https://tools.ietf.org/html/rfc8141:

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that is assigned under the "urn" URI scheme and a particular URN namespace, with the intent that the URN will be a persistent, location-independent resource identifier.

So the name URI is very much appropriate when working with URNs, as it is with URLs. But URL definitely is not the appropriate term for working with URNs.

"People will understand what you mean" is not the right reasoning. As a programmer, you have every opportunity to pick the right name for the entities in your code, so that a technically educated reader understands what each entity represents. People who aren't programmers, or who do not know even the basic terms in your technical domain, are not your audience. Personally, I wouldn't use a `url` type to represent URIs, for documentation purposes alone.

Gavin Lambert

3:36 a.m.

On 21/01/2020 16:43, Vinnie Falco wrote:

...

Is there any interest in a URL library for Boost? This is something that has been requested for a while now, and I've finally gotten around to it.

I'm quite interested. Though some docs would be nice. ;)

...

For servers, execution paths are provided to avoid all dynamic allocation. For example to retrieve the decoded username: url::static_pool<4000> sp; std::cout << u.username( sp.allocator() );

Repeated reinventing of static allocators gives me some pause. Maybe that should be broken out into a separate library first? And maybe recently-accepted FixedString could use it too (or you could use theirs)?

...

The library is here:

<https://github.com/vinniefalco/url>

Glancing at https://github.com/vinniefalco/url/blob/develop/include/boost/url/impl/basic..., it looks like there's quite a bit of duplicate code (eg. between set_password and set_encoded_password). I assume this is related to the desire to avoid allocation, but perhaps you could make use of your own static_pool when delegating common subtasks, rather than duplicating the logic? (Side note: I find the "wrap at <40 columns" style harder to read. Who has screens that narrow these days?)

Vinnie Falco

4:15 a.m.

On Tue, Jan 21, 2020 at 7:37 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:

...

Though some docs would be nice. ;)

Heh... working on that. And per the Ramey Rule, the doc work has surfaced defects in the API which I am fixing. This page has the most work: <http://vinniefalco.github.io/doc/url/url/ref/boost__url__basic_value.html> Still being worked on of course.

...

Repeated reinventing of static allocators gives me some pause. Maybe that should be broken out into a separate library first?

Well this is not such an easy thing. One of the goals for all my libraries is that they can work outside of boost (just define BOOST_URL_STANDALONE). I could break out this little allocator into another library, but I doubt it is enough to justify a whole entire lib. Is there another already existing allocator that does the same thing? I'm not sure there is. But even so, users who just need to parse, modify, and compose URLs in their server, and wish to avoid memory allocations will be glad that they have a 170-line solution in a single header available to them without the need to look elsewhere.
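On whether an existing allocator does the same thing: if C++17 is acceptable (the Boost build of these libraries targets C++11, as mentioned later in the thread), the standard `std::pmr::monotonic_buffer_resource` gives comparable no-heap behaviour. The sketch below illustrates that standard facility; it is not Boost.URL's `static_pool`, and `decode_into` is a hypothetical function that percent-decodes into a `std::pmr::string` drawing memory from a caller-supplied resource.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <new>        // std::bad_alloc
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical: percent-decodes `in` into a std::pmr::string whose memory
// comes from `mr`. Throws std::invalid_argument on a malformed escape.
inline std::pmr::string decode_into(std::string_view in,
                                    std::pmr::memory_resource* mr) {
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        throw std::invalid_argument("bad percent-escape");
    };
    std::pmr::string out(mr);
    out.reserve(in.size());  // decoded output is never longer than the input
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%') {
            if (i + 2 >= in.size())
                throw std::invalid_argument("truncated percent-escape");
            out.push_back(static_cast<char>(hex(in[i + 1]) * 16 + hex(in[i + 2])));
            i += 2;
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

A typical use mirrors the `static_pool<4000>` example: give the resource a 4000-byte `std::array<std::byte, 4000>` and `std::pmr::null_memory_resource()` as its upstream. Exhausting the buffer then throws `std::bad_alloc` instead of falling back to the heap, so the "no allocation" property is enforced rather than assumed; passing the default upstream instead trades that guarantee for a heap fallback.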

...

And maybe recently-accepted FixedString could use it too (or you could use theirs)?

FixedString doesn't use any allocator. The reason I use the Allocator model here (versus my home-brewed "storage_ptr" in Boost.JSON) is because I want to return std::basic_string from the relevant functions.

...

Glancing at https://github.com/vinniefalco/url/blob/develop/include/boost/url/impl/basic..., it looks like there's quite a bit of duplicate code (eg. between set_password and set_encoded_password).

I assume this is related to the desire to avoid allocation, but perhaps you could make use of your own static_pool when delegating common subtasks, rather than duplicating the logic?

I think what you're proposing is that set_password() can first percent-encode the string using a local pool, and then pass that to set_encoded_password(). This will certainly eliminate the duplicated code. But then we are either placing a limit on the size of the string that may be passed, or we have the possibility of going to the heap one extra time (to handle the case where the resulting string is larger than the static_pool's capacity). I think I would just rather live with the duplicated code. Although, if you look closely it isn't _really_ duplicated, there are subtle variations in it which admittedly are rather resistant to refactoring although I haven't tried very hard. Open to ideas how it can be reduced, without the need to allocate.
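The trade-off can be made concrete with a small, entirely hypothetical class (`userinfo` and its members are illustrative names, far simpler than the real `basic_value`). Here `set_password` encodes into a temporary held in a 256-byte stack buffer and then delegates to `set_encoded_password`; because the buffer resource falls back to the default upstream (normally the heap), long inputs incur the extra allocation described above. For simplicity it escapes everything outside the unreserved set, which is valid if conservative for userinfo, and the stored value is a plain `std::string`.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>

// Hypothetical class illustrating "encode, then delegate".
class userinfo {
public:
    // Stores an already percent-encoded password as-is.
    void set_encoded_password(std::string_view s) { password_.assign(s.data(), s.size()); }

    // Encodes into a temporary backed by a small stack buffer, then delegates.
    // If the encoded form outgrows the buffer, the resource requests memory
    // from the default upstream resource (normally the heap).
    void set_password(std::string_view s) {
        std::array<std::byte, 256> buf;
        std::pmr::monotonic_buffer_resource pool(buf.data(), buf.size());
        std::pmr::string tmp(&pool);
        static constexpr char hex[] = "0123456789ABCDEF";
        for (unsigned char c : s) {
            bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                              (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                              c == '_' || c == '~';
            if (unreserved) {
                tmp.push_back(static_cast<char>(c));
            } else {
                tmp.push_back('%');
                tmp.push_back(hex[c >> 4]);
                tmp.push_back(hex[c & 0x0F]);
            }
        }
        set_encoded_password(tmp);  // single code path for storing the result
    }

    const std::string& encoded_password() const { return password_; }

private:
    std::string password_;
};
```

A short password such as `"pass word"` is encoded without the temporary leaving the stack buffer, while a 300-character one (900 bytes once encoded) pushes the temporary onto the heap. Using `std::pmr::null_memory_resource()` as the upstream instead would reproduce the other option mentioned: a hard size limit.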

...

(Side note: I find the "wrap at <40 columns" style harder to read. Who has screens that narrow these days?)

No idea what you're going on about here. Thanks

Phil Endecott

29 Jan

4:18 p.m.

Hi Vinnie,

Vinnie Falco wrote:

...

Is there any interest in a URL library for Boost? <https://github.com/vinniefalco/url>

Have you considered to what extent its interface could look like std::filesystem::path?

Back in about 2005 I wrote a URI parser using Spirit, mainly as an exercise to learn about Spirit's new features. The code was compact and directly corresponded to the BNF in the spec. Run-time performance was good, though compilation times were slow and error messages were horrible.

The main problem I faced was that the BNF in the spec was wrong! The RFC had to be read alongside an obscure "errata" webpage (https://skrb.org/ietf/http_errata.html) that I didn't discover for ages.

Anyway, I've had a look at your parser and... well, it's about five times as much code and much further removed from the BNF.

Regards, Phil.

Vinnie Falco

6:33 p.m.

On Wed, Jan 29, 2020 at 8:51 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:

...

The main problem I faced was that the BNF in the spec was wrong!

Yeah! I have noticed that the RFC sure doesn't go out of its way to make things easy to understand.

...

...I've had a look at your parser and... well it's about five times as much code and much further removed from the BNF.

"Five times as much code" as what, a program that uses Spirit? Are you including the Spirit source code in that figure?

Anyway... that parsing code is written in a certain style which assists with producing meaningful code coverage reports. This makes it appear longer than it needs to be, but not 5x longer (more like 5%).

I don't use Spirit for any libraries, for a few reasons:

* Users don't like it
* It consumes too many resources to compile
* It adds a dependency
* It is less secure

With respect to dependencies, all my new libraries work without Boost when an appropriate configuration macro is defined (they will also require C++17 instead of C++11 in this mode). A dependency on Boost.Spirit would defeat this.

Any security audit [1] of a version of Beast, Boost.JSON, or Boost.URL which used Spirit would come with a massive disclaimer about the security assumptions of Spirit. By writing the parsers so that they don't call into any outside code, the respective libraries can make stronger security guarantees. Furthermore, the library can assert that, absent changes in the parsing code, the security assurances remain valid. Compare this with what might happen if a bug is introduced into a dependency in a newer version.

Thanks

[1] <https://vinniefalco.github.io/BeastAssets/Beast%20-%20Hybrid%20Application%20Assessment%202017%20-%20Assessment%20Report%20-%2020171114.pdf>
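To illustrate what a dependency-free, hand-written parser can look like while still tracking the BNF, here is a sketch for a single rule from RFC 3986 section 3.1, `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. The function `parse_scheme` and its signature are hypothetical and are not taken from the library; it also assumes, as in the `URI` rule, that the scheme is followed by `":"`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// RFC 3986 section 3.1:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// Hand-written, no outside code; each block mirrors one piece of the grammar.

inline bool is_alpha(char c) noexcept
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

inline bool is_digit(char c) noexcept
{
    return c >= '0' && c <= '9';
}

// Hypothetical: on success, stores the scheme, advances pos past the
// scheme and its trailing ':', and returns true. On failure, pos is unchanged.
inline bool parse_scheme(std::string_view s, std::size_t& pos, std::string_view& scheme) noexcept
{
    std::size_t i = pos;

    // ALPHA
    if(i >= s.size() || !is_alpha(s[i]))
        return false;
    ++i;

    // *( ALPHA / DIGIT / "+" / "-" / "." )
    while(i < s.size() && (is_alpha(s[i]) || is_digit(s[i]) ||
                           s[i] == '+' || s[i] == '-' || s[i] == '.'))
        ++i;

    // URI = scheme ":" hier-part ...
    if(i >= s.size() || s[i] != ':')
        return false;

    scheme = s.substr(pos, i - pos);
    pos = i + 1;
    return true;
}
```

Because the function touches nothing but its arguments, its behavior can be audited in isolation, which is the property the security argument above relies on.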

Vinnie Falco

30 Jan

4:59 a.m.

On Wed, Jan 29, 2020 at 8:51 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:

...

Have you considered to what extent its interface could look like std::filesystem::path ?

Do you mean just the path-handling interface of the URL container, or the whole container? For example, using `operator/`? I'm generally against getting fancy with overloaded operators and trying to make a DSL; I have found that simple works best. In the case of the URL container, good ol' fashioned "get" and "set" functions with accurate, consistent, and descriptive names are called for. I did splurge a little and work up some container-like views over the path and query params. The query params will also be modifiable through the view, for example to insert, change, or erase params. Thanks
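As a rough picture of a "container-like view over the path", here is a read-only forward-iterable view that yields each segment of an absolute path as a `std::string_view`, without allocating. The names `segments_view` and its nested `iterator` are hypothetical, not the library's types; segments are returned still percent-encoded, and in this sketch both an empty path and `"/"` yield no segments.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical read-only view over the segments of a path such as "/a/b/c".
class segments_view
{
    std::string_view path_;

public:
    class iterator
    {
        std::string_view path_;
        std::size_t pos_; // index of the '/' before the current segment, or npos at end

        std::size_t next_slash() const noexcept { return path_.find('/', pos_ + 1); }

    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = std::string_view;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = std::string_view;

        iterator() noexcept : pos_(std::string_view::npos) {}
        iterator(std::string_view path, std::size_t pos) noexcept : path_(path), pos_(pos) {}

        // The current segment, excluding the slashes around it.
        std::string_view operator*() const noexcept
        {
            std::size_t end = next_slash();
            if(end == std::string_view::npos)
                end = path_.size();
            return path_.substr(pos_ + 1, end - pos_ - 1);
        }

        iterator& operator++() noexcept
        {
            pos_ = next_slash(); // becomes npos after the last segment
            return *this;
        }

        iterator operator++(int) noexcept
        {
            iterator tmp = *this;
            ++*this;
            return tmp;
        }

        bool operator==(iterator const& other) const noexcept { return pos_ == other.pos_; }
        bool operator!=(iterator const& other) const noexcept { return pos_ != other.pos_; }
    };

    // Precondition (sketch only): the path is empty or starts with '/'.
    explicit segments_view(std::string_view path) noexcept : path_(path)
    {
        assert(path.empty() || path.front() == '/');
    }

    iterator begin() const noexcept
    {
        if(path_.size() <= 1) // "" and "/" have no segments here
            return end();
        return iterator(path_, 0);
    }

    iterator end() const noexcept { return iterator(); }
};
```

Usage is an ordinary range-for, e.g. `for(std::string_view seg : segments_view("/a/b/c"))`; empty segments such as the one in `"/a//b"` are preserved rather than skipped.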


23 comments

8 participants

participants (8)

Andrey Semashev
Christian Mazakas
Dominique Devienne
Gavin Lambert
Ggh
Mateusz Loskot
Phil Endecott
Vinnie Falco
