On Sun, Oct 17, 2021 at 2:52 PM Gavin Lambert via Boost
>>     assert( u.encoded_url() == "https:/.//index.htm" );
>
> I assume this was intended to be "https://./index.htm"?

Nope, it was correct as I wrote it. You managed to produce an authority with a single dot :)

>>     abs("/././/", { ".", "", "" });
>>
>> We treat a leading "/." as not appearing in the segments, to make the behavior of the library doing these syntactic adjustments transparent and satisfy the rule that assignments from segments produce the same result when iterated.
>
> If you're stripping leading ./ then shouldn't the result just be "/" alone? Same reason that "/../../foo/../bar/" should become "/bar/".

Well no, there's a difference between the value returned by url_view::encoded_path() and what you get when you iterate the segments. A leading "/." or "./" stays in the encoded path but is not returned when iterating segments, because it is considered "metadata" about the path that keeps it regular without changing the meaning.

I think you are confusing this with "normalization", which is a different thing entirely. Given the following URL:

    https:/.//index.htm

normalization would leave it as-is. Given:

    https:/././/index.htm

normalization would return

    https:/.//index.htm

If you start with the URL above and add an authority:

    url u = parse_uri( "https:/.//index.htm" ).value();
    u.set_encoded_authority( "example.com" );

the result is

    assert( u.encoded_url() == "https://example.com//index.htm" );

So there are three things at play here:

1. Modifying the path for normalization
2. Tweaking the path to match the grammar
3. Tweaking the path to provide segments() container invariants

Number 1 above is what people are mostly familiar with, for example safely collapsing double-dot segments "..".

Number 2 is understood by fewer people but is a consequence of the grammar in the RFC. For example, if you have an authority and a path that starts with a double slash "//", then removing the authority requires prepending "/." to the path; otherwise the "//" would be parsed as the start of an authority. Another one: if you have a scheme and a relative path whose first segment contains a colon, then removing the scheme requires prepending "./" to the path; otherwise the text before the colon would be parsed as a scheme.

These tweaks let the library guarantee that all mutation operations leave the URL in a syntactically valid state without having to do weird things like throw exceptions, return error codes, ignore the request, or worse, impose additional semantic changes on the URL (for example, turning an absolute path into a relative one in a case where the user didn't explicitly request it).
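
The two grammar tweaks under number 2 can be sketched without the library. This is only a minimal illustration under stated assumptions, not how Boost.URL implements it: `fixup_path`, its parameters, and `fixup_path_examples` are hypothetical names, and the function only covers the two cases described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Boost.URL function): adjust `path` so that a URL
// assembled from the given parts still parses the way the caller intended.
std::string fixup_path(const std::string& path, bool has_scheme, bool has_authority)
{
    // With an authority present, the start of the path is never ambiguous.
    if (has_authority)
        return path;

    // Without an authority, a leading "//" would be read as the start of one,
    // so "/." is prepended to keep it part of the path.
    if (path.compare(0, 2, "//") == 0)
        return "/." + path;

    // Without a scheme, a colon in the first segment of a relative path would
    // be read as a scheme delimiter, so "./" is prepended.
    if (!has_scheme && !path.empty() && path[0] != '/')
    {
        const std::string first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string::npos)
            return "./" + path;
    }
    return path;
}

// The cases from the text, written as checks.
void fixup_path_examples()
{
    assert(fixup_path("//index.htm", true, false) == "/.//index.htm");   // authority removed
    assert(fixup_path("my:file.txt", false, false) == "./my:file.txt"); // scheme removed
    assert(fixup_path("//index.htm", true, true) == "//index.htm");     // authority present
}
```

In both adjusted cases the added prefix changes how the text is parsed but not what the path refers to, which is why it can be treated as bookkeeping and not as a real segment.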

Number 3 is the most controversial and unintuitive. It falls out as a consequence of making the segments and encoded_segments containers behave exactly like vector<string>. For example, if you call clear() on the container, then it should return an empty list:

    url u = parse_relative_ref( "path/to/file.txt" ).value();
    u.segments().clear();
    assert( u.segments().begin() == u.segments().end() );

However, what if you have an absolute path?

    url u = parse_relative_ref( "/path/to/file.txt" ).value();
    assert( u.segments() == { "path", "to", "file.txt" } );

Okay, so far so good, but what if you clear?

    u.segments().clear();
    assert( u.encoded_url() == "/" );

Wait, that's not clear, there's still a path segment! Well of course there is: if you clear an absolute path you should get back an absolute path. But the segments container should be empty:

    assert( u.segments().begin() == u.segments().end() ); // has to pass

See what's happening here? Let's start with a relative path:

    url u = parse_relative_ref( "index.htm" ).value();

Now let's reassign the path:

    u.segments() = { "my:file.txt" };

Well, we can't leave the URL as "my:file.txt" because that would be interpreted as a scheme. So the library prepends "./":

    assert( u.encoded_path() == "./my:file.txt" );

But we have an invariant: after you assign the segments you have to get the same thing back:

    assert( u.segments() == { "my:file.txt" } );

To enforce this invariant we have to treat some path prefixes as if they weren't there: "/" by itself, "./", and "/.".

And that's how it's done.

Thanks
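
P.S. A rough, library-independent sketch of the segment invariant described above: split an encoded path into segments while treating a leading "/", "./", or "/." as if it weren't there. `visible_segments` and `visible_segments_examples` are made-up names for illustration, the prefix handling is an assumption based only on the cases in this message, and none of this is the Boost.URL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the segments a caller would see when iterating `path`,
// with the bookkeeping prefixes skipped.
std::vector<std::string> visible_segments(std::string path)
{
    // A leading "/." followed by "/" is treated as metadata; drop the "/.".
    if (path.compare(0, 3, "/./") == 0)
        path.erase(0, 2);

    // Then drop the "/" that marks an absolute path, or a leading "./".
    if (!path.empty() && path[0] == '/')
        path.erase(0, 1);
    else if (path.compare(0, 2, "./") == 0)
        path.erase(0, 2);

    std::vector<std::string> out;
    if (path.empty())
        return out; // "/" alone or "" yields no segments

    // Split what remains on '/'; empty segments are kept.
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t pos = path.find('/', start);
        if (pos == std::string::npos)
        {
            out.push_back(path.substr(start));
            break;
        }
        out.push_back(path.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// The cases from this message, written as checks.
void visible_segments_examples()
{
    using seg = std::vector<std::string>;
    assert(visible_segments("/").empty());
    assert((visible_segments("/path/to/file.txt") == seg{"path", "to", "file.txt"}));
    assert((visible_segments("./my:file.txt") == seg{"my:file.txt"}));
    assert((visible_segments("/././/") == seg{".", "", ""}));
}
```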