Peter Dimov wrote:
On Sun, Oct 17, 2021 at 8:56 PM Gavin Lambert via Boost
It's worthwhile considering these things from the start, as they can inform design of your baseline (such as compatibility of path segment iteration).
Segment iteration is not going to be compatible. In addition to adding an initial "/" segment for absolute paths, Filesystem also collapses consecutive / separators. So iterating "/foo//bar//baz///" produces
"/", "foo", "bar", "baz", ""
(https://godbolt.org/z/EsjKzc5f1)
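For readers without the link handy, here is a minimal sketch of that iteration. It uses std::filesystem as a stand-in for "Filesystem" (an assumption; the message may be referring to Boost.Filesystem), and fs_elements is a hypothetical helper name. The result shown in the comment is what a POSIX implementation produces.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: collect the elements that iterating a
// std::filesystem::path yields, as plain strings.
std::vector<std::string> fs_elements(const std::string& p)
{
    std::vector<std::string> out;
    for (const auto& element : std::filesystem::path(p))
        out.push_back(element.generic_string());
    return out;
}

// On a POSIX implementation, fs_elements("/foo//bar//baz///") yields
// "/", "foo", "bar", "baz", "": a root element first, repeated
// separators collapsed, and one empty element for the trailing slashes.
```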
A design goal of URL seems to be that the information that the accessors give accurately reflects the contents of the string (and that there's no hidden metadata that the string doesn't reflect.)
So the segments of the above path are
{ "foo", "", "bar", "", "baz", "", "", "" }
because otherwise the segments of the above and "/foo/bar/baz/" will be the same, which means that it won't be possible to reconstruct the string from the information the URL accessors give.
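The round-trip argument can be sketched as follows. This is not Boost.URL code: split_segments and join_segments are hypothetical helpers, and the sketch assumes that absoluteness is tracked outside the segment list, as a separate flag.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split a path on every '/', keeping empty segments.
// The leading '/' of an absolute path is dropped here because the sketch
// assumes absoluteness is stored separately from the segments.
std::vector<std::string> split_segments(std::string_view path)
{
    std::vector<std::string> out;
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);
    if (path.empty())
        return out;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = path.find('/', start);
        if (pos == std::string_view::npos) {
            out.emplace_back(path.substr(start));   // final segment, may be empty
            break;
        }
        out.emplace_back(path.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}

// Hypothetical helper: rebuild the path string from the segments plus the
// separately stored "absolute" flag.
std::string join_segments(const std::vector<std::string>& segs, bool absolute)
{
    std::string s;
    if (absolute)
        s += '/';
    for (std::size_t i = 0; i < segs.size(); ++i) {
        if (i != 0)
            s += '/';
        s += segs[i];
    }
    return s;
}
```

With these helpers, split_segments("/foo//bar//baz///") gives the eight segments listed above, and joining them with absolute = true gives back the original string. A scheme that collapsed separators would map "/foo//bar//baz///" and "/foo/bar/baz/" to the same segment list, so no join function could tell them apart.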
Right. But why has it chosen that goal, rather than the alternatives? What's the rationale?

It seems to me that a URL with redundant /s (e.g. http://foo.com/path/////to/file) is either (a) malicious or erroneous input, or (b) equivalent to the versions without the redundant /s. So a user might want to (a) get an exception or error, or (b) ignore the redundant segments. Under what circumstances would a user want to see the empty segments between those /s?

Here's an alternative:

- Skip over duplicate adjacent '/' when iterating segments.
- Return "/" as the first segment for absolute paths.
- Return "" as the last segment for paths with a trailing "/".
- Give p.push_back(s) a precondition that s must not be empty if p.back() is empty.

I think this gives pretty sane behaviour. The invariant that push_back()ing a series of segments and then iterating returns the same strings holds.

Vinnie Falco wrote:
note that the "absoluteness" of the path is a property of the URL which is reflected in the url API and not the segments:
You're saying this because that's what the BNF says. Your URL api doesn't have to exactly mirror the BNF. If it would make sense for "absoluteness" to be a property of the path rather than the URL from the point of view of the library user, you can do that.

Two other things to consider:

- What is your operator== going to do about redundant /s?
- How does this all work with data: and mailto: URLs?

Regards, Phil.