Re: [boost] Boost.URL: Something Wicked This Way Comes!

19 Oct 2021

Peter Dimov wrote:
...
...
On Sun, Oct 17, 2021 at 8:56 PM Gavin Lambert via Boost
...
It's worthwhile considering these things from the start, as they can
inform design of your baseline (such as compatibility of path segment
iteration).
Segment iteration is not going to be compatible. In addition to adding
an initial "/" segment for absolute paths, Filesystem also collapses
consecutive / separators. So iterating "/foo//bar//baz///" produces
"/" → "foo" → "bar" → "baz" → ""
(https://godbolt.org/z/EsjKzc5f1)
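For anyone who wants to reproduce the Filesystem behaviour quoted above locally, here is a minimal sketch, assuming a C++17 compiler. fs_elements and check_fs_elements are names made up for this illustration; only std::filesystem::path and its iterator come from the standard library.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: collect the elements that iterating a
// std::filesystem::path yields. generic_string() is used so that
// separators are reported as '/'.
std::vector<std::string> fs_elements(const std::string& text)
{
    std::vector<std::string> out;
    for (const auto& part : std::filesystem::path(text))
        out.push_back(part.generic_string());
    return out;
}

// Checks the sequence quoted in the message above: a root element,
// the three names, and one empty element for the trailing separators.
void check_fs_elements()
{
    const std::vector<std::string> expected{"/", "foo", "bar", "baz", ""};
    assert(fs_elements("/foo//bar//baz///") == expected);
}
```

The empty segments between the doubled separators do not appear anywhere in the result, which is the incompatibility being described.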
A design goal of URL seems to be that the information that the accessors
give accurately reflects the contents of the string (and that there's no
hidden metadata that the string doesn't reflect.)
So the segments of the above path are
{ "foo", "", "bar", "", "baz", "", "", "" }
because otherwise the segments of the above and "/foo/bar/baz/" will
be the same, which means that it won't be possible to reconstruct the string
from the information the URL accessors give.
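The round-trip argument can be sketched as follows. This is not Boost.URL code: split_segments and join_segments are hypothetical helpers, and passing "absolute" as a separate flag is an assumption based on the remark in this thread that absoluteness is reported by the URL rather than by the segments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split on every '/', keeping empty segments.
// A leading '/' only marks the path as absolute; it is not a segment.
std::vector<std::string> split_segments(const std::string& path)
{
    std::vector<std::string> segs;
    std::size_t pos = (!path.empty() && path[0] == '/') ? 1 : 0;
    if (pos == path.size())
        return segs; // "" and "/" are treated as having no segments (assumption)
    for (;;) {
        const std::size_t next = path.find('/', pos);
        if (next == std::string::npos) {
            segs.push_back(path.substr(pos));
            return segs;
        }
        segs.push_back(path.substr(pos, next - pos));
        pos = next + 1;
    }
}

// Hypothetical helper: rebuild the path text from segments plus the flag.
std::string join_segments(const std::vector<std::string>& segs, bool absolute)
{
    std::string out = absolute ? "/" : "";
    for (std::size_t i = 0; i < segs.size(); ++i) {
        if (i != 0)
            out += '/';
        out += segs[i];
    }
    return out;
}

// The empty segments are what make the original text recoverable.
void check_round_trip()
{
    const std::string text = "/foo//bar//baz///";
    const auto segs = split_segments(text);
    assert(segs.size() == 8);
    assert(join_segments(segs, true) == text);
    assert(split_segments("/foo/bar/baz/") != segs);
}
```

With the empty segments dropped, both inputs would yield the same list and the join could only ever produce one of the two strings.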
Right. But why has it chosen that goal, rather than the alternatives?
What's the rationale?

It seems to me that a URL with redundant /s (e.g. http://foo.com/path/////to/file)
is either (a) malicious or erroneous input, or (b) equivalent to the
versions without the redundant /s. So a user might want to (a) get an
exception or error, or (b) ignore the redundant segments. Under what
circumstances would a user want to see the empty segments between those
/s?

Here's an alternative:

- Skip over duplicate adjacent '/' when iterating segments.

- Return "/" as the first segment for absolute paths.

- Return "" as the last segment for paths with a trailing "/".

- Give p.push_back(s) a precondition that s must not be empty if
p.back() is empty.

I think this gives pretty sane behaviour. The invariant that push_back()ing
a series of segments and then iterating returns the same strings holds.
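A rough sketch of the iteration rules and the push_back precondition listed above, in case it helps the discussion. The names collapsed_segments and push_back_segment are invented for this example and are not part of any library; reporting a bare "/" as a single element with no trailing "" is my assumption, since the list does not cover that case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: iterate a path using the alternative rules,
// returning the segments as a vector.
std::vector<std::string> collapsed_segments(const std::string& path)
{
    std::vector<std::string> segs;
    if (path.empty())
        return segs;
    if (path.front() == '/')
        segs.push_back("/"); // "/" as the first segment for absolute paths
    bool saw_name = false;
    std::size_t pos = 0;
    for (;;) {
        // Skip over any run of adjacent '/' characters.
        const std::size_t start = path.find_first_not_of('/', pos);
        if (start == std::string::npos)
            break;
        std::size_t end = path.find('/', start);
        if (end == std::string::npos)
            end = path.size();
        segs.push_back(path.substr(start, end - start));
        saw_name = true;
        pos = end;
    }
    // "" as the last segment for paths with a trailing "/".
    if (saw_name && path.back() == '/')
        segs.push_back("");
    return segs;
}

// Hypothetical helper: push_back with the proposed precondition that
// s must not be empty if p.back() is empty.
void push_back_segment(std::vector<std::string>& p, const std::string& s)
{
    assert(p.empty() || !p.back().empty() || !s.empty());
    p.push_back(s);
}
```

Under these rules "/foo//bar//baz///" and "/foo/bar/baz/" iterate identically, which is the intended effect of the proposal.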

Vinnie Falco wrote:
...
note that the "absoluteness" of the path is a property of the
URL which is reflected in the url API and not the segments:
You're saying this because that's what the BNF says. Your URL API
doesn't have to exactly mirror the BNF. If it would make sense for
"absoluteness" to be a property of the path rather than the URL from
the point of view of the library user, you can do that.

Two other things to consider:

- What is your operator== going to do about redundant /s?
- How does this all work with data: and mailto: URLs?

Regards, Phil.

Phil Endecott