[filesystem] Request for comments on proposed relative() function
There are two open tickets requesting a relative() function, and also a National Body (i.e., official) comment against the Filesystem TS (which is due to finalize at the June C++ committee meeting). The committee's Library Working Group has indicated they would like to add such a function. With help from Jamie Allsop, I've put together a proposal. See attached for docs. See https://github.com/boostorg/filesystem/tree/feature/relative for a branch of filesystem with the implementation and some relative() tests added. Comments appreciated. --Beman
-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Beman Dawes
Sent: 08 May 2014 16:58
To: Boost Developers List
Subject: [boost] [filesystem] Request for comments on proposed relative() function
There are two open tickets requesting a relative() function, and also a National Body (i.e., official) comment against the Filesystem TS (which is due to finalize at the June C++ committee meeting). The committee's Library Working Group has indicated they would like to add such a function.
With help from Jamie Allsop, I've put together a proposal. See attached for docs
I couldn't read the attached docs :-(

I'm sure relative() is useful.

Paul
---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 01539 561830
On Thu, May 8, 2014 at 1:13 PM, Paul A. Bristow
I couldn't read the attached docs :-(
Hum... Not sure what the problem is, but here they are inline:

8.6.3 path relative function [path.relative]

path relative(const path& p, const path& base);

Creates a path from the trailing elements of p that are relative to base.

*Effects:* If the number of elements in [p.begin(), p.end()) is less than or equal to the number of elements in [base.begin(), base.end()), or if any element in [base.begin(), base.end()) is not equal to the corresponding element in [base.begin(), base.end()), throw an exception of type filesystem_error.

*Remarks:* Equality or inequality are determined by path::operator== or path::operator!= respectively.

*Returns:* An object of class path containing the first element of p that does not have a corresponding element in base, followed by the subsequent elements of p appended as if by path::operator/=.

*Throws:* filesystem_error.

[*Note:* The behavior of relative is determined by the lexical value of the elements of p and base; the external file system is not accessed. The case where an element of base is not equal to the corresponding element of p is treated as an error to avoid returning an incorrect result in the event of symlinks. *--end note*]

*A possible implementation would be:*

    path relative(const path& p, const path& base)
    {
        auto mm = std::mismatch(p.begin(), p.end(), base.begin(), base.end());
        if (mm.first == p.end() || mm.second != base.end()) {
            throw filesystem_error(
                "p does not begin with base, so can not be made relative to base",
                p, base, error_code(errc::invalid_argument, generic_category()));
        }
        path tmp(*mm.first++);
        for (; mm.first != p.end(); ++mm.first)
            tmp /= *mm.first;
        return tmp;
    }
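A minimal, self-contained sketch of the semantics described above, assuming C++17's std::filesystem::path as a stand-in for boost::filesystem::path (both iterate a path element by element and compare elements with operator==). The name prefix_relative is hypothetical; it keeps the proposal's behaviour of throwing filesystem_error unless base is a proper leading prefix of p.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical name. Mirrors the proposed relative(): base must be a proper
// leading prefix of p, compared element by element with path::operator==.
fs::path prefix_relative(const fs::path& p, const fs::path& base)
{
    auto mm = std::mismatch(p.begin(), p.end(), base.begin(), base.end());
    // Fail if p has no element beyond the common prefix, or if base still
    // has elements left over (base is not a prefix of p).
    if (mm.first == p.end() || mm.second != base.end()) {
        throw fs::filesystem_error(
            "p does not begin with base, so cannot be made relative to base",
            p, base, std::make_error_code(std::errc::invalid_argument));
    }
    // Rebuild the trailing elements of p with operator/=.
    fs::path tmp = *mm.first;
    for (++mm.first; mm.first != p.end(); ++mm.first)
        tmp /= *mm.first;
    return tmp;
}
```

With this behaviour, prefix_relative("/abc/def", "/abc") yields "def", while prefix_relative("/abc", "/abc/def") throws, which is the case debated later in the thread.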
On 05/08/2014 07:19 PM, Beman Dawes wrote:
Creates a path from the trailing elements of p that are relative to base.
It may be more user-friendly to add that base must be a prefix of p.
element in [base.begin(), base.end()) is not equal to the corresponding element in [base.begin(), base.end()), throw an exception of type
Shouldn't one of these ranges be about p?
On Thu, May 8, 2014 at 3:02 PM, Bjorn Reese
On 05/08/2014 07:19 PM, Beman Dawes wrote:
Creates a path from the trailing elements of p that are relative to base.
It may be more user-friendly to add that base must be a prefix of p.
Done.
element in [base.begin(), base.end()) is not equal to the corresponding element in [base.begin(), base.end()), throw an exception of type
Shouldn't one of these ranges be about p?
Nice catch! The second range should be on p. --Beman
2014-05-08 21:02 GMT+02:00 Bjorn Reese
On 05/08/2014 07:19 PM, Beman Dawes wrote:
Creates a path from the trailing elements of p that are relative to base.
It may be more user-friendly to add that base must be a prefix of p.
It might be even more user-friendly to allow this use case: BOOST_TEST(fs::relative("/abc", "/abc/def") == path("../def")); -- Daniel
On May 9, 2014 4:23:00 PM EDT, Daniel Pfeifer
It might be even more user-friendly to allow this use case:
BOOST_TEST(fs::relative("/abc", "/abc/def") == path("../def"));
The first element in the range p, that's not in base, is the empty set, so p cannot be made relative to base in your example. If you reverse the arguments, then the result would be "def". ___ Rob (Sent from my portable computation engine)
On Wed, May 14, 2014 at 3:02 AM, Rob Stewart
It might be even more user-friendly to allow this use case:
BOOST_TEST(fs::relative("/abc", "/abc/def") == path("../def"));
The first element in the range p, that's not in base, is the empty set, so p cannot be made relative to base in your example. If you reverse the arguments, then the result would be "def".
+1 --Beman
2014-05-14 11:02 GMT+02:00 Rob Stewart
It might be even more user-friendly to allow this use case:
BOOST_TEST(fs::relative("/abc", "/abc/def") == path("../def"));
The first element in the range p, that's not in base, is the empty set, so p cannot be made relative to base in your example. If you reverse the arguments, then the result would be "def".
My bad. I intended to write:

BOOST_TEST(fs::relative("/abc", "/abc/def") == path(".."));
BOOST_TEST(fs::relative("/abc/def", "/abc/ghi") == path("../def"));

Here again, the first element in the range p, that's not in base, is the empty set. So are you saying that this is not possible? I see that the proposed implementation of fs::relative does not support it, but I am very interested in this functionality. -- Daniel
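One way to get Daniel's results is a purely lexical variant that emits ".." for every element of base left over after the common prefix. This is a sketch under stated assumptions, not the proposal's algorithm: dotdot_relative is a hypothetical name, C++17 std::filesystem stands in for Boost.Filesystem, and both arguments are assumed to be normalized (no ".", "..", or trailing separator) and rooted the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical dotdot_relative(): purely lexical. Strips the common leading
// elements, emits ".." for each element of base left over, then appends what
// is left of p. Assumes normalized paths sharing the same root; it does not
// check either assumption.
fs::path dotdot_relative(const fs::path& p, const fs::path& base)
{
    auto [pi, bi] = std::mismatch(p.begin(), p.end(), base.begin(), base.end());
    fs::path result;
    for (; bi != base.end(); ++bi)
        result /= "..";          // climb out of each unmatched base element
    for (; pi != p.end(); ++pi)
        result /= *pi;           // then descend into the rest of p
    return result.empty() ? fs::path(".") : result;
}
```

For comparison, C++17's std::filesystem::path::lexically_relative() gives the same results for Daniel's two examples, and additionally accounts for "." and ".." elements in base and returns an empty path when one argument is absolute and the other is not.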
On May 15, 2014 7:50:46 AM EDT, Daniel Pfeifer
BOOST_TEST(fs::relative("/abc", "/abc/def") == path(".."));
BOOST_TEST(fs::relative("/abc/def", "/abc/ghi") == path("../def"));
Here again, the first element in the range p, that's not in base, is the empty set. So you say that this is not possible? I see that the proposed implementation of fs::relative does not support it, but I am very interested in this functionality.
OK ___ Rob (Sent from my portable computation engine)
On 9/05/2014 05:19, quoth Beman Dawes:
*Returns:* An object of class path containing the first element of p that does not have a corresponding element in base, followed by the subsequent elements of p appended as if by path::operator/=.
Any chance of including in the docs some example inputs and outputs, or test cases? I'm having trouble synchronising what I think this method ought to be doing with what my understanding of this description suggests. (Actually I have similar issues with most of the Boost.Filesystem documentation -- possibly I just don't properly grok standardese.) Possibly I'm just interpreting it incorrectly, or I'm making an incorrect assumption about the internals of path, but this description does not sound correct.
    if (mm.first == p.end() || mm.second != base.end()) {
        throw filesystem_error(
            "p does not begin with base, so can not be made relative to base",
            p, base, error_code(errc::invalid_argument, generic_category()));
    }
In the event that the provided path cannot be made relative to base, isn't it more generically useful to return the original unmodified absolute path? (i.e. simply returning p instead of throwing, at least when both paths are absolute.) I'm assuming that the intended use case of this is to "minimise" a path given a known working directory, and unrelated absolute paths are already in their minimal form in that context.

Or another possibly useful output (as Daniel hinted at, though I don't think he got the case right) would be to return a relative path using dotdot syntax, so:

BOOST_TEST(fs::relative("/abc", "/abc/def") == path(".."));
BOOST_TEST(fs::relative("/ghi", "/abc/def") == path("../../ghi"));

This could be more useful in some cases (it bloats the path but makes it more immune to being moved elsewhere). Maybe we even need both.

Although that brings up another question (which I'm not really familiar enough with the "path" class internals to answer by looking at the example implementation): is "base" intended to be assumed as a directory name (which is how most filesystem "make relative" functions typically work) or as a file name (which is how URL "make relative" works)? My examples above assume the former, which seems consistent provided that current_path() never returns a path with a trailing directory separator (unless the current path is the root).

On a peripherally related note, I find the following behaviour (on Windows) surprising:

fs::path("C:\\foo") / fs::path("C:\\bar") == fs::path("C:\\foo\\C:\\bar")

Shouldn't appending a root-path discard the prior path, like how fs::absolute() works? (I assume this was intentional to simplify the implementation, but I was hoping there would be something analogous to Path.Combine from .NET, which can also return relative paths.)
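Gavin's first alternative, returning p unchanged rather than throwing when base is not a leading prefix of p, could look like the sketch below. relative_or_self is a hypothetical name; it leans on C++17's std::filesystem::path::lexically_relative() and treats a result that is empty or starts with ".." as "not under base". It assumes normalized paths without trailing separators.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical relative_or_self(): if base is a leading prefix of p, return
// the rest of p; otherwise return p unchanged instead of throwing.
fs::path relative_or_self(const fs::path& p, const fs::path& base)
{
    fs::path r = p.lexically_relative(base);
    // Empty means no lexical relationship (e.g. one path absolute, the other
    // relative); a leading ".." means base is not a prefix of p.
    if (r.empty() || *r.begin() == "..")
        return p;
    return r;
}
```

One consequence of this design: the caller can no longer tell from the return value alone whether p was actually made relative, which is part of what the exception in the proposal signals.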
On May 13, 2014 10:00:25 PM EDT, Gavin Lambert
On 9/05/2014 05:19, quoth Beman Dawes:
*Returns:* An object of class path containing the first element of p that does not have a corresponding element in base, followed by the subsequent elements of p appended as if by path::operator/=.
Any chance of including in the docs some example inputs and outputs, or test cases? I'm having trouble synchronising what I think this method ought to be doing with what my understanding of this description suggests. (Actually I have similar issues with most of the Boost.Filesystem documentation -- possibly I just don't properly grok standardese.)
This is a common complaint. Beman always looked toward standardization, so he wrote his docs with that in mind. The problem is that not everyone groks that format.
Possibly I'm just interpreting it incorrectly, or I'm making an incorrect assumption about the internals of path, but this description does not sound correct.
    if (mm.first == p.end() || mm.second != base.end()) {
        throw filesystem_error(
            "p does not begin with base, so can not be made relative to base",
            p, base, error_code(errc::invalid_argument, generic_category()));
    }
In the event that the provided path cannot be made relative to base, isn't it more generically useful to return the original unmodified absolute path? (i.e. simply returning p instead of throwing, at least when both paths are absolute.)
This gets you into the discussion of error codes versus exceptions as a means to report errors.
I'm assuming that the intended use case of this is to "minimise" a path given a known working directory, and unrelated absolute paths are already in their minimal form in that context.
That's a reasonable use case, but this function is only about returning the portion of p that is relative to base.
Or another possibly useful output (as Daniel hinted at, though I don't think he got the case right) would be to return a relative path using dotdot syntax, so:
BOOST_TEST(fs::relative("/abc", "/abc/def") == path(".."));
BOOST_TEST(fs::relative("/ghi", "/abc/def") == path("../../ghi"));
This could be more useful in some cases (it bloats the path but makes it more immune to being moved elsewhere). Maybe we even need both.
Interesting
Although that brings up another question (which I'm not really familiar enough with the "path" class internals to answer by looking at the example implementation): is "base" intended to be assumed as a directory name (which is how most filesystem "make relative" functions typically work) or as a file name (which is how URL "make relative" works)?
path makes no assumptions about what it references, or whether the pathname even exists in the filesystem.
On a peripherally related note, I find the following behaviour (on Windows) surprising:
fs::path("C:\\foo") / fs::path("C:\\bar") == fs::path("C:\\foo\\C:\\bar")
Shouldn't appending a root-path discard the prior path, like how fs::absolute() works? (I assume this was intentional to simplify the implementation, but I was hoping there would be something analogous to Path.Combine from .NET, which can also return relative paths.)
Windows makes a mess of these things, but I agree that would be nice. ___ Rob (Sent from my portable computation engine)
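For the Path.Combine question, a hypothetical combine() helper that lets an absolute right-hand side replace the left-hand side can be sketched with C++17 std::filesystem. In that library, operator/ already behaves this way for POSIX absolute paths, so the sketch mostly documents the rule. The Windows cases (drive-relative "C:bar", rooted "\\bar") follow more involved rules and are not exercised here; the checks assume a POSIX system.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical Path.Combine-style helper: an absolute rhs wins outright;
// otherwise rhs is appended to lhs.
fs::path combine(const fs::path& lhs, const fs::path& rhs)
{
    return rhs.is_absolute() ? rhs : lhs / rhs;
}
```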
On Wed, May 14, 2014 at 3:12 AM, Rob Stewart
On May 13, 2014 10:00:25 PM EDT, Gavin Lambert
wrote:...
I'm assuming that the intended use case of this is to "minimise" a path given a known working directory, and unrelated absolute paths are already in their minimal form in that context.
That's a reasonable use case, but this function is only about returning the portion of p that is relative to base.
Or another possibly useful output (as Daniel hinted at, though I don't think he got the case right) would be to return a relative path using dotdot syntax, so:
BOOST_TEST(fs::relative("/abc", "/abc/def") == path(".."));
BOOST_TEST(fs::relative("/ghi", "/abc/def") == path("../../ghi"));
This could be more useful in some cases (it bloats the path but makes it more immune to being moved elsewhere). Maybe we even need both.
Interesting
The function has now been renamed from relative() to lexically_relative() to make clearer that it deals with paths at a purely lexical level.

I probably should have mentioned that work is underway on several additional functions to "do-the-right-thing" when the paths exist or partially exist in the external file system.

One possibility is to add a semi_canonical() function which behaves like canonical() for an existing portion of a path, and then normalizes any trailing non-existent portion. This would allow an additional function that does the right thing for existing or partially existing paths. It might be implemented like this:

    path relative(const path& p, const path& base)
    {
        return lexically_relative(semi_canonical(p), semi_canonical(base));
    }

It isn't clear yet if these semantics are really the most useful.
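The composition above can be tried out with C++17's std::filesystem, where weakly_canonical() plays the role described for semi_canonical(): canonical for the leading portion that exists, lexically normalized for the rest. relative_sketch is a hypothetical name, and the checks assume that no directory named /no_such_dir_xyz exists.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical relative_sketch(): weakly_canonical() stands in for the
// proposed semi_canonical(), then the lexical step does the rest.
fs::path relative_sketch(const fs::path& p, const fs::path& base)
{
    return fs::weakly_canonical(p).lexically_relative(fs::weakly_canonical(base));
}
```

In C++17, std::filesystem::relative(p, base) is specified with this same composition, which the last check compares against.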
Although that brings up another question (which I'm not really familiar enough with the "path" class internals to answer by looking at the example implementation): is "base" intended to be assumed as a directory name (which is how most filesystem "make relative" functions typically work) or as a file name (which is how URL "make relative" works)?
path makes no assumptions about what it references, or whether the pathname even exists in the filesystem.
+1 Thanks to both Gavin and Rob for their comments, --Beman
On May 14, 2014 1:39:35 PM EDT, Beman Dawes
The function has now been renamed from relative() to lexically_relative() to make clearer that it deals with paths at a purely lexical level.
I probably should have mentioned that work is underway on several additional functions to "do-the-right-thing" when the paths exist or partially exist in the external file system.
One possibility is to add a semi_canonical() function which behaves like canonical() for an existing portion of a path, and then normalizes any trailing non-existent portion.
This would allow an additional function that does the right thing for existing or partially existing paths. It might be implemented like this:
    path relative(const path& p, const path& base)
    {
        return lexically_relative(semi_canonical(p), semi_canonical(base));
    }
It isn't clear yet if these semantics are really the most useful.
I haven't looked closely enough, but I wonder if things need to be segregated better such that lexical functions and classes are distinguished from the rest. You could use separate namespaces, for example. ___ Rob (Sent from my portable computation engine)
On 14/05/2014 21:12, quoth Rob Stewart:
In the event that the provided path cannot be made relative to base, isn't it more generically useful to return the original unmodified absolute path? (i.e. simply returning p instead of throwing, at least when both paths are absolute.)
This gets you into the discussion of error codes versus exceptions as a means to report errors.
Not really; I was asking if this case should be considered as an error at all. I can see a fairly likely use case of "translate this absolute path to one that is relative to my current working dir", for which sometimes a perfectly acceptable answer is "the original absolute path".
Although that brings up another question (which I'm not really familiar enough with the "path" class internals to answer by looking at the example implementation): is "base" intended to be assumed as a directory name (which is how most filesystem "make relative" functions typically work) or as a file name (which is how URL "make relative" works)?
path makes no assumptions about what it references, or whether the pathname even exists in the filesystem.
I'm aware of that; my point was that the semantics of the "make a relative path" operation typically differ between filesystems and URLs. For example (imagining some hypothetical functions):

make_relative_uri("http://foo.com/baz/bar", "http://foo.com/baz") == "baz/bar"
make_relative_uri("http://foo.com/baz/bar", "http://foo.com/baz/") == "bar"
make_relative_path("/baz/bar", "/baz") == "bar"
make_relative_path("/baz/bar", "/baz/") == "bar"

Essentially, URLs always assume that any trailing component that does not end in a slash is a filename, and make paths relative to the containing directory of that file. Filesystems, by contrast, generally assume that the base is a directory name regardless of whether it has a trailing slash.
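Gavin's distinction can be made concrete on the path part alone. The sketch below assumes C++17 std::filesystem semantics, where "/baz/" has an empty filename and parent_path() of "/baz/" is "/baz". remainder_after, relative_dir_style and relative_url_style are hypothetical names, and only the prefix case from his examples is handled (no ".." generation).

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Elements of p after the leading elements of dir, or an empty path if dir
// is not a leading prefix of p.
fs::path remainder_after(const fs::path& p, const fs::path& dir)
{
    auto [pi, di] = std::mismatch(p.begin(), p.end(), dir.begin(), dir.end());
    if (di != dir.end())
        return {};
    fs::path out;
    for (; pi != p.end(); ++pi)
        out /= *pi;
    return out;
}

// Filesystem style: base names a directory with or without a trailing
// separator ("/baz/" has no filename; its parent_path() is "/baz").
fs::path relative_dir_style(const fs::path& p, const fs::path& base)
{
    return remainder_after(p, base.has_filename() ? base : base.parent_path());
}

// URL style: without a trailing separator the last element names a file, so
// work from its parent; with one, parent_path() just drops the empty element.
fs::path relative_url_style(const fs::path& p, const fs::path& base)
{
    return remainder_after(p, base.parent_path());
}
```

The only difference between the two styles is what happens to a base whose last element has no trailing separator: kept as the directory, or dropped as a file name.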
On 2014-05-15 02:02, Gavin Lambert wrote:
...
I'm aware of that; my point was that the semantics of the "make a relative path" operation typically differ between filesystems and URLs.
Dear all,

One problem with Boost.Filesystem is the lack of a theoretical foundation. Hey, don't run away. :)

Note: I have not checked the formalities thoroughly, so parts of the text might be erroneous or incomplete.
Note: I type slashes everywhere because it is easier. Some should be backslashes on Windows.

What are paths?
===============

Surprisingly, this is already a controversial question. From my experience, I can classify people's attitudes into three categories.

"Paths are names of resources, cookies for the user code."
----------------------------------------------------------

This is a justified approach in simple cases like:

    #include <fstream>
    using namespace std;

    int main(int argc, char *argv[])
    {
        ifstream in(argv[1]);
        ofstream out(argv[2]);
        out << in.rdbuf();
    }

But obviously there is more to paths than that!

"Paths are strings. Doing simple string operations won't do anybody harm."
--------------------------------------------------------------------------

This is a common approach in scripting languages, for example, and there is plenty of C++ code that does the same thing. This approach assumes a specific connection between the syntax and semantics of paths, however. The most common path operation, namely concatenation, can easily be defined consistently by applying some discipline (at least on Unix and Windows). Specifically, if all directories end with a slash, concatenating any non-file path x with a relative path y can be done simply by x + y (std::string::operator+). The problem with this approach is that it does not capture the meaning behind the defined operations, which makes them hard to define well.

"Paths are sequences of instructions to locate some resources."
---------------------------------------------------------------

After all, this is what they are for the OS. It breaks them into a sequence of 'path elements' (from here on simply 'elements') and, depending on each of them, performs some state transition on the internal path resolution state, getting to the desired filesystem node.
This definition is not limited to filesystem paths; it is general enough to work with "step left, 3 steps back, dig the treasure", or "England, London, Baker St. 4". We will get to it later.

Definitions
===========

Notation:
* f, g, h: filesystem nodes
* a, b, c: elements
* x, y, z: paths

From here on, if we quantify over "all filesystem nodes f", the intention is to range over all conceivable nodes in all possible machine states, not only over the concrete filesystem the program is currently running on.

Definition: a path is a finite sequence of elements, denoted by x = a0 / ... / an. (Here '/' is only an algebraic notation.)

With filesystem paths, the elements are instructions for how to traverse the graph of filesystem nodes. Kinds of elements include:
* On Unix: root "/", regular "name", current ".", parent "..".
* On Windows: drive "c:", root "/", regular "name", current "." and parent "..".

The action of an element on a node, denoted f / a, returns a new filesystem node as if by a call to openat() on Unix. On Windows there is no direct analogue of openat(), so the semantics are somewhat similar to both NtCreateFile() and SetCurrentDirectory().

The action of a path on a node is defined by f / (a0 / ... / an) = (...(f / a0) / ... ) / an. (openat() is consistent with this definition, in the sense that you get the same result whether you pass any string representation (see below) of (a0 / ... / an) in one call, or invoke it repeatedly for each element.)

Naturally, concatenation of paths x / y is simply concatenation of the said sequences. The identity f / (x / y) = (f / x) / y holds by definition.

So far I have given an abstract definition of paths *without saying how they are represented syntactically*. This point is important because we cannot map every sequence of elements to a string in the OS-specific syntax. For example, there is no encoding of the sequence "a" / "c:" on Windows.
However:

Definition: x is equivalent to y iff for all f, if g = f / x and h = f / y are both defined, then g = h (i.e. they yield the same node).

Definition: str(x) is the canonical string representation of the path x. path(s) is the path resulting from parsing the string s.

I leave str() and path() implementation-defined, subject to the following requirements:
* path(str(x)) is equivalent to x
* if x and y are equivalent, then str(x) = str(y)

In practice there need not be path() and str() functions. In memory, paths are already represented as strings. All operations, although formally defined on abstract paths, can be implemented to work directly on strings. Therefore, where it is obvious from the context, we can identify paths x with their string representations str(x). For all operations except canonical(s) = str(path(s)), we require that if the operands are already in canonical string representation, then the result is also in canonical string representation. This lets the implementation short-circuit when appropriate.

Operations (examples are for Windows, because it is less trivial than Unix):
* concatenation -- already defined:
  * "a" / "b" = "a/b"
  * "c:" / "d" = "c:d"
  * "/a/b" / "c:" = "c:"
  * "a" / "../b" = "a/../b"
  * "d:/a" / "/b" = "d:/b"
* is_absolute(x) iff there exists f such that for all g, g / x = f.
  * is_absolute("d:/xyz") = true
  * is_absolute("/abc") = false
  * is_absolute("c:") = true or false, depending on whether we include the environment states in the filesystem node quantification.
* relative(x, y) = z s.t. y / z is equivalent to x. (Such a z does not always exist.)
  * relative("a/b", "a") = "b"
  * relative("c:/a/b", "c:") = "/a/b"
  * relative("c:/a/b", "c:/") = "c:/"
  * relative("c:a/b", "c:/") = NaP
  * relative(x, "") = x

Eval-equivalence
----------------

Definition: x is eval-equivalent to y iff for all f, if g = f / x and h = f / y are both defined, and none involve symbolic link resolution, then g = h.
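The eval-equivalence definition can be spot-checked with a toy model. Everything here is a hypothetical illustration: a node is represented by its list of names from the root, there are no symbolic links (so only eval-equivalence, not equivalence, is modeled), the parent of the root is the root, concatenation x / y is modeled by joining the strings with "/", and equivalence is tested from a few sample starting nodes rather than from all f as the definition requires.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy model: a node is its list of names from the root.
// No symbolic links exist in this model.
using Node = std::vector<std::string>;

// f / a for a single element a, under eval semantics.
Node step_elem(Node f, const std::string& a)
{
    if (a == "/")
        f.clear();                      // root directory: restart at the top
    else if (a == "..") {
        if (!f.empty()) f.pop_back();   // parent; the root is its own parent
    } else if (a != "." && !a.empty())
        f.push_back(a);                 // regular name: descend
    return f;                           // "." and empty pieces change nothing
}

// f / (a0 / ... / an): fold step_elem over the elements left to right.
Node walk_path(Node f, const std::string& x)
{
    if (!x.empty() && x[0] == '/')
        f = step_elem(f, "/");
    std::istringstream in(x);
    for (std::string a; std::getline(in, a, '/');)
        f = step_elem(f, a);
    return f;
}

// Spot check of eval-equivalence from a few sample starting nodes only;
// the real definition quantifies over all nodes f.
bool eval_equivalent(const std::string& x, const std::string& y)
{
    for (const Node& f : {Node{}, Node{"p"}, Node{"p", "q", "r"}})
        if (walk_path(f, x) != walk_path(f, y))
            return false;
    return true;
}
```

With this model, eval_equivalent("a/b/./../c", "a/c") and eval_equivalent("a/../../b", "../b") both hold, matching the concatenation examples that follow, and folding element by element makes f / (x / y) = (f / x) / y hold by construction.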
The rest of the previous section is repeated, except that eval-equivalence is used.
* concatenation
  * "a/b" / "./../c" = "a/c"
  * "a" / "../../b" = "../b"
* relative(x, y)
  * relative("a/b", "a/c") = "../b"
  * relative("c:a/b", "c:/") = NaP

Note that on Windows, AFAIK (but I may be mistaken), eval-equivalence and equivalence are the same, because ".." elements are resolved syntactically by the Win32 API before any reparse points etc. kick in. One can think of the eval-parent ".." as meaning 'go back', whereas the Unix parent ".." behaves as a regular subdirectory. One may still wish to distinguish between equivalence and eval-equivalence on Windows.

Summary
-------

Lots of questions are intentionally left unanswered. The above text establishes a framework; from here we can proceed in different ways. For every concrete paths implementation we define
* possible path elements
* the effect of these elements on the filesystem nodes
* path equivalence
* a parsing function path()
* a formatting function str()

Possible variations:
* Define that x is equivalent to y as before, with an added stipulation that the current "." elements are assumed to be regular, so that "." / "." = "./." rather than ".".
* Split regular elements into filenames and directories. When parsing "abc/", path() will return a single directory element; when parsing "abc" it will return a single filename element. This is similar to the way URLs work. Use case: "~/a.txt" / "b.txt" = "~/b.txt".

Where does Boost.Filesystem fail?
=================================

Note: I haven't been following its evolution in the last year or two, so some of this section might be outdated.

Path is a sequence of what?
---------------------------

Constructors, value_type, and append() all suggest that path is a sequence of chars/wchars. begin()/end()/iterator, on the other hand, suggest that path is a sequence of... paths?
O.o Let's leave aside the fact that using the second as a definition doesn't work (it's like saying that a string is a sequence of strings, though some languages do exactly this). The bigger problem is that path::iterator::value_type != path::value_type. What I naturally want to do when slicing paths is to construct path(first, last), where first and last are path::iterators, but I cannot. Currently, at places where this is the intended operation, the documentation says "as if applying operator /= repeatedly". Which brings us to the following point.

operator /= is broken
---------------------

The path "c:a/b" decomposes, on Windows, into the sequence {"c:", "a", "b"}. But applying operator /= to this sequence repeatedly returns "c:/a/b". Now, according to the documentation, parent_path() of "c:a/b" should return "c:/a". But guess what: the last time I checked, it returned the desired "c:a"! Someone got confused...

absolute(p,base) illogical
--------------------------

The case when p.has_root_name() && !p.has_root_directory() makes no sense. I remember that it was once left unspecified. I don't know why it got defined now.

Explanation: the analogy of what today's implementation does is absolute("England, Baker St. 4", "France, Paris, Passage Landrieu 8") = "England, Paris, Baker St. 4". For those who don't like analogies, absolute("c:a/b") should return "c:/...current c drive directory.../a/b", not "c:/a/b" as it does today! More generally, when changing a higher-rank element, it makes no sense to keep the lower-rank elements that were specified relative to that higher-rank element!

Note that:
* For other cases, absolute() does the same thing as the abstract path concatenation defined above.
* This case is irrelevant for Unix.

Thus I propose to scrap absolute() altogether.

Bad names
---------

* parent_path() does not return the parent directory! It should be named pop_back().
* absolute() isn't needed; system_complete() should be renamed to absolute().
* canonical() -- a better name would be resolve(), since what it ultimately does is resolve symlinks. Naming it realpath() is also an option.

equivalent() could be more useful...
------------------------------------

...if it exposed an encapsulated, system-specific unique file "key". Then we could make an unordered_map of files keyed by their unique filesystem keys.

directory iterators
-------------------

I would expect a directory_iterator to return elements rather than concatenating them with the path passed in the constructor, similar to what 'ls' does. recursive_directory_iterators, on the other hand, can be modeled after 'find .' and do what they do today. Think of implementing recursive_directory_iterators with the current public interface of directory_iterator...

Path arithmetic and symlinks
============================

Path arithmetic, which includes concatenation, relative, is_absolute, etc., shall work purely syntactically. Generally, operations that work on paths shall either require that the whole path exists (or they are creating it), or ignore the filesystem completely. All or nothing. It is the user's responsibility to resolve() the path if he really wants to.

Rationale: by the definitions section, paths are abstract entities detached from the filesystem. relative(x,y) and x/y are legitimate questions on their own, even if the paths do not exist at the specific point in time and space, and even when x and y are relative to a yet-unknown base. Making them depend on the current filesystem state is error prone.

Bottom line
===========

Boost.Filesystem, as it is today, does not have a clear separation of syntax, semantics and filesystem access. Overall it looks quite messy, and when I tried to use it, it did not solve the real problems I had at hand.

RFC and thanks for your attention,
Yakov Galka
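Yakov's suggestion that equivalent() expose a unique file key can be sketched for POSIX systems, where the (st_dev, st_ino) pair reported by stat() identifies a file regardless of the path used to reach it. FileKey, FileKeyHash and file_key are hypothetical names; this is not part of Boost.Filesystem, and Windows would need a different source for the key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <sys/stat.h>
#include <unordered_map>

// Hypothetical POSIX-only file key: the device and inode numbers from stat().
struct FileKey {
    dev_t dev;
    ino_t ino;
    bool operator==(const FileKey& o) const { return dev == o.dev && ino == o.ino; }
};

// Combine the two numbers into one hash value for unordered containers.
struct FileKeyHash {
    std::size_t operator()(const FileKey& k) const
    {
        std::hash<unsigned long long> h;
        return h(static_cast<unsigned long long>(k.dev)) * 31u
             ^ h(static_cast<unsigned long long>(k.ino));
    }
};

// Fills out and returns true if path can be stat()ed; returns false otherwise
// (for example, if it does not exist). Follows symlinks, as stat() does.
bool file_key(const std::string& path, FileKey& out)
{
    struct stat st;
    if (::stat(path.c_str(), &st) != 0)
        return false;
    out = FileKey{st.st_dev, st.st_ino};
    return true;
}
```

Two different spellings of the same directory, such as "." and "./.", then map to the same key and collide in an std::unordered_map<FileKey, T, FileKeyHash>.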
On 16/05/2014 00:44, quoth Yakov Galka:
One problem with boost filesystem is the lack of theoretical foundation.
There kind of is one if you look at the Definitions section at http://www.boost.org/doc/libs/1_55_0/libs/filesystem/doc/reference.html. I'm not completely convinced that the implemented methods actually obey this though, nor that they are necessarily wrong in not doing so. :)
Operations (examples are for Windows, because it is less trivial than Unix):
* concatenation -- already defined:
  * "a" / "b" = "a/b"
  * "c:" / "d" = "c:d"
  * "/a/b" / "c:" = "c:"
  * "a" / "../b" = "a/../b"
  * "d:/a" / "/b" = "d:/b"
* is_absolute(x) iff there exists f such that for all g, g / x = f.
  * is_absolute("d:/xyz") = true
  * is_absolute("/abc") = false
  * is_absolute("c:") = true or false, depending on whether we include the environment states into the filesystem node quantification.
* relative(x, y) = z s.t. y / z is equivalent to x. (not always exists)
  * relative("a/b", "a") = "b"
  * relative("c:/a/b", "c:") = "/a/b"
  * relative("c:/a/b", "c:/") = "c:/"
  * relative("c:a/b", "c:/") = NaP
  * relative(x, "") = x
Agree with most of those, except:
relative("c:/a/b", "c:/") = "a/b"
relative("c:a/b", "d:/") = "c:a/b"
relative("c:a/b", "c:/") = "a/b"
(Case 2 because that's still a valid relative path from a d:/ base; case 3 because it's implied that c:/ is the cwd of c:, and given that, the path is equivalent to case 1. Though I accept that this third case may be borderline; alternatively it should return the same as case 2.) Also I'm dubious whether allowing the base-path of relative() to be itself relative is useful in any way. In particular the case relative(absolute-path, relative-path) seems nonsensical; at best it should probably return the unmodified absolute-path. Though I suppose for consistency with absolute/canonical, relative() could use absolute(base) as its base internally, which would produce reasonable results (albeit somewhat dependent on filesystem state -- but that's not unexpected when dealing with relative paths).
Note that on Windows, AFAIK but I may be mistaken, eval-equivalence and equivalence are the same, because ".." are resolved syntactically by Win32 API before any reparse points etc. kick-in. One can think of eval-parent ".." meaning 'go back' whereas the Unix parent ".." behaves as a regular subdirectory. One may still wish to distinguish between equivalence and eval-equivalence on Windows.
I'm not sure how the Linux filesystem in general behaves, but the shell typically appears to do the same thing -- cd into a symlinked folder followed by "cd .." gets you back to your original folder, not the parent of the symlink target.
absolute(p,base) illogical
--------------------------
The case when p.has_root_name() && !p.has_root_directory() makes no sense. I remember that it was once left unspecified. Don't know why it got defined now.
absolute("c:foo", "c:/bar") == "c:/bar/foo"
This requires that case.
* canonical() -- a better name would be resolve(), cause what it ultimately does is resolving symlinks? Naming it realpath() is also an option.
It also appears to currently be the only function that resolves ".." path fragments. I would prefer if there were a function that does this lexically, without hitting the filesystem or requiring that the path actually exist. (Perhaps system_complete()? Hard to tell from the docs, as it defines its behaviour in terms of a function that does not exist.)
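A lexical ".." resolver of the kind asked for here does not need to touch the disk. C++17's std::filesystem provides path::lexically_normal() for exactly this; the wrapper name lexical_resolve is hypothetical, and the paths below assume a POSIX build. Collapsing "x/.." lexically can disagree with the Unix kernel when x is a symlink, which is the eval-equivalence distinction raised above.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Purely lexical resolution of "." and "..": no filesystem access, so the
// path need not exist. Hypothetical wrapper around C++17's lexically_normal().
// Caveat: on Unix, "link/.." names the parent of the link's target, so a
// lexical result can differ from what open() would actually reach.
fs::path lexical_resolve(const fs::path& p) {
    return p.lexically_normal();
}
```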
Rationale: by the definitions section, paths are abstract entities detached from the filesystem. Asking for relative(x,y) or x/y is a legitimate question on its own, even if the paths do not exist at the specific point in time and space, and even when x and y are relative to a yet-unknown base. Making them depend on the current filesystem state is error prone.
I don't think anyone ever suggested relative() should require that the path exist. But I don't think that it's unreasonable for it to react to the current directory of each drive -- that's the purpose of relative paths. (And if you want it to be relative to some other folder, you should specify which one explicitly as an absolute path, and then there is no dependency on the filesystem.)
RFC and thanks for your attention, Yakov Galka
I think most of the points you brought up here aren't really on-topic in this particular thread, and would have been better made in a separate thread (or by writing your own alternative implementation). I doubt it's likely that grand sweeping changes to an existing accepted library would get anywhere. But that doesn't mean you couldn't submit an alternative intended to supersede it; that's happened in the past. But hey, I'm not a maintainer, so what do I know. :)
On Fri, May 16, 2014 at 3:53 AM, Gavin Lambert
On 16/05/2014 00:44, quoth Yakov Galka:
One problem with boost filesystem is the lack of theoretical foundation.
There kind of is one if you look at the Definitions section at http://www.boost.org/doc/libs/1_55_0/libs/filesystem/doc/reference.html. I'm not completely convinced that the implemented methods actually obey this though, nor that they are necessarily wrong in not doing so. :)
It is simply not enough. Every path-related wording is syntax-related rather than semantics-defining. It does not define what path concatenation means, thus leaving stuff like "c:" / "d:" unspecified. In the end, the wording of operator / says simply that it "adds a separator when needed". I believe that the word "separator" shall not even exist in the documentation. At best, you can use the Definitions section to work with generic paths only. There is nothing from which to infer what "[A.B]F.TXT"/"[D]" should do on OpenVMS. ...
relative("c:/a/b", "c:/") = "a/b"
Absolutely, this was a write-in-progress bug :)
relative("c:a/b", "d:/") = "c:a/b"
Yes, this is implied by my definition: "d:/" / "c:a/b" = "c:a/b"
relative("c:a/b", "c:/") = "a/b"
... Though I accept that this third case may be borderline; alternatively it should return the same as case 2.)
My point is that defining relative() in isolation leads nowhere. Your example does not satisfy "c:/" / "a/b" = "c:a/b", so it is incorrect. It should be "c:a/b" or NaP, depending on the way you define concatenation.
Also I'm dubious whether allowing the base-path of relative() to be itself relative is useful in any way. In particular the case relative(absolute-path, relative-path) seems nonsensical; at best it should probably return the unmodified absolute-path.
It is perfectly defined by my definition and returns the absolute-path.
Though I suppose for consistency with absolute/canonical, relative() could use absolute(base) as its base internally, which would produce reasonable results (albeit somewhat dependent on filesystem state -- but that's not unexpected when dealing with relative paths).
It is unexpected for me -- when I'm dealing with paths I don't want to access the filesystem at all, unless I specifically say that.
... I'm not sure how the Linux filesystem in general behaves, but the shell typically appears to do the same thing -- cd into a symlinked folder followed by "cd .." gets you back to your original folder, not the parent of the symlink target.
What cd does depends on the shell. I don't know about Linux, but on FreeBSD + sh, AFAIR, cd resolves the paths. Also if I cd to some filesystem node that gets deleted, then I cannot "cd .." because the node does not exist anymore. This is really annoying!
absolute(p,base) illogical
--------------------------
The case when p.has_root_name() && !p.has_root_directory() makes no sense. I remember that it was once left unspecified. Don't know why it got defined now.
absolute("c:foo", "c:/bar") == "c:/bar/foo"
This requires that case.
And absolute("c:foo", "d:/bar") == "c:/bar/foo"... It does not make sense. Back to the theory, you could define "c:/a" / "c:" = "c:/a" and "c:/a" / "d:" = "d:" (I think this is what SetCurrentDirectory does, but not cd), and then again, our abstract concatenation would do what absolute does, but correctly.
* canonical() -- a better name would be resolve(), cause what it ultimately does is resolving symlinks? Naming it realpath() is also an option.
... (Perhaps system_complete()? Hard to tell from the docs, as it defines its behaviour in terms of a function that does not exist.)
Yes, system_complete, on Windows, resolves ".." and "c:a" correctly. And it does not resolve symlinks, which is the correct thing on Windows.
Rationale: by the definitions section, paths are abstract entities detached from the filesystem. Asking for relative(x,y) or x/y is a legitimate question on its own, even if the paths do not exist at the specific point in time and space, and even when x and y are relative to a yet-unknown base. Making them depend on the current filesystem state is error prone.
I don't think anyone ever suggested relative() should require that the path exist. But I don't think that it's unreasonable for it to react to the current directory of each drive -- that's the purpose of relative paths. (And if you want it to be relative to some other folder, you should specify which one explicitly as an absolute path, and then there is no dependency on the filesystem.)
This all sounds good, but working with paths relative to an *unknown* base is useful. Think of some project directory that tries to be relocatable.
I think most of the points you brought up here aren't really on-topic in this particular thread, and would have been better made in a separate thread (or by writing your own alternative implementation). I doubt it's likely that grand sweeping changes to an existing accepted library would get anywhere. But that doesn't mean you couldn't submit an alternative intended to supersede it; that's happened in the past.
True, some of them are off-topic. And I do have an alternative path implementation that I'm using myself, which I might release some day. However, boost.filesystem has already undergone three major versions, and it is actively being pushed toward standardization. So fixing it might be more logical than introducing another library that fixes those concrete problems but presents an entirely different, likely controversial, approach. -- Yakov
On 16/05/2014 19:21, quoth Yakov Galka:
Though I suppose for consistency with absolute/canonical, relative() could use absolute(base) as its base internally, which would produce reasonable results (albeit somewhat dependent on filesystem state -- but that's not unexpected when dealing with relative paths).
It is unexpected for me -- when I'm dealing with paths I don't want to access the filesystem at all, unless I specifically say that.
But you *are* specifically saying that by not supplying an absolute path as your base path. Again, I don't think relative(*, some-relative-path) is a sensible operation. Besides, relative(absolute(path1, X), absolute(path2, X)) == relative(absolute(path1, Y), absolute(path2, Y)) for any X & Y when both paths are non-rooted relative; any cwd injection will cancel out. And finally, the working dir of each drive is not actually filesystem state. It's environment state, separately owned by each process (albeit inherited from its parent). So you're not actually accessing the filesystem when you get the CWD.
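The cancellation argument can be illustrated with C++17's std::filesystem, using path::lexically_relative() for relative() and a lexical join-and-normalize as a stand-in for the two-argument absolute(p, base) (std::filesystem::absolute() itself takes no base argument). absolute_from and relative_to are hypothetical names, and the sketch assumes POSIX-style paths. Both bases yield the same answer, and lexically_relative() applied to the two relative paths directly gives it too, without choosing any base.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Stand-in for the two-argument absolute(p, base) discussed in the thread:
// join and normalize lexically. (Hypothetical; it ignores drive letters and
// never consults the current directory.)
fs::path absolute_from(const fs::path& p, const fs::path& base) {
    return (base / p).lexically_normal();
}

// relative(x, y): a path z such that y / z is equivalent to x, computed
// lexically with C++17's lexically_relative().
fs::path relative_to(const fs::path& x, const fs::path& y) {
    return x.lexically_relative(y);
}
```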
absolute("c:foo", "c:/bar") == "c:/bar/foo"
This requires that case.
And absolute("c:foo", "d:/bar") == "c:/bar/foo"... It does not make sense.
True, but there isn't a good answer to that unless you can determine what the cwd of c: is (or just assume it's the root, which is a bad idea especially for console programs). Which means this case depends on "filesystem state" -- though actually environment. (And it's not arbitrary state -- it's state that's set either earlier in your own process or by your parent process possibly in preparation to execute you. So it's not unreasonable to depend on it.) But it's not correct to treat this as an invalid case either. Imagine a console program that is trying to translate a user-supplied argument (the first parameter) into an absolute path based on its CWD (the second parameter). Both the program and the user should expect, by convention, that the previously-used-WD of C: should be used in the expansion. As in:
cd /d c:\baz
cd /d d:\bar
zap quux c:foo
The parameters are well defined and should expand to d:\bar\quux and c:\baz\foo respectively. (Which is apparently what Boost.Filesystem does, but only when using system_complete() instead of absolute(). I'm not sure why these have different behaviour; system_complete() seems more sensible.)
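The per-drive expansion described above can be modelled as a small lookup. This is a simplified sketch: ProcessState and expand() are hypothetical, the per_drive map stands in for the hidden "=C:"-style variables, and it assumes lowercase drive letters and backslash separators while ignoring UNC paths, extended-length prefixes and case-insensitive matching. With cwd d:\bar and c: last set to c:\baz, it expands quux to d:\bar\quux and c:foo to c:\baz\foo.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Toy model of expanding drive-relative Windows paths (hypothetical).
struct ProcessState {
    std::string cwd;                        // THE current directory, e.g. "d:\\bar"
    std::map<char, std::string> per_drive;  // last directory used on other drives
};

bool has_drive(const std::string& p) {
    return p.size() >= 2 && std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':';
}

// Append rel to base, inserting a separator only when needed.
std::string append(const std::string& base, const std::string& rel) {
    if (rel.empty()) return base;
    if (!base.empty() && base.back() == '\\') return base + rel;
    return base + "\\" + rel;
}

// Roughly what the thread describes system_complete() doing on Windows.
std::string expand(const ProcessState& s, const std::string& p) {
    if (has_drive(p)) {
        const std::string rest = p.substr(2);
        if (!rest.empty() && rest[0] == '\\') return p;  // "c:\\x" is fully qualified
        std::string base;
        if (has_drive(s.cwd) && s.cwd[0] == p[0]) {
            base = s.cwd;  // same drive as THE current directory
        } else {
            auto it = s.per_drive.find(p[0]);
            // Assumed default when no per-drive entry exists: the drive root.
            base = (it != s.per_drive.end()) ? it->second : std::string(1, p[0]) + ":\\";
        }
        return append(base, rest);
    }
    if (!p.empty() && p[0] == '\\') return s.cwd.substr(0, 2) + p;  // root of current drive
    return append(s.cwd, p);
}
```

The same model reproduces the D:test behaviour described later in the thread: the result follows THE current directory while it is on D:, and falls back to the remembered per-drive entry once it moves to another drive.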
This all sounds good, but working with paths relative to an *unknown* base is useful. Think of some project directory that tries to be relocatable.
I don't see how that would be a problem. It can use absolute(path-in-config-file, path-of-project) to obtain the "real" location of some resource while running, and then write it back using relative(real-location, path-of-project). You're free to move the project and its resources around between runs and the links will be preserved as long as the internal structure is maintained. An unknown base is only something that exists in storage. To actually accomplish anything useful, you need to know what the actual base is. (It's usually not sensible to manipulate the path by itself -- you have to do the same manipulation [eg. renaming] to the actual resource or you'll break the link.)
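The relocatable-project round trip can be sketched with std::filesystem's lexical operations. resolve_link() and store_link() are hypothetical names; the sketch assumes POSIX-style paths and links stored relative to the project directory, and it never touches the disk, so neither the project nor the linked file needs to exist.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helpers for a relocatable project whose links are stored
// relative to the project directory. Both are purely lexical.

// Like absolute(stored, project_dir): where the link points right now.
fs::path resolve_link(const fs::path& stored, const fs::path& project_dir) {
    return (project_dir / stored).lexically_normal();
}

// Like relative(real_location, project_dir): what to write back to the config.
fs::path store_link(const fs::path& real_location, const fs::path& project_dir) {
    return real_location.lexically_relative(project_dir);
}
```

Moving the project from /home/u/proj to /mnt/new/proj leaves the stored link "../shared/lib.txt" unchanged; it simply resolves against the new location.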
On Fri, May 16, 2014 at 11:41 AM, Gavin Lambert
... Again, I don't think relative(*, some-relative-path) is a sensible operation.
Besides, relative(absolute(path1, X), absolute(path2, X)) == relative(absolute(path1, Y), absolute(path2, Y)) for any X & Y when both paths are non-rooted relative; any cwd injection will cancel out.
Absolute is not relevant to the definition. I feel we've lost each other. To make it constructive, do you agree with my definition: relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)? Here I'm using the abstract definitions of / and equivalence, which may vary depending on the effect you want to achieve.
And finally, the working dir of each drive is not actually filesystem state. It's environment state, separately owned by each process (albeit inherited from its parent). So you're not actually accessing the filesystem when you get the CWD.
You still access some state external to the given paths. This is what I want to avoid.
True, but there isn't a good answer to that unless you can determine what the cwd of c: is (or just assume it's the root, which is a bad idea especially for console programs). Which means this case depends on "filesystem state" -- though actually environment. ... (Which is apparently what Boost.Filesystem does, but only when using system_complete() instead of absolute(). I'm not sure why these have different behaviour; system_complete() seems more sensible.)
This is exactly what I am saying! You either want to combine paths, which is what / is for (in my definition), or complete them with some current system-dependent state, which is what system_complete() does. (Sometimes you want to do both, in that order.) To get an absolute path on Windows (using today's boost.fs), one has to call system_complete(), *not* absolute() (!!!), because the latter does not capture platform semantics correctly.
This all sounds good, but working with paths relative to an *unknown* base is useful. Think of some project directory that tries to be relocatable.
I don't see how that would be a problem. It can use absolute(path-in-config-file, path-of-project) to obtain the "real" location of some resource while running, and then write it back using relative(real-location, path-of-project). You're free to move the project and its resources around between runs and the links will be preserved as long as the internal structure is maintained.
Don't you think that resolving two paths w.r.t. the same base, just in order to get the difference between them, shows that relative() of two relative paths makes sense on its own? If you read the definitions in my OP carefully, the relative() I described has exactly this property: for relative paths it works as if both had every possible base prepended (the same base for both).
An unknown base is only something that exists in storage. To actually accomplish anything useful, you need to know what the actual base is. (It's usually not sensible to manipulate the path by itself -- you have to do the same manipulation [eg. renaming] to the actual resource or you'll break the link.)
How about getting a relative path between two files in a project? I.e. one pointing to another. -- Yakov
Yakov Galka wrote:
relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)
If you allow z to be an absolute path, it'd never be unique when x is absolute, because x would be a trivial solution then. This is why I don't particularly like relative( d:/, c:a/b ) == c:a/b -- c:a/b is not a relative path.
Absolute, def :- path x is absolute when r / x does not depend on r.
On an unrelated note, this c:a/b business sure throws a spanner in the works. It's absolute by the above definition... but it depends on the current directory of drive C... except that there is no such thing as a current directory of drive C in Windows, there's only one current directory per process... except that under DOS current directories were per-drive, so they are emulated today using hidden environment variables*. Madness. How about we just say c:a/b is c:/a/b and be done with it.
(*) SET "" will display the hidden variables. The current directory of drive C: is the variable "=C:".
On Fri, May 16, 2014 at 4:20 PM, Peter Dimov
Yakov Galka wrote:
relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)
If you allow z to be an absolute path, it'd never be unique when x is absolute, because x would be a trivial solution then.
I think you got it backwards. x is the parameter, z is the solution. So yes, if x is absolute, then z is also absolute and equals x. Uniqueness isn't broken.
This is why I don't particularly like
relative( d:/, c:a/b ) == c:a/b
I assume you meant relative( c:a/b, d:/ ).
-- c:a/b is not a relative path.
Absolute, def :- path x is absolute when r / x does not depend on r.
Agree. Equivalent to my OP.
On an unrelated note, this c:a/b business sure throws a spanner in the works. It's absolute by the above definition...
Not exactly true. It depends on how you define x / "c:".
Definition 1: x / "c:" = "c:" for all x. Then c:a/b is absolute.
Definition 2: "c:" / x / "c:" = "c:" / x; "a:" / x / "c:" = "c:" for a != c and no element of x is a drive. This is similar to what SetCurrentDirectory does, and implies that c:a/b isn't absolute.
I do not imply that any of these interpretations is superior.
but it depends on the current directory of drive C... except that there is no such thing as a current directory of drive C in Windows, there's only one current directory per process... except that under DOS current directories were per-drive, so they are emulated today using hidden environment variables*.
Not exactly true. It's true that they are implemented for backward compatibility reasons. But they are read by the Windows API, including SetCurrentDirectory, GetFullPathName, etc... So there is still a per-drive notion of "current directory" in addition to "THE current directory".
Madness. How about we just say c:a/b is c:/a/b and be done with it.
It's a platform convention. I don't know about other systems (OpenVMS?). Being Unix-centric is definitely easier. But if so, there is so much other junk that can be simplified. Why not define the narrow-char encoding to be UTF-8, say? After all, codepages are still supported only for backward compatibility reasons. That seems a much more important and useful assumption than the rarely used per-drive current directories... Smells like hypocrisy to me. -- Yakov
On Fri, May 16, 2014 at 4:49 PM, Yakov Galka
... Definition 2: "c:" / x / "c:" = "c:" / x; "a:" / x / "c:" = "c:" for a != c and no element of x is a drive. This is similar to what SetCurrentDirectory does, and implies that c:a/b isn't absolute.
Actually the problem of this definition is that it cannot be associative given the current Windows path syntax. -- Yakov
Yakov Galka wrote:
On Fri, May 16, 2014 at 4:49 PM, Yakov Galka wrote:
... Definition 2: "c:" / x / "c:" = "c:" / x; "a:" / x / "c:" = "c:" for a != c and no element of x is a drive. This is similar to what SetCurrentDirectory does, and implies that c:a/b isn't absolute.
Actually the problem of this definition is that it cannot be associative given the current Windows path syntax.
To support c:a, we need either to distinguish between "c:" and "c:/" as path elements, or posit that c:/ consists of { "c:", "/" }. The openat-based definition then becomes associative (using either), if I'm not mistaken. Although I very well could be, because did I mention the word madness in relation to all this? I forget.
Yakov Galka wrote:
On Fri, May 16, 2014 at 4:20 PM, Peter Dimov wrote:
Yakov Galka wrote:
relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)
If you allow z to be an absolute path, it'd never be unique when x is absolute, because x would be a trivial solution then.
I think you got it backwards. x is the parameter, z is the solution. So yes, if x is absolute, then z is also absolute and equals x. Uniqueness isn't broken.
I don't think so.
x = c:/a/b
y = c:/a
Want: z such that y / z = x
Let z = x: "c:/a" / "c:/a/b" = "c:/a/b" = x. Ergo, z = x is a solution.
Let z = "b": "c:/a" / "b" = "c:/a/b" = x. Ergo, z = "b" is a solution.
x and "b" are not equivalent, which means that the solution is not unique. (There's also z3 = "/a/b", also an interesting specimen under Windows with respect to the absolute/relative classification.)
But they are read by the Windows API, including SetCurrentDirectory, GetFullPathName, etc... So there is still a per-drive notion of "current directory" in addition to "THE current directory".
They are read but not written. If the environment says that the current directory of D: is D:\foo and you SetCurrentDirectory( "D:\\bar" ), the current directory of D: remains D:\foo in the environment. So D:test was D:\foo\test, becomes D:\bar\test while the current directory is D:\bar, and then reverts to D:\foo\test when the current directory becomes C:\foo. Madness, as I said. No sane person uses such paths. :-) That is not quite in the same category as the encoding, because it affects the path algebra, and encoding does not. On second thought, we already need to distinguish between a/b and /a/b, so perhaps making the distinction between c:a/b and c:/a/b would not be a significant burden.
On 2014-05-16 17:32, Peter Dimov wrote:
Yakov Galka wrote:
On Fri, May 16, 2014 at 4:20 PM, Peter Dimov wrote:
Yakov Galka wrote:
relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)
If you allow z to be an absolute path, it'd never be unique when x is absolute, because x would be a trivial solution then.
I think you got it backwards. x is the parameter, z is the solution. So yes, if x is absolute, then z is also absolute and equals x. Uniqueness isn't broken.
I don't think so.
x = c:/a/b
y = c:/a
Good catch! I said I hadn't checked it thoroughly :) My original thought and implementation used a notion of ranks, more as an implementation detail, so I left it out of my OP for the sake of simplicity. But I overlooked that it matters in this definition. Basically, higher-rank elements erase the lower-rank ones, and the highest rank is absolute. So here, rank(drive) > rank(root) > rank(regular), and rank(x) = the max rank of the elements of x. If you add the requirement that relative() returns a minimal-rank path, I think you are OK with uniqueness on Windows and Unix.
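The rank rule can be checked against Peter's counterexample with the same kind of toy string model. In this hypothetical sketch, path_rank() encodes rank(drive) > rank(root) > rank(regular), join() is the Definition 1 style concatenation from the earlier examples, string equality stands in for equivalence, and candidate solutions are supplied explicitly rather than enumerated. For x = c:/a/b and y = c:/a, the candidates c:/a/b, /a/b and b all satisfy y / z = x, and the minimal-rank choice is b.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy string model with '/' separators (hypothetical helpers).
bool has_drive(const std::string& p) {
    return p.size() >= 2 && std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':';
}

std::string drive_of(const std::string& p) {
    return has_drive(p) ? p.substr(0, 2) : std::string();
}

// A drive replaces x, a root keeps only x's drive, anything else is appended.
std::string join(const std::string& x, const std::string& y) {
    if (has_drive(y)) return y;
    if (!y.empty() && y[0] == '/') return drive_of(x) + y;
    if (x.empty()) return y;
    if (y.empty()) return x;
    const char last = x.back();
    if (last == '/' || last == ':') return x + y;
    return x + "/" + y;
}

// rank(drive) = 2 > rank(root) = 1 > rank(regular) = 0. In this toy model a
// drive or root can only appear at the start, so the max rank of the
// elements is decided by how the path starts.
int path_rank(const std::string& p) {
    if (has_drive(p)) return 2;
    if (!p.empty() && p[0] == '/') return 1;
    return 0;
}

// Among candidates z with y / z == x, return one of minimal rank,
// or "" if no candidate qualifies.
std::string minimal_rank_relative(const std::string& x, const std::string& y,
                                  const std::vector<std::string>& candidates) {
    std::string best;
    int best_rank = 3;  // above any real rank
    for (const std::string& z : candidates) {
        if (join(y, z) == x && path_rank(z) < best_rank) {
            best = z;
            best_rank = path_rank(z);
        }
    }
    return best;
}
```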
That is not quite in the same category as the encoding, because it affects the path algebra, and encoding does not.
True. But path algebra is visible to the user, who has some expectations about it. In-memory encodings, on the other hand, are internal to the program. The question is whether you design a library for the sake of making a yet-another-c++-(standard)-library, or to ultimately deliver quality products to the end-users.
On 2014-05-16 17:38, Peter Dimov wrote:
Yakov Galka wrote:
On Fri, May 16, 2014 at 4:49 PM, Yakov Galka wrote:
... Definition 2: "c:" / x / "c:" = "c:" / x; "a:" / x / "c:" = "c:" for a != c and no element of x is a drive. This is similar to what SetCurrentDirectory does, and implies that c:a/b isn't absolute.
Actually the problem of this definition is that it cannot be associative given the current Windows path syntax.
To support c:a, we need either to distinguish between "c:" and "c:/" as path elements, or posit that c:/ consists of { "c:", "/" }.
The last time I checked, "c:/" was already parsed as {"c:", "/"} by boost. This is also the way implied in my OP.
The openat-based definition then becomes associative (using either), if I'm not mistaken.
You don't have openat on Windows, so you must approximate it for the sake of well defining this. Cheers, Yakov
Yakov Galka wrote:
Basically higher rank elements erase the lower rank ones, highest rank is absolute. So here, rank(drive) > rank(root) > rank(regular), rank(x) = max rank of elements of x. You add the requirement that relative() returns a minimal-rank path, and I think you are ok with uniqueness on Windows and Unix.
It's a bit more complicated than that on Windows, because these two: z1 = "/a" z2 = "c:a" have the same rank (or, to be precise, their ranks can't be ordered) - both are between relative and absolute. One is drive-relative, directory-absolute, other is drive-absolute, directory-relative. I think that the definition still holds, though.
On Fri, May 16, 2014 at 6:51 PM, Peter Dimov
Yakov Galka wrote:
Basically higher rank elements erase the lower rank ones, highest rank is absolute. So here, rank(drive) > rank(root) > rank(regular), rank(x) = max rank of elements of x. You add the requirement that relative() returns a minimal-rank path, and I think you are ok with uniqueness on Windows and Unix.
It's a bit more complicated than that on Windows, because these two:
z1 = "/a" z2 = "c:a"
have the same rank (or, to be precise, their ranks can't be ordered) - both are between relative and absolute. One is drive-relative, directory-absolute, other is drive-absolute, directory-relative.
This is not what I have defined. It is:
rank(z1) = max(rank(root), rank(regular)) = rank(root)
rank(z2) = max(rank(drive), rank(regular)) = rank(drive)
Is z2 absolute? Wrt. what definitions?
* "Definition 1" (x / "c:" = "c:" for all x), and either yours or mine (equivalent) definitions of is_absolute: then z2 is absolute because x / z2 = z2 for all x. Then my claim that a path of max rank (here rank(drive)) is absolute is true.
* "Definition 2" ("c:" / x / "c:" = "c:" / x ...): you cannot define ranks for such concatenation.
If you give ranks to elements you get a stronger framework, which, in particular, is associative.
These "definitions" aren't proper definitions though. Abstract path concatenation is associative by definition (in my OP), so the question is how we define a valid str() function and the action of elements on filesystem nodes. It *is* possible to define str() to achieve the effect intended in "Definition 2". For example, think of str("c:" / "a" / "d:" / "b") = "c:a/d:b", i.e. you serialize the whole per-drive current directory state-delta as a prefix of the path. It would be associative. However, this is an extension over Windows path syntax, so we definitely don't want to go there. Ironically, this is what boost.filesystem does (it concatenates two valid paths to give an invalid one: "c:/a" / "d:/b" = "c:/a/d:/b").
Alternative, non-equivalent definition of is_absolute:
------------------------------------------------------
Definition: is_absolute(x) iff system_complete(x) is equivalent to x.
Assuming that "c:" isn't absolute enough for us, this definition may make more sense. Note that it is even more logical given that system_complete() is the true absolute() on Windows.
Yet another definition:
Definition: is_absolute(x) iff system_complete(x) is eval-equivalent to x.
Which raises the interesting question of whether we want "c:/a/../b" to be absolute.
-- Yakov
Yakov Galka wrote: ...
z1 = "/a" z2 = "c:a"
...
Is z2 absolute? Wrt. what definitions?
* "Definition 1" (x / "c:" = "c:" for all x), and either yours or mine (equivalent) definitions of is_absolute: then z2 is absolute because x / z2 = z2 for all x. Then my claim that path of max-rank (here rank(drive)) is absolute is true.
* "Definition 2", ("c:" / x / "c:" = "c:" / x ...): you cannot define ranks for such concatenation. If you give ranks to elements you get a stronger framework, which, in particular, is associative.
At first, I was in favor of definition 2, stated as "x / y is the meaning of y when the current directory is x".
"c:/x" / "c:a" == "c:/x/a"
"d:/x" / "c:a" == "c:a"
"c:/x" / "/a" == "c:/a"
"d:/x" / "/a" == "d:/a"
On second thought though, I'm not sure that this is what I'd want from usability point of view. When the user gives me "c:a", he probably wants "c:a" and not "c:/x/a", even if the documentation states that paths are treated relative to "c:/x". In addition, "x" / "c:a" is not representable.
Definition 1, then.
"c:/x" / "c:a" == "c:a"
"d:/x" / "c:a" == "c:a"
But then
"c:/x" / "/a" == "/a"
"d:/x" / "/a" == "/a"
for the same usability reasons.
In this case, both z1 and z2 are absolute (even though they are not absolute by the N3940 definition.)
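The two concatenation readings compared above can be written down side by side. This is a sketch over plain strings with '/' separators, assuming x is fully qualified (e.g. "c:/x") and drive letters are lowercase; join_as_cwd() and join_rooted_wins() are hypothetical names for Definition 2 and for the Definition 1 variant in which a leading '/' also replaces x.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Toy string model: '/' separators, lowercase drive letters, x assumed
// fully qualified (e.g. "c:/x"). Hypothetical helpers for illustration.
bool has_drive(const std::string& p) {
    return p.size() >= 2 && std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':';
}

// Append rel to base, inserting '/' only when needed.
std::string append(const std::string& base, const std::string& rel) {
    if (rel.empty()) return base;
    if (!base.empty() && base.back() == '/') return base + rel;
    return base + "/" + rel;
}

// Definition 2: "x / y is the meaning of y when the current directory is x".
std::string join_as_cwd(const std::string& x, const std::string& y) {
    if (has_drive(y)) {
        const std::string rest = y.substr(2);
        if (!rest.empty() && rest[0] == '/') return y;  // "c:/a" is fully qualified
        if (y[0] == x[0]) return append(x, rest);       // same drive: relative to x
        return y;                                       // other drive: left unresolved
    }
    if (!y.empty() && y[0] == '/') return x.substr(0, 2) + y;  // root of x's drive
    return append(x, y);
}

// Definition 1 variant: a leading drive or root replaces x entirely.
std::string join_rooted_wins(const std::string& x, const std::string& y) {
    if (has_drive(y) || (!y.empty() && y[0] == '/')) return y;
    return append(x, y);
}
```

Under join_rooted_wins(), both "/a" and "c:a" ignore x, which is the sense in which they are equally absolute; under join_as_cwd(), the result for either can depend on x.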
On Sun, May 18, 2014 at 6:02 PM, Peter Dimov
Yakov Galka wrote: ... On second thought though, I'm not sure that this is what I'd want from usability point of view. When the user gives me "c:a", he probably wants "c:a" and not "c:/x/a", even if the documentation states that paths are treated relative to "c:/x". ...
"c:/x" / "/a" == "/a"
"d:/x" / "/a" == "/a"
for the same usability reasons.
I don't follow. The real-world use-case is that I have some config/project/database/install-dir named x = "c:/myproduct", that references another file y, conceptually relative to itself. I think we all agree that if y = "a.txt" then x / y = "c:/myproduct/a.txt" is the desired result. Now, if y = "/windows/a.txt" I would definitely expect the result to be "c:/windows/a.txt", and not "/windows/a.txt" which may end up resolved to "d:/windows/a.txt" if "d:" is the current drive. How is "/abc" any different from "a.txt" in this respect?
In this case, both z1 and z2 are absolute (even though they are not absolute by the N3940 definition.)
OK, went through N3940 (as I said, wasn't following it for some time). The problem of the definition
A path that unambiguously identifies the location of a file without reference to an additional starting location.
is that it is unclear what 'location' is (a path or an inode?) and that it lacks a quantifier. After 10 minutes trying to understand what it says, I think that this is what it tried to say:
x is absolute iff for all possible filesystem instances F (under the given OS), if there exists a filesystem node f in F and a program state S such that f = open(F, S, x), then for all program states S, f = open(F, S, x).
(i.e. ∀F ∀f ∈ F ((∃S f = open(F,S,x)) → (∀S f = open(F,S,x))))
Here open(F, S, x) is the node of F that the path x resolves to at the program state S, and program state S includes current directory, environment vars, etc... -- Yakov
On Sun, May 18, 2014 at 8:21 PM, Yakov Galka
x is absolute iff for all possible filesystem instances F (under the given OS), if there exists a filesystem node f in F and a program state S such that f = open(F, S, x), then for all program states S, f = open(F, S, x).
(i.e. ∀F ∀f ∈ F ((∃S f = open(F,S,x)) → (∀S f = open(F,S,x))))
Here open(F, S, x) is the node of F that the path x resolves to at the program state S, and program state S includes current directory, environment vars, etc...
Note: I think this definition in fact coincides with that x is absolute iff system_complete(x) is eval-equivalent to x. So defining is_absolute in terms of the system-specific system_complete() actually makes sense. -- Yakov
Yakov Galka wrote:
I don't follow. The real-world use-case is that I have some config/project/database/install-dir named x = "c:/myproduct", that references another file y, conceptually relative to itself. I think we all agree that if y = "a.txt" then x / y = "c:/myproduct/a.txt" is the desired result. Now, if y = "/windows/a.txt" I would definitely expect the result to be "c:/windows/a.txt", and not "/windows/a.txt" which may end up resolved to "d:/windows/a.txt" if "d:" is the current drive. How "/abc" is any different from "a.txt" in this respect?
Yes, exactly. Now my point is that if you prefer /windows/a.txt to be resolved to c:/windows/a.txt, you should also prefer c:a.txt to be resolved to c:/myproduct/a.txt. If, on the other hand, you prefer, as I do, /windows/a.txt to be resolved to /windows/a.txt, then you should also prefer c:a.txt to be resolved to c:a.txt, and not to c:/myproduct/a.txt. In other words, /a and c:a are equally absolute. Treating c:a as absolute and /a as relative doesn't quite make sense to me.
On Sun, May 18, 2014 at 8:35 PM, Peter Dimov
Yakov Galka wrote:
I don't follow. The real-world use-case is that I have some config/project/database/install-dir named x = "c:/myproduct", that references another file y, conceptually relative to itself. I think we all agree that if y = "a.txt" then x / y = "c:/myproduct/a.txt" is the desired result. Now, if y = "/windows/a.txt" I would definitely expect the result to be "c:/windows/a.txt", and not "/windows/a.txt" which may end up resolved to "d:/windows/a.txt" if "d:" is the current drive. How is "/abc" any different from "a.txt" in this respect?
Yes, exactly.
Now my point is that if you prefer /windows/a.txt to be resolved to c:/windows/a.txt, you should also prefer c:a.txt to be resolved to c:/myproduct/a.txt. ...
Yes, I prefer it to be so. Unfortunately it means that either
* we need to represent some of the program state affected by the previous concatenations in a string form, which is not encodable in a system-understandable syntax, or
* we give up associativity, or
* we process drives independently of the rest, like "d:/a/b/c" / "c:" = "c:/a/b/c" (this is what boost::filesystem::absolute does), or
* we acknowledge that Windows paths are so crazy that the best we can do is x / "c:" = "c:", always.
Do you see another option? (The last option isn't that bad: it's just that "c:" always means the "per-drive current directory of drive c", whereas "/" is the root of the drive of THE current directory.)
In other words, /a and c:a are equally absolute.
Treating c:a as absolute and /a as relative doesn't quite make sense to me.
As I said, you can detach absolute() from concatenation, by defining it in terms of system_complete(), in which case both become relative (as desired) independently of which option above you choose. -- Yakov
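One rule set that keeps the "/windows/a.txt" and "c:a.txt" cases consistent is, as far as I know, the one std::filesystem ended up with in C++17 for path::operator/=: an absolute right operand, or one with a different root-name, replaces the left operand; a rooted right operand keeps only the left operand's drive; and a same-drive "c:a" is appended like "a". The toy sketch below models those rules on plain strings. WinPath and append are hypothetical names, not Boost.Filesystem API; only '/' separators are handled and drive letters are compared case-sensitively.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical toy model of a Windows-style path, split into root-name
// ("c:"), root-directory ("/") and relative part ("a/b"). '/' separators only.
struct WinPath {
    std::string root_name;  // "" or e.g. "c:"
    bool root_dir = false;  // true if a '/' follows the root name
    std::string rel;        // everything after the root

    explicit WinPath(const std::string& s) {
        std::size_t i = 0;
        if (s.size() >= 2 && std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':') {
            root_name = s.substr(0, 2);
            i = 2;
        }
        if (i < s.size() && s[i] == '/') {
            root_dir = true;
            ++i;
        }
        rel = s.substr(i);
    }
    bool is_absolute() const { return !root_name.empty() && root_dir; }
    std::string str() const { return root_name + (root_dir ? "/" : "") + rel; }
};

// x / y, modelled on the C++17 std::filesystem::path::operator/= rules.
std::string append(const std::string& lhs, const std::string& rhs) {
    WinPath x(lhs), y(rhs);
    // An absolute y, or a y naming a different drive, replaces x entirely.
    if (y.is_absolute() || (!y.root_name.empty() && y.root_name != x.root_name))
        return y.str();
    // A rooted y such as "/windows/a.txt" keeps only x's drive.
    if (y.root_dir) {
        x.root_dir = true;
        x.rel = y.rel;
        return x.str();
    }
    // Otherwise y (e.g. "a.txt" or same-drive "c:a.txt") is appended to x.
    std::string out = x.str();
    if (!x.rel.empty() && x.rel.back() != '/') out += '/';
    return out + y.rel;
}
```

Under these rules "c:/myproduct" / "/windows/a.txt" gives "c:/windows/a.txt" and "c:/myproduct" / "c:a.txt" gives "c:/myproduct/a.txt", the consistent pairing argued for above, while "d:/a/b/c" / "c:" gives "c:" rather than "c:/a/b/c". Note that this is a hybrid: a different drive is treated per the last option, but a same-drive "c:" is appended.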
On 17/05/2014 02:32, quoth Peter Dimov:
Madness. How about we just say c:a/b is c:/a/b and be done with it.
You might be able to get away with that in a GUI app -- but then a GUI app shouldn't be getting those kinds of paths in the first place. You can't get away with it in a console app, at least in the context of parsing command line arguments. (If you have something separate translating arguments into absolute paths [does the alternate setargv do that?] then you might -- although that doesn't help you if you're parsing response files or scripts as well.) When the user provides a console app with drive-relative paths, they expect them to be relative to the current directory of those drives, as most recently set by that user. Treating it as rooted is just going to confuse and annoy them.
But they are read by the Windows API, including SetCurrentDirectory, GetFullPathName, etc... So there is still a per-drive notion of "current directory" in addition to "THE current directory".
They are read but not written. If the environment says that the current directory of D: is D:\foo and you SetCurrentDirectory( "D:\\bar" ), the current directory of D: remains D:\foo in the environment. So D:test was D:\foo\test, becomes D:\bar\test while the current directory is D:\bar, and then reverts to D:\foo\test when the current directory becomes C:\foo.
Madness, as I said. No sane person uses such paths. :-)
There *is* a perverse kind of logic to it. The current directories of alternate drives are set by the user (or a user agent such as batch files), and the user expects them to stay where they set them until the next time they change them themselves. But yes, such paths should only be used in the context of user input (command line or response files). They should be translated to "real" paths as soon as possible and stored that way. But there has to be some mechanism for doing that translation -- and it seems like something Filesystem ought to provide.
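The translation behaviour described above (a drive-relative path such as D:test is resolved from the per-drive current directory, which SetCurrentDirectory reads but does not update) can be simulated with a small toy model. As I understand it, Windows keeps those per-drive directories in hidden environment variables such as "=D:"; here they are just a map. Process, set_current_directory and resolve are hypothetical names, and only the "X:rest" form with lowercase, '/'-separated directories is handled.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical simulation of a Windows process as described above: the
// per-drive current directories come from the environment and are only read.
struct Process {
    std::string cwd;                        // THE current directory, e.g. "c:/foo"
    std::map<char, std::string> env_drive;  // per-drive directories from the environment

    // Like SetCurrentDirectory in this model: changes cwd, never env_drive.
    void set_current_directory(const std::string& dir) { cwd = dir; }

    // Resolve a drive-relative path such as "D:test" (assumed form: "X:rest").
    std::string resolve(const std::string& p) const {
        char drive = static_cast<char>(std::tolower(static_cast<unsigned char>(p[0])));
        auto it = env_drive.find(drive);
        std::string base;
        if (drive == cwd[0]) {
            base = cwd;                              // the current drive: THE cwd
        } else if (it != env_drive.end()) {
            base = it->second;                       // remembered per-drive directory
        } else {
            base = std::string(1, drive) + ":/";     // nothing recorded: drive root
        }
        if (base.back() != '/') base += '/';
        return base + p.substr(2);
    }
};
```

Starting with THE current directory at c:/foo and the environment recording d:/foo for drive D:, "D:test" resolves to d:/foo/test, then to d:/bar/test after set_current_directory("d:/bar"), and back to d:/foo/test after returning to c:/foo, matching the sequence described earlier in the thread.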
To support c:a, we need either to distinguish between "c:" and "c:/" as path elements, or posit that c:/ consists of { "c:", "/" }. The openat-based definition then becomes associative (using either), if I'm not mistaken. Although I very well could be, because did I mention the word madness in relation to all this? I forget.
Granted, I haven't verified this, but the Filesystem docs suggested that it already did that.
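Positing that c:/ consists of { "c:", "/" } amounts to a decomposition along the lines of the toy sketch below, in which the root-name and the root-directory are separate elements, so "c:/a", "c:a" and "/a" remain distinguishable element by element. elements() is a hypothetical helper, not Boost.Filesystem API; it handles '/' only and, as a simplification, drops empty elements from repeated or trailing separators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical splitter: yields the root-name and root-directory as separate
// elements, so "c:/a" -> {"c:", "/", "a"} while "c:a" -> {"c:", "a"}.
std::vector<std::string> elements(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    if (s.size() >= 2 && std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':') {
        out.push_back(s.substr(0, 2));             // root-name, e.g. "c:"
        i = 2;
    }
    if (i < s.size() && s[i] == '/') {
        out.push_back("/");                        // root-directory
        while (i < s.size() && s[i] == '/') ++i;   // collapse repeated separators
    }
    while (i < s.size()) {                         // remaining filename elements
        std::size_t j = s.find('/', i);
        if (j == std::string::npos) j = s.size();
        if (j > i) out.push_back(s.substr(i, j - i));
        i = j + 1;
    }
    return out;
}
```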
On Fri, May 16, 2014 at 7:20 AM, Peter Dimov
Yakov Galka wrote:
relative(x,y) returns a path z (unique up to equivalence), if one exists, such that y / z = x (up to equivalence)
If you allow z to be an absolute path, it'd never be unique when x is absolute, because x would be a trivial solution then. This is why I don't particularly like
relative( d:/, c:a/b ) == c:a/b
As proposed, lexically_relative("d:/", "c:a/b") is an error. The semantics are deliberately conservative in that p must be prefixed with base, rather than trying to invent some possible semantics for the case where the two arguments appear to have no lexical connection. --Beman
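The conservative rule (succeed only when base is a lexical prefix of p, otherwise report an error) can be sketched as follows. This is a hypothetical, purely lexical illustration on '/'-separated strings, not the proposed Boost implementation: prefix_relative is an illustrative name, std::nullopt stands in for "error", drive letters get no special treatment, and returning "." when p equals base is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Split into lexical elements; a leading '/' is kept as its own element so
// that "/a" and "a" never compare equal.
static std::vector<std::string> split(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    if (!s.empty() && s[0] == '/') { out.push_back("/"); i = 1; }
    while (i < s.size()) {
        std::size_t j = s.find('/', i);
        if (j == std::string::npos) j = s.size();
        if (j > i) out.push_back(s.substr(i, j - i));
        i = j + 1;
    }
    return out;
}

// Hypothetical sketch: z such that base / z == p, or nullopt if base's
// elements are not a prefix of p's (no attempt to invent "../" steps).
std::optional<std::string> prefix_relative(const std::string& p, const std::string& base) {
    std::vector<std::string> pe = split(p), be = split(base);
    if (be.size() > pe.size()) return std::nullopt;
    for (std::size_t k = 0; k < be.size(); ++k)
        if (pe[k] != be[k]) return std::nullopt;   // no lexical connection: error
    std::string z;
    for (std::size_t k = be.size(); k < pe.size(); ++k) {
        if (!z.empty() && z.back() != '/') z += '/';
        z += pe[k];
    }
    return z.empty() ? std::string(".") : z;       // "." when p == base
}
```

Because z is built only from the elements of p that follow base, joining base and z with a separator reproduces p (up to equivalence), which is the y / z = x property quoted above; pairs with no lexical connection, such as "d:/" and "c:a/b", are rejected rather than guessed at.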
On Fri, May 16, 2014 at 1:21 AM, Yakov Galka
On Fri, May 16, 2014 at 3:53 AM, Gavin Lambert
wrote:
I think most of the points you brought up here aren't really on-topic in this particular thread, and would have been better made in a separate thread (or by writing your own alternative implementation). I doubt it's likely that grand sweeping changes to an existing accepted library would get anywhere. But that doesn't mean you couldn't submit an alternative intended to supersede it; that's happened in the past.
True, some of them are off-topic. And I do have an alternative path implementation that I'm using myself, which I might release some day. However, boost.filesystem has already undergone three major versions, and it is actively being pushed toward standardization. So fixing it might be more logical than introducing another library that fixes those concrete problems but presents an entirely different, likely controversial, approach.
Filesystem will be a Technical Specification initially, rather than become part of the Standard Library. Full standardization is tentatively targeted for C++17. Voting on the Filesystem TS finished in January, and the committee is expected to finish comment resolution at the June meeting in Rapperswil, Switzerland. The deadline for the pre-meeting mailing is a week from today, so there really isn't time to do more than resolve national body comments from the PDTS voting. I've started the process of updating the Boost implementation and documentation to conform to the TS. Some of that will go into 1.56, and anything not finished in time will go into 1.57. --Beman
On May 16, 2014 6:08:18 PM EDT, Beman Dawes
On Fri, May 16, 2014 at 1:21 AM, Yakov Galka
wrote:
On Fri, May 16, 2014 at 3:53 AM, Gavin Lambert
wrote:
I think most of the points you brought up here aren't really on-topic in this particular thread, and would have been better made in a separate thread (or by writing your own alternative implementation). I doubt it's likely that grand sweeping changes to an existing accepted library would get anywhere. But that doesn't mean you couldn't submit an alternative intended to supersede it; that's happened in the past.
True, some of them are off-topic. And I do have an alternative path implementation that I'm using myself, which I might release some day. However, boost.filesystem has already undergone three major versions, and it is actively being pushed toward standardization. So fixing it might be more logical than introducing another library that fixes those concrete problems but presents an entirely different, likely controversial, approach.
Filesystem will be a Technical Specification initially, rather than become part of the Standard Library. Full standardization is tentatively targeted for C++17. Voting on the Filesystem TS finished in January, and the committee is expected to finish comment resolution at the June meeting in Rapperswil, Switzerland. The deadline for the pre-meeting mailing is a week from today, so there really isn't time to do more than resolve national body comments from the PDTS voting.
IOW, Yakov, there's still time to head off standardization of the library, as is, but you must be very active now on iso.cpp and elsewhere to see your vision come to fruition. ___ Rob (Sent from my portable computation engine)
participants (9)
- Beman Dawes
- Bjorn Reese
- Daniel Pfeifer
- Gavin Lambert
- Nat Goodspeed
- Paul A. Bristow
- Peter Dimov
- Rob Stewart
- Yakov Galka