[Filesystem] v3 path separator changes
I'm finally getting round to moving to Filesystem v3 and now my code is breaking all over the place. The cause is the output of string() on Windows, whose behaviour has changed.
I'm manipulating Unix paths for use over SFTP, but doing so on Windows. For instance, I might want to append "c" to the Unix path "/a/b".
path p("/a/b"); p /= "c"; cout << p.string();
In version 2 this would output "/a/b/c" but now it produces "/a/b\c". Why the breaking change?
I'm trying to understand the rationale behind many of the v3 changes, which [1] and [2] don't even begin to cover.
[1] http://www.boost.org/doc/libs/1_53_0/libs/filesystem/doc/v3.html
[2] http://www.boost.org/doc/libs/1_53_0/libs/filesystem/doc/deprecated.html
Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
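Since the SFTP paths here are remote POSIX paths rather than paths on the local machine, one way to sidestep the host-dependent separator is to keep them as plain strings and join them with '/' yourself. A minimal sketch; `posix_join` is a hypothetical helper, not part of Boost.Filesystem, and it does not normalise "." or ".." components.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Boost.Filesystem): append one component
// to a POSIX-style remote path, always using '/', whatever the host OS.
std::string posix_join(const std::string& base, const std::string& leaf)
{
    if (base.empty()) return leaf;
    if (leaf.empty()) return base;
    if (leaf.front() == '/') return leaf;       // absolute leaf replaces the base
    if (base.back() == '/') return base + leaf; // avoid doubling the separator
    return base + '/' + leaf;
}
```

On any host, posix_join("/a/b", "c") gives "/a/b/c"; the trade-off is losing the decomposition helpers a path class provides.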
Alexander Lamaison
Surely someone knows why Boost.Filesystem path behaviour changed so much between v2 and v3? :-/ Alex
On Fri, Mar 22, 2013 at 04:50:35PM +0000, Alexander Lamaison wrote:
Surely someone knows why Boost.Filesystem path behaviour changed so much between v2 and v3?
While it might not say so on the tin, Boost.Filesystem has never really been about handling any path format outside of the one that it thinks the current platform uses. Any such accidental alignment has been just that, accidental. If you want to model URIs, model foreign paths, or model VFS paths, it's most likely not the library you want to use. -- Lars Viklund | zao@acc.umu.se
On Fri, Mar 22, 2013 at 7:01 PM, Lars Viklund
Surely someone knows why Boost.Filesystem path behaviour changed so much between v2 and v3?
While it might not say so on the tin, Boost.Filesystem has never really been about handling any path format outside of the one that it thinks the current platform uses. Any such accidental alignment has been just that, accidental.
This is not true. It defines two path formats, one "generic" and one "native". Their interaction is not well defined, however. To start with, you cannot always distinguish between generic and native paths, and the generic format is not really portable in any way, so its usefulness is uncertain. And if you ask me... this is a bad design choice, like half the things in Boost.Filesystem v3. -- Yakov
Yakov Galka
On Fri, Mar 22, 2013 at 7:01 PM, Lars Viklund
wrote: Surely someone knows why Boost.Filesystem path behaviour changed so much between v2 and v3?
While it might not say so on the tin, Boost.Filesystem has never really been about handling any path format outside of the one that it thinks the current platform uses. Any such accidental alignment has been just that, accidental.
This is not true. It defines two path formats, one "generic" and one "native".
Their interaction is not well defined, however. To start with, you cannot always distinguish between generic and native paths, and the generic format is not really portable in any way, so its usefulness is uncertain.
And if you ask me... this is a bad design choice, like half the things in Boost.Filesystem v3.
Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here? I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process. For example, my top two issues are:
- unclear generic/native path handling
- methods returning a 'path' for stuff that isn't a path but just needs a unicode string
Perhaps people can reply to this thread with any gripes they have? Alex
On Sat, Mar 23, 2013 at 8:01 AM, Alexander Lamaison
Yakov Galka
writes: Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here? I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process.
Perhaps. For my needs, v3 is an improvement over v2, which I found just generally annoying. All I really needed out of a filesystem library is a way to work with the filesystem on the native operating system, without having to constantly look up how to do anything from the documentation. I felt version 2 focused overmuch on relatively unusual cases and made it very annoying for the common case. Version 3 corrected that, but at some expense, apparently. So now that I'm happy, some of the other folks responding to this thread are not. I've often wondered about the design of filesystem, to be honest. It seems to me as if it could be improved to be more generic, and yield some other tools that could go beyond being helpful in working with file systems.
Perhaps people can reply to this thread with any gripes they have?
Well, I wouldn't describe this as a gripe exactly, just an idea for a different approach, if it tickles anyone's fancy. I have never seen a file system that wasn't, really, a tree with branches and leaves. It can have more than one base, but ultimately, it's a tree. As such, it feels to me like a more natural way to work with the file system is through an object that represents trees, and functions that help you work with the tree, or represent a particular path on the tree as a generic string, with a particular separator character (or characters). There would be an understanding that the base is different from a branch, which is itself different from a leaf (e.g. in the Windows file system, the base represents drives or network shares, a branch is a folder, and a leaf is a file). With that approach, you could also represent the Windows registry using the same objects (just a different backend to handle the obviously different API calls). Or SNMP OIDs. Or a web site's paths. You could also do some other things that get rather intriguing, like serializing a representation of your tree to disk, or comparing pairs of these trees to find differences. At a different company, that was something I needed to do, but I confess that it's probably not a common use for trees... although it's common enough that I saw a paper written on the subject in the ACM's library. If boost first had a library that best presented tree representations of existing objects, then another layer of objects that helped build this tree or turn it into a string representation for specific paths for different kinds of needs (Windows OS, VMS, web site, etc.), it might offer the power and flexibility that people seem to want out of this library without the headaches of an API that makes you jump through non-intuitive hoops. Or does boost already have something like this and I've just been missing it? - Trey
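One way to read the tree idea is as a backend-neutral node type that a real backend (file system, registry) would populate from OS calls. A rough in-memory sketch; `node`, `forest` and `find_node` are hypothetical names, not an existing Boost API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical backend-neutral tree node: a base (drive, share, registry
// hive), a branch (folder, key) or a leaf (file, value).
struct node {
    enum kind_t { base, branch, leaf };

    kind_t kind;
    node* parent = nullptr;
    std::map<std::string, std::unique_ptr<node>> children;

    explicit node(kind_t k) : kind(k) {}

    // Return the named child, creating it with kind `k` if it is missing.
    node& add(const std::string& name, kind_t k)
    {
        auto& slot = children[name];
        if (!slot) {
            slot = std::make_unique<node>(k);
            slot->parent = this;
        }
        return *slot;
    }
};

// A forest: several named bases, e.g. "C:" and "D:", or registry hives.
using forest = std::map<std::string, std::unique_ptr<node>>;

// Walk from a named base through a sequence of child names.
// Returns nullptr if the base or any step does not exist.
const node* find_node(const forest& f, const std::string& base_name,
                      const std::vector<std::string>& names)
{
    auto it = f.find(base_name);
    if (it == f.end()) return nullptr;
    const node* cur = it->second.get();
    for (const auto& n : names) {
        auto c = cur->children.find(n);
        if (c == cur->children.end()) return nullptr;
        cur = c->second.get();
    }
    return cur;
}
```

Parent pointers make "go to parent" trivial, and nothing in the node type knows about separators.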
On Sat, Mar 23, 2013 at 2:42 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
[...] I have never seen a file system that wasn't, really, a tree with branches and leaves. It can have more than one base, but ultimately, it's a tree.
It is called a "forest". But filesystems aren't really forests. For starters, they are very often DAGs (you can hardlink the leaves). Much software assumes that a path is a unique identifier of a resource and breaks on non-trees. If people did not make such an assumption, we could very well allow arbitrary graphs. The point is that there is no *a priori* reason that file systems are DAGs; it is the *result* of thinking of filesystems as trees of paths.
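The hard-link point can be made concrete with a toy model in which directory entries map names to node ids (loosely analogous to inodes): two different paths then resolve to the same node, so a path is not a unique identifier of a resource. `toy_fs` and its members are hypothetical and purely in-memory.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy filesystem: every node has an integer id; each directory maps names
// to ids. Two entries may name the same id, as a hard link does.
struct toy_fs {
    std::map<int, std::map<std::string, int>> dirs; // dir id -> entries
    int root = 0;

    void link(int dir, const std::string& name, int target)
    {
        dirs[dir][name] = target;
    }

    // Follow names from the root; empty optional if any step is missing.
    std::optional<int> resolve(const std::vector<std::string>& names) const
    {
        int cur = root;
        for (const auto& n : names) {
            auto d = dirs.find(cur);
            if (d == dirs.end()) return std::nullopt;
            auto e = d->second.find(n);
            if (e == d->second.end()) return std::nullopt;
            cur = e->second;
        }
        return cur;
    }
};
```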
As such, it feels to me like a more natural way to work with the file system is through an object that represents trees, and functions that help you work with the tree, or represent a particular path on the tree as a generic string, with a particular separator character (or characters).
What are the separator characters in c:\x.txt, c:x.txt, or SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:8? I claim that the "paths are strings" mindset is too simplistic, narrow-minded and useless for defining path arithmetic. It is still true that we *do* want to represent paths as strings, and actually a library that works with std::strings has its own right to exist. However, any paths library shall not have the word "separator" in its documentation for anything other than the platform-specific parts.
"SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:7" / "[OTHER]DOCUMENT.TXT:8" == "SYS$SYSDEVICE:[USER.DOCS.OTHER]DOCUMENT.TXT:8"
With that approach, you could also represent the Windows registry using the same objects (just a different backend to handle the obviously different API calls). Or SNMP OIDs. Or a web site's paths.
Not really, the syntax is different. -- Yakov
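The c:x.txt case shows why a plain split on a separator character is not enough: splitting on '\' leaves the drive fused to the file name, while a structured parse keeps the root name, root directory and relative components apart. A simplified sketch that handles only drive-letter roots (no UNC or device paths); `naive_split`, `win_parts` and `parse_win` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Naive approach: split on one separator character only.
std::vector<std::string> naive_split(const std::string& s, char sep)
{
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, sep))
        if (!item.empty()) out.push_back(item);
    return out;
}

// Structured view of a (simplified) Windows path.
struct win_parts {
    std::string root_name;          // e.g. "c:"
    bool root_directory = false;    // true if a separator follows the root name
    std::vector<std::string> names; // relative components
};

win_parts parse_win(const std::string& s)
{
    win_parts p;
    std::size_t i = 0;
    if (s.size() >= 2 && std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':') {
        p.root_name = s.substr(0, 2);
        i = 2;
    }
    if (i < s.size() && (s[i] == '\\' || s[i] == '/')) {
        p.root_directory = true;
        ++i;
    }
    std::string cur;
    for (; i < s.size(); ++i) {
        if (s[i] == '\\' || s[i] == '/') {
            if (!cur.empty()) p.names.push_back(cur);
            cur.clear();
        } else {
            cur += s[i];
        }
    }
    if (!cur.empty()) p.names.push_back(cur);
    return p;
}
```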
Yakov Galka wrote:
On Sat, Mar 23, 2013 at 2:42 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
[...] I have never seen a file system that wasn't, really, a tree with branches and leaves. It can have more than one base, but ultimately, it's a tree.
It is called a "forest". But filesystems aren't really forests. For starters, they are very often DAGs (you can hardlink the leaves). Much software assumes that a path is a unique identifier of a resource and breaks on non-trees. If people did not make such an assumption, we could very well allow arbitrary graphs. The point is that there is no *a priori* reason that file systems are DAGs; it is the *result* of thinking of filesystems as trees of paths.
They can quite easily become non-DAGs with junction points / symlinks being able to create cycles.
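Cycles created by symlinks or junctions are why path resolvers bound how many links they will follow; POSIX systems report ELOOP when that limit is exceeded. A toy sketch in which a link simply maps one name to another; `resolve_links` and its default limit of 8 are illustrative assumptions, not any operating system's actual values.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy symlink table: name -> target name. Names absent from the table
// are treated as ordinary files or directories.
using link_table = std::map<std::string, std::string>;

// Follow links from `name`, following at most `max_hops` of them.
// Returns the final name, or nullopt if the limit is hit (likely a cycle).
std::optional<std::string> resolve_links(const link_table& links,
                                         std::string name,
                                         int max_hops = 8)
{
    for (int hops = 0; ; ++hops) {
        auto it = links.find(name);
        if (it == links.end()) return name; // not a link: resolved
        if (hops == max_hops) return std::nullopt;
        name = it->second;
    }
}
```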
On Sat, Mar 23, 2013 at 7:34 PM, Michael Marcin
They can quite easily become non-DAGs with junction points / symlinks being able to create cycles.
I absolutely agree with you. I did not include them in my post because usually people treat them as second-class citizens, and in some sense they are such. -- Yakov
On Sat, Mar 23, 2013 at 1:34 PM, Michael Marcin
Yakov Galka wrote:
On Sat, Mar 23, 2013 at 2:42 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
I have never seen a file system that wasn't, really, a tree with branches and leaves. It can have more than one base, but ultimately, it's a tree.
It is called a "forest". But filesystems aren't really forests. For starters, they are very often DAGs (you can hardlink the leaves). Much software assumes that a path is a unique identifier of a resource and breaks on non-trees. If people did not make such an assumption, we could very well allow arbitrary graphs. The point is that there is no *a priori* reason that file systems are DAGs; it is the *result* of thinking of filesystems as trees of paths.
They can quite easily become non-DAGs with junction points / symlinks being able to create cycles.
Ah, yes, I had forgotten about hard or soft links (nod to Yakov Galka for this as well). I suppose those do need to be considered in some way if you wish to do more interesting things with this than the common use case. Still, in practice, people do not generally type in the underlying identifier for a node in the file system. While the system of using trees (or a forest, to include root nodes) to represent the file system might be broken, it is a dominant pattern. Resisting this pattern would most likely yield something that is not intuitive, despite the existence of items that do not fit the tree pattern. Even most APIs within operating systems that expose a file system use tree-like commands to work with the file system, with ways of going into subdirectories or parent directories, moving branches or leaves around, deleting said items, viewing the branches or leaves within a particular branch or root, etc. While I won't claim to be an expert on every file system in existence, I suspect most operating systems expose commands that work as if the underlying structure is a tree, despite the file system acting more like a graph, with the exception of the occasional command that links or unlinks to another node. At the very least, if I were on a command prompt, I wouldn't expect to find any command that allowed me to perform bulk operations within the file system with the mind-set of working with a graph instead of a tree. For example, a command to find a particular file tends to have operations that look within sub-folders, not sub-graphs. And, yeah, I see where that leads to problems occasionally (when the file system forms a cycle due to how the nodes are arranged), but I also see where most people seem to wave that off as, 'oh well, that's to be expected if you create cycles in your file system'.
I would offer that if you want to have a representation that uses a graph, you would still need to provide commands that match what you see in the operating system's api for working with the file system, which kind of brings you back to at least imitating a tree, rightly or wrongly, if you want the library to remain intuitive. - Trey
On Sat, Mar 23, 2013 at 9:15 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
On Sat, Mar 23, 2013 at 1:34 PM, Michael Marcin
wrote: [...] I would offer that if you want to have a representation that uses a graph, you would still need to provide commands that match what you see in the operating system's api for working with the file system, which kind of brings you back to at least imitating a tree, rightly or wrongly, if you want the library to remain intuitive.
Even if the FS were an arbitrary graph, it would not mean that we must drop the path concept. The distinction between paths and FS graphs is important, and many people get it wrong. One key observation is that a path may have a meaning without an underlying filesystem. For example, I may say "C:..\a.txt" and it means "the file named 'a.txt' in the parent directory of the current working directory of drive C". The above string is a sequence of *instructions* for locating a resource, which may not even exist. Some systems do expose those nodes in one way or another. In UNIX we have inodes, which I do not know much about. In Windows, NtCreateFile actually allows one to open a path relative to a previously opened handle. Now, it is not important that this is not the API one usually uses. What does matter is that this is what we mean when we write a path, and that it is what actually happens behind the scenes: the operating system parses the path into a sequence of path elements, each of which tells it which node of the graph to go to next. This is the abstraction paths represent, and not "a string with some separators". This is where one should start. Some may ask how the way we look at paths changes the interface of a path library if we are going to store them in strings anyway. For an answer, consider the definition of operator / in current boost filesystem, which is defined syntactically and gives "c:" / "a" == "c:\a". Contrast it with a higher-level one: a concatenation of the sequences of instructions, which would give "c:" / "a" == "c:a". For some more examples of bugs related to the syntactic definition of boost filesystem, see the thread I linked earlier. -- Yakov
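The "sequence of instructions" view can be sketched by storing a path as an optional drive, a from-root flag and a list of steps, so that operator/ concatenates instruction lists rather than strings; under that definition "c:" / "a" renders as c:a. This is a hypothetical model of the idea above, not Boost.Filesystem's implementation, and it ignores UNC and device paths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction-based Windows path (simplified: drive letters only).
struct instr_path {
    std::string drive;              // e.g. "c:"; empty means "current drive"
    bool from_root = false;         // start at the drive's root, not its cwd
    std::vector<std::string> steps; // each step is ".." or a child name

    // Concatenate the instruction sequences. A right-hand side that names a
    // drive restarts the walk entirely; one that starts at a root restarts
    // it on the left-hand side's drive.
    friend instr_path operator/(instr_path lhs, const instr_path& rhs)
    {
        if (!rhs.drive.empty()) return rhs;
        if (rhs.from_root) {
            lhs.from_root = true;
            lhs.steps = rhs.steps;
            return lhs;
        }
        lhs.steps.insert(lhs.steps.end(), rhs.steps.begin(), rhs.steps.end());
        return lhs;
    }

    // Render the instructions in Windows syntax.
    std::string str() const
    {
        std::string s = drive;
        if (from_root) s += '\\';
        for (std::size_t i = 0; i < steps.size(); ++i) {
            if (i != 0) s += '\\';
            s += steps[i];
        }
        return s;
    }
};
```

The separator only appears in str(); concatenation never has to decide whether to insert one.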
On Sat, Mar 23, 2013 at 12:36 PM, Yakov Galka
On Sat, Mar 23, 2013 at 2:42 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
[...]
As such, it feels to me like a more natural way to work with the file
system is through an object that represents trees, and functions that help
you work with the tree, or represent a particular path on the tree as a generic string, with a particular separator character (or characters).
What are the separator characters in c:\x.txt, c:x.txt, or SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:8?
* C:\x.txt: ':' for root, '\' for branches (and between branches and leaves).
* C:x.txt: ':' for root, leading right into a leaf that would be wherever your sense of 'current directory' should be. If someone entered this, you'd want to expand it to something like C:\my\current\folder\x.txt before working with it, unless you want a sense of 'current folder'. But that raises an interesting point I hadn't really considered... paths identified by a name, like Windows' %WINDOW% folder, etc. I don't know how often other operating systems use specialized folders that are identified by a particular name, but Windows has several folder locations that one might identify with a particular name, even beyond expanding an environment variable (e.g. the Program Files folder). Yeah, this case requires some thought.
* SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:8: ':' for root, '.' for folders, with [] characters surrounding the branches, followed by the leaf. In VMS, the leaf has an added attribute (I never discussed the attributes of a leaf) of a revision in addition to an extension, which makes that file system a tad more unique. Good ol' VMS.
At the end of the day, in at least two of the three cases, the representation of the path is unique to the system it's on, and could be treated with functions specific to those representations. I'm suggesting, I suppose, a separation between a path's string representation and how the path is organized in memory. This could make it possible to create a VMS-style string representation of a Windows file system path, as silly as that might seem.
I claim that the "paths are strings" mindset is too simplistic, narrow minded and useless for defining path arithmetic. It is still true that we *do* want to represent paths as strings, and actually a library that would work with std::strings has its own right for existence. However, any paths library shall not have the word "separator" in its documentation for anything other than the platform specific parts.
I agree with this. The underlying representation should be more sophisticated. But we still need a representation for a particular path as a string, or such a library isn't very useful for most cases. Manipulate a path as if it were something other than a string, and when you've finished manipulating it, call a function to expand it into a string (perhaps for use in an API that depends upon such a string, or for storage, or whatever).
With that approach, you could also represent the Windows registry using the same objects (just a different backend to handle the obviously different API calls). Or SNMP OIDs. Or a web site's paths.
Not really, the syntax is different.
I'm not sure I see what you mean here. Perhaps I didn't express my thoughts well. That wouldn't be a surprise, since I was writing with brevity in mind and not clarity. I'm observing that the operating system provides an API that allows you to traverse a file system in a manner like working with a tree (or forest if you prefer). Similarly, the Windows operating system provides an API that allows you to traverse the registry in a manner like working with a tree. You can go to a parent, you have base nodes, and you have branches and leaves (just with different names... a file in the file system is analogous to a registry item in the registry). I'm observing that several systems seem to use this paradigm, and there's a utility in abstracting it. Interestingly, you can represent a particular path in a file system using a string, and you can represent a particular path in the Windows registry using a string. Note that I mean the path to the item, and not the item itself... obviously you'd need more than a string to represent a particular file within the file system, heh. I think for many use cases where the aforementioned paradigm is used, a string representation for a specific node within the system also exists. If you build individual libraries that map a file system's API to a library's API (a bridge between the two, if you will), you could build a library with eligible library backends that abstract the system API calls to make them common. And you could build a library that takes the underlying tree-like path object and represents it as a string in some format. So, let's say we had a library with a kind of generic path object with a function 'branches()'. That function could return a set of paths that represent other branches found within that particular path object (if any... yes, it should be possible for the path you have to not exist, in which case perhaps it could be created, but you can't find branches or leaves).
So, if the path represented a file system path, you'd see other folders. If it represented a key in the Windows registry, it would show other keys. Now, if you wanted to take a particular path object and represent it as a string, you might call a function that takes the path as an argument, and it spits out a string representation of that path that is unique to how that function builds strings from paths. So, it could represent the path with the POSIX-style forward-slash separator, or the VMS-style root:[folder]file format, or even something else like a root:folder.folder.file format. Since the function that builds strings is simply calling an established set of functions within the generic 'path'-like object, it can do this regardless of whether the path represents a Windows file system, the Windows registry, a VMS file system, or an SNMP OID. This is what I wanted to suggest. - Trey
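The separation between a path's structure and its string form could look roughly like this: one component list and several rendering functions. `generic_path`, `to_posix`, `to_windows` and `to_vms_like` are hypothetical; the VMS-like output follows the root:[folder.folder]file shape described above and is not a full VMS file-specification implementation (no version numbers, no relative directories).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Backend-neutral path: a root plus branch names and an optional leaf.
struct generic_path {
    std::string root;                  // e.g. "SYS$SYSDEVICE", "C:" or ""
    std::vector<std::string> branches; // folders, registry keys, ...
    std::string leaf;                  // file name; may be empty
};

// POSIX-style rendering: "/" before every element. The root name is
// dropped because POSIX paths have none.
std::string to_posix(const generic_path& p)
{
    std::string s;
    for (const auto& b : p.branches) s += "/" + b;
    if (!p.leaf.empty()) s += "/" + p.leaf;
    return s.empty() ? "/" : s;
}

// Windows-style rendering: root\branch\...\leaf
std::string to_windows(const generic_path& p)
{
    std::string s = p.root + "\\";
    for (const auto& b : p.branches) s += b + "\\";
    return s + p.leaf;
}

// VMS-like rendering: root:[branch.branch]leaf
std::string to_vms_like(const generic_path& p)
{
    std::string s = p.root.empty() ? "" : p.root + ":";
    s += "[";
    for (std::size_t i = 0; i < p.branches.size(); ++i) {
        if (i != 0) s += ".";
        s += p.branches[i];
    }
    return s + "]" + p.leaf;
}
```

Because the renderers only read the component list, adding another output syntax does not touch the path type.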
On Sat, Mar 23, 2013 at 9:55 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
On Sat, Mar 23, 2013 at 12:36 PM, Yakov Galka
wrote:
[...]
What are the separator characters in c:\x.txt, c:x.txt, or SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:8?
* C:\x.txt: : for root, \ for branches (and branches between leaves). * C:x.txt: : for root, leading right into a leaf that would be wherever your sense of 'current directory' should be.
Whatever definition you give here, it won't let you parse the path with a simple split() or concatenate it with a simple join(). Therefore all such definitions would be useless, and thus I see no point in talking about "separators" in a portable path library.
If someone entered this, you'd want to expand this to something like C:\my\current\folder\x.txt before working with it, unless you want a sense of 'current folder'.
But in order to do this you must have a 'current directory', which means that you must query the filesystem. A path may point to a device that does not yet exist, like a drive that will be mounted later. The expansion must be done later, when you intend to *resolve* the path.
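Deferring the expansion until resolution might look like this: the path stays in its drive-relative form, and the per-drive current directories are supplied only when resolving. The `drive_cwds` map stands in for process state that Windows would normally provide; `resolve_drive_relative` is hypothetical and compares drive letters case-sensitively, unlike Windows.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Per-drive current directories, e.g. {"c:", "c:\\my\\current\\folder"}.
// On real Windows this state lives in the process; here it is passed in.
using drive_cwds = std::map<std::string, std::string>;

// Expand a drive-relative path such as "c:x.txt" at resolution time.
// Returns nullopt when the drive has no known current directory
// (for example, a drive that is not mounted yet).
std::optional<std::string> resolve_drive_relative(const std::string& p,
                                                  const drive_cwds& cwds)
{
    bool drive_relative = p.size() >= 2 &&
                          std::isalpha(static_cast<unsigned char>(p[0])) &&
                          p[1] == ':' &&
                          (p.size() == 2 || (p[2] != '\\' && p[2] != '/'));
    if (!drive_relative) return p; // absolute, or not drive-based at all
    auto it = cwds.find(p.substr(0, 2));
    if (it == cwds.end()) return std::nullopt;
    std::string rest = p.substr(2);
    return rest.empty() ? it->second : it->second + "\\" + rest;
}
```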
But that raises an interesting point I hadn't really considered... paths identified by a name, like Windows' %WINDOW% folder, etc. [...]
These are environment variables and I don't think mixing them with a filesystem library is a good idea. This would mix unrelated concepts. [...]
I claim that the "paths are strings" mindset is too simplistic, narrow minded and useless for defining path arithmetic. It is still true that we *do* want to represent paths as strings, and actually a library that would work with std::strings has its own right for existence. However, any paths library shall not have the word "separator" in its documentation for anything other than the platform specific parts.
I agree with this. The underlying representation should be more sophisticated.
I haven't said this. The underlying representation may or may not be a string. What I say is that the interface shall not be defined syntactically.
Manipulate a path as if it were something other than a string, and when you've finished manipulating it, call a function to expand it into a string (perhaps for use in an API that depends upon such a string, or for storage, or whatever). [...]
Exactly.
I'm observing that the operating system provides an API that allows you to traverse a file system in a manner like working with a tree (or forest if you prefer). [...] This is what I wanted to suggest.
I agree in general. What you describe is very similar to [Poco::Path](http://pocoproject.org/slides/080-Files.pdf), which seems to get many things right that boost didn't. -- Yakov
On Sat, Mar 23, 2013 at 4:42 PM, Yakov Galka
On Sat, Mar 23, 2013 at 9:55 PM, Joseph Van Riper < fleeb.fantastique@gmail.com> wrote:
On Sat, Mar 23, 2013 at 12:36 PM, Yakov Galka
wrote:
[...]
What are the separator characters in c:\x.txt, c:x.txt, or SYS$SYSDEVICE:[USER.DOCS]PHOTO.JPG:8?
* C:\x.txt: : for root, \ for branches (and branches between leaves). * C:x.txt: : for root, leading right into a leaf that would be wherever your sense of 'current directory' should be.
Whatever definition you give here, it won't let you parse the path with a simple split() or concatenate it with a simple join(). Therefore all such definitions would be useless, and thus I see no point in talking about "separators" in a portable path library.
To be honest, I think we're actually thinking the same thing here, in general. I was describing what it would take to create a string out of the underlying objects, but you're describing the effort it takes to convert a string into the underlying objects, which I hadn't covered in my e-mail. It is an important problem, though. I should examine your link... it sounds intriguing. - Trey
On Sat, Mar 23, 2013 at 8:01 AM, Alexander Lamaison
... Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here?
Probably not since the C++ standard committee is so far along with a Filesystem Technical Specification (TS). Once that ships (possibly later this year), Boost.Filesystem will be brought into sync with the TS. That doesn't involve much functional change, but it will cause a major update to the Boost.Filesystem reference documentation.
I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process. For example, my top two issues are: - unclear generic/native path handling
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3505.html The committee's Filesystem Study Group put a lot of effort into clarifying the class path specs.
- methods returning a 'path' for stuff that isn't a path but just needs a unicode string
Support for C++11 u16string and u32string will make that a bit easier, but it is really a misuse of class path. The real fix is improved string interoperability, and there is work going on separate from Boost.Filesystem to address that. -Beman
Beman Dawes
On Sat, Mar 23, 2013 at 8:01 AM, Alexander Lamaison
wrote: ... Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here?
Probably not since the C++ standard committee is so far along with a Filesystem Technical Specification (TS). Once that ships (possibly later this year), Boost.Filesystem will be brought into sync with the TS. That doesn't involve much functional change, but it will cause a major update to the Boost.Filesystem reference documentation.
I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process. For example, my top two issues are: - unclear generic/native path handling
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3505.html
The committee's Filesystem Study Group put a lot of effort into clarifying the class path specs.
I'll read that with interest. Is that why there wasn't much discussion here of the changes that went into filesystem v3: because it was happening outside Boost? Out of interest, is there a way us laypeople can contribute to the TS discussion?
- methods returning a 'path' for stuff that isn't a path but just needs a unicode string
Support for C++11 u16string and u32string will make that a bit easier, but it is really a misuse of class path. The real fix is improved string interoperability, and there is work going on separate from Boost.Filesystem to address that.
Good. Where? Because the discussion of a Boost Unicode string died. Several times over. Alex
On Sat, Mar 23, 2013 at 2:01 PM, Alexander Lamaison
[...] Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here? I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process. For example, my top two issues are: - unclear generic/native path handling - methods returning a 'path' for stuff that isn't a path but just needs a unicode string
Perhaps people can reply to this thread with any gripes they have?
See a related thread of things that boost.filesystem got wrong: http://boost.2283326.n4.nabble.com/boost-filesystem-path-frustration-td46417... -- Yakov
Yakov Galka wrote:
On Sat, Mar 23, 2013 at 2:01 PM, Alexander Lamaison
wrote: [...] Is it time for a Boost.Filesystem v4 to result from an in-depth discussion on here? I'd hope it would take the best of v3 while removing some of the hurt it introduced in the process. For example, my top two issues are: - unclear generic/native path handling - methods returning a 'path' for stuff that isn't a path but just needs a unicode string
Perhaps people can reply to this thread with any gripes they have?
See a related thread of things that boost.filesystem got wrong:
http://boost.2283326.n4.nabble.com/boost-filesystem-path-frustration-td46417...
I quite love the idea from that thread of making filesystem a first class abstraction. Although if I'm using a posix_filesystem object on windows to manipulate posix paths I obviously can't call operating system routines to do the work. And in this situation I imagine I couldn't open the file without converting the posix_filesystem::path to a windows_filesystem::path. Would this necessitate 2 implementations of posix_filesystem? One for when it is the native filesystem and one for when it isn't? I'd also like to see more ideas about how a virtual filesystem object would work. Sounds similar to PhysicsFS. http://icculus.org/physfs/
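The posix_filesystem / windows_filesystem question could be approached by parameterising a path on a syntax policy, so that manipulating POSIX paths on Windows never involves OS calls and converting between syntaxes is an explicit step. A minimal sketch; `posix_syntax`, `windows_syntax`, `basic_path` and `convert` are hypothetical and only model separators, not root names or escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical syntax policies: only the separator differs in this sketch.
struct posix_syntax   { static constexpr char separator = '/'; };
struct windows_syntax { static constexpr char separator = '\\'; };

// Components plus an absolute flag; the policy is used only when rendering,
// so nothing here calls into the operating system.
template <class Syntax>
struct basic_path {
    bool absolute = false;
    std::vector<std::string> parts;

    basic_path& operator/=(const std::string& name)
    {
        parts.push_back(name);
        return *this;
    }

    std::string str() const
    {
        std::string s = absolute ? std::string(1, Syntax::separator) : "";
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i != 0) s += Syntax::separator;
            s += parts[i];
        }
        return s;
    }
};

// Explicit conversion between syntaxes: same components, new rendering.
template <class To, class From>
basic_path<To> convert(const basic_path<From>& p)
{
    return basic_path<To>{p.absolute, p.parts};
}
```

Opening a file would then only be offered for the native syntax, which is one possible answer to the "two implementations" question.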
On Sat, Mar 16, 2013 at 12:54 PM, Alexander Lamaison
I'm finally getting round to moving to Filesystem v3 and now my code is breaking all over the place. The cause is the output of string() on Windows which has changed behaviour.
I'm manipulating Unix paths for use over SFTP, but doing so on Windows. For instance I might want to append "c" to the Unix path "/a/b".
path p("/a/b"); p /= "c"; cout << p.string();
In version 2 this would output "/a/b/c" but now it produces "/a/b\c". Why the breaking change?
IIRC, it was partially requests from users and partially the realization that most users want platform independent syntax but platform dependent semantics.
If you would like your example to append a slash, use the (relatively new) path concatenation operator +=:
p += "/c";
If you would rather continue to use operator /= then change the output to
cout << p.generic_string();
HTH, --Beman
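Both suggestions can be tried with std::filesystem::path, which grew out of the Boost.Filesystem v3 design via the Filesystem TS and has the same operator+=, operator/= and generic_string() members; it is used here only so the sketch builds without Boost. With Boost, the same calls would be made on boost::filesystem::path. The helper names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Option 1: concatenate with += and supply the '/' yourself.
// += appends characters verbatim and never inserts a separator.
std::string append_with_concat()
{
    fs::path p("/a/b");
    p += "/c";
    return p.string();
}

// Option 2: keep /= (which inserts the preferred separator, '\\' on Windows)
// and ask for the generic format, which uses '/' on every platform.
std::string append_with_slash_generic()
{
    fs::path p("/a/b");
    p /= "c";
    return p.generic_string();
}
```

On Windows, p.string() after p /= "c" would give the mixed "/a/b\c" form discussed above, while both helpers return "/a/b/c"; on POSIX systems all three agree.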
Beman Dawes
On Sat, Mar 16, 2013 at 12:54 PM, Alexander Lamaison
wrote: I'm finally getting round to moving to Filesystem v3 and now my code is breaking all over the place. The cause is the output of string() on Windows which has changed behaviour.
I'm manipulating Unix paths for use over SFTP, but doing so on Windows. For instance I might want to append "c" to the Unix path "/a/b".
path p("/a/b"); p /= "c"; cout << p.string();
In version 2 this would output "/a/b/c" but now it produces "/a/b\c". Why the breaking change?
IIRC, it was partially requests from users and partially the realization that most users want platform independent syntax but platform dependent semantics.
In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds. But, the biggest issue is that the change wasn't documented. The docs make a big deal of the change from templated paths to a single path class, but make no mention of this, more significant, difference.
If you would rather continue to use operator /= then change the output to
cout << p.generic_string();
I've since discovered this string/generic_string/native triplet. I don't think it's the right solution. string() should either return the generic string or the native string. What it returns at the moment is confusing and not very useful. Or is there a use-case I'm not seeing? Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On Mar 23, 2013, at 12:13 PM, Alexander Lamaison
Beman Dawes
writes: On Sat, Mar 16, 2013 at 12:54 PM, Alexander Lamaison
wrote: I'm manipulating Unix paths for use over SFTP, but doing so on Windows. For instance I might want to append "c" to the Unix path "/a/b".
path p("/a/b"); p /= "c"; cout << p.string();
In version 2 this would output "/a/b/c" but now it produces "/a/b\c". Why the breaking change?
IIRC, it was partially requests from users and partially the realization that most users want platform independent syntax but platform dependent semantics.
In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line path p("/a/b"); to parse your string into "a" / "b" when it merely stores the string as you gave it. Otherwise, when you add this line p /= "c"; you expect the previous separator to be used rather than the native separator it uses.
But, the biggest issue is that the change wasn't documented. The docs make a big deal of the change from templated paths to a single path class, but make no mention of this, more significant, difference.
I can't imagine that Beman would reject a documentation patch.
If you would rather continue to use operator /= then change the output to
cout << p.generic_string();
I've since discovered this string/generic_string/native triplet. I don't think it's the right solution. string() should either return the generic string or the native string. What it returns at the moment is confusing and not very useful. Or is there a use-case I'm not seeing?
string() is returning the contents, as you instructed path to form it. Calling generic_string() means parse the contents and ensure all separators follow the generic syntax. To do what you want would require path to be modal. Marking it, in this case, to do everything in the generic format first would have given the behavior you wanted. However, there is no such modality. / appends with the native separator. Construction from a string, or appending one, merely stores the supplied string. When you want a specific format after mixing those operations, call generic_string() or native_string(). HTH ___ Rob (Sent from my portable computation engine)
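Rob's description can be captured in a deliberately simplified model. This is not Boost's implementation. The class `stored_path` and its members are hypothetical, and the native separator is a constructor parameter so that the Windows behaviour can be reproduced on any host.

```cpp
#include <string>
#include <utility>

// Hypothetical model of the v3 behaviour described above, NOT Boost's
// implementation: the path keeps its characters exactly as supplied,
// operator/= inserts the native separator, and only generic_string()
// normalizes separators.
class stored_path {
public:
    // native_sep is '\\' to imitate Windows or '/' to imitate POSIX.
    stored_path(std::string s, char native_sep)
        : text_(std::move(s)), sep_(native_sep) {}

    stored_path& operator/=(const std::string& leaf) {
        // Add the native separator unless the text already ends in one.
        bool ends_in_sep = !text_.empty() &&
            (text_.back() == '/' || text_.back() == sep_);
        if (!text_.empty() && !ends_in_sep) text_ += sep_;
        text_ += leaf;
        return *this;
    }

    // "As assembled": returned without any conversion.
    const std::string& string() const { return text_; }

    // Generic format: on a backslash platform, '\' becomes '/'.
    std::string generic_string() const {
        std::string out = text_;
        if (sep_ == '\\') {
            for (char& c : out) {
                if (c == '\\') c = '/';
            }
        }
        return out;
    }

private:
    std::string text_;
    char sep_;
};
```

In this model, stored_path("/a/b", '\\') followed by /= "c" stores "/a/b\c". That is the mixed-separator string from the original question, and generic_string() is the only accessor that cleans it up.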
Rob Stewart
On Mar 23, 2013, at 12:13 PM, Alexander Lamaison
wrote: Beman Dawes
writes: On Sat, Mar 16, 2013 at 12:54 PM, Alexander Lamaison
wrote: I'm manipulating Unix paths for use over SFTP, but doing so on Windows. For instance I might want to append "c" to the Unix path "/a/b".
path p("/a/b"); p /= "c"; cout << p.string();
In version 2 this would output "/a/b/c" but now it produces "/a/b\c". Why the breaking change?
IIRC, it was partially requests from users and partially the realization that most users want platform independent syntax but platform dependent semantics.
In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line
path p("/a/b");
to parse your string into "a" / "b" when it merely stores the string as you gave it.
That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:

Enter path: a/b
string(): a/b
generic_string(): a/b
native(): a/b
Segments: "a" "b"

Enter path: a\b
string(): a\b
generic_string(): a/b
native(): a\b
Segments: "a" "b"

Enter path: a/b\c
string(): a/b\c
generic_string(): a/b/c
native(): a/b\c
Segments: "a" "b" "c"
Otherwise, when you add this line
p /= "c";
you expect the previous separator to be used rather than the native separator it uses.
I expect the path to abstract over the separators used and, when asked for the path as a string, return something consistent. It doesn't matter if that means always using the native separator, as long as it doesn't use both.
But, the biggest issue is that the change wasn't documented. The docs make a big deal of the change from templated paths to a single path class, but make no mention of this, more significant, difference.
I can't imagine that Beman would reject a documentation patch.
I'll fix the documentation once I understand the reasoning.
If you would rather continue to use operator /= then change the output to
cout << p.generic_string();
I've since discovered this string/generic_string/native triplet. I don't think it's the right solution. string() should either return the generic string or the native string. What it returns at the moment is confusing and not very useful. Or is there a use-case I'm not seeing?
string() is returning the contents, as you instructed path to form it. Calling generic_string() means parse the contents and ensure all separators follow the generic syntax.
Constructing a path from "a/b" isn't an instruction to the library to construct a path with forward slashes. It's an instruction to create a path abstraction with two segments, the first "a" and the second "b". Separators shouldn't even come into it until a string representation of the path is requested, and the string conversion methods should make explicit what separator to use: generic or native.
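Alex's proposal is a path held as a list of segments, with the separator chosen only when a string is requested. It could look something like the sketch below. Everything here is hypothetical (`segment_path`, `to_string`, the single-character separator parameter). The sketch ignores root names such as "C:" and other platform subtleties. Accepting both separators on input mirrors how Windows parses paths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical segment-based path: separators are consumed on input and
// chosen explicitly on output, so the result never mixes '/' and '\'.
class segment_path {
public:
    // Accepts either separator on input; a leading separator marks the
    // path as absolute. Empty segments (as in "a//b") are dropped.
    explicit segment_path(const std::string& s) {
        absolute_ = !s.empty() && (s[0] == '/' || s[0] == '\\');
        std::string cur;
        for (char c : s) {
            if (c == '/' || c == '\\') {
                if (!cur.empty()) segments_.push_back(cur);
                cur.clear();
            } else {
                cur += c;
            }
        }
        if (!cur.empty()) segments_.push_back(cur);
    }

    // Appends one segment; leaf is assumed to contain no separators.
    segment_path& operator/=(const std::string& leaf) {
        segments_.push_back(leaf);
        return *this;
    }

    // The caller states which separator to use: generic ('/') or native.
    std::string to_string(char sep) const {
        std::string out = absolute_ ? std::string(1, sep) : std::string();
        for (std::size_t i = 0; i < segments_.size(); ++i) {
            if (i != 0) out += sep;
            out += segments_[i];
        }
        return out;
    }

    const std::vector<std::string>& segments() const { return segments_; }

private:
    bool absolute_ = false;
    std::vector<std::string> segments_;
};
```

With this design, segment_path("/a/b") followed by /= "c" renders as "/a/b/c" or "\a\b\c" depending only on the argument to to_string(). The cost is the extra allocation per segment that Rob mentions later in the thread.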
To do what you want would require path to be modal.
I don't understand what you mean here.
Marking it, in this case, to do everything in the generic format first would have given the behavior you wanted. However, there is no such modality. / appends with the native separator. Construction from a string, or appending one, merely stores the supplied string. When you want a specific format after mixing those operations, call generic_string() or native_string().
Again, this isn't true. / appends a segment. Separators are irrelevant
to a path abstraction. In version 2 this was the case. v3 muddles
it by giving separators extra significance.
Alex
#define BOOST_FILESYSTEM_VERSION 3
#include
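The archive cuts off the test program after its first two lines. The sketch below is not Alex's program. It reproduces the same idea with C++17 std::filesystem instead of Boost, an assumption made so that it compiles without Boost. For a given input, it collects string(), generic_string() and the segments produced by iteration. native() is left out because its return type differs by platform.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Summary of how one input string is seen by the path class.
struct path_report {
    std::string as_given;               // p.string()
    std::string generic;                // p.generic_string()
    std::vector<std::string> segments;  // elements produced by iteration
};

path_report inspect(const std::string& input) {
    fs::path p(input);
    path_report r{p.string(), p.generic_string(), {}};
    for (const fs::path& element : p)
        r.segments.push_back(element.string());
    return r;
}
```

For "a/b", inspect() reports the string unchanged and two segments on any platform. For "a/b\c", one would expect three segments and a generic form of "a/b/c" on Windows, matching the output quoted above. On POSIX, a backslash is an ordinary filename character, so the result is two segments, "a" and "b\c".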
On Sun, Mar 24, 2013 at 8:03 PM, Alexander Lamaison
Rob Stewart
writes: [...] In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line
path p("/a/b");
to parse your string into "a" / "b" when it merely stores the string as you gave it.
That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:
Based on previous conversations with Beman, I think that what Rob Stewart means, and correct me if I'm wrong, is that one "feature" of the library is the assertion path(str).string() == str. In other words: boost::path is a very dumb strong typedef for a string that magically does encoding conversions and has some syntactic operations defined, like operator / that adds a *slash*. (...or a *backslash* on other platforms...) It seems that the designer of the library does not like the idea that path be a higher level platform independent abstraction of paths. As I've said many times, I see little use in the current path class, and I personally use UTF-8 std::strings everywhere with suitably defined operations. What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class. -- Yakov
Yakov Galka
On Sun, Mar 24, 2013 at 8:03 PM, Alexander Lamaison
wrote: Rob Stewart
writes: [...] In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line
path p("/a/b");
to parse your string into "a" / "b" when it merely stores the string as you gave it.
That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:
Based on previous conversations with Beman, I think that what Rob Stewart means, and correct me if I'm wrong, is that one "feature" of the library is the assertion path(str).string() == str.
It does now. It didn't use to, and I'm struggling to see what the benefit of the change was. After all, if all you want is the original string, why would you even use the path class?
In other words: boost::path is a very dumb strong typedef for a string that magically does encoding conversions and has some syntactic operations defined, like operator / that adds a *slash*. (...or a *backslash* on other platforms...)
It seems that the designer of the library does not like the idea that path be a higher level platform independent abstraction of paths. As I've said many times, I see little use in the current path class, and I personally use UTF-8 std::strings everywhere with suitably defined operations. What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
I wouldn't write off the path class entirely. It was what first got me into Boost all those years ago! But it could be better and some of the recent changes don't make sense to me. Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On Sun, Mar 24, 2013 at 9:12 PM, Alexander Lamaison
Yakov Galka
writes: [...] That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:
Based on previous conversations with Beman, I think that what Rob Stewart means, and correct me if I'm wrong, is that one "feature" of the library is the assertion path(str).string() == str.
It does now. It didn't use to, and I'm struggling to see what the benefit of the change was. After all, if all you want is the original string, why would you even use the path class?
These are exactly my thoughts..
[...]
It seems that the designer of the library does not like the idea that path be a higher level platform independent abstraction of paths. As I've said many times, I see little use in the current path class, and I personally use UTF-8 std::strings everywhere with suitably defined operations. What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
I wouldn't write off the path class entirely. It was what first got me into Boost all those years ago! But it could be better and some of the recent changes don't make sense to me.
I admit that a path class is a matter of preference. But this is why it is unfair that exists(const path &x) uses a path in its interface towards those who do not like using this class. -- Yakov
Yakov Galka
On Sun, Mar 24, 2013 at 9:12 PM, Alexander Lamaison
wrote: Yakov Galka
writes: [...] It seems that the designer of the library does not like the idea that path be a higher level platform independent abstraction of paths. As I've said many times, I see little use in the current path class, and I personally use UTF-8 std::strings everywhere with suitably defined operations. What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
I wouldn't write off the path class entirely. It was what first got me into Boost all those years ago! But it could be better and some of the recent changes don't make sense to me.
I admit that a path class is a matter of preference. But this is why it is unfair that exists(const path &x) uses a path in its interface towards those who do not like using this class.
You realise Boost.Filesystem actually works the way you describe? :-P Nothing forces you to use class path to benefit from the operation functions. Try this for example: assert(boost::filesystem::exists("c:\\")); Works as you expect. Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On Mon, Mar 25, 2013 at 2:32 PM, Alexander Lamaison
Yakov Galka
writes: [...] I admit that a path class is a matter of preference. But this is why it is unfair that exists(const path &x) uses a path in its interface towards those who do not like using this class.
You realise Boost.Filesystem actually works the way you describe? :-P Nothing forces you to use class path to benefit from the operation functions. Try this for example:
assert(boost::filesystem::exists("c:\\"));
Yes, of course. But on, say, POSIX platforms, you pay for yet another string allocation even if you don't care for the path. In addition some library functions return a path, so there are always places where the path class pops up. -- Yakov
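The implicit conversion Alex relies on, and the cost Yakov points out, look like this in C++17 std::filesystem. It is used here as a stand-in for Boost, whose exists() likewise takes a `const path&`. Below is a minimal sketch; `has_entry` is a hypothetical wrapper.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// exists() takes a const path&, so a std::string argument is first
// converted to a temporary fs::path. On POSIX that conversion copies the
// characters (the extra allocation mentioned above, unless the string is
// short enough for the small-string optimization); on Windows it also
// converts the narrow characters to the wide native encoding.
bool has_entry(const std::string& name) {
    return fs::exists(name);
}
```

The call site never mentions path. The path object still exists as a temporary, though, which is Yakov's point about paying for it anyway.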
On Mar 24, 2013, at 2:47 PM, Yakov Galka
On Sun, Mar 24, 2013 at 8:03 PM, Alexander Lamaison
wrote: Rob Stewart
writes: [...] In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line
path p("/a/b");
to parse your string into "a" / "b" when it merely stores the string as you gave it.
That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:
path doesn't do that parsing. The directory iterator extracts a native path.
Based on previous conversations with Beman, I think that what Rob Stewart means, and correct me if I'm wrong, is that one "feature" of the library is the assertion path(str).string() == str.
Yes
In other words: boost::path is a very dumb strong typedef for a string that magically does encoding conversions and has some syntactic operations defined, like operator / that adds a *slash*. (...or a *backslash* on other platforms...)
Yes, path is a glorified string class. A fair bit of its value will be lost when there's better Unicode string support in the standard library. That said, there's still value in an abstraction that permits assembling and decomposing paths. If path did the normalization work, on insertion, or managed a collection of components, the overhead would increase. The current design permits assembling pieces in different ways and then, when a final result is needed, in a particular form, it offers three ways of getting the path: as assembled (string), generic, and native.
What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
It would be reasonable to support overloads accepting strings, and not just paths. The current rationale, I think, is to overcome the lack of any other Unicode support. ___ Rob (Sent from my portable computation engine)
Rob Stewart
On Mar 24, 2013, at 2:47 PM, Yakov Galka
wrote: On Sun, Mar 24, 2013 at 8:03 PM, Alexander Lamaison
wrote: Rob Stewart
writes: [...] In that case I'd expect it to output "\a\b\c". I can't think of a reason why mixed slashes would ever be the right answer. It's the worst of both worlds.
You're expecting this line
path p("/a/b");
to parse your string into "a" / "b" when it merely stores the string as you gave it.
That's not true. It does parse the string and recognises "a" and "b" as separate segments of the path. If it didn't, iteration would return "a/b" followed by "c". I wrote a small program (included at the end) to prove this. Here is the output:
path doesn't do that parsing. The directory iterator extracts a native path.
Where exactly the parsing happens is an implementation detail. What matters is how class path 'thinks' about paths and, in many respects, class path abstracts away from separators. For example, path("a/b").filename() returns "b", not "a/b", on Windows. The problem is that part of the interface ignores separators while the other part makes them significant. It's a mess. One or the other, please.
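The decomposition functions do treat separators as structure. In C++17 std::filesystem, standing in for Boost here, filename() of "a/b" is "b" and parent_path() is "a". This holds on both POSIX and Windows, since '/' is a separator on both. Below is a minimal sketch with a hypothetical helper `split_last`.

```cpp
#include <filesystem>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Splits a path into (parent_path, filename) using the library's parser.
std::pair<std::string, std::string> split_last(const std::string& s) {
    fs::path p(s);
    return {p.parent_path().string(), p.filename().string()};
}
```

The same parser that splits "a/b" here leaves the stored text untouched when string() is called. That is the asymmetry being complained about.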
Based on previous conversations with Beman, I think that what Rob Stewart means, and correct me if I'm wrong, is that one "feature" of the library is the assertion path(str).string() == str.
Yes
In other words: boost::path is a very dumb strong typedef for a string that magically does encoding conversions and has some syntactic operations defined, like operator / that adds a *slash*. (...or a *backslash* on other platforms...)
Yes, path is a glorified string class. A fair bit of its value will be lost when there's better Unicode string support in the standard library. That said, there's still value in an abstraction that permits assembling and decomposing paths.
If path did the normalization work, on insertion, or managed a collection of components, the overhead would increase.
And yet that's exactly what it used to do. If this change has been made as an 'optimisation' we should be worried. Making an interface less clear for a hypothetical optimisation is the kind of thing that we learn not to do in software engineering 101.
The current design permits assembling pieces in different ways and then, when a final result is needed, in a particular form, it offers three ways of getting the path: as assembled (string), generic, and native.
Actually the truth is worse than that: native() doesn't convert the slashes to native format. It just allows a string to contain both generic and native separators, so now I don't understand the difference between string() and native() at all.
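One difference between string() and native() has nothing to do with separators: the return type. The sketch below assumes std::filesystem mirrors Boost.Filesystem v3 here. Both declare native() as returning the platform's string_type, which is std::wstring on Windows. The names `native_is_wide` and `native_length` are hypothetical.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// native() returns the stored representation as path::string_type:
// std::wstring on Windows, std::string on POSIX. string() always returns
// std::string, converting the encoding where necessary. Neither call
// rewrites separators; generic_string() is the accessor that does.
constexpr bool native_is_wide =
    std::is_same_v<fs::path::string_type, std::wstring>;

std::size_t native_length(const fs::path& p) { return p.native().size(); }
```

On POSIX, native() and string() therefore return the same characters. On Windows, they hold the same characters in different encodings, which explains why the two look interchangeable in output like Alex's.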
What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
It would be reasonable to support overloads accepting strings, and not just paths. The current rationale, I think, is to overcome the lack of any other Unicode support.
Not needed as this works already. See my other post. Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On Mon, Mar 25, 2013 at 2:46 PM, Alexander Lamaison
The current design permits assembling pieces in different ways and then, when a final result is needed, in a particular form, it offers three ways of getting the path: as assembled (string), generic, and native.
Actually the truth is worse than that: native() doesn't convert the slashes to native format. It just allows a string to contain both generic and native separators, so now I don't understand the difference between string() and native() at all.
A nice feature that could be supported if only people wanted path to represent the abstraction rather than the string representation, is that native() could return \\?\ long path syntax for long paths... This would improve much software too. -- Yakov
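Yakov's long-path idea refers to the Windows "\\?\" prefix. This prefix tells the Win32 file APIs to skip normalization, so the path must be absolute and must use backslashes only. Below is a minimal string-level sketch of what such a conversion might do. The function `to_long_path` is hypothetical. It only handles drive-letter paths such as "C:/x", and it does not handle UNC paths (which need the "\\?\UNC\" form) or "." and ".." elements.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts an absolute drive-letter path such as
// "C:/dir/file" into the extended-length form "\\?\C:\dir\file".
// Paths that do not start with "<letter>:" followed by a separator are
// returned unchanged.
std::string to_long_path(std::string p) {
    bool drive_absolute = p.size() >= 3 &&
        std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':' &&
        (p[2] == '/' || p[2] == '\\');
    if (!drive_absolute) return p;
    for (char& c : p) {
        // The prefixed form is passed through without normalization,
        // so forward slashes would not be treated as separators.
        if (c == '/') c = '\\';
    }
    return "\\\\?\\" + p;
}
```

A path class that stored an abstraction rather than the caller's string could apply this conversion in native() automatically. The current design, which returns the string as given, cannot do so.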
On Sun, Mar 24, 2013 at 11:51 PM, Rob Stewart
On Mar 24, 2013, at 2:47 PM, Yakov Galka
wrote: In other words: boost::path is a very dumb strong typedef for a string that magically does encoding conversions and has some syntactic operations defined, like operator / that adds a *slash*. (...or a *backslash* on other platforms...)
Yes, path is a glorified string class. A fair bit of its value will be lost when there's better Unicode string support in the standard library. That said, there's still value in an abstraction that permits assembling and decomposing paths.
Yes, but current path class does not abstract anything, as was pointed in the original post. [...]
What annoys me is that Boost.filesystem has a fairly good multiplatform implementation of filesystem operative functions, but which depends on this dumb path class.
It would be reasonable to support overloads accepting strings, and not just paths. The current rationale, I think, is to overcome the lack of any other Unicode support.
There are superior approaches for Unicode support: using UTF-8 narrow
chars.[1] Why superior? Because the type of path[n], path.c_str(),
path.string() etc. would not change from one system to the other. Portable
libraries hide the differences between platforms rather than propagating
them to the interfaces.
I admit that not everyone may agree on using narrow strings. But
then, why shove your approach onto those who don't like UTF-16 on Windows?
Boost.Filesystem v2 was much better in this respect too. If you like UTF-16
paths, use wpaths; if I like UTF-8 paths, then I would use (narrow) paths
with an appropriate locale embedded. And no, the current library interface does
not count: if I want to read a UTF-8 path from a database, do some
arithmetic work on it, and store it back, for some reason the author of the
idea jumped in the middle and decided that because I may potentially pass
this string to Windows (and in this case I do not) he must convert my
string to UTF-16 as soon as I gave it to him.... What nonsense. Why
can't I choose the exact type I want to use: path
On Mar 25, 2013, at 3:11 PM, Yakov Galka
Rob Stewart
On Mar 25, 2013, at 3:11 PM, Yakov Galka
wrote: [snip complaints]
I'm not a Boost.Filesystem expert. I tried to explain my understanding of why some things were how they are.
Of course. Someone has to make the counterargument and I'm glad you did. It really helps further the discussion.
I can tell you that Beman did not make changes in a vacuum. He regularly sought input from this list. If you weren't part of those discussions, then your opinions couldn't affect the outcome.
Now, given that you have such vociferous concerns with the design, I see several possible positive steps you can take:
• add Trac tickets for each of your concerns
https://svn.boost.org/trac/boost/ticket/8342
• write a drop-in replacement for what you think is broken and post it for review and Beman's consideration
• write an alternative library and submit it for review
You should know that an updated version of Boost.Filesystem is up for standardization. You could get involved now to try to influence what gets standardized before what you consider wrong is standardized and is harder to change.
That's why I'm worried (think of the, thankfully few, decisions in the STL that we wish we weren't stuck with), but it's not clear to me how mere mortals participate in that process. Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On 03/27/2013 11:47 AM, Alexander Lamaison wrote:
but it's not clear to me how mere mortals participate in that process.
I am not a committee member, but there is some nice information about that here: http://isocpp.org/std I think most members are mortals by the way. -- Bjørn
participants (8)
- Alexander Lamaison
- Beman Dawes
- Bjørn Roald
- Joseph Van Riper
- Lars Viklund
- Michael Marcin
- Rob Stewart
- Yakov Galka