[Filesystem] Proposal: make filesystem generic-programming friendly
Dear all,
I'm increasingly finding that I need to write programs that can operate over multiple kinds of filesystem, including the local filesystem, network filesystems such as (S)FTP, filesystems in archives such as zip files, and mock filesystems that exist only in memory for unit testing. This is not as easy as it could be. The difficulty is that the Boost.Filesystem API makes it hard to write generic code that uses the local filesystem as just one of several API-compatible filesystems.
Therefore, I've written a proposal for making a backward-compatible addition to the Boost.Filesystem API that would make it much more friendly to generic programming. It is a first draft and I welcome your comments, criticism and suggestions for improvement.
The proposal is available here: http://alamaison.github.io/2014/01/09/proposal-generic-filesystem/ and I've included a text-only version below.
Alex
Proposal: Generic filesystem API
================================
[FIRST DRAFT]
C++ streams provide a common interface to operate on file-like data, regardless of how that data is actually stored. However, programs often need to operate not just on data from different file sources, but also on entirely separate filesystems.
The problem
-----------
There is, currently, no way to operate on different filesystems generically, if Boost.Filesystem is to be one of those filesystems. The local-filesystem operations in the Boost.Filesystem API are difficult to call generically because they inadvertently defeat ADL, so generic code cannot resolve them to specific implementations.
For example, how do you write code that calls `path temp_directory_path();` and can operate on both the local filesystem, via Boost.Filesystem, and on a filesystem over FTP, via an FTP library? If both libraries declare the function, how do you resolve the correct implementation? You can't use a namespace as a type.
Normally, ADL is the solution to the problem; it resolves the correct namespace based on the namespace of the operation's argument(s). But `temp_directory_path()` doesn't take an argument. The source of the problem is a misconception that the free-functions performing FS operations are part of the `class path` API, when really they are part of the API of an implicit local-filesystem object. The Boost.Filesystem FAQ asks:
**Why are paths sometimes manipulated by member functions and sometimes by non-member functions?**
The design rule is that purely lexical operations are supplied as class path member functions, while operations performed by the operating system are provided as free functions.
This is wrong because the majority of the non-member operations don't
manipulate paths at all. They are functions that manipulate the
filesystem, some of which use a path, and, therefore, are really part of
the API of an implicit local filesystem object. The proposed solution
(later) just makes this explicit, in a backward-compatible way.
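To make the ADL problem concrete, here is a minimal sketch. The namespaces `local_fs` and `ftp_fs` are hypothetical stand-ins for Boost.Filesystem and an FTP library; none of these names come from either library. It shows that a call taking a path is dispatched by ADL, whereas `temp_directory_path` only becomes dispatchable once an explicit filesystem object is passed as an argument. This is one possible shape of the idea, not the final API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Boost.Filesystem (illustration only).
namespace local_fs {
    struct path { std::string value; };
    struct filesystem {};  // explicit local-filesystem object
    inline bool exists(const path&) { return true; }
    // Nullary form: there is no argument for ADL to inspect.
    inline path temp_directory_path() { return path{"/tmp"}; }
    // Overload taking the filesystem: ADL can now find it.
    inline path temp_directory_path(const filesystem&) { return path{"/tmp"}; }
}

// Hypothetical stand-in for an FTP library (illustration only).
namespace ftp_fs {
    struct path { std::string value; };
    struct filesystem { std::string host; };
    inline bool exists(const path&) { return false; }
    inline path temp_directory_path(const filesystem& fs) {
        return path{"ftp://" + fs.host + "/tmp"};
    }
}

// Generic: ADL picks exists() from the namespace of Path.
template <typename Path>
bool generic_exists(const Path& p) { return exists(p); }

// Generic only because the filesystem object is an argument.
template <typename Filesystem>
std::string generic_temp_dir(const Filesystem& fs) {
    return temp_directory_path(fs).value;
}

inline void adl_usage_example() {
    assert(generic_exists(local_fs::path{"/tmp/a"}));
    assert(generic_temp_dir(ftp_fs::filesystem{"example.org"}) ==
           "ftp://example.org/tmp");
}
```

A generic caller has no way to write the nullary `temp_directory_path()` so that it selects `ftp_fs` rather than `local_fs`; the filesystem argument is what carries that information.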
Is modifying Boost.Filesystem necessary?
----------------------------------------
Before we discuss our proposal, let us explore how much can be
achieved without any changes to Boost.Filesystem, using the typical
ADL approach to generic programming.
### Limited solution: ADL on filesystem-specific path
If each filesystem were to declare a `path` class conforming to the
`boost::filesystem::path` interface, ADL could resolve a *subset* of the
filesystem operations.
The example below implements a simple generic algorithm,
`remove_if_temporary` over both Boost.Filesystem and an imaginary
`ftp_filesystem`.
#include <iostream>
#include
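A minimal sketch of this ADL-on-path approach follows. The in-memory namespaces `mock_local` and `mock_ftp` are hypothetical stand-ins for Boost.Filesystem and the imaginary `ftp_filesystem`. The criterion for "temporary" (a `.tmp` extension) and every helper name are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Shared lexical helper for the mocks (hypothetical).
inline std::string mock_extension(const std::string& s) {
    std::string::size_type slash = s.rfind('/');
    std::string::size_type dot = s.rfind('.');
    if (dot == std::string::npos) return std::string();
    if (slash != std::string::npos && dot < slash) return std::string();
    return s.substr(dot);
}

// Mock local filesystem; each namespace declares its own path class.
namespace mock_local {
    struct path {
        std::string value;
        std::string extension() const { return mock_extension(value); }
    };
    inline std::set<std::string>& files() {
        static std::set<std::string> store;
        return store;
    }
    inline bool exists(const path& p) { return files().count(p.value) != 0; }
    inline bool remove(const path& p) { return files().erase(p.value) != 0; }
}

// Mock FTP filesystem with the same path interface.
namespace mock_ftp {
    struct path {
        std::string value;
        std::string extension() const { return mock_extension(value); }
    };
    inline std::set<std::string>& files() {
        static std::set<std::string> store;
        return store;
    }
    inline bool exists(const path& p) { return files().count(p.value) != 0; }
    inline bool remove(const path& p) { return files().erase(p.value) != 0; }
}

// Generic algorithm: exists() and remove() are found by ADL on Path.
template <typename Path>
bool remove_if_temporary(const Path& p) {
    if (p.extension() == ".tmp" && exists(p)) {
        return remove(p);
    }
    return false;
}
```

This works only because every operation used takes a path argument. An operation with no path argument, such as `temp_directory_path()`, cannot be dispatched this way, which is why the approach covers only a subset of the operations.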
I've hit issues similar to the ones described, so I'm interested in such a solution. As a user I agree that being able to use the whole path/algorithms on different kinds of filesystems would greatly simplify my current work.
My use case is to be able to easily choose between real file system and archive file system at runtime. To implement archive file work I currently use PhysFS [1].
However I have several totally different projects with this need of runtime choice of filesystem and currently I have to add project-domain-specific file manipulation layers to each project which is therefore implemented twice, once with boost::filesystem and the other with PhysFS. That's a lot of time spent on making sure they do the same things and it is easily error prone. A solution to help me write my project-specific algorithms once would be very helpful.
If the proposal is really applicable without any backward incompatibility, I'm open to test it ASAP in my current projects (if an implementation exists). Also, it seems that it would solve the same problems for std::filesystem? It was a concern I had recently when thinking about it.
Question: it is not totally clear to me what the proposal wants to do with the boost::filesystem::path class exactly. The example suggests that boost::filesystem::path would be usable for any filesystem implementation. In which case, do you suggest to remove absolute() and canonical() from boost::filesystem::path so that it would only be possible to get these versions using one of the namespace function overloads of the same name, one for local_filesystem(), the other taking a filesystem as parameter?
[1] http://icculus.org/physfs/
I replied yesterday but it looks like Gnus ate it.
Klaim - Joël Lamotte
My use case is to be able to easily choose between real file system and archive file system at runtime. To implement archive file work I currently use PhysFS [1].
However I have several totally different projects with this need of runtime choice of filesystem and currently I have to add project-domain-specific file manipulation layers to each project which is therefore implemented twice, once with boost::filesystem and the other with PhysFS. That's a lot of time spent on making sure they do the same things and it is easily error prone. A solution to help me write my project-specific algorithms once would be very helpful.
If we were to go ahead with my proposal, you could write a thin layer on top of PhysFS to make it a model of FilesystemConcept. Boost.Filesystem would also model this concept, so you could then write your algorithms once for both of them.
Also, it seems that it would solve the same problems for std::filesystem? It was a concern I had recently when thinking about it.
The impending standard was what caused me to release the draft now. I'd like to get this into Boost.Filesystem before the standard (which is based on it) is frozen.
Question: it is not totally clear to me what the proposal wants to do with the boost::filesystem::path class exactly.
True, I did gloss over this a bit.
The example suggests that boost::filesystem::path would be usable for any filesystem implementation.
Perhaps I overstated that. What I was trying to say was that it should be possible to use the same path class for multiple filesystems. That may mean reusing boost::filesystem::path, if its behaviour makes sense for that filesystem, or something else. The only case I can think of where *every* boost::filesystem::path method is applicable would be when mocking the local filesystem. Even a filesystem implementation that connects to a remote filesystem using its native path format would not be able to use boost::filesystem::path, because platform-specific methods would render the answer based on the *local* platform rather than the remote one. A path class for such a filesystem would be almost identical to boost::filesystem::path, except for basing its platform-specific decisions on the remote platform.
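As a rough sketch of what "basing its platform-specific decisions on the remote platform" could look like, the path class below takes a separator policy as a template parameter. The names `posix_style`, `windows_style` and `basic_remote_path` are hypothetical and not part of Boost.Filesystem; a real path class would need far more than `native()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical policies describing the platform whose conventions apply.
struct posix_style   { static char separator() { return '/'; } };
struct windows_style { static char separator() { return '\\'; } };

// Stores the path in generic form ('/' separators) and renders the
// native form according to Platform, not according to the local OS.
template <typename Platform>
class basic_remote_path {
public:
    explicit basic_remote_path(const std::string& generic)
        : generic_(generic) {}

    const std::string& generic_string() const { return generic_; }

    std::string native() const {
        std::string out = generic_;
        for (std::string::size_type i = 0; i < out.size(); ++i) {
            if (out[i] == '/') out[i] = Platform::separator();
        }
        return out;
    }

private:
    std::string generic_;
};

inline void remote_path_usage_example() {
    basic_remote_path<windows_style> p("pub/docs/a.txt");
    assert(p.native() == "pub\\docs\\a.txt");
    assert(p.generic_string() == "pub/docs/a.txt");
}
```

With this shape, a client on Linux talking to a Windows server would instantiate the path with the remote policy, so `native()` reflects the server's conventions regardless of where the client runs.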
In which case, do you suggest to remove absolute() and canonical() from boost::filesystem::path so that it would only be possible to get these versions using one of the namespace function overloads of the same name, one for local_filesystem(), the other taking a filesystem as parameter?
absolute() and canonical() are already free functions, so these would just be overloaded or turned into filesystem object methods (whichever API design we choose) just like the other operations. But you probably mean make_absolute(). As far as I can see, this is the only filesystem-instance-specific method in boost::filesystem::path and would, ideally, not be there at all. However, in practice, it does no harm because it won't be in FilesystemPathConcept, so generic code cannot rely on it.
Models of FilesystemPathConcept are permitted to have local-platform-specific behaviour for certain methods. For example, native() is allowed to vary the appearance of the path it returns based on local operating-system conventions, so a ZIP-file filesystem should be allowed to format paths using backslashes on Windows. However, filesystem-instance-specific behaviour should not be permitted because (apart from the local filesystem) it requires access to a filesystem object instance.
Glad to hear any further thoughts.
Alex
So basically you suggest to require the filesystem implementation concept to be able to work with boost::filesystem::path, without preventing the use of a more filesystem-specific version of path, if it makes sense for this filesystem. boost::filesystem::path would just be the default path type, but not the only one usable depending on the filesystem implementation. Did I understand correctly?
Also, what is your intent with the draft exactly? Do you want to provide such modifications? (as a PR maybe?) Or are you asking the maintainer to do it because you don't feel that you can yourself?
Klaim - Joël Lamotte
So basically you suggest to require the filesystem implementation concept to be able to work with boost::filesystem::path, without preventing the use of a more filesystem-specific version of path, if it makes sense for this filesystem.
Yes. boost::filesystem::path may implement _more_ than the concept requires, of course.
boost::filesystem::path would just be the default path type, but not the only one usable depending on the filesystem implementation.
I wouldn't say the default type. There isn't really a default. Each filesystem specifies its implementation of FilesystemPathConcept using a typedef. A filesystem without this typedef would not model FilesystemConcept.
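A minimal sketch of how generic code might consume such a typedef. The typedef name `path_type`, the mock filesystem classes and the `has_path_type` trait are all assumptions for illustration; the proposal does not fix these names here. The sketch uses the "filesystem object methods" flavour of the API.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct generic_path { std::string value; };  // stand-in for boost::filesystem::path
struct remote_path  { std::string value; };  // hypothetical remote-aware path

// Two hypothetical filesystems, each naming its own path class.
struct local_filesystem_mock {
    typedef generic_path path_type;
    path_type temp_directory_path() const { return path_type{"/tmp"}; }
};
struct sftp_filesystem_mock {
    typedef remote_path path_type;
    path_type temp_directory_path() const { return path_type{"/var/tmp"}; }
};
struct not_a_filesystem {};  // no typedef, so it does not model the concept

// C++11-compatible detection of the nested typedef.
template <typename T> struct always_void { typedef void type; };
template <typename T, typename = void>
struct has_path_type : std::false_type {};
template <typename T>
struct has_path_type<T, typename always_void<typename T::path_type>::type>
    : std::true_type {};

// Generic code never names a concrete path class.
template <typename Filesystem>
typename Filesystem::path_type
scratch_file(const Filesystem& fs, const std::string& name) {
    typename Filesystem::path_type p = fs.temp_directory_path();
    p.value += "/" + name;
    return p;
}

inline void path_typedef_usage_example() {
    assert(scratch_file(sftp_filesystem_mock(), "a.txt").value ==
           "/var/tmp/a.txt");
    static_assert(!has_path_type<not_a_filesystem>::value,
                  "a type without the typedef is not a filesystem model");
}
```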
Also, what is your intent with the draft exactly? Do you want to provide such modifications? (as a PR maybe?) Or are you asking the maintainer to do it because you don't feel that you can yourself?
I intend to do all the changes to Boost.Filesystem myself, as time permits, as well as (non-Boost) implementations of an SFTP filesystem (partially complete) and a 7z filesystem (not yet started). The point of discussing the draft here is to make sure Beman, and any other interested parties, agree the changes make sense, and to get important feedback before I get too far ahead. Already your few questions have made the role of the path classes much clearer in my head, so thanks! Alex
On Jan 11, 2014, at 6:16 PM, Alexander Lamaison wrote:
Klaim - Joël Lamotte writes:
Each filesystem specifies its implementation of FilesystemPathConcept using a typedef. A filesystem without this typedef would not model FilesystemConcept.
Also, what is your intent with the draft exactly? Do you want to provide such modifications? (as a PR maybe?) Or are you asking the maintainer to do it because you don't feel that you can yourself?
[snip]
The point of discussing the draft here is to make sure Beman, and any other interested parties, agree the changes make sense, and to get important feedback before I get too far ahead.
I like your idea. It doesn't derail Beman's work, but does generalize it. ___ Rob (Sent from my portable computation engine)
On 11 Jan 2014 at 23:16, Alexander Lamaison wrote:
The point of discussing the draft here is to make sure Beman, and any other interested parties, agree the changes make sense, and to get important feedback before I get too far ahead. Already your few questions have made the role of the path classes much clearer in my head, so thanks!
I would be an interested party. Sometime in the future I'd like the ability to add to C++ filesystem namespaces which are implemented by a process-local abstraction layer which transforms high level filesystem operations into something else. This sounds not dissimilar to what you're doing. I would be particularly interested in being able to substantially extend the file path syntax with useful new capabilities. Like VMS did. Niall -- Currently unemployed and looking for work. Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
On Sat, Jan 11, 2014 at 12:37 PM, Alexander Lamaison wrote:
The impending standard was what caused me to release the draft now. I'd like to get this into Boost.Filesystem before the standard (which is based on it) is frozen.
It is already frozen and in fact the ISO PDTS balloting closes in 8 days. Based on early unofficial feedback, ballot resolution will mostly be limited to typo-level changes.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3803.pdf
That's the bad news. The good news is that the committee's plan is that ISO/IEC PDTS 18822 AKA File System Technical Specification will be the first of a series of filesystem related Technical Specifications, and the committee's Filesystem Study Group will be actively soliciting proposals for new filesystem related components. Your proposal could wind up hitting the SG just when it is actively looking for new components for the next TS.
More on this later.
Thanks,
--Beman
----------------------------------------
Date: Sun, 12 Jan 2014 11:18:11 -0500 From: bdawes@acm.org To: boost@lists.boost.org Subject: Re: [boost] [Filesystem] Proposal: make filesystem generic-programming friendly
On Sat, Jan 11, 2014 at 12:37 PM, Alexander Lamaison wrote:
The impending standard was what caused me to release the draft now. I'd like to get this into Boost.Filesystem before the standard (which is based on it) is frozen.
It is already frozen and in fact the ISO PDTS balloting closes in 8 days. Based on early unofficial feedback, ballot resolution will mostly be limited to typo-level changes.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3803.pdf
That's the bad news. The good news is that the committee's plan is that ISO/IEC PDTS 18822 AKA File System Technical Specification will be the first of a series of filesystem related Technical Specifications, and the committee's Filesystem Study Group will be actively soliciting proposals for new filesystem related components. Your proposal could wind up hitting the SG just when it is actively looking for new components for the next TS.
More on this later. Thanks,
Just curious, how is this going? And was the proposal by Alexander ever submitted to the committee? It sounds really interesting and useful.
On Fri, Feb 14, 2014 at 9:40 AM, Ahmed Charles wrote:
Just curious, how is this going?
One no vote, all other National Bodies voted yes. The no vote and three of the yes votes had comments attached. A total of 33 National Body comments were technical. There were also editorial comments, which the editor will fix without the committee having to do anything.
The committee will devote two meetings to fixing the issues. That's called ballot resolution in ISO-speak.
We resolved most of the NB comments in Issaquah this week. For the most part that involved wording tweaks to the standardese. Surprisingly, the LWG/SG-3 voted to add make relative functions.
There are also a bunch of issues from Bill Plauger and STL detailing problems they ran into working on the Microsoft implementation. They caught a lot of noexcept related issues, for example.
An updated working paper and issues lists will be available in the post-meeting mailing, due in two weeks or so.
And was the proposal by Alexander ever submitted to the committee?
It sounds really interesting and useful.
No sign of it, but that may be for the better as we were totally tied up with C++14 ballot resolution, Filesystem TS ballot resolution, pulling the Library Fundamentals TS together from the individual proposals, and starting TS working papers for several other TSes. The committee is on a roll. --Beman
Ahmed Charles
Just curious, how is this going? And was the proposal by Alexander ever submitted to the committee?
It sounds really interesting and useful.
Unfortunately I've not found the time yet. Job hunting is taking priority. Alex -- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
On Thu, Jan 9, 2014 at 7:34 PM, Klaim - Joël Lamotte wrote:
Question: it is not totally clear to me what the proposal wants to do with the boost::filesystem::path class exactly. The example suggests that boost::filesystem::path would be usable for any filesystem implementation. In which case, do you suggest to remove absolute() and canonical() from boost::filesystem::path so that it would only be possible to get these versions using one of the namespace function overloads of the same name, one for local_filesystem(), the other taking a filesystem as parameter?
In the committee version of the library, absolute() and canonical() have already been removed from class path. I'm holding off a number of changes to the Boost version pending ISO ballot resolution. I don't want to change Boost filesystem only to have to change it yet again if ISO ballot resolution forces a conflicting change. --Beman
Klaim - Joël Lamotte
I've hit issues similar to the ones described, so I'm interested in such a solution. As a user I agree that being able to use the whole path/algorithms on different kinds of filesystems would greatly simplify my current work.
My use case is to be able to easily choose between real file system and archive file system at runtime. To implement archive file work I currently use PhysFS [1].
However I have several totally different projects with this need of runtime choice of filesystem and currently I have to add project-domain-specific file manipulation layers to each project which is therefore implemented twice, once with boost::filesystem and the other with PhysFS. That's a lot of time spent on making sure they do the same things and it is easily error prone. A solution to help me write my project-specific algorithms once would be very helpful.
If my proposal goes ahead, you should be able to write a thin layer around the PhysFS API that implements the FilesystemConcept. Then the same code could use it, generically, with Boost.Filesystem.
If the proposal is really applicable without any backward incompatibility, I'm open to test it ASAP in my current projects (if an implementation exists). Also, it seems that it would solve the same problems for std::filesystem? It was a concern I had recently when thinking about it.
That's what's pushed me to publish the draft now. I would like to see the same changes in std::filesystem before it's set in stone.
Question: it is not totally clear to me what the proposal wants to do with the boost::filesystem::path class exactly. The example suggests that boost::filesystem::path would be usable for any filesystem implementation.
Yes. The idea is to allow particular filesystem implementations to use boost::filesystem::path if they want to, or to use something else that implements the FilesystemPathConcept API, if they need it. Generic code can refer to the path type using its typedef in the filesystem object so that it works with either.
In which case, do you suggest to remove absolute() and canonical() from boost::filesystem::path so that it would only be possible to get these versions using one of the namespace function overloads of the same name, one for local_filesystem(), the other taking a filesystem as parameter?
The functions `canonical` and `absolute` are already free functions, but, yes, I would expect to overload them or make them methods of the filesystem object (depending on which API option we go ahead with) just like the others.
As for methods of `boost::filesystem::path` like `make_absolute`, ideally these should be removed from `path` because they need to consult the filesystem. However, for backward-compatibility it is better to leave them in place, at least for the time being. std::filesystem represents a good opportunity to remove them because there is no existing code to remain backward-compatible with.
As you can see from the Path decomposition table [1], many methods in `boost::filesystem::path` do have platform-specific behaviour and each filesystem implementor would have to look carefully at whether that is appropriate for their filesystem or not, before deciding whether to use `boost::filesystem::path` as their path class or whether to use their own. It may be that for the ZIP archives filesystem, it is desirable for paths within that filesystem to use backslashes on Windows and forward slashes everywhere else, whereas an FTP filesystem should always use forward slashes regardless of platform.
I am purposely distinguishing between filesystem-specific behaviour and platform-specific behaviour. The latter just adapts to the local operating-system's display conventions while the former uses the filesystem to obtain information. Models of FilesystemPathConcept (be that `boost::filesystem::path` or otherwise) should be able to exhibit platform-specific behaviour to allow things like the ZIP path behaviour above. However, they should not have filesystem-specific behaviour, like `canonical()`, as that may need filesystem instance data to implement, and so should be part of the filesystem object.
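To illustrate why filesystem-specific behaviour belongs on the filesystem object, here is a small sketch in which `absolute` depends on per-instance state. The classes `simple_path` and `session_filesystem`, and the notion of a per-session working directory, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// A path that is pure data: it knows nothing about any filesystem.
struct simple_path {
    std::string value;
};

// Hypothetical filesystem whose instances each have their own
// working directory, as separate remote sessions might.
class session_filesystem {
public:
    typedef simple_path path_type;

    explicit session_filesystem(const std::string& working_directory)
        : cwd_(working_directory) {}

    // Needs instance data (cwd_), so it cannot be a method of the path.
    path_type absolute(const path_type& p) const {
        if (!p.value.empty() && p.value[0] == '/') {
            return p;  // already rooted in this mock's path syntax
        }
        return path_type{cwd_ + "/" + p.value};
    }

private:
    std::string cwd_;
};

inline void absolute_usage_example() {
    session_filesystem alice("/home/alice");
    session_filesystem bob("/srv/bob");
    simple_path relative = {"notes.txt"};
    // Same path, different answers: the result depends on the instance.
    assert(alice.absolute(relative).value == "/home/alice/notes.txt");
    assert(bob.absolute(relative).value == "/srv/bob/notes.txt");
}
```

A `make_absolute` method on the path itself would have no way to know which of the two sessions to consult.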
Some examples of platform-specific behaviour in `boost::filesystem::path`:
- (constructor)
- make_preferred
- native
- c_str
- string
- wstring
- u16string
- u32string
Some examples of filesystem-specific behaviour in `boost::filesystem::path` that, ideally, would not be there:
- make_absolute
- is_absolute
- is_relative
[1] http://www.boost.org/doc/libs/1_55_0/libs/filesystem/doc/reference.html#Path...
Alex
-- Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
participants (6)
- Ahmed Charles
- Alexander Lamaison
- Beman Dawes
- Klaim - Joël Lamotte
- Niall Douglas
- Rob Stewart