On 1 Sep 2015 at 23:51, Jeremy Maitin-Shepard wrote:
Limiting the discussion to just the problem of race-free filesystem operations on POSIX, imagine opening the sibling of a file in a directory whose path is constantly changing. POSIX does not provide an API for opening the parent directory of an open file descriptor. To work around this, you must first get the canonical current path of the open file descriptor, strip off the leafname, open that directory, and then use it as the base directory for opening a file with the same leafname as your original file. If that open fails, or the leafname opens a file with a different inode, you loop and do it all again. Once you have the correct parent directory, you can open the sibling. This is an example of where caching the stat_t of a handle during open saves syscalls and branches later on, in more performance-critical APIs.
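The loop described above can be sketched in C roughly as follows. This is a minimal illustration, not AFIO's implementation: the helper name open_sibling and its max_tries parameter are hypothetical, and it assumes Linux, because reading /proc/self/fd/N is how this sketch obtains the current path of a descriptor (POSIX itself offers no such call). The fstat() of the original handle is the step that caching a stat_t at open time would remove.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, Linux-only: open `sibling` next to the file that
 * `fd` refers to, even while the containing directory is being renamed.
 * Returns a new descriptor, or -1 with errno set. */
int open_sibling(int fd, const char *sibling, int flags, int max_tries)
{
    struct stat self;
    /* Identity of the original file. A library that caches stat data
     * at open time could skip this syscall. */
    if (fstat(fd, &self) != 0)
        return -1;

    for (int attempt = 0; attempt < max_tries; ++attempt) {
        /* 1. Ask the kernel for the current path of fd (Linux /proc). */
        char proc_path[64], path[PATH_MAX];
        snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
        ssize_t n = readlink(proc_path, path, sizeof path - 1);
        if (n < 0)
            return -1;
        path[n] = '\0';

        /* 2. Split it into parent directory and leafname. */
        char *slash = strrchr(path, '/');
        if (slash == NULL) {
            errno = ENOENT;
            return -1;
        }
        *slash = '\0';
        const char *leaf = slash + 1;

        /* 3. Open the parent; it may already have moved, so retry on failure. */
        int dfd = open(path[0] != '\0' ? path : "/",
                       O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (dfd < 0)
            continue;

        /* 4. Check that the leafname in that directory is still our inode. */
        struct stat st;
        if (fstatat(dfd, leaf, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            st.st_dev == self.st_dev && st.st_ino == self.st_ino) {
            /* 5. dfd pins the directory that was the parent at the time of
             *    the check; open the sibling relative to it. */
            int sfd = openat(dfd, sibling, flags | O_CLOEXEC, 0644);
            int saved = errno;
            close(dfd);
            errno = saved;
            return sfd;
        }
        close(dfd); /* moved underneath us: go round again */
    }
    errno = EAGAIN;
    return -1;
}
```

Bounding the loop with max_tries is a design choice of this sketch; a real library would need its own policy for a directory that never stops moving.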
The normal way to do this with POSIX *at APIs would be to just open a handle to the directory in the first place. I suppose the purpose of this more complex approach is to avoid having to keep an extra file descriptor to the directory open, or to allow the user to open a sibling file from an arbitrary AFIO file handle without preparing in advance (as I suppose would be required by your shadow file-based locking approach). It does seem like rather specific, and not necessarily all that common, functionality to make users pay for by default.
The approach you just suggested doesn't handle the case where you open a file and it then gets moved before the sibling lookup. AFIO can actually be asked to cache the containing directory of a handle, and it will then attempt to use that cached handle, when available, to skip the inode looping. This feature is off by default and must be specifically enabled.
A similar problem exists for race-free file deletion and a long list of other scenarios. The cause is a number of defects in the POSIX race-free APIs, and the Austin Working Group are aware of the problem. Windows doesn't have problems with race-free sibling opens and deletions, thanks to a much better thought-through race-free API, but it does have other problems: deletions are not immediate (a file deleted while handles to it remain open lingers until the last handle is closed), so different workarounds are needed there.
Personally, I would prefer an API that lets me pay only for what I need. You could expose the low-level, platform-specific behavior and also provide higher-level operations that add complexity to avoid races or to emulate behavior not provided natively.
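One common way to let callers opt out of a costly default is a per-dispatcher flag policy applied to every request. The C sketch below is illustrative only: the flag names and struct dispatcher are hypothetical and are not AFIO's actual file_flags API. Bits in force are always added and bits in mask are always removed; applying the mask last means it wins when a bit appears in both.

```c
/* Hypothetical flag bits; names are illustrative, not AFIO's. */
enum {
    FLAG_READ            = 1u << 0,
    FLAG_WRITE           = 1u << 1,
    FLAG_RACE_PROTECTION = 1u << 2, /* costly, so callers may opt out */
};

/* Dispatcher-wide policy applied to every operation's requested flags. */
struct dispatcher {
    unsigned force; /* always added */
    unsigned mask;  /* always removed, applied last */
};

unsigned effective_flags(const struct dispatcher *d, unsigned requested)
{
    return (requested | d->force) & ~d->mask;
}
```

For example, a dispatcher built with mask = FLAG_RACE_PROTECTION turns a request for FLAG_READ | FLAG_RACE_PROTECTION into plain FLAG_READ, so every operation on it skips the race checks.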
As you're almost certainly aware by now, I benchmark each feature and decide whether it's worth defaulting to on or off, given my best judgement of all the factors involved. If you really want race protection off all the time, simply instantiate a dispatcher with a file_flags mask which disables race protection for all operations on that dispatcher.

Niall

--
ned Productions Limited
Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/