Hi All, Whatever happened to the work in this branch https://github.com/boostorg/filesystem/tree/feature/directory-entry-cache-re... that looks to implement this paper? http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html Wondering if it's planned, dead, or looking for someone to do the work. I ran into the problem today by chance. -- chris
On 2020-04-30 05:45, Chris Glover via Boost wrote:
Hi All,
Whatever happened to the work in this branch https://github.com/boostorg/filesystem/tree/feature/directory-entry-cache-re...
that looks to implement this paper? http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
Wondering if it's planned, dead, or looking for someone to do the work. I ran into the problem today by chance.
It's definitely abandoned. And incomplete, by the looks of it.
On Thu, Apr 30, 2020 at 3:51 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
It's definitely abandoned. And incomplete, by the looks of it.
Yes, seems so. If there's interest in finishing this work, I can take a look to see if I can commit to that, but if it was abandoned for some other reason (not breaking ABI maybe?) then I won't bother. Does anyone know the history? -- chris
On Sat, May 2, 2020 at 5:54 PM Chris Glover
On Thu, Apr 30, 2020 at 3:51 AM Andrey Semashev via Boost
wrote: It's definitely abandoned. And incomplete, by the looks of it.
Yes, seems so. If there's interest in finishing this work, I can take a look to see if I can commit to that, but if it was abandoned for some other reason (not breaking ABI maybe?) then I won't bother.
Does anyone know the history?
I don't know the history but likely Beman didn't have time for it. ABI is not maintained across Boost releases, so it is not the issue. I'm not familiar with the proposal itself, so I don't know if there were any issues with it.
On 02/05/2020 15:53, Chris Glover via Boost wrote:
On Thu, Apr 30, 2020 at 3:51 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
It's definitely abandoned. And incomplete, by the looks of it.
Yes, seems so. If there's interest in finishing this work, I can take a look to see if I can commit to that, but if it was abandoned for some other reason (not breaking ABI maybe?) then I won't bother.
Does anyone know the history?
I cannot say for sure, but it was abandoned at around the same time as LLFIO demonstrated to Beman a way of enumerating directory contents, with complete stat_t per entry, @ > 4 million entries/sec/core on all the major platforms. That makes any notion of caching pointless, just enumerate the entire directory, always. I've also been arguing strenously before WG21 to deprecate directory_iterator as fundamentally incorrect ASAP, and I don't think I've been unsuccessful. Recent papers to reach WG21 proposing sorely needed improvements to directory_iterator have all been shot down. The feeling I got in the room was the whole thing needs replacing. My current hope for proposing std::directory_handle for standardisation is early 2021. Niall
On Tue, May 5, 2020 at 9:59 AM Niall Douglas via Boost < boost@lists.boost.org> wrote:
I cannot say for sure, but it was abandoned at around the same time as LLFIO demonstrated to Beman a way of enumerating directory contents, with complete stat_t per entry, @ > 4 million entries/sec/core on all the major platforms. That makes any notion of caching pointless, just enumerate the entire directory, always.
I've also been arguing strenously before WG21 to deprecate directory_iterator as fundamentally incorrect ASAP, and I don't think I've been unsuccessful. Recent papers to reach WG21 proposing sorely needed improvements to directory_iterator have all been shot down. The feeling I got in the room was the whole thing needs replacing. My current hope for proposing std::directory_handle for standardisation is early 2021.
Niall
Interesting opinion. Usually these sorts of things are a series of trade offs; memory vs time, latency vs throughput; convenience vs pick-your-favourite-metric, so saying once size would fit all is a bit dubious. Nonetheless, I looked at your library and thought it might give me exactly what I want because the API allows me to spend memory to save time. But it turned out to be really slow when recursing. This makes sense because it's generating many small queries which, because it's calling a low level API, the OS is unable to help with. Here's a callstack from WPA trace with the hot path. Microsoft Windows Profiler Line # Process Thread ID Stack Count Weight (in view) (ms) 12 | |- test.exe!llfio_v2_62985a1f::directory_handle::read 163634 19,004.423700 13 | | |- ntdll.dll!ZwQueryDirectoryFile 161368 18,737.389200 14 | | | |- ntoskrnl.exe!KiSystemServiceCopyEnd 161349 18,735.136900 15 | | | | |- ntoskrnl.exe!NtQueryDirectoryFile 161346 18,734.804000 16 | | | | | ntoskrnl.exe!NtQueryDirectoryFileEx 161346 18,734.804000 17 | | | | | |- ntoskrnl.exe!BuildQueryDirectoryIrp 160107 18,590.971100 18 | | | | | | |- ntoskrnl.exe!ProbeForWrite 160073 18,587.073100 19 | | | | | | | |- ntoskrnl.exe!KiPageFault 122698 14,246.119500 20 | | | | | | | | |- ntoskrnl.exe!MmAccessFault 79304 9,208.701200 21 | | | | | | | | | |- ntoskrnl.exe!MiDispatchFault 55441 6,439.250700 22 | | | | | | | | | | |- ntoskrnl.exe!MiResolveDemandZeroFault 51240 5,950.194000 23 | | | | | | | | | | | |- ntoskrnl.exe!MiResolvePrivateZeroFault 48839 5,670.213100 24 | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault 25331 2,938.969600 25 | | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault<itself> 13842 1,606.017700 I presume I am using the API correctly, but if not I'm happy to try something else. For reference, here are some rough timings from my test: boost::recursive_directory_iterator: ~30seconds. FindNextFile: ~13seconds llfio: ~980 seconds This was reading file size and modified date during iteration, which if they had been cached in recursive_directory_iterator, probably would have made it close in time to FindNextFile, which would be ideal for me. -- chris
On 07/05/2020 03:07, Chris Glover wrote:
I cannot say for sure, but it was abandoned at around the same time as LLFIO demonstrated to Beman a way of enumerating directory contents, with complete stat_t per entry, @ > 4 million entries/sec/core on all the major platforms. That makes any notion of caching pointless, just enumerate the entire directory, always.
I've also been arguing strenously before WG21 to deprecate directory_iterator as fundamentally incorrect ASAP, and I don't think I've been unsuccessful. Recent papers to reach WG21 proposing sorely needed improvements to directory_iterator have all been shot down. The feeling I got in the room was the whole thing needs replacing. My current hope for proposing std::directory_handle for standardisation is early 2021.
Interesting opinion.
Usually these sorts of things are a series of trade offs; memory vs time, latency vs throughput; convenience vs pick-your-favourite-metric, so saying once size would fit all is a bit dubious.
It's more fundamental than that. The kernel API which enumerates directories is quite like reading bytes from a file. Reading a file a single byte at a time is about the same time as reading lots of bytes at a time, because the overhead for calling any kernel API is dominant relative to the operation itself.
I presume I am using the API correctly, but if not I'm happy to try something else.
For reference, here are some rough timings from my test: boost::recursive_directory_iterator: ~30seconds. FindNextFile: ~13seconds llfio: ~980 seconds
I would be extremely surprised with these numbers. It surely must be the case that you calling the APIs wrong somehow. Can you send me, off list, an example of the code you are doing so I can check it?
This was reading file size and modified date during iteration, which if they had been cached in recursive_directory_iterator, probably would have made it close in time to FindNextFile, which would be ideal for me.
On Windows the llfio::directory_entry gets its file size and modified date filled in, as it comes for free on Windows during directory enumeration. Equally, during directory enumeration, you ought to ask only for what metadata you need. Sometimes LLFIO can use tricks to greatly improve performance. Niall
On Thu, May 7, 2020 at 4:49 AM Niall Douglas wrote:
On 07/05/2020 03:07, Chris Glover wrote:
I presume I am using the API correctly, but if not I'm happy to try something else.
For reference, here are some rough timings from my test: boost::recursive_directory_iterator: ~30seconds. FindNextFile: ~13seconds llfio: ~980 seconds
I would be extremely surprised with these numbers. It surely must be the case that you calling the APIs wrong somehow.
Can you send me, off list, an example of the code you are doing so I can check it?
If you don't mind, please paste the program to gist.github.com or paste.ubuntu.com . Others might be willing to test it on different implementations. Glen
On Thu, May 7, 2020 at 8:16 AM Glen Fernandes via Boost < boost@lists.boost.org> wrote:
If you don't mind, please paste the program to gist.github.com or paste.ubuntu.com . Others might be willing to test it on different implementations.
Sure, here's a small repo with the same algorithm implemented three different ways: FindNextFile, recursive_directory_iterator, llfio::directory_handle. All they do is categorise every file on C:\, so you should be able to change "C:\\" to "/" for other systems. I lifted these from a larger system so they're doing some extra work and probably have bugs, but should be close enough. https://github.com/cdglove/fs-test -- chris Glen
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 08/05/2020 03:06, Chris Glover via Boost wrote:
Sure, here's a small repo with the same algorithm implemented three different ways: FindNextFile, recursive_directory_iterator, llfio::directory_handle. All they do is categorise every file on C:\, so you should be able to change "C:\\" to "/" for other systems.
I lifted these from a larger system so they're doing some extra work and probably have bugs, but should be close enough.
So this turned into a weekend long thing to figure out, and I thank you for it, as we have a whole bunch of code here which assumes things which it turns out are out of date on recent Windows. FYI I had to break out the Windows Performance Recorder to figure out what the hell was going on in the Windows kernel, more on that shortly. Quick results for enumerating all of C:\ on my machine, warm cache: ``` G:\boostish\cdglove>x64\Release\llfio.exe test-llfio found 1485232 files. 13.872090s wall, 1.734375s user + 12.109375s system = 13.843750s CPU (99.8%) G:\boostish\cdglove>x64\Release\win32.exe test-win32 found 1491222 files. 14.713440s wall, 2.328125s user + 12.375000s system = 14.703125s CPU (99.9%) G:\boostish\cdglove>x64\Release\boost.exe Exception thrown: filesystem::recursive_directory_iterator increment error: The system cannot find the path specified Up until fatal error test-boost_filesystem found 868657 files. 66.506680s wall, 5.187500s user + 61.234375s system = 66.421875s CPU (99.9%) G:\boostish\cdglove>x64\Release\std.exe Exception thrown: recursive_directory_iterator::operator++: The system cannot find the path specified. Up until fatal error test-std_filesystem found 870341 files. 63.407547s wall, 2.921875s user + 60.375000s system = 63.296875s CPU (99.8%) ``` I attach, as a ZIP archive, the code which made the programs above. I annotated what I mainly changed in the LLFIO test to explain what you were doing wrong. In case the ZIP archive doesn't come through, I also uploaded it to https://gist.github.com/ned14/9ed4b583b3064ad9f8bbb8e9fa2408b6 You will note that LLFIO is basically the same speed as Win32 FindFirstFile(), except that LLFIO gives you race free snapshots, which is a very important guarantee under concurrent filesystem modification. Both Boost and std's recursive_directory_iterator are orders of magnitude slower, which cannot be helped due to their design. I am very sure, by the way, that LLFIO can enumerate the entire disc within five seconds on Windows, if you use a very different approach to yours. See below. --- To give a bit of the bigger picture, stuff seems to have changed in Windows in the last few years. Firstly, earlier Windows did not have a fast FindFirstFile() implementation. LLFIO used to beat it by 2x or 3x using the same technique which Windows Explorer used, because FindFirstFile() was that slow, so Windows Explorer went straight to the NT kernel, as LLFIO does. Now FindFirstFile() is competitive, which is a well done to Microsoft for fixing their stuff. Secondly, recent Windows 10 seems to have a much, much, much slower implementation of ProbeForWrite() than it used to. Like, quadratic complexity to input buffer size. LLFIO used to be super fast because we'd feed the kernel a very large kernel buffer to fill, and it would never exceed it. Unfortunately, ProbeForWrite() is now so slow that large kernel buffers means ProbeForWrite() running for 90% of the time, so that strategy no longer works. I did some trial and error, and LLFIO now has a completely new directory_handle::read() implementation. The new one iterates the directory once, in pieces using a small kernel buffer, to calculate the minimum size of kernel buffer needed for a snapshot. It then iterates the directory a second time, as a snapshot. This ensures that the kernel buffer is always minimally sized, and we still get our all important race free snapshot semantic. Despite performing 5x-6x more syscalls per directory enumeration, this strategy is actually much better than the previous one, because it keeps ProbeForWrite() from slowing everything down any more than is absolutely required. And it's no slower than FindFirstFile(), but with much stronger guarantees and full stat_t metadata returned per entry. --- Back to how to achieve five second whole disc enumerations, the first thing is stop performing a unicode transcode per file name. They are really horrible on the CPU and its caches. Use a small buffer optimisation to remove the dynamic memory allocation, for most file names, and just memcpy() the filename. The second thing is you should throw kernel threads at the problem. Filesystems will embarrassingly parallelise up to hardware concurrency, just make sure to never touch the same inode from more than one kernel thread at a time. Given one can enumerate the entire disc using a single kernel thread within 14 seconds or so, with a well written breadth-first concurrent algorithm, I see no reason at all why under 5 seconds for the entire disc isn't very possible. I honestly can no longer recommend LLFIO over Win32 any more for directory enumeration, on Windows at least. I can still heartily recommend LLFIO over the libc on POSIX, which we still beat handily. Thanks for reporting this issue, and providing a repro, it was very valuable. Niall
On Sun, May 10, 2020 at 6:02 PM Niall Douglas
So this turned into a weekend long thing to figure out, and I thank you for it, as we have a whole bunch of code here which assumes things which it turns out are out of date on recent Windows. FYI I had to break out the Windows Performance Recorder to figure out what the hell was going on in the Windows kernel, more on that shortly.
Quick results for enumerating all of C:\ on my machine, warm cache:
``` G:\boostish\cdglove>x64\Release\llfio.exe test-llfio found 1485232 files. 13.872090s wall, 1.734375s user + 12.109375s system = 13.843750s CPU (99.8%)
G:\boostish\cdglove>x64\Release\win32.exe test-win32 found 1491222 files. 14.713440s wall, 2.328125s user + 12.375000s system = 14.703125s CPU (99.9%)
G:\boostish\cdglove>x64\Release\boost.exe Exception thrown: filesystem::recursive_directory_iterator increment error: The system cannot find the path specified Up until fatal error test-boost_filesystem found 868657 files. 66.506680s wall, 5.187500s user + 61.234375s system = 66.421875s CPU (99.9%)
G:\boostish\cdglove>x64\Release\std.exe Exception thrown: recursive_directory_iterator::operator++: The system cannot find the path specified. Up until fatal error test-std_filesystem found 870341 files. 63.407547s wall, 2.921875s user + 60.375000s system = 63.296875s CPU (99.8%) ```
Can confirm I am seeing the same results as you now. After updating llfio, performance improved massively such that I was within 2x of FindNextFile. Then after making your recommended changes to how I was using llfio, they roughly converged. I'm not convinced that this could replace a high level directory iterator though. llfio is easy to use incorrectly as evidenced by the annotations you needed to add to my code. That's not a criticism of the design, only a commentary that it's sufficiently low level to cause some grief if one isn't very careful. I have a few minor improvement ideas on that which we can discuss off list. I'll likely play with the unfinished work to directory_iterator to see if any improvements can be made. I pushed the updated changes to https://github.com/cdglove/fs-test. I think I also fixed the access error exception you were getting in the boost test. (added pop_on_error to the directory options). Thanks for taking the time to look into all of this! -- chris
participants (4)
-
Andrey Semashev
-
Chris Glover
-
Glen Fernandes
-
Niall Douglas