On 08/05/2020 03:06, Chris Glover via Boost wrote:
Sure, here's a small repo with the same algorithm implemented three different ways: FindNextFile, recursive_directory_iterator, llfio::directory_handle. All they do is categorise every file on C:\, so you should be able to change "C:\\" to "/" for other systems.
I lifted these from a larger system so they're doing some extra work and probably have bugs, but should be close enough.
So this turned into a weekend long thing to figure out, and I thank you for it, as we have a whole bunch of code here which assumes things which it turns out are out of date on recent Windows. FYI I had to break out the Windows Performance Recorder to figure out what the hell was going on in the Windows kernel, more on that shortly. Quick results for enumerating all of C:\ on my machine, warm cache: ``` G:\boostish\cdglove>x64\Release\llfio.exe test-llfio found 1485232 files. 13.872090s wall, 1.734375s user + 12.109375s system = 13.843750s CPU (99.8%) G:\boostish\cdglove>x64\Release\win32.exe test-win32 found 1491222 files. 14.713440s wall, 2.328125s user + 12.375000s system = 14.703125s CPU (99.9%) G:\boostish\cdglove>x64\Release\boost.exe Exception thrown: filesystem::recursive_directory_iterator increment error: The system cannot find the path specified Up until fatal error test-boost_filesystem found 868657 files. 66.506680s wall, 5.187500s user + 61.234375s system = 66.421875s CPU (99.9%) G:\boostish\cdglove>x64\Release\std.exe Exception thrown: recursive_directory_iterator::operator++: The system cannot find the path specified. Up until fatal error test-std_filesystem found 870341 files. 63.407547s wall, 2.921875s user + 60.375000s system = 63.296875s CPU (99.8%) ``` I attach, as a ZIP archive, the code which made the programs above. I annotated what I mainly changed in the LLFIO test to explain what you were doing wrong. In case the ZIP archive doesn't come through, I also uploaded it to https://gist.github.com/ned14/9ed4b583b3064ad9f8bbb8e9fa2408b6 You will note that LLFIO is basically the same speed as Win32 FindFirstFile(), except that LLFIO gives you race free snapshots, which is a very important guarantee under concurrent filesystem modification. Both Boost and std's recursive_directory_iterator are orders of magnitude slower, which cannot be helped due to their design. I am very sure, by the way, that LLFIO can enumerate the entire disc within five seconds on Windows, if you use a very different approach to yours. See below. --- To give a bit of the bigger picture, stuff seems to have changed in Windows in the last few years. Firstly, earlier Windows did not have a fast FindFirstFile() implementation. LLFIO used to beat it by 2x or 3x using the same technique which Windows Explorer used, because FindFirstFile() was that slow, so Windows Explorer went straight to the NT kernel, as LLFIO does. Now FindFirstFile() is competitive, which is a well done to Microsoft for fixing their stuff. Secondly, recent Windows 10 seems to have a much, much, much slower implementation of ProbeForWrite() than it used to. Like, quadratic complexity to input buffer size. LLFIO used to be super fast because we'd feed the kernel a very large kernel buffer to fill, and it would never exceed it. Unfortunately, ProbeForWrite() is now so slow that large kernel buffers means ProbeForWrite() running for 90% of the time, so that strategy no longer works. I did some trial and error, and LLFIO now has a completely new directory_handle::read() implementation. The new one iterates the directory once, in pieces using a small kernel buffer, to calculate the minimum size of kernel buffer needed for a snapshot. It then iterates the directory a second time, as a snapshot. This ensures that the kernel buffer is always minimally sized, and we still get our all important race free snapshot semantic. Despite performing 5x-6x more syscalls per directory enumeration, this strategy is actually much better than the previous one, because it keeps ProbeForWrite() from slowing everything down any more than is absolutely required. And it's no slower than FindFirstFile(), but with much stronger guarantees and full stat_t metadata returned per entry. --- Back to how to achieve five second whole disc enumerations, the first thing is stop performing a unicode transcode per file name. They are really horrible on the CPU and its caches. Use a small buffer optimisation to remove the dynamic memory allocation, for most file names, and just memcpy() the filename. The second thing is you should throw kernel threads at the problem. Filesystems will embarrassingly parallelise up to hardware concurrency, just make sure to never touch the same inode from more than one kernel thread at a time. Given one can enumerate the entire disc using a single kernel thread within 14 seconds or so, with a well written breadth-first concurrent algorithm, I see no reason at all why under 5 seconds for the entire disc isn't very possible. I honestly can no longer recommend LLFIO over Win32 any more for directory enumeration, on Windows at least. I can still heartily recommend LLFIO over the libc on POSIX, which we still beat handily. Thanks for reporting this issue, and providing a repro, it was very valuable. Niall