On 07/05/2020 03:07, Chris Glover wrote:
I cannot say for sure, but it was abandoned at around the same time as LLFIO demonstrated to Beman a way of enumerating directory contents, with complete stat_t per entry, at more than 4 million entries/sec/core on all the major platforms. That makes any notion of caching pointless: just enumerate the entire directory, always.
I've also been arguing strenuously before WG21 to deprecate directory_iterator as fundamentally incorrect ASAP, and I don't think I've been unsuccessful. Recent papers to reach WG21 proposing sorely needed improvements to directory_iterator have all been shot down. The feeling I got in the room was the whole thing needs replacing. My current hope for proposing std::directory_handle for standardisation is early 2021.
Interesting opinion.
Usually these sorts of things are a series of trade-offs: memory vs time, latency vs throughput, convenience vs pick-your-favourite-metric, so saying one size would fit all is a bit dubious.
It's more fundamental than that. The kernel API which enumerates directories is quite like reading bytes from a file. A call which reads a single byte takes about the same time as a call which reads lots of bytes, because the overhead of calling any kernel API is dominant relative to the operation itself.
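To make the per-call overhead point concrete, here is a minimal sketch, assuming a POSIX system and a regular file (so no short reads). It counts how many unbuffered read() calls are needed to consume a file at a given chunk size. count_read_calls is a hypothetical helper written for this illustration, not part of LLFIO or Boost, and it measures call counts only, not wall-clock time.

```cpp
// Sketch only: POSIX-specific (open/read/close).
// count_read_calls is a hypothetical helper for illustration.
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <vector>

// Reads the whole file using unbuffered read() calls of at most chunk_size
// bytes each. Returns the number of read() calls issued, including the final
// call that reports end of file, or -1 on error.
long count_read_calls(const char *path, std::size_t chunk_size)
{
    if (chunk_size == 0) {
        return -1;  // a zero-byte request would never make progress
    }
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    std::vector<char> buffer(chunk_size);
    long calls = 0;
    for (;;) {
        ssize_t n = ::read(fd, buffer.data(), buffer.size());
        ++calls;  // every iteration is one trip into the kernel
        if (n < 0) {
            ::close(fd);
            return -1;
        }
        if (n == 0) {
            break;  // end of file
        }
    }
    ::close(fd);
    return calls;
}
```

For a 10000 byte file, a chunk size of 1 costs 10001 kernel calls whereas a chunk size of 4096 costs 4. If each call carries roughly the same fixed overhead, the larger request is what wins, and the same reasoning applies to asking the kernel for many directory entries per call.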
I presume I am using the API correctly, but if not I'm happy to try something else.
For reference, here are some rough timings from my test: boost::recursive_directory_iterator: ~30 seconds; FindNextFile: ~13 seconds; llfio: ~980 seconds.
I would be extremely surprised by these numbers. It surely must be the case that you are calling the APIs wrong somehow. Can you send me, off list, an example of the code you are using so I can check it?
This was reading file size and modified date during iteration, which if they had been cached in recursive_directory_iterator, probably would have made it close in time to FindNextFile, which would be ideal for me.
On Windows the llfio::directory_entry gets its file size and modified date filled in, as it comes for free on Windows during directory enumeration. Equally, during directory enumeration, you ought to ask only for what metadata you need. Sometimes LLFIO can use tricks to greatly improve performance.

Niall