On 13 Sep 2016 at 7:37, degski wrote:
... Longer answer ...
Thanks for the write-up... It's a shame Windows doesn't do the VMS file-shredding though...
It would be hard to implement in NTFS. Each file is stored as a chain of 64KB extents. Modifying an extent is a read-copy-update operation plus a relinking of the chain, so as a file is updated you are basically leaking bits of its data all over the free-space list over time. Shredding on delete is therefore not particularly effective at truly destroying file contents on NTFS, and that's why running their defrag API from a scheduled job is a much better way of doing it (and I think that's what the DoD C2 secure edition does).

I should apologise to the list for yesterday not actually explaining why deleted files take a while to disappear on Windows. All I can say is that it's very busy as Boost Summer of Code winds down and CppCon nears, and it's too easy to brain dump. I explained the historical reason for that behaviour, but not why it's still done today.

The reason is that NTFS and Windows really do care about your data, and they force a metadata fsync to the journal on the containing directory when you delete a file entry within it. Obviously this forces a journal write per file entry deleted, so if you're deleting say 1M file entries from a directory, that would mean 1M fsyncs. To solve this, Windows actively avoids deleting files while the filesystem is busy, even though all handles are closed and the file was marked with the delete-on-close flag - I've seen deletion deferred by up to two seconds in testing here locally. It will then do a batch pass, writing a new MFT record with all the deleted files removed and fsyncing that, so instead of 1M fsyncs there is just one.

Some might ask: why not immediately unlink the file in RAM, as Linux does? Linux historically really didn't try hard to avoid data loss on sudden power loss, and even today it uniquely requires programmers to explicitly call fsync on containing directories in order to achieve sudden-power-loss safety. NTFS and Windows try much harder: they try to always keep the *metadata* a program sees via the kernel syscalls equal to what is on physical storage (actual file data is a totally separate matter). That makes writing reliable filesystem code much easier on Windows than on Linux, which was traditionally a real bear.

(ZFS on FreeBSD interestingly takes a middle approach in between Windows' and Linux's - it allows a maximum five second reordering window, after which writes arrive on physical storage exactly in the order issued. This lets the program get ahead of storage by up to 30 seconds or so, but because you get a fairly total sequentially consistent ordering, it makes sudden power loss recovery vastly easier: you only need to scan +/- 5 seconds to recover a valid state.)

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
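
PS: for anyone who wants to poke at the deferred-delete behaviour themselves, here is a minimal sketch (my own illustration, not anything from NTFS internals; the filename is invented) of opening a file with the delete-on-close flag mentioned above:

#include <windows.h>
#include <cstdio>

int main() {
  // Ask NTFS to unlink the file when the last handle to it is closed.
  HANDLE h = CreateFileW(L"scratch.tmp",
                         GENERIC_READ | GENERIC_WRITE,
                         FILE_SHARE_READ | FILE_SHARE_DELETE,
                         nullptr,
                         CREATE_ALWAYS,
                         FILE_FLAG_DELETE_ON_CLOSE,
                         nullptr);
  if (h == INVALID_HANDLE_VALUE) {
    std::fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
    return 1;
  }
  // ... use the file as scratch space ...
  // After this close the file entry is doomed, but as described above NTFS
  // may defer the actual on-disk removal and batch it with other deletes
  // so the journal sees one fsync instead of one per file.
  CloseHandle(h);
  return 0;
}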
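
PPS: and here, for contrast, is an equally minimal sketch (again my own, with an invented path) of the extra step Linux requires for sudden-power-loss safety as mentioned above - fsyncing the containing directory so the directory entry itself reaches physical storage:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
  // Create the file and flush its data.
  int fd = open("dir/data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { perror("open"); return 1; }
  const char msg[] = "hello";
  if (write(fd, msg, sizeof msg - 1) < 0) { perror("write"); return 1; }
  if (fsync(fd) < 0) { perror("fsync file"); return 1; }
  close(fd);

  // Without this second fsync the file's data can be durable while the
  // directory entry naming it is not, so the file can vanish after a
  // power cut. Windows does this for you; on Linux you must do it yourself.
  int dirfd = open("dir", O_RDONLY | O_DIRECTORY);
  if (dirfd < 0) { perror("open dir"); return 1; }
  if (fsync(dirfd) < 0) { perror("fsync dir"); return 1; }
  close(dirfd);
  return 0;
}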