On 28 Jul 2015 at 20:40, Paul Harris wrote:
I _disagree_ with the way dedup'd files are currently treated as a special file (as if they were a device or a character file or a fifo or a socket). Devices, sockets and fifos all need to be read in a special way, but dedup'd files should be read as if they were a plain file.
I _disagree_ that a dedup file should be treated as if it were a symlink. This is because a dedup file does not point to another file (or inode) on the file system, which is a characteristic of a symlink or a hardlink. It is basically just a compressed file. We don't treat NTFS-compressed files differently from regular files, so why are we treating dedup'd files differently?
NTFS compressed files act exactly like normal files. Reparse point files do not and require significant additional processing to figure out what kind they are. That's the difference.

From AFIO's perspective, when it does NtQueryDirectoryFile() to fetch metadata about a file entry, it can learn at zero cost whether an entry is a reparse point by examining FileAttributes for the FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of reparse point file it is without opening the file and asking.

Windows' CreateFile() API is astonishingly slow. To require calling that, then an additional NtQueryDirectoryFile() to fetch the FILE_REPARSE_POINT_INFORMATION metadata and close the handle - which is the fastest way I know of to fetch the reparse point tag code - would impose an enormous performance penalty for all file entries marked with FILE_ATTRIBUTE_REPARSE_POINT.

I appreciate you're saying the cost is worth it, but we're thinking of all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
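To illustrate the point about what the enumeration metadata can and cannot reveal, here is a minimal sketch. It assumes the documented Win32 FILE_ATTRIBUTE_* bit values, redefined locally so it compiles without <windows.h>; the helper classify_from_attributes() and the EntryKind enum are hypothetical names for illustration only, not part of AFIO or Boost.Filesystem.

```cpp
#include <cstdint>

// Documented Win32 FILE_ATTRIBUTE_* values, redefined here (an assumption
// made so that this sketch builds on any platform without <windows.h>).
constexpr std::uint32_t kAttrDirectory    = 0x00000010; // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t kAttrReparsePoint = 0x00000400; // FILE_ATTRIBUTE_REPARSE_POINT
constexpr std::uint32_t kAttrCompressed   = 0x00000800; // FILE_ATTRIBUTE_COMPRESSED

// Hypothetical classification: everything that can be decided from the
// attribute bits alone, without opening the file.
enum class EntryKind { regular, directory, reparse_point_unknown };

constexpr EntryKind classify_from_attributes(std::uint32_t attrs)
{
    // The reparse flag says "this is some kind of reparse point" but not
    // which kind (symlink, junction, dedup, ...). Learning the tag would
    // need the file to be opened, which is the expensive step.
    if (attrs & kAttrReparsePoint) return EntryKind::reparse_point_unknown;
    if (attrs & kAttrDirectory)    return EntryKind::directory;
    // Compressed files carry no reparse flag, so they land here with
    // ordinary files.
    return EntryKind::regular;
}
```

With this sketch, classify_from_attributes(kAttrCompressed) yields EntryKind::regular, while any entry carrying the reparse flag stays "unknown" until someone pays for opening it.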
    for (directory_iterator ...)
    {
        if (is_symlink(fn)) backup_link(fn);
        if (is_regular_file(fn)) backup_contents(fn);
        if (is_directory(fn)) ignore(fn);
        if (is_other(fn)) ignore(fn);
    }
Currently, this pseudo code would fail to back up any automatically dedup'd files (which are basically any file older than 3 days on some of my sites). It fails because a dedup'd file is currently an "other".
If you treat a dedup'd file as a symlink, only the "link" will be backed up. This link points to a magical place that is impossible to read other than simply reading "fn".
So how does this simple program back up the dedup'd file contents?
I appreciate the problem with saying something is a symlink, but trying to retrieve the target of that symlink has to error out because it's meaningless in the case of a dedup symlink. What seems to me the best route forward is you do something like this:

    if (is_symlink(fn))
    {
        error_code ec;
        auto target=read_symlink(fn, ec);
        if(!ec) backup_link(fn);
    }

Because is_regular_file() and is_directory() use status(), they follow any symlink so you can safely fall through to those.

Is this acceptable to you? If so, I'll update AFIO accordingly to match these new semantics and add a note to the docs. I'm sure Beman will consider something similar when he gets to be less busy.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/