On 28 July 2015 at 19:07, Andrey Semashev wrote:
On 28.07.2015 04:33, Paul Harris wrote:
I think we are not on the same page. Let me try and refocus the discussion...
With symlinks, there is more than one access point to the same file content (i.e. multiple file names for identical content).
That makes symlinks fundamentally different to regular files, and it's why they are treated differently. E.g. don't back up content twice.
Is that statement correct?
As Niall already commented, that's not correct. What you described is more like a hardlink [1].
You can easily spot the difference if you rename or delete the file the link points to. The symlink will keep pointing to the old file name (thus becoming a dangling symlink) while the hardlink will still be pointing to the file content.
A hardlink is actually not any more special than a regular file. Put simply, from the filesystem perspective any file is a name pointing to the content. When you create a new file, there's only one such name. When you create a hardlink, you create another name pointing to the same content and increment the reference count to the content. The two names are equivalent, and the content exists as long as there are names referencing it.
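The difference is easy to reproduce. What follows is only a minimal sketch, assuming C++17 std::filesystem (not Boost.Filesystem) and a platform where the user is allowed to create symlinks. The names LinkDemoResult and run_link_demo and the file names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical record of what the experiment observed.
struct LinkDemoResult {
    std::uintmax_t names_before;  // hard link count while both names exist
    std::uintmax_t names_after;   // hard link count after one name is removed
    bool symlink_dangling;        // symlink still exists but resolves to nothing
    bool hardlink_readable;       // content still reachable via the hardlink
};

// Creates a file, a hardlink and a symlink to it inside `dir`,
// removes the original name and reports what is left.
LinkDemoResult run_link_demo(const fs::path& dir) {
    const fs::path base = fs::absolute(dir);  // absolute, so the symlink target is unambiguous
    fs::remove_all(base);
    fs::create_directories(base);

    const fs::path original = base / "original.txt";
    const fs::path hard = base / "hard.txt";
    const fs::path soft = base / "soft.txt";

    { std::ofstream out(original); out << "payload"; }
    assert(fs::is_regular_file(original));

    fs::create_hard_link(original, hard);  // second name for the same content
    fs::create_symlink(original, soft);    // separate object that stores a path

    LinkDemoResult r{};
    r.names_before = fs::hard_link_count(hard);
    fs::remove(original);                  // drop the first name only
    r.names_after = fs::hard_link_count(hard);

    // is_symlink() inspects the link itself; exists() follows it.
    r.symlink_dangling = fs::is_symlink(soft) && !fs::exists(soft);

    std::ifstream in(hard);
    std::string content;
    in >> content;
    r.hardlink_readable = (content == "payload");
    return r;
}
```

Calling run_link_demo on a scratch directory should show the name count dropping by one while the content stays readable through the remaining name, and the symlink left pointing at a name that no longer exists.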
I think my point is being missed...

I am not debating symlinks or hardlinks. I am _happy_ with the way hardlinks and symlinks are treated, in both POSIX and Windows. I am _happy_ with the way reparse-based symlinks and junctions are treated on Windows.

I _disagree_ with the way dedup'd files are currently treated as a special file (as if they were a device, a character file, a FIFO or a socket). Devices, sockets and FIFOs all need to be read in a special way, but dedup'd files should be read as if they were plain files.

I also _disagree_ that a dedup'd file should be treated as if it were a symlink. A dedup'd file does not point to another file (or inode) on the file system, which is the characteristic of a symlink or a hardlink. It is basically just a compressed file. We don't treat NTFS-compressed files differently from regular files, so why are we treating dedup'd files differently?

Dedup'd files and symlinks on Windows both (unfortunately) use the same mechanism: reparse points. But we should only treat symlink and junction reparse points as symlinks. Anything else should be treated as a regular file. That is how I am reading the MS docs, and that is how I am experiencing working with the filesystems.

A simple example is building a backup program for the files in a _single directory_. Let's say you want to store every file's content once.

When you find a directory, ignore it.
When you find an "other" file, ignore it (how can you back up a device / character file / etc?).
When you find a symlink, you want to store just the link.
When you find a regular file, you want to store the contents.
When you find a reparse-point symlink, you want to store just the link (like a POSIX symlink).
When you find a dedup'd file, you want to store the contents (like a POSIX regular file).

    for (directory_iterator ...)
    {
        // Test for a symlink first: is_regular_file() follows links,
        // so a link to a regular file would otherwise match twice.
        if (is_symlink(fn)) backup_link(fn);
        else if (is_regular_file(fn)) backup_contents(fn);
        else if (is_directory(fn)) ignore(fn);
        else if (is_other(fn)) ignore(fn);
    }

Currently, this pseudo code would fail to back up any automatically dedup'd files (which is basically any file older than 3 days on some of my sites). It fails because a dedup'd file is currently an "other".

If you treat a dedup'd file as a symlink, only the "link" will be backed up. This link points to a magical place that is impossible to read other than by simply reading "fn". So how does this simple program back up the dedup'd file contents?

cheers,
Paul
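For anyone who wants to experiment with the loop above, here is a compilable sketch of the same classification. It assumes C++17 std::filesystem instead of Boost.Filesystem, and the names Action, Plan, classify and plan_backup are hypothetical. It only records what the loop would do with each entry. How a dedup'd file is reported depends on the library implementation and is not something this sketch decides.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical outcome for one directory entry.
enum class Action { BackupLink, BackupContents, Ignore };

// Decide from the entry's own status (symlinks are not followed).
Action classify(const fs::file_status& st) {
    if (fs::is_symlink(st)) return Action::BackupLink;
    if (fs::is_regular_file(st)) return Action::BackupContents;
    return Action::Ignore;  // directories and everything reported as "other"
}

// Hypothetical summary of what a backup run would do.
struct Plan {
    std::vector<fs::path> links;     // store the link only
    std::vector<fs::path> contents;  // store the file contents
    std::vector<fs::path> ignored;   // skipped entries
};

// Walk a single directory (no recursion) and sort entries by action.
Plan plan_backup(const fs::path& dir) {
    assert(fs::is_directory(dir));
    Plan plan;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        switch (classify(entry.symlink_status())) {
            case Action::BackupLink:     plan.links.push_back(entry.path());    break;
            case Action::BackupContents: plan.contents.push_back(entry.path()); break;
            case Action::Ignore:         plan.ignored.push_back(entry.path());  break;
        }
    }
    return plan;
}
```

With a classification like this, any file that the library reports as neither a symlink nor a regular file ends up in plan.ignored, which is exactly the failure described above for dedup'd files.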