[filesystem] proposal: treat reparse files as regular files
On 24 Jul 2015 at 16:03, Paul Harris wrote:
tl;dr : I propose that we treat all non-symlink "reparse_files" as "regular_files".
If the boost library user wants to do something special with these plain reparse files, they should use alternative means. But typically they are supposed to be treated as regular files.
This means we could drop the "reparse_file" enum, or continue to use it for a special-case whats_my_real_status() function.
--- Motivation ---
Windows Server 2012 uses reparse points to implement deduplication. Those files should be treated as regular files in all circumstances. Currently, they are not classed as "regular" files, so fs::copy() will skip those files, and library-user code written to list files based on official examples will ignore all dedup'd files.
This is causing serious and latent problems at the user end, because deduping only happens occasionally after X days, and users cannot easily check if a file is dedup'd (they look just like regular files).
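The listing pattern in question looks roughly like the sketch below. It is only an illustration: std::filesystem is used as a stand-in for boost::filesystem so that it builds without Boost, and the function name list_regular_files is made up for this example. On a platform without reparse points every plain file passes the test; the point is that any entry whose status is not "regular" is dropped without any warning.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem; // assumption: stand-in for boost::filesystem

// Collects the names of entries that is_regular_file() accepts.
// Anything the library does not classify as regular is silently skipped,
// which is what happens to a dedup'd file reported as reparse_file.
inline std::vector<std::string> list_regular_files(const fs::path& dir)
{
    std::vector<std::string> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (fs::is_regular_file(entry.status()))
            names.push_back(entry.path().filename().string());
    }
    return names;
}

// Usage: assert(!list_regular_files(".").empty());
```

A caller that wants to notice skipped entries has to add its own else branch; the pattern above gives no hint that anything was left out.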
--- Real life example ---
Another example of reparse use is the "Symantec Enterprise Vault" (version 10), which I found running on one site. It replaces files on the server with reparse-point files. FSUTIL REPARSEPOINT QUERY filename.txt shows the contents of the reparse buffer, which is a URL to an internal HTTP server. The URL points to a .asp link with a bunch of codes and dates to identify the file in the server. Copy-pasting that URL into a web browser allows you to directly download the file via the web browser, which is pretty neat I suppose.
In this case, the reparsed files in Windows Explorer all have grey X crosses on their file icon. If you "type" them (via cmd) or open them, the icon loses the grey cross and the file is no longer a reparse point file.
My software refused to read the files because they were "not regular files". Once I adjusted the boost code (described below), my software saw them as regular and opened the files. The file icons lost the grey cross.
So it seems that the file server automatically downloads and replaces the files with the stored content on demand, and the file-reading client program should really just treat these files as normal files.
--- Short logic ---
Reparse files (that are not symlinks) should almost always be treated as plain files. They are a mechanism for MS file servers to store files in clever ways, but the client should not care and just read/write them as if they were normal files.
This is different to all the other "other" files which can't be treated like normal files: block, character, fifo, socket, unknown.
So, reparse files should not be grouped with the "other" file types.
They are also NOT symlinks, and should not be treated as symlinks (which would require special decisions for copying, or querying the status, or checking if the target still exists).
--- What are reparse files ---
I did some reading. If I understand correctly:
Reparse points give drivers (on the server) a chance to get data through some other specialised means (eg query from a cluster store). They are processed by the server, not the client, so clients should treat reparse data as opaque data, EXCEPT for symlink reparse files.
https://msdn.microsoft.com/en-us/library/dd541667.aspx
quote:"The following reparse tags, with the exception of IO_REPARSE_TAG_SYMLINK, are processed on the server and are not processed by a client after transmission over the wire. Clients should treat associated reparse data as opaque data."
It seems like the rest of the tags are used for connecting files to other types of storage (eg long term storage, cluster storage). Clients may need to do something special with SOME reparse point files, IF the client cares about how long the file read may take.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365505(v=vs.85).a...
quote: "Most applications should take special actions for files that have been moved to long-term storage, if only to notify the user that it may take a while to retrieve the file."
--- Changes required ---
Option 1: change is_regular_file() to return true where type==reparse_file
I don't like this option, as library-users could be checking the type directly instead of using is_regular_file().
Option 2: These functions return reparse_file:
fs::file_type query_file_type(const path& p, error_code* ec)
file_status status(const path& p, error_code* ec)
file_status symlink_status(const path& p, error_code* ec)
They should return regular_file instead.
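As a rough sketch of what Option 2 amounts to, the classification could look like the code below. This is not the library's actual code: file_kind and classify are hypothetical names, and the attribute and reparse tag are passed in as plain integers so the logic can be read without the Windows headers. The numeric values are the ones defined in the Windows SDK. Junctions are not given any special handling here.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in the Windows SDK headers.
constexpr std::uint32_t kFileAttributeDirectory    = 0x00000010u; // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t kFileAttributeReparsePoint = 0x00000400u; // FILE_ATTRIBUTE_REPARSE_POINT
constexpr std::uint32_t kReparseTagSymlink         = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kReparseTagDedup           = 0x80000013u; // IO_REPARSE_TAG_DEDUP

// Hypothetical stand-in for the library's file_type enum.
enum class file_kind { regular_file, directory_file, symlink_file };

// Only a reparse point carrying the symlink tag is reported as a symlink.
// Every other reparse point falls through and is classified by its
// ordinary attributes, so a dedup'd file comes out as regular_file.
inline file_kind classify(std::uint32_t attributes, std::uint32_t reparse_tag)
{
    const bool is_reparse = (attributes & kFileAttributeReparsePoint) != 0;
    if (is_reparse && reparse_tag == kReparseTagSymlink)
        return file_kind::symlink_file;
    if ((attributes & kFileAttributeDirectory) != 0)
        return file_kind::directory_file;
    return file_kind::regular_file;
}

// Usage:
//   assert(classify(kFileAttributeReparsePoint, kReparseTagDedup)
//          == file_kind::regular_file);
```

With this shape, code that checks the type directly and code that calls is_regular_file() see the same answer, which is the reason for preferring Option 2 over Option 1.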
--- How to test with dedup files ---
Creating dedup'd files is a feature only available on Windows Server 2012, I believe, although Windows XP/Vista/7/8/10 clients all can read dedup files.
Here is how I created a windows server to test with (for free!) on a demo Azure cloud server. I have one working, so if anyone would like to use it for their testing, let me know.
Step one: follow this blog article: http://blogs.technet.com/b/tommypatterson/p/azureservertrial.aspx
Once the machine was "running" I clicked Connect at the bottom. That gave me an .rdp file which in theory I could use with rdesktop, but it uses a DNS name that was only just created, so that didn't work.
When you click the name of the server in the list, it shows the public IP and the port on the right. Then you can do this:
$ rdesktop that.ip.addr:port
But only if you have the latest rdesktop AND you have set up kerberos something-something.
Instead I found a windows computer and used remote desktop from there.
---
Once inside, in the "Server Manager --> Dashboard" window on the screen, click "Add Roles", then go next, next until "Server Roles". Expand "File and Storage services", "File and iSCSI", and tick "Data Deduplication". Then next, next etc. and Install. Wait a bit... and it's done.
http://www.techrepublic.com/blog/data-center/configuring-windows-server-8-de...
---
Continuing on that webpage... Time to enable dedup. There is a temp disk D: so let's enable it there.
Method 1... I did this and then went to method 2... Start PowerShell, type:
"Enable-DedupVolume D:"
Method 2... in that same Dashboard, hit the 4th button (File and Storage Services), then Volumes --> Disks. Click Volume 1 at the top, and then right-click D: at the bottom --> Configure Dedup.
To try and accelerate this puppy, I set the "age to dedup" to 0 days.
http://www.techrepublic.com/blog/data-center/windows-server-2012-deduplicati...
---
Time to make something to dedup. We'll just duplicate the warning.txt file that exists on D:
In PowerShell:
PS> D:
PS> $file = Get-Content DATALOSS_WARNING_README.txt
Then, do these 2 commands a bunch of times until "big.txt" gets to, say, 6MB:
PS> Add-Content big.txt $file
PS> $file = Get-Content big.txt
Then use Windows Explorer (or other) to make a dozen copies of big.txt.
Copy c:\windows\explorer.exe to D: to give it something to dedup.
Go to D: and then copy-paste explorer.exe a dozen times.
In PowerShell, type:
PS> Update-DedupStatus -Volume D:
PS> Start-DedupJob -Type Optimization -Volume D:
and then wait for it to finish. You can track its progress with:
PS> Get-DedupJob
PS> Get-DedupStatus -Volume D:
---
So, once it's deduped, you check:
PS> FSUTIL REPARSEPOINT QUERY big.txt
You should see that it's a reparse point with that 0x800etc0013 code.
Copy-paste big.txt to big2.txt and check it with the query, and it should tell you big2 is NOT a reparse point.
NOW you have some files to test the boost library... You can't zip them up (they lose the dedup tag), you have to run boost binaries ON the computer in the sky.
--- Finish ---
Thanks for reading, Paul
I appreciate all the detail, and I'm sure so does Beman who is Filesystem's maintainer.
However, they all still look like symlinks to me. Just because the OS magically replaces them with the real file on first access is immaterial - the same thing could happen on Linux. If you don't treat them as symlinks, there is no way of inspecting the link without causing it to be auto-downloaded, which could be catastrophic in some use cases.
I still vote for pseudo-symlinks to be reported by Filesystem as symlinks.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
On 25 July 2015 at 00:56, Niall Douglas wrote:
I appreciate all the detail, and I'm sure so does Beman who is Filesystem's maintainer.
However, they all still look like symlinks to me. Just because the OS magically replaces them with the real file on first access is immaterial - the same thing could happen on Linux. If you don't treat them as symlinks, there is no way of inspecting the link without causing it to be auto-downloaded which could be catastrophic in some use cases.
I still vote for pseudo-symlinks to be reported by Filesystem as symlinks.
I did think about that, but the design of these reparse points intends for these files to be treated as plain files by the client - as per MS documents.
Plus, I understand it as: the reparse buffer is entirely driver-specific, and so you can't expect boost or any user program to be able to decode what is inside the reparse buffer and do anything intelligent.
AND the resolving is done by the driver on the server side. Note that there are probably a dozen products out there that use these reparse buffers for their storage solution... it's not just Windows dedup.
So, I don't see how the client can do anything intelligent with symlink knowledge, AND if boost library users are forced to treat them as symlinks, then you now have 2 kinds of symlinks:
1) standard symlink, which you really want a shallow copy of sometimes, and you have to be careful of loops ( A -> B -> A )
2) reparse (but not symlink), which you cannot shallow-copy (as far as I understand), and loops are not possible.
So I've already seen:
* My software doesn't want to follow links, but now the new version will force me to specifically check if it's just a reparse-file and then follow.
* Whole-disk backup software doesn't follow symlinks because it assumes it'll get the real file later. Reparse (nonsymlink) files do not have any other "real file" so those files are not being backed up at all right now.
So treating them as symlinks causes more trouble than it is worth for helping the one edge case.
reparse-files-non-symlink is such a specialised case, I'd personally want a specialised get_reparse_info kind of function, so if I really need to care, then I can find that information.
Your thoughts?
Cheers,
Paul
On 27 Jul 2015 at 10:55, Paul Harris wrote:
However, they all still look like symlinks to me. Just because the OS magically replaces them with the real file on first access is immaterial - the same thing could happen on Linux. If you don't treat them as symlinks, there is no way of inspecting the link without causing it to be auto-downloaded which could be catastrophic in some use cases.
I still vote for pseudo-symlinks to be reported by Filesystem as symlinks.
I did think about that, but the design of these reparse points intends for these files to be treated as plain files by the client - as per MS documents.
This is like saying that POSIX symlinks are intended to be treated as their target, which is the whole point of using them.
Reparse points are the *technology* by which Microsoft implemented symlinks in NTFS. They offer a *family* of symlink implementations, all with varying semantics. Some of that family bear strong resemblance to the much more limited POSIX symlink, others are quite different.
If you weren't on NTFS, the technology used to implement symlinks is different. For example, the NT kernel provides its own non-persistent symlink implementation totally separate from NTFS.
Plus, I understand it as: the reparse buffer is entirely driver-specific, and so you can't expect boost or any user program to be able to decode what is inside the reparse buffer and do anything intelligent.
Microsoft have published the structure for their reparse tag formats. Anyone can parse that structure (AFIO does).
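As an illustration of how little is needed to read the common part of that structure, here is a minimal sketch that decodes the fixed 8-byte header of the documented REPARSE_DATA_BUFFER layout (a 32-bit tag, a 16-bit data length, and a 16-bit reserved field, all little-endian) and leaves the tag-specific payload untouched. The names reparse_header and parse_reparse_header are hypothetical, and the sketch does not cover the longer GUID-carrying header used for third-party tags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
struct reparse_header {
    std::uint32_t tag;         // ReparseTag
    std::uint16_t data_length; // ReparseDataLength: payload bytes after the header
    bool valid;                // false if the buffer is too short
};

// Decodes the 8-byte header from a raw little-endian buffer.
// The payload that follows is treated as opaque data.
inline reparse_header parse_reparse_header(const std::vector<std::uint8_t>& buf)
{
    reparse_header h{0, 0, false};
    const std::size_t header_size = 8;
    if (buf.size() < header_size)
        return h;
    h.tag = static_cast<std::uint32_t>(buf[0])
          | (static_cast<std::uint32_t>(buf[1]) << 8)
          | (static_cast<std::uint32_t>(buf[2]) << 16)
          | (static_cast<std::uint32_t>(buf[3]) << 24);
    h.data_length = static_cast<std::uint16_t>(
        static_cast<unsigned>(buf[4]) | (static_cast<unsigned>(buf[5]) << 8));
    // Bytes 6 and 7 are the reserved field and are ignored here.
    h.valid = buf.size() >= header_size + h.data_length;
    return h;
}

// Usage: assert(!parse_reparse_header({}).valid);
```

Reading the tag this way does not open or recall the file's data; how the raw buffer is obtained from the OS is outside the scope of this sketch.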
AND the resolving is done by the driver on the server side. Note that there are probably a dozen products out there that use these reparse buffers for their storage solution... it's not just Windows dedup.
The resolution varies actually. For example junction points are resolved server side, symlinks are resolved client side.
So, I don't see how the client can do anything intelligent with symlink knowledge, AND if boost library users are forced to treat them as symlinks, then you now have 2 kinds of symlinks:
1) standard symlink, which you really want a shallow copy sometimes, and you have to be careful of loops ( A -> B -> A )
2) reparse (but not symlink), which you cannot shallow-copy (as far as I understand), and loops are not possible.
You can copy the standard Microsoft reparse points as those are documented.
I see no reason why Filesystem's read_symlink(), create_symlink() and copy_symlink() would not all work just fine if upgraded to understand more reparse point types.
* My software doesn't want to follow links, but now the new version will force me to specifically check if it's just a reparse-file and then follow.
No, that depends on whatever the OS does with the symlink. Ordinarily I would assume it dereferences the link unless you specifically ask it not to, same as on POSIX: if you lstat() it, it returns the stat for the symlink; if you stat() it, it returns the stat for the target.
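The POSIX behaviour referred to here can be checked with a short sketch like the following. It assumes a POSIX system with a writable /tmp; the function names and the scratch-directory template are made up for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// True when lstat() sees a symlink at `path` while stat() sees the
// regular file it points to.
inline bool link_and_target_differ(const std::string& path)
{
    struct stat via_lstat{};
    struct stat via_stat{};
    if (::lstat(path.c_str(), &via_lstat) != 0) return false;
    if (::stat(path.c_str(), &via_stat) != 0) return false;
    return S_ISLNK(via_lstat.st_mode) && S_ISREG(via_stat.st_mode);
}

// Creates a scratch directory holding one regular file and one symlink
// to it, runs the check on both, then removes everything again.
inline bool demo_lstat_vs_stat()
{
    char dir_template[] = "/tmp/reparse_demo_XXXXXX";
    if (::mkdtemp(dir_template) == nullptr) return false;
    const std::string dir = dir_template;
    const std::string target = dir + "/target.txt";
    const std::string link = dir + "/link.txt";

    std::FILE* f = std::fopen(target.c_str(), "w");
    if (f == nullptr) { ::rmdir(dir.c_str()); return false; }
    std::fputs("data\n", f);
    std::fclose(f);

    const bool ok = ::symlink(target.c_str(), link.c_str()) == 0
                 && link_and_target_differ(link)     // symlink: the two views differ
                 && !link_and_target_differ(target); // plain file: they agree

    ::unlink(link.c_str());
    ::unlink(target.c_str());
    ::rmdir(dir.c_str());
    return ok;
}

// Usage: assert(demo_lstat_vs_stat());
```

Filesystem's symlink_status() and status() are the library-level counterparts of lstat() and stat() in this respect.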
* Whole-disk backup software doesn't follow symlinks because it assumes it'll get the real file later. Reparse (nonsymlink) files do not have any other "real file" so those files are not being backed up at all right now.
So treating them as symlinks causes more trouble than it is worth for helping the one edge case.
reparse-files-non-symlink is such a specialised case, I'd personally want a specialised get_reparse_info kind of function, so if I really need to care, then I can find that information.
Your thoughts?
I think Filesystem should provide what POSIX provides. Where Windows provides close enough to POSIX behaviours we should support that too.
However pages of special Windows support isn't what Boost does usually. We're here to abstract out the commonalities generally speaking.
I agree Filesystem (and AFIO) should recognise deduped files as something valid and can be worked with. Anything past that is up to the end user.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
I think we are not on the same page. Let me try and refocus the
discussion...
With symlinks, there is more than one access point to the same file
content. (ie multiple file names to the identical content).
That makes symlinks fundamentally different to regular files. And it's why
they are treated differently. Eg don't back up content twice.
Is that statement correct?
Reparse point files (that are not junctions or symlinks) do not have an
alternate access point through the file system.
You cannot access the underlying data via another file name. Eg dedup
files.
Is that also correct?
Cheers,
Paul
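For reference, the Windows SDK encodes a related distinction in the reparse tag itself: the IsReparseTagNameSurrogate macro tests bit 29 (0x20000000), which marks tags whose file or directory represents another named entity in the system. The sketch below re-implements that bit test, plus the Microsoft-owned bit 31 test, as plain functions so it builds anywhere. The function names are made up here; the tag values are the SDK's. Whether this bit is the right criterion for Filesystem is exactly the open question in this thread, so treat it as an observation rather than an answer.

```cpp
#include <cassert>
#include <cstdint>

// Tag values as defined in the Windows SDK headers.
constexpr std::uint32_t kTagMountPoint = 0xA0000003u; // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kTagSymlink    = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup      = 0x80000013u; // IO_REPARSE_TAG_DEDUP

// Same bit test as the SDK macro IsReparseTagMicrosoft (bit 31).
constexpr bool is_microsoft_tag(std::uint32_t tag)
{
    return (tag & 0x80000000u) != 0;
}

// Same bit test as the SDK macro IsReparseTagNameSurrogate (bit 29).
constexpr bool is_name_surrogate(std::uint32_t tag)
{
    return (tag & 0x20000000u) != 0;
}

// Usage:
//   assert(is_name_surrogate(kTagSymlink));
//   assert(!is_name_surrogate(kTagDedup));
```

Symlinks and junctions carry the surrogate bit; the dedup tag does not.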
On 27 Jul 2015 8:42 pm, "Niall Douglas"
On 27 Jul 2015 at 10:55, Paul Harris wrote:
However, they all still look like symlinks to me. Just because the OS magically replaces them with the real file on first access is immaterial - the same thing could happen on Linux. If you don't treat them as symlinks, there is no way of inspecting the link without causing it to be auto-downloaded which could be catastrophic in some use cases.
I still vote for pseudo-symlinks to be reported by Filesystem as symlinks.
I did think about that, but the design of these reparse points intends for these files to be treated as plain files by the client - as per MS documents.
This is like saying that POSIX symlinks are intended to be treated as their target, which is the whole point of using them.
Reparse points are the *technology* by which Microsoft implemented symlinks in NTFS. They offer a *family* of symlink implementations, all with varying semantics. Some of that family bear strong resemblence to the much more limited POSIX symlink, others are quite different.
If you weren't on NTFS, the technology used to implement symlinks is different. For example, the NT kernel provides its own non-persistent symlink implementation totally separate from NTFS.
Plus, I understand it as: the reparse buffer is entirely driver-specific, and so you can't expect boost or any user program to be able to decode what is inside the reparse buffer and do anything intelligent.
Microsoft have published the structure for their reparse tag formats. Anyone can parse that structure (AFIO does).
AND the resolving is done by the driver on the server side. Note that there are probably a dozen products out there that use these reparse buffers for their storage solution... its not just windows dedup.
The resolution varies actually. For example junction points are resolved server side, symlinks are resolved client side.
So, I don't see how the client can do anything intelligent with symlink knowledge, AND if boost library users are forced to treat them as symlinks, then you now have 2 kinds of symlinks:
1) standard symlink, which you really want a shallow copy sometimes, and you have to be careful of loops ( A -> B -> A )
2) reparse (but not symlink), which you cannot shallow-copy (as far as I understand), and loops are not possible.
You can copy the standard Microsoft reparse points as those are documented.
I see no reason why Filesystem's read_symlink(), create_symlink() and copy_symlink() all don't work just fine if upgraded to understand more reparse point types.
* My software doesn't want to follow links, but now the new version will force me to specifically check if it's just a reparse-file and then follow.
No, that depends on whatever the OS does with the symlink. Ordinarily I would assume it dereferences the link unless you specifically ask for it not to, same as on POSIX, i.e. if you lstat() it, it returns the stat for the symlink; if you stat() it, it returns the stat for the target.
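The lstat()/stat() distinction described above can be sketched in portable code. This is only an illustration, assuming C++17 std::filesystem as a stand-in for Boost.Filesystem (both provide status() and symlink_status()); the helper name link_and_target_type is made up for this example.

```cpp
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: returns {type of the entry itself, type of what it resolves to}.
// symlink_status() behaves like lstat(): it does not follow the link.
// status() behaves like stat(): it follows the link to its target.
inline std::pair<fs::file_type, fs::file_type> link_and_target_type(const fs::path& p)
{
    std::error_code ec_self, ec_target;  // errors are reported via file_type::not_found / none
    const fs::file_type self = fs::symlink_status(p, ec_self).type();
    const fs::file_type target = fs::status(p, ec_target).type();
    return {self, target};
}
```

For a symlink the two members differ (symlink versus the target's type); for anything else they are the same.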
* Whole-disk backup software doesn't follow symlinks because it assumes it will get the real file later. Reparse (nonsymlink) files do not have any other "real file" so those files are not being backed up at all right now.
So treating them as symlinks causes more trouble than it solves for the one edge case.
reparse-files-non-symlink is such a specialised case, I'd personally want a specialised get_reparse_info kind of function, so if I really need to care, then I can find that information.
Your thoughts?
I think Filesystem should provide what POSIX provides. Where Windows provides close enough to POSIX behaviours we should support that too.
However pages of special Windows support isn't what Boost does usually. We're here to abstract out the commonalities generally speaking.
I agree Filesystem (and AFIO) should recognise deduped files as something valid and can be worked with. Anything past that is up to the end user.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 28 Jul 2015 at 9:33, Paul Harris wrote:
I think we are not on the same page. Let me try and refocus the discussion...
With symlinks, there is more than one access point to the same file content. (ie multiple file names to the identical content).
That makes symlinks fundamentally different to regular files. And it's why they are treated differently. Eg don't back up content twice.
Is that statement correct?
No. Symlinks are small text files consisting of the path to indirect to. You can open them and modify them whether on POSIX or Windows. For most OS filesystem APIs, the OS spots symlink files and magically does the indirection for you.
Reparse point files (that are not junctions or symlinks) do not have an alternate access point through the file system.
You cannot access the underlying data via another file name. Eg dedup files.
Is that also correct?
Reparse points are just like POSIX symlink files - they are small files containing the path of where to indirect to. They are not special in any way, except by triggering exceptional behaviour in most OS APIs.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 28.07.2015 04:33, Paul Harris wrote:
I think we are not on the same page. Let me try and refocus the discussion...
With symlinks, there is more than one access point to the same file content. (ie multiple file names to the identical content).
That makes symlinks fundamentally different to regular files. And it's why they are treated differently. Eg don't back up content twice.
Is that statement correct?
As Niall already commented, that's not correct. What you described is more like a hardlink [1]. You can easily spot the difference if you rename or delete the file the link points to. The symlink will keep pointing to the old file (thus being a dangling symlink) while the hardlink will still be pointing to the file content. A hardlink is actually not any more special than a regular file. Put simply, from the filesystem perspective any file is a name pointing to the content. When you create a new file, there's only one such name. When you create a hardlink, you create another name pointing to the same content and increment the reference count to the content. The two names are equivalent, and the content exists as long as there are names referencing it. [1] https://en.wikipedia.org/wiki/Hard_link
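The rename/delete test described above can be tried directly. The following is a rough sketch, assuming C++17 std::filesystem and a filesystem that supports both hard links and symlinks; the struct and function names are invented for the example.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical result record for the experiment below.
struct link_demo_result {
    std::uintmax_t names_before;  // hard link count with a single name
    std::uintmax_t names_after;   // hard link count after adding a second name
    bool hardlink_survives;       // content still reachable once the original name is removed
    bool symlink_dangles;         // the symlink still exists, but its target does not
};

// Creates a file, a hard link and a symlink to it inside dir, then deletes the original name.
inline link_demo_result demo_links(const fs::path& dir)
{
    const fs::path original = dir / "original.txt";
    const fs::path hard = dir / "hard.txt";
    const fs::path soft = dir / "soft.txt";
    {
        std::ofstream out(original);
        out << "content";
    }  // stream closed here

    link_demo_result r{};
    r.names_before = fs::hard_link_count(original);
    fs::create_hard_link(original, hard);  // second name for the same content
    fs::create_symlink(original, soft);    // small file holding the path to original
    r.names_after = fs::hard_link_count(original);

    fs::remove(original);
    r.hardlink_survives = fs::exists(hard);
    r.symlink_dangles = fs::is_symlink(fs::symlink_status(soft)) && !fs::exists(soft);
    return r;
}
```

Calling demo_links() on an empty scratch directory should show the reference count rising when the hard link is added, the hard link outliving the original name, and the symlink left dangling.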
On 28 July 2015 at 19:07, Andrey Semashev
On 28.07.2015 04:33, Paul Harris wrote:
I think we are not on the same page. Let me try and refocus the discussion...
With symlinks, there is more than one access point to the same file content. (ie multiple file names to the identical content).
That makes symlinks fundamentally different to regular files. And it's why they are treated differently. Eg don't back up content twice.
Is that statement correct?
As Niall already commented, that's not correct. What you described is more like a hardlink [1].
You can easily spot the difference if you rename or delete the file the link points to. The symlink will keep pointing to the old file (thus being a dangling symlink) while the hardlink will still be pointing to the file content.
A hardlink is actually not any more special than a regular file. Put simply, from the filesystem perspective any file is a name pointing to the content. When you create a new file, there's only one such name. When you create a hardlink, you create another name pointing to the same content and increment the reference count to the content. The two names are equivalent, and the content exists as long as there are names referencing it.
I think my point is being missed...
I am not debating symlinks or hardlinks...
I am _happy_ with the way hardlinks and symlinks are treated, in both posix and windows.
I am _happy_ with the way reparse-based-symlinks and junctions are treated in windows.
I _disagree_ with the way dedup'd files are currently treated as a special file (as if they were a device or a character file or a fifo or a socket). device/socket/fifos all need to be read in a special way, but dedup'd files should be read as if they were a plain file.
I _disagree_ that a dedup file should be treated as if it were a symlink. This is because a dedup file does not point to another file (or inode) on the file system, which is a characteristic of a symlink or a hardlink. It is basically just a compressed file. We don't treat NTFS-compressed files differently from regular files, why are we treating dedup'd files differently?
Dedup files and symlink files on windows both (unfortunately) use the same mechanism - reparse points. But we should only treat symlink and junction reparse point files as symlinks. Anything else should be treated as a regular file. That is how I am reading the MS docs, and that is how I am experiencing working with the filesystems.
Simple example is when building a backup program for files in a _single directory_. Let's say you want to store every file's content once.
When you find a directory, ignore it.
When you find an "other" file, ignore it (how can you backup a device / character file / etc?)
When you find a symlink, you want to store just the link.
When you find a regular file, you want to store the contents.
When you find a reparse-point-symlink, you want to store just the link (like a posix symlink).
When you find a dedup'd file, you want to store the contents (like a posix regular file).
    for (directory_iterator ...)
    {
        if (is_symlink(fn)) backup_link(fn);
        if (is_regular_file(fn)) backup_contents(fn);
        if (is_directory(fn)) ignore(fn);
        if (is_other(fn)) ignore(fn);
    }
Currently, this pseudo code would fail to backup any automatic dedup'd files (which are basically any file older than 3 days on some of my sites). It fails because a dedup'd file is currently an "other".
If you treat a dedup'd file as a symlink, only the "link" will be backed up. This link points to a magical place that is impossible to read other than simply reading "fn".
So how does this simple program backup the dedup'd file contents?
cheers,
Paul
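The backup rules in the pseudo code above could be written out roughly as follows. This is a sketch under stated assumptions, not Boost.Filesystem's behaviour: it uses C++17 std::filesystem, it decides on symlink_status() so that a symlink is never also counted as its target, and backup_action, action_for and plan_backup are hypothetical names. How a dedup'd file shows up in the status type on Windows is exactly the open question of this thread, so the sketch only shows the intended decision table.

```cpp
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

enum class backup_action { store_link, store_contents, ignore };

// Decision table from the example: links are stored as links, regular files
// as contents, everything else (directories, devices, fifos, errors) is ignored.
inline backup_action action_for(const fs::file_status& st)
{
    switch (st.type()) {
    case fs::file_type::symlink: return backup_action::store_link;
    case fs::file_type::regular: return backup_action::store_contents;
    default:                     return backup_action::ignore;
    }
}

// Plan a backup of one directory (non-recursive), keyed by file name.
// symlink_status() is the lstat()-like query, so links are not followed.
inline std::map<std::string, backup_action> plan_backup(const fs::path& dir)
{
    std::map<std::string, backup_action> plan;
    for (const fs::directory_entry& e : fs::directory_iterator(dir))
        plan[e.path().filename().string()] = action_for(e.symlink_status());
    return plan;
}
```

With this shape, a dedup'd file gets backed up correctly only if the library reports it as a regular file, which is the behaviour being asked for.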
I am watching this thread closely. But I'm traveling until next week so won't comment on technical issues until then. --Beman
Hi Beman,
Have you had time to consider the way forward for boost::filesystem?
cheers,
Paul
On 28 July 2015 at 21:59, Beman Dawes
I am watching this thread closely. But I'm traveling until next week so won't comment on technical issues until then.
--Beman
On 28 Jul 2015 at 20:40, Paul Harris wrote:
I _disagree_ with the way dedup'd files are currently treated as a special file (as if they were a device or a character file or a fifo or a socket). device/socket/fifos all need to be read in a special way, but dedup'd files should be read as if they were a plain file.
I _disagree_ that a dedup file should be treated as if it were a symlink. This is because a dedup file does not point to another file (or inode) on the file system, which is a characteristic of a symlink or a hardlink. It is basically just a compressed file. We don't treat NTFS-compressed files differently from regular files, why are we treating dedup'd files differently?
NTFS compressed files act exactly like normal files. Reparse point files do not and require significant additional processing to figure out what kind they are. That's the difference.
From AFIO's perspective, when it does NtQueryDirectoryFile() to fetch metadata about a file entry, it can zero cost learn if an entry is a reparse point by examining FileAttributes for the FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of reparse point file it is without opening the file and asking.
Windows' CreateFile() API is astonishingly slow. To require calling that, then an additional NtQueryDirectoryFile() to fetch the FILE_REPARSE_POINT_INFORMATION metadata and close the handle - which is the fastest way I know of to fetch the reparse point tag code - would impose an enormous performance penalty for all file entries marked with FILE_ATTRIBUTE_REPARSE_POINT.
I appreciate you're saying the cost is worth it, but we're thinking all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
for (directory_iterator ...) { if (is_symlink(fn)) backup_link(fn); if (is_regular_file(fn)) backup_contents(fn); if (is_directory(fn)) ignore(fn); if (is_other(fn)) ignore(fn); }
Currently, this pseudo code would fail to backup any automatic dedup'd files (which are basically any file older than 3 days on some of my sites). It fails because a dedup'd file is currently an "other".
If you treat a dedup'd file as a symlink, only the "link" will be backed up. This link points to a magical place that is impossible to read other than simply reading "fn".
So how does this simple program backup the dedup'd file contents?
I appreciate the problem with saying something is a symlink, but trying to retrieve the target of that symlink has to error out because it's meaningless in the case of a dedup symlink.
What seems to me the best route forward is you do something like this:
    if (is_symlink(fn))
    {
        error_code ec;
        auto target=read_symlink(fn, ec);
        if(!ec) backup_link(fn);
    }
Because is_regular_file() and is_directory() use status(), they follow any symlink so you can safely fall through to those.
Is this acceptable to you? If so, I'll update AFIO accordingly to match these new semantics and add a note to the docs. I'm sure Beman will consider something similar when he gets to be less busy.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
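A self-contained variant of the "try read_symlink(), fall through on error" idea above might look like this. It is a sketch only, assuming C++17 std::filesystem in place of Boost.Filesystem; readable_link_target is a hypothetical helper, and whether read_symlink() really fails for a dedup reparse point depends on the library version in use.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the link target if p is a symlink whose target
// can be read; otherwise an empty optional, so the caller can fall through to
// the is_regular_file() / is_directory() checks.
inline std::optional<fs::path> readable_link_target(const fs::path& p)
{
    std::error_code ec;
    const fs::file_status st = fs::symlink_status(p, ec);  // lstat()-like, does not follow
    if (ec || !fs::is_symlink(st)) return std::nullopt;
    fs::path target = fs::read_symlink(p, ec);             // non-throwing overload
    if (ec) return std::nullopt;                           // "symlink" with no readable target
    return target;
}
```

A caller would back up the link when a value comes back and otherwise treat the entry by whatever status() says about it.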
On 29 July 2015 at 10:06, Niall Douglas
On 28 Jul 2015 at 20:40, Paul Harris wrote:
I _disagree_ with the way dedup'd files are currently treated as a special file (as if they were a device or a character file or a fifo or a socket). device/socket/fifos all need to be read in a special way, but dedup'd files should be read as if they were a plain file.
I _disagree_ that a dedup file should be treated as if it were a symlink. This is because a dedup file does not point to another file (or inode) on the file system, which is a characteristic of a symlink or a hardlink. It is basically just a compressed file. We don't treat NTFS-compressed files differently from regular files, why are we treating dedup'd files differently?
NTFS compressed files act exactly like normal files. Reparse point files do not and require significant additional processing to figure out what kind they are. That's the difference.
You only need to process symlink-reparse-point-files. Dedup reparse point files can be treated the same as a normal file.
From AFIO's perspective, when it does NtQueryDirectoryFile() to fetch metadata about a file entry, it can zero cost learn if an entry is a reparse point by examining FileAttributes for the FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of reparse point file it is without opening the file and asking.
Windows' CreateFile() API is astonishingly slow. To require calling that, then an additional NtQueryDirectoryFile() to fetch the FILE_REPARSE_POINT_INFORMATION metadata and close the handle - which is the fastest way I know of to fetch the reparse point tag code - would impose an enormous performance penalty for all file entries marked with FILE_ATTRIBUTE_REPARSE_POINT.
I have no comment on performance. I want things to work.
I appreciate you're saying the cost is worth it, but we're thinking all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
You don't seem to understand that this affects ANY Windows client that talks to a Windows 2012 dedup-enabled server. Which, as of last month, has gone from zero to 5 different companies in my world. Seems that all the IT departments are upgrading after the end-of-financial-year. So, a Windows 7 user will be accessing dedup files.
for (directory_iterator ...) { if (is_symlink(fn)) backup_link(fn); if (is_regular_file(fn)) backup_contents(fn); if (is_directory(fn)) ignore(fn); if (is_other(fn)) ignore(fn); }
Currently, this pseudo code would fail to backup any automatic dedup'd files (which are basically any file older than 3 days on some of my sites). It fails because a dedup'd file is currently an "other".
If you treat a dedup'd file as a symlink, only the "link" will be backed up. This link points to a magical place that is impossible to read other than simply reading "fn".
So how does this simple program backup the dedup'd file contents?
I appreciate the problem with saying something is a symlink, but trying to retrieve the target of that symlink has to error out because it's meaningless in the case of a dedup symlink.
Please stop calling it "dedup symlink". It is _not_ any kind of symlink. That is the point of misunderstanding, we are not on the same page.
What seems to me the best route forward is you do something like this:
if (is_symlink(fn)) { error_code ec; auto target=read_symlink(fn, ec); if(!ec) backup_link(fn); }
Because is_regular_file() and is_directory() use status(), they follow any symlink so you can safely fall through to those.
This is unacceptable, because I do not want to follow symlinks. That was specified in the example.
Let's be more specific about the example directory to backup.
On Monday, it contains:
  FILE_A (a plain file)
  FILE_B (a symlink to FILE_A)
  FILE_C (a plain copy of FILE_A)
Backup should store this: FILE_A contents. FILE_B link. FILE_C contents.
On Tuesday, dedup/archival has run on the server. Directory now contains:
  FILE_A (a dedup file)
  FILE_B (a symlink to FILE_A)
  FILE_C (a dedup file)
Backup SHOULD store this: FILE_A contents. FILE_B link. FILE_C contents.
IF you treat dedup=symlink, then the example will instead store: FILE_A link. FILE_B link. FILE_C link. (although I have no idea what "FILE_A link" will actually read)
If you follow symlinks, then backup stores the wrong thing: FILE_A contents. FILE_B contents (WRONG). FILE_C contents.
If you treat dedup files as regular files, then backup stores correctly: FILE_A contents. FILE_B link. FILE_C contents.
cheers,
Paul
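The Monday/Tuesday comparison above can be checked mechanically with a toy model. Nothing here touches a real filesystem; entry_kind, policy, stored and backup are names invented for this sketch, and the three policies simply encode the three treatments compared in the example.

```cpp
#include <string>
#include <vector>

enum class entry_kind { plain, symlink, dedup };
enum class policy { dedup_as_symlink, follow_symlinks, dedup_as_regular };

struct entry {
    std::string name;
    entry_kind kind;
};

// What a single-directory backup stores for one entry under a given policy.
inline std::string stored(const entry& e, policy p)
{
    if (p == policy::follow_symlinks)
        return e.name + " contents";  // every link is dereferenced
    const bool link_like =
        e.kind == entry_kind::symlink ||
        (e.kind == entry_kind::dedup && p == policy::dedup_as_symlink);
    return e.name + (link_like ? " link" : " contents");
}

// Applies the policy to every entry of the directory, in order.
inline std::vector<std::string> backup(const std::vector<entry>& dir, policy p)
{
    std::vector<std::string> out;
    for (const entry& e : dir) out.push_back(stored(e, p));
    return out;
}
```

Feeding the Tuesday directory through the three policies reproduces the three outcomes listed above; only dedup_as_regular matches what Monday's backup stored.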
On 29 Jul 2015 at 12:27, Paul Harris wrote:
I appreciate the problem with saying something is a symlink, but trying to retrieve the target of that symlink has to error out because it's meaningless in the case of a dedup symlink.
Please stop calling it "dedup symlink". It is _not_ any kind of symlink. That is the point of misunderstanding, we are not on the same page.
It *is* a kind of symlink. Deduped files on NTFS are kept as a chain of compressed fragments. When you open the file handle, all that has to be decompressed and rechained back together into a temporary inode. This is why deduped files are so publicly marked because they are much more expensive to open than regular files. I suspect that's why CIFS exports the flag instead of actually treating the file as a proper regular file because you want client programs to know this isn't a regular file.
Anyway, thanks to Gavin I have a solution for AFIO which is optimal, so I'll commit that shortly - these deduped files are going to get a special flag, not least because handle::path() is going to return something weird for the open file handle.
Beman has a trickier problem on his hands - he can either add a special type of flag for these files and then the OP's code falls through to is_regular_file and he's happy. Or he can filter out the symlink flag when the reparse tag is a dedup, and always return a regular file instead. I don't know which is better for Filesystem.
Thanks to Gavin for spotting that the reserved field in WIN32_FIND_DATA is officially the reparse tag type!
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 29/07/2015 14:06, Niall Douglas wrote:
NTFS compressed files act exactly like normal files. Reparse point files do not and require significant additional processing to figure out what kind they are. That's the difference.
From AFIO's perspective, when it does NtQueryDirectoryFile() to fetch metadata about a file entry, it can zero cost learn if an entry is a reparse point by examining FileAttributes for the FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of reparse point file it is without opening the file and asking.
Windows' CreateFile() API is astonishingly slow. To require calling that, then an additional NtQueryDirectoryFile() to fetch the FILE_REPARSE_POINT_INFORMATION metadata and close the handle - which is the fastest way I know of to fetch the reparse point tag code - would impose an enormous performance penalty for all file entries marked with FILE_ATTRIBUTE_REPARSE_POINT.
If it helps, https://msdn.microsoft.com/en-us/library/windows/desktop/aa365511.aspx seems to specify that reparse points provide their tag id in the dwReserved0 field of the WIN32_FIND_DATA structure (I'm not sure how that maps to the native API, but I assume it's somewhere). That should be sufficient to identify the reparse point type. (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365740.aspx backs this up, incidentally.)
Granted, a single NtQueryDirectoryFile on the whole directory is not enough to get both sets of data, but you should still be able to do it in just two calls per directory (times however many calls are required to fully enumerate the directory, of course).
Presumably you're currently using one of FileBothDirectoryInformation or FileFullDirectoryInformation. You should be able to switch to the "Id" variants (FileIdBothDirectoryInformation or FileIdFullDirectoryInformation) instead (if you're not already using them). This gives you a FileId for each file, along with the other information.
After you've enumerated the entire directory, you can go back and get FileReparsePointInformation for the whole directory, and then match up the FileId against the FileReference to merge the data and get the reparse tag for each file.
(I haven't tested this, so I'm not sure if it gives you an empty tag for files that aren't reparse points, or only lists reparse points. The latter would be nice, as it would be close to zero overhead for directories that do not contain reparse points.)
Presumably Win32 FindFirstFile is doing something like this internally, since it does provide the reparse tag.
I'm not sure if it's current, but http://blogs.technet.com/b/filecab/archive/2013/02/14/dfsr-reparse-point-sup... seems to suggest the following behaviour as reasonable:
- treating IO_REPARSE_TAG_MOUNT_POINT as directory symlinks
- treating IO_REPARSE_TAG_SYMLINK as symlinks
- treating IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, and IO_REPARSE_TAG_HSM as regular files
- treating any other tag as something to be ignored (in most cases)
There was also a note that you can use IsReparseTagNameSurrogate to determine if a given reparse point tag is a surrogate (some kind of link) or not (treat like regular file). That might be the best option, if it's consistent -- and at least for the official MS tags it seems to be; MOUNT_POINT and SYMLINK are surrogates and the other types are not.
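The tag-to-treatment mapping listed above could be sketched as below. The tag values and the 0x20000000 name-surrogate bit are copied from the Windows SDK definitions (IO_REPARSE_TAG_* and IsReparseTagNameSurrogate in winnt.h) and redefined locally so the sketch builds on any platform; treat_as and classify_tag are hypothetical names, not part of Boost.Filesystem or AFIO.

```cpp
#include <cstdint>

// Values of the corresponding IO_REPARSE_TAG_* constants, redefined locally.
constexpr std::uint32_t tag_mount_point = 0xA0000003u;  // IO_REPARSE_TAG_MOUNT_POINT
constexpr std::uint32_t tag_hsm         = 0xC0000004u;  // IO_REPARSE_TAG_HSM
constexpr std::uint32_t tag_sis         = 0x80000007u;  // IO_REPARSE_TAG_SIS
constexpr std::uint32_t tag_symlink     = 0xA000000Cu;  // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t tag_dedup       = 0x80000013u;  // IO_REPARSE_TAG_DEDUP

enum class treat_as { directory_symlink, symlink, regular_file, ignore };

// Same test as the IsReparseTagNameSurrogate macro: bit 29 of the tag.
inline bool is_name_surrogate(std::uint32_t tag) { return (tag & 0x20000000u) != 0; }

// Explicit mapping following the list above.
inline treat_as classify_tag(std::uint32_t tag)
{
    switch (tag) {
    case tag_mount_point: return treat_as::directory_symlink;
    case tag_symlink:     return treat_as::symlink;
    case tag_dedup:
    case tag_sis:
    case tag_hsm:         return treat_as::regular_file;
    default:              return treat_as::ignore;
    }
}
```

For these five tags the explicit table and the surrogate bit agree: only MOUNT_POINT and SYMLINK have the bit set.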
I appreciate you're saying the cost is worth it, but we're thinking all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
I'm not on Server 2012, but this thread caught my attention because I remember encountering a bug that prevented all WinXP clients from accessing deduped files on CIFS shares provided by Server 2012. I think in the end this was a server-side bug related to McAfee and the different protocols used by WinXP vs. Win7, and so clients shouldn't normally be able to see whether files are deduped or not remotely, but I haven't explicitly verified that. If CIFS shares do expose files as dedup reparse points instead of concealing that then it might affect quite a lot of users.
On 29 Jul 2015 at 18:09, Gavin Lambert wrote:
On 29/07/2015 14:06, Niall Douglas wrote:
NTFS compressed files act exactly like normal files. Reparse point files do not and require significant additional processing to figure out what kind they are. That's the difference.
From AFIO's perspective, when it does NtQueryDirectoryFile() to fetch metadata about a file entry, it can zero cost learn if an entry is a reparse point by examining FileAttributes for the FILE_ATTRIBUTE_REPARSE_POINT flag. It cannot tell what kind of reparse point file it is without opening the file and asking.
Windows' CreateFile() API is astonishingly slow. To require calling that, then an additional NtQueryDirectoryFile() to fetch the FILE_REPARSE_POINT_INFORMATION metadata and close the handle - which is the fastest way I know of to fetch the reparse point tag code - would impose an enormous performance penalty for all file entries marked with FILE_ATTRIBUTE_REPARSE_POINT.
If it helps, https://msdn.microsoft.com/en-us/library/windows/desktop/aa365511.aspx seems to specify that reparse points provide their tag id in the dwReserved0 field of the WIN32_FIND_DATA structure (I'm not sure how that maps to the native API, but I assume it's somewhere). That should be sufficient to identify the reparse point type.
That does help greatly in fact. I know FindXXXFile doesn't open each file, so somehow or other the Win32 layer is able to fetch the reparse tag type for directory entries purely from the directory handle.
Granted, a single NtQueryDirectoryFile on the whole directory is not enough to get both sets of data, but you should still be able to do it in just two calls per directory (times however many calls are required to fully enumerate the directory, of course).
Presumably you're currently using one of FileBothDirectoryInformation or FileFullDirectoryInformation. You should be able to switch to the "Id" variants (FileIdBothDirectoryInformation or FileIdFullDirectoryInformation) instead (if you're not already using them). This gives you a FileId for each file, along with the other information.
After you've enumerated the entire directory, you can go back and get FileReparsePointInformation for the whole directory, and then match up the FileId against the FileReference to merge the data and get the reparse tag for each file.
(I haven't tested this, so I'm not sure if it gives you an empty tag for files that aren't reparse points, or only lists reparse points. The latter would be nice, as it would be close to zero overhead for directories that do not contain reparse points.)
Unfortunately getting FileReparsePointInformation returns just a single record which is the reparse point for the directory handle being enumerated. It doesn't return reparse tags for directory contents.
There is an index of all reparse points on an NTFS volume in a magic NTFS file stream, but that's NTFS specific code, and it requires a file handle to be opened.
I'm thinking that as reparse points are really just an overload on EA, maybe the returned EaSize field is magically set to the reparse tag when attributes specify it's a reparse point file? I'd have to experiment to find out. I can't see any other obvious field which would return the reparse tag.
EDIT: What a guess I just made!: https://www.osronline.com/showthread.cfm?link=171655. Thanks Gavin, you just solved the problem for AFIO at least.
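The EaSize observation above amounts to a very small rule. The sketch below assumes, as the linked OSR thread reports, that EaSize carries the reparse tag whenever FILE_ATTRIBUTE_REPARSE_POINT is set; dir_record is a hypothetical, trimmed-down stand-in for a directory enumeration record (the real FILE_*_DIR_INFORMATION structures have many more fields), and this is not AFIO's actual code.

```cpp
#include <cstdint>
#include <optional>

constexpr std::uint32_t attr_reparse_point = 0x00000400u;  // FILE_ATTRIBUTE_REPARSE_POINT

// Hypothetical stand-in holding only the two fields the rule needs.
struct dir_record {
    std::uint32_t file_attributes;  // FileAttributes
    std::uint32_t ea_size;          // EaSize
};

// Assumption: for reparse point entries EaSize is the reparse tag; for all
// other entries it is a real extended-attribute size and no tag exists.
inline std::optional<std::uint32_t> reparse_tag_of(const dir_record& r)
{
    if ((r.file_attributes & attr_reparse_point) == 0) return std::nullopt;
    return r.ea_size;
}
```

The point of the trick is that the tag arrives with the directory enumeration itself, so no per-file handle has to be opened.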
I appreciate you're saying the cost is worth it, but we're thinking all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
I'm not on Server 2012, but this thread caught my attention because I remember encountering a bug that prevented all WinXP clients from accessing deduped files on CIFS shares provided by Server 2012. I think in the end this was a server-side bug related to McAfee and the different protocols used by WinXP vs. Win7, and so clients shouldn't normally be able to see whether files are deduped or not remotely, but I haven't explicitly verified that. If CIFS shares do expose files as dedup reparse points instead of concealing that then it might affect quite a lot of users.
I had understood from the OP that CIFS is exporting the reparse point tag to clients, hence the breakage.
The reason, I suspect, that CIFS is being so braindead here is that opening a deduped file is more expensive than usual and clients ought to know. Which is exactly why I am opposed to treating these things as a regular file.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 29/07/2015 22:59, Niall Douglas quoth:
Unfortunately getting FileReparsePointInformation returns just a single record which is the reparse point for the directory handle being enumerated. It doesn't return reparse tags for directory contents.
Ah, true. I missed that part. Seems kinda annoying they made that different from all the other information classes.
I'm thinking that as reparse points are really just an overload on EA, maybe the returned EaSize field is magically set to the reparse tag when attributes specify it's a reparse point file? I'd have to experiment to find out. I can't see any other obvious field which would return the reparse tag.
EDIT: What a guess I just made!: https://www.osronline.com/showthread.cfm?link=171655. Thanks Gavin, you just solved the problem for AFIO at least.
Yep, it appears so. That makes life easier.
You should probably make it generic via IsReparseTagNameSurrogate as I mentioned earlier rather than checking for the symlink/dedup tags specifically. So:
1. entries with FILE_ATTRIBUTE_DIRECTORY are directories.
2. entries with FILE_ATTRIBUTE_REPARSE_POINT *and* a tag with IsReparseTagNameSurrogate == true are symlinks. (And possibly also directories, via #1.)
3. entries with FILE_ATTRIBUTE_REPARSE_POINT *and* a tag with IsReparseTagNameSurrogate == false are regular files that are possibly slow to open.
4. entries with FILE_ATTRIBUTE_COMPRESSED are regular files that are possibly slow to open.
5. entries with FILE_ATTRIBUTE_OFFLINE are regular files that are probably not openable (or *very* slow to open).
6. entries lacking those attributes are regular files.
Sound about right?
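The six rules above translate fairly directly into code. This is a sketch only: the attribute values and the name-surrogate bit are the Windows SDK values redefined locally so it compiles anywhere, and the classification struct and classify function are hypothetical names rather than anything AFIO or Boost.Filesystem provides.

```cpp
#include <cstdint>

constexpr std::uint32_t attr_directory     = 0x00000010u;  // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t attr_reparse_point = 0x00000400u;  // FILE_ATTRIBUTE_REPARSE_POINT
constexpr std::uint32_t attr_compressed    = 0x00000800u;  // FILE_ATTRIBUTE_COMPRESSED
constexpr std::uint32_t attr_offline       = 0x00001000u;  // FILE_ATTRIBUTE_OFFLINE

// Hypothetical result: an entry with neither flag among the first two set is a regular file.
struct classification {
    bool is_directory;        // rule 1
    bool is_symlink;          // rule 2 (may combine with is_directory)
    bool maybe_slow_to_open;  // rules 3 and 4
    bool maybe_unopenable;    // rule 5
};

// Same test as the IsReparseTagNameSurrogate macro: bit 29 of the tag.
inline bool is_name_surrogate(std::uint32_t tag) { return (tag & 0x20000000u) != 0; }

// reparse_tag is only meaningful when the reparse point attribute is set.
inline classification classify(std::uint32_t attributes, std::uint32_t reparse_tag)
{
    const bool reparse = (attributes & attr_reparse_point) != 0;
    const bool surrogate = reparse && is_name_surrogate(reparse_tag);
    classification c{};
    c.is_directory = (attributes & attr_directory) != 0;
    c.is_symlink = surrogate;
    c.maybe_slow_to_open = (reparse && !surrogate) || (attributes & attr_compressed) != 0;
    c.maybe_unopenable = (attributes & attr_offline) != 0;
    return c;
}
```

Under these rules a dedup entry comes out as a plain, possibly slow, regular file, while a junction comes out as both a directory and a symlink.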
The reason, I suspect, that CIFS is being so braindead here is that opening a deduped file is more expensive than usual and clients ought to know. Which is exactly why I am opposed to treating these things as a regular file.
I think they probably should be treated the same as files with FILE_ATTRIBUTE_COMPRESSED, since essentially it's just a different compression scheme. I don't know whether you currently distinguish these from regular files or not.
On 30 Jul 2015 at 11:17, Gavin Lambert wrote:
You should probably make it generic via IsReparseTagNameSurrogate as I mentioned earlier rather than checking for the symlink/dedup tags specifically.
Actually no - AFIO can only read the target for reparse points it knows about.
I committed a fix for this earlier today. It reports reparse points with tag IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as symlinks. Everything else is reported normally.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 1/08/2015 10:39, Niall Douglas wrote:
On 30 Jul 2015 at 11:17, Gavin Lambert wrote:
You should probably make it generic via IsReparseTagNameSurrogate as I mentioned earlier rather than checking for the symlink/dedup tags specifically.
Actually no - AFIO can only read the target for reparse points it knows about.
I committed a fix for this earlier today. It reports reparse points with tag IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as symlinks. Everything else is reported normally.
Granted I'm not familiar with AFIO's APIs, but wouldn't it make the most sense to report other name surrogates as symlinks as well (via an "is this a symlink" or "get file type" method), but then if queried for the target of an unknown symlink type it will return/throw a "not supported" error?
On 3 Aug 2015 at 11:28, Gavin Lambert wrote:
Actually no - AFIO can only read the target for reparse points it knows about.
I committed a fix for this earlier today. It reports reparse points with tag IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as symlinks. Everything else is reported normally.
Granted I'm not familiar with AFIO's APIs,
There is a single page "cheat sheet" at https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html.
but wouldn't it make the most sense to report other name surrogates as symlinks as well (via an "is this a symlink" or "get file type" method), but then if queried for the target of an unknown symlink type it will return/throw a "not supported" error?
I am not averse to adding a "st_reparse_point" field to stat_t. This would let client code do its own detection on Windows. Does this work for you?
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 3/08/2015 13:43, Niall Douglas wrote:
On 3 Aug 2015 at 11:28, Gavin Lambert wrote:
Actually no - AFIO can only read the target for reparse points it knows about.
I committed a fix for this earlier today. It reports reparse points with tag IO_REPARSE_TAG_MOUNT_POINT or IO_REPARSE_TAG_SYMLINK as symlinks. Everything else is reported normally.
Granted I'm not familiar with AFIO's APIs,
There is a single page "cheat sheet" at https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html.
It would be nice if this included hyperlinks for the local types. I have no idea what a directory_entry looks like. (And even after manually navigating around the docs until I found https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir..., I still have no idea what those fields actually *mean*. Only because you mentioned it below did I also find https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/structs/sta..., which is more descriptive. Although I later went back and noticed I overlooked fetch_lstat on directory_entry. Another case where hyperlinks would have been nice.)
but wouldn't it make the most sense to report other name surrogates as symlinks as well (via an "is this a symlink" or "get file type" method), but then if queried for the target of an unknown symlink type it will return/throw a "not supported" error?
Using the above vocabulary, it seems to me that:
- enumerate() / lstat() should be able to report all name surrogates as symlinks, however that is currently done (presumably via st_type == symlink_file). Other reparse types should be reported as regular files/directories.
- symlink() should be able to open unknown symlinks (since that's just a flag to CreateFile).
- rmsymlink() should be able to delete unknown symlinks.
- target() should work for the known symlink types and fail "not supported" (or similar) for the other name surrogate types, and fail "invalid operation" (or similar) for any non-reparse file or non-name-surrogate type.
Does that sound reasonable?
I suppose another variant on this would be to report known-type symlinks as st_type == symlink_file, unknown-type name surrogates as st_type == type_unknown, and any other reparse point as st_type == regular_file/directory_file. This would have the advantage of hinting whether target() is likely to work, but the disadvantage of being a bit misleading.
(On a peripherally related note, it seems odd that Boost.Filesystem's file_type appears to lack a way to express "a symlink to a directory", which should be opened as a directory instead of as a file. Is this a POSIX limitation, that you're required to inspect the target to determine whether it's a file or directory? I know that Windows provides this up-front, both for junctions and for actual symlinks, which in turn means that if you do want to follow directory symlinks then you can just open them as regular directories without fanfare. Of course, that's also partly why symlinks are discouraged on Windows, because naive enumeration code will follow them by default and hilarity can ensue.)
I am not averse to adding a "st_reparse_point" field to stat_t. This would let client code do its own detection on Windows. Does this work for you?
I don't personally have a use case, so I can't really answer the last question. As I said I'm coming at this thread from a design standpoint rather than a practical one. (And the original focus of the thread was on Filesystem rather than AFIO.) Having said that, more information never hurts; but I think this should be in addition to the behaviour described above, not instead of it.
On 3 Aug 2015 at 16:38, Gavin Lambert wrote:
There is a single page "cheat sheet" at https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html.
It would be nice if this included hyperlinks for the local types. I have no idea what a directory_entry looks like.
Fixed. Each operation on the cheat sheet now lists what types are related to it.
(And even after manually navigating around the docs until I found https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir..., I still have no idea what those fields actually *mean*. Only because you mentioned it below did I also find https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/structs/sta..., which is more descriptive. Although I later went back and noticed I overlooked fetch_lstat on directory_entry. Another case where hyperlinks would have been nice.)
Fixed. Each reference page now links to related types too.
but wouldn't it make the most sense to report other name surrogates as symlinks as well (via an "is this a symlink" or "get file type" method), but then if queried for the target of an unknown symlink type it will return/throw a "not supported" error?
Using the above vocabulary, it seems to me that:
- enumerate() / lstat() should be able to report all name surrogates as symlinks, however that is currently done (presumably via st_type == symlink_file). Other reparse types should be reported as regular files/directories.
I would prefer to not report something as a symlink when target() won't work with it. So you now have an additional stat_t flag called st_reparse_point which is always the FILE_ATTRIBUTE_REPARSE_POINT flag.
- symlink() should be able to open unknown symlinks (since that's just a flag to CreateFile).
This works.
- rmsymlink() should be able to delete unknown symlinks.
This works.
- target() should work for the known symlink types and fail "not supported" (or similar) for the other name surrogate types, and fail "invalid operation" (or similar) for any non-reparse file or non-name-surrogate type.
This works. Unknown symlink types return an EINVAL error as per POSIX.
Does that sound reasonable?
Yes :)
I suppose another variant on this would be to report known-type symlinks as st_type == symlink_file, unknown-type name surrogates as st_type == type_unknown, and any other reparse point as st_type == regular_file/directory_file. This would have the advantage of hinting whether target() is likely to work, but the disadvantage of being a bit misleading.
(On a peripherally related note, it seems odd that Boost.Filesystem's file_type appears to lack a way to express "a symlink to a directory", which should be opened as a directory instead of as a file. Is this a POSIX limitation, that you're required to inspect the target to determine whether it's a file or directory? I know that Windows provides this up-front, both for junctions and for actual symlinks, which in turn means that if you do want to follow directory symlinks then you can just open them as regular directories without fanfare. Of course, that's also partly why symlinks are discouraged on Windows, because naive enumeration code will follow them by default and hilarity can ensue.)
Filesystem is trapped by POSIX, however, and POSIX treats symlinks as a special thing unto themselves. AFIO is a bit caught here too actually. If you're enumerating a directory you have no easy way of disambiguating between a symlink to a directory and a symlink to a file. You basically have to try opening it as a directory, and if it errors out you then open it as a file. Windows does supply what kind of symlink it is without additional syscalls, but POSIX does not. You'd have to do two syscalls per entry to disambiguate, which is very costly for something so niche.
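The two-query disambiguation described above can be sketched as follows. This is only an illustration under assumptions: it uses C++17 std::filesystem as a stand-in for Boost.Filesystem/AFIO, and the names link_target and classify_link are hypothetical, not part of either library.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for this sketch only.
enum class link_target { not_a_symlink, directory, file, other_or_dangling };

// Query 1 (lstat-like): inspect the entry itself without following it.
// Query 2 (stat-like): follow the link to learn what it points at.
inline link_target classify_link(const fs::path& p) {
    std::error_code ec;
    const fs::file_status self = fs::symlink_status(p, ec);
    if (ec || !fs::is_symlink(self)) {
        return link_target::not_a_symlink;
    }
    const fs::file_status target = fs::status(p, ec);
    if (fs::is_directory(target)) {
        return link_target::directory;
    }
    if (fs::is_regular_file(target)) {
        return link_target::file;
    }
    // Dangling links, devices, sockets and so on all end up here.
    return link_target::other_or_dangling;
}
```

The second query is the extra per-entry cost being discussed; on Windows the equivalent information arrives with the directory enumeration itself.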
I am not averse to adding a "st_reparse_point" field to stat_t. This would let client code do its own detection on Windows. Does this work for you?
I don't personally have a use case, so I can't really answer the last question. As I said I'm coming at this thread from a design standpoint rather than a practical one. (And the original focus of the thread was on Filesystem rather than AFIO.)
Having said that, more information never hurts; but I think this should be in addition to the behaviour described above, not instead of it.
Well you've got a st_reparse_point field now, so you can detect reparse points which aren't those understood by AFIO and special case them if you so desire. The key aim for AFIO is as consistent a POSIX filesystem semantics as is possible portably. As mentioned in earlier threads, any real world use of async file i/o is going to need #ifdef for platforms anyway as filing systems are so different, but where I can eliminate that I will. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
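As a rough sketch of the client-side detection mentioned above: the struct below is a hypothetical stand-in with only the two fields named in this thread (st_type and st_reparse_point). AFIO's real stat_t has many more members and its exact types may differ, so treat every name here as an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins; not AFIO's actual definitions.
enum class entry_type { regular_file, directory_file, symlink_file };

struct entry_info {
    entry_type st_type;     // what the library reported the entry as
    bool st_reparse_point;  // mirrors FILE_ATTRIBUTE_REPARSE_POINT
};

// True for reparse points that were NOT reported as symlinks, for example
// dedup stubs, archived files, or name surrogates of an unrecognised type.
inline bool is_unrecognised_reparse_point(const entry_info& e) {
    return e.st_reparse_point && e.st_type != entry_type::symlink_file;
}
```

Client code could use such a predicate to special-case those entries while still treating them as ordinary files or directories everywhere else.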
On 4/08/2015 04:31, Niall Douglas wrote:
On 3 Aug 2015 at 16:38, Gavin Lambert wrote:
There is a single page "cheat sheet" at https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html.
It would be nice if this included hyperlinks for the local types. I have no idea what a directory_entry looks like.
Fixed. Each operation on the cheat sheet now lists what types are related to it.
Nice, thanks.
(And even after manually navigating around the docs until I found https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir..., I still have no idea what those fields actually *mean*. Only because you mentioned it below did I also find https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/structs/sta..., which is more descriptive. Although I later went back and noticed I overlooked fetch_lstat on directory_entry. Another case where hyperlinks would have been nice.)
Fixed. Each reference page now links to related types too.
They still seem to be missing on the function reference pages (I checked https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/e...), which is probably where they'd be the most useful. I'm used to the style of the Boost.Asio docs (eg. http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/reference/async_rea...), where all the types are linked directly in the method description. (Of course, mostly they're templates, but still...)
I would prefer to not report something as a symlink when target() won't work with it. So you now have an additional stat_t flag called st_reparse_point which is always the FILE_ATTRIBUTE_REPARSE_POINT flag.
I guess that depends on usage cases -- if it's most common to write code like if (type() == symlink_file) { do something with target(); } then you have a point. Although code that has sufficient error checking should be able to cope with the idea of a symlink that has an unreadable target. But it seems odd to me to claim that a file is *not* a symlink just because you're told that it's a type of symlink that you don't know how to read. Having said that, I don't know how common custom symlinks are in the wild, or if they even exist at all.
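For illustration, code with "sufficient error checking" in the sense above might look like this. It is a sketch using C++17 std::filesystem rather than AFIO's target(), and try_read_target is a hypothetical helper name.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Returns the stored link target when the entry is a symlink whose target
// can be read; returns an empty optional in every other case instead of
// throwing, so callers must handle "symlink, but unreadable" explicitly.
inline std::optional<fs::path> try_read_target(const fs::path& p) {
    std::error_code ec;
    const fs::file_status st = fs::symlink_status(p, ec);
    if (ec || !fs::is_symlink(st)) {
        return std::nullopt;
    }
    fs::path target = fs::read_symlink(p, ec);
    if (ec) {
        return std::nullopt;  // a link whose target this code cannot read
    }
    return target;
}
```

A caller written this way keeps working whether or not the library reports unknown link types as symlinks.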
AFIO is a bit caught here too actually. If you're enumerating a directory you have no easy way of disambiguating between a symlink to a directory and a symlink to a file. You basically have to try opening it as a directory, and if it errors out you then open it as a file.
Windows does supply what kind of symlink it is without additional syscalls, but POSIX does not. You'd have to do two syscalls per entry to disambiguate which is very costly for something so niche.
Perhaps rather than just having symlink_file, Filesystem should have symlink_file, symlink_directory, and symlink_entry? POSIX would return the latter (indicating that it's unknown whether it's a file or directory) while Windows would return one of the first two. This would still allow code to be written in a reasonably platform-independent manner.
Another option might be for stat_t to have a field that contains the OS-native flags, so that on Windows the DIRECTORY flag could be examined directly. This might also allow for other esoteric attributes (COMPRESSED, ENCRYPTED, NOT_CONTENT_INDEXED, etc) to be inspected/set as desired, although that's probably more useful in Filesystem rather than AFIO. Although I think this is uglier than the above for the enumeration case.
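A sketch of how the three-way split suggested above could behave. The enumerators and both functions are hypothetical (neither Boost.Filesystem nor AFIO defines them); the attribute bit values are those of FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_REPARSE_POINT from the Windows SDK, redeclared locally so the snippet builds anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical extension of file_type, for illustration only.
enum class link_type { symlink_file, symlink_directory, symlink_entry };

constexpr std::uint32_t attr_directory     = 0x00000010u;  // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t attr_reparse_point = 0x00000400u;  // FILE_ATTRIBUTE_REPARSE_POINT

// Windows-style: enumeration already supplies the attributes, so the
// file/directory distinction is known without an extra call.
inline link_type from_windows_attributes(std::uint32_t attrs) {
    assert((attrs & attr_reparse_point) != 0);  // caller has established it is a link
    return (attrs & attr_directory) != 0 ? link_type::symlink_directory
                                         : link_type::symlink_file;
}

// POSIX-style: lstat() only says "symlink", so the kind stays unknown
// until the caller chooses to pay for following the link.
inline link_type from_posix_lstat() {
    return link_type::symlink_entry;
}
```

Portable code would then treat symlink_entry as "unknown, resolve if needed" and take the cheaper path when one of the other two values is reported.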
On 4 Aug 2015 at 12:40, Gavin Lambert wrote:
Fixed. Each reference page now links to related types too.
They still seem to be missing on the function reference pages (I checked https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/e...), which is probably where they'd be the most useful.
It's actually there, it's just the docs generation tooling has collapsed the paragraphs into a single large paragraph. I'll look into a workaround after I've ported AFIO onto the new APIBind based multi-abi implementation of Boost.Monad.
I'm used to the style of the Boost.Asio docs (eg. http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/reference/async_rea...), where all the types are linked directly in the method description. (Of course, mostly they're templates, but still...)
I don't think the Boost.Geometry doxygen to qbk tool AFIO uses can do this. For Boost.Monad I'm sticking to a pure doxygen solution. I've wasted a lot of blood and sweat for little gain on AFIO's BoostBook documentation, and doxygen I think is a much more complete documentation tool than it once used to be. It would be really great if someone could skin doxygen to output something very close to BoostBook's output as I find doxygen's default HTML output pretty awful, but I suspect many of us will need to adopt doxygen first to generate the pressure for someone to do the skinning work.
I would prefer to not report something as a symlink when target() won't work with it. So you now have an additional stat_t flag called st_reparse_point which is always the FILE_ATTRIBUTE_REPARSE_POINT flag.
I guess that depends on usage cases -- if it's most common to write code like if (type() == symlink_file) { do something with target(); } then you have a point. Although code that has sufficient error checking should be able to cope with the idea of a symlink that has an unreadable target.
But it seems odd to me to claim that a file is *not* a symlink just because you're told that it's a type of symlink that you don't know how to read.
I'd like to think AFIO's symlinks are "POSIX(-y) symlinks".
Having said that, I don't know how common custom symlinks are in the wild, or if they even exist at all.
AFIO is a bit caught here too actually. If you're enumerating a directory you have no easy way of disambiguating between a symlink to a directory and a symlink to a file. You basically have to try opening it as a directory, and if it errors out you then open it as a file.
Windows does supply what kind of symlink it is without additional syscalls, but POSIX does not. You'd have to do two syscalls per entry to disambiguate which is very costly for something so niche.
Perhaps rather than just having symlink_file, Filesystem should have symlink_file, symlink_directory, and symlink_entry? POSIX would return the latter (indicating that it's unknown whether it's a file or directory) while Windows would return one of the first two. This would still allow code to be written in a reasonably platform-independent manner.
Another option might be for stat_t to have a field that contains the OS-native flags, so that on Windows the DIRECTORY flag could be examined directly. This might also allow for other esoteric attributes (COMPRESSED, ENCRYPTED, NOT_CONTENT_INDEXED, etc) to be inspected/set as desired, although that's probably more useful in Filesystem rather than AFIO. Although I think this is uglier than the above for the enumeration case.
Given the Filesystem TS has shipped, I'd say that moment has passed. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 5/08/2015 10:14, Niall Douglas wrote:
I guess that depends on usage cases -- if it's most common to write code like if (type() == symlink_file) { do something with target(); } then you have a point. Although code that has sufficient error checking should be able to cope with the idea of a symlink that has an unreadable target.
But it seems odd to me to claim that a file is *not* a symlink just because you're told that it's a type of symlink that you don't know how to read.
I'd like to think AFIO's symlinks are "POSIX(-y) symlinks".
That's least-common-denominator thinking. Which is hard to get away from when building a cross-platform abstraction layer, I know, but "because it's POSIX" isn't really a good justification either. There are some things that POSIX is very bad at (mostly for historic reasons). If you have a function that operates on symlinks, then it should operate on *all* symlinks, not merely a subset of them. Like I said though, it's possible the distinction is academic and not practical; I don't know if there are any other kinds of surrogates in the wild. So I can understand the reluctance. :)
Given the Filesystem TS has shipped, I'd say that moment has passed.
Too late to be in the standard (yet), maybe. But one of the roles of Boost is to be better than the standard, so it can be the *next* standard. :)
On 5 Aug 2015 at 11:27, Gavin Lambert wrote:
But it seems odd to me to claim that a file is *not* a symlink just because you're told that it's a type of symlink that you don't know how to read.
I'd like to think AFIO's symlinks are "POSIX(-y) symlinks".
That's least-common-denominator thinking. Which is hard to get away from when building a cross-platform abstraction layer, I know, but "because it's POSIX" isn't really a good justification either. There are some things that POSIX is very bad at (mostly for historic reasons).
I'm more thinking that there is no point in adding features which have no proven use case yet. I don't mind adding a boolean flag which costs me nothing and cannot be buggy. I get much more worried about adding features which I cannot test and have no proven user base. Better I think to wait until a proven use case arises.
BTW you may not be aware, but AFIO includes every historical release of itself within itself via submodule branch pins. In other words, if you build an application targeting v1 ABI of AFIO, that will work in perpetuity (or at least until I stop supporting it). AFIO already ships two versions of itself, v1 and v2. Hence I don't have the problems other Boost libraries have with changing API semantics down the line. I can do so without breaking anyone's code because a literal copy of each previous AFIO ships with every release.
Given the Filesystem TS has shipped, I'd say that moment has passed.
Too late to be in the standard (yet), maybe. But one of the roles of Boost is to be better than the standard, so it can be the *next* standard. :)
If you can persuade Beman I'll follow it. AFIO is intended as a set of extensions to Filesystem, not as a replacement and as such is wholly dependent on Filesystem. In other words, whatever Filesystem does I'll match. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 6/08/2015 05:03, Niall Douglas wrote:
Given the Filesystem TS has shipped, I'd say that moment has passed.
Too late to be in the standard (yet), maybe. But one of the roles of Boost is to be better than the standard, so it can be the *next* standard. :)
If you can persuade Beman I'll follow it. AFIO is intended as a set of extensions to Filesystem, not as a replacement and as such is wholly dependent on Filesystem. In other words, whatever Filesystem does I'll match.
That was the idea. I guess we'll have to wait and see what he says next week.
On 29 July 2015 at 18:59, Niall Douglas
On 29 Jul 2015 at 18:09, Gavin Lambert wrote:
On 29/07/2015 14:06, Niall Douglas wrote:
I appreciate you're saying the cost is worth it, but we're thinking all Boost users here, not just the small minority on Windows Server 2012 with dedup turned on.
I'm not on Server 2012, but this thread caught my attention because I remember encountering a bug that prevented all WinXP clients from accessing deduped files on CIFS shares provided by Server 2012. I think in the end this was a server-side bug related to McAfee and the different protocols used by WinXP vs. Win7, and so clients shouldn't normally be able to see whether files are deduped or not remotely, but I haven't explicitly verified that. If CIFS shares do expose files as dedup reparse points instead of concealing that then it might affect quite a lot of users.
I had understood from the OP that CIFS is exporting the reparse point tag to clients, hence the breakage.
The reason, I suspect, that CIFS is being so braindead here is that opening a deduped file is more expensive than usual and clients ought to know. Which is exactly why I am opposed to treating these things as a regular file.
On the topic of "this file will be slow to read", IMHO this is an orthogonal issue. It might be nice to be able to query some sort of "this will be hell slow to read" status so I could perhaps do something about it, but the files (slow or not) should still be treated as normal files.
This problem is bigger than just reparse-point files. Reparse-point files (not symlinks/junctions) are just one type of maybe-this-will-be-slow files. Reading off the local underutilised disk is a lot faster than a local disk suffering high IO. On Monday, "K:" might be a lot slower than "M:" because K: is a distant server on a slow network and M: is a fast server on the local subnet. On Tuesday, it perhaps is the opposite because I've flown into the site hosting K:. Perhaps a network file read is slow one minute (on a 3G network) and fast just one minute later (after switching on WiFi).
But the current system doesn't tell me anything about that. Nor does Boost treat the K: files as "special" files just because they *might* be slow. So I don't see why we should start treating (e.g. dedup) reparse files any differently.
Speed of a read is an orthogonal issue, and often not something that I can do anything about. If it's going to take 5 minutes to read that Word document off the disk, then that's what it takes. I can't read that file any other way. If it's a problem for my software, I'll need to read it in a nonblocking way, with the ability to cancel and show progress etc. But the simple case is to block on the read, and my users are cool with that because most software works that way.
cheers, Paul
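One way the "cancel and show progress" variant mentioned above might look, as a minimal sketch in standard C++ only. The function name read_cancellable and its parameters are hypothetical; an application would typically run it on a worker thread (for example via std::async) and set the flag from the UI thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Reads the whole file in chunks. Between chunks it checks `cancel` and
// reports the running byte count. Returns true only if end of file was
// reached; false if cancelled, unopenable, or a read error occurred.
inline bool read_cancellable(const std::string& path,
                             const std::atomic<bool>& cancel,
                             const std::function<void(std::size_t)>& on_progress,
                             std::vector<char>& out,
                             std::size_t chunk_size = 64 * 1024) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    std::vector<char> chunk(chunk_size);
    while (in) {
        if (cancel.load()) {
            return false;  // caller asked to stop; `out` holds what was read so far
        }
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) {
            break;
        }
        out.insert(out.end(), chunk.begin(), chunk.begin() + got);
        if (on_progress) {
            on_progress(out.size());
        }
    }
    return in.eof();
}
```

Cancellation here only takes effect between chunks, so a single slow chunk (say, the first read that triggers rehydration of an archived file) still blocks until it completes.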
On 29 July 2015 at 14:09, Gavin Lambert
I'm not sure if it's current, but http://blogs.technet.com/b/filecab/archive/2013/02/14/dfsr-reparse-point-sup... seems to suggest the following behaviour as reasonable:
- treating IO_REPARSE_TAG_MOUNT_POINT as directory symlinks
- treating IO_REPARSE_TAG_SYMLINK as symlinks
- treating IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, and IO_REPARSE_TAG_HSM as regular files
- treating any other tag as something to be ignored (in most cases)
I believe the last point is wrong in our context. That blog is talking about DFS Replication, which is a very special case for reading files. The fallback ("dehydrating and rehydrating files") is something they'd rather not do because it would be unpacking files out of 3rd party archival areas. They'd rather not read and copy content if they can avoid it. This is so specific that they probably shouldn't be using Boost libraries to do this work.
3rd party companies (like McAfee, Symantec) can request a unique reparse tag for their custom server software. When the file is read, the server uses the reparse tag ID to match up with the required 3rd party driver to handle the read/write. For example, Symantec Enterprise Vault v10 has Reparse Tag Value 0x00000010 (observed on a server in the wild). If you like, I can send a screenshot of this particular file, taken on a Windows 7 client computer, looking at K: (a network share of a Windows server). This allows 3rd party companies to make their own fancy cluster/archival storage solutions.
The only way I can read this particular file is through the first filename; there is no symlink to follow. So it's not a symlink, there is no second filename to look at. In this particular case, the files are archived if not read for X days. When you first open the file, Symantec replaces the reparse-point file with the REAL file, and things continue as normal from there. So, similar purpose to a dedup file, but different implementation.
So I would have written that last point as:
- treating any other tag as a regular file
Because in this case, the server admins that I talk to want to install whatever software they like on the server, and for client software to just read the files. They use this software to reduce the storage usage. That's all.
cheers, Paul
On 30/07/2015 14:49, Paul Harris wrote:
On 29 July 2015 at 14:09, Gavin Lambert wrote:
I'm not sure if it's current, but http://blogs.technet.com/b/filecab/archive/2013/02/14/dfsr-reparse-point-sup... seems to suggest the following behaviour as reasonable:
- treating IO_REPARSE_TAG_MOUNT_POINT as directory symlinks
- treating IO_REPARSE_TAG_SYMLINK as symlinks
- treating IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, and IO_REPARSE_TAG_HSM as regular files
- treating any other tag as something to be ignored (in most cases)
I believe the last point is wrong in our context. That blog is talking about DFS Replication, which is a very special case for reading files. The fallback ("dehydrating and rehydrating files") is something they'd rather not do because it would be unpacking files out of 3rd party archival areas. They'd rather not read and copy content if they can avoid it. [...] So I would have written that last point as: - treating any other tag as a regular file
If you have a look at the very next paragraph in the quoted message, that's what I said. :) (The part you quoted was repeating what the blog said, not as a recommendation for Boost library behaviour.)
On 30 July 2015 at 11:33, Gavin Lambert
On 30/07/2015 14:49, Paul Harris wrote:
On 29 July 2015 at 14:09, Gavin Lambert wrote:
I'm not sure if it's current, but http://blogs.technet.com/b/filecab/archive/2013/02/14/dfsr-reparse-point-sup... seems to suggest the following behaviour as reasonable:
- treating IO_REPARSE_TAG_MOUNT_POINT as directory symlinks
- treating IO_REPARSE_TAG_SYMLINK as symlinks
- treating IO_REPARSE_TAG_DEDUP, IO_REPARSE_TAG_SIS, and IO_REPARSE_TAG_HSM as regular files
- treating any other tag as something to be ignored (in most cases)
I believe the last point is wrong in our context. That blog is talking about DFS Replication, which is a very special case for reading files. The fallback ("dehydrating and rehydrating files") is something they'd rather not do because it would be unpacking files out of 3rd party archival areas. They'd rather not read and copy content if they can avoid it.
[...]
So I would have written that last point as: - treating any other tag as a regular file
If you have a look at the very next paragraph in the quoted message, that's what I said. :)
(The part you quoted was repeating what the blog said, not as a recommendation for Boost library behaviour.)
Sorry, you mean this bit:
There was also a note that you can use IsReparseTagNameSurrogate to determine if a given reparse point tag is a surrogate (some kind of link) or not (treat like regular file). That might be the best option, if it's consistent -- and at least for the official MS tags it seems to be; MOUNT_POINT and SYMLINK are surrogates and the other types are not.
On 30/07/2015 15:54, Paul Harris wrote:
If you have a look at the very next paragraph in the quoted message, that's what I said. :)
(The part you quoted was repeating what the blog said, not as a recommendation for Boost library behaviour.)
Sorry, you mean this bit:
There was also a note that you can use IsReparseTagNameSurrogate to determine if a given reparse point tag is a surrogate (some kind of link) or not (treat like regular file). That might be the best option, if it's consistent -- and at least for the official MS tags it seems to be; MOUNT_POINT and SYMLINK are surrogates and the other types are not.
Yes. Admittedly it was perhaps a bit unclear; I expanded on this in my reply to Niall earlier today, which does have a recommendation, although not down to specific APIs. I'm still assuming that he wants to distinguish between "fast files" and "slow files" in some way, but both should be "regular files" -- there should be a separate API to ask if they're fast or not, if that distinction is useful.
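The IsReparseTagNameSurrogate rule discussed above can be sketched as below. The tag values and the surrogate bit are the ones defined in the Windows SDK headers, redeclared under local names so the snippet builds on any platform; reparse_class, classify_reparse_tag and check_documented_tags are hypothetical names for this illustration, not Boost or AFIO API.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t tag_mount_point = 0xA0000003u;  // IO_REPARSE_TAG_MOUNT_POINT
constexpr std::uint32_t tag_symlink     = 0xA000000Cu;  // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t tag_hsm         = 0xC0000004u;  // IO_REPARSE_TAG_HSM
constexpr std::uint32_t tag_sis         = 0x80000007u;  // IO_REPARSE_TAG_SIS
constexpr std::uint32_t tag_dedup       = 0x80000013u;  // IO_REPARSE_TAG_DEDUP

// Bit tested by the SDK's IsReparseTagNameSurrogate macro.
constexpr std::uint32_t name_surrogate_bit = 0x20000000u;

enum class reparse_class { link, regular };

constexpr bool is_name_surrogate(std::uint32_t tag) {
    return (tag & name_surrogate_bit) != 0;
}

// Name surrogates are treated as links; every other tag, including
// third-party ones, is treated as a regular file or directory.
constexpr reparse_class classify_reparse_tag(std::uint32_t tag) {
    return is_name_surrogate(tag) ? reparse_class::link : reparse_class::regular;
}

// Checks the rule against the Microsoft tags named in the thread.
inline void check_documented_tags() {
    assert(classify_reparse_tag(tag_mount_point) == reparse_class::link);
    assert(classify_reparse_tag(tag_symlink) == reparse_class::link);
    assert(classify_reparse_tag(tag_dedup) == reparse_class::regular);
    assert(classify_reparse_tag(tag_sis) == reparse_class::regular);
    assert(classify_reparse_tag(tag_hsm) == reparse_class::regular);
}
```

Under this rule the tag value Paul reported for Enterprise Vault (0x00000010) also classifies as regular, since the surrogate bit is clear.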
participants (5): Andrey Semashev, Beman Dawes, Gavin Lambert, Niall Douglas, Paul Harris