On 31 Aug 2015 at 20:39, Andrey Semashev wrote:
Before people go "aha!", remember that afio::handle has proper error handling and has to configure a number of other items, including asking the OS for what its "real" path is and other housekeeping.
Why does the library do that when not asked? Why not collect this data only when the user requests?
The problem is the race free filesystem support, which unfortunately isn't as simple as just using the POSIX *at() functions. You need a minimum of the true canonical path of the fd and its stat_t struct (specifically the inode and the created and modified timestamps). That's at least two additional syscalls just for those, and AFIO may do as many as seven syscalls per handle open, depending on circumstances. You can disable the race free semantics, which are slow, using afio::file_flags::no_race_protection. However, AFIO fundamentally assumes that any filesystem path consuming function is slow and will not be frequently called by applications expecting high performance, so work is deliberately loaded into path consuming functions to keep it away from all other functions. If you need to do a lot of file opens and closes and don't care about races on the filesystem, you are better off not using AFIO. STL iostreams is perfectly good for this, as the only OSs which allow async file opens and closes are QNX and the Hurd.
- I'd much prefer native_handle() to return the actual native handle type appropriate for the platform, e.g. int on POSIX, rather than a void *. I'm guessing this is done to support alternative dispatchers in the future that aren't necessarily based on OS files at all, but perhaps a better way of doing this can be found that doesn't require casting the void * to an int.
It's more about ABI stability.
Stability with respect to what? Do you expect native handles to be somehow different on the same platform?
AFIO's design allows many backends e.g. ZIP archives, HTTP etc. So yes, native handles can vary on the same platform and I needed to choose one for the ABI.
Casting void * to int is fine.
No, it's not. It's a hack that you impose on the library users. According to [expr.reinterpret.cast]/4 it's not even required to compile, as int may not be large enough to hold a pointer.
The native handle on POSIX is stored internally as an int. In native_handle(), it is cast to a size_t and then to a void *. Please do correct me if I am wrong, but I had thought this is defined behaviour: int a = 5; void *b = (void *)(size_t)a; int c = (int)(size_t)b; assert(c == a); This is certainly a very common pattern in C.
Directory handle cache:
- A cache of open handles to directories is kept by AFIO by default. This is particularly problematic because (as is documented) it makes reading directory entries thread-unsafe by default. If this is to be kept, a very strong rationale needs to be given, and the behavior must be changed somehow to ensure that directory reading is reliable by default.
The documentation explains exactly why there is this cache, it's a big win despite the complexity it introduces. The problem of races on directory enumeration on shared directory handles is eliminated if you make the number of entries enumerated in a single shot bigger than the total. In practice, this is not a problem, even on 10M item directories, just throw max_items at the problem.
Will this result in the process having 10M open handles permanently?
God no, it's exactly to achieve the opposite. Open handle count is severely limited on POSIX platforms, and AFIO engages in quite a bit of complexity to reduce the problem of running into process limits on open fds. If you are careful to keep metadata lookups within the directory_entry::metadata_fastpath() set, having to open each file entry to read metadata can be avoided. Your 10M item directory is therefore enumerated without opening a handle 10M times.
std::vector does not copy memory on C++ 11 when returned by value.
Correction: when returned with a move constructor or RVO.
Which AFIO makes sure of, and I have verified as working.
It's more that pwrite off the end of the file is inherently racy to concurrent read/writers. If you want safe automatic file length extension on writes, open a handle with the append flag and use that.
AFIO fatal exits the process if you try to read or write past the current file size: it's a logic error, and your program is inherently broken.
This is unacceptable behavior IMO. I would expect an exception in such case.
AFIO assumes the only reason you are reading and writing past the maximum file size is either a logic error or memory corruption. It cannot tell which is which quickly, so we assume it's memory corruption as your program should never have this logic error. And as I explained before, if AFIO thinks memory corruption has occurred it can no longer know if what it has been told to do is correct or wrong, and therefore not terminating the process could cause damage to other people's data.

I do have a very big concern that it is a denial of service attack on anything using AFIO by simply shrinking a file it is using from another process to cause the AFIO using process to fatal exit, so I will be trying to improve on the current situation in the engine rewrite. It's not as easy as it looks if you want to keep reads and writes fast, believe me, but again the ASIO reactor is in my way because that is code I can't control. Once I've written a replacement reactor I am hoping it can track scatter-gather writes a lot better than ASIO can and therefore handle this situation better without adding much overhead for tracking the individual operations.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/