On 28 Aug 2015 at 23:47, Antony Polukhin wrote:
I'm not sure what you mean here. That example creates a dispatcher. It then sets the current thread-local dispatcher via RAII of the guard variable.
Let me clarify. In that example you use two approaches, one of which uses `dispatcher` explicitly (for example `dispatcher->file(ihs_reqs)`) while the other uses `dispatcher` implicitly (for example `async_truncate(oh, offset)`). The example would look more solid if only one approach were used. The implicit dispatcher looks simpler, which I think is good for examples.
Ok thanks for clarifying.
Also, a thread count of 1 seems more correct and less error prone (by default this would require no thread synchronization from users).
I'm not sure which thread count you mean here.
By default a thread_pool is created that spawns some performance-optimal number of threads. I'm proposing to spawn the minimum number of threads (1 thread) by default. That way a novice user could experiment with the library without any need to think about thread synchronization. Such a change (1 thread by default) would reduce the library's entry barrier. But that's only my opinion, it's not a requirement :)
I'm not sure what the gain would be here. With the AFIO API as currently designed, you can schedule from a main thread quite complex logic to be executed asynchronously, and I have plans for even more powerful encapsulation of asynchronous (monadic) logic for later versions. You then synchronise or await on the futures returned by that logic in that main thread. This is all novice users should need or want. If you need to access shared or global state from a continuation called in an unknown order in unknown threads, that is already quite advanced stuff. Reducing the thread count to one seems inferior to locking a global mutex, which would achieve the same outcome.
You've convinced me. But I still insist on an `async_copy` method being ready out of the box. This could be really useful and would untie your hands for future modifications (for example you could try to call splice/tee, and if that blocks, do the copying using memory allocations).
A standard function for async file copy is listed as planned for a future version in the release notes. Here is what it says: "* Related to locking files, the ability to copy files (unwritten destination locked until complete) using a variety of algorithms (including cheap copies on btrfs [cp --reflink=auto]) with progress notification via the file change monitoring. Extend rename file and delete file with lock file options." So essentially when I get byte range locking implemented, async_copy will follow shortly thereafter (and be constant time on filing systems supporting that).
OK, good news: I've finally understood the docs and all the interface decisions that were made. This leads to my decision to review AFIO in two parts: nonbatch and batch.
This is still not a review, just some discussion. Let's start from the nonbatch operations...
NONBATCH
I like the interface, I love it. Some comments:
*Thank you!* You are the first person to publicly say you like it let alone love it. It has design compromises, but it was not chosen without a great deal of thought and comparison with alternatives.
Most of the functions in the table "AFIO single page cheat sheet for the very impatient" have synchronous analogs in Boost.Filesystem http://www.boost.org/doc/libs/1_59_0/libs/filesystem/doc/reference.html How about changing the names to be closer to Boost.Filesystem? For example:
async_truncate -> async_resize_file
truncate -> resize_file
AFIO's truncate is a bit different from resizing. It specifically guarantees to not allocate any new space where that is possible. I chose truncate simply from the ftruncate() POSIX function.
async_dir -> async_directory
I chose dir simply for shortness, especially rmdir.
async_rmfile + async_rmsymlink -> async_remove
async_rmdir -> async_remove_all
I actually need separate rmfile/rmdir/rmsymlink functions. They are not equivalents by any stretch and on Windows at least, each have totally separate implementations. Even on POSIX, there is a lot of difference in what happens. Just because POSIX or Windows provides a single API to delete stuff doesn't mean AFIO can do the same. For example, we may have created a shadow file if someone opened a byte range lock. If so, we need to ask if anyone else is using the file from any other process, and only if not can we delete the file plus its shadow file, else we need to upcall to other users that we want to delete this file.
I'm not a huge fan of the Boost.Filesystem names, but they are almost a part of the C++ Standard now. Keeping close to them will simplify users' lives by not forcing them to learn new names. It will make the migration from sync Filesystem code to async AFIO easier.
You should see a deliberate attempt to mimic Filesystem and POSIX wherever possible. Any deviations were pondered as deliberate acts for good reasons only. I'm not claiming I got all these right every time, it's just I should have a rationale for any naming deviations. Sans bugs of course.
Additionally, use the Boost.Interprocess naming for file_flags:
create_only_if_not_exist -> create_only
create -> open_or_create
add open_only
That's a good idea. Logged to https://github.com/BoostGSoC13/boost.afio/issues/105.
Probably rename `file_flags` to `open_flags`, because those flags are also usable for directories and are used only at open.
Actually they are used in many other places throughout the API, just not particularly obviously so. I personally wanted to use just `flags` but there are also metadata flags and statfs flags. Dispatchers also have force set and force clear file_flags masks, I find those particularly useful for testing and in many other scenarios.
Also describe all the `T` in the `async_` operations. For example:

    template <class T>
    future<> async_file(T _path, file_flags _flags = file_flags::none);

Here `T _path` is the filing system path to use. It could be `filesystem::path`, `const char*`, ...
You just want a list of what T could be? If so, good idea and logged to https://github.com/BoostGSoC13/boost.afio/issues/106.
Also the sync operations that return `handle_ptr` do not seem right. Just remove them if possible. This will simplify the codebase and hide the strange `handle_ptr` type from users' eyes.
Apart from the functions which open a file/dir/symlink obviously. Yeah that's fair enough. Logged at https://github.com/BoostGSoC13/boost.afio/issues/107.
The problem that I see is the implementation of nonbatch operations in terms of batch operations. `std::vector<...>(1, std::move(req)).front()` in every nonbatch operation looks really bad. This must be fixed; no vector constructions should appear here.
That's pure temporary shim code for mocking up the v1.4 API using the v1.3 engine. It works and is reliable, but it's definitely going away soon.
I do not like the `afio::path` class. It is almost the same as `filesystem::path` and only does some path modifications on the Windows platform. It's much better to use `filesystem::path` all around the code and fix the paths directly near the system call. That way the user won't be bothered by an additional class that does almost nothing. It's also overkill to have a duplicate of `filesystem::path` just for minor path fixes.
It's not there for paths going into AFIO, it's there for paths coming out of AFIO e.g. handle::path(). This is because on Windows AFIO only works with NT kernel paths which are not valid win32 paths and more problematically, are not a one to one translation either. Therefore afio::path offers implicit conversion from const char *, std::string, filesystem::path etc into afio::path, but does not permit implicit conversion out. Instead you must make an active API call choice to convert the NT kernel path into some valid Win32 path with the options available having various documented tradeoffs. The main goal is to stop people accidentally feeding an AFIO path to any filesystem::path consuming function. This would work on POSIX, but fail spectacularly on Windows. For reference, AFIO v1.2 and earlier did use filesystem::path, and even I accidentally fed NT kernel paths to Win32 functions, so I added the type safety you see now.
BATCH
* TOC has no mention of batch operations. Please describe in a separate chapter what benefits batch operations provide, how fast they are and so forth.
The batch functions are listed in the reference guide. But I'll admit why I left out much mention of them: until I reimplement the internal engine using lightweight futures I cannot say what the benefits are of batch operations relative to the non-batch under the lightweight future based engine. It could be that batch delivers no benefit, and is simply a loop of single issuance. But I'll be keeping the batch API.
* The implementation is not perfect. Too many memory allocations and shared_ptrs. In an ideal world there would be only two memory allocations for a batch operation: one in the `std::vector<...>` constructor, another one in making the shared state for the future.
Absolutely agreed. And that is exactly the reason behind the lightweight futures refactor. I'm way ahead of you on this. Note that most shared_ptr use is by move semantics, and therefore a lot less costly than it may look.
* The interface is not ideal and I cannot fully understand how to improve it. But here are some ideas.
Minor improvements:
* make all the async batch operations start with an `async_batch_` prefix (or at least `async_`).
I'm still considering a batch free function API. I didn't present it here because it isn't fully baked yet. But it does begin with "batch_async_*".
* follow the C++ Standard naming for `std::packaged_task`-like stuff. Rename all the `*_req` to `*_task`: `io_req` -> `io_task`, `path_req` -> `path_task`, ...
Did you realise that anything with a *_req suffix is just a dumb structure? They are essentially named tuples.
Problems that I see:
* std::vector as an input container is not a perfect solution
* std::vector as an output container is OK for some cases, but some users may require nonallocating output
I am constrained by ABI stability, so I can't use arbitrary templated sequences (but see below).
* too many helper task-like structures (`io_req`, `path_req`)
* the result of a batch operation must be converted to be suitable for the next batch operation. In other words, if I want to create 100 files and then wish to resize and write something to each file, I'll need to move a lot of `future<>`s from each batch output into `*_req` structures.
It's actually not as bad as it looks in practice. You almost certainly need to write a loop to create the batch, and so pulling in preconditions from a preceding operation is not a problem. It feels very natural.
* empty futures for the first batch operation are not something that I'd wish to have around
An empty future is treated as no precondition i.e. execute now. I'm more than happy to think of a better alternative. I dislike overloading the meaning of an empty future this way, but it works well and isn't a problem in practice.
How hard would it be to:
* allow use of any container for providing a batch of tasks
It could be done whilst preserving ABI using a visitor class whereby a virtual function fetches the next item in the sequence. However, almost certainly any benefit from batch requires the size of the batch to be known before anything else happens, such that independent_comalloc etc. can be called, and perhaps the ability to feed the batch to something like OpenMP. All that suggested std::vector wasn't a bad default, and I like std::vector because C can speak std::vector easily, and I have a C and Microsoft COM API coming. To be honest I didn't feel it a priority until someone logs an issue with a real world case where the current std::vector solution is a problem. And I'm mindful that Eric's Ranges are the eventual solution to all problems like this, so I was thinking of waiting for Ranges.
* return `boost::array<future<>, N>` if the input data size is known at compile time
This is tricky in a stable DLL configuration. AFIO's header only configuration is not my personal default.
* move the `completion` field from each of the `*_req` structures to the batch method signature
Anywhere you see mention of completions goes away in the lightweight futures rewrite as completions are being replaced with continuations. All this is exactly why I didn't emphasise the batch API in this review presentation.
This would give the following interface:
    array<future<>, 7> operation = async_batch_file(std::initializer_list<path_task>{
        path_task("file1.txt", open_flags::create),
        path_task("file2.txt", open_flags::create),
        path_task("file3.txt", open_flags::create),
        path_task("file4.txt", open_flags::create),
        path_task("file5.txt", open_flags::create),
        path_task("file6.txt", open_flags::create),
        path_task("file7.txt", open_flags::create)
    });
    const off_t offsets[7] = { 4096, 4096, 4096, 4096, 4096, 4096, 4096 };
    operation = async_batch_resize_file(std::move(operation), offsets);
    operation = async_batch_write(std::move(operation), io_task[]{ /* ... */ });
    operation = async_batch_remove(std::move(operation));
    when_all_p(std::move(operation)).wait();
That's pretty close to my planned free function batch API. Thanks for the feedback Antony. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/