Hartmut, thanks for the HPX example. I haven't had time to analyze it yet.
Currently I'm experimenting with coroutines. I think (hope) there is a way we could provide an interface like this:
    void duplicate(int input, queue_back<int>& output)
    {
        output.push_or_yield(input);
        output.push_or_yield(input);
    }
push_or_yield enqueues the element. If the queue is full, the coroutine yields and tries to enter the monitor of the downstream task. If that monitor is already taken, it picks another task. If there is no such task, it blocks until a task becomes available (or spins on the previous task for a bit).
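To make the intended control flow concrete, here is a rough sketch of those rules. It is not a real coroutine implementation: the "yield" is modeled by running the downstream task inline, the monitor is a plain std::mutex taken with try_to_lock, and std::this_thread::yield() stands in for "pick another task or block". The queue_back internals, set_downstream, try_pop and run_demo are all names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue endpoint; a sketch, not an existing library type.
template <typename T>
class queue_back {
public:
    explicit queue_back(std::size_t capacity) : capacity_(capacity) {}

    // Registers the work the downstream task performs when it gets to run.
    void set_downstream(std::function<void()> step) { downstream_step_ = std::move(step); }

    // Enqueue if there is room; otherwise let the downstream task make room.
    void push_or_yield(T value) {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(queue_mutex_);
                if (items_.size() < capacity_) {
                    items_.push_back(std::move(value));
                    return;
                }
            }
            // Queue is full: try to enter the downstream task's monitor.
            std::unique_lock<std::mutex> monitor(downstream_monitor_, std::try_to_lock);
            if (monitor.owns_lock() && downstream_step_) {
                downstream_step_();          // run the consumer so it drains the queue
            } else {
                std::this_thread::yield();   // stand-in for "pick another task or block"
            }
        }
    }

    // Non-blocking pop used by the downstream task.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(queue_mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex queue_mutex_;
    std::mutex downstream_monitor_;
    std::function<void()> downstream_step_;
};

// The user-facing task from the message above.
void duplicate(int input, queue_back<int>& output) {
    output.push_or_yield(input);
    output.push_or_yield(input);
}

// Feeds 1, 2, 3 through duplicate() with a queue of capacity 2.
std::vector<int> run_demo() {
    std::vector<int> sink;
    queue_back<int> q(2);
    auto drain = [&] { int v; while (q.try_pop(v)) sink.push_back(v); };
    q.set_downstream(drain);
    for (int i = 1; i <= 3; ++i) duplicate(i, q);
    drain();                 // flush whatever is still queued
    int leftover;
    assert(!q.try_pop(leftover));
    return sink;
}
```

Calling run_demo() from main drives the full path: the third and fifth pushes find the queue full and hand control to the downstream step before retrying. A real version would suspend the coroutine instead of calling the consumer on the producer's stack, which is where the configurable latency behavior would come from.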
I think this would have nice (configurable) latency characteristics.
To explain things: HPX creates a coroutine (i.e. an hpx::thread) for each hpx::async. The returned future can be used to synchronize with the thread's execution. The overhead of one such thread is in the range of 700-900ns, so you can spawn fairly small amounts of work (i.e. segments) and still be efficient. Creating millions (literally!) of such threads is not a problem either.

HPX implements almost all of the Technical Specification (TS) proposals related to concurrency and parallelism (N4088, N4104, N4107 - see here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/), which makes the returned futures very versatile and - together with the other proposed extensions - composable in many contexts.

One added benefit of HPX is that all of this works across machines. You can have the same functionality as outlined in a distributed application.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
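
For reference, the "one task per segment, synchronize through the returned future" pattern described above looks roughly like this. The sketch uses std::async and std::future from the standard library as a stand-in so that it compiles without HPX. The assumption (not verified here) is that hpx::async and hpx::future could be dropped in with the same shape. Note that std::async typically starts an OS thread per call, so the 700-900ns figure does not apply to this version. sum_segment and parallel_sum are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums one segment [first, last) of the data.
long sum_segment(const std::vector<int>& data, std::size_t first, std::size_t last) {
    return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(first),
                           data.begin() + static_cast<std::ptrdiff_t>(last), 0L);
}

// Spawns one asynchronous task per segment and joins through the futures.
long parallel_sum(const std::vector<int>& data, std::size_t segments) {
    if (segments == 0) segments = 1;
    const std::size_t chunk = (data.size() + segments - 1) / segments;

    std::vector<std::future<long>> parts;
    for (std::size_t first = 0; first < data.size(); first += chunk) {
        const std::size_t last = std::min(first + chunk, data.size());
        // Stand-in for hpx::async: each call returns a future for its segment.
        parts.push_back(std::async(std::launch::async,
                                   [&data, first, last] { return sum_segment(data, first, last); }));
    }

    long total = 0;
    for (auto& part : parts) total += part.get();  // get() synchronizes with the task
    return total;
}
```

With lightweight threads the segment count can be much larger than the core count, which is the point about spawning small amounts of work cheaply.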