I've put together some slides to show what I'm thinking of and make sure we are on the same page: http://www.slideshare.net/erenon/boostpipeline-scheduling-of-segments-36832455
The point here: assuming a pipeline, one can use the extra structural information to make educated scheduling decisions: yield to the offending queue (upstream if the input is empty, downstream if the output is full), or pick a specific segment to run next to optimize latency.
Probably this can be achieved using fibers, or even with coroutines.
Due to the complexity of HPX, I'm not sure how it compares.
Huh? Complexity? It's 100% API compatible with C++11 and various proposals for C++14/17. I'd like to know more about why you think HPX is complex.
I added a much simplified implementation of what you described in your slides. Most of the complexity lies in the operator&, not in the HPX code itself. [snip]
Thanks for showing this sample, it's definitely very clean! By complexity, I meant internal details I couldn't cover yet, not something silly in the API.
Three things come to my mind regarding this example:

- It **seems** every segment creates a future instead of pushing the result into a bounded queue. Possibly the latter would be faster.
Possibly. This depends on the amount of work you're doing in each segment. As said, the overhead of one async/future is in the range of microseconds, so it will be efficient (in terms of parallel efficiency) once you have a certain amount of work. At the same time, if you have segments that don't have sufficient work, it might not make sense to overlap those in the first place; you could simply return a ready future (like the read() step I showed). But in the end you'll have to measure for your use cases.
- The segments never yield. Is there a complete scheduler underneath?
Yes. The segments could yield if needed.
- By launching tasks using hpx::async, it seems the scheduler can't have any notion of the pipeline, which makes efficient scheduling harder.
Well, it depends on your scheduler. In HPX the default scheduler (others can be used) is based on a FIFO queue with work-stealing from the back-end (one queue per core) for load balancing across all used cores.
Sorry for my ignorance, I can't answer these questions just by glancing at the docs.
You might not find those answers in the docs anyways ;)
From my perspective, having the segments return a future has many advantages:
a) you can utilize the asynchronous nature of future-based continuations without effort

b) this composes well with other asynchronous parallelization techniques (like when_all, when_any, etc.)

c) it generally allows for hiding raw threads from the user (which is always good) and helps transform concurrency into parallelism

Regards
Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu