Maybe I'm making it too easy for myself, but I'd see every pipeline stage as a scheduler, say, for Asynchronous, a stealing threadpool scheduler (with one or more threads). Every stage gets a job that transforms the input data and posts to the queue of the next scheduler a functor doing the next stage's transformation, and so on. Then I'd create a composite in one line of code to make sure work stealing happens, and that would be it for the infrastructure.
Purely for fun, it took me a few minutes to write such a pipeline: first a simple version using a thread for every stage, then one with work stealing. There is a ton of stuff to improve, for example strings should be moved, but I hope you get the idea. Now one "just" needs to write the syntactic sugar to have beautiful pipelines.
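To make the thread-per-stage idea concrete, here is a minimal sketch that uses only the C++ standard library, not Asynchronous itself. The `Stage` class, its `post` method, and `run_pipeline` are hypothetical names made up for illustration, and the three transformations (uppercase, append "!", collect) are arbitrary placeholders. There is no work stealing here; each stage is just one thread draining its own queue.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-threaded scheduler: one worker thread draining a FIFO job queue.
class Stage {
public:
    Stage() : worker_([this] { run(); }) {}

    // Signals shutdown, lets the worker finish every queued job, then joins it.
    ~Stage() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Enqueue a functor to be executed on this stage's thread.
    void post(std::function<void()> job) {
        assert(job);  // an empty std::function would throw when invoked
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutdown requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so post() is never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last: the members above exist before it starts
};

// Three-stage pipeline: uppercase -> append "!" -> collect.
std::vector<std::string> run_pipeline(const std::vector<std::string>& inputs) {
    std::vector<std::string> results;  // written only by the last stage's thread
    {
        // Destruction runs in reverse declaration order: first, middle, last.
        // Each destructor drains its queue, so upstream work is always
        // forwarded before the downstream stage shuts down.
        Stage last, middle, first;
        for (const std::string& in : inputs) {
            first.post([&middle, &last, &results, in] {
                std::string upper = in;  // copied for simplicity; could be moved
                for (char& c : upper)
                    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
                // Hand the next transformation to the next stage's queue.
                middle.post([&last, &results, upper] {
                    std::string marked = upper + "!";
                    last.post([&results, marked] { results.push_back(marked); });
                });
            });
        }
    }  // all three worker threads have been joined here
    return results;
}
```

Calling `run_pipeline({"hello", "world"})` yields `HELLO!` followed by `WORLD!`; order is preserved because every stage is a single thread consuming a FIFO queue. A work-stealing variant would replace each `Stage` with a pool whose idle threads can take jobs from the other stages' queues, at the cost of that ordering guarantee.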
Purely for fun as well and for the sake of completeness (as HPX was
mentioned here before), here is Christophe's code in HPX:
#include