Purely for fun, it took me a few minutes to write such a pipeline: a simple version using a thread for every stage, then one with work stealing. There are tons of things to improve (for example, strings should be moved), but I hope you get the idea. Now one "just" needs to write the syntactic sugar to get beautiful pipelines.
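For concreteness, here is a minimal sketch of what the thread-per-stage variant could look like (this is my own illustration, not the poster's code; the channel type, the stage lambdas, and the empty-optional end-of-stream convention are assumptions, and the work-stealing variant is omitted):

#include <cctype>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>

// Simple thread-safe queue; an empty optional signals end of stream.
template <typename T>
class channel {
    std::queue<std::optional<T>> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> l(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty(); });
        auto v = std::move(q_.front());
        q_.pop();
        return v;
    }
};

int main() {
    channel<std::string> a, b;

    // Stage 1: produce some lines.
    std::thread producer([&] {
        for (auto s : {"foo", "bar", "baz"}) a.push(std::string(s));
        a.push(std::nullopt);  // end-of-stream marker
    });

    // Stage 2: transform (here: upper-case) and forward.
    std::thread transformer([&] {
        while (auto s = a.pop()) {
            for (auto& c : *s) c = std::toupper(static_cast<unsigned char>(c));
            b.push(std::move(*s));
        }
        b.push(std::nullopt);
    });

    // Stage 3: sink.
    std::thread sink([&] {
        while (auto s = b.pop()) std::cout << *s << '\n';
    });

    producer.join(); transformer.join(); sink.join();
}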
Well, if it's possible to use as many threads as there are segments (transformations), then I think it's a much easier problem. Making it possible for a thread to alternate between segments is the interesting part. (Please correct me if I misunderstood your solution.) Regarding work stealing: I don't know how it works, but is the order of the input stable across segments?
We almost certainly mean different things. I meant something like:
auto p = Paragraphs << TextFile << HTMLFile << "http://www.boost.org/";
Paragraphs::Words::iterator it = p[4].match("Niall");
for (; it; ++it) std::cout << "'Niall' found at offset " << it->offset << std::endl;
I think this demand-driven computing is not something I'm aiming at, but it can be approximated. The goal of my design is to provide a low-latency pipeline with acceptable throughput (or a high-throughput one with acceptable latency); a data-driven nature serves this better, IMO.

Hartmut, thanks for the HPX example, I haven't had the time to analyze it yet.

Currently, I'm experimenting with coroutines. I think (hope) there is a way we could provide an interface like this:

void duplicate(int input, queue_back<int>& output) {
    output.push_or_yield(input);
    output.push_or_yield(input);
}

push_or_yield enqueues the element, or, if the queue is full, the coroutine yields and tries to enter the monitor of the downstream task. If that monitor is already taken, it picks another task. If there is no such task, it blocks until a task becomes available (or spins on the previous task a bit). I think this would have nice (configurable) latency characteristics.

Thanks for the intense discussion,
Benedek
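To make the push_or_yield idea concrete, here is a minimal C++20 coroutine sketch. It only models the yield-when-the-queue-is-full half; entering the downstream monitor and picking another task are left out. bounded_queue, task, and the manual resume calls in main are my own stand-ins for a real scheduler, not part of the proposed design:

#include <coroutine>
#include <cstddef>
#include <deque>
#include <iostream>

struct bounded_queue {
    std::deque<int> items;
    std::size_t capacity;
    bool full() const { return items.size() >= capacity; }
};

// Minimal coroutine task type; a real pipeline would hand these to a scheduler.
struct task {
    struct promise_type {
        task get_return_object() {
            return task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
    std::coroutine_handle<promise_type> h;
    void resume() { h.resume(); }
    ~task() { if (h) h.destroy(); }
};

// Awaitable modeling push_or_yield: push immediately if there is room,
// otherwise suspend; the scheduler resumes us once space is available.
struct push_or_yield {
    bounded_queue& q;
    int value;
    bool await_ready() const { return !q.full(); }
    void await_suspend(std::coroutine_handle<>) {}
    void await_resume() { q.items.push_back(value); }
};

task duplicate(int input, bounded_queue& out) {
    co_await push_or_yield{out, input};
    co_await push_or_yield{out, input};
}

int main() {
    bounded_queue q{{}, 1};  // capacity 1 forces a yield between the two pushes
    task t = duplicate(42, q);
    t.resume();                       // first push succeeds, second suspends
    std::cout << q.items.size() << " item(s) queued\n";
    q.items.pop_front();              // downstream consumes one element
    t.resume();                       // producer resumes and pushes again
    std::cout << q.items.size() << " item(s) queued\n";
}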