Back to implementing the pipeline with automatic mutualization

    void Pipeline::doStage(pair elt, Filter & f) {
        ...
        elt.second = f(elt.second);
        if(f.nextFilter) {
            async doStage(elt, *f.nextFilter);
        } else ...
    }

This is roughly where we left off. We still need to fill in two things: how do we deal with serial stages, and what happens when an item hits the end of the pipeline?

We'll deal with serial stages first. If the filter is serial, it has to know which element it's waiting for.

    class Filter {
        int next_elt; //always init to 0 in ctors etc.
    public:
        bool is_serial();
        friend class Pipeline;
    };

So, add a BlockUntil to the first part of doStage:

    BlockUntil(!f.is_serial() || elt.first == f.next_elt);
    f.next_elt++;

This doesn't work because it makes all parallel stages serial: they all touch next_elt, which means they can't run in parallel because the transaction mechanism will stop them. Thus, we need to make parallel filters not fiddle with next_elt. Three ways to write it:

1)
    if(f.is_serial()) {
        BlockUntil(f.next_elt == elt.first);
        f.next_elt++;
    }
    //this is correct, but there's a chance that the AME people won't let
    //you write it this way, because BlockUntil might have to be the first
    //thing in a function.

2)
    BlockUntil(!f.is_serial() || elt.first == f.next_elt++);
    //we can get away with side effects in BlockUntil because they will be
    //undone if the predicate is false.

3) arguably better than 2:
    BlockUntil(!f.is_serial() || elt.first == f.next_elt);
    if(f.is_serial())
        f.next_elt++;

Capital.

What about when we're done with the pipeline? Remember, we have a limit on the number of items that can be in flight at once in the pipeline.

    class Pipeline {
        size_t tokensLeft;
        int next_num;
        Filter * first_filter;
    public:
        void inject() {
            pair elt;
            do {
                elt = make_pair(next_num++, (*first_filter)(NULL));
                if(elt.second && first_filter->nextFilter)
                    async doStage(elt, *first_filter->nextFilter);
            //only loops when the first filter is the whole pipeline
            } while(elt.second && !first_filter->nextFilter);
            //first filter returned NULL: this token is finished
            if(!elt.second)
                tokensLeft--;
        }
    };

Finish doStage: it just calls inject() at the end.
The call has to be async because inject increments next_num, so calling it synchronously would serialize the final stage. This is really a bummer because there's nothing in the language that forces us to think about this. If you call a shared library synchronously and the library has state shared across invocations, you just serialized on that call. This goes back to the "it's better to be slow and correct than fast and wrong" argument. It's not so clear to me that this argument is correct, especially if you're wrong with low probability.

FINALLY: we need to handle starting and finishing the pipeline.

    void Pipeline::run(size_t numTokens) yields {
        //REQ: numTokens > 0
        tokensLeft = numTokens;
        for(size_t ii = 0; ii < numTokens; ++ii)
            inject(); //no reason to async because it WILL run serially
        //wait for the pipeline to finish
        //can't just BlockUntil (will livelock)
        yield();
        BlockUntil(tokensLeft == 0);
    }

Last issue: what if filters need to yield?

Observation 1: if we say filters can't yield, they can't use unprotected code to do things like I/O (e.g., read from disk, do stuff, write to disk).

However, we can fix things up: the call to first_filter in inject is yielding, and that means all calls to inject are yielding. But that doesn't hurt Pipeline::run, because it was already yielding, and doStage is always invoked asynchronously (which means yields don't matter).
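For anyone who wants to poke at the serial-stage rule outside of AME, here is a rough sketch in standard C++. It is not the AME code: BlockUntil and async don't exist in plain C++, so a mutex plus condition variable stands in for BlockUntil(elt.first == f.next_elt), and raw threads stand in for async. The names SerialStage, process, order, and run_serial_stage_demo are made up for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a serial filter. It only records the order in
// which items pass through, which is enough to check the ordering rule.
class SerialStage {
    std::mutex m;
    std::condition_variable cv;
    int next_elt = 0;        // sequence number this stage will accept next
    std::vector<int> seen;   // order in which items were actually processed
public:
    void process(int seq) {
        std::unique_lock<std::mutex> lock(m);
        // Assumption: this wait plays the role of
        // BlockUntil(elt.first == f.next_elt) from the notes.
        cv.wait(lock, [&] { return seq == next_elt; });
        seen.push_back(seq); // the "filter body"
        ++next_elt;          // only the serial stage touches next_elt
        cv.notify_all();     // wake items waiting for the new next_elt
    }
    std::vector<int> order() {
        std::lock_guard<std::mutex> lock(m);
        return seen;
    }
};

// Start n items in reverse order, one thread each (a crude substitute for
// async doStage). The serial stage should still see them as 0, 1, ..., n-1.
std::vector<int> run_serial_stage_demo(int n) {
    assert(n >= 0);
    SerialStage stage;
    std::vector<std::thread> workers;
    for (int seq = n - 1; seq >= 0; --seq)
        workers.emplace_back([&stage, seq] { stage.process(seq); });
    for (auto& t : workers)
        t.join();
    return stage.order();
}
```

Calling run_serial_stage_demo(4) returns 0, 1, 2, 3 no matter how the threads are scheduled, because each item waits until next_elt reaches its own sequence number. A parallel stage would simply never call process, which is the same point as options 1 and 3: only serial filters read or write next_elt.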