By contrast a vector of pointers/indexes will be faster to iterate due to the CPU being able to cache the jump locations/offsets, whereas it can't with a linked list of any type. That's what my graphs show.
So, I implemented that. The design now as simple as it gets. A RootedDiGraph with Adjacency vectors (and pointers obtained from the memory-pool to nodes and arcs). Instead of std::vector, I use the pt::pector from https://github.com/aguinet/pector, which has (can have) a smaller footprint, allows for std::uint32_t's as indices and has a growth-policy option (possibly with the use of realloc).
Results:
fan-out 32:
RootedDiGraphAdjIntrLists: 1810 milliseconds. RootedDiGraphAdjVectors: 1302 milliseconds. RootedDiGraphFlat: 2486 milliseconds.
fan-out 512:
RootedDiGraphAdjIntrLists: 28841 milliseconds. RootedDiGraphAdjVectors: 14698 milliseconds. RootedDiGraphFlat: 33168 milliseconds.
That's a significant improvement.
Thanks degski - that more-or-less reflects what I thought. Good to know I could make a difference!