It would be interesting to see the benchmark numbers for a larger number of CPU cores (e.g. 16). I can see in the table that for up to 3 TUs the build time with modules is 25-40% higher than with headers, and that the situation changes significantly for 4 or more TUs. You were using 3 cores for compilation, and I wonder if this is related.
It is definitely related. I ran the benchmark this way intentionally, so that the effects of parallelism would be visible without having to run a benchmark with 20 TUs. With 16 cores, you can expect the modules build to still be slower at 7 TUs. I will double-check shortly that this is the case, though.
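A rough way to picture why the core count matters is the toy model below. It is only a sketch, not anything measured: it assumes the module interface has to be built once before any importing TU can start, and that TUs otherwise compile in parallel batches of one per core. The three cost constants and the function names are hypothetical, in arbitrary units, chosen only so that the shape matches the table (modules about 30% slower up to 3 TUs on 3 cores, faster from 4 TUs).

```cpp
// Toy wall-clock model of a headers build vs. a modules build.
// All costs are made-up units for illustration, not benchmark results.
constexpr int kHeaderTuCost = 10;        // hypothetical: one TU that #includes the headers
constexpr int kModuleInterfaceCost = 9;  // hypothetical: building the module interface once
constexpr int kImportingTuCost = 4;      // hypothetical: one TU that imports the module

// Number of sequential "waves" needed to compile `tus` TUs on `cores` cores.
constexpr int batches(int tus, int cores) {
    return (tus + cores - 1) / cores;  // ceiling division
}

// Headers: every TU pays the full cost, but all TUs can start immediately.
constexpr int headers_build_time(int tus, int cores) {
    return batches(tus, cores) * kHeaderTuCost;
}

// Modules: the interface is a serial step, then the cheaper TUs run in waves.
constexpr int modules_build_time(int tus, int cores) {
    return kModuleInterfaceCost + batches(tus, cores) * kImportingTuCost;
}

// 3 cores: modules lose while everything fits in one wave, win from the second wave on.
static_assert(modules_build_time(3, 3) > headers_build_time(3, 3));
static_assert(modules_build_time(4, 3) < headers_build_time(4, 3));

// 16 cores: 7 TUs still fit in one wave, so modules are still slower in this model.
static_assert(modules_build_time(7, 16) > headers_build_time(7, 16));
static_assert(modules_build_time(17, 16) < headers_build_time(17, 16));
```

In this model the crossover always sits just past the core count, because the serial interface step only pays for itself once the headers build needs a second wave. A real build will not be this clean, which is why I want to check the 16-core case rather than rely on it.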
One other thing that isn't clear is how modules interact with compiled libraries. I don't suppose modules will replace static/shared libraries, so I presume a module will be added on top of the library? How should it "export" the symbols that are already exported from the compiled library then?
I haven't explored this enough yet to have a clear mental model here.