On 9/10/2015 6:59 PM, Matt Calabrese wrote:
Yeah, I have to do that now with GCC, too, but FWIW the problems I was
having were even before GCC added macro expansion tracking. I'm not sure
what GCC does that's so memory intensive during preprocessing.
To some degree it depends on whether the preprocessor is doing what it
is supposed to do at all or just something that kinda-sorta results in
the same thing. One of the biggest performance hits that I have seen in
those preprocessors that do it right (or at least mostly) has to do with
what happens with actual arguments to macros.
Consider what happens here:
#define A 1 B
#define B 2 C
#define C 3 D
#define D 4
0 A 5
This *looks* like four nested "calls," but it isn't. Rather, this is a
stream. Ignoring the whole disabling context/blue paint for the moment,
the replacement process proceeds as follows:
0 A 5
^
0 1 B 5
^
0 1 B 5
^
0 1 2 C 5
^
0 1 2 C 5
^
0 1 2 3 D 5
^
0 1 2 3 D 5
^
0 1 2 3 4 5
^
0 1 2 3 4 5
^
0 1 2 3 4 5
^
In this scenario, once the scan position has advanced past a
preprocessing token, that token is "done". I.e. it is pure output (to
standard output, to a file, to the parser, or whatever).
Now change the scenario slightly:
#define M(x) x
M(0 A 5)
In this case, the argument is scanned for macro replacement as an
independent scan. That whole scan takes place as above, the result of
which is substituted for 'x' in the replacement list, the invocation of
'M' is replaced by that result, the scan position is placed at the first
token of that result, and top-level scanning resumes.
The overall difference here is that the presence of the argument (where
the argument is used in the replacement list as a non # or ## operand)
causes what amounts to a recursive call to the entire scanning for macro
replacement process. The results of which must be placed in a buffer,
not output, in order to perform the substitution.
So, A defined as B defined as C (and so on) is not recursive, but
M(M(M())) usually *is* recursive.
As an immediate testable example, Chaos has a macro that is used to
count scans for macro replacement:
#include
#define A(x) B(x)
#define B(x) C(x)
#define C(x) D(x)
#define D(x) E(x)
#define E(x) x
CHAOS_PP_HALT(
A(
CHAOS_PP_DELVE()
)
)
The result here is 6 (on entry to A, on entry to B, on entry to C, on
entry to D, on entry to E, and finally when top level scanning is resumed).
Change the above to:
#include
#define A(x) B(, x)
#define B(p, x) C(, p##x)
#define C(p, x) D(, p##x)
#define D(p, x) E(, p##x)
#define E(p, x) p##x
CHAOS_PP_HALT(
A(
CHAOS_PP_DELVE()
)
)
Now you get 2 (on entry to A and when top level scanning is resumed).
This occurs because the placemarker concatenation is stopping the
argument from being scanned for macro replacement all the way through
most of the structure. (When this type of thing is scaled up, it can be
*massively* faster than not doing it.)
None of these scenarios here are actually too bad because the recursion
depth is shallow (and by "recursion depth" I mean, of course, the
recursive scan depth) so you aren't getting buffers upon buffers, etc..
However, a library built with performance in mind from the ground up
(i.e. not Boost.Preprocessor and not Chaos in many cases), has to be
built with the above type of stuff in mind.
With regard to tuples versus arrays... The answer is really "neither".
Both are bad choices for a data structure. Usually they are used for
passing multiple auxiliary arguments through some higher-order
algorithm. The right solution there is to simply never pack them
together in that fashion at all:
#define M(s, n, a, b, c) (a + b + c)
CHAOS_PP_EXPR(CHAOS_PP_REPEAT(
5, M, 1, 2, 3
))
(i.e. no TUPLE_ELEM in sight)
You can only get random access to tuple elements for a small number of
elements. So for any largish container (i.e. where performance matters
more), algorithmic processing is usually more efficient with a sequence.
In a library like Chaos, it also makes much more efficient use of the
available recursion steps because things like folds can be unrolled
(because the data structure itself provides the computational horsepower
to process itself--even if the algorithm is n^2 or worse).
But, given a choice between array and tuple as a container (rather than
just a parameter packing mechanism), I would choose tuples. Otherwise,
any algorithmic processing of the data structure is going to require
arithmetic to maintain the size of the array. (If you statically know
the size of the array, then you aren't using it as a container, but
rather just a packing structure.)
Regards,
Paul Mensonides