On 9/11/2015 7:04 AM, Paul Mensonides wrote:
On 9/10/2015 6:59 PM, Matt Calabrese wrote:
Yeah, I have to do that now with GCC, too, but FWIW the problems I was having were even before GCC added macro expansion tracking. I'm not sure what GCC does that's so memory intensive during preprocessing.
To some degree it depends on whether the preprocessor is doing what it is supposed to do at all or just something that kinda-sorta results in the same thing. One of the biggest performance hits that I have seen in those preprocessors that do it right (or at least mostly) has to do with what happens with actual arguments to macros.
Consider what happens here:
#define A 1 B #define B 2 C #define C 3 D #define D 4
0 A 5
This *looks* like four nested "calls," but it isn't. Rather, this is a stream. Ignoring the whole disabling context/blue paint for the moment, the replacement process proceeds as follows:
0 A 5 ^ 0 1 B 5 ^ 0 1 B 5 ^ 0 1 2 C 5 ^ 0 1 2 C 5 ^ 0 1 2 3 D 5 ^ 0 1 2 3 D 5 ^ 0 1 2 3 4 5 ^ 0 1 2 3 4 5 ^ 0 1 2 3 4 5 ^
In this scenario, once the scan position has advanced past a preprocessing token, that token is "done". I.e. it is pure output (to standard output, to a file, to the parser, or whatever).
Now change the scenario slightly:
#define M(x) x
M(0 A 5)
In this case, the argument is scanned for macro replacement as an independent scan. That whole scan takes place as above, the result of which is substituted for 'x' in the replacement list, the invocation of 'M' is replaced by that result, the scan position is placed at the first token of that result, and top-level scanning resumes.
The overall difference here is that the presence of the argument (where the argument is used in the replacement list as a non # or ## operand) causes what amounts to a recursive call to the entire scanning for macro replacement process. The results of which must be placed in a buffer, not output, in order to perform the substitution.
So, A defined as B defined as C (and so on) is not recursive, but M(M(M())) usually *is* recursive.
As an immediate testable example, Chaos has a macro that is used to count scans for macro replacement:
#include
#define A(x) B(x) #define B(x) C(x) #define C(x) D(x) #define D(x) E(x) #define E(x) x
CHAOS_PP_HALT( A( CHAOS_PP_DELVE() ) )
The result here is 6 (on entry to A, on entry to B, on entry to C, on entry to D, on entry to E, and finally when top level scanning is resumed).
Change the above to:
#include
#define A(x) B(, x) #define B(p, x) C(, p##x) #define C(p, x) D(, p##x) #define D(p, x) E(, p##x) #define E(p, x) p##x
CHAOS_PP_HALT( A( CHAOS_PP_DELVE() ) )
Now you get 2 (on entry to A and when top level scanning is resumed). This occurs because the placemarker concatenation is stopping the argument from being scanned for macro replacement all the way through most of the structure. (When this type of thing is scaled up, it can be *massively* faster than not doing it.)
None of these scenarios here are actually too bad because the recursion depth is shallow (and by "recursion depth" I mean, of course, the recursive scan depth) so you aren't getting buffers upon buffers, etc.. However, a library built with performance in mind from the ground up (i.e. not Boost.Preprocessor and not Chaos in many cases), has to be built with the above type of stuff in mind.
I recall your explaining previously how using the concatenation construct ('p##x'), as a means of shortcutting rescanning, when designing a preprocessing library increases preprocessor efficiency. I still think clarity of code, even library internals, is often more important than saving CPU cycles but everyone has their own priorities.
With regard to tuples versus arrays... The answer is really "neither". Both are bad choices for a data structure. Usually they are used for passing multiple auxiliary arguments through some higher-order algorithm. The right solution there is to simply never pack them together in that fashion at all:
#define M(s, n, a, b, c) (a + b + c)
CHAOS_PP_EXPR(CHAOS_PP_REPEAT( 5, M, 1, 2, 3 ))
(i.e. no TUPLE_ELEM in sight)
You can only get random access to tuple elements for a small number of elements. So for any largish container (i.e. where performance matters more), algorithmic processing is usually more efficient with a sequence. In a library like Chaos, it also makes much more efficient use of the available recursion steps because things like folds can be unrolled (because the data structure itself provides the computational horsepower to process itself--even if the algorithm is n^2 or worse).
I am recommending seqs from now on for any fairly large composite.
But, given a choice between array and tuple as a container (rather than just a parameter packing mechanism), I would choose tuples. Otherwise, any algorithmic processing of the data structure is going to require arithmetic to maintain the size of the array. (If you statically know the size of the array, then you aren't using it as a container, but rather just a packing structure.)
Good to know. The main objection to tuple rather than array is that a tuple can't have 0 elements. I plan to alleviate that in VMD, which supports "emptiness", by allowing tuples and seqs to start or end as "emptiness". Of course testing for "emptiness" using variadic macros is very slightly flawed as you know, since you wrote that code, but since VMD supports "emptiness" I will add it to that library and then those who want to use tuples or seqs as starting ( or ending with ) 0 elements can do so.