On Fri, Sep 11, 2015 at 4:04 AM, Paul Mensonides
On 9/10/2015 6:59 PM, Matt Calabrese wrote:
Yeah, I have to do that now with GCC, too, but FWIW the problems I was
having were even before GCC added macro expansion tracking. I'm not sure what GCC does that's so memory intensive during preprocessing.
To some degree it depends on whether the preprocessor is doing what it is supposed to do at all or just something that kinda-sorta results in the same thing. One of the biggest performance hits that I have seen in those preprocessors that do it right (or at least mostly) has to do with what happens with actual arguments to macros.
Consider what happens here:
#define A 1 B #define B 2 C #define C 3 D #define D 4
0 A 5
This *looks* like four nested "calls," but it isn't. Rather, this is a stream. Ignoring the whole disabling context/blue paint for the moment, the replacement process proceeds as follows:
0 A 5 ^ 0 1 B 5 ^ 0 1 B 5 ^ 0 1 2 C 5 ^ 0 1 2 C 5 ^ 0 1 2 3 D 5 ^ 0 1 2 3 D 5 ^ 0 1 2 3 4 5 ^ 0 1 2 3 4 5 ^ 0 1 2 3 4 5 ^
In this scenario, once the scan position has advanced past a preprocessing token, that token is "done". I.e. it is pure output (to standard output, to a file, to the parser, or whatever).
Now change the scenario slightly:
#define M(x) x
M(0 A 5)
In this case, the argument is scanned for macro replacement as an independent scan. That whole scan takes place as above, the result of which is substituted for 'x' in the replacement list, the invocation of 'M' is replaced by that result, the scan position is placed at the first token of that result, and top-level scanning resumes.
The overall difference here is that the presence of the argument (where the argument is used in the replacement list as a non # or ## operand) causes what amounts to a recursive call to the entire scanning for macro replacement process. The results of which must be placed in a buffer, not output, in order to perform the substitution.
So, A defined as B defined as C (and so on) is not recursive, but M(M(M())) usually *is* recursive.
As an immediate testable example, Chaos has a macro that is used to count scans for macro replacement:
#include
#define A(x) B(x) #define B(x) C(x) #define C(x) D(x) #define D(x) E(x) #define E(x) x
CHAOS_PP_HALT( A( CHAOS_PP_DELVE() ) )
The result here is 6 (on entry to A, on entry to B, on entry to C, on entry to D, on entry to E, and finally when top level scanning is resumed).
Change the above to:
#include
#define A(x) B(, x) #define B(p, x) C(, p##x) #define C(p, x) D(, p##x) #define D(p, x) E(, p##x) #define E(p, x) p##x
CHAOS_PP_HALT( A( CHAOS_PP_DELVE() ) )
Now you get 2 (on entry to A and when top level scanning is resumed). This occurs because the placemarker concatenation is stopping the argument from being scanned for macro replacement all the way through most of the structure. (When this type of thing is scaled up, it can be *massively* faster than not doing it.)
None of these scenarios here are actually too bad because the recursion depth is shallow (and by "recursion depth" I mean, of course, the recursive scan depth) so you aren't getting buffers upon buffers, etc.. However, a library built with performance in mind from the ground up (i.e. not Boost.Preprocessor and not Chaos in many cases), has to be built with the above type of stuff in mind.
!
Awesome write-up. This should really be explained somewhere in the
Boost.Preprocessor docs somewhere, just because I imagine that most people,
myself included, really only have a vague understanding of how the
preprocessor is /supposed/ to work. Perhaps it's out of the scope of the
library, as people should just already know it as a part of the language,
but that seems to me to be even less true regarding the preprocessor than
it is regarding the more complicated template rules.
On Fri, Sep 11, 2015 at 4:04 AM, Paul Mensonides
With regard to tuples versus arrays... The answer is really "neither". Both are bad choices for a data structure. Usually they are used for passing multiple auxiliary arguments through some higher-order algorithm. The right solution there is to simply never pack them together in that fashion at all:
#define M(s, n, a, b, c) (a + b + c)
CHAOS_PP_EXPR(CHAOS_PP_REPEAT( 5, M, 1, 2, 3 ))
(i.e. no TUPLE_ELEM in sight)
You can only get random access to tuple elements for a small number of elements. So for any largish container (i.e. where performance matters more), algorithmic processing is usually more efficient with a sequence. In a library like Chaos, it also makes much more efficient use of the available recursion steps because things like folds can be unrolled (because the data structure itself provides the computational horsepower to process itself--even if the algorithm is n^2 or worse).
Okay, that goes along with what I generally do anyway: convert to and deal with sequences behind the scenes. The only thing I find tuple-like containers useful for are user-input. Speaking of Chaos, I know I try to bring this up whenever you make an appearance, but can you please put Chaos up for review? I don't know the status of which compilers can handle it without workarounds, but as long as a compliant compiler can handle it, that's good enough for Boost, and you can just choose to not support broken implementations. One problem that I run into with Boost.Preprocessor is that since I can't "recurse" with BOOST_PP_SEQ_FOR_EACH I have to frequently use some other repetition construct instead, and I'm sure that I'm paying for that decision. This is particularly a problem if the macro I'm developing is to be used by other people who are also writing preprocessor code, meaning that if I choose to use something like BOOST_PP_SEQ_FOR_EACH internally, I'm effectively restricting where people can invoke my macro. IIRC, Chaos allows for all of these looping-like constructs to be used in such a manner, and I think that's really important for non-trivial uses of the preprocessor library. Even as Boost.Preprocessor persists, having Chaos in Boost would be a great addition. -- -Matt Calabrese