On 8 Jun 2014 at 18:37, Stephan T. Lavavej wrote:
>> Do also bear in mind that MSVC is not an AST based compiler, so all your benchmarks will look totally different on MSVC anyway. What may have O(N^N) scaling on an AST compiler might well be O(N) on MSVC - in fact, that is exactly why MSVC isn't AST based, as internal Microsoft code is so complex it is uncompilable by an AST based compiler.
> 1. That's not why. The reason is that manipulating tokens seemed like a good idea 20-30 years ago, when that was sufficient to deal with C and C++ (before templates, variadic templates, decltype, constexpr, expression SFINAE, etc.) and when computers were far smaller and slower.
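As an aside for anyone following along who hasn't met expression SFINAE: here is a minimal toy sketch (mine, not Stephan's) of the kind of construct which obliges a compiler to build and type check an expression during overload resolution, rather than just stream tokens:

#include <cstddef>
#include <vector>

// The template overload participates only if t.size() is a well formed
// expression, which the compiler can only decide by actually checking
// that expression against T.
template <class T>
auto size_of(const T& t) -> decltype(t.size()) { return t.size(); }

// Fallback for types without a size() member.
std::size_t size_of(...) { return 0; }

int main()
{
    std::vector<int> v{1, 2, 3};
    return static_cast<int>(size_of(v) + size_of(42));  // 3 + 0
}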
For a good chunk of MSVC's (and C++'s) userbase, template use never really exceeds cookie cutter use, e.g. template<class T> class vector. And for those guys - which includes a good chunk of Microsoft itself - MSVC's approach makes perfect sense. It is, after all, blindingly quick and uses virtually no RAM, relatively speaking.

By "internal Microsoft code is so complex it is uncompilable by an AST based compiler" I was referring to a conversation I had with Gaby and James McNellis at C++Now, with Chandler occasionally jumping in to confirm the insanity :). They were telling me about the problems of getting 18k line long functions to compile with *any* AST based compiler, and that a unique problem facing MSVC is that a ton of Microsoft internal code is automatically generated under the expectation that MSVC doesn't mind really long statements, such as an if() statement spanning a few thousand lines. One then faces the choice of either retrofitting those automated generators to produce less insane output, or keeping MSVC non-AST based for non-template, non-constexpr code. I was told that, to date, the latter has been chosen.
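To give a flavour, here is a made up toy sketch of the shape being described (absolutely nothing like the real generated code, which I have never seen), just vastly shorter:

// Machine generated style: one flat boolean if() built from many clauses.
// The real generators emit clauses by the thousand in a single statement.
bool is_supported(int id)
{
    if (id == 0x0001 || id == 0x0002 || id == 0x0003 || id == 0x0004
        || id == 0x0005 || id == 0x0006 || id == 0x0007 || id == 0x0008
        /* ...imagine a few thousand more machine generated clauses... */
        || id == 0x0FFF)
        return true;
    return false;
}

Which is presumably why such code stresses an AST based front end: the entire condition becomes one enormous expression tree that must be held in memory before anything can be done with it.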
> 2. MS has a whole bunch of code, but only a tiny fraction approaches the complexity of Boost (in terms of "doing tricky things with the language"; stuff that's just loops and pointers is not complicated by this metric, no matter how twisty). Our STL implementation is usually the high water mark for complexity, and while we do tricky stuff, I know that Boost is trickier.
Oh sure. I didn't make clear that I meant things like a single boolean if() containing thousands of tests. I meant sheer size complexity, not language complexity.
> 3. I am not really qualified to talk about compiler tech, but I cannot possibly understand how keeping an AST around would change the *asymptotic* complexity of translation (it introduces constant factors which are an issue in practice). The AST has a strict superset of the information available by looking at a few tokens at a time. By definition, any speedups with a token-based approach that can't be obtained through an AST would be due to nonconformance.
The problem is RAM consumption, because you need to create an AST for the item being compiled before you can compile it. Louis' empirical tests of candidate MPL11 (now Hana) implementation techniques did a lot of time and space comparisons of the various metaprogramming approaches. Both clang and GCC are AST based, so their space consumption graphs looked not dissimilar. MSVC has taken considerable effort to use sane amounts of RAM under very trying conditions, so I would suspect, without evidence, that MSVC would produce *very* different graphs - some worse, but probably many better. What is O(N^2) on clang or GCC I could see being O(N) on MSVC, for example, simply due to its lack of two phase lookup (a toy illustration of which is in the PS below).

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
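PS: since two phase lookup came up above, a minimal toy illustration (mine, nothing to do with Louis' benchmarks) of the extra work it implies. A conforming compiler must look up non-dependent names when the template is defined (phase one) and dependent names again at each instantiation (phase two), so it has to keep a checked representation of the template body around; a purely token based compiler can get away with re-expanding the tokens at instantiation time:

void f(int);        // non-dependent: must be findable when g() is parsed

template <class T>
void g(T t)
{
    f(1);           // phase one: looked up at template definition time
    h(t);           // dependent on T: looked up at instantiation time via ADL
}

struct X {};
void h(X);          // only found when g<X> is instantiated

int main()
{
    g(X{});
}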