[preprocessing] Feedback requested on a C99 preprocessor written in pure universal Python
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:

https://github.com/ned14/pcpp

It passes the C11 standard's list of "tricky" preprocessor expansions and the mcpp test suite. It has no issue correctly handling the preprocessor metaprogramming I've thrown at it from my Boost libraries. I'm not going to claim it can handle Boost.Preprocessor or especially Boost.VMD and I've tried neither, but pull requests with source code fixes adding support for those (with accompanying unit tests) would be welcome.

My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.

(and yes it is also a Python library as well as a command line tool, you can find API reference docs at https://ned14.github.io/pcpp/)

I look forward to any feedback and my thanks in advance for it.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
On 3/6/2017 5:45 AM, Niall Douglas via Boost wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It would be nice, for the purpose of testing with Boost PP and Boost VMD, if your preprocessor could somehow be plugged in to one of the compilers Boost supports, with VC++ being the most obvious choice because its preprocessor is not C++ standard conforming.

There may be some other way of directly creating a toolset with your preprocessor in order to test it in Boost, but I am not knowledgeable enough with Boost Build to know how to do this. If you, or somebody else, could do this it would surely be welcome by me and probably by yourself.
It passes the C11 standard's list of "tricky" preprocessor expansions and the mcpp test suite. It has no issue correctly handling the preprocessor metaprogramming I've thrown at it from my Boost libraries. I'm not going to claim it can handle Boost.Preprocessor or especially Boost.VMD and I've tried neither, but pull requests with source code fixes adding support for those (with accompanying unit tests) would be welcome.
My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.
(and yes it is also a Python library as well as a command line tool, you can find API reference docs at https://ned14.github.io/pcpp/)
I look forward to any feedback and my thanks in advance for it.
Niall
On 06/03/2017 19:10, Edward Diener via Boost wrote:
On 3/6/2017 5:45 AM, Niall Douglas via Boost wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It would be nice, for the purpose of testing with Boost PP and Boost VMD, if your preprocessor could somehow be plugged in to one of the compilers Boost supports, with VC++ being the most obvious choice because its preprocessor is not C++ standard conforming.
It's pretty straightforward in theory. pcpp can consume from stdin or a file, and can output to stdout or a file, so it's easy to insert into a sequence using the pipe operator (which works fine on Windows too).

An alternative is to simply feed its output to any compiler: pcpp marks up the output with # lineno filepath directives exactly as a normal preprocessor does, so the compiler can track the original source files. Those should be passed through by any preprocessor unchanged, including MSVC's.
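(For illustration, this is roughly what that marked-up output looks like; the file names, line numbers and code are made up rather than copied from a real pcpp run.) Given a main.cpp such as

  #include "mylib/config.hpp"   /* defines MYLIB_VERSION as 100 */
  int version = MYLIB_VERSION;

the preprocessed output would look roughly like

  # 1 "main.cpp"
  # 1 "mylib/config.hpp"
  # 2 "main.cpp"
  int version = 100;

so a compiler consuming it maps diagnostics back to main.cpp and config.hpp rather than to the combined file.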
There may be some other way of directly creating a toolset with your preprocessor in order to test it in Boost, but I am not knowledgeable enough with Boost Build to know how to do this. If you, or somebody else, could do this it would surely be welcome by me and probably by yourself.
I would be no more skilled than you at persuading Boost.Build to do this. Even in cmake, it's tricky to inject a custom command inheriting the current compiler flags i.e. all the -D's, -I's etc.
(and yes it is also a Python library as well as a command line tool, you can find API reference docs at https://ned14.github.io/pcpp/)
Another big use case could be for debugging complex preprocessing, because it's very easy to hook in and introspect preprocessing as it is being executed. But as I've mentioned, I've not tested it with really complex preprocessor metaprogramming; getting it this far has already taken me a month, and I suspect my unemployment will be ending soon, so my free time will return to nil.

Still, it was a nice diversion away from C++ and it has refreshed my Python skills very nicely.

Niall
On 3/6/2017 3:46 PM, Niall Douglas via Boost wrote:
On 06/03/2017 19:10, Edward Diener via Boost wrote:
On 3/6/2017 5:45 AM, Niall Douglas via Boost wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It would be nice, for the purpose of testing with Boost PP and Boost VMD, if your preprocessor could somehow be plugged in to one of the compilers Boost supports, with VC++ being the most obvious choice because its preprocessor is not C++ standard conforming.
It's pretty straightforward in theory. pcpp can consume from stdin or a file, and can output to stdout or a file, so it's easy to insert into a sequence using the pipe operator (which works fine on Windows too).
An alternative is to simply feed its output to any compiler, pcpp marks up the output with # lineno filepath exactly the same as a normal preprocessor so the compiler can track the original source files. Those should be passed through by any preprocessor unchanged, including MSVC's.
How do I identify pcpp as the preprocessor in Boost PP or Boost VMD code? In other words, does pcpp predefine some macro(s) that identify itself and/or its level of C/C++ preprocessor conformance? If it does, I can check for this in the Boost PP configuration and set the level of C++ standards conformance and variadic macro support in the Boost PP configuration file. This would better enable Boost PP/Boost VMD to work with pcpp.
There may be some other way of directly creating a toolset with your preprocessor in order to test it in Boost, but I am not knowledgeable enough with Boost Build to know how to do this. If you, or somebody else, could do this it would surely be welcome by me and probably by yourself.
I would be no more skilled than you at persuading Boost.Build to do this. Even in cmake, it's tricky to inject a custom command inheriting the current compiler flags i.e. all the -D's, -I's etc.
(and yes it is also a Python library as well as a command line tool, you can find API reference docs at https://ned14.github.io/pcpp/)
Another big use case could be for debugging complex preprocessing because it's very easy to hook in and introspect preprocessing as it is being executed. But as I've mentioned, I've not tested it with really complex preprocessor metaprogramming, getting it this far has already taken me a month and I suspect my unemployment will be ending soon, so my free time will return to nil. Still, it was a nice diversion away from C++ and it has refreshed my Python skills very nicely.
Niall
On Mon, Mar 6, 2017 at 11:46 PM, Niall Douglas via Boost wrote:
On 06/03/2017 19:10, Edward Diener via Boost wrote:
On 3/6/2017 5:45 AM, Niall Douglas via Boost wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It would be nice, for the purpose of testing with Boost PP and Boost VMD, if your preprocessor could somehow be plugged in to one of the compilers Boost supports, with VC++ being the most obvious choice because its preprocessor is not C++ standard conforming.
It's pretty straightforward in theory. pcpp can consume from stdin or a file, and can output to stdout or a file, so it's easy to insert into a sequence using the pipe operator (which works fine on Windows too).
Given that preprocessor checks are often used for compiler workarounds, and pcpp is not a full C++ frontend, one would have to make sure pcpp defines the same set of predefined macros the compiler does. In the particular case of MSVC that would make libraries like Boost.PP and Boost.VMD treat pcpp the same way they do MSVC, which is probably suboptimal, if at all functional. I guess, for such a tandem to be workable, pcpp has to define its own predefined macros, and PP and VMD have to test for them before they test other compiler-specific macros.
On 3/6/2017 6:56 PM, Andrey Semashev via Boost wrote:
On Mon, Mar 6, 2017 at 11:46 PM, Niall Douglas via Boost wrote:
On 06/03/2017 19:10, Edward Diener via Boost wrote:
On 3/6/2017 5:45 AM, Niall Douglas via Boost wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It would be nice, for the purpose of testing with Boost PP and Boost VMD, if your preprocessor could somehow be plugged in to one of the compilers Boost supports, with VC++ being the most obvious choice because its preprocessor is not C++ standard conforming.
It's pretty straightforward in theory. pcpp can consume from stdin or a file, and can output to stdout or a file, so it's easy to insert into a sequence using the pipe operator (which works fine on Windows too).
Given that preprocessor checks are often used for compiler workarounds, and pcpp is not a full C++ frontend, one would have to make sure pcpp defines the same set of predefined macros the compiler does. In the particular case of MSVC that would make libraries like Boost.PP and Boost.VMD treat pcpp the same way they do MSVC, which is probably suboptimal, if at all functional. I guess, for such tandem to be workable, pcpp has to define its own predefined macros, and PP and VMD have to test it before they test other compiler-specific macros.
Exactly! Still, merely to run the Boost PP and Boost VMD tests, which are decent tests of hardcore C++ standard preprocessor conformance, pcpp could minimally define __cplusplus >= 201103L or __STDC_VERSION__ >= 199901L, without necessarily identifying itself otherwise, and Boost PP/Boost VMD will treat the preprocessor as strictly C++ standard conformant with variadic macro support. But what you write above is certainly correct in general, in order to test other Boost libraries and end-users' code with pcpp as the preprocessor for some other compiler.

BTW Boost already has an almost completely conformant C++ preprocessor in Boost Wave. I could not have developed VMD or helped support Boost PP without its ability to show correct macro expansion. It has been absolutely invaluable in that respect.
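(A minimal sketch of the kind of guard Edward's suggestion would satisfy; MY_PP_VARIADICS is a hypothetical name, not actual Boost.PP configuration code:)

  /* Illustrative only: enable variadic macro support when the preprocessor
     advertises C++11 or C99 through the standard version macros. */
  #if (defined(__cplusplus) && __cplusplus >= 201103L) || \
      (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #  define MY_PP_VARIADICS 1
  #else
  #  define MY_PP_VARIADICS 0
  #endif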
Given that preprocessor checks are often used for compiler workarounds, and pcpp is not a full C++ frontend, one would have to make sure pcpp defines the same set of predefined macros the compiler does. In the particular case of MSVC that would make libraries like Boost.PP and Boost.VMD treat pcpp the same way they do MSVC, which is probably suboptimal, if at all functional. I guess, for such tandem to be workable, pcpp has to define its own predefined macros, and PP and VMD have to test it before they test other compiler-specific macros.
You've struck exactly at the main use case for pcpp, and why I didn't start from Wave or an existing preprocessor but instead reinvented the wheel, and in Python, not C. I am specifically implementing a "partially executing" pre-preprocessor which can be programmatically and dynamically instructed to transform a set of input files with preprocessing commands into other files with some of those preprocessor commands executed, expanded or replaced, and some passed through.

pcpp by default acts as a straight preprocessor, but it can also be told to pass through #if logic it can't fully execute due to unknowns, or to partially execute #if logic. It can be told to pass through #define and #undef but also execute them (or not) on a per-macro basis. It would be easy enough to tell it to not execute any preprocessing commands except #include and include guards, for example.

I'm sure you can see the big benefits to pregenerating canned preprocessed files such that #including a Boost library is much faster than before, because most of the preprocessing work is already done. Right now generating those is tedious work using hacky scripts on source files, with splicing metadata injected via comments etc., which is brittle. pcpp will allow for a FAR more robust solution which can be safely left to a CI to run per commit if desired.

pcpp is basically Facebook's Warp (https://github.com/facebookarchive/warp) but done much more flexibly and usefully (no offence intended to Warp's developers, but Warp isn't very useful outside a very limited use case).

My ideal end goal is for a download page for a Boost library to provide a set of tick boxes and drop-down menus that let a user pre-preprocess a Boost library into a custom "drop in and go" single file edition for their particular use case, just like you can with, say, jQuery downloads. Again, choosing Python instead of C makes that safe and secure. I don't know if I'll have the time to reach that, but I'm an awful lot closer now than I was a month ago.
Still merely to run the Boost PP and Boost VMD tests, which are decent tests for much hardcore C++ standard preprocessor conformance, pcpp could minimally define __cplusplus >= 201103L or __STDC_VERSION__ >= 199901L, without necessarily identifying itself otherwise, and Boost PP/Boost VMD will treat the preprocessor as strictly C++ standard conformant with variadic macro support. But what you write above is certainly correct in general, in order to test other Boost libraries and end-user's code, with pcpp as the preprocessor for some other compiler.
BTW Boost already has an almost completely conformant C++ preprocessor in Boost Wave. I could not have developed VMD or helped support Boost PP without its ability to show correct macro expansion. It has been absolutely invaluable in that respect.
During the use case needs analysis for this mini-project, which I conducted on Reddit (https://www.reddit.com/r/cpp/comments/5ss6cv/any_interest_in_a_python_c99_preprocessor/?st=izzekruw&sh=5e1177c9), I discounted Wave as I felt its implementation was not flexible enough for this use case of dynamic rewriting. Also, an implementation written in C or C++ cannot be dynamically driven by a build process the way a Python implementation can, e.g. cmake can write a Python script into a file and inject that into pcpp.

I am also *extremely* sure that I could not have developed a conforming C preprocessor in Boost.Spirit in just eighty hours of work. Correct recursive function macro expansion turned out to be a real head scratcher; I ended up solving it using a token colouring approach.

Python is so much more of a productivity language and ecosystem than C++: you can write high performance, high quality code so much quicker because the productivity support in the ecosystem is vastly superior. I really wish WG21 and the Standard C++ Foundation approached C++ evolution like the Python Software Foundation does, but my views on that are very well understood by now.

Niall
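(An aside on the token colouring point, for readers of the archive: the rule a preprocessor has to model is that a macro name is never expanded again while its own expansion is in progress, which implementations track by "painting" such tokens. The smallest illustration of that rule:)

  /* FOO expands to the single token FOO rather than recursing forever:
     during the expansion of FOO the name FOO is marked as already
     expanded, so rescanning does not replace it again. */
  #define FOO FOO
  FOO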
On 3/7/2017 5:55 AM, Niall Douglas via Boost wrote:
Given that preprocessor checks are often used for compiler workarounds, and pcpp is not a full C++ frontend, one would have to make sure pcpp defines the same set of predefined macros the compiler does. In the particular case of MSVC that would make libraries like Boost.PP and Boost.VMD treat pcpp the same way they do MSVC, which is probably suboptimal, if at all functional. I guess, for such tandem to be workable, pcpp has to define its own predefined macros, and PP and VMD have to test it before they test other compiler-specific macros.
You've struck exactly at the main use case for pcpp, and why I didn't start from Wave or an existing preprocessor but instead reinvented the wheel and in Python, not C. I specifically am implementing a "partially executing" pre-preprocessor which can be programmatically and dynamically instructed to transform a set of input files with preprocessing commands into other files with some of those preprocessor commands executed or expanded or replaced, and some passed through.
pcpp by default acts as a straight preprocessor, but it can also be told to pass through #if logic it can't fully execute due to unknowns, or partially execute #if logic. It can be told to pass through #define and #undef but also execute them (or not) on a per-macro basis. It would be easy enough to tell it to not execute any preprocessing commands except #include and include guards for example.
I'm sure you can see the big benefits to pregenerating canned preprocessed files such that #including a Boost library is much faster than before because most of the preprocessing work is already done. Right now generating those is tedious work using hacky scripts on source files with splicing metadata injected via comments etc which are brittle. pcpp will allow for a FAR more robust solution which can be safely left to a CI to run per commit if desired.
pcpp is basically Facebook's Warp (https://github.com/facebookarchive/warp) but done much more flexibly and usefully (no offence intended to Warp's developers, but Warp isn't very useful outside a very limited use case).
My ideal end goal is for a download page for a Boost library to provide a set of tick boxes and drop down menus that let a user pre-preprocess a Boost library into a custom "drop in and go" single file edition for their particular use case, just like you can with say jQuery downloads. Again choosing Python instead of C makes that safe and secure. I don't know if I'll have the time to reach that, but I'm an awful lot closer now than I was a month ago.
The practical problem with this is that source files with preprocessor directives often depend on the compiler being used, with its predefined macros, to generate the correct output.
snip...
Niall
pcpp by default acts as a straight preprocessor, but it can also be told to pass through #if logic it can't fully execute due to unknowns, or partially execute #if logic. It can be told to pass through #define and #undef but also execute them (or not) on a per-macro basis. It would be easy enough to tell it to not execute any preprocessing commands except #include and include guards for example.
The practical problem with this is that source files with preprocessor directives often depend on the compiler being used, with its predefined macros, to generate the correct output.
Do note pcpp can pass through #if logic which uses undefined macros. So, as compiler-specific macros will be undefined, all #if logic relating to them passes through. The output is therefore still portable across compilers.

Don't get me wrong, there is still a small amount of hand tuning involved, so a human does need to grok the output and make sure it's coming out okay and, if not, add another command line argument to force things. But so far, at least for Outcome, it is coming out pretty much right first time (though this could entirely be down to how I write my preprocessor, and Outcome, being all C++ 14, doesn't support broken compilers except for MSVC, so my use case is much simpler than most).

Niall
On 3/7/2017 11:16 AM, Niall Douglas via Boost wrote:
pcpp by default acts as a straight preprocessor, but it can also be told to pass through #if logic it can't fully execute due to unknowns, or partially execute #if logic. It can be told to pass through #define and #undef but also execute them (or not) on a per-macro basis. It would be easy enough to tell it to not execute any preprocessing commands except #include and include guards for example.
The practical problem with this is that source files with preprocessor directives often depend on the compiler being used, with its predefined macros, to generate the correct output.
Do note pcpp can pass through #if logic which uses undefined macros. So as compiler specific macros will be undefined, all #if logic relating to them passes through. The output is therefore still portable across compilers.
Passing through all #if logic cannot be right!

#if SOME_PREDEFINED_MACRO >= some_value
some_code
#endif

#if SOME_PREDEFINED_MACRO < some_value
some_other_code
#endif

If both "some_code" and "some_other_code" are passed through I doubt the output will be correct.

Because of quirks in compiler preprocessors, especially VC++, a great deal of macro logic in Boost PP and Boost VMD is based on identifying compilers and their levels of C/C++ standards compliance, which is done by compiler predefined macros. I would imagine that this could be extended to others writing their own macros, whether they use Boost PP/VMD or not. Of course various cross-platform header files also depend on such information.

I am not trying to minimize your effort in any way in writing pcpp. I am just saying that using it as the preprocessor front-end for various compilers is more complicated than I believe you think. Unless pcpp can create the same predefined macros which the backend compiler's preprocessor creates, I doubt the output can be reliable in many situations.
Don't get me wrong, there is still a small amount of hand tuning involved, so a human does need to grok the output and make sure it's coming out okay and if not, add another command line argument to force things. But so far, at least for Outcome, it is coming out pretty much right first time (though this entirely could be how I write my preprocessor, and Outcome being all C++ 14 doesn't support broken compilers except for MSVC. So my use case is much simpler than most).
Hi,
Do note pcpp can pass through #if logic which uses undefined macros. So as compiler specific macros will be undefined, all #if logic relating to them passes through. The output is therefore still portable across compilers.
Passing through all #if logic cannot be right !
#if SOME_PREDEFINED_MACRO >= some_value
some_code
#endif
#if SOME_PREDEFINED_MACRO < some_value
some_other_code
#endif
Do you care to elaborate? If pcpp does not touch the lines above (i.e. passes through everything including the #if directives), I can hardly see a problem for the next-level preprocessor of the original compiler.
If both "some_code" and "some_other_code" are passed through I doubt the output will be correct.
Because of quirks in compiler preprocessors, especially VC++, a great deal of macro logic in Boost PP and Boost VMD is based on identifying compilers and their levels of C/C++ standards compliance, which is done by compiler predefined macros. I would imagine that this could be extended to others writing their own macros, whether they use Boost PP/VMD or not. Of course various cross-platform header files also depend on such information.
I am not trying to minimize your effort in any way in writing pcpp. I am just saying that to use it as the preprocessor front-end for various compilers is more complicated than I believe you think. Unless pcpp can create the same predefined macros which the backend compiler's preprocessor creates I doubt if the output can be reliable in many situations.
Don't get me wrong, there is still a small amount of hand tuning involved, so a human does need to grok the output and make sure it's coming out okay and if not, add another command line argument to force things. But so far, at least for Outcome, it is coming out pretty much right first time (though this entirely could be how I write my preprocessor, and Outcome being all C++ 14 doesn't support broken compilers except for MSVC. So my use case is much simpler than most).
Do note pcpp can pass through #if logic which uses undefined macros. So as compiler specific macros will be undefined, all #if logic relating to them passes through. The output is therefore still portable across compilers.
Passing through all #if logic cannot be right !
#if SOME_PREDEFINED_MACRO >= some_value
some_code
#endif
#if SOME_PREDEFINED_MACRO < some_value
some_other_code
#endif
If both "some_code" and "some_other_code" are passed through I doubt the output will be correct.
Edward, it passes through the #if statements and their associated #endif etc., not just the clauses they wrap. So basically, if it can't execute an #if because the expression contains an undefined macro, it passes through the entire thing, which can be executed by a later preprocessor. Does this make sense now?
I am not trying to minimize your effort in any way in writing pcpp. I am just saying that to use it as the preprocessor front-end for various compilers is more complicated than I believe you think. Unless pcpp can create the same predefined macros which the backend compiler's preprocessor creates I doubt if the output can be reliable in many situations.
It only part-executes the preprocessor commands, leaving behind the stuff you tell it to.

There is an example of the pass through on the front page of the GitHub repo if that helps. It shows the partial execution and pass through of preprocessor commands.

Niall
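(A minimal sketch of the kind of pass-through being described; it is illustrative only, not the actual example from the pcpp README, and the header and macro names are invented.) Given an input such as

  #include "mylib/config.hpp"
  #if defined(_MSC_VER) && _MSC_VER < 1900
  #  define MYLIB_NO_CONSTEXPR 1
  #endif

a pass-through run that knows nothing about _MSC_VER would emit roughly

  /* ...contents of mylib/config.hpp inlined here, its include guard
     already executed and removed... */
  #if defined(_MSC_VER) && _MSC_VER < 1900
  #  define MYLIB_NO_CONSTEXPR 1
  #endif

i.e. everything pcpp could fully evaluate has been executed, while the #if block and the #define inside it survive verbatim for the real compiler's preprocessor to evaluate later.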
On 3/7/2017 12:19 PM, Niall Douglas via Boost wrote:
Do note pcpp can pass through #if logic which uses undefined macros. So as compiler specific macros will be undefined, all #if logic relating to them passes through. The output is therefore still portable across compilers.
Passing through all #if logic cannot be right !
#if SOME_PREDEFINED_MACRO >= some_value
some_code
#endif
#if SOME_PREDEFINED_MACRO < some_value
some_other_code
#endif
If both "some_code" and "some_other_code" are passed through I doubt the output will be correct.
Edward, it passes through the #if statements and their associated #endif etc, not just the clauses they wrap.
So basically, if it can't execute an #if because the expression contains an undefined macro, it passes through the entire thing, which can be executed by a later preprocessor.
Does this make sense now?
I am not trying to minimize your effort in any way in writing pcpp. I am just saying that to use it as the preprocessor front-end for various compilers is more complicated than I believe you think. Unless pcpp can create the same predefined macros which the backend compiler's preprocessor creates I doubt if the output can be reliable in many situations.
It only part-executes the preprocessor commands, leaving behind the stuff you tell it.
There is an example of the pass through on the front page of the github if that helps. It shows the partial execution and pass through of preprocessor commands.
I did not understand what you meant by passthrough. In that case pcpp only serves as a first-level preprocessor and will very often need a second-level preprocessor to fully preprocess the code. OK, I now understand that.
Niall
I discounted Wave as I felt its implementation was not flexible enough for this use case of dynamic rewriting.
Wave has a full command line based preprocessor ready to use (see https://github.com/boostorg/wave/tree/develop/tool). No work on your end should have been necessary at all.
Also an implementation written in C or C++ cannot be dynamically driven by a build process like a Python implementation can e.g. cmake can write Python script into a file and inject that into pcpp.
Sorry, I don't understand what this means. Could you elaborate, please?
I am also *extremely* sure that I could not have developed a conforming C preprocessor in Boost.Spirit in just eighty hours of work.
I don't think your preprocessor is conforming. I had a quick look at it today. From the tests I performed I came away with the impression that while a large amount of work has been done, it still requires a lot of work in order to turn it into a really conforming preprocessor.

I just ran the tests of the Wave test suite and discovered a diverse set of problems:

- reporting of line number information,
- macro preprocessing problems (where things get rescanned either too often or not often enough, depending on the context),
- non-conforming whitespace placement in the generated output (things which should get concatenated are not, or vice versa),
- conditional expression evaluation problems,
- universal character encoding problems,
- missing error reporting for various use cases of conditional preprocessing,
- invalid/missing recognition of preprocessing numbers,
- and probably more.

I'd suggest that you use the existing test suites (there are several available) to verify the functionality of your preprocessor before claiming it to be conforming.
Correct recursive function macro expansion turned out to be a real head scratcher, I ended up solving it using a token colouring approach.
Yes, getting this right is non-trivial. The current version of the code does not get it right, however. Especially the handling of placeholder tokens is tricky; just one example:

#define NIL
#define A B NIL
#define B() anything

A()

This should generate 'B()' and not 'anything' (as one might think).

HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
I am also *extremely* sure that I could not have developed a conforming C preprocessor in Boost.Spirit in just eighty hours of work.
I don't think your preprocessor is conforming.
It is not. Apart from that one sentence above which you chose to quote, I have never claimed it was conforming, and on the Readme page at the front of its GitHub repo there is a list of known non-conformances.
I had a quick look at it today. From the tests I performed I came away with the impression that while a large amount of work has been done, it still requires a lot of work in order to turn it into a really conforming preprocessor. I just ran the tests of the wave test suite and discovered a diverse set of problems around reporting of line number information,
#line isn't implemented as I have no personal need for it, as the Readme says. pcpp generates line directives, and can pass them through, but does not parse them.
macro preprocessing problems (where things get rescanned either too often or not often enough - depending on the context), non-conforming whitespace placement in the generated output (things which should get concatenated are not or v.v.),
These were not known. But I am not surprised. Whitespace implementation in particular was rushed.
conditional expression evaluation problems, universal character encoding problems, missing error reporting for various use cases of conditional preprocessing, invalid/missing recognition of preprocessing numbers, and probably more.
These are also stated as known not to work in the Readme. The error handling isn't ideal either; I cut corners by assuming the input is a valid program.
I'd suggest that you use the existing test suites (there are several available) to verify the functionality of your preprocessor before claiming for it to be conforming.
It uses a modified edition of the mcpp suite, which it passes, plus the tricky examples given in the C11 standard. I have the Wave test suite in my directory structure but it is not committed yet as I haven't written the support code for the unit testing. I intend to commit it, but leave the tests disabled on the CI as I don't intend to fix failures.
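(For readers unfamiliar with the flavour of "tricky" expansion being referred to, here is one classic case; it is written in the same spirit as the standard's examples rather than quoted from them:)

  /* The # operator suppresses expansion of its argument, so an extra level
     of indirection is needed to stringize the expanded value. */
  #define STR(x)  #x
  #define XSTR(x) STR(x)
  #define VALUE   42

  STR(VALUE)    /* expands to "VALUE" */
  XSTR(VALUE)   /* expands to "42"    */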
Correct recursive function macro expansion turned out to be a real head scratcher, I ended up solving it using a token colouring approach.
Yes, getting this right is non-trivial. The current version of the code does not get it right, however. Especially the handling of placeholder tokens is tricky, just one example:
#define NIL
#define A B NIL
#define B() anything

A()
This should generate 'B()' and not 'anything' (as one might think).
Yes, that's a bug, and actually one I knew about already. Another is that it currently can't cope with included files that don't terminate with a newline. I'll fix those, but I don't intend to bring pcpp up to passing all of the Wave test suite, as pcpp works just fine with my code and that's all I need it for. Those wanting better conformance are welcome to send pull requests.

Niall
I am also *extremely* sure that I could not have developed a conforming C preprocessor in Boost.Spirit in just eighty hours of work.
I don't think your preprocessor is conforming.
It is not. Apart from that one sentence above you chose to quote I have never claimed it was conforming, and on its Readme page at the front of its github is a list of known non-conformances.
The language you used in the paragraph above implies it was conforming. Also, I thought you'd be interested in fixing your preprocessor, but alas, you aren't. I guess that's fine by me.

Regards Hartmut
On 06.03.2017 11:45 AM, "Niall Douglas via Boost" <boost@lists.boost.org> wrote:

Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:

https://github.com/ned14/pcpp

It passes the C11 standard's list of "tricky" preprocessor expansions and the mcpp test suite. It has no issue correctly handling the preprocessor metaprogramming I've thrown at it from my Boost libraries. I'm not going to claim it can handle Boost.Preprocessor or especially Boost.VMD and I've tried neither, but pull requests with source code fixes adding support for those (with accompanying unit tests) would be welcome.

My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.

I did something very similar to this with Wave quite a few years back. We dubbed it partial preprocessing. It's still in use in Phoenix, iirc. We developed integrations for b2 and later on integrated it into our cmake build process for HPX.
On Wednesday, 8 March 2017 07:08:00 CET you wrote:
On 06.03.2017 11:45 AM, "Niall Douglas via Boost" <boost@lists.boost.org> wrote:
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
It passes the C11 standard's list of "tricky" preprocessor expansions and the mcpp test suite. It has no issue correctly handling the preprocessor metaprogramming I've thrown at it from my Boost libraries. I'm not going to claim it can handle Boost.Preprocessor or especially Boost.VMD and I've tried neither, but pull requests with source code fixes adding support for those (with accompanying unit tests) would be welcome.
My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.
I did something very similar to this with Wave quite a few years back. We dubbed it partial preprocessing. It's still in use in Phoenix, iirc. We developed integrations for b2 and later on integrated it into our cmake build process for HPX.
Just to explain the process we developed... We found that, in prior times before variadic templates, our variadics emulation using Boost.PP iterations etc. was too slow. To mitigate those costs, we decided to use Wave to perform those preprocessor iterations ahead of time, thus partial preprocessing. This was perfectly feasible to do without sacrificing portability.

Our cmake implementation can be found here:
https://github.com/STEllAR-GROUP/hpx/blob/18ff9829e732b32d8ff45a56ceaf0c4e3be99033/cmake/HPX_Preprocessing.cmake

What we did is essentially let CMake create a wave.cfg, depending on the present system headers, which tells Wave not to expand certain macros so that they retain their ability to hide platform-specific quirks. The wave.cfg template can be found here:
https://github.com/STEllAR-GROUP/hpx/blob/18ff9829e732b32d8ff45a56ceaf0c4e3be99033/cmake/templates/wave.cfg.in

Usage is here:
https://github.com/STEllAR-GROUP/hpx/blob/18ff9829e732b32d8ff45a56ceaf0c4e3be99033/preprocess/CMakeLists.txt

With those you can enable specific headers to get partially preprocessed, for example here, where only the BOOST_PP_ITERATOR gets expanded and written to a specific file using Wave's powerful pragmas:
https://github.com/STEllAR-GROUP/hpx/blob/18ff9829e732b32d8ff45a56ceaf0c4e3be99033/hpx/runtime/components/memory_block.hpp#L340-L359

The output can be observed here:
https://github.com/STEllAR-GROUP/hpx/tree/18ff9829e732b32d8ff45a56ceaf0c4e3be99033/hpx/runtime/components/preprocessed

The advantage here is certainly to have a fully compliant C99 preprocessor at our disposal. Your use case might differ slightly, but I am pretty confident that with a little change to our process it would be completely feasible. The only reason we got rid of Wave two years ago was that we made variadic templates a prerequisite...

HTH,
Thomas
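(For readers unfamiliar with the mechanism Thomas describes, the linked HPX header guards its partial preprocessing with a pattern roughly like the one below. This is a from-memory sketch: the control macro and output file name are illustrative, and the exact pragma options should be checked against the Boost.Wave documentation rather than taken from here.)

  /* Only when running under Wave, with a dedicated control macro defined,
     do the pragmas redirect the expanded output into a separate file. */
  #if defined(__WAVE__) && defined(CREATE_PREPROCESSED_FILES)
  #  pragma wave option(preserve: 1, line: 0, output: "preprocessed/memory_block.hpp")
  #endif

  /* ...Boost.PP iteration code that Wave expands ahead of time... */

  #if defined(__WAVE__) && defined(CREATE_PREPROCESSED_FILES)
  #  pragma wave option(output: null)
  #endif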
My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.
I did something very similar to this with Wave quite a few years back. We dubbed it partial preprocessing. It's still in use in Phoenix, iirc. We developed integrations for b2 and later on integrated it into our cmake build process for HPX. [snip] The advantage here is certainly to have a fully compliant C99 preprocessor at our disposal. Your use case might differ slightly, but I am pretty confident that with a little change to our process it would be completely feasible. The only reason we got rid of Wave two years ago was that we made variadic templates a prerequisite...
Interesting work of yours, and thanks for describing it. It is unfortunate you did not more widely publish its existence, else Facebook wouldn't have gone off and written Warp, and I'm sure a raft of other people wouldn't have gone off and written their own hack solutions, e.g. Phil Nash has a bespoke system in Catch. Perhaps you should consider presenting at a major C++ conference on this work so a video search result pops up for those who come at this problem in the future?

Your description also confirmed to me something which I suspected from studying the Wave sources before I started pcpp, which is that Wave is hard to customise in the way I was wanting. The technique of generating long lists of macros to not expand is one approach to solving this problem; however, I believe one can take a more programmatic approach based on _source annotation_. I'm still testing the feasibility of my idea, and if it doesn't work I'll fall back onto the same approach you took, but if it does work then it'll be a much less brittle approach to take. I also have the very big advantage that I can embed Python script into my C++ headers to customise pcpp processing, not that I've leveraged that yet.

Niall
My main use case for writing this is to assemble my header only Boost libraries into a single "drop in and go" file. To that end it has a (still buggy) --passthru mode which passes through #define and #undef plus any #if logic which uses an undefined macro. I'm still working on pass through mode so expect showstopper bugs in that configuration, but as a straight highly standards conforming preprocessor it's ready for others to use and I welcome any feedback. There are a multitude of use cases, everything from running as a conforming preprocessor before invoking MSVC to compile right through to parsing, reflection and introspection of source code.
I did something very similar to this wave quite a few years back. We dubbed it partial preprocessing. It's still in use in Phoenix, iirc. We developed integrations for b2 and later on integrated it into our cmake build process for HPX. [snip] The advantage here is certainly to have a fully compliant C99 preprocessor at our disposal. Your usecase might differ slightly, but I am pretty confident that with a little change to our process, completely feasible. The only reason we got rid of wave 2 years ago was that we made variadic templates a prerequesite...
Interesting work of yours, and thanks for describing it. It is unfortunate you did not more widely publish its existence, else Facebook wouldn't have gone off and written Warp and I'm sure a raft of other people wouldn't have gone off and written their own hack solutions e.g. Phil Nash has a bespoke system in CATCH for example. Perhaps you should consider presenting at a major C++ conference on this work so a video search result pops up for those who come at this problem in the future?
Your description also confirmed to me something which I suspected from studying the Wave sources before I started pcpp, which is that Wave is hard to customise in the way I was wanting. The technique of generating long lists of macros to not expand is one approach to solving this problem, however I believe one can take a more programmatic approach based on _source annotation_. I'm still testing the feasibility of my idea, and if it doesn't work I'll fall back onto the same approach you took, but if it does work then it'll be a much less brittle approach to take. I also have the very big advantage that I can embed Python script into my C++ headers to customise pcpp processing, not that I've leveraged that yet.
You might have seen that Wave uses special #pragma directives to control various parameters of the preprocessing process. That can be used to achieve what you want, as you can easily implement your own. Besides, customizing Wave is not more difficult than writing one of the hooks pcpp is providing. I'd even say it's easier, as Wave gives you a tokenized stream representing the input sequence instead of the raw character stream. The only drawback is that you have many more knobs to turn, as there are hooks for a plethora of preprocessor 'events' to weave your code into.

Regards Hartmut
Dear Niall,
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
did you do some benchmarks on how fast pcpp is compared to a "normal" C-based preprocessor?

I understand that you wrote this implementation to get more features and better standard-compliance than some commercial preprocessors, but since some people around me have claimed that preprocessing takes a significant fraction of total compile time, I wonder about performance.

Best regards,
Hans
Sorry for the late reply; your email was filed as spam because it failed SPF. Here was the SPF failure cause:

Received-SPF: Softfail (domain owner discourages use of this host) identity=mailfrom; client-ip=149.217.99.100; helo=mail2.mpi-hd.mpg.de; envelope-from=hans.dembinski@gmail.com; receiver=s_sourceforge@nedprod.com

You might want to fix this.

On 08/03/2017 08:29, Hans Dembinski wrote:
Dear Niall,
Those of you who watch reddit/r/cpp will know I've been working for the past month on a pure Python implementation of a C99 conforming preprocessor. I am pleased to be able to ask for Boost feedback on a fairly high quality implementation:
did you do some benchmarks on how fast pcpp is compared to a "normal" C-based preprocessor?
Not with really large inputs yet, no. But its scaling curves are ideal: it's linear in tokens processed, linear in macros expanded, linear in macros defined. It's close to a minimum-copy implementation, made easy by Python never copying anything unless asked, plus we keep token objects below 512 bytes so the small-object Python allocator is used instead of malloc.

In absolute terms it will always be far slower than a C or C++ implementation. But we're talking half a second versus a tenth of a second here for small inputs. I would suspect for large inputs the gap will close; Python ain't half bad at performance once objects are allocated, especially Python 3, where pcpp runs noticeably faster than on Python 2. I haven't tried pcpp with PyPy (the JIT compiler) yet, but it does nothing weird so it should work. That would close the absolute performance gap substantially, I would guess.
I understand that you wrote this implementation to get more features and better standard-compliance than some commercial preprocessors, but since some people around me have claimed that preprocessing takes a significant fraction of total compile time, I wonder about performance.
If a build step can pre-preprocess all #includes into a single file and run most of the preprocessing, compilers can parse it in much more quickly. If that step takes a few seconds but saves minutes for the overall build, you win.

I can't say anything about MSVC, but GCC and clang have a special fast path in the preprocessor for chunks of text with no macro expansions possible in them. With already preprocessed input, each translation unit can therefore save a lot, and the overall build time reduces substantially. That's why Facebook Warp, HPX and other projects have implemented a pre-preprocessing build step.

Niall
participants (7)
- Andrey Semashev
- Edward Diener
- Hans Dembinski
- Hartmut Kaiser
- Niall Douglas
- Oswin Krause
- Thomas Heller