Hi all, I believe that Metaparse should be accepted into Boost.
- What is your evaluation of the potential usefulness of the library?
As mentioned by others, Metaparse is a library for library developers. In terms of functionality, it is closest to Boost.Proto, although it allows defining arbitrary EDSLs as strings, instead of using the built-in C++ operators. In order to test the effectiveness of Metaparse, I built a parser for the Graphviz "dot" language [1]. This is of practical importance to me, because I have an interest in graph metaprogramming [2], specifically generating types and networks of types from graph metadata. Having a nice language for defining graphs in metadata, to support a syntax beyond angle brackets, would be very helpful. One limitation of this class of EDSL is that it can produce arbitrarily complex types, but it can't refer to any existing types. E.g. you might have the string "int" in your EDSL, but in order for it to refer to the C++ int type, you are going to need an mpl::map from type-strings to types. I didn't get this far in my evaluation, but it's really beyond the scope of the library.
- Are you knowledgeable about the problem domain?
Fairly knowledgeable. I have built an experimental parser for compile-time EDSLs over type expressions called "Angly" [4] (presented at C++Now 2012). I have learned from the considerably simpler design of Metaparse, and hope to apply it to my own library (in particular using recursive descent rather than trying to wrangle a deterministic finite automaton).
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I spent about 5 hours getting set up and working through the tutorial, and about 7 hours getting up and running with my own parser. Once I "got it", it only took an hour or two to implement the rest of the grammar (and about 4 hours to write this review). My results (grammar and a few basic tests) are in a Gist available here: [3]. My grammar is able to parse "dot" graph descriptions that look like this "graph g { A [x=y]; \ B->C [y=z, w=a]; \ D->E; \ F; \ G->H [a=b, b=c, c=d, d=e, e=f, f=g]; \ I->J; \ }" Because I ran out of time, my parser only produces an AST; it does not yet produce MPL.Graph or Fusion.Graph types.
- Did you try to use the library? - With what compiler? - Did you have any problems?
I mostly used metashell for my evaluation, but I also tested compilation times with clang 3.5 once I had something running. Any problems I had were the usual difficulties of TMP. I was not expecting the compile times to be tolerable, but even in metashell, the turnaround was usually around 15-20s. Compiling my longest test with clang, just shy of 256 characters, took only about 4-5 seconds.
- What is your evaluation of the documentation?
Excellent. Working through the tutorial was enough for me to absorb the particular constructs needed for building grammars and semantic actions. I also used the reference documentation when building my own parser. This documentation is more terse, very much in the style of the MPL documentation. But it functions. I only skimmed the user manual, since the content looks a little bit stale (doesn't mention build_parser) and somewhat redundant with the tutorial. As Andrzej mentions, there may need to be an introduction concerning the scope of the library and what you can and can't do with this kind of EDSL, where you would use Proto instead of Metaparse, etc. I think that as the library evolves, the documentation will probably need more general parsing background, of the sort you will find in Spirit documentation. For example, the tutorial covers the unary `-` operator, but it doesn't mention why there is no ambiguity with the binary `-` operator. Also, it seems (as mentioned in the design discussion below) that the grammar must be unambiguous, and this is mentioned nowhere. Overall, the documentation is stronger on the meta- than the -parse.
- What is your evaluation of the implementation?
I did not peer into the code very much, beyond finding some examples I needed. My impression from Abel's walk-through at C++Now 2012 is that the implementation is quite simple and straightforward, based on established techniques from parsers in Haskell.
- What is your evaluation of the design?
Very good. However, I think the API might be fleshed out a little bit once the library gets more usage. In particular, I found myself writing a few helper metafunctions that looked like this:
template<typename EdgeDecl>
struct declare_edge {
typedef edge_decl
;
It may be my inexperience, but it struck me that it would be helpful to have a "spread" or "apply" transform so that the items in the mpl::vector produced by sequence would be supplied as parameters to declare_edge, instead of having to pull them out using mpl::at_c. There may be some trick I don't know here.
I will mention two other points of confusion. These are probably not design flaws in the library, but I thought I would bring them up in case they are.
First, I started out defining my AST nodes like this:
template
Hi Gordon, Thank you for the review. On 2015-06-03 23:49, Gordon Woodhull wrote:
I only skimmed the user manual, since the content looks a little bit stale (doesn't mention build_parser) and somewhat redundant with the tutorial.
The intention with the manual is to provide more focused answers (eg. what are the looping constructs and how they relate to each other?). You can learn it from the tutorial as well, but finding it is more difficult.
As Andrzej mentions, there may need to be an introduction concerning the scope of the library and what you can and can't do with this kind of EDSL, where you would use Proto instead of Metaparse, etc. Will be added.
I think that as the library evolves, the documentation will probably need more general parsing background, of the sort you will find in Spirit documentation. For example, the tutorial covers the unary `-` operator, but it doesn't mention why there is no ambiguity with the binary `-` operator.
Also, it seems (as mentioned in the design discussion below) that the grammar must be unambiguous, and this is mentioned nowhere. Will be added.
Very good. However, I think the API might be fleshed out a little bit once the library gets more usage. In particular, I found myself writing a few helper metafunctions that looked like this:
template<typename EdgeDecl> struct declare_edge { typedef edge_decl
::type, typename mpl::at_c ::type, typename mpl::at_c ::type> type; }; invoked from a transform like this:
using edge_stmt = transform< sequence
>>, mpl::quote1 ;
It may be my inexperience, but it struck me that it would be helpful to have a "spread" or "apply" transform so that the items in the mpl::vector produced by sequence would be supplied as parameters to declare_edge, instead of having to pull them out using mpl::at_c. There may be some trick I don't know here. There are first_of, last_of, middle_of and nth_of for the cases when one needs one element of a sequence. For your use case (instantiate a template class with multiple elements) using at_c is what is currently available, but some better tool can be added.
First, I started out defining my AST nodes like this:
template
struct edge_decl {}; Seems like a reasonable basic type-holder to me. As shown above, this is instantiated and returned by a helper metafunction invoked by the transform which pulls stuff out of an mpl::vector.
But then somewhere in metaparse, it tries to again "invoke" this result with ::type when appending it to another mpl::vector.
Really? So my transform needs to return not the value, but a metafunction which returns a value? It might be a bug. Do you still have the code that produced it?
I couldn't figure this out, so I gave each of my AST nodes a self-referential ::type in the way that mpl::vector has:
template
struct edge_decl { typedef edge_decl type; //why? }; Something is not right here! Note that when I was experimenting with lazy evaluation, then I came to the conclusion that adding such self-referential ::type to every class you intend to use as value in template metaprogramming makes life much easier. You don't need to worry about where you return a value and where you return a thunk.
However, this does not change the fact that Metaparse should not use ::type where it is not needed.
The second problem I encountered is about ambiguous grammars. For the "dot" language it seems natural to define a statement this way:
using stmt = one_of
; where a node statement looks like
ID [attr=value, attr=value];
and an edge statement looks like
ID -> ID [attr=value];
But there's an ambiguity here, and once the parser successfully reads a node, it is going to get upset if it sees a `->`. So the rule has to be written the other way:
using stmt = one_of
; Good point. If node_stmt succeeds, one_of won't check edge_stmt (in your first attempt). This should be pointed out in the documentation.
Similarly, the `;` separator for declarations is optional in the "dot" language, but I couldn't figure out how to implement this, because if you try to have an optional element at the beginning of a rule:
template
using opt = one_of ; using stmt_list = foldlp< sequence
, stmt>, transform mpl::vector1>, // ... then the parser will greedily eat the epsilon and have nowhere to go (at least according to my amateur understanding of parsers).
I gave up here and made the separator mandatory. Again, this is probably not a flaw in the design, but the documentation should describe what kinds of grammars will work, and perhaps give pointers to other resources about disambiguating grammars. Probably there is a different construct which would help here. I'm not sure what error you ended up with in this case. The way you implement the optional semicolon seems to be right. What I'd change here is using last_of instead of sequence assuming that you only need the result of stmt and not the entire sequence of the (optional) semicolon and the stmt result. "How you should implement different kinds of grammar elements" is difficult to present in the documentation. Currently the tutorial shows how things intended to be used (eg. it presents first_of and middle_of).
Regards, Ábel
Hi Abel,
On Jun 3, 2015, at 6:38 PM, Abel Sinkovics
First, I started out defining my AST nodes like this:
template
struct edge_decl {}; Seems like a reasonable basic type-holder to me. As shown above, this is instantiated and returned by a helper metafunction invoked by the transform which pulls stuff out of an mpl::vector.
But then somewhere in metaparse, it tries to again "invoke" this result with ::type when appending it to another mpl::vector.
Really? So my transform needs to return not the value, but a metafunction which returns a value? It might be a bug. Do you still have the code that produced it?
Sure. I commented each of the places I had trouble in my gist, https://gist.github.com/gordonwoodhull/dcbb332eea06dd3279b9 ... which was only a few places, and very small changes a testament that your library is surprisingly easy to use once you get used to it.
I couldn't figure this out, so I gave each of my AST nodes a self-referential ::type in the way that mpl::vector has:
template
struct edge_decl { typedef edge_decl type; //why? }; Something is not right here! Note that when I was experimenting with lazy evaluation, then I came to the conclusion that adding such self-referential ::type to every class you intend to use as value in template metaprogramming makes life much easier. You don't need to worry about where you return a value and where you return a thunk.
However, this does not change the fact that Metaparse should not use ::type where it is not needed.
Ah, I hope that is it. I could not figure out what I was supposed to be returning, and felt lousy about adding a ::type just to move on.
the parser will greedily eat the epsilon and have nowhere to go (at least according to my amateur understanding of parsers).
I gave up here and made the separator mandatory. Again, this is probably not a flaw in the design, but the documentation should describe what kinds of grammars will work, and perhaps give pointers to other resources about disambiguating grammars. Probably there is a different construct which would help here. I'm not sure what error you ended up with in this case.
This is also commented in the gist.
The way you implement the optional semicolon seems to be right. What I'd change here is using last_of instead of sequence assuming that you only need the result of stmt and not the entire sequence of the (optional) semicolon and the stmt result.
... and that fixed it! Which shows that my "stuck eating epsilon" explanation is probably incorrect. I've updated the gist with last_of, that's a lot better.
"How you should implement different kinds of grammar elements" is difficult to present in the documentation. Currently the tutorial shows how things intended to be used (eg. it presents first_of and middle_of).
Yes, it certainly is not easy to document. Hopefully these best practices will emerge as Metaparse gets many more users! Cheers, Gordon
Hi Gordon, On 2015-06-04 02:10, Gordon Woodhull wrote:
It might be a bug. Do you still have the code that produced it? Sure. I commented each of the places I had trouble in my gist, https://gist.github.com/gordonwoodhull/dcbb332eea06dd3279b9
... which was only a few places, and very small changes a testament that your library is surprisingly easy to use once you get used to it. This is great, thanks. I can debug it then.
I couldn't figure this out, so I gave each of my AST nodes a self-referential ::type in the way that mpl::vector has:
template
struct edge_decl { typedef edge_decl type; //why? }; Something is not right here! Note that when I was experimenting with lazy evaluation, then I came to the conclusion that adding such self-referential ::type to every class you intend to use as value in template metaprogramming makes life much easier. You don't need to worry about where you return a value and where you return a thunk.
However, this does not change the fact that Metaparse should not use ::type where it is not needed. Ah, I hope that is it. I could not figure out what I was supposed to be returning, and felt lousy about adding a ::type just to move on. Note that types representing data (containers, int_, etc) in MPL and in the standard (eg. integral_constant) have this self-recursive ::type as well which makes them easy to use in lazy code.
the parser will greedily eat the epsilon and have nowhere to go (at least according to my amateur understanding of parsers).
I gave up here and made the separator mandatory. Again, this is probably not a flaw in the design, but the documentation should describe what kinds of grammars will work, and perhaps give pointers to other resources about disambiguating grammars. Probably there is a different construct which would help here. I'm not sure what error you ended up with in this case. This is also commented in the gist.
According to gist it looks like using last_of is fine while using sequence is not. And they return different things: last_of returns the result of stmt, sequence returns a container with [optional semicolon, result of stmt] as elements. (and the code processing the result needs to expect the right thing) Regards, Ábel
On Jun 4, 2015, at 12:56 AM, Abel Sinkovics
I gave up here and made the separator mandatory. Again, this is probably not a flaw in the design, but the documentation should describe what kinds of grammars will work, and perhaps give pointers to other resources about disambiguating grammars. Probably there is a different construct which would help here. I'm not sure what error you ended up with in this case. This is also commented in the gist. According to gist it looks like using last_of is fine while using sequence is not. And they return different things: last_of returns the result of stmt, sequence returns a container with [optional semicolon, result of stmt] as elements. (and the code processing the result needs to expect the right thing)
Yeah, I was accounting for that. I got an error about trying to evaluate void::type, so maybe the problem was just that the default here also needs to return a self-recursive type:
template
It might be good to include opt<> (optional) in the library, or to have a short cookbook of common patterns like this, since it is not always obvious if one is doing the right thing. I've been thinking a lot about whether I should add something like opt<> to the library or not given that it is a common pattern in grammars. The reason I have not done it yet was that I could not come up with a reasonable Default value and having two arguments (the optional parser and what to return when it is not present) didn't look simpler to me
Hi Gordon, On 2015-06-04 08:07, Gordon Woodhull wrote: than the one_of<, return_<>> pattern. However, given that people seem to create opt anyway (as you did it as well) suggests that it should still be added. Regards, Ábel
participants (2)
-
Abel Sinkovics
-
Gordon Woodhull