On 17/09/2020 20:30, Vinícius dos Santos Oliveira via Boost wrote:
As it has been explained before, push parsers don't compose. And you aren't limited to root-level scanning. You should have `json::partial::scanf()` to act on subtrees too. A prototype for this idea can be found at https://github.com/breese/trial.protocol/pull/43.
Firstly, that was a great essay on the backing theory, Vinicius. I only wish more people who write parsers would read that first. I would urge you to convert it into a blog post or something similar and post it online where people can find it, so all that great explanation of the theory doesn't get lost forever.
## Review questions
Please be explicit about your decision (ACCEPT or REJECT).
REJECT.
I understand your motivation here, and given nobody else on this list will say this, you are absolutely right that in the general case pull based parsers are the right choice. I chose a pull design for pcpp (the pure Python C preprocessor), despite that strictly speaking it's really not necessary for parsing content whose length is always fully known in advance. Pull based designs are just better in general: more flexible, more extensible.

In the very specific case of parsing JSON, however, I'm not sure the standard rules of evaluation apply. The author of sajson claims that most of his speed comes from not being a pull parser. What you do is zero-copy DMA the incoming socket data into a memory-mapped buffer, then execute sajson's AST parse upon that known-sized buffer; the parser encodes the AST directly into the source by modifying the buffer in place, avoiding dynamic memory allocation completely, and there's your JSON parsed with a strict minimum of memory copied or cache lines modified.

He claims, and I have no reason to doubt him, that because he can make these hard-coded assumptions about the input buffer, he was able to make a very fast JSON parser (amongst the fastest non-SIMD parsers). By inference, a pull parser couldn't be as fast. I find that explanation by sajson's author compelling. The fact that he completely avoids dynamic memory allocation and builds the AST inline into the original buffer of JSON is particularly so.

I haven't looked at Boost.JSON in detail. But it seems to target a more idiomatic C++ API, to be pluggable for other formats like Boost.Serialization, and to retain most of the performance of JSON parsers such as sajson or simdjson. As Boost reviews primarily review API design, Boost.JSON's choice of approach fits the process here well: Boost prefers purity over performance. Personally speaking, for JSON I care solely and exclusively about maximum possible parse speed.
I have no use for customisation or extensibility. If Boost.JSON beats sajson at in-place AST building and also beats simdjson, I'll use it. If it doesn't, I won't. I suspect most users of JSON by far have the exact same attitude as I do. For users like us, we really don't care what the parser does, how it is designed, or whatever crappy API it might have; all we care about is maximum possible data extraction performance. Never calling malloc is an excellent sign of the right kind of JSON parser design, at least in my book.

Niall