On Fri, Sep 18, 2020 at 05:57, Niall Douglas via Boost wrote:
Firstly, that was a great essay on the backing theory, Vinicius. I only wish more people who write parsers would read that first. I would urge you to convert it into a blog post or something similar and post it online so people can find it, so that all that great explanation of theory doesn't get lost forever.
Thanks, Niall. You can share this link if you want: https://gitlab.com/-/snippets/2016550
In the very specific case of parsing JSON, however, I'm not sure the standard rules of evaluation apply. The author of sajson claims that most of his speed comes from not being a pull parser. What you do is zero-copy DMA the incoming socket data into a memory-mapped buffer, then run sajson's AST parse on that known-size buffer; the AST is encoded directly into the source by modifying the buffer in place, avoiding dynamic memory allocation completely. Voilà, there's your JSON parsed with a strict minimum of memory copied or cache lines modified. He claims, and I have no reason to doubt him, that because he can make these hard-coded assumptions about the input buffer, he was able to make a very fast JSON parser (amongst the fastest non-SIMD parsers). By inference, a pull parser couldn't be as fast.
I find that explanation by sajson's author compelling. The fact that he avoids dynamic memory allocation altogether, and builds the AST inline in the original JSON buffer, is particularly persuasive.
Design-wise, sajson has at least two tricks worth discussing:

- It doesn't expose stream events to the user, so the pull/push taxonomy doesn't really apply here.
- It modifies the input stream. That's a destructive parsing technique, the one covered in the "a faster DOM tree" section of my review (a sketch of the idea follows below).

Thanks for bringing this project to our attention. As for Boost.JSON, none of the above matters. Point 1 could matter if the parser were an implementation detail, but that's not the case. Leaving the Boost.JSON review topic aside for a sentence: I share your assessment.
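To make the destructive-parsing idea concrete, here is a toy sketch of mine (not sajson's actual code): unescaping a JSON string by writing the decoded text back over the input buffer, so no separate output allocation is needed. sajson applies the same principle to the whole document, storing the AST inside the buffer it was handed.

    #include <cstddef>

    // Collapse simple JSON escape sequences in place. The decoded text
    // overwrites the encoded text, which is always at least as long, so
    // no extra buffer is allocated. Returns the decoded length.
    std::size_t unescape_in_place(char* first, char* last)
    {
        char* out = first;
        for (char* in = first; in != last; ++in) {
            if (*in == '\\' && in + 1 != last) {
                ++in;
                switch (*in) {
                case 'n':  *out++ = '\n'; break;
                case 't':  *out++ = '\t'; break;
                case '"':  *out++ = '"';  break;
                case '\\': *out++ = '\\'; break;
                default:   *out++ = *in;  break; // \uXXXX etc. omitted for brevity
                }
            } else {
                *out++ = *in;
            }
        }
        return static_cast<std::size_t>(out - first);
    }

This is also why the input has to be mutable and why such a parse is a one-shot, destructive operation: after it runs, the original JSON text is gone.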
I haven't looked at Boost.JSON. But it seems to target a more idiomatic C++ API, to be pluggable for other formats (like Boost.Serialization), and still to retain most of the performance of JSON parsers such as sajson or simdjson. As Boost reviews primarily review API design, Boost.JSON's choice of approach fits the process here well. Boost prefers purity over performance.
`json::value` can have integration with Boost.Serialization, but its parser (which would map to the archive concept) can't. I could write a detailed explanation here like the one I wrote for the pull/push taxonomy... maybe another day.
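For illustration only, here is a minimal sketch of what the `json::value` side of that integration could look like, assuming one is content to round-trip through the textual form (a real integration would more likely walk the tree node by node):

    #include <boost/json.hpp>
    #include <boost/serialization/split_free.hpp>
    #include <boost/serialization/string.hpp>
    #include <string>

    namespace boost { namespace serialization {

    // Non-intrusive save: store the JSON tree as its serialized text.
    template<class Archive>
    void save(Archive& ar, const boost::json::value& v, unsigned)
    {
        std::string text = boost::json::serialize(v);
        ar & text;
    }

    // Non-intrusive load: parse the text back into a json::value.
    template<class Archive>
    void load(Archive& ar, boost::json::value& v, unsigned)
    {
        std::string text;
        ar & text;
        v = boost::json::parse(text);
    }

    template<class Archive>
    void serialize(Archive& ar, boost::json::value& v, unsigned version)
    {
        split_free(ar, v, version); // dispatch to save()/load() above
    }

    }} // namespace boost::serialization

The parser is a different story: roughly speaking, an archive wants to drive the traversal itself, while Boost.JSON's parser pushes events into a handler, so it doesn't slot into the archive concept directly.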
I suspect most users of JSON by far would have the exact same attitude as I do. Users like us really don't care what the parser does, how it is designed, or whatever crappy API it might have; all we care about is maximum possible data-extraction performance. Never ever calling malloc is an excellent sign of the right kind of JSON parser design, at least in my book.
A coworker of mine has thoughts similar to yours. Great guy. Anyway, I feel like resuming the project that I've put on hold, so that's my goodbye. I'll still keep an eye on the discussions, but I'll try to stay mostly silent. Have fun, you all. And nice talking to you again, Niall. How many years has it been since we worked together (even if only for a brief period)? :)

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/