On Fri, Sep 18, 2020 at 05:57, Niall Douglas via Boost wrote:
Firstly, that was a great essay on the backing theory, Vinicius. I only wish more people who write parsers would read that first. I would urge you to convert it into a blog post or something similar and post it online so people can find it, so that all that great explanation of theory doesn't get lost forever.
Thanks, Niall. You can share this link if you want: https://gitlab.com/-/snippets/2016550
In the very specific case of parsing JSON, however, I'm not sure the standard rules of evaluation apply. The author of sajson claims that most of his speed comes from not being a pull parser. What you do is zero-copy DMA the incoming socket data into a memory-mapped buffer, then run sajson's AST parse on that known-size buffer; the AST is encoded directly into the source by modifying the buffer in place, avoiding dynamic memory allocation completely. Voilà, there's your JSON parsed with a strict minimum of memory copied or cache lines modified. He claims, and I have no reason to doubt him, that because he can make these hard-coded assumptions about the input buffer, he was able to make a very fast JSON parser (amongst the fastest non-SIMD parsers). By inference, a pull parser couldn't be as fast.
I find that explanation by sajson's author compelling. The fact that he avoids dynamic memory allocation altogether, and builds the AST inline in the original JSON buffer, is particularly persuasive.
Design-wise, sajson has at least two tricks worth discussing:

- It doesn't expose stream events to the user, so the pull/push taxonomy doesn't really apply here.
- It modifies the input stream. That's a destructive parsing technique, the one covered in the "a faster DOM tree" section of my review (a sketch of the idea follows below).

Thanks for bringing this project to our attention. As for Boost.JSON, none of the above matters. Point 1 could matter if the parser were an implementation detail, but that's not the case. Leaving the Boost.JSON review topic aside for a sentence: I share your assessment.
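To make the destructive-parsing idea concrete, here is a toy sketch of mine (not sajson's actual code): unescaping a JSON string by writing the decoded text back over the input buffer, so no separate output allocation is needed. sajson applies the same principle to the whole document, storing the AST inside the buffer it was handed.

    #include <cstddef>

    // Collapse simple JSON escape sequences in place. The decoded text
    // overwrites the encoded text, which is always at least as long, so
    // no extra buffer is allocated. Returns the decoded length.
    std::size_t unescape_in_place(char* first, char* last)
    {
        char* out = first;
        for (char* in = first; in != last; ++in) {
            if (*in == '\\' && in + 1 != last) {
                ++in;
                switch (*in) {
                case 'n':  *out++ = '\n'; break;
                case 't':  *out++ = '\t'; break;
                case '"':  *out++ = '"';  break;
                case '\\': *out++ = '\\'; break;
                default:   *out++ = *in;  break; // \uXXXX etc. omitted for brevity
                }
            } else {
                *out++ = *in;
            }
        }
        return static_cast<std::size_t>(out - first);
    }

This is also why the input has to be mutable and why such a parse is a one-shot, destructive operation: after it runs, the original JSON text is gone.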
I haven't looked at Boost.JSON. But it seems to target a more idiomatic C++ API, to be pluggable for other formats (like Boost.Serialization), and still to retain most of the performance of JSON parsers such as sajson or simdjson. As Boost reviews primarily review API design, Boost.JSON's choice of approach fits the process here well. Boost prefers purity over performance.
`json::value` can have integration with Boost.Serialization, but its parser (which would map to the archive concept) can't. I could write a detailed explanation here like the one I wrote for the pull/push taxonomy... maybe another day.
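For illustration only, here is a minimal sketch of what the `json::value` side of that integration could look like, assuming one is content to round-trip through the textual form (a real integration would more likely walk the tree node by node):

    #include <boost/json.hpp>
    #include <boost/serialization/split_free.hpp>
    #include <boost/serialization/string.hpp>
    #include <string>

    namespace boost { namespace serialization {

    // Non-intrusive save: store the JSON tree as its serialized text.
    template<class Archive>
    void save(Archive& ar, const boost::json::value& v, unsigned)
    {
        std::string text = boost::json::serialize(v);
        ar & text;
    }

    // Non-intrusive load: parse the text back into a json::value.
    template<class Archive>
    void load(Archive& ar, boost::json::value& v, unsigned)
    {
        std::string text;
        ar & text;
        v = boost::json::parse(text);
    }

    template<class Archive>
    void serialize(Archive& ar, boost::json::value& v, unsigned version)
    {
        split_free(ar, v, version); // dispatch to save()/load() above
    }

    }} // namespace boost::serialization

The parser is a different story: roughly speaking, an archive wants to drive the traversal itself, while Boost.JSON's parser pushes events into a handler, so it doesn't slot into the archive concept directly.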
I suspect most users of JSON by far would have the exact same attitude as I do. Users like us really don't care what the parser does, how it is designed, or whatever crappy API it might have; all we care about is maximum possible data-extraction performance. Never ever calling malloc is an excellent sign of the right kind of JSON parser design, at least in my book.
A coworker of mine has thoughts similar to yours. Great guy. Anyway, I feel like resuming the project that I've put on hold, so that's my goodbye. I'll still keep an eye on the discussions, but I'll try to stay mostly silent. Have fun, you all. And nice talking to you again, Niall. How many years has it been since we worked together (even if only for a brief period)? :)

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/