On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert via Boost wrote:
> Regarding the SAX parser specifically, I dislike the *_part methods in the callback interface. I feel that these should be removed and only the final form retained, called only when the complete text of the value has been determined.
The parser (and serializer) support incremental / streaming operation. To provide the "complete" text would require allocating memory and saving the partial result until enough input has been supplied to finish the text. basic_parser does not currently allocate memory, which was a design goal, and this would change that.
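For illustration, a handler that wants the complete text can do that buffering itself, paying the allocation that basic_parser avoids. A fragment of such a handler (buf_ is a hypothetical member; the other required callbacks are omitted):

    std::string buf_;

    bool on_string_part( boost::json::string_view sv, std::size_t,
                         boost::system::error_code& )
    {
        buf_.append( sv.data(), sv.size() ); // save the partial piece
        return true;
    }

    bool on_string( boost::json::string_view sv, std::size_t,
                    boost::system::error_code& )
    {
        buf_.append( sv.data(), sv.size() ); // buf_ now holds the full text
        // ... use buf_ ...
        buf_.clear();
        return true;
    }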
> I don't feel it is ever sensible to call back with a partially-parsed value (unless I am misunderstanding what these are for, but there is little explanation in the docs, and I wasn't able to get them called at all in my test).
In order to see partial results, you have to invoke the parser with partial JSON. In other words, break up the input JSON into two or more consecutive buffers of characters, and supply them in order.
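For example, something like this (a sketch; I'm assuming the string_view overload of write_some here):

    boost::json::parser p;

    // Supply the document in two consecutive buffers. The string value
    // is split across them, so a basic_parser handler would see
    // on_string_part( "par", ... ) followed by on_string( "tial", ... ).
    p.write_some( "{\"key\": \"par" ); // first buffer ends mid-string
    p.write_some( "tial\"}" );         // second buffer completes the value

    boost::json::value jv = p.release();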
> I also don't see the point of supplying a size argument to on_key* and on_string, as this can be read from the string_view argument instead.
Right, so this size argument is actually the cumulative number of characters seen for the value so far, while string_view::size() is the number of characters in the current piece.
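With the split input from the earlier sketch, the callbacks would see something like:

    // value "partial" delivered in two pieces:
    // on_string_part( sv = "par",  n = 3 )  // sv.size() == 3, 3 chars so far
    // on_string     ( sv = "tial", n = 7 )  // sv.size() == 4, 7 chars total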
> having a bool return as well as an error_code output argument seems peculiar; one or the other should be sufficient, as it is it just invites inconsistency.
Peculiar, but fast :) This was profiled and optimized. The bool return plus ec measured faster than ec alone: the parser can branch on the returned bool instead of testing the error_code after every callback.
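Roughly, the shape inside the parser loop is (a sketch of the idea, not the actual internals; the names are made up):

    // Branch directly on the returned bool; ec is only inspected
    // on the failure path, keeping the hot path lean.
    if( ! h_.on_string_part( sv, total, ec_ ) )
        return fail();  // hypothetical slow path: report or suspend
    // fast path continues without touching ec_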
> parser.write() is incremental but basic_parser.write() is not, which seems odd. I assume this means that parser is storing an incrementally populated buffer of the full document rather than _really_ incrementally parsing.
Everything is incremental. parser::write() has the condition that if there are any leftover characters after a complete JSON has been received, an error is returned. Contrast this with parser::write_some, which does not produce an error in this case. basic_parser::write is closer in behavior to parser::write_some. We will be renaming it to basic_parser::write_some, as another reviewer suggested, to make this clearer; that change also has the benefit of aligning with Asio's naming.
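The difference, sketched (again assuming string_view overloads):

    boost::json::parser p1;
    boost::system::error_code ec;

    // parser::write: leftover characters after a complete JSON
    // are an error.
    p1.write( "{} trailing", ec );   // sets ec: extra characters

    // parser::write_some: stops after the first complete JSON and
    // returns how much of the input was consumed.
    boost::json::parser p2;
    std::size_t n = p2.write_some( "{} trailing", ec ); // no error;
                                    // n = characters consumed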
> parse/parser do not appear to support consuming only part of the supplied input as a complete JSON value (returning how much was consumed), which is useful for streams where multiple objects are concatenated without explicit separation.
The free function parse() does not, but parser does: use parser::write_some.
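So a stream of concatenated values can be consumed in a loop, along these lines (a sketch assuming the string_view overload of write_some):

    // input holds e.g. "{...}{...}[...]" with no separators
    void parse_stream( boost::json::string_view input )
    {
        while( ! input.empty() )
        {
            boost::json::parser p;
            boost::system::error_code ec;
            std::size_t n = p.write_some( input, ec ); // one complete value
            if( ec )
                break;                    // malformed or incomplete input
            boost::json::value jv = p.release();
            // ... use jv ...
            input.remove_prefix( n );     // advance past the consumed value
        }
    }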
The "Using Numbers" section of the docs appears to be entirely blank.
This is fixed in develop.
> (On a related note, an example use case for serializer is listed as "ensure a fixed amount of work is performed in each cycle", but the only means of control is the length of the provided buffer. I guess this is equating work to number of characters output, which seems dubious.)
Well, yeah, that's exactly what it is. Consider a network server that delivers JSON to clients. We don't want a completion handler to be stuck serializing 100 MB of JSON while another client who only needs 1 MB serialized is kept waiting. The way to ensure fairness here is to serialize incrementally with a reasonable buffer size, say 32 KB. The light client finishes in ~33 i/o cycles while the heavy client still makes progress.
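In code, the pattern is roughly this (async_send is a hypothetical stand-in for whatever performs the i/o):

    boost::json::serializer sr;
    sr.reset( &jv );                 // jv: the value to serialize
    char buf[ 32768 ];               // bounds the work done per cycle
    while( ! sr.done() )
    {
        boost::json::string_view sv = sr.read( buf, sizeof(buf) );
        async_send( sv );            // hypothetical: one chunk per i/o cycle
    }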
> It seems less suited for JSON as configuration file -- while it can parse such files (via common extensions such as ignoring comments and trailing commas) it lacks the means to pretty-print and to round-trip the comments, which would be essential if writing back a config file intended for human use.
To store the comments would mean storing them in the DOM, which would have a significant impact on performance as the space for everything is tightly controlled.
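(The extensions mentioned are opt-in at parse time; something like this, going from memory of the parse_options interface:)

    boost::json::parse_options opt;
    opt.allow_comments = true;         // skip // and /* */ comments
    opt.allow_trailing_commas = true;  // permit e.g. [1,2,3,]

    boost::json::value jv = boost::json::parse(
        "[1, 2, 3,] // config values",
        boost::json::storage_ptr(), opt );

    // The comments are skipped, not stored: serializing jv will
    // not reproduce them.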
> The basic_parser examples do not compile out of the box -- with the suggested json.hpp and src.hpp includes there are link errors with basic_parser symbols. Additionally including basic_parser.hpp resolves these but starts complaining about missing max_{key,string,array,object}_size members of the handler type, which are not documented. After these are supplied, it works as expected. I don't see why these should be needed, however.
Those max sizes should be documented in the handler exemplar: https://master.json.cpp.al/json/ref/boost__json__basic_parser.html#json.ref.... The reason they are required is optimization: handling the limits inside basic_parser leads to performance gains. Thanks
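For reference, the members in question look like this in the exemplar (the values shown are the "effectively unlimited" choice; check the docs for the exact form):

    struct handler
    {
        // Limits enforced by basic_parser itself; required so the
        // checks can be inlined into the parser's hot loop.
        static constexpr std::size_t max_object_size = std::size_t(-1);
        static constexpr std::size_t max_array_size  = std::size_t(-1);
        static constexpr std::size_t max_key_size    = std::size_t(-1);
        static constexpr std::size_t max_string_size = std::size_t(-1);

        // ... the on_* callbacks as documented ...
    };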