On Tue, Sep 22, 2020 at 7:48 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:
On 23/09/2020 05:55, Vinnie Falco wrote:
I also don't see the point of supplying a size argument to on_key* and on_string, as this can be read from the string_view argument instead.
Right, so this size argument is actually the total number of characters, while the string_view::size is the number of characters in the current piece.
Which is a poor design choice, as I said.
How so? The goal is to give the handler as much information as possible. The parser already tracks the running total, so forcing handlers to track it themselves would just duplicate work.
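To make the intent concrete, here is a minimal sketch of a handler that uses both values. The callback shapes (the current piece in the string_view, the running total in the size_t, plus an error_code) are inferred from the calls shown later in this thread, so treat the exact signatures as an assumption:

    #include <boost/json.hpp>
    #include <string>

    namespace json = boost::json;

    struct accumulating_handler
    {
        std::string buf_;  // storage owned by the handler, not by basic_parser

        // `piece` holds only the characters delivered in this call; `total`
        // is the running length of the string so far, including this piece.
        bool on_string_part( json::string_view piece,
                             std::size_t total, json::error_code& )
        {
            buf_.reserve( total );                      // grow once, using the total
            buf_.append( piece.data(), piece.size() );
            return true;
        }

        bool on_string( json::string_view piece,
                        std::size_t total, json::error_code& )
        {
            buf_.append( piece.data(), piece.size() );  // buf_.size() == total here
            buf_.clear();                               // consume the completed string
            return true;
        }

        // ...the remaining on_* callbacks and the max_*_size members are omitted
    };

Without the running total, the handler would have to grow the buffer piecemeal or count the characters itself, which is exactly the redundancy mentioned above.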
I don't really understand why. basic_parser doesn't actually store any of the keys/values (especially as it doesn't merge value parts), so it should have no reason to impose any kind of limit. This just seems like it's doing an extra count/comparison that could be removed entirely, and removing it would naturally improve performance.
basic_parser only checks the limits; it does not choose them. The handler is responsible for determining what those limits are (via its max_*_size members). I suppose that we could add a little TMP that elides the check statically, but I don't see too much of a point in doing so.

On Tue, Sep 22, 2020 at 7:48 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:
On 23/09/2020 05:55, Vinnie Falco wrote:
On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert wrote:
Regarding the SAX parser specifically, I dislike the *_part methods in the callback interface. I feel that these should be removed and only the final form retained, called only when the complete text of the value has been determined.
The parser (and serializer) support incremental / streaming operation. To provide the "complete" text would require allocating memory and saving the partial result until enough input has been supplied to finish the text. basic_parser does not currently allocate memory, which was a design goal, and this would change that.
Not necessarily. A small buffer can be statically allocated for a single value-in-progress. It's trivial for all numeric values (except perhaps pathological precision, but that can be ignored). It's harder for string values; that might have to support dynamic allocation, it's true.
I just don't see the point in having an interface that can return half of a value -- this just moves the problem of allocation and keeping track of the full value to the parser handler, which seems like an SRP violation.
I don't feel it is ever sensible to call back with a partially-parsed value (unless I am misunderstanding what these are for, but there is little explanation in the docs, and I wasn't able to get them called at all in my test).
In order to see partial results, you have to invoke the parser with partial JSON. In other words, break up the input JSON into two or more consecutive buffers of characters, and supply them in order.
I tried that, but the parse simply failed at the end of the first partial buffer and it completely ignored the second buffer. Hence my statement that basic_parser did not operate incrementally. Though see below; this appears to be a failure of example rather than of implementation.
I also don't see the point of supplying a size argument to on_key* and on_string, as this can be read from the string_view argument instead.
Right, so this size argument is actually the total number of characters, while the string_view::size is the number of characters in the current piece.
Which is a poor design choice, as I said.
parser.write() is incremental but basic_parser.write() is not, which seems odd. I assume this means that parser is storing an incrementally populated buffer of the full document rather than _really_ incrementally parsing.
Everything is incremental. parser::write() has the condition that if there are any leftover characters after receiving a complete JSON, an error is returned. Contrast this with parser::write_some which does not produce an error in this case. basic_parser::write is more similar to parser::write_some. We will be renaming it to basic_parser::write_some, as another reviewer suggested, to make this more clear. There is also a synergy with asio's naming with that change.
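A small sketch of the difference, assuming the (char const*, size, error_code) overloads used elsewhere in this thread and glossing over whatever start()/reset() call the parser needs before writing:

    json::parser p;
    json::error_code ec;
    std::string text = R"j({"a":1} {"b":2})j";  // one complete JSON, then more input

    // parser::write: the leftover characters after the first complete JSON
    // are reported as an error.
    p.write( text.data(), text.size(), ec );             // ec is set

    // parser::write_some (on a fresh parser): stops after the first complete
    // JSON, leaves ec clear, and returns how many characters were consumed,
    // so the caller can feed the remainder to a second parse.
    std::size_t n = p.write_some( text.data(), text.size(), ec );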
I wrote a print_parser (which is just like the null_parser example but prints the method calls and parameters) and invoked it like so:
    json::error_code ec;
    print_parser parser;
    std::string data1 = R"j({"foo":14,"bar":1)j";
    std::string data2 = R"j(8,"baz":"bo)j";
    std::string data3 = R"j(b"})j";
    parser.write(&data1[0], data1.size(), ec);
    parser.write(&data2[0], data2.size(), ec);
    parser.write(&data3[0], data3.size(), ec);
It stopped after calling on_int64(1, "1") and did not continue parsing, completely ignoring the second buffer, never calling any _part methods and reporting the wrong integer value. It parses correctly only when the complete text is supplied in a single call. Hence, as I said, it is not behaving incrementally.
However I have since noticed that the missing link is that it must pass 'true' as the 'more' parameter to basic_parser.write, which is not demonstrated in any example; it may be helpful to show that. (In fact this resolves my other concern about multi-document parsing as well, which also could use an explicit example.)
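For the record, the working incremental sequence looks something like this, assuming the reviewed signature basic_parser::write( bool more, char const* data, std::size_t size, error_code& ec ) (soon write_some) is forwarded unchanged by the print_parser wrapper:

    json::error_code ec;
    print_parser parser;
    std::string data1 = R"j({"foo":14,"bar":1)j";
    std::string data2 = R"j(8,"baz":"bo)j";
    std::string data3 = R"j(b"})j";
    parser.write( true,  &data1[0], data1.size(), ec );  // more == true: more input follows
    parser.write( true,  &data2[0], data2.size(), ec );  // more == true: more input follows
    parser.write( false, &data3[0], data3.size(), ec );  // more == false: final buffer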
Once this is done, then (as expected by the docs) it reports these calls (in part):
on_number_part("1") on_int64(18, "8")
on_string_part("bo", 2) on_string("b", 3)
So I reiterate that the number and string handling is inconsistent, and these part methods are more confusing than helpful -- numbers are parsed into the complete value but strings are not, and neither gives you the full text of the value (which may be important if the goal is to parse into an arbitrary-precision value type).
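To spell out what an arbitrary-precision consumer is forced to do today, the handler has to stitch the text back together itself. A sketch, using the callback shapes implied by the calls logged above (an assumption on my part):

    std::string num_;  // text of the number being parsed

    bool on_number_part( json::string_view piece, json::error_code& )
    {
        num_.append( piece.data(), piece.size() );      // "1"
        return true;
    }

    bool on_int64( std::int64_t /*value*/, json::string_view piece, json::error_code& )
    {
        num_.append( piece.data(), piece.size() );      // now "18" -- the full text
        // hand num_ to the arbitrary-precision type here, then reset it
        num_.clear();
        return true;
    }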
I do regard this as a very poor design choice, but not sufficiently so to reject the library.
(I still also regard SAX-style parsing itself as a poor design choice [vs pull parsing], due to poor composability, but that is not a failure of the library itself, except perhaps in offering it at all. But I can understand why.)
Well, yeah, that's exactly what it is. Consider a network server that delivers JSON to clients. We don't want a completion handler to be stuck serializing 100MB of JSON while another client is waiting who only needs 1MB serialized. The way to ensure fairness here is to serialize incrementally with some reasonable buffer size, say 32kb. The light client will finish in ~33 i/o cycles while the heavy client still makes progress.
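Roughly the pattern in question, using the serializer's read/done interface; the exact reset/constructor signature at review time may differ slightly from this sketch:

    void send_json( std::ostream& os, json::value const& jv )
    {
        json::serializer sr;
        sr.reset( &jv );                  // serialize jv incrementally
        while( ! sr.done() )
        {
            char buf[ 32768 ];            // the "reasonable buffer size" above
            os << sr.read( buf );         // emit at most 32kb per i/o cycle
            // a network server would yield to other clients between cycles
        }
    }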
I was assuming that a "deep" JSON (many nested objects) may have a different cost per output character than a wide but shallow one. But it's not a big deal.
To store the comments would mean storing them in the DOM, which would have a significant impact on performance as the space for everything is tightly controlled.
I don't see why. It's just an extra string associated with any value.
FWIW, I don't think it needs to *precisely* round-trip the comments -- it's ok to convert comments to /**/ style (that's mandatory if the output isn't pretty-printed) and to consolidate all comments associated with one value to a single comment that's always output after that value (or perhaps have 'before' and 'after' slots). This should be much more manageable.
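Purely to illustrate the "before"/"after" slots I mean -- nothing like this exists in the library today:

    struct commented_value
    {
        json::value value;
        std::string before;   // emitted as /* ... */ ahead of the value
        std::string after;    // emitted as /* ... */ following the value
    };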
As it stands, the library is completely useless for the case of writing a JSON config file intended for human use. It's ok if you don't want that to be a goal, but this should be made clear up front so that people don't waste time considering the library for use cases it's not intended for. (Though it would be preferable if you did support it.)
The basic_parser examples do not compile out of the box -- with the suggested json.hpp and src.hpp includes there are link errors with basic_parser symbols. Additionally including basic_parser.hpp resolves these but starts complaining about missing max_{key,string,array,object}_size members of the handler type, which are not documented. After these are supplied then it works as expected. I don't see why these should be needed, however.
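Concretely, what I had to add to the handler was a set of static limits along these lines (the values are arbitrary; the undocumented requirement is just that the members exist):

    struct handler
    {
        // required by basic_parser, but not mentioned in the docs I was using:
        static constexpr std::size_t max_object_size = std::size_t(-1);
        static constexpr std::size_t max_array_size  = std::size_t(-1);
        static constexpr std::size_t max_key_size    = std::size_t(-1);
        static constexpr std::size_t max_string_size = std::size_t(-1);

        // ...on_* callbacks as before
    };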
Those max sizes should be documented in the handler exemplar:
<https://master.json.cpp.al/json/ref/boost__json__basic_parser.html#json.ref....
The docs I am referring to are the ones referenced in the original pre-announcement post (although now I notice that the main announcement post has a different link):
http://vinniefalco.github.io/doc/json/json/ref/boost__json__basic_parser.htm...
This is also where the "using numbers" section is entirely blank.
The reason they are required is for optimization. Handling the limit in the basic_parser leads to performance gains.
I don't really understand why. basic_parser doesn't actually store any of the keys/values (especially as it doesn't merge value parts), so it should have no reason to impose any kind of limit. This just seems like it's doing an extra count/comparison that could be removed entirely, and removing it would naturally improve performance.
Unless you're saying that moving the check to basic_parser improves the performance of parser, in which case: you're doing it wrong. Imposing restrictions on all users of basic_parser for something that only parser cares about is the wrong design choice.