Wed, 23 Sep 2020 at 02:53, Vinnie Falco via Boost wrote:
On Tue, Sep 22, 2020 at 4:47 PM Gavin Lambert via Boost wrote:
I just don't see the point in having an interface that can return half of a value -- this just moves the problem of allocation and keeping track of the full value to the parser handler, which seems like an SRP violation.
It is the opposite of SRP violation. basic_parser is concerned only with parsing and not keeping track of the full value. The responsibility of tracking full values is moved to the caller. That way, basic_parser does not impose its opinions on how the full value should be stored. This was litigated in earlier posts if I recall (Peter?).
I don't feel it is ever sensible to call back with a partially-parsed value (unless I am misunderstanding what these are for, but there is little explanation in the docs, and I wasn't able to get them called at all in my test).
The interfaces are built up in layers. At the lowest layer you have basic_parser, which concerns itself only with processing the input buffer and turning it into semantic components. It does _not_ have the responsibility of buffering input to present it as a larger component; that is the job of the handler. basic_parser is exposed to users so they can use their own schemes for dealing with the semantic components.
Above basic_parser, you have parser which uses value_stack. The value_stack does take responsibility for buffering input, and producing DOM values. As with basic_parser, both of these components are exposed to users so they can write their own layers on top.
On top of parser we have the parse() free functions, which address the most popular use-case of parsing. These functions will satisfy most needs. And for the needs that are not satisfied, users can easily implement their own parse functions with different behaviors, as the library exposes all the intermediate lower levels.
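For the common case it really is a one-liner. A minimal sketch, assuming the header and namespace as in the proposed library:

    #include <boost/json.hpp>

    int main()
    {
        // Default options, DOM result; the throwing overload reports
        // invalid input via an exception.
        boost::json::value jv =
            boost::json::parse(R"({"pi": 3.141, "happy": true})");
    }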
Maybe this explanation is worth putting in the documentation.

Regards,
&rzej;
Right, so this size argument is actually the total number of characters, while the string_view::size is the number of characters in the current piece.
Which is a poor design choice, as I said.
No, the information is already there so we are just passing it to the handler which can use it, or not. If the handler doesn't use it, then it is optimized away.
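For example, a handler that does want the full text can use that running total to reserve once up front. This is only a fragment, and the exact signature is my recollection of the handler requirements, so treat it as an assumption:

    bool on_string_part(string_view s, std::size_t n, error_code& ec)
    {
        buf_.reserve(n);                 // n = total characters of this value so far
        buf_.append(s.data(), s.size()); // s = only the newly parsed piece
        return true;
    }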
    parser.write(&data1[0], data1.size(), ec);
    parser.write(&data2[0], data2.size(), ec);
    parser.write(&data3[0], data3.size(), ec);
You have to call basic_parser::reset in order to start parsing a new "document."
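In other words, several documents over one parser look roughly like this. This is only a sketch; write() is taken from your snippet above, while release() and reset() on parser are my assumption about the spelling, so check the headers:

    error_code ec;
    parser p;
    p.write(&data1[0], data1.size(), ec); // first chunk of document #1
    p.write(&data2[0], data2.size(), ec); // rest of document #1
    json::value jv1 = p.release();        // take the finished DOM

    p.reset();                            // required before the next document
    p.write(&data3[0], data3.size(), ec);
    json::value jv2 = p.release();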
So I reiterate that the number and string handling is inconsistent and these part methods are more confusing than helpful -- numbers are parsed into the complete value but strings are not, and neither gives you the full text of the value (which may be important if the goal is to parse into an arbitrary-precision value type).
I do regard this as a very poor design choice, but not sufficiently so to reject the library.
No, it is not a poor design choice; it is the correct design choice. If basic_parser were to buffer the input, then users who have no need of buffered input (such as value_stack) would be paying for it for nothing. This interface is driven first and foremost by performance and the incremental/streaming functionality. If callers want to see the full text, they can append the partial strings into a data member until the final on_string or on_int64 / on_uint64 / on_double and then interact with it that way. But they are not obliged to do so.
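To make that concrete, here is a rough fragment of such a handler. The callback signatures are written from memory of the handler requirements, so verify them against the header; the remaining required callbacks are omitted:

    struct my_handler
    {
        std::string buf_; // accumulates the pieces of the current string

        bool on_string_part(string_view s, std::size_t, error_code&)
        {
            buf_.append(s.data(), s.size()); // keep the partial text
            return true;
        }

        bool on_string(string_view s, std::size_t, error_code&)
        {
            buf_.append(s.data(), s.size()); // final piece: buf_ now holds the full string
            // ...hand buf_ to an arbitrary-precision type, a database, etc...
            buf_.clear();
            return true;
        }

        // ...the other on_* callbacks required by basic_parser...
    };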
Again I repeat, this is not a poor design choice. json::string for example is perfectly capable of being constructed from two separate buffers of characters (an implementation detail):
< https://github.com/CPPAlliance/json/blob/6ddddfb16f9b54407d153abdda11a29f746...
If we were to follow your "correct design" to make basic_parser buffer the input, the implementation would be needlessly allocating memory and copying characters.
I was assuming that a "deep" JSON (many nested objects) may have a different cost per output character than a wide but shallow one. But it's not a big deal.
Yes, it isn't perfect of course but over time and with a reasonable distribution of possible JSON inputs it averages out to be good enough. Certainly better than not having streaming at all (which would be terrible).
To store the comments would mean storing them in the DOM, which would have a significant impact on performance as the space for everything is tightly controlled.
I don't see why. It's just an extra string associated with any value.
Yes exactly. sizeof(value)==16 on 32bit platforms, and sizeof(value)==24 on 64-bit platforms. When arrays and objects are constructed during parsing, the implementation performs a memcpy after allocation. Every additional byte in the sizeof(value) slows down the creation of arrays and objects. Adding a pointer to value would cost 25% on 32-bit and 33% on 64-bit, and make Boost.JSON unable to ever beat RapidJSON in performance.
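For anyone who wants to check the arithmetic, here is a stand-in. These are not the real json::value layout, just two structs with the sizes quoted above; on 32-bit the extra pointer is 4 bytes on top of 16 (+25%), on 64-bit it is 8 bytes on top of 24 (+33%):

    #include <cstdio>

    struct value_like         { void* p; unsigned long long a; unsigned long long b; };                // 24 bytes on a typical 64-bit target
    struct value_with_comment { void* p; unsigned long long a; unsigned long long b; void* comment; }; // 32 bytes: +8/24 = +33%

    int main()
    {
        std::printf("%zu -> %zu bytes\n",
            sizeof(value_like), sizeof(value_with_comment));
    }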
This just seems like it's doing an extra count/comparison that could be removed entirely, and removing it would naturally improve performance.
If you don't want a limit then just set the max to `size_t(-1)` and the compiler will elide the check completely.
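Concretely, the limits live as constants on the handler, so something like the following should turn them off at compile time. The names are as I recall the handler requirements; treat the exact spellings as an assumption:

    struct my_handler
    {
        // A max of size_t(-1) means "no limit"; a comparison against a
        // known all-ones constant is removed by the optimizer.
        static constexpr std::size_t max_object_size = std::size_t(-1);
        static constexpr std::size_t max_array_size  = std::size_t(-1);
        static constexpr std::size_t max_key_size    = std::size_t(-1);
        static constexpr std::size_t max_string_size = std::size_t(-1);

        // ...on_* callbacks as before...
    };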
Unless you're saying that moving the check to basic_parser improves the performance of parser, in which case: you're doing it wrong. Imposing restrictions on all users of basic_parser for something that only parser cares about is the wrong design choice.
The design of these components balances usability with performance. If you don't care about limits you can just set the max to size_t(-1). And I think this is the right choice. The library makes common things easy, e.g. call parse(). If you need more complexity, you instantiate a `parser`, whose API requires a few more steps. If you really want to go low level, you use `basic_parser`, which, being low level, can seem esoteric.
And I think this is okay, because the percentage of users who will use basic_parser is incredibly tiny. The percentage of `parser` users is much greater. And the percentage of users of the parse() free function will be closer to 100% than anything else. In other words, the library favors performance and utility at the lowest levels over theoretical interface correctness, where the number of users is naturally smaller (and their domain expertise naturally higher).
Regards