On 23/09/2020 12:52, Vinnie Falco wrote:
It is the opposite of an SRP violation. basic_parser is concerned only with parsing, not with keeping track of the full value. The responsibility of tracking full values is moved to the caller. That way, basic_parser does not impose its opinions on how the full value should be stored. This was litigated in earlier posts if I recall (Peter?).
Except it does, because it parses into int/uint/double, which is inherently a storage decision. You might have a valid argument if it left that step to the handler to determine as well (i.e. providing only on_number_part and on_number, with just the text and no attempt to parse it into a numeric type). Making that change would probably make the arbitrary-precision-parsing people happier as well.
The interfaces are built up in layers. At the lowest layer you have basic_parser, which concerns itself only with processing the input buffer and turning it into semantic components. It does _not_ have the responsibility of buffering input to present it as a larger component; that's the job of the handler. basic_parser is exposed to users so they can use their own schemes for dealing with the semantic components.
Above basic_parser, you have parser which uses value_stack. The value_stack does take responsibility for buffering input, and producing DOM values. As with basic_parser, both of these components are exposed to users so they can write their own layers on top.
My point is that this seems like the wrong choice of layers. As it stands, consumers should never care about the basic_parser layer, and instead will actually want a different layer (not currently exposed) that is between the two, which exposes complete values that are not in any DOM. The current form of basic_parser could still exist, if you like, but it should be made an implementation detail and the complete-value basic_parser actually exposed to consumers.
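The intermediate layer being proposed here could be sketched roughly as follows. This is a hypothetical illustration, not anything in the library: the adapter sits between fragment-level callbacks (basic_parser's style) and a downstream consumer that only ever sees complete values. All names here (complete_value_adapter, on_complete_string, printing the *_part callbacks through) are invented for the sketch.

```cpp
#include <string>
#include <string_view>

// Hypothetical adapter: accumulates fragment-level callbacks and
// forwards only complete values to the consumer. Not library API.
template<class Consumer>
struct complete_value_adapter
{
    Consumer consumer_;
    std::string buf_; // accumulates fragments across buffer boundaries

    // Fragment-level callbacks, as basic_parser would deliver them:
    bool on_string_part(std::string_view sv)
    {
        buf_.append(sv);
        return true;
    }

    // Final fragment: the whole value is now available, so hand it on.
    bool on_string(std::string_view sv)
    {
        buf_.append(sv);
        bool ok = consumer_.on_complete_string(buf_);
        buf_.clear();
        return ok;
    }
};

// A consumer that never has to deal with partial values.
struct value_recorder
{
    std::string last;
    bool on_complete_string(std::string const& s)
    {
        last = s;
        return true;
    }
};

// Simulate one string value arriving split across two input buffers.
inline std::string adapter_demo()
{
    complete_value_adapter<value_recorder> a;
    a.on_string_part("frag");
    a.on_string("mented");
    return a.consumer_.last; // the consumer saw one complete value
}
```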
parser.write(&data1[0], data1.size(), ec);
parser.write(&data2[0], data2.size(), ec);
parser.write(&data3[0], data3.size(), ec);
You have to call basic_parser::reset in order to start parsing a new "document."
This is not parsing new documents, but new document fragments. As such it must not reset.
No, it is not a poor design choice; it is the correct one. If basic_parser were to buffer the input, then users who have no need of buffered input (such as value_stack) would be paying for it and getting nothing in return. This interface is driven first and foremost by performance and by the incremental/streaming functionality. If callers want to see the full text, they can append the partial strings to a data member until the final on_string or on_int64 / on_uint64 / on_double arrives, and then interact with it that way. But they are not obliged to do so.
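The recipe described here (append partial strings to a data member, act on the final callback) might look like the following. This is a self-contained sketch, not the library's actual handler concept: the callback names mirror basic_parser's on_string_part / on_string style, but the class and driver are invented for illustration.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical handler illustrating the buffering recipe: partial
// fragments go into a data member; the complete value is consumed
// only in the final on_string callback.
struct accumulating_handler
{
    std::string buf_;                  // scratch buffer for partial text
    std::vector<std::string> strings_; // completed values, for the demo

    // Called once per partial fragment of a string value.
    bool on_string_part(std::string_view sv)
    {
        buf_.append(sv);
        return true;
    }

    // Called with the last fragment; the full value is now available.
    bool on_string(std::string_view sv)
    {
        buf_.append(sv);
        strings_.push_back(buf_); // consume the complete value here
        buf_.clear();
        return true;
    }
};

// Simulate the parser delivering one string value split across
// three input buffers, as happens with incremental write() calls.
inline std::string buffered_parse_demo()
{
    accumulating_handler h;
    h.on_string_part("hel");
    h.on_string_part("lo wo");
    h.on_string("rld");
    return h.strings_.at(0);
}
```

Note that a handler with no interest in full strings simply skips the append, which is exactly the zero-cost property being defended here.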
There should never be any such consumers. Partial values are not useful to anyone. If they exist in the library, that is because things are structured improperly.
If we were to follow your "correct design" to make basic_parser buffer the input, the implementation would be needlessly allocating memory and copying characters.
It has to happen somewhere, regardless. Why are you pushing that responsibility onto the consumer?
And I think this is the right choice. The library makes common things easy, e.g. call parse(). If you need more complexity, you instantiate a `parser` whose API requires a few more steps. If you really want to go low level, you use `basic_parser` which being low level can seem esoteric.
And I think this is okay, because the percentage of users who will use basic_parser is incredibly tiny. The percentage of `parser` users is much greater, while the percentage of users of the parse() free function will be closer to 100% than anything else. In other words, the library favors performance and utility over theoretical interface correctness at the lowest levels, where the number of users is naturally smaller (and their domain expertise naturally higher).
So why provide basic_parser at all, if its interface is deliberately incorrect?