On 11/16/19 7:09 PM, Vinnie Falco via Boost wrote:
I am going to propose my JSON library to Boost soon; is there anyone who might be willing and able to function in the role of review manager?
Before we get to that point, there are some questions that the community should address. These days there are a huge number of JSON libraries out there. Some focus on features (e.g. binary JSON or JSONPath), others on performance, and others on user-friendliness.

What competitive edge should a potential Boost JSON library offer? Which use cases should a potential Boost JSON library support? Should the library work seamlessly with standard algorithms? Should it be possible to create parser combinators? Should there be a JSON archive for Boost.Serialization? Should it replace parts of Boost.PropertyTree?

PS: On a procedural note, you need an endorsement of the library before seeking a review manager.
On Sun, Nov 17, 2019 at 3:45 AM Bjorn Reese via Boost
Before we get to that point, there are some questions
Yes, these are good questions. I thought I had answered them, but it could certainly use more explaining (and in the documentation as well).

In terms of parsing and serialization, I don't think there will be any single solution that satisfies all use cases. The emphasis of my JSON library is on the container that represents the JSON in memory. It is designed to be useful as a vocabulary type. This means that if someone wants to write a library that does things with JSON (e.g. implementing JSON-RPC [1]), their public interfaces can confidently use `boost::json::value` [2] in function signatures and declarations, for several reasons:

* `json::value` is SemiRegular
* `json::value` is small (16/24 bytes on 32/64-bit architectures)
* `json::value` behaves predictably with respect to special members (copy, move)
* `json::value` supports custom allocators (without being a template parameter)
* The physical structure of the value types is designed to reduce compilation times:
  - No class templates
  - No configurability (well-defined, predictable behavior)
  - Strong separation of concerns
* The object [3], array [4], and string [5] types, used for the underlying representation of the corresponding JSON kinds, are first-class types
* All operations provide the strong exception safety guarantee

That said, parsing and serialization are still important, and in this area my library is quite competitive in terms of performance. RapidJSON is currently the top-performing library for parsing and serialization using a general-purpose container (I don't count simdjson, which produces a read-only structure). My library outperforms RapidJSON in most cases and completely blows away nlohmann [6].

This library also supports both incremental parsing and incremental serialization using caller-provided buffers, an important use-case for building high-performance network programs. To my knowledge no other JSON library supports this.
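To illustrate what a vocabulary type in a public interface buys you, here is a minimal sketch (the jsonrpc namespace and the call function are invented for the example, not part of any library):

    #include <boost/json.hpp>
    #include <string>

    namespace jsonrpc {

    // Hypothetical JSON-RPC client interface: because json::value is
    // small and SemiRegular, it can be passed and returned by value in
    // public signatures, without template parameters or wrapper types.
    boost::json::value call(
        std::string const& method,
        boost::json::value params );

    } // namespace jsonrpc

Any two libraries written this way can exchange JSON data through the common type, without conversions at the boundary.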
These days there are a huge number of JSON libraries out there. Some focus on features (e.g. binary JSON or JSONPath), others on performance, and others on user-friendliness. What competitive edge should a potential Boost JSON library offer?
A problem that I see with some other libraries is that they attempt to do too much, resulting in poor API designs with no separation of concerns, and long compilation times. Something that I am doing these days, based on things I've learned while maintaining Beast, is to design my new libraries differently:

* Keep the scope narrow: solve one problem and solve it well
* Minimize the use of templates where doing so does not diminish functionality
* Design with modularity in mind: minimize dependencies
* Be mindful of compilation times

This new thinking addresses the most common complaints that users have about Boost libraries. A planned feature is to enable this JSON library to be used without Boost simply by defining BOOST_JSON_STANDALONE. This way stubborn users who refuse to use Boost because Reasons can still enjoy a wonderful JSON library.
Which use cases should a potential Boost JSON library support? Should the library work seamlessly with standard algorithms?
I'm not sure what supporting standard algorithms means, but the object, array, and string containers have interfaces that are identical to their C++20 equivalents (unordered_map, vector, and string), except that for the `object` type:

* The elements are stored contiguously
* Iterators are ordinary pointers, and may become invalidated on insertions and removals
* The order of insertions is preserved, as long as there are no removals
* All inserted values will use the same allocator as the container itself
* All hash, bucket, and node interfaces are removed
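For example, the insertion-order guarantee is observable when iterating (a sketch based on the linked reference documentation; exact member names may differ in the development branch):

    #include <boost/json.hpp>
    #include <iostream>

    int main()
    {
        boost::json::object obj;
        obj["id"] = 42;            // first insertion
        obj["name"] = "John Doe";  // second insertion

        // Visits "id" before "name": insertion order is preserved
        // because the elements are stored contiguously.
        for( auto const& kv : obj )
            std::cout << kv.key() << "\n";
    }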
Should it be possible to create parser combinators? Should there be a JSON archive for Boost.Serialization? Should it replace parts of Boost.PropertyTree?
These are out of scope for my library. If parser combinators are important, they can be developed as a separate library. The same goes for bindings for Boost.Serialization and Boost.PropertyTree. Generally speaking, I think new Boost library offerings need to be more numerous, smaller, modular, and with fewer dependencies from now on. I would like to break up Beast into 4 or 5 individual libraries at some point (the logistics of that being not yet determined).

Thanks

[1] https://www.jsonrpc.org/specification
[2] https://vinniefalco.github.io/doc/json/json/ref/boost__json__value.html
[3] https://vinniefalco.github.io/doc/json/json/ref/boost__json__object.html
[4] https://vinniefalco.github.io/doc/json/json/ref/boost__json__array.html
[5] https://vinniefalco.github.io/doc/json/json/ref/boost__json__string.html
[6] https://vinniefalco.github.io/doc/json/json/benchmarks.html
On Sun, Nov 17, 2019 at 3:38 PM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
[...] RapidJSON is currently the top-performing library for parsing and serialization using a general purpose container (I don't count SIMDJson, which produces a read-only structure). In comparison to RapidJSON, my library outperforms RapidJSON in most cases and completely blows away nlohmann [6].
Looking at your own benchmarks, that's not obvious to me, at least on the parsing side.

[6] https://vinniefalco.github.io/doc/json/json/benchmarks.html

Regarding those benchmarks, could you please:

1) Provide synthetic graphs?

2) Better explain what the benchmark does? Those sizes and durations yield very low throughput numbers, so you're obviously doing the parsing several times in a loop; please add details on that page, and calculate the real MB/s throughput as well. Peak memory would also be of interest.

3) The smallest file parsed is ~600 KB, while some (important, IMHO) use-cases involve much smaller files of just a few bytes or low KBs, but lots of them (thousands, millions). In such cases the constant overhead of setting up the parser and/or instantiating the root value might dominate the parsing time. Would it be possible to test that use case too?

Could you also please explain (or link to) on that page the pros and cons of default vs. block storage for boost::json::value? There seems to be a speed advantage, so what's the catch, since it's not the default?

Thanks for the detailed post and your efforts to propose this to Boost. Might be my first review. --DD
On Mon, Nov 18, 2019 at 1:14 AM Dominique Devienne via Boost
Also peak memory would be of interest.
How do you calculate this?
Could you also please explain (or link to) on that page the pros and cons of default vs. block storage for boost::json::value?
I added this page of exposition: https://vinniefalco.github.io/doc/json/json/usage/storage.html Still working on the benchmarks... Thanks
On Mon, Nov 18, 2019 at 1:14 AM Dominique Devienne via Boost
Regarding those benchmarks, could you please:

1) Provide synthetic graphs?

2) Better explain what the benchmark does? Those sizes and durations yield very low throughput numbers, so you're obviously doing the parsing several times in a loop; please add details on that page, and calculate the real MB/s throughput as well. Peak memory would also be of interest.

3) The smallest file parsed is ~600 KB, while some (important, IMHO) use-cases involve much smaller files of just a few bytes or low KBs, but lots of them (thousands, millions). In such cases the constant overhead of setting up the parser and/or instantiating the root value might dominate the parsing time. Would it be possible to test that use case too?
How about this?

https://vinniefalco.github.io/doc/json/json/benchmarks.html

Thanks
On Wednesday, November 20, 2019 at 18:54 -0800, Vinnie Falco via Boost wrote:
How about this
This is much clearer, IMHO. Some additional remarks:

- You don't specify the hardware you ran the benchmark on. Additionally, a test on an ARM architecture, like a Raspberry Pi, would be nice.
- Could you provide a link to the source code for the benchmarks, as well as the test files? This would make it easier for readers to reproduce (and would also mitigate the first remark).
- The biggest file is 2 MB. A 200 MB file would be a good start to see how the library deals with large files.

Regards,

Julien Blanc
On Wed, Nov 20, 2019 at 11:10 PM Julien Blanc
Some additional remarks: - You don't specify the hardware you ran the benchmark on. Additionally, a test on an ARM architecture, like a Raspberry Pi, would be nice.
I listed my hardware on the page, and I re-collated the results to group them by input file to make comparisons easier. I don't have a Raspberry Pi, so if someone wants to run the bench program and report the results, that would be lovely.
- Could you provide a link to the source code for the benchmarks, as well as the test files?
Yep, the bench program and input files are here: https://github.com/vinniefalco/json/tree/develop/bench
A 200MB file would be a good start to see how the library deals with large files.
I don't have a 200 MB JSON file, and I doubt it would provide much more information than what we already have. I'd also prefer not to put a 200 MB file into the repository. The goal of this library is to be in the same ballpark as RapidJSON; it doesn't try to be the top performer (in fact, RapidJSON achieves its performance by making usability tradeoffs which are unsuited for vocabulary types). However, if someone wants to run the bench program on a 200 MB file, or any other custom inputs, and report the results, that would be great! Thanks
Bjorn Reese wrote:
Should there be a JSON archive for Boost.Serialization?
There are many possible ways to write Boost.Serialization JSON archives, depending on the JSON produced. You could for instance output structs as `{ "x": 1, "y": 2 }`, or `[ "x", 1, "y", 2 ]`, or `[ 1, 2 ]`, or even `{ "size": 3, "data": [ 1, 2 ] }`. The first representation is the most natural and human-editable, but can't be deserialized incrementally because the fields can come in any order. (Arrays can also include the size or not.) Assuming we're talking about the first option, it should be possible to use the proposed JSON library to implement a JSON input archive that reads from a json::value. The output archive doesn't really need a library.
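For context, the first representation lines up with Boost.Serialization's existing name-value-pair mechanism; a sketch (the JSON output archive itself is hypothetical):

    #include <boost/serialization/nvp.hpp>

    struct point
    {
        int x, y;

        template< class Archive >
        void serialize( Archive & ar, unsigned int /*version*/ )
        {
            // The NVP names would become the JSON object keys, so a
            // hypothetical JSON output archive would emit
            // { "x": 1, "y": 2 } for point{ 1, 2 }.
            ar & boost::serialization::make_nvp( "x", x )
               & boost::serialization::make_nvp( "y", y );
        }
    };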
On 11/17/19 7:40 AM, Peter Dimov via Boost wrote:
Bjorn Reese wrote:
Should there be a JSON archive for Boost.Serialization?
There are many possible ways to write Boost.Serialization JSON archives, depending on the JSON produced. You could for instance output structs as `{ "x": 1, "y": 2 }`, or `[ "x", 1, "y", 2 ]`, or `[ 1, 2 ]`, or even `{ "size": 3, "data": [ 1, 2 ] }`.
The first representation is the most natural and human-editable,
Human-editable archives are something people ask for. But it's not really possible in a general way, because archives have to reflect the C++ data structures that they correspond to. If one wants to be able to edit data offline, it would be better to start with a free-standing archive design using something like Google Protocol Buffers. Then you're into the normal tradeoffs regarding such libraries.
but can't be deserialized incrementally ...
Which is pretty much a requirement for a generally functional serialization system.
Assuming we're talking about the first option, it should be possible to use the proposed JSON library to implement a JSON input archive that reads from a json::value. The output archive doesn't really need a library.
Implementing JSON for Boost.Serialization should not be a target requirement for a JSON library which is meant to be simple, efficient, and easy to use. Implementing JSON for Boost.Serialization wouldn't be very hard. I'm surprised that in 15(?) years no one has done it. In fact, no one has even asked for it! Were I to do it, I'd follow the design using Boost.Spirit which has served me well for many years. It's easy to maintain, relatively simple to implement, and efficient enough that no one has complained about it when used for XML. It turned XML serialization (a dumb concept I felt I had to implement) into a non-issue. Robert Ramey
On Sun, Nov 17, 2019 at 8:06 AM Robert Ramey via Boost
Implementing JSON for boost serialization wouldn't be very hard.
Sounds like this "not very hard" implementation using Boost.JSON with Boost.Serialization is a perfect task for you as part of your upcoming review of the library. Thanks
Robert Ramey wrote:
Implementing JSON for boost serialization wouldn't be very hard. I'm surprised that in 15? years no one has done it.
Your second sentence should make you at least question the first one. The problem is as I already outlined; if you want to support field reordering, you need a JSON library like the one proposed, because you have to parse the entire JSON into a `value` and then deserialize from the `value`. If you don't support field reordering, the format would basically only interoperate with itself, which defeats the purpose of using JSON. You might as well just use the text archive.
On 11/17/19 8:57 AM, Peter Dimov via Boost wrote:
Robert Ramey wrote:
Implementing JSON for boost serialization wouldn't be very hard. I'm surprised that in 15? years no one has done it.
Your second sentence should make you at least question the first one.
The problem is as I already outlined; if you want to support field reordering,
Serialization doesn't require reordering. The JSON reader for serialization only needs to be able to read archives created by the JSON writer for serialization, which specifies the order.
If you don't support field reordering, the format would basically only interoperate with itself, which defeats the purpose of using JSON. You might as well just use the text archive.
Exactly. Which is probably why no one ever bothered to write a JSON archive - and exactly the reason that writing an XML version was an unnecessary waste of time.

The idea that it's possible to make an "editable archive" with XML or JSON or anything else is fundamentally false. There is no way that one could map, in the general case, an "edited" archive to a pre-determined C++ data structure. Of course one could conjure up some specific cases where it would be conceivable - but then we would be in the C++ committee trap of engaging in a fool's errand: trying to implement a general idea by specifying a bunch of special cases. Nothing would be served by going there.

If someone wants to make a JSON version of boost serialization, that's fine. But don't think that you can make an implementation which is independent of the C++ data structures being serialized.

Look to a different model. Ideally one would parse JSON to some general C++ format which one can then pick through to retrieve what he wants, or determine that it's not in there. Another way to think about it is to replace Google Protocol Buffers. The latter requires that you make a separate structural syntax, which is a pain. But protocol buffers is even more popular than boost serialization, so I think a Boost JSON parser would succeed.

My personal requirements for such a system would be (see the sketch below):

a) The ability to handle unlimited/unterminated data. As data is read in, an event is triggered when a grammatical element is recognized.

b) Events are implemented by the users (though the library would provide a bunch of defaults). This provides for infinite flexibility, and parallel execution for large datasets.

c) Of course such a system could be implemented in a direct, verifiably correct manner with Boost.Spirit. I don't know what the performance implications would be. Boost XML de-serialization has been done with Spirit (first version) for 15 years and no one has complained about the speed. Whenever anyone complains about the speed of text-based archives, it always goes back to the file system - which presumably would be the case with any JSON library.

Robert Ramey
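A minimal sketch of the event interface described in (a) and (b) might look like this (entirely hypothetical; these names are not from any existing library):

    #include <string_view>

    // Hypothetical push-style event interface: the user implements the
    // callbacks, and the parser fires them as grammatical elements are
    // recognized in the (possibly unterminated) incoming data.
    struct json_events
    {
        virtual ~json_events() = default;
        virtual bool on_object_begin() = 0;
        virtual bool on_object_end() = 0;
        virtual bool on_array_begin() = 0;
        virtual bool on_array_end() = 0;
        virtual bool on_key( std::string_view s ) = 0;
        virtual bool on_string( std::string_view s ) = 0;
        virtual bool on_number( double d ) = 0;
        virtual bool on_bool( bool b ) = 0;
        virtual bool on_null() = 0;
    };

Returning false from a callback could stop the parse early, and the library's bundled defaults would simply build a DOM.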
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, November 17, 2019 6:31 PM, Robert Ramey via Boost
On 11/17/19 8:57 AM, Peter Dimov via Boost wrote:
Robert Ramey wrote:
Implementing JSON for boost serialization wouldn't be very hard. I'm surprised that in 15? years no one has done it.
Your second sentence should make you at least question the first one. The problem is as I already outlined; if you want to support field reordering,
Serialization doesn't require reordering. The JSON reader for serialization only needs to be able to read archives created by the JSON writer for serialization, which specifies the order.
If you don't support field reordering, the format would basically only interoperate with itself, which defeats the purpose of using JSON. You might as well just use the text archive.
Exactly. Which is probably why no one ever bothered to write a JSON archive. Which is exactly the reason that writing an XML version was an unnecessary waste of time.

The idea that it's possible to make an "editable archive" with XML or JSON or anything else is fundamentally false. There is no way that one could map, in the general case, an "edited" archive to a pre-determined C++ data structure. Of course one could conjure up some specific cases where it would be conceivable - but then we would be in the C++ committee trap of engaging in a fool's errand of trying to implement a general idea by specifying a bunch of special cases. Nothing would be served by going there.

If someone wants to make a JSON version of boost serialization that's fine. But don't think that you can make an implementation which is independent of the C++ data structures being serialized.

Look to a different model. Ideally one would parse JSON to some general C++ format which one can then pick through and retrieve what he wants or determine that it's not in there. Another way to think about it is to replace Google Protocol Buffers. The latter requires that you make a separate structural syntax, which is a pain. But protocol buffers is even more popular than boost serialization. So I think a Boost JSON parser would succeed.
This entire section highlights my frustrations with the JSON format and most C++ JSON implementations. The bulk of cases are JSON -> specific data structure. The easiest implementation for JSON objects in C++ stores all fields in a temporary DOM and then does a lookup when mapping to a data structure. I wrote an implementation for msgpack (not fully tested :/) that uses C++ variadic templates to skip the DOM step; almost no one is inspecting arbitrary fields, and a linear search across a hard-coded number of fields in a C++ struct is typically quicker than dynamically managing some kind of map (see the sketch below). And if you don't mind punishing the compiler, I think it's possible to sort the fields automagically in C++14 constexpr for logarithmic lookup. Boost.Serialization cannot support this DOM-less mode without an interface change, unfortunately, and I'm not sure if this type of interface is appropriate for Boost. A quick glance at this proposed library suggests that it should be possible to write a new parser with a different interface that leverages the same backend parsing code (something which should be possible with any SAX implementation, I think).
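A rough, hand-written illustration of that DOM-less mapping (all names here are invented; the variadic-template version would generate the equivalent of this dispatch at compile time):

    #include <cstdint>
    #include <string>
    #include <string_view>

    struct customer
    {
        std::uint64_t id;
        std::string name;
    };

    // Invoked by a SAX-style parser for each key/value pair it sees.
    // A linear scan over a hard-coded field list; no DOM is built.
    bool assign_field( customer& c,
        std::string_view key, std::string_view value )
    {
        if( key == "id" )
        {
            c.id = std::stoull( std::string( value ) );
            return true;
        }
        if( key == "name" )
        {
            c.name = std::string( value );
            return true;
        }
        return false; // unknown field: ignore, or report an error
    }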
My personal requirements for such a system would be:
a) ability to handle unlimited/unterminated data. As data is read in - an event is triggered when a grammatical element is recognized.
b) Events are implemented by the users (though the library would provide a bunch of defaults). This provides for infinite flexibility, parallel execution for large datasets.
c) Of course such a system could be implemented in a direct, verifiably correct manner with boost spirit. I don't know what the performance implications would be. Boost XML de-serialization has been done by spirit (first version) for 15 years and no one has complained about the speed. Whenever anyone complains about the speed of text based archives it always goes back to the file system - which presumably would be the case with any JSON library.
Robert Ramey
Lee
On Sat, Nov 23, 2019 at 10:37 AM Lee Clagett via Boost
The bulk of cases are JSON -> specific data structure. The easiest implementation for JSON objects in C++ is storing all fields to a temporary DOM and then doing a lookup when mapping to a data structure.
This is an important use-case, but there are several libraries out there which support this. The innovation in Boost.JSON is providing a vocabulary type for the DOM that uses a discriminated union (the `boost::json::value`). This way folks can build up layers of abstraction using a common interface. Boost.JSON also performs well, which is a bonus. And it is the first library to support incremental parsing and serialization, which makes it a natural choice for network programs.

Let's say you have this JSON:

    {
        "id" : 42,
        "name" : "John Doe",
        "invoices" : [ 1001, 1002, 1005, 1007 ]
    }

The output of the parser is a json::value which holds the result. The top level value is an object, with 3 elements. Because json::value is SemiRegular and has a sane API, you could easily turn the parsed DOM into this structure:

    struct customer
    {
        std::uint64_t id;
        json::string name;
        json::array invoices;

        explicit customer( json::value&& jv )
            : id( jv.as_object().at( "id" ).as_uint64() )
            , name( std::move( jv.as_object().at( "name" ).as_string() ) )
            , invoices( std::move( jv.as_object().at( "invoices" ).as_array() ) )
        {
        }
    };

The customer constructor would not allocate any memory at all; instead, ownership would be transferred via noexcept move construction (note that the constructor takes the value by rvalue reference so the string and array can actually be moved from).

Remember that the thesis of this library is "no single JSON library can or should satisfy all use-cases." What I have done (hopefully) is identified a common use-case that benefits from some form of standardization, to provide a vocabulary type for JSON. Thanks
On Saturday, November 23, 2019 at 11:11 -0800, Vinnie Falco via Boost wrote:
On Sat, Nov 23, 2019 at 10:37 AM Lee Clagett via Boost
wrote: The bulk of cases are JSON -> specific data structure. The easiest implementation for JSON objects in C++ is storing all fields to a temporary DOM and then doing a lookup when mapping to a data structure.
This is an important use-case, but there are several libraries out there which support this. The innovation in Boost.JSON is providing a vocabulary type for the DOM that uses the discriminated union (the `boost::json::value`).
I'm not sure why you say this is an innovation. Maybe I missed something, but I don't see this as much different from https://doc.qt.io/qt-5/qjsonvalue.html , which has been around for years.
This way folks can build up layers of abstraction using a common interface. Boost.JSON also performs well, which is a bonus. And it is the first library to support incremental parsing and serialization, which makes it a natural choice for network programs.
This is, IMHO, the major feature in boost.json. It was in fact the main reason I started writing my own JSON library some years ago.
Lets say you have this JSON:
{ "id" : 42, "name" : "John Doe", "invoices" : [ 1001, 1002, 1005, 1007 ] }
The output of the parser is a json::value which holds the result. The top level value is an object, with 3 elements. Because json::value is SemiRegular and has a sane API, you could easily turn the parsed DOM into this structure:
While it is OK for most use cases, if we are talking about performance you probably don't want the intermediate DOM structure. You want a deserializer that just directly creates a customer structure without any intermediary, and you'll just use a std::vector<std::uint64_t> for invoices and a std::string for name. The same goes for the serializer. But the nice thing is that, from what I see, this is entirely possible to do with the current boost.json (I didn't check the serializer part, though).
Remember that the thesis of this library is "no single JSON library can or should satisfy all use-cases." What I have done (hopefully) is identified a common use-case that benefits from some form of standardization, to provide a vocabulary type for a JSON.
Using JSON types directly is fine for prototyping, for quickly written code that is not supposed to last for years. There is a market for this; it is useful. However, I would not rely on it for any code that is supposed to last or be reused among projects. Regards, Julien
On Sun, Nov 24, 2019 at 12:26 AM Julien Blanc
I'm not sure why you say this is an innovation. Maybe I missed something, but I don't see this as much different from https://doc.qt.io/qt-5/qjsonvalue.html , which has been around for years.
"Not much different" is still different, and those differences are enough to disqualify QtJsonValue from being a vocabulary type: * QtJsonValue does not support allocators * QtJsonValue has an "undefined" state[1], reminiscent of valueless variants. * QtJsonValue is part of a huge application framework. This alone is enough to make it unsuitable as a vocabulary type: No one is writing libraries that depend on Qt (imagine a Boost library being proposed that required Qt).
While it is ok for most use cases
Yes and these are the use cases which my library addresses.
But the nice thing is that from what i see, it is entirely possible to do with current boost.json (didn't check about the serializer part, though).
I don't think so; the parser in boost.json is a SAX parser, which is somewhat inconvenient for parsing directly into user-defined types. It could be adapted into a generator using coroutines, however.
Using json types directly is fine for prototyping, for quickly written code that is not supposed to last for years. There is a market for this, this is useful. However, i would not rely on it for any code that is supposed to last or be reused amoung projects.
Counterexample: https://github.com/ripple/rippled/blob/232975bfdbde12a65499130d78f938f261c98... There are others. Thanks
On Sunday, November 24, 2019 at 07:39 -0800, Vinnie Falco wrote:
* QtJsonValue is part of a huge application framework. This alone is enough to make it unsuitable as a vocabulary type: No one is writing libraries that depend on Qt (imagine a Boost library being proposed that required Qt).
While the last part of the sentence is true in the sense that it is really unlikely to happen (and thus we need a boost::json::value), the first part is obviously completely wrong. See https://github.com/fargies/qjsonrpc as a counterexample.
Counterexample:
< https://github.com/ripple/rippled/blob/232975bfdbde12a65499130d78f938f261c98...
It seems I did not express myself correctly. What I wanted to say is that I don't want to use a JSON type in the core part (the "business logic") of a program. Obviously I'll use one in any serialization adapter / RPC part (which is the case in your example) if needed. I still hope, though, that a solution giving the same level of abstraction as .NET Core, where you expose standard types and the framework takes care of all the serialization / deserialization, is doable in pure C++ (I made some experiments there back in the day; it was fun but very hard to maintain and very limited in functionality). Regards, Julien
On 2019-11-24 16:39, Vinnie Falco via Boost wrote:
I don't think so, the parser in boost.json is a SAX parser which is somewhat inconvenient for parsing directly into user defined types. It could be adapted into a generator using coroutines however.
Your parser currently uses stack variables, so the adaptor would have to use stackful coroutines, which would incur two context switches on every iteration.
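The usual shape of such an adaptor, for illustration (a sketch using Boost.Coroutine2; the token type and the push_parse entry point are hypothetical):

    #include <boost/coroutine2/all.hpp>
    #include <functional>
    #include <string_view>

    struct token { /* kind, text, ... */ };

    // Hypothetical SAX entry point: invokes the callback per token.
    void push_parse( std::string_view input,
        std::function< void( token ) > const& on_token );

    using pull_type = boost::coroutines2::coroutine< token >::pull_type;
    using push_type = boost::coroutines2::coroutine< token >::push_type;

    // Wrap a push (SAX) parser in a stackful coroutine to obtain a
    // pull interface: each token retrieved costs one context switch
    // into the coroutine and another one back out.
    pull_type make_token_stream( std::string_view input )
    {
        return pull_type(
            [input]( push_type& yield )
            {
                push_parse( input,
                    [&]( token t ) { yield( t ); } );
            } );
    }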
On 11/23/19 7:18 PM, Lee Clagett via Boost wrote:
This entire section highlights my frustrations with the JSON format and most C++ JSON implementations. The bulk of cases are JSON -> specific data structure. The easiest implementation for JSON objects in C++ stores all fields in a temporary DOM and then does a lookup when mapping to a data structure. I wrote an implementation for msgpack (not fully tested :/) that uses C++ variadic templates to skip the DOM step;
I did this for a related binary format called BinToken: https://github.com/breese/trial.protocol/tree/develop/include/trial/protocol... It supports low-level iteration over the binary format, serialization directly into C++ data structures, and DOM parsing. I used to have MsgPack (and UBJSON) implementations as well, but I did not upgrade them after a major redesign because I had no personal use for them.
On 11/17/19 7:31 PM, Robert Ramey via Boost wrote:
If someone wants to make a JSON version of boost serialization that's fine. But don't think that you can make an implementation which is independent of the C++ data structures being serialized.
It is a matter of serialization overloads. The ones provided by Boost.Serialization are free functions, where you cannot do partial specialization, so I had to forward everything to a struct, as described here: http://breese.github.io/2015/12/20/partiality-for-functions.html If you want a JSON format with a different layout of JSON Objects, then you simply create and include a different set of overloads. The JSON archive remains the same.
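The forwarding trick, in brief (a sketch following the linked article; the names are illustrative):

    #include <vector>

    // Free function templates cannot be partially specialized, so the
    // free function forwards to a class template, which can be.
    template< typename Archive, typename T >
    struct saver
    {
        static void save( Archive& ar, T const& t ); // primary template
    };

    // Partial specialization covering every std::vector<U> at once:
    template< typename Archive, typename U >
    struct saver< Archive, std::vector<U> >
    {
        static void save( Archive& ar, std::vector<U> const& v )
        {
            for( auto const& e : v )
                saver< Archive, U >::save( ar, e );
        }
    };

    // The user-facing free function simply forwards:
    template< typename Archive, typename T >
    void save( Archive& ar, T const& t )
    {
        saver< Archive, T >::save( ar, t );
    }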
On Sun, Nov 17, 2019 at 11:06 AM Robert Ramey via Boost
On 11/17/19 7:40 AM, Peter Dimov via Boost wrote:
Bjorn Reese wrote:
Should there be a JSON archive for Boost.Serialization?
There are many possible ways to write Boost.Serialization JSON archives, depending on the JSON produced. You could for instance output structs as `{ "x": 1, "y": 2 }`, or `[ "x", 1, "y", 2 ]`, or `[ 1, 2 ]`, or even `{ "size": 3, "data": [ 1, 2 ] }`.
The first representation is the most natural and human-editable,
Human-editable archives are something people ask for. But it's not really possible in a general way, because archives have to reflect the C++ data structures that they correspond to. If one wants to be able to edit data offline, it would be better to start with a free-standing archive design using something like Google Protocol Buffers. Then you're into the normal tradeoffs regarding such libraries.
Since you already support JSON, if you supported YAML as well, you would satisfy the human editable request. - Jim
On 11/17/19 11:41 AM, James E. King III via Boost wrote:
On Sun, Nov 17, 2019 at 11:06 AM Robert Ramey via Boost
Human-editable archives are something people ask for. But it's not really possible in a general way, because archives have to reflect the C++ data structures that they correspond to. If one wants to be able to edit data offline, it would be better to start with a free-standing archive design using something like Google Protocol Buffers. Then you're into the normal tradeoffs regarding such libraries.
Since you already support JSON, if you supported YAML as well, you would satisfy the human editable request.
Hmmm - did you read the above? Of course serialization could support JSON, YAML, or even some version of the English language. But this would not mean that the archive is editable, unless it's write-only. The problem is that the archive contents are strictly dependent upon the C++ data structures being serialized. If you edit the archive in a general way, that is, not necessarily preserving the original structure, it won't map to the C++ data structure. So what would be the point? If you restrict your editing to some specific requirements - like just changing some data values - it would be pretty complex to specify and enforce which kinds of editing are allowed. Much easier would be to just write a general-purpose JSON parser which the user can one way or another map to his C++ data structure.
- Jim
On 11/17/19 11:58 AM, Robert Ramey via Boost wrote:
On 11/17/19 11:41 AM, James E. King III via Boost wrote:
On Sun, Nov 17, 2019 at 11:06 AM Robert Ramey via Boost
Hmmm - did you read the above? Of course serialization could support JSON, YAML, or even some version of the English language. But this would not mean that the archive is editable unless it's write-only. The problem is that the archive contents are strictly dependent upon the C++ data structures being serialized. [...]

Off topic - nothing to do with parsing - but interesting and sort of related.
Write-only archives in fact do have a usage. Suppose you want to create some sort of log - debug, transaction, etc. It's a pain to include all that formatting code in your app - especially since you're not usually using it. And you've mixed the formatting into your program, creating a maintenance PITA. It's worse since the minute you do that, everyone and his brother will want a different format, or output for the screen or PDF or ... Solution - a write-only XML archive! Create the XML archive and write to it in the normal serialization way. Then let every user use his own XSLT script to transform it into DocBook or whatever, then use docbook->html, or docbook->pdf, or -> ... to produce his desired output. Actually, since the archive is write-only, you could also edit it using your favorite XML editor. This is rich territory that so far no one (besides myself) has ever mined. Ironically, the one person who thought it was a waste of time to invest effort in an XML archive (me) is likely the only person who ever found a useful purpose for it (besides debugging). Robert Ramey
On 11/17/19 5:05 PM, Robert Ramey via Boost wrote:
Implementing JSON for boost serialization wouldn't be very hard. I'm surprised that in 15? years no one has done it. In fact no one has even
This is news to those of us who have used them for years:

http://breese.github.io/trial/protocol/trial_protocol/json/user_guide/serial...

http://cppcms.com/wikipp/en/page/cppcms_1x_serialization

both of which have been mentioned several times on this mailing list. Not to mention all those who had to resort to using Boost.PropertyTree for JSON serialization. Or the GSoC 2013 JSON proposal...
asked for it! Were I to do it, I'd follow the design using Boost.Spirit which has served me well for many years. It's easy to maintain, relatively simple to implement, and efficient enough that no one has complained about it when used for XML. It turned XML serialization (a dumb concept I felt I had to implement) into a non-issue.
The Boost.Spirit approach works for XML because you have effectively created a pull parser, which is the approach I have been arguing for. I have yet to see a push parser (which Vinnie's parser is) work for serialization.
On 11/24/19 3:59 AM, Bjorn Reese via Boost wrote:
On 11/17/19 5:05 PM, Robert Ramey via Boost wrote:
Implementing JSON for boost serialization wouldn't be very hard. I'm surprised that in 15? years no one has done it. In fact no one has even
This is news to those of us who have used them for years:
http://breese.github.io/trial/protocol/trial_protocol/json/user_guide/serial...
http://cppcms.com/wikipp/en/page/cppcms_1x_serialization
both of which have been mentioned several times on this mailing-list.
Hmmmm- I'm not seeing how this relates to my point.
Not to mention all those who had to resort to using Boost.PropertyTree for JSON serialization.

Or the GSoC 2013 JSON proposal...
asked for it! Were I to do it, I'd follow the design using Boost.Spirit which has served me well for many years. It's easy to maintain, relatively simple to implement, and efficient enough that no one has complained about it when used for XML. It turned XML serialization (a dumb concept I felt I had to implement) into a non-issue.
The Boost.Spirit approach works for XML because you have effectively created a pull parser, which is the approach I have been arguing for.
So we're in agreement?
I have yet to see a push parser (which Vinnie's parser is) work for serialization.
Right. I don't think it can. So we're in agreement again? It's hard to tell. Robert Ramey

Just to reiterate, my points are:

a) I don't think it would be hard to make a JSON version of a serialization archive class using the XML archive as a model.

b) As far as I know, no one has done it.

c) As far as I can recall, no one has asked me for it. If they have, I likely responded with a) above.

RR
On 2019-11-24 17:00, Robert Ramey via Boost wrote:
Hmmmm- I'm not seeing how this relates to my point.
I am agreeing that JSON serialization is not very hard; I was responding to your "no one has done it" comment.
Just to re-iterate: My points are:
a) I don't think it would be hard to make a JSON version of a serialization archive class using the XML archive as a model.
Agreed.
b) As far as I know, no one has done it.
Disagree. Please, please, please click on the following link: https://github.com/breese/trial.protocol/tree/develop/include/trial/protocol...
On 11/30/19 7:39 AM, Bjorn Reese via Boost wrote:
On 2019-11-24 17:00, Robert Ramey via Boost wrote:
Hmmmm- I'm not seeing how this relates to my point.
I am agreeing the JSON serialization is not very hard, I was responding to your "no one has done it" comment.
Just to re-iterate: My points are:
a) I don't think it would be hard to make a JSON version of a serialization archive class using the XML archive as a model.
Agreed.
b) As far as I know, no one has done it.
Disagree. Please, please, please click on the following link:
https://github.com/breese/trial.protocol/tree/develop/include/trial/protocol...
I stand corrected!
On 11/30/19 7:39 AM, Bjorn Reese via Boost wrote:
On 2019-11-24 17:00, Robert Ramey via Boost wrote:
Hmmmm- I'm not seeing how this relates to my point.
I am agreeing that JSON serialization is not very hard; I was responding to your "no one has done it" comment.
Just to re-iterate: My points are:
a) I don't think it would be hard to make a JSON version of a serialization archive class using the XML archive as a model.
Agreed.
b) As far as I know, no one has done it.
Disagree. Please, please, please click on the following link:
https://github.com/breese/trial.protocol/tree/develop/include/trial/protocol...
I took a cursory look at this and it seems interesting; I had never seen it before. It should be easy to:

a) copy some files over to the serialization library - ojson_archive and ijson_archive
b) tweak the test setup to include these "new" archive classes
c) run the whole serialization library test suite - which is quite a bit
d) with the jam setup, restrict tests to only one specific archive (or even one specific test), so running it locally on one's machine is quite practical

If it passes everything, has any required additional documentation (very little), is truly a drop-in replacement for any other archive, meets the "Ramey coding standards" (not too anal), and you're willing to support complaints from users of the JSON archive (not really that bad), I'll be happy to add it to the list of included archives along with XML, et al. This will make you an official Boost developer (if you aren't already) without going through most of the Boost agony. If you don't want to do it but someone else does, the same offer would apply to them. Robert Ramey
On Sat, Nov 30, 2019 at 9:08 AM Robert Ramey via Boost
ojson_archive and ijson_archive
I looked at the code, but it was above my pay grade of understanding. Can someone please tell me how this Boost.Serialization integration of the JSON format handles the fact that keys in objects are unordered? Thanks
On 11/30/19 9:15 AM, Vinnie Falco via Boost wrote:
On Sat, Nov 30, 2019 at 9:08 AM Robert Ramey via Boost
wrote: ojson_archive and ijson_archive
I looked at the code, but it was above my pay grade of understanding. Can someone please tell me how this Boost.Serialization integration of the JSON format handles the fact that keys in objects are unordered?
The sequence of elements in a serialization archive is determined by the C++ data structure definitions, so the sequence of elements in the input archive is pre-determined. Serialization does not load arbitrary data into a specified data structure; that would be impossible. It loads data formatted by the output archive class with code from a compatible input archive class. This is why a JSON parser optimized for serialization would never be a good choice as an arbitrary JSON parser, and a general-purpose JSON parser would not be compatible with the serialization library. Robert Ramey
Thanks
On 2019-11-30 18:15, Vinnie Falco via Boost wrote:
Can someone please tell me how this Boost.Serialization integration of the JSON format handles the fact that keys in objects are unordered?
That is up to the user. The iarchive inserts the deserialized data into the container that the user specifies. If the user deserializes the input into a std::map then the data will become ordered. The user can deserialize into another container if they wish to retain the unordered nature of the data. That is how Boost.Serialization works. The std::map serialization is done in this file: https://github.com/breese/trial.protocol/blob/develop/include/trial/protocol... PS: I will be unable to respond in a timely manner to your other mail due to a business trip.
On Sat, Nov 30, 2019 at 10:29 AM Bjorn Reese via Boost
If the user deserializes the input into a std::map then the data will become ordered. The user can deserialize into another container if they wish to retain the unordered nature of the data.
So if I want to go from a JSON archive to my user-defined struct T, using Boost.Serialization, then the path is from JSON to a map/unordered_map, and then to T? Thanks
On Sat, Nov 30, 2019 at 10:29 AM Bjorn Reese via Boost
wrote: ...
There's a lot of discussion going on here which looks awfully like the kind of back and forth that would happen during a review. Except that this is not a review, and there is no review manager. Who is volunteering to act as the review manager for Boost.JSON? Thanks
On 11/30/19 10:35 AM, Vinnie Falco via Boost wrote:
On Sat, Nov 30, 2019 at 10:29 AM Bjorn Reese via Boost
wrote: If the user deserializes the input into a std::map then the data will become ordered. The user can deserialize into another container if they wish to retain the unordered nature of the data.
So if I want to go from a JSON archive to my user-defined struct T, using Boost.Serialization, then the path is from JSON to a map/unordered_map, and then to T?
No. An ijson_archive is intimately connected to the C++ data structures which were used to produce the archive via the ojson_archive class. They are a symmetric pair. So given your user-defined struct T, you can produce a compatible archive using ojson_archive - and nothing else. Then you can load that archive into another instance of your user-defined T - but nothing else. Your user-defined T can contain any serializable type or inherit from any serializable type. For better or worse, the definition of "serializable type" embraces a large majority of all C++ built-in types, library types, and any other user type which has a serialize function. Robert Ramey
Thanks
On Sat, Nov 30, 2019 at 12:05 PM Robert Ramey via Boost
So if I want to go from a JSON archive to my user-defined struct T, using Boost.Serialization, then the path is from JSON to a map/unordered_map, and then to T?
No.
I was asking Bjorn, because he brought up using std::map or std::unordered_map as an intermediate step.
So given your user-defined struct T, you can produce a compatible archive using ojson_archive - and nothing else. Then you can load that archive in to another instance of your user-defined T - but nothing else.
What if I produce a JSON archive for my T, then edit the JSON and change the order of the keys, and then try to load it back in to a T? Thanks
On 11/30/19 12:26 PM, Vinnie Falco via Boost wrote:
On Sat, Nov 30, 2019 at 12:05 PM Robert Ramey via Boost
wrote: So if I want to go from a JSON archive to my user-defined struct T, using Boost.Serialization, then the path is from JSON to a map/unordered_map, and then to T?
No.
I was asking Bjorn, because he brought up using std::map or std::unordered_map as an intermediate step.
LOL - I'm not allowed to chime in when I want?
So given your user-defined struct T, you can produce a compatible archive using ojson_archive - and nothing else. Then you can load that archive in to another instance of your user-defined T - but nothing else.
What if I produce a JSON archive for my T, then edit the JSON and change the order of the keys, and then try to load it back in to a T?
In general, you can't do this; it's not supported. If this is what you want to do, you need to use another library such as Google Protocol Buffers. But then you have to specify the data syntax by hand and graft on code to transform your T back and forth between the two. This would be the use case for your library, as opposed to serialization. Robert Ramey
Thanks
On 2019-11-30 21:26, Vinnie Falco via Boost wrote:
I was asking Bjorn, because he brought up using std::map or std::unordered_map as an intermediate step.
I did not. You asked about the ordering of JSON Object deserialization, and I answered that it depends on which type you deserialize to.
What if I produce a JSON archive for my T, then edit the JSON and change the order of the keys, and then try to load it back in to a T?
That depends on how T stores the key-value pairs.
On 2019-11-30 19:35, Vinnie Falco wrote:
So if I want to go from a JSON archive to my user-defined struct T, using Boost.Serialization, then the path is from JSON to a map/unordered_map, and then to T?
No. The path is directly from JSON to T, where the user decides what T is. So T could be a map, an unordered_map, or a vector of pairs if the user wishes to preserve the insertion order. In lieu of C++ reflection, Boost.Serialization has to be told how to serialize T, but there is support for some standard containers.
On 11/17/19 3:38 PM, Vinnie Falco wrote:
Yes these are good questions. I thought I had answered them but it could certainly use more explaining (and in the documentation as well).
Indeed you did, but thank you for succinctly reiterating them anyway. My question was more directed towards the community as a whole, to find out what they expect of a JSON library.
In terms of parsing and serialization I don't think there will be any single solution that will satisfy all use cases. The emphasis of my JSON library is on the container that represents the JSON in memory. It is designed to be useful as a vocabulary type. This means that if
Been there, done that years ago: http://breese.github.io/trial/protocol/trial_protocol/dynamic_variable.html and contributed heavily to its prequel: https://github.com/ferruccio/dynamic-cpp
* `json::value` is small (16/24 bytes on 32/64 bit architecture)
So most (all?) values are stored on the heap?
This library also supports both incremental parsing and incremental serialization using caller-provided buffers, an important use-case for building high performing network programs. To my knowledge no other JSON library supports this.
Have been doing this for years: http://breese.github.io/trial/protocol/trial_protocol/core.html
These are out of scope for my library. If parser combinators are important, they can be developed as a separate library. The same goes for bindings for Boost.Serialization and Boost.PropertyTree. Generally speaking, I think new Boost library offerings need to be more numerous, smaller, modular, and with fewer dependencies from now on. I
The reason I am asking these questions is that your current design may not be suitable for making these extensions. You really should consider building a Boost.Serialization input archive to investigate whether your design holds. If your design becomes a Boost library, then there will be very little incentive to include yet another JSON library to handle the remaining use cases. That is why I am asking these questions up-front. Notice that with the right design we can support all of these use cases without making the library more complex.
On Sun, Nov 24, 2019 at 3:53 AM Bjorn Reese via Boost
This library also supports both incremental parsing and incremental serialization using caller-provided buffers, an important use-case for building high performing network programs. To my knowledge no other JSON library supports this.
Have been doing this for years:
http://breese.github.io/trial/protocol/trial_protocol/core.html
I'm not seeing where trial.protocol has incremental algorithms; perhaps you can show me? trial::protocol::json::basic_reader constructs with the complete input:

https://github.com/breese/trial.protocol/blob/4bdf90747944f24b61aa9dbde92d8f...

There is no API to provide additional buffers. By "incremental" I mean an "online algorithm", i.e. the entire input does not need to be presented at once. For example, this is what it might look like using boost.json to incrementally parse a JSON from a socket:

    json::value
    parse( net::ip::tcp::socket& sock )
    {
        error_code ec;
        json::parser p;
        p.start();
        for(;;)
        {
            char buf[4096];
            auto const n = sock.read_some(
                net::mutable_buffer(buf, sizeof(buf)), ec);
            if(ec == net::error::eof)
                break;
            if(ec)
                throw system_error(ec);
            p.write(buf, n, ec);
            if(ec)
                throw system_error(ec);
        }
        p.finish();
        return p.release();
    }

Serialization functions similarly. The caller provides a buffer, and the implementation attempts to fill the buffer with serialized JSON. If the buffer is not big enough, subsequent calls may be made to retrieve the rest of the serialized output.
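Incremental serialization would be the mirror image; a sketch (assuming a serializer type with reset/done/read members; the exact names in the development branch may differ):

    void send( net::ip::tcp::socket& sock, json::value const& jv )
    {
        json::serializer sr;
        sr.reset( &jv );               // bind the value to be serialized
        while( ! sr.done() )
        {
            char buf[4096];
            // Fill the caller-provided buffer with as much serialized
            // JSON as fits; the rest is produced on subsequent calls.
            auto const sv = sr.read( buf, sizeof(buf) );
            net::write( sock, net::const_buffer( sv.data(), sv.size() ) );
        }
    }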
So most (all?) values are stored on the heap?
json::object, json::array, and json::string use dynamic allocations to store elements. The string has a small buffer optimization.
Notice that with the right design we can support all of these use cases without making the library more complex.
Respectfully, I disagree. The parser/serializer is what it is, designed to be optimal for the intended use case, which is going to and from the DOM (the `json::value`). Perhaps there is another "right design" which makes a different set of tradeoffs, but that can be the subject of a different library.
If your design becomes a Boost library, then there will be very little incentive to include yet another JSON library to handle the remaining use cases. That is why I am asking these questions up-front.
I disagree. Again the central premise here is that there is no ideal JSON library which can suit all needs. I believe this is why Boost does not yet have a JSON library. Optimizing one use-case necessarily comes at the expense of others. At the very least, the inversion of the flow of control (i.e. a parser which returns a token at a time) advantages one use case and disadvantages others (such as working as an online algorithm). There are other tradeoffs. Because my library addresses a narrow set of uses, there should be more than enough incentive for other pioneers to step in and fill the leftover gaps. And they can do so knowing that they do not need to be everything to everyone. Regards
On Sun, Nov 24, 2019 at 3:53 AM Bjorn Reese wrote:
If your design becomes a Boost library, then there will be very little incentive to include yet another JSON library to handle the remaining use case
As someone who is interested in a JSON library that handles some of those other use cases, I hope nobody feels that this is the case. That is, there would be room for another library in Boost. If this library is accepted as "Boost.Json", that library could even be "Boost.Json2". Glen
On Sunday, November 24, 2019 at 13:08, Glen Fernandes via Boost <boost@lists.boost.org> wrote:
On Sun, Nov 24, 2019 at 3:53 AM Bjorn Reese wrote:
If your design becomes a Boost library, then there will be very little incentive to include yet another JSON library to handle the remaining use case
As someone who is interested in a JSON library that handles some of those other use cases, I hope nobody feels that this is the case.
i.e. There would be room for another library in Boost. If this library is accepted as "Boost.Json", that library could even be "Boost.Json2".
Accepting a library while planning to accept/review a second one to do the same thing is a sign of rushed design. An unintentional side-effect would be to move the focus of brilliant programmers to spending their time circumventing the limitations of the first design, rather than actually helping the task force interested in discovering a fundamental design. -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On Mon, Nov 25, 2019 at 8:12 AM Vinícius dos Santos Oliveira via Boost
the task force interested in discovering a fundamental design.
I think there has been ample time for "discovering a fundamental design," since there hasn't been a JSON library proposed to Boost for over 10 years. The premise of my library is simple: "No single JSON design can address all use-cases in terms of both functionality and performance." Therefore, what I have done in Boost.JSON is to address a specific use-case and optimize for that case. In particular, my library is fantastic for network programs which prefer to interact with JSON through a DOM (the boost::json::value type) to leverage the benefits of ad-hoc prototyping. Perhaps there is a "fundamental design" remaining to be discovered, which is easy to use and does everything my library does, and also everything else (such as going directly to and from user-defined types), and also achieves the same level of performance. But I have not seen it, and from my early prototyping experiments I suspect that the goals conflict with each other and such a design is not possible, except for the trivial solution of having what is really two (or more) libraries in one. If there is evidence of this fundamental design, please bring it to the list's attention. Thanks
On Mon, Nov 25, 2019 at 8:12 AM Vinícius dos Santos Oliveira via Boost
Accepting a library while planning to accept/review a second one to do the same thing is a sign of rushed design.
Glen is not proposing to accept a second library that does the same thing. He is proposing to accept a second library that does a different thing. There is precedent for this: Boost.Variant, Boost.Variant2. Both exist, are different, and are used. They satisfy different sets of users. Regards
On Sunday, November 24, 2019 at 12:30, Vinnie Falco via Boost <boost@lists.boost.org> wrote:
I'm not seeing where trial.protocol has incremental algorithms, perhaps you can show me? trial::protocol::json::basic_reader constructs with the complete input:
< https://github.com/breese/trial.protocol/blob/4bdf90747944f24b61aa9dbde92d8f...
There is no API to provide additional buffers. By "incremental" I mean an "online algorithm", i.e. the entire input does not need to be presented at once. For example, this is what it might look like using boost.json to incrementally parse a JSON from a socket:
    json::value
    parse( net::ip::tcp::socket& sock )
    {
        error_code ec;
        json::parser p;
        p.start();
        for(;;)
        {
            char buf[4096];
            auto const n = sock.read_some(
                net::mutable_buffer(buf, sizeof(buf)), ec);
            if(ec == net::error::eof)
                break;
            if(ec)
                throw system_error(ec);
            p.write(buf, n, ec);
            if(ec)
                throw system_error(ec);
        }
        p.finish();
        return p.release();
    }
Serialization functions similarly. The caller provides a buffer, and the implementation attempts to fill the buffer with serialized JSON. If the buffer is not big enough, subsequent calls may be made to retrieve the rest of the serialized output.
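For symmetry with the parsing example, a sketch of the serializing loop might look like this. I am using reset/read/done member names here; treat the exact signatures as illustrative rather than final:

    void send( net::ip::tcp::socket& sock, json::value const& jv )
    {
        json::serializer sr;
        sr.reset( &jv );                // bind the serializer to the value
        while( ! sr.done() )
        {
            char buf[4096];
            // fill as much of buf as possible with serialized JSON
            // (member names illustrative)
            auto const sv = sr.read( buf, sizeof(buf) );
            net::write( sock,
                net::const_buffer( sv.data(), sv.size() ) );
        }
    }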
There are a few remarks that ought to be added to these statements. What I have in mind for an incremental parser is an in situ/in-place algorithm, in which case there is no auxiliary data structure (i.e. only a small, constant amount of space may be used). These are my expectations when I see a library which advertises incremental parsing. I don't expect the library to internally buffer everything I feed it if it advertises itself as an incremental parser. Yet this is what I see in your example itself, which is using the DOM object itself as a buffer: https://github.com/vinniefalco/json/blob/04fe8c2ba8c3414e51a44017638688063e1...

This is from your example, not from the library. The library at least offers basic_parser, which does meet my expectations for an incremental parser. JSON allows recursion, and it is not really possible to parse arbitrary JSON values without at least a stack of states, which both Trial.Protocol and your library use. So even if they aren't strictly in situ parsers either, this unavoidable violation is acceptable.

And now, back to your expectation, there are other concerns that I want to bring to the table. The property you want to offer is useful for streamable formats, but JSON has never been a streamable format. No libraries out there offer such functionality (please correct me if I'm wrong), and therefore, as an interchange format for a multitude of services, the messages designed around JSON have not been designed to rely on this property to stream any information. If streaming is required, you're better off with an extra communication channel to transfer the streamable data. The context switches to parse small chunks of independent JSON values would actually hurt the cache.

With all that in mind, even if streamable JSON were a property we wanted to tackle, the design of Trial.Protocol can easily accommodate such a feature with very small changes. My HTTP parser (HTTP being a streamable format) was inspired by Trial.Protocol, and this was the first divergence I had to tackle (which was pretty easy to solve). -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On Mon, Nov 25, 2019 at 8:02 AM Vinícius dos Santos Oliveira
The property you want to offer is useful to streamable formats, but JSON has never been a streamable format. No libraries out there offer such functionality (please correct me if I'm wrong)
The use-case is performing bounded work in each I/O cycle. For example when reading from a socket. Regards
On 2019-11-24 16:30, Vinnie Falco wrote:
I'm not seeing where trial.protocol has incremental algorithms, perhaps you can show me? trial::protocol::json::basic_reader constructs with the complete input:
You are right. Chunked parsing was missing. The json::reader is a low-level API that does not manage buffers, but it did indeed lack a function for updating the view without resetting the other state (the nesting levels). Fortunately that was a trivial change, which I have just added for your pleasure, along with an example of chunked parsing that is akin to yours: https://github.com/breese/trial.protocol/commit/681afed05f32eb5288d6703d5537... Buffer management can obviously be optimized in the example.
I disagree. Again the central premise here is that there is no ideal JSON library which can suit all needs. I believe this is why Boost does not yet have a JSON library. Optimizing one use-case necessarily
Maybe there is no single JSON library that suits all needs. However, your JSON library supports a rather limited set of use-cases that excludes Boost.Serialization style serialization, whereas Trial.Protocol supports the same use-cases and many more with simpler building-blocks.
comes at the expense of others. At the very least, the inversion of the flow of control (i.e. a parser which returns a token at a time) advantages one use case and disadvantages others (such as working as an online algorithm). There are other tradeoffs. Because my library
That is an incorrect claim. Both push and pull parsers need to parse tokens one by one and keep state information. You do not have to take my word for it; here is what Thiago Macieira had to say about it: "First of all, the type information is there anyway. The only difference is whether it is exposed to the user in the API or whether it's hidden. In the SAX case, because of the push-style API, it's hidden and the parser does the switching for you and calls your function. Second, you must provide something like an iterator anyway if you want to provide some type properties to the visited function. Properties like array size, string length, etc. That is required if you want to implement string zero-copy. [...] Mark my words: the SAX parser will be implemented on top of the StAX one." From https://groups.google.com/a/isocpp.org/d/msg/std-proposals/JNZzOvC7llo/l1DVh... And just to confirm his "mark my words" comment, this is how simple it is to create a push parser with a pull parser: https://github.com/breese/trial.protocol/blob/develop/example/json/push_pars...
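To make that layering concrete, here is a sketch of a push parser built on top of a pull parser. The Reader and Handler interfaces below are invented for illustration; they are not the actual Trial.Protocol API:

    enum class token { begin_object, end_object, begin_array,
                       end_array, key, string, number, boolean,
                       null, end };

    template< class Reader, class Handler >
    void push_parse( Reader& r, Handler& h )
    {
        // The pull parser owns the token state machine; this loop
        // merely forwards each token to the handler's callbacks.
        for( token t = r.next(); t != token::end; t = r.next() )
        {
            switch( t )
            {
            case token::begin_object: h.on_object_begin();      break;
            case token::end_object:   h.on_object_end();        break;
            case token::begin_array:  h.on_array_begin();       break;
            case token::end_array:    h.on_array_end();         break;
            case token::key:          h.on_key( r.value() );    break;
            case token::string:       h.on_string( r.value() ); break;
            case token::number:       h.on_number( r.value() ); break;
            case token::boolean:      h.on_bool( r.value() );   break;
            case token::null:         h.on_null();              break;
            default:                                            break;
            }
        }
    }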
Bjorn Reese wrote:
And just to confirm his "mark my words" comment, this is how simple it is to create a push parser with a pull parser:
https://github.com/breese/trial.protocol/blob/develop/example/json/push_pars...
One immediate observation: the on_string() callback requires the parser to buffer the entire string. Vinnie's parser serves the string in parts, which avoids the need to buffer.
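Something along these lines (the callback names are hypothetical, loosely modeled on a handler concept for a parser that delivers strings in parts; the real signatures will differ):

    #include <cstddef>
    #include <string_view>

    struct counting_handler
    {
        std::size_t total = 0;

        // Called zero or more times with successive pieces of one
        // JSON string; the parser never buffers the whole string.
        bool on_string_part( std::string_view piece )
        {
            total += piece.size(); // hash it, copy it, stream it out...
            return true;           // keep parsing
        }

        // Called once with the final piece.
        bool on_string( std::string_view last )
        {
            total += last.size();
            return true;
        }
    };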
On Sat, Nov 30, 2019 at 7:33 AM Bjorn Reese via Boost
your JSON library supports a rather limited set of use-cases that excludes Boost.Serialization style serialization, whereas Trial.Protocol supports the same use-cases and many more with simpler building-blocks.
I think that we need to be precise with terminology. The three components of the library I am proposing are these:

1. Container types (value, object, array, string)
2. Storage API (storage_ptr, storage, scoped_storage, pool)
3. Parser and serializer

When you say "excludes Boost.Serialization style serialization" I believe you are referring only to the parser in the list above. Specifically, that a parser which extracts and returns a token at a time, rather than consuming the entire input buffer and invoking a callback, is a more fundamental building block, as it enables more use-cases. Please let me know whether this is an accurate characterization of your statements. Assuming that it is, I disagree with your analysis for these reasons:

1. You can always use the JSON value container as the archive with Boost.Serialization.

2. Skipping the value container and archiving from JSON using a token-based parser may be more efficient, but it is dependent on the order of keys and thus can no longer be considered JSON, despite having the same syntax.

3. It remains to be proven that a token-based parser which inverts the flow of control is as efficient. If you would like to add support for measuring Trial.Protocol in the "bench" program of my library, I would be happy to see the results.
That is an incorrect claim.
And yet you provided evidence to support my claim: the recent changes you made to your code require buffering partial inputs.

However, we are getting lost in the weeds of parsing. Generally speaking, parsing and serialization are the least interesting aspects of a protocol library. This is true for HTTP, WebSocket, JSON, and URLs. The important part is the container, because this is what appears in public interfaces. For example, check out this library: https://github.com/pokowaka/jwt-cpp/blob/b1db67e54f01f72c914af82aaea9a8d49d6... It implements something used for OAuth2. Note that they currently use nlohmann's JSON. Instead, they should be using `boost::json::value`, the type from my library, because it is more suited as a vocabulary type.

If users are clamoring for a parser that returns individual tokens, there is no impediment to providing one as an additional parser, in a separate Boost library. However, I do not hear those calls from very many folks at all (just you and Vinícius, if I am being honest). In the meanwhile, people have a very real and very pressing need for exactly the set of features that my library offers, which is to parse JSON into a container, inspect it and modify it, and then serialize it back out. This is the foundation of practically every REST client and REST server endpoint (think JSON-RPC). Now that Boost has Beast, a solid JSON library which optimizes for the network use-case is obviously something that Boost needs.

You said that there is "limited" utility for this use-case but I strongly disagree. I get questions all the time about how to use the Beast message container with a JSON payload. For example: https://github.com/pokowaka/jwt-cpp/issues/43#issuecomment-559291355

Thanks
Em sáb., 30 de nov. de 2019 às 13:56, Vinnie Falco via Boost < boost@lists.boost.org> escreveu:
However, we are getting lost in the weeds of parsing. Generally speaking, parsing and serialization are the least interesting aspects of a protocol library. This is true for HTTP, WebSocket, JSON, and URLs. The important part is the container, because this is what appears in public interfaces.
If you don't care about the design of the parser interface, why are you even exposing it outside of a detail namespace?

I've been using the older pull-based library for a few years. In my use-case, there are a few services that consume JSON values. We can classify the consumption into two categories:

* The JSON needs to be processed and forwarded further.
* The JSON is consumed at that end.

For the first use-case, a DOM interface works great. For the second use-case, putting the JSON in a tree would be a waste. There are a few notes that I'd like to share from my experience with a JSON pull parser:

* Matching and decoding are done in separate steps. This means I simply skip string fields that I don't need. The fields that I don't need don't imply constructing a new string to unescape/decode special sequences into raw strings (for field keys I can even consume just the literal, if it's okay to violate the principle of "be liberal in what you accept").

* Returning one token at a time is not the interesting part. The interesting bit is that I'm not robbed of the control flow. I can build functions that parse from the current point of the reader. These functions won't necessarily consume one token; they may consume a group of tokens (e.g. objects) and return a single C++ value. These interfaces are composable (see the sketch below).

* Algorithms can be built around the pull parser interface. As an example, there are partial::skip() and partial::parse() already in the repo. The first will skip the current element (e.g. one token for integers, a group of tokens for objects). The latter will parse the pointed-to subtree into a DOM object.

* My messages have a common part, with fields such as a timestamp, which are common to all messages. My consume-loop for specialized parsers can just forward the elements they don't know how to deal with to a common class that fills the common fields.

If you only see the DOM container as "consumable" and "useful to expose in a public interface," maybe it is because you have chosen the SAX approach.
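Here is a sketch of what I mean by composition. The Reader interface (expect/peek/read/read_key/skip) and the token enum are invented for illustration; they are not Trial.Protocol's actual API:

    #include <string>
    #include <vector>

    enum class token { begin_object, end_object,
                       begin_array, end_array };

    struct employee
    {
        std::string name;
        std::vector<int> scores;
    };

    // Consumes a complete array subtree and returns one C++ value.
    template< class Reader >
    std::vector<int> parse_scores( Reader& r )
    {
        std::vector<int> v;
        r.expect( token::begin_array );
        while( r.peek() != token::end_array )
            v.push_back( r.template read<int>() );
        r.expect( token::end_array );
        return v;
    }

    // Consumes a complete object subtree; composes with parse_scores().
    template< class Reader >
    employee parse_employee( Reader& r )
    {
        employee e;
        r.expect( token::begin_object );
        while( r.peek() != token::end_object )
        {
            auto const key = r.read_key();
            if( key == "name" )
                e.name = r.template read<std::string>();
            else if( key == "scores" )
                e.scores = parse_scores( r );
            else
                r.skip(); // skip a whole unneeded subtree, decoding nothing
        }
        r.expect( token::end_object );
        return e;
    }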
If users are clamoring for a parser that returns individual tokens, there is no impediment to providing one as an additional parser, in a separate Boost library. However, I do not hear those calls from very many folks at all (just you and Vinícius, if I am being honest).
You haven't heard about it from very many folks. Then you came here for feedback just to argue that the feedback is irrelevant. If you're not interested in feedback, why even post on the mailing list? I'm glad I'm not developing a JSON library, because the rushed way you're trying to push this one through wouldn't make for the most pleasant experience (and we had better match your time constraints when answering).

In the meanwhile, people have a very real and very pressing need for
exactly the set of features that my library offers, which is to parse JSON into a container, inspect it and modify it, and then serialize it back out. This is the foundation of practically every REST client and REST server endpoint (think JSON-RPC). Now that Boost has Beast, a solid JSON library which optimizes for the network use-case is obviously something that Boost needs.
Boost is no place for pressing libraries to be delivered on a timeline. The people who need a library today can use any library on the market. C++ makes it painful to include new libraries in a project, and some may see in Boost a place where they can install a bunch of libraries at once, but that's not what Boost is about. Boost is about peer-reviewed, high-quality libraries. If a library is not ready, it goes back to the design phase. A library will be ready when it's ready.

Waste no time defending this view. This view is not being challenged. Nobody is arguing against a DOM object, as far as I can tell. Backing down to defend the DOM object makes no case for the more heated subject under discussion.

You said that there is "limited" utility for this use-case but I
strongly disagree. I get questions all the time about how to use the Beast message container with a JSON payload. For example:
Trial.Protocol also has a DOM object. Bjørn obviously sees value in such a container, then. That's not what's under heated debate. -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On 2019-11-30 17:55, Vinnie Falco wrote:
When you say "excludes Boost.Serialization style serialization" I believe you are referring only to the Parser in the list above.
Not really. I am referring to the ability to create Boost.Serialization input and output archives.
Specifically, that a parser which extracts and returns a token at a time, rather than consuming the entire input buffer and invoking a callback, is a more fundamental building block as it enables more use-cases. Please let me know whether this is an accurate characterization of your statements. Assuming that it is...
I disagree with your analysis for these reasons:
1. You can always use the JSON value container as the archive with Boost.Serialization.
The whole point of Boost.Serialization is to not go through an intermediate DOM, but to parse the data directly into the data structures provided by the user.
2. Skipping the value container and archiving from a JSON using a token-based parser may be more efficient, but it is dependent on the order of keys and thus can no longer be considered JSON, despite having the same syntax.
This argument makes no sense at all. Firstly, JSON Object is unordered, so any key permutation is a valid syntax. ECMA-404 is quite explicit about this. Secondly, the json::reader and json::writer processors do not change the order of key-value pairs. If the data structure used by the user preserves the order, then so will the serialization.
That is an incorrect claim.
And yet you provided evidence to support my claim - the recent changes you made to your code require buffering partial inputs.
I do not see how that follows. I made a single change to the json::reader class in order to support chunked parsing. This caused no performance degradation. I also added a simple _example_ to show how chunked parsing can be done, and I even stated that this example could be optimized. However, putting optimized code in examples tends to obscure the purpose of the example. There is really nothing that confirms your claim.
nlohmann's JSON. Instead, they should be using `boost::json::value`, the type from my library, because it is more suited as a vocabulary type.
This is an odd argument to raise here given that Trial.Protocol also has a vocabulary type, as well as DOM parsing/formatting.
You said that there is "limited" utility for this use-case but I strongly disagree. I get questions all the time about how to use the
I am saying that your JSON library supports fewer use-cases than Trial.Protocol. Whether or not that is sufficient is a subjective judgment.
Bjorn Reese wrote:
Secondly, the json::reader and json::writer processors do not change the order of key-value pairs. If the data structure used by the user preserves the order, then so will the serialization.
As I already stated in a previous message, if your reader can only read what your writer writes, your format is JSON in name only. A "real" JSON reader must be able to read not just the literal output of the writer, but a modified JSON file that is (per spec) equivalent to the original. Which includes reordered fields, as you yourself wrote in the very previous paragraph.
Firstly, JSON Object is unordered, so any key permutation is a valid syntax. ECMA-404 is quite explicit about this.
On 12/8/19 7:48 AM, Peter Dimov via Boost wrote:
As I already stated in a previous message, if your reader can only read what your writer writes, your format is JSON in name only. A "real" JSON reader must be able to read not just the literal output of the writer, but a modified JSON file that is (per spec) equivalent to the original.
By that definition, the XML serialization implemented in Boost.Serialization is not a "real" XML reader. So what? No serialization library can claim to be able to read arbitrary input and automatically map it to some pre-determined C++ data structure. If someone is interested in a JSON format for Boost.Serialization, he can use the current XML setup as a model. If one feels that the current XML implementation is not a "real" parser, that's OK too. I do get a complaint from time to time that the XML archive cannot be freely edited, but not from anyone who understands the inherent limitations of what C++ serialization can do.
Which includes reordered fields, as you yourself wrote in the very previous paragraph.
Firstly, JSON Object is unordered, so any key permutation is a valid syntax. ECMA-404 is quite explicit about this.
And for those who might want a system of editable archives, there's another good model: Google Protocol Buffers. That might be a system where one describes a data structure in terms of some JSON primitives, and the user uses a parser to pick this apart to build his C++ data structure, maintaining this coupling as time goes on. This might be a viable and/or interesting system, but it's unrelated to what Boost.Serialization currently is and has been for 15 years. Robert Ramey
On Sunday, December 8, 2019 3:48 PM, Peter Dimov via Boost
Bjorn Reese wrote:
Secondly, the json::reader and json::writer processors do not change the order of key-value pairs. If the data structure used by the user preserves the order, then so will the serialization.
As I already stated in a previous message, if your reader can only read what your writer writes, your format is JSON in name only. A "real" JSON reader must be able to read not just the literal output of the writer, but a modified JSON file that is (per spec) equivalent to the original. Which includes reordered fields, as you yourself wrote in the very previous paragraph.
This response could just as easily have been to Vinnie. I believe what Bjorn was describing is that the reader can assume the same order until it misses a field, and then fall back to buffering into a generic json::value. In some use cases the order will always be the same, since the application will use the same implementation for reading and writing. I also mentioned earlier in the thread that higher performance will frequently (nearly always) come from linearly searching the fields of a struct rather than from building a generic data structure and moving/copying. This does require an interface that isn't typical of existing JSON parsers (something variadic and templated, most likely).

Vinnie refuted the latter approach, claiming that the JSON parser was not designed to suit all needs. A "pull" parser will allow for the typical generic interface, a SAX/push parser interface, and will allow advanced users to use either of the techniques described in the above paragraph. I believe Bjorn and Vinícius are both arguing for this type of parser, and if so I agree with them.
Firstly, JSON Object is unordered, so any key permutation is a valid syntax. ECMA-404 is quite explicit about this.
Lee
On Sun, Dec 8, 2019 at 7:21 AM Bjorn Reese via Boost
The whole point of Boost.Serialization is to not go though an intermediate DOM, but to parse the data directly into the data structures provided by the user.
Then I am understanding you correctly. You claim that Trial.Protocol is more suited than Boost.JSON for integration with Boost.Serialization, because of the parser interface. I don't think this is correct for the case of going directly to a user-defined type, for the reason I stated earlier, with which you agree: that JSON does not prescribe the ordering of the elements of objects. Consider a simple struct with members a, b, and c. We might archive it this way:

    ar & BOOST_SERIALIZATION_NVP(a);
    ar & BOOST_SERIALIZATION_NVP(b);
    ar & BOOST_SERIALIZATION_NVP(c);

The resulting JSON might look like this:

    { "a" : 1, "b" : 2, "c" : 3 }

However, a perfectly valid encoding of an identical object could look like this:

    { "c" : 3, "b" : 2, "a" : 1 }

In this case the serialization code above will not function correctly given the alternate encoding of the JSON, because the code assumes that the order of keys will be the same as when it was serialized. An obvious solution is to parse into the intermediate DOM.

JSON is not an appropriate archive format for serialization, for the reason that the order is not prescribed. You could implement a Boost.Serialization archive for it, and the result would have the *format* of JSON but it would not have the same semantics. Thus it cannot really be considered JSON. And if we are going to choose a format which is not JSON, there are many better formats with less overhead.
Firstly, JSON Object is unordered, so any key permutation is a valid syntax. ECMA-404 is quite explicit about this.
Exactly.
If the data structure used by the user preserves the order, then so will the serialization.
Exactly. "If" the order of keys in the archive is preserved, then it works. This is "not-JSON." The use-case for a JSON serialization archive is not for the same program to round-trip the data; there are more efficient formats for that, as I stated above. A useful use-case would be to serialize a user-defined type T to JSON in a way that other programs can access, even if the keys are reordered, and to go the other way: from an external program that produces JSON, deserializing into a user-defined type T, even if the keys are reordered. A typical example is a C++ server communicating with a JavaScript/HTML5 client. The JavaScript program doesn't know or care about Boost.Serialization, nor does it preserve the order of keys.
Your parser currently uses stack variables, so the adaptor has to use stackful coroutines which would incur two context-switches on every iteration.
I was referring to stackless coroutines. That type of coroutine still has stack variables. It just doesn't save the entire stack at a suspend point. The compiler can turn any function which has statically bounded stack requirements into a "stackless coroutine." Thanks
On 2019-12-08 16:48, Vinnie Falco wrote:
Consider a simple struct with members a, b, and c. We might archive it this way:
    ar & BOOST_SERIALIZATION_NVP(a);
    ar & BOOST_SERIALIZATION_NVP(b);
    ar & BOOST_SERIALIZATION_NVP(c);
The resulting JSON might look like this:
{ "a" : 1, "b" : 2, "c" : 3 }
You are specifying the order of the entries in the serialization code,
so I would expect this to be serialized into a JSON Array:
[ ["a", 1], ["b", 2], ["c", 3] ]
However, I acknowledge your point about reading "tagged" data from
a JSON Object. This is fairly easy to do, but does involve a bit of
boilerplate because the current serialization wrappers assume that
classes are encoded as JSON Array, so we need to use non-intrusive
serialization in order to bypass this default behaviour.
Assuming we have a "struct alphabet" with the a, b, c variables, then
the following wrapper will read the variables in any user-modified
order. No changes to any of the APIs are needed:
namespace trial { namespace protocol { namespace serialization {
template <typename CharT>
struct load_overloader
I was referring to stackless coroutines. That type of coroutine still has stack variables. It just doesn't save the entire stack at a suspend point. The compiler can turn any function which has statically bounded stack requirements into a "stackless coroutine."
Do you have a reference to how these stackless-with-bounded-stack coroutines work?
On Mon, Dec 9, 2019 at 6:30 AM Bjorn Reese via Boost
then the following wrapper
Yeah, that's a different use-case. A legitimate one but not the one that my library solves, and not one that I have an interest in solving.
namespace trial { namespace protocol { namespace serialization {
I keep seeing this Trial.Protocol, but the library being proposed (which has no review manager yet) is Boost.JSON. It has not yet been proven that a token-at-a-time parser can yield the same performance and incremental-behavior features as the parser provided in Boost.JSON. Of course, if you would like to incorporate support for Trial.Protocol in the benchmark for comparison, I encourage it. Simply implement `any_impl` for your algorithms and add it to the vector in main: https://github.com/vinniefalco/json/blob/develop/bench/bench.cpp

It also seems unfortunate that we have taken a hard turn into the discussion of Boost.Serialization, which I have to stress is not a contemplated use-case for this library (although it would certainly be possible to implement an archive for it). I am very interested in hearing about things which do land within the use-case, especially things which can be an improvement; suggestions for Boost.Serialization-related uses are not particularly helpful (again, because Boost.Serialization is not a contemplated use-case). To get such a conversation started, here are some possible topics:

* The design of boost::json::storage and boost::json::storage_ptr
* Usability of the customization points for going to/from user-defined types
* The individual boost::json::object, boost::json::array, and boost::json::string classes
* The DOM API

An even better technique for discovering topics of conversation would be to actually use the library and give feedback... has anyone done that?
Do you have a reference to how these stackless-with-bounded-stack coroutines work?
Nothing handy but there are many videos, blog posts, and wg21 papers about coroutines (search for "Coroutines TS"). Thanks
On Mon, Dec 9, 2019 at 6:30 AM Bjorn Reese via Boost
Do you have a reference to how these stackless-with-bounded-stack coroutines work?
This timely blog post just surfaced: https://devblogs.microsoft.com/oldnewthing/20191209-00/?p=103195 Regards
On Mon, Dec 9, 2019 at 9:51 AM Vinnie Falco
...
I've got some great news! I finished the customization points that allow conversions to and from user-defined types and JSON values. This documentation page is complete: https://vinniefalco.github.io/doc/json/json/usage/conversion.html And a friendly reminder: this library needs a review manager! Merry Christmas, and May God Bless You in the New Year! Vinnie
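PS: For the curious, here is roughly what such a conversion can look like. This is a sketch; the exact names and signatures are illustrative, and the documentation page above is authoritative:

    #include <boost/json.hpp>
    namespace json = boost::json;

    struct point
    {
        double x;
        double y;
    };

    // point -> json::value
    void tag_invoke( json::value_from_tag, json::value& jv, point const& p )
    {
        jv = { { "x", p.x }, { "y", p.y } };
    }

    // json::value -> point
    point tag_invoke( json::value_to_tag< point >, json::value const& jv )
    {
        auto const& obj = jv.as_object();
        return { obj.at( "x" ).to_number< double >(),
                 obj.at( "y" ).to_number< double >() };
    }

    // usage:
    //   json::value jv = json::value_from( point{ 1, 2 } );
    //   point p = json::value_to< point >( jv );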
On 11/17/19 4:40 PM, Peter Dimov via Boost wrote:
Assuming we're talking about the first option, it should be possible to use the proposed JSON library to implement a JSON input archive that reads from a json::value. The output archive doesn't really need a library.
Because the average developer knows how to properly format JSON strings and how to handle NaN?
On Sun, Nov 17, 2019 at 3:45 AM Bjorn Reese via Boost
Should there be a JSON archive for Boost.Serialization?
Yes. The archive should be immune to reordering of keys. That is, the archive can be correctly deserialized even if the keys are reordered in the JSON after serialization. This requires buffering the contents of the archive in an intermediate data structure. Boost.JSON is a logical choice here: its DOM is compact, and the parser and serializer have top-tier performance [1]. Thanks [1] Boost.JSON benchmarks https://vinniefalco.github.io/doc/json/json/benchmarks.html
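To illustrate the DOM-buffered approach, a sketch of reading through the DOM so that key order does not matter (the struct and field names are invented for illustration):

    #include <boost/json.hpp>
    #include <cstdint>
    #include <string>
    namespace json = boost::json;

    struct config
    {
        std::int64_t retries;
        std::string host;
    };

    config load_config( json::string_view input )
    {
        // Buffer the whole document in the DOM first...
        json::value jv = json::parse( input );
        auto const& obj = jv.as_object();

        // ...then look fields up by key, so the order in which they
        // appeared in the input is irrelevant.
        config c;
        c.retries = obj.at( "retries" ).as_int64();
        c.host    = obj.at( "host" ).as_string().c_str();
        return c;
    }

    // load_config( R"({ "retries": 3, "host": "example.com" })" ) and
    // the key-reordered document both produce the same config.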
On 12/8/19 8:02 AM, Vinnie Falco via Boost wrote:
On Sun, Nov 17, 2019 at 3:45 AM Bjorn Reese via Boost
wrote: Should there be a JSON archive for Boost.Serialization?
Yes.
The archive should be immune to reordering of keys. That is, the archive can be correctly deserialized even if the keys are reordered in the JSON after serialization.
Nope
On 12/8/19 10:01 AM, Robert Ramey via Boost wrote:
On 12/8/19 8:02 AM, Vinnie Falco via Boost wrote:
On Sun, Nov 17, 2019 at 3:45 AM Bjorn Reese via Boost
wrote: Should there be a JSON archive for Boost.Serialization?
Yes.
The archive should be immune to reordering of keys.
an "archive" in the sense used by boost serialization is not and cannot be immune to the reordering of keys.
That is, the archive can be correctly deserialized even if the keys are reordered in the JSON after serialization.
an "archive" in the sense used by boost serialization cannot do this. Of course, if compatibility with the current Boost Serialization library is not a consideration then one can define "archive" anyway he wants. If wants to make a different serialization library, again, one could define "archive" in accordance with that new library. I remember that I got the word "archive" from the MS MFC library which used this term and handles the same way. The above is true of all other libraries that I know about which refer to themselves as "serialization libraries". This included MS .Net, Cereal, and others. FYI - Cereal includes an archive based on JSON I believe if you want to look at it. I've tried to explain why serialization does not imply the facility to edit an archive in a general sense. I'm not sure how much more to say about this. Robert Ramey
On Sun, Dec 8, 2019 at 10:44 AM Robert Ramey via Boost
...
This all makes sense*. Please answer this question:
Should there be a JSON archive for Boost.Serialization?
If your answer to the question above is yes then please answer this question: What is the benefit of a "JSON archive" that is not achieved with other archive types? Thanks * For a suitable definition of "sense"
On 12/8/19 10:47 AM, Vinnie Falco via Boost wrote:
On Sun, Dec 8, 2019 at 10:44 AM Robert Ramey via Boost
wrote: ...
This all makes sense*. Please answer this question:
Should there be a JSON archive for Boost.Serialization?
If someone wants it, sure. But I don't recall anyone actually asking me for this. Given the example of XML, it wouldn't be hard for people to make their own, and perhaps people have without advising me. I'm aware that many have made their own special-purpose archive classes: handling complex floating point, versions which avoid using the file system and copy to memory for extra speed, etc. And of course one can pipe the output/input of any archive class through Boost.Iostreams for encryption, compression, etc. The Boost.Serialization library is not a library of archive classes; it's a library to build archive classes, and it includes a few examples.
If your answer to the question above is yes then please answer this question:
What is the benefit of a "JSON archive" that is not achieved with other archive types?
For my purposes - nothing. The text version is just fine. But I felt the same about XML, and some people found a use for it. (I actually use it to display my archive data when I have the read/write out of sync.) I suppose there's someone who uses XSLT to convert the XML archive into something displayable like PDF or HTML, but I haven't had occasion to do this. I do know that some people have edited the XML archive with some success. One can't add/delete fields, but altering the values in place would work and might be useful. I once undertook a project to make an archive class which would produce an editable form - a very interesting idea - but I lost interest in it. So someone could find JSON useful for some special purposes.

My real problem comes when people believe that the serialization library, which is very easy to use, can/should be able to do what Google Protocol Buffers does. It's not possible, because it's a different job:

    B/S:    C++ data structures <-> byte stream

    G/P/B:  GP C++ data structures <-> byte stream
              ^
              |
              v
            custom C++ or other language code

So if you want to explore how your JSON parser might be involved in an alternative to Google Protocol Buffers, that might be an interesting idea - you might find it interesting to take a look at Google Protocol Buffers, which BTW is likely more popular than Boost.Serialization anyway.
Robert Ramey wrote:
What is the benefit of a "JSON archive" that is not achieved with other archive types?
For my purposes - nothing. The text version is just fine.
There is nothing inherently preventing the serialization library from serving the additional use case that you consider off-limits. Your only concern is serializing a C++ structure to a mostly unspecified format and then reading it back, but there's also a scenario where you have an existing, specified format, to which you tailor your C++ structures so that serializing them produces the specified format and deserializing them reads it. This is admittedly somewhat complicated by the - largely opaque - metadata the library inserts, but it's doable.
participants (10): Bjorn Reese, Dominique Devienne, Glen Fernandes, James E. King III, Julien Blanc, Lee Clagett, Peter Dimov, Robert Ramey, Vinnie Falco, Vinícius dos Santos Oliveira