[review][JSON] json::value as a vocabulary type
Hi Everyone,

I have heard this claim a number of times: json::value is suitable for a vocabulary type. I am not sure what that actually means: no templates? Guarantee that layout or mangled symbol will never change? Anything else?

My understanding of a "vocabulary type" is that it should be usable (not necessarily with maximum efficiency) for *any* usage. In the case of JSON that would mean that I should be able to represent any value that corresponds to a valid JSON when converted to text. I do not think that json::value can claim that without the ability to serialize arbitrarily big numbers.

I understand that the goal of the library is to address the most common cases, and big numbers do not fall into this category. I am just saying that the name "vocabulary type" may not be accurate here.

Regards,
&rzej;
Sent: Thursday, 24 September 2020 at 11:57 From: "Andrzej Krzemienski via Boost"
I'm sure there are different interpretations of that term, but the most important aspect for me has *nothing* to do with implementation stability. A vocabulary type is the common way to represent some kind of information in the various interfaces across the larger C++ ecosystem, i.e. it is part of the common vocabulary used. E.g. std::string_view can be considered the vocabulary type for string parameters, and even if I use my custom string implementation internally for some reason, I'll make damn sure that my interface accepts std::string_view and returns something that can at least be implicitly converted to std::string_view.

Having such common vocabulary types that represent more complex data than just numbers and strings could greatly facilitate the integration and composition of multiple different libraries, because I don't have to translate the data from the "language" spoken by lib Foo to the "language" used by lib Bar in order to hand the output from one to the other. E.g. we unfortunately don't have a real vocabulary type for vector data (in the mathematical sense). So, if I want to use GLM to do some linear algebra computation and then display the result in a Qt GUI, I'll - at some point - have to translate from glm::vec2 to QVector2D, or some such. If both libs used the same vocabulary (at least in their interfaces), like a hypothetical std::vec2, I could just pass the result from the GLM computation directly to my GUI code without translation (and the associated danger of introducing bugs or losing data).

One of the big strengths of C++ is the ability to create data types that "feel" the same as native types (a.k.a. value types), but every time we want to hand data from one library to another we either have to decompose the data types into their fundamental components (and even strings can not always be forwarded directly) or write everything as a template.

Whether C++ is in need of a JSON vocabulary type, and whether Boost.JSON provides a good one, is a question I unfortunately can't answer yet (otherwise I'd have written a review). But imho the worth of the library should not just be measured by whether it is suited as a general vocabulary type for the whole C++ ecosystem, but by whether it provides a sound (not necessarily optimal) basis for building higher-level libs on top of it in the future, inside and outside of Boost (e.g. implementing JSON-based internet protocols).

Best Mike

P.S.: One word about templates as vocabulary types: the danger is that you effectively get not one type to represent e.g. JSON data, but one for each possible instantiation, and you are back to square one (think std::string vs. std::wstring vs. std::u8string). For vector data, on the other hand, 2D and 3D vectors represent different kinds of data, so having different types is OK, and using a single template instead of repeating the same logic N times makes sense. The chrono duration types are a bit in-between in this regard, as there are many different types, but at least conversion between them is relatively easy or even implicit.
On Sep 24, 2020, at 7:06 AM, Mike via Boost
wrote: Whether C++ is in need of a JSON vocabulary type, and whether Boost.JSON provides a good one, is a question I unfortunately can't answer yet (otherwise I'd have written a review). But imho the worth of the library should not just be measured by whether it is suited as a general vocabulary type for the whole C++ ecosystem, but by whether it provides a sound (not necessarily optimal) basis for building higher-level libs on top of it in the future, inside and outside of Boost (e.g. implementing JSON-based internet protocols).
This is anecdotal, of course, but within my company's codebase the equivalent JSON-based variant structure is indeed used as a vocabulary type and passed between libraries - although of course they're our own libraries, so it's not really what you mean. It's extremely convenient and its usage has become somewhat viral. We use Facebook's `folly::dynamic` for that variant type today, and out of an average-size (1M+ LOC) code base, the string "folly::dynamic" appears over 7,600 times.

A lot of that usage is in unit test code and libraries, where we use the type for various purposes, but a lot of it is also in production code. It is *not* only used when we need parsing or serialization to/from JSON, although certainly that's a big usage too; and that makes it even more convenient as a value type, because we can serialize it to logs for debugging, or parse it from strings/files when unit testing library APIs.

Of course the downside of using such a dynamically-typed structure as a vocab type is that it won't be as efficient as statically-typed ones, and if you put the wrong stuff in it you won't get compile-time failures. But that's an acceptable trade-off for some people/use-cases.

-hadriel
On Thu, Sep 24, 2020 at 2:58 AM Andrzej Krzemienski via Boost
My understanding of a "vocabulary type" is that it should be usable (not necessarily with maximum efficiency) for *any* usage. In the case of JSON
When I use the term I refer to the ability to build higher level abstractions. Here's a perfect example: https://github.com/arun11299/cpp-jwt This library implements RFC-7519 and uses objects of type nlohmann::json in its public interface. I argue that boost::json::value would be a superior type to what this library currently uses. That is what is meant when Boost.JSON claims to be a "vocabulary type." It certainly does not mean that arbitrary precision numbers are supported, that every possible use-case is supported, or that it can store any payload with perfect fidelity. Thanks
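A minimal sketch of that interface shape: a higher-level library exposes Boost.JSON types directly in its public API, so callers and the library share one vocabulary. The encode/decode functions below are illustrative stand-ins, not cpp-jwt's actual API.

```cpp
#include <boost/json.hpp>
#include <iostream>
#include <string>

// Hypothetical higher-level interface (illustrative only): a JWT-like library
// could accept and return Boost.JSON types directly instead of templating on
// a JSON representation or defining its own.
std::string encode_claims(boost::json::object const& claims)
{
    // a real library would sign and base64url-encode here
    return boost::json::serialize(claims);
}

boost::json::value decode_claims(std::string const& text)
{
    // a real library would verify the signature and decode here
    return boost::json::parse(text);
}

int main()
{
    boost::json::object claims;
    claims["iss"] = "example.com";
    claims["exp"] = 1601000000;

    std::string token = encode_claims(claims);          // vocabulary type in
    boost::json::value decoded = decode_claims(token);  // vocabulary type out
    std::cout << decoded << "\n";
}
```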
On Thu, 24 Sep 2020 at 10:57, Andrzej Krzemienski via Boost
My understanding of a "vocabulary type" is that it should be usable (not necessarily with maximum efficiency) for *any* usage. In the case of JSON that would mean that I should be able to represent any value that corresponds to a valid JSON when converted to text. I do not think that json::value can claim that without the ability to serialize arbitrarily big numbers.
I fully agree with this statement. json::value *needs* to support arbitrary numbers. It's incomplete without it. Maybe the author of multiprecision can advise on the best type to use there (gmp or mpfr?).
On Fri, Sep 25, 2020 at 6:11 AM Mathias Gaunard via Boost
I fully agree with this statement. json::value *needs* to support arbitrary numbers. It's incomplete without it. Maybe the author of multiprecision can advise on the best type to use there (gmp or mpfr?).
This is not a reasonable requirement. std::string is the canonical C++ vocabulary type. On 32-bit systems, it cannot represent 5GB-long strings. Depending on platform limitations, it usually cannot even represent more than 2GB-long strings. Computers are limited to finite resources. Putting finite limits on the representation of all kinds of values is normal, not unexpected -- this is especially true of numeric values. Zach
On Fri, 25 Sep 2020 at 15:10, Zach Laine via Boost wrote:
This is not a reasonable requirement. std::string is the canonical C++ vocabulary type. On 32-bit systems, it cannot represent 5GB-long strings. Depending on platform limitations, it usually cannot even represent more than 2GB-long strings. Computers are limited to finite resources. Putting finite limits on the representation of all kinds of values is normal, not unexpected -- this is especially true of numeric values.
I am wondering. If I have a small web service for generating prime numbers, and I need to return them in a JSON file, is my only option to pass it as string? Prime numbers of this kind are bigger than uint64_t. But they are not as big as 1MB. Is such a use case for a number so unusual that it cannot be stored as a JSON number? Are JSON numbers only good for storing int-based identifiers? Regards, &rzej;
On Fri, Sep 25, 2020 at 9:07 AM Andrzej Krzemienski via Boost
I am wondering. If I have a small web service for generating prime numbers, and I need to return them in a JSON file, is my only option to pass it as string? Prime numbers of this kind are bigger than uint64_t. But they are not as big as 1MB. Is such a use case for a number so unusual that it cannot be stored as a JSON number? Are JSON numbers only good for storing int-based identifiers?
I don't know what an int-based identifier is, but I do know that the use cases for machine-representable ints (that is, an int that is the size of an int, fits in a register, etc.) are >99%, and the use cases for a web service that generates prime numbers are <1%. That's what should drive the design. Zach
To add to that: JSON is essentially JavaScript-based, and for a long time JavaScript had no ints, only doubles. The new BigInt cannot be converted to JSON either. So to answer the question: yes, your only option is to pass it as a string. Anything else is, first and foremost, non-portable.
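For the prime-number service in question, a sketch of that string workaround with Boost.JSON (2^89 - 1 here is a Mersenne prime wider than uint64_t):

```cpp
#include <boost/json.hpp>
#include <iostream>

int main()
{
    // A value wider than uint64_t cannot be stored as a Boost.JSON number
    // (and would not survive a JavaScript consumer anyway), so carry the
    // digits verbatim as a string.
    boost::json::object result;
    result["prime"] = "618970019642690137449562111";  // 2^89 - 1

    std::cout << boost::json::serialize(result) << "\n";
    // prints: {"prime":"618970019642690137449562111"}
}
```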
On Fri, 25 Sep 2020 at 15:17, Alexander Grund via Boost
To add to that: JSON is essentially JavaScript-based, and for a long time JavaScript had no ints, only doubles. The new BigInt cannot be converted to JSON either.
So to answer the question: yes, your only option is to pass it as a string. Anything else is, first and foremost, non-portable.
JSON is not JavaScript and JavaScript is not JSON.
On Fri, Sep 25, 2020 at 11:19 AM Mathias Gaunard via Boost
JSON is not JavaScript and JavaScript is not JSON.
JSON literally stands for "JavaScript Object Notation" and while these two aren't the same, there is certainly a relationship between the two that must factor into any discussion of its use-cases. Thanks
On Fri, 25 Sep 2020 at 11:12, Zach Laine via Boost wrote:
I don't know what an int-based identifier is, but I do know that the use cases for machine-representable ints (that is, an int that is the size of an int, fits in a register, etc.) are >99%, and the use cases for a web service that generates prime numbers are <1%. That's what should drive the design.
The designs are not conflicting at all. std::string may be the vocabulary type, but it's only a typedef for std::basic_string. A json::basic_value could also exist. Their implementations wouldn't need to be shared (so it wouldn't conflict with the performance claims). Having said that, I do find the int64/uint64/double choices the right ones (for json::value). They aren't choices you learn about from the JSON spec, but from outside it. That's a discussion that I try to avoid for a number of reasons. My 2 cents. -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
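A rough sketch of the shape of that idea (entirely hypothetical, not an existing Boost.JSON facility): the vocabulary spelling stays one concrete type, in the same way std::string is one instantiation of std::basic_string.

```cpp
#include <cstdint>

// Hypothetical: a policy-parameterized template with one blessed instantiation
// acting as the vocabulary type. Names and structure are illustrative only.
namespace json_sketch {

struct default_number_policy
{
    using int_type   = std::int64_t;
    using uint_type  = std::uint64_t;
    using float_type = double;
};

template <class NumberPolicy>
class basic_value
{
    // storage and operations would go here
};

// The "vocabulary" spelling that interfaces would use,
// analogous to std::string being std::basic_string<char>.
using value = basic_value<default_number_policy>;

} // namespace json_sketch

int main() { json_sketch::value v; (void)v; }
```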
On 9/25/20 4:06 PM, Andrzej Krzemienski via Boost wrote:
I am wondering. If I have a small web service for generating prime numbers, and I need to return them in a JSON file, is my only option to pass it as string? Prime numbers of this kind are bigger than uint64_t. But they are not as big as 1MB. Is such a use case for a number so unusual that it cannot be stored as a JSON number?
If you have to interoperate with Javascript at some point, then I think the answer is yes, use a string. JS only knows about double (not even int64 or uint64).
On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
Are JSON numbers only good for storing int-based identifiers?
The JSON specification is silent on the limits and precision of numbers. All that we know is that it is a "light-weight data interchange format." However, we can gather quite a bit of anecdotal evidence simply by looking at the various languages which have built-in support for JSON.
From RFC7159 (https://tools.ietf.org/html/rfc7159)
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

Note the phrase "widely available."
From https://stackoverflow.com/questions/13502398/json-integers-limit-on-size

As a practical matter, Javascript integers are limited to about 2^53 (there are no integers; just IEEE floats).

From https://developers.google.com/discovery/v1/type-format

...a 64-bit integer cannot be represented in JSON (since JavaScript and JSON support integers up to 2^53).

From https://github.com/josdejong/lossless-json

When to use? Only in some special cases. For example when you have to create some sort of data processing middleware which has to process arbitrary JSON without risk of screwing up. JSON objects containing big numbers are rare in the wild.
From https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj...
The JavaScript Number type is a double-precision 64-bit binary format IEEE 754 value, like double in Java or C#....When parsing data that has been serialized to JSON, integer values falling outside of this range can be expected to become corrupted when JSON parser coerces them to Number type. A possible workaround is to use String instead.
From https://docs.python.org/3/library/json.html#implementation-limitations
When serializing to JSON, beware any such limitations in applications that may consume your JSON. In particular, it is common for JSON numbers to be deserialized into IEEE 754 double precision numbers and thus subject to that representation's range and precision limitations.

I am actually now starting to wonder if even 64-bit integer support was a good idea, as it can produce numbers which most implementations cannot read with perfect fidelity.

It is true that there are some JSON implementations which support arbitrary-precision numbers, but these are rare, and all come with the caveat that their output will likely be incorrectly parsed or rejected by the majority of implementations. This is quite an undesirable feature for an "interoperable data-exchange format" or a vocabulary type.

Support for arbitrary precision numbers would not come without cost. The library would be bigger, in a way that the linker can't strip (because of switch statements on the variant's kind). Everyone would pay for this feature (e.g. embedded) but only a handful of users would use it.

There is overwhelming evidence that the following statement is false:

"json::value *needs* to support arbitrary numbers. It's incomplete without it."

Thanks
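The interoperability problem described above is easy to demonstrate directly: round-trip a 64-bit integer through an IEEE double (which is all a JavaScript Number can hold) and the low bits vanish once the value exceeds 2^53.

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    std::uint64_t original = 9007199254740993ULL;      // 2^53 + 1
    double as_double = static_cast<double>(original);  // what a JS consumer stores
    std::uint64_t round_tripped = static_cast<std::uint64_t>(as_double);

    std::cout << original << "\n";       // 9007199254740993
    std::cout << round_tripped << "\n";  // 9007199254740992 -- the +1 is lost
}
```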
On Fri, 25 Sep 2020 at 16:48, Vinnie Falco wrote:
There is overwhelming evidence that the following statement is false:
"json::value *needs* to support arbitrary numbers. It's incomplete without it."
I accidentally replied privately to Vinnie. I am now pasting my reply here:

Thanks. This is really useful background. It explains why the JSON format conflates integer and floating-point numbers: originally these were only floating-point numbers, and the number 1 is just a different representation of a floating-point value.

But if we adopt this view, bearing in mind that JavaScript JSON libraries may not be able to parse big uint64_t values, then Boost.JSON may indeed have made the wrong trade-off by adding support for the full range of uint64_t. The cost is: (1) some values generated by Boost.JSON cannot be parsed by JavaScript JSON libraries, and (2) the complication of the interface (number_cast). And one could say that big uint64_t values constitute the 1% of use cases that are not worth the costs.

On the other hand, there is one quite natural use case for the full range of uint64_t: hash values. They are naturally stored as size_t, and the biggest values are as likely to appear as the smallest. Libraries like RapidJSON handle this case, so when they are able to serialize such a value, Boost.JSON should be able to parse it.

It looks like the following two goals are not compatible:

1. Parse losslessly every value produced by RapidJSON.
2. Generate only values that JavaScript JSON libraries can parse losslessly.

So, I guess the choice made in Boost.JSON is the right one. You will potentially produce values not parsable by some JSON libraries, and if goal 2 is important for some use cases the user has to make sure that she only puts doubles in as numbers.

By the way, when I learned about these issues with numbers/doubles, it occurred to me that Boost.JSON must have a flaw somewhere in its handling of numbers, given that it stores three different types and provides an equality operator. So I tried to break it. And I couldn't. The mechanism for storing int, uint and double is very well designed and thought through: integers are always preferred to doubles when parsing, a decimal point or exponent is always added when serializing floats, ints are compared correctly with uints, and ints and floats always compare unequal. This is really consistent. I think it deserves a mention in the documentation.
Regards, &rzej;
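A small check of the behaviour Andrzej describes; the expected results below follow that description rather than anything verified independently here.

```cpp
#include <boost/json.hpp>
#include <cassert>

int main()
{
    // Parsing prefers integers: "1" becomes an int64, not a double.
    boost::json::value parsed = boost::json::parse("1");
    assert(parsed.is_int64());

    // A double is always serialized with a decimal point or an exponent,
    // so the integer/floating-point distinction survives a round trip.
    boost::json::value d = 1.0;
    assert(boost::json::serialize(d) != "1");

    // Signed and unsigned integers compare by value, while an integer and
    // a double holding the "same" number compare unequal.
    assert(boost::json::value(1) == boost::json::value(1u));
    assert(boost::json::value(1) != boost::json::value(1.0));
}
```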
On Fri, Sep 25, 2020 at 3:53 PM Andrzej Krzemienski wrote:
>> I am actually now starting to wonder if even 64-bit integer support
>> was a good idea, as it can produce numbers which most implementations
>> cannot read with perfect fidelity.
>> ...
> 1. Parse losslessly every value produced by RapidJSON.
> 2. Generate only values that JavaScript JSON libraries can parse losslessly.
>
> So, I guess the choice made in Boost.JSON is the right one. You will potentially
> produce values not parsable by some JSON libraries, and if goal 2 is important
> for some use cases the user has to make sure that she only puts doubles in
> as numbers.

Thanks for the kind words. So, I think at some point I will want to introduce options for serialization, and one of the options could be the treatment of integers outside the range ~+/-2^53. We could:

1. serialize them as-is (current implementation)
2. serialize them as the nearest representable IEEE double
3. throw an exception

I know some people might find #3 weird; I'm open to feedback.

Regards
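Until such a serialization option exists, option #3 can be approximated on the calling side. A sketch, with a made-up helper name, that walks a value before serializing and rejects integers outside ±2^53 (the exact boundary treatment is a simple choice here, not anything prescribed):

```cpp
#include <boost/json.hpp>
#include <cstdint>
#include <stdexcept>

// Hypothetical pre-serialization check (not part of Boost.JSON): throw if any
// integer in the document lies outside +/-2^53 and would therefore lose
// precision in a consumer that stores numbers as IEEE doubles.
void throw_if_not_double_safe(boost::json::value const& jv)
{
    constexpr std::int64_t limit = std::int64_t(1) << 53;
    switch (jv.kind())
    {
    case boost::json::kind::int64:
        if (jv.get_int64() > limit || jv.get_int64() < -limit)
            throw std::range_error("integer not exactly representable as a double");
        break;
    case boost::json::kind::uint64:
        if (jv.get_uint64() > static_cast<std::uint64_t>(limit))
            throw std::range_error("integer not exactly representable as a double");
        break;
    case boost::json::kind::array:
        for (auto const& elem : jv.get_array())
            throw_if_not_double_safe(elem);
        break;
    case boost::json::kind::object:
        for (auto const& member : jv.get_object())
            throw_if_not_double_safe(member.value());
        break;
    default:
        break;  // null, bool, double and string are fine as-is
    }
}

int main()
{
    boost::json::object o;
    o["id"] = 9007199254740993ULL;  // 2^53 + 1, unsafe for JavaScript consumers
    try {
        throw_if_not_double_safe(o);
    } catch (std::range_error const&) {
        // reached: the value would not round-trip through a double
    }
}
```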
Vinnie Falco wrote:
So, I think at some point I will want to introduce options for serialization, and one of the options could be the treatment of integers outside the range ~+/-2^53. We could:
1. serialize them as-is (current implementation)
2. serialize them as the nearest representable IEEE double
3. throw an exception
I can't think of a reason to ever prefer #2 over #1. "As is" is already a legitimate serialization of the nearest representable IEEE double, so #1 is a valid implementation of #2, except it doesn't needlessly throw away information. There's no need to innovate here; we already know that preserving 64 bit integers is what's useful in practice.
On Sep 25, 2020, at 8:52 PM, Peter Dimov via Boost
I can't think of a reason to ever prefer #2 over #1. "As is" is already a legitimate serialization of the nearest representable IEEE double, so #1 is a valid implementation of #2, except it doesn't needlessly throw away information.
There's no need to innovate here; we already know that preserving 64 bit integers is what's useful in practice.
I agree with Peter that #2 isn’t useful - no library I know of in C++ does #2. (Though not that I know them all by any means, of course) RapidJSON, nlohmann, folly::dynamic, and taoJSON, etc. all do #1 by default, regardless of precision loss on the receiver-side. folly::dynamic also offers a serialization option to do #3 (which they call `javascript_safe`). Protobuf’s json encoder serializes `int64`/`uint64` as a string, regardless of its actual value causing precision loss or not, I think. The decoder side accepts either number or string and converts it - but obviously it can only do so because it’s controlled by its schema so it knows what the type should be. And of course there’re some libraries that only store them as doubles to begin with (Qt’s QJSonDocument, for example). -hadriel
On Fri, 25 Sep 2020 at 23:55, Vinnie Falco via Boost wrote:
So, I think at some point I will want to introduce options for serialization, and one of the options could be the treatment of integers outside the range ~+/-2^53. We could:
1. serialize them as-is (current implementation)
2. serialize them as the nearest representable IEEE double
3. throw an exception
I know some people might find #3 weird, I'm open to feedback.
What's the problem with storing it as a string or as an arbitrary number when it's not representable as int64 or a double? It doesn't cost anything to do this, it's a pure extension with no impact on people that don't need it.
On Sat, Sep 26, 2020 at 5:37 AM Mathias Gaunard
What's the problem with storing it as a string or as an arbitrary number when it's not representable as int64 or a double? It doesn't cost anything to do this, it's a pure extension with no impact on people that don't need it.
There is a cost. Users who have no need for arbitrary precision numbers (which is most users) will have a larger executable, paying for code that never runs.

But there's another cost. Say that a library offers a public member function:

void set_credentials( boost::json::object params );

The library must now deal with the possibility that the user submits arbitrary precision numbers in `params`. It can't just ignore them, or else it could be subjected to undefined behavior. It could state as a precondition "only regular numbers are supported." Either way, the support for arbitrary precision numbers benefits only a small number of people but all developers have to be burdened with handling it.

The presence of arbitrary precision numbers in a JSON virtually guarantees that only a specialized receiver will be able to process it, as evidenced by the numerous documented warnings about producing JSON values outside prescribed ranges.

Thanks
On Sep 26, 2020, at 8:37 AM, Mathias Gaunard via Boost
What's the problem with storing it as a string or as an arbitrary number when it's not representable as int64 or a double? It doesn't cost anything to do this, it's a pure extension with no impact on people that don't need it.
Do you mean encoding `int64_t`/`uint64_t` into JSON as a string, either (a) only when their value would otherwise lose precision, or (b) always? Let's call those options #4a and #4b. Obviously they can't be default behavior because they don't round-trip. #4a seems like inconsistent behavior to me, but :shrug: #4b seems reasonable, as it's a common choice people make to keep precision today. -hadriel
On Sep 26, 2020, at 10:39 AM, Hadriel Kaplan via Boost
Actually, thinking about this for more than a few seconds, the problem with #4b is that _all_ integers in the entire data structure would be encoded as strings, which is probably not what anyone would want. (?) Suddenly #4a sounds reasonable. :)

In practice, what other C++ libs do is push that decision back onto the user - either by having a callback-based serializer, or by simply ignoring the issue and making the user set the values as strings to begin with. That way the user chooses which particular fields to encode as what, since only the user knows the schema and use-case.

Boost.JSON could also ignore it and push that decision back to the user, but also offer a convenient way to set a string in the json::value from an integral - for example by providing the equivalent of `std::to_string(<integral>)` but for json::string, or even a member function of json::string such as json::string::from(<integral>) or some such.

-hadriel
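A sketch of that convenience as a free function; the name and signature are made up here, not an existing Boost.JSON facility.

```cpp
#include <boost/json.hpp>
#include <iostream>
#include <string>
#include <type_traits>

// Hypothetical helper: format any integral as a boost::json::string so the
// caller can keep full precision by storing the digits as text.
template <class Integral,
          class = typename std::enable_if<std::is_integral<Integral>::value>::type>
boost::json::string to_json_string(Integral v)
{
    return boost::json::string(std::to_string(v).c_str());
}

int main()
{
    boost::json::object o;
    o["big"] = to_json_string(18446744073709551615ULL);  // max uint64_t, kept exact

    std::cout << boost::json::serialize(o) << "\n";
    // prints: {"big":"18446744073709551615"}
}
```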
On Sep 25, 2020, at 7:05 AM, Mathias Gaunard via Boost
I fully agree with this statement. json::value *needs* to support arbitrary numbers. It's incomplete without it.
Empirical evidence would suggest otherwise. nlohmann, RapidJson, folly::dynamic, etc. do not support that. How can it *need* to support it, when other popular and useful libraries haven't?

Even in JavaScript land, while there's spec support for BigInt as a value within JavaScript's language types, there's no ECMA spec for how to encode or decode it to JSON that I know of. There are several libraries that do custom things to encode/decode BigInts to/from JSON, but none of them are interoperable. The Chrome V8 engine has BigInt support, I believe, but does not support encoding it to JSON.

If Boost.JSON were to choose its own syntax for encoding such things, it wouldn't be interoperable with anything other than itself for that value. And if ECMA ever does specify how to encode it, Boost.JSON would have to either change its encoding to match and thereby break backward-compatibility with previous versions of Boost.JSON, or offer serialization+parsing options to choose the encoding, which would suck because you'd have to know which JSON encoding style you're using for any given file/socket.

Regardless, I wouldn't be surprised if Boost.JSON added support for holding values of larger range/precision a la Boost.Multiprecision someday, but that day does not need to be now, in my opinion. I fully expect/hope that if Boost.JSON takes off, more features will be added to it in the future based on demand and submitted PRs.

-hadriel
On 25.09.20 17:23, Hadriel Kaplan via Boost wrote:
Empirical evidence would suggest otherwise. nlohmann, RapidJson, folly::dynamic, etc. do not support that. How can it *need* to support it, when other popular and useful libraries haven’t?
Actually AFAIK nlohmann does support arbitrary numbers, since all representation types are template arguments to basic_json. There's nothing to stop you from passing a BigInt type as NumberIntegerType. -- Rainer Deyke (rainerd@eldwood.com)
On Sep 26, 2020, at 2:44 PM, Rainer Deyke via Boost
Actually AFAIK nlohmann does support arbitrary numbers, since all representation types are template arguments to basic_json. There's nothing to stop you from passing a BigInt type as NumberIntegerType.
Yeah, fair point. But then that would be your new one-and-only signed integer type. I’m not sure that’s really the same idea as what’s being asked for.(?) I mean that’s going to make using it as a cross-library API value-type fairly painful, no? One could also just make a new `boost::json::kind` of `any`, and implement something similar to `std::any` to be held as one of the variant types. (and thereby avoid templating everything in boost::json) -hadriel
Andrzej Krzemienski wrote:
My understanding of a "vocabulary type" is that it should be usable (not necessarily with maximum efficiency) for *any* usage.
This is not at all what a vocabulary type is. A vocabulary type is a type via which two libraries can communicate, without that type being defined by either of them. E.g. std::size_t is a vocabulary type. It's obviously not usable for *any* usage.
participants (11)
- Alexander Grund
- Andrzej Krzemienski
- Hadriel Kaplan
- Mathias Gaunard
- Maximilian Riemensberger
- Mike
- Peter Dimov
- Rainer Deyke
- Vinnie Falco
- Vinícius dos Santos Oliveira
- Zach Laine