On Fri, Sep 25, 2020 at 16:48 Vinnie Falco wrote:
On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
wrote: Are JSON numbers only good for storing int-based identifiers?
The JSON specification is silent on the limits on the range and precision of numbers. All that we know is that it is a "light-weight data interchange format." However, we can gather quite a bit of anecdotal evidence simply by looking at the various languages which have built-in support for JSON.
From RFC7159 (https://tools.ietf.org/html/rfc7159)
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
Note the phrase "widely available."
From https://stackoverflow.com/questions/13502398/json-integers-limit-on-size
As a practical matter, Javascript integers are limited to about 2^53 (there are no integers; just IEEE floats).
From https://developers.google.com/discovery/v1/type-format
...a 64-bit integer cannot be represented in JSON (since JavaScript and JSON support integers up to 2^53).
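To make the 2^53 boundary concrete, here is a minimal C++ sketch (my own illustration, not from any of the quoted sources) of where binary64 stops representing consecutive integers exactly:

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint64_t exact = 1ULL << 53;  // 9007199254740992: representable
        std::uint64_t next  = exact + 1;   // 9007199254740993: not representable

        // Coercing to double, as a JavaScript JSON parser effectively does,
        // rounds 2^53 + 1 back down to 2^53.
        double d = static_cast<double>(next);
        std::printf("%llu -> %.0f\n",
                    static_cast<unsigned long long>(next), d);
        // prints: 9007199254740993 -> 9007199254740992
    }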
From https://github.com/josdejong/lossless-json
When to use? Only in some special cases. For example when you have to create some sort of data processing middleware which has to process arbitrary JSON without risk of screwing up. JSON objects containing big numbers are rare in the wild.
From https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj...
The JavaScript Number type is a double-precision 64-bit binary format IEEE 754 value, like double in Java or C#....When parsing data that has been serialized to JSON, integer values falling outside of this range can be expected to become corrupted when JSON parser coerces them to Number type. A possible workaround is to use String instead.
From https://docs.python.org/3/library/json.html#implementation-limitations
When serializing to JSON, beware any such limitations in applications that may consume your JSON. In particular, it is common for JSON numbers to be deserialized into IEEE 754 double precision numbers and thus subject to that representation’s range and precision limitations.
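The same failure mode can be reproduced in C++ by writing out a full-range 64-bit value and reading it back the way a binary64-only consumer would; a sketch, using strtod as a stand-in for such a parser:

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // A serializer with full 64-bit support writes this digit string...
        const char* json_number = "18446744073709551615";  // UINT64_MAX

        // ...but a consumer that stores every number as binary64 (the
        // common case per the quotes above) reads back a rounded value.
        double d = std::strtod(json_number, nullptr);
        std::printf("%s -> %.0f\n", json_number, d);
        // prints: 18446744073709551615 -> 18446744073709551616
    }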
I am actually now starting to wonder if even 64-bit integer support was a good idea, as it can produce numbers which most implementations cannot read with perfect fidelity.
It is true that some JSON implementations support arbitrary-precision numbers, but these are rare, and all come with the caveat that their output will likely be incorrectly parsed or rejected by the majority of implementations. This is quite an undesirable feature for an "interoperable, data-exchange format" or a vocabulary type. Support for arbitrary-precision numbers would also not come without cost. The library would be bigger, in a way that the linker can't strip (because of the switch statements on the variant's kind). Everyone would pay for this feature (e.g. embedded users), but only a handful would use it.
There is overwhelming evidence that the following statement is false:
"json::value *needs* to support arbitrary numbers. It's incomplete without it."
I accidentally replied privately to Vinnie. I am now pasting my reply here:

Thanks. This is really useful background. It explains why the JSON format conflates integer and floating-point numbers: originally, there were only floating-point numbers, and the number 1 is just a different representation of a floating-point value. But if we adopt this view, bearing in mind that JavaScript JSON libraries may not be able to parse big uint64_t values, then Boost.JSON might indeed have made the wrong trade-off by supporting the full range of uint64_t. The cost is: (1) some values generated by Boost.JSON cannot be parsed by JavaScript JSON libraries, and (2) the complication of the interface (number_cast). One could say that big uint64_t values constitute the 1% of use cases that is not worth these costs.

On the other hand, there is one quite natural use case for the full range of uint64_t: hash values. They are naturally stored as size_t, and the biggest values are just as likely to appear as the smallest. Libraries like rapidjson handle this case, so whatever they are able to serialize, Boost.JSON should be able to parse. It looks like the following two goals are not compatible (see the sketch after the list):

1. Parse losslessly every value produced by rapidjson.
2. Generate only values that JavaScript JSON libraries can parse losslessly.
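For illustration, here is roughly what the hash scenario looks like, assuming the boost::json::value, parse, and serialize names from the library (a sketch; the exact spellings may differ in the review snapshot):

    #include <boost/json.hpp>
    #include <cassert>
    #include <cstdint>
    #include <string>

    int main() {
        // A 64-bit hash; the high bit being set is as likely as not.
        std::uint64_t hash = 0xDEADBEEFCAFEBABEULL;

        boost::json::value jv = hash;                   // stored as uint64
        std::string text = boost::json::serialize(jv);  // full digits, no rounding

        // Goal 1 holds: Boost.JSON parses its own output losslessly...
        boost::json::value back = boost::json::parse(text);
        assert(back.as_uint64() == hash);

        // ...but goal 2 does not: hash > 2^53, so a binary64-only
        // JavaScript consumer would corrupt this value.
        assert(hash > (1ULL << 53));
    }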
So, I guess the choice made in Boost.JSON is the right one. You will potentially produce values that some JSON libraries cannot parse, and if goal 2 matters for a given use case, the user has to make sure that she only puts doubles in as numbers.
By the way, when I learned about these issues with numbers/doubles, it occurred to me that Boost.JSON must have a flaw somewhere in its handling of numbers, given that it stores three different types and provides an equality operator. So I tried to break it. And I couldn't. The mechanism for storing int, uint and double is very well designed and thought through: you always prefer ints to doubles when parsing, you always add a decimal point or exponent when serializing doubles, you correctly compare ints with uints, and you always compare ints and doubles as unequal. This is really consistent. I think it deserves a mention in the documentation.
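A short sketch of those comparison rules, again assuming the boost::json API and the behavior exactly as described above:

    #include <boost/json.hpp>
    #include <cassert>
    #include <cstdint>

    int main() {
        boost::json::value i = std::int64_t(1);
        boost::json::value u = std::uint64_t(1);
        boost::json::value d = 1.0;

        assert(i == u);  // ints and uints holding the same number compare equal
        assert(i != d);  // integers and doubles always compare unequal

        // Parsing prefers integer kinds; doubles keep a '.' or exponent
        // on output, so the integer/double distinction round-trips.
        assert(boost::json::parse("1").is_int64());
        assert(boost::json::parse("1.0").is_double());
    }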
Regards, &rzej;