On Fri, Sep 25, 2020 at 16:48 Vinnie Falco wrote:
On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
wrote: Are JSON numbers only good for storing int-based identifiers?
The JSON specification is silent on the limits on the range and precision of numbers. All that we know is that it is a "light-weight data interchange format." However, we can gather quite a bit of anecdotal evidence simply by looking at the various languages which have built-in support for JSON.
From RFC7159 (https://tools.ietf.org/html/rfc7159)
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
Note the phrase "widely available."
From https://stackoverflow.com/questions/13502398/json-integers-limit-on-size
As a practical matter, Javascript integers are limited to about 2^53 (there are no integers; just IEEE floats).
From https://developers.google.com/discovery/v1/type-format
...a 64-bit integer cannot be represented in JSON (since JavaScript and JSON support integers up to 2^53).
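To make the 2^53 boundary concrete, here is a minimal C++ sketch (my own illustration, not from any of the quoted sources) of where binary64 stops representing consecutive integers exactly:

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint64_t exact = 1ULL << 53;  // 9007199254740992: representable
        std::uint64_t next  = exact + 1;   // 9007199254740993: not representable

        // Coercing to double, as a JavaScript JSON parser effectively does,
        // rounds 2^53 + 1 back down to 2^53.
        double d = static_cast<double>(next);
        std::printf("%llu -> %.0f\n",
                    static_cast<unsigned long long>(next), d);
        // prints: 9007199254740993 -> 9007199254740992
    }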
From https://github.com/josdejong/lossless-json
When to use? Only in some special cases. For example when you have to create some sort of data processing middleware which has to process arbitrary JSON without risk of screwing up. JSON objects containing big numbers are rare in the wild.
From https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj...
The JavaScript Number type is a double-precision 64-bit binary format IEEE 754 value, like double in Java or C#....When parsing data that has been serialized to JSON, integer values falling outside of this range can be expected to become corrupted when JSON parser coerces them to Number type. A possible workaround is to use String instead.
From https://docs.python.org/3/library/json.html#implementation-limitations
When serializing to JSON, beware any such limitations in applications that may consume your JSON. In particular, it is common for JSON numbers to be deserialized into IEEE 754 double precision numbers and thus subject to that representation’s range and precision limitations.
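The same failure mode can be reproduced in C++ by writing out a full-range 64-bit value and reading it back the way a binary64-only consumer would; a sketch, using strtod as a stand-in for such a parser:

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // A serializer with full 64-bit support writes this digit string...
        const char* json_number = "18446744073709551615";  // UINT64_MAX

        // ...but a consumer that stores every number as binary64 (the
        // common case per the quotes above) reads back a rounded value.
        double d = std::strtod(json_number, nullptr);
        std::printf("%s -> %.0f\n", json_number, d);
        // prints: 18446744073709551615 -> 18446744073709551616
    }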
I am actually now starting to wonder if even 64-bit integer support was a good idea, as it can produce numbers which most implementations cannot read with perfect fidelity.
It is true that some JSON implementations support arbitrary-precision numbers, but these are rare, and all come with the caveat that their output will likely be incorrectly parsed or rejected by the majority of implementations. This is quite an undesirable feature for an "interoperable, data-exchange format" or a vocabulary type. Support for arbitrary-precision numbers would also not come without cost. The library would be bigger, in a way that the linker can't strip (because of the switch statements on the variant's kind). Everyone would pay for this feature (e.g. embedded users), but only a handful would use it.
There is overwhelming evidence that the following statement is false:
"json::value *needs* to support arbitrary numbers. It's incomplete without it."
I accidentally replied privately to Vinnie. I am now pasting my reply here:

Thanks. This is really useful background. It explains why the JSON format conflates integer and floating-point numbers: originally, there were only floating-point numbers, and the number 1 is just a different representation of a floating-point value. But if we adopt this view, bearing in mind that JavaScript JSON libraries may not be able to parse big uint64_t values, then Boost.JSON might indeed have made the wrong trade-off by supporting the full range of uint64_t. The cost is: (1) some values generated by Boost.JSON cannot be parsed by JavaScript JSON libraries, and (2) the complication of the interface (number_cast). One could say that big uint64_t values constitute the 1% of use cases that is not worth these costs.

On the other hand, there is one quite natural use case for the full range of uint64_t: hash values. They are naturally stored as size_t, and the biggest values are just as likely to appear as the smallest. Libraries like rapidjson handle this case, so whatever they are able to serialize, Boost.JSON should be able to parse. It looks like the following two goals are not compatible (see the sketch after the list):

1. Parse losslessly every value produced by rapidjson.
2. Generate only values that JavaScript JSON libraries can parse losslessly.
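For illustration, here is roughly what the hash scenario looks like, assuming the boost::json::value, parse, and serialize names from the library (a sketch; the exact spellings may differ in the review snapshot):

    #include <boost/json.hpp>
    #include <cassert>
    #include <cstdint>
    #include <string>

    int main() {
        // A 64-bit hash; the high bit being set is as likely as not.
        std::uint64_t hash = 0xDEADBEEFCAFEBABEULL;

        boost::json::value jv = hash;                   // stored as uint64
        std::string text = boost::json::serialize(jv);  // full digits, no rounding

        // Goal 1 holds: Boost.JSON parses its own output losslessly...
        boost::json::value back = boost::json::parse(text);
        assert(back.as_uint64() == hash);

        // ...but goal 2 does not: hash > 2^53, so a binary64-only
        // JavaScript consumer would corrupt this value.
        assert(hash > (1ULL << 53));
    }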
So, I guess the choice made in Boost.JSON is the right one. You will potentially produce values that some JSON libraries cannot parse, and if goal 2 matters for a given use case, the user has to make sure that she only puts doubles in as numbers.
By the way, when I learned about these issues with numbers/doubles, it occurred to me that Boost.JSON must have a flaw somewhere in its handling of numbers, given that it stores three different types and provides an equality operator. So I tried to break it. And I couldn't. The mechanism for storing int, uint and double is very well designed and thought through: you always prefer ints to doubles when parsing, you always add a decimal point or exponent when serializing doubles, you correctly compare ints with uints, and you always compare ints and doubles as unequal. This is really consistent. I think it deserves a mention in the documentation.
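A short sketch of those comparison rules, again assuming the boost::json API and the behavior exactly as described above:

    #include <boost/json.hpp>
    #include <cassert>
    #include <cstdint>

    int main() {
        boost::json::value i = std::int64_t(1);
        boost::json::value u = std::uint64_t(1);
        boost::json::value d = 1.0;

        assert(i == u);  // ints and uints holding the same number compare equal
        assert(i != d);  // integers and doubles always compare unequal

        // Parsing prefers integer kinds; doubles keep a '.' or exponent
        // on output, so the integer/double distinction round-trips.
        assert(boost::json::parse("1").is_int64());
        assert(boost::json::parse("1.0").is_double());
    }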
Regards, &rzej;