New numeric data type proposal
Hi,
I work in a simulation-tools research group, and we have developed a basic data type for handling time representation that we would like to share.
The data type is something of a mix between a multiprecision floating point (MPFP) and a rational.
We use this kind of data type in concrete simulator implementations because of a theoretical problem with the formalisms we implement, in which the floating-point error cannot be bounded.
We published a paper at Simutools 2014 a few days ago explaining how these unbounded errors arise under a specific formalism (DEVS) and why they are not easy to detect. I can share the paper if anyone is interested in the details, or provide some toy examples to discuss. We are also working on showing the same kind of errors in other formalisms.
A simplified version of the proposed data type is:
Number
where in the common case all four fields are some integer-like type. The number can be interpreted as N/D * M * 2^E, where N and D are never modified unless required by a compatibility operation. A compatibility operation is invoked when an operation involves two numbers with different N/D. The type can also be thought of as a variable-radix MPFP.

We chose this implementation because, when used as a replacement for an integer or a float, it does not significantly decrease performance. It only slows down significantly when operating on values that were not representable in either of those types (if we simplify N/D).

I would love to discuss implementation details and alternative ideas for handling the problem (this one worked for us, but surely there are other options too, perhaps more generic or more performant in general scenarios). I am also willing to code the whole thing, and to share the naive implementation we have in our simulator so far to start the discussion.
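[Editor's note: a minimal sketch of such a type, assuming the four integer-like fields described above; the field names are illustrative and not taken from any published implementation.]

```cpp
#include <cstdint>
#include <cmath>

// Illustrative sketch only: a number interpreted as (N/D) * M * 2^E,
// with all four fields integer-like, as described in the proposal.
struct Number {
    std::int64_t num;      // N: numerator of the rational part Q
    std::int64_t den;      // D: denominator of the rational part Q
    std::int64_t mantissa; // M: integer mantissa of the float-like part
    int          exponent; // E: power-of-two exponent
};

// Approximate conversion to double, for inspection only; the point of
// the type is that exact arithmetic never needs this conversion.
double to_double(const Number& x) {
    return (static_cast<double>(x.num) / x.den)
         * std::ldexp(static_cast<double>(x.mantissa), x.exponent);
}
```

With Q = N/D fixed across operands, arithmetic touches only `mantissa` and `exponent`.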
This sounds interesting, though I'm not sure I understand the rationale/functionality just yet. Are you aware of Boost.Multiprecision (http://www.boost.org/doc/libs/1_55_0/libs/multiprecision/doc/html/index.html) and would this fit within that framework? The answer probably depends on whether you want/need the expression-template framework that that library provides. Regards, John.
Hi,
I’m aware of the multiprecision types in Boost, and conceptually I think they are related (and probably will also be related in code).
The main difference from an MPFP is accuracy, and the main difference from Rational is performance and range.
The intended scenario for this type is anywhere that accurate results are needed and a common divisor for the operands exists and is known in most operations.
The number can be expressed as Q x F, where Q is some kind of rational and F is some kind of float (four integers: Qnum, Qdenom, Mantissa, Exponent).
The Q handles the accuracy issues:
A simulation example: suppose you have two things ticking, one 3 times per second and the other once per second.
Using an MPFP (in base 10), no matter how much precision you use, the 0.33333333 ticks will not add up to an integer every third tick, and sometimes the faster model's every-third tick will not be simultaneous with the other model's once-per-second tick.
The same happens with native floats. The problem lies in periodic numbers, which can easily be represented by rationals.
In this example the proposed numbers are represented as 1/3 * 1 and 1/3 * 3, and the Q part is never operated on because the two numbers have a common Q; everything reduces to checking that the Qs are equal, plus the same operations as with some kind of float.
Compared to Rational, the difference is the flexibility of using a mantissa and exponent, as a float does, to avoid modifying the Q all the time: unnecessary simplifications are avoided, and the representation range is larger because changes of magnitude are handled by the exponent alone.
The performance bottleneck occurs when the operands' Qs differ, but this is no worse than using rationals, and it only happens once in a while in the intended use cases.
Suppose a number with Q = 1/6 and one with Q = 1/7 participate in an addition: a common divisor must be found before operating, and both operands are adjusted to the new divisor.
An example is our own simulator's use case:
We have models interacting in a simulation, each with its own fixed Q, so after a number of compatibility operations linear in the number of models, the Q stabilises for the whole simulation, and afterwards we only operate on the floating-point part of the number. We don't set a globally known Q at the start because models are developed in different contexts, research groups, etc.; we cannot enforce a fixed time base on everyone using a general-purpose simulator, but with this method we can easily let those models interact when needed, with zero recoding.
Best regards,
Damian
Thanks for the explanation, it's interesting, and almost the same as "exact floating point" types which are represented as a/b x 2^n and have exact (arbitrary precision) operations for everything except division (which is unsupported). John.
I didn’t see exact floating points in Boost. Are they already implemented? The idea is pretty similar; I even considered removing Qn and using just 3 integer types for the representation, with the same strategy of changing Qd only when an operation requires it. If someone is already working on that data type, I would like to join the effort. Best regards, Damian
No, they're not implemented - it's on the TODO list ;-) I actually don't *think* it would be that hard to implement; I just wondered what the pros and cons of the different approaches were? Regards, John.
Do you have a place where I can read about how that implementation was intended to work? I think the differences between the two should be in the implementation details, not in the interfaces.

The idea behind keeping Qnum was to maximise the usable numbers in the mantissa. In a scenario where the Q (num and denom) stabilises, the set of useful representable mantissa values is as large as possible. In the other approach, suppose the Q should stabilise at 1500/1501: then Qdenom is 1501, but 1500 is present as a multiplier in every mantissa. That way you waste 1499 out of every 1500 values in the mantissa representation and probably need to increase the size of the mantissa (or not, depending on the use case). On the other hand, finding Qnum is not necessarily easy: it increases the computation needed to find the common factor between two operands and may increase the number of disagreements (1/3 and 2/3 are now not in agreement).

Regards, Damian
No, sorry - it was discussed/requested when the library was reviewed, that's all. There are some alternative implementations on the net, but I can't find them right now :-( John.
On Tue, 25 Mar 2014, John Maddock wrote:
Thanks for the explanation, it's interesting, and almost the same as "exact floating point" types which are represented as a/b x 2^n and have exact (arbitrary precision) operations for everything except division (which is unsupported).
I think you mean a*2^n. There are LGPL implementations in CGAL: Gmpzf, MP_Float and, most recently, Mpzf. With a/b*2^n (I wanted to try it as well, but I didn't have time to implement it), I don't see any reason not to support division. It still seems quite a bit different from what Damian was suggesting: performance is the main driver for these weird types ("usual" rationals would otherwise work), and a different performance profile will fit completely different applications. -- Marc Glisse
It still seems quite a bit different from what Damian was suggesting. Performance is the main driver for those weird types ("usual" rationals would work), and a different performance profile will fit completely different applications.
I agree on the different application profiles. Do you think this data type could have a place in the Boost.Multiprecision library? Regards, Damian
It would seem to be the logical place for it, so if you want to write it, then submissions are always welcome ;-) Regards, John.
participants (3)
- Damian Alberto Vicino
- John Maddock
- Marc Glisse