Matt Hurd wrote:
IEEE 16-bit (fp16) and bfloat16 are both around, but bfloat16 seems to be the new leader in modern implementations thanks to ML use. I haven't seen both used together, but I wouldn't rule it out, given that bfloat16 may be accelerator-specific. Google and Intel have support for bfloat16 in some hardware. bfloat16 makes it easy to move to fp32, as they have the same exponent size.
Refs: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format https://nickhigham.wordpress.com/2018/12/03/half-precision-arithmetic-fp16-v...
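On the point about sharing fp32's exponent size: widening a bfloat16 to a float really is just a 16-bit shift, since the sign and the 8-bit exponent land in the right places. Roughly like this (just a sketch; the function name is mine):

    #include <cstdint>
    #include <cstring>

    // Widen a bfloat16 bit pattern to an IEEE binary32 value.
    // bfloat16 is the top 16 bits of a binary32 (same sign and
    // 8-bit exponent fields), so widening is a shift plus a bit copy.
    float bfloat16_to_float(std::uint16_t b)
    {
        std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }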
According to section 4.1.2 of this ARM document: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.... implementations support both the IEEE format (1 sign bit, 5 exponent bits and 10 mantissa bits) and an alternative format, which is similar except that it drops Inf and NaN in exchange for slightly more range. Apparently the bfloat16 format is supported in ARMv8.6-A, but I don't believe that is deployed anywhere yet.

The other place where I've used 16-bit floats is in OpenGL textures (https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_float.txt), which use the 1-5-10 format. I was a bit surprised by the 1-5-10 choice; the maximum representable value is only (2 - 2^-10) x 2^15 = 65504, i.e. less than the maximum value of an unsigned integer of the same size (65535).

bfloat16 can be trivially implemented (as a storage-only type) simply by truncating a 32-bit float; perhaps support for that would be useful too?

Regards,
Phil.
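P.S. Here's roughly what I mean by a storage-only truncation; again just a sketch (the name is mine, and a real conversion would probably want to round rather than truncate):

    #include <cstdint>
    #include <cstring>

    // Narrow an IEEE binary32 value to a bfloat16 bit pattern by
    // truncation: keep the sign, the full 8-bit exponent and the top
    // 7 mantissa bits, and drop the rest. Truncation is the cheapest
    // option for a storage-only type; converting back is the 16-bit
    // shift shown earlier.
    std::uint16_t float_to_bfloat16_trunc(float f)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        return static_cast<std::uint16_t>(bits >> 16);
    }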