The formula for MANTISSA is correct 870618 AI-00205/06 1
 !standard 03.05.07 (06) 870618 AI-00205/06
 !class ramification 840313
 !status approved by WG9/AJPO 870617
 !status approved by Director, AJPO 870617
 !status approved by WG9 870529
!status approved by Ada Board (2100) 870219
!status panel/committee-approved 861015 (reviewed)
!status panel/committee-approved (500) 860911 (pending editorial review)
!status work-item 860730
!status received 840313
!references 8300281, 8300493
!topic The formula for MANTISSA is correct
 !summary 870604
The number of mantissa bits for D decimal digits of accuracy is correctly
 given by the formula in the Standard, namely, the integer next above
 (D*log(10)/log(2)) + 1.

 !question 870604
The Standard says that the number of binary digits in the mantissa of a model
 floating point number is the integer next above (D*log(10)/log(2)) + 1. This
seems to be one digit too many, since to ensure D digits of accuracy, it is
sufficient if 2**(-B) <= 10**(-D), i.e.:
    -B * log(2) <= -D * log(10)
    B >= D * log(10)/log(2)
    B = ceiling(D * log(10)/log(2)) (since B is an integer)
This gives a value one less than the value specified in 3.5.7(6). Is the
formula given in the Standard correct?
!response 860730
The formula given in the Standard is correct. The Standard says [3.5.7(6)]:
    The number [of binary mantissa bits] B associated with [the
    minimal number of decimal digits] D is the smallest value such
    that the relative precision of the binary form is no less than
    that specified for the decimal form.
This requirement is not equivalent to saying that 2**(-B) must be less than
or equal to 10**(-D). An example may be the best way to see the problem.
Consider D=2, so that by the proposed rule, B = 7, whereas by the Standard's
rule, B = 8. Now consider numbers in the vicinity of 8, whose representation
is 0.80E1 in decimal, and 2#0.1000_000#E4 if carried to 7 bits of precision
in binary. Now the difference between 8 and the number next above it in the
decimal representation is, of course, 0.1, whereas the difference in the
binary representation is 2#0.0000_001#E4, or 0.125 (1/8). The relative error
for the decimal representation is 0.1/8.0 = 1/80, while the relative error
for the binary form is 0.125/8.0 = 1/64 > 1/80, i.e., the binary form is less
accurate than the decimal form, since the relative error is greater, and this
is not allowed by 3.5.7(6).
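The arithmetic of this example can be checked with exact rationals. The following is only a sketch: the helper name `step` and the normalization of numbers as 0.ddd times a power of the base are assumptions made for illustration, chosen to match the 0.80E1 and 2#0.1000_000#E4 forms used above.

```python
from fractions import Fraction

def step(value, base, digits):
    """Gap between `value` and the next number above it when numbers are
    written as 0.ddd... * base**e with `digits` digits (hypothetical helper)."""
    value = Fraction(value)
    e = 0
    while Fraction(base) ** e <= value:      # raise e until value < base**e
        e += 1
    while Fraction(base) ** (e - 1) > value: # lower e until base**(e-1) <= value
        e -= 1
    return Fraction(base) ** (e - digits)

x = 8
decimal_error = step(x, 10, 2) / x   # D = 2 decimal digits
binary7_error = step(x, 2, 7) / x    # B = 7, the proposed rule
binary8_error = step(x, 2, 8) / x    # B = 8, the Standard's rule

print(decimal_error, binary7_error, binary8_error)
```

With these definitions, the 7-bit relative error near 8 exceeds the decimal one, while the 8-bit relative error does not.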
The Standard requires that the largest relative error for model numbers be
less than the smallest relative error for the corresponding decimal numbers.
(If so, every D-digit decimal number can be represented uniquely as a model
number, and such a model number can, in turn, be uniquely mapped back to the
original decimal number; see I. B. Goldberg, "27 bits are not enough for
8-digit accuracy," CACM 10, 2 (Feb. 1967), pp. 105-106.)
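The uniqueness property can be illustrated for D = 2 by rounding every 2-digit decimal in one decade (1.0 through 9.9) to a B-bit binary mantissa and checking whether distinct decimals stay distinct and round back to themselves. This is a sketch under stated assumptions: the helper names are hypothetical, rounding is to nearest (ties to even, as Python's `round` does on rationals), and only a single decade is examined.

```python
from fractions import Fraction

def round_to_bits(x, bits):
    """Round positive x to the nearest binary number with a `bits`-bit mantissa."""
    x = Fraction(x)
    e = 0
    while Fraction(2) ** e <= x:          # raise e until x < 2**e
        e += 1
    while Fraction(2) ** (e - 1) > x:     # lower e until 2**(e-1) <= x
        e -= 1
    ulp = Fraction(2) ** (e - bits)       # spacing of bits-bit numbers near x
    return round(x / ulp) * ulp

# All 2-digit decimals in one decade: 1.0, 1.1, ..., 9.9 (an illustrative choice).
two_digit = [Fraction(n, 10) for n in range(10, 100)]

def distinct_images(bits):
    """How many different binary numbers the 90 decimals map to."""
    return len({round_to_bits(d, bits) for d in two_digit})

def round_trips(bits):
    """True if rounding each binary image back to tenths recovers the decimal."""
    return all(Fraction(round(round_to_bits(d, bits) * 10), 10) == d
               for d in two_digit)

for bits in (7, 8):
    print(bits, distinct_images(bits), round_trips(bits))
```

With 7 bits, 8.2 and 8.3 both round to 8.25, so the mapping is not unique; with 8 bits all 90 decimals in this decade remain distinct and round back to themselves.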
The maximum relative error for a D-digit number is one divided by the
smallest D-digit number, e.g., for D = 2, the maximum relative error is 1/10
= 10**(-D+1). The minimum relative error for a 2-digit decimal number is
1/99, which is slightly larger than 10**(-D), so a lower bound on the minimum
relative error is 10**(-D). Corresponding calculations apply to B-digit
binary numbers, whose maximum relative error is 2**(-B+1). In short, in order
to ensure that every decimal number has a unique model number representation,
it is necessary that:
    max binary (model) number error <= min decimal number error, i.e.,
    2**(-B+1) <= 10**(-D)
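Taking logarithms gives B >= (D*log(10)/log(2)) + 1. Since D*log(10)/log(2)
is never an integer for D >= 1, the smallest integer B satisfying this is the
integer next above (D*log(10)/log(2)) + 1, which is the formula given in the
Standard.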
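As a cross-check, the smallest B satisfying the relation can be found by exact integer search and compared with the closed formula. The function names below are hypothetical, and the range of D tested (1 through 20) is an arbitrary choice kept small so that the floating-point logarithm in the formula is not close to an integer boundary.

```python
import math

def mantissa_bits_formula(d):
    """The integer next above D*log(10)/log(2) + 1 (formula in 3.5.7(6))."""
    return math.floor(d * math.log(10) / math.log(2) + 1) + 1

def mantissa_bits_search(d):
    """Smallest B with 2**(-B+1) <= 10**(-D), i.e. 10**D <= 2**(B-1),
    found with exact integer arithmetic."""
    b = 1
    while 2 ** (b - 1) < 10 ** d:
        b += 1
    return b

def mantissa_bits_proposed(d):
    """The rule suggested in the question: ceiling(D*log(10)/log(2))."""
    return math.ceil(d * math.log(10) / math.log(2))

for d in range(1, 21):
    assert mantissa_bits_formula(d) == mantissa_bits_search(d)
    assert mantissa_bits_proposed(d) == mantissa_bits_search(d) - 1

print([(d, mantissa_bits_formula(d)) for d in (2, 6, 15)])
```

For D = 2 both the search and the formula give B = 8, one more than the value from the proposed rule, in agreement with the example above.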