Ada 9X LSN046MRT
Numerics Annex (Rationale), Vers. 4.7
June 1992
K W Dritz
Argonne National Laboratory
Argonne, IL 60439
Email: dritz@mcs.anl.gov
Attached below is my revision of the Numerics Annex (rationale) following the
last DRs' meeting. I sent the specification separately a little while ago.
The major changes are marked in the margin with change bars.
Ken
===============================================================================
1. Numerics Annex (Rationale)
This section provides the rationale for language features proposed in the 
Numerics Annex of the Mapping Specification. These language features 
include models of floating-point and fixed-point arithmetic, a predefined
generic package of elementary functions, and attributes comprising a
collection of ``primitive'' floating-point manipulation functions. The
models of floating-point and fixed-point arithmetic provide the semantics of
operations of the real types and are placed in the Numerics Annex for 
presentation purposes only; all implementations must conform to the models. 
The elementary functions are currently described in the Numerics Annex but 
are relevant to such a wide variety of applications that they will probably 
be moved to a chapter of the core on ``Ada Standard Libraries.'' The 
primitive functions were once thought to be germane only to the development 
of mathematical software libraries, but the realization that they have uses 
in the implementation of I/O formatting and other applications has resulted 
in a desire to move them to the core. 
The treatment of numerics is simplified by
- retaining, for floating-point types, only one of the concepts of
model numbers and safe numbers;
- eliminating, for fixed-point types, both the model numbers and the
safe numbers;
- eliminating attributes that are no longer needed.
The simplification provides the following direct benefits:
- real types become somewhat easier to describe;
- at least one common misapprehension (that the safe numbers and
model numbers of a floating-point type differ only in range, i.e.,
that they have the same precision) loses its basis;
- fixed-point types become more intuitive, with no loss of
functionality.
Conceptually, for floating-point types it is the model numbers that are
being eliminated and the safe numbers that are being kept. However, in the
process of doing so, the properties of the latter are being changed
slightly, and they are being called model numbers instead of safe numbers.
One may prefer to think that the model numbers have been retained (with some
changes) and the safe numbers eliminated, but the surviving concept is much
closer to that of Ada 83's safe numbers. The name change is motivated by
the broad connection of the resulting concept to the semantics of
floating-point arithmetic, in contrast to the much more limited connotations
of ``safe numbers.'' If, with the advent of Ada 9X, one only talks about
``model numbers'' in the context of their definition in Ada 9X, no confusion
should arise.
The changes in the surviving concept provide these secondary benefits:
- the model of floating-point arithmetic becomes more useful to
numerical analysts because, as a descriptive tool, it reflects the
properties of the underlying hardware more closely;
- the ``4*B Rule'' is recast in a way that does not penalize the
properties of any predefined type;
- implementations of floating point on decimal hardware become
practical;
- a few anomalies are eliminated.
In general, the changes will have little impact on implementations; in
particular, currently generated floating-point code should, in the main,
remain valid. 
1.1. Semantics of Floating-Point Arithmetic
Floating-point semantics in Ada 83 are tied to the concepts of model numbers
and safe numbers. Effectively, the safe numbers define, for a given
implementation, the accuracy required of the predefined arithmetic operators
and the conditions under which overflow is and is not possible. Numerical
analysts have used characteristics of the safe numbers to make claims about
the actual performance of their programs in the underlying environment, and
they have used attributes of the safe numbers to tailor the behavior of
their programs to the numerical properties of the underlying environment.
The model numbers, in contrast, can be said to represent the worst-case
properties of the safe numbers over all conceivable conforming
implementations, and therefore the worst acceptable numerical performance of
an Ada 83 program. Numerical analysts have generally not exploited the
model numbers or their attributes for any purpose, because they prefer to
focus on the actual performance of a program in the underlying environment.
The attributes of the safe numbers permit one to reason about that
performance in a uniform, symbolic way over all implementations.
Since the model numbers of Ada 83 have generally not been put to practical
use, they are eliminated in Ada 9X. The concept of safe numbers as the
determinant of the actual numeric quality in the underlying environment
survives, but in their incarnation in Ada 9X the former Ada 83 safe numbers
are called model numbers. At the same time, their definition has been
modified slightly to allow them to fit more closely to the actual numeric
characteristics of the environment, making them more useful for the purpose
for which they were intended. In their new role, they correspond exactly to
what Brown in [brown81], which is the basis of Ada's model of floating-point
arithmetic, called model numbers. The changes in the floating-point model
are in line with those addressed by Study Topic S11.1-B(1).
1.1.1. Floating-Point Machine Numbers
Ada 83 includes a characterization of the underlying machine representation
of a floating-point type, based on an interpretation of the canonical form
[RM 3.5.7(4)] in which the constraints on mantissa, radix, and exponent are
those dictated by certain representation attributes [RM 13.7.3(5-9)]. This
amounts to the definition of a set of numbers which we are calling, in Ada
9X, the machine numbers of a floating-point type. We define the machine
numbers of a type T to be those capable of being represented to full
accuracy in the storage representation of T. Some machines have ``extended
registers'' with a wider range and greater precision than the corresponding
(or, sometimes, any) storage format. Thus, in the course of computation
with a type T, values having wider range or greater precision than the
machine numbers of T can be generated, as allowed by the model of
floating-point arithmetic. They can also be assigned to variables of T in
Ada 9X (if T is a base type), since a variable may be temporarily, or even
permanently, held in a register. There is no guarantee, however, that such
extended range or precision can be exploited, and consequently no attempt is
made to characterize it.
Note: In this connection, Ada 83 allows values of extended precision, but
not extended range, to be assigned to variables. The benefits of keeping
variables in extended registers are partially negated in Ada 83 by the need
to perform range checks on assignment, even when the variable is of a base
type (having, therefore, an implementation-defined range). These benefits
are fully realizable in Ada 9X due to the fact that range checks are no
longer performed on assignment to a variable of a numeric base type, on the
passing of an argument to a formal parameter of such a type, and on
returning a value of such a type from a function. Overflow may still be
detected in such contexts, i.e., when the expression on the right-hand side
of an assignment statement, in an actual parameter, or in a return statement
performs an operation whose result exceeds the hardware's overflow
threshold, but that is a separate semantic issue. This is discussed further
in 1.1.5.
Consideration was given to eliminating the characterization of machine
numbers and retaining only that of the model numbers, thereby simplifying
the discussion of floatingpoint matters even further. However, the
characteristics of the machine numbers (that is, the storage representation
of a floating-point type) are needed to define the meaning of certain
attributes, viz. the ``primitive functions'' (see 1.4), as well as the
meaning of UNCHECKED_CONVERSION when its source type is a floating-point
type. In addition, occasionally it is appropriate to design a numerical
algorithm so as to exploit the characteristics of the machine representation
as much as possible, even though in some contexts the hardware might not
allow the full benefit of such an attempt to be achieved.
1.1.2. Attributes of FloatingPoint Machine Numbers
The Ada 83 representation attributes of floating-point types
(T'MACHINE_EMIN, T'MACHINE_EMAX, T'MACHINE_MANTISSA, and T'MACHINE_RADIX,
which return values of type universal_integer, and the Boolean-valued
T'MACHINE_ROUNDS and T'MACHINE_OVERFLOWS) have been retained in Ada 9X, and
two new Boolean-valued attributes (T'DENORM and T'SIGNED_ZEROS) have been
defined.
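
The following fragment suggests how these attributes might be queried.  It
is a minimal sketch (the procedure name is ours), using the predefined
package TEXT_IO; T'DENORM and T'SIGNED_ZEROS are of course the new
attributes proposed here:

   with TEXT_IO;
   procedure SHOW_ATTRIBUTES is
   begin
      -- The integer-valued machine attributes of FLOAT:
      TEXT_IO.PUT_LINE (INTEGER'IMAGE (FLOAT'MACHINE_RADIX));
      TEXT_IO.PUT_LINE (INTEGER'IMAGE (FLOAT'MACHINE_MANTISSA));
      TEXT_IO.PUT_LINE (INTEGER'IMAGE (FLOAT'MACHINE_EMIN));
      TEXT_IO.PUT_LINE (INTEGER'IMAGE (FLOAT'MACHINE_EMAX));
      -- The new Boolean-valued attributes:
      TEXT_IO.PUT_LINE (BOOLEAN'IMAGE (FLOAT'DENORM));
      TEXT_IO.PUT_LINE (BOOLEAN'IMAGE (FLOAT'SIGNED_ZEROS));
   end SHOW_ATTRIBUTES;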
It has never been particularly clear whether and how the Ada 83
representation attributes accommodate denormalized numbers, if the
implementation happens to have them. This situation is improved in Ada 9X
in two ways. Implementations that generate and use denormalized numbers for
a floating-point type T, as defined in [IEEE754], will be distinguished by
having T'DENORM = TRUE; otherwise, T'DENORM = FALSE. (Besides being useful
to programmers, this new attribute plays a role in the definitions of some
of the primitive-function attributes.) In addition, denormalized numbers
are accommodated as machine numbers by clarifying the meaning of
T'MACHINE_EMIN and relaxing the requirement that the leading digit of
mantissa, in the canonical form of machine numbers, always be nonzero. The
clarification is that T'MACHINE_EMIN gives the smallest value of exponent
(in the canonical form) for which every combination of sign, exponent, and
mantissa yields a machine number, i.e., a value capable of being represented
to full accuracy in the storage representation of T. This effectively means
that T'MACHINE_EMIN is the exponent of the smallest normalized machine
number whose negation is also a machine number (which has relevance to
implementations featuring ``radix-complement'' representation) and that, in
implementations for which T'DENORM is TRUE, it is also the exponent of all
of the denormalized numbers.
A similar clarification for T'MACHINE_EMAX means that it is the exponent of
the largest machine number whose negation is also a machine number; it is
not the exponent of the most negative number on radix-complement machines.
An alternative clarification of T'MACHINE_EMIN and T'MACHINE_EMAX was
considered, namely, that they yield the minimum and maximum values of
exponent for which some combination of sign, exponent, and mantissa yields a
machine number. This would have allowed denormalized numbers to be
accommodated without relaxing the requirement that the leading digit of
mantissa be nonzero, and it would allow us to omit an observation, which we
expect to include when we write the complete definition for the primitive
function T'EXPONENT(X), as currently proposed, that this function can yield
a result less than T'MACHINE_EMIN or greater than T'MACHINE_EMAX. Despite
the apparent desirability of this alternative, it was judged to be too much
of a departure from current practice and therefore too likely to cause
compatibility problems.
The new attribute T'SIGNED_ZEROS is provided to indicate whether the
hardware distinguishes the sign of floating-point zeros, as described by
[IEEE754]. This attribute, along with the T'COPY_SIGN ``primitive
function'' attribute, allows the numerical programmer to extend the
treatment of signed zeros to the higher-level abstractions he or she
creates, much in the manner of the elementary functions ARCTAN and ARCCOT
(see 1.3). It is expected that implementations that distinguish the sign of
zeros will do so in a way consistent with relevant external standards (e.g.,
[IEEE754]) to the extent that such standards apply to operations of Ada,
and in appropriate and consistent (but implementation-defined) ways
otherwise; thus, no attempt is made in Ada 9X to prescribe the sign of every
possible zero result, or the behavior of every operation receiving an
operand of zero.
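
As an illustration of the intended style of use, the following sketch
returns a zero carrying the sign of its argument when the hardware
distinguishes signed zeros.  It assumes the T'COPY_SIGN(VALUE, SIGN) form
of the primitive function proposed in 1.4, and the function name is ours:

   function SIGNED_ZERO_LIKE (X : FLOAT) return FLOAT is
   begin
      if FLOAT'SIGNED_ZEROS then
         return FLOAT'COPY_SIGN (0.0, X);  -- a zero with the sign of X
      else
         return 0.0;                       -- only one zero exists
      end if;
   end SIGNED_ZERO_LIKE;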
The two new attributes T'DENORM and T'SIGNED_ZEROS describe properties that
an implementation may exhibit independently of any other support for IEEE
arithmetic. Some implementations of Ada 83 do feature denormalized numbers
and signed zeros (because they come for ``free'' with the hardware), but no
other features of IEEE arithmetic.
1.1.3. Floating-Point Model Numbers
The primary changes that distinguish Ada 9X model numbers from Ada 83 safe
numbers are these:
1. the length of the mantissa (in the canonical form) is no longer
``quantized,'' but is as large as possible consistent with
satisfaction of the accuracy requirements;
2. the radix (in the canonical form) is no longer always two, but is
the same (for a type T) as T'MACHINE_RADIX;
3. the model numbers form an infinite set;
4. the maximum non-overflowing exponent is no longer bounded below
by a function of the mantissa length;
5. the minimum exponent is no longer required to be the negation of
the maximum non-overflowing exponent, but is given (for a type T)
by an independent attribute.
The Ada 83 safe numbers have mantissa lengths that are a function of the
DIGITS attribute of the underlying predefined type, giving them a quantized
length chosen from the list (5, 8, 11, 15, 18, 21, 25, ...). Thus, on
binary hardware having T'MACHINE_MANTISSA = 24, which is a common mantissa
length of the single-precision floating-point hardware type, the last three
bits of the machine representation exceed the precision of the safe numbers;
as a consequence, even when the machine arithmetic is fully accurate (at the
machine-number level), one cannot deduce that Ada arithmetic operations
deliver full machine-number accuracy. With the first change enumerated
above (freeing the mantissa length from quantization), tighter accuracy
claims will be provable on many machines. As an additional consequence of
this change, in Ada 9X the two types declared as follows
type T1 is digits D;
type T2 is digits D range T1'FIRST .. T1'LAST;
will, as they intuitively should, have the same hardware representation when
hardware characteristics do not require parameter penalties; in Ada 83,
their hardware representations almost always differ, with T2'BASE'DIGITS >
T1'BASE'DIGITS, for reasons having nothing to do with hardware
considerations.
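The quantization can be reproduced directly from its definition: the Ada 83
mantissa length for D decimal digits is the smallest B such that
2.0 ** (B - 1) > 10.0 ** D. A sketch (the function name is ours, and
LONG_FLOAT is assumed to have adequate range for modest values of D):

   function QUANTIZED_MANTISSA (D : POSITIVE) return POSITIVE is
      POW2  : LONG_FLOAT := 1.0;                 -- 2.0 ** (B - 1)
      POW10 : constant LONG_FLOAT := 10.0 ** D;  -- 10.0 ** D
      B     : POSITIVE := 1;
   begin
      while POW2 <= POW10 loop
         B    := B + 1;
         POW2 := POW2 * 2.0;
      end loop;
      return B;   -- 5, 8, 11, 15, ... for D = 1, 2, 3, 4, ...
   end QUANTIZED_MANTISSA;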
The second change enumerated above (non-binary radix) has two effects:
- it permits practical implementations on decimal hardware (which,
though not currently of commercial significance for mainstream
computers, is permitted by IEEE Std. 854 [IEEE854]; is appealing
for embedded computers in consumer electronics; and is used in at
least one such application, an HP calculator);
- on hexadecimal hardware, it allows more machine numbers to be
classed as model numbers (and therefore to be proven to possess
special properties, such as being exactly representable,
contributing no error in certain arithmetic operations, etc.).
As an example of the latter effect, note that T'LAST will become a model
number on most hexadecimal machines. Also, on hexadecimal hardware, a
64-bit double-precision type having 14 hexadecimal (or 56 binary) digits in
the hardware mantissa, as on many IBM machines, has safe numbers with a
mantissa length of 51 binary bits in Ada 83, and thus no machine number of
this type with more than 51 bits of significance is a safe number; in Ada
9X, such a type would have a mantissa length of 14 hexadecimal digits, with
the consequence that every machine number with 53 bits of significance is
now a model number, as are some with even more. (Why does the type under
discussion not have Ada 83 safe numbers with 55 bits in the mantissa, the
next possible quantized length and a length that is less than that of the
machine mantissa? Because some machine numbers with 54 or 55 bits of
significance do not yield exact results when divided by two and cannot
therefore be safe numbers. This is a consequence of their hexadecimal
normalization, and it gives rise to the phenomenon known as ``wobbling
precision.'')
The third change enumerated above (extending the model numbers to an
infinite set) is intended to fill a gap in Ada 83 wherein the results of
arithmetic operations are not formally defined when they exceed the modeled
overflow threshold but an exception is not raised. Some of the reasons why
this can happen are as follows:
- the quantization of mantissa lengths may force the modeled
overflow threshold to be lower than the actual hardware threshold;
- arithmetic anomalies of one operation may require the attributes
of model and safe numbers to be conservative, with the result that
other operations exceed the minimum guaranteed performance;
- the provision and use of extended registers in some machines moves
the overflow threshold of the registers used to hold arithmetic
results well away from that of the storage representation;
- the positive and negative actual overflow thresholds may be
different, as on radix-complement machines.
The extension of the model numbers to an infinite range fills a similar gap
in Ada 83 wherein no result is formally defined for an operation receiving
an operand exceeding the modeled overflow threshold, when an exception was
not raised during its prior computation. The change means, of course, that
one can no longer say that the model numbers of a type are a subset of the
machine numbers of the type; one may say instead that the model numbers of a
type T in the range -T'MODEL_LARGE .. T'MODEL_LARGE are a subset of the
machine numbers of T.
The fourth change enumerated above (freeing the maximum exponent from
dependence on the mantissa length) is equivalent to the dropping of the
``4*B Rule'' as it applies to the predefined types; a version of the rule 
still affects the implementation's implicit selection of an underlying 
representation for a user-declared floating-point type lacking a range
constraint, providing in that case a guaranteed range tied to the requested 
precision. The change in the application of the 4*B Rule allows all 
hardware representations to be accommodated as predefined types with 
attributes that accurately characterize their properties. Such types are 
available for implicit selection by the implementation when their properties 
are compatible with the precision and range requested by the user, but they 
remain unavailable for implicit selection in exactly those situations in 
which, in the absence of an explicit range constraint, the 4*B Rule of Ada 
83 acted to preclude their selection. Compatibility considerations related 
to the 4*B Rule are further discussed in 1.1.7. 

The 4*B Rule was necessary in Ada 83 in order to define the model numbers of 
a type entirely as a function of a single parameter (the requested 
precision). By its nature, the rule potentially precludes the 
implementation of Ada in some (hypothetical) environments, as if to say that 
such environments are not suitable for the language or applications written 
in it; in other (actual) environments, it artificially penalizes the 
reported properties of some hardware types so strongly that they have only 
marginal utility as predefined types available for implicit selection and 
may end up being ignored by the vendor. Such matters are best left to the 
judgment of the marketplace and not dictated by the language. The 
particular minimum range required in Ada 83 (as a function of precision) is 
furthermore about twice that deemed minimally necessary for numeric 
applications [brown81]. 

Among current implementations of Ada, the only predefined types whose 
characteristics are affected by the relaxation of the 4*B Rule are DEC VAX 
D-format and IBM Extended Precision, both of which have a narrow exponent
range in relation to their precision. In the case of VAX D-format, even
though the hardware type provides the equivalent of 16 decimal digits of 
precision, its narrow exponent range requires that 'DIGITS for this type be 
severely penalized and reported as 9 in Ada 83; 'MANTISSA is similarly 
penalized and reported as 31, and the other model attributes follow suit. 
In Ada 9X, in contrast, this predefined type would have a 'DIGITS of 16, a 
'MODEL_MANTISSA of 56, and other model attributes accurately reflecting the 
type's actual properties. A user-declared floating-point type requesting
more than 9 digits of precision does not select D-format as the underlying
representation in Ada 83, but instead selects H-format; in Ada 9X, it still
cannot select D-format if it lacks a range constraint (because of the analog
of the 4*B Rule that has been built into the equivalence rule), but it can
select D-format if it includes an explicit range constraint with
sufficiently small bounds. The compatibility issues associated with these 
changes are discussed in 1.1.7. 

The IBM Extended Precision hardware type has an actual decimal precision of 
32, but the 4*B Rule requires its 'DIGITS to be severely penalized and 
reported as 18, only three more than that of the double-precision type.
Supporting this type allows an Ada 83 implementation to increase 
SYSTEM.MAX_DIGITS from 15 to 18, a marginal gain; perhaps this is the reason 
why it is rarely supported (it is supported by Alsys but not by the other 
vendors that have implementations for IBM System/370s). In Ada 9X, on the 
other hand, such an implementation can support Extended Precision with a 
'DIGITS of 32, though SYSTEM.MAX_DIGITS must still be 18. Although a 
floating-point type declaration lacking a range constraint cannot request
more than 18 digits, those including an explicit range constraint with 
sufficiently small bounds can do so and can thereby select Extended 
Precision. 
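
The following declarations illustrate the point. They are hypothetical
sketches, the first assuming an IBM System/370 target and the second a
DEC VAX target on which D-format is enabled for LONG_FLOAT (see 1.1.7):

   type EXTENDED is digits 32 range -1.0E70 .. 1.0E70;
   -- selectable as IBM Extended Precision in Ada 9X, since the bounds
   -- lie within that type's model range

   type WIDE is digits 16 range -1.0E30 .. 1.0E30;
   -- selectable as VAX D-format in Ada 9X, where Ada 83 would have
   -- required H-format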
The fifth change enumerated above (separate attribute for the minimum
exponent) removes another compromise made necessary by the desire, in Ada
83, to define the model numbers of a type in terms of a single parameter.
The minimum exponent of the model or safe numbers of a type in Ada 83 is
required to be the negation of the maximum exponent (thereby tying it to the
precision implicitly). One consequence of this is that the maximum exponent
may need to be reduced simply to avoid having the smallest positive safe
number lie inside the implementation's actual underflow threshold; if it is
needed, such a reduction provides another way to obtain values in excess of
the modeled overflow threshold without raising an exception. Another is
that the smallest positive safe number may have a value unnecessarily
greater than the actual underflow threshold. With the fifth change, as with
some
of the others, more of the machine numbers will be recognized as numbers
having special properties, i.e., as model numbers.
Consideration was given to eliminating the model numbers and retaining only
the machine numbers. While this would simplify the semantics of
floating-point arithmetic further, it would not eliminate the interval
orientation of the accuracy requirements (see L.1.5) if variations in
rounding mode from one implementation to another, and the use of extended
registers, are to be tolerated. It would simply substitute the machine
numbers and intervals of machine numbers for the model numbers and intervals
of model numbers in those requirements, but their qualitative form would
remain the same. However, rephrasing the accuracy requirements in terms of
machine numbers and intervals thereof cannot be realistically considered,
since many platforms on which Ada has been implemented and might be
implemented in the future could not conform to such stringent requirements.
If an implementation has appropriate characteristics, its model numbers up
to the modeled overflow threshold will in fact coincide with its machine
numbers, and an analysis of a program's behavior in terms of the model
numbers will not only have the same qualitative form as it would have if the
accuracy requirements were expressed in terms of machine numbers, but it
will have the same quantitative implications as well. On the other hand, if
an implementation lacks guard digits, employs radix-complement
representation, or has genuine anomalies, its model numbers up to the
modeled overflow threshold will be a subset of its machine numbers having
less precision, a narrower exponent range, or both, and accuracy
requirements expressed in the same qualitative form, albeit in terms of the
machine numbers, would be unsatisfiable.
1.1.4. Attributes of Floating-Point Model Numbers
Although some of the attributes of model numbers in Ada 9X are closely
related to those of the safe numbers in Ada 83, they all bear new names of
the form T'MODEL_xxx. Certainly this is necessary for Ada 83's T'MANTISSA;
the new version, T'MODEL_MANTISSA, is conceptually equivalent to
T'BASE'MANTISSA in Ada 83 but is now interpreted as the number of
radix-digits in the mantissa. Thus, at a minimum the value of this
attribute will be roughly quartered on hexadecimal machines, even if there
is no reason to take advantage of the other freedoms now permitted. A new
name is certainly also necessary for T'SAFE_EMAX; the new version,
T'MODEL_EMAX, is now interpreted as a power of the hardware radix, and not
necessarily as a power of two. For hexadecimal machines, the value of this
attribute will be quartered, all other things being equal. T'MODEL_EMIN is
a new attribute. T'MODEL_LARGE is conceptually equivalent to Ada 83's
T'SAFE_LARGE. It is defined in terms of more fundamental attributes, as was
true of T'LARGE in Ada 83, with the result that the changes in the radix of
the model numbers ``cancel out'' in the definition of this attribute; its
value will change little, if at all, and then only to reflect the
unquantization of mantissa lengths of model numbers. The same can be said
about T'MODEL_SMALL, which is conceptually equivalent to Ada 83's
T'SAFE_SMALL, and about T'MODEL_EPSILON, which is conceptually equivalent to
Ada 83's T'BASE'EPSILON.
The values of these attributes will be determined by how well the
implementation can satisfy the accuracy requirements, with the primary
determinant being the quality of the hardware's arithmetic. On ``clean''
machines, for which the model numbers up to the modeled overflow threshold
coincide with the machine numbers, T'MODEL_MANTISSA, T'MODEL_EMAX, and
T'MODEL_EMIN will yield the same values as T'MACHINE_MANTISSA,
T'MACHINE_EMAX, and T'MACHINE_EMIN, respectively, though in general
T'MODEL_MANTISSA and T'MODEL_EMAX may be smaller than their machine
counterparts, and T'MODEL_EMIN may be larger.
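This coincidence can be tested directly. A minimal sketch, for a
floating-point type T:

   -- TRUE exactly on "clean" machines, where the model numbers of T up
   -- to the modeled overflow threshold coincide with the machine numbers
   CLEAN : constant BOOLEAN :=
      T'MODEL_MANTISSA = T'MACHINE_MANTISSA and
      T'MODEL_EMAX     = T'MACHINE_EMAX     and
      T'MODEL_EMIN     = T'MACHINE_EMIN;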
It is illuminating to contrast the processes by which the values of the 
model attributes of the predefined types are determined in Ada 83 and Ada 
9X. For this purpose, we restate the process for Ada 9X first, then we 
present the similar Ada 83 process in an unconventional but comparable form. 

For a predefined type P in Ada 9X, the process is as follows: 

- Determine simultaneously the minimum and maximum exponents (EMIN
and EMAX) and the maximum mantissa length (MMAX) for which the
accuracy requirements, expressed in terms of the resulting set of
model numbers, are satisfied. EMIN may be as small as
P'MACHINE_EMIN, but hardware anomalies in the nature of premature
underflow may cause it to be larger. EMAX may be as large as
P'MACHINE_EMAX, but hardware anomalies in the nature of premature
overflow may cause it to be smaller. MMAX may be as large as
P'MACHINE_MANTISSA, but lack of guard digits, or hardware
anomalies in the nature of inaccurate arithmetic, may cause it to
be smaller.

- Set P'MODEL_EMIN = EMIN.

- Set P'MODEL_EMAX = EMAX.

- Let DMAX be the maximum value of D for which ceiling(D *
log(10)/log(P'MACHINE_RADIX) + 1) <= MMAX.

- Set P'DIGITS = DMAX.

- Set P'MODEL_MANTISSA = MMAX.

- Set P'MODEL_EPSILON = P'MACHINE_RADIX ** (1 - P'MODEL_MANTISSA).

- Set P'MODEL_SMALL = P'MACHINE_RADIX ** (P'MODEL_EMIN - 1).

- Set P'MODEL_LARGE = P'MACHINE_RADIX ** P'MODEL_EMAX * (1.0 -
P'MACHINE_RADIX ** (-P'MODEL_MANTISSA)).
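
The step from MMAX to DMAX can be mechanized without logarithms, since
ceiling(D * log(10)/log(R) + 1) <= MMAX holds exactly when 10**D <=
R**(MMAX-1). A sketch (the function name is ours, and LONG_FLOAT is
assumed to have adequate range for the types of interest):

   function MAX_DECIMAL_DIGITS (MMAX, RADIX : POSITIVE) return NATURAL is
      LIMIT : constant LONG_FLOAT := LONG_FLOAT (RADIX) ** (MMAX - 1);
      POW10 : LONG_FLOAT := 10.0;   -- 10.0 ** (D + 1)
      D     : NATURAL    := 0;
   begin
      while POW10 <= LIMIT loop
         D     := D + 1;
         POW10 := POW10 * 10.0;
      end loop;
      return D;
   end MAX_DECIMAL_DIGITS;

For example, MAX_DECIMAL_DIGITS (56, 2) yields 16 and
MAX_DECIMAL_DIGITS (28, 16) yields 32, matching the VAX D-format and IBM
Extended Precision figures cited in 1.1.3.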

In comparable terms, the same process for Ada 83 may be stated as follows: 

- Determine simultaneously the minimum and maximum binary-equivalent
exponents (EMIN and EMAX) and the maximum binary mantissa length
(MMAX) for which the accuracy requirements, expressed in terms of
the resulting set of model numbers, are satisfied. EMIN may be as
small as P'MACHINE_EMIN * log(P'MACHINE_RADIX)/log(2), but
hardware anomalies in the nature of premature underflow may cause
it to be larger. EMAX may be as large as P'MACHINE_EMAX *
log(P'MACHINE_RADIX)/log(2), but hardware anomalies in the nature
of premature overflow may cause it to be smaller. MMAX may be as
large as (P'MACHINE_MANTISSA - 1) * log(P'MACHINE_RADIX)/log(2) +
1, but lack of guard digits, or hardware anomalies in the nature
of inaccurate arithmetic, may cause it to be smaller.

- Set P'SAFE_EMAX = min(EMAX, -EMIN).

- For each D, define a corresponding value of B as follows: B =
ceiling(D * log(10)/log(2) + 1). Let DMAX be the
maximum value of D for which the corresponding value of B <= MMAX
and for which 4*B <= P'SAFE_EMAX. Call the corresponding value of
B BMAX.

- Set P'DIGITS = DMAX.

- Set P'MANTISSA = BMAX.

- Set P'EMAX = 4 * P'MANTISSA.

- Set P'EPSILON = 2.0 ** (1 - P'MANTISSA).

- Set P'SMALL = 2.0 ** (-P'EMAX - 1).

- Set P'LARGE = 2.0 ** P'EMAX * (1.0 - 2.0 ** (-P'MANTISSA)).

- Set P'SAFE_SMALL = 2.0 ** (-P'SAFE_EMAX - 1).

- Set P'SAFE_LARGE = 2.0 ** P'SAFE_EMAX * (1.0 - 2.0 **
(-P'MANTISSA)).

Similar strictly comparable Ada 83 and Ada 9X statements of the equivalence 
rule, by which an implementation implicitly selects a predefined type to 
represent a user-declared floating-point type, are given in 1.1.6. By
examining those in conjunction with the attribute determination rules just 
given, one can see readily that the 4*B Rule of Ada 83 is entirely 
encapsulated in the attribute determination rules, while its analog in Ada 
9X is entirely encapsulated in the equivalence rule. 
The set of attributes having both T'MODEL_xxx and T'MACHINE_xxx versions
could conceivably be enlarged to add further intuitive strength and
uniformity to the naming convention, but we have resisted adding attributes
which meet no identifiable need.
The attributes whose Ada 83 counterparts returned results of the type
universal_integer continue to do so; these are T'MODEL_MANTISSA and
T'MODEL_EMAX. The new attribute T'MODEL_EMIN also yields a value of this
type. The attributes whose Ada 83 counterparts returned results of the type
universal_real still do; these are T'MODEL_LARGE, T'MODEL_SMALL, and
T'MODEL_EPSILON. Although there is no particular reason why T'MODEL_LARGE
and T'MODEL_SMALL cannot return values of the base type of T, neither is
there a compelling reason to make what would be a gratuitous change.
It is our plan to establish a catalog of the fundamental model parameters
for all known implementations of Ada at some future time.
The renaming of the model attributes is intended to avoid one set of 
compatibility problems, wherein programs remain valid but change their 
effect as the result of changes in the values of attributes, but of course 
it introduces another: such programs become invalid. To avoid this, 
implementations are encouraged to continue to provide the obsolescent 
attributes, in fact with their Ada 83 values, as is discussed more fully in 
1.1.7. 
1.1.5. Accuracy of Floating-Point Operations
The accuracy requirements for certain predefined operations (arithmetic
operators, relational operators, and the basic operation of conversion, all
of which are referred to in this section simply as ``predefined arithmetic
operations'') of floatingpoint types other than root_real are still
expressed in terms of model intervals, for reasons explained earlier. It is
clarified that they do not apply to all such operations, however. For
example, they do not apply to any attribute that yields a result of a
specific floating-point type; such an attribute yields a machine number,
which must be exact.
The accuracy requirements for exponentiation are relaxed in accord with
AI-00868. The weaker rules no longer require that exponentiation be
implemented as repeated multiplication; special cases can be recognized and
implemented more efficiently by, for example, repeated squaring, even when
accuracy is sacrificed by doing so. The implementation model for
exponentiation by a negative exponent continues to be exponentiation by its
absolute value, followed by reciprocation. Thus, the rule continues to
allow for the possibility of overflow in this case, despite the
counterintuitive nature of such an overflow. The WG9 Numerics Rapporteur
Group recommended this treatment, so as to allow for the most efficient
implementations, recognizing, of course, that the user who is concerned with
the possibility of overflow can express the desired computation differently
and thereby avoid it. AI-00868 did not address the possibility of overflow
in the intermediate results, for negative exponents.
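For example (a hypothetical fragment; the values assume a type whose
overflow threshold is near that of IEEE double precision):

   X : constant LONG_FLOAT := 1.0E155;
   Y : LONG_FLOAT;
   ...
   Y := X ** (-2);       -- modeled as 1.0 / (X ** 2); the intermediate
                         -- X ** 2 exceeds the overflow threshold
   Y := (1.0 / X) ** 2;  -- mathematically the same value, computed
                         -- without the large intermediate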
One of the goals for Ada 9X is to allow for and legitimize the typical kinds
of optimizations that increase execution efficiency or numerical quality.
One of these is the use, for the results of the predefined arithmetic
operations of a type, of hardware representations having higher precision or
greater range than those of the storage representation of the type. On some
machines, this is not an option; arithmetic is performed in ``extended
registers,'' there being no registers having exactly the precision or range
of the storage cells used for variables of the type. Thus, we must allow
the results of arithmetic operations to exceed the precision and range of
the underlying type; avoiding that is intolerably expensive on some
machines. A second common optimization is the retention of a variable's
value in a register after its assignment, with subsequent references to the
variable being fulfilled by using the register. This avoids load operations
(i.e., the cost of memory references); it may, in many cases, even avoid the
store into the storage location for the variable that would normally be
generated for the assignment operation.
One implication of legitimizing the use of extended registers is the need to
define the result of an operation that could overflow but doesn't, as well
as the result of a subsequent arithmetic operation that uses such a value.
This is the motivation for the extension of the model numbers to an infinite
range and for the rewrite of RM 4.5.7(7). The new rules describe behavior
that is consistent with the assumption that an operation of a type T that
successfully delivers or uses results beyond the modeled overflow threshold
of the type T is actually performed by an operation corresponding to a type
with higher precision and/or wider range than that of the type T, whose
overflow threshold is not exceeded, and whose accuracy is no worse than that
of the original operation of the type T.
Ada 83 does not permit a value outside the range T'FIRST .. T'LAST to be
propagated beyond the point where it is assigned to a variable of a type T,
or is passed as an actual parameter to a formal parameter of type T, or is
returned from a function of type T; a range check is required in these
contexts to prevent it. Nothing prevents the carrying of excess precision
beyond such a point, however. Thus, keeping a value in an extended register
beyond such a point is permitted in Ada 83, whether or not it is also
stored, provided that the range check is satisfied. The range check may be
performed by a pair of comparisons of the source value to the bounds of the
range, when those bounds are arbitrary; but in the case of a floating-point
base type, the check may be a free byproduct of the store. For example, on
hardware conforming to IEEE arithmetic [IEEE754], storing an extended
register into a shorter storage format will signal an overflow if the source
value exceeds the range of the destination format. If the propagation has
no need for an actual store, because the value is to be propagated in the
register, then a store into a throwaway temporary, just to see if overflow
occurs, may be the cheapest way to perform the range check. If the check
succeeds, all subsequent uses of the value in the extended register are
valid and safe, including any potential need to store it into storage, such
as when the value is about to be passed as an actual parameter and the
implementation prefers to pass parameters in storage, or even merely because
of the need for register spilling at an arbitrary place not connected with a
use of the entity currently in the register. The loss of precision that
occurs at that point does not matter, because it is consistent with the
perturbations allowed when the value, had it not been shortened, is
subsequently used as the operand of an operation.
For an assignment to a variable of a numeric base type, actual code to
perform the range check is not always needed; it can be omitted if the
implementation can deduce that the check must succeed. The generation of
code to perform a range check is necessary only when extended registers are
being used, and then only when the source expression is other than just a
primary, that is, contains at the top level a predefined arithmetic
operation that can give rise to a value outside the range. As an example,
consider
X := Y * Z;
in which X, Y, and Z are assumed to be of some floating-point base type
T. If there are no parameter penalties, T'MODEL_LARGE = T'LAST = -T'FIRST.
If extended registers are not being used, then the multiplication cannot
generate a value outside the range T'FIRST .. T'LAST (since the attempt to
do so would overflow) and the range check can therefore be omitted. On the
other hand, if extended registers are being used, a value exceeding
T'MODEL_LARGE can be produced in the register, because the multiplication
may no longer overflow, and a range check will be needed to preclude the
propagation of a value outside T'FIRST .. T'LAST. When the source
expression is simply the value of a variable, a formal parameter, or the
result of a function call, as in
X := Y;
no actual range check is necessary, since the value (of Y, in the example)
can be presumed to have passed an earlier range check in the first
propagation away from the point where it was generated. When the source
expression does contain a predefined arithmetic operation at the top level,
the formal definition immediately precedes the range check on assignment by
an overflow check (on the multiplication, in the first example above) that
is at least as stringent (it is more stringent if there are parameter
penalties causing T'MODEL_LARGE to be less than T'LAST). Because it is at
least as stringent as the range check, the overflow check ought to subsume
the range check, but in practice it does not since the actual overflow
threshold, when extended registers are used, is even higher. It is
unfortunate that the availability and use of extended registers sometimes
require extra code to be generated for assignments in Ada 83.
We discuss next a possible way to improve on this situation in Ada 9X. It
is not what we have actually done, but it motivates our actual approach,
which is described afterwards.
We could proceed by making the range check optional at any propagation point
(assignment statement, parameter passing, return statement) when the target
type is a numeric base type. This would allow an out-of-range value to be
propagated when no actual store is needed, and it would also permit an
exception to be raised for an out-of-range value when the implementation
does require that the propagation be performed by an actual store (for
example, some implementations might never pass by-copy parameters or
function results in registers). In general, it would permit an out-of-range
value to survive in an extended register through an arbitrary number of
propagations (in particular, those that don't require stores), only to give
rise to an exception when a propagation point requiring a store is reached.
We would also have to clarify that passing an actual parameter to an
attribute that is a function, and returning a value from such an attribute,
are considered propagations, since the attribute may be implemented in the
same way as a function and may require its arguments or result to be passed
in storage. Thus, a range check would optionally be performed at those
places when the parameter or result is of a numeric base type and the source
value can be out of range. Finally, we would also have to clarify that
presenting an operand to a predefined arithmetic operation, and that
returning a result from a predefined arithmetic operation, are also
considered propagations, since some implementations may implement some such
operations in the same way as function calls, requiring operands and results
to be passed in storage. Thus, a range check would optionally be performed
at those places, too, when the type of an operand or that of the operation's
result is a numeric base type. Actually, this goes a bit too far: there is
no need for the range check on the result of a predefined operation, since
the more stringent overflow check already there subsumes it and accounts for
any necessary raising of CONSTRAINT_ERROR at that point.
That approach comes close to accomplishing what we need. The only problem
with it is that it leaves untouched many contexts in which an out-of-range
value in an extended register could be used without an opportunity for
raising CONSTRAINT_ERROR, as might be required by the particular context.
For example, simple variables used as the bounds of ranges, as
discriminants, as subscripts, as specifiers of various sorts in
declarations, as generic actual parameters, as case expressions, in delay
statements, as choices in numerous constructs, and undoubtedly in other
contexts would not be subject to a range check, because these are not
propagation contexts. Implementations would have to be prepared to deal in
these contexts with values having the range implied by the extended
registers that are available, rather than the range implied by the base type
associated with the context at hand.
What we have actually done, instead of including the optional range check in
the semantics of propagation, is to include it in the semantics of
read-references for certain categories of primary, specifically name and
function_call. (This does not appear in the Mapping Specification for the
Numerics Area, except in the form of a Note, since it is assumed to be a
feature of the core.) The range check in the three main propagation
contexts of Ada 83 (assignment statement, by-copy parameter passing, and
return statement) is entirely eliminated when the target type is a numeric
base type. We shall now show that even in its absence there is an
opportunity to raise CONSTRAINT_ERROR in those propagation contexts, when
the target type is a numeric base type and the source value exceeds the
type's range, and in all other contexts in which such a value might be used.
Indeed, the propagation contexts are just a subset of the general contexts,
so they need not be considered separately.
Every value used in a read (fetch) context, including those in propagation
contexts, is denoted by the category expression or one of its descendants.
Those expressions, or constituents of expressions, involving predefined
arithmetic operations (including the implicit conversions inherent in
references to numeric literals) already provide an opportunity to raise
CONSTRAINT_ERROR when they yield a value of a numeric base type that is
outside the range of the type, because the operations perform an overflow
check on the result that, being more stringent than the desired range check,
subsumes it. The opportunity to raise a CONSTRAINT_ERROR for a
parenthesized expression or a qualified expression, as an expression or a
component thereof, is provided by the evaluation of the expression that is
its immediate component. This leaves only names and function calls.
Therefore, all the necessary opportunities for raising the desired
CONSTRAINT_ERROR are covered by adding an optional range check to the
semantics of read-references for names and function calls (i.e., after the
return), when the type denoted by the name or function call is a numeric
base type. Note that only names denoting non-static objects are affected,
since the evaluation of static expressions is both exact and not limited as
to range.
We stress that all of the new range checks we are introducing are optional;
that is, either the check is optionally performed and raises
CONSTRAINT_ERROR if it fails, or it is always performed and optionally
raises CONSTRAINT_ERROR if it fails. Thus, the checks will not require any
code to be generated unless an actual shortening (storing of an extended
register) does need to occur at one of these places. Furthermore, as we
indicated earlier, even then the range check may come for free (as on IEEE
hardware).
A simple example will illustrate what can be gained. Consider this typical
inner product:
SUM := 0.0;
for I in A'RANGE loop
   SUM := SUM + A(I) * B(I);
end loop;
F(SUM);
Assume that SUM is a variable of a numeric base type. We would like to keep
SUM in an extended register during the loop, and in fact not even store the
register into SUM during the loop. In Ada 83, we are formally obligated to
perform a range check upon the assignment inside the loop, to prevent the
propagation of a value outside SUM's range; in IEEE systems, the cheapest
way to do this would be by storing into SUM after all, or into a throwaway
temporary. Thus, a store (or some other means of checking) is executed each
time through the loop. In Ada 9X, on the other hand, no range check is
performed on that assignment, allowing an out-of-range value to be
propagated, and justifying the complete omission of stores of the extended
register containing SUM within the loop. The CONSTRAINT_ERROR that Ada 83
would have raised on some assignment in the loop might instead occur during
the passing of SUM to F. It is allowed by the new optional range check in
the semantics of the variable reference inherent in the parameter
association, and whether or not it occurs there depends on whether
parameters are passed in storage or in registers and, in the former case,
whether the value of SUM is out of range at that point.
With this change, it is true that exceptions can occur in places where they
did not occur in Ada 83. However, whenever this happens, one can point to a
different place in the program where the exception would have occurred,
earlier, in its interpretation according to Ada 83 semantics. It may also
be that the exception never occurs in the Ada 9X interpretation; in the
example above, it may be that SUM remains forever in an extended register
and is never stored, or it may be that its value has been brought back
within range by the time it is stored.
We should probably have noted much earlier that this treatment of numeric
base types applies to all of them, not just floating-point base types. It
allows integer and fixed-point values to be held in, for example, 32-bit
general registers, in which integer arithmetic is performed, even when the
storage format of the base types involved has only 16 or 8 bits.
Also, although omitting certain range checks appears to conflict with the
safety goals of range checking, it must be remembered that the bounds of
numeric base types are implementation dependent anyway, so that whether a
particular source value can be assigned to a target of a numeric base type
already depends (i.e., in Ada 83) on properties of the implementation.
Furthermore, all declared integer and fixed-point types necessarily involve
range constraints and will therefore be subject to range checking; only
floating-point types declared without a range constraint will escape it. Of
course, all predefined numeric types are base types and will escape range
checking (which is consistent with their implementation-dependent ranges).
Other than by using floating-point types declared without a range
constraint, or by using predefined types, or by going out of one's way to
use T'BASE as a type mark, one will not escape range checking.
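To illustrate (a sketch; the type names are ours):

   type MASS  is digits 6 range 0.0 .. 1.0E10;  -- subject to range checks
   type SCALE is digits 6;                      -- no range constraint, so
                                                -- assignments escape checking
   V : SCALE'BASE;                              -- explicit base type:
                                                -- likewise escapes checking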
As was explained above, even Ada 83 allowed and explained the loss of
precision that can occur in shortening, when the propagation of a value held
in an extended register requires it. Actually, there is one exception to
this: if shortening is allowed to take place on a value being passed to an
instantiation of UNCHECKED_CONVERSION (and it certainly seems that
shortening is expected in that context), then nothing in the definition of
UNCHECKED_CONVERSION, or anywhere else, currently allows or explains the
shortening, in regard to the contrast between the possibly extra-precise
value going in and the presumably shortened value coming out. In Ada 9X,
the primitive functions (see 1.4) introduce several additional contexts in
which shortening can occur and yet the accompanying loss of precision is
potentially unexplained. We need to introduce some rules that explain the
possibility of loss of accuracy in those contexts where it is not currently
explained. (The primitive functions, being attributes, are not operations
to which 4.5.7 applies.) It seems likely that the latter will be specific
to the contexts involved, though we have not yet resolved how best to
accomplish that.
The accuracy requirements for floating-point arithmetic operations are, for
the time being, expressed separately from those for fixed-point operations,
since the latter do not need the full generality of the interval-based model
appropriate for floating-point operations (see 1.2). Nevertheless, the
rules may ultimately be recombined into a uniform set of rules for all real
types, purely for presentation purposes; if so, it would be made clear that
some of the freedoms permitted in the floating-point case do not apply in
the fixed-point case.
1.1.6. Floating-Point Type Declarations
The restatement of the ``equivalence rule'' for user-declared floating-point
type declarations, which explains how an implementation selects a predefined
floating-point type on which to base the representation of the declared
type, is a natural extension of its form in Ada 83 that accommodates the
changes described earlier. This rule is the basis for the intuitive (and
informal) observation that floating-point types provide for approximate
computations with real numbers so as to guarantee that the relative error of
an operation that yields a result of type T is bounded by 10.0 **
(-T'DIGITS). [Note: A more formally complete version of this observation
can be obtained from a theorem of [brown81].]
The Ada 9X analog of Ada 83's 4*B Rule exerts its effect during the 
application of the ``equivalence rule,'' by which an implementation 
implicitly selects a predefined type to represent a user-declared
floating-point type. Consider the declaration

type T is digits D [range L .. R]; 

Restated (see L.1.6), the Ada 9X equivalence rule says that this is 
equivalent to 

type floating_point_type is new P; 
subtype T is floating_point_type 
[range floating_point_type(L) .. floating_point_type(R)]; 

where floating_point_type is an anonymous type, and where P is a predefined 
floating-point type implicitly selected by the implementation so that it
satisfies the following requirements: 

- P'DIGITS >= D.

- If a range L .. R is specified, then P'MODEL_LARGE >= max(abs(L),
abs(R)); otherwise, P'MODEL_LARGE >= 10.0 ** (4*D).

The effect of the analog of Ada 83's 4*B Rule, known in Ada 9X as the 4*D 
Rule, is to ensure that a user-declared type without a range constraint
provides adequate range; in fact, in all existing implementations of Ada, it 
precludes a predefined type from being selected if and only if the type 
would be precluded from selection by Ada 83's 4*B Rule. To see this, note 
that Ada 83's equivalence rule can be stated (unconventionally) in the same 
form, except that the conditions that the predefined type P must satisfy are 
as follows: 

- P'DIGITS >= D.

- If a range L .. R is specified, then P'SAFE_LARGE >= max(abs(L),
abs(R)).

When the 4*D Rule precludes the selection of a type P in Ada 9X, it is 
necessarily the case that the value of P'DIGITS is penalized in Ada 83 by 
the 4*B Rule, and thus it is the first of the two conditions above that 
precludes the selection of P in Ada 83. The value of P'DIGITS is not 
penalized in Ada 9X. 

When a type declaration includes a range constraint whose bounds are 
sufficiently small, the Ada 9X equivalence rule potentially permits the 
selection of a predefined type precluded (e.g., by the first condition) in 
Ada 83. However, among current implementations this occurs in only one 
instance (see 1.1.7). The equivalence rule has been formulated in this way 
in Ada 9X to emphasize the role of the range constraint in expressing the 
minimum range needed for computations with the type. An alternative was 
considered, in which the conditions for the selection of a predefined type P 
can be stated as follows: 

- P'DIGITS >= D.

- P'MODEL_LARGE >= 10.0 ** (4*D).

- If a range L .. R is specified, then in addition P'MODEL_LARGE >=
max(abs(L), abs(R)).

This alternative would result in complete equivalence between the Ada 9X 
selections and those of Ada 83 for all user-declared floating-point types,
while still permitting hardware types that satisfy Ada 83's 4*B Rule only 
with a precision penalty to be supported without penalty, but such types 
could not always be selected when they provide adequate precision and range, 
relative to the precision and range requested. The alternative was rejected 
because it too severely restricts the utility of hardware types that can be 
supported as unpenalized predefined types in Ada 9X. 

The change in the interpretation of the named number SYSTEM.MAX_DIGITS is 
necessitated by the shift from the 4*B Rule to the new 4*D Rule. This
named number is typically used to declare an unconstrained floating-point
type with maximum precision. The change in its interpretation ensures that
such a use will have the same effect in Ada 9X, i.e., will result in the
selection of the same underlying representation as in Ada 83.
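
For example, the customary declaration

   with SYSTEM;
   ...
   type WORKING_REAL is digits SYSTEM.MAX_DIGITS;

continues to select, in Ada 9X as in Ada 83, the most precise underlying
representation eligible under the rule.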
One way in which our changes do not go quite as far as possible in
reflecting the actual properties of the machine has to do with the use of
T'MODEL_LARGE in describing when overflow can occur and when it cannot. On
radix-complement machines, the negative overflow threshold does not coincide
in magnitude with the positive overflow threshold, but this is not reflected
in T'MODEL_LARGE, which is conservative (that is, it characterizes the less
extreme threshold). While this is not of any particular consequence as far
as the rewrite of 4.5.7 goes (after all, there are many reasons why a value
exceeding T'MODEL_LARGE in magnitude might not overflow), it does interact
in an undesirable way with the equivalence rule for floating-point type
declarations. If a floating-point type declaration specifies a lower bound
exactly coinciding with the most negative floating-point number of some base
type P, as can happen when P'FIRST is used as the lower bound of the
requested range, then P will be ineligible as the representation of the type
being declared (on radixcomplement machines, and even when no other
arithmetic anomalies are present). This suggests that T'MODEL_LARGE ought
to be abandoned in favor of two attributes, say T'MODEL_FIRST and
T'MODEL_LAST, that characterize the positive and negative ``safe'' (i.e.,
overflow-free) limits separately. The way that T'MODEL_EMAX is defined
would have to change; presumably it could be the maximum of the exponents of
T'MODEL_FIRST and T'MODEL_LAST in the canonical form. A T'MODEL_FIRST and
T'MODEL_LAST could then be defined for all numeric types (for integer and
fixed-point types they would be equal to T'BASE'FIRST and T'BASE'LAST,
respectively), and they could be used in a type-independent statement of
when overflow can and cannot occur (allowing its removal from 4.5.7). This
is an attractive idea and may be explored in the future.
1.1.7. Compatibility Considerations 

In this section we analyze the impact of the potential sources of 
incompatibility resulting from the changes in the model of floating-point
arithmetic. We argue that actual incompatibilities will arise rarely in 
practice, and that strategies for minimizing their effect are available. 
Actual incompatibilities have been reduced since the previous version of the 
Mapping Specification by the inclusion of the 4*D Rule (see 1.1.6). 

The explicit use of model attributes of a floating-point type is rather rare 
and is usually restricted to expertly crafted numeric applications. Thus, 
the elimination of some of the model attributes, in favor of new attributes 
with somewhat different definitions and new names, will probably not be 
noticed by the vast majority of existing Ada programs. We have already 
recommended (see NM:FLOADMODATTR and 1.1.4) that vendors continue to support 
the obsolescent attributes as implementation-defined attributes, with their 
Ada 83 definitions, for the purpose of providing a smooth transition for 
those that are affected. Detected use of such attributes should evoke a 
warning message from the compiler, recommending that the references to 
obsolescent attributes be replaced by appropriate references to the new 
attributes, or by other appropriate expressions, when convenient. In most 
cases the substitution is expected to be straightforward, but some analysis 
will be required to ascertain this. At the least, by continuing to provide 
the obsolescent attributes as implementation-defined attributes, a vendor 
can provide continuity for programs affected by this change. 
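For instance, assuming the Ada 83 attribute T'SAFE_LARGE is among those so 
retained, a reference to it would typically be replaced by a reference to 
the new T'MODEL_LARGE (the named number is hypothetical): 

   GUARD : constant := LONG_FLOAT'SAFE_LARGE / 2.0;   -- Ada 83 (obsolescent) 
   GUARD : constant := LONG_FLOAT'MODEL_LARGE / 2.0;  -- Ada 9X replacement 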

A user-declared floating-point type declaration specifying an explicit range 
whose bounds are small in relation to the requested precision may select an 
underlying representation that, while providing the requested range, 
nevertheless provides less range than in Ada 83. (This is because the 
predefined type selected as the representation was required to satisfy the 
4*B Rule in Ada 83 but is not required to do so in Ada 9X.) As a 
consequence, overflow in the computation of an intermediate result may occur 
where it did not previously. However, among current implementations this 
occurs only in DEC VAX implementations, when the requested precision exceeds 
9, the requested range has relatively small bounds, and use of D-format, 
rather than G-format, for the LONG_FLOAT predefined type is explicitly 
enabled by the appropriate pragma. That is, in Ada 9X, D-format can be 
selected, whereas in Ada 83, D-format is precluded and H-format is selected. 
DEC VAX D-format is used only rarely and is being deemphasized in newer 
systems. DEC VAX compilers that are affected by this change can issue a 
warning message when D-format is selected in a situation in which H-format 
would have been selected in Ada 83. The message can indicate that removing 
the range constraint from the type declaration, and placing it instead on a 
subtype declaration (as illustrated below), will (necessarily) result in the 
same selection for the underlying representation as in Ada 83. 
Alternatively, the compiler can avoid selecting D-format, even though it is 
allowed to. The language continues to express no preference for the 
selection of an underlying representation when multiple representations are 
eligible. 
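A sketch of that rewriting, with hypothetical names and a precision for 
which D-format becomes newly eligible: 

   type DATA is digits 12 range -1.0E30 .. 1.0E30; 
   -- in Ada 9X, D-format (now 16 digits) may be selected for DATA 

   -- rewritten to force the Ada 83 selection: 
   type DATA_BASE is digits 12;   -- subject to the 4*D Rule, so H-format 
   subtype DATA is DATA_BASE range -1.0E30 .. 1.0E30; 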

Similar problems do not arise with IBM Extended Precision in the Alsys 
implementation for IBM 370. There is no larger type that is currently 
selected when the requested precision exceeds 18 decimal digits and the 
requested range has appropriately small bounds; thus, the selection of 
Extended Precision in such a case in Ada 9X represents a valid 
interpretation for what was previously an invalid program. 

In all other cases of which we are aware, the supported hardware types are 
such that a type providing the requested precision will always provide a 
range that satisfies Ada 83's 4*B Rule, resulting in no further 
incompatibilities. 

A user-declared floating-point type declaration not specifying an explicit 
range poses no compatibility problems, because the predefined type chosen to 
represent the declared type must satisfy the new 4*D Rule (see 1.1.6), which 
in this case provides for compatibility with Ada 83. Among current 
implementations, the 4*D Rule will exert an effect only in DEC VAX and Alsys 
IBM 370 implementations; in the former, it will preclude the selection of 
D-format when the Ada 83 4*B Rule would have precluded it, and force the 
selection of H-format instead, whereas in the latter, it will preclude the 
selection of Extended Precision when it would not have been selected in Ada 
83, and continue to make the program invalid. The 4*D Rule, newly added to 
the Mapping Specification (see L.1.6), should not, however, be viewed 
strictly as a concession to compatibility. Rather, it should properly be 
viewed as providing for uniformity among future implementations, by implying 
a minimum range in a context where no minimum range is specified. It 
coincidentally provides some additional compatibility when, without it, VAX 
D-format would be selectable. 
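For example, assuming (as in the rejected alternative stated earlier) that 
the 4*D Rule requires P'MODEL_LARGE >= 10.0 ** (4*D) for the selected 
predefined type P: 

   type T is digits 12; 
   -- requires a predefined type P with P'MODEL_LARGE >= 10.0 ** 48; 
   -- on a VAX this precludes D-format, whose largest value is near 
   -- 1.7E38, and forces the selection of H-format, as in Ada 83 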

When VAX D-format or IBM Extended Precision is selected in a context in 
which it would also have been selected in Ada 83 to represent a type T, the 
value of T'BASE'DIGITS, which was 9 for the former and 18 for the latter in 
Ada 83, will now be 16 or 32, respectively. If this attribute is used 
correctly, i.e., to tailor a computation to the actual precision provided by 
the underlying type, then the computation should adapt itself naturally to 
the new value. Nevertheless, in the few circumstances in which use of this 
attribute is detected by an affected compiler, a warning message can be 
issued. 
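A correct use of the attribute adapts automatically; for example (the names 
REAL, CORRECTION, and ESTIMATE are hypothetical): 

   -- convergence tolerance keyed to the precision actually provided 
   EPS : constant REAL := 10.0 ** (-REAL'BASE'DIGITS); 
   ... 
   exit when abs (CORRECTION) <= EPS * abs (ESTIMATE); 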

The very few remaining situations in which a different underlying 
representation is selected in Ada 9X (for example, the one illustrated in 
1.1.3) are considered true anomalies genuinely worth correcting. In any 
case, they have rather artificial characteristics and are thus extremely 
unlikely to occur in practice. 
1.2. Semantics of Fixed-Point Arithmetic
Various problems have been identified with fixed-point types in Ada 83:

 - They can be counterintuitive. The values of a fixed-point type are not
   always integer multiples of the declared delta (they are instead integer
   multiples of the small, which may be specified or defaulted, and which
   in either case need not be the same as, or even a submultiple of, the
   delta), and they do not always exhaust the declared range, even when the
   bounds of the declared range are integer multiples of the small (we are
   thinking of the case where a bound of the range is a power of two times
   the small); both surprises are illustrated in the sketch following this
   list. These surprises are responsible for some of the confusion with
   fixed-point types (although some programmers do understand and correctly
   exploit the fact that the high bound need not be representable).

 - The model used to define the accuracy requirements for operations of
   fixed-point types is much more complicated than it needs to be, and many
   of its freedoms have never, in fact, been exploited. The accuracy
   achieved by operations of fixed-point types in a given implementation is
   ultimately determined, in Ada 83, by the safe numbers of the type, just
   as for floating-point types, and indeed the safe numbers can, and in
   some implementations do, have more precision than the model numbers.
   However, the model in Ada 83 allows the values of a real type (either
   fixed or float) to have arbitrarily greater precision than the safe
   numbers, i.e., to lie between safe numbers on the real number axis;
   implementations of fixed point typically do not exploit this freedom.
   Thus, the opportunity to perturb an operand value within its operand
   interval, although allowed, does not arise in the case of fixed point,
   since the operands are safe numbers to begin with. In a similar way,
   the opportunity to select any result within the result interval is not
   exploited by current implementations, which we believe always produce a
   safe number; furthermore, in many cases (i.e., for some operations) the
   result interval contains just a single safe number anyway, given that
   the operands are safe numbers, and it ought to be more readily apparent
   that the result is exact in these cases.

 - Support for fixed-point types is spotty, due to the difficulty of
   dealing accurately with multiplications and divisions having
   ``incompatible smalls'' as well as fixed-point multiplications,
   divisions, and conversions yielding a result of an integer or
   floating-point type. Algorithms have been published in [Hi90], but
   these are somewhat complicated and do not quite cover all cases, leading
   to implementations that do not support representation clauses for SMALL
   and that, therefore, only support binary smalls.
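As a sketch of the surprises mentioned in the first item above (the types
are hypothetical):

   type F is delta 0.1 range 0.0 .. 1.0;
   -- F'SMALL defaults in Ada 83 to 0.0625, the largest power of two
   -- less than or equal to the delta; the values of F are multiples
   -- of 1/16, not of the declared delta 1/10

   type G is delta 0.25 range 0.0 .. 1.0;
   -- G'SMALL = 0.25, yet the bound 1.0, a power of two times the
   -- small, need not itself be a value of G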
These problems are partly the result of trying to make fixed-point types
serve several needs and several application areas, none of which is served
perfectly and all of which are compromised somewhat, as discussed below.

 - One of the intended applications for fixed-point types is sensor-based
   applications, where the representations of scaled physical quantities
   are transmitted over ports as binary integers. Digital signal
   processing is a related application area with a similar focus on
   manipulating scaled binary integers. These needs are met fairly well,
   because either no representation clauses for SMALL are used (the delta
   already being a power of two) or representation clauses for universally
   accepted values of SMALL are used.

 - Fixed-point types are intended, or at least they have been considered,
   for applications in the Information Systems area, i.e., to represent
   financial quantities that are typically integer multiples of decimal
   fractions of some monetary denomination. This need is not met well,
   since extra precision is generally intolerable in such applications,
   rounding needs to be controlled, and there is no guarantee that decimal
   scaling factors are supported (because they require the use of
   representation clauses; see the sketch following this list). Many
   fixed-point implementations limit ranges to the equivalent of about ten
   decimal digits, which is inadequate for some IS applications. In
   addition, IS applications often need multiple representations of decimal
   data, e.g., for computation versus display. The fixed-point model in
   Ada 83 is heavily biased towards an internal representation of
   fixed-point data as a binary integer, and this bias strongly affects the
   range of the base type. Specifying a range as, for example,
   -9_999_999.99 .. 9_999_999.99 is considered cumbersome; in any case, it
   does not guarantee protection against exceeding the range in
   computations of the base type.

 - Finally, fixed-point types are often embraced as a kind of cheap
   floating point, suitable on hardware lacking true floating point when
   the application manipulates values from a severely restricted range.
   This need may be met well, in the sense that efficient performance may
   be expected when the small is allowed to default and the user holds no
   expectations that multiples of the delta are exactly represented, but it
   has influenced the design of the facility too heavily, and it
   compromises the quality of what can be offered in the other application
   areas. It is not clear that this application of fixed point is much
   used or needed.
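The decimal scaling mentioned in the second item above requires a
representation clause for SMALL, as in this sketch (the type is
hypothetical):

   type MONEY is delta 0.01 range -1.0E7 .. 1.0E7;
   for MONEY'SMALL use 0.01;
   -- decimal scaling; not guaranteed to be supported in Ada 83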
Our solution to these problems is to remove some of the freedoms of the 
interval-based accuracy requirements that have never been exploited and to 
relax the accuracy requirements so as to encourage wider support for fixed 
point. Applications that use binary scaling and/or carefully matched
(``compatible'') scale factors in multiplications and divisions, which is
typical of sensor-based and other embedded applications, will see no loss of
accuracy or efficiency. It is not our intention to meet the special needs
of IS applications; they are addressed by the new decimal fixed-point types
defined in the Information Systems area of the Special Needs Annex (see
Section K), although undemanding applications in this area may
coincidentally be served marginally better by ordinary fixed-point types
than they were in Ada 83.
While the revamped fixed-point facility removes and relaxes requirements 
that have generally not been exploited, it does not go as far as we had 
hoped in substituting intuitive behavior for the surprises of the past. 
Version 4.1 of the Numerics Annex proposed to eliminate the concept of small 
as distinct from delta, making the values of a user-declared fixed-point 
type integer multiples of the declared delta. Although this proposal had 
significant support, it was judged by others to represent too radical a 
change and to produce too many incompatibilities. It would have caused 
programs using fixed-point types with a delta that is not a power of two and 
a default small to substitute different sets of values for those types and 
to perform scaling by multiplication and division instead of shifting. By 
retaining the concept of small as distinct from delta, as well as a default 
rule for small that is analogous to the Ada 83 rule, we have in the present 
version of the Numerics Annex avoided the need for any fixed-point type to 
change its behavior. 

The default small in Ada 9X is an implementation-defined power of two less 
than or equal to the delta, whereas in Ada 83 it was defined to be the 
largest power of two less than or equal to the delta. The purpose of this 
change is merely to allow implementations that previously used extra bits in 
the representation of a fixed-point type for increased precision rather than 
for increased range, giving the safe numbers more precision than the model 
numbers, to continue to do so. An implementation that does so must, 
however, accept the minor incompatibility represented by the fact that the 
type's default small will differ from its value in Ada 83. Implementations 
that used extra bits for extra range have no reason to change their default 
choice of small, even though Ada 9X allows them to do so. 
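A program that depends on the Ada 83 default can specify the small 
explicitly (the type is hypothetical; the clause is written here in its 
Ada 83 representation-clause form): 

   type F is delta 0.1 range -100.0 .. 100.0; 
   for F'SMALL use 2.0 ** (-4); 
   -- 1/16 is the largest power of two <= 0.1, i.e., the Ada 83 default, 
   -- making the type immune to an implementation-defined default 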

Note that our simplification of the accuracy requirements, i.e., expressing 
them directly in terms of certain sets of integer multiples of the result 
type's small rather than in terms of model or safe intervals, removes the 
need for some of the attributes of model and safe numbers of fixed-point 
types. To the extent that these attributes are used in Ada 83 programs, 
their elimination poses a potential incompatibility problem. As we did for 
floating-point types, we recommend that implementations continue to provide 
these attributes as implementation-defined attributes, with their Ada 83 
values, and that implementations produce warning messages upon detecting 
their use. 
We had hoped to go so far as to remove, in support of Requirement R2.2B(1),
the potential surprise when a range bound that is a power of two times the
small is not included within the range of a fixed-point type, by including
all the integer multiples of the small lying within the declared bounds in
the range of the type, arguing that declarations that change their meaning
as a result can be rewritten to achieve the desired effect. But we could
not argue that few programs would be affected by this change; in other
words, even though this property of Ada 83 has the potential for surprise
with some programs, it is used correctly by far more. The feature remains,
but it is reflected by a different mechanism now that the concepts of model
numbers and their mantissas have been dropped from the fixed-point
description.
Some of the accuracy requirements, i.e., those for the adding operators and
comparisons, now simply say that the result is exact. This was always the
case in Ada 83, assuming operands are always safe numbers there, and yet it
is not clear from the model-interval form of the accuracy requirements that
comparison of fixed-point quantities is, in practice, deterministic and need
not be otherwise. Other accuracy requirements are now expressed in terms of
small sets of allowable results, called ``perfect result sets'' or ``close
result sets'' depending on the amount of accuracy that it is practical to
require; these sets always contain consecutive integer multiples of the
result type's small (or of a ``virtual'' small of 1.0 in the case of
multiplication or division with an integer result type). In some cases, the
sets are seen to contain a single such multiple or a pair of consecutive
multiples; this clearly translates into a requirement that the result be
exact, if possible, but never off by more than one rounding error or
truncation error. The cases in which this occurs are the fixed-point
multiplications and divisions in which the operand and result smalls are
``compatible,'' meaning that the product or quotient of the operand smalls
(depending on whether the operation is a multiplication or a division) is
either an integer multiple of the result small, or vice versa. (These cases
cover much of the careful matching of types typically exhibited by
sensor-based and other embedded applications, which are intended to produce
exact results for multiplications and at-most-one-rounding-error results for
divisions, with no extra code for scaling; they can produce the same results
in Ada 9X, and with the same efficient implementation. Our definition of
``compatible'' is more general than required just to cover those cases of
careful matching of operand and result types, permitting some
multiplications that require scaling of the result by at worst a single
integer division, with an error no worse than one rounding error.)

For other cases (when the smalls are ``incompatible''), the accuracy
requirements are relaxed, in support of Requirement R2.2A(1); in fact, they
are left implementation defined. Implementations need not go so far as to
use the Hilfinger algorithms [Hi90], though they may of course do so. An
Ada 9X implementation could, for instance, perform all necessary scaling on
the result of a multiplication or division by a single integer
multiplication or division (or shifting). That is, the efficiency for the
cases of incompatible smalls need not be less than that for the cases of
compatible smalls. This relaxation of the requirements is intended to
encourage support for a wider range of smalls. Indeed, we considered making
support for all smalls mandatory, on the grounds that the relaxed
requirements removed all barriers to practical support for arbitrary
smalls, but we rejected that idea because it would make many existing
implementations instantly nonconforming.
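To illustrate the compatible case (the type declarations and values are
hypothetical):

   type VOLTS is delta 2.0 ** (-6)  range -64.0   .. 64.0;
   type AMPS  is delta 2.0 ** (-6)  range -64.0   .. 64.0;
   type WATTS is delta 2.0 ** (-12) range -4096.0 .. 4096.0;

   V : VOLTS;
   A : AMPS;
   W : WATTS;
   ...
   W := WATTS (V * A);
   -- the product of the operand smalls, 2**(-12), equals WATTS'SMALL;
   -- the smalls are compatible, and the result is exact, with no
   -- scaling code beyond the integer multiplication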
Ada 9X allows an operand of a fixed-point multiplication or division to be 
a real literal, named number, or attribute. Since the value V of such an 
operand can always be factored as an integer multiple of a compatible small, 
the operation must be performed with no more than one rounding error and 
will cost no more than one integer multiplication or division for scaling. 
(That V can always be factored in this way follows from the fact that it, 
and the smalls of the other operand and the result, are necessarily all 
rational quantities.) 
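For example (F and its objects are hypothetical): 

   SCALE : constant := 0.75;   -- a named number 
   X, Y  : F; 
   ... 
   Y := F (SCALE * X); 
   -- at most one rounding error, and at most one integer multiplication 
   -- or division for scaling 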
The accuracy requirements for fixed-point multiplication, division, and
conversion to a floating-point target are left implementation defined
because the implementation techniques described in [Hi90] rely on the
availability of several extra bits in typical floating-point representations
beyond those belonging to the Ada 83 safe numbers; with the revision of the
floating-point model, in particular the elimination of the quantization of
the mantissa lengths of model numbers, those bits are now likely gone.
Requiring model-number accuracy for these operations would demand
implementation techniques that are more exacting, expensive, and complicated
than those in [Hi90], or it would result in penalizing the mantissa length
of the model numbers of a floating-point type just to recover those bits for
this one relatively unimportant operation. With the accuracy requirements
for this case left implementation defined, an implementation may use the
simple techniques in [Hi90] for fixed-point multiplication, division, and
conversion to a floating-point target; the accuracy achieved will be exactly
as in Ada 83 but will simply not be categorizable as model-number accuracy.
We have abandoned an idea we first put forth in Version 4.0 of the Mapping
Specification, namely, that of allowing the radix of the representation of
an ordinary (i.e., non-decimal) fixed-point type to be specified as ten, by
a representation clause (an attribute definition clause in Ada 9X); the
current bias towards a radix of two would persist as the default. We
abandoned this idea because it benefits primarily IS applications, which are
now addressed by separate features. This feature would integrate well with
the rest of our proposal, were it to be restored. Its primary semantic
effect would be to exclude range bounds that are powers of ten from
necessarily being included within the bounds of the type. It would permit
bounds like 999_999_999.99 in the declaration of a type whose small is .01
to be written instead as 1_000_000_000.00 or even as 0.01E+12, which is
close to the ``digits 11'' shorthand provided by IS:DECIMAL_FIXED_POINT.
Since it would be of no particular benefit outside of IS applications, and
since it is only a minuscule part of the totality of additional support
required in the IS area, it is not worth adding to ordinary fixed point.
A suggestion for a new representation attribute, T'MACHINE_SATURATES, has
been made. Some digital signal processors do not signal a fault or wrap
around upon overflow but instead saturate at the most positive or most
negative value of the base type. It could be useful to detect and describe
that behavior by means of the suggested attribute. We leave this as a
subject for future exploration, perhaps in conjunction with similar
refinements of T'MACHINE_OVERFLOWS suggested (for floating-point types) by
IEEE arithmetic.
1.3. Elementary Functions
For a general rationale for the elementary functions, the reader is referred
to [GEF91]. These functions are critical to a wide variety of scientific
and engineering applications written in Ada. They have been widely provided
in the past as vendor extensions with no standardized interface and with no
guarantee of accuracy. These impediments to portability and to analysis of
programs are removed by their inclusion in the Numerics Area features of Ada
9X, in support of Requirement R11.1A(1).
The elementary functions are provided in Ada 9X by a new predefined generic
package, GENERIC_ELEMENTARY_FUNCTIONS, which is a very slight variation of
that proposed in ISO DIS 11430, ``Proposed Standard for a Generic Package of
Elementary Functions for Ada.'' The Ada 9X version capitalizes on a feature
of Ada 9X (use of T'BASE as a type mark in declarations) not available in
the environment (Ada 83) to which the DIS is targeted. The feature has been
used here to declare the formal parameter types and result types of the
elementary functions to be the base type of the generic formal type,
eliminating the possibility of range violations at the interface. The same
feature can be used for local variables in the body of
GENERIC_ELEMENTARY_FUNCTIONS (if it is programmed in Ada) to avoid spurious
exceptions caused by range violations on assignments to local variables of
the generic formal type. Thus, there is no longer a need to allow
implementations to impose the restriction that the generic actual type in an
instantiation must be a base type; implementations must allow a
range-constrained subtype as the generic actual type, and they must be
immune to the potential effects of the range constraint.
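In outline, the Ada 9X version has the following flavor (abridged, and only
a sketch; see the DIS for the actual set of subprograms):

   generic
      type FLOAT_TYPE is digits <>;
   package GENERIC_ELEMENTARY_FUNCTIONS is
      function SQRT (X : FLOAT_TYPE'BASE) return FLOAT_TYPE'BASE;
      function LOG  (X : FLOAT_TYPE'BASE) return FLOAT_TYPE'BASE;
      -- ... and so on
   end GENERIC_ELEMENTARY_FUNCTIONS;

   subtype PROBABILITY is FLOAT range 0.0 .. 1.0;
   package PF is new GENERIC_ELEMENTARY_FUNCTIONS (PROBABILITY);
   -- must be accepted in Ada 9X; the range constraint cannot cause
   -- spurious exceptions at the interface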
An implementation that accommodates signed zeros (i.e., one for which
FLOAT_TYPE'SIGNED_ZEROS is TRUE) is required to exploit them in several
important contexts, in particular the signs of the zero results from the
``odd'' functions SIN, TAN, and their inverses and hyperbolic analogs, at
the origin, and the sign of the half-cycle result from ARCTAN and ARCCOT;
this follows a recommendation, in [kahan87], that provides important
benefits for complex elementary functions built upon the real elementary
functions, and for applications in conformal mapping. Exploitation of
signed zeros at the many other places where elementary functions can return
zero results is left implementation defined, since no obvious guidelines
exist for these cases.
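For example, on an implementation for which FLOAT_TYPE'SIGNED_ZEROS is TRUE
(and assuming the evaluation of -0.0 yields a negative zero, and the
parameter profile of the DIS):

   NZ : constant FLOAT_TYPE'BASE := -0.0;
   Z1 : constant FLOAT_TYPE'BASE := SIN (NZ);
   -- yields -0.0 (SIN is odd)
   Z2 : constant FLOAT_TYPE'BASE := ARCTAN (Y => NZ, X => -1.0);
   -- yields an approximation of -PI rather than +PI; the sign of the
   -- zero selects the half-cycle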
1.4. Primitive Functions
For a general rationale for the primitive functions, the reader is referred
to [GPF91]. They are required for high-quality, portable, efficient
mathematical software such as is provided in libraries of special-function
routines, and some are of value even for more mundane uses, like I/O
conversions and software testing. The primitive functions are provided in
support of Requirement R11.1A(1).
The casting of the primitive functions as attributes, rather than as
functions in a generic package (e.g., GENERIC_PRIMITIVE_FUNCTIONS, as
defined for Ada 83 in ISO CD 11729, ``Proposed Standard for a Generic
Package of Primitive Functions for Ada''), befits their primitive nature and
allows them to be used as components of static expressions, when the
arguments are static. MAX and MIN are particularly useful in this regard,
since they are sometimes needed in expressions in numeric type declarations,
for example to ensure that a requested precision is limited to the maximum
allowed.
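For example, assuming (as proposed) that MIN and MAX are available for
integer types as well, and that a with clause for SYSTEM is present:

   type WORKING is
      digits INTEGER'MIN (2 * FLOAT'DIGITS, SYSTEM.MAX_DIGITS);
   -- requests twice the precision of FLOAT, but never more than the
   -- implementation can provide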
The functionality of SUCCESSOR and PREDECESSOR, from the proposed
GENERIC_PRIMITIVE_FUNCTIONS standard, is provided by extending the existing
attributes SUCC and PRED to floating-point types. Note that T'SUCC(0.0)
returns the smallest positive number, which is a denormalized number if
T'DENORM is TRUE and a normalized number if T'DENORM is FALSE; this is
equivalent to the ``fmin'' derived constant of LCAS [LCAS]. Most of the
other constants and operations of LCAS are provided either as primitive
functions or other attributes in Ada 9X; those that are absent can be
reliably defined in terms of existing attributes.
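For example:

   FMIN : constant T := T'SUCC (0.0);
   -- the smallest positive number of T; denormalized if T'DENORM is
   -- TRUE, normalized otherwise (the ``fmin'' of LCAS)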
The proposed separate standard for GENERIC_PRIMITIVE_FUNCTIONS stated that
the primitive functions accept and deliver machine numbers, which implies
that they never receive arguments in extended registers. Conceptually, that
requirement could be removed in Ada 9X, though we are by no means certain
that it is wise to do so, and we are still investigating the issue. If the
primitive functions always receive machine numbers, then, for example, the
result of T'EXPONENT(X) can be assumed to be in the range
T'MIN(T'EXPONENT(T'PRED(0.0)), T'EXPONENT(T'SUCC(0.0))) ..
T'MAX(T'EXPONENT(T'BASE'FIRST), T'EXPONENT(T'BASE'LAST))
and an integer type with that range can be declared to hold any value that
can be returned by T'EXPONENT(X). (These bounds accommodate the fact that
T'EXPONENT of a denormalized number returns a value less than
T'MACHINE_EMIN, and they also accommodate implementations that may use
radix-complement representation.) However, if we define the primitive
functions so that they must accept the range of arguments that they might
receive in extended registers, then we cannot bound the results of
T'EXPONENT(X) by properties of the implementation, since the range of
extended registers is nowhere reflected in such properties. In that case,
one would be advised to construct an integer type of widest available range
(SYSTEM.MIN_INT .. SYSTEM.MAX_INT) for the type of a variable used to hold
values delivered by the EXPONENT attribute.
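Under the machine-number assumption, such a type might be declared as
follows (a sketch that assumes these attribute references are static, and
that uses INTEGER'MIN and INTEGER'MAX for the integer-valued bounds):

   type T_EXPONENT is range
      INTEGER'MIN (T'EXPONENT (T'PRED (0.0)), T'EXPONENT (T'SUCC (0.0))) ..
      INTEGER'MAX (T'EXPONENT (T'BASE'FIRST), T'EXPONENT (T'BASE'LAST));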
If extended range and precision are allowed in the arguments of the
primitive functions, T'SUCC, T'PRED, and T'ADJACENT will, nevertheless,
deliver machine numbers of the type T.
One primitive function that will be allowed to receive an argument in an
extended register is T'MACHINE(X), an attribute that was not represented by
a function in GENERIC_PRIMITIVE_FUNCTIONS. This attribute exists
specifically to give the programmer a way to discard excess precision if the
implementation happens to be using it, and if the details of an algorithm
are sensitive to its use. It also has the side effect of guaranteeing that
a value outside the range T'BASE'FIRST .. T'BASE'LAST is not propagated.
The attribute is a no-op in implementations that do not use extended
registers. Its definition allows efficient implementations on
representative hardware. Thus, on IEEE hardware, it may be implemented
simply by storing an extended register into the shorter storage format of
the target type T; on implementations having types with extra precision but
not extra exponent range, it may be implemented by storing the high-order
part of a register pair into storage. Overflow may occur in the former case
but cannot occur in the latter; in both cases, values slightly outside the
range T'BASE'FIRST .. T'BASE'LAST can escape overflow by being rounded to an
endpoint of the range. (This actually happens on IEEE hardware.)
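A typical use is simply the following (X, Y, and SUM are hypothetical):

   SUM := T'MACHINE (X + Y);
   -- forces the sum to a machine number of T, discarding any excess
   -- precision held in an extended register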
The related primitive function T'MODEL(X) also accepts its argument in an
extended register and shortens the result to a machine number. In this
case, however, the loss of low-order digits is potentially more severe. The
result is guaranteed to be a model number within the range -T'MODEL_LARGE ..
T'MODEL_LARGE. This function returns its floating-point argument perturbed
to a nearby model number (if it is not already a model number) in the same
way that is allowed for operands and results of the predefined arithmetic
operations (see L.1.5), so it introduces no more error than what is already
allowed. By forcing a quantity to a nearby model number, it guarantees that
subsequent arithmetic operations and comparisons with the number will
experience no further perturbation and will therefore produce predictable
and consistent results. For example, suppose we have a situation like
   if X > 1.0 then
      ...
      -- several references to X
      ...
   end if;
in which X can be extremely close to 1.0. If X is in the first model
interval above 1.0, the semantics of floating-point arithmetic allow the
references to X inside the if statement to behave as if they had the value
1.0, seemingly contradicting the condition that allows entry there, and
multiple references could behave as if they yielded slightly different
values. If this is intolerable, then one can write
   Y := T'MODEL(X);
   if Y > 1.0 then
      ...
      -- several references to Y
      ...
   end if;
The value of Y can be no worse than some value already allowed for the
result of the operation that produced X. If the if statement is entered, we
are guaranteed that Y exceeds 1.0 and that all references to it yield the
same value. If X has a value slightly exceeding 1.0, the if statement might
not be entered, but that was also true in the earlier example.
In implementations in which the model numbers coincide with the machine
numbers, T'MODEL reduces to T'MACHINE, and if in that case extended
registers are not being used, both are no-ops.
1.5. Possible Future Additions
It has been suggested that Ada 9X should be positioned to compete better
with certain kinds of numeric applications written in Fortran 90, and even
in C, by adding at least rudimentary facilities for random number generation
and for complex arithmetic. Any such facilities that we may propose in the
near future, if the Ada community concurs that they are needed and should be
provided, will take the form of optional predefined packages or generic
packages. For random number generation, we would probably propose only a
uniform random number capability; perhaps the ability to have multiple
generators; subprograms for saving and setting the seed(s) and for
initializing the generator (or a generator) to a repeatable (but
implementation-dependent), or a random (time-dependent), state without the
use of seeds; and subprograms to fill a whole array with random numbers in
one call. Neither algorithms nor statistical tests would be prescribed.
For complex arithmetic, we would probably propose only a generic package
exporting a visible Cartesian complex type (whose real and imaginary parts
are parameterized by an imported floating-point type), the appropriate
arithmetic operators for the complex type, and a small set of complex
elementary functions (e.g., those in Fortran 90). This is a small subset of
the capabilities on which the SIGAda Numerics Working Group has been working
for several years. It is unlikely that the proposal would include accuracy
requirements or requirements for freedom from spurious exceptions (those
that might be produced by relatively ``naive'' implementations when an
intermediate result overflows but the components of the final result do
not), since practical requirements in these areas have not yet been
determined and agreed upon by researchers.
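Purely as an illustration of the flavor such a generic package might have
(all names are hypothetical and represent no commitment):

   generic
      type REAL is digits <>;
   package GENERIC_COMPLEX_TYPES is
      type COMPLEX is record
         RE, IM : REAL'BASE;
      end record;
      function "+" (LEFT, RIGHT : COMPLEX) return COMPLEX;
      function "*" (LEFT, RIGHT : COMPLEX) return COMPLEX;
      -- ... the other operators, and a few complex elementary
      -- functions such as SQRT, EXP, and LOG
   end GENERIC_COMPLEX_TYPES;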
Table of Contents
1. Numerics Annex (Rationale)
1.1. Semantics of Floating-Point Arithmetic
1.1.1. Floating-Point Machine Numbers
1.1.2. Attributes of Floating-Point Machine Numbers
1.1.3. Floating-Point Model Numbers
1.1.4. Attributes of Floating-Point Model Numbers
1.1.5. Accuracy of Floating-Point Operations
1.1.6. Floating-Point Type Declarations
1.1.7. Compatibility Considerations
1.2. Semantics of Fixed-Point Arithmetic
1.3. Elementary Functions
1.4. Primitive Functions
1.5. Possible Future Additions