Ada 9X LSN045MRT
Numerics Annex (Specification), Vers. 4.7
June 1992
K W Dritz
Argonne National Laboratory
Argonne, IL 60439
Email: dritz@mcs.anl.gov
Attached below is my revision of the Numerics Annex (specification) following
the last DRs' meeting. The changes have mostly to do with reducing the
incompatibilities perceived to stem from the proposed dropping of the 4B Rule;
a version of it previously discussed in the Vers. 4.1 Numerics Annex
(rationale) has now been incorporated. The other major change has to do with
the restoration of the concept of _small_ for fixed-point types, with an
appropriate default rule. The major changes are marked in the margin with
change bars. The rationale, which I will send separately, has a major
discussion of the issues surrounding the incorporation of a version of the 4B
Rule (which, by the way, should now be called the 4D Rule).
Ken
===============================================================================
L. Numerics Annex (Specification)
The semantic models of floating-point and fixed-point arithmetic, a generic
package of elementary functions, and a collection of attributes comprising
``primitive'' floating-point manipulation functions are presented in this
annex. The ultimate placement of the features described here is still
undecided. At this writing, it is thought that the generic package of
elementary functions will likely be included with other required packages in
a chapter of the core devoted to an ``Ada Standard Library,'' and the
primitive function attributes will likely also be in the core and required
of all implementations. It is intended that the models of floating-point
and fixed-point arithmetic also be supported by all implementations, and
their placement in an annex is purely for presentation purposes. Little of 
the original intent of the Numerics Annex, as a repository for optional 
features that need be supported only by implementations serving the special 
needs of numeric applications, remains. 
L.1. Semantics of Floating-Point Arithmetic
L.1.1. Floating-Point Machine Numbers
Associated with each floating-point type is a finite set of machine numbers.
The machine numbers of a type are those capable of being represented, to
full accuracy, in the storage representation of the type. The machine
numbers of a derived type are those of the parent type; the machine numbers
of a subtype are those of the base type.
L.1.2. Attributes of Floating-Point Machine Numbers
Attributes related to the machine numbers of a floating-point type T (i.e.,
to its storage representation) are defined in this section.
T'MACHINE_RADIX yields the radix of the hardware representation of
T. T'MACHINE_MANTISSA yields the largest integer value of p, and
T'MACHINE_EMIN and T'MACHINE_EMAX respectively the most negative and most
positive integer values of exponent, such that every number expressible in
the ``canonical form''
    sign * mantissa * (radix ** exponent)
where
  - sign = +1 or -1;
  - radix = T'MACHINE_RADIX; and
  - mantissa is a p-digit fraction in the number base radix, the first
    digit of which is nonzero
is a machine number of T, i.e., representable to full accuracy in the
storage representation of T. If, in addition, every number expressible in
the canonical form, but where
  - sign = +1 or -1;
  - radix = T'MACHINE_RADIX;
  - exponent = T'MACHINE_EMIN; and
  - mantissa is a T'MACHINE_MANTISSA-digit nonzero fraction in the
    number base radix, the first digit of which is zero
is also a machine number of T (called in IEEE Std. 754 a denormalized
number), then the attribute T'DENORM yields the value TRUE; otherwise, it
yields FALSE. T'DENORM is a new representation attribute of the type T. An
implementation (i.e., one employing ``radix-complement'' representation) may
furthermore include T'MACHINE_RADIX ** T'MACHINE_EMAX and possibly
T'MACHINE_RADIX ** (T'MACHINE_EMIN - 2) in the set of machine numbers of T;
if so, they must be documented in Appendix F. Of course, zero is also a
machine number of T.
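The canonical form can be illustrated with a short Python sketch. It assumes (this is not part of the annex) that Python's float is an IEEE double, so that sys.float_info.radix, mant_dig, min_exp and max_exp play the roles of T'MACHINE_RADIX, T'MACHINE_MANTISSA, T'MACHINE_EMIN and T'MACHINE_EMAX; the helper name canonical_form is hypothetical.

```python
import math
import sys

def canonical_form(x):
    """Split a nonzero float into (sign, mantissa, exponent) such that
    x == sign * mantissa * 2**exponent and 1/2 <= mantissa < 1."""
    mantissa, exponent = math.frexp(abs(x))   # frexp always works in radix 2
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    return sign, mantissa, exponent

radix = sys.float_info.radix      # assumed analogue of T'MACHINE_RADIX
p = sys.float_info.mant_dig       # assumed analogue of T'MACHINE_MANTISSA
emin = sys.float_info.min_exp     # assumed analogue of T'MACHINE_EMIN

sign, mantissa, exponent = canonical_form(-6.5)

# The mantissa is a p-digit fraction: scaling by radix**p yields an integer.
is_p_digit = (mantissa * radix ** p).is_integer()

# The smallest normalized number has mantissa 1/radix and exponent emin.
smallest_normalized = math.ldexp(0.5, emin)

# Half of it is still representable and nonzero, i.e. a denormalized number
# exists, which corresponds to T'DENORM being TRUE.
has_denorm = 0.0 < smallest_normalized / 2 < smallest_normalized
```

On a platform without denormalized numbers the last expression would evaluate to False, matching T'DENORM = FALSE.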
An implementation may have two distinct representations for floating-point
zeros, with positive and negative sign respectively, having the properties
given in IEEE Std. 754 or 854. The attribute T'SIGNED_ZEROS yields TRUE in
this case, and FALSE otherwise. T'SIGNED_ZEROS is a new representation
attribute of the type T.
Note: Even if T'SIGNED_ZEROS is TRUE, the predefined equality operator
yields TRUE given two operands of zero; this is a consequence of the IEEE
standards cited above. Some of the elementary and primitive functions (see
L.3 and L.4, respectively) yield results, given operands of zero, that
depend on the value of T'SIGNED_ZEROS.
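The behavior described in the note can be observed in any IEEE environment. The following Python sketch (Python is used only for illustration and is assumed to run on IEEE doubles) shows that the two zeros compare equal while their signs remain distinguishable.

```python
import math

pos_zero = 0.0
neg_zero = -0.0

# Equality does not distinguish the two zeros ...
zeros_compare_equal = (pos_zero == neg_zero)

# ... but the sign is still observable through a sign-sensitive operation.
sign_of_pos = math.copysign(1.0, pos_zero)
sign_of_neg = math.copysign(1.0, neg_zero)
```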
The representation attributes T'MACHINE_ROUNDS and T'MACHINE_OVERFLOWS are
retained. The meaning of T'MACHINE_OVERFLOWS is clarified (see L.1.5).
The attributes T'MACHINE_RADIX, T'MACHINE_MANTISSA, T'MACHINE_EMIN, and
T'MACHINE_EMAX return results of the type universal_integer. The attributes
T'SIGNED_ZEROS and T'DENORM return results of type BOOLEAN.
L.1.3. Floating-Point Model Numbers
Associated with each floating-point type is an infinite set of model
numbers. The model numbers of a type are used to define the accuracy
requirements that must be satisfied by certain predefined operations of the
type (see L.1.5); through certain attributes of the model numbers, they are
also used to explain the meaning of a user-declared floating-point type
declaration (see L.1.6). The model numbers of a derived type are those of
the parent type; the model numbers of a subtype are those of the base type.
The model numbers of a floating-point type T are zero and all the numbers
expressible in the canonical form, where
  - sign = +1 or -1;
  - radix = T'MACHINE_RADIX;
  - exponent is an integer >= T'MODEL_EMIN; and
  - mantissa is a T'MODEL_MANTISSA-digit fraction in the number base
    radix, the first digit of which is nonzero.
L.1.4. Attributes of Floating-Point Model Numbers
Attributes related to the model numbers of a floating-point type T are
defined as follows. The attributes T'MODEL_MANTISSA and T'MODEL_EMIN used
to define the model numbers, and the attribute T'MODEL_EMAX, are determined
by the accuracy delivered by certain predefined operations of the type T and
by their ability to avoid overflow. More precisely, T'MODEL_MANTISSA,
T'MODEL_EMIN, and T'MODEL_EMAX yield, respectively, the largest integer <=
T'MACHINE_MANTISSA, the most negative integer >= T'MACHINE_EMIN, and the
most positive integer <= T'MACHINE_EMAX such that certain predefined
operations of the type T satisfy the accuracy requirements given in L.1.5,
expressed in terms of the model numbers of the type T and in terms of the
attribute T'MODEL_LARGE, which is defined as follows:
    T'MODEL_LARGE = T'MACHINE_RADIX ** T'MODEL_EMAX *
                    (1.0 - T'MACHINE_RADIX ** (-T'MODEL_MANTISSA))
Two additional attributes of the model numbers are defined for convenience,
as follows:
  - T'MODEL_EPSILON = T'MACHINE_RADIX ** (1 - T'MODEL_MANTISSA). This
    attribute gives the absolute value of the difference between the
    model number 1.0 and the next higher model number of the type T.
  - T'MODEL_SMALL = T'MACHINE_RADIX ** (T'MODEL_EMIN - 1). This
    attribute gives the value of the smallest positive (nonzero) model
    number of the type T.
The attributes T'MODEL_LARGE, T'MODEL_SMALL, and T'MODEL_EPSILON return
results of the type universal_real. The attributes T'MODEL_MANTISSA,
T'MODEL_EMAX, and T'MODEL_EMIN return results of the type universal_integer.
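As a worked check of the three formulas, the Python sketch below evaluates them exactly with rational arithmetic. It assumes a hypothetical implementation whose model parameters coincide with the machine parameters of an IEEE double (radix 2, mantissa 53, minimum exponent -1021, maximum exponent 1024); nothing in the annex requires that coincidence.

```python
import sys
from fractions import Fraction

def model_attributes(radix, mantissa, emin, emax):
    """Return (MODEL_LARGE, MODEL_EPSILON, MODEL_SMALL) as exact rationals."""
    r = Fraction(radix)
    large = r ** emax * (1 - r ** (-mantissa))
    epsilon = r ** (1 - mantissa)
    small = r ** (emin - 1)
    return large, epsilon, small

# Assumption: model parameters equal the machine parameters of the host double.
fi = sys.float_info
large, epsilon, small = model_attributes(fi.radix, fi.mant_dig,
                                         fi.min_exp, fi.max_exp)
```

Under that assumption the three values coincide with the largest finite double, the gap above 1.0, and the smallest normalized double, respectively.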
For a user-declared floating-point type T, T'DIGITS returns the precision
specified in the floating_accuracy_definition of T; the same value is 
returned for any type derived from T or any subtype of T. (In Ada 9X, a 
floating_accuracy_definition is not allowed in a subtype declaration.) For 
a predefined type P, the value of P'DIGITS is the largest value of D for 
which ceiling(D * log(10)/log(P'MACHINE_RADIX) + 1) <= P'MODEL_MANTISSA. 
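The rule for P'DIGITS of a predefined type can be evaluated mechanically. The Python sketch below searches for the largest D satisfying the inequality; the function name predefined_digits and the sample parameters (binary formats with 24-digit and 53-digit mantissas) are illustrative assumptions.

```python
import math

def predefined_digits(machine_radix, model_mantissa):
    """Largest D with ceiling(D*log(10)/log(radix) + 1) <= model_mantissa."""
    d = 0
    while (math.ceil((d + 1) * math.log(10) / math.log(machine_radix) + 1)
           <= model_mantissa):
        d += 1
    return d

digits_for_24_bits = predefined_digits(2, 24)
digits_for_53_bits = predefined_digits(2, 53)
```

For a 24-digit binary mantissa the search stops at 6, and for a 53-digit binary mantissa at 15.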

The Ada 83 attributes T'MANTISSA, T'EMAX, T'LARGE, T'SMALL, T'EPSILON, 
T'SAFE_EMAX, T'SAFE_LARGE, and T'SAFE_SMALL are removed from the language, 
but for purposes of upward compatibility implementations are encouraged to 
retain them as implementation-defined attributes with the same values they
had in Ada 83. 
L.1.5. Accuracy of Floating-Point Operations
The accuracy requirements for the evaluation of certain predefined
operations of floating-point types are stated as follows.
Note: We present here a tentative version of the entire rewrite of RM 4.5.7
anticipated for Ada 9X. This section does not cover the accuracy of an 
operation of a static expression that involves only the operators of the 
root numeric types; such operations must be evaluated exactly (see 4.9). 
(Operators of the root_real type behave in other contexts like operators of 
a floating-point type whose model numbers have a precision and maximum
exponent at least as great as, and a minimum exponent at least as small as,
those of any other floating-point type declared in STANDARD with DIGITS
equal to SYSTEM.MAX_DIGITS; see 3.5.6.) It also does not cover the accuracy
of the predefined attributes of a floating-point subtype that yield a value
of the type; such operations also yield exact results (see L.4 and
elsewhere). Finally, it should be noted that values outside the range
T'FIRST .. T'LAST can be assigned to variables, passed to parameters, and
returned from functions whose type T is a numeric base type (because range
checking is no longer performed in those contexts when the type of the
variable, formal parameter, or function is a numeric base type), and that
fetching, in any context, the value denoted by a name or function_call whose
type T is a numeric base type can, but need not, raise CONSTRAINT_ERROR when
the value is outside the range T'FIRST .. T'LAST; thus no special provision
is made in this section for the possible raising of CONSTRAINT_ERROR when
the value denoted by a name or a function_call is used as the operand of a
predefined operation.
A model interval of a floating-point type is any interval whose bounds are
model numbers of the type. The model interval of a type T associated with a
value V is the smallest model interval of T that includes V. (The model
interval associated with a model number of a type consists of that number
only.) An operand interval is the model interval, of the type specified for
the operand of an operation, associated with the value of the operand. If
the absolute value of either bound of a model interval of T exceeds
T'MODEL_LARGE, the model interval is said to be out of bounds; otherwise, it
is said to be in bounds.
For any predefined arithmetic operation that yields a result of a
floating-point type T, the required bounds on the result are given by a
model interval of T (called the ``result interval'') defined in terms of the
operand values as follows:
The result interval is the smallest model interval of T that
includes the minimum and the maximum of all the values obtained by
applying the (exact) mathematical operation to values arbitrarily
selected from the respective operand intervals.
The result interval of an exponentiation is obtained by applying the above
rule to the sequence of multiplications defined by the exponent, assuming
arbitrary association of the factors, and to the final division in the case
of a negative exponent.
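The notions of model interval and result interval can be made concrete with exact rational arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, the toy type (radix 2, 4-digit mantissa, MODEL_EMIN = -10) is invented for the example, and only addition is treated, for which the extremes over the operand intervals are attained at the interval bounds.

```python
import math
from fractions import Fraction

def model_interval(v, radix, mantissa, emin):
    """Smallest interval bounded by model numbers that contains v."""
    v = Fraction(v)
    if v == 0:
        return Fraction(0), Fraction(0)
    if v < 0:                               # model numbers are symmetric
        lo, hi = model_interval(-v, radix, mantissa, emin)
        return -hi, -lo
    r = Fraction(radix)
    smallest = r ** (emin - 1)              # smallest positive model number
    if v < smallest:                        # no model numbers in (0, smallest)
        return Fraction(0), smallest
    e = emin
    while r ** e <= v:                      # find e with r**(e-1) <= v < r**e
        e += 1
    spacing = r ** (e - mantissa)           # gap between model numbers near v
    return math.floor(v / spacing) * spacing, math.ceil(v / spacing) * spacing

def add_result_interval(x, y, radix, mantissa, emin):
    """Result interval for x + y, built from the operand intervals."""
    x_lo, x_hi = model_interval(x, radix, mantissa, emin)
    y_lo, y_hi = model_interval(y, radix, mantissa, emin)
    lo, _ = model_interval(x_lo + y_lo, radix, mantissa, emin)
    _, hi = model_interval(x_hi + y_hi, radix, mantissa, emin)
    return lo, hi

# Toy type (hypothetical): radix 2, 4-digit mantissa, MODEL_EMIN = -10.
x_interval = model_interval(Fraction(1, 10), 2, 4, -10)
sum_interval = add_result_interval(Fraction(1, 10), Fraction(1, 5), 2, 4, -10)
```

In the toy type, the value 1/10 is not a model number, so its operand interval is 3/32 .. 13/128, and the result interval for 1/10 + 1/5 widens to 9/32 .. 5/16.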
The result interval of a conversion of a numeric value to a floating-point
type T is the model interval of T associated with the operand value, except
when the source expression has a fixed-point type or is a fixed-point
multiplication or division; in these cases, the result interval is
implementation defined. Note: A conversion to a constrained subtype of a
type is a conversion to the type followed by a check that the result of the
conversion belongs to the subtype, as in Ada 83.
For any of the foregoing operations, the implementation must deliver a value
that belongs to the result interval when the result interval is in bounds;
otherwise (i.e., when the result interval is out of bounds),
  - if T'MACHINE_OVERFLOWS is TRUE, the implementation must either
    deliver a value that belongs to the result interval or raise
    CONSTRAINT_ERROR;
  - if T'MACHINE_OVERFLOWS is FALSE, the result is implementation
    defined.
For any predefined relation on operands of a floating-point type T, the
implementation may deliver any value (i.e., either TRUE or FALSE) obtained
by applying the (exact) mathematical comparison to values arbitrarily chosen
from the respective operand intervals.
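The freedom granted for relations can be sketched as follows. The Python function below, whose name is hypothetical, takes two operand intervals as (low, high) pairs and returns the set of results that the rule permits for the relation X < Y.

```python
def permitted_less_than(x_interval, y_interval):
    """Set of results an implementation may deliver for X < Y, given the
    operand intervals as (low, high) pairs."""
    x_lo, x_hi = x_interval
    y_lo, y_hi = y_interval
    results = set()
    if x_lo < y_hi:       # some choice of values makes X < Y true
        results.add(True)
    if x_hi >= y_lo:      # some choice of values makes X < Y false
        results.add(False)
    return results

disjoint = permitted_less_than((1, 1), (2, 2))
overlapping = permitted_less_than((1, 3), (2, 4))
```

When the operand intervals are disjoint the result is determined; when they overlap either result is permitted.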
The result of a membership test is defined in terms of comparisons of the
operand value with the lower and upper bounds of the given range or type
mark (the usual rules apply to these comparisons).
L.1.6. Floating-Point Type Declarations
A floating-point type declaration of one of the two forms (that is, with or
without the optional range constraint indicated by the square brackets):
    type T is digits D [range L .. R];
is, by definition, equivalent to the following declarations:
    type floating_point_type is new P;
    subtype T is floating_point_type
        [range floating_point_type(L) .. floating_point_type(R)];
where floating_point_type is an anonymous type, and where P is a predefined
floating-point type implicitly selected by the implementation so that it
satisfies the following requirements:
  - P'DIGITS >= D.

  - If a range L .. R is specified, then P'MODEL_LARGE >= max(abs(L),
    abs(R)); otherwise, P'MODEL_LARGE >= 10.0 ** (4*D).
The floating-point type declaration is illegal if none of the predefined
floating-point types available for implicit selection as a parent type in a
floating-point type definition satisfies these requirements. Note:
Implementations may provide other predefined numeric types that are not
available for implicit selection in a numeric type definition.
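The selection of the parent type P can be sketched in Python as below. The table of predefined types is hypothetical (it assumes IEEE single and double formats whose model numbers use the full machine precision), and returning the first qualifying type is only one possible policy; the annex leaves the choice among qualifying types to the implementation.

```python
from fractions import Fraction

# Hypothetical predefined types: name -> (DIGITS, MODEL_LARGE).
PREDEFINED = {
    "FLOAT": (6, Fraction(2) ** 128 * (1 - Fraction(1, 2 ** 24))),
    "LONG_FLOAT": (15, Fraction(2) ** 1024 * (1 - Fraction(1, 2 ** 53))),
}

def select_parent(d, bounds=None):
    """Return the first predefined type meeting both requirements, or None
    when the declaration would be illegal."""
    if bounds is None:
        needed = Fraction(10) ** (4 * d)          # the 4D requirement
    else:
        needed = max(abs(Fraction(bounds[0])), abs(Fraction(bounds[1])))
    for name, (digits, model_large) in PREDEFINED.items():
        if digits >= d and model_large >= needed:
            return name
    return None

parent_for_digits_6 = select_parent(6)
parent_for_digits_7 = select_parent(7)
parent_for_wide_range = select_parent(6, (-1, Fraction(10) ** 50))
parent_for_digits_16 = select_parent(16)
```

With these assumed types, a request for 16 digits has no qualifying parent, so the corresponding declaration would be illegal.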
The definition of the named number SYSTEM.MAX_DIGITS is changed slightly in 
Ada 9X. It now gives the maximum precision that can be requested in the 
declaration of an unconstrained floating-point type. Implementations may
allow types with higher precisions to be declared, provided that their 
declarations include range constraints. 
L.2. Semantics of Fixed-Point Arithmetic
The language features for, and especially the model of, fixed-point
arithmetic are simplified to facilitate their use and to foster wider
implementation of the features. The concept of model numbers no longer
applies to fixed-point types.

A special kind of fixed-point type, called a decimal fixed-point type, or
simply a decimal type, is introduced by the Information Systems Annex (see
IS:DECIMAL_FIXED_POINT). Throughout this section, unqualified references to
fixed-point types apply to all fixed-point types, whether decimal or not.
Fixed-point types that are not decimal types are referred to, when
necessary, as ``ordinary fixed-point types.''



L.2.1. Values and Attributes of Fixed-Point Types

The values of a fixed-point type are an infinite set of numbers, which are
the integer multiples of the type's small. The values of a type derived
from a fixed-point type are those of the parent type; the values of a
subtype of a fixed-point type are those of the base type that satisfy the
subtype's range constraint. A fixed_accuracy_definition is no longer
allowed in a subtype declaration.

For a fixed-point type T, T'MACHINE_RADIX (which was allowed only for
floating-point types in Ada 83) yields the radix of the hardware
representation of T. For ordinary fixed-point types, this attribute always
yields 2. For decimal types, it yields the value (which may be either 2 or
10) specified for the type in an attribute definition clause for
MACHINE_RADIX; it is implementation defined, but restricted to the same
choices, in the absence of such a clause (see IS:INTERNAL_DECIMAL_REP). (An
attribute definition clause for MACHINE_RADIX is not allowed for ordinary
fixed-point types.) The Ada 83 attributes T'MACHINE_ROUNDS and
T'MACHINE_OVERFLOWS are retained; the meaning of the latter is clarified 
(see L.2.3). T'FORE and T'AFT are also retained. 

T'SMALL yields the absolute value of the difference between consecutive 
values of the type T; that is, it yields the value of the small of the type. 
If not specified in an attribute definition clause for SMALL, an ordinary 
fixed-point type's small is, by default, an implementation-defined power of
two less than or equal to its delta. The small of a user-declared ordinary
fixed-point type may be specified explicitly in an attribute definition
clause; the value given must be less than or equal to the type's delta. The 
small of a decimal type (see IS:DECIMAL_FIXED_POINT) is always the same as 
its delta and is not explicitly specifiable. Implementations are required 
to support binary smalls (smalls that are powers of two); implementations 
claiming conformance to the Information Systems Annex (see K) are, in 
addition, required to support decimal smalls (smalls that are powers of 
ten). Implementations are allowed, but not required, to support other 
smalls. 
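One permissible default for the small of an ordinary fixed-point type is the largest power of two that does not exceed the delta. The Python sketch below computes that particular choice exactly; it is only an example, since the text permits any implementation-defined power of two less than or equal to the delta, and the function name default_small is hypothetical.

```python
from fractions import Fraction

def default_small(delta):
    """Largest power of two that does not exceed the (positive) delta."""
    delta = Fraction(delta)
    small = Fraction(1)
    while small > delta:          # shrink until small <= delta
        small /= 2
    while small * 2 <= delta:     # grow while the next power still fits
        small *= 2
    return small

small_for_hundredth = default_small(Fraction(1, 100))
small_for_eighth = default_small(Fraction(1, 8))
```

For a delta of 0.01 this choice yields 1/128, and a delta that is already a power of two is returned unchanged.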

For an arbitrary fixed-point subtype T, T'SMALL = T'BASE'SMALL.

For a user-declared fixed-point type T, T'DELTA returns the delta specified
in the fixed_accuracy_definition of T; the same value is returned for any
type derived from T and for any subtype of T. For a predefined fixed-point
type P, the value of P'DELTA is the same as the value of P'SMALL. 

The Ada 83 attributes T'MANTISSA, T'LARGE, T'SAFE_LARGE, and T'SAFE_SMALL 
are removed from the language, but for purposes of upward compatibility 
implementations are encouraged to retain them as implementation-defined
attributes with the same values they had in Ada 83. 
L.2.2. Fixed-Point Type Declarations
An ordinary fixed-point type declaration
    type T is delta D range L .. R;
    [for T'SMALL use S;]
where S (if specified) is less than or equal to D is, by definition,
equivalent to the following declarations:
    type fixed_point_type is new P;
    subtype T is fixed_point_type
        range fixed_point_type(L) .. fixed_point_type(R);
where fixed_point_type is an anonymous type, and where P is a predefined
fixed-point type implicitly selected by the implementation so that it
satisfies the following requirements:
  - if S is specified, then P'SMALL = S; otherwise, P'SMALL is an
    implementation-defined power of two less than or equal to D;

  - if abs(R) is a power of two times P'SMALL, P'LAST >= R - P'SMALL;
    otherwise, P'LAST >= R;

  - if abs(L) is a power of two times P'SMALL, P'FIRST <= L + P'SMALL;
    otherwise, P'FIRST <= L.
The fixed-point type declaration is illegal if none of the predefined
fixed-point types available for implicit selection as a parent type in a
fixed-point type definition satisfies these requirements. Note:
Implementations may provide other predefined numeric types that are not
available for implicit selection in a numeric type definition.
The range of the subtype T declared by the preceding fixed-point type
declaration is determined as follows:
  - T'LAST = min(P'LAST, R);
  - T'FIRST = max(P'FIRST, L).
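The range requirements on the parent type and the resulting subtype bounds can be checked with a short Python sketch. The function names and the example parent (a hypothetical 16-bit type with a small of 2**(-15) and range -1.0 .. 1.0 - 2**(-15)) are assumptions made for illustration only.

```python
from fractions import Fraction

def is_power_of_two(q):
    """True when the positive rational q equals 2**k for some integer k."""
    n, d = q.numerator, q.denominator
    return n > 0 and (n & (n - 1)) == 0 and (d & (d - 1)) == 0

def parent_is_suitable(p_first, p_last, p_small, lower, upper):
    """Check the two range requirements on a candidate parent type P."""
    need_last = upper - p_small if is_power_of_two(abs(upper) / p_small) else upper
    need_first = lower + p_small if is_power_of_two(abs(lower) / p_small) else lower
    return p_last >= need_last and p_first <= need_first

def declared_range(p_first, p_last, lower, upper):
    """Return (T'FIRST, T'LAST) of the declared subtype."""
    return max(p_first, lower), min(p_last, upper)

# Hypothetical 16-bit parent type.
small = Fraction(1, 2 ** 15)
p_first, p_last = Fraction(-1), 1 - small

fits = parent_is_suitable(p_first, p_last, small, Fraction(-1), Fraction(1))
bounds = declared_range(p_first, p_last, Fraction(-1), Fraction(1))
too_wide = parent_is_suitable(p_first, p_last, small, Fraction(-1), Fraction(2))
```

The example shows why the power-of-two allowance matters: a declared range of -1.0 .. 1.0 is accepted by the 16-bit parent even though 1.0 itself is not a value of the parent, and T'LAST then becomes 1.0 - 2**(-15).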
The rules for the selection of the underlying predefined type used to 
represent a user-declared decimal type T1 are deducible from those applying
to a particular ordinary fixed-point type T2 related to T1 (see
IS:DECIMAL_FIXED_POINT).
With the elimination of the model numbers for fixed-point types, the
definition of the named number SYSTEM.MAX_MANTISSA must be revised slightly.
Informally, this measure is related to the maximum ``normalized'' magnitude
of any value of a fixed-point type or subtype (more precisely, to the number
of bits required to hold the maximum normalized magnitude). An appropriate 
definition is the maximum value of 

ceiling(log2(max(abs(T'LAST), abs(T'FIRST)) / T'SMALL)) 

for any ordinary fixed-point type T. Also, the definition of the named
number SYSTEM.FINE_DELTA is amended slightly to clarify that it applies only
to ordinary fixed-point types.
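The quantity whose maximum defines SYSTEM.MAX_MANTISSA can be computed exactly for a given type. The Python sketch below does so with rational arithmetic; the function name mantissa_bits is hypothetical, and the sketch assumes that the ratio of the largest magnitude to the small is at least 1.

```python
import math
from fractions import Fraction

def mantissa_bits(first, last, small):
    """ceiling(log2(max(abs(last), abs(first)) / small)), computed exactly.
    Assumes the ratio is at least 1."""
    ratio = max(abs(Fraction(last)), abs(Fraction(first))) / Fraction(small)
    # The smallest k with 2**k >= ratio is the bit length of ceil(ratio) - 1.
    return (math.ceil(ratio) - 1).bit_length()

small = Fraction(1, 2 ** 15)
bits_for_unit_range = mantissa_bits(-1, 1 - small, small)
bits_for_hundred = mantissa_bits(0, 100, 1)
```

Using the bit length of an integer avoids the rounding error that a floating-point logarithm could introduce near exact powers of two.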
L.2.3. Accuracy of Fixed-Point Operations
The accuracy requirements for the predefined fixed-point arithmetic
operations and conversions, and the results of relations on fixed-point
operands, are given below. This section does not cover the accuracy of an 
operation of a static expression that involves only the operators of the 
root numeric types; such operations must be evaluated exactly (see 4.9). 
As in Ada 83, the operands of the fixed-point adding operators, absolute
value, and comparisons must have identical types. These operations are
required to yield exact results, since no implementation difficulties are
posed by this requirement. Overflow considerations are discussed later.
Multiplications and divisions are allowed between operands of any two
fixed-point types. Although this can be viewed as an operation that yields
an infinitely precise result of a special type, followed by its conversion 
to the result type (see 4.5.5), for purposes of defining the accuracy rules 
we treat this instead as a single operation whose accuracy depends on three 
types (those of the operands and the result). In contrast to Ada 83, the 
result need not always be converted explicitly to some numeric type. 
Explicit conversion is not required when the surrounding context implies a 
unique type; implicit conversion takes place in that case. Explicit 
conversion is required when the context does not provide a unique result 
type. For decimal types, the attribute T'ROUND may be used to imply 
explicit conversion with rounding (see IS:ROUNDING_CONTROL). 

When the result type is a floating-point type, the accuracy is
implementation defined (see L.1.5); this case is not further discussed here.
For some combinations of the operand and result types in the remaining
cases, the result is required to belong to a small set of values called the
``perfect result set''; for other combinations, it is required merely to
belong to a generally larger and implementation-defined set of values called
the ``close result set.'' When the result type is a decimal type, the
perfect result set contains a single value; thus, operations on decimal
types are always deterministic.

When one operand of a fixed-point multiplication or division is of type
universal_real, a case allowed in Ada 9X but not allowed in Ada 83 (see 
4.5.5), that operand is not implicitly converted in the usual sense, since 
the context does not determine a unique target type, but the accuracy of the 
result of the multiplication or division (i.e., whether the result must 
belong to the perfect result set or merely the close result set) depends on 
the value of the operand of type universal_real and on the types of the 
other operand and of the result. We need not consider here the 
multiplication or division of two such operands, since in that case either 
the operation is evaluated exactly (i.e., it is an operation of a static 
expression all of whose operators are of a root numeric type) or it is 
considered to be an operation of a floating-point type (see 3.5.6).
For a fixed-point multiplication or division whose (exact) mathematical
result is V, and for the conversion of a value V to a fixed-point type, the
``perfect result set'' and ``close result set'' are defined as follows:
  - If the result type is an ordinary fixed-point type with a small of
    S,
      * if V is an integer multiple of S, then the perfect result set
        contains only the value V;
      * otherwise, it contains the integer multiple of S just below V
        and the integer multiple of S just above V.
    The close result set is an implementation-defined set of
    consecutive integer multiples of S containing the perfect result
    set as a subset.
  - If the result type is a decimal type with a small of S,

      * if V is an integer multiple of S, then the perfect result set
        contains only the value V;

      * otherwise, if truncation applies then it contains only the
        integer multiple of S in the direction toward zero, whereas
        if rounding applies then it contains only the nearest integer
        multiple of S (with ties broken by rounding away from zero).

    The close result set is an implementation-defined set of
    consecutive integer multiples of S containing the perfect result
    set as a subset. Note: As a consequence of subsequent rules, this
    case does not arise when the operand types are also decimal types.
  - If the result type is an integer type,
      * if V is an integer, then the perfect result set contains only
        the value V;
      * otherwise, it contains the integer nearest to the value V (if
        V lies equally distant from two consecutive integers, the
        perfect result set contains both).
    The close result set is an implementation-defined set of
    consecutive integers containing the perfect result set as a
    subset.
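The three definitions of the perfect result set can be sketched in Python with exact rational arithmetic. The function names are hypothetical; each function returns the perfect result set as a Python set, given the exact mathematical result V (and, where relevant, the small S of the result type). The close result set is implementation defined and is not modeled.

```python
import math
from fractions import Fraction

def perfect_ordinary(v, small):
    """Perfect result set for an ordinary fixed-point result type."""
    s = Fraction(small)
    q = Fraction(v) / s
    return {math.floor(q) * s, math.ceil(q) * s}   # one element if V is exact

def perfect_decimal(v, small, rounding):
    """Perfect result set (a single value) for a decimal result type."""
    s = Fraction(small)
    q = Fraction(v) / s
    magnitude = abs(q)
    if rounding:
        n = math.floor(magnitude + Fraction(1, 2))  # ties go away from zero
    else:
        n = math.floor(magnitude)                   # truncate toward zero
    return {(n if q >= 0 else -n) * s}

def perfect_integer(v):
    """Perfect result set for an integer result type."""
    v = Fraction(v)
    lo, hi = math.floor(v), math.ceil(v)
    if lo == hi:
        return {lo}
    if v - lo == hi - v:          # exactly halfway: both neighbours qualify
        return {lo, hi}
    return {lo if v - lo < hi - v else hi}
```

For example, with V = 1/10 and S = 1/8 the ordinary fixed-point perfect result set is {0, 1/8}, whereas a decimal result type with S = 1/100 and V = -0.025 admits only -0.03 under rounding and only -0.02 under truncation.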
The result of a fixed-point multiplication or division must belong either to
the perfect result set or to the close result set, as described below, if
overflow does not occur. (Overflow is discussed later.) In the following
cases, if the result type is a fixed-point type, let S be its small;
otherwise, i.e. when the result type is an integer type, let S be 1.0.
  - For a multiplication or division neither of whose operands is of
    type universal_real, let L and R be the smalls of the left and
    right operands. For a multiplication, if (L * R) / S is an
    integer or the reciprocal of an integer (the smalls are said to be
    ``compatible'' in this case), the result must belong to the
    perfect result set; otherwise, it belongs to the close result set.
    For a division, if L / (R * S) is an integer or the reciprocal of
    an integer (i.e., the smalls are compatible), the result must
    belong to the perfect result set; otherwise, it belongs to the
    close result set. Note: When the operand and result types are all
    decimal types, their smalls are necessarily compatible; the same
    is true when they are all ordinary fixed-point types with binary
    smalls.

  - For a multiplication or division having one universal_real operand
    with a value of V, note that it is always possible to factor V as
    an integer multiple of a ``compatible'' small, but the integer
    multiple may be ``too big.'' If the factorization allows an
    integer multiple less than some implementation-defined limit, the
    result must belong to the perfect result set; otherwise, it
    belongs to the close result set.
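The compatibility test for the smalls of the operand and result types reduces to a check on a single rational number. The Python sketch below, with hypothetical function names, implements that check for the case in which neither operand is of type universal_real; the smalls are passed as positive Fraction values.

```python
from fractions import Fraction

def is_integer_or_reciprocal(q):
    """True when the positive rational q is an integer or 1/integer."""
    return q.denominator == 1 or q.numerator == 1

def compatible_for_multiplication(left_small, right_small, result_small):
    """Test (L * R) / S."""
    return is_integer_or_reciprocal(left_small * right_small / result_small)

def compatible_for_division(left_small, right_small, result_small):
    """Test L / (R * S)."""
    return is_integer_or_reciprocal(left_small / (right_small * result_small))

binary_case = compatible_for_multiplication(
    Fraction(1, 2 ** 4), Fraction(1, 2 ** 8), Fraction(1, 2 ** 6))
decimal_case = compatible_for_multiplication(
    Fraction(1, 100), Fraction(1, 10), Fraction(1, 100))
mixed_case = compatible_for_multiplication(
    Fraction(1, 10), Fraction(1, 8), Fraction(1, 3))
```

The all-binary and all-decimal combinations are compatible, as the note states, while the mixed combination gives (L * R) / S = 3/80, which is neither an integer nor the reciprocal of one.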
A multiplication P * Q of an operand of a fixed-point type F by an operand
of an integer type I, or vice versa, and a division P / Q of an operand of a
fixed-point type F by an operand of an integer type I, are also allowed, as
in Ada 83. In these cases, the result has a type of F; explicit conversion
of the result is never required. The accuracy required in these cases is
the same as that required for a multiplication F(P * Q) or a division F(P /
Q) obtained by interpreting the operand of the integer type to have a
fixed-point type with a small of 1.0.
The accuracy of the result of a conversion from an integer or fixed-point
type to a fixed-point type, or from a fixed-point type to an integer type,
is the same as that of a fixed-point multiplication of the source value by a
fixed-point operand having a small of 1.0 and a value of 1.0, as given by
the foregoing rules. The result of a conversion from a floating-point type
to a fixed-point type must belong to the close result set.
The possibility of overflow in the result of a predefined arithmetic
operation or conversion yielding a result of a fixed-point type T is
analogous to that for floating-point types. If all of the permitted results
belong to the range T'BASE'FIRST .. T'BASE'LAST, then the implementation
must deliver one of the permitted results; otherwise,
  - if T'MACHINE_OVERFLOWS is TRUE, the implementation must either
    deliver one of the permitted results or raise CONSTRAINT_ERROR;
  - if T'MACHINE_OVERFLOWS is FALSE, the result is implementation
    defined.
L.2.4. Attributes of Fixed-Point Numbers
Because the model of fixed-point arithmetic is no longer expressed in terms
of model numbers and model intervals, no attributes related to the Ada 83
model are required (except T'DELTA, T'SMALL, T'FIRST, and T'LAST). In
particular, the attributes T'MANTISSA, T'LARGE, T'SAFE_SMALL, and
T'SAFE_LARGE (of a fixed-point type T) are eliminated and not replaced by
other attributes. T'FORE and T'AFT are retained because of their connection 
with I/O.
L.3. Elementary Functions
Implementations conforming to the Numerics Annex shall provide a predefined
generic package called GENERIC_ELEMENTARY_FUNCTIONS and an accompanying
predefined package called ELEMENTARY_FUNCTIONS_EXCEPTIONS having the
following specifications:
package ELEMENTARY_FUNCTIONS_EXCEPTIONS is
   ARGUMENT_ERROR : exception;
end ELEMENTARY_FUNCTIONS_EXCEPTIONS;

with ELEMENTARY_FUNCTIONS_EXCEPTIONS;
generic
   type FLOAT_TYPE is digits <>;
package GENERIC_ELEMENTARY_FUNCTIONS is
   subtype FLOAT_BASE is FLOAT_TYPE'BASE;
   function SQRT (X : FLOAT_BASE) return FLOAT_BASE;
   function LOG (X : FLOAT_BASE) return FLOAT_BASE;
   function LOG (X, BASE : FLOAT_BASE) return FLOAT_BASE;
   function EXP (X : FLOAT_BASE) return FLOAT_BASE;
   function "**" (X, Y : FLOAT_BASE) return FLOAT_BASE;
   function SIN (X : FLOAT_BASE) return FLOAT_BASE;
   function SIN (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function COS (X : FLOAT_BASE) return FLOAT_BASE;
   function COS (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function TAN (X : FLOAT_BASE) return FLOAT_BASE;
   function TAN (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function COT (X : FLOAT_BASE) return FLOAT_BASE;
   function COT (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function ARCSIN (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCSIN (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function ARCCOS (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCCOS (X, CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function ARCTAN (Y : FLOAT_BASE;
                    X : FLOAT_BASE := 1.0) return FLOAT_BASE;
   function ARCTAN (Y : FLOAT_BASE;
                    X : FLOAT_BASE := 1.0;
                    CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function ARCCOT (X : FLOAT_BASE;
                    Y : FLOAT_BASE := 1.0) return FLOAT_BASE;
   function ARCCOT (X : FLOAT_BASE;
                    Y : FLOAT_BASE := 1.0;
                    CYCLE : FLOAT_BASE) return FLOAT_BASE;
   function SINH (X : FLOAT_BASE) return FLOAT_BASE;
   function COSH (X : FLOAT_BASE) return FLOAT_BASE;
   function TANH (X : FLOAT_BASE) return FLOAT_BASE;
   function COTH (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCSINH (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCCOSH (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCTANH (X : FLOAT_BASE) return FLOAT_BASE;
   function ARCCOTH (X : FLOAT_BASE) return FLOAT_BASE;
   ARGUMENT_ERROR : exception
      renames ELEMENTARY_FUNCTIONS_EXCEPTIONS.ARGUMENT_ERROR;
end GENERIC_ELEMENTARY_FUNCTIONS;
The specifications above are identical to the proposed separate ISO standard
(DIS 11430) for the elementary functions except that the formal parameters
and results of the elementary functions are of the base type of the generic
formal type, rather than the type itself.
It is intended that implementations of the GENERIC_ELEMENTARY_FUNCTIONS
conform to the various semantic requirements (regarding domains, ranges,
exception handling, accuracy, prescribed results, etc.) presented in DIS
11430 and not repeated here, except that implementations conforming to the
Numerics Annex must allow GENERIC_ELEMENTARY_FUNCTIONS to be instantiated
with a range-constrained floating-point subtype, and the body must be immune
to potential effects of the range constraint; in other words,
implementations are not allowed to impose a restriction (allowed by DIS
11430) that the generic actual type in an instantiation must be a base type.
In Ada 9X, the accuracy requirements are expressed in terms of
FLOAT_TYPE'EPSILON, since EPSILON of a subtype is now that of the base type.
In DIS 11430, the accuracy requirements are expressed in terms of
FLOAT_TYPE'BASE'EPSILON.
The ARCTAN and ARCCOT functions must exploit signed zeros, if present in the
implementation (as indicated by the value of FLOAT_TYPE'SIGNED_ZEROS). In
particular, when X is negative and Y is zero:
  - if FLOAT_TYPE'SIGNED_ZEROS is TRUE, ARCTAN(Y, X, CYCLE) and
    ARCCOT(X, Y, CYCLE) must deliver -CYCLE/2.0 when Y is a negative
    zero and +CYCLE/2.0 when Y is a positive zero;
  - if FLOAT_TYPE'SIGNED_ZEROS is FALSE, ARCTAN(Y, X, CYCLE) and
    ARCCOT(X, Y, CYCLE) deliver CYCLE/2.0.
The behavior of the versions of ARCTAN and ARCCOT without a CYCLE parameter
is similar in the above case (i.e., when X is negative and Y is zero),
except that the result is then an appropriate approximation of plus or minus
pi.
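The following sketch exercises the case just described (X negative, Y zero).
It assumes an Ada 95 compiler, where the FLOAT instance of the elementary
functions is available as ADA.NUMERICS.ELEMENTARY_FUNCTIONS; the procedure
name and the choice of a 360-degree cycle are illustrative only.

```ada
with ADA.TEXT_IO;
with ADA.NUMERICS.ELEMENTARY_FUNCTIONS;

procedure SIGNED_ZERO_ARCTAN is

   use ADA.NUMERICS.ELEMENTARY_FUNCTIONS;

   -- A variable (not a constant) so that the negative zero below is
   -- computed at run time rather than folded as a static expression.
   MINUS_ONE : FLOAT := -1.0;

   POSITIVE_ZERO : constant FLOAT := 0.0;
   NEGATIVE_ZERO : constant FLOAT := FLOAT'COPY_SIGN (0.0, MINUS_ONE);

   procedure SHOW (LABEL : STRING; RESULT : FLOAT) is
   begin
      ADA.TEXT_IO.PUT_LINE (LABEL & FLOAT'IMAGE (RESULT));
   end SHOW;

begin
   ADA.TEXT_IO.PUT_LINE
     ("SIGNED_ZEROS: " & BOOLEAN'IMAGE (FLOAT'SIGNED_ZEROS));

   -- Y is a positive zero: the annex calls for +CYCLE/2.0.
   SHOW ("Y = +0, CYCLE = 360: ",
         ARCTAN (Y => POSITIVE_ZERO, X => MINUS_ONE, CYCLE => 360.0));

   -- Y is a negative zero: the annex calls for -CYCLE/2.0 when
   -- SIGNED_ZEROS is TRUE, and for CYCLE/2.0 otherwise.
   SHOW ("Y = -0, CYCLE = 360: ",
         ARCTAN (Y => NEGATIVE_ZERO, X => MINUS_ONE, CYCLE => 360.0));

   -- Without CYCLE the results approximate plus or minus pi.
   SHOW ("Y = +0, no CYCLE:    ",
         ARCTAN (Y => POSITIVE_ZERO, X => MINUS_ONE));
   SHOW ("Y = -0, no CYCLE:    ",
         ARCTAN (Y => NEGATIVE_ZERO, X => MINUS_ONE));
end SIGNED_ZERO_ARCTAN;
```

The sign of a zero argument thus selects which side of the branch cut along
the negative X axis the result falls on.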
In addition, the zero delivered by SIN, ARCSIN, SINH, ARCSINH, TAN, TANH,
and ARCTANH when X is zero must have the same sign as X when
FLOAT_TYPE'SIGNED_ZEROS is TRUE; similarly, the zero delivered by ARCTAN
when X is positive and Y is zero must have the same sign as Y when
FLOAT_TYPE'SIGNED_ZEROS is TRUE. (This requirement goes beyond DIS 11430,
which did not specify the sign of the result in these cases.) The extent of
the exploitation of signed zeros is left implementation defined in the many
other contexts in which an elementary function can return a zero result.
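A small check of the zero-sign requirement might look as follows. This is a
sketch under the same assumption of an Ada 95 compiler and the
ADA.NUMERICS.ELEMENTARY_FUNCTIONS instance for FLOAT; IS_NEGATIVE is a
hypothetical helper that inspects the sign by way of COPY_SIGN, since a
negative zero compares equal to a positive zero.

```ada
with ADA.TEXT_IO;
with ADA.NUMERICS.ELEMENTARY_FUNCTIONS;

procedure ZERO_SIGN_CHECK is

   use ADA.NUMERICS.ELEMENTARY_FUNCTIONS;

   -- A variable, so that the negative zero is computed at run time.
   MINUS_ONE     : FLOAT := -1.0;
   NEGATIVE_ZERO : constant FLOAT := FLOAT'COPY_SIGN (0.0, MINUS_ONE);

   -- TRUE if Z carries a negative sign, including a negative zero.
   function IS_NEGATIVE (Z : FLOAT) return BOOLEAN is
   begin
      return FLOAT'COPY_SIGN (1.0, Z) < 0.0;
   end IS_NEGATIVE;

   procedure REPORT (NAME : STRING; RESULT : FLOAT) is
   begin
      ADA.TEXT_IO.PUT_LINE
        (NAME & " keeps the sign: " & BOOLEAN'IMAGE (IS_NEGATIVE (RESULT)));
   end REPORT;

begin
   if FLOAT'SIGNED_ZEROS then
      -- Each function below is required to return a zero with the
      -- sign of its zero argument.
      REPORT ("SIN    ", SIN (NEGATIVE_ZERO));
      REPORT ("ARCSIN ", ARCSIN (NEGATIVE_ZERO));
      REPORT ("TANH   ", TANH (NEGATIVE_ZERO));
      -- ARCTAN with X positive and Y zero takes the sign of Y.
      REPORT ("ARCTAN ", ARCTAN (Y => NEGATIVE_ZERO, X => 1.0));
   else
      ADA.TEXT_IO.PUT_LINE ("FLOAT has no signed zeros.");
   end if;
end ZERO_SIGN_CHECK;
```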
L.4. Primitive Functions
Implementations conforming to the Numerics Annex shall provide the following
additional attributes:
T'EXPONENT(X)
T'FRACTION(X)
T'COMPOSE(FRACTION, EXPONENT)
T'SCALE(X, EXPONENT_ADJUSTMENT)
T'FLOOR(X)
T'CEILING(X)
T'ROUNDING(X)
T'TRUNCATION(X)
T'REMAINDER(X, Y)
T'ADJACENT(X, TOWARDS)
T'COPY_SIGN(VALUE, SIGN)
T'LEADING_PART(X, RADIX_DIGITS)
T'MIN(X, Y)
T'MAX(X, Y)
T'MODEL(X)
T'MACHINE(X)
In the case of MIN and MAX, the prefix may denote any scalar type or
subtype; for the other attributes, the prefix must denote a floating-point
type or subtype. Implementations conforming to the Numerics Annex shall
also extend the attributes T'SUCC(X) and T'PRED(X) to apply when T is a
floating-point type or subtype.
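As an illustration of the prefix rules, the sketch below applies MIN and MAX
to integer, enumeration, and floating-point prefixes, and SUCC and PRED to a
floating-point prefix. It assumes an Ada 95 compiler, where these attributes
were eventually standardized; the type COLOR and the procedure name are
hypothetical.

```ada
with ADA.TEXT_IO;

procedure SCALAR_ATTRIBUTES is

   type COLOR is (RED, GREEN, BLUE);  -- hypothetical enumeration type

   X : constant FLOAT := 1.0;

begin
   -- MIN and MAX accept any scalar prefix.
   ADA.TEXT_IO.PUT_LINE (INTEGER'IMAGE (INTEGER'MAX (3, 7)));
   ADA.TEXT_IO.PUT_LINE (COLOR'IMAGE (COLOR'MIN (GREEN, BLUE)));
   ADA.TEXT_IO.PUT_LINE (FLOAT'IMAGE (FLOAT'MIN (2.5, -2.5)));

   -- SUCC and PRED with a floating-point prefix yield the machine
   -- numbers immediately above and below the argument.
   ADA.TEXT_IO.PUT_LINE
     (BOOLEAN'IMAGE (FLOAT'PRED (X) < X and X < FLOAT'SUCC (X)));

   -- ADJACENT steps one machine number in the direction of TOWARDS.
   ADA.TEXT_IO.PUT_LINE
     (BOOLEAN'IMAGE (FLOAT'ADJACENT (X, 2.0) = FLOAT'SUCC (X)));
end SCALAR_ATTRIBUTES;
```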
All of the above attributes except MIN, MAX, MODEL, and MACHINE correspond
directly to functions in the GENERIC_PRIMITIVE_FUNCTIONS generic package
proposed as a separate ISO standard (CD 11729) for Ada 83. The ROUNDING and
TRUNCATION attributes correspond to the ROUND and TRUNCATE functions in
GENERIC_PRIMITIVE_FUNCTIONS; the latter names are proposed in the
Information Systems Annex for entirely different attributes (see
IS:ROUNDING_CONTROL). The EXPONENT_ADJUSTMENT parameter of the SCALE
attribute corresponds to the EXPONENT parameter of the SCALE function of
GENERIC_PRIMITIVE_FUNCTIONS. The SUCCESSOR and PREDECESSOR functions of
GENERIC_PRIMITIVE_FUNCTIONS are provided by the extension of the existing
SUCC and PRED attributes. The functionality of the DECOMPOSE procedure of
GENERIC_PRIMITIVE_FUNCTIONS is not provided. MIN, MAX, MODEL, and MACHINE
are new (not taken from GENERIC_PRIMITIVE_FUNCTIONS).
The type of the result yielded by all of the ``primitive function''
attributes except EXPONENT is the base type of T; EXPONENT yields a result
of type universal_integer. The type of actual parameters corresponding to
X, Y, FRACTION, TOWARDS, VALUE, and SIGN must be the base type of T. Actual
parameters corresponding to EXPONENT, EXPONENT_ADJUSTMENT, and RADIX_DIGITS
may be of any integer type (i.e., the formal parameter has type
universal_integer). The value of an actual parameter corresponding to
RADIX_DIGITS must be positive. All the attributes preserve staticness.
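The typing rules above can be seen in a short sketch. It assumes an Ada 95
compiler; note that the attribute called SCALE in this draft is named SCALING
there. The procedure and helper names are hypothetical.

```ada
with ADA.TEXT_IO;

procedure PRIMITIVE_ATTRIBUTES is

   X : constant FLOAT := 6.5;

   -- EXPONENT yields universal_integer, so it converts implicitly to
   -- any integer type; FRACTION yields the base type of the prefix.
   E : constant INTEGER := FLOAT'EXPONENT (X);
   F : constant FLOAT   := FLOAT'FRACTION (X);

   procedure SHOW (LABEL : STRING; RESULT : FLOAT) is
   begin
      ADA.TEXT_IO.PUT_LINE (LABEL & FLOAT'IMAGE (RESULT));
   end SHOW;

begin
   ADA.TEXT_IO.PUT_LINE ("EXPONENT     " & INTEGER'IMAGE (E));
   SHOW ("FRACTION     ", F);

   -- COMPOSE rebuilds X from its fraction and exponent; the exponent
   -- actual may be of any integer type.
   SHOW ("COMPOSE      ", FLOAT'COMPOSE (F, E));

   -- SCALING multiplies by a power of the machine radix.
   SHOW ("SCALING      ", FLOAT'SCALING (X, 3));

   -- The four integer-valued roundings, applied to -6.5.
   SHOW ("FLOOR        ", FLOAT'FLOOR (-X));
   SHOW ("CEILING      ", FLOAT'CEILING (-X));
   SHOW ("ROUNDING     ", FLOAT'ROUNDING (-X));
   SHOW ("TRUNCATION   ", FLOAT'TRUNCATION (-X));

   SHOW ("REMAINDER    ", FLOAT'REMAINDER (X, 2.0));
   SHOW ("COPY_SIGN    ", FLOAT'COPY_SIGN (X, -1.0));

   -- RADIX_DIGITS must be positive; here the two leading radix
   -- digits of X are kept and the rest are set to zero.
   SHOW ("LEADING_PART ", FLOAT'LEADING_PART (X, 2));
end PRIMITIVE_ATTRIBUTES;
```

Since all arguments here are static, a compiler is free to evaluate every
attribute reference at compile time, in line with the rule that the
attributes preserve staticness.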
These attributes deliver results that are accurate to the level of machine
numbers. Like T'FIRST and T'LAST, which also must deliver fully accurate
results, they are not among the predefined operations covered by the
replacement for RM 4.5.7 (see L.1.5). Note: A decision has not yet been
made about whether extra accuracy can be passed in to a primitive function,
and whether that implies that the extra accuracy must be maintained during
the operation and must affect the result.
It is anticipated that the attributes corresponding to functions in
GENERIC_PRIMITIVE_FUNCTIONS will be defined as they were there, subject to
modifications when a decision is made on the role of extra precision. Their
definitions are not repeated here. The definitions depend, in some cases,
on the presence or absence of denormalized numbers and signed zeros, as
reflected in the values of T'DENORM and T'SIGNED_ZEROS, respectively. The
other attributes are defined below.
T'MACHINE(X) returns the value of X rounded or truncated to a neighboring
machine number (see L.1.1) of the type T; i.e., extra precision beyond
T'MACHINE_MANTISSA radix digits is discarded, and CONSTRAINT_ERROR may be
raised if the value of X is sufficiently outside the range T'BASE'FIRST ..
T'BASE'LAST that rounding or truncating it to the precision of the machine
numbers cannot yield a result in this range (i.e., cannot yield the
appropriate bound of this range).
T'MODEL(X) is defined as follows:
   * if X is a model number of the type T (see L.1.3) in the range
     -T'MODEL_LARGE .. T'MODEL_LARGE, X is returned;
   * if X lies between two consecutive model numbers of the type T in
     that range, one of those surrounding model numbers is returned;
     and
   * if X lies outside that range, CONSTRAINT_ERROR is raised.
T'MIN(X, Y) and T'MAX(X, Y) return the minimum and the maximum of their two
arguments, respectively.
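The definitions of MACHINE and MODEL above can be tried out with a sketch
such as the following. It assumes an Ada 95 compiler, in which both
attributes exist for floating-point types; T'MODEL_LARGE is specific to this
draft and is therefore not used. The procedure name is hypothetical.

```ada
with ADA.TEXT_IO;

procedure MACHINE_AND_MODEL is

   -- Variables, so that the quotient below is computed at run time.
   ONE   : FLOAT := 1.0;
   THREE : FLOAT := 3.0;

   -- 1/3 is not a machine number; MACHINE discards any extra precision
   -- the implementation may carry in the intermediate result.
   THIRD : constant FLOAT := FLOAT'MACHINE (ONE / THREE);

begin
   -- A value that is already a machine number is unchanged by MACHINE.
   ADA.TEXT_IO.PUT_LINE
     ("MACHINE is idempotent: "
      & BOOLEAN'IMAGE (FLOAT'MACHINE (THIRD) = THIRD));

   -- MODEL returns THIRD itself if it is a model number, and otherwise
   -- one of the two model numbers that surround it.
   ADA.TEXT_IO.PUT_LINE
     ("MODEL (THIRD): " & FLOAT'IMAGE (FLOAT'MODEL (THIRD)));

   -- When the two mantissa lengths agree, every machine number in the
   -- model range is also a model number.
   ADA.TEXT_IO.PUT_LINE
     ("MACHINE_MANTISSA: " & INTEGER'IMAGE (FLOAT'MACHINE_MANTISSA));
   ADA.TEXT_IO.PUT_LINE
     ("MODEL_MANTISSA:   " & INTEGER'IMAGE (FLOAT'MODEL_MANTISSA));
end MACHINE_AND_MODEL;
```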
Index
Canonical form
   definition
   of denormalized floating-point machine numbers
   of floating-point model numbers
   of normalized floating-point machine numbers
DENORM (new predefined attribute)
Denormalized numbers
Elementary functions
ELEMENTARY_FUNCTIONS_EXCEPTIONS (predefined package)
Fixed-point
   accuracy requirements
   arithmetic model
   attributes
   attributes of model numbers eliminated
   model numbers eliminated
   type declarations
   values
Floating-point
   accuracy requirements
   arithmetic model
   attributes of machine numbers
   attributes of model numbers
   denormalized machine numbers
   machine numbers
   model numbers
   type declarations
GENERIC_ELEMENTARY_FUNCTIONS (predefined generic package)
Model numbers
Primitive functions (new predefined floating-point attributes)
Signed zeros
SIGNED_ZEROS (new predefined attribute)
Table of Contents
L. Numerics Annex (Specification)
   L.1. Semantics of Floating-Point Arithmetic
      L.1.1. Floating-Point Machine Numbers
      L.1.2. Attributes of Floating-Point Machine Numbers
      L.1.3. Floating-Point Model Numbers
      L.1.4. Attributes of Floating-Point Model Numbers
      L.1.5. Accuracy of Floating-Point Operations
      L.1.6. Floating-Point Type Declarations
   L.2. Semantics of Fixed-Point Arithmetic
      L.2.1. Values and Attributes of Fixed-Point Types
      L.2.2. Fixed-Point Type Declarations
      L.2.3. Accuracy of Fixed-Point Operations
      L.2.4. Attributes of Fixed-Point Numbers
   L.3. Elementary Functions
   L.4. Primitive Functions
Index