Ada 9X LSN-045-MRT

Numerics Annex (Specification), Vers. 4.7
June 1992

K W Dritz
Argonne National Laboratory
Argonne, IL 60439
E-mail: dritz@mcs.anl.gov

Attached below is my revision of the Numerics Annex (specification)
following the last DRs' meeting. The changes have mostly to do with reducing
the incompatibilities perceived to stem from the proposed dropping of the 4B
Rule; a version of it previously discussed in the Vers. 4.1 Numerics Annex
(rationale) has now been incorporated. The other major change has to do with
the restoration of the concept of _small_ for fixed-point types, with an
appropriate default rule. The major changes are marked in the margin with
change bars. The rationale, which I will send separately, has a major
discussion of the issues surrounding the incorporation of a version of the
4B Rule (which, by the way, should now be called the 4D Rule).

Ken

===============================================================================

L. Numerics Annex (Specification)

The semantic models of floating-point and fixed-point arithmetic, a generic
package of elementary functions, and a collection of attributes comprising
``primitive'' floating-point manipulation functions are presented in this
annex. The ultimate placement of the features described here is still
undecided. At this writing, it is thought that the generic package of
elementary functions will likely be included with other required packages in
a chapter of the core devoted to an ``Ada Standard Library,'' and the
primitive function attributes will likely also be in the core and required
of all implementations. It is intended that the models of floating-point
and fixed-point arithmetic also be supported by all implementations, and
their placement in an annex is purely for presentation purposes.
Little of the original intent of the Numerics Annex, as a repository for
optional features that need be supported only by implementations serving the
special needs of numeric applications, remains.

L.1. Semantics of Floating-Point Arithmetic

L.1.1. Floating-Point Machine Numbers

Associated with each floating-point type is a finite set of machine numbers.
The machine numbers of a type are those capable of being represented, to
full accuracy, in the storage representation of the type. The machine
numbers of a derived type are those of the parent type; the machine numbers
of a subtype are those of the base type.

L.1.2. Attributes of Floating-Point Machine Numbers

Attributes related to the machine numbers of a floating-point type T (i.e.,
to its storage representation) are defined in this section.

T'MACHINE_RADIX yields the radix of the hardware representation of T.
T'MACHINE_MANTISSA yields the largest integer value of p, and T'MACHINE_EMIN
and T'MACHINE_EMAX respectively the most negative and most positive integer
values of exponent, such that every number expressible in the ``canonical
form''

    sign * mantissa * (radix ** exponent)

where

  - sign = +1 or -1;

  - radix = T'MACHINE_RADIX; and

  - mantissa is a p-digit fraction in the number base radix, the first
    digit of which is nonzero

is a machine number of T, i.e., representable to full accuracy in the
storage representation of T.

If, in addition, every number expressible in the canonical form, but where

  - sign = +1 or -1;

  - radix = T'MACHINE_RADIX;

  - exponent = T'MACHINE_EMIN; and

  - mantissa is a T'MACHINE_MANTISSA-digit nonzero fraction in the number
    base radix, the first digit of which is zero

is also a machine number of T (called in IEEE Std. 754 a denormalized
number), then the attribute T'DENORM yields the value TRUE; otherwise, it
yields FALSE. T'DENORM is a new representation attribute of the type T.

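The canonical form above can be made concrete by enumerating a toy format.
The following Python sketch is only an illustration, not part of the
proposal: the function names and the toy parameters (radix 2, a 3-digit
mantissa, exponents -1 .. 2) are assumptions chosen to keep the sets small,
and exact rational arithmetic stands in for the storage representation.

```python
from fractions import Fraction


def normalized_numbers(radix, mantissa_digits, emin, emax):
    """Positive numbers mantissa * radix**exponent whose mantissa is a
    p-digit fraction with a nonzero first digit (hypothetical helper)."""
    scale = radix ** mantissa_digits
    values = set()
    for exponent in range(emin, emax + 1):
        # A p-digit fraction with nonzero first digit is m / radix**p
        # with radix**(p-1) <= m < radix**p.
        for m in range(scale // radix, scale):
            values.add(Fraction(m, scale) * Fraction(radix) ** exponent)
    return sorted(values)


def denormalized_numbers(radix, mantissa_digits, emin):
    """Positive numbers with exponent emin whose mantissa is a nonzero
    p-digit fraction with a zero first digit (hypothetical helper)."""
    scale = radix ** mantissa_digits
    return sorted(
        Fraction(m, scale) * Fraction(radix) ** emin
        for m in range(1, scale // radix)
    )


# Toy format, assumed for illustration only.
normalized = normalized_numbers(radix=2, mantissa_digits=3, emin=-1, emax=2)
denormalized = denormalized_numbers(radix=2, mantissa_digits=3, emin=-1)
```

In this toy format the smallest normalized number is radix ** (emin - 1),
and every denormalized number lies strictly between zero and that value; a
type for which all of them are machine numbers would have T'DENORM = TRUE.
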
An implementation (i.e., one employing ``radix-complement'' representation)
may furthermore include -T'MACHINE_RADIX ** T'MACHINE_EMAX and possibly
T'MACHINE_RADIX ** (T'MACHINE_EMIN - 2) in the set of machine numbers of T;
if so, they must be documented in Appendix F. Of course, zero is also a
machine number of T.

An implementation may have two distinct representations for floating-point
zeros, with positive and negative sign respectively, having the properties
given in IEEE Std. 754 or 854. The attribute T'SIGNED_ZEROS yields TRUE in
this case, and FALSE otherwise. T'SIGNED_ZEROS is a new representation
attribute of the type T.

Note: Even if T'SIGNED_ZEROS is TRUE, the predefined equality operator
yields TRUE given two operands of zero; this is a consequence of the IEEE
standards cited above. Some of the elementary and primitive functions (see
L.3 and L.4, respectively) yield results, given operands of zero, that
depend on the value of T'SIGNED_ZEROS.

The representation attributes T'MACHINE_ROUNDS and T'MACHINE_OVERFLOWS are
retained. The meaning of T'MACHINE_OVERFLOWS is clarified (see L.1.5).

The attributes T'MACHINE_RADIX, T'MACHINE_MANTISSA, T'MACHINE_EMIN, and
T'MACHINE_EMAX return results of the type universal_integer. The attributes
T'SIGNED_ZEROS and T'DENORM return results of type BOOLEAN.

L.1.3. Floating-Point Model Numbers

Associated with each floating-point type is an infinite set of model
numbers. The model numbers of a type are used to define the accuracy
requirements that must be satisfied by certain predefined operations of the
type (see L.1.5); through certain attributes of the model numbers, they are
also used to explain the meaning of a user-declared floating-point type
declaration (see L.1.6). The model numbers of a derived type are those of
the parent type; the model numbers of a subtype are those of the base type.

The model numbers of a floating-point type T are zero and all the numbers
expressible in the canonical form, where

  - sign = +1 or -1;

  - radix = T'MACHINE_RADIX;

  - exponent is an integer >= T'MODEL_EMIN; and

  - mantissa is a T'MODEL_MANTISSA-digit fraction in the number base radix,
    the first digit of which is nonzero.

L.1.4. Attributes of Floating-Point Model Numbers

Attributes related to the model numbers of a floating-point type T are
defined as follows.

The attributes T'MODEL_MANTISSA and T'MODEL_EMIN used to define the model
numbers, and the attribute T'MODEL_EMAX, are determined by the accuracy
delivered by certain predefined operations of the type T and by their
ability to avoid overflow. More precisely, T'MODEL_MANTISSA, T'MODEL_EMIN,
and T'MODEL_EMAX yield, respectively, the largest integer
<= T'MACHINE_MANTISSA, the most negative integer >= T'MACHINE_EMIN, and the
most positive integer <= T'MACHINE_EMAX such that certain predefined
operations of the type T satisfy the accuracy requirements given in L.1.5,
expressed in terms of the model numbers of the type T and in terms of the
attribute T'MODEL_LARGE, which is defined as follows:

    T'MODEL_LARGE = T'MACHINE_RADIX ** T'MODEL_EMAX *
                    (1.0 - T'MACHINE_RADIX ** (-T'MODEL_MANTISSA))

Two additional attributes of the model numbers are defined for convenience,
as follows:

  - T'MODEL_EPSILON = T'MACHINE_RADIX ** (1 - T'MODEL_MANTISSA). This
    attribute gives the absolute value of the difference between the model
    number 1.0 and the next higher model number of the type T.

  - T'MODEL_SMALL = T'MACHINE_RADIX ** (T'MODEL_EMIN - 1). This attribute
    gives the value of the smallest positive (nonzero) model number of the
    type T.

The attributes T'MODEL_LARGE, T'MODEL_SMALL, and T'MODEL_EPSILON return
results of the type universal_real. The attributes T'MODEL_MANTISSA,
T'MODEL_EMAX, and T'MODEL_EMIN return results of the type universal_integer.

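As a worked instance of the three formulas above, the following Python
sketch evaluates them exactly. The function name and the sample parameters
(radix 2, a 24-digit model mantissa, MODEL_EMIN = -125, MODEL_EMAX = 128)
are assumptions made for illustration; in the proposal these values are
determined by the accuracy an implementation actually delivers, not chosen
freely.

```python
from fractions import Fraction


def model_attributes(radix, model_mantissa, model_emin, model_emax):
    """Evaluate MODEL_LARGE, MODEL_EPSILON and MODEL_SMALL exactly from
    the formulas of L.1.4 (hypothetical helper, not part of the annex)."""
    r = Fraction(radix)
    return {
        "MODEL_LARGE": r ** model_emax * (1 - r ** (-model_mantissa)),
        "MODEL_EPSILON": r ** (1 - model_mantissa),
        "MODEL_SMALL": r ** (model_emin - 1),
    }


# Sample parameters, assumed for illustration only.
attrs = model_attributes(radix=2, model_mantissa=24,
                         model_emin=-125, model_emax=128)
```

With these sample values MODEL_LARGE is 2**128 - 2**104, MODEL_EPSILON is
2**(-23), and MODEL_SMALL is 2**(-126).
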
For a user-declared floating-point type T, T'DIGITS returns the precision
specified in the floating_accuracy_definition of T; the same value is
returned for any type derived from T or any subtype of T. (In Ada 9X, a
floating_accuracy_definition is not allowed in a subtype declaration.) For
a predefined type P, the value of P'DIGITS is the largest value of D for
which

    ceiling(D * log(10)/log(P'MACHINE_RADIX) + 1) <= P'MODEL_MANTISSA.

The Ada 83 attributes T'MANTISSA, T'EMAX, T'LARGE, T'SMALL, T'EPSILON,
T'SAFE_EMAX, T'SAFE_LARGE, and T'SAFE_SMALL are removed from the language,
but for purposes of upward compatibility implementations are encouraged to
retain them as implementation-defined attributes with the same values they
had in Ada 83.

L.1.5. Accuracy of Floating-Point Operations

The accuracy requirements for the evaluation of certain predefined
operations of floating-point types are stated as follows.

Note: We present here a tentative version of the entire rewrite of RM 4.5.7
anticipated for Ada 9X. This section does not cover the accuracy of an
operation of a static expression that involves only the operators of the
root numeric types; such operations must be evaluated exactly (see 4.9).
(Operators of the root_real type behave in other contexts like operators of
a floating-point type whose model numbers have a precision and maximum
exponent at least as great as, and a minimum exponent at least as small as,
those of any other floating-point type declared in STANDARD with DIGITS
equal to SYSTEM.MAX_DIGITS; see 3.5.6.) It also does not cover the accuracy
of the predefined attributes of a floating-point subtype that yield a value
of the type; such operations also yield exact results (see L.4 and
elsewhere). Finally, it should be noted that values outside the range
T'FIRST .. T'LAST can be assigned to variables, passed to parameters, and
returned from functions whose type T is a numeric base type (because range
checking is no longer performed in those contexts when the type of the
variable, formal parameter, or function is a numeric base type), and that
fetching, in any context, the value denoted by a name or function_call whose
type T is a numeric base type can, but need not, raise CONSTRAINT_ERROR when
the value is outside the range T'FIRST .. T'LAST; thus no special provision
is made in this section for the possible raising of CONSTRAINT_ERROR when
the value denoted by a name or a function_call is used as the operand of a
predefined operation.

A model interval of a floating-point type is any interval whose bounds are
model numbers of the type. The model interval of a type T associated with a
value V is the smallest model interval of T that includes V. (The model
interval associated with a model number of a type consists of that number
only.)

An operand interval is the model interval, of the type specified for the
operand of an operation, associated with the value of the operand.

If the absolute value of either bound of a model interval of T exceeds
T'MODEL_LARGE, the model interval is said to be out of bounds; otherwise, it
is said to be in bounds.

For any predefined arithmetic operation that yields a result of a
floating-point type T, the required bounds on the result are given by a
model interval of T (called the ``result interval'') defined in terms of the
operand values as follows:

    The result interval is the smallest model interval of T that includes
    the minimum and the maximum of all the values obtained by applying the
    (exact) mathematical operation to values arbitrarily selected from the
    respective operand intervals.

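The construction of a result interval can be traced by hand for addition,
where the extreme values come from the endpoints of the operand intervals.
The Python sketch below is a simplified model under stated assumptions: a
toy type with radix 2 and a 4-digit model mantissa, no lower limit on the
exponent (T'MODEL_EMIN is ignored), and no check against T'MODEL_LARGE. The
function names are hypothetical.

```python
from fractions import Fraction
import math


def model_interval(v, radix=2, mantissa=4):
    """Smallest interval with model-number bounds that includes v
    (toy model: exponent range unbounded, an assumption of this sketch)."""
    v = Fraction(v)
    if v == 0:
        return (Fraction(0), Fraction(0))
    if v < 0:
        lo, hi = model_interval(-v, radix, mantissa)
        return (-hi, -lo)
    # Find exponent with radix**(exponent-1) <= v < radix**exponent.
    exponent = 0
    while Fraction(radix) ** exponent <= v:
        exponent += 1
    while Fraction(radix) ** (exponent - 1) > v:
        exponent -= 1
    # Spacing of model numbers in that binade.
    ulp = Fraction(radix) ** (exponent - mantissa)
    return (math.floor(v / ulp) * ulp, math.ceil(v / ulp) * ulp)


def result_interval_add(x, y, radix=2, mantissa=4):
    """Result interval of x + y: addition is monotone, so the minimum and
    maximum over the operand intervals occur at their endpoints."""
    x_lo, x_hi = model_interval(x, radix, mantissa)
    y_lo, y_hi = model_interval(y, radix, mantissa)
    lo, _ = model_interval(x_lo + y_lo, radix, mantissa)
    _, hi = model_interval(x_hi + y_hi, radix, mantissa)
    return (lo, hi)


operand = model_interval(Fraction(1, 3))           # value that is not a model number
result = result_interval_add(Fraction(1, 3), 1)    # 1 is a model number
```

In this toy type the operand 1/3 has the operand interval 5/16 .. 11/32, and
the sum 1/3 + 1 has the result interval 5/4 .. 11/8; any delivered value in
that interval would satisfy the requirement.
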
The result interval of an exponentiation is obtained by applying the above
rule to the sequence of multiplications defined by the exponent, assuming
arbitrary association of the factors, and to the final division in the case
of a negative exponent.

The result interval of a conversion of a numeric value to a floating-point
type T is the model interval of T associated with the operand value, except
when the source expression has a fixed-point type or is a fixed-point
multiplication or division; in these cases, the result interval is
implementation defined.

Note: A conversion to a constrained subtype of a type is a conversion to the
type followed by a check that the result of the conversion belongs to the
subtype, as in Ada 83.

For any of the foregoing operations, the implementation must deliver a value
that belongs to the result interval when the result interval is in bounds;
otherwise (i.e., when the result interval is out of bounds),

  - if T'MACHINE_OVERFLOWS is TRUE, the implementation must either deliver
    a value that belongs to the result interval or raise CONSTRAINT_ERROR;

  - if T'MACHINE_OVERFLOWS is FALSE, the result is implementation defined.

For any predefined relation on operands of a floating-point type T, the
implementation may deliver any value (i.e., either TRUE or FALSE) obtained
by applying the (exact) mathematical comparison to values arbitrarily chosen
from the respective operand intervals.

The result of a membership test is defined in terms of comparisons of the
operand value with the lower and upper bounds of the given range or type
mark (the usual rules apply to these comparisons).

L.1.6. Floating-Point Type Declarations

A floating-point type declaration of one of the two forms (that is, with or
without the optional range constraint indicated by the square brackets):

    type T is digits D [range L .. R];

is, by definition, equivalent to the following declarations:

    type floating_point_type is new P;
    subtype T is floating_point_type
       [range floating_point_type(L) .. floating_point_type(R)];

where floating_point_type is an anonymous type, and where P is a predefined
floating-point type implicitly selected by the implementation so that it
satisfies the following requirements:

  - P'DIGITS >= D.

  - If a range L .. R is specified, then P'MODEL_LARGE >= max(abs(L),
    abs(R)); otherwise, P'MODEL_LARGE >= 10.0 ** (4*D).

The floating-point type declaration is illegal if none of the predefined
floating-point types available for implicit selection as a parent type in a
floating-point type definition satisfies these requirements.

Note: Implementations may provide other predefined numeric types that are
not available for implicit selection in a numeric type definition.

The definition of the named number SYSTEM.MAX_DIGITS is changed slightly in
Ada 9X. It now gives the maximum precision that can be requested in the
declaration of an unconstrained floating-point type. Implementations may
allow types with higher precisions to be declared, provided that their
declarations include range constraints.

L.2. Semantics of Fixed-Point Arithmetic

The language features for, and especially the model of, fixed-point
arithmetic are simplified to facilitate their use and to foster wider
implementation of the features. The concept of model numbers no longer
applies to fixed-point types.

A special kind of fixed-point type, called a decimal fixed-point type, or
simply a decimal type, is introduced by the Information Systems Annex (see
IS:DECIMAL_FIXED_POINT). Throughout this section, unqualified references to
fixed-point types apply to all fixed-point types, whether decimal or not.
Fixed-point types that are not decimal types are referred to, when
necessary, as ``ordinary fixed-point types.''

L.2.1. Values and Attributes of Fixed-Point Types

The values of a fixed-point type are an infinite set of numbers, which are
the integer multiples of the type's small. The values of a type derived
from a fixed-point type are those of the parent type; the values of a
subtype of a fixed-point type are those of the base type that satisfy the
subtype's range constraint. A fixed_accuracy_definition is no longer
allowed in a subtype declaration.

For a fixed-point type T, T'MACHINE_RADIX (which was allowed only for
floating-point types in Ada 83) yields the radix of the hardware
representation of T. For ordinary fixed-point types, this attribute always
yields 2. For decimal types, it yields the value (which may be either 2 or
10) specified for the type in an attribute definition clause for
MACHINE_RADIX; it is implementation defined, but restricted to the same
choices, in the absence of such a clause (see IS:INTERNAL_DECIMAL_REP). (An
attribute definition clause for MACHINE_RADIX is not allowed for ordinary
fixed-point types.) The Ada 83 attributes T'MACHINE_ROUNDS and
T'MACHINE_OVERFLOWS are retained; the meaning of the latter is clarified
(see L.2.3). T'FORE and T'AFT are also retained.

T'SMALL yields the absolute value of the difference between consecutive
values of the type T; that is, it yields the value of the small of the type.
If not specified in an attribute definition clause for SMALL, an ordinary
fixed-point type's small is, by default, an implementation-defined power of
two less than or equal to its delta. The small of a user-declared ordinary
fixed-point type may be specified explicitly in an attribute definition
clause; the value given must be less than or equal to the type's delta. The
small of a decimal type (see IS:DECIMAL_FIXED_POINT) is always the same as
its delta and is not explicitly specifiable.
Implementations are required to support binary smalls (smalls that are
powers of two); implementations claiming conformance to the Information
Systems Annex (see K) are, in addition, required to support decimal smalls
(smalls that are powers of ten). Implementations are allowed, but not
required, to support other smalls.

For an arbitrary fixed-point subtype T, T'SMALL = T'BASE'SMALL.

For a user-declared fixed-point type T, T'DELTA returns the delta specified
in the fixed_accuracy_definition of T; the same value is returned for any
type derived from T and for any subtype of T. For a predefined fixed-point
type P, the value of P'DELTA is the same as the value of P'SMALL.

The Ada 83 attributes T'MANTISSA, T'LARGE, T'SAFE_LARGE, and T'SAFE_SMALL
are removed from the language, but for purposes of upward compatibility
implementations are encouraged to retain them as implementation-defined
attributes with the same values they had in Ada 83.

L.2.2. Fixed-Point Type Declarations

An ordinary fixed-point type declaration

    type T is delta D range L .. R;
    [for T'SMALL use S;]

where S (if specified) is less than or equal to D is, by definition,
equivalent to the following declarations:

    type fixed_point_type is new P;
    subtype T is fixed_point_type
       range fixed_point_type(L) .. fixed_point_type(R);

where fixed_point_type is an anonymous type, and where P is a predefined
fixed-point type implicitly selected by the implementation so that it
satisfies the following requirements:

  - if S is specified, then P'SMALL = S; otherwise, P'SMALL is an
    implementation-defined power of two less than or equal to D;

  - if abs(R) is a power of two times P'SMALL, P'LAST >= R - P'SMALL;
    otherwise, P'LAST >= R;

  - if abs(L) is a power of two times P'SMALL, P'FIRST <= L + P'SMALL;
    otherwise, P'FIRST <= L.

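The three requirements on P can be checked mechanically for a candidate
parent type. The Python sketch below is an illustration only: the function
names are hypothetical, the candidate is described just by its small and
its bounds, and ``power of two'' is read here as 2**k for any integer k,
which is an assumption of the sketch rather than something the text above
states.

```python
from fractions import Fraction


def is_power_of_two(q):
    """True if q equals 2**k for some integer k (assumed reading)."""
    q = Fraction(q)
    if q <= 0:
        return False
    n, d = q.numerator, q.denominator
    return (n & (n - 1)) == 0 and (d & (d - 1)) == 0


def parent_is_acceptable(p_small, p_first, p_last, delta, lo, hi, small=None):
    """Check the L.2.2 requirements for a candidate predefined type P
    against 'delta D range L .. R' with optional specified small S."""
    if small is not None:
        if p_small != small:
            return False
    elif not (is_power_of_two(p_small) and p_small <= delta):
        return False
    # A bound that is a power of two times P'SMALL may be missed by one small.
    last_needed = hi - p_small if is_power_of_two(abs(hi) / p_small) else hi
    first_needed = lo + p_small if is_power_of_two(abs(lo) / p_small) else lo
    return p_last >= last_needed and p_first <= first_needed


# Hypothetical 8-bit candidate with small 2**(-7): values -1 .. 127/128.
candidate = dict(p_small=Fraction(1, 128), p_first=Fraction(-1),
                 p_last=Fraction(127, 128))
fits = parent_is_acceptable(delta=Fraction(1, 100), lo=Fraction(-1),
                            hi=Fraction(1), **candidate)
too_wide = parent_is_acceptable(delta=Fraction(1, 100), lo=Fraction(-1),
                                hi=Fraction(3, 2), **candidate)
```

For delta 0.01 range -1.0 .. 1.0 the hypothetical 8-bit candidate is
acceptable even though it cannot represent 1.0, because 1.0 is a power of
two times its small; for range -1.0 .. 1.5 it is not.
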
The fixed-point type declaration is illegal if none of the predefined
fixed-point types available for implicit selection as a parent type in a
fixed-point type definition satisfies these requirements.

Note: Implementations may provide other predefined numeric types that are
not available for implicit selection in a numeric type definition.

The range of the subtype T declared by the preceding fixed-point type
declaration is determined as follows:

  - T'LAST = min(P'LAST, R);

  - T'FIRST = max(P'FIRST, L).

The rules for the selection of the underlying predefined type used to
represent a user-declared decimal type T1 are deducible from those applying
to a particular ordinary fixed-point type T2 related to T1 (see
IS:DECIMAL_FIXED_POINT).

With the elimination of the model numbers for fixed-point types, the
definition of the named number SYSTEM.MAX_MANTISSA must be revised slightly.
Informally, this measure is related to the maximum ``normalized'' magnitude
of any value of a fixed-point type or subtype (more precisely, to the number
of bits required to hold the maximum normalized magnitude). An appropriate
definition is the maximum value of

    ceiling(log2(max(abs(T'LAST), abs(T'FIRST)) / T'SMALL))

for any ordinary fixed-point type T. Also, the definition of the named
number SYSTEM.FINE_DELTA is amended slightly to clarify that it applies only
to ordinary fixed-point types.

L.2.3. Accuracy of Fixed-Point Operations

The accuracy requirements for the predefined fixed-point arithmetic
operations and conversions, and the results of relations on fixed-point
operands, are given below. This section does not cover the accuracy of an
operation of a static expression that involves only the operators of the
root numeric types; such operations must be evaluated exactly (see 4.9).

As in Ada 83, the operands of the fixed-point adding operators, absolute
value, and comparisons must have identical types.
These operations are required to yield exact results, since no
implementation difficulties are posed by this requirement. Overflow
considerations are discussed later.

Multiplications and divisions are allowed between operands of any two
fixed-point types. Although this can be viewed as an operation that yields
an infinitely precise result of a special type, followed by its conversion
to the result type (see 4.5.5), for purposes of defining the accuracy rules
we treat this instead as a single operation whose accuracy depends on three
types (those of the operands and the result). In contrast to Ada 83, the
result need not always be converted explicitly to some numeric type.
Explicit conversion is not required when the surrounding context implies a
unique type; implicit conversion takes place in that case. Explicit
conversion is required when the context does not provide a unique result
type. For decimal types, the attribute T'ROUND may be used to imply
explicit conversion with rounding (see IS:ROUNDING_CONTROL).

When the result type is a floating-point type, the accuracy is
implementation defined (see L.1.5); this case is not further discussed here.
For some combinations of the operand and result types in the remaining
cases, the result is required to belong to a small set of values called the
``perfect result set''; for other combinations, it is required merely to
belong to a generally larger and implementation-defined set of values called
the ``close result set.'' When the result type is a decimal type, the
perfect result set contains a single value; thus, operations on decimal
types are always deterministic.

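The contrast between the two kinds of perfect result set, which are defined
precisely later in this section, can be previewed with exact arithmetic.
The Python sketch below is an illustration under assumptions: the function
names are hypothetical, the exact mathematical result V and the small S are
passed as rationals, and whether truncation or rounding applies to a
decimal type is supplied as a flag rather than derived from the context.

```python
from fractions import Fraction
import math


def perfect_result_set_ordinary(v, small):
    """Ordinary fixed-point result type: V itself if it is a multiple of
    the small, otherwise the multiples just below and just above V."""
    v, small = Fraction(v), Fraction(small)
    q = v / small
    if q.denominator == 1:
        return [v]
    return [math.floor(q) * small, math.ceil(q) * small]


def perfect_result_decimal(v, small, rounding):
    """Decimal result type: always a single value. Truncation goes toward
    zero; rounding goes to the nearest multiple, ties away from zero."""
    v, small = Fraction(v), Fraction(small)
    q = v / small
    if q.denominator == 1:
        return v
    if rounding:
        n = math.floor(abs(q) + Fraction(1, 2))
        n = n if q > 0 else -n
    else:
        n = math.trunc(q)
    return n * small


ordinary = perfect_result_set_ordinary(Fraction(1, 3), Fraction(1, 8))
rounded = perfect_result_decimal(Fraction(-1, 8), Fraction(1, 100), rounding=True)
truncated = perfect_result_decimal(Fraction(-1, 8), Fraction(1, 100), rounding=False)
```

For V = 1/3 and a small of 1/8 the ordinary perfect result set holds the two
values 1/4 and 3/8, so either may be delivered; for V = -0.125 and a decimal
small of 0.01 the single permitted value is -0.13 under rounding and -0.12
under truncation.
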
When one operand of a fixed-point multiplication or division is of type
universal_real, a case allowed in Ada 9X but not allowed in Ada 83 (see
4.5.5), that operand is not implicitly converted in the usual sense, since
the context does not determine a unique target type, but the accuracy of the
result of the multiplication or division (i.e., whether the result must
belong to the perfect result set or merely the close result set) depends on
the value of the operand of type universal_real and on the types of the
other operand and of the result. We need not consider here the
multiplication or division of two such operands, since in that case either
the operation is evaluated exactly (i.e., it is an operation of a static
expression all of whose operators are of a root numeric type) or it is
considered to be an operation of a floating-point type (see 3.5.6).

For a fixed-point multiplication or division whose (exact) mathematical
result is V, and for the conversion of a value V to a fixed-point type, the
``perfect result set'' and ``close result set'' are defined as follows:

  - If the result type is an ordinary fixed-point type with a small of S,

      * if V is an integer multiple of S, then the perfect result set
        contains only the value V;

      * otherwise, it contains the integer multiple of S just below V and
        the integer multiple of S just above V.

    The close result set is an implementation-defined set of consecutive
    integer multiples of S containing the perfect result set as a subset.

  - If the result type is a decimal type with a small of S,

      * if V is an integer multiple of S, then the perfect result set
        contains only the value V;

      * otherwise, if truncation applies then it contains only the
        integer multiple of S in the direction toward zero, whereas
        if rounding applies then it contains only the nearest integer
        multiple of S (with ties broken by rounding away from zero).

    The close result set is an implementation-defined set of consecutive
    integer multiples of S containing the perfect result set as a subset.
    Note: As a consequence of subsequent rules, this case does not arise
    when the operand types are also decimal types.

  - If the result type is an integer type,

      * if V is an integer, then the perfect result set contains only the
        value V;

      * otherwise, it contains the integer nearest to the value V (if V
        lies equally distant from two consecutive integers, the perfect
        result set contains both).

    The close result set is an implementation-defined set of consecutive
    integers containing the perfect result set as a subset.

The result of a fixed-point multiplication or division must belong either to
the perfect result set or to the close result set, as described below, if
overflow does not occur. (Overflow is discussed later.) In the following
cases, if the result type is a fixed-point type, let S be its small;
otherwise, i.e. when the result type is an integer type, let S be 1.0.

  - For a multiplication or division neither of whose operands is of
    type universal_real, let L and R be the smalls of the left and
    right operands. For a multiplication, if (L * R) / S is an
    integer or the reciprocal of an integer (the smalls are said to be
    ``compatible'' in this case), the result must belong to the
    perfect result set; otherwise, it belongs to the close result set.
    For a division, if L / (R * S) is an integer or the reciprocal of
    an integer (i.e., the smalls are compatible), the result must
    belong to the perfect result set; otherwise, it belongs to the
    close result set. Note: When the operand and result types are all
    decimal types, their smalls are necessarily compatible; the same
    is true when they are all ordinary fixed-point types with binary
    smalls.

  - For a multiplication or division having one universal_real operand
    with a value of V, note that it is always possible to factor V as
    an integer multiple of a ``compatible'' small, but the integer
    multiple may be ``too big.'' If the factorization allows an
    integer multiple less than some implementation-defined limit, the
    result must belong to the perfect result set; otherwise, it
    belongs to the close result set.

A multiplication P * Q of an operand of a fixed-point type F by an operand
of an integer type I, or vice-versa, and a division P / Q of an operand of a
fixed-point type F by an operand of an integer type I, are also allowed, as
in Ada 83. In these cases, the result has a type of F; explicit conversion
of the result is never required. The accuracy required in these cases is
the same as that required for a multiplication F(P * Q) or a division
F(P / Q) obtained by interpreting the operand of the integer type to have a
fixed-point type with a small of 1.0.

The accuracy of the result of a conversion from an integer or fixed-point
type to a fixed-point type, or from a fixed-point type to an integer type,
is the same as that of a fixed-point multiplication of the source value by a
fixed-point operand having a small of 1.0 and a value of 1.0, as given by
the foregoing rules. The result of a conversion from a floating-point type
to a fixed-point type must belong to the close result set.

The possibility of overflow in the result of a predefined arithmetic
operation or conversion yielding a result of a fixed-point type T is
analogous to that for floating-point types. If all of the permitted results
belong to the range T'BASE'FIRST .. T'BASE'LAST, then the implementation
must deliver one of the permitted results; otherwise,

  - if T'MACHINE_OVERFLOWS is TRUE, the implementation must either deliver
    one of the permitted results or raise CONSTRAINT_ERROR;

  - if T'MACHINE_OVERFLOWS is FALSE, the result is implementation defined.

L.2.4. Attributes of Fixed-Point Numbers

Because the model of fixed-point arithmetic is no longer expressed in terms
of model numbers and model intervals, no attributes related to the Ada 83
model are required (except T'DELTA, T'SMALL, T'FIRST, and T'LAST). In
particular, the attributes T'MANTISSA, T'LARGE, T'SAFE_SMALL, and
T'SAFE_LARGE (of a fixed-point type T) are eliminated and not replaced by
other attributes. T'FORE and T'AFT are retained because of their connection
with I/O.

L.3. Elementary Functions

Implementations conforming to the Numerics Annex shall provide a predefined
generic package called GENERIC_ELEMENTARY_FUNCTIONS and an accompanying
predefined package called ELEMENTARY_FUNCTIONS_EXCEPTIONS having the
following specifications:

    package ELEMENTARY_FUNCTIONS_EXCEPTIONS is
       ARGUMENT_ERROR : exception;
    end ELEMENTARY_FUNCTIONS_EXCEPTIONS;

    with ELEMENTARY_FUNCTIONS_EXCEPTIONS;
    generic
       type FLOAT_TYPE is digits <>;
    package GENERIC_ELEMENTARY_FUNCTIONS is

       subtype FLOAT_BASE is FLOAT_TYPE'BASE;

       function SQRT    (X           : FLOAT_BASE) return FLOAT_BASE;
       function LOG     (X           : FLOAT_BASE) return FLOAT_BASE;
       function LOG     (X, BASE     : FLOAT_BASE) return FLOAT_BASE;
       function EXP     (X           : FLOAT_BASE) return FLOAT_BASE;
       function "**"    (X, Y        : FLOAT_BASE) return FLOAT_BASE;

       function SIN     (X           : FLOAT_BASE) return FLOAT_BASE;
       function SIN     (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;
       function COS     (X           : FLOAT_BASE) return FLOAT_BASE;
       function COS     (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;
       function TAN     (X           : FLOAT_BASE) return FLOAT_BASE;
       function TAN     (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;
       function COT     (X           : FLOAT_BASE) return FLOAT_BASE;
       function COT     (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;

       function ARCSIN  (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCSIN  (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;
       function ARCCOS  (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCCOS  (X, CYCLE    : FLOAT_BASE) return FLOAT_BASE;
       function ARCTAN  (Y           : FLOAT_BASE;
                         X           : FLOAT_BASE := 1.0)
                                                   return FLOAT_BASE;
       function ARCTAN  (Y           : FLOAT_BASE;
                         X           : FLOAT_BASE := 1.0;
                         CYCLE       : FLOAT_BASE) return FLOAT_BASE;
       function ARCCOT  (X           : FLOAT_BASE;
                         Y           : FLOAT_BASE := 1.0)
                                                   return FLOAT_BASE;
       function ARCCOT  (X           : FLOAT_BASE;
                         Y           : FLOAT_BASE := 1.0;
                         CYCLE       : FLOAT_BASE) return FLOAT_BASE;

       function SINH    (X           : FLOAT_BASE) return FLOAT_BASE;
       function COSH    (X           : FLOAT_BASE) return FLOAT_BASE;
       function TANH    (X           : FLOAT_BASE) return FLOAT_BASE;
       function COTH    (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCSINH (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCCOSH (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCTANH (X           : FLOAT_BASE) return FLOAT_BASE;
       function ARCCOTH (X           : FLOAT_BASE) return FLOAT_BASE;

       ARGUMENT_ERROR : exception
                        renames ELEMENTARY_FUNCTIONS_EXCEPTIONS.ARGUMENT_ERROR;

    end GENERIC_ELEMENTARY_FUNCTIONS;

The specifications above are identical to the proposed separate ISO standard
(DIS 11430) for the elementary functions except that the formal parameters
and results of the elementary functions are of the base type of the generic
formal type, rather than the type itself.

It is intended that implementations of the GENERIC_ELEMENTARY_FUNCTIONS
conform to the various semantic requirements (regarding domains, ranges,
exception handling, accuracy, prescribed results, etc.) presented in DIS
11430 and not repeated here, except that implementations conforming to the
Numerics Annex must allow GENERIC_ELEMENTARY_FUNCTIONS to be instantiated
with a range-constrained floating-point subtype, and the body must be immune
to potential effects of the range constraint; in other words,
implementations are not allowed to impose a restriction (allowed by DIS
11430) that the generic actual type in an instantiation must be a base type.

In Ada 9X, the accuracy requirements are expressed in terms of
FLOAT_TYPE'EPSILON, since EPSILON of a subtype is now that of the base type.
In DIS 11430, the accuracy requirements are expressed in terms of
FLOAT_TYPE'BASE'EPSILON.

The ARCTAN and ARCCOT functions must exploit signed zeros, if present in the
implementation (as indicated by the value of FLOAT_TYPE'SIGNED_ZEROS). In
particular, when X is negative and Y is zero:

  - if FLOAT_TYPE'SIGNED_ZEROS is TRUE, ARCTAN(Y, X, CYCLE) and
    ARCCOT(X, Y, CYCLE) must deliver -CYCLE/2.0 when Y is a negative zero
    and +CYCLE/2.0 when Y is a positive zero;

  - if FLOAT_TYPE'SIGNED_ZEROS is FALSE, ARCTAN(Y, X, CYCLE) and
    ARCCOT(X, Y, CYCLE) deliver CYCLE/2.0.

The behavior of the versions of ARCTAN and ARCCOT without a CYCLE parameter
is similar in the above case (i.e., when X is negative and Y is zero),
except that the result is then an appropriate approximation of plus or minus
pi.

In addition, the zero delivered by SIN, ARCSIN, SINH, ARCSINH, TAN, TANH,
and ARCTANH when X is zero must have the same sign as X when
FLOAT_TYPE'SIGNED_ZEROS is TRUE; similarly, the zero delivered by ARCTAN
when X is positive and Y is zero must have the same sign as Y when
FLOAT_TYPE'SIGNED_ZEROS is TRUE. (This requirement goes beyond DIS 11430,
which did not specify the sign of the result in these cases.)

The extent of the exploitation of signed zeros is left implementation
defined in the many other contexts in which an elementary function can
return a zero result.

L.4. Primitive Functions

Implementations conforming to the Numerics Annex shall provide the following
additional attributes:

    T'EXPONENT(X)
    T'FRACTION(X)
    T'COMPOSE(FRACTION, EXPONENT)
    T'SCALE(X, EXPONENT_ADJUSTMENT)
    T'FLOOR(X)
    T'CEILING(X)
    T'ROUNDING(X)
    T'TRUNCATION(X)
    T'REMAINDER(X, Y)
    T'ADJACENT(X, TOWARDS)
    T'COPY_SIGN(VALUE, SIGN)
    T'LEADING_PART(X, RADIX_DIGITS)
    T'MIN(X, Y)
    T'MAX(X, Y)
    T'MODEL(X)
    T'MACHINE(X)

In the case of MIN and MAX, the prefix may denote any scalar type or
subtype; for the other attributes, the prefix must denote a floating-point
type or subtype.

Implementations conforming to the Numerics Annex shall also extend the attributes T'SUCC(X) and T'PRED(X) to apply when T is a floating-point type or subtype.

All of the above attributes except MIN, MAX, MODEL, and MACHINE correspond directly to functions in the GENERIC_PRIMITIVE_FUNCTIONS generic package proposed as a separate ISO standard (CD 11729) for Ada 83. The ROUNDING and TRUNCATION attributes correspond to the ROUND and TRUNCATE functions in GENERIC_PRIMITIVE_FUNCTIONS; the latter names are proposed in the Information Systems Annex for entirely different attributes (see IS:ROUNDING_CONTROL). The EXPONENT_ADJUSTMENT parameter of the SCALE attribute corresponds to the EXPONENT parameter of the SCALE function of GENERIC_PRIMITIVE_FUNCTIONS. The SUCCESSOR and PREDECESSOR functions of GENERIC_PRIMITIVE_FUNCTIONS are provided by the extension of the existing SUCC and PRED attributes. The functionality of the DECOMPOSE procedure of GENERIC_PRIMITIVE_FUNCTIONS is not provided. MIN, MAX, MODEL, and MACHINE are new (not taken from GENERIC_PRIMITIVE_FUNCTIONS).

The type of the result yielded by all of the ``primitive function'' attributes except EXPONENT is the base type of T; EXPONENT yields a result of type universal_integer. The type of actual parameters corresponding to X, Y, FRACTION, TOWARDS, VALUE, and SIGN must be the base type of T. Actual parameters corresponding to EXPONENT, EXPONENT_ADJUSTMENT, and RADIX_DIGITS may be of any integer type (i.e., the formal parameter has type universal_integer). The value of an actual parameter corresponding to RADIX_DIGITS must be positive. All the attributes preserve staticness.

These attributes deliver results that are accurate to the level of machine numbers. Like T'FIRST and T'LAST, which also must deliver fully accurate results, they are not among the predefined operations covered by the replacement for RM 4.5.7 (see L.1.5).
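The result and parameter types described above can be seen in a short sketch. It uses the attribute spellings of the later Ada 95 standard for the predefined type Float, which is an assumption relative to this draft; SCALE is left out because its spelling changed in the final standard. The procedure name is hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Primitive_Attributes is

   X : constant Float := 8.0;

   --  EXPONENT yields universal_integer, which converts implicitly
   --  to any integer type.
   E : constant Integer := Float'Exponent (X);

   --  FRACTION yields the base type of the prefix.
   F : constant Float := Float'Fraction (X);

begin
   --  COMPOSE reassembles a value from a fraction and an exponent,
   --  so decomposing and recomposing X gives X back.
   if Float'Compose (F, E) = X then
      Put_Line ("Compose (Fraction (X), Exponent (X)) = X");
   end if;

   --  The results below are floating-point values, not integers.
   Put_Line (Float'Image (Float'Floor (-2.5)));          --  value -3.0
   Put_Line (Float'Image (Float'Ceiling (-2.5)));        --  value -2.0
   Put_Line (Float'Image (Float'Truncation (-2.5)));     --  value -2.0
   Put_Line (Float'Image (Float'Copy_Sign (3.0, -1.0))); --  value -3.0
end Primitive_Attributes;
```

The comments give the mathematical values; the exact text printed by Float'Image depends on the precision of Float in the implementation.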
Note: A decision has not yet been made about whether extra accuracy can be passed in to a primitive function, and whether that implies that the extra accuracy must be maintained during the operation and must affect the result.

It is anticipated that the attributes corresponding to functions in GENERIC_PRIMITIVE_FUNCTIONS will be defined as they were there, subject to modifications when a decision is made on the role of extra precision. Their definitions are not repeated here. The definitions depend, in some cases, on the presence or absence of denormalized numbers and signed zeros, as reflected in the values of T'DENORM and T'SIGNED_ZEROS, respectively. The other attributes are defined below.

T'MACHINE(X) returns the value of X rounded or truncated to a neighboring machine number (see L.1.1) of the type T; i.e., extra precision beyond T'MACHINE_MANTISSA radix digits is discarded, and CONSTRAINT_ERROR may be raised if the value of X is sufficiently outside the range T'BASE'FIRST .. T'BASE'LAST that rounding or truncating it to the precision of the machine numbers cannot yield a result in this range (i.e., cannot yield the appropriate bound of this range).

T'MODEL(X) is defined as follows:

   - if X is a model number of the type T (see L.1.3) in the range -T'MODEL_LARGE .. T'MODEL_LARGE, X is returned;

   - if X lies between two consecutive model numbers of the type T in that range, one of those surrounding model numbers is returned; and

   - if X lies outside that range, CONSTRAINT_ERROR is raised.

T'MIN(X, Y) and T'MAX(X, Y) return the minimum and the maximum of their two arguments, respectively.
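A minimal sketch of the four new attributes follows. It assumes the spellings that the later Ada 95 standard adopted (Float'Machine, Float'Model, Float'Min, Integer'Max); the out-of-range behavior of MODEL described above belongs to this draft and may differ in a final implementation. The procedure name is hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Machine_Model_Min_Max is

   Third : constant Float := 1.0 / 3.0;

begin
   --  MACHINE yields a machine number of Float neighboring its
   --  argument; any extra precision is discarded.
   Put_Line (Float'Image (Float'Machine (Third)));

   --  MODEL yields the argument itself if it is a model number,
   --  otherwise one of the two surrounding model numbers.
   Put_Line (Float'Image (Float'Model (Third)));

   --  MIN and MAX accept any scalar prefix, not only floating point.
   Put_Line (Float'Image (Float'Min (2.0, 3.0)));    --  value 2.0
   Put_Line (Integer'Image (Integer'Max (2, 3)));    --  value 3
end Machine_Model_Min_Max;
```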
Index

Canonical form
   definition  L-1
   of denormalized floating-point machine numbers  L-1
   of floating-point model numbers  L-1
   of normalized floating-point machine numbers  L-1

DENORM (new predefined attribute)  L-1
Denormalized numbers  L-1

Elementary functions  L-3
ELEMENTARY_FUNCTIONS_EXCEPTIONS (predefined package)  L-3

Fixed point  L-2
Fixed-point
   accuracy requirements  L-3
   arithmetic model  L-2
   attributes  L-2
   attributes of model numbers eliminated  L-3
   model numbers eliminated  L-3
   type declarations  L-2
   values  L-2
Floating-point  L-1
   accuracy requirements  L-1
   arithmetic model  L-1
   attributes of machine numbers  L-1
   attributes of model numbers  L-1
   denormalized machine numbers  L-1
   machine numbers  L-1
   model numbers  L-1
   type declarations  L-2

GENERIC_ELEMENTARY_FUNCTIONS (predefined generic package)  L-3

Model numbers  L-1

Primitive functions (new predefined floating-point attributes)  L-4

Signed zeros  L-1, L-4
SIGNED_ZEROS (new predefined attribute)  L-1

Table of Contents

L. Numerics Annex (Specification)
   L.1. Semantics of Floating-Point Arithmetic
      L.1.1. Floating-Point Machine Numbers
      L.1.2. Attributes of Floating-Point Machine Numbers
      L.1.3. Floating-Point Model Numbers
      L.1.4. Attributes of Floating-Point Model Numbers
      L.1.5. Accuracy of Floating-Point Operations
      L.1.6. Floating-Point Type Declarations
   L.2. Semantics of Fixed-Point Arithmetic
      L.2.1. Values and Attributes of Fixed-Point Types
      L.2.2. Fixed-Point Type Declarations
      L.2.3. Accuracy of Fixed-Point Operations
      L.2.4. Attributes of Fixed-Point Numbers
   L.3. Elementary Functions
   L.4. Primitive Functions
Index