Ada 9X LSN-046-MRT Numerics Annex (Rationale), Vers. 4.7 June 1992 K W Dritz Argonne National Laboratory Argonne, IL 60439 E-mail: dritz@mcs.anl.gov Attached below is my revision of the Numerics Annex (rationale) following the last DRs' meeting. I sent the specification separately a little while ago. The major changes are marked in the margin with change bars. Ken =============================================================================== 1. Numerics Annex (Rationale) This section provides the rationale for language features proposed in the | Numerics Annex of the Mapping Specification. These language features | include models of floating-point and fixed-point arithmetic, a predefined | generic package of elementary functions, and attributes comprising a | collection of ``primitive'' floating-point manipulation functions. The | models of floating-point and fixed-point arithmetic provide the semantics of | operations of the real types and are placed in the Numerics Annex for | presentation purposes only; all implementations must conform to the models. | The elementary functions are currently described in the Numerics Annex but | are relevant to such a wide variety of applications that they will probably | be moved to a chapter of the core on ``Ada Standard Libraries.'' The | primitive functions were once thought to be germane only to the development | of mathematical software libraries, but the realization that they have uses | in the implementation of I/O formatting and other applications has resulted | in a desire to move them to the core. | The treatment of numerics is simplified by - retaining, for floating-point types, only one of the concepts of model numbers and safe numbers; - eliminating, for fixed-point types, both the model numbers and the | safe numbers; | - eliminating attributes that are no longer needed. The simplification provides the following direct benefits: - real types become somewhat easier to describe; - at least one common misapprehension (that the safe numbers and model numbers of a floating-point type differ only in range, i.e., that they have the same precision) loses its basis; - fixed-point types become more intuitive, with no loss of functionality. Conceptually, for floating-point types it is the model numbers that are being eliminated and the safe numbers that are being kept. However, in the process of doing so, the properties of the latter are being changed slightly, and they are being called model numbers instead of safe numbers. One may prefer to think that the model numbers have been retained (with some changes) and the safe numbers eliminated, but the surviving concept is much closer to that of Ada 83's safe numbers. The name change is motivated by the broad connection of the resulting concept to the semantics of floating-point arithmetic, in contrast to the much more limited connotations of ``safe numbers.'' If, with the advent of Ada 9X, one only talks about ``model numbers'' in the context of their definition in Ada 9X, no confusion should arise. The changes in the surviving concept provide these secondary benefits: - the model of floating-point arithmetic becomes more useful to numerical analysts because, as a descriptive tool, it reflects the properties of the underlying hardware more closely; - the ``4*B Rule'' is recast in a way that does not penalize the | properties of any predefined type; | - implementations of floating point on decimal hardware become practical; - a few anomalies are eliminated. 
In general, the changes will have little impact on implementations; in particular, currently generated floating-point code should, in the main, | remain valid. | 1.1. Semantics of Floating-Point Arithmetic Floating-point semantics in Ada 83 are tied to the concepts of model numbers and safe numbers. Effectively, the safe numbers define, for a given implementation, the accuracy required of the predefined arithmetic operators and the conditions under which overflow is and is not possible. Numerical analysts have used characteristics of the safe numbers to make claims about the actual performance of their programs in the underlying environment, and they have used attributes of the safe numbers to tailor the behavior of their programs to the numerical properties of the underlying environment. The model numbers, in contrast, can be said to represent the worst-case properties of the safe numbers over all conceivable conforming implementations, and therefore the worst acceptable numerical performance of an Ada 83 program. Numerical analysts have generally not exploited the model numbers or their attributes for any purpose, because they prefer to focus on the actual performance of a program in the underlying environment. The attributes of the safe numbers permit one to reason about that performance in a uniform, symbolic way over all implementations. Since the model numbers of Ada 83 have generally not been put to practical use, they are eliminated in Ada 9X. The concept of safe numbers as the determinant of the actual numeric quality in the underlying environment survives, but in their incarnation in Ada 9X the former Ada 83 safe numbers are called model numbers. At the same time, their definition has been modified slightly to allow them to fit more closely to the actual numeric characteristics of the environment, making them more useful for the purpose for which they were intended. In their new role, they correspond exactly to what Brown in [brown81], which is the basis of Ada's model of floating-point arithmetic, called model numbers. The changes in the floating-point model are in line with those addressed by Study Topic S11.1-B(1). 1.1.1. Floating-Point Machine Numbers Ada 83 includes a characterization of the underlying machine representation of a floating-point type, based on an interpretation of the canonical form [RM 3.5.7(4)] in which the constraints on mantissa, radix, and exponent are those dictated by certain representation attributes [RM 13.7.3(5-9)]. This amounts to the definition of a set of numbers which we are calling, in Ada 9X, the machine numbers of a floating-point type. We define the machine numbers of a type T to be those capable of being represented to full accuracy in the storage representation of T. Some machines have ``extended registers'' with a wider range and greater precision than the corresponding (or, sometimes, any) storage format. Thus, in the course of computation with a type T, values having wider range or greater precision than the machine numbers of T can be generated, as allowed by the model of floating-point arithmetic. They can also be assigned to variables of T in Ada 9X (if T is a base type), since a variable may be temporarily, or even permanently, held in a register. There is no guarantee, however, that such extended range or precision can be exploited, and consequently no attempt is made to characterize it. Note: In this connection, Ada 83 allows values of extended precision, but not extended range, to be assigned to variables. 
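To make the role of extended registers concrete, here is a minimal sketch of our own (the type and values are illustrative, assuming an implementation that provides the optional predefined type LONG_FLOAT):

   declare
      Y : LONG_FLOAT := 2.0;
      Z : LONG_FLOAT := 3.0;
      X : LONG_FLOAT;
   begin
      X := Y * Z + 1.0;
      --  On a machine with extended registers, the product Y * Z and the
      --  subsequent addition may be carried out with greater precision, and
      --  over a wider range, than the storage format of LONG_FLOAT provides.
      --  In Ada 9X the extra precision (and, because LONG_FLOAT is a base
      --  type, the extra range) may even persist in X if X is kept in a
      --  register.
   end;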
The benefits of keeping variables in extended registers are partially negated in Ada 83 by the need to perform range checks on assignment, even when the variable is of a base type (having, therefore, an implementation-defined range). These benefits are fully realizable in Ada 9X due to the fact that range checks are no longer performed on assignment to a variable of a numeric base type, on the passing of an argument to a formal parameter of such a type, and on returning a value of such a type from a function. Overflow may still be detected in such contexts, i.e., when the expression on the right-hand side of an assignment statement, in an actual parameter, or in a return statement performs an operation whose result exceeds the hardware's overflow threshold, but that is a separate semantic issue. This is discussed further in 1.1.5. Consideration was given to eliminating the characterization of machine numbers and retaining only that of the model numbers, thereby simplifying the discussion of floating-point matters even further. However, the characteristics of the machine numbers (that is, the storage representation of a floating-point type) are needed to define the meaning of certain attributes, viz. the ``primitive functions'' (see 1.4), as well as the meaning of UNCHECKED_CONVERSION when its source type is a floating-point type. In addition, occasionally it is appropriate to design a numerical algorithm so as to exploit the characteristics of the machine representation as much as possible, even though in some contexts the hardware might not allow the full benefit of such an attempt to be achieved. 1.1.2. Attributes of Floating-Point Machine Numbers The Ada 83 representation attributes of floating-point types (T'MACHINE_EMIN, T'MACHINE_EMAX, T'MACHINE_MANTISSA, and T'MACHINE_RADIX, which return values of type universal_integer, together with the Boolean-valued T'MACHINE_ROUNDS and T'MACHINE_OVERFLOWS) have been retained in Ada 9X, and two new Boolean-valued attributes (T'DENORM and T'SIGNED_ZEROS) have been defined. It has never been particularly clear whether and how the Ada 83 representation attributes accommodate denormalized numbers, if the implementation happens to have them. This situation is improved in Ada 9X in two ways. Implementations that generate and use denormalized numbers for a floating-point type T, as defined in [IEEE754], will be distinguished by having T'DENORM = TRUE; otherwise, T'DENORM = FALSE. (Besides being useful to programmers, this new attribute plays a role in the definitions of some of the primitive-function attributes.) In addition, denormalized numbers are accommodated as machine numbers by clarifying the meaning of T'MACHINE_EMIN and relaxing the requirement that the leading digit of mantissa, in the canonical form of machine numbers, always be nonzero. The clarification is that T'MACHINE_EMIN gives the smallest value of exponent (in the canonical form) for which every combination of sign, exponent, and mantissa yields a machine number, i.e., a value capable of being represented to full accuracy in the storage representation of T. This effectively means that T'MACHINE_EMIN is the exponent of the smallest normalized machine number whose negation is also a machine number (which has relevance to implementations featuring ``radix-complement'' representation) and that, in implementations for which T'DENORM is TRUE, it is also the exponent of all of the denormalized numbers.
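A worked illustration of our own (not text from the Mapping Specification) may help here. Under the canonical form as used above, with the mantissa interpreted as a fraction in the range 1/radix to 1, the smallest positive normalized machine number of a type T is

   T'MACHINE_RADIX ** (T'MACHINE_EMIN - 1)

and, when T'DENORM is TRUE, the smallest positive denormalized number is

   T'MACHINE_RADIX ** (T'MACHINE_EMIN - T'MACHINE_MANTISSA)

For a type implemented in IEEE single precision, whose T'MACHINE_EMIN is -125 under this convention, these values are 2.0 ** (-126) and 2.0 ** (-149), respectively.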
A similar clarification for T'MACHINE_EMAX means that it is the exponent of the largest machine number whose negation is also a machine number; it is not the exponent of the most negative number on radix-complement machines. An alternative clarification of T'MACHINE_EMIN and T'MACHINE_EMAX was considered, namely, that they yield the minimum and maximum values of exponent for which some combination of sign, exponent, and mantissa yields a machine number. This would have allowed denormalized numbers to be accommodated without relaxing the requirement that the leading digit of mantissa be nonzero, and it would allow us to omit an observation, which we expect to include when we write the complete definition for the primitive function T'EXPONENT(X), as currently proposed, that this function can yield a result less than T'MACHINE_EMIN or greater than T'MACHINE_EMAX. Despite the apparent desirability of this alternative, it was judged to be too much of a departure from current practice and therefore too likely to cause compatibility problems. The new attribute T'SIGNED_ZEROS is provided to indicate whether the hardware distinguishes the sign of floating-point zeros, as described by [IEEE754]. This attribute, along with the T'COPY_SIGN ``primitive function'' attribute, allows the numerical programmer to extend the treatment of signed zeros to the higher-level abstractions he or she creates, much in the manner of the elementary functions ARCTAN and ARCCOT (see 1.3). It is expected that implementations that distinguish the sign of zeros will do so in a way consistent with relevant external standards (e.g., [IEEE754]) to the extent that such standards apply to operations of Ada, and in appropriate and consistent (but implementation-defined) ways otherwise; thus, no attempt is made in Ada 9X to prescribe the sign of every possible zero result, or the behavior of every operation receiving an operand of zero. The two new attributes T'DENORM and T'SIGNED_ZEROS describe properties that an implementation may exhibit independently of any other support for IEEE arithmetic. Some implementations of Ada 83 do feature denormalized numbers and signed zeros (because they come for ``free'' with the hardware), but no other features of IEEE arithmetic. 1.1.3. Floating-Point Model Numbers The primary changes that distinguish Ada 9X model numbers from Ada 83 safe numbers are these: 1. the length of the mantissa (in the canonical form) is no longer ``quantized,'' but is as large as possible consistent with satisfaction of the accuracy requirements; 2. the radix (in the canonical form) is no longer always two, but is the same (for a type T) as T'MACHINE_RADIX; 3. the model numbers form an infinite set; 4. the maximum non-overflowing exponent is no longer bounded below by a function of the mantissa length; 5. the minimum exponent is no longer required to be the negation of the maximum non-overflowing exponent, but is given (for a type T) by an independent attribute. The Ada 83 safe numbers have mantissa lengths that are a function of the DIGITS attribute of the underlying predefined type, giving them a quantized length chosen from the list (5, 8, 11, 15, 18, 21, 25, ...). 
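The quantization can be made concrete with a small sketch of our own (the function below is hypothetical and merely reproduces the Ada 83 rule): the safe numbers of a type whose underlying predefined type has D decimal digits have a binary mantissa of B digits, where B is the smallest value for which 10**D <= 2**(B-1), i.e., B = ceiling(D * log(10)/log(2) + 1).

   --  Illustration only: yields the quantized lengths 5, 8, 11, 15, 18, 21,
   --  25, ... for D = 1, 2, 3, ...; FLOAT has ample precision for the small
   --  values of D of interest here.
   function QUANTIZED_MANTISSA (D : POSITIVE) return POSITIVE is
      TEN_TO_D : constant FLOAT := 10.0 ** D;
      B        : POSITIVE := 1;
   begin
      while TEN_TO_D > 2.0 ** (B - 1) loop
         B := B + 1;
      end loop;
      return B;
   end QUANTIZED_MANTISSA;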
Thus, on binary hardware having T'MACHINE_MANTISSA = 24, a common mantissa length for single-precision floating-point hardware types, the last three bits of the machine representation exceed the precision of the safe numbers; as a consequence, even when the machine arithmetic is fully accurate (at the machine-number level), one cannot deduce that Ada arithmetic operations deliver full machine-number accuracy. With the first change enumerated above (freeing the mantissa length from quantization), tighter accuracy claims will be provable on many machines. As an additional consequence of this change, in Ada 9X the two types declared as follows

   type T1 is digits D;
   type T2 is digits D range T1'FIRST .. T1'LAST;

will, as they intuitively should, have the same hardware representation when hardware characteristics do not require parameter penalties; in Ada 83, their hardware representations almost always differ, with T2'BASE'DIGITS > T1'BASE'DIGITS, for reasons having nothing to do with hardware considerations. The second change enumerated above (non-binary radix) has two effects:
- it permits practical implementations on decimal hardware (which, though not currently of commercial significance for mainstream computers, is permitted by IEEE Std. 854 [IEEE854]; is appealing for embedded computers in consumer electronics; and is used in at least one such application, an HP calculator);
- on hexadecimal hardware, it allows more machine numbers to be classed as model numbers (and therefore to be proven to possess special properties, such as being exactly representable, contributing no error in certain arithmetic operations, etc.).
As an example of the latter effect, note that T'LAST will become a model number on most hexadecimal machines. Also, on hexadecimal hardware, a 64-bit double-precision type having 14 hexadecimal (or 56 binary) digits in the hardware mantissa, as on many IBM machines, has safe numbers with a mantissa length of 51 binary bits in Ada 83, and thus no machine number of this type with more than 51 bits of significance is a safe number; in Ada 9X, such a type would have a mantissa length of 14 hexadecimal digits, with the consequence that every machine number with 53 bits of significance is now a model number, as are some with even more. (Why does the type under discussion not have Ada 83 safe numbers with 55 bits in the mantissa, the next possible quantized length and a length that is less than that of the machine mantissa? Because some machine numbers with 54 or 55 bits of significance do not yield exact results when divided by two and cannot therefore be safe numbers. This is a consequence of their hexadecimal normalization, and it gives rise to the phenomenon known as ``wobbling precision.'') The third change enumerated above (extending the model numbers to an infinite set) is intended to fill a gap in Ada 83 wherein the results of arithmetic operations are not formally defined when they exceed the modeled overflow threshold but an exception is not raised.
Some of the reasons why this can happen are as follows: - the quantization of mantissa lengths may force the modeled overflow threshold to be lower than the actual hardware threshold; - arithmetic anomalies of one operation may require the attributes of model and safe numbers to be conservative, with the result that other operations exceed the minimum guaranteed performance; - the provision and use of extended registers in some machines moves the overflow threshold of the registers used to hold arithmetic results well away from that of the storage representation; - the positive and negative actual overflow thresholds may be different, as on radix-complement machines. The extension of the model numbers to an infinite range fills a similar gap in Ada 83 wherein no result is formally defined for an operation receiving an operand exceeding the modeled overflow threshold, when an exception was not raised during its prior computation. The change means, of course, that one can no longer say that the model numbers of a type are a subset of the machine numbers of the type; one may say instead that the model numbers of a type T in the range -T'MODEL_LARGE .. T'MODEL_LARGE are a subset of the machine numbers of T. The fourth change enumerated above (freeing the maximum exponent from dependence on the mantissa length) is equivalent to the dropping of the ``4*B Rule'' as it applies to the predefined types; a version of the rule | still affects the implementation's implicit selection of an underlying | representation for a user-declared floating-point type lacking a range | constraint, providing in that case a guaranteed range tied to the requested | precision. The change in the application of the 4*B Rule allows all | hardware representations to be accommodated as predefined types with | attributes that accurately characterize their properties. Such types are | available for implicit selection by the implementation when their properties | are compatible with the precision and range requested by the user, but they | remain unavailable for implicit selection in exactly those situations in | which, in the absence of an explicit range constraint, the 4*B Rule of Ada | 83 acted to preclude their selection. Compatibility considerations related | to the 4*B Rule are further discussed in 1.1.7. | | The 4*B Rule was necessary in Ada 83 in order to define the model numbers of | a type entirely as a function of a single parameter (the requested | precision). By its nature, the rule potentially precludes the | implementation of Ada in some (hypothetical) environments, as if to say that | such environments are not suitable for the language or applications written | in it; in other (actual) environments, it artificially penalizes the | reported properties of some hardware types so strongly that they have only | marginal utility as predefined types available for implicit selection and | may end up being ignored by the vendor. Such matters are best left to the | judgment of the marketplace and not dictated by the language. The | particular minimum range required in Ada 83 (as a function of precision) is | furthermore about twice that deemed minimally necessary for numeric | applications [brown81]. | | Among current implementations of Ada, the only predefined types whose | characteristics are affected by the relaxation of the 4*B Rule are DEC VAX | D-format and IBM Extended Precision, both of which have a narrow exponent | range in relation to their precision. 
In the case of VAX D-format, even | though the hardware type provides the equivalent of 16 decimal digits of | precision, its narrow exponent range requires that 'DIGITS for this type be | severely penalized and reported as 9 in Ada 83; 'MANTISSA is similarly | penalized and reported as 31, and the other model attributes follow suit. | In Ada 9X, in contrast, this predefined type would have a 'DIGITS of 16, a | 'MODEL_MANTISSA of 56, and other model attributes accurately reflecting the | type's actual properties. A user-declared floating-point type requesting | more than 9 digits of precision does not select D-format as the underlying | representation in Ada 83, but instead selects H-format; in Ada 9X, it still | cannot select D-format if it lacks a range constraint (because of the analog | of the 4*B Rule that has been built into the equivalence rule), but it can | select D-format if it includes an explicit range constraint with | sufficiently small bounds. The compatibility issues associated with these | changes are discussed in 1.1.7. | | The IBM Extended Precision hardware type has an actual decimal precision of | 32, but the 4*B Rule requires its 'DIGITS to be severely penalized and | reported as 18, only three more than that of the double-precision type. | Supporting this type allows an Ada 83 implementation to increase | SYSTEM.MAX_DIGITS from 15 to 18, a marginal gain; perhaps this is the reason | why it is rarely supported (it is supported by Alsys but not by the other | vendors that have implementations for IBM System/370s). In Ada 9X, on the | other hand, such an implementation can support Extended Precision with a | 'DIGITS of 32, though SYSTEM.MAX_DIGITS must still be 18. Although a | floating-point type declaration lacking a range constraint cannot request | more than 18 digits, those including an explicit range constraint with | sufficiently small bounds can do so and can thereby select Extended | Precision. | The fifth change enumerated above (separate attribute for the minimum exponent) removes another compromise made necessary by the desire, in Ada 83, to define the model numbers of a type in terms of a single parameter. The minimum exponent of the model or safe numbers of a type in Ada 83 is required to be the negation of the maximum exponent (thereby tying it to the precision implicitly). One consequence of this is that the maximum exponent may need to be reduced simply to avoid having the smallest positive safe number lie inside the implementation's actual underflow threshold; if it is needed, such a reduction provides another way to obtain values in excess of the modeled overflow threshold without raising an exception. Another is that the smallest positive safe number may have a value unnecessarily greater than the actual underflow threshold. With the fifth change, as with some of the others, more of the machine numbers will be recognized as numbers having special properties, i.e., as model numbers. Consideration was given to eliminating the model numbers and retaining only the machine numbers. While this would simplify the semantics of floating-point arithmetic further, it would not eliminate the interval orientation of the accuracy requirements (see L.1.5) if variations in rounding mode from one implementation to another, and the use of extended registers, are to be tolerated.
It would simply substitute the machine numbers and intervals of machine numbers for the model numbers and intervals of model numbers in those requirements, but their qualitative form would remain the same. However, rephrasing the accuracy requirements in terms of machine numbers and intervals thereof cannot be realistically considered, since many platforms on which Ada has been implemented and might be implemented in the future could not conform to such stringent requirements. If an implementation has appropriate characteristics, its model numbers up to the modeled overflow threshold will in fact coincide with its machine numbers, and an analysis of a program's behavior in terms of the model numbers will not only have the same qualitative form as it would have if the accuracy requirements were expressed in terms of machine numbers, but it will have the same quantitative implications as well. On the other hand, if an implementation lacks guard digits, employs radix-complement representation, or has genuine anomalies, its model numbers up to the modeled overflow threshold will be a subset of its machine numbers having less precision, a narrower exponent range, or both, and accuracy requirements expressed in the same qualitative form, albeit in terms of the machine numbers, would be unsatisfiable. 1.1.4. Attributes of Floating-Point Model Numbers Although some of the attributes of model numbers in Ada 9X are closely related to those of the safe numbers in Ada 83, they all bear new names of the form T'MODEL_xxx. Certainly this is necessary for Ada 83's T'MANTISSA; the new version, T'MODEL_MANTISSA, is conceptually equivalent to T'BASE'MANTISSA in Ada 83 but is now interpreted as the number of radix-digits in the mantissa. Thus, at a minimum the value of this attribute will be roughly quartered on hexadecimal machines, even if there is no reason to take advantage of the other freedoms now permitted. A new name is certainly also necessary for T'SAFE_EMAX; the new version, T'MODEL_EMAX, is now interpreted as a power of the hardware radix, and not necessarily as a power of two. For hexadecimal machines, the value of this attribute will be quartered, all other things being equal. T'MODEL_EMIN is a new attribute. T'MODEL_LARGE is conceptually equivalent to Ada 83's T'SAFE_LARGE. It is defined in terms of more fundamental attributes, as was true of T'LARGE in Ada 83, with the result that the changes in the radix of the model numbers ``cancel out'' in the definition of this attribute; its value will change little, if at all, and then only to reflect the unquantization of mantissa lengths of model numbers. The same can be said about T'MODEL_SMALL, which is conceptually equivalent to Ada 83's T'SAFE_SMALL, and about T'MODEL_EPSILON, which is conceptually equivalent to Ada 83's T'BASE'EPSILON. The values of these attributes will be determined by how well the implementation can satisfy the accuracy requirements, with the primary determinant being the quality of the hardware's arithmetic. On ``clean'' machines, for which the model numbers up to the modeled overflow threshold coincide with the machine numbers, T'MODEL_MANTISSA, T'MODEL_EMAX, and T'MODEL_EMIN will yield the same values as T'MACHINE_MANTISSA, T'MACHINE_EMAX, and T'MACHINE_EMIN, respectively, though in general T'MODEL_MANTISSA and T'MODEL_EMAX may be smaller than their machine counterparts, and T'MODEL_EMIN may be larger. 
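A small sketch of our own makes the relationship concrete. The program below is illustrative only: MY_FLOAT is a hypothetical type, and the T'MODEL_xxx attributes are those proposed in this annex (the T'MACHINE_xxx attributes are the Ada 83 representation attributes). On a ``clean'' machine, each pair of lines would report the same value.

   with TEXT_IO; use TEXT_IO;
   procedure COMPARE_ATTRIBUTES is
      type MY_FLOAT is digits 6;
   begin
      PUT_LINE ("MACHINE_MANTISSA =" & INTEGER'IMAGE (MY_FLOAT'MACHINE_MANTISSA));
      PUT_LINE ("MODEL_MANTISSA   =" & INTEGER'IMAGE (MY_FLOAT'MODEL_MANTISSA));
      PUT_LINE ("MACHINE_EMIN     =" & INTEGER'IMAGE (MY_FLOAT'MACHINE_EMIN));
      PUT_LINE ("MODEL_EMIN       =" & INTEGER'IMAGE (MY_FLOAT'MODEL_EMIN));
      PUT_LINE ("MACHINE_EMAX     =" & INTEGER'IMAGE (MY_FLOAT'MACHINE_EMAX));
      PUT_LINE ("MODEL_EMAX       =" & INTEGER'IMAGE (MY_FLOAT'MODEL_EMAX));
   end COMPARE_ATTRIBUTES;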
It is illuminating to contrast the processes by which the values of the | model attributes of the predefined types are determined in Ada 83 and Ada | 9X. For this purpose, we restate the process for Ada 9X first, then we | present the similar Ada 83 process in an unconventional but comparable form. | | For a predefined type P in Ada 9X, the process is as follows: | | - Determine simultaneously the minimum and maximum exponents (EMIN | and EMAX) and the maximum mantissa length (MMAX) for which the | accuracy requirements, expressed in terms of the resulting set of | model numbers, are satisfied. EMIN may be as small as | P'MACHINE_EMIN, but hardware anomalies in the nature of premature | underflow may cause it to be larger. EMAX may be as large as | P'MACHINE_EMAX, but hardware anomalies in the nature of premature | overflow may cause it to be smaller. MMAX may be as large as | P'MACHINE_MANTISSA, but lack of guard digits, or hardware | anomalies in the nature of inaccurate arithmetic, may cause it to | be smaller. | | - Set P'MODEL_EMIN = EMIN. | | - Set P'MODEL_EMAX = EMAX. | | - Let DMAX be the maximum value of D for which ceiling(D * | log(10)/log(P'MACHINE_RADIX) + 1) <= MMAX. | | - Set P'DIGITS = DMAX. | | - Set P'MODEL_MANTISSA = MMAX. | | - Set P'MODEL_EPSILON = P'MACHINE_RADIX ** (1 - P'MODEL_MANTISSA). | | - Set P'MODEL_SMALL = P'MACHINE_RADIX ** (P'MODEL_EMIN - 1). | | - Set P'MODEL_LARGE = P'MACHINE_RADIX ** P'MODEL_EMAX * (1.0 - | P'MACHINE_RADIX ** (-P'MODEL_MANTISSA)). | | In comparable terms, the same process for Ada 83 may be stated as follows: | | - Determine simultaneously the minimum and maximum binary-equivalent | exponents (EMIN and EMAX) and the maximum binary mantissa length | (MMAX) for which the accuracy requirements, expressed in terms of | the resulting set of model numbers, are satisfied. EMIN may be as | small as P'MACHINE_EMIN * log(P'MACHINE_RADIX)/log(2), but | hardware anomalies in the nature of premature underflow may cause | it to be larger. EMAX may be as large as P'MACHINE_EMAX * | log(P'MACHINE_RADIX)/log(2), but hardware anomalies in the nature | of premature overflow may cause it to be smaller. MMAX may be as | large as (P'MACHINE_MANTISSA - 1) * log(P'MACHINE_RADIX)/log(2) + | 1, but lack of guard digits, or hardware anomalies in the nature | of inaccurate arithmetic, may cause it to be smaller. | | - Set P'SAFE_EMAX = min(EMAX, -EMIN). | | - For each D, define a corresponding value of B as follows: B = | ceiling(D * log(10)/log(P'MACHINE_RADIX) + 1). Let DMAX be the | maximum value of D for which the corresponding value of B <= MMAX | and for which 4*B <= P'SAFE_EMAX. Call the corresponding value of | B BMAX. | | - Set P'DIGITS = DMAX. | | - Set P'MANTISSA = BMAX. | | - Set P'EMAX = 4 * P'MANTISSA. | | - Set P'EPSILON = 2.0 ** (1 - P'MANTISSA). | | - Set P'SMALL = 2.0 ** (-P'EMAX - 1). | | - Set P'LARGE = 2.0 ** P'EMAX * (1.0 - 2.0 ** (-P'MANTISSA)). | | - Set P'SAFE_SMALL = 2.0 ** (-P'SAFE_EMAX - 1). | | - Set P'SAFE_LARGE = 2.0 ** P'SAFE_EMAX * (1.0 - 2.0 ** | (-P'MANTISSA)). | | Similar strictly comparable Ada 83 and Ada 9X statements of the equivalence | rule, by which an implementation implicitly selects a predefined type to | represent a user-declared floating-point type, are given in 1.1.6. 
By | examining those in conjunction with the attribute determination rules just | given, one can see readily that the 4*B Rule of Ada 83 is entirely | encapsulated in the attribute determination rules, while its analog in Ada | 9X is entirely encapsulated in the equivalence rule. | The set of attributes having both T'MODEL_xxx and T'MACHINE_xxx versions could conceivably be enlarged to add further intuitive strength and uniformity to the naming convention, but we have resisted adding attributes which meet no identifiable need. The attributes whose Ada 83 counterparts returned results of the type universal_integer continue to do so; these are T'MODEL_MANTISSA and T'MODEL_EMAX. The new attribute T'MODEL_EMIN also yields a value of this type. The attributes whose Ada 83 counterparts returned results of the type universal_real still do; these are T'MODEL_LARGE, T'MODEL_SMALL, and T'MODEL_EPSILON. Although there is no particular reason why T'MODEL_LARGE and T'MODEL_SMALL cannot return values of the base type of T, neither is there a compelling reason to make what would be a gratuitous change. It is our plan to establish a catalog of the fundamental model parameters for all known implementations of Ada at some future time. The renaming of the model attributes is intended to avoid one set of | compatibility problems, wherein programs remain valid but change their | effect as the result of changes in the values of attributes, but of course | it introduces another: such programs become invalid. To avoid this, | implementations are encouraged to continue to provide the obsolescent | attributes, in fact with their Ada 83 values, as is discussed more fully in | 1.1.7. | 1.1.5. Accuracy of Floating-Point Operations The accuracy requirements for certain predefined operations (arithmetic operators, relational operators, and the basic operation of conversion, all of which are referred to in this section simply as ``predefined arithmetic operations'') of floating-point types other than root_real are still expressed in terms of model intervals, for reasons explained earlier. It is clarified that they do not apply to all such operations, however. For example, they do not apply to any attribute that yields a result of a specific floating-point type; such an attribute yields a machine number, which must be exact. The accuracy requirements for exponentiation are relaxed in accord with AI-00868. The weaker rules no longer require that exponentiation be implemented as repeated multiplication; special cases can be recognized and implemented more efficiently by, for example, repeated squaring, even when accuracy is sacrificed by doing so. The implementation model for exponentiation by a negative exponent continues to be exponentiation by its absolute value, followed by reciprocation. Thus, the rule continues to allow for the possibility of overflow in this case, despite the counterintuitive nature of such an overflow. The WG9 Numerics Rapporteur Group recommended this treatment, so as to allow for the most efficient implementations, recognizing, of course, that the user who is concerned with the possibility of overflow can express the desired computation differently and thereby avoid it. AI-00868 did not address the possibility of overflow in the intermediate results, for negative exponents. One of the goals for Ada 9X is to allow for and legitimize the typical kinds of optimizations that increase execution efficiency or numerical quality. 
One of these is the use, for the results of the predefined arithmetic operations of a type, of hardware representations having higher precision or greater range than those of the storage representation of the type. On some machines, this is not an option; arithmetic is performed in ``extended registers,'' there being no registers having exactly the precision or range of the storage cells used for variables of the type. Thus, we must allow the results of arithmetic operations to exceed the precision and range of the underlying type; avoiding that is intolerably expensive on some machines. A second common optimization is the retention of a variable's value in a register after its assignment, with subsequent references to the variable being fulfilled by using the register. This avoids load operations (i.e., the cost of memory references); it may, in many cases, even avoid the store into the storage location for the variable that would normally be generated for the assignment operation. One implication of legitimizing the use of extended registers is the need to define the result of an operation that could overflow but doesn't, as well as the result of a subsequent arithmetic operation that uses such a value. This is the motivation for the extension of the model numbers to an infinite range and for the rewrite of RM 4.5.7(7). The new rules describe behavior that is consistent with the assumption that an operation of a type T that successfully delivers or uses results beyond the modeled overflow threshold of the type T is actually performed by an operation corresponding to a type with higher precision and/or wider range than that of the type T, whose overflow threshold is not exceeded, and whose accuracy is no worse than that of the original operation of the type T. Ada 83 does not permit a value outside the range T'FIRST .. T'LAST to be propagated beyond the point where it is assigned to a variable of a type T, or is passed as an actual parameter to a formal parameter of type T, or is returned from a function of type T; a range check is required in these contexts to prevent it. Nothing prevents the carrying of excess precision beyond such a point, however. Thus, keeping a value in an extended register beyond such a point is permitted in Ada 83, whether or not it is also stored, provided that the range check is satisfied. The range check may be performed by a pair of comparisons of the source value to the bounds of the range, when those bounds are arbitrary; but in the case of a floating-point base type, the check may be a free by-product of the store. For example, on hardware conforming to IEEE arithmetic [IEEE754], storing an extended register into a shorter storage format will signal an overflow if the source value exceeds the range of the destination format. If the propagation has no need for an actual store, because the value is to be propagated in the register, then a store into a throw-away temporary, just to see if overflow occurs, may be the cheapest way to perform the range check. If the check succeeds, all subsequent uses of the value in the extended register are valid and safe, including any potential need to store it into storage, such as when the value is about to be passed as an actual parameter and the implementation prefers to pass parameters in storage, or even merely because of the need for register spilling at an arbitrary place not connected with a use of the entity currently in the register. 
The loss of precision that occurs at that point does not matter, because it is consistent with the perturbations allowed when the value, had it not been shortened, is subsequently used as the operand of an operation. For an assignment to a variable of a numeric base type, actual code to perform the range check is not always needed; it can be omitted if the implementation can deduce that the check must succeed. The generation of code to perform a range check is necessary only when extended registers are being used, and then only when the source expression is other than just a primary, that is, contains at the top level a predefined arithmetic operation that can give rise to a value outside the range. As an example, consider

   X := Y * Z;

in which X, Y, and Z are assumed to be of some floating-point base type T. If there are no parameter penalties, T'MODEL_LARGE = T'LAST = -T'FIRST. If extended registers are not being used, then the multiplication cannot generate a value outside the range T'FIRST .. T'LAST (since the attempt to do so would overflow) and the range check can therefore be omitted. On the other hand, if extended registers are being used, a value exceeding T'MODEL_LARGE can be produced in the register, because the multiplication may no longer overflow, and a range check will be needed to preclude the propagation of a value outside T'FIRST .. T'LAST. When the source expression is simply the value of a variable, a formal parameter, or the result of a function call, as in

   X := Y;

no actual range check is necessary, since the value (of Y, in the example) can be presumed to have passed an earlier range check in the first propagation away from the point where it was generated. When the source expression does contain a predefined arithmetic operation at the top level, the formal definition places immediately before the range check on assignment an overflow check (on the multiplication, in the first example above) that is at least as stringent (it is more stringent if there are parameter penalties causing T'MODEL_LARGE to be less than T'LAST). Because it is at least as stringent as the range check, the overflow check ought to subsume the range check, but in practice it does not since the actual overflow threshold, when extended registers are used, is even higher. It is unfortunate that the availability and use of extended registers sometimes require extra code to be generated for assignments in Ada 83. We discuss next a possible way to improve on this situation in Ada 9X. It is not what we have actually done, but it motivates our actual approach, which is described afterwards. We could proceed by making the range check optional at any propagation point (assignment statement, parameter passing, return statement) when the target type is a numeric base type. This would allow an out-of-range value to be propagated when no actual store is needed, and it would also permit an exception to be raised for an out-of-range value when the implementation does require that the propagation be performed by an actual store (for example, some implementations might never pass by-copy parameters or function results in registers). In general, it would permit an out-of-range value to survive in an extended register through an arbitrary number of propagations (in particular, those that don't require stores), only to give rise to an exception when a propagation point requiring a store is reached.
We would also have to clarify that passing an actual parameter to an attribute that is a function, and returning a value from such an attribute, are considered propagations, since the attribute may be implemented in the same way as a function and may require its arguments or result to be passed in storage. Thus, a range check would optionally be performed at those places when the parameter or result is of a numeric base type and the source value can be out of range. Finally, we would also have to clarify that presenting an operand to a predefined arithmetic operation, and that returning a result from a predefined arithmetic operation, are also considered propagations, since some implementations may implement some such operations in the same way as function calls, requiring operands and results to be passed in storage. Thus, a range check would optionally be performed at those places, too, when the type of an operand or that of the operation's result is a numeric base type. Actually, this goes a bit too far: there is no need for the range check on the result of a predefined operation, since the more stringent overflow check already there subsumes it and accounts for any necessary raising of CONSTRAINT_ERROR at that point. That approach comes close to accomplishing what we need. The only problem with it is that it leaves untouched many contexts in which an out-of-range value in an extended register could be used without an opportunity for raising CONSTRAINT_ERROR, as might be required by the particular context. For example, simple variables used as the bounds of ranges, as discriminants, as subscripts, as specifiers of various sorts in declarations, as generic actual parameters, as case expressions, in delay statements, as choices in numerous constructs, and undoubtedly in other contexts would not be subject to a range check, because these are not propagation contexts. Implementations would have to be prepared to deal in these contexts with values having the range implied by the extended registers that are available, rather than the range implied by the base type associated with the context at hand. What we have actually done, instead of including the optional range check in the semantics of propagation, is to include it in the semantics of read-references for certain categories of primary, specifically name and function_call. (This does not appear in the Mapping Specification for the Numerics Area, except in the form of a Note, since it is assumed to be a feature of the core.) The range check in the three main propagation contexts of Ada 83 (assignment statement, by-copy parameter passing, and return statement) is entirely eliminated when the target type is a numeric base type. We shall now show that even in its absence there is an opportunity to raise CONSTRAINT_ERROR in those propagation contexts, when the target type is a numeric base type and the source value exceeds the type's range, and in all other contexts in which such a value might be used. Indeed, the propagation contexts are just a subset of the general contexts, so they need not be considered separately. Every value used in a read (fetch) context, including those in propagation contexts, is denoted by the category expression or one of its descendants. 
Those expressions, or constituents of expressions, involving predefined arithmetic operations (including the implicit conversions inherent in references to numeric literals) already provide an opportunity to raise CONSTRAINT_ERROR when they yield a value of a numeric base type that is outside the range of the type, because the operations perform an overflow check on the result that, being more stringent than the desired range check, subsumes it. The opportunity to raise a CONSTRAINT_ERROR for a parenthesized expression or a qualified expression, as an expression or a component thereof, is provided by the evaluation of the expression that is its immediate component. This leaves only names and function calls. Therefore, all the necessary opportunities for raising the desired CONSTRAINT_ERROR are covered by adding an optional range check to the semantics of read-references for names and function calls (i.e., after the return), when the type denoted by the name or function call is a numeric base type. Note that only names denoting non-static objects are affected, since the evaluation of static expressions is both exact and not limited as to range. We stress that all of the new range checks we are introducing are optional; that is, either the check is optionally performed and raises CONSTRAINT_ERROR if it fails, or it is always performed and optionally raises CONSTRAINT_ERROR if it fails. Thus, the checks will not require any code to be generated unless an actual shortening (storing of an extended register) does need to occur at one of these places. Furthermore, as we indicated earlier, even then the range check may come for free (as on IEEE hardware). A simple example will illustrate what can be gained. Consider this typical inner product: SUM := 0.0; for I in A'RANGE loop SUM := SUM + A(I) * B(I); end loop; F(SUM); Assume that SUM is a variable of a numeric base type. We would like to keep SUM in an extended register during the loop, and in fact not even store the register into SUM during the loop. In Ada 83, we are formally obligated to perform a range check upon the assignment inside the loop, to prevent the propagation of a value outside SUM's range; in IEEE systems, the cheapest way to do this would be by storing into SUM after all, or into a throw-away temporary. Thus, a store (or some other means of checking) is executed each time through the loop. In Ada 9X, on the other hand, no range check is performed on that assignment, allowing an out-of-range value to be propagated, and justifying the complete omission of stores of the extended register containing SUM within the loop. The CONSTRAINT_ERROR that Ada 83 would have raised on some assignment in the loop might instead occur during the passing of SUM to F. It is allowed by the new optional range check in the semantics of the variable reference inherent in the parameter association, and whether or not it occurs there depends on whether parameters are passed in storage or in registers and, in the former case, whether the value of SUM is out of range at that point. With this change, it is true that exceptions can occur in places where they did not occur in Ada 83. However, whenever this happens, one can point to a different place in the program where the exception would have occurred, earlier, in its interpretation according to Ada 83 semantics. 
It may also be that the exception never occurs in the Ada 9X interpretation; in the example above, it may be that SUM remains forever in an extended register and is never stored, or it may be that its value has been brought back within range by the time it is stored. We should probably have noted much earlier that this treatment of numeric base types applies to all of them, not just floating-point base types. It allows integer and fixed-point values to be held in, for example, 32-bit general registers, in which integer arithmetic is performed, even when the storage format of the base types involved has only 16 or 8 bits. Also, although omitting certain range checks appears to conflict with the safety goals of range checking, it must be remembered that the bounds of numeric base types are implementation dependent anyway, so that whether a particular source value can be assigned to a target of a numeric base type already depends (i.e., in Ada 83) on properties of the implementation. Furthermore, all declared integer and fixed-point types necessarily involve range constraints and will therefore be subject to range checking; only floating-point types declared without a range constraint will escape it. Of course, all predefined numeric types are base types and will escape range checking (which is consistent with their implementation-dependent ranges). Other than by using floating-point types declared without a range constraint, or by using predefined types, or by going out of one's way to use T'BASE as a type mark, one will not escape range checking. As was explained above, even Ada 83 allowed and explained the loss of precision that can occur in shortening, when the propagation of a value held in an extended register requires it. Actually, there is one exception to this: if shortening is allowed to take place on a value being passed to an instantiation of UNCHECKED_CONVERSION (and it certainly seems that shortening is expected in that context), then nothing in the definition of UNCHECKED_CONVERSION, or anywhere else, currently allows or explains the shortening, in regard to the contrast between the possibly extra-precise value going in and the presumably shortened value coming out. In Ada 9X, the primitive functions (see 1.4) introduce several additional contexts in which shortening can occur and yet the accompanying loss of precision is potentially unexplained. We need to introduce some rules that explain the possibility of loss of accuracy in those contexts where it is not currently explained. (The primitive functions, being attributes, are not operations to which 4.5.7 applies.) It seems likely that these rules will be specific to the contexts involved, though we have not yet resolved how best to accomplish that. The accuracy requirements for floating-point arithmetic operations are, for the time being, expressed separately from those for fixed-point operations, since the latter do not need the full generality of the interval-based model appropriate for floating-point operations (see 1.2). Nevertheless, the rules may ultimately be recombined into a uniform set of rules for all real types, purely for presentation purposes; if so, it would be made clear that some of the freedoms permitted in the floating-point case do not apply in the fixed-point case. 1.1.6.
Floating-Point Type Declarations The restatement of the ``equivalence rule'' for user-declared floating-point type declarations, which explains how an implementation selects a predefined floating-point type on which to base the representation of the declared type, is a natural extension of its form in Ada 83 that accommodates the changes described earlier. This rule is the basis for the intuitive (and informal) observation that floating-point types provide for approximate computations with real numbers so as to guarantee that the relative error of an operation that yields a result of type T is bounded by 10.0 ** (-T'DIGITS). [Note: A more formally complete version of this observation can be obtained from a theorem of [brown81].] The Ada 9X analog of Ada 83's 4*B Rule exerts its effect during the | application of the ``equivalence rule,'' by which an implementation | implicitly selects a predefined type to represent a user-declared | floating-point type. Consider the declaration | | type T is digits D [range L .. R]; | | Restated (see L.1.6), the Ada 9X equivalence rule says that this is | equivalent to | | type floating_point_type is new P; | subtype T is floating_point_type | [range floating_point_type(L) .. floating_point_type(R)]; | | where floating_point_type is an anonymous type, and where P is a predefined | floating-point type implicitly selected by the implementation so that it | satisfies the following requirements: | | - P'DIGITS >= D. | | - If a range L .. R is specified, then P'MODEL_LARGE >= max(abs(L), | abs(R)); otherwise, P'MODEL_LARGE >= 10.0 ** (4*D). | | The effect of the analog of Ada 83's 4*B Rule, known in Ada 9X as the 4*D | Rule, is to ensure that a user-declared type without a range constraint | provides adequate range; in fact, in all existing implementations of Ada, it | precludes a predefined type from being selected if and only if the type | would be precluded from selection by Ada 83's 4*B Rule. To see this, note | that Ada 83's equivalence rule can be stated (unconventionally) in the same | form, except that the conditions that the predefined type P must satisfy are | as follows: | | - P'DIGITS >= D. | | - If a range L .. R is specified, then P'SAFE_LARGE >= max(abs(L), | abs(R)). | | When the 4*D Rule precludes the selection of a type P in Ada 9X, it is | necessarily the case that the value of P'DIGITS is penalized in Ada 83 by | the 4*B Rule, and thus it is the first of the two conditions above that | precludes the selection of P in Ada 83. The value of P'DIGITS is not | penalized in Ada 9X. | | When a type declaration includes a range constraint whose bounds are | sufficiently small, the Ada 9X equivalence rule potentially permits the | selection of a predefined type precluded (e.g., by the first condition) in | Ada 83. However, among current implementations this occurs in only one | instance (see 1.1.7). The equivalence rule has been formulated in this way | in Ada 9X to emphasize the role of the range constraint in expressing the | minimum range needed for computations with the type. An alternative was | considered, in which the conditions for the selection of a predefined type P | can be stated as follows: | | - P'DIGITS >= D. | | - P'MODEL_LARGE >= 10.0 ** (4*D). | | - If a range L .. R is specified, then in addition P'MODEL_LARGE >= | max(abs(L), abs(R)). 
| | This alternative would result in complete equivalence between the Ada 9X | selections and those of Ada 83 for all user-declared floating-point types, | while still permitting hardware types that satisfy Ada 83's 4*B Rule only | with a precision penalty to be supported without penalty, but such types | could not always be selected when they provide adequate precision and range, | relative to the precision and range requested. The alternative was rejected | because it too severely restricts the utility of hardware types that can be | supported as unpenalized predefined types in Ada 9X. | | The change in the interpretation of the named number SYSTEM.MAX_DIGITS is | necessitated by the shift from the 4*B Rule to the new 4*D rule. This | attribute is typically used to declare an unconstrained floating-point type | with maximum precision. The change in its interpretation ensures that such | a use will have the same effect in Ada 9X, i.e., will result in the | selection of the same underlying representation as in Ada 83. | One way in which our changes do not go quite as far as possible in reflecting the actual properties of the machine has to do with the use of T'MODEL_LARGE in describing when overflow can occur and when it cannot. On radix-complement machines, the negative overflow threshold does not coincide in magnitude with the positive overflow threshold, but this is not reflected in T'MODEL_LARGE, which is conservative (that is, it characterizes the less extreme threshold). While this is not of any particular consequence as far as the rewrite of 4.5.7 goes (after all, there are many reasons why a value exceeding T'MODEL_LARGE in magnitude might not overflow), it does interact in an undesirable way with the equivalence rule for floating-point type declarations. If a floating-point type declaration specifies a lower bound exactly coinciding with the most negative floating-point number of some base type P, as can happen when P'FIRST is used as the lower bound of the requested range, then P will be ineligible as the representation of the type being declared (on radix-complement machines, and even when no other arithmetic anomalies are present). This suggests that T'MODEL_LARGE ought to be abandoned in favor of two attributes, say T'MODEL_FIRST and T'MODEL_LAST, that characterize the positive and negative ``safe'' (i.e., overflow-free) limits separately. The way that T'MODEL_EMAX is defined would have to change; presumably it could be the maximum of the exponents of T'MODEL_FIRST and T'MODEL_LAST in the canonical form. A T'MODEL_FIRST and T'MODEL_LAST could then be defined for all numeric types (for integer and fixed-point types they would be equal to T'BASE'FIRST and T'BASE'LAST, respectively), and they could be used in a type-independent statement of when overflow can and cannot occur (allowing its removal from 4.5.7). This is an attractive idea and may be explored in the future. 1.1.7. Compatibility Considerations | | In this section we analyze the impact of the potential sources of | incompatibility resulting from the changes in the model of floating-point | arithmetic. We argue that actual incompatibilities will arise rarely in | practice, and that strategies for minimizing their effect are available. | Actual incompatibilities have been reduced since the previous version of the | Mapping Specification by the inclusion of the 4*D Rule (see 1.1.6). 
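To make the effect of the 4*D Rule concrete, consider the following sketch of our own, patterned on the VAX D-format discussion in 1.1.3; the type names and range bounds are hypothetical.

   type T_NO_RANGE   is digits 16;
   --  The 4*D Rule requires the selected predefined type P to satisfy
   --  P'MODEL_LARGE >= 10.0 ** 64, so on a VAX this declaration cannot select
   --  D-format; as in Ada 83, it selects H-format.

   type T_WITH_RANGE is digits 16 range -1.0E30 .. 1.0E30;
   --  The explicit range takes the place of the 4*D requirement, so the
   --  narrower D-format (whose 'DIGITS is 16 in Ada 9X) becomes eligible,
   --  provided the implementation makes it available as a predefined type.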
| | The explicit use of model attributes of a floating-point type is rather rare | and usually restricted to expertly crafted numeric applications. Thus, the | elimination of some of the model attributes, in favor of new attributes with | somewhat different definitions and new names, will probably not be noticed | by the vast majority of existing Ada programs. We have already recommended | (see NM:FLOADMODATTR and 1.1.4) that vendors continue to support the | obsolescent attributes as implementation-defined attributes, with their Ada | 83 definitions, for the purpose of providing a smooth transition for those | that are affected. Detected use of such attributes should evoke a warning | message from the compiler, recommending that the references to obsolescent | attributes be replaced by appropriate references to the new attributes, or | by other appropriate expressions, when convenient. In most cases, the | substitution is expected to be straightforward, but some analysis will be | required to ascertain this. At least, by continuing to provide the | obsolescent attributes as implementation-defined attributes, a vendor can | provide continuity for programs affected by this change. | | A user-declared floating-point type declaration specifying an explicit range | whose bounds are small in relation to the requested precision may select an | underlying representation that, while providing the requested range, | nevertheless provides less range than in Ada 83. (This is because the | predefined type selected as the representation was required to satisfy the | 4*B Rule in Ada 83 but is not required to do so in Ada 9X.) As a | consequence, overflow in the computation of an intermediate result may occur | where it did not previously. However, among current implementations this | occurs only in DEC VAX implementations, when the requested precision exceeds | 9 and the requested range has relatively small bounds, and when use of | D-format, rather than G-format, for the LONG_FLOAT predefined type is | explicitly enabled by the appropriate pragma. That is, in Ada 9X, D-format | can be selected, whereas in Ada 83, D-format is precluded and H-format is | selected. DEC VAX D-format is used only rarely and is being de-emphasized | in newer systems. DEC VAX compilers that are affected by this change can | issue a warning message when D-format is selected in a situation in which | H-format would have been selected in Ada 83. The message can indicate that | removing the range constraint from the type declaration, and placing it | instead on a subtype declaration, will (necessarily) result in the same | selection for the underlying representation as in Ada 83. Alternatively, | the compiler can avoid selecting D-format, even though it is allowed to. | The language continues to express no preference for the selection of an | underlying representation when multiple representations are eligible. | | Similar problems do not arise with IBM Extended Precision in the Alsys | implementation for IBM 370. There is no larger type that is currently | selected when the requested precision exceeds 18 decimal digits and the | requested range has appropriately small bounds; thus, the selection of | Extended Precision in such a case in Ada 9X represents a valid | interpretation for what was previously an invalid program. 
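To illustrate the DEC VAX scenario described above, consider the following sketch; the type names and bounds are invented for exposition, and a VAX implementation with the D-format pragma in effect is assumed:

   type Coefficient is digits 10 range -1.0E6 .. 1.0E6;
   -- Ada 83: D-format has a penalized DIGITS of 9, so it cannot satisfy
   -- P'DIGITS >= 10, and H-format is selected.
   -- Ada 9X: D-format has DIGITS 16 and its range covers 1.0E6, so
   -- D-format may (but need not) be selected, providing less range than
   -- the Ada 83 selection.

   type Coefficient_Unconstrained is digits 10;
   -- With no range specified, the 4*D Rule requires
   -- P'MODEL_LARGE >= 10.0 ** 40, which D-format cannot provide, so
   -- H-format is selected, just as in Ada 83.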
| | In all other cases of which we are aware, the supported hardware types are | such that a type providing the requested precision will always provide a | range that satisfies Ada 83's 4*B Rule, resulting in no further | incompatibilities. | | A user-declared floating-point type declaration not specifying an explicit | range poses no compatibility problems, because the predefined type chosen to | represent the declared type must satisfy the new 4*D Rule (see 1.1.6), which | in this case provides for compatibility with Ada 83. Among current | implementations, the 4*D Rule will exert an effect only in DEC VAX and Alsys | IBM 370 implementations; in the former, it will preclude the selection of | D-format when the Ada 83 4*B Rule would have precluded it, and force the | selection of H-format instead, whereas in the latter, it will preclude the | selection of Extended Precision when it would not have been selected in Ada | 83, and continue to make the program invalid. The 4*D Rule, newly added to | the Mapping Specification (see L.1.6), should not, however, be viewed | strictly as a concession to compatibility. Rather, it should properly be | viewed as providing for uniformity among future implementations, by implying | a minimum range in a context where no minimum range is specified. It | coincidentally provides some additional compatibility when, without it, VAX | D-format would be selectable. | | When VAX D-format or IBM Extended Precision is selected in contexts when it | would also have been selected in Ada 83 to represent a type T, the value of | T'BASE'DIGITS, which was 9 for the former and 18 for the latter in Ada 83, | will now be 16 or 32, respectively. If this attribute is correctly used, | i.e. to tailor a computation to the actual precision provided by the | underlying type, then the computation should adapt itself naturally to the | new value. Nevertheless, in the few circumstances in which use of this | attribute is detected by an affected compiler, a warning message can be | issued. | | The very few remaining situations in which a different underlying | representation is selected in Ada 9X (for example, the one illustrated in | 1.1.3) are considered true anomalies genuinely worth correcting. In any | case, they have rather artificial characteristics and are thus extremely | unlikely to occur in practice. | 1.2. Semantics of Fixed-Point Arithmetic Various problems have been identified with fixed-point types in Ada 83: - They can be counterintuitive. The values of a fixed-point type are not always integer multiples of the declared delta (they are instead integer multiples of the small, which may be specified or defaulted, and which in either case need not be the same as, or even a submultiple of, the delta), and they do not always exhaust the declared range, even when the bounds of the declared range are integer multiples of the small (we are thinking about the case where a bound of the range is a power of two times the small). These surprises are responsible for some of the confusion with fixed-point types (although some programmers do understand and correctly exploit the fact that the high bound need not be representable). - The model used to define the accuracy requirements for operations of fixed-point types is much more complicated than it needs to be, and many of its freedoms have never, in fact, been exploited. 
The accuracy achieved by operations of fixed-point types in a given implementation is ultimately determined, in Ada 83, by the safe numbers of the type, just as for floating-point types, and indeed the safe numbers can, and in some implementations do, have more | precision than the model numbers. However, the model in Ada 83 allows the values of a real type (either fixed or float) to have arbitrarily greater precision than the safe numbers, i.e., to lie between safe numbers on the real number axis; implementations of fixed point typically do not exploit this freedom. Thus, the opportunity to perturb an operand value within its operand interval, although allowed, does not arise in the case of fixed point, since the operands are safe numbers to begin with. In a similar way, the opportunity to select any result within the result interval is not exploited by current implementations, which we believe always produce a safe number; furthermore, in many cases (i.e., for some operations) the result interval contains just a single safe number anyway, given that the operands are safe numbers, and it ought to be more readily apparent that the result is exact in these cases. - Support for fixed-point types is spotty, due to the difficulty of dealing accurately with multiplications and divisions having ``incompatible smalls'' as well as fixed-point multiplications, divisions, and conversions yielding a result of an integer or floating-point type. Algorithms have been published in [Hi90], but these are somewhat complicated and do not quite cover all cases, leading to implementations that do not support representation clauses for SMALL and that, therefore, only support binary smalls. These problems are partly the result of trying to make fixed-point types serve several needs and several application areas, none of which are served perfectly and all of which are compromised somewhat, as discussed below. - One of the intended applications for fixed-point types is sensor-based applications, where the representations of scaled physical quantities are transmitted over ports as binary integers. Digital signal processing is a related application area with a similar focus on manipulating scaled binary integers. These needs are met fairly well, because either no representation clauses for SMALL are used (the delta already being a power of two) or representation clauses for universally accepted values of SMALL are used. - Fixed-point types are intended, or at least they have been considered, for applications in the Information Systems area, i.e. to represent financial quantities that are typically integer multiples of decimal fractions of some monetary denomination. This need is not met well, since extra precision is generally intolerable in such applications, rounding needs to be controlled, and there is no guarantee that decimal scaling factors are supported (because they require the use of representation clauses). Many fixed-point implementations limit ranges to the equivalent of about ten decimal digits, which is inadequate for some IS applications. In addition, IS applications often need multiple representations of decimal data, e.g. for computation versus display. The fixed-point model in Ada 83 is heavily biased towards an internal representation of fixed-point data as a binary integer, and this bias strongly affects the range of the base type. Specifying a range as, for example, -9_999_999.99 .. 
9_999_999.99 is considered cumbersome; in any case, it does not guarantee protection against exceeding the range in computations of the base type. - Finally, fixed-point types are often embraced as a kind of cheap floating point, suitable on hardware lacking true floating point when the application manipulates values from a severely restricted range. This need may be met well, in the sense that efficient performance may be expected when the small is allowed to default and the user holds no expectations that multiples of the delta are exactly represented, but it has influenced the design of the facility too heavily and it compromises the quality of what can be offered in the other application areas. It is not clear that this application of fixed point is much used or needed. Our solution to these problems is to remove some of the freedoms of the | interval-based accuracy requirements that have never been exploited and to | relax the accuracy requirements so as to encourage wider support for fixed | point. Applications that use binary scaling and/or carefully matched (``compatible'') scale factors in multiplications and divisions, which is typical of sensor-based and other embedded applications, will see no loss of accuracy or efficiency. It is not our intention to meet the special needs of IS applications; they are addressed by the new decimal fixed point types defined in the Information Systems area of the Special Needs Annex (see Section K), although undemanding applications in this area may coincidentally be served marginally better by ordinary fixed-point types than they were in Ada 83. While the revamped fixed-point facility removes and relaxes requirements | that have generally not been exploited, it does not go as far as we had | hoped in substituting intuitive behavior for the surprises of the past. | Version 4.1 of the Numerics Annex proposed to eliminate the concept of small | as distinct from delta, making the values of a user-declared fixed-point | type integer multiples of the declared delta. Although this proposal had | significant support, it was judged by others to represent too radical a | change and to produce too many incompatibilities. It would have caused | programs using fixed-point types with a delta that is not a power of two and | a default small to substitute different sets of values for those types and | to perform scaling by multiplication and division instead of shifting. By | retaining the concept of small as distinct from delta, as well as a default | rule for small that is analogous to the Ada 83 rule, we have in the present | version of the Numerics Annex avoided the need for any fixed-point type to | change its behavior. | | The default small in Ada 9X is an implementation-defined power of two less | than or equal to the delta, whereas in Ada 83 it was defined to be the | largest power of two less than or equal to the delta. The purpose of this | change is merely to allow implementations that previously used extra bits in | the representation of a fixed-point type for increased precision rather than | for increased range, giving the safe numbers more precision than the model | numbers, to continue to do so. An implementation that does so must, | however, accept the minor incompatibility represented by the fact that the | type's default small will differ from its value in Ada 83. Implementations | that used extra bits for extra range have no reason to change their default | choice of small, even though Ada 9X allows them to do so. 
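As a small illustration of the distinction between delta and small (the type name and values here are invented), consider a declaration whose delta is not a power of two:

   type Voltage is delta 0.1 range 0.0 .. 200.0;
   -- The default small is an implementation-defined power of two not
   -- exceeding 0.1: an implementation may choose 2.0**(-4), the Ada 83
   -- default, or a smaller power of two such as 2.0**(-5) if it uses
   -- extra bits for precision rather than for range.

   for Voltage'SMALL use 0.1;
   -- Where this clause is supported, the values of Voltage are exact
   -- integer multiples of 0.1 rather than of a power of two.

Whether the clause is accepted depends on the implementation's support for non-binary smalls, which, as noted earlier, has been spotty.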
| | Note that our simplification of the accuracy requirements, i.e., expressing | them directly in terms of certain sets of integer multiples of the result | type's small rather than in terms of model or safe intervals, removes the | need for some of the attributes of model and safe numbers of fixed-point | types. To the extent that these attributes are used in Ada 83 programs, the | elimination of these attributes poses a potential incompatibility problem. | As we did for floating-point types, we recommend that implementations | continue to provide these attributes as implementation-defined attributes, | with their Ada 83 values, and that implementations produce warning messages | upon detecting their use. | We had hoped to go so far as to remove, in support of Requirement R2.2-B(1), the potential surprise when a range bound that is a power of two times the small is not included within the range of a fixed-point type, by including within the range of the type all the integer multiples of the small that lie within the declared bounds, arguing that declarations that change their meaning as a result can be rewritten to achieve the desired effect. But we could not argue that few programs would be affected by this change; in other words, even though this property of Ada 83 has the potential for surprise with some programs, it is used correctly by far more. The feature remains, but it is reflected by a different mechanism now that the concepts of model numbers and their mantissas have been dropped from the fixed-point description. Some of the accuracy requirements, i.e., those for the adding operators and comparisons, now simply say that the result is exact. This was always the case in Ada 83, assuming operands are always safe numbers there, and yet it is not clear from the model-interval form of the accuracy requirements that comparison of fixed-point quantities is, in practice, deterministic, and that there is no reason for it to be otherwise. Other accuracy requirements are now expressed in terms of small sets of allowable results, called ``perfect result sets'' or ``close result sets'' depending on the amount of accuracy that it is practical to require; these sets always contain consecutive integer multiples of the result type's small (or of a ``virtual'' small of 1.0 in the case of multiplication or division with an integer result type). In some cases, the sets are seen to contain a single such multiple or a pair of consecutive multiples; this clearly translates into a requirement that the result be exact, if possible, but never off by more than one rounding error or truncation error. The cases in which this occurs are the fixed-point multiplications and divisions in which the operand and result smalls are ``compatible,'' meaning that the product or quotient of the operand smalls (depending on whether the operation is a multiplication or a division) is either an integer multiple of the result small, or vice versa. (These cases cover much of the careful matching of types typically exhibited by sensor-based and other embedded applications, which are intended to produce exact results for multiplications and at-most-one-rounding-error results for divisions, with no extra code for scaling; they can produce the same results in Ada 9X, and with the same efficient implementation.
Our definition of ``compatible'' is more general than required just to cover those cases of careful matching of operand and result types, permitting some multiplications that require scaling of the result by at worst a single integer division, with an error no worse than one rounding error.) For other cases (when the smalls are ``incompatible''), the accuracy requirements are relaxed, in support of Requirement R2.2-A(1); in fact, they are left implementation defined. Implementations need not go so far as to use the Hilfinger algorithms [Hi90], though they may of course do so. An Ada 9X implementation could, for instance, perform all necessary scaling on the result of a multiplication or division by a single integer multiplication or division (or shifting). That is, the efficiency for the cases of incompatible smalls need not be less than that for the cases of compatible smalls. This relaxation of the requirements is intended to encourage support for a wider range of smalls. Indeed, we considered making support for all smalls mandatory on the grounds that the relaxed requirements removed all barriers to practical support for arbitrary smalls, but we rejected it because it would make many existing implementations instantly nonconforming. Ada 9X allows an operand of fixed-point multiplication or division to be a | real literal, named number, or attribute. Since the value V of that operand | can always be factored as an integer multiple of a compatible small, the | operation must be performed with no more than one rounding error and will | cost no more than one integer multiplication or division for scaling. Note: | That V can always be factored in this way follows from the fact that it, and | the smalls of the other operand and the result, are necessarily all rational | quantities. | The accuracy requirements for fixed-point multiplication, division, and conversion to a floating-point target are left implementation defined because the implementation techniques described in [Hi90] rely on the availability of several extra bits in typical floating-point representations beyond those belonging to the Ada 83 safe numbers; with the revision of the floating-point model, in particular the elimination of the quantization of the mantissa lengths of model numbers, those bits are now likely gone. Requiring model-number accuracy for these operations would demand implementation techniques that are more exacting, expensive, and complicated than those in [Hi90], or it would result in penalizing the mantissa length of the model numbers of a floating-point type just to recover those bits for this one relatively unimportant operation. With the accuracy requirements for this case left implementation defined, an implementation may use the simple techniques in [Hi90] for fixed-point multiplication, division, and conversion to a floating-point target; the accuracy achieved will be exactly as in Ada 83, but will simply not be categorizable as model-number accuracy. We have abandoned an idea we first put forth in Version 4.0 of the Mapping Specification, namely, that of allowing the radix of the representation of an ordinary (i.e., non-decimal) fixed-point type to be specified as ten, by a representation clause (attribute definition clause in Ada 9X); the current bias towards a radix of two would persist as the default. We abandoned this idea because it benefits primarily IS applications, which are now addressed by separate features. 
This feature would integrate well with the rest of our proposal, were it to be restored. Its primary semantic effect would be to exclude range bounds that are powers of ten from necessarily being included within the bounds of the type. It would permit bounds like 999_999_999.99 in the declaration of a type whose small is .01 to be written instead as 1_000_000_000.00 or even as 0.01E+12, which is close to the ``digits 11'' shorthand provided by IS:DECIMAL_FIXED_POINT. Since it would be of no particular benefit outside of IS applications, and since it is only a minuscule part of the totality of additional support required in the IS area, it is not worth adding to ordinary fixed point. A suggestion for a new representation attribute, T'MACHINE_SATURATES, has been made. Some digital signal processors do not signal a fault or wrap around upon overflow but instead saturate at the most positive or negative value of the base type. It could be useful to detect and describe that behavior by means of the suggested attribute. We leave this as a subject for future exploration, perhaps in conjunction with similar refinements of T'MACHINE_OVERFLOWS suggested (for floating-point types) by IEEE arithmetic. 1.3. Elementary Functions For a general rationale for the elementary functions, the reader is referred to [GEF91]. These functions are critical to a wide variety of scientific and engineering applications written in Ada. They have been widely provided in the past as vendor extensions with no standardized interface and with no guarantee of accuracy. These impediments to portability and to analysis of programs are removed by their inclusion in the Numerics Area features of Ada 9X, in support of Requirement R11.1-A(1). The elementary functions are provided in Ada 9X by a new predefined generic package, GENERIC_ELEMENTARY_FUNCTIONS, which is a very slight variation of that proposed in ISO DIS 11430, ``Proposed Standard for a Generic Package of Elementary Functions for Ada.'' The Ada 9X version capitalizes on a feature of Ada 9X (use of T'BASE as a type mark in declarations) not available in the environment (Ada 83) to which the DIS is targeted. The feature has been used here to declare the formal parameter types and result types of the elementary functions to be the base type of the generic formal type, eliminating the possibility of range violations at the interface. The same feature can be used for local variables in the body of GENERIC_ELEMENTARY_FUNCTIONS (if it is programmed in Ada) to avoid spurious exceptions caused by range violations on assignments to local variables of the generic formal type. Thus, there is no longer a need to allow implementations to impose the restriction that the generic actual type in an instantiation must be a base type; implementations must allow a range-constrained subtype as the generic actual type, and they must be immune to the potential effects of the range constraint. An implementation that accommodates signed zeros (i.e., one for which FLOAT_TYPE'SIGNED_ZEROS is TRUE) is required to exploit them in several important contexts, in particular the signs of the zero results from the ``odd'' functions SIN, TAN, and their inverses and hyperbolic analogs, at the origin, and the sign of the half-cycle result from ARCTAN and ARCCOT; this follows a recommendation, in [kahan87], that provides important benefits for complex elementary functions built upon the real elementary functions, and for applications in conformal mapping. 
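Returning to the interface point made above, the following sketch (the subtype and package names are invented) shows an instantiation with a range-constrained subtype, which an Ada 9X implementation must accept and be immune to:

   subtype Probability is FLOAT range 0.0 .. 1.0;
   package Probability_Functions is
      new GENERIC_ELEMENTARY_FUNCTIONS (Probability);
   -- The parameter and result types of the exported functions are
   -- Probability'BASE (that is, FLOAT), so a call such as
   -- Probability_Functions.EXP(0.5), whose result exceeds 1.0, does not
   -- raise an exception merely because of the range constraint.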
Exploitation of signed zeros at the many other places where elementary functions can return zero results is left implementation defined, since no obvious guidelines exist for these cases. 1.4. Primitive Functions For a general rationale for the primitive functions, the reader is referred to [GPF91]. They are required for high-quality, portable, efficient mathematical software such as is provided in libraries of special-function routines, and some are of value even for more mundane uses, like I/O conversions and software testing. The primitive functions are provided in support of Requirement R11.1-A(1). The casting of the primitive functions as attributes, rather than as functions in a generic package (e.g., GENERIC_PRIMITIVE_FUNCTIONS, as defined for Ada 83 in ISO CD 11729, ``Proposed Standard for a Generic Package of Primitive Functions for Ada''), befits their primitive nature and allows them to be used as components of static expressions, when the arguments are static. MAX and MIN are particularly useful in this regard, since they are sometimes needed in expressions in numeric type declarations, for example to ensure that a requested precision is limited to the maximum allowed. The functionality of SUCCESSOR and PREDECESSOR, from the proposed GENERIC_PRIMITIVE_FUNCTIONS standard, is provided by extending the existing attributes SUCC and PRED to floating-point types. Note that T'SUCC(0.0) returns the smallest positive number, which is a denormalized number if T'DENORM is TRUE and a normalized number if T'DENORM is FALSE; this is equivalent to the ``fmin'' derived constant of LCAS [LCAS]. Most of the other constants and operations of LCAS are provided either as primitive functions or other attributes in Ada 9X; those that are absent can be reliably defined in terms of existing attributes. The proposed separate standard for GENERIC_PRIMITIVE_FUNCTIONS stated that the primitive functions accept and deliver machine numbers, which implies that they never receive arguments in extended registers. Conceptually, that requirement could be removed in Ada 9X, though we are by no means certain that it is wise to do so, and we are still investigating the issue. If the primitive functions always receive machine numbers, then, for example, the result of T'EXPONENT(X) can be assumed to be in the range T'MIN(T'EXPONENT(T'PRED(0.0)), T'EXPONENT(T'SUCC(0.0))) .. T'MAX(T'EXPONENT(T'BASE'FIRST), T'EXPONENT(T'BASE'LAST)) and an integer type with that range can be declared to hold any value that can be returned by T'EXPONENT(X). (These bounds accommodate the fact that T'EXPONENT of a denormalized number returns a value less than T'MACHINE_EMIN, and they also accommodate implementations that may use radix-complement representation.) However, if we define the primitive functions so that they must accept the range of arguments that they might receive in extended registers, then we cannot bound the results of T'EXPONENT(X) by properties of the implementation, since the range of extended registers is nowhere reflected in such properties. In that case, one would be advised to construct an integer type of widest available range (SYSTEM.MIN_INT .. SYSTEM.MAX_INT) for the type of a variable used to hold values delivered by the EXPONENT attribute. If extended range and precision are allowed in the arguments of the primitive functions, T'SUCC, T'PRED, and T'ADJACENT will, nevertheless, deliver machine numbers of the type T.
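To follow the advice above for the case in which extended operands are allowed, a variable intended to hold any value delivered by the EXPONENT attribute could be declared as in the following sketch (names other than the attributes are invented, and package SYSTEM is assumed to be visible):

   type Exponent_Value is range SYSTEM.MIN_INT .. SYSTEM.MAX_INT;
   E : Exponent_Value := Exponent_Value (T'EXPONENT (X));   -- X is of type T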
One primitive function that will be allowed to receive an argument in an extended register is T'MACHINE(X), an attribute that was not represented by a function in GENERIC_PRIMITIVE_FUNCTIONS. This attribute exists specifically to give the programmer a way to discard excess precision if the implementation happens to be using it, and if the details of an algorithm are sensitive to its use. It also has the side effect of guaranteeing that a value outside the range T'BASE'FIRST .. T'BASE'LAST is not propagated. The attribute is a no-op in implementations that do not use extended registers. Its definition allows efficient implementations on representative hardware. Thus, on IEEE hardware, it may be implemented simply by storing an extended register into the shorter storage format of the target type T; on implementations having types with extra precision but not extra exponent range, it may be implemented by storing the high-order part of a register pair into storage. Overflow may occur in the former case but cannot occur in the latter; in both cases, values slightly outside the range T'BASE'FIRST .. T'BASE'LAST can escape overflow by being rounded to an endpoint of the range. (This actually happens on IEEE hardware.) The related primitive function T'MODEL(X) also accepts its argument in an extended register and shortens the result to a machine number. In this case, however, the loss of low-order digits is potentially more severe. The result is guaranteed to be a model number within the range -T'MODEL_LARGE .. T'MODEL_LARGE. This function returns its floating-point argument perturbed to a nearby model number (if it is not already a model number) in the same way that is allowed for operands and results of the predefined arithmetic operations (see L.1.5), so it introduces no more error than what is already allowed. By forcing a quantity to a nearby model number, it guarantees that subsequent arithmetic operations and comparisons with the number will experience no further perturbation and will therefore produce predictable and consistent results. For example, suppose we have a situation like

   if X > 1.0 then
      ... -- several references to X ...
   end if;

in which X can be extremely close to 1.0. If X is in the first model interval above 1.0, the semantics of floating-point arithmetic allow the references to X inside the if statement to behave as if they had the value 1.0, seemingly contradicting the condition that allows entry there, and multiple references could behave as if they yielded slightly different values. If this is intolerable, then one can write

   Y := T'MODEL(X);
   if Y > 1.0 then
      ... -- several references to Y ...
   end if;

The value of Y can be no worse than some value already allowed for the result of the operation that produced X. If the if statement is entered, we are guaranteed that Y exceeds 1.0 and that all references to it yield the same value. If X has a value slightly exceeding 1.0, the if statement might not be entered, but that was also true in the earlier example. In implementations in which the model numbers coincide with the machine numbers, T'MODEL reduces to T'MACHINE, and if in that case extended registers are not being used, both are no-ops. 1.5. Possible Future Additions It has been suggested that Ada 9X should be positioned to compete better with Fortran 90, and even with C, for certain kinds of numeric applications, by adding at least rudimentary facilities for random number generation and for complex arithmetic.
Any such facilities that we may propose in the near future, if the Ada community concurs that they are needed and should be provided, will take the form of optional predefined packages or generic packages. For random number generation, we would probably propose only a uniform random number capability; perhaps the ability to have multiple generators; subprograms for saving and setting the seed(s) and for initializing the generator (or a generator) to a repeatable (but implementation-dependent), or a random (time-dependent), state without the use of seeds; and subprograms to fill a whole array with random numbers in one call. Neither algorithms nor statistical tests would be prescribed. For complex arithmetic, we would probably propose only a generic package exporting a visible Cartesian complex type (whose real and imaginary parts are parameterized by an imported floating-point type), the appropriate arithmetic operators for the complex type, and a small set of complex elementary functions (e.g., those in Fortran 90). This is a small subset of the capabilities on which the SIGAda Numerics Working Group has been working for several years. It is unlikely that the proposal would include accuracy requirements or requirements for freedom from spurious exceptions (those that might be produced by relatively ``naive'' implementations when an intermediate result overflows but the components of the final result do not), since practical requirements in these areas have not yet been determined and agreed upon by researchers.

Table of Contents

1. Numerics Annex (Rationale)
   1.1. Semantics of Floating-Point Arithmetic
        1.1.1. Floating-Point Machine Numbers
        1.1.2. Attributes of Floating-Point Machine Numbers
        1.1.3. Floating-Point Model Numbers
        1.1.4. Attributes of Floating-Point Model Numbers
        1.1.5. Accuracy of Floating-Point Operations
        1.1.6. Floating-Point Type Declarations
        1.1.7. Compatibility Considerations
   1.2. Semantics of Fixed-Point Arithmetic
   1.3. Elementary Functions
   1.4. Primitive Functions
   1.5. Possible Future Additions