LANGUAGE STUDY NOTE

Variables and Function Calls with Values Outside Their Declared Subtypes

Norman H. Cohen
March 16, 1992

1. INTRODUCTION

In a nonerroneous execution of an Ada 83 program, every variable, when evaluated, holds a value of its declared subtype, and every function call returns a result of the declared result subtype. The rules of Ada 83 allow code to be generated under the assumption that these properties are never violated, since the rules allow erroneous executions to behave in any way whatsoever.

In practice, these properties are often violated by variables inadvertently used before they are initialized and by data imported across an external interface. Code generated under the unrealistic assumption that the properties always hold may exhibit undesirable behavior when the properties do not hold. Checks that would prevent the program from storing into or branching to arbitrary addresses may be optimized away as a result of the assumption. Similarly, explicit checks written by the programmer to detect violations of the properties may be optimized away because of the assumption that no such violations exist.

The rules of Ada 9X should reflect the reality that variables may assume values outside their subtypes. Certain checks that could be optimized away in Ada 83 under the assumption that a variable holds a value of its declared subtype should be required in Ada 9X. However, it should remain possible to omit checks that could be omitted in Ada 83 without bad consequences.

This language study note proposes a set of rules for dealing with variables and function results with values outside their declared subtypes. Section 2 of this note reviews specific problems arising from the current rules. Section 3 proposes a solution and examines its implications. Section 4 deals with technical issues arising from the solution for each kind of Ada 9X type.

2. PROBLEMS ASSOCIATED WITH INVALID BIT PATTERNS

Invalid bit patterns arise in Ada from the evaluation of variables with undefined values and from data imported across an external interface. A scalar variable has an undefined value before it has been initialized or assigned to; any variable has an undefined value if a task is aborted while updating that variable. Data may be imported across an external interface at an agreed-upon storage location (perhaps using an address clause to place an Ada variable at that location) or read using the procedure READ provided by instances of SEQUENTIAL_IO and DIRECT_IO. In addition, the CHARACTER or STRING version of TEXT_IO.GET may read an 8-bit pattern that is invalid for type CHARACTER because the high-order bit is one. In some cases, data is imported with one type view (say an array of bytes) and converted by an instance of UNCHECKED_CONVERSION to another, higher-level type view.

Many, but not all, invalid bit patterns arise from erroneous execution. Evaluation of an undefined scalar variable is erroneous by RM 3.2.1(18), and evaluation of other undefined variables is erroneous by AI-00837. An unchecked conversion that violates "the properties that are guaranteed by the language for objects of the target type" is erroneous by RM 13.10.2. Proposed AI-00870 would make execution erroneous when a call on READ yields an invalid bit pattern but does not raise DATA_ERROR, but the ultimate disposition of this issue is open. There is no RM rule and no AI covering invalid 8-bit patterns for type CHARACTER. Most compilers treat these with benign neglect, facilitating the writing of applications that could not otherwise be written in Ada 83. There seems to be no RM rule and no AI covering the case where a variable located at an address stipulated by an address clause (or designated by an access value obtained from or passed to the outside world) obtains an invalid bit pattern from some hardware or software agent outside the Ada program.
Some compilers generate code under the assumption that variables do not contain invalid bit patterns. This strategy can be formally justified, at least in the case of variables not subject to address clauses and not set by calls on READ, by the observation that the assumption is violated only in erroneous executions. RM 1.6(7) permits the generation of code that produces unpredictable results in erroneous executions. However, this strategy has undesirable consequences in such contexts as membership tests, qualified expressions, case statements, and indexed components.

2.1. PROBLEMS WITH MEMBERSHIP TESTS

Programmers wishing to guard against invalid bit patterns may attempt to do so with a membership test:

   type ALTITUDE_TYPE is range 0 .. 50_000;
   ALTITUDE, PREVIOUS_ALTITUDE : ALTITUDE_TYPE;
   ...
   loop
      PREVIOUS_ALTITUDE := ALTITUDE;
      READ_ALTITUDE_SENSOR(ALTITUDE);
      if ALTITUDE in ALTITUDE_TYPE then
         ...  -- normal processing
         READINGS_IGNORED := 0;
      else
         ALTITUDE := PREVIOUS_ALTITUDE;
         READINGS_IGNORED := READINGS_IGNORED + 1;
         if READINGS_IGNORED > MAX_READINGS_IGNORED then
            raise ALTITUDE_SENSOR_FAILURE;
         end if;
      end if;
      ...
   end loop;

However, a compiler assuming the absence of invalid bit patterns could deduce that a test for membership of a variable's value in the declared subtype of that variable -- in this case ALTITUDE in ALTITUDE_TYPE -- is guaranteed to succeed, and optimize away the test.

2.2. PROBLEMS WITH QUALIFIED EXPRESSIONS

A similar problem may arise with the constraint check of a qualified expression:

   ALTITUDE, UNVALIDATED_ALTITUDE : ALTITUDE_TYPE;
   ...
   READ_ALTITUDE_SENSOR(UNVALIDATED_ALTITUDE);
   begin
      ALTITUDE := ALTITUDE_TYPE'(UNVALIDATED_ALTITUDE);
      VALID := TRUE;
   exception
      when CONSTRAINT_ERROR =>
         VALID := FALSE;
   end;

The compiler may optimize away the constraint check on the grounds that UNVALIDATED_ALTITUDE always satisfies the constraint of its declared subtype.

2.3. PROBLEMS WITH CASE STATEMENTS

RM 5.4(4) imposes the following rules for choices in a case statement:

   If the expression is the name of an object whose subtype is static, then each
   value of this subtype must be represented once and only once in the set of
   choices of the case statement, and no other value is allowed; this rule is
   likewise applied if the expression is a qualified expression or type
   conversion whose type mark denotes a static subtype. Otherwise, for other
   forms of expression, each value of the (base) type of the expression must be
   represented once and only once in the set of choices, and no other value is
   allowed.

Based on this rule and on the assumption that the case-statement expression is not a variable containing an invalid bit pattern, the compiler is permitted to generate a jump table with entries for only the following bit patterns:

o  when the case-statement expression is an object with a static subtype, the bit patterns corresponding to the values of that subtype;

o  when the case-statement expression is a qualified expression or type conversion whose type mark denotes a static subtype, the bit patterns corresponding to the values of that subtype;

o  otherwise, the bit patterns corresponding to the values of the base type of the case-statement expression.

The language does not call for any check to be performed on the value of a case-statement expression (except the constraint check in a qualified expression in those instances in which a qualified expression is required, but as we have seen, the check inherent in the qualified expression can be optimized away). Therefore, if the case-statement expression is a variable containing an invalid bit pattern, or a qualified expression whose operand is such a variable, the generated code may jump to an improper location, with unpredictable consequences.
This danger exists even in the presence of a "when others =>" alternative, since a compiler may treat this alternative as covering only the VALID bit patterns that have not been listed explicitly in earlier choices:

   TEXT_IO.GET(NEXT_CHAR);  -- Suppose compiler does not check for byte
                            -- values outside of 0 .. 127.
   case NEXT_CHAR is
      when CHARACTER'VAL(0) .. CHARACTER'VAL(31) | CHARACTER'VAL(127) =>
         CLASS := CONTROL_CHARACTER;
      when ' ' =>
         CLASS := SPACE;
      when '0' .. '9' =>
         CLASS := DIGIT;
      when 'A' .. 'Z' | 'a' .. 'z' =>
         CLASS := LETTER;
      when others =>                  -- May generate choices only for
         CLASS := SPECIAL_CHARACTER;  -- remaining values of type CHARACTER
   end case;                          -- and not for byte values 128 .. 255.

2.4. PROBLEMS WITH INDEXED COMPONENTS

Evaluation of an indexed component entails a check that each index-value expression has a value in the corresponding index subtype. When the index-value expression is a variable whose declared subtype is the corresponding index subtype (or a subset of that subtype), a compiler may omit the index check based on the assumption that the variable contains a value of its declared subtype. For an indexed component occurring as a primary, violation of this assumption leads to a fetch from an arbitrary address, resulting in either a hardware addressing exception or the delivery of an arbitrary bit pattern without warning. (This arbitrary bit pattern may itself be an invalid bit pattern for the array component subtype, causing cascading errors.) For an indexed component occurring as the variable of an assignment statement or as an actual parameter of mode out or in out, use of an invalid bit pattern as an index value leads to a store to an arbitrary address, with unpredictable consequences.

3. A POSSIBLE SOLUTION

We use the following terminology in discussing the treatment of invalid and potentially invalid bit patterns:

o  A type is "compact" if every appropriately sized bit pattern is the representation of some value of that type. (Many bit patterns may be representations of the same value.)

o  For a base type with a first named subtype, a value of the first named subtype is said to be "proper"; a value of the base type that is not also a member of the first named subtype is said to be "improper".

o  A function call or a use of a variable as a primary is said to have a "confirmed" value if its evaluation is guaranteed to yield a value in the result subtype of the function call or the declared subtype of the variable. If it is possible for the evaluation to yield a value outside of this subtype, the function call or use of the variable is said to have an "unconfirmed" value.

The term "value" is used in two different senses when we speak of a proper value and when we speak of a confirmed value. It is a value in the mathematical sense that is classified, independent of context, as a proper or improper value of a given base type. It is an occurrence of a primary -- a function call or a variable name -- that is classified as having a confirmed or unconfirmed value. Following the declarations

   X : NATURAL := 0;
   Y : NATURAL;

both X and Y may contain the value 0, but X is viewed as having a confirmed value and Y as having an unconfirmed one. Having an unconfirmed value is a property of an occurrence of Y, not a property of the mathematical value 0.

When a variable (including a formal parameter) is declared to belong to a particular subtype, this obligates the implementation to ensure that the variable EITHER contains a value of the subtype OR has an unconfirmed value. Unconfirmed values may arise in a number of ways:

o  The initial value of a variable without an explicit or language-defined initialization is, in general, unconfirmed.

o  The result of an unchecked conversion is, in general, an unconfirmed value.

o  A value obtained from a call on the procedure READ (exported by an instance of SEQUENTIAL_IO or DIRECT_IO) is, in general, unconfirmed.
o  An unconfirmed value may arise from viewing the same bits as belonging to more than one type, or by manipulating the same bits both in Ada code and in code written in some other programming language. Dual views can arise through the unchecked conversion of access values, the passing of access values to or from non-Ada code, the passing of addresses to non-Ada code, or the use of address clauses.

o  In general, assignment of an unconfirmed value of a subtype ST to a variable of some subtype that includes ST (perhaps ST itself) causes all uses of the target variable potentially reached by that assignment to have unconfirmed values.

However, particular values that would, in general, be unconfirmed can be treated by an implementation as confirmed in any circumstance in which the implementation can guarantee that the value is of the appropriate subtype. For example:

o  If a subtype contains every value of some compact type, all variables and function results of that subtype may be regarded as having confirmed values.

o  In a region of code that the implementation ensures can only be reached when a given variable has a value of the appropriate subtype, all evaluations of the given variable may be regarded as having confirmed values. For example, if uninitialized variable X is of subtype ST and (as we shall require) the membership test "X in ST" is required to return FALSE when the value of X is not of subtype ST, occurrences of X as a primary in the sequence of statements of

      if X in ST then
         ...
      end if;

   may be regarded as having confirmed values. Similarly, if a check is performed upon assignment of an unconfirmed value to ensure that the assigned value belongs to the declared subtype of the target variable, then (in the absence of other assignments to that variable) uses of the target variable may be regarded as having confirmed values.

o  An implementation may apply more sophisticated reasoning to deduce that a variable has a confirmed value. For example, given

      X, Y : ST;
      ...
      Y := X;
      if X in ST then
         ...
      end if;

   the compiler may reason that if control reaches the sequence of statements inside the if statement, then X must have had a value of subtype ST at the time of the assignment to Y, so that the value of Y can be regarded as confirmed inside the if statement.

In general, the treatment of potentially unconfirmed values as confirmed is implementation dependent. It depends on the sophistication of the analysis performed by an implementation and on the implementation's strategy for inserting checks in contexts where they are not strictly required, perhaps to avoid more expensive checks later. Nonetheless, an implementation has an obligation that can be stated in implementation-independent language: An implementation must somehow guarantee that a function result or variable evaluation has a value in the appropriate subtype or that its value is treated according to the rules (given below) for unconfirmed values.

Reasoning about unconfirmed values can be extended to larger expressions. Suppose that the variable X occurs as part of some larger expression and that the compiler can determine that whenever X holds a value of some subtype ST1, the larger expression necessarily holds a value of some other subtype ST2. Then if X is in fact an unconfirmed value of ST1, the larger expression can be treated as an unconfirmed value of ST2. For example, given the declarations

   A : INTEGER range 1 .. 10;
   B : INTEGER range 2 .. 11;
   ...
   B := A+1;

the expression A has an unconfirmed value of subtype INTEGER range 1 .. 10 in the assignment to B, so the expression A+1 can be regarded as having an unconfirmed value of subtype INTEGER range 2 .. 11.

Similarly, compile-time analysis may allow an unconfirmed value of one subtype to be treated as an unconfirmed value of a smaller subtype. For example, if X has an unconfirmed value of subtype 1 .. 10 in the membership test of

   if X not in 6 .. 10 then
      ...
   end if;

then X may be regarded as having an unconfirmed value of subtype 1 .. 5 in the sequence of statements of the if statement.

The problem of invalid bit values can be solved by adopting the following rules:

1. Every type declaration introduces an anonymous base type, as well as a subtype named by the identifier given in the type declaration. The anonymous base type is compact.

2. An uninitialized variable does not contain an "undefined" value, but an arbitrary, possibly improper, value of its base type. Thus, evaluation of an uninitialized scalar variable is no longer erroneous, but evaluation of a variable whose update was aborted remains erroneous.

3. For the assignment of an unconfirmed value of subtype ST to a variable whose subtype includes ST, following the successful evaluation of the name and expression of the assignment statement, the effect of attempting to assign a value outside the declared subtype of the target variable may be one of the following:

   a. PROGRAM_ERROR is raised.
   b. CONSTRAINT_ERROR is raised.
   c. The value is assigned.

   The language does not define which of these three possibilities occurs, nor does it require that the same possibility occur for all executions of a given assignment statement. These rules also apply to the passing of an in or in out actual parameter to its formal parameter at the beginning of a subprogram and to the passing of an out or in out formal parameter back to its actual parameter at the end of the call.

4. A membership test with a type mark tests whether the value of the expression satisfies the constraints imposed by the denoted subtype. Improper values fail this test.

5. Unchecked conversion is never erroneous, though it may return an improper value.

6. If the expression in a case statement is the name of an object whose subtype is static, then a check is performed following the evaluation of the expression to ensure that the value of the expression is not improper.
PROGRAM_ERROR is raised if this check fails. (See Section 3.3 below for a discussion of the practical implications of this rule.)

These rules give the implementation the option of removing checks upon assignment of a variable or function result with a given declared subtype to a variable of the given subtype or a superset of the given subtype. Other checks and evaluations of conditions may be removed only in cases where the compiler can deduce that the check will always succeed. However, the analysis leading to this conclusion must account for the possibility of improper values arising in nonerroneous executions from uninitialized variables, calls on READ, unchecked conversions, variables with address clauses, and so forth, and propagating through unchecked assignments and unchecked parameter passing.

The rules produce the desired effect for each of the problematic constructs discussed earlier.

3.1. EFFECT ON MEMBERSHIP TESTS

The membership test

   ALTITUDE in ALTITUDE_TYPE

is an operation of the anonymous compact base type of the first named subtype ALTITUDE_TYPE. This operation is required to return FALSE for values of that type that do not satisfy the constraints of ALTITUDE_TYPE. The test cannot be optimized away if the occurrence of ALTITUDE in the membership test has an unconfirmed value, because it is possible, even in nonerroneous programs, that ALTITUDE will contain an improper value. (On the other hand, if the membership test immediately follows the procedure call

   READ_ALTITUDE_SENSOR(ALTITUDE);

and the compiler has exercised its option to check that the value passed back to the actual parameter is proper, the value of ALTITUDE in the membership test is confirmed, so the compiler can legitimately optimize the membership test away.)

3.2. EFFECT ON QUALIFIED EXPRESSIONS

The constraint check in a qualified expression cannot be optimized away unless the operand of the qualified expression has a confirmed value. Thus a qualified expression always raises an exception or yields a value of the subtype denoted by its type mark.

3.3. EFFECT ON CASE STATEMENTS

For a case statement, we must consider each of the three circumstances addressed by RM 5.4(4).

For a case statement whose expression is a variable of a static subtype, a check is performed to ensure that no branch takes place indexed by an improper value. The check can be optimized away in case statements where the expression has a confirmed value. Even when not optimized away, the check can be implemented cheaply in the form of an additional, compiler-generated case alternative whose choices are the improper values of the base type and whose "sequence of statements" raises PROGRAM_ERROR.

For a case statement whose expression is a qualified expression whose type mark denotes a static subtype, successful evaluation of the qualified expression ensures that the value of the expression is not improper, as noted above. (The result of the qualified expression is always a confirmed value.) Similar reasoning applies to a case statement whose expression is a type conversion.

For all other case statements, the redefinition of base types introduces an upward incompatibility with Ada 83 that can be detected at compile time and easily corrected. Consider the following example, disregarding for the purposes of the example the proposed expansion of CHARACTER to 256 values in Ada 9X:

   case NEXT_CHAR is
      when CHARACTER'VAL(0) .. CHARACTER'VAL(31) | CHARACTER'VAL(127) =>
         CLASS := CONTROL_CHARACTER;
      when ' ' =>
         CLASS := SPACE;
      when '0' .. '9' =>
         CLASS := DIGIT;
      when 'A' .. 'Z' | 'a' .. 'z' =>
         CLASS := LETTER;
      when '!' .. '/' | ':' .. '@' | '[' .. '^' | '{' .. '~' =>
         CLASS := SPECIAL_CHARACTER;
   end case;

This statement was legal in Ada 83 because it had a choice for each of the 128 values of the base type of NEXT_CHAR, type CHARACTER. Suppose that CHARACTER'SIZE = 8, so that type CHARACTER is not compact. Under the proposed new rules (but with CHARACTER still defined by a 128-value enumeration type definition), type CHARACTER is no longer a base type, but the first named subtype of an anonymous 256-value base type. Since not all choices of the base type are accounted for, the case statement becomes illegal by the last sentence of RM 5.4(4).

If the programmer was really manipulating improper values (for example, if NEXT_CHAR was set by a call on TEXT_IO.GET and that procedure does not check for improper values), the programmer blesses the Ada 9X designers for warning him that he had been walking on the edge of a cliff and adds a choice to cover these values, for example:

   when others =>
      CLASS := EXTENDED_CHARACTER;

If the programmer knows that improper values cannot arise, he can indicate this by qualifying the case-statement expression with the first named subtype, so that he doesn't have to add a case-statement arm that he knows is unreachable:

   case CHARACTER'(NEXT_CHAR) is
      ...
   end case;

If the compiler is as certain as the programmer that the value of NEXT_CHAR cannot be improper, NEXT_CHAR will be treated as having a confirmed value and no check will be generated. A programmer who is adamant that no check be generated can use pragma SUPPRESS:

   declare
      pragma SUPPRESS(RANGE_CHECK, ON => NEXT_CHAR);
   begin
      case CHARACTER'(NEXT_CHAR) is
         ...
      end case;
   end;

(Actually, the check that a value is not improper should probably be considered distinct from RANGE_CHECK. In the case of an enumeration type with holes, for example, the name RANGE_CHECK is misleading.)

3.4. EFFECT ON INDEXED COMPONENTS

In general, the index check for an indexed component cannot be optimized away when the index-value expression is a variable whose declared subtype is the index subtype, because that variable may contain an improper value even in nonerroneous programs. (This is so whether the indexed component occurs as a primary or as the name of a variable to be set.)
In cases where the variable used as an index has a confirmed value, the check may be safely omitted.

3.5. EFFECT ON ASSIGNMENTS

The proposed rules allow constraint checks on assigned values to be omitted in the same cases as in Ada 83. Given the declarations

   A : INTEGER range 1 .. 5;
   B : INTEGER range 1 .. 5;
   C : INTEGER range 1 .. 10;

the check may be omitted from the assignment

   A := B;

However, if the check is omitted, then within the range of this assignment, B is viewed as having an unconfirmed value of its declared subtype. In contrast, the check may not, in general, be omitted from the assignment

   A := C;

because the declared subtype of A does not include the declared subtype of C. In the context

   if C not in 6 .. 10 then
      A := C;
   end if;

the check may be omitted after all, since within the if statement a moderately clever compiler may regard C as having an unconfirmed value of subtype INTEGER range 1 .. 5.

3.6. EFFECT ON LANGUAGE-INDEPENDENT OPTIMIZATION TECHNIQUES

Concern has been expressed about the effect of safer treatment of uninitialized variables on language-independent optimization techniques. There are two issues to be considered:

o  the impact on the structure of a common optimizing back end used by front ends for several languages

o  the impact of additional checks on Ada run-time performance

Application of the rules proposed here does not require fundamental changes in an optimizer's treatment of undefined variables. It simply requires that an Ada variable without a language-specified or programmer-specified initial value be regarded as having some unspecified but consistent initial value. A C extern variable, for example, must be treated in exactly the same way. (Unlike a C extern variable, however, an Ada local variable without an initial value need not be regarded as having all its definitions potentially killed by calls on separately compiled subprograms.)
It is understood that Ada imposes a small run-time overhead in return for providing substantially more safety than languages like C. This added safety is perceived, both by users and nonusers of Ada, as a distinguishing strength of the language. The checks proposed here do not add significantly to that overhead. Ada programmers who find the overhead unacceptable can suppress checks, attaining a small speedup at the cost of safety.

The rules proposed here have been formulated to minimize the number of checks that have to be made, consistent with certain commonly held (albeit currently unjustified) expectations:

o  Use of incorrect values will be manifested by the raising of exceptions rather than by unpredictable failures (even if the incorrect values are improper).

o  In the presence of explicit tests to prevent a statement from being reached when certain values are improper, the statement will in fact not be reached when those values are improper.

The proposed rules require checks only in places where unpredictable failures may arise, such as case statements and indexed components, or in places where tests or checks are explicitly called for, such as membership tests and qualified expressions. Even then, a compiler is allowed to optimize away tests and checks whose results are known, perhaps because compile-time analysis determines that a certain use of a variable has a confirmed value.

4. SEMANTICS OF IMPROPER VALUES

The notion of improper values introduces several technical problems that must be addressed. The remainder of this LSN addresses these problems. Different issues arise for different classes of types.

4.1. IMPROPER VALUES OF INTEGER TYPES

Integer types present no difficulties. The rules of Ada 83 already stipulate that an integer type declaration introduces an anonymous base type (derived from a predefined integer type) and a first named subtype constrained by the range in the integer type definition. In typical implementations, the anonymous base type is already compact.

4.2. IMPROPER VALUES OF ENUMERATION TYPES

In Ada 83, if ST is a subtype of a base enumeration type BT, then the attributes ST'FIRST and ST'LAST yield the bounds of the subtype ST, ST'SIZE yields the number of bits required by the implementation to hold all values of subtype ST, and ST'WIDTH yields the maximum image length for all values of subtype ST. However, the attributes ST'POS, ST'VAL, ST'SUCC, ST'PRED, ST'IMAGE, and ST'VALUE are, in effect, synonyms for BT'POS, BT'VAL, BT'SUCC, BT'PRED, BT'IMAGE, and BT'VALUE, respectively: Their range of allowable inputs is determined by the base type, not the subtype.

Since these attributes apply to integer types as well as to enumeration types, there are already rules defining their effect in the case of anonymous base types. These rules are based on the presumption that all values of the anonymous base type have position numbers. Suppose that improper values of an enumeration type T have position numbers, ranging from T'POS(T'LAST)+1 to 2**T'BASE'SIZE-1. The application of the rules for anonymous base integer types to anonymous base enumeration types has the following disconcerting implications for a first named enumeration subtype T:

o  Although improper values do not have enumeration literals, they can be constructed by such attributes as T'BASE'VAL(T'POS(T'LAST)+N), where N is positive. It follows that T'IMAGE can be invoked with confirmed enumeration values that have no images.

o  T'BASE'WIDTH is defined in terms of the maximum image width of a set of enumeration values some of which may not have enumeration literals.

o  If X holds the position number of an improper value of T'BASE, the attribute T'VAL(X) yields an improper value rather than raising CONSTRAINT_ERROR as in Ada 83.

o  T'SUCC(T'LAST) yields an improper value rather than raising CONSTRAINT_ERROR as in Ada 83.
Thus the application to anonymous base types of Ada 83 rules for enumeration base types results in a run-time upward incompatibility. A run-time incompatibility that replaces the occurrence of an exception with the computation of a sensible result is generally deemed acceptable. However, it is not at all clear that generation of an improper value is a "sensible result." Consider, for example, the following idiom, which, while stylistically objectionable, is probably quite common:

   type DAY_TYPE is
      (SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY);
   DAY : DAY_TYPE;
   ...
   begin
      DAY := DAY_TYPE'SUCC(DAY);
   exception
      when CONSTRAINT_ERROR =>
         DAY := DAY_TYPE'FIRST;  -- wrap around
   end;

With the rules described above, if the expression DAY_TYPE'SUCC(DAY) is reached with DAY = DAY_TYPE'LAST, it will not raise CONSTRAINT_ERROR, but yield the improper value DAY_TYPE'VAL(7). If this value ever propagates to a context where a check is required, it will raise an exception there. However, the handler in the block statement will not apply at that point.

It seems preferable to define T'BASE'VAL to raise CONSTRAINT_ERROR when given the position number of an improper enumeration value and to define T'BASE'SUCC to raise CONSTRAINT_ERROR when its argument is a proper enumeration value but the successor of its argument is not. This behavior is achieved by generating precisely the checks that are generated for Ada 83, and it preserves Ada 83 semantics for enumeration types. However, preserving Ada 83 semantics for both enumeration types and integer types requires that the attributes of discrete types have different rules for improper integer values and improper enumeration values. It also requires that the behavior of the operations of a just-declared anonymous base enumeration type be defined partly in terms of the about-to-be-declared first named subtype; this is aesthetically unpleasing, but not problematic.
Ada 83 says nothing about the behavior of T'BASE'POS, T'BASE'SUCC, T'BASE'PRED, T'BASE'IMAGE, the relational operators, and membership tests when applied to an improper value; the language designers presumed that this could happen only in erroneous executions, for which any outcome would be permitted. For example, one possible effect of T'BASE'IMAGE when applied to an invalid bit pattern in Ada 83 is to perform a lookup in an image table using an out-of-range index, possibly causing an addressing exception. Each of these operations must be considered separately. o It seems sensible to stipulate that T'BASE'POS returns the position number of its improper argument. This is trivial to implement, avoids the need for any checks, and produces a sensible, potentially useful result. o For the application of T'BASE'SUCC or T'BASE'PRED to an improper value, it seems reasonable to return the actual successor or predecessor (which is improper except for the predecessor of the first improper value). However, this is inconsistent with the rule that T'BASE'SUCC(T'LAST) must raise CONSTRAINT_ERROR. T'BASE'SUCC(X) could be made to raise CONSTRAINT_ERROR for all improper values X simply by changing the check from "if X = T'LAST ..." to "if X >= T'LAST"; however, to require T'BASE'PRED to raise CONSTRAINT_ERROR for all improper values requires the introduction of another comparison ("if X = T'FIRST or else X > T'LAST ..." in place of "if X = T'FIRST ..."). o The value for T'BASE'IMAGE applied to an improper value could be made implementation defined, just as for a nongraphic character, or the implementation could be required to raise CONSTRAINT_ERROR. In either case, T'BASE'WIDTH should probably be defined to return the maximum image length of all PROPER values, i.e., the same value as T'WIDTH. o A test for membership in the first named subtype T must return FALSE for improper values. 
o The relational operators should be defined to return a result based on comparison of position numbers. This will ensure that the three relations

     X in T
     X in T'FIRST .. T'LAST
     T'FIRST <= X and X <= T'LAST

  remain equivalent, and that they return FALSE for improper values.

The discussion up to this point has ignored the issue of enumeration representation clauses. In fact, everything said so far applies as well to types with enumeration representation clauses, provided that improper values are assigned position numbers following T'POS(T'LAST). For example, given the declarative items

    type T is (A, B, C);
    for T use (1, 2, 4);
    for T'SIZE use 3;

position numbers might be assigned to bit patterns as follows:

    bit        position
    pattern     number      value
    -------    --------     ----------
      000         3         (improper)
      001         0         A
      010         1         B
      011         4         (improper)
      100         2         C
      101         5         (improper)
      110         6         (improper)
      111         7         (improper)

It is already the case in Ada 83 that for types with enumeration representation clauses, position numbers do not correspond to bit representations. Rather, the 'POS and 'VAL functions can be implemented as tables mapping bit representations to position numbers and position numbers to bit representations, respectively. Operations like 'SUCC and 'PRED and the computation of an address for an array component indexed by the enumeration type can then be implemented in terms of 'POS and 'VAL:

    T'SUCC(X) = T'VAL(T'POS(X)+1)   if X /= T'LAST
    T'PRED(X) = T'VAL(T'POS(X)-1)   if X /= T'FIRST
    A(X)'ADDRESS = A'ADDRESS + T'POS(X)*storage_units_per_component

However, RM 13.3(4) requires that the bit patterns corresponding to an ascending sequence of position numbers be in ascending order. The proposed scheme for assigning position numbers to improper values violates this property. Consequently, under the proposed rules, the comparison X < Y must be implemented by the comparison T'POS(X) < T'POS(Y) rather than by a direct comparison of bit patterns as in Ada 83.
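For the example type T, the 'POS and 'VAL tables, the position-number comparison, and a 'SUCC with the check strengthened to raise for improper values can be sketched in C. This is a simulation of one plausible compilation strategy, not any particular compiler's output; the -1 return value merely models the raising of CONSTRAINT_ERROR.

```c
#include <assert.h>

/* type T is (A, B, C);  for T use (1, 2, 4);  for T'SIZE use 3;
   Improper bit patterns receive the position numbers after
   T'POS(T'LAST), following the assignment tabulated above.       */
static const int pos_of[8] = { 3, 0, 1, 4, 2, 5, 6, 7 };  /* T'POS */
static const int val_of[8] = { 1, 2, 4, 0, 3, 5, 6, 7 };  /* T'VAL */

enum { T_POS_LAST = 2 };        /* position number of C, the last
                                   proper value                    */

/* X < Y must compare position numbers, not bit patterns. */
static int t_less_than(int x, int y) { return pos_of[x] < pos_of[y]; }

/* Membership in the first named subtype: TRUE only for proper values. */
static int t_in_first_subtype(int x) { return pos_of[x] <= T_POS_LAST; }

/* T'BASE'SUCC with the check strengthened from "= T'LAST" to
   ">= T'LAST" (in position-number terms), so that every improper
   value raises as well as T'LAST itself.                          */
static int t_succ(int x)
{
    if (pos_of[x] >= T_POS_LAST)
        return -1;                     /* raise CONSTRAINT_ERROR   */
    return val_of[pos_of[x] + 1];      /* T'VAL(T'POS(X) + 1)      */
}
```

For example, t_succ applied to bit pattern 001 (A) yields 010 (B), while both 100 (C) and the improper pattern 000 raise.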
However, assigning all improper values higher position numbers than all proper values ensures that all proper values are LOGICALLY contiguous: By starting at T'FIRST and repeatedly applying T'SUCC, one encounters a sequence of proper values ending at T'LAST. The effect of iteration over the first named subtype T and the use of T as an index subtype are well defined, and the three relations

    X in T
    X in T'FIRST .. T'LAST
    T'FIRST <= X and X <= T'LAST

remain equivalent ways of testing that X is proper. This is precisely what we want: The slightly cheaper comparison and range check made possible by RM 13.3(4) come at the cost of giving arbitrary answers for improper values. (Since the proposed position-number assignment scheme for improper values would violate the property guaranteed by the rule in RM 13.3(4), there would be no advantage to retaining that rule. Programmers would be able to make good use of the freedom to list enumeration values in an order that is appropriate for the application, regardless of the underlying representation.)

4.3. IMPROPER VALUES OF FLOATING-POINT TYPES

Some architectures have floating-point formats that are not compact. The effect of attempting to execute a floating-point operation on an invalid bit pattern may range from producing an arbitrary result to setting a flag to trapping. The following Ada rules provide maximum flexibility:

1. An arithmetic operation applied to an improper floating-point value either raises CONSTRAINT_ERROR or yields an arbitrary (possibly improper) value as a result.

2. A comparison applied to an improper floating-point value either raises CONSTRAINT_ERROR or yields an arbitrary (but proper) BOOLEAN result.

It is straightforward to implement these rules on a floating-point processor implementing the IEEE standard 754 if NaN's and infinities are regarded as improper values. (This is not to say, however, that the Ada rules then implement the IEEE standard.
For example, Ada rules require division by zero to raise an exception rather than yielding an improper value whose representation happens to be an IEEE infinity.) Tests for the validity of a floating-point representation can be provided by short assembly-language routines. Checks for improper floating-point values may be somewhat more expensive than checks for improper values of other scalar types, but not inordinately so. Such checks need be performed only rarely. Checks are required (and clearly desired, despite the cost) for a test for membership in the first named subtype or in a qualified expression, though of course they may be optimized away if the compiler can establish that the checks always succeed. Checks are NOT required upon assignment of a floating-point value, upon the passing of a floating-point value as a parameter, or upon use of a floating-point value in an arithmetic operation (provided that any resulting trap can be converted to a raising of CONSTRAINT_ERROR). Furthermore, at least for types T for which T'MACHINE_OVERFLOWS is true, no arithmetic operation applied to proper operands can yield an improper result. This facilitates the determination that certain variables have confirmed values, allowing checks on such variables to be optimized away.

4.4. IMPROPER VALUES OF FIXED-POINT TYPES

Like integer types, fixed-point types present no difficulties. The rules of Ada 83 already stipulate that a fixed-point type declaration introduces an anonymous base type (derived from a predefined fixed-point type) and a first named subtype constrained by the real type definition. In typical implementations, the anonymous base type is already compact.

4.5. IMPROPER VALUES OF ARRAY TYPES

Our treatment of array types is based on the following two principles:

1. An array value may be proper even if it includes some improper component values.

2. "Dope" describing the bounds of an array should not be regarded as part of an array value or an array object.
Rather, it is bookkeeping information that may be stored adjacent to the object or elsewhere.

Principle 1 is based on the observation that it is common for data structures to include composite variables with some uninitialized components, along with auxiliary information indicating which components have meaningful data. Familiar examples are stacks, text buffers, and circular queues. Principle 2 reflects the thinking in unapproved issues UI-0063 and AI-00556. UI-0063 would stipulate that dope vectors do not participate in unchecked conversion to or from an unconstrained array subtype. Given the declarations and pragma

    subtype INDEX is INTEGER range 1 .. N;
    type A is array(INDEX range <>) of C;
    pragma PACK(A);

AI-00556 would require the length clause

    for A'SIZE use N*C'SIZE;

to be accepted, from which it follows that the dope vector for an object of type A is not included in A'SIZE. It follows from these principles that improper values of an array type can only arise in the following ways:

o If Ada 9X allows array types with discriminants, array values with invalid discriminant values (i.e., proper or improper values that do not belong to the declared discriminant subtype) would be improper. Similarly, array values whose lengths are inconsistent with their discriminant values would be improper.

o For array types with padding between components, if the implementation attempts to maintain certain properties for padding bits (typically that all padding bits are zero, to facilitate comparisons of whole array values by block comparisons), an array value that violates such properties could be regarded as improper.

(For an instantiation of SEQUENTIAL_IO or DIRECT_IO with an unconstrained array type, an actual parameter with particular bounds is passed to a call on READ; these bounds are never altered based on dope in the file.
If unchecked conversion to an unconstrained array type is supported, the number of components in the result may be inferred from the number of bits in the operand, but not from any information contained in those bits; presumably an implementation supporting such conversions would construct its own dope vector for the result, using implementation-defined rules for selecting a lower bound and, in the case of multidimensional unconstrained array types, for selecting how the number of components should be factored into lengths for each dimension.)

It seems clear, given the declarations

    type LINE(LENGTH : NATURAL := 0) is array (1 .. D) of CHARACTER;
    subtype CARD is LINE(LENGTH => 80);
    L : LINE;

that the membership test L in CARD should return FALSE if L.LENGTH has a negative value (because, for example, L was obtained by unchecked conversion of bad data or from a call on READ). That is, the membership test should check for values that are improper because of invalid discriminant values. Since a programmer may want to validate externally generated data without checking for a particular discriminant value, the membership test for the first named subtype, such as

    LINE_POINTER.all in LINE

should also check for improper values. From this decision to acknowledge that improper array values may arise, it follows that--in theory--a check should be made for invalid discriminant values before an indexed component or slice of an array with discriminants is evaluated. In practice, this check can almost always be optimized away, because the initial value of a declared or allocated array is a proper value and the normal operations on arrays preserve proper values. It is only when a whole array is updated with the result of an unchecked conversion or by a call on READ, or, in the case of aliased arrays with discriminants, when the address of the array has been taken and used for unknown purposes, that improper array values may arise.
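The discriminant checks that the membership tests must perform can be sketched in C. The record layout and the MAX_LENGTH bound below are illustrative assumptions, not a prescribed representation of type LINE.

```c
#include <assert.h>
#include <string.h>

/* Models LINE(LENGTH : NATURAL := 0); a fixed maximum layout is
   assumed here purely for illustration.                          */
#define MAX_LENGTH 132

struct line {
    int  length;               /* the discriminant                */
    char text[MAX_LENGTH];     /* components up to the maximum    */
};

/* The membership test L in LINE: FALSE when the discriminant lies
   outside its declared subtype, e.g. a negative LENGTH obtained
   by unchecked conversion of bad data or from a call on READ.    */
static int line_is_proper(const struct line *l)
{
    return l->length >= 0 && l->length <= MAX_LENGTH;
}

/* L in CARD additionally checks the constraint LENGTH => 80. */
static int line_in_card(const struct line *l)
{
    return line_is_proper(l) && l->length == 80;
}

/* Arbitrary external bytes reinterpreted as a LINE, in the manner
   of UNCHECKED_CONVERSION or READ.                               */
static struct line from_external(const unsigned char *bytes)
{
    struct line l;
    memcpy(&l, bytes, sizeof l);
    return l;
}
```

A whole-array (here, whole-record) update from external bytes is exactly the rare case in which the check cannot be optimized away: the discriminant must be validated before the components are indexed.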
A compiler can apply a simple strategy to ensure that unconfirmed array values do not propagate across subprogram calls: Any unconfirmed array value passed as an in or in out parameter should be checked by the caller some time before the call; any unconfirmed array value assigned to a formal parameter of mode in out or out should be checked by the called subprogram after it is assigned and before any operation that may raise an exception or return to the caller. This allows the compiler to treat all in and in out formal parameters as having confirmed values at the start of a subprogram, and all in out and out actual parameters as having confirmed values at the end of a procedure call (even if the call propagates an exception), making it easy to treat most array values occurring in a program as confirmed.

It is not clear that arrays with incorrect padding should be regarded as improper, because padding is an implementation concept with no manifestation in the abstract semantics of the language. Rather, it can be argued that an implementation that uses some internal invariant to expedite particular operations should be responsible for preserving that invariant. Specifically, an implementation that zeroes padding bits to facilitate fast array comparisons must not assume that data coming in from such sources as unchecked conversion or calls on READ is properly padded. The implementation must zero the padding bits of suspect arrays before performing the comparison, or else it must use a slower component-by-component comparison that does not depend on the invariant. A compiler can keep track of which arrays are known to be properly padded and which are suspect, and resort to such special measures only in a few rare cases. Nonetheless, a membership test on an array should not return FALSE by virtue of having arbitrary padding bits; a sequence of bits with nonzero padding should be viewed as an alternative representation of a proper value.
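The two comparison strategies just discussed can be contrasted in C; the struct below is merely a stand-in for any composite type whose representation contains padding between components.

```c
#include <assert.h>
#include <string.h>

/* On typical ABIs this layout contains padding bytes after c. */
struct rec { char c; int n; };

/* Component-by-component comparison: slower, but independent of
   whatever the padding bits happen to contain.                   */
static int rec_equal(const struct rec *a, const struct rec *b)
{
    return a->c == b->c && a->n == b->n;
}

/* Block comparison: valid only under the invariant that padding
   is zeroed.  Suspect data (from READ, unchecked conversion, an
   address clause) must first be re-zeroed, or compared component
   by component instead.                                          */
static int rec_equal_block(const struct rec *a, const struct rec *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Two values with equal components but different padding contents compare equal under rec_equal, while rec_equal_block may report them unequal; per the text, such bit sequences should nonetheless be viewed as alternative representations of the same proper value.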
If incorrect padding is not viewed as making a value improper, and if discriminants are not allowed for array types in Ada 9X, there will be no improper values of array types.

4.6. IMPROPER VALUES OF RECORD TYPES

The issues of invalid discriminants and invariants maintained for padding bits arise for record types as well as for array types. In addition, records are permitted to contain "implementation-dependent components" (dope information) as part of the record value (see RM 13.4(8)); the tags of Ada 9X tagged types could be regarded as language-defined hidden components analogous to these implementation-defined components. Finally, the bits representing a record value may, in some implementations, include pointers to composite components controlled by discriminants, with the components themselves located in a different part of storage. By the same arguments that apply to array types, and for the sake of consistency with array types, the following rules should apply to improper values of a record type:

o Membership tests should yield the value FALSE and qualified expressions should raise CONSTRAINT_ERROR for values with invalid discriminants or invalid tags.

o Membership tests should yield the value FALSE and qualified expressions should raise CONSTRAINT_ERROR for values with invalid implementation-defined components, to the extent that there is a straightforward test for the validity of such components. (Implementation-defined components that are pointers to or offsets of data stored elsewhere may be particularly difficult to validate.) An implementation should document the meaning of its implementation-defined components and the extent to which each is validated. If an improper value of a record type violates some property that is required of implementation-defined components but cannot be validated, use of that improper value should render execution erroneous.
o Like array values, record values should not be considered improper by virtue of the contents of their padding bits.

4.7. IMPROPER VALUES OF ACCESS TYPES

Improper access values can be particularly devastating. Besides the usual sources of improper values, unchecked deallocation has the effect of making proper access values improper. Thus flow analysis is useless in tracking the propagation of improper access values. Unapproved uniformity issue UI-0060 would require a membership test for an access value to return FALSE under certain circumstances:

    If T is an access type and the result of the simple expression [in
    the lefthand part of a membership test] is recognized by the
    implementation to be invalid as a representation of a value of
    type T, the membership test must return FALSE. Otherwise, if the
    type mark of the membership test imposes a discriminant constraint
    and the result of the simple expression designates an object whose
    discriminants do not contain valid values of the corresponding
    discriminant subtypes, the membership test must return FALSE;
    similarly, if the type mark imposes an index constraint and the
    result of the simple expression designates an object whose
    internal descriptor information is not of the form required by the
    implementation, the membership test must return FALSE. Otherwise,
    if the type mark of the membership test imposes a discriminant or
    index constraint and the result of the simple expression
    designates an object that does not satisfy that constraint, the
    membership test must return FALSE. Otherwise the membership test
    must return TRUE, regardless of the contents of the designated
    object.

    An implementation must document in Appendix F the circumstances
    under which the result of the simple expression is recognized as
    invalid for access type T and the expected form of the internal
    descriptor information for designated array objects.

These rules seem reasonable for Ada 9X.
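The cascade of cases in UI-0060 can be sketched in C. The misalignment test below merely stands in for whatever implementation-defined recognition of invalid access representations Appendix F would document, and the CARD-like constraint is borrowed from the earlier LINE example; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct card { int length; char text[80]; };  /* like LINE(LENGTH => 80) */

/* Sketch of the UI-0060 membership test for an access value, say
   "P in CARD_POINTER".  Each step mirrors one clause of the rule.  */
static int in_card_pointer(const struct card *p)
{
    if (p == NULL)
        return 1;   /* null belongs to every access subtype           */
    if (((uintptr_t)p % _Alignof(struct card)) != 0)
        return 0;   /* recognized as an invalid representation        */
    if (p->length < 0)
        return 0;   /* discriminant outside its declared subtype      */
    if (p->length != 80)
        return 0;   /* designated object fails the constraint itself  */
    return 1;       /* TRUE regardless of the component contents      */
}
```

Note that the designated components are never examined; only the discriminant (and, for index constraints, the descriptor information) enters into the test.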
The contents of the designated value enter into the membership test for an access value only in those cases where the discriminants or bounds of the designated value are examined as part of the test for satisfaction of a constraint on an access subtype; a pointer to an object in which this information is invalid must be considered not to satisfy the constraint. Like any composite value, the value of a composite variable designated by an access value may be proper even if it has improper components. Defining an access value to be improper if it designates a composite with improper components would be particularly inappropriate: In the case of a recursive data structure, this could lead to infinite recursion in the absence of some marking scheme.

4.8. IMPROPER VALUES OF TASK TYPES AND PROTECTED TYPES

The generic formal parameters of SEQUENTIAL_IO and DIRECT_IO are generic formal private types, not limited private, so these generic packages cannot be instantiated with task types and protected types. However, the generic formal parameters of UNCHECKED_CONVERSION are generic formal limited private types, so this restriction is easily circumvented:

    task type T is
       entry E;
    end T;

    type T_SHADOW is array (0 .. T'SIZE-1) of BOOLEAN;
    pragma PACK(T_SHADOW);

    function BONA_FIDE_TASK_OBJECT is
       new UNCHECKED_CONVERSION(T_SHADOW, T);

    package RUSSIAN_ROULETTE is new SEQUENTIAL_IO(T_SHADOW);

    SHADOW : T_SHADOW;
    F      : RUSSIAN_ROULETTE.FILE_TYPE;
    ...
    RUSSIAN_ROULETTE.READ(F, SHADOW);
    BONA_FIDE_TASK_OBJECT(SHADOW).E;

The effect of the entry call is, of course, unpredictable: There is a remote chance that it will actually work.
(Other ways in which arbitrary bit patterns can be placed into a task object or a protected object include passing an access value designating that object to a non-Ada subprogram, using unchecked conversion to convert such an access value to another access type whose designated type is not limited, or using an address clause to locate the limited object at a storage location that is altered by some hardware or software agent outside of the Ada program.)

It is unreasonable to require the implementation to check the consistency of a task object's representation before using the task object. Furthermore, improper values of a task type or a protected type can be created only by elaborate subterfuge; they are not likely to arise by accident. Therefore, the following rules should be adopted:

1. Every task object and protected object has a proper initial value. (RM 9.2(2) effectively says this already for task types.)

2. All manipulations defined for task types and protected types, when applied to a task object or protected object with a proper value, leave a proper value in that object. ("Manipulation" includes invocation of a protected operation or a task entry, evaluation of attributes of the object, abortion, and invocation of the subprograms proposed in the real-time annexes for manipulation of tasks and priorities.)

3. If a task object or protected object is somehow given an improper value, any manipulation of that object (in the sense of Rule 2) is erroneous.

It is a consequence of these rules that if a programmer writes

    if BONA_FIDE_TASK_OBJECT(SHADOW) in T then
       BONA_FIDE_TASK_OBJECT(SHADOW).E;
    end if;

the membership test may be optimized away. (It would be straightforward to test for invalid discriminant values for task objects or protected objects with discriminants. However, invalid discriminant values could only arise in circumstances where the object might also be corrupted in less easily characterized or detected ways.
Therefore, a check for valid discriminants alone is of no practical use.)