Ada 9X Language Study Note LSN-038-DR Undefined Scalars in Ada 9X B A Wichmann, National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK E-mail: baw@seg.npl.co.uk March 25, 1992 1 Introduction The purpose of this note is to analyse the problem that undefined scalars can raise in Ada 9X in order to formulate rules to control the effects they may have. The note uses the nomenclature of Ada 83, rather than that of Ada 9X, since, obviously, Ada 9X has not yet stabilised in this area. These notes have depended upon substantial e-mail discussion by the Ada 9X Distinguished Reviewers and the Ada 9X Mapping Team. See the acknowledgement section for details. Although many direct quotes are made from the e-mail material, I have had to edit this slightly to remove typographical errors and also to remove a reference to the source when this could be confidential (and not necessary to make the point). The essential input to this note is the Ada 9X Requirements [1]. Detailed knowledge of the Ada 9X Mapping Document [2] is not required, so that this note can be read by those familiar with Ada 83 alone. 2 The Problem Statement In contrast with `C', relatively few Ada constructs can lead to the situation in which the execution of a program is unpredictable, even though the program compiled without errors. This important advantage of Ada means that relatively few program executions result in a `core-dump', `bus-error' or the like. On the other hand, the Ada language requires substantial run-time checks, which vendors like to minimise to retain a competitive edge with their products. The requirement to perform well on benchmarks, perhaps against `C' and FORTRAN equivalents, implies that the checks should be minimised. It is thus tempting for vendors to design their compilers by assuming that the program being compiled is not erroneous, since many checks could be removed under such an assumption. Unfortunately, almost all large programs are likely to be erroneous, perhaps through access to an undefined scalar, which is the subject of this note. Hence the Ada 9X standard needs to address the problem of the execution of erroneous programs, although, as we shall see, this could be restricted to the safety/security annex. The difficulty with erroneous programs lies in detecting this property --- if the property were easy to detect, it should be added as a restriction to the language, thus avoiding the problem completely. 2.1 Sources of Undefined Scalars This note considers not just the classic case of an uninitialized scalar, but the following list of `undefined' scalars: 1. Objects not initialized on declaration. This is thought to be a common form of programming error. Obviously, a programming style can avoid this particular case by using initialization on declaration, but this is confusing in those cases in which the value cannot be set to a logically meaningful value. 2. Access to components which have not been initialized. Since it is not erroneous (in Ada 83 terms) to copy unset composite values, it is very hard to check for the absence of this form of undefined value. 3. Results of an instantiation of UNCHECKED_CONVERSION. The RM does not require that such values are checked as being legal bit patterns of the subtype; on the other hand, such a check can be made. 4. Unsynchronized update. This is a much worse problem. For instance, it is possible to obtain illegal access values by such means, resulting in the proverbial core-dump.
In consequence, this method of obtaining undefined scalars is regarded as being out of scope of this note. Included within this case is the abort of a task while it is updating a variable. 5. External events. For many embedded systems, incoming information can be by Direct Memory Access (DMA). Note that we are assuming that this is properly synchronized with the Ada program. 6. Use of machine-code or pragma INTERFACE. The use of either feature cannot be avoided in many applications and therefore the consequences must be clear. 7. Use of pragma SUPPRESS. Here, the user has explicitly overridden the checks otherwise required by the language. It does not seem appropriate to cover this case. 3 Discussion The problem with undefined scalars is not the source of such values, but their subsequent use. Two views of undefined objects need to be distinguished: computational access, and access to the bits of the underlying representation (via an instantiation of UNCHECKED_CONVERSION). The problem with scalars can be contrasted with access types. Here, the default initial value and the fact that access values can only be used in restricted contexts imply that access type values are relatively safe (UNCHECKED_CONVERSION to an access type is obviously unsafe, as is unsynchronized update). 3.1 Scalars It is convenient to divide the discussion into the various classes of scalars concerned. 3.1.1 Subtypes Range constraints provide a powerful means of enforcing invariants upon values in an Ada program. The problem here is that a compiler can assume this invariant is satisfied. But this assumption fails if the scalar is uninitialized. Hence the danger arises if the compiler assumes the program is not erroneous. 3.1.2 Integers On very many architectures, an object of type INTEGER occupies a word. All the bit patterns that the word can contain correspond to valid values of the type. In consequence, it is tempting to consider an uninitialised value as just being a `random' value of the type. Unfortunately, this simple model can easily fail. For instance, a code generator could decide to equate the integer with a register. However, the register could be over-length, thus allowing values outside the type to arise via uninitialized variables. This situation is not uncommon. For example, many 32-bit machines have Ada compilers which support 16-bit arithmetic via overlength registers. A `feature' of overlength registers is that the test X: T; ... if X in T then can fail (if the compiler does not remove the test!). 3.1.3 Booleans Boolean values are important because they are built into code generators to provide a highly optimized implementation. Also, since Booleans are often handled in the processor as integers, the problem noted above with overlength registers applies. Furthermore, Boolean values can easily migrate between different registers, so that no meaning can be given to a `machine representation'. Programmers do expect Boolean values to be either TRUE or FALSE. However, the corresponding tests could fail, which in turn could lead to other unexpected behaviour. For an example of this written in Pascal, see Appendix A. 3.1.4 Enumerated types with holes The example given above with overlength registers is rather similar to the stored values being a `subtype' of the register values. This situation can therefore be detected with conventional constraint checks. However, an enumerated type with a representation clause requiring holes in the value set poses a more severe problem.
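As a purely illustrative sketch (the type, its representation codes, and the procedure name below are invented for this note), such a type and an explicit membership test on it might look as follows; the comments indicate why a complete check is more than a simple range comparison:

   procedure HOLES_DEMO is
      type COMMAND is (STOP, SLOW, CRUISE, FAST);
      for COMMAND use (STOP => 1, SLOW => 2, CRUISE => 4, FAST => 8);
      -- The codes 0, 3, 5, 6 and 7 fit in the object but denote no
      -- literal of the type: they are the `holes'.
      C : COMMAND;                 -- never assigned, hence undefined
      procedure CONSUME (X : COMMAND) is
      begin
         null;                     -- stand-in for real processing
      end CONSUME;
   begin
      if C in COMMAND then         -- a complete legality check must confirm that
         CONSUME (C);              -- the stored code is one of 1, 2, 4 or 8, not
      end if;                      -- merely that it lies in a range; Ada 83 does
   end HOLES_DEMO;                 -- not require the holes to be detected here.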
The check that an arbitrary bit-pattern does correspond to a legal value could be quite expensive. Moreover, Ada 83 does not require this test to be undertaken. The operation of constructing an index value for array component selection when such a type is used as an index is rather different. No hard figures are available, but it seems wisest to assume that the legality check for such values is too expensive to mandate in the core language (i.e., so that the performance penalty would become the default). 3.1.5 Floating point These values are somewhat different, since constraint checks can rarely be optimized away and operations like indexing and case selection do not apply. Robert Dewar writes: ... In particular, I see no reason at all to defend against undefined values in the case of floating-point; no disasters can happen as a result of bad floating point values, and it makes perfect semantic sense for assignment to be just a bit copy. On most classical machines (without hidden bit representations), there are bit patterns which cannot be produced as a result of the floating point operations. However, from a computational viewpoint, these values behave just like other legal values. In consequence, they do no damage. For IEEE systems, a floating point bit-pattern could correspond to a signaling NaN. Loading such a value into a floating point register would then cause a trap (in IEEE terms, assuming the trap was enabled). This is a perfectly well-defined behaviour, which the Ada run-time system could convert into the CONSTRAINT_ERROR or PROGRAM_ERROR exception. I therefore conclude that undefined floating point values do not appear to present an insecurity. Implementations should be required to behave as if scalar floating point variables have a value, or raise CONSTRAINT_ERROR. 3.1.6 Fixed point The conventional implementation of fixed point is via integer hardware. In view of this, it appears that fixed point raises no new issues. (The AIs concerned with the representation of fixed point values make the implementation of fixed point via floating point ineffective, and hence this could be excluded in Ada 9X.) 3.2 Constants Since in Ada constants are objects whose value is set at scope entry on initialization, one can get uninitialized constants. Within the scope of such a constant, reasoning about the program will be impossible, even ignoring the fact that the program is then erroneous! 3.3 Propagation Numerous problems arise once an uninitialized value is used. Bevin Brett provides the following example: X : array(BOOLEAN range FALSE..FALSE) of INTEGER; Y : POSITIVE; Z : INTEGER := X( Y < 0 ); The bounds of Y are 1 and INTEGER'LAST, both greater than zero, so Y < 0 can be deduced to lie in FALSE..FALSE and no index check is required. Here, the optimizing compiler does not even reference the unset value -- since the effect of the program can be predicted without such a reference. Another problem is with: A := A; -- a no-op? If the value of A has not been set, then perhaps the constraint check should be applied? Ada 83 does not require such a check. Bob Duff gives another nice example: A related problem is with dope. For example, subtype S is STRING(1..UNINIT); Then, S'LAST is undefined, just as with constants. The problem with propagation is that once the program is erroneous in the technical sense of Ada 83, it is very hard to make any statements about the program which are likely to be valid across diverse implementations.
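To make the propagation concrete, the following self-contained sketch (the names are invented for this note) combines the uninitialized-constant and dope problems just described; once UNINIT is read, the subtype, the constant and every later use of them are tainted:

   procedure PROPAGATION_DEMO is
      UNINIT : POSITIVE;                      -- never assigned, hence undefined
      subtype S is STRING (1 .. UNINIT);      -- the dope of S is now undefined
      LAST   : constant NATURAL := S'LAST;    -- an "uninitialized" constant
      BUFFER : S := (others => ' ');          -- even the size of this object is unknown
   begin
      BUFFER (1 .. LAST) := (others => '*');  -- no reliable statement can be made
   end PROPAGATION_DEMO;                      -- about the effect of this assignment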
On the other hand, given a large program (say, 100,000 lines or more), checking that it is not erroneous is unlikely to be a practical task. (I would estimate that showing it is not erroneous could cost as much at the rest of the program development.) 6 3.4 Incorrect Order Dependence This issue seems much less pernicious than erroneous execution, since the Ada 9X strategy of merely requiring the order to be defined by the implementation solves the major concern. Obviously, the requirement to define the order could be restricted to those systems implementing the safety/security annex. 3.5 Code Generation Issues Once a program is technically erroneous in the Ada 83 sense, it is hard to separate out the issues of machine architecture and the strategy used in the compiler, especially for optimization. Bevin Brett quotes the following nasty cases: (a) case UNINIT is when ... end case; Implemented via a table of addresses, results in jump through trash. (b) if B then Implemented via a table of addresses, results in jump through trash. (c) subtype S is INTEGER range 5..5; type R(D : S := 5) is record C : STRING(1..D); end record; X : S; Y : R(X); could result in accessing outside the allocated area of Y, if the compiler allocated exactly 9 bytes for all values of type R. (d) continuing (c) function F return R is begin return (X, (1..X=>' ')); end; 7 could access outside the temporary passed in to return the function result in. (y) X in ENUMERATION_BASE_TYPE could give FALSE (z) X : INTEGER range 0..5; X rem 6 could give a value outside 0..5 Art Evans provides another example: Let me add another: as a subscript in an entry call to an entry family. Tucker Taft writes about the Mapping Document: We are only talking about what checks may be eliminated by an optimizer, not about adding new checks. But in practice, an existing program may see more checks remaining, presuming they were accustomed to a given optimizer removing a certain number of checks. ... As far as safety/securing annex versus the core, we would much prefer to minimize erroneousness in the core, and then allow programmers to ``add it back" with variants of pragma SUPPRESS, than to leave erroneousness in the core, ready to trip up naive programmers. A compiler writer states: We have only recently found out what a truly aggressive optimiser will do to range checks. The interactions of (a) subrange based check elimination, (b) symbolic range-based expression evaluation, (c) wider-than-based- type-arithmetic, (d) dead-code-elimination worked an interesting set of conclusions on the ACVC! Robert Dewar writes: 8 One thing in this discussion is that it is important to avoid being intimidated by compiler writers who will tell you that if we restrict compilers by not letting them do strenuous optimization, then the quality of the code will be significantly damaged. This is nonsense, but almost all compiler implementors fall prey to this excessive rhetoric. In fact, most optimizations are disappointing, and you should only be swayed by this argument if it is accompanied by VERY SPECIFIC data for EXACTLY THE CASE UNDER DISCUSSION. For example, if someone really writes two compilers, which differ only in their treatment of undefined variables, then comparative performance of these two compilers is relevant. I know that compiler writers will squawk at the last paragraph, but it corresponds to my consistent experience in many compiler projects in which I have been the principle author of the code generator. 
One nice example of this general principle is the embarrassment being caused by the GCC compiler (the free C compiler from FSF). There are many cases in which it is outperforming compilers that supposedly do all sorts of wonderful global optimizations (GCC has a relatively straightforward view of optimization). Anyway, my plea here is that we be very careful to listen to users, and indeed in this case, I would give much more weight to users views than implementors views. Furthermore, if you want users to qualify their opinions taking into account performance impact, give them real data, not bogus, inflated, unsubstantiated claims such as ``that will have terrible effect on performance". Jean Ichbiah refers to such non-quantized technical arguments as ``technical terrorism", and I think the term is apt! In terms of competition with other languages, I think that Ada programmers precisely expect more safety from Ada than from C, and are willing to pay for it when checks are on. Furthermore, I don't think you will find C programmers too enthusiastic about: if (1 <= a && a <= 10) printf ("%d",a); outputting an 11 ! 9 A compiler vendor writes: The ability to use shared code generators for C, Fortran, and Ada is the ONLY hope that Ada has of keeping up with those languages in generated code quality and also of quick availability on new architectures. If we define Ada 9X in this area in a manner that is very different to C and Fortran, we may well end up `killing the goose that lays the golden eggs'. Robert replies: ... we can't accept a general argument which says: My common code generator does not accommodate X. Therefore Ada-9X cannot contain X. A compiler vendor writes: Our compiler has several modes of optimization; all of these are safe (in the sense that checks are eliminated only if they are redundant in any possible program) except for `checking optimizations'. This seems to provide roughly the right amount of optimization control for most uses. My own view on optimization is that average performance gains are modest, once the obvious issues have been handled. However, individual test cases can give the appearance of a very different picture. This often forces a supplier to undertake an optimization even though the gains are modest on almost all programs. 3.6 User Expectations Robert Dewar writes: I think this issue is not at the forefront of discussion because most Ada programmers are unaware of the issue. At the SIGAda meeting in Seattle a couple of years ago, I took a little poll. I wrote up the assignment statement: A(I) := 0; and the proposition: 10 In Ada with checks on, subscript checking is performed that ensures that a statement like this cannot clobber a memory location outside the array. and asked how many agreed. The result was interesting -- EVERYONE in the room, except for me and Bob Eachus, agreed with this. It is of course false, since the erroneousity arguments can permit the omission of the check in the undefined case (although that is not a 100% clear conclusion from the RM, it is certainly one that some implementors make). The audience was plainly shocked (and probably a little incredulous) at my claim that Ada did not require subscript checking. In fact implementors differ substantially on this issue, some omit the check only if they are sure the subscript is defined and in the right subrange, some ignore the defined condition. ... 
I think if the user base was fully informed on this issue, and if there are more and more compilers that do not do reliable subscript checking, and that sooner or later we have a disaster tracable to this, you will find that the market place is plenty interested. and continues: Ah, the innocence of the uninitiated is charming! This maybe 100% clear to some but it is in fact a very difficult issue on which no clear agreement has been reached, although a lot has been written. The issue is whether a compiler is allowed to assume that the program execution is non-erroneous, and base an arbitrary chain of logical reasoning on this premise. Now Steelman specifically forbade such reasoning, but the RM really does not address it. There are two views of erroneousity. The first is that the RM specifies a sequence of actions to be performed, and when one of these actions is erroneous, further actions are undefined. The second is that a compiler can make the assumption given above and do ``back reasoning" from this assumption. 11 Actually, when people hear the first rule, they will usually agree that this is their understanding of the situation, and to non-compiler writers, that's how the RM reads. Consider the statement we are talking about: A(I) := 0; Now the RM clearly says that a check is made that I lies in the index subtype. It does not say that this check is omitted if I is of the correct declared subtype. ... Generally we permit such optimizations if they do not change the semantics. So of course we agree that if I is initialized, and of the correct index subtype, then the check can be omitted. The question is can we omit the check if I is not initialized? Well informally, it sure seems like changing the effect of a program from raising CONSTRAINT-ERROR to bombing the code and causing unimaginable chaos is a change in the semantics. But of course the counter argument is that ``erroneous" includes all effects, so these two effects are equivalent to erroneous. Deciding between the two views of erroneousity is not easy. Neither extreme is comfortable to most people: Never back reasoning from erroneousness would mean that even the checks on: A := B; can never be omitted, even when A and B are the same subtype. This is so because the RM requires the check even in this case (it is part of the assignment operation), and the only justification for removing the check is this erroneousity argument. This seems too severe to most people. Allowing arbitrary reasoning from erroneousness causes some amazing and clearly (to most people) undesirable and unacceptable effects, where early parts of a program are executed completely incorrectly, because of later erroneousness. The literature on this subject is full of examples, but I'll repeat one of my favorites: loop 12 Read (Pass_Word); if Pass_Word = System_Pass_Word then DELETE_SYSTEM_DISK; exit; else BAD_ATTEMPTS := BAD_ATTEMPTS + 1; if BAD_ATTEMPTS > 3 then TERMINATE_RUN; else PUT_LINE ("Bad password, try again"); end if; end if; end loop; Suppose that, due to a plain silly error, BADATTEMPTS did not get initialized. Now under the second view of erroneousness, the compiler can reason as follows: ? If Pass-Word is not equal to SystemPassWord, then the execution will be erroneous. ? Therefore I can do anything I like if this equality does not hold, in particular, I can assume that it does hold. ? Therefore (nice improvement of generated code!) I will omit the check completely. The resulting program simply fails to check the pass word. 
This seems plainly unacceptable. Bryce Bardin writes: The uninitialized variable issue is clearly a complex one, in which a balance must be struck between performance and safety. The following example program was constructed in 1989 to illustrate user's concerns with the general issue of invalid values and to determine the state of the practice for various compilers. As can be seen, the issue is broader than just uninitialized variables. The broader issue was discussed informally a number of times by the ARG, without any clear resolution. In part, I believe the difficulty in reaching a firm conclusion comes from fairly fundamental philosophical differences between users and implementers with regard to the 13 relative importance of speed and safety. (I hasten to add that there is no great unanimity among either users or implementors in this regard.) My own position is based on ``WYSIWYG", tempered by realism. In general, I believe that explicitly written membership tests and if or case statements which check discrete values must not be eliminated if checks are not suppressed and the compiler cannot guarantee that the value being checked is valid. In particular, complete checks must be generated for enumeration types with holes in their representation. I see no reason not to require the checks to be performed on fixed point types, since a simple range check suffices, but do not feel that anything more than a range check should be required for floating point types. If a user requires validation of a composite type, is should be composed from tests of the scalar components. I believe this position, if adopted for Ada 9X, would be consistent with the philosophy of Ada and would not lead to significant performance loss compared with current practice. When performance is the issue, checks should be turned off. In benchmarks against C (and others of its ilk), let's compare apples with apples, not oranges. It is a matter of education by the vendors. We should not compromise our principles in order to win contests based on ignorance. The example program is included in full as Appendix B. Robert Dewar reports: Mike Feldman made the following statement: ``One of the major advantages of Ada is that with a decent Ada compiler (and the commercial compilers by this stage are pretty reliable), you don't get something like `bus error - core dump taken' messages as a the result of an error --- at worst you get an unhandled exception in such cases." People nodded assent, and accepted this observation, as I think most Ada audiences would, as one of the advantages of Ada. Now, luckily, no one chimed in and said: ``That's wrong, as Ada compilers `improve', they will be able to generate (slightly?) more efficient code, and 14 as a result, you will be able to get bus errors etc. as a result of undefined variables, even if checks are on." If anyone had said that, then the audience would, I think, have left a little disillusioned, and certainly lacking one of the strong arguments in favor of Ada. Now if we go back to Mike Feldman's argument, he was making not so much a theoretical statement about the Ada definition, but an empirical statement about Ada compiler behavior. Indeed, using several commercial compilers, including Dec, Alsys, Telesoft, Verdix, RR, Meridian, I have never (with checks on) had one of my programs bomb out in this kind (bus error) of way, and furthermore I have never seen one of my students programs bomb out. Needless to say, the situation with C has been quite different. 
Now, even with the current generation of compilers, there probably exist constructible counter-examples, but neither Mike Feldman, nor I, nor (I would guess) most Ada programmers typically run into them. The question in the future is whether this empirical observation will continue to hold. My concern, in the face of attitudes and claims from enthusiastic compiler optimizers, is that the situation will deteriorate, and people will start to run into trouble. I think it is essential to make sure this does not happen. Protection from runtime disasters is one of the important practical advantages of using Ada, and if it deteriorates, we lose a major advantage over the competition. I don't of course mind having a checks-off mode where you can do almost anything you please in this department, but with checks on, security is much more important than a little bit of advantage from dubious optimization techniques like split-range analysis. Why do I say dubious here? Well, I have an observation from many years of involvement in compiler optimization (actually about 25 years), which is that most compiler optimizations turn out disappointing: you think they will do much more than they actually do (the original experience with FORTRAN-H was typical --- of course IBM could find examples where it made a huge difference --- but in typical applications programs the advantage was surprisingly small). I perfectly well realize that basic RISC optimization techniques are pretty important, but once you get the basic techniques (possibly extended to cross-block scheduling for awkward architectures like Alpha and RS6000), the further gains that can be made are pretty small. ... We have a situation something like this: checks cost a certain amount, I would guess 10-15% is typical (vendors indulge in hype in this area, so one has to be a little careful -- they tend, as is the general wont of the capitalistic spirit, to emphasize best cases). The use of more aggressive techniques can reduce this, or stop it growing, but these techniques may make the effect of undefined variables more serious. Now I think it is clear that users will generally want the guarantee that: if I in 1..10 then A(I) := 0; end if; does not assign to A(11) and bomb (this is a slight variation on the print example). Of course if you tell them that this guarantee will cost them a factor of 100 in the quality of their generated code, then they will hesitate. That's why real performance penalties are significant. Of course I know you can construct examples where you get arbitrarily huge figures, although I think even in this mode, a factor of 100 is pretty hard to achieve unless you have a really horrible architecture (even Alpha isn't that horrible in my book). However, users are not interested in constructed examples, they are interested in usage. I conclude from the above discussion that the only reasonable route is to make quite tight requirements in the safety/security annex. The view I have is that most vendors will implement this annex, so that almost all users will be able to invoke its provisions. This is likely to inhibit the most aggressive optimizations available in some products. 4 Requirements Robert Dewar provides an example: That (``Disaster implies safety critical'') is plain wrong, at least according to my understanding. Safety critical means that safety of human life is at stake and that errors can cause death or injury. Well, there are plenty of other disasters.
Here is one example: A big New York bank had a COBOL program that had an out-of-range subscript. The result was that late at night, the program thought it needed to borrow $24 billion from the Fed. It sent out the message automatically to borrow the money and the Fed obliged. Next morning the bank found that it had a huge bill for interest on its hands. To the guy in charge, I am pretty sure this was a disaster! The above story is amusing, but somewhat out of scope, since this note is only considering the problems raised in the safety and security fields. Over the next few years, I see a drive to attempt to produce ultra-reliable software. By this I mean software which is more reliable than one error in 10^4 demands. This is very difficult, and as Littlewood has pointed out, is hard to justify by classical testing [3]. Another inherent limitation to reliability is microcode faults in chips, but that is out of scope as well! In the area we are considering, two issues are important: to specify Ada 9X to minimise the problems and to reduce the inevitable problem of compiler bugs. These two requirements imply that a simple conceptual model is required (if that is possible!). The fact that users have a nice warm feeling about the checks that Ada systems perform is inadequate if ultra-high reliability is required. Users do not write 10^4 programs and therefore cannot provide the experience needed to justify the claims one would like to make. Erhard Ploedereder writes: Pragmatically speaking, the random jump out of a case table is not so much a problem: the likelihood that this bug would not manifest itself immediately during testing is minuscule. Memory trashes caused by undetected index bounds violations, on the other hand, can go undetected literally for years. We all have our war stories on this one. But let's discuss the real issues here: 1. Should the core language enforce any rules involving the effects of uninitialized variables? 2. If so, how far should one go? For the core language, it seems to me hard to justify any change from the position of Ada 83, since the language has not been heavily criticised in this area. Indeed, Ada is distinctly better than most other languages. However, the position for the safety/security annex is quite different, since it must be assumed that predictable behaviour is a major concern. Also, for critical applications outside the safety and security areas, it should be possible to invoke the annex. From my own soundings, the proposal for the pragma TRACEABLE_OBJECT_CODE (MD 4.0, G.5.3) would appear to inhibit the most aggressive forms of optimization, thus perhaps reducing the scope for unexpected behaviour from (formally erroneous) programs. 4.1 Reasoning about programs We have already noted some cases in which apparently reasonable machine code produced by a compiler can act in unexpected ways in the presence of an undefined scalar. Robert Dewar states: So, back to the example: if I in 1 .. 10 then PRINT (I); end if; I believe that close to 100% of Ada programmers would feel that printing 11 is wrong here, even if I has an undefined value. Erhard Ploedereder writes: One particular error in Robert's argument is that the specific example of if I in 1..10 then Print(I) has nothing to do with the question at hand. Even if all possible constraint checks are done, this could still print 11, if I is uninitialized.
To say that this should not happen is to say that the extensive academic research done on split-range analysis for memory and register allocation should not be applied in practice. The view taken here is that such optimization should not be permitted with the TRACEABLE_OBJECT_CODE pragma. Brett writes: In the safety critical annex, put in a requirement that all fetches from potentially uninitialised variables must have their value checked as being in the subtype of that variable. ... The simplest solution to all these is check-on-fetch and check-on-external-calls. If this buries the safety-critical people in a torrent of constraint checks, they have a simple solution. Use base types that correspond to the hardware, and initialise all variables. The use of base types does not necessarily solve some problems, and does lead to a coding style which is not very Ada-like. See the example from Bryce Bardin for the contrary view. If the constraint checks are indeed a concern, then using pragma SUPPRESS seems quite reasonable. Indeed, one coding style used in safety-critical applications is to show that the program cannot violate constraint checks, and then compile the system with pragma SUPPRESS. One advantage of this approach is that if it is felt necessary to check the machine code generated by the compiler by hand, there is much less of it. 5 Recommendations Here is the relevant paragraph (MS-1.6(6);4.0): Evaluating an uninitialized scalar variable is a bounded error. The possible results are to raise PROGRAM_ERROR (as always), or to produce a machine-representable value (which might not be in the subtype of the variable). Violating a range check raises CONSTRAINT_ERROR, even if the value came from an uninitialized variable. A compiler writer states: Wake up and smell the coffee. If the market isn't demanding it, as is evidenced by the lack of it in most RFPs etc., then it is a mistake AND A BAD ONE to handicap all Ada users by the requirements of a minority for clearly erroneous programs to execute ``safely". It is clearly within the scope of Ada '83 for vendors to provide an option that detects such erroneous programs. We don't need extra rules slowing down EVERYONE'S Ada compilations and Ada execution speeds. Robert Dewar states: My own answer ... is that we should require a subscript check in this situation (store overwrite). I think in practice, since most subscripts are scalars, that the compiler will typically be able to eliminate the check in most cases (by figuring out that the variable must be initialized) and that the extra burden of the check is acceptable. Yes, checking in Ada will make us lose some benchmarks to C if we compare checked Ada with unchecked C, but we have to be prepared to argue that losing such benchmarks shows the superiority of Ada in providing safety (after all, C will lose to assembler all the time -- so what?) Ted Baker writes: It sounds like Ada 9X has to make a choice between cheap optimization and reliable checking. The worst of both worlds would seem to be the situation we have with Ada 83, where the compiler is required to do enough checking to make the code run slower than (say) FORTRAN, but not enough to provide well-defined semantics. Norman Cohen writes: It's obvious to me that ... hit the nail right on the head. The only acceptable solution is to provide a ``safe" mode in which surprising behavior is minimized, even for executions that would be erroneous in Ada 83, and a fast mode in which optimizers have a relatively free hand.
After all, one of Ada's major markets consists of people who avoid C because they appreciate the safety obtained by compile-time and run-time checks, and another one 20 of Ada's major markets consists of hard-real-time programmers. Dual modes are the only way to satisfy both these markets. (Implementers are free, of course, to provide additional intermediate modes, such as one that optimizes as much as possible without changing observable run-time behavior.) For the core language, there is no consensus from the above extracts of discussion on the topic. In the face of aggressive optimization, the wording in the mapping document using the phrase `machine representation' seems inappropriate due to the difficulty in stating clearly the exact semantics. The appeal made by Robert Dewar to outlaw a possible random store overwrite by using an undefined scalar as an index is very persuasive. However, the situations in which a compiler would need to insert additional checks are quite numerous, so that the cost would be significant (although only a small percentage). However, it is agreed that the benefits are currently undetectable, since such store overwrites just do not happen in practice. Hence I conclude that no change from the position in Ada 83 is needed here. Not placing a requirement in Ada 9X against store overwrite does not mean that the position will get out of hand. Ada programmers value the safety that the language provides and so if a user finds that his programs are insecure due to this, he will complain to the vendor. There are many cases where vendors `correct' their compiler although the product was not formally incorrect. The advantage of having a safety and security annex is that requirements can be placed here which would not be acceptable in the core. Hence quite tight requirements should be made, since we know that there is a risk. We can also assume that a performance penalty will be acceptable in the general case. The key issue which the annex must address is one of predictability. Since large programs are inevitably likely to contain undefined scalars, and that reasoning about parts of such programs is important, we require not just the avoidance of a core dump, but `reasonable' behaviour. This must surely include that Boolean must be either true or false (for instance). I therefore make the following proposal: When the safety and security annex applies, the access to a scalar variable shall either raise the 21 CONSTRAINTERROR exception (or produce FALSE in the case of the in operator) or produce the effect of being within the declared subtype. The following points should be noted: 1. The performance cost will be non-trivial but can be reduced by means of pragma SUPPRESS. 2. The method of invoking the requirements of the annex is not specified here --- suggestions welcome! 3. Users can protect their code by means of the in operator knowing that the optimizer will not remove it. 4. It will be possible to reason about programs in the face of undefined scalars. 5. There is no attempt here to produce proper standardese to define the situation properly---all this note does is to outline a possible solution. (Cases like deferred constants need attention.) If this proposal finds general support, then the next stage would be to analyse the situation in greater depth. 6 Acknowledgements Thanks to Bryce Bardin, Bevin Brett, Norman Cohen, Robert Dewar, Bob Duff, Art Evans, Erhard Ploedereder and Tucker Taft for permission to use extracts from their e-mail messages. 
References [1] Ada 9X Project Report --- Ada 9x Requirements. Department of Defense. December 1990. [2] Draft Ada 9X Project Report --- Ada 9X Mapping Document, Volume II Department of Defense. December 1991. [3] B Littlewood and L Strigini. Validation of Ultra-High Dependability for Software-based Systems. (to be published). 22 A Pascal example: Neither true nor false The following Pascal program illustrates the problem of reasoning about programs if an unassigned value is present. { Did you think Booleans must be either True or False? If so, you are likely to have a nasty sur- prise when you run this test. The problem arises with Boolean values which have not been given a value. This is `simulated' in this program by using untagged variant records to assign an integer bit pattern to the storage used for an integer.} { Please send results back to Brian Wichmann, baw@seg.npl.co.uk, giving the computer/compiler used. NB It may be necessary to use compiler options to suppress checking in those implementations which (unusually) check untagged variant records. } program bassign(output); const min = -5; {To define range of integers used.} max = -min; type smallint = min..max; { Could replace this type by 'integer'. } { The following record is used to produce integer bit-patterns as `Booleans'. } unsafe = record case boolean of true: (Bool: Boolean); false: (Int: smallint); end; var Z: unsafe; count: smallint; function Ident(B: Boolean): Boolean; { This function is just the identity: make it more complex if you think you have a clever compiler. } begin 23 if B then Ident := arctan(1.0) > 0.5 else Ident := sqrt(3.5) < 1.0 end; procedure Check(B: Boolean); var X, Y: unsafe; j: smallint; found: Boolean; begin X.Bool := B; write('test 1: '); found := false; for j := min to max do begin Y.Int := j; if X.Bool = Y.Bool then begin found := true; writeln('Underlying value is:', j:2) end; end; if not found then writeln('Underlying value not found'); write('test 2: '); if X.Bool = true then writeln(true:5) else if X.Bool = false then writeln(false:5) else writeln('neither true nor false!'); write('test 3: '); if X.Bool then writeln(true:5) else if not X.Bool then writeln(false:5) else writeln('neither true nor false!'); write('test 4: '); if X.Bool = Ident(true) then writeln(true:5) else if X.Bool = Ident(false) then 24 writeln(false:5) else writeln('neither true nor false!'); write('test 5: '); if Ident(true) = X.Bool then writeln(true:5) else if Ident(false) = X.Bool then writeln(false:5) else writeln('neither true nor false!'); write('test 6: '); if X.Bool in [true..true] then writeln(true:5) else if X.Bool in [false..false] then writeln(false:5) else writeln('neither true nor false!'); end; begin for count := min to max do begin Z.Int := count; Check(Z.Bool) end; end. The output from the program will very rarely produce consistent results without printing the text neither true nor false!. In Pascal and Ada, the values True and False are typically represented by 1 and 0 respectively. In consequence, if the underlying value is either 0 or 1, then the program behaves consistently, but in the other cases, the program can report that the Boolean is neither true nor false. The corresponding Ada program is rather more complex, due to the tighter type rules of the language and that the `size' of a Boolean is one bit. Also, the optimization typically undertaken by Ada compilers complicates the issue. 
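For completeness, here is one possible shape of the Ada counterpart mentioned above. It is only a sketch: the names are invented, and the size clauses and the unchecked conversion are implementation-dependent, so the unit may be rejected outright or behave differently from one compiler to the next, which is itself part of the point.

   with TEXT_IO; use TEXT_IO;
   with UNCHECKED_CONVERSION;
   procedure BOOL_TEST is
      type BYTE is range 0 .. 255;
      for BYTE'SIZE use 8;          -- may be rejected by some compilers
      type BOOL is new BOOLEAN;
      for BOOL'SIZE use 8;          -- force a whole byte for the Boolean view
      function TO_BOOL is new UNCHECKED_CONVERSION (BYTE, BOOL);
      B : BOOL;
   begin
      for N in BYTE loop
         B := TO_BOOL (N);          -- manufacture an arbitrary bit pattern
         if B then
            PUT_LINE (BYTE'IMAGE (N) & " reads as TRUE");
         elsif not B then
            PUT_LINE (BYTE'IMAGE (N) & " reads as FALSE");
         else
            PUT_LINE (BYTE'IMAGE (N) & " is neither TRUE nor FALSE");
         end if;
      end loop;
   end BOOL_TEST;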
B Example Validation Program generic type Discrete is (<>); -- Any discrete type 25 type Int is range <>; -- Any integer type of the same size as Discrete -- in which all bit patterns are legal values. package Range_Validation is function In_Range (Data : Discrete) return Boolean; -- Returns True if end Range_Validation; with Unchecked_Conversion; package body Range_Validation is function To_Int is new Unchecked_Conversion (Discrete, Int); function In_Range (Data : Discrete) return Boolean is begin return To_Int(Data) >= Discrete'Pos(Discrete'First) and then To_Int(Data) <= Discrete'Pos(Discrete'Last); end In_Range; begin if Discrete'Size /= Int'Size or else not (Int'First + 1 = - Int'Last or Int'First + 1 = - (Int'Last - 1)) then raise Program_Error; end if; end Range_Validation; with Range_Validation; with System; with Text_IO; use Text_IO; with Unchecked_Conversion; procedure Test_Discrete_Validation is -- The purpose of this test is to reveal that Ada compil- ers may be performing -- optimizations which are detrimental to the program- mer's needs for data -- validation and that membership tests (at least as cur- rently implemented) -- are not necessarily tests for valid bit patterns. -- There are a number of ways that invalid values may be intro- duced into an -- Ada program: 26 -- 1) Use of an undefined value -- - Use of an uninitialized variable -- - Use of a scalar out parameter which was not up- dated by the call; -- likewise for a scalar subcomponent of an out parameter -- 2) By access to a deallocated object -- (The "dangling pointer" problem) -- 3) By unsynchronized assignment to a shared variable -- 4) Through incorrect usage of package Machine_Code -- 5) By execution of incorrect code (e.g., bad code emit- ted by the compiler) -- 6) Through hardware faults -- 7) By Sequential_IO or Direct_IO reads of a corrupted or in- valid file -- (Checks on reads can be omitted if they are "too complex") -- 8) Use of non-Ada code through pragma Interface -- (Only subtype checks are performed on out parameters) -- 9) Due to suppression of an exception check -- 10) Use of an overlay violating the properties of the re- sult type -- 11) Use of an unchecked conversion violating the properties of -- the result type -- Otherwise, an object can never contain a value which is not de- fined by -- its subtype. -- Cases 1 through 4 always represent avoidable de- sign and/or coding errors. -- There is never a situation in which these errors should oc- cur in an -- otherwise correct program. 27 -- Cases 5 and 6 are not language issues per se (al- though a case could -- possibly be made for the need to check for and correct hard- ware errors -- using software). -- Case 7 is all too likely in practice, and can- not be avoided except through -- defensive design and coding approaches. The most com- mon cause is that the -- Ada program opens an external file for read- ing and the type of that -- file is not the type that the Ada program ex- pects. The safe detection of -- and recovery from this situation *requires* the data to be validated -- *after* it is input into the program. -- Case 8 is common in mixed language systems which take advan- tage of -- existing software, such as 'C' interfaces to X- Windows or other graphics -- standards or applications which are tuned for perfor- mance by dropping into -- assembler. -- Cases 9 through 11 generally occur because of perfor- mance constraints. 
-- For instance, some portion of code and/or some data types have checks -- suppressed in order to increase the speed of execu- tion. Or an overlay is -- used instead of unchecked conversion because it doesn't in- cur the added -- overhead of slicing. -- It may seem to the programmer, at least, that check- ing of a value for -- validity *after* it has been converted (possibly erro- neously) to the -- desired target type is clearer (and therefore less prone to er- ror) and -- takes less effort than conversion of input data from the (weak) source -- type to a separate (numeric) type in which all bit pat- terns are acceptable, -- then validating against values of the target type which are also converted 28 -- to the numeric type, and finally converting the vali- dated value to the -- target type. -- The membership test ("in") would seem (at first glance, any- way) to be a -- language feature that could do the desired test- ing in a straightforward -- fashion. -- However, the following questions immediately arise: -- Is the membership test ("in") intended to per- form data validation -- directly (that is, without requiring ancillary explicit con- versions) on -- scalar types? (Simple membership tests on compos- ite types are logically -- incomplete for validation.) -- If so, does it apply to all scalar types, or just those with integer -- representations (discrete and (possibly fixed point types))? -- If not, is it still it possible to use member- ship tests for this purpose? -- Does a language-required range check (or a member- ship test) on an -- enumeration type which has a gaps in its representa- tion have to correctly -- account for the gaps? -- Can explicitly coded range checks and member- ship tests in general always -- (never, sometimes) be eliminated by an optimizing com- piler when checking -- is turned on? -- The fundamental issue is: what are the required seman- tics of membership -- tests? -- This program tests how compilers treat member- ship tests for values obtained 29 -- through Unchecked_Conversion. It can also be used with ad- dress clauses -- instead (See the comments below for how to do this). In ei- ther case, the -- program is technically erroneous, because: -- 1) its use of Unchecked_Conversion violates ARM 13.10.2(3) and -- 2) the alternative use of overlays would violate ARM 13.5(8). -- Similar tests can be constructed for cases 7 through 10, but it is likely -- that compilers will treat these cases in the same man- ner as the current -- case (which corresponds to case 11). -- The assumptions about membership tests which were made in this test program -- are: -- 1) explicit membership tests can never legally be elimi- nated as long as -- checking is not suppressed, and -- 2) membership tests on enumeration types which have representations -- that include gaps must properly account for the gaps. -- The desired output is simply a message indicating successful -- completion. N. B.: No compiler I have checked to date passes the test for -- membership in type Gap. Many compilers pass all the other tests, but a few -- (erroneously?) optimize some of the tests away and thus pro- duce other -- failure messages. Size : constant := 8; type Base_Int is range -2**(Size - 1) .. 2**(Size - 1) - 1; subtype Int is Base_Int range -3 .. 
5; type Enum is (Enum1, Enum2, Enum3, Enum4, Enum5, Enum6); type Gap is (Gap1, Gap2, Gap3, Gap4, Gap5, Gap6); for Gap use (-8, -1, 0, 1, 4, 5); type Record_Type is 30 record I : Int; E : Enum; G : Gap; end record; for Record_Type use record I at 0 range 0 .. Size - 1; E at 0 range Size .. 2*Size - 1; G at 0 range 2*Size .. 3*Size - 1; end record; type Validation_Type is record V1 : Base_Int; V2 : Base_Int; V3 : Base_Int; end record; for Validation_Type use record V1 at 0 range 0 .. Size - 1; V2 at 0 range Size .. 2*Size - 1; V3 at 0 range 2*Size .. 3*Size - 1; end record; R : Record_Type; V : Validation_Type; -- An alternative approach uses address clauses to achieve over- laying of R -- and V. -- Insert the following address clause to use an overlay: -- for V use at R'Address; Enum_Check_Failed : Boolean := False; Gap_Check_Failed : Boolean := False; Int_Check_Failed : Boolean := False; package IV is new Range_Validation(Int, Base_Int); function In_Int (I : in Int) return Boolean re- names IV.In_Range; 31 package EV is new Range_Validation(Enum, Base_Int); function In_Enum (E : in Enum) return Boolean re- names EV.In_Range; -- Remove this instantiation if an overlay is used: function To_Record_Type is new Unchecked_Conversion(Validation_Type, Record_Type); function To_Base_Int is new Unchecked_Conversion (Gap, Base_Int); function In_Enum (N : in Base_Int) return Boolean is begin return N in Enum'Pos(Enum'First) .. Enum'Pos(Enum'Last); end In_Enum; function In_Gap (N : in Base_Int) return Boolean is begin return N = To_Base_Int(Gap1) or else N in To_Base_Int(Gap2) .. To_Base_Int(Gap4) or else N in To_Base_Int(Gap5) .. To_Base_Int(Gap6); end In_Gap; function In_Gap (G : in Gap) return Boolean is begin return To_Base_Int(G) = To_Base_Int(Gap1) or else To_Base_Int(G) in To_Base_Int(Gap2) .. To_Base_Int(Gap4) or else To_Base_Int(G) in To_Base_Int(Gap5) .. To_Base_Int(Gap6); end In_Gap; procedure I_NG (N : in Base_Int) is begin Int_Check_Failed := True; Put("N = "); Put(Base_Int'Image(N)); Put(", N is "); if N not in Int then Put("not "); end if; Put_Line("in Int"); begin Put("R.I = "); Put(Int'Image(R.I)); 32 Put(", "); exception when others => Put_Line("? ('Image failed)"); end; Put("R.I is "); if R.I not in Int then Put("not "); end if; Put("in Int, R.I is "); if not In_Int(R.I) then Put("not "); end if; Put_Line("In_Int"); New_Line; end I_NG; procedure E_NG (N : in Base_Int) is begin Enum_Check_Failed := True; Put("N = "); Put(Base_Int'Image(N)); Put(", N is "); if not In_Enum(N) then Put("not "); end if; Put_Line("In_Enum"); begin Put("R.E = "); Put(Enum'Image(R.E)); Put(", "); exception when others => Put_Line("? ('Image failed)"); end; Put("R.E is "); if R.E not in Enum then Put("not "); end if; Put("in Enum, R.E is "); if not In_Enum(R.E) then Put("not "); end if; Put_Line("In_Enum"); 33 New_Line; end E_NG; procedure G_NG (N : in Base_Int) is begin Gap_Check_Failed := True; Put("N = "); Put(Base_Int'Image(N)); Put(", N is "); if not In_Gap(N) then Put("not "); end if; Put_Line("In_Gap"); begin Put("R.G = "); Put(Gap'Image(R.G)); Put(", "); exception when others => Put_Line("? 
('Image failed)"); end; Put("R.G is "); if R.G not in Gap then Put("not "); end if; Put("in Gap, R.G is "); if not In_Gap(R.G) then Put("not "); end if; Put_Line("In_Gap"); New_Line; end G_NG; procedure Summarize_Errors is begin New_Line; if Int_Check_Failed or else Enum_Check_Failed or else Gap_Check_Failed then if Int_Check_Failed then Put_Line ("Type Int validation checks 'failed'."); end if; if Enum_Check_Failed then Put_Line ("Type Enum validation checks 'failed'."); 34 end if; if Gap_Check_Failed then Put_Line ("Type Gap validation checks 'failed'."); end if; else Put_Line("All validation checks 'succeeded'."); end if; end Summarize_Errors; begin All_Bit_Patterns: for N in Base_Int loop V := (V1 => N, V2 => N, V3 => N); -- Remove this assignment if an overlay is used: R := To_Record_Type(V); if N in Int and then (R.I not in Int or else not In_Int(R.I)) then I_NG(N); end if; if N not in Int and then (R.I in Int or else In_Int(R.I)) then I_NG(N); end if; if In_Enum(N) and then (R.E not in Enum or else not In_Enum(R.E)) then E_NG(N); end if; if not In_Enum(N) and then (R.E in Enum or else In_Enum(R.E)) then E_NG(N); end if; if In_Gap(N) and then (R.G not in Gap or else not In_Gap(R.G)) then G_NG(N); end if; if not In_Gap(N) and then (R.G in Gap or else In_Gap(R.G)) then G_NG(N); end if; end loop All_Bit_Patterns; 35 Summarize_Errors; end Test_Discrete_Validation; C Variables and Function Calls with Values Outside Their Declared Subtypes LSN-036-DR Norman H. Cohen March 16, 1992 C.1 Introduction In a nonerroneous execution of an Ada 83 program, every variable, when evaluated, holds a value of its declared subtype and every function call returns a result of the declared result subtype. The rules of Ada 83 allow code to be generated under the assumption that these properties are never violated, since the rules allow erroneous executions to behave in any way whatsoever. In practice, these properties are often violated by variables inadvertently used before they are initialized and by data imported across an external interface. Code generated under the unrealistic assumption that the properties always hold may exhibit undesirable behavior when the properties do not hold. Checks that would prevent the program from storing into or branching to arbitrary addresses may be optimized away as a result of the assumption. Similarly, explicit checks written by the programmer to detect violations of the properties may be optimized away because of the assumption that no such violations exist. The rules of Ada 9X should reflect the reality that variables may assume values outside their subtype. Certain checks that could be optimized away in Ada 83 under the assumption that a variable holds a value of its declared subtype should be required in Ada 9X. However, it should remain possible to omit checks that could be omitted in Ada 83 without bad consequences. This language study note proposes a set of rules for dealing with variables and function results with values outside their declared subtypes. Section 2 of this note reviews specific problems arising from the current rules. Section 3 proposes a solution and examines its implications. Section 4 deals with 36 technical issues arising from the solution for each kind of Ada 9X type. C.2 Problems associated with invalid bit patterns Invalid bit patterns arise in Ada from the evaluation of variables with undefined values and from data imported across an external interface. 
A scalar variable has an undefined value before it has been initialized or assigned to; any variable has an undefined value if a task is aborted while updating that variable. Data may be imported across an external interface at an agreed-upon storage location (perhaps using an address clause to place an Ada variable at that location) or read using the procedure READ provided by instances of SEQUENTIAL_IO and DIRECT_IO. In addition, the CHARACTER or STRING version of TEXT_IO.GET may read an 8-bit pattern that is invalid for type CHARACTER because the high-order bit is one. In some cases, data is imported with one type view (say an array of bytes) and converted by an instance of UNCHECKED_CONVERSION to another, higher-level type view. Many, but not all, invalid bit patterns arise from erroneous execution. Evaluation of an undefined scalar variable is erroneous by RM 3.2.1(18) and evaluation of other undefined variables is erroneous by AI-00837. An unchecked conversion that violates `the properties that are guaranteed by the language for objects of the target type' is erroneous by RM 13.10.2. Proposed AI-00870 would make execution erroneous when a call on READ yields an invalid bit pattern but does not raise DATA_ERROR, but the ultimate disposition of this issue is open. There is no RM rule and no AI covering invalid 8-bit patterns for type CHARACTER. Most compilers treat these with benign neglect, facilitating the writing of applications that could not otherwise be written in Ada 83. There seems to be no RM rule and no AI covering the case where a variable located at an address stipulated by an address clause (or designated by an access value obtained from or passed to the outside world) obtains an invalid bit pattern from some hardware or software agent outside the Ada program. Some compilers generate code under the assumption that variables do not contain invalid bit patterns. This strategy can be formally justified, at least in the case of variables not subject to address clauses and not set by calls on READ, by the observation that the assumption is violated only in erroneous executions. RM 1.6(7) permits the generation of code that produces unpredictable results in erroneous executions. However, this strategy has undesirable consequences in such contexts as membership tests, qualified expressions, case statements, and indexed components.
   READ_ALTITUDE_SENSOR(UNVALIDATED_ALTITUDE);
   begin
      ALTITUDE := ALTITUDE_TYPE'(UNVALIDATED_ALTITUDE);
      VALID := TRUE;
   exception
      when CONSTRAINT_ERROR =>
         VALID := FALSE;
   end;

The compiler may optimize away the constraint check on the grounds that UNVALIDATED_ALTITUDE always satisfies the constraint of its declared subtype.

C.2.3 Problems with case statements

RM 5.4(4) imposes the following rules for choices in a case statement:

   If the expression is the name of an object whose subtype is static, then each value of this subtype must be represented once and only once in the set of choices of the case statement, and no other value is allowed; this rule is likewise applied if the expression is a qualified expression or type conversion whose type mark denotes a static subtype. Otherwise, for other forms of expression, each value of the (base) type of the expression must be represented once and only once in the set of choices, and no other value is allowed.

Based on this rule and on the assumption that the case-statement expression is not a variable containing an invalid bit pattern, the compiler is permitted to generate a jump table with entries for only the following bit patterns:

- when the case-statement expression is an object with a static subtype, the bit patterns corresponding to the values of that subtype;

- when the case-statement expression is a qualified expression or type conversion whose type mark denotes a static subtype, the bit patterns corresponding to the values of that subtype;

- otherwise, the bit patterns corresponding to the values of the base type of the case-statement expression.

The language does not call for any check to be performed on the value of a case-statement expression (except the constraint check in a qualified expression in those instances in which a qualified expression is required, but as we have seen, the check inherent in the qualified expression can be optimized away). Therefore, if the case-statement expression is a variable containing an invalid bit pattern, or a qualified expression whose operand is such a variable, the generated code may jump to an improper location, with unpredictable consequences. This danger exists even in the presence of a when others => alternative, since a compiler may treat this alternative as covering only the valid bit patterns that have not been listed explicitly in earlier choices:

   TEXT_IO.GET(NEXT_CHAR);   -- Suppose compiler does not check for byte
                             -- values outside of 0 .. 127.
   case NEXT_CHAR is
      when CHARACTER'VAL(0) .. CHARACTER'VAL(31) | CHARACTER'VAL(127) =>
         CLASS := CONTROL_CHARACTER;
      when ' ' =>
         CLASS := SPACE;
      when '0' .. '9' =>
         CLASS := DIGIT;
      when 'A' .. 'Z' | 'a' .. 'z' =>
         CLASS := LETTER;
      when others =>                   -- May generate choices only for
         CLASS := SPECIAL_CHARACTER;   -- remaining values of type
   end case;                           -- CHARACTER and not for byte values
                                       -- 128 .. 255.

C.2.4 Problems with indexed components

Evaluation of an indexed component entails a check that each index-value expression has a value in the corresponding index subtype. When the index-value expression is a variable whose declared subtype is the corresponding index subtype (or a subset of that subtype), a compiler may omit the index check based on the assumption that the variable contains a value of its declared subtype. For an indexed component occurring as a primary, violation of this assumption leads to a fetch from an arbitrary address, resulting in either a hardware addressing exception or the delivery of an arbitrary bit pattern without warning.
(This arbitrary bit pattern may itself be an invalid bit pattern for the array component subtype, causing cascading errors.) For an indexed component occurring as the variable of an assignment statement or as an actual parameter of mode out or in out, use of an invalid bit pattern as an index value leads to a store to an arbitrary address, with unpredictable consequences.

C.3 A possible solution

We use the following terminology in discussing the treatment of invalid and potentially invalid bit patterns:

- A type is `compact' if every appropriately sized bit pattern is the representation of some value of that type. (Many bit patterns may be representations of the same value.)

- For a base type with a first named subtype, a value of the first named subtype is said to be `proper'; a value of the base type that is not also a member of the first named subtype is said to be `improper'.

- A function call or a use of a variable as a primary is said to have a `confirmed' value if its evaluation is guaranteed to yield a value in the result subtype of the function call or the declared subtype of the variable. If it is possible for the evaluation to yield a value outside of this subtype, the function call or use of the variable is said to have an `unconfirmed' value.

The term `value' is used in two different senses when we speak of a proper value and when we speak of a confirmed value. A proper or improper value is a value in the mathematical sense, which can be classified as proper or improper for a given base type independent of context. A confirmed or unconfirmed value is an occurrence of a primary that is a function call or a variable name; following the declarations

   X : NATURAL := 0;
   Y : NATURAL;

both X and Y may contain the value 0, but X is viewed as having a confirmed value and Y as having an unconfirmed one. Having an unconfirmed value is a property of an occurrence of Y, not a property of the mathematical value 0.

When a variable (including a formal parameter) is declared to belong to a particular subtype, this obligates the implementation to ensure that the variable either contains a value of the subtype or has an unconfirmed value. Unconfirmed values may arise in a number of ways:

- The initial value of a variable without an explicit or language-defined initialization is, in general, unconfirmed.

- The result of an unchecked conversion is, in general, an unconfirmed value.

- A value obtained from a call on the procedure READ (exported by an instance of SEQUENTIAL_IO or DIRECT_IO) is, in general, unconfirmed.

- An unconfirmed value may arise from viewing the same bits as belonging to more than one type, or by manipulating the same bits both in Ada code and in code written in some other programming language. Dual views can arise through the unchecked conversion of access values, the passing of access values to or from non-Ada code, the passing of addresses to non-Ada code, or the use of address clauses.

- In general, assignment of an unconfirmed value of a subtype ST to a variable of some subtype that includes ST (perhaps ST itself) causes all uses of the target variable potentially reached by that assignment to have unconfirmed values.

However, particular values that would, in general, be unconfirmed can be treated by an implementation as confirmed in any circumstance in which the implementation can guarantee that the value is of the appropriate subtype. For example:
- If a subtype contains every value of some compact type, all variables and function results of that subtype may be regarded as having confirmed values.

- In a region of code that the implementation ensures can only be reached when a given variable has a value of the appropriate subtype, all evaluations of the given variable may be regarded as having confirmed values. For example, if uninitialized variable X is of subtype ST and (as we shall require) the membership test `X in ST' is required to return FALSE when the value of X is not of subtype ST, occurrences of X as a primary in the sequence of statements of

     if X in ST then ... end if;

  may be regarded as having confirmed values. Similarly, if a check is performed upon assignment of an unconfirmed value to ensure that the assigned value belongs to the declared subtype of the target variable, then (in the absence of other assignments to that variable) uses of the target variable may be regarded as having confirmed values.

- An implementation may apply more sophisticated reasoning to deduce that a variable has a confirmed value. For example, given

     X, Y : ST;
     ...
     Y := X;
     if X in ST then ... end if;

  the compiler may reason that if control reaches the sequence of statements inside the if statement, then X must have had a value of subtype ST at the time of the assignment to Y, so that the value of Y can be regarded as confirmed inside the if statement.

In general, the treatment of potentially unconfirmed values as confirmed is implementation dependent. It depends on the sophistication of the analysis performed by an implementation and on the implementation's strategy for inserting checks in contexts where they are not strictly required, perhaps to avoid more expensive checks later. Nonetheless, an implementation has an obligation that can be stated in implementation-independent language: an implementation must somehow guarantee that a function result or variable evaluation has a value in the appropriate subtype or that its value is treated according to the rules (given below) for unconfirmed values.

Reasoning about unconfirmed values can be extended to larger expressions. Suppose that the variable X occurs as part of some larger expression and that the compiler can determine that whenever X holds a value of some subtype ST1, the larger expression necessarily holds a value of some other subtype ST2. Then if X is in fact an unconfirmed value of ST1, the larger expression can be treated as an unconfirmed value of ST2. For example, given the declarations

   A : INTEGER range 1 .. 10;
   B : INTEGER range 2 .. 11;
   ...
   B := A+1;

the expression A has an unconfirmed value of subtype INTEGER range 1 .. 10 in the assignment to B, so the expression A+1 can be regarded as having an unconfirmed value of subtype INTEGER range 2 .. 11. Similarly, compile-time analysis may allow an unconfirmed value of one subtype to be treated as an unconfirmed value of a smaller subtype. For example, if X has an unconfirmed value of subtype 1 .. 10 in the membership test of

   if X not in 6 .. 10 then ... end if;

then X may be regarded as having an unconfirmed value of subtype 1 .. 5 in the sequence of statements of the if statement.

The problem of invalid bit values can be solved by adopting the following rules:

1. Every type declaration introduces an anonymous base type, as well as a subtype named by the identifier given in the type declaration. The anonymous base type is compact.
2. An uninitialized variable does not contain an `undefined' value, but an arbitrary, possibly improper, value of its base type. Thus, evaluation of an uninitialized scalar variable is no longer erroneous, but evaluation of a variable whose update was aborted remains erroneous.

3. For the assignment of an unconfirmed value of subtype ST to a variable whose subtype includes ST, following the successful evaluation of the name and expression of the assignment statement, the effect of attempting to assign a value outside the declared subtype of the target variable may be one of the following:

   (a) PROGRAM_ERROR is raised.
   (b) CONSTRAINT_ERROR is raised.
   (c) The value is assigned.

   The language does not define which of these three possibilities occurs, nor does it require that the same possibility occur for all executions of a given assignment statement. These rules also apply to the passing of an in or in out actual parameter to its formal parameter at the beginning of a subprogram and to the passing of an out or in out formal parameter back to its actual parameter at the end of the call.

4. A membership test with a type mark tests whether the value of the expression satisfies the constraints imposed by the denoted subtype. Improper values fail this test.

5. Unchecked conversion is never erroneous, though it may return an improper value.

6. If the expression in a case statement is the name of an object whose subtype is static, then a check is performed following the evaluation of the expression to ensure that the value of the expression is not improper. PROGRAM_ERROR is raised if this check fails. (See Section 3.3 below for a discussion of the practical implications of this rule.)

These rules give the implementation the option of removing checks upon assignment of a variable or function result with a given declared subtype to a variable of the given subtype or a superset of the given subtype. Other checks and evaluations of conditions may be removed only in cases where the compiler can deduce that the check will always succeed. However, the analysis leading to this conclusion must account for the possibility of improper values arising in nonerroneous executions from uninitialized variables, calls on READ, unchecked conversions, variables with address clauses, and so forth, and propagating through unchecked assignments and unchecked parameter passing. The rules produce the desired effect for each of the problematic constructs discussed earlier.

C.3.1 Effect on membership tests

The membership test

   ALTITUDE in ALTITUDE_TYPE

is an operation of the anonymous compact base type of the first named subtype ALTITUDE_TYPE. This operation is required to return FALSE for values of that type that do not satisfy the constraints of ALTITUDE_TYPE. The test cannot be optimized away if the occurrence of ALTITUDE in the membership test has an unconfirmed value, because it is possible, even in nonerroneous programs, that ALTITUDE will contain an improper value. (On the other hand, if the membership test immediately follows the procedure call

   READ_ALTITUDE_SENSOR(ALTITUDE);

and the compiler has exercised its option to check that the value passed back to the actual parameter is proper, the value of ALTITUDE in the membership test is confirmed, so the compiler can legitimately optimize the membership test away.)

C.3.2 Effect on qualified expressions

The constraint check in a qualified expression cannot be optimized away unless the operand of the qualified expression has a confirmed value. Thus a qualified expression always raises an exception or yields a value of the subtype denoted by its type mark.
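As an illustration of how rule 4 and the qualified-expression check combine in practice, the validation idioms of Sections C.2.1 and C.2.2 can be packaged as small subprograms. The following sketch is only illustrative and is not taken from the LSN; the function names are invented, and ALTITUDE_TYPE is the subtype declared earlier. Under the proposed rules, neither check can be optimized away when the operand has an unconfirmed value.

   function IS_VALID (RAW : ALTITUDE_TYPE) return BOOLEAN is
   begin
      return RAW in ALTITUDE_TYPE;   -- rule 4: returns FALSE for an improper value
   end IS_VALID;

   function VALIDATED (RAW : ALTITUDE_TYPE) return ALTITUDE_TYPE is
   begin
      return ALTITUDE_TYPE'(RAW);    -- raises CONSTRAINT_ERROR for an improper value
   end VALIDATED;

Note that, by rule 3, an implementation is free to check the actual parameter at the point of call instead, in which case the improper value never reaches the function body; either way the improper value cannot pass through undetected.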
C.3.3 Effect on case statements

For a case statement, we must consider each of the three circumstances addressed by RM 5.4(4).

For a case statement whose expression is a variable of a static subtype, a check is performed to ensure that no branch takes place indexed by an improper value. The check can be optimized away in case statements where the expression has a confirmed value. Even when not optimized away, the check can be implemented cheaply in the form of an additional, compiler-generated case alternative whose choices are the improper values of the base type and whose `sequence of statements' raises PROGRAM_ERROR.

For a case statement whose expression is a qualified expression whose type mark denotes a static subtype, successful evaluation of the qualified expression ensures that the value of the expression is not improper, as noted above. (The result of the qualified expression is always a confirmed value.) Similar reasoning applies to a case statement whose expression is a type conversion.

For all other case statements, the redefinition of base types introduces an upward incompatibility with Ada 83 that can be detected at compile time and easily corrected. Consider the following example, disregarding for the purposes of the example the proposed expansion of CHARACTER to 256 values in Ada 9X:

   case NEXT_CHAR is
      when CHARACTER'VAL(0) .. CHARACTER'VAL(31) | CHARACTER'VAL(127) =>
         CLASS := CONTROL_CHARACTER;
      when ' ' =>
         CLASS := SPACE;
      when '0' .. '9' =>
         CLASS := DIGIT;
      when 'A' .. 'Z' | 'a' .. 'z' =>
         CLASS := LETTER;
      when '!' .. '/' | ':' .. '@' | '[' .. '^' | '{' .. '~' =>
         CLASS := SPECIAL_CHARACTER;
   end case;

This statement was legal in Ada 83 because it had a choice for each of the 128 values of the base type of NEXT_CHAR, type CHARACTER. Suppose that CHARACTER'SIZE = 8, so that type CHARACTER is not compact. Under the proposed new rules (but with CHARACTER still defined by a 128-value enumeration type definition), type CHARACTER is no longer a base type, but the first named subtype of an anonymous 256-value base type. Since not all choices of the base type are accounted for, the case statement becomes illegal by the last sentence of RM 5.4(4).

If the programmer was really manipulating improper values (for example, if NEXT_CHAR was set by a call on TEXT_IO.GET and that procedure does not check for improper values), the programmer blesses the Ada 9X designers for warning him that he had been walking on the edge of a cliff and adds a choice to cover these values, for example:

   when others => CLASS := EXTENDED_CHARACTER;

If the programmer knows that improper values cannot arise, he can indicate this by qualifying the case-statement expression with the first named subtype, so that he doesn't have to add a case-statement arm that he knows is unreachable:

   case CHARACTER'(NEXT_CHAR) is
      ...
   end case;

If the compiler is as certain as the programmer that the value of NEXT_CHAR cannot be improper, NEXT_CHAR will be treated as having a confirmed value and no check will be generated. A programmer who is adamant that no check be generated can use pragma SUPPRESS:

   declare
      pragma SUPPRESS(RANGE_CHECK, ON => NEXT_CHAR);
   begin
      case CHARACTER'(NEXT_CHAR) is
         ...
      end case;
   end;

(Actually, the check that a value is not improper should probably be considered distinct from RANGE_CHECK. In the case of an enumeration type with holes, for example, the name RANGE_CHECK is misleading.)
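To make the interplay of these rules concrete, the following self-contained sketch shows one way a program reading raw byte input might combine an explicit membership test with a case statement over the first named subtype. It is only an illustration of the proposed rules, not text from the LSN; the procedure and the CHAR_CLASS type are invented for the example.

   with TEXT_IO;
   procedure CLASSIFY_NEXT is
      type CHAR_CLASS is (DIGIT, LETTER, OTHER, INVALID);
      NEXT_CHAR : CHARACTER;
      CLASS     : CHAR_CLASS;
   begin
      TEXT_IO.GET(NEXT_CHAR);           -- may deliver a byte in 128 .. 255
      if NEXT_CHAR in CHARACTER then    -- rule 4: FALSE for an improper value
         case CHARACTER'(NEXT_CHAR) is  -- value already confirmed; no extra check
            when '0' .. '9'              => CLASS := DIGIT;
            when 'A' .. 'Z' | 'a' .. 'z' => CLASS := LETTER;
            when others                  => CLASS := OTHER;
         end case;
      else
         CLASS := INVALID;              -- improper bit pattern detected safely
      end if;
      TEXT_IO.PUT_LINE(CHAR_CLASS'IMAGE(CLASS));
   end CLASSIFY_NEXT;

Because the case statement lies in a region that can only be reached when NEXT_CHAR holds a proper value, the compiler may treat the qualified expression as confirmed and generate no additional check for it.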
C.3.4 Effect on indexed components

In general, the index check for an indexed component cannot be optimized away when the index-value expression is a variable whose declared subtype is the index subtype, because that variable may contain an improper value even in nonerroneous programs. (This is so whether the indexed component occurs as a primary or as the name of a variable to be set.) In cases where the variable used as an index has a confirmed value, the check may be safely omitted.

C.3.5 Effect on assignments

The proposed rules allow constraint checks on assigned values to be omitted in the same cases as in Ada 83. Given the declarations

   A : INTEGER range 1 .. 5;
   B : INTEGER range 1 .. 5;
   C : INTEGER range 1 .. 10;

the check may be omitted from the assignment

   A := B;

However, if the check is omitted, then within the range of this assignment, B is viewed as having an unconfirmed value of its declared subtype. In contrast, the check may not, in general, be omitted from the assignment

   A := C;

because the declared subtype of A does not include the declared subtype of C. In the context

   if C not in 6 .. 10 then
      A := C;
   end if;

the check may be omitted after all, since within the if statement a moderately clever compiler may regard C as having an unconfirmed value of subtype INTEGER range 1 .. 5.

C.3.6 Effect on language-independent optimization techniques

Concern has been expressed about the effect of safer treatment of uninitialized variables on language-independent optimization techniques. There are two issues to be considered:

- the impact on the structure of a common optimizing back end used by front ends for several languages;

- the impact of additional checks on Ada run-time performance.

Application of the rules proposed here does not require fundamental changes in an optimizer's treatment of undefined variables. It simply requires that an Ada variable without a language-specified or programmer-specified initial value be regarded as having some unspecified but consistent initial value. A C extern variable, for example, must be treated in exactly the same way. (Unlike a C extern variable, however, an Ada local variable without an initial value need not be regarded as having all its definitions potentially killed by calls on separately compiled subprograms.)

It is understood that Ada imposes a small run-time overhead in return for providing substantially more safety than languages like C. This added safety is perceived, both by users and nonusers of Ada, as a distinguishing strength of the language. The checks proposed here do not add significantly to that overhead. Ada programmers who find the overhead unacceptable can suppress checks, attaining a small speedup at the cost of safety.

The rules proposed here have been formulated to minimize the number of checks that have to be made, consistent with certain commonly held (albeit currently unjustified) expectations:

- Use of incorrect values will be manifested by the raising of exceptions rather than by unpredictable failures (even if the incorrect values are improper).

- In the presence of explicit tests to prevent a statement from being reached when certain values are improper, the statement will in fact not be reached when those values are improper.

The proposed rules require checks only in places where unpredictable failures may arise, such as case statements and indexed components, or in places where tests or checks are explicitly called for, such as membership tests and qualified expressions.
Even then, a compiler is allowed to optimize away tests and checks whose results are known, perhaps because compile-time analysis determines that a certain use of a variable has a confirmed value.

C.4 Semantics of improper values

The notion of improper values introduces several technical problems that must be addressed. The remainder of this LSN addresses these problems. Different issues arise for different classes of types.

C.4.1 Improper values of integer types

Integer types present no difficulties. The rules of Ada 83 already stipulate that an integer type declaration introduces an anonymous base type (derived from a predefined integer type) and a first named subtype constrained by the range in the integer type definition. In typical implementations, the anonymous base type is already compact.

C.4.2 Improper values of enumeration types

In Ada 83, if ST is a subtype of a base enumeration type BT, then the attributes ST'FIRST and ST'LAST yield the bounds of the subtype ST, ST'SIZE yields the number of bits required by the implementation to hold all values of subtype ST, and ST'WIDTH yields the maximum image length for all values of subtype ST. However, the attributes ST'POS, ST'VAL, ST'SUCC, ST'PRED, ST'IMAGE, and ST'VALUE are, in effect, synonyms for BT'POS, BT'VAL, BT'SUCC, BT'PRED, BT'IMAGE, and BT'VALUE, respectively: their range of allowable inputs is determined by the base type, not the subtype. Since these attributes apply to integer types as well as to enumeration types, there are already rules defining their effect in the case of anonymous base types. These rules are based on the presumption that all values of the anonymous base type have position numbers. Suppose that improper values of an enumeration type T have position numbers, ranging from T'POS(T'LAST)+1 to 2**T'BASE'SIZE-1. The application of the rules for anonymous base integer types to anonymous base enumeration types has the following disconcerting implications for a first named enumeration subtype T:

- Although improper values do not have enumeration literals, they can be constructed by such attributes as T'BASE'VAL(T'POS(T'LAST)+N), where N is positive. It follows that T'IMAGE can be invoked with confirmed enumeration values that have no images.

- T'BASE'WIDTH is defined in terms of the maximum image width of a set of enumeration values some of which may not have enumeration literals.

- If X holds the position number of an improper value of T'BASE, the attribute T'VAL(X) yields an improper value rather than raising CONSTRAINT_ERROR as in Ada 83.

- T'SUCC(T'LAST) yields an improper value rather than raising CONSTRAINT_ERROR as in Ada 83.

Thus the application to anonymous base types of Ada 83 rules for enumeration base types results in a run-time upward incompatibility. A run-time incompatibility that replaces the occurrence of an exception with the computation of a sensible result is generally deemed acceptable. However, it is not at all clear that generation of an improper value is a `sensible result.' Consider, for example, the following idiom, which, while stylistically objectionable, is probably quite common:

   type DAY_TYPE is
      (SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY);
   DAY : DAY_TYPE;
   ...
   begin
      DAY := DAY_TYPE'SUCC(DAY);
   exception
      when CONSTRAINT_ERROR =>
         DAY := DAY_TYPE'FIRST;   -- wrap around
   end;

With the rules described above, if the expression DAY_TYPE'SUCC(DAY) is reached with DAY = DAY_TYPE'LAST, it will not raise CONSTRAINT_ERROR, but yield the improper value DAY_TYPE'VAL(7). If this value ever propagates to a context where a check is required, it will raise an exception there. However, the handler in the block statement will not apply at that point.
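For comparison, the wraparound can be written without relying on the exception at all. The following sketch is only an illustration (the function name is invented); it does not depend on DAY_TYPE'SUCC raising CONSTRAINT_ERROR at DAY_TYPE'LAST, and so gives the same result under Ada 83 and under whichever rule is eventually chosen for T'BASE'SUCC.

   function NEXT_DAY (DAY : DAY_TYPE) return DAY_TYPE is
   begin
      if DAY = DAY_TYPE'LAST then
         return DAY_TYPE'FIRST;       -- wrap around explicitly
      else
         return DAY_TYPE'SUCC(DAY);   -- never applied to DAY_TYPE'LAST
      end if;
   end NEXT_DAY;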
It seems preferable to define T'BASE'VAL to raise CONSTRAINT_ERROR when given the position number of an improper enumeration value and to define T'BASE'SUCC to raise CONSTRAINT_ERROR when its argument is a proper enumeration value but the successor of its argument is not. This behavior is achieved by generating precisely the checks that are generated for Ada 83, and it preserves Ada 83 semantics for enumeration types. However, preserving Ada 83 semantics for both enumeration types and integer types requires that the attributes of discrete types have different rules for improper integer values and improper enumeration values. It also requires that the behavior of the operations of a just-declared anonymous base enumeration type be defined partly in terms of the about-to-be-declared first named subtype; this is aesthetically unpleasing, but not problematic.

Ada 83 says nothing about the behavior of T'BASE'POS, T'BASE'SUCC, T'BASE'PRED, T'BASE'IMAGE, the relational operators, and membership tests when applied to an improper value; the language designers presumed that this could happen only in erroneous executions, for which any outcome would be permitted. For example, one possible effect of T'BASE'IMAGE when applied to an invalid bit pattern in Ada 83 is to perform a lookup in an image table using an out-of-range index, possibly causing an addressing exception. Each of these operations must be considered separately.

- It seems sensible to stipulate that T'BASE'POS returns the position number of its improper argument. This is trivial to implement, avoids the need for any checks, and produces a sensible, potentially useful result.

- For the application of T'BASE'SUCC or T'BASE'PRED to an improper value, it seems reasonable to return the actual successor or predecessor (which is improper except for the predecessor of the first improper value). However, this is inconsistent with the rule that T'BASE'SUCC(T'LAST) must raise CONSTRAINT_ERROR. T'BASE'SUCC(X) could be made to raise CONSTRAINT_ERROR for all improper values X simply by changing the check from `if X = T'LAST ...' to `if X >= T'LAST ...'; however, to require T'BASE'PRED to raise CONSTRAINT_ERROR for all improper values requires the introduction of another comparison (`if X = T'FIRST or else X > T'LAST ...' in place of `if X = T'FIRST ...').

- The value for T'BASE'IMAGE applied to an improper value could be made implementation defined, just as for a nongraphic character, or the implementation could be required to raise CONSTRAINT_ERROR. In either case, T'BASE'WIDTH should probably be defined to return the maximum image length of all PROPER values, i.e., the same value as T'WIDTH.

- A test for membership in the first named subtype T must return FALSE for improper values.

- The relational operators should be defined to return a result based on comparison of position numbers. This will ensure that the three relations

     X in T
     X in T'FIRST .. T'LAST
     T'FIRST <= X and X <= T'LAST

  remain equivalent, and that they return FALSE for improper values.

The discussion up to this point has ignored the issue of enumeration representation clauses. In fact, everything said so far applies as well to types with enumeration representation clauses, provided that improper values are assigned position numbers following T'POS(T'LAST).
For example, given the declarative items

   type T is (A, B, C);
   for T use (1, 2, 4);
   for T'SIZE use 3;

position numbers might be assigned to bit patterns as follows:

   bit        position
   pattern    number     value
   -------    --------   ----------
   000        3          (improper)
   001        0          A
   010        1          B
   011        4          (improper)
   100        2          C
   101        5          (improper)
   110        6          (improper)
   111        7          (improper)

It is already the case in Ada 83 that for types with enumeration representation clauses, position numbers do not correspond to bit representations. Rather, the 'POS and 'VAL functions can be implemented as tables mapping bit representations to position numbers and position numbers to bit representations, respectively. Operations like 'SUCC and 'PRED and the computation of an address for an array component indexed by the enumeration type can then be implemented in terms of 'POS and 'VAL:

   T'SUCC(X) = T'VAL(T'POS(X)+1)   if X /= T'LAST
   T'PRED(X) = T'VAL(T'POS(X)-1)   if X /= T'FIRST
   A(X)'ADDRESS = A'ADDRESS + T'POS(X)*storage_units_per_component

However, RM 13.3(4) requires that the bit patterns corresponding to an ascending sequence of position numbers be in ascending order. The proposed scheme for assigning position numbers to improper values violates this property. Consequently, under the proposed rules, the comparison X < Y must be implemented by the comparison T'POS(X) < T'POS(Y) rather than by a direct comparison of bit patterns as in Ada 83. However, assigning all improper values higher position numbers than all proper values ensures that all proper values are LOGICALLY contiguous: by starting at T'FIRST and repeatedly applying T'SUCC, one encounters a sequence of proper values ending at T'LAST. The effect of iteration over the first named subtype T and the use of T as an index subtype are well defined, and the three relations

   X in T
   X in T'FIRST .. T'LAST
   T'FIRST <= X and X <= T'LAST

remain equivalent ways of testing that X is proper. This is precisely what we want: the slightly cheaper comparison and range check made possible by RM 13.3(4) comes at the cost of giving arbitrary answers for improper values. (Since the proposed position-number assignment scheme for improper values would violate the property guaranteed by the rule in RM 13.3(4), there would be no advantage to retaining that rule. Programmers would be able to make good use of the freedom to list enumeration values in an order that is appropriate for the application, regardless of the underlying representation.)

C.4.3 Improper values of floating-point types

Some architectures have floating-point formats that are not compact. The effect of attempting to execute a floating-point operation on an invalid bit pattern may range from producing an arbitrary result, to setting a flag, to trapping. The following Ada rules provide maximum flexibility:

1. An arithmetic operation applied to an improper floating-point value either raises CONSTRAINT_ERROR or yields an arbitrary (possibly improper) value as a result.

2. A comparison applied to an improper floating-point value either raises CONSTRAINT_ERROR or yields an arbitrary (but proper) BOOLEAN result.

It is straightforward to implement these rules on a floating-point processor implementing the IEEE standard 754 if NaN's and infinities are regarded as improper values. (This is not to say, however, that the Ada rules then implement the IEEE standard. For example, Ada rules require division by zero to raise an exception rather than yielding an improper value whose representation happens to be an IEEE infinity.)

Tests for the validity of a floating-point representation can be provided by short assembly-language routines.
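Where an implementation-specific check written in Ada itself is acceptable, a validity test might look roughly like the sketch below. This is only an illustration, not part of the LSN: it assumes FLOAT is the 32-bit IEEE single-precision format, that the packed array occupies exactly 32 bits, and that component 0 corresponds to the least significant bit. All of these assumptions are implementation dependent, which is why assembly-language routines are suggested above.

   with UNCHECKED_CONVERSION;
   function PROPER_FLOAT (X : FLOAT) return BOOLEAN is
      type BIT_VECTOR is array (0 .. 31) of BOOLEAN;
      pragma PACK(BIT_VECTOR);
      function TO_BITS is new UNCHECKED_CONVERSION(FLOAT, BIT_VECTOR);
      BITS : constant BIT_VECTOR := TO_BITS(X);
   begin
      -- An IEEE single-precision value is a NaN or an infinity exactly when
      -- its 8-bit exponent field (bits 23 .. 30) is all ones; those are the
      -- bit patterns regarded as improper by rules 1 and 2 above.
      for I in 23 .. 30 loop
         if not BITS(I) then
            return TRUE;      -- exponent field not all ones: an ordinary number
         end if;
      end loop;
      return FALSE;           -- NaN or infinity: treated as improper
   end PROPER_FLOAT;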
Checks for improper floating-point values may be somewhat more expensive than checks for improper values of other scalar types, but not inordinately so. Such checks need be performed only rarely. Checks are required (and clearly desired, despite the cost) for a test for membership in the first named subtype or in a qualified expression, though of course they may be optimized away if the compiler can establish that the checks always succeed. Checks are NOT required upon assignment of a floating-point value, upon the passing of a floating-point value as a parameter, or upon use of a floating-point value in an arithmetic operation (provided that any resulting trap can be converted to a raising of CONSTRAINT_ERROR). Furthermore, at least for types T for which T'MACHINE_OVERFLOWS is true, no arithmetic operation applied to proper operands can yield an improper result. This facilitates the determination that certain variables have confirmed values, allowing checks on such variables to be optimized away.

C.4.4 Improper values of fixed-point types

Like integer types, fixed-point types present no difficulties. The rules of Ada 83 already stipulate that a fixed-point type declaration introduces an anonymous base type (derived from a predefined fixed-point type) and a first named subtype constrained by the real type definition. In typical implementations, the anonymous base type is already compact.

C.4.5 Improper values of array types

Our treatment of array types is based on the following two principles:

1. An array value may be proper even if it includes some improper component values.

2. `Dope' describing the bounds of an array should not be regarded as part of an array value or an array object. Rather, it is bookkeeping information that may be stored adjacent to the object or elsewhere.

Principle 1 is based on the observation that it is common for data structures to include composite variables with some uninitialized components, along with auxiliary information indicating which components have meaningful data. Familiar examples are stacks, text buffers, and circular queues. Principle 2 reflects the thinking in unapproved issues UI-0063 and AI-00556. UI-0063 would stipulate that dope vectors do not participate in unchecked conversion to or from an unconstrained array subtype. Given the declarations and pragma

   subtype INDEX is INTEGER range 1 .. N;
   type A is array(INDEX range <>) of C;
   pragma PACK(A);

AI-00556 would require the length clause

   for A'SIZE use N*C'SIZE;

to be accepted, from which it follows that the dope vector for an object of type A is not included in A'SIZE.

It follows from these principles that improper values of an array type can only arise in the following ways:

- If Ada 9X allows array types with discriminants, array values with invalid discriminant values (i.e., proper or improper values that do not belong to the declared discriminant subtype) would be improper. Similarly, array values whose lengths are inconsistent with their discriminant values would be improper.

- For array types with padding between components, if the implementation attempts to maintain certain properties for padding bits (typically that all padding bits are zero, to facilitate comparisons of whole array values by block comparisons), an array value that violates such properties could be regarded as improper.
(For an instantiation of SEQUENTIAL_IO or DIRECT_IO with an unconstrained array type, an actual parameter with particular bounds is passed to a call on READ; these bounds are never altered based on dope in the file. If unchecked conversion to an unconstrained array type is supported, the number of components in the result may be inferred from the number of bits in the operand, but not from any information contained in those bits; presumably an implementation supporting such conversions would construct its own dope vector for the result, using implementation-defined rules for selecting a lower bound and, in the case of multidimensional unconstrained array types, for selecting how the number of components should be factored into lengths for each dimension.)

It seems clear, given the declarations

   type LINE(LENGTH : NATURAL := 0) is array (1 .. LENGTH) of CHARACTER;
   subtype CARD is LINE(LENGTH => 80);
   L : LINE;

that the membership test L in CARD should return FALSE if L.LENGTH has a negative value (because, for example, L was obtained by unchecked conversion of bad data or from a call on READ). That is, the membership test should check for values that are improper because of invalid discriminant values. Since a programmer may want to validate externally generated data without checking for a particular discriminant value, the membership test for the first named subtype, such as

   LINE_POINTER.all in LINE

should also check for improper values.

From this decision to acknowledge that improper array values may arise, it follows that, in theory, a check should be made for invalid discriminant values before an indexed component or slice of an array with discriminants is evaluated. In practice, this check can almost always be optimized away, because the initial value of a declared or allocated array is a proper value and the normal operations on arrays preserve proper values. It is only when a whole array is updated with the result of an unchecked conversion or by a call on READ, or, in the case of aliased arrays with discriminants, when the address of the array has been taken and used for unknown purposes, that improper array values may arise. A compiler can apply a simple strategy to ensure that unconfirmed array values do not propagate across subprogram calls: any unconfirmed array value passed as an in or in out parameter should be checked by the caller some time before the call; any unconfirmed array value assigned to a formal parameter of mode in out or out should be checked by the called subprogram after it is assigned and before any operation that may raise an exception or return to the caller. This allows the compiler to treat all in and in out formal parameters as having confirmed values at the start of a subprogram, and all in out and out actual parameters as having confirmed values at the end of a procedure call (even if the call propagates an exception), making it easy to treat most array values occurring in a program as confirmed.

It is not clear that arrays with incorrect padding should be regarded as improper, because padding is an implementation concept with no manifestation in the abstract semantics of the language. Rather, it can be argued that an implementation that uses some internal invariant to expedite particular operations should be responsible for preserving that invariant.
Specifically, an implementation that zeroes padding bits to facilitate fast array comparisons must not assume that data coming in from such sources as unchecked conversion or calls on READ is properly padded. The implementation must zero the padding bits of suspect arrays before performing the comparison, or else it must use a slower component-by-component comparison that does not depend on the invariant. A compiler can keep track of which arrays are known to be properly padded and which are suspect, and resort to such special measures only in a few rare cases. Nonetheless, a membership test on an array should not return FALSE by virtue of having arbitrary padding bits; a sequence of bits with nonzero padding should be viewed as an alternative representation of a proper value.

If incorrect padding is not viewed as making a value improper, and if discriminants are not allowed for array types in Ada 9X, there will be no improper values of array types.

C.4.6 Improper values of record types

The issues of invalid discriminants and invariants maintained for padding bits arise for record types as well as for array types. In addition, records are permitted to contain `implementation-dependent components' (dope information) as part of the record value (see RM 13.4(8)); the tags of Ada 9X tagged types could be regarded as language-defined hidden components analogous to these implementation-defined components. Finally, the bits representing a record value may, in some implementations, include pointers to composite components controlled by discriminants, with the components themselves located in a different part of storage.

By the same arguments that apply to array types, and for the sake of consistency with array types, the following rules should apply to improper values of a record type:

- Membership tests should yield the value FALSE and qualified expressions should raise CONSTRAINT_ERROR for values with invalid discriminants or invalid tags.

- Membership tests should yield the value FALSE and qualified expressions should raise CONSTRAINT_ERROR for values with invalid implementation-defined components, to the extent that there is a straightforward test for the validity of such components. (Implementation-defined components that are pointers to or offsets of data stored elsewhere may be particularly difficult to validate.) An implementation should document the meaning of its implementation-defined components and the extent to which each is validated. If an improper value of a record type violates some property that is required of implementation-defined components but cannot be validated, use of that improper value should render execution erroneous.

- Like array values, record values should not be considered improper by virtue of the contents of their padding bits.

C.4.7 Improper values of access types

Improper access values can be particularly devastating. Besides the usual sources of improper values, unchecked deallocation has the effect of making proper access values improper. Thus flow analysis is useless in tracking the propagation of improper access values. Unapproved uniformity issue UI-0060 would require a membership test for an access value to return FALSE under certain circumstances:

   If T is an access type and the result of the simple expression [in the lefthand part of a membership test] is recognized by the implementation to be invalid as a representation of a value of type T, the membership test must return FALSE.
   Otherwise, if the type mark of the membership test imposes a discriminant constraint and the result of the simple expression designates an object whose discriminants do not contain valid values of the corresponding discriminant subtypes, the membership test must return FALSE; similarly, if the type mark imposes an index constraint and the result of the simple expression designates an object whose internal descriptor information is not of the form required by the implementation, the membership test must return FALSE. Otherwise, if the type mark of the membership test imposes a discriminant or index constraint and the result of the simple expression designates an object that does not satisfy that constraint, the membership test must return FALSE. Otherwise the membership test must return TRUE, regardless of the contents of the designated object. An implementation must document in Appendix F the circumstances under which the result of the simple expression is recognized as invalid for access type T and the expected form of the internal descriptor information for designated array objects.

These rules seem reasonable for Ada 9X. The contents of the designated value enter into the membership test for an access value only in those cases where the discriminants or bounds of the designated value are examined as part of the test for satisfaction of a constraint on an access subtype; a pointer to an object in which this information is invalid must be considered not to satisfy the constraint. Like any composite value, the value of a composite variable designated by an access type may be proper even if it has improper components. Defining an access value to be improper if it designates a composite with improper components would be particularly inappropriate: in the case of a recursive data structure, this could lead to infinite recursion in the absence of some marking scheme.

C.4.8 Improper values of task types and protected types

The generic formal parameters of SEQUENTIAL_IO and DIRECT_IO are generic formal private types, not limited private, so these generic packages cannot be instantiated with task types and protected types. However, the generic formal parameters of UNCHECKED_CONVERSION are generic formal limited private types, so this restriction is easily circumvented:

   task type T is
      entry E;
   end T;

   type T_SHADOW is array (0 .. T'SIZE-1) of BOOLEAN;
   pragma PACK(T_SHADOW);

   function BONA_FIDE_TASK_OBJECT is
      new UNCHECKED_CONVERSION(T_SHADOW, T);

   package RUSSIAN_ROULETTE is new SEQUENTIAL_IO(T_SHADOW);

   SHADOW : T_SHADOW;
   F      : RUSSIAN_ROULETTE.FILE_TYPE;
   ...
   RUSSIAN_ROULETTE.READ(F, SHADOW);
   BONA_FIDE_TASK_OBJECT(SHADOW).E;

The effect of the entry call is, of course, unpredictable: there is a remote chance that it will actually work. (Other ways in which arbitrary bit patterns can be placed into a task object or a protected object include passing an access value designating that object to a non-Ada subprogram, using unchecked conversion to convert such an access value to another access type whose designated type is not limited, or using an address clause to locate the limited object at a storage location that is altered by some hardware or software agent outside of the Ada program.)

It is unreasonable to require the implementation to check for the consistency of a task object representation before using the task object. Furthermore, improper values of a task type or a protected type can be created only by elaborate subterfuge; they are not likely to arise by accident.
Therefore, the following rules should be adopted:

1. Every task object and protected object has a proper initial value. (RM 9.2(2) effectively says this already for task types.)

2. All manipulations defined for task types and protected types, when applied to a task object or protected object with a proper value, leave a proper value in that object. (`Manipulation' includes invocation of a protected operation or a task entry, evaluation of attributes of the object, abortion, and invocation of the subprograms proposed in the real-time annexes for manipulation of tasks and priorities.)

3. If a task object or protected object is somehow given an improper value, any manipulation of that object (in the sense of Rule 2) is erroneous.

It is a consequence of these rules that if a programmer writes

   if BONA_FIDE_TASK_OBJECT(SHADOW) in T then
      BONA_FIDE_TASK_OBJECT(SHADOW).E;
   end if;

the membership test may be optimized away. (It would be straightforward to test for invalid discriminant values for task objects or protected objects with discriminants. However, invalid discriminant values could only arise in circumstances where the object might also be corrupted in less easily characterized or detected ways. Therefore, a check for valid discriminants alone is of no practical use.)

D Document Details

D.1 Status

This is a working document to help formulate recommendations for the Ada 9X safety and security Annex.

D.2 Project

This work is funded by the Ada 9X project via Intermetrics.

D.3 File

Stored on the Sun in file baw/ada/safe/uninit.tex

D.4 History

- First draft to be completed for the Frankfurt meeting.

D.5 Actions

- Mapping Team to review.
- Any Ada 9X reviewers to comment as desired.