LSN022.StatABECheck ------------------------ !topic LSN on Static A-B-E Checking $Revision: 1.4 $ !reference MS-10.5 !from Tucker Taft $Date: 91/11/26 13:46:18 $ !discussion Robert Dewar made the interesting proposal at the last DR meeting that we should reinvestigate the question of static access-before-elaboration (ABE) checking for Ada 9X. We have proposed changes to pragma ELABORATE, and the addition of pragma ELABORATE_SELF to partially ameliorate the current situation, but these are really patches on patches. A static approach would be clearly be preferable if it satisfied the following criteria: 1) Upward consistent with Ada 83; 2) Upward compatible for nearly all existing programs; 3) Relatively easy to describe and understand; 4) Relatively easy to implement; 5) Flexible enough so that it never gets in the way of what a user wants to do. A static approach would certainly provide greater run-time efficiency. But probably the biggest advantage would be that most users could stop worrying about access-before-elaboration. The compiler or the linker would identify the problems that exist, and would otherwise pick an order where no ABE is possible. THE PROBLEM The current ABE checks occur when calling a subprogram, activating a task, and instantiating a generic. In each case, PROGRAM_ERROR must be raised if the body has not been elaborated prior to the check. The problem is to determine statically whether there is an order of library unit spec and body elaboration that ensures that no ABE checks can occur, and if so, pick one such order. For Ada 9X, we have made the problem more subtle because a subprogram may be called indirectly, either through an access-to-subprogram value, or as part of a run-time dispatch. However, these can be modeled fairly simply by defining a "pseudo subprogram" whose body is defined to contain calls on all possible targets of such indirect calls. These pseudo subprograms can be defined dynamically for each call based on what targets are possible at that point (which depends recursively on the elaboration order chosen), or can be defined statically by what targets are possible at any point in the program, or at any point during elaboration. SOME TERMINOLOGY For economy of discussion, let us define various terms: 1) A "unit" is a subprogram, task, generic, or subprogram "collection." A subprogram collection represents an access-to-subprogram type, or a dispatching operation. 2) A unit is "invoked" when it is used in a way that would perform an ABE check (i.e., "invoke" covers subprogram call, task activation, and instantiation). 3) A unit may be "added" to a collection as a possible target of an indirect call, meaning that an invocation of the collection may invoke the unit added (but only after the addition takes place, if using a dynamic definition of the collection). This happens on subp'ACCESS, access-to-subp type conversion, and overriding an inherited dispatching op. For now we will assume a macro-expansion model of instantiation. Therefore the "invoking" of a generic does no more than perform the ABE check. The actual elaborating of the bodies of the units produced by instantiating a generic package happens separately from "invoking" the generic, as though the instantiation's package body were "inlined" at the point of the instantiation. In other words, the units produced by an instantiation are considered separate units from the generic unit. A STATIC RULE Rather than raising PROGRAM_ERROR on invocation before body elaboration, we must restate the rule in terms of a (static) legality requirement on the program. Here is the static rule: The body of a unit must "occur strictly before" it is "used." The phrases "occur strictly before" and "used" are defined statically, as follows: a) Within the sequence of declarative items and optional "begin-part" which form a single declarative region, a declarative item "occurs strictly before" later declarative items and the begin-part in the same sequence. b) If a declarative item of a region is a unit that is a package declaration or package body or package instantiation, then the earlier declarative items of the (enclosing) region occur strictly before the declarative items and begin-part of the package unit, which in turn occur strictly before the declarative items and begin-part occuring later in the (enclosing) region. (In other words, packages should be "flattened" into the enclosing region.) c) The body of a subprogram is "used" by a begin-part or expression that contains a call on the subprogram. (Note -- this means that a subprogram is "used" even if the call appears in an alternative of an if statement or case statement which is never executed.) d) The body of a task type is used by the begin-part (an implicit one if no explicit one is present) of a declarative region, if an object of the task type, or a type with a subcomponent of the task type, is declared in the region. (Note -- this means that for a variant record, the task is "used" even if it only appears in an unchosen variant.) e) The body of a task type is used by a begin-part or expression that contains an allocator for the task type or a type with a subcomponent of the task type. f) The body of a generic is used by an instantiation of the generic. g) A type or subtype declaration uses an expression if it appears in a constraint in the declaration. An object declaration uses an expression if it appears in the declaration. A variable declaration without an initial expression uses all of the default expressions of its type. A renaming declaration uses any expressions that appear in the name being renamed. A call on a subprogram (or entry) uses any default expressions in the specification of the subprogram (or entry). A generic instantiation uses the expressions that appear in it, as well as all of the default expressions that appear in the formal part of the declaration of the generic. h) A call through an access-to-subprogram value of type T calls the subprogram "collection" of access type T. The collection then calls subprograms and other subprogram collections, as follows: - The collection of T calls a subprogram if the ACCESS attribute of the subprogram that returns type T is used in a begin-part or expression, unless the call on the collection is strictly before this use; - The collection of T calls the collection of T1 if a conversion from T1 to T is used in a begin-part or expression, unless the call on T's collection is strictly before this use. i) A dispatching call of a dispatching operation (i.e. where the controlling tag value is not known statically) calls the "collection" of the dispatching operation. The collection of the dispatching operation then calls other corresponding subprograms, as follows: - The collection of a given dispatching operation of a given tagged type, calls a corresponding subprogram body of the type or its derivatives, unless the call is strictly before the declaration of the corresponding subprogram. j) A begin-part or declarative item uses a unit if it uses a subprogram or collection or task that uses the unit. Although there are a lot of cases, the concept is pretty straightforward. A unit is used if it might be called/activated/instantiated directly or indirectly. AN IMPLEMENTATION MODEL Within a single declarative region, the rule is straightforward, and internal calls can be checked at compile-time in the absence of subunits. In the presence of subunits, and for calls to separate library units, information must be provided to the pre-linker to verify that no access before elaboration is possible, and to pick a safe elaboration order if one exists. The compiler must provide the pre-linker with information on each compilation unit. This must include information about the ordering of the bodies of subprograms/tasks/generics which are usable outside the compilation unit, relative to the uses of subprograms/tasks/generics declared outside the comp-unit. In addition, it must include information about the ordering of uses of a subprogram collection, and the operations that add units to a subprogram collection (subp'ACCESS, access-to-subp type conversion, and overriding of a dispatching op). Because we are ignoring the order of execution within a begin-part, and the order of evaluation within an expression, the information on a compilation unit can be kept relatively simple. For each compilation unit, and each of its units that are externally usable, a list identifying the (partial) order of occurrence of the following should be provided: 1) A body of an externally usable unit 2) A use of an externally defined unit 3) An addition of a unit to an externally visible subprogram collection 4) A use of an externally visible subprogram collection 5) A stub for a separately compiled subunit All uses or additions occuring within the same declarative item or same begin-part, should be grouped into an unordered set. Packages should be flattened out in this information. Local units need not be reported separately if uses appearing inside them are incorporated into the list of uses within externally usable units that call them. THE ALGORITHM First we connect all subunits to their parent unit at the point of their stub. Hereafter, we no longer treat subunits as separate compilation units. We then perform a closure over the "uses" information, ignoring the uses of collections for now, to form an initial stab at the "uses" relation of the static rule. We then go through each compilation unit, and check that it does not use one of its own units prior to elaborating the body of that unit. If this check fails, then we have a local ABE problem. Otherwise, we go on to check for global ABE problems. Given that each unit is elaborated in exactly one compilation unit, we can turn the "uses" relation into a "must follow in elab order" relation between compilation units. The "must follow" relation also must include the "with" dependences and the "pragma ELABORATE" explicit "must follow" specifications. If the "must follow" relation has any cycles, then there is a global ABE problem. We then look for an addition of a unit to a collection that "must follow" the elaboration of the unit, or for an addition that "must follow" all calls on the collection. We drop each such addition from further consideration. We also look for calls on a collection that occur in the same unordered set with additions, or "must follow" additions to the collection still being considered. If we find one, we add in "must follow" relationships between the calling compilation unit and the units elaborating the units that were added to the collection. We iterate this process until no additional "must follow" relationships are added, or until there are no more "additions" to consider, or until a cycle is detected. When we are done, we may still have one or more additions that need to be considered. If so, we can try to introduce a "must follow" relationship between the call on the collection and the elaboration of the unit being added. Introducing one such "must follow" may allow other additions to be eliminated, and may force additional "must follow" relationships to be added, or may result in a cycle (in which case it has to be backed out). This is an iterative trial and error process, but hopefully some clever person can find a non-exhaustive search process which is guaranteed to terminate if there is a non-cyclic solution. If all of the additions can be dispensed-with without introducing a cycle, then we can do a simple topological sort to determine a legal elaboration order.