!topic LSN on General Access Types !key LSN-1083 on General Access Types !reference RM9X-3.10.2;4.0 !reference RM9X-3.10.2(16);4.0 !reference AARM-12.3(12.p,12.q);4.0 !reference LSN-1042 on Accessibility Checks in Generics !from Bob Duff $Date: 94/02/15 15:17:44 $ $Revision: 1.5 $ !discussion Two issues related to access types and the accessibility rules came up again at the recent DR/XRG meeting in the UK: - The rules prevent "downward closures". - The rules are applied in an assume-the-worst manner in generic bodies. These issues keep coming up, because they represent clearly-desirable functionality, but with a substantial implementation cost (at least for some implementations). It is difficult to make the trade-off. We recommend not supporting downward closures, but solving the generic problem by going over to run-time accessibility checks in generics. Below, we discuss the issues, and the reasons for our recommendation. LSN-1042, published in October, 1992, discusses in general the relationship between accessibility checks and the generic contract model. ---------------- Downward Closures: Ada's access-to-subprogram types represent assignable subprogram values. They support the ability to create data structures containing the identities of subprograms. They support callbacks, where one software component declares a subprogram and hands its identity to another software component, which can then "call back" the first software component at a later time. These types do not support downward closures, where the identity of a more nested subprogram can be passed to a less nested subprogram. Downward closures can be used, for example, to implement iterators, like this: package P is type Integer_Set is private; ... -- various operations type Integer_Action is access procedure(I: Integer); procedure Iterate(S: Integer_Set; Action: Integer_Action); -- Call Action on each member of S. end P; function Count(S: Integer_Set) return Natural is Result: Natural := 0; procedure Increment(I: Integer) is begin Result := Result + 1; end Increment; begin Iterate(S, Increment'Access); -- Illegal! return Result; end Count; Increment'Access above is illegal, because Increment is more nested than type Integer_Action. It would be bad to have to make Increment global, because that would require making the variable Result global. It is a bad idea to force the use of global variables in a language that has tasks -- in this example, Count would not be reentrant if Result were global. Note that this situation is typical of iterators and similar things -- in typical use, the action passed to an iterator will often need visibility upon a variable declared in the caller of the iterator, in order to accumulate some sort of result. The reason for the restriction is that access-to-subprogram types allow assignment. At the point where Iterate is called, we don't know if the body of Iterate is going to save the access value in a global variable. We disallow passing the nested Increment procedure because Iterate *might* save a pointer to it, and call it later, after Increment (and the variable Result) no longer exist. Pascal supports downward closures, but it does not support Ada's callback style, because the identity of a subprogram being passed as a parameter cannot be copied. Modula-2 supports the callback style, but not the downward closure style. Languages like Lisp support a much more general form of closure, which is not of interest to us, since it requires "stack frames" to be heap allocated -- "heap frames", really, in the general case. C supports the callback style, but since there are no nested functions, the question of downward closures does not arise. There are no tasks, either, in C, so making variables global (or C's "static") does not cause as much trouble as in Ada. (Supporting threads in C presents some interesting problems!) Implementation Issues: In a language (like Ada) with nested subprograms, each subprogram needs some way to access its environment -- that is, the variables declared outside it. There are two common methods for representing this environment: static links, and displays. Most Ada compilers use static links, but some use displays. One key difference between the two methods is that static links have a compile-time-known size (32 bits, on a typical machine), whereas the size of a display depends on the maximum nesting depth throughout the partition. In an implementation with static links, an access-to-subprogram value is generally represented as a pair of addresses -- the address of the subprogram's code, and the static link. However, given the current Ada 9X rules, an access-to-subprogram type that is declared at library level does not need the static link -- it can be represented as just a code address. This is nice, because it is efficient, and it allows the representation to easily match C's typical representation of function pointers. Access-to-subprogram types declared one level deeper than library level can also be represented as just a code address. However, two or more levels deeper requires the static link. In an implementation with displays, given the current rules, an access-to-subprogram value can be represented as just a code address -- it is not necessary to attach a display to each access value. This also has the nice interoperability-with-C property. It has been suggested that downward closures be supported by allowing the Unchecked_Access attribute on subprograms. (It is currently allowed only on objects.) The semantics would presumably be that the access value returned is valid (the designated subprogram can be called) so long as the designated subprogram (and it's containing stack frame) still exist. If it doesn't exist, a dereference would be erroneous. We do not recommend this method for these reasons: - It would be unsafe -- it is uncomfortable to introduce erroneousness to a high-level feature like downward closures. - It would require every access-to-subprogram value to carry a static link or display, because the nesting level of the designated subprogram would no longer be restricted at compile time. This would break the interoperability-with-C property, and would be less efficient. (One could imagine an implementation using a different representation when the Convention is C, but that adds complexity, and doesn't solve the efficiency problem in the Ada-only case.) There are two ways to support downward closures that we would recommend, if downward closures are to be supported. Both ways recognize the fact that downward closures really need a separate feature from the callback style of access-to-subprograms: - Limited access-to-subprogram types. - Access-to-subprogram parameters. Limited access types were in an earlier version of Ada 9X. Since they would be limited, assignment would be disallowed, so downward-closure-style parameter passing could be allowed, and would be safe. The compiler would implement limited access-to-subprogram types with a static link or display. Non-limited ones would retain the interoperability-with-C property. The syntax would be something like this: type Integer_Action is limited access procedure(I: Integer); -- ^^^^^^^ procedure Iterate(S: Integer_Set; Action: Integer_Action); -- Call Action on each member of S. Type conversion from limited to non-limited would be illegal. Access-to-subprogram parameters would be an extension to access parameters (not to be confused with parameters of a named access type). The syntax would be something like this: procedure Iterate(S: Integer_Set; Action: access procedure(I: Integer)); -- Call Action on each member of S. This is quite similar to the syntax used in Pascal. As for any access parameter, the access type would be anonymous. Parameter type matching would be based on the designated profile. Access parameters already use run-time accessibility checks -- this new kind would do likewise. Thus, assignment would be allowed, but the accessibility checks would prevent dangling pointers. In our iterator example, if Iterate were to try to copy Action to a global variable (during the call to Iterate from Count), then Constraint_Error would be raised, because Count.Increment is too deeply nested. The value of an access-to-subprogram parameter would need to have a static link or display. Library-level access types could retain the interoperability-with-C property. Of the two workable methods outlined above, the access-to-subprogram parameters method would be somewhat simpler in Reference Manual terms, because it would be able to piggyback on the existing access parameters feature for its rules and semantics. The implementation complexity seems equivalent. Efficiency of the limited access types method would be slightly better, because it would not be necessary to pass in information needed for the run-time accessibility checks. Limited access types would be the first elementary limited types; we would have to reword the parameter passing rules to avoid a contradictory requirement to pass by copy and by reference. (It doesn't really matter which is done, actually.) For implementations with static links, the implementation of either method is straightforward. Such implementations already need to include a static link in at least some access-to-subprogram values, and the downward-closure ones would be the same. However, for implementations with displays, either method is an implementation burden, because: - The current rules do not require carrying the display with the access-to-subprogram value, whereas the new rules would. - The size of the display is not easily known at compile time. It's size can be calculated at run time, or a maximum possible size can be known if there is a restriction on nesting depth. Either way, managing the passing of the unknown size display causes complexity. It is certainly *possible* to implement: I know of at least one Pascal compiler that used displays, and implemented downward closures. Gnu C supports nesting of functions as an extension, and supports pointers to them. And Ada has other features that require the management of unknown-sized data structures. Nonetheless, the implementation is not trivial. It should also be noted that an implementation with displays would be less efficient than one with static links, because an indirect call would require copying the entire display. It's not that big of a deal, however -- it just means copying several words of memory, and there would be no *distributed* overhead. Note also that there are workarounds: Subprograms can be passed as generic formal parameters, so an iterator can often be implemented as a generic formal parameter. However, that prevents recursion, and is more verbose. A closure can also be represented as an access-to-class-wide value. In the iterator example, whatever state is needed by the Action procedure would be encapsulated in a type extension. However, this method is even more verbose. One possibility would be to not support downward closures directly as access-to-subprogram types, but to make the workaround a bit easier. In particular, we could allow limited tagged types to be extended in a more nested place than the parent type. (The accessibility rule would remain as is for non-limited type extensions.) Instead, a rule would be needed for allocators, similar to the current rule for types with access discriminants. This relaxation is possible for limited types, because they have no assignment, and hence cannot be copied to a place where the object lives longer than its type. This would allow the following: package P is type Integer_Set is private; ... -- various operations type Closure is tagged limited null record; procedure Action(C: Closure; I Integer); procedure Iterate(S: Integer_Set; C: Closure); -- Call Do_It(Action, M) for each member M of S. end P; function Count(S: Integer_Set) return Natural is type My_Closure_Type is new Closure with null record; -- Illegal! Result: Natural := 0; procedure Action(C: Closure; I: Integer) is begin Result := Result + 1; end Action; My_Closure: My_Closure_Type; begin Iterate(S, My_Closure); return Result; end Count; The above is less verbose than the current workaround, and has the advantage that My_Closure_Type can be declared where it arguably belongs -- inside Count. The implementation could store the static link or display in each object of a limited type extension that was declared at a more-nested level than its parent. Code would be added to each potentially primitive subprogram of such a type, to move the static link or display from the parameter object to its proper location. Note that no change to the call sites is necessary. Nonetheless, there is some implementation burden, because extra compiler work is needed for declaring these objects, and in the bodies of the potentially primitive subprograms. (The reason I say "potentially" primitive is that because of renaming, we don't know in general while compiling a given subprogram that it will be primitive. Potentially primitive subprograms are essentially the ones declared at the same nesting level, and that take a parameter of the type.) We (reluctantly) recommend not supporting downward closures, because: - An implementation burden exists for display-based implementations. - Workarounds exist. - It seems like too big a change to make to the language at this late date. I say "reluctantly" recommend, because it is clear to me that the functionality is desirable, and if we were designing Ada 9X from scratch, we would support downward closures. ---------------- Generic Bodies: Currently, accessibility rules are checked in generic bodies in an assume-the-worst manner (see 3.10.2(16) and 12.3(12.p,12.q)). This means that such rules are checked assuming the instantiation will be more nested than the generic unit. This is unfortunate, since most of the time, the instance will be at the same nesting level as the generic unit; the rules are therefore more pessimistic than is usually necessary. In many cases, it is possible to work around the problem by moving something from the generic body up to the private part. For example, suppose P is a subprogram declared in a generic body, and the body wishes to do P'Access, of an access type declared outside the generic unit. This is illegal, but one can move the declaration of the subprogram to the private part of the generic, and declare a constant, also in the private part: P_Access: constant Global_Access_To_Subprogram_Type := P'Access; Similarly, type extensions can be declared in the private part of the generic when they are illegal in the body. For the Access attribute for objects, a different workaround is possible -- use Unchecked_Access instead. Unfortunately, this throws away all the benefit of the accessibility checks, and can lead to erroneous execution if one is not careful. Another case is for type conversion of an access type declared in the generic unit (spec or body) to a type global to the generic unit. This is currently illegal. A possible workaround is to add a conversion function to the generic body: function Convert(Ptr: access ...) return Global_Access_Type is begin return Global_Access_Type(Ptr); end Convert; Then, if X is of the inner access type, Convert(X) will do a run-time check, which will fail if and only if the current instance is more nested than the generic unit. All of the above workarounds are annoying, because they cause trouble when converting a package into a generic package -- one has to apply the above workarounds after having written and tested the non-generic package. One possible solution to the above problems is to define some extra syntax, or a pragma that means "this generic unit will be instantiated only at the same nesting level". However, extra syntax for a small issue seems like too big a change at this point. And a pragma with such a heavy semantic effect seems ugly. Perhaps a better solution would be to say that all accessibility rules are checked at run time in instance bodies. (Doing things at run time is one way of getting around generic contract model problems in general, and it would work here.) An implementation that does macro expansion for generic instantiations could always detect these "run-time" errors at macro-expansion time. An implementation that does code sharing would have to generate code to do the run-time checks. One problem with either solution is that there is a substantial implementation burden for implementations that do universal generic code sharing. For such implementions, when a subprogram declared inside the generic unit is called, it needs to be implicitly passed an extra parameter that represents the current instance. This extra parameter is used by the subprogram to address variables declared in the generic body. Since such subprograms have a different calling convention, if one allows access values pointing at them to escape outside the generic, then one has a problem. We recommend using the run-time accessibility checking solution. However, because of the implementation problem, we recommend using this solution only for access-to-object types, and not solving the problem for access-to-subprogram types. The implementation problem mentioned above does not exist for access-to-object types. This solution is not entirely consistent, but note that it is already the case that access-to-subprogram types are different: there are no anonymous access-to-subprogram types, and there is no Unchecked_Access attribute for these types. ---------------- SUMMARY: Downward closures can be sensibly supported, and are quite desirable, but cause implementation difficulties for display-based implementations. Therefore, we recommend not solving this problem. Removing the accessibility restrictions on generic bodies can also be sensibly supported, and is also desirable, but (for access-to-subprogram types) presents severe implementation difficulties for implementations that do universal sharing of generic bodies. We recommend solving the problem for access-to-object types by using run-time checks. We recommend not solving the problem for access-to-subprogram types.