!topic LSN on Finalization in Ada 9X
!key LSN-1046 on Finalization in Ada 9X
!reference MS-7.4.6;4.6
!reference MI-OO03
!from Bob Duff $Date: 92/10/10 14:03:39 $ $Revision: 1.3 $
!discussion

(I also want to say, "reference MS-7.4.6;2.0", but our tool doesn't like version 2.0.)

This Language Study Note discusses User-defined Finalization.

At the Frankfurt WG9 meeting, the consensus was that finalization should be kept in the language, and should be moved to the core language. (It was at that time in the SP Annex.) However, two countries voted that the MRT should study the issue of removing a certain restriction from the proposal, and we heard the same concern from several DRs. This LSN addresses that concern.

The restriction in question is that the user must decide whether or not to have finalization at the root type of a hierarchy. Finalization cannot be added to a derived type, unless the parent type has finalization.

The MRT has already made several proposals that address the concern (see the MI and old version of the MS referenced above). This LSN summarizes some of that information. We need to decide whether the removal of the above restriction is worth the cost.

ISSUES ADDRESSED BY ALL PROPOSALS:

First, we discuss some issues that must be addressed by any finalization proposal.

ISSUE 1: SYNTAX.

Three different proposals for the syntax have existed at one point or another.

MI-OO03 proposed this:

    procedure My_Finalize_Operation(X: T);
    for T'Finalize use My_Finalize_Operation;

MS;2.0 proposed this:

    procedure T'Finalize(X: in out T);

MS;4.6 proposes no extra syntax for finalization. Instead, the user indicates finalization by deriving from a special tagged type.

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

It is obviously a bad idea to finalize uninitialized variables. Since finalization is done automatically, the user has no control over such erroneousness. Therefore, the rules must ensure that if a variable is not initialized, it is not finalized.

It is also a bad idea to miss any finalizations. If an object IS initialized, then it should be finalized also.

The conclusion is that finalization should be done if and only if initialization is done. This implies that initialization and finalization should be abort-deferred regions.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

Finalization has certain interactions with copying. Ada performs a copy of an object in the following cases:

    - Default initialization of components, for object_declarations and allocators.

    - Explicit initialization of stand-alone objects, and of objects allocated by allocators.

    - Assignment statements.

    - Parameter passing and function return.

    - Input-Output, 'READ/'WRITE, unchecked_conversion, interface to other languages, and the like.

The language-defined copy (which just copies the bits) is fundamentally incompatible with user-defined finalization. When the programmer defines finalization for a type, it is generally because the type contains a reference to some resource that needs to be deallocated, unlocked, or otherwise manipulated. Such references won't work if clients of the abstraction are making copies willy-nilly.

For example, suppose I have a data type that contains a varying-length text string, so I use an access type. (In this case the "resource" is simply the heap.)

    package P is
        type T is [tagged?] [limited?] private;
        ...
    private
        type String_Acc is access String;
        type T is [tagged?] [limited?]
            record
                Name: String_Acc;
                ... -- other components.
            end record;
    end P;

In order for the abstraction to be abstract, I must define a finalization operation that does something to the Name component whenever an object of type T disappears. For example, I might deallocate it. Or I might manipulate a reference count. In any case, if a client is allowed to make copies of such objects without giving me control, my abstraction won't work right -- Names will get deallocated twice, or reference counts will not be correctly maintained.
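
The sharing that causes this trouble can be seen with the language-defined copy alone. The following is only a sketch, assuming the non-limited, untagged variant of T from package P above; the procedure name Shallow_Copy_Demo and the string contents are made up for illustration. It deliberately stops short of deallocating anything, since freeing the Name through both copies would be erroneous.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Shallow_Copy_Demo is
   type String_Acc is access String;

   -- Stand-in for the full view of P.T (non-limited, untagged variant).
   type T is record
      Name : String_Acc;
   end record;

   A : T := (Name => new String'("first"));
   B : T := A;  -- language-defined copy: only the access value is copied
begin
   -- Updating the string through B is visible through A,
   -- because both Name components designate the same heap object.
   B.Name (1) := 'F';
   Put_Line (A.Name.all);

   if A.Name = B.Name then
      Put_Line ("A and B share one Name");
   end if;
end Shallow_Copy_Demo;
```

A finalization operation that deallocated Name would be run once for A and once for B, which is exactly the double deallocation described above.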

In a tasking application, a lock might be unlocked twice.

All of this means that any finalization proposal must either prevent copies, or must allow the user to define the semantics of the copy. In the above example, the user's copy operation would do a deep copy (i.e. allocate a new Name), or increment a reference count, or some such thing.

Previous Mapping proposals have allowed the user to define a T'COPY operation. As explained above, the interesting operation is user-defined copy, not just user-defined assignment.

In any case, a finalization proposal that allows copy, but not user-defined copy, won't fly.

If user-defined copy is allowed, it should not be applied when inside the immediate scope of the type itself. Or perhaps for operations that are declared in that immediate scope. Otherwise, how can one define the copy operation in terms of assignment?

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

The MRT believes that the only feasible implementation strategy is to add objects to a per-task list when they are initialized, and use the list to find them when they need to be finalized. In order to avoid the inefficiency and uncertainty of heap allocation, the list must be threaded through links that are in the object itself.

Implementations are, of course, free to choose any strategy that they can get to work. However, we believe that a PC-map-based strategy would be quite complicated. Consider, for example, an array of finalizable components. Suppose that while initializing the array, an exception occurs during initialization of the seventeenth component. According to ISSUE 2, we must ensure that exactly those components that were initialized are finalized. How do we remember which those were? Recall that initialization of the array need not take place in any particular order. With the per-task list strategy, it's easy -- if it's on the list, then it got initialized.

Which types contain link fields, and what offset are they allocated at?

One alternative is to allocate them in every object that might have finalization. (The different proposals differ as to how many such types there are, and therefore differ as to how tolerable it is to have overhead in every such object.) If they are always present, then they can always be allocated at the same offset, thus simplifying the process of finding them.

Another alternative is to allocate them only in types that actually have finalization. But if finalization can be added anywhere in the hierarchy (rather than just at the root type), they cannot be allocated at the same offset in all objects. This raises the same run-time issues as multiple inheritance -- fields are at different offsets, so extra data structures are needed in order to find them at run time.

Another alternative is to allocate them only in types that actually have finalization, but to put them at a negative offset from the beginning of the object. There are some drawbacks to this approach: Some compilers don't know how to deal with negative record offsets. It makes the implementation more complex to be allocating record fields in two directions. On some machines, negative offsets are rather inefficient. Negative offsets may already be needed for other purposes. For example, some have noted that allocating discriminants at negative offsets can make tagged types more efficient.

In addition to the links, each object must contain some method of finding the finalization operation. The same issues -- whether and where to allocate this field -- arise.

ISSUE 5: FINALIZATION OF CONSTANTS.

It makes sense for the finalization operation to take an 'in out' parameter of the type. However, this won't work if there might be constants of the type.

ISSUE 6: ALLOWED FOR NESTED TYPES?

If finalization is allowed for a type that is nested in a subprogram or task body, then when finalization occurs, it is necessary to set up the appropriate static link or display to represent the correct static context.
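
To make the static-context concern of Issue 6 concrete, here is a small sketch of a type nested in a subprogram whose finalization operation makes up-level references. It uses none of the proposed syntaxes: My_Finalize_Operation is an ordinary procedure called explicitly, standing in for the implicit call that a finalization proposal would make, and the names Nested_Demo, Outer, Label and Count are invented for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Nested_Demo is

   procedure Outer (Label : String) is
      Count : Natural := 0;  -- local to this activation of Outer

      type T is record
         Id : Integer;
      end record;

      -- Hypothetical finalization operation for the nested type T.
      -- It refers to Count and Label, which live in Outer's frame, so
      -- whoever calls it must supply Outer's static link or display.
      procedure My_Finalize_Operation (X : in out T) is
      begin
         Count := Count + 1;
         Put_Line (Label & Integer'Image (X.Id));
      end My_Finalize_Operation;

      Obj : T := (Id => 1);
   begin
      -- Explicit call here; under a finalization proposal the
      -- implementation would make this call when Obj disappears.
      My_Finalize_Operation (Obj);
      Put_Line ("finalized so far:" & Natural'Image (Count));
   end Outer;

begin
   Outer ("finalizing object");
end Nested_Demo;
```

When the call is made by the run-time system rather than from inside Outer, the correct frame for Count and Label has to be located some other way, which is the cost that Issue 6 refers to.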

ISSUE 7: COMPOSABILITY.

If a composite type has components with finalization, does the composite type have a finalization operation composed from those of the components? If it also has its own finalization, when is that done?

If a parent type has finalization, does a derived type inherit it? If the derived type has its own finalization, does this finalization override that of the parent, or is it in addition to that of the parent?

Note that for tagged types, if composition is automatic on derivation, the compiler needs to construct the composite finalization operations. Consider, for example, what happens on finalization of an object with an unknown tag:

    type A is access T'CLASS;
    X: A;
    ... -- make X point to an object somewhere in the class
    FREE(X); -- X.all is finalized at this point.

A tagged type may be both derived from something with finalization, and have additional components with finalization, and also have its own finalization. If these things compose, then in what order?

Similar questions arise for the copy operation, if it is user-definable.

Whatever the answers are, are they uniform with the way user-defined equality works? Is it necessary to change equality in an upward incompatible manner?

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Does the user have to decide whether or not to have finalization at the root of a hierarchy of derived types, or can finalization be added in later? This is the concern that this LSN is trying to address -- making decisions at the root type is restrictive.

----------------

Now we turn to the individual proposals, and discuss how they address each of the above issues:

    - PROPOSAL A: FULL-BLOWN FINALIZATION

    - PROPOSAL B: COPY ONLY FOR LIMITED TYPES

    - PROPOSAL C: NO USER-DEFINED COPY.

    - PROPOSAL D: RESTRICTION TO ROOT TYPE.

    - PROPOSAL E: DERIVATION FROM CONTROLLED

----------------------------------------------------------------

PROPOSAL A: FULL-BLOWN FINALIZATION

This is similar to the MI-OO03 proposal.
Finalization is allowed for any type.

ISSUE 1: SYNTAX.

We'll use this syntax, since people seem to like it better than the MS;2.0 proposal:

    procedure My_Finalize_Operation(X: T);
    for T'Finalize use My_Finalize_Operation;

Open issue: Is the fact that finalization exists for a given type an abstract property of the type? Should there be rules about the placement of "for T'FINALIZE use..." that ensure it is exported from a package (for a private type)? In other words, should the programmer of a derived type be able to know whether the parent type has finalization?

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

By definition.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

Yes. Some syntax such as "for T'COPY use ..." could be used. In addition, we prevent copy for parameter-passing and function return by saying that all finalizable types are passed by reference.

Possible problem: the compiler doesn't always know whether an object is finalizable, since a derived type might have added finalization.

Possible problem: Elementary types are now sometimes required to be passed by reference.

Possible problem: User-defined copy on non-limited types is a bag of worms.

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

It's not clear how to optimize away the links for (the majority of) objects that have no finalization. A similar comment applies to the pointer (or whatever) to the finalize operation.

ISSUE 5: FINALIZATION OF CONSTANTS.

Presumably, finalization must take an 'in' parameter, since constants are generally allowed. This means that if the finalization operation wants to pass a component to an instance of UNCHECKED_DEALLOCATION, it must make a copy of that component into a temp.

ISSUE 6: ALLOWED FOR NESTED TYPES?

Open issue. The simplest semantically is to say Yes. But there could be an explicit restriction in the language.

ISSUE 7: COMPOSABILITY.

Finalization of components happens after finalization of the composite object (because finalization of the object might access components). Finalization of a parent type happens after finalization of the derived type (because the derived type's finalization might access components of the parent). I don't know what the right answer is in the case where there are both components and a parent.

The copy operation of a type composes in the expected way from user-defined copy. This makes it non-uniform with user-defined equality -- but that's better than upward incompatibility.

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Finalization may be added at any point in the type hierarchy.

Note the relationship of Issues 7 and 8. If finalization can be added anywhere, it is important that finalization compose on derivation. Suppose T2 is derived from T1, and that T2 has finalization, but T1 does not. Suppose somebody wants to add finalization for T1. That somebody can't know about the existence of T2, in general. Therefore, it had better not be the responsibility of T2 to call T1's finalization. On the other hand, if finalization can only be added for a root type, then this scenario cannot occur. The programmer of T2 will always know whether T1 has finalization, and can therefore call it explicitly.

----------------------------------------------------------------

PROPOSAL B: COPY ONLY FOR LIMITED TYPES

This is the same as Proposal A, except that the user-defined copy operation is only allowed on a limited type. For a type that becomes non-limited after its declaration, the copy operation is not invoked in certain regions.

It's not clear whether defining a copy operation automatically makes a limited type become non-limited. In any case, it should be legal for a type extension of a non-limited type to contain a limited component, so long as the user defines copy and "=" operations. Or perhaps not doing so would make the extension abstract.
(Recall that an important aspect of our OOP proposal is that operations cannot be subtracted on type extension. When doing a dispatching call, it is not necessarily known at compile time what body will be executed, but it IS known that there is a body to execute.)

If one needs to finalize an access type (say, for doing reference counting), one must put the access type inside a limited record, and finalize the record.

ISSUE 1: SYNTAX.

As for Proposal A.

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

By definition.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

Yes. But only for limited types.

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

Now only limited types need the overhead of the link fields, and the pointer to the finalize operation.

ISSUE 5: FINALIZATION OF CONSTANTS.

No problem -- constants never exist. If the type becomes non-limited, then finalization is not done in that region.

ISSUE 6: ALLOWED FOR NESTED TYPES?

Same as Proposal A.

ISSUE 7: COMPOSABILITY.

As for Proposal A.

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Finalization may be added at any point in the type hierarchy.

----------------------------------------------------------------

PROPOSAL C: NO USER-DEFINED COPY.

Finalization is allowed for any limited type. User-defined copy is not allowed. Any type with finalization is passed by reference, even if it turns out to be an elementary type.

ISSUE 1: SYNTAX.

Same as Proposals A and B.

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

By definition.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

No. But most finalizable types are not copied a lot anyway. Parameter passing and function return are by-reference.

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

Same as Proposal B.

ISSUE 5: FINALIZATION OF CONSTANTS.

No problem -- same as Proposal B.

ISSUE 6: ALLOWED FOR NESTED TYPES?

Same as Proposals A and B.

ISSUE 7: COMPOSABILITY.

As for Proposals A and B.

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Finalization may be added at any point in the type hierarchy.
However, limitedness must be decided at the root type, since the user cannot add a copy operation to a derived type. Therefore, if the user wants to add finalization to a derived type, he will often have to go back and change the root type to make it limited.

----------------------------------------------------------------

PROPOSAL D: RESTRICTION TO ROOT TYPE.

Same as Proposal C, except that when you say "for T'FINALIZE use...", T must be a non-derived type, or it must be a derived type whose parent has finalization.

ISSUE 1: SYNTAX.

Same as Proposals A, B, and C.

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

By definition.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

No.

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

The link fields, plus the pointer to the finalization operation, can be allocated at the same place in each object, and only when actually needed.

ISSUE 5: FINALIZATION OF CONSTANTS.

No problem -- same as Proposals B and C.

ISSUE 6: ALLOWED FOR NESTED TYPES?

Same as Proposals A, B, and C.

ISSUE 7: COMPOSABILITY.

As for Proposals A, B, and C.

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Yes, both limitedness and finalization are decided upon at the root type, and cannot be changed for a derived type.

----------------------------------------------------------------

PROPOSAL E: DERIVATION FROM CONTROLLED

This is essentially the proposal documented in MS;4.6. The user makes a type finalizable by deriving from a special tagged type called CONTROLLED, which has INITIALIZE and FINALIZE operations.

One change from MS;4.6 is that FINALIZE is an abstract operation, which means that spelling errors will be caught at compile time.

One advantage is that many of the semantic issues are handled for free -- no need for a lot of special rules about finalizable objects, since most of those rules follow from the rules about type extension. Similarly, many of the implementation issues go away -- see, for example, below where we describe how the link fields are allocated.
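
As a rough sketch of what derivation from CONTROLLED looks like to the user, the following recasts the earlier package P example. It assumes the Ada.Finalization package of the Ada standard as eventually published, with Limited_Controlled standing in for the CONTROLLED type of this proposal; the details of MS;4.6 may differ (in the published standard, for instance, Finalize is not abstract but has a default that does nothing). The unit names Names and Main, and the initial string, are made up for illustration. The three compilation units are shown together for brevity.

```ada
with Ada.Finalization;

package Names is
   type String_Acc is access String;

   -- Limited_Controlled plays the role of CONTROLLED in this sketch.
   -- T is limited, so clients cannot copy it by assignment.
   type T is new Ada.Finalization.Limited_Controlled with record
      Name : String_Acc;
   end record;

   procedure Initialize (X : in out T);
   procedure Finalize   (X : in out T);
end Names;

with Ada.Unchecked_Deallocation;

package body Names is
   procedure Free is new Ada.Unchecked_Deallocation (String, String_Acc);

   procedure Initialize (X : in out T) is
   begin
      X.Name := new String'("unnamed");
   end Initialize;

   procedure Finalize (X : in out T) is
   begin
      -- Free leaves X.Name null, so finalizing twice would be harmless.
      Free (X.Name);
   end Finalize;
end Names;

with Names;

procedure Main is
   Obj : Names.T;  -- Initialize (Obj) is called implicitly here
begin
   null;
end Main;           -- Finalize (Obj) is called implicitly on scope exit
```

No finalization-specific syntax appears anywhere: the type declaration and two overriding operations are ordinary type extension, and the 'in out' parameter of Finalize lets it pass the component straight to the deallocation instance.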

ISSUE 1: SYNTAX.

No syntax.

ISSUE 2: INITIALIZE AND FINALIZE ARE ONE-FOR-ONE.

By definition.

ISSUE 3: IS USER-DEFINED COPY ALLOWED?

No. CONTROLLED is a limited type, so all of its descendants are limited. This prevents most copy operations (e.g. assignment). Parameter passing and function return are by reference.

Controlled types are "inherently limited", which means that all views of the type are limited. Inherently limited types do not become non-limited. This means that there is no region in which copy is allowed; therefore no region in which finalization should be turned off. (Note that the "inherently limited" concept is needed in any case; it would be disastrous, for example, to pass a protected object by reference.)

ISSUE 4: INTENDED IMPLEMENTATION STRATEGY.

The link fields are simply declared in type CONTROLLED, and are inherited by all controlled types. In the normal OOP implementation strategy, the component offsets do not change. There is no need for a special pointer from each object to the finalization operation. Instead, the tag field is used to find the finalization operation in the normal way -- by doing a dispatching call.

In the other proposals, the link fields were hidden dope added by the implementation. In this proposal, however, the user is aware of their existence -- the type is explicitly derived from CONTROLLED, so it makes sense that whatever fields CONTROLLED has (even if private) also exist in the user's type.

Finalization of an object of unknown tag is automatic, since FINALIZE is a dispatching operation, and the user has control over composition with the parent type.

ISSUE 5: FINALIZATION OF CONSTANTS.

Not necessary. The finalize operation has an 'in out' parameter.

ISSUE 6: ALLOWED FOR NESTED TYPES?

No. Type CONTROLLED is declared at library level, and the normal rules of type extension prevent controlled types from being more deeply nested.

ISSUE 7: COMPOSABILITY.

If a component needs finalization, it can be controlled, too.
Component finalization happens after that of the composite object. Finalization for multiple components is done in the reverse order of initialization (which is done in an arbitrary order).

A derived type's finalization overrides that of the parent. If the user wishes to make both happen, then the derived finalization should call the parent's, in the usual "pass-the-buck" OOP style. This could be viewed as an advantage or a disadvantage: it is less automatic, but the user has control of when (and if) it happens. Note that it is the answer to Issue 8 that allows this simplification to be safe.

ISSUE 8: DECISIONS MADE AT ROOT TYPE?

Yes, both limitedness and finalization are decided upon at the root type, and cannot be changed for a derived type.

However, in most examples we have worked with, the purpose of adding finalization to a type extension is to finalize a new component. But simply making that component be controlled solves the problem in most cases. In some cases, the new finalization is really for the object as a whole, in which case one must define a type for the component that has an access discriminant. The access discriminant is then initialized to point to the containing object. This method is outlined in LSN-1033.

The add-a-controlled-component mechanism can be encapsulated in a generic package, which takes a finalization operation as a formal procedure. This method has been documented elsewhere.

Another workaround is to have a coding convention: Make all types (or all "interesting" types) derived from CONTROLLED, just in case someone wants to add finalization later. There are probably some applications where such a coding convention makes sense. But applications where the per-object overhead is intolerable will be happy that the "coding convention" has not been codified as a language rule.

----------------------------------------------------------------

COMPARISON OF PROPOSALS:

Proposal A is far too complex, and introduces too many semantic anomalies.
(What, for example, happens if a discriminant is of a finalizable type? Discriminants are not passed by reference, surely.)

Proposal B eliminates the restriction that people are concerned about. However, it requires the user-defined copy feature.

Proposal C eliminates the restriction, but there is still the restriction that you can't add limitedness to a derived type. The ability to add finalization is of small comfort to the programmer who has to go back to the root type to make it limited. Removing the restriction on limitedness requires user-defined copy, turning this into Proposal B.

Proposal D doesn't even address the concerns that were raised, although there is a workaround. It is mostly the same as Proposal E, except that restrictions must be stated explicitly in the RM, instead of following logically from the existing restrictions on type extensions. Proposal D also raises the composability issue -- how to define it.

Proposal E doesn't address the concern either, although there is a workaround. Its advantage is ease of description and ease of implementation.

Proposal B is the simplest one that fully addresses the concern. Proposal E is the simplest one that does not address the concern. Therefore, I think the choice boils down to whether the benefits of Proposal B are worth the cost.