LSN021.OOPFinalize ------------------------ !topic LSN on OOP-Based Object Finalization $Revision: 1.2 $ !reference MS-7.4.6;3.1 !reference MS-11.3;3.1 !from Tucker Taft $Date: 91/11/11 11:43:14 $ !discussion Our current plans are to drop the MS V3.1 proposals for finalization and exit handlers, and substitute an OOP-based approach which requires no new syntax, and can be placed in the annex. It should probably go in the Systems Programming Area, because it is most closely related to user-defined storage management. Any realistic user-defined storage pool will almost certainly be dependent on some kind of finalization to reclaim "chunks" of storage devoted to the storage pool. Finalization would be provided only for "controlled" types derived from a special root type, defined in the following package: -------------- package Finalization_Support is type Controlled is tagged limited private; procedure Initialize(Obj : in out Controlled) is <>; procedure Finalize(Obj : in out Controlled) is <>; private type Controlled is tagged record . . . -- Stuff to support run-time finalization -- other components added by extension end record; end Finalization_Support; -------------- Here is a typical user-defined controlled type: ------------------ with Finalization_Support; package Dynamic_String is -- Package to provide a fully dynamic string. -- On assignment, the existing buffer is reused if -- sufficiently large. -- Otherwise, the buffer is freed and a new one is allocated. -- If Initial_Size is > 0, then the buffer is preallocated -- to the specified size, presumably because the programmer expects -- the string to grow to that size over its lifetime. type Dyn_String(Initial_Size : Natural := 0) is new Finalization_Support.Controlled with limited private; procedure Initialize(Str : in out Dyn_String); -- This allocates the initial buffer if Initial_Size > 0 procedure Finalize(Str : in out Dyn_String); -- This frees the current buffer procedure Assign(Left : out Dyn_String; Right : in Dyn_String); -- This copies the value of the RHS into the LHS, -- freeing and reallocating the buffer if necessary. function "&"(Left, Right : Dyn_String) return Dyn_String; -- Concatenate left and right, returning a combined string function Value(Str : Dyn_String) return String; -- Return current string value of dyn string function Construct(Str : String) return Dyn_String; -- Construct dyn string given normal Ada string. function "="(Left, Right : Dyn_String) return Boolean; -- Dynamic string equality operator -- Other ops are obviously possible. private type String_Ptr is access String; type Dyn_String(Initial_Size : Natural := 0) is new Finalization_Support.Controlled with record Length : Natural := 0; -- Current length of string Buffer : String_Ptr; -- Buffer containing string (plus slack) end record; end Dynamic_String; with Unchecked_Deallocation; package body Dynamic_String is procedure Free is new Unchecked_Deallocation( String, String_Ptr ); procedure Initialize(Str : in out Dyn_String) is -- This allocates the initial buffer if Initial_Size > 0 begin if Str.Initial_Size > 0 then Str.Buffer := new String(1..Str.Initial_Size); end if; end Initialize; procedure Finalize(Str : in out Dyn_String) is -- This frees the current buffer begin Str.Length := 0; Free(Str.Buffer); end Finalize; procedure Assign(Left : out Dyn_String; Right : in Dyn_String) is -- This copies the value of the RHS into the LHS, -- freeing and reallocating the buffer if necessary. begin if Right.Length = 0 then -- RHS is null string, no need to copy data null; elsif Left.Buffer = null or else Left.Buffer'LENGTH < Right.Length then -- Reallocate appropriate size buffer, -- initialized from RHS. -- ASSERT: Left'ADDRESS /= Right'ADDRESS -- (i.e. not assigning to self). -- PROOF: an exercise for the reader ;-) Free(Str.Buffer) Left.Buffer := new String'(Right.Buffer(1..Right.Length)); else -- Reuse LHS buffer Left.Buffer(1..Right.Length) := Right.Buffer(1..Right.Length); end if; -- Set length of LHS Left.Length := Right.Length; end Assign; function "&"(Left, Right : Dyn_String) return Dyn_String is -- Concatenate left and right, returning a combined string Result : Dyn_String(Left.Length + Right.Length); begin Result.Length := Left.Length + Right.Length; if Left.Length > 0 then -- Copy in left part Result.Buffer(1..Left.Length) := Left.Buffer(1..Left.Length); end if; if Right.Length > 0 then -- Copy in right part Result.Buffer(Left.Length+1 .. Result.Length) := Right.Buffer(1..Right.Length); end if; return Result; -- Presuming finalization is deferred end "&"; function Value(Str : Dyn_String) return String is -- Return current string value of dyn string begin if Str.Length = 0 then return ""; else return Str.Buffer(1..Str.Length); end if; end Value; function Construct(Str : String) return Dyn_String is -- Construct dyn string given normal Ada string. Result : Dyn_String(Str'LENGTH); begin Result.Length := Str'LENGTH; if Result.Length > 0 then Result.Buffer.all := Str; end if; return Result; -- Presuming finalization is deferred end Construct; function "="(Left, Right : Dyn_String) return Boolean is -- Dynamic string equality operator begin return Left.Length = Right.Length and then (Left.Length = 0 or else Left.Buffer(1..Left.Length) = Right.Buffer(1..Left.Length)); end "="; end Dynamic_String; ------------- As is hopefully demonstrated above, this OOP-based finalization mechanism is pretty easy to use, and when combined with finalization deferral on function return (see below) allows "controlled" types to be used to implement a true "abstract" data type with user control over all aspects of the object. POSSIBLE IMPLEMENTATION MODEL The run-time support for controlled types can be pretty straightforward. Here is a possible "private" part for package Finalization_Support: private type Controlled_Ptr is access all Controlled'CLASS; type Controlled is tagged record Next_Controlled_Object : Controlled_Ptr; -- Pointer to next object to be finalized Storage_Pool_Follower : Controlled_Ptr; -- Additional pointer used for objects in the heap -- to simplify removal upon Unchecked_Deallocation. -- other components to be added by extension end record; end Finalization_Support; Each task would maintain a singly-linked list of to-be-finalized objects. Similarly, each storage pool containing controlled objects (including components), would maintain a doubly-linked list of to-be-finalized objects. On entering a frame with controlled objects (or components), a pointer to the head of the per-task finalization list (PTFL) would be saved. As new objects are initialized, they would be added to the head of the PTFL. On any exit from the frame, the PTFL would be walked back until the saved point, calling Finalize on each object. Each controlled subcomponent appears on the list separately. Components are initialized and placed on the list before the enclosing object, implying that they will be finalized only after the finalization operation of their encloser. When allocating a controlled object, or one which contains controlled subcomponents, the controlled objects would be added to the per-storage-pool finalization list (PSPFL). Unchecked_Deallocation would finalize the subcomponents and then the object as a whole which is about to be deallocated. It would remove these from the PSPFL. When the storage pool as a whole is about to be deallocated, the PSPFL would be walked, calling Finalize on each object remaining on the list. One attraction of this implementation model is that it is essentially equivalent to the support required already for subtasks. It does introduce overhead on creating a controlled object, but, unlike exceptions, finalization is *always* performed if there is a controlled object in a scope. Therefore, this overhead is relative to the cost of the Finalize procedure, and seems acceptable. It might be possible to avoid the use of an explicit to-be-finalized list, but special approaches would be necessary to keep track of how far along initialization had proceeded if an exception occurred within a declarative part. A PC (Program-Counter) "map" approach as typically used for exception handlers might work, though with arrays and variant records containing controlled components it could get pretty hairy. Therefore, we anticipate that the list-based approach will be preferred. RETURNING FROM A FUNCTION When the value of a local controlled variable is returned from a function, there is a danger that the value will be meaningless after performing finalization of the local variables of the function. There are various solutions to this problem: 1) Disallow returning the value of a local variable which (might) require finalization (this is roughly the MS v3.1 solution); 2) Call a user-defined copy operation to produce the value to be returned, which would presumably do a "deep" copy or bump reference counts so finalization wouldn't render the returned value meaningless; 3) Defer finalization of the local variable and anything which it might reference. We are currently favoring solution (3) because it seems the simplest for the user, most efficient, and most flexible. (2) has a number of semantic difficulties because it is clearly important that the return from the user-defined copy function not itself perform another copy, or else infinite recursion would result. This is why we had the special (and confusing) rule in MD-1.0 that T'COPY (and T'FINALIZE) were only invoked outside their immediate scope. Furthermore, with (2), the temporary produced by T'COPY must eventually be finalized, reintroducing the problem of a function adding a to-be-finalized object to its caller's list. Solution (1) doesn't work because generic contract model issues tend to force it to apply to all limited types, and it is clearly less flexible for types whose full type is still limited. As shown in the example for Dyn_String above, there would be no way to correctly implement the "&" operator or the Construct function. Solution (3) is essentially equivalent to treating a call on such a function like elaborating a package subunit, whose body variables must persist after the elaboration. Alternatively, the return from such a function can be viewed as invoking a "continuation." Both of these views imply possible implementation models. The package subunit approach usually involves putting the variables which are to persist on some kind of secondary stack or mark/release heap. This seems pretty straightforward. Given the per-task finalization list model for finalization, if one puts the "controlled" variables and the aliased variables on the secondary stack, and then returns without walking the PTFL, the local variables of the function requiring finalization would be automatically transfered to the callers to-be-finalized list. At the end of a statement (other than another return statement returning a possibly-finalizable limited type) containing a call on such a function, the PTFL would be walked back to the point it was at the "begin" of the current frame. In an analogous fashion, the secondary stack would be cut back to the length it had at the "begin" as well. The continuation model is intriguing, and might be preferable in some implementations. The return statement of the function would be transformed into a call on a special procedure taking as its only parameter the value which would have been returned by the function. The special procedure to be called would be passed in as an implicit parameter to the "function." The special procedure would consist of the code for the part of the statement which used the value of the function call. After calling this special procedure, the "function" would clean up as usual, and then return (with no result). Temporaries in the original caller's frame would hold any results produced by this special procedure. A static link or display parameter would be needed for this special procedure to allow it to reference or update objects wihin the original caller's frame. The continuation approach avoids the need to return without cutting back the (secondary) stack, and hence might be easier to support in some environments. the need for a static link/display being passed to the special procedure). To reiterate, we prefer deferring finalization on function return because it is the simplest and most flexible approach for the *user*. It also avoids problems with the generic contract model, etc. FINALIZATION AND ABORT/ASYNC-TRANFER-OF-CONTROL(ATC) If one models controlled objects after subtasks (see comment above in IMPLEMENTATION MODEL section), then it seems to make sense for them to be treated like subtasks w.r.t. abort/ATC. In particular if abort/ATC waits for subtasks to terminate, then it should similarly perform finalization on controlled objects. Ada'83 abort does wait for subtasks, so we presume it should perform finalization. ATC is more of an open issue, but currently we don't see how it can be defined in a way that doesn't wait for subtasks. Therefore, we presume that ATC will similarly perform finalization. >From the point of view of bounding abort and ATC, it seems sufficient to say that if there are no subtasks, controlled objects, or local storage pools, then the time to perform an abort or ATC (after exiting any "critical" section) should be bounded, and independent of the depth of nesting. This would imply that a bounded-time check should be possible to determine that there are no "clean-up" operations to be performed, followed by a bounded-time "long-jump" in the case of an ATC. Because the programmer can avoid the declaration of entities requiring cleanup, they can ensure that a bounded-time ATC is possible when necessary. Another way of looking at this is to accept that controlled objects are relatively "heavyweight" objects, hopefully not as heavy as a subtask, but heavy enough to avoid creating and destroying them frequently in time critical code. If one *does* create a controlled object in an abortable region, then it indicates an explicit intention that the finalization action be performed on ATC/abort, to provide more safety and to allow the code within the scope of the controlled object to operation with abort/ATC allowed. Without such a capability, the only way to achieve safety is to defer abort/ATC throughout the period where some shared resource (such as heap space) is in use. This seems clearly less desirable, implying that object finalization is important to allow the safe and responsive use of ATC. Given a list-based approach, finalization is easy to perform on ATC or abort. With abort, the RTS simply walks the entire PTFL to finalize all objects of the task. With ATC, the RTS walks the PTFL back to the point saved in the select-statement header. It is a bounded-time check to see whether any work needs to be done (just compare the current PTFL head against the saved pointer). Once the PTFL has been walked, the ATC can be accomplished by simply restoring the registers to their value saved in the select-statement header. This is also a bounded-time operation. ========================