Lars Bjšrnfot, Kristina Lundqvist, Gšran Wall and Lars Asplund
Department of Computer Systems, Uppsala University
P.O. Box 325, S-751 05 Uppsala, Sweden
EÐmail: bjornfot@docs.uu.se

Abstract. A highly parallel algorithm for termination of Ada tasks is suggested. The algorithm is intended for hardware implementation in order to overcome the complexity of recursive software algorithms. High efficiency and determinism is expected to be achieved with a hardware implementation. A VHDL implementation of termination units is presented together with simulations that show that the termination condition can be evaluated in constant time.
1 Introduction
A hard real-time system is defined as a system where deadlines must not ever be missed, i.e. a missed deadline means system failure. Such systems require schedulability analysis, either a static schedule is calculated, or a dynamic scheduling algorithm is applied which guarantees that no deadline is missed. For the latter case, there are typically several restrictions, such as for each task, the periodicity, CPU utilization and critical resource usage must be known. In particular, before a new task is added, a new schedulability analysis must be performed. Thus dynamic task creation is not common in hard realÐtime systems, and as a consequence tasks rarely are required to terminate; they are expected to live throughout the execution of the system.
However, if such schedulability analysis allows a certain number of dynamic tasks, there must be means to achieve efficient and predictable task creation and termination. The difficulty to achieve this is caused by (at least): Task creation requires memory to be allocated (from the heap), and task termination requires investigation of the task state of other tasks directly or indirectly dependent on this task and its master(s).
We present an algorithm for evaluation of the termination condition. The algorithm is intended for hardware implementation, and we give the design and implementation in a hardware description language, VHDL [1]. High efficiency is achieved by having a state machine per task evaluating the termination condition. The state machines execute in parallel. The implementation is written in a way that allows synthesis as well as simulation. Synthesized code can be downloaded and executed on a Field Programmable Gate Array (FPGA), which offers a rapid development environment with short turnÐaround time.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The design of a hardware support for task termination, as described in this paper, should be considered as a part of a larger design, where a centralized Controller manages all tasks in a distributed system. We suggest a hardware implementation of Ada tasking as a separate node in a distributed system [2][3]. The concept is not language dependent. Any language with a clearly defined tasking mechanism could benefit from such hardware support, but we have primarily concentrated on Ada 83 [4], since it is standardized (with Ada we mean Ada 83 in this paper).

Most important of the previous work in the area of hardware implementation of Ada tasking, is ATAC [5]. Full Ada 83 tasking is implemented as a coÐprocessor. ATAC in its current version 2.0 handles 32 tasks onÐchip, but the number of tasks can be expanded to 2048 with external RAM.
Other hardware solutions for increasing performance of Ada is the microprocessor Thor [6], a general purpose 32Ðbit RISC with support for Ada tasking.
There are other hardware implementations of realÐtime kernels that are language independent, e.g. the coÐprocessor FASTHARD [7], and the CPU FASTCHART [8], but they do not address the Ada tasking mechanism.
An efficient implementation of Ada task termination for shared memory MIMD architecture is presented in [9]. The solution is highly parallel and distributed. Hardware support for Ada tasking is also discussed in [10]. A distributed version of ATAC is suggested in [11].
The scheduling coÐprocessor in SPRING [12] performes onÐtheÐfly scheduling analysis to decide if new tasks can be created with all deadlines guaranteed.
1.1 Outline
The rest of this paper is structured in the following manner: Section 2: Discusses Ada's task model with the purpose of explaining and clarifying important aspect of the semantics. Section 3: An example of how complex task termination is to handle in software, just to give some perspective of and motivation to our hardware solution. Section 4: This is the main section which describes our termination algorithm and its implementation in VHDL, illustrated with examples and simulation traces. Section 5: Puts the termination of tasks in a wider perspective and what impact the solution will have on the RunÐTime System and compiler. Section 6: Summary.
2 The Task Model in Ada
This section contains an overview of parts of the tasking model concerning termination. Comments and analogies are given to the LRM quotes, and some motives for certain language constructs are given.
The complex termination rules for Ada task causes the termination to be nonÐdeterministic. A task can have one or more dependent tasks, and is also depending on a master. A task completes when the last statement is executed, or when it is aborted, or if an unhandled exception is raised. A task may terminate when waiting on a terminate alternative in a select statement. The task may only terminate when all its dependents (recursively defined in a treeÐstructure) has terminated or is waiting at a terminate alternative. All such tasks are required to terminate simultaneously, as an atomic action.
It is further complicated by the fact that a task waiting at a terminate alternative may be called, thus causing itself and all of its dependents to become nonÐeligible for termination. In a simpler process model, a process is either terminated or not. Waiting for termination is just a matter of waiting for all children to become terminated.
Execution of a selectÐorÐterminate takes longer time than a simple accept, since the termination condition must be evaluated (cooling down direct and indirect masters).
The termination rules must be checked (warming up direct and indirect masters) at every call to a task waiting at a selectÐorÐterminate construct. Such calls take longer time than a call to a task waiting at a simple accept or a select without terminate.
2.1 Master
Each task depends on at least one master. A master is a construct that is either a task, a currently executing block statement or subprogram, or a library package.
The reason for introducing masters and task dependencies is that it allows a group of tasks to terminate when it has accomplished its mission.
2.2 Direct Dependence
A task depends on a master directly in the following two cases:
(a) The task designated by a task object that is the object, or a subcomponent of the object, created by the evaluation of an allocator depends on the master that elaborates the corresponding access type definition.
(b) The task designated by any other task object depends on the master whose execution creates the task object.

Task dependency is important both when deciding if one completed task may terminate, and if a group of tasks waiting at a terminate alternative may terminate simultaneously. Furthermore, the dependency is used to determine when allocated memory of a task can be reused. The memory of a terminated task is not deallocated until all dependent tasks has terminated.
2.3 Indirect Dependence
Furthermore, if a task depends on a given master that is a block statement executed by another master, then the task depends also on this other master, in an indirect manner; the same holds if the given master is a subprogram called by another master, and if the given master is a task that depends (directly or indirectly) on another master. Dependences exist for objects of a private type whose full declaration is in terms of a task type.

A block or a subprogram can be masters, and they can be executed by an other master. In this case the task depends indirectly on the other master.
There is a duality in the task model, that a task both may be a master (have dependents) and have have a master (be dependent).
A master is in itself not dependent on any other master, only task dependence is defined in the LRM. However, the analogy of exiting the scope of a block/subprogram and terminating a task is apparent.
2.4 Completed
A task is said to have completed its execution when it has finished the execution of the sequence of statements that appears after the reserved word begin in the corresponding body. Similarly a block or a subprogram is said to have completed its execution when it has finished the execution of the corresponding sequence of statements. For a block statement, the execution is also said to be completed when it reaches an exit, return, or goto statement transferring control out of the block. For a procedure, the execution is also said to be completed when a corresponding return statement is reached. For a function, the execution is also said to be completed after the evaluation of the result expression of a return statement. Finally the execution of a task, block statement, or subprogram is completed if an exception is raised by the execution of its sequence of statements and there is no corresponding handler, or, if there is one, when it has finished the execution of the corresponding handler.
A task, block or subprogram completes, conceptually by reaching its "endÐofÐstatements". The task completion and block/subprogram completion are similar, even if there exists many different ways to reach the end.
2.5 Terminated
If a task has no dependent task, its termination takes place when it has completed its execution. After its termination, a task is said to be terminated. If a task has dependent tasks, its termination takes place when the execution of the task is completed and all dependent tasks are terminated. A block statement or subprogram body whose execution is completed is not left until all of its dependent tasks are terminated.
Termination requires that the task has completed, and if there are dependents, that those tasks have terminated.
Again the analogy with subprograms (and blocks): "Termination" of a subprogram requires that the subprogram has completed, and all dependents have terminated.
2.6 Terminate Alternative in Select Statement
The tricky part is when a group of tasks, connected by the dependencies, are about to terminate simultaneously:
Termination of a task otherwise takes place if and only if its execution has reached an open terminate alternative in a select statement (see 9.7.1), and the following conditions are satisfied:
- The task depends on some master whose execution is completed (hence not a library package).
- Each task that depends on the master considered is either already terminated or similarly waiting on an open terminate alternative of a select statement.
When both conditions are satisfied, the task considered becomes terminated, together with all tasks that depend on the master considered.

The "termination" of a master requires that the master has completed, and that all its dependents have or are about to terminate. The latter applies when all (remaining) dependents have been suspended in a selectiveÐwait statement with an open terminate alternative, such as:

--suspended on select with open terminate
accept NN;
end select;

The rule for termination does not prevent a terminated task from being called. Any visible task may be called, and Tasking_Error is raised if the called task is terminated.
The opposite applies when a master has "terminated"; it also prevents any terminated tasks dependent of this master from being called. No task/subprogram longer exists, that could make such a call.
Library packages that are masters never terminate. Thus, for tasks depending on library packages, the terminate alternative is not applicable until the main program completes.
2.8 Example of Task Dependencies and Possible Calls
The example below illustrates direct and indirect dependencies of five tasks and two (subprogram) masters. The condition to terminate is indicated by comments, but we only consider the simple case where tasks complete and wait for dependents, not the selectÐorÐterminate case. The dependencies are also shown in fig. 1.
procedure M1 is -- master M1

task C is; -- depends on M1
entry E;
end C;

task body C is
task D; -- depends on C, indirect on M1
task body D is
task E; -- depends on D, indirect on C, M1
task body E is ...
end D; --exit when completed and E terminated
end C; -- exit when completed and D terminated

procedure M2 is -- master M2

task A is -- depends on M2, indirect on M1
entry E;
end A;

task body A is

task B is --depends on A, indirect on M1,M2
entry E;
end B;

task body B is
accept E;
end B; -- terminate when completed

begin -- A
accept E;
end A; -- terminate when completed
-- and B has terminated

end M2; -- exit when completed and
-- A, B has terminated

end M1; -- exit when completed and
-- C,D,E has terminated

The task dependencies are indicated in fig. 1 as thick lines. Possible entry calls are shown as arrows. It is possible to call a task:
sideways (sibling)
any steps up (ancestor or ancestor's sibling)
one step down (child)
The procedure M1 is a master since it creates a new task, C, which also is a master since it creates D etc. Task D and E are indirect dependents on M1. The tasks A and B are declared in M2 and created by M1 executing M2, so A and B are indirect dependents of M1.
The "dependency" of master M2 on M1 is also indicated with a thick line. Such a dependency is not defined in the language, but it is a way to show the indirect dependency of A and B on M1. An important property is that M1 cannot complete until M2 completes and returns. M2, A and B however can complete in any order. The local tasks A and B, which is created during the call to M2, does not exist when M2 has returned.
3 Task Termination in Software
In order to illustrate some of the complexity and recursive behaviour of a software algorithm for termination decision, we have analyzed the runÐtime system in one of our cross compilers. The algorithms and data structures described here are somewhat simplified but shows the recursive behaviour, on which we base the assumption of nonÐdeterminism in execution times for selectÐorÐterminate, as well as for entry calls to such tasks.
Fig 1. Task dependencies (thick lines) and possible entry calls (arrows)
3.1 Layer Temperature
In the routines for accepts and entry calls, there are similar and rather complex algorithms to warm up and cool down layers. A layer is a collection of tasks that may terminate collectively. The layers are connected in linked lists that are traversed, and furthermore when reaching the end of a list, the traversing may continue in a "master block" that has a similar linked list of layers.
A task control block (TCB) and a layer are associated via pointers, and layers are also linked in lists. The "temperature" of a layer is:
WARM = not waiting on select with open terminate
COOL = waiting + has nonÐterminable dependents
COLD = waiting + all dependents if any are terminable
For example, a warm layer (the task associated with this layer) is executing, delayed, suspended on an accept, or waiting at a select with open delay. A cool layer has been suspended on a select with open terminate, but has dependents that are not ready to terminate. A cold layer has also been suspended on a select with open terminate, and all its dependents have either terminated or are likewise suspended on a selectÐorÐterminate.
3.2 Nested Declare Blocks
The complexity is caused by having nested declare blocks (that are masters) and allowing a selectÐorÐterminate statement inside the block, as in the following example.
task body A is
task B ...
task C ...
select accept ... or terminate;
end Inner;
end Outer;
end A;

A TCB contains layer pointers called Most_Recent_Layer and Master_Block. A layer contains a layer pointer called Link_Layer and a TCB pointer called Masters_TCB.
Nested blocks are linked in the Link_Layer list. The select statement refers to task A and the Most_Recent_Layer of A is Inner at that point. Link_Layer in block Inner points first to Outer and then to null. Task C is dependent on declare block Inner (indirect on task A), and task B is dependent on declare block Outer (indirect on task A). See fig. 2.
Inner completes only if the entry of A is called. Outer completes only after Inner has completed. To terminate when waiting at the select, both B and C must be terminable (completed or waiting), and the master of A completed.
Fig 2. Layers and task control blocks linked together in the nested declare blocks example.
3.3 Algorithm for Cooling and Warming Layers
The algorithms in the runÐtime system for handling select with open terminate is as follows. Starting with task A's Most_Recent_Layer, each layer temperature in the linked list Link_Layer is set to COLD, until a layer with nonÐterminable dependents is found. If end of list is encountered before a layer with nonÐterminable dependents, then nr_nonterminable_dependents in Master_Block is decreased, and if nonÐzero the search continues (recursively) in Master_Block.
The search is ended either when a master has nonÐterminable dependents or is completed. In the first case, the temperature of the layer is set to COOL, and in the second case the task and all dependents may terminate collectively.
The algorithm for handling entry calls is as follows. The state of the called task is examined, and if there is an open terminate alternative the masters are warmed up. Starting with the called tasks Most_Recent_Layer, each layer temperature in the linked list Link_Layer is set to WARM, until a nonÐCOLD layer is found (that has nonÐterminable dependents). If end of list is encountered before a nonÐCOLD layer, then nr_nonterminable_dependents in the Master_Block is increased, and this layer is made COOL and then the search continues (recursively) in Master_Block.
4 Task Termination in Hardware
This section describes the design of termination evaluation in hardware. Since each task is described by a state machine, and all such state machines execute in parallel, the evaluation of the termination condition is very efficient.
4.1 Task State
There are three task states that are important for termination, plus one initial, undefined state. Only two bits are needed to represent the four states.
Undefined Destroyed or not yet created
Terminated The task has completed, terminated, or become abnormal
Running The task is executing, waiting in the ready queue or waiting for rendezvous
Waiting The task is waiting at a terminate alternative in a select statement
4.2 Task State Transition
The transitions are shown in the following automaton, fig. 3. This is not a complete description of Ada task states, but sufficient for evaluation of the termination condition. For example, a task that completes (reaches the end statement) corresponds to the terminated state. In the same way, a task that is aborted or that raises an undefined exception, corresponds to the terminated state.
Fig 3. Task states and transitions
All tasks in the Waiting state must terminate simultaneously. When all tasks depending on a master has terminated, the allocated resources (memory etc.) of the tasks and the master may be destroyed, and then reused.
Only a little extra logic is needed to calculate the termination condition for one task, but there are four signals necessary to communicate with the other tasks. The signals are named Rin, Rout, Tin, and Tout. R propagates information upwards to a master, and informally means "is running". T propagate a command downward to dependents, and informally means "do terminate", see fig. 4.
Fig 4. The signals necessary to communicate task state information to other tasks.
The middle part of fig. 5 shows the logic necessary to calculate the termination condition. The task state S = (undefined, running, terminated, waiting) together with the signal Rin determines the values on the signals Rout and Tout, which are defined by
Rout = running + Rin
Tout = terminated . Tin + Rin
Assuming S is a master, the values of the master signals Rin and Tout (those below S) have the following meaning
Rin Meaning
0 All dependent tasks are terminated/undefined/
1 At least one dependent task is still running
Tout Meaning
0 This master has terminated and no dependent tasks
are running, i.e. terminate them!
1 This master is not terminated or some dependent
task is still running
The transitions Create, Wait, Nowait, Complete and Destroy are programmatic, while the transition Term is caused by the termination condition being fulfilled. Termination always originates with a master that is completed, but any number of dependent tasks that are waiting may be involved. The transition is taken when a task is waiting and its master signals that the termination condition has been fulfilled (i.e. when its master has completed and no dependents are running). The signal Term is defined by
Term = waiting . Tin
Term causes the task state to change to terminated, which "immediately" is propagated via Tout to any dependents.
4.3 Connection of Termination Units
As seen in fig. 5, a termination unit consists of the task state represented by a state machine, and two registers to hold the identity of the master and the dependents.
Task state is a state machine with four states
Master register is a register with the id of the task's master
Dependents register is a register with the id of this task, if it is a master and have dependents
Two buses named R and T represent the master signals that connects the termination units. For the R signals, all Rout are connected directly to (one) Rin with wiredÐOR, thus making an important part of the evaluation without any extra gates. All termination units are connected to all master signals via multiplexers. Using the master/dependents registers one of the signals in R and T respectively is chosen.
Fig 5. A termination unit consisting of a state machine, some extra logic, two registers, and four MUXes to connect to the master signals (buses) R and T.
4.4 VHDL Implementation and Simulation
We use the Mentor Graphics software which contains a predefined VHDL library MGC_Portable with definitions such as the type Qsim_Logic, a 12Ðvalued enumeration type for simulation and synthesis of bits and buses. Resolution functions such as Qsim_State_Resolved_Or_Vector are available. The code presented is intended for synthesis as well as simulation. The synthesis subset of VHDL is not standardized so this will be compiler dependent.
Common definitions
Package Defs contains definitions such as states, actions, and buses. AVECTOR is a general data bus while RVECTOR describes a bus where each signal is resolved with the OR function.
library Mgc_Portable;
use Mgc_Portable.Qsim_Logic.all;

package Defs is

-- state machine

type States is (Undefined, Running, Terminated, Waiting);
type Actions is (Create, Complete, Await, Nowait,
Terminate, Destroy); -- "wait" is reserved

-- termination unit

type Register_Name is (Control, Master, Dependent);
constant Size : Integer := 4;
subtype Avector is Qsim_State_Vector (Size-1 downto 0);
subtype Rvector is
Qsim_State_Resolved_Or_Vector (Size-1 downto 0);

end Defs;

The state machine
The state machine below has asynchronous reset (RST), a positive flankÐtriggered clock (CLK) and an output that only depends on the current state, i.e. the transitions themselves does not cause any change in output; the soÐcalled Moore style. Since the logic is 12Ðvalued, care must be taken that no false clock pulses are accepted, e.g. CLK going from 'X' to '1' shall not trigger the machine. This is only relevant for simulation but is necessary for correct synthesis. Since the VHDL standard is intended for simulation, different VHDL compilers have chosen to implement slightly different subsets for synthesis. Sometimes the synthesis fails to produce the desired circuit, if certain guidelines are not followed. The "style C" Moore machine is one of five recommended coding styles for the compiler used.
library Mgc_Portable;
use Mgc_Portable.Qsim_Logic.all;
use Work.Defs.all;

entity State is
port (Rst, Clk : Qsim_State;
Action : Actions;
Result : out States);
end State;

architecture Rtl of State is
signal State: States := Undefined;
-- Style C Moore Machine
Sm : process (Rst, Clk)
if (Rst = '0') then -- asynchronous reset
State <= Undefined;

elsif (Clk'Event and Clk = '1' and Clk'Last_Value = '0') then
case State is
when Undefined =>
if (Action = Create) then
State <= Running;
end if;

when Running =>
if (Action = Await ) then
State <= Waiting;
elsif (Action = Complete) then -- programmatic complete
State <= Terminated;
end if;

when Waiting =>
if (Action = Nowait) then
State <= Running;
elsif (Action = Terminate) then -- terminate at select
State <= Terminated;
end if;

when Terminated =>
if (Action = Destroy) then
State <= Undefined;
end if;
end case;
end if;
end process;

Result <= State; -- output

end Rtl;

The termination unit
The termination unit is encapsulated with communication both to the environment via DBUS and to other termination units via the master signals. The originally two master signals are implemented as four vectors since the synthesis does not allow a resolution function being applied to inout parameters. This should be solved by specifying the master signals as bus, which allows a resolution function to be applied, however only wired_x is allowed. This is highly compiler dependent
In the simulation of one termination unit shown in fig. 6, the signal names match the names in the VHDL code below, however, not all signals are shown. The termination unit is dependent on master nr 1 (Reg_Master is set to 0001) and has no own dependents (Reg_Depend is set to 0000). To simulate this, T_In(1) is forced to 1 and R_In(0) to 0 at 1. This together with the values of Reg_Depend and Reg_Master affects Rin at 1 and Tin at 2 respectively. The actions written to Reg_Control always affects the state z on the following rising clock pulse, as seen at 3 and 4. The performed actions result in the correct state changes. Since there is no dependent, Rin is high only when z = running. At 5 when z = waiting, T_In(1) is forced to 0, to simulate termination initiated by a completed master. Tout is set at at 6 to terminate any dependents. Term becomes high at 7, and this causes the next action Terminate at 8. The state change is effected at the next clock pulse at 9 (intentionally delayed in the figure). The next state is Terminated at 10.
Fig 6. Simulation of one termination unit. The last clock pulse is intentionally delayed to show the Act value after Tin has been forced to '0'.
library Mgc_Portable;
use Mgc_Portable.Qsim_Logic.all;
use Work.Defs.all;

entity Term is
port (Rst, Clk, Write : Qsim_State;
Rs : Register_Name; -- register select
Dbus : Avector;
T_In : in Rvector; -- separate in/out. Bus must be wired_x
R_Out : out Rvector;
T_Out : out Rvector;
R_In : in Rvector);
end Term;

architecture Rtl of Term is

signal Reg_Control : Avector;
signal Reg_Master : Avector;
signal Reg_Depend : Avector;
signal Term : Qsim_State;
signal Z : States; -- task state

signal Rin, Rout, Tin, Tout : Qsim_State;
signal Act : Actions := Destroy;

component State
port (Rst, Clk : Qsim_State;
Action : Actions;
Result : out States);
end component;
for all: State use entity Work.State(Rtl);

function To_Qsim (B : Boolean) return Qsim_State is
if B then
return '1';
return '0';
end if;

Term_State: State port map (
Rst => Rst,
Clk => Clk,
Action => Act,
Result => Z);

Input : process (Rst, Write) -- no need to include Rs
if (Rst = '0') then
Reg_Control <= "0000";
Reg_Master <= "0000";
Reg_Depend <= "0000";
elsif (Write'Event and Write = '1' and Write'Last_Value = '0') then
-- use Write as clock
case Rs is
when Control => Reg_Control <= Dbus;
when Master => Reg_Master <= Dbus;
when Dependent => Reg_Depend <= Dbus;
end case;
end if;
end process;

P_State: Process (Reg_Control, Term)
if Term = '1' then
Act <= Terminate;
case Reg_Control is
when "0001" =>
Act <= Create;
when "0010" =>
Act <= Await;
when "0011" =>
Act <= Nowait;
when "0100" =>
Act <= Complete;
when "0101" =>
Act <= Destroy;
when others => null;
end case;
end if;
end process;

-- Rn high means "is running"
-- Tn low means "do terminate"

Tout <= (To_Qsim (Z /= Terminated) and Tin) or Rin;
Rout <= To_Qsim (Z = Running) or Rin;
Term <= To_Qsim (Z = Waiting) and not Tin;

-- input MUX: use Reg_n as select

Tin <= T_In (To_Integer(Reg_Master));
Rin <= R_In (To_Integer(Reg_Depend));

-- output DEMUX: handle in processes. Reg_Master and Reg_depend
-- should not change while Rout and Tout are active.
-- The last signal assignment holds and overrides the default.

P_Master: process (Reg_Master, Rout, Z)
if Z /= Undefined then
R_Out <= "0000"; -- wired OR default
R_Out (To_Integer(Reg_Master)) <= Rout;
R_Out <= "0000"; -- default
end if;
end process;

P_Dependent: process (Reg_Depend, Tout, Z)
if Z /= Undefined then
T_Out <= "0000"; -- wired OR default
T_Out (To_Integer(Reg_Depend)) <= Tout;
T_Out <= "0000"; -- default
end if;
end process;

end Rtl;

An array of termination units
An array of components is created and connected with the generate statement in VHDL. Most discouraging this statement is not implemented in the VHDL synthesis subset of the current version of our software, Mentor Graphics v8.4. The only solution is to name and connect each component "manually". Such an implementation is shown below, however, this is not feasible for a large (realistic) array.
A simulation of the termination array is shown in fig. 7. The dependencies are the same as the example described in the following section 4.5. From 1 and forward the four tasks T1 to T4 are created, they are allocated to termination unit a0 to a3. From 2 and forward the master (T1) completes and changes state to Terminated, and then the dependent tasks (T2 to T4) one by one changes state to Waiting. When the last task changes state, the condition for termination is fulfilled and the signal Tout at 3 ripples through all dependents, so that Term is set at 4. Term going high affects the action at 5. For simplicity only a few signals from termination unit a1 is shown in the figure. All dependents are terminated simultaneously.
Fig 7. Simulation of the termination unit array. The last clock pulse is intentionally delayed to show the Act value.
library Mgc_Portable;
use Mgc_Portable.Qsim_Logic.all;

use Work.Defs.all;

entity Ta is
port (Rst, Clk, Write: Qsim_State;
Sel : Integer range 0 to 3; -- 4 termination units
Rs : Register_Name;
Dbus : Avector);
end Ta;

architecture Rtl of Ta is

component Term
port (Rst, Clk, Write : Qsim_State;
Rs : Register_Name;
Dbus : Avector;
T_In : in Rvector;
R_Out : out Rvector;
T_Out : out Rvector;
R_In : in Rvector);
end component;
for all: term use entity work.term(rtl);

signal T, R : Rvector; -- resolution Wired_OR
signal Lwrite : Qsim_State_Vector (3 downto 0);

begin --concurrent statements

A0: Term port map (Rst, Clk, Lwrite(0), Rs, Dbus, T, R, T, R);
A1: Term port map (Rst, Clk, Lwrite(1), Rs, Dbus, T, R, T, R);
A2: Term port map (Rst, Clk, Lwrite(2), Rs, Dbus, T, R, T, R);
A3: Term port map (Rst, Clk, Lwrite(3), Rs, Dbus, T, R, T, R);

Selector: process (Sel, Write)
variable Temp: Qsim_State_Vector (3 downto 0);
Temp := "0000";
if (Write = '1') then
Temp(Sel) := '1';
end if;
Lwrite <= Temp;
end process;

end Rtl;

4.5 Example of Dependencies and Usage of the Master and Dependents Register
Consider a task T1 that has two dependents, T2 and T3, and where T3 has one dependent, T4. The actual Ada program could be constructed in a number of ways, but the following simple fragment of code illustrates one way.
task T1;
task type Type_2;
task type Type_3;
task type Type_4;
task body T1 is
T2 : Type_2; --dependent on T1
T3 : Type_3; --dependent on T1
end T1;

task body Type_3 is
T4 : Type_4; --dependent on T3
end Type_3;

The width of R and T is four bits each, which is the minimum number necessary with this set of tasks. If bit T0 and R0 is used to represent "no dependents", the values of the master register M, and dependents register D becomes:
Task M D
T1 3 2
T2 2 0=none
T3 2 1
T4 1 0=none
The tasks and the dependencies are shown in fig. 9 and more elaborated in fig. 8, where the mapping of registers M and D to R and T is shown in detail for each task.
Fig 8. Task termination unit array, and the mapping of M and D to a master signal, for the example in 4.5.
Fig 9. Task dependencies illustrated with the signals Rin, Rout, Tin, and Tout.
The signal R0 that represent "no dependents" must be hardwired to "1", while T0 can be ignored (or at least wiredÐOR and then ignored). R0 and T0 are rightmost in fig. 8.
4.6 Representing Procedures and Declare Blocks
A procedure or block that is a master is represented with a termination unit. Since a master does not necessarily have its own thread of execution it is necessary to keep track of the current master.
When a procedure master is created, its initial state is Running. It will remain there until end is executed, which changes its state to Terminated. The procedure is now completed and waiting only for dependents to terminate.
For a declare block there is an additional complexity: the selectÐorÐterminate statement is allowed (accept is allowed but that does not make it more difficult). A declare block is created with an initial state Waiting and it will remain there until end is executed which changes its state to Terminated, or until it is terminated together with its master and all dependents. When executing a selectÐorÐterminate inside a declare block it is the termination unit of the task that changes state to Waiting, the states of the (potentially) nested block's termination units are not affected.
Nested declare blocks are handled the same way as nested procedures, in the sense that all termination units are in the state, and execution of a programmatic end only affects the last termination unit, causing it to become Terminated. The difference is that for nested blocks the state must be Waiting, and for nested procedures the state is Running (could be Waiting).
Both of the following code fragments have nested masters that are not tasks. The dependencies are mapped on termination units in the same way, as shown in fig. 10.
Fig 10. Task dependencies of nested procedures or declare blocks. The circles represent masters that are not tasks.
-- Nested declare blocks with a select
task body A is
task B ...
task C ...
select accept ... or terminate;
end Inner;
end Outer;
end A;
--Nested procedures. Equivalent with the previous
--code fragment except that select is not allowed

task body A is
procedure Outer is
task B ...
procedure Inner is
task C ...
-- select not allowed.
end Inner;
end Outer;
end A;

4.7 Limitations
The number of master signals sets a hard limit on the number of masters in the system. The most "master consuming" configurations are
all tasks nested: task T1 created T2 that creates T3 etc. The number of tasks is the same as the number of levels.
a binary tree: task T1 creates two tasks, that each creates two tasks. If the nesting depth is n the number of masters is 2n-1 and the number of tasks is 2n.
Luckily, any number of sibling tasks can be created without increasing the number of masters. It is reasonable to assume that the number of tasks in a realistic system is much larger than the number of masters.
4.8 Increased Utilization of the Master Signals
If the number of masters in a program is larger than the number of master signals, the task hierarchy will not fit on the hardware. By structuring the task hierarchy into subtrees, and the master signals in groups belonging to subtrees, it is possible to increase the utilization of the hardware. Tasks belonging to the same subtree are mapped to termination units physically close on the chip. The master signals are separated by (programmable) latches, so that the same master signal can be reused, mapped to different masters, in different subtrees without interference.
There is a simple mapping algorithm that allows any number of masters in a NÐary tree, i.e. a tree where a task may have any number of children. The tasks are enumerated inÐorder and then directly mapped to the the termination units. The termination units are packed so that no units are "wasted". The limiting factor with this algorithm is the nesting depth.
The example in fig. 11 (although with fewer children than a binary tree) shows a tree where the tasks for simplicity have been named after their inÐorder number. The subtrees rooted at T2 and T6 must be separated, and likewise the subtrees at T4 and T7. If this tree is part of a larger system, the master signals must be cut at more places, e.g. above T1 and below T9.
Fig 11. Task dependencies and mapping to termination units. The nesting depth is 4 and the number of masters is 6 with T1's master included.
An even simpler way to increase the number of masters is to cut the signals on a few fixed places, like in fig. 12. Within each subtree termination units can be allocated arbitrarily
Fig 12. Cutting the master signals in half two times increases the maximum number of masters from 4 to 11.
5 RunÐTime System and Compiler Impact
In our previous papers we have suggested a distributed system architecture with a centralized node managing all tasks in the system. This section describes the proposed Controller and the impact on the local runÐtime systems, and on the compiler.
In the proposed distributed system there is a potentially large number of looselyÐconnected nodes, i.e. CPUs each with private memory, connected to a highÐspeed LAN. The nodes are general purpose computers that execute a distributed Ada program. One node, the Controller, is highly specialized for management of all Ada tasks in the system.
Each node has a smaller runÐtime system than a "normal" Ada program since all task management is replaced with a protocol for communication with the Controller. Any call to the runÐtime system concerning tasking is transformed to a request to the Controller. Every request causes a reply that among other things contains information about which task the node shall execute. Messages may be sent at any time from the Controller to a node, for example when a delay expires. The Controller has all state info, effects all state changes, and schedules all tasks.
For each node there must be a priorityÐsorted ready queue. There must be one expirationÐsorted delay queue for the system. For each entry, there must be one FIFO queue. All operations such as Insert and Remove must be deterministic.
There is one state machine per task. Calls to the Controller are serialized but each call is served by parallel execution of all task state machines. The address of the state machine can be used as a global task identity.
The controller must be able to handle for example create and delete task, local and interÐnode rendezvous, interrupts, and ACP Ñ the Ada Controller Protocol.
Create task: Memory is allocated locally, to be used for task stack as well as for some task state info, registers etc. A request for task creation is sent to the Controller, which allocates a TCB (which the termination unit is part of). The allocation algorithm should be simple: any TCB that is free. The actual choice of TCB identity, must be propagated back to the requesting node.
Destroy task: All termination is originally programmatic; one task that completes or executes a selectÐorÐterminate sends a request to the Controller. There will be one reply and possibly a large number of messages to other nodes that has tasks to terminate. Memory are deallocated locally.
Rendezvous: All entry calls and accepts cause a request to the Controller, which reschedules the tasks on the node (or nodes if the called task is on another node). The tasks are removed or inserted in the ready queue, delay queue and/or ready queue. If two nodes are involved, both a reply and a message is sent, to switch tasks. Any rendezvous parameters are transferred directly between the nodes. Both the parameter size and address must be known by the runÐtime system. For local calls only the address is necessary. Any exception during rendezvous is first transmitted to the Controller, which propagates it to the caller.
Interrupts: If handled by a procedure (by pragma interrupt_handler or similar) this does not cause any Controller activity but any interrupt associated with an entry gives a request/reply to the Controller. The ATAC handles interrupt and reschedules without disturbing the CPU but in a distributed system this is not practical.
ACP: A simple and efficient protocol for communication between the nodes and the Controller is required. The round trip time of a request/reply is about a few µs on a fiberÐoptic network, and the Controller must be able to make any tasking decision during this short time. The Controller must implement both the protocol and the tasking in hardware to achieve this.
The impact on the compiler should be minimized, primarily by making changes in the runÐtime system instead. Since Ada is a standard, the runÐtime system interfaces of different compilers are more uniform than for nonÐstandardized languages.
In addition to the task id, the current master must be known to the runÐtime system. Some operations concerns the task itself, such as execution of a select statement, while other operations concerns the current master, such as execution of an end statement.
6 Summary
Our hardware algorithm makes possible the termination of groups of Ada tasks in a deterministic manner (in fixed time), which is important not only when a task completes and terminates, but also when a task makes an entry call to another task that are waiting at a selectÐorÐterminate. In a hard realÐtime system, such language constructs (that are nonÐdeterministic) are excluded.
Compared to software implementations with potentially long lists of tasks that must be traversed sequentially (task dependencies as well as nested masters are handled), hardware implementations in general offers high speed as well as the more important determinism.
The benefit of expressing an algorithm in VHDL is that VHDL code can be tested in simulators as well as synthesized and executed on a Field Programmable Gate Array, FPGA. The FPGA can easily be reprogrammed and thus it is relatively cheap to experiment with different hardware implemented algorithms. The synthesis step of the termination unit will be performed as soon as possible, however, the correct version of the software for synthesis and execution has not yet been delivered. The next step is to implement the Controller in VHDL, simulate, synthesize, and execute it on a FPGA.
This work is sponsored by NUTEK, project number P1221-2. We also want to thank CelciusTech Systems and Rational.
[1] R. Lipsett, C. Schaefer, C. Ussery, "VHDL: Hardware Description and Design", Kluwer Academic Publishers, 1989.
[2] L. Bjšrnfot, L. Asplund, K. Lundqvist, and G. Wall, "Distributed RunÐTime Systems, a Protocol for Ada", LNCS 688, pp. 249 - 263, 1993, SpringerÐVerlag.
[3] L. Bjšrnfot, K. Lundqvist, G. Wall, and L. Asplund, "Distribution of Tasks Within a Centrally Scheduled Local Area Network", Proceedings from the First Symposium 'Ada in Europe', Copenhagen Denmark, September 1994.
[4] United States Department of Defence, "Reference Manual for the Ada Programming Language", ANSI/MIL-STD-1815A, 1983.
[5] Joachim Roos, "A RealÐTime Support Procesor for Ada Tasking", in ASPLOS-III, pp 162-171, Apr 1989.
Joachim Roos, "The Performance of a Prototype Coprocessor for Ada Tasking", in TRI-Ada, pp 265-279, Dec 1990.
Joachim Roos, "Designing a RealÐTime Coprocessor for Ada Tasking", IEEE Design & Test of Computers, Mars 1991.
J. Roos and F. G—mezÐMolinero, "VLSI Hardware Support for Ada Tasking", in Ada in Aerospace, Barcelona, pp 52-59, Dec 1990.
J. Roos and F. G—mezÐMolinero, "A Complete Version of the Ada Tasking Coprocessor", in RealÐTime Embedded Processing for Space Applications, pp 101-110, Nov 1992.
R-TECH AB RealÐTime Technology, Sweden, "ATAC 2.0 Ada Tasking Coprocessor", E-mail: joachim@r-tech.se, Mars 1993. Draft data sheet.
[6] Stefan AsserhŠll, "A Microprocessor with Ada RealÐTime Support", in Ada i Komersiella system, Ada in Sweden and SESAM, Apr 1991.
SAAB Ericsson Space, "THOR A 32Ðbit RISC with onÐchip Ada Support and Error Detection", Oct 1992. Advance information.
[7] Lennart Lindh, "FASTHARD Ñ A Fast Time Deterministic HARDware Based RealÐTime Kernel", in Euromicro '92 workshop on RealÐTime Systems, 1992.
Lennart Lindh, "A RealÐTime Kernel Implemented in One Chip", in Euromicro '93 workshop on RealÐTime Systems, 1993.
[8] L. Lindh and F. Stanischewski, "FASTCHART Ñ A Fast Time Deterministic CPU and Hardware Based RealÐTime Kernel", 1991. Collection of technical reports, CUS, VŠsterŚs, Sweden.
F. Stanischewski and L. Lindh, "A Design of a RealÐTime Unit (RTU) in Hardware", in SNART Symposium, pp 52-58, June 1991. ISSN 0283-0574
Frank Stanischewski, "FASTCHART Ñ Performance and Benefits and Disadvantages of the Architecture", in Euromicro '93 workshop on RealÐTime Systems, 1993.
[9] S. Flynn, E. Shonberg, and E. Schonberg, "The Efficient Termination of Ada Tasks in a Multiprocessor Environment", in Ada Letters, vol VII, pp 55-76, Nov/Dec 1987.
[10] Anders Ardš, "Hardware Support for Efficient Execution of Ada Tasking", IEEE, 1988.
[11] Lars Lundberg, "A Coprocessor for High Performance Multiprocessor Ada Tasking", in Proc. of the Ada Europe International Conference, Athes, pp 147-165. Springer Verlag, 1991.
[12] J. A. Stankovic and K. Ramamritham, "The Spring kernel: A new paradigm for realÐtime systems", IEEE Software, vol 8, nr 3, pp 62-72, May 1991.
M.C. Ko and W.P. Burleson, "The Spring Scheduling Co-Processor", TR-93-CSE-4, Dept. of Electrical and Computer Engineering, University of Massachusetts, 1993.