Previous |
Contents |
Next |
We shall have to evolve problem-solvers galore
since each problem they solve creates ten problems more.
Piet Hein, More Grooks
In this chapter Im going to try and show you another side to object-oriented programming by taking another look at the calculator program from chapter 13. Im going to leave it operating on Integer values although it would be possible to modify it to deal with Floats or make it generic to be able to be instantiated with different numeric types. What I want to concentrate on here is redesigning it so that it will be possible to extend it easily to handle new types of operand and operator, which at the moment would involve some fairly fundamental changes to the program. What I want to produce is something that will evaluate an expression which is stored in a string. Using a string as the source of the expression is a useful generalisation; you could read the string from the keyboard or a file, or you could generate it within your program.
Apart from the string containing the expression to be recognised, the other main component of the evaluation process is a specification of the expressions syntax. In previous versions of the program, the syntax was implicit in the subprograms which did the analysis. This means that it couldnt be altered without modifying the existing subprograms. By using a tagged type Expression_Type, the subprograms can be made primitive operations of the type and overridden in derived classes as necessary. This will allow new types of operators and operands to be added without affecting existing code, as well as allowing changes to be made to the syntax of expressions. A tagged type can therefore be seen as a collection of subprograms rather than just as an extensible collection of data components. This view is central to the object-oriented way of looking at the world. Heres the declaration of Expression_Type:
type Expression_Type is tagged limited private;
Expression_Type doesnt need to contain any data, since its just a collection of primitive operations. The full declaration in the private part of the package will just look like this:
type Expression_Type is tagged limited null record;
Expression_Type is a limited type because there is no need to allow assignment and comparison of Expression_Type values. All Expression_Type objects will effectively be identical since theyll all embody the same syntax rules. The main primitive operation will be a function called Evaluate to evaluate a string representing an expression and produce an integer result:
function Evaluate (Syntax : Expression_Type; Expr : String) return Integer;
The parameter Syntax will in effect embody the syntax rules for an expression, and Evaluate will use these rules to process the expression given in the Expr parameter to produce an integer result. Evaluate will be a primitive operation of Expression_Type so that it can be overridden by descendants of Expression_Type if the format of an expression needs to be changed. Well also need an exception that Evaluate can use to report syntax errors:
Syntax_Error : exception;
These declarations are the only ones that clients of the package need to be able to see. Everything else can be put into the private part of the package so that its hidden from clients but is still accessible to allow child packages to define types derived from Expression_Type.
An expression consists of a series of tokens arranged according to the syntax rules embodied in Evaluate. The types of token that the calculator in chapter 13 handled included numbers, various operators, left and right parentheses, and an end-of-expression marker (the end of the line or a full stop or something similar). To allow the set of legal tokens to be extended we need yet another tagged type:
type Token_Type is abstract tagged null record;
This is an abstract type since we wont want to allow clients to create amorphous Token_Type objects; you should only be able to deal with specific tokens like numbers, the addition operator and so on. There isnt any data that all tokens will have in common, so Ive declared it to be a null record. Token_Type is there so that all tokens will have a common parent and the class-wide type Token_Type'Class will be able to be used for any token at all. We can derive some concrete classes directly from Token_Type:
type Left_Parenthesis is new Token_Type with null record; type Right_Parenthesis is new Token_Type with null record; type End_Of_Expression is new Token_Type with null record;
We can also derive further abstract types to represent specific classes of token which add some necessary primitive operations:
type Priority_Type is range 0..9; subtype Operator_Priority_Type is Priority_Type range 1..Priority_Type'Last; type Operator_Type is abstract new Token_Type with null record; function Priority (Operator : Operator_Type) return Operator_Priority_Type is abstract; type Operand_Type is abstract new Token_Type with null record; function Value (Operand : Operand_Type) return Integer is abstract;
We know that there are a number of different operators; these obviously all have something in common, so Ive introduced a type called Operator_Type to capture the features which are common to all operators. Operators have a priority (i.e. a precedence), so there is a primitive function of Operator_Type called Priority which returns the operators priority as a value of type Operator_Priority_Type. This allows for nine different levels of priority; Ill use 1 for the highest priority and 9 for the lowest. The reason for making Operator_Priority_Type a subtype of Priority_Type is to allow 0 to be used for operands; Ill explain this in more detail later. Priority is an abstract function so concrete derivations of Operator_Type will be forced to provide an overriding definition for it.
Rather than defining a type Number_Type to represent numeric operands directly, Ive defined a more general type Operand_Type so that different types of operands can be added later. The common feature of operands is that they have a value, so Operand_Type has a primitive function to return the value. This is also an abstract operation which will need to be overridden when we know what type of operand were dealing with.
Operators come in two basic types: binary operators with two operands and unary operators with only one. We can derive further types from Operator_Type which provide primitive functions called Apply to apply the operator to its operands:
type Binary_Operator_Type is abstract new Operator_Type with null record; function Apply (Operator : Binary_Operator_Type; Left, Right : Integer) return Integer is abstract; type Unary_Operator_Type is abstract new Operator_Type with null record; function Apply (Operator : Unary_Operator_Type; Right : Integer) return Integer is abstract;
Some operators can be either binary or unary; unfortunately Ada does not allow multiple inheritance where a derived type can have more than one parent (although its possible to use workarounds involving generics or access discriminants), but in this case we can achieve something close to the desired effect by rederiving from Binary_Operator_Type:
type Variadic_Operator_Type is abstract new Binary_Operator_Type with null record; function Unary_Priority (Operator : Variadic_Operator_Type) return Operator_Priority_Type is abstract; function Apply (Operator : Variadic_Operator_Type; Left, Right : Integer) return Integer is abstract; function Apply (Operator : Variadic_Operator_Type; Right : Integer) return Integer is abstract;
This has two versions of Apply: the first is inherited from Binary_Operator_Type to deal with the case where the operator is used in its binary form and the second is a new one introduced to deal with the case where it is used in its unary form. There is also a separate function Unary_Priority to return the priority of the unary form of the operator. For now well have two main categories of operator: adding operators (+ and ) and multiplication operators (* and /). Here are the declarations we need for these:
type Multiplying_Operator_Type is abstract new Binary_Operator_Type with null record; function Priority (Operator : Multiplying_Operator_Type) return Operator_Priority_Type; type Adding_Operator_Type is abstract new Variadic_Operator_Type with null record; function Priority (Operator : Adding_Operator_Type) return Operator_Priority_Type; function Unary_Priority (Operator : Adding_Operator_Type) return Operator_Priority_Type;
Priority is no longer an abstract operation since we know what the priorities of multiplication and addition operators are:
function Priority (Operator : Multiplying_Operator_Type) return Operator_Priority_Type is begin return 5; end Priority; function Priority (Operator : Adding_Operator_Type) return Operator_Priority_Type is begin return 6; end Priority; function Unary_Priority (Operator : Adding_Operator_Type) return Operator_Priority_Type is begin return 2; end Unary_Priority;
The priorities 5 and 6 have been chosen to allow other operators to have even lower priorities if necessary; the unary priority for the adding operators is the second highest to allow for one higher level. Now we can derive the final concrete types. Here is the definition of a type to represent :
type Minus_Operator is new Adding_Operator_Type with null record; function Apply (Operator : Minus_Operator; Left, Right : Integer) return Integer; function Apply (Operator : Minus_Operator; Right : Integer) return Integer;
The other operators are similar; Ill leave it to you to come up with their declarations. Here are the bodies of the Apply primitives for Minus_Operator:
function Apply (Operator : Minus_Operator; Left, Right : Integer) return Integer is begin return Left - Right; end Apply; function Apply (Operator : Minus_Operator; Right : Integer) return Integer is begin return -Right; end Apply;
It may seem strange to have a data type that contains no data, but in fact there is one item of data that these types all have: their tag. The tags are the reason that we need these types at all, so that we can dispatch to the correct operation. Once again, you can think of these data types as collections of operations rather than as collections of data.
We can derive a type to represent numeric operands by derivation from Operand_Type:
type Number_Type (Value : Integer) is abstract new Operand_Type with null record; function Value (Operand : Number_Type) return Integer;
Unlike the other types, this does contain one item of data apart from the tag, namely the value of the number. This is supplied as a discriminant so that the value of a Number_Type object must be set when it is declared. The primitive function Value simply returns the value of the discriminant:
function Value (Operand : Number_Type) return Integer is begin return Operand.Value; end Value;
Extracting successive tokens from the string contained in an Expression_Type value can be done by a primitive operation of Expression which Ill call Next_Token. Next_Token can use its knowledge about the expected format of an expression to create specific types of token based on the characters it reads from a string. Heres how it can be declared:
type Token_Access is access Token_Type'Class; package Token_Pointers is new JE.Pointers (Token_Type'Class, Token_Access); subtype Token_Pointer is Token_Pointers.Pointer_Type; procedure Next_Token (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Token : in out Token_Pointer);
Next_Token needs to start at the position in Expr specified by From, skip over any spaces and then use the character its found to determine the actual token type. It will then need to create a token of the appropriate type and return a class-wide pointer using the Token parameter. The package JE.Pointers from the previous chapter is used to ensure that all pointers are automatically destroyed after use. Its also possible that Next_Token will go past the end of the expression buffer, so it needs to check for this and return an End_Of_Expression token if it happens. Heres the implementation of Next_Token:
procedure Next_Token (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Token : in out Token_Pointer) is begin -- Find start of next token while From <= Expr'Last and then Expr(From) = ' ' loop From := From + 1; end loop; -- Check for end of expression if From > Expr'Last then Token := Pointer(new End_Of_Expression); else Fetch_Token (Expression_Type'Class(Syntax), Expr, From, Token); end if; end Next_Token;
This begins by skipping over any spaces and then checking if the current position is beyond the end of the string. If so, it creates a new End_Of_Expression token to be the result of the procedure. If not it calls another primitive function Fetch_Token to extract the next token from the string.
Note that Syntax is converted to a class-wide value so that the call to Fetch_Token will be a dispatching call; if you forget to do this, overriding Fetch_Token will have no effect since Next_Token will still be calling the Expression_Type version of the procedure. Fetch_Token is declared like this:
procedure Fetch_Token (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Token : in out Token_Pointer);
Heres how Fetch_Token can be implemented:
procedure Fetch_Token (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Token : in out Token_Pointer) is begin case Expr(From) is when '+' => Token := Pointer(new Plus_Operator); when '-' => Token := Pointer(new Minus_Operator); when '*' => Token := Pointer(new Times_Operator); when '/' => Token := Pointer(new Over_Operator); when '(' => Token := Pointer(new Left_Parenthesis); when ')' => Token := Pointer(new Right_Parenthesis); when '0'..'9' => declare Value : Integer; begin Ada.Integer_Text_IO.Get (Expr(From..Expr'Last), Value, From); Token := Pointer(new Number_Type(Value)); end; when others => Ada.Exceptions.Raise_Exception (Syntax_Error'Identity, "Illegal character '" & Expr(From) & "'"); end case; From := From + 1; end Fetch_Token;
A case statement is used to select amongst alternatives based on the first character of the token. Ive assumed that there are descendants of Token_Type called Plus_Operator, Times_Operator and Over_Operator which have been defined by a similar process to Minus_Operator. Numbers are dealt with using yet another version of Get defined in Ada.Integer_Text_IO which reads a value from a string. The first parameter is the string to read from (in this case, everything from the current position to the end of the string), the second is the variable to store the result in, and the third is an output which is set to the position of the last character that was read. At the end of Fetch_Token, the current position is incremented to get past the last character of the token. In the case where the character thats been read in isnt recognised at all, Ada.Exceptions.Raise_Exception is used to raise a Syntax_Error exception. Ive done it this way instead of using a raise statement because Raise_Exception provides the ability to specify a message which will be associated with the exception.
Now its time to look at the implementation of Evaluate. All it needs to do is to call another primitive function called Parse to parse the expression and then check that the final token is an End_Of_Expression token. Heres how it can be implemented:
function Evaluate (Syntax : Expression_Type; Expr : String) return Integer is Token : Token_Pointer; From : Positive := Expr'First; Result : Integer; begin Parse (Expression_Type'Class(Syntax), Expr, From, Priority_Type'Last, Result, Token); if Value(Token).all not in End_Of_Expression'Class then Ada.Exceptions.Raise_Exception (Syntax_Error'Identity, "Missing operator or left parenthesis"); end if; return Result; end Evaluate;
Parse will do the actual expression evaluation, storing the result in Value as well as returning the terminating token. If the terminating token is not an End_Of_Expression token, there is either an operator missing or too many right parentheses (i.e. a missing left parenthesis) so a Syntax_Error exception is reported.
Parse is declared like this:
procedure Parse (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Prio : in Priority_Type; Result : out Integer; Next : in out Token_Pointer);
The intention here is to do recursive descent parsing as described in chapter 13. The reason for the Prio parameter is that an expression can be considered to be a sequence of terms separated by low-priority (priority 9) operators; each term is a sequence of terms separated by priority 8 operators, each of which is a sequence of terms separated by priority 7 operators, and so on. This can be represented as a set of syntax rules using the same notation as was used in chapter 13:
Expression = Term_8 { Op_9 Term_8 } Term_8 = Term_7 { Op_8 Term_7 } Term_7 = Term_6 { Op_7 Term_6 } ... Term_1 = Term_0 { Op_1 Term_0 } Term_0 = Operand
Note that in general, Term_N can be defined as
Term_N = Term_(N-1) { Op_N Term_(N-1) }
You can see the sense in this if you consider how similar the functions Expression and Term were in chapter 13. Parse can simply call itself recursively with the value of N (the operator priority) as the value of the parameter Prio. The recursion will end when the highest operator priority is reached; each term will then be an operand (i.e. a number, a unary operator followed by a subexpression or a subexpression in parentheses). A priority of zero can be used to indicate that we need to read an operand. This is the reason why Priority_Type has a wider range than Operator_Priority_Type. Parse can therefore be implemented like this:
procedure Parse (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Prio : in Priority_Type; Result : out Integer; Next : in out Token_Pointer) is begin if Prio = Priority_Type'First then -- Term_0 Get_Operand (Expression_Type'Class(Syntax), Expr, From, Result, Next); else -- Term_N for N > 0 declare Right : Integer; Op : Token_Pointer; begin Parse (Syntax, Expr, From, Prio-1, Result, Op); while Value(Op).all in Binary_Operator_Type'Class and then Priority(Binary_Operator_Type'Class(Value(Op).all)) = Prio loop Parse (Syntax, Expr, From, Prio-1, Right, Next); Result := Apply (Binary_Operator_Type'Class(Value(Op).all), Result, Right); Op := Next; end loop; Next := Op; end; end if; end Parse;
If the supplied priority is zero, we need to get an operand. This is done by calling Get_Operand (which is another dispatching call, thanks to the conversion to Expression_Type'Class). If the priority is non-zero, Parse is called recursively to get a term involving operators at the next higher priority (i.e. Prio1). The value of the term will be stored in Value, and the token which terminated it will be stored in Op. The terminating token might be a binary operator of the required priority, in which case we need to get the next term by calling Parse again. This time the value is stored in Right and the terminating token is stored in Next. The operator in Op is then applied to the two operands in Value and Right, and Next is then copied into Op to be used as the operator the next time around the loop. When the loop exits, the token in Op (the final terminating token) is copied back into Next before returning.
All thats left to do now is to define Get_Operand. An operand can be a number, an expression enclosed in parentheses or a unary operator followed by an expression involving operators with a higher priority than the unary operator. The syntax of operands can therefore be written like this:
Operand = Number | ( Expression ) | Unary_Operator_N Term_(N-1)
This uses Unary_Operator_N to symbolise a unary operator with priority N. Heres the implementation:
procedure Get_Operand (Syntax : in Expression_Type; Expr : in String; From : in out Positive; Result : out Integer; Next : in out Token_Pointer) is Op : Token_Pointer; begin Next_Token (Expression_Type'Class(Syntax), Expr, From, Next); if Value(Next).all in Operand_Type'Class then -- Operand Result := Value (Operand_Type'Class(Value(Next).all)); Next_Token (Expression_Type'Class(Syntax), Expr, From, Next); elsif Value(Next).all in Left_Parenthesis'Class then -- Left parenthesis Parse (Expression_Type'Class(Syntax), Expr, From, Priority_Type'Last, Result, Next); if Value(Next).all in Right_Parenthesis'Class then Next_Token (Expression_Type'Class(Syntax), Expr, From, Next); else Ada.Exceptions.Raise_Exception (Syntax_Error'Identity, "Missing right parenthesis"); end if; elsif Value(Next).all in Unary_Operator_Type'Class then -- Unary operator Op := Next; Parse (Expression_Type'Class(Syntax), Expr, From, Priority(Unary_Operator_Type'Class(Value(Op).all)), Result, Next); Result := Apply (Unary_Operator_Type'Class(Value(Op).all), Result); elsif Value(Next).all in Variadic_Operator_Type'Class then -- Variadic operator Op := Next; Parse (Expression_Type'Class(Syntax), Expr, From, Unary_Priority (Variadic_Operator_Type'Class(Value(Op).all), Result, Next); Result := Apply (Variadic_Operator_Type'Class(Value(Op).all), Result); elsif Value(Next).all in End_Of_Expression'Class then -- End of expression Ada.Exceptions.Raise_Exception (Syntax_Error'Identity, "Expression incomplete"); else -- Unknown token Ada.Exceptions.Raise_Exception (Syntax_Error'Identity, "Illegal token"); end if; end Get_Operand;
If the next token is an operand, its Value operation is used to set the value of the Result parameter and the next token is returned. If its a left parenthesis, an expression is processed using Parse to get its value into Result and a check is then made that the token returned by Parse is a right parenthesis. If it is, the token after the right parenthesis is read; otherwise, a syntax error is reported. If the token is a unary operator, Parse is called to process the term that follows; thanks to the lack of multiple inheritance this code has to be replicated for Variadic_Operator_Type as well. If the token is none of these, its an error: either an unexpected end of expression or an invalid token.
What weve got now is a lot more complex than the calculator program in chapter 13. Its tempting to dismiss the extra complexity as gimmickry, but as youll see in the next chapter it serves a useful purpose. What weve got now is an expression evaluator that can be adapted for use in any other program we write that needs to be able to evaluate expressions. The whole thing is now extensible in various ways: extra operators can be added without disturbing the existing code by deriving from Operator_Type, and extra operands can be added by deriving from Operand_Type. The token handling procedures will need updating to deal with the new tokens, but this can be done by deriving from Expression_Type and overriding Fetch_Token. The overridden version of Fetch_Token can call the old version of Fetch_Token to deal with the existing types of token, so theres no duplication of code. The complexity is there for a purpose; without everything youve seen in this chapter it will be necessary to modify all the existing code whenever you need to be able to cope with a small variation on the form that an expression can take. Its complex because its completely universal, and in the next chapter youll see how it can be adapted for use in a spreadsheet.
17.1 | Modify the calculator to support an exponentiation operator represented by ^ with the same precedence as the Ada operator **, i.e. with a higher priority than any other operator. |
17.2 | Modify the calculator to allow expressions to contain hexadecimal values which are indicated by a prefix of $, e.g. $03FF for 16#03FF#. |
17.3 | Modify the calculator to allow postfix operators (i.e. operators which appear after their operands), and provide a factorial operator ! (see chapter 13) so that, for example, 5! evaluates to 120. |
17.4 | Modify the program in this chapter to convert an arithmetic expression into reverse Polish notation. The reverse Polish form of an expression consists of the operands of an operator followed by the operator itself, so that 5+3 is translated to 5 3 +, and (1+2)*(3+4) is translated to 1 2 + 3 4 + *. This can be done by modifying Get_Operand to display operands as they are read and modifying Term to display operators after the operands have been dealt with, rather than using Apply to evaluate the result of the operation as at present. |
Previous |
Contents |
Next |
This file is part of
Ada 95: The Craft of Object-Oriented Programming
by John English.
Copyright © John
English 2000. All rights reserved.
Permission is given to redistribute this work for non-profit educational
use only, provided that all the constituent files are distributed without
change.
$Revision: 1.2 $
$Date: 2002/02/22 01:47:18 $