[Ada Information Clearinghouse]

"Rationale for the Design of the
Ada® Programming Language"

[Ada '83 Rationale, HTML Version]

Copyright ©1986 owned by the United States Government. All rights reserved.
Direct inquiries to the Ada Information Clearinghouse at adainfo@sw-eng.falls-church.va.us.

CHAPTER 2: Lexical and Textual Structure

2.1 Lexical Structure

A program is written in characters forming lines on a printed page. The arrangement on the page is primarily to assist the human reader, and consequently is mainly in free format. The allowed characters belong to the ISO (ASCII) character set, and the text of a program may contain both upper case and lower case letters. For portability reasons, it is possible to write any program in a 56 character subset of the ISO character set.

On a higher level than that of characters, a program is considered to consist of lexical elements. Both the mechanical compiler and the human interpreter of programs will tend to work in lexical elements, so it is important that these elements should be clearly specified. Lexical elements are clearly delimited and may not straddle line boundaries - a restriction that assists human reading and helps compilers to recover after having detected an error.

The lexical elements are identifiers (including reserved words), numeric literals, character literals, string literals, delimiters, and comments.

Each literal is a lexical element that stands for a value, namely the value literally represented: for example, 10 and 1E1 are two integer literals which both stand for the integer value ten. In addition, a program text can include elements that have no influence on the meaning of the program but are included as information and guidance for the human reader or for the compiler. These are comments and pragmas.

Identifiers start with a letter which may be followed by a sequence of letters and digits. In addition, an underline character may appear between two other characters of an identifier. This underline is significant and plays the role of the space in ordinary prose (but without breaking the integrity of the identifier). The need for such an underline is seen from good choices of names such as BYTES_PER_WORD rather than BYTESPERWORD. Furthermore the significance of the underline makes SPACE_PER_SON a different identifier from SPACEPERSON or SPACE_PERSON and A_NYLON_GRIP different from ANY_LONG_RIP.
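
The significance of the underline can be sketched in a few declarations. The procedure name UNDERLINE_DEMO, the variable TOTAL, and the values chosen are assumptions made purely for illustration; only the identifiers SPACE_PER_SON, SPACEPERSON, SPACE_PERSON, and BYTES_PER_WORD come from the text above.

```ada
procedure UNDERLINE_DEMO is
   -- Three distinct identifiers: the underline is significant.
   SPACE_PER_SON : constant := 1;
   SPACEPERSON   : constant := 2;
   SPACE_PERSON  : constant := 3;

   -- Legible thanks to the underlines (compare BYTESPERWORD).
   BYTES_PER_WORD : constant := 4;

   TOTAL : INTEGER;
begin
   TOTAL := SPACE_PER_SON + SPACEPERSON + SPACE_PERSON + BYTES_PER_WORD;
   if TOTAL /= 10 then
      raise PROGRAM_ERROR;   -- not expected to happen
   end if;

   -- Not identifiers (an underline must lie between two other
   -- characters, and the first character must be a letter):
   --    _COUNT    COUNT_    BYTES__PER_WORD    2ND_TRY
end UNDERLINE_DEMO;
```

Since the three SPACE names denote three different entities, all three declarations can coexist in the same declarative part.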

Reserved words are special identifiers that are reserved for special significance in the language. There are 63 such words. Many of them play an important role in the definition of the overall syntax of the major program units of the language, for example:

    procedure   is   begin   end

Other reserved words play a syntactic role at a more detailed level, for example:

    constant   in   out   range

Finally, seven of them are used as operators. These are the reserved words

    and   or   xor   not   abs   rem   mod

Reserved words other than operators cannot be redeclared, and operators can only be redeclared as operators and with the same precedence. Hence programmers cannot write obscure programs by redefining the meaning of words that play an important syntactic role in the definition of the structure of Ada texts. Similarly, declarations written by programmers cannot affect overall properties of the syntax, for example, the fact that if two adjacent lexical elements are identifiers, one of them (at least) must be a reserved word.
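
As an illustration of redeclaring operators, the following sketch overloads "and" and "not" for a three-valued type. The type TRILEAN, its literals, and the procedure name OPERATOR_DEMO are hypothetical and do not come from the language definition; the point is only that the new operators keep the precedence of the predefined ones.

```ada
procedure OPERATOR_DEMO is
   type TRILEAN is (NO, MAYBE, YES);   -- hypothetical three-valued logic

   -- "and" yields the weaker of its two operands.
   function "and" (LEFT, RIGHT : TRILEAN) return TRILEAN is
   begin
      if LEFT < RIGHT then
         return LEFT;
      else
         return RIGHT;
      end if;
   end "and";

   -- "not" mirrors the value: NO <-> YES, MAYBE stays MAYBE.
   function "not" (RIGHT : TRILEAN) return TRILEAN is
   begin
      return TRILEAN'VAL (TRILEAN'POS (YES) - TRILEAN'POS (RIGHT));
   end "not";

   RESULT : TRILEAN;
begin
   -- "not" keeps its predefined precedence, so this expression
   -- is read as (not NO) and MAYBE, never as not (NO and MAYBE).
   RESULT := not NO and MAYBE;
   if RESULT /= MAYBE then
      raise PROGRAM_ERROR;   -- not expected to happen
   end if;
end OPERATOR_DEMO;
```

An attempt to declare an entity named begin, or a unary version of "and", would be rejected, since neither the reserved words nor the shape of the operators can be altered by a declaration.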

Special printing of reserved words is recommended for highlighting programs on an appropriate output device. The method chosen in this book is boldface (and lower case). Since the language does not distinguish between character fonts, one can envisage methods of highlighting the reserved words by the use of a different font, such as lower case, italics, underlining, color, and so on. Clearly, this does not contradict the use of the ISO character set for program input. On the other hand, for program printouts, it is currently possible to get excellent renditions via graphical printers, color screens, or photocomposers; and it is important to exploit this ability in order to enhance the readability of programs.

The identifiers for attribute designators are not treated as reserved words; they are always preceded by an apostrophe (pronounced prime or tick) and can thus be distinguished from declared identifiers and reserved words purely on the basis of lexical information. The identifiers for predefined attribute designators are in fact different from the reserved words, excepting only DIGITS, DELTA, and RANGE.

Ada uses attributes as environment enquiries and to refer to predefined properties. Other languages have used functional notation or dot notation for this purpose. These alternative forms both have the disadvantage of restricting the user's free choice of names. For example, if the address of an object were denoted by a function, this function would have to be overloaded on all data types. Furthermore any user definition of ADDRESS would hide the predefined one and thus make it unavailable. Similarly, dot notation would prevent declaration of record components with the same identifier as an attribute designator. Neither of these consequences is acceptable in light of the fact that the number of attribute designators can be large, and that some of them may be specific to an implementation. Both problems are avoided by the Ada notation for attributes.
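
A minimal sketch of this freedom of naming follows. The names INDEX, SPAN, ADDRESS, and ATTRIBUTE_DEMO are assumptions chosen for the example; the record components FIRST and LAST and the object ADDRESS deliberately reuse identifiers that are also attribute designators.

```ada
procedure ATTRIBUTE_DEMO is
   type INDEX is range 1 .. 100;

   -- User-chosen component names that coincide with attribute designators.
   type SPAN is
      record
         FIRST : INDEX;
         LAST  : INDEX;
      end record;

   -- An ordinary object that happens to be called ADDRESS.
   ADDRESS : constant STRING := "10 DOWNING STREET";

   S : SPAN;
begin
   S.FIRST := INDEX'FIRST;   -- component FIRST, then attribute 'FIRST
   S.LAST  := INDEX'LAST;    -- component LAST, then attribute 'LAST
   if S.LAST - S.FIRST /= 99 or ADDRESS'LENGTH /= 17 then
      raise PROGRAM_ERROR;   -- not expected to happen
   end if;
end ATTRIBUTE_DEMO;
```

The apostrophe alone tells the reader (and the compiler) that INDEX'FIRST is an attribute, whereas S.FIRST is a component selection.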

The choice of identifiers for reserved words and attributes depends primarily on convention. Preference is given to full English words rather than abbreviations since we believe full words to be simpler to read. For instance procedure is used rather than proc (in Algol 68) and constant rather than const (in Pascal). Shorter words are also given preference: for example access is used in preference to reference, and task is used in preference to process.

The following special characters can be used as single-character delimiters between lexical elements:

    &  '  ( )  *  +  ,  -  .  /  :  ;  <  =  >  |

Two-character compound delimiters are constructed by juxtaposition of two such characters, as follows:

    =>  ..  **  :=  /=  >=  <=  <<  >>  <>

Naturally, in listings of Ada programs, the compound delimiters can be represented following conventional notations where the corresponding characters exist:

    /=  as    ≠      >=  as  ≥      <=  as  ≤
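
To show the compound delimiters in context, the following sketch uses each of the ten at least once. The procedure DELIMITER_DEMO, the type VECTOR, and the label AGAIN are hypothetical names introduced only for this purpose, and the goto is present merely to exhibit the label brackets.

```ada
procedure DELIMITER_DEMO is
   type VECTOR is array (INTEGER range <>) of INTEGER;  -- <>  box
   V : VECTOR (1 .. 3) := (others => 2);                -- ..  =>  :=
   N : INTEGER := 0;
begin
   <<AGAIN>>                                            -- <<  >>  label
   N := N + V (1) ** 2;                                 -- **  exponentiation
   if N /= 8 and N <= 100 then                          -- /=  <=
      goto AGAIN;
   end if;
   if N >= 9 then                                       -- >=
      raise PROGRAM_ERROR;   -- not expected to happen
   end if;
end DELIMITER_DEMO;
```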

Numeric literals are all introduced by an initial digit. A requirement that has long been recognized when printing numeric tables is for a character to break up long sequences of digits: in Ada, the underline character serves this purpose. In contrast to identifiers, underlines in numeric literals do not alter the meaning, so that 12_000 naturally has the same value as 12000.

A simple sequence of digits is an integer literal written in decimal notation. For other bases from 2 up to 16, the base is given first and is followed by a sequence of digits enclosed by sharp characters (#) or by colons (:), the colon being used as replacement character for the sharp, but only when the sharp is not available. The enclosed digits may include the letters A to F for bases greater than ten. Thus, the conventional ways of expressing bit patterns in binary, octal, or hexadecimal are provided.

Real literals must contain a period, which represents the radix point. They may be expressed in decimal notation or with other bases. Finally, both integer and real literals may include the letter E followed by an exponent.

Examples of numeric literals are given below:

    10                        -- an integer literal
    10.0                      -- a real literal
    1E3                       -- an integer literal of value 1000
    1.5E2                     -- a real literal of value 150.0
    2#1111_1111#              -- an integer literal of value 255
    2#1#E8                    -- an integer literal of value 256
    2#1.1111_1111_111#E11     -- a real literal of value 4095.0
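
The equivalence of different notations for the same value can be checked by a small program such as the following sketch. The names are assumptions made for the example, and the octal and hexadecimal forms (8#377#, 16#FF#, 16#F.FF#E+2) are additions that do not appear in the list above. Note that the exponent of a based literal is a power of the base, not of ten.

```ada
procedure LITERAL_DEMO is
   -- Named numbers; within each group all literals denote one value.
   TEN_A  : constant := 10;
   TEN_B  : constant := 1E1;

   BYTE_A : constant := 2#1111_1111#;
   BYTE_B : constant := 8#377#;
   BYTE_C : constant := 16#FF#;

   BIG    : constant := 12_000;       -- underline does not alter the value

   REAL_A : constant := 2#1.1111_1111_111#E11;   -- exponent is a power of 2
   REAL_B : constant := 16#F.FF#E+2;             -- exponent is a power of 16
begin
   if TEN_A /= TEN_B
     or BYTE_A /= 255 or BYTE_B /= 255 or BYTE_C /= 255
     or BIG /= 12000
     or REAL_A /= 4095.0 or REAL_B /= 4095.0
   then
      raise PROGRAM_ERROR;   -- not expected to happen
   end if;
end LITERAL_DEMO;
```

All the comparisons involve static expressions, so a compiler is free to evaluate them at compilation time and may remark that the condition is always false.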

Other forms of lexical element are character literals and string literals. A character literal is formed by enclosing a single character between two apostrophes (') - its value belongs to a character type. A string literal is formed by enclosing zero or more characters between double quotes (") - the value of a string literal is a sequence of character values.

String literals (like all lexical elements) are limited to a single line: otherwise for sequences straddling line boundaries the number of spaces in the string would not be clear since the end of line is not visibly delimited. Furthermore, the limitation to one line reduces the consequences of compilation errors arising from the (unintentional) omission of a closing quote character.

To represent a long sequence of characters, the sequence is split into several string literals, each on a single line, and connected by the catenation operator (&). Apart from long sequences, there may be a need to split sequences that contain characters that are not in the 56 character basic subset of ASCII. Examples of catenations of string literals are as follows:

    "A long line of printed output which " &
    "is continued on the next line of the program."

    "END OF LINE " & ASCII.CR & ASCII.LF &
    "START OF NEXT LINE"

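Catenated literals of this kind can be placed in a complete program, as in the following sketch. The procedure name CATENATION_DEMO and the constant names MESSAGE and TWO_LINES are assumptions; TEXT_IO and the package ASCII are the predefined ones.

```ada
with TEXT_IO;
procedure CATENATION_DEMO is
   -- Each string literal is complete on its own line;
   -- the catenation operator joins the pieces.
   MESSAGE : constant STRING :=
      "A long line of printed output which " &
      "is continued on the next line of the program.";

   -- Control characters are not graphic characters, so they are
   -- supplied by name and catenated with the surrounding text.
   TWO_LINES : constant STRING :=
      "END OF LINE " & ASCII.CR & ASCII.LF &
      "START OF NEXT LINE";
begin
   TEXT_IO.PUT_LINE (MESSAGE);
   TEXT_IO.PUT_LINE (TWO_LINES);
end CATENATION_DEMO;
```

If the closing quote of one of these literals were omitted, the damage would be confined to the line concerned, since the literal cannot extend beyond the end of that line.
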
Comments may appear alone on a line or at the end of a line. As an end of line remark, the comment should appear as an explanation of the preceding text -- hence the use of a double hyphen (doing duty for a dash) is natural and appropriate, as illustrated by this sentence. For simplicity, a space is not allowed between the two hyphens. No form of embedded comments (within a line of text) is provided, as their utility is insufficient to justify the extra complexity. Single comments that are longer than one line are not provided. Such comments would require a closing comment delimiter and this would again raise the dangers associated with the (unintentional) omission of the closing delimiter: entire sections of a program could be ignored by the compiler without the programmer realizing it, so that the program would not mean what the programmer intends. Long comments can be written as a succession of single line comments, thus combining elegance with safety.
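
The two uses of comments can be sketched as follows; the procedure COMMENT_DEMO and the variable COUNT are hypothetical.

```ada
procedure COMMENT_DEMO is
   -- A long explanation is written as a succession
   -- of single line comments, each one introduced
   -- by its own double hyphen.
   COUNT : INTEGER := 0;   -- an end of line remark
begin
   COUNT := COUNT + 1;     -- there is no closing delimiter to forget
   -- COUNT := COUNT - 1;  -- a statement disabled line by line
end COMMENT_DEMO;
```

Every line that is to be ignored must carry its own double hyphen, so the extent of what the compiler skips is always visible in the text.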

A pragma (from the Greek word meaning action) is used to direct the actions of the compiler in particular ways, but has no effect on the semantics of a program (in general). Pragmas are used to control listing, to define an object configuration (for example, the size of memory), to control features of the code generated (for example, the degree of optimization or the level of diagnostics), and so on. Such directives are not likely to be related to the rest of the language in an obvious way. Hence the form taken should not intrude upon the language, but it should be uniform. Thus, the general form of pragmas is defined by the language. They start with the reserved word pragma followed by a pragma identifier, optionally followed by a list of arguments enclosed by parentheses, and terminated by a semicolon. The overall syntax of the pragma identifier and arguments is similar to that of a procedure call. Pragmas are allowed at places where a declaration or a statement is allowed; also at places where other constructs that play the role of declarations (for example clauses) are allowed. Examples of pragmas are as follows:

    pragma LIST(ON);                            -- listing wanted
    pragma INLINE(SET_MASK);                    -- in line inclusion of call
    pragma OPTIMIZE(SPACE);
    pragma SUPPRESS(RANGE_CHECK, ON => TABLE);

Some pragmas are defined by the language (see Annex B of the reference manual). It is expected that other pragmas will be defined by various implementations, in particular for the programming support environments developed around the Ada language.
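
The places at which pragmas may appear can be illustrated by the following sketch, which uses three of the language-defined pragmas. The procedure PRAGMA_DEMO, the type MASK, and the object CURRENT are assumptions made for the example; the effect of each pragma, if any, depends on the implementation.

```ada
procedure PRAGMA_DEMO is
   pragma OPTIMIZE (SPACE);        -- in a declarative part

   type MASK is array (0 .. 7) of BOOLEAN;
   CURRENT : MASK := (others => FALSE);

   procedure SET_MASK (BIT : in NATURAL);
   pragma INLINE (SET_MASK);       -- after the declaration it names

   procedure SET_MASK (BIT : in NATURAL) is
   begin
      CURRENT (BIT) := TRUE;
   end SET_MASK;
begin
   SET_MASK (3);
   pragma LIST (ON);               -- where a statement is allowed
   SET_MASK (5);
end PRAGMA_DEMO;
```

Removing the three pragmas would leave a program with the same meaning, which is the sense in which a pragma has (in general) no effect on the semantics.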

