On a higher level than that of characters, a program is considered to consist of lexical elements. Both the mechanical compiler and the human interpreter of programs will tend to work in lexical elements, so it is important that these elements should be clearly specified. Lexical elements are clearly delimited and may not straddle line boundaries - a restriction that assists human reading and helps compilers to recover after having detected an error.
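The lexical elements are identifiers (including reserved words), delimiters, numeric literals, character literals, string literals, and comments.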
Reserved words are special identifiers that are reserved for special significance in the language. There are 63 such words. Many of them play an important role in the definition of the overall syntax of the major program units of the language, for example:
procedure is begin end
Other reserved words play a syntactic role at a more detailed level, for example:
constant in out range
Finally, seven of them are used as operators. These are the reserved words
and or xor not abs rem mod
Reserved words other than operators cannot be redeclared, and operators can only be redeclared as operators and with the same precedence. Hence programmers cannot write obscure programs by redefining the meaning of words that play an important syntactic role in the definition of the structure of Ada texts. Similarly, declarations written by programmers cannot affect overall properties of the syntax, for example, the fact that if two adjacent lexical elements are identifiers, one of them (at least) must be a reserved word.
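The adjacency property mentioned above can be checked mechanically. The following is a minimal Python sketch, not part of the language definition: the word list is the Ada 83 reserved word list as the editor understands it, and the helper names `is_reserved` and `adjacent_identifiers_ok` are hypothetical.

```python
# The 63 reserved words (assumed to be the Ada 83 list; check against the
# reference manual before relying on it).
RESERVED_WORDS = frozenset("""
    abort abs accept access all and array at begin body case constant
    declare delay delta digits do else elsif end entry exception exit
    for function generic goto if in is limited loop mod new not null
    of or others out package pragma private procedure raise range
    record rem renames return reverse select separate subtype task
    terminate then type use when while with xor
""".split())


def is_reserved(identifier):
    """Reserved words are recognized regardless of letter case."""
    return identifier.lower() in RESERVED_WORDS


def adjacent_identifiers_ok(first, second):
    """Two adjacent identifiers are acceptable only if at least one is reserved."""
    return is_reserved(first) or is_reserved(second)
```

For instance, `adjacent_identifiers_ok("procedure", "SWAP")` holds, whereas `adjacent_identifiers_ok("SWAP", "COUNT")` does not, whatever the programmer has declared.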
Special printing of reserved words is recommended for highlighting programs on an appropriate output device. The method chosen in this book is boldface (and lower case). Since the language does not distinguish between character fonts, one can envisage methods of highlighting the reserved words by the use of a different font, such as lower case, italics, underlining, color, and so on. Clearly, this does not contradict the use of the ISO character set for program input. On the other hand, for program printouts, it is currently possible to get excellent renditions via graphical printers, color screens, or photocomposers; and it is important to exploit this ability in order to enhance the readability of programs.
The identifiers for attribute designators are not treated as reserved words; they are always preceded by an apostrophe (pronounced prime or tick) and can thus be distinguished from declared identifiers and reserved words purely on the basis of lexical information. The identifiers for predefined attribute designators are in fact different from the reserved words, excepting only DIGITS, DELTA, and RANGE.
Ada uses attributes as environment enquiries and to refer to predefined properties. Other languages have used functional notation or dot notation for this purpose. These alternative forms both have the disadvantage of restricting the user's free choice of names. For example, if the address of an object were denoted by a function, this function would have to be overloaded on all data types. Furthermore any user definition of ADDRESS would hide the predefined one and thus make it unavailable. Similarly, dot notation would prevent declaration of record components with the same identifier as an attribute designator. Neither of these consequences is acceptable in light of the fact that the number of attribute designators can be large, and that some of them may be specific to an implementation. Both problems are avoided by the Ada notation for attributes.
The choice of identifiers for reserved words and attributes depends primarily on convention. Preference is given to full English words rather than abbreviations since we believe full words to be simpler to read. For instance procedure is used rather than proc (in Algol 68) and constant rather than const (in Pascal). Shorter words are also given preference: for example access is used in preference to reference, and task is used in preference to process.
The following special characters can be used as single-character delimiters between lexical elements:
& ' ( ) * + , - . / : ; < = > |
Two-character compound delimiters are constructed by juxtaposition of two such characters, as follows:
=> .. ** := /= >= <= << >> <>
Naturally, in listings of Ada programs, the compound delimiters can be represented following conventional notations where the corresponding characters exist:
/= as ≠ >= as ≥ <= as ≤
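How a scanner might separate single from compound delimiters can be sketched as follows. This is an illustrative Python fragment that assumes a longest-match rule and an input consisting only of delimiter characters and spaces; the name `split_delimiters` is hypothetical, and identifiers, literals, and comments (including the double hyphen) are deliberately out of scope.

```python
# Single-character and compound delimiters as listed in the text.
SINGLE_DELIMITERS = set("&'()*+,-./:;<=>|")
COMPOUND_DELIMITERS = {"=>", "..", "**", ":=", "/=", ">=", "<=", "<<", ">>", "<>"}


def split_delimiters(text):
    """Split a run of delimiter characters, preferring compound delimiters."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == " ":
            i += 1                              # spaces only separate elements
        elif text[i:i + 2] in COMPOUND_DELIMITERS:
            tokens.append(text[i:i + 2])        # longest match first
            i += 2
        elif text[i] in SINGLE_DELIMITERS:
            tokens.append(text[i])
            i += 1
        else:
            raise ValueError(f"not a delimiter character: {text[i]!r}")
    return tokens
```

Under this assumption `split_delimiters("(..)")` yields `(`, `..`, and `)` as three separate delimiters.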
Numeric literals are all introduced by an initial digit. A requirement that has long been recognized when printing numeric tables is for a character to break up long sequences of digits: in Ada, the underline character serves this purpose. In contrast to identifiers, underlines in numeric literals do not alter the meaning, so that 12_000 naturally has the same value as 12000.
A simple sequence of digits is an integer literal written in decimal notation. For other bases from 2 up to 16, the base is given first and is followed by a sequence of digits enclosed by sharp characters (#) or by colons (:), the colon being used as replacement character for the sharp, but only when the sharp is not available. The enclosed digits may include the letters A to F for bases greater than ten. Thus, the conventional ways of expressing bit patterns in binary, octal, or hexadecimal are provided.
Real literals must contain a period, which represents the radix point. They may be expressed in decimal notation or with other bases. Finally, both integer and real literals may include the letter E followed by an exponent.
Examples of numeric literals are given below:
10                      -- an integer literal
10.0                    -- a real literal
1E3                     -- an integer literal of value 1000
1.5E2                   -- a real literal of value 150.0
2#1111_1111#            -- an integer literal of value 255
2#1#E8                  -- an integer literal of value 256
2#1.1111_1111_111#E11   -- a real literal of value 4095.0
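The rules for underlines, bases, and exponents can be made concrete with a small evaluator. The Python sketch below is an illustration under stated assumptions, not a validating lexer: the name `ada_literal_value` is hypothetical, malformed input is not diagnosed beyond the base range, and the exponent is taken to be a decimal power of the base.

```python
from fractions import Fraction


def ada_literal_value(text):
    """Return the value of an Ada-style numeric literal (int, or Fraction for reals)."""
    text = text.replace("_", "").upper()       # underlines do not alter the value
    if "#" not in text:
        text = text.replace(":", "#")          # colon is the replacement for the sharp
    if "#" in text:
        base_text, digits, tail = text.split("#")
        base = int(base_text)
        exponent_text = tail[1:] if tail.startswith("E") else ""
    else:
        base = 10
        digits, _, exponent_text = text.partition("E")
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    exponent = int(exponent_text) if exponent_text else 0

    whole, point, fraction = digits.partition(".")
    mantissa = int(whole + fraction, base)     # digits read as one integer
    # Shift the radix point back and apply the exponent, both as powers of the base.
    value = Fraction(mantissa) * Fraction(base) ** (exponent - len(fraction))
    return value if point else int(value)      # a period marks a real literal
```

Applied to the examples above, `ada_literal_value("2#1#E8")` gives 256 and `ada_literal_value("2#1.1111_1111_111#E11")` gives 4095. Exact fractions are used so that the sketch does not introduce rounding of its own.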
Other forms of lexical element are character literals and string literals. A character literal is formed by enclosing a single character between two apostrophes (') - its value belongs to a character type. A string literal is formed by enclosing zero or more characters between double quotes (") - the value of a string literal is a sequence of character values.
String literals (like all lexical elements) are limited to a single line: otherwise for sequences straddling line boundaries the number of spaces in the string would not be clear since the end of line is not visibly delimited. Furthermore, the limitation to one line reduces the consequences of compilation errors arising from the (unintentional) omission of a closing quote character.
To represent a long sequence of characters, the sequence is split into several string literals, each on a single line, and connected by the catenation operator (&). Apart from long sequences, there may be a need to split sequences that contain characters that are not in the basic 56-character subset of ASCII. Examples of catenations of string literals are as follows:
"A long line of printed output which " &
"is continued on the next line of the program."

"END OF LINE " & ASCII.CR & ASCII.LF & "START OF NEXT LINE"
Comments may appear alone on a line or at the end of a line. As an end of line remark, the comment should appear as an explanation of the preceding text -- hence the use of a double hyphen (doing duty for a dash) is natural and appropriate, as illustrated by this sentence. For simplicity, a space is not allowed between the two hyphens. No form of embedded comments (within a line of text) is provided, as their utility is insufficient to justify the extra complexity. Single comments that are larger than one line are not provided. Such comments would require a closing comment delimiter and this would again raise the dangers associated with the (unintentional) omission of the closing delimiter: entire sections of a program could be ignored by the compiler without the programmer realizing it, so that the program would not mean what he thinks. Long comments can be written as a succession of single line comments, thus combining elegance with safety.
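Because a comment always runs from the double hyphen to the end of the line, separating it from the program text needs only a single left-to-right scan. The Python sketch below illustrates this; the name `split_comment` is hypothetical, and the sketch assumes that double quotes occur only as string literal delimiters, so a character literal containing a double quote is not handled.

```python
def split_comment(line):
    """Split one line into (code, comment) at the first "--" outside a string literal."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            # A doubled quote inside a string toggles twice and leaves the state unchanged.
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            return line[:i].rstrip(), line[i + 2:].strip()
    return line.rstrip(), None                  # no comment on this line
```

No state needs to be carried from one line to the next, which is exactly what a closing comment delimiter would have made necessary.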
A pragma (from the Greek word meaning action) is used to direct the actions of the compiler in particular ways, but has no effect on the semantics of a program (in general). Pragmas are used to control listing, to define an object configuration (for example, the size of memory), to control features of the code generated (for example, the degree of optimization or the level of diagnostics), and so on. Such directives are not likely to be related to the rest of the language in an obvious way. Hence the form taken should not intrude upon the language, but it should be uniform. Thus, the general form of pragmas is defined by the language. They start with the reserved word pragma followed by a pragma identifier, optionally followed by a list of arguments enclosed by parentheses, and terminated by a semicolon. The overall syntax of the pragma identifier and arguments is similar to that of a procedure call. Pragmas are allowed at places where a declaration or a statement is allowed; also at places where other constructs that play the role of declarations (for example clauses) are allowed. Examples of pragmas are as follows:
pragma LIST(ON);                         -- listing wanted
pragma INLINE(SET_MASK);                 -- in line inclusion of call
pragma OPTIMIZE(SPACE);
pragma SUPPRESS(RANGE_CHECK, ON => TABLE);
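The uniform shape of pragmas means that a tool can recognize one without knowing what it means. The following Python sketch illustrates the general form described above; the name `parse_pragma` is hypothetical, and the argument list is split naively on commas, so nested parentheses and trailing comments are not handled.

```python
import re

# pragma <identifier> [ ( <arguments> ) ] ;
PRAGMA_FORM = re.compile(
    r"pragma\s+([A-Za-z](?:_?[A-Za-z0-9])*)\s*(?:\((.*)\))?\s*;",
    re.IGNORECASE,
)


def parse_pragma(text):
    """Return (identifier, arguments) for a pragma, or None if the form does not match."""
    match = PRAGMA_FORM.fullmatch(text.strip())
    if match is None:
        return None
    name, arguments = match.groups()
    if arguments is None:
        return name, []                         # the argument list is optional
    return name, [argument.strip() for argument in arguments.split(",")]
```

An implementation-defined pragma unknown to the tool is parsed in the same way as a language-defined one, which is the point of fixing the general form in the language.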
Some pragmas are defined by the language (see Annex B of the reference manual). It is expected that other pragmas will be defined by various implementations, in particular for the programming support environments developed around the Ada language.