======= LSN006.ClassWide ======= !topic Types, Classes, and Operations !from Tucker Taft 91-05-30 !discussion This LSN will attempt to motivate the 9X proposals in the area of object-oriented programming (T'CLASS, tags, primitive operations, and all that) by first discussing the fundamental concepts of "type" and "class" as they have been used in programming languages (and other human endeavors), and then follow through with an Ada-9X-specific discussion highlighting the importance of types, classes, and their relation to operations. CLASSIFICATION In any human endeavor which deals with a large number of distinct entities, there is an immediate attempt to classify the entities to simplify understanding of their distinctions. The "classic" example of this appears in biology, where the various animal and plant species are organized (thanks to Aristotle?) into a 7-level classification hierarchy (kingdom, phylum, class, order, family, genus, species -- King Phillip Could Order Fairly Good Soldiers). At the lowest level in the hierarchy is of course individuals, but this "level" is almost always ignored for practical reasons. Instead, the lowest level of discussion is generally "species" (and sometimes subspecies). Two individuals of the same species have a special "compatibility" (i.e. the possibility of fertile offspring) which is distinct from two individuals which are of different species (even if closely related in the classification hierarchy). To minimize the complexity and redundancy of describing and understanding the myriad species, the interesting characteristics are introduced at the highest-possible level in the hierarchy. In this way, we can "know" certain things about all mammals, all vertebrates, all animals, etc. We don't have to be given a list of individual characteristics for each species. Instead, we are told which classes and subclasses (in the general sense) the species is a part of, and we then can use our knowledge of the characteristics associated with those classes/subclasses to determine the characteristics of the particular species. It is useful to notice that the species form a partitioning of the living things -- that is a given plant or animal is of exactly one species. In this sense a species is like a data type. On the other hand, a given plant or animal is in a hierarchy of classes (again, using class as the general term for kingdom, phylum, ...). Because it is a (mostly) single-inheritance hierarchy, each level in the hierarchy is also a partitioning. However, there are exceptions to this, such as the famous single-celled Eugena which is both a plant and an animal (in some schemes they split out a new kingdom just for this weird beastie to avoid multiple inheritance!). Therefore, one can make the general statement: An individual is of a single species/type. An individual is a member of several classes. It is also useful to realize that whether a particular class is a set of species, or a set of individuals, is frequently fuzzed over. One often states that a species is "in" a particular class, and one also states that an individual is "in" a particular class. To make this more formal: A class is a set of individuals with common characteristics. To say that a species/type is "in" a class means that the set of individuals of the species/type iS a subset of the set of individuals of the class. Alternatively: A class is a set of species/types with common characteristics. To say that an individual is "in" a class means that the individual is in a species/type that is in the class. In this sense, a class can be viewed as either a set of individuals or a set of species/types. However, the important distinction between a class and a type is that every individual is in exactly one type, whereas the classes form a hierarchy (perhaps with multiple inheritance). PROGRAMMING LANGUAGE EVOLUTION In the early programming languages (e.g. Fortran-I), there were a fixed number of predefined types used to represent all of the interesting data to be manipulated. In fact, the concept of "type" was barely recognized, since, at least in Fortran and Basic, it was implicit in the identifier used to name a variable (e.g., the famous I-N rule in Fortran, the trailing '$' or '%' in Basic). Array objects existed, though no array "types" did. The concept of "class" was pretty irrelevant since there was only a single integer "type", a single float type, etc. As things got more complicated, types became explicit and parameterized by a size (e.g. INTEGER*2). This represented the first appearance of some kind of classification on types. The integer class consisted of INTEGER*2 and INTEGER*4 (at least in some compilers). Similarly there was a LOGICAL class and a REAL class. However, there were certainly no user-defined classes, since there were no user-defined types. At some point, the big leap was made to user-defined types. Simula, Pascal, and Algol-68 seem to share honors as being the first major languages with this concept. All supported user-defined record-like types in one form or another. Simula of course went several steps further in providing early versions of inheritance (via prefixing) and polymorphism (via virtual procedures). Pascal was the most successful of these languages, and went on to represent a foundation for Modula, Ada, and many other languages designed since. It is interesting to notice that the concept of organizing the types into classes is virtually non-existent in Pascal. Even though users can define named subranges of integers, and their own record types, there were inadequate type-compatibility rules in the initial definition of the language, resulting in a name-equivalence vs. structural-equivalence split in the Pascal market and its follow-ons. The next big step was to "abstraction" and/or information-hiding. Modula with its opaque types, and Ada with its private types are the best known examples of data abstraction. CLU, Mesa, EL-1, etc., were less widely-used languages which pioneered data abstraction. The "final" step (thus far) has been to combine user-defined data types, data abstraction, and the organization of the types into a user-defined class hierarchy, to create what is currently called an "object-oriented" programming language, though as can be seen, OOP is more a combination of earlier concepts than a totally new concept. Ada-83 comes very close to this OOP combination, because it has reasonable support for user-defined data types and data abstraction, and has a (weak) mechanism for relating types in a hierarchy via the derived type mechanism. However, only the predefined type classes like integer and discrete have good support in the language (e.g. can be used for generic formal types). To summarize so far: A "type" corresponds to the level of a species in biological classification systems. Every entity has a single type. The types form a partitioning over the entities. A class is always part of a hierarchy (which may or may not involve multiple inheritance) and allows class-wide characteristics to be defined non-redundantly, at the highest possible level in the hierarchy. Programming languages first had only predefined types, and then predefined classes (e.g. INTEGER*2 and INTEGER*4), and then user-defined types, and then user-defined abstract types, and finally user-defined abstract data types organized into a class hierarchy. DEFINING CLASS-WIDE OPERATIONS IN Ada-83 AND Ada-9X Ada 83 has predefined types, predefined classes, and user-defined types. For Ada 9X we are trying to add more complete support for user-defined classes. By this we mean that a class hierarchy of types can be explicitly defined by the user, and that class-wide characteristics (operations/components) of the types can be defined non-redundantly. In Ada 83 there are various ways to define class-wide operations: a) The language can predefine certain class-wide operations, such as "null", 'VAL, etc., expressed as being "overloaded on all types in the class." b) The language can predefine a "universal" type for the class, with implicit convertibility to all of the types in the class, and effectively define the class-wide operation by defining it on the universal type. For example, the 'LENGTH operation is defined to return universal integer rather than being overloaded on all integer types (I believe that in Ada-80 it was overloaded, and then in Ada-83 was reexpressed as returning the universal type). c) The operation can be associated with the root type of the class, and then inherited by all derivatives, possibly with an overridden implementation. With this mechanism, the implementation of the class-wide operation can vary from one specific type to the next. However, adding such an operation necessarily disturbs all existing derivatives and clients, at least to the extent of requiring recompilation. d) The operation can be defined via a generic, where the formal type identifies the (predefined) class for which the operation is being defined. TEXT_IO.INTEGER_IO is such a generic, defining the operations PUT and GET for the integer class. With this mechanism, the algorithm of the class-wide operation is independent of the particular type, except insofar as it depends on other class-wide operations (like (c)) which are specialized to the particular type. These mechanisms for defining class-wide operations are somewhat limited in Ada 83. (a) and (b) are only available to the language designer in Ada 83. (c) is limited because type derivation does not allow type extension, and because there is no other support for these kinds of user-defined classes. (d) is only available for the predefined classes. In Ada 9X, we propose to preserve these four mechanims for defining class-wide operations. However, only (a) would be limited to the language designer, and insofar as possible, operations like 'VAL would be reexpressed as (b)-style operations, using the universal type "trick" rather than universal overloading. Mechanisms (b), (c), and (d) would be available to the programmer for defining class-wide operations for both predefined and user-defined classes. Method (b) becomes usable by the programmer by having explicit names for the universal types for any class, both predefined and user-defined. Method (d) becomes more generally useful because of the proposed formal derived type, so that a formal type parameter may represent a user-defined type class as well as a predefined class. Furthermore, by combining (b) and (d) an algorithm defined by a generic can be instantiated with the universal type, providing a class-wide instantiation. Each of these mechanisms for defining class-wide operations serves a slightly different purpose. They are not redundant with one another. Methods (b) and (d) are useful for defining additional class-wide operations without disturbing the existing clients/derivatives. Method (c) (adding a primitive operation) allows a class-wide operation to depend on the specifics of the particular type. Method (b) (defining an op on the universal type) is useful for defining a single class-wide operation. Method (d) (defining a generic which has a formal type identifying a class) requires explicit instantiation, but also is more flexible in that it allows new types, exceptions, etc. to be defined as well as simple operations (see below for more discussion of parameterized type classes). We believe that enhancing the support for defining class-wide operations is an extremely important way for improving programmer productivity. It is an important way to reduce the numbers of lines of code required to implement a complex system. PARAMETERIZED TYPES AND CLASSES In addition to classes created explicitly or implicitly via type derivation, there are also type classes defined via parameterization. In particular, array and access type classes are essentially parameterized by a component/designated type. Array type classes are further parameterized by an index type, and access type classes are further parameterized by a specific scope/collection/storage-pool. Ada 83 generics can also be used to produce user-defined parameterized type classes. For example, a parameterized "list" class is implicitly defined by a generic package with a formal type parameter for the list element type. An important example for numerics users is the "complex" class defined by a generic package with a formal float type for the real and imaginary components. Having defined this kind of parameterized class, there remains the desire to be able to extend the class with additional class-wide operations, as well as use it to build further parameterized classes. The generic formal package parameter is the way proposed in Ada 9X to allow a generic to define additional class-wide operations for a user-defined parameterized type class. THE UNIVERSAL TYPE "TRICK" Ada-83 had the need for a number of class-wide operations. Furthermore, there are a number of Ada-83 constructs where the "signature" of the construct has classes rather than particular types. For example, the "signature" for the if statement in Ada 83 is: if then . . . Similarly, the signature for an integer type definition is: type is range .. ; The signature for the 'VAL operation of a discrete type is: function 'VAL() return T; Finally, the signature for the "null" operation of access types is: function "null" return access-class; When a class is used as a result in a signature, the intent is that context determines the particular type. Note, however, that the signature for the 'POS operation is: function 'POS() return universal-integer; Because of implicit conversion, this is very nearly equivalent to: function 'POS() return integer-class; In other words, stating that an operation returns universal-integer (in Ada 83) is a "trick" for saying that it is overloaded on all integer types, or equivalently, it is a class-wide operation. In the case of 'POS, it is class-wide for any discrete type as a parameter, and any integer type as a result. The one advantage that the universal type "trick" has over simply being overloaded on all types in a class, is that when the context does not identify a specific type, then the universal type can be considered to be preferred. Note, however, that one could simply establish some other type (e.g. the "root" type of the class) as the preferred type, thereby perhaps eliminating the need for the universal type concept. Ada 9X, UNIVERSAL TYPES, AND CLASS-WIDE OPERATIONS Since Ada 83 established the universal type as a "trick" for defining a class-wide operation, we have proposed to generalize this for Ada 9X to allow the user to define additional class-wide operations on both predefined and user-defined classes using the universal type as the operand and/or result of the operation. When a universal type of a class is used for a formal parameter, the intent is that an actual operand may be of any type in the class. When a universal type is used as the result type, the intent is that the result is implicitly overloaded on all types in the class, requiring context to resolve the actual result type. Within the body of a subprogram with a formal parameter of a universal type, one can manipulate the formal as though it is an object of some unknown type in the class. From a visibility point of view, it is treated as though it were of the root-type of the class (that is, the primitive operations for use with the universal type are considered to be declared in the same visible part as the root-type), but when one of the primitive operations is actually called, if the type carries a run-time tag, then the tag is used to allow a dispatch to the "appropriate" overriding of the root's primitive operation. These primitive operations of the universal type are not otherwise usable, so as to avoid any ambiguity with the primitive operations of specific non-universal types in the class. In other words, these primitive operations of the universal type are *not* class-wide operations, they are universal-type-only operations. This is analagous to the rule for the operations on the universal numeric types in Ada 83. The universal integer "+" operation takes *only* universal integer, and returns universal integer only -- it's result is *not* overloaded-on/implicitly-convertible-to any other integer type. SUMMARY IN Ada 9X TERMS Each object is of exactly one type. An object may be a member of many classes. If you know the type of an object, you know *all* of its operations. If you know an object is a member of some class, you know *some* of its operations. User-defined classes can be defined via derivation and extension, or via a parameterized generic. Class-wide operations can be defined using one of the methods identified below. For classes defined via derivation and extension, class-wide operations can be defined: a) by the language designer (i.e. by "magic") b) by an operation on the universal type for the class c) by defining a primitive operation on the root type for the class with possible overriding definitions in each derivative d) by defining an operation in a generic with a formal type parameter for the class i) formal derived type for user-defined classes ii) formal discrete, integer, fixed, float, access, or array type, for the predefined classes For classes defined via a parameterized generic, class-wide operations can be defined: e) by defining an operation in the original parameterized generic (this could be done using any of the methods (a)-(d) above, if done inside the generic spec). f) by defining an operation in a generic with a formal package parameter The major "leverage" provided by object-oriented programming features comes from the ability to define classes of related types, and to define class-wide operations non-redundantly. Ada9X can achieve this same leverage by generalizing the existing Ada83 mechanisms for defining class-wide operations, to cover user-defined classes defined either via derivation/extension, or via parameterized generics. -Tuck