Language Study Note LSN-034-DR
INTERFACE Pragmas
Randy Brukardt, RR Software
1992 March 4

!Topic LSN on INTERFACE pragmas.
!from RR Software (Randy Brukardt)
!Reference MS 4.0;13.9

We were asked at the recent DR meeting to write an LSN encapsulating all of our thoughts on INTERFACE pragmas (i.e. pragmas INTERFACE, INTERFACE_OBJECT, INTERFACE_NAME, and CALLING_CONVENTION). In addition, we and others have made some criticisms of the current design of these pragmas. This note is intended to delve into all of the issues surrounding these pragmas. After doing that, some alternative designs for the pragmas will be suggested.

This note is built from a considerable volume of correspondence on these subjects, as well as various conversations. Opinions expressed in this note are mine, and do not necessarily reflect those of other Ada 9x teams. However, current MRT thinking on these matters is reflected as well as I understand it.

This note also covers issues surrounding address clauses, since it turns out that these are closely related to the INTERFACE pragmas. In fact, address clauses can be considered an alternative form of the INTERFACE_NAME pragma of the MS-4.0 proposal. (More on this later).

This note tries to cover all of the issues, and therefore covers issues that seem obvious. One of the purposes of this note is to get all of the issues and requirements in one place, so that designs for interface features can be evaluated against the entire set.

What are we trying to accomplish here?

First, a brief overview of what Ada 9x is trying to change.

Ada 83 provides the Interface pragma for calling out to foreign languages. It is rather loosely defined, but it suffices in many ways for its job. However, it is only part of the story.

Ada 83 does not provide any language defined way for Ada programs to access objects declared in foreign languages. Ada 83 also does not define any ways for foreign languages to call Ada.
This capability was not originally thought to be important; however, the rise of systems like Motif and Microsoft Windows has made it critical that Ada provide good facilities for this. These systems call routines in the client program when an event occurs; the client program has little or no control over this process.

Similarly, Ada 83 does not define object access from foreign languages. Ada 83 provides no way for Ada routines to be used without an Ada main program.

Lastly, Ada 83 does not provide a way to specify the actual name of the foreign language routine or object. Since foreign languages may have much different naming conventions than Ada (few foreign languages support operator symbols, for example!), and restricting users only to routines which have Ada-legal names is not really acceptable (some languages have no such names, for example a C compiler that always prepends '_' on names), some capability is necessary for this.

Individual compilers provide some or all of these facilities, but there is no standardization on how it is done. Therefore, bindings which need these facilities are by definition tied to a single compiler.

As we shall see, this is a lot bigger area than the handful of paragraphs that it has received in the mapping specification indicates. Let's move on to some of the fundamental issues.

The fundamental issues.

We'll start by analyzing issues that are not particularly Ada issues; they would occur with any programming language. It is also interesting to note that these issues are pretty much independent of each other, and are independent of which foreign language is being used.

The most important issue is that of import or export (or neither). An entity (object or subprogram) can be imported from a foreign language environment, exported to a foreign language environment, or neither (meaning it is local to the Ada program).
(Local has a different meaning in this sense than the normal Ada meaning; an Ada global variable can be local to the Ada part of a program. This meaning of local will pop up occasionally in this paper).

It should be clear that no entity can be both imported and exported; importing implies that the entity was defined elsewhere, and has been made available to this Ada code, while exporting implies that the entity was defined here, and is being made available to other code.

Entities also can be imported or exported indirectly, by passing pointers to them to (or from) the foreign code. These entities can be treated locally for the purposes of import/export, although we will soon meet them again.

Most languages draw a distinction between these two concepts. For example, the Microsoft assembler language for the 8086 family has PUBLIC (export) and EXTERN (import) declarations. Another example is the C language, which has extern (import) declarations, while entities without special declarations are automatically exported. (C has several declarations which will make an entity local to the current unit.)

I don't think that there is much debate that for Ada the default should be that entities are local. Ada spends much effort on information hiding, and very little is imported automatically. The same should hold in a larger, mixed language system. In addition, Ada's naming conventions potentially are not compatible with the larger foreign language system. Finally, automatic export clutters the name space of the system.

Exactly how import and export should be handled is open to various solutions; we'll explore solutions later in this note.

The next issue is that of calling conventions & data formats. Foreign languages may use different parameter passing, different calling mechanisms, and may lay out data differently than an Ada compiler.
This is actually a property of a specific implementation; calling conventions and data layouts may differ between implementations of the same language.

For languages (implementations) to interoperate, one or both of the languages must be able to use the other's data layouts and calling conventions. There are many ways to accomplish this, including user-written 'glue' routines; having one language use the conventions and layouts of another consistently; or by one of the compilers being able to use both its native format and the foreign formats.

Obviously, we can only make changes to Ada, and not to other languages, so any solutions that we provide have to be in Ada. It also should be obvious that making the user bear the whole burden of interfacing, while possible, is not going to be considered as a solution to the problems posed at the outset of this note. A decision between the last two alternatives probably ought to be left up to the implementations themselves.

In any event, having one language implementation use the calling conventions of another consistently will only work for one set of language combinations. An Ada compiler that uses the C calling convention consistently will not help to interface to Fortran (unless the Fortran also uses the C calling convention). It is also possible that the rules of Ada would make it impossible to use a foreign calling convention exclusively.

Entities which need to have a foreign language calling convention or data layout may include entities which are being exported, imported, or are local. It may also be necessary to indicate that the items which are pointed at by a pointer are in a foreign language calling convention or data layout. It should be obvious that pointers passed to a foreign language must point at entities which use a format familiar to that language.

Similarly, just because an item is imported or exported does not mean that it should be in a foreign language format.
If the foreign language code is new, or if the foreign language supports using Ada entities directly, the Ada format may be what is wanted. Doing so eliminates any possible need for translation or optimization loss that might be implied by a foreign format.

For objects, import or export brings up the question of initialization. Ada can initialize objects explicitly or implicitly. Explicit initializations are controllable by the user, so they are not a problem. However, implicit initializations (such as initializing access types to null) can be a problem.

Imported objects generally should not be initialized by Ada. Since the object belongs to some foreign language subsystem, that subsystem ought to be doing its own initialization. In the rare case where an imported object does need to be initialized by Ada, that can be done explicitly with an assignment.

Exported objects, however, belong to Ada. They should be treated as Ada objects, and therefore should be initialized (including implicit initializations) if necessary. In addition, explicit initializations ought to be allowed (that is, the design of the solution to import/export should not prevent initializations generally). Exported objects should act as normal Ada objects, as they are part of an Ada subsystem.

Foreign entities and entities made available to foreign code have to be identified somehow. There are two ways to do this. First, there is the conventional named reference. This involves somehow giving a (non-local) name to an entity, which is then used in references to the entity. The references are then fixed up at link-time. The second technique is to use an Ada address clause (or its equivalent in other languages) to place an object in a particular location. (Note that in this context, 'name' does not mean name in the Ada sense, but rather an identification available to the foreign language system.)

Both methods are necessary.
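
The two identification methods can be sketched as follows. This is only an illustration under stated assumptions: it is written with pragma Import and an address attribute clause in the form current Ada compilers accept, not with the MS-4.0 pragmas discussed in this note. The names C_Strlen, Buffer and Register are hypothetical, and Buffer merely stands in for a device register so that the example can run on an ordinary host.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Identify_Demo is
   use type Interfaces.C.size_t;

   --  Method 1: named reference. The external name "strlen" (the C
   --  library routine) is fixed up by the linker.
   function C_Strlen (S : Interfaces.C.char_array)
      return Interfaces.C.size_t;
   pragma Import (C, C_Strlen, "strlen");

   --  Method 2: placement by address. Buffer is a stand-in (assumption)
   --  for a hardware location; a real program would give a machine address.
   Buffer   : aliased Integer := 42;
   Register : Integer;
   for Register'Address use Buffer'Address;
   --  Marking Register as imported tells the compiler that it belongs to
   --  something else, so Ada performs no initialization of it.
   pragma Import (Ada, Register);

begin
   if C_Strlen (Interfaces.C.To_C ("Ada")) = 3 then
      Ada.Text_IO.Put_Line ("named import resolved by the linker");
   end if;
   Ada.Text_IO.Put_Line ("value at address:" & Integer'Image (Register));
end Identify_Demo;
```

Note that the address clause on its own says only where Register lives; it is the separate pragma that says the object is imported rather than local or exported.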
The need for the named method is fairly obvious - it allows changes to be made in the foreign language code without requiring changes in the Ada code. This is very much in the Ada philosophy.

The absolute addressing method requires a bit more explanation. The key to understanding it is that a foreign language subsystem may not actually exist at link-time. For example, the IBM PC ROM BIOS is a foreign language subsystem to an Ada program, but there is no IBM PC ROM BIOS object module to link. To access objects within the IBM PC ROM BIOS, references must be made to their machine addresses.

Memory-mapped hardware devices also can be considered foreign language subsystems. In many cases, the device will produce the values itself, so the processing is done by something other than the Ada program.

Some readers may wonder why Ada address clauses alone are not enough to solve this absolute addressing problem. That is because Ada address clauses alone cannot distinguish between an entity being imported, and one that is local or exported. Both cases are useful and common; I have had both cases come up in real Ada programs I have worked on. Memory-mapped hardware devices are a fine example of address clauses being used to reference an imported entity. An example of address clauses being used for local entities is for critical values to be put into a special battery-backed memory.

Returning to the named method, we need to consider how the names are determined. Since foreign language names may not be legal Ada identifiers, a method of attaching arbitrary names to Ada entities is needed.

There is another glitch: compilers for some languages do not give straightforward names to their entities. (For example, some C compilers prepend '_' to all names they export. C++ compilers often add type information to the names of their entities.) Given enough information, the Ada compiler can automatically construct the correct name.
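
As a rough illustration of constructing a link name from an Ada name, consider the sketch below. Everything in it is hypothetical: the two conventions, the function Link_Name, and in particular the choice to fold the name to lower case (an assumption made here because Ada names are case-insensitive while C names are not). It covers only the underscore-prefix case mentioned above, not C++ style type encoding.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Link_Name_Demo is

   --  Two hypothetical C implementations: one exports names unchanged,
   --  the other prepends '_' to every exported name.
   type Convention is (Plain_C, Underscore_C);

   function Link_Name (Name : String; Conv : Convention) return String is
      --  Assumption: the foreign name is the Ada name in lower case.
      Lower : constant String := Ada.Characters.Handling.To_Lower (Name);
   begin
      case Conv is
         when Plain_C      => return Lower;
         when Underscore_C => return "_" & Lower;
      end case;
   end Link_Name;

begin
   Ada.Text_IO.Put_Line (Link_Name ("Call_Out", Plain_C));
   Ada.Text_IO.Put_Line (Link_Name ("Call_Out", Underscore_C));
end Link_Name_Demo;
```

The same Ada declaration yields a different link name for each implementation, which is why the compiler needs to know which implementation it is dealing with and not merely which language.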
Whether an Ada compiler should do this is a question open for debate. It seems likely that the language should not require 'name-mangling', but that it should be designed so it is possible.

The last fundamental issue is that of independent relocation. When an object is imported, the Ada compiler must assume no relationship between the object and the Ada code. That is pretty obvious. However, in some circumstances, it is useful to compile Ada entities such that they are independent of any other entity. One example of this occurs in embedded systems, where it may be necessary to split a program unit between several ROMs.

Independent relocation of Ada entities is potentially expensive, since the underlying system may not support it well. Even more importantly, many existing Ada compilers expect to compile all of a single compilation unit into a single relocatable unit. Breaking this assumption would ripple changes throughout an Ada compiler.

The capability is only needed for specialized applications (even most embedded systems don't need it). Address clauses also provide a form of this for whatever entities are supported in a given application. (Indeed, address clauses provide the most useful form, needed for examples like the battery-backed memory example given above).

Therefore, it seems dubious to require support for independent relocation of Ada entities in the core language. If the need for a standardized facility is great enough, it could be put in an annex. I feel that this need is so specialized that a language-independent definition is unnecessary; let the few vendors whose customers need this feature provide it tailored to the application. It should be noted that any application requiring this feature is not portable (since the need to move Ada objects independent of their declaration implies an application specific need; the Ada semantics do not change if they are not moved).

The Ada issues.
There are several issues related to the Ada language, which are not specific to any particular solution. A couple of these already were discussed with the fundamental issues - initialization and address clauses. Discussions of other issues follow.

Ada 83 interface pragmas are applied to overloaded families of subprograms. (A single subprogram can be considered an overloaded family with one member.) This is inappropriate in some cases, particularly for applying names. Since most other languages don't have overloading, and don't support different parameter types, this simply makes a mess. There are a few special cases where this capability might be useful, but in general it is a hindrance.

Some (many?) compilers don't support Interface pragmas on overloaded routines. This is easy to justify, as implementations may place restrictions on this pragma. With the increased capabilities of Ada 9x, it seems likely that supporting overloading will become even harder.

Ada 9x originally proposed a solution for this problem (the pragma applies only to the subprogram it follows immediately). However, the DRs felt that this solution was too upward incompatible, and it was removed.

A recent note suggests adding an optional parameter and result profile to the specification of the name, in order to solve this problem. This certainly would solve the problem, but has problems of its own. The major problem is with the syntax used to specify the profile.

One possibility is to use a full subprogram specification. This, however, is not compatible with the Ada 83 definition of pragma syntax. That means it could not be used in Ada 83 compilers (usually a major advantage of a pragma-based solution). It also requires major changes in pragma processing.

Another possibility is to use just a parenthesized list of types. This does meet the pragma syntax requirement. However, it is a new, different kind of declaration for Ada users to learn.
We use a similar syntax in our assembler to allow access to overloaded Ada routines, but users are rarely comfortable with it. Therefore, it is rarely used, bringing us back to the original problem.

Both of these possibilities are implementable, but are much less than perfect.

Pragma INTERFACE is the only solution available to solve any interfacing problems in Ada 83. It is usually described as

    Pragma INTERFACE (language_name, subprogram_name);

The description 'language_name' is only a partial description of the situation. A better description would be 'language_implementation_name'. This distinction is important because calling conventions and object layouts are properties of a language implementation; they are only loosely defined by the language. On some systems (MS-DOS for example), implementations of the same language vary widely.

There are also cases where the 'language_name' does not describe a language at all, but rather some other situation. For example, on the DEC VAX, there is a standard calling convention shared by several languages; a single name could be used to describe this calling convention. Another example is the proposed INTRINSIC calling convention - it also is not related to a specific language.

Nothing in the Ada 83 rules prevents interpreting 'language_name' as 'language_implementation_name'. This property is important in Ada 83, and will be even more important in Ada 9x. It may even be a good idea for Ada 9x to come up with a different name for the 'language_name' parameter that more closely relates to its actual meaning. At the minimum, a note should be added that mentions that language_name may actually identify a specific implementation.

A more minor comment is that the first parameter of the pragma ought to have been the entity it applies to. That is the most important parameter, so it should be first. I invariably write the pragmas that way first, and then have to change the parameters around.
(For instance, I had to correct many of the examples in this note when this was pointed out to me.) Unfortunately, fixing this would cause a significant upward incompatibility, so we are likely stuck with a bad design.

Another Ada problem occurs when an Ada call is made to an Ada routine with a foreign language calling convention. This problem is not obvious at first, but a correct understanding is critical to efficient implementations of interfacing.

When you call a routine in a foreign language, Ada semantics stop at the call. Similarly, when a foreign language calls into an Ada routine, Ada semantics start as soon as the routine starts executing. Consider the following example:

    Package Inter Is
        -- Call_Out is written in C and called from Ada.
        Procedure Call_Out (A : In Integer);
        Pragma Interface(C, Call_Out);
        -- Call_In is written in Ada and called from C.
        Procedure Call_In (A : In Integer);
        Pragma Calling_Convention(C, Call_In);
    End Inter;

Now assume that the C routine Call_Out calls Call_In. While we're in Call_In, we have a structure that looks like:

    Call_In       Ada Semantics
    -------|------------------- Semantics Break
    Call_Out      C Semantics
    -------|------------------- Semantics Break
    Main          Ada Semantics

Now, whether or not the Ada semantics are preserved across the semantics breaks is implementation dependent. For instance, there is no reason to assume that an exception raised in Call_In can be propagated thru Call_Out to be handled in Main. Making the semantics implementation dependent is necessary, of course, since other languages do not know anything about Ada, and it is not a good idea to force a subpar implementation on Ada features just so this will work.

Now change the example so that Call_Out and Call_In are the same routine. This is essentially the solution to the situation of calling an Ada routine with a foreign calling convention: calling out through pragma Interface and back in thru Calling_Convention.
The structure now looks like:

    Call_In                Ada Semantics
    -------|------------------- Semantics Break
    (parameter passing)    C Semantics
    -------|------------------- Semantics Break
    Main                   Ada Semantics

The semantic breaks are still there. It is not necessarily the case that Ada semantics can be preserved if some other calling convention is used. (Otherwise, the compiler might as well use the C calling convention all the time, and avoid the hassles of a conversion).

This 'semantic break' model is necessary if we are going to allow Ada compilers to choose the most efficient possible model for their normal code. If the compiler implementor decides that passing in an exception handler pointer or a static link pointer (or whatever) with each subprogram call is the best way to implement something on their target, we cannot force them to preserve Ada semantics across calls in non-native formats. Doing so would eliminate many possible implementation choices for an Ada compiler.

There are three possible solutions to the 'semantic break' problem.

First, we can say that semantics ARE preserved in a call from Ada to an Ada routine with a foreign calling convention. For direct subprogram calls, this is not a problem, as a dual-ported approach will work fine. (That is, ensure that the subprogram has both Ada and foreign calling conventions, and call the appropriate one). Access to subprogram types, however, means that this has the effect of greatly limiting implementation choices. Secondly, if a vendor decides to add a new language to its interface pragmas, then it may have to redo some of its purely Ada code design, in order that the semantics continue to be preserved. This choice seems unacceptable to me.

Second, we can flag calls which have problems as illegal. For direct subprogram calls, this is easy and specific. (But adopting a dual-ported approach is just as easy, and doesn't surprise the user as much).
However, for access to subprogram types, this implies preventing any foreign language interface calls. This would eliminate some useful functionality - for example, calling a C routine passed back as a pointer to a C routine from some other C routine.

Lastly, we can adopt the semantic break model and declare any calls from Ada to an Ada routine with a foreign calling convention to have implementation dependent semantics (as regular pragma Interface calls do). This has none of the problems of the first two solutions - no restrictions in implementation freedom, no restrictions in functionality.

One problem with this is that we can have a call that looks like Ada, to a routine that looks like Ada, yet subtle things are different than if the routine is called with an Ada calling convention. This looks to me like a possible maintenance headache. Why is this? Consider the following maintenance scenario:

The maintenance programmer is maintaining procedure Blech. It contains a call to procedure A_Package.FooBar. He decides that he needs to look at procedure A_Package.FooBar to do the maintenance. He therefore opens an editor window on the body of A_Package, and searches for FooBar. After studying the code a while, he decides to raise an exception in FooBar. After recompiling, he tests the package and determines that it still doesn't work. For several hours, he tears his hair out, until he finally realizes that the exception he just added isn't being propagated. He calls his compiler vendor to complain, but they are equally perplexed. Only after someone thinks to look at the specification, do they notice that the routine has

    pragma Calling_Convention(C, FooBar);

which means that the Mugwomp 3000 compiler can't propagate exceptions.

I know at least one maintenance programmer (me) that looks at specifications only to fix type mismatches. Most of the time, I trace thru the bodies looking for the bug.
If the body of a call is written in Ada, I doubt I would ever think to look to see if it has a special calling convention.

Another problem is portability (a certain result of implementation dependence). Some compilers would certainly support propagating exceptions from foreign language interface code. Our compiler certainly would; it would take a lot of extra work to prevent an exception from being propagated. (We have to do that extra work for tasks, for instance). Now if code using pragma Calling_Convention were ported from our compiler to the Mugwomp 3000, exceptions would stop propagating, possibly breaking the code.

A possible way to help users with the semantic break model is to require warnings for calls when the semantics are not preserved. These warnings would be broader than strictly necessary, however, so it is not clear how helpful they would be. Requiring documentation of the vendor's implementation (do exceptions propagate? etc.) would certainly be helpful, but it is not clear that all of the relevant properties can be easily explained on an aggressively optimizing compiler.

The MRT has indicated that they will adopt the semantic break model, and expect implementation dependence in calls from Ada to an Ada routine with a foreign calling convention. This is probably the best solution of a bad lot. At least, Ada calls to non-Ada calling conventions should be rare.

It is important that the RM for Ada 9x makes it clear that the semantics of a call thru a foreign language interface, even to Ada code, are implementation-defined. This MUST be stated in the RM; otherwise this whole issue would eventually be hashed over again by the ARG, and possibly a different resolution would occur.

Most of the work to date on Interface pragmas has focused on variables and subprograms. However, there are circumstances where declaring objects as constants would make sense. Consider an embedded system with a field modifiable configuration ROM.
Values in this ROM are constants to the embedded program (they can only be changed when the program is not running). Therefore, it would be nice if the values of constants could be imported from foreign code (including hardware, as discussed earlier).

One way to allow that would be to allow deferred constants to have the appropriate interface pragma specified for them. In that case, a full declaration would be illegal. This would be very consistent with pragma Interface (for which bodies are not allowed). For MS-4.0, this would allow declarations like the following:

    A : Constant INTEGER;          -- deferred constant, no full declaration
    Pragma Interface_Object(A);    -- value is supplied from outside Ada
    For A Use At 16#EA0#;          -- located in the configuration ROM

Of course, this can be done with variables (at some loss of expressivity), so this is not critical to the success or failure of a proposal.

Another Ada-specific problem with INTERFACE pragmas is their non-one pass nature. Because the pragmas are separated from the declaration they control, and they potentially affect the way that the declaration is compiled, they are very non-one pass in nature. This problem is similar to one suffered for address clauses; however the address clause problem has an ugly solution. It appears none is available for the pragma which declares imported objects (and therefore suppresses default initialization). Consider the following example written with the MS-4.0 rules:

    With Random;
    Package Test_Int Is
        Type Rec Is Record
            Ran_Val : Natural := Random.Int_Random (1, 100);
        End Record;
        Object : Rec;
        -- Hundreds of lines of other stuff here....
    Private
        Pragma Interface_Object(Object);
    End Test_Int;

(This does correspond to some people's preferred programming style, even if this example is a bit contrived). Think about what a one-pass compiler would have to do to compile this example. If the side-effects implicit in Int_Random are detectable, any implementation where the function was evaluated would be wrong.

Pragma Interface_Object is particularly bad, because it requires the compiler NOT to do something.
Most representation clauses (including address clauses) can be couched in terms of doing something additional.

This problem appears insoluble, short of radical redesign of the Ada declaration syntax. Unless the import syntax or pragma is required to occur with the declaration or immediately after it, this problem will occur. And, given the howl of protest against making a similar change to pragma Interface, this appears to be not in the cards.

The next Ada-specific problem is what to do about nesting. Objects and subprograms which are nested may not be able to be reasonably exported or imported. Exactly what can and cannot be done is target and foreign-language specific. However, there are some common cases to consider.

First, many systems require that the addresses of exported link names be statically determinable. Since nested (that is, not library-level) Ada objects probably have addresses only determined at run-time, they cannot be exported by link name in such a system.

Second, nested Ada subprograms can access objects declared in scopes which surround them. (This is sometimes called "up-level addressing"). If such a subprogram is called from a foreign language when those objects in the outer scopes do not exist, then the subprogram certainly will not work, and odd things will happen.

It should be noted that importing foreign entities into nested scopes generally does not present an implementation problem. Assuming the foreign entity is usable at all in Ada, treating it as library-level with restricted naming will usually work to allow it to be used in a nested scope. (Notice that foreign objects imported into a nested scope do not cease to exist when the scope is exited and are not replicated in case of recursion; however, it is unlikely that foreign objects would be used in a context where this matters).

There are several possible solutions to these problems. The first solution is simply to leave any needed restrictions up to the implementor.
This solution leaves great freedom to the implementor, but is rather expensive to implement. This is because the bodies of exported subprograms must be examined in order to determine the presence or absence of up-level addressing. There is also the danger that standards maintenance groups may later remove the freedom to add restrictions, much as has occurred in representation clauses for Ada 83. (Uncertainty is a major enemy of compiler implementors).

The second possible solution is the 'user-beware' model. A compiler will accept any appropriate pragmas, but in nested cases, the resulting program may not work. That allows users to use nesting when they are certain that it should work, but they are essentially on their own to determine that. Combinations of this and the previous solution are possible.

Both of these solutions suffer from significant portability problems: a program that worked on one Ada compiler, may not work on another, even if the second compiler is for the same target and foreign language. User-beware is also rather un-Ada-like: Ada philosophy suggests that errors should be detected, rather than simply generating code which does not work.

A third solution is for the language to provide restrictions. Language restrictions make programs more portable, of course. Such restrictions are obviously impossible to do in a way which will embody all needed restrictions for all targets - that would likely leave us with no support at all. However, the most common problems are common enough to suggest that language restrictions are not uncalled for. One possible restriction is to prohibit export of nested objects and subprograms. Restrictions on entity import seem unnecessary.

Finally, portability guidelines could be specified in an annex. I'll come back to this subject later in this note.

Yet another Ada-specific problem has to do with object types. Some Ada object types do not correspond well to types in other languages (e.g. tasks, unconstrained arrays).
Should some language rule exist which prevents objects of these types from being imported or exported? Or should a user-beware model be used? Any of the solutions mentioned as solutions for the last problem would work here, as well.

Ada tasking is another area of concern. When a foreign language routine calls an Ada routine, exactly which task is running? Is synchronization necessary for synchronous calls (calls which are Ada to foreign back to Ada)? Asynchronous calls must be synchronized somehow; how should that be done? What happens when two Ada tasks call the same foreign language routine?

Clearly, the best answers to these questions are system and foreign-language dependent. It is not clear that any target and foreign language independent answers can be given to these questions. For example, if the foreign language supports some sort of tasking itself, the answers might be quite different than they would be for a foreign language which does not. Thus, it seems best to try to answer these questions only on a language-by-language basis (and therefore, not in the core).

One final Ada-specific problem has to do with programs whose main programs are written in other languages. Ada 83 is very specific about when Ada entities are elaborated, and those rules are all written in terms of the main program (10.5-1). But what happens when there is no Ada main program? Obviously, Ada 9x is going to have to address the elaboration rules if it is to improve support for programs where the main program is written in other languages.

Another, related problem is the 'environment task' that executes the main program. When there is no Ada main program, is there an environment task? If not, how do tasks declared in Ada code work? If there is, when does it get started?

Solving these problems is not easy. There are a couple of solutions that work reasonably well, but both have flaws.

One of the solutions is to have a routine which can be called in order to initialize the run-time system.
This routine would create an environment task, execute any runtime initialization necessary, and elaborate all of the Ada routines which are referenced in the program. The foreign main program would have to call this routine before any Ada code could be executed. This solution has a couple of problems. First, it cannot be used in canned foreign language code. Some subsystems have a few hooks where user-written routines can be added. They have no mechanism for changing the main program, so no initialization call can be inserted. These subsystems seem to be the most common case where the ability to have a non-Ada main program is needed. (If the main program can be recompiled, it is not that hard to rewrite it [just the main program] in Ada - that is especially easy if Ada 9x has adopted standardized interfacing facilities.) A lesser problem is that this solution is not automatic -- the failure to call the initialization routine could have devastating effects when the Ada code is called. Finally, there is the problem of defining a reasonable set of parameters to this routine. There are many parameters which could usefully be allowed on this routine (the size of the environment task's stack, whether or not trap handlers are installed, etc.), but it is not clear that any implementation-independent definition of them could be made. The other solution is to have a pragma that could be added to a compilation unit to signify that it could be used from a program that does not have an Ada main program. The Ada compiler would then generate code to do the appropriate initialization and elaboration on the first call to the unit. This solution also has a couple of problems. This solution doesn't work for exported objects, as there is no flow of control which can execute the initialization. A second problem can be that elaboration may occur too late for appropriate resources to be grabbed, particularly if the Ada routines are not called frequently, as in error handlers.
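The 'elaborate on first call' idea can be modelled with a small sketch. Python is used here only as a convenient notation; the class AdaUnit, the export wrapper, and the value 42 are hypothetical stand-ins, not part of any proposal, and the guard shown is not safe against concurrent first calls.

```python
class AdaUnit:
    """Toy stand-in for a compilation unit marked with the hypothetical pragma."""

    def __init__(self, name):
        self.name = name
        self.elaborated = False
        # An "exported object": foreign code can read it without calling anything.
        self.exported_object = None

    def elaborate(self):
        # Stand-in for run-time initialization plus elaboration of the unit.
        self.exported_object = 42
        self.elaborated = True

    def export(self, subprogram):
        """Wrap an exported subprogram with the compiler-generated guard."""
        def guarded(*args, **kwargs):
            if not self.elaborated:  # only the first call pays for elaboration
                self.elaborate()
            return subprogram(*args, **kwargs)
        return guarded


unit = AdaUnit("Error_Handlers")


@unit.export
def handle_error(code):
    return f"handled {code} with {unit.exported_object}"


# Foreign code reading the exported object before any call sees no elaboration.
value_before_any_call = unit.exported_object
first_result = handle_error(7)          # the first call triggers elaboration
value_after_first_call = unit.exported_object
```

In this model the exported object is still unelaborated until some guarded subprogram happens to be called, which is the weakness noted above for exported objects.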
Both solutions potentially would be expensive for existing Ada compilers. To see why, we have to look at how Ada 83 compilers generally handle elaboration of units. Most Ada compilers contain some sort of program which calculates the elaboration order for a program, and generates the code necessary to implement it. Sometimes this is a separate tool, and sometimes it is part of the linker, but it always has to exist. Such tools can be very user-friendly, since the only thing the tool needs to know is the name of the main program -- everything else can be figured out from the program library. In the case of systems without Ada main programs, the role and implementation of this tool has to be significantly changed. It should be clear such a tool is still necessary, since the partial order of elaboration still needs to be calculated, and all of the withs have to be elaborated, including the transitive ones. Certainly, we would not want users to have to calculate elaboration orders by hand. However, we probably would have to require them to list all units which are to be called by the foreign language system. That is an error-prone requirement. Nor is it clear what a good user interface for such a tool would be. It is also expensive to support in existing tools, which mostly were designed to support a single main program. Finally, in compilers where the elaboration tool is part of the linker, a major redesign would be necessary to separate the functions of the linker and the elaboration tool (or to add support for linking foreign language modules). In either case, a substantial usability loss would occur. A final solution would be to limit Ada entities in systems without Ada main programs to only those that do not require elaboration. While that eliminates all of the implementation problems, it is likely to be far too strict a limitation. (Nor does it add anything beyond that which users could do with most compilers now).
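To make the job of such an elaboration tool concrete, here is a minimal sketch of computing an elaboration order from a user-supplied list of root units instead of a single main program. It is a simplification under stated assumptions: each unit is treated as one node (real Ada distinguishes specifications from bodies and honors pragma ELABORATE), and the unit names and the function elaboration_order are invented for illustration.

```python
def elaboration_order(withs, roots):
    """Return the units reachable from roots, each listed after the units it withs.

    withs maps a unit name to the units named in its with clauses.
    roots lists the units the foreign language system may call directly.
    """
    order = []
    state = {}

    def visit(unit):
        if state.get(unit) == "done":
            return
        if state.get(unit) == "visiting":
            raise ValueError(f"circular dependency involving {unit}")
        state[unit] = "visiting"
        for dependency in withs.get(unit, ()):
            visit(dependency)  # transitive withs are picked up here
        state[unit] = "done"
        order.append(unit)

    for root in roots:
        visit(root)
    return order


# Hypothetical program library: only Callbacks is called from foreign code.
withs = {
    "Callbacks": ["Logging", "Text_Utils"],
    "Logging": ["Text_Utils"],
    "Text_Utils": [],
    "Unused": ["Logging"],
}
order = elaboration_order(withs, roots=["Callbacks"])
```

If the user forgets to list a root, its units are silently left out of the order (as Unused is here), which illustrates why requiring such a list is error-prone.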
Given the low quality of these solutions, perhaps it is best that none of them were adopted in MS 4.0. I think that, barring a better solution which works for canned main programs and which works for all exportable Ada entities, we cannot require a solution to this problem in Ada 9x. The MS 4.0 solution I won't repeat the details of the Mapping Specification 4.0 solution to the Interface pragma problems. See section 13.9 for details. Rather, I will provide a critique of the solution based on the issues identified in the earlier sections. The MRT has indicated that limits of one calling convention per subprogram and one link name per Ada entity were omitted from section 13.9. The critique assumes that these limits are included in the MS 4.0 solution. Positives First, the MS 4.0 solution covers most of the needs which we've identified. There are no major gaps. Second, implementors are allowed (possibly even encouraged) to add restrictions as necessary. This means that implementations can be efficient and tailored to the needs of their particular users. Third, the solution is reasonably easy to use; most of the time only one pragma is needed for each declaration. Negatives First, there are many possible combinations of the pragmas. Some are legal, but many are not. For example, INTERFACE_NAME can be used with CALLING_CONVENTION, but only if the optional third parameter to CALLING_CONVENTION is not given. This is likely to lead to user confusion. Second, some pragmas are more general than is reasonable, which allows doubtful capabilities. Particularly, nesting limitations are inconsistent. Import and export of nested objects is prohibited, while no limitations are placed on subprograms. This prohibits a possibly useful capability (importing objects into nested scopes), and allows a dangerous situation (exporting nested subprograms). Third, no solution is given to the problem of specifying import or export for a single subprogram in a group of overloaded subprograms.
(To be fair to the MRT, a solution to this problem was proposed and shot down by the reviewers). Fourth, no solution to the data layout problem is proposed. This means that compiler-specific solutions will need to be used, leading to more non-portability. Fifth, automatic name-mangling of link-time names cannot be done in all cases. (Name-mangling is automatic adjustment of the names for the target language and compiler). This occurs because the INTERFACE_NAME pragma (used to export Ada entities to foreign language code) does not specify a language name. Adding the needed foreign language name would lead to an even more confusing solution, since the language name would have to be redundantly specified when INTERFACE_NAME is used with INTERFACE, for example. Sixth, the INTERFACE_NAME pragma always requires independent relocation. While that is not unreasonable for import (indeed, it is so obvious a requirement that it need not be stated), for export, it is a very specialized need. I do not believe that this requirement is appropriate in the core. Our Implementation We've succeeded in implementing most of the MS-4.0 proposal. This is, in fact, why we've uncovered many of the issues discussed in this note. For the most part, the implementation is straightforward. However, there are a few important notes: Much of the code implementing the pragmas is spent on getting the combinations of pragmas recognized, and the illegal combinations rejected. There is much more code like that than there is code which actually implements the pragmas. Pragma Interface_Object has a decidedly anti-one-pass compiler design. We had to use an old syntactic trick in order to implement it appropriately. The trick is the same one used to get labels, block names, and generic names handled correctly in the compiler. We are using a dual-ported approach to pragma Calling_Convention, more because of historical reasons than any implementation need.
This implementation does avoid the semantic break for direct Ada calls to Ada subprograms with foreign language calling conventions. Comments This section spent a lot more time on the negatives than the positives for this solution. This is not to imply that this solution is bad; on the contrary, this is a well thought out solution. Problems were covered in detail primarily because at this stage identifying what needs improvement is much more important than identifying what is done well. Alternative solutions Pragma Layout A solution that has been suggested by others to the data layout problem is pragma LAYOUT. pragma LAYOUT (language_name, type_name); Pragma LAYOUT directs the compiler to declare type_name in a compatible format for language_name. If that cannot be done, the pragma is rejected (ignored?). The level of support for the pragma is implementation-defined. (In the core; annexes may require some support). This pragma allows Ada compilers to make data types compatible with a foreign language, without the users of the compiler needing to know the details of the foreign language implementation. Compiler implementors are likely to be in a better position to ferret out this information, and in addition, that means it only has to be done once. Such a pragma would be very useful for matching C structs to Ada records (the user would only need to get the order of component declarations the same). In the same way, it would be useful for COBOL records and FORTRAN arrays. The last example might take a bit of explanation: FORTRAN always requires arrays to be stored in column-major order. Ada has no such requirement, so pragma LAYOUT could be used to get the proper representation. A suggestion has been made that pragma LAYOUT also be applicable to objects. It is not clear that this is necessary, and it would break the model for composite types that there is one representation for each type (and objects of that type). 
If a change of representation is truly necessary, a derived type and type conversion can be used. Therefore, this proposal does not include a pragma LAYOUT for objects. It could be easily added if necessary. For pragma LAYOUT, the need to read 'language_name' as 'language_implementation_name' is even more critical than with other interface pragmas. Different compilers for the same language are likely to have subtle differences in algorithms for record and array layouts. The primary criticism of this feature is that appropriate support may be too hard to implement. This difficulty is not in the difficulty of writing actual code, but rather in gathering the information necessary to implement the pragma correctly. Other compiler vendors may not give out the information necessary to support pragma LAYOUT, particularly to companies that they view as competitors. This means that reverse engineering may be the only way to gather the appropriate information; and that is possibly error-prone (and is prohibited by many license agreements). In addition, each new version of each supported compiler may have changes in the layout algorithms, rendering the old support (and reverse engineering) obsolete. All told, however, this area is important enough to interfacing Ada programs with existing code (and existing standards) in other languages, that some sort of standardized support needs to be attempted. Simply proposing a feature in this area is likely to stimulate Ada vendors to ever-better support for data layouts. This will make interfacing to other languages easier, reducing an important barrier to the adoption of Ada. One alternative solution. One alternative solution is to follow a similar scheme to the MS 4.0 solution, but using one pragma to control one major issue. The major change is to drop all of the three parameter forms of the pragmas. By separating all issues, we have better control over the combinations. 
We also need to add some pragmas for issues which are not controlled by the existing pragmas. Here is one solution following this direction. Pragma INTERFACE (language_name, subprogram_name); Imports a subprogram from a foreign language. Does NOT imply a calling convention. Pragma CALLING_CONVENTION (language_name, subprogram_or_type_name); Same rules as in MS 4.0, except for the lack of a third parameter. Does NOT imply import or export. Pragma INTERFACE_OBJECT (object_name); Essentially the same rules as MS 4.0. It implies import, either by name or by address clause. Some type-specific restrictions could be considered here. Address clauses Same definition as Ada 83. This is one of the two ways of specifying a 'name' for an entity. Pragma INTERFACE_NAME (object_or_subprogram_name, string); Specifies the link-time name for the specified entity. This is the other way of specifying a 'name' for an entity. Link-time names cannot be specified for entities which are neither imported nor library-level. The maximum length of an INTERFACE_NAME is implementation-defined. Specifying an INTERFACE_NAME for an entity which is not imported implies export of that entity. Pragma LAYOUT (language_name, type_name); As described previously. Address clauses Same definition as Ada 83. An address clause cannot be given for an entity for which a link-name is specified. Pragmas can be combined when that makes sense. Pragmas which affect the same property (import vs. export, entity identification, calling convention) cannot be combined on a single entity. (This rule prevents importing and exporting the same subprogram, multiple language calling conventions on one subprogram, and giving multiple names to a single object, among others). One change to this that could be considered is to leave pragma INTERFACE as in Ada 83. This would cause it to continue to imply a calling convention.
This would mess up the model somewhat (not all calling conventions would be explicitly specified by a single pragma), but would provide upward compatibility. I considered separating naming and export with separate pragmas (which would take the model to its logical conclusion), but naming without export or import does not seem useful. A separate pragma EXPORT could be considered, which would eliminate the remaining multiple effect pragma, making the model even more consistent. Let's consider this model as we did the first. Positives First, the user needs are covered very well with this solution. Second, this solution does not require any more new pragmas than the MS 4.0 solution. Third, this solution is very flexible. All meaningful combinations of needs are well supported. Fourth, this solution is a bit simpler than the MS 4.0 solution. Lastly, this solution is probably closer to the existing state of Ada compilers than either of the others we will consider. (I will have more to say on this). Negatives First, this solution requires the use of more than one pragma in many common situations. For example, creating a MS-WINDOWS callback routine would take the use of both pragma CALLING_CONVENTION and pragma INTERFACE_NAME. The effect is that this solution is harder to use than the MS-4.0 solution. Second, this solution is not upward compatible unless the alternative is used, and the alternative muddies the model a bit. Third, this solution makes automatic name mangling difficult. That is because the link names are separated from the associated language names. In addition, items can be exported without giving a language name at all; what mangling to perform cannot be determined. Fourth, there are many illegal combinations of pragmas. This may cause confusion. This is not as bad as the MS 4.0 solution, however, since the pragmas are more specific, making it a bit less likely that confusion will occur.
Fifth, usage restrictions which are tied to more than one interfacing issue are hard to express. This is not a major problem at the language level (since such restrictions are unlikely to occur in a vendor and target independent way). But this could cause problems for Ada implementors documenting the restrictions on the usage of these pragmas. Sixth, no solution is given to the problem of specifying import or export for a single subprogram in a group of overloaded subprograms. This is also true of the other solution in this note and the MS 4.0 solution. Seventh, this solution may be harder to implement than the others. Many systems require that naming and import/export be determined by a single command/object record/etc. This solution would force an Ada compiler to gather all of the pragmas that apply to an entity before it can determine what to do - what code to generate, what object records to output, etc. Comments I am unconvinced that this solution is an improvement over the MS 4.0 solution. It offers some advantages, but they do not seem to outweigh the deficits. Another alternative solution. Another, different alternative solution is to only support the three parameter forms (that is, have no separate pragma INTERFACE_NAME). Some renaming of pragmas makes them a bit more intuitive. Here is a solution based on that idea: Pragma INTERFACE (language_name, subprogram_name [,string]); Imports a subprogram from the foreign language 'language_name', with the calling convention specified by 'language_name'. If 'string' is present, this gives a link-time name for the subprogram. The maximum length of a link-time name is implementation-defined. If 'string' is not present, an address clause may give the external name for the entity, or if neither an address clause nor 'string' is present, an implementation-defined link-name will be used. Pragma INTERFACE_OBJECT (language_name, object_name [,string]); Imports an object from the foreign language 'language_name'.
'String' is handled by the same rules as for INTERFACE. If foreign language object layouts are deemed necessary, this pragma would specify that such a layout is used. Pragma EXPORT (language_name, subprogram_name [,string]); Exports a subprogram to the foreign language 'language_name', with the calling convention specified by 'language_name'. 'String' is handled by the same rules as for INTERFACE. The language_name 'Ada' should be supported, in order to allow export of Ada subprograms. This pragma can be used only on library-level subprograms. (This roughly corresponds to pragma Calling_Convention in the MS 4.0 solution; the other use of pragma Calling_Convention (for access to subprogram types) is handled by pragma Layout (see below)). Pragma EXPORT_OBJECT (language_name, object_name [,string]); Exports an object to the foreign language 'language_name'. 'String' is handled by the same rules as for INTERFACE. The language_name 'Ada' should be supported, in order to allow export of Ada objects. If foreign language object layouts are deemed necessary, this pragma would specify that such a layout is used. If 'string' is given, this pragma can be used only on library-level objects. Pragma LAYOUT (language_name, type_name); As described previously. In addition, pragma LAYOUT on an access to subprogram type implies that the access to subprogram type has the calling convention of the language specified. Pragmas cannot be combined. Link-time names and address clauses cannot both be specified for a single entity. Let's consider this model as we have the others. Positives First, this covers all of the important user needs. Second, combinations of pragmas are always illegal. This means that there should be much less confusion using the pragmas. There is no need for a user to understand the model and interface issues in detail, as there is for the other two solutions.
Third, the restrictions (particularly library-level requirements) can be applied to a single pragma, without any complex 'if this and that, then illegal' rules. That makes the restrictions simpler to understand, and simpler to document. Fourth, automatic name mangling is easy, because the language being interfaced with is always known. However, it would be difficult to import or export entities already in the Ada format to/from a foreign language with automatic name mangling, since the language name in that case is 'Ada'. That gives Ada entities a second-class status when interfacing (you really have to use entities formatted in the foreign language, even if it is not necessary), but this is usually the case anyway (since the foreign code is often canned and therefore not modifiable). Fifth, this proposal is easy to use. Only one pragma need be used for any entity which is to be exported or imported. Sixth, this solution seems more natural. As evidence of this, several compiler vendors have chosen this for pragma INTERFACE, even though it is illegal in Ada 83 (via ARG rulings). Finally, this solution is somewhat easier to implement than the others. That happens because many systems require that naming and import/export be determined by a single command/object record/etc. While the other choices require gathering all of the pragmas that apply to an entity before it can determine what to do, this solution has all of that information in a single pragma. Therefore, the compiler can do the implementation of the pragma immediately. Negatives First, some obscure combinations of features cannot be accessed. For example, it is not possible to import a foreign language subprogram with an Ada calling convention. These needs appear to be very rare, and can partially be addressed by using a language name 'Ada'. (However, that won't work if automatic name mangling is in use).
More importantly, it is not possible to create a local routine with a foreign calling convention (to be used to create an access to subprogram value, for example). This deficiency is shared with the MS 4.0 proposal, and is not serious. At worst, it causes name space pollution. Second, these pragmas are larger and more complex than those of the other solutions. This is to be expected - while the other solutions have their complexity in the combinations of pragmas allowed, here the complexity has moved to within the pragmas. It is probably a matter of taste whether you prefer smaller building blocks or bigger ones. Third, this solution cannot be used in Ada 83 without a change to the ARG rulings. In particular, the third parameter to pragma INTERFACE cannot be added as an implementation-defined parameter. This could be handled by having the ARG explicitly allow an exception in this case, or by continuing to use whatever Ada 83 solution the compiler vendor supports. Finally, no solution is given to the problem of specifying import or export for a single subprogram in a group of overloaded subprograms. This is also true of the other solutions we've considered so far. Comments This solution is pretty good, and the main reason we won't adopt it is that we will use it in the next section to develop an even better one. Another solution in the same vein. Several of the minor problems with the previous solution can be solved by making simple changes to it. (My thanks to Tucker Taft for these ideas and some of the text). First, if we are going to use the term "EXPORT" here then we should abandon the term "INTERFACE" and use "IMPORT[_OBJECT]." Since we are changing pragma INTERFACE in an incompatible way, we might as well just drop the name.
It will still have to be supported in Ada 9x compilers for upward compatibility (and implementors will probably have to continue to support whatever implementation-defined pragmas that they had to handle link-names and the like), but any newcomer to Ada would be much happier to see IMPORT/EXPORT than INTERFACE/EXPORT. INTERFACE is a confusing term in any case, and it doesn't suggest "import" to a naive user - indeed, it probably suggests export to many users. Now that we are no longer tied to the Ada 83 definition of INTERFACE, we might as well reverse the language_name and entity_name, putting the item operated on first, where it belongs. It is the most important, so it naturally comes first. We also can change the rules on overloading resolution of the pragma, leading to possibly solving the problem of specifying import or export for a single subprogram in a group of overloaded subprograms. It is also the case that we really don't need separate pragmas for EXPORT and EXPORT_OBJECT. There already exist pragmas which take all kinds of different entities. Also, in this case, there cannot be any conflict, since the pragmas must be specified in the same declarative part as the entity. Finally, it has been suggested that LAYOUT is too loaded of a name for the specification of the language implementation for a type. The suggestion of pragma LANGUAGE() is definitely more neutral. An advantage of pragma LANGUAGE() is that it could be a way to specify the calling convention of a subprogram without exporting its name. Furthermore, pragma LANGUAGE makes it clear we are only specifying exactly one thing, namely the language. With a name like "LAYOUT" (or CALLING_CONVENTION for that matter), people will be tempted to add all kinds of additional parameters to provide finer control. This should be handled with separate pragmas for portability's sake, so we don't want to use pragma names that imply they give more control than they really do. 
Using these ideas, we get: Pragma IMPORT(entity_name, language[_implementation]_name[, string]); Imports an entity from the foreign language 'language_name'. The entity must be a subprogram or object. If 'string' is present, this gives a link-time name for the entity. The maximum length of a link-time name is implementation-defined. If 'string' is not present, an address clause may give the external name for the entity, or if neither an address clause nor 'string' is present, an implementation-defined link-name will be used. Pragma EXPORT(entity_name, language[_implementation]_name[, string]); Exports an entity to the foreign language 'language_name'. The entity must be a subprogram or object. 'String' is handled by the same rules as for IMPORT. The language_name 'Ada' should be supported, in order to allow export of Ada subprograms and objects. This pragma can be used only on library-level entities. Pragma LANGUAGE(entity_name, language[_implementation]_name); Specifies that entity_name has the properties required for the foreign language 'language_name'. Entity_name must name an object, subprogram, or type. The language_name has the same effect for all three of these pragmas. If entity_name denotes a subprogram, the language_name specifies the calling convention. If foreign language object layouts are deemed necessary, this pragma would specify that such a layout is used when entity_name is an object. Finally, (only for pragma LANGUAGE), if entity_name denotes a type, the effects described for pragma LAYOUT in the previous section would apply. For entity_names which denote subprograms, the pragma only applies to the subprogram which most nearly precedes the pragma. Therefore, in an overloaded family of subprograms, one pragma can be given for each subprogram, as long as the pragma is between that subprogram and the next subprogram which is declared with that name. Pragmas cannot be combined.
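The 'most nearly precedes' rule for overloaded subprograms can be illustrated with a small sketch of how a compiler might bind each pragma to a declaration. Python is used only as a notation; the tuple encoding of declarative items and the function bind_pragmas are assumptions made for this example, not part of the proposal.

```python
def bind_pragmas(items):
    """Bind each pragma to the most recent preceding declaration of its entity.

    items is a declarative part in source order. Each item is either
      ("subprogram", name, profile) or
      ("pragma", pragma_name, entity_name, language_name).
    Returns a map from declaration index to (pragma_name, language_name).
    """
    last_declaration = {}  # entity name -> index of its latest declaration
    bindings = {}
    for index, item in enumerate(items):
        if item[0] == "subprogram":
            last_declaration[item[1]] = index
            continue
        _, pragma_name, entity, language = item
        if entity not in last_declaration:
            raise ValueError(f"pragma {pragma_name}: {entity} is not yet declared")
        target = last_declaration[entity]
        if target in bindings:
            # "Pragmas cannot be combined": one interface pragma per entity.
            raise ValueError(f"second interface pragma for {entity}")
        bindings[target] = (pragma_name, language)
    return bindings


# Hypothetical overloaded family: only the first and third Put are imported.
items = [
    ("subprogram", "Put", "(Item : Integer)"),
    ("pragma", "IMPORT", "Put", "C"),
    ("subprogram", "Put", "(Item : Float)"),
    ("subprogram", "Put", "(Item : String)"),
    ("pragma", "IMPORT", "Put", "Fortran"),
]
bindings = bind_pragmas(items)
```

Here the Float version of Put remains an ordinary Ada subprogram, because a later declaration of Put intervenes before the next pragma.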
Link-time names and address clauses cannot both be specified for a single entity. Let's consider this model as we have the others. Positives The positives for the previous solution all apply here, with the possible exception of the implementation one. In addition, this proposal is more flexible, since the possibility of these pragmas supporting other kinds of entities exists. For example, on some systems it may make sense to support importing and exporting exceptions. (C++ 3.0 has exceptions, for one instance). Indeed, we could allow implementations to support additional kinds of entities, depending on the specified language. The regularity of the pragmas makes this possible; we do not have to add EXPORT_EXCEPTION to export exceptions, for example. Another advantage of this proposal is that it can eliminate the problem of specifying import or export for a single subprogram in a group of overloaded subprograms. This proposal also has a way to create a local routine with a foreign calling convention, which the previous one did not. Such routines are often necessary to create an access to subprogram value to be passed to a foreign language routine. This proposal eliminates the name-space pollution caused by having to export a routine in order to specify its calling convention. Negatives First, some obscure combinations of features cannot be accessed. For example, it is not possible to import a foreign language subprogram with an Ada calling convention. These needs appear to be very rare, and can partially be addressed by using a language name 'Ada'. (However, that won't work if automatic name mangling is in use). Second, these pragmas are larger and more complex than those of the other solutions. This is to be expected - while the other solutions have their complexity in the combinations of pragmas allowed, here the complexity has moved to within the pragmas. It is probably a matter of taste whether you prefer smaller building blocks or bigger ones.
Third, this solution leaves pragma INTERFACE as the 'tailbone' of these pragmas. It will have to continue to exist for Ada 83 compatibility, yet it will have no real use (being equivalent to a weak form of IMPORT). This will turn off some language purists. Fourth, this solution still suffers from the non-one-pass nature of these pragmas. That is true of all pragma based solutions. Fifth, this solution for overloaded subprogram recognition has already been rejected by the DRs once. The portability reason has been eliminated (as INTERFACE will continue to work as it always has), but the other reasons still exist. It remains to be seen if those are serious enough to kill that part of this solution. Finally, this solution may be a bit harder to implement, because of the 'anything goes' nature of the entity. This effect should not be serious. Comments This solution seems to have the fewest negatives of all of the solutions considered. It is likely that its major deficit will be that it does not use the Ada 83 solution to the problem at all. However, it does so to fix major and minor problems in that solution. Sometimes it is best to start over, and perhaps this is such a time. This solution is not very close to what Ada 83 compilers do, and this probably will be used as an argument against it. This does not seem to be a very compelling argument. First of all, any upward incompatibility is caused by implementation-defined features. It is unlikely that any language design could be invented which has no upward incompatibility caused by implementation-defined features, since there are so many. Moreover, it is hard to use implementation-defined features as a basis for good language design, as they are likely to have been designed as those features easiest to implement, rather than easiest to use or understand. In any event, Ada compiler vendors will probably have to keep supporting their Ada 83 pragmas in order to avoid breaking many of their customers' programs.
If the Ada 9x solution is very similar to their existing solution, it could actually be harder to do that (as well as being harder to use). (This is the same effect that makes it harder to program at the same time in Ada and Modula-2, two similar languages, than say, Ada and LISP, two very different languages). As long as the spirit of the Ada 9x solution is similar enough to existing practice (which is true for all of the solutions we've considered), there should be no problem (other than name conflicts) continuing to support the old implementation-defined pragmas in an Ada 9x compiler. That means that there is no real upward-compatibility issue to be concerned with, and that ought to leave us free to pick the best technical solution. A radical solution It is notable that some of the problems exist in all of the pragma based solutions. This is because of a fundamental problem with pragmas: they have to be specified separately from the entity being modified. Separate specification of properties is required for types in Ada 83 (and therefore Ada 9x). Therefore, the designers of Ada found it natural to extend that to objects and subprograms. However, while types are just compile-time entities, objects and subprograms have significant run-time implications. Separate specification of properties for these entities is unnatural, and leads to some of the major problems. Both the problems with overloaded subprogram identification and the problems with address clause initialization go away if those specifications are part of the declaration. Here is a solution based on these ideas. Since this solution is such a radical change from Ada 83 and from MS 4.0, it is more likely to contain minor glitches than the other solutions. In particular, I haven't checked the syntax for conflicts; obviously the exact syntax is not that important.
    subprogram_specification ::= subprogram_declaration [interface_part];

    object_declaration ::= identifier_list : [CONSTANT] object_subtype_definition [:= expression] [interface_part];

    interface_part ::= USE identifier [IMPORT|EXPORT] [interface_address]

    interface_address ::= AT expression

The identifier in 'interface_part' is the language name. As in the previous solution, the language name 'Ada' should be supported, to allow export of entities with Ada calling conventions or formats. The language name specified is used to set the calling convention for subprograms and

The interface_address part specifies the external name for the entity. If the expression is of type System.Address, this has the same meaning as an address clause in Ada 83. If the expression is a constant of type String, this provides a link-time name for the entity.

The keyword IMPORT specifies that the entity is imported from the outside. The keyword EXPORT specifies that the entity is exported to the outside environment. If EXPORT is specified, an interface_address must be given, and the entity must be library-level. If neither IMPORT nor EXPORT is specified, the entity belongs only to the Ada program. (This is useful to give a subprogram a C calling convention so it can be passed as a pointer to a C routine, for example).

Object declarations for which the keyword IMPORT is specified may not have an initialization expression. In addition, the elaboration of the Ada object declaration will have no effect other than the evaluation of the interface_address expression (if necessary). Subprogram declarations for which the keyword IMPORT is specified may not have a body.

For subprogram bodies, the interface_part must conform with the subprogram specification. This provides additional documentation about the calling convention at the site of the body; it should help minimize the effects of the semantic break model.

    Pragma LANGUAGE (type_name, language_name);

As described previously (as pragma LAYOUT).
In addition, pragma LANGUAGE on an access to subprogram type implies that the access to subprogram type has the calling convention of the language specified. A pragma is retained for types, as all representation clauses are already separated from their declaration in Ada 83.

There probably is a solution that does not require additional keywords; I did not look for one. The IMPORT keyword is clearly not necessary, but it makes the description and use of the constructs somewhat easier.

For upward compatibility, we would probably continue to support address clauses and pragma INTERFACE. These would probably be marked as obsolete, and defined in terms of the appropriate interface_parts.

Let's consider this model as we have the others.

Positives

First, this covers all of the important user needs.

Second, there is no possibility of combinations of interface_parts. The syntax makes it crystal clear that only one external name is allowed per entity and only one calling convention is allowed per subprogram. No pragma-based solution can do that.

Third, there is no problem handling overloaded subprograms in this case. Each such overloaded subprogram would contain its own interface_part, and that interface_part would apply only to the subprogram that it is part of. There is no possible confusion.

Fourth, this proposal is easy to use. An interface_part is added to only those entities which are to be exported or imported. No additional pragmas are needed. Understanding of programs with interface_parts is also easy (particularly if new keywords are used).

Fifth, automatic name mangling is easy, because the language being interfaced with is always known. This does suffer from the same 'second-class status' problem that the previous two solutions have.

Finally, this solution is easier to implement than the others. It eliminates all of the implementation problems with the current address clause. In addition, it also eliminates any need to gather pragmas and delay declarations.
It has the effect of improving the one-pass compilability of Ada. (The need to continue to support address clauses to keep compatibility with Ada 83 does hurt this somewhat. However, simply having a better solution available will help Ada compiler vendors, because they will no longer have to try to improve address clause code generation.)

Negatives

The main negative with this solution is that there is no possible way to use it in Ada 83. Since it is syntax-based, programs using it cannot be compiled by Ada 83 compilers no matter what loopholes are opened in the language. That is not a problem with any of the other solutions.

Second, this solution requires new syntax and possibly new keywords. This should not be a major problem, given that Ada 9x is already introducing new keywords and syntax.

Finally, the Ada 83 solutions to the problems will have to be retained for upward compatibility. That is unfortunate, since they will prove confusing (they do not fit into this model at all), and indeed, they are part of the problem in this case. That means additional implementation work for obsolete features.

Comments

This is the solution that we would use if Ada were a totally new language. It is much superior in use and implementation to any of the others. However, this change appears to be just too much of a change from Ada 83. It requires admitting that the Ada 83 solution was a mistake, which it is not clear that everyone is willing to do. The inability to use this solution in Ada 83 is also a downside, and the one which pushes it over the edge in my opinion.

Portable Interfaces

The remaining question is whether any of these solutions goes far enough. Portable interfaces to other languages are very important, as there has been nowhere near enough work on Ada tools and libraries. In our end of the market, most of the interesting products have C interfaces. Worse, the current state of the art (for Ada 83 compilers) is poor.
Even for the same machine, Ada compilers vary widely in their interface support. AETECH has told us that to maintain XLIB interfaces for three different 80386 UNIX Ada compilers, they essentially have to use different code for all three. Differences in the way records, addresses, parameter passing, and call backs are handled make the code for each of the compilers very different. It is important to note that these are not bugs that need to be worked around, but rather legitimate differences in compiler implementation. This sort of problem even occurs between compilers from the same vendor for different targets. If bindings needed to be written only once for all Ada compilers, it would be much more likely that more bindings would be written. For example, in the MS-DOS PC market, supporting Ada means supporting three very different compilers. As a result, few library companies support Ada at all, and the few that do only support one compiler. Another problem is that there is very little information on how to build bindings. Ada 83 books, if they mention pragma INTERFACE at all, usually refer the reader to their compiler's documentation. Worse, INTERFACE building is an arcane art, with lots of pitfalls. (Anyone who thinks this is a simple subject should look at the size of this overview LSN). If bindings were more portable, it is likely that new Ada books would cover this area better. That, of course, would lead to more bindings, because Ada users would not be as afraid of the subject. Ada 9x will help this, particularly by offering better call-back support, and by making it easier to get pointers to objects and subprograms, but the features described by the core language will not be enough. Compilers will still have a lot of freedom to be different. Let's consider C as an example. One way compilers can legally differ is in the mapping of Ada parameters to C parameters. 
They can match in forward order (the first Ada parameter matches the first C parameter), reverse order (the last Ada parameter matches the first C parameter), or something else. C compilers usually pass their parameters in the reverse order from Ada compilers, so this is a very real choice. Another legal difference is the handling of Ada parameter modes. Particularly, the meaning of Ada In Out parameters can be different amongst implementations. A compiler could make In Out parameters illegal, it could treat them as pointers to the parameter type, or it could treat them the same as In parameters. (C only has In parameters for practical purposes). Another difference between Ada compilers is exactly which types match which C types. For example, in many Ada compilers, Integer is the same as the C int type. But that is not true for all compilers. It is also the case that an Ada access type does not have to be compatible with a C pointer, nor must System.Address be compatible with a C pointer. There are many other legal differences that rob interfaces of portability. We could try to solve this problem by adding additional requirements in the core of Ada. However, that would cause an explosion of requirements in the core. That's because the requirements necessary to make these pragmas truly portable are different on a language by language basis; cluttering up the core with definitions for C interfacing, Fortran interfacing, Cobol interfacing, and so on would be a terrible mess. A better solution is to have the core language provide a framework for portability on which other standards (either in annexes or in secondary standards) can be built. Secondary standards would work, if they were adopted by most Ada compiler vendors. The problem is that the track record of secondary standards for Ada 83 is none too good. Few, if any, secondary standards are commonly supported by Ada 83 compilers, and none are universally supported. 
Therefore, the best place to put additional portability requirements is in the annexes. Specific other foreign languages are most likely to be used in particular application areas anyway (FORTRAN for numerics, COBOL for IS applications, etc.), so that placement is not artificial. I would like to propose a possible solution for portable C interfaces, to give an idea of what such an annex requirement would look like.

C portable interface requirements

An implementation of the portable C interface will provide the following package.

    Package C_Types Is

        Type C_Int Is -- implementation-defined.
        Type C_Long Is -- implementation-defined.
        Type C_Short Is -- implementation-defined.
        Type C_Unsigned_Int Is -- implementation-defined.
        Type C_Unsigned_Long Is -- implementation-defined.
        Type C_Unsigned_Short Is -- implementation-defined.
        Type C_Unsigned_Char Is -- implementation-defined.
            -- These are all integer types, with the representation
            -- of the corresponding C type.

        Type C_Num_Char Is -- implementation-defined.
            -- This is an integer type corresponding to the C (signed) char
            -- type. It is used only when a character needs to be used as a
            -- number.

        Type C_Float Is -- implementation-defined.
        Type C_Double Is -- implementation-defined.
            -- These are floating point types, with the representation
            -- of the corresponding C type.

        Type C_Char Is -- implementation-defined.
            -- This is a character type corresponding to the C char type.
            -- If Ada 9x adopts a 256-element character, this can be
            -- a derivation of Standard.Character. Otherwise it will
            -- need to be a new character type.

        Type C_Char_Ptr Is Access All C_Char; -- C char *.

        Function To_C_String (Str : String) Return C_Char_Ptr;
            -- Converts the Ada string to a C string (null terminated).
            -- The string will be copied to the heap, and a pointer to it
            -- returned.

        Function To_Ada_String (Ptr : C_Char_Ptr) Return String;
            -- Converts the C string (null terminated) to an Ada string.

        Procedure Free_C_String (Ptr : In Out C_Char_Ptr);
            -- Frees a C string created by To_C_String. Ptr is set to
            -- Null.

    End C_Types;

An implementation of the portable C interface is required to implement pragma LAYOUT such that when used on:

    A general access type whose designated type is from package C_Types or is a type on which pragma LAYOUT(C) was used

    Statically constrained arrays for which the component type is from package C_Types or is a type on which pragma LAYOUT(C) was used

    Record types without discriminants for which all of the component types are from package C_Types or are types on which pragma LAYOUT(C) was used

the representation is compatible with the appropriate C type (a C pointer, a C array, and a C struct, respectively). The implementation should issue a warning if the pragma LAYOUT(C) cannot be accepted.

An access parameter type follows the same rule as a general access type. (Note: This will generally mean that Ada compilers will have to pass access parameter types differently to Ada and C routines, because a scope level must be passed with them for Ada routines.)

An implementation of the portable C interface has the following requirements for pragma INTERFACE(C) and pragma CALLING_CONVENTION(C) (or EXPORT(C)):

If all of the parameters (and function result, if any) for a subprogram have either pragma LAYOUT(C) given for them or are types from package C_Types, and all parameters have mode In, then:

    Parameters must be mapped such that the first Ada parameter corresponds to the first C parameter, the second Ada parameter corresponds to the second C parameter, and so on.

    Parameters must be usable as the corresponding C type within the C code. (This means that a parameter of type C_Int must be usable as an int in the C code). Similarly for the return value.

In other situations, the results are implementation-defined.

The foreign language name 'C' must be supported in an implementation of the portable C interface.
It should refer to the most commonly used C implementation on that target. What implementation(s) the language C refers to must be documented. Other C implementations can be supported, using other language names if necessary. (A language name really refers to a particular implementation of a language, rather than just a language. Implementations of a given language can vary widely, even on the same target).

One possible addition to this is to define the meaning of In Out parameters. This would look like:

    For each parameter of a subprogram whose type either has pragma LAYOUT(C) given for it or is a type from package C_Types, and which has mode In Out:

        Such parameters must be usable as a pointer to the corresponding C type within the C code. (This means that a parameter of type C_Int must be usable as a pointer to int in the C code).

Adding this rule would allow a more natural representation of some procedures. However, since most C routines are functions, and Ada functions cannot have In Out parameters, this change has limited utility. In addition, access parameter types provide a reasonably convenient work-around (especially because there are no scope checks for those parameters). It is not clear whether adding this complexity to the portable C interface proposal is worth the effort.

Another open issue is the handling of exceptions which propagate out of EXPORT(C) (or CALLING_CONVENTION(C)) subprograms. C does not have exceptions, so their mapping is not obvious. The question is primarily whether they should be mapped at all.

If they are to be mapped, one possibility is to have the subprogram return -1 and set ERRNO (in a POSIX environment) to some appropriate value. A C-callable query function could be provided to get the exception name (much like EXCEPTION_NAME). Such a solution would make the use of exceptions useful in the C code. However, such a solution would suffer from potentially adding overhead to every C routine, whether it was used or not.
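As a rough illustration of the first possibility (return -1 and set errno), the following C sketch shows how such a mapping might look from the C side. Everything in it is an assumption made for illustration: exported_lookup stands in for an EXPORT(C) Ada function, ada_last_exception_name is a hypothetical name for the query function, and the use of ERANGE is an arbitrary choice. None of these names come from MS 4.0 or from any existing compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record of the last Ada exception that was mapped to an
   error return; NULL until a mapping has happened. */
static const char *last_exception = NULL;

/* Hypothetical C-callable query function, playing the role that
   EXCEPTION_NAME plays in the text. */
const char *ada_last_exception_name(void)
{
    return last_exception;
}

/* Stand-in for an EXPORT(C) Ada function that indexes a 3-element table.
   Where the Ada body would raise CONSTRAINT_ERROR, the assumed mapping
   returns -1 and sets errno instead of propagating into C. */
int exported_lookup(int index)
{
    static const int table[3] = { 10, 20, 30 };

    if (index < 0 || index > 2) {
        last_exception = "CONSTRAINT_ERROR";
        errno = ERANGE;   /* arbitrary choice of errno value */
        return -1;
    }
    return table[index];
}

/* A C caller only notices the mapped exception if it checks the
   return code and errno after the call. */
int lookup_or_default(int index, int fallback)
{
    int value;

    errno = 0;
    value = exported_lookup(index);
    if (value == -1 && errno != 0) {
        assert(ada_last_exception_name() != NULL);
        return fallback;
    }
    return value;
}
```

A call such as lookup_or_default(7, 0) takes the error path and returns the fallback. Because -1 could also be a legitimate result for some routines, the caller in this sketch has to clear and re-test errno around every call.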
Such a mapping also would not work well for functions that returned pointers or float types (at least from an Ada perspective).

If exceptions are not mapped, what happens when an exception propagates from an EXPORT(C) (or CALLING_CONVENTION(C)) subprogram would be implementation-dependent. That is not as bad as it sounds, since a routine can easily be defensively programmed with appropriate exception handlers when necessary. Such a solution also would allow Ada compilers to keep unhandled exceptions from 'going away' just because C is involved. (A lot of C programs don't check return codes.)

I personally prefer the second solution, since it allows programmers more control, and has less overhead.

Tasking is another area where the mapping to C is less than clear. Even calling out to C can be a problem, since various C compilers are quite different when it comes to reentrancy. Probably the most that could be specified is the following:

    An Ada compiler must arrange that all Ada tasks may call C routines.

    A compiler may restrict the number of Ada tasks that can call C routines simultaneously.

The second rule may be necessary because some C compilers generate code that will not function unless it is executed on its own stack. For such a compiler, all calls to C would have to share a single stack, and therefore only one call at a time could be processed.

For call-ins to Ada programs with tasks, the situation is even less clear. Probably the best solution is to say that the task which called out is the one which is called back. (For programs without Ada main programs, the task which called out is the environment task.) If additional synchronization is needed, it can be provided by the Ada programmer. In the case of asynchronous call-backs, these have to be handled using the interrupt mechanism (a protected record).

Finally, what kind of support is needed for programs which have C main programs is still an open issue.
As discussed earlier in the Ada-specific issues section, all of the proposed solutions to this area are deficient. We could define a routine INITIALIZE_ADA_ENVIRONMENT to do the initialization, or define a pragma FREESTANDING_UNIT, but these suffer from the previously discussed problems. We could also just leave this area implementation-dependent. It is not clear just how important this area is, particularly if interfacing to C is easy and portable. That means that defining an Ada main program (which mainly would call the C main program) may be relatively easy, and eliminates the need to address this issue. Conclusions Five possible solutions to the interfacing problem were given earlier in this note. The last solution (new syntax) is definitely the best solution on technical grounds. It appears to be too much of a change from Ada 83 for it to be adopted, however, so I am not recommending it. Therefore, I recommend adoption of the last pragma solution given above. It has fewer negatives than the others; in particular, it does not try to include every reasonable solution to the problem. It also is the easiest to use; only one pragma is needed on each foreign language entity. Finally, it has an expandability that the others do not. In any event, all of the models have significant advantages over doing nothing. It is essential that one of them be adopted in Ada 9x. If the MS 4.0 model is retained, several changes should be made to it: A) Pragma LAYOUT (or LANGUAGE) should be added to the core. B) The independent relocation requirement should be dropped or moved to a special pragma in an annex. C) Wording must be added to define when pragma CALLING_CONVENTION causes export, and when it does not. Wording should also be added to indicate that a link-name cannot be given if an address clause is given for an entity in pragma CALLING_CONVENTION or INTERFACE_NAME. 
D) Wording must be added to define legal and illegal combinations of pragmas (given that the intent is one (external) name-one entity and one calling convention-one subprogram). The proposal to allow deferred constants to have Interface_Object applied to them is not as important; primarily it increases the consistency of the design. The semantic break model of calling using foreign language conventions must be codified in the Ada reference manual, lest users and ACVC testers think that the semantics should be preserved through such calls. A way of applying an interface pragma to a single routine of an overloaded family would be welcome. I have not proposed such a method, since any workable method would appear to be too upward incompatible. Finally, annexes should go further, so that truly portable interfaces can be built. I have proposed one possible solution for one language, C (in my opinion the most important one to have portable interfaces for). Solutions for other common languages, particularly FORTRAN and COBOL, are still needed. Acknowledgments. I would like to thank Isaac Pentinmaki, Roy Hollinger, and Tucker Taft for their useful comments on this note.