!topic LSN on Interface to C
!key LSN-1070 on Interface to C
!reference RM9X-M.3;4.0
!from Bob Duff $Date: 94/02/15 16:47:03 $ $Revision: 1.7 $
!discussion

This LSN discusses several shortcomings in the interface to C. There may be similar problems with the COBOL and Fortran interfaces -- that should be checked separately. It may be that some of the things suggested here for C are general enough to be in M.1 or M.2.

I also recently submitted some more minor issues as normal comments to ada9x-mrt.

I don't feel strongly about the difference between Implementation Advice and real requirements. Either one is better than saying nothing about these issues.

I have been working on an Ada 9X binding to the X Window System, which is why I noticed some of these issues. It would be very nice if one could write an interface to versions of X written in C, and expect it to work across all Ada compilers and all standard implementations of X. The same would be true of any other binding, of course. The goal should be to get the Ada compiler vendors "out of the loop" when it comes to writing bindings to existing C code.

----------------

Lack of parameter passing semantics:

We never actually define what sorts of parameter and result profiles can be mapped, and how they map. Some cases are "obvious", and therefore might be uniform across Ada implementations by accident. However, there are some non-obvious cases.

M.1(23) gives general (language independent) Implementation Advice for interfacing support; M.3(4,16-20) give language-specific requirements for interfacing to C. None of these paragraphs gives enough information on passing parameters to C functions. Similar issues arise for imported and exported variables.

If the programmer has the C declaration, it should be clear from Annex M: (1) whether that C declaration can be (portably) interfaced to from Ada, and (2) what Ada declarations correspond to the C declaration (portably).

M.1(23) should be made more vague, as follows:

    For each supported convention other than Intrinsic, an implementation should support Import and Export @nt{pragma}s for variables and subprograms, and @nt(pragma) Convention for subprograms and types, as appropriate to the language being interfaced to. @nt{Pragma} Convention should be supported, when appropriate, for array, record, and access types. @nt(Pragma) Convention need not be supported for scalar types.

Having made M.1(23) more vague, the language-specific requirements should be made less vague.

There should be a notion of a "C-compatible type (or subtype?)" and a notion of a "C-compatible profile". The following should be C-compatible:

    - The special types in Interfaces.C.

    - Record and array types whose convention is C, and whose subcomponents are C-compatible. We might want to restrict records to non-discriminated ones, and arrays to statically constrained subtypes. (It's not clear to me whether we should be talking about types or subtypes, here. There is no such thing as a "statically constrained type". But I think we want to allow pragma Convention on an unconstrained array type, although we might want to restrict it to static constrainedness in certain parameter-passing and access type cases.)

    - Access types whose convention is C, and whose designated subtype (or profile) is C-compatible.

    - Subprogram profiles with convention C, and with C-compatible parameter subtypes, and certain modes. We probably want an 'in out' Interfaces.C.int to correspond to an 'int *' parameter on the C side.

We then need to define which pragmas Import, Export, and Convention ought to be supported for C, and what they correspond to on the C side.

The language-specific requirements in M.3(4,16-20) should be rewritten.
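
As a rough illustration of the categories listed above, the C side of such an interface might contain declarations like the following. All names here (point, triangle, point_ptr, point_fn, bump, sum_xy) are hypothetical and are not taken from any real header; this is only a sketch of the C shapes that C-compatible Ada types and profiles would have to match, not a statement of what Annex M requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type: every member is itself a C type. */
struct point {
    int x;
    int y;
};

/* Hypothetical statically sized array type with C-compatible components. */
typedef struct point triangle[3];

/* Pointer to object: the C shape a convention-C access type would match. */
typedef struct point *point_ptr;

/* Pointer to function: the C shape an access-to-subprogram type would match. */
typedef int (*point_fn)(const struct point *);

/* 'int *count' is the C shape suggested in the text for an Ada 'in out'
   parameter of type Interfaces.C.int: the callee reads and updates the
   caller's variable through the pointer. */
void bump(int *count)
{
    assert(count != NULL);
    *count += 1;
}

/* A function taking a pointer to a record and returning an int. */
int sum_xy(const struct point *p)
{
    assert(p != NULL);
    return p->x + p->y;
}
```

A caller that holds an int n can write bump(&n) and see n incremented, which is the behavior an Ada caller would expect from an 'in out' parameter.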

Instead of the first sentence of M.3(4), add the following to M.2(12): "The types declared in these packages correspond to the types of the same names in C, COBOL, or Fortran" (possibly with some exceptions in the Fortran case -- Character_Set and Imaginary).

The minimal support should include subprograms and access-to-subprogram types declared at library level. We also ought to include named general access-to-object types. This is in addition to the special-purpose access types in Interfaces.C.Strings and Interfaces.C.Pointers. The latter is specifically for treating an access value as a pointer into an array, with C-style address arithmetic. A plain old pointer to an int is better represented by a user-defined access type.

Note that the language-independent advice in M.1(23) encourages an implementation to support pragma Convention for access types. But in M.3, we contradict our own advice; we don't require pragma Convention(C) for user-defined access types.

A void C function corresponds to an Ada procedure. A non-void C function corresponds to an Ada function. An int C function can correspond to an Ada procedure or function (or both); a procedure ignores the result.

Rationale: Although any result can be ignored in C, we cannot require the Ada compiler to ignore function results of arbitrary type, because the declaration of the procedure does not tell the compiler the type of the result to be ignored. Code at the call site typically depends on the result type, even when the result is ignored. Restricting Ada procedures to either void or int results allows the Ada compiler to know that if there is a result, it uses the return convention for ints. Furthermore, most ignorable results are ints, since in K&R C there were no void functions, and the default result type was int.

A C function that takes a variable number of parameters can correspond to several Ada subprograms taking various allowed numbers (and types) of parameters.
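
To make the last two points concrete, here is a minimal C sketch of the two kinds of function discussed: an int function whose result a caller may discard, and a variadic function that is called with different numbers of arguments. The names note_call and sum_ints are hypothetical, chosen only for illustration. Each distinct call shape of sum_ints is what one Ada subprogram declaration would correspond to under the scheme described above.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical int function whose result callers often ignore.
   It returns the number of times it has been called so far. */
static int calls_made = 0;

int note_call(void)
{
    calls_made += 1;
    return calls_made;
}

/* Hypothetical variadic function: adds up 'count' arguments of type int.
   The caller is responsible for passing exactly 'count' ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);
    for (i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

In C, a call such as (void) note_call(); simply drops the int result, which is the case an Ada procedure mapped to an int function would rely on. The calls sum_ints(2, 10, 20) and sum_ints(3, 1, 2, 3) have different shapes, so under the proposal they would be reached through two separate Ada declarations with two and three trailing int parameters.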

A C pointer type can correspond either to an Ada access type, to an Ada array type, or to an 'in out' parameter of any type. (This implies pass-by-reference; there is no need to explicitly say so in normative text.)

To interface to a C pointer-to-array-of-unknown-size, the Ada programmer can declare an access-to-CONSTRAINED-array, making the constraint really huge, and (hopefully) being careful not to access beyond the end of the array. The length of the array has to be managed by the user, as in C -- often by passing an extra 'size' parameter, and often by marking the end of the array with a special value. For imported subprograms, parameters of unconstrained array subtypes should be supported -- the compiler can simply throw away the dope on the call. However, for other cases (exported functions, imported and exported variables, convention-C access types), the constrained array method is needed. This is probably worth a NOTE.

Should we require support for access parameters?

----------------

Link names.

There needs to be a more portable definition of link names, both default compiler-generated ones and user-defined ones, at least as Implementation Advice, if not as real rules. Otherwise, it's impossible to write portable bindings to C code, which is a major goal of this annex.

There are two steps:

    1. Define the link name that is used by default, when the user hasn't given a link name in the pragma Import or Export.

    2. Define the semantics of link names. (The semantics can be the same, whether the user specified a link name, or it was chosen by default.)

We can't define the semantics of link names in general for all languages in M.1, but we can do much better for the individual languages.

For C, I think we ought to say that:

    pragma Import(C, Foo);

is equivalent to:

    pragma Import(C, Foo, "foo");

That is, the default-link-name rule should be:

    1. The default link name is the Ada name given in the pragma, converted to lower case.

This has the nice property that a link name is unnecessary in most cases (since most C programmers use all lower case for function and variable names by convention). An alternative rule would be to say that the Ada declaration's case is obeyed. But some existing compilers forget the case of Ada identifiers, so that would be an implementation burden.

The semantics-of-link-names rule should be:

    2. The link name is identical to the C name as declared in the C program.

Some C compilers add a leading underscore to each name. We don't want the user to have to do that "sometimes", because it's not portable. Therefore, the Ada compiler should add an underscore if and only if the supported C compiler does. In other words, the user should not have to know the symbol name that is actually stored in the object code -- the user should be able to tell what to write in Ada by looking at the C code (in the .h file, or whatever).

Thus, if the user gives a link name:

    pragma Import(C, Foo, "_foo_BAR");

this matches a C function called "_foo_BAR" (not "foo_BAR") -- the case is *not* converted, and the Ada compiler mimics the C compiler as far as adding underscores and any other transformations.

----------------

Lack of Size_In_Storage_Elements attribute:

C measures sizes (essentially) in storage elements, and all objects, including components, are assumed to be an integer number of storage elements (except in certain specialized contexts). The fact that the Size attribute is measured in bits is a real pain. There ought to be an attribute that measures sizes in storage elements. For components that are not allocated an integer number of storage elements, the attribute should round up.

----------------

Void:

Should we say something about C's void type? E.g., what Ada type matches a "void *" in C? The answer is probably, "any access type whose convention is C". Unchecked conversions among all such types should be well defined.
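
For reference, the C behavior that such unchecked conversions would need to mirror can be sketched as follows. The names widget, set_client_data, get_client_data, and widget_id_from_client_data are hypothetical, loosely modeled on the "client data" idiom common in C callback interfaces. The sketch relies only on the C rule that a pointer to an object can be converted to void * and back to its original type without loss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type whose address is passed around as void *. */
struct widget {
    int id;
};

/* Hypothetical registry holding one untyped pointer. */
static void *client_data = NULL;

void set_client_data(void *data)
{
    client_data = data;
}

void *get_client_data(void)
{
    return client_data;
}

/* Converts the stored void * back to the pointer type it came from.
   This is only meaningful if a struct widget * was stored earlier. */
int widget_id_from_client_data(void)
{
    struct widget *w = get_client_data();  /* implicit conversion from void * */
    assert(w != NULL);
    return w->id;
}
```

An Ada binding to an interface like this would store a value of one convention-C access type and read it back as another, which is why the conversions among such types need defined behavior.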

----------------

Record component offsets:

There's no easy way to get the offset of a record component without creating an object of the record type (the Position attribute requires a record object). This is because the offset can be different for different objects when there are discriminant-dependent components. But there are often no discriminant-dependent components.

This does not seem important enough to fix. One can always create a dummy record object just to get the offsets.

----------------

Mixing with parent components:

An implementation is not required to support record_representation_clauses that map the components added by a type extension in between the components inherited from the parent. Such a feature would be helpful in implementing the Object, RectObj, Unnamed, and Core class records of Xt.

The reason for allowing the restriction is to allow the implementation of equality using block comparisons -- gaps between components can be set to some known bits. However, this rationale doesn't really hold for limited types, since there is no predefined equality.

We could consider requiring support for this for limited types if the implementers don't mind, but I don't think it's very important. The above-mentioned usage is pretty narrow.