!topic LSN on Environment-related Friendliness Issues
!key LSN-1069 on Environment-related Friendliness Issues
!reference RM9X-14.3;4.0
!reference RM9X-14.5.4;4.0
!from Bob Duff $Date: 94/02/05 11:14:46 $ $Revision: 1.5 $
!discussion

This LSN discusses various "friendliness" issues related to the
interface between Ada programs and typical operating systems.  There
was an extensive discussion of these issues at the recent DR/XRG
meeting in the UK.

Many of these issues tend to trigger regular questions and complaints
on the comp.lang.ada newsgroup.  In particular, a well thought out note
posted by Michael Paus triggered the UK discussion.  (For those
interested, Erhard Ploedereder included a copy of Michael Paus' note in
the agenda -- revised version sent out 12 December.)

These issues are generally of the form, "Here's a thing that's easy to
do in other languages like C, in most common O.S. environments, and
it's commonly needed in simple programs, so why is it such a pain in
Ada?"

Although Ada is intended for building very large systems, it is
important that it be suitable for simple little programs as well.
Professional programmers often learn a new language by trying it out on
a few small programs.  If writing such programs is painful (because,
for example, there's no standard way to get at the command line
arguments), the programmer may well go back to C or whatever.

Given the discussions at the UK meeting, our plan is to address the
following issues as explained below:

    - standard error
    - floating and fixed point input format
    - standard files in data structures
    - ability to mix standard files with Stream I/O
    - flush output
    - immediate get-character
    - access to command line
    - exit status code

Most of these issues are inherently non-portable.  However, the same
could be said for all of chapter 14, "Input-Output".  We take the same
approach as chapter 14 -- define the interface, but make the actual
interactions with the outside world implementation-defined.

There are many systems on which getting command line arguments makes
sense.  In the C world, there is a standard interface for getting the
command line.  If there is no command line (e.g. on an embedded
system), a C program will behave in a sensible manner in that
environment (e.g. return a null command line).  In other words, the
fact that C can be (and is) used for many embedded systems does not
mean that writing a simple C program on Unix or MS-DOS has to be a
painful experience.  In fact, it is not hard to write a C program that
is *portable* to both of those environments, without losing the
command-line features.  Ada should be as good as C in this regard.

Note also that most Ada compilers support most of the above features
anyway.  Thus, by addressing these issues in the Ada standard, we are
substantially increasing uniformity with very little implementation
cost.  Adding implementation-defined features to the standard can
actually *increase* the uniformity among implementations!

(In the following, all modifications to Text_IO apply equally to
Wide_Text_IO.)

----------------

Add Standard_Error, Current_Error, and Set_Error to Text_IO.  (It was
noted that at least one implementer already does this, and apparently
has not been "caught" in this extension to Ada 83!)

As for Standard_Output, the file or device to which Standard_Error
output goes is implementation-defined.

The vast majority of computers that exist today are running operating
systems with a notion of standard error: Unix, MS-DOS, MS Windows, and
VAX/VMS, for example, all support a concept of standard error.  One
would expect that Ada's Standard_Error would be mapped to it in the
obvious way, just as one expects Standard_Output to be mapped in the
obvious way.

On systems with no such built-in concept, Standard_Error and
Standard_Output can be mapped to the same device.  (This is what
happens by default on the above-mentioned operating systems anyway.)

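As a minimal sketch of how the proposed standard-error files might be
used, assuming they are adopted under the names given above
(Standard_Error and Current_Error); the procedure name Report_Demo and
the messages are invented for illustration:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Report_Demo is
begin
   -- Normal results go to standard output.
   Put_Line(Standard_Output, "42 records processed");

   -- Diagnostics go to standard error, so they can be kept apart from
   -- the results when standard output is redirected.
   Put_Line(Standard_Error, "warning: 3 records skipped");

   -- Current_Error starts out designating the same file as
   -- Standard_Error; Set_Error could later point it somewhere else.
   Put_Line(Current_Error, "done");
end Report_Demo;
```

Where the two files end up is implementation-defined, as discussed
above; on a system without a separate error device, all three lines
may simply appear on the same device.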
----------------

ISO 6093:1985 defines language-independent formats for the textual
representation of floating point numbers.  Floating-point input in
Text_IO should support the formats defined in this standard, in
addition to currently supported formats.  Leading and trailing zeros
should not be required, and a trailing decimal point should not be
required.

There was no standard for this feature when Ada 83 was created; now
that there is, Ada should obey it.  This will lead to better
inter-operability with other languages -- one will be able to print out
floating point data in Fortran and read it into an Ada program without
having to write code to parse the numbers.

The same change should be made to fixed point input.

Note that this change is not upward compatible, but only for programs
that rely on exceptions being raised.  The change is actually at least
as likely to fix bugs (or misfeatures) in Ada programs as to cause
them.  There will be no cases in which a previously-accepted input
format would become incorrect.

----------------

It is not currently feasible to insert the current input file into a
data structure.  For example, one might design a program to have a list
of files to be read at some point.  But if one later wants to put
current input in the list, a redesign is necessary.  The same is true
of current output, current error, standard input, standard output, and
standard error.

We plan to add six functions to Text_IO that overload the existing six
functions, but return an access value.  Then, one could insert that
access value into data structures.  User's files are not a problem --
the user can allocate them on the heap, or declare them aliased, as
needed.

The new declarations would be like this:

    type File_Access is access constant File_Type;

    function Current_Input return File_Access;
    ... -- similar declarations for the other five

The old versions of Current_Input and friends would still be there, and
would be defined to return Current_Input.all (etc.).

Presumably, the implementation would have a variable in the body of
Text_IO (say, Current_Input_Holder), of type File_Access, initialized
to point to the same thing as (say) Standard_Input_Holder.  The actual
File_Type object for standard input could be allocated statically or in
the heap.

In a program, you could save and restore current input like this:

    procedure P(File_Name: String) is
        My_File: File_Type;
        Old_Current_Input: constant File_Access := Current_Input;
    begin
        Open(My_File, In_File, File_Name);
        Set_Input(My_File);
        ... -- Do lots of input from Current_Input.
        Set_Input(Old_Current_Input.all);
        Close(My_File);
    end P;

And you could make data structures like this:

    type File_List is array(Integer range <>) of File_Access;

    procedure Concatenate_Files(Files: File_List) is
        -- Read all the files, and write the contents to current output.
    begin
        for I in Files'Range loop
            -- Set_Input takes a File_Type, so dereference the access value.
            Set_Input(Files(I).all);
            while not End_Of_File loop
                declare
                    C: Character;
                begin
                    Get(C);
                    Put(C);
                end;
            end loop;
        end loop;
    end Concatenate_Files;

    ...

    Concatenate_Files((Current_Input, Some_Other_File'Access));

We considered making File_Access be access-to-variable instead of
access-to-constant.  This would be slightly more flexible, but
introduces dangling-pointer kinds of problems.  Making File_Access
access-to-constant means that one cannot use Unchecked_Deallocation on
Current_Input, and that one cannot call Close on Current_Input.  (It is
still possible for the current input file to be closed or deallocated,
as in Ada 83, but only through relatively obscure means -- see
RM9X-14.3.3(16-17).)

It is possible to cause upward incompatibilities with this change, but
they seem pretty rare.  For example, the following:

    X: System.Address := Text_IO.Standard_Input'Address;

would become ambiguous.

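The idea of holding standard files in a data structure can also be
sketched as a small complete program.  This assumes the access-returning
overloads are adopted as proposed above (File_Access, plus overloads of
Standard_Output and Standard_Error); the names Broadcast_Demo and
Broadcast are invented for illustration:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Broadcast_Demo is
   -- A list of files, possibly including the standard ones.
   type File_List is array (Positive range <>) of File_Access;

   -- Write the same message to every file in the list.
   procedure Broadcast(Files: File_List; Message: String) is
   begin
      for I in Files'Range loop
         -- Put_Line takes a File_Type, so dereference the access value.
         Put_Line(Files(I).all, Message);
      end loop;
   end Broadcast;
begin
   -- In this aggregate, overload resolution selects the versions of
   -- Standard_Output and Standard_Error that return File_Access.
   Broadcast((Standard_Output, Standard_Error), "hello");
end Broadcast_Demo;
```

Because File_Access is access-to-constant, the sketch can pass the
designated files to operations taking an "in" File_Type parameter, but
cannot Close them through the access value.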
----------------

Add a child of Text_IO:

    with Ada.Streams;
    package Ada.Text_IO.Text_Streams is
        type Stream_Access is access all Streams.Root_Stream_Type'Class;

        function Stream(File: in File_Type) return Stream_Access;
    end Ada.Text_IO.Text_Streams;

Add the following declarations to the spec of Text_IO:

    procedure Flush(File: in out File_Type);
    procedure Flush;

The semantics of the Stream and Flush subprograms are identical to the
subprograms of the same name in Stream_IO.

Performing operations on the stream corresponding to a text file will
not update the column, line, and page counts.

The ability to get the stream of a text file allows one to use
Current_Input, Current_Output, and Current_Error with all the
functionality of streams, such as the ability to mix text and binary
I/O, or to mix binary I/O for different data types.

----------------

A few somewhat-related changes are needed to the Streams and
Streams.Stream_IO packages:

Stream_IO.Flush should raise Mode_Error if the mode is In_File.

Stream_IO.Stream_Access should be a general access type.

Streams is declared pure, but it illegally depends on
System.Storage_Elements, which is not declared pure.  It is important
that Streams remain declared pure, for distribution (Annex I).  It is
important that System and Storage_Elements remain impure, because an
implementation might want to implement type Address as an access type.
Therefore, we must break the dependence of Streams on
System.Storage_Elements.  This can be done by declaring the following
types in Streams:

    type Stream_Element is mod ;
                           ^^^ or whatever the syntax ends up being ;-)
    type Stream_Element_Offset is range ;
    type Stream_Element_Array is
        array(Stream_Element_Offset range <>) of Stream_Element;

Then, the Read and Write procedures in Streams can be defined in terms
of Stream_Element_Array and Stream_Element_Offset instead of
Storage_Array and Storage_Offset, respectively.

This has an added advantage: The "byte size" used on the network is no
longer tied to the "byte size" used internally by the machine.

I don't see any need for these with_clauses on Streams:

    with Ada.Tags;
    with Ada.Exceptions;

----------------

Add the following declarations to the spec of Text_IO:

    procedure Immediate_Get(File: in File_Type; Item: out Character);
    procedure Immediate_Get(Item: out Character);

Immediate_Get gets a single character from the file.  It can be used,
for example, to get a single key from a keyboard, without waiting for
carriage return to be typed.  It raises Mode_Error if the file mode is
not In_File.  Immediate_Get waits until a character is available.  The
meaning of input being "available" is implementation defined.

Implementation Advice:  These procedures should use unbuffered input.
They should not skip control characters, including any control
characters used to represent end of line and end of page.  For a device
like a keyboard, input should be "available" if the user has already
typed a key, whereas for a disk file, input should always be available
except at end of file.  For a file attached to a keyboard-like device,
any command-line editing features supported by an underlying O.S.
should be turned off when Immediate_Get is called, and not turned back
on until some other input operation is called.

These procedures will not update the column, line, and page counts.

An alternative to Immediate_Get would be to have a mode for each file:

    type Terminal_IO_Mode is (Immediate, Nonimmediate);
    procedure Set_Mode(F: File_Type; Mode: Terminal_IO_Mode);
    function Get_Mode(F: File_Type) return Terminal_IO_Mode;

The mode would be ignored for devices that don't behave like keyboards.
It would default to Nonimmediate.  Presumably, the implementation would
define what happens if the same external file is attached to two
different files.  Presumably, the implementation would restore the
mode, if necessary, upon partition exit.

The Immediate_Get method seems simpler.

----------------

Add a new language-defined library package:

    package Ada.Command_Line is
        function Argument_Count return Natural;
        function Argument(Number: Positive) return String;
        function Command_Name return String;

        type Status is range ;
        Success: constant Status;
        Failure: constant Status;
        procedure Set_Status(S: Status);
    end Ada.Command_Line;

Argument(N) raises Constraint_Error if N is not in 1..Argument_Count.
The semantics of Ada.Command_Line are otherwise implementation-defined.

Rationale:  It's nice to have a standard interface even though the
semantics is implementation-defined.  The Implementation Advice below
will encourage the maximum level of portability that is reasonably
achievable.

Implementation Advice:

If the underlying system has a concept of command line arguments, then
Argument_Count should return the number of arguments passed to the
partition, and Argument(N) should return the Nth argument, for N in
1..Argument_Count.  Otherwise, Argument_Count should return 0.  Parsing
of the command line into individual arguments should reflect the
conventions of the underlying system to the extent possible.

Rationale:  C doesn't define command-line parsing, and operating
systems vary widely in this regard.  Furthermore, one usually wants
one's Ada programs to fit in well with the underlying O.S. -- we don't
want Ada programs running under, say, VAX/VMS to be forced into, say, a
Unix-style (or Ada-RM-style) command line syntax.  It is up to the
implementation to make the command line syntax fit in well -- a good
starting point for an implementer would be to look at what C compilers
typically do on that system for argc and argv.

Command_Name should return a string representing the name of the
command used to invoke the partition in a manner that is customary on
the underlying system (for example, the name of the executable file).
If there are no customary partition names, Command_Name should return
an empty string.

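As a minimal sketch of how a small program might use the proposed
interface, assuming the package is adopted with the function names
given above (Argument_Count, Argument, Command_Name); the procedure
name Echo_Args is invented for illustration:

```ada
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Command_Line; use Ada.Command_Line;

procedure Echo_Args is
begin
   -- Command_Name may be an empty string on some systems.
   Put_Line("command: " & Command_Name);
   Put_Line("argument count:" & Natural'Image(Argument_Count));

   -- Valid argument numbers are 1 .. Argument_Count, so the loop body
   -- is never executed when there are no arguments (for example, on a
   -- system with no concept of a command line).
   for N in 1 .. Argument_Count loop
      Put_Line(Integer'Image(N) & ": " & Argument(N));
   end loop;
end Echo_Args;
```

What counts as one argument depends on how the underlying system
parses the command line, so the output of such a program is expected to
vary between systems.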
NOTE  Note that Argument(0) raises Constraint_Error.  Command_Name
corresponds to argv[0] in C.

If the underlying system supports status codes (or exit codes) returned
from partitions, then Set_Status should set the status code.  Success
and Failure should represent success or failure as is customary on the
underlying system.  If the partition terminates normally, then it
should return the status code last set by a call to Set_Status; if
there were no such calls, then the partition should return Success.  If
the partition terminates abnormally, the status set by Set_Status calls
should be ignored, and the implementation should return a status
indicating an error occurred (possibly Failure, or possibly some other
value).

NOTE  The implementation may provide child packages of Command_Line
that contain such information as alternative failure codes, access to
environment variables, and access to arguments given in named notation.

Rationale:  Abnormal termination takes precedence over Set_Status
because (1) it seems worthwhile to encourage portability there, (2) it
seems clear that an unhandled exception propagated by the main
subprogram should be viewed as a bug, and (3) the user can program the
opposite semantics by handling the exception and then calling
Set_Status.

If there is no reasonable meaning of status codes on the underlying
system, then calls to Set_Status should do nothing.

[Rationale:  This allows one to write a program that returns a status
code on Unix, but does something harmless on a system where status
codes make no sense.  Raising an exception would not catch bugs -- it
would more likely *cause* bugs where none exist.]

[Note:  The usual tasking-safe rules apply -- if multiple tasks are
calling Set_Status, the implementation must serialize these calls.
Whichever task happens to call Set_Status last wins.
Other functionalities can be easily programmed -- for example, the user
could write a protected object that manages the status, and ensures
that the "worst" failure code is the one that wins.  The command line
is constant (at least on typical systems), so tasking issues do not
arise for the command line.]

Note that the above definition does not require portability -- it
cannot.  However, in practice, portability among hosted implementations
of Ada will be achieved.  On embedded systems where command lines and
status codes make no sense, there is no need for portability.

Rejected alternative:  We considered making the command line arguments
be parameters of the main subprogram, and the status code be the result
of the main function.  This was rejected because:

    - It's harder to implement, especially on existing implementations.

    - It prevents library packages from having access to the command
      line before the main subprogram starts.

    - It makes it inconvenient for library packages to have access to
      the command line even after the main subprogram starts --
      typically, the main subprogram would have to immediately copy the
      command line arguments to global package variables, so other
      parts of the partition can access them.  Given their dynamic
      size, access types would be required.

    - The return from the main subprogram is not the end of time, as
      far as Ada is concerned -- tasks keep running after that, and
      those tasks might want to set the status code, but it's too late.

It is of course OK for an implementation to support the above main
subprogram mechanism (in fact, I believe at least two compilers support
at least part of it).  However, users cannot rely on such support being
portable.  Any interactions between the two mechanisms are up to the
implementation to resolve.

----------------

SUMMARY:

The above changes will go a long way toward achieving our goal of
interoperability with other languages and systems.
The cost is small, if one recognizes that the functionality is generally supported already, albeit in a non-portable fashion. Even though we have not magically solved all portability issues, from a practical point of view, portability will be enhanced. These changes will help make Ada a friendlier language for writing *small* programs.