Comments should be minimized. They should provide needed information that cannot be expressed in the Ada language, emphasize the structure of code, and draw attention to deliberate and necessary violations of the guidelines. Comments are present either to draw attention to the real issue being exemplified or to compensate for incompleteness in the example program.
Maintenance programmers need to know the causal interaction of noncontiguous pieces of code to get a global, more or less complete sense of the program. They typically acquire this kind of information from mental simulation of parts of the code. Comments should be sufficient enough to support this process (Soloway et al. 1986).
This section presents general guidelines about how to write good comments. It then defines several different classes of comments with guidelines for the use of each. The classes are: file headers, program unit specification headers, program unit body headers, data comments, statement comments, and marker comments.
Language Ref Manual references: 2.7 Comments
Using comments to duplicate information in the code is a bad idea for several reasons. First, it is unnecessary work that decreases productivity. Second, it is very difficult to correctly maintain the duplication as the code is modified. When changes are made to existing code, it is compiled and tested to make sure that it is once again correct. However, there is no automatic mechanism to make sure that the comments are correctly updated to reflect the changes. Very often, the duplicate information in a comment becomes obsolete at the first code change and remains so through the life of the software. Third, when comments about an entire system are written from the limited point of view of the author of a single subsystem, the comments are often incorrect from the start.
Comments are necessary to reveal information difficult or impossible to obtain from the code. Subsequent sections of this book contain examples of such comments. Completely and concisely present the required information.
The purpose of comments is to help readers understand the code. Misspelled, ungrammatical, ambiguous, or incomplete comments defeat this purpose. If a comment is worth adding, it is worth adding correctly in order to increase its usefulness.
Making comments visually distinct from the code, by indenting them, grouping them together into headers, or highlighting them with dashed lines is useful because it makes the code easier to read. Subsequent sections of this book elaborate on this point.
The use of such tools is encouraged, and may require that you structure your header comments so they can be automatically extracted and/or updated. Beware that tools which modify the comments in a file are only useful if they are executed frequently enough. Automatically generated obsolete information is even more dangerous than manually generated obsolete information, because it is more trusted by the reader.
Revision histories are maintained much more accurately and completely by configuration management tools. With no tool support, it is very common for an engineer to make a change and forget to update the revision history. If your configuration management tool is capable of maintaining revision histories as comments in the source file, then take advantage of that capability, regardless of any compromise you might have to make about the format or location of the revision history. It is better to have a complete revision history appended to the end of the file than to have a partial one formatted nicely and embedded in the file header.
------------------------------------------------------------------------ -- Copyright (c) 1991, Software Productivity Consortium, Inc. -- All rights reserved. -- Author: J. Smith -- Department:System Software Department -- Revision History: -- 7/9/91 J. Smith -- - Added function Size_Of to support queries of node sizes. -- - Fixed bug in Set_Size which caused overlap of large nodes. -- 7/1/91 M. Jones -- - Optimized clipping algorithm for speed. -- 6/25/91 J. Smith -- - Original version. ------------------------------------------------------------------------ |
Responsibility and revision history information should be present in each file for the sake of future maintainers, this is the header information most trusted by maintainers because it accumulates. It does not evolve. There is no need to ever go back and modify the author's name or the revision history of a file. As the code evolves, the revision history should be updated to reflect each change. At worst, it will be incomplete, it should rarely be wrong. Also, the number and frequency of changes and the number of different people who made the changes over the history of a unit can be good indicators of the integrity of the implementation with respect to the design.
Information about how to find the original author should be included in the file header, in addition to the author's name, to make it easier for maintainers to find the author in case questions arise. However, detailed information like phone numbers, mail stops, office numbers, and computer account usernames are too volatile to be very useful. It is better to record the department for which the author was working when the code was written. This information is still useful if the author moves offices, changes departments, or even leaves the company, because the department is likely to retain responsibility for the original version of the code.
Language Ref Manual references: 10.1 Compilation Units - Library Units
------------------------------------------------------------------------ -- AUTOLAYOUT -- Purpose: -- This package computes positional information for nodes and arcs -- of a directed graph. It encapsulates a layout algorithm which is -- designed to minimize the number of crossing arcs and to emphasize -- the primary direction of arc flow through the graph. -- Effects: -- - The expected usage is: -- 1. Call Define for each node and arc to define the graph. -- 2. Call Layout to assign positions to all nodes and arcs. -- 3. Call Position_Of for each node and arc to determine the -- assigned coordinate positions. -- - Layout can be called multiple times, and recomputes the -- positions of all currently defined nodes and arcs each time. -- - Once a node or arc has been defined, it remains defined until -- Clear is called to delete all nodes and arcs. -- Performance: -- This package has been optimized for time, in preference to space. -- Layout times are on the order of N*log(N) where N is the number -- of nodes, but memory space is used inefficiently. ------------------------------------------------------------------------ package Autolayout is ... --------------------------------------------------------------------- -- Define -- Purpose: -- This procedure defines one node of the current graph. -- Exceptions: -- Node_Already_Defined --------------------------------------------------------------------- procedure Define (New_Node : in Node); --------------------------------------------------------------------- -- Layout -- Purpose: -- This procedure assigns coordinate positions to all defined -- nodes and arcs. -- Exceptions: -- None. --------------------------------------------------------------------- procedure Layout; --------------------------------------------------------------------- -- Position_Of -- Purpose: -- This function returns the coordinate position of the -- specified node. The default position (0,0) is returned if no -- position has been assigned yet. -- Exceptions: -- Node_Not_Defined --------------------------------------------------------------------- function Position_Of (Current : in Node) return Position; ... end Autolayout; |
When you duplicate information in the header that can be readily obtained from the specification, the information tends to become incorrect during maintenance. For example, do not make a point of listing all parameter names, modes, or types when describing a procedure. This information is already available from the procedure specification. Similarly, do not list all subprograms of a package in the header unless this is necessary to make some important statement about the subprograms.
Do not include information in the header which the user of the program unit doesn't need. In particular, do not include information about how a program unit performs its function or why a particular algorithm was used. This information should be hidden in the body of the program unit to preserve the abstraction defined by the unit. If the user knows such details and makes decisions based on that information, the code may suffer when that information is later changed.
When describing the purpose of the unit, avoid referring to other parts of the enclosing software system. It is better to say "this unit does ..." than to say "this unit is called by Xyz to do ...." The unit should be written in such a way that it does not know or care which unit is calling it. This makes the unit much more general purpose and reusable. In addition, information about other units is likely to become obsolete and incorrect during maintenance.
Include information about the performance (time and space) characteristics of the unit. Much of this information is not present in the Ada specification, but it is required by the user. To integrate the unit into a system, the user needs to understand the resource usage (CPU, memory, etc.) of the unit. It is especially important to note when a subprogram call causes activation of a task hidden in a package body, the task may continue to consume resources after the subroutine ends.
Language Ref Manual references: 6.1 Subprogram Declarations, 7.2 Package Specifications and Declarations
------------------------------------------------------------------------ -- Autolayout -- Implementation Notes: -- - This package uses a heuristic algorithm to minimize the number -- of arc crossings. It does not always achieve the true minimum -- number which could theoretically be reached. However it does a -- nearly perfect job in relatively little time. For details about -- the algorithm, see ... -- Portability Issues: -- - The native math package Math_Lib is used for computations of -- coordinate positions. -- - 32-bit integers are required. -- - No operating system specific routines are called. -- Anticipated Changes: -- - Coordinate_Type below could be changed from integer to float -- with little effort. Care has been taken to not depend on the -- specific characteristics of integer arithmetic. ------------------------------------------------------------------------ package body Autolayout is ... --------------------------------------------------------------------- -- Define -- Implementation Notes: -- - This routine stores a node in the general purpose Graph data -- structure, not the Fast_Graph structure because ... --------------------------------------------------------------------- procedure Define (New_Node : in Node) is begin ... end Define; --------------------------------------------------------------------- -- Layout -- Implementation Notes: -- - This routine copies the Graph data structure (optimized for -- fast random access) into the Fast_Graph data structure -- (optimized for fast sequential iteration), then performs the -- layout, and copies the data back to the Graph structure. This -- technique was introduced as an optimization when the algorithm -- was found to be too slow, and it produced an order of -- magnitude improvement. --------------------------------------------------------------------- procedure Layout is begin ... end Layout; --------------------------------------------------------------------- -- Position_Of --------------------------------------------------------------------- function Position_Of (Current : in Node) return Position is begin ... end Position_Of; ... end Autolayout; |
The header is also a good place to record portability concerns. The maintainer may have to port the software to a different environment and will benefit from a list of nonportable features. Furthermore, the act of collecting and recording portability issues focuses attention on these issues and may result in more portable code from the start.
Summarize complex algorithms in the header if the code is difficult to read or understand without such a summary, but do not merely paraphrase the code. Such duplication is unnecessary and hard to maintain. Similarly, do not repeat the information from the header of the program unit specification.
-- Implementation Notes: None.
and
-- NonPortable Features: None.
The first is a message from the author to the maintainer saying "I can't think of anything else to tell you," while the second may mean "I guarantee that this unit is entirely portable."
Language Ref Manual references: 6.3 Subprogram Bodies, 7.3 Package Bodies
... --------------------------------------------------------------------- -- Current position of the cursor in the currently selected text -- buffer, and the most recent position explicitly marked by the -- user. -- Note: It is necessary to maintain both current and desired -- column positions because the cursor cannot always be -- displayed in the desired position when moving between -- lines of different lengths. --------------------------------------------------------------------- Desired_Column : Column_Counter; Current_Column : Column_Counter; Current_Row : Row_Counter; Marked_Column : Column_Counter; Marked_Row : Row_Counter; |
The conditions under which an exception is raised should be commented:
--------------------------------------------------------------------- -- Exceptions --------------------------------------------------------------------- Node_Already_Defined : exception; -- Raised when an attempt is made --| to define a node with an --| identifier which already --| defines a node. Node_Not_Defined : exception; -- Raised when a reference is --| made to a node which has --| not been defined. |
Here is a more complex example, involving multiple record and access types which are used to form a complex data structure:
--------------------------------------------------------------------- -- These data structures are used to store the graph during the -- layout process. The overall organization is a sorted list of -- "ranks," each containing a sorted list of nodes, each containing -- a list of incoming arcs and a list of outgoing arcs. -- The lists are doubly linked to support forward and backward -- passes for sorting. Arc lists do not need to be doubly linked -- because order of arcs is irrelevant. -- The nodes and arcs are doubly linked to each other to support -- efficient lookup of all arcs to/from a node, as well as efficient -- lookup of the source/target node of an arc. --------------------------------------------------------------------- type Arc; type Arc_Pointer is access Arc; type Node; type Node_Pointer is access Node; type Node is record Id : Node_Pointer;-- Unique node ID supplied by the user. Arc_In : Arc_Pointer; Arc_Out : Arc_Pointer; Next : Node_Pointer; Previous : Node_Pointer; end record; type Arc is record ID : Arc_ID; -- Unique arc ID supplied by the user. Source : Node_Pointer; Target : Node_Pointer; Next : Arc_Pointer; end record; type Rank; type Rank_Pointer is access Rank; type Rank is record Number : Level_ID; -- Computed ordinal number of the rank. First_Node : Node_Pointer; Last_Node : Node_Pointer; Next : Rank_Pointer; Previous : Rank_Pointer; end record; First_Rank : Rank_Pointer; Last_Rank : Rank_Pointer; |
In the first example above, the names Current_Column
and Current_Row
are
relatively self-explanatory. The name Desired_Column
is also well chosen, but
it leaves the reader wondering what the relationship is between the current
column and the desired column. The comment explains the reason for having both.
Another advantage of commenting on the data declarations is that the single
set of comments on a declaration can replace multiple sets of comments that
might otherwise be needed at various places in the code where the data is
manipulated. In the first example above, the comment briefly expands on the
meaning of "current
" and "marked
." It states that the "current
" position is
the location of the cursor, the "current
" position is in the current buffer,
and the "marked
" position was marked by the user. This comment, along with the
mnemonic names of the variables, greatly reduces the need for comments at
individual statements throughout the code.
It is important to document the full meaning of exceptions and under what conditions they can be raised, as shown in the second example above, especially when the exceptions are declared in a package specification. The reader has no other way to find out the exact meaning of the exception (without reading the code in the package body).
Grouping all the exceptions together, as shown in the second example, can provide the reader with the effect of a "glossary" of special conditions. This is useful when many different subprograms in the package can raise the same exceptions. For a package in which each exception can be raised by only one subprogram, it may be better to group related subprograms and exceptions together.
When commenting exceptions, it is better to describe the exception's meaning in general terms than to list all the subprograms that can cause the exception to be raised; such a list is harder to maintain. When a new routine is added, it is likely that these lists will not be updated. Also, this information is already present in the comments describing the subprograms, where all exceptions that can be raised by the subprogram should be listed. Lists of exceptions by subprogram are more useful and easier to maintain than lists of subprograms by exception.
In the third example, the names of the record fields are short and mnemonic, but they are not completely self-explanatory. This is often the case with complex data structures involving access types. There is no way to choose the record and field names so that they completely explain the overall organization of the records and pointers into a nested set of sorted lists. The comments shown are useful in this case. Without them, the reader would not know which lists are sorted, which lists are doubly linked, or why. The comments express the intent of the author with respect to this complex data structure. The maintainer still has to read the code if he wants to be sure that the double links are all properly maintained. Keeping this in mind when reading the code makes it much easier for him to find a bug where one pointer is updated and the opposite one is not.
Language Ref Manual references: 3.3.1 Type Declarations, 11.1 Exception Declarations
... -- Loop through all the strings in the array Strings, converting -- them to integers by calling Convert_To_Integer on each one, -- accumulating the sum of all the values in Sum, and counting them -- in Count. Then divide Sum by Count to get the average and store -- it in Average. Also, record the maximum number in the global -- variable Max_Number. for I in Strings'Range loop -- Convert each string to an integer value by looping through -- the characters which are digits, until a nondigit is found, -- taking the ordinal value of each, subtracting the ordinal value -- of '0', and multiplying by 10 if another digit follows. Store -- the result in Number. Number := Convert_To_Integer(Strings(I)); -- Accumulate the sum of the numbers in Total. Sum := Sum + Number; -- Count the numbers. Count := Count + 1; -- Decide whether this number is more than the current maximum. if Number > Max_Number then -- Update the global variable Max_Number. Max_Number := Number; end if; end loop; -- Compute the average. Average := Sum / Count; |
The following is improved by not repeating things in the comments which are
obvious from the code, not describing the details of what goes in inside of
Convert_To_Integer
, deleting an erroneous comment (the one on the statement
which accumulates the sum), and making the few remaining comments more
visually distinct from the code.
Sum_Integers_Converted_From_Strings: for I in Strings'Range loop Number := Convert_To_Integer(Strings(I)); Sum := Sum + Number; Count := Count + 1; -- The global Max_Number is computed here for efficiency. if Number > Max_Number then Max_Number := Number; end if; end loop Sum_Integers_Converted_From_Strings; Average := Sum / Count; |
Comments that paraphrase or explain obvious aspects of the code have no value. They are a waste of effort for the author to write and the maintainer to update. Therefore, they often end up becoming incorrect. Such comments also clutter the code, hiding the few important comments.
Comments describing what goes on inside of another unit violate the principle
of information hiding. The details about Convert_To_Integer
(deleted above)
are irrelevant to the calling unit, and they are better left hidden in case
the algorithm ever changes. Examples explaining what goes on elsewhere in the
code are very difficult to maintain and almost always become incorrect at the
first code modification.
The advantage of making comments visually distinct from the code is that it makes the code easier to scan, and the few important comments stand out better. Highlighting unusual or special code features indicates that they are intentional. This assists maintainers by focusing attention on code sections that are likely to cause problems during maintenance or when porting the program to another implementation.
Comments should be used to document code that is nonportable, implementation-dependent, environment-dependent, or tricky in any way. They notify the reader that something unusual was put there for a reason. A beneficial comment would be one explaining a work-around for a compiler bug. If you use a lower level (not "ideal" in the software engineering sense) solution, comment on it. Information included in the comments should state why you used that particular construct. Also include documentation on the failed attempts, e.g., using a higher level structure. This type of comment is useful to maintainers for historical purposes. You show the reader that a significant amount of thought went into the choice of a construct.
Finally, comments should be used to explain what is not present in the code as well as what is present. If you make a conscious decision to not perform some action, like deallocating a data structure with which you appear to be finished, be sure to add a comment explaining why not. Otherwise, a maintainer may notice the apparent omission and "correct" it later, thus introducing an error.
Count
and Sum
in a local block so that their scope is limited and
their initializations occur near their usage, e.g., by naming the block
Compute_Average
or by moving the code into a function called Average_Of
. The
computation of Max_Number
can also be separated from the computation of
Average
. However, those changes are the subject of other guidelines; this
example is only intends to illustrate the proper use of comments.Language Ref Manual references: 5 Statements
if A_Found then ... elsif B_Found then ... else -- A and B were both not found ... if Count = Max then ... end if; ... end if; -- A_Found ------------------------------------------------------------------------ package body Abstract_Strings is ... --------------------------------------------------------------------- procedure Catenate (...) is begin ... end Catenate; --------------------------------------------------------------------- ... begin -- Abstract_Strings ... end Abstract_Strings; ------------------------------------------------------------------------ |
The if
, elsif
, else
, and end if
of an if statement are often separated by long
sequences of statements, sometimes involving other if statements. As shown in
the first example, marker comments emphasize the association of the keywords
of the same statement over a great visual distance. Marker comments are not
necessary with the block statement and loop statement because the syntax of
these statements allows them to be named with the name repeated at the end.
Using these names is better than using marker comments because the compiler
verifies that the names at the beginning and end match.
The sequence of statements of a package body is often very far from the first line of the package. Many subprogram bodies, each containing many begin lines, may occur first. As shown in the second example, the marker comment emphasizes the association of the begin with the package.
Language Ref Manual references: 5.3 If Statements, 5.4 Case Statements, 5.6 Block Statements, 6.3 Subprogram Bodies, 9.1 Abort Statements