Language Study Note LSN-029-DR
Comments on Aspects of Unsigned Integer Types in Ada-9X
Robert B. K. Dewar
02 Feb 1992
Let's start off by limiting the field of discussion to unsigned types with
two's complement modular operations. I realize there are a couple of other
issues (range checked unsigned and 1's complement), but I don't intend to
address those here.
Question 1: What unsigned types should be provided?
MD 4.0 merely suggests that at least one unsigned type be provided, which
is essentially simply a mechanism for forcing the implementation of the
feature (and is consistent with the general approach in Ada-83, e.g. at
least one float type must be provided). Presumably one expects that a
sensible choice is made, and furthermore that the implementor will provide
other appropriate types. How reasonable is this expectation? Well we have
good and bad experience in Ada-83. Generally, implementors have provided
a reasonable selection of integer base types. In the floating-point field,
the experience is much less encouraging. No implementor has provided the
IEEE extended float type on machines that provide it, despite the fact that
reasonable exploitation of the IEEE capabilities in such environments
requires such access. There are even worse cases of implementors providing
only one float type on machines where there are clearly two types provided
by the hardware.
Some suggestions have been made to require more specific compliance.
I suggested requiring at least UNSIGNED_8 (a common type used to manipulate
untyped byte strings) and the UNSIGNED type corresponding to INTEGER (which
often, but not always, would correspond to the address length). Someone (I
forget who right now) appropriately commented that this requirement could
be undesirable on a 16-bit machine which had implemented a 32-bit INTEGER,
leading to a situation in which the (relatively useless) UNSIGNED_32 was
required, and the (much more important) UNSIGNED_16 was not provided. This
seems a good point, and I don't like my suggestion any more (although the
idea of requiring UNSIGNED_8 seems still potentially useful).
Bryce Bardin (if I remember right) suggested that an implementation be
required to provide unsigned types corresponding to each implemented
signed integer type. I understand the motivation here, and generally
am sympathetic, but it has one very important flaw. It would extend the
annoying rule in Ada-83 that if you implement a long integer type, you
are forced to implement additional silly things that are useless (like
loops and indexing). I would like to see more implementors provide a
64-bit signed type on 32-bit machines (but I don't want 64-bit indexing
and loops). On the other hand, I find a 64-bit unsigned type on a typical
32-bit machine to be completely useless.
Tucker, without specifically suggesting a formal rule, notes that the
MRT intention is that unsigned types be provided for all lengths which
are conveniently supported by the hardware. This seems exactly the right
idea, although of course it is hard to formalize.
Perhaps the right compromise on this issue is to add a clear statement
requiring implementation of all lengths supported by the hardware without
clearly specifying what that means (it is then a subject of judgment, which
is not so terrible!) Incidentally I would like to see a similar rule stated
for floating-point in the numerics annex. This would *definitely* require
some additional floating-point types to be implemented in some compilers
(but presumably not be too burdensome given that the numerics annex does
not have to be implemented) -- it really is absurd to say that a compiler
implements the numerics annex (and therefore is presumably suitable for
numeric applications) when it does not implement all the available
floating-point types.
Question 2: What are unsigned types used for?
This is a question that should definitely be asked. The following are
possible answers:
1. UNSIGNED_8 (i.e. unsigned bytes) is a convenient type for
manipulating untyped byte streams (such as appear in communication
protocols). In C, the type "unsigned char" (which is really an
integer type) is just right for this, and this is a common usage
pattern in C. For some purposes, a signed 8-bit type can be used
but is often less convenient, because the high order bit (which,
for example, often serves as a parity bit) really does not have
the semantics of a sign.
2. Address arithmetic. On many machines, address values naturally
correspond to unsigned integers. Even on the IBM-PC (286) with its
non-linear address space, the type UNSIGNED_16 is appropriate both
for segment and offset arithmetic.
Of course address arithmetic is fairly specialized. Compiler writers
use it all the time, which is why this interest sometimes seems to be
over-represented in the compiler community! On the other hand,
address arithmetic arises more often than one would think in low
level systems type programming.
C almost always provides, in the form of "unsigned", an appropriate
type for such operations, and of course pointer arithmetic implicitly
uses the same kind of unsigned operations in typical implementations.
3. General purpose use for numeric subranges, related to representation
considerations. For example, if a financial program involves dealing
with 200 companies, then an UNSIGNED_8 value can fit a company number.
4. Specialized use in applications specifically requiring modular
arithmetic. Typically these modulus values may not be a power of
2, but, as I understand this requirement, it is easier to construct
such arithmetic based on power-of-2 modular arithmetic types than
on ordinary signed types (is this true?).
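The byte-stream use in category 1 can be made concrete with a short C fragment (C is the natural vehicle here, since the note itself appeals to C's "unsigned char"); the parity-bit layout is my own assumed example, not something fixed by the note:

```c
#include <stdint.h>

/* Byte-stream manipulation with an 8-bit unsigned type. The high-order
   bit here is a parity bit, not a sign, so shifting and masking work
   cleanly; a signed 8-bit type would drag sign extension into every
   byte with the top bit set (values 16#80# .. 16#FF#). */
uint8_t parity_bit(uint8_t b) { return b >> 7; }    /* top bit, cleanly */
uint8_t data_bits(uint8_t b)  { return b & 0x7F; }  /* strip the parity bit */
```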
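On point 4, the claim that power-of-2 modular types help in building arbitrary-modulus arithmetic can at least be made plausible with a sketch (in C, with assumed names): unsigned wraparound is well defined, so overflow of the intermediate sum can be detected after the fact, which is exactly what exception-raising (or undefined) signed arithmetic does not allow.

```c
#include <stdint.h>

/* Addition modulo an arbitrary n (with a, b < n), built on power-of-2
   wraparound: the unsigned add wraps mod 2**32 instead of overflowing,
   "sum < a" detects that wrap, and a single subtraction of n (itself
   wrapping harmlessly in the overflow case) restores the residue. */
uint32_t add_mod(uint32_t a, uint32_t b, uint32_t n) {
    uint32_t sum = a + b;          /* wraps mod 2**32 if a + b >= 2**32 */
    if (sum < a || sum >= n)       /* wrapped, or merely reached n */
        sum -= n;                  /* also wraps correctly after a wrap */
    return sum;
}
```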
It seems clear that the overwhelmingly important uses are in categories
1 and 2, but there has been some support for also meeting the other needs.
Question 3: Does Ada-83 Satisfy the Need?
Generally the approach in Ada-83 for unsigned is something like:
   type UNSIGNED_8 is range 0 .. 255;
   for UNSIGNED_8'SIZE use 8;
   function "+" (A, B : UNSIGNED_8) return UNSIGNED_8;
   pragma INTERFACE ("+", INTRINSIC);  -- or some equivalent
   ...
How well does this work? Well it gives a type with the right range, right
representation, and right operations, which can be implemented in efficient
form if the compiler and code generator care to do so (I use the 9X term
INTRINSIC here, but Ada-83 compilers are of course free to implement some-
thing equivalent, and many do, for example in the Alsys compilers, the
equivalent language name is KNOWN_TO_CODE_GENERATOR.)
What are the disadvantages of this approach? There are three:
FIRST: it does not work for the largest integer type, since there is no
signed parent type that can be used to provide the (largely fictional)
base type required.
SECOND: the signed base type is pretty well hidden away, but not
completely. For example, the loop:
   for I in UNSIGNED_8'BASE'FIRST .. 100 loop
      ...
   end loop;
is an annoying case of reappearance. Of course a code generator can do
appropriate special casing to avoid any general impact on the quality
of code for the normal cases, but you can't always trust this to happen
(at least one compiler fails to do this special casing today). Similar
instances of unwelcome reappearance of the base type can show up in the
generated code. After all, when we write:
   A, B, C : UNSIGNED_8;
   ...
   A := B + C;
the formal semantics is essentially that the values of B and C are 16-bit
signed values that are passed to the + routine, which returns a 16-bit
signed value that must be range checked before the store in A. Of course
the actual code can optimize this check away, but again, you can't count
on compilers to always do the optimal thing.
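As a sketch of why the check is removable, here is the formal model transcribed into C (an illustration of the semantics, not of any actual compiler's output): the widened 16-bit signed sum, reduced mod 2**8 by the intrinsic "+", always lands back in 0 .. 255, so the range check before the store is vacuous.

```c
#include <stdint.h>
#include <assert.h>

/* The formal model: B and C widened to the 16-bit signed base type,
   added there, wrapped mod 2**8 by the intrinsic "+", then range
   checked before the store. The check can never fail. */
uint8_t add_formal(uint8_t b, uint8_t c) {
    int16_t wide = (int16_t)b + (int16_t)c;  /* 16-bit signed intermediate */
    int16_t result = wide % 256;             /* intrinsic "+" wraps mod 2**8 */
    assert(result >= 0 && result <= 255);    /* the vacuous range check */
    return (uint8_t)result;
}

/* What a decent code generator emits instead: one 8-bit add. */
uint8_t add_native(uint8_t b, uint8_t c) {
    return (uint8_t)(b + c);
}
```

Exhaustively checking all 65536 input pairs confirms the two agree everywhere.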
THIRD: When the type is instantiated in a generic, the normal signed
operators reappear inside the generic, according to the usual Ada-83
rules.
Let's look at these disadvantages in more detail. The FIRST problem, that
we cannot deal with the largest unsigned type, is a serious one. There
are three general approaches to solving it:
The first approach is to build kludges. We can define an unsigned type
which is not really a numeric type with the right operations, and then
provide methods (unary plus, use of strings etc) for introducing numeric
literals, so we get something of the sort:
   type UNSIGNED_32 is private;
   function "+" (X : INTEGER) return UNSIGNED_32;
   function "+" (X : STRING) return UNSIGNED_32;
   A, B : UNSIGNED_32;
   ...
   A := +123;
   B := +"16#8000_0000#";
Well it works, and with appropriate compiler cooperation, can generate
efficient code, but it is certainly unpleasant, and no one would regard
it as a clean solution (or one that adequately meets the requirements).
The second approach is to read the RM imaginatively (both the ARG and
URG have experimented with this imaginative reading) to allow implementors
to introduce unsigned types in packages other than STANDARD. The reasoning
is something like the following:
o STANDARD can use magic to introduce new base types, and there is no
explicit statement that such magic cannot be used in other packages.
o The requirement for symmetrical types only applies to predefined
integer types in standard, and therefore would not apply to new
magic types in other packages.
The status of this reasoning is a bit dubious (but as in the past, we have
been willing to bend the RM if the need is great enough), and, as far as
I know, the relevant AI is not yet in approved status (I think it is
597 if I remember the numbers right). [John Goodenough, can you confirm
the current status on this point]. As I remember, the reason the ARG did
not complete this work was that the generic issues were not understood. Do
these new magic types match range <> in a generic? If so, what arithmetic
operations do they get (what are their predefined operations?) This gets
tricky, and
no answers seem completely happy.
Note: at least one implementor (Verdix) has gone ahead and implemented
this solution (with an appropriate warning that its validity depends on
RM readings that have not yet been approved and are in dubious status).
I am not sure what Verdix does with generics in this case.
The third approach is simply to implement a longer signed type. For
example, if you need a 32-bit unsigned type, then provide a 64-bit
signed type which can act as the notional base type (and provide the
understood generic predefined operations). This seems the cleanest
response to the problem, but, as far as I know, no implementor has
taken this approach. Why not? I think the answer is clear. Given the
current ACVC interpretation of the Ada-83 rules, if you implement a
64-bit signed type, then you have to implement a bunch of useless
operations, such as conversions, indexing, loops and cases, which
enormously increases the difficulty of such an implementation (just
implementing the basic arithmetic operations is a much simpler, though
not trivial, task).
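To give a feel for how modest "basic arithmetic only" support really is, here is a sketch in C of a double-length add built from single-length halves (my own illustration, with assumed names; the carry detection again relies on well-defined unsigned wraparound):

```c
#include <stdint.h>

/* A 64-bit add synthesized from 32-bit halves. In two's complement the
   same bit pattern serves for the signed interpretation, so this one
   routine covers the notional 64-bit signed base type as well. */
typedef struct { uint32_t lo, hi; } dword;

dword dword_add(dword a, dword b) {
    dword r;
    r.lo = a.lo + b.lo;                  /* wraps mod 2**32 on carry out */
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry in iff the low half wrapped */
    return r;
}
```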
It is one of the annoying consequences of our current interpretation
of the rules that, given two compilers:
Compiler A: has only 32-bit integers
Compiler B: has everything compiler A has, but also implements
64-bit integers, restricting the operations to simple arithmetic.
we declare compiler A valid, but compiler B invalid, even though compiler
B clearly has more functionality (faced with such decisions, I tend to
support Stallman's contempt for restrictions imposed by standards!)
Anyway, I think this interpretation is clearly responsible for the fact
that the third approach has not been more popular. If Ada-9X was more
permissive in this respect, this solution might be more attractive --
more on that later.
Turning our attention to the SECOND problem, that of having the signed
base type reappear, this is more of an implementation problem than a
semantic problem. The one semantic problem is that it is conceptually
worrisome to have these types not be the same as their base types (a
relationship which is of course required for normal signed types). If
Ada-9X permits the explicit use of 'BASE as a type mark, then this problem
is aggravated. Nevertheless the conceptual problem is relatively minor,
and probably it is the implementation consequences that are more
significant. One question here is whether doing things right using this
Ada-83 approach (using special casing where appropriate), is any harder
than implementing a separate Ada-9X approach, such as is suggested in MD
4.0. My guess is that it's pretty much a wash (neither approach is likely
to be significantly more work).
Now, what about the THIRD problem -- difficulties in generic instantiation?
The problem is that inside the generic the normal predefined (i.e. signed)
operators reappear, which seems wrong -- or is it?
To study this further we need examples of actual usage. As I consider
these examples, I find four separate cases:
1. Cases in which signed operations are essential (even for dealing
with unsigned cases). As might be expected, examples in this
category are generics which are suitable for use with both signed
and unsigned types.
2. Cases where it does not matter whether you get the signed or
unsigned operations. As might be expected, examples in this
category are generics which are suitable for use with both signed
and unsigned types.
3. Cases of generics suitable only for use with unsigned types, where
the use of the unsigned (modular) operations is essential.
4. Cases of generics that are applicable to either signed or unsigned
types, but where the use of unsigned (modular) operations for the
case where an unsigned type is used is essential.
Let's try to find examples in each category. First, a case where signed
operations are essential. For this, consider:
   generic
      type ELEMENT is range <>;
      ...
   package STATISTICS is
      function AVERAGE (A : ARRAY_OF_ELEMENT) return ELEMENT;
      function STD_DEV (A : ARRAY_OF_ELEMENT) return ELEMENT;
      ...
   end;
It makes perfect sense to compute the average and standard deviation of
a set of unsigned or signed numbers, and of course the answers happen
to be of the appropriate type (the average is signed or unsigned, depending
on the input type, and the standard deviation always happens to be positive).
This generic is thus suitable for use by both signed and unsigned integer
values. However, the body very likely requires the use of signed arithmetic,
and *certainly* does not depend on modular arithmetic. Consider how an
average is computed -- it's not so easy, you can't just go adding up the
total and dividing because of overflow. Two obvious approaches are to
detect and handle overflow when it happens, or to fiddle around with
subtractions and scaling (generating intermediate negative numbers) to
avoid the overflow -- in either case modular arithmetic will derail the
computation.
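One version of the "subtractions and scaling" approach is a running-mean update, sketched here in C (an assumed illustration; the rounding behavior at intermediate steps is glossed over): the intermediate delta can go negative, so signed, non-modular arithmetic is essential, and modular wraparound would silently corrupt it.

```c
#include <stdint.h>
#include <stddef.h>

/* One-pass average that never overflows: instead of an overflowable
   running total, keep a running mean plus a running remainder, so that
   i*mean + rem equals the sum of the first i elements at every step.
   delta can be negative, which is why signed intermediates (and
   non-modular semantics) are essential. */
uint8_t average_u8(const uint8_t *a, size_t n) {
    int32_t mean = 0, rem = 0;
    for (size_t i = 1; i <= n; i++) {
        int32_t delta = (int32_t)a[i - 1] - mean + rem;
        mean += delta / (int32_t)i;
        rem   = delta % (int32_t)i;
    }
    return (uint8_t)mean;
}
```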
Is the use of this package for unsigned numbers realistic? I think so. Here
are two examples.
Instantiate with the UNSIGNED_8 type and use it to compute these statistics
for an incoming encoded byte stream in an attempt, part of a decryption
program, to identify the particular code being used.
Instantiate with the UNSIGNED_32 type and use it to compute these statistics
for the position of files on a disk. The average is the preferred
head rest position, and the standard deviation indicates the expected
average seek performance.
Now, what about a case in which it does not matter whether signed
or unsigned operations are used:
   generic
      type ELEMENT is range <>;
      ...
   package SORTOPS is
      procedure SORT (A : in out ARRAY_OF_ELEMENT);
      function MEDIAN (A : ARRAY_OF_ELEMENT) return ELEMENT;
      ...
   end;
Here the most likely implementation of both sort and median is to simply
use the comparison operator on elements that originally exist in the given
array. Since the signed and unsigned comparisons give the same results
(remember, for example, that the 8-bit unsigned comparison and the
16-bit signed comparison of the widened values agree), it makes no
difference which set of operators is used.
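The agreement of the two comparisons can be checked exhaustively; in C (an illustration, with int16_t standing in for the 16-bit signed base type):

```c
#include <stdint.h>
#include <stdbool.h>

/* For 8-bit unsigned values widened into a 16-bit signed base type,
   the signed comparison and the native unsigned comparison induce the
   same order, so SORT and MEDIAN work either way. */
bool less_base(uint8_t a, uint8_t b) {
    return (int16_t)a < (int16_t)b;  /* the reappearing base-type "<" */
}
bool less_native(uint8_t a, uint8_t b) {
    return a < b;                    /* the intrinsic unsigned "<" */
}
```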
Is the use of this package with unsigned types realistic? Again I think so.
Here are some suggestive examples:
Instantiate with the UNSIGNED_8 type and use it to compute the median of a
sequence of bytes. This operation is needed for certain compression
algorithms that work by encoding the distribution of bytes to make
it evenly distributed.
Instantiate with the UNSIGNED_32 type and use it to sort a bunch of modules
in memory so that the list is stored in memory order.
Now, what about the third case -- one which is intended only for unsigned
types and where the unsigned operations are essential. That's a lot harder
to find. Certainly one can find examples that are suitable for instantiation
with a particular unsigned base type. For example:
   generic
      type ELEMENT is range <>;
      ...
   package CYCLIC is
      function CRC_16 (A : ARRAY_OF_ELEMENT) return CHECKSUM;
      ...
   end;
The point here is that it makes sense to instantiate this only with a type
that is an 8-bit unsigned type (the algorithm is specific to this case).
It's still useful to create such generics, because, unless the CRC_16
operation is primitive (unlikely!) it does not get derived, so if you
really are using derived types, then this kind of generic is useful. On
the other hand, it is important to note that this is not an example where
completely different types are instantiated.
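For concreteness, here is what the body of such a CRC_16 might look like, sketched in C. The note does not specify a polynomial; I have assumed the common reflected polynomial 16#A001# (the CRC-16/ARC variant), so everything beyond the general shift-and-xor shape is an assumption:

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-at-a-time CRC-16 over a byte array, using the reflected
   polynomial 0xA001 (an assumed choice). The algorithm is inherently
   tied to 8-bit unsigned elements, which is why the generic makes
   sense only for 8-bit unsigned instantiations. */
uint16_t crc_16(const uint8_t *a, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= a[i];                              /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 /* shift out a 1: xor poly */
                            : (crc >> 1);         /* shift out a 0 */
    }
    return crc;
}
```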
Are there better examples? I don't know, I can't think of any!
The fourth case (where it makes sense to use it for either signed or
unsigned operations, but it is important that the unsigned operations
be used in the unsigned case), is really hard to find examples for.
Again, I have failed to find any non-artificial examples (let alone
convincing ones).
The bottom line here is that the Ada-83 solution may not be as bad as it
seems. Its weakness rests on the importance of convincing examples of
case 3 or case 4 (where unsigned operations are important in the generic).
Any examples would be most welcome!
Question 4: What should we do in Ada-9X?
The issue for Ada-9X is how to provide unsigned facilities with a minimum
of special new features and especially with a minimum of new core features
or peculiar magic.
Many of the questions revolve around the generics issue. For example, one
of the early proposals for 9X was to avoid the reemergence of predefined
operations in generics. One "advantage" of this was to ease the task of
dealing with unsigned by ensuring that the "undesirable" reemergence of
the signed operations did not occur. However, from our previous discussion
of this issue, it is not at all clear that the reemergence is necessarily
so undesirable, and it is also the case that it addresses the generic
case which seemed hardest to find examples for (generics that work for
both signed and unsigned types, where it is important that the unsigned
operations be used for unsigned types).
Similarly, the formulation in MD 4.0, where all unsigned types are derived
from an unsigned class type, is designed to facilitate the use of generics
where unsigned operations are important and which can be instantiated with
different unsigned types -- a case for which at least I couldn't find any
convincing examples.
Going back to our generic examples, the one case where the Ada-83 solution
was found clearly wanting was the cyclic checksum example, where we had a
generic that was suitable for instantiation with any 8-bit unsigned type.
An interesting observation is that in Ada-9X, there are two other possible
approaches to solving this problem:
Use a generic with a generic formal derived type
Use class wide programming (e.g. UNSIGNED_8'CLASS formal parameter)
At least one, and likely both, of these approaches will be in the final
Ada-9X, so this case at least is taken care of satisfactorily.
Given this observation, the Ada-83 solution, combined with other 9X features
seems to satisfy most requirements for generic instantiation. The only
exceptions are cases which we have not yet found good examples for:
Cases of generics which are instantiated with unsigned types of
different sizes, where unsigned operations are essential.
Cases of generics which are instantiated with both signed and
unsigned types, where unsigned operations must be used for the
unsigned case.
Note that at least some proposals that have been made seem to have the
functionality of dealing with the first of these two cases (which is
still in our dubious category - given the absence of examples), while
excluding the cases where we did have good examples, and which the Ada-83
approach allowed. For example, if we have a specific "mod" type, and a
corresponding "mod" generic type, then we accommodate the most dubious
case, but not ones in which signed and unsigned types can both be sensibly
used.
The Ada-83 approach will of course still be possible in Ada-9X, no matter
what other capabilities are provided. If the new unsigned capabilities of
Ada-9X are in some significant respects *less* powerful than the Ada-83
solution, then at best both solutions will persist in practice, and
at worst we may find that the new Ada-9X solutions just aren't very useful.
One scenario to be worried about is an implementor who has to decide whether
to optimize the Ada-83 solution and/or an alternative Ada-9X solution. It is
of course the case that a lot of existing code uses the Ada-83 solution.
This is particularly true for UNSIGNED_8 where the Ada-83 solution seems to
work pretty well. This means that lots of 9X code, derived from this 83 code
will continue to use the Ada-83 solution, and compiler writers are likely
to continue investing effort in making this solution work. Will they also
invest effort in an alternative 9X solution? The answer to that question is
not clear, but could very well be no in some circumstances.
If compilers appear in which the Ada-83 solution is carefully optimized,
but the 9X solution is implemented in a correct but not carefully
optimized manner, then we certainly may have situations in which the new
9X solution is useless and wastes implementor effort.
Question 5: Can we use the Ada-83 Solution for Ada-9X?
Going back to our three problems with the Ada-83 solution, what can be done
in Ada-9X to minimize the impact of these problems and make the Ada-83
approach more viable?
First there is the problem of the largest signed integer and its
corresponding unsigned type. It seems fundamental to an Ada-83 solution
that at least minimal support of the double length signed type is needed.
For example, if we want 32-bit unsigned, then we will need 64-bit signed.
Obviously this is not a zero level implementation effort. However, Ada-9X
could make it much easier to implement, for example by specifically
allowing an implementation to reject uses other than basic arithmetic.
This does of course raise generic contract problems, but you can't have
it both ways. Note that the introduction of INT_CLASS operations in 9X
tends to be exactly counter-productive here, in requiring significant
support for the longest integer type. It should also be noted that
minimal support of double-length signed integers is definitely of
significant value in its own right, so the extra implementation work
here (which would of course only be required if you needed the longest
unsigned type, and were implementing the systems programming annex)
would not be wasted.
Second there is the problem that UNSIGNED_x'BASE /= UNSIGNED_x.
As we discussed before this is a possible implementation problem in
generating efficient code, but probably not a significant one. I think
the conceptual difficulty (and slight inconsistency) is acceptable.
Third, there is the generic issue. As has been argued at some length, it
is not at all clear that this *is* a problem. Furthermore, the Ada-9X
attempts to solve this problem may actually be cures that are worse than
the disease.
It certainly seems that the "Ada-83" solution is a possible path to handling
the unsigned requirement in Ada-9X. In fact the situation in some Ada-83
compilers is not at all bad with respect to unsigned. What *is* seriously
wrong in Ada-83 is that different manufacturers solve the unsigned problem
with completely different incompatible packages, so one is in a Pascal-like
situation of writing code that depends on local junk and is seriously non-
portable.
Ada-9X at a minimum will provide a standard package interface for unsigned
operations. One advantage of staying as close as possible to the Ada-83
solution is that the Ada-9X design will provide an instantly implementable
recommendation for implementation of a corresponding package in existing
Ada-83 compilers. Such opportunities for retrofitting Ada-83 with Ada-9X
packages are an important tool for easing the transition from Ada-83 to
Ada-9X.
Summary
In this note, we have examined some issues in the implementation of unsigned
integer types, and in particular looked at the generic instantiation issue.
Ada-83 provides paths for addressing unsigned which are not necessarily
so bad, and we should look again at staying closer to the Ada-83 direction
in satisfying the unsigned requirement, rather than devising completely
new approaches which may, after all, not be entirely satisfactory, as
well as causing additional transition problems.