UNCLASSIFIED

A Survey of Computer Programming Languages
Currently Used in the Department of Defense
Audrey A. Hook, Task Leader

Bill Brykczynski
Catherine W. McDonald
Sarah H. Nash
Christine Youngblut

May 2, 1995

This document is still undergoing review and is subject to modification or withdrawal. It should not be referenced in other publications.


INSTITUTE FOR DEFENSE ANALYSES
1801 N. Beauregard Street, Alexandria, Virginia 22311

Prepared for
Defense Information Systems Agency

PREFACE

This paper was prepared by the Institute for Defense Analyses (IDA) for the Defense Information Systems Agency under the task order Ada Technology Insertion. It fulfills the task objective of surveying the high order languages currently used in the Department of Defense.

This paper was reviewed by the following IDA research staff members: Dr. Alfred E. Brenner, Dr. Dennis W. Fife, Dr. Richard J. Ivanetich, Dr. John F. Kramer, and Dr. Dale E. Lichtblau.

The authors would like to acknowledge Ms. Jean Sammet for providing many suggestions on improving the data collection form. Ms. Sammet's knowledge of programming languages and their versions was most helpful. Ms. Linda Brown, Ms. Joan McGarity, and Mr. Don Reifer also provided guidance for conducting the survey. The authors also thank the survey respondents for taking the time to complete and return the data collection form.

Table of Contents

EXECUTIVE SUMMARY

1. INTRODUCTION

1.1 Purpose
1.2 Background
1.3 Approach
1.4 Language Counting Issues
1.5 Scope
1.6 Organization

2. SURVEY METHOD

2.1 Population Identification
2.1.1 Weapon Systems Population
2.1.2 Automated Information Systems Population
2.2 Sample Selection
2.2.1 Weapon Systems Sample
2.2.2 Automated Information Systems Sample
2.3 Data Collection Form
2.4 Contact Process
2.5 Respondent Errors
2.6 Analysis Process

3. RESPONDENT AND PROGRAMMATIC PROFILE

3.1 Weapon System Responses
3.1.1 Services
3.1.2 Acquisition Category
3.1.3 Acquisition Phase
3.2 AIS Responses
3.2.1 Services
3.2.2 Acquisition (Life-Cycle) Phase

4. LANGUAGE USAGE FINDINGS

4.1 Weapon System Findings
4.2 AIS Findings

5. CONCLUSIONS AND DISCUSSION

6. RECOMMENDATION

APPENDIX A. SURVEY INSTRUMENT
APPENDIX B. SURVEY DATA

LIST OF REFERENCES
LIST OF ACRONYMS

Table ES-1. Total SLOC by Language Generation for Weapon System Responses
Table ES-2. Total SLOC by Language Generation for AIS Responses
Table ES-3. Total SLOC by General Purpose 3GL for Weapon Systems
Table ES-4. Total SLOC by 3GL for AISs
Table 1. Values Assigned to SLOC Range Estimates
Table 2. Values Assigned to Language Percentage Estimates
Table 3. Total SLOC by Language Generation for Weapon System Responses
Table 4. Total SLOC by General Purpose 3GL for Weapon System Responses
Table 5. Third Generation Special Purpose Languages
Table 6. Third Generation "Other" Languages
Table 7. Total SLOC by Language Generation for AISs
Table 8. Total SLOC by 3GL for AISs
Table B-1. Weapon Program/System Names
Table B-2. AIS Program/System Names
Table B-3. Weapon System Survey Data
Table B-4. AIS Survey Data

EXECUTIVE SUMMARY

Background and Purpose

In June 1994 the Assistant Secretary of Defense for Command, Control, Communications and Intelligence commissioned a programming language survey of the Department of Defense (DoD). The purpose was to identify the number of programming languages being used today in the DoD as compared to 20 years ago when the DoD first began developing the Ada programming language.

A 1977 study, "A Common Programming Language for the Department of Defense: Background, History and Technical Requirements," identified 450 as the minimum probable number of general purpose languages and dialects used in the DoD, but went on to say that the actual number was not known. How this estimate was derived, and what method was used to count root languages, versions, and dialects, remains open to question. For this survey, as part of establishing a sound methodology, counting the number of languages used today required input from the organizations developing or maintaining automated information systems (AISs) and weapon systems. A true census would include new systems, systems being modernized, and systems being maintained. For this study, a judgement sample of weapon systems was identified from the 1994 Presidential Budget requests for Research, Development, Test and Evaluation (RDT&E) programs exceeding $15 million and Procurement programs exceeding $25 million. Of the 1,300 programs identified, 423 were selected because they included software applications. The current DoD list of 53 major AISs was used as the sample population for non-weapon systems.

Experts in the field of programming languages have differed dramatically in classifying programming languages for counting purposes, particularly in defining the terms "dialect" and "version." For this paper, we use the term "dialect" to indicate a relatively minor change in a language, whereas "version" indicates a larger change and usually carries a different name, although the new name may be only the concatenation of a year or number to the baseline name (e.g., Jovial, Jovial 73). We counted a "version" of a root language as a distinct language. The methodology and data collection approach are explained in detail in this report to allow further expansion of the sample population.

Findings and Conclusions

Table ES-1. Total SLOC by Language Generation for Weapon System Responses

Language Generation     Total SLOC Reported
                          (in millions)
-------------------     -------------------
First                         3.90
Second                       26.30
Third:
      General Purpose       148.38
      Special Purpose         3.70
Fourth                        5.00
Fifth                         0.29

Table ES-2. Total SLOC by Language Generation for AIS Responses

Language Generation       Total SLOC Reported
                             (in millions)
-------------------     -------------------
First                          0.30
Second                         0.63
Third:
      General Purpose         38.24
      Special Purpose          0.00
Fourth                        10.81
Fifth                          0.05

Table ES-3. Total SLOC by General Purpose 3GL for Weapon Systems

Third Generation Language  Total SLOC Reported
    and Version              (in millions)
-------------------------  -------------------
Ada 83                         49.70
C 89                           32.50
Fortran pre-91/92              18.55
CMS-2 Y                        14.32
Jovial 73                      12.68
C++                             5.15
CMS-2 M                         4.23
Pascal pre-90                   3.62
Other 3GLs                      3.38
Jovial pre-J73                  1.12
Fortran 91/92                   1.00
PL/I 87/93 subset               0.64
Basic 87/93 (full)              0.48
PL/I 76/87/93                   0.36
Pascal 90 (extended)            0.29
Basic 78 (minimal)              0.17
LISP                            0.10
Cobol pre-85                    0.09
Cobol 85                        0.00
=========================    =======
Total                         148.38

Table ES-4. Total SLOC by 3GL for AISs

Third Generation           Total SLOC Reported
Language and Version          (in millions)
--------------------       ---------------------
Cobol 85                        14.06
Cobol pre-85                     8.59
Ada 83                           8.47
Basic 87/93                      2.18
C++                              2.05
C 89                             1.55
Fortran 91/92                    0.87
Fortran pre-91/92                0.47
=================               =====
Total                           38.24

Recommendation

Accepting that 450 or more general purpose programming languages were in use in the 1970s, considerable progress has been made by the Military Departments and Agencies in reducing the number to 37 in major systems that are new or being modernized. Yet the survey indicates that a substantial legacy of applications remains that use older versions of programming languages, vendor-unique languages, and military-defined languages. The maintenance costs for these applications could be reduced, and their reliability increased, by converting them to a current version of a Federal Information Processing Standard language. Automated conversion methods should offer a cost-effective technology to facilitate this conversion. Re-engineering these applications in another language is also a cost reduction opportunity: redundant code can be eliminated, software components can be reused, and modern off-the-shelf programming tools can be used to improve maintainability and reliability.

Consequently, we recommend that Service and Defense Agency Program Managers regularly review their software applications to identify a migration strategy and plan for upgrading them to current standards-based versions of languages and to modern labor-saving tools. The progress in reducing the number of languages used, as shown in this survey, indicates that further reduction should be possible. Indeed, several migration efforts are already under way.

1. INTRODUCTION

1.1 Purpose

This paper reports the results of a programming language survey commissioned in June 1994 by the Honorable Emmett Paige, Jr., Assistant Secretary of Defense for Command, Control, Communications and Intelligence, and funded by the Defense Information Systems Agency, Center for Software, DoD Software Initiatives Department. The motivation for the survey was a desire to know how many programming languages are being used in the Department of Defense (DoD) today as compared to 20 years ago when the DoD began development of the Ada language.

1.2 Background

We reviewed studies that preceded and succeeded formation of the DoD High Order Language Working Group (HOLWG) in the mid-1970s to locate a primary source for a list of languages then in use within DoD. Two major software problems were under study at that time. The first was the trend toward unaffordable costs for DoD embedded systems software and the second was the potential proliferation of Service-unique programming languages. Software cost studies of this period did not reference specific programming languages, presumably because software development costs did not appear to vary as a function of the specific programming language being used [AF-CCIP 1973, Fisher 1974]. These studies extrapolated total and projected costs based upon other factors (e.g., labor rates, purchase price, and maintenance costs for hardware and system software used to develop embedded systems).

In 1974, each Military Department independently proposed the adoption of a common programming language for use in the development of its own major weapon systems. The then-Director of Defense Research and Engineering (DDR&E), Malcolm R. Currie, called upon the Military Departments to "- immediately formulate a program to assure maximum useful software commonality in the DoD" [Fisher 1977, p. 7]. The establishment of the HOLWG was the Services' response to DDR&E. The Technical Advisor to the HOLWG, Dr. David Fisher, and the Defense Advanced Research Projects Agency sponsor, Colonel William A. Whitaker, have written historical accounts of HOLWG activities but these published papers do not document a list of programming languages in use while the HOLWG effort proceeded [Fisher 1977, Whitaker 1993]. However, Fisher's paper, which summarizes the technical requirements for a common programming language, contains the following reference to languages in use:

There are at least 450 general-purpose languages and dialects currently used in the DoD, but it is not known whether the actual number is 500 or 1500. With few exceptions, the only languages used in data processing and scientific applications are, respectively, Cobol and Fortran. A larger number of programming languages are used in embedded computer systems applications. [Fisher 1976, p. 6]

As part of the present study, Dr. Fisher was contacted concerning the origin of the oft-quoted number of 450 languages being used. He did not recall that a systematic count of languages and versions had been done by the HOLWG. Although there may be papers or reports containing a list of programming languages used by DoD, we were unable to locate them through the open literature resources for use in this study. The analytical method used in the study of DoD software costs approximated the number of compilers installed on general purpose computers. Software cost estimates were derived from analysis of data that the Services were required to report to the General Services Administration under the requirements of the Brooks Act (1965). This data included the numbers, configurations, models, locations, initial cost, and utilization of computer systems. Questions remain about the 450 estimate, including the following:

The DoD does not maintain "corporate level" information on programming languages used in contemporary software projects. Therefore, gaining a reasonably accurate understanding of programming languages being used in the DoD required input from the organizations responsible for developing or maintaining individual systems. Accordingly, these organizations are the primary source for this survey data.

1.3 Approach

This study began with the identification of data elements needed for an analysis of programming language usage in the development or maintenance of DoD weapon systems and Automated Information Systems (AISs). The 1994 Presidential Budget was used to select a sample of weapon systems to survey. The current DoD list of major AISs was used to select a sample to survey.

Service and DoD program offices provided the data on the programming languages being used to develop or maintain their operational and support software. The primary data reported included the generations and names of the programming languages being used and the amount (source lines) of software written in each programming language expressed as a percentage of the total system. Additional data reported includes the acquisition category and life-cycle phase of the program.

A data collection form was designed to record the data elements identified by the survey respondents. Potential respondents were contacted by telephone to get their agreement to participate in the survey. The data collection form was then faxed to each participant and responses were analyzed to extract the information reported in this study.

1.4 Language Counting Issues

The classification of programming languages for counting purposes has always been, and continues to be, a highly debated subject on which experts differ in both definitions and philosophy. Even when definitions are generally agreed upon, applying a definition in a particular case is often difficult, with results depending on individual judgement.

For the purposes of this report, the key issue is the difference between "version" and "dialect." We use the term "dialect" to indicate a relatively minor change in a language, whereas "version" indicates a larger change and usually carries a different name, although the new name may be only the concatenation of a year or number to the baseline name (e.g., Jovial, Jovial 73). While these definitions may appear to be abstract issues of interest only to language specialists, they have a profound effect on portability, interoperability, and counting. If only a dialect (involving small changes) is involved, training and portability may be easier than with a new version. A dialect would normally not be considered a separate language; a version may or may not be, depending on the purposes of the counting. In this report we counted versions at a level that divides conveniently between pre-current and current version years (e.g., Cobol pre-85 versus Cobol 85).

Because the practical usage of programming languages is generally at the third generation level, this survey concentrates on this level while still collecting some minimal data for other generations of languages. Consequently, the results from this survey can be compared only in a general way with the historical assertion about "450" general purpose languages as a practical illustration of what is happening in the DoD environment.

1.5 Scope

The results of this survey are drawn from a limited sample of DoD weapon systems and AISs; therefore, the survey does not provide an exact and detailed record of computer programming language usage in the DoD. Several constraints affected the precision of the results:

1.6 Organization

A description of the methods used to identify the survey population and sample is found in Section 2. A profile of the survey respondents is presented in Section 3. Analysis of the programming language data obtained by the survey is provided as findings in Section 4. Section 5 summarizes the conclusions drawn from survey results. Section 6 contains the recommendation. Appendix A contains the survey instrument and Appendix B provides the data obtained during the survey. We have provided as much detail as possible about the method and response data with the intent of providing a documented baseline for future language studies.

2. SURVEY METHOD

Several approaches to conducting the survey were initially considered. These approaches are briefly discussed below before describing in detail the selected approach.

A comprehensive DoD data call was considered, involving a formal request for specific data elements throughout the DoD. This approach was rejected because it would have encompassed a great deal of effort on the part of operational organizations whose primary mission is readiness. Historically, the response rate has been low to data calls for information that is not directly related to assigned missions.

Another approach involved reviewing several automated databases that contain programming language information on DoD systems. Several of these databases were examined as part of this study, but none were able to provide the information required. It was also difficult to determine the lineage and accuracy of the data. Therefore, these databases were not used as part of the present study.

The approach that was chosen involved direct contact with the organizations responsible for developing or maintaining systems that contain software. This section provides a detailed description of this approach, including the survey populations and samples, trade-offs made in designing the data collection form, the method used in contacting potential respondents, the methods for handling erroneous response data values, and the methods for analyzing the survey results.

2.1 Population Identification

We recognize that a census population of software would include systems that are new or undergoing major modernization as well as software in a steady state of maintenance. Software being maintained is a collection of applications that are difficult to identify because their costs are aggregated under operational costs. After a trial effort, it was clear that the time and effort needed to approximate a census population would exceed the targets agreed upon for this survey effort. Consequently, we identified a judgement population as described in the next sections.

2.1.1 Weapon Systems Population

Weapon systems include aircraft, ships, tanks, tactical and strategic missiles, smart munitions, space launch and space-based systems, command and control (C2), command, control, and communications (C3), and command, control, communications, and intelligence (C3I) systems. For the purposes of this survey, weapon system software is considered to comprise embedded, C3, and C3I systems, as well as any other software that directly supports or is critical to a weapon system's mission [STSC 1994].

Four acquisition categories (ACAT) are defined for weapon systems by DoD Instruction 5000.2 [DoDI 1991, pp. 2-2 through 2-4]:

2.1.2 Automated Information Systems Population

An Automated Information System (AIS) can be functionally described as follows:

A combination of computer hardware and computer software, data and/or telecommunications, that performs functions such as collecting, processing, transmitting, and displaying information. Excluded are computer resources, both hardware and software, that are: physically part of, dedicated to, or essential in real time to the mission performance of weapon systems; used for weapon system specialized training, simulation, diagnostic test and maintenance, or calibration; or used for research and development of weapon systems. [DoDI 1993]

These systems are often categorized as automatic data processing systems that are designed to meet specific user requirements for business functions (e.g., transaction processing, accounting, statistical analysis, or record keeping) and they are implemented on general purpose computers, including personal computers.

An authoritative source for a complete inventory of existing AISs could not be identified. Given the time and effort constraints placed on this study, the list of 53 designated major AISs was used as the AIS survey population [OASD 1994]. A major AIS is defined as one that is not a highly sensitive, classified program (as determined by the Secretary of Defense) and that, according to DoDI 8120.1, the instruction on life cycle management of AISs [DoDI 1993], is characterized by the following:

2.2 Sample Selection

The approaches used in selecting the samples from the populations of weapon systems and AISs are described in the next two sections.

2.2.1 Weapon Systems Sample

A close approximation of the population of existing weapon systems was found in a commercially available publication [Carroll 1994]. This publication provided a list of over 1,300 RDT&E and procurement programs for all Services and DoD Agencies. The list, called the Program Management Index (PMI), was based on the President's 1994 budget request and identifies all RDT&E programs with current or future fiscal budgets exceeding $15 million and procurement programs with total budgets of more than $25 million.

The PMI contains a number of programs that do not develop or maintain software for a weapon system (e.g., ammunition programs, medical research, biodegradable packaging technology) and lacks some programs that would have been of interest, such as intelligence systems, highly classified programs, and programs below the budgetary thresholds cited. The PMI was reviewed to eliminate programs that were obviously outside the population of interest. For example, programs such as 25MM Ammunition Development, Health Hazards of Military Material, and Petroleum Distributions were eliminated. Also eliminated were basic and applied research programs involving technology years away from being fielded. While such programs often involve small amounts of prototype software development, the scope of the survey constrained the size of the survey sample.

Each of the programs remaining in the PMI list was briefly examined to characterize the likelihood of being a weapon system. Weapon systems such as aircraft, ships, and tanks were (usually) easily identifiable. However, many of the programs required additional effort to determine their relevance to the population. For example, the AN/BSY-2 is an RDT&E project. Unless one is familiar with the AN/BSY-2 project, it is not immediately clear that it is the combat system for the Seawolf submarine and contains an aggregate of several million lines of software.

Of the 423 programs selected from the PMI list to form the survey sample, 142 were eliminated after we found that they had been cancelled, had been combined with another program, or contained no software. The remaining 281 programs included most of the typical weapon platforms (e.g., aircraft, ships, submarines, tanks) and many of the sensors, communication systems, and weapon subsystems.

2.2.2 Automated Information Systems Sample

Of the 53 AISs on the original list, 2 have been cancelled, 4 were primarily acquisitions for hardware and commercial off-the-shelf (COTS) software, 5 have not begun to develop software, and 4 programs had no current program manager name and telephone number. The survey sample of AISs for this study, therefore, consists of the remaining 38 major AISs.

2.3 Data Collection Form

A data collection form was designed for this survey to reduce respondent error and to present technically accurate language choices. Because data was to be collected on five different programming language generations, definitions of these language generations were adapted from the ANSI/IEEE Standard Glossary of Software Engineering Terminology [ANSI/IEEE 1990] with advice from Ms. Jean Sammet, language historian. These definitions were provided on the form as follows:

Languages were grouped on the data collection form by these generations and listed by name and version within the third generation languages category. We decided not to ask for name and version of first, second, fourth, and fifth generations because supplying that type of data would require an inordinate amount of research effort for respondents to provide and for us to validate.

An overriding concern for the data collection form was to keep it as simple as possible. Data collection forms that are lengthy or require a great deal of effort to complete are less likely to be completed and returned. Thus, the following design decisions were made with respect to the data collection form:

The key information desired from each survey respondent included the following items:

Secondary information desired from each survey respondent included the following items:

A pilot survey was conducted using a preliminary version of the data collection form. Improvements were made according to suggestions made by several respondents as well as by analysis of their responses. Appendix A provides a copy of the final data collection form.

2.4 Contact Process

The process for contacting potential survey respondents for weapon systems and AISs differed only in the means by which telephone numbers were obtained. For weapon systems, the PMI list provided the name and telephone number of each weapon system program manager. For AISs, the Office of the Secretary of Defense official responsible for oversight of that AIS was contacted to provide the name and telephone number of the AIS program manager.

The purpose of the survey was described upon contacting each potential respondent. Suggestions for filling out the form were provided and the form was then faxed to the potential respondent. If a response was not received after three weeks, a follow-up call was placed.

2.5 Respondent Errors

Some data collection forms were not completely or accurately filled out by survey respondents. For example, a respondent may have omitted the Acquisition Category because it was not known or was overlooked. The most common inaccuracy was that two different programming languages were each listed as being used for over 75% of the system. If the correct data was not immediately obvious, the respondent was either contacted for the correct data or the values reported for that data element were excluded from our analysis and logged as a non-response. Graphic displays of survey results in the next section show these errors as "data not available."

2.6 Analysis Process

The process for estimating the total number of SLOC addressed by this survey is now described. As discussed in Section 2.3, respondents were not requested to provide an exact SLOC count. Rather, they were asked to select from a range of "Total Source Lines of Code." A uniform procedure was developed for estimating the SLOC represented by each survey response form. Table 1 provides the Total SLOC ranges on the response form and the corresponding SLOC count assigned to each range. For example, if the "100-500K" range was checked on the response form, 300K was used as the total SLOC covered by that response. The values in the "Value Assigned" column of Table 1 are the midpoints of the interior ranges; the values for the top and bottom ranges were subjectively assigned. If an exact SLOC count was provided on the response form, that count was used in place of the estimate. The total SLOC addressed by this survey was therefore derived by summing the estimated (or, in some cases, exact) SLOC from each response form.

Table 1. Values Assigned to SLOC Range Estimates

"Total SLOC" Range            Value Assigned
Marked on Response Form
-----------------------       --------------
1-100K                              75K
100-500K                           300K
500-1,000K                         750K
1,000-5,000K                     3,000K
5,000+K                          6,000K
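The estimation procedure above can be sketched as a simple lookup. This is our own illustration, not the survey's actual tooling; the function and dictionary names are hypothetical.

```python
# Assumed mapping from Table 1: checked "Total SLOC" range -> SLOC estimate.
# Interior ranges use the midpoint; the top and bottom values were
# subjectively assigned by the survey team.
SLOC_RANGE_VALUES = {
    "1-100K": 75_000,
    "100-500K": 300_000,
    "500-1,000K": 750_000,
    "1,000-5,000K": 3_000_000,
    "5,000+K": 6_000_000,
}

def estimate_total_sloc(checked_range, exact_sloc=None):
    """Return the SLOC estimate for one response form.

    An exact count, when a respondent provided one, takes
    precedence over the range-based estimate.
    """
    if exact_sloc is not None:
        return exact_sloc
    return SLOC_RANGE_VALUES[checked_range]
```

For example, a form with "100-500K" checked contributes 300,000 SLOC to the survey total, while a form reporting an exact count of 82,000 contributes exactly that amount.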

Respondents were also requested to provide the percentage of the total system written in each applicable language, again by selecting from ranges. Table 2 provides the "% of Total" ranges on the response form and the corresponding percentage assigned to each range. For example, if "5-25%" was checked for Jovial 73, 15% was used as the percentage of the total system written in Jovial 73. If an exact percentage was provided on the response form, that percentage was used in place of the estimate. For each response, the SLOC for each language was derived by multiplying the total SLOC count (see Table 1) by the estimated percentage of the total system written in that language.

Table 2. Values Assigned to Language Percentage Estimates

"% of Total" System               Value Assigned
Marked on Response Form
-----------------------           --------------
<5%                                2.5%
5-25%                             15.0%
25-50%                            37.5%
50-75%                            62.5%
>75%                              87.5%
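The per-language calculation above amounts to multiplying the response's total SLOC by the fraction assigned in Table 2. Again a minimal sketch of our own, with hypothetical names:

```python
# Assumed mapping from Table 2: checked "% of Total" range -> fraction.
PERCENT_RANGE_VALUES = {
    "<5%": 0.025,
    "5-25%": 0.15,
    "25-50%": 0.375,
    "50-75%": 0.625,
    ">75%": 0.875,
}

def language_sloc(total_sloc, checked_range, exact_percent=None):
    """SLOC attributed to one language on one response form.

    An exact percentage, when provided, takes precedence over
    the range-based estimate.
    """
    if exact_percent is not None:
        fraction = exact_percent / 100.0
    else:
        fraction = PERCENT_RANGE_VALUES[checked_range]
    return total_sloc * fraction
```

Using the section's own example: a 300,000-SLOC response with "5-25%" checked for Jovial 73 attributes 45,000 SLOC to Jovial 73.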

The problems in using SLOC as a measure of the amount of software are well publicized [Jones 1991]. It is unlikely that respondents would have provided much data had specific methods for counting SLOC been required. Therefore, survey respondents were allowed to provide SLOC range estimates using their own counting methods. Clearly, non-uniform methods for counting SLOC reduce the precision of the SLOC-related portions of the survey. However, this trade-off does not detract from the primary purpose of the survey (i.e., to produce a count of programming languages being used in the DoD today).

3. RESPONDENT AND PROGRAMMATIC PROFILE

Before presenting the survey results, it is important to realize that the level of abstraction of survey responses varies (see Section 2.6 for the rationale for this decision). For example, some responses describe an entire weapon system (e.g., the V-22 Osprey), others describe different versions of a weapon system (e.g., the Standoff Land Attack Missile (SLAM) Baseline and the SLAM Upgrade), and still others describe major subsystems within a weapon system (e.g., seven subsystems on the C/KC-135). Consequently, there is not a one-to-one mapping between a survey response and a single weapon system, and survey results are therefore presented in terms of responses, not "programs" or "systems".

The survey data collection form was structured to provide the Service and Agency distribution of respondents as the demographic data of interest to DoD. Attributes being surveyed included the acquisition cost category and the life-cycle phase. This section presents observations from the weapon system and AIS responses.

3.1 Weapon System Responses

The distributions of the weapon system responses in terms of Service participation, acquisition category, and acquisition phase are presented for information purposes only.

3.1.1 Services

Figure 1 presents the distribution of responses by Services. The sample of programs selected was not evenly distributed among Army (19%), Navy (50%), and Air Force (26%); consequently, nearly half of the responses were from the Navy. The "Other" category represents responses from the Ballistic Missile Defense Organization, Defense Logistics Agency, and Defense Information Systems Agency.

3.1.2 Acquisition Category

Figure 2 presents the distribution of acquisition categories for the weapon system responses. The largest percentage of responses was from ACAT I programs, with ACAT III close behind.

3.1.3 Acquisition Phase

Figure 3 presents the distribution of acquisition phases for the weapon system responses. The Engineering & Manufacturing Development and Production & Deployment phases combine to represent 79% of the total number of responses.

3.2 AIS Responses

The distribution of the AIS responses in terms of Service participation and acquisition phase is presented for information purposes only. Acquisition category for AISs is not defined by the same rules as for weapon systems. The acquisition category data collected from the survey forms have been omitted here because they were considered unreliable (e.g., over half of the respondents did not report acquisition cost category).

3.2.1 Services

Figure 4 presents the distribution of Services contributing to the major AIS survey. The "Other" category includes the Defense Information Systems Agency and Defense Logistics Agency. There were no Marine Corps AISs in the survey samples.

3.2.2 Acquisition (Life-Cycle) Phase

Life-cycle phases for AISs are defined by DoD Instruction (DoDI) 8120.1 [DoDI 1993]. Figure 5 presents the distribution of life-cycle phases reported by the major AISs surveyed.

Figure 1. Distribution by Service for Weapon System Responses (Not Shown)

Figure 2. Distribution by Acquisition Category for Weapon System Responses (Not Shown)

Figure 3. Distribution by Acquisition Phase for Weapon System Responses (Not Shown)

Figure 4. Distribution by Service for AIS Responses (Not Shown)

Figure 5. Distribution by Acquisition Phase for AIS Responses (Not Shown)

4. LANGUAGE USAGE FINDINGS

4.1 Weapon System Findings

Finding 1: Most weapon system software is being written and maintained in (general and special purpose) third generation languages.

More than 150 million SLOC (i.e., 81%) of the weapon system software surveyed is written in third generation languages. Without historical data similar to Figure 6, trends such as the changing emphasis on particular language generations cannot be adequately identified. However, it is very likely that over the past 20 years there has been a gradual decline in the use of machine and assembly languages and a corresponding increase in third generation languages.

Table 3 on page 20 provides a numerical presentation of the same data as Figure 6. Table 4 lists the estimated total surveyed SLOC for each third generation language. The Total SLOC Reported column in Table 3 and Table 4 has been rounded to the nearest million.

Table 3. Total SLOC by Language Generation for Weapon System Responses

Language Generation  Total SLOC Reported
                       (in millions)
-------------------  -------------------
First                     3.90
Second                   26.30
Third:
    General Purpose     148.38
    Special Purpose       3.70
Fourth                    5.00
Fifth                     0.29

Table 4. Total SLOC by General Purpose 3GL for Weapon System Responses

Third Generation           Total SLOC Reported
Language and Version       (in millions)
--------------------       -------------------
Ada 83                        49.70
C 89                          32.50
Fortran pre-91/92             18.55
CMS-2 Y                       14.32
Jovial 73                     12.68
C++                            5.15
CMS-2 M                        4.23
Pascal pre-90                  3.62
Other 3GLs                     3.38
Jovial pre-J73                 1.12
Fortran 91/92                  1.00
PL/I 87/93 subset              0.64
Basic 87/93 (full)             0.48
PL/I 76/87/93                  0.36
Pascal 90 (extended)           0.29
Basic 78 (minimal)             0.17
LISP                           0.10
Cobol pre-85                   0.09
Cobol 85                       0.00
====================         ======
Total                        148.38
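
As a quick arithmetic check on Table 3 and Table 4: the general purpose entries sum to the reported 148.38 million SLOC, and the five largest languages account for about 84% of all third generation code once the 3.70 million SLOC of special purpose 3GLs from Table 3 is added (the figure cited under Finding 2). The Python below simply restates the tables' figures; it is an illustrative check, not survey tooling.

```python
# Illustrative arithmetic check (not part of the survey tooling), using
# the figures from Table 4 and the special purpose total from Table 3.
TABLE_4 = {
    "Ada 83": 49.70, "C 89": 32.50, "Fortran pre-91/92": 18.55,
    "CMS-2 Y": 14.32, "Jovial 73": 12.68, "C++": 5.15,
    "CMS-2 M": 4.23, "Other 3GLs": 3.38, "Pascal pre-90": 3.62,
    "Jovial pre-J73": 1.12, "Fortran 91/92": 1.00,
    "PL/I 87/93 subset": 0.64, "Basic 87/93 (full)": 0.48,
    "PL/I 76/87/93": 0.36, "Pascal 90 (extended)": 0.29,
    "Basic 78 (minimal)": 0.17, "LISP": 0.10,
    "Cobol pre-85": 0.09, "Cobol 85": 0.00,
}

# General purpose 3GL total, as reported at the bottom of Table 4.
total_general = round(sum(TABLE_4.values()), 2)  # 148.38 million SLOC

# Share of all third generation SLOC held by the top five languages.
top_five = sorted(TABLE_4.values(), reverse=True)[:5]
special_purpose = 3.70  # special purpose 3GL total from Table 3
share = sum(top_five) / (total_general + special_purpose)  # about 0.84
```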

The following special purpose third generation languages were also reported (Table 5).

Table 5. Third Generation Special Purpose Languages

Language   Purpose                     SLOC
ATLAS      Equipment Checkout          1.38
VHDL       Hardware Description        0.18
CDL        Hardware Description        0.22
GPSS       Simulation                  0.04
Simulink   Simulation                  0.06
CSSL       Simulation                  0.01
ADSIM      Simulation                  0.02
SPL/1      Signal Processing           1.62
SPL        Space Programming           0.01

Respondents were provided space on the data collection form to identify any programming languages being used that were not already listed. These languages formed the "Other 3GLs" noted in Table 4 on page 20, and included the languages listed in Table 5 and Table 6.

Table 6. Third Generation "Other" Languages

Language     Purpose (Unverified)
---------    -------------------------------------
DTC
LISA         Language for Systolic Array Processor
PIL          HARM Program Implementation Language
PLM
PLM-51
PLM-86
Pspice
REXX HOL
TACL TSC
VTL

Finding 2: Ada is the leading third generation language in terms of existing weapon system source lines of code.

Figure 7 presents the top five third generation languages in terms of estimated total SLOC surveyed. Survey responses reported an estimated 49+ million SLOC in Ada and 32+ million SLOC in C. These five languages represent about 84% of the total estimated third generation SLOC reported.

Finding 3: Ada is the leading third generation language in terms of number of weapon system responses indicating usage.

Figure 8 presents the top five third generation languages in terms of the number of responses reporting specific language use. As can be seen, 143 responses indicated the use of Ada and 122 responses indicated the use of C. In comparing Figure 7 and Figure 8, the key difference is the more frequent reported use of C++, albeit with fewer total estimated surveyed SLOC. Note that the data presented in Figure 7 do not represent a uniform population (i.e., survey responses address varying levels of abstraction). See Section 2.6 for details.

Finding 4: Two-thirds of the weapon system responses reported on application systems of 500,000 SLOC or less.

Figure 9 presents the distribution of responses in terms of the Total SLOC range selected on the response form. The large number of responses below 500,000 SLOC is due, in part, to responses at the subsystem level.

Finding 5: Over 70% of the weapon system responses indicated the use of more than one programming language from all five generations.

Figure 10 presents the distribution of responses in terms of the number of languages reported on a response form (single subsystem or system).

Finding 6: Multiple versions of third generation languages are being used in weapon systems.

The 1970s goal of language commonality within the weapon system community has not yet been reached, even for military standards such as Jovial and CMS-2 (Figure 11). In addition, at least two versions are in use for most Federal Information Processing Standard (FIPS) languages. Different versions of a language are almost always incompatible. Dialects of a version present subtle but not inconsequential porting problems, particularly when they are based upon older versions of the language. For example, 10 or more different dialects of pre-J73 Jovial are still in use.

4.2 AIS Findings

Finding 7: Most AIS software is being written and maintained in third generation languages.

Figure 12 presents the SLOC distribution across all generations of languages used in AIS applications. Table 7 is the numeric presentation of Figure 12. The use of first generation language (machine language) is limited to only one of the AISs. The use of assembly (including proprietary macro languages) is inconsequential when compared to weapon system applications.

Table 7. Total SLOC by Language Generation for AISs

Language Generation   Total SLOC Reported
                       (in millions)

First                  0.30
Second                 0.63
Third
   General Purpose    38.24
   Special Purpose     0.00
Fourth                10.81
Fifth                  0.05

Table 8 presents the SLOC estimates, in millions, for third generation languages.

Table 8. Total SLOC by 3GL for AISs

Third Generation          Total SLOC Reported
Language / Version        (in millions)

Cobol 85                   14.06
Cobol pre-85                8.59
Ada 83                      8.47
Basic 87/93                 2.18
C++                         2.05
C 89                        1.55
Fortran 91/92               0.87
Fortran pre-91/92           0.47
=================          =====
Total                      38.24

Finding 8: Cobol is the leading third generation language in terms of existing AIS source lines of code.

Figure 13 presents the top five third generation languages in terms of estimated total SLOC reported. Survey responses reported an estimated 22 million SLOC in two versions of Cobol and about 8 million SLOC in Ada. These five languages represent about 89% of the total estimated third generation SLOC reported.

Finding 9: Ada is the leading third generation language in terms of number of AIS responses indicating usage.

Figure 14 shows that the use of Ada was reported by more respondents, although the number of lines of source code written in Ada is less than for Cobol.

Finding 10: Most of the AIS responses reported on application systems in the range of 100K-5,000K SLOC.

Figure 15 shows that 85% of the responses are evenly distributed across the mid-size range of applications.

Finding 11: Ninety percent of the AISs surveyed indicated the use of one or more third generation programming languages.

The first column in Figure 16, showing no use of third generation languages, indicates that some applications are developed only with fourth generation languages. Fourth generation languages for such applications as database query, report writing, and screen generation are not applicable to weapon system applications, except in the support activities required to construct or maintain those applications.

Finding 12: Multiple versions of third generation languages are being used in AISs.

Figure 17 indicates that Cobol 85, the current FIPS version, has not had a significant effect on AIS applications, and that older versions of Fortran are used by more applications than the current version.

Figure 6. Total SLOC by Language Generation for Weapon System Responses (Not Shown)

Figure 7. Top Five 3GLs by Total SLOC for Weapon System Responses (Not Shown)

Figure 8. Top Five 3GLs by Reported Usage for Weapon System Responses (Not Shown)

Figure 9. Distribution of Total SLOC Size for Weapon System Responses (Not Shown)

Figure 10. Distribution of Number of Languages Reported for Weapon System Responses (Not Shown)

Figure 11. Comparison of 3GLs with Multiple Versions for Weapon System Responses (Not Shown)

Figure 12. Total SLOC by Language Generation for AIS Responses (Not Shown)

Figure 13. Top Five 3GLs by Total SLOC for AIS Responses (Not Shown)

Figure 14. Top Five 3GLs Reported by AIS Responses (Not Shown)

Figure 15. Distribution of Total SLOC Size for AIS Responses (Not Shown)

Figure 16. Distribution of Number of 3GLs Reported by AIS Responses (Not Shown)

Figure 17. Comparison of 3GLs with Multiple Versions for AIS Responses (Not Shown)

5. CONCLUSIONS AND DISCUSSION

This survey is not a universal census of weapon systems and AISs, but the results reported represent a substantial and visible portion of the population. Even though the sample size was constrained by available time and resources, a systematic method was used and documented so that others who wish to extend the sample at a later date will be able to obtain results consistent with the language counting method used in this survey. The responses received represent over 60% of the programs contacted. Based upon the survey findings, we have drawn the following conclusions about programming languages currently used in the DoD:

Conclusion 1:

The estimated 237.6 million SLOC in this survey are distributed among the five generations of programming languages currently used. The largest and most significant group, in terms of SLOC, is the third generation, which comprises 37 languages: 18 general purpose languages (including separate counts for differing versions of major languages, as shown on the survey form), 9 special purpose languages, and 10 "other" languages whose purposes could not be verified.

The issue of how to count languages makes this conclusion open to some level of debate. There are many dialects of a language version that some may choose to count as unique languages. If we accept the historical assertion that at least 450 third generation languages were used in the late 1970s, we can see that considerable progress has been made toward reducing the number of programming languages used in DoD.

Conclusion 2:

Ada 83 is being used in weapon system software and AISs that are being modernized. Using SLOC as a measure of usage, Ada ranks first (ref. Table 4 on page 20) in weapon systems. In AISs, Ada 83 has not replaced Cobol (ref. Table 8 on page 28).

The fact that Ada usage is not greater in DoD could be due to several factors. First, production quality Ada compilers and development tools were not available immediately after the language was adopted as a standard; there was a lag time of four to five years before compiler vendors could offer a choice of Ada environments for high performance host/target machines. Second, there is always inertia to overcome before change can occur, and the resistance of the DoD software development community to DoD policy on the use of Ada perpetuated that inertia. Third, it takes time to educate and train software engineers and managers to understand the language and to use it effectively.

There is an unknown quantity of legacy software being maintained by software support activities that modify code and/or provide data processing service. Many of these software applications were developed by contractors and are being maintained by the government using the language versions and dialects chosen by the development contractor. The constraints on this survey precluded our being able to systematically collect a sample from the software maintained by O&M budgets. However, we speculate that languages used in the maintenance community include more use of second generation languages (assembly) and older versions of third generation languages.

Conclusion 3:

The usage of first generation language (machine) in both weapon systems and AIS applications is insignificant (ref. Table 3 on page 20).

The existence of first generation language (machine) is almost certainly due to the continued maintenance of fairly old legacy hardware and software. It is highly unlikely that future new software will be written in first generation languages, considering the target computer systems which will be candidates for modernization.

Conclusion 4:

Second generation language (assembly) is being used in both weapon systems and AIS applications and will likely continue in minimal use.

To some extent, the use of second generation languages (assembly) is also due to the continued maintenance of legacy software. However, there are specific reasons, other than historical ones, that have necessitated the use of second generation languages. One of these reasons is special purpose hardware and, in this case, the need for second generation languages will almost certainly continue. Another reason is performance. Ten or more years ago, many systems used second instead of third generation languages for those parts of the system that were time critical. Although the performance of modern third generation languages, such as Ada or C, can meet many such performance issues now, it is likely that minimal use of assembly language will continue for some time for its real or perceived performance properties. However, this will become less of a problem as better software engineering techniques are used in code generation.

Conclusion 5:

The use of fourth generation languages is greater in AIS applications than in weapon system applications.

AIS applications have used fourth generation languages as database management products, graphical user interfaces, and shrink-wrapped tools have been acquired to improve user services. The SQL standard has not only promoted relational database products but has provided an alternative to the continued use of proprietary languages for data access. The modest use of fourth generation languages by the weapon system community could indicate that COTS products are seldom used to develop software or that the respondents did not consider the development environment as appropriate for this survey.
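
As a concrete illustration of the declarative data access the SQL standard enables, the sketch below embeds standard SQL in a small program. This is purely illustrative and anachronistic to the survey itself: it uses Python's sqlite3 module, and the table name and rows are hypothetical (the SLOC figures echo Table 8).

```python
# Illustrative sketch only: standard SQL, embedded via Python's sqlite3
# module, performing the kind of declarative data access the SQL standard
# made portable across database products. Table and values are
# hypothetical examples drawn from the Table 8 figures.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE language_sloc (language TEXT, sloc_millions REAL)")
conn.executemany(
    "INSERT INTO language_sloc VALUES (?, ?)",
    [("Cobol 85", 14.06), ("Cobol pre-85", 8.59), ("Ada 83", 8.47)],
)

# A single declarative query replaces the procedural loop a 3GL would need.
total = conn.execute("SELECT SUM(sloc_millions) FROM language_sloc").fetchone()[0]
```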

Conclusion 6:

Fifth generation (artificial intelligence) languages are hardly used in weapon system and AIS applications.

There are several reasons for the very low usage of fifth generation languages. One reason is that the immaturity of fifth generation AI languages does not recommend their use in operational weapon systems. Other reasons could be the lack of exploratory R&D programs in the sample or that many AI problems are being solved with third generation languages.

Conclusion 7:

In both weapon system and AIS applications, the data show that older versions of programming languages are being used. The perpetuation of applications written in these older versions can create portability and re-use problems.

For example, the continued use of several versions of CMS-2, Jovial, Fortran, Cobol, and platform/vendor-unique languages may be motivated by short-term economic views. There are re-engineering aids and conversion tools that make reimplementing existing software more feasible and practical than continuing to maintain this multi-version software.

Conclusion 8:

Both weapon system and AIS applications use several languages.

Even if only one language were used, software commonality, portability, and interoperability would be imperfect. With modern programming languages and compilers, increased use of COTS products and re-use of software components, it is possible to produce applications with components written in different languages. Ada, with its specified pragma interfaces, is a language that is well suited to being used with other languages in multi-language applications.
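
The pragma interface mechanism cited above is specific to Ada, but the underlying idea, declaring a foreign routine's signature so that components written in different languages can call one another, can be sketched in any modern environment. The fragment below is illustrative only and uses today's Python ctypes module rather than anything from the survey era; it declares the argument and result types of a C library routine before calling it across the language boundary.

```python
# Illustrative only: binding a C routine into a Python program, analogous
# in spirit to declaring a foreign subprogram with Ada's pragma Interface.
import ctypes

# Load the C runtime already linked into the interpreter (POSIX systems).
libc = ctypes.CDLL(None)

# Declare the foreign routine's interface before use: int abs(int).
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)  # call across the language boundary
```

The declaration step is what makes the mix safe: without an agreed interface specification, the caller and callee would disagree on argument and result representations.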

6. RECOMMENDATION

Accepting the estimate of 450 or more general purpose programming languages in use in the 1970s, the Military Departments and Agencies have made considerable progress in reducing that number to 37 in major systems that are new or being modernized. Yet the survey indicates that a substantial legacy of applications remains, using older versions of programming languages, vendor-unique languages, and military-defined languages. The maintenance costs for these applications could be reduced, and their reliability increased, by converting them to a current version of a Federal Information Processing Standard language. Automated conversion methods should offer a cost-effective technology to facilitate this conversion. Re-engineering these applications in another language is also a cost reduction opportunity: redundant code can be eliminated, software components can be re-used, and modern off-the-shelf programming tools can be used to improve maintainability and reliability.

Consequently, we recommend that Service and Defense Agency Program Managers regularly review their software applications to identify a migration strategy and plan for upgrading them to current versions of standards-based languages and to modern labor-saving tools. The progress in reducing the number of languages used, as shown in this survey, indicates that further reduction should be possible. Indeed, several migration efforts are already under way.

APPENDIX A. SURVEY INSTRUMENT

The data collection form used in the survey is provided in the pages that follow. Two minor changes to the "System Life-Cycle" portion of the data collection form were made to tailor it for the AIS survey: 1) Engineering and Manufacturing Development was replaced by Development, and 2) Major Modification was replaced by Operations and Support.

Language Survey

1. Name of Program: ________________________________________

2. System Name (if different than above): ______________________

3. Acquisition Category: I: ___, II: ___, III: ___, IV: ____

4. System Life-Cycle Phase:         5. Total Current Source Lines of Code:

         Concept Exploration:  ____              1,000 - 99,999:  ____

    Demonstration/Validation:  ____           100,000 - 499,999:  ____

Engineering and Manufacturing 
                 Development:  ____           500,000 - 999,999:  ____

   Production and Deployment:  ____       1,000,000 - 4,999,999:  ____

          Major Modification:  ____                  5,000,000+:  ____

Please complete the remaining portion of the form by indicating the programming languages currently being used in developing or maintaining all the software (e.g., operational, support) for this program/project.

Language Type       Language Name and Version
                    (for each language used, check one "% of Total" column:
                     <5%, 5-25%, 25-50%, 50-75%, >75%)
-----------------------------------------------------------------------------
First Generation    Machine

Second Generation   Assembly (Provide Count of Distinct Versions
                    Being Used): ___________

Third Generation    Ada 83
                    ALGOL:      ALGOL 60
                                ALGOL 68
                    APL 89
                    BASIC:      BASIC 78 (minimal)
                                BASIC 87/93 (full)
                    C 89
                    C++ (identify version on page 4)
                    CHILL 89
                    COBOL:      COBOL pre-85
                                COBOL 85
                    CMS-2:      CMS-2 Y
                                CMS-2 M
                    FORTRAN:    FORTRAN pre-91/92
                                FORTRAN 91/92
                    JOVIAL:     JOVIAL pre-J73
                                JOVIAL J73
                    LISP (identify version on page 4)
                    MUMPS:      MUMPS pre-90
                                MUMPS 90
                    Pascal:     Pascal pre-90
                                Pascal 90 (extended)
                    PL/I:       PL/I 76/87/93
                                PL/I 87/93 subset
                    PROLOG (identify version on page 4)
                    SIMULA:     SIMULA pre-67
                                SIMULA 67
                    Smalltalk (identify version on page 4)
                    TACPOL
                    Others: list and identify on page 4

Fourth Generation   e.g., SQL, RPG, Clipper, Visual BASIC

Fifth Generation    e.g., Knowledge/rule base shells

Special Purpose Languages

Application Area          Generic Language Name    Version Name and/or Number
                          (for each language used, check one "% of Total"
                           column: <5%, 5-25%, 25-50%, 50-75%, >75%)
------------------------------------------------------------------------------
Equipment Checkout        ATLAS                    __________________
Hardware Description      VHDL                     __________________
                          CDL                      __________________
Simulation                GPSS                     __________________
                          SIMSCRIPT                __________________
                          CSSL                     __________________
Signal Processing         SPL/1                    __________________
Space Programming         SPL                      __________________
Statistics                SPSS                     __________________
                          SAS                      __________________
Robotics Languages        AL                       __________________
                          AML                      __________________
                          KAREL                    __________________
Expert System Languages   KRL                      __________________
                          OPS5                     __________________

The following definitions are provided for language generation:

Please provide the language name, version, generation, application area (for special purpose languages) and a reference to the manual (i.e., title, date and publisher) for each programming language or version not listed on page 2 or 3. Provide any additional information that would prove useful in uniquely identifying the language.

Language Name, Version, etc. Manual Title, Date, and Publisher
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________

Additional Comments

_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________