This paper was presented at Quality Week, San Francisco, CA, May 27-30, 1997.

Error, Fault, and Failure Data Collection and Analysis

Dolores R. Wallace and Laura M. Ippolito

National Institute of Standards and Technology

Gaithersburg, MD 20899

dwallace@nist.gov; ippolito@nist.gov

Herbert Hecht

SoHaR Incorporated

8421 Wilshire Blvd. Suite 201

Beverly Hills, CA 90211

herb@sohar.com

Abstract

The collection and analysis of software error, fault, and failure data from many high integrity systems may yield standard reference data for matching development and assurance methods with characteristics of a specific system. Profiles derived from the data may help researchers to identify areas where new methods of error prevention and detection are most needed. The National Institute of Standards and Technology has initiated a program on error, fault, and failure data collection and analysis to address these two topics.

Keywords

Data collection; error; fault; failure; high integrity software; reference data; software quality; taxonomy; world wide web (WWW).

1. Introduction

The development and assessment of software for high integrity systems requires methods that prevent or detect software faults during development and potential system faults and failures before they result in operational failure. It is difficult to predict how well development and assurance methods succeed in prevention and detection. Because introducing new technologies is costly, companies are reluctant to change unless they have confidence that the new methods will benefit them. Failures in high integrity systems are rare (and usually costly), and a single system usually does not accumulate enough data to permit meaningful statistical evaluations. Without sufficient data from many projects in various domains, researchers have difficulty identifying the types of problems for which new development and assurance methods are needed. The results of a Call for White Papers issued by NIST revealed a strong need for an objective organization to address these problems [NIST95].

The mission of the Information Technology Laboratory (ITL) at NIST is to stimulate U.S. economic growth and industrial competitiveness through technical leadership and collaborative research in critical infrastructure technology (tests and test methods) to promote better development and use of information technology. ITL will provide tests and test methods to facilitate a usable, scalable, interoperable, and secure information technology infrastructure. One of the primary goals of ITL is to assure that U.S. industry, academia, and government have access to accurate and reliable test methods, data, and reference material.

Software researchers need project data on errors, faults and failures to identify characteristics across many projects to develop benchmarks and profiles for selecting methods and software tools. Providing this data is very closely related to ITL's mission. Consequently, ITL has initiated a project for error, fault, and failure data collection and analysis, referred to as the EFF project. The EFF project recognizes the data needs for the development of high integrity systems, and supports the mission of ITL.

Section 2 of this paper describes the goals and tasks of this project, while section 3 provides the status as of March, 1997. Section 4 provides the concepts and objectives of the world wide web (WWW)-based tools that are being built to support this project, while section 5 provides details of the first tool developed in this project.

2. The Error, Fault, Failure Data Collection and Analysis Project

The purpose of the EFF project is to provide reference material consisting of methods and data on software errors, faults, and failures. The EFF project will help industry and researchers assess software system quality by collecting, analyzing, and providing error, fault, and failure data and by providing data collection and statistical methods and tools for the analysis of software systems.

Project data are needed to determine trends on broader concepts such as:

Software error profiles: prevalent types (e.g., unachievable path, initial value, control-flow)

Root technical causes of the errors and the development and assurance methods likely to prevent or detect those errors

Types of problems requiring fault tolerance provisions in the software, and

Error, fault, and failure problems not solvable or measurable by current methods or tools.

Data from many individual projects are needed to develop these and other benchmarks and to provide researchers with sufficient samples to develop new analytic methods and to identify where new methods are needed. Projects and their sponsoring companies need similar data to understand where specific error types are likely to occur and the frequency with which they occur. From various analysis methods, developers may locate troublesome parts of their programs and may adjust their development methods, adapt their testing processes, and maintain records for controlling their product quality.

Possible benefits to industry from the EFF project include:

Reference materials for evaluating and selecting methods and software tools

Taxonomy and frequency profiles of errors, faults, and failures

Reference methods for analyzing software data, and

Data collection and preliminary analysis tools, using World Wide Web (WWW) technology.

By making data from various domains available to researchers, benefits to research may include:

Software error, fault, and failure data available for analysis

Qualitative and statistical methods for data analysis and measurement

Statistical basis to help with understanding error, fault, and failure data.

The EFF project involves the following tasks:

Generate a standard data collection structure derived from IEEE and industry nomenclature and formats, with provisions for anonymity of data and removal of any proprietary information.

Identify industry, government, academic collaborators/contributors to populate the data repository. Provide a WWW-based data collection and statistical summary tool for individual contributors. Address privacy issues.

Enter data from existing collections. Solicit additional data from contributors. Perform initial summary and analysis. Index data and summarize by common descriptive analysis.

Make sanitized data publicly available through WWW-based facilities at NIST. Validate and sanitize data from contributors. Identify, procure commercial database management system. Refine the data collection and classification methodology as needed.

Develop methods and tools for qualitative and statistical analysis. Identify or develop methods or tools for viewing data, for analyzing data, for measuring impact of methods on software quality, and for assessing relationships of project factors to software quality.

Conduct analyses of collected data. Develop frequency profiles. Conduct analyses to provide understanding of impact of various development and diagnostics methods on failures. Report results/findings.

The plan for the EFF project is aggressive; a primary risk is that the group of willing data contributors will be very small. The fact that NIST's traditional role in defining standards and measures for industry includes objectivity and the ability to protect any proprietary information may help to overcome industry reluctance to provide data. A simple data collection tool, with error-tracking and simple statistics for a contributor's project, may be an incentive for contributors. Another primary risk, normalizing data from diverse environments, is discussed in section 5.

3. Current Status

Several EFF tasks have been initiated and are progressing simultaneously. NIST has formed a collaborative relationship with SoHaR, Inc. under a Cooperative Research and Development Agreement (CRADA) for which SoHaR, Inc. will be an active participant in the project. Another collaborative activity included a meeting in September, 1996, of researchers and industry representatives to discuss problems likely to be encountered and the results of NIST research in identifying a draft data structure, or model. These industry representatives and researchers will continue to provide guidance. Currently the EFF project is seeking contributors of data.

Research on data collection and analysis revealed an organized approach to data collection by Basili [BASI], shown in figure 1. At the September meeting, attendees studied taxonomies for faults and models for descriptive data about a project and its errors, faults, and failures (figure 2) to guide the EFF project in selecting the data to be requested of contributors. Several researchers have agreed to provide data from existing collections. Because such data will vary in content, data from existing collections will be adapted into the data models evolving in this project. The data models are discussed in section 4.

BASILI'S GUIDELINES
1. Establish the goals of the data collection

2. Develop a list of questions of interest

3. Establish data categories

4. Design and test the data collection form

5. Collect and validate data

6. Analyze data

Figure 1. Guidelines on Data Collection and Analysis

The EFF project's design includes two data collection tools, one for faults and one for failures. Both contain statistical and tracking capabilities for use by data contributors. There will be a public data base for viewing and analyzing the data collection which will reside on a computer at ITL. Currently, a prototype exists for the fault data collection tool, which is described in section 5. Review of this tool by EFF participants should be completed by June, 1997.

Information on high integrity software system assurance (HISSA) provided through the WWW page at / links to the taxonomy-based Reference Information for Software Quality (RISQ) [NIST97]. RISQ provides direct access to artifacts for software quality. Among the artifacts (e.g., documents, tools, code) is one called data. The EFF data will reside in RISQ as a data artifact. The data will be accessed through a SQL/ORACLE system residing on the HISSA server. Any contributed data will have been sanitized on a non-publicly-accessible system first, and then will be entered into the ORACLE database. Statistical functions, fault and failure tracking capabilities, and interfaces to publicly-available tools for examining the data will be available on the public system. Users of the data will also be able to download and analyze the data with their own tools.

Taxonomies and Data Models

Simple, few elements
    Rubey                 late 1970's
    Glass                 1981
    Weiss/Basili          1985
    Knuth                 1989
    Grady                 1992
    Fenton/Pfleeger       1996

Security-oriented
    Landwehr              1995
    Aslam                 1995

CMU experiment: need data from complex projects
    Greenberg/Siewiorek   1996

Detailed, life cycle oriented
    IEEE Standard 1044    1993
    Beizer                1990
    Hecht/Wallace         1996

Figure 2. Diversity Among Taxonomies

4. The EFF Tool Concepts

Important criteria imposed on the EFF project at the September, 1996, meeting are to keep any requests for data as simple as possible and to avoid the use of ambiguous terms. Asking for too much data could result in no data, and yet the collected data must be meaningful. The contributors must understand the meaning of each entry on the data collection tool. A condition on the EFF tools that evolved at the meeting is that any data collection tool should provide a service to the contributors that will make collecting the data worthwhile to them during and following the collection process.

Both the fault data collection and the failure data collection tools are to be WWW-based tools that a contributor could easily install on an organization's server. Almost everyone has access to the WWW through at least an early version of a browser and can download Perl. The amount of data requested will be kept minimal, and definitions of terms will be provided to prevent ambiguous responses. Other assumptions are that for every project, there will be one set of descriptive data about the project environment and there may be many unique faults. The fault data may be entered by one or more people. The data, as collected by the organization, is available only to that organization.

The fault data collection tool will contain some statistical capabilities such as number of faults, number of faults of a specific type, and numbers of faults relative to other information. The fault tracking capability has not been completely defined, but will address items such as the number of faults not yet fixed and the average amount of time between discovery and fixing of a fault. The organization will have access to the statistical and fault tracking capabilities of the tool so that it can monitor its project.
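The statistics and tracking items named above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`fault_type`, `discovered`, `fixed`), and the sample values are hypothetical and are not taken from the EFF tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fault:
    """Hypothetical fault record; field names are illustrative only."""
    fault_type: str
    discovered: date
    fixed: Optional[date] = None  # None means the fault is still open


def faults_by_type(faults):
    """Count the faults of each type."""
    return Counter(f.fault_type for f in faults)


def open_faults(faults):
    """Return the faults that have not yet been fixed."""
    return [f for f in faults if f.fixed is None]


def mean_days_to_fix(faults):
    """Average days between discovery and fix, over fixed faults only."""
    durations = [(f.fixed - f.discovered).days for f in faults if f.fixed is not None]
    return sum(durations) / len(durations) if durations else None


# Made-up sample data for illustration.
sample = [
    Fault("initial value", date(1997, 1, 6), date(1997, 1, 10)),
    Fault("control-flow", date(1997, 1, 8), date(1997, 1, 20)),
    Fault("initial value", date(1997, 2, 3)),
]
```

Keeping open faults out of the average avoids guessing a fix date for them; a real tool might instead report the age of open faults separately.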

Similarly, the failure data collection tool will provide a system description record and the failure records. The content of these will differ from the fault tool, and while the structure has been drafted, the tool development is still in progress.

NIST receives the data when the contributing organization is ready to release the data. NIST will examine and validate the data, remove any public identification of the contributing organization, and install the data on the public EFF data base. If requested, the organization's contact will be notified of the identity of its data within the EFF data base.
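The sanitizing step can be pictured with a short Python sketch. The list of identifying fields, the `dataset_id` key, and the use of a salted hash are assumptions made for this example; the paper does not specify how NIST removes identification or how a contributor is told which public records are its own.

```python
import hashlib

# Hypothetical names of identifying fields; the real EFF field list may differ.
IDENTIFYING_FIELDS = {"company_name", "company_contact", "project_name", "finder", "fixer"}


def sanitize(record, salt):
    """Return a copy of record without identifying fields, plus an opaque key.

    The opaque dataset_id is an assumption of this sketch. Given privately to
    a contributor, it would identify that contributor's records in the public
    data without naming the organization.
    """
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    source = f"{salt}:{record.get('company_name', '')}:{record.get('project_name', '')}"
    clean["dataset_id"] = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    return clean
```

The salt is assumed to be a secret held only on the non-public system, so the key cannot be recomputed from a guessed company name.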

5. Current EFF Fault Tool

The EFF project chose to develop the fault data collection tool first, and will develop the failure data collection tool along the same lines. Originally the fault data collection tool contained a data structure derived from [HECHT] and the taxonomies indicated in figure 2, but many changes were made after the September meeting.

Within an organization, and particularly within a single project, the environment features for a single project may be the same and will be well known to all who work on the project. These features include elements like development processes, the standards which govern the project, the programming language, quality practices and several others. In this case, drawing conclusions about whether a specific method worked well may be easy, provided other experimentation practices are followed [ZELK]. When data are collected across many projects and different organizations, these environment elements are likely to differ as are characteristics about the organization. Researchers may combine data from like elements but they need to be aware of differences and establish how they will treat them.

To aid with the problem of normalizing data from many environments, the project description data are fairly extensive. Some questions address the company, such as how many years the company has developed software in the application domain. Others, such as primary software language, are critical to understanding some project information, such as size when given in SLOC, and to attempting analyses across several projects. Size, the types of standards or quality practices applied to the project, and language are only a few characteristics for which fault profiles may change across projects. Researchers generally understand the constraints implied by the type of data shown in Figure 3.

TYPE OF PROJECT DATA REQUESTED
Project name                   CMM level
Company name & contact         Primary software language
General project description    Size of new & reused code
Date project began             % of COTS code
Customer type                  Requirements, design methods, automation
Potential severity             Performer of QA / VV
Criticality                    Company experience: in domain
Relation to hardware           Company experience: w/ software in domain
Application type               Company experience: w/ software in general
Development purpose            Company quality practices
Contractual requirements

Figure 3. The EFF Fault Tool Project Information

These data are entered only once and in most cases are very straightforward. Most questions are concerned with the type of software system and its unique characteristics.
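As one illustration of why project description data such as primary language and size matter when projects are combined, the following Python sketch pools fault counts per thousand SLOC within each language, so that sizes counted in different languages are not mixed. The dictionary keys and the sample numbers are hypothetical, and fault density is only one of many possible normalizations.

```python
from collections import defaultdict


def fault_density_by_language(projects):
    """Faults per thousand SLOC, pooled within each primary language.

    Each project is a dict with hypothetical keys:
    'language', 'sloc' (new plus reused code), and 'fault_count'.
    """
    totals = defaultdict(lambda: [0, 0])  # language -> [faults, sloc]
    for p in projects:
        totals[p["language"]][0] += p["fault_count"]
        totals[p["language"]][1] += p["sloc"]
    return {
        lang: faults / (sloc / 1000.0)
        for lang, (faults, sloc) in totals.items()
        if sloc > 0  # skip languages with no recorded size
    }


# Made-up sample data for illustration.
projects = [
    {"language": "Ada", "sloc": 40000, "fault_count": 80},
    {"language": "Ada", "sloc": 10000, "fault_count": 20},
    {"language": "C", "sloc": 25000, "fault_count": 100},
]
```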

DESCRIPTION OF FAULT AT DISCOVERY
Date of fault discovery                   Potential severity
Person who found fault                    Discovery method
Identifier of where fault discovered      Fault classification
Artifact where fault discovered           Degree method automated
Activity during which fault discovered    Discovery method effort
Symptom leading to fault discovery

Figure 4. The EFF Fault Tool Discovery Information

The fault record contains three basic types of information: general information such as project name, fault record number, and company contact; descriptive parameters when the fault was first discovered (figure 4); and data describing correction of the fault (figure 5). Several challenges were encountered in making the tool one which personnel would be willing to use. For example, few people want their names forever associated with a major fault. Yet, to be useful to the company for follow-up purposes, the fault record should contain the name of a contact, so that other members of the organization may ask questions about the fault. The contact should be someone who either found the fault, found the origin of the fault, developed the artifact, corrected the artifact(s) containing the fault, or is familiar with the fault history. The person entering the data will not be identified automatically as any of these.


DESCRIPTION OF FAULT CORRECTION
Date fix completed
Fixer of fault
Identification of artifacts fixed
Was this caused by a previous fix?
Activity during which fix is made
Fix method
Degree method is automated
Fix method effort

Figure 5. The EFF Fault Tool Fix Information
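The three parts of the fault record (general information, discovery data, and correction data) could be modeled as follows. This is a minimal Python sketch: the class and attribute names are hypothetical, and only a subset of the fields in figures 4 and 5 is shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Discovery:
    date_discovered: date
    activity: str              # activity during which the fault was discovered
    artifact: str              # artifact where the fault was discovered
    symptom: str
    discovery_method: str
    fault_classification: str
    potential_severity: str


@dataclass
class Correction:
    date_fix_completed: date
    activity: str              # activity during which the fix is made
    fix_method: str
    caused_by_previous_fix: bool


@dataclass
class FaultRecord:
    project_name: str
    record_number: int
    contact: str               # entered explicitly; never defaulted to the data-entry person
    discovery: Discovery
    correction: Optional[Correction] = None  # absent until the fault is fixed

    def is_open(self) -> bool:
        """A fault is open until correction data have been recorded."""
        return self.correction is None
```

Making `contact` a required, explicitly entered field reflects the point that the person entering the data is not automatically recorded as the finder, the fixer, or the contact.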

It is important to differentiate between the activity where the fault was manifested and the actual source of the fault. For example, if the source is in the design, and the fault is carried into and found in the code, then researchers will be more interested in the design processes. The developing organization, in turn, will want to verify that all locations affected by the fault have been corrected. The selection of data fields was driven by trying to foresee questions that researchers and organizations might ask. Obviously, foreseeing every question is impossible, but we believe the fault data will cover many questions.
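The distinction between the source of a fault and the activity in which it was found lends itself to a simple cross-tabulation. In the Python sketch below, the keys `source_activity` and `discovery_activity` are hypothetical names chosen for this example.

```python
from collections import Counter


def origin_discovery_table(faults):
    """Count faults for each (source activity, discovery activity) pair."""
    return Counter((f["source_activity"], f["discovery_activity"]) for f in faults)


def escaped(faults):
    """Faults found in an activity other than the one that introduced them."""
    return [f for f in faults if f["source_activity"] != f["discovery_activity"]]


# Made-up sample data for illustration.
faults = [
    {"id": 1, "source_activity": "design", "discovery_activity": "code"},
    {"id": 2, "source_activity": "code", "discovery_activity": "code"},
    {"id": 3, "source_activity": "design", "discovery_activity": "test"},
]
```

A table of this kind would show a researcher how many design faults were carried into later activities, which is the case the example in the text describes.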

One of the biggest challenges is of course the classification of faults and symptoms. The classifications must be easily understood and mutually exclusive. Also, we do not want a long pull-down menu that taxes the patience of contributors by forcing them to scroll through a long list. The problem is to synthesize terms from the many taxonomies we examined. If later we find that contributors are willing to scroll through multi-level lists, then it may be feasible to go to more fine-grained fault classifications, e.g., Beizer's taxonomy [BEIZ]. The expectation is that feedback from users of this first version of the EFF fault data tool will be useful in defining the features for the second version.

6. Summary

The ITL at NIST is planning to provide standard reference materials and data for software to assist industry and research in improving high integrity software systems. One project, EFF, is concerned with fault data collected during software development and failure data collected from systems in operation. The purposes are to develop profiles of characteristics about the projects and systems that may be useful to industry in making decisions about their current projects and to provide data to researchers which in turn may aid in solving some of the difficult problems in producing high integrity software systems.

To achieve these purposes, the EFF tool set will contain two data collection tools and a public tool for viewing and analyzing the data collection. All tools will provide statistics and fault and failure tracking functions. Other contributors who already have data are also of interest. The EFF fault tool is ready for release, and a request has gone out seeking contributors. The EFF project already has one industry partner, with prominent researchers and industry representatives cooperating with NIST.

7. Acknowledgment

The authors appreciate the support of Mark Zimmerman of NIST in converting the EFF tool designs into the EFF tools. We are grateful to the participants in the September, 1996 meeting at NIST: Dr. V. Basili, University of Maryland; Dr. S. L. Pfleeger, Howard University; Dr. P. Keiller, Howard University; T. Rhodes, NIST; Dr. M. Zelkowitz, University of Maryland & NIST; Dr. C. Michael, RST Corporation; J. Calvert, Nuclear Regulatory Commission; J. Gaffney, Lockheed-Martin; Dr. N. F. Zhang, NIST; S. Hissam, CARDS.

8. References

[BASI] V. R. Basili and D. M. Weiss, "A Methodology for Collecting Valid Software Engineering Data," IEEE Transactions on Software Engineering, Vol. SE-10, No. 6, November, 1984, 728-738.

[BEIZ] Boris Beizer, Software Testing Techniques, International Thomson Computer Press, 1990.

[HECHT] Herbert Hecht and Dolores Wallace, "Project Data to Support High Integrity Methods," Nuclear Plant Instrumentation, Control and Human Interface Technologies Conference, May 6-9, 1996, Pennsylvania State University, State College, PA.

[NIST95] Dolores Wallace and Marvin Zelkowitz, NISTIR 5677, "Center for High Integrity Software System Assurance - Initial Goals and Activities," U.S. Department of Commerce, Technology Administration, National Institute of Standards and Technology, June, 1995.

[NIST97] Charles B. Weinstock and Dolores R. Wallace, NISTIR 5954, "RISQ: A WWW-Based Tool for Referencing Information on Software Quality," U.S. Department of Commerce, Technology Administration, National Institute of Standards and Technology, January, 1997.

[ZELK] Marvin V. Zelkowitz and Dolores R. Wallace, NISTIR 5889, "Experimental Models for Software Diagnosis," September, 1996.