1.1 Software testing
This document describes the structured testing methodology for software testing. Software testing is the process of executing software and comparing the observed behavior to the desired behavior. The major goal of software testing is to discover errors in the software [MYERS2], with a secondary goal of building confidence in the proper operation of the software when testing does not discover errors. The conflict between these two goals is apparent when considering a testing process that did not detect any errors. In the absence of other information, this could mean either that the software is high quality or that the testing process is low quality. There are many approaches to software testing that attempt to control the quality of the testing process to yield useful information about the quality of the software being tested.
Although most testing research is concentrated on finding effective testing techniques, it is also important to make software that can be effectively tested. It is suggested in [VOAS] that software is testable if faults are likely to cause failure, since those faults are then most likely to be detected by failure during testing. Several programming techniques are suggested to raise testability, such as minimizing variable reuse and maximizing output parameters. In [BERTOLINO] it is noted that although having faults cause failure is good during testing, it is bad after delivery. For a more intuitive testability property, it is best to maximize the probability of faults being detected during testing while minimizing the probability of faults causing failure after delivery. Techniques suggested to raise this kind of testability include assertions that observe the internal state of the software during testing but do not affect the specified output, and multiple version development [BRILLIANT], in which any disagreement between versions can be reported during testing while a majority voting mechanism helps reduce the likelihood of incorrect output after delivery. Since both of those techniques are frequently used in practice to help construct reliable systems, this version of testability may capture a significant factor in software development.
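The sketch below illustrates the multiple version idea. It runs three hypothetical versions of one specification (squaring an integer), one of which has a deliberately seeded fault. In testing mode every disagreement is recorded for investigation, and in either mode a simple majority vote picks the output. The function names and the seeded fault are illustrative assumptions, not taken from [BRILLIANT].

```python
from collections import Counter

def square_v1(x: int) -> int:
    return x * x

def square_v2(x: int) -> int:
    return x ** 2

def square_v3(x: int) -> int:
    # Deliberately seeded fault (hypothetical): wrong answer for one input.
    return 48 if x == 7 else abs(x) * abs(x)

VERSIONS = (square_v1, square_v2, square_v3)

def run_versions(x: int, testing: bool, disagreements: list) -> int:
    """Run every version, report disagreement while testing, then vote."""
    outputs = [version(x) for version in VERSIONS]
    if testing and len(set(outputs)) > 1:
        # During testing, any disagreement exposes a fault in some version.
        disagreements.append((x, outputs))
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError(f"no majority for input {x}: {outputs}")
    # After delivery, the majority masks a fault confined to one version.
    return value

disagreements = []
results = [run_versions(x, testing=True, disagreements=disagreements) for x in range(10)]
print(results)        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(disagreements)  # [(7, [49, 49, 48])]
print(run_versions(7, testing=False, disagreements=[]))  # 49
```

The same fault is thus reported during testing but does not produce incorrect output when the voter is used after delivery, which is the combination of properties described above.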
For large systems, many errors are often found at the beginning of the testing process, with the observed error rate decreasing as errors are fixed in the software. When the observed error rate during testing approaches zero, statistical techniques are often used to determine a reasonable point to stop testing [MUSA]. This approach has two significant weaknesses. First, the testing effort cannot be predicted in advance, since it is a function of the intermediate results of the testing effort itself. A related problem is that the testing schedule can expire long before the error rate drops to an acceptable level. Second, and perhaps more importantly, the statistical model only predicts the estimated error rate for the underlying test case distribution being used during the testing process. It may have little or no connection to the likelihood of errors manifesting once the system is delivered or to the total number of errors present in the software.
Another common approach to testing is based on requirements analysis. A requirements specification is converted into test cases, which are then executed so that testing verifies system behavior for at least one test case within the scope of each requirement. Although this approach is an important part of a comprehensive testing effort, it is certainly not a complete solution. Even setting aside the fact that requirements documents are notoriously error-prone, requirements are written at a much higher level of abstraction than code. This means that there is much more detail in the code than in the requirement, so a test case developed from a requirement tends to exercise only a small fraction of the software that implements that requirement. Testing only at the requirements level may miss many sources of error in the software itself.
1.2 Software complexity measurement
Software complexity is one branch of software metrics that is focused on direct measurement of software attributes, as opposed to indirect software measures such as project milestone status and reported system failures. There are hundreds of software complexity measures [ZUSE], ranging from the simple, such as source lines of code, to the esoteric, such as the number of variable definition/usage associations.
An important criterion for metrics selection is uniformity of application, also known as "open reengineering." The reason "open systems" are so popular for commercial software applications is that the user is guaranteed a certain level of interoperability: the applications work together in a common framework, and applications can be ported across hardware platforms with minimal impact. The open reengineering concept is similar in that the abstract models used to represent software systems should be as independent as possible of implementation characteristics such as source code formatting and programming language. The objective is to be able to set complexity standards and interpret the resultant numbers uniformly across projects and languages. A particular complexity value should mean the same thing whether it was calculated from source code written in Ada, C, FORTRAN, or some other language. The most basic complexity measure, the number of lines of code, does not meet the open reengineering criterion, since it is extremely sensitive to programming language, coding style, and textual formatting of the source code. The cyclomatic complexity measure, which measures the amount of decision logic in a source code function, does meet the open reengineering criterion. It is completely independent of text formatting and is nearly independent of programming language since the same fundamental decision structures are available and uniformly used in all procedural programming languages [MCCABE5].
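As a rough illustration of why cyclomatic complexity is insensitive to text formatting while line counts are not, the sketch below uses Python's standard `ast` module to count decision statements in two differently formatted copies of the same hypothetical `clamp` function. It assumes the simplified rule that, for a single function with one entry and one exit, cyclomatic complexity equals the number of binary decisions plus one. The set of counted node types is an assumption of this sketch, and boolean operators inside compound conditions are deliberately not counted.

```python
import ast

# Statement and expression types treated as binary decisions (an assumption:
# `elif` appears as a nested ast.If; `and`/`or` are not counted here).
DECISION_NODES = (ast.If, ast.While, ast.For, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Return decisions + 1 for the source text of a single function."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1

def source_lines(source: str) -> int:
    """Count non-blank physical lines, a crude lines-of-code measure."""
    return sum(1 for line in source.splitlines() if line.strip())

COMPACT = """
def clamp(x, lo, hi):
    if x < lo: return lo
    if x > hi: return hi
    return x
"""

SPREAD_OUT = """
def clamp(x, lo, hi):
    if x < lo:
        return lo

    if x > hi:
        return hi

    return x
"""

for label, src in (("compact", COMPACT), ("spread out", SPREAD_OUT)):
    print(label, source_lines(src), cyclomatic_complexity(src))
# compact 4 3
# spread out 6 3
```

The line count changes from 4 to 6 with formatting alone, while the cyclomatic complexity stays at 3 because the decision structure is unchanged.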
Ideally, complexity measures should have both descriptive and prescriptive components. Descriptive measures identify software that is error-prone, hard to understand, hard to modify, hard to test, and so on. Prescriptive measures identify operational steps to help control software, for example splitting complex modules into several simpler ones, or indicating the amount of testing that should be performed on given modules.
1.3 Relationship between complexity and testing
There is a strong connection between complexity and testing, and the structured testing methodology makes this connection explicit. First, complexity is a common source of error in software: more complex modules tend to be more error-prone.
Second, complexity can be used directly to allocate testing effort by leveraging the connection between complexity and error to concentrate testing effort on the most error-prone software. In the structured testing methodology, this allocation is precise: the number of test paths required for each software module is exactly the cyclomatic complexity. Other common white box testing criteria have the inherent anomaly that they can be satisfied with a small number of tests for software that is arbitrarily complex (in any reasonable sense of "complexity"), as shown in section 5.2.
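To make the anomaly concrete, the sketch below uses a hypothetical module with three sequential `if` statements, whose cyclomatic complexity is 3 + 1 = 4. Two tests (all conditions true, all false) exercise every branch outcome and so satisfy branch coverage, yet they execute only two linearly independent paths. The four tests in `BASIS_TESTS`, built by flipping one decision at a time from an all-false path, reach the four independent paths that structured testing requires for this module. All names are illustrative. Measuring independence as the rank of branch-outcome vectors is a simplification that works for this purely sequential structure; in general, paths are compared as edge vectors over the full control flow graph.

```python
from fractions import Fraction

DECISIONS = ("a", "b", "c")
CYCLOMATIC_COMPLEXITY = len(DECISIONS) + 1  # decisions + 1 for this module

def decide(name, cond, taken):
    """Record which outcome of decision `name` the current test exercised."""
    taken.append((name, bool(cond)))
    return cond

def module(a, b, c, taken):
    """Hypothetical module: three sequential if statements."""
    result = 0
    if decide("a", a, taken):
        result += 1
    if decide("b", b, taken):
        result += 2
    if decide("c", c, taken):
        result += 4
    return result

def branch_vector(taken):
    """Encode an executed path as 0/1 flags, one per branch outcome."""
    return [int((name, outcome) in taken) for name in DECISIONS for outcome in (True, False)]

def rank(rows):
    """Matrix rank over the rationals, by Gauss-Jordan elimination."""
    matrix = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(matrix[0]) if matrix else 0):
        pivot = next((i for i in range(r, len(matrix)) if matrix[i][col] != 0), None)
        if pivot is None:
            continue
        matrix[r], matrix[pivot] = matrix[pivot], matrix[r]
        for i in range(len(matrix)):
            if i != r and matrix[i][col] != 0:
                factor = matrix[i][col] / matrix[r][col]
                matrix[i] = [x - factor * y for x, y in zip(matrix[i], matrix[r])]
        r += 1
    return r

def evaluate(tests):
    """Return (branch outcomes covered, number of linearly independent paths)."""
    vectors, covered = [], set()
    for args in tests:
        taken = []
        module(*args, taken)
        covered.update(taken)
        vectors.append(branch_vector(taken))
    return len(covered), rank(vectors)

BRANCH_TESTS = [(True, True, True), (False, False, False)]
BASIS_TESTS = [(False, False, False), (True, False, False),
               (False, True, False), (False, False, True)]

print(evaluate(BRANCH_TESTS))  # (6, 2): every branch, only 2 independent paths
print(evaluate(BASIS_TESTS))   # (6, 4): every branch, 4 independent paths
```

Adding more sequential decisions keeps the branch coverage requirement at two tests, while the number of independent paths grows with the cyclomatic complexity.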
1.4 Document overview and audience descriptions
Programmers who are not directly involved in testing may concentrate on sections 1-4 and 10. These sections describe how to limit and control complexity, to help produce more testable, reliable, and maintainable software, without going into details about the testing technique.