This appendix illustrates the structured testing technique of Chapter 5 for a module of significant complexity. The example is a C translation of the FORTRAN blackjack game program used in [NBS99] for the same purpose. No attempt has been made to improve the structural quality of the original version, and the error has been preserved. Section B.2 describes an experimental framework for comparison of test coverage criteria. Although structured testing does not guarantee detection of the error in the blackjack program, experimental results show that it is much more effective than branch coverage.
B.1 Testing the blackjack program
Since there are many variants on the blackjack rules, the following specification describes the rules being used for this example. The mechanics of the application of structured testing can be understood without reference to the details of the specification, but the specification is required to detect incorrect program behavior. The details of the "hand" module follow, as does the description of the testing process.B.1.1 The specification
The program, as the dealer, deals two cards to itself and two cards to the player. The player's two cards are shown face up, while only one of the dealer's cards is shown. Both the dealer and the player may draw additional cards, which is known as being "hit." The player's goal is to reach 21 or less, but be closer to 21 than the dealer's hand-in which case the player wins. Ties go to the dealer. If the player's or the dealer's hand totals greater than 21, the hand is "busted" (a loss). The King, Queen, and Jack all count as 10 points. All other cards, except the Ace, count according to their face values. The Ace counts as 11 unless this causes the hand to be over 21, in which case it counts as 1.B.1.2 The module
Figure B-1 shows the code for module "hand" from the blackjack program, which will be used to illustrate structured testing.6 A0 int hand()7 {/* return win */8 int d = 0, pace, dace;9 int fHit; /* 1 for hit, zero for not hit */10 A1 A2 A3 A4 A5 p = 0; d = 0; pace = 0; dace = 0; win = 0;11 /* win will be 0 if dealer wins, 1 if player wins, 2 if a push */12 A6 A7 A8 A9 hit(&p, &pace); hit(&d, &dace);13 A10 A11 A12 A13 hit(&p, &pace); hit(&d, &dace);14 A14 count = 0;15 A15 A16 printf("DEALER SHOWS --- %d\n", cards[i-1]);16 A17 dshow = cards[i-1];17 A18 A19 printf("PLAYER = %d, NO OF ACES - %d\n", p, pace);18 A20 if (p == 21) {19 A21 A22 printf("PLAYER HAS BLACKJACK\n");20 A23 win = 1;21 } else {22 A24 count = 2;23 L11:24 A25 A26 check_for_hit(&fHit, p, count);25 A27 if (fHit == 1) {26 A28 A29 hit(&p,&pace);27 A30 count += 1;28 A31 A32 printf("PLAYER = %d, NO OF ACES - %d\n", p, pace);29 A33 if (p > 21) {30 A34 A35 printf("PLAYER BUSTS - DEALER WINS\n");31 A36 goto L13;32 }33 A37 A38 goto L11;34 }35 A39 A40 }36 /* Handle blackjack situations, case when dealer has blackjack */37 A41 if (d == 21) {38 A42 A43 printf("DEALER HAS BJ\n");39 A44 if (win == 1) {40 A45 A46 printf("------ PUSH\n");41 A47 win = 2;42 A48 goto L13;43 } else {44 A49 A50 printf("DEALER AUTOMATICALLY WINS\n");45 A51 goto L13;46 A52 }47 } else {48 /* case where dealer doesn't have blackjack:49 * check for player blackjack or five card hand50 */51 A53 A54 if (p == 21 || count >= 5) {52 A55 A56 printf("PLAYER AUTOMATICALLY WINS\n");53 A57 win = 1;54 A58 goto L13;55 }56 A59 A60 }57 A61 A62 printf("DEALER HAS %d\n", d);58 L12:59 A63 if (d <= 16) {60 A64 A65 hit(&d,&dace);61 A66 if (d > 21) {62 A67 A68 printf("DEALER BUSTS - PLAYER WINS\n");63 A69 win = 1;64 A70 goto L13;65 }66 A71 A72 goto L12;67 }68 A73 A74 A75 printf(" PLAYER = %d DEALER = %d\n", p, d);69 A76 if (p > d) {70 A77 A78 printf("PLAYER WINS\n");71 A79 win = 1;72 } else73 A80 A81 A82 printf("DEALER WINS\n");74 L13:75 A83 return win;
Figure B-1. Code for example blackjack module "hand."76 A84 }
Figure B-2 shows the graph for module "hand," which has cyclomatic complexity 11 and essential complexity 9. The high essential complexity makes the control structure difficult to follow. In such cases, the baseline method can be performed using the source listing as a guide, with paths represented by ordered lists of decision outcomes. Decisions are "flipped" exactly as in the graphical version of the method.
The original blackjack testing example [NBS99] demonstrated the manual application of the baseline method in full detail, which will not be duplicated here. Instead, we follow the modern tool-assisted testing process, in which basis path coverage is built on a foundation of functional testing. We show that unrealizable test paths do not necessarily imply that rc < v(G) or that testing should cease, and we also demonstrate control dependency analysis as a test case derivation technique.
B.1.3 Tests through the module
Replacing the "hit" routine with a version that allows user input of the next card is sufficient to allow individual paths to be exercised. We now follow a typical structured testing process, beginning with functional tests while using an automated tool to measure ac. Input data derivation is simplified by first determining the meaning of all program variables used in decision expressions. The variable "p" holds the total value of the player's hand, and "d" holds the dealer's hand value. Those two variables are updated by the "hit" routine. The "fHit" and "win" variables are commented with their usage. When "fHit'' is 1, the player or dealer has elected to hit, and a value of 0 means stand. A value of 0 for "win" means the dealer wins, 1 means the player wins, and 2 means there was a push. This is the value returned by the module. Finally, the variable "count" stores the total number of cards in the player's hand.18( 20): p==21 ==> TRUE37( 41): d==21 ==> TRUE
The second functional test is blackjack for the player, in which the original deal leaves the player with 21 and leaves the dealer with an inferior hand. Execution shows correct behavior. The associated path is:39( 44): win==1 ==> TRUE18( 20): p==21 ==> TRUE37( 41): d==21 ==> FALSE
The third functional test is blackjack for the dealer, in which the original deal leaves the player with less then 21, the player does not get hit, and the dealer has 21. Execution shows correct behavior. The associated path is:51( 53): p==21 ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> TRUE
The fourth functional test is a five-card win, in which the player accumulates five cards without going over 21 and the dealer was not dealt blackjack. Execution shows correct behavior. The associated path is:39( 44): win==1 ==> FALSE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> TRUE29( 33): p>21 ==> FALSE25( 27): fHit==1 ==> TRUE29( 33): p>21 ==> FALSE25( 27): fHit==1 ==> TRUE29( 33): p>21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE51( 53): p==21 ==> FALSE
The fifth functional test is a win for the player in which neither the player nor the dealer takes a hit, and neither has blackjack. Execution shows correct behavior. The associated path is:51( 54): count>=5 ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE51( 53): p==21 ==> FALSE51( 54): count>=5 ==> FALSE59( 63): d<=16 ==> FALSE
The sixth functional test is a win for the dealer in which neither the player nor the dealer takes a hit, and neither has blackjack. Execution shows correct behavior. The associated path is:69( 76): p>d ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE51( 53): p==21 ==> FALSE51( 54): count>=5 ==> FALSE59( 63): d<=16 ==> FALSE
The seventh functional test is a win for the dealer in which the player is busted. Execution shows correct behavior. The associated path is:69( 76): p>d ==> FALSE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> TRUE
The eighth functional test is a win for the player in which the dealer is busted. Execution shows correct behavior. The associated path is:29( 33): p>21 ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE51( 53): p==21 ==> FALSE51( 54): count>=5 ==> FALSE59( 63): d<=16 ==> TRUE
The ninth functional test is a win for the dealer in which the player is hit to reach 21 and the dealer has blackjack. Execution shows correct behavior. The associated path is:61( 66): d>21 ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> TRUE29( 33): p>21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> TRUE
This concludes our initial set of functional tests. No error has been detected yet, and each test increased ac by 1, so ac is now 9. The coverage analysis tool reports that one branch has not yet been executed, "61( 66): d>21 ==> FALSE." This corresponds to the dealer taking a hit without being busted. Since branches are usually easier to test than paths, and testing a previously unexecuted branch must increase ac, we construct a new test for this branch immediately. The test makes the dealer win the hand by hitting once and then standing. Execution shows correct behavior. The associated path is:39( 44): win==1 ==> FALSE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE51( 53): p==21 ==> FALSE51( 54): count>=5 ==> FALSE59( 63): d<=16 ==> TRUE61( 66): d>21 ==> FALSE59( 63): d<=16 ==> FALSE
All branches have now been covered. No error has been detected yet, and ac is now 10. The coverage analysis tool reports that executing the following path would be sufficient to complete basis path coverage:69( 76): p>d ==> FALSE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE
Unfortunately, this particular path is unrealizable, since the value of p is not altered between the mutually exclusive "30( 13): p==21 ==> FALSE" and "62( 44): p==21 ==> TRUE" decision outcomes. If the basis completion path had been realizable, we could have just derived the corresponding data, executed the test, and completed the testing process. For modules as poorly structured as "hand," however, unrealizable paths are fairly common, so it is important to be prepared for them. As discussed in section 9, the existence of unrealizable paths does not necessarily mean that rc < v(G). To help determine whether a realizable alternative path can complete basis path testing, we use the dependency analysis technique. The dependency analysis tool indicates that for every tested path, the total number of times branches "39( 44): win==1 ==> TRUE" and "51( 53): p==21 ==> TRUE" were executed is equal to the number of times branch "18( 20): p==21 ==> TRUE" was executed. Note that for any single path, none of those three decision outcomes can be executed more than once and at most one of the first two can be executed. We may therefore rephrase the dependency equation in terms of the module's functionality: The player has blackjack if and only if either the result is a push or the player automatically wins with 21. Any test that does not satisfy this dependency relationship will be independent of the current set of tests and increase ac, in this case completing testing by making ac equal to v(G). On the other hand, if we can prove that the dependency must hold for every feasible test, then we have shown that rc < v(G). Fortunately, we can simplify the dependency even further, showing that rc = v(G) and giving a concise characterization of all feasible basis completion paths. Note that if the player has blackjack, the result must be either a push or an automatic win with 21 for the player. Also, there is no way to reach a push result without the player having blackjack. Thus, a test breaks the dependency relationship if and only if the player gets an automatic win with 21 without having blackjack. This is clearly possible for the module as written, for example by having the player reach exactly 21 after taking one hit. Execution of this test shows incorrect behavior-the module declares an automatic win for the player without giving the dealer a chance to take hits and win by also reaching 21. The specification states that the player can only win "automatically" with either Blackjack or a five-card hand. As expected, this test completes the basis set. The associated path is:51( 53): p==21 ==> TRUE18( 20): p==21 ==> FALSE25( 27): fHit==1 ==> TRUE29( 33): p>21 ==> FALSE25( 27): fHit==1 ==> FALSE37( 41): d==21 ==> FALSE
Note that in addition to providing a test that detected the error, dependency analysis clarified the nature of the error and gave a straightforward repair technique: replace the automatic win test "51( 53): p==21" with "51( 53): win==1," since at that point win==1 is precisely equivalent to the player having blackjack. If the error is repaired in this manner, there will be no way to break the dependency with a feasible path, and rc will therefore be reduced to 10.51( 53): p==21 ==> TRUEB.2 Experimental comparison of structured testing and branch coverage
The "hand" module from section B-1 is one of several modules used in [WATSON4] to compare the error detection effectiveness of structured testing with branch coverage. The rest of this appendix discusses that work. Although neither testing technique is guaranteed to detect the error, structured testing performed dramatically better than branch coverage in the experiment, which used random test data generation to simulate actual play.
B.2.1 Experimental design
The overall experimental design was to package the test module with a driver program that accepts a random number seed as input, generates random test data from a realistic distribution, executes the test module with that data, analyzes the module's behavior, and returns an exit code indicating whether an error was detected. A higher-level driver invokes the module test driver with a sequence of seed values, uses an automated tool to measure the test coverage level according to each criterion being compared, and tracks both the error detection and the coverage results.
There is an important distinction between considering all tests and just considering the ones that increased coverage. Considering all tests is essentially doing random testing while using the coverage criterion as a stopping rule. This may be appropriate when detecting incorrect program behavior is cheap, so that program behavior can be examined for each test. However, the criteria comparison information from such an experiment tends to be diluted by the effectiveness of the underlying random test data distribution. A criterion that requires more random tests is likely to detect more errors, regardless of the value of the specific test cases that contribute to satisfying the criterion. Therefore, we also consider just the tests that increase coverage. In that case, we factor out most of the influence of the underlying random test data distribution by in effect considering random minimal test data sets that satisfy each criterion. In addition to giving a more objective comparison, this more accurately models test effectiveness when detecting incorrect program behavior is expensive, so that program behavior can only be examined for tests that contribute to satisfaction of the test coverage criterion. [WATSON4]
|
|
Structured Testing
|
Branch Coverage
|
|
Tests
|
45
|
36
|
|
Testsinc
|
11
|
8
|
|
Detect
|
97
|
85
|
|
Detectinc
|
96
|
53
|
Figure B-3. Experimental data.
B.2.3 Interpretation of experimental data
Structured testing decisively outperformed branch coverage at detecting the error while requiring only 25% more total tests and 38% more tests that increase coverage. The most notable feature of the data is the robustness of structured testing with respect to test set minimization. Only 1% of the error detection effectiveness of structured testing was lost when considering only tests that increased coverage, whereas the number of tests required was reduced by 75%. For branch coverage, 38% of error detection effectiveness was lost, and the number of tests required was reduced by 78%. This indicates that the effectiveness of structured testing is due to the criterion itself rather than just the number of tests required to satisfy it. For the complete set of test programs used in [WATSON4], structured testing preserved 94% of its error detection effectiveness when restricted to the 24% of test cases that increased coverage.