NISTGCR 93/626 An International Survey of Industrial Applications of Formal Methods Volume 2 Case Studies Dan Craigen, ORA Canada Susan Gerhart, Applied Formal Methods Ted Ralston, Ralston Research Associates U.S. DEPARTMENT OF COMMERCE Technology Administration National Insttitute of Standards and Technology Computer Systems Laboratory Gaithersburg, MD 20899 NISTGCR 93/626 An International Survey of Industrial Applications of Formal Methods Volume 2 Case Studies Dan Craigen, ORA Canada Susan Gerhart, Applied Formal Methods Ted Ralston, Ralston Research Associates U.S. DEPARTMENT OF COMMERCE Technology Administration National Institute of Standards and Technology Computer Systems Laboratory Gaithersburg, MD 20899 March, 1993 U.S. DEPARTMENT of COMMERCE Ronald H. Brown, Secretary TECHNOLOGY ADMINISTRATION ....., Under Secretary for Technology NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY ..... Director Blank Page AN INTERNATIONAL SURVEY OF INDUSTRIAL APPLICATIONS OF FORMAL METHODS VOLUME 2 CASE STUDIES ABSTRACT Formal methods are mathematically-based techniques, often supported by reasoning tools, that can offer a rigorous and effective way to model, design and analyze computer systems. The purpose of this study is to evaluate international industrial experience in using formal methods. The cases selected are, we believe, representative of industrial-grade projects and span a variety of application domains. The study had three main objectives:  To better inform deliberations within industry and government on standards and regulations;  To provide an authoritative record on the practical experience of formal methods to date; and  To suggest areas where future research and technology development are needed. This is the second volume of a two volume final report on an international survey of industrial applications of formal methods. In this volume, we provide the details of the twelve case studies. For each of the case studies, we present a case description, summarize the information obtained (from interviews and the literature), provide an evaluation of the case, highlight R & D issues pertaining to formal methods and provide some conclusions. AN INTERNATIONAL SURVEY OF INDUSTRIAL APPLICATIONS OF FORMAL METHODS VOLUME 2 CASE STUDIES INTRODUCTION The purpose of this study is to evaluate international industrial experience in using formal methods. The cases selected are, we believe, representative of industrial-grade projects and span a variety of application domains. The intent of the study is threefold:  To better inform deliberations within industry and government on standards and regulations;  To provide an authoritative record on the practical experience of formal methods to date; and  To suggest areas where further research and technology development are needed. This study was undertaken by three experts in formal methods and software engineering: Dan Craigen of ORA Canada, Susan Gerhart of Applied Formal Methods, and Ted Ralston of Ralston Research Associates. Robin Bloomfield of Adelard was involved with the Darlington Nuclear Generating Station Shutdown System case. Support for this study was provided by organizations in Canada and the United States. The Atomic Energy Control Board of Canada (AECB) provided support for Dan Craigen and for the technical editing provided by Karen Summerskill. The U.S. Naval Research Laboratories (NRL), Washington, D.C., provided support for all three authors. The U.S. National Institute of Standards and Technology (NIST) provided support for Ted Ralston. This final report consists of two volumes. The first volume describes the reason for the study, the cases that were studied, our approach to performing the study, and our analysis and conclusions resulting from the cases investigated. This second volume of the final report provides the details on the case studies. For each of the case studies we present a case description, summarize the information obtained (from interviews and the literature), provide an evaluation of the case, highlight R & D issues pertaining to formal methods, and provide some conclusions. Earlier drafts of the case studies were reviewed by the relevant participants. AN INTERNATIONAL SURVEY OF INDUSTRIAL APPLICATIONS OF FORMAL METHODS VOLUME 2 CASE STUDIES TABLE OF CONTENTS ABSTRACT i INTRODUCTION ii TABLE OF CONTENTS iii 1 STRUCTURED SYSTEMS ANALYSIS AND DESIGN METHOD TOOLSET 1 1.1 Case Description 1 1.1.1 Product and Process Profile 2 1.1.2 References 2 1.2 Interview Summary 3 1.2.1 Organizational Context 3 1.2.2 Project Context and History 3 1.2.3 Formal Methods Factors 4 1.2.4 Formal Methods and Tool Usage 5 1.2.5 Results 5 1.3 Evaluation 7 1.3.1 Background 7 1.3.2 Product Features 7 1.3.3 Process Features 8 1.3.4 Observations 9 1.3.5 Formal Methods R & D Summary 10 1.4 Conclusions 11 2 IBM'S CUSTOMER INFORMATION CONTROL SYSTEM 12 2.1 Case Description 12 2.1.1 Product and Process Profile 12 2.1.2 References 13 2.2 Questionnaire 1 14 2.3 Interview Summary 16 2.3.1 Organizational Context 16 2.3.2 Project Context and History 16 2.3.3 Application Goals 17 2.3.4 Formal Methods Factors 18 2.3.5 Formal Methods and Tool Usage 19 2.3.6 Results 20 2.4 Evaluation 22 2.4.1 Background 22 2.4.2 Product Features 22 2.4.3 Process Features 23 2.4.4 Observations 25 2.4.5 Formal Methods R & D Summary 26 2.5 Conclusions 27 3 CLEANROOM SOFTWARE METHODOLOGY 29 3.1 Case Description 29 3.1.1 Product and Process Profile 30 3.1.2 References 30 3.2 Interview Summary: IBM 31 3.2.1 Organization and Project Context 31 3.2.2 Formal Methods Factors 33 3.2.3 Formal Methods and Tool Usage 33 3.2.4 Results 35 3.3 Interview Summary: NASA Goddard Center 36 3.3.1 Organization and Project Context 36 3.3.2 Application 37 3.3.3 Formal Methods and Tool Usage 37 3.3.4 Results 39 3.4 Evaluation 40 3.4.1 Background 40 3.4.2 Product features 41 3.4.3 Process features 41 3.4.4 Observations 42 3.4.5 Formal Methods R & D Summary 43 3.5 Conclusions 45 4 DARLINGTON: TRIP COMPUTER SOFTWARE 46 4.1 Case Description 46 4.1.1 Product and Process Profile 46 4.1.2 References 47 4.2 Interview Summary 48 4.2.1 Organizational Context 48 4.2.2 Project Context and History 49 4.2.3 Application Goals 54 4.2.4 Formal Methods Factors 55 4.2.5 Formal Methods and Tools Usage 55 4.2.6 Result 57 4.2.7 Interview with a regulator: Richard Taylor 58 4.2.8 Interview with David Parnas 60 4.2.9 Interview with a consultant: Nancy Leveson 61 4.2.10 Project Milestones (OH Perspective) 62 4.2.11 Project Milestones (AECB Perspective) 63 4.3 Evaluation 65 4.3.1 Background 65 4.3.2 Product Features 65 4.3.3 Process Features 66 4.3.4 Observations 67 4.3.5 Formal Methods R & D Summary 68 4.4 Conclusions 69 5 LaCoS ESPRIT PROJECT 70 5.1 Case Description 70 5.1.1 References 71 5.2 Questionnaire 1 72 5.3 Interview Summary 75 5.3.1 Lloyd's Register 75 5.3.2 Interview with Matra Transport 80 5.4 Evaluation 83 5.4.1 Background 83 5.4.2 Product Features 84 5.4.3 Process Features 84 5.4.4 Observations 85 5.4.5 Formal Methods R & D Summary 86 5.5 Conclusions 88 6 MULTINET GATEWAY 89 6.1 Case Description 89 6.1.1 Product and Process Profile 90 6.1.2 References 91 6.2 Interview Summary 92 6.2.1 Organizational Context 92 6.2.2 Project Context and History 92 6.2.3 Application Goals 93 6.2.4 Formal Methods Factors 93 6.2.5 Formal Methods and Tools Usage 94 6.2.6 Results 95 6.2.7 Interview with an assessor: Jim Williams 95 6.3 Evaluation 97 6.3.1 Background 97 6.3.2 Product Features 98 6.3.3 Process Features 98 6.3.4 Observations 99 6.3.5 Formal Methods R & D Summary 100 6.4 Conclusions 101 7 SACEM - A RAILWAY SIGNALLING SYSTEM 102 7.1 Case Description 102 7.1.1 References 102 7.2 Questionnaire 1 103 7.3 KVS 105 7.4 CTDC Calcutta 106 7.5 Interview Summary 107 7.5.1 Organization 107 7.5.2 Project Context 107 7.5.3 Application 108 7.5.4 Formal Methods Usage 111 7.5.5 Results 113 7.6 Evaluation 115 7.6.1 Background 115 7.6.2 Product features 115 7.6.3 Process features 116 7.6.4 Observations 117 7.6.5 Formal Methods R & D Summary 117 7.7 Conclusions 119 8 NIST TOKEN-BASED ACCESS CONTROL SYSTEM (TBACS) 120 8.1 Case Description 120 8.1.1 Formal Methods Background 120 8.1.2 References 120 8.2 Questionnaire 1 121 8.3 Interview Summary 123 8.3.1 Organizational Context 123 8.3.2 Project Context and History 123 8.3.3 Application Goals 124 8.3.4 Formal Methods and Tools Usage 125 8.3.5 Results 128 8.4 Evaluation 130 8.4.1 Background 130 8.4.2 Product features 130 8.4.3 Process features 131 8.4.4 Observations 132 8.4.5 Formal Methods R & D Summary 133 8.5 Conclusions 135 9 TEKTRONIX: USE OF Z METHOD ON OSCILLOSCOPES 136 9.1 Case Description 136 9.1.1 References 136 9.2 Questionnaire 1 (Delisle) 137 9.3 Interview Summary 139 9.3.1 Organizational Context 139 9.3.2 Project Context and History 139 9.3.3 Application Goals 140 9.3.4 Formal Methods Factors 141 9.3.5 Formal Methods and Tools Usage 141 9.3.6 Results 143 9.4 Evaluation 144 9.4.1 Background 144 9.4.2 Product Features 145 9.4.3 Process Features 145 9.4.4 Observations 146 9.4.5 Formal Methods R & D Summary 148 9.5 Conclusions 149 10 TRAFFIC ALERT AND COLLISION AVOIDANCE SYSTEM (TCAS) 150 10.1.1 Product and Process Profile 151 10.1.2 References 151 10.2 Interview Summary 152 10.2.1 Organizational Context 152 10.2.2 Project Context and History 153 10.2.3 Application Goals 154 10.2.4 Formal Methods Factors 155 10.2.5 Formal Methods and Tools Usage 155 10.2.6 Results 156 10.2.7 Interview with Larry Nivert 156 10.2.8 Interview with David Lubkowski 157 10.2.9 Interview with a pilot 158 10.3 Evaluation 159 10.3.1 Background 159 10.3.2 Product Features 159 10.3.3 Process Features 160 10.3.4 Observations 161 10.3.5 Formal Methods R & D Summary 161 10.4 Conclusions 162 11 INMOS TRANSPUTER 163 11.1 Case Description 163 11.1.1 Product and Process Profile 163 11.1.2 Issues of Interest 164 11.1.3 References 164 11.2 Interview Summary 165 11.2.1 Organizational Context 165 11.2.2 Project Context and History 165 11.2.3 Application Goals 166 11.2.4 Formal Methods Factors 166 11.2.5 Formal Methods and Tools Usage 166 11.2.6 T800 Floating Point Unit 167 11.2.7 T800 Scheduler 167 11.2.8 T9000 Virtual Channel Processor 168 11.2.9 Tools 169 11.2.10 Results 169 11.3 Evaluation 170 11.3.1 Background 170 11.3.2 Product Features 171 11.3.3 Process Features 171 11.3.4 Formal Methods R & D Analysis 173 11.4 Conclusions 175 12 HEWLETT-PACKARD MEDICAL INSTRUMENTS 176 12.1 Case Description 176 12.1.1 Product and Process Profile 176 12.1.2 References 177 12.2 Interview Summary 178 12.2.1 Organizational Context 178 12.2.2 Project Context and History 179 12.2.3 Application Goals 179 12.2.4 Formal Methods Factors 180 12.2.5 Formal Methods and Tools Usage 180 12.2.6 Results 181 12.2.7 Defibrillator 182 12.3 Evaluation 183 12.3.1 Background 183 12.3.2 Product Features 183 12.3.3 Process Features 184 12.3.4 Observations 186 12.3.5 Formal Methods R & D Summary 186 12.4 Conclusions 187 Appendix A Acknowledgements 188 1 STRUCTURED SYSTEMS ANALYSIS AND DESIGN METHOD TOOLSET (SSADM) Praxis Systems Ltd. 1.1 Case Description Praxis Systems Ltd. is a software engineering company located in Bath, England. It was founded in 1983 and conducts business using both conventional software development techniques as well as formal methods. Praxis supplies high-integrity software systems and was the first independent software house to be registered under British Standard 5750 (in 1986), which is the UK equivalent of the ISO 9000 quality assurance standard. Praxis has developed considerable strength in the area of formal methods for systems development which are now used extensively in the company. In the last three years, over a dozen significant projects have been carried out involving the use of formal methods including Vienna Development Methodology (VDM), Z, Communicating Sequential Processes (CSP), Temporal Logic, and an internally-developed Process Modelling Language. These projects have involved the production of both government-sponsored and commercial systems. Of these projects, we considered four to fit our definition of a real industrial application: 1. An operating system interface specified in VDM and CSP. Under contract to a commercial client, Praxis formally defined the interfaces between various components of a distributed factory control system using CSP. A particular component for which Praxis had design authority was subsequently specified using VDM and the relationship between the VDM and CSP specifications established. 2. A security policy model and extensions to a secure operating system specified in Z On a Government security project, Praxis has produced a VDM-based functional specification of secure enhancements to a secure operating system (B1/B2 level) and then developed what Praxis calls a "demonstrably conformant" C code to implement these enhancements. 3. Components of the upgrade to London's Air Traffic Control system specified in VDM As part of its advanced program to expand and develop the air traffic control system over the south-east of England, the Civil Aviation Authority is building a major new operations centre, the Central Control Function (CCF) facility, at the London Air Traffic Control Centre. Praxis has been selected to conduct a design study for the CCF Display and Information System (CDIS) which forms part of this program. CDIS forms a vital component of the data entry and display equipment used by air traffic controllers. Praxis used VDM to do an abstract specification of the whole system and for some parts of the design. 4. The use of Z to specify SSADM infrastructure and toolset Under contract to a commercial client, Praxis has developed a Computer-Assisted Systems Engineering toolset to support the use of the CCTA standard development method SSADM. In this project, Z was used to develop a formal specification of the toolset infrastructure. In the early stages of requirements analysis, Z was used for concept exploration. Later, the project team developed guidelines for the use of Z with Object-Oriented Design methods. We selected the SSADM project to be the case study for this study for several reasons. First, it was completed and data had been recorded for it. Second, our first choice was the CDIS London ATC project; however, it turned out to be premature as the project is just finishing and Praxis itself has not had time to analyze the results. Lastly, the SSADM project appeared to be a good example of how Praxis combines formal and informal methods in their normal system development process. 1.1.1 Product and Process Profile Size: Total SSADM system size is 45,000 lines of non-empty, non-comment code, 37,000 lines of which were specified in 350 pages of Z schemas (550 schemas of which 280 defined top-level operations. Stages: July 1987 Project start February 1988 Formal Specification completed June 1989 Implementation delivered 4 phases 0. short requirements analysis 1. infrastructure specified 2. infrastructure software built 3. tools built (mostly by client) 1.1.2 References 1. Hall, J. Anthony, "Seven Myths of Formal Methods", IEEE Software, September, 1990, pp. 11-21. 2. Rawlings, Rosamund, "Some Numbers from an Object-Oriented Development", in Object-oriented Software Engineering: The Next Step, Ed. Bruce Anderson, British Computer Society, London, 1990. 3. Brownbridge, David, "Using Z to Develop a CASE Toolset", in Proceedings of the 1989 Z User Meeting, Workshops in Computing Series, Springer-Verlag, December, 1989. 4. Hall, J. Anthony, Plenary 3 Keynote Address, 13th International Conference on Software Engineering, Austin, Texas, May 13-17, 1991. 1.2 Interview Summary Date of Interview: May 15, 1992 Organization: Praxis Systems plc, Bath, England Respondents: Anthony Hall, Rosamund Rawlings, Dave Ellis, Gavin Finnie, and David Brownbridge Interviewers: Dan Craigen and Ted Ralston (author) 1.2.1 Organizational Context Praxis is a software engineering company providing services for consultancy to full turnkey implementations. At the time of the project, ELLA (a CAD behavioral description language) was still part of Praxis (the ELLA business was recently sold), and there were about 120 individuals in the company. Clients have usually been large organizations where quality was more important than cost (to a point). Praxis was incorporated in 1984 and its main message to its customers is that Praxis will use the most up-to-date methods and will follow a rigorous quality system (e.g., ISO 9000/BS 5750). Praxis was the first independent software house in the UK to be registered under these requirements. 1.2.2 Project Context and History The SSADM project started in mid-1987 (not including proposal time) and completed in mid-1989. By February 1988, the specification was completed; in July, September and December 1988, and March 1989 incremental deliverables of the class hierarchy were made. The product was to be in two parts, both in support of SSADM. The first was a framework---or, in Praxis' terminology, an "infrastructure"---for the entire method, to manage all the documents required by the method, to check the rules of the method, and to provide a project management framework. The second part was a set of automated tools for this infrastructure. Hence, the product was substantially more complex than a data flow tool. At the initial stages, Praxis was concerned about the feasibility of the product, as the requirements were ambitious and required integrating components based on a variety of techniques (some AI, some object-oriented, some data flow) and involving a number of features such as version control, configuration management and the like. The main concerns were how to understand the required system and how to provide a toolset that was feasible and within budget. There had been historical concerns with previous attempts to develop Integrated Programming Support Environments in Europe; many were far too ambitious. There was a very short requirement phase (essentially, the client had already decided on the requirements) and a longer specification phase where formal methods were used to describe the infrastructure, but not the tool kit. Note that the basic structure of the tool was as both an infrastructure and a tool kit. The project was conducted in three phases over a two- year period. The first phase carried out the system specification and architecture, and chose an implementation language (Objective C) and libraries. The second phase entailed the design and implementation of the infrastructure, while the design and implementation of the tools was carried out in the final phase. 1.2.3 Formal Methods Factors Praxis used Z to specify the infrastructure used by project teams using the SSADM method, a more or less standard method in aerospace and defense contracting work in Europe. The toolset runs in a multi-user environment, with various project-management and configuration-management utilities distributed among a number of networked workstations (Suns and PERQs, an ICL workstation). The environment provides a set of basic classes (e.g., diagram, table, and matrix) from which tools for structured analysis can be developed by specialization. The process used involved a series of steps, beginning with a high-level specification, and moving down through successively lower levels toward implementation. First, the requirement (in English language) was formalized, and a high-level specification of the entire system was produced in Z (consisting of 350 pages and about 550 schemas defining about 280 operations). From this high-level specification, some lower level parts of the system were written in informal design documents, while others were written in Z. Also, only some of the lower level modules were specified in Z; others were written informally. Coding was carried out from both the informal designs and the lower level Z specifications. The end product seemed to be less troublesome than normal. However, it is hard to determine whether this is due to the use of formal methods, the use of object-oriented programming, or the use of talented individuals. It is claimed in [3] that test suites were independently constructed by using the formal Z schemas as a reference, not lower-level code documentation (e.g., tests on an object class can be derived from the Z schema for that object, rather than the class description of the Objective C object class). There were few faults found at system test and during client usage. Having got the details right at the start helped and there were few integration problems. 1.2.4 Formal Methods and Tool Usage The only tools used in the SSADM project were a prototype parser and type checker obtained from an Alvey R & D project called FORSITE. Neither were industrially robust, and the type checker was described as "eccentric" (i.e., it would get things wrong). This tool was used because a type checker was needed and the FORSITE type checker was the only one readily available at the time. It was barely adequate. 1.2.5 Results Praxis provided the following data on this project during the interview. It pertains solely to the productivity issues. The SSADM system was developed under BS 5750 product quality requirements which require full specification and design documentation. The effort figures include requirements capture and specification, initial investigations and architecture design, system specification in Z, implementation in Objective C, and system testing. They do not include final product testing or user documentation. The effort expended was recorded on timesheets during the project. The scale of the project is indicated in the following table, which shows the number of classes and the size in thousands of lines of non-empty, non-comment code (KLOC). Subsystem Classes KLOC* 1. Management Facilities 153 25 2. Tool-writers Class Library 99 12 3. Tools (5 only) 81 8 Total 333 45 * KLOC are non-empty non-comment lines The effort expended on parts 1 and 2 combined totalled 2235 work-days; part 3 took 483 work-days. This gives a figure of 16.5 lines of code per work-day or 8 work-days per class. The division of effort between process phases was broken down as follows: Requirements 7% Specification 29% Development (incl. unit testing) 50% Testing and Delivery 14% The Z specification of parts 1 and 2 comprises 350 pages including the English annotations, broken down into 550 schemas of which 280 are top-level operations. The object-oriented paradigm of "model-view-controller" was followed. The figures in the table below show the amount of code and number of classes in each to the libraries, while the "tool" number is the average of the five tools developed by Praxis (the full system was planned to have approximately 60 tools). Model View-Controller classes KLOC classes KLOC Stepstone libraries 26 5 63 16 Toolwriters library 31 2 68 10 Average tool 6 1 10 0.6 Praxis used Boehm's COCOMO productivity measurement model to measure the productivity performance in terms of the size to predict how many lines of code per day should be produced (Boehm, B.W., Software Engineering Economics, Prentice Hall, 1981). They considered the SSADM system to be a "semi-detached" system (in COCOMO terms) because it has distributed and multi-user aspects, and produced a COCOMO predicted benchmark of 11 lines of code per day. However, because the SSADM system used an object-oriented implementation, which differs markedly from a conventional implementation as assumed in COCOMO, it did not seem appropriate to use the COCOMO benchmark. In [2] Rawlings refers to a factor of 3.7 times as many lines for a C program compared to an Objective C program implementing the same small problem (citing Brad Cox example in Object Oriented Programming: An Evolutionary Approach). Therefore, the project used another statistical measurement technique called "function point analysis" to assess the expected size and productivity (Albrecht, A.J. and J.E. Gaffney, "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation", IEE Transactions on Software Engineering, Vol. SE-9, No.6, November 1983, pp. 639-648). Rawlings in [2] indicates the following assumptions were made: each top-level operation in the Z specification is equivalent to an external input (the simplest function type); the logical internal files are ignored; the operations are of average complexity; and a neutral general application adjustment. Based on these assumptions, a function point count of 1120 is expected, which according to Albrecht and Gaffney would lead to an estimate of 48,000 work-hours (6400 work-days). 1.3 Evaluation 1.3.1 Background Product: A Computer Aided Systems Engineering toolset for SSADM Industrial Process: Contract development per requirements/design previously developed by client Scale: Complete, integrated toolset (some 60 tools); 37,000 lines of Objective C, and 350 pages of Z (schemas plus English annotations) Formal Methods Process: Short requirements dialogue, followed by formal specification of the infrastructure and tools in Z, then implementation in Objective C Major Results: 1. Toolset delivered on schedule and to requirements; 2. Productivity enhanced in part due to use of formal methods and to use of object-oriented programming Interview Profile: 1/2 day in combined interviews with members of Praxis management and the development team 1.3.2 Product Features Client satisfaction n/a (1) Cost n/a (2) Impact of product n/a (3) Quality + (4) Time to market n/a (5) (1) Owing to factors outside the project, which had nothing to do with the formal methods, the product was not used. Therefore, it is not possible to assess client satisfaction. (2) Praxis claimed the development cost was within budget, although delivery schedules during the two years the project slipped slightly. However, no data on product costs were collected which makes it impossible to compare the impact of the development on cost. (3) Since the product has not been used, it is not possible to judge its impact. (4) Few errors were discovered in reviews and testing, although it proved difficult to adduce this exclusively to formal methods since object-oriented programming and good people could have played an equally positive role. Whatever the cause, the product was acceptable in that it met the stated quality requirements. (5) Time to market was not an objective of this product and hence it is not applicable. 1.3.3 Process Features General Process Effects Cost n/a (6) Impact of Process + (7) Pedagogical + (8) Tools - (9) Specific Process Effects Design + (10) Reusable components n/a (11) Maintainability n/a (12) Requirements capture n/a (13) V&V + (14) (6) Cost data was not broken down and, therefore, it was not possible to assess the cost effects of the formal methods on process. In our view, the use of the formal method contributed to achieving control of the system development, and that this is an important positive result of the process, but it is not possible to relate this to an impact on the cost of the process. (7) The impact of the use of formal specification improved communication among the team members (some of whom were not familiar with the SSADM or Z methods) and between Praxis and the client, although it was also noted that early in the project the Z notation was an impediment to communication with the client. (8) This was the first time Praxis had used formal methods on a complete development (earlier efforts had been formal specifications or studies). As such, a number of useful lessons regarding the use of a formal method on a development were learned. One was the experience in using a formal method in conjunction with another approach (object-oriented programming). A second lesson was the relatively easier (and unexpected) job of implementing the code from the formal specification. A third was the relatively difficult task of finding adequate metrics that accommodated formal methods and object-oriented approaches. (9) At the time, Z tools were very primitive. The tools used on the project were essentially untried, not robust, and inadequate to the needs. No attempt to improve them was made during the project. (10) While an initial requirements document had been produced informally by the client, the design of the infrastructure and tools was not feasible from that document. The formal specification effort sharpened the design by helping the Praxis team understand the requirements and control the design based on that understanding. (11) Reuse of either newly created components, or components already existing, was not an objective of the project and not undertaken. (12) The product was not put into use by the client. Hence, no data is available on maintenance or maintainability. (13) Insufficient information, (14) Praxis told us that testing was assisted positively by having the formal specification but we did not explore this in great detail. The claim is that the formal specification was used as a reference in constructing test suites. 1.3.4 Observations With respect to the claims for better productivity, we have collected the following data from Praxis in support of these claims. COCOMO predicts approximately 11 lines of code per day for this class of system; Praxis figures show 17 lines of code per day were produced. The next figure shows a comparison of effort (in percent) expended per phase on the CASE project and a typical project using a conventional method. Phase Z Method Conventional % % Requirements 6 6 Spec and Arch 27 0 Design 0 10 Development 53 62 Integration and Test 14 22 It is worth noting that the conventional methods are slightly more productive for the middle three phases than the formal method, although this conclusion masks somewhat the effect that the effort expended in the Specification and Architecture phase had on the final phase. The smaller amount of effort expended using Z on the more expensive phase of "Integration and Testing" reflects the quality of the specification effort earlier in the process. On balance, however, the data collected thus far on this project does not make a clear case one way or the other for Praxis' claim of increased productivity. If anything, the overall productivity in terms of level of effort seems to be practically the same. Praxis believes the data are interesting in their own right. In our view, the COCOMO data is relatively meaningless, largely because it requires comparative data from other projects with which to compare results and that this data needs to be collected over a period of time on a number of projects in order to have a baseline. It is perhaps valid to note that the use of COCOMO shows that a formal method like Z can be matched to conventional productivity metrics if desired, although as Rawlings notes in [2] care on making assumptions of equivalence between new and different techniques (in this case object-oriented methods) and conventional ones must be taken. Praxis acknowledged that the only reason the COCOMO data was collected was to satisfy the client's management system which required an accounting of how many lines of code per day was produced. Given the difficulty with the COCOMO data, the use of another technique that is more fairly matched to the techniques seemed sensible. Again, comparative data is necessary to develop a full picture of productivity. Absent this, about all one can do is note the data that Praxis developed using the function point analysis method. 1.3.5 Formal Methods R & D Summary 1.3.5.1 Methods Specification The Z method was used to develop a formal abstract specification of the system based on requirements supplied by the client. The features of Z most exploited in this project were precise expression of functions and operations in modelling the requirements in order to improve design, and the specification also served as a more precise communication medium than an English text. Design and Implementation The formal Z specification provided a clearer path to implementing the code, and implementation proceeded in a less troublesome manner than expected. Validation and Verification V & V per se was not performed. However, system testing was carried out, and the Z specification provided assistance in carrying out this testing. More work on using formal specifications to generate test cases seems warranted by this experience. 1.3.5.2 Tools Language Processors Primitive, prototype parser and type checker were the only tools used and the type checker was barely adequate. Problems with the tools were a less than desirable screen-based editor (based on the QED screen-based editor) which was changed to an "ascii" editor which did not require using QED for more flexibility, a lack of a tool that would expand and merge Z schemas, and other editing, browsing and cross-referencing features. It was noted that a way to partially type check an early, incomplete specification would have been desirable. Automated Reasoning No proving or proof checking was carried out or contemplated in this project. This was simply an effort in which formalizing the specification was seen to be sufficient. It was noted that the use of an object-oriented language for implementation narrowed the gap between specification and implementation, and hence, refinement in the sense of theorem proving (either by hand or mechanically) was not seen as necessary, and an informal relationship between specification and code was considered sufficient. Other Tools No other tools were considered necessary. However, based on this experience Praxis personnel have identified the need for additional tools (see above). 1.4 Conclusions The SSADM project is one of the few projects we examined that combined a formal method with another informal approach (object-oriented). It illustrates that this can be done without excessive increases in development costs or delays in schedules, at least to the extent where a formal method is used for an abstract specification that helps clarify and understand difficult requirements. It also is one of the few projects that attempted to keep measurement data, and illustrates the difficulties in trying to apply metrics intended for conventional software development techniques to new and substantially different methods. It points to the need for more research on this topic, as clients and companies are likely to continue to require metrics for project management. The lack of adequate tools for the formal method did not hinder the project significantly, although it is clear some basic capabilities for editing and typechecking were needed and desired. These basic needs have been met subsequent to the completion of this project, since there are few reasonably robust typecheckers and editors currently on the market. 2 IBM'S CUSTOMER INFORMATION CONTROL SYSTEM 2.1 Case Description IBM's Customer Information Control System (CICS) is a large (slightly more than 500,000 lines of mixed language code), on-line transaction processing program which runs on some 30,000 installations around the world (in banks, insurance companies, manufacturing firms, airlines, and many others). It generated about $2.3 billion in revenue for IBM in 1989. It was first developed in 1968 and is written in System 370 Assembler and PLAS (a high-level internal IBM programming language). IBM Hursley Labs Ltd is a subsidiary of IBM UK Ltd., an IBM holding company for all IBM UK operations. It is a product development laboratory and is run as a profit/loss centre. In 1981, IBM Hursley began to investigate new ways to improve future releases of CICS. IBM-Hursley has responsibility for all aspects of CICS, including the development and upgrade of future releases. As part of a general effort within IBM to introduce software engineering techniques into their software development process, in an attempt to deliver defect-free software, IBM Hursley began a contract with the Programming Research Group (PRG) at nearby Oxford University. PRG is the group which developed the Z method [1]. Beginning with the Software Engineering Workshops (an internal IBM training series for software developers), the PRG team provided education in formal, mathematically based methods to the IBM software developers, both through lectures and small projects using the Z method to specify small modules. This initial effort at education lasted through 1984-85. Z was chosen in 1985 to be used to specify several modules of the next commercial release of CICS. The issues of interest to this study in the Hursley formal methods experience are threefold. First, we are interested in IBM's experiences in integrating the formal method into the general software development process at Hursley (including education and training experience). Second, we want to attempt to understand more completely the cost-benefit data which has been anecdotally reported over the years that this project has been running. Lastly, we are interested in the IBM experience in using a formal method for re-engineering parts of an old and evolving system. 2.1.1 Product and Process Profile Size: CICS is several hundred thousand lines of code. CICS/ESA 3 Release 1 involved some 268,000 of new and modified lines of code, of which 37,000 were completely specified using Z, and about 11,000 were partially specified using Z. Stages: 1979-81 Explorations on abstraction and encapsulation 1981-3 Restructure CICS decision; further explorations on Z and Dijkstra's Guarded Command Language 1983 Decision to use Z taken 1984-5 Z use on CICS/OS/VS in earnest; pilot studies 1989 CICS/ESA Version 3 Release 1 shipped 1990 Release 1.1 shipped 1991 Release 2 shipped 2.1.2 References 1. Spivey, J.M., The Z Notation: A Reference Manual, Prentice-Hall, 1989. 2. Collins, B.P., Nicholls, J.E. and Sorensen, I.H., "Introducing Formal Methods: the CICS Experience with Z", IBM Technical Report TR 12.260, December 1987. 3. Nix, C.J. and Collins, B.P., "The Use of Software Engineering, including the Z Notation, in the Development of CICS", Quality Assurance, Vol.14, No.3, pp. 103-110, September, 1988. 4. Phillips, Mark, "CICS/ESA 3.1 Experiences", Z User Group Meeting Minutes, Springer Verlag, September 1990. 5. Hayes, Ian, "Applying Formal Specification to Software Development in Industry", IEEE Transactions on Software Engineering, Vol. SE-11, No.2, February 1985. 6. Normington, Glyn, "Clean/Z", internal IBM paper, 3 April, 1992. 7. Houston, Iain, and King, Steve, "CICS Project Report: Experiences and Results from the Use of Z", VDM '91: Formal Software Development Methods, vol. 551, pp. 588-96, Lecture Notes in Computer Science, Springer-Verlag, December 1991. 8. Coleman, M.J. and J. Allan, Automated Quality Tracking, IEE Software Quality Assurance, May 1989, pp. 149-154. 9. Interviews with personnel at IBM Hursley and Programming Research Group, Oxford University. 2.2 Questionnaire 1 International Study on Industrial Experience with Formal Methods Project: CICS Respondent: John Wordsworth, IBM Hursley Lab Date: April, 1992 Subject: Responses to Questionnaire 1 1. The organization develops the CICS (Customer Information Control System) products for sale under license to IBM customers worldwide. The goals of the organization are to be (remain) the market leader for transaction processing systems. There is a continual search for reduced cost and improved quality in our transaction processing products. CICS is a transaction processing system that allows customers to write applications to run in a CICS-provided environment. That environment supports file and data base access, pseudo-conversational transaction processing, terminal management, function shipping, transaction routing, recovery and integrity of data over connected systems, etc. 2. The project began in 1981, and its aim was to apply mathematics to the development of CICS/OS/VS so as to improve quality and reduce development cost. Certain parts of the product were in need of re-development, and these were decomposed into a number of domains (abstract data objects) to provide services (dispatching, program loading, monitoring, statistics gathering, status saving and restoring, etc.) to other parts of the product. It was assisted by a research contract with Oxford University Computing Laboratory Programming Research Group that has been renewed many times. At the end of 1983, a decision to use Z for specifying the domains was taken, and general Z education was undertaken in 1984. Pilot studies were made to develop processes, procedures and tools to support the restructure of the part of CICS being redeveloped. The new material became part of a successor product to CICS/OS/VS called CICS/ESA Version 3. Release 1 was shipped in 1989, Release 1.1 in 1990 and Release 2 in 1991. Some field experience of the use of the product has therefore been gained, as well as experience of developing a new function on a well-specified base. 3. The product was developed because a policy of continual development and improvement had rendered certain parts of CICS difficult to understand and difficult to change. Impending requirements for support of improved hardware (multi-processing, etc.) meant that control had to be taken of the difficult parts. 4. The development of the domains as abstract data objects would have been impossible without formal methods. Z was chosen for recording specifications. Dijkstra's guarded command language was chosen for recording algorithms. These were chosen because they were powerful and expressive ways of recording, and because Oxford is only 60 miles from Hursley. The initial impulse to use Oxford as consultant arose from a talk given by Tony Hoare and heard by the CICS product manager at an Association for Computing Machinery (ACM) chapter meeting. 5. Z was used for writing specifications of the domains, and the guarded command language for recording the algorithms to implement the operations. Tools for editing, printing and cross-referencing the mathematical notation were developed. They have since been enhanced to include syntax, scope and type checking. 6. Advantages include quality improvement in delivered code, confidence of the developers in the work they were doing, and improvement in development productivity in later releases. However, some people didn't like it. Error rates in inspections suggest that many errors are found early with formal methods, and that the later stages of coding proceed faster than expected. 7. There are several technical reports, technical examples, conferences papers, etc., that report and explain this work and the techniques used in it. Other uses of Z in the CICS/ESA product area have been reported in various technical papers. 2.3 Interview Summary Date of interview: May 11, 1992 Organization: IBM Hursley Laboratory, UK; CICS/ESA, Product Division Respondents: Peter Collins, Peter Lupton, Paul Munday, Glyn Normington, and (tools demo only) Jonathon Hoare Interviewers: Dan Craigen and Ted Ralston (author) 2.3.1 Organizational Context IBM Hursley Labs Ltd is a subsidiary of IBM UK Ltd., an IBM holding company for all IBM UK operations. Hursley is a product development lab and is run as a profit/loss centre. There is a general Director to whom the various product managers report. Within the CICS Product Division there are several groups: a group concerned with the overall product development (ESA), and several groups responsible for the various hardware platform implementations of CICS. ESA is the largest of the groups and consists of some 400 people. The organization develops the CICS (Customer Information Control System) products for sale under license to IBM customers worldwide. The goals of the organization are to be (remain) the market leader for transaction processing systems. There is a continual search for reduced cost and improved quality in their transaction processing products. CICS is a transaction processing system that allows customers to write applications to run in a CICS-provided environment. That environment supports file and data base access, pseudo- conversational transaction processing, terminal management, function shipping, transaction routing, recovery and integrity of data over connected systems, etc. 2.3.2 Project Context and History History: Business development responsibility for CICS was moved to Hursley in 1974-75 from IBM Palo Alto at a time when another project was seen by IBM to be the wave of the future in transaction processing. CICS was then seen to be declining in importance. In 1981, personnel at Hursley began to investigate new ways to improve future releases of CICS. IBM Hursley has responsibility for all aspects of CICS, including the development and upgrade of future releases. Some 400 of the 1500 software engineers on Hursley's staff are employed on CICS. As part of a general effort within IBM at that time to introduce software engineering techniques into the IBM software development process in an attempt to deliver defect-free software, Hursley began a contract with the Programming Research Group at Oxford University, the group which developed the Z method. Beginning with the Software Engineering Workshops (an internal IBM training series for software developers), the Programming Research Group provided education in formal, mathematically based methods to the IBM software developers, both through lectures and small projects using the Z method to specify small modules. This initial effort at education lasted through 1984-85. Z was chosen in 1985 to be used to specify several modules of the next commercial releases of CICS (CICS/ESA Version 3, Release 1, shipped in 1989, followed by Release 1.1 in 1990, and Release 2 in 1991) Versions: Release 3.1 consisted of 268,000 lines of new and modified code, of which Z was used to specify some 37,000 lines fully, and another 11,000 lines partially. Release 3.2 consists of approximately 50,000 lines of code specified in Z. Follow-on Projects: In addition to the continued use of Z on additional CICS modules, such as the Application Programmers Interface, Hursley has begun a small project to link Z with IBM's Cleanroom method (called Clean/Z). The project combines elements of the Cleanroom approach, using Z in place of box-structures for the formal specification, and elements of Knuth's "literate programming" style for design documentation. 2.3.3 Application Goals CICS is an important revenue generator for IBM, producing over $2 billion in revenue annually. Ten years ago, CICS was not such an important product as today, although it was not considered trivial either. Time to market was a sort of goal but not primary, mainly because IBM had an installed base and CICS was a product that customers had come to expect at a regular, not time critical, schedule of releases. Cost reduction and improving quality were major drivers at the beginning. Midway through the project (c. 1985) productivity became an important goal in terms of future functionality. Reliability was and is an important goal since it is too costly to do a lot of field repair. Improving the quality of the application was an important product goal. To achieve it, the development team indicated they needed intellectual control over the code, which they believed a formal specification effort would give them. The marketing and sales force was a key group to be convinced that improving quality and adding more functionality to CICS would make CICS a more saleable product. Marketing and sales were generally of the view that CICS was more or less fine the way it was and that it only needed some restructuring, rather than improving quality and adding new features. However, this resistance was overcome and clients were impressed by IBM's drive for improved quality. The recent release of CICS 3.1 has been very successful, having been installed in roughly half the CICS sites worldwide. One of the main points in convincing the marketing staff was that the effort to improve quality would lead to a reduction in costs, and this became one of the major drivers behind the formal specification effort. Problem reduction is the aim of productivity, where productivity is measured over the lifecycle of the product. A common process model is followed at Hursley (and IBM worldwide) for software product development. It is called Product Process Architecture (PPA) and consists of a number of "stages": requirements, specification, high level design, low level design, code, testing, and delivery. The PPA terms IBM uses for these stages are: Requirements; Product Level Design (PLD); Component Level Design (CLD); Module Level Design (MLD); Code; Unit Test; Functional Verification Test; Product Verification Test; System Verification Test; Package and Release; Early Support Program; and General Availability. Of note, the PLD stage was introduced as a change to the PPA in order to accommodate the production of a formal specification. PPA was not in place at the start of the CICS/ESA use of Z in 1982 but went into effect shortly thereafter. At the start, Hursley followed the Design Reviews process, then current in IBM. 2.3.4 Formal Methods Factors The use of Z on CICS resulted from several influences but was mainly bottom up, i.e., it came from the development team. The CICS/ESA team, which in 1982 was relatively small (about 10 people), had been having trouble and foresaw continuing problems, with increasing functionality and quality largely because they did not know what CICS really was. The specification at that time was written in pseudo-code, was unmanageably large, and was poorly documented. The CICS code itself was a hodge-podge of modules with patches in various languages that dated back to the late 1960s). At the time (c. 1982), the team was exploring the use of abstract data types, encapsulation, and the like, because the pseudo-code they had was imprecise and ambiguous, and they were unable to document the state of the system or the pre-conditions. At the start of the project, design reviews were basically conventional inspections, which were common in the early 80s, but these were soon replaced with the PPA process. Reviews became both informal and formal parts of the process. The team would discuss parts of the spec both before formal design reviews and during the formal reviews. In the view of the developers, Z helped this communication, particularly in comparison to previous experiences. Z helped in two ways: the underlying maths notation was much better than the pseudo-code they had had before; and Z influenced the clarity of the English annotations. The pseudo-code ignored details; it just gave an approximation (WHILE/UNTIL loop example was given and the fact that pseudo-code did not document state or pre-conditions) and defects could be detected only while doing an inspection, once the code had been written. As a result, Program Level Designs are now used for all other modules, even those not using Z, and the Z Program Level Designs help uncover problems earlier. The decision to use formal methods modified the Product Process Architecture by adding the PLD level and a new formal document. This formal document included both Z and English (as is usual in Z specifications). Z was used at Program Level Design, Component Level Design and Module Level Design stages. The Guarded Command Language was used at Component Level Design and Module Level Design. The pressure to change the development process resulted from grass roots displeasure with the loss of intellectual control. This displeasure percolated up to CICS management; a serendipitous talk by Hoare at Oxford resulted in contact between Hursley and the Programming Research Group. The first contracts were between Hursley Technical Planning Group and Oxford. These then migrated to the CICS development team and Oxford. As they started considering the use of formal methods, there was a choice between Z and CDL. Oxford's work on developing Z specifications for a part of the Application Programming Interface helped to convince Hursley to use Z. Composability (using the schema calculus) was viewed as positive and certainly helped in choosing Z over VDM. 2.3.5 Formal Methods and Tool Usage CICS/ESA 3.1 had 268,000 lines of new or rewritten code. Of this, about 50,000 lines could be traced to Z specifications. Typically, a CICS release is around 100,000 lines of code. Hence, this release was much larger and was the first time that there had been a significant rewrite. They rewrote low level domains (e.g., storage manager, dispatcher; i.e., the basic services). It was not at the level of resource management (e.g., database or file). This is a bottom-up process and it seems that some of the current work is now at the resource management level. In their rewrite, they re-engineered existing code and documentation and used the ensuing Z specifications to implement new code. The interface between the new and old code has been a source of a number of problems. There have also been problems with interfacing with the underlying operating systems. Maturity: As of the start of the project, the formal methods process was not mature; now, they feel it is. It is still immature with respect to concurrency, and there is no formal approach to concurrency with which Hursley is happy. There are no real time obligations in CICS, except for delaying execution. The process was certainly mature beforehand. While the insertion of formal methods into an already mature process led to some increased immaturity, the CICS developers felt the risks were reduced. Risks: There exists the usual fall-back position: go back to the conventional approach. Hence, the risk of using formal methods was considered by Hursley to be minimal. Scalability: There were no previous example of systems of the same size being handled with formal methods. However, scalability issues were of importance in the choice of Z (with the schema calculus). Readability/Reviewability: In the view of the developers, a good Z specification plus explanatory English text is the best specification one could have. A bad Z specification could be worse than English. The group felt that Z was superior to VDM, SEDL, etc., with respect to readability. The tools evolved as follows and were produced by the CICS development team. Pencil and paper and eraser | Editing and printing of Z documents | Cross-reference tool | PS/2 based tool The first three levels did not have automatic semantic analysis. The tools were very cheap to develop, easy to use, and robust. The tools were simple in their functionality and implementation. None of the tools were formally specified. The tools were independent of the LCS (Library Control System). LCS performed configuration management, etc., for the code. IBM is considering adding text to LCS and thereby including the Z specifications in LCS. LCS was built to handle unstructured code. 2.3.6 Results The major claim by IBM on the efficacy of using Z to specify major parts of CICS is found in the paper by Mark Phillips, and repeated by other IBM personnel. "The initial conclusions from using the Z notation in the development of CICS/ESA 3.1 were that Z specifications are appropriate both in new components and where large changes involve rewrites. Measurements indicate quality improvement of CICS/ESA 3.1 throughout the development process. There is also an improvement in development costs, and the number of errors detected was reduced. The initial development benefit has been assessed to be a 9% improvement in the total development cost of the release, as opposed to developing the 37,000 lines without Z specifications. The assessment of cost benefit was based on the reduction of programmer days fixing problems (italics in original)". Further, during the interview Hursley personnel indicated that the number of customer reported faults had been reduced by 60% in the Z specified code compared to the non-Z specified code. They attributed this improvement to the more detailed reviews required by the formal specification effort which uncovered design errors earlier. In addition, several quality assurance and process claims have also been made, again principally by IBM personnel [2]. The quality claims centre around defect detection, elimination, root cause removal, and quality prediction through creating defect removal models [8]. Attempts to use these latter defect removal models were not successful, because of difficulties in calibrating them for formal methods. There are also some qualitative claims made about improvement of the development process from the point of view of the managers and developers impressions. Figure 1 lists some of the key metrics used by IBM Hursley in measuring the efficacy of Z in developing the latest CICS release. These metrics were used as a standard part of the IBM development process, although not all the product and process metrics used by IBM are listed here. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ  General Developer or tester person hours Thousands of lines of new or changed source Thousands of lines of shipped source Line item descriptive name or identifier  Inspections Preparation time for each inspector Inspection time for each inspection Total rework time(resolve and repair) Duration of inspection meeting Defects recorded, coded by severity and cause Notation and languages of the design material  Testing Test cases prepared, attempted success/fail Defects found, by severity and functional area When fixed, cause and earliest detection stage Customer implementation Number of licenses Number of apparent defects Defects in CICS code or documentation Hours spent by IBM in maintenance and defect removal/repair ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Figure 1 Metrics 2.4 Evaluation 2.4.1 Background Product: IBM's Customer Information Control System, a large transaction processing system installed at 30,000 locations worldwide. Industrial Process: Re-engineering of a number of modules of the CICS software, and addition of new features. Scale: CICS is approximately 500,000 lines of source code; Z was used to completely specify 37,000 lines of code and to partially specify a further 11,000 lines of code. Formal Methods Process: A formal specification using the Z method introduced into an established software engineering process. Major Results: 9% reduction in total development cost; 60% reduction in errors after coding; successful integration of formal method into an established software engineering process. Interview Profile: 1 day of interviews with CICS Product Division personnel, plus tool demo. 2.4.2 Product Features Client satisfaction n/a (1) Cost n/a (2) Impact of product + (3) Quality + (4) Time to market n/a (5) (1) Not enough evidence was collected to make a definitive judgment. The client in this case is the CICS end-user. The only direct evidence collected regarding client satisfaction was the 60% reduction in number of customer reported errors that require correction, which by itself would indicate a positive impact. Indirect evidence also indicates CICS users are satisfied with the new release, as they have installed it and continue to use it. This should be weighed against the observation that the customers are somewhat "captive" since CICS is an application product matched to IBM hardware (and consequently might have little choice), although CICS is portable to other platforms. (2) The 9% reduction in total development cost coupled with the reduction in customer reported faults supports the contention that the early defect discovery and correction helped achieve a reduction in costs. However, without further data (such as previous development costs) with which to compare the 9% reduction claim, or evidence showing the connection between the fewer customer reported faults and the 9% reduction, it is not possible to assess the impact the use of Z had on the cost of the product. (3) CICS is considered by Hursley to be an important product and a major revenue source for IBM. The fact that IBM still offers, updates, and supports CICS after so many years of service would also seem to indicate IBM's product decision-makers believe this as well. However, no independent evidence was collected to support this view. (4) Quality improvements were observed with respect to defect reduction and to the process used by IBM to develop CICS. While other factors may have also contributed to this quality improvement, the use of the formal specification definitely contributed to solving several quality problems, such as bringing the documentation and code under control and improving design reviews. Reliability, as a constituent element of quality was also improved, as indicated by the 60% reduction of customer reported faults. (5) While even Hursley personnel admitted they are not renowned for speedy delivery, it should be noted the majority of the products they produce, including CICS, are not time-to-market sensitive. CICS is an old, established application package for an installed base and is on an evolutionary path, with regular releases every couple of years. Consequently, time to market was not an applicable measure in this context. 2.4.3 Process Features General Process Effects Cost 0 (6) Impact of process + (7) Pedagogical + (8) Tools + (9) (6) While no figures were collected, significant "start-up" costs were incurred at the beginning in education and training, including the contract with Oxford and the development of an internal course for the Software Engineering Workshops. In addition, the time the software developers spent away from their main duties while they learned the new method must be considered. On the other hand, the positive improvement of the process and the product must be weighed, as should the fact that introduction of any new method or technique incurs start-up costs. On balance, it is our view that the impact on the process in terms of cost was neutral. (7) The PPA process was reinforced by the addition of the formal specification (PLD) stage by means of requiring earlier in-depth reviews. (8) A number of important lessons were learned over the years this project has been running. (9) Starting with "pencil and paper" and developing tools as needed meant the tools were a good fit to the process. The fact that the tools are easy to use and robust has also helped the effort in using Z within the CICS/ESA organization. Specific Process Features Design + (10) Reusable Components n/a (11) Maintainability + (12) Requirements Capture + (13) V & V + (14) (10) The formal method had a positive impact on the design in the sense that the formal specification process concentrated attention on both high-level and low-level decisions, earlier than at the implementation stage. Also, using abstract data types to represent the CICS basic services, such as the storage manager and the dispatcher, was a first for CICS and was facilitated by using Z. The pseudo-code they were using previously did not support Abstract Data Types (ADTs). Through abstraction and documenting the state of the system, Z also made reviewing easier than with the pseudo-code. (11) While some reuse of existing components of CICS was involved, in the sense that functionality in old modules was re-engineered, we did not address this topic fully enough to make a judgment. (12) CICS/ESA Version 3 is in use in over 2000 sites worldwide, and its reliability has been the subject of favourable comment by the users. This is also supported by the 60 percent reduction in customer reported faults, from which one can infer that maintenance of CICS/ESA Version 3 has improved. (13) Given that this project involved a re-engineering of an existing product, as well as the addition of new functionality, the formal specification effort had a positive impact on gaining intellectual control and understanding of the requirements either as they were expressed in the original documentation or, in the case of new features, de novo. The formal method was indirectly useful in negotiating with the CICS marketing personnel by giving confidence to the developers they could add new features and maintain high quality and not run up the development costs. (14) Unit, function, system and product testing were performed on CICS/ESA 3.1 and 3.2. in accordance with the PPA process. The use of formal methods has no real impact on system testing. On function testing, Z was of some value. Most of this testing is black box testing, but there is some white box testing as well. The main influence of Z occurred at the unit testing level. Hursley built a new testing environment which allowed for the testing of new domains (specified in Z) and they could plug in prototype domains. There is no statistical testing in the current CICS process and if the CICS process is related to Cleanroom they still will not drop function and unit testing. We judged this impact to be positive because the unit testing allowed more rapid domain testing through prototyping. 2.4.4 Observations As a case study of industrial experience, the CICS case is one of the best known, longest running projects in which a formal method has been used on a real industrial software product. A reasonable amount of information is available through private interviews and a few published papers and technical reports, enough to conclude that the effort expended by IBM's Hursley Laboratory: - involved a real product with demonstrable commercial pay- off and importance to IBM, - has been focused on both re-specifying old code and specifying new code, - produced satisfactory results that permitted the release of the latest CICS product (CICS/ESA 3.1), and - Z is continuing to be used in the development of subsequent releases of CICS (e.g., Release 3.2). At the same time, the published facts do not as yet provide enough data to conclude definitively that the productivity and quality assurance claims are justified or meaningful. Rather, they suggest a modest productivity improvement and an ability to produce fewer errors, detect errors earlier in the process, and find deeper errors that could go undiscovered until after delivery to the customer. It is difficult to assess the significance of the productivity improvement claim since other productivity data from earlier CICS developments (against which IBM claims to have measured this latest release) have not been provided for comparison. Similarly, some data is provided on previous error detection experiences with earlier releases. However, the most one can do is to infer that IBM had this interest in using a formal specification method because of a real problem. Given the claims regarding reduction in total cost of development and reduction of customer reported faults, it still remains necessary to obtain more quantitative comparative data or authoritative corroboration from IBM before a definitive judgment can be made. In part this is because of IBM's understandable hesitancy to disclose what could be confidential information, and in part because the people involved in the CICS project are programmers and software developers who do not as a rule publish. It is also difficult in advance to assess the effects of a radical change in technology, and to develop metrics adequate to measuring them. It appears that the use of Z or other formal methods has not spread to other parts of IBM. The Clean/Z project mentioned above is a very small effort at this point, and it is too early to tell whether it will in fact turn into anything meaningful. There are questions as to how much of the Cleanroom method is in fact used in Clean/Z, since statistical testing is not used, and the Knuth "literate programming" style is not a part of Cleanroom. Simply substituting Z schemas for box structures does not mean that the full Cleanroom method has been adopted. 2.4.5 Formal Methods R & D Summary 2.4.5.1 Methods Specification The Z method was used exclusively to specify the "basic services" features of CICS and the Application Programmer Interface, rather than the resource management services. CICS has a few modules that have timing and locking constraints (mainly execution delays, but Z was not used on this part of CICS, in part because the developers knew Z could not handle these properties. Also, Z schemas are basically atomic and not modular (e.g., no global definitions or scoping ability) which is seen as a potential problem as systems get larger. Design and Implementation Refinement was not attempted, in large part because the CICS developers believe that proof would be too time consuming. The resulting errors have primarily been coding problems, not design problems. One emerging problem, which is not necessarily specific to Z or indeed formal methods, is keeping the specification current with the code. This has been observed in other cases in this study. Validation and Verification No clear issues emerged from this case as the Z specification seemed to have very little impact on the Testing or Verification parts of the Product Process Architecture. With respect to Unit Testing, it was noted that symbolic execution could help convince others that functions modeled in Z worked but that the cost of achieving symbolic execution might be too high. 2.4.5.2 Tools Language Processors An automated semantic analysis capability was noted as desirable (once a Z semantics is finalized). Other Tools An interface to a configuration manager, or a Z specific configuration manager, would be useful. 2.4.5.3 Recommendations Specification Although CICS has minimal timing constraints, Hursley personnel indicated a belief that Z was not able to handle concurrency or other timing considerations adequately. As observed in some of the other cases using Z, this deficiency should be addressed. Design and Implementation Hursley did not use refinement to implement CICS. They noted that, at the time, refinement by using proofs would have taken too much time. In their view, refinement could be useful and more research is needed, especially to understand the path to lower levels, which is one of the goals of the Clean/Z project. Tools Hursley is happy with the status of their Z tool and does not see any reason for more capability at this time. They noted that development of a pre-condition calculator would be useful. 2.5 Conclusions The CICS project has used a formal method successfully to improve a well-established "bread and butter" product for IBM. Primarily, this means the formal method has been used with success to gain intellectual control of the system. "Gaining intellectual control" means that aspects such as unmanageable documentation, a lack of understanding of what all the code did, and other such issues that plague old, "dusty deck" products, have been brought under the control of the team responsible for it, through the effort to re-engineer important existing modules as well as adding new functionality. The use of an abstract specification helped achieve this control by helping to clearly identify difficult aspects of the system and to facilitate better reviews. This is important in terms of both the improvement of the product and the marketing of CICS. The CICS project is one of the few formal methods efforts we examined that has extended over a relatively long period (a decade). As such, it has passed through several important stages -- feasibility study, education/training, experimentation, real development, extension to further releases -- all of which have contributed to acceptance and routinization of the method at least within the CICS/ESA group. This latter observation is also important: except for a small effort on Clean/Z, use of the Z method has not spread outside of the CICS/ESA group to other parts of the organization or company. This failure to spread is clearly a negative aspect of this experience. The CICS project has also taken an opportunistic approach to developing tools. This means the tools were not developed until a need for them arose (not beforehand as in some other projects). As such, the tools do what is expected of them, neither more nor less, and all things being equal, they are not an issue in the use of the formal method (i.e., concerns for their reliability or performance do not get in the way). Given the reputation IBM has for measuring many aspects of the development process, the absence of published comparable data by which to compare the formal methods results on CICS with previous CICS development efforts using other methods is regrettable. However, it is also unfair to expect commercial companies to disclose proprietary data. Perhaps the absence also illustrates the difficulty that exists in finding adequate and acceptable metrics for new methods generally not just formal methods. 3 CLEANROOM SOFTWARE METHODOLOGY 3.1 Case Description Cleanroom Methodology This case includes interviews with two separate organizations, both using the same basic set of principles that embody the Cleanroom Methodology: - A pipeline of increments - Statistical quality control through usage testing and reliability models - Mathematical verification in place of unit testing Formal methods play an important role in the Cleanroom methodology, but the methodology has several parts that must work in concert to achieve the goals of high quality and productivity. Mathematics-based development and verification (with no unit debugging) are expected to yield such high quality software that a statistical testing process can be applied. Formal team reviews for correctness and simplicity are applied at each step, including stabilization of specifications before design. Appropriate mathematical structures for different parts of the system may include box structures, grammars, and logic. Natural language is used to explain the formal specifications. The basic formalism often used is function-theoretic, with considerable effort expended on intermediate functions that constitute the design decisions. Data structure discipline requires the use of only objects, such as stacks and queues, that simplify reasoning. Testing is based on usage profiles that are built up as part of the requirements process. For more complete explanations of the formal methods aspects of Cleanroom, see the references below. IBM Cleanroom Software Technology Center and the COBOL/SF product The product subject of this case is the IBM COBOL Structuring Facility (COBOL/SF). The case also investigates the more general experience with Cleanroom and its transfer within IBM. Briefly describing the product (from the respondent questionnaire #1): COBOL/SF automatically transforms unstructured COBOL programs into structured form. The product is comparable in function and complexity to a modern high-level language compiler. It applies proprietary graph-theoretic algorithms to convert programs to an internal form suitable for application of the Structure Theorem constructive proof. The resulting mechanically-structured program is then transformed by rewrite rules into a structured equivalent that more resembles code a programmer would produce. 3.1.1 Product and Process Profile Size: 80KLOC, 28KLOC reused from earlier versions, 18KLOC changed, and 34KLOC new. Effort: Seventy person-months of effort spread over seven months in 1987, building the product in five increments. Errors: 3.4/KLOC Productivity: 740 lines/person-month Stages: Prototype: Proof of concept. Version 1: First product, structured VS COBOL II only. Version 1A: First product upgrade, added new features and improved structuring. Version 2: Second product, added OS/VS COBOL structuring and many new features in a major re-architecting. NASA Goddard and a Satellite Control System NASA Goddard Space Flight Center (GSFC) operates a Software Engineering Laboratory (SEL), in cooperation with the University of Maryland and Computer Sciences Corporation. The SEL experiments with technology and methodology of interest for advancing the state of practice in development of GSFC software systems. A series of experiments is in progress for Cleanroom. SEL versions of Cleanroom are tailored for the GSFC environment and may involve different combinations and degree of application. Measurement of experiments is planned and helps guide the experimentation over years and projects. The case experience reported here is offered for comparison with the IBM Cleanroom approach, which is the major subject for the case study. 3.1.2 References 1. Richard C. Linger and Harlan D. Mills. "A Case Study in Cleanroom Software Engineering: the IBM COBOL Structuring Facility", CompSac, IEEE Computer Society, 1988. 2. Michael Dyer. "The Cleanroom Approach to Quality Software Development", Wiley Series in Software Engineering Practice, 1992. 3. R. Linger, H. Mills, and B. Witt. "Software Engineering Laboratory (SEL): Cleanroom Process Model", NASA Goddard Space Flight Center, SEL-91-004. 3.2 Interview Summary: IBM IBM Cleanroom and the COBOL/SF Product Date of Interview: April 2, 1992, 2.5 hours Organization: IBM, Gaithersburg, Maryland Respondents: Richard C. Linger and Phil Hausler Interviewers: Dan Craigen and Susan Gerhart (author) 3.2.1 Organization and Project Context Mission: Linger and Hausler manage the Cleanroom Software Technology Center (STC). The center is composed of 15 people and will be expanding in the near future. It was founded in 1990 as a result of earlier Cleanroom experiences, especially the COBOL/SF program. The center is chartered to transition Cleanroom technology to other divisions in IBM through consultation, education and tools. A general approach has been to assign an experienced Cleanroom practitioner to a development team and for that individual to consult with the project. They also use video and networks to keep in touch, and to help members from the first adopting teams as they work on successor projects. Linger claimed that they now have a number of results where the software developed using Cleanroom improved the results (error rates) of the COBOL/SF project. Linger went on to say that he sees an easy order of magnitude reduction in error rates and feels they have a chance of making two orders of magnitude. (IBM has corporate goals to achieve 10X improvement this year and 100X by 1995.) Other Cleanroom projects include teams making modifications to large systems, e.g., adding functionality, quality re- engineering, and reverse engineering problem modules. Phil Hausler, after working on COBOL/SF, moved to an AI project, with a large organization of 50, that developed a system of 107 KLOC. The approach used to transition Cleanroom is as follows: 1. The labs identify a number of projects where Cleanroom could be used. 2. The relevant individuals are trained in Cleanroom and expert consultation is provided. 3. There is an attempt to have the individuals at the labs increase the use of Cleanroom. Hence, there is an explicit attempt to have the labs become self-sufficient in the technology. Educational Backgrounds: There is a wide variation in the educational backgrounds of those being exposed to Cleanroom, from retrainees to Ph.Ds. In general, however, they are well educated with bachelors or masters degrees in computing science and mathematics and with industrial experiences of 5 to 10 years. Hundreds of individuals within IBM have been trained in Cleanroom and there are a number of projects within IBM where Cleanroom is being, or to be, used. The goal is to add 5% to what the attendees already know. A curriculum of four courses is now taught, one of which uses the older Linger, Witt, Mills book. Management Structure: The management structure is small teams: a first-line manager plus a half-dozen developers and testers. Any kind of management structure is viewed as acceptable for Cleanroom, but stress must be placed on teamwork. Several examples of how teams work include: - People often change roles from testers to developers and vice versa. - Typically, there is one architect on the team. - If a change comes in, it goes through the affected team leader, but they also maintain a database with formal change tracking. Incremental planning is a substantial part of the management, and must be agreed to by both teams (developers and testers) up front to certify against each increment. There are two critical success factors: management commitment and support, and team motivation. Process Maturity: With respect to the Software Engineering Institute Maturity Model, Linger felt that those projects using Cleanroom were certainly at Level 5. Feedback is provided from early experience into courses within projects. As well, there is feedback for process improvement as builds and reliability measures are used. The wider IBM process is typically waterfall, with lots of unit testing. It was noted that Cleanroom products usually result in a smaller amount of code than normal: i.e., higher functionality per line of code. There is no correlation between error rate and team size. How Cleanroom came about: In the early 1970s, the Federal Systems Division needed someone to teach courses on structured programming and verification. Through the development of the course, Linger became more involved with Mills. Their collaboration led to the book [4] on correctness verification, and software analyses and design. In the early 1980s, Cleanroom started coming together: unit testing and debugging were stopped, with verification taking their place. Instead, testing was to be used to certify quality, not to put quality in after the fact. About that time, the box structuring ideas also came along. They could not figure out how to scale up the axiomatic approach but functions do scale up. Mills had the theoretical basis but had to build up the practice. The COBOL/SF Application Objectives: The goal was to develop a COBOL structuring program that would be a companion product to the VS COBOL II compiler. The structuring program was regarded as a critical product. Resources were made available under severe time constraints. Technical Challenges: This project broke new ground in many ways: it was the first structuring engine in IBM, it was an experiment with Cleanroom, and it was a prototype effort to convince IBM management. The complexity of the COBOL semantics almost brought the project to a halt. COBOL was formalized using attribute grammars and denotational semantics. IBM Federal Systems Division had a number of proprietary graph theoretic algorithms for structuring flowchartable programs. Structuring COBOL was a formidable task. It was tough to get COBOL programs into flowchartable form, requiring significant graph theory (a Ph.D. mathematician was on the team). An ANSI standard was used in the restructuring to help define OS/VS COBOL and VS COBOL II. IBM required the capability to restructure 100% (without that there would be no confidence in the product). The COBOL formal description was primarily expressed in attribute grammars. There were three subsystems, each with its own spec, which were 2-3 inches thick. 3.2.2 Formal Methods Factors The criteria for choosing a formal method included referential transparency, completeness and closure. It is crucial to have the right theoretical underpinnings. No extensive search was made nor were other candidates evaluated: they accepted Mills' approach. 3.2.3 Formal Methods and Tool Usage Specification techniques included grammars, transformations, conditional rules, and state machines. Box structures at the highest level were uninteresting but the internal design used lots of state machines and common services. One change that the Cleanroom researchers would do now is to use full box structured notation. Box structures were still under development at the time of the project. They used Program Design Language in the second major phase. The design is expressed as functions in Program Design Language, including conditional rules for intended functions. Verification review used mental proofs based on the correctness theorem. No written proof of correctness was produced. Most Verification Conditions were checked in seconds, but some take hours. They found lots of reasoning patterns, e.g., queue data types. Hausler had doubts about ever finishing but claimed that people get very facile doing these Verification Conditions mentally. Systematic data structures with regular properties had an important role in carving up the state space. The rule "functions first, performance second" was usually satisfactory. Linger sums up with ``The heart and soul of Cleanroom are the verification reviews.'' Testing: The test team has to determine the usage probability distribution, stratify the input for the incremental builds, and run the test cases. No knowledge of the internals is used in developing the test cases. Usage testing is considered 20 times more effective than coverage testing. With usage testing, you get better and better testing results. Newer Markov models give a high number of cases for a few states. On one project, they captured baseline usage for a new device controller. No coverage monitoring is used, although some alternate distributions are introduced for critical failures. Some self checking is applied but there remains the usual oracle problem: What is the right answer? Linger observed that if the testing has more than four errors per KLOC then one should take the code off the machine. This project used formal methods in all aspects of the project. The heuristics used to decompose were smallest interfaces, with referential transparency being maintained. They were always guided by completeness and provability goals. At the time, Cleanroom was a new technique. Nobody had built a structuring engine, hence a prototype was developed first. Only one other such product was on the market; afterwards, there were three or four competitors. There were other Cleanroom projects of significant scale, but this project was the first of this scale (80 KLOC). Cleanroom generally has a relentless push for code reduction and simplification. The users try to reduce the size of the state space and look at getting the functionality correct; they review performance issues later. In this case, the tools were all PC-based: a good text editor, certification modellers, PDL-to-PL/I translator, and test case generator (see the appendices of Dyer's book [2]). Today, they would use a smart editor. All tools were very simple and developed by the group. Other groups are customizing their own tools. There were no interfaces to other developmental tools. Tool development was an auxiliary activity, not a goal in itself. For some humour, consider how well the one noted tool- --the wastebasket---meets the objectives: cost-effective, robust, easy to learn and to use. Cleanroom is not a tool- based methodology. They would like something to keep track of prime programs (e.g., a loop body), and navigation through and coverage of sets of correctness questions to facilitate the additional review of ideas and designs before verification. They used function-theoretic verification, which scales up, versus axiomatic (Hoare-style) which no one knew how to scale up at that time. Linger noted that they had spent a lot of time working on proof checking research. (None of this work is available outside of IBM.) Regarding automation, they were unsure how much change would have occurred if they had had theorem proving, because the other role of verification is communication within a team. The team owns the product, errors, etc., that helps track change. Individuals are assigned tasks in the usual manner, but the team is responsible for review and ownership. The kinds of errors found were not deep design, just typos, etc. A typical build will not run the first time, but when fixed it will run forever. Much less time (a few days) is spent on critical testing, so builds go faster. They did once pull an increment off the machine, took it back and re- designed. 3.2.4 Results Without Cleanroom, they felt they would never have completed the COBOL/SF product. They faced hopeless complexity (COBOL semantics, lots of rules) but did produce a very reliable product in the field. Linger claimed that COBOL/SF was recognized as an incredibly reliable product that resulted in the Cleanroom Software Technology Center. Cleanroom has been used to find errors in operational code and has saved substantial resources. It takes less than one person-year per year to maintain the system. True, there are fewer restructuring installations than compiler installations, but the restructuring runs for long amounts of time. The team felt many effects of the Cleanroom experience. They were invigorated and geared toward 0-defect. That became part of their work ethic. If they could do anything differently, they would use the luxury of up-front education, but there was nothing else substantive to change. They would use better software tools, but no substantial changes in the methodology. Cleanroom continues to change. In research, they are working on an extension of box structures for particular types of systems; real-time extensions, dealing with concurrency (now being used in complex buses, etc. but the results can't be released); improved Markov statistical process; trade-offs in quality certification. In practice, they are learning that each team tailors for themselves. Cleanroom makes available a set of technologies and practices that can be customized and adapted to an organization's environment. 3.3 Interview Summary: NASA Goddard Center NASA Goddard Cleanroom Experiments Date of Interview: April 2, 1992, 2.5 hours Organization: NASA Goddard Center, Greenbelt, Maryland Respondents: Scott Green (NASA) and Victor Basili (U. Maryland) Interviewers: Dan Craigen and Susan Gerhart (author) 3.3.1 Organization and Project Context Software Engineering Laboratory: NASA Goddard Space Flight Center (GSFC) in Greenbelt, Maryland, builds satellite attitude determination and ground support systems. For 15 years, NASA GSFC has worked with the University of Maryland and a contractor as part of a Software Engineering Laboratory (SEL). The SEL experiments with new technologies and new processes, publishes reports, and conducts an annual Software Engineering Workshop. Case studies in the SEL are conducted on-line (as the project is producing its product) along several themes: Cleanroom, Ada, and CASE. The general model for the experiments is described in many papers by Professor Victor Basili on the concept of an "Experience Factory" with the goal of projecting and improving software development from an "experience base", e.g., costs/defects. For example, in attempting to answer questions such as `Should we cost a Cleanroom project in the same way as other projects?', they use these projects to evaluate technologies that they are considering bringing in-house. Process: The normal NASA process follows the waterfall approach, using its own handbook. Projects tend to short- duration (one year) to satisfy specific satellite missions. A large base of software has been developed that drives future applications. This "experience base" also influences the SEL. This was one of the standard satellite systems being built at GSFC. However, the goals of the project did differ from the general goals by being a "designated SEL" project, i.e., a defined process was adopted and the process modeled and measured. The project had a strict time schedule, working on one subsystem concurrent with other subsystem developments. Economics were not a factor since the project was "routine" and predictable. Reliability was important but was handled as usual at GSFC. Quality ditto. Timeline: Consider the project to be the Coarse/Fine Attitude Determination System (CFADS) application as described below. The "extended project" is CFADS plus the SEL experiment structure. We will blend these together since both are the subject. The timeline for experiments is: Experiment 1 (e1): Sept. 1988--1990 ACME a.k.a. CFADS Experiment 2 (e2): 1990--May 1992 ISTP 1990--Dec. 1991 SAMEXTP Experiment 3 (e3): 92-?? TOMS Experiments started with courses from Basili (University of Maryland) and Terry Baker (IBM) and Michael Dyer (IBM) plus Cleanroom originator Harlan Mills' "Theoretical Cleanroom" talks and experiences of previous project teams. Management: Scott Green is the manager of the project for e1, and the process modeller for e2 and e3. The process modeller defines the goal of the experiment, data to collect, and follows through with the data collection and analysis. The project manager determines the size of build, one of the trickier parts of the management of the process. Members of both teams conduct the requirements analysis. 3.3.2 Application The following is the context of CFADS in the Attitude Ground Support System: AGSS: Attitude Ground Support System ADS: Attitude Determination System AGSS / \ ADS utilities / | \ CFADS CFADS was a real project with a tight time schedule. A secondary goal was a SEL experiment to characterize the methodology. A novelty was the merger of two sub-systems previously doing "coarse" and "fine" analysis separately. 3.3.3 Formal Methods and Tool Usage Criteria: Cleanroom was chosen as a state-of-the-art technique to import into NASA. The rationale for how Cleanroom was used (emphasizing reading, not verification) went back to earlier experiments at the University of Maryland comparing testing and reading. They noted that reading was more effective, to the extent that reading can be viewed as a verification technique in itself. It is naturally linked to formal methods, by requiring a structured way of reviewing the code, namely stepwise abstraction. Experimental Process: 0. A course on Cleanroom, including how to read code. These were lectures, not labs. 1. An independent organization sent the specifications to SEL. 2. For the first month the developers and testers worked together to understand the specification. Requirements analysis. 3. Development of the design then started. Developers used state machines in CFADS. The testing teams worked out the statistical patterns of the inputs and developed the test suite. 4. Peer review of the design and software. They had relatively small incremental builds. The formal methods used were state machines and box structures. Such approaches allow for good understanding of the system and result in clean interfaces. There was limited teaching of box structures and state machines and SEL modified the Cleanroom methodology to their needs. Essentially, they took what they thought would work within the organization. Box structures are being used in the second experiment. The requirements document for CFADS looks like a data flow diagram. Usually, SEL abstracts from the data flow diagram using Cleanroom. However, Green and Basili have found that the resulting implementation, using Cleanroom, is much like what they have had in the past. This group is very knowledgeable of their domain of application. No special tools were used nor are tool developments planned. Experience: The CFADS experiment resulted in several observations. The application was mainly formulae, requiring less logic and more sequential code. This may have affected the reading process. The equations came from the specification. Regarding box structures, if one is unfamiliar with the system, then box structures are good for understanding. In the NASA applications, the requirements are usually in the form of data flow, where the box structures can be used to analyze the flows. But the system was designed like all previous ones, in terms of architecture so value of state machines was not maximized. Another view is that they used state machines as models. CFADS incorporated numerical methods techniques such as least squares. The specifications were also very detailed and included algorithms. The specifications are generally not very stable. However, the better understanding of the specifications that resulted from their use of Cleanroom made it easier to modify the software according to changes in the specification. 3.3.4 Results Experimental Results: From the SEL perspective, the results of using Cleanroom have been favourable. SEL feels that Cleanroom is effective for <50KLOC (and up to four person- years of effort). Experiment 1 on 40KLOC yielded good results versus the SEL baseline: half the number of changes, three-quarters the number of errors, and nearly twice the productivity. The team was able to handle unstable specifications and successfully mastered the review process, although there were weaknesses in the technical aspects of Cleanroom involving abstractions and operational testing profiles. The second experiment (on two project of 20 KLOC and 150 KLOC) brought in a process handbook and focused more training on the use of abstractions. However, while the trends are favourable, the second experiment produced fewer changes and errors while maintaining the SEL baseline productivity. Improvement Trends: Two issues cloud the second experiment: (1) larger size and (2) external contracting. The outside/inside team worked together (physically), as normally occurs in NASA. External contracting requires additional tailoring and further investigation. It is hard to control all factors in experiments such as this. Therefore, trends are more important than actual numbers. The results are not conclusive but their data shows their Cleanroom approach to be effective for small projects (four person-years), though less assured up to 150 KLOC. The SEL approach will be spread to four other NASA sites through NASA cooperative arrangement, but this SEL approach is more general than Cleanroom. Lessons learned have been codified in a NASA guidebook [3], including such general advice: - Do Cleanroom to a level of rigor that is appropriate. - All methods must be tailored, which the NASA guidebook assists. 3.4 Evaluation Since COBOL/SF is the primary application, the evaluation is performed with respect to that product and the IBM experience. The NASA Goddard experience is provided as background, but since the Cleanroom methodology was tailored extensively we have not incorporated it into the evaluation, except as notes of other points of view. Also, our NASA Goddard interview was brief, less focused on formal methods, and did not gather sufficient information to complete an evaluation. However, the SEL experiment papers and the NASA Goddard Cleanroom Guidebook are recommended for more insight into the general methodology. 3.4.1 Background Product: A commercial language processor, the COBOL Restructuring Facility. Industrial Process: A new methodology, Cleanroom, was used in the product development. Various standard industrial practices are greatly modified, e.g., no unit debugging in favour of verification by inspection. Incremental builds provide both system planning and reliability estimation. Scale: This product came in at 80KLOC although it was projected to be much larger (400KLOC). 70 person-months of effort were involved. Formal Methods Process: Cleanroom involves mathematics-based specification and verification using some appropriate formalism (here function-theoretic), and statistical testing for quality certification. Major results: COBOL/SF was viewed as highly superior with respect to both reliability and productivity, leading to efforts to institutionalize the Cleanroom process through an IBM Software Technology Center. Interview profile: Approximately one-half day with Richard Linger, STC head, and Phil Hausler, project member and Cleanroom process manager. 3.4.2 Product features Client satisfaction n/a (1) Cost n/a (2) Impact of product n/a (3) Quality + (4) Time to market n/a (5) (1) IBM clients seem to be satisfied by the COBOL/SF process. An IBM Application Brief describes the experience of Pratt & Whitney. However, we have insufficient information with which to draw a conclusion. (2) We have no data regarding product cost. (3) The COBOL/SF product is on the market and used, but we have no data on impact of the product on IBM revenues. (4) Error rates were claimed to be below industrial average and errors were minimal to fix. Formal methods, as both formalism and discipline, were credited. (5) While time to market was very important, we have no data to determine how important. The use of formal methods seemed to provide a steady process with ability to overcome what might have been significant technical obstacles (COBOL semantics, algorithms). 3.4.3 Process features General process effects Cost + (6) Impact of process + (7) Pedagogical + (8) Tools 0 (9) Specific process effects Design + (10) Reusable components n/a (11) Maintainability + (12) Requirements capture n/a (13) V&V + (14) (6) A relatively small team developed this product, with credit given to the formal methods influence (which may have motivated and attracted personnel). (7) The Cleanroom methodology gained recognition within IBM. (8) The Cleanroom methodology has become part of a broader IBM education effort. Individuals on the project learned both the new methodology and the technical aspects of this language processor. Feedback is built into the Cleanroom project monitoring system. (9) Tools did not play a major role in the COBOL/SF project. Cleanroom emphasizes mental power. (10) The function-theoretic approach was applied with success to this particular problem. (11) Part of previous versions were reused, but we did not discuss the role of formal methods in that. (12) This product is claimed to require a small amount of maintenance, less than one person per year. (13) By inference, Cleanroom is claimed to support requirements capture by forcing stabilization and following an incremental build. However, we did not discuss that aspect in this application. (14) Good error results---i.e., low defect rates per KLOC and severity---are claimed. 3.4.4 Observations The general outline of the product and the process are clearly presented in the CompSac paper [1]. Such data is essential for empirical case studies to begin building cost models for formal method processes. The interviews exposed several aspects of the application. While the product technical base is a variant of compiler technology and draws upon the core of computer science, restructuring to the degree required by COBOL/SF was regarded to be, and was, a technical challenge. Computer science expertise, including mathematics, was required for the team. Cleanroom may have worked well, primarily because of the nature of the problem. But embedding the computer science expertise in a process with extensive review and statistical testing distinguished the COBOL/SF project as an industrial process. Cleanroom is a difficult process to understand because it mixes together so many aspects. Despite our readings and interviews, we still do not have a strong feeling for the formal methods used. This point is addressed in the next section. Cleanroom attracts criticism on several points: Does the discipline amount to a "strait-jacket"? Would loosening up some aspects, e.g., allowing execution, provide some rewards and trade-offs? How strongly does the choice of formal method matter? How much does the degree of formalism matter? While Cleanroom is used in several IBM projects, it has hardly spread like wildfire, despite the positive COBOL/SF results. Are there inhibitors or does it just take time? Cleanroom pragmatism is notable. The simplest formal techniques are sought and shorn of language concerns, such as schemas in Z. Management thinking is then applied to draw personnel motivation, measurement, and process structure around the formalism. Transfer is promoted by backing up the formalism and management process with a broader industrial education base. In contrast, the NASA Goddard experience provides a perspective on Cleanroom where the formal methods usage is much less "hard-line." Less training is expected and available than in IBM with the anticipation that, if it proves itself, Cleanroom will be phased in gradually. As well, formal methods are viewed in the Goddard environment more as a review structuring mechanism rather than as a verification technique. 3.4.5 Formal Methods R & D Summary 3.4.5.1 Methods Specification From top to bottom, functions were the basic description, mixed with grammatical formalism. A newer approach, the box structure, aims at providing a more systematic top- level and decomposition technique. Design and Implementation Following the function-theoretic line of formalism, design is a process of (1) choosing more concrete data structures and refining functions and (2) decomposing functions to move toward standard language constructs (loops, conditionals, sequences, and assignments). Validation and Verification Validation comes from two perspectives: (1) the design team must agree that the specification says the right thing with regard to the problem before designing to it and (2) the test team must characterize the environment adequately to come up with usage profiles and therefore integrate other views of the requirements. Verification is split into (1) the review process based on correctness questions and mental proof and (2) testing guided by a reliability model. 3.4.5.2 Tools Language processors Only simple editors are required or used. However, some tools like VC Generators would be useful to track the verification process (see [4], Appendix A). Automated Reasoning Proof checkers and theorem provers were considered less important than the mental verification that is the basis of reviews. Some mechanical support is currently under consideration. Other tools The wastebasket as the repository of overly complex or out-of-control designs is a useful concept. Easily implemented, too. 3.4.5.3 Recommendations to the FM Research Community The results of Cleanroom warrant more study by the formal methods community than has occurred. While papers have been published on functional verification and box structures, these are not usually considered in the common lists of formalisms. Given the extent of experience, the specific packaging provided by box structures and functional verification approaches should be compared and contrasted with other approaches. The weaknesses of the methods should be identified, as well as the intrinsic reasons for why they work when they do. For example, we do not have an adequate characterization of the claim that function-theoretic methods scale whereas axiomatic methods do not. Given the extensive reliance on process, tool developers may not find much of interest to support in the Cleanroom methodology. However, there may be various toolsets, extending the traditional theorem prover, that would prove valuable and interesting. Reliability models are not well understood in the formal methods community. Their interaction with the end results of applying formal methods is worth more attention. For example, does the gain come from the usage analysis process or from the reliability model? Is the implicit "closed specification" an important factor? Reviews and inspections are inherent in industrial process but less attractive in academic-oriented software engineering. Can formal methods be taught in the context of reviews, e.g., following the correctness questions? Can reviews be taught as "structured reading according to the formal methods structures"? The Goddard Cleanroom Guidebook provides an accessible industrial process description. 3.5 Conclusions Cleanroom is an ambitious methodology that interweaves several software development practices. Its premises are that teams of software engineers can apply these practices, following a discipline that drives toward zero defect software. Because the methodology runs against the grain of many companies' practices, it requires significant commitment and time to change an organization. For our purposes, because Cleanroom incorporates so many practices, it is difficult to assess the specific contributions of formal methods. To date, these appear to have been primarily in the form of guidance for designing and checking software as it is developed. Better understanding of the specific techniques (functional verification, box structure specificat