Culture Conflicts in Software Engineering Technology Transfer
| Marvin V. Zelkowitz* | Dolores R. Wallace | David W. Binkley |
|---|---|---|
| Department of Computer Science and | Information Technology Laboratory | Computer Science Department |
| Inst. for Advanced Computer Studies | Natl. Inst. of Standards and Technology | Loyola College |
| University of Maryland | Gaithersburg, Maryland 20899 | Baltimore, Maryland |
| College Park, Maryland 20742 | | and Information Technology Lab. |
| and Fraunhofer Center - Maryland | | Natl. Inst. of Standards and Technology |
| College Park, Maryland 20742 | | Gaithersburg, MD 20899 |
Abstract
Although the need to transition new technology to improve the process of developing quality software products is well understood, the computer software industry has done a poor job of carrying out that need. All too often new software technology is touted as the next "silver bullet" to be adopted, only to fail and disappear within a very short period. New technologies are often adopted without any convincing evidence that they will be effective, yet other technologies are ignored despite published data indicating that they will be useful. Clearly there is a clash between those developing new technologies and those responsible for developing quality products. In this paper we discuss a study conducted among a large group of computer software professionals in order to understand what techniques can be used to support the introduction of new technologies, and to understand the biases and opinions of those charged with researching, developing or implementing those new technologies. This study indicates which evaluation techniques are viewed as most successful under various conditions. We show that the research and industrial communities do indeed have different perspectives, which leads to a clash between the goals of the technology researchers and the needs of the technology users.
Keywords:
Experimentation, Survey, Technology transfer, Validation models
When the computer industry began several decades ago, software engineering was somewhat unique among engineering fields in that researchers and practitioners worked closely together in using and understanding this new technology. There was easy cross-fertilization between these two communities. Over time, this has changed with tremendous growth of computer applications, computer users, and computing professionals. Programming languages have evolved from low-level assembler languages to today's very high-level visual object-oriented languages. Simple programs have become complex large systems, with some systems running an entire enterprise. Methods for developing programs have grown from design-writing on napkins to a myriad of overlapping processes comprising varieties of methods and documentation types.
A response to this growth has been a corresponding growth in organizations dedicated to supplying an ever-increasing need for better tools and techniques for producing these complex products. Trade shows, research conferences, and trade magazines proliferate on the technology scene. New professional technical journals regularly come alive to add to an already large number; the IEEE alone, through its Computer Society, currently publishes 20 monthly or bimonthly computer technology publications.
In spite of an abundance of methods and tools and information about them, why do the same problems appear over and over again in new software developments? Why are development schedules not met? Why do some systems fail? Why do some technical problems remain unsolved? While new solutions are frequently proposed, many have not been transferred into the industry at large. Many problems remain untouched by researchers. Why does it appear that today researchers and practitioners are no longer necessarily understanding each other's needs and efforts?
Researchers have been looking at the role of experimentation in computer science research [Fenton94]. However, most of these studies have looked at the relatively narrow scope of how to conduct replicated scientific experiments within this domain. We have been looking at the larger problem of the role of experimentation as an agent in transferring new technology into industry. We have been studying various experimental methods, in addition to the replicated experiment, useful for validating newly developed software technology [Zelkowitz97] [Zelkowitz98], and we have also studied various evaluation methods industry uses before adopting a new technology. As we later explain, these two processes are very different. The questions important to us include "Which of these validation and evaluation methods are most effective?" "Why aren't these methods used more often?" and "Why don't these results provide evidence for the transference of a technology into industry?" To try to understand these questions, we decided to survey a cross section of computer professionals about their views on software engineering technology validation.
Researchers, whether in academia or industry, have a desire to develop new concepts and are rewarded when they produce new designs, algorithms, theorems, and models. The "work product" in this case is often a published paper demonstrating the value of their new technology. Development professionals, however, have a desire and are paid to produce a product using whatever technology seems appropriate for the problem at hand. The end result is a product that produces revenue for their employer.
Researchers select their research according to a topic of their own interest; the topic may or may not be directly related to a specific problem faced by industry. After achieving a result that they consider interesting, they have a great desire to get that result in print. Providing a good scientific validation of the technology is often not necessary for publication, and several studies have shown that experimental validation of computer technology is particularly weak, e.g., [Tichy95] [Zelkowitz98].
In industry, producing a product is most important and the "elegance" of the process used to produce that product is less important than achieving a quality product on time as a result. Being "state of the art" in industry often means doing things as well (or as poorly) as the competition, so there is considerable risk aversion to try a new technology unless the competition is also using it.
Consequently, researchers produce papers outlining the values of new technology, yet industry often ignores that advice. Assorted "silver bullets" are proposed as solutions to the "software crisis" without any good justification that they may be effective, are used for a time by large segments of the community, and then are discarded when they indeed turn out not to be the solution. Clearly the research community is not generating results that are in tune with what industry needs to hear, and industry is making decisions without the benefit of good scientific developments. The two communities are severely out of touch with one another. The purpose of our survey is to try to understand these communities and their differences.
We began our effort to understand the differences between the research and industrial communities by examining models of experimentation for computer technology research. We identified 12 methods of experimentation that have been used in the computer field [Table 1.1] and verified their usage by studying 612 papers appearing in three professional publications at 5-year intervals from 1985 through 1995 [Zelkowitz98]. About 20% of the papers contained no validation at all and another third contained only a weak, ineffective form of validation. The figure for other scientific fields was more like 10% - 15% [Zelkowitz97]. The methods are defined in Appendix 1.
Table 1.1 Experimental Validation Models
| Case study | Project monitoring |
| Dynamic analysis | Replicated |
| Field study | Simulation |
| Legacy data | Static analysis |
| Lessons learned | Synthetic |
| Literature search | Theoretical analysis |
Our results were consistent with those found by Tichy in his 1995 study of 400 research papers [Tichy95]. He found that over 50% of the design papers did not have any validation in them. In a more recent paper [Tichy98], Tichy makes a strong argument that more experimentation is needed and refutes several myths deprecating the value of experimentation.
Given the set of research validation methods, we then sought to determine the techniques actually used by industry in order to transition a new technology. We visited several large development corporations and interviewed reasonably high-level individuals, such as Chief Scientist, Chief Technology Officer, and managers of large divisions. All had ultimate responsibility for technology selection. They were primarily influenced by trade shows, weekly trade magazines, Web information, customer opinion (i.e., technologies that would win the contract), vendor opinion, friends in other companies, and infrequently by the papers in professional technical journals. Sometimes recommendations from technical staff would be based on their readings and would eventually reach the managers' offices. Once a technology was identified, the companies might perform a pilot study or be mentored by an expert in the technology to determine if the technology would be effective.
Based on these industrial interviews and some earlier work by Brown and Wallnau [Brown96], we defined a set of industrial transition models for technology evaluation. While the transition models include some that are similar to those of the researchers, many are different [Table 1.2]; Appendix 2 provides a short description of these models. For example, vendor opinion (e.g., trade shows, weekly trade magazines, Web information) seemed important to industry; Web information also provides access to research literature, so we needed to separate the medium in which information is located from the type of model that information supports. An important finding, though, is that everyone with whom we spoke claimed to use the Web to find technology information.
Table 1.2 Industrial Transition Models
| Case study | Research literature |
| Data mining | Shadow (replicated) project |
| Demonstrator projects | State of the art |
| Feature benchmark | Survey |
| Field study | Theoretical analysis |
| Measurement | Vendor opinion |
| Pilot study | |
Our interviews revealed that a company may use people-oriented methods for technology transfer. For example, a company may hire a well-recognized expert in that technology, perhaps its creator, to help integrate the method into company practices. They may specifically recruit people who have that skill on their resumes. Another practice appears to be training, either by hiring an expert to teach in-house or by sending personnel to universities or training companies.
In retrospect we would have entered these models in our survey, especially because the survey results discussed in Section 4 indicate that in two instances, two models could have been combined. Field study and survey both estimate the probable effects of some new technology. In the field study, several development groups may be observed over a short time period while in the survey several experts may discuss their opinions based on their expertise in the technology. They are rather closely aligned in time and people requirements and were perceived approximately the same. A pilot study involves a sample project, usually small, to study a new technique while demonstrator studies are less complete multiple instances of a pilot study.
Researchers principally use methods from Table 1.1 in order to demonstrate the value of their technological improvements, and industry selects new technology to employ by using the methods in Table 1.2. How do these communities interact? How can their methods support forward growth in computer technology and its application in real systems? We need to develop a better understanding of what each community understands and values. Then, perhaps, we can identify commonalities and gaps, and from there, mechanisms to enable each community to benefit better from the other.
To understand the different perceptions between those who develop technology and those who use technology, we decided to survey the software development community to learn how they view the effectiveness of the various evaluation models of Tables 1.1 and 1.2. For questions, we based our survey on a previous survey [Daly97], modified for our current purposes. Each survey participant was to rank the difficulty of each of our 12 experimental models (or 13 evaluation models) according to 7 criteria, criteria 1 and 2 being new and 3 through 7 being the same as the Daly criteria. We decided to try to obtain an objective score by having all values ranked between 1 and 20, with 10 being arbitrarily defined as the maximum difficulty that a given company would apply in practice, and 20 being defined as an impossible model for that criterion.
2.1 Survey questions
The 7 questions we chose to use were:
In addition, we asked each participant an eighth question: to rank the relative importance (again using the 1-20 ranking) of each of the 7 questions when making a decision on using a new technology. That is, which of the 7 questions was most important when a new technology was being evaluated?
We developed two different survey instruments from these 8 questions -- one ranking each of the 12 research validation methods of Table 1.1 (i.e., the research survey) and one ranking each of the 13 evaluation methods of Table 1.2 (i.e., the industrial survey).
For our 2 survey instruments we obtained three populations to sample. Sample 1 included U.S.-based authors with email addresses published in several recent software engineering conference proceedings. These were mostly research professionals, although the sample included a few developers. Approximately 150 invitations to participate were sent to these individuals, and 45 accepted. The survey was not sent until the participant agreed to fill out the form, which we estimated would take about an hour to 90 minutes to read and fill out. About half of the individuals returned the completed form.
Sample 2 included U.S.-based authors with email addresses from several recent industry-oriented conferences. About 150 invitations to participate were sent out and about 50 recipients responded favorably to our invitation. They were then sent the industrial survey. Again, about half completed and returned the form.
Sample 3 consisted of students in a graduate software engineering course at the University of Maryland taught by one of the authors of this paper. This sample was given the research survey. This course was part of a master's degree program in software engineering, and almost all of the students were working professionals with experience ranging up to 24 years. Not surprisingly, the return rate of the form for this sample was high at 96% (44 of 46).
It is important to realize that we wanted the subjective opinion of those surveyed on the value of the respective validation techniques based upon several criteria. Not everyone returning the survey had previously used all, or even any, of the listed methods. We simply wanted their views on how important they thought the methods were. However, by choosing our sample populations from those writing papers for conferences or taking courses for career advancement, we believe we have chosen sample populations that are more knowledgeable, in general, about validation methods than the average software development professional. The invitations were sent early in 1998, and data was collected February through early April, 1998. Table 2.1 summarizes the 3 sample populations.
Table 2.1 Characteristics of each survey sample
| Sample | Survey | Sample size | Years exper. | Academic position | Industrial R&D | Industrial developer | Other (e.g., consultants) |
|---|---|---|---|---|---|---|---|
| 1 (Research) | Research | 18 | 18.6 | 9 | 3 | 3 | 3 |
| 2 (Industry) | Industry | 25 | 19.1 | 0 | 5 | 8 | 12 |
| 3 (Students) | Research | 44 | 6.6 | 1 | 5 | 27 | 11 |
Our initial concern was to determine bias in the set of responses. Would certain individuals rank all techniques high or low compared to other individuals? In order to test for this, we computed the average raw scores for each technique for each question, and we also ranked each answer (i.e., computing the easiest technique for each question, second easiest, third easiest, ..., 12th easiest). Ranking would eliminate such bias, but would also eliminate the significance of the value 10 being the subjective value of "hard to do." Fortunately, we believe that we do not have to take this into account. Figure 1 shows the value for the question "Easy to do." The first column represents the average raw scores for the 12 methods of Table 1.1 from the research sample (sample 1) and the second column is the average ranked score. Low values indicate the more important techniques. The fact that the ordering of the techniques from best to worst was essentially the same indicates that the raw score is an accurate reflection of the ranking. Only the 3rd and 4th, 5th and 6th, and 9th and 10th techniques switched places, not a major change. Columns 3 and 4 represent similar data from the student sample (sample 3). Here only the third and fourth and eighth and ninth techniques switched places. However, there are some slight differences between sample 1 and sample 3, which will be discussed in Section 4.
Similar charts were obtained from the other questions. In addition, the correlation between the raw scores and the ranked scores was 0.86 for sample 1, 0.96 for sample 2, and 0.93 for sample 3. On this basis, we decided we could use the raw data and did not need to use only the ranked data for comparisons.
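The bias check described above can be illustrated with a minimal sketch. The response values are hypothetical, and the paper does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption. Each respondent's scores are converted to ranks, the raw scores and the ranks are averaged per method, and the two orderings are compared.

```python
from statistics import mean

def to_ranks(scores):
    """Rank one respondent's scores (1 = lowest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical answers to one question: one row per respondent,
# one column per method, each value on the survey's 1-20 scale.
responses = [
    [4, 9, 14, 6],
    [8, 12, 18, 9],   # a respondent who scores everything high
    [2, 5, 11, 6],    # a respondent who scores everything low
]

avg_raw = [mean(col) for col in zip(*responses)]
avg_rank = [mean(col) for col in zip(*[to_ranks(r) for r in responses])]

# Method indices ordered from easiest to hardest under each measure.
order_by_raw = sorted(range(len(avg_raw)), key=avg_raw.__getitem__)
order_by_rank = sorted(range(len(avg_rank)), key=avg_rank.__getitem__)
r = pearson(avg_raw, avg_rank)

print(order_by_raw, order_by_rank, round(r, 2))
```

When the two orderings agree and the correlation is high, as in this made-up data, the raw averages can stand in for the ranked ones.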
Figure 1. Easy to do. Average value for each of 12 validation methods.

The average value for each technique for each of the 7 criteria appears in Figures 2 through 4. Figure 2 represents the average score for each of the 12 experimental methods over all 7 criteria for sample 1, with alpha = .05 confidence interval bars surrounding each average value. The "7" in each criterion represents the midpoint among the methods in order to make it easier to read the figure. Of greatest interest are bars that do not overlap, meaning there is a 95% probability that the average values for those techniques indicate a significant difference. Figure 3 represents a similar graph for sample 2 (the industrial group ranking 13 techniques) and Figure 4 represents a similar graph for sample 3 (the student industrial sample).
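As an illustration of the non-overlapping-bars test, the following sketch computes a confidence interval around each method's average score and checks whether two intervals overlap. The scores are hypothetical, and the paper does not say how its intervals were computed; the normal approximation used here is an assumption (for samples as small as these a t-based interval would be wider).

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(scores, alpha=0.05):
    """Two-sided (1 - alpha) interval for the mean, using a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m = mean(scores)
    half_width = z * stdev(scores) / len(scores) ** 0.5
    return (m - half_width, m + half_width)

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical "easy to do" scores for two methods (1-20 scale).
replicated = [13, 14, 12, 15, 13, 14]
legacy_data = [5, 6, 4, 7, 5, 6]

ci_hard = confidence_interval(replicated)
ci_easy = confidence_interval(legacy_data)

# Non-overlapping bars are the cases the text singles out as significant.
print(ci_hard, ci_easy, intervals_overlap(ci_hard, ci_easy))
```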

Figure 2. Sample 1 (research group) results.

Figure 3. Sample 2 (industry group) results.

Figure 4. Sample 3 (student industrial group) results.
One way to simplify the data from these figures is to split the methods for each criterion into three partitions: practical, neutral, and impractical. The following procedure was applied:
Tables 3.1 through 3.3 summarize this process giving the practical and impractical techniques. All other methods are in the neutral partition.
Table 3.1 Practical and impractical techniques from research sample |
|||||||
| Easy | Addit. $ | Int. val. | Ext. val. | Ease of repl. | Theory gen. | Theory conf. | |
| Practical | Dyn. anal | Legacy data | Dyn. anal. | Dyn. anal. | Replicated | ||
| Les. learned | Proj. mon. | Replication | Simulation | ||||
| Legacy data | Static anal. | Static anal. | |||||
| Static anal. | |||||||
| Impractical | Replicated | Replicated | Case study | Case study | Legacy data | ||
| Synthetic | Field study | ||||||
| Les. learned | |||||||
Table 3.2 Practical and impractical techniques from industry sample |
|||||||
| Easy | Addit. $ | Int. val. | Ext. val. | Ease repl. | Theory gen. | Theory conf. | |
| Practical | Case study | Res. Lit | Measure | Field study | Measure | Data mining | Field study |
| Pilot study | Survey | Measure | Res. Lit. | Measure | Measure | ||
| Survey | Vendor opin. | Theory anal. | |||||
| Vendor opin. | |||||||
| Impractical | Replicated | Replicated | State of art | State of art | Vendor opin. | State of art | |
| Vendor opin | Vendor opin | Vendor opin | |||||
Table 3.3 Practical and impractical techniques from student industrial sample |
|||||||
| Easy | Addit. $ | Int. val. | Ext. val. | Ease repl. | Theory gen. | Theory conf. | |
| Practical | Case study | Case study | Case study | Case study | Case study | Case study | Field study |
| Legacy data | Legacy data | Dyn. Anal. | Legacy data | Field study | |||
| Proj. mon. | Proj. mon. | Simulation | Theory anal. | ||||
| Lit. search | |||||||
| Impractical | Replicated | Replication | Proj. mon. | Synthetic | Proj. mon. | Proj. mon. | |
| Synthetic | Synthetic | Theory anal. | Theory anal. | ||||
| Theory anal. | Theory anal. | ||||||
Our final 8th question was to rate the importance of each of the 7 questions when making a decision on using a new technology. The purpose was to determine which of the criteria was most important when making such a decision. Figure 5 summarizes those answers on a single chart, the column labeled 1 representing the average values for the first sample, column 2 representing the average value for sample 2 and column 3 being sample 3.
Figures 2 and 4 and Tables 3.1 and 3.3 present a summary of our findings for the research validation methods. We summarize some of the observations from those figures.
In terms of easiness (question 1), replicated experiments and synthetic experiments for the research sample and replicated experiments, synthetic experiments and theoretical analysis for the student industrial sample were viewed as significantly (at the .05 level) harder to do than the other techniques and as impractical according to Tables 3.1 and 3.3. With average scores above 10, the consensus of these groups was that industry would never use such techniques as part of a validation strategy. It is no wonder that such techniques are rarely reported in the literature. In our earlier survey [Zelkowitz98] only 3.2% of the reported studies used synthetic or replicated experiments.
On the other hand, these two groups differed in their belief in the effectiveness of theoretical analysis with respect to internal and external validity (questions 3 and 4). Whereas the research group considered a theoretical validation likely to be used as much as any other technique (i.e., in the neutral partition of Table 3.1), the industrial group considered it most difficult to use, preferring instead the "hands on" techniques over the more formal arguments.
Other than the cost and ease issues, none of the other criteria exhibited significant differences among the respondents. However, when we combine the criteria into a single composite number, differences do become apparent (see Section 4.4).
4.2 Preferred industrial methods
Figure 3 and Table 3.2 give the basic results for the industrial transition methods. As with the research population, the replicated (shadow) project had an average rating (over all 7 questions) of over 10, signifying little industrial interest in performing such studies. Vendor opinion also averaged above 10, as did the need to be state of the art.
These high scores were all probably due to different reasons. Replicated experiments were viewed as hardest to do (highest score among all techniques at about 13.5), while vendor opinion had the worst internal and external validity (the ability for the method to explain the phenomenon under study, i.e., trusting the vendor to give the correct explanation). On the other hand, the need to be state of the art also suffered with respect to internal and external validity.
It is interesting to note that according to Table 3.2, vendor opinion was considered practical according to ease of use (criterion 1), yet was impractical according to the criteria that dealt with accuracy of the evaluation (questions 3, 4, 6 and 7).
Theoretical analysis was harder to do than any other technique except the replicated project.

Figure 5. Relative importance of each criterion.
4.3 Culture differences
By comparing results across different samples, we gain an appreciation of the differing values in the software engineering community. Although sample 2 evaluated the industrial methods according to our 7 criteria and sample 3 evaluated the research methods for the same criteria, both were made up mostly of professional developers. Question 8, the importance of each criterion, reveals strong agreement between these two populations, and strong disagreements with the research professionals from sample 1.
Figure 5 summarizes this result. Both samples 2 and 3 viewed easy to do, internal validity (that the validation confirmed the effectiveness of the technique) and the ease of replicating the experiment as the most important criteria in choosing a new method. While internal validity was important, external validity was of less crucial concern. That can be interpreted as the self-interest of industry in choosing methods applicable to its own environment and of less concern if it also aided a competitor.
On the other hand, for the research community of sample 1, internal and external validity, the ability of the validation to demonstrate effectiveness of the technique in the experimental sample and also to be able to generalize to other samples, were the primary criteria. Confirming a theory was next, obviously influenced by the research community's orientation in developing new theoretical foundations for technology. At the other end of the scale, cost was of less concern: ease of replication was only fifth most important, and the cost of adding additional subjects was rated last.
This points out some of the problems we addressed at the beginning of this paper. The research community is more concerned with theory confirmation and validity of the experiment and less concerned about costs, whereas the industrial community is more concerned about costs and applicability in their own environment and less concerned about general scientific results which can aid the community at large.
4.4 Composite measures
Given the set of 7 criteria, can we generate any composite measure for evaluating the effectiveness of the various validation methods? Since we have the respondents' impressions of the importance of each of the 7 criteria (via Figure 5), one obvious composite measure is the weighted sum of all the criteria evaluations. In this case, a low score would identify the most significant methods. Table 4.1 gives these results.
Table 4.1 Composite measures
| Sample 1 ordering (Research group) | Score | Sample 3 ordering (Student group) | Score | Sample 2 ordering (Industry group) | Score |
|---|---|---|---|---|---|
| Simulation | 288 | Case study | 284 | Measurement | 258 |
| Static analysis | 292 | Legacy data | 314 | Data mining | 305 |
| Dynamic analysis | 298 | Field study | 315 | Theoretical analysis | 324 |
| Project monitoring | 301 | Simulation | 333 | Research literature | 325 |
| Lessons learned | 339 | Dynamic analysis | 355 | Case study | 326 |
| Legacy data | 345 | Static analysis | 361 | Field study | 327 |
| Synthetic study | 346 | Literature search | 370 | Pilot study | 329 |
| Theoretical analysis | 348 | Replicated experiment | 387 | Feature benchmark | 338 |
| Field study | 363 | Project monitoring | 388 | Survey | 343 |
| Literature search | 367 | Lessons learned | 391 | Demonstrator project | 345 |
| Replicated experiment | 368 | Theoretical analysis | 405 | Replicated project | 361 |
| Case study | 398 | Synthetic study | 418 | State of the art | 407 |
| | | | | Vendor opinion | 469 |
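A weighted-sum composite of this kind can be sketched as follows. All numbers and the criterion names are hypothetical, and the paper does not say how the importance ratings from question 8 were turned into weights; this sketch assumes the average importance rating is used directly as the weight for each criterion.

```python
# Hypothetical average importance rating per criterion (from question 8).
importance = {
    "easy": 5,
    "additional_cost": 9,
    "internal_validity": 4,
    "external_validity": 6,
    "ease_of_replication": 5,
    "theory_generation": 8,
    "theory_confirmation": 7,
}

# Hypothetical average score of each method on each criterion (1-20 scale).
method_scores = {
    "measurement": {
        "easy": 7, "additional_cost": 6, "internal_validity": 5,
        "external_validity": 6, "ease_of_replication": 5,
        "theory_generation": 6, "theory_confirmation": 6,
    },
    "vendor opinion": {
        "easy": 4, "additional_cost": 5, "internal_validity": 14,
        "external_validity": 13, "ease_of_replication": 10,
        "theory_generation": 12, "theory_confirmation": 13,
    },
}

def composite_score(scores, weights):
    """Weighted sum of one method's criterion scores."""
    return sum(weights[criterion] * value for criterion, value in scores.items())

composite = {
    name: composite_score(scores, importance)
    for name, scores in method_scores.items()
}

# Lower composite scores come first, matching the ordering used in Table 4.1.
ranking = sorted(composite, key=composite.get)
print(composite, ranking)
```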
Table 4.1 reveals some interesting observations:
5. Conclusions
In this paper we discuss a survey taken from approximately 90 software engineering professionals. The survey evaluated subjective opinions on the value of validation methods for transferring new technology into industry. The idea was to study those methods used by the research community to validate new technologies and those methods used by industry to evaluate a new technology and to try to understand the differences. From this survey, we can make the observation that the research community and the development community do indeed have different perceptions of the role of experimentation in validating new technology. Researchers are more interested in how well a theory has been validated, whereas industry is more attuned, as expected, to how well the technique works in their own environment. Costs, while important to the industry sample, are mostly ignored by the research community.
Publication of research results is a major focus of the research community. In this respect, journal editors can play an important role in affecting this cultural difference. Developing new technologies and getting them into use should be a major focus of software engineering research. Editors of journals should consider requiring more real-world validation using models like case studies, legacy data and field studies, and be more skeptical of validation via laboratory models, such as simulation and synthetic studies.
The survey also indicates that one should not be state of the art simply to be "fashionable" or listen to vendors for technology transfer decisions. Such decisions should depend on more technological reasons. Yet such actions are taken daily.
Measurement became the most important industrial decision-making process in our composite analysis, yet anecdotal evidence indicates that much of industry does not collect the necessary data to build measurement programs. For the most part, our earlier survey [Zelkowitz98], the composite scores, and the results in Tables 3.1 to 3.3 are compatible. In the earlier survey, papers studied from 1995 used case study and lessons learned equally, followed by simulation at half that number. In Table 3.3, the student population considering the research techniques ranked case study as practical in six of the seven questions. The industrial group (Table 3.2) selected either measurement or case study as practical for six of the seven questions, but the researchers find case study either impractical or neutral. Case study requires collection of data and measurement. It appears that the industry population values these measurement techniques as important but, because cost is a significant driver to industry, perceives them as too expensive. Better methods and tools for aiding measurement techniques are required to address industry concerns and to make the techniques more acceptable to researchers.
Given that industry is most concerned with internal validity, better tools are needed to aid the research community so that external validity can be conveyed more effectively to the industrial community. This would limit the effects of the "silver bullet" solution to complex problems. Studies are needed to identify:
Some of the results obtained here may be viewed as obvious, but we believe that these opinions have not been quantified previously. The industrial and the research community do look at method validation for different purposes, so it is not too surprising that one does not share the beliefs of the other. This leads to conflicts when one group does not provide or use the results of the other.
Given the set of techniques described here, it would aid both communities if those techniques near the top of the rankings had better tool support. Measurement is clearly important to the industrial professional, so less expensive data collection methods are needed. Tools for collecting defect data or analyzing defect and resource data are needed. Tools to better evaluate case studies would help. How to deal with the high cost and poor perception of the replicated experiment needs to be further studied.
In this paper, as with our earlier survey of the research literature, we have tried to understand the process that organizations use to evaluate new technologies and transition them into industrial use. We haven't solved the significant technology transition problems with this survey, but we do believe we have indicated where further research is needed and why some of the current problems in technology transition exist. We need to further understand both cultures in order to determine which technique can best enable industry to make intelligent choices on which new technology to use, and we emphasize the need for research to develop the methods and tools to make these techniques practical.
Acknowledgments
We thank Dr. Nien Zhang for his suggestions regarding statistical methods for viewing this data.
References
[Brown96] Brown, A. W., and K. C. Wallnau, A framework for evaluating software technology, IEEE Software, September 1996, 39-49.
[Daly97] Daly, J., K. El Emam, and J. Miller, Multi-method research in software engineering, 1997 IEEE Workshop on Empirical Studies of Software Maintenance (WESS 97), Bari, Italy, October 3, 1997.
[Fenton94] Fenton, N., S. L. Pfleeger, and R. L. Glass, Science and substance: A challenge to software engineers, IEEE Software, Vol. 11, No. 4, 1994, 86-95.
[Tichy95] Tichy, W. F., P. Lukowicz, L. Prechelt, and E. A. Heinz, Experimental evaluation in computer science: A quantitative study, J. of Systems and Software, Vol. 28, No. 1, 1995, 9-18.
[Tichy98] Tichy, W., Should computer scientists experiment more?, Computer, Vol. 31, No. 5, 1998, 32-40.
[Zelkowitz97] Zelkowitz, M., and D. Wallace, Experimental validation in software engineering, Information and Software Technology, Vol. 39, 1997, 735-743.
[Zelkowitz98] Zelkowitz, M., and D. Wallace, Experimental models for validating technology, Computer, Vol. 31, No. 5, 1998, 23-31.
APPENDIX 1 -- Types of Research Validation
APPENDIX 2 -- Types of Industrial Evaluation