Foundational Models for 21st Century
Program Evaluation

by

Daniel L. Stufflebeam
The Evaluation Center Western Michigan University

The Evaluation Center
Occasional Papers Series

December 1, 1999

Foundational Models for 21st Century
Program Evaluation [1, 2]

The move to a new millennium is an opportune time for evaluators to critically appraise their program evaluation approaches and decide which ones are most worthy of continued application and further development. It is equally important to decide which approaches are best abandoned. In this spirit, this paper identifies and assesses 22 approaches often employed to evaluate programs. These approaches, in varying degrees, are unique and comprise most program evaluation efforts. Two of the approaches, reflecting the political realities of evaluation, are often used illegitimately to falsely characterize a program’s value and are labeled pseudoevaluations. The remaining 20 approaches are typically used legitimately to judge programs and are divided into questions/methods-oriented approaches, improvement/accountability approaches, and social agenda/advocacy approaches. The best program evaluation approaches appear to be Outcomes Monitoring/Value-Added Assessment, Case Study, Decision/Accountability, Consumer-Oriented, Client-Centered, Constructivist, and Utilization-Based, with the new Democratic Deliberative approach showing promise. The worst bets seem to be Politically Controlled, Public Relations, Accountability (especially payment by results), Clarification Hearings, and Program Theory-Based. The rest fall somewhere in the middle. All legitimate approaches are enhanced when keyed to and assessed against professional standards for evaluations.

1. This paper was prepared for The Evaluation Center’s Occasional Papers Series. It is based on a presentation in the State of the Evaluation Art and Future Directions in Educational Program Evaluation Invited Symposium at the annual meeting of the American Educational Research Association; Montreal, Quebec, Canada; April 20, 1999.

2. Appreciation is extended to colleagues who critiqued prior drafts of this paper, especially Sharon Barbour, Jerry Horn, Tom Kellaghan, Gary Miron, Craig Russon, James Sanders, Sally Veeder, Bill Wiersma, and Lori Wingate. While their valuable assistance is acknowledged, the author is responsible for the paper’s contents and especially any flaws.


Table of Contents

I INTRODUCTION

Overview of the Paper
Evaluation Models and Approaches
The Nature of Program Evaluation
Need to Study Alternative Approaches
Classifications Of Alternative Evaluation Approaches
Program Evaluation Defined
Pseudoevaluations
Questions/Methods-Oriented Approaches
Improvement/Accountability-Oriented Evaluations
Social Agenda-Directed (Advocacy) Models
Caveats

II PSEUDOEVALUATIONS

Approach 1: Public Relations-Inspired Studies
Approach 2: Politically Controlled Studies

III QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES

Approach 3: Objectives-Based Studies
Approach 4: Accountability, Particularly Payment By Results Studies
Approach 5: Objective Testing Programs
Approach 6: Outcomes Monitoring/Value-Added Assessment
Approach 7: Performance Testing
Approach 8: Experimental Studies
Approach 9: Management Information Systems
Approach 10: Benefit-Cost Analysis Approach
Approach 11: Clarification Hearing
Approach 12: Case Study Evaluations
Approach 13: Criticism and Connoisseurship
Approach 14: Program Theory-Based Evaluation
Approach 15: Mixed Methods Studies

IV IMPROVEMENT/ACCOUNTABILITY-ORIENTED EVALUATION APPROACHES

Approach 16: Decision/Accountability-Oriented Studies
Approach 17: Consumer-Oriented Studies
Approach 18: Accreditation/Certification Approach

V SOCIAL AGENDA-DIRECTED (ADVOCACY) APPROACHES

Approach 19: Client-Centered Studies (or Responsive Evaluation)
Approach 20: Constructivist Evaluation
Approach 21: Deliberative Democratic Evaluation
Approach 22: Utilization-Focused Evaluation

VI Best Approaches for 21st Century Evaluations

Table 19: RATINGS Strongest Program Evaluation Approaches

Conclusions

Recommendations

Notes

Bibliography

Appendix

Checklist for Rating Evaluation Approaches in Relationship to The Joint Committee Program Evaluation Standards


The Occasional Paper Series is published by The Evaluation Center on the campus of Western Michigan University. Its purpose is to advance the theory and practice of evaluation by reporting on new developments in the profession. Authors who contribute to the series retain copyright of their work. This allows them to publish early drafts of a paper, obtain feedback from readers, make necessary modifications, and go on to publish in other venues.

In this volume of the Series, published on the eve of a new millennium, Daniel Stufflebeam reviews the evaluation models that have emerged and identifies the models that offer the greatest prospects for future success. Few in the profession are better able to do this than Stufflebeam. During his career, which spans nearly four decades, he has developed nearly 100 standardized tests, authored the CIPP evaluation model, served as the first Chair of the Joint Committee on Standards for Educational Evaluation, and pioneered the concept of metaevaluation.

The reader is invited to join the ranks of authors who have published in the Occasional Paper Series including Donald Campbell, Gene Glass, Arnold Love, James Sanders, Michael Scriven, Lori Shephard, Robert Stake, and Daniel Stufflebeam. Manuscripts should be 50-100 pages in length and significant to the field of evaluation. All submissions are reviewed for acceptability by the editorial team made up of the staff of The Evaluation Center.

Craig Russon, Ph.D.
Editor,
The Occasional Paper Series


I. INTRODUCTION

Overview of the Paper

Evaluators today have at their disposal many more evaluation approaches than in 1960. As evaluators prepare to surmount the Y2K challenges and cross into the next century, it is an opportune time to consider what 20th century evaluation developments are best to take along and which ones would best be left behind. I have, in this paper, attempted to sort 22 alternative evaluation approaches into what fishermen sometimes call the “keepers” and the “throwbacks.” More importantly, I have attempted to characterize each approach; identify its strengths and weaknesses; and consider whether, when, and how each approach is best applied. The reviewed approaches emerged mainly in the U.S. between 1960 and 1999.

Following a period of relative inactivity in the 1950s, a succession of international and national forces stimulated the development of evaluation theory and practice. Main influences were the efforts to vastly strengthen the U.S. defense system spawned by the Soviet Union’s 1957 launching of Sputnik I; the new U.S. laws in the 1960s to equitably serve persons with disabilities and minorities; the federal evaluation requirements of the Great Society programs initiated in 1965; the U.S. movement begun in the 1970s to hold educational and social organizations accountable for both prudent use of resources and achievement of objectives; the stress on excellence in the 1980s as a means of increasing U.S. international competitiveness; and the trend in the 1990s for various organizations, both inside and outside the U.S., to employ evaluation to assure quality, competitiveness, and equity in delivering services. Education has consistently been at the heart of societal reforms in the U.S., and the U.S. society has repeatedly pressed educators to show through evaluation whether or not improvement efforts were succeeding.

The development of program evaluation as a field of professional practice was also spurred by a number of seminal writings. These included, in chronological order, publications by Tyler (1942, 1950), Campbell and Stanley (1963), Cronbach (1963), Stufflebeam (1966), Tyler (1966), Scriven (1967), Stake (1967), Stufflebeam (1967), Suchman (1967), Alkin (1969), Guba (1969), Provus (1969), Stufflebeam et al. (1971), Parlett and Hamilton (1972), Eisner (1975), Glass (1975), Cronbach and Associates (1980), House (1980), and Patton (1980). These and other authors/scholars began to project alternative approaches to program evaluation. In the ensuing years a rich literature on a wide variety of alternative program evaluation approaches developed [see, for example, Cronbach (1982); Guba and Lincoln (1981, 1989); Nave, Misch, and Mosteller (1999); Nevo (1993); Patton (1982, 1990, 1994, 1997); Rossi and Freeman (1993); Schwandt (1984); Scriven (1991, 1993, 1994a, 1994b, 1994c); Shadish, Cook, and Leviton (1991); Smith, M. F. (1989); Smith, N. L. (1987); Stake (1975b, 1988, 1995); Stufflebeam (1997); Stufflebeam and Shinkfield (1985); Wholey, Hatry, and Newcomer (1995); Worthen and Sanders (1987, 1997)].

Evaluation Models and Approaches

The chapter uses the term evaluation approach rather than evaluation model because, for one reason, the former is broad enough to cover illicit as well as laudatory practices. Also, beyond covering both creditable and noncreditable approaches, some authors of evaluation approaches say that the term model is too demanding to cover their published ideas about how to conduct program evaluations. But for these two considerations, the term model would have been used to encompass most of the evaluation proposals discussed in this chapter. This is so because most of the presented approaches are idealized or “model” views for conducting program evaluations according to the beliefs and experiences of their authors.

The Nature of Program Evaluation

The chapter employs a broad view of program evaluation. It encompasses evaluations of any coordinated set of activities directed at achieving goals. Examples are assessments of ongoing, cyclical curricular programs; time-bounded projects; and regional or state systems of services. Such program evaluations both overlap and yet are distinguishable from other forms of evaluation, especially student evaluation, teacher evaluation, materials evaluation, and school evaluation. The program evaluation approaches that are considered cut across a wide array of programs and services, e.g., curriculum innovations, school health services, counseling, adult education, preschool, state systems of education, school-to-work projects, adult literacy, and parent involvement in schools. Clearly, program evaluation applies importantly to a broad array of activities.

Need to Study Alternative Approaches

The study of alternative evaluation approaches is vital for the professionalization of program evaluation and for its scientific advancement and operation. Professionally, careful study of the approaches being employed in the name of program evaluation can help evaluators legitimize approaches that comport with sound principles of evaluation and discredit those that don’t. Scientifically, such a review can help evaluation researchers identify, examine, and address conceptual and technical issues pertaining to the development of the evaluation discipline. Operationally, a critical view of alternatives can help evaluators consider and assess optional frameworks for planning and conducting particular studies. On this point, the author has found that different approaches may work differentially well, depending on the evaluation’s context. Often it is advantageous to borrow strengths of different approaches to create a “best fit” approach for specific evaluation projects. Thus, it behooves evaluators to develop a repertoire of different legitimate approaches they can use, plus the ability to discern which approaches work best under what circumstances. However, a main value in studying alternative program evaluation approaches is not to enshrine any of them. On the contrary, the purposes are to discover their strengths and weaknesses, decide which ones merit substantial use, determine when and how they are best applied, and obtain direction for improving these approaches and devising better alternatives.

Classifications Of Alternative Evaluation Approaches

In analyzing the 22 alternative evaluation approaches, prior assessments regarding program evaluation’s state of the art were consulted. Stake’s analysis of nine program evaluation approaches provided a useful application of advance organizers (the types of variables used to determine information requirements) for ascertaining different types of program evaluations [1]. Hastings’ review of the growth of evaluation theory and practice helped to place the evaluation field in a historical perspective [2]. Guba’s presentation and assessment of six major philosophies in evaluation was provocative [3]. House’s (1983) analysis of different approaches illuminated important philosophical and theoretical distinctions. Finally, Scriven’s (1991, 1994a) writings on the transdiscipline of evaluation helped to sort out different evaluation approaches; they were also invaluable in seeing program evaluation approaches in the broader context of evaluations focused on various objects other than programs. Although the paper does not always agree with the conclusions put forward in these publications, all of the prior assessments helped sharpen the issues addressed.

Program Evaluation Defined

In characterizing and assessing different evaluation approaches, careful consideration was given to the various kinds of activities conducted in the name of program evaluation. These activities were classified based on their degree of conformity to a particular definition of evaluation. This chapter defines evaluation as a study designed and conducted to assist some audience to assess an object’s merit and worth. This definition should be widely acceptable because it agrees with common dictionary definitions of evaluation; also, it is consistent with the definition of evaluation that underlies published sets of professional standards for evaluations (Joint Committee 1981, 1994). However, it will become apparent that many studies done in the name of program evaluation either do not conform to this definition or directly oppose it.

The above definition of an evaluation study was used to classify program evaluation approaches into four categories. The first category includes approaches that promote invalid or incomplete findings (referred to as pseudoevaluations), while the other three include approaches that agree, more or less, with the employed definition of evaluation (i.e., Questions/Methods-Oriented, Improvement/Accountability, and Social Agenda/Advocacy).

Pseudoevaluations

This paper’s first group of program evaluation approaches includes what I have termed pseudoevaluations. These promote a positive or negative view of a program, irrespective of its actual merit and worth. Such studies often are motivated by political objectives, e.g., persons holding or seeking authority may present unwarranted claims about their achievements and/or the faults of their opponents or hide potentially damaging information. These objectionable approaches are presented because they deceive through evaluation and can be used by those in power to mislead constituents or to gain and maintain an unfair advantage over others, especially those persons with little power. If evaluators acquiesce to and support pseudoevaluations, they help promote and support injustice, mislead decision making, lower confidence in evaluation services, and discredit the evaluation profession. Thus, the paper discusses pseudoevaluations in order to sensitize professional evaluators and their clients to the prevalence of and harm caused by such inappropriate studies and to convince them to oppose such invalid evaluation practices.

Questions/Methods-Oriented Approaches

The second category of approaches includes studies that are oriented to (1) address specified questions whose answers may or may not be sufficient to assess a program’s merit and worth and/or (2) use some preferred method(s). These Questions/Methods-Oriented Approaches include studies that employ as their starting points operational objectives, standardized measurement devices, cost analysis procedures, expert judgment, a theory or model of a program, case study procedures, management information systems, designs for controlled experiments, and/or a commitment to employ a mixture of qualitative and quantitative methods. Most of them emphasize technical quality and posit that it is usually better to answer a few pointed questions well than to attempt a broad assessment of something’s merit and worth. Since these approaches tend to concentrate on methodological adequacy in answering given questions rather than determining a program’s value, the set of these approaches may be referred to as quasi-evaluation approaches. While they are typically labeled as evaluations, they may or may not meet the requirements of a sound evaluation.

Improvement/Accountability-Oriented Evaluations

The third set of approaches involves studies designed primarily to assess and/or improve a program’s merit and worth. These are labeled Improvement/Accountability-Oriented Evaluations. They are expansive and seek comprehensiveness in considering the full range of questions and criteria needed to assess a program’s value. Often they employ the assessed needs of a program’s stakeholders as the foundational criteria for assessing the program’s merit and worth. They seek to examine the full range of appropriate technical and economic criteria for judging program plans and operations. They also look for all relevant outcomes, not just those keyed to program objectives. Such studies sometimes are overly ambitious in trying to provide broad-based assessments leading to definitive, documented, and unimpeachable judgments of merit and worth. Typically, they must use multiple qualitative and quantitative assessment methods to provide cross-checks on findings. In general, these approaches conform closely to this paper’s definition of evaluation.

Social Agenda-Directed (Advocacy) Models

The fourth category of approaches is labeled Social Agenda-Directed (Advocacy) Models. The approaches in this group are quite heavily oriented to employing the perspectives of stakeholders as well as experts in characterizing, investigating, and judging programs. Mainly, they eschew the possibility of finding right or best answers and reflect the philosophy of postmodernism, with its attendant stress on cultural pluralism, moral relativity, and multiple realities. Typically, these evaluation approaches favor a constructivist orientation and the use of qualitative methods. They emphasize the importance of democratically engaging stakeholders in obtaining and interpreting findings. They also stress serving the interests of underprivileged groups. The worry about these approaches is that they might concentrate so heavily on serving a social mission that they fail to meet the standards of a sound evaluation. For example, if an evaluator is so intent on serving the underprivileged, empowering the disenfranchised, and/or righting educational and/or social injustices, he or she might compromise the independent, impartial perspective needed to produce valid findings. In the extreme, an advocacy evaluation could compromise the integrity of the evaluation process in order to achieve social objectives and thus devolve into a pseudoevaluation. The particular social agenda/advocacy approaches presented in this paper seem to have the safeguards needed to walk the fine line between sound evaluation services and politically corrupted evaluations. Worries about bias control in these approaches increase the importance of subjecting advocacy evaluations to metaevaluations grounded in standards for sound evaluations.

Of the 22 program evaluation approaches discussed, 2 are classified as pseudoevaluations, 13 as questions/methods-oriented approaches, 3 as improvement/accountability-oriented approaches, and 4 as social agenda/advocacy-directed approaches. The analysis of the 20 legitimate approaches is preceded by a discussion of the 2 approaches that often are used to distort findings and conclusions. The latter group is considered because evaluators and clients should be alert to and reject approaches that often masquerade as sound evaluations but in reality lack truthfulness and integrity.

Each approach is analyzed in terms of ten descriptors: (1) advance organizers, that is, the main cues that evaluators use to set up a study; (2) main purpose(s) served; (3) sources of questions addressed; (4) questions that are characteristic of each study type; (5) methods typically employed; (6) persons who pioneered in conceptualizing each study type; (7) other persons who have extended development and use of each study type; (8) key considerations in determining when to use each approach; (9) strengths of the approach; and (10) weaknesses of the approach. Using these descriptors, comments on each of the 22 program evaluation approaches are presented. These assessments are then used to reach conclusions about which approaches should be avoided, which are most meritorious, and under what circumstances the worthy approaches are best applied.

Caveats

I acknowledge, without apology, that the assessments of approaches and the entries in the charts throughout the paper are mainly my best judgments. I have taken no poll, and no definitive research exists to represent a consensus on the characteristics and strengths and weaknesses of the different approaches.

My analyses reflect 35 years of experience in applying and studying different evaluation approaches. Hopefully, as parochial as these might be, they will be useful to evaluators and evaluation students at least in the form of working hypotheses to be tested.

Also, I have mainly looked at the approaches as relatively discrete ways to conduct evaluations. In reality, there are many occasions when it is functional to mix and match different approaches. A careful analysis of such combinatorial applications no doubt would produce several hybrid approaches for analysis. Unfortunately, that step is beyond the scope of what I have attempted here.

II. PSEUDOEVALUATIONS

Because this paper is focused on describing and assessing the state of the art in evaluation, it is necessary to discuss bad and questionable practices, as well as the best efforts. Evaluations can be viewed as threatening or approached in opportunistic ways. In such cases, evaluators and their clients are sometimes tempted to shade, selectively release, or even falsify findings. While such efforts may look like sound evaluations, they are judged in this analysis to be pseudoevaluations if they do not forthrightly attempt to produce and report to all right-to-know audiences valid assessments of merit and worth. The first type of pseudoevaluation considered, the Public Relations approach, may meet the standard for addressing all right-to-know audiences but fails as a legitimate evaluation approach, because typically it presents a program’s strengths (or an exaggerated view of them) but not its weaknesses. The second pseudoevaluation approach, Politically Controlled evaluation, may be quite strong in obtaining valid information but fail as a sound evaluation by either withholding information from right-to-know audiences or releasing only those parts that are advantageous to the client.

Approach 1: Public Relations-Inspired Studies

The public relations approach begins with an intention to use data to convince constituents that a program is sound and effective. Other names for the approach are “ideological marketing” (see Ferguson, June 1999), advertising, and infomercial.

The advance organizer is the propagandist’s information needs. The study’s purpose is to help the program director/public relations official project a convincing, positive public image for a program, project, process, organization, leadership, etc. The guiding questions are derived from the public relations specialists’ and administrators’ conceptions of which questions would be most popular with their constituents. In general, the public relations study seeks information that would most help an organization confirm its claims of excellence and secure public support. From the start, this type of study seeks not a valid assessment of merit and worth but information needed to help the program “put its best foot forward.” Such studies avoid gathering or releasing negative findings.

Typical methods used in public relations studies are biased surveys, inappropriate use of norms tables, biased selection of testimonials and anecdotes, “massaging” of obtained information, selective release of only the positive findings, cover-up of embarrassing incidents, and the use of “expert,” advocate consultants. In contrast to the “critical friends” employed in Australian evaluations, public relations studies use “friendly critics.” A pervasive characteristic of the public relations evaluator’s use of dubious methods is a biased attempt to nurture a good picture for the program being evaluated. The fatal flaw of built-in bias to report only good things offsets any virtues of this approach. If an organization substitutes biased reporting of only positive findings for balanced evaluations of strengths and weaknesses, it soon will demoralize evaluators who are trying to conduct and report valid evaluations and may discredit its overall practice of evaluation.

By disseminating only positive information on a program’s performance while withholding information on shortcomings and problems, evaluators and clients may mislead the taxpayers, constituents, and other stakeholders concerning the program’s true value. The possibility of such positive bias in advocacy evaluations underlies the longstanding policy of Consumers Union not to include advertising by the owners of the products and services being evaluated in its Consumer Reports magazine. In order to maintain credibility with consumers, Consumers Union has steadfastly maintained an independent perspective and a commitment to identify and report both strengths and weaknesses in the items evaluated and not to supplement this information with biased ads.

A contact with an urban school district illustrates the public relations type of study. A superintendent requested a community survey for his district. The superintendent said, straightforwardly, that he wanted a survey that would yield a positive report on the district’s performance and his leadership. He said such a positive report was desperately needed at the time so that the community would restore confidence in the school district and him. The superintendent did not get the survey and positive report, and it soon became clear why he thought one was needed. Several weeks after making the request, he was summarily fired. Another example occurred when a large urban school district used one set of national norms to interpret pretest results and another norms table for the posttest. The result was a spurious portrayal and attendant wrong conclusion that the students’ test performance had vastly improved between the first and second test administrations. Still another example was seen when an evaluator gave her superintendent a sound program evaluation report, showing both strengths and weaknesses of the targeted program. The evaluator was surprised and dismayed one week later, when the superintendent released to the public a revised version showing only the program’s strengths.

Evaluators need to be cautious in how they relate to the public relations activities of their sponsors, clients, and supervisors. Certainly, public relations documents will reference information from sound evaluations. Evaluators should persuade their audiences to make honest use of the evaluation findings. Evaluators should not be party to misuses, especially in cases where erroneous reports are issued that predictably will mislead readers to believe that a seriously flawed program is good. As one safeguard, evaluators can promote and help their clients arrange to have independent metaevaluators examine the organization’s production and use of evaluation findings against professional standards for evaluations.

Approach 2: Politically Controlled Studies

The politically controlled study is an approach that can be either defensible or indefensible. A politically controlled study is illicit if the evaluator and/or client (a) withhold the full set of evaluation findings from audiences who have express, legitimate, and legal rights to see the findings; (b) abrogate their prior agreement to fully disclose the evaluation findings; or (c) bias the evaluation message by releasing only part of the findings. It is not legitimate for a client first to agree to make the findings of a commissioned evaluation publicly available and then, having previewed the results, to release none or only part of the findings. If and when a client or evaluator violates the formal written agreement on disseminating findings or applicable law, then the other party has a right to take appropriate actions and/or seek an administrative or legal remedy.

However, clients sometimes can legitimately commission covert studies and keep the findings private, while meeting applicable laws and adhering to an appropriate advance agreement with the evaluator. This is especially the case in the U.S. for private organizations not governed by public disclosure laws. Also, an evaluator, under legal contractual agreements, can plan, conduct, and report an evaluation for private purposes, while not disclosing the findings to any outside party. The key to keeping client-controlled studies in legitimate territory is to reach appropriate, legally defensible, advance, written agreements and to adhere to the contractual provisions concerning release of the study’s findings. Such studies also have to conform to applicable laws on release of information.

The advance organizers for a politically controlled study include implicit or explicit threats faced by the client for a program evaluation and/or objectives for winning political contests. The client’s purpose in commissioning such a study is to secure assistance in acquiring, maintaining, or increasing influence, power, and/or money. The questions addressed are those of interest to the client and special groups that share the client’s interests and aims. The main questions of interest to the client are, What is the truth, as best can be determined, surrounding the particular dispute or political situation? What information would be advantageous in a potential conflict situation? What data might be used advantageously in a confrontation? Typical methods of conducting the politically controlled study include covert investigations, simulation studies, private polls, private information files, and selective release of findings. Generally, the client wants obtained information to be as technically sound as possible. However, he or she may also want to withhold findings that do not support his or her position. The approach’s strength is that it stresses the need for accurate information. However, because the client might release information selectively to create or sustain an erroneous picture of a program’s merit and worth, might distort or misrepresent the findings, might violate a prior agreement to fully release the findings, or might violate a “public’s right to know” law, this type of study can degenerate into a pseudoevaluation.

For obvious reasons, persons have not been nominated to receive credit as pioneers or developers of the illicit, politically controlled study. To avoid the inference that this type of study is imaginary, consider the following examples.

A superintendent of one of the nation’s largest public school districts once confided that he possessed an extensive notebook of detailed information about each school building in his district. The information included student achievement, teacher qualifications, racial mix of teachers and students, average per-pupil expenditure, socioeconomic characteristics of the student body, teachers’ average length of tenure in the system, and so forth. The aforementioned data revealed a highly segregated district with uneven distribution of resources and markedly different achievement levels across schools. When asked why all the notebook’s entries were in pencil, the superintendent replied it was absolutely essential that he be kept informed about the current situation in each school; but he said it was also imperative that the community-at-large, the board, and special interest groups in the community, in particular, not have access to the information, for any of these groups might point to the district’s inequities as a basis for protest and even removing the superintendent. Hence, one special assistant kept the document up-to-date; only one copy existed, and the superintendent kept that locked in his desk. The point of this example is not to negatively judge the superintendent’s behavior. Instead, the superintendent’s ongoing covert investigation and selective release of information was decidedly not a case of true evaluation, for what he disclosed to the right-to-know audiences did not fully and honestly inform them about the observed situation in the district. This example may appropriately be termed a pseudoevaluation because it both underinformed and misinformed the school district’s stakeholders.

Cases like this undoubtedly led to the federal and state sunshine laws in the United States. Under current U.S. and state freedom of information provisions, most information obtained through the use of public funds must be made available to interested and potentially affected citizens. Thus, there exist legal deterrents to and remedies for illicit, politically controlled evaluations that use public funds.

While it would be unrealistic to recommend that administrators and other evaluation users not obtain and selectively employ information for political gain, they should not misrepresent their politically controlled information-gathering and reporting activities as sound evaluation. Evaluators should not lend their names and endorsements to evaluations presented by their clients that misrepresent the full set of relevant findings, that present falsified reports aimed at winning political contests, or that violate applicable laws and/or prior formal agreements on release of findings.

Before addressing the next group of study types, a few additional comments are in order concerning pseudoevaluation studies. These approaches have been considered because they are a prominent part of the evaluation scene. Sometimes “evaluators” and their clients are coconspirators in performing a purposely misleading study. On other occasions, evaluators, believing they are doing an assessment that is impartial, technically sound, and contracted to inform the public, discover that their client has other intentions or decides to abrogate prior evaluation agreements. When the time is right, the client is able to subvert the study in favor of producing the desired biased picture or none at all. It is imperative that evaluators be more alert than they often are to these kinds of potential conflicts. Otherwise, they will be unwitting accomplices in efforts to mislead through evaluation.

Such instances of misleading constituents through purposely biased reports or cover-up of findings, to which the public has a right, underscore the importance of having professional standards for evaluation work, faithfully applying them, and periodically engaging outside evaluators to assess one’s evaluation work. It is also prudent to develop advance contracts and memoranda of agreements to ensure that the sponsor and evaluator agree on procedures and safeguards to assure that the evaluation will comply with canons of sound evaluation and pertinent legal requirements. Despite these warnings, it can be legitimate for evaluators to give private evaluative feedback to clients, provided that applicable laws, statutes, and policies are met and sound contractual agreements on release of findings are reached and honored.

III. QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES

Questions/methods-oriented program evaluation approaches are so labeled because they start with particular questions and then move to the methodology appropriate for answering the questions. Only subsequently do they consider whether the questions and methodology are appropriate for developing and supporting value claims. These studies can be called quasi-evaluation studies, because sometimes they happen to provide evidence that fully assesses a program’s merit and worth, while in other cases, their focus is too narrow or is only tangential to questions of merit and worth. Quasi-evaluation studies have legitimate uses apart from their relationship to program evaluation, since they can focus on important questions, even though they are narrow in scope. The main caution is that these types of studies not be uncritically equated with evaluation.

Approach 3: Objectives-Based Studies

The objectives-based study is the classic example of a questions/methods-oriented evaluation approach (Madaus & Stufflebeam, 1988). In this approach, some statement of objectives provides the advance organizer. The objectives may be mandated by the client, formulated by the evaluator, or specified by the service providers. The usual purpose of an objectives-based study is to determine whether the program’s objectives have been achieved. Program developers, sponsors, and managers are typical audiences for such a study. These audiences want to know the extent to which each stated objective was achieved.

The methods used in objectives-based studies essentially involve specifying operational objectives and collecting and analyzing pertinent information to determine how well each objective was achieved. A wide range of objective and performance assessments may be employed. Criterion-referenced tests are especially relevant to this evaluation approach.
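As a minimal sketch of the arithmetic behind such a study, the following Python fragment tallies, for each operational objective, the share of students who met a criterion score on the items keyed to that objective. The objective labels, the 80 percent mastery criterion, and the sample scores are hypothetical assumptions made for illustration; they are not drawn from Tyler's work or from any of the models cited here.

```python
def objective_attainment(scores_by_objective, criterion=0.8):
    """Return, per objective, the fraction of students at or above the criterion.

    scores_by_objective maps an objective label to a list of proportion-correct
    scores (0.0 to 1.0), one per student. The 0.8 criterion is an assumed value.
    """
    attainment = {}
    for objective, scores in scores_by_objective.items():
        if not scores:
            raise ValueError(f"no scores recorded for objective {objective!r}")
        met = sum(1 for s in scores if s >= criterion)
        attainment[objective] = met / len(scores)
    return attainment


# Hypothetical criterion-referenced results for two objectives.
sample = {
    "adds two-digit numbers": [0.9, 0.85, 0.6, 1.0],
    "reads a bar graph": [0.5, 0.8, 0.7, 0.95],
}
report = objective_attainment(sample)
```

A tally of this kind reports only the extent to which each stated objective was reached. It says nothing about whether the objectives were worth pursuing or whether side effects occurred.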

Ralph Tyler is generally acknowledged to be the pioneer in the objectives-based type of study, although Percy Bridgman and E. L. Thorndike probably should be credited along with Tyler. 4 Many people have furthered the work of Tyler by developing variations of his objectives-based evaluation model. A few of them are Bloom et al. (1956), Hammond (1972), Metfessel and Michael (1967), Popham (1969), Provus (1971), and Steinmetz (1983).

The objectives-based approach is especially applicable in assessing tightly focused projects that have clear, supportable objectives. Even then, such studies can be strengthened by judging project objectives against the intended beneficiaries’ assessed needs, searching for side effects, and studying the process as well as the outcomes.

Undoubtedly, the objectives-based study has been the most prevalent approach used in the name of program evaluation. It is one that has good common sense appeal; program administrators have had a great amount of experience with it; and it makes use of technologies of behavioral objectives and both norm-referenced and criterion-referenced testing. Common criticisms are that such studies lead to terminal information that is of little use in improving a program or other enterprise; that this information often is far too narrow in scope to constitute a sufficient basis for judging the object’s merit and worth; relatedly, that they do not uncover positive and negative side effects; and that they may credit unworthy objectives.

Approach 4: Accountability, Particularly Payment By Results Studies

The accountability study became prominent in the early 1970s. Its emergence seems to have been connected to widespread disenchantment with the persistent stream of evaluation reports indicating that almost none of the massive state and federal investments in educational and social programs were making any positive, statistically discernable difference. One proposed solution posited that accountability systems could be initiated to ensure both that service providers would carry out their responsibilities to improve services and that evaluators would do a thorough job of identifying the effects of improvement programs and determining which persons and groups were succeeding and which were not.

The advance organizers for the accountability study are the persons and groups responsible for producing results, the service providers’ work responsibilities, and the expected outcomes. The study’s purposes are to provide constituents with an accurate accounting of results, to ensure that the results are primarily positive, and to pinpoint responsibility for good and bad outcomes. Sometimes accountability programs administer both sanctions and rewards to the responsible service providers, depending on the extent and quality of their services and achievement.

The questions addressed in accountability studies come from the program’s constituents and controllers, such as taxpayers; parent groups; school boards; and local, state, and national funding organizations. The main question that the groups want answered concerns whether each involved service provider and organization charged with responsibility for delivering and improving services is carrying out its assignments and achieving all it should, given the investments of resources to support the work.

A wide variety of methods have been used to ensure and assess accountability. These include performance contracting; Program Planning and Budgeting System (PPBS); Management By Objectives (MBO); Zero Based Budgeting; mandated “program drivers” and indicators; program input, process, output databases; independent goal achievement auditors; procedural compliance audits; peer review; merit pay for individuals and/or organizations; collective bargaining agreements; mandated testing programs; institutional report cards; self-studies; site visits by expert panels; and procedures for auditing the design, process, and results of self-studies. Also included are mandated goals and standards, decentralization and careful definition of responsibility and authority, payment by results, awards and recognition, sanctions, takeover/intervention authority by oversight bodies, and competitive bidding.

Lessinger (1970) is generally acknowledged as a pioneer in the area of accountability. Some of the people who have extended Lessinger’s work are Stenner and Webster, in their development of a handbook for conducting auditing activities, and Kearney, in providing leadership to the Michigan Department of Education in developing the first statewide educational accountability system. A recent major attempt at accountability, involving sanctions and rewards, was the ill-fated, heavily funded Kentucky Instructional Results Information System (Koretz & Barron, 1998). The failure of this program was clearly associated with fast-paced implementation in advance of validation, reporting and later retraction of flawed results, results that were not comparable to those in other states, payment by results that fostered teaching tests and other cheating in the schools, and heavy expense, associated with performance assessments, that could not be sustained over time. Kirst (1990) analyzed the history and diversity of attempts at accountability in education within the following six broad types of accountability: performance reporting, monitoring and compliance with standards or regulations, incentive systems, reliance on the market, changing locus of authority or control of schools, and changing professional roles.

Accountability approaches are applicable to organizations and professionals funded and charged to carry out public mandates, deliver public services, implement specially funded programs, etc. It behooves these program leaders to maintain a dynamic baseline of information needed to demonstrate fulfillment of responsibilities and achievement of positive results. They should focus accountability mechanisms especially on those program elements that can be changed with the prospect of improving outcomes. They should also focus accountability to enhance staff cooperation toward achievement of collective goals rather than to stimulate counterproductive competition. Moreover, accountability studies that compare different programs should fairly consider the programs’ different contexts, including especially beneficiaries’ characteristics and needs, local support, available resources, and external forces.

The main advantages of accountability studies are that they are popular among constituent groups and politicians and are aimed at improving public services. Also, they can provide program personnel with clear expectations against which to plan, execute, and report on their services and contributions. They can also be designed to give service providers both freedom to innovate on procedures and clear expectations and requirements for producing and reporting on sound outcomes. In addition, setting up healthy, fair competition between comparable programs can result in better services and products for consumers.

A main disadvantage is that accountability studies often issue invidious comparisons and thereby produce unhealthy competition and much political unrest and acrimony among educators and between them and their constituents. Also, accountability studies often focus too narrowly on outcome indicators and can undesirably narrow the range of services provided. Another disadvantage is that politicians tend to force the implementation of accountability efforts before the needed instruments, scoring rubrics, assessor training, etc. can be planned, developed, field-tested, and validated. Furthermore, prospects for rewards and threats of sanctions have often led service providers to cheat in order to assure positive evaluation reports. For example, in schools, cheating to obtain rewards and avoid sanctions has frequently generated bad teaching, bad press, and turnover in leadership.

Approach 5: Objective Testing Programs

Since the 1930s, American education has been inundated with standardized, multiple choice, norm-referenced testing programs. Probably every school district in the United States has some type of standardized testing program of this type. Such tests are administered annually by local school districts and/or state education departments to inform students, parents, educators, and the public at large about the achievements of children and youth. Their main purposes are to assess the achievements of individual students and groups of students compared to norms and/or standards. Typically, these tests are administered to all students in applicable grade levels. Because these test results focus on student outcomes and are conveniently available, many educators have tried to use the results to evaluate the quality of special projects and specific school programs by inferring that high scores reflect successful efforts and that low scores reflect poor efforts. Such inferences can be erroneous if the tests were not targeted on particular project or program objectives or the needs of particular target groups of students and if the students’ background characteristics were not taken into account.

Advance organizers for standardized educational tests include areas of the school curriculum and specified norm groups. The main purposes of testing programs are to compare the test performance of individual students and groups of students to those of selected norm groups and/or to diagnose shortfalls related to particular objectives. Additionally, standardized test results are often used to compare the performance of different programs, schools, etc., and to examine achievement trends across years. Metrics used to make the comparisons typically are standardized individual and mean scores for the total test and subtests.
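To make the comparison metric concrete, here is a minimal Python sketch of standardizing raw scores against a norm group and averaging the result for a group of students. The function names and the norm-group values (mean 50, standard deviation 10) are hypothetical assumptions; published tests use their own norm tables and score scales.

```python
from statistics import mean


def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw score against a norm group's mean and standard deviation."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def group_summary(raw_scores, norm_mean, norm_sd):
    """Return each student's standardized score and the group's mean standardized score."""
    zs = [z_score(r, norm_mean, norm_sd) for r in raw_scores]
    return zs, mean(zs)


# Hypothetical norm group: mean 50, standard deviation 10.
zs, group_mean_z = group_summary([40, 50, 60, 70], norm_mean=50, norm_sd=10)
```

A positive group mean indicates performance above the norm group's average. It does not by itself show that a particular program caused the difference.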

The sources of questions addressed by testing programs are usually test publishers and test development/selection committees. The typical question addressed by these tests concerns whether the test performance of individual students is at or above the average performance of local, state, and national norm groups. Other questions may concern the percentages of students who surpassed one or more cut-score standards, where the group of students ranks in comparison with other groups, or whether the current year’s achievement is better than in prior years. The main process involved in using testing programs is to select, administer, score, interpret, and report the tests.

Lindquist (1951), a major pioneer in this area, was instrumental in developing the Iowa testing programs, the American College Testing Program, the National Merit Scholarship Testing Program, and the General Educational Development Testing Program, as well as the Measurement Research Center at the University of Iowa. Many people have contributed substantially to the development of educational testing in America, including Ebel (1965), Flanagan (1939), Lord and Novick (1968), and Thorndike (1971). In the 1990s a number of persons innovated in such areas of testing as item response theory (Hambleton & Swaminathan, 1985) and value-added measurement (Sanders & Horn, 1994; Webster, 1995).

Virtually all public schools in the U.S. engage in one or more forms of standardized, objective achievement testing. If the school’s personnel carefully select such tests and use them appropriately to assess and improve student learning and report to the public, the involved expense and effort is highly justified. However, they should be careful not to rely on these results for evaluating specially targeted projects and programs. Student outcome measures for judging specific projects and programs must be validated in terms of the particular objectives and the characteristics and needs of the students being served by the program.

The main advantages of standardized-testing programs are that they are efficient in producing valid and reliable information on student performance in many areas of the school curriculum and that they are a familiar strategy at every level of the school program in virtually all school districts in the United States. The main limitations are that they provide data only about student outcomes; they reinforce students’ multiple-choice test-taking behavior rather than their writing and speaking behaviors; they tend to address only lower-order learning objectives; and, in many cases, they are perhaps a better indicator of the socioeconomic levels of the students in a given program, school, or school district than of the quality of the implicated teaching and learning. Stake (1971) and others have argued effectively that standardized tests often are poor approximations of what teachers actually teach. Moreover, as has been patently clear in evaluations of programs for both disadvantaged students and gifted students, norm-referenced tests often do not measure achievements well for the low and high scoring students. Unfortunately, program evaluators often have made uncritical uses of standardized test results to judge a program’s outcomes, just because the results are conveniently available and have face validity to the public. Many times the contents of such tests do not match the program’s objectives. Also, they may measure differences well among students in the middle of the achievement distribution but poorly among high achievers and the slow learners often targeted by special education programs.

Approach 6: Outcomes Monitoring/Value-Added Assessments

Recurrent outcomes/value-added assessment is a special case of the use of standardized testing to evaluate the effects of programs and policies. The emphasis here is on annual testing in order to assess trends and partial out effects of the different levels and components of an educational system. Characteristic of this approach is the cyclical collection of outcome measures based on standardized indicators, analysis of results in relation to policy questions, and reporting of overall results plus specific policy-relevant analyses. The main interest is in aggregate, not individual, performance. A state education department may regularly collect achievement data from all students (at selected grade levels), as is the case in the Tennessee Value-Added Assessment System. The evaluator may analyze the data to look at contrasting results related to particular objectives for schools using and not using particular programs. These results may be further broken out to make comparisons between classes, curricular areas, grade levels, teachers, schools, different size and resource classifications of schools, districts, and different areas of a state. This approach differs from the typical standardized achievement testing program in its emphasis on uncovering and analyzing policy issues rather than only reporting on students’ progress. Otherwise, the two approaches have much in common.

The advance organizers in monitoring outcomes and employing value-added analysis are the indicators of expected and possible outcomes and the scheme for classifying results to examine policy issues and/or program effects. The purposes of Outcomes Monitoring/Value-Added Assessment systems are direction for policymaking, accountability to constituents, and feedback for improving programs and services. This approach also ensures standardization of data for assessment and improvement throughout a system. The questions to be addressed by such monitoring systems originate from funding organizations, policymakers, the system’s professionals, and constituents.

Illustrative questions addressed by Outcomes Monitoring/Value-Added Assessment systems are: To what extent are particular programs adding value to students’ achievement? What are the cross-year trends in outcomes? In what sectors of the system is the program working best and poorest? What are key, pervasive shortfalls in particular program objectives that require further study and attention? To what extent are program successes and failures associated with the system’s different organizational levels?

Developers of the Outcomes Monitoring/Value-Added Assessment approach include especially William Sanders and Sandra Horn (1994); William Webster (1995); Webster, Mendro, and Almaguer (1994); and Peter Tymms (1995). These developers have used census data on student achievement trends to diagnose areas for improvement and look for effects of programs and policies. What distinguishes the Outcomes Monitoring/Value-Added Assessment approach from the traditional standardized testing program is sophisticated analysis of data to partial out effects of programs and policies and to identify areas where new policies and programs are needed. In contrast to these applications, the typical standardized testing program is focused more on providing feedback on the performance of individual students and groups of students, without the attendant policy-oriented analysis. Probably the Outcomes Monitoring/Value-Added Assessment approach is mainly feasible for well-endowed state education departments and large school districts where there is strong support from policy groups, administrators, and service providers to make the approach work. It requires systemwide buy-in; politically effective leaders to continually explain and sell the program; a smoothly operating, dynamic, computerized baseline of relevant input and output information; highly skilled technicians to make it run efficiently and accurately; complicated statistical analysis; and high-level commitment to use the results for purposes of policy development, accountability, program evaluation, and improvement at all levels of the system.

The central advantage of Outcomes Monitoring/Value-Added Assessment is in the systematization and institutionalization of a database of outcomes that can be used over time and in a standardized way to study and find means to improve outcomes. Also, Outcomes Monitoring/Value-Added Assessment is conducive to using a standard of continuous progress across years for every student as opposed to employing static cut scores. The latter, while prevalent in accountability programs, basically fail to take into account meaningful gains by low or high achieving students, since these gains usually are far removed from the static, cut score standards. Also, Sanders and Horn (1994) have shown that use of static cut scores may produce a “shed pattern,” in which students who began below the cut score make the greatest gains while those who started above the cut score standard make little progress. Like the sloping roof of a tool shed, the gains are greatest for previously low scoring students and progressively lower for the higher achievers. This suggests that teachers are concentrating mainly on getting students to the cut score standard but not beyond it and thus “holding back the high achievers.” This approach makes efficient use of standardized tests; is amenable to analysis of trends at state, district, school, and classroom levels; uses students as their own controls; and emphasizes service to every student.
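The “shed pattern” can be illustrated with a small Python sketch that averages each student's one-year gain within bands of prior-year score. This is only a descriptive tabulation under stated assumptions: the band boundaries and the score pairs are hypothetical, and the sketch is not the mixed-model analysis used in the Tennessee system.

```python
from statistics import mean


def mean_gain_by_band(records, band_edges):
    """Average one-year gain within bands of prior-year score.

    records is a list of (prior_score, current_score) pairs for the same
    students, so each student serves as his or her own control.
    band_edges is an ascending list of lower bounds for the bands.
    """
    gains = {edge: [] for edge in band_edges}
    for prior, current in records:
        eligible = [edge for edge in band_edges if prior >= edge]
        if not eligible:
            continue  # prior score falls below the lowest band; skip it
        gains[max(eligible)].append(current - prior)
    return {edge: mean(g) for edge, g in gains.items() if g}


# Hypothetical (prior, current) scores; bands start at 0, 40, and 60.
records = [(30, 42), (35, 45), (50, 56), (55, 61), (70, 72), (75, 76)]
by_band = mean_gain_by_band(records, band_edges=[0, 40, 60])
```

In this invented data set the mean gain falls from the lowest band to the highest, which is the sloping-roof shape described above. Real data would also need attention to measurement error and regression toward the mean before any such pattern is attributed to teaching.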

A major disadvantage of this approach is that it is politically volatile, since it is used to identify responsibility for successes and failures down to the levels of schools and teachers. Also, it is constrained mainly to use quantitative information such as that coming from standardized, multiple choice achievement tests. Consequently, the complex and powerful analyses are based on a limited scope of outcome variables. Nevertheless, Sanders (1989) has argued that a strong body of evidence supports the use of well-constructed, standardized, multiple choice achievement tests. Beyond the issue of outcome measures, the approach does not provide in-depth documentation of program inputs and processes and makes little if any use of qualitative methods. Despite the advancements in objective measurement and the employment of hierarchical mixed models to defensibly partial out effects of a system’s organizational components and individual staff members, critics of the approach argue that causal factors are so complex that no measurement and analysis system can fairly fix responsibility to the level of teachers for the academic progress of individual students and collections of students.

Approach 7: Performance Testing

In the 1990s, there were major efforts to offset the limitations of the typical multiple choice tests by employing performance or authentic measures. These are devices that require students to demonstrate the performance being assessed by producing authentic responses, such as written or spoken answers, musical or psychomotor presentations, portfolios of work products, or group solutions to defined problems. Arguments given for such performance tests are that they have high face validity and model and reinforce the skills that students should be acquiring through their studies. For example, students are not being taught so that they will do well in choosing best answers from a list, but so that they will master the underlying understandings and skills and effectively apply them to real life problems.

The advance organizers in performance assessments are life skill objectives and content-related performance tasks plus ways that their achievement can be demonstrated in practice. The main purpose of performance tests is to compare the test performance of individual students and groups of students to model performance on the assessment tasks. Grades assigned to each respondent’s performance, using set rubrics, enable assessment of the quality of achievements represented and comparisons across groups.

The sources of questions that performance tests address are analyses of selected life skill tasks and content specifications in curricular materials. The typical questions addressed by performance tests concern whether individual students can effectively write, speak, figure, analyze, lead, work cooperatively, and solve given problems up to the level of acceptable standards. The main process involved in using performance tests is to define areas of skills to be assessed; select the type of assessment device; construct the assessment tasks; determine scoring rubrics; define standards for assessing performance; train and calibrate scorers; validate the measures; and administer, score, interpret, and report the test results.

In speaking of licensing tests, Flexner (1910) called for tests that ascertain students’ practical ability to successfully confront and solve problems in concrete cases. Some of the pioneers in applying performance assessment to state education systems were the state education departments in Vermont and Kentucky (Kentucky Department of Education, 1993; Koretz, 1986, 1996; Koretz & Barron, 1998). Other sources of information about the general approach and issues in performance testing include Baker, O’Neil, and Linn (1993); Herman, Gearhart, and Baker (1993); Linn, Baker, and Dunbar (1991); Mehrens (1972); Messick (1994); Stillman, Haley, Regan, Philbin, Smith, O’Donnell, and Pohl (1991); Swanson, Norman, and Linn (1995); Torrance (1993); and Wiggins (1989).

Often it is difficult to obtain the conditions necessary to employ the performance testing approach. It requires a huge outlay of time and resources for development and application. Typically, state education departments and school districts probably should use this approach very selectively and only when they can make the investment needed to produce valid results that are worth the large, required investment. On the other hand, students’ writing ability is best assessed and nurtured through obtaining, assessing, and providing critical feedback on students’ writing samples.

The main advantages of performance testing programs are that they require students to construct responses to assessment tasks that are akin to what they will have to do in real life. They eliminate guessing from the testing task. They also reinforce life skills, such as being able to write or otherwise construct responses rather than pass multiple choice tests.

Major disadvantages of the approach are heavy time requirements for administration; high costs of scoring; difficulty in achieving reliable scores; narrow scope of skills that can feasibly be assessed; and lack of norms for comparisons, especially at the national level. In general, performance tests are inefficient, costly, and often of dubious reliability. Moreover, compared with multiple choice tests, performance tests, in the same amount of testing time, can cover only a much narrower range of questions.

Approach 8: Experimental Studies

In using controlled experiments, program evaluators randomly assign subjects or groups of subjects to experimental and control groups and then contrast the outcomes when the experimental group receives a particular intervention and the control group receives no special treatment or some different treatment. This type of study was quite prominent in program evaluation during the late 1960s and early 1970s, when there was a federal requirement to assess the effectiveness of federally funded innovations. However, experimental program evaluations subsequently fell into disfavor and disuse. (In the 1990s, controlled experiments in education have been rare [Nave, Misch, & Mosteller, 1999].) Apparent reasons for this decline are that evaluators rarely can meet the required experimental conditions and assumptions and that the prevalent finding has been “no statistically significant result.”
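The random assignment and group contrast described above can be sketched in a few lines of Python. This is a bare illustration under stated assumptions: the subject identifiers, the seed, and the helper names are hypothetical, and an actual study would add a significance test and checks on attrition and treatment fidelity.

```python
import random
from statistics import mean


def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into experimental and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference(outcomes, experimental, control):
    """Experimental-group mean outcome minus control-group mean outcome."""
    return mean(outcomes[s] for s in experimental) - mean(outcomes[s] for s in control)


# Eight hypothetical subjects assigned by chance to the two groups.
subjects = [f"s{i}" for i in range(1, 9)]
experimental, control = randomly_assign(subjects, seed=1999)
```

Once outcome scores are collected, `mean_difference` gives the raw contrast between the groups. Whether that contrast is statistically discernible depends on sample size and score variability, which this sketch does not address.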

This approach is labeled as a questions-oriented or quasi-evaluation strategy because it starts with questions and methodology that may address only a narrow set of the questions needed to assess a program’s merit and worth. In the 1960s, Campbell and Stanley (1963) and others hailed the true experiment as the only sound means of evaluating interventions. This piece of evaluation history reminds one of Kaplan’s (1964) famous warning against the so-called “law of the instrument,” whereby a given method is equated to a field of inquiry. In such a case, the field of inquiry is restricted to the questions that are answerable by the given method. Fisher (1951) specifically warned against equating his experimental methods with science. Similarly, experimental design is a method that can contribute importantly to program evaluation, as Nave, Misch, and Mosteller (1999) have demonstrated, but by itself it is often insufficient to address a client’s full range of evaluation questions.

The advance organizers in experimental studies are problem statements, competing treatments, hypotheses, investigatory questions, and randomized treatment and comparison groups. The usual purpose of the controlled experiment is to determine causal relationships between specified independent and dependent variables, such as a given instructional method and student standardized-test performance. It is particularly noteworthy that the sources of questions investigated in the experimental study are researchers, program developers, and policy figures, and not usually a program’s constituents and practitioners.

The frequent question in the experimental study is, What are the effects of a given intervention on specified outcome variables? Typical methods used are experimental and quasi-experimental designs. Pioneers in using experimentation to evaluate programs are Campbell and Stanley (1963), Cronbach and Snow (1969), and Lindquist (1953). Other persons who have developed the methodology of experimentation substantially for program evaluation are Boruch (1994); Glass and Maguire (1968); Nave, Misch, and Mosteller (1999); Suchman (1967); and Wiley and Bock (1967).

Evaluators should consider conducting a controlled experiment only when its required conditions and assumptions can be met. Often this requires substantial political influence, substantial funding, and widespread agreement (e.g., among the targeted educators, parents, and teachers) to submit to the requirements of the experiment. Such requirements typically include, among others, a stabilized program that will not have to be studied and modified during the evaluation; the ability to establish and sustain comparable program and control groups; the ability to keep the program and control conditions separate and uncontaminated; and the ability to obtain the needed criterion measures from all or at least a representative group of the members of the program and comparison groups. Evaluability assessment was developed as a particular methodology for determining the feasibility of moving ahead with an experiment (Smith, 1989; Wholey, 1995).

Controlled experiments have a number of advantages. They focus on results and not just intentions or judgments. They provide strong methods for establishing relatively unequivocal causal relationships between treatment and outcome variables; this ability can be especially significant when program effects are small but important. Moreover, because of the prevalent use and success of experiments in such fields as medicine and agriculture, the approach has widespread credibility.

The above advantages are offset by serious objections to experimenting on school students and other subjects. It is often considered unethical or even illegal to deprive the control group of the benefits of special funds for improving services. Likewise, many parents don’t want schools to experiment on their children by applying unproven interventions. Typically, schools find it impractical and unreasonable to randomly assign students to treatments and to hold treatments constant throughout the study period. Also, experimental studies provide a much narrower range of information than schools or other organizations often need to assess and strengthen their programs. On this point, experimental studies tend to provide terminal information that is not useful for guiding the development and improvement of programs; in fact, they require that ongoing modifications of the treatments be prevented.

Approach 9: Management Information Systems

The management information system is like the politically controlled approaches, except that it supplies managers with the information they need to conduct and report on their programs, as opposed to supplying them with the information they need to win a political advantage. The management information approach is also like the decision/accountability-oriented approach, which will be discussed later, except that the decision/accountability-oriented approach provides information needed to both develop and defend a program’s merit and worth, which goes beyond providing information that managers need to implement and report on their management responsibilities.

The advance organizers in most management information systems include program objectives, specified activities, and projected program milestones or events. A management information system’s purpose, as already implied, is to continuously supply managers with the information they need to plan, direct, control, and report on their programs or spheres of responsibility.

The sources of questions addressed are the management personnel and their superiors. The main questions they typically want answered are, Are program activities being implemented according to schedule, according to budget, and with the expected results? To provide ready access to information for addressing such questions, these systems regularly store and make accessible up-to-date information on the program’s goals, planned operations, actual operations, staff, program organization, expenditures, threats, problems, publicity, achievements, etc.

Methods employed in management information systems include system analysis, Program Evaluation and Review Technique (PERT), Critical Path Method, Program Planning and Budgeting System (PPBS), Management by Objectives, computer-based information systems, periodic staff progress reports, and regular budgetary reporting.

Cook (1966) introduced the use of PERT in education, and Kaufman (1969) wrote about the use of management information systems in education. Business schools and programs in computer information systems regularly provide courses in management information systems. Mainly, these focus on how to set up and employ computerized information banks for use in organizational decision making.

W. Edwards Deming (1986) argued that managers should pay close attention to process rather than being preoccupied with outcomes. He advanced a systematic approach for monitoring and continuously improving an enterprise’s process, arguing that close attention to the process will result in increasingly better outcomes. It is commonly said that, in paying attention to this and related advice from Deming, Japanese car makers and later the Americans greatly increased the quality of automobiles (Aguaro, 1990). Bayless and Massaro (1992) applied Deming’s approach to program evaluations in education. Based on this writer’s observations, the approach was not well suited to assessing the complexities of educational processes, possibly because, unlike automobile manufacturers, educators have no definitive, standardized models for linking exact educational processes to specified outcomes.

Nevertheless, given modern database technology, program managers often can and should employ management information systems in multiyear projects and programs. Program databases can provide information not only for keeping programs on track, but also for assisting in the broader study and improvement of program processes and outcomes.

A major advantage of the use of management information systems is in giving managers information they can use to plan, monitor, control, and report on complex operations. A major difficulty with the application of this industry-oriented type of system to education and social services is that the products of many such programs are not amenable to a narrow, precise definition as is the case with a corporation’s profit and loss statement. Moreover, processes in educational and social programs often are complex and evolving rather than straightforward and standardized like those of manufacturing and business. The information gathered in management information systems typically lacks the scope of context, input, process, and outcome information required to assess a program’s merit and worth.

Approach 10: Benefit-Cost Analysis Approach

Benefit-cost analysis as applied to program evaluation is a set of largely quantitative procedures used to understand the full costs of a program and to determine and judge what those investments returned in objectives achieved and broader social benefits. The aim is to determine costs associated with program inputs, determine the monetary value of the program outcomes, compute benefit-cost ratios, compare the computed ratios to those of similar programs, and ultimately judge the program’s productivity in economic terms.

The benefit-cost analysis approach to program evaluation may be broken down into three levels of procedures: (1) cost analysis of program inputs, (2) cost-effectiveness analysis, and (3) benefit-cost analysis. These may be looked at as a hierarchy. The first type, cost analysis of program inputs, may be done by itself. Such analyses entail an ongoing accumulation of a program’s financial history. These analyses are of use in controlling program delivery and expenditures. The program’s financial history can be used to compare the program’s actual costs to the projected costs in the original budget and to the costs of similar programs. Also, cost analyses can be extremely valuable to outsiders who might be interested in replicating the program.
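The comparison of actual costs with the originally budgeted costs can be illustrated with a short Python sketch. The line items and dollar amounts below are hypothetical, and the function name is an assumption introduced only for this illustration.

```python
def budget_variance(projected, actual):
    """Return actual minus projected cost for every line item in either budget."""
    items = sorted(set(projected) | set(actual))
    return {item: actual.get(item, 0.0) - projected.get(item, 0.0) for item in items}


# Hypothetical figures, invented for illustration only.
projected = {"personnel": 80000.0, "travel": 5000.0, "materials": 15000.0}
actual = {"personnel": 82500.0, "travel": 3000.0, "materials": 15000.0, "equipment": 1200.0}

variance = budget_variance(projected, actual)
total_overrun = sum(variance.values())  # positive means spending exceeded the budget
```

A positive entry in `variance` marks a line item that cost more than projected. An unbudgeted item, such as the equipment purchase here, appears with its full cost.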

Cost-effectiveness analysis necessarily includes cost analysis of program inputs to determine the cost associated with the progress toward achieving each objective. Such analyses might compare two or more programs’ costs and successes in achieving the same objectives. A program could be judged superior on cost-effectiveness grounds if it had the same costs as similar programs but superior outcomes. Or the program could still be judged superior on cost-effectiveness grounds if it achieved the same objectives as more expensive programs. Cost-effectiveness analyses do not require conversion of outcomes to monetary terms but must be keyed to clear, measurable program objectives.
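A cost-effectiveness comparison of this kind can be sketched as follows. The sketch assumes that both programs pursue the same objective and that attainment can be counted in a common unit (for example, students reaching a stated proficiency level). The program names, costs, and counts are hypothetical.

```python
def cost_per_unit(total_cost, units_achieved):
    """Cost per unit of objective attainment; outcomes stay in natural units."""
    if units_achieved <= 0:
        raise ValueError("units_achieved must be positive")
    return total_cost / units_achieved


# Hypothetical programs with equal costs but different outcomes on one objective.
programs = {
    "Program A": {"cost": 120000.0, "units": 400},
    "Program B": {"cost": 120000.0, "units": 300},
}

ratios = {name: cost_per_unit(p["cost"], p["units"]) for name, p in programs.items()}
most_cost_effective = min(ratios, key=ratios.get)  # lowest cost per unit achieved
```

Note that the outcomes are never converted to dollars; only the costs are monetary.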

Benefit-cost analyses typically build on a cost analysis of program inputs and a cost-effectiveness analysis. But the benefit-cost analysis goes further. It seeks to identify a broader range of outcomes than just those associated with program objectives. It examines the relationship between the investment in a program and the extent of positive and negative impacts on the program’s environment. In doing so, it ascertains and places a monetary value on program inputs and each identified outcome. It identifies a program’s benefit-cost ratios and compares these to similar ratios for competing programs. Ultimately, benefit-cost studies seek conclusions about the comparative benefits and costs of the examined programs.
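The arithmetic of a benefit-cost ratio can be shown in a minimal sketch. It rests on simplifying assumptions that are not drawn from the sources cited in this paper: every outcome has already been assigned a monetary value, negative outcomes are entered as negative amounts and netted against benefits, and no discounting over time is applied. All names and figures are hypothetical.

```python
def benefit_cost_ratio(input_costs, monetized_outcomes):
    """Net monetized outcomes divided by total input costs (no discounting)."""
    total_cost = sum(input_costs.values())
    if total_cost <= 0:
        raise ValueError("total input cost must be positive")
    net_benefit = sum(monetized_outcomes.values())  # negative impacts reduce the sum
    return net_benefit / total_cost


# Hypothetical figures, invented for illustration only.
input_costs = {"personnel": 80000.0, "materials": 15000.0, "facilities": 5000.0}
monetized_outcomes = {
    "objectives attained": 130000.0,
    "positive side effect": 20000.0,
    "negative side effect": -30000.0,
}

ratio = benefit_cost_ratio(input_costs, monetized_outcomes)
```

A ratio above 1 would suggest that monetized benefits exceed costs. The result is only as credible as the monetary values assigned to each outcome.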

Advance organizers for the overall benefit-cost approach are associated with cost breakdowns for both program inputs and program outputs. Program input costs may be delineated by line items (e.g., personnel, travel, materials, equipment, communications, facilities, contracted services, overhead, etc.), by program components, by year, etc. In cost-effectiveness analysis, a program’s costs are examined in relation to each program objective, and these must be clearly defined and assessed. The more ambitious benefit-cost analyses look at costs associated with main effects and side effects, tangible and intangible outcomes, positive and negative outcomes, and short-term and long-term outcomes, both inside and outside the program. Frequently, they also may break down costs by individuals and groups of beneficiaries. One may also estimate the costs of foregone opportunities and, sometimes, political costs. Even then, the real value of benefits associated with human creativity or self-actualization is nearly impossible to estimate. Consequently, the benefit-cost equation rests on dubious assumptions and uncertain realities.

The purposes of these three levels of benefit-cost analysis are to gain clear knowledge of what resources were invested, how they were invested, and with what effect. In popular vernacular, cost-effectiveness and benefit-cost analyses seek to determine the program’s “bang for the buck.” There is great interest in answering this type of question. Policy boards, program planners, and taxpayers are especially interested to know whether program investments are paying off in positive results that exceed or are at least as good as those produced by similar programs.

Authoritative information on the benefit-cost approach may be obtained by studying the writings of Kee (1995), Levin (1983), and Tsang (1997).

Benefit-cost analysis is potentially important in most program evaluations. Evaluators are advised to discuss this matter thoroughly with their clients, to reach appropriate advance agreements on what should and can be done to obtain the needed cost information, and to do as much cost-effectiveness and benefit-cost analysis as can be done well and within reasonable costs.

Benefit-cost analysis is an important but problematic consideration in program evaluations. Most program evaluations are amenable to analyzing the costs of program inputs and maintaining a financial history of expenditures. The main impediment to this is that program authorities often do not want anyone other than the appropriate accountants and auditors looking into the financial books. If cost analysis, even at only the input levels, is to be done, this must be clearly provided for in the initial contractual agreements covering the evaluation work. Performing cost-effectiveness analysis can be feasible if cost analysis of inputs is agreed to; if there are clear, measurable program objectives; and if comparable cost information can be obtained from competing programs. Unfortunately, it is usually hard to meet all these conditions needed for a successful cost-effectiveness analysis. Even more unfortunate is the fact that it is usually impractical to conduct a thorough benefit-cost analysis. Not only must it meet all the conditions of the analysis of program inputs and cost-effectiveness analysis, but it must also place monetary values on identified outcomes, both those anticipated and those not expected.

Approach 11: Clarification Hearing

The clarification hearing is one label for the judicial approach to program evaluation. This approach essentially puts a program on trial. Role-playing evaluators competitively implement both a damning prosecution of the program—arguing that it failed—and a defense of the program—arguing that it succeeded. A judge hears these arguments within the framework of a jury trial and controls the proceedings according to advance agreements on rules of evidence and trial procedures. The actual proceedings are preceded by the collection of and sharing of evidence by both sides. The prosecuting and defending evaluators may call witnesses and place documents and other exhibits into evidence. A jury hears the proceedings and ultimately makes and issues a ruling on the program’s success or failure. Ideally, the jury is composed of persons representative of the program’s stakeholders. By videotaping the proceedings, the administering evaluator can, after the trial, compile a condensed videotape as well as printed reports to disseminate what was learned through the process.

The advance organizers for a clarification hearing are criteria of program effectiveness that both the prosecuting and defending sides agree to apply. The judicial approach’s main purpose is to ensure that the evaluation’s audience will receive balanced evidence on the program’s strengths and weaknesses. The key questions essentially are, Should the program be judged a success or failure? Is it as good as or better than alternative programs that address the same objectives?

Robert Wolf (1975) pioneered the judicial approach to program evaluation. Others who applied, tested, and further developed the approach include Levine (1974), Owens (1973), and Popham and Carlson (1983).

Based on the past uses of this approach, it can be judged as only marginally relevant to program evaluation. By its adversarial nature, the approach prods the evaluators to present biased arguments in order to win their cases. The approach subordinates truth seeking to winning. Accuracy suffers in this process. The most effective debaters are likely to convince the jury of their position even when it is poorly founded. Also, the approach is politically problematic, since it generates considerable acrimony. Despite the attractiveness of using the law as a metaphor for program evaluation, with the law’s attendant rules of evidence, the promise of this application has not been fulfilled. There are few occasions in which it makes practical sense for evaluators to apply this approach.

Approach 12: Case Study Evaluations

A case-study-based program evaluation is a focused, in-depth description, analysis, and synthesis of a particular program or other object. The investigators do not control the program in any way. Instead, they look at it as it is occurring or as it occurred in the past. The study looks at the program in its geographic, cultural, organizational, and historical contexts. It closely examines the program’s internal operations and how it uses inputs and processes to produce outcomes. It examines a wide range of intended and unexpected outcomes. It looks at the program’s multiple levels and also holistically at the overall program. It characterizes both central, dominant themes and variations and aberrations. It defines and describes the program’s intended and actual beneficiaries. It examines beneficiaries’ needs and to what extent the program effectively addressed the needs. It employs multiple methods to obtain and integrate multiple sources of information. While it breaks apart and analyzes a program along various dimensions, it also provides an overall characterization of the program.

The main thrust of the case study approach is to delineate and illuminate a program, not necessarily to guide its development or to assess and judge its merit and worth. Hence, this paper characterizes the case study approach as a questions/methods-oriented approach rather than an improvement/accountability approach.

The advance organizers in case studies include the definition of the program, characterization of its geographic and organizational environment, the historical period in which it is to be examined, the program’s beneficiaries and their assessed needs, the program’s underlying logic of operation and productivity, and the key roles involved in the program. A case study program evaluation’s main purpose is to provide stakeholders and their audiences with an authoritative, in-depth, well-documented explication of the program.

The case study should be keyed to the questions of most interest to the evaluation’s main audiences. The evaluator must therefore identify and interact with the program’s stakeholders. Along the way stakeholders will be engaged in helping to plan the study and interpret findings. Ideally, the audiences include the program’s oversight body, administrators, staff, financial sponsors, beneficiaries, and potential adopters of the program.

Typical questions posed by some or all of the above audiences are, What is the program in concept and practice? How has it evolved over time? How does it actually operate to produce outcomes? What has it produced? What are the shortfalls and negative side effects? What are the positive side effects? In what ways and to what degrees do various stakeholders value the program? To what extent did the program effectively meet beneficiaries’ needs? What were the most important reasons for the program’s successes and failures? What are the program’s most important unresolved issues? How much has it cost? What are the costs per beneficiary, per year, etc.? What parts of the program have been successfully transported to other sites? How does this program compare with what might be called critical competitors? The above questions only illustrate the range of questions that a case study might address, since each case study will be tempered by the interests of the client and other audiences for the study and the evaluator’s interests.

To conduct effective case studies, evaluators need to employ a wide range of qualitative and quantitative methods. These may include analysis of archives; collection of artifacts, such as work samples; content analysis of program documents; both independent and participant observations; interviews; logical analysis of operations; focus groups; tests; questionnaires; rating scales; hearings; forums; and maintenance of a program database. Reports may incorporate in-depth descriptions and accounts of key historical trends; focus on critical incidents, photographs, maps, testimony, relevant news clippings, logic models, and cross-break tables; and summarize main conclusions. The case study report may include papers on key dimensions of the case, as determined with the audience, as well as an overall holistic presentation and assessment. Case study reports may involve audio and visual media as well as printed documents.

Case study methods have existed for many years and have been applied in such areas as clinical psychology, law, the medical profession, and social work. Pioneers in applying the method to program evaluation include Campbell (1975), Lincoln and Guba (1985), Platt (1992), Stake (1995), and Yin (1992).

The case study approach is highly conducive to program evaluation. It requires no controls of treatments and subjects and looks at programs as they naturally occur and evolve. It addresses accuracy issues by employing and triangulating multiple perspectives, methods, and information sources. It employs all relevant methods and information sources. It looks at programs within relevant contexts and describes contextual influences on the program. It looks at programs holistically and in depth. It examines the program’s internal workings and how it produces outcomes. It includes clear procedures for analyzing qualitative information. It can be tailored to focus on the audience’s most important questions. It can be done retrospectively or in real time. It can be reported to meet given deadlines and subsequently updated based on further developments.

The main limitation of the approach is that some evaluators may mistake its openness and lack of controls as an excuse for approaching it haphazardly and bypassing steps to assure that findings and interpretations possess rigor as well as relevance. Also, because of a preoccupation with descriptive information, the case study evaluator may not collect sufficient judgmental information to permit a broad-based assessment of a program’s merit and worth. Users of this approach might slight quantitative analysis in favor of qualitative analysis. By trying to produce a comprehensive description of a program, the case study evaluator may not produce timely feedback needed to help in program development. To overcome these potential pitfalls, evaluators using the case study approach should fully address the principles of sound evaluation as related to accuracy, utility, feasibility, and propriety.

Approach 13: Criticism and Connoisseurship

The connoisseur-based approach was developed pursuant to the methods of art criticism and literary criticism. This approach assumes that certain experts in a given substantive area are capable of in-depth analysis and evaluation that could not be done in other ways. Although a national survey of wine drinkers could produce information concerning their overall preferences for types of wines and particular vineyards, it would not provide the detailed, creditable judgments of the qualities of particular wines that might be derived from a single connoisseur who has devoted a professional lifetime to the study and grading of wines and whose judgments are highly and widely respected.

The advance organizer for the connoisseur-based study is the evaluator’s special expertise and sensitivities. The study’s purpose is to describe, critically appraise, and illuminate a particular program’s merits. The evaluation questions addressed by the connoisseur-based evaluation are determined by expert evaluators—the critics and authorities who have undertaken the evaluation. Among the major questions they can be expected to ask are, What are the program’s essence and salient characteristics? What merits and demerits distinguish the particular program from others of the same general kind?

The methodology of connoisseurship includes the critics’ systematic use of their perceptual sensitivities, past experiences, refined insights, and abilities to communicate their assessments. The evaluator’s judgments are conveyed in vivid terms to help the audience appreciate and understand all of the program’s nuances.

Eisner (1975, 1983) has pioneered this strategy in education.6 A dozen or more of Eisner’s students have conducted research and development on the connoisseurship approach, e.g., Vallance (1973) and Flinders and Eisner (1994).

This approach obviously depends on the qualifications of the particular expert chosen to do the program evaluation. The approach also requires an audience that has confidence in and is willing to accept and use the connoisseur’s report. The author of this paper would willingly accept and use any evaluation that Dr. Elliot Eisner agreed to present, but there are not many Eisners out there.

The main advantage of the connoisseur-based study is that it exploits the particular expertise and finely developed insights of persons who have devoted much time and effort to the study of a precise area. They can provide an array of detailed information that the audience can then use to form a more insightful analysis than otherwise might be possible. The approach’s disadvantage is that it is dependent on the expertise and qualifications of the particular expert doing the program evaluation, leaving room for much subjectivity.

Approach 14: Program Theory-Based Evaluation

Program evaluations based on program theory begin with either (1) a well-developed and validated theory of how programs of a certain type within similar settings operate to produce outcomes or (2) an initial stage to approximate such a theory within the context of a particular program evaluation. The former of these conditions is much more reflective of the implicit promises in a theory-based program evaluation, since the existence of a sound theory means that a substantial body of theoretical development has produced and tested a coherent set of conceptual, hypothetical, and pragmatic principles, plus associated instruments to guide inquiry in the particular area. Then, the theory can aid a program evaluator to decide what questions, indicators, and assumed linkages between and among program elements should be used to evaluate a program covered by the theory.

Some well-developed theories for use in evaluations exist, which gives this approach some measure of viability. For example, health education/behavior change programs are sometimes founded on validated theoretical frameworks, such as the Health Belief Model (Becker, 1974; Mullen, Hersey, & Iverson, 1987; Janz & Becker, 1984). Other examples are the PRECEDE-PROCEED Model for health promotion planning and evaluation (Green & Kreuter, 1991), Bandura’s (1977) Social Cognitive Theory, the Stages of Change Theory by Prochaska and DiClemente (1992), and Peters and Waterman’s (1982) theory of successful organizations. When such frameworks exist, their use probably can enhance a program’s effectiveness and provide a structure for validly evaluating the program’s functioning. Unfortunately, however, few program areas are buttressed by well-articulated and tested theories.

Thus, most theory-based evaluations begin by setting out to develop a theory that appropriately could be used to guide the particular program evaluation. As will be discussed later in this characterization, such ad hoc theory development efforts and their linkage to program evaluations are problematic. In any case, let us look at what the theory-based evaluator attempts to achieve.

The point of the theory development or selection effort is to identify advance organizers to guide the evaluation. Essentially, these are the mechanisms by which program activities are understood to produce or contribute to program outcomes, along with the appropriate description of context, specification of independent and dependent variables, and portrayal of key linkages. The main purposes of the theory-based program evaluation are to determine the extent to which the program of interest is theoretically sound, to understand why it is succeeding or failing, and to provide direction for program improvement.

Questions for the program evaluation are derived from the guiding theory. Example questions include, Is the program grounded in an appropriate, well-articulated, and validated theory? Is the employed theory up to date and reflective of recent research? Are the program’s targeted beneficiaries, design, operation, and intended outcomes consistent with the guiding theory? How well does the program address and serve the full range of pertinent needs of the targeted beneficiaries? If the program is consistent with the guiding theory, are the expected results being achieved? Are program inputs and operations producing outcomes in the ways the theory predicts? What changes in the program’s design or implementation might produce better outcomes? What elements of the program are essential for successful replication? Overall, was the program theoretically sound, did it operate in accordance with an appropriate theory, did it produce the expected outcomes, were the hypothesized causal linkages confirmed, is the program worthy of continuation and/or dissemination, and what program features are essential for successful replication?

The nature of these questions suggests that the success of the theory-based approach is dependent on a foundation of sound theory development and validation. This, of course, entails sound conceptualization of at least a context-dependent theory, formulation and rigorous testing of hypotheses derived from the theory, development of guidelines for practical implementation of the theory based on extensive field trials, and independent assessment of the theory. Unfortunately, not many program areas in education and the social sciences are grounded in sound theories. Moreover, evaluators wanting to employ a theory-based evaluation often find it infeasible to conduct the full range of theory development and validation steps and still to get the evaluation done on time. Thus, in claiming to conduct a theory-based evaluation, evaluators often seem to promise much more than they can deliver.

The main procedure typically used in these “theory-based program evaluations” is a model of the program’s logic. This may be a detailed flowchart of how inputs are thought to be processed to produce intended outcomes. It may also be a grounded theory like those advocated by Glaser and Strauss (1967). The network analysis of the former approach is typically an armchair theorizing process involving the evaluators and persons who are supposed to know how the program is expected to operate and produce results. They discuss, scheme, discuss some more, network, discuss further, and finally produce networks in varying levels of detail of what is involved in making the program work and how the various elements are linked to produce the desired outcomes. The more demanding grounded theory requires a systematic, empirical process of observing events or analyzing materials drawn from operating programs followed by an extensive modeling process.

Pioneers in applying theory development procedures to program evaluation include Glaser and Strauss (1967) and Weiss (1972, 1995). Other developers of the approach are Bickman (1990), Chen (1990), and Rogers (in press).

In any program evaluation assignment, it is reasonable for the evaluator to examine the extent to which program plans and operations are grounded in an appropriate theory or model. Also, it can be useful to engage in a modicum of effort to network the program and thereby seek out key variables and linkages. As noted previously, in the enviable but rare situation where a relevant, validated theory exists, the evaluator can beneficially apply it in structuring the evaluation and analyzing findings.

However, if a relevant, defensible theory of the program’s logic does not exist, evaluators need not develop one. In fact, if they attempt to do so they will incur many threats to their evaluation’s success. Rather than evaluating the program and its underlying logic, the evaluators might usurp the program staff’s responsibility for program design. They might do a poor job of theory development, given limitations on time and resources to develop and test an appropriate theory. They might incur the conflict of interest associated with having to evaluate the theory they developed. They might pass off an unvalidated model of the program as a theory, when it meets almost none of the requirements of a sound theory. They might bog down the evaluation in too much effort to develop a theory for the program. They might also focus attention on a theory developed early in a program and later discover that the program has evolved to be a quite different enterprise than what was theorized at the outset. In this case the initial theory could become a “Procrustean bed” for the program evaluation.

Overall, there really isn’t much to recommend theory-based program evaluation, since doing it right is usually not feasible and since failed or misrepresented attempts can be highly counterproductive. Nevertheless, modest attempts to model programs—labeled as such—can be useful for identifying measurement variables, so long as the evaluator doesn’t spend too much time on this and so long as the model is not considered as fixed or as a validated theory. Also, in the rare case where an appropriate theory already exists, the evaluator can make beneficial use of the theory to help structure and guide the evaluation and interpret the findings.

Approach 15: Mixed Methods Studies

In an attempt to resolve the longstanding debate about whether program evaluations should employ quantitative or qualitative methods, some authors have proposed that evaluators should regularly combine these methods in given program evaluations (for example, see the National Science Foundation’s 1997 User-Friendly Handbook for Mixed Method Evaluations). Such recommendations, along with practical guidelines and illustrations, are no doubt useful to many program staff members and to evaluators. But in the main, the recommendation for a mixed method approach only highlights a large body of longstanding practice of mixed-methods program evaluation rather than proposing a new approach. All seven approaches discussed in the remainder of this section of the paper employ both qualitative and quantitative methods. What sets them apart from the mixed method approach is that their first considerations are not the methods to be employed but either the assessment of value or the social mission to be served. The mixed methods approach is included in this section on questions/methods approaches, because it is preoccupied with using multiple methods rather than using whatever methods are needed to comprehensively assess a program’s merit and worth. As with the other approaches in this section, the mixed methods approach may or may not fully assess a program’s value; thus, it is classified as a quasi-evaluation approach.

The advance organizers of the mixed methods approach are formative and summative evaluations, qualitative and quantitative methods, and intra-case or cross-case analysis. Formative evaluations are employed to examine a program’s development and assist in improving its structure and implementation. Summative evaluations basically look at whether objectives were achieved, but may look for a broader array of outcomes. Qualitative and quantitative methods are employed in combination to assure depth, scope, and dependability of findings. This approach also applies to carefully selected single programs or to comparisons of alternative programs.

The basic purposes of the mixed method approach are to provide direction for improving programs as they are evolving and to assess their effectiveness after they have had time to produce results. Use of both quantitative and qualitative methods is intended to assure dependable feedback on a wide range of questions; depth of understanding of particular programs; a holistic perspective; and enhancement of the validity, reliability, and usefulness of the full set of findings. Investigators look to quantitative methods for standardized, replicable findings on large data sets. They look to qualitative methods for elucidation of the program’s cultural context, dynamics, meaningful patterns and themes, deviant cases, diverse impacts on individuals as well as groups, etc. Qualitative reporting methods are applied to bring the findings to life, making them clear, persuasive, and interesting. By using both quantitative and qualitative methods, the evaluator secures cross-checks on different subsets of findings and thereby instills greater stakeholder confidence in the overall findings.
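The cross-checking described above can be illustrated with a small sketch. It assumes, purely for illustration, that an evaluator has survey ratings (quantitative) and analyst-coded interview excerpts (qualitative) for the same sites. The data, the cutoff values, and the function name are invented and would need to be justified in any actual study. The sketch flags sites where the two sources diverge; it does not say which source is right, and a divergence is a prompt for further inquiry, not a finding.

```python
from statistics import mean

# Invented illustrative data, not drawn from any real evaluation.
survey_ratings = {  # quantitative: 1-5 satisfaction ratings per site
    "Site A": [4, 5, 4, 4],
    "Site B": [2, 3, 2, 2],
    "Site C": [4, 4, 5, 5],
}
interview_codes = {  # qualitative: analyst-assigned code per interview excerpt
    "Site A": ["positive", "positive", "negative", "positive"],
    "Site B": ["negative", "negative", "positive", "negative"],
    "Site C": ["negative", "negative", "negative", "positive"],
}


def cross_check(ratings, codes, rating_cutoff=3.5, positive_cutoff=0.5):
    """Label each site 'convergent' or 'divergent' across the two sources.

    The cutoffs are arbitrary assumptions chosen for this sketch.
    """
    report = {}
    for site in sorted(ratings):
        quant_favorable = mean(ratings[site]) >= rating_cutoff
        share_positive = codes[site].count("positive") / len(codes[site])
        qual_favorable = share_positive >= positive_cutoff
        agree = quant_favorable == qual_favorable
        report[site] = "convergent" if agree else "divergent"
    return report


if __name__ == "__main__":
    for site, status in cross_check(survey_ratings, interview_codes).items():
        print(site, status)
```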

The sources of evaluation questions are the program’s goals, plans, and stakeholders. The stakeholders often include skeptical as well as supportive audiences. Among the important stakeholders are program administrators and staff, policy boards, financial sponsors, beneficiaries, taxpayers, and program area experts.

The approach may pursue a wide range of questions. Examples of formative evaluation questions are

  • To what extent do program activities follow the program plan, time line, and budget?
  • To what extent is the program achieving its goals?
  • What problems in design or implementation need to be addressed?

Examples of summative evaluation questions are

  • To what extent did the program achieve its goals?
  • Was the program appropriately effective for all beneficiaries?
  • What interesting stories emerged?
  • What are program stakeholders’ judgments of program operations, processes, and outcomes?
  • What were the important side effects?
  • Is the program sustainable and transportable?

The approach employs a wide range of methods. Among the quantitative methods employed are surveys using representative samples, both cohort and cross-sectional samples, norm-referenced tests, rating scales, quasi experiments, significance tests for main effects, and a posteriori statistical tests. The qualitative methods may include ethnography, document analysis, narrative analysis, purposive samples, single cases, participant observers, independent observers, key informants, advisory committees, structured and unstructured interviews, focus groups, case studies, study of outliers, diaries, logic models, grounded theory development, flow charts, decision trees, matrices, and performance assessments. Reports may include abstracts, executive summaries, full reports, oral briefings, conference presentations, and workshops. They should include a balance of narrative and numerical information.

Considering his book on service studies in higher education, Ralph Tyler (Tyler et al., 1932) was certainly a pioneer in the mixed method approach to program evaluation. Other authors who have written cogently on the mixed methods approach are Guba and Lincoln (1981), Kidder and Fine (1987), Lincoln and Guba (1985), Miron (1998), Patton (1990), and Schatzman and Strauss (1973).

Basically, it is almost always appropriate to consider using a mixed methods approach. Certainly, the evaluator should take advantage of opportunities to obtain any and all potentially available information that is relevant to assessing a program’s merit and worth. Sometimes a study can be mainly or only qualitative or quantitative, but usually such studies would be strengthened by including both types of information. The key point is to choose methods because they can effectively address the study’s questions, not because they are either qualitative or quantitative.

Key advantages of using both qualitative and quantitative methods are that they complement each other in ways that are important to the evaluation’s audiences. Information from quantitative methods tends to be standardized, efficient, amenable to standard tests of reliability, easily summarized and analyzed, and accepted as “hard” data. Information from qualitative approaches adds depth; can be delivered in interesting, story-like presentations; and provides a means to explore and understand the more superficial quantitative findings. Using both types of methods affords important cross-checks on findings.

The main pitfall in pursuing the mixed methods approach is using multiple methods because this is the popular thing to do rather than because the selected methods best respond to the evaluation questions. Moreover, sometimes evaluators let the combination of methods compensate for a lack of rigor in applying them. Also, using a mixed methods approach can produce a schizophrenic evaluation if the investigator uncritically mixes positivistic and postmodern paradigms. Along this line, quantitative and qualitative methods are derived from different theoretical approaches to inquiry and reflect different conceptions of knowledge; and many evaluators do not possess the requisite foundational knowledge in both the sciences and humanities to effectively combine quantitative and qualitative methods. The approaches in the remainder of this paper place proper emphasis on mixed methods, making choice of the methods subservient to the approach’s dominant philosophy and to the particular evaluation questions to be addressed.

The mixed methods approach to evaluation concludes this paper’s discussion of the questions/methods approaches to evaluation. These 13 approaches tend to concentrate on selected questions and methods and thus may or may not fully address an evaluation’s fundamental requirement to assess a program’s merit and worth. The array of these approaches suggests that the field has advanced considerably since the 1950s when program evaluations were rare and mainly used approaches grounded in behavioral objectives, standardized tests, and/or accreditation visits.

Tables 1 through 6 summarize the similarities and differences between the models in relationship to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses.

Table 1: Comparison of the 13 Quasi-Evaluation Approaches on Most Common ADVANCE ORGANIZERS
Advance Organizers Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
Program content/definition U U
Program rationale U
Context U
Treatments U
Time period U
Beneficiaries U
Comparison groups U
Norm groups U
Assessed needs U
Problem statements U
Objectives U U U U
Independent/dependent U U
Indicators/criteria U U
Life skills U
Performance tasks U
Questions/hypotheses/ causal factors U U
Policy issues U
Tests in use U U
Formative & summative evaluation U
Qualitative & quantitative methods U
Program activities/milestones U
Employee roles & responsibilities U U
Costs U
Evaluator expertise & sensitivities U
Intra-case/cross-case analysis U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.
Table 2: Comparison of the 13 Quasi-Evaluation Approaches on Primary EVALUATION PURPOSES
Evaluation Purposes Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
Determine whether program objectives were achieved U U U U U
Provide constituents with an accurate accounting of results U U U U U
Assure that results are positive U
Assess learning gains U
Pinpoint responsibility for good & bad outcomes U U U
Compare students’ test scores to norms U
Compare students’ test performance to standards U U U
Diagnose program shortcomings U U U U U
Compare performance of competing programs U U U U U
Examine achievement trends U U
Inform policymaking U U U U
Direction for program improvement U U U U
Ensure standardization of outcome measures U U
Determine cause and effect relationships in programs U U
Inform management decisions & actions U
Assess investments and payoffs U
Provide balanced information on strengths & weaknesses U U
Explicate & illuminate a program U U
Describe & critically appraise a program U
Assess a program’s theoretical soundness U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Questions/Methods-Oriented Evaluation Approaches

Table 3: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION QUESTIONS
Evaluation Questions Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
To what extent was each program objective achieved? U U U
Did the program effectively discharge its responsibilities? U U
Did tested performance meet or exceed pertinent norms? U
Did tested performance meet or exceed standards? U U
Where does a group’s tested performance rank compared with other groups? U U
Is a group’s present performance better than past performance? U U U
What sectors of a system are performing best and poorest? U
Where are the shortfalls in specific curricular areas? U
At what grade levels are the strengths & shortfalls? U
What value is being added by particular programs? U
To what extent can students effectively speak, write, figure, analyze, lead, work cooperatively, & solve problems? U
What are a program’s effects on outcomes? U U
Are program activities being implemented according to schedule, budget, & expected results? U
What is the program’s return on investment? U
Is the program sustainable & transportable? U U U
Is the program worthy of continuation and/or dissemination? U U U U U
Is the program as good or better than others that address the same objectives? U U U
What is the program in concept & practice? U U
How has the program evolved over time? U
How does the program produce outcomes? U U
What has the program produced? U U
What are the program’s shortfalls & negative side effects? U U
What are the program’s positive side effects? U U
How do various stakeholders value the program? U U
Did the program meet all the beneficiaries’ needs? U U U
What were the most important reasons for the program’s success or failure? U U
What are the program’s most important unresolved issues? U
How much did the program cost? U U
What were the costs per beneficiary, per year, etc.? U U
What parts of the program were successfully transported to other sites? U
What are the program’s essence & salient characteristics? U U
What merits & demerits distinguish the program from similar programs? U U
Is the program grounded in a validated theory? U
Are program operations consistent with the guiding theory? U
Were hypothesized causal linkages confirmed? U U
What changes in the program’s design or implementation might produce better outcomes? U U U U U U U
What program features are essential for successful replication? U U U
What interesting stories emerged? U U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 4: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION METHODS
Evaluation Methods Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
Operational objectives U U
Criterion-referenced test U U U U
Performance contracting U
Program Planning & Budgeting System U U
Program Evaluation & Review Technique U
Management by objectives U U U
Staff progress reports U
Financial reports & audits U
Zero Based Budgeting U
Cost analysis, cost-effectiveness analysis, & benefit-cost analysis U
Mandated “program drivers” & indicators U
Input, process, output databases U U
Independent goal achievement auditors U U
Procedural compliance audits U
Peer review U
Merit pay for individual and/or organizations U
Collective bargaining agreements U
Trial proceedings U
Mandated testing U U
Institutional report cards U
Self-studies U
Site visits by experts U
Program audits U
Standardized testing U U U U
Performance measures U U U
Computerized or other database U U U
Hierarchical mixed model analysis U
Policy analysis U
Experimental & quasi-experimental designs U U
Study of outliers U U U
System analysis U
Analysis of archives U U
Collection of artifacts U U
Log diaries U
Content analysis U U
Independent & participant observers U U
Key informants U U
Advisory committees U
Interviews U U
Operational analysis U
Focus group U U
Questionnaires U U
Rating scales U U
Hearings & forums U U
In-depth descriptions U
Photographs U
Critical incidents U
Testimony U U U
Flow charts U
Decision trees U
Logic models U U U
Grounded theory U U
News clippings analysis U U
Cross-break tables U U U U
Expert critics U U U U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 5: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent STRENGTHS
Strengths Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
Common sense appeal U U U U U U U
Widely known & applied U U U U
Employs operational objectives U
Employs the technology of testing U U U U U U
Efficient use of standardized tests U U
Popular among constituents & politicians U U U U
Focus on improving public services U
Can focus on audience’s most important questions U U U U
Defines obligations of service providers U
Requires production of and reporting on positive outcomes U
Seeks to improve services through competition U U
Efficient means of data collection U U U
Stresses validity & reliability U U U U
Triangulates findings from multiple sources U U U
Uses institutionalized database U
Monitors progress on each student U U
Emphasizes service to every student U
Hierarchical analysis of achievement U
Conducive to policy analysis U U
Employs trend analysis U
Strong provision for analyzing qualitative information U U U
Rejects use of artificial cut scores U U
Considers student background by using students as their own controls U
Considers contextual influences U U U
Uses authentic measures U U
Eliminates guessing U
Reinforces life skills U
Focuses on outcomes U U U U U U U
Focuses on a program’s strengths & weaknesses U U U
Determines cause & effects U
Examines program’s internal workings & how it produces outcomes U U
Guides program management U
Helps keep programs on track U
Guides broad study & improvement of program processes & outcomes U U
Can be done retrospectively or in real time U U U U
Documents costs of program inputs U
Maintains a financial history for the program U
Contrasts program alternatives on both costs & outcomes U
Employs rules of evidence U
Requires no controls of treatments & participants U U
Examines programs as they naturally occur U U
Examines programs holistically & in depth U U
Engages experts to render refined descriptions & judgements U U
Yields in-depth, refined, effectively communicated analysis U U
Employs all relevant information sources & methods U U
Stresses complementarity of qualitative & quantitative methods U U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 6: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent WEAKNESSES/LIMITATIONS
Weaknesses/Limitations Evaluation Approaches (by identification number)*
3 4 5 6 7 8 9 10 11 12 13 14 15
May credit unworthy objectives U
May define a program’s success in terms that are too narrow and mechanical and not attuned to beneficiaries’ various needs U
May employ only lower-order learning objectives U U U
Relies almost exclusively on multiple choice test data U U
May indicate mainly socioeconomic status, not quality of teaching & learning U
May reinforce & overemphasize multiple choice test taking ability to the exclusion of writing, speaking, etc. U U
May poorly test what teachers teach U U
Yields mainly terminal information that lacks utility for program improvement U U
Provides data only on student outcomes U