WHAT THE PROGRAM EVALUATION STANDARDS
SAY ABOUT
DESIGNING EVALUATIONS


The Joint Committee on Standards for Educational Evaluation (1994). The Program Evaluation Standards. Thousand Oaks, CA: Sage Publications, Inc. All rights reserved. Approved by the American National Standards Institute as an American national standard. Approval date: March 15, 1994.



This document was prepared by the Joint Committee on Standards for Educational Evaluation as a derivative of The Program Evaluation Standards (Thousand Oaks, CA: Sage Publications, Inc., 1994). It is a compilation of advice from hundreds of practitioners in education and evaluation regarding one function in the program evaluation process. Other derivative documents related to evaluation functions are available. The evaluation functions covered in this series are:



1. Deciding Whether to Evaluate

2. Defining the Evaluation Problem

3. Designing the Evaluation

4. Collecting Information

5. Analyzing Information

6. Reporting the Evaluation

7. Budgeting the Evaluation

8. Contracting for Evaluation

9. Managing the Evaluation

10. Staffing the Evaluation



These derivative documents are available at cost from:



The Joint Committee on Standards for Educational Evaluation

The Evaluation Center

Western Michigan University

Kalamazoo, Michigan 49008



Partial support for the development of these derivative documents came from the National Science Foundation under award number SED-9255369 to Westat, Inc. The Joint Committee takes full responsibility for their content.



DESIGNING EVALUATIONS



Stakeholder Identification



1. Identify persons in leadership roles and ask them to identify other stakeholders in the evaluation. Contact representatives of identified stakeholder groups to learn how they view the evaluation's importance, how they would like to use its results, and what particular information would be useful. Where necessary, help them to develop realistic expectations that take into account the methodological, financial, and political constraints on the evaluation. (U1)



2. Use stakeholders to identify and contact other stakeholders (U1)



3. Reach an understanding with the client concerning the relative importance of the potential stakeholders and the information they desire, and plan and implement the data collection and the reporting activities accordingly (U1)



4. Throughout the evaluation, be alert to identifying additional stakeholders that should be served and, within the limits of time and resources, maintain some flexibility and capability to respond to their needs (U1)



5. Involve clients and other stakeholders directly in designing and conducting the evaluation (U1)



6. Be certain not to exclude any stakeholder because of gender, ethnicity, or language background (U1)



7. Do not allow clients to inappropriately restrict the evaluator's contact with other involved or affected stakeholders (U1)



8. Do not attempt to address all stakeholder information needs when, in reality, they cannot all be addressed (U1)



9. Do not assume that persons in leadership or decision making roles are the only, or most important, stakeholders (U1)



10. Avoid identifying so many stakeholders that it becomes impossible to proceed (U1)



11. Do not fail to distinguish between the client and other stakeholders (U1)



Evaluator Credibility



12. Stay abreast of social and political forces associated with the evaluation, especially those linked to race, gender, socioeconomic status, and language and cultural differences, and use this knowledge when designing and conducting the evaluation (U2)



13. Ensure that both the work plan and the composition of the evaluation team are responsive to the concerns of key stakeholders (U2)



14. Consider having the evaluation plan reviewed and the evaluation work audited by another evaluator whose credentials are acceptable to the client (U2)



15. Be clear in describing the evaluation plan to various stakeholders and demonstrate that the plan is realistic and technically sound (U2)



16. Determine key audiences' needs for information on the progress of the evaluation and keep those audiences informed through such means as newsletters, progress reports, telephone calls, memoranda, press releases, and meetings (U2)



17. Include in evaluation proposals a statement describing the evaluator's qualifications relevant to the program being evaluated (U2)



18. Seek evaluators experienced in the setting of the evaluation (U2)



19. Do not overinvest resources to achieve credibility and acceptance (U2)



20. Do not assume that the evaluator's approach to evaluation is acceptable to the client (U2)



21. Avoid turning over the evaluation to an inexperienced student or staff assistant (U2)



Information Scope and Selection



22. Understand client requirements for the evaluation (U3)



23. Interview representatives of major stakeholders to gain an understanding of their different and perhaps conflicting points of view and of their need for information (U3)



24. Avoid giving the impression that all questions will be answered (U3)



25. Help stakeholders develop realistic expectations in light of available financial, time, and personnel resources (U3)



26. Have the client rank potential audiences in order of importance and work with representatives of each stakeholder group to rank topics in order of importance to that audience (U3)



27. Work with the client to collate the ordered topics from each audience, to remove items at the bottom of the list, and to add items that the evaluator believes to be important even though not requested (U3)



28. Allow flexibility for adding questions and including unanticipated information that may arise during the evaluation (U3)



29. Distribute the entire evaluation effort (data collection, analysis, interpretation, and reporting) over the final list of topics, placing the most effort on high-ranked items (U3)



30. Consider the tradeoffs between comprehensiveness and selectivity at every stage of the evaluation: developing the plan; setting the budget; and collecting, analyzing, interpreting, and reporting information (U3)



31. Give voice to multiple stakeholder groups in the process of selecting priority evaluation questions (U3)



32. Do not collect information because it is convenient (e.g., because instruments already exist), rather than because it is necessary (U3)



33. Delimit the scope of the evaluation, i.e., state the questions that will be answered and the purpose of the evaluation, and keep them in mind at every stage of the evaluation (U3)



34. Do not collect information that is extraneous to the central purpose of the evaluation (U3)
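Items 26 through 29 describe ranking topics and then placing the most effort on high-ranked items. The following is a minimal sketch of one way to turn a ranked list into an effort plan. The weighting rule (the topic in position i of n gets weight n - i), the function name allocate_effort, and the example topics and hours are all illustrative assumptions; the Standards do not prescribe any particular formula.

```python
def allocate_effort(ranked_topics, total_hours):
    """Split total_hours across topics, favoring higher-ranked topics.

    ranked_topics: topic names ordered from most to least important.
    Assumed rule (not from the Standards): the topic at 0-based
    position i out of n receives weight n - i, and hours are shared
    in proportion to weight.
    """
    n = len(ranked_topics)
    if n == 0:
        return {}
    weights = [n - i for i in range(n)]
    total_weight = sum(weights)
    return {
        topic: total_hours * weight / total_weight
        for topic, weight in zip(ranked_topics, weights)
    }


# Hypothetical final topic list agreed with the client, most important first.
plan = allocate_effort(["student outcomes", "staff training", "cost"], 120)
for topic, hours in plan.items():
    print(f"{topic}: {hours:.0f} hours")
```

A proportional rule like this is only a starting point for discussion with the client; items 28 and 30 suggest holding some effort in reserve for questions that arise during the evaluation.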



Values Identification



35. Consider alternative bases for interpreting findings: e.g., program objectives, procedural specifications, laws and regulations, institutional goals, democratic ideals, social norms, performance by a comparison group, assessed needs of a consumer group, expected performance of the sample group, professional standards, and reported judgments by various reference groups (U4)



36. Consider who will make interpretations: e.g., the evaluators, the client, the various stakeholders, a regulatory group, or some combination of these (U4)



37. Consider alternative techniques that might be used to assign value meanings to collected information: e.g., having different teams write advocacy reports; conducting a jury or administrative trial of the program being evaluated; or seeking convergence through a Delphi study (U4)



38. Do not assume that evaluations can be objective in the sense of being devoid of value judgment (U4)



39. Do not design the data collection and analysis procedures without considering what criteria, such as performance by a comparison group or performance in terms of a predetermined standard, will be needed to interpret the findings (U4)



40. Do not concentrate so heavily on clarifying values that insufficient time and effort are devoted to collecting and analyzing the information needed to make value judgments (U4)



41. Acknowledge that decision rules often are arbitrary and therefore subject to debate (U4)



Practical Procedures



42. Ensure the availability of qualified personnel to complete the evaluation as proposed, including training for any personnel who need it (F1)



43. Choose procedures that can be carried out with reasonable effort and that are compatible with the skill level of personnel available for the study (F1)



44. Select procedures in light of known time constraints and the availability of participants or respondents (F1)



45. Whenever appropriate, make evaluation activities a part of routine events (F1)



46. Develop alternative procedures in anticipation of potential problems and retain sufficient flexibility in the plan and budget so that unanticipated problems can be addressed as they occur (F1)



47. Check with the clients about the viability of the schedule for completing the evaluation and the practicality of various data collection procedures before finalizing the data collection plan (F1)



48. Try out procedures and instruments in a pilot test to determine their practicality and their time requirements (F1)



49. Avoid choosing a data collection and analysis plan from a research methods textbook or other general guide without considering whether the plan can be carried out in the given setting (F1)



50. Do not fail to weigh practicality against accuracy--if circumstances will inhibit the collection of valid and reliable data, work with the client to remove or alter these circumstances. If this proves unsuccessful, seriously consider using other procedures or not doing the evaluation. (F1)



51. Avoid disrupting program activities in an attempt to collect information (F1)



Formal Agreements



52. Include the evaluation design in any formal agreements (P2)



53. Do not expect participation in the evaluation by persons who have not previously agreed to do so (P2)



54. Do not act unilaterally in a matter where it has been agreed that evaluator/client collaboration would be required for decisions (P2)



55. Do not change the design without amending formal agreements (P2)



56. Do not adhere so rigidly to contracts that changes dictated by common sense are not made or are unduly delayed (P2)



57. Do not develop contracts that are so detailed that they stifle the creativity of the evaluation team or that detract from conducting the evaluation (P2)



Complete and Fair Assessments



58. Design the evaluation to record both strengths and weaknesses of the program being evaluated (P5)



Program Documentation



59. Ask the client and the other stakeholders to describe--orally and, if possible, in writing--the intended and the actual program with reference to such characteristics as personnel, cost, procedures, location, facilities, setting, activities, objectives, nature of participation, and potential side effects (A1)



60. Collect and analyze for differences and similarities available descriptions of the program, including proposals, public relations reports, slide-tape presentations, and staff progress and final reports (A1)



61. Engage independent observers to describe the program if time and budget permit (A1)



62. Set aside time at the beginning of the evaluation to observe the program and the staff and participants who are involved (A1)

63. As part of the ongoing evaluation process, maintain up-to-date descriptions of the program from different information sources (e.g., participant observers, minutes of staff meetings, interviews of participants, and progress reports), giving particular attention to changes in the description (A1)



64. Consider developing separate descriptions for each aspect of the program being studied (A1)



65. Do not rely solely on the client's or the funding proposal's description of the program (A1)



66. Do not gloss over a description of the program by saying, for example, that "the treatment was all that occurred between time 1 and 2," without describing the actual events (A1)



67. Do not concentrate so much on describing the program that insufficient time is available for assessing its strengths and weaknesses (A1)



68. Do not assume that the program is uniformly implemented as intended (A1)



Described Purposes and Procedures



69. Discuss thoroughly and record the client's initial conceptions of the purposes of the evaluation, and the intended uses of the findings from the evaluation (A3)



70. Discuss thoroughly and record the client's initial conceptions of how the evaluation's purposes will be achieved (A3)



71. Keep a copy of the evaluation plan and the evaluation contract (if one was negotiated) (A3)



72. Reach a clear understanding with the client of major changes in evaluation purposes and procedures as the changes are made (A3)



73. Record any major changes in purposes and procedures and the date on which they occurred (A3)



74. Plan to describe purposes and procedures at the conclusion of the evaluation in both a summary report (executive report) and a full technical report, noting deviations from original plans (A3)



75. Engage independent evaluators to monitor the purposes and procedures of the evaluation, and evaluate them whenever feasible, especially in the case of large-scale evaluations (A3, A12)

76. Allow for the adjustments in purpose and procedure that may be needed during the evaluation (A3)



Described Information Sources



77. Use previously collected information that is pertinent to the evaluation once its soundness has been determined (A4)



78. Do not assume that information based on personal interviews, testimony, observations, or document analysis contains distortions and hence is not worthy of consideration. Conversely, do not assume that "hard" quantitative data lack distortion and hence should be weighted heavily in the evaluation. (A4)



79. For each data collection activity describe and justify the sources of information to be used in the study (A4)



80. Assess the adequacy of the information sources as part of the technical documentation of the evaluation, acknowledging limitations that may exist (A4)



Valid Information



81. Check information collection procedures against the objectives and content of the program being evaluated to determine the degree of fit or congruence between them. This check should be informed at least in part by personnel responsible for the program and its operation and by representatives of important stakeholder groups. (A5)



82. Consider the Standards for Educational and Psychological Testing and other available sets of standards, and apply them when making decisions about educational and psychological tests to be used in the evaluation (A5)



83. Consider validity evidence from other similar evaluations in which proposed procedures were used (A5)



84. Ensure that the individuals who will administer or use a particular procedure are qualified and adequately prepared (in terms of knowledge, training, and practice) to do so (A5)



85. For newly developed procedures, present the rationale for the extent of validity claimed. Point out that such procedures are exploratory, and that results obtained from them must be interpreted cautiously and with a clear understanding of the limited validity evidence. Further, proper account must be taken of the context, the characteristics of the subjects or groups with whom the procedure was used, and the qualifications and training, if needed, of the individuals who administered or used the procedure. Use multiple measures to help clarify the validity of the inferences drawn from the information yielded by the new procedure. (A5)



86. Use multiple procedures to obtain a more comprehensive assessment, but do so in as nondisruptive and parsimonious a manner as possible. Often it is desirable to employ nonreactive procedures, and to assess samples instead of populations of respondents. Use existing records, if relevant. (A5)



87. Assess the comprehensiveness of the information provided by the procedures as a set in relation to the information needed to answer the set of evaluation questions (A5)



88. Consider respondent characteristics, such as reading ability, language proficiency, or physical handicaps, that may affect the validity of evaluation results (A5)



89. Do not base important decisions on only one procedure or operational definition of a critical variable (A5)



90. Do not expect that procedures yielding valid inferences can be constructed or developed quickly and easily (A5)



91. Use existing procedures yielding valid inferences when they are available (A5)



92. Ensure that personnel responsible for collecting information are adequately qualified and prepared to perform their assigned tasks (A5)



93. Ensure that observations and descriptions of a process or event are adequately conducted and completed (A5)



94. Allow qualified stakeholders the opportunity to review an instrument or procedure prior to its use (A5)



Reliable Information



95. Whenever possible, evaluators should choose information gathering procedures that have, in the past, yielded data and information with acceptable reliability for their intended uses; however, the generalizability of previously favorable reliability results may not be simply assumed. Reliability information should be collected that is directly relevant to the groups and ways in which the information gathering procedures will be used in the evaluation. (A6)



96. For newly developed information gathering procedures, present the rationale for the type and extent of reliability claimed. Proper account must be taken of the content or behavior assessed by the procedure, of the ways in which the procedures were administered to the subjects or groups, and of the heterogeneity of the persons in terms of the characteristics being measured or observed, for these factors all influence reliability. (A6)



97. Discuss developing propositions, interpretations, and conclusions with an impartial peer to help clarify one's own posture and values and their role in the inquiry (A6)



98. In the case of open-ended instruments and procedures, check the consistency of scoring, categorization, and coding by having two or more qualified persons independently analyze the same set of information or by having an outside auditor verify that the data have been consistently analyzed (A6)



99. Provide adequate training to scorers and analysts to ensure that they are sensitized to the kinds of mistakes they are likely to make, and know the procedures to avoid these mistakes (A6)



100. Do not interpret evidence of one type of reliability (e.g., internal consistency, stability over time, interobserver agreement) as evidence of another type; different reliabilities reflect different sources of measurement error, which influence the interpretation of information in different ways (A6)



101. Do not take at face value the reliability evidence reported for a published instrument or procedure without considering the likely effects of differences between the setting and sample of the reported reliability study and those of the evaluation (A6)



102. Take into account the fact that the reliability of the scores provided by an instrument or procedure may fluctuate depending on how, when, and to whom the instrument or procedure is administered (A6)



103. Do not assume that because the reliability of individual scores for an instrument is low, the reliability of mean scores for a group will also be low (A6)



104. Do not interpret reliability coefficients for measures of continuous variables as evidence of the reliability of dichotomous decisions (e.g., pass-fail; mastery-nonmastery) based on these measures (A6)



105. Recognize that the reliability of a set of difference scores is typically less than the reliability of either of the two sets of scores used to compute the differences (A6)



106. Do not use scores with low reliabilities as influential outcome information (A6)



107. Do not assume that because reliability is high, validity is also high (A6)



108. Do not assume that the observations of one evaluator are not affected by the evaluator's perspective, training, or previous experience (A6)
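Two of the points above lend themselves to a short worked example: the consistency check between independent coders (item 98) and the reduced reliability of difference scores (item 105). The sketch below is illustrative only. Cohen's kappa is one common agreement statistic chosen here as an assumption; the Standards do not name a specific index. The difference-score formula is the classical test theory result under the simplifying assumption that the two sets of scores have equal variances. Function names and sample data are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance from each coder's category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y, assuming X and Y have equal variances.

    r_xx, r_yy: reliabilities of the two sets of scores.
    r_xy: correlation between the two sets of scores.
    """
    if r_xy >= 1:
        raise ValueError("r_xy must be less than 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical codes assigned by two coders to the same ten responses.
coder_1 = ["Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N"]
coder_2 = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "Y"]
kappa = cohens_kappa(coder_1, coder_2)

# Pretest and posttest each with reliability 0.8, correlated 0.6.
r_diff = difference_score_reliability(0.8, 0.8, 0.6)
print(f"kappa = {kappa:.2f}, difference-score reliability = {r_diff:.2f}")
```

In this made-up case the coders agree on 80 percent of the responses but kappa is about 0.6 once chance agreement is removed, and two measures that are each reliable at 0.8 yield difference scores with a reliability of about 0.5.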



Analysis of Quantitative Information



109. Choose analytic procedures that are appropriate to the evaluation questions and the nature of the data (A8)



110. Conduct multiple analyses of the data, as is usually warranted (A8)



111. Report potential weaknesses in the study design or data analysis and describe their possible influence on interpretations and conclusions (e.g., attrition, violation of assumptions) (A8)



112. Do not assume that significant statistical results are necessarily of practical significance (A8)



113. Do not assume that gain scores, matching, or analysis of covariance will necessarily provide an adequate adjustment for preexisting differences among groups (A8)



114. Use the correct unit of analysis when analyzing quantitative information (A8)



115. Do not use complex statistical techniques when the audience would be better served by the use of simpler analytical methods and graphs (A8)



116. Avoid emphasizing rigor at the expense of relevance, and vice versa (A8)



117. Not all evaluations need to use statistical analyses (A8)



118. Not all evaluations need to be comparative studies (A8)



119. Recognize and exploit the complementarity between qualitative and quantitative analyses, and recognize that interpretations and conclusions should be supported by both (A8)
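The caution in item 112 about statistical versus practical significance can be illustrated with a small calculation. The sketch below assumes two equal-sized groups with a common, known standard deviation and uses a two-sided z-test together with a standardized mean difference (Cohen's d) as the effect size. The function name, the choice of test, and all numbers are hypothetical and are not drawn from the Standards.

```python
import math
from statistics import NormalDist


def z_test_and_effect_size(mean_a, mean_b, sd, n_per_group):
    """Return (effect size d, two-sided p-value) for two equal-sized groups.

    Assumes a common known standard deviation sd, so the standard error
    of the mean difference is sd * sqrt(2 / n_per_group).
    """
    d = (mean_a - mean_b) / sd
    z = d * math.sqrt(n_per_group / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return d, p


# Same half-point difference on a scale with SD 10, two sample sizes.
d_large, p_large = z_test_and_effect_size(50.5, 50.0, 10.0, 20000)
d_small, p_small = z_test_and_effect_size(50.5, 50.0, 10.0, 50)
print(f"n = 20000 per group: d = {d_large:.2f}, p = {p_large:.2g}")
print(f"n = 50 per group:    d = {d_small:.2f}, p = {p_small:.2g}")
```

The effect size is the same in both runs (one twentieth of a standard deviation), yet only the very large sample produces a small p-value. Whether a difference of that size matters for the program is a judgment the significance test cannot make.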



Analysis of Qualitative Information



120. Choose an analytic procedure and method of summarization that is appropriate to the questions to be addressed in the study and to the nature of the qualitative information to be collected (A9)



121. Focus the analysis on clear questions of interest and define the boundaries of information to be examined, e.g., time period, funded activities, target student or other client population, and geographic location (A9)





122. Seek corroboration of qualitative evidence using independent methods and sources (A9)



123. Do not regard qualitative data analysis as relatively nonrigorous and as something that can be accomplished well enough on an intuitive basis without training, and do not choose information to reinforce preconceptions rather than to examine the validity of preconceptions or working hypotheses (A9)



124. Consider alternative interpretations of reality and the multiple value perspectives that may exist in an evaluation situation (A9)



125. Distinguish among different sources of qualitative information on such bases as credibility, degree of expertise, and degree of involvement (A9)





Justified Conclusions



126. Plan to generate, assess, and report plausible alternative explanations of the findings, and, where possible, indicate why these explanations should be discounted (A10)



127. Plan to solicit feedback from a variety of program participants about the credibility of interpretations, explanations, conclusions, and recommendations before finalizing reports. Plan to point out common misinterpretations and inappropriate inferences that may be drawn from the information collected. (A10)



128. Attend to possible side effects of the program in reaching conclusions about its effectiveness (A10)



Impartial Reporting



129. Reach agreement with the client during the initial stages of the evaluation about the steps to be taken to ensure the fairness of all reports (A11)



130. Clarify the nature of and authority for editing (A11)



131. Ensure the evaluation report includes perspectives independent of the perspectives of those whose work is being evaluated (A11)



132. Plan to seek out and report alternative, perhaps even conflicting, conclusions and recommendations (A11)



133. Strive to establish and maintain independence in reporting, using techniques such as adversary-advocacy reports, outside audits, or rotation of evaluation team members over various audience contacts (A11)



134. Do not assume that all parties in an evaluation are neutral (A11)



135. Avoid surrendering the authority to edit reports (A11)



136. Be involved in public presentations of the findings as the situation warrants (A11)



137. Do not become so isolated from the program developer that potentially useful information from the developer is not reported to the evaluator, and there is no good way for feedback to be transmitted from the evaluator to the program developer (A11)



Metaevaluation



138. Budget sufficient money and other resources to conduct appropriate formative and summative metaevaluations (A12)



139. Assign someone responsibility for documenting and assessing the program evaluation process and products (A12)



140. Consider asking a respected professional body to nominate someone to chair a team of external metaevaluators in large evaluations. Failing that, either (a) appoint a team and have it elect the chair, or (b) carefully and judiciously select as chair someone who will be competent and credible, and work with this individual to appoint other team members. (A12)



141. Determine and record the rules by which members of the metaevaluation team will reach a consensus and/or issue minority reports (A12)



142. Stipulate that any member of the metaevaluation team who does not fulfill contracted obligations can be dismissed at the discretion of the chair (A12)



143. Reserve final authority for editing the metaevaluation report to the metaevaluators (A12)



144. Determine and record which audiences will receive the metaevaluation reports and how the reports will be transmitted (A12)



145. Evaluate the instrumentation, data collection, data handling, coding, and analysis of the program evaluation to determine how carefully and effectively these steps were implemented (A12)



146. Expect that the metaevaluation itself will be subject to rebuttal and evaluation, and maintain a record of all metaevaluation steps, information, and analyses (A12)



147. Do not conduct only an internal metaevaluation when conflict of interest or other considerations clearly establish the need for an external metaevaluation (A12)



148. Do not assume that every program evaluation study requires a formal metaevaluation study (A12)