It is well known that while objective-type tests are easy to administer and score, creating quality objective items and choice options is very demanding, especially items that test good understanding in realistic situations. Thus the creative work of item construction and subsequent validation forms an important part of this project. There are several aspects to consider in devising the POSIT assessment tool, viz. inquiry approach, science content, type of items and assessment format. These are discussed below.
For any pedagogy-of-inquiry assessment tool to be effective, it must be based upon a widely accepted view of what inquiry teaching is. We will use the view of inquiry presented in the two documents that have gained broad national acceptance, viz. National Science Education Standards (1996) and Science for All Americans (1990). Consequently, we have already developed a set of Inquiry-Item-Criteria (IIC) to guide item development and evaluation, consistent with the classroom inquiry features and variations in Table 1.
Inquiry-Item-Criteria. In abbreviated form, the criteria specify that inquiry-based science instruction proceeds such that learners:
a) Engage in scientifically oriented questions and explorations
b) Give priority to evidence in addressing questions
c) Formulate explanations from investigation and evidence
d) Connect explanations to scientific knowledge
e) Communicate and justify models and explanations.

For the development, evaluation, and validation of our assessment instrument, the documents above and the Inquiry-Item-Criteria will define the universe of admissible and non-admissible observations. Note that it is difficult to attain these criteria readily with presentation-type science instruction.
Planning and implementing successful inquiry-based learning in the science classroom is a task demanding a combination of good science content knowledge and good inquiry pedagogy knowledge – the latter not just in general terms but as applied to the specific topics being taught. In this regard an important feature of our POSIT tool for assessing science inquiry pedagogy is that the assessment items are posed not in generalities but based on specific science topic examples and real classroom scenarios. The science content must be grade appropriate, and for this we look to National Science Education Standards (1996) and Science for All Americans (1990). Item scenarios and questions will involve science content specified in standards for the K-8 grades.
For its intended purposes the assessment tool needs to be readily administered and scored, and reliable. Thus we have chosen an objective format with variations: every item will be written in three different but related formats, viz. Multiple-Choice (MC), Ranked Response (RR) and Constructed Response (CR).
Because we plan to develop multiple items and item types to address each component of the Inquiry-Item-Criteria, we will be able to compare item difficulties and discrimination, which will provide evidence of the validity of the assessments. As such the development of the assessment tool can be conceptualized as a multi-trait, multi-method (MTMM) type of approach advocated by Campbell & Fiske (1959; also see Gulek 1999). In our study, the ‘traits’ are Inquiry-Item-Criteria and the ‘methods’ are the three item formats. (See Appendix for the MTMM table). Results on the Ranked Response (RR) and Constructed Response (CR) formats will also help refine the foils for the Multiple-Choice (MC) format.
The instrument will also be adaptable to web-based use and will conform to The Student Evaluation Standards (Gullickson 2003) and the Standards for Educational and Psychological Testing (AERA 1999).
The model we will use for item development is based on the curriculum technique of Problem-Based Learning (PBL), which has been widely used in medical education (Albanese & Mitchell 1993; Peterson 1997) and more recently adopted in science teacher education (Dean 1999; Ngeow & Kong 2001; Wang et al 1999). PBL presents students with a practical teaching problem, often in the form of a realistic scenario or vignette. Our model, borrowing from PBL, will use realistic K-8 science teaching situations. Each item will begin with a brief classroom vignette followed by a question and set of response choices, for both the multiple-choice and ranked response formats. The responses might be possible evaluations of the teacher’s actions so far, or alternative suggestions for what the teacher should do next in the science lesson. In the short-answer format, students construct their own responses. (See Appendix Two for examples of items.)
Basing a pedagogy-of-inquiry assessment on PBL has several advantages. Firstly, the assessment is realistic and hence more authentic. It is built upon actual classroom occurrences, for which there are resources, e.g., Tippens et al 2002, in the National Standards, and in materials such as the Harvard-Smithsonian case studies in science education, as well as ideas generated from the teaching experiences of those on the item writing team. The scenario-based approach also complements texts that build on case studies for teaching methods.
Secondly, it is an assessment that does not lapse into measurement of rote memory, nor of generalities about inquiry. Each item specifically requires either application or evaluation (in terms of Bloom’s Taxonomy; Anderson & Krathwohl 2001) involving specific cases. Successful application and evaluation require that one understand inquiry and its use in science content areas at the appropriate grades.
Thirdly, because the assessment involves pedagogical approaches, items are easily adapted for instructional use. A reciprocal relation can be developed between PBL as instruction and PBL as assessment. Problem-solving application of knowledge is needed before students fully grasp an area. Once a set of problem-based items on inquiry pedagogy is available, it can be used to help students develop a usable understanding of the general principles they are learning. Here we draw on the role of worked example problems in mathematics and physics teaching. Our items are essentially ‘problems’ involving alternative pedagogical approaches to a given teaching situation. ‘Working through’ such problems with students operates as a scaffold for novices’ current lack of schemas and serves as a basis for effective instruction based on active engagement with example cases. Studies also suggest that students can learn effectively from suitable worked ‘teaching’ examples, rather than simply attempting many problems on their own (Cooper & Sweller 1987; Maloney 1994; Sweller & Cooper 1985; Trafton & Reiser 1993; Ward & Sweller 1990). Our inquiry items could be used effectively in this way also.
We already have enough preliminary experience with the construction and use of items of this nature to give confidence in the viability of the concept and success of the project. The idea of problem-based vignette-type items was initiated and piloted in elementary science methods courses at WMU by one of the PIs over a period of four years. Some examples of items are in the Appendix.
There might be a concern that students could sometimes ‘guess’ the ‘desired’ answer to items. Our experience with the use of these early items suggests that this is not a significant problem. There was often a bimodal test distribution. We found that undergraduate students with a good understanding of inquiry could readily identify it in problem-based teaching vignettes, but that many undergraduate methods students did not in fact have the ability to identify the inquiry approach in real contextual situations, nor to guess it. Students either reached the stage where they ‘got’ it, or they didn’t. The idea that a teacher’s role is telling and explaining is deeply entrenched, hence students will often choose this kind of response.
In this section we describe the approach and steps we will take to develop, pilot, analyze, revise and field-validate the science inquiry pedagogy items and POSIT instrument.
[Note: A detailed timeline for the entire project is provided near the end of the project description. The timeline lists all the project components, stages, personnel and responsibilities.]
a) Development of criteria for assessment items
As noted above, any pedagogy-of-inquiry assessment tool must be based upon a widely accepted view of what inquiry teaching is. Thus, we have already developed a set of Inquiry-Item-Criteria based on the view of inquiry presented in the two documents that have gained broad national acceptance: National Science Education Standards (1996) and Science for All Americans (1990). We also use two documents that are supplemental to the National Standards, viz. Inquiry and the National Science Education Standards: A Guide for Teaching and Learning (2000) and Classroom Assessment and the National Science Education Standards (2001). We also use The Atlas of Scientific Literacy (2001), which is a supplement to Science for All Americans. These five documents form our “inquiry domain documents.” The creation of clear-cut criteria, based on a defined domain, to guide our writing teams’ preparation of the items is as important as the items themselves. These criteria initially establish logical linkages between pedagogical knowledge and the assessment items. The criteria constitute a definition of inquiry science teaching that will form the basis for the assessment items and their evaluation.
b) Expert panel
A panel of eight experts as independent expert consultants will review and critique the Inquiry-Item-Criteria and then all of the POSIT assessment items as they are developed. The panelists are nationally recognized experts on science teaching, science teacher education or assessment. All are thoroughly familiar with the concept of teaching science as inquiry, and all have had experience with NSF projects. They will review our work based on their expertise and experience.
c) Review of criteria by expert panel
The panelists have already approved of the Inquiry-Item-Criteria and Table 1; however, as the Inquiry-Item-Criteria are further developed, the panel will be asked to review the criteria again. As a rule, our approval standard is agreement by 6 of 8 panel members for material submitted to the panel.
d) Item writing team composition
The item writing team is composed of three project PIs, a doctoral research associate (DRA), and five experienced schoolteachers. The teachers were selected by three grade groups, viz. K-2, 3-5 and 6-8. It is important to have teachers involved with the writing because they have direct knowledge of situations that teachers encounter.
e) Item writing specifications and procedures
The following will be specifications for item construction:
The initial goal is to produce 30 pedagogy-of-inquiry assessment items for each of the three grade groups, for a total of 90 items, each in three formats, giving a collection of 270 individual items. All will be built around problem-based scenarios. Teacher members will also share and discuss items with other teachers in their school. This broadened sharing has advantages in drawing in more ideas and promoting thought amongst teachers about good science teaching.
For all items developed, we will monitor science accuracy, pedagogy and compliance with specifications for item construction, consulting colleagues in science and education as appropriate.
f) External review of items by expert panel
As the items are completed they will be sent in reasonable batches (of say 4-6) to the panel of experts for external review. Each panel member will rate each item for science content accuracy and the appropriateness of response choices. These ratings will be used to establish the content validity of the items and each item’s construct link to the Inquiry-Item-Criteria, and to form the basis for establishing the best answer. This stage will be especially critical for the RR and CR item formats. Besides its role in item improvement, this process will to some extent assist in verifying the relevance of the criteria themselves. As the panel returns items, the writing team will make revisions.
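Once the panel has established the best answer for a Ranked Response item, a partial-credit scoring rule can compare an examinee’s ranking against the panel’s key. The sketch below assumes one plausible rule (pairwise-order agreement with the key); the option labels and the rule itself are illustrative only, since the project’s actual rubrics will be fixed during pilot analysis:

```python
def rr_partial_credit(response, key):
    """Partial-credit score for a Ranked Response (RR) item: the
    fraction of option pairs the examinee orders the same way as the
    expert-panel key. A hypothetical rule for illustration only."""
    n = len(key)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Count pairs where the examinee, like the key, places key[i]
    # ahead of key[j].
    agree = sum(response.index(key[i]) < response.index(key[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical expert-panel key: options ranked best to worst.
key = ["C", "A", "D", "B"]
print(rr_partial_credit(["C", "A", "D", "B"], key))  # perfect match: 1.0
print(rr_partial_credit(["B", "D", "A", "C"], key))  # full reversal: 0.0
```

A rule of this kind awards full credit for the key ordering, zero for its reversal, and graded credit in between, which matches the full/partial-credit scoring contemplated for the RR format.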
g) Piloting with pre-service students at several collaborating institutions
Because we are developing an instrument primarily for pre-service K-8 teachers, we will pilot the items with the corresponding groups of undergraduate students at various institutions. We will liaise with collaborators at other institutions who will cooperate in piloting POSIT with pre-service student teachers at their institutions. We expect to have ten or more cooperating institutions, which will ensure racial, gender and regional diversity for piloting POSIT items. This is important to be able to claim national applicability for the instrument.
After the writing, reviewing and revision process, we will assemble the items into a set of test forms. Two successive rounds of pilot tests of the instrument will then be conducted at the collaborating institutions. Undergraduate students in elementary science methods courses intending to be teachers will be our subjects. With several participating universities, we will have access to between 500 and 800 subjects in each of two semesters. Members of our project team and our colleagues at the other universities teach these courses and supervise adjunct teachers. Through our collaborators we have ready access to the courses, allowing us to gather additional information about the subjects. After students in a course have taken the assessment, the instructor will use one session to go over the items as “worked examples”. Students will be able to discuss the science teaching aspects and raise questions. This turns the occasion of the research pilot test into an instructional opportunity for students and teachers. For research purposes we will videotape a sample of classroom discussions at each site. We will only tape in classrooms where consent is unanimous, and testing will include consent forms with HSIRB approval.
Note that there will be two stages of piloting, in successive semesters. The statistical analysis of Pilot #1, plus information from the associated discussions, will inform item revision, so that Pilot #2 can be run with a modified and improved instrument. The writing team will reconvene for revisions, and samples of items ‘before and after’ will be sent to the expert panel for review. After the revision and review, POSIT items will be assembled into test format for Pilot #2, which is necessary to confirm the adequacy of the revisions.
We will also prepare our Scientific Literacy Survey for use during Pilot #2. Since it is possible that content knowledge of science would correlate with pedagogical knowledge of science inquiry, we plan to use an established scientific literacy survey (Laugksch & Spargo, 2000) that is based on Science for All Americans. This will be the first test of the null hypothesis that basic science knowledge is unrelated to pedagogical knowledge of inquiry teaching.
Pilot #2 will be conducted at the same collaborating institutions, using the revised POSIT items. It will follow the same basic procedure as the first, including the classroom discussions; but now the subjects will also take the Scientific Literacy Survey. Again we expect to have between 500 and 800 subjects.
h) Analysis of data from each round of pilot studies
Following each of the pilot tests at the participating institutions, analysis will focus on the following:
a) Review and finalize directions for test scoring and administration:
- Rubrics for Constructed Response (CR) items.
- Rankings and ratings for Ranked Response (RR) items.
- Scoring rules for full and partial credit in CR and RR item formats.
b) Initial reliability estimation.
c) Initial construction of the Multi-Trait Multi-Method (MTMM) matrix (Appendix One)
d) Three-parameter Item Response Theory (IRT) will be used to calibrate items and estimate examinee pedagogical knowledge of inquiry science teaching.
e) Differential Item Functioning (DIF) will be examined for possible bias in gender, ethnicity and science concentration.
f) Initial estimates of criterion-related validity by examination of the correlation between performance on the new assessment instrument and academic performance. For the second pilot, in addition to the analyses conducted for the first pilot, student performance on the revised POSIT will be correlated with responses on the Scientific Literacy Survey to establish further construct validity evidence.
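The three-parameter IRT calibration above models each item with a discrimination, a difficulty, and a guessing parameter. As a minimal sketch of what is being estimated, the 3PL item characteristic curve can be written as follows; the parameter values shown are made up for illustration, not calibrated values:

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability that an
    examinee of ability theta answers an item correctly, given the
    item's discrimination (a), difficulty (b) and guessing (c)
    parameters: P = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: a moderately discriminating item of average
# difficulty, with a guessing floor of 0.25 (a four-option MC item).
a, b, c = 1.2, 0.0, 0.25

for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}  P(correct) = {p_correct(theta, a, b, c):.3f}")
```

Calibration fits a, b and c for every item from the pilot response matrix while simultaneously estimating each examinee’s theta, which here represents pedagogical knowledge of inquiry science teaching.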
An important validity component of any educational assessment instrument is construct-related validity evidence. Since POSIT comprises different item formats and assesses different dimensions of inquiry science teaching, the multi-trait/multi-method (MTMM) matrix will be used to establish construct validity evidence by gathering and analyzing the correlations among the item sets. The correlation matrix will be examined against the expected correlation pattern to see if there is convergent and divergent validity evidence.
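The MTMM comparison can be computed directly from pilot subscores. The sketch below uses invented data and hypothetical trait/format labels purely to show the mechanics: convergent evidence requires that the same trait measured by different methods correlate highly, while different traits should correlate lower:

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two score columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented subscores keyed by (trait, format): the traits stand in for
# two of the Inquiry-Item-Criteria, the formats for MC and RR items.
scores = {
    ("evidence", "MC"): [3, 5, 2, 6, 4, 7, 1, 5],
    ("evidence", "RR"): [2, 6, 2, 5, 4, 6, 1, 4],
    ("explain",  "MC"): [5, 2, 6, 1, 3, 2, 6, 3],
    ("explain",  "RR"): [6, 1, 5, 2, 3, 1, 7, 2],
}

# Correlations between every pair of (trait, format) columns.
r = {(k1, k2): pearson(scores[k1], scores[k2])
     for k1, k2 in combinations(scores, 2)}

# Convergent evidence: same trait, different method -> high r.
monotrait = [v for (k1, k2), v in r.items() if k1[0] == k2[0]]
# Discriminant evidence: different traits -> lower r.
heterotrait = [v for (k1, k2), v in r.items() if k1[0] != k2[0]]
print("monotrait-heteromethod r:", [round(v, 2) for v in monotrait])
print("heterotrait r:", [round(v, 2) for v in heterotrait])
```

In the full analysis the same comparison runs over all Inquiry-Item-Criteria traits and all three item formats, yielding the MTMM matrix shown in the Appendix.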
After the analyses of both pilots, final revisions will be made and the items assembled into final test format. At this stage we will have fully developed the desired Pedagogy Of Science Inquiry Teaching (POSIT) instrument.
i) Interim progress reporting and dissemination
During both pilot-testing periods, we will share the project items with colleagues in science and education departments, and also have doctoral students in the sciences and science education examine the items. Besides providing a further check on item authenticity, it is also a way to foster interest in science teaching. We will draw science doctoral students’ attention to the NSF program for Graduate Teaching Fellows in K-12 Education (Vines 2002). Following analysis of the pilots, we will start preparing submissions for professional conferences to communicate progress thus far. We will create a web site for the dissemination of project information and items.
Although the POSIT instrument will have been developed and refined through this careful multi-stage procedure, it is important to validate it against observation of teacher classroom practice. The next stage will thus be field validation studies. Here the finalized POSIT instrument will be administered to a new pool of undergraduate students (about 600 in all over two semesters). From this pool a sample of about 60 will be randomly drawn; these subjects will take the covariate instruments, and two blinded classroom observation studies will subsequently be carried out for each subject, to obtain predictive validity evidence. At the same time a random sample of 30 in-service teachers will be drawn from the southwest Michigan and Detroit areas. These teachers will also take the POSIT and covariate instruments, and will be observed twice. The blinded field observations will be carried out by the Science and Mathematics Program Improvement (SAMPI) group. This aspect is described in (c) below.
a) Construct-Related Evidence and Predictive Validity Evidence
The validation studies will collect two primary aspects of validity evidence for POSIT: construct-related evidence and predictive validity evidence. We believe that the multiple development stages and piloting of POSIT, starting with the Inquiry-Item-Criteria and continuing with item writing, two-stage piloting, analysis and revision processes, will provide ample content-related validity evidence.
Although we are working with undergraduate science education majors, our ultimate goal is that our undergraduate students become competent teachers of K-8 school science. Hence, since our goal is to assess pedagogical content knowledge, a critical aspect of validity will be to establish a relationship between POSIT performance and teaching practice. Therefore we will test the hypothesis that level of inquiry pedagogical content knowledge is a good predictor of whether teachers will be good inquiry teachers of science. Note that good inquiry PCK should be a necessary, but not sufficient, condition for teaching well by inquiry. If a teacher does not have a good understanding of inquiry pedagogy, we hypothesize that it will be almost impossible to implement good inquiry practice in the classroom.
Therefore a low score on the POSIT should be a clear predictor of poor inquiry teaching, and in testing whether POSIT competence is a necessary condition, we should be able to see if a minimum required scoring level can be set. On the other hand, a good score on POSIT cannot on its own be a sufficient condition for good inquiry practice. Knowing the principles of inquiry science teaching does not guarantee that such teaching will occur – there are other factors in a school environment as well as personal attributes that might work against it in practice. Thus to investigate the ‘positive’ hypothesis we can also look for correlation between POSIT scores and effective inquiry teaching, while identifying and examining other variables that are likely to affect practice. Critical examination of the literature and theories underlying inquiry, and discussions with our panel, suggest there may be several main intervening variables, as in the table below. We intend to assess our subjects on each of these variables as part of our validation studies.
Independent Variable: Score on the Pedagogy of Science Inquiry Teaching (POSIT) test

Covariables:
A. Science Knowledge
B. Science Teaching Self-Confidence
C. Science Teaching Efficacy
D. Science Interest/Attitudes
E. School Environment for Science Teaching
F. Motivation/Attitude toward hands-on and inquiry-based teaching

Dependent Variable: SAMPI Field Observation Assessment of Inquiry Teaching
We will assess science knowledge using the Laugksch & Spargo (2000) Scientific Literacy Survey. Research on student achievement gains indicates that students learn more from teachers with better academic and subject skills (Hanushek 1997; Mayer et al 2000; Wayne 2002; also see Klentschy et al, in press). Self-confidence as a science teacher and confidence that one’s teaching is efficacious are established constructs and will be assessed by instruments from Enochs & Riggs (1990). There is at least 25 years of work on science attitudes with regard to teachers, and we will assess attitude using an established instrument, the Scientific Attitude Inventory (Moore & Foy, 1997). The fifth key mediating variable is the school environment vis-à-vis inquiry science teaching. We will assess environment using an interview protocol adapted from the Horizon, Inc (2003) Local Systemic Change Initiative. The sixth key mediating variable is motivation and attitude. Is a teacher motivated to use inquiry pedagogy? We will assess motivation by using the Attitude Toward Hands-on & Inquiry-based Teaching developed by the Oklahoma Teacher Education Collaborative (NSF 9553790).
The instruments for covariables A and D will be administered along with POSIT to all students forming the undergraduate pool. Covariables B, C, E and F are sensitive to classroom situations and thus will be administered only to the sample group (below) once students are in the field as student teachers. The data for covariables will be collected prior to the field observations. In-service subjects, after recruitment (see below), will take the POSIT and all covariate instruments prior to classroom observation.
b) Testing pools and samples for the validation studies
For the validation studies, we will first arrange new testing pools of subjects to take the now-finalized POSIT test. We plan two undergraduate pools of about 300 subjects each, in successive semesters, drawn from the Detroit, Southwest Michigan and Columbus, Ohio areas to maintain diversity. When the undergraduate students take the POSIT and covariate instruments A and D, they will also fill out a form indicating whether they would be willing to participate in the observation studies. SAMPI will then draw a random sample from the first and second testing pools of pre-service subjects. Our aim is to get a total sample of about 60 pre-service teachers to observe in the classroom. We will continue drawing testing pools in successive semesters until we have the full sample for field observations.
Thus far we have talked of the use of POSIT during the undergraduate instruction of pre-service students, and checking predictive validity by observing their subsequent teaching. It is recognized however that there may be various pressures influencing student teachers in particular, which means that lessons may not always go as intended. Therefore we will also do field validation with a sample of about 30 experienced in-service teachers randomly selected from districts across southwest Michigan and Detroit areas. By including an in-service component, the study reflects the idea that the POSIT items could also be used as an evaluative tool for in-service teacher projects, besides their intended use during teacher preparation.
The sampling will be done by SAMPI who will also see that covariate instruments B, C, E and F are distributed to the subjects. These instruments, however, will be returned directly to the research team so that the SAMPI personnel remain blind to the POSIT and covariate instrument results.
In scheduling the classroom observations, SAMPI will not specifically mention ‘inquiry teaching’, but simply ask to observe two science lessons per subject including pre-and post-lesson interviews. There will be inducements to participate. The first is professional growth; the SAMPI procedure includes a debriefing session where the teacher receives feedback on his or her efforts in a non-threatening environment. The second is an honorarium of $200 for teachers who participate in both observations.
c) Field evaluations of teacher practice
As indicated, SAMPI (the Science and Mathematics Program Improvement group) will be contracted to conduct the classroom evaluations of our subject teachers. SAMPI is based at Western Michigan University, but is completely independent of this project, and serves a wide external clientele. Information about SAMPI is readily accessible from the SAMPI web site (www.wmich.edu/sampi). The SAMPI observation tool is a tool of established validity and reliability for the evaluation of science teaching practice (Barley & Jenness, 1994; Jenness & Barley, 1999; Jenness, 2001). It has been used in both NSF and Eisenhower funded projects, and has aspects in common with the Reformed Teaching Observation Protocol (RTOP). SAMPI will arrange for, and conduct, two observations per subject, including interviews and lesson plan evaluation. The teaching evaluations will be blind; however, SAMPI will have the Inquiry-Item-Criteria so that they can ensure alignment with their classroom observation protocol.
SAMPI personnel will not have access to any other data collected from these subjects. Based on analyzed data from two teaching observations, with pre-and post-lesson interviews for each, including lesson plans, SAMPI will arrive at a classification for each teacher. The SAMPI director will study all the observational data and categorize the teachers as “good, satisfactory, or poor” users of inquiry science practice, according to standard SAMPI practice. The SAMPI director and personnel will not have had any previous contact with our project, and the project team will not know the sample, hence this will be a double-blind evaluation of the subject teachers.
Once the classroom observations have been scored, they will serve as the criterion in subsequent analyses aimed at establishing the criterion-related validity of the POSIT; for example, logistic regression (LR) and discriminant function analysis (DFA) will be examined. DFA will establish the ability of the POSIT instrument to correctly classify teachers as good, satisfactory, or poor users of inquiry teaching, and the logistic regression can provide useful estimates (odds ratios) of the impact of POSIT on inquiry teaching practice after accounting for the covariables: science knowledge, science teaching self-confidence and efficacy, science attitudes, school environment, and motivation.
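The odds-ratio interpretation can be sketched as follows. The coefficient values and score scale below are invented for illustration; the real values would come from fitting the logistic regression to the SAMPI classifications with the covariables included:

```python
import math

# Hypothetical fitted coefficients for a binary outcome:
# 1 = rated a "good" user of inquiry teaching in the SAMPI
# observations, 0 = otherwise. Illustrative values only.
intercept = -4.0
beta_posit = 0.08  # change in log-odds per POSIT point

def p_good(posit_score):
    """Predicted probability of a 'good' inquiry-teaching rating,
    given a POSIT score, under the hypothetical model above."""
    logit = intercept + beta_posit * posit_score
    return 1.0 / (1.0 + math.exp(-logit))

# The odds ratio for a 10-point POSIT gain is exp(10 * beta): the
# multiplicative change in the odds of a 'good' rating.
odds_ratio_10 = math.exp(10 * beta_posit)
print(f"OR per 10 POSIT points: {odds_ratio_10:.2f}")
print(f"P(good | POSIT = 40) = {p_good(40):.2f}")
print(f"P(good | POSIT = 70) = {p_good(70):.2f}")
```

In the full model the covariables enter as additional predictors, so the POSIT odds ratio is reported after adjusting for science knowledge, confidence, efficacy, attitudes, environment and motivation.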