Beruflich Dokumente
Kultur Dokumente
®
This report was written by:
Table of Contents
Henry Braun
Boston College
Richard Coley Preface....................................................................................................................2
Yue Jia Acknowledgments..................................................................................................2
Catherine Trapani
Educational Testing Service Executive Summary..............................................................................................3
Science Achievement .....................................................................................3
The reviews expressed in this report
are those of the authors and do not Teacher Characteristics...................................................................................3
necessarily reflect the views of the
officers and trustees of Educational Instructional Strategies..................................................................................3
Testing Service. Understanding Science Achievement............................................................3
May 2009.
Policy Evaluation and.
Research Center.
Policy Information Center.
Educational Testing Service
Preface
In a recent opinion piece in The New York Times, . these concerns from both policy and pedagogical .
Brian Green, professor of physics at Columbia . perspectives. In support of these efforts, Henry .
University and author of The Elegant Universe . Braun, Richard Coley, Yue Jia, and Catherine .
and The Fabric of the Cosmos, writes: Trapani have mined the data of the National .
Assessment of Educational Progress (NAEP) with .
Science is the greatest of all adventure stories, sophisticated statistical models in an attempt .
one that’s been unfolding for thousands of years to identify aspects of U.S. eighth-grade science .
as we have sought to understand ourselves and classrooms that are associated with science .
our surroundings. Science needs to be taught to achievement. Their report, Exploring What Works
the young and communicated to the mature in a in Science Instruction: A Look at the Eighth-Grade
manner that captures this drama. We must embark Science Classroom, identifies instructional strategies
on a cultural shift that places science in its rightful and teacher characteristics that appear to make a .
place alongside music, art and literature as an difference in NAEP science scores. And because
indispensable part of what makes life worth living. teacher practices that make a difference must make .
a difference for all Americans, these practices are .
In reality, however, the science achievement of . examined for the population generally and for .
U.S. students and the science literacy of the general . students grouped by their various racial/ethnic .
population are mediocre. Such mediocrity does not characteristics and income level. The results offer
bode well for our nation’s ability to compete with . much food for thought for anyone concerned with
the growing science and engineering talent that is . science education.
emerging among the many nations with which we .
will be both competing and cooperating in the future. Michael T. Nettles.
Senior Vice President .
Broad recognition of the challenges that our nation Policy Evaluation and Research Center
faces has led to the creation of several special .
commissions and resulting reports that address .
Acknowledgments
The authors wish to acknowledge the helpful . Scott Davis provided data analyses. Richard Pliskin
comments and feedback provided by the following . was the editor, Marita Gray designed the cover, .
reviewers: Courtney Bell, ETS; Richard Duschl, . and Sally Acquaviva provided desktop publishing .
Pennsylvania State University; Kate McNeil, . services. Errors of fact or interpretation are those .
Boston College; and Andreas Oranje, ETS. . of the authors.
Waverly Van Winkle, Darlene Rosero, and .
Executive Summary
The science achievement of U.S. students has been About one-third of eighth-grade students have .
flat for a decade; in fact, in a recent international teachers with four years or less experience teaching
assessment, U.S. students ranked lower, on aver- science. Many students have science teachers with
age, than their peers in 16 of 30 developed nations. more overall teaching experience than science .
The National Assessment of Educational Progress teaching experience, raising the possibility that .
(NAEP), in addition to serving as the nation’s report science teaching was not the field of choice for those
card, collects information from students, teach- teachers or the subject for which they trained.
ers, and schools on instructional practices, teacher
characteristics, and school factors, allowing us to Instructional Strategies
peer into U.S. schools and view students and teachers As reported by students, several instructional activi-
as they interact in the context of schools and class- ties stand out as more commonly practiced than
rooms. This report takes advantage of these data . others. Reading a science textbook, taking a science
and provides a view of eighth-grade science class- test or quiz, and having a teacher conduct a science
rooms in 2005, using statistical models particularly demonstration were activities that more than half
suited to this type of data — hierarchical linear mod- of eighth-grade students indicated occurred almost
els (HLMs). Our intent is to identify teacher charac- every day or once or twice a week. Other popular
teristics and instructional strategies that are associ- activities (40 percent or more of students report-
ated with student science achievement. After briefly ing frequent activity) were hands-on exercises and
describing the state of science achievement, teacher investigations, talking about results from hands-on
characteristics, and instructional strategies across exercises, and working with other students on a .
U.S. eighth-grade classrooms, the report presents . science activity or project. Activities occurring much
the results of a three-phase analysis that is designed less often included presenting an oral science report,
to illuminate the statistical relationships among . preparing a written science report, using library .
these variables. resources for science, and reading a book or .
magazine about science. In addition, 70 percent .
Science Achievement of students indicated that they did science projects
Since 1996, the overall science scores of eighth grad- that took a week or more.
ers have remained unchanged at 149 (on a scale of 0
to 300). Black students, however, showed an increase Understanding Science Achievement
of 3 scale points since the 1996 assessment. Scores Using multilevel analyses, we found the following
of no other racial/ethnic group improved. Although student and teacher characteristics to be associated
reduced, the White-Black achievement gap remained with student science achievement:
substantial. In 2005, White eighth graders scored
36 points higher than Black eighth graders and 31 • Student demographic characteristics had statisti-
points higher than Hispanic eighth graders. Male cally significant associations with achievement.
students continued to score higher than females. Black and Hispanic students scored considerably
lower than White students, and males scored
With respect to the achievement levels set by the . higher than females. English-language learners
National Assessment Governing Board (NAGB), in and students with disabilities scored much lower
2005, 59 percent of eighth graders scored at or above than other students.
the Basic level, 29 percent performed at or above the .
Proficient level, and 3 percent scored at or above . • Students with many books in the home scored
the Advanced level. considerably higher than students with fewer
books; and students who were absent frequently
Teacher Characteristics scored much lower than other students.
Most eighth-grade students (83 percent) have a . • Students whose teachers held a standard teaching
science teacher who possesses a regular/standard certificate scored slightly higher than other .
teaching certificate. About half have teachers with students, and students whose teachers’ years .
less than 10 years of experience teaching elementary of total experience exceeded their years of .
or secondary school. The distribution of teacher science experience scored slightly lower than
experience in science teaching inclines toward the other students.
lower end of the range — that is, less experience.
1
W. Grigg, M. Lauko, and D. Brockway, The Nation’s Report Card: Science 2005 (NCES 2006 – 466), U.S. Department of Education, National .
Center for Education Statistics, Washington, D.C., 2006 and Education Week, “U.S. Students Fall Short in Math and Science,” December 4, 2007.
We also used multilevel analyses to identify . representative of the three subgroups, from among
which pedagogical strategies made an incremental those studied in the multilevel analysis. For each one,
contribution to accounting for differences in stu- we cross-classified students by racial/ethnic group .
dents’ science achievement after adjusting for student and the percentage of students in the school .
and teacher characteristics. Three subgroups of eligible for the school lunch program, a measure .
teacher pedagogy variables showed different patterns of school disadvantage.
of association between the use of the pedagogy and
science achievement. • Reading a science textbook – Across all racial/
ethnic and school disadvantage groups, scores
The first group comprises instructional strategies increase with the frequency of reading a science
for which increasing frequency was associated with textbook. The percentage of students in the op-
higher average scores: timal categories is similar for each combination
of race/ethnicity and school disadvantage. Thus,
• Reading a science textbook the analysis suggests that mean score differences
among racial/ethnic groups cannot be accounted
• Doing hands-on activities in science
for by differences in exposure to this instructional
• Writing long answers to science tests . strategy. These data suggest that it is reasonable
and assignments to recommend that science teachers should make
some use of science textbooks in their teaching.
• Students talking about measurements .
and results from hands-on activities • Working with others on a science project – .
Across all racial/ethnic and school disadvantage
• Students working with others on a science . groups, scores increase with the frequency of
activity or project working with others on a science project. Thus,
the analysis indicates that mean score differences
The second group comprises instructional activities among racial/ethnic groups cannot be accounted
for which increasing frequency was associated with for by differences in exposure to this instructional
lower average scores: strategy. A reasonable inference from these data
is that science teachers can make effective use of
• Students giving an oral science report group work on science projects.
• Students using library resources for science
• Students giving an oral science report – .
Across all racial/ethnic and school disadvantage
The third group is characterized by the “Goldilocks”
groups, scores decrease with the frequency of
metaphor. Higher scores were associated with a .
students giving an oral science report. The score
moderate or intermediate frequency for these .
deficit is most severe for students who reported
instructional strategies:
giving an oral report more often than once or
twice a month. Black and Hispanic students and
• Students taking a science test
students attending more disadvantaged schools
• Teacher doing a science demonstration were more likely to experience higher frequencies
of this strategy. Since excessive use of this strategy
• Students discussing science in the news is more likely to be seen in more disadvantaged
• Students reading a book or magazine about science schools, curtailing this practice may help to close
the achievement gap.
• Students preparing a written science report
• Teacher doing a science demonstration – .
Another phase of analysis was undertaken to deter- Across all racial/ethnic and school disadvantage
mine whether the patterns revealed by multilevel groups, scores are lowest in the “never or hardly
analysis hold when the results are disaggregated by ever” category and highest in the category of
various student and school characteristics. Having “one or two times a week.” For all levels of school
identified factors that are related to achievement, . disadvantage, Black students are less likely to be
can we detect differences among groups of students exposed to the optimal use of this strategy. Thus,
in their access to these “effective” pedagogies or . schools may make progress in closing the achieve-
strategies? We chose five instructional strategies, . ment gap by focusing on this type of strategy.
•D
iscussing science in the news – Across all Many of the findings are in line with the predictions
racial/ethnic and school disadvantage groups, one would make based on the arguments found in
scores are highest in the middle response . the National Academy of Sciences report Taking
category, “less often than every day.” Black . Science to School. Other findings are somewhat .
and Hispanic students are less likely than other puzzling and call for further investigation. In .
students to fall into that category, suggesting . any case, more detailed information on how .
that the achievement gap may be due, in part, . these practices are implemented in classrooms .
to differences in the frequency of exposure to . and information about the contexts in which .
this instructional strategy. they are employed should be gathered to provide .
further evidence with regard to the efficacy of .
these strategies.
Introduction
“We live in a society — and a nation, and a world — • Know, use, and interpret scientific explanations .
exquisitely dependent on science and technology, in of the natural world
which hardly anyone knows anything about science
and technology.” • Generate and evaluate scientific evidence .
Carl Sagan and explanations
2
Press conference held in September 1988 introducing the 1986 NAEP science assessment.
3
National Science Board, The Science and Engineering Workforce: Realizing America’s Potential, National Science Foundation, August 14,
2003. See also, Committee on Prospering in the Global Economy of the 21st Century: An Agenda for American Science and Technology,
Committee on Science, Engineering, and Public Policy, Rising Above the Gathering Storm: Energizing and Employing America for a Brighter
Economic Future, National Academy of Sciences, National Academy of Engineering, and Institute of Medicine of the National Academies,
Washington, D.C.: National Academies Press, 2007.
4
Richard A. Duschl, Heidi A. Schweingruber, and Andrew W. Shouse (eds.), Taking Science to School: Learning and Teaching Science in
Grades K – 8, Washington, D.C.: National Academies Press, 2007.
5
The principal tool used is a variant of multiple regression analysis, appropriate for the hierarchical structure of the data collected by NAEP.
More information about Hierarchical Linear Modeling (HLM) is provided in the second section of this report.
Science Achievement There was some good news regarding minority .
Below, we provide overall results for the three grade students’ performance in grades 4 and 8. At grade .
levels assessed. We present one view of science 4, since 2000, average scores increased by 7 points .
achievement by examining average scale scores. At for Black students and by 11 points for Hispanic .
grade 4, the average score in 2005 was higher than . students. White and Asian/Pacific Islander fourth-
in earlier years. At grade 8, there was no overall . graders also improved since 1996, as did Hispanic
improvement in the average score since 1996. At and Black students. At grade 8, Black students were
grade 12, the average score declined since 1996. the only racial/ethnic group to make gains since
1996, and no racial/ethnic group has showed .
We present another view by examining the percent- improvement since 2000.
age of students who reach the achievement levels — .
Basic, Proficient, and Advanced — set by NAGB, The Eighth-Grade Science Classroom
based on recommendations from panels of educators The design of NAEP allows us to peer into the
and members of the public, to provide a context for nation’s classrooms and learn about how science is
interpreting student performance. being taught and gather information about the char-
acteristics and qualifications of science teachers. We
• At grade 4 in 2005, 68 percent of students scored selected grade 8 because this is when students begin
at or above the Basic achievement level. 29 per- to focus on science coursework, and the grade serves
cent performed at or above the Proficient level and as a gateway to more advanced science courses in
3 percent scored at or above the Advanced level. high school and ultimately in preparing for an S&E
career. It should be noted that NAEP is based on a
• At grade 8 in 2005, 59 percent of students scored . nationally representative sample of students, not of
at or above the Basic level, 29 percent performed . teachers. Thus, the information and data provided
at or above the Proficient level, and 3 percent here pertain to the characteristics and practices of
scored at or above the Advanced level. teachers of a nationally representative sample of
• At grade 12 in 2005, 54 percent of the students students, and not a nationally representative sample
scored at or above the Basic level, 18 percent of teachers. Consequently, the percentages reported
scored at or above the Proficient level, and 2 . should be interpreted as the percentage of students
percent scored at or above the Advanced level. whose teachers possess that characteristic or use .
that practice.
6
As provided by law, NCES, upon review of congressionally mandated evaluations of NAEP, has determined that achievement levels are
to be used on a trial basis and should be interpreted with caution. However, NCES and NAGB have affirmed the usefulness of these
performance standards for understanding trends in achievement, and these levels have been widely used by national and state officials.
Descriptions of these levels for each grade can be found in Appendix A of the Science Framework for the 2005 NAEP at the NAGB website,
http://www.nagb.org/pubs/pubs.html. This framework, used to guide the 1996, 2000, and 2005 assessments, requires assessment in
three broad fields — and three elements of knowing and doing science: conceptual understanding, scientific investigation, and practical
reasoning. This science framework also specifies that some questions and tasks should assess students’ understanding of the nature of
science and key organizing themes of science. The nature of science encompasses the historical development and habits of mind that.
characterize science and technology; themes of science are ideas that transcend the scientific disciplines and give scientists tools for
investigating the natural world. Themes included in the framework are systems, models, and patterns of change.
7
Grigg, Lauko, and Brockway, 2006.
8
As part of the 2005 NAEP Science Assessment, information was collected from students, teachers, and schools on instructional practices.
and other school factors. The choice of factors included reflects the perspectives of practitioners, researchers, and policymakers. There.
may be other school conditions and practices that foster instruction and learning, but the NAEP items represent factors that have been.
widely discussed in the literature.
Teachers. NAEP data provide information on the Slightly less than half of eighth-grade students have
years of experience of eighth-grade science teachers, teachers with fewer than 10 years of experience
both in terms of teaching elementary or secondary teaching elementary or secondary school. On the
education and in terms of teaching science. Table 1 other hand, nearly 60 percent of students have teach-
shows the distribution of experience. ers with less than 10 years of experience teaching
science. Thus, the distribution of teacher experience
Table 1 in teaching science is shifted toward the lower end
of the range (i.e., less experience), relative to overall
Percentage of Eighth-Grade Students, by Years of Experience
experience. This may mean that science was not .
of Their Science Teachers, 2005
the field of choice or preparation for some teachers.
Years Teaching For example, elementary school teachers may have
Years Elementary Years Teaching Science moved up to middle school and taken on responsibil-
or Secondary School
ity for teaching science.
0 to 4 25% 32%
4 to 9 24 26
NAEP also provides information on the certification
status of eighth-grade science teachers (Figure 1).
10 to 19 27 25 Most students (83 percent) have a science teacher
20 plus 24 17 that possesses a regular/standard certificate. .
Note that this does not imply that they possess .
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics,
National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
certification in science education.
Figure 1
Percentage of Eighth-Grade Students, by Type of Teaching Certificate Held by Their Teacher
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
Instructional Strategies. This section provides an practiced than others. More than half of students
overview of science teachers’ instructional strategies reported having read a science textbook, taken a
in teaching eighth-grade science, as reported by stu- science test or quiz, and having a teacher conduct a
dents. Figure 2 provides a summary of the results. 10 . science demonstration. Other popular activities (40
Figure 2 combines the response categories — “almost . percent or more of students reporting frequent activ-
every day” and “once or twice a week.” Several in- ity) were doing hands-on activities or investigations,
structional activities stand out as more commonly talking about results from hands-on activities, and
Figure 2
Percentage of Eighth-Grade Students Reporting Participation in Various Classroom Activities Almost Every
Day or Once or Twice a Week
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
9
More detailed data on the frequencies are available from the NAEP Data Explorer at http://nces.ed.gov/nationsreportcard/nde/.
10
As a result of the exploratory variable selection analysis described in Appendix B, three instructional strategies that were reported.
in NAEP were removed from the HLM analysis. They are teacher uses computers, hands-on work with living things, and hands-on.
activities that are not listed in the NAEP survey. These strategies were excluded from Figure 2.
working with other students on a science activity or Nonetheless, each NAEP administration produces a
project. In addition, 70 percent of students indicated rich database drawn from a nationally representative
that they did science projects that took a week or sample of students that, when probed with more .
more. Activities occurring much less often included sophisticated statistical tools, can yield useful .
giving an oral science report, preparing a written . insights into the relationships among student .
science report, using library resources for science, characteristics, teachers’ pedagogical strategies, .
and reading a book or magazine about science. and student achievement. The balance of this report
is devoted to revealing and interpreting these rela-
This section of the report has presented some basic tionships. If strong relationships exist, then they .
data from the eighth-grade 2005 NAEP Science . can serve as the basis for further investigations .
Assessment. We summarized overall achievement on to shed light on the causal mechanisms that .
the assessment and provided descriptive statistics . underlie them.
on teacher characteristics and instructional practices.
These data are both illuminating and provocative. There are limitations in the utility of findings based
They are provocative because they beg the question on looking at the relationships between an outcome
of whether science achievement is associated with of interest and single explanatory variables, a “one .
specific pedagogical strategies or teacher characteris- at a time” strategy; the linkages among the different .
tics. If we can gain some understanding of the kinds explanatory variables can yield counter-intuitive
of science instruction and teacher characteristics and even misleading results. The usual remedy is
that are related to achievement, we may be able to to employ a statistical methodology called multiple
help educators and policymakers craft more effective regression that allows one to explore the relationship
education and pedagogical policies. Unfortunately, between an outcome — for example, science achieve-
no one study can provide a definitive answer to that ment — and all the potential explanatory variables
question. In particular, NAEP has been primarily simultaneously. Moreover, with multiple regression,
designed to provide a snapshot of schooling and it is possible to take account of differences among
achievement at a point in time, and so there are students (with respect to those characteristics that
inherent limitations in its ability to answer questions are associated with science achievement) before .
related to the efficacy of one strategy or another. considering the possible contributions of instructional .
Moreover, the students’ responses offer no . practices. This is a useful way of addressing questions .
information on how well the particular strategy . of pedagogical efficacy. Our principal tool here is a
was implemented or whether it was appropriate, . variant of multiple regression analysis, appropriate
given the time and place. Both quality and . for the hierarchical structure of the data collected by
appropriateness are strongly related to effectiveness. NAEP — Hierarchical Linear Modeling (HLM).
10
Methodology
Students invited to take the NAEP assessment are With the exception of parental education, the student
identified according to a carefully implemented two- demographic characteristics are derived from school
stage sampling scheme. First, a systematic random records. Parental education and all student environ-
sample of schools is selected, and then a random ment characteristics are from the student question-
sample of approximately 30 students is drawn from naire, as are all teacher pedagogy characteristics
each school. The original sample for the NAEP Science. other than the responses related to the teacher having
2005 grade 8 assessment contained records for adequate resources and the total time the teacher
148,595 students drawn from more than 6,300 spends with his/her class on science instruction in a
schools. Slightly more than 40,000 students were typical week. Those teacher responses and all teacher
excluded from the analyses because of missing data. characteristics are taken from the teacher question-
The most common reason for the missing data was naire. Finally, all school characteristics are derived
the lack of a match between the student and the either from the questionnaire completed by the .
teacher, precluding complete analysis of the relation- principal or her designee, or from school records.
ships among science scores, teacher characteristics,
and instructional practices. The analysis sample The two-stage sampling scheme described above
comprised 107,933 students.11 For further details and generates a hierarchical data structure of students
a comparison of the reduced sample to the sample nested within schools. To properly analyze this type
used in NAEP reporting, see Appendix A. of data, multilevel or hierarchical regression models
are generally employed. Unlike single-level regres-
We present the full set of student-level variables . sion models, multilevel models properly take account
employed in the analysis in Table 2, in which they . of both the statistical relationships among students
are organized for convenience into four categories. in the same schools and those among students in
There are seven student demographic characteristics, different schools, as well as providing appropriate
eight student home environment characteristics, 22 standard errors for judging the statistical significance
teacher pedagogy characteristics, and 11 teacher of the results. In the first level of a typical two-level
characteristics.12 Table 3 contains the six school . model, a separate regression equation is fit to the
characteristics that are also included in the analysis.13 data from each school. In the second level, the sets
of school-specific regression coefficients from the
Each student or school characteristic generates a first level are represented as functions of school-level
number of variables, depending on the count of . characteristics. For more details, consult Appendix B.
classifications corresponding to the characteristic.
For example, there are five classifications for race/ The principal goal of this stage of the study was to
ethnicity: White, Black, Hispanic, Asian/Pacific Is- identify those pedagogical strategies that make an in-
lander, and American Indian/Alaska Native. One . cremental contribution to accounting for differences
of the classifications is chosen to serve as the base . among students in science achievement after adjust-
for all comparisons. Four indicator variables are . ing for differences among students with respect to
then defined, one for each of the other classifica- demographic characteristics, home environment, and
tions.14 We then compile the values of all variables teacher characteristics. With this goal in mind, we
into a database that we use as the input to the . employed a stagewise exploratory regression strategy.
regression program.
Additional analyses were undertaken to determine if excluding the students with missing Teacher Match Code will have a significant.
11
impact on the relationship between science achievement and teacher pedagogies. The results show that there is no noticeable difference.
between estimates of regression coefficients from the two different samples. All the estimates using the full NAEP reporting sample are
within one standard error of the estimates using the sample that includes students who could not be matched with their teacher.
Items on the student questionnaire related to student attitudes towards science were not employed because of the likelihood that.
12
students’ responses could be causally related to their teachers’ instructional strategies. Including them in the model would then result .
in biased estimates of the focal parameters.
The NAEP background questionnaires are available online. For the 2005 NAEP Science Grade 8 Teacher Questionnaire, .
13
takes the value “0” otherwise. The fitted regression coefficient attached to such an indicator variable represents the average difference.
in science scores between the students in the corresponding classification and the students in the base classification, holding all the.
other variables in the model equal.
11
Table 2
Student-level Variables Included in the HLM Analysis
Gender Newspaper at home Student doing hands-on with electricity Teacher holding above
bachelor academic degree
Race/ethnicity Magazines at home Student doing hands-on with chemicals
Teacher holding standard
IEP disability status Number of books Student doing hands-on with rocks teaching certificate
at home
Eligibility for free/reduced- Student doing hands-on with magnifying Years of teaching experience
price school lunch Computer at home glass/microscope greater than years of science
teaching experience
Status as an English- Encyclopedia at home Student doing hands-on with barometer
language learner (Limited Teacher’s college focus being
English Proficient) Number of pages read Student doing hands-on with simple machines science and/or education
in school and for homework
Title 1 participation Student doing science projects in school that Teacher being a leader
Number of absences take a week or more for science education
Parental education
Talking about things studied in Student writing long answers Teacher undergraduate
school with family to science tests/assignments study major in science
Student talking about the measurements and Teacher graduate study major
results from hands-on activities or investigations in education
12
Table 3 strategies and NAEP science scores. With these
School-level Variables Included in the HLM Analysis considerations in mind, it is appropriate to conduct a
supplemental analysis that examines the relationship
Percentage of minority students (Black and Hispanic)
between science achievement and teacher pedagogies
Percentage of limited-English-proficient students directly — that is, without introducing student de-
Percentage of students eligible for school lunch program mographics and home environments into the model.
Region of the country This is the purpose of the second set of analyses,
Percentage of students enrolled for special education which we refer to as Phase 2.
School type (private or public)
Finally, it is important to recognize that the patterns
in the regression coefficients in the final fitted models
Recall that we grouped substantively related .
represent a type of average over all students in the
characteristics into the five categories. We entered
analysis set. Such averaging may not reflect the pat-
variables into the student-level regression model .
terns that would be observed when the models are fit
sequentially by category, according to a predetermined
to subgroups of students defined by various student
order, based on both statistical considerations and .
and school characteristics. The degree of consistency
interpretive goals. The order was: student demo-
of the patterns in the relations among variables
graphic characteristics, student home environment
across many such subgroups aids in proper interpre-
characteristics, teacher pedagogical strategies, and
tation of the findings. Moreover, examining the data
teacher characteristics.
at this level can reveal if there is differential exposure
At each stage, we retained the set of variables cor- to putatively supportive instructional strategies. In
responding to a particular characteristic only if the other words, to what extent are student characteris-
regression coefficients associated with the variables tics associated with teachers’ pedagogy? Do students
in the set generally exceed a predefined statistical have differential access to “effective” instructional
threshold — for example, significant at the 0.05 level. strategies based on their characteristics? We refer to
At the next stage, we entered into the regression the this set of supplemental analyses as Phase 3.
variables remaining from the previous stage together
with all the variables in the next category. We contin-
Cautions in Interpretation
ued the process until the last category was entered. In Notwithstanding the many strengths of the NAEP
the final model, the estimated regression coefficients design and the rigor of its implementation, there are
of the teacher pedagogy variables, for example, repre- some important limitations to NAEP-based analyses
sent the strength of the relationships between NAEP that should be borne in mind. First, the analyses rely
science scores and instructional strategies after tak- heavily on responses provided by students. These
ing account of measured differences among students responses only imperfectly reflect what actually oc-
who are not influenced by the teacher, as well as . curred in the classroom because of the way students
differences in the characteristics of their teachers. interpreted the questions and the accuracy of their
For more detail on the methodology, see Appendix B. recall. In addition, there is no information with
We refer to this set of analyses as Phase 1. respect to the quality and appropriateness of the
strategy as it was implemented in the classroom.
In Phase 1, the burden of proof, so to speak, is on
the pedagogical variables. That is, the strength of the Second, and equally important, is the fact that NAEP
estimated statistical relationship between those vari- is an observational study rather than a true experi-
ables and NAEP science scores is influenced by the ment. Teachers typically choose which strategies to
other variables in the model and their relationship adopt and when to employ them, rather than having
to both the pedagogical variables and NAEP science the strategies randomly allocated to them by some
scores. For example, suppose that students whose external mechanism. The implication is that the rela-
parents have higher levels of education are also more tionships uncovered, should there be any, cannot be
likely to be exposed to pedagogical strategies that are interpreted causally. That is, if it is determined that
truly more effective. Suppose further that higher pa- students exposed to strategy A score higher on aver-
rental education is strongly positively associated with age than those exposed to strategy B, other things
science achievement. Then it is quite possible that being equal, one cannot conclude that the difference
the final fitted model will underestimate the strength is due to the differential effectiveness of the strategies
of the relationship between those pedagogical . — as one could, in principle, were this an .
13
experimental study. In the case of the NAEP sample, In particular, NAEP is a cross-sectional study in
there may be some student characteristics that are which prior measures of academic achievement are
associated both with a tendency to be exposed to not collected. A student’s observed level of science
strategy A and with higher science performance. . achievement as measured by NAEP depends not only
Similarly, there may be teacher characteristics that on her experiences in the current year but, to some
are associated both with greater general efficacy and degree, also on her educational experiences, in school
the choice of a particular strategy. These circum- and out, up to the time of the NAEP assessment.
stances are instances of selection bias. Adjustments that include prior test scores are more
likely to account for important differences among
Selection bias is always a danger in observational . students than are adjustments based on student and
studies. To mitigate its effects, investigators often . home characteristics alone.
collect ancillary data on the students and employ
regression methods to adjust the outcomes for differ- Statistical adjustment is an appropriate and widely
ences among students in these measured variables. It used approach to the analysis of observational data.
is hoped that the adjusted outcomes are less suscepti- However, rarely, if ever, can it fully compensate for
ble to selection bias. In this report, we have adjusted the lack of randomization. Accordingly, the results
science scores for a host of student demographics must be treated with caution. As explained previous-
and home environment characteristics. The relation- ly, we have undertaken two sets of supplementary .
ships between these adjusted outcomes and teacher analyses to enhance the credibility of the findings .
pedagogy, if any, are more credible than those found of the main study.
with unadjusted outcomes.
14
Exploring What Works in Science Instruction
Figure 3
Estimated* NAEP Science Score Gains or Losses Associated With Student Demographic Characteristics,
Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, teacher pedagogical strategies, and for other student characteristics in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
15
Only the variables with associated regression coefficients that exceed a pre-defined statistical threshold (i.e., significant at the 0.05 level) .
are presented in the figures.
16
In statistical parlance, the coefficients are partial regression coefficients since they have been adjusted for the other variables in the model.
15
Examination of Figure 3 reveals that the regression with English-language learners and with students
coefficients corresponding to student demograph- with disabilities are strikingly large, 18.7 and 22.7
ics other than the ethnic classification “Asian/Pacific points, respectively.
Islander” are all large in magnitude and statisti-
cally significant. For example, regarding gender, the Figure 4 shows the results for variables related to
base category is female. Thus, the interpretation of student home environment characteristics. Some of
the corresponding coefficient is that, on average, the coefficients are very substantial. For example,
males score 7.9 points higher than females. For the compared with students with fewer than 25 books in
race/ethnicity characteristic, White students are the the home, students with more than 100 books in the
base category. Thus, the fitted coefficient for Black home score 13.3 points higher on average. Similarly,
students means that, other things being equal, on compared with students reporting not being absent,
average Black students score 18.2 points lower than those reporting being absent 10 or more days per
White students.17 The “achievement gaps” associated month score, on average, 11.4 points lower.
Figure 4
Estimated* NAEP Science Score Gains or Losses Associated With Student Home Environment Characteristics,
Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, teacher pedagogical strategies, and other student characteristics in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
The phrase “other things being equal” means that when comparing two levels of a characteristic (e.g., Whites and Blacks of the student.
17
characteristic race/ethnicity), the values of all the other variables in the model are held constant. For the sake of brevity, this phrase is not
repeated but should be assumed by the reader.
16
The data in Figures 5, 6, and 7, which display the in the affirmative to a more specific question — for
results for variables associated with teacher pedago- example, “Do you do hands-on activities with simple
gies, are particularly interesting. We have organized machines?” — earn an additional premium of 1.7
the variables into three subgroups according to the points, on average. Thus, students who respond .
pattern in the coefficients. The first group (Figure 5) . consistently to the hands-on questions — i.e., in the
comprises those strategies whose reported use is affirmative to the general question, to at least one
associated with higher average scores or, if there are specific question, and who talk about the results
more than two frequency levels, then increased fre- — score at least 6.2 points (2.7 + 1.7 + 1.8) higher
quency is associated with higher average scores. The than those students who report that their teachers .
second group (Figure 6) displays those strategies for do not employ hands-on activities.
which increasing frequency of reported use is associ-
ated with lower average scores. The third group . It is also evident that reading a science textbook
(Figure 7) comprises those strategies with three or more frequently, being asked to write long answers
more frequency levels in which the highest average on tests or assignments, and working with others
score is associated with an intermediate frequency on projects are each associated with higher average
— what we have termed a “Goldilocks” pattern. scores.18 Finally, students whose teachers agree that
they have all the resources they need score signifi-
As shown in Figure 5, students doing hands-on activi- cantly higher than those students whose teachers
ties in science score 2.7 points higher than those who indicate that they have none or few of the resources
don’t, on average. Interestingly, students responding they need.
Figure 5
Estimated* NAEP Science Score Gains Associated With Teacher Pedagogical Strategies, Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, student characteristics, and other teacher pedagogical strategies in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
Reading a textbook may refer to homework assignments, to the use of text materials directly in class, or both. Note that there is no
18
17
There are just two strategies in the second group different from the average score of the baseline
where increased frequency is associated with lower group. To take another example, students who pre-
scores (Figure 6): giving an oral science report and pare a written science report once or twice a month
using library resources for science. In both cases, the score, on average, one point higher than those who
score deficit associated with the highest frequency never or rarely write a science report (the baseline
level is quite substantial. There is no obvious expla- group). However, those who prepare a report more
nation for this finding. One can speculate that in often than twice a month score on average 2.9 points
many cases, these activities are not well coordinated lower than the baseline group. These patterns bear
with the curriculum and act more as a distraction scrutiny, and we will examine them further below.
than as a support for student learning.
We retained only two teacher characteristics for the
Perhaps the most surprising results are found in final model. As shown in Figure 8, students whose
the last group, shown in Figure 7, comprising five teachers hold a standard teaching certificate scored
strategies where the highest average score is associ- 1.5 points higher than those students with teachers
ated with an intermediate frequency — again — that having other or no credentials. Students whose .
might be termed a “Goldilocks” pattern. For example, teachers’ total experience exceeds their science .
compared with students who never take a science teaching experience scored 1.4 points lower than
test (the baseline group), those who take a test once other students. One possible explanation for this .
or twice a month score 8.3 points higher on average. finding is that science teaching was not the original
Those who take a test once or twice a week score . field of choice for these teachers and so these .
6.1 points higher, while the average score of those teachers, perhaps, may not be as well prepared to
who take a test almost every day is not statistically teach science.
Figure 6
Estimated* NAEP Science Score Gains Associated With Teacher Pedagogical Strategies, Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, student characteristics, and other teacher pedagogical strategies in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
18
Figure 7
Estimated* NAEP Science Score Gains or Losses Associated With Teacher Pedagogical Strategies,
Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, student characteristics, and other teacher pedagogical strategies in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
Figure 8
Estimated* NAEP Science Score Gains or Losses Associated With Teacher Pedagogical Strategies,
Grade 8, 2005
* Each estimate is adjusted for student characteristics, teacher pedagogical strategies, and other teacher characteristics in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
19
Finally, we retained four school characteristics . interpreted carefully. The apparent strength of .
in the model as useful predictors of differences the retained strategies’ relationships to science
among schools in (adjusted) average science scores achievement, as quantified by the magnitudes of .
(Figure 9). In particular, schools enrolling larger . the corresponding regression coefficients, may well
proportions of Black and Hispanic students and . underestimate their true strength. This would occur
larger proportions of students eligible for school if students with those demographic characteristics
lunch programs have lower average science scores. and home environments that were positively correlat-
Regionally, only schools in the West have average ed with science achievement were also more likely to
scores significantly different from schools in the be exposed to effective pedagogies. That might be the
Northeast (the base category). Note that the differ- case, for example, if schools enrolling students with
ence in magnitude of the coefficient for the percentage. higher socioeconomic status also hired more experi-
of disadvantaged minorities compared with the . enced, better qualified science teachers who tended
others is the result of differences in scaling: The . to employ these pedagogies.
percentage variable assumes values from 0 to 100,
while the other variables are coded as 0 or 1. In Phase 2, the models were fit using only teacher
characteristics and teacher pedagogies without
Phase 2 controlling for other variables. We examined only
The Phase 1 analyses were intended to answer the those teacher characteristics and teacher pedagogies
question of which pedagogical strategies make an that were retained in the final model of Phase 1. We
incremental contribution to accounting for differ- restrict our attention to the model in which both cat-
ences among students in science achievement after egories of variables are included. Figure 10 displays
differences among students regarding demographic the results for teacher characteristics, and Figures 11,
characteristics and home environments, as well as 12, and 13 display the results for teacher pedagogies.
teacher characteristics, have been taken into account. Note that we have retained the ordering in Figures 5,
As indicated at the outset, these findings should be 6, and 7.
Figure 9
Estimated* NAEP Science Score Losses Associated With School Characteristics, Grade 8, 2005
* Each estimate is adjusted for teacher characteristics, teacher pedagogical strategies, and student characteristics in the model.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
20
Figure 10 shows that both teacher characteristics are However, the coefficient for the differential teaching
statistically related to science performance. The re- experience is negative again, but substantially larger
gression coefficient for the teacher having a standard than it was in Phase 1.
certificate is nearly the same as it was in Phase 1.
Figure 10
Estimated Unadjusted NAEP Science Score Gains and Losses Associated with Teacher Characteristics,
Grade 8, 2005
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
Figure 11
Estimated Unadjusted NAEP Science Score Gains and Losses Associated with Teacher Pedagogical
Strategies, Grade 8, 2005
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
21
Figures 11, 12, and 13 reveal that all the retained . Thus, from a qualitative perspective, it appears .
pedagogical strategies are strongly related, in a . that prior adjustment for student demographics .
statistical sense, to science performance. As one . and home environment does not materially affect .
would expect, the magnitudes of the regression . the findings. What remains an issue is which set .
coefficients are larger, and sometimes considerably . of regression coefficients (i.e., those reported in
larger, than the corresponding coefficients in Phase 1 . Phase 1 or those in Phase 2) is closer to what one
(Figures 5, 6, and 7). The signs of the coefficients are . would find in a controlled experiment. There is no
the same. Moreover, for those characteristics associ- way to answer the question with the present data;
ated with multiple levels of response, the patterns in . however, one can speculate that the two sets of .
the regression coefficients are identical to those in . results bracket the coefficients that would be .
Phase 1. In general, the sizes of the coefficients ob- obtained under ideal conditions.
served in Phase 2 are comparable to the sizes of the .
coefficients observed in Phase 1 for student home
environment variables.
Figure 12
Estimated Unadjusted NAEP Science Score Gains and Losses Associated with Teacher Pedagogical
Strategies, Grade 8, 2005
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
22
Figure 13
Estimated Unadjusted NAEP Science Score Gains and Losses Associated with Teacher Pedagogical
Strategies, Grade 8, 2005
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
23
ethnicity) and one collective (school disadvantage). attending such schools is 139 and that 12 percent of
Note that for this analysis, we draw on the full set of Black students attend such schools. Examination of
143,412 students rather than the smaller number em- Table 4 reveals that:
ployed in the regression analyses (see Appendix A).19
• White and Asian/Pacific Islander students are
least likely to be enrolled in schools with the .
Table 4
most disadvantaged populations (3 percent and
Distribution of NAEP Science Scores and Eligibility for School 13 percent, respectively), while American Indian/
Lunch Program by Racial/Ethnic Group, Grade 8 Alaska Native, Hispanic and Black students are
the most likely to be enrolled in the most disad-
Percentage of Students in School Eligible for School Lunch
vantaged schools (41 percent, 36 percent, and 33
0-25% 26-75% 76-100% percent, respectively).
Racial/Ethnic Group
Score (%) Score (%) Score (%)
• For each racial/ethnic group, mean scores de-
White 166 (42) 155 (54) 143 (3) crease with increasing levels of disadvantage of
the school population. For White students, the
mean scores are 166, 155 and 143. The differ-
Black 139 (12) 125 (56) 114 (33)
ences in mean scores between the least and most
disadvantaged categories of schools range from
Hispanic 144 (12) 129 (52) 120 (36)
23 points (166 – 143) for Whites to 37 points (150
– 113) for American Indian/Alaska Natives.
Asian/Pacific Islander 168 (42) 149 (45) 138 (13)
• For a given level of school disadvantage, mean
American Indian/ scores are generally highest for White and Asian/
150 (9) 137 (50) 113 (41)
Alaska Native Pacific Islander students and lowest for Black
and Hispanic students. The patterns displayed in
Note: Entries represent reported average scaled score. Values in parentheses are Table 4 should be kept in mind for the remainder
percentages of students in the corresponding race/ethnicity classification enrolled of the discussion.
in schools with the indicated level of the proportion of the student population eligible
for the school lunch program.
Table 5 contains results for the strategy “reading a
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, science textbook.” The Phase 1 analysis showed that
National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
averaged over all racial/ethnic and school disadvan-
tage groups, scores increase with the frequency of
reading a science textbook. In each main panel of
To carry out this analysis, we selected five strate-
the table, there are four sets (rows) of entries, cor-
gies from the set we examined in Phase 1: two from
responding to the four different frequency categories.
group 1 (positive effects), one from group 2 (negative
Consider the top main panel, which presents data for
effects), and two from group 3 (quadratic, or “Gold-
White students. The first column in that panel cor-
ilocks,” effects). For each strategy, we disaggregated
responds to White students attending schools with no
the students in each cell of Table 4 by their reported
more than 25 percent of the students eligible for the
level of the strategy. For each such subclassification,
school lunch program. As indicated in Table 4, .
we display the mean science score and the corre-
the students in this group represent 42 percent of
sponding percentage. The results are found in Tables
all White students and have a mean score of 166. In
5 to 9.
Table 5, these students are further classified by the
The presentation begins with Table 4, which dis- frequency of the strategy. There are four response
plays the basic statistics for the cross-classification. categories. The first category is “never or hardly ever.”
Consider the left-most column, corresponding to Approximately 19 percent of the group falls in this
schools in which 0 percent to 25 percent of students category with a mean score of 158. The next category
are eligible for the school lunch program. The table is “1-2 times a month.” Approximately 20 percent of
entry for Black students in that column indicates that the group falls in this category with a mean score .
the average NAEP science score for Black students of 166. The third category is “1-2 times a week.” .
Approximately 36 percent of the group falls in this
19
Since the analyses reported here do not require a student-teacher match, it was decided to employ the full NAEP reporting sample in
exploring the patterns of interest.
24
category with a mean score of 169. The last category • The overall ordering of racial/ethnic groups by
is “almost every day.” Approximately 26 percent of mean scores is reflected in the ordering observed
the group falls in this category with a mean score . when students are cross-classified by both school
of 169. disadvantage and frequency of the strategy. For
example, consider those students in schools
There are a number of interesting patterns evident . where 26 percent to 75 percent of students are
in this table: eligible for the school lunch program and who fall
in the base category for this strategy. Mean scores
• First, the disaggregated data support the . are highest for White and Asian/Pacific Islander
findings in Phase 1; there do not appear to . students (146 and 139, respectively) and lowest
be any anomalies. for Black, Hispanic, and American Indian/Alaska
Native students (120, 122, and 124, respectively).
• Generally, the relationship between the frequency
of reading a science textbook and average scores
The principal finding of this examination is that the
is the same for each combination of race/ethnicity
trends in effects observed for this strategy in the
and school disadvantage. The lowest mean cor-
regression analyses (Phase 1) are reflected in the
responds to the base category “never or hardly .
results obtained when the data are disaggregated
ever” and rises with increasing frequency. The .
by a particular pair of student and school charac-
largest gap between categories is that between .
teristics. This adds to the credibility of the conclu-
the base category and the successive, higher-.
sion regarding the apparent efficacy of this strategy
frequency categories.
and argues for further investigation. Moreover, the
• Overall, this trend corresponds to the trend . results indicate that the achievement gap in science
observed in Figure 5. However, the differences . scores cannot be explained by differences in students
in mean scores between the base category and reading a science textbook. Thus, it is reasonable to
the others are rather larger than the sizes of the suggest that science teachers should make some use
corresponding regression coefficients in Figure 5. of science textbooks in their teaching.
Since the latter have been adjusted for the other
strategies included in the model, this comparison
suggests that most students tend to be exposed .
to either multiple supportive or multiple non-.
supportive strategies.
25
Table 5
Average NAEP Science Scores and Frequency of Students Who Report Reading a Science Textbook, by Eligibility for
School Lunch Program and Racial/Ethnic Group, Grade 8
Note: Table values represent reported average scaled score and the percentage of students in each school lunch eligibility category (in parentheses).
‡ means not reported.
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
26
Table 6
Average NAEP Science Scores and Frequency of Students Who Report Working with Others on a Science Activity
or Project, by Eligibility for School Lunch Program and Racial/Ethnic Group, Grade 8
Working with Others on Science Activity or Project
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
Table 6 contains the results for the strategy “working • The proportion of students in the base category
with others on a science/activity or project.” It has is similar for each combination of race/ethnicity
the same format as Table 5, except that here there . and school disadvantage.
are only two categories: “never or hardly ever” .
(base category) and “once a month to daily.” Again, the detailed analysis adds to the credibility .
of the apparent effectiveness of this strategy, yet .
Table 6 shows the following patterns. cannot account for the overall differences in mean
scores among racial/ethnic groups.
• For the majority of combinations of race/ethnicity
and school disadvantage, mean scores are gener- Table 7 displays the results for the strategy .
ally lower and sometimes substantially lower in . “students giving an oral science report.” There .
the base category (“never or hardly ever”). are three categories: “never or hardly ever” (base .
category), “1-2 times a month,” and “more often .
• These differences are larger but in the same direc- than 1-2 times a month.”
tion as the effect for this strategy in Figure 5.
27
Table 7
Average NAEP Science Scores and Frequency of Students Who Report Giving an Oral Science Report, by Eligibility for
School Lunch Program and Racial/Ethnic Group, Grade 8
Giving an Oral Science Report
Percentage Eligible for School Lunch
0-25% 26-75% 76-100%
Frequency
Score (%) Score (%) Score (%)
White 1-2 times a month 166 (33) 155 (31) 144 (30)
More often than 1-2 times a month 154 (4) 142 (5) 125 (8)
Black 1-2 times a month 137 (36) 126 (37) 117 (38)
More often than 1-2 times a month 123 (8) 112 (11) 106 (17)
Hispanic 1-2 times a month 143 (36) 129 (34) 121 (34)
More often than 1-2 times a month 126 (7) 113 (9) 105 (14)
Asian/Pacific Islander 1-2 times a month 167 (36) 147 (38) 141 (37)
More often than 1-2 times a month 154 (4) 129 (9) 114 (13)
American Indian/Alaska Native 1-2 times a month 152 (33) 136 (28) 115 (44)
More often than 1-2 times a month ‡(6) 120 (7) 110 (17)
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
28
• Since excessive use of this instructional strategy The mean score patterns revealed here mimic those
is more likely to be seen in more disadvantaged observed in Figure 13, though again the magnitudes
schools, and is associated with lower average of the gaps are somewhat greater here. Again, too,
scores, schools employing this strategy might be since Black and Hispanic students are more likely
advised to limit or curtail it. to attend schools with high proportions of disad-
vantaged students, the overall differences in mean
This detailed analysis reveals a score pattern that scores among racial/ethnic groups can be accounted
mimics the one seen in Figure 12, although the . for, in part, by the patterns evident in this table.
magnitudes of the differences are larger here. . Thus, schools may make progress in addressing the
Disadvantaged minority students are somewhat . achievement gap by encouraging teachers to include
more likely to report experiencing higher frequencies . demonstration once or twice a week.
of this strategy, corresponding to lower NAEP .
science scores. Table 9 contains the results for the strategy “students
discussing science in the news.” It has three catego-
Table 8 contains the results for the strategy “teacher ries: “never or hardly ever” (base category), “less
doing a science demonstration.” It has four catego- often than every day,” and “almost every day.” Higher
ries: “never or hardly ever” (base category), “1-2 times scores are associated with the moderate use of .
a month,” “1-2 times a week,” and “almost every day.” this strategy.
Scores are highest in the category “one or two times
a week” and lowest in the category “never or hardly
ever.” Thus students who experience this instruc-
tional pedagogy at intermediate frequencies are more
likely to score higher.
29
Table 8
Average NAEP Science Scores and Frequency of Students Who Report Their Teacher Doing a Science
Demonstration, by Eligibility for School Lunch Program and Racial/Ethnic Group, Grade 8
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
30
Table 9
Average NAEP Science Scores and Frequency of Students Who Report Discussing Science in the News, by Eligibility for
School Lunch Program and Racial/Ethnic Group, Grade 8
Student Discussing Science in the News
White Less often than every day 171 (48) 161 (43) 149 (39)
Black Less often than every day 145 (37) 130 (33) 119 (35)
Hispanic Less often than every day 149 (38) 133 (37) 123 (38)
Asian/Pacific Islander Less often than every day 174 (47) 152 (42) 142 (45)
American Indian/Alaska Native Less often than every day 152 (40) 143 (39) 116 (47)
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
In summarizing the data patterns in Table 9: • Black and Hispanic students were less likely to .
be exposed to the optimal level of this strategy.
• For nearly all combinations of race/ethnicity
and school disadvantage, the average scores for The mean score patterns revealed here mimic those
students reporting the middle category “less often observed in Figure 13, though, again, the magnitudes
than every day” are higher than those for students of the gaps are somewhat greater here. The propor-
reporting either of the other two categories. tions of White students in the middle (optimal) cat-
egory are somewhat larger than the proportions for
• The gaps in average scores typically fall in the
Black students and Hispanic students. Consequently,
range of 5 – 10 points.
the overall differences in mean scores among racial/
• In most cases, the base category is the . ethnic groups may be due, in part, to differences in .
modal category. the frequency of exposure to this strategy.
31
Conclusions and Implications
We undertook this analysis to identify and examine We conducted Phase 3 to examine how the appar-
the characteristics — school, student, teacher, and in- ently effective and ineffective pedagogies identified
structional — of the eighth-grade science classroom in the multivariate analyses are distributed across
that are statistically associated with performance students when the data are disaggregated by racial/
on the NAEP science assessment. Indeed, a number ethnic group and school disadvantage categories. .
of strong, interesting patterns were discovered. But We designed this phase to answer two questions: .
what is one to make of these findings? On the one (1) Are the relationships between instructional strate-
hand, NAEP is an unrivaled source of data; it pres- gies and NAEP science scores found in the aggregate
ents results from a large, nationally representative analyses replicated when viewed at the subgroup
sample of students. On the other hand, information level? And (2) Do different groups of students have
on instructional practices is derived from student re- differential access to effective (or ineffective) .
sponses, which can be affected by misunderstanding instructional strategies?
of the questions, recall bias, and so on. In addition,
limitations of the administrative context preclude The Phase 1 analyses revealed that student demo-
obtaining more fine-grained information relating to graphic characteristics had statistically significant
the nature and quality of the instructional practices. associations with achievement. Black and Hispanic
Finally, NAEP is an observational study, so that . students scored considerably lower than White .
statistical results cannot blithely be interpreted . students, and males scored higher than females. .
causally. Notwithstanding these caveats, the consis- English-language learners and students with .
tency across the different analyses conducted in this disabilities scored much lower than other students.
study suggests that these findings should be taken Analysis of students’ home characteristics revealed
seriously and could serve as the basis for further that students with many books in the home scored
research studies. considerably higher than students with fewer books;
and students who were absent frequently scored
Phase 1 examined the relationships among student much lower than other students. Finally, students
demographic, background, and home environment whose teachers held a standard certificate scored
characteristics and teacher instructional practices on slightly higher than other students and students
the one hand, and NAEP science scores on the other. whose teachers’ years of total experience exceeded
This phase of the analysis identified instructional their years of science experience scored slightly .
practices that were associated with differences in lower than other students.
NAEP science scores after differences among stu-
dents with respect to demographic characteristics From the Phase 1 and Phase 2 analyses, we identified
and home environment, as well as teacher character- three subgroups of teacher instructional practices
istics, have been taken into account, or controlled for. with distinct patterns of association between their
reported frequencies and NAEP science achievement.
To investigate the possibility that controlling for One group comprises strategies for which increasing
those background characteristics might underes- frequency is associated with higher average NAEP .
timate the relationships found in Phase 1, Phase 2 science scores:
was conducted using models that employed teacher
characteristics and instructional pedagogies without • Using a science textbook
controlling for other variables.20 Results of Phase 2
• Doing hands-on activities in science
indicated that the Phase 1 findings with regard to
the pattern of associations between instructional • Writing long answers to science tests .
strategies and NAEP science achievement were not and assignments
materially affected by prior adjustment for student
demographic and home environment variables. This • Talking about measurements and results .
provides further support for the hypothesis that from hands-on activities
teachers’ instructional practices do make a difference
• Working with others on a science activity .
in students’ success on the NAEP science assessment.
or project
20
As explained earlier, under- or overestimation can occur as a result of selection bias that is not captured by the variables incorporated
into the model.
32
A second group comprises instructional activities for deficit is most severe for students reporting giving
which increasing frequency is associated with lower an oral report more often than once or twice a
average scores: month. Black and Hispanic students and stu-
dents attending more disadvantaged schools were
• Students giving an oral science report more likely to experience higher frequencies of
this strategy. Since excessive use of this strategy
• Students using library resources for science
is more likely to be seen in more disadvantaged
schools, curtailing this practice may help to close
A final group is characterized by the “Goldilocks”
the achievement gap.
analogy. For these instructional strategies, higher
scores are associated with moderate frequency: • Teacher doing a science demonstration – .
Across all racial/ethnic and school disadvantage
• Students taking a science test groups, scores are lowest in the “never or hardly
ever” category and highest in the category of “one
• Teachers doing a science demonstration
or two times a week.” For all racial/ethnic groups,
• Students discussing science in the news students attending schools with high levels of
disadvantage are less likely to be exposed to the
• Students reading a book or magazine about science optimal use of this strategy (one or two times a
week). Thus, schools may make progress in clos-
• Students preparing a written science report
ing the achievement gap by encouraging teachers
For the Phase 3 analyses, we chose five instructional to employ this strategy to a moderate degree.
strategies from the multilevel analyses. For each one, • Discussing science in the news – Across all .
students are cross-classified by racial/ethnic group racial/ethnic and school disadvantage groups,
and the percentage of students in school who are scores are highest in the middle response .
eligible for the school lunch program (a measure of category, “less often than every day.” Black .
school disadvantage). and Hispanic students are less likely than other
students to fall into that category, suggesting .
• Reading a science textbook – Across all racial/ that the achievement gap may be due, in part, .
ethnic and school disadvantage groups, scores to differences in the frequency of exposure to .
increase with the frequency of reading a science this instructional strategy.
textbook. The percentage of students in the op-
timal categories is similar for each combination In this report, we have provided a view of the eighth-
of race/ethnicity and school disadvantage. Thus, grade science classroom in 2005 using HLMs, which
the analysis suggests that mean score differences are well-suited to this type of data. Our intent was to
among racial/ethnic groups cannot be accounted identify teacher characteristics and pedagogies that
for by differences in exposure to this instructional are strongly related to student science achievement.
strategy. These data suggest that it is reasonable We conducted three sets of analyses that yielded
to recommend that science teachers should make similar findings with regard to the relationships be-
some use of science textbooks in their teaching. tween instructional practices and NAEP eighth-grade
• Working with others on a science project – . science scores. With due regard to the usual cautions
Across all racial/ethnic and school disadvantaged attendant upon drawing policy conclusions from the
groups, scores increase with the frequency of analysis of cross-sectional data, the consistency of
working with others on a science project. Thus, the findings speaks to the plausibility of the results.
the analysis also indicates that mean score dif- Many of the findings are in line with the predictions
ferences among racial/ethnic groups cannot be one would make based on the arguments found in the
accounted for by differences in exposure to this National Academy of Sciences report Taking Science
instructional strategy. A reasonable inference . to School. Other findings are somewhat puzzling .
from these data is that science teachers make . and call for further investigation, not least because .
use of group work on science projects. of the methodological issues to which we previously
alluded. In any case, more detailed information on
• Students giving an oral science report – . how these practices are implemented in classrooms
Across all racial/ethnic and school disadvantage and information about the contexts in which they are
groups, scores decrease with the frequency of employed should be gathered to provide further evi-
students giving an oral science report. The score dence with regard to the efficacy of these strategies.
33
Appendix A: Data for HLM
The 2005 NAEP grade 8 science assessment contains e) Students whose teacher holds a degree below the
records for 148,595 students from more than 6,300 bachelor’s level (i.e., their response to “What is
schools. A total of 5,183 students who could not be the highest academic degree the teacher holds?”
reported because of the nature or severity of their was “High-school diploma” or “Associate’s de-
disability were first excluded. The analysis sample . gree/voc certificate”); multiple or omitted .
for HLM was obtained by excluding students sequen- (n = 1,039).
tially from the full reporting sample (143,412) as
indicated below: A total of 107,933 students were included in the .
reduced sample for regression analyses. Table A.1
a) Students from International Department of . shows the percentages of students by their demo-
Defense Schools (n = 1,473). graphic characteristics using the reduced sample,
along with comparisons to the percentages .
b) Students missing the “teacher match code” . reported in the NAEP Data Explorer (NDE) .
variable (n = 27,611). (http://nces.ed.gov/nationsreportcard/nde/).
34
Table A.1
Percentage of Students by Demographic Characteristics, NAEP Grade 8 Science, 2005
Student Demographic Characteristics Percentage in Reduced Sample Percentage in NDE
Gender
Male 50% 50%
Ethnicity
White 63% 61%
Asian/Pacific Islander
4% 4%
American/Pacific Islander
Unclassified 1% 1%
Title 1 participation
Title 1 25% N/A
Source: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2005 Science Assessment.
35
Appendix B: HLM Methodology
There are a number of approaches to conducting a g01 ,..., gQ1 are the regression coefficients associated
multilevel analysis of a large data set with numerous with the school level covariates W1 j ,..., WQj ;
potential explanatory characteristics. The approach
adopted for this report follows the one implemented u0 j is the random error in the level 2 equation, as-
in a study of charter schools.21 It is a variant of what sumed to be independently and normally distributed
is often referred to as stagewise regression. across schools with mean zero and variance t2 ;
The multilevel or hierarchical regression analysis was and g10 , ... , gP0 are constants denoting the common
implemented using the program HLM6-PV that is
values of the P regression coefficients across schools.
designed to accommodate the special features of the
For example, g10 is the common regression coefficient
NAEP database.22 To make these ideas more concrete,
associated with the first covariate in the level 1 model
consider the following model:
for each school.
Level 1: yij = b0 j + b1 j X1ij +...+ bPj XPij + eij (1) In the level 1 equation, HLM estimates an adjusted
mean for each school. In the level 2 equation, these
Level 2: b0 j = g00 + g01W1 j +...+ gQ1WQj + u0 j adjusted means are, in turn, regressed on the school-
level covariates. The level 1 regression intercept b0 j
b1 j = g10 is assumed to be a random variable that is regressed
.
. on the level 2 indicators, while the level 1 regression
.
coefficients b1 j ,…, bPj are all assumed to be common
bPj = gP0 values (non-stochastic). Therefore, the estimated
where i indexes students within schools, j . values of b1 j ,…, bPj will be fairly insensitive to the
indexes schools; inclusion of any level 2 covariates.
yij is the outcome for student i in school j; It should be noted that the criterion or dependent
variables in these analyses are the plausible values
X1 , ..., XP are P student- and teacher-related . generated during NAEP operational analyses. The
covariates, centered at their grand means, and . regression models that are used to create the plau-
indexed by i and j as above; sible values do not include predictor variables that
directly reflect the nested structure of the data, as do
b0 j is the mean for school j, adjusted for the . the HLMs. Consequently, there may be some bias in
covariates X1 , ..., XP ; the estimates of the HLM parameters. However, the
bias should be quite small, given the extensive set of
b1 j , ..., bPj are the regression coefficients for . predictors that are used in NAEP operations.
school j, associated with the covariates X1 , ..., XP ;
Determining appropriate weights to be employed
eij is the random error (i.e., residual term) in the in an HLM analysis is a complex matter. The gen-
level 1 equation, assumed to be independently and eral recommendation is to apply school weights to
normally distributed with mean zero and a common schools, and the student-within-school weights to
variance s2 for all students; students.23 Since the student-within-school weights
are fairly constant within each school, only school
W1 j ,..., WQj are school-related covariates; weights were incorporated at level 2 of the models .
in the Phase 1 and 2 analyses.
g00 is the intercept for the regression of the adjusted
school mean on school characteristics;
Henry Braun, Frank Jenkins, and Wendy Grigg, A Closer Look at Charter Schools Using Hierarchical Linear Modeling (NCES 2006-460), .
21
U.S. Department of Education, National Center for Education Statistics, Institute of Education Sciences, Washington, D.C.: U.S.
Government Printing Office, 2006.
Stephen Raudenbush et al., HLM6: Hierarchical Linear and Nonlinear Modeling, Lincolnwood, IL: Scientific Software International,
22
2004. See also, Stephen Raudenbush and Anthony S. Bryk, Hierarchical Linear Models: Applications and Data Analysis Methods (2nd Ed.),
Thousand Oaks, CA: Sage Publications, 2002.
Danny Pfefferman, Chris J. Skinner, David J. Holmes, Harvey Goldstein, and Jon R. Rasbash, “Weighting for Unequal Selection
23
Probabilities in Multilevel Models,” Journal of the Royal Statistical Society, Series B, 60 (1), pp. 23 – 40, 1998.
36
For this study, the characteristics are first organized At the last stage, the full set of school characteristics
into a small number of categories, each containing was entered into the school level of the two-level
substantively related characteristics. In this study, model, and three were retained in the model.
there are five such categories, described in the main
body of the report. Student attitude questions, such As a result of such model-fitting sequence in Phase 1, .
as attitudes toward science, were excluded from the NAEP science scores are first adjusted for both
analysis. Before conducting the multilevel analysis, student demographics and student home environ-
a systematic sequence of exploratory unweighted ment before the set of teacher pedagogy variables are
regression models was first fitted using the General introduced. Consequently, the estimated regression
Linear Models (GLM) methodology. The goal was . coefficients of the teacher pedagogy variables repre-
to select a set of candidate covariates for the more . sent the strength of the relationship between NAEP
intensive HLM analysis. SAS GLMSELECT proce- science scores and teacher pedagogy after taking
dure was used, with variable selection based on . account of (some of the) differences among students
stepwise regression and the Schwarz Bayesian . that are not influenced by the teacher. Finally, teacher
Information (SBC) criterion. characteristics and school characteristics are entered
into the model to provide further insight.
Variables emerging from the GLM analysis are .
entered into the regression by category, according . Ordinarily, the introduction of a new variable into a
to a predetermined order, based on both statistical regression model results in changes in the regression
considerations and interpretive goals. At each stage, coefficients of the variables already present. (The
the set of variables corresponding to a particular changes are due to the correlations between the new
characteristic is retained only if the regression . variable and the old ones.) That is certainly the case
coefficients associated with the variables in the set in this study. The observed changes, however, are
generally exceed a predefined statistical threshold rather modest. For example, when the home environ-
(i.e., significant at the 0.05 level). At the succeeding ment characteristics are introduced in stage 3, the
stage, the variables remaining from the previous coefficients of the student demographic variables
stage together with all the variables in the next are slightly reduced in magnitude but the signs and
category are entered into the regression. The process patterns are unaffected. This holds true for both
continues until the last category has been entered. demographic characteristics and home environment
characteristics at stage 4, when teacher pedagogy
As indicated earlier, categories of characteristics characteristics are entered into the model. Accord-
were introduced in a particular order. For the Phase ingly, the discussion in the text focuses on the results
1 analysis, a model with no explanatory character- obtained after the conclusion of stage 6, when all five
istics was run at the first stage in order to obtain a categories of characteristics have been entered and
preliminary decomposition of score variance into those with significant regression coefficients retained.
within-school and between-school components. The
full set24 of student characteristics was then entered Each HLM model estimated in Phase 1 also yields a
at the second stage of analysis. All but one of the re- decomposition of the total variance of NAEP scores
gression coefficients were statistically significant and into the fraction attributable to the differences
so all characteristics were retained for the next stage. among students within schools and the fraction
At the third stage, the full set of home environment attributable to differences among schools. These vari-
characteristics was entered as student-level covari- ance decompositions are a convenient summary in-
ates. Based on the sizes of the regression coefficients dicator of the success of the model in accounting for
and their associated p-values, seven characteristics the heterogeneity in student outcomes (NAEP science
were retained for the next stage. At the fourth stage, scores). Table B.1 presents the variance decomposi-
all teacher pedagogy characteristics were entered tions corresponding to the model sequence described
at the student level and 13 were retained for further previously. The fully unconditional model on the first
analysis. At the fifth stage, all the teacher characteris- row of the table yields the basic decomposition. The
tics were entered into the student level and only two total variance is simply the sum of the two displayed
characteristics were retained for the further analysis. components: 966=601+365. That is, about 62 per-
cent of the total variance (601/966) is attributable
24
Refer to Tables 2 and 3 for the full set of variables entered into the models at each stage.
37
to within-school heterogeneity, and about 38 percent between schools. Indeed, 80 percent of the variance
(365/966) is attributable to between-school heteroge- between school means is accounted for by measured
neity. For the succeeding models, the numbers in . differences among students with respect to demo-
the fourth and sixth columns of the table represent . graphics and home environment. The inclusion .
the percentage reduction in the residual variance at of teacher pedagogy covariates results in a further .
the corresponding level of the model, treating the . reduction of 6 percent in within-school variance .
residual variance in the fully unconditional model . and of 4 percent in between-school variance. When
as the baseline. viewed, instead, in terms of the reduction in the .
residual variance of the previous model, the corre-
Note that including covariates at Level 1 yields a great- sponding percentages are 8 and 20. These are .
er reduction in the between-school variance than in substantial and lend support to the hypothesis .
the within-school variance. This is a common finding that teacher instructional practices do have an .
in such analyses and is explained by the fact that there impact on student science achievement.
is greater student homogeneity within schools than .
Table B.1
Variance Decomposition for NAEP Science Scale Scores in Phase 1 Analysis, NAEP Grade 8, 2005
Student demographics
None 438 27 72 80
† Home environment
Student demographics
† Home environment None 404 33 57 84
† Teacher pedagogy
Student demographics
† Home environment
None 404 33 55 85
† Teacher pedagogy
† Teacher background
Student demographics
† Home environment School
404 33 50 86
† Teacher pedagogy characteristics
† Teacher background
† Not applicable.
38
Appendix C: Standard Errors and Significance Tests for Figures 3 to 13
Race/Ethnicity1
1
The reference group is White.
Note: Standard errors of the estimates appear in parentheses.
Talking about things studied in school with family at least once a week 1.65 (0.44) .00
1
The reference category is 0-25 books.
2
The reference category is “none.”
Note: Standard errors of the estimates appear in parentheses.
39
Data for Figure 5
Estimated Regression Coefficients for Teacher Pedagogical Strategies, Level 1 Variables, NAEP Grade 8 Science, 2005
Teacher Pedagogical Strategies Regression Coefficient p Value
Student reading a science textbook 1
Student talking about the measurements and results from hands-on activities or investigations 1.77 (0.59) .00
1
The reference category is “never or hardly ever.”
2
The reference category is “teacher not getting any of the resources needed.”
Note: Standard errors of the estimates appear in parentheses.
1
The reference category is “never or hardly ever.”
Note: Standard errors of the estimates appear in parentheses.
40
Data for Figure 7
Estimated Regression Coefficients for Teacher Pedagogical Strategies, Level 1 Variables, NAEP Grade 8 Science, 2005
Teacher Pedagogical Strategies Regression Coefficient p Value
Student taking a science test 1
1
The reference category is “never or hardly ever.”
Note: Standard errors of the estimates appear in parentheses.
Years of teaching experience greater than years of science teaching experience -1.35 (0.45) .00
41
Data for Figure 9
Estimated Regression Coefficients for School Characteristics, Level 2 School Variables, NAEP Grade 8 Science, 2005
School Characteristics Regression Coefficient p Value
Percentage of minority students (Black and Hispanic) -0.06 (0.01) .00
1
The reference group is 0-25%.
2
The reference region is Northeast.
Note: Standard errors of the estimates appear in parentheses.
Years of teaching experience greater than years of science teaching experience -3.84 (0.77) .00
Student talking about the measurements and results from hands-on activities or investigations 2.83 (0.61) .00
1
The reference category is “never or hardly ever.”
2
The reference category is “teacher not getting any of the resources needed.”
Note: Standard errors of the estimates appear in parentheses.
42
Data for Figure 12
Estimated Regression Coefficients for Teacher Pedagogical Strategies, Level 1 Covariates, NAEP Grade 8 Science, 2005
Teacher Pedagogical Strategies Regression Coefficient p Value
Student giving an oral science report 1
1
The reference category is “never or hardly ever.”
Note: Standard errors of the estimates appear in parentheses.
1
The reference category is “never or hardly ever.”
Note: Standard errors of the estimates appear in parentheses.
43
44
About ETS
10293