
The Predictive Validity of Dynamic Assessment


A Review
Erin Caffrey, Douglas Fuchs, and Lynn S. Fuchs
Peabody College of Vanderbilt University

The Journal of Special Education, Volume 41, Number 4, Winter 2008, 254-270. © 2008 Hammill Institute on Disabilities. DOI: 10.1177/0022466907310366. http://journalofspecialeducation.sagepub.com, hosted at http://online.sagepub.com

The authors report on a mixed-methods review of 24 studies that explores the predictive validity of dynamic assessment (DA). For 15 of the studies, they conducted quantitative analyses using Pearson's correlation coefficients. They descriptively examined the remaining studies to determine if their results were consistent with findings from the group of 15. The authors implemented analyses in five phases: They compared the predictive validity of traditional tests and DA, compared two forms of DA, examined the predictive validity of DA by student population, investigated various outcome measures to determine whether they mediate DA's predictive validity, and assessed the value added of DA over traditional testing. Results indicated superior predictive validity for DA when feedback is not contingent on student response, when applied to students with disabilities rather than at-risk or typically achieving students, and when independent DA and criterion-referenced tests were used as outcomes instead of norm-referenced tests and teacher judgment.

Keywords: dynamic assessment; interactive assessment; predictive validity; disabilities

The purposes of educational assessment are to evaluate current school achievement, predict future achievement, and prescribe educational treatments. Traditional one-point-in-time assessments or pretest-posttest assessments have been used to accomplish these aims because the testing is standardized, easily administered, and norm referenced. Traditional tests tend to produce clear-cut results that are used to evaluate, identify, and classify children. Nevertheless, these tests have been criticized for underestimating general ability (Swanson & Lussier, 2001) and lacking sensitivity toward both so-called disadvantaged students (e.g., Peña, Quinn, & Iglesias, 1992; Utley, Haywood, & Masters, 1992) and students with disabilities (Lidz, 1987). The scores of low-achieving students on traditional tests are often difficult to interpret because of floor effects. That is, many unskilled kindergartners and first graders, when given a traditional reading test such as the Word Identification and Word Attack subtests of the Woodcock Reading Mastery Tests-Revised, obtain a score of zero. Is such a score indicative of an unskilled reader not yet ready to acquire beginning reading skills, or does it signal a currently unskilled reader ready to learn after pertinent instruction? Dynamic assessment (DA), an alternative to traditional testing, may be capable of distinguishing between these two types of nonreaders.

An Alternative to Traditional Testing


DA has been variously described as learning potential assessment (e.g., Budoff, Gimon, & Corman, 1976; Budoff, Meskin, & Harrison, 1971), mediated learning (e.g., Feuerstein, Rand, & Hoffman, 1979), testing the limits (Carlson & Wiedl, 1978, 1979), mediated assessment (e.g., Bransford, Delclos, Vye, Burns, & Hasselbring, 1987), and assisted learning and transfer by graduated prompts (e.g., Campione, Brown, Ferrara, Jones, & Steinberg, 1985). Across its variants, DA differs from traditional testing in terms of the nature of the examiner-student relationship, the content of feedback, and the emphasis on process rather than on product (Grigorenko & Sternberg, 1998). In traditional testing, the examiner is a neutral or objective participant who provides standardized directions and does not typically provide performance-contingent feedback. The DA examiner, by contrast, not only gives performance-contingent feedback but offers instruction in response to student failure to alter or enhance the student's achievement. Put differently, traditional testing is oriented toward the product of student learning (i.e., level of performance), whereas the DA examiner's interest is both in the product and in the process (i.e., rate of growth) of student learning.

Some researchers claim that DA's twin focus on the level and rate of learning makes it a better predictor of future learning. Consider the child who enters kindergarten with little background knowledge. She scores poorly on traditional tests, but during DA, she demonstrates intelligence, maturity, attention, and motivation, and she learns a task, or a series of tasks, with relatively little guidance from the examiner. Because of this, and in spite of her performance on traditional tests, she is seen as in less danger of school failure than her classmates who score poorly on both traditional tests and DA. So, DA may correctly identify children who seem at risk for school failure but who, with timely instruction, may respond relatively quickly and perform within acceptable limits. Data from DA may also identify the type and intensity of intervention necessary for academic success. DA incorporates a test-teach-test format, conceptually similar to responsiveness-to-intervention techniques. However, as we will discuss later, DA can potentially measure one's responsiveness within a much shorter time frame.

Authors' Note: Address correspondence to Douglas Fuchs, Peabody #328, 230 Appleton Place, Nashville, TN 37203; e-mail: doug.fuchs@vanderbilt.edu.

Clinically Oriented DA Versus Research-Oriented DA


Over time, DA has evolved into two branches of study: clinically oriented DA and research-oriented DA. Clinically oriented DA began as an educational treatment to remediate cognitive deficiencies presumed to cause learning problems. Its most well-known operationalization is Feuerstein's Learning Potential Assessment Device (LPAD). The LPAD is a nonstandardized method of assessing and treating the cognitive deficiencies of children with learning problems. Treatment duration can last many years (Rand, Tannenbaum, & Feuerstein, 1979). Research-oriented DA, by contrast, originated as an assessment tool. It typically involves a standardized assessment during which the examiner guides a student's learning in a single session. The time required for the student to reach mastery, or the necessary level of instructional explicitness to advance the student, serves as an index of the student's learning potential. Researchers and practitioners have used this form of DA to identify students who may require more intensive intervention and to place them in settings where such interventions can be implemented.

Three concerns are typically expressed about DA: Namely, its construct is fuzzy, its technical characteristics are largely unknown, and its administration and scoring are labor intensive. First, construct fuzziness (Jitendra & Kameenui, 1993) refers to when DA's theory, purpose, procedures, and uses are unclear. Fuzziness occurs, for example, when, at a most general level, researchers fail to distinguish for their audience between clinically oriented and research-oriented DA. As a second example, a major purpose of clinically oriented DA, as just indicated, is to remediate deficient cognitive processes that appear to contribute to learning problems. However, the procedures of clinically oriented DA are generally nonstandardized and require the examiner's insight and expertise to assess learning problems and adapt intervention.

Second, the extant literature does not typically report the reliability and validity of DA measures. This stems partly from a deliberate rejection of standardized procedures by some researchers. Many advocates of clinically oriented DA believe standardization contradicts its spirit and theoretical orientation (e.g., Feuerstein, 1979). A standardized approach, they say, would fail to provide truly individualized intervention in response to student failure. Proponents of research-oriented DA, by contrast, believe standardization and technical adequacy are necessary to make it a worthwhile tool for research and practice (e.g., Bryant, Brown, & Campione, 1983; Ferrara, 1987; Swanson, 1994). These two views of standardization and DA are reflected in the nature of feedback offered during clinically oriented and research-oriented DA. In clinically oriented DA, examiners tend to frequently change how they teach to determine the type of intervention with which the student is most successful. In research-oriented DA, examiners typically change how much they teach and the level of explicitness of their teaching rather than the intervention. So, in essence, practitioners of clinically oriented DA use an ever-changing process to maximize student achievement, whereas those using research-oriented DA attempt to assess student achievement in response to a more standardized intervention.


Third, critics have suggested that the time required to develop protocols and train examiners may not be worth the information DA provides. Traditional tests already exist, and preparing examiners to use them is relatively straightforward. DA protocols have been around for decades, too, but because of inadequate information about their psychometric properties, more time may be needed to establish their validity and utility. Again, this criticism may be better understood by contrasting the two types of DA. Clinically oriented DA involves relatively little development time because scripted protocols are rarely used. Insight and expertise are essential, and student responsiveness to instruction is relatively dependent on the specific educator providing the help. Conversely, research-oriented DA requires a laborious process of protocol development because the protocols must be standardized (and possibly norm based) on the target population. At the same time, the demand for practitioner insight and expertise is less. Because procedures are standardized, practitioners can be trained in about the time it takes to train examiners in traditional testing.

Is There a Need for DA?

Currently, few proponents of DA, including clinicians and researchers, believe it is a viable alternative to traditional testing. Rather, many would say DA should not replace traditional testing but should be used in conjunction with it (e.g., Lidz, 1987). The question then becomes, What unique information can DA provide? First, DA may represent a less-biased measure of school achievement for certain student groups because it is less dependent on mainstream language skills and background experience (e.g., Peña et al., 1992; Sewell, 1979; Sewell & Severson, 1974). As we suggested earlier, it may be especially useful in recognizing readiness to learn among low-achieving students because, unlike many traditional tests, it does not suffer from floor effects. Also, items on most traditional tests are scored right or wrong, reflecting an all-or-nothing perspective. DA, by contrast, gives multiple opportunities for success, and low-achieving students' performances can be measured along a continuum of how easily they learn. Second, clinically oriented DA may inform instruction so that educational interventions can be more readily designed (e.g., Feuerstein, 1979; Haywood, 1992). Third, research-oriented DA has the potential to predict future student achievement because it attempts to measure both level of performance and rate of growth. Presumably, those who learn with greater ease during DA will benefit more from classroom instruction and achieve at a higher level.

Purpose of Review

Several extensive reviews of DA are available in the literature. Grigorenko and Sternberg (1998) offer a comprehensive descriptive review of types of DA based on their comparative informativeness, power of prediction, degree of efficiency, and robustness of results. Yet, no quantitative syntheses were conducted, and DA's predictive validity was not systematically analyzed by type of feedback, population, or achievement criterion. Swanson and Lussier (2001) conducted a selective quantitative synthesis of DA. They used effect sizes and mixed regression analyses to model responsiveness to DA and found that the magnitude of effects was best predicted by type of DA and domain. Whereas they focused on differences between ability groups and effectiveness of various types of DA, they did not pursue issues of validity.

This review focuses on the predictive validity of DA. Prediction of future achievement is important because it can identify students at risk for school failure and in need of more intensive intervention. Students enter school with different cognitive strengths and weaknesses, different home and community experiences and expectations, and different levels of prior education. These capacities, experiences, and expectations result in various levels of academic competence and readiness. Traditional testing has been criticized for its limited ability to estimate accurately a student's potential for change. It is possible that DA, in conjunction with traditional testing, may provide a more accurate estimate of a student's potential for change and likelihood of school success and inform planning for appropriate instruction.

As we write, there is strong interest in responsiveness to intervention as a substitute for, or an adjunct to, traditional testing to identify at-risk and special-needs students. Most current thinking about and formal operationalizations of responsiveness to intervention require a 10-week to 30-week instructional period. DA may be viewed as a response-to-intervention (RTI) process, too. Its possible advantage is that instructional responsiveness may be determined in a single assessment session. However, much still needs to be understood in this regard about DA's psychometric properties and about the operationalizations of responsiveness to intervention (e.g., Fuchs & Fuchs, 2006; Fuchs, Fuchs, & Compton, 2004).

Method
Definitions
As we indicated earlier, no single definition of DA exists. In this review, DA refers to any procedure that examines the effects of deliberate, short-term, intervention-induced changes on student achievement, with the intention of measuring both the level and rate of learning. In addition, for purposes of our review, DA must provide corrective feedback and intervention in response to student failure. Whereas, as discussed, DA is used for many purposes (e.g., to measure current achievement, to predict future achievement, to inform intervention), this review is concerned primarily with its predictive validity, that is, determining how well DA predicts future student achievement.

Inclusion Criteria

Four criteria were used to select articles for this review. First, selected articles were published in English. Several relevant programs of DA research have been published in Russian (e.g., Ginzburg, 1981; Goncharova, 1990), German (e.g., Carlson & Wiedl, 1980; Guthke, 1977; Wiedl & Herrig, 1978), and Dutch (e.g., Hamers, Hessels, & Van Luit, 1991; Hamers & Ruijssenaars, 1984). A subset of these authors published their original studies in English (Hessels & Hamers, 1993; Meijer, 1993; Resing, 1993; Tissink, Hamers, & Van Luit, 1993), which are included in this review. If only secondary reports were available in English (e.g., Flammer & Schmid, 1982; Hamers & Ruijssenaars, 1984), the studies were excluded. Second, the articles selected included participants enrolled in preschool through high school. A study by Shochet (1992), for example, was excluded for using South African college students. Third, articles in the review included (a) students with high-incidence disabilities, (b) students at risk for school failure due to cultural or economic disadvantage, (c) second-language learners, or (d) normally achieving students. Studies involving students with low-incidence disabilities, such as sensory impairments, were not included (e.g., Dillon, 1979; Tellegen & Laros, 1993). Fourth, included articles described studies in which reported data could be used to examine DA's predictive validity. Studies of only concurrent validity or only construct validity were excluded (e.g., Campione et al., 1985). To examine predictive validity, included studies compared students' levels of performance on a DA measure to their levels of performance on an academic achievement measure at some point in the future or to their future educational identification or classification. Studies that operationalized DA as an educational treatment were excluded (e.g., Feuerstein et al., 1981; Feuerstein, Rand, Hoffman, Hoffman, & Miller, 1979; Muttart, 1984; Rand et al., 1979; Savell, Twohig, & Rachford, 1986). In these studies, there were no data of a predictive nature. Finally, those studies that operationalized DA as different conditions of behavioral reinforcement (i.e., praise, candy, reproof) were excluded (e.g., Kratochwill & Severson, 1977).

Search Procedure

ERIC, PsycInfo, and ECER were searched for the terms dynamic assessment or interactive assessment or learning potential or mediated assessment. From this search, the first author identified the major contributors to the study of DA (e.g., Feuerstein & Budoff) and discovered a special issue of The Journal of Special Education devoted to the topic. In his introduction to this special issue, Haywood (1992) identified the individuals who engaged in the groundbreaking research in the DA field: Feuerstein, Rand, and Hoffman (1979); Feuerstein, Haywood, Rand, Hoffman, and Jensen (1986); Haywood and Tzuriel (1992); and Lidz (1987, 1991). Articles by these authors were searched for potential studies of predictive validity. In addition, two comprehensive reviews by Grigorenko and Sternberg (1998) and Swanson and Lussier (2001) were consulted. From these resources, articles were collected that were described as addressing the validity of DA or that had titles indicating that the validity of DA was addressed. Finally, a second search was conducted of ERIC, PsycInfo, and ECER with the terms dynamic assessment or interactive assessment or learning potential or mediated learning and predictive validity to ensure that the collected studies represented most of what was available. A total of 24 studies were identified during the search. (In the References section, these studies are indicated by an asterisk.)


Analysis
The data in the 24 studies were analyzed with regard to four dimensions. First, we compared traditional testing and DA with respect to the magnitude of their respective correlation coefficients with an achievement criterion. Second, two forms of DA (with contingent feedback and with noncontingent feedback) were compared. Contingent feedback refers to a process by which an examiner responds to student failure with highly individualized, nonstandardized intervention. Noncontingent feedback requires an examiner to respond to student failure with standardized interventions, regardless of the error, or errors, committed. Type of feedback was analyzed because, arguably, it speaks to the nature of classroom instruction. In classrooms with a standard nondifferentiated instructional approach, students would most likely receive noncontingent feedback, whereas in a classroom with more of an individualized approach, students would likely receive more contingent feedback. Third, the predictive validity of DA was analyzed across four populations: mixed-ability groups, typically achieving students, students at risk or disadvantaged but not disabled, and students with disabilities. Second-language learners were classified as at risk or disadvantaged. To use DA as a tool for identification, it is especially important that the predictive validity be strong for at-risk students and students with disabilities because these students are particularly susceptible to the floor effects of traditional tests, discussed earlier. Fourth, the nature of the achievement criterion was analyzed to determine whether DA could best predict (a) independent performance on the DA posttest (referred to as posttest DA), (b) norm-referenced achievement tests, (c) criterion-referenced achievement tests, or (d) teacher judgment. Posttest DA is the score on the DA measure given at the end of the study. It is the same measure given at the beginning of the study, but the administration is different. For posttest DA, the examiner does not offer corrective feedback. The posttest DA measure represents independent student performance on identical content measured by the pretest DA. Norm-referenced achievement tests are any commercially available assessments of achievement. Criterion-referenced achievement tests are researcher-designed assessments to measure the same construct as explored by the DA administered in the study. Teacher judgment is a rating of the student's achievement in the classroom.

Table 1 provides demographic information on study participants in each of the 24 studies (i.e., number of participants and age of participants). In addition, Table 1 classifies the studies into categories for analysis (i.e., type of feedback [contingent vs. noncontingent], population, and achievement criterion).

After analyzing the studies along these four dimensions, we also determined the value added of DA, over and above traditional testing. This was accomplished by finding studies in which researchers used forced-entry multiple regression to investigate how much variance DA could explain after the variance due to traditional testing was explained. If DA explains significant added variance, it may be worth the time and effort to develop new protocols and use them for identification and placement.

Mixed methods were used to explore the data. Pearson's correlation coefficients were used as an indicator of prediction strength, and the coefficients served as a common metric across 15 of the 24 studies. If multiple correlations were reported, the appropriate correlations were averaged to provide only one correlation coefficient per analysis category per study. For example, if DA with contingent feedback was used to predict both math and reading performance, the two correlations were averaged to calculate one correlation for the contingent versus noncontingent analysis category. Authors of the 9 studies not reporting Pearson's correlation coefficients used various group designs and single-subject designs that produced data not directly comparable to Pearson's correlation coefficient. Nevertheless, the information in these 9 studies was considered valuable because of the few investigations exploring DA's predictive validity. Hence, we provide descriptions of their methods and outcomes in the narrative. Because studies reporting Pearson's correlation coefficients were included in the aggregated data (see Table 2), we do not describe them in the narrative.

Significance testing between average correlation coefficients was not possible due to small samples and correspondingly low statistical power. Instead, we discuss trends in the magnitude and direction of the coefficients. Table 2 presents the 15 relevant studies and associated correlation coefficients along the four dimensions: DA versus traditional testing, contingent feedback versus noncontingent feedback, population (mixed-ability groups vs. typically achieving students vs. students who are at risk or disadvantaged but not disabled vs. students with disabilities), and achievement criterion (posttest DA vs. norm-referenced tests vs. criterion-referenced tests vs. teacher judgment).
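To make the aggregation concrete, the following minimal sketch (in Python, with hypothetical correlation values rather than data from the reviewed studies) illustrates the procedure described above: several correlations reported within one study are collapsed to a single coefficient for an analysis category, and the per-study coefficients are then averaged to give the common metric summarized in Table 2.

```python
# Illustrative sketch with hypothetical values; study names and correlations
# are placeholders, not data from the review.
from statistics import mean

# E.g., one study reported DA-with-contingent-feedback correlations with both
# math and reading outcomes; these are first averaged to one value per study.
study_correlations = {
    "Study A": [0.45, 0.39],        # math, reading
    "Study B": [0.62],
    "Study C": [0.51, 0.58, 0.47],
}

per_study = {name: mean(rs) for name, rs in study_correlations.items()}
category_average = mean(per_study.values())   # one entry per study, as in Table 2

print(per_study)
print(f"average correlation for this category: {category_average:.2f}")
```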


Table 1
Demographic Characteristics of Participants and Study Characteristics Used in Analysis

[Table 1 reports, for each of the 24 reviewed studies, the number of participants, chronological age or grade, type of feedback (contingent vs. noncontingent), population (mixed-ability group, normally achieving, at-risk/disadvantaged, or students with disabilities), and achievement criterion (teacher judgment, posttest DA score, norm-referenced test, or criterion-referenced test).]

Note: A dash indicates information not reported. DA = dynamic assessment.


Table 2
Average Correlation per Study Within Analysis Categories Between Dynamic Assessment (DA) and Student Achievement

[Table 2 presents, for each of the 15 studies reporting Pearson's correlation coefficients, the average correlation between DA and the achievement criterion, organized by the four analysis dimensions: DA versus traditional testing, feedback (contingent vs. noncontingent), population (mixed-ability group, typically achieving, at-risk/disadvantaged, disability), and achievement criterion (posttest DA score, norm-referenced test, criterion-referenced test, teacher judgment).]

Note: A dash indicates information not reported. Studies for which no correlations appear did not report correlations or provide data with which we could calculate them. Findings from these studies are reported in the narrative of our review.


Findings
DA Versus Traditional Testing
As indicated, correlations between DA measures and achievement measures were reported in 15 of the 24 studies, and correlations between traditional testing and achievement measures were also reported in the same 15 studies (Babad & Budoff, 1974; Bryant, 1982; Bryant et al., 1983; Day, Engelhardt, Maxwell, & Bolig, 1997; Ferrara, 1987; Hessels & Hamers, 1993; Lidz, Jepsen, & Miller, 1997; Olswang & Bain, 1996; Rutland & Campbell, 1995; Sewell, 1979; Sewell & Severson, 1974; Spector, 1992; Speece, Cooper, & Kibler, 1990; Swanson, 1995; Tissink et al., 1993). The average correlation between DA and achievement measures was .49. The average correlation between traditional testing and achievement measures was .41. Correlations equal to or greater than .40 are considered by some scholars to be large (Cohen, 1977, 1988; Lipsey & Wilson, 2000). In the prediction of academic achievement, however, these correlations seem modest. Pearson's correlation coefficients do not consider the shared variance between traditional and dynamic measures, and it is impossible to determine the unique predictive ability of traditional or dynamic measures by these correlations (Lipsey & Wilson, 2000).

As indicated, 9 additional studies investigated the predictive validity of DA without reporting Pearson's correlation coefficients (Bain & Olswang, 1995; Budoff et al., 1976; Budoff et al., 1971; Byrne, Fielding-Barnsley, & Ashley, 2000; Meijer, 1993; Peña et al., 1992; Resing, 1993; Samuels, Killip, MacKenzie, & Fagan, 1992; Swanson, 1994). These studies were grouped into three categories according to their design and analysis: single-subject design with visual analysis (Bain & Olswang, 1995), quasi-experimental design with multiple regression analysis (Budoff et al., 1976; Byrne et al., 2000; Meijer, 1993; Resing, 1993; Swanson, 1994), and experimental design with between-group comparisons (Budoff et al., 1971; Peña et al., 1992; Samuels et al., 1992).

Single-subject design with visual analysis. Bain and Olswang (1995) studied the validity of DA to predict future speech growth in a sample of 15 preschoolers with specific language impairment.

Data were displayed on two scatterplots. The first scatterplot displayed participants based on their weighted DA scores for both semantic and functional relations against their changes in mean length of utterance during the 9-week study. Results indicated that the weighted DA score accurately predicted change in rate of learning for 12 of the 15 participants. The second graph plotted participants' weighted DA scores only for semantic relations against their changes in mean length of utterance. Results indicated that the weighted DA scores accurately predicted the changes in rate of learning for all 15 participants. That is, those with the highest weighted DA scores showed the greatest gains in speech.

Quasi-experimental design with multiple regression analysis. Budoff et al. (1976), Byrne et al. (2000), Meijer (1993), Resing (1993), and Swanson (1994) used multiple regression analyses to study the unique predictive ability of DA over and above traditional assessment. All of these studies used some form of verbal and quantitative achievement as criteria to determine predictive validity. Budoff et al. found mixed results with a population of disadvantaged students: DA was significantly better than traditional testing in the prediction of nonverbal/quantitative achievement; however, patterns of prediction for verbal measures were inconsistent. Although DA scores were a statistically significant predictor of one of the four verbal measures, traditional measures (e.g., IQ) and demographic information (e.g., age) were generally more consistent predictors.

By contrast, Byrne et al. (2000), Meijer (1993), and Resing (1993) showed that DA made a significant and consistent contribution to the prediction of achievement. Byrne et al. used a DA procedure called session of last error to predict future phonemic awareness and reading achievement. Session of last error is a measure of the rate of reading progress throughout the study. It is closer to the current operationalization of RTI than it is to DA because it tracks student achievement for several weeks. The faster students reached mastery, the earlier their session of last error. Byrne et al. (2000) studied the reading achievement of a cohort of children in kindergarten and conducted follow-up tests in second and fifth grade. Byrne and his colleagues performed a series of multiple regression analyses on achievement in kindergarten, second grade, and fifth grade.


In each of the analyses, the posttest traditional score was entered first into the equation. Session of last error was entered as the second predictive variable. In all cases, the session of last error was a significant predictor of achievement above and beyond the traditional posttest score. It explained from 9% to 21% of the total variance.

Meijer (1993) performed a similar analysis on the math achievement of a mixed-ability group of secondary students. First, a traditional measure of initial math achievement was entered into the multiple regression, which accounted for 11% of the variance in achievement. Second, a DA measure was added as a predictor, and it accounted for an additional 13% of the variance. Similarly, Resing (1993) found that, after controlling for verbal IQ, the combination of two dynamic measures (number of hints required to solve a problem and number of items requiring help) predicted an additional 13% of the variance in verbal achievement, 18% of the variance in math achievement, and 14% of the variance in teacher ratings of school performance for primary students with disabilities.

Swanson (1994) conducted two separate multiple regression analyses on a mixed-ability group of primary students. In the first analysis, the initial traditional score was entered before dynamic variables. For reading achievement, the initial traditional score explained 11% of the total variance, and a combination of dynamic scores explained an additional 19%. For math achievement, the initial traditional score explained 20% of the total variance, and a processing stability score (initial score minus maintenance score) explained an additional 12%. DA did not explain unique variance in math achievement. In the second regression analysis, all variables were allowed to compete against each other. For reading achievement, three DA measures (gain score, probe score, and maintenance score) were found to be the best predictors of achievement, explaining a total of 34% of the variance. For math achievement, only one DA measure (gain score) was a significant predictor of achievement, explaining 32% of the variance. The ability of DA to predict future achievement, therefore, may depend on what domain of achievement is being predicted and whether initial traditional scores are entered as the first variable in a multiple regression.
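The sketch below (hypothetical, simulated data; not values from Meijer, 1993, Swanson, 1994, or any reviewed study) illustrates the forced-entry logic these analyses share: a traditional score is entered in a first step, a DA score is added in a second step, and the increase in R-squared is read as DA's value added. It assumes the statsmodels library is available.

```python
# Illustrative forced-entry (hierarchical) regression with simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
traditional = rng.normal(size=n)                      # e.g., a static pretest or IQ score
da_score = 0.5 * traditional + rng.normal(size=n)     # DA overlaps with, but is not identical to, the static score
achievement = 0.4 * traditional + 0.4 * da_score + rng.normal(size=n)

# Step 1: traditional score only.
X1 = sm.add_constant(np.column_stack([traditional]))
step1 = sm.OLS(achievement, X1).fit()

# Step 2: traditional score plus DA score.
X2 = sm.add_constant(np.column_stack([traditional, da_score]))
step2 = sm.OLS(achievement, X2).fit()

delta_r2 = step2.rsquared - step1.rsquared   # unique variance attributed to DA
f_test = step2.compare_f_test(step1)         # (F, p, df_diff) for the increment

print(f"R2 step 1 = {step1.rsquared:.2f}, R2 step 2 = {step2.rsquared:.2f}, delta R2 = {delta_r2:.2f}")
print("increment F test:", f_test)
```

In this framing, a reliable, nontrivial delta R2 is what the studies above report as DA's unique contribution beyond traditional testing.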

Experimental design with between-group comparisons. Three studies investigated the predictive validity of DA with experimental methods (Budoff et al., 1971; Peña et al., 1992; Samuels et al., 1992). Budoff et al. studied DA's utility in predicting the response to a classroom science curriculum for low-achieving students in Grades 7 through 9. Even after IQ was factored out, performance on DA predicted which students would respond positively to the science curriculum. That is, students who initially scored higher on DA or students who improved throughout the administration of DA tended to learn more than students who scored lower on DA and showed no improvement during its administration.

Peña et al. (1992) used DA to differentiate Spanish-speaking preschool students with language disorders from nondisabled Spanish-speaking students who had poor English skills. Peña and her colleagues developed a measure of learning potential called the modifiability index. Results indicated that students with a language disorder had a significantly lower modifiability index than did nondisabled students. In addition, students with a higher modifiability index demonstrated more gain in single-word vocabulary over the course of the school year. Peña et al. concluded that static measures alone would overidentify Spanish-speaking students for special education placement, but DA distinguished students with language disorders from nondisabled students.

Another potential use of DA is informing educational placement. Samuels et al. (1992) studied DA with regard to its prediction of regular versus special education placement of students after preschool. DA significantly predicted educational placement (general vs. special). Results also indicated that placement could not be predicted on the basis of a traditional receptive vocabulary measure (Peabody Picture Vocabulary Test-Revised). Samuels et al. concluded that traditional assessment alone could not fully capture the potential of a student and that DA may be an important tool for placement and programming decisions.

Summary. DA and traditional tests correlate similarly with future achievement measures. However, researchers have demonstrated that DA can identify students who will respond to instruction (Bain & Olswang, 1995; Budoff et al., 1971), distinguish between minority students with and without language disorders (Peña et al., 1992), and predict future educational placement (Samuels et al., 1992). Researchers in several studies have reported that DA can contribute to the prediction of achievement beyond traditional tests (Byrne et al., 2000; Meijer, 1993; Resing, 1993). However, this seems to depend on the analysis techniques and domains of study (Swanson, 1994).


Does Type of Feedback in DA Influence Predictive Validity?


Of the 15 DA studies reporting Pearson's correlation coefficients, 6 provided contingent feedback (individualized instruction in response to student failure) and 9 provided noncontingent feedback (standardized instruction in response to student failure). Studies with contingent feedback correlated .39 with achievement, whereas studies with noncontingent feedback correlated .56 with achievement. Nine studies did not report Pearson's correlation coefficients: 6 studies with contingent feedback (Budoff et al., 1976; Byrne et al., 2000; Meijer, 1993; Peña et al., 1992; Samuels et al., 1992; Swanson, 1994) and 3 studies with noncontingent feedback (Bain & Olswang, 1995; Budoff et al., 1971; Resing, 1993).

Contingent feedback. It was difficult to investigate contingent feedback studies as a group because the authors of these studies operationalized achievement variables in different ways (continuous or dichotomous), which changed the meaning of significant results. When achievement was operationalized as a continuous variable (i.e., an achievement test), 2 studies reported positive support for the predictive validity of DA (Budoff et al., 1976; Byrne et al., 2000), and 2 additional studies reported mixed findings (Meijer, 1993; Swanson, 1994) such that results depended on the analysis technique and achievement domain. Two other studies operationalized achievement as a dichotomous variable and found that DA predicted identification or educational placement (Peña et al., 1992; Samuels et al., 1992). When an inherently continuous variable (i.e., achievement) is transformed into an artificial dichotomy (i.e., educational placement using an achievement cutoff point), these variables become less reliable and result in a loss of statistical power (see the brief simulation at the end of this subsection). Although the 2 studies using dichotomized variables demonstrate positive results for DA, they should be interpreted with caution.

Noncontingent feedback. Results of the studies using noncontingent feedback were somewhat more straightforward. Using visual analysis, Bain and Olswang (1995) found that their noncontingent DA measure predicted immediate growth in speech with consistency. In addition, Budoff et al. (1971) and Resing (1993) found that DA predicted unique variance above and beyond that which was predicted by IQ.
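The brief simulation below (illustrative only; simulated data, not drawn from the reviewed studies) shows why dichotomizing a continuous criterion tends to attenuate the observed correlation: the same underlying relation between a DA score and achievement yields a noticeably smaller coefficient when achievement is reduced to a placement cutoff.

```python
# Illustrative simulation: continuous criterion vs. an artificial dichotomy
# (e.g., placement decided by an achievement cutoff).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
da_score = rng.normal(size=n)
achievement = 0.5 * da_score + rng.normal(size=n)   # underlying relation between DA and achievement

r_continuous = np.corrcoef(da_score, achievement)[0, 1]

# Dichotomize: bottom 20% of achievement scores "placed" (coded 1), others 0.
placement = (achievement < np.percentile(achievement, 20)).astype(float)
r_dichotomous = np.corrcoef(da_score, placement)[0, 1]   # point-biserial correlation

print(f"r with continuous criterion:   {r_continuous:.2f}")
print(f"r with dichotomized criterion: {abs(r_dichotomous):.2f}")   # smaller in magnitude
```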

Summary. Trends in Pearson's correlation coefficients show that DA with noncontingent feedback is more strongly associated with future achievement than is DA with contingent feedback. Studies in which researchers have used contingent feedback and do not report correlation coefficients are difficult to synthesize across participants and studies because of their highly individualized nature. Studies in which noncontingent feedback was used, and in which correlation coefficients are not reported, are somewhat easier to synthesize. Generally, they provide evidence that DA is useful in the prediction of future achievement, even when used in conjunction with traditional testing.

For Whom Does DA Have Predictive Validity?


Study participants were separated into four categories: mixed-ability groups, normally achieving students, at-risk or disadvantaged students, and students with disabilities. Two studies (Babad & Budoff, 1974; Sewell, 1979) reported data separately for more than one participant group and therefore provided Pearson's correlation coefficients in more than one category. Correlations were provided in 5 studies with mixed-ability groups (r = .46), 5 studies with normally achieving students (r = .42), 5 studies with at-risk or disadvantaged students (r = .37), and 4 studies with students with disabilities (r = .59).

All studies with typically achieving students provided Pearson's correlation coefficients. DA correlated .42 with outcome measures. Four studies with mixed-ability groups did not provide Pearson's correlation coefficients. These results will not be discussed because they do not differentiate typically achieving students from at-risk students from students with disabilities. The data in mixed-ability group studies were not disaggregated by population. With no details on the mixed-ability group, it is impossible to tell what type of student (i.e., normally achieving, at risk, or disabled) contributed most significantly to the results.

Achievement of at-risk or disadvantaged students, for whom DA measures are often designed, is predicted with slightly less accuracy than for mixed-ability groups and typically achieving students. Two studies with at-risk or disadvantaged students did not report Pearson's correlation coefficients (Budoff et al., 1976; Peña et al., 1992). As discussed, Budoff et al. found that DA scores were significant, yet inconsistent, predictors of achievement.


The results of Peña et al. indicated that DA can differentiate disabled from nondisabled Spanish-speaking children and predict English-language growth.

DA predicted the academic achievement of students with disabilities with considerably more accuracy than it did that of the other three groups. Two DA studies (Bain & Olswang, 1995; Budoff et al., 1971) predicting the achievement of students with disabilities did not provide Pearson's correlation coefficients. The results of these two studies, as discussed, support the quantitative trend of correlation coefficients, indicating that DA may be a better predictor of achievement than traditional testing for students with disabilities.

Summary. Trends in correlation coefficients show that DA was most strongly correlated with the achievement of students with disabilities. The correlation between DA and achievement was weakest for at-risk or disadvantaged students. Ironically, DA is often designed with the intent of creating a less biased measure of achievement for at-risk students. These results indicate that DA may not be less biased than traditional testing for this population.

What Achievement Criteria Affect DA's Predictive Validity?

There were four types of achievement criteria: independent performance on the posttest of the DA measure (posttest DA), norm-referenced tests, criterion-referenced tests, and teacher judgment. Posttest DA is the achievement measure that is most similar to the DA measure itself. In most cases, the posttest DA is simply an alternate form of the pretest and training phases of DA. Criterion-referenced achievement tests are the next most similar to the DA measure. These criterion-referenced tests are designed by the researcher to measure the same construct being taught during DA. Norm-referenced achievement tests, by contrast, may or may not be similar to the DA measure.

Fifteen studies provided Pearson's correlation coefficients: 5 predicted posttest DA, 4 predicted norm-referenced tests, 5 predicted criterion-referenced tests, and 1 predicted teacher judgment. DA measures correlated .53 with posttest DA, .38 with norm-referenced tests, .63 with criterion-referenced tests, and .39 with teacher judgment. The trend of the correlations is interesting with respect to the similarity of the DA measure to the achievement measure. Measures more similar to DA, such as posttest DA and criterion-referenced tests, are predicted with greater accuracy (.53 and .63, respectively) than those measures that are less similar, such as norm-referenced tests and teacher judgment (.38 and .39, respectively).

Posttest DA. All studies that predicted posttest DA provided Pearson's correlation coefficients. DA measures correlated .53 with independent posttest DA performance.

Norm-referenced achievement tests. Five studies that predicted norm-referenced tests did not provide correlation coefficients (Budoff et al., 1976; Byrne et al., 2000; Peña et al., 1992; Samuels et al., 1992; Swanson, 1994). Mixed support was found for DA's ability to predict achievement as measured by norm-referenced tests. As discussed, Peña et al. and Samuels et al. found positive support for the use of DA as a tool for identification and placement, respectively, and Byrne et al. determined that DA explained unique variance in achievement. Budoff et al. and Swanson found mixed results. Demographic factors and traditional testing were more consistent predictors than was DA in Budoff et al.'s study; Swanson found that the significance of the results depended on analysis techniques and the academic domain in question.

Criterion-referenced achievement tests. Four studies that predicted criterion-referenced tests did not provide correlation coefficients (Bain & Olswang, 1995; Budoff et al., 1971; Meijer, 1993; Resing, 1993). As discussed, Bain and Olswang as well as Budoff et al. found positive support for the ability of DA to predict growth in achievement. Meijer and Resing both concluded that DA explained unique variance in the prediction of achievement, even after intelligence had been factored out. DA was a consistently significant predictor of achievement as measured by criterion-referenced tests.

Teacher judgment. One study that predicted teacher judgment (Resing, 1993) did not report Pearson's correlation coefficients. Although DA did not predict teacher judgment as well as posttest DA or criterion-referenced achievement tests did, one study (Resing, 1993) found that DA accounted for 14% of the variance in teacher judgment of achievement, even after IQ had been factored out.


Summary. Again, studies in which researchers did not report Pearson's correlation coefficients followed the general trend of the quantitative analysis. Posttest DA and criterion-referenced tests were predicted more consistently than were norm-referenced tests and teacher judgment.

Discussion

The purpose of this review was to synthesize evidence on the predictive validity of DA. Pearson correlation coefficients indicate that traditional testing and DA predict future achievement with similar accuracy. Trends among the correlation coefficients indicated that DA predicted achievement more accurately (a) when the feedback of the assessment was noncontingent on student response, (b) with respect to the achievement of students with disabilities rather than students at risk or typically achieving students, and (c) when involving independent DA posttests and criterion-referenced tests instead of norm-referenced tests and teacher judgment of student achievement.

If traditional testing and DA do equally well in predicting achievement, why should we consider using DA? If DA is laborious to develop and validate, why exert the extra effort to develop new tests when valid traditional tests are available? To address this question, we must consider another question: Do traditional testing and DA measure the same constructs that predict achievement? Past reviews have not focused on whether DA explains unique variance in student achievement. To examine this, we must look at the added value of DA. This is possible in analyses in which researchers used forced-entry multiple regression. In such an analysis, if traditional variables are entered first, it is possible to examine DA's unique contribution to the variance in student performance.

Does DA Provide Added Value?

Ten studies conducted forced-entry multiple regression analysis to explore DA's unique ability to predict achievement over and above traditional testing (Bryant, 1982; Bryant et al., 1983; Byrne et al., 2000; Ferrara, 1987; Meijer, 1993; Resing, 1993; Rutland & Campbell, 1995; Spector, 1992; Speece et al., 1990; Tissink et al., 1993). Two studies (Byrne et al., 2000; Meijer, 1993) investigated the unique contribution of DA after traditional achievement tests had been entered in the multiple regression, and 8 studies investigated the unique contribution of DA after traditional cognitive tests (i.e., IQ tests) had been entered in the multiple regression.

Value added to traditional achievement tests. DA contributed significant unique variance to the prediction of future achievement beyond traditional achievement tests. Byrne et al. (2000) found that DA accounted for an additional 9% to 21% of the variance in phonemic awareness and reading achievement for students in kindergarten, Grade 2, and Grade 5. Likewise, Meijer (1993) found that DA accounted for an additional 13% of the variance in math achievement for secondary students.

Value added to traditional cognitive tests. DA also consistently contributed significant unique variance to the prediction of future achievement beyond traditional cognitive tests. The 8 studies in which researchers conducted these analyses predicted three domains: general reasoning, verbal achievement, and math achievement. Regarding general reasoning, researchers investigated student performance on measures such as mazes, matrices, and series completion. Bryant (1982) found that two DA measures predicted significant variance in achievement: training score (22%) and transfer score (17%). Bryant et al. (1983) found that transfer score explained 22% of the variance in achievement above and beyond IQ (although the training score was found to be nonsignificant). Rutland and Campbell (1995) found that dynamic training, maintenance, and transfer all made significant contributions to the variance in achievement (11%, 11%, and 9%, respectively).

In the verbal domain, DA also consistently contributed to the prediction of achievement. Spector (1992) found that DA contributed between 12% and 14% on phonological awareness measures and 21% on a word-reading measure. Indeed, DA was the only significant predictor of word reading. In Resing's (1993) study, DA contributed an additional 13% in higher level verbal measures, such as reading sentences and writing. Speece et al. (1990), however, reported that DA was not a significant predictor of verbal achievement. The only significant predictors of verbal achievement in this study were verbal IQ and traditional pretest (25% combined).

Results concerning the added value of DA in the prediction of math achievement were consistent but varied in magnitude. Ferrara (1987) noted that two dynamic measures explained a statistically significant portion of the variance in math growth: training score (17%) and maintenance and transfer score (32%).


Resing (1993) and Tissink et al. (1993) reported that DA contributed significant variance to math achievement, although it contributed less than in Ferrara's study (18% and 7%, respectively). Speece et al. (1990) indicated that DA training contributed significant variance to math achievement; however, it explained only 2% of the overall variance.

In general, there is evidence that DA can predict aspects of achievement not tapped by traditional achievement or traditional cognitive testing. When DA scores followed the entry of traditional scores in forced-entry multiple regressions, they explained significant variance in the prediction of general reasoning, verbal achievement, reading achievement, and math achievement. Only in Speece et al. (1990) were results inconsistent with these findings. Future research, therefore, should acknowledge that DA may not be synonymous with, or a substitute for, traditional tests. Rather, it may provide valuable information beyond that which traditional tests provide. The practical significance of this additional information, however, is not yet understood.

Implications for Research

Internal validity and the selection of outcome measures. One issue in the selection of measures concerns the relationship of the predictor variables to the outcome variables. If Predictor A measures the same skill as the outcome measure and Predictor B does not, it would be reasonable to expect that Predictor A would be the stronger of the two. In this review, posttest DA performance and criterion-referenced achievement tests were more highly correlated with DA than were norm-referenced tests and teacher judgment. This finding makes sense given that posttest DA and criterion-referenced achievement tests are more similar to the pretest DA measure. Indeed, posttest DA and criterion-referenced tests are often designed by the researcher to be more similar to pretest DA and particularly sensitive to measuring change. With respect to norm-referenced tests, one would hope researchers would choose tests that are similar to pretest DA. However, it is unlikely that a commercially produced standardized test could be sensitive to change within a short time frame. Likewise, teacher judgment can be similar or not similar to the DA measure, depending on the correlation between a teacher's perception of student achievement and the actual achievement. All this is to say that there is an unequal relationship between pretest DA measures and the various achievement measures, which complicates comparisons among different achievement measures. Nevertheless, selecting varied outcome measures may be important to keep the magnitude of the results in perspective.

External validity and the selection of outcome measures. The selection of outcome measures is equally important for purposes of external validity. The big question is, What outcome are we trying to predict? A related question is, What are the skills most representative of that outcome? If researchers use criterion-referenced tests to measure outcome, would performance on these tests generalize to success or failure in the classroom? Perhaps curriculum-based measures or teacher judgment of classroom performance would be a more sensitive index of classroom success. The second question above is concerned with the skills being assessed independent of the type of outcome measure selected. In a study measuring reading achievement, researchers may use both real-word reading and nonword reading. Some may suggest, however, that predicting nonword reading is less important than predicting real-word reading. Selecting multiple measures using real words may be more appropriate in that case.

Implications for Practice


Our review suggests that DA may add unique variance to the prediction of future academic achievement. And because DA is a test-teach-test process, we believe it is appropriate to ask whether it may be regarded, and further explored, as an alternative approach to RTI. If DA, alone or in conjunction with other measures, is administered early in the school year and predicts academic performance at a later point, practitioners may choose to use it as a means of helping them identify students in need of more intensive instruction. Moreover, it may be a quicker, more efficient method than conventional RTI approaches for selecting an appropriate tier, or level, of instructional intensity. More conventional RTI methods require anywhere from 8 to 30 weeks or more to determine instructional responsiveness and the appropriateness of a given instructional level. DA may be more practical in another way. Standard written protocols guide its use (e.g., Fuchs et al., 2007), whereas the implementation of instruction associated with more conventional RTI methods is less straightforward and more difficult to achieve with fidelity.

Limitations of Review
There are very few quantitative syntheses of DA research (e.g., Swanson & Lussier, 2001) and none concerned primarily with predictive validity. It is difficult to synthesize research on such a broad and sometimes poorly defined topic, and these results must be understood in light of the paucity of available studies.

Study design. Several studies in this review were not primarily concerned with measuring the predictive validity of DA, and their DA measures may not have been designed with this purpose in mind. Similarly, the achievement measures may not have been chosen specifically to measure change across time. In addition, the psychometric properties of both the DA measures and the criterion-referenced measures were often unreported. We cannot be sure that the constructs indexed were valid, that the measures were reliable, or that they were implemented with fidelity.

Study rigor. One final note concerns the relationship between DA feedback and study rigor. In well-controlled research, the researcher strives to minimize variables that could confound results. It is easier to conduct rigorous research in DA using standardized, noncontingent feedback; individualized, contingent feedback is more difficult to control. Researchers who use noncontingent feedback may be exploring performance using methods that are easier to measure, quantify, and analyze. In such studies, standardized procedures are used in all cases of student failure; therefore, the independent variable is clear and unchanging. Researchers who use contingent feedback, by contrast, introduce an if-then process into the intervention (sketched schematically following this section). For example, if students fail because they did not understand the directions, then the examiner may need to repeat or clarify the directions. If students fail because they lack the underlying skills necessary for success, then the examiner may need to concentrate on teaching lower-level skills. How can we compare the results of DA across students who require individualized intervention? If the instructional elements are not the same, how can we determine that the predictive ability is due to the nature of the DA and not to the teacher, the teaching method, or some other unmeasured variable? It may be that noncontingent and contingent feedback cannot be judged by the same standards of rigor. Consequently, it may not be appropriate to compare noncontingent and contingent feedback using current research methods, because noncontingent feedback fits more easily into the framework of rigorous, empirical research and is therefore more likely to produce consistent results. Clinically oriented DA that uses contingent feedback may need to develop new and different standards of rigor.

DA is a broad area of research that is difficult to navigate. As we conduct more rigorous research with an articulated purpose, we can begin to determine accurately whether research-oriented DA is useful in the identification of children at risk for school failure. Furthermore, we can investigate whether clinically oriented DA is useful in the treatment of these children.
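The distinction between noncontingent and contingent feedback drawn in the Study rigor section can be sketched schematically. The example below is illustrative only; the prompt wording, error categories, and function names are hypothetical and do not reproduce any DA protocol reviewed here.

```python
# Noncontingent feedback: every failure triggers the same fixed sequence of prompts,
# so the "treatment" embedded in the assessment is identical for all students.
NONCONTINGENT_PROMPTS = [
    "Look at the item again and try once more.",
    "Here is a worked example of a similar item.",
    "Here is the correct answer with an explanation.",
]

def noncontingent_feedback(attempt_number: int) -> str:
    # Same prompt sequence regardless of why the student failed.
    return NONCONTINGENT_PROMPTS[min(attempt_number, len(NONCONTINGENT_PROMPTS) - 1)]

# Contingent feedback: the examiner's response depends on a judgment about why
# the student failed, which introduces the if-then branching described above.
def contingent_feedback(error_type: str) -> str:
    if error_type == "misunderstood_directions":
        return "Repeat and clarify the task directions."
    if error_type == "missing_prerequisite_skill":
        return "Step back and teach the lower-level skill first."
    if error_type == "careless_slip":
        return "Ask the student to check the work and try again."
    return "Probe further to identify the source of the error."

print(noncontingent_feedback(0))
print(contingent_feedback("missing_prerequisite_skill"))
```

Because the contingent branch depends on the examiner's judgment about the source of the error, the teaching embedded in the assessment varies from student to student, which is precisely what complicates comparisons of rigor across the two approaches.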

References
References marked with an asterisk indicate studies included in the synthesis.

*Babad, E. Y., & Budoff, M. (1974). Sensitivity and validity of learning-potential measurement in three levels of ability. Journal of Educational Psychology, 66, 439–447.
*Bain, B. A., & Olswang, L. B. (1996). Examining readiness for learning two-word utterances by children with specific expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4, 81–91.
Bransford, J. C., Delclos, J. R., Vye, N. J., Burns, M., & Hasselbring, T. S. (1987). State of the art and future directions. In C. S. Lidz (Ed.), Dynamic assessment: An interactional approach to evaluating learning potential (pp. 479–496). New York: Guilford.
*Bryant, N. R. (1982). Preschool children's learning and transfer of matrices problems: A study of proximal development. Unpublished master's thesis, University of Illinois.
*Bryant, N. R., Brown, A. L., & Campione, J. C. (1983, April). Preschool children's learning and transfer of matrices problems: Potential for improvement. Paper presented at the meeting of the Society for Research in Child Development, Detroit.
*Budoff, M., Gimon, A., & Corman, L. (1976). Learning potential measurement with Spanish-speaking youth as an alternative to IQ tests: A first report. Interamerican Journal of Psychology, 8, 233–246.
*Budoff, M., Meskin, J., & Harrison, R. H. (1971). Educational test of the learning-potential hypothesis. American Journal of Mental Deficiency, 76, 159–169.
*Byrne, B., Fielding-Barnsley, R., & Ashley, L. (2000). Effects of preschool phoneme identity training after six years: Outcome level distinguished from rate of response. Journal of Educational Psychology, 92, 659–667.
Campione, J. C., Brown, A. L., Ferrara, R. A., Jones, R. S., & Steinberg, E. (1985). Breakdowns in flexible use of information: Intelligence-related differences in transfer following equivalent learning performance. Intelligence, 9, 297–315.
Carlson, J. S., & Wiedl, K. H. (1978). Use of testing-the-limits procedures in the testing of intellectual capabilities in children with learning difficulties. American Journal of Mental Deficiency, 11, 559–564.
Carlson, J. S., & Wiedl, K. H. (1979). Toward a differential testing approach: Testing-the-limits employing the Raven matrices. Intelligence, 3, 323–344.
Carlson, J. S., & Wiedl, K. H. (1980). Applications of a dynamic testing approach in intelligence assessment: Empirical results and theoretical formulations. Zeitschrift für Differentielle und Diagnostische Psychologie, 1, 303–318.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
*Day, J. D., Engelhardt, J. L., Maxwell, S. E., & Bolig, E. E. (1997). Comparison of static and dynamic assessment procedures and their relation to independent performance. Journal of Educational Psychology, 89, 358–368.
Dillon, R. (1979). Improving validity by testing for competence: Refinement of a paradigm and its application to the hearing impaired. Educational and Psychological Measurement, 39, 363–371.
*Ferrara, R. A. (1987). Learning mathematics in the zone of proximal development: The importance of flexible use of knowledge. Unpublished doctoral dissertation, University of Illinois.
Feuerstein, R. (1979). The dynamic assessment of retarded performers. Baltimore: University Park Press.
Feuerstein, R., Haywood, H. C., Rand, Y., Hoffman, M. B., & Jensen, M. (1986). Examiner manual for the Learning Potential Assessment Device. Jerusalem: Hadassah-WIZO-Canada Research Institute.
Feuerstein, R., Miller, R., Hoffman, M. B., Rand, Y., Mintzker, Y., & Jensen, M. R. (1981). Cognitive modifiability in adolescence: Cognitive structure and the effects of intervention. The Journal of Special Education, 15, 269–287.
Feuerstein, R., Rand, Y., & Hoffman, M. B. (1979). The dynamic assessment of retarded performers: The Learning Potential Assessment Device. Baltimore: University Park Press.
Feuerstein, R., Rand, Y., Hoffman, M., Hoffman, M., & Miller, R. (1979). Cognitive modifiability in retarded adolescents: Effects of instrumental enrichment. American Journal of Mental Deficiency, 83, 539–550.
Flammer, A., & Schmid, H. (1982). Lerntests: Konzept, Realisierungen, Bewährung. Eine Übersicht. Schweizerische Zeitschrift für Psychologie und ihre Anwendungen, 33, 14–32.
Fuchs, D., & Fuchs, L. S. (2006). Introduction to responsiveness-to-intervention: What, why, and how valid is it? Reading Research Quarterly, 41(1), 93–99.
Fuchs, D., Fuchs, L. S., & Compton, D. L. (2004). Identifying reading disability by responsiveness-to-instruction: Specifying measures and criteria. Learning Disability Quarterly, 27(4), 216–227.
Fuchs, D., Fuchs, L. S., Compton, D. L., Bouton, B., Caffrey, E., & Hill, L. (2007). Dynamic assessment as responsiveness-to-intervention: A scripted protocol to identify young at-risk readers. Teaching Exceptional Children, 39(5), 58–63.
Ginzburg, M. P. (1981). O vozmozhnoi interpretatsii poniatia zony blitzhaishego razvitia [On a possible interpretation of the concept of the zone of proximal development]. In D. B. Elkonin & A. L. Venger (Eds.), Diagnostika uchebnoi diatelnosti i intellectualnogo razvitia detei (pp. 145–155). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Goncharova, E. L. (1990). Nekotorye voprosy vyshego obrazovania vzroslykh slepoglukhikh [On higher education for the deaf-blind]. In V. N. Chulkov, V. I. Lubovsky, & E. N. Martsinovskaia (Eds.), Differentsirovannyi podkhod pri obuchenii i vospitanii slepoglukhikh detei (pp. 56–70). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.
Guthke, J. (1977). Zur Diagnostik der intellektuellen Lernfähigkeit [Assessment of intellectual learning potential]. Berlin: VEB Deutscher Verlag der Wissenschaften.
Hamers, J. H. M., Hessels, M. G. P., & Van Luit, J. E. H. (1991). Leertest voor etnische minderheden: Test en handleiding. Lisse: Swets & Zeitlinger.
Hamers, J. H. M., & Ruijssenaars, A. J. J. M. (1984). Leergeschiktheid en leertests. Lisse: Swets & Zeitlinger.
Haywood, H. C. (1992). Introduction to special issue. Journal of Special Education, 26, 233–234.
Haywood, H. C., & Tzuriel, D. (Eds.). (1992). Interactive assessment. New York: Springer-Verlag.
*Hessels, M. G. P., & Hamers, J. H. M. (1993). The learning potential test for ethnic minorities. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 285–311). Lisse, Netherlands: Swets & Zeitlinger B.V.
Jitendra, A. K., & Kameenui, E. J. (1993). Dynamic assessment as a compensatory assessment procedure: A description and analysis. Remedial and Special Education, 14(5), 6–18.
Kratochwill, T. R., & Severson, R. A. (1977). Process assessment: An examination of reinforcer effectiveness and predictive validity. Journal of School Psychology, 5, 293–300.
Lidz, C. S. (Ed.). (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York: Guilford.
*Lidz, C. S., Jepsen, R. H., & Miller, M. B. (1997). Relationships between cognitive process and academic achievement: Application of a group dynamic assessment procedure with multiply handicapped adolescents. Educational and Child Psychology, 14, 56–67.
Lipsey, M. W., & Wilson, D. B. (2000). Practical meta-analysis. Thousand Oaks, CA: Sage.
*Meijer, J. (1993). Learning potential, personality characteristics, and test performance. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 341–362). Lisse, Netherlands: Swets & Zeitlinger B.V.
Muttart, K. (1984). Assessment of effects of instrumental enrichment cognitive training. Special Education in Canada, 58, 106–108.
*Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414–423.
*Peña, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic methods to language assessment: A nonbiased procedure. The Journal of Special Education, 26, 269–280.
Rand, Y., Tannenbaum, A. J., & Feuerstein, R. (1979). Effects of instrumental enrichment on the psychoeducational development of low-functioning adolescents. Journal of Educational Psychology, 71, 751–763.
*Resing, W. C. M. (1993). Measuring inductive reasoning skills: The construction of a learning potential test. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 219–242). Lisse, Netherlands: Swets & Zeitlinger B.V.
*Rutland, A., & Campbell, R. (1995). The validity of dynamic assessment methods for children with learning difficulties and nondisabled children. Journal of Cognitive Education, 5, 81–94.
*Samuels, M. T., Killip, S. M., MacKenzie, H., & Fagan, J. (1992). Evaluating preschool programs: The role of dynamic assessment. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 251–271). New York: Springer-Verlag.
Savell, J. M., Twohig, P. T., & Rachford, D. L. (1986). Empirical status of Feuerstein's Instrumental Enrichment (FIE) technique as a method of teaching thinking skills. Review of Educational Research, 56, 381–409.
*Sewell, T. E. (1979). Intelligence and learning tasks as predictors of scholastic achievement in Black and White first-grade children. Journal of School Psychology, 17, 325–332.
*Sewell, T. E., & Severson, R. A. (1974). Learning ability and intelligence as cognitive predictors of achievement in first-grade black children. Journal of Educational Psychology, 66, 948–955.
Shochet, I. M. (1992). A dynamic assessment for undergraduate admission: The inverse relationship between modifiability and predictability. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 332–355). New York: Springer-Verlag.
*Spector, J. E. (1992). Predicting progress in beginning reading: Dynamic assessment of phonemic awareness. Journal of Educational Psychology, 84, 353–363.
*Speece, D. L., Cooper, D. H., & Kibler, J. M. (1990). Dynamic assessment, individual differences, and academic achievement. Learning and Individual Differences, 2, 113–127.
*Swanson, H. L. (1994). The role of working memory and dynamic assessment in the classification of children with learning disabilities. Learning Disabilities Research and Practice, 9, 190–202.
*Swanson, H. L. (1995). Effects of dynamic testing on the classification of learning disabilities: The predictive and discriminant validity of the Swanson-Cognitive Processing Test (S-CPT). Journal of Psychoeducational Assessment, 13, 204–229.

Swanson, H. L., & Lussier, C. M. (2001). A selective synthesis of the experimental literature on dynamic assessment. Review of Educational Research, 71, 321–363.
Tellegen, P. J., & Laros, J. A. (1993). The Snijders-Oomen nonverbal intelligence tests: General intelligence tests or tests for learning potential? In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 267–283). Lisse, Netherlands: Swets & Zeitlinger B.V.
*Tissink, J., Hamers, J. H. M., & Van Luit, J. E. H. (1993). Learning potential tests with domain-general and domain-specific tasks. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological, and practical issues (pp. 243–266). Lisse, Netherlands: Swets & Zeitlinger B.V.
Utley, C. A., Haywood, H. C., & Masters, J. C. (1992). Policy implications of psychological assessment of minority children. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 445–469). New York: Springer-Verlag.
Wiedl, K. H., & Herrig, D. (1978). Ökologische Validität und Schulerfolgsprognose im Lern- und Intelligenztest: Eine exemplarische Studie. Diagnostica, 24, 175–186.

Erin Caffrey is a graduate student at Vanderbilt University. Her primary interests include students with learning disabilities, beginning reading, and assessment. Her current research examines the predictive validity of dynamic assessment measures of early reading skills.

Douglas Fuchs is a professor of special education at Peabody College of Vanderbilt University, where he also codirects the Vanderbilt Kennedy Center Reading Clinic. With Lynn S. Fuchs, Don Compton, and others, he is currently exploring the use of dynamic assessments in reading and math to match at-risk students to appropriate levels of instruction in a response-to-intervention framework.

Lynn S. Fuchs is a professor of special education at Peabody College of Vanderbilt University, where she conducts research, funded by the National Institute of Child Health and Human Development and other agencies, on the importance of individual differences to math and reading performance and on how curriculum-based measurement may be used to develop effective instruction for difficult-to-teach children.

