
Education Economics, Vol. 12, No. 2, August 2004

Do the Teachers' Grading Practices Affect Student Achievement?

HANS BONESRØNNING
Department of Economics, Norwegian University of Science and Technology, N-7491 Trondheim, Norway. E-mail: hansbo@svt.ntnu.no

ABSTRACT The present paper explores empirically the relationship between teacher grading and student achievement. The hypothesis is that the teachers can manipulate student effort, and hence student achievement, by choosing the proper grading practices. The grading model is analogous to a labor supply model, where the teachers can set the marginal returns to achievement or determine the grade level that is independent of real achievement. The empirical analysis shows that grading differences in the lower secondary school in Norway are much like differences in non-labor income and, further, that students who are exposed to hard grading perform significantly better than other students.

Introduction

It is well documented that teachers differ in their effectiveness (see, for instance, Goldhaber & Brewer, 1996; Rivkin et al., 2001), but very little is known about the characteristics of effective teaching. Whether effective teachers have some common characteristics is a controversial issue; many researchers seem to think that the returns to investigating the characteristics of teacher effectiveness are small. The analysis presented here takes a more optimistic view. It starts from the general hypothesis that effective teachers are characterized by being able to elicit much effort from their students. While this can potentially be achieved in numerous ways, one way might be to choose the proper grading practices. The specific hypothesis to be investigated here is that teachers who practice hard grading (i.e., give good grades for high achievement only) are associated with students who improve their knowledge and skills substantially.

This approach is inspired by a small number of theoretical contributions that focus on student incentives. Prominent examples are Becker and Rosen (1992) and Costrell (1994). The present analysis first and foremost builds on the student–teacher interaction model established by Correa and Gruver (1987). Correa and Gruver conceptualize student achievement as a function of student effort and teacher effort, and introduce utility-maximizing teachers and students who determine their own efforts. Grading is introduced as a teacher's instrument by which effort substitution problems can be mitigated; that is, by choosing the proper grading practices, teachers can prevent students from substituting their own effort for teacher

effort. The theoretical part of this paper utilizes the general framework provided by Correa and Gruver, and expands their discussion of the grading practices, while abstracting from substitution problems.

The characteristics of effective teaching are interesting in their own right, but the current analysis also relates to the broader discussion about educational standards. This discussion is about issues such as: Will student achievement improve if standards are set higher? Do higher standards imply higher drop-out rates? The analysis provided here might shed some light on these issues to the extent that educational standards and teachers' grading practices are substitutes.

A couple of issues related to the economics of grading have to be clarified before the empirical analysis can be carried out. First, are there any reasons to believe that grading really is important? The teachers' grading introduces a distinction between real and perceived achievement. Perceived achievement is the students' indirect perception of their real achievement, as mediated through the grades given by their teachers. Under some circumstances (for instance, if the grades determine admission to the next level of education) the students may care more about perceived achievement than real achievement. Formally, this can be expressed by stating that perceived achievement (and not real achievement) is an argument in the students' utility function. The empirical investigations provided in this paper make use of data from the lower secondary school in Norway. Within the existing Norwegian system, the use of externally set examinations is very limited (only one examination in one subject at the end of the 10th grade). By setting the grades in all the other subjects, the classroom teachers determine the average grade level, which subsequently determines admission to the upper secondary school. This system implies that perceived achievement is important for Norwegian lower secondary students.

The second issue that should be clarified is that the importance of grading depends on how it is structured. Perceived achievement can, for instance, be thought of as the product of real achievement and the teacher's grading parameter, as in Correa and Gruver's (1987) student–teacher interaction model. Easy grading then has very much the same effects on student effort as a wage increase has on labor supply in the standard textbook model. That is, the net effect can be decomposed into an income effect and a substitution effect. The former is negative; the student responds to easier grading by decreasing studying effort when both leisure and perceived achievement are normal goods. The latter effect is positive; the student increases effort when the marginal returns to effort increase. Clear-cut predictions about the sign of the net effect cannot be established because the income and substitution effects work in opposite directions. However, the teachers can manipulate the relationship between perceived and real achievement in a number of ways. One alternative is that the teachers practice grading in a way that is analogous to non-labor income in the labor supply model, and a second is that teachers emphasize student effort as well as real achievement when setting their grades. It is easily verified in theoretical models that the student responses depend critically on the chosen scheme. Thus, unless the ways teachers set their grades are identified, investigations of teacher grading effects are likely to be less successful.
This is one of the major points made in the current paper. The relationships between grading practices and student responses are discussed more thoroughly in the third section.

The empirical strategy has two parts. Since the grading practices are not directly observable, the first stage seeks to reveal them by estimating fixed class effects on individual grades. In the second stage, the effects of grading are



estimated by including the measures of the teachers' grading as independent variables in an education production function. The main econometric challenges come in the second stage. The problem is that the variation in grading practices may basically reflect unobserved teacher and student characteristics. For instance, the unobserved teacher characteristics that determine the grading practices may also be important determinants of teacher effectiveness. The grading coefficients estimated by ordinary least squares will then be biased, reflecting traits of teacher effectiveness other than the grading practices. Moreover, teachers may respond to poor student achievement, or to aggressive student negotiations for better grades, by relaxing their grading standards, implying a simultaneity bias in the estimated coefficients. Several approaches, including the use of instrumental variables, aggregation of the grading measures to the school level, and the estimation of differences-in-differences specifications, are proposed as solutions to these problems. A detailed description of the estimation strategies is presented in the fifth section.

The paper is organized as follows. The next section gives a short presentation of the existing empirical evidence about the achievement effects of the teachers' grading. The third section discusses the relationships between specific teacher grading practices and student achievement. To provide a better background for the empirical estimations, the subsequent section describes relevant institutional features of the Norwegian lower secondary school. Econometric problems and their solutions are then discussed. The data are described in the sixth section, and the seventh section provides estimation results. The final section offers some concluding remarks.

Prior Studies

There are few empirical studies of teachers' grading practices. The first contributors I am aware of are Montmarquette and Mahseredjian (1989), who analyze the performance of Canadian primary school pupils. They find that hard grading (i.e., grades set below the real achievement level) lowers standardized test results in French language, and that the students' mathematics performance is unaffected by the grading practices. Montmarquette and Mahseredjian estimate a simple grading equation, and make no attempt to separate different aspects of the grading practices.

More recent contributions support the claim that easy grading lowers student achievement. Bonesrønning (1999) provides evidence that the variation in grading practices in the upper secondary school in Norway is captured by a shift parameter much like non-labor income in the labor supply equation, and finds that hard grading improves the performance of Norwegian secondary school students. Moreover, the largest effects seem to be for high achievers. Betts and Grogger (2003) report quite similar findings: test scores rise in schools with high standards, and more for students near the top of the achievement distribution than for students near the bottom. Figlio and Lucas (2004) report that the grading effects depend on the student body composition of the class: high-achieving students benefit most from high standards when they are in a low-achieving class, and low-achieving students benefit most from high standards when they are in a high-achieving class.
When focusing on student achievement, none of these studies finds that the grading practices have qualitatively opposite effects for different student subgroups. Betts and Grogger (2003), however, take the investigation one step further by examining the consequences for educational attainment. They find no evidence that higher standards raise

either high school graduation or college attendance, but that higher standards have negative effects on high school graduation among blacks and Hispanics. Compared with the existing studies, the present study places greater emphasis on revealing the exact way teachers practice their grading.

Grading Practices and Student Responses

This section clarifies the teachers' options with respect to grading in some more detail, and provides an informal discussion of the students' most probable responses to different types of grading practices. The discussion of the teachers' grading options is irrelevant if national guidelines restrict the grading practices to a substantial degree, but this is unlikely to be the case, at least in Norway. National grading guidelines are rather vague, and no additional mechanisms are in place to ensure that teachers practice the same type of grading. For example, the teachers are prescribed to reward real achievement, but there is no guarantee that they restrict themselves to such a reward scheme. One reason is that there is a rationale for rewarding effort independently of achievement: if the teacher places a premium on student effort, which is the cause of learning most under the student's control, the student may respond positively. The teachers may of course realize this mechanism. This discussion motivates a rich description of the teachers' options. Lin and Lai (1996), who discuss how parents reward (or punish) their children, have inspired the following equation:

w_{ij} = G_j + g_j (v_{ij} - \bar{v}_j) + \gamma_j (e_{ij} - \bar{e}_j)    (1)

where w_ij is the perceived achievement (the grade) of student i in classroom j, v_ij is the real achievement of student i in classroom j, and e_ij is the effort provided by student i in classroom j. G_j, g_j, γ_j, v̄_j, and ē_j are parameters characterizing the grading practices chosen by the teacher in classroom j. Equation (1) states that the student who performs at the achievement threshold level v̄_j, and provides effort equal to the effort threshold level ē_j, gets the grade G_j. The returns to marginal deviations from the achievement threshold level and the effort threshold level are g_j and γ_j, respectively.

Equation (1) is part of a general framework (with no references to Norwegian institutions) that is used to discuss the students' most probable responses to teacher grading. This framework has two parts in addition to the grading practices; the students' responses also depend on student preferences and the characteristics of the achievement production function. Let us assume that students care about perceived achievement and one other commodity, say other activities, and that real achievement is a function of student effort (i.e., the achievement production function). Then the student's problem is to allocate available time between education production and other activities to maximize utility, subject to the achievement production function and a time constraint. These ideas, which are only loosely outlined in this paper, are formalized in Correa and Gruver (1987) for a case with a more restrictive grading equation than the one presented here.

A number of clear predictions follow from this model. First, if the teacher chooses easier grading by increasing G_j, the students respond by decreasing their studying effort. The reason is that an increase in G_j works like a pure income effect; the students choose more perceived achievement and more other activities when both
goods are normal goods. Second, students will allocate more studying effort following an increase of the achievement threshold level or an increase of the effort threshold level. The intuition is as follows. The threshold levels define the achievement level and the effort level from which students achieve rewards above G_j. If the threshold levels are set low, the students receive additional rewards even though they perform poorly and provide little effort. Increases in the threshold levels are very much like a decrease in G_j (the difference is that in the latter case the student responses depend on the teacher's valuations of marginal achievement and effort), implying that the students respond by increasing their efforts. Finally, the students who perform below the achievement (effort) threshold level increase their studying effort following an increase in g_j (γ_j). No clear predictions follow for students who perform, or provide effort, above the threshold levels.

This discussion has brought out some basic features of the student responses to teacher grading, but admittedly the model has a couple of unrealistic features. First, it seems unrealistic that the teacher, without limits, is able to increase student performance and student effort by increasing the threshold levels. A more realistic assumption is that the students have a fallback position where they provide no effort at all (i.e., they stay in school for psychic reasons or they drop out). Thus, when the threshold level is set above a critical level, the students turn to their fallback position. Second, the teacher faces not one student, but several students, implying that student heterogeneity and trade-offs have to be taken into account. Costrell (1994), who models standards for educational credentials, has inspired the following discussion. Let us assume that there is a continuum of students ordered by their fallback positions and that this ordering can be described by a distribution function. Then the teacher typically faces a trade-off when deciding the threshold levels: on the one hand, if the teacher increases the threshold levels, a subgroup of students will improve their performance. On the other hand, increasing threshold levels implies that a number of marginal students will prefer the fallback position to the situation where they provide some effort. The optimal threshold levels are characterized by equalizing the gains for the former group to the losses for the latter group. This implies that the threshold levels will be determined by teacher preferences and the distribution of the relevant student characteristics in class. Moreover, if this explanation for the variation in grading practices among classes is correct, there will be winners and losers from grading. This prediction can be investigated empirically. The variation in G_j can be rationalized along similar lines.

Finally, note that the model applied here implicitly assumes a decision-making teacher who manipulates passive students. A less restrictive model includes teachers and students who are in a two-sided exchange, in which students typically want to minimize effort and maximize grades. A model along these lines is developed in Bonesrønning (1999), but this issue is not pursued any further here.

The Norwegian Lower Secondary School

The Norwegian lower secondary school covers the last 3 years of compulsory school, Grades 8–10. The students are 13–16 years old.
According to national guidelines, neighborhood schools, heterogeneous home classes, a common curriculum, and a modest use of external examinations characterize the school organization. The latter feature was commented on in the Introduction. The students are taught a common curriculum in all subjects. The tailoring of teaching to a wide

variety of abilities is supposed to take place within the home classes. This puts pressure on the requirement that students should be randomly allocated to home classes. A closer look at student allocation across classrooms is therefore warranted.

The students are allocated to home classes at the start of the eighth grade, and these classes stay intact for almost all lessons through 3 years. Formally, ability tracking is abolished, but the combination of tailoring problems and a lack of external control indicates that the randomness of student allocation across classes is an empirical issue. The within-school range in average class performance is used to measure the randomness of the allocation in the present sample. Ideally, this measure should be based on student test scores prior to the establishment of classes, but such test scores have not been available. Instead, the mathematics test scores 1 year after the classes are established are used. Large fractions of students from all classes have participated in the tests, making the calculation of class means meaningful. The total sample mean is 28.60 points, and class averages vary between 19.49 points and 32.86 points (generating a sample range of 13.40 points). The within-school ranges in class averages vary between 0.30 and 7.51 points. In the former case, we obviously cannot reject the hypothesis that students are randomly allocated across classes. In the latter case, random allocation is much less likely: this school has an average performance of 28.90 points, with a 95% error bound of 2.31 points. That is, the probability is less than 5% that the sample range is larger than 4.62 points by random draw. A rough indication of the occurrence of non-random student allocation is provided by counting the number of schools that have a within-school range in class averages above 4.6 points. (This is a very rough indication because student heterogeneity varies between the schools, and hence the critical sample range varies.) Thirty percent of the schools that have more than one class in each grade belong to this category.

The school organization as presented here has two major implications for the present study. First, the students cannot easily escape hard-grading teachers. The students cannot choose which school to attend (other than by choice of home residence), and they cannot choose between different courses within the lower secondary school. However, they can respond to hard grading by dropping out. Drop-out rates are not published, but there are some indications that they are low. For example, 86% of the pupils leaving the lower secondary school in 1994 (drop-outs included) were enrolled in the upper secondary school within the next 4 years. A reasonable assumption is that a major part of the remaining 14% of school-leavers leaves the schooling system after finishing the lower secondary school. Second, we cannot exclude that students are non-randomly allocated across classrooms. This generates potential problems for the analyses. For instance, poorly motivated students (unobserved by the researcher) may be sorted into particular classes. If these students are then exposed, say, to easy graders, the revealed relationship between grading and student achievement obviously is not a causal relationship. The econometric solutions to this problem are discussed in the next section.

Empirical Specifications

The grading equation (equation (1)) contains too many parameters to be estimated.
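To make the problem concrete, note that equation (1) can be rewritten as w_ij = (G_j − g_j v̄_j − γ_j ē_j) + g_j v_ij + γ_j e_ij, so data on grades, achievement and effort pin down only the composite intercept and the two slopes. The following sketch, with purely hypothetical parameter values, shows two different combinations of grade level and achievement threshold that generate identical grades:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(28.0, 7.0, size=200)   # real achievement, test-score scale
e = rng.integers(1, 5, size=200)      # effort measure, 1-4

def grade(v, e, G, g, gamma, v_bar, e_bar):
    """Equation (1): w = G + g*(v - v_bar) + gamma*(e - e_bar)."""
    return G + g * (v - v_bar) + gamma * (e - e_bar)

# Two grading "policies" that differ in G and in the achievement threshold...
w1 = grade(v, e, G=3.0, g=0.15, gamma=0.10, v_bar=28.0, e_bar=2.0)
w2 = grade(v, e, G=4.0, g=0.15, gamma=0.10, v_bar=28.0 + 1.0 / 0.15, e_bar=2.0)

# ...but share the composite intercept G - g*v_bar - gamma*e_bar, and
# therefore produce exactly the same observed grades:
print(np.allclose(w1, w2))  # True
```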
Notably, the threshold levels cannot be separated from G_j in empirical work. I therefore provide estimates of the simplified version, given as:



w_{ij} = \beta X_{ij} + \sum_{j=1}^{N} G_j \, \mathrm{class}_j + \sum_{j=1}^{N} g_j v_{ij} \, \mathrm{class}_j + \sum_{j=1}^{N} \gamma_j e_{ij} \, \mathrm{class}_j + \epsilon_{ij}    (2)

where X_ij is a vector of individual student and family background characteristics, class_j is a dummy variable that is 1 if student i belongs to class j and 0 otherwise, and ε_ij is a random error term. Why include individual student background characteristics as independent variables in equation (2)? One answer is that teachers may use these variables as indicators of student ability or student effort to reduce the uncertainty related to grading. Alternatively, the individual student characteristics may capture teacher prejudice. The estimated coefficients G_j, g_j, and γ_j from equation (2) are subsequently used as measures of the teachers' grading practices.

The standard education production function provides the framework for investigating the effects of the teachers' grading practices. The education production function is a reduced-form equation linking student achievement to individual student and family characteristics, peer group characteristics and purchased inputs. The conventional understanding is that the students', the teachers' and the parents' efforts, and more broadly the incentive schemes established by teachers and parents, are solved out from an underlying structural model (for instance, the model presented in the third section) and represented by their exogenous determinants in the education production function. As stated in the Introduction, it is well established that the teacher characteristics usually included in the equation do not capture teacher effectiveness to any reasonable extent, implying that potentially systematic effective teacher behavior is included in the residual. The following equation should be thought of as a conventional education production function where the characteristics of the teachers' grading practices are separated out from the residual:

v_{ijt} = \alpha_0 + \alpha_1 Y_{ij} + \alpha_2 P_j + \alpha_3 S_j + \alpha_4 v_{ij,t-1} + \alpha_5 G_j + \alpha_6 g_j + \alpha_7 \gamma_j + \epsilon_{ij}    (3)

where v_ijt and v_ij,t−1 are achievement at time t and t − 1, respectively, for student i in class j, Y_ij is a vector of family background characteristics, not necessarily identical to X_ij in equation (2), P_j is a vector of peer group characteristics, S_j is a vector of school inputs, and ε_ij is a random error term. Hanushek (1979) discusses the strengths and weaknesses of value-added specifications like equation (3).

Equation (3) raises some interpretative problems. Notably, the situation where grading is important, but fully captured by the exogenous determinants included in the equation, cannot be separated from the situation where grading has no effects on student achievement. Separating these cases may be of crucial importance if we want to reveal the characteristics of effective teaching, but for policy-makers the ability to separate them does not seem very important. The situation that might potentially call for policy action is when grading is important and not fully captured by the traditional variables in the education production function. Equation (3) is geared towards this situation.

The parameters of interest in equation (3) are the grading coefficients α_5, α_6, and α_7. Ordinary least squares estimates of these coefficients will most probably be biased. The reason is that a substantial part of the variation in the grading practices is not driven by exogenous factors, but reflects the teachers' preferences and resource constraints. For instance, the discussion provided in the third section highlights that the teachers' grading practices might reflect student

performance. The inclusion of peer group measures together with teacher characteristics should reduce the problem that the grading variables are correlated with the error term, but reducing the omitted variables problem does not eliminate the simultaneity problem. The standard approach to the latter problem is instrumental variable estimation. Instruments are generated from auxiliary regressions of the grading variables against student body and teacher characteristics. Since these equations must include at least one variable that is not a determinant of student achievement, credible instruments may be hard to establish. Supplementary approaches are therefore pursued. Grading variables estimated at the school level are substituted for the grading variables estimated at the class level. The school-level measures are probably much less correlated with unobserved student body and teacher characteristics in particular classes, but, admittedly, they are also a less precise measure of the grading to which individual students are exposed. The success of this approach depends heavily on the existence of a school consensus about grading practices. Also, a differences-in-differences approach is applied that makes use of the time-series nature of the data: the students are tested four times during two subsequent years, and they have reported their grades in both years. In a first stage, the grading practices for the two subsequent years are estimated as specified by equation (2). Thereafter, it is investigated whether the year-to-year changes in the grading practices can explain the year-to-year changes in achievement growth. This approach circumvents many of the problems related to unobserved student characteristics, but is successful only to the extent that the students experience different practices in subsequent years.

The Data

There are no publicly available databases that can be used to estimate education production functions in Norway. The researchers involved in the present project have therefore collected the data. Due to budget constraints, a national sample was beyond the scope of the project. A non-random sample was generated according to the following guidelines: each region of Norway (north, middle, west, south, southeast) is represented by one county, and the chosen county has the largest variation in expenditures per student among its local governments. This sampling procedure implies that the average expenditures per student within the sample are slightly higher (4%) than the national average, while the variation in expenditures per student among schools is substantially higher (22%) than nationally. During 2 years (the school years 1998/99 and 1999/2000) a cohort of students from the five counties was tested in mathematics four times: at the start and the end of each school year. In addition, the students reported their grades, their family background, and their efforts once each year. Data about purchased inputs come from three sources: the school principals have reported class size, resources at the school level are taken from the Government's own database (GSI), and the teachers have reported their own education and experience. Approximately 3800 students participated in the first test, but the analyses are based on a smaller number of students. This is due to several reasons: a number of students have not reported all relevant information, a number of students have not participated in the subsequent tests, not all school principals have reported class size, and not all teachers have reported their background.
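Purely as an illustration, the assembly of the estimation sample from these three sources might look as follows in pandas; every file and column name here is a hypothetical stand-in, since the survey data are not publicly available:

```python
import pandas as pd

# Hypothetical file and column names: the survey data are not public.
students = pd.read_csv("student_tests.csv")     # tests, grades, background, class_id
classes = pd.read_csv("principal_reports.csv")  # class_id, school_id, class_size
teachers = pd.read_csv("teacher_survey.csv")    # class_id, teacher education/experience
schools = pd.read_csv("gsi_resources.csv")      # school_id, school-level resources

df = (students
      .merge(classes, on="class_id")
      .merge(teachers, on="class_id")
      .merge(schools, on="school_id"))

# Keep only records complete on the variables used in the analyses; this is
# why the estimation samples are smaller than the ~3800 initial test-takers.
df = df.dropna(subset=["test_end_10", "test_start_10", "grade_10",
                       "class_size", "teacher_training_college"])
print(len(df))
```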
The numbers of students on which the analyses are based are displayed in the tables reported in the next section. The average performance of the students included in the present analyses is



slightly better than the performance of all the students who participated in the initial test.

Most of the analyses reported here are based on data from the last year (the 10th grade). That is, the grading practices are calculated from the grades reported by the students in March 2000 and the test results from April 2000. The value-added education production function applies the test results from September 1999 (t − 1) and April 2000 (t). The differences-in-differences approach applies test results from September 1998, September 1999 and April 2000, and grading practices calculated in spring 1999 and spring 2000.

Some of the analyses make use of a student effort measure. This measure is constructed from the students' answer to the following question, given in March 2000: How much time do you spend, on average, thinking about things other than mathematics during a mathematics lesson? The students are given four alternatives: less than 5 minutes (1), 6–10 minutes (2), 11–15 minutes (3), and more than 15 minutes (4). Changing the sign of the alternative chosen and adding 5 (i.e., the students providing most effort in class are assigned the highest number) generates the student effort measure. This, of course, is a crude measure of student effort. We should therefore exercise care in interpreting the estimated effort coefficients.

The list of control variables includes family background measures, class size and teacher education, peer group measures and a community characteristic (rural/urban). The family background measures are based on information provided by the students in March 2000, class size and teacher education are reported by school principals and are for the ninth grade only, and the peer group measures based on student test scores are lagged 1 year (i.e., based on the test results from the autumn of 1998). Note that teacher education is measured by a dichotomous variable, which is 1 if the teacher comes from a teacher training college and 0 otherwise; 'otherwise' in most cases means a university education. The teacher education variable is motivated by the continuing debate in Norway about which type of teacher education is superior. Most control variables are potentially endogenous, but this paper makes no attempt to deal with these problems. The descriptive statistics are reported in Table 1. The range of grades is 1–6, where 6 is the best grade and 1 represents failure, while the test scores are raw scores with a range of 0–28 points in the final 10th-grade test. No measures of the grading practices are included in Table 1; their calculation is part of the next section.

Results

The first part of this section reports results from the grading equation, which provide the measures of the teachers' grading practices to be used in the subsequent parts. The second part explores the determinants of the grading practices. The motivation is to provide an instrumental variable that can be used in the third part, but this analysis also has some interest in its own right. The third part reports how students respond to the grading practices.

The Grading Equation

The grading equation (equation (2)) is estimated with fixed effects to reveal the variation in the teachers' grading across classrooms. The most general equation does not perform very well when estimated at the classroom level. Both g_j and γ_j are estimated with very little precision, and the explanatory power of the

Table 1. Descriptive statistics

Variable                                               Mean        Standard deviation
Mathematics test result end 10th grade                 14.80       4.35
Mathematics test result start 10th grade               27.73       6.90
Mathematics grades 10th grade                          3.74        1.10
Student effort                                         2.29        1.09
Females                                                0.50
Highly educated fathers                                0.41
Highly educated mothers                                0.35
Fraction of intact families                            0.80
Number of siblings                                     2.00        1.19
Birth order                                            1.94        1.39
More than 200 books                                    0.29
More than one television                               0.86
Class size                                             24.82       4.35
School size                                            1.1 × 10⁴   0.5 × 10³
Teachers from training colleges                        0.68
Average mathematics achievement start-up ninth grade   27.64       2.07
Fraction of hard-working students                      25.50       2.78
Number of students                                     887

equation is small. This is taken as evidence that the teachers' evaluations of marginal student achievement and effort do not vary systematically across classes. More restrictive versions of equation (2) are estimated, and it turns out that the equation where only G_j is allowed to vary between classes provides the best fit to the data. The results are reported in Table 2 (column 1). A total of 103 classes are included in the

Table 2. The grading equation (t-statistics in parentheses)

Variable             Class level     School level    School level
Female               0.17 (2.74)     0.01 (0.18)     0.02 (0.37)
Father's education   0.26 (3.29)     0.17 (2.36)     0.17 (2.20)
Mother's education   0.13 (1.61)     0.16 (2.11)     0.15 (1.93)
Two-parent family    0.18 (2.29)     0.11 (1.38)     0.10 (1.22)
Real achievement     0.15 (20.96)    0.12 (16.03)    0.12 (15.00)
Student effort       0.09 (3.08)     0.14 (4.74)
G_C72                0.29 (1.15)
G_C100               2.41 (4.46)
G_S30                                1.07 (1.99)
G_S48                                3.14 (4.93)
G_S39                                                2.58 (2.92)
G_S40                                                1.66 (3.96)
γ_S39                                                0.45 (1.60)
γ_S40                                                0.18 (1.19)
R²adj                0.36            0.34            0.36
N                    858             858             858

Note: The dependent variable is individual students' mathematics grades in the 10th grade. The table reports class fixed effects and school fixed effects for a small number of classes/schools only. All classes/schools are included in the estimated equations, which are estimated without a constant.



estimation (j = C1, C2, …, C103), but only the classes with the highest and lowest G_j values are represented in the table. These are classes 100 and 72, respectively. As indicated by the two G_j values reported in column 1, the grading practices vary substantially between the classes in the present sample. All else equal, a student in class 100 gets a grade that is approximately two points better than a student in class 72. The average G_j value in the sample is 1.96, with a standard deviation of 0.50. Formally, the hypothesis that the grading practices are equal across the classes is clearly rejected. The G_j coefficients estimated from the grading equation are used as explanatory variables in the education production function reported in the following.

The estimated equation provides some additional information. The teachers' grading is closely related to real achievement. As shown in column 1 of Table 2, an improvement in real achievement by 10 points translates into a grade improvement of 1.5 grades. More surprisingly, perhaps (given that the teachers' grading guidelines leave no discretion for rewarding effort), there is evidence that the teachers reward student effort. A student who is very concentrated during the mathematics lesson achieves approximately 0.3 points better grades than a student who is very little concentrated, all else equal. However, it seems quite probable that student effort is endogenous to the teachers' grading. The estimated coefficient should therefore be treated with care. There is strong evidence that female students, students with well-educated parents and students from two-parent families get better grades than other students, all else equal. These results may reflect that the background variables capture student behavior that is rewarded by teachers, or that the teachers use information about student background to reduce uncertainty related to grading, or that students or parents with these characteristics put pressure on teachers for easy grading. Even more interpretations can be thought of.

Table 2 (columns 2 and 3) reports results where schools are substituted for classes in the grading equation. Forty-eight schools are included in the estimation (j = S1, S2, …, S48). In column 2 only G_j is allowed to vary between schools. The estimated equation is thus comparable with the equation reported in column 1. As can be seen, the results reported in the two columns are very similar: the school-level variation in grading practices is approximately within a range of two points (G_S48 − G_S30 = 2.07), real achievement and student effort are rewarded (with a little less weight on real achievement and some more weight on effort), and family background matters qualitatively as in the former case. Note, however, that no significant gender differences are revealed in column 2. Column 3 of Table 2 reports the results from a less restrictive specification, where the marginal evaluation of student effort is allowed to vary between schools. The table reports results from two rather arbitrarily chosen schools that display different grading practices. Relative to the teachers in school 40, the mathematics teachers in school 39 practice easy grading in two respects, as both the G_j and the γ_j parameters have relatively high values in school 39 compared with school 40.
It should be noted that in this less restrictive specification, the G_j parameters are estimated with fairly good precision (t-statistics in the range from 0.89 to 11.35), but the γ_j parameters are much less precisely estimated (t-statistics in the range from 0.19 to 2.60). The earlier comments regarding the potential endogeneity of the effort variable apply here as well. Table 2 displays no information about the variation in the marginal evaluation of real achievement among schools. The reason is that there is no evidence that g_j varies in any systematic way.
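For concreteness, the restricted first stage (equation (2) with only G_j varying across classes) can be sketched with statsmodels as follows; the dataframe and column names are hypothetical stand-ins for the survey variables:

```python
import statsmodels.formula.api as smf

# "0 +" suppresses the constant, so each class dummy coefficient is that
# class's grade level G_j, as in the restricted version of equation (2).
first_stage = smf.ols(
    "grade ~ 0 + C(class_id) + female + father_edu + mother_edu"
    " + two_parent + test_score + effort",
    data=df,
).fit()

# Collect the estimated class effects G_j, keyed by class id, for use as
# grading measures in the second-stage production function.
G_hat = (first_stage.params
         .filter(like="C(class_id)")
         .rename(lambda name: name.split("[")[1].rstrip("]")))
df["G_hat"] = df["class_id"].astype(str).map(G_hat)
```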

Table 3. Determinants of the teachers' grading practices (t-statistics in parentheses)

Variable                                                          Estimated coefficient
Teacher education                                                 0.32 (2.27)
Average mathematics achievement in the ninth grade                0.06 (1.76)
Standard deviation in mathematics achievement in the ninth grade  0.07 (1.24)
Fraction of hard-working students in school                       0.51 (2.30)
Class size                                                        0.01 (0.40)
School size                                                       0.26 × 10⁻⁴ (2.55)
R²adj                                                             0.20
N                                                                 54

Note: The dependent variable is the grading variable reported in Table 2, column 1. The equation is estimated by ordinary least squares. Note that this method does not take account of the additional errors that come from estimation of the grading practices in the first stage. The additional errors may be heteroskedastic when the sampling variances of the predicted class effects differ across classes. The estimated variances underlying the reported t-statistics may thus be biased.

The Determinants of the Teachers' Grading Practices

Table 3 reports results from a regression of the grading variable G_j, calculated by the equation reported in Table 2, column 1, against student body and teacher characteristics. Table 3 shows that teachers from training colleges practice significantly easier grading than do other teachers. Moreover, students in classes where classmates perform well, as measured by average test results at the start of the ninth grade, seem to be exposed to easier grading. It is not clear how the latter result should be interpreted. The discussion of the rational teacher's decision problem provided in the third section indicates that hard grading should take place in classes where the initial average performance is high, because the number of students who lose from hard grading is likely to be small when the initial average performance is high. One potential reason the estimations reveal the opposite sign is that high achievers put pressure on the teachers for easy grading. This interpretation is, of course, speculative.

The revealed relationships between grading practices and student body characteristics warrant a more general comment related to the discussion about peer group effects in education production. Manski (2000) distinguishes three mechanisms by which students belonging to the same class may come to behave similarly. Endogenous effects and contextual effects occur when the propensity of an individual to behave in some way varies with, respectively, the behavior or the background characteristics of the group. Correlated effects occur when individuals achieve similarly because they, for example, are taught by the same teacher. The results reported earlier indicate that teacher behavior reflects student behavior. Thus, teacher responses to the student body composition may actually weaken or strengthen the peer group effects (i.e., the endogenous or contextual effects).

The fraction of hard-working students is included as an explanatory variable in the regression reported in Table 3 to provide the necessary identification restriction. This variable is generated from the students' questionnaire in the ninth grade. The students are given four alternatives to characterize their own efforts in school, and a measure of hard-working students in class is generated from their answers.



Hard-working students are those who say that they provide much or very much effort. The class average is 25.5%, with a standard deviation of 2.78. For the identification strategy to make sense, we must believe that this between-class variation stems from basic motivational factors that are not influenced to any extent by the mathematics teachers. As presented in Table 3, this variable is a significant determinant of the teachers' grading practices. One standard deviation in the fraction of hard-working students translates into 0.30 of one standard deviation in the measure of the teachers' grading. Thus, quite a large part of the variation in the teachers' grading seems to be explained by this variable.

It is hard to provide convincing arguments that the exclusion restriction is fulfilled; that is, that the fraction of hard-working students does not belong in the education production function. However, some support may come from Lazear (2001), who states that teaching should be viewed as a congested good, where disruptive students generate congestion. If this is the basic mechanism generating externalities in the classroom, hard-working students may simply not interfere much with their peers, implying that there are no externalities related to this subgroup of students. An often used, but admittedly not very powerful, test of the exclusion restriction is to include the potential identifying variable as an independent variable in the education production function. When this is done, the fraction of hard-working students appears to have no significant effect on student achievement. The predicted grading practices will be used as an instrumental variable in the education production function, but in spite of the argument provided, it is realized that its validity can be questioned.

The Consequences of the Teachers' Grading Practices

Table 4 presents results for the grading coefficients generated from estimations of the education production functions augmented with measures of the teachers' grading practices. The control variables are not reported, but include family background variables (family structure, parental education, number of books at home, the number of televisions at home), class size, teacher education, peer group measures, and community characteristics (rural/urban). The value-added specification is applied. The estimated coefficients therefore show how individual student achievement growth during the 10th grade is affected.

Table 4. The education production function (t-statistics in parentheses)
Variable       (1) Class-level     (2) Predicted      (3) School-level    (4) School-level
               grading measure     grading included   grading measure     grading measures
G_j^C          −1.09 (3.41)
G_j predicted                      −0.40 (0.23)
G_j^S                                                 −0.84 (2.03)        −0.97 (2.40)
γ_j^S                                                                     −0.20 (2.09)
R²adj          0.30                0.28               0.29                0.29
N              640                 323                640                 640

Note: The dependent variable is the test results at the end of the 10th grade. Independent variables not reported are test results at the start of the 10th grade, gender, parental education, family size, family structure, books at home, number of televisions, peer group measures, teacher education, the number of teachers per student, and school size.
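A sketch of the second stage, again with hypothetical column names, might read as follows. The first regression corresponds to the column 1 specification; the second block mimics the instrumental-variable variant of column 2 by predicting the grading measure from a Table 3-style determinants regression (run here, for brevity, on student-level data rather than across classes) with the fraction of hard-working students as the excluded instrument:

```python
import statsmodels.formula.api as smf

# Column 1: value-added production function with the class grading effect.
ols = smf.ols(
    "test_end_10 ~ test_start_10 + female + father_edu + mother_edu"
    " + books + class_size + teacher_edu + peer_mean + G_hat",
    data=df,
).fit()

# Column 2: predict grading from its determinants, then substitute the
# prediction for the observed grading measure in the production function.
determinants = smf.ols(
    "G_hat ~ teacher_edu + peer_mean + peer_sd + hard_working_share"
    " + class_size + school_size",
    data=df,
).fit()
df["G_pred"] = determinants.predict(df)

iv = smf.ols(ols.model.formula.replace("G_hat", "G_pred"), data=df).fit()
print(ols.params["G_hat"], iv.params["G_pred"])
```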

All coefficients reported in Table 4 have negative signs, clearly indicating that easy grading lowers student achievement. Thus, the results are broadly consistent with the model predictions, but some closer inspection is warranted. The equation reported in the first column uses the class-level measure of the grading practices as established by the equation reported in Table 2, column 1. As can be seen in Table 4, the estimated grading coefficient is highly significant. The effect is of considerable size: one standard deviation in grading practices translates into more than 0.5 points in student achievement, which is 12% of one standard deviation in student achievement. As noted earlier, the estimated effect may be biased due to unobserved student and teacher characteristics. Therefore, the predicted grading practices, based on the equation reported in Table 3, are substituted for the observed grading practices. As shown in the second column, the coefficient is still negative, but insignificant and much smaller than in the prior case. In the third column, the grading practices estimated at the school level are substituted for the class-level variable. This too is a procedure to mitigate the problems related to unobserved student and teacher characteristics; in particular, the problem that students seem to be non-randomly allocated to classes. The estimated grading coefficient is somewhat smaller than that reported in column 1, but it is still significant at the 5% level. The fourth column reports the results from an estimation of the education production function that contains two measures of the grading practices. These measures are G_j and γ_j as they follow from the equation reported in the third column of Table 2. The grading effect related to G_j is within the range reported earlier, and the γ_j coefficient is significant and negative. The latter result indicates that students respond to easier grading of effort by decreasing their efforts, leading to a decrease in student achievement. Formally, this result indicates that the income effect of easy effort grading seems to dominate the substitution effect. Once again, it should be emphasized that the results for the effort variable should be treated with care.

The education production function results reported so far come from estimations of value-added specifications; that is, from estimations of equation (3). Now I make more extensive use of the time-series nature of the data. In Table 5, the dependent variable is a measure of the difference in achievement growth between the ninth and the 10th grades, and the independent variable of main interest is the change in grading practices between the ninth grade and the 10th grade. As already described, the measure of the difference in achievement growth is generated from the test scores at the start of the ninth grade, the start of the 10th grade, and the end of the 10th grade. The measure of the change in the grading practices is generated by first identifying the grading practices in both the ninth and the 10th grade, and subsequently taking the difference between these measures. These variables are constructed so that a negative estimated coefficient for the grading variable is consistent with the results reported earlier. Column 1 includes school fixed effects, but no other control variables, while columns 2 and 3 include additional variables to capture family background and peer influences.
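The mechanics of this construction can be sketched as follows; all column names are hypothetical, and G_hat_9 and G_hat_10 denote the class grading effects estimated from year-specific first stages:

```python
import statsmodels.formula.api as smf

# Year-to-year change in achievement growth, built from the three tests.
df["d_growth"] = ((df["test_end_10"] - df["test_start_10"])
                  - (df["test_start_10"] - df["test_start_9"]))

# Change in the class grading effect between the ninth and 10th grades
# (positive if grading is easier in the 10th grade).
df["d_G"] = df["G_hat_10"] - df["G_hat_9"]

# Table 5, column 1: school fixed effects and no other controls.
did = smf.ols("d_growth ~ d_G + C(school_id)", data=df).fit()
print(did.params["d_G"])
```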
The rationale for including time-invariant variables is that their influence on achievement growth may not be time-invariant. As shown, the estimated grading coefficient is negative throughout Table 5, thus lending some support to the claim that easy grading lowers student achievement. The coefficient becomes insignificant in the specification that includes peer group variables. This is consistent with the findings reported earlier: easy grading is more likely to take place in classes with high average performance. In one



Table 5. Differences-in-differences estimations (t-statistics in parentheses)

Variable                                                    (1)             (2)             (3)
ΔG_j^C                                                      −0.18 (1.74)    −0.16 (1.53)    −0.10 (1.02)
Female                                                                      0.34 (0.54)     1.61 (2.08)
Father's education                                                          2.05 (2.16)     1.31 (1.36)
Mother's education                                                          0.81 (0.84)     0.50 (0.53)
Intact family                                                               2.28 (2.49)     0.40 (0.40)
Average mathematics score in the ninth grade                                                0.19 (1.85)
Standard deviation of mathematics score in the ninth grade                                  0.49 (1.22)
School fixed effects                                        Yes             Yes             Yes
R²adj                                                       0.01            0.05            0.07
N                                                           657             657             657

Note: The dependent variable is the difference in achievement growth between the ninth and 10th grades. The difference is positive if the achievement growth is higher in the 10th grade than in the ninth grade. ΔG_j^C is the difference in the teachers' grading practices between the ninth and 10th grades; the difference is positive if grading is easier in the 10th grade.

interpretation, then, Table 4 reveals a significant and negative relationship between easy grading and student achievement growth, but since grading is affected by the peer group composition, some part of the grading effect can be conceptualized as a peer group effect. Note that the peer group composition affects the change in achievement growth even though it is unchanged through the 2 years. The reason has to be that individual students are more affected by their peers in one year than in the other. The same interpretation applies to the other time-invariant variables that come out with significant coefficients in the differences-in-differences estimations.

From the point of view of the rational teacher (who, by presumption, chooses the optimal grading practices), the reason for practicing easy grading must be that these practices have some positive achievement consequences for at least some subgroups of students. However, the estimation results presented so far have revealed no positive effects of easy grading. One reason may be that such effects are concealed within the sample of all students. As a first step to investigate this issue, the students are separated into two categories: low achievers and high achievers. The former subgroup consists of students who performed below the sample average on the test at the beginning of the 10th grade (which is an independent variable in the education production function), while the latter subgroup consists of those who performed above the sample average on the same test. Separate grading effects for these two student subgroups are estimated, and the results are reported in Table 6. Table 6 reports the results for the same grading practice parameters as in Table 4, with one exception: the predicted grading practices in class are excluded; instead, results are reported for a grading practice parameter that is estimated for low achievers only (G_j^S,LA). This variable is established to investigate the possibility that the teachers grade different student subgroups differently across schools.

As shown in Table 6, there is strong evidence that high-achieving students are negatively affected by easy grading, while the effects for low-achieving students are much smaller and insignificant at conventional levels. However, in all but one of the

Table 6. The education production function. Separate estimations for low achievers and high achievers (t-statistics in parentheses)

              (1) Class level             (2) School level            (3) School level            (4) School level, low achievers
              Low          High           Low          High           Low          High           Low
G_j^C         0.47 (1.15)  −1.83 (3.56)
G_j^S                                     0.09 (0.17)  −2.05 (3.13)   0.10 (0.18)  −2.17 (3.50)
γ_j^S                                                                 0.42 (0.34)  −3.97 (2.62)
G_j^S,LA                                                                                          0.29 (0.64)
R²adj         0.07         0.21           0.07         0.20           0.06         0.21           0.08
N             372          266            372          266            372          266            307

Note: The dependent variable is the test results at the end of the 10th grade. Independent variables that are not reported are test results at the start of the 10th grade, gender, parental education, family size, family structure, books at home, number of televisions, peer group measures, teacher education, the number of teachers per student, and school size.

cases, the estimated coefficients come out with negative signs also for the latter student subgroup. These results therefore indicate that there may be no student subgroups that actually improve their performance under easy grading. This assertion is confirmed by investigating the grading effects for more narrowly defined subgroups of low achievers, but these results are not reported. Note, however, that the analysis reported earlier does not control for students who drop out because the grading is too hard, and no such estimations are provided due to lack of data. The drop-out problem is unlikely to affect the estimated coefficients for high-achieving students, but may possibly lead to biased coefficients for low-achieving students. For this reason, the results for low-achieving students should not be treated as decisive evidence; the reported coefficients may suffer from a drop-out bias.

There are other potential explanations for the findings revealed. One is that the grading practices may be determined by unobserved teacher characteristics, which also determine other aspects of teaching effectiveness. Then the grading puzzle (i.e., why do teachers choose ineffective grading?) is part of the greater puzzle of why some teachers are more effective than others. Another potential explanation is that the model presented in the third section may not have captured the political economy of teacher grading very well. One alternative model is suggested by Bonesrønning (1999): students who care about perceived achievement and other activities are better off when exposed to easy grading, and thus they have incentives to put pressure on teachers. Some student subgroups may have stronger incentives, or be more successful in their rent-seeking activities, and some teachers may be more responsive to students' claims than others.

Concluding Remarks

Teachers affect student achievement in a number of ways. This paper has focused on the teachers' grading practices. It is argued that this may be a fertile area of research because grading is a teacher's instrument by which student effort can



potentially be manipulated. Further, it is argued that the power of grading is greater when perceived achievement is an important determinant of admission to higher education (as it is in Norway). The first part of the empirical analyses explores the grading practices. The motivation is that the grading effects depend on how the grading is designed. It turns out that the variation in grading practices in mathematics is to a substantial degree captured by a constant term. The second part of the analysis shows that all students, and high-achieving students in particular, perform better when exposed to hard grading. The strong message from this analysis is that effective teachers are characterized by being able to manipulate student effort. Note, however, that both the inherent endogeneity problems and the problems of separating the grading effects from other effective-teacher traits point to the need for more analyses before this conclusion rests on firm ground.

Acknowledgements

The Norwegian Research Council has provided financial support for this work. Comments from two anonymous referees are gratefully acknowledged.

References
Becker, W. & Rosen, S. (1992) The learning effect of assessment and evaluation in high school, Economics of Education Review, 11(2), pp. 107–118.
Betts, J. R. & Grogger, J. (2003) The impact of grading standards on student achievement, educational attainment, and entry-level earnings, Economics of Education Review, 22(4), pp. 343–352.
Bonesrønning, H. (1999) The variation in teachers' grading practices: causes and consequences, Economics of Education Review, 18, pp. 89–105.
Correa, H. & Gruver, G. W. (1987) Teacher–student interaction: a game-theoretic extension of the economic theory of education, Mathematical Social Sciences, 13, pp. 19–47.
Costrell, R. M. (1994) A simple model of educational standards, American Economic Review, 84(4), pp. 956–971.
Figlio, D. N. & Lucas, M. E. (2004) Do high grading standards affect student performance?, Journal of Public Economics, 88(9–10), pp. 1815–1834.
Goldhaber, D. D. & Brewer, D. J. (1996) Why don't schools and teachers seem to matter?, Journal of Human Resources, 32(3), pp. 505–523.
Hanushek, E. A. (1979) Conceptual and empirical issues in the estimation of educational production functions, Journal of Human Resources, 14(3), pp. 351–388.
Lazear, E. P. (2001) Education production, Quarterly Journal of Economics, 116(3), pp. 777–801.
Lin, C. & Lai, C. (1996) Why parents and teachers may prefer punishment to encouragement for child education?, Southern Economic Journal, 63(1), pp. 244–247.
Manski, C. (2000) Economic analysis of social interactions, Journal of Economic Perspectives, 14(3), pp. 115–136.
Montmarquette, C. & Mahseredjian, S. (1989) Could teacher grading practices account for the unexplained variation in school achievements?, Economics of Education Review, 8(4), pp. 335–343.
Rivkin, S. G., Hanushek, E. A. & Kain, J. F. (2001) Teachers, schools and academic achievement, NBER Working Paper No. 6691.
