An Experimental Study of the Effects of Class Size

Author(s): Stan M. Shapson, Edgar N. Wright, Gary Eason, John Fitzgerald

Source: American Educational Research Journal, Vol. 17, No. 2 (Summer, 1980), pp. 141-152
Published by: American Educational Research Association

Stan M. Shapson
Simon Fraser University

Edgar N. Wright, Gary Eason, and John Fitzgerald
Toronto Board of Education

The study investigated the effects of four class sizes (16, 23, 30, and
37) on teachers' expectations; the attitudes and opinions of participants
(students and teachers); student achievement in reading, mathematics,
composition, and art; student self-concept; and a variety of classroom
process variables (e.g., teacher-pupil interaction, pupil participation,
method of instruction). Teachers and students were randomly assigned
to a class size in Grades 4 and 5. A total of 62 classes in three school
districts in Metropolitan Toronto participated in the two-year study.
Findings indicated that teachers had definite expectations of class size
effects that subsequently were reported to be confirmed by their
experience in the study. However, most other results failed to support
teachers' opinions. Few of the observed classroom process variables
were affected by class size. Although students' mathematics-concept
scores were higher in size 16 than 30 or 37, there were no class size
effects for the other achievement measures (reading, vocabulary,
mathematics-problem solving, art, and composition) or for students'
attitudes and self-concepts.

The issue of class size has long attracted the interest of the educational
community (Glass & Smith, 1979). The present study was conducted in the
province of Ontario where, in the early 1970's, educators were faced with
the problem of how best to cope with diminishing school enrollments and
ceilings on educational expenditures. In responding to these concerns, the
Board of Education for the City of Toronto requested a report on literature
relating to class size. After publication of this report (Shapson, 1972), the
Toronto Teachers' Federation passed a motion requesting that the Board
institute "a research project to determine the effects of class size on the
education of a child in Toronto." Subsequently, the research reported here
was funded under contract by the Ontario Ministry of Education (Wright,
Shapson, Eason, & FitzGerald, 1977).

This research project was funded under contract by the Ministry of Education, Ontario.

This study was designed to examine experimentally the differences
between four class sizes ranging from 16 to 37 students by randomly assigning
students and teachers to a particular class size. The study investigated the
effects of class size on teachers' expectations about the effects of class size,
the attitudes and opinions of participants, student achievement, and a variety
of classroom process variables (e.g., teacher-pupil interaction, pupil
participation, method of instruction).
Sixty-two classes of students in the fourth and fifth grades from 11 schools
in Metropolitan Toronto participated in the two-year study. Only teachers
who had at least two years of teaching experience were selected. Meetings
were held in all schools to ensure that teachers had the option of participating
on the basis of informed consent. Approximately 70 percent of the sample
were women, consistent with the total population of elementary school
teachers in Toronto. Students from all socioeconomic levels (Blishen, 1967)
were represented in the sample, but there was a slightly higher proportion of
students from the lower socioeconomic categories (52 percent) in comparison
to the total elementary student population (44 percent).
Design and Procedure
In the first year of the study, teachers and students in the fourth grade
were randomly assigned to classes of four sizes: 16, 23, 30, or 37. The student
assignments were stratified by sex and by ratings of academic performance.
For the second year, the same teachers and students were similarly assigned
to Grade 5 classes, with the constraints that students not be in a class size of
16 or 37 for both years of the study and that teachers who taught classes of
the two larger sizes receive classes of the two smaller sizes (and vice versa).
The actual size of each participating class was closely monitored during the
study to ensure that the enrollment did not vary by more than ±3 from the
assigned class size.
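
The paper does not describe the mechanics of the stratified assignment. As a
rough illustration only, the following Python sketch shows one way to deal
students into classes of 16, 23, 30, and 37 so that each stratum (sex by
academic rating) is spread across the classes roughly in proportion to class
size. The function name, the tuple layout, and the interleaving rule are
assumptions made for this example, not the authors' procedure.

```python
import random

CLASS_SIZES = (16, 23, 30, 37)  # the four target class sizes used in the study


def stratified_assignment(students, sizes=CLASS_SIZES, seed=0):
    """Assign students to classes, spreading each stratum across all classes.

    students: list of (student_id, sex, rating) tuples (hypothetical layout);
              the number of students must equal the total number of seats.
    sizes:    distinct target class sizes.
    Returns a dict mapping class size -> list of student tuples.
    """
    if len(students) != sum(sizes):
        raise ValueError("number of students must equal the total number of seats")
    rng = random.Random(seed)

    # Group students by stratum (sex, rating) and shuffle within each stratum.
    strata = {}
    for student in students:
        strata.setdefault((student[1], student[2]), []).append(student)
    ordered = []
    for key in sorted(strata):
        rng.shuffle(strata[key])
        ordered.extend(strata[key])

    # Give seat j of a class of size n a random position in [j/n, (j+1)/n).
    # Sorting by position interleaves the classes, so any run of consecutive
    # students (such as one stratum) is shared out in proportion to class size.
    seats = [((j + rng.random()) / n, n) for n in sizes for j in range(n)]
    seats.sort()

    classes = {n: [] for n in sizes}
    for student, (_, n) in zip(ordered, seats):
        classes[n].append(student)
    return classes
```

In this sketch a pool of 106 students fills the four classes exactly. The
second-year constraints (no student in size 16 or 37 twice, teachers switching
between the larger and smaller sizes) are not modeled.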
A resource group of Toronto Teachers' Federation members participated
in the planning phase of the study. This group met with the researchers to
help define the scope of the study and to review procedures and drafts of the
measurement instruments. Each year of the study, paper and pencil measures
of the opinions and attitudes of participants (students, teachers) were
collected. Standardized achievement tests, a self-concept scale, and an art and
composition measure were administered to students. Observations of
classroom process variables were made with the Toronto Classroom Observation
Schedule, which was designed for the study. For both years of the study,
trained observers used this schedule for eight half-day visits to each
participating classroom. In addition, the observers used "Indicators of Quality," an
instrument used in a previous class size study (Olson, 1970), for five
20-minute visits to each class.

The Toronto Classroom Observation Schedule (TCOS). The TCOS was
designed and field tested prior to the start of the study. Detailed information
on the development of the instrument and reliability data are presented in
Wright et al. (1977). The schedule includes a fixed, time-sample observation
system (to record teacher verbal behavior, pupil participation, and pupil
aggressive behavior); an observation checklist (method of instruction, subject
emphasis, use of educational aids, and physical conditions); and a Classroom
Atmosphere Rating scale. The following variables were investigated with
this schedule:
(1) Teacher-pupil interaction - observations of three aspects of teachers'
verbal behavior: content (pupils' behavior, course content, or routine
procedures); function (approving, disapproving, or neutral; verbalizations about
course content were further classified as questioning or telling); audience
(addressed to an individual, a group, or the whole class);
(2) Pupil participation - the participation of randomly selected individual
pupils was classified as verbal or nonverbal. Verbal participation was further
classified as self-initiated or reactive, and according to audience (i.e., teacher,
peer, or self). A pupil could also be classified as off-task or as having no
assigned task;
(3) Pupil satisfaction - the frequency of all overt, disruptive, hostile acts
occurring within a fixed time period;
(4) Method of instruction - a ranking of the frequency of various methods
of instruction;
(5) Subject emphasis - a ranking of the subjects taught according to the
time spent on them;
(6) Physical conditions - checklist of classroom noise, furniture
arrangement, and educational aids;
(7) Use of educational aids - the proportion of pupils using audio-visual,
written, and mechanical or concrete aids; and
(8) Classroom atmosphere - rating scale of 45 Likert items representative
of classroom atmosphere (e.g., pupils' regard for the teacher).
Indicators of Quality. This observation instrument taps four categories of
classroom activity: individualization, interpersonal regard, creative
expression, and group activity (Vincent & Olson, 1972). It consists of 51 items,
each describing one "positive" and one "negative" classroom activity
purported to signify "quality of education."
The Canadian Tests of Basic Skills (1968). Four subscales of this
standardized test were administered: vocabulary (17 minutes), reading comprehension
(55 minutes), mathematics-concepts (30 minutes), and mathematics-problem
solving (30 minutes).
Art Sample. Samples of students' art were collected under standard
procedures each year for the topic "What I like to do." The art samples were
rated on a developmental scale ranging from the manipulative stage through
to the preadolescent stage (Gaitskell & Hurwitz, 1970).
Composition Sample. Students' compositions on the topics "Dreams" (the
first year) and "Wishes" (the second year) were collected under standard
procedures. A five-point rating scale was developed for assessing the
compositions (Wright & Reich, 1972).
Self-concept. The North York Self-Concept Inventory (Shapson, Virgin,
& Crawford, 1971) comprises 30 items measuring pupils' academic
self-concept. Students indicate whether they feel each statement is "true" or "not
true," and responses are scored to indicate a positive-negative self-concept.
Pupils' Questionnaire. This questionnaire contained six items measuring
students' attitudes toward specific subjects of instruction and 30 items
measuring students' attitudes toward the physical and social classroom
environment, their contact with teachers and peers, and their general
satisfaction in school.
Teachers' Questionnaires. One questionnaire was administered before
teachers were informed of their assigned class size to obtain background
information (e.g., teachers' previous experience) and their expectations for
each of the proposed class sizes. A second questionnaire was designed to
survey teachers' opinions toward their assigned class size each year.
Semantic Differential. Two forms of a 7-point semantic differential scale
were developed for participating teachers to assess their attitudes toward
"My Classroom" and "The Pupils I Teach." A 5-point semantic differential
was used with pupils to describe "My Classroom."

Data Analysis
The analysis included data from 62 classes: 16 each of class sizes 16 and
23, and 15 each of class sizes 30 and 37. In general, differences between class
sizes were assessed by a one-way analysis of variance with the class serving
as the unit of analysis. For the student outcome data, the variability due to
year of the study and teacher was first removed using a multiple linear
regression technique and an analysis of variance was performed using the
"residuals." For the observational data, means of each variable were first
compared with proportions tests to determine if there were any differences
between years which would necessitate statistical adjustment. A one-way
analysis of variance by class size was conducted. When a significant
difference was found, pseudovalues were calculated by Mosteller and Tukey's
(1968) jackknife procedure and a similar analysis of variance was performed
on the pseudovalues. If an analysis of variance of either type of data resulted
in a significant overall effect due to class size, paired contrasts or range tests
were conducted.
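
The jackknife and analysis-of-variance steps can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
authors' computation: the function names and the example numbers are
hypothetical, and the paper does not say which units were dropped to form the
pseudovalues. The pseudovalue formula is the standard jackknife one,
n * theta_all - (n - 1) * theta_without_i.

```python
from statistics import mean


def jackknife_pseudovalues(values, estimator=mean):
    """Return the jackknife pseudovalues of `estimator` over `values`.

    Pseudovalue i is n * theta_all - (n - 1) * theta_without_i, where
    theta_without_i is the estimate with observation i left out.
    """
    values = list(values)
    n = len(values)
    theta_all = estimator(values)
    return [
        n * theta_all - (n - 1) * estimator(values[:i] + values[i + 1:])
        for i in range(n)
    ]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    Each element of `groups` holds the per-class values for one class size,
    so the class is the unit of analysis.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical per-class values for three groups (not data from the study).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
pseudo_groups = [jackknife_pseudovalues(g) for g in groups]
f_ratio, df1, df2 = one_way_f(pseudo_groups)
print(round(f_ratio, 2), df1, df2)
```

When the estimator is the mean, the pseudovalues reproduce the original
values, so the jackknife only changes the analysis for other statistics such
as proportions or ratios. With the 62 classes and four class sizes analyzed
here, a one-way analysis has 3 and 58 degrees of freedom.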
Teachers' Affective Measures
Teachers' Expectations. Prior to the study, teachers reported the effects
they expected in each of the proposed class sizes. A summary of the responses
showed that 94 percent of the positive expectations were directed toward the
smaller classes (16 and 23), and 91 percent of the negative expectations
toward the larger classes (30 and 37). For example, teachers reported that
the smaller classes would allow them to offer a more individualized program,
to provide more individual attention, and to develop a better rapport with
pupils. They expected academic improvement and the development of self-
confidence from pupils in the smaller classes. They were also looking forward
to a relaxed, enjoyable environment, in contrast to more structure and
disciplinary problems in the larger classes.
Teachers' Opinions. To determine if direct experience with a class size
altered their expectations, teachers were again asked for their expectations
toward the end of each year of the study. A teacher's initial expectations
were compared to those expressed after experiencing a particular class size,
so that a list of restated expectations could be compiled.
It was discovered that teachers' opinions matched their expectations. For
example, 43 initial statements expressed the expectation that class size would
specifically determine the amount of individualization possible. After direct
experience with the class sizes, 35 teachers (81 percent) restated that either
the smaller class sizes (i.e., 16 and 23) actually had resulted in more
individualized programs and more individual attention, or that individuali-
zation had been difficult or almost impossible in the larger classes (i.e., 30
and 37).
Teachers also reported that they had made changes directly related to the
size of their class in areas such as classroom management, physical layout,
and student evaluation. Teachers' comments regarding these areas are
summarized briefly below:

Management. Teachers of class size 16 felt that they were able to run their
classrooms more smoothly and efficiently and that their pupils showed
more responsibility in working on their own, handling supplies, and
behaving appropriately. In class size 37, it was reported that rules had to be
strictly enforced with restricted physical movement of pupils.
Physical Layout. Teachers of class size 16 were pleased with the amount of
room available, especially for centres and activities, and they reported that
pupils had a choice of where to work and were free to move around.
Teachers reported that their classrooms were crowded in size 37.
Evaluation. Teachers of class sizes 16 and 23 were pleased because marking
took little time and corrections were immediate. Some alternative methods
were mentioned such as interviews and pupil self-evaluation, and it was felt
that evaluation was more personal and extensive. In size 30, it was reported
that marking became more formal, time-consuming, and sometimes de-
layed. In size 37, marking and evaluation were viewed to be overwhelming,
teachers enlisted pupils for marking, and evaluation was based on tests or
specific projects, rather than on general teaching knowledge of pupils.
Teachers expressed dissatisfaction with this kind of evaluation.

Teachers' responses to several questions were analyzed as a function of
two "class-size change" groups: one group of teachers who went from a large
class size (30 or 37) the first year to a small size (16 or 23) the second year;
and a second group who went from a small size to a large size. Teachers who
went from a large to a small class size were significantly more likely to:
(1) like their current class size (χ²(1) = 35.2, p < .001);
(2) report a higher personal energy level (χ²(2) = 76.4, p < .001); and
(3) feel that their pupils contributed more, paid more attention, and were
more satisfied the second year (in each case χ²(2) > 90, p < .001).
Teachers' Attitudes. A significant effect due to class size was found on
semantic differential ratings for the concept "My Classroom" but not for
"The Pupils I Teach" (Table I). Teachers in class size 16 rated "My
Classroom" significantly more positively than those in size 30 (t(54) = 2.88,
p < .01) and than those in size 37 (t(54) = 3.10, p < .01).
Observation of Classroom Process Variables
Of the numerous variables on the Toronto Classroom Observation
Schedule that were investigated, data for those variables in which significant
differences between class sizes were detected are presented in Table I.
Teacher Pupil Interaction. Some of the variables unaffected by class size
included frequency of
(1) verbalizations about classroom routines,
(2) lecturing ("tell") to the class and to all audiences combined (individual,
group, and class),
(3) questioning,
(4) approving, disapproving, and all verbalizations about course content,
(5) approving, disapproving, and all utterances about pupils' behavior,
(6) all approving verbalizations,
(7) all disapproving verbalizations,
(8) all observations about social matters,
(9) all verbalizations addressed to individuals,
(10) all verbalizations addressed to groups,
(11) all verbalizations addressed to the class,
(12) pupil talking which is part of learning, and
(13) observations during which neither the teacher nor any pupils were

TABLE I
Mean Scores by Class Size

Variable                                              16      23      30      37  F-ratio
Teachers' Semantic Differential
  My Classroom (maximum = 126)                     100.9    96.8    92.2    91.4    4.20*
  Pupils I Teach (maximum = 147)                   114.1   108.9   110.8   110.2    0.48
Observational Variables (a)
  Proportion of pupils addressed as individuals      .43     .34     .30     .25   12.96*
  Lecture by teacher (maximum = 8)                  4.19    4.69    3.13    3.80    4.06*
  Supervision by teacher while pupils               1.94    2.44    3.60    1.92    3.38*
    working (maximum = 8)
  Proportion of written aids used                    .40     .48     .48     .41    4.50*
Student Affective Measures
  Attitudes Toward School (maximum = 30)
    Unadjusted mean score                          19.71   17.35   17.57   17.48
    Residual mean score                             0.29    0.11    0.11   -0.08    1.08
  Semantic Differential (maximum =
    Unadjusted mean score                           8.63    8.93    8.78    9.20
    Residual mean score                             0.03    0.18    0.26    0.06    0.73
  Self-concept (maximum = 30)
    Unadjusted mean score                          20.16   20.34   20.21   20.25
    Residual mean score                             0.09   -0.08    0.13   -0.14    1.38
Student Achievement Measures
  Art (maximum = 10)
    Unadjusted mean score                           6.47    6.30    6.49    6.58
    Residual mean score                            -0.07    0.01   -0.11    0.18    1.29
  Composition (maximum = 5)
    Unadjusted mean score                           4.56    4.40    4.51    4.46
    Residual mean score                             0.14    0.02   -0.06   -0.11    0.88
  (label missing)
    Unadjusted mean score                          43.16   43.58   42.42   42.84
    Residual mean score                            -0.00    0.04    0.03   -0.07    0.71
  (label missing)
    Unadjusted mean score                          44.08   44.47   44.05   44.28
    Residual mean score                            -0.00   -0.04    0.02    0.02    0.16
  Mathematics-Concepts
    Unadjusted mean score                          47.41   45.60   44.03   44.69
    Residual mean score                             0.15    0.00   -0.09   -0.07    4.11*
  Mathematics-Problem Solving
    Unadjusted mean score                          51.17   49.33   48.20   48.82
    Residual mean score                             0.14    0.04   -0.09   -0.10    2.57

(a) Because of the numerous observational variables that were investigated, data are presented
only for those variables in which significant differences between class sizes were detected. The
F-ratios are based on analyses of pseudovalues (Mosteller & Tukey, 1968). Critical F-value (.05
level) = 2.90.
*p < .05.

Further data on teacher-pupil interaction were collected during the obser-
vation of the participation of individual pupils. The measure obtained was
the proportion of pupils observed whom the teacher addressed individually.
There was a significant class size effect for this variable; pupils were more
likely to be addressed individually in class size 16 than in the other three
sizes, while pupils in class size 23 were more likely to be addressed individ-
ually than those in sizes 30 and 37.
Pupil Participation. The following variables concerned with individual
pupil participation were addressed:
(1) the frequency of observations during which the pupil observed was
participating (i.e., performing an assigned task);
(2) the frequency of observations during which the pupil was participating
nonverbally;
(3) the frequency of observations during which the pupil was participating
verbally; and
(4) the frequency of observations during which the pupil had no task to
None of the above variables was affected by class size.
Pupil Satisfaction. It was originally intended to measure pupil aggressive
behavior. However, overt pupil hostility was reported during only 66 of
1,390 observations. A related variable was the number of observations during
which no conflict between pupils was reported. There was no significant
effect of class size on this variable.
Teacher Activity. Observers recorded the three most frequent teacher
activities during each half-day observation visit. Counts were made of the
number of times the following activities were recorded in the course of a
year: lecturing, organization of learning activities, question and answer, and
supervision. No significant differences between class sizes were found for
organization, question and answer, and supervision. However, lecturing was
recorded less frequently in class size 30 than in classes of other sizes.
A related variable was the teacher's use of time while pupils worked on
their own. The teacher was classified as either working alone, supervising,
working with an individual pupil, or working with a group of pupils. The
number of times during a year that each of these categories was reported
was calculated (maximum score = eight). Supervision by the teacher while
pupils were working was found to be more frequent in class size 30 than in
class size 16.
Subject Emphasis. The subjects studied during each observation visit were
recorded and the data were analyzed for three categories: reading,
mathematics, and all language arts subjects. The reading and mathematics variables
were counts of the number of times during the year each subject was reported
in each class. The third variable was the mean number of language arts
subjects reported during an observation. The results revealed no significant
class size effects for these variables.
Educational Resources. The variables investigated were the number of
audio-visual, written, and mechanical-concrete resources present in the
classroom as well as those actually used. There was no significant class size
effect for the presence of any of the above educational resources or for the
use of audio-visual or mechanical-concrete aids. However, written aids were
used more frequently in class sizes 23 and 30 than in size 16.
Classroom Atmosphere. There was no significant effect of class size on the
classroom atmosphere rating scale.
Indicators of Quality. There were no significant effects of class size on the
Indicators of Quality scores.

Student Affective Measures

Attitudes Toward School Scale. While the mean attitude score appeared to
be more positive in class size 16, the analysis of the residual data, removing
the variability due to year and teacher, revealed no significant difference
due to class size (see Table I).
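
The idea of analyzing residuals after removing year and teacher effects can be
illustrated with a small Python sketch. It assumes a balanced layout with one
class per teacher per year, in which case the least-squares fit of an additive
year-plus-teacher model reduces to teacher mean + year mean - grand mean. The
paper reports only that "a multiple linear regression technique" was used, so
the function names, the data layout, and the numbers below are assumptions for
illustration.

```python
from statistics import mean


def residualize(scores):
    """Remove additive teacher and year effects from class mean scores.

    scores: dict mapping (teacher, year) -> class mean score, with exactly
            one class per teacher per year (assumed balanced layout).
    Returns a dict with the same keys holding the residuals.
    """
    teachers = sorted({t for t, _ in scores})
    years = sorted({y for _, y in scores})
    grand = mean(scores.values())
    teacher_mean = {t: mean(scores[(t, y)] for y in years) for t in teachers}
    year_mean = {y: mean(scores[(t, y)] for t in teachers) for y in years}
    return {
        (t, y): scores[(t, y)] - teacher_mean[t] - year_mean[y] + grand
        for (t, y) in scores
    }


def mean_residual_by_size(residuals, class_size):
    """Average the residuals within each assigned class size.

    class_size: dict mapping (teacher, year) -> assigned class size.
    """
    groups = {}
    for key, value in residuals.items():
        groups.setdefault(class_size[key], []).append(value)
    return {size: mean(values) for size, values in sorted(groups.items())}


# Hypothetical scores for two teachers over two years (not study data).
scores = {("A", 1): 10.0, ("A", 2): 14.0, ("B", 1): 20.0, ("B", 2): 22.0}
sizes = {("A", 1): 16, ("A", 2): 37, ("B", 1): 37, ("B", 2): 16}
print(mean_residual_by_size(residualize(scores), sizes))
```

A one-way analysis of variance on such residuals, grouped by class size, then
tests whether class size explains variation that year and teacher do not.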
Semantic Differential Scale. The mean scale scores for the concept "My
Classroom" revealed no significant difference attributable to class size.
Self-concept Scale. There was no significant difference in student self-
concept attributable to class size.
Student Achievement Measures
There were no significant differences attributable to class size for art,
composition, vocabulary, reading, and mathematics-problem solving. Only
for mathematics-concepts was there a significant overall effect due to class
size (see Table I). Students in class size 16 had significantly higher
mathematics-concepts scores than their peers in class size 30 (t(58) = 3.16, p < .005)
and in class size 37 (t(58) = 2.87, p < .01).

Manipulating class sizes experimentally resulted in few changes in class-
room functioning in the fourth and fifth grades. Of all the dependent
variables examined in this study, the ones that tended to show differences
due to class size were teachers' opinions and attitudes. Teachers expected
the two smaller class sizes (16 and 23) to have many advantages over the
larger classes (30 and 37), especially in the amount of individualization
possible. After direct experience during the study, teachers felt that they did
make changes to adjust to the different class sizes. Even though these
perceptions do not receive much support from the observational and student
outcome data, they must not be ignored; teachers do believe that their
experiences in smaller classes are better.
The observation of classroom process variables revealed very few effects
of class size. Class size did not affect the amount of time teachers spent
talking about course content or classroom routines. Nor did it affect the
choice of audience for teachers' verbal interactions; that is, when they
changed class sizes, teachers did not alter the proportion of their time spent
interacting with the whole class, with groups, or with individual pupils.
It was found that individual pupils were addressed more frequently by
teachers in the small classes. However, since there were no corresponding
differences in the total amount of time the teachers spent talking to
individuals, it seems that pupils in the smaller class sizes had more individual
interactions with their teachers simply because a constant amount of time
for individual interactions was being distributed among fewer pupils. These
findings represent changes that generally would be expected if class sizes
were reduced and teachers did not change their instructional styles or
teaching methods. In fact, further observational data indicated virtually no
changes in methods of instruction used by teachers in the different class
sizes. Previous studies also have indicated that teachers generally do not take
advantage of the opportunity afforded by small classes to individualize their
instructional procedures (e.g., Danowski, 1965) and that even if there is more
individualization, a considerable amount of instruction in small classes is
still mass oriented (e.g., Pugh, 1965).
Standardized measures of students' academic achievement showed a
significant class size effect only for mathematics-concepts; students in class
size 16 had higher scores than their peers in class sizes 30 and 37. There
were no significant differences on measures of reading, vocabulary, or
mathematics-problem solving. An argument that performance in endeavors
such as art or composition would be more sensitive to class size effects than
the more standardized achievement measures was also not supported. There
were also no class size effects for students' attitudes toward school and for
their self-concepts. Finally, changing class size did not result in any observed
differences in pupils' participation in classroom tasks.
This study was included in the meta-analysis of class size undertaken by
Glass and Smith (1979). They pointed out that the chance of finding a positive
difference (student achievement results favouring smaller classes) in
comparing class sizes between 15 and 40 could perhaps be as low as 45 to 55
percent and that this does not necessarily represent statistically significant
differences. In Glass and Smith's analysis, student achievement measures on
various instructional subjects were combined and not examined separately.
This is not tenable unless the instructional styles for all subject areas are
consistent. For example, secondary analyses of the data from the present
study have indicated that the subject of instruction affects the classroom
behavior of teachers (FitzGerald, Wright, Eason, & Shapson, 1978). Math-
ematics was more likely than other subjects to be taught as a single lesson to
the entire class and reading was less likely to be taught as a single lesson to
the entire class. These findings are not related to class size. It is quite possible
to suggest that since students were generally grouped for reading, changing
the overall class size had no effect on this measure, whereas it did for
The range of class sizes included in Glass and Smith's analysis was wider
than in the present study, and in fact their most dramatic results were
obtained for class sizes which dropped below those included in this study
(i.e., below 16). Rather than suggesting that class size does not make any
difference, this study demonstrates that within a narrower range of class
sizes (i.e., 16 to 37), it makes a large difference to the teachers but little
difference to the students or to the instructional methods used.
It must be pointed out that there was no attempt to experimentally
manipulate instructional strategies for the different class sizes. Essentially,
this has been a study of "what happens" when class size is changed, but it
cannot be considered a study of "what can happen." The results suggest
that, in the future, emphasis could be placed on providing teachers with
training in specific instructional strategies most appropriate for different
class sizes. As well, rather than merely advocating reductions or increases in
class size, a more flexible policy could be adopted. For example, class size
could be appropriately altered in different situations by redistributing stu-
dents and time and by changing instructional techniques.

STANLEY M. SHAPSON, Associate Professor, Directoral Professional
Programs, Faculty of Education, Simon Fraser University, Burnaby,
British Columbia, Canada. Specializations: Program Evaluation and
Cognitive Psychology.
EDGAR N. WRIGHT, Director of Research, Research Department, To-
ronto Board of Education, Toronto, Ontario, Canada. Specializations:
Research Methodology, School Evaluation.
GARY EASON, Research Assistant, Research Department, Toronto Board
of Education, Toronto, Ontario, Canada. Specializations: Psychology.
JOHN FITZGERALD, Research Assistant, Research Department, Toronto
Board of Education, Toronto, Ontario, Canada. Specializations: Psychol-