
Educational Research and Evaluation

ISSN: 1380-3611 (Print) 1744-4187 (Online)

Understanding the Nature of Errors in Probability Problem-Solving

Ann Aileen O'Connell
To cite this article: Ann Aileen O'Connell (1999) Understanding the Nature of Errors in
Probability Problem-Solving, Educational Research and Evaluation, 5:1, 1-21

Published online: 09 Aug 2010.

Download by: [Federation University Australia]

Date: 20 March 2016, At: 16:08

Educational Research and Evaluation

1999, Vol. 5, No. 1, pp. 1-21

Swets & Zeitlinger

Understanding the Nature of Errors in

Probability Problem-Solving
Ann Aileen O'Connell

University of Memphis

This study provides an investigation of relationships among different types of errors
occurring during probability problem-solving. Fifty non-mathematically sophisticated graduate student subjects enrolled in an introductory probability and statistics course were
asked to solve a set of probability problems, and their attempts at solution were analyzed
for presence and type of errors. The errors contained within these solutions were categorized according to a coding scheme which identifies 110 specific kinds of errors in four
categories: text comprehension errors, conceptual errors, procedural errors, and arithmetic/computation errors. Relationships among types of errors included in each category
were investigated using hierarchical clustering via additive trees. Implications of these
relationships for the teaching and learning of probability problem-solving are discussed.

The author would like to thank John Nickey and Garry Rabin for their assistance with the
reliability assessment. A portion of this work was based on a doctoral dissertation submitted to the faculty at Teachers College, Columbia University. James E. Corter directed that
Correspondence: Dr. Ann A. O'Connell, Educational Psychology and Research, College
of Education, Room 100, University of Memphis, Memphis, TN 38152, USA. Tel.: (901)
678-3936. Fax: (901) 678-5114. E-mail:
Manuscript submitted: May, 1997
Accepted for publication: February, 1998

Diagnostic teaching involves a qualitative analysis of students' errors followed by adaptive instruction to eliminate these errors. Such is the nature
of teaching and learning: instruction is motivated in part by attempts to
correct flawed student knowledge. However, accurately describing a student's current knowledge base can be extremely difficult, particularly in
complex domains. Consider the case of probability problem-solving, which
requires a combination of procedural, conceptual, and real-world knowledge. Extensive research has documented the existence of cognitive biases
in people's reasoning about probability and probabilistic events (Fischbein & Schnarch, 1997; Konold, 1989; Konold, Pollatsek, Well, Lohmeier, & Lipson, 1993; Tversky & Kahneman, 1974, 1983), as well as some of
the conceptual difficulties students have when learning elementary probability (Garfield & Ahlgren, 1988; Hansen, McCann, & Myers, 1985;
Shaughnessy, 1992). Consequently, there appears to be no easy approach
to assessing students' misconceptions and encouraging them to apply probabilistic principles appropriately.
Problems involving probability or probabilistic reasoning, such as those
typically encountered by college students in education, psychology, biology, and business, clearly demand an appreciation for probability concepts
and principles (Derry, Levin, & Schauble, 1995; Hansen et al., 1985;
Hong & O'Neil, 1992; Shaughnessy, 1992). Successful solution of these
kinds of problems also requires an understanding of the terminology and
procedures (i.e., equations, formulas, rules and their interrelationships)
generally used to represent these concepts. Studies in other disciplines
where formal procedures are needed during problem-solving, such as in
physics, algebra, or arithmetic, have often approached the analysis of problem-solving by investigating the relationship between problem presentation and the application of formulas, or through comprehensive analysis of
errors made during post-instruction solution attempts (Hiebert & Carpenter, 1992; Hinsley, Hayes, & Simon, 1977; Kintsch & Greeno, 1985; Larkin, McDermott, Simon, & Simon, 1980; Matz, 1982; VanLehn, 1982).
The focus of this article is on the description and analysis of errors made
during problem-solving in probability. The study seeks to clarify the nature of the relationships occurring among different kinds of errors in order
to provide some guidelines for improving instruction and learning in this domain.
Several studies in subject areas such as algebra (Matz, 1982; Sleeman,
1982), physics (Chi, Feltovich, & Glaser, 1981), and subtraction (Brown
& Burton, 1978; Brown & VanLehn, 1980; VanLehn, 1982) have shown
that different students often exhibit strikingly similar errors or misconceptions as they are learning a new skill. Similarities have also been found in
the domain of probability problem-solving, where many errors and misconceptions are repeatedly observed across students and problems
(O'Connell, 1993; O'Connell & Corter, 1993). The importance of such
studies is clear, since "research on students' errors makes it possible to
identify specific deficits in the way students' knowledge is connected so
that instruction can be designed to address the specific connections students lack or to point out why certain connections are inappropriate" (Hiebert & Carpenter, 1992, p. 89). We know that students have difficulty
with problems involving probability; one way to help them overcome
these difficulties is by better understanding the nature of errors they are
making when solving typical problems. Such information is important for
diagnosis of a student's flawed understanding, as well as for developing
adaptive individualized instruction.
Research has shown that most of the information about common errors
occurring within a domain can be obtained through the use of verbal or
written protocols of student work, or through the use of diagnostic tests
designed to elicit predicted errors (Brown & Burton, 1978; Ginsburg,
Kossan, Schwartz, & Swanson, 1983). However, identifying specific errors is only part of the process in working towards adaptive instruction for
improved learning. Establishing the role of relationships among these observable errors is also critical for designing appropriate and successful
instructional strategies.
Just as children sometimes learn only partial procedures when being
taught skills such as subtraction (Brown & Burton, 1978; Brown & VanLehn, 1980; VanLehn, 1990), many college students seem to grasp only a
partial understanding of fundamental concepts and procedures in probability. VanLehn's (1990) study of subtraction concerned children's acquisition of procedural skills and the development of procedural bugs. Earlier, he defined a bug as a slight modification or perturbation of a correct
procedure (VanLehn, 1982). In terms of investigating procedural skill,
subtraction was a good domain for VanLehn to choose for his research in
part because, at the age at which subtraction is taught,
children generally have no preconceived notions about the processes of
subtraction. In contrast, studies have shown that children, as well as adults,
do develop conceptions about probability and chance prior to classroom
instruction (Fischbein & Schnarch, 1997; Kahneman, Slovic, & Tversky,
1982; Piaget & Inhelder, 1975; Tversky & Kahneman, 1974). The existence of faulty preconceptions or learned misconceptions regarding probability serves to make the study of the probability problem-solving process
much more interesting than that of a simple procedural skill.
The work presented here is a critical study of how students solve probability problems typically found in many graduate level introductory statistics textbooks aimed at students in social or behavioral sciences. The study
provides an investigation into the nature of the relationships among different types of text comprehension, procedural, conceptual and arithmetic
errors. This paper also describes the error analysis methodology used to
identify and code the errors present in post-instruction solutions to probability problems. The results of this approach offer important diagnostic
information for other teachers of probability and statistics at the secondary
school or college level. Once relationships among specific types of errors
are identified, instruction can then be adapted to directly address and
remedy these inaccuracies.


Subjects and Task
Fifty graduate students of education and psychology, who were enrolled in
a one-semester course in probability and statistics at a large urban university in New York City, served as subjects for this study. These subjects were
part of a larger study on probability problem-solving. At the completion of
the probability section of the course, the students were assigned twelve
probability problems, typical of those found in introductory texts on probability and statistics. The solutions to these problems were collected one
week after distribution. Students were asked to show all of their work
when solving each problem.
The collection of problems assigned covered topics including equally
likely, non-equally likely, mutually exclusive, independent and conditional events, and assessing their associated probabilities. Examples of two of
the twelve problems can be found below during the discussion of the
coding scheme. Several of the problems, such as the second example
below, contained sequential questions, amounting to a total of 50 individual items over all 12 problems. Each of the solutions obtained for these 50
items was inspected for errors, and the type of error made on each question
was coded according to the scheme described in the following paragraphs.
Development of Error Coding Scheme
The error coding scheme was developed through analysis of the written
work of 180 students solving 93 different probability problems. First,
broad categories corresponding to text comprehension errors (T), conceptual errors (C), procedural errors (P), and arithmetic errors (A) were created. Table 1 provides a brief description of the kinds of errors represented
by these categories. In general, errors arising from deficiencies in problem
understanding are classified as text comprehension errors. Conceptual errors refer to observed difficulties with probability concepts or with working within probabilistic systems. Procedural errors are those arising from
faulty application of formulas or rules. The arithmetic errors category is
used to identify calculation mistakes.


Table 1. Error Categories.

Text Comprehension (T)
    General misunderstanding of the information contained in the
    text of a problem, such as assigning a probability value to the
    wrong event, incorrectly identifying the goal of a problem,
    misinterpreting statements involving inequalities, etc.

Conceptual (C)
    Errors involving basic concepts or definitions of probability,
    such as reporting a negative probability value or a probability
    greater than 1.0, assuming events are equally likely without
    appropriate justification, applying the algebra of real numbers
    to sets, equating frequency with probability, misunderstandings
    of independence, mutually exclusive events, or complementary events, etc.

Procedural (P)
    Faulty procedures, such as: forgetting outcomes when defining
    a sample space, not checking preconditions before applying a
    formula (e.g., for mutually exclusive or independent events), using an
    incorrect version of a formula, forgetting values or substituting
    incorrect values into a formula, inventing incorrect procedures,
    using inappropriate strategies, or not completing a strategy, etc.

Arithmetic (A)
    Errors involving simple miscalculations, copy mistakes such as
    transposing digits, incorrect cancellation of terms
    from numerator and denominator of an expression, etc.

In order to systematically track the different kinds of errors occurring
within these four general categories, the written protocols for the 93 problems were inspected for errors and a code was assigned to identify each
different error observed. This process resulted in the identification of 110
specific errors: 9 specific text comprehension errors, 18 specific conceptual errors, 71 specific procedural errors, and 12 specific arithmetic errors. A fifth category (X) was used to identify an error which
could not be clearly determined.
Finally, the assignment of codes to the identified errors was arranged
hierarchically to facilitate classification of similar specific errors. For example, four specific conceptual errors involving mutually exclusive events
were identified during the error analysis (C8.1, C8.2, C8.3, C8.4). These
four errors belong to a higher level category characterizing one type of
conceptual error (C8: misconceptions involving mutually exclusive events).
Examples of this hierarchical assignment of error codes are given in Table 2.

Table 2. Examples of Type and Specific Error Codes Used for the Error Analysis.

Type C8: Misconceptions involving mutually exclusive events.
Specific errors:
    Incorrect definition of mutually exclusive events, or the inability to distinguish between mutually exclusive vs. non-mutually
    exclusive events.
    Believing that a single event can be mutually exclusive or not
    mutually exclusive.
    Claiming events are mutually exclusive when the intersection
    of these events is either provided or observable from data in a
    Not recognizing that the intersection of mutually exclusive events
    is null, so that the probability of their intersection is zero.

Type P5: Procedural errors involving mutually exclusive events or formulas for mutually exclusive events.
Specific errors:
    Incorrect formulas: "and" (or intersection) of events implies addition of probabilities.
    Determining the probability of the union of two events by summing the probabilities of the simple events, without verifying if
    the simple events are mutually exclusive.

Overall, 30 higher-order levels, or types of errors, were determined
through this analysis: 8 types of text
comprehension errors, 11 types of conceptual errors, and 10 types of procedural errors, with all arithmetic errors considered as one type. The final
codes used and the names for each error type can be found in Table 3. A
more detailed summary of the specific errors observed and the coding
scheme is available from the author.
Table 3. Type and Frequency of Observed Errors in Probability Problem-Solving.

Type of Error    f(type)  f(total)  %(type)  %(total)

T1: Incorrect assignment of value given in the problem
T2: Incorrect specification of goal (equality)
T3: Choosing pairs instead of triples/singles, etc.
T4: Misinterpretations of inequalities
T5: Selection with vs. without replacement
T6: Real world knowledge errors
T7: Incorrect model of experiment described in problem
T8: Interference from another (previous) problem
Totals: Text Comprehension Errors

C1: Misconceptions: probability/sample space/n(S)
C2: Misconceptions: frequency vs. probability
C3: p>1.0
C4: p<0
C5: P(S) ≠ 1.0
C6: formal language of probability
C7: Misconceptions: equally likely events
C8: Misconceptions: mutually exclusive events
C9: Misconceptions: independence
C10: Misconceptions: mutually exclusive vs. indep.
C11: Misconceptions: complementary events
Totals: Conceptual Errors

P1: Procedural errors in determining sample/event space
P2: Incomplete/unfinished
P3: General use of formulas
P4: Procedural errors involving independence
P5: Procedural errors involving mutual exclusiveness
P6: Procedural errors involving sequential experiments
P7: Procedural errors involving use of tabled data
P8: Procedural errors involving conditional probability
P9: Procedural errors involving complementary events
P10: Inventing incorrect procedures or rules
Totals: Procedural Errors

Totals: A: Arithmetic errors
Totals: X: Unclassified errors
Overall Totals

Note. n = 50 students.

Reliability of the Coding Scheme
Given the complexity of the error coding scheme, reliability was assessed
in two different ways. Using a subset of the work of 30 students (different
from the sample used for the current study), two independent judges were
asked to assess each instance of an error as either text comprehension (T),
procedural (P), conceptual (C), or arithmetic (A). Inter-rater assessments
produced the following percent agreements and Cohen's kappa estimates
for identification of errors from each of the four categories: T: 84%, kappa=0.61 (p<0.01); C: 93%, kappa=0.83 (p<0.01); P: 82%, kappa=0.82
(p<0.01); and A: 89%, kappa=0.68 (p<0.01).
In addition to reliability assessments for the four broad categories, inter-rater agreement was also used to investigate reliability of classification
of errors into 1 of the 30 types specified in Table 3. The author and a third
rater used the coding scheme to identify errors occurring in the work of 14
additional students solving four probability problems; 22 of the solutions
contained errors, and in this subset 84% agreement as to the type of error
made was reached by the two raters.
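The two agreement statistics reported for the broad categories can be illustrated with a short sketch. It assumes that each judge's decision is reduced, for one category at a time, to a yes/no judgment per error instance; the ratings below are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters give the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions.
    p_chance = sum((counts_a[k] / n) * (counts_b[k] / n) for k in labels)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical category codes assigned by two judges to ten error instances.
judge_1 = ["T", "T", "C", "P", "T", "P", "A", "C", "T", "P"]
judge_2 = ["T", "C", "C", "P", "T", "P", "A", "C", "P", "P"]

# Reduce to "text comprehension error: yes/no" for each judge.
is_t_1 = [code == "T" for code in judge_1]
is_t_2 = [code == "T" for code in judge_2]

agreement_t = percent_agreement(is_t_1, is_t_2)
kappa_t = cohens_kappa(is_t_1, is_t_2)
print(f"T: agreement={agreement_t:.0%}, kappa={kappa_t:.2f}")
```

Because kappa discounts the agreement expected by chance, it is typically lower than the raw percent agreement for the same ratings.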
Error Coding Scheme Guidelines
The present study consisted of coding the work of 50 students solving
twelve probability problems, using the above coding scheme. One difficulty with this type of qualitative analysis is that when a problem consists
of several different parts, an error near the beginning often affects the
solution to subsequent parts of the same problem. For consistency, the
following guidelines were adhered to during the error analysis:
(1) If a student made an arithmetic error in one part of the problem which
affected the solution to any of the remaining parts, the arithmetic error
was coded only once. If, however, the student made an error in text
comprehension or a procedural or conceptual error which affected the
correct solution to subsequent parts, the error was coded each time it
affected the solution. This approach is justified because such errors of
understanding carry over from problem to problem in a manner that
is vastly different from a simple calculation or arithmetic error.
(2) Often, one student's solution process to a single question or part of a
question contained several different errors. All of the observed errors
for a solution process were coded according to the coding scheme
given in Table 3.
(3) If a student attempted the problem in more than one way, and neither
attempt led to an accurate solution, only the first solution attempt
was coded. This guideline was followed since we were interested in
assessing the relationships among errors made during initial attempts
to solve a particular problem.
In order to investigate possible relationships among observed errors, a
frequency score corresponding to each error type was calculated for all 50
subjects in this study. Using this technique, the tendency for a student to
make a particular type of error could be readily discerned. Table 3 provides the type and frequency of observed errors for this sample of 50
subjects. Three of the 50 students made no errors on any of the 12 problems. Thus, the relationships among different kinds of errors were assessed
using the frequencies of error types for the remaining 47 students.
Hierarchical clustering was used to help identify a natural structure to the
set of text comprehension, conceptual, procedural and arithmetic errors.
The ADDTREE/P program (Corter, 1982), based on Sattath and Tversky's
(1977) algorithm for fitting additive trees, was used to investigate the
hierarchical cluster structure of the proximity matrix. The regular Pearson
correlation between all pairs of variables was used as the measure of proximity.
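As a rough illustration of how such a proximity matrix can be assembled, the sketch below computes Pearson correlations between per-student frequency scores for each error type. The frequencies are invented for illustration, the function names are hypothetical, and the optional `min_total` filter mirrors the frequency threshold of four mentioned in the Results; this is not the ADDTREE/P program itself, only the input it would require.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def proximity_matrix(freq_by_type, min_total=4):
    """Correlations among error types whose overall frequency is >= min_total."""
    kept = {t: f for t, f in freq_by_type.items() if sum(f) >= min_total}
    names = sorted(kept)
    matrix = [[pearson(kept[a], kept[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical frequency scores: one list per error type, one entry per student.
frequencies = {
    "C8": [2, 0, 1, 3, 0],
    "P5": [1, 0, 1, 2, 0],
    "C7": [0, 2, 1, 0, 3],
    "T8": [0, 0, 1, 0, 0],  # overall frequency below the threshold, so dropped
}

names, matrix = proximity_matrix(frequencies)
for name, row in zip(names, matrix):
    print(name, [round(r, 2) for r in row])
```

In this toy data, students who make C8 errors also tend to make P5 errors, so those two types correlate positively, while C7 errors occur in different students and correlate negatively with both.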
Additive trees provide a convenient visual representation of the relationships among a set of variables, in terms of their cluster hierarchy as
well as for interpretation of the unique and common features of items in
the additive tree. Traditional hierarchical clustering schemes measure cluster distances in terms of average inter-cluster distance or in terms of the
furthest or nearest neighbor algorithms. Additive trees, however, provide
a more accurate representation of the original distances among the items.
The use of an additive tree tends to preserve the original distance relations
among the items, instead of assuming these distances to be equal for all
items in two different clusters.
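The distinction drawn above can be made concrete with a small check. Distances that fit an additive tree satisfy the four-point condition (for any four items, the two largest of the three pairwise distance sums are equal), whereas traditional hierarchical clustering implies the stricter ultrametric condition (for any three items, the two largest distances are equal). The sketch below uses invented path-length distances among four hypothetical items and is not part of the ADDTREE/P program.

```python
from itertools import combinations


def symmetric(pair_lengths):
    """Expand {(i, j): d} into a nested dict usable as d[i][j] and d[j][i]."""
    d = {}
    for (i, j), value in pair_lengths.items():
        d.setdefault(i, {})[j] = value
        d.setdefault(j, {})[i] = value
    return d


def is_additive(d, items, tol=1e-9):
    """Four-point condition: the two largest pairwise sums are equal."""
    for i, j, k, l in combinations(items, 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


def is_ultrametric(d, items, tol=1e-9):
    """Ultrametric condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(items, 3):
        sides = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(sides[2] - sides[1]) > tol:
            return False
    return True


# Path lengths in a hypothetical tree with leaf arcs A=1, B=2, C=1, D=4
# and an internal arc of length 3 separating {A, B} from {C, D}.
leaves = ["A", "B", "C", "D"]
dist = symmetric({
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5,
})

print("additive:", is_additive(dist, leaves))
print("ultrametric:", is_ultrametric(dist, leaves))
```

These distances fit an additive tree exactly but are not ultrametric: items A and B lie at different distances from C, which is the kind of information an additive tree retains and an ultrametric hierarchy averages away.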
According to Sattath and Tversky (1977), each arc on an additive tree
"defines a cluster which consists of all the objects that follow from it.
Thus, each arc can be interpreted as the features shared by all the objects in the cluster and by them alone. The length of the arc can thus be
viewed as the weight of the respective features, or as a measure of the
distinctiveness of the respective cluster" (p. 330).
Therefore, the use of additive trees allows us to visually recognize the
strength of the relationships observed among the set of text comprehension, conceptual, procedural and arithmetic errors.
Example of the Coding Scheme
To illustrate the coding scheme used during this research, several written
protocols of students solving actual problems are now presented. First,
consider the following problem.
Example 1: There are eight students in a reading group. Three of the
students are classified as strong readers, three as average and two as
weak readers. A researcher wants to work with two randomly selected
students from this group. What is the probability that both of the students she selects are the same type of reader?
Fig. 1. Example of a solution to the reading group problem (Subject 43).

In the example shown in Figure 1, the student is assuming that all of the
events determined in this experiment are equally likely to occur. This
particular error is coded as a conceptual error (C7.2), indicating that the
student does not have a clear understanding of the antecedents for events
to be considered equally likely. Assuming that events are equally likely
without appropriate justification is the second most common error occurring in this group of 50 students (see Table 3). This assumption makes
many kinds of probability problems computationally easier to solve; therefore, students who have difficulty working with formulas or understanding
the process involved in random selection (of single outcomes or compound outcomes) may also feel comfortable relying on this assumption
simply to reduce the complexity of the solution process.
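For reference, the reading group problem can be worked by direct enumeration. The sketch below assumes the two students are selected without replacement and that every pair of students is equally likely to be chosen; it is a standard computation offered for illustration, not a reproduction of any student's written work.

```python
from fractions import Fraction
from itertools import combinations

# Eight students: three strong, three average, and two weak readers.
group = ["strong"] * 3 + ["average"] * 3 + ["weak"] * 2

# Every unordered pair of distinct students is one equally likely outcome.
pairs = list(combinations(range(len(group)), 2))
same_type = [(i, j) for i, j in pairs if group[i] == group[j]]

p_same = Fraction(len(same_type), len(pairs))

# Probability of each compound event "both students are of this type".
p_by_type = {
    kind: Fraction(sum(1 for i, j in same_type if group[i] == kind), len(pairs))
    for kind in ("strong", "average", "weak")
}

print("P(same type) =", p_same)
print(p_by_type)
```

The individual pairs are equally likely, but the three compound events are not: "both weak" arises from a single pair, whereas "both strong" and "both average" each arise from three. Treating the compound events as equally likely is the kind of unjustified assumption that code C7.2 records.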
The next example illustrates the use of the coding scheme in a more
complex probability problem.
Example 2. Assume that there are equal numbers of males and females
at a school. The probability is 1/5 that a male student and 1/20 that a
female student will be taking a science course. What is the probability
that (a) a randomly selected student will be a male science student, (b)
a randomly selected student will be a science student, (c) a student is a
science student given that she is female? (d) Are gender and science
registration independent?
Fig. 2. Example of a solution to the science and gender problem (Subject 32).

This particular problem consisted of four different questions; accordingly,
each question was inspected for errors. For the solution depicted in Figure
2, we see that the first difficulty encountered by this student is in text
comprehension. The phrase "the probability is 1/5 that a male student ...
will be taking a science course" was interpreted as a conjunction, P(male
and science), instead of as a conditional probability statement, P(science |
male). This miscomprehension, coupled with the student's reliance on the
"and means multiply" mal-rule (i.e., using P(A and B) = P(A) * P(B) without justifying whether or not independence holds), leads to erroneous solutions during many aspects of the problem. Perhaps with better training in
translation, the student may have found this problem easier to solve. The
errors coded for each of the four questions are as follows:
(a) T1: Incorrect assignment of the probabilities or numerical values given in the problem. In this example, the given conditional probability
was represented as a joint probability.
(b) P7.3: Ignoring the probabilities of simple events as provided in the
problem (i.e., P(M) = P(F) = .50), while using incorrect substitution of
conditional probability as a joint probability to complete the cells of a
table. The student read the value correctly but from an erroneously
constructed table. Note the probabilities of Male and Female given at
the bottom of the table.
(c) P7.2: Incorrect determination of intersection of events or conditional
probability when reading data from a table. Here, the student used the
joint probability as presented in her table to represent P(S | F). Although the answer to this problem, when done correctly, is .05, this
answer does not result from the student's constructed table.
(d) Several errors are identified in the final question. First, the student is
again using the joint probability for Male and Science as .20, which
should have been interpreted as a conditional probability (T1). The
student then assumes that these two events, "male" and "science," are
independent in order to solve for the probability of male (P4.1).
Finally, the probability used for science is again taken from the
erroneously constructed table (P7.3). Due to the student's assumption
of independence of events, her solution to this question would of course
lead her to erroneously conclude that the two events are, in fact, independent.
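For comparison with the errors coded above, the four parts of the science and gender problem can be worked directly from the values given in the problem, reading 1/5 and 1/20 as conditional probabilities. The sketch below is a standard computation offered for illustration; the variable names are hypothetical.

```python
from fractions import Fraction

# Values given in the problem statement.
p_male = Fraction(1, 2)
p_female = Fraction(1, 2)
p_sci_given_male = Fraction(1, 5)
p_sci_given_female = Fraction(1, 20)

# (a) Joint probability: P(M and S) = P(M) * P(S | M).
p_male_and_sci = p_male * p_sci_given_male

# (b) Total probability: P(S) = P(M and S) + P(F and S).
p_female_and_sci = p_female * p_sci_given_female
p_sci = p_male_and_sci + p_female_and_sci

# (c) P(S | F) is given directly in the problem.
answer_c = p_sci_given_female

# (d) Independence holds only if P(M and S) equals P(M) * P(S).
independent = p_male_and_sci == p_male * p_sci

print("(a)", p_male_and_sci, "(b)", p_sci, "(c)", answer_c, "(d)", independent)
```

Part (d) shows why the "and means multiply" rule cannot be assumed here: P(M and S) is 1/10, while P(M) * P(S) is 1/16, so gender and science registration are not independent.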

Error Analysis
Table 3 provides a description of each error type, and the frequency with
which each of the variables (error types) was observed for this sample of
students. Arithmetic errors were combined as one type of error.
As can be seen from Table 3, the most common errors overall were
procedural in nature, followed by errors in text comprehension. It should
be noted that procedural difficulties are often preceded by difficulties in
text comprehension, as seen in the second example above. To investigate
the nature of this and other relationships among the variables, the data
were submitted to two clustering analyses with results as presented below.
Hierarchical Clustering using Additive Trees
Relationships among the types of errors found in this sample of 47 students were assessed through hierarchical clustering using additive trees.
Two separate analyses were conducted. Relationships among the conceptual and procedural errors were investigated first, followed by an assessment of relationships among errors in all four categories. Correlations
among the variables (error types) served as the measure of proximity for
the cluster analyses. For stability of solution, the additive trees were fit for
the set of error types with a frequency greater than or equal to four for each
Conceptual/Procedural Relationships
The data for this analysis consisted of the correlations between the 10
types of procedural errors and 6 types of conceptual errors which had an
overall frequency greater than or equal to four. The cluster analysis revealed a correlation of 0.84 between actual and estimated proximities for these data,
accounting for nearly 70% of the variance (R² = 0.6988). The
additive tree is presented in Figure 3, and three main clusters of items
identified through the analysis are indicated by number on the tree.

Fig. 3. Additive tree for conceptual and procedural error types.

Interpretation of the three items forming the first cluster suggests that
conceptual difficulties in working with the formal language of probability
(which includes difficulty working with the algebra of sets versus the algebra of real numbers) are related to misconceptions and procedures involving
mutually exclusive events. Due to the long arc emanating from this cluster,
this is also one of the more prominent relationships observed. One explanation for this prominence is the tendency of interpreting the word "and" as
implying addition, which may lead to application of the addition rule for
determining the union of two mutually exclusive events. This means that the
student would apply the rule P(A or B) = P(A) + P(B), when the task actually involves finding the intersection of these events, that is, P(A and B).
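A small numerical illustration may help show why this substitution fails. The sketch below uses a single roll of a fair die, an example chosen here for illustration and not drawn from the problem set; the event names are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # outcomes of one roll of a fair die


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


even = frozenset({2, 4, 6})
high = frozenset({5, 6})
low = frozenset({1, 2})

# "even" and "high" overlap (both contain 6), so they are not mutually exclusive.
p_and = prob(even & high)          # intersection
p_or = prob(even | high)           # union
p_sum = prob(even) + prob(high)    # what the "and implies addition" rule yields

# "low" and "high" share no outcomes, so they are mutually exclusive.
p_or_exclusive = prob(low | high)
p_sum_exclusive = prob(low) + prob(high)

print(p_and, p_or, p_sum)
print(p_or_exclusive, p_sum_exclusive)
```

For the overlapping events, the sum 5/6 matches neither the intersection (1/6) nor the union (2/3). The addition rule gives the union only when the events are mutually exclusive, and it never gives the intersection of events that can occur together.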
The second cluster identified on the additive tree includes several combinations of conceptual and procedural errors. Taken together, the eight
items in this cluster seem to indicate a very general relationship between
misconceptions of independence, conceptual difficulty in distinguishing
between independent and mutually exclusive events, and procedural difficulty in solving probability problems which require some knowledge of
independent versus non-independent events, such as conditional probability, sequential selection with or without replacement, and working with
data in table form.
The third cluster on the tree can be identified as difficulty in working
with formulas in general. The errors in this cluster include unfinished
solution attempts, inventing procedures or rules to fit one's understanding of a problem, and difficulty working with formulas for complementary
events. Formulas involving complementary events are often confusing for
students, particularly if determining the complement of an event is required as a first step towards solution.
One item which appears to stand alone in relation to the other clusters is
the concept of equally likely events. This item, then, is relatively unique,
although it is placed closest in the tree to the cluster of items indicating
difficulty working with formulas in general. The assumption that events
are equally likely, whether justified or not, makes the computation involved in many probability problems easier. Therefore, students who have
difficulty understanding and working with formulas may also tend to feel
more comfortable relying on this assumption simply to reduce the complexity of the solution process.




Relationships Among All Errors

To shed more light on the relationship between conceptual and procedural
errors, a second additive tree was fit to the correlation matrix for error
types in all four categories. The data for this analysis consisted of correlations among the 5 text comprehension error types, the 10 procedural and 6 conceptual error types, and 1 class representing errors in arithmetic. The
resulting additive tree is shown in Figure 4. Six clusters were identified
and are labeled in the figure. The correlation between the observed and
estimated proximities is .79, accounting for 62% of the variance (R² = .6169). Useful information is obtained from this analysis about the factors
which may influence a student's tendency to exhibit certain types of conceptual and procedural errors.
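The fit statistic reported above is the squared correlation between observed and tree-estimated proximities. The sketch below shows how such a value could be computed; the `pearson_r` helper and the proximity values are hypothetical and stand in for the study's data, which are not reproduced here. The final lines only check that the two reported figures agree with each other.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical proximities for illustration only (not the study's data).
observed = [0.10, 0.35, 0.40, 0.55, 0.80]
estimated = [0.20, 0.30, 0.50, 0.45, 0.75]

r = pearson_r(observed, estimated)
print(f"r = {r:.2f}, proportion of variance = {r * r:.2f}")

# Consistency check on the reported values: R^2 = .6169 implies r of about .79.
reported_r = round(sqrt(0.6169), 2)
print(reported_r)  # 0.79
```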

Fig. 4. Additive tree for text comprehension, conceptual, procedural, and arithmetic error


The first cluster consists of two items, combining unfinished solution attempts with misconceptions concerning the validity of a probability value greater than one. This combination of errors indicates that those students who work a solution through to completion and arrive at a probability value greater than one are often unable to recognize that their solution
attempt has proceeded inaccurately. Perhaps this is why they often get
stuck on other problems, and leave their attempts unfinished. This cluster
of items suggests that appropriate strategies are as necessary for successful performance in probability problem-solving as knowledge of concepts
and specific procedures. While this may seem intuitively obvious, we see
evidence here of a flawed schema or model (missing or incomplete) for
solving problems in this domain.
Interpretation of other clusters on the tree supports this notion of strategic difficulties being a major hindrance to students. In particular, the second cluster on the tree, consisting of five items, indicates that accurate
identification of the goal of a problem, as well as the ability to work with
inequalities, are frequent difficulties. These two errors are combined with
misconceptions about the formal language of probability and mutually
exclusive events, and procedural errors concerning mutually exclusive
events. It seems that those students who exhibit a poor assessment of the
quantity requested in a probability problem also tend to misunderstand the
concept and use of formulas involving mutually exclusive events. One
reason for this might be that the formula for the union of mutually exclusive events is extremely simple: P(A or B) = P(A) + P(B). When in
doubt about what is being asked for, addition of the quantities presented in the
text may seem like a reasonable approach to the student.
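The following Python sketch contrasts simple addition with the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), which reduces to the mutually exclusive formula only when P(A and B) = 0. The card-drawing example and the function name `prob_union` are hypothetical and were not part of the study; the range check is included as one possible way to flag the "probability greater than one" error discussed earlier.

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_a_and_b=Fraction(0)):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    With p_a_and_b = 0 (mutually exclusive events) this reduces to
    P(A) + P(B).
    """
    result = p_a + p_b - p_a_and_b
    if not 0 <= result <= 1:
        # A value outside [0, 1] signals that the rule was misapplied.
        raise ValueError("result is not a valid probability")
    return result

# Hypothetical example: draw one card from a standard 52-card deck.
p_heart = Fraction(13, 52)   # A: the card is a heart
p_face = Fraction(12, 52)    # B: the card is a jack, queen, or king
p_both = Fraction(3, 52)     # A and B: the three heart face cards

naive = p_heart + p_face                        # 25/52, counts 3 cards twice
correct = prob_union(p_heart, p_face, p_both)   # 11/26
print(naive, correct)
```

Adding the stated quantities gives an answer even when the events overlap, which is why it can look like a reasonable fallback to a student who is unsure of the goal.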
The third cluster identified on the additive tree combines one text comprehension error and three procedural errors. Those students who have
difficulty correctly representing a situation as described in the text of a
problem also have a tendency to exhibit procedural errors involving independence (i.e., applying the formula for the intersection of two independent events without verifying that the events are indeed independent); have
difficulty working with tabled data, particularly in representing the given
information in table form; and have difficulty working with sequential
experiments, especially in setting up tree diagrams. Assisting students in
the correct interpretation of text information and in designing a representation of this information may help to alleviate many of these procedural errors.
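The verification step that students tend to skip (checking independence before multiplying) can be sketched from tabled data as follows. The two 2x2 tables of counts and the function name `check_independence` are hypothetical and serve only to illustrate the check.

```python
from fractions import Fraction

def check_independence(counts):
    """Compare P(A and B) with P(A) * P(B) for a 2x2 table of counts.

    `counts` maps (row label, column label) pairs to frequencies.
    Returns (P(A and B), P(A) * P(B), whether the two are equal).
    """
    total = sum(counts.values())
    p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
    p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
    p_ab = Fraction(counts[("A", "B")], total)
    return p_ab, p_a * p_b, p_ab == p_a * p_b

# Hypothetical table in which A and B are NOT independent.
dependent = {("A", "B"): 10, ("A", "not B"): 30,
             ("not A", "B"): 20, ("not A", "not B"): 40}

# Hypothetical table in which A and B ARE independent.
independent = {("A", "B"): 12, ("A", "not B"): 28,
               ("not A", "B"): 18, ("not A", "not B"): 42}

print(check_independence(dependent))    # joint 1/10, product 3/25, False
print(check_independence(independent))  # joint 3/25, product 3/25, True
```

In the first table, multiplying the marginal probabilities would give 3/25 instead of the correct joint probability 1/10.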
Cluster four contains 3 items, and indicates that interference with information given in a previous problem may be a factor in the tendency to
invent procedures or rules, as well as in the tendency to assume that events
are equally likely. This clustering suggests that the wording of traditional
probability problems, as well as their contextual placement in a set of
problems, may be confusing to some students. Again, assuming that events
are equally likely may be the easiest strategy for a student to rely on in
order to reach a solution in a difficult situation.
Difficulty in assigning a given probability value to the correct event as
given in the text forms a cluster (cluster 5) with two procedural errors:
those involving conditional probability and errors involving complementary probability. This is not surprising, since many students tend to interpret sentences such as "the probability is 1/20 that a female student will be
taking a science course" as a conjunction (i.e., P(F and Science) = 1/20)
instead of a conditional probability (P(Science | F) = 1/20). Similarly, if the
text of a problem supplies complementary probabilities, such as the probability of an elevator not working, misinterpretation of the given information is likely to occur. These three errors are also associated with difficulties understanding the concept of independence, and procedural errors in
determining an appropriate event or sample space. Again we see that textual difficulties are associated with specific conceptual and procedural errors.
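The consequence of reading the 1/20 as a conjunction instead of a conditional probability can be shown numerically. In the sketch below, the value P(F) = 2/5 is an assumption introduced purely for illustration (the example sentence does not supply it), and both function names are hypothetical.

```python
from fractions import Fraction

def joint_from_conditional(p_b_given_a, p_a):
    """Multiplication rule: P(A and B) = P(B | A) * P(A)."""
    return p_b_given_a * p_a

def conditional_from_joint(p_a_and_b, p_a):
    """Definition of conditional probability: P(B | A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a

p_f = Fraction(2, 5)       # assumed P(F); not given in the example sentence
stated = Fraction(1, 20)   # the value quoted in the problem text

# Conditional reading: P(Science | F) = 1/20.
joint_if_conditional = joint_from_conditional(stated, p_f)   # 1/50

# Conjunction reading: P(F and Science) = 1/20.
conditional_if_joint = conditional_from_joint(stated, p_f)   # 1/8

print(joint_if_conditional, conditional_if_joint)
```

Under this assumed value of P(F), the two readings imply different joint probabilities (1/50 versus 1/20), so any later step that uses the value inherits the text comprehension error.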
The last cluster identified on the tree contains three errors (cluster 6). In
this cluster, arithmetic errors are combined with procedural errors in the
general use of formulas and difficulty distinguishing conceptually between independent and mutually exclusive events. Poor arithmetic skill is
only one reason why students are often unsuccessful at probability problem-solving, yet improving this skill may also help people understand the
conceptual underpinnings of formulas used most often in this domain.
Summary of the Analyses
Overall, results of the first analysis suggest several relationships among
conceptual and procedural errors: associations between conceptual and
procedural errors regarding mutually exclusive events; associations between conceptual and procedural errors regarding independent events and
related formulas; and procedural difficulty when working with formulas in general.
However, more useful information pertinent to diagnosing a student's
difficulties during probability problem-solving is obtained from the cluster analysis for error types in all four categories: text comprehension,
conceptual, procedural, and arithmetic errors. The clusters on the additive
tree derived from the set of errors for this sample of students indicate how
text comprehension difficulties are associated with conceptual and procedural errors during problem-solving. These results suggest that those students committing errors are often experiencing difficulty in strategy and
planning of a solution appropriate to the problem being asked.

Studies of errors and misconceptions have enormous potential for the
improvement of teaching and learning (Hiebert & Carpenter, 1992; Wittrock, 1991; Langley, Wogulis, & Ohlsson, 1990). Instruction should be
flexible and guided by an accurate assessment of the student's understanding of the subject. As a preliminary step towards diagnosis and remediation of student difficulties in probability problem-solving, this study has
revealed the nature of several relationships among conceptual, procedural,
arithmetic, and text comprehension errors.
From an educator's perspective, a student's understanding of probability problem-solving is recognized by the ability to successfully work within the formal system of concepts and procedures which define this domain.
Diagnosis of student difficulties in this area is a complicated task, as many
misconceptions are related to each other and attempts to remediate a single
misconception may not result in an improved ability to solve different, or
even similar, types of problems. However, several specific pedagogical
strategies are suggested based on the results of this study.
In particular, it was shown that poor arithmetic skills are related to
difficulties in working with formulas in general. Prerequisite arithmetic
skills and the ability to understand information presented in words, as well
as in symbols, are crucial to students' development of appropriate cognitive models for probability problem-solving. One suggestion for improved
instruction, then, is to encourage a prerequisite course in arithmetic and
basic algebra before students enroll in a first course in probability and
statistics. This is especially pertinent for graduate students who may not
have had a math course for quite some time. A refresher course in basic
mathematics concepts may also help to alleviate the difficulty which some
students have in working with inequalities and relational expressions.
Due to the high proportion of errors attributed to text comprehension
difficulties (23%), particularly regarding translation of probabilities given
in the text of a problem and the identification of the goal, students should
be given practice at reading and interpreting word problems in probability.
Students need the ability to relate natural language to the language of
probability. Since many probability problems require understanding of
relational operators such as "less than," "at least," etc., students should
also be given practice in representing these phrases in set notation. This
should be done in conjunction with instruction in procedural methods in
probability problem-solving.
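One way to practice translating relational phrases into events is to enumerate a small sample space and express each phrase as a condition on the outcomes. The sketch below is a hypothetical classroom-style illustration, not material from the study: it assumes three tosses of a fair coin (so the eight outcomes really are equally likely) and lets X be the number of heads.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of 3 tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

def prob(condition):
    """P({outcomes whose head count X satisfies the condition}).

    Valid only because the coin is assumed fair, which makes the
    8 outcomes equally likely.
    """
    favorable = [o for o in outcomes if condition(o.count("H"))]
    return Fraction(len(favorable), len(outcomes))

# Each natural-language phrase maps to a relation on X.
phrases = {
    "at least 2 heads": lambda x: x >= 2,    # {X >= 2}
    "more than 2 heads": lambda x: x > 2,    # {X > 2}
    "at most 2 heads": lambda x: x <= 2,     # {X <= 2}
    "less than 2 heads": lambda x: x < 2,    # {X < 2}
}

for phrase, condition in phrases.items():
    print(phrase, prob(condition))
```

Listing "at least" next to "more than" makes visible that the two phrases select different sets of outcomes (probability 1/2 versus 1/8 here).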
Given the structure of the relationships identified in this research, instruction in probability problem-solving should address ability in three
areas concurrently: (1) text comprehension; (2) an understanding of basic
concepts, including set notation and the formal system used to express
probability concepts; and (3) the application and manipulation of specific
formulas. Instruction should proceed with knowledge of the relationships
among text, conceptual and procedural errors; knowledge of which types
of errors are frequently observed in student work; and knowledge of the
conceptual determinants of common as well as uncommon procedural
errors, such as interpreting the word "and" as implying addition. Procedural knowledge should be integrated with conceptual knowledge and the
ability to accurately discern information contained in the text of a problem. The formulas typically taught in a first course in probability must be
taught with greater emphasis on why certain formulas are appropriate and
in what situations, as well as how to computationally execute these formulas.
Many of the relationships uncovered in this study are
consistent with those discussed in previous research about the nature of
problem solving and the relationships between different kinds of knowledge. For example, Riley, Greeno, and Heller (1983) describe three types
of knowledge necessary for successful problem solving: a problem schema, for understanding a word problem; an action schema, for relating the
representation of the problem to procedures; and strategic knowledge, for
planning a solution. These researchers state that "conceptual knowledge
can influence which actions get selected" (p. 188).
The findings presented here suggest that knowledge of what kinds of
errors are likely to occur at different points in the problem-solving process
might help instructors guide students towards development of a more efficient schema for solving probability problems. Probability problem-solving is a difficult task, both to teach and to learn successfully. Progress
towards accurate diagnosis of a student's difficulties should be aided
through the identification of relationships among errors reported here.
With an understanding of these relationships among text comprehension
errors and procedural and conceptual errors, appropriate and effective
remediation could then be designed.



Brown, J. S., & Burton, R. R. (1978). Diagnostic models for procedural bugs in basic
mathematical skills. Cognitive Science, 2, 155-192.
Brown, J. S., & VanLehn, K. (1980). Repair theory: A generative theory of bugs in
procedural skills. Cognitive Science, 4, 379-426.
Corter, J. E. (1982). ADDTREE/P: A PASCAL program for fitting additive trees based on
Sattath and Tversky's ADDTREE algorithm. Behavior Research Methods and Instrumentation, 14, 353-354.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of
physics problems by experts and novices. Cognitive Science, 5, 121-152.
Derry, S., Levin, J. R., & Schauble, L. (1995). Stimulating statistical thinking through
situated simulations. Teaching of Psychology, 22, 51-57.
Fischbein, E., & Schnarch, D. (1997). The evolution with age of probabilistic intuitively
based misconceptions. Journal for Research in Mathematics Education, 28, 96
Garfield, J., & Ahlgren, A. (1988). Difficulties in learning basic concepts in probability
and statistics: Implications for research. Journal for Research in Mathematics Education, 19, 44-63.
Ginsburg, H. P., Kossan, N. E., Schwartz, R., & Swanson, D. (1983). Protocol methods in
research on mathematical thinking. In H. P. Ginsburg (Ed.), The development of
mathematical thinking (pp. 7-47). New York: Academic Press.
Hansen, R. S., McCann, J., & Myers, J. L. (1985). Rote vs. conceptual emphases in teaching
elementary probability. Journal for Research in Mathematics Education, 16, 364
Hiebert, J., & Carpenter, T. P. (1992). Learning and teaching with understanding. In D. A.
Grouws (Ed.), Handbook of research on mathematics teaching and learning: A
project of the National Council of Teachers of Mathematics (pp. 65-97). New
York: Macmillan.
Hinsley, D., Hayes, J. R., & Simon, H. A. (1977). From words to equations: Meaning and
representation in algebra word problems. In P. A. Carpenter & M. A. Just (Eds.),
Cognitive processes in comprehension (pp. 89-106). Hillsdale, NJ: Lawrence
Hong, E., & O'Neil, Jr., H. F. (1992). Instructional strategies to help learners build relevant
mental models in inferential statistics. Journal of Educational Psychology, 84,
Kahneman, D., Slovic, P., & Tversky, A. (Eds.) (1982). Judgment under uncertainty:
Heuristics and biases. New York: Cambridge University Press.
Kintsch, W., & Greeno, J. G. (1985). Understanding and solving word arithmetic problems.
Psychological Review, 92, 109-129.
Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59
Konold, C., Pollatsek, A., Well, A., Lohmeier, J., & Lipson, A. (1993). Inconsistencies in
reasoning about probability. Journal for Research in Mathematics Education, 24,
Langley, P., Wogulis, J., & Ohlsson, S. (1990). Rules and principles in cognitive diagnosis.
In N. Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 217-249). Hillsdale, NJ: Lawrence
Larkin, J. G., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Models of competence
involving physics problems. Cognitive Science, 4, 317-345.
Matz, M. (1982). Towards a process model for high school algebra errors. In D. Sleeman &
J. S. Brown (Eds.), Intelligent tutoring systems (pp. 25-50). New York: Academic
O'Connell, A. A. (1993). A classification of student errors in probability problem-solving.
Unpublished doctoral dissertation, Teachers College, Columbia University, New
O'Connell, A. A., & Corter, J. E. (1993, April). Student misconceptions in probability
problem-solving. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, Georgia.
Piaget, J., & Inhelder, B. (1975). The origin of the idea of chance. New York: Norton.
(Original work published 1951).
Riley, M. S., Greeno, J. G., & Heller, J. I. (1983). Development of children's problem-solving ability in arithmetic. In H. P. Ginsburg (Ed.), The development of mathematical thinking (pp. 153-196). Orlando, FL: Academic Press.
Sattath, S., & Tversky, A. (1977). Additive similarity trees. Psychometrika, 42, 319-345.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A. Grouws (Ed.), Handbook of research on mathematics teaching and
learning: A project of the National Council of Teachers of Mathematics (pp. 465-494). New York: Macmillan.
Sleeman, D. H. (1982). Assessing aspects of competence in basic algebra. In D. Sleeman &
J. S. Brown (Eds.), Intelligent tutoring systems (pp. 185-199). New York: Academic Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.
Science, 185, 1124-1131.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293-315.
VanLehn, K. (1982). Bugs are not enough: Empirical studies of bugs, impasses and repairs
in procedural skills. Journal of Mathematical Behavior, 3, 3-72.
VanLehn, K. (1990). Mind bugs: The origins of procedural misconceptions. Cambridge,
MA: MIT Press.
Wittrock, M. C. (1991). Testing and recent research in cognition. In M. C. Wittrock & E. L.
Baker (Eds.), Testing and cognition (pp. 5-16). Englewood Cliffs, NJ: Prentice