
Statistics and Probability

Statistics and probability are branches of mathematics that deal with
data collection and analysis. Probability is the study of chance and is
a fundamental subject that we apply in everyday life, while
statistics is more concerned with how we handle data using different
analysis techniques and collection methods. The two subjects
go hand in hand, so it is difficult to study one without
studying the other.
Introduction to Statistics
Statistics is a branch of mathematics that deals with the collection,
analysis and interpretation of data. Data can be defined as groups of
information that represent the qualitative or quantitative attributes
of a variable or set of variables. In layman's terms, data in statistics
can be any set of information that describes a given entity. An
example of data can be the ages of the students in a given class.
When you collect those ages, that becomes your data.

A set in statistics is referred to as a population. Though this term is
commonly used to refer to the number of people in a given place, in
statistics, a population refers to any entire set from which you
collect data.
Data Collection Methods
As we have seen in the definition of statistics, data collection is a
fundamental aspect and as a consequence, there are different
methods of collecting data which when used on one particular set
will result in different kinds of data. Let's move on to look at these
individual methods of collection in order to better understand the
types of data that will result.
Census Data Collection
Census data collection is a method of collecting data whereby all the
data from each and every member of the population is collected.
For example, when you collect the ages of all the students in a given
class, you are using the census data collection method since you are
including all the members of the population (which is the class in
this case).

This method of data collection is very expensive (tedious,
time-consuming and costly) if the number of elements (the population
size) is very large. To appreciate how expensive it is, think of
trying to count all the ten-year-old boys in the country. That would
take a lot of time and resources, which you may not have.
Sample Data Collection
Sample data collection, which is commonly just referred to
as sampling, is a method which collects data from only a chosen
portion of the population.
Sampling assumes that the portion that is chosen to be sampled is a
good estimate of the entire population. Thus one can save resources
and time by only collecting data from a small part of the population.
But this raises the question of whether sampling is accurate or not.
For the most part, sampling gives a good approximation, but only if
you choose your sample carefully so that it closely reflects what the
true population consists of.
Sampling is used commonly in everyday life; an example is the
opinion polls that are conducted before elections. Pollsters
don't ask all the people in a given state who they'll vote for, but they
choose a small sample and assume that these people represent how
the entire population of the state is likely to vote. Well-designed
polls are often close to the actual result, which is why
sampling is a very powerful tool in statistics.
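The difference between a census and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is made-up random data, and the sample size of 500 is an arbitrary choice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
population = [random.randint(0, 90) for _ in range(10_000)]

# Census: use every member of the population.
census_mean = statistics.mean(population)

# Sample: use only 500 randomly chosen members.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The two means should come out close to each other, although the sample uses only a twentieth of the data.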
Experimental Data Collection
Experimental data collection involves performing an experiment
and then collecting the data to be further analyzed. Experiments
involve tests, and the results of these tests are your data.
An example of experimental data collection is rolling a die one
hundred times while recording the outcomes. Your data would be the
results you get in each roll. The experiment could involve rolling the
die in different ways and recording the results for each of those
different ways.
Experimental data collection is useful in testing theories and
different products and is a very fundamental aspect of mathematics
and all science as a whole.

Observational Data Collection

Observational data collection involves no experiment; instead, the
population is observed without being influenced at all.
Observational data collection is popular in studying trends and
behaviors of society where, for example, the lives of a group of
people are observed and data is collected on different aspects of
their lives.

As defined earlier, data can be any set of information that describes
a given entity. Data in statistics can be classified into
grouped data and ungrouped data.
Any data that you first gather is ungrouped data. Ungrouped data is
data in the raw. An example of ungrouped data is any list of
numbers that you can think of.
Grouped Data
Grouped data is data that has been organized into groups known as
classes. Grouped data has been 'classified' and thus some level of
data analysis has taken place, which means that the data is no
longer raw.
A data class is a group of data which is related by some user-defined
property. For example, if you were collecting the ages of the people
you met as you walked down the street, you could group them into
classes as those in their teens, twenties, thirties, forties and so on.
Each of those groups is called a class.
Each of those classes is of a certain width and this is referred to as
the Class Interval or Class Size. This class interval is very
important when it comes to drawing Histograms and Frequency
diagrams. All the classes may have the same class size or they may
have different class sizes depending on how you group your data.
The class interval is always a whole number.
Below is an example of grouped data where the classes have the
same class interval.



0 - 9


10 - 19


20 - 29


30 - 39


40 - 49

50 - 59

60 - 69

Below is an example of grouped data where the classes have
different class intervals.
Age (years)    Class Interval
0 - 9          10
10 - 19        10
20 - 29        10
30 - 49        20
50 - 79        30



Calculating Class Interval

Given a set of raw or ungrouped data, how would you group that
data into suitable classes that are easy to work with and at the same
time meaningful?
The first step is to determine how many classes you want to have.
Next, you subtract the lowest value in the data set from the highest
value in the data set and then you divide by the number of classes
that you want to have:
$\text{class interval} = \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$

Example 1:
Group the following raw data into ten classes.


The first step is to identify the highest and lowest numbers in the
data and carry out the calculation described above.

The class interval should always be a whole number, and yet in this
case we have a decimal number (2.8). The solution to this problem is
to round off to the nearest whole number.
In this example, 2.8 gets rounded up to 3. So now our class width
will be 3, meaning that we group the above data into classes of
width 3 as in the table below.


1 - 3

4 - 6

7 - 9

10 - 12

13 - 15

16 - 18

19 - 21

22 - 24

25 - 27

28 - 30
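The grouping procedure can be sketched in Python as follows. The data list is hypothetical (the tutorial's own data set is not reproduced here); it was chosen so that the lowest value is 1 and the highest is 29, which gives the same 2.8 that is rounded to 3 above.

```python
# Hypothetical raw data, chosen so that (highest - lowest) / 10 = 2.8.
data = [1, 4, 7, 9, 12, 13, 15, 18, 21, 22, 25, 27, 29]
num_classes = 10

# Class interval = (highest - lowest) / number of classes,
# rounded to the nearest whole number.
raw_width = (max(data) - min(data)) / num_classes
width = max(1, round(raw_width))

# Build inclusive class limits (lower, upper) starting at the lowest value.
classes = []
lower = min(data)
while lower <= max(data):
    classes.append((lower, lower + width - 1))
    lower += width

# Count how many observations fall in each class.
frequencies = {c: sum(c[0] <= x <= c[1] for x in data) for c in classes}
for (low, high), count in frequencies.items():
    print(f"{low:>2} - {high:<2} : {count}")
```

Note that Python's built-in round() sends exact halves to the nearest even number, so a width such as 2.5 would need an explicit rule if you always want to round halves up.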

Class Limits and Class Boundaries

Class limits refer to the actual values that you see in the table.
Taking an example of the table above, 1 and 3 would be the class
limits of the first class. Class limits are divided into two categories:
lower class limit and upper class limit. In the table above, for the
first class, 1 is the lower class limit while 3 is the upper class limit.
On the other hand, class boundaries are not always shown in the
frequency table. Class boundaries give the true class interval and,
similar to class limits, are also divided into lower and upper class
boundaries.
The relationship between the class boundaries and the class interval
is given as follows:
$\text{class interval} = \text{upper class boundary} - \text{lower class boundary}$
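Class boundaries are related to class limits by the following:
$\text{upper class boundary} = \dfrac{\text{upper class limit} + \text{lower class limit of the next class}}{2}$
For the table above, where consecutive class limits differ by 1, this
means subtracting 0.5 from each lower class limit and adding 0.5 to
each upper class limit, so the first class has boundaries 0.5 and 3.5.
As a result of the above, the lower class boundary of one class is
equal to the upper class boundary of the previous class.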

Class limits and class boundaries play separate roles when it comes
to representing statistical data diagrammatically, as we shall see
later.

Probability is the branch of mathematics that deals with the study
of chance. Probability deals with the study of experiments and their
outcomes.
Probability Key Terms
An experiment in probability is a test to see what will
happen in case you do something. A simple example is
flipping a coin. When you flip a coin, you are performing
an experiment to see what side of the coin you'll end up
with.
An outcome in probability refers to a single (one) result of
an experiment. In the example of an experiment above,
one outcome would be heads and the other would be tails.
An event in probability is a set of one or more
outcomes of an experiment. Suppose you flip a coin
multiple times; an example of an event would be getting
a certain number of heads.
Sample Space
A sample space in probability is the set of all the
different possible outcomes of a given experiment. If you
flipped a coin once, the sample space S would be given
by:
$S = \{H, T\}$
If you flipped the coin multiple times, all the different
combinations of heads and tails would make up the sample
space. A sample space is also defined as a Universal Set for the
outcomes of a given experiment.

Notation of Probability
The probability that a certain event will happen when an experiment
is performed can in layman's terms be described as the chance that
something will happen.
The probability of an event E is denoted by
$P(E)$
Suppose that our experiment involves rolling a die. There are 6
possible outcomes in the sample space, as shown below:
$S = \{1, 2, 3, 4, 5, 6\}$
The size of the sample space is often denoted by N while the
number of outcomes in an event is denoted by n.
From the above, when all outcomes are equally likely, we can denote
the probability of an event as:
$P(E) = \dfrac{n}{N}$
For the sample space given above, if the event is 2, there is only one
2 in the sample space, thus n = 1 and N = 6.
Thus the probability of getting a 2 when you roll a die is given by
$P(2) = \dfrac{1}{6}$
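The n over N calculation can be sketched in Python. The function name probability is hypothetical, and the sketch assumes that every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n / N, assuming all outcomes are equally likely."""
    n = len(event & sample_space)  # outcomes in the event
    N = len(sample_space)          # size of the sample space
    return Fraction(n, N)

die = {1, 2, 3, 4, 5, 6}
print(probability({2}, die))        # 1/6
print(probability({2, 4, 6}, die))  # 1/2
```

Fractions are used instead of floats so that the results stay exact.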

Understanding the Magnitude of the Probability of an Event

The largest probability an event can have is one and the smallest is
zero. There are no negative probabilities and no probabilities greater
than one. Probabilities are real numbers ranging from zero
to one. The closer the probability is to one, the more likely the
event is to occur, while the closer the probability is to zero, the
less likely the event is to occur.

When an event has a probability of one, we say that the event must
happen, and when the probability is zero we say that the event is
impossible.

The probabilities of all the outcomes in a sample space add
up to one.
Events with the same probability have the same likelihood of
occurring. For example, when you flip a fair coin, you are just as
likely to get a head as a tail. This is because these two outcomes
have the same probability, i.e.
$P(H) = P(T) = \dfrac{1}{2}$

Concepts in Probability
The study of probability mostly deals with combining different
events and studying these events alongside each other. How these
different events relate to each other determines the methods and
rules to follow when we're studying their probabilities.
Events can be divided into two major categories: dependent and
independent events.
Independent Events

When two events are said to be independent of each other, what
this means is that the probability that one event occurs in no way
affects the probability of the other event occurring. An example of
two independent events is as follows: say you rolled a die and
flipped a coin. The probability of getting any number face on the die
in no way influences the probability of getting a head or a tail on the
coin.
Dependent Events
When two events are said to be dependent, the probability of one
event occurring influences the likelihood of the other event.
For example, suppose you were to draw two cards from a deck of 52
cards. If on your first draw you had an ace and you put that aside,
the probability of drawing an ace on the second draw is
changed because you drew an ace the first time. Let's calculate
these different probabilities to see what's going on.
There are 4 aces in a deck of 52 cards.

On your first draw, the probability of getting an ace is given by:
$P(\text{ace on first draw}) = \dfrac{4}{52} = \dfrac{1}{13}$
If we don't return this card to the deck, the probability of drawing
an ace on the second pick is given by
$P(\text{ace on second draw} \mid \text{ace on first draw}) = \dfrac{3}{51} = \dfrac{1}{17}$
As you can clearly see, the above two probabilities are different, so
we say that the two events are dependent. The likelihood of the
second event depends on what happens in the first event.
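The card example can be checked by brute force. The sketch below lists every ordered pair of two different cards and counts the cases; the deck representation (rank and suit tuples) is just one convenient assumption.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for rank in ranks for suit in "SHDC"]

# Every ordered way of drawing two different cards without replacement.
draws = list(permutations(deck, 2))

first_ace = [d for d in draws if d[0][0] == "A"]
both_aces = [d for d in first_ace if d[1][0] == "A"]

p_first_ace = Fraction(len(first_ace), len(draws))
p_second_given_first = Fraction(len(both_aces), len(first_ace))

print(p_first_ace)           # 1/13
print(p_second_given_first)  # 1/17
```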
Conditional Probability
We have already defined dependent and independent events and
seen how probability of one event relates to the probability of the
other event.
Having those concepts in mind, we can now look at conditional
probability.
Conditional probability deals with further defining the dependence of
events by looking at the probability of an event given that some other
event occurs first.
Conditional probability is denoted by the following:
$P(B \mid A)$
The above is read as the probability that B occurs given that A
has already occurred.
The above is mathematically defined as:
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}, \quad P(A) > 0$

Set Theory in Probability

A sample space is defined as a universal set of all possible outcomes
from a given experiment.
Suppose two events A and B are part of a
sample space S. This sample space is represented as a set as in the
diagram below.

The entire sample space of S is given by:

Remember the following from set theory:

The different regions of the set S can be explained using the rules
of probability.
Rules of Probability
When dealing with more than one event, there are certain rules that
we must follow when studying probability of these events. These
rules depend greatly on whether the events we are looking at are
Independent or dependent on each other.
First acknowledge that

Multiplication Rule (A∩B)

This region is referred to as 'A intersection B', and in probability this
region refers to the event that both A and B happen. When we use
the word and we are referring to multiplication, thus A and B can
be thought of as A×B or (using dot notation, which is more popular in
probability) A·B.
If A and B are dependent events, the probability of this event
happening can be calculated as shown below:
$P(A \cap B) = P(A) \cdot P(B \mid A)$
If A and B are independent events, the probability of this event
happening can be calculated as shown below:
$P(A \cap B) = P(A) \cdot P(B)$
Conditional probability for two independent events can be redefined
using the relationship above to become:
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{P(A) \cdot P(B)}{P(A)} = P(B)$
The above is consistent with the definition of independent events:
the occurrence of event A in no way influences the occurrence of
event B, and so the probability that event B occurs given that
event A has occurred is the same as the probability of event B.
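As a small check of the multiplication rule for independent events, the sketch below enumerates the die and coin experiment mentioned earlier. The events A (the die shows 2) and B (the coin shows heads) are illustrative choices, and the helper prob is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (die face, coin side) pair, all equally likely.
sample_space = set(product(range(1, 7), "HT"))

def prob(event):
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == 2}    # the die shows 2
B = {o for o in sample_space if o[1] == "H"}  # the coin shows heads

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)
p_b_given_a = p_ab / p_a

print(p_ab, p_a * p_b)   # 1/12 1/12
print(p_b_given_a, p_b)  # 1/2 1/2
```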
Additive Rule (A∪B)
In probability we refer to the addition operator (+) as or. Thus when
we want to define some event such that the event can
be A or B, to find the probability of that event:

Thus it follows that:

But remember from set theory that and from the way we defined our
sample space above:

and that:

So we can now redefine our event as
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
The above is sometimes referred to as the subtraction rule.
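The rule can be verified by counting outcomes directly. In this sketch the events are illustrative assumptions: A is "the die shows an even number" and B is "the die shows a number greater than 3".

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}

def prob(event):
    return Fraction(len(event), len(die))

A = {x for x in die if x % 2 == 0}  # even faces: {2, 4, 6}
B = {x for x in die if x > 3}       # faces above 3: {4, 5, 6}

p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)

print(p_union_direct, p_union_rule)  # 2/3 2/3
```

Subtracting P(A ∩ B) removes the outcomes 4 and 6, which would otherwise be counted twice.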

Mutual Exclusivity
Certain special pairs of events have a unique relationship referred to
as mutual exclusivity.
Two events are said to be mutually exclusive if they can't occur at
the same time. For a given sample space, it's either one or the other
but not both. As a consequence, mutually exclusive events have
their probability defined as follows:
$P(A \cap B) = 0$
An example of mutually exclusive events is the pair of outcomes of a
fair coin flip. When you flip a fair coin, you either get a head or a
tail but not both, so $P(H \cap T) = 0$. Because these two events are
also the only possible outcomes, their probabilities add up to one:
$P(H) + P(T) = \dfrac{1}{2} + \dfrac{1}{2} = 1$
Note that a sum of one does not by itself make two events mutually
exclusive; the defining condition is that they cannot occur together.
Rules of Probability for Mutually Exclusive Events
Multiplication Rule
From the definition of mutually exclusive events, we
should quickly conclude the following:
$P(A \cap B) = 0$
Addition Rule
As we defined above, the addition rule applies to
mutually exclusive events as follows:
$P(A \cup B) = P(A) + P(B)$
Subtraction Rule
From the addition rule above, we can conclude that
the subtraction rule for mutually exclusive events
takes the form;

Conditional Probability for Mutually Exclusive Events

We have defined conditional probability with the following equation:
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
We can redefine the above using the multiplication rule
$P(A \cap B) = 0$, hence
$P(B \mid A) = \dfrac{0}{P(A)} = 0$
Below is a Venn diagram of a set containing two mutually exclusive
events A and B.

Probability Distributions - Random Variables

A random variable is defined as a function that associates a real
number with each outcome of an experiment.
In other words, a random variable lets us describe the
outcomes or events in a given sample space numerically. Since
the random variable can take on different values, we can use the
same variable to refer to different situations. Random variables
make working with probabilities much neater and easier.

A random variable in probability is most commonly denoted by
capital X, and the small letter x is then used to ascribe a value to
the random variable.
For example, given that you flip a coin twice, the sample space for
the possible outcomes is given by the following:
$S = \{HH, HT, TH, TT\}$
There are four possible outcomes as listed in the sample space
above, where H stands for heads and T stands for tails.
The random variable X can be given by the following:

To find the probability of one of those outcomes we denote that
question as:
$P(X = x)$
which means the probability that the random variable is equal
to some real number x.
In the above example, we can say:
Let X be a random variable defined as the number of heads
obtained when two coins are tossed. Find the probability that you
obtain two heads.
So now we've been told what X is and that x = 2, so we write the
above information as:
$P(X = 2)$
Since we already have the sample space, we know that there is only
one outcome with two heads, so we find the probability as:
$P(X = 2) = \dfrac{1}{4}$
we can also simply write the above as:

From this example, you should be able to see that the random
variable X assigns a value to each of the elements in a given sample
space.
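The two-flip example can be sketched in Python, with X written as an ordinary function that maps each outcome to its number of heads. The names X and pmf are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def X(outcome):
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

# P(X = x) for every value x that X can take.
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
print(pmf[2])        # 1/4
```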
There are two types of random variables: discrete random variables and
continuous random variables.
Discrete Random Variables
The word discrete means separate and individual. Thus discrete
random variables are those that take on separate, countable values,
typically integers; the values in between are not possible.
A quick example is the number of heads in any number of coin flips:
the value will always be an integer, and you'll never have half
a head or a quarter of a tail. Such a random variable is referred to as
discrete. Discrete random variables give rise to discrete probability
distributions.
Continuous Random Variable
Continuous is the opposite of discrete. Continuous random variables
are those that can take on any value within an interval, including
fractions and decimals.
Continuous random variables give rise to continuous probability
distributions.
Probability Distributions
A probability distribution is a mapping of all the possible values of a
random variable to their corresponding probabilities for a given
sample space.
The probability distribution is denoted as
$P(X = x)$
which can be written in short form as
$P(x)$
The probability distribution can also be referred to as a set of
ordered pairs of outcomes and their probabilities. This is known as
the probability function f(x).
This set of ordered pairs can be written as:
$(x, f(x))$
where the function is defined as:
$f(x) = P(X = x)$
Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) is defined as the
probability that a random variable X with a given probability
distribution f(x) will be found at a value less than or equal to x.
The cumulative distribution function is a cumulative sum of the
probabilities up to a given point.
The CDF is denoted by F(x) and is mathematically described as:
$F(x) = P(X \le x)$

Discrete Probability Distributions

Discrete random variables give rise to discrete probability
distributions. For example, the probability of obtaining a certain
number x when you toss a fair die is given by the probability
distribution table below.

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

For a discrete probability distribution, the set of ordered pairs
(x,f(x)), where x is each outcome in a given sample space and f(x) is
its probability, must satisfy the following:
$P(X = x) = f(x)$
$f(x) \ge 0$
$\sum_x f(x) = 1$
Cumulative Distribution Function for a Discrete Random Variable
For a discrete random variable, the CDF is given as follows:
$F(x) = P(X \le x) = \sum_{t \le x} f(t)$
In other words, to get the cumulative distribution function, you sum
up the probabilities of all the outcomes less than or
equal to the given value.
For example, given a random variable X which is defined as the face
that you obtain when you toss a fair die, find F(3):
$F(3) = P(X \le 3) = f(1) + f(2) + f(3) = \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{1}{2}$
The probability function can also be found from the cumulative
distribution function, for example
$f(3) = F(3) - F(2)$
given that you know the full table of the cumulative distribution
function for the sample space.
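A short Python sketch of the same idea for the fair die: the CDF is built as a running sum of the probability function, and a single probability is then recovered from two neighbouring CDF values. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}  # f(x) for a fair die

# F(x) is the running total of f(t) over all t <= x.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

print(cdf[3])           # 1/2
print(cdf[3] - cdf[2])  # 1/6, which recovers f(3)
```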
Continuous Probability Distribution
Continuous random variables give rise to continuous probability
distributions. Continuous probability distributions can't be tabulated
since by definition the probability of any single real number is zero,
i.e.
$P(X = x) = 0$
This is because the random variable X is continuous and as such can
be infinitely divided into smaller parts such that the probability of
selecting any one exact value x is zero.
Consequently, the continuous probability distribution is found as
probabilities over intervals, such as
$P(a < X < b), \quad P(X < b), \quad P(X > a)$
and so on.
While a discrete probability distribution is characterized by its
probability function (also known as the probability mass function),
continuous probability distributions are characterized by
their probability density functions.
Since we look at regions in which a given outcome is likely to occur,
we define the Probability Density Function (PDF) as the function
that describes how densely probability is concentrated around a
given point; probabilities are obtained by integrating it over a region.
This can be mathematically represented as:
$P(a < X < b) = \int_a^b f(x)\,dx$
In other words, the area under the curve.

For a continuous probability distribution, the set of ordered pairs
(x,f(x)), where x is each outcome in a given sample space and f(x) is
its probability density, must satisfy the following:
$P(x_1 < X < x_2) = \int_{x_1}^{x_2} f(x)\,dx$
$f(x) \ge 0$ for all real numbers x
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
Cumulative Distribution Function for a Continuous Probability Distribution
For a continuous random variable X, its CDF is given by
$F(x) = P(X \le x)$
which is the same as saying:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
From the above, we can see that to find the probability density
function f(x) when given the cumulative distribution function F(x):
$f(x) = \dfrac{dF(x)}{dx}$
if the derivative exists.

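Continuous probability distributions are given in the form
$f(x) = \begin{cases} \text{(an expression in } x\text{)}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$
whereby the above means that the probability density function f(x)
is non-zero only within the region between a and b and takes on the
value of zero anywhere else.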
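To make the "area under the curve" idea concrete, here is a minimal numerical sketch. The density f(x) = 2x on the interval from 0 to 1 is a hypothetical choice made only for this illustration, and integrate is a simple midpoint-rule helper rather than a library routine.

```python
def pdf(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total = integrate(pdf, 0.0, 1.0)     # whole region, should be close to 1
p_mid = integrate(pdf, 0.25, 0.5)    # P(0.25 < X < 0.5), exact value 0.1875
cdf_half = integrate(pdf, 0.0, 0.5)  # F(0.5), exact value 0.25
outside = integrate(pdf, 1.0, 3.0)   # outside the region, gives 0

print(total, p_mid, cdf_half, outside)
```

For a density that has a simple antiderivative, the same numbers can of course be obtained exactly by integrating by hand.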
For example, given the following probability density function

1. P(X ≤ 4)

Since we're finding the probability that the random variable is less
than or equal to 4, we integrate the density function from the given
lower limit (1) to the limit we're testing for (4).
We need not concern ourselves with the 0 part of the density
function as all it indicates is that the function only exists within the
given region and the probability of the random variable landing
anywhere outside of that region will always be zero.

2. P(X < 1)
P(X < 1) = 0 since the density function f(x) is zero outside of
the given boundary
3. P(2 ≤ X ≤ 3)
Since the region we're given lies within the boundary for which x is
defined, we solve this problem as follows:

4. P(X > 1)
The above problem is asking us to find the probability that the
random variable lies at any point between 1 and positive Infinity. We
can solve it as follows:

but remember that the inverse of a number tends to zero as that
number tends to infinity, so the term evaluated at infinity vanishes.

The above is our expected result since we already defined f(x) as
lying within that region; hence the random variable will always be
picked from there.
5. F(2)
The above is asking us to find the cumulative distribution function
evaluated at 2.

Thus F(2) can be found from the above as

Joint Probability Distributions

In the section on probability distributions, we looked at discrete and
continuous distributions but we only focused on single random
variables. Probability distributions can, however, be applied to
grouped random variables which gives rise to joint probability
distributions. Here we're going to focus on 2-dimensional
distributions (i.e. only two random variables) but higher dimensions
(more than two variables) are also possible.
Since all random variables are divided into discrete and continuous
random variables, we end up having both discrete and
continuous joint probability distributions. These distributions are not
so different from the one-variable distributions we just looked at, but
understanding some concepts might require some knowledge
of multivariable calculus.
Essentially, joint probability distributions describe situations where
both outcomes represented by random variables occur. While we
previously used only X to represent the random variable, we now have
X and Y as the pair of random variables.
Joint probability distributions are defined in the form below:
$f(x, y) = P(X = x, Y = y)$
where the above represents the probability that
events x and y occur at the same time.

The Cumulative Distribution Function (CDF) for a joint
probability distribution is given by:
$F(x, y) = P(X \le x, Y \le y)$

Discrete Joint Probability Distributions

Discrete random variables when paired give rise to discrete joint
probability distributions. As with single random variable discrete
probability distribution, a discrete joint probability distribution can
be tabulated as in the example below.
The table below represents the joint probability distribution obtained
for the outcomes when a die is tossed and a coin is flipped.

f(x,y)          x = 1   2   3   4   5   6   Row Totals
y = Heads       a       b   c   d   e   f
y = Tails       g       h   i   j   k   l
Column Totals

In the table above, x = 1, 2, 3, 4, 5, 6 are the outcomes when the die
is tossed while y = Heads, Tails are the outcomes when the coin is
flipped. The letters a through l represent the joint probabilities of the
different events formed from the combinations of x and y, while the
totals in the margins are the sums of the rows and columns; the grand
total should equal 1. The row sums and column sums are referred to as
the marginal probability distribution functions (PDF).
We shall see in a moment how to obtain the different probabilities
but first let us define the probability mass function for a joint
discrete probability distribution.
The probability function, also known as the probability mass function,
for a joint probability distribution f(x,y) is defined such that:
$f(x,y) \ge 0$ for all (x,y)
Which means that the joint probability should
always be greater than or equal to zero, as dictated by the
fundamental rule of probability.
$\sum_x \sum_y f(x,y) = 1$
Which means that the sum of all the joint
probabilities should equal one for a given sample
space.
$f(x,y) = P(X = x, Y = y)$
The probability mass function f(x,y) can be calculated in a number of
different ways depending on the relationship between the random
variables X and Y.
As we saw in the section on probability concepts, these two
variables can be either independent or dependent.
If X and Y are Independent:
$f(x,y) = P(X = x) \cdot P(Y = y)$
In the example we gave above, flipping a coin and tossing a die are
independent: the outcome from one does not in any way affect the
outcome of the other. Assuming
that the coin and die were both fair, the probabilities given
by a through l can be obtained by multiplying the probabilities of the
different x and y combinations.
For example: P(X = 2, Y = Tails) is given by
$P(X = 2) \cdot P(Y = \text{Tails}) = \dfrac{1}{6} \cdot \dfrac{1}{2} = \dfrac{1}{12}$
Since we claimed that the coin and the die are fair, the
probabilities a through l should be the same.
The marginal PDFs (the row and column totals) should be the
probabilities you expect for each of the outcomes on its own.
For example:
$P(X = 1) = \dfrac{1}{6}, \quad P(Y = \text{Heads}) = \dfrac{1}{2}$
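The table thus becomes:

f(x,y)          x = 1   2      3      4      5      6      Row Totals
y = Heads       1/12    1/12   1/12   1/12   1/12   1/12   1/2
y = Tails       1/12    1/12   1/12   1/12   1/12   1/12   1/2
Column Totals   1/6     1/6    1/6    1/6    1/6    1/6    1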
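For the independent case, the whole table can be generated by multiplying the two marginal distributions. The sketch below assumes a fair die and a fair coin, as in the text; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}
coin = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}

# Independence: f(x, y) = P(X = x) * P(Y = y).
joint = {(x, y): die[x] * coin[y] for x, y in product(die, coin)}

# Marginal totals recovered from the joint table.
row_totals = {y: sum(joint[(x, y)] for x in die) for y in coin}
col_totals = {x: sum(joint[(x, y)] for y in coin) for x in die}

print(joint[(2, "Tails")])     # 1/12
print(row_totals["Heads"])     # 1/2
print(col_totals[1])           # 1/6
print(sum(joint.values()))     # 1
```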
If X and Y are Dependent:

If X and Y are dependent variables, their joint probabilities are
calculated using their different relationships, as in the example
below.
Given a bag containing 3 black balls, 2 blue balls and 3 green balls,
a random sample of 4 balls is selected. Given that X is the number
of black balls and Y is the number of blue balls, find the joint
probability distribution of X and Y.

The random variables X and Y are dependent since they are picked
from the same sample space such that if any one of them is picked,
the probability of picking the other is affected. So we solve this
problem by using combinations.
There are 4 possible values of X, i.e.
{0,1,2,3}, whereby you can pick none, one, two or three black balls;
and similarly for Y there are 3 possible values {0,1,2}, i.e. none,
one or two blue balls.
The joint probability distribution is given by the table below:

f(x,y)          x = 0   1   2   3   Row Totals
y = 0
y = 1
y = 2
Column Totals

To fill out the table, we need to calculate the different entries. We
know the total number of black balls to be 3, the total number of
blue balls to be 2, the sample size to be 4 and the total
number of balls in the bag to be 3+2+3 = 8.
We find the joint probability mass function f(x,y) using combinations:
$f(x,y) = \dfrac{\binom{3}{x}\binom{2}{y}\binom{3}{4-x-y}}{\binom{8}{4}}$
What the above represents is the number of different ways we can
pick each of the required balls, divided by the number of ways of
picking any 4 balls from 8. We substitute for the different values
of x (0,1,2,3) and y (0,1,2) and solve, e.g.
$f(1,1) = \dfrac{\binom{3}{1}\binom{2}{1}\binom{3}{2}}{\binom{8}{4}} = \dfrac{18}{70}$

f(0,0) is a special case. We don't need to calculate it, because the
probability of obtaining zero black balls and zero blue
balls is zero. This is because of the sample size
relative to the number of green balls. We need 4 balls from a bag of
8 balls; in order to pick neither black nor blue balls, we would need
there to be at least 4 green balls. But we only have 3 green balls, so
we know that we must have at least one black or blue ball in
the sample.

f(3,2) is also zero, since it would require 5 balls and we only draw 4.

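From the above, we obtain the joint probability distribution as:

f(x,y)          x = 0   1       2       3      Row Totals
y = 0           0       3/70    9/70    3/70   15/70
y = 1           2/70    18/70   18/70   2/70   40/70
y = 2           3/70    9/70    3/70    0      15/70
Column Totals   5/70    30/70   30/70   5/70   1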
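The entries for the ball example can be computed with a few lines of Python. This is a sketch of the combinations approach described above; the function name f and the constant names are illustrative, and math.comb requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

BLACK, BLUE, GREEN, SAMPLE = 3, 2, 3, 4
TOTAL = BLACK + BLUE + GREEN

def f(x, y):
    """Joint pmf: probability of x black and y blue balls in the sample."""
    green = SAMPLE - x - y  # the rest of the sample must be green
    if not 0 <= green <= GREEN:
        return Fraction(0)  # impossible combination, e.g. f(0,0) or f(3,2)
    ways = comb(BLACK, x) * comb(BLUE, y) * comb(GREEN, green)
    return Fraction(ways, comb(TOTAL, SAMPLE))

table = {(x, y): f(x, y) for x in range(4) for y in range(3)}

print(f(1, 1))              # 9/35, i.e. 18/70
print(f(0, 0), f(3, 2))     # 0 0
print(sum(table.values()))  # 1
```

Fraction reduces each result to lowest terms, so 18/70 is printed as 9/35.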

Continuous Joint Probability Distribution

Continuous Joint Probability Distributions arise from groups of
continuous random variables.
Continuous joint probability distributions are characterized by
the Joint Density Function, which is similar to that of a single
variable case, except that this is in two dimensions.
The joint density function f(x,y) is characterized by the following:
$f(x,y) \ge 0$, for all (x,y)
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$
For any region A lying in the xy plane,
$P[(X,Y) \in A] = \iint_A f(x,y)\,dx\,dy$
The marginal probability density functions are given by
$g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$
whereby the above is the probability distribution of random variable
X alone.

The probability distribution of the random variable Y alone, known as
its marginal PDF, is given by
$h(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$
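The marginal densities can be approximated numerically. In the sketch below, the joint density f(x,y) = x + y on the unit square is a hypothetical choice made only for illustration, and integrate is a simple midpoint-rule helper.

```python
def joint_pdf(x, y):
    """Hypothetical joint density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(f, a, b, steps=400):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def marginal_x(x):
    """g(x): integrate the joint density over y."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, 1.0)

def marginal_y(y):
    """h(y): integrate the joint density over x."""
    return integrate(lambda x: joint_pdf(x, y), 0.0, 1.0)

total = integrate(marginal_x, 0.0, 1.0)  # should be close to 1
print(marginal_x(0.3))  # close to 0.3 + 0.5 = 0.8
print(marginal_y(0.9))  # close to 0.9 + 0.5 = 1.4
print(total)
```

For this particular density the exact marginals are g(x) = x + 1/2 and h(y) = y + 1/2, which the numerical values approximate.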

A certain farm produces two kinds of eggs on any given day; organic
and non-organic. Let these two kinds of eggs be represented by the
random variables X and Y respectively. Given that the joint
probability density function of these variables is given by

a) Find the marginal PDF of X

b) Find the marginal PDF of Y
c) Find the P(X 12, Y 12)

a) The marginal PDF of X is given by g(x) where

b) The marginal PDF of Y is given by h(y) where

c) P(X 12, Y 12

Mixed Joint Probability Distribution

So far we've looked at pairs of random variables where both variables
are either discrete or continuous. A joint pair of random variables
can also be composed of one discrete and one continuous random
variable. This gives rise to what is known as a mixed joint probability
distribution.
The density function for a mixed probability distribution is given by

where X is a continuous random variable, Y is a discrete
random variable, and g(x) is the marginal pdf of X.
The cumulative distribution function is given by

Conditional Probability Distribution

Conditional probability distributions arise from joint probability
distributions whereby we need to know the probability of one event
given that the other event has happened, and the random variables
behind these events are joint.
Conditional probability distributions can be discrete or continuous,
but they follow the same notation, i.e.
$f(x \mid y) = \dfrac{f(x,y)}{h(y)}, \quad h(y) > 0$
where the above is the conditional probability of X given that Y = y.

The conditional probability of variable Y given that X = x is given by:
$f(y \mid x) = \dfrac{f(x,y)}{g(x)}, \quad g(x) > 0$
The conditional probability distribution for a discrete set of random
variables can be found from:
$P(a < X < b \mid Y = y) = \sum_{a < x < b} f(x \mid y)$
where the above is the probability that X lies between a and b given
that Y = y.
For a set of continuous random variables, the above probability is
given as:
$P(a < X < b \mid Y = y) = \int_a^b f(x \mid y)\,dx$
Two random variables are said to be statistically independent if
their conditional probability distribution is given by the following:
$f(x \mid y) = g(x) \quad \text{and} \quad f(y \mid x) = h(y)$
where g(x) is the marginal pdf of X and h(y) is the marginal pdf of Y.
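As an illustration of a discrete conditional distribution, the sketch below reuses the bag of 3 black, 2 blue and 3 green balls from the dependent example. It computes f(x | y = 1) and then tests whether the conditional distribution equals the marginal one. The function names f, g, h and conditional_x are illustrative.

```python
from fractions import Fraction
from math import comb

def f(x, y):
    """Joint pmf for x black and y blue balls in a sample of 4 from 3+2+3."""
    green = 4 - x - y
    if not 0 <= green <= 3:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(3, green), comb(8, 4))

xs, ys = range(4), range(3)

def g(x):
    """Marginal pmf of X."""
    return sum(f(x, y) for y in ys)

def h(y):
    """Marginal pmf of Y."""
    return sum(f(x, y) for x in xs)

def conditional_x(x, y):
    """f(x | y) = f(x, y) / h(y)."""
    return f(x, y) / h(y)

given_one_blue = {x: conditional_x(x, 1) for x in xs}
independent = all(conditional_x(x, y) == g(x) for x in xs for y in ys)

print(given_one_blue[1])  # 9/20
print(independent)        # False
```

The check comes out False because X and Y are drawn from the same bag, so knowing Y changes the distribution of X.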