
Discrete Random Variables and Their Probability Distribution: Part I
Cyr Emile MLAN, Ph.D.
mlan@stat.uconn.edu
Discrete Random Variable: Part I


Introduction
Text Reference: Introduction to Probability and Its
Application, Chapter 4.
Reading Assignment: Sections 4.1-4.2, February
16-February 18

In this chapter we extend the concepts and techniques of probability introduced in Chapter 2.
We'll also study how probability provides the basis for making statistical inferences.
Suppose that one flips a coin 100 times and counts the number of heads. The objective is to infer from the count whether the coin is balanced.
It is reasonable to believe that observing a large number of heads (say, 90) or a small number of tails (say, 15) would be an indication of an unbalanced coin.
However, where does one draw the line? At 85 or 75 or 65 or 55? Without knowing the probability distribution of the number of heads for a balanced coin, one cannot draw such a line. Hence, we would not be able to draw any conclusions from the sample of 100 coin flips.
The concepts and techniques of probability developed in this chapter will allow us to calculate the thresholds we seek.


Random Variables
A sample space need not consist of numbers, but very often we are interested in numerical outcomes associated with the random phenomenon generating the sample space.

Example 4.1:

Consider an experiment where we flip two balanced coins and observe the results. The sample space is
S = {(H, H), (H, T), (T, H), (T, T)}.
However, one can list the events in a different way by counting the number of heads (or, if we wish, the number of tails). Thus, the four simple events are now replaced by 2 heads, 1 head, 1 head, and 0 heads, respectively.
The number of heads, X, is called a random variable (r.v.). The induced sample space associated with the r.v. X is S* = {0, 1, 2}.
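The mapping in Example 4.1 from outcomes to values of X can be sketched in a few lines of Python. This is only an illustration: the names `number_of_heads`, `induced_space` and `pmf` are hypothetical and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: four equally likely outcomes.
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Induced sample space S*: the distinct values X can take.
induced_space = sorted({number_of_heads(o) for o in sample_space})

# Probability of each value: add 1/4 for every outcome mapped to it.
counts = Counter(number_of_heads(o) for o in sample_space)
pmf = {x: Fraction(n, len(sample_space)) for x, n in counts.items()}

print(induced_space)
print(pmf)
```

The two outcomes (H, T) and (T, H) are both mapped to X = 1, which is why that value carries twice the probability of X = 0 or X = 2.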

Example 4.2:
In the game of craps played in casinos, the player tosses two dice. A natural way of listing the events is to describe the number on the first die and the number on the second die. Hence, the sample space is
S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}.

However, in this game the player is primarily interested in the total of the two dice. If we define the random variable X to be the total of the two dice, X can take integer values between 2 and 12. Hence, the induced sample space for X is
S* = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

Definition 4.1:
A random variable is a real-valued function or rule defined
over a sample space that assigns a number to each outcome
of a random experiment.

Types of Random Variables


There are two basic types of random variables: discrete
and continuous.
A random variable (r.v.) is discrete if its set of possible values is finite or can be listed as an infinite sequence (the outcomes can be listed or counted).
X = number of defective television sets in a shipment of 100 sets.
Y = number of Queens on four successive draws (without replacement)
in a deck of 52 cards.

Z = number of words spelled correctly in a spelling test.
T = number of bacteria per cubic centimeter of drinking water.
V = number of heart beats per minute.
W = number of batteries examined until a good one is found.

A random variable is continuous if the values it can assume cannot be enumerated (they can be represented by an interval).
X = the height above sea level of any geographical location.
Y = lifetime of a light bulb.
Z = body temperature or weight or height of a newborn child.
U = length of time until recovery from a disease.
V = gas mileage.
R = amount of rainfall in Storrs per year.
T = the amount of monoxide produced per gallon of unleaded gas.
W = the amount of water used by a household during a given month.


Joke

Two random variables were gossiping and thought they were discrete by whispering, but I heard their chatter continuously.


Discrete Random Variables


The rest of this chapter is dedicated to discrete random
variables.
Because certain types of discrete random variables are more common in practice, one goal of this chapter is to study the properties of these random variables and to learn to apply the results to cases with similar characteristics, without having to solve the same probability problems over and over again.
Since the value of a discrete random variable is
determined by the outcome of a random experiment, we
may assign probabilities to its possible values.


Hence, it is meaningful to talk about the probability that
the discrete random variable X takes on the value x.
We denote this by P (X = x).
Note that simple events for a discrete random variable are of the type (X = x).
The simple event (X = x) represents the set of all
simple outcomes in the sample space of the random
experiment that have been assigned the value x by the
discrete random variable X .


The Probability Distribution of a Discrete Random Variable

Example 4.3:

In the game of Monopoly, two dice are thrown and the outcomes are recorded. However, the r.v. X = sum of the two outcomes is the one of interest. Let us assume that the two dice are identical and balanced.
As a result, all pairs of possible outcomes (i, j), \(1 \le i, j \le 6\), are equally likely and \(P(i, j) = \frac{1}{36}\).
X takes the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
Note that the event (X = 8) can be obtained as 2+6, 3+5, 4+4, 5+3, and 6+2, while the event (X = 2) can only be obtained as 1+1. As a result, the sum 8 is 5 times more likely than the sum 2.
So, unlike the outcomes (i, j) recorded from the two dice, the outcomes of the random variable X do not all have the same chance of being observed.

The following table summarizes all outcomes with their chance of occurrence.

x           2     3     4     5     6     7     8     9     10    11    12
P(X = x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

These chances are referred to as the probability mass function of the r.v. X. The table above provides a complete description of the r.v. X.
The chances can be derived as:
\[P(X = 8) = P\big((2, 6) \text{ or } (3, 5) \text{ or } (4, 4) \text{ or } (5, 3) \text{ or } (6, 2)\big) = \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{5}{36}\]
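The same derivation can be repeated for every value of X by enumerating the 36 equally likely pairs. The sketch below assumes two balanced six-sided dice, as in Example 4.3; the function name `dice_sum_pmf` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def dice_sum_pmf(sides=6):
    """Probability mass function of the sum of two balanced dice."""
    pmf = {}
    # Each pair (i, j) has probability 1/36; add it to the bucket of its sum.
    for i, j in product(range(1, sides + 1), repeat=2):
        pmf[i + j] = pmf.get(i + j, 0) + Fraction(1, sides * sides)
    return pmf

pmf = dice_sum_pmf()
for x in sorted(pmf):
    print(x, pmf[x])
```

Note that `Fraction` reduces automatically, so 6/36 is printed as 1/6.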

Definition 4.2:
The probability distribution or probability mass function of a
discrete random variable X is a mathematical function that
can be represented by a formula, a table or a graph that
assigns to every possible outcome (X = x), its probability
of occurrence P (X = x). It is sometimes simply denoted by
p(x).

The probability that X takes on the value x, P(X = x), can be obtained as the sum of the probabilities of all sample points in S that are assigned the value x.

The probability mass function tells us what values the discrete random variable X takes and how often the random variable X takes these values.
Each probability P(X = x) can be viewed as the limiting value of the relative frequency of the outcome x when the experiment is repeated over and over again.
This interpretation is very useful: it tells us that if we repeat an experiment many times, we will get relative frequencies very similar to the probability distribution. Hence, if one decides that a probability distribution is appropriate for a discrete random variable, one has a way to verify whether the model is accurate.
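The relative-frequency interpretation can be checked with a small simulation. The following is a sketch only: it assumes the two-dice sum of Example 4.3 as the model, and the number of rolls, the seed and the function name `simulate_dice_sums` are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def simulate_dice_sums(n_rolls, seed=0):
    """Relative frequency of each sum over n_rolls simulated throws of two dice."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return {x: counts[x] / n_rolls for x in range(2, 13)}

# Model probabilities: P(X = x) = (6 - |x - 7|) / 36 for x = 2, ..., 12.
exact = {x: (6 - abs(x - 7)) / 36 for x in range(2, 13)}

freq = simulate_dice_sums(100_000)
largest_gap = max(abs(freq[x] - exact[x]) for x in exact)
print(largest_gap)
```

If the model is appropriate, the largest gap between the observed frequencies and the probabilities shrinks as the number of rolls grows.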

Joke on Sample Size


An engineer, consultant and statistician were driving down a steep
mountain road in County Donegal one evening. All of a sudden the
brakes failed and the car careened down the road out of control. But
half way down, the driver somehow managed to stop the car by
running it against the embankment narrowly avoiding going over a
very steep cliff. They all got out, shaken by their narrow escape from
death, but otherwise unharmed. The consultant said: "To fix this
problem we need to organize a committee, have meetings, write
several interim reports and through a process of continuous
improvement, develop a solution."
The engineer said: "No! That would take far too long, and besides
that method never worked before. I have my trusty penknife here and
will take apart the brake system, isolate the problem and correct it."
The statistician said: "No - you're both wrong! Let's all push the car
back up the hill and see if it happens again. We only have a sample
of size 1 here!"

The following properties must be satisfied by any discrete probability distribution. They are equivalent to the rules of probability provided in Chapter 3.
Theorem 4.1:
Suppose that a discrete random variable X assumes the values \(x_1, x_2, \ldots, x_k\) with probabilities \(p_1 = P(X = x_1)\), \(p_2 = P(X = x_2)\), \(\ldots\), \(p_k = P(X = x_k)\).
Then the following properties must be true:
1. \(0 \le p_i \le 1\) for \(i = 1, \ldots, k\).
2. \(p_1 + p_2 + \cdots + p_k = 1\).

To find the probability of any composite event, we add up the probabilities \(p_i\) of the \(x_i\) that make up the event of interest.
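Both properties of Theorem 4.1, and the rule for composite events, are easy to check mechanically. The sketch below uses the two-dice sum of Example 4.3 as test data; the helper names `is_valid_pmf` and `event_probability` are hypothetical.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check 0 <= p_i <= 1 for every i and p_1 + ... + p_k = 1."""
    probs = list(pmf.values())
    # Exact comparison is safe here because the probabilities are Fractions;
    # with floats a small tolerance would be needed instead.
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

def event_probability(pmf, event):
    """Add up the probabilities of the values x_i that make up the event."""
    return sum(p for x, p in pmf.items() if x in event)

dice_pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

print(is_valid_pmf(dice_pmf))
p_seven_or_eleven = event_probability(dice_pmf, {7, 11})
print(p_seven_or_eleven)
```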


Example 4.4:
A supervisor in a manufacturing plant has three men and three women working for him. He wants to choose two workers for a special job. Not wishing to show any bias in the selection, he decides to select the two workers at random. Let X denote the number of women in his selection. Find the probability distribution for X.

Solution: The supervisor can select two workers from six in \(\binom{6}{2} = 15\) ways. Hence, S contains 15 sample points, assumed to be equally likely. The values for X are 0, 1, 2.

\[p(0) = P(X = 0) = \frac{\binom{3}{0}\binom{3}{2}}{\binom{6}{2}} = \frac{3}{15} = \frac{1}{5}\]
\[p(1) = P(X = 1) = \frac{\binom{3}{1}\binom{3}{1}}{\binom{6}{2}} = \frac{9}{15} = \frac{3}{5}\]
\[p(2) = P(X = 2) = \frac{\binom{3}{2}\binom{3}{0}}{\binom{6}{2}} = \frac{3}{15} = \frac{1}{5}\]

Hence,
x       0     1     2
p(x)    1/5   3/5   1/5

or in short
\[p(x) = \frac{\binom{3}{x}\binom{3}{2 - x}}{\binom{6}{2}}, \qquad x = 0, 1, 2.\]
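The short formula for p(x) in Example 4.4 translates directly into code with `math.comb`. This is a minimal sketch; the function name and its parameters `women`, `men` and `chosen` are hypothetical and simply generalize the numbers 3, 3 and 2 of the example.

```python
from fractions import Fraction
from math import comb

def women_selected_pmf(women=3, men=3, chosen=2):
    """P(X = x): x women among `chosen` workers picked at random."""
    total = comb(women + men, chosen)  # number of equally likely selections
    return {
        x: Fraction(comb(women, x) * comb(men, chosen - x), total)
        for x in range(chosen + 1)
    }

pmf = women_selected_pmf()
print(pmf)
```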

Example 4.5:

Consider a group of five potential blood donors, A, B, C, D, and E, of whom only A and B have type O+ blood. Five blood samples, one from each individual, will be typed in random order until an O+ individual is identified. Let X = the number of typings necessary to identify an O+ individual. Find the probability mass function of X.

Solution:
The values for X are 1, 2, 3, and 4.
\[p(1) = P(X = 1) = P(\text{A or B typed first}) = \frac{2}{5}\]
\[p(2) = P(X = 2) = P(\text{C, D, or E first, and then A or B}) = \frac{3 \cdot 2}{5 \cdot 4} = \frac{3}{10}\]
\[p(3) = P(X = 3) = P(\text{C, D, or E first and second, and then A or B}) = \frac{3 \cdot 2 \cdot 2}{5 \cdot 4 \cdot 3} = \frac{1}{5}\]
\[p(4) = P(X = 4) = P(\text{C, D, and E all typed first}) = \frac{3 \cdot 2 \cdot 1}{5 \cdot 4 \cdot 3} = \frac{1}{10}\]

Hence,
x       1     2     3     4
p(x)    .4    .3    .2    .1
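The result of Example 4.5 can be cross-checked by brute force: list all 5! = 120 typing orders, assumed equally likely, and record for each one the position of the first O+ donor. The names below (`typings_needed`, `o_positive`) are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

donors = "ABCDE"
o_positive = {"A", "B"}

def typings_needed(order):
    """Position (1-based) of the first O+ donor in a typing order."""
    for position, donor in enumerate(order, start=1):
        if donor in o_positive:
            return position

orders = list(permutations(donors))  # 120 equally likely orders
counts = Counter(typings_needed(order) for order in orders)
pmf = {x: Fraction(n, len(orders)) for x, n in sorted(counts.items())}
print(pmf)
```

Enumeration is practical here only because the sample space is tiny; the counting argument in the solution scales much better.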


Often one is interested in events of the form \(\{X \le b\}\) for any real number b. To describe such events we define:
The distribution function or cumulative distribution function F(b) for a r.v. X is defined as
\[F(b) = P(X \le b).\]
It is the probability that the observed value of X will be at most b.
In the case of a discrete random variable X,
\[F(b) = \sum_{x = -\infty}^{b} p(x).\]
By looking at the cumulative probabilities associated with a random variable, one can study the character of that random variable.

Theorem 4.2:
Every distribution function satisfies the following four properties:
1. \(\lim_{x \to -\infty} F(x) = 0\)
2. \(\lim_{x \to \infty} F(x) = 1\)
3. The distribution function is a nondecreasing function; if \(a < b\), then \(F(a) \le F(b)\).
4. The distribution function is right-hand continuous; \(\lim_{h \to 0^+} F(x + h) = F(x)\).

Conversely, any function satisfying the four properties above is a distribution function.

The graph of the distribution function of a discrete random variable is a step function, and it is defined for all real numbers. It is discontinuous at points of positive probability and constant between two consecutive points of positive probability. The change in the value of the cumulative function at a point b of positive probability (the height of the jump in the graph) is P(X = b); therefore the change is always positive.
One can retrieve the probability mass function from the distribution function:
\[P(X = x) = F(x) - \lim_{h \to 0^-} F(x + h)\]
More generally, we have
\[P(a < X \le b) = F(b) - F(a)\]
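As a sketch of how these relations work in practice, the code below builds F from a probability mass function and recovers interval and point probabilities from it. It uses the two-dice sum of Example 4.3 as data. The function names are hypothetical, and `jump_at` assumes X is integer-valued, so that stepping back by 1/2 is enough to land just below the jump.

```python
from fractions import Fraction

dice_pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

def cdf(pmf, b):
    """F(b) = P(X <= b): sum p(x) over all values x not exceeding b."""
    return sum((p for x, p in pmf.items() if x <= b), Fraction(0))

def interval_probability(pmf, a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(pmf, b) - cdf(pmf, a)

def jump_at(pmf, x):
    """P(X = x) as the jump of F at x (assumes integer-valued X)."""
    return cdf(pmf, x) - cdf(pmf, x - Fraction(1, 2))

print(cdf(dice_pmf, 7))                      # F(7)
print(cdf(dice_pmf, 7.5))                    # F is constant between jumps
print(interval_probability(dice_pmf, 4, 9))  # P(4 < X <= 9)
print(jump_at(dice_pmf, 7))                  # P(X = 7)
```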

Example 4.6: Example 4.3 revisited


x       2     3     4     5     6     7     8      9     10     11     12
p(x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36   4/36  3/36   2/36   1/36
F(x)    1/36  1/12  1/6   5/18  5/12  7/12  13/18  5/6   11/12  35/36  1

\[P(X = 7) = F(7) - \lim_{h \to 0^-} F(7 + h) = F(7) - F(6) = \frac{7}{12} - \frac{5}{12} = \frac{1}{6}\]
\[P(4 < X \le 9) = F(9) - F(4) = \frac{5}{6} - \frac{1}{6} = \frac{4}{6}\]
\[P(4 < X \le 9) = p(5) + p(6) + p(7) + p(8) + p(9) = \frac{4}{36} + \frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} = \frac{24}{36}\]


Discrete Random Variable and Expected Values


Just as in descriptive statistics, when the sample space for the discrete random variable X is large, the probability distribution is not a useful summary measure. It provides too much information, and it is hard to capture the big picture. We again need measures of central tendency and measures of variability.

Example 4.7:
Consider two r.v.'s X and Y, representing the profit from two different investments. Suppose the two probability distributions have been set up as follows:

x         -2000   -1000   1000   2000   5000
p_X(x)    .05     .10     .10    .25    .50

y         0       1000    2000   3000   5000
p_Y(y)    .40     .20     .20    .10    .10

Compare investments X and Y.

Definition 4.3:
Let X be a discrete random variable with the probability function p(x). Then the expected value of X, denoted E(X) or \(\mu_X\), is defined to be
\[\mu_X = E(X) = \sum_x x\,p(x).\]
The sum is over all discrete values assumed by X.

Note that the expected value is a number, not a random variable. It is a population parameter.
The expected value is the center of mass of the probability distribution of X.
It is "expected" in the sense that it is what we expect on average over the long run.

Example 4.8: Example 4.7 revisited.
Find the expected values of the investments X and Y.

Solution:
\[\mu_X = -2000(.05) - 1000(.10) + 1000(.10) + 2000(.25) + 5000(.50) = 2900.\]
\[\mu_Y = 0(.40) + 1000(.20) + 2000(.20) + 3000(.10) + 5000(.10) = 1400.\]

It follows that \(2900 = \mu_X > \mu_Y = 1400\).
Thus, investment X has a better return than investment Y on the basis of their expected values. In other words, in the long run, investment X will provide a higher profit than investment Y.
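The two sums in Example 4.8 follow the same pattern, so a single helper covers both. This is a minimal sketch with the distributions of Example 4.7 stored as dictionaries; the names `expected_value`, `pmf_x` and `pmf_y` are hypothetical.

```python
def expected_value(pmf):
    """E(X): sum of x * p(x) over all values x assumed by X."""
    return sum(x * p for x, p in pmf.items())

# Profit distributions of the two investments (Example 4.7).
pmf_x = {-2000: 0.05, -1000: 0.10, 1000: 0.10, 2000: 0.25, 5000: 0.50}
pmf_y = {0: 0.40, 1000: 0.20, 2000: 0.20, 3000: 0.10, 5000: 0.10}

mu_x = expected_value(pmf_x)
mu_y = expected_value(pmf_y)
print(round(mu_x, 2), round(mu_y, 2))
```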

Theorem 4.4:
Let X be a discrete random variable with the probability function p(x) and g(X) be a real-valued function of X. Then the expected value of the random variable g(X) is given by
\[E[g(X)] = \sum_x g(x)\,p(x).\]
The sum is over all discrete values assumed by X.

In particular, if a and b are two constants,
E(b) = b
E(X + b) = E(X) + b
E(aX) = a E(X)
E(aX + b) = a E(X) + b
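The last identity, of which the other three are special cases, follows from Theorem 4.4 with g(x) = ax + b together with the fact that the probabilities sum to 1 (Theorem 4.1). A short derivation, written out as a sketch:

```latex
\begin{align*}
E(aX + b) &= \sum_x (ax + b)\,p(x) \\
          &= a \sum_x x\,p(x) + b \sum_x p(x) \\
          &= a\,E(X) + b .
\end{align*}
```

Setting a = 0 gives E(b) = b, setting a = 1 gives E(X + b) = E(X) + b, and setting b = 0 gives E(aX) = a E(X).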


Example 4.9:
When an attorney takes a case, she must decide whether to charge a
flat fee or to take a fixed percent of any settlement. Suppose the
attorney has a case for which she could either charge a flat fee of
$5000 or take 20% of any settlement. The attorney feels that the
probability of losing the case is .4. If the case is won, the judge will
award either $50,000 or $100,000. The probability of winning the
case and receiving a judgment of $50,000 is thought to be .5,
whereas the probability of winning with a judgment of $100,000 is
thought to be .1. Does the attorney expect to make more money by
charging the full flat fee or by taking 20% of any settlement?

Solution:
First scenario, flat fee: \(\mu = 5000\).

Second scenario, 20% of the settlement:
The probability distribution for the settlement X is
x       0      50,000   100,000
p(x)    .40    .50      .10
\[\mu = 0(.4) + 50{,}000(.5) + 100{,}000(.1) = 35{,}000\]
and 20% of 35,000 is 7000.

Hence, the average earned income is greater for the attorney if she takes 20% of the settlement instead of the flat fee.
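The second scenario is an instance of Theorem 4.4 with g(x) = 0.20x, so the fee can also be averaged directly instead of taking 20% of the mean settlement. A minimal sketch, with hypothetical names:

```python
def expected_value_of(g, pmf):
    """E[g(X)]: sum of g(x) * p(x) over all values x assumed by X."""
    return sum(g(x) * p for x, p in pmf.items())

# Settlement distribution from Example 4.9.
settlement_pmf = {0: 0.40, 50_000: 0.50, 100_000: 0.10}
flat_fee = 5000

# Fee under the percentage scheme is g(x) = 0.20 * x.
expected_percentage_fee = expected_value_of(lambda x: 0.20 * x, settlement_pmf)
print(round(expected_percentage_fee, 2), flat_fee)
```

Both routes agree because E(aX) = a E(X).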

Discrete Random Variable and Standard Deviation


The expected value alone is not enough as a summary measure. A measure of spread along with the expected value provides a better description of the probability distribution.
Definition 4.4:
Let X be a discrete random variable with the probability function p(x) and mean \(E(X) = \mu\).
The variance, \(\sigma^2\), of the random variable X is defined as
\[\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 p(x).\]
The sum is over all discrete values assumed by X.

The standard deviation, \(\sigma\), of the random variable X is defined as \(\sigma = \sqrt{\sigma^2}\).

Variance and Standard Deviation


The variance is thought of as an average squared distance between the values of X and its center \(\mu\). Its units of measurement are the square of the units of measurement of X.
The standard deviation is thought of as an average distance between the values of X and its center \(\mu\). Its units of measurement are the original units of measurement.

Shortcut Computational Formula:
\[\sigma^2 = E(X^2) - \mu^2, \qquad \text{where } E(X^2) = \sum_x x^2 p(x).\]

If a and b are two constants,
\[V(aX + b) = a^2 V(X)\]
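The shortcut formula follows from expanding the square in the definition of the variance and using the definitions of the mean and of a probability distribution. A sketch of the steps:

```latex
\begin{align*}
\sigma^2 &= \sum_x (x - \mu)^2 p(x) \\
         &= \sum_x x^2 p(x) - 2\mu \sum_x x\,p(x) + \mu^2 \sum_x p(x) \\
         &= E(X^2) - 2\mu^2 + \mu^2 \\
         &= E(X^2) - \mu^2 .
\end{align*}
```

The third line uses \(\sum_x x\,p(x) = \mu\) and \(\sum_x p(x) = 1\).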

Example 4.10: Example 4.7 revisited.


Find the standard deviations of the investments X and Y.

Solution:
\[\sigma_X^2 = 5{,}490{,}000 \quad \text{and} \quad \sigma_Y^2 = 2{,}440{,}000\]
\[\sigma_X = 2343.07 \quad \text{and} \quad \sigma_Y = 1562.05\]

It follows that \(2343.07 = \sigma_X > \sigma_Y = 1562.05\).
Thus, investment X is more variable than investment Y.
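The figures in Example 4.10 can be reproduced with the shortcut formula. The sketch below stores the distributions of Example 4.7 as dictionaries; the helper names `mean` and `variance` are hypothetical.

```python
from math import sqrt

def mean(pmf):
    """E(X) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) via the shortcut formula E(X^2) - mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu

pmf_x = {-2000: 0.05, -1000: 0.10, 1000: 0.10, 2000: 0.25, 5000: 0.50}
pmf_y = {0: 0.40, 1000: 0.20, 2000: 0.20, 3000: 0.10, 5000: 0.10}

sd_x = sqrt(variance(pmf_x))
sd_y = sqrt(variance(pmf_y))
print(round(variance(pmf_x)), round(sd_x, 2))
print(round(variance(pmf_y)), round(sd_y, 2))
```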

Discrete Random Variable and Expected Value


Example 4.11:
A department store is considering the cost of maintenance before purchasing a new photocopy machine. Let X denote the number of repairs during a year. Based on the past performance of this photocopy machine in other departments that purchased it a year ago, the distribution of X is

Number of repairs, x    0     1     2     3
p(x)                    0.2   0.3   0.4   0.1

The cost of maintenance consists of the service agreement fee and the cost of each repair. The service agreement fee is $200. With the agreement, the cost of each repair is $50. Find the mean and standard deviation of the annual cost of maintenance for a photocopy machine.


Solution:
\[\mu = 0(0.2) + 1(0.3) + 2(0.4) + 3(0.1) = 1.4.\]
Thus, the photocopy machine will need to be repaired 1.4 times a year on average.
\[\sigma^2 = 0^2(0.2) + 1^2(0.3) + 2^2(0.4) + 3^2(0.1) - 1.4^2 = 0.84, \qquad \sigma = \sqrt{0.84} = 0.92\]
The standard deviation of the number of repairs is 0.92 repairs per year.
The cost of maintenance is 50X + 200.
\[E(50X + 200) = 50(1.4) + 200 = 270\]
\[V(50X + 200) = 50^2(0.84) = 2100, \qquad \sigma_{50X+200} = \sqrt{2100} = 45.83\]

Hence, the manager could anticipate the average annual cost of maintenance of the photocopy machine to be $270. The standard deviation is $45.83.
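As a check on Example 4.11, the mean and standard deviation of the cost can be computed in two ways: from the rules E(aX + b) = a E(X) + b and V(aX + b) = a² V(X), or directly from the distribution of the cost 50X + 200. The sketch below does both; all names are hypothetical.

```python
from math import sqrt

repairs_pmf = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu

# Route 1: apply the rules for a linear function of X.
cost_mean = 50 * mean(repairs_pmf) + 200
cost_sd = 50 * sqrt(variance(repairs_pmf))

# Route 2: build the distribution of the cost itself and summarize it.
cost_pmf = {50 * x + 200: p for x, p in repairs_pmf.items()}

print(round(cost_mean, 2), round(cost_sd, 2))
print(round(mean(cost_pmf), 2), round(sqrt(variance(cost_pmf)), 2))
```

The fixed fee of $200 shifts the mean but leaves the standard deviation unchanged, which is why b does not appear in V(aX + b).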

Joke

A somewhat advanced society has figured out how to package basic knowledge in pill form. A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available. The pharmacist says, "Here's a pill for English literature." The student takes the pill and swallows it and has new knowledge about English literature! "What else do you have?" asks the student. "Well, I have pills for art history, biology, and world history," replies the pharmacist. The student asks for these, and swallows them and has new knowledge about those subjects! Then the student asks, "Do you have a pill for statistics?" The pharmacist says, "Wait just a moment," and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter. "I have to take that huge pill for statistics?" inquires the student. The pharmacist understandingly nods his head and replies, "Well, you know statistics always was a little hard to swallow."



Example 4.12:
Otitis media is a disease of the middle ear and is one of the most frequent reasons for visiting a doctor in the first 2 years of life other than a routine check-up. Let X be the random variable that represents the number of episodes of otitis media in the first 2 years of life.

x       0      1      2      3      4      5      6
p(x)    .129   .264   .271   .185   .095   .039   .017

Find the expected value and standard deviation of X. Find the probability that X falls within one, two, and three standard deviations of the mean.

Solution:
\[\mu = 0(.129) + 1(.264) + 2(.271) + 3(.185) + 4(.095) + 5(.039) + 6(.017) = 2.038.\]
Thus, a child would be expected to have approximately 2 episodes of otitis media on average in the first two years of life.
\[\sigma^2 = 0^2(.129) + 1^2(.264) + 2^2(.271) + 3^2(.185) + 4^2(.095) + 5^2(.039) + 6^2(.017) - 2.038^2 = 1.967, \qquad \sigma = 1.402\]
\[\mu \pm 1\sigma = (0.636, 3.440), \qquad \mu \pm 2\sigma = (-0.766, 4.842), \qquad \mu \pm 3\sigma = (-2.168, 6.244)\]
P(0.636 < X < 3.440) = .72.
P(-0.766 < X < 4.842) = .944.
P(-2.168 < X < 6.244) = 1.00.
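The computations of Example 4.12 can be reproduced as follows. This is a sketch under the distribution given in the example; `prob_within` is a hypothetical helper that adds up p(x) over the values lying strictly within k standard deviations of the mean.

```python
from math import sqrt

otitis_pmf = {0: .129, 1: .264, 2: .271, 3: .185, 4: .095, 5: .039, 6: .017}

mu = sum(x * p for x, p in otitis_pmf.items())
variance = sum(x * x * p for x, p in otitis_pmf.items()) - mu ** 2
sigma = sqrt(variance)

def prob_within(k):
    """P(|X - mu| < k * sigma) for the otitis distribution."""
    return sum(p for x, p in otitis_pmf.items() if abs(x - mu) < k * sigma)

within = {k: prob_within(k) for k in (1, 2, 3)}
print(round(mu, 3), round(sigma, 3))
print({k: round(p, 3) for k, p in within.items()})
```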

Theorem 4.5:
Let X be a discrete random variable with the probability function p(x), and let \(g_1(X), g_2(X), \ldots, g_k(X)\) be k functions of X. Then,
\[E[g_1(X) + g_2(X) + \cdots + g_k(X)] = E[g_1(X)] + E[g_2(X)] + \cdots + E[g_k(X)]\]


[Figure: probability mass function and cumulative probability function of the number of otitis episodes.]

Tchebysheff's Inequality
The following theorem can be used to determine a bound for the probability that a random variable X falls in an interval \(\mu \pm k\sigma\).
Theorem 4.6: Tchebysheff's Theorem
Let X be a random variable with mean \(\mu\) and finite variance \(\sigma^2\). Then, for any constant k > 0,
\[P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \qquad \text{or} \qquad P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}\]

This result applies to any probability distribution regardless of its shape or character.
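To see how conservative the bound can be, the sketch below compares it with an exact probability. The two-dice sum of Example 4.3 is used as the test distribution (an arbitrary choice), and the function names are hypothetical.

```python
from math import sqrt

def tchebysheff_lower_bound(k):
    """Lower bound 1 - 1/k^2 on P(|X - mu| < k*sigma); informative only for k > 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k**2)

def prob_within(pmf, k):
    """Exact P(|X - mu| < k*sigma) for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(p for x, p in pmf.items() if abs(x - mu) < k * sigma)

dice_pmf = {x: (6 - abs(x - 7)) / 36 for x in range(2, 13)}

exact = prob_within(dice_pmf, 2)
bound = tchebysheff_lower_bound(2)
print(round(exact, 4), bound)
```

For this distribution the exact probability is well above the bound, which is typical: the theorem trades sharpness for generality.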


Example 4.13: Example 4.11 revisited
The manager wants to budget enough for maintenance that he is unlikely to go over the budgeted amount. He is considering budgeting $400 for maintenance. How often will the maintenance cost exceed this amount?

Solution:
Note that $400 lies 2.84 standard deviations ($45.83 each) above the mean of $270. Thus, k = 2.84 and the resulting interval ($140, $400) must contain at least
\[1 - \frac{1}{k^2} = 1 - \frac{1}{2.84^2} = 0.88\]
of the probability. Hence, the annual cost cannot exceed $400 more than 12% of the time. Indeed, according to the mathematical model assumed, the cost will never exceed $350, which corresponds to the cost associated with the maximum of three repairs.
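The numbers in this solution can be retraced as follows. The sketch takes the mean, variance and cost distribution from Example 4.11; the variable names are hypothetical.

```python
from math import sqrt

mean_cost = 270          # E(50X + 200) from Example 4.11
sd_cost = sqrt(2100)     # standard deviation of 50X + 200
budget = 400

# Number of standard deviations between the budget and the mean.
k = (budget - mean_cost) / sd_cost

# Tchebysheff: at most 1/k^2 of the probability lies k or more
# standard deviations away from the mean.
upper_bound = 1 / k**2

# Exact answer under the assumed model: cost = 50x + 200 for x = 0, 1, 2, 3.
cost_pmf = {200: 0.2, 250: 0.3, 300: 0.4, 350: 0.1}
exact = sum(p for cost, p in cost_pmf.items() if cost > budget)

print(round(k, 2), round(upper_bound, 2), exact)
```

The gap between the bound of about 12% and the exact value of 0 shows that Tchebysheff's theorem gives a guarantee, not an estimate.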
