
MATH 3301 / Summer 2019

John Garza
September 5, 2019
Contents

Preface

1 What is Statistics?
    1.1 Introduction
    1.2 Using R in this Course

2 Basic Probability
    2.1 A Review of Basic Set Theory
    2.2 Experiments and Probability
    2.3 Counting
    2.4 Conditional Probability and Independent Events
    2.5 The Additive and Multiplicative Laws
    2.6 Bayes' Rule
    2.7 Random Variables and Random Samples

3 Discrete Random Variables
    3.1 Introduction
    3.2 Expected Value
    3.3 Moments and Moment Generating Functions
    3.4 Factorial Moments and Probability Generating Functions
    3.5 Uniform Discrete Distributions
    3.6 Bernoulli Distributions
    3.7 Binomial Distributions
    3.8 Geometric Distributions
    3.9 Negative Binomial Distributions
    3.10 Hyper-Geometric Distributions
    3.11 Poisson Distributions

4 Continuous Probability Distributions
    4.1 Cumulative Distribution Functions
    4.2 Density Functions
    4.3 Expected Value for Continuous Distributions
    4.4 Moment Generating Functions
    4.5 Uniform Continuous Distributions
    4.6 Gamma Distributions
    4.7 χ² Distributions
    4.8 Exponential Distributions
    4.9 Beta Distributions
    4.10 Normal Distributions

5 Multivariate Probability Distributions
    5.1 Introduction
    5.2 Bivariate Probability Distributions
    5.3 Marginal and Conditional Distributions
    5.4 Independent Random Variables
    5.5 The Expected Value of a Function of Random Variables
    5.6 Covariance and Correlation
    5.7 Linear Combinations of Random Variables
    5.8 Multinomial Distributions
    5.9 Bivariate Normal Distributions
    5.10 Conditional Expectations
        5.10.1 Conceptual Formulas
        5.10.2 Conditional Expectations from Excel Data
        5.10.3 Categorical Data

6 Functions of Random Variables
    6.1 Introduction
    6.2 Probability Distributions for Functions of Random Variables
    6.3 The Method of Distribution Functions
    6.4 The Transformation Method
    6.5 The Method of Moment Generating Functions
    6.6 Transformations Using Jacobians
    6.7 Order Statistics

7 The Central Limit Theorem
    7.1 Introduction
    7.2 Sampling Distributions from a Normal Distribution
    7.3 The Central Limit Theorem
    7.4 The Continuity Correction
    7.5 The t-Distribution

Bibliography

Index
Preface

These notes have been prepared for the Statistics course at UT Permian Basin. They are
intended to be used with the matching lecture videos provided in the course's Canvas
pages. The goal of these notes is to provide a simplified summary of the content in
the required course textbook. The notes were prepared using LaTeX, with R code woven
into the files using knitr (http://yihui.name/knitr/).

John Garza
September 5, 2019
Chapter 1

What is Statistics?

1.1 Introduction

Statistics is a Part of Data Analysis


A natural way of answering the question "What is statistics?" is to say that it is a part of
data analysis. In statistics, the data are often associated with populations and samples, and
the analysis involves quantities such as means and variances.

Population

The population is the entire set of measurements, objects, or individuals of interest.

Sample

A sample is a subset selected from the population.

Statistic

A statistic is a quantity computed from the values of a sample. Examples include the
sample mean, the sample variance, and the sample standard deviation. To define these
statistics, let $x_1, \ldots, x_n$ be a sample of size n.

Definition of the Sample Mean, $\bar{x}$

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Definition of the Sample Variance, $s^2$

    $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Definition of the Sample Standard Deviation, $s$

    $s = \sqrt{s^2}$
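
As a quick check of these definitions, R's built-in functions compute all three statistics directly; the data vector below is made up for illustration.

# Sample statistics in R (illustrative data)
x <- c(4.1, 5.0, 3.7, 6.2, 5.5)

mean(x) # sample mean
var(x) # sample variance, divides by n - 1
sd(x) # sample standard deviation, equal to sqrt(var(x))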


Population Mean, Population Variance, and Population Standard Deviation

The sample statistics are estimates of the corresponding population parameters:

• $\bar{x}$ estimates the population mean µ
• $s^2$ estimates the population variance σ²
• s estimates the population standard deviation σ

The Empirical Rule

For a distribution of measurements that is approximately bell-shaped:


• µ ± σ contains approximately 68% of the measurements
• µ ± 2σ contains approximately 95% of the measurements
• µ ± 3σ contains almost all of the measurements
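
For an exactly normal (bell-shaped) distribution, the percentages behind the empirical rule can be computed in R from the standard normal CDF:

# Exact normal probabilities behind the empirical rule
pnorm(1) - pnorm(-1) # approximately 0.6827
pnorm(2) - pnorm(-2) # approximately 0.9545
pnorm(3) - pnorm(-3) # approximately 0.9973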


1.2 Using R in this Course

Introduction
We will be using R, a free data-analysis program available at https://www.r-project.org/.
After downloading R, install RStudio (https://www.rstudio.com/home/) on your computer.
Set up a folder named ACTS131 on your computer; this folder will contain all of your files
for the course. As a student in this course you will have free access to all premium content
at www.datacamp.com

Chapter 2

Basic Probability

2.1 A Review of Basic Set Theory

Sets and Elements

A set is a collection of elements. The notation x ∈ B means that x is an element
of the set B. The notation x ∉ B means that x is not an element of the set B.

Set Notation

If A is a set and x, y, and z are the only elements of A, then we write


A = {x, y, z}

Sets are not Ordered

The elements of a set are not ordered; a set is determined only by its elements. For
example, if A = {x, y, z} then it is also true that A = {y, z, x}.

The Universal Set

In order for set theory to be consistent, we suppose that there is a largest possible
set. This set is called the universal set and is often denoted by a capital S.

The Empty Set

There is a unique empty set, which is denoted ∅. The empty set is also called
the null set.

Size or Cardinality of a Set

For a set A that contains a finite number of elements, we denote the size or
cardinality of A by either N(A) or |A|. If the cardinality of a set is zero, then the
set is the empty set.

Subsets

If A and B are sets, then A is a subset of B if and only if every element of A is also
an element of B. This is denoted by A ⊆ B.

Proper Subsets

A is a proper subset of B if A is a subset of B but there exists at least one element


of B that is not contained in A. This is denoted by A ⊂ B.

Equality of Sets

Two sets A and B are equal if every element of A is an element of B and every
element of B is an element of A. This is denoted by A = B. Equivalently, A = B
if and only if A ⊆ B and B ⊆ A.


Set Operations
The Basic Set Operations

union: A ∪ B = {x ∈ S | x ∈ A or x ∈ B}

intersection: A ∩ B = {x ∈ S | x ∈ A and x ∈ B}

difference: A − B = {x ∈ S | x ∈ A and x ∉ B}

complement: $\overline{A}$ = {x ∈ S | x ∉ A}

Disjoint or Mutually Exclusive Sets

Sets A and B are disjoint or mutually exclusive if A ∩ B = ∅


The Distributive Laws and DeMorgan’s Laws


DeMorgan's Laws

    $\overline{A \cap B} = \overline{A} \cup \overline{B}$

    $\overline{A \cup B} = \overline{A} \cap \overline{B}$

Distributive Laws

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Double Complement Law

    $\overline{\overline{A}} = A$


Extending Set Laws to More than Two Sets


DeMorgan's Laws for Three Sets

    $\overline{A \cap B \cap C} = \overline{A} \cup \overline{B} \cup \overline{C}$

    $\overline{A \cup B \cup C} = \overline{A} \cap \overline{B} \cap \overline{C}$

Set Operations in R

R has built-in functions for set operations.


Set Operations in R

Set Operation R Function


union union(x, y)
intersection intersect(x, y)
set difference setdiff(x, y)
equality setequal(x, y)
membership el %in% set
membership is.element(el, set)


#----- Script 2.1.1 / Set Operations in R -----

A <- sample(letters, replace = FALSE, size = 10)


A
[1] "b" "x" "p" "w" "c" "d" "q" "m" "o" "g"
B <- sample(letters, replace = FALSE, size = 10)
B
[1] "i" "t" "q" "x" "d" "s" "v" "c" "o" "l"
union(A, B) # set union
[1] "b" "x" "p" "w" "c" "d" "q" "m" "o" "g" "i" "t" "s" "v" "l"
intersect(A, B) # set intersection
[1] "x" "c" "d" "q" "o"
setdiff(A, B) # set difference
[1] "b" "p" "w" "m" "g"
setequal(A, B) # test for set equality
[1] FALSE
is.element("a", A) # is "a" in A
[1] FALSE
"b" %in% B # is "b" in B
[1] FALSE


Example: Drawing a Venn Diagram

https://cran.r-project.org/web/packages/VennDiagram/VennDiagram.pdf

https://scriptsandstatistics.wordpress.com/2018/04/26/how-to-plot-venn-diagrams

A survey of 100 TV viewers revealed that over the last year:

i) 34 watch CBS.

ii) 15 watch NBC.

iii) 10 watch ABC.

iv) 7 watch CBS and NBC.

v) 6 watch CBS and ABC.

vi) 5 watch NBC and ABC.

vii) 4 watch CBS, NBC and ABC.

Draw a Venn Diagram corresponding to this information

# Clear the environment


remove(list = ls())

# Load the VennDiagram package (install it first with install.packages('VennDiagram'))

library(VennDiagram)

# Define the values of n1, n2 and n3


n1 <- 34
n2 <- 15
n3 <- 10

# Define the values of n12, n13 and n23


n12 <- 7
n13 <- 6
n23 <- 5

# Define the values of n123


n123 <- 4

# Create the VennDiagram


venn_plot <-
draw.triple.venn(
area1 = n1,
area2 = n2,
area3 = n3,
n12 = n12,
n13 = n13,
n23 = n23,
n123 = n123,
category = c('CBS', 'NBC', 'ABC'), # labels match area1 = CBS, area2 = NBC, area3 = ABC
fill = c('powderblue', 'yellow', 'pink'),
cex = 3,
cat.cex = 3,
cat.dist = c(0.1, 0.1, 0.1))

# Draw the plot


grid.draw(venn_plot)


[Figure: Venn diagram of the survey. Exclusive regions: 25 (CBS), 7 (NBC), 3 (ABC); pairwise-only overlaps: 3 (CBS and NBC), 2 (CBS and ABC), 1 (NBC and ABC); all three networks: 4.]


2.2 Experiments and Probability

This section defines basic terminology and introduces probability functions.

Experiment

An experiment is the process by which an observation is made.

Sample Space

A sample space associated with an experiment is the set of all possible observa-
tions. A sample space will usually be denoted by S. The elements of the sample
space are called sample points.

Sample Point

A sample point is an element of the sample space.

Event

An event is a subset of the sample space. The elements of an event are called
sample points.

Simple Event

A simple event is an event that cannot be decomposed. Each simple event contains
exactly one sample point.


Compound Event

An event is a compound event if it has at least two sample points. A compound
event can be decomposed into the union of two smaller events.

Discrete Sample Space

A discrete sample space is one that contains a finite or countable number of


distinct sample points.

Probability Function

Suppose S is a sample space associated with an experiment. To every event A in
S (A is a subset of S), a probability function assigns a number P(A), called the
probability of A, so that the following hold:

Axiom 1: P(A) ≥ 0.

Axiom 2: P(S) = 1.

Axiom 3: If $A_1, A_2, A_3, \ldots$ are mutually exclusive events in S then

    $P(A_1 \cup A_2 \cup A_3 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i)$


2.3 Counting

Introduction
For discrete sample spaces, probability functions require us to count the number of sample
points in an event or in the sample space. In this section we review some of the basic counting
definitions and theorems and see how R can be used to compute these quantities.

Equal Probability Formula

Let S be a finite sample space in which every simple event has equal probability. Then, for any event E ⊆ S,

    $P(E) = \frac{|E|}{|S|}$
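
As a small illustration, take a fair six-sided die; the event below is chosen arbitrarily.

# Equal-probability sample space: P(E) = |E| / |S|
S <- 1:6
E <- c(2, 4, 6) # the event 'roll an even number'
length(E) / length(S)
[1] 0.5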

The Counting Principle or mn Rule

If a process can be completed in k ∈ Z+ steps and if step i can be completed in ni


distinct ways then the total number of distinct ways to complete the entire process
is n1 n2 · · · nk . The counting principle is also called the mn rule or the multiplication
rule.

Combinations and Binomial Coefficients

The number of combinations of n objects taken r at a time is the number of subsets,
each of size r, that can be formed from the n objects. This number is denoted
by $C_r^n$ or $\binom{n}{r}$. These numbers are called binomial coefficients.


The Binomial Theorem

    $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

    $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$

A Binomial Identity

Letting x = 1 and y = 1 we get the identity

    $\sum_{k=0}^{n} \binom{n}{k} = 2^n$

Pascal's Formula

    $\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$

Symmetry of Binomial Coefficients

    $\binom{n}{r} = \binom{n}{n-r}$


Special Values for Binomial Coefficients

    $\binom{n}{0} = \binom{n}{n} = 1$

    $\binom{n}{1} = \binom{n}{n-1} = n$

    $\binom{n}{2} = \frac{n(n-1)}{2}$

Permutations

An ordered arrangement of r distinct objects is called a permutation. The number
of ways of ordering n distinct objects taken r at a time is denoted by the symbol $P_r^n$:

    $P_r^n = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$


Order and Repetition

Combinations and permutations are two cases of choosing k elements from a set of n
elements. In combinations and permutations repetition is not allowed. If we allow for
repetition we have four ways of choosing k elements from a set of n:

                              Order Matters       Order Does Not Matter

    Repetition allowed        $n^k$               $\binom{n+k-1}{k}$

    Repetition not allowed    $P_k^n$             $\binom{n}{k}$


Multinomial Coefficients

The number of ways of partitioning n distinct objects into k distinct subsets of sizes
$n_1, \ldots, n_k$, where $n_1 + \cdots + n_k = n$, is called a multinomial coefficient.

    $\binom{n}{n_1 \cdots n_k} = \frac{n!}{n_1! \cdots n_k!}$

The Multinomial Theorem

    $(x_1 + \cdots + x_k)^n = \sum \binom{n}{n_1 \cdots n_k} x_1^{n_1} \cdots x_k^{n_k}$

where the sum is over all $n_i \in \{0, \ldots, n\}$ such that $n_1 + \cdots + n_k = n$.

Multinomial Identity

Letting $x_1 = x_2 = \cdots = x_k = 1$ we get the identity

    $\sum \binom{n}{n_1 \cdots n_k} = k^n$

A Binomial Coefficient is a Multinomial Coefficient

A binomial coefficient is the multinomial coefficient with k = 2 parts of sizes r and n − r:

    $\binom{n}{r} = \binom{n}{r\;\;(n-r)} = \frac{n!}{r!\,(n-r)!}$


The Addition Rule

Let S be a finite set that is the union of k mutually disjoint sets B1 , . . . , Bk . Then
|S| = |B1 | + · · · + |Bk |

The Difference Rule

Let A be a finite set and suppose that B ⊆ A. Then


|A − B| = |A| − |B|

Inclusion/Exclusion Rule

Let A, B and C be finite sets. Then


|A ∪ B| = |A| + |B| − |A ∩ B|

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
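
The three-set rule is easy to verify on small sets in R; the sets below are arbitrary examples.

# Checking inclusion/exclusion on small sets
A <- c(1, 2, 3, 4)
B <- c(3, 4, 5)
C <- c(4, 5, 6)

length(union(A, union(B, C))) # direct count of |A union B union C|
[1] 6
length(A) + length(B) + length(C) -
  length(intersect(A, B)) - length(intersect(A, C)) - length(intersect(B, C)) +
  length(intersect(A, intersect(B, C))) # inclusion/exclusion count
[1] 6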


Counting Functions in R
R has several built-in functions for counting, and we can define our own functions for
those that are not already built in.

# Counting Functions in R ----

choose(4, 2) # binomial coefficients
[1] 6
factorial(5) # factorial

[1] 120
# Defining a permutations function -----

perm <-
  # computes the number of ordered subsets of size k from a set of size n
  function(n, k){factorial(n) / factorial(n - k)}

# Example
perm(4, 2)
[1] 12

# Defining a multinomial coefficient function ----

multinom <-
  # computes a multinomial coefficient
  function(v){factorial(sum(v)) / prod(factorial(v))}

# Example
multinom(c(1, 2, 3))

[1] 60
# End of Script


2.4 Conditional Probability and Independent Events

Conditional Probability

Let A and B be events and suppose that P(B) > 0. The conditional probability of
the event A, given that the event B has occurred, is defined as

    $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Conditional Probability Equations

P (A ∩ B) = P (A|B)P (B)

P (A ∩ B) = P (B|A)P (A)

P (B|A)P (A) = P (A|B)P (B)

Mutually Exclusive Events and Conditional Probability

Let A and B be mutually exclusive events. If P(B) > 0 then

    $P(A \mid A \cup B) = \frac{P(A)}{P(A) + P(B)}$


Set Inclusion and Conditional Probability

Let A and B be events such that B ⊂ A, P(B) > 0 and P(A) > 0. Then

    $P(A \mid B) = 1$

    $P(B \mid A) = \frac{P(B)}{P(A)}$

Conditional Probability and Complements

    $P(\overline{A} \mid B) = 1 - P(A \mid B)$

Independent Events

Two events A and B are independent if any of the following hold.


P (A|B) = P (A)

P (B|A) = P (B)

P (A ∩ B) = P (A)P (B)


Dependent Events

Two events are dependent if and only if any of the following hold.

    P(A | B) ≠ P(A)

    P(B | A) ≠ P(B)

    P(A ∩ B) ≠ P(A)P(B)

Independence and Complements

The following are equivalent: if one of them is true then so are all of the others.
• A and B are independent
• A and $\overline{B}$ are independent
• $\overline{A}$ and B are independent
• $\overline{A}$ and $\overline{B}$ are independent


Mutual Independence of Several Events

Events A, B and C are said to be mutually independent if all of the following hold
P (A ∩ B) = P (A)P (B)

P (A ∩ C) = P (A)P (C)

P (B ∩ C) = P (B)P (C)

P (A ∩ B ∩ C) = P (A)P (B)P (C)


2.5 The Additive and Multiplicative Laws

The Multiplicative Law of Probability

P (A ∩ B) = P (A|B)P (B)

= P (B|A)P (A)

Multiplicative Law / Independent Events

P (A ∩ B) = P (A)P (B)

Multiplicative Law / Three Sets

P (A ∩ B ∩ C) = P (A)P (B|A)P (C|A ∩ B)

Multiplicative Law / Many Sets

$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$


The Additive Law of Probability

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

Additive Law / Mutually Exclusive Sets

P (A ∪ B) = P (A) + P (B) when A ∩ B = ∅

Additive Law / Three Sets

P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C)


+ P (A ∩ B ∩ C)

Complement Law

$P(A) + P(\overline{A}) = 1$


2.6 Bayes' Rule

Partitions

Let S be a set. A collection of subsets B1 , . . . , Bk of S is a partition of S if


1. $B_i \neq \emptyset$ for $i \in \{1, \ldots, k\}$
2. $B_i \cap B_j = \emptyset$ for $i \neq j$
3. $S = B_1 \cup B_2 \cup \cdots \cup B_k$

The Law of Total Probability

Let S be a sample space and let $B_1, \ldots, B_k$ be a partition of S such that $P(B_j) > 0$
for $j \in \{1, \ldots, k\}$. Then for any event A

    $P(A) = \sum_{j=1}^{k} P(A \mid B_j)\,P(B_j) = \sum_{j=1}^{k} P(A \cap B_j)$

Bayes' Rule

Let S be a sample space and let $B_1, \ldots, B_k$ be a partition of S such that $P(B_j) > 0$
for $j \in \{1, \ldots, k\}$. Then for any event E

    $P(B_j \mid E) = \frac{P(E \mid B_j)\,P(B_j)}{\sum_{i=1}^{k} P(E \mid B_i)\,P(B_i)}$


Elementary Partitions

Let A and B be events such that 0 < P (B) < 1. Then

1. $\{B, \overline{B}\}$ is a partition.

2. $P(A) = P(A \cap B) + P(A \cap \overline{B})$

3. $P(A) = P(A \mid B)P(B) + P(A \mid \overline{B})P(\overline{B})$
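
Bayes' rule and the law of total probability are easy to check numerically with the two-set partition above; the probabilities below are made-up values for illustration.

# A numeric check of Bayes' rule (illustrative numbers)
p_B <- 0.01 # P(B)
p_E_given_B <- 0.95 # P(E | B)
p_E_given_notB <- 0.02 # P(E | complement of B)

# The law of total probability gives P(E)
p_E <- p_E_given_B * p_B + p_E_given_notB * (1 - p_B)

# Bayes' rule gives P(B | E)
p_E_given_B * p_B / p_E
[1] 0.3242321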


2.7 Random Variables and Random Samples

Random Variables

A random variable is a real-valued function whose domain is a sample space.

Random Sample

If a population has size N and if all samples from the population of size n have
an equal probability of being selected, the sampling is said to be random and the
resulting sample is called a random sample.

Formula for the Number of Random Samples

    number of random samples of size n from a population of size N = $\binom{N}{n}$
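
In R this count is choose(N, n). For example, the number of possible random samples of size 5 from a population of size 52 is:

# Number of random samples of size 5 from a population of size 52
choose(52, 5)
[1] 2598960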

Chapter 3

Discrete Random Variables


3.1 Introduction

Countably Infinite

A set S is said to be countably infinite if there exists a bijection


$\Psi : \mathbb{N} \to S$

Countable

A set S is said to be countable if it is finite or countably infinite.

Discrete Random Variable

A random variable X is discrete if the set of values it can take is countable.

Notation

1. Upper-case letters such as X denote random variables

2. Lower-case letters such as x denote values of the random variable

3. $(X = x) = \{s \in S \mid X(s) = x\}$

Definition of P (X = x)

The probability that a random variable X equals the value x is denoted P (X = x)


and is defined as the sum of the probabilities of all s ∈ S such that X(s) = x

Probability Function of a Discrete Random Variable

Let X be a discrete random variable. The probability function of X is defined as


pX (x) = P (X = x)

Probability Distribution of a Discrete Random Variable

The probability distribution of a discrete random variable X is a function, graph or


table that indicates pX (x) for all possible values of x

Properties of Discrete Probability Functions

The probability function $p_X(x)$ of a random variable X must satisfy

1. $0 \le p_X(x) \le 1$ for all x

2. $\sum_{\text{all } x} p_X(x) = 1$

Cumulative Distribution Function of a Discrete Random Variable

The cumulative distribution function of a discrete random variable X is defined as

    $F_X(x) = P[X \le x] = \sum_{w \le x} p_X(w)$


3.2 Expected Value

Expected Value of a Discrete Random Variable

The expected value of a discrete random variable X is defined by

    $E[X] = \sum_{\text{all } x} x \, p_X(x)$

Expected Value and Absolute Convergence

The expected value of a random variable can be undefined. In order for the expected
value to be defined, the above sum must be absolutely convergent, that is,

    $\sum_{\text{all } x} |x| \, p_X(x) < \infty$

Notation for Expected Value

The expected value is sometimes denoted µX

Expected Value of a Function of a Random Variable

Let X be a random variable and let h(X) be a function of X. Then

    $E[h(X)] = \sum_{\text{all } x} h(x) \, p_X(x)$


Variance

The variance of a random variable is defined as

    $V[X] = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$

The variance of X is sometimes denoted $\sigma_X^2$.

Standard Deviation

The standard deviation of the random variable X is the positive square root of the
variance of X. The standard deviation is sometimes denoted σX

Linear Transformations

Let X be a random variable. For real numbers a and b

1. E[aX + b] = aE[X] + b

2. E[b] = b

3. V [b] = 0

4. V [aX + b] = a2 V [X]

5. V [X + b] = V [X]

6. V [aX] = a2 V [X]


Sums of Functions of a Random Variable

For $h_1(X), \ldots, h_k(X)$ functions of X and real numbers $a_1, \ldots, a_k$

    $E[a_1 h_1(X) + \cdots + a_k h_k(X)] = a_1 E[h_1(X)] + \cdots + a_k E[h_k(X)]$

Using R to Explore Expected Value and Variance

# Using R to explore expected value and variance

# Generate a vector, Y, of random numbers from {1, 2, 3, ...., 100}


w <- seq(from = 1, to = 100, by = 1)
Y <- sample(x = w, size = 50, replace = TRUE)

# compute the mean and var of Y


mu_y <- mean(Y)
var_y <- var(Y)

# define constants a and b


a <- +3
b <- -2

# Compare E[aY + b] and aE[Y] + b


mean(a * Y + b)
[1] 149.2
a * mu_y +b
[1] 149.2
# Compare V[aY+b] and a^2V[Y]
var(a * Y + b)
[1] 7243.347
a ^ 2 * var_y
[1] 7243.347


3.3 Moments and Moment Generating Functions

kth Moment about the Origin

The kth moment about the origin of a random variable X is

    $\mu_k' = E[X^k]$

kth Central Moment

The kth central moment of the random variable X is

    $\mu_k = E[(X - \mu_X)^k]$

Note

Notice that $\mu_2 = \sigma_X^2$ and $\mu_1' = \mu_X$

Moment Generating Functions

Let X be a random variable. The moment-generating function of X is


mX (t) = E[etX ]

Note

The definition requires that ∃ b ∈ R+ such that mX (t) < ∞ for all −b ≤ t ≤ b


Theorem

Let X be a random variable with moment generating function $m_X(t)$. Then

    $\left.\frac{d^k m_X(t)}{dt^k}\right|_{t=0} = m_X^{(k)}(0) = E[X^k]$

Note

The previous theorem can be used to calculate the variance of X:

    $V[X] = E[X^2] - (E[X])^2 = m_X^{(2)}(0) - \left(m_X^{(1)}(0)\right)^2$

Note

    $m_X(0) = E[e^{0 \cdot X}] = E[e^0] = E[1] = 1$

Note

    $E[a^X] = E[e^{\ln(a^X)}] = E[e^{X \ln a}] = m_X(\ln a)$

Linear Transformations

Let Y = aX + b. Then

    $m_Y(t) = E[e^{tY}] = E[e^{t(aX+b)}] = E[e^{atX} e^{tb}] = e^{tb}\, m_X(at)$

Independent Random Variables and Moment Generating Functions

Let $X_1, \ldots, X_n$ be independent random variables and define $W = X_1 + \cdots + X_n$. Then

    $m_W(t) = m_{X_1}(t) \times \cdots \times m_{X_n}(t)$

Log of $m_X(t)$, E[X], and V[X]

Let X be a random variable with moment generating function $m_X(t)$, and define
$h(t) = \ln(m_X(t))$. Then

    $E[X] = h'(0)$

    $V[X] = h''(0)$
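
The $h(t) = \ln(m_X(t))$ shortcut is easy to sanity-check numerically. Below is a minimal sketch using the binomial MGF $m_X(t) = (q + pe^t)^n$ from Section 3.7 and finite-difference approximations to the derivatives; the step size eps is an arbitrary choice.

# Numerical check of E[X] = h'(0) and V[X] = h''(0) for a Binomial(n = 10, p = 0.3)
p <- 0.3
n <- 10
h <- function(t){n * log(1 - p + p * exp(t))} # h(t) = ln m_X(t)

eps <- 1e-4
(h(eps) - h(-eps)) / (2 * eps) # approximates h'(0) = np = 3
(h(eps) - 2 * h(0) + h(-eps)) / eps ^ 2 # approximates h''(0) = npq = 2.1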


Exploring Moment Generating Functions with R


# Exploring moment generating functions with R

# Let X be a binomial random variable with parameters p = 0.7 and n = 100

# Define the moment generating function of X


p <- 0.7
n <- 100

mg_x <- function(t){(p * exp(t) + (1 - p)) ^ n}

# Use the moment generating function to calculate E[1.1 ^ X]


mg_x(log(1.1)) # in R log is the natural logarithm ln
[1] 867.7163

# Compute E[1.1 ^ X] directly


x <- seq(from = 0, to = 100, by = 1)
p_x <- dbinom(x = x, size = n, prob = p)
sum(1.1 ^ x * p_x)
[1] 867.7163
# we get the same value


3.4 Factorial Moments and Probability Generating Functions

Introduction
Many random variables can take only the values 0, 1, 2, 3, . . . Such variables may be called
counts; examples include geometric, binomial, Poisson, hyper-geometric, and negative-
binomial variables. The probability generating function is useful for computing the
expected value and variance of these variables.

Probability Generating Function

Let X be a random variable that takes only the values 0, 1, 2, 3, . . . and define
$p_n = P[X = n]$. The probability generating function of X is

    $P_X(t) = E[t^X] = \sum_{j=0}^{\infty} p_j \, t^j = p_0 + p_1 t + p_2 t^2 + \cdots$

Factorial Moments

For a random variable X, the k-th factorial moment of X is


µ[k] = E[X(X − 1) · · · (X − k + 1)]

For example,
µ[1] = E[X]

µ[2] = E[X(X − 1)]


Theorem

Let X be a discrete random variable that takes only the values 0, 1, 2, 3, . . . and
let $P_X(t)$ be the probability generating function of X. Then

    $\left.\frac{d^k P_X(t)}{dt^k}\right|_{t=1} = P_X^{(k)}(1) = \mu_{[k]} = E[X(X-1)\cdots(X-k+1)]$

Note

For a count random variable X with probability generating function $P_X(t)$

    $E[X] = P_X'(1)$

    $V[X] = P_X''(1) + P_X'(1) - \left(P_X'(1)\right)^2$
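
These formulas can be confirmed by computing factorial moments directly. For a Poisson variable with mean λ (Section 3.11), $P_X(t) = e^{\lambda(t-1)}$, so $P_X'(1) = \lambda$ and $P_X''(1) = \lambda^2$; the truncation point 200 below is large enough for the tail to be negligible.

# Direct computation of the first two factorial moments of a Poisson(lambda = 3)
lambda <- 3
x <- 0:200
p <- dpois(x, lambda)

sum(x * p) # E[X] = lambda = 3
sum(x * (x - 1) * p) # E[X(X - 1)] = lambda^2 = 9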


3.5 Uniform Discrete Distributions

Definition

The uniform discrete distribution assigns equal probabilities to a finite set of numbers.
If X is uniform discrete on {1, 2, . . . , n} then

    $p(x) = \begin{cases} \frac{1}{n}, & x \in \{1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases}$

Properties of the Uniform Discrete Random Variables

    $E[X] = \frac{n+1}{2}$

    $V[X] = \frac{n^2 - 1}{12}$

    $m_X(t) = E[e^{Xt}] = \sum_{\text{all } x} p_x \, e^{tx} = \sum_{j=1}^{n} \frac{1}{n} e^{jt} = \frac{e^t}{n} \cdot \frac{e^{nt} - 1}{e^t - 1}$


Exploring the Discrete Uniform Distribution using R

# Exploring the discrete uniform distribution with R

# Let X be discrete uniform on the set {1, 2, ..., 23}

n <- 23

# Define a vector of the possible values of X

x <- seq(from = 1, to = n, by = 1)
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
# Define the probability function

p_x <- rep(x = 1 / n, times = n)

p_x
[1] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[7] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[13] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[19] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
# Compute the expected value of X using the formula

E_X <- (n + 1) / 2
E_X
[1] 12
# Compute the Expected value using the definition

sum(x * p_x)
[1] 12
# Compute the variance using the formula

V_x <- (n ^ 2 - 1) / 12
V_x
[1] 44
# Compute the variance directly

E2_X <- sum(x ^ 2 * p_x)

E2_X - (E_X) ^ 2
[1] 44


3.6 Bernoulli Distributions

Definition

A random variable X is a Bernoulli random variable with probability of success
p, 0 < p < 1, if it has probability function

    $p_X(x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & \text{otherwise} \end{cases}$

For a Bernoulli random variable, 1 − p is often denoted by q.

Properties of Bernoulli Random Variables

E[X] = p

V [X] = pq

E[X n ] = p

mX (t) = q + pet

Bernoulli Trial

A Bernoulli trial is an experiment that has only two possible outcomes.


3.7 Binomial Distributions

Binomial Experiments

A binomial experiment is defined by


1. A fixed number of identical trials, n
2. The n trials are independent
3. A trial results in one of two outcomes: Success, S, or failure, F
4. The probability of success on a single trial is equal to some value p and remains
the same from trial to trial. The probability of a failure is q = (1 − p)
5. The random variable, X, is the number of successes during the n trials.

The Binomial Distribution

For a random variable X with binomial distribution and parameters $p \in (0,1)$ and
$n \in \mathbb{Z}^+$

    $p(x) = \binom{n}{x} p^x q^{n-x}$, where $x \in \{0, \ldots, n\}$

Properties

    $E[X] = np$

    $V[X] = npq$

    $m_X(t) = (q + pe^t)^n$

    $\text{mode} = \begin{cases} \lfloor (n+1)p \rfloor, & (n+1)p \notin \mathbb{Z} \\ (n+1)p \text{ and } (n+1)p - 1, & (n+1)p \in \{1, \ldots, n\} \end{cases}$
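
The mode formula can be checked against the probability function itself. For n = 10 and p = 0.5, (n + 1)p = 5.5 is not an integer, so the mode should be ⌊5.5⌋ = 5.

# Checking the binomial mode formula, n = 10, p = 0.5
which.max(dbinom(0:10, size = 10, prob = 0.5)) - 1 # subtract 1 since the vector starts at x = 0
[1] 5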


Using R for Binomial Distributions

probability function dbinom(x, size, prob, log = FALSE)

CDF pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)

quantiles qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)

random numbers rbinom(n, size, prob)

Note

Note that the argument prob is used to specify the probability of success. In the R
functions above the arguments p and q stand for probability and quantile.

Using R for Binomial Distributions


# X is binomial with p = 0.3 and n = 10. Compute P[X = 4]
dbinom(x = 4, size = 10, prob = 0.3)

[1] 0.2001209
#X is binomial with p = 0.234 and n = 27. Compute P[X <= 11]
pbinom(11, size = 27, prob = 0.234)
[1] 0.9871378

#X is binomial with p = 0.44 and n = 637. Compute P[X > 319]


pbinom(319, size = 637, prob = 0.44, lower.tail = FALSE)

[1] 0.0009037255
#X is binomial with n = 781 and p = 0.602. Compute the 71st percentile of X
qbinom(0.71, size = 781, prob = 0.602)
[1] 478


Calculations for Binomial Distributions


# X is binomial with p = 0.5 and n = 10. Plot the probability function of X

# set up the plot data

x <- seq(from = 0, to = 10, by = 1)
d <- dbinom(x = x, size = 10, prob = 0.5)
M <- data_frame(x = x, y = d)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_col(
color = 'red',
size = 0.5,
fill = 'red',
alpha = 0.5,
width = 1) +
scale_x_continuous(
breaks = seq(from = 0, to = 10, by = 1)) +
labs(
x = 'random variable',
y = 'probability',
title = 'Binomial Probability Function',
subtitle = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: bar chart titled 'Binomial Probability Function'; x-axis 'random variable' from 0 to 10, y-axis 'probability', peaking near 0.25 at x = 5.]


Using R for Binomial Distributions

# X is a binomial random variable with parameters n = 1023 and p = 0.439.

# Create a histogram for a random sample of 1000 from X. Use 33 bins for the histogram

M <- data_frame(rb = rbinom(n = 1000, size = 1023, prob = 0.439))

G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rb, y = ..density..),
fill = 'darkgreen',
color = 'darkgreen',
alpha = 0.5,
size = 0.5,
bins = 33) +
labs(
x = 'random binomial numbers',
y = 'relative frequency',
title = 'Random Binomial Numbers, p = 0.439, n = 1023',
subtitle = NULL,
caption = 'Summer 2019')+
theme_classic() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: histogram titled 'Random Binomial Numbers, p = 0.439, n = 1023'; x-axis 'random binomial numbers' from about 400 to 500, y-axis 'relative frequency'; caption 'Summer 2019'.]


Using R for Binomial Distributions


# X is a binomial random variable with parameters n = 25 and p = 0.439.

# Create a graph of the empirical distribution function for a random sample of 1e6 from X.
rb <- rbinom(n = 1e6, size = 25, prob = 0.439)
tb <- table(rb)
nb <- sort(unique(rb))
cb <- cumsum(tb) /length(rb)

M <- data_frame(nb, tb, cb)

G <-
ggplot(data = M) +
geom_col(
mapping = aes(x = nb, y = cb),
fill = 'maroon',
color = 'maroon',
alpha = 0.5,
width = 1,
size = 0.5) +
scale_x_continuous(
breaks = seq(from = min(rb), to = max(rb), by = 2))+
labs(
x = 'random binomial numbers',
y = 'cumulative frequency',
title = 'Random Binomial Numbers, p = 0.439, n =25',
subtitle = NULL,
caption = 'Summer 2019') +
theme_bw() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: cumulative-frequency plot titled 'Random Binomial Numbers, p = 0.439, n = 25'; x-axis 'random binomial numbers' from 1 to 21, y-axis 'cumulative frequency' from 0 to 1; caption 'Summer 2019'.]


Computing Expected Values and Variances Using R

# Let X be a binomial random variable with parameters p = 0.439 and n = 571

# Define h(X) = (X ^ 3.1 + X ^ 1.58 + 1) ^ (1 / 3)

h <- function(x){(x ^ 3.1 + x ^ 1.58 + 1) ^ (1 / 3)}

# Compute the variance of h(X) using R

# Define the binomial parameters

p <- 0.439
n <- 571

# Define the possible values of X


x_vals <- seq(from = 0, to = n, by = 1)

# Define the probability function


f <- function(x){dbinom(x, size = n, prob = p)}

# Compute E[h(X)] and E[h(X)^2]


E1_h <- sum(h(x_vals) * f(x_vals))

E2_h <- sum(h(x_vals) ^ 2 * f(x_vals))

# Compute the variance of h(X)


var_h <- E2_h - (E1_h) ^ 2
var_h
[1] 216.985


3.8 Geometric Distributions

Geometric Distributions

A geometric distribution is described by


1. Identical Bernoulli trials
2. The trials are independent
3. Success probability p and failure probability q = 1 − p
4. The random variable X is the number of trials until the first success
5. The random variable Y is the number of failures before the first success
6. Note that X = Y + 1

Definition of the Geometric Probability Distribution

For the random variable X from the geometric distribution with parameter $p \in (0,1)$
and q = 1 − p

    $p_X(x) = q^{x-1} p$, where $x \in \mathbb{Z}^+$


Properties of the Geometric Distribution

    $p_X(x) = q^{x-1} p$

    $p_X(x+1) = q \, p_X(x)$

    $P[X \le x] = 1 - q^x$

    $E[X] = \frac{1}{p}$

    $V[X] = \frac{1-p}{p^2}$

    $P_X(t) = \frac{pt}{1 - qt}$

    $m_X(t) = \frac{pe^t}{1 - qe^t}$

    $\text{mode} = 1$

Memory-less Property of the Geometric Distribution

    $P[X > a + b \mid X > a] = P[X > b]$
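
Since $P[X > x] = q^x$, the memory-less property is a one-line algebraic check, and the two sides can be compared numerically; the values of p, a, and b below are arbitrary.

# Numerical check of the memory-less property, p = 0.3, a = 4, b = 2
p <- 0.3
q <- 1 - p
a <- 4
b <- 2

q ^ (a + b) / q ^ a # P[X > a + b | X > a]
[1] 0.49
q ^ b # P[X > b]; the two values agree
[1] 0.49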


Properties of Geometric Distributions for Y

For the random variable Y from the geometric distribution with parameter $p \in (0,1)$

    $p_Y(y) = q^y p$, where $y \in \{0, 1, 2, \ldots\}$

    $P[Y \le y] = 1 - q^{y+1}$

    $E[Y] = \frac{1-p}{p}$

    $V[Y] = \frac{1-p}{p^2}$

    $P_Y(t) = \frac{p}{1 - qt}$

    $m_Y(t) = \frac{p}{1 - qe^t}$

    $\text{mode} = 0$

Memory-less Property of the Geometric Distribution

    $P[Y > a + b \mid Y > a] = P[Y > b]$


Geometric Sum Formula

Let r ∈ ℝ. The geometric sum formulas are

    $\sum_{j=0}^{\infty} r^j = \frac{1}{1-r}$ for $-1 < r < 1$

    $\sum_{j=1}^{\infty} r^j = \frac{r}{1-r}$ for $-1 < r < 1$

    $\sum_{j=0}^{\infty} r^j$ diverges for $|r| \ge 1$

Example

    $\sum_{j=0}^{\infty} \left(\frac{2}{3}\right)^j = \frac{1}{1 - 2/3} = \frac{1}{1/3} = 3$


Using R for the Geometric Random Variable Y

probability function dgeom(x, prob, log = FALSE)

CDF pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)

quantiles qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)

random numbers rgeom(n, prob)

Using R for the Geometric Random Variable Y


# Let Y be a geometric random variable with parameter p = 0.25
# Compute P[Y = 3]
dgeom(x = 3, prob = 0.25)
[1] 0.1054688

# Let Y be a geometric random variable with parameter p = 1/3


# Compute P[Y <= 5]
pgeom(q = 5, prob = 1 / 3)
[1] 0.9122085

# Let Y be a geometric random variable with parameter p = 2/5


# Compute P[Y > 4]
pgeom(q = 4, prob = 2 / 5, lower.tail = FALSE)
[1] 0.07776

# Let Y be a geometric random variable with parameter p = 0.4


# Solve for the smallest value of x such that P[Y <= x] > 0.995
qgeom(p = 0.995, prob = 0.4)
[1] 10


Using R for the Geometric Random Variable Y


library(tidyverse)
# Let Y be a geometric random variable with parameter p = 0.25
# Generate a random sample of size 1000 from Y and plot the histogram
rn <- rgeom(n = 1000, prob = 0.25)
br <- seq(from = min(rn) - 0.5, to = max(rn) + 0.5, by = 1)
M <- data_frame(x = rn)

# Create the histogram


G <-
ggplot(
data = M,
mapping = aes(x = rn)) +
geom_histogram(
mapping = aes(y = ..count.. / sum(..count..)),
breaks = br,
color = 'blue',
fill = 'blue',
alpha = 0.3) +
labs(
title = 'Relative Frequency of Geometric Data',
x = NULL,
y = NULL) +
xlim(
lower = min(rn) - 0.5,
upper = max(rn) + 0.5) +
theme(
axis.ticks = element_blank(),
text =element_text(size = 16, face = 'italic', color = 'black', family = 'serif'))



[Figure: histogram titled 'Relative Frequency of Geometric Data'; relative frequencies decrease from about 0.25 near 0 toward 0 by about 20.]


3.9 Negative Binomial Distributions

Introduction

The negative binomial distribution is defined by

1. Identical Bernoulli trials

2. The trials are independent

3. Success probability p and failure probability q = 1 − p

4. The random variable X is the number of trials until the rth success

5. The random variable Y is the number of failures until the rth success

6. Note that X = Y + r


Definition of the Negative Binomial Distribution, X

For a negative-binomial random variable X with parameters p and r = 2, 3, 4, . . .

    $p(x) = \binom{x-1}{r-1} p^r q^{x-r}$, where $x = r, r+1, \ldots$

Properties of the Negative Binomial Distribution

    $E[X] = \frac{r}{p}$

    $V[X] = \frac{r(1-p)}{p^2}$

    $m_X(t) = \left(\frac{pe^t}{1 - qe^t}\right)^r$

Note:

In the equations above, X is the number of trials until the rth success.


Definition of the Negative Binomial Distribution, Y

For a negative-binomial random variable Y with parameters $p \in (0,1)$ and r = 2, 3, 4, . . .

    $p_Y(y) = \binom{r+y-1}{y} p^r q^y$, where $y = 0, 1, 2, 3, \ldots$

Properties of the Negative Binomial Distribution, Y

    $E[Y] = \frac{r(1-p)}{p}$

    $V[Y] = \frac{r(1-p)}{p^2}$

    $m_Y(t) = \left(\frac{p}{1 - qe^t}\right)^r$

Note:

In the equations above, Y is the number of failures until the rth success.


Using R for the Negative-Binomial Random Variable, Y

probability function dnbinom(y, size, prob, mu, log = FALSE)

distribution function pnbinom(q, size, prob, mu, lower.tail = TRUE)

quantile function qnbinom(p, size, prob, mu, lower.tail = TRUE)

random numbers rnbinom(n, size, prob, mu)

Using R for the Negative-Binomial Random Variable, Y


# Let Y be a negative-binomial random variable with parameters
# p = 0.5 and r = 6. Compute the probability that Y = 3
dnbinom(3, size = 6, prob = 0.5)

[1] 0.109375
# Compute the probability that Y <= 4
pnbinom(4, prob = 0.5, size = 6)
[1] 0.3769531
# Compute the probability that Y>6
pnbinom(6, prob = 0.5, size = 6, lower.tail = FALSE)
[1] 0.387207

# Solve for the smallest value w such that P[Y<=w] > 0.5
qnbinom(0.5, size = 6, prob = 0.5)

[1] 5
# Solve for the largest value w such that P[Y>w] > 0.4
qnbinom(0.4, size = 6, prob = 0.5, lower.tail = FALSE)
[1] 6


Using R for the Negative-Binomial Random Variable, Y

# Let Y be a negative binomial random variable with parameters r = 6 and p = 0.5


# Plot the probability function for Y from Y = 0 to Y = 20

x <- seq(from = 0, to = 20, by = 1)


d <- dnbinom(x = x, size = 6, prob = 0.5)
M <- data_frame(x = x, y = d)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_col(
color = 'purple',
fill = 'purple',
alpha = 0.5,
size = 0.5) +
labs(
x = NULL,
y = NULL,
title = 'A Negative Binomial Distribution',
subtitle = 'John Garza')+
theme(
text = element_text(size = 18, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: bar chart titled 'A Negative Binomial Distribution', subtitle 'John Garza'; probabilities up to about 0.125 over the range 0 to 20.]


Using R for the Negative-Binomial Random Variable, Y

M <- data_frame(rv = rnbinom(n = 1000, size = 6, prob = 0.5))

G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rv, y = ..density..),
fill = 'darkorange',
color = 'darkorange',
alpha = 0.5,
breaks = seq(from = -0.5, to = max(M$rv) + 0.5, by = 1)) +
labs(
x = NULL,
y = NULL,
title = 'Random Negative Binomial Numbers, p = 0.5, n = 6',
subtitle = NULL,
caption = 'Summer 2019')+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: histogram titled 'Random Negative Binomial Numbers, p = 0.5, n = 6'; relative frequencies over the range 0 to 20; caption 'Summer 2019'.]


3.10 Hyper-Geometric Distributions

Introduction

The hyper-geometric distribution can be described by the following

1. A box contains N = n + m ∈ Z+ balls

2. m of the balls are red and n = N − m of the balls are black.

3. A random sample of size k is chosen from the box.

4. The random variable, X, is the number of red balls in the sample.

5. The possible range of X is max{0, k − n} ≤ X ≤ min{k, m}


Probability Function for the Hyper-Geometric Distribution

    $p_X(x) = \frac{\binom{m}{x}\binom{N-m}{k-x}}{\binom{N}{k}}$

    $E[X] = \frac{mk}{N}$

    $V[X] = k \cdot \frac{m}{N} \cdot \frac{N-m}{N} \cdot \frac{N-k}{N-1}$


Identity

Since $\sum_{x} p_X(x) = 1$, we should expect the following identity to be true:

    $\sum_{i=0}^{k} \binom{m}{i} \binom{N-m}{k-i} = \binom{N}{k}$
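
This identity (known as Vandermonde's identity) is easy to confirm numerically; the parameter values below are arbitrary.

# Numerical check of the identity with m = 5, N = 15, k = 7
m <- 5
N <- 15
k <- 7

sum(choose(m, 0:k) * choose(N - m, k - (0:k))) # left-hand side
[1] 6435
choose(N, k) # right-hand side
[1] 6435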


Using R for Hyper-Geometric Distributions

probability function dhyper(x, m, n, k, log = F)

distribution function phyper(q, m, n, k, lower.tail = T, log.p = F)

quantile function qhyper(p, m, n, k, lower.tail = T, log.p = F)

random numbers rhyper(nn, m, n, k)

Using R for Hyper-Geometric Distributions


# X is a hyper-geometric random variable with parameters m = 5, n = 10 and k = 7
# Compute P[X = 2]
dhyper(x = 2, m = 5, n = 10, k = 7)

[1] 0.3916084
# X is a hyper-geometric random variable with parameters m = 53, n = 100
# and k = 77
# Compute P[X <= 20]
phyper(q = 20, m = 53, n = 100, k = 77)
[1] 0.01773732

# X is a hyper-geometric random variable with m = 101, n = 202, and k = 45


# Compute P[X > 15]
phyper(q = 15, m = 101, n = 202, k = 45, lower.tail = FALSE)

[1] 0.4269205
# X is a hyper-geometric random variable with m = 15, n = 17, and k = 11
# Compute the smallest value w such that P[X <= w] >= 0.65
qhyper(p = 0.65, m = 15, n = 17, k = 11)

[1] 6


Using R for Hyper-Geometric Distributions

# X is a hyper-geometric random variable with m = 10, n = 10 and k = 10


# Plot P[X = j] for j = 0, 1, 2,...., 10

x <- seq(from = 0, to = 10, by = 1)


y <- dhyper(x = x, m = 10, n = 10, k = 10)
M <- data_frame(x = x, y = y)

G <-
ggplot(
data = M,
mapping = aes(x = x, y =y)) +
geom_col(
color = 'darkviolet',
fill = 'darkviolet',
alpha = 0.7,
width = 1) +
scale_x_continuous(
breaks = seq(from = 0, to = 10, by = 1))+
theme_classic()+
labs(
x = NULL,
y = NULL,
title = 'A Hypergeometric Probability Distribution',
subtitle = NULL)+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: bar chart titled 'A Hypergeometric Probability Distribution' over x = 0 to 10, with maximum probability about 0.3 at the center.]


Random Hyper-Geometric Numbers


# X is hyper geometric with m = 50, n = 100, and k = 75
# Generate 10000 random numbers from X and plot the histogram

x <- rhyper(nn = 1e4, m = 50, n = 100, k = 75)


br <- seq(from = min(x) - 0.5, to = max(x) + 0.5, by = 1)
M <- data_frame(x = x)

G <-
ggplot(
data = M,
mapping = aes(x = x, y = ..density..))+
geom_histogram(
color = 'darkturquoise',
fill = 'darkturquoise',
alpha = 0.7,
binwidth = 1) +
theme_bw() +
labs(
title = 'Random Hyper Geometric Numbers',
subtitle = NULL,
x = NULL,
y = NULL) +
theme(
text = element_text(size = 18, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())


[Figure: histogram titled 'Random Hyper Geometric Numbers'; relative frequencies up to about 0.15 over the range 15 to 35.]


3.11 Poisson Distributions

Introduction

The Poisson distribution has a single parameter, often called λ. A Poisson
random variable X can take only the values 0, 1, 2, 3, . . . Examples of Poisson
random variables include the number of accidents per day at a factory, the number
of typos per page in a printed book, and the number of customers arriving at a store
in the next hour.

The Probability Function for the Poisson Distribution

    $p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}$

Properties of the Poisson Distribution

    $p_X(k) = p_X(k-1) \cdot \frac{\lambda}{k}$

    $E[X] = \lambda$

    $V[X] = \lambda$

    $m_X(t) = e^{\lambda(e^t - 1)}$

    $P_X(t) = e^{\lambda(t-1)}$

    $\text{mode} = \begin{cases} \lfloor \lambda \rfloor, & \lambda \notin \mathbb{Z}^+ \\ \lambda \text{ and } \lambda - 1, & \lambda \in \mathbb{Z}^+ \end{cases}$


Sums of Independent Poisson Random Variables

Let X1 and X2 be independent Poisson random variables with means λ1 and λ2 .


Define W = X1 + X2 . Then W is a Poisson random variable with mean
λw = λ1 + λ2
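
A quick simulation makes this fact concrete; the means below are arbitrary, and the simulated values will vary slightly from run to run.

# Simulation check: a sum of independent Poissons is Poisson
w <- rpois(n = 1e5, lambda = 2) + rpois(n = 1e5, lambda = 3)

mean(w) # approximately 5
var(w) # approximately 5
mean(w == 4) # empirical P[W = 4]
dpois(4, lambda = 5) # theoretical P[W = 4], approximately 0.175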


The Power Series Expansion for $e^x$

    $e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$

An Application of the Power Series Expansion of $e^x$

The power series expansion is important for problems about Poisson distributions.
Note the following application. Let X be a Poisson random variable with mean λ.
Use the power series expansion of $e^x$ to verify that $\sum_{k=0}^{\infty} P[X = k] = 1$:

    $\sum_{k=0}^{\infty} P[X = k] = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \left( \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} \right) = e^{-\lambda} e^{\lambda} = e^0 = 1$


Poisson Distributions

probability function dpois(x, lambda, log = F)

distribution function ppois(q, lambda, lower.tail = T, log.p = F)

quantile function qpois(p, lambda, lower.tail = T, log.p = F)

random numbers rpois(n, lambda)

# X is Poisson random variable with mean 3. Compute P[X = 2]


dpois(x = 2, lambda = 3)

[1] 0.2240418
# X is a Poisson random variable with mean 5. Compute P[X <= 6]
ppois(q = 6, lambda = 5)
[1] 0.7621835
# X is a Poisson random variable with mean 4. Compute P[X > 5]
ppois(q = 5, lambda = 4, lower.tail = FALSE)

[1] 0.2148696
# X is a Poisson random variable with mean 2. Compute P[4 <= X <= 7]
ppois(q = 7, lambda = 2) - ppois(q = 3, lambda = 2)
[1] 0.1417798
# X is a Poisson random variable with mean 3.3. Solve for the
# smallest value of w such that P[X < w] > 0.99. Check your answer
qpois(p = 0.99, lambda = 3.3)
[1] 8
ppois(q = 8, lambda = 3.3)

[1] 0.9930882
ppois(q = 7, lambda = 3.3)
[1] 0.9802229


Using R for Poisson Distributions


# X is Poisson with mean 1.2. Find the smallest w where P[X > w] < 0.01

qpois(p = 0.01, lambda = 1.2, lower.tail = FALSE)


[1] 4
# For a Poisson random variable X with mean 5,
# Plot the probability function for 0 <= X <= 20

M <-
data_frame(
x = seq(from = 0, to = 20, by = 1),
y = dpois(x, lambda = 5))

G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_col(
color = 'darkgreen',
fill = 'darkgreen',
alpha = 0.4,
width = 1)+
labs(
x = 'Random Variable',
y = NULL,
title = 'Poisson Probability Function',
subtitle = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 18,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: bar chart titled 'Poisson Probability Function' for λ = 5; x-axis 'Random Variable' from 0 to 20.]


Using R for Poisson Distributions


# Generate 1000 random poisson numbers with mean 5 and plot the results in a histogram
M <- data_frame(x = rpois(n = 1000, lambda = 5))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = ..density..)) +
geom_histogram(
color = 'purple',
fill = 'purple',
alpha = 0.2,
binwidth = 1) +
labs(
x = 'Random Variable',
y = NULL,
title = 'Relative Frequency of Random Poisson Numbers',
subtitle = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: histogram titled 'Relative Frequency of Random Poisson Numbers'; x-axis 'Random Variable' from 0 to about 10.]


The Cumulative Distribution Function:

Let X be a Poisson random variable with λ = 4. Graph the cumulative distribution function $F_X(x) = P[X \le x]$
for x = 0, 1, 2, 3, . . . , 15.

# Define a vector of x values and p_X(x) values


x <- seq(from = 0, to = 15, by = 1)
p_x <- dpois(x, lambda = 4)
F_x <- cumsum(p_x)

# Plot F_x
M <- data_frame(x, F_x)

G <-
ggplot(
data = M,
mapping = aes(x = x, y = F_x)) +
geom_col(
color = 'darkorange',
fill = 'darkorange',
alpha = 0.6,
width = 1) +
labs(
title = 'A Cumulative Distribution Function / Poisson ',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: bar chart titled 'A Cumulative Distribution Function / Poisson'; x-axis 'x-values' from 0 to 15, y-axis rising from 0 to 1.]


The Survival Function:

Let X be a Poisson random variable with λ = 4. Graph the survival function $s_X(x) = P[X > x]$ for
x = 0, 1, 2, 3, . . . , 15.

# Define a vector of x values and p_X(x) values


x <- seq(from = 0, to = 15, by = 1)
p_x <- dpois(x, lambda = 4)
s_x <- ppois(x, lambda = 4, lower.tail = FALSE)

# Plot F_x
M <- data_frame(x, s_x)

G <-
ggplot(
data = M,
mapping = aes(x = x, y = s_x)) +
geom_col(
color = 'red',
fill = 'red',
alpha = 0.3,
width = 1) +
labs(
title = 'The Survival Function of a Poisson Random Variable',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: bar chart titled 'The Survival Function of a Poisson Random Variable'; x-axis 'x-values' from 0 to 15, y-axis falling from 1 to 0.]


Empirical Distribution Functions

Plot the empirical distribution function for a vector of 10,000 random numbers from a Poisson random variable
with mean 4.

# Generate a sorted vector of random Poisson numbers

x <- rpois(n = 1e4, lambda = 4) %>% sort

# cumulative relative frequencies (this local ecdf vector masks base R's ecdf function)
ecdf <- (x %>% table %>% as.vector %>% cumsum) / length(x)

M <- data_frame(ecdf = ecdf, x = unique(x))

# Plot the empirical cumulative distribution function


G <-
ggplot(
data = M,
mapping = aes(x = x, y = ecdf)) +
geom_col(
fill = 'thistle',
color = 'purple',
width = 1) +
labs(
title = 'An Empirical Cumulative Distribution Function',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))


[Figure: bar chart titled 'An Empirical Cumulative Distribution Function'; x-axis 'x-values' from 0 to about 10, y-axis rising from 0 to 1.]

Chapter 4

Continuous Probability Distributions

4.1 Cumulative Distribution Functions

Cumulative Distribution Functions

Let X be a random variable. The cumulative distribution function of X is

    $F_X(x) = P[X \le x]$

The cumulative distribution function is often abbreviated CDF.

Properties of Cumulative Distribution Functions

Let X be a random variable with cumulative distribution function $F_X(x)$.

1. $F_X(x)$ is non-decreasing

2. $P[a < X \le b] = F_X(b) - F_X(a)$

3. $\lim_{x \to -\infty} F_X(x) = 0$

4. $\lim_{x \to +\infty} F_X(x) = 1$

Cumulative Distribution Functions in R

You can work with many CDFs in R. The script below demonstrates the four properties above
using the CDF of a standard normal random variable.

x <- seq(from = -4, to = +4, length = 1e3)


y <- pnorm(x, mean = 0, sd = 1)
M <- data_frame(x = x, y = y)

G <-
ggplot(
data = M, mapping = aes(x = x, y = y)) +
geom_line(
color = 'red',
size = 1.2,
linetype = 'solid') +
labs(
title = 'A Cumulative Distribution Function',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / UT - Permian Basin') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))


[Figure: line plot titled 'A Cumulative Distribution Function'; the standard normal CDF rising from 0 to 1 over x-values −4 to 4; caption 'Statistics I / UT - Permian Basin'.]

# Compute P[-1 <= Z <= +1]


pnorm(1) - pnorm(-1)
[1] 0.6826895


Definition of Continuous Random Variable

Let X be a random variable with cumulative distribution function $F_X(x)$. X is a continuous random
variable if

1. $F_X'(x)$ exists everywhere, except possibly on a finite set in any finite interval

2. $F_X'(x)$ is continuous, except possibly on a finite set in any finite interval


4.2 Density Functions


Probability Density Functions

For a continuous random variable X the probability density function is

    $f_X(x) = F_X'(x)$

Relationships Between $F_X(x)$ and $f_X(x)$

    $f_X(x) = F_X'(x)$

    $F_X(x) = \int_{-\infty}^{x} f(t)\,dt$

Review of the Fundamental Theorem of Calculus

If f(x) is a continuous function on the interval [a, b] and $F(x) = \int_{a}^{x} f(t)\,dt$, then

    $F'(x) = \frac{d}{dx}\left[ \int_{a}^{x} f(t)\,dt \right] = f(x)$


Properties of Density Functions

For a random variable X with density function $f_X(x)$

1. $f_X(x) \ge 0$ for all x

2. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$

3. $P[a \le X \le b] = \int_{a}^{b} f_X(x)\,dx$

4. $P[a \le X \le b] = F_X(b) - F_X(a)$

The Mode of a Continuous Random Variable

Let X be a continuous random variable with density function fX (x). The mode(s) of X are the
value(s) of X that maximize fX (x)

The Support of a Density Function

Let X be a continuous random variable with probability density function fX (x). The support of
fX (x) is {x | fX (x) ≠ 0}.


Quantiles

Let X be a random variable with cumulative distribution function FX (x). For a number p ∈ (0, 1),
the pth quantile of X is the smallest number φp such that

P[X ≤ φp] = FX (φp) ≥ p

Percentiles

For a continuous random variable X with cumulative distribution function FX (x), the 100pth
percentile of X is the smallest number φp satisfying

P[X ≤ φp] = FX (φp) = p


Working with Density Functions in R

The following script is an example of working with a probability density function using R. The density is a
gamma density which you will learn about later in the chapter.

# Define the density function -----


f <- function(x){(0 < x) * (x * exp(-x))}

# Define a sequence of x values and a sequence of y values


x <- seq(from = 0, to = 7, length = 1e3)
y <- f(x)

# Create a data_frame
M <- data_frame(x = x, y = y)

# Plot the density curve


G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_line(
color = 'red',
size = 1.3,
linetype = 'solid') +
geom_vline(
xintercept = 1,
color = 'black',
size = 1,
linetype = 'dashed') +
labs(
title = 'A Probability Density Function',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
color = 'black',
size = 16,
face = 'italic',
family = 'serif'))


[Figure: A Probability Density Function, plotted over x-values 0 to 7 with a dashed vertical line at the mode x = 1]

# Compute P[2 <= X <= 3] using the density function


integrate(f, lower = 2, upper = 3)$value

[1] 0.2068576


Working with Quantile Functions in R

# Let X be a hypergeometric random variable with parameters m = 101, n = 88, and k = 33

# Find the 63rd quantile of X


qhyper(p = 0.63, m = 101, n = 88, k = 33)
[1] 19

# review your answer using phyper


phyper(seq(from = 1, to = 33, by = 1), m = 101, n = 88, k = 33)

[1] 1.296440e-11 3.708639e-10 6.683471e-09 8.532358e-08 8.227057e-07
[6] 6.238233e-06 3.824475e-05 1.934509e-04 8.199343e-04 2.948050e-03
[11] 9.083238e-03 2.419228e-02 5.613663e-02 1.143375e-01 2.059623e-01
[16] 3.308173e-01 4.782155e-01 6.289789e-01 7.624786e-01 8.646503e-01
[21] 9.320606e-01 9.702624e-01 9.887671e-01 9.963798e-01 9.990176e-01
[26] 9.997791e-01 9.999597e-01 9.999942e-01 9.999994e-01 1.000000e+00
[31] 1.000000e+00 1.000000e+00 1.000000e+00


4.3 Expected Value for Continuous Distributions

The Expected Value of a Continuous Random Variable

Let X be a continuous random variable with density function f(x). The expected
value of X is defined as

E[X] = ∫_{−∞}^{∞} x f(x) dx

The Expected Value of a Function of X

Let X be a continuous random variable with density f(x) and let g(X) be a function of X.
The expected value of g(X) is

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx


The Expected Value of a Constant

For a constant c ∈ R,

E[c] = ∫_{−∞}^{∞} c f(x) dx = c ∫_{−∞}^{∞} f(x) dx = c · 1 = c


The Expected Value is a Linear Function of X

Let X be a continuous random variable and let a, b ∈ R. Then

1. E[aX] = aE[X]

2. E[aX + b] = aE[X] + b

3. E[ag(X) + b] = aE[g(X)] + b

Computing an Expected Value with R

A random variable X has density function



f(x) = 0.2            for −1 ≤ x ≤ 0
       0.2 + 1.2x     for 0 < x ≤ 1
       0              otherwise

Plot the density function. Compute the mean and variance of X.


# Define the density function


f <- function(x){(-1 <= x & x <= 0) * 0.2 + (0 < x & x <= 1) * (0.2 + 1.2 * x)}

# Verify that the area under the density is one


integrate(f, lower = -Inf, upper = +Inf)$value
[1] 1

# Expected Value
EX <- integrate(function(x){x * f(x)}, lower = -Inf, upper = +Inf)$value
EX

[1] 0.4
# Variance
VX <- integrate(function(x){(x - EX) ^ 2 * f(x)}, lower = -Inf, upper = +Inf)$value
VX

[1] 0.2733333


Plotting the Density Function

# Define the density function


f <- function(x){(-1 <= x & x <= 0) * 0.2 + (0 < x & x <= 1) * (0.2 + 1.2 * x)}

# Define sequences for plotting

# Plot the density function


v <- c(-1, 1)

M <-
data_frame(
x = seq(from = -1, to = 1, length = 1e3),
y = f(x))

G <-
ggplot(
data = M,
mapping = aes())+
geom_area(
mapping = aes(x = x, y = y),
fill = 'thistle',
color = 'thistle',
alpha = 0.7,
size = 1.3,
linetype = 'solid') +
ylim(0, 1.5) +
xlim(-1.5, +1.5) +
geom_segment(
data = data_frame(x = v, y = f(v)),
mapping = aes(x = x, xend = x, y = 0, yend = y),
color = 'purple',
linetype = 'dashed',
size = 1.3) +
labs(
title = 'A Probability Density Function',
subtitle = NULL,
x = 'x-axis',
y = NULL,
caption = 'Statistics I / Section 4.4') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: A Probability Density Function, shaded area over x-axis −1 to 1 with dashed endpoint segments / Statistics I / Section 4.4]


4.4 Moment Generating Functions

The Moment Generating Function of a Continuous Random Variable

For a continuous random variable X, the moment generating function of X is

mX (t) = E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} f(x) dx

MGF for Functions of a Random Variable

For a function g(X) of a continuous random variable X,

m_{g(X)}(t) = E[e^{t g(X)}] = ∫_{−∞}^{∞} e^{t g(x)} f(x) dx

Notes

As with discrete random variables,

mX (0) = 1

mX^{(k)}(0) = E[X^k]

mX (ln a) = E[a^X]


Theorem

Let X be a random variable with moment generating function mX (t). Then

(d^k mX (t) / dt^k) evaluated at t = 0 equals mX^{(k)}(0) = E[X^k]

Note

The previous theorem can be used to calculate the variance of X:

V[X] = E[(X − µX)^2] = E[X^2] − (E[X])^2 = mX^{(2)}(0) − (mX^{(1)}(0))^2
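
As a quick check (not in the original derivation), base R's symbolic derivative D() can verify this identity. The sketch below assumes the MGF of an exponential distribution with β = 2, namely mX (t) = (1 − 2t)^{−1} (see Section 4.8), for which V[X] = β^2 = 4.

# A minimal sketch: V[X] = m''(0) - m'(0)^2, assuming the exponential
# MGF m(t) = (1 - 2t)^(-1) with beta = 2, so V[X] should be 4
m <- expression((1 - 2 * t) ^ (-1))
m1 <- D(m, 't')   # first derivative of the MGF
m2 <- D(m1, 't')  # second derivative of the MGF
t <- 0
eval(m2) - eval(m1) ^ 2
[1] 4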


Note

mX (0) = E[e0X ]

= E[e0 ]

= E[1]

= 1

Note

E[a^X] = E[e^{ln(a^X)}] = E[e^{X ln a}] = mX (ln a)


Linear Transformations

Let Y = aX + b. Then
mY (t) = E[etY ]

= E[et(aX+b) ]

= E[eatX etb ]

= etb mX (at)

Independent Random Variables and Moment Generating Functions

Let X1 , . . . , Xn be independent random variables and define W = X1 + · · · + Xn .


mW (t) = m1 (t) × · · · × mn (t)

Log of mX (t), E[X], and V [X]

Let X be a random variable with moment generating function mX (t).

Define h(t) = ln(mX (t)).


E[X] = h′(0)

V[X] = h″(0)
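
As a sketch, these two identities can be checked symbolically with base R's D(). The example below assumes a gamma MGF (Section 4.6) with α = 3 and β = 2, so h(t) = −3 ln(1 − 2t), E[X] = αβ = 6, and V[X] = αβ^2 = 12.

# A minimal sketch: h(t) = ln mX(t) for an assumed Gamma(3, 2)
h <- expression(-3 * log(1 - 2 * t))
h1 <- D(h, 't')
h2 <- D(h1, 't')
t <- 0
eval(h1)  # E[X] = alpha * beta = 6
[1] 6
eval(h2)  # V[X] = alpha * beta^2 = 12
[1] 12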


4.5 Uniform Continuous Distributions

The Continuous Uniform Distribution

The continuous uniform distribution is constant on an interval and zero elsewhere.


For real numbers a < b,

f(x) = 1/(b − a)    for a ≤ x ≤ b
       0            otherwise

FX (x) = 0                   for x < a
         (x − a)/(b − a)     for a ≤ x ≤ b
         1                   for b < x

E[X^n] = (b^{n+1} − a^{n+1}) / ((n + 1)(b − a))

E[X] = (a + b)/2

V[X] = (b − a)^2 / 12

mX (t) = (e^{bt} − e^{at}) / ((b − a)t)
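
As a quick numerical sketch, the mean and variance formulas can be verified with integrate() for an illustrative uniform distribution on [0, 2]:

# A minimal sketch: check E[X] = (a + b)/2 = 1 and V[X] = (b - a)^2/12 = 1/3
a <- 0
b <- 2
EX <- integrate(function(x){x * dunif(x, min = a, max = b)}, lower = a, upper = b)$value
EX
[1] 1
integrate(function(x){(x - EX) ^ 2 * dunif(x, min = a, max = b)}, lower = a, upper = b)$value
[1] 0.3333333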


Using R for Continuous Uniform Distributions

density dunif(x, min = 0, max = 1, log = FALSE)

CDF punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)

quantiles qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)

random numbers runif(n, min = 0, max = 1)

Using R for Continuous Uniform Distributions


Generate 10,000 random numbers from a continuous uniform distribution on the interval
[0, 2], plot them as a relative frequency histogram, and overlay the corresponding density
function.
# Generate random numbers
M <- data_frame(
xr = runif(n = 1e4, min = 0, max = 2),
xp = seq(from = 0, to = 2, length = 1e4),
fp = dunif(xp, min = 0, max = 2))

# Plot
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = xr, y = ..density..),
color = 'purple',
fill = 'thistle',
alpha = 0.9,
breaks = seq(from = 0, to = 2, by = 0.05)) +
geom_line(
mapping = aes(x = xp, y = fp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
scale_x_continuous(
breaks = c(0, 1, 2),
labels = c(0, '', 2),
limits = c(-0.5, 2.5)) +
scale_y_continuous(
limits = c(0, 0.75)) +
theme_classic() +
labs(
title = 'Relative Frequency Histogram / Random Uniform Numbers',
subtitle = NULL,

x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))

[Figure: Relative Frequency Histogram / Random Uniform Numbers, x-values 0 to 2 with overlaid density]


The Empirical Cumulative Distribution Function


Generate 10,000 random numbers from a continuous uniform distribution on the interval
[0, 2], plot them as an empirical cumulative distribution function, and overlay the
theoretical CDF.

# Generate random numbers


a <- 0
b <- 2
xr <- runif(n = 1e4, min = a, max = b)
xc <- seq(from = 0, to = 2, by = (b - a) / 50)
xf <- cut_width(xr, width = (b - a) / 50)
xt <- as.numeric(table(xf))
yc <- cumsum(xt) / sum(xt)

M <- data_frame(xc = xc, yc = yc)

N <- data_frame(
xp = seq(from = 0, to = 2, length = 1e4),
yp = punif(xp, min = 0, max = 2))

# Plot
G <-
ggplot(
data = M) +
geom_col(
mapping = aes(x = xc, y = yc),
color = 'blue',
fill = 'powderblue',
alpha = 0.9,
size = 0.5,
linetype = 'solid') +
geom_line(
data = N,
mapping = aes(x = xp, y = yp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
labs(

title = 'An Empirical Cumulative Distribution Function',


subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text =element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: An Empirical Cumulative Distribution Function, x-values 0 to 2 with overlaid theoretical CDF]


4.6 Gamma Distributions

The Density Function of a Gamma Random Variable

For X with a gamma distribution and parameters α > 0, β > 0,

f(x) = x^{α−1} e^{−x/β} / (β^α Γ(α))    for 0 ≤ x
       0                                otherwise

Γ(α) = ∫_{0}^{∞} x^{α−1} e^{−x} dx

Properties of Gamma Random Variables

E[X] = αβ

V [X] = αβ 2

mX (t) = (1 − βt)−α

The Shape and Scale Parameters

The shape parameter is α and the scale parameter is β


Notation

A gamma random variable X with parameters α and β is denoted X ∼ Γ(α, β).

Properties of the Gamma Function, Γ(α)

1. For n ∈ Z+, Γ(n) = (n − 1)!

2. For α > 1, Γ(α) = (α − 1)Γ(α − 1)

3. Γ(1/2) = √π
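
These properties are easy to check numerically with the base R gamma() function:

gamma(5)                 # property 1: Gamma(5) = 4! = 24
[1] 24
gamma(4.8) / gamma(3.8)  # property 2: the ratio equals alpha - 1 = 3.8
[1] 3.8
gamma(0.5) ^ 2           # property 3: Gamma(1/2)^2 = pi
[1] 3.141593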

Sums of Independent Gamma Random Variables

Let X ∼ Γ(αX , β) and Y ∼ Γ(αY , β) be independent gamma random variables.

Define W = X + Y .

Then W ∼ Γ(αX + αY , β) is a gamma random variable.
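
A minimal simulation sketch of this fact (the parameter choices below are illustrative):

# X ~ Gamma(2, 3) and Y ~ Gamma(5, 3) independent, so X + Y should
# behave like a Gamma(7, 3) random variable
w <- rgamma(n = 1e5, shape = 2, scale = 3) + rgamma(n = 1e5, shape = 5, scale = 3)
mean(w)  # close to alpha * beta = 7 * 3 = 21
var(w)   # close to alpha * beta^2 = 7 * 9 = 63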


Using R for Gamma Distributions

density dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)

CDF pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE)

quantiles qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE)

random numbers rgamma(n, shape, rate = 1, scale = 1/rate)

Using R for Gamma Distributions


# X is a gamma random variable with shape parameter 3 and scale parameter 1.
# Calculate P[X > 6]
pgamma(q = 6, shape = 3, scale = 1, lower.tail = FALSE)
[1] 0.0619688
# X is a gamma random variable with shape parameter 2 and
# scale parameter 2. Determine the value w such that P[X <= w]=0.66
qgamma(p = 0.66, shape = 2, scale = 2)
[1] 4.521549

The Gamma Function Γ(x) in R


Use R to calculate Γ(3.8) and Γ(5)

gamma(3.8)
[1] 4.694174
gamma(5)
[1] 24

table of contents 131


MATH 3301 / Notes Section 4.6

Area Plots for Gamma Distributions


X is a gamma random variable with shape parameter 2 and scale parameter 3. Plot the
probability density function of X on the interval [0, 15]. Show the area corresponding to the
probability P [2 ≤ X ≤ 7]
# Define parameters
shape <- 2
scale <- 3
a <- 2
b <- 7

# Calculate P[2 <= X <= 7]


prob <- pgamma(q = 7, shape = shape, scale = scale) - pgamma(q = 2, shape = shape, scale = scale)
prob
[1] 0.5324553
# Plot the area
M <- data_frame(
x = seq(from = a, to = b, length = 1e3),
y = dgamma(x, shape = shape, scale = scale),
xp = seq(from = 0, to = 15, length = 1e3),
yp = dgamma(xp, shape = shape, scale = scale))

G <-
ggplot(
data = M)+
geom_area(
mapping = aes(x = x, y = y),
fill = 'orange',
alpha = 0.3) +
geom_line(
mapping = aes(x = xp, y = yp),
color = 'orange',
size = 1.2,
linetype = 'solid') +
geom_text(
mapping = aes(x = 4.50, y = 0.05),
label = paste('P[2 < X < 7] = ', round(prob, digits = 2)),
size = 5,
color = 'black',
family = 'serif',
fontface = 'italic',
angle = 0) +
labs(
title = 'A Gamma Density Curve',
subtitle = NULL,
x = 'x-values',
y = NULL) +


theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))

[Figure: A Gamma Density Curve over x-values 0 to 15, with the shaded region labeled P[2 < X < 7] = 0.53]


Plotting Gamma Densities Using ggplot2


Plot several gamma density curves together

M <-
expand.grid(
x = seq(from = 0, to = 10, length = 1e3),
shape = c(1, 2, 3),
scale = 2)

M %<>% mutate(y = dgamma(x, shape = shape, scale = scale))

M %>% head %>% kable


x shape scale y
0.0000000 1 2 0.5000000
0.0100100 1 2 0.4975037
0.0200200 1 2 0.4950200
0.0300300 1 2 0.4925486
0.0400400 1 2 0.4900895
0.0500501 1 2 0.4876428
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(shape)),
size = 1.3) +
scale_color_manual(
name = 'Shape Parameter',
breaks = c('1', '2', '3'),
values = c('darkred', 'steelblue', 'darkgreen'))+
labs(
title = 'Gamma Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.6')+
theme(
legend.position = c(0.85, 0.9),
legend.key.width = unit(1.2, unit = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: Gamma Density Functions for shape parameters 1, 2, 3, x-values 0 to 10 / Statistics I / Section 4.6]


4.7 χ2 Distributions

Introduction

A special case of the gamma distribution is the chi-square random variable.


This distribution is a gamma distribution with scale parameter β = 2 and shape
parameter ν/2, where ν ∈ Z+

The Density of a χ2 Random Variable

Let ν ∈ Z+. For a random variable X with a χ2 distribution with ν degrees of
freedom,

fX (x) = x^{ν/2−1} e^{−x/2} / (2^{ν/2} Γ(ν/2))    for 0 ≤ x
         0                                        otherwise

Properties of a χ2 Random Variable

E[X] = ν

V [X] = 2ν

mX (t) = (1 − 2t)−ν/2


Relationship to the Normal Distribution

For i ∈ {1, . . . , n} let Zi ∼ N(0, 1) be independent.

1. Zi^2 ∼ Γ(1/2, 2) is a chi-square random variable with df = 1.

2. S = Z1^2 + · · · + Zn^2 ∼ Γ(n/2, 2) is a chi-square random variable with df = n.

3. E[S] = n and V[S] = 2n
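
A minimal simulation sketch of facts 2 and 3, with an illustrative n = 5:

# Sums of n = 5 squared standard normals should behave like
# a chi-square random variable with df = 5
z <- matrix(rnorm(5 * 1e5), ncol = 5)
s <- rowSums(z ^ 2)
mean(s)  # close to n = 5
var(s)   # close to 2n = 10
quantile(s, probs = c(0.25, 0.50, 0.75))   # compare with qchisq
qchisq(p = c(0.25, 0.50, 0.75), df = 5)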


Using R for χ2 Distributions

density function dchisq(x, df, ncp = 0, log = FALSE)

CDF pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

quantiles qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

random numbers rchisq(n, df, ncp = 0)

Using R for χ2 Distributions


# X is a chi-square random variable with df = 6. Calculate P[X > 5]
pchisq(q = 5, df = 6, lower.tail = FALSE)
[1] 0.5438131
# X is a chi-square random variable with df = 5. Solve for the 70th percentile of X
qchisq(p = 0.70, df = 5)
[1] 6.06443

# X is a chi-square random variable with df = 10


# calculate P[6 < X < 7]
pchisq(q = 7, df = 10) - pchisq(q = 6, df = 10)
[1] 0.08981829


Using ggplot2 to plot the density function

M <-
expand.grid(
x = seq(from = 0, to = 10, by = 0.01),
df = c(2, 3, 4))

M %<>% mutate(y = dchisq(x = x, df = df))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(df)),
size = 1.2,
linetype = 'solid') +
scale_y_continuous(
limits = c(0, 0.75)) +
labs(
title = 'Chi-Squared Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.7',
color = 'Degrees of Freedom') +
theme(
legend.position = c(0.83, 0.87),
legend.key.width = unit(0.6, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: Chi-Squared Density Functions for 2, 3, 4 degrees of freedom, x-values 0 to 10 / Statistics I / Section 4.7]


4.8 Exponential Distributions

The Density Function of an Exponential Random Variable

The exponential distribution is determined by a single parameter. In this course we
use β for the parameter. An exponential distribution is a gamma distribution with
α = 1. The density function is

f(x) = (1/β) e^{−x/β}    for 0 < x
       0                 otherwise

Properties of Exponential Random Variables

FX (x) = 1 − e^{−x/β}

sX (x) = e^{−x/β}

P[a ≤ X ≤ b] = e^{−a/β} − e^{−b/β}

E[X] = β

V[X] = β^2

mX (t) = (1 − βt)^{−1}

The Memoryless Property of the Exponential Distribution

P [X > a + b | X > b] = P [X > a]


Using R for Exponential Distributions

density function dexp(x, rate = 1, log = FALSE)

CDF pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE)

Percentiles qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)

Random Numbers rexp(n, rate = 1)

Package Actuar: Functions for the Exponential Distribution

raw moments mexp(order, rate = 1) E[X k ]

limited expected value levexp(limit, rate = 1, order=1) E[min(x, d)k ]

MGF mgfexp(t, rate = 1, log = FALSE) E[etX ]


Using R for Exponential Distributions


# Find the 55th percentile of an exponential random variable that has mean 1
qexp(p = 0.55, rate = 1)
[1] 0.7985077

# Calculate the probability that an exponential random variable


# with mean 2 is greater than 3
pexp(q = 3, rate = 0.5, lower.tail = FALSE)
[1] 0.2231302

# Check using the survival function


exp(x = -3/2)

[1] 0.2231302


Using R for Exponential Distributions

require(actuar)
# Use the actuar package. X is exponential with mean 0.5, calculate E[X^5]
mexp(order = 5, rate = 2)
[1] 3.75
integrate(function(x){x ^ 5 * dexp(x, rate = 2)}, lower = 0, upper = Inf)$value

[1] 3.75
# Calculate E[min(X, 5)^2]
levexp(limit = 5, rate = 2, order = 2)
[1] 0.4997503
integrate(f = function(x){pmin(x, 5) ^ 2 * dexp(x, rate = 2)},
lower = 0,
upper = Inf)$value
[1] 0.4997503

# Calculate E[2^x]
mgfexp(log(2), rate = 2, log = FALSE)

[1] 1.530394
# Calculate E[2^x] using integration
integrate(
function(x){2 ^ x * dexp(x, rate = 2, log = FALSE)},
lower = 0,
upper = Inf)$value
[1] 1.530394


# Create a vector of 1000 random numbers from an exponential distribution
# with mean 5 and plot a relative frequency histogram

M <- data_frame(x = rexp(n = 1000, rate = 0.2))

G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'red',
fill = 'red',
alpha = 0.6,
breaks = seq(from = 0, to = 30, by = 1)) +
labs(
x = 'Random Exponential Numbers',
y = NULL,
title = 'Relative Frequency Histogram / Exponential Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: Relative Frequency Histogram / Exponential Numbers, x-values 0 to 30 / Statistics I / Section 4.8]


The Memoryless Property of the Exponential Distribution


# introduction -----
# this script provides a demonstration of the memoryless property
# of the exponential distribution

# select values for a and b -----


a <- 3
b <- 5

# define the density function -----


f <- function(x){dexp(x, rate = 1)}

# compute P[X > b]


prob_b <- integrate(f, lower = b, upper = Inf)$value

# define the conditional density function


f_cond <- function(x){(b < x) * f(x) / prob_b}

# compare the conditional and unconditional probabilities


integrate(f, lower = a, upper = Inf)$value

[1] 0.04978707
integrate(f_cond, lower = a + b, upper = Inf)$value
[1] 0.04978708


Line Plots for the Exponential Distribution


M <-
expand.grid(
x = seq(from = 0, to = 10, by = 0.001),
rate = c(1, 0.5, 0.25)) %>%
mutate(y = dexp(x = x, rate = rate))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(rate)),
size = 1.2) +
scale_color_manual(
name = 'rate',
values = c('darkred', 'darkgreen', 'darkorange'))+
labs(
title = 'Exponential Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
legend.position = c(0.85, 0.85),
axis.ticks = element_blank(),
text = element_text(
size = 14,
face = 'italic',
family = 'serif',
color = 'black'))


[Figure: Exponential Density Functions for rates 0.25, 0.5, 1, x-values 0 to 10 / Statistics I / Section 4.8]


4.9 Beta Distributions

The Density Function of a Beta Distribution

The beta distribution has two parameters α > 0 and β > 0. The density function is
defined on the closed interval [0, 1]. For a random variable X with beta distribution
and parameters α > 0 and β > 0
fX (x) = x^{α−1} (1 − x)^{β−1} / B(α, β)    for 0 ≤ x ≤ 1
         0                                  otherwise

B(α, β) = ∫_{0}^{1} x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β) / Γ(α + β)

Properties of a Beta Distribution

E[X] = α / (α + β)

V[X] = αβ / ((α + β)^2 (α + β + 1))

mode = (α − 1) / (α + β − 2)


Note about the Moment Generating Function

The moment generating function of a beta distribution does not exist in closed form.

Intervals other than [0,1]

To use the beta distribution on an interval [a, b] instead of [0, 1], substitute

y* = (y − a) / (b − a)    where a ≤ y ≤ b

The new variable y* is defined on the interval [0, 1].
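
A minimal simulation sketch of this rescaling; the interval [10, 20] and the Beta(2, 3) parameters below are illustrative:

# Rescale Beta(2, 3) numbers from [0, 1] onto [a, b] = [10, 20]
a <- 10
b <- 20
y_star <- rbeta(n = 1e5, shape1 = 2, shape2 = 3)
y <- a + (b - a) * y_star
mean(y)   # close to a + (b - a) * alpha / (alpha + beta) = 14
range(y)  # contained in [10, 20]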


Using R for Beta Distributions

density dbeta(x, shape1, shape2, ncp = 0, log = FALSE)

CDF pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE)

quantiles qbeta(p, shape1, shape2, ncp = 0, lower.tail = TRUE)

random numbers rbeta(n, shape1, shape2, ncp = 0)

Using R for Beta Distributions


* shape1 = α
* shape2 = β

# X is a beta random variable with alpha=4 and beta=2


# Calculate P[0.4 < X <= 0.7] using the CDF
pbeta(q = 0.7, shape1 = 4, shape2 = 2) - pbeta(q = 0.4, shape1 = 4, shape2 = 2)
[1] 0.44118
# Calculate P[0.4 < X <= 0.7] using integration of the density function
integrate(function(x){dbeta(x, shape1 = 4, shape2 = 2)}, lower = 0.4, upper = 0.7)$value

[1] 0.44118
# X is a beta random variable with alpha = 2 and beta = 6
# Find the first quartile of X
qbeta(p = 0.25, shape1 = 2, shape2 = 6)

[1] 0.137974


Using R for Beta Distributions


# Plot beta density functions for alpha = 3 and beta = 1, 2, 3

M <-
expand.grid(
x = seq(from = 0, to = 1, by = 0.01),
alpha = 3,
beta = c(1, 2, 3)) %>%
mutate(y = dbeta(x = x, shape1 = alpha, shape2 = beta))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, col = as.factor(beta)),
size = 1.2,
linetype = 'solid') +
scale_color_manual(
name = 'Beta',
values = topo.colors(3)) +
labs(
title = 'Beta Density Functions',
subtitle = NULL,
x = 'x-values',
y = 'Probability Density',
caption = 'Statistics I / Section 4.9') +
theme(
legend.position = c(0.3, 0.8),
legend.direction = 'horizontal',
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))


[Figure: Beta Density Functions for beta = 1, 2, 3, x-values 0 to 1, y-axis Probability Density / Statistics I / Section 4.9]


Using R for Beta Distributions


# X is a beta random variable with alpha = 3 and beta = 2
# Generate 1,000 random numbers from X and plot the histogram

M <- data_frame(x = rbeta(n = 1000, shape1 = 3, shape2 = 2))

G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'black',
fill = 'purple',
alpha = 0.2,
breaks = seq(from = 0, to = 1, length = 30)) +
geom_line(
mapping = aes(x = x, y = dbeta(x, shape1 = 3, shape2 = 2)),
color = 'purple',
size = 1.2,
linetype = 'solid') +
labs(
x = 'Random Variable',
y = NULL,
title = 'Relative Frequency Histogram / Beta Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.9') +
theme(

axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))


[Figure: Relative Frequency Histogram / Beta Numbers, x-values 0 to 1 with overlaid beta density / Statistics I / Section 4.9]


4.10 Normal Distributions

Introduction

The normal distribution is very important, and there are many basic facts about it
that you will want to know. Let X be a normal random variable with mean µ and
variance σ^2; we symbolize this by X ∼ N(µ, σ^2). There is no closed form for the
CDF; it is expressed as an integral.

The Density Function of a Normal Random Variable

fX (x) = (1 / (σ√(2π))) e^{−(x−µ)^2/(2σ^2)},    −∞ < x < ∞

The CDF of a Normal Random Variable

FX (x) = ∫_{−∞}^{x} (1 / (σ√(2π))) e^{−(t−µ)^2/(2σ^2)} dt


Properties of a Normal Random Variable

E[X] = µ

V[X] = σ^2

mode = µ

median = µ

mX (t) = e^{µt + σ^2 t^2 / 2}


Standard Normal Random Variables

If X ∼ N(0, 1) then X is a standard normal random variable. Z is often used to
denote a standard normal random variable. The CDF of a standard normal random
variable is denoted Φ(z).

fZ (z) = (1 / √(2π)) e^{−z^2/2},    −∞ < z < ∞

Φ(z) = ∫_{−∞}^{z} (1 / √(2π)) e^{−t^2/2} dt

E[Z] = 0

V[Z] = 1

mode = 0

median = 0

mZ (t) = e^{t^2/2}


Special Formulas for the Standard Normal Distribution

1. Φ(z) + Φ(−z) = 1

2. P [Z < a] = P [Z > −a]

3. P [−a < Z < a] = 2Φ(a) − 1

4. Φ−1 (a) = −Φ−1 (1 − a)
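
These identities are easy to check numerically with pnorm and qnorm; the value a = 1.3 below is an arbitrary choice:

a <- 1.3
pnorm(a) + pnorm(-a)                        # property 1: equals 1
c(pnorm(a), pnorm(-a, lower.tail = FALSE))  # property 2: the two probabilities agree
2 * pnorm(a) - 1                            # property 3 ...
pnorm(a) - pnorm(-a)                        # ... equals P[-a < Z < a]
c(qnorm(0.8), -qnorm(1 - 0.8))              # property 4: both equal the 0.8 quantile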

Sums of Normal Random Variables

Let X ∼ N(µX, σX^2) and Y ∼ N(µY, σY^2). Then W = X + Y is a normal random
variable. If X and Y are independent, then W ∼ N(µX + µY, σX^2 + σY^2).
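
A minimal simulation sketch with illustrative parameters:

# X ~ N(1, 4) and Y ~ N(-3, 1) independent, so X + Y should behave like N(-2, 5)
x <- rnorm(n = 1e5, mean = 1, sd = 2)
y <- rnorm(n = 1e5, mean = -3, sd = 1)
w <- x + y
mean(w)  # close to 1 + (-3) = -2
var(w)   # close to 2^2 + 1^2 = 5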

Standardized Random Variables and z-Scores

Let X be a random variable with mean µ and standard deviation σ. Define


X −µ
Z=
σ
Z is called the standardization of X and has mean 0 and standard deviation 1. Let
x be a value from a probability distribution with mean µ and standard deviation σ.
The z-score of x is defined as
z = (x − µ) / σ


Standardizing Normal Random Variables

1. Let X ∼ N (µ, σ 2 ) then X = µ + σZ where Z ∼ N (0, 1)

2. Let Z ∼ N (0, 1) then for constants µ and σ, µ + σZ ∼ N (µ, σ 2 )

Outliers, Extreme Values

Extreme values in a data set or distribution are often referred to as outliers. One
common definition classifies as outliers those data values whose z-scores are greater
than +3 or less than −3.


Using R for the Normal Distributions

density dnorm(x, mean = 0, sd = 1, log = F)

CDF pnorm(q, mean = 0, sd = 1, lower.tail = T, log.p = F)

quantiles qnorm(p, mean = 0, sd = 1, lower.tail = T, log.p = F)

random numbers rnorm(n, mean = 0, sd = 1)

Using R for the Normal Distributions


# X is a random variable with mean=2 and sd=1. Calculate P[X > 3]
pnorm(q = 3, mean = 2, sd = 1, lower.tail = FALSE)

[1] 0.1586553
# X is a random variable with mean = 5 and sd = 2. Calculate P[X < 6]
pnorm(q = 6, mean = 5, sd = 2, lower.tail = TRUE)
[1] 0.6914625
# Z is a standard normal random variable. Compute the 66th percentile of Z
qnorm(p = 0.66, mean = 0, sd = 1)

[1] 0.4124631
# mu = -3, sigma = 4. Compute the smallest number w such that P[X > w] < 0.33
qnorm(p = 0.33, mean = -3, sd = 4, lower.tail = FALSE)
[1] -1.240347


Using R for the Normal Distributions

# Plot normal densities for mean = 0 and sd = 1, 2


M <-
expand.grid(
x = seq(from = -5, to = +5, length = 1e3),
mu = 0,
sigma = c(1, 2))

M %>% head %>% kable


x mu sigma
-5.00000 0 1
-4.98999 0 1
-4.97998 0 1
-4.96997 0 1
-4.95996 0 1
-4.94995 0 1

M %<>% mutate(y = dnorm(x, mean = mu, sd = sigma))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(sigma)),
size = 1.3,
linetype = 'solid') +
scale_color_manual(
values = c('darkgreen', 'darkorange'),
name = 'Standard Deviation') +
labs(
title = 'Normal Density Functions',
caption = 'Statistics I / Section 4.10',
x = 'Normal Numbers',
y = NULL) +
theme(
legend.position = c(0.8, 0.9),
legend.key.width = unit(1, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))


[Figure: Normal Density Functions for standard deviations 1 and 2, x-axis Normal Numbers −5 to 5 / Statistics I / Section 4.10]



Chapter 5

Multivariate Probability Distributions

5.1 Introduction
In this chapter of the course, joint probability distributions are introduced. The first section
provides an example of using R and heatmaps to visualize a bivariate probability density.

library(tidyverse)

# parameters for an example bivariate continuous density function -----


u_x <- 0.0
s_x <- +0.5
u_y <- +1.0
s_y <- +0.5
r_xy <- -0.5

# Define the bivariate normal density


# (the exponent includes the 1 / (1 - rho^2) factor; see the formula in Section 5.9)
Q <- function(x, y){
(((x - u_x) / s_x) ^ 2 + ((y - u_y) / s_y) ^ 2 -
2 * r_xy * ((x - u_x) / s_x) * ((y - u_y) / s_y)) / (1 - r_xy ^ 2)}

f <- function(x,y){1 / (2 * pi * s_x * s_y * sqrt(1 - r_xy ^ 2)) * exp(-Q(x, y) / 2)}

# use expand.grid and mutate to create a grid of density values


M <-
expand.grid(
x = seq(from = -3, to = 3, by = 0.01),
y = seq(from = -3, to = 3, by = 0.01)) %>%
mutate(density = f(x, y))

G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = density)) +
labs(
title = 'Bivariate Normal Density Function, John Garza',
subtitle = NULL,
x = 'x-axis',
y = 'y-axis',
caption = 'Statistics I / Section 5.1')+
scale_fill_continuous(
name = 'density',
type = 'viridis') +
theme_classic() +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
legend.position = c(0.86, 0.2))


[Figure: Bivariate Normal Density Function, John Garza — heat map over the x-axis and y-axis, −2 to 2 / Statistics I / Section 5.1]


# Load the mvtnorm package


library(mvtnorm)
library(magrittr)

ux <- -2.1
uy <- +0.5
sx <- +1.3
sy <- +2.3
rho <- -0.7

# Define the mean vector


mu <- c(ux, uy)

# Define the variance - covariance matrix


sigma <-
cbind(
c(sx^2, rho * sx * sy),
c(rho * sx * sy, sy^2))

# Generate 500 multivariate normal samples


M <-
rmvnorm(
n = 500,
mean = mu,
sigma = sigma)

colnames(M) = c('x', 'y')


M %<>% as_data_frame()

# Create the 2d density plot


G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
stat_density_2d(
mapping = aes(fill = stat(level)),
geom = 'polygon',
position = 'identity',
size = 1.2,
contour = TRUE,
n = 100,
h = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) +
scale_fill_gradient(
name = 'Density',


low = 'yellow',
high = 'steelblue') +
geom_point(
size = 1,
alpha = 0.2) +
scale_y_continuous(
breaks = c(-4, 0, 4, 8),
labels = c(-4, '', 4, 8)) +
scale_x_continuous(
breaks = c(-6, -4, -2, 0, 2),
labels = c(-6, -4, '', 0, 2)) +
labs(
title = 'Random Bivariate Normal Numbers',
caption = 'Statistics I / Test #3',
x = 'x-axis',
y = 'y-axis') +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(vjust = +5),
axis.title.y = element_text(vjust = -3),
legend.position = c(0.09, 0.15),
text = element_text(
face = 'italic',
family = 'serif',
size = 16,
color = 'black'),
axis.ticks = element_blank())
G


[Figure: Random Bivariate Normal Numbers — 2d density contours over scatter points, x-axis and y-axis / Statistics I / Test #3]


5.2 Bivariate Probability Distributions

Joint Probability Functions

Let X and Y be discrete random variables. The joint probability function for X
and Y is defined as
p(x, y) = P (X = x, Y = y)

−∞ < x < +∞

−∞ < y < +∞

Properties of Joint Probability Functions

For discrete random variables X and Y with joint probability function p(x, y)

1. p(x, y) ≥ 0 for all x, y

2. Σ_{all x, y} p(x, y) = 1


Joint Distribution Functions

Let X and Y be random variables. The joint distribution function of X and Y is


defined as
F (x, y) = P (X ≤ x, Y ≤ y)

−∞ < x < +∞

−∞ < y < +∞

Properties of Joint Distribution Functions

Let X and Y be random variables with joint distribution function F (x, y). Then
1. F (−∞, −∞) = 0

2. F (−∞, y) = 0

3. F (x, −∞) = 0

4. F (+∞, +∞) = 1

5. For x∗ ≥ x and y ∗ ≥ y,

F (x∗ , y ∗ ) − F (x∗ , y) − F (x, y ∗ ) + F (x, y) ≥ 0


Joint Density Functions

Let X and Y be continuous random variables with joint distribution function F (x, y).
If there exists a non-negative function f (x, y) satisfying
F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du    for all x and y

Then X and Y are said to be jointly continuous and f (x, y) is called the joint
probability density function of X and Y .

Properties of Joint Density Functions

Let X and Y be jointly continuous random variables with density function f(x, y).

1. f(x, y) ≥ 0 for all (x, y)

2. ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dy dx = 1

3. P(a < X < b, c < Y < d) = ∫_{a}^{b} ∫_{c}^{d} f(x, y) dy dx


Multivariate Probability Distributions

Let n ∈ Z+. The idea of joint probability distributions can be generalized to joint
density functions of random variables X1, . . . , Xn. In the discrete case, the joint
probability function is defined as

p(x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn)

In the case of continuous random variables, the joint density function and joint
distribution function satisfy

P(X1 ≤ x1, . . . , Xn ≤ xn) = F(x1, . . . , xn) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} · · · ∫_{−∞}^{xn} f(t1, t2, . . . , tn) dtn · · · dt2 dt1


A Function for computing Iterated Integration


iterated_integral <- function(xl, xu, yl, yu, f, dx){
# computes the iterated integral of f over the region defined by
# xl < x < xu and yl < y < yu
# Args:
# xl: lower bound for x, as a function of y
# xu: upper bound for x, as a function of y
# yl: lower bound for y, as a function of x
# yu: upper bound for y, as a function of x
# dx: 1 means integrate x first (on the inside) and then y
# f : the integrand as a function of x and y
#
# Returns: the iterated integral of f over the region defined by xl, xu, yl, and yu
if (dx == 1){
integrate(function(y)
{sapply(y, function(y)
{integrate(function(x){f(x,y)}, lower = xl(y), upper = xu(y))$value})},
lower = yl(NULL), upper = yu(NULL))  # outer limits must be constants, so the argument is ignored
} else {
integrate(function(x)
{sapply(x, function(x)
{integrate(function(y){f(x,y)}, lower = yl(x), upper = yu(x))$value})},
lower = xl(NULL), upper = xu(NULL))}  # outer limits must be constants, so the argument is ignored
}

Example

A device runs until either of two components fails, at which point the device stops running.
The joint density function of the lifetimes of the two components, both measured in hours,
is
f(x, y) = (x + y)/8    for 0 < x < 2, 0 < y < 2
          0            otherwise

Calculate the probability that the device fails during its first hour of operation. Plot the
joint density function and also show the region corresponding to the probability that the
device fails during its first hour of operation.


Solution

# Define the joint density function


f <- function(x,y){(x + y) / 8 * (0 < x) * (0 < y) * (x < 2) * (y < 2)}

# Define the region of interest

r <- function(x,y){(x < 1 | y < 1) * (0 < x) * (0 < y)}

# Limits
xl <- function(y){0}
xu <- function(y){2}
yl <- function(x){0}
yu <- function(x){2}

# Compute the answer

prob <- iterated_integral(


xl = xl,
xu = xu,
yl = yl,
yu = yu,
f = function(x,y){r(x,y) * f(x,y)}, dx = 1)$value

prob

[1] 0.625


# Create an equally spaced grid over the support of f(x,y)


M <-
expand.grid(
x = seq(from = 0, to = 2, length = 100),
y = seq(from = 0, to = 2, length = 100)) %>%
mutate(z = f(x, y))

N <-
data_frame(
x = c(0, 0, 1, 1, 2, 2),
xend = c(0, 1, 1, 2, 2, 0),
y = c(0, 2, 2, 1, 1, 0),
yend = c(2, 2, 1, 1, 0, 0))

G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = z)) +
scale_x_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_y_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_fill_continuous(
name = 'Density',
type = 'viridis') +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'A Bivariate Probability Density Function, John Garza',
caption = 'Statistics I / Section 5.2') +
geom_segment(
data = N,
mapping = aes(x = x, xend = xend, y = y, yend = yend),
size = 0.4) +
geom_text(
x = 0.5,
y = 0.5,
label = 'Area',
size = 7,
fontface = 'italic',
family = 'serif') +
theme_classic() +
theme(
legend.position = c(0.86, 0.2),


legend.direction = 'vertical',
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'),
axis.line = element_blank(),
axis.title.x = element_text(
vjust = -0.8),
axis.title.y = element_text(
vjust = +0.4),
axis.ticks = element_blank())


[Figure: A Bivariate Probability Density Function, John Garza — heat map over (0, 2) × (0, 2) with the outlined region labeled "Area" / Statistics I / Section 5.2]


5.3 Marginal and Conditional Distributions

Marginal Probability Functions

Let X and Y be discrete random variables with joint probability function p(x, y).
The marginal probability functions of X and Y are defined as

pX (x) = Σ_{all y} p(x, y)

pY (y) = Σ_{all x} p(x, y)

Marginal Density Functions

Let X and Y be continuous random variables with joint density function f(x, y).
The marginal density functions of X and Y are

fX (x) = ∫_{−∞}^{+∞} f(x, y) dy

fY (y) = ∫_{−∞}^{+∞} f(x, y) dx


Conditional Probability Function

Let X and Y be discrete random variables with joint probability function p(x, y).
The conditional probability functions are defined as

p(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = p(x, y) / pY (y)

p(y | x) = P(Y = y | X = x) = P(X = x, Y = y) / P(X = x) = p(x, y) / pX (x)

Note:

p(x|y) is defined only if pY (y) > 0.


Conditional Distribution Functions

Let X and Y be jointly distributed random variables. The conditional distribution


function of X given Y = y is defined as

F (x | y) = P (X ≤ x | Y = y)

Conditional Density Functions

Let X and Y be continuous random variables with joint density function f(x, y).
The conditional density functions are defined as

f(x | y) = f(x, y) / fY (y)

f(y | x) = f(x, y) / fX (x)

Note:

f (x | y) is defined only if fY (y) > 0
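
As a sketch, marginal and conditional densities can be computed numerically with integrate(). The example below reuses the joint density f(x, y) = (x + y)/8 on (0, 2) × (0, 2) from Section 5.2.

# Define the joint density from Section 5.2
f <- function(x, y){(x + y) / 8 * (0 < x & x < 2) * (0 < y & y < 2)}

# Marginal density of X at a point: fX(x) is the integral of f(x, y) dy
f_X <- function(x){integrate(function(y){f(x, y)}, lower = 0, upper = 2)$value}
f_X(1)  # analytically fX(x) = (x + 1)/4, so fX(1) = 0.5
[1] 0.5

# The conditional density of Y given X = 1 integrates to one
f_cond <- function(y){f(1, y) / f_X(1)}
integrate(f_cond, lower = 0, upper = 2)$value
[1] 1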


5.4 Independent Random Variables

Introduction

Two events A and B are independent if P(A ∩ B) = P(A) × P(B). If two random
variables X and Y are independent of each other, we would like to have

P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) × P(c ≤ Y ≤ d)

In this section we describe independent random variables in terms of their joint
probability, density, and distribution functions.

Independent Random Variables

Let X and Y be random variables with joint distribution function F (x, y). Then X
and Y are independent if and only if
F (x, y) = FX (x) × FY (y) for all real numbers x, y

Dependent Random Variables

If X and Y are not independent, they are said to be dependent random variables.

Independent Jointly Discrete Random Variables

Let X and Y be discrete random variables with joint probability function p(x, y)
and marginal probability functions pX (x) and pY (y). X and Y are independent if
and only if
p(x, y) = pX (x) × pY (y)
for all real numbers x and y


Independent Jointly Continuous Random Variables

Let X and Y be continuous random variables with joint density function f (x, y) and
marginal density functions fX (x) and fY (y). Then X and Y are independent if and
only if
f (x, y) = fX (x) × fY (y)
for all real numbers x and y.

Theorem About Independence and Rectangular Regions

Let X and Y be continuous random variables with joint density function f (x, y).
Suppose that f (x, y) > 0 on the rectangular region defined by a ≤ x ≤ b and c ≤
y ≤ d and that f (x, y) = 0 otherwise. Then X and Y are independent random
variables if and only if there exist functions h(x) and g(y) such that
f (x, y) = h(x) × g(y)
where h(x) is a function of x only and g(y) is a function of y only.


Independent Random Variables and Conditional Density Functions

Let X and Y be continuous random variables with joint density function
f(x, y), conditional density functions f(x|y), f(y|x), and marginal density functions
fX(x), fY(y). It can be shown that X and Y are independent if and only if either
of the following two equations holds:

f(x|y) = fX(x) for all y such that fY(y) > 0

f(y|x) = fY(y) for all x such that fX(x) > 0

Independence of Many Random Variables

Let X1 , X2 , . . . , Xn be random variables with joint distribution function


F (x1 , x2 , . . . , xn ) and marginal distribution functions F1 (x1 ), . . . , Fn (xn ). The vari-
ables are independent if and only if
F (x1 , x2 , . . . , xn ) = F1 (x1 ) × · · · × Fn (xn )


Independence and Covariance

The following facts relate independence and covariance

• If X and Y are independent then Cov(X, Y ) = 0

• Cov(X, Y ) = 0 does not imply that X and Y are independent.
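
A minimal simulation sketch of the second point: X uniform on (−1, 1) and Y = X^2 are clearly dependent, yet their covariance is (approximately) zero because X is symmetric about 0.

x <- runif(n = 1e6, min = -1, max = 1)
y <- x ^ 2   # y is a function of x, so X and Y are dependent
cov(x, y)    # near 0
cor(x, y)    # near 0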


5.5 The Expected Value of a Function of Random Variables

The Expected Value of a Function of Many Discrete Random Variables

Let X1, X2, . . . , Xn be discrete random variables with joint probability function
p(x1, x2, . . . , xn) and let h(x1, x2, . . . , xn) be a function of X1, X2, . . . , Xn. The
expected value of h(X1, X2, . . . , Xn) is

E[h(X1, X2, . . . , Xn)] = Σ_{all x1} · · · Σ_{all xn} h(x1, . . . , xn) p(x1, . . . , xn)

The Expected Value of a Function of Many Continuous Random Variables

Let X1, X2, . . . , Xn be continuous random variables with joint density function
f(x1, x2, . . . , xn) and let h(x1, x2, . . . , xn) be a function of X1, X2, . . . , Xn. The
expected value of h(X1, X2, . . . , Xn) is

E[h(X1, X2, . . . , Xn)] = ∫_{−∞}^{+∞} · · · ∫_{−∞}^{+∞} h(x1, . . . , xn) f(x1, . . . , xn) dxn · · · dx1


Using R to Compute an Expected Value

Let X and Y be jointly continuous random variables with joint density function

f(x, y) = 2e^{−x−2y}    for x > 0, y > 0
          0             otherwise

Plot the joint density function as a heat map

Use R to calculate E[X + Y ].

E[X + Y] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} (x + y) f(x, y) dy dx


# Define the iterated integration function


iterated_integral <- function(xl, xu, yl, yu, f, dx){
# computes the iterated integral of f over the region defined by
# xl < x < xu and yl < y < yu
# Args:
# xl: lower bound for x, as a function of y
# xu: upper bound for x, as a function of y
# yl: lower bound for y, as a function of x
# yu: upper bound for y, as a function of x
# dx: 1 means integrate x first (on the inside) and then y
# f : the integrand as a function of x and y
#
# Returns: the iterated integral of f over the region defined by xl, xu, yl, and yu
if (dx == 1){
integrate(function(y)
{sapply(y, function(y)
{integrate(function(x){f(x,y)}, lower = xl(y), upper = xu(y))$value})},
lower = yl(NULL), upper = yu(NULL))  # outer limits must be constants, so the argument is ignored
} else {
integrate(function(x)
{sapply(x, function(x)
{integrate(function(y){f(x,y)}, lower = yl(x), upper = yu(x))$value})},
lower = xl(NULL), upper = xu(NULL))}  # outer limits must be constants, so the argument is ignored
}

# Define the density function

f <- function(x, y){(0 < x) * (0 < y) * 2 * exp(-x) * exp(-2 * y)}

# Limits of integration
xl <- function(y){0}
xu <- function(y){Inf}
yl <- function(x){0}
yu <- function(x){Inf}

# compute the expected value, E[X + Y]


ex <- iterated_integral(xl, xu, yl, yu, function(x,y){(x + y) * f(x,y)}, dx = 2)$value
ex

[1] 1.5


# Create an equally spaced grid


M <- expand.grid(
x = seq(from = 0, to = 1, length = 300),
y = seq(from = 0, to = 1, length = 300)) %>%
mutate(z = f(x, y))

# Create the heat map


G <-
ggplot(
data = M,
mapping = aes(x = x, y = y, fill = z))+
geom_tile() +
scale_x_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00')) +
scale_y_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00')) +
scale_fill_continuous(
name = 'Probability Density',
low = 'white',
high = 'darkgreen') +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'A Bivariate Density Function',
caption = 'Statistics I / Section 5.5') +
theme_classic() +
theme(
legend.position = c(0.80, 0.82),
axis.line = element_blank(),
axis.title.x = element_text(vjust = +8),
axis.title.y = element_text(vjust = -8),
text = element_text(
size = 16,
color = 'black',
family = 'serif',


face = 'italic'),
axis.ticks = element_blank())

[Figure: A Bivariate Density Function — heat map over the unit square, x-axis and y-axis 0 to 1 / Statistics I / Section 5.5]


Properties of Expected Value

1. For a constant c ∈ R,

E[c] = c

2. Let g(X, Y ) be a function of the random variables X and Y . For c ∈ R

E[cg(X, Y )] = cE[g(X, Y )]

3. Let g1 (X, Y ), . . . , gn (X, Y ) be functions of the random variables X and Y . Then

E[g1 (X, Y ) + · · · + gn (X, Y )] = E[g1 (X, Y )] + · · · + E[gn (X, Y )]

4. Let g(X) and h(Y ) be functions of the independent random variables X and Y .

E[g(X)h(Y )] = E[g(X)] × E[h(Y )]
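
A minimal simulation sketch of property 4, with illustrative choices g(x) = x^2 and h(y) = √y for independent X ∼ U(0, 1) and Y exponential with mean 1:

x <- runif(n = 1e6)
y <- rexp(n = 1e6)
mean(x ^ 2 * sqrt(y))        # E[g(X)h(Y)]
mean(x ^ 2) * mean(sqrt(y))  # E[g(X)] * E[h(Y)], approximately equal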


5.6 Covariance and Correlation

Introduction

Covariance and correlation are very important topics and will play an important role
in your future studies. The definitions and equations presented in this section should
be learned very carefully. Additionally, there are many properties of covariance and
correlation to know and understand.

The Covariance of Two Random Variables

Let X and Y be random variables with means µX and µY . The covariance of X and
Y is
Cov(X, Y ) = E[(X − µX )(Y − µY )]

Uncorrelated Random Variables

Two random variables X and Y are said to be uncorrelated if Cov(X, Y ) = 0

Correlation Coefficients, ρ

Let X and Y be random variables. The correlation coefficient of X and Y is

ρXY = Cov(X, Y) / (σX σY)


Properties of Covariance and Correlation

Let X and Y be random variables and let a ∈ R be a constant.

0. Cov(X, Y ) = E[(X − µX )(Y − µY )]

1. Cov(X, Y ) = E[XY ] − E[X]E[Y ]

2. Cov(X, Y ) = Cov(Y, X)

3. Cov(X, X) = V ar(X)

4. Cov(X + Y, W ) = Cov(X, W ) + Cov(Y, W )

5. Cov(aX, Y ) = a × Cov(X, Y )

6. Cov(X, Y + a) = Cov(X, Y )

7. Cov(X, Y ) = ρXY σX σY

8. −1 ≤ ρXY ≤ +1

9. |Cov(X, Y )| ≤ σX σY

10. If X and Y are independent then Cov(X, Y ) = 0

11. Cov(X, Y ) = 0 does not mean that X and Y are independent.


Calculating the Covariance of Jointly Distributed Random Variables

Let X and Y be continuous random variables with joint density function

f(x, y) = (8/3)xy    for 0 ≤ x ≤ 1, x ≤ y ≤ 2x
          0          otherwise

Calculate the covariance of X and Y.


# Calculating covariance using R

source('iterated_integral.R')

# Define the density function


f <- function(x, y){(0 < x & x < 1) * (x <= y & y <= 2 * x) * (8 / 3) * x * y }

# Limits of integration
xl <- function(y){0}
xu <- function(y){1}
yl <- function(x){x}
yu <- function(x){2 * x}

# Calculate the expected value of XY, X, and Y


E_XY <- iterated_integral(xl, xu, yl, yu, function(x, y){x * y * f(x, y)}, dx = 2)$value
E_X <- iterated_integral(xl, xu, yl, yu, function(x, y){x * f(x, y)}, dx = 2)$value
E_Y <- iterated_integral(xl, xu, yl, yu, function(x, y){y * f(x, y)}, dx = 2)$value

# Calculate Cov(X, Y)
Cov <- E_XY - E_X * E_Y
Cov
[1] 0.04148148


R Functions for Unbiased Sample Variance, Covariance, and Correlation

variance var(x, y = NULL, na.rm = FALSE, use)


standard deviation sd(x, na.rm = FALSE)
covariance cov(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
correlation cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

# Standard Deviation, sd
st_dev <- function(x){sqrt(sum((x - mean(x)) ^ 2) / (length(x) - 1))}
rx <- runif(n = 1e3, min = -5, max = +5)
st_dev(rx)
[1] 2.820544
sd(rx)
[1] 2.820544
# Variance, var
var_x <- function(x){sum((x - mean(x)) ^ 2) / (length(x) - 1)}
rx <- rgamma(n = 1e4, scale = 2, shape = 2)
var_x(rx)
[1] 7.922016
var(rx)
[1] 7.922016
# Covariance, cov
cov_xy <- function(x,y){sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)}
rx <- rexp(n = 1e2, rate = 3)
ry <- rbinom(n = 1e2, size = 10, prob = 0.5)
cov_xy(rx,ry)
[1] 0.08773042
cov(rx,ry)
[1] 0.08773042
# Correlation, cor
cor_xy <- function(x,y){cov_xy(x,y) / (st_dev(x) * st_dev(y))}
rx <- rgamma(n = 1e4, shape = 2, scale = 3)
ry <- rbeta(n = 1e4, shape1 = 2, shape2 = 3)
cor_xy(rx, ry)
[1] 0.002670185
cor(rx, ry)
[1] 0.002670185


Example:
Analyze the Harman74.cor dataset by generating a heat map.

# Clear the environment


rm(list = ls())

# Load the data


test <- datasets::Harman74.cor$cov %>% as_data_frame()
test$test2 <- names(test)

# Gather the data and add column of test names


test <- test %>% gather(key = test1, value = covariance, -test2)

# View the first few rows of test


head(test, n = 4)
# A tibble: 4 x 3
test2 test1 covariance
<chr> <chr> <dbl>
1 VisualPerception VisualPerception 1
2 Cubes VisualPerception 0.318
3 PaperFormBoard VisualPerception 0.403
4 Flags VisualPerception 0.468

# Create the heat map


G <-
ggplot() +
geom_tile(
data = test,
mapping = aes(x = test1, y = test2, fill = covariance)) +
scale_fill_gradient2(
name = NULL,
low = 'white',
mid = 'pink',
high = 'darkred')+
labs(
x = NULL,
y = NULL,
title = 'Test Covariance') +
theme_classic() +
theme(
legend.position = 'right',
legend.direction = 'vertical',
axis.ticks = element_blank(),
axis.line = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',


face = 'italic'),
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

[Figure: Test Covariance — heat map of the Harman74.cor test correlations, with test names on both axes and a gradient legend from 0.00 to 1.00]


5.7 Linear Combinations of Random Variables

Introduction

Linear functions of random variables will appear often in your studies. It is
important to know how expected value, variance, and covariance behave under
linear combinations of random variables. You may want to derive these identities,
but we will not include the derivations here. The textbooks contain them; they rely
on the elementary properties of expected value.

Notation

1. X1 , . . . , Xn are random variables

2. a1 , . . . , an are constants

3. U = a1 X1 + · · · + an Xn

4. Y1 , . . . , Ym are random variables

5. b1 , . . . , bm are constants

6. W = b1 Y1 + · · · + bm Ym


Properties of Linear Combinations of Random Variables

1. E[U] = E[a1X1 + · · · + anXn]

        = a1E[X1] + · · · + anE[Xn]

        = Σ_{i=1}^{n} ai E[Xi]

2. V[U] = V[a1X1 + · · · + anXn]

        = Σ_{i=1}^{n} ai^2 V[Xi] + 2 Σ_{i<j} ai aj Cov(Xi, Xj)

3. Cov(U, W) = Cov(a1X1 + · · · + anXn, b1Y1 + · · · + bmYm)

             = Σ_{i=1}^{n} Σ_{j=1}^{m} ai bj Cov(Xi, Yj)


Simple Linear Combinations

The most common case of a linear combination consists of two variables and two
constants. It is a good idea to remember the formulas for this situation. Let X, Y,
and W be random variables and let a, b, and c be constants. Then

1. E[aX + bY ] = aE[X] + bE[Y ]


2. E[aX] = aE[X]
3. E[a] = a
4. E[aX + b] = aE[X] + b

5. V [aX + bY ] = a2 V [X] + 2abCov(X, Y ) + b2 V [Y ]


6. V [X + c] = V [X]
7. V [aX] = a2 V [X]
8. V [aX + c] = a2 V [X]
9. V [c] = 0

10. Cov(aX + bY, cW ) = acCov(X, W ) + bcCov(Y, W )


11. Cov(X, c) = 0
12. Cov(X, X) = V [X]
13. Cov(X, Y ) = Cov(Y, X)
14. Cov(aX, bY ) = abCov(X, Y )
15. Cov(X, Y + c) = Cov(X, Y )


Exploring the Variance - Covariance Identities with R

# Generate two random vectors

X <- runif(n = 1e4, min = -4, max = 3)


Y <- runif(n = 1e4, min = -2, max = 10)

# Define coefficients a and b


a <- +2
b <- -3

# Compute V[2X - 3Y] using the identity V[aX+bY] = a^2V[X] + 2abCov(X,Y) + b^2V[Y]
a ^ 2 * var(X) + 2 * a * b * cov(X,Y) + b ^ 2 * var(Y)
[1] 123.0317

# Compute the variance directly


var(a * X + b * Y)

[1] 123.0317


Variance and Independence

Let X1, X2, . . . , Xn be independent random variables. Then

V[X1 + X2 + · · · + Xn] = V[X1] + V[X2] + · · · + V[Xn]

5.8 Multinomial Distributions

Multinomial Experiments

A multinomial experiment is defined by

1. n identical trials

2. the n trials are independent

3. each trial results in exactly one of k possible outcomes

4. for i = 1, . . . , k the probability of outcome i is denoted pi

5. for i = 1, . . . , k, Xi is the number of trials that resulted in outcome i

Notes

• p1 + · · · + pk = 1

• X1 + · · · + Xk = n


Multinomial Distributions

Let p1 , . . . , pk be constants satisfying

1. Σ_{i=1}^k pi = 1

2. pi > 0 for i = 1, . . . , k

Then X1 , . . . , Xk have a multinomial probability distribution if

p(x1 , . . . , xk ) = ( n! / (x1 ! · · · xk !) ) p1^{x1} · · · pk^{xk}

where

1. xi ∈ {0, . . . , n} for i ∈ {1, . . . , k}

2. Σ_{i=1}^k xi = n

Properties of Multinomial Distributions

1. E[Xi ] = npi

2. V [Xi ] = npi qi where qi = 1 − pi

3. Cov(Xi , Xj ) = −npi pj for i ≠ j


Using R for Multinomial Distributions

probability function dmultinom(x, size = NULL, prob, log = FALSE)

random multinomial vectors rmultinom(n, size, prob)

Using R for Multinomial Distributions

# For a multinomial distribution with p_1 = 0.2, p_2 = 0.5, p_3 = 0.3 and k = 3
# Generate 1,000,000 random vectors from the distribution.
# Compute cov(X_1, X_2)
# Compare the result to the analytic formula cov(X_1, X_2) = -n * p_1 * p_2

M <- rmultinom(n = 1e6, size = 10, prob = c(0.2, 0.5, 0.3))

cov(M[1, ], M[2, ])
[1] -1.002118
# using the formula cov(X_i, X_j) = -n * p_i * p_j

-10 * 0.2 * 0.5


[1] -1


Using R for Multinomial Distributions


The ages of auto policyholders are distributed according to the following table

Age Group Proportion


18-24 0.22
25-34 0.19
35-44 0.20
45-64 0.23
64-up 0.16

If five auto policyholders are randomly selected for a study, what is the probability that
one policyholder is selected from each age group?

dmultinom(c(1, 1, 1, 1, 1), size = 5, prob = c(0.22, 0.19, 0.20, 0.23, 0.16))


[1] 0.03691776


5.9 Bi-variate Normal Distributions

The Bi-variate Normal Distribution

The random variables X and Y have a bi-variate normal distribution if the joint
density function of X and Y is

f (x, y) = e^{−Q/2} / ( 2π σX σY √(1 − ρ²) )    where Q is defined below

Q = ( 1/(1 − ρ²) ) [ (x − µX)²/σX² − 2ρ (x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ]

ρ = Cov(X, Y ) / ( σX σY )


Properties of the Bi-variate Normal Distribution

1. X and Y are independent if and only if ρXY = 0

2. E[X] = µX

3. V [X] = σX²

4. E[Y ] = µY

5. V [Y ] = σY²

6. X|Y = y ∼ N( µX + ρ (σX/σY)(y − µY ), (1 − ρ²) σX² )

7. Y |X = x ∼ N( µY + ρ (σY/σX)(x − µX ), (1 − ρ²) σY² )
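Checking Property 6 with R

As a simulation sketch (the parameter values below are illustrative assumptions, not from the textbook), property 6 can be checked with the mvtnorm package. Conditioning on Y = y exactly has probability zero, so we keep the draws with Y in a narrow band around y.

library(mvtnorm)

# Illustrative parameters
ux <- 0; uy <- 0; sx <- 1; sy <- 2; rho <- -0.4
sigma <- matrix(c(sx ^ 2, rho * sx * sy, rho * sx * sy, sy ^ 2), nrow = 2)

# Simulate and keep the draws with Y near y = 1
M <- rmvnorm(n = 1e6, mean = c(ux, uy), sigma = sigma)
x_given_y <- M[abs(M[, 2] - 1) < 0.05, 1]

# Property 6 gives mean ux + rho*(sx/sy)*(1 - uy) = -0.2
mean(x_given_y)

# Property 6 gives variance (1 - rho^2)*sx^2 = 0.84
var(x_given_y)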

Multivariate Normal Distributions

There is a generalization of the bi-variate normal distribution to more than two
variables. We call these distributions multivariate normal distributions. They are
more complicated, are described in terms of matrices, and we will not work with
them in this class.


Using R for Multivariate Normal Distributions


The mvtnorm package has built-in functions for computing multivariate normal probabilities.
As an example, consider two jointly continuous random variables X and Y that have a bi-variate
normal distribution with the following parameters.

µX = −1
σX = 1
µY = −1

σY = 2
Cov(X, Y ) = −0.30

Use the mvtnorm library to calculate P [−4 ≤ X ≤ −1, −1 ≤ Y ≤ 0]

# Load the package


library(mvtnorm)

# Define the vector of means


mean <- c(-1, -1)

# Define the variance-covariance matrix


sigma <- diag(c(1, 2))
sigma[lower.tri(sigma)] <- -0.30
sigma[upper.tri(sigma)] <- -0.30
sigma
[,1] [,2]
[1,] 1.0 -0.3
[2,] -0.3 2.0
# Define the upper and lower values
lower <- c(-4, -1)
upper <- c(-1, 0)

# Compute the probability


prob <- pmvnorm(lower = lower, upper = upper, mean = mean, sigma = sigma)
prob[[1]]
[1] 0.1373936


Graphing a Bi-variate Normal Density using persp()

# Define the distribution parameters


ux <- -1 # mean of X
sx <- +1 # sd of X
uy <- -1 # mean of Y
sy <- +sqrt(2) # sd of Y
sxy <- -0.5 # covariance of X and Y
rho <- (sxy) / (sy * sx) # correlation of X and Y

# Define the joint density function


Q <- function(x,y){(1 - rho ^ 2) ^ {-1} * (((y - uy) / sy) ^ 2 -
2 * rho * ((x - ux) / sx) * ((y - uy) / sy) + ((x - ux) / sx) ^ 2)}

f <- function(x,y){exp(-Q(x, y) / 2) / (2 * pi * sx * sy * sqrt(1 - rho ^ 2))}

# Plotting the Density Function


x <- seq(from = ux - 3 * sx, to = ux + 3 * sx, length = 5e1)
y <- seq(from = uy - 3 * sy, to = uy + 3 * sy, length = 5e1)
z <- outer(x, y, FUN = 'f')

persp(x, y, z,
main = paste('Bivariate Normal Density '),
col = 'lightblue',
theta = 30,
phi = 20,
r = 50,
d = 0.1,
expand = 0.5,
ltheta = 90,
lphi = 180,
shade = 0.75,
ticktype = 'simple',
border = FALSE,
zlab = '',
xlab = '',
ylab = '',
box = FALSE)

[Figure: persp() surface plot titled 'Bivariate Normal Density']

Bi-variate Normal Probabilities Using Iterated Integration

Use iterated integration and the function defined in the previous example to
calculate P [−4 ≤ X ≤ −1, −1 ≤ Y ≤ 0]

# Define the distribution parameters


ux <- -1 # mean of X
sx <- +1 # sd of X
uy <- -1 # mean of Y
sy <- +sqrt(2) # sd of Y
sxy <- -0.3 # covariance of X and Y
rho <- (sxy) / (sy * sx) # correlation of X and Y

# Source files
source('bivariate.R')
source('iterated_integral.R')

# Define the limits of integration


xl <- function(y){-4}
xu <- function(y){-1}
yl <- function(x){-1}
yu <- function(x){0}

# Calculate the probability


iterated_integral(xl, xu, yl, yu, f, dx = 1)$value
[1] 0.1373936

This is the same answer we got when we used the mvtnorm package.


5.10 Conditional Expectations


5.10.1 Conceptual Formulas

Definition of Conditional Expected Value

Let X and Y be random variables and let g(X) be a function of X. The conditional
expectation of g(X) given Y = y is defined as

1. E[g(X)|Y = y] = ∫_{−∞}^{+∞} g(x) f (x|y) dx    for X and Y jointly continuous.

2. E[g(X)|Y = y] = Σ_{all x} g(x) p(x|y)    for X and Y jointly discrete.

Conditioning Formulas

The following formulas are sometimes called the conditioning formulas. The proofs
are not too hard and rely on the identity f (x, y) = f (x|y) fY (y); they are contained
in the textbook.

1. E[X] = EY [ EX [X|Y ] ]

2. V [X] = EY [ VX [X|Y ] ] + VY [ EX [X|Y ] ]
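As a quick check of the conditioning formulas (an illustrative sketch, not from the textbook), let N ∼ Poisson(4) and let X|N = n be Binomial(n, 0.3). Then E[X] = 0.3 E[N] = 1.2, and V [X] = E[N(0.3)(0.7)] + V [0.3 N] = 0.84 + 0.36 = 1.2.

# Simulate X by first drawing N, then drawing X | N
N <- rpois(1e6, lambda = 4)
X <- rbinom(1e6, size = N, prob = 0.3)

# Both sample moments should be near 1.2
mean(X)
var(X)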


Conditional Variance

Let X and Y be random variables. The conditional variance of X given Y = y is

V [X|Y = y] = E[X²|Y = y] − ( E[X|Y = y] )²

SOA Exam P # 203 (Conditional Variance)

A machine has two components and fails when both components fail. The number of years
from now until the first component fails, X, and the number of years from now until the
machine fails, Y , are random variables with joint density function

f (x, y) = (1/18) e^{−(x+y)/6} for 0 < x < y, and 0 otherwise

Calculate V (Y |X = 2).


# Define the joint density function


f <- function(x,y){(0 < x) * (x < y) * (1 / 18) * exp(-(x + y) / 6)}

# Define the marginal density function of x

f_x <- function(x){integrate(function(y){f(x, y)}, lower = x, upper = Inf)$value}

# Define the conditional density function


f_cond <- function(y){f(x = 2, y) / f_x(2)}

# Calculate E[Y^2|X=2]
E_Y2 <- integrate(function(y){y ^ 2 * f_cond(y)}, lower = 2, upper = Inf)$value

# Calculate E[Y|X=2]
E_Y1 <- integrate(function(y){y ^ 1 * f_cond(y)}, lower = 2, upper = Inf)$value

# Calculate V[Y|X=2]
V_Y <- E_Y2 - (E_Y1)^2
V_Y
[1] 36


5.10.2 Conditional Expectations from Excel Data

The first part of this section described conceptual formulas for theoretical models. In
practice you may instead find yourself working with data presented in an Excel
spreadsheet. Here we will import bivariate data from Excel into R, plot a
bivariate histogram, and then compute a conditional expected value based on the
spreadsheet data.
# Load the packages used below
library(readxl)
library(ggplot2)
library(dplyr)

# Import the data from excel


M <- read_excel('C:/Users/100145123/Desktop/ACTS 131/New Notes/data.xls')

# View the first few rows of data


head(M)
# A tibble: 6 x 2
X Y
<dbl> <dbl>
1 1.04 1.53
2 1.61 0.941
3 -0.858 4.24
4 1.16 3.01
5 -0.762 -0.0286
6 1.92 0.703
# Create bivariate histogram of this data
G <-
ggplot(
data = M)+
geom_hex(
mapping = aes(x = X, y = Y, fill = ..count..),
color = 'black',
size = 0.5)+
scale_fill_gradient(
name = 'Count',
low = 'white',
high = 'red')+
labs(


title = 'A Two Dimensional Histogram',


x = 'x-axis',
y = 'y-axis',
caption = 'Statistics') +
theme_bw() +
theme(
legend.position = c(0.1, 0.16),
axis.title.x = element_text(vjust = +7),
axis.title.y = element_text(vjust = -7, hjust = 0.45),
text = element_text(
face = 'italic',
size = 16,
family = 'serif',
color = 'black'),
axis.ticks = element_blank())

[Figure: two-dimensional hex-bin histogram of the imported data, titled 'A Two Dimensional Histogram']

Example:

Calculate E[sin(X) | Y > 1] and V[sin(Y ) | 1 ≤ X ≤ 3]

M %>%
filter(Y > 1) %>%
mutate(sin_x = sin(X)) %>%
summarize(ex_conditional = mean(sin_x))
# A tibble: 1 x 1
ex_conditional
<dbl>
1 0.557
M %>%
filter(between(x = X, left = 1, right = 3)) %>%
mutate(sin_y = sin(Y)) %>%
summarize(variance = var(sin_y))
# A tibble: 1 x 1
variance
<dbl>
1 0.367


5.10.3 Categorical Data


Import the Excel spreadsheet 'categorical data.xls' into R and create a heat map showing
the relationship between three categorical variables.
# Load the readxl package
library(readxl)

# Import data from excel spreadsheet


M <-
read_excel('C:/Users/john/Desktop/ACTS 131/UTPB/excel data/categorical data.xls')

# Review M
head(M)
# A tibble: 6 x 3
stores products status
<chr> <chr> <chr>
1 Store - A Product - A high
2 Store - A Product - B very high
3 Store - A Product - C high
4 Store - A Product - D high
5 Store - A Product - E high
6 Store - A Product - F very low
unique(M$status)
[1] "high" "very high" "very low" "medium" "low"
# Set levels for status
M$status <- factor(M$status, levels = c('very high', 'high', 'medium', 'low', 'very low'))

# Create the tile map


G <-
ggplot(
data = M,
mapping = aes(x = stores, y = products)) +
geom_tile(
mapping = aes(fill = status),
color = 'black') +
scale_fill_manual(
name = 'Danger Level',
values = rainbow(5)) +
theme_classic() +
labs(
title = 'Danger Levels',
x = NULL,
y = NULL,
caption = NULL)+
theme(


axis.line = element_blank(),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

[Figure: tile map titled 'Danger Levels' with stores on the x-axis, products on the y-axis, and fill legend 'Danger Level']


Chapter 6

Functions of Random Variables

6.1 Introduction

This chapter deals with functions of random variables. Topics include


• transformations of a single random variable
• the transformation formula
• moment-generating functions
• Jacobians
• order statistics

“Random Sample from a Population with Density f (x)”

The book assumes that population sizes are much larger than sample sizes and that
random variables obtained through random sampling are independent and identically
distributed. As a result, if X1 , . . . , Xn is a random sample from a distribution with
density function f (x), then the joint density function will be

f (x1 , x2 , . . . , xn ) = f (x1 )f (x2 ) · · · f (xn )

If Y1 , Y2 , . . . , Yn is a random sample from a discrete distribution with probability
function p(y), then the joint probability function will be

p(y1 , y2 , . . . , yn ) = p(y1 )p(y2 ) · · · p(yn )

6.2 Probability Distributions for Functions of Random Variables

Introduction

Let U (X1 , X2 , . . . , Xn ) be a function of the random variables X1 , X2 , . . . , Xn and let


FU (u) be the cumulative distribution function of U . This chapter describes three
methods of determining FU (u).

The Distribution of a Single Function of a Random Variable

1. Method of Distribution Functions: Find FU (u) directly by using the definition
of U and FU (u) = P [U ≤ u], then obtain fU (u) using

fU (u) = (d/du) FU (u)

2. Method of Transformations: If h(X1 ) is a strictly increasing or strictly decreasing
function of X1 , we can apply a transformation formula to obtain the density of
h(X1 ). If U (X1 , X2 ) is a function of two random variables, we can apply a modified
version of the transformation formula to obtain the joint density g(x1 , u) of X1 and
U . The density fU (u) is then obtained as the marginal density of U by integrating
x1 out of the joint density:

fU (u) = ∫_{−∞}^{+∞} g(x1 , u) dx1

3. Method of Moment Generating Functions: This method uses the fact
that random variables with the same moment generating function have the
same distribution. If we can find the moment generating function of the random
variable U and identify it with that of a known distribution, then we have found the
distribution of U .


The Joint Distribution of Several Functions of Random Variables

4. Method of Bivariate Transformations: Let U (X1 , X2 ) and V (X1 , X2 ) be


functions of the random variables X1 and X2 . Under certain conditions, the
bivariate transformation method can use a Jacobian determinant to find the
joint density function fU,V (u, v).

Note: The next sections will describe each of these methods in more detail.


6.3 The Method of Distribution Functions

Introduction

Let X be a continuous random variable with density function fX (x) and let U (X)
be a function of X. We can solve for the cumulative distribution function FU (u) of
U directly by integrating fX (x) over the region corresponding to U ≤ u. The density
function fU (u) can then be found by differentiation using

fU (u) = (d/du) [ FU (u) ]

Summary of the Method of Distribution Functions

The textbook summarizes the method of distribution function in four steps. Let U
be a function of the random variables X1 , . . . , Xn

1. Identify the region U = u in terms of x1 , . . . , xn

2. Identify the region U ≤ u

3. Integrate the joint density function f (x1 , . . . , xn ) over the region U ≤ u to ob-
tain FU (u) = P [U ≤ u]

4. Find the density function fU (u) by differentiating:

fU (u) = (d/du) [ FU (u) ]

Note

You will have to do many practice questions to get good at this method.


Example:

Let X be a continuous random variable that is uniform over the interval [0, 1]

Define U = X². Solve for the density function of U , fU (u).



f (x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise

FX [x] = P [X ≤ x] = ∫_0^x f (t) dt = ∫_0^x 1 dt = x,    0 ≤ x ≤ 1

FU [u] = P [U ≤ u]
       = P [X² ≤ u]
       = P [X ≤ √u]
       = FX [√u]
       = √u,    0 ≤ u ≤ 1

fU (u) = (d/du) FU [u] = (d/du) √u = 1/(2√u),    0 ≤ u ≤ 1


# Generate 100,000 random uniform numbers


M <- data_frame(xr = runif(min = 0, max = 1, n = 1e5))

# View the first few rows of M


M %>% head %>% kable
xr
0.2785159
0.8718237
0.4035373
0.3403416
0.4681988
0.1207040
# Mutate to add a new column x.squared
M %<>% mutate(x.squared = xr ^ 2)

# View the first few rows of M


M %>% head %>% kable
xr x.squared
0.2785159 0.0775711
0.8718237 0.7600766
0.4035373 0.1628423
0.3403416 0.1158324
0.4681988 0.2192102
0.1207040 0.0145695
# Create an histograms with density curves on top
M %<>%
mutate(
us = seq(from = 0.001, to = 1, length = 1e5),
fu = 0.5 / sqrt(us))


# Create a histogram of u
Gu <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x.squared, y = ..density..),
fill = 'darkred',
color = 'darkred',
size = 0.2,
alpha = 0.5,
bins = 77) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkred',
size = 1.2) +
geom_hline(
yintercept = 1,
linetype = 'solid',
size = 1.2,
color = 'darkblue') +
labs(
title = 'Squared Random Uniform Numbers',
x = 'x.squared values',
y = 'Relative Frequency',
caption = 'Statistics I, Section 6.4') +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5),
labels = c(0, 1, '', '', 4, 5),
limits = c(0, 5)) +
scale_x_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
theme_bw() +
theme(
axis.ticks = element_blank(),
text = element_text(


size = 16,
family = 'serif',
face = 'italic',
color = 'black'),
plot.title = element_text(hjust = 0.5))

Gu

[Figure: histogram of the squared uniform values with the density 1/(2√u) overlaid, titled 'Squared Random Uniform Numbers']

6.4 The Transformation Method

Introduction

The transformation method is a special case of the method of distribution functions. Let
X be a random variable with density function fX (x) and let U = h(X) be a strictly
increasing or strictly decreasing function of X on the support of X. Then h has an inverse
function h−1 (u). The transformation formula relies on the following idea. Suppose
that U = h(X) is strictly increasing.

FU (u) = P [U ≤ u]

= P [h(X) ≤ u]

= P [h−1 (h(X)) ≤ h−1 (u)]

= P [X ≤ h−1 (u)]

= FX [h−1 (u)]

Applying the Chain Rule

Now apply the chain rule for differentiation to obtain the density fU (u).

fU (u) = (d/du) FX [h−1 (u)]
       = fX (h−1 (u)) · (d/du) h−1 (u)
       = fX (x(u)) x′(u)


The Derivative is a Positive Function

Since h(x) is strictly increasing, h−1 (u) is also strictly increasing. Therefore

(d/du) h−1 (u) > 0    and    | (d/du) h−1 (u) | = (d/du) h−1 (u)

This leads to the transformation formula

fU (u) = fX ( h−1 (u) ) | (d/du) h−1 (u) | = fX (x(u)) |x′(u)|


Summary of the Transformation Method

The textbook summarizes the transformation method in three steps. Let X be
a continuous random variable with density function fX (x). Let W = g(X) be a
strictly increasing or strictly decreasing function of x whenever fX (x) > 0. To find
the density function fW (w) of W :

1. Find the inverse function g−1 (w)

2. Calculate the derivative (d/dw) [ g−1 (w) ]

3. fW (w) = fX ( g−1 (w) ) | (d/dw) g−1 (w) |


Applying the Transformation Formula for a Function of Two Variables

The textbook provides examples of using the transformation formula to find
fU (u) in the case where U = U (X, Y ). To obtain fU (u) in this case, the textbook uses
the following steps:

1. Fix the value of X = x

2. Define g(Y ) = U (x, Y ); x is fixed, so g is really a function of Y only

3. Apply the transformation formula to get the joint density of U and X, f (x, u)

4. Obtain the density fU (u) by integrating out x:

fU (u) = ∫_{−∞}^{+∞} f (x, u) dx


Example:

Let X be continuous and uniformly distributed over the interval 1 ≤ x ≤ 2.

Solve for the probability density function, fU (u) of U = X 2

Solution:





f (x) = 1 for 1 ≤ x ≤ 2, and 0 otherwise

u(x) = x²

x(u) = √u

x′(u) = 1/(2√u)

fU (u) = fX (x(u)) |x′(u)|
       = (1) × 1/(2√u)
       = 1/(2√u),    1 ≤ u ≤ 4


# Clear the working environment


remove(list = ls())

# Define f(u) and fx


fx <- function(x){(1 <= x & x <= 2)* 1}
fu <- function(u){(1 <= u & u <= 4) / (2 * sqrt(u))}

# Create a data_frame
M <-
data_frame(
x = runif(n= 10000, min = 1, max = 2),
u = x ^ 2,
xs = seq(from = 1, to = 2, length = 1e4),
fx = fx(xs),
us = seq(from = 1, to = 4, length = 1e4),
fu = fu(us))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkgreen',
size = 1.2) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
fill = 'darkred',
alpha = 0.3,
bins = 77,
color = 'black',
size = 0.4) +
geom_histogram(
mapping = aes(x = u, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
bins = 77,
color = 'black',


size = 0.4) +
labs(
title = 'The Transformation Formula',
y = 'Relative Frequency',
x = 'Random Numbers',
caption = 'Statistics I / Section 6.4') +
theme_bw() +
theme(
plot.margin = margin(unit = 'cm', c(1, 1, 1, 1)),
axis.title.y = element_text(vjust = +5),
axis.title.x = element_text(vjust = -5),
plot.title = element_text(hjust = 0.5),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'))
G


[Figure: overlaid histograms of the uniform draws and their squares with the density 1/(2√u) curve, titled 'The Transformation Formula']

6.5 The Method of Moment Generating Functions

The Uniqueness Theorem of Moment Generating Functions

Let X and Y be random variables with moment generating functions mX (t) and mY (t). If
mX (t) = mY (t) for all values of t, then X and Y have the same probability distribution.

Sums of Independent Random Variables

Let X1 , . . . , Xn be independent random variables with moment generating functions
m1 (t), . . . , mn (t). Then the moment generating function of Y = Σ_{i=1}^n Xi is

mY (t) = m1 (t) × m2 (t) × · · · × mn (t)

Proof:

mY (t) = E[etY ]

= E[et(X1 +···+Xn ) ]

= E[etX1 etX2 · · · etXn ]

= E[etX1 ] × · · · × E[etXn ]

= m1 (t) × · · · × mn (t)


Sums of Independent Normal Random Variables

One application of the previous theorems is that a linear combination of independent
normal random variables is normal. More specifically, let X1 , . . . , Xn be independent
normal random variables with E[Xi ] = µi and V [Xi ] = σi². Let a1 , . . . , an be
constants and define Y = Σ_{i=1}^n ai Xi . Then

µY = E[Y ] = a1 µ1 + · · · + an µn

σY² = V [Y ] = a1² σ1² + · · · + an² σn²

Y ∼ N( µY , σY² )
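A quick simulation sketch (with illustrative parameter values, not from the textbook) confirms the result:

# Y = 2*X1 - X2 with X1 ~ N(1, 4) and X2 ~ N(-3, 1) independent
X1 <- rnorm(1e5, mean = 1, sd = 2)
X2 <- rnorm(1e5, mean = -3, sd = 1)
Y <- 2 * X1 - X2

# Compare with mu_Y = 2(1) - (-3) = 5 and sigma_Y^2 = 4(4) + 1(1) = 17
mean(Y)
var(Y)

# A normal quantile plot of the standardized values should be nearly a straight line
qqnorm(scale(Y))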



Sums of Squares of Independent Standard Normal Random Variables

The textbook uses the results of this section to derive the following result.

1. Let X1 , . . . , Xn be independent normal random variables

2. E[Xi ] = µi and V [Xi ] = σi²

3. Let Zi = (Xi − µi ) / σi

4. Define Y = Σ_{i=1}^n Zi²

5. Then Y has a χ² distribution with df = n.
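A small simulation sketch (not in the textbook) of this result:

# 100,000 replications of Y = Z_1^2 + ... + Z_5^2 with the Z_i standard normal
n <- 5
Z <- matrix(rnorm(1e5 * n), ncol = n)
Y <- rowSums(Z ^ 2)

# The empirical CDF of Y should track pchisq(., df = 5)
mean(Y <= 3)
pchisq(3, df = n)
mean(Y <= 9)
pchisq(9, df = n)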

Summary of the Method of Moment Generating Functions

Let W be a function of the random variables X1 , . . . , Xn

1. Find the moment generating function for W , mW (t)

2. Compare mW (t) to known moment generating functions. If mW (t) = mY (t) for
all t, then W has the same distribution as Y .


6.6 Transformations Using Jacobians

Introduction

Let X and Y be continuous random variables with joint density function fX,Y (x, y).
Suppose that U (X, Y ) and V (X, Y ) are functions of the random variables. How can
we determine the joint density function of U and V ? Under certain conditions, the
bi-variate transformation method can be used. Before describing this method, let's
extend the definition of support to joint density functions.

Support of a Joint Density Function

Let X and Y be continuous random variables with joint density function fX,Y (x, y).
The support of fX,Y (x, y) is the set
support of fX,Y (x, y) = { (x, y) ∈ R² : fX,Y (x, y) > 0 }


Jacobian of a Transformation

Let T : (x, y) → (u, v) be a continuously differentiable transformation. The
Jacobian is

∂(x, y)/∂(u, v) = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ]

                = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u)


The Bi-variate Transformation Method

Let X and Y be continuous random variables with joint density function fX,Y (x, y)
and suppose that T : (x, y) → (u, v) is a one-to-one function on the support of
fX,Y (x, y). If x(u, v) and y(u, v) have continuous partial derivatives with respect to
u and v and if

J = ∂(x, y)/∂(u, v) = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u) ≠ 0

then the joint density function of U and V is

fU,V (u, v) = fX,Y ( x(u, v), y(u, v) ) × |J|


Example:

X and Y are random variables with joint density function

f (x, y) = 8xy for 0 ≤ x ≤ y ≤ 1, and 0 otherwise

Define U = X/Y and V = Y .

Derive fU,V (u, v) using the bivariate transformation method.


U = X/Y and V = Y , so x(u, v) = uv and y(u, v) = v.

The partial derivatives are

∂x/∂u = v,    ∂x/∂v = u,    ∂y/∂u = 0,    ∂y/∂v = 1

J = (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u) = (v)(1) − (u)(0) = v

fU,V (u, v) = fX,Y ( x(u, v), y(u, v) ) |J|
            = 8 x(u, v) y(u, v) |v|
            = 8(uv)(v)(v)
            = 8uv³

The support 0 ≤ x ≤ y ≤ 1 becomes 0 ≤ u ≤ 1 and 0 ≤ v ≤ 1.


# Load the required packages

library(magrittr)
library(gridExtra)
library(ggplot2)
library(dplyr)

# Define f(x, y)
f <- function(x, y){8 * x * y * (0 <= x & x <= y & y <= 1)}

# Create a grid of x,y values


Mxy <-
expand.grid(
x = seq(from = 0, to = 1, length = 150),
y = seq(from = 0, to = 1, length = 150))

# Add a columns for the density values


Mxy %<>% mutate(z = f(x, y))

# Create the tile plot


Gxy <-
ggplot(
data = Mxy) +
geom_tile(
mapping = aes(x = x, y =y, fill = z)) +
scale_fill_gradient(
low = 'white',
high = 'darkred')+
labs(
title = expression(italic('f(x, y) = 8xy')),
fill = 'Density',
x = NULL,
y = NULL,
caption = NULL) +
theme_classic() +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = c(0.85, 0.15),
legend.key.width = unit(units = 'cm', 1.25),
panel.border = element_rect(


fill = NA,
size = 1),
axis.ticks = element_blank(),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

# Define g(u, v)
g <- function(u, v){8 * u * v ^ 3 * (0 <= u & u <= 1) * (0 <= v & v <= 1)}

# Create a grid of x,y values


Muv <-
expand.grid(
u = seq(from = 0, to = 1, length = 150),
v = seq(from = 0, to = 1, length = 150))

# Add a columns for the density values


Muv %<>% mutate(z = g(u, v))

# Create the tile plot


Guv <-
ggplot(
data = Muv) +
geom_tile(
mapping = aes(x = u, y = v, fill = z)) +
scale_fill_gradient(
low = 'white',
high = 'darkgreen')+
labs(
title = 'u = x/y, v = y',
fill = 'Density',
x = NULL,
y = NULL,
caption = NULL) +
theme_classic() +


theme(
plot.title = element_text(hjust = 0.5),
legend.position = c(0.15, 0.15),
legend.key.width = unit(units = 'cm', 1.25),
panel.border = element_rect(
fill = NA,
size = 1),
axis.ticks = element_blank(),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

grid.arrange(Gxy, Guv, nrow = 1)

[Figure: side-by-side tile plots of the joint densities f(x, y) = 8xy and fU,V (u, v) = 8uv³ with density legends]

6.7 Order Statistics

Introduction

Let X1 , . . . , Xn be independent continuous random variables from a distribution


with density f (x) and cumulative distribution function F [x]. The ordered random
variables

X(1) ≤ X(2) ≤ · · · ≤ X(n)

are called order statistics.

Maximum and Minimum

X(1) = min{X1 , . . . , Xn }

X(n) = max{X1 , . . . , Xn }


Density and CDF of the Maximum

The method of distribution functions can be used to find the densities of X(1) and
X(n) . Let F(n) (x) be the distribution function of X(n) and let g(n) (x) be the density
function of X(n) .

F(n) (x) = P [X(n) ≤ x]
         = P [X1 ≤ x, X2 ≤ x, . . . , Xn ≤ x]
         = P [X1 ≤ x] × P [X2 ≤ x] × · · · × P [Xn ≤ x]
         = [ F (x) ]ⁿ

g(n) (x) = (d/dx) [ F(n) (x) ] = (d/dx) [ F (x) ]ⁿ = n · f (x) · [ F (x) ]^{n−1}

Density and CDF of the Minimum

The method of distribution functions can be used to derive the density and cumu-
lative distribution function of min{X1 , . . . , Xn }. Let the cumulative distribution
function and density of X(1) be denoted by F(1) (x) and g(1) (x).

F(1) (x) = P [min{X1 , . . . , Xn } ≤ x]
         = 1 − P [min{X1 , . . . , Xn } > x]
         = 1 − P [X1 > x, X2 > x, . . . , Xn > x]
         = 1 − P [X1 > x] × P [X2 > x] × · · · × P [Xn > x]
         = 1 − ( 1 − F [x] )ⁿ

g(1) (x) = (d/dx) [ F(1) (x) ] = (d/dx) [ 1 − ( 1 − F [x] )ⁿ ] = n · f (x) · ( 1 − F [x] )^{n−1}


The Density Function of the kth Order Statistic

The textbook uses the multinomial distribution to provide a heuristic explanation
for the density function of the kth order statistic. Let X1 , . . . , Xn be independent
continuous random variables from a distribution with density function f (x) and
cumulative distribution function F [x]. For k ∈ {1, . . . , n} let g(k) (x) be the density
function of X(k) . Then,

g(k) (x) = ( n! / ( (k − 1)! (n − k)! ) ) × [ F (x) ]^{k−1} × [ 1 − F (x) ]^{n−k} × f (x)


The Joint Density Function of Two Order Statistics

The textbook also uses the multinomial distribution to provide a heuristic derivation
for the joint density function of two order statistics. Let j and k be elements of
{1, 2, . . . , n} such that j < k, and let X1 , . . . , Xn be independent continuous random
variables from a distribution with density function f (x) and cumulative distribution
function F [x]. Then the joint density function of the order statistics X(j) and X(k)
is

g(j),(k) (xj , xk ) = ( n! / ( (j − 1)! (k − 1 − j)! (n − k)! ) ) × [ F (xj ) ]^{j−1} × [ F (xk ) − F (xj ) ]^{k−1−j}
                     × [ 1 − F (xk ) ]^{n−k} × f (xj ) × f (xk ),    xj < xk
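As a sketch of how this formula can be used (the distribution and probability below are illustrative assumptions, not from the textbook), take n = 3 Exponential(1) variables with j = 1 and k = 3, so the joint density of the minimum and maximum reduces to g(1),(3)(x1, x3) = 6 (F(x3) − F(x1)) f(x1) f(x3) for x1 < x3. A Monte Carlo estimate of P[X(1) ≤ 0.5, X(3) ≥ 1] should then match iterated integration of this density.

# Monte Carlo estimate
M <- matrix(rexp(3e5, rate = 1), ncol = 3)
mean(apply(M, 1, min) <= 0.5 & apply(M, 1, max) >= 1)

# The same probability from the joint density, by iterated integration
inner <- function(x1){
  sapply(x1, function(a){
    integrate(function(x3){6 * (pexp(x3) - pexp(a)) * dexp(a) * dexp(x3)},
              lower = 1, upper = Inf)$value})}
integrate(inner, lower = 0, upper = 0.5)$value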


Exploring Order Statistics Using R

# Set the parameters n and k


k <- 2
n <- 3

# Set the number of simulations


nr <- 1e4

# Generate random exponential numbers


r <- rexp(n = nr * n, rate = 1)

# Create a matrix
M <- matrix(r, nrow = nr, ncol = n)

# Sort the rows


for(i in 1:nrow(M)){M[i, ] <- sort(M[i, ])}

# View the first 5 rows of M


head(M, 5)

[,1] [,2] [,3]


[1,] 0.6745128 1.1950745 2.174020
[2,] 0.2337236 1.5698768 3.221263
[3,] 0.2637691 0.3542450 1.655537
[4,] 0.2114046 0.4367300 1.010346
[5,] 0.2097365 0.3495607 2.894480

# Convert to a data_frame and assign column names


M %<>%
as_data_frame %>%
rename(X_1 = V1, X_2 = V2, X_3 = V3) %>%
mutate(x = seq(from = 0, to = 10, length = nr))

# Define the density of X_(2)


g_2 <- function(x){factorial(n) / factorial(k - 1) / factorial(n - k) *
  pexp(x, rate = 1) ^ (k - 1) * (1 - pexp(x, rate = 1)) ^ (n - k) * dexp(x, rate = 1)}

G <-
ggplot(
data = M) +
geom_histogram(
mapping= aes(x = X_2, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
color = 'black',
bins = 50) +


geom_line(
mapping = aes(x = x, y = g_2(x)),
color = 'darkgreen',
size = 1.2) +
scale_x_continuous(
limits = c(0, 3)) +
scale_y_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
labs(
title = 'Second Order Statistic / Relative Frequency',
x = 'x-values',
y = 'Relative Frequency',
caption = 'Statistics / Section 6.7') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -6),
axis.title.x = element_text(vjust = +6),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'),
plot.title = element_text(hjust = 0.5))

[Figure: histogram of the simulated second order statistic with the density g(2)(x) overlaid, titled 'Second Order Statistic / Relative Frequency']


Chapter 7

The Central Limit Theorem

7.1 Introduction

Recall the definition of a random sample. Let X1 , . . . , Xn be the random variables


observed from a random sample. Then the variables X1 , . . . , Xn are independent
and have the same distribution. Functions of the random variables that result from
a random sample are themselves random variables. These functions are used to
estimate population parameters such as the mean µ and the standard deviation σ.
For example, we might use the sample mean

X̄ = (1/n) Σ_{i=1}^n Xi

to estimate the population mean µ. The sample mean is an example of a statistic.

Statistic

A statistic is a function of observable random variables in a sample and known
constants. Examples of statistics include the sample mean, sample variance, and
order statistics.

Sampling Distribution

The probability distribution of a statistic is called its sampling distribution. The
sampling distribution of a statistic is a model for the relative frequency histogram
of the possible values of the statistic.

Example:

Generate 10,000 samples, each of size 1,000, from a gamma distribution with shape
parameter 2 and scale parameter 3.

Compute the sample mean of each sample and plot a histogram of the results.

# Define the number of samples


ns <- 10000

# Define the size of each sample


sz <- 1000

# Generate the random numbers


rg <- rgamma(n = ns * sz, shape = 2, scale = 3)

# Assemble the random numbers into a matrix


M <- matrix(rg, nrow = sz, ncol = ns)

# Convert M into a data frame and compute the sample means


N <-
M %>%
as_data_frame %>%
summarize_all(mean) %>%
gather(value = sample_mean, key = sample)

# Plot the relative frequency histogram of sample means


G <-
ggplot(
data = N) +
geom_histogram(
mapping = aes(x = sample_mean, y = ..density..),
fill = 'darkred',
alpha = 0.2,
color = 'black',


size = 0.2,
bins = 50) +
scale_y_continuous(
breaks = c(0, 1, 2, 3),
labels = c(0, '','', 3)) +
scale_x_continuous(
breaks = c(5.7, 6.0, 6.3),
labels = c(5.7, '',6.3)) +
labs(
title = 'The Sampling Distribution of the Sample Mean',
x = 'Sample Means',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.1') +
theme(
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G

[Figure: relative frequency histogram of the 10,000 sample means, titled 'The Sampling Distribution of the Sample Mean']

7.2 Sampling Distributions from a Normal Distribution

The Sample Mean for Normal Distributions

In Chapter 6, the method of moment generating functions was used to show that
a linear combination of normal random variables is normal. The mean of a set of
random variables is a linear combination of the variables. As a result, we have the
following theorem. Let X1 , . . . , Xn be a random sample from a normal distribution
with mean µ and variance σ². Then

X̄ = (1/n) Σ_{i=1}^n Xi

X̄ ∼ N( µ, σ²/n ),    µ_X̄ = µ,    σ²_X̄ = σ²/n

The mean stays the same, but the variance is divided by n.


Sums of Squares of Standardized Normal Random Variables

Let X1 , . . . , Xn be a random sample from a normal distribution with mean µ and
variance σ². For i ∈ {1, 2, . . . , n}, define

Zi = (Xi − µ) / σ

Then Zi ∼ N (0, 1) and

Σ_{i=1}^n Zi² = Σ_{i=1}^n ( (Xi − µ) / σ )²

has a χ² distribution with df = n.

Using R for χ2 Questions


Let Z1 , Z2 , . . . , Z7 be a random sample from the standard normal distribution. Find the
value of b such that

P [ Z1² + · · · + Z7² ≤ b ] = 0.88

# we are looking for the 88th percentile of a chi-squared distribution with df = 7


qchisq(p = 0.88, df= 7)
[1] 11.45414
# check your answer
pchisq(q = 11.45414, df = 7)
[1] 0.8800001


The Sampling Distribution of the Sample Variance S 2

Let X1 , . . . , Xn be a random sample from the distribution X ∼ N (µ, σ²). Remember
that the sample mean and the sample variance were defined as

X̄ = (1/n) Σ_{i=1}^n Xi

S² = ( 1/(n − 1) ) Σ_{i=1}^n ( Xi − X̄ )²

The previous theorem can be used to show that

1. (n − 1)S²/σ² = (1/σ²) Σ_{i=1}^n ( Xi − X̄ )² ∼ χ² with df = (n − 1)

2. S² and X̄ are independent random variables.
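A quick simulation sketch of result 1 (the sample size and parameters below are illustrative):

# 10,000 samples of size n = 8 from N(2, 9)
n <- 8
sigma <- 3
S2 <- replicate(1e4, var(rnorm(n, mean = 2, sd = sigma)))
W <- (n - 1) * S2 / sigma ^ 2

# The empirical CDF of W should track pchisq(., df = n - 1)
mean(W <= 5)
pchisq(5, df = n - 1)
mean(W <= 10)
pchisq(10, df = n - 1)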


The Student’s t distribution

Let Z ∼ N (0, 1) and W ∼ χ² with df = ν be independent random variables. Define

T = Z / √(W/ν)

Then T is said to have a t distribution with ν df.

R Functions for the t Distribution

density dt(x, df, ncp, log = FALSE)


cdf pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
quantile function qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
random numbers rt(n, df, ncp)

Using R to Explore the t-Distribution

# Clear the environment


rm(list = ls())

# Set initial parameters


n <- 1e4
v <- 4

# Generate random normal, chi^2 numbers, and T values


M <-
data_frame(
Z = rnorm(n),
W = rchisq(n, df = v),
T_vals = Z / sqrt(W/v),
t = dt(T_vals, df = v))

# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = T_vals, y = ..density..),
fill = 'darkred',


alpha = 0.4,
bins = 100,
size = 0.2,
color = 'black') +
scale_x_continuous(
breaks = c(-5.0, -2.5, 0.0, 2.5, 5.0),
labels = c('-5.0', '-2.5', '', '2.5', '5.0'),
limits = c(-5, 5)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '', '0.4'),
limits = c(0.0, 0.44)) +
geom_line(
mapping = aes(x = T_vals, y = t),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating a t-Distribution',
x = 't-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

[Figure: histogram of the simulated T values with the t density (df = 4) overlaid, titled 'Simulating a t-Distribution']

The F Distribution

Let W1 and W2 be independent χ² random variables with degrees of freedom ν1 and
ν2 . Define

F = (W1 /ν1 ) / (W2 /ν2 )

Then F is said to have an F-distribution with ν1 numerator degrees of freedom and
ν2 denominator degrees of freedom.

R Functions for the F Distribution

density df(x, df1, df2, ncp, log = FALSE)


cdf pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
quantile qf(p, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
random Numbers rf(n, df1, df2, ncp)

Using R to Explore F-Distributions


# Clear the environment
rm(list = ls())

# Set initial parameters


n <- 1e4
v1 <- 4
v2 <- 7

# Generate chi^2 numbers, and F values


M <-
data_frame(
W1 = rchisq(n, df = v1),
W2 = rchisq(n, df = v2),
F_vals = (W1 / v1) / (W2 / v2),
F = df(F_vals, df1 = v1, df2 = v2))

# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = F_vals, y = ..density..),


fill = 'darkred',
alpha = 0.4,
bins = 100,
color = 'black',
size = 0.2) +
geom_line(
mapping = aes(x = F_vals, y = F),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating an F-Distribution',
x = 'F-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
scale_x_continuous(
breaks = c(0, 2, 4, 6, 8, 10),
labels = c(0, 2, 4, 6, 8, 10),
limits = c(0, 10)) +
scale_y_continuous(
breaks = c(0.0, 0.2, 0.4, 0.6, 0.8),
labels = c('0.0', '0.2', '', '', '0.8'),
limits = c(0.0, 0.85)) +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

[Figure: histogram of the simulated F values with the F density (df1 = 4, df2 = 7) overlaid, titled 'Simulating an F-Distribution']

Summary

Suppose that

• X1 , . . . , Xn is a random sample from X ∼ N( µX , σX² )

• Y1 , . . . , Ym is a random sample from Y ∼ N( µY , σY² )

Then

• √n ( X̄ − µX ) / σX ∼ N( 0, 1 )

• ( (n − 1)/σX² ) SX² ∼ χ² with df = (n − 1)

• √n ( X̄ − µX ) / SX ∼ t distribution with df = (n − 1)

• F = ( SX² / σX² ) / ( SY² / σY² ) ∼ F distribution with numerator df = (n − 1) and denominator df = (m − 1)


7.3 The Central Limit Theorem

Introduction

The central limit theorem will apply to any distribution with finite mean µ and finite
variance σ 2 . The central limit theorem says that if a random sample is large enough,
then the sample mean is approximately normal. This theorem is very important
and will allow us to compute approximate probabilities for the sums of a random
sample when we only know the mean and standard deviation and not the underlying
distribution.

The Central Limit Theorem (Version I)

Let X1 , X2 , . . . , Xn be a random sample from a distribution X with finite mean µ


and finite variance σ 2 . Then for large values of n, the sum X1 + X2 + · · · + Xn is
approximately normal. That is

W = X1 + X2 + · · · + Xn ∼ N( nµ, nσ² )    [approximately]

Proof:

A proof of the central limit theorem is outside the focus of this course. The textbook
describes the basic idea, which uses moment generating functions.


The Central Limit Theorem (Version II)

Let X1 , X2 , . . . , Xn be a random sample from a distribution X with finite mean µ and


finite variance σ 2 . Then for large values of n, the sample mean X is approximately
normal. That is
X̄ = (1/n) ( X1 + X2 + · · · + Xn ) ∼ N( µ, σ²/n )    [approximately]


Using R to Visualize the Central Limit Theorem


Since a binomial random variable is the sum of independent and identically distributed
Bernoulli random variables, it should be approximately normal. Let's use R to generate
random binomial numbers, plot the histogram, and then overlay the corresponding normal
density function.
# Set the population parameters
n <- 100
p <- 0.43

# Generate Random Binomial Numbers


M <-
data_frame(
xr = rbinom(n = 1e4, size = n, prob = p),
y = dnorm(xr, mean = n * p, sd = sqrt(n * p * (1-p))))

# Create a histogram with overlayed normal density function


G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = xr, y = ..density..),
fill = 'darkred',
alpha = 0.4,
breaks = seq(from = 24.5, to = 65.5, by = 1),
size = 0.2,
color = 'black') +
scale_y_continuous(
breaks = c(0.00, 0.02, 0.04, 0.06, 0.08, 0.10),
labels = c('0.00', '0.02', '', '', '0.08', '0.10'),
limits = c(0.0, 0.1)) +
geom_line(
mapping = aes(x = xr, y = y),
size = 1.3,
color = 'navy')+
labs(
title = 'The Central Limit Theorem',
x = 'x-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.3') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),


plot.title = element_text(hjust = 0.5),


text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

[Figure: histogram of the random binomial numbers with the approximating normal density overlaid, titled 'The Central Limit Theorem']

Example #1
The total claim amount for a health insurance policy follows a distribution with density
function

f (x) = (1/1000) e^{−x/1000} for 0 < x, and 0 otherwise

The premium for the policy is set at the expected total claim amount plus 100. If 100 policies
are sold, calculate the approximate probability that the insurance company will have claims
exceeding the premiums collected. Graph a bell curve with a shaded region corresponding
to this probability.
# Clear the environment
rm(list = ls())

# Population Parameters
# X is exponential with mean 1000
mu_x <- 1000
sigma_x <- 1000

# Approximate Normal Distribution Parameters


n <- 100
mu <- n * mu_x
sigma <- sqrt(n) * sigma_x

# Calculate the probability


amount <- n * (mu_x + 100)
ans <- pnorm(q = amount, mean = mu, sd = sigma, lower.tail = FALSE)
ans
[1] 0.1586553
# Sketch this probability
M <-
data_frame(
x1 = seq(from = mu - 3 * sigma, to = mu + 3 * sigma, length = 1e3),


y1 = dnorm(x1, mean = mu, sd = sigma),


x2 = seq(from = amount, to = mu + 3 * sigma, length = 1e3),
y2 = dnorm(x2, mean = mu, sd = sigma))

G <-
ggplot(
data = M)+
geom_line(
mapping = aes(x = x1, y = y1),
color = 'darkred',
size = 1.2)+
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'darkred',
alpha = 0.3,
color = 'darkred',
size = 1.2)+
geom_text(
mapping = aes(x = amount+1000, y = mean(range(y2))),
label = paste('Area = ', round(ans, digits = 3)),
size = 6,
angle = 90,
color = 'black',
family = 'serif',
fontface = 'italic',
nudge_x = 1500,
nudge_y = -4e-6)+
labs(
title = 'The Central Limit Theorem',
caption = 'Statistics I / Section 7.3',
x = 'Total Claims',
y = 'Density')+
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = +3),
axis.title.x = element_text(vjust = -3),
plot.title = element_text(hjust = 0.5),


text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G

[Figure: normal density of total claims with the right-tail region above the premiums collected shaded, labeled Area = 0.159, titled 'The Central Limit Theorem']

7.4 The Continuity Correction

Introduction

Let X be an integer-valued random variable with expected value µ and variance σ²,
and let X1 , X2 , . . . , Xn be a random sample from X. Define

S = X1 + X2 + · · · + Xn

S is also integer valued and, by the Central Limit Theorem, is approximately normal.
Let W ∼ N( nµ, nσ² ) denote the approximating normal random variable.

The Continuity Correction

For k ∈ Z⁺, the continuity correction is

P [S = k] = P [ k − 1/2 ≤ S ≤ k + 1/2 ] ≈ P [ k − 1/2 ≤ W ≤ k + 1/2 ]

P [S ≥ k] = P [ S ≥ k − 1/2 ] ≈ P [ W ≥ k − 1/2 ]

P [S ≤ k] = P [ S ≤ k + 1/2 ] ≈ P [ W ≤ k + 1/2 ]


Normal Approximation to the Binomial

Let X be a binomial random variable with parameters n and p. Then

• µX = np

• σX² = npq

• X is a sum of independent and identical Bernoulli trials.

• X assumes only integer values.

• From the Central Limit Theorem, X is approximately N( np, npq ).

• Let W ∼ N( np, npq ). It then follows from the continuity correction that

P [X = k] ≈ P [ k − 1/2 ≤ W ≤ k + 1/2 ]

P [X ≥ k] ≈ P [ W ≥ k − 1/2 ]

P [X ≤ k] ≈ P [ W ≤ k + 1/2 ]


How Big Must n Be?

How big must n be to use the normal approximation to the binomial distribution?
The answer is given by either of the two rules below (a quick numerical check follows).

• 0 < p − 3 √(pq/n) < p + 3 √(pq/n) < 1

• n > 9 × ( max{p, q} / min{p, q} )
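As a sketch, both rules are easy to check in R for the binomial parameters used in the example below:

# Rule check for n = 92, p = 0.61
n <- 92
p <- 0.61
q <- 1 - p

# Rule 1: the interval p -/+ 3 * sqrt(pq/n) must stay inside (0, 1)
c(p - 3 * sqrt(p * q / n), p + 3 * sqrt(p * q / n))

# Rule 2: n must exceed 9 * max(p, q) / min(p, q)
n > 9 * max(p, q) / min(p, q)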

Test the Continuity Correction

# Binomial parameters
n <- 92
p <- 0.61

# normal parameters
m <- n * p
v <- sqrt(n * p * (1 - p))

# Let X be binomial with parameters n and p. Compute P[X = 45]


dbinom(x = 45, size = n, prob = p)
[1] 0.005298644
# Estimate the same probability using the continuity correction and the normal approximation
pnorm(q = 45.5, mean = m , sd = v) - pnorm(q = 44.5, mean = m , sd = v)
[1] 0.005102989


7.5 The t-Distribution

The Student’s t distribution

Let Z ∼ N (0, 1) and W ∼ χ² with df = ν be independent random variables. Define

T = Z / √(W/ν)

Then T is said to have a t distribution with ν df.

R Functions for the t Distribution

density dt(x, df, ncp, log = FALSE)


cdf pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
quantile function qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
random numbers rt(n, df, ncp)


Exploring the t-Distribution

# Clear the environment


remove(list = ls())

# Set initial parameters


n <- 1e5
v <- 4

# Generate random normal, chi^2 numbers, and T values


M <-
data_frame(
Z = rnorm(n),
W = rchisq(n, df = v),
T_vals = Z / sqrt(W/v),
t = dt(T_vals, df = v))

# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = T_vals, y = ..density..),
fill = 'darkred',
alpha = 0.4,
bins = 100,
color = 'black',
size = 0.1) +
geom_line(
mapping = aes(x = T_vals, y = t),
size = 1.3,
color = 'blue')+
scale_x_continuous(
breaks = c(-5.0, -2.5, 0.0, 2.5, 5.0),
labels = c('-5.0', '-2.5', '', '2.5', '5.0'),
limits = c(-5, 5)) +
scale_y_continuous(


breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),


labels = c('0.0', '0.1', '', '', '0.4'),
limits = c(0.0, 0.44)) +
labs(
title = 'Simulating a t-Distribution',
x = 't-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.5') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G

[Figure: histogram of the simulated T values with the t density (df = 4) overlaid, titled 'Simulating a t-Distribution']

The Density Function of a Student t-Distribution

For T a random variable with a t-distribution and ν degrees of freedom, the density
function is

f (t) = ( Γ[(ν + 1)/2] / ( √(πν) Γ(ν/2) ) ) ( 1 + t²/ν )^{−(ν+1)/2}

where Γ is the gamma function defined by

Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy
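As a quick check (a sketch, not from the textbook), we can code this density directly with R's gamma() function and compare it with the built-in dt():

# Hand-coded t density
f_t <- function(t, v){
  gamma((v + 1) / 2) / (sqrt(pi * v) * gamma(v / 2)) *
    (1 + t ^ 2 / v) ^ (-(v + 1) / 2)}

# The two sets of values should agree
f_t(c(-2, 0, 1.5), v = 4)
dt(c(-2, 0, 1.5), df = 4)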


Comparing a t-Density Function and a Normal Density Function

The next plot compares a t density function and a standard normal density function. Note
that both densities are symmetric about zero but that the t-distribution has more
probability mass in its tails: t-distributions have heavier tails.

M <-
data_frame(
x = seq(from = -4, to = +4, length = 1e3),
dn = dnorm(x),
dt = dt(x, df = 4))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = dn, color = 'Standard Normal'),
size = 1.4) +
geom_line(
mapping = aes(x = x, y = dt, color = 't-Distribution'),
size = 1.4) +
scale_x_continuous(
breaks = c(-4, -2, 0, 2, 4),
labels = c(-4, -2, '', 2, 4)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '0.3', '0.4'))+
scale_color_manual(
name = 'Density',
values = c('Standard Normal' = 'darkgreen', 't-Distribution' = 'maroon')) +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'Comparing Normal and t Density Functions',
caption = 'Statistics I / Section 7.5') +
theme(
axis.ticks = element_blank(),
legend.position = c(0.85, 0.85),


axis.title.y = element_text(vjust = -5),


axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))

[Figure: standard normal and t (df = 4) density curves overlaid, titled 'Comparing Normal and t Density Functions']

A Table of Percentage Points for the t-Distribution

Create a table of percentage points for the t-distribution corresponding to the picture below.
Each column should correspond to a value of tα and each row should correspond to df.

M <-
data_frame(
x1 = seq(from = -4, to = 4, length = 1e3),
x2 = seq(from = 1, to = 4, length = 1e3),
y1 = dt(x1, df = 5),
y2 = dt(x2, df = 5))

G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x1, y = y1),
color = 'red') +
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'red',
alpha = 0.4) +
geom_text(
mapping = aes(x = 1.5, y = 0.05),
label = expression(alpha),
size = 7,
color = 'black') +
scale_x_continuous(
breaks = c(1),
labels = expression(x = t[alpha]))+
labs(
x = NULL,
y = NULL)+
theme_classic()+
theme(
axis.text.x = element_text(face = 'bold', family = 'serif', size = 16))

[Figure: t density curve with the upper-tail area α shaded to the right of t_α]

M <-
expand.grid(
df = seq(from = 1, to = 29, by = 1),
alpha = c(0.100, 0.050, 0.025, 0.010, 0.005)) %>%
mutate(t_alpha = qt(p = alpha, df = df, lower.tail = FALSE)) %>%
spread(key = alpha, value = t_alpha)


Table 7.1: Percentage Points for t-Distributions

df 0.005 0.01 0.025 0.05 0.1


1 63.657 31.821 12.706 6.314 3.078
2 9.925 6.965 4.303 2.920 1.886
3 5.841 4.541 3.182 2.353 1.638
4 4.604 3.747 2.776 2.132 1.533
5 4.032 3.365 2.571 2.015 1.476
6 3.707 3.143 2.447 1.943 1.440
7 3.499 2.998 2.365 1.895 1.415
8 3.355 2.896 2.306 1.860 1.397
9 3.250 2.821 2.262 1.833 1.383
10 3.169 2.764 2.228 1.812 1.372
11 3.106 2.718 2.201 1.796 1.363
12 3.055 2.681 2.179 1.782 1.356
13 3.012 2.650 2.160 1.771 1.350
14 2.977 2.624 2.145 1.761 1.345
15 2.947 2.602 2.131 1.753 1.341
16 2.921 2.583 2.120 1.746 1.337
17 2.898 2.567 2.110 1.740 1.333
18 2.878 2.552 2.101 1.734 1.330
19 2.861 2.539 2.093 1.729 1.328
20 2.845 2.528 2.086 1.725 1.325
21 2.831 2.518 2.080 1.721 1.323
22 2.819 2.508 2.074 1.717 1.321
23 2.807 2.500 2.069 1.714 1.319
24 2.797 2.492 2.064 1.711 1.318
25 2.787 2.485 2.060 1.708 1.316
26 2.779 2.479 2.056 1.706 1.315
27 2.771 2.473 2.052 1.703 1.314
28 2.763 2.467 2.048 1.701 1.313
29 2.756 2.462 2.045 1.699 1.311



Index

addition rule, 29
additive law of probability, 36
Baye's Rule, 37
Bernoulli random variable, 55
Bernoulli trial, 55
beta distribution, 150
bi-variate normal distribution, 208
binomial coefficient, 24
binomial distributions, 56
binomial experiment, 56
binomial theorem, 24, 25
CDF, 103
central limit theorem, 277
chi-square random variables, 136
combination, 24
compound event, 23
conditional density functions, 182, 185
conditional distribution function, 182
conditional expected value, 214
conditional probability, 31
conditional probability function, 181
conditional variance, 215
conditioning formulas, 214
continuity correction, 284
continuous random variables, 106
correlation coefficient, 193
countable, 41
countably infinite, 41
counting principle, 24
covariance, 186, 193
cumulative distribution function, 103
cumulative distribution, discrete, 42
DeMorgan's laws, 16
dependent events, 33
dependent random variables, 183
difference rule, 29
discrete random variable, 41
discrete sample space, 23
disjoint sets, 13, 15
distributive laws, 16
double complement law, 16
empirical rule, 11
empty set, 13
equal probability, 24
event, 22
expected value, continuous, 113
expected value, discrete, 43
experiment, 22
exponential distribution, 141
F-distribution, 273
Fundamental Theorem of Calculus, 107
gamma function, 130, 131
geometric distribution, 65
geometric sum formula, 68
hyper-geometric distribution, 80
inclusion/exclusion rule, 29
independent events, 32
independent random variables, 183, 243
Jacobian, 247
joint distribution functions, 172
joint probability density, 173
joint probability functions, 171
jointly continuous, 173
k-th moment about the origin, 46
kth central moment, 46
kth factorial moment, 51
linear transformations, 44
marginal density functions, 180
marginal probability functions, 180
memory-less property, geometric, 66, 67
method of distribution functions, 230
method of moment generating functions, 243
mn rule, 24
mode, 108
moment generating functions, 46
multinomial coefficients, 28
multinomial distributions, 205
multinomial experiments, 204
multinomial theorem, 28
multiplication rule, 24
multiplicative law of probability, 35
multivariate distributions, 174
multivariate normal distributions, 209
mutually exclusive, 13, 15
mutually independent events, 34
negative binomial distributions, 72
normal approximation to the binomial, 285
normal distributions, 157
null set, 13
order and repetition, 27
order statistics, 255
outliers, 161
partition, 37
Pascal's formula, 25
Pascal's triangle, 25
percentiles, 109
permutation, 26
Poisson random variables, 88
population, 9
power series for e^x, 90
probability density function, 107
probability function, 23, 42
probability generating function, 51
proper subsets, 14
quantiles, 109
random sample, 39
random variables, 39
sample, 9
sample mean, 9
sample point, 22
sample standard deviation, 9
sample variance, 9
sampling distribution, 263
scale parameter, 129
set complement, 13
set difference, 13
set equality, 14
set intersection, 13
set notation, 13
set union, 13
shape parameter, 129
simple event, 22
standard deviation, 44
standard normal, 159
standardized variables, 160
statistic, 9, 263
subsets, 14
support, 108
support of a joint density function, 246
t distribution, 270, 287
the law of total probability, 37
transformation formula, 235
uncorrelated variables, 193
uniform discrete distribution, 53
uniform discrete random variable, 53
universal set, 13
variance, discrete, 44
z-score, 160
