
Statistics 111 - Lecture 6 Probability

Introduction to Probability, Conditional Probability and Random Variables


June 3, 2008 Stat 111 - Lecture 6 - Probability 1

Administrative Note
Homework 2 due Monday, June 8th
Look at the questions now!

Prepare to have your minds blown today


Course Overview
[Course overview diagram] Collecting Data, Exploring Data, Probability Intro., Inference
Exploring Data: Comparing Variables, Relationships between Variables
Inference: Means, Proportions, Regression, Contingency Tables


Why do we need Probability?


We have several graphical and numerical statistics for summarizing our data
We want to make probability statements about the significance of our statistics
Eg. In Stat 111, mean(height) = 66.7 inches
  What is the chance that the true mean height of Penn students is between 60 and 70 inches?
Eg. r = -0.22 for draft order and birthday
  What is the chance that the true correlation is significantly different from zero?


Deterministic vs. Random Processes


In deterministic processes, the outcome can be predicted exactly in advance
  Eg. Force = mass x acceleration. If we are given values for mass and acceleration, we know the value of force exactly

In random processes, the outcome is not known exactly, but we can still describe the probability distribution of possible outcomes
  Eg. 10 coin tosses: we don't know exactly how many heads we will get, but we can calculate the probability of getting a certain number of heads
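
The coin-toss example can be made concrete with a short sketch. It assumes a fair coin and independent tosses, and it uses the binomial formula, which the lecture has not introduced at this point; the function name `prob_heads` is made up for illustration.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n independent tosses, assuming P(head) = p."""
    # Binomial formula: (number of ways to place k heads) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Print the whole distribution for 10 tosses of a fair coin.
for k in range(11):
    print(k, round(prob_heads(k), 4))
```

We cannot say which count will come up, but the probabilities over all counts from 0 to 10 add up to 1.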

Events
An event is an outcome or a set of outcomes of a random process
Example: Tossing a coin three times
  Event A = getting exactly two heads = {HTH, HHT, THH}
Example: Picking a real number X between 1 and 20
  Event A = chosen number is at most 8.23 = {X ≤ 8.23}
Example: Tossing a fair die
  Event A = result is an even number = {2, 4, 6}

Notation: P(A) = Probability of event A
Probability Rule 1: 0 ≤ P(A) ≤ 1 for any event A

Sample Space
The sample space S of a random process is the set of all possible outcomes
Example: one coin toss
  S = {H, T}
Example: three coin tosses
  S = {HHH, HTH, HHT, TTT, HTT, THT, TTH, THH}
Example: roll a six-sided die
  S = {1, 2, 3, 4, 5, 6}
Example: Pick a real number X between 1 and 20
  S = all real numbers between 1 and 20

Probability Rule 2: The probability of the whole sample space is 1
  P(S) = 1



Combinations of Events
The complement A^c of an event A is the event that A does not occur
Probability Rule 3: P(A^c) = 1 - P(A)
The union of two events A and B is the event that either A or B or both occur
The intersection of two events A and B is the event that both A and B occur
[Figure: Venn diagrams of event A, the complement of A, the union of A and B, and the intersection of A and B]


Disjoint Events
Two events are called disjoint if they cannot happen at the same time
  Events A and B are disjoint means that the intersection of A and B is empty

Example: coin is tossed twice
  S = {HH, TH, HT, TT}
  Events A = {HH} and B = {TT} are disjoint
  Events A = {HH, HT} and B = {HH} are not disjoint

Probability Rule 4: If A and B are disjoint events then
  P(A or B) = P(A) + P(B)

Independent events
Events A and B are independent if knowing that A occurs does not affect the probability that B occurs

Example: tossing two coins
  Event A = first coin is a head
  Event B = second coin is a head
  A and B are independent

Disjoint events cannot be independent!
  If A and B cannot occur together (disjoint), then knowing that A occurs does change the probability that B occurs (it drops to zero)

Probability Rule 5: If A and B are independent
  P(A and B) = P(A) x P(B)
  (multiplication rule for independent events)
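
Rules 4 and 5 can be checked by brute force on the two-coin sample space. This is a minimal sketch that assumes all four outcomes are equally likely; the helper `P` and the event names are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: ('H','H'), ('H','T'), ('T','H'), ('T','T')
S = set(product("HT", repeat=2))

def P(event):
    """Probability of an event (a subset of S), assuming equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # first coin is a head
B = {s for s in S if s[1] == "H"}   # second coin is a head
print(P(A & B) == P(A) * P(B))      # independent: multiplication rule holds

C = {("H", "H")}
D = {("T", "T")}
print(P(C | D) == P(C) + P(D))      # disjoint: addition rule holds
print(P(C & D), P(C) * P(D))        # disjoint but not independent: 0 versus 1/16
```

The last line shows why disjoint events with nonzero probability fail the multiplication rule: their intersection has probability 0, while the product of their probabilities does not.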

Equally Likely Outcomes Rule


If all possible outcomes from a random process have the same probability, then
  P(A) = (# of outcomes in A)/(# of outcomes in S)

Example: one die tossed
  P(even number) = |{2, 4, 6}| / |{1, 2, 3, 4, 5, 6}| = 3/6 = 1/2
Note: the equally likely outcomes rule only works if the number of outcomes is finite
  Eg. it does not apply when sampling any real number between 0 and 1: it is impossible to count all the possible values!
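
As a sketch of the equally likely outcomes rule, the counting can be done directly on small sample spaces. The function name `prob` is an assumption made for this example, and the rule is only valid here because both sample spaces are finite with equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

def prob(event, sample_space):
    """Equally likely outcomes rule: (# of outcomes in A) / (# of outcomes in S)."""
    return Fraction(len(event), len(sample_space))

# One die tossed: S = {1, ..., 6}, A = result is an even number
die = set(range(1, 7))
even = {x for x in die if x % 2 == 0}
print(prob(even, die))  # 1/2

# Three coin tosses: A = exactly two heads
tosses = {"".join(t) for t in product("HT", repeat=3)}
two_heads = {t for t in tosses if t.count("H") == 2}
print(sorted(two_heads), prob(two_heads, tosses))  # ['HHT', 'HTH', 'THH'] 3/8
```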


Combining Probability Rules Together


Initial screening for HIV in the blood first uses an enzyme immunoassay test (EIA)
Even if an individual is HIV-negative, EIA has probability of 0.006 of giving a positive result
Suppose 100 people are tested who are all HIV-negative. What is the probability that at least one will show positive on the test?
First, use the complement rule:
  P(at least one positive) = 1 - P(all negative)


Combining Probability Rules Together


Now, we assume that each individual is independent and use the multiplication rule for independent events:
  P(all negative) = P(test 1 negative) x ... x P(test 100 negative)

P(test negative) = 1 - P(test positive) = 0.994
  P(all negative) = 0.994 x ... x 0.994 = (0.994)^100

So, finally, we have
  P(at least one positive) = 1 - (0.994)^100 = 0.452
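
The arithmetic for the EIA example can be reproduced in a few lines. This sketch uses only the numbers given on the slide and the same independence assumption; the variable names are illustrative.

```python
p_false_positive = 0.006   # P(test positive) for an HIV-negative person
n_people = 100             # all assumed HIV-negative and independent of each other

# Multiplication rule for independent events
p_all_negative = (1 - p_false_positive) ** n_people

# Complement rule
p_at_least_one_positive = 1 - p_all_negative

print(round(p_at_least_one_positive, 3))  # 0.452
```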

Curse of the Bambino:


Boston Red Sox traded Babe Ruth after 1918 and did not win a World Series again until 2004 (86 years later)
What are the chances that a team will go 86 years without winning a World Series?
Simplifying assumptions:
  Baseball has always had 30 teams
  Each team has equal chance of winning each year

Curse of the Bambino


With 30 teams that are equally likely to win in a year, we have
  P(no WS in a year) = 29/30 = 0.967
If we also assume that each year is independent, we can use the multiplication rule
  P(no WS in 86 years) = P(no WS in year 1) x ... x P(no WS in year 86)
    = (29/30) x ... x (29/30) = (29/30)^86 = 0.05 (only 5% chance!)
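
A quick numerical check of this calculation, under the same simplifying assumptions (30 equally likely teams, independent years), is sketched below. It also shows that rounding 29/30 to 0.97 before raising it to the 86th power shifts the answer noticeably, so the exact fraction should be kept until the end. Variable names are illustrative.

```python
n_teams = 30
n_years = 86

# P(a given team does not win in one year), kept unrounded
p_no_win_one_year = (n_teams - 1) / n_teams

# Multiplication rule over 86 independent years
p_drought = p_no_win_one_year ** n_years
print(f"{p_drought:.3f}")  # roughly 0.05

# Same calculation starting from the rounded value 0.97
p_drought_rounded_input = 0.97 ** n_years
print(f"{p_drought_rounded_input:.3f}")  # roughly 0.07
```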

Break


Outline
Moore, McCabe and Craig: Section 4.3, 4.5
Conditional Probability
Discrete Random Variables
Continuous Random Variables
Properties of Random Variables
  Means of Random Variables
  Variances of Random Variables
June 4, 2008 Stat 111 - Lecture 6 - Random Variables 17

Conditional Probabilities
The notion of conditional probability can be found in many different types of problems
Eg. imperfect diagnostic test for a disease

             Test +   Test -   Total
Disease +      30       10       40
Disease -      10       50       60
Total          40       60      100

What is the probability that a person has the disease?
  Answer: 40/100 = 0.4
What is the probability that a person has the disease given that they tested positive?
  More complicated!

Definition: Conditional Probability


Let A and B be two events in sample space S
The conditional probability that event B occurs given that event A has occurred is:

  P(B | A) = P(A and B) / P(A)

Eg. probability of disease given test positive
  P(disease + | test +) = P(disease + and test +) / P(test +) = (30/100)/(40/100) = 0.75
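
The diagnostic-test example can be worked through from the cell counts of the 2x2 table (30, 10, 10, 50 out of 100 people). This is a sketch only; the dictionary layout and the helper `P` are assumptions made for illustration.

```python
from fractions import Fraction

# Counts from the diagnostic-test table: (disease status, test result) -> people
counts = {
    ("disease+", "test+"): 30,
    ("disease+", "test-"): 10,
    ("disease-", "test+"): 10,
    ("disease-", "test-"): 50,
}
total = sum(counts.values())

def P(condition):
    """Probability that a randomly chosen person falls in a cell satisfying `condition`."""
    return Fraction(sum(n for cell, n in counts.items() if condition(cell)), total)

p_disease = P(lambda cell: cell[0] == "disease+")
p_test_pos = P(lambda cell: cell[1] == "test+")
p_both = P(lambda cell: cell == ("disease+", "test+"))

# Definition of conditional probability: P(disease+ | test+) = P(both) / P(test+)
p_disease_given_pos = p_both / p_test_pos
print(p_disease, p_disease_given_pos)  # 2/5 3/4
```

Knowing the test result raises the probability of disease from 0.4 to 0.75, so the two events are not independent.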


Independent vs. Non-independent Events


If A and B are independent, then

  P(A and B) = P(A) x P(B)

which means that the conditional probability is:
  P(B | A) = P(A and B) / P(A) = P(A) P(B) / P(A) = P(B)
We have a more general multiplication rule for events that are not independent:
  P(A and B) = P(B | A) P(A)

Random variables
A random variable is a numerical outcome of a random process or random event
Example: three tosses of a coin
  S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}
  Random variable X = number of observed tails
  Possible values for X = {0, 1, 2, 3}

Why do we need random variables?
  We use them as a model for our observed data


Discrete Random Variables


A discrete random variable has a finite or countable number of distinct values
Discrete random variables can be summarized by listing all values along with the probabilities
  Called a probability distribution

Example: number of members in US families

X       2      3      4      5      6      7
P(X)  0.413  0.236  0.211  0.090  0.032  0.018


Another Example
Random variable X = the sum of two dice
X takes on values from 2 to 12
Use the equally likely outcomes rule to calculate the probability distribution:

X              2     3     4     5     6     7     8     9     10    11    12
# of Outcomes  1     2     3     4     5     6     5     4     3     2     1
P(X)           1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

If a discrete r.v. takes on many values, it is better to use a probability histogram



Probability Histograms
Probability histogram of sum of two dice:

Using the disjoint addition rule, probabilities for discrete random variables are calculated by adding up the bars of this histogram:
P(sum > 10) = P(sum = 11) + P(sum = 12) = 3/36
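
The distribution of the sum of two dice, and the probability obtained by adding up bars, can be rebuilt by enumeration. This sketch assumes two fair six-sided dice; the names `dist` and `p_sum_gt_10` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of X = sum of two dice
dist = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

# Disjoint addition rule: add the bars for X = 11 and X = 12
p_sum_gt_10 = sum(p for x, p in dist.items() if x > 10)
print(p_sum_gt_10)  # 1/12, which is 3/36 in lowest terms
```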

Continuous Random Variables


Continuous random variables have an uncountable number of values
Can't list the entire probability distribution, so we use a density curve instead of a histogram
Eg. Normal density curve


Calculating Continuous Probabilities


Discrete case: add up bars from probability histogram
Continuous case: we have to use integration to calculate the area under the density curve

Although it seems more complicated, it is often easier to integrate than add up discrete bars
If a discrete r.v. has many possible values, we often treat that variable as continuous instead
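
To illustrate "area under the density curve", the sketch below integrates a standard normal density numerically between -1 and 1 and compares the result with the closed-form value from the error function. The choice of the standard normal, the interval, and the trapezoid rule are all assumptions made for this example, not something the slides prescribe.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the trapezoid rule on n strips."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

approx = area(normal_pdf, -1.0, 1.0)   # P(-1 < Z < 1) by numerical integration
exact = erf(1 / sqrt(2))               # same probability via the error function
print(round(approx, 4), round(exact, 4))
```

Both values come out near 0.68, the familiar share of a normal distribution within one standard deviation of its mean.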

Example: Normal Distribution


We will use the normal distribution throughout this course for two reasons:
  1. It is usually a good approximation to real data
  2. We have tables of calculated areas under the normal curve, so we avoid doing integration!


Mean of a Random Variable


Probability-weighted average of all possible values of a random variable (often called expected value)
Notation: we don't want to confuse random variables with our collected data variables
  μ = mean of random variable
  x̄ = mean of a data variable
For continuous r.v., we again need integration to calculate the mean
For discrete r.v., we can calculate the mean by hand since we can list all probabilities

Mean of Discrete random variables


Mean is the sum of all possible values, with each value weighted by its probability:
  μ = Σ xi*P(xi) = x1*P(x1) + ... + xn*P(xn), where x1, ..., xn are the possible values
Example: X = sum of two dice

X     2     3     4     5     6     7     8     9     10    11    12
P(X)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

  μ = 2 (1/36) + 3 (2/36) + 4 (3/36) + ... + 12 (1/36) = 252/36 = 7
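
The weighted sum for the two-dice example can be verified with exact fractions. This is a minimal sketch assuming two fair dice; `dist` and `mean` are names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sums of two fair dice
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

# Probability distribution: value -> probability
dist = {x: Fraction(outcomes.count(x), len(outcomes)) for x in range(2, 13)}

# Mean = sum of each value times its probability
mean = sum(x * p for x, p in dist.items())
print(mean)  # 7
```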


Variance of a Random Variable


Spread of all possible values of a random variable around its mean
Again, we don't want to confuse random variables with our collected data variables:
  σ^2 = variance of random variable
  s^2 = variance of a data variable
For continuous r.v., again need integration to calculate the variance
For discrete r.v., can calculate the variance by hand since we can list all probabilities

Variance of Discrete r.v.s


Variance is the sum of the squared deviations away from the mean of all possible values, weighted by the value's probability:
  σ^2 = Σ (xi - μ)^2*P(xi) = (x1 - μ)^2*P(x1) + ... + (xn - μ)^2*P(xn)

Example: X = sum of two dice

X     2     3     4     5     6     7     8     9     10    11    12
P(X)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

  σ^2 = (2 - 7)^2 (1/36) + (3 - 7)^2 (2/36) + ... + (12 - 7)^2 (1/36) = 210/36 = 5.83
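
The variance of the sum of two dice can be checked the same way, again with exact fractions and assuming two fair dice. The names below are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probability distribution of the sum of two fair dice
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
dist = {x: Fraction(outcomes.count(x), len(outcomes)) for x in range(2, 13)}

# Mean: probability-weighted sum of the values
mean = sum(x * p for x, p in dist.items())

# Variance: probability-weighted sum of squared deviations from the mean
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
print(variance, round(float(variance), 2))  # 35/6 5.83
```

Note that 35/6 is 210/36 in lowest terms.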



Next Class - Lecture 7


Standardization and the Normal Distribution
Moore and McCabe: Section 4.3, 1.3

