
SOC 1100 Introductory Statistics for Social Research


Professor Elizabeth Fussell
Mencoff Hall, Room 204
Office Hours: T 12:00-2:00
Learning goals for Chapter 5
Understand probability rules
Understand how to count sample spaces and events
Probability functions
Binomial probability mass function
Probability density function
The normal distribution
How to calculate a z-score and use a z-table
Probability theory
Probability theory is the basis for statistical inference
Probabilistic reasoning: from known population characteristics, deduce sample characteristics
Statistical inference inverts probabilistic reasoning
Statistical inference: from known sample characteristics, infer population characteristics
[Figure: under probabilistic reasoning, the population values ($\mu$, $\sigma^2$) are known, and the sample estimates ($\bar{x}$, $s^2$) should be close to them; under statistical inference, the sample values ($\bar{x}$, $s^2$) are observed (known), and the population values ($\mu$, $\sigma^2$) are estimated with a margin of error]
Probability theory
Probability theory is used to generate a distribution of probabilities associated with all possible values of a variable
Different types of variables produce different distributions
Use a probability mass function for discrete variables (0, 1, 2, 3, 4, 5, ...)
Example: the binomial probability mass function
Use a probability density function for continuous variables
Example: the normal probability density function
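As a rough illustration of the two kinds of probability function, the Python sketch below evaluates a binomial probability mass function and a normal probability density function using only the standard library. The helper names `binomial_pmf` and `normal_pdf` are hypothetical, and the 4-flip coin example is an assumption chosen for illustration, not taken from the slides.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal density N(mu, sigma**2) at x (a density, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Discrete variable: number of heads in 4 fair coin flips (values 0..4).
pmf = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(pmf)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]

# Continuous variable: height of the standard normal density at its mean.
print(round(normal_pdf(0.0), 4))  # 0.3989
```

The mass function assigns a probability to each discrete value, and those probabilities sum to 1; the density function only gives a height, and probabilities for a continuous variable come from areas under that curve.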
We compare a sample mean against a probability distribution to determine the probability of obtaining that specific value
This probability is expressed as a p-value
What is a probability?
Parts of a probability
A success is a desired outcome
An event is the outcome of a trial
A trial is the experiment or observation
A probability is the ratio of the number of successes to the number of possible events in a trial

In one fair coin flip (trial), the chance of heads (success) is 1/2; the event is the actual outcome
In one fair roll of a die (trial), the chance of getting a snake eye, a one (success), is 1/6; the event is the actual outcome
Rules of probability: (1) Sample space
The collection of all possible events, $E_1, \dots, E_n$, in a trial is called the sample space ($S$)
$S = \{H, T\}$
A fair coin flip has a sample space of 2 equally likely outcomes
$S = \{1, 2, 3, 4, 5, 6\}$
A fair die toss has a sample space of 6 equally likely events
Rules of probability: (2) Probability bounds
A probability is the ratio of the count of ways a success can be
obtained out of a given number of possible outcomes (events)
Probabilities are bounded by 0 and 1
$0 \le \text{number of successes} \le \text{number of events}$, so $0 \le P(E) \le 1$
An impossible outcome has a probability of 0
A certain outcome has a probability of 1
A fair coin flip has a sample space of 2 equally likely outcomes
1 possible success out of 2 possible outcomes forms a ratio of 1/2
A fair die toss has a sample space of 6 equally likely events
1 possible success out of 6 possible outcomes forms a ratio of 1/6
Rules of probability: (3) Some event must occur
If the sample space is all possible events that can occur in a trial, one of the events must occur if we go through with the trial
$P(S) = 1$
The probability that either heads or tails occurs, given that a coin flip occurs, is 1.
The probability that the die lands on 1, 2, 3, 4, 5, or 6, given a roll of the die, is 1.
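A minimal sketch of Rules 1 to 3, assuming equally likely outcomes: the sample space is a Python set, and a probability is the count of successes divided by the size of the sample space. The helper `probability` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction

def probability(successes: set, sample_space: set) -> Fraction:
    """Classical probability: (# successes) / (# equally likely outcomes)."""
    return Fraction(len(successes & sample_space), len(sample_space))

coin = {"H", "T"}           # Rule 1: sample space of a fair coin flip
die = {1, 2, 3, 4, 5, 6}    # Rule 1: sample space of a fair die

print(probability({"H"}, coin))  # 1/2
print(probability({1}, die))     # 1/6 (a snake eye)
print(probability(set(), die))   # 0, an impossible outcome
print(probability(die, die))     # 1, Rule 3: some event must occur

# Rule 2: every single event's probability lies between 0 and 1
assert all(0 <= probability({e}, die) <= 1 for e in die)
```

Using `Fraction` keeps the ratios exact, so the printed values match the fractions on the slides rather than rounded decimals.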
Venn diagram
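To conceptualize a probability, use a Venn diagram
E is the event that an outcome occurs; its probability is P(E)
~E is the complement of E, the event that the outcome does not occur; its probability is P(~E) = 1 - P(E)
[Figure: Venn diagram of event E and its complement ~E]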
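Venn diagram
Two events, A and B, may be represented
~ indicates not
$\cap$ indicates the intersection of two events: $A \cap B$ is the blue area in the top figure, and $\sim A \cap \sim B$ is the area outside of the circles
$\cup$ indicates the union of two events: $A \cup B$ is the entire blue area in the bottom figure
[Figure: two overlapping circles A and B; the top panel shades $A \cap B$, the bottom panel shades $A \cup B$]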
Contingency tables
The information in a Venn diagram can also be given in a contingency table

         A                  Not A
B        $A \cap B$         $\sim A \cap B$
Not B    $A \cap \sim B$    $\sim A \cap \sim B$

A two-by-two table shows the relationship of two binary (two-category) variables
Each cell shows how the two binary variables can be combined or divided
Cells can be filled in with counts or probabilities
The probabilities sum to 1.0 (Rule 3)
Rules of probability: (4) Joint probability, independent events

The joint probability that two events both occur is equal to the product of their probabilities, if they are independent:
$P(A, B) = P(A) \cdot P(B)$
The probability of getting heads on flip 1 is not related to the probability of getting heads on flip 2. Therefore:
$P(H_1, H_2) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

         H1     Not H1   Total
H2       0.25   0.25     0.5
Not H2   0.25   0.25     0.5
Total    0.5    0.5      1.0
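The product rule can be checked by brute-force enumeration of the sample space. This sketch assumes two fair, independent flips, so each of the four ordered outcomes has probability 1/4; the helper `p` is a hypothetical name for illustration.

```python
from fractions import Fraction
from itertools import product

# Ordered sample space of two fair coin flips: (flip 1, flip 2)
space = list(product("HT", repeat=2))   # HH, HT, TH, TT
p_outcome = Fraction(1, len(space))     # each outcome equally likely: 1/4

def p(event) -> Fraction:
    """Probability of an event, given as a predicate on an outcome."""
    return sum((p_outcome for o in space if event(o)), Fraction(0))

p_h1 = p(lambda o: o[0] == "H")          # P(H1)
p_h2 = p(lambda o: o[1] == "H")          # P(H2)
p_h1_h2 = p(lambda o: o == ("H", "H"))   # joint P(H1, H2)

print(p_h1, p_h2, p_h1_h2)     # 1/2 1/2 1/4
print(p_h1_h2 == p_h1 * p_h2)  # True, so Rule 4 holds for the two flips
```

Each of the four cells in the 2x2 table corresponds to one outcome in `space`, which is why every cell is 0.25.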
What is the probability that a family with two children
has two girls?
i-Clicker: What is the sample
space?
A) S={BB, BG, GG}
B) S={BB, GG}
C) S={BB, BG, GB, GG}
D) S={B, G}
What is the probability that a family with two children has two girls?
i-Clicker: What is the probability the family has two girls?
A) 1
B) 1/2
C) 1/3
D) 1/4
If the sample space is S={BB, BG, GB, GG}, only one event involves two girls: GG
Since having a boy or a girl is equally likely in each birth, and births are independent, there is only one event in which both children are girls, so:
$P(G, G) = P(G) \cdot P(G) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
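Enumerating the ordered sample space makes the answer concrete. The sketch assumes, as the slide does, that a boy and a girl are equally likely in each birth and that births are independent.

```python
from fractions import Fraction
from itertools import product

# Ordered sample space (first child, second child); B = boy, G = girl
space = ["".join(kids) for kids in product("BG", repeat=2)]
print(space)  # ['BB', 'BG', 'GB', 'GG']

# Assumes each birth is B or G with probability 1/2, independently of the other,
# so the four ordered outcomes are equally likely.
p_two_girls = Fraction(space.count("GG"), len(space))
print(p_two_girls)  # 1/4
```

Collapsing BG and GB into one outcome, as in S={BB, BG, GG}, leaves three outcomes that are not equally likely, which is why 1/3 is the wrong answer.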
Independent and dependent events
Two events are independent if the outcome of one does not depend on the outcome of the other
The probability of A is the same regardless of whether B occurs
P(A,B) = P(A)*P(B)
Two events are dependent if the outcome of one depends on the outcome of the other
The distribution of A depends on the distribution of B
P(A,B) ≠ P(A)*P(B)
Independent and dependent events
i-Clicker: Is agreement that the government spends too little on urban problems independent of the respondent's sex?
A) Yes
B) No
C) Cannot determine

Key: frequency (cell percentage)

                   Spending on urban problems
Respondent's sex   Too little    About right or too much   Total
Male               245 (21.32)   288 (25.07)               533 (46.39)
Female             299 (26.02)   317 (27.59)               616 (53.61)
Total              544 (47.35)   605 (52.65)               1,149 (100.00)
The two events are approximately independent, for the following reason:
P(male) = 0.464
P(too little) = 0.474
P(male)*P(too little) = 0.220
P(male, too little) = 0.213
0.220 is pretty close to 0.213
Virtually no overlap of P(male) and P(too little)
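The comparison above can be reproduced directly from the table counts. This is a small sketch; the variable names are chosen for illustration.

```python
# Counts from the urban-problems table (rows: sex; columns: "too little", other)
male_too_little, male_other = 245, 288
female_too_little, female_other = 299, 317
n = male_too_little + male_other + female_too_little + female_other  # 1,149

p_male = (male_too_little + male_other) / n                # marginal P(male)
p_too_little = (male_too_little + female_too_little) / n   # marginal P(too little)
p_joint = male_too_little / n                              # joint P(male, too little)

print(f"P(male)*P(too little) = {p_male * p_too_little:.3f}")  # 0.220
print(f"P(male, too little)   = {p_joint:.3f}")                 # 0.213
```

Judging how close the product must be to the joint probability before we call the events independent is the question a chi-square test answers.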
Key: frequency (expected frequency)

                   Spending on urban problems
Respondent's sex   Too little    About right or too much   Total
Male               245 (252.4)   288 (280.6)               533 (533.0)
Female             299 (291.6)   317 (324.4)               616 (616.0)
Total              544 (544.0)   605 (605.0)               1,149 (1,149.0)

Pearson chi2(1) = 0.7587   Pr = 0.384

Compare observed frequencies with those expected if the distributions were the same as the marginals
Also, notice the chi-square value and its associated probability of 0.384
But we get ahead of ourselves.


Rules of probability: (5) Conditional probability
If two events, A and B, are dependent, the conditional probability of A given B is:
$P(A \mid B) = \frac{P(A, B)}{P(B)}$, or equivalently $P(A, B) = P(A \mid B) \cdot P(B)$
What is the probability a randomly selected person is male given that they own a pet?
$P(M \mid PO) = \frac{P(M, PO)}{P(PO)} = \frac{.41}{.86} = 0.48$

         Pet owner   Not a pet owner   Total
Male     .41         .08               .49
Female   .45         .06               .51
Total    .86         .14               1.00
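A minimal sketch of the conditional-probability formula applied to the pet-ownership table; the dictionary layout and the helper names `p_owner` and `p_sex_given_owner` are hypothetical choices for this illustration.

```python
# Joint probabilities P(sex, ownership) from the pet-ownership table
joint = {
    ("male", "pet"): 0.41, ("male", "no pet"): 0.08,
    ("female", "pet"): 0.45, ("female", "no pet"): 0.06,
}

def p_owner(status: str) -> float:
    """Marginal P(status): sum the joint probabilities over sex."""
    return sum(p for (_, own), p in joint.items() if own == status)

def p_sex_given_owner(sex: str, status: str) -> float:
    """Conditional probability P(sex | status) = P(sex, status) / P(status)."""
    return joint[(sex, status)] / p_owner(status)

p_m_given_po = p_sex_given_owner("male", "pet")
print(round(p_m_given_po, 2))  # 0.48

# Equivalent form: P(M, PO) = P(M | PO) * P(PO)
assert abs(p_m_given_po * p_owner("pet") - joint[("male", "pet")]) < 1e-12
```

Dividing by P(PO) restricts attention to the pet-owner column, so the conditional probabilities within that column sum to 1.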
Rules of probability: (5) Conditional probability
i-Clicker: Is the probability of agreeing that the government spends too little on space exploration independent of the respondent's sex?
A) Yes
B) No
C) Cannot determine

Key: frequency (cell percentage)

                   Spending on space exploration program
Respondent's sex   Too little    About right or too much   Total
Male               179 (15.57)   373 (32.43)               552 (48.00)
Female             119 (10.35)   479 (41.65)               598 (52.00)
Total              298 (25.91)   852 (74.09)               1,150 (100.00)
The two events are not independent, for the following reason:
P(male) = 0.480
P(too little) = 0.259
P(male)*P(too little) = 0.124
P(male, too little) = 0.156
0.156 is not close to 0.124
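Another way to see the dependence is to compare conditional probabilities across rows: if the two events were independent, P(too little | male) and P(too little | female) would both be close to the marginal P(too little). A sketch computing them from the counts; the dictionary layout is an assumption for illustration.

```python
# Counts from the space-exploration table: sex -> {response: count}
counts = {
    "male": {"too little": 179, "about right or too much": 373},
    "female": {"too little": 119, "about right or too much": 479},
}
n = sum(sum(row.values()) for row in counts.values())                 # 1,150
p_too_little = sum(row["too little"] for row in counts.values()) / n  # marginal

p_too_little_given = {}
for sex, row in counts.items():
    # P(too little | sex) = P(sex, too little) / P(sex) = cell count / row total
    p_too_little_given[sex] = row["too little"] / sum(row.values())

print(round(p_too_little, 3), {s: round(p, 3) for s, p in p_too_little_given.items()})
# 0.259 {'male': 0.324, 'female': 0.199}
```

Men are noticeably more likely than women to say "too little," so the conditional probability depends on sex, which is what dependence means.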
Key: frequency (expected frequency)

                   Spending on space exploration program
Respondent's sex   Too little    About right or too much   Total
Male               179 (143.0)   373 (409.0)               552 (552.0)
Female             119 (155.0)   479 (443.0)               598 (598.0)
Total              298 (298.0)   852 (852.0)               1,150 (1,150.0)

Pearson chi2(1) = 23.4659   Pr = 0.000

Also, the expected values of the cells, based on their marginal distributions, are not close to the observed values.
So, the chi-square test gives a p-value of 0.000 (that is, p < 0.001)
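The expected frequencies and Pearson statistics for the urban-problems and space-exploration tables can be recomputed by hand. The sketch below is a minimal standard-library version for 2x2 tables, assuming the uncorrected Pearson statistic (no continuity correction), which is what the reported values correspond to; the function name `pearson_chi2_2x2` is hypothetical. For 1 degree of freedom it uses the identity P(chi2 > x) = erfc(sqrt(x / 2)).

```python
import math

def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-square statistic and its 1-df p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # A chi-square with 1 df is a squared standard normal, so its tail is erfc(sqrt(x/2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

urban = [[245, 288], [299, 317]]  # rows: male, female; cols: too little, other
space = [[179, 373], [119, 479]]

results = {"urban": pearson_chi2_2x2(urban), "space": pearson_chi2_2x2(space)}
for name, (chi2, p) in results.items():
    print(f"{name}: chi2(1) = {chi2:.4f}, Pr = {p:.3f}")
```

A small statistic with a large p-value (urban problems) is consistent with independence, while a large statistic with a tiny p-value (space exploration) indicates dependence.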
Rules of probability: (6) union of two events
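The probability that, of two events A and B, at least one will occur is:
$P(A \cup B) = P(A) + P(B) - P(A, B)$
[Figure: Venn diagram with $A \cup B$ shaded; the region outside both circles is $\sim A \cap \sim B$]
What is the probability that a randomly selected person is either male or owns a pet?
$P(M \cup PO) = 0.49 + 0.86 - 0.41 = 0.94$

         Pet owner   Not a pet owner   Total
Male     .41         .08               .49
Female   .45         .06               .51
Total    .86         .14               1.00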
Rules of probability: (6) union of two events
i-Clicker: What is the probability that a respondent agrees that the government spends too little on urban problems or space exploration?
A) 0.126
B) 0.261
C) 0.465
D) 0.600
E) 1.000

Key: frequency (cell percentage)

                                        Spending on urban problems
Spending on space exploration program   Too little    About right or too much   Total
Too little                              133 (12.58)   143 (13.53)               276 (26.11)
About right or too much                 358 (33.87)   423 (40.02)               781 (73.89)
Total                                   491 (46.45)   566 (53.55)               1,057 (100.00)
Answer: D) 0.600
P(too little space) = 0.261
P(too little city) = 0.465
P(too little space, too little city) = 0.126
P(too little space) + P(too little city) - P(both) = 0.600
Or:
P(~too little space, ~too little city) = 0.400
P(too little space or too little city) = 1 - 0.400 = 0.600
The probability of a respondent agreeing with one or the other or both is 0.600
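Both routes to the union can be checked exactly from the cell counts. This sketch uses `Fraction` so the two answers can be compared for exact equality; the `counts` layout and the helper `prob` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Counts from the space-by-urban table: (space response, urban response) -> count
counts = {
    ("too little", "too little"): 133,
    ("too little", "other"): 143,
    ("other", "too little"): 358,
    ("other", "other"): 423,
}
n = sum(counts.values())  # 1,057

def prob(predicate) -> Fraction:
    """Exact probability of the cells (space, urban) that satisfy `predicate`."""
    return Fraction(sum(c for cell, c in counts.items() if predicate(*cell)), n)

p_space = prob(lambda space, urban: space == "too little")
p_urban = prob(lambda space, urban: urban == "too little")
p_both = prob(lambda space, urban: space == urban == "too little")

union_by_addition = p_space + p_urban - p_both                            # Rule 6
union_by_complement = 1 - prob(lambda space, urban: space == urban == "other")

print(union_by_addition == union_by_complement, round(float(union_by_addition), 3))
# True 0.6
```

The complement route works because "at least one says too little" is the opposite of "neither says too little," the single bottom-right cell.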
Law of total probability
The partition of a sample space leads to the partition of any event A into mutually exclusive parts
If B and ~B partition the sample space, S, then P(A) = P(A,B) + P(A,~B)
There is no joint probability of being in state B and ~B: P(B,~B) = 0
[Figure: sample space divided into B and ~B, with event A split into (A, B) and (A, ~B)]
A person has a disease, P(B), or does not have a disease, P(~B)
The probability of testing positive for the disease is P(A) and depends on whether the person has the disease, P(B)
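The pet-ownership table gives a quick check of the law of total probability: taking A = male and B = pet owner, P(male) = P(male, pet owner) + P(male, not a pet owner). A sketch using exact fractions; the helper `total_probability` is a hypothetical name.

```python
from fractions import Fraction

# Pet-ownership table as exact joint probabilities, P(sex, ownership)
joint = {
    ("male", "pet"): Fraction(41, 100), ("male", "no pet"): Fraction(8, 100),
    ("female", "pet"): Fraction(45, 100), ("female", "no pet"): Fraction(6, 100),
}

def total_probability(sex: str) -> Fraction:
    """P(sex) = P(sex, pet) + P(sex, no pet); 'pet' and 'no pet' partition S."""
    return joint[(sex, "pet")] + joint[(sex, "no pet")]

print(total_probability("male"), total_probability("female"))  # 49/100 51/100
```

These sums reproduce the row margins of the table (.49 and .51).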
Bayes Theorem
Evaluates the strength of evidence. It asks: should we revise our probabilities when we learn a new fact or observe new evidence?
A test result for a disease is different from having the disease
If a person tests positive for a disease (new evidence), should they revise their prior conclusion that they did not have the disease?
If they know P(A|B), what is P(B|A)?
Bayes Theorem reverses a conditional probability:
$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$
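If we know P(+|D), what about P(D|+)?
Bayes theorem:
$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}$
Law of total probability:
$P(+) = P(+ \mid D)\,P(D) + P(+ \mid \sim D)\,P(\sim D)$
Multiply Bayes theorem by P(+):
$P(D \mid +)\,P(+) = \frac{P(+ \mid D)\,P(D)}{P(+)}\,P(+)$
Observe that both sides are joint probabilities written in reverse:
$P(D \mid +)\,P(+) = P(+ \mid D)\,P(D)$
Divide both sides by P(+) to isolate P(D|+):
$\frac{P(D \mid +)\,P(+)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{P(+)}$, so $P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}$
Substitute the law of total probability for P(+):
$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \sim D)\,P(\sim D)}$
Given: P(D) = 0.0001, P(~D) = 0.9999, P(+|D) = 0.9, P(-|D) = 0.1, P(+|~D) = 0.1, P(-|~D) = 0.9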
Solve the equation to find P(D|+):
$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \sim D)\,P(\sim D)} = \frac{(0.9)(0.0001)}{(0.9)(0.0001) + (0.1)(0.9999)}$
$= \frac{0.00009}{0.00009 + 0.09999} = \frac{0.00009}{0.10008} = 0.0009$
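The same calculation as a small Python function. The function and parameter names (`posterior`, `prior`, `sensitivity`, `false_positive_rate`) are hypothetical labels for P(D), P(+|D), and P(+|~D).

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(D | +) by Bayes' theorem.

    prior               = P(D)
    sensitivity         = P(+ | D)
    false_positive_rate = P(+ | ~D)
    """
    # Law of total probability: P(+) = P(+|D)P(D) + P(+|~D)P(~D)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.0001, sensitivity=0.9, false_positive_rate=0.1)
print(f"{p:.4f}")  # 0.0009
```

Because the disease is so rare, almost all positive results come from the much larger ~D group, so a positive test raises the probability of disease only to about 0.09%.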
How to count
Computing a probability involves:
Counting successes (numerator)
Counting the size of sample spaces (denominator)

The denominator is a count of combinations and permutations
Combinations are all possible selections of k events out of n, regardless of order:
$C(n, k) = \frac{n!}{k!\,(n-k)!}$
Permutations are all possible arrangements of k events out of n, in which order matters:
$P(n, k) = \frac{n!}{(n-k)!}$
Both formulas assume sampling without replacement from a finite population
Once an item is selected, it is not possible to select it again
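A sketch of both counting formulas written out with factorials and checked against Python's built-in `math.comb` and `math.perm` (available since Python 3.8). The function names `combinations` and `permutations` here are local helpers, not the `itertools` functions of the same name, and the "2 of 4" example is illustrative.

```python
import math

def combinations(n: int, k: int) -> int:
    """C(n, k) = n! / (k! (n - k)!): unordered selections without replacement."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

def permutations(n: int, k: int) -> int:
    """P(n, k) = n! / (n - k)!: ordered selections without replacement."""
    return math.factorial(n) // math.factorial(n - k)

# Choosing 2 of 4 people: 6 unordered pairs, 12 ordered pairs (each pair in 2 orders)
print(combinations(4, 2), permutations(4, 2))  # 6 12
assert combinations(4, 2) == math.comb(4, 2) and permutations(4, 2) == math.perm(4, 2)
```

The ratio P(n, k) / C(n, k) = k! counts the orderings of each selection, which is the only difference between the two formulas.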
Counting sample space and successes
What is the probability of rolling a 7 with a pair of dice?
Order of dice matters
Sample space for two dice: S={(1,1), (1,2), ..., (6,5), (6,6)}
Or 6*6=36
Successes: {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
Probability is ratio of successes to sample space
6 ways to get a success
36 events in sample space
Probability = 1/6
Counting sample space and successes
What is the probability of rolling a 7 with a pair of dice?
Sample space for the sum of two dice (order doesn't matter): S={2,3,4,5,6,7,8,9,10,11,12}
Count the number of ways of obtaining each of these sums:
1 way to get 2: (1,1)
2 ways to get 3: (1,2) or (2,1)
3 ways to get 4: (1,3), (2,2), (3,1)
4 ways to get 5: (1,4), (2,3), (3,2), (4,1)
Etc.
Map probabilities for two dice:
P(E)={1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36}
P(E)={1/36, 1/18, 1/12, 1/9, 5/36, 1/6, 5/36, 1/9, 1/12, 1/18, 1/36}
Note that the probabilities are not equal
Probabilities cluster around a central value
[Figure: bar chart "Probability of sum of two dice", showing the probability (0 to 0.18) of each sum from 2 to 12]
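The whole distribution, including the answer for a sum of 7, can be produced by enumerating the 36 ordered outcomes and counting how many give each sum. A sketch assuming two fair, independent dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ordered outcomes of two fair dice: 6 * 6 = 36 equally likely pairs
outcomes = list(product(range(1, 7), repeat=2))
ways = Counter(a + b for a, b in outcomes)  # number of pairs giving each sum

distribution = {s: Fraction(ways[s], len(outcomes)) for s in range(2, 13)}

print(ways[7], len(outcomes), distribution[7])  # 6 36 1/6
print(sum(distribution.values()))               # 1
```

Counting ordered pairs is what makes each outcome equally likely; the sums themselves are not equally likely, which is why the distribution peaks at 7.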
Review of items
Probability and its rules
Sample space
Event
Joint probability of independent events
Joint probability of non-independent events
Conditional probability
Law of total probability
Bayes Theorem
Venn diagram
Combinations and permutations
Sampling with and without replacement
Conditional probability
