Review
Re-expressing Data: Getting It Straight!
Copyright © 2016, 2014, 2012, 2009 Pearson Education, Inc. 3
Straight to the Point (cont.)
• The relationship between fuel efficiency
(in miles per gallon) and weight (in
pounds) for late model cars looks fairly
linear at first:
Straight to the Point (cont.)
• A look at the residuals plot shows a problem:
Straight to the Point (cont.)
• We can re-express fuel efficiency as gallons per
hundred miles (a reciprocal) and eliminate the bend in
the original scatterplot:
Straight to the Point (cont.)
Why Not Just Use a Curve?
If there’s a curve in the scatterplot, why not
just fit a curve to the data?
Why Not Just Use a Curve? (cont.)
Suggest a Transformation
From Randomness to Probability
Random Phenomena
Trial
• Each occasion on which we observe a random
phenomenon
Outcome
• The value observed on a trial of the random phenomenon
Event
• Any set (combination) of outcomes
Sample Space
• The collection of all possible outcomes
• If you flip a coin once, you will either get 100% heads
or 0% heads.
• If you flip a coin 1000 times, you will probably get
close to 50% heads.
Identical Probabilities
• The probabilities for each event must remain the
same for each trial.
Independence
• The outcome of a trial is not influenced by the
outcomes of the previous trials.
Empirical probability
• P(A) = (# of times A occurs) / (# of trials), in the long run
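The long-run idea behind empirical probability can be illustrated with a small simulation. This is only a sketch: the function name `empirical_probability`, the fixed seed, and the trial counts are assumptions chosen for illustration, and the simulated flips are independent with the same probability on every trial, matching the two conditions above.

```python
import random

def empirical_probability(num_trials, p_heads=0.5, seed=0):
    """Fraction of simulated coin flips that land heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(num_trials) if rng.random() < p_heads)
    return heads / num_trials

# One flip gives 0% or 100% heads; many flips settle near 50%.
for n in (1, 10, 1000, 100_000):
    print(n, empirical_probability(n))
```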
Modeling Probability
American Roulette
• 18 Red, 18 Black, 2 Green
• If you bet on Red, what is the probability of winning?
Theoretical Probability
• When all outcomes are equally likely,
P(A) = (# of outcomes in A) / (# of possible outcomes)
• P(red) = 18/38 ≈ 0.474
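The roulette calculation can be reproduced by counting pockets. The sketch below assumes every pocket is equally likely; the names `pockets` and `theoretical_probability` are illustrative, not from the slides.

```python
from fractions import Fraction

# Pocket counts from the slide: 18 red, 18 black, 2 green.
pockets = {"red": 18, "black": 18, "green": 2}

def theoretical_probability(color):
    """P(color), assuming every pocket is equally likely."""
    return Fraction(pockets[color], sum(pockets.values()))

p_red = theoretical_probability("red")
print(p_red, float(p_red))  # exact fraction and its decimal value
```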
Formal Probability
Rule 1: 0 ≤ P(A) ≤ 1
• You can’t have a −25% chance of winning.
• A 120% chance also makes no sense.
Rule 2: P(S) = 1
• The set of all possible outcomes has probability 1.
• There is a 100% chance that you will get a head or
a tail.
Complements
• Define A^C as the complement of A: the set of outcomes not in A.
• P(A^C) = 1 − P(A)
Suppose
P(sophomore) = 0.2 and P(junior) = 0.3
• Find P(sophomore OR junior)
• Solution: 0.2 + 0.3 = 0.5
• This works because sophomore and junior are
disjoint events. They have no outcomes in
common.
The Addition Rule
• If A and B are disjoint events, then
P(A OR B) = P(A) + P(B)
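The Addition Rule can be checked by treating events as sets of equally likely outcomes. This sketch reuses the American roulette wheel from earlier (18 red, 18 black, 2 green); the helper `prob` and the variable names are hypothetical.

```python
from fractions import Fraction

# Build the wheel as a list of 38 pockets, identified by index.
wheel = ["red"] * 18 + ["black"] * 18 + ["green"] * 2

def prob(event):
    """P(event) for equally likely pockets; event is a set of pocket indices."""
    return Fraction(len(event), len(wheel))

red = {i for i, color in enumerate(wheel) if color == "red"}
green = {i for i, color in enumerate(wheel) if color == "green"}

# Red and green share no pockets, so they are disjoint events.
assert red.isdisjoint(green)

p_union = prob(red | green)        # P(red OR green), counted directly
p_sum = prob(red) + prob(green)    # P(red) + P(green)
print(p_union, p_sum)

# The complement of red is every pocket that is not red.
not_red = set(range(len(wheel))) - red
print(prob(not_red), 1 - prob(red))
```

If the two events overlapped, `prob(a | b)` would be smaller than `prob(a) + prob(b)`, because the shared outcomes would be counted twice in the sum.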
Probability Rules!
The General Addition Rule
Conditional Probability and the General Multiplication Rule
Probability of B Given A:
• P(B | A) = P(A and B) / P(A)
• Example:
P(girl | popular) = P(girl and popular) / P(popular)
= (91/478) / (141/478) = 91/141 ≈ 0.65
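The same conditional probability can be computed from the counts on the slide (91 popular girls, 141 popular students, 478 students in all). The function name `conditional` is an assumption made for this sketch.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

total = 478
p_girl_and_popular = Fraction(91, total)
p_popular = Fraction(141, total)

p_girl_given_popular = conditional(p_girl_and_popular, p_popular)
print(p_girl_given_popular, round(float(p_girl_given_popular), 2))
```

The total of 478 cancels, which is why the answer depends only on the two counts 91 and 141.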
The General Multiplication Rule
• P(A and B) = P(A) × P(B | A)
• Equivalently,
P(A and B) = P(B) × P(A | B)
Independence
Picturing Probability: Tables, Venn Diagrams, and Trees
Reversing the Conditioning and Bayes’ Rule
0.075 / 0.108 ≈ 0.694
P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | B^C)P(B^C)]
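Bayes' Rule translates directly into a short function. The sketch below is illustrative only: the function name `bayes` and the three input probabilities are hypothetical values, not taken from the slides.

```python
def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """P(B | A) from P(A | B), P(B), and P(A | B^C)."""
    p_not_b = 1 - p_b                                  # complement rule
    numerator = p_a_given_b * p_b                      # P(A and B)
    denominator = numerator + p_a_given_not_b * p_not_b  # P(A), over both branches
    return numerator / denominator

# Hypothetical inputs: P(B) = 0.01, P(A | B) = 0.90, P(A | B^C) = 0.05.
posterior = bayes(p_a_given_b=0.90, p_b=0.01, p_a_given_not_b=0.05)
print(round(posterior, 4))
```

The denominator is the total probability of A, summed over the two branches of a tree diagram: B and its complement.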
Establish Independence
Random Variables
Center: The Expected Value
μ = E(X) = Σ x · P(x)
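The expected value is a probability-weighted sum over the possible values. In this sketch the payout table is a made-up example and `expected_value` is a hypothetical helper name.

```python
import math

def expected_value(dist):
    """E(X) = sum of x * P(x) for a discrete distribution {x: P(x)}."""
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# Hypothetical payouts: $0 half the time, $10 with probability 0.3,
# and $100 with probability 0.2.
payout = {0: 0.5, 10: 0.3, 100: 0.2}
mu = expected_value(payout)
print(mu)
```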
Spread: The Standard Deviation
Shifting and Combining Random Variables
In general,
• The mean of the sum of two random variables is
the sum of the means.
• The mean of the difference of two random
variables is the difference of the means.
E(X ± Y) = E(X) ± E(Y)
• If the random variables are independent, the
variance of their sum or difference is always the
sum of the variances.
Var(X ± Y) = Var(X) + Var(Y)
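Both rules can be verified exactly for a small case. The sketch below assumes X and Y are two independent fair dice (an example chosen here, not one from the slides) and enumerates every pair of faces; `mean`, `variance`, and `combine` are illustrative helper names.

```python
from fractions import Fraction
from itertools import product

# A fair six-sided die as a distribution {value: probability}.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        # Independence lets us multiply the two probabilities.
        out[value] = out.get(value, 0) + px * py
    return out

total = combine(die, die, lambda x, y: x + y)
diff = combine(die, die, lambda x, y: x - y)

print(mean(total), mean(diff))          # means add and subtract
print(variance(total), variance(diff))  # variances add in both cases
```

Subtracting Y does not subtract its variability: the difference of two dice is just as spread out as their sum.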
Continuous Random Variables
Probability Models
The Geometric Model
Expected value: E(X) = μ = 1/p
Standard deviation: σ = √(q/p²), where q = 1 − p
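A minimal sketch of the Geometric model's center and spread, assuming the standard formulas E(X) = 1/p and SD(X) = √(q/p²) with q = 1 − p. The function name `geometric_summary` and the value p = 0.25 are hypothetical.

```python
import math

def geometric_summary(p):
    """Return (mean, standard deviation) of a Geometric model with success probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    q = 1 - p
    mean = 1 / p
    sd = math.sqrt(q) / p  # same as sqrt(q / p**2)
    return mean, sd

# Hypothetical example: success probability 0.25 on each trial.
geo_mean, geo_sd = geometric_summary(0.25)
print(geo_mean, round(geo_sd, 3))
```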
The Binomial Model
Approximating the Binomial with the Normal Model
• P(X ≤ 151) = P(z ≤ (151 − 127.98)/10.79) = P(z ≤ 2.13) ≈ 0.9834
• There is over a 98% chance that no more than
151 of them were real messages. The filter may
be working.
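The Normal calculation above can be reproduced with the standard library. This sketch takes the mean 127.98 and standard deviation 10.79 directly from the slide; `normal_cdf` is a helper written here using `math.erf`, and it does not round z to 2.13, so the result differs slightly in the fourth decimal place.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard Normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 127.98, 10.79   # values given on the slide
z = (151 - mean) / sd      # standardize the cutoff of 151
p = normal_cdf(z)
print(round(z, 2), round(p, 4))
```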
The Continuity Correction
The Poisson Model
Other Continuous Random Variables: The Uniform and Exponential
Sampling Distribution Models
Sampling Distribution of a Proportion
Sampling Distributions: A Summary
• SD(ȳ) = σ/√n    SD(p̂) = √(pq/n)
• Larger sample size → Smaller standard deviation
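The effect of sample size on the two standard deviations can be seen numerically. The function names and the inputs (σ = 10, p = 0.5, and the sample sizes) are hypothetical choices for this sketch.

```python
import math

def sd_sample_mean(sigma, n):
    """SD of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sd_sample_proportion(p, n):
    """SD of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the standard deviation.
for n in (25, 100, 400):
    print(n, sd_sample_mean(10, n), sd_sample_proportion(0.5, n))
```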
Confidence Intervals for Proportions
Sampling Distributions: A Summary
Margin of Error: Certainty vs. Precision
Estimate ± ME
Critical Values
Assumptions and Conditions
• Independence Condition
• If the data are collected using an SRS or a randomized
experiment, the Randomization Condition is satisfied.
• The data values must not influence one another.
• Check the 10% Condition: the sample size
is less than 10% of the population size.
• Success/Failure Condition
• There must be at least 10 successes.
• There must be at least 10 failures.
• SE(p̂) = √(p̂q̂/n)
• z*: the critical value that specifies the number of
SE’s needed for C% of random samples to yield
confidence intervals that capture the population
proportion.
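Putting the pieces together, a one-proportion interval is the estimate plus or minus z* standard errors. The sketch below is illustrative: the function name `proportion_interval` and the sample (520 successes out of 1000) are hypothetical, and z* = 1.96 is the 95% critical value. Only the Success/Failure Condition is checked in code; independence has to be judged from how the data were collected.

```python
import math

def proportion_interval(successes, n, z_star=1.96):
    """Confidence interval for a proportion: p-hat +/- z* x SE(p-hat)."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("Success/Failure Condition not met")
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p-hat and q-hat
    me = z_star * se                          # margin of error
    return p_hat - me, p_hat + me

# Hypothetical sample: 520 successes in 1000 trials.
low, high = proportion_interval(520, 1000)
print(round(low, 3), round(high, 3))
```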
0.03 = 1.96 × √((0.5)(0.5)/n)
• Solving for n gives n ≈ 1067.1.
• We need to survey at least 1068 people to ensure an ME
of at most 0.03 for the 95% confidence interval.
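The sample-size calculation can be written as a small function. This is a sketch: `required_sample_size` is a hypothetical name, and the defaults z* = 1.96 and p = 0.5 are the values used on the slide. The result is rounded up because a fraction of a respondent cannot be surveyed and rounding down would leave the margin of error slightly too large.

```python
import math

def required_sample_size(me, z_star=1.96, p=0.5):
    """Smallest n with z* x sqrt(p * q / n) <= me."""
    n = (z_star / me) ** 2 * p * (1 - p)  # solve ME = z* x sqrt(pq/n) for n
    return math.ceil(n)

n_needed = required_sample_size(0.03)
print(n_needed)
```

Using p = 0.5 is the cautious choice: p × q is largest there, so the resulting n is large enough whatever the true proportion turns out to be.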