
by

Jason Samuels
CUNY-BMCC AMATYC 39, 2013-11-2

Students Don't Get Statistics


After years of Algebra courses, Statistics requires a very different way of thinking

"What's the formula?"


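Some steps require a formula

e.g. find the z-score

$z = \frac{x - \mu}{\sigma}$

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$

$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$

Some steps don't

e.g. find the z-score

wait, what?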

Which Topics Can Be Unified?


Doing calculations with standard data distributions
    Find the data value, z-score, probability
    Normal distribution, t-distribution, etc.
Confidence intervals
Hypothesis tests

Some ideas so these topics make sense to students

Key idea #1: Describe the distribution


Orients the students toward the values they will use in the problem and in their calculations

Describe the distribution of the data:
    Center (mean)
    Spread (standard deviation)
    Shape (which distribution: normal, t, etc.)

Describe the distribution - an example


Ex) A college has an average of 23.7 students in each class, with a standard deviation of 5.6. What is the probability that a sample of 35 classes has an average of more than 25 students?

Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$

Describe the distribution of $\bar{x}$:
    Mean: $\mu_{\bar{x}} = \mu = 23.7$
    Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5.6}{\sqrt{35}} \approx 0.95$
    Shape: $n > 30$, so it's normal
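
The center/spread/shape description above can be computed directly. The following is a minimal Python sketch and is not part of the original slides. The function name `describe_sample_mean` is a hypothetical helper, and the `n > 30` check is simply the rule of thumb quoted in the example.

```python
from math import sqrt

def describe_sample_mean(mu, sigma, n):
    """Describe the distribution of the sample mean (hypothetical helper).

    center: the population mean
    spread: sigma / sqrt(n)
    shape:  "normal" when n > 30 (the slides' rule of thumb)
    """
    center = mu
    spread = sigma / sqrt(n)
    shape = "normal" if n > 30 else "check the population's shape"
    return center, spread, shape

# Class-size example from the slide: mu = 23.7, sigma = 5.6, n = 35
center, spread, shape = describe_sample_mean(mu=23.7, sigma=5.6, n=35)
print(center, round(spread, 2), shape)  # 23.7 0.95 normal
```

Returning the three parts together mirrors the slide's habit of writing center, spread, and shape down before doing any other calculation.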

Key Idea #2: Draw the Graph


All values can be organized and connected using one graph:

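Draw the graph - example continued


From before
Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = 0.95$, normal
Now draw the graph:

[Graph: normal curve with 23.7 marked at the center and 25 marked to its right]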

Key Idea #3: The Flow Chart


Almost every calculation students will do with standard distributions is guided by this flow chart:

Key Idea #4: The Formula


There is only one formula students need to know:

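test statistic = $\frac{(\text{data value}) - (\text{mean})}{(\text{standard deviation})}$

Or, equivalently:
data value = (mean) + (test statistic)(standard deviation)

For a single data value:
$z = \frac{x - \mu}{\sigma}$ ...or... $x = \mu + z\sigma$

For a sample mean:
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$ ...or... $\bar{x} = \mu_{\bar{x}} + z\,\sigma_{\bar{x}}$

For a sample proportion:
$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$ ...or... $\hat{p} = \mu_{\hat{p}} + z\,\sigma_{\hat{p}}$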
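
To illustrate that one formula covers all three cases, here is a short Python sketch (not from the slides). The names `z_score` and `data_value` are hypothetical. The numbers for the single data value and the sample proportion are made up for illustration; only the sample-mean numbers come from the class-size example.

```python
from math import sqrt

def z_score(value, mean, sd):
    """Test statistic: how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

def data_value(z, mean, sd):
    """Inverse of z_score: recover the data value from the test statistic."""
    return mean + z * sd

# Single data value (illustrative numbers): x = 130, mu = 100, sigma = 15
z_single = z_score(130, 100, 15)

# Sample mean (class-size example): the spread is sigma / sqrt(n)
sd_xbar = 5.6 / sqrt(35)
z_mean = z_score(25, 23.7, sd_xbar)

# Sample proportion (illustrative numbers): p = 0.5, n = 100, p_hat = 0.56
sd_phat = sqrt(0.5 * (1 - 0.5) / 100)
z_prop = z_score(0.56, 0.5, sd_phat)

print(round(z_single, 2), round(z_mean, 2), round(z_prop, 2))
```

Only the mean and standard deviation passed in change from case to case. The formula itself does not, which is the point of this slide.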

Benefit
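Students learn that z has one meaning (the number of standard deviations from the mean), so z has one formula

Never again will students use these varied, complex formulas:

$z = \frac{x - \mu}{\sigma}$

$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$

Students make fewer order-of-operations calculation errors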

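The formula - example continued


From before
Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = \frac{5.6}{\sqrt{35}} \approx 0.95$, normal

Now: find the z-score:

$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{25 - 23.7}{0.95} \approx 1.37$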

Flow Chart & Graph - together


[Figure: the graph of the distribution with the flow chart's three levels labeled alongside it: Probability, Z-score, Data value]

Flow Chart & Graph - example continued


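Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = 0.95$, normal
Now, fill in the graph following the flow chart. In this case, we go up the ladder.

[Figure: graph with three labeled levels. Probability: .9147 to the left, .0853 to the right. Z-score: 1.37. Data value.]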
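
The step from z-score to probability (and back down the ladder) can be sketched with Python's standard library. This is an illustration rather than the tool used in the talk. The helper names `z_to_areas` and `left_area_to_z` are hypothetical; `statistics.NormalDist` is the standard-library normal distribution class.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_to_areas(z):
    """Going up the ladder: z-score -> (area to the left, area to the right)."""
    left = standard_normal.cdf(z)
    return left, 1 - left

def left_area_to_z(area):
    """Going down the ladder: area to the left -> z-score."""
    return standard_normal.inv_cdf(area)

left, right = z_to_areas(1.37)
print(round(left, 4), round(right, 4))    # 0.9147 0.0853
print(round(left_area_to_z(0.9147), 2))   # 1.37
```

Keeping the two directions as separate functions mirrors the flow chart: the same rungs are used whether the problem starts from a data value or from a probability.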

Putting it together: an exercise


The mean time for all flight delays is 21 minutes with a standard deviation of 12 minutes. What is the probability that a sample of 36 flights has an average delay above 26 minutes?

Putting it together: an exercise


Step 1: Get the facts
$\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution:
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion:

Putting it together: an exercise


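Step 2: Describe the distribution
Center: mean $\mu_{\bar{x}} = \mu = 21$
Spread: standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$
Shape: $n > 30$, so the distribution is normal

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion: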

Putting it together: an exercise


Step 3: Draw the graph

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion:

Putting it together: an exercise


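Step 4: Do the calculations
z-score: $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{26 - 21}{2} = 2.5$
Areas (using technology): area to the left = .9937, area to the right = .0063

[Graph: z = 2.5 marked on the normal curve]

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(4) Do the calculations: z = 2.5, areas = .9937 & .0063
(5) Conclusion: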

Putting it together: an exercise


Step 5: Write the conclusion
The probability is .0063

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(4) Do the calculations: z = 2.5, areas = .9937 & .0063
(5) Conclusion: The probability is .0063
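
As a cross-check of the five steps, the flight-delay exercise can be run end to end in Python. This is a sketch under the same assumptions as the slides (known $\sigma$, normal shape because $n > 30$). The function name `prob_sample_mean_above` is hypothetical. Computed at full precision, the right-tail area comes out near .0062, which matches the slide's .0063 to within rounding of the left-tail area.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(mu, sigma, n, x_bar):
    """Return (z, area_right) for P(sample mean > x_bar).

    Hypothetical helper; assumes the sample mean is normally distributed
    with center mu and spread sigma / sqrt(n).
    """
    center = mu                       # Step 2: center
    spread = sigma / sqrt(n)          # Step 2: spread
    z = (x_bar - center) / spread     # Step 4: the one formula
    area_left = NormalDist(center, spread).cdf(x_bar)
    return z, 1 - area_left           # Step 4: area to the right

# Step 1: the facts from the exercise
z, area_right = prob_sample_mean_above(mu=21, sigma=12, n=36, x_bar=26)
print(z, round(area_right, 4))  # 2.5 0.0062
```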

A harder exercise (that's not harder)


Ex) For United, the mean delay time is 18 minutes (st.dev. = 11 minutes). For Delta, the mean delay time is 22 minutes (st.dev. = 14 minutes). Find the probability that, for a sample of 32 United flights and 34 Delta flights, Delta has a higher mean delay time by over 2 minutes.

A (not) harder exercise


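Step 1: Get the facts
Delta: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$
United: $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$
Find $P(\bar{x}_1 - \bar{x}_2 > 2)$

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution:
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion: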

A (not) harder exercise


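Step 2: Describe the distribution
Center: mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 22 - 18 = 4$
Spread: standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ or $\sqrt{(\sigma_{\bar{x}_1})^2 + (\sigma_{\bar{x}_2})^2}$ $= \sqrt{\frac{14^2}{34} + \frac{11^2}{32}} \approx 3.09$
Shape: $n_1, n_2 > 30$, so it's normal

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion: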

A (not) harder exercise


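Step 3: Draw the graph

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion: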

A (not) harder exercise


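Step 4: Do the calculations
z-score: $z = \frac{2 - 4}{3.09} \approx -0.65$
Areas: area to the left = .2587, area to the right = .7413

[Graph: z = -0.65 marked on the normal curve]

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations: z = -0.65, areas: .2587 & .7413
(5) Conclusion: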

A (not) harder exercise


Step 5: Write the conclusion
The probability is .7413

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations: z = -0.65, areas: .2587 & .7413
(5) Conclusion: The probability is .7413
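
The two-sample exercise changes only the "describe the distribution" step. A minimal Python sketch of that idea follows; it is not from the slides, and the helper name `describe_difference_of_means` is hypothetical. It assumes the two samples are independent, which is what the spread formula above requires.

```python
from math import sqrt
from statistics import NormalDist

def describe_difference_of_means(mu1, sigma1, n1, mu2, sigma2, n2):
    """Center and spread of (sample mean 1 - sample mean 2), independent samples."""
    center = mu1 - mu2
    spread = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return center, spread

# Delta is group 1, United is group 2 (as in Step 1)
center, spread = describe_difference_of_means(22, 14, 34, 18, 11, 32)

# Same single formula as before: (data value - mean) / (standard deviation)
z = (2 - center) / spread
area_left = NormalDist().cdf(z)
area_right = 1 - area_left

print(center, round(spread, 2), round(z, 2))
print(round(area_left, 4), round(area_right, 4))
```

Run without intermediate rounding, the areas should agree with the slide's .2587 and .7413 to about four decimal places.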

A Handy Tool
StatDisk
Does all basic statistics calculations with a simple graphical interface and one or two clicks
Available for free at StatDisk.org

The Issue of the Center


First, students learn that they know $\mu$; this defines the center of the distribution, and $x$ (the value from the data) exists relative to that.
Later, also $\mu_{\bar{x}}$ and $p$.
In the case of inference (confidence intervals and hypothesis tests), $\mu$ (or $p$) is not known. Rather, we know $\bar{x}$ (or $\hat{p}$) and make an inference about $\mu$ (or $p$).
What does this mean for the distribution, and the graph?

The Issue of the Center


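Confidence interval formula: $(\bar{x} - z\,\sigma_{\bar{x}},\ \bar{x} + z\,\sigma_{\bar{x}})$
What does this imply for the graph?

[Graph: normal curve with endpoints $\bar{x} - z\,\sigma_{\bar{x}}$ and $\bar{x} + z\,\sigma_{\bar{x}}$ marked]

The center is $\bar{x}$, not $\mu$!
We are calculating values for $\mu$, not $\bar{x}$.
With confidence intervals we just use the formula and ignore it.
With hypothesis tests, the issue does not go away.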
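
To make the "center is $\bar{x}$" observation concrete, here is a small Python sketch of the interval formula. It is an illustration only: the function name `confidence_interval` and the sample numbers ($\bar{x} = 50$, $\sigma = 10$, $n = 100$) are made up, and $\sigma$ is assumed known so that a z-value applies.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """z-interval for the population mean, centered at the sample mean."""
    sd = sigma / sqrt(n)                             # spread of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return x_bar - z * sd, x_bar + z * sd

low, high = confidence_interval(x_bar=50, sigma=10, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
print(round((low + high) / 2, 2))     # 50.0, the midpoint is x_bar
```

The midpoint of the interval is the sample mean, which is the slide's point: the graph behind this formula is centered at $\bar{x}$.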

The Issue of the Center


Hypothesis Test, Old way:

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$

...and you spend all this time explaining why, even though the hypothesis says $\mu > \mu_0$, you shade to the right of $\bar{x}$ (and I think students still don't understand; they just do it)

Recognizing a Different Center


Hypothesis Test, New way:

$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$

...and now you shade where the claim tells you to shade, and that area is your confidence level

Why This Makes Sense


Shaded area matches the claim
Hypothesis tests and confidence intervals are both inferences about the population, and they should agree (in terms of the graph, distribution, etc.)
We are using a distribution of values for $\mu$. The center is $\bar{x}$.

What does confidence mean?
It's a type of probabilistic statement: 95% of the time, a conclusion made in this way will be correct

Different center: an exercise


Ex) We want to find out if the average American family has more than 1.8 kids (because that places a strain on municipal services). From a survey of 500 families, the mean is 1.92 (take $\sigma = 0.9$). What can we conclude?

Different center: an exercise


Step 1: Get the facts
$\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$
claim: $\mu > 1.8$

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution:
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion:

Different center: an exercise


Step 2: Describe the distribution of $\mu$
Center: mean = 1.92
Spread: st.dev. = $\frac{0.9}{\sqrt{500}} \approx .0402$
Shape: $n > 30$, so it's normal

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, stdev = .0402, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion:

Different center: an exercise


Step 3: Draw the graph

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, stdev = .0402, normal
(3) Draw the graph:
(4) Do the calculations:
(5) Conclusion:

Different center: an exercise


Step 4: Do the calculations
z-score: $z = \frac{1.8 - 1.92}{.0402} \approx -2.99$
Areas: area to the left = .0014, area to the right = .9986

[Graph: z = -2.99 marked on the normal curve]

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, stdev = .0402, normal
(3) Draw the graph:
(4) Do the calculations: z = -2.99, areas: .0014 & .9986
(5) Conclusion:

Different center: an exercise


Step 5: Write the conclusion
We are .9986 confident in the claim that $\mu > 1.8$ (the average American family has more than 1.8 children)

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, stdev = .0402, normal
(3) Draw the graph:
(4) Do the calculations: z = -2.99, areas: .0014 & .9986
(5) Conclusion: We have .9986 confidence that $\mu > 1.8$
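
The arithmetic of this "different center" exercise can be reproduced with a short Python sketch. It follows the slides' framing (distribution centered at $\bar{x}$, shaded area read as the confidence in the claim) and is not a general-purpose hypothesis-testing routine. The function name `confidence_in_claim_greater` is hypothetical. Because the code does not round the standard deviation to .0402 first, its z comes out near -2.98 rather than the slide's -2.99; the areas agree to four decimal places.

```python
from math import sqrt
from statistics import NormalDist

def confidence_in_claim_greater(x_bar, sigma, n, mu_0):
    """Return (z, area_right) for the claim mu > mu_0, centered at x_bar."""
    sd = sigma / sqrt(n)        # spread: sigma / sqrt(n)
    z = (mu_0 - x_bar) / sd     # flipped formula: (claimed value - center) / s.d.
    area_left = NormalDist().cdf(z)
    return z, 1 - area_left     # shade to the right, where the claim points

z, confidence = confidence_in_claim_greater(x_bar=1.92, sigma=0.9, n=500, mu_0=1.8)
print(round(z, 2), round(confidence, 4))
```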

Big Changes
All the formulas for the test statistic flip

For means, the center is $\bar{x}$. The formula for z is:

$z = \frac{\mu_0 - \bar{x}}{\text{s.d.}}$

For proportions, the center is $\hat{p}$. The formula for z is:

$z = \frac{p_0 - \hat{p}}{\text{s.d.}}$

These are equivalent to the confidence interval formulas (just solve for $\mu_0$), so we already used them without knowing it

The formulas for $x$ & $z$ (given population info) were inverses; now the formulas for $\mu$ and $z$ from inference (confidence intervals & hypothesis tests) are inverses, as they should be

Jason Samuels
jsamuels@bmcc.cuny.edu