
# by

Jason Samuels
CUNY-BMCC AMATYC 39, 2013-11-2

## Students Don't Get Statistics

After years of algebra courses, statistics requires a very different way of thinking.

"What's the formula?"

Some steps require a formula. Some steps don't.

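$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$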

wait, what?

## Which Topics Can Be Unified?

- Doing calculations with standard data distributions
  - Find the data value, z-score, probability
  - Normal distribution, t-distribution, etc.
- Confidence intervals
- Hypothesis tests

## Key idea #1: Describe the distribution

Orients the students toward the values they will use in the problem and in their calculations.

Describe the distribution of the data:

- Center (mean)
- Spread (standard deviation)
- Shape (which distribution: normal, t, etc.)

## Describe the distribution: an example

Ex) A college has an average of 23.7 students in each class, with a standard deviation of 5.6. What is the probability that a sample of 35 classes has an average of more than 25 students?

Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$

Describe the distribution of $\bar{x}$:

- Mean: $\mu_{\bar{x}} = \mu = 23.7$
- Standard deviation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5.6}{\sqrt{35}} = 0.95$
- Shape: $n > 30$, so it's normal
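The "describe the distribution" step for this example can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original talk.

```python
import math

# Facts from the class-size example in the text.
mu = 23.7      # population mean
sigma = 5.6    # population standard deviation
n = 35         # sample size

# Center and spread of the sampling distribution of the sample mean.
mean_xbar = mu
sd_xbar = sigma / math.sqrt(n)

print(f"mean of x-bar = {mean_xbar}")
print(f"sd of x-bar   = {sd_xbar:.2f}")
```

The shape (normal, because n > 30) is a judgment the student makes from the facts, so it is not computed here.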

## Key Idea #2: Draw the Graph

All values can be organized and connected using one graph.

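## Draw the graph: example continued

From before:

- Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
- Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = 0.95$, normal

Now draw the graph:

[Graph: normal curve with 23.7 and 25 marked on the horizontal axis]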

## Key Idea #3: The Flow Chart

Almost every calculation

## Key Idea #4: The Formula

There is only one formula students need to know:

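$$\text{test statistic} = \frac{(\text{data value}) - (\text{mean})}{(\text{standard deviation})}$$

Or, equivalently:

$$\text{data value} = (\text{mean}) + (\text{test statistic})(\text{standard deviation})$$

$$z = \frac{x - \mu}{\sigma} \quad\text{or}\quad x = \mu + z\sigma \qquad\text{...or...}\qquad z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$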

## For a sample proportion:

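$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} \qquad\text{...or...}\qquad \hat{p} = \mu_{\hat{p}} + z\,\sigma_{\hat{p}}$$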
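As a rough illustration of the single formula and its inverse, here is a minimal Python sketch. The function names `z_score` and `value_from_z` are hypothetical, chosen for this example only; the numbers come from the class-size example.

```python
def z_score(data_value, mean, sd):
    """Number of standard deviations the data value lies from the mean."""
    return (data_value - mean) / sd


def value_from_z(z, mean, sd):
    """Inverse of z_score: recover the data value from the test statistic."""
    return mean + z * sd


# Class-size example: sample mean of 25, with mean 23.7 and sd 0.95.
z = z_score(25, 23.7, 0.95)
x_back = value_from_z(z, 23.7, 0.95)

print(f"z = {z:.2f}, recovered data value = {x_back:.1f}")
```

The same two functions work unchanged for a sample mean, a sample proportion, or a difference of means: only the mean and standard deviation passed in change.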

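## Benefit

Students learn that z has one meaning (the number of standard deviations from the mean), so z has one formula.

Never again will students use these varied, complex formulas:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \qquad z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$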

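## The formula: example continued

From before:

- Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
- Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = \dfrac{5.6}{\sqrt{35}} = 0.95$, normal

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$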

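## Flow Chart & Graph: example continued

- Get the facts: $\mu = 23.7$, $\sigma = 5.6$, $n = 35$; want $P(\bar{x} > 25)$
- Describe the distribution of $\bar{x}$: $\mu_{\bar{x}} = 23.7$, $\sigma_{\bar{x}} = 0.95$, normal

Now, fill in the graph following the flowchart. In this case, we go up the ladder.

[Graph: normal curve with $z = 1.37$ marked; area to the left = .9147, area to the right (the probability) = .0853]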
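The values on this graph can be reproduced with Python's standard library. This sketch assumes, as the slides appear to, that the standard deviation is rounded to 0.95 and z to two decimal places before looking up the area.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean from the example in the text.
mean_xbar = 23.7
sd_xbar = 0.95   # 5.6 / sqrt(35), rounded as in the text

z = round((25 - mean_xbar) / sd_xbar, 2)   # z-score, rounded to two places
area_left = NormalDist().cdf(z)            # P(Z < z)
area_right = 1 - area_left                 # P(Z > z), i.e. P(x-bar > 25)

print(f"z = {z}, left = {area_left:.4f}, right = {area_right:.4f}")
```

Skipping the intermediate rounding changes the areas slightly in the third decimal place, which is worth pointing out to students who compare answers from different tools.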

## Putting it together: an exercise

The mean time for all flight delays is 21 minutes with a standard deviation of 12 minutes. What is the probability that a sample of 36 flights has an average delay above 26 minutes?

## Putting it together: an exercise

Step 1: Get the facts

$\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution:
(3) Draw the graph:

## Putting it together: an exercise

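Step 2: Describe the distribution

- Center: mean $\mu_{\bar{x}} = \mu = 21$
- Spread: standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{12}{\sqrt{36}} = \dfrac{12}{6} = 2$
- Shape: normal

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(5) Conclusion: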

## Putting it together: an exercise

Step 3: Draw the graph

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:

## Putting it together: an exercise

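Step 4: Do the calculations

z-score:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{26 - 21}{2} = 2.5$$

Areas (using technology): area to the left = .9937, area to the right = .0063

[Graph: normal curve with $z = 2.5$ marked]

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph: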

## Putting it together: an exercise

Step 5: Write the conclusion

The probability is .0063

(1) Get the facts: $\mu = 21$, $\sigma = 12$, $n = 36$; find $P(\bar{x} > 26)$
(2) Describe the distribution: $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 2$, normal
(3) Draw the graph:
(4) Do the calculations: $z = 2.5$, areas = .9937 & .0063
(5) Conclusion: The probability is .0063
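The five steps for this exercise can be sketched as one small Python function. The function name and its parameters are hypothetical, introduced only for illustration, and the sketch assumes the sample mean is normally distributed.

```python
import math
from statistics import NormalDist


def prob_sample_mean_above(mu, sigma, n, cutoff):
    """Return (z, P(x-bar > cutoff)), assuming x-bar is normally distributed."""
    sd_xbar = sigma / math.sqrt(n)       # step 2: spread of x-bar
    z = (cutoff - mu) / sd_xbar          # step 4: the one formula
    return z, 1 - NormalDist().cdf(z)    # area to the right of z


# Flight-delay exercise: mu = 21, sigma = 12, n = 36, want P(x-bar > 26).
z, p = prob_sample_mean_above(mu=21, sigma=12, n=36, cutoff=26)
print(f"z = {z:.2f}, probability = {p:.4f}")
```

The computed area should agree with the value on the slide to about three decimal places; any difference in the fourth decimal place comes from how the lookup tool rounds.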

## A harder exercise (that's not harder)

Ex) For United, the mean delay time is 18 minutes (st. dev. = 11 minutes). For Delta, the mean delay time is 22 minutes (st. dev. = 14 minutes). Find the probability that, for a sample of 32 United flights and 34 Delta flights, Delta has a higher mean delay time by over 2 minutes.

## A (not) harder exercise

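Step 1: Get the facts

- Delta: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$
- United: $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$
- Find $P(\bar{x}_1 - \bar{x}_2 > 2)$

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution:
(3) Draw the graph: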

## A (not) harder exercise

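Step 2: Describe the distribution

Center (mean):

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 22 - 18 = 4$$

Spread (standard deviation):

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}} \quad\text{or}\quad \sqrt{(\sigma_{\bar{x}_1})^2 + (\sigma_{\bar{x}_2})^2}$$

$$= \sqrt{\frac{14^2}{34} + \frac{11^2}{32}} = 3.09$$

Shape: normal

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph: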
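The center and spread of the difference in sample means can be verified numerically. The sketch below uses illustrative variable names and computes the standard deviation both ways given in the text (from the population values, and from the two individual standard errors).

```python
import math

# Facts from the text: Delta is sample 1, United is sample 2.
mu1, sigma1, n1 = 22, 14, 34
mu2, sigma2, n2 = 18, 11, 32

# Center of the distribution of (x-bar1 - x-bar2).
mean_diff = mu1 - mu2

# Spread, computed directly from the population standard deviations.
sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Same spread, computed from the two standard errors.
se1 = sigma1 / math.sqrt(n1)
se2 = sigma2 / math.sqrt(n2)
sd_diff_alt = math.sqrt(se1**2 + se2**2)

print(f"mean = {mean_diff}, sd = {sd_diff:.3f}, sd (alt) = {sd_diff_alt:.3f}")
```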

## A (not) harder exercise

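Step 3: Draw the graph

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph: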

## A (not) harder exercise

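Step 4: Do the calculations

z-score:

$$z = \frac{2 - 4}{3.09} = -0.65$$

Areas: area to the left = .2587, area to the right = .7413

[Graph: normal curve with $z = -0.65$ marked]

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations: $z = -0.65$, areas: .2587 & .7413
(5) Conclusion: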

## A (not) harder exercise

Step 5: Write the conclusion

The probability is .7413

(1) Get the facts: $\mu_1 = 22$, $\sigma_1 = 14$, $n_1 = 34$; $\mu_2 = 18$, $\sigma_2 = 11$, $n_2 = 32$; find $P(\bar{x}_1 - \bar{x}_2 > 2)$
(2) Describe the distribution: $\mu_{\bar{x}_1 - \bar{x}_2} = 4$, $\sigma_{\bar{x}_1 - \bar{x}_2} = 3.090$, normal
(3) Draw the graph:
(4) Do the calculations: $z = -0.65$, areas: .2587 & .7413
(5) Conclusion: The probability is .7413
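Once the distribution of the difference has been described, the calculation is the same as in the one-sample case. A minimal sketch, assuming the mean of 4 and standard deviation of 3.090 from the worksheet above:

```python
from statistics import NormalDist

# Distribution of (x-bar1 - x-bar2), as described in the worksheet.
diff = NormalDist(mu=4, sigma=3.090)

# The one formula: (data value - mean) / (standard deviation).
z = (2 - diff.mean) / diff.stdev

area_left = diff.cdf(2)        # P(difference < 2)
area_right = 1 - area_left     # P(difference > 2)

print(f"z = {z:.2f}, left = {area_left:.4f}, right = {area_right:.4f}")
```

Here z is not rounded before the area is computed, which appears to be how the areas on the slide were obtained.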

## A Handy Tool

StatDisk: does all basic statistics calculations with a simple graphical interface and one or two clicks. Available for free at StatDisk.org

## The Issue of the Center

First, students learn that they know $\mu$, that this defines the center of the distribution, and that $x$ (the value from the data) exists relative to it. Later, the same holds for $\bar{x}$ and $\hat{p}$.

In the case of inference (confidence intervals and hypothesis tests), $\mu$ (or $p$) is not known. Rather, we know $\bar{x}$ (or $\hat{p}$) and make an inference about $\mu$ (or $p$).

What does this mean for the distribution, and the graph?

## The Issue of the Center

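Confidence interval formula:

$$(\bar{x} - z\,\sigma_{\bar{x}},\ \bar{x} + z\,\sigma_{\bar{x}})$$

What does this imply for the graph?

[Graph: normal curve with $\bar{x} - z\,\sigma_{\bar{x}}$ and $\bar{x} + z\,\sigma_{\bar{x}}$ marked on the horizontal axis]

- The center is $\bar{x}$, not $\mu$!
- We are calculating values for $\mu$, not $\bar{x}$
- With confidence intervals we just use the formula and ignore it
- With hypothesis tests, the issue does not go away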
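To make the "center is the sample mean" point concrete, here is a hedged sketch of the interval calculation. It borrows the survey numbers from the family-size exercise later in the talk; the 95% confidence level is an assumption made for this illustration and is not specified for that exercise.

```python
from statistics import NormalDist

# Survey numbers borrowed from the exercise later in the talk.
x_bar = 1.92       # sample mean
sd_xbar = 0.0402   # 0.9 / sqrt(500), rounded as in the talk

confidence = 0.95                                     # assumed level
z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96

# The interval is centered at x-bar, not at mu.
lower = x_bar - z_star * sd_xbar
upper = x_bar + z_star * sd_xbar

print(f"({lower:.3f}, {upper:.3f})")
```

The midpoint of the resulting interval is the sample mean, which is the sense in which the graph for inference is centered at the statistic rather than the parameter.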

## The Issue of the Center

Hypothesis test, old way:

$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0$$

...and you spend all this time explaining why, even though the hypothesis says $\mu > \mu_0$, you shade to the right of $\bar{x}$ (and I think students still don't understand, they just do it).

## Recognizing a Different Center

Hypothesis test, new way:

$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0$$

...and now you shade where the claim tells you to shade, and that area is your confidence level.

## Why This Makes Sense

- Shaded area matches the claim
- Hypothesis tests and confidence intervals are both inferences about the population, and they should agree (in terms of the graph, distribution, etc.)
- We are using a distribution of values for $\mu$
- The center is $\bar{x}$

What does confidence mean? It's a type of probabilistic statement: 95% of the time, a conclusion made in this way will be correct.

## Different center: an exercise

Ex) We want to find out if the average American family has more than 1.8 kids (because that places a strain on municipal services). From a survey of 500 families, the mean is 1.92 (take $\sigma = 0.9$). What can we conclude?

## Different center: an exercise

Step 1: Get the facts

$\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; claim: $\mu > 1.8$

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution:
(3) Draw the graph:

## Different center: an exercise

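Step 2: Describe the distribution of $\mu$

- Mean: $\bar{x} = 1.92$
- St. dev.: $\dfrac{\sigma}{\sqrt{n}} = \dfrac{0.9}{\sqrt{500}} = .0402$
- Shape: normal

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, st. dev. = .0402, normal
(3) Draw the graph: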

## Different center: an exercise

Step 3: Draw the graph

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, st. dev. = .0402, normal
(3) Draw the graph:

## Different center: an exercise

Step 4: Do the calculations

$$z = \frac{1.8 - 1.92}{.0402} = -2.99$$

Areas: area to the left = .0014, area to the right = .9986

[Graph: normal curve with $z = -2.99$ marked]

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, st. dev. = .0402, normal
(3) Draw the graph:
(4) Do the calculations: $z = -2.99$, areas: .0014 & .9986
(5) Conclusion:

## Different center: an exercise

Step 5: Write the conclusion

We are .9986 confident in the claim that $\mu > 1.8$ (the average American family has more than 1.8 children).

(1) Get the facts: $\bar{x} = 1.92$, $\sigma = 0.9$, $n = 500$; test claim: $\mu > 1.8$
(2) Describe the distribution: mean = 1.92, st. dev. = .0402, normal
(3) Draw the graph:
(4) Do the calculations: $z = -2.99$, areas: .0014 & .9986
(5) Conclusion: We have .9986 confidence that $\mu > 1.8$
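The calculation for this exercise, with the distribution centered at the sample mean as the talk proposes, can be sketched in Python as follows. Variable names are illustrative. Note that the standard deviation is not rounded here, so z comes out near -2.98 rather than the -2.99 obtained from the rounded .0402.

```python
import math
from statistics import NormalDist

# Facts from the survey exercise in the text.
x_bar, sigma, n = 1.92, 0.9, 500
claimed = 1.8                      # the claim is mu > 1.8

sd = sigma / math.sqrt(n)          # spread of the distribution
z = (claimed - x_bar) / sd         # data value minus center, over spread

area_left = NormalDist().cdf(z)    # area below 1.8
area_right = 1 - area_left         # area above 1.8, matching the claim

print(f"sd = {sd:.4f}, z = {z:.2f}, left = {area_left:.4f}, right = {area_right:.4f}")
```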

## Big Changes

All the formulas for the test statistic flip.

For means:

$$z = \frac{\mu_0 - \bar{x}}{\text{s.d.}}$$

For proportions:

$$z = \frac{p_0 - \hat{p}}{\text{s.d.}}$$

These are equivalent to the confidence interval formulas (just solve for $\mu_0$), so we already used them without knowing it.

The formulas for $x$ & $z$ (given population info) were inverses. Now the formulas for $\mu$ and $z$ from inference (confidence intervals & hypothesis tests) are inverses, as they should be.
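The "just solve for $\mu_0$" step can be written out as a short worked equation. Here $\sigma_{\bar{x}}$ stands for the "s.d." in the flipped test statistic, and $z^*$ is notation introduced only for this illustration to denote the critical value used in the confidence interval.

$$z = \frac{\mu_0 - \bar{x}}{\sigma_{\bar{x}}} \quad\Longleftrightarrow\quad \mu_0 = \bar{x} + z\,\sigma_{\bar{x}}$$

$$z = \pm z^* \quad\Longrightarrow\quad \mu_0 = \bar{x} \pm z^*\,\sigma_{\bar{x}}$$

Setting $z$ to the two critical values recovers the endpoints of the confidence interval $(\bar{x} - z^*\sigma_{\bar{x}},\ \bar{x} + z^*\sigma_{\bar{x}})$, which is the sense in which the two formulas are inverses.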

Jason Samuels
jsamuels@bmcc.cuny.edu