
1. Why Do We Use Samples?
2. Probability Sampling
2.1. Simple Random Samples
3. Sampling Distributions
3.1. The Sampling Distribution of the Sample Mean $\bar{x}$
3.1.1. The Expected Value of $\bar{x}$
3.1.1.1. The Relationship Between the Mean of the Parent Population and the Mean of All $\bar{x}$ Values
3.1.2. The Variance of $\bar{x}$ and Standard Error of $\bar{x}$
3.1.3. The Relationship Between the Variance of the Parent Population and the Variance of $\bar{x}$
3.1.4. The Shape of the Sampling Distribution of $\bar{x}$. The Relationship Between the Parent Population Distribution and the Sampling Distribution
3.1.5. Examples Using the Normal Sampling Distribution of $\bar{x}$
3.1.6. The Margin of Sampling Error (MOE)
3.1.7. Error Probability
3.1.8. Determining the Sample Size for a Given MOE
3.2. The Sampling Distribution of the Sample Proportion $\bar{p}$
3.2.1. The Expected Value of $\bar{p}$
3.2.1.1. The Relationship Between the Parent Population Proportion and the Mean of All $\bar{p}$ Values
3.2.2. The Variance of $\bar{p}$ and Standard Error of $\bar{p}$
3.2.2.1. The Relationship Between Variance of the Binary Parent Population and the Variance of $\bar{p}$
3.2.3. The Sampling Distribution of $\bar{p}$ as a Normal Distribution
3.2.4. Margin of Error for $\bar{p}$
3.2.5. Determining the Sample Size for a Given MOE

1. Why Do We Use Samples?

Sampling is the basis for inferential statistics. A sample is a segment of a population and is therefore expected to reflect the population. By studying the characteristics of the sample, one can make inferences about the population. There are several reasons to study a part of the population rather than take a full census:

- Sampling takes less time.
- Samples can be more accurate. Sample observations are usually of higher quality because they are better screened for errors in measurement, duplication, and misclassification.
- Samples can be destroyed to gain information about quality (destructive sampling).

2. Probability Sampling


A sample in which each element of the population has a known and nonzero chance of being

selected is called a probability sample.

2.1. Simple Random Samples

A simple random sample (SRS) is a probability sample in which all possible samples of size $n$ are equally likely to be chosen. To explain this requirement, let the population consist of the letters A, B, C, D, and E. Since there are five items in the population, $N = 5$. We want to select a sample of size 3, that is, $n = 3$. Since sampling is random (the letters are written on little balls and put in a bowl), there is more than one way to select 3 items from 5 items. Using the combination formula, the total number of possible samples is $C(N, n) = C(5, 3) = 10$. The following is the list of all 10 possible samples:

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE

The definition of an SRS implies that each sample has an equal chance, 0.10, of being selected. This process of listing every sample and choosing one applies to a finite (small) population. The selection process is different when the population is not finite (large). Even when the population is relatively small, applying the definition directly becomes very cumbersome. For example, suppose the population size is 50 and we want to select a sample of size 10. How many different samples are possible? Using the combination formula, the total number of possible samples is $C(50, 10) = 10{,}272{,}278{,}170$. It would be impractical to list all 10.3 billion possible samples and select one of them at random.

The correct procedure for selecting a random sample is to assign a serial number to each population element and then draw a pre-specified number of serial numbers at random (for example, using a "random numbers table").
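The counting argument and the serial-number procedure can be sketched in a few lines of Python. This is only an illustration: the seed value and the use of Python's pseudo-random generator in place of a printed random numbers table are assumptions, not part of the original notes.

```python
import math
import random
from itertools import combinations

# Population of five letters, samples of size three.
population = ["A", "B", "C", "D", "E"]
n = 3

# Enumerate every possible sample (order does not matter, no replacement).
all_samples = ["".join(c) for c in combinations(population, n)]
num_samples = math.comb(len(population), n)
chance_each = 1 / num_samples  # every sample is equally likely under SRS

# For N = 50 and n = 10, listing the samples is no longer practical.
big_count = math.comb(50, 10)

# Practical procedure: give each element a serial number and draw
# n serial numbers at random. The seed is arbitrary (an assumption).
rng = random.Random(2024)
serials = list(range(1, 51))
srs = rng.sample(serials, 10)

print(all_samples)
print(num_samples, chance_each, big_count)
print(sorted(srs))
```

`random.sample` draws without replacement, so no serial number can appear twice in the selected sample.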


3. Sampling Distributions

A sampling distribution is a probability distribution of a sample statistic. Recall from Chapter 1 that a sample statistic is a summary characteristic computed from sample data. Since a sample statistic is obtained from a randomly selected sample, its value is randomly determined, and the sample statistic is therefore a random variable. Because a sample statistic is a random variable, it has a probability distribution. The probability distribution of a sample statistic is called a sampling distribution.

3.1. The Sampling Distribution of the Sample Mean $\bar{x}$

Since the sample mean $\bar{x}$ is a sample statistic, $\bar{x}$ is a random variable, and its probability distribution is the sampling distribution of $\bar{x}$.

To illustrate the sampling distribution of $\bar{x}$ in the simplest terms, consider the following example. The Jones family has five children. The following table lists the ages of the children. Since we are considering the ages of all the Jones children, the age data constitute a population.

| Name | Age |
|---|---|
| Ann | 3 |
| Beth | 6 |
| Charlotte | 9 |
| David | 12 |
| Eric | 15 |

We want to estimate the average age of the children by taking a sample of size three. Note that for estimation purposes only a single sample of size $n$ is randomly selected. Thus, a single random sample selected from the above population may consist of, say, Ann, Beth, and David, with corresponding values {3, 6, 12}. But we know this is only one of the 10 possible samples.[1] There are nine other possible samples that we could have randomly selected. The next table lists all ten possible samples of size $n = 3$ that we may select from a population of size $N = 5$. The table also shows the average age computed from the values of each sample.

[1] Using the combination formula $C(N, n)$, there are $C(5, 3) = 10$ different samples of size three that can be selected from 5 objects without replacement.

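| Sample Composition | Sample Values | Sample Mean $\bar{x} = \sum x / n$ |
|---|---|---|
| A, B, C | 3, 6, 9 | 6 |
| A, B, D | 3, 6, 12 | 7 |
| A, B, E | 3, 6, 15 | 8 |
| A, C, D | 3, 9, 12 | 8 |
| A, C, E | 3, 9, 15 | 9 |
| A, D, E | 3, 12, 15 | 10 |
| B, C, D | 6, 9, 12 | 9 |
| B, C, E | 6, 9, 15 | 10 |
| B, D, E | 6, 12, 15 | 11 |
| C, D, E | 9, 12, 15 | 12 |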

In the table above, note that the $\bar{x}$ values 8, 9, and 10 each appear twice. Because three of the ten $\bar{x}$ values are repeats, there are seven distinct values of $\bar{x}$. The next table shows the sampling distribution of $\bar{x}$, which is the listing of all seven possible values the random variable $\bar{x}$ can take, along with the probability (relative frequency) associated with each value. Since the values 8, 9, and 10 each occur twice among the ten samples, the probability associated with each of these values is $2/10 = 0.20$. The sampling distribution of the sample mean age is then:

Sampling Distribution of $\bar{x}$

| $\bar{x}$ | $f(\bar{x})$ |
|---|---|
| 6 | 0.1 |
| 7 | 0.1 |
| 8 | 0.2 |
| 9 | 0.2 |
| 10 | 0.2 |
| 11 | 0.1 |
| 12 | 0.1 |
| Total | 1.0 |

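3.1.1. The Expected Value of $\bar{x}$

The sample statistic $\bar{x}$ is a random variable and, like other random variables, it has an expected value. The expected value of $\bar{x}$ is the (weighted) average of all the sample means. The weights are the probabilities associated with each value of the sample mean. Since the expected value represents the average of all possible sample means, it is also denoted by the symbol $\mu_{\bar{x}}$.

$$E(\bar{x}) = \mu_{\bar{x}} = \sum \bar{x}\, f(\bar{x})$$

In the Jones family example, the expected value of the sampling distribution of $\bar{x}$ is determined as shown in the following table.

Calculation of $\mu_{\bar{x}}$

| $\bar{x}$ | $f(\bar{x})$ | $\bar{x}\, f(\bar{x})$ |
|---|---|---|
| 6 | 0.1 | 0.6 |
| 7 | 0.1 | 0.7 |
| 8 | 0.2 | 1.6 |
| 9 | 0.2 | 1.8 |
| 10 | 0.2 | 2.0 |
| 11 | 0.1 | 1.1 |
| 12 | 0.1 | 1.2 |
| $E(\bar{x}) = \mu_{\bar{x}} = \sum \bar{x}\, f(\bar{x})$ | | 9.0 |

Alternatively, we can compute $\mu_{\bar{x}}$ as the simple mean of all ten $\bar{x}$ values:

$$\mu_{\bar{x}} = \frac{\sum \bar{x}}{10} = \frac{6+7+8+8+9+10+9+10+11+12}{10} = \frac{90}{10} = 9$$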
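The construction of the sampling distribution and its expected value can be reproduced by brute-force enumeration. The sketch below uses the Jones family ages from the text; the variable names are illustrative, and exact fractions are used so that no rounding enters the check.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population: ages of the five Jones children.
ages = {"Ann": 3, "Beth": 6, "Charlotte": 9, "David": 12, "Eric": 15}
n = 3

# One sample mean for each of the C(5, 3) = 10 possible samples.
sample_means = [Fraction(sum(s), n) for s in combinations(ages.values(), n)]

# Sampling distribution: each distinct mean with its relative frequency.
counts = Counter(sample_means)
dist = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}

# Expected value as a probability-weighted average ...
expected = sum(m * f for m, f in dist.items())
# ... and as the simple mean of all ten sample means.
simple_mean = sum(sample_means) / len(sample_means)

for m, f in dist.items():
    print(m, float(f))
print(expected, simple_mean)
```

Both routes give the same number because weighting each distinct value by its relative frequency is the same as averaging the full list of ten means.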

3.1.1.1. The Relationship Between the Mean of the Parent Population and the Mean of All $\bar{x}$ Values

To show an important relationship between the expected value of the sample means, $\mu_{\bar{x}}$, and the population mean $\mu$, compute the population mean directly from the Jones family children population age data:

$$\mu = \frac{\sum x}{N} = \frac{3+6+9+12+15}{5} = \frac{45}{5} = 9$$

The parent population average age $\mu = 9$ is exactly the same as the mean of $\bar{x}$. That is, the mean value of all possible sample means is equal to the mean of the parent population: the mean of the means equals the mean.

$$E(\bar{x}) = \mu_{\bar{x}} = \mu$$

This equality is not a coincidence of this example. The equality of the expected value of the sampling distribution of $\bar{x}$ and the population mean holds for all sampling distributions of $\bar{x}$. The mean of the means equals the mean![2]

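3.1.2. The Variance of $\bar{x}$ and Standard Error of $\bar{x}$

The variance of $\bar{x}$, denoted by $\mathrm{var}(\bar{x})$, like any other variance measure, is simply the mean squared deviation of the random variable $\bar{x}$ from its mean. Since within the random variable framework the mean and the expected value convey the same meaning, we can express the variance of $\bar{x}$ as the expected value (weighted mean) of the squared deviations of $\bar{x}$:

$$\mathrm{var}(\bar{x}) = E\left[(\bar{x} - \mu)^2\right] = \sum (\bar{x} - \mu)^2 f(\bar{x})$$

The next table shows the calculation of $\mathrm{var}(\bar{x})$, using $E(\bar{x}) = \mu = 9$.

Calculation of $\mathrm{var}(\bar{x}) = E[(\bar{x} - \mu)^2]$

| $\bar{x}$ | $f(\bar{x})$ | $(\bar{x} - \mu)^2 f(\bar{x})$ |
|---|---|---|
| 6 | 0.1 | 0.9 |
| 7 | 0.1 | 0.4 |
| 8 | 0.2 | 0.2 |
| 9 | 0.2 | 0.0 |
| 10 | 0.2 | 0.2 |
| 11 | 0.1 | 0.4 |
| 12 | 0.1 | 0.9 |
| $\mathrm{var}(\bar{x}) = E[(\bar{x} - \mu)^2]$ | | 3.0 |

The standard deviation of $\bar{x}$ is called the standard error of $\bar{x}$, denoted $se(\bar{x})$. The standard error is a measure of the dispersion of all possible $\bar{x}$ values around the mean of $\bar{x}$. It is the positive square root of $\mathrm{var}(\bar{x})$. For the Jones family example:

$$se(\bar{x}) = \sqrt{\mathrm{var}(\bar{x})} = \sqrt{3} = 1.732$$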

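3.1.2.1. The Relationship Between the Variance of the Parent Population and the Variance of $\bar{x}$

Going back to the population age data, compute the population variance using the variance formula we learned in Chapter 1:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{90}{5} = 18$$

Note that there is a relationship between $\mathrm{var}(\bar{x})$ and $\sigma^2$. This relationship is shown as

$$\mathrm{var}(\bar{x}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$

In the Jones family example,

$$\mathrm{var}(\bar{x}) = \frac{18}{3}\left(\frac{5 - 3}{5 - 1}\right) = 3$$

The term $\frac{N - n}{N - 1}$ is the finite population correction factor (FPCF).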
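As a check on the relationship between the population variance and the variance of the sample mean, the sketch below computes the variance of the ten sample means directly and compares it with the population variance divided by $n$ and multiplied by $(N - n)/(N - 1)$. It assumes the Jones family ages and sampling without replacement; the variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import combinations

ages = [3, 6, 9, 12, 15]
N, n = len(ages), 3

# Population mean and population variance (divide by N).
mu = Fraction(sum(ages), N)
sigma2 = sum((x - mu) ** 2 for x in ages) / N

# Variance of the sample mean by enumerating all C(N, n) samples.
means = [Fraction(sum(s), n) for s in combinations(ages, n)]
var_enum = sum((m - mu) ** 2 for m in means) / len(means)

# Same quantity from the formula with the finite population correction.
fpcf = Fraction(N - n, N - 1)
var_formula = sigma2 / n * fpcf

# Standard error is the positive square root of the variance.
se = math.sqrt(var_formula)

print(sigma2, var_enum, var_formula, round(se, 3))
```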

When the population is finite or small, as in the example above, the sample size relative to the population, $n/N$, is large: $3/5 = 0.60$, or 60%. When the population is nonfinite or large, this ratio becomes insignificant, the FPCF approaches 1 and, therefore, it plays no role in the $\mathrm{var}(\bar{x})$ formula. The tendency of the FPCF to approach 1 as $N$ gets larger is shown in the following table. A sample size of $n = 10$ is used to show this tendency.

The FPCF as $N$ Increases (for $n = 10$)

| $N$ | $\frac{N - n}{N - 1}$ |
|---|---|
| 25 | 0.6250 |
| 50 | 0.8163 |
| 100 | 0.9091 |
| 1,000 | 0.9910 |
| 10,000 | 0.9991 |
| 100,000 | 0.9999 |
| 1,000,000 | 1.0000 |

Thus, for a large population the formula for the variance of $\bar{x}$ becomes

$$\mathrm{var}(\bar{x}) = \frac{\sigma^2}{n}$$

and the standard error is

$$se(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
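The behavior of the finite population correction factor for a fixed sample size can be tabulated with a short helper. This is a minimal sketch; the function name `fpcf` is an illustrative choice.

```python
def fpcf(N: int, n: int) -> float:
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


n = 10
population_sizes = (25, 50, 100, 1_000, 10_000, 100_000, 1_000_000)

# Round to four decimals for display.
table = {N: round(fpcf(N, n), 4) for N in population_sizes}

for N, value in table.items():
    print(f"{N:>9,d}  {value:.4f}")
```

Once the factor is within rounding distance of 1, multiplying by it no longer changes the variance of the sample mean in any practical sense.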

To explain the concepts of sampling distribution, expected value, and standard error of the sampling distribution, we used a simple example in which we took very small samples ($n = 3$) from a very small parent population ($N = 5$). The number of possible samples ($\nu$, the Greek letter nu) is determined using the combination formula:

$$\nu = C(N, n) = C(5, 3) = 10$$

[3] See Appendix for the mathematical proof.

When the population size $N$ increases, even with a small sample size $n$, the number of possible samples $\nu$, and the number of corresponding $\bar{x}$ values computed from these samples, quickly rises to astronomical levels. The following table shows this clearly.

| $N$ | $n$ | $\nu = C(N, n)$ |
|---|---|---|
| 5 | 3 | 10 |
| 10 | 3 | 120 |
| 50 | 5 | 2,118,760 |
| 100 | 10 | 17,310,309,456,440 |

Recall the earlier example of a population with $N = 608$, from which we selected a single sample of size $n = 40$ to explain the difference between the population parameter $\mu$ and the sample statistic $\bar{x}$. For that explanation we used only a single sample, the values of which were selected randomly. This sample yielded a sample mean of $\bar{x} = 62.8$. This was only one sample and one $\bar{x}$ among the following possible number of $\bar{x}$ values:

$\nu$ = 749,670,807,490,441,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000.
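The growth in the number of possible samples is easy to confirm with Python's exact integer arithmetic. The sketch below evaluates the combination formula for the cases mentioned in the text; the dictionary layout is an illustrative choice.

```python
import math

# (N, n) pairs taken from the text.
cases = [(5, 3), (10, 3), (50, 5), (100, 10), (608, 40)]

# math.comb returns the exact number of samples of size n from N items.
counts = {(N, n): math.comb(N, n) for N, n in cases}

for (N, n), nu in counts.items():
    print(f"N={N:>3}, n={n:>2}: {nu:,} possible samples ({len(str(nu))} digits)")
```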

With the introduction of the variance of $\bar{x}$ we have added a new variance concept to the two we learned in Chapter 1. These variance concepts are summarized below.

Population variance measures the mean squared deviation of the population data from the population mean:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample variance measures the mean squared deviation of the sample data from the sample mean:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Variance of the mean $\bar{x}$ measures the mean squared deviation of all possible $\bar{x}$ values from the mean of $\bar{x}$. Since in practical sampling problems there is an astronomically large number of $\bar{x}$ values, $\mathrm{var}(\bar{x})$ is not computed from all possible values of $\bar{x}$. Rather, if the population variance is given, $\mathrm{var}(\bar{x})$ is determined as follows:

$$\mathrm{var}(\bar{x}) = \frac{\sigma^2}{n}$$

3.1.4. The Shape of the Sampling Distribution of $\bar{x}$. The Relationship Between the Parent Population Distribution and the Sampling Distribution

The foundation of inferential statistics is the sampling distribution. We use the sampling distribution of $\bar{x}$ to make inferences about the population mean $\mu$. The shape of the sampling distribution plays a vital role in inferential statistics. In order to make an inference about the population parameter, the sampling distribution must have a specific shape. The required shape is the normal distribution. If the sampling distribution is not normal, then it cannot be used for inferential statistics.

At the outset, the most important issue to understand is that the shape of the sampling distribution of $\bar{x}$ depends on two things: (1) the shape or distribution of the population data set, and/or (2) the size of the sample, $n$.

3.1.4.1. When the Parent Population Has a Normal (Bell-Shaped) Distribution

The first practical conclusion from this discussion is that when the parent population has a normal (bell-shaped) distribution with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{x}$ also has a normal distribution, with mean $E(\bar{x}) = \mu_{\bar{x}} = \mu$ and standard deviation (standard error) $se(\bar{x}) = \sigma/\sqrt{n}$.

3.1.4.2. When the Parent Population Is Not Normally Distributed

When the parent population is not normally distributed, the shape of the sampling distribution will depend on the sample size $n$. The sampling distribution of $\bar{x}$ will approach normal as the size of the sample increases. The rule of thumb is that if the sample size is 30 or more, the sampling distribution of $\bar{x}$ is treated as if it were normal. This conclusion is based on the Central Limit Theorem.

This property of the sampling distribution makes statistical inference about $\mu$ possible even when the population is not normally distributed.

3.1.5. Examples Using the Normal Sampling Distribution of $\bar{x}$

The subsequent chapters are all devoted essentially to inferential statistics, where we will apply the basic concepts learned in this chapter to infer characteristics of population data by analyzing the characteristics of sample data. Inferences about a summary characteristic of the population data (for now, the mean $\mu$) from the mean of a sample are never exact statements. These inferences are, instead, probabilistic statements. To make these probabilistic statements, and to be able to state the exact probabilities, it is essential that the sampling distribution of $\bar{x}$ be normal. The following examples are typical applications of the normal distribution to the sampling distribution of $\bar{x}$. What we learn from these examples will help us understand inferences about the population mean in the subsequent chapters.

Example 1

In a bottling plant the amount of soda in each 32-ounce bottle is a normally distributed random variable with a mean of $\mu = 32$ ounces and a standard deviation of $\sigma = 0.3$ ounces.

a) If a single bottle is randomly selected, what is the probability that it contains between 31.8 and 32.2 ounces of soda? Alternatively stated, given the mean and standard deviation of the fill of bottles, what fraction (proportion, or percentage) of the bottles contain between 31.8 and 32.2 ounces of soda?

Note: This part of the problem does not deal with the sampling distribution. It is shown, however, to explain how to differentiate between the probability of $x$ (the random variable representing the parent population) and the probability of $\bar{x}$ (the random variable representing the sample means).

Given $\mu = 32$ and $\sigma = 0.3$, find $P(31.8 < x < 32.2)$:

$$z = \frac{x - \mu}{\sigma} = \frac{31.8 - 32}{0.3} = -0.67 \qquad z = \frac{32.2 - 32}{0.3} = 0.67$$

$$P(-0.67 < z < 0.67) = 0.497$$

b) If a sample of size $n = 9$ bottles is taken, what is the probability that the mean of this sample, $\bar{x}$, is between 31.8 and 32.2 ounces? Alternatively stated, what fraction (proportion, or percentage) of the means obtained from samples of size $n = 9$ fall between 31.8 and 32.2 ounces?

Now you are dealing with the probability distribution of $\bar{x}$. Since the parent population of bottles is normal, the distribution of $\bar{x}$ values (the sampling distribution of $\bar{x}$) is also normal, with the following mean and standard deviation (standard error):

$$\mu_{\bar{x}} = \mu = 32 \qquad se(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{9}} = 0.1$$

To find $P(31.8 < \bar{x} < 32.2)$, first we must convert the normal random variable $\bar{x}$ to $z$. The conversion formula is

$$z = \frac{\bar{x} - \mu}{se(\bar{x})}$$

$$z_1 = \frac{31.8 - 32}{0.1} = -2.00 \quad \text{and} \quad z_2 = \frac{32.2 - 32}{0.1} = 2.00$$

$$P(-2.00 < z < 2.00) = 0.9545$$

Note that although the distribution of $x$ (the parent population) and the sampling distribution of $\bar{x}$ both have the same mean ($\mu = 32$), the same interval (31.8 to 32.2) contains 95.5% of all the $\bar{x}$ values, but only 49.7% of the $x$ values.
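The two probabilities in this example can be checked with `statistics.NormalDist` from the Python standard library. The sketch assumes the stated values (mean 32, standard deviation 0.3, sample size 9). Because it does not round $z$ to two decimals, the single-bottle probability comes out slightly below the table-based 49.7%.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 9
lo, hi = 31.8, 32.2

# Part (a): distribution of a single bottle's fill, x.
bottle = NormalDist(mu, sigma)
p_single = bottle.cdf(hi) - bottle.cdf(lo)

# Part (b): distribution of the mean of n bottles, x-bar.
se = sigma / math.sqrt(n)  # standard error of the mean
mean_dist = NormalDist(mu, se)
p_mean = mean_dist.cdf(hi) - mean_dist.cdf(lo)

print(f"P(31.8 < x < 32.2)     = {p_single:.4f}")
print(f"se(x-bar)              = {se:.2f}")
print(f"P(31.8 < x-bar < 32.2) = {p_mean:.4f}")
```

The same interval captures far more of the sample means than of the individual bottles, because the standard error (0.1) is one third of the population standard deviation (0.3).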

The reason for this difference is that the $\bar{x}$ values are far less dispersed than the $x$ values. This is because the standard deviation of the distribution of $\bar{x}$, $se(\bar{x}) = \sigma/\sqrt{n}$, is smaller than $\sigma$, the standard deviation of $x$. The $\bar{x}$ values are much more closely clustered around the mean $\mu = 32$ than the $x$ values.

3.1.6. The Margin of Sampling Error (MOE)

The next example is used to explain the extremely important concept of the margin of sampling error (MOE). This concept plays a crucial role in inferential statistics. You must always keep the MOE in mind when dealing with the sampling distribution of a sample statistic.

Example 2

A given population has a mean of 50 and a standard deviation of 18. Consider the sampling distribution of the means of samples of size 36 obtained from this population. Find the interval of $\bar{x}$ values that contains the middle 90 percent of all possible $\bar{x}$ values.

First, establish the parameters of the distribution of the population and the parameters of the sampling distribution. In the population, $x$ has the following mean and standard deviation:

$$\mu = 50 \qquad \sigma = 18$$

The sampling distribution of $\bar{x}$ is normal (or approximately normal) with mean and standard deviation (standard error):

$$\mu_{\bar{x}} = \mu = 50 \qquad se(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{36}} = 3$$

We want

$$P(\bar{x}_1 < \bar{x} < \bar{x}_2) = 0.90$$

where $\bar{x}_1$ and $\bar{x}_2$ represent the lower and upper ends of the interval that contains the middle 90% of all possible sample means obtained from samples of size $n = 36$. The objective is to find the values of $\bar{x}_1$ and $\bar{x}_2$.

Start from the formula that converts $\bar{x}$ into the standard normal random variable $z$:

$$z = \frac{\bar{x} - \mu}{se(\bar{x})}$$

Solving for $\bar{x}$:

$$\bar{x} = \mu + z \cdot se(\bar{x})$$

The term $z \cdot se(\bar{x})$ is called the margin of error (MOE):

$$MOE = z \cdot se(\bar{x})$$

To find the MOE, first compute the standard error of $\bar{x}$:

$$se(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{36}} = 3$$

The value for $z$ is determined as follows. The middle area within the interval is 90%, so the two tail areas are 5% each. Therefore, the $z$ score corresponding to $\bar{x}_2$ is $z_{0.05} = 1.64$. Thus,

$$MOE = 1.64 \times 3 = 4.92$$

The margin of error of 4.92 simply implies that the middle 90% of all possible $\bar{x}$ values fall within 4.92 (data units) of the population mean $\mu$. The lower and upper ends of the interval are thus:

$$\bar{x}_L = 50 - 4.92 = 45.08$$

$$\bar{x}_U = 50 + 4.92 = 54.92$$

Again, the lower and upper boundaries of this interval indicate that the middle 90% of all $\bar{x}$ values fall within the interval bounded by 45.08 and 54.92. Stated differently, 90% of the means computed from samples of size $n = 36$ deviate from the parent population mean by no more than 4.92.
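The margin-of-error calculation can be wrapped in a small helper. This is a sketch under the example's stated values; the function name `margin_of_error` is hypothetical, and it uses the exact normal quantile (about 1.645) rather than the table value 1.64, so the result differs from 4.92 in the second decimal.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, middle: float) -> float:
    """MOE = z * sigma / sqrt(n) for a given middle probability."""
    alpha = 1 - middle                      # total probability in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z score leaving alpha/2 above
    return z * sigma / math.sqrt(n)


mu, sigma, n = 50, 18, 36
moe90 = margin_of_error(sigma, n, 0.90)
interval90 = (mu - moe90, mu + moe90)

print(f"MOE = {moe90:.2f}")
print(f"interval = ({interval90[0]:.2f}, {interval90[1]:.2f})")
```

Passing a different `middle` value, sample size, or standard deviation to the same helper covers other intervals of this kind.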

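Example 3

In the previous example, where $\mu = 50$ and $\sigma = 18$, find the interval that contains the middle 95% of all the means obtained from samples of size $n = 36$.

For this example we must find the 95% margin of error. With a tail area of 0.025 on each side, $z_{0.025} = 1.96$, so

$$MOE = z_{0.025} \cdot se(\bar{x}) = 1.96 \times 3 = 5.88$$

Thus,

$$\bar{x}_1 = 50 - 5.88 = 44.12$$

$$\bar{x}_2 = 50 + 5.88 = 55.88$$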

Example 4

In the soda bottle example, where $\mu = 32$ ounces and $\sigma = 0.3$ ounces, find the interval that contains the middle 95% of the means obtained from samples of size $n = 25$ bottles.

Since the middle interval is to contain 95% of all $\bar{x}$ values, each tail area contains 2.5% of the $\bar{x}$ values. The $z$ score that bounds a tail area of 0.025 is $z_{0.025} = 1.96$.

$$\bar{x}_1, \bar{x}_2 = \mu \pm MOE$$

$$MOE = z_{0.025} \cdot se(\bar{x})$$

$$se(\bar{x}) = 0.3/\sqrt{25} = 0.06$$

$$MOE = 1.96(0.06) = 0.118$$

$$\bar{x}_1, \bar{x}_2 = 32 \pm 0.118 = (31.882,\ 32.118)$$

We can, therefore, state that of every 100 samples of size 25 that we select from the population of soda bottles, we expect 95 to have a sample mean fill between 31.88 and 32.12 ounces.

3.1.7. Error Probability

In computing the MOE in the first two examples in this section, each MOE involved a specified probability. The first required a middle interval with a 90% margin of error, and the second a 95% MOE. In the first example, the middle interval built around $\mu$ using a 90% MOE contained 90% of all possible sample means. Thus 10% of the sample means fell outside the interval; that is, they deviated from $\mu$ by more than the established MOE. In that example, if a random sample of size $n = 36$ were selected from the population, there was a 10% probability that the sample mean would deviate from $\mu = 50$ by more than $\pm 4.92$. This 10% probability is called the error probability and is denoted by the Greek letter $\alpha$.

In the second example, 95% of sample means deviated from $\mu = 50$ by no more than $\pm 5.88$. The error probability in that example was, therefore, $\alpha = 0.05$.

Using $\alpha$ as a general symbol for error probability, the MOE formula is written as:

$$MOE = z_{\alpha/2} \cdot se(\bar{x})$$

Note that the subscript of $z$ is $\alpha/2$, since we divide the error probability equally between the two tails of the normal curve.

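3.1.8. Determining the Sample Size for a Given MOE

In the margin of error formula $MOE = z_{\alpha/2} \cdot se(\bar{x})$, the standard error is $se(\bar{x}) = \sigma/\sqrt{n}$. Thus,

$$MOE = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

This indicates that the MOE varies inversely with the square root of the sample size $n$. The bigger the sample size, the narrower the MOE. In many statistical questions you are required to determine the sample size for a specified MOE. To determine $n$, we can rearrange the MOE formula as follows:

$$\sqrt{n} = \frac{z_{\alpha/2}\, \sigma}{MOE}$$

Squaring both sides, we obtain the formula to determine the sample size for a given MOE:

$$n = \left(\frac{z_{\alpha/2}\, \sigma}{MOE}\right)^2$$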

Example 5

In the previous example, where $\mu = 32$ ounces and $\sigma = 0.3$ ounces, what should the sample size be so that 95% of all possible sample means fall within a margin of error of 0.08 ounces ($MOE = 0.08$) from the population mean?

Given a 95% MOE, $\alpha = 0.05$ and $z_{\alpha/2} = 1.96$.

$$n = \left(\frac{z_{\alpha/2}\, \sigma}{MOE}\right)^2 = \left(\frac{1.96 \times 0.3}{0.08}\right)^2 = 54.02$$

Rounding up to the next whole bottle gives $n = 55$. Note that in this example we are interested in a narrower margin of error (0.08 versus 0.118). To make the MOE narrower and, hence, the interval more precise, we must increase the sample size. Of every 100 means obtained from samples of size $n = 55$ bottles, 95 are expected to fall within 0.08 ounces of the mean of all bottles filled by the machine.
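The sample-size formula translates directly into code. The helper below is an illustrative sketch (the name `sample_size` is hypothetical) and rounds up, since a fractional bottle cannot be sampled and rounding down would leave the MOE slightly wider than requested.

```python
import math


def sample_size(z: float, sigma: float, moe: float) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= moe."""
    return math.ceil((z * sigma / moe) ** 2)


z, sigma, moe = 1.96, 0.3, 0.08
raw = (z * sigma / moe) ** 2          # value before rounding up
n_required = sample_size(z, sigma, moe)

# Margin of error actually achieved with the rounded-up sample size.
achieved_moe = z * sigma / math.sqrt(n_required)

print(f"raw n = {raw:.4f}, required n = {n_required}")
print(f"achieved MOE = {achieved_moe:.4f}")
```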

3.2. The Sampling Distribution of the Sample Proportion $\bar{p}$

Consider a population of size $N$. Let $x$ be the number of elements in the population that have a given attribute. Assign the number 1 to the elements with this attribute and 0 to all others. Then the population data are binary, and $x$ is a binary variable. As explained in Chapter 1, the mean of a binary population data set is called the proportion and is denoted by $\pi$. We use the same formula to compute the population proportion as for the population mean:

$$\pi = \frac{\sum x}{N}$$

For example, in a given academic year a total of 37,196 students (full-time equivalent) were enrolled at a major university campus, of whom 30,131 were undergraduate students. Assigning 1 to each undergraduate student, the population proportion of undergraduates enrolled at this campus is:

$$\pi = \frac{30{,}131}{37{,}196} = 0.81$$

Now, suppose a sample of $n$ students is taken from the population. The proportion of undergraduates in the sample, the sample proportion, is

$$\bar{p} = \frac{x}{n}$$

If, in a sample of $n = 200$ students, $x = 156$ were undergraduate students, then the sample proportion is

$$\bar{p} = \frac{156}{200} = 0.78$$

Note that, like $\bar{x}$, which is the sample statistic estimating the population parameter $\mu$, $\bar{p}$ is also a sample statistic, now estimating the population parameter $\pi$. Like $\bar{x}$, $\bar{p}$ is a random variable because its value is determined by the outcome of a random experiment, the experiment being the selection of a random sample. The probability distribution of $\bar{p}$ is called the sampling distribution of $\bar{p}$.

To explain how the sampling distribution is generated, consider the Jones family example used in explaining the sampling distribution of $\bar{x}$. In this case, instead of the age of the children, we are interested in a non-quantitative attribute of the children, their gender (male/female). To show how the concepts of the sampling distributions of $\bar{x}$ and $\bar{p}$ are closely related, assign the value 1 to female (the attribute of interest in this example) and 0 to male. The following table shows the population elements by gender and the numeric assignment to each gender.

Gender of the Jones Family Children

| Name | Gender | Numeric Assignment |
|---|---|---|
| Ann | F | 1 |
| Beth | F | 1 |
| Charlotte | F | 1 |
| David | M | 0 |
| Eric | M | 0 |

The proportion of females in the population of the Jones family children is

$$\pi = \frac{3}{5} = 0.60$$

Now, we conduct an experiment by taking a sample of size $n = 3$ to estimate the population proportion. For samples of size $n = 3$, there are 10 possible samples, with the sample proportion of females shown in the following table.

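Sample Proportion of Females Among the Jones Family Children

| Sample Composition | Sample Values | Sample Proportion $\bar{p} = \sum x_i / n$ |
|---|---|---|
| A, B, C | 1, 1, 1 | 3/3 |
| A, B, D | 1, 1, 0 | 2/3 |
| A, B, E | 1, 1, 0 | 2/3 |
| A, C, D | 1, 1, 0 | 2/3 |
| A, C, E | 1, 1, 0 | 2/3 |
| A, D, E | 1, 0, 0 | 1/3 |
| B, C, D | 1, 1, 0 | 2/3 |
| B, C, E | 1, 1, 0 | 2/3 |
| B, D, E | 1, 0, 0 | 1/3 |
| C, D, E | 1, 0, 0 | 1/3 |

The sampling distribution of $\bar{p}$, the proportion of females, is shown below as the relative frequency of the proportions in the previous table.

Sampling Distribution of $\bar{p}$

| $\bar{p}$ | $f(\bar{p})$ |
|---|---|
| 1/3 | 0.30 |
| 2/3 | 0.60 |
| 3/3 | 0.10 |
| Total | 1.00 |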

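3.2.1. The Expected Value of $\bar{p}$

The sample statistic $\bar{p}$ is a random variable. Like other random variables, $\bar{p}$ has an expected value and a standard deviation. The expected value of $\bar{p}$ is the (weighted) mean of all the sample proportions. The weights are the probabilities associated with each value of the sample proportion. Since the expected value represents the mean of all possible sample proportions, it is also denoted by the symbol $\mu_{\bar{p}}$.

$$E(\bar{p}) = \mu_{\bar{p}} = \sum \bar{p}\, f(\bar{p})$$

Using the sampling distribution of the sample proportion of females shown in the previous table, the calculation of the mean of $\bar{p}$ is shown as follows.

Calculation of $E(\bar{p})$

| $\bar{p}$ | $f(\bar{p})$ | $\bar{p}\, f(\bar{p})$ |
|---|---|---|
| 1/3 | 0.30 | 0.10 |
| 2/3 | 0.60 | 0.40 |
| 3/3 | 0.10 | 0.10 |
| $E(\bar{p}) = \mu_{\bar{p}} = \sum \bar{p}\, f(\bar{p})$ | | 0.60 |

Alternatively, we can compute $\mu_{\bar{p}}$ as the simple mean of all ten $\bar{p}$ values:

$$\mu_{\bar{p}} = \frac{\sum \bar{p}}{10} = \frac{6}{10} = 0.60$$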
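The sampling distribution of the sample proportion can be generated the same way as the one for the sample mean, by enumerating every sample of three children. The sketch below assumes the 1/0 coding for female/male described in the text and uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# 1 = female (attribute of interest), 0 = male.
gender = {"Ann": 1, "Beth": 1, "Charlotte": 1, "David": 0, "Eric": 0}
n = 3

# Population proportion of females.
pi = Fraction(sum(gender.values()), len(gender))

# Sample proportion for each of the C(5, 3) = 10 possible samples.
props = [Fraction(sum(s), n) for s in combinations(gender.values(), n)]

# Sampling distribution of the sample proportion.
counts = Counter(props)
dist = {p: Fraction(c, len(props)) for p, c in sorted(counts.items())}

# Expected value: probability-weighted mean of the sample proportions.
expected = sum(p * f for p, f in dist.items())

for p, f in dist.items():
    print(p, float(f))
print("E(p-bar) =", expected, " pi =", pi)
```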

3.2.1.1. The Relationship Between the Parent Population Proportion and the Mean of All $\bar{p}$ Values

Now, considering the binary population data on the gender of the children, three out of five children are female. Therefore, the population proportion is

$$\pi = 3/5 = 0.60$$

Note the important conclusion here that the mean of all possible sample proportions is exactly the same as the population proportion $\pi$.[4]

$$E(\bar{p}) = \mu_{\bar{p}} = \pi$$

Recall that at the start of this discussion it was stated that the proportion is a special case of the mean where the values in the data set are binary (0s and 1s). Thus, the mean of the sampling distribution of $\bar{p}$ and the mean of the sampling distribution of $\bar{x}$ are both equal to the population mean. Only the symbols differ: $\pi$ is the mean of the population when the data are binary, and $\mu$ is the mean of non-binary data.

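3.2.2. The Variance of $\bar{p}$ and Standard Error of $\bar{p}$

The variance of the random variable $\bar{p}$, denoted by $\mathrm{var}(\bar{p})$, is the weighted mean squared deviation of $\bar{p}$ from its mean. Since $\mu_{\bar{p}} = \pi$, then

$$\mathrm{var}(\bar{p}) = E\left[(\bar{p} - \pi)^2\right] = \sum (\bar{p} - \pi)^2 f(\bar{p})$$

Calculation of the Variance of the Sampling Distribution of $\bar{p}$

| $\bar{p}$ | $f(\bar{p})$ | $(\bar{p} - \pi)^2 f(\bar{p})$ |
|---|---|---|
| 1/3 | 0.30 | 0.021 |
| 2/3 | 0.60 | 0.003 |
| 3/3 | 0.10 | 0.016 |
| $\sum (\bar{p} - \pi)^2 f(\bar{p})$ | | 0.040 |

The standard error of $\bar{p}$ is the positive square root of the variance:

$$se(\bar{p}) = \sqrt{\mathrm{var}(\bar{p})}$$

$$se(\bar{p}) = \sqrt{0.04} = 0.2$$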

3.2.2.1. The Relationship Between Variance of the Parent Population and the Variance of $\bar{p}$

To explain the relationship, let's first compute the variance of the parent population in our Jones children example. Using the appropriate symbols for the binary population data, and recalling from Chapter 1, the population variance is:

$$\sigma^2 = \pi(1 - \pi)$$

Thus, for the Jones family children binary data,

$$\sigma^2 = 0.6(1 - 0.6) = 0.24$$

The variance of $\bar{p}$ is then

$$\mathrm{var}(\bar{p}) = \frac{\pi(1 - \pi)}{n}\left(\frac{N - n}{N - 1}\right)$$

Thus,

$$\mathrm{var}(\bar{p}) = \frac{0.24}{3}\left(\frac{5 - 3}{5 - 1}\right) = 0.04$$

When the population is non-finite, the FPCF approaches 1 and disappears from the picture, and the formula for $\mathrm{var}(\bar{p})$ becomes simply:[5]

$$\mathrm{var}(\bar{p}) = \frac{\pi(1 - \pi)}{n}$$

[5] See Appendix.

The standard error of $\bar{p}$ is then

$$se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
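The variance of the sample proportion for the Jones family can be verified by enumeration and compared with the finite-population formula. This sketch assumes the 1/0 gender coding and sampling without replacement; the variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import combinations

values = [1, 1, 1, 0, 0]  # 1 = female, 0 = male
N, n = len(values), 3
pi = Fraction(sum(values), N)

# Variance of p-bar computed directly from all possible samples.
props = [Fraction(sum(s), n) for s in combinations(values, n)]
var_enum = sum((p - pi) ** 2 for p in props) / len(props)

# Variance of p-bar from the formula with the FPCF.
sigma2 = pi * (1 - pi)  # variance of a binary population
var_formula = sigma2 / n * Fraction(N - n, N - 1)

se = math.sqrt(var_formula)

print(float(sigma2), float(var_enum), float(var_formula), se)
```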

3.2.3. The Sampling Distribution of $\bar{p}$ as a Normal Distribution

In the binomial distribution, as the number of independent trials increases (and if the probability of success is closer to 0.5), the distribution of the binomial random variable $x$, the number of successes in the trials, can be approximated by the normal distribution. The rule of thumb for $x$ to be approximately normally distributed is:

$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$

Now, rather than $x$, we are interested in the distribution of the random variable $\bar{p}$. Note that $\bar{p}$ is a linear transformation of $x$, the number of successes (the number of 1s in the binary sample data):

$$\bar{p} = \frac{x}{n}$$

We transform $x$ to $\bar{p}$ by multiplying $x$ by the constant $1/n$. Thus, if $x$ is normally distributed, then $\bar{p}$ is also normally distributed. (Only the location and scale of the normal curve along the number line change, not its shape.)

The sampling distribution of $\bar{p}$ is then a normal distribution with a mean of $\pi$ and a standard deviation (standard error) of

$$se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

The following examples use the normal distribution to solve probabilities involving the sampling distribution of $\bar{p}$.

Example 6

Sixty-eight percent (68%) of vehicles on Indiana interstate highways violate the speed limit ($\pi = 0.68$). A sample of 500 vehicles is randomly clocked for speed. What is the probability that more than 70% of vehicles in the sample violate the speed limit? Find

$$P(\bar{p} > 0.70)$$

Since the requirements for the normal approximation are satisfied ($n\pi = 340$ and $n(1 - \pi) = 160$), $\bar{p}$ is normally distributed with the following parameters:

$$\mu_{\bar{p}} = \pi = 0.68$$

$$se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.68(1 - 0.68)}{500}} = 0.0209$$

The formula to convert $\bar{p}$ to $z$ is:

$$z = \frac{\bar{p} - \pi}{se(\bar{p})}$$

$$z = \frac{0.70 - 0.68}{0.0209} = 0.96$$

$$P(z > 0.96) = 0.1685$$

That is, a proportion of 0.1685 (16.85%) of the sample proportions obtained from random samples of $n = 500$ would exceed 0.70.
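The speeding example can be checked numerically. The sketch below assumes the stated population proportion of 0.68 and sample size 500. It does not round $z$ to two decimals, so the tail probability differs from the table-based 0.1685 in the third decimal.

```python
import math
from statistics import NormalDist

pi, n = 0.68, 500

# Rule-of-thumb check for the normal approximation.
approx_ok = n * pi >= 5 and n * (1 - pi) >= 5

# Standard error of the sample proportion.
se = math.sqrt(pi * (1 - pi) / n)

# P(p-bar > 0.70) under the normal sampling distribution.
z = (0.70 - pi) / se
p_exceeds = 1 - NormalDist(pi, se).cdf(0.70)

print(f"n*pi = {n * pi:.0f}, n*(1-pi) = {n * (1 - pi):.0f}, ok = {approx_ok}")
print(f"se = {se:.4f}, z = {z:.2f}, P(p-bar > 0.70) = {p_exceeds:.4f}")
```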

Example 7

In the previous example, what is the probability that the sample proportion is within 3 percentage points of the population proportion? Alternatively stated, what proportion (percentage) of $\bar{p}$ values computed from repeated samples of size $n = 500$ are within 3 percentage points ($\pm 0.03$) of the population proportion?

$$\pi = 0.68 \qquad n = 500 \qquad se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = 0.0209$$

$$P(0.65 < \bar{p} < 0.71) = \ ?$$

$$z_1 = \frac{0.65 - 0.68}{0.0209} = -1.44$$

$$z_2 = \frac{0.71 - 0.68}{0.0209} = 1.44$$

$$P(-1.44 < z < 1.44) = 0.8502$$

Example 8

In the previous example, what proportion (or percentage) of $\bar{p}$ values computed from samples of size $n = 500$ fall within 4 percentage points ($\pm 0.04$) of the population proportion?

$$\pi = 0.68 \qquad n = 500 \qquad se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = 0.0209$$

$$z_1 = \frac{0.64 - 0.68}{0.0209} = -1.91$$

$$z_2 = \frac{0.72 - 0.68}{0.0209} = 1.91$$

$$P(-1.91 < z < 1.91) = 0.9438$$

Thus 94.38% of $\bar{p}$ values computed from samples of size $n = 500$ fall within $\pm 0.04$ (4 percentage points) of the population proportion $\pi = 0.68$; that is, they fall within the interval bounded by $\bar{p}_L = 0.64$ and $\bar{p}_U = 0.72$.

Example 9

In the previous example, what proportion (or percentage) of $\bar{p}$ values computed from samples of size $n = 1{,}000$ fall within 3 percentage points ($\pm 0.03$) of the population proportion?

$$P(0.68 - 0.03 < \bar{p} < 0.68 + 0.03)$$

$$P(0.65 < \bar{p} < 0.71) = \ ?$$

Note that even though the $\bar{p}$ interval is the same as in Example 7, the probability will be different because the sample size is larger. We need to recalculate the standard error of $\bar{p}$, taking into account the new, larger sample size.

$$se(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.68(1 - 0.68)}{1000}} = 0.0148$$

$$z = \frac{0.65 - 0.68}{0.0148} = -2.03 \qquad z = \frac{0.71 - 0.68}{0.0148} = 2.03$$

$$P(-2.03 < z < 2.03) = 0.9576$$

3.2.4. Margin of Error for $\bar{p}$

Similar to the discussion of MOE for $\bar{x}$, the concept of margin of error for $\bar{p}$ plays a crucial role in inferential statistics. This is why we place a special emphasis on this topic. The following example involves the MOE for $\bar{p}$.

Example 10

Given that the population proportion of vehicles violating the legal speed limit is 0.68, and using a sample size of $n = 1{,}000$, find the interval of $\bar{p}$ values in the sampling distribution of $\bar{p}$ that contains the middle 90% of all sample proportions computed from random samples of size $n = 1{,}000$.

To find the lower and upper ends of the interval, you must add to and subtract from $\pi$ a certain quantity (in this case, a proportion, or percentage points). The lower end and upper end are denoted by, respectively, $\bar{p}_1$ and $\bar{p}_2$.

To find the MOE for $\bar{p}$, rearrange

$$z = \frac{\bar{p} - \pi}{\operatorname{se}(\bar{p})}$$

by solving for $\bar{p}$:

$$\bar{p} = \pi + z \operatorname{se}(\bar{p})$$

Thus, to obtain $\bar{p}_1$ we must subtract $z \operatorname{se}(\bar{p})$ from $\pi$, and to obtain $\bar{p}_2$ we must add $z \operatorname{se}(\bar{p})$ to $\pi$.

$$\bar{p}_1, \bar{p}_2 = \pi \pm z \operatorname{se}(\bar{p})$$

We know $\pi = 0.68$ and, given $n = 1{,}000$, $\operatorname{se}(\bar{p}) = 0.0148$. Since we want 90% of all sample proportions to be included in the interval, then of the remaining $\alpha = 10\%$ (recall that $\alpha$ is called the error probability), one half will be in the right tail and the other half in the left tail outside the interval. The margin of statistical error is then

$$\mathrm{MOE} = z_{\alpha/2} \operatorname{se}(\bar{p})$$

Since $\alpha = 0.10$, the relevant z-score is $z_{0.05} = 1.64$. The margin of error for the interval is then:

$$\mathrm{MOE} = 1.64(0.0148) = 0.024$$

The lower and upper ends of the interval are therefore:

$$\bar{p}_1 = 0.68 - 0.024 = 0.656 \qquad \bar{p}_2 = 0.68 + 0.024 = 0.704$$

This means that if you took repeated samples of 1,000 vehicles and computed the proportion in each sample which violated the speed limit, then 90% of these proportions would have values ranging from 0.656 to 0.704. Alternatively stated, 90% of sample proportions would deviate from $\pi$ by no more than 0.024, or 2.4 percentage points.
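The interval calculation can be sketched in a few lines of Python. This is an illustration under the text's normal approximation, not a definitive implementation; the function name `moe_interval` and its parameter `coverage` are assumptions introduced here. The code obtains the z-score from the inverse normal CDF (about 1.645 for 90% coverage) rather than the rounded table value 1.64, so the last digits can differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def moe_interval(pi: float, n: int, coverage: float):
    """Return (MOE, lower, upper) for the middle `coverage` share of p_bar values."""
    alpha = 1.0 - coverage                       # error probability
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    moe = z * sqrt(pi * (1.0 - pi) / n)          # MOE = z_{alpha/2} * se(p_bar)
    return moe, pi - moe, pi + moe


# pi = 0.68, n = 1,000, middle 90% of sample proportions.
moe, lower, upper = moe_interval(0.68, 1000, 0.90)
print(f"MOE = {moe:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Passing a different `coverage`, such as 0.95, widens the interval because a larger z-score is needed to enclose more of the distribution.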

Example 11

Suppose in a certain election a candidate received 55% of the votes. What proportion (or percentage) of sample proportions obtained from repeated samples of size $n = 600$ voters each would fall within 3 percentage points ($\pm 0.03$) of the population proportion of 0.55? The objective here is to find $P(0.52 < \bar{p} < 0.58)$.

$$\pi = 0.55 \qquad n = 600$$

$$\operatorname{se}(\bar{p}) = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.55(1-0.55)}{600}} = 0.0203$$

$$z = \frac{\bar{p} - \pi}{\operatorname{se}(\bar{p})}$$

$$z_1 = \frac{0.52 - 0.55}{0.0203} = -1.48$$

$$z_2 = \frac{0.58 - 0.55}{0.0203} = 1.48$$

$$P(-1.48 < z < 1.48) = 0.8612$$

Therefore, about 86% of sample proportions would deviate from $\pi = 0.55$ by no more than 0.03, or by no more than 3 percentage points.
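The "repeated samples" interpretation can be illustrated by simulation. The sketch below is a hypothetical Monte Carlo check, not part of the original text: it draws many samples of 600 voters from a population in which 55% support the candidate and counts how often the sample proportion lands within 3 percentage points of 0.55. The function name, the number of replications, and the seed are all arbitrary choices. Because the simulation works with discrete counts and includes the endpoints, its share can come out a little above the normal-approximation figure.

```python
import random


def simulate_within(pi: float, n: int, margin_count: int,
                    reps: int, seed: int) -> float:
    """Share of simulated samples whose success count is within
    `margin_count` of the expected count n * pi."""
    rng = random.Random(seed)   # seeded generator, so runs are repeatable
    target = round(pi * n)      # expected number of successes (330 here)
    hits = 0
    for _ in range(reps):
        # One sample: n independent voters, each a success with probability pi.
        successes = sum(rng.random() < pi for _ in range(n))
        if abs(successes - target) <= margin_count:
            hits += 1
    return hits / reps


# 3 percentage points of n = 600 voters is 18 voters either side of 330.
share = simulate_within(pi=0.55, n=600, margin_count=18, reps=2000, seed=1)
print(f"share of samples within 3 points: {share:.3f}")
```

Comparing counts rather than proportions avoids floating-point edge cases at the interval boundaries.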

Example 12

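In the previous example, where $\pi = 0.55$, what interval of $\bar{p}$ values contains the middle 95% of $\bar{p}$ values of all possible samples of size $n = 600$?

Now that we have learned about the MOE, the lower and upper ends of the interval are:

$$\bar{p}_L, \bar{p}_U = \pi \pm \mathrm{MOE}$$

Since the interval is to contain 95% of all sample proportions, the error probability is $\alpha = 0.05$. The margin of error is then

$$\mathrm{MOE} = z_{\alpha/2} \operatorname{se}(\bar{p})$$

where the relevant z-score is $z_{0.025} = 1.96$:

$$\mathrm{MOE} = 1.96(0.0203) = 0.04$$

$$\bar{p}_L = 0.55 - 0.04 = 0.51 \qquad \bar{p}_U = 0.55 + 0.04 = 0.59$$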

Chapter 4: Sampling Distributions


That is, 95% of sample proportions in samples of size 600 fall within $\pm 0.04$ (or 4 percentage points) of the population proportion of 0.55.

3.2.5. Determining the Sample Size for a Given MOE for $\bar{p}$

Once again, in many inferential statistics questions you will be asked to determine the sample size that yields a desired margin of error for $\bar{p}$. As the formula for the margin of error for $\bar{p}$ shows, the MOE varies inversely with the square root of the sample size.

$$\mathrm{MOE} = z_{\alpha/2} \operatorname{se}(\bar{p})$$

$$\mathrm{MOE} = z_{\alpha/2} \sqrt{\frac{\pi(1-\pi)}{n}}$$

We can rearrange this formula to solve for $n$. Squaring both sides and then solving for $n$, we obtain the formula to determine the sample size for a given MOE.

$$n = \left(\frac{z_{\alpha/2}}{\mathrm{MOE}}\right)^2 \pi(1-\pi)$$

Example

In the previous question, where $\pi = 0.55$, what is the minimum sample size so that the probability that the sample proportion is within $\pm 0.02$ (or 2 percentage points) of the population proportion is 95%?

Here we are looking for a 95% MOE. Therefore, the error probability is $\alpha = 0.05$, and $z_{\alpha/2} = z_{0.025} = 1.96$. We want the margin of error to be $\mathrm{MOE} = 0.02$.

$$n = \left(\frac{1.96}{0.02}\right)^2 (0.55)(1 - 0.55) = 2376.99$$

Since the sample size must be a whole number, round up: the minimum sample size is $n = 2{,}377$.
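The sample-size formula translates directly into code. The following is a minimal sketch, assuming the normal approximation from the text; the function name `min_sample_size` and the parameter `coverage` are illustrative. It takes the z-score from the inverse normal CDF instead of the rounded value 1.96 and rounds the result up, since a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(pi: float, moe: float, coverage: float) -> int:
    """Smallest n for which z_{alpha/2} * sqrt(pi * (1 - pi) / n) <= moe."""
    alpha = 1.0 - coverage                       # error probability
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    return ceil((z / moe) ** 2 * pi * (1.0 - pi))


# pi = 0.55, desired MOE of 0.02 with 95% probability.
n_required = min_sample_size(0.55, 0.02, 0.95)
print(n_required)  # 2377
```

Halving the desired MOE roughly quadruples the required sample size, because the MOE enters the formula squared.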


Appendix

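The proof that $E(\bar{x}) = \mu_{\bar{x}} = \mu$

$$E(\bar{x}) = E\left(\frac{\sum x}{n}\right)$$

$$E(\bar{x}) = \frac{1}{n} E(x_1 + x_2 + \cdots + x_n)$$

$$E(\bar{x}) = \frac{1}{n}\left[E(x_1) + E(x_2) + \cdots + E(x_n)\right]$$

Since all $x_i$ are selected from the same population, then

$$E(x_1) = E(x_2) = \cdots = E(x_n) = \mu$$

Therefore,

$$E(\bar{x}) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{n\mu}{n} = \mu$$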

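The proof that $\operatorname{var}(\bar{x}) = \dfrac{\sigma^2}{n}$

$$\operatorname{var}(\bar{x}) = \operatorname{var}\left(\frac{\sum x}{n}\right)$$

$$\operatorname{var}(\bar{x}) = \frac{1}{n^2} \operatorname{var}(x_1 + x_2 + \cdots + x_n)$$

Since all $x_i$ are independently selected from the same population,

$$\operatorname{var}(x_1 + x_2 + \cdots + x_n) = \operatorname{var}(x_1) + \operatorname{var}(x_2) + \cdots + \operatorname{var}(x_n)$$

and,

$$\operatorname{var}(x_1) = \operatorname{var}(x_2) = \cdots = \operatorname{var}(x_n) = \sigma^2$$

Thus,

$$\operatorname{var}(\bar{x}) = \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \cdots + \sigma^2\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$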
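Both results for $\bar{x}$ can be verified on a small case by listing every possible sample. The sketch below is an illustration only: the four-value population is made up, and samples are drawn with replacement so that the draws are independent, as the proof assumes. It enumerates all ordered samples of size 2, computes each sample mean, and compares the mean and variance of those sample means with $\mu$ and $\sigma^2/n$.

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8]   # hypothetical parent population
n = 2                       # sample size

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance (divides by N)

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)     # should equal mu
var_of_means = pvariance(sample_means)  # should equal sigma2 / n

print(mu, mean_of_means)
print(sigma2 / n, var_of_means)
```

Enumeration is only practical for tiny populations, but it shows the two identities holding exactly rather than approximately.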

The proof that $E(\bar{p}) = \pi$

After taking a sample of size $n$, determining the number of successes $x$ in the sample (the number of 1s in the binary data) is a Bernoulli process. Thus $x$ has a binomial distribution. In Chapter 2 it was shown that the expected value of a binomial random variable is:

$$E(x) = n\pi$$

Since

$$\bar{p} = \frac{x}{n}$$

then,

$$x = n\bar{p}$$

Substituting $n\bar{p}$ for $x$ in $E(x) = n\pi$,

$$E(n\bar{p}) = n\pi$$

For a given sample size $n$, then

$$E(n\bar{p}) = nE(\bar{p})$$

$$nE(\bar{p}) = n\pi$$

Dividing both sides by $n$,

$$E(\bar{p}) = \pi$$

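The proof that $\operatorname{var}(\bar{p}) = \dfrac{\pi(1-\pi)}{n}$

Since $x$ has a binomial distribution, then the variance of $x$ is:

$$\operatorname{var}(x) = n\pi(1-\pi)$$

Substituting $x = n\bar{p}$, and noting that for a given sample size $\operatorname{var}(n\bar{p}) = n^2 \operatorname{var}(\bar{p})$, we have:

$$n^2 \operatorname{var}(\bar{p}) = n\pi(1-\pi)$$

$$\operatorname{var}(\bar{p}) = \frac{\pi(1-\pi)}{n}$$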
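The two results for $\bar{p}$ can be checked directly from the binomial distribution. The following sketch is illustrative: the values $n = 50$ and $\pi = 0.68$ are arbitrary choices, and `binomial_pmf` is a helper written for this example. It computes the mean and variance of $\bar{p} = x/n$ by summing over every possible success count.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) for a binomial random variable with parameters n and pi."""
    return comb(n, x) * pi**x * (1.0 - pi) ** (n - x)


n, pi = 50, 0.68   # arbitrary illustrative values

# Possible sample proportions and their probabilities.
values = [x / n for x in range(n + 1)]
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Mean and variance of p_bar taken over its exact distribution.
mean_p_bar = sum(p * w for p, w in zip(values, probs))
var_p_bar = sum((p - mean_p_bar) ** 2 * w for p, w in zip(values, probs))

print(mean_p_bar, pi)                  # E(p_bar) against pi
print(var_p_bar, pi * (1.0 - pi) / n)  # var(p_bar) against pi(1 - pi)/n
```

The two numbers in each printed pair should agree up to floating-point rounding.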
