Parameters vs. Statistics ?

Malla, G

MATh 257

30 Jan 2013

Characteristic        Statistic (sample)    Parameter (population)
Mean                  \(\bar{x}\)           \(\mu\)
Standard deviation    \(s\)                 \(\sigma\)
Variance              \(s^2\)               \(\sigma^2\)
Proportion            \(p\)                 \(\pi\)

Standard deviation?

Estimation of the population mean (\(\mu\))

Note down these developments.

(i) When the variable X is normal with mean \(\mu\) and SD \(\sigma\),
\[ z = \frac{x - \mu}{\sigma} \]
is standard normal.

(ii) When \(\sigma^2\) is known,
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
is standard normal.

(iii) When \(\sigma\) is unknown, we estimate it by the sample SD, \(S\), and then
\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \]
is said to follow Student's t-distribution with \((n-1)\) degrees of freedom (df).
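The two standardized statistics in (ii) and (iii) can be computed side by side in R. The following is only a sketch: the population values `mu` and `sigma`, the sample size `n`, and the seed are hypothetical choices made for illustration, not values from the lecture.

```r
# Hypothetical normal population and sample (illustrative values only)
set.seed(257)
mu    <- 100   # assumed population mean
sigma <- 15    # assumed population SD
n     <- 12    # assumed sample size
x <- rnorm(n, mean = mu, sd = sigma)

# Case (ii): sigma known, standardize the sample mean with sigma
z_stat <- (mean(x) - mu) / (sigma / sqrt(n))

# Case (iii): sigma unknown, replace it with the sample SD
t_stat <- (mean(x) - mu) / (sd(x) / sqrt(n))

# Cross-check against R's built-in one-sample t test
tt <- t.test(x, mu = mu)
all.equal(t_stat, unname(tt$statistic))  # same t statistic
unname(tt$parameter)                     # degrees of freedom, n - 1
```

The only difference between the two statistics is the denominator: `sigma` in one case and `sd(x)` in the other.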

What is df?
df for any statistic = number of independent observations that are used to calculate it. (Example?)
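One possible example is the sample variance: the deviations from the sample mean always sum to zero, so only n - 1 of them are free to vary. The short R sketch below uses a made-up data vector `x` to show this.

```r
# Hypothetical data, chosen only for illustration
x <- c(4, 7, 9, 12)
n <- length(x)

# Deviations from the sample mean
dev <- x - mean(x)
sum(dev)                     # zero, up to rounding

# The last deviation is fixed by the other n - 1
last_from_rest <- -sum(dev[-n])
all.equal(last_from_rest, dev[n])

# So the sample variance divides by n - 1, matching var()
manual_var <- sum(dev^2) / (n - 1)
all.equal(manual_var, var(x))
```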

Some properties of the t-distribution

(i) Like the normal distribution, it is a bell-shaped distribution centered at 0.
(ii) Unlike the normal distribution, it depends on the df. Accordingly, there are infinitely many t-distributions, one for each df.
(iii) For large df, the t-distribution approaches the z-distribution (standard normal).
(iv) The main difference between the normal distribution and the t-distribution: can you tell, just by looking at their graphs on the next slide?

t-distributions

Area under the normal curve > area under the t-curve at the center (near the mean), but the opposite is true in the tails of the distributions.
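The comparison of areas can be checked numerically with `pnorm` and `pt`. This is a sketch under assumed settings: the choice of 5 df and the cut-offs at 1 and 3 are arbitrary illustrations, not values from the slides.

```r
df <- 5   # assumed degrees of freedom for the illustration

# Central area between -1 and 1
central_norm <- pnorm(1) - pnorm(-1)
central_t    <- pt(1, df) - pt(-1, df)

# Two-sided tail area beyond 3 in absolute value
tail_norm <- 2 * pnorm(-3)
tail_t    <- 2 * pt(-3, df)

central_norm > central_t   # normal has more area near the center
tail_t > tail_norm         # t has more area in the tails
```

Repeating this with a larger `df` shows both gaps shrinking.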

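Confidence interval for \(\mu\) (known \(\sigma\))

Suppose we wish to estimate the population mean, \(\mu\), using a sample of size \(n\). A level \(1-\alpha\) confidence interval for \(\mu\) is given by
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]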

Confidence interval for \(\mu\) (unknown \(\sigma\))

Suppose we wish to estimate the population mean, \(\mu\), using a sample of size \(n\). A level \(1-\alpha\) confidence interval for \(\mu\) is given by
\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
where \(\alpha/2\) is the upper-tail area for the desired level of confidence, and \(t_{\alpha/2}\) is chosen to correspond to \(\alpha/2\) and \((n-1)\) degrees of freedom.
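As a sketch of where this interval comes from (the slides do not spell the step out), one can start from the fact that the t statistic has a t-distribution with \((n-1)\) df and solve the inequality for \(\mu\):

```latex
\begin{align*}
P\!\left(-t_{\alpha/2} \le \frac{\bar{x}-\mu}{s/\sqrt{n}} \le t_{\alpha/2}\right) &= 1-\alpha \\
\Longleftrightarrow\quad
P\!\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```

The known-\(\sigma\) interval follows in the same way, with \(z_{\alpha/2}\) and \(\sigma\) in place of \(t_{\alpha/2}\) and \(s\).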
Calculation of \(t_{\alpha/2}\) using R

In R, \(t_{\alpha/2} = \texttt{qt}(1 - \alpha/2,\ \text{df})\).

For n = 50, df = 50 - 1 = 49:
(i) For a 98% CI, \(\alpha = .02\). So, \(t_{\alpha/2}\) = qt(.99, 49) = 2.4049
(ii) For a 95% CI, \(\alpha = .05\). So, \(t_{\alpha/2}\) = qt(.975, 49) = 2.0096
(iii) For an 80% CI, \(\alpha = .20\). So, \(t_{\alpha/2}\) = qt(.90, 49) = 1.2991
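The three lookups can be done in one vectorized call. A minimal sketch, where the variable names are illustrative choices:

```r
n  <- 50
df <- n - 1

# Confidence levels from the slide: 98%, 95%, 80%
conf_levels <- c(0.98, 0.95, 0.80)
alpha       <- 1 - conf_levels

# qt() takes the lower-tail probability, hence 1 - alpha/2
t_crit <- qt(1 - alpha / 2, df)
round(t_crit, 4)   # the slide reports 2.4049, 2.0096, 1.2991
```

Note that the critical value shrinks as the confidence level drops, so a lower confidence level gives a narrower interval.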


Calculating confidence intervals for impact numbers: A title of a research paper!

We considered a British and a Japanese cohort study that investigated the association between smoking and death from coronary heart disease (CHD) and between smoking and stroke, respectively. We used the reported death and disease rates and calculated impact numbers with corresponding 95% confidence intervals. In the British study, the CIN was 6.46, i.e. on average, of any 6 to 7 persons who died of CHD, one case was attributable to smoking with corresponding 95% confidence interval of [3.84, 20.36]. For the exposed cases, the results of ECIN = 2.64 with 95% confidence interval [1.76, 5.29] were obtained. In the Japanese study, the CIN was 6.67, i.e. on average, of the 6 to 7 persons who had a stroke, one case was attributable to smoking with corresponding 95% confidence interval of [3.80, 27.27]. For the exposed cases, the results of ECIN = 4.89 with 95% confidence interval of [2.86, 16.67] were obtained.

Interval Estimation of the population mean: \(\sigma\) unknown

Problem
Suppose you wanted to know the average amount spent per patient for the treatment of a certain type of disease at a particular hospital. You take a random sample of 34 patients and note the cost of their treatment. You find the sample average cost to be $1027.18, with a sample standard deviation of $105.29. Find a 95% confidence interval for the average cost per patient for the treatment of that disease at that hospital.

Solution
For 95% confidence and \((34 - 1) = 33\) df, we find \(t_{\alpha/2}\) = qt(0.975, 33) ≈ 2.035.

\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 1027.18 \pm 2.035 \cdot \frac{105.29}{\sqrt{34}} = 1027.18 \pm 36.75 = [990.43,\ 1063.93] \]
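The same calculation can be reproduced in R from the summary statistics given in the problem. This is a sketch; the variable names are illustrative.

```r
# Summary statistics from the problem
xbar <- 1027.18   # sample mean cost
s    <- 105.29    # sample standard deviation
n    <- 34        # sample size
conf <- 0.95      # confidence level

alpha  <- 1 - conf
t_crit <- qt(1 - alpha / 2, df = n - 1)   # critical value with 33 df

# Margin of error and interval endpoints
moe <- t_crit * s / sqrt(n)
ci  <- c(lower = xbar - moe, upper = xbar + moe)

round(t_crit, 3)
round(moe, 2)
round(ci, 2)
```

Because R keeps the unrounded critical value rather than 2.035, the endpoints may differ from the hand calculation by about a cent.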
Interpretations:
Interval interpretation: We are 95% confident that the actual average cost per patient for the treatment of that disease at that hospital is between $990.43 and $1063.93.
Point estimate and margin of error: At the 95% confidence level, a point estimate of the true average cost per patient for the treatment of that disease at that hospital is $1027.18, with a margin of error of $36.75.

Confidence interval for \(\mu\) (unknown \(\sigma\))

When the sample size (n) is large, say n = 500:

For a 95% confidence level and \((500 - 1) = 499\) df, we find \(t_{\alpha/2}\) = qt(0.975, 499) ≈ 1.96.

What did you notice in this problem? The t-value for the interval was 1.96, practically identical to the z-value we used when \(\sigma\) was known. Why? Because the sample size was so large, the sample standard deviation, s, provides a very good estimate of \(\sigma\).
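To see how quickly the t critical value approaches the z critical value, one can tabulate both for increasing df. The df values below are illustrative choices (33, 49 and 499 echo the examples in these slides; the others are arbitrary).

```r
# Degrees of freedom to compare (illustrative selection)
dfs <- c(5, 33, 49, 499, 5000)

t_crit <- qt(0.975, dfs)   # 95% two-sided t critical values
z_crit <- qnorm(0.975)     # 95% two-sided z critical value

# Table of critical values and their excess over z
data.frame(
  df     = dfs,
  t      = round(t_crit, 4),
  excess = round(t_crit - z_crit, 4)
)
```

The excess over the z value is always positive but shrinks toward zero as df grows.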
