
5

Random Sample and Central Limit Theorem; X-Bar and R Control Charts

Exercise 1: (Example 1)
Suppose $X_1, X_2, \ldots, X_{20}$ is a sample from a normal distribution $N(\mu, \sigma^2)$ with $\mu = 5$ and
$\sigma^2 = 4$. Find
(a) the expectation and variance of $\bar{X}$
(b) the distribution of $\bar{X}$

Exercise 2: (Example 2)
Given that X is normally distributed with mean 50 and standard deviation 4, compute the
following for n = 25.
(a) Mean and variance of $\bar{X}$
(b) $P(\bar{X} \le 49)$
(c) $P(\bar{X} \ge 52)$
(d) $P(49 \le \bar{X} \le 51.5)$

Exercise 3: (Tutorial 5, No.1)

Probability and Statistics Work Book

Given that X is normally distributed with mean 20 and standard deviation 2, compute the
following for n = 40.
(a) Mean and variance of $\bar{X}$
(b) $P(\bar{X} \le 19)$
(c) $P(\bar{X} \ge 22)$
(d) $P(19 \le \bar{X} \le 21.5)$
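Solution:
(a) Mean of $\bar{X}$ = 20 and variance of $\bar{X}$ = 4/40 = 0.1
(b) $P(\bar{X} \le 19) = P\left(Z \le \frac{19 - 20}{\sqrt{0.1}}\right) = P(Z \le -3.16) = 0.000789$
(c) $P(\bar{X} \ge 22) = P\left(Z \ge \frac{22 - 20}{\sqrt{0.1}}\right) = P(Z \ge 6.32) = 1 - P(Z \le 6.32) \approx 1 - 1 = 0$
(d) $P(19 \le \bar{X} \le 21.5) = P\left(\frac{19 - 20}{\sqrt{0.1}} \le Z \le \frac{21.5 - 20}{\sqrt{0.1}}\right) = P(-3.16 \le Z \le 4.74)$
    $= \Phi(4.74) - \Phi(-3.16) \approx 1 - 0.000789 = 0.999211$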
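
The standardisation used in this solution can be checked numerically. The sketch below is only an illustration: the helper names `phi` and `xbar_prob` are made up for this example, and the normal CDF is evaluated with `math.erf` instead of a printed table, so the last digits may differ slightly from table values.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def xbar_prob(lo, hi, mu, sigma, n):
    """P(lo <= Xbar <= hi) when Xbar ~ N(mu, sigma^2 / n).

    Pass None for lo or hi to leave that side unbounded.
    """
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    p_hi = 1.0 if hi is None else phi((hi - mu) / se)
    p_lo = 0.0 if lo is None else phi((lo - mu) / se)
    return p_hi - p_lo

# Exercise 3 values: mean 20, standard deviation 2, n = 40
mu, sigma, n = 20.0, 2.0, 40
p_b = xbar_prob(None, 19.0, mu, sigma, n)   # P(Xbar <= 19)
p_c = xbar_prob(22.0, None, mu, sigma, n)   # P(Xbar >= 22)
p_d = xbar_prob(19.0, 21.5, mu, sigma, n)   # P(19 <= Xbar <= 21.5)

print(f"(b) {p_b:.6f}  (c) {p_c:.2e}  (d) {p_d:.6f}")
```

The same helper applies to Exercise 2 by changing `mu`, `sigma` and `n`.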

Exercise 4: (Tutorial 5, No.2)


Let X denote the number of flaws in a 1-inch length of copper wire. The pmf of X is given in the
following table.

X = x       0      1      2      3
P(X = x)    0.48   0.39   0.12   0.01

100 wires are sampled from this population. What is the probability that the average number
of flaws per wire in this sample is less than 0.5?
Solution: Given that,
Mean of X = 0(0.48) + 1(0.39) + 2(0.12) + 3(0.01) = 0.66
Variance of X = $[0^2(0.48) + 1^2(0.39) + 2^2(0.12) + 3^2(0.01)] - (0.66)^2 = 0.5244$
If n = 100, the mean of $\bar{X}$ is 0.66 and the variance of $\bar{X}$ is 0.5244/100 = 0.005244
So, $P(\bar{X} < 0.5) = P\left(Z < \frac{0.5 - 0.66}{\sqrt{0.005244}}\right) = P(Z < -2.21) = 0.0136$
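
The arithmetic above (moments of a discrete pmf followed by the Central Limit Theorem approximation) can be sketched in a few lines. This is an illustrative check only; the variable names are chosen for this example and the normal CDF is computed with `math.erf` rather than read from a table.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# pmf of the number of flaws per wire, as given in the exercise
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())
variance = second_moment - mean ** 2

n = 100
var_xbar = variance / n                    # variance of the sample mean
z = (0.5 - mean) / math.sqrt(var_xbar)     # standardised cut-off
prob = phi(z)                              # CLT approximation to P(Xbar < 0.5)

print(f"mean={mean:.2f} variance={variance:.4f} z={z:.2f} P={prob:.4f}")
```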

Exercise 5: (Tutorial 5, No.3)


At a large university, the mean age of the students is 22.3 years, and the standard deviation is
4 years. A random sample of 64 students is drawn. What is the probability that the average
age of these students is greater than 23 years?

Solution: Given that the mean of X is 22.3 and the variance of X is 16.
If n = 64, the mean of $\bar{X}$ is 22.3 and the variance of $\bar{X}$ is 16/64 = 0.25.
So, $P(\bar{X} > 23) = P\left(Z > \frac{23 - 22.3}{\sqrt{0.25}}\right) = P(Z > 1.4) = 1 - P(Z \le 1.4)$
    $= 1 - \Phi(1.4) = 1 - 0.919 = 0.081$

Exercise 6:
The flexural strength (in MPa) of certain concrete beams is X ~ N(8, 2.25). Find the
probability that the sample mean strength of 16 concrete beams lies in the interval (7.55, 8.75).

Exercise 7(Example 3)
A component part for a jet aircraft engine is manufactured by an investment casting process.
The vane opening on this casting is an important functional parameter of the part.

We will illustrate the use of $\bar{X}$ and R control charts to assess the statistical stability of this
process. The table presents 20 samples of five parts each. The values given in the table have
been coded by using the last three digits of the dimension; that is, 31.6 should be 0.50316
inch.

Sample Number   x1   x2   x3   x4   x5   $\bar{x}$   r
1               33   29   31   32   33   31.6    4
2               33   31   35   37   31   33.4    6
3               35   37   33   34   36   35.0    4
4               30   31   33   34   33   32.2    4
5               33   34   35   33   34   33.8    2
6               38   37   39   40   38   38.4    3
7               30   31   32   34   31   31.6    4
8               29   39   38   39   39   36.8   10
9               28   33   35   36   43   35.0   15
10              38   33   32   35   32   34.0    6
11              28   30   28   32   31   29.8    4
12              31   35   35   35   34   34.0    4
13              27   32   34   35   37   33.0   10
14              33   33   35   37   36   34.8    4
15              35   37   32   35   39   35.6    7
16              33   33   27   31   30   30.8    6
17              35   34   34   30   32   33.0    5
18              32   33   30   30   33   31.6    3
19              25   27   34   27   28   28.2    9
20              35   35   36   33   30   33.8    6

(a) Construct $\bar{X}$ and R control charts.
(b) After the process is in control, estimate the process mean and standard deviation.

Exercise 8(Tutorial 5, No.4)

The overall length of a skew used in a knee replacement device is monitored using $\bar{X}$ and R
charts. The following table gives the length for 20 samples of size 4. (Measurements are
coded from 2.00 mm; that is, 15 is 2.15 mm.)
Sample   Observation         Sample   Observation
         1   2   3   4                1   2   3   4
1        16  18  15  13      11       14  14  15  13
2        16  15  17  16      12       15  13  15  16
3        15  16  20  16      13       13  17  16  15
4        14  16  14  12      14       11  14  14  21
5        14  15  13  16      15       14  15  14  13
6        16  14  16  15      16       18  15  16  14
7        16  16  14  15      17       14  16  19  16
8        17  13  17  16      18       16  14  13  19
9        15  11  13  16      19       17  19  17  13
10       15  18  14  13      20       12  15  12  17

(i) Using all the data, find trial control limits for $\bar{X}$ and R charts, construct the charts, and
plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary,
revise your control limits, assuming that any samples that plot outside the control limits
can be eliminated.
(iii) Assuming that the process is in control, estimate the process mean and process standard
deviation.

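Solution:
(i) The trial control limits are as follows (for samples of size 4, $A_2 = 0.729$, $D_3 = 0$, $D_4 = 2.282$;
using all 20 samples, $\bar{\bar{x}} = 15.14$ and $\bar{r} = 4.2$).
    $\bar{X}$ chart: UCL = 15.14 + 0.729(4.2) = 18.20, CL = 15.14, LCL = 15.14 - 0.729(4.2) = 12.08
    R chart: UCL = 2.282(4.2) = 9.58, CL = 4.2, LCL = 0
(ii) Based on the control charts, there is a single point beyond the control limits. Sample 14
(range 10) is above the upper control limit on the R chart.
With sample 14 removed, $\bar{\bar{x}} = 15.14$ and $\bar{r} = 3.895$, and the revised control limits are as follows.
    $\bar{X}$ chart: UCL = 15.14 + 0.729(3.895) = 17.98, CL = 15.14, LCL = 15.14 - 0.729(3.895) = 12.31
    R chart: UCL = 2.282(3.895) = 8.89, CL = 3.895, LCL = 0
All points are within the control limits. The process is said to be in
statistical control.
(iii) The estimated process mean is 15.14.
The estimated process standard deviation is $\hat{\sigma} = \bar{r}/d_2$ = 3.895/2.059 = 1.892.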
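
The control-limit arithmetic for this exercise can be sketched in Python. The constants `A2`, `D3`, `D4` and `D2_CONST` are the standard tabulated Shewhart factors for subgroups of size 4; the function names are chosen for this illustration and are not from the workbook.

```python
# Exercise 8 data: 20 samples of size 4 (coded lengths)
samples = [
    [16, 18, 15, 13], [16, 15, 17, 16], [15, 16, 20, 16], [14, 16, 14, 12],
    [14, 15, 13, 16], [16, 14, 16, 15], [16, 16, 14, 15], [17, 13, 17, 16],
    [15, 11, 13, 16], [15, 18, 14, 13], [14, 14, 15, 13], [15, 13, 15, 16],
    [13, 17, 16, 15], [11, 14, 14, 21], [14, 15, 14, 13], [18, 15, 16, 14],
    [14, 16, 19, 16], [16, 14, 13, 19], [17, 19, 17, 13], [12, 15, 12, 17],
]

# Tabulated control chart factors for subgroup size n = 4
A2, D3, D4, D2_CONST = 0.729, 0.0, 2.282, 2.059

def chart_limits(groups):
    """Return (LCL, CL, UCL) for the X-bar chart and the R chart."""
    means = [sum(g) / len(g) for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar),
        "r": (D3 * r_bar, r_bar, D4 * r_bar),
    }

def out_of_control(groups, limits):
    """1-based positions of samples outside either chart's limits."""
    flagged = []
    for i, g in enumerate(groups, start=1):
        m, r = sum(g) / len(g), max(g) - min(g)
        x_ok = limits["xbar"][0] <= m <= limits["xbar"][2]
        r_ok = limits["r"][0] <= r <= limits["r"][2]
        if not (x_ok and r_ok):
            flagged.append(i)
    return flagged

trial = chart_limits(samples)
flagged = out_of_control(samples, trial)

# Drop the flagged samples and recompute the limits once
kept = [g for i, g in enumerate(samples, start=1) if i not in flagged]
revised = chart_limits(kept)
sigma_hat = revised["r"][1] / D2_CONST   # process standard deviation estimate

print("flagged samples:", flagged)
print("revised limits:", revised)
print("sigma_hat:", round(sigma_hat, 3))
```

In practice a flagged point would only be dropped after an assignable cause is found; the exercise simply assumes this.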

Exercise 9:

The thickness of a printed circuit board (PCB) is an important quality parameter. Data on
board thickness (in cm) are given below for 25 samples of three boards each.
Sample   Thickness (cm)               Sample   Thickness (cm)
1        0.0629   0.0636   0.0640     14       0.0645   0.0640   0.0631
2        0.0630   0.0631   0.0622     15       0.0619   0.0644   0.0632
3        0.0628   0.0631   0.0633     16       0.0631   0.0627   0.0630
4        0.0634   0.0630   0.0631     17       0.0616   0.0623   0.0631
5        0.0619   0.0628   0.0630     18       0.0630   0.0630   0.0626
6        0.0613   0.0629   0.0634     19       0.0636   0.0631   0.0629
7        0.0630   0.0639   0.0625     20       0.0640   0.0635   0.0629
8        0.0628   0.0627   0.0622     21       0.0628   0.0625   0.0616
9        0.0623   0.0626   0.0633     22       0.0615   0.0625   0.0619
10       0.0631   0.0631   0.0633     23       0.0630   0.0632   0.0630
11       0.0635   0.0630   0.0638     24       0.0635   0.0629   0.0635
12       0.0623   0.0630   0.0630     25       0.0623   0.0629   0.0630
13       0.0635   0.0631   0.0630

(i) Using all the data, find trial control limits for $\bar{X}$ and R charts, construct the charts, and
plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary,
revise your control limits, assuming that any samples that plot outside the control limits
can be eliminated.
(iii) Assuming that the process is in control, estimate the process mean and process standard
deviation.

Hypothesis Testing - One Population

Exercise 1: (Example 1)
A manufacturer of sprinkler systems used for fire protection in office buildings claims that
the true average system-activation temperature is 130°F. A sample of 9 systems, when tested,
yields an average activation temperature of 131.08°F. If the distribution of activation temperatures is
normal with standard deviation 1.5°F, do the data contradict the firm's claim at level of
significance $\alpha$ = 0.01? What is the P-value for this test?

Exercise 2: (Example 2)
A random sample of 50 battery packs is selected and subjected to a life test. The average life
of these batteries is 4.05 hours. Assume that the battery life is normally distributed with
standard deviation equals 0.2 hour. Is there evidence to support the claim that mean battery
life exceeds 4 hours? Use $\alpha$ = 0.05. What is the P-value for this test?

Exercise 3:
A new cure has been developed for a certain type of cement that results in a compressive
strength of 5000 kilograms per square centimeter with a standard deviation of 120 kilograms.
Assume the strengths follow the normal distribution. To test the null hypothesis that $\mu$ = 5000 against the
alternative that $\mu$ < 5000, a random sample of 50 pieces of cement is observed. The critical
region is defined to be $\bar{X}$ < 4970.
(a) Find the probability of committing a type I error when $H_0$ is true.
(b) Evaluate $\beta$ (the probability of type II error) if $\mu$ = 4960.

Exercise 4: (Tutorial 6, No.1)


A civil engineer is analyzing the compressive strength of concrete. Compressive strength is
approximately normally distributed with variance $\sigma^2$ = 1000 psi$^2$. A random sample of 12
specimens has a mean compressive strength of $\bar{x}$ = 3255.42 psi.
(a) Test the hypothesis that mean compressive strength is 3500 psi. Use a fixed-level test
with $\alpha$ = 0.01;
(b) What is the smallest level of significance at which you would be willing to reject the null
hypothesis?;
(c) Construct a 95% two-sided CI on mean compressive strength; and
(d) Construct a 99% two-sided CI on mean compressive strength. Compare the width of this
confidence interval with the width of the one in part (c). What is your comment?
Solution:
(a) (i) The parameter of interest is the true mean compressive strength, $\mu$.
(ii) The hypotheses: $H_0: \mu = 3500$ vs $H_1: \mu \ne 3500$
(iii) The significance level: $\alpha$ = 0.01
(iv) The test statistic is: $z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Computation: $\bar{x} = 3255.42$, $\sigma = \sqrt{1000} = 31.62$,
$z_0 = \frac{3255.42 - 3500}{31.62/\sqrt{12}} = -26.79$
(v) Decision:
Reject $H_0$ if $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.005} = 2.58$.
(vi) Result and conclusion:
Since -26.79 < -2.58, we reject the null hypothesis and conclude that the true mean
compressive strength is significantly different from 3500 psi at $\alpha$ = 0.01.
(b) The smallest level of significance at which we are willing to reject the null hypothesis is the
P-value = $2[1 - \Phi(26.79)] \approx 2[1 - 1] = 0$
(c) A 95% two-sided CI on mean compressive strength is
$\bar{x} - z_{0.025}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{0.025}\,\sigma/\sqrt{n}$
$3255.42 - 1.96(31.62/\sqrt{12}) \le \mu \le 3255.42 + 1.96(31.62/\sqrt{12})$
$3237.53 \le \mu \le 3273.31$
With 95% confidence, we believe the true mean compressive strength is between
3237.53 psi and 3273.31 psi.

(d) A 99% two-sided CI on mean compressive strength is
$\bar{x} - z_{0.005}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{0.005}\,\sigma/\sqrt{n}$
$3231.96 \le \mu \le 3278.88$
With 99% confidence, we believe that the true mean compressive strength is between
3231.96 psi and 3278.88 psi.
The 99% confidence interval is wider than the 95% confidence interval.
We can conclude that the confidence interval with the larger level of confidence will
always result in a wider confidence interval when $\bar{x}$, $\sigma^2$, and n are held constant.
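
The z-test and the two intervals can be reproduced with a short script. This is a sketch under stated assumptions: the critical values 1.96 and 2.576 are the usual standard normal quantiles, so the 99% endpoints may differ from the worked values by a few hundredths because of how the critical value is rounded.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 4 values: known variance 1000 psi^2, n = 12, sample mean 3255.42 psi
x_bar, mu0, n = 3255.42, 3500.0, 12
sigma = math.sqrt(1000.0)
se = sigma / math.sqrt(n)            # standard error of the mean

z0 = (x_bar - mu0) / se              # test statistic
p_value = 2.0 * (1.0 - phi(abs(z0))) # two-sided P-value

def z_interval(z_crit: float):
    """Two-sided CI for the mean with known sigma."""
    return (x_bar - z_crit * se, x_bar + z_crit * se)

ci95 = z_interval(1.96)
ci99 = z_interval(2.576)

print(f"z0 = {z0:.2f}, P-value = {p_value:.3g}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f})")
```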

Exercise 5: (Example 3)
A new process for producing synthetic diamonds can be operated at a profitable level only if
the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of
the process, six diamonds are generated with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57 and
0.54 karat.
(a) At the 5% significance level, do the six measurements present sufficient evidence that the
average weight of the diamonds produced by the process is in excess of 0.5 karat?
(b) Use the P-value approach to test the null hypothesis.
(c) Construct a 95% CI on the average weight of diamonds.

Exercise 6: (Tutorial 6, No.2)


A cigarette company claims that its cigarettes contain an average of only 10 mg
of tar. A random sample of 25 cigarettes shows the average tar content to be 12.5 mg with a
standard deviation of 4.5 mg.
(a) Construct a hypothesis test to determine whether the average tar content of cigarettes
exceeds 10 mg, using the P-value approach;
(b) Construct a 95% two-sided CI on the average tar content of cigarettes.
Solution:
(a) (i) The parameter of interest is the true mean tar content, $\mu$.
(ii) The hypotheses: $H_0: \mu = 10$ mg vs $H_1: \mu > 10$ mg
(iii) The test statistic is:
$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12.5 - 10}{4.5/5} = 2.778$
(iv) Decision:
Reject $H_0$ if the P-value is smaller than 0.05.
(v) Conclusion:
From a t-distribution table, for a t distribution with 24 degrees of freedom, $t_0$
= 2.778 falls between two values: 2.492, for which $\alpha$ = 0.01, and 2.797, for which
$\alpha$ = 0.005. So the P-value satisfies 0.005 < P < 0.01. Since P < 0.05, we reject $H_0$ and
conclude that the mean tar content of the cigarettes exceeds 10 mg.

(b) A 95% two-sided CI on mean tar content is
$\bar{x} = 12.5$, $s = 4.5$, $n = 25$, $t_{\alpha/2, n-1} = t_{0.025, 24} = 2.064$
$\bar{x} - t_{\alpha/2, n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2, n-1}\, s/\sqrt{n}$
$12.5 - (2.064)\frac{4.5}{\sqrt{25}} \le \mu \le 12.5 + (2.064)\frac{4.5}{\sqrt{25}}$
$10.642 \le \mu \le 14.358$
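
The table lookup in part (a) brackets the P-value; a numerical value can be sketched without any statistics library by integrating the t density. The helper names `t_pdf` and `t_upper_tail` are made up for this illustration, the integration uses Simpson's rule (an assumption of this sketch, not the workbook's method), and the critical value 2.064 is taken from the t table quoted in the solution.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t0: float, df: int, steps: int = 2000) -> float:
    """P(T > t0) for t0 >= 0, via Simpson's rule on [0, t0] (steps must be even)."""
    h = t0 / steps
    total = t_pdf(0.0, df) + t_pdf(t0, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3.0

# Exercise 6 values
x_bar, mu0, s, n = 12.5, 10.0, 4.5, 25
se = s / math.sqrt(n)
t0 = (x_bar - mu0) / se
p_value = t_upper_tail(t0, n - 1)     # one-sided, H1: mu > 10

T_CRIT = 2.064                        # t_{0.025, 24} from the t table
ci = (x_bar - T_CRIT * se, x_bar + T_CRIT * se)

print(f"t0 = {t0:.3f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```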

Exercise 7: (Example 4)
Regardless of age, about 20% of Malaysian adults participate in fitness activities at least
twice a week. In a local survey of 100 adults over 40 years old, a total of 15 people indicated
that they participated in a fitness activity at least twice a week.
(a) Do these data indicate that the participation rate for adults over 40 years of age is
significantly less than 20%? Carry out a test at 10% significance level and draw
appropriate conclusion.
(b) Construct a 95% two-sided CI on the participation rate.

Exercise 8: (Tutorial 6, No.3)


A survey done one year ago showed that 45% of the population participated in recycling
programs. In a recent poll a random sample of 1250 people showed that 588 participate in
recycling programs.
(a) Test the hypothesis that the proportion of the population who participate in recycling
programs is greater than it was one year ago. Use a 5% significance level.

(b) Construct a 95% two-sided CI on the proportion.


Solution:
(a) (i) The parameter of interest is the proportion of the population who participate in
recycling programs, p.
(ii) The hypotheses: $H_0: p = 0.45$ vs $H_1: p > 0.45$
(iii) The significance level: $\alpha$ = 0.05
(iv) The test statistic is:
$z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{p} = X/n$
$z_0 = \frac{588/1250 - 0.45}{\sqrt{(0.45)(0.55)/1250}} = 1.449$
(v) Decision:
Reject $H_0$ if $z_0 > z_\alpha$, where $z_\alpha = z_{0.05} = 1.645$.
(vi) Conclusion:
Since 1.449 < 1.645, we do not reject the null hypothesis. At the 0.05 level of
significance there is insufficient evidence that the proportion of the population who
participate in recycling programs is greater than 45%.
(b) A 95% two-sided CI is
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$0.47 - 1.96\sqrt{\frac{(0.47)(0.53)}{1250}} \le p \le 0.47 + 1.96\sqrt{\frac{(0.47)(0.53)}{1250}}$
$0.442 \le p \le 0.498$

Since $p_0$ = 0.45 is inside the interval, the interval is consistent with not rejecting the null hypothesis.

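
The large-sample proportion test and interval above can be checked with a few lines of Python. This is an illustrative sketch; the variable names are chosen for this example and the unrounded value of the sample proportion (0.4704) is used, so the interval endpoints agree with the worked values to about three decimals.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 8 values: 588 of 1250 participate, H0: p = 0.45 vs H1: p > 0.45
x, n, p0 = 588, 1250, 0.45
p_hat = x / n

z0 = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)   # null standard error
p_value = 1.0 - phi(z0)                              # one-sided P-value

half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```
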
Exercise 9:
An Ipoh city council member gave a speech in which she said that 18% of all private homes in
the city had been undervalued by the county tax assessor's office. In a follow-up story the
local newspaper reported that it had taken a random sample of 91 private homes. Using a
professional evaluator to evaluate the property and checking against county tax records, it
found that 14 of the homes had been undervalued.
(i) Do these data indicate that the proportion of private homes that are undervalued by the
county tax assessor is different from 18%? Use a 5% significance level.
(ii) Construct a 95% two-sided CI on the proportion.

Exercise 10: (Example 5)


Engineers designing the front-wheel-drive half shaft of a new model automobile claim that
the variance in the displacement of the constant velocity joints of the shaft is less than 1.5
mm. 20 simulations were conducted and the following results were obtained: $\bar{x} = 3.39$ and
s = 1.41.
(i) At $\alpha$ = 0.05, do these data support the claim of the engineers?
(ii) What is the P-value for this test?
(iii) Construct a two-sided CI for $\sigma^2$.

Exercise 11: (Tutorial 6, No.4)


Aerospace engineers claim that the standard deviation of the percentage in an alloy used
in aerospace casting is greater than 0.3. 51 parts were randomly selected and the sample
standard deviation of the percentage in an alloy used in aerospace casting is s = 0.37.
(i) At $\alpha$ = 0.05, do these data support the claim of the engineers?
(ii) What is the P-value for this test?
(iii) Construct a 95% two-sided CI for $\sigma$. What is your conclusion?


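Solution:
(i) (a) The parameter of interest is the population variance, $\sigma^2$.
(b) The hypotheses: $H_0: \sigma^2 = (0.3)^2$ vs $H_1: \sigma^2 > (0.3)^2$
(c) The significance level: $\alpha$ = 0.05
(d) The test statistic is:
$\chi_0^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{50(0.37)^2}{(0.3)^2} = 76.056$
(e) Decision:
Reject $H_0$ if $\chi_0^2 > \chi^2_{0.05, 50} = 67.50$
(f) Conclusion:
Since 76.056 > 67.50, we reject the null hypothesis and conclude that
the engineers' claim is supported at the 0.05 level of significance.
(ii) From the $\chi^2$ table, $\chi^2_{0.01, 50} = 76.15$ and $\chi^2_{0.025, 50} = 71.42$. Since
71.42 < 76.056 < 76.15, the P-value satisfies 0.01 < P < 0.025. Because the P-value is
smaller than 0.05, we reject the null hypothesis, in agreement with part (i).
(iii) A 95% two-sided CI is
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2, n-1}}$
$\frac{50(0.37)^2}{71.42} \le \sigma^2 \le \frac{50(0.37)^2}{32.36}$
$0.0958 \le \sigma^2 \le 0.2115$, or equivalently $0.310 \le \sigma \le 0.460$
The interval for $\sigma$ lies entirely above 0.3, which is consistent with the engineers' claim.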
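
The chi-square calculations for this exercise can be sketched as follows. The three critical values are the table values for 50 degrees of freedom; the upper-tail function uses the closed-form sum that holds only for an even number of degrees of freedom, which is an assumption of this sketch and not the workbook's table-lookup method.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square_df > x), valid only when df is even.

    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Exercise 11 values: n = 51, s = 0.37, H0: sigma = 0.3 vs H1: sigma > 0.3
n, s, sigma0 = 51, 0.37, 0.3
df = n - 1
chi2_0 = df * s ** 2 / sigma0 ** 2
p_value = chi2_upper_tail_even_df(chi2_0, df)

# Table values for 50 degrees of freedom
CHI2_0_05, CHI2_0_025, CHI2_0_975 = 67.50, 71.42, 32.36
reject = chi2_0 > CHI2_0_05

var_ci = (df * s ** 2 / CHI2_0_025, df * s ** 2 / CHI2_0_975)
sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))

print(f"chi2_0 = {chi2_0:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
print(f"95% CI for sigma^2: ({var_ci[0]:.4f}, {var_ci[1]:.4f})")
print(f"95% CI for sigma:   ({sd_ci[0]:.3f}, {sd_ci[1]:.3f})")
```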

Exercise 12:
Scientists claim that the variance of the sugar content of the syrup in canned peaches is
18 mg$^2$. A random sample of 10 cans yields a sample standard deviation of 4.8 mg.
(i) At $\alpha$ = 0.05, do these data support the claim of the scientists?
(ii) What is the P-value for this test?
(iii) Construct a 95% two-sided CI for $\sigma^2$. What is your conclusion?

7

Hypothesis Testing - Two Population

Exercise 1: (Example 1)
A random sample of size $n_1$ = 25 taken from a normal population with $\sigma_1$ = 5.2 has a mean
equal to 81. A second random sample of size $n_2$ = 36, taken from a different normal population
with $\sigma_2$ = 3.4, has a mean equal to 76.
(a) Do the data indicate that the true mean values $\mu_1$ and $\mu_2$ are different? Carry out a test
at $\alpha$ = 0.01.
(b) Find a 90% CI on the difference in means.

Exercise 2: (Example 2)
Two machines are used for filling plastic bottles with a net volume of 16.0 oz. The fill
volume can be assumed normal with $\sigma_1$ = 0.02 and $\sigma_2$ = 0.025. A member of the quality
engineering staff suspects that both machines fill to the same mean net volume, whether or
not this volume is 16.0 oz. A random sample of 10 bottles is taken from the output of each
machine with the following results:
(a) Do you think the engineer is correct? Use the p value approach.
(b) Find a 95% CI on the difference in means.

Exercise 3: (Tutorial 7, No.1)


Two machines are used to fill plastic bottles with dishwashing detergent. The standard
deviations of fill volume are known to be $\sigma_1$ = 0.01 and $\sigma_2$ = 0.15 fluid ounce for the two
machines, respectively. Two random samples of $n_1$ = 12 bottles from machine 1 and $n_2$ = 10
bottles from machine 2 are selected, and the sample mean fill volumes are $\bar{x}_1$ = 30.61 and
$\bar{x}_2$ = 30.24 fluid ounces. Assume normality.
(i) Test the hypothesis that both machines fill to the same mean volume. Use the P-value
approach;
(ii) Construct a 90% two-sided CI on the mean difference in fill volume; and
(iii) Construct a 95% two-sided CI on the mean difference in fill volume. Compare and
comment on the width of this interval to the width of the interval in part (ii).

Exercise 4: (Example 3)
To find out whether a new serum will arrest leukemia, 9 mice, all with an advanced stage of
the disease are selected. 5 mice receive the treatment and 4 do not. Survival, in years, from
the time the experiment commenced are as follows:
Treatment       2.1   5.3   1.4   4.6   0.9
No treatment    1.9   0.5   2.8   3.1

At the 0.05 level of significance can the serum be said to be effective? Assume the two
distributions to be of equal variances.

Exercise 5: (Tutorial 7, No.2)


A new policy regarding overtime pay was implemented. This policy decreased the pay factor
for overtime work. Neither the staffing pattern nor the work loads changed. To determine if
overtime loads changed under the policy, a random sample of employees was selected. Their
overtime hours for a randomly selected week before and for another randomly selected week
after the policy change were recorded as follows:
Employees:
Before:
After:

1
5

2
4

2
7

3 4
5 6 7 8 9 10 11 12
8 10 4 9 3 6 0 1 5
5 3 7
4 4 1 2 3 2 2

Assume that the two population variances are equal and the underlying population is
normally distributed.
(i) Is there any evidence to support the claim that the average number of hours worked as
overtime per week changed after the policy went into effect. Use a P-value approach in
arriving at this conclusion.

(ii) Construct a 95% CI for the difference in mean before and after the policy change.
Interpret this interval.

Exercise 6:
The diameter of steel rods manufactured on two different extrusion machines is being
investigated. Two random samples of sizes $n_1$ = 15 and $n_2$ = 17 are selected, and
$\bar{x}_1 = 8.37$, $s_1^2 = 0.35$ and $\bar{x}_2 = 8.68$, $s_2^2 = 0.40$, respectively. Assume that the data are drawn
from normal distributions with equal variances.
(a) Is there evidence to support the claim that the two machines produce rods with different
mean diameters? Use the P-value approach.
(b) Construct a 95% CI on the difference in mean rod diameter.

Exercise 7: (Example 4)
The following data represent the running times of films produced by 2 motion-picture
companies. Test the hypothesis that the average running time of films produced by company
2 exceeds the average running time of films produced by company 1 by 10 minutes against
the one-sided alternative that the difference is less than 10 minutes. Use $\alpha$ = 0.01 and assume
the distributions of times to be approximately normal with unequal variances.

Company   Time (minutes)
X1        102    86    98   109    92
X2         81   165    97   134    92    87   114

Exercise 8:
Two companies manufacture a rubber material intended for use in an automotive application.
25 samples of material from each company are tested, and the amount of wear after 1000
cycles is observed. For company 1, the sample mean and standard deviation of wear are
$\bar{x}_1$ = 20.12 mg/1000 cycles and $s_1$ = 1.9 mg/1000 cycles,
and for company 2, we obtain $\bar{x}_2$ = 11.64 mg/1000 cycles and $s_2$ = 7.9 mg/1000 cycles.
(a) Do the sample data support the claim that the two companies produce material with
different mean wear? Assume each population is normally distributed but with unequal
variances.
(b) Construct a 95% CI for the difference in mean wear of these two companies. Interpret
this interval.

Exercise 9: (Tutorial 7, No.3)


Professor A claims that a probability and statistics student can increase his or her score on
tests if the person is provided with a pre-test the week before the exam. To test her theory she
selected 16 probability and statistics students at random and gave these students a pre-test the
week before an exam. She also selected an independent random sample of 12 students who
were given the same exam but did not have access to the pre-test. The first group had a mean
score of 79.4 with standard deviation 8.8. The second group had sample mean score 71.2
with standard deviation 7.9.
(i) Do the data support Professor A's claim that the mean score of students who get a pre-test
is different from the mean score of those who do not get a pre-test before an exam? Use
the P-value approach and assume that the variances are not equal.
(ii) Construct a 95% CI for the difference in mean score of students who get a pre-test and
those who do not get a pre-test before an exam. Interpret this interval.

Exercise 10: (Example 5)


A vote is to be taken among residents of a town and the surrounding county to determine
whether a proposed chemical plant should be constructed. If 120 of 200 town voters favour
the proposal and 240 of 500 county residents favour it, would you agree that the proportion
of town voters favouring the proposal is higher than the proportion of county voters? Use
$\alpha$ = 0.05.

Exercise 11: (Tutorial 7, No.4)


The rollover rate of sport utility vehicles is a transportation safety issue. Safety advocates
claim that manufacturer A's vehicle has a higher rollover rate than that of manufacturer
B. One hundred crashes for each of these vehicles were examined. The rollover rates were
$\hat{p}_A$ = 0.35 and $\hat{p}_B$ = 0.25.
(i) By using the P-value approach, does manufacturer A's vehicle have a higher rollover rate
than manufacturer B's?
(ii) Construct a 95% CI on the difference in the two rollover rates of the vehicles. Interpret
this interval.

Exercise 12:
Professor Rady gave 58 A's and B's to a class of 125 students in his section of English 101.
The next term Professor Hady gave 45 A's and B's to a class of 115 students in his section of
English 101.
(i) By using a 5% significance level, test the claim that Professor Rady gives a higher
percentage of A's and B's in English 101 than Professor Hady does. What is your comment?
(ii) Construct a 95% CI on the difference in the percentage of A's and B's in English 101
given by these two professors.

8

Simple Linear Regression

Exercise 1: (Example 1)
The manager of a car plant wishes to investigate how the plant's electricity usage depends
upon the plant production. The data are given below.

Production (RM million) (x)   4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2
Electricity usage (y)         2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53

(a) Estimate the linear regression equation $Y = \beta_0 + \beta_1 x$.
(b) Find an estimate for the electricity usage when x = 5.
(c) Find a 90% confidence interval for the electricity usage.
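
As an illustration of parts (a) and (b), the least-squares estimates can be computed directly from the centered sums $S_{xx}$ and $S_{xy}$. This is a minimal sketch, not a worked solution from the workbook; the variable names are chosen for this example, and the interval in part (c) is left out.

```python
# Exercise 1 data: production (x) and electricity usage (y)
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross-products
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx            # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate

def predict(x_new: float) -> float:
    """Fitted value on the estimated regression line."""
    return b0 + b1 * x_new

print(f"fitted line: y = {b0:.4f} + {b1:.4f} x")
print(f"estimate at x = 5: {predict(5.0):.3f}")
```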

Exercise 2:
An experiment was set up to investigate the variation of the specific heat of a certain
chemical with temperature. The data is given below

Temperature °F (x)    50     60     70     80     90     100
Heat (y)              1.60   1.63   1.67   1.70   1.71   1.71
                      1.64   1.65   1.67   1.72   1.72   1.74

(a) Estimate the linear regression equation $Y = \beta_0 + \beta_1 x$.
(b) Plot the results on a scatter diagram.
(c) Find an estimate for the specific heat when the temperature is 75°F.
(d) Find a 95% confidence interval for the specific heat.

Exercise 3: (Example 2)
An engineer at a semiconductor company wants to model the relationship between the device
HFE (y) and the parameter Emitter-RS ($x_1$). Data for Emitter-RS was first collected and
a statistical analysis is carried out and the output is displayed in the table given.

Regression Analysis: y = 1075.2 - 63.87 x1

Predictor   Coef     SE Coef   T       P-value
Constant    1075.2   121.1     8.88    0.000
x1          -63.87   8.002     -7.98   0.000

S = 19.4    R-Sq = 0.78

Analysis of variance
Source       DF   SS      MS      F
Regression   1    23965   23965   63.70
Residual     18   6772    376
Total        19   30737

(a) Estimate HFE when the Emitter-RS is 14.5.
(b) Obtain a 95% confidence interval for the true slope $\beta_1$.
(c) Test for significance of regression for $\alpha$ = 0.05.
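
The quantities asked for here can be read off the regression output. The sketch below shows the arithmetic only; the critical value `T_CRIT = 2.101` is the tabulated $t_{0.025, 18}$ and is an assumption brought in from a t table, not a number printed in the output.

```python
# Values copied from the regression output
b0, b1 = 1075.2, -63.87          # intercept and slope estimates
se_b1 = 8.002                    # standard error of the slope
ms_regression, ms_residual = 23965.0, 376.0
df_residual = 18

T_CRIT = 2.101                   # t_{0.025, 18}, from a t table

# (a) point estimate of HFE at Emitter-RS = 14.5
y_hat = b0 + b1 * 14.5

# (b) 95% confidence interval for the slope
slope_ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

# (c) significance of regression: the t statistic for the slope and the ANOVA F ratio
t_stat = b1 / se_b1
f_stat = ms_regression / ms_residual
significant = abs(t_stat) > T_CRIT

print(f"(a) y_hat = {y_hat:.3f}")
print(f"(b) slope CI = ({slope_ci[0]:.2f}, {slope_ci[1]:.2f})")
print(f"(c) t = {t_stat:.2f}, F = {f_stat:.2f}, significant: {significant}")
```

With a single predictor, the squared t statistic and the F ratio agree up to the rounding in the printed output.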

Exercise 4:
A chemical engineer wants to model the relationship between the purity of oxygen (y)
produced in a chemical distillation process and the percentage of hydrocarbons (x) that are
present in the main condenser of the distillation unit. A statistical analysis is carried out and
the output is displayed in the table given.

Regression Analysis: y = 74.3 + 14.9x

Predictor   Coef     SE Coef   T       P-value
Constant    74.283   1.593     46.62   0.000
x1          14.947   1.317     11.35   0.000

S = 1.087    R-Sq = 87.7%

Analysis of variance
Source       DF   SS       MS       F
Regression   1    152.13   152.13   128.86
Residual     18   21.25    1.18
Total        19   173.38

(a) Estimate the purity of oxygen when the percentage of hydrocarbons is 1%.
(b) Obtain a 95% confidence interval for the true slope $\beta_1$.
(c) Test for significance of regression for $\alpha$ = 0.05.

Exercise 5: (Tutorial 8, No.1)


Regression methods were used to analyze the data from a study investigating the relationship
between roadway surface temperature (x) and pavement deflection (y). The data follow.
Temperature x   Deflection y     Temperature x   Deflection y
70.0            0.621            72.7            0.637
77.0            0.657            67.8            0.627
72.1            0.640            76.6            0.652
72.8            0.623            73.4            0.630
78.3            0.661            70.5            0.627
74.5            0.641            72.1            0.631
74.0            0.637            71.2            0.641
72.4            0.630            73.0            0.631
75.2            0.644            72.7            0.634
76.0            0.639            71.4            0.638

(a) Estimate the intercept $\beta_0$ and slope $\beta_1$ regression coefficients. Write the estimated
regression line.
(b) Compute SSE and estimate the variance.
(c) Find the standard error of the slope and intercept coefficients.
(d) Show that
(e) Compute the coefficient of determination, R2. Comment on the value.
(f) Use a t-test to test for significance of the intercept and slope coefficients at
.
Give the P-values of each and comment on your results.
(g) Construct the ANOVA table and test for significance of regression using the P-value.
Comment on your results and their relationship to your results in part (f).

(h) Construct 95% CIs on the intercept and slope. Comment on the relationship
of these CIs and your findings in parts (f) and (g).

Exercise 6: (Tutorial 8, No.2)


The designers of a database information system that allows its users to search backwards for
several days wanted to develop a formula to predict the time it would take to search.
Actual elapsed time was measured for several different values of days. The measured data
are shown in the following table:

Number of Days   1      2      4      8      16     25
Elapsed Time     0.65   0.79   1.36   2.26   3.59   5.39

(i) Estimate the intercept $\beta_0$ and slope $\beta_1$ regression coefficients. Write the
estimated regression line.
(ii) Compute SSE and estimate the variance.
(iii) Find the standard error of the slope and intercept coefficients.
(iv) Show that
(v) Compute the coefficient of determination, R2. Comment on the value.
(vi) Use a t-test to test for significance of the intercept and slope coefficients at
. Give the P-values of each and comment on your results.
(vii) Construct the ANOVA table and test for significance of regression
using the P-value. Comment on your results and their relationship to your
results in part (vi).
(viii) Construct 95% CIs on the intercept and slope. Comment on the
relationship of these CIs and your findings in parts (vi) and (vii).

Multiple Linear Regressions

Exercise 1: (Example 1)

Given the data:

Test Number   y     x1   x2
1             1.6   1    1
2             2.1   1    2
3             2.4   2    1
4             2.8   2    2
5             3.6   2    3
6             3.8   3    2
7             4.3   2    4
8             4.9   4    2
9             5.7   4    3
10            5     3    4

(a) Fit a multiple linear regression model to these data.
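
One way to carry out such a fit is to form and solve the normal equations $(X'X)\hat{\beta} = X'y$. The sketch below does this for the data of this exercise with a small hand-written Gaussian elimination routine; `solve` is a helper written for this illustration, and in practice a numerical library would normally be used instead.

```python
# Exercise 1 data
y  = [1.6, 2.1, 2.4, 2.8, 3.6, 3.8, 4.3, 4.9, 5.7, 5.0]
x1 = [1, 1, 2, 2, 2, 3, 2, 4, 4, 3]
x2 = [1, 2, 1, 2, 3, 2, 4, 2, 3, 4]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Design matrix with an intercept column
design = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]
k = 3
xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]

b0, b1, b2 = solve(xtx, xty)
fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"y_hat = {b0:.4f} + {b1:.4f} x1 + {b2:.4f} x2")
```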

Exercise 2:
Given the data:
Observation Number   Pull Strength y   Wire Length x1   Die Height x2
1                    9.95              2                50
2                    24.45             8                110
3                    31.75             11               120
4                    35.00             10               550
5                    25.02             8                295
6                    16.86             4                200
7                    14.38             2                375
8                    9.60              2                52
9                    24.35             9                100
10                   27.50             8                300
11                   17.08             4                412
12                   37.00             11               400
13                   41.95             12               500
14                   11.66             2                360
15                   21.65             4                205
16                   17.89             4                400
17                   69.00             20               600
18                   10.30             1                585
19                   34.93             10               540
20                   46.59             15               250
21                   44.88             15               290
22                   54.12             16               510
23                   56.63             17               590
24                   22.13             6                100
25                   21.15             5                400

(a) Fit a multiple linear regression model to these data.

Exercise 3:
A study was performed to investigate the shear strength of soil (y) as it related to depth in
meter (x1) and percentage moisture content (x2). Ten observations were collected and the
following summary quantities obtained:

$n = 10$, $\sum x_{i1} = 223$, $\sum x_{i2} = 553$, $\sum y_i = 1{,}916$,
$\sum x_{i1}^2 = 5{,}200.9$, $\sum x_{i2}^2 = 31{,}729$, $\sum x_{i1} x_{i2} = 12{,}352$,
$\sum x_{i1} y_i = 43{,}550.8$, $\sum x_{i2} y_i = 104{,}736.8$, $\sum y_i^2 = 371{,}595.6$

(a) Estimate the parameters to fit the multiple regression models for these data.
(b) What is the predicted strength when x1=18meter and x2= 43%.

Exercise 4: (Example 2)
A set of experimental runs were made to determine a way of predicting cooking time y at
various levels of oven width x1, and temperature x2. The data were recorded as follows:

(a) Fit a multiple linear regression model to these data.
(b) Estimate $\sigma^2$ and the standard errors of the regression coefficients.
(c) Test for significance of $\beta_1$ and $\beta_2$.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75.
Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95%
CI.

Exercise 5: (Tutorial 9, No.1)


An article in Optical Engineering (Operating Curve Extraction of a Correlator's Filter, Vol.
43, 2004, pp. 27752779) reported the use of an optical correlator to perform an experiment
by varying brightness and contrast. The resulting modulation is characterized by the useful
range of gray levels. The data are shown

32

Probability and Statistics Work Book

Brightness (%):      54   61   65  100  100  100   50   57   54
Contrast (%):        56   80   70   50   65   80   25   35   26
Useful range (ng):   96   50   50  112   96   80  155  144  255

(a) Fit a multiple linear regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Test for significance of β1 and β2.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75.
Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95%
CI.
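
Parts (a) to (e) can be traced numerically. The sketch below is one possible route, assuming the nine runs pair up column by column as brightness 54, 61, 65, 100, 100, 100, 50, 57, 54; contrast 56, 80, 70, 50, 65, 80, 25, 35, 26; useful range 96, 50, 50, 112, 96, 80, 155, 144, 255. The helper names and the table value t(0.025, 6) = 2.447 are supplied here for illustration.

```python
import math

# Data as read from the exercise table (nine runs; the pairing is assumed).
brightness = [54, 61, 65, 100, 100, 100, 50, 57, 54]
contrast = [56, 80, 70, 50, 65, 80, 25, 35, 26]
useful = [96, 50, 50, 112, 96, 80, 155, 144, 255]
n, p = len(useful), 3  # p = number of fitted coefficients


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Centered sum of cross-products of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11 = cross(brightness, brightness)
s22 = cross(contrast, contrast)
s12 = cross(brightness, contrast)
s1y = cross(brightness, useful)
s2y = cross(contrast, useful)
syy = cross(useful, useful)

# (a) Least squares coefficients from the centered normal equations.
det = s11 * s22 - s12**2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(useful) - b1 * mean(brightness) - b2 * mean(contrast)

# (b) Error variance and standard errors of the two slopes.
sse = syy - b1 * s1y - b2 * s2y
sigma2 = sse / (n - p)
se_b1 = math.sqrt(sigma2 * s22 / det)
se_b2 = math.sqrt(sigma2 * s11 / det)

# (c) t ratios for H0: beta_j = 0, compared with t(0.025, 6) = 2.447.
t_crit = 2.447
t1, t2 = b1 / se_b1, b2 / se_b2

# (d), (e) Prediction and mean response at brightness = 80, contrast = 75.
d1, d2 = 80 - mean(brightness), 75 - mean(contrast)
y_hat = b0 + b1 * 80 + b2 * 75
leverage = 1 / n + (s22 * d1**2 - 2 * s12 * d1 * d2 + s11 * d2**2) / det
ci_half = t_crit * math.sqrt(sigma2 * leverage)        # mean response
pi_half = t_crit * math.sqrt(sigma2 * (1 + leverage))  # new observation

print(f"fit: {b0:.2f} + {b1:.4f} x1 + {b2:.4f} x2, sigma2 = {sigma2:.1f}")
print(f"t ratios: {t1:.2f}, {t2:.2f}")
print(f"y_hat = {y_hat:.2f}, CI half-width = {ci_half:.1f}, PI half-width = {pi_half:.1f}")
```

The two intervals differ only by the extra "1 +" under the square root, which is why the PI is always wider than the CI at the same point.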

Exercise 6: (Tutorial 9, No.2)


A study was performed on wear of a bearing y and its relationship to x1 = oil viscosity and
x2 = load. The following data were obtained:
y:    293    230    172     91    113    125
x1:   1.6   15.5   22.0   43.0   33.0   40.0
x2:   851    816   1058   1201   1357   1115

(a) Fit a multiple regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Use the model to predict wear when x1 = 25 and x2 = 1000.
(d) Fit a multiple regression model with an interaction term to these data.
(e) Estimate σ² and se(β̂j) for this new model. How did these quantities change? Does this
tell you anything about the value of adding the interaction term to the model?
(f) Use the model in (d) to predict wear when x1 = 25 and x2 = 1000. Compare this prediction
with the predicted value from part (c) above.
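
To compare the additive model with the interaction model, both can be fitted by the same routine with one extra column x1·x2. The sketch below is an illustration only. It assumes the six runs pair up as y = 293, 230, 172, 91, 113, 125 with x1 = 1.6, 15.5, 22.0, 43.0, 33.0, 40.0 and x2 = 851, 816, 1058, 1201, 1357, 1115. The helper `fit` is hypothetical and uses a modified Gram-Schmidt QR factorization rather than any particular library.

```python
import math

# Bearing wear data (assumed pairing of y, x1, x2 across the six runs).
y = [293, 230, 172, 91, 113, 125]
x1 = [1.6, 15.5, 22.0, 43.0, 33.0, 40.0]
x2 = [851, 816, 1058, 1201, 1357, 1115]


def fit(columns, y):
    """Least squares via modified Gram-Schmidt QR; returns (coefficients, SSE)."""
    k, n = len(columns), len(y)
    q = [list(map(float, c)) for c in columns]  # becomes the orthonormal columns
    r = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for i in range(j):
            r[i][j] = sum(a * b for a, b in zip(q[i], q[j]))
            q[j] = [b - r[i][j] * a for a, b in zip(q[i], q[j])]
        r[j][j] = math.sqrt(sum(b * b for b in q[j]))
        q[j] = [b / r[j][j] for b in q[j]]
    qty = [sum(a * b for a, b in zip(q[j], y)) for j in range(k)]
    beta = [0.0] * k
    for j in reversed(range(k)):  # back-substitution on R beta = Q'y
        tail = sum(r[j][m] * beta[m] for m in range(j + 1, k))
        beta[j] = (qty[j] - tail) / r[j][j]
    fitted = [sum(b * c[t] for b, c in zip(beta, columns)) for t in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sse


ones = [1.0] * len(y)
x12 = [a * b for a, b in zip(x1, x2)]

beta_add, sse_add = fit([ones, x1, x2], y)       # parts (a) to (c)
beta_int, sse_int = fit([ones, x1, x2, x12], y)  # parts (d) to (f)

sigma2_add = sse_add / (len(y) - 3)
sigma2_int = sse_int / (len(y) - 4)

pred_add = beta_add[0] + beta_add[1] * 25 + beta_add[2] * 1000
pred_int = (beta_int[0] + beta_int[1] * 25 + beta_int[2] * 1000
            + beta_int[3] * 25 * 1000)

print("additive:   ", beta_add, sigma2_add, pred_add)
print("interaction:", beta_int, sigma2_int, pred_int)
```

The interaction model can never have a larger SSE than the additive one, but it also has one fewer error degree of freedom, so the comparison in part (e) should be made on σ² and the standard errors rather than on SSE alone.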

10

Factorial Experiments
2² Factorial design

Exercise 1: (Example 1)
An engineer is investigating the thickness of an epitaxial layer, which is subject to two
levels of A, deposition time (+ for short time and - for long time), and two levels of B,
arsenic flow rate (- for 55% and + for 59%). The engineer conducts a 2² factorial design with
n = 4 replicates. The data are as follows:

                        Arsenic Level
Deposition Time     B- (Low, 55%)     B+ (High, 59%)
A- (Long)           14.037            13.880
                    14.165            13.860
                    13.972            14.032
                    13.907            13.914
A+ (Short)          14.821            14.888
                    14.757            14.921
                    14.843            14.415
                    14.878            14.932

a) Construct the 2 x 2 factorial design table.
b) Find the estimates of all effects and the interaction.
c) Construct the ANOVA table and, for each effect, test the null hypothesis that the effect
is equal to 0.
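
Parts b) and c) follow directly from the treatment totals. The sketch below is a minimal illustration, assuming the standard 2² labels map to the table as (1) = A-, B-; a = A+, B-; b = A-, B+; ab = A+, B+. The dictionary names are hypothetical.

```python
# Replicates per treatment combination, keyed by standard 2^2 labels.
# Assumed layout: "(1)" = A-, B-; "a" = A+, B-; "b" = A-, B+; "ab" = A+, B+.
cells = {
    "(1)": [14.037, 14.165, 13.972, 13.907],
    "a": [14.821, 14.757, 14.843, 14.878],
    "b": [13.880, 13.860, 14.032, 13.914],
    "ab": [14.888, 14.921, 14.415, 14.932],
}
n = 4
tot = {k: sum(v) for k, v in cells.items()}

# Contrasts built from the treatment totals.
contrasts = {
    "A": tot["a"] + tot["ab"] - tot["(1)"] - tot["b"],
    "B": tot["b"] + tot["ab"] - tot["(1)"] - tot["a"],
    "AB": tot["ab"] + tot["(1)"] - tot["a"] - tot["b"],
}
effects = {k: c / (2 * n) for k, c in contrasts.items()}
ss = {k: c**2 / (4 * n) for k, c in contrasts.items()}

# Error sum of squares: pooled variation within each treatment combination.
sse = sum(sum((y - tot[k] / n) ** 2 for y in v) for k, v in cells.items())
mse = sse / (4 * (n - 1))

# Each F ratio has 1 and 12 degrees of freedom.
f_ratio = {k: ss[k] / mse for k in ss}

for k in effects:
    print(f"{k:>2}: effect = {effects[k]: .4f}, SS = {ss[k]:.4f}, F = {f_ratio[k]:.2f}")
print(f"SSE = {sse:.4f}, MSE = {mse:.5f}")
```

Each F ratio is then compared with the tabulated F critical value for 1 and 12 degrees of freedom at the chosen significance level.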
Exercise 2: (Tutorial No1)


A two-factor experimental design was conducted to investigate the lifetime of a component
being manufactured. The two factors are A (design) and B (cost of material). Two levels ((+)
and (-)) of each factor are considered. Three components are manufactured with each
combination of design and material, and the total lifetime measured (in hours) is shown in
the table below.

Treatment      Design   Material   AB   Total lifetime of 3
Combination    (A)      (B)             components (in hours)
(1)            -        -          +    122
a              +        -          -    60
b              -        +          -    120
ab             +        +          +    118

(a) Perform a two-way analysis of variance to estimate the effects of design and material
expense on the component lifetime.
(b) Based on your results in part (a), what conclusions can you draw from the factorial
experiment?
(c) Indicate which effects are significant to the lifetime of a component.
(d) Write the least squares fitted model using only the significant sources.

Exercise 3:


An engineer suspects that the surface finish of metal parts is influenced by the type of paint
used and the drying time. He selected three drying times (20, 25, and 30 minutes) and used
two types of paint. Three parts are tested with each combination of paint type and drying
time. The data are as follows:
             Drying Time (min)
Paint        20             25             30
ICI          74, 64, 50     73, 61, 44     78, 85, 92
NIPPON       92, 86, 68     98, 73, 88     66, 45, 85

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on
your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in
part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual
analysis. State your final conclusions about the adequacy of the model. Compare your
results to part (c) and comment.

Exercise 4: (Tutorial 10, No.2)


An experiment involves a storage battery used in the launching mechanism of a shoulder-fired
ground-to-air missile. Two material types can be used to make the battery plates. The
objective is to design a battery that is relatively unaffected by the ambient temperature. The
output response from the battery is effective life in hours. Two temperature levels are
selected, and a factorial experiment with four replicates is run. The data are as follows:
             Temperature (°F)
Material     Low                    High
1            130, 155, 74, 180      20, 70, 82, 58
2            138, 110, 168, 160     96, 104, 82, 60

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on
your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in
part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual
analysis. State your final conclusions about the adequacy of the model. Compare your
results to part (c) and comment.
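
Parts (a), (c) and (d) use the same three ingredients: the effect estimates, one common standard error, and a t ratio. The sketch below is one way to compute them. It assumes the coding A = material (1 is -, 2 is +) and B = temperature (Low is -, High is +), and it assumes the four replicates per cell are 130, 155, 74, 180 (material 1, low), 20, 70, 82, 58 (material 1, high), 138, 110, 168, 160 (material 2, low) and 96, 104, 82, 60 (material 2, high). The critical value t(0.025, 12) = 2.179 is taken from a t table.

```python
import math

# Battery life (hours). Assumed coding: A = material (1 -> -, 2 -> +),
# B = temperature (Low -> -, High -> +).
cells = {
    "(1)": [130, 155, 74, 180],  # material 1, low temperature
    "a": [138, 110, 168, 160],   # material 2, low temperature
    "b": [20, 70, 82, 58],       # material 1, high temperature
    "ab": [96, 104, 82, 60],     # material 2, high temperature
}
n, k = 4, 2
mean = {c: sum(v) / n for c, v in cells.items()}

effects = {
    "A": (mean["a"] + mean["ab"] - mean["(1)"] - mean["b"]) / 2,
    "B": (mean["b"] + mean["ab"] - mean["(1)"] - mean["a"]) / 2,
    "AB": (mean["ab"] + mean["(1)"] - mean["a"] - mean["b"]) / 2,
}

# Pooled within-cell variance with 2^k (n - 1) = 12 degrees of freedom.
sse = sum(sum((y - mean[c]) ** 2 for y in v) for c, v in cells.items())
sigma2 = sse / (2**k * (n - 1))

# Every effect is a difference of two averages of n * 2^(k-1) observations,
# so all effects share the same standard error.
se = math.sqrt(sigma2 / (n * 2 ** (k - 2)))
t_ratio = {e: v / se for e, v in effects.items()}

# Approximate 95% CI: effect plus or minus two standard errors.
ci = {e: (v - 2 * se, v + 2 * se) for e, v in effects.items()}

t_crit = 2.179  # t(0.025, 12) from a t table
significant = [e for e, t in t_ratio.items() if abs(t) > t_crit]

print(effects)
print(f"se = {se:.3f}", t_ratio)
print(ci, significant)
```

An effect whose approximate interval excludes zero should also show a t ratio beyond the critical value, which is the comparison part (d) asks for.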

Exercise 5:


An article in the IEEE Transactions on Semiconductor Manufacturing (Vol. 5, 1992, pp. 214-222)
describes an experiment to investigate the surface charge on a silicon wafer. The factors
thought to influence induced surface charge are cleaning method (spin rinse dry, or SRD, and
spin dry, or SD) and the position on the wafer where the charge was measured. The surface
charge (×10^11 q/cm^3) response data are shown below.

Cleaning     Test Position
Method       L                        R
SD           1.66, 1.90, 1.92         1.84, 1.84, 1.62
SRD          -4.21, -1.35, -2.08      -7.58, -2.20, -5.36

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on
your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in
part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual
analysis. State your final conclusions about the adequacy of the model. Compare your
results to part (c) and comment.
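
The interaction plot in part (b) is drawn from the four cell means, and the same means show numerically whether the two lines are parallel. The sketch below is a small illustration, assuming three replicates per cell as SD/L = 1.66, 1.90, 1.92; SD/R = 1.84, 1.84, 1.62; SRD/L = -4.21, -1.35, -2.08; SRD/R = -7.58, -2.20, -5.36. It assumes the coding SD is -, SRD is +, L is -, R is +. The names are hypothetical.

```python
# Surface charge (x 10^11 q/cm^3), three replicates per cell.
charge = {
    ("SD", "L"): [1.66, 1.90, 1.92],
    ("SD", "R"): [1.84, 1.84, 1.62],
    ("SRD", "L"): [-4.21, -1.35, -2.08],
    ("SRD", "R"): [-7.58, -2.20, -5.36],
}
cell_mean = {key: sum(v) / len(v) for key, v in charge.items()}

# One line per cleaning method: change in mean charge from position L to R.
slope = {m: cell_mean[(m, "R")] - cell_mean[(m, "L")] for m in ("SD", "SRD")}

# Interaction effect = half the difference between the two slopes
# (assumed coding: SD -> -, SRD -> +, L -> -, R -> +).
interaction = (slope["SRD"] - slope["SD"]) / 2

for key, m in cell_mean.items():
    print(key, round(m, 4))
print("slopes:", slope, "interaction:", round(interaction, 4))
```

Parallel lines would give equal slopes and an interaction estimate of zero. Whether the observed difference is large relative to the noise is what the t ratio in part (c) decides.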
