Homework # 3
Fall 2016
Submitted by:
Sandarva Murti Sharma
Graduate Student
Civil Engineering Department
Nov 28, 2016
Problem 1: The production manager of a large investment casting firm is studying
different methods to increase productivity in the workforce of the company. The
process engineer and personnel in the human resource department develop three new
incentive plans (B, C, D) for which they will design a study to compare the incentive
plans with the current plan (plan A). Twenty workers are randomly assigned to each
of the four plans. The response variable is the total number of units produced by each
worker during one month on the incentive plans. Use the information given above to
fill in the following ANOVA table.
Source   SS        DF   MS   F   p value
Plan     236758    ?    ?    ?   ?
Error    2972125   ?    ?
Total    ?         ?
Answer:
Source   SS        DF   MS         F      p value
Plan     236758    3    78919.33   2.02   0.12
Error    2972125   76   39106.91
Total    3208883   79
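The entries in the table follow directly from the two sums of squares and the sample sizes stated in the problem. A minimal Python sketch of that arithmetic is given here as a check; the variable names are illustrative and not part of the assignment.

```python
# Fill in the one-way ANOVA table from the quantities given in Problem 1.
t = 4             # incentive plans A, B, C, D
n_per_plan = 20   # workers randomly assigned to each plan
N = t * n_per_plan

ss_plan = 236758
ss_error = 2972125

df_plan = t - 1        # between-plan degrees of freedom
df_error = N - t       # within-plan (error) degrees of freedom
df_total = N - 1

ss_total = ss_plan + ss_error
ms_plan = ss_plan / df_plan
ms_error = ss_error / df_error
f_stat = ms_plan / ms_error

print(f"Plan   SS={ss_plan}  DF={df_plan}  MS={ms_plan:.2f}  F={f_stat:.2f}")
print(f"Error  SS={ss_error}  DF={df_error}  MS={ms_error:.2f}")
print(f"Total  SS={ss_total}  DF={df_total}")
```

The p value is the upper-tail area of the F distribution with (3, 76) degrees of freedom. The Python standard library has no F distribution, so that step is left to R, for example pf(2.02, 3, 76, lower.tail = FALSE).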
1. (3 points) Identify the design of this experiment and briefly explain your reasoning.
Answer:
The experiment is a completely randomized design with two factors (a two-factor
factorial arrangement), because there are two factors of interest and no blocking factor.
2. (3 points) Produce a profile plot with appropriate labels and a caption. Use the R code
provided with this assignment and fill in R commands where necessary. Briefly describe
the features of the profile plot.
Answer:
The plot shows three severity levels and nine medication levels. The profiles for the
severity levels are not parallel across medications, which suggests some interaction
between the two factors. Mean temperature also differs across the nine medication levels.
3. (15 points) Construct the ANOVA table with numbers filled in (can use anova() in R to
generate this table, or calculate by hand). Include in the table the p values for relevant F
statistics.
Answer:
4. (3 points) Write down the ANOVA model, using $\alpha$ and $\beta$ to represent the two
variables, respectively.
Answer:
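$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
$\mu$: overall mean (reference)
$\alpha_i$: additional effect of the ith level of the factor of interest (severity)
$\beta_j$: additional effect of the jth level of the factor of interest (medication)
$(\alpha\beta)_{ij}$: interaction effect of the ith severity level with the jth medication level
$\varepsilon_{ijk}$: random error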
5. (6 points) Use the ANOVA table to answer the following hypothesis testing questions:
(a) Does the effect of the drug depend on the severity of the patient's high blood pressure
disorder at significance level of $\alpha$ = 0.05? Write down the null hypothesis using math
symbols from part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = (\alpha\beta)_{13} = \cdots = (\alpha\beta)_{19} = (\alpha\beta)_{21} = \cdots = (\alpha\beta)_{39} = 0$
The p value for the interaction is 0.22, which is greater than 0.05, so we fail to reject the
null hypothesis. At significance level of $\alpha$ = 0.05 there is no significant evidence that the
effect of the drug depends on the severity of the patient's high blood pressure disorder.
The profile plot does not clearly support this conclusion: the profiles appear to interact,
but the test does not find that interaction statistically significant.
(b) Do different medications have different effects on the average oral body temperature at
significance level of $\alpha$ = 0.05? Write down the null hypothesis using math symbols from
part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_9 = 0$
The p value for the medication main effect is less than 0.05. Therefore, we reject the null
hypothesis and conclude that the medications have different effects on the average oral body
temperature at significance level of $\alpha$ = 0.05. Yes, the profile plot supports the
conclusion.
(c) Does the severity of high blood pressure have an impact on the average oral body
temperature at significance level of $\alpha$ = 0.05? Write down the null hypothesis using math
symbols from part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
The p value for severity is 0.0009, which is less than 0.05. Hence, we reject the null
hypothesis and conclude that the severity of high blood pressure has an impact on the
average oral body temperature at significance level of $\alpha$ = 0.05.
6. (10 points; 2 points per question) Use a general linear model to estimate the effect sizes
of different medications and severity levels. Break down your answer into the following
parts:
(a) Write down the linear model before estimating the coefficients. Use $\beta$'s to represent the
coefficients. Use dummy variables to represent each level. Specify the baseline.
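$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \beta_8 X_8 + \beta_9 X_9 + \beta_{10} X_{10} + \beta_{11} X_1 X_3 + \beta_{12} X_2 X_3$
$\quad + \beta_{13} X_1 X_4 + \beta_{14} X_2 X_4 + \beta_{15} X_1 X_5 + \beta_{16} X_2 X_5 + \beta_{17} X_1 X_6 + \beta_{18} X_2 X_6 + \beta_{19} X_1 X_7 + \beta_{20} X_2 X_7 + \beta_{21} X_1 X_8 + \cdots$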
Residuals:
  Min     1Q  Median     3Q    Max
-0.36  -0.09    0.00   0.10   0.64
Coefficients:
                        Estimate  Std. Error   t value  Pr(>|t|)
(Intercept)             97.48000     0.07008  1390.995   < 2e-16  ***
Severity2               -0.02000     0.09911    -0.202  0.840451
Severity3                0.18000     0.09911     1.816  0.072111  .
MedicationB              0.44000     0.09911     4.440  2.18e-05  ***
MedicationC              0.44000     0.09911     4.440  2.18e-05  ***
MedicationD             -0.04000     0.09911    -0.404  0.687302
MedicationE              0.32000     0.09911     3.229  0.001647  **
MedicationF              0.32000     0.09911     3.229  0.001647  **
MedicationG             -0.04000     0.09911    -0.404  0.687302
MedicationH              0.46000     0.09911     4.641  9.78e-06  ***
MedicationI              0.36000     0.09911     3.632  0.000431  ***
Severity2:MedicationB   -0.20000     0.14016    -1.427  0.156478
Severity3:MedicationB   -0.32000     0.14016    -2.283  0.024380  *
Severity2:MedicationC   -0.06000     0.14016    -0.428  0.669441
Severity3:MedicationC   -0.04000     0.14016    -0.285  0.775891
Severity2:MedicationD    0.12000     0.14016     0.856  0.393798
Severity3:MedicationD    0.16000     0.14016     1.142  0.256160
Severity2:MedicationE   -0.02000     0.14016    -0.143  0.886797
Severity3:MedicationE   -0.08000     0.14016    -0.571  0.569333
Severity2:MedicationF    0.06000     0.14016     0.428  0.669441
Severity3:MedicationF   -0.10000     0.14016    -0.713  0.477089
Severity2:MedicationG    0.08000     0.14016     0.571  0.569333
Severity3:MedicationG   -0.06000     0.14016    -0.428  0.669441
Severity2:MedicationH   -0.02000     0.14016    -0.143  0.886797
Severity3:MedicationH   -0.26000     0.14016    -1.855  0.066318  .
Severity2:MedicationI    0.06000     0.14016     0.428  0.669441
Severity3:MedicationI   -0.02000     0.14016    -0.143  0.886797
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
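Under the dummy (treatment) coding in this output, a fitted cell mean is the intercept plus whichever main-effect and interaction coefficients are switched on for that cell. The Python sketch below illustrates this with a handful of the estimates printed above. It assumes the omitted baseline levels are severity 1 and medication A, which is inferred from the level names in the output rather than stated in it, and the function name is hypothetical.

```python
# A few coefficients copied from the output above (treatment coding).
# Assumption: the baseline cell is severity 1 with medication A.
# Only medications B and H are included here to keep the sketch short.
coef = {
    "(Intercept)": 97.48,
    "Severity2": -0.02,
    "Severity3": 0.18,
    "MedicationB": 0.44,
    "MedicationH": 0.46,
    "Severity2:MedicationB": -0.20,
    "Severity3:MedicationB": -0.32,
    "Severity2:MedicationH": -0.02,
    "Severity3:MedicationH": -0.26,
}


def fitted_cell_mean(severity, medication):
    """Fitted mean temperature for one severity/medication cell."""
    total = coef["(Intercept)"]
    if severity != 1:
        total += coef[f"Severity{severity}"]
    if medication != "A":
        total += coef[f"Medication{medication}"]
    if severity != 1 and medication != "A":
        total += coef[f"Severity{severity}:Medication{medication}"]
    return round(total, 2)


for sev, med in [(1, "A"), (1, "B"), (3, "B"), (3, "H")]:
    print(f"severity {sev}, medication {med}: {fitted_cell_mean(sev, med)}")
```

For example, severity 3 with medication B gives 97.48 + 0.18 + 0.44 - 0.32 = 97.78, so the negative interaction estimate offsets part of the two main effects.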
(e) Generate appropriate plots to check the normality and constant variance assumptions.
The normal Q-Q plot suggests that the residuals are approximately normally distributed, with data point 45 standing out as an outlier.
In the residuals vs fitted plot the residuals scatter roughly evenly around the zero line, which is consistent with constant variance.
Problem 3: A quality control engineer is considering implementing a workshop to
instruct workers on the principles of total quality management (TQM). The program
would be quite expensive to implement across the whole corporation; hence the
engineer has designed a study to evaluate which of four types of workshops would be
most effective. The response variable will be the increase in productivity of the
worker after participating in the workshop. Since the effectiveness of the workshop
may depend on the worker's preconceived attitude concerning TQM, the workers are
given an examination to determine their attitude prior to taking the workshop. Their
attitudes are classified into five groups. There are four workers in each group, and
the type of workshop is randomly assigned to the workers within each group. The
increases in productivity are given in workshop.TXT. No workshop-attitude
combination is expected to have extra effect on productivity.
(1) (3 points) Identify the design of this experiment and briefly explain your
reasoning.
The design of this experiment is a randomized block design, because the four
workshops are the factor of interest and attitude, with five levels, is the blocking factor.
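The randomization described in the problem, with workshops assigned at random to the four workers inside each attitude group, can be sketched in Python as follows. The function name, the seed and the 1 to 4 workshop labels are illustrative assumptions, not part of the study.

```python
import random


def assign_within_blocks(n_blocks=5, treatments=(1, 2, 3, 4), seed=2016):
    """Randomly order the treatments separately inside every block."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)     # independent permutation for this block
        layout[block] = order
    return layout


layout = assign_within_blocks()
for block, order in layout.items():
    print(f"attitude group {block}: workshops for workers 1-4 -> {order}")
```

Every workshop appears exactly once in every attitude group, which is what separates this design from a completely randomized one.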
(2) (3 points) Write down the ANOVA model, using $\alpha$ and $\beta$ to represent the two
variables.
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
$\mu$: overall mean (reference)
$\alpha_i$: additional effect of the ith level of the factor of interest (workshops)
$\beta_j$: additional effect of the jth level of the block (attitude)
$\varepsilon_{ij}$: random error
(3) (13 points) Construct the ANOVA table with appropriate numbers (Can use
anova() in R to generate this table). Include in the table the p values for
relevant F statistics.
Response: Productivity
            Df   Sum Sq  Mean Sq  F value     Pr(>F)
Attitude     1  2160.90  2160.90  47.6459  5.039e-06  ***
Workshop     3   922.55   307.52   6.7805   0.004142  **
Residuals   15   680.30    45.35
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(4) (6 points) Generate a profile plot with appropriate labels and a caption.
Modify the R code for Question 2. Briefly describe the features in this profile
plot.
The profile plot shows no apparent interaction among the attitude groups, except between
attitude 1 and attitude 2. There are five attitude levels and four workshops. Productivity
appears to increase with the workshop number.
(5) (10 points; 2 points per question) Use a general linear model to estimate the
effect sizes of different workshops and attitudes. Break down your answer into
the following parts:
(a) Write down the linear model before estimating the coefficients. Use $\beta$'s to
represent the coefficients. Use dummy variables to represent each level.
Specify the baseline.
The linear model is
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7$
The baseline is Workshop 1.
(b) Report the coefficient estimates and their p values.
(d) Report the fitted linear model. You may either retain only the significant
predictors, or include all the predictors and specify which of them have a
significant coefficient.
$Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_6 X_6 + \beta_7 X_7$
$\hat{Y} = \hat{\beta}_0 + 5.0 X_2 + 10.25 X_3 + 32.25 X_4 + 7.8 X_6 + 18.2 X_7$
The fitted linear model above retains only the significant predictors.
(e) Generate appropriate plots to check the normality and constant variance
assumptions.
Problem 4: Exercise 16.18 (p 1037) A study was designed to evaluate whether socioeconomic
factors had an effect on verbalization skills of young children. Four socioeconomic
classes were defined and 20 children under the age of six were selected for the study. The
research hypothesis was that the mean verbalization skills would be different for the four
classes. The researchers determined that for young children there may be significant
gains in verbalization skills over only a few months. Thus, they decided to record the
exact age (in months) of each child. The verbalization skills (measured by testing) were
determined for each child. The data set is provided in verbal.TXT.
(2) (2 points) Generate a scatterplot of the data. Use different colors to distinguish
different socioeconomic classes (can use the R code provided with this assignment).
Provide an informative figure caption. Does the plot suggest a linear relationship in
each socioeconomic class?
Yes, the plot suggests a linear relationship in each socioeconomic class, although
class 3 shows some deviations from a straight line.
(3) (3 points) Write down the general linear model for the data (NOT the fitted model).
Use $\beta$'s to represent the coefficients. Use dummy variables to represent each level.
Specify the baseline. Define the math symbols.
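$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_1 X_2 + \beta_6 X_1 X_3 + \beta_7 X_1 X_4 + \varepsilon$
Y: verbalization score
X1: age (in months)
X2: 1 if class 2, otherwise 0
X3: 1 if class 3, otherwise 0
X4: 1 if class 4, otherwise 0
The baseline is class 1 (X2 = X3 = X4 = 0).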
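To make the dummy coding concrete, the Python sketch below builds the row of predictors for one child. It assumes the separate-slopes form implied by the hypotheses tested in part (4), where the coefficients 5 to 7 multiply age-by-class products, and it takes class 1 as the baseline. The function name is hypothetical.

```python
def design_row(age_months, ses_class):
    """Return [1, X1, X2, X3, X4, X1*X2, X1*X3, X1*X4] for one child.

    Assumes class 1 is the baseline, so its three dummies are all zero.
    """
    if ses_class not in (1, 2, 3, 4):
        raise ValueError("ses_class must be 1, 2, 3 or 4")
    x1 = age_months
    dummies = [1 if ses_class == k else 0 for k in (2, 3, 4)]  # X2, X3, X4
    interactions = [x1 * d for d in dummies]                   # age-by-class terms
    return [1, x1] + dummies + interactions


print(design_row(40, 1))  # baseline class: only intercept and age are active
print(design_row(40, 3))  # class 3: X3 and the X1*X3 product are switched on
```

A child in class 1 contributes only to the intercept and the age slope, while a child in class 3 also picks up the class 3 intercept shift and slope shift.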
(4) Use the computer output on p1038-1039 to answer the following questions. Note that
there are a few typos in the computer output:
on p1039 the first time Model III appears. This should be Model II.
also on p1039, X2 (CD) appears twice. It should be X2 (C1).
(a) (2 points) Test whether the lines across the socioeconomic classes are parallel at
significance level of $\alpha$ = 0.05. Describe the null and alternative hypotheses. Provide the
observed test statistic, and p value.
$H_0: \beta_5 = \beta_6 = \beta_7 = 0$
$H_a$: at least one of them is non-zero
F test: $F = \dfrac{(SSE_2 - SSE_1)/(t-1)}{SSE_1/(N-2t)} = \dfrac{(3316.83 - 3180.73)/(4-1)}{3180.73/(80 - 2 \cdot 4)} = 1.0269$
p-value from R: 0.386
$\alpha$ = 0.05
Since the p-value is greater than the significance level of 0.05, we fail to reject the null
hypothesis and treat the lines across the socioeconomic classes as parallel at significance
level of $\alpha$ = 0.05.
(b) (2 points) Are there significant differences in the mean verbalization scores for the four
groups at significance level of $\alpha$ = 0.05? Describe the null and alternative hypotheses.
Provide the observed test statistic, and p value.
$H_0: \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: at least one of them is non-zero
F test: $F = \dfrac{(SSE_3 - SSE_2)/(t-1)}{SSE_2/(N-t-1)} = \dfrac{(8724.79 - 3316.83)/(4-1)}{3316.83/(80-4-1)} = 40.762$
p-value from R: 9.817987e-16
$\alpha$ = 0.05
Since the p-value is less than the significance level of 0.05, we reject the null hypothesis. We
conclude that there are significant differences in the mean verbalization scores for the four groups
at significance level of $\alpha$ = 0.05.
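Both tests in parts (a) and (b) compare a reduced model against a fuller one through the change in error sum of squares. The Python sketch below reproduces the two F statistics from the SSE values quoted in the answers. The helper name is hypothetical, and the descriptions of the three models in the comments are an interpretation of the degrees of freedom used in the formulas, not a quotation from the textbook output.

```python
def partial_f(sse_reduced, sse_full, df_num, df_den):
    """F statistic for comparing a reduced model with a fuller nested model."""
    return ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)


N, t = 80, 4      # observations and socioeconomic classes

sse1 = 3180.73    # Model I: assumed to allow a separate slope per class
sse2 = 3316.83    # Model II: assumed common slope, separate intercepts
sse3 = 8724.79    # Model III: assumed single line for all classes

f_parallel = partial_f(sse2, sse1, t - 1, N - 2 * t)  # part (a)
f_classes = partial_f(sse3, sse2, t - 1, N - t - 1)   # part (b)

print(f"parallel lines:    F = {f_parallel:.4f} on ({t - 1}, {N - 2 * t}) df")
print(f"class differences: F = {f_classes:.3f} on ({t - 1}, {N - t - 1}) df")
```

The p-values quoted in the answers come from the upper tail of the corresponding F distributions, which is computed in R rather than in this sketch.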
(c) (3 points) What is the fitted linear model for each socioeconomic class? You may either
retain only the significant predictors, or include all the predictors and specify which of
them have a significant coefficient. Use $\alpha$ = 0.05.
(d) (2 points) Estimate the mean verbalization score in each socioeconomic class.
The mean verbalization scores (from Model II) in each socioeconomic class are: