Multivariate Data
Analysis
Selecting a Multivariate Technique
Dependency
Dependent (criterion) variables and
independent (predictor) variables are
present
Interdependency
Variables are interrelated without
designating some dependent and others
independent
Dependency Techniques
Multiple regression (Univariate and
multivariate)
Conjoint analysis
Discriminant analysis
Multivariate analysis of variance
(MANOVA)
Linear structural relationships (LISREL)
Interdependency Techniques
Factor analysis
Cluster analysis
Multidimensional Scaling (MDS)
Multiple Regression Model
The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_p$ and an error term is called the multiple regression model.
The multiple regression model is:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
$\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters.
$\varepsilon$ is a random variable called the error term.
In simple linear regression (SLR), the conditional mean of $y$ depends on a single independent variable $x$. The multiple regression model extends this idea to include more than one independent variable.
Multiple Regression Equation
The equation that describes how the mean value of $y$ is related to $x_1, x_2, \ldots, x_p$ is called the multiple regression equation.
The multiple regression equation is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
The estimated multiple regression equation is:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$
Least Squares Regression Line
[Figure: scatter of Y against X with the fitted regression line and a horizontal line at Y = average. For a single point, the total deviation from the average is split into the deviation explained by regression and the deviation not explained by regression.]
Regression Analysis Terms
Explained variance = $R^2$ (coefficient of determination).
Unexplained variance = residuals (error).
[Figure: diagram relating Y to X1, X2 and X3.]
Estimation Process
1. Multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
   Multiple regression equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
   The unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
2. Sample data: observed values of $x_1, x_2, \ldots, x_p$ and $y$.
3. Estimated multiple regression equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$
   $b_0, b_1, b_2, \ldots, b_p$ are sample statistics.
4. $b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
Least Squares Method
Least Squares Criterion
$\min \sum_i (y_i - \hat{y}_i)^2$
Computation of Coefficient Values
The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
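The matrix algebra behind the least squares criterion can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the six observations are hypothetical and are not taken from the slides. It solves the normal equations $(X'X)b = X'y$ and checks the result against NumPy's own least squares routine.

```python
import numpy as np

# Hypothetical data (not from the slides): 6 observations, p = 2 predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations: (X'X) b = X'y.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Library routine that minimises the same sum of squared errors.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Value of the least squares criterion at the solution.
y_hat = X @ b_normal
sse = np.sum((y - y_hat) ** 2)

print("b (normal equations):", b_normal)
print("b (lstsq):           ", b_lstsq)
print("SSE at the minimum:  ", sse)
```

Both routes give the same coefficients. Moving the coefficients away from the solution can only increase the sum of squared errors, which is what the criterion requires.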
A Note on Interpretation of Coefficients
$b_i$ represents an estimate of the change in $y$ corresponding to a one-unit change in $x_i$ when all other independent variables are held constant.
Multiple Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$R^2 = \mathrm{SSR}/\mathrm{SST}$
If $R^2$ is significantly greater than zero (as judged by the F test), then we reject the null hypothesis of no relationship.
Adjusted Multiple Coefficient of Determination
$R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
where $n$ is the number of observations and $p$ is the number of independent variables.
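The sums of squares and both versions of $R^2$ can be computed directly from a fitted model. The sketch below assumes NumPy; the helper name `fit_summary` and the small data set are hypothetical, chosen only to illustrate the formulas above.

```python
import numpy as np

def fit_summary(X, y):
    """Return SST, SSR, SSE, R^2 and adjusted R^2 for an OLS fit with intercept.

    X holds the predictors only (n rows, p columns); the intercept column
    is added inside the function.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ b
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
    sse = np.sum((y - y_hat) ** 2)         # sum of squares due to error
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return sst, ssr, sse, r2, adj_r2

# Hypothetical data (not from the slides).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

sst, ssr, sse, r2, adj_r2 = fit_summary(X, y)
print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}")
print(f"R^2={r2:.4f}  adjusted R^2={adj_r2:.4f}")
```

The adjusted value is never larger than $R^2$, because the factor $(n - 1)/(n - p - 1)$ exceeds 1 whenever at least one independent variable is in the model.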
Model Assumptions
Assumptions About the Error Term $\varepsilon$
1. The error $\varepsilon$ is a random variable with a mean of zero. Implication: for given values of the independent variables, the expected or average value of $y$ is given by $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables. Implication: the variance of $y$ equals $\sigma^2$ and is the same for all values of $x_1, x_2, \ldots, x_p$.
3. The values of $\varepsilon$ are independent. Implication: the size of the error for a particular set of values of the independent variables is not related to the size of the error for any other set of values.
4. The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$. Implication: because $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are constants, $y$ is also a normally distributed random variable for given values of $x_1, x_2, \ldots, x_p$.
Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding monthly salary (Rs. 1000s) for the sample of 20 programmers are shown below.

Exper.   Score   Salary      Exper.   Score   Salary
   4       78     24            9       88     38
   7      100     43            2       73     26.6
   1       86     23.7         10       75     36.2
   5       82     34.3          5       81     31.6
   8       86     35.8          6       74     29
  10       84     38            8       87     34
   0       75     22.2          4       79     30.1
   1       80     23.1          6       94     33.9
   6       83     30            3       70     28.2
   6       91     33            3       89     30
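As a rough check of what a software package would report for this example, the estimated regression equation can be fitted with NumPy. The variable names `exper`, `score` and `salary` are chosen here for illustration; the numbers are the 20 observations from the table, read left block first and then right block.

```python
import numpy as np

# Programmer salary data from the table (salary in Rs. 1000s).
exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

# Design matrix: intercept, years of experience, aptitude score.
X = np.column_stack([np.ones_like(exper), exper, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)

y_hat = X @ b
sst = np.sum((salary - salary.mean()) ** 2)
sse = np.sum((salary - y_hat) ** 2)
r2 = 1 - sse / sst

print(f"salary_hat = {b[0]:.3f} + {b[1]:.3f}*exper + {b[2]:.3f}*score")
print(f"R^2 = {r2:.3f}")
```

With these data the fit comes out at roughly salary = 3.174 + 1.404 exper + 0.251 score with $R^2$ of about 0.834: holding the test score constant, each additional year of experience is associated with about Rs. 1,404 more in monthly salary.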
Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion.
In multiple regression, the F and t tests have different purposes.
The F test is used to determine whether a significant linear relationship exists between the dependent variable and the set of all the independent variables.
The F test is referred to as the test for overall significance.
Testing for Significance: t Test
If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.
A separate t test is conducted for each of the independent variables in the model.
We refer to each of these t tests as a test for individual significance.
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero.
Test Statistic
$F = \mathrm{MSR}/\mathrm{MSE}$
Rejection Rule
Reject $H_0$ if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator. Equivalently, decide significance based on the p-value: reject $H_0$ if the p-value is less than $\alpha$.
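A minimal sketch of the overall F test, applied to the programmer salary data from the earlier example and assuming NumPy. The critical value 3.59 is the tabled $F_{.05}$ for 2 and 17 degrees of freedom; it is typed in as a constant rather than computed.

```python
import numpy as np

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

n, p = len(salary), 2
X = np.column_stack([np.ones(n), exper, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
y_hat = X @ b

sst = np.sum((salary - salary.mean()) ** 2)
sse = np.sum((salary - y_hat) ** 2)
ssr = sst - sse

msr = ssr / p            # mean square due to regression, p d.f.
mse = sse / (n - p - 1)  # mean square error, n - p - 1 d.f.
F = msr / mse

F_CRIT = 3.59  # tabled F_.05 with 2 and 17 d.f. (assumed alpha = .05)
print(f"F = {F:.2f}, reject H0: {F > F_CRIT}")
```

Here F is about 42.76, far above 3.59, so the null hypothesis that both slope parameters are zero is rejected at the .05 level.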
Testing for Significance: t Test
Hypotheses
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
Test Statistic
$t = \dfrac{b_i}{s_{b_i}}$
Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom. Equivalently, decide significance based on the p-value: reject $H_0$ if the p-value is less than $\alpha$.
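The individual t statistics can be sketched the same way for the programmer salary data, assuming NumPy. The standard errors $s_{b_i}$ are taken from the diagonal of $\mathrm{MSE}\,(X'X)^{-1}$, which is the usual OLS result; the critical value 2.110 is the tabled $t_{.025}$ for 17 degrees of freedom, entered as a constant.

```python
import numpy as np

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

n, p = len(salary), 2
X = np.column_stack([np.ones(n), exper, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)

resid = salary - X @ b
mse = np.sum(resid ** 2) / (n - p - 1)

# Standard errors of b0, b1, b2 from the diagonal of MSE * (X'X)^-1.
s_b = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = b / s_b

T_CRIT = 2.110  # tabled t_.025 with 17 d.f. (assumed alpha = .05, two-tailed)
for name, ti in zip(["exper", "score"], t[1:]):
    print(f"{name}: t = {ti:.2f}, significant: {abs(ti) > T_CRIT}")
```

The t statistics are about 7.07 for experience and 3.24 for the aptitude score, so both independent variables are individually significant at the .05 level.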
Multicollinearity
Is multicollinearity bad?
If the multicollinearity is perfect, the regression coefficients become indeterminate.
To see this, substitute $x_1 = a x_2$ into the OLS estimates: the two variables carry the same information, so their separate coefficients cannot be determined.
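The perfect-collinearity case can be illustrated numerically. In this sketch (NumPy assumed, values hypothetical) $x_1 = a x_2$ with $a = 2$, so the design matrix loses a rank and two different coefficient vectors produce exactly the same fitted values.

```python
import numpy as np

# Hypothetical predictor and an exact linear copy of it: x1 = a * x2.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a = 2.0
x1 = a * x2

X = np.column_stack([np.ones_like(x2), x1, x2])

# Three columns, but only two are linearly independent.
rank = np.linalg.matrix_rank(X)

# Two different coefficient vectors (b0, b1, b2) that fit identically:
b_a = np.array([1.0, 1.0, 0.0])  # y_hat = 1 + 1*x1
b_b = np.array([1.0, 0.0, 2.0])  # y_hat = 1 + 2*x2, the same line since x1 = 2*x2
same_fit = np.allclose(X @ b_a, X @ b_b)

print("columns:", X.shape[1], "rank:", rank, "identical fitted values:", same_fit)
```

Because infinitely many coefficient pairs give the same fit, least squares has no unique solution here, which is what "indeterminate" means.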
What causes multicollinearity?
God causes multicollinearity!
The data collection procedure: sampling over a limited range of values.
Inbuilt constraints in the model: income and wealth, or expenditure and number of members in the household.
In time series data, regressors that share a common trend.
How to detect multicollinearity
Check the pair-wise correlations between the explanatory variables.
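The pair-wise check can be sketched with NumPy's correlation matrix, here for the two predictors of the programmer salary example. The 0.8 threshold is only an illustrative rule of thumb assumed for this sketch; the slides do not specify a cutoff for correlations.

```python
import numpy as np

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)

# Correlation matrix of the explanatory variables (rows are variables).
corr = np.corrcoef(np.vstack([exper, score]))
r = corr[0, 1]

THRESHOLD = 0.8  # assumed rule of thumb, not from the slides
print(f"r(exper, score) = {r:.3f}, flag: {abs(r) > THRESHOLD}")
```

The correlation is about 0.34, so this pair would not be flagged. With more than two predictors, low pair-wise correlations do not rule out multicollinearity involving three or more variables.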
Remedy
Why bother if I am using the model only for prediction?
Carry out factor analysis.
Drop one of the collinear variables.
Transform the variables by taking differences between successive time periods (time series data).
Statistical vs. Practical Significance?
The F statistic is used to determine if the overall regression model is statistically significant. If the F statistic is significant, it means it is unlikely your sample would produce so large an $R^2$ when the population $R^2$ is actually zero. To be considered statistically significant, a rule of thumb is there must be <.05 probability the results are due to chance.
If the $R^2$ is statistically significant, we then evaluate the strength of the linear association between the dependent variable and the several independent variables. $R^2$, also called the coefficient of determination, is used to measure the strength of the overall relationship. It represents the amount of variation in the dependent variable associated with all of the independent variables considered together (it also is referred to as a measure of the goodness of fit). $R^2$ ranges from 0 to 1.0 and represents the proportion of the variation in the dependent variable explained by the independent variables combined. A large $R^2$ indicates the linear model works well while a small $R^2$ indicates it does not work well.
Even though an $R^2$ is statistically significant, it does not mean it is practically significant. We also must ask whether the results are meaningful. For example, is the value of knowing you have explained 4 percent of the variation worth the cost of collecting and analyzing the data?
Exercise: Multiple Regression
1. Review the data for the McDonald's restaurant case.
2. Where could multiple regression be useful for the customer survey?
3. Where could multiple regression be useful for the employee survey?
Description of Customer Survey Variables

Variable   Description                    Variable Type
Restaurant Perceptions
X1         Excellent Food Quality         Metric
X2         Attractive Interior            Metric
X3         Generous Portions              Metric
X4         Excellent Food Taste           Metric
X5         Good Value for the Money       Metric
X6         Friendly Employees             Metric
X7         Appears Clean & Neat           Metric
X8         Fun Place to Go                Metric
X9         Wide Variety of Menu Items     Metric
X10        Reasonable Prices              Metric
X11        Courteous Employees            Metric
X12        Competent Employees            Metric
Selection Factor Rankings
X13        Food Quality                   Nonmetric
X14        Atmosphere                     Nonmetric
X15        Prices                         Nonmetric
X16        Employees                      Nonmetric
Relationship & Classification Variables
X17        Satisfaction                   Metric
X18        Likely to Return in Future     Metric
X19        Recommend to Friend            Metric
X20        Frequency of Patronage         Nonmetric
X21        Who Saw Ad                     Nonmetric
X22        Which Ad Viewed                Nonmetric
X23        Ad Rating                      Metric
X24        Length of Time a Customer      Metric
X25        Gender                         Nonmetric
X26        Age                            Metric
X27        Income                         Metric
X28        Competitor                     Nonmetric
Description of Employee Survey Variables

Variable   Description                                                             Variable Type
Work Environment Measures
X1         I am paid fairly for the work I do.                                     Metric
X2         I am doing the kind of work I want.                                     Metric
X3         My supervisor gives credit and praise for work well done.               Metric
X4         There is a lot of cooperation among the members of my work group.       Metric
X5         My job allows me to learn new skills.                                   Metric
X6         My supervisor recognizes my potential.                                  Metric
X7         My work gives me a sense of accomplishment.                             Metric
X8         My immediate work group functions as a team.                            Metric
X9         My pay reflects the effort I put into doing my work.                    Metric
X10        My supervisor is friendly and helpful.                                  Metric
X11        The members of my work group have the skills and/or training
           to do their job well.                                                   Metric
X12        The benefits I receive are reasonable.                                  Metric
Relationship Measures
X13        Loyalty: I have a sense of loyalty to McDonald's restaurant.            Metric
X14        Effort: I am willing to put in a great deal of effort beyond that
           expected to help McDonald's restaurant to be successful.                Metric
X15        Proud: I am proud to tell others that I work for McDonald's restaurant. Metric
Classification Variables
X16        Intention to Search                                                     Metric
X17        Length of Time an Employee                                              Nonmetric
X18        Work Type = Part-Time vs. Full-Time                                     Nonmetric
X19        Gender                                                                  Nonmetric
X20        Age                                                                     Metric
X21        Performance                                                             Metric
Selected Variables from McDonald's Customer Survey
Each item is rated on a scale from 1 to 7.
X1   Excellent Food Quality: 1 = Strongly Disagree, 7 = Strongly Agree
X4   Excellent Food Taste: 1 = Strongly Disagree, 7 = Strongly Agree
X9   Wide Variety of Menu Items: 1 = Strongly Disagree, 7 = Strongly Agree
X18  How likely are you to return to McDonald's restaurant in the future? 1 = Definitely Will Not Return, 7 = Definitely Will Return
Using SPSS to Compute a Multiple Regression Model
We want to compare McDonald's customers' perceptions with those of Domino's, so go to the Data pull-down menu to split the sample. Scroll down and click on Split File, then on Compare Groups. Highlight variable X28 and move it into the box labeled "Groups based on:" and then click OK. Now you can run the regression and compare McDonald's and Domino's.
The SPSS click-through sequence is ANALYZE → REGRESSION → LINEAR. Highlight X18 and move it to the dependent variables box. Highlight X1, X4 and X9 and move them to the independent variables box. Use the default Enter in the Methods box. Click on the Statistics button and use the defaults for Estimates and Model Fit. Next click on Descriptives and then Continue. There are several other options you could select at the bottom of this dialog box but for now we will use the program defaults. Click on OK at the top right of the dialog box to run the regression.
[Figures: Multiple Regression Dialog Boxes (two screenshots).]
Multiple Regression Output
Degrees of freedom (df) = the total number of observations minus the number of estimated parameters. For example, in estimating a regression model with a single independent variable, we estimate two parameters, the intercept ($b_0$) and a regression coefficient for the independent variable ($b_1$), so df = n - 2. If the number of degrees of freedom is small, the resulting prediction is less generalizable. Conversely, a large degrees-of-freedom value indicates the prediction is fairly robust with regard to being representative of the overall sample of respondents.
Total Sum of Squares (SST) = total amount of variation that exists to be explained by the independent variables. SST = the sum of SSE and SSR.
Sum of Squared Errors (SSE) = the variation in the dependent variable not accounted for by the regression model = residual. The objective is to obtain the smallest possible sum of squared errors as a measure of prediction accuracy.
Sum of Squares Regression (SSR) = the amount of improvement in explanation of the dependent variable attributable to the independent variables.
Unstandardized Coefficient (B) interpretation = for every one-unit increase in X1 for McDonald's, X18 (dependent variable) will increase by .260 units, with the other independent variables held constant.
Constant term ($b_0$) = also referred to as the intercept, it is the value on the Y axis (dependent variable axis) where the line defined by the regression equation crosses the axis.
Only significant betas are interpreted (p ≤ .05).
Standardized Coefficient (Beta) interpretation = this value takes care of different units of independent and dependent variables. Standardized coefficients are used to compare several independent variables.
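How an unstandardized coefficient relates to a standardized one can be sketched as follows. The McDonald's survey data are not reproduced in the slides, so this example reuses the programmer salary data instead, and assumes NumPy. It uses the standard conversion Beta = b × (standard deviation of x) / (standard deviation of y) and confirms it by refitting on z-scores.

```python
import numpy as np

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

def zscore(v):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Unstandardized coefficients (B) in the original units.
X = np.column_stack([np.ones_like(exper), exper, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Standardized coefficients (Beta) by rescaling each slope.
s_x = np.array([exper.std(ddof=1), score.std(ddof=1)])
beta = b[1:] * s_x / salary.std(ddof=1)

# Same result from a regression on z-scored variables.
Z = np.column_stack([np.ones_like(exper), zscore(exper), zscore(score)])
b_z, *_ = np.linalg.lstsq(Z, zscore(salary), rcond=None)

print("B:   ", b[1:])
print("Beta:", beta)
```

The unstandardized slopes (about 1.40 and 0.25) are not comparable because years and test points are different units. The standardized values (about 0.74 and 0.34) suggest experience has the stronger association with salary in this sample.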
There is high multicollinearity among the independent variables. This can cause a problem with the significance of the beta coefficients. See X9 on the previous slide.
Notice that when variables X1 and X4 are eliminated, the beta for X9 is significant.
Multicollinearity Diagnostics
Variance Inflation Factor (VIF) measures how much the variance of the regression coefficients is inflated by multicollinearity problems. If VIF equals 1, its minimum value, there is no correlation between the independent measures. A VIF somewhat above 1 is an indication of some association between predictor variables, but generally not enough to cause problems. A maximum acceptable VIF value would be 5.0; anything higher would indicate a problem with multicollinearity.
Tolerance is the amount of variance in an independent variable that is not explained by the other independent variables. If the other variables explain a lot of the variance of a particular independent variable we have a problem with multicollinearity. Thus, small values for tolerance indicate problems of multicollinearity. The minimum cutoff value for tolerance is typically .20. That is, the tolerance value must be smaller than .20 to indicate a problem of multicollinearity.
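Both diagnostics can be computed from auxiliary regressions: regress each independent variable on the others, take that $R_j^2$, and set tolerance $= 1 - R_j^2$ and VIF $= 1/(1 - R_j^2)$. The sketch below assumes NumPy; the helper name `vif` is hypothetical, and the data are the two predictors from the programmer salary example because the survey data are not reproduced in the slides.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2_j)
    return out

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)

vifs = vif(np.column_stack([exper, score]))
tolerance = 1 / vifs

print("VIF:      ", vifs)
print("Tolerance:", tolerance)
```

For these two predictors the VIF is about 1.13 and the tolerance about 0.89, well inside the cutoffs of 5.0 and .20 given above.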
Using SPSS to Examine Multicollinearity
The SPSS click-through sequence is: ANALYZE → REGRESSION → LINEAR. Go to McDonald's employee survey data and click on X13 Loyalty and move it to the Dependent Variables box. Click on variables X1 to X12 and move them to the Independent Variables box. The box labeled Method has ENTER as the default and we will use it. Click on the Statistics button and use the Estimates and Model fit defaults. Click on Descriptives and Collinearity diagnostics and then Continue and OK to run the regression.
Tolerance and VIF are two statistics that tell us about the extent of multicollinearity that exists in the model. Tolerance is the reciprocal of VIF.
Residuals Plots
Histogram of standardized residuals: enables you to determine if the errors are normally distributed (see Exhibit 1).
Normal probability plot: enables you to determine if the errors are normally distributed. It compares the observed (sample) standardized residuals against the expected standardized residuals from a normal distribution (see Exhibit 2).
Scatterplot of residuals: can be used to test regression assumptions. It compares the standardized predicted values of the dependent variable against the standardized residuals from the regression equation (see Exhibit 3). If the plot exhibits a random pattern then this indicates no identifiable violations of the assumptions underlying regression analysis.
Using SPSS to Examine Residuals
SPSS includes several diagnostic tools to examine residuals. To run the regression that examines the residuals, first load the employee database. The click-through sequence is ANALYZE → REGRESSION → LINEAR. Highlight X15 Proud and move it to the dependent variable box. Next highlight variables X2, X5, X7, and X19 and move them to the independent variable box. Enter is the default in the Methods box and we will use it. Click on the Statistics button and Estimates and Model Fit will be the defaults. Now, click on Collinearity Diagnostics and then go to the bottom left of the screen in the Residuals box and click on Casewise Diagnostics. The default is to identify outliers outside 3 standard deviations, but in this case we are going to be conservative and use 2 standard deviations. Click on Outliers outside and then place a 2 in the box for number of standard deviations. Next click on Continue.
This is the same sequence as earlier regression applications, but now we also must go to the Plots button to request some new information. To produce plots of the residuals to check on potential violations of the regression assumptions, click on ZPRED and move it to the Y box. Then click on ZRESID and move it to the X box. These two plots are for the Standardized Predicted Dependent Variable and Standardized Residuals. Next, click on Histogram and Normal Probability plot under the Standardized Residual Plots box on the lower left side of the screen. Examination of these plots and tables enables us to determine whether the hypothesized relationship between the dependent variable X15 and the independent variables X2, X5, X7, and X19 is linear, and also whether the error terms in the regression model are normally distributed. Finally, click on Continue and then on OK to run the program. The results are the same as in Exhibits 1 to 3.
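The quantities behind these residual diagnostics can be sketched outside SPSS. The employee survey data are not reproduced in the slides, so the example below uses the programmer salary data and assumes NumPy. It also assumes that a standardized residual is the raw residual divided by the standard error of the estimate, and that a standardized predicted value is the z-score of the fitted value; the names `zresid` and `zpred` simply mirror the SPSS labels.

```python
import numpy as np

exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
                   38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30])

X = np.column_stack([np.ones_like(exper), exper, score])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
pred = X @ b
resid = salary - pred

n, k = X.shape  # k = p + 1 estimated coefficients
s = np.sqrt(np.sum(resid ** 2) / (n - k))  # standard error of the estimate

zresid = resid / s                               # standardized residuals
zpred = (pred - pred.mean()) / pred.std(ddof=1)  # standardized predicted values

# Casewise diagnostics: cases more than 2 standard deviations out.
outliers = np.flatnonzero(np.abs(zresid) > 2)
print("cases flagged at 2 SD:", outliers, zresid[outliers])
```

Plotting `zpred` against `zresid` gives the scatterplot described above, and a histogram of `zresid` gives the normality check. Any plotting library can be used for that step.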
Exhibit 1: Histogram of Employee Survey Dependent Variable X15 Proud
[Figure: histogram of the regression standardized residuals for dependent variable X15 Proud. Horizontal axis: Regression Standardized Residual (-1.75 to 2.25); vertical axis: Frequency (0 to 10). Std. Dev = .97, Mean = 0.00, N = 63.]
Exhibit 2: Normal Probability Plot of Regression Standardized Residuals
[Figure: Normal P-P plot of the regression standardized residual for dependent variable X15 Proud. Horizontal axis: Observed Cum Prob (0.00 to 1.00); vertical axis: Expected Cum Prob (0.00 to 1.00).]
Normal probability plot = a graphical comparison of the shape of the sample distribution (observed) to the normal distribution. The straight line angled at 45 degrees is the normal distribution and the actual distribution (observed) is shown as deviations from the straight line.
Exhibit 3: Scatterplot of Employee Survey Dependent Variable X15 Proud
[Figure: scatterplot for dependent variable X15 Proud. Horizontal axis: Regression Standardized Residual (-2 to 3); vertical axis: Regression Standardized Predicted Value (-2 to 3).]
This is a scatterplot of the standardized residuals versus the predicted dependent (Y) values. If it exhibits a random pattern, which this plot does, then it indicates no identifiable violations of the assumptions underlying regression analysis and is called a Null Plot.
Thank you
