Multiple Regression - Basic Relationships

SW388R7
Data Analysis & Computers II
Slide 1
Purpose of multiple regression
Different types of multiple regression
Standard multiple regression
Hierarchical multiple regression
Stepwise multiple regression
Steps in solving regression problems

Purpose of multiple regression
Slide 2
• The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable.
• If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable.
Types of multiple regression
Slide 3
• There are three types of multiple regression, each of which is designed to answer a different question:
• Standard multiple regression is used to evaluate the relationships between a set of independent variables and a dependent variable.
• Hierarchical, or sequential, regression is used to examine the relationships between a set of independent variables and a dependent variable, after controlling for the effects of some other independent variables on the dependent variable.
• Stepwise, or statistical, regression is used to identify the subset of independent variables that has the strongest relationship to a dependent variable.
Standard multiple regression
Slide 4
• In standard multiple regression, all of the independent variables are entered into the regression equation at the same time.
• Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample.
• A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.
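The quantities named above (the b coefficients, R², the overall F statistic and a t statistic per coefficient) can be illustrated with a small pure-Python sketch. It is not how SPSS computes them internally; it simply solves the least squares normal equations with all predictors entered at once. The function names `solve` and `standard_regression` and the six-case data set are hypothetical, chosen so the exact answer is known (b = 0, 2, 0; R² = 70/74; F = 26.25). p-values are omitted because the Python standard library has no F or t distribution functions.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def standard_regression(predictors, y):
    """Enter all predictors at once; return (b, r2, f, t) with b[0] the constant."""
    n, k = len(y), len(predictors)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in predictors]
    xtx = [[sum(ci[i] * cj[i] for i in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[i] * y[i] for i in range(n)) for ci in cols]
    b = solve(xtx, xty)
    fitted = [sum(b[j] * cols[j][i] for j in range(k + 1)) for i in range(n)]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_regression = ss_total - ss_residual
    ms_residual = ss_residual / (n - k - 1)
    f = (ss_regression / k) / ms_residual
    # Standard error of b[j] uses element (j, j) of (X'X)^-1, found by solving X'X z = e_j.
    t = []
    for j in range(k + 1):
        unit = [1.0 if i == j else 0.0 for i in range(k + 1)]
        inv_jj = solve(xtx, unit)[j]
        t.append(b[j] / math.sqrt(ms_residual * inv_jj))
    return b, ss_regression / ss_total, f, t


# Hypothetical data: y = 2 * x1 plus a residual pattern unrelated to x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 1, 1, 1, 5, 2]
y = [3, 3, 5, 9, 10, 12]
b, r2, f, t = standard_regression([x1, x2], y)
print([round(v, 6) for v in b], round(r2, 4), round(f, 2))
```

The t values play the role of the per-variable tests described above; with real data they would be compared to a t distribution with n - k - 1 degrees of freedom.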
Hierarchical multiple regression
Slide 5
• In hierarchical multiple regression, the independent variables are entered in two stages.
• In the first stage, the independent variables that we want to control for are entered into the regression. In the second stage, we enter the independent variables whose relationship to the dependent variable we want to examine after the controls are in place.
• A statistical test of the change in R² from the first stage to the second stage is used to evaluate the importance of the variables entered in the second stage.
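The test of the change in R² is an F test. A minimal sketch of the usual formula follows; the function name `r2_change_f` is hypothetical. df1 is the number of variables added in the second stage and df2 is the residual degrees of freedom of the full model (valid cases minus all predictors minus 1). The check plugs in the values reported for Problem 2 in this deck (R² of .000 and .361, 3 added variables, 90 cases, 5 predictors); the result differs slightly from the printed 15.814 because the printed R² values are rounded.

```python
def r2_change_f(r2_reduced, r2_full, df1, df2):
    """F statistic for the increase in R-squared when df1 variables are added.

    r2_reduced: R-squared with only the control variables (first stage)
    r2_full:    R-squared after the second-stage variables are added
    df2:        residual degrees of freedom of the full model, n - k_full - 1
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)


# Problem 2: 90 valid cases, 2 controls + 3 added variables = 5 predictors.
n, k_full, added = 90, 5, 3
f_change = r2_change_f(0.000, 0.361, added, n - k_full - 1)
print(round(f_change, 2))  # 15.82
```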
Stepwise multiple regression
Slide 6
• Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.
• Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables.
• When none of the possible additions can make a statistically significant improvement in R², the analysis stops.
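The entry logic described above can be sketched as simplified forward selection. This is a hypothetical sketch, not SPSS's procedure: SPSS's Stepwise method judges entry and removal by the probability of F (set in the Options dialog) and can also drop a variable that loses significance, whereas this sketch uses a fixed F-to-enter threshold (the hypothetical parameter `f_to_enter`) and never removes variables. The function names and toy data are illustrative only.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R-squared of an OLS fit of y on a constant plus the given predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in columns]
    xtx = [[sum(ci[i] * cj[i] for i in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[i] * y[i] for i in range(n)) for ci in cols]
    b = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_residual = sum(
        (y[i] - sum(b[j] * cols[j][i] for j in range(len(cols)))) ** 2 for i in range(n)
    )
    return 1.0 - ss_residual / ss_total


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Try every unused candidate, keep the one with the largest R-squared, and stop
    when its F-change (1 numerator df, since one variable enters at a time) is too small."""
    n = len(y)
    selected, r2_current = [], 0.0
    while len(selected) < len(candidates):
        trials = {
            name: r_squared([candidates[v] for v in selected + [name]], y)
            for name in candidates
            if name not in selected
        }
        best = max(trials, key=trials.get)
        r2_new = trials[best]
        df_residual = n - (len(selected) + 1) - 1
        if df_residual <= 0:
            break
        if r2_new >= 1.0:
            f_change = math.inf
        else:
            f_change = (r2_new - r2_current) / ((1.0 - r2_new) / df_residual)
        if f_change < f_to_enter:
            break
        selected.append(best)  # the list records the order of entry
        r2_current = r2_new
    return selected, r2_current


# Hypothetical toy data: y tracks x1 closely; x2 adds nothing once x1 is in.
candidates = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 1, 1, 1, 5, 2]}
y = [3, 3, 5, 9, 10, 12]
selected, r2 = forward_stepwise(candidates, y)
print(selected, round(r2, 3))  # ['x1'] 0.946
```

Because the list keeps the order of entry, it also reflects the "importance" ordering that stepwise output is read for later in this deck.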
Problem 1 - standard multiple regression
Slide 7
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

The variables "strength of affiliation" [reliten] and "frequency of prayer" [pray] have a strong relationship to the variable "frequency of attendance at religious services" [attend]. Survey respondents who were less strongly affiliated with their religion attended religious services less often. Survey respondents who prayed less often attended religious services less often.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
Slide 8
When a problem states that there is a relationship between some independent variables and a dependent variable, we do standard multiple regression.

The variables listed first in the problem statement are the independent variables (ivs): "strength of affiliation" [reliten] and "frequency of prayer" [pray].

The variable that they are related to is the dependent variable (dv): "frequency of attendance at religious services" [attend].
Slide 9
In order for a problem to be true, we will have to find:
• a statistically significant relationship between the ivs and the dv
• a relationship of the correct strength

The relationship of each of the independent variables to the dependent variable must be statistically significant and interpreted correctly.
Request a standard multiple regression
Slide 10
To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Specify the variables and selection method
Slide 11
First, move the dependent variable attend to the Dependent text box.

Second, move the independent variables reliten and pray to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression.

Fourth, click on the Statistics… button to specify the statistics options that we want.
Specify the statistics output options
Slide 12
First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.
Request the regression output
Slide 13
Click on the OK button to request the regression output.
LEVEL OF MEASUREMENT
Slide 14
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "Frequency of attendance at religious services" [attend] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Strength of affiliation" [reliten] and "frequency of prayer" [pray] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
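The level of measurement screening used on this slide (and repeated for Problems 2 and 3) can be written as a small decision function. This is a hypothetical sketch of the deck's rule, not an SPSS feature: the dependent variable must be metric, each independent variable must be metric or dichotomous, and accepting an ordinal variable as metric triggers the "caution" outcome. The level labels are assumed strings.

```python
METRIC = {"interval", "ratio"}


def measurement_verdict(dv_level, iv_levels):
    """Return "inappropriate", "caution" or "ok" for a proposed multiple regression."""
    if dv_level not in METRIC | {"ordinal"}:
        return "inappropriate"  # the dependent variable must be metric
    for level in iv_levels:
        if level not in METRIC | {"ordinal", "dichotomous"}:
            return "inappropriate"  # each iv must be metric or dichotomous
    if dv_level == "ordinal" or "ordinal" in iv_levels:
        return "caution"  # requirement met only by treating ordinal as metric
    return "ok"


# Problem 1: attend, reliten and pray are all ordinal.
print(measurement_verdict("ordinal", ["ordinal", "ordinal"]))  # caution
# Problem 2: happy is ordinal; age interval, sex dichotomous, three ordinal ivs.
print(measurement_verdict("ordinal", ["interval", "dichotomous", "ordinal", "ordinal", "ordinal"]))  # caution
```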
SAMPLE SIZE
Slide 15
Descriptive Statistics
                                          Mean   Std. Deviation   N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES    3.15   2.653            113
STRENGTH OF AFFILIATION                   2.12   1.084            113
HOW OFTEN DOES R PRAY                     2.90   1.575            113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement.

In addition, the ratio of 56.5 to 1 satisfies the preferred ratio of 15 to 1.
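The ratio check is simple arithmetic; a sketch follows. The 5-to-1 minimum and 15-to-1 preferred ratios are the ones stated in this deck, and the function name is hypothetical.

```python
def cases_to_iv_check(valid_cases, independent_variables, minimum=5.0, preferred=15.0):
    """Return the cases-per-iv ratio and whether it meets the minimum and preferred ratios."""
    ratio = valid_cases / independent_variables
    return ratio, ratio >= minimum, ratio >= preferred


print(cases_to_iv_check(113, 2))  # Problem 1: (56.5, True, True)
print(cases_to_iv_check(90, 5))   # Problem 2: (18.0, True, True)
```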
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 1
Slide 16
The probability of the F statistic (49.824) for the overall regression relationship is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.

ANOVA(b)
Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    374.757          2     187.379       49.824   .000(a)
Residual      413.685          110   3.761
Total         788.442          112
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
b. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
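Every other entry of the ANOVA table follows from the sums of squares and degrees of freedom: each mean square is a sum of squares divided by its df, F is the ratio of the two mean squares, and R² is the regression sum of squares over the total. A short sketch (hypothetical function name) rebuilds them from the Problem 1 values shown above.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Rebuild mean squares, F and R-squared from an ANOVA table's sums of squares."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f = ms_regression / ms_residual
    # SS total = SS regression + SS residual, so R-squared is the regression share.
    r2 = ss_regression / (ss_regression + ss_residual)
    return ms_regression, ms_residual, f, r2


ms_reg, ms_res, f, r2 = anova_summary(374.757, 413.685, 2, 110)  # Problem 1
print(round(ms_reg, 3), round(ms_res, 3), round(f, 3), round(r2, 3))
```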
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 2
Slide 17
The Multiple R for the relationship between the set of independent variables and the dependent variable is 0.689, which would be characterized as strong using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .689(a)   .475       .466                1.939
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
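The rule of thumb above maps directly onto a chain of comparisons. The sketch below (hypothetical function names) also recovers Multiple R as the square root of R² and the Adjusted R Square column from the usual formula 1 - (1 - R²)(n - 1)/(n - k - 1), using the Problem 1 sums of squares.

```python
import math


def characterize_r(r):
    """Label the strength of a correlation (or Multiple R) with the deck's rule of thumb."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n valid cases and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2 = 374.757 / 788.442  # Problem 1: SS regression / SS total
multiple_r = math.sqrt(r2)
print(round(multiple_r, 3), characterize_r(multiple_r), round(adjusted_r2(r2, 113, 2), 3))  # 0.689 strong 0.466
```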
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
Slide 18
For the independent variable strength of affiliation, the probability of the t statistic (-5.857) for the b coefficient is <0.001, which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with strength of affiliation is equal to zero (b = 0) and conclude that there is a statistically significant relationship between strength of affiliation and frequency of attendance at religious services.

Coefficients(a)
                           Unstandardized Coefficients   Standardized Coefficients
Model 1                    B        Std. Error           Beta                        t        Sig.
(Constant)                 7.167    .442                                             16.206   .000
STRENGTH OF AFFILIATION    -1.138   .194                 -.465                       -5.857   .000
HOW OFTEN DOES R PRAY      -.554    .134                 -.329                       -4.145   .000
a. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
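Two columns of the Coefficients table can be checked by hand. The t statistic for the test of b = 0 is b divided by its standard error, and the standardized Beta is b rescaled by the standard deviations from the Descriptive Statistics table (b × SD of the iv / SD of the dv). A sketch with hypothetical function names follows; the recomputed t values differ slightly from SPSS's (-5.87 vs -5.857) because B and Std. Error are printed rounded.

```python
def t_statistic(b, std_error):
    """t for the null hypothesis b = 0."""
    return b / std_error


def standardized_beta(b, sd_x, sd_y):
    """Beta expresses b in standard-deviation units: b * SD(x) / SD(y)."""
    return b * sd_x / sd_y


SD_ATTEND = 2.653  # std. deviation of HOW OFTEN R ATTENDS RELIGIOUS SERVICES
rows = [
    ("STRENGTH OF AFFILIATION", -1.138, 0.194, 1.084),
    ("HOW OFTEN DOES R PRAY", -0.554, 0.134, 1.575),
]
for name, b, se, sd_x in rows:
    print(name, round(t_statistic(b, se), 2), round(standardized_beta(b, sd_x, SD_ATTEND), 3))
```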
Slide 19
The b coefficient associated with strength of affiliation (-1.138) is negative, indicating an inverse relationship in which higher numeric values for strength of affiliation are associated with lower numeric values for frequency of attendance at religious services.

Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables, the numeric codes can be associated with labels in ascending or descending order.
Slide 20
The independent variable strength of affiliation is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less strongly affiliated with their religion.
Slide 21
The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who were less strongly affiliated with their religion attended religious services less often.
Slide 22
For the independent variable frequency of prayer, the probability of the t statistic (-4.145) for the b coefficient is <0.001, which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with frequency of prayer is equal to zero (b = 0) and conclude that there is a statistically significant relationship between frequency of prayer and frequency of attendance at religious services.
Slide 23
The b coefficient associated with frequency of prayer (-0.554) is negative, indicating an inverse relationship in which higher numeric values for frequency of prayer are associated with lower numeric values for frequency of attendance at religious services.

Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables, the numeric codes can be associated with labels in ascending or descending order.
Slide 24
The independent variable frequency of prayer is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who prayed less often.
Slide 25
The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who prayed less often attended religious services less often.
Answer to problem 1
Slide 26
• The independent and dependent variables were ordinal, which satisfies the requirement that they be metric if ordinal variables are treated as metric.
• The ratio of cases to independent variables was 56.5 to 1.
• The overall relationship was statistically significant and its strength was characterized correctly.
• The b coefficients for both independent variables were statistically significant, and the directions of the relationships were characterized correctly.
• The answer to the question is true with caution. The caution is added because of the ordinal variables.
Problem 2 - hierarchical regression
Slide 27
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

After controlling for the effects of the variables "age" [age] and "sex" [sex], the addition of the variables "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] reduces the error in predicting "general happiness" [happy] by 36.1%. After controlling for age and sex, the variables happiness of marriage, condition of health, and attitude toward life each make an individual contribution to reducing the error in predicting general happiness. Survey respondents who were less happy with their marriages were less happy overall. Survey respondents who said they were not as healthy were less happy overall. Survey respondents who felt life was less exciting were less happy overall.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 28
The variables listed first in the problem statement are the independent variables (ivs) whose effect we want to control before we test for the relationship: "age" [age] and "sex" [sex].

The variables that we add in after the control variables are the independent variables that we think will have a statistical relationship to the dependent variable: "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life].

The variable that is to be predicted or related to is the dependent variable (dv): "general happiness" [happy].
Slide 29
In order for a problem to be true, the relationship between the added variables and the dependent variable must be statistically significant, and the strength of the relationship after including the control variables must be correctly stated.

We are generally not interested in whether or not the control variables have a statistically significant relationship to the dependent variable.

The relationship between each of the independent variables entered after the control variables and the dependent variable must be statistically significant and interpreted correctly.
Request a hierarchical multiple regression
Slide 30
To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Specify independent variables to control for
Slide 31
First, move the dependent variable happy to the Dependent text box.

Second, move the independent variables to control for, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Add the other independent variables
Slide 32
SPSS identifies that we will now be adding variables to a second block.

First, move the other independent variables hapmar, health and life to the Independent(s) list box for block 2.

Second, click on the Statistics… button to specify the statistics options that we want.
Slide 33
First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, click on the Continue button to close the dialog box.
Slide 34
Click on the OK button to request the regression output.
Slide 35
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "General happiness" [happy] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is an interval level variable, which satisfies the level of measurement requirements for multiple regression analysis.

"Happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is a dichotomous or dummy-coded nominal variable, which may be included in multiple regression analysis.
Slide 36
Descriptive Statistics
                            Mean    Std. Deviation   N
GENERAL HAPPINESS           1.63    .626             90
AGE OF RESPONDENT           45.50   15.221           90
RESPONDENTS SEX             1.61    .490             90
HAPPINESS OF MARRIAGE       1.42    .540             90
CONDITION OF HEALTH         1.80    .810             90
IS LIFE EXCITING OR DULL    1.49    .525             90

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 90 valid cases and 5 independent variables, the ratio for this analysis is 18.0 to 1, which satisfies the minimum requirement.

In addition, the ratio of 18.0 to 1 satisfies the preferred ratio of 15 to 1.
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES
Slide 37
ANOVA(c)
Model             Sum of Squares   df   Mean Square   F       Sig.
1   Regression    .006             2    .003          .007    .993(a)
    Residual      34.894           87   .401
    Total         34.900           89
2   Regression    12.601           5    2.520         9.493   .000(b)
    Residual      22.299           84   .265
    Total         34.900           89
a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH
c. Dependent Variable: GENERAL HAPPINESS

The probability of the F statistic (9.493) for the overall regression relationship for all independent variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of all independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of all independent variables and the dependent variable.
REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 1
Slide 38
Model Summary
                                                                                   Change Statistics
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .013(a)   .000       -.023               .633                         .000              .007       2     87    .993
2       .601(b)   .361       .323                .515                         .361              15.814     3     84    .000
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH

The R Square Change statistic for the increase in R² associated with the added variables (happiness of marriage, condition of health, and attitude toward life) is 0.361. Using a proportional reduction in error interpretation for R², information provided by the added variables reduces our error in predicting general happiness by 36.1%.
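The proportional reduction in error reading of R² can be checked against the ANOVA table for the same problem: R² = 1 - SS residual / SS total, so the R Square Change is the drop in residual sum of squares between Model 1 and Model 2, divided by the total sum of squares. A sketch with a hypothetical function name:

```python
def r2_from_ss(ss_residual, ss_total):
    """Proportional reduction in error: share of total variation removed by the model."""
    return 1.0 - ss_residual / ss_total


ss_total = 34.900
r2_controls = r2_from_ss(34.894, ss_total)  # Model 1: age and sex only
r2_full = r2_from_ss(22.299, ss_total)      # Model 2: plus hapmar, health and life
print(round(r2_controls, 3), round(r2_full - r2_controls, 3))
```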
REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 2
Slide 39
The probability of the F statistic (15.814) for the change in R² associated with the addition of the predictor variables to the regression analysis containing the control variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no improvement in the relationship between the set of independent variables and the dependent variable when the predictors are added (R² Change = 0).

We support the research hypothesis that there is a statistically significant improvement in the relationship between the set of independent variables and the dependent variable when the predictors are added.
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE
Slide 40
Coefficients(a)
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B       Std. Error            Beta                        t       Sig.
1   (Constant)                 1.594   .341                                              4.677   .000
    AGE OF RESPONDENT          .000    .005                  .012                        .107    .915
    RESPONDENTS SEX            .011    .140                  .008                        .078    .938
2   (Constant)                 .432    .341                                              1.268   .208
    AGE OF RESPONDENT          -.001   .004                  -.035                       -.385   .701
    RESPONDENTS SEX            -.013   .115                  -.010                       -.113   .911
    HAPPINESS OF MARRIAGE      .599    .104                  .517                        5.741   .000
    CONDITION OF HEALTH        .101    .072                  .131                        1.408   .163
    IS LIFE EXCITING OR DULL   .170    .108                  .142                        1.570   .120
a. Dependent Variable: GENERAL HAPPINESS

If there is a relationship between each added individual independent variable and the dependent variable, the probability of the statistical test of the b coefficient (slope of the regression line) will be less than or equal to the level of significance. The null hypothesis for this test states that b is equal to zero, indicating a flat regression line and no relationship.

If we reject the null hypothesis and find that there is a relationship between the variables, the sign of the b coefficient indicates the direction of the relationship for the data values. If b is greater than zero, the relationship is positive or direct. If b is less than zero, the relationship is negative or inverse. If the variable is dichotomous or ordinal, the direction of the coding must be taken into account to make a correct interpretation.
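Taking the coding into account amounts to flipping the sign of b once for each variable whose higher numeric codes mean less of the attribute in its name. The sketch below is a hypothetical helper, not an SPSS feature; the coding facts it is fed come from the slides (reliten and pray are coded so higher values mean less affiliation or less prayer, attend so higher values mean more frequent attendance, and both hapmar and happy so higher values mean less happiness). A result of +1 means "more of one goes with more of the other", which is the same as "less goes with less".

```python
def substantive_direction(b, iv_high_code_means_less, dv_high_code_means_less):
    """Return +1 for a direct substantive relationship, -1 for an inverse one.

    Call only after the t test has rejected b = 0. Each flag records whether a
    higher numeric code stands for less of the attribute named by the variable.
    """
    if b == 0:
        raise ValueError("b = 0 has no direction")
    direction = 1 if b > 0 else -1
    if iv_high_code_means_less:
        direction = -direction
    if dv_high_code_means_less:
        direction = -direction
    return direction


# Problem 1: reliten (higher = less strongly affiliated) vs attend (higher = more often).
print(substantive_direction(-1.138, True, False))  # 1
# Problem 2: hapmar and happy are both coded so that higher = less happy.
print(substantive_direction(0.599, True, True))  # 1
```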
Slide 41
For the independent variable happiness of marriage, the probability of the t statistic (5.741) for the b coefficient is <0.001, which is less than or equal to the level of significance of 0.05.

We reject the null hypothesis that the slope associated with happiness of marriage is equal to zero (b = 0) and conclude that there is a statistically significant relationship between happiness of marriage and general happiness.
Slide 42
The b coefficient associated with happiness of marriage (0.599) is positive, indicating a direct relationship in which higher numeric values for happiness of marriage are associated with higher numeric values for general happiness.
Slide 43
The independent variable happiness of marriage is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less happy with their marriages.
Slide 44
The dependent variable general happiness is also an ordinal variable. It is coded so that higher numeric values are associated with survey respondents who were less happy overall.

Therefore, the positive value of b implies that survey respondents who were less happy with their marriages were less happy overall.
Slide 45
For the independent variable condition of health, the probability of the t statistic (1.408) for the b coefficient is 0.163, which is greater than the level of significance of 0.05. We fail to reject the null hypothesis that the slope associated with condition of health is equal to zero (b = 0), so we cannot conclude that there is a statistically significant relationship between condition of health and general happiness. The statement in the problem that "survey respondents who said they were not as healthy were less happy overall" is therefore not supported.
Slide 46
• The independent and dependent variables were metric or dichotomous; some were ordinal variables treated as metric.
• The ratio of cases to independent variables was 18.0 to 1.
• The overall relationship was statistically significant and its strength was characterized correctly.
• The change in R² associated with adding the second block of variables was statistically significant and correctly interpreted.
• The b coefficient for happiness of marriage was statistically significant and correctly interpreted. The b coefficient for condition of health was not statistically significant. We cannot conclude that there was a relationship between condition of health and general happiness.
• The answer to the question is false.

Problem 3 - Stepwise Regression
Slide 47
26. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

From the list of variables "number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], "highest year of school completed" [educ], and "highest academic degree" [degree], the best predictors of "total family income" [income98] are "highest academic degree" [degree] and "occupational prestige score" [prestg80]. Highest academic degree and occupational prestige score have a moderate relationship to total family income.

The most important predictor of total family income is occupational prestige score. The second most important predictor of total family income is highest academic degree. Survey respondents who had higher academic degrees had higher total family incomes. Survey respondents who had more prestigious occupations had higher total family incomes.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 48
The variables listed first in the problem statement are the independent variables from which the computer will select the best subset using statistical criteria.

The variable that is to be predicted or related to is the dependent variable (dv): "total family income" [income98].

The best predictors are the variables that meet the statistical criteria for inclusion in the model.
Slide 49
In order for a problem to be true, we will have to find:
• a statistically significant relationship between the included ivs and the dv
• a relationship of the correct strength

The importance of the variables is provided by the stepwise order of entry of the variables into the regression analysis.
Slide 50
The relationship between each of the independent variables included in the model and the dependent variable must be statistically significant and interpreted correctly.

Since statistical significance of a variable's contribution toward explaining the variance in the dependent variable is almost always used as the criterion for inclusion, the statistical significance of the relationships is usually assured.
Request a stepwise multiple regression
Slide 51
menu.
Specify variables and method for selecting variables
Slide 52
First, move the dependent variable income98 to the Dependent text box.
Second, move the independent variables hrs1, prestg80, educ, and degree to the Independent(s) list box.
Third, select the Stepwise method for entering the variables into the analysis from the drop-down Method menu.
Open statistics options dialog box
Slide 53
First, click on the

Slide 54
First, mark the checkbox for Estimates on the Regression Coefficients panel.
Second, mark the checkboxes for Model Fit and Descriptives.
Third, click on the Continue button to close the dialog box.
Slide 55
Click on the OK button to request the regression output.
Slide 56
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "Total family income" [income98] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], and "highest year of school completed" [educ] are interval level variables, which satisfy the level of measurement requirements for multiple regression analysis.

"Highest academic degree" [degree] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
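The level of measurement rule above can be written as a small check. This is a sketch only: the function name `check_measurement` and the level labels are hypothetical, and treating ordinal variables as metric (with a caution) follows the convention described on this slide.

```python
# Hypothetical helper encoding the level of measurement rule on this slide.
# Level labels are plain strings chosen for this sketch.
METRIC_LEVELS = {"interval", "ratio"}


def check_measurement(dv_level, iv_levels):
    """Return (ok, caution) for a multiple regression problem.

    ok      -- False if the levels of measurement rule out multiple regression.
    caution -- True if an ordinal variable is being treated as metric.
    """
    caution = False

    # The dependent variable must be metric; ordinal is accepted by convention.
    if dv_level == "ordinal":
        caution = True
    elif dv_level not in METRIC_LEVELS:
        return False, False

    # Each independent variable must be metric or dichotomous.
    for level in iv_levels:
        if level == "ordinal":
            caution = True
        elif level not in METRIC_LEVELS and level != "dichotomous":
            return False, False

    return True, caution


# income98 and degree are ordinal; hrs1, prestg80 and educ are interval.
print(check_measurement("ordinal", ["interval", "interval", "interval", "ordinal"]))
# prints: (True, True)
```

The second value flags the note of caution that the slide says should accompany the interpretation.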
Slide 57

                                        Mean    Std. Deviation    N
TOTAL FAMILY INCOME                     17.06        4.130       151
NUMBER OF HOURS WORKED LAST WEEK        41.45       12.076       151
RS OCCUPATIONAL PRESTIGE SCORE (1980)   45.64       14.183       151
HIGHEST YEAR OF SCHOOL COMPLETED        14.00        2.587       151
RS HIGHEST DEGREE                        1.74        1.159       151

The minimum ratio of valid cases to independent variables for stepwise multiple regression is 5 to 1. With 151 valid cases and 4 independent variables, the ratio for this analysis is 37.75 to 1, which satisfies the minimum requirement.

However, the ratio of 37.75 to 1 does not satisfy the preferred ratio of 50 to 1. A caution should be added to the interpretation of the analysis and a split sample validation should be conducted.
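The ratio arithmetic can be sketched as follows. `ratio_check` is a hypothetical helper; the 5 to 1 minimum and the 50 to 1 preferred ratio for stepwise regression are the values stated on this slide.

```python
# Hypothetical helper for the cases-to-independent-variables rule:
# minimum 5 to 1; the preferred ratio is passed in (50 to 1 for stepwise).
def ratio_check(valid_cases, num_ivs, preferred):
    """Return the ratio and whether it meets the minimum and preferred ratios."""
    ratio = valid_cases / num_ivs
    meets_minimum = ratio >= 5
    meets_preferred = ratio >= preferred
    return ratio, meets_minimum, meets_preferred


ratio, minimum_ok, preferred_ok = ratio_check(151, 4, preferred=50)
print(f"{ratio:.2f} to 1, minimum met: {minimum_ok}, preferred met: {preferred_ok}")
# prints: 37.75 to 1, minimum met: True, preferred met: False
```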
RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 1
Slide 58
Variables Entered/Removed(a)
Model  Variables Entered                      Variables Removed  Method
1      RS HIGHEST DEGREE                      .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      RS OCCUPATIONAL PRESTIGE SCORE (1980)  .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: TOTAL FAMILY INCOME

The best subset of predictors for total family income included the independent variables: highest academic degree and occupational prestige score.
a. Dependent Variable: TOTAL FAMILY INCOME
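The Method column states the entry and removal criteria used for the stepwise selection. As a rough sketch (not SPSS's implementation), one step of that rule can be written as a function of p-values the caller already has. `stepwise_step`, the candidate names x1 to x3, and their p-values are hypothetical, and the partial F tests that would produce those p-values are not shown.

```python
# Entry and removal thresholds from the Method column of the table above.
P_ENTER = 0.05    # Probability-of-F-to-enter <= .050
P_REMOVE = 0.10   # Probability-of-F-to-remove >= .100


def stepwise_step(p_to_enter, p_to_remove):
    """Decide one step of stepwise selection from precomputed p-values.

    p_to_enter:  dict mapping each candidate not yet in the model to the
                 p-value of its F-to-enter.
    p_to_remove: dict mapping each variable already in the model to the
                 p-value of its F-to-remove.
    Returns ("remove", name), ("enter", name) or ("stop", None).
    """
    # First see whether a variable already in the model should now leave.
    if p_to_remove:
        weakest = max(p_to_remove, key=p_to_remove.get)
        if p_to_remove[weakest] >= P_REMOVE:
            return "remove", weakest

    # Otherwise enter the strongest remaining candidate, if it qualifies.
    if p_to_enter:
        strongest = min(p_to_enter, key=p_to_enter.get)
        if p_to_enter[strongest] <= P_ENTER:
            return "enter", strongest

    # No variable qualifies for entry or removal: selection ends.
    return "stop", None


# Made-up p-values for three hypothetical candidates x1, x2, x3
print(stepwise_step({"x1": 0.001, "x2": 0.03, "x3": 0.4}, {}))
# prints: ('enter', 'x1')
```

In a full run the p-values would be recomputed after every step, and the process would repeat until the function returns "stop".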
RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 2
Slide 59
The probability of the F statistic (29.146) for the regression relationship which includes these variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the best subset of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the best subset of independent variables and the dependent variable.

ANOVA(c)
                     Sum of Squares    df    Mean Square       F      Sig.
Model 1  Regression         620.049     1        620.049   47.661   .000(a)
         Residual          1938.415   149         13.009
         Total             2558.464   150
Model 2  Regression         722.947     2        361.473   29.146   .000(b)
         Residual          1835.517   148         12.402
         Total             2558.464   150
a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)
c. Dependent Variable: TOTAL FAMILY INCOME
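Each F value in the ANOVA table is the regression mean square divided by the residual mean square. The sketch below recomputes both F values from the sums of squares and degrees of freedom shown; `f_statistic` is a hypothetical helper, and the Sig. column (which needs the upper tail of the F distribution) is not computed here.

```python
def f_statistic(ss_reg, df_reg, ss_res, df_res):
    """F = (SS_regression / df_regression) / (SS_residual / df_residual)."""
    ms_reg = ss_reg / df_reg   # mean square for regression
    ms_res = ss_res / df_res   # mean square for residual (error)
    return ms_reg / ms_res


print(round(f_statistic(620.049, 1, 1938.415, 149), 3))  # model 1
print(round(f_statistic(722.947, 2, 1835.517, 148), 3))  # model 2
# prints: 47.661, then 29.146
```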
RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 3
Slide 60
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .492(a)   .242       .237                3.607
2       .532(b)   .283       .273                3.522
a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)

The Multiple R for the relationship between the subset of independent variables that best predict the dependent variable is 0.532, which would be characterized as moderate using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.
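The Model Summary values follow from the ANOVA sums of squares: R² = SS_regression / SS_total, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n cases and k predictors, and the standard error of the estimate is the square root of the residual mean square. A sketch with hypothetical helper names, which also encodes the rule of thumb for strength stated above:

```python
import math


def model_summary(ss_reg, ss_res, n, k):
    """R, R Square, Adjusted R Square and Std. Error of the Estimate
    from the ANOVA sums of squares (n cases, k predictors)."""
    ss_total = ss_reg + ss_res
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    see = math.sqrt(ss_res / (n - k - 1))   # square root of the residual mean square
    return math.sqrt(r2), r2, adj_r2, see


def strength(r):
    """Rule of thumb used on this slide for describing a correlation."""
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


# Model 2 from the ANOVA table: 151 cases, 2 predictors
r, r2, adj_r2, see = model_summary(722.947, 1835.517, n=151, k=2)
print(f"R={r:.3f} R Square={r2:.3f} Adjusted={adj_r2:.3f} SEE={see:.3f} ({strength(r)})")
# prints: R=0.532 R Square=0.283 Adjusted=0.273 SEE=3.522 (moderate)
```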
RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 4
Slide 61
Variables Entered/Removed(a)
Model  Variables Entered                      Variables Removed  Method
1      RS HIGHEST DEGREE                      .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      RS OCCUPATIONAL PRESTIGE SCORE (1980)  .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: TOTAL FAMILY INCOME

Based on the table of "Variables Entered/Removed," the most important predictor of total family income is highest academic degree.
The second most important predictor of total family income is occupational prestige score.
The importance of the predictors stated in the problem is not correct.
a. Dependent Variable: TOTAL FAMILY INCOME
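A complementary view of the order of entry is how much each step added to the model. Using the sums of squares from the ANOVA table on Slide 59, the R² change and F change for each step can be sketched as below; `step_change` is a hypothetical helper, not an SPSS output. The R² change for step 2 computed this way (about .040) differs from .283 - .242 = .041 only because of rounding in the Model Summary.

```python
def step_change(ss_reg_before, ss_reg_after, ss_res_after, df_res_after,
                ss_total, added=1):
    """R-squared change and F change when `added` variables enter the model."""
    gain = ss_reg_after - ss_reg_before          # extra regression SS from this step
    r2_change = gain / ss_total
    f_change = (gain / added) / (ss_res_after / df_res_after)
    return r2_change, f_change


TOTAL = 2558.464
# Step 1: RS HIGHEST DEGREE enters (starting from the constant-only model)
step1 = step_change(0.0, 620.049, 1938.415, 149, TOTAL)
# Step 2: RS OCCUPATIONAL PRESTIGE SCORE (1980) enters
step2 = step_change(620.049, 722.947, 1835.517, 148, TOTAL)
print(f"step 1: R2 change={step1[0]:.3f}, F change={step1[1]:.3f}")
print(f"step 2: R2 change={step2[0]:.3f}, F change={step2[1]:.3f}")
# prints: step 1: R2 change=0.242, F change=47.661
#         step 2: R2 change=0.040, F change=8.297
```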
Slide 62
• The independent and dependent variables were metric (interval or ordinal).
• The ratio of cases to independent variables was 37.75 to 1.
• The relationship of the included variables was statistically significant and the strength of the relationship was characterized correctly.
• However, the order of entry, or importance, was not stated correctly in the problem.
• The answer to the question is false.

Standard multiple regression - 1
Slide 63
The following is a guide to the decision process for answering problems about standard multiple regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
   No: Inappropriate application of a statistic
   Yes: continue
Ratio of cases to independent variables at least 5 to 1?
   No: Inappropriate application of a statistic
   Yes: continue
Probability of ANOVA test of regression less than/equal to level of significance?
   No: False
   Yes: continue
Slide 64
Strength of relationship for included variables interpreted correctly?
   No: False
   Yes: continue
Probability of relationship between each IV and DV <= level of significance?
   No: False
   Yes: continue
Direction of relationship between each IV and DV interpreted correctly?
   No: False
   Yes: continue
Slide 65
Any independent variable or dependent variable ordinal level of measurement?
   Yes: True with caution
   No: continue
Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
   No: True with caution
   Yes: True
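The standard regression guide can be expressed as a function that returns one of the four answers. This is a sketch: `standard_answer` and its argument names are hypothetical, and each boolean stands for the analyst's answer to one box of the flowchart.

```python
def standard_answer(levels_ok, n_cases, n_ivs, anova_significant,
                    strength_ok, ivs_significant, directions_ok, any_ordinal):
    """Walk the standard multiple regression decision guide (hypothetical helper)."""
    ratio = n_cases / n_ivs

    # DV metric, IVs metric or dichotomous, and at least 5 cases per IV
    if not levels_ok or ratio < 5:
        return "Inappropriate application of a statistic"

    # Remaining boxes in order: ANOVA significant, strength interpreted
    # correctly, each IV significant, direction of each IV interpreted correctly
    for passed in (anova_significant, strength_ok, ivs_significant, directions_ok):
        if not passed:
            return "False"

    # Ordinal variables or fewer than 15 cases per IV: true with caution
    if any_ordinal or ratio < 15:
        return "True with caution"
    return "True"


# Hypothetical problem: 120 cases, 2 interval IVs, every box answered "yes"
print(standard_answer(True, 120, 2, True, True, True, True, any_ordinal=False))
# prints: True
```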
Hierarchical regression - 1
Slide 66

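The following is a guide to the decision process for answering problems about hierarchical regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
   No: Inappropriate application of a statistic
   Yes: continue
Ratio of cases to independent variables at least 5 to 1?
   No: Inappropriate application of a statistic
   Yes: continue
Probability of ANOVA test of regression less than/equal to level of significance?
   No: False
   Yes: continue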
Slide 67
Probability of F test for change in R² less than or equal to level of significance?
   No: False
   Yes: continue
Change in R² correctly reported and interpreted?
   No: False
   Yes: continue
Probability of relationship between each IV added after controls and DV less than or equal to level of significance?
   No: False
   Yes: continue
Slide 68
Direction of relationship between each IV added after controls and DV interpreted correctly?
   No: False
   Yes: continue

No

Yes
True
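A similar sketch for the hierarchical guide. The last boxes of this guide are not legible in this copy, so the final caution step here is an ASSUMPTION that mirrors the standard guide (ordinal variables, plus a preferred ratio passed in as `preferred_ratio`). `hierarchical_answer` and its argument names are hypothetical.

```python
def hierarchical_answer(levels_ok, n_cases, n_ivs, anova_significant,
                        r2_change_significant, r2_change_ok,
                        added_ivs_significant, directions_ok,
                        any_ordinal, preferred_ratio):
    """Walk the hierarchical regression decision guide (hypothetical helper).

    n_ivs is assumed to count all independent variables, controls included.
    """
    ratio = n_cases / n_ivs

    if not levels_ok or ratio < 5:
        return "Inappropriate application of a statistic"

    for passed in (
        anova_significant,       # ANOVA test of the full regression
        r2_change_significant,   # F test for change in R-squared
        r2_change_ok,            # change in R-squared reported and interpreted correctly
        added_ivs_significant,   # each IV added after the controls is significant
        directions_ok,           # direction of each added IV interpreted correctly
    ):
        if not passed:
            return "False"

    # ASSUMPTION: the final boxes are illegible in this copy; this mirrors the
    # standard guide's caution checks, with the preferred ratio supplied by the caller.
    if any_ordinal or ratio < preferred_ratio:
        return "True with caution"
    return "True"
```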
Stepwise regression - 1
Slide 69

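The following is a guide to the decision process for answering problems about stepwise regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
   No: Inappropriate application of a statistic
   Yes: continue
Ratio of cases to independent variables at least 5 to 1?
   No: Inappropriate application of a statistic
   Yes: continue
Is the list of independent variables selected for inclusion correct?
   No: False
   Yes: continue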
Slide 70
Probability of ANOVA test of regression less than/equal to level of significance?
   No: False
   Yes: continue
Strength of relationship for included variables interpreted correctly?
   No: False
   Yes: continue
Is the stated order of importance of independent variables correct?
   No: False
   Yes: continue
Slide 71
Probability of relationship between each included IV and DV less than or equal to level of significance?
   No: False
   Yes: continue
Direction of relationship between each included IV and DV interpreted correctly?
   No: False
   Yes: continue
Slide 72
Yes

No

Yes
True
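The stepwise guide, sketched the same way and applied to problem 26. The final boxes of the guide are not legible in this copy; the caution step here uses the ordinal-variable caution from Slide 56 and the 50 to 1 preferred ratio from Slide 57. `stepwise_answer` and its argument names are hypothetical.

```python
def stepwise_answer(levels_ok, n_cases, n_ivs, selection_ok, anova_significant,
                    strength_ok, order_ok, ivs_significant, directions_ok,
                    any_ordinal, preferred_ratio=50):
    """Walk the stepwise regression decision guide (hypothetical helper)."""
    ratio = n_cases / n_ivs

    # Level of measurement and the minimum 5 to 1 ratio
    if not levels_ok or ratio < 5:
        return "Inappropriate application of a statistic"

    # Remaining boxes, in the order they appear in the guide
    checks = (
        selection_ok,        # list of IVs selected for inclusion correct?
        anova_significant,   # ANOVA test of regression significant?
        strength_ok,         # strength interpreted correctly?
        order_ok,            # stated order of importance correct?
        ivs_significant,     # each included IV significantly related to DV?
        directions_ok,       # direction of each relationship interpreted correctly?
    )
    if not all(checks):
        return "False"

    # Caution step (assumed from Slides 56-57): ordinal variables or a
    # ratio below the preferred 50 to 1
    if any_ordinal or ratio < preferred_ratio:
        return "True with caution"
    return "True"


# Problem 26: the stated order of importance was wrong. The answers after the
# failing box (ivs_significant, directions_ok) do not change the result.
print(stepwise_answer(levels_ok=True, n_cases=151, n_ivs=4, selection_ok=True,
                      anova_significant=True, strength_ok=True, order_ok=False,
                      ivs_significant=True, directions_ok=True, any_ordinal=True))
# prints: False
```

Because the stated order of importance is wrong, the sketch returns "False", matching the answer on Slide 62.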
