
SW388R7 Data Analysis & Computers II
Multiple Regression – Basic Relationships
Slide 1

Purpose of multiple regression

Different types of multiple regression

Standard multiple regression

Hierarchical multiple regression

Stepwise multiple regression

Steps in solving regression problems


Purpose of multiple regression
Slide 2

• The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable.

• If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable.
Types of multiple regression
Slide 3

• There are three types of multiple regression, each of which is designed to answer a different question:
  • Standard multiple regression is used to evaluate the relationships between a set of independent variables and a dependent variable.
  • Hierarchical, or sequential, regression is used to examine the relationships between a set of independent variables and a dependent variable, after controlling for the effects of some other independent variables on the dependent variable.
  • Stepwise, or statistical, regression is used to identify the subset of independent variables that has the strongest relationship to a dependent variable.
Standard multiple regression
Slide 4

• In standard multiple regression, all of the independent variables are entered into the regression equation at the same time.
• Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample.
• A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.
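
The statistics named on this slide can be written out as below. This is the usual textbook formulation, not something taken from the course materials; the symbols n (valid cases), k (independent variables), SS (sums of squares), b_j (slope for variable j) and SE(b_j) (its standard error) are notation introduced here for illustration.

```latex
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}, \qquad R = \sqrt{R^2}

F = \frac{SS_{\text{regression}} / k}{SS_{\text{residual}} / (n - k - 1)},
\qquad df = (k,\; n - k - 1)

t_j = \frac{b_j}{SE(b_j)}, \qquad df = n - k - 1
```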
Hierarchical multiple regression
Slide 5

• In hierarchical multiple regression, the independent variables are entered in two stages.
• In the first stage, the independent variables that we want to control for are entered into the regression. In the second stage, we enter the independent variables whose relationship we want to examine after the controls.
• A statistical test of the change in R² from the first stage is used to evaluate the importance of the variables entered in the second stage.
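
The test of the change in R² is commonly written as the F ratio below. The notation is assumed here, not taken from the slides: R²₁ and R²₂ are the R² values after the first and second stages, m is the number of variables added in the second stage, k₂ is the total number of independent variables in the second-stage model, and n is the number of valid cases.

```latex
F_{\text{change}} =
  \frac{(R^2_{2} - R^2_{1}) / m}{(1 - R^2_{2}) / (n - k_{2} - 1)},
\qquad df = (m,\; n - k_{2} - 1)
```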
Stepwise multiple regression
Slide 6

• Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.
• Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables.
• When none of the possible additions can make a statistically significant improvement in R², the analysis stops.
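
The add-one-variable-at-a-time logic described above can be sketched as follows. This is a simplified forward-selection illustration, not SPSS's actual stepwise procedure: the function name, the `r2_of` callback, the fixed `f_critical` cutoff (strictly, the critical value depends on the degrees of freedom) and the toy R² values are all assumptions made for the example.

```python
def forward_select(candidates, r2_of, n_cases, f_critical):
    """Add variables one at a time while the gain in R-squared stays significant.

    candidates : list of variable names (hypothetical)
    r2_of      : callable returning R-squared for a list of variable names
    n_cases    : number of valid cases
    f_critical : F value a step must reach to count as significant
                 (held constant here as a simplification)
    """
    selected = []
    remaining = list(candidates)
    r2_current = 0.0
    while remaining:
        # Candidate whose addition gives the largest R-squared.
        best = max(remaining, key=lambda v: r2_of(selected + [v]))
        r2_new = r2_of(selected + [best])
        df2 = n_cases - (len(selected) + 1) - 1
        if df2 <= 0 or r2_new >= 1.0:
            break  # no residual degrees of freedom or variance left to test
        # F test for the change in R-squared from adding one variable (df1 = 1).
        f_change = (r2_new - r2_current) / ((1.0 - r2_new) / df2)
        if f_change < f_critical:
            break  # no addition makes a significant improvement
        selected.append(best)
        remaining.remove(best)
        r2_current = r2_new
    return selected


# Made-up R-squared values for three hypothetical predictors a, b and c.
TOY_R2 = {
    frozenset("a"): 0.30, frozenset("b"): 0.20, frozenset("c"): 0.05,
    frozenset("ab"): 0.45, frozenset("ac"): 0.31, frozenset("abc"): 0.455,
}

chosen = forward_select(["a", "b", "c"], lambda vs: TOY_R2[frozenset(vs)],
                        n_cases=100, f_critical=4.0)
print(chosen)
```

With these toy values the third predictor adds too little R² to pass the cutoff, so the selection stops after two variables.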
Problem 1 - standard multiple regression
Slide 7

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

The variables "strength of affiliation" [reliten] and "frequency of prayer" [pray] have a strong relationship to the variable "frequency of attendance at religious services" [attend].

Survey respondents who were less strongly affiliated with their religion attended religious services less often. Survey respondents who prayed less often attended religious services less often.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
Slide 8

When a problem states that there is a relationship between some independent variables and a dependent variable, we do standard multiple regression.

The variables listed first in the problem statement are the independent variables (ivs): "strength of affiliation" [reliten] and "frequency of prayer" [pray].

The variable that they are related to is the dependent variable (dv): "frequency of attendance at religious services" [attend].
Dissecting problem 1 - 2
Slide 9

In order for a problem to be true, we will have to find:
• a statistically significant relationship between the ivs and the dv
• a relationship of the correct strength

The relationship of each of the independent variables to the dependent variable must be statistically significant and interpreted correctly.
Request a standard multiple regression
Slide 10

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Specify the variables and selection method
Slide 11

First, move the dependent variable attend to the Dependent text box.

Second, move the independent variables reliten and pray to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression.

Fourth, click on the Statistics… button to specify the statistics options that we want.
Specify the statistics output options
Slide 12

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.
Request the regression output
Slide 13

Click on the OK button to request the regression output.
LEVEL OF MEASUREMENT
Slide 14

Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "Frequency of attendance at religious services" [attend] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Strength of affiliation" [reliten] and "frequency of prayer" [pray] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
SAMPLE SIZE
Slide 15

Descriptive Statistics

                                          Mean    Std. Deviation   N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES    3.15    2.653            113
STRENGTH OF AFFILIATION                   2.12    1.084            113
HOW OFTEN DOES R PRAY                     2.90    1.575            113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement.

In addition, the ratio of 56.5 to 1 satisfies the preferred ratio of 15 to 1.
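
The sample size check above is simple arithmetic, sketched here for reference. The constant and function names are made up for illustration; the thresholds (5 to 1 and 15 to 1) and the counts (113 cases, 2 independent variables) come from the slide.

```python
MINIMUM_RATIO = 5.0     # minimum valid cases per independent variable
PREFERRED_RATIO = 15.0  # preferred valid cases per independent variable


def case_ratio(n_valid_cases, n_independent_vars):
    """Ratio of valid cases to independent variables."""
    return n_valid_cases / n_independent_vars


ratio = case_ratio(113, 2)
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO
print(ratio, meets_minimum, meets_preferred)
```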
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 1
Slide 16

The probability of the F statistic (49.824) for the overall regression relationship is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.

ANOVA (b)

Model             Sum of Squares   df    Mean Square   F        Sig.
1   Regression    374.757          2     187.379       49.824   .000 (a)
    Residual      413.685          110   3.761
    Total         788.442          112
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
b. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
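
As a check on how the F statistic in the ANOVA table arises, the sketch below recomputes it from the sums of squares and degrees of freedom shown in the table. The variable names are illustrative; small differences from the printed values are expected because the table entries are rounded.

```python
# Values copied from the ANOVA table above.
ss_regression, df_regression = 374.757, 2
ss_residual, df_residual = 413.685, 110

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the regression mean square to the residual mean square.
f_statistic = ms_regression / ms_residual
print(ms_regression, ms_residual, f_statistic)
```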
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 2
Slide 17

The Multiple R for the relationship between the set of independent variables and the dependent variable is 0.689, which would be characterized as strong using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .689 (a)   .475       .466                1.939
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
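
The Model Summary figures can be reproduced from the ANOVA sums of squares on the previous slide, and the rule of thumb can be written as a small lookup. This is a sketch under the usual textbook definitions of R², adjusted R² and the standard error of the estimate; the function and variable names are assumptions made for the example.

```python
import math

# Sums of squares from the ANOVA table; 113 valid cases, 2 independent variables.
ss_regression, ss_residual, ss_total = 374.757, 413.685, 788.442
n_cases, n_ivs = 113, 2
df_residual = n_cases - n_ivs - 1

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n_cases - 1) / df_residual
std_error_estimate = math.sqrt(ss_residual / df_residual)


def strength_label(r):
    """Apply the slide's rule of thumb to the size of a correlation."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


print(multiple_r, r_square, adjusted_r_square, std_error_estimate)
print(strength_label(multiple_r))
```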
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
Slide 18

For the independent variable strength of affiliation, the probability of the t statistic (-5.857) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with strength of affiliation is equal to zero (b = 0) and conclude that there is a statistically significant relationship between strength of affiliation and frequency of attendance at religious services.

Coefficients (a)

                              Unstandardized        Standardized
                              Coefficients          Coefficients
Model                         B        Std. Error   Beta           t        Sig.
1   (Constant)                7.167    .442                        16.206   .000
    STRENGTH OF AFFILIATION   -1.138   .194         -.465          -5.857   .000
    HOW OFTEN DOES R PRAY     -.554    .134         -.329          -4.145   .000
a. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
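
The columns of the Coefficients table are related by two standard identities: t is the b coefficient divided by its standard error, and Beta is b rescaled by the ratio of the standard deviations of the independent and dependent variables. The sketch below checks both, using the standard deviations from the Descriptive Statistics table on the SAMPLE SIZE slide. The dictionary layout and function names are illustrative, and the results only match the printed values approximately because the table entries are rounded.

```python
# b, standard error, and standard deviation for each independent variable.
coefficients = {
    "reliten": {"b": -1.138, "se": 0.194, "sd": 1.084},
    "pray": {"b": -0.554, "se": 0.134, "sd": 1.575},
}
sd_attend = 2.653  # standard deviation of the dependent variable


def t_ratio(b, se):
    """t statistic for the null hypothesis b = 0."""
    return b / se


def standardized_beta(b, sd_x, sd_y):
    """Standardized coefficient from the unstandardized slope."""
    return b * sd_x / sd_y


results = {
    name: (t_ratio(c["b"], c["se"]),
           standardized_beta(c["b"], c["sd"], sd_attend))
    for name, c in coefficients.items()
}
for name, (t, beta) in results.items():
    print(name, t, beta)
```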
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2
Slide 19

The b coefficient associated with strength of affiliation (-1.138) is negative, indicating an inverse relationship in which higher numeric values for strength of affiliation are associated with lower numeric values for frequency of attendance at religious services.

Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables the numeric codes can be associated with labels in ascending or descending order.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3
Slide 20

The independent variable strength of affiliation is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less strongly affiliated with their religion.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4
Slide 21

The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who were less strongly affiliated with their religion attended religious services less often.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5
Slide 22

For the independent variable frequency of prayer, the probability of the t statistic (-4.145) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with frequency of prayer is equal to zero (b = 0) and conclude that there is a statistically significant relationship between frequency of prayer and frequency of attendance at religious services.

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 6
Slide 23

The b coefficient associated with frequency of prayer (-0.554) is negative, indicating an inverse relationship in which higher numeric values for frequency of prayer are associated with lower numeric values for frequency of attendance at religious services.

Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables the numeric codes can be associated with labels in ascending or descending order.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 7
Slide 24

The independent variable frequency of prayer is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who prayed less often.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 8
Slide 25

The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who prayed less often attended religious services less often.
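
The reasoning used on the last several slides (sign of b combined with the coding direction of each ordinal variable) can be captured in a small helper. This is an illustrative sketch, not part of the course materials: the function name and its boolean arguments are assumptions, where "high means more" says whether higher numeric codes stand for more of the concept being measured.

```python
def same_direction(b, iv_high_means_more, dv_high_means_more):
    """Return True if more of the iv concept goes with more of the dv concept
    (equivalently, less goes with less), given the sign of b and the coding."""
    if b == 0:
        raise ValueError("b = 0 indicates no relationship to interpret")
    numeric_positive = b > 0
    codes_aligned = iv_high_means_more == dv_high_means_more
    # A positive slope on aligned codes, or a negative slope on
    # opposite codes, both mean the concepts move together.
    return numeric_positive == codes_aligned


# Higher reliten and pray codes mean LESS affiliation and LESS prayer;
# higher attend codes mean MORE attendance.
affiliation_with_attendance = same_direction(-1.138, False, True)
prayer_with_attendance = same_direction(-0.554, False, True)
print(affiliation_with_attendance, prayer_with_attendance)
```

Both calls return True, which corresponds to the statements that less strongly affiliated respondents, and respondents who prayed less often, attended services less often.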
Answer to problem 1
Slide 26

• The independent and dependent variables were metric (ordinal).
• The ratio of cases to independent variables was 56.5 to 1.
• The overall relationship was statistically significant and its strength was characterized correctly.
• The b coefficients for both independent variables were statistically significant and the directions of the relationships were characterized correctly.

• The answer to the question is true with caution. The caution is added because of the ordinal variables.
Problem 2 – hierarchical regression
Slide 27

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

After controlling for the effects of the variables "age" [age] and "sex" [sex], the addition of the variables "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] reduces the error in predicting "general happiness" [happy] by 36.1%.

After controlling for age and sex, the variables happiness of marriage, condition of health, and attitude toward life each make an individual contribution to reducing the error in predicting general happiness. Survey respondents who were less happy with their marriages were less happy overall. Survey respondents who said they were not as healthy were less happy overall. Survey respondents who felt life was less exciting were less happy overall.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 2 - 1
Slide 28

The variables listed first in the problem statement are the independent variables (ivs) whose effect we want to control before we test for the relationship: "age" [age] and "sex" [sex].

The variables that we add in after the control variables are the independent variables that we think will have a statistical relationship to the dependent variable: "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life].

The variable that is to be predicted or related to is the dependent variable (dv): "general happiness" [happy].
Dissecting problem 2 - 2
Slide 29

In order for a problem to be true, the relationship between the added variables and the dependent variable must be statistically significant, and the strength of the relationship after including the control variables must be correctly stated.

We are generally not interested in whether or not the control variables have a statistically significant relationship to the dependent variable.

The relationship between each of the independent variables entered after the control variables and the dependent variable must be statistically significant and interpreted correctly.
Request a hierarchical multiple regression
Slide 30

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Specify independent variables to control for
Slide 31

First, move the dependent variable happy to the Dependent text box.

Second, move the independent variables to control for, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Add the other independent variables
Slide 32

SPSS identifies that we will now be adding variables to a second block.

First, move the other independent variables hapmar, health and life to the Independent(s) list box for block 2.

Second, click on the Statistics… button to specify the statistics options that we want.
Specify the statistics output options
Slide 33

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, click on the Continue button to close the dialog box.
Request the regression output
Slide 34

Click on the OK button to request the regression output.
LEVEL OF MEASUREMENT
Slide 35

Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "General happiness" [happy] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is an interval level variable, which satisfies the level of measurement requirements for multiple regression analysis.

"Happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in multiple regression analysis.
SAMPLE SIZE
Slide 36

Descriptive Statistics

                            Mean     Std. Deviation   N
GENERAL HAPPINESS           1.63     .626             90
AGE OF RESPONDENT           45.50    15.221           90
RESPONDENTS SEX             1.61     .490             90
HAPPINESS OF MARRIAGE       1.42     .540             90
CONDITION OF HEALTH         1.80     .810             90
IS LIFE EXCITING OR DULL    1.49     .525             90

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 90 valid cases and 5 independent variables, the ratio for this analysis is 18.0 to 1, which satisfies the minimum requirement.

In addition, the ratio of 18.0 to 1 satisfies the preferred ratio of 15 to 1.
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES
Slide 37

ANOVA (c)

Model             Sum of Squares   df   Mean Square   F       Sig.
1   Regression    .006             2    .003          .007    .993 (a)
    Residual      34.894           87   .401
    Total         34.900           89
2   Regression    12.601           5    2.520         9.493   .000 (b)
    Residual      22.299           84   .265
    Total         34.900           89
a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH
c. Dependent Variable: GENERAL HAPPINESS

The probability of the F statistic (9.493) for the overall regression relationship for all independent variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of all independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of all independent variables and the dependent variable.
REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 1
Slide 38

Model Summary

                                                          Change Statistics
                              Adjusted   Std. Error of    R Square                           Sig. F
Model   R          R Square   R Square   the Estimate     Change     F Change   df1   df2    Change
1       .013 (a)   .000       -.023      .633             .000       .007       2     87     .993
2       .601 (b)   .361       .323       .515             .361       15.814     3     84     .000
a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH

The R Square Change statistic for the increase in R² associated with the added variables (happiness of marriage, condition of health, and attitude toward life) is 0.361. Using a proportional reduction in error interpretation for R², information provided by the added variables reduces our error in predicting general happiness by 36.1%.
REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 2
Slide 39

The probability of the F statistic (15.814) for the change in R² associated with the addition of the predictor variables to the regression analysis containing the control variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no improvement in the relationship between the set of independent variables and the dependent variable when the predictors are added (R² Change = 0).

We support the research hypothesis that there is a statistically significant improvement in the relationship between the set of independent variables and the dependent variable.
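
The R Square Change and F Change values can be recomputed from the sums of squares in the ANOVA table for this problem. The sketch below uses the standard F test for a change in R² between nested models; the variable names are illustrative, and the result differs from the printed 15.814 only through rounding of the table entries.

```python
# Sums of squares from the ANOVA table (models 1 and 2), 90 valid cases.
ss_total = 34.900
ss_regression_1, k_1 = 0.006, 2    # controls only: age, sex
ss_regression_2, k_2 = 12.601, 5   # controls plus hapmar, health, life
n_cases = 90

r2_1 = ss_regression_1 / ss_total
r2_2 = ss_regression_2 / ss_total
r2_change = r2_2 - r2_1

df1 = k_2 - k_1            # number of variables added in the second block
df2 = n_cases - k_2 - 1    # residual degrees of freedom of the full model

# F test for the increase in R-squared from adding the second block.
f_change = (r2_change / df1) / ((1 - r2_2) / df2)
print(r2_change, df1, df2, f_change)
```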
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
Slide 40

Coefficients (a)

                                Unstandardized        Standardized
                                Coefficients          Coefficients
Model                           B        Std. Error   Beta           t        Sig.
1   (Constant)                  1.594    .341                        4.677    .000
    AGE OF RESPONDENT           .000     .005         .012           .107     .915
    RESPONDENTS SEX             .011     .140         .008           .078     .938
2   (Constant)                  .432     .341                        1.268    .208
    AGE OF RESPONDENT           -.001    .004         -.035          -.385    .701
    RESPONDENTS SEX             -.013    .115         -.010          -.113    .911
    HAPPINESS OF MARRIAGE       .599     .104         .517           5.741    .000
    CONDITION OF HEALTH         .101     .072         .131           1.408    .163
    IS LIFE EXCITING OR DULL    .170     .108         .142           1.570    .120
a. Dependent Variable: GENERAL HAPPINESS

If there is a relationship between each added individual independent variable and the dependent variable, the probability of the statistical test of the b coefficient (slope of the regression line) will be less than or equal to the level of significance. The null hypothesis for this test states that b is equal to zero, indicating a flat regression line and no relationship.

If we reject the null hypothesis and find that there is a relationship between the variables, the sign of the b coefficient indicates the direction of the relationship for the data values. If b is greater than zero, the relationship is positive or direct. If b is less than zero, the relationship is negative or inverse. If the variable is dichotomous or ordinal, the direction of the coding must be taken into account to make a correct interpretation.
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2
Slide 41

For the independent variable happiness of marriage, the probability of the t statistic (5.741) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05.

We reject the null hypothesis that the slope associated with happiness of marriage is equal to zero (b = 0) and conclude that there is a statistically significant relationship between happiness of marriage and general happiness.
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3
Slide 42

The b coefficient associated with happiness of marriage (0.599) is positive, indicating a direct relationship in which higher numeric values for happiness of marriage are associated with higher numeric values for general happiness.
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4
Slide 43

The independent variable happiness of marriage is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less happy with their marriages.
RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5
Slide 44

The dependent variable general happiness is also an ordinal variable. It is coded so that higher numeric values are associated with survey respondents who were less happy overall.

Therefore, the positive value of b implies that survey respondents who were less happy with their marriages were less happy overall.
SW388R7
Data Analysis &
Computers II RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 6
Slide 45

Coefficients (a)

                                 Unstandardized       Standardized
                                  Coefficients        Coefficients
Model                             B     Std. Error        Beta          t      Sig.
1  (Constant)                   1.594      .341                       4.677    .000
   AGE OF RESPONDENT             .000      .005           .012         .107    .915
   RESPONDENTS SEX               .011      .140           .008         .078    .938
2  (Constant)                    .432      .341                       1.268    .208
   AGE OF RESPONDENT            -.001      .004          -.035        -.385    .701
   RESPONDENTS SEX              -.013      .115          -.010        -.113    .911
   HAPPINESS OF MARRIAGE         .599      .104           .517        5.741    .000
   CONDITION OF HEALTH           .101      .072           .131        1.408    .163
   IS LIFE EXCITING OR DULL      .170      .108           .142        1.570    .120
a. Dependent Variable: GENERAL HAPPINESS

For the independent variable condition of health, the probability of


the t statistic (1.408) for the b coefficient is 0.163 which is greater
than the level of significance of 0.05. We fail to reject the null
hypothesis that the slope associated with condition of health is
equal to zero (b = 0) and conclude that there is not a statistically
significant relationship between condition of health and general
happiness. The statement in the problem that "survey respondents
who said they were not as healthy were less happy overall" is
incorrect.
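
The t statistics in the Coefficients table can be approximately reproduced by dividing each b coefficient by its standard error. The sketch below uses the rounded values printed in the table, so its results differ slightly from the SPSS output; the function names are illustrative and are not part of SPSS.

```python
# Rounded values copied from model 2 of the Coefficients table.
# SPSS prints Sig. = .000 when the probability is below 0.001.
coefficients = {
    "happiness of marriage": {"b": 0.599, "se": 0.104, "sig": 0.000},
    "condition of health": {"b": 0.101, "se": 0.072, "sig": 0.163},
    "is life exciting or dull": {"b": 0.170, "se": 0.108, "sig": 0.120},
}
ALPHA = 0.05  # level of significance used in the problem


def t_statistic(b, se):
    """t = b / standard error of b (approximate when inputs are rounded)."""
    return b / se


def is_significant(sig, alpha=ALPHA):
    """Reject the null hypothesis b = 0 when Sig. <= alpha."""
    return sig <= alpha


for name, row in coefficients.items():
    t = t_statistic(row["b"], row["se"])
    decision = "reject" if is_significant(row["sig"]) else "fail to reject"
    print(f"{name}: t is about {t:.3f}, Sig. = {row['sig']:.3f}, {decision} b = 0")
```

Only happiness of marriage falls at or below the 0.05 level, which matches the interpretation given on the slides.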
SW388R7
Data Analysis &
Computers II Answer to problem 2
Slide 46

 The independent and dependent variables were metric or


dichotomous. Some are ordinal.
 The ratio of cases to independent variables was 18.0 to 1.
 The overall relationship was statistically significant and its
strength was characterized correctly.
 The change in R2 associated with adding the second block of
variables was statistically significant and correctly interpreted.
 The b coefficient for happiness of marriage was statistically
significant and correctly interpreted. The b coefficient for
condition of health was not statistically significant. We cannot
conclude that there was a relationship between condition of
health and general happiness.

 The answer to the question is false.


SW388R7
Data Analysis &
Computers II Problem 3 – Stepwise Regression
Slide 47

26. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, violation of
assumptions, or outliers, and that the split sample validation will confirm the
generalizability of the results. Use a level of significance of 0.05.

From the list of variables "number of hours worked in the past week" [hrs1], "occupational
prestige score" [prestg80], "highest year of school completed" [educ], and "highest academic
degree" [degree], the best predictors of "total family income" [income98] are "highest
academic degree" [degree] and "occupational prestige score" [prestg80]. Highest academic
degree and occupational prestige score have a moderate relationship to total family
income.

The most important predictor of total family income is occupational prestige score. The
second most important predictor of total family income is highest academic degree.

Survey respondents who had higher academic degrees had higher total family incomes.
Survey respondents who had more prestigious occupations had higher total family incomes.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
SW388R7
Data Analysis &
Computers II Dissecting problem 3 - 1
Slide 48

The variables listed first in the problem statement are the
independent variables from which the computer will select the best
subset using statistical criteria.

The variable to be predicted or related to is the dependent
variable (dv): "total family income" [income98].

The best predictors are the variables that will meet the statistical
criteria for inclusion in the model.
SW388R7
Data Analysis &
Computers II Dissecting problem 3 - 2
Slide 49

In order for a problem to be true, we will have to find:
•a statistically significant relationship between the included ivs and the dv
•a relationship of the correct strength

The importance of the variables is provided by the stepwise order of
entry of the variables into the regression analysis.
SW388R7
Data Analysis &
Computers II Dissecting problem 3 - 3
Slide 50

The relationship between each of the independent variables included
in the model and the dependent variable must be statistically
significant and interpreted correctly.

Since statistical significance of a variable's contribution toward
explaining the variance in the dependent variable is almost always
used as the criterion for inclusion, the statistical significance of
the relationships is usually assured.
SW388R7
Data Analysis &
Computers II Request a stepwise multiple regression
Slide 51

To compute a multiple
regression in SPSS, select
the Regression | Linear
command from the Analyze
menu.
SW388R7
Data Analysis &
Computers II Specify variables and method for selecting variables
Slide 52

First, move the


dependent variable
income98 to the
Dependent text box.

Second, move the independent variables hrs1, prestg80, educ, and
degree to the Independent(s) list box.

Third, select the Stepwise


method for entering the
variables into the analysis
from the drop down Method
menu.
SW388R7
Data Analysis &
Computers II Open statistics options dialog box
Slide 53

First, click on the


Statistics… button to
specify the statistics
options that we want.
SW388R7
Data Analysis &
Computers II Specify the statistics output options
Slide 54

First, mark the checkboxes for Estimates on the Regression
Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.
SW388R7
Data Analysis &
Computers II Request the regression output
Slide 55

Click on the OK
button to
request the
regression
output.
SW388R7
Data Analysis &
Computers II LEVEL OF MEASUREMENT
Slide 56

Multiple regression requires that the dependent variable be metric


and the independent variables be metric or dichotomous. "Total
family income" [income98] is an ordinal level variable, which
satisfies the level of measurement requirement if we follow the
convention of treating ordinal level variables as metric variables.
Since some data analysts do not agree with this convention, a note
of caution should be included in our interpretation.

"Number of hours worked in the past week" [hrs1], "occupational
prestige score" [prestg80], and "highest year of school completed"
[educ] are interval level variables, which satisfy the level of
measurement requirements for multiple regression analysis.

"Highest academic degree" [degree] is an ordinal level variable. If we


follow the convention of treating ordinal level variables as metric
variables, the level of measurement requirement for multiple
regression analysis is satisfied. Since some data analysts do not agree
with this convention, a note of caution should be included in our
interpretation.
SW388R7
Data Analysis &
Computers II SAMPLE SIZE
Slide 57

Descriptive Statistics

                                          Mean    Std. Deviation     N
TOTAL FAMILY INCOME                      17.06         4.130        151
NUMBER OF HOURS WORKED LAST WEEK         41.45        12.076        151
RS OCCUPATIONAL PRESTIGE SCORE (1980)    45.64        14.183        151
HIGHEST YEAR OF SCHOOL COMPLETED         14.00         2.587        151
RS HIGHEST DEGREE                         1.74         1.159        151

The minimum ratio of valid cases to independent


variables for stepwise multiple regression is 5 to 1.
With 151 valid cases and 4 independent variables, the
ratio for this analysis is 37.75 to 1, which satisfies the
minimum requirement.

However, the ratio of 37.75 to 1 does not satisfy the


preferred ratio of 50 to 1. A caution should be added
to the interpretation of the analysis and a split sample
validation should be conducted.
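
The sample size check is simple arithmetic and can be sketched as follows. The function names are illustrative; the default thresholds are the 5 to 1 minimum and 50 to 1 preferred ratios stated above for stepwise regression.

```python
def cases_per_variable(valid_cases, independent_variables):
    """Ratio of valid cases to independent variables."""
    return valid_cases / independent_variables


def sample_size_check(valid_cases, independent_variables,
                      minimum=5, preferred=50):
    """Compare the ratio against the minimum and preferred thresholds."""
    ratio = cases_per_variable(valid_cases, independent_variables)
    return {
        "ratio": ratio,
        "meets_minimum": ratio >= minimum,
        "meets_preferred": ratio >= preferred,
    }


# 151 valid cases and 4 candidate independent variables, as in this problem.
result = sample_size_check(151, 4)
print(result)
```

For standard or hierarchical regression, the same function could be called with `preferred=15`, the preferred ratio used in those decision guides.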
SW388R7
Data Analysis &
Computers II RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 1
Slide 58

Variables Entered/Removed (a)

Model   Variables Entered                        Variables Removed   Method
1       RS HIGHEST DEGREE                        .                   Stepwise
2       RS OCCUPATIONAL PRESTIGE SCORE (1980)    .                   Stepwise
Stepwise criteria for both models: Probability-of-F-to-enter <= .050,
Probability-of-F-to-remove >= .100.
a. Dependent Variable: TOTAL FAMILY INCOME

The best subset of predictors for total family income included the
independent variables: highest academic degree and occupational
prestige score.
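
The entry and removal criteria in the Method column can be read as a decision rule applied at each step. The following is a minimal sketch of that rule only, not of SPSS's implementation: it assumes the probabilities of F have already been computed for the current step (in a real analysis they are recomputed every time the model changes), and the probabilities in the usage lines are hypothetical values chosen for illustration.

```python
P_TO_ENTER = 0.05   # Probability-of-F-to-enter <= .050
P_TO_REMOVE = 0.10  # Probability-of-F-to-remove >= .100


def stepwise_step(p_in_model, p_candidates):
    """Decide the next action from the current probabilities of F.

    p_in_model:   {variable: probability} for variables already entered
    p_candidates: {variable: probability} for variables not yet entered
    Returns ('remove', name), ('enter', name), or ('stop', None).
    """
    if p_in_model:
        worst = max(p_in_model, key=p_in_model.get)
        if p_in_model[worst] >= P_TO_REMOVE:
            return ("remove", worst)
    if p_candidates:
        best = min(p_candidates, key=p_candidates.get)
        if p_candidates[best] <= P_TO_ENTER:
            return ("enter", best)
    return ("stop", None)


# Hypothetical probabilities, for illustration only.
first = stepwise_step({}, {"degree": 0.001, "hrs1": 0.30})
last = stepwise_step({"degree": 0.001}, {"hrs1": 0.30})
print(first, last)
```

Because the variable with the smallest probability enters first, the order of entry is what the slides use as the order of importance.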
SW388R7
Data Analysis &
Computers II RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 2
Slide 59

The probability of the F statistic (29.146) for the


regression relationship which includes these variables is
<0.001, less than or equal to the level of significance of
0.05. We reject the null hypothesis that there is no
relationship between the best subset of independent
variables and the dependent variable (R² = 0). We support
the research hypothesis that there is a statistically
significant relationship between the best subset of
independent variables and the dependent variable.

ANOVA (c)

Model              Sum of Squares     df    Mean Square       F       Sig.
1  Regression           620.049        1       620.049      47.661    .000 (a)
   Residual            1938.415      149        13.009
   Total               2558.464      150
2  Regression           722.947        2       361.473      29.146    .000 (b)
   Residual            1835.517      148        12.402
   Total               2558.464      150
a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)
c. Dependent Variable: TOTAL FAMILY INCOME
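
The F statistic follows directly from the sums of squares and degrees of freedom in the ANOVA table, as do R², adjusted R², and the standard error of the estimate that SPSS reports in its Model Summary. The sketch below recomputes them for model 2 using the standard formulas; the function name and dictionary keys are illustrative.

```python
import math


def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute regression summary statistics from ANOVA table entries."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    return {
        "f": ms_regression / ms_residual,
        "r_squared": r_squared,
        "multiple_r": math.sqrt(r_squared),
        # Adjusted R-squared penalizes for the number of predictors.
        "adjusted_r_squared": 1 - (1 - r_squared) * df_total / df_residual,
        # Standard error of the estimate is the root mean square residual.
        "std_error_of_estimate": math.sqrt(ms_residual),
    }


# Model 2: RS HIGHEST DEGREE and RS OCCUPATIONAL PRESTIGE SCORE (1980).
model_2 = anova_summary(722.947, 1835.517, 2, 148)
for key, value in model_2.items():
    print(f"{key}: {value:.3f}")
```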
SW388R7
Data Analysis &
Computers II RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 3
Slide 60

Model Summary

Model      R         R Square    Adjusted R Square    Std. Error of the Estimate
1        .492 (a)      .242            .237                    3.607
2        .532 (b)      .283            .273                    3.522
a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)

The Multiple R for the relationship between the subset of
independent variables that best predict the dependent variable
is 0.532, which would be characterized as moderate using the
rule of thumb that a correlation less than or equal to 0.20 is
characterized as very weak; greater than 0.20 and less than
or equal to 0.40 is weak; greater than 0.40 and less than or
equal to 0.60 is moderate; greater than 0.60 and less than or
equal to 0.80 is strong; and greater than 0.80 is very strong.
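
The rule of thumb above maps directly onto a small lookup function. This is a sketch of that rule as stated on the slide; the function name is illustrative.

```python
def describe_strength(r):
    """Characterize a correlation (Multiple R) using the rule of thumb."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


# Multiple R for the two-predictor stepwise model.
print(describe_strength(0.532))
```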
SW388R7
Data Analysis &
Computers II RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 4
Slide 61

Variables Entered/Removed (a)

Model   Variables Entered                        Variables Removed   Method
1       RS HIGHEST DEGREE                        .                   Stepwise
2       RS OCCUPATIONAL PRESTIGE SCORE (1980)    .                   Stepwise
Stepwise criteria for both models: Probability-of-F-to-enter <= .050,
Probability-of-F-to-remove >= .100.
a. Dependent Variable: TOTAL FAMILY INCOME

Based on the table of "Variables Entered/Removed," the most
important predictor of total family income is highest academic degree.

The second most important predictor of total family income is
occupational prestige score.

The importance of the predictors stated in the problem is not correct.
SW388R7
Data Analysis &
Computers II Answer to problem 3
Slide 62

 The independent and dependent variables were


metric, interval or ordinal.
 The ratio of cases to independent variables was
37.75 to 1.
 The relationship of the included variables was
statistically significant and the strength of the
relationship was characterized correctly.
 However, the order of entry, or importance, was not
stated correctly in the problem.

 The answer to the question is false.


SW388R7
Data Analysis &
Computers II Standard multiple regression - 1
Slide 63

The following is a guide to the decision process for answering


problems about standard multiple regression analysis:

Dependent variable metric?
Independent variables metric or dichotomous?
    No:  Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 5 to 1?
    No:  Inappropriate application of a statistic
    Yes: continue

Probability of ANOVA test of regression less than/equal to level of significance?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Standard multiple regression - 2
Slide 64

Strength of relationship for included variables interpreted correctly?
    No:  False
    Yes: continue

Probability of relationship between each IV and DV <= level of significance?
    No:  False
    Yes: continue

Direction of relationship between each IV and DV interpreted correctly?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Standard multiple regression - 3
Slide 65

Any independent variable or dependent variable ordinal level of measurement?
    Yes: True with caution
    No:  continue

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
    No:  True with caution
    Yes: True
SW388R7
Data Analysis &
Computers II Hierarchical regression - 1
Slide 66

The following is a guide to the decision process for answering


problems about hierarchical regression analysis:

Dependent variable metric?
Independent variables metric or dichotomous?
    No:  Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 5 to 1?
    No:  Inappropriate application of a statistic
    Yes: continue

Probability of ANOVA test of regression less than/equal to level of significance?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Hierarchical regression - 2
Slide 67

Probability of F test for change in R² less than or equal to level of significance?
    No:  False
    Yes: continue

Change in R² correctly reported and interpreted?
    No:  False
    Yes: continue

Probability of relationship between each IV added after controls and DV less than or equal to level of significance?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Hierarchical regression - 3
Slide 68

Direction of relationship between each IV added after controls and DV interpreted correctly?
    No:  False
    Yes: continue

Any independent variable or dependent variable ordinal level of measurement?
    Yes: True with caution
    No:  continue

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
    No:  True with caution
    Yes: True
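
The hierarchical decision process can be sketched as an ordered list of checks, where the first failed check determines the answer. The key names below are invented for this illustration, and the facts for problem 2 are taken from the "Answer to problem 2" slide; the direction check is never reached for that problem, so its value is a placeholder.

```python
# Checks that end the process as soon as one fails, in flowchart order.
STOP_CHECKS = [
    ("level_of_measurement_ok", "Inappropriate application of a statistic"),
    ("ratio_at_least_5_to_1", "Inappropriate application of a statistic"),
    ("anova_significant", "False"),
    ("r2_change_significant", "False"),
    ("r2_change_interpreted_correctly", "False"),
    ("added_ivs_significant", "False"),
    ("directions_interpreted_correctly", "False"),
]


def hierarchical_answer(facts):
    """Walk the hierarchical regression decision guide."""
    for key, outcome in STOP_CHECKS:
        if not facts[key]:
            return outcome
    if facts["any_ordinal_variable"] or not facts["ratio_at_least_15_to_1"]:
        return "True with caution"
    return "True"


problem_2 = {
    "level_of_measurement_ok": True,
    "ratio_at_least_5_to_1": True,            # ratio was 18.0 to 1
    "anova_significant": True,
    "r2_change_significant": True,
    "r2_change_interpreted_correctly": True,
    "added_ivs_significant": False,           # condition of health, Sig. = .163
    "directions_interpreted_correctly": True,  # placeholder, not reached
    "any_ordinal_variable": True,
    "ratio_at_least_15_to_1": True,
}
print(hierarchical_answer(problem_2))
```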
SW388R7
Data Analysis &
Computers II Stepwise regression - 1
Slide 69

The following is a guide to the decision process for answering


problems about stepwise regression analysis:

Dependent variable metric?
Independent variables metric or dichotomous?
    No:  Inappropriate application of a statistic
    Yes: continue

Ratio of cases to independent variables at least 5 to 1?
    No:  Inappropriate application of a statistic
    Yes: continue

Is the list of independent variables selected for inclusion correct?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Stepwise regression - 2
Slide 70

Probability of ANOVA test of regression less than/equal to level of significance?
    No:  False
    Yes: continue

Strength of relationship for included variables interpreted correctly?
    No:  False
    Yes: continue

Is the stated order of importance of independent variables correct?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Stepwise regression - 3
Slide 71

Probability of relationship between each included IV and DV less than or equal to level of significance?
    No:  False
    Yes: continue

Direction of relationship between each included IV and DV interpreted correctly?
    No:  False
    Yes: continue
SW388R7
Data Analysis &
Computers II Stepwise regression - 4
Slide 72

Any independent variable or dependent variable ordinal level of measurement?
    Yes: True with caution
    No:  continue

Ratio of cases to independent variables at preferred sample size of at least 50 to 1?
    No:  True with caution
    Yes: True
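
As a worked illustration, the stepwise decision guide can be applied to problem 3. The sketch below returns both the answer and the question that decided it; the question labels and key names are invented for this example, and the facts come from the "Answer to problem 3" slide.

```python
# (fact key, question text, answer when the check fails), in flowchart order.
STEPWISE_CHECKS = [
    ("measurement_ok", "Level of measurement", "Inappropriate application of a statistic"),
    ("ratio_at_least_5_to_1", "Minimum sample size", "Inappropriate application of a statistic"),
    ("included_list_correct", "List of included variables", "False"),
    ("anova_significant", "ANOVA test of regression", "False"),
    ("strength_correct", "Strength of relationship", "False"),
    ("order_of_importance_correct", "Order of importance", "False"),
    ("each_iv_significant", "Individual relationships", "False"),
    ("directions_correct", "Direction of relationships", "False"),
]


def stepwise_answer(facts):
    """Return (answer, deciding question) for a stepwise regression problem."""
    for key, question, outcome in STEPWISE_CHECKS:
        # Facts that are never reached may be omitted from the dictionary.
        if not facts.get(key, True):
            return outcome, question
    if facts["any_ordinal_variable"]:
        return "True with caution", "Ordinal level of measurement"
    if not facts["ratio_at_least_50_to_1"]:
        return "True with caution", "Preferred sample size"
    return "True", None


problem_3 = {
    "measurement_ok": True,
    "ratio_at_least_5_to_1": True,         # 37.75 to 1
    "included_list_correct": True,         # degree and prestg80
    "anova_significant": True,             # F = 29.146, Sig. < 0.001
    "strength_correct": True,              # Multiple R = 0.532, moderate
    "order_of_importance_correct": False,  # degree entered first, not prestg80
    "any_ordinal_variable": True,
    "ratio_at_least_50_to_1": False,
}
print(stepwise_answer(problem_3))
```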
