Green and Salkind (2008) - Lesson 33 - Regression PDF

Bivariate Linear Regression

For a bivariate linear regression problem, data are collected on an independent or predictor variable (X) and a dependent or criterion variable (Y) for each individual. Bivariate linear regression computes an equation that relates predicted Y scores ($\hat{Y}$) to X scores. The regression equation includes a slope weight for the independent variable, $B_{\text{slope}}$, and an additive constant, $B_{\text{constant}}$:

$\hat{Y} = B_{\text{slope}}X + B_{\text{constant}}$

Indices are computed to assess how accurately the Y scores are predicted by the linear equation.

This lesson focuses on applications in which both the predictor and criterion are quantitative variables. However, bivariate regression analysis may be used in other applications. For example, a predictor could have two levels like gender and be scored 0 for male and 1 for female. A criterion may also have two levels like pass-fail performance, scored 0 for fail and 1 for pass.

Linear regression can be used to analyze data from experimental or nonexperimental designs. If the data are collected using experimental methods (e.g., a tightly controlled study in which participants have been randomly assigned to different treatment groups), the X and Y variables may be referred to appropriately as the independent and the dependent variables, respectively. SPSS uses these terms. However, if the data are collected using nonexperimental methods (e.g., a study in which subjects are measured on a variety of variables), the X and the Y variables are more appropriately referred to as the predictor and the criterion, respectively.

Applications of Bivariate Linear Regression

We will illustrate two types of studies that can be analyzed using bivariate linear regression.

• Nonexperimental studies
• Experimental studies

Nonexperimental Study

Mary conducts a nonexperimental study to evaluate what she refers to as the strength-injury hypothesis. It states that overall body strength in elderly women determines the number and severity of accidents that cause bodily injury.
If the results of her prediction study support her strength-injury hypothesis, she plans to conduct an experimental study to assess whether weight training reduces injuries in elderly women.

In the prediction study, Mary collects data from 100 women who range in age from 60 to 75 years old at the time the study begins. The women initially undergo a number of measures that assess lower- and upper-body strength, and these measures are summarized using an overall index of body strength. Over the next five years, the women record each time they have an accident that results in a bodily injury and describe fully the extent of the injury. On the basis of these data, Mary calculates an overall injury index for each woman. Mary is interested in conducting a regression analysis with the overall index of body strength as the predictor (independent) variable and the overall injury index as the criterion (dependent) variable. Mary's SPSS data file has 100 cases and scores on six variables: five individual strength measures (quads, gluts, abdoms, grip, and arms) that are to be combined to yield an overall index of body strength, and the criterion variable, the overall injury index.

Experimental Study

Jack is interested in the effect of coffee drinking on cigarette smoking. He gains the cooperation of 40 men and women who smoke 20 or more cigarettes and drink 2 or more cups of coffee a day. He randomly assigns 10 participants to each of the four treatment conditions: 0 cups of coffee, 1 cup of coffee, 2 cups of coffee, and 3 cups of coffee. Each participant is asked to sit in a small room and watch TV for 45 minutes. Those in the coffee drinking conditions are asked to drink their coffee at a uniform pace over the first 35 minutes of the 45-minute period. All subjects are told they may not smoke their own cigarettes, but they may smoke the cigarettes that are on the desk in the room.
On the basis of a videotape of each session and the cigarette butts left in the ashtray on the desk, Jack estimates the number of millimeters of cigarettes smoked by each research subject. Jack is interested in conducting a regression analysis with the amount of coffee drunk as the independent variable and the number of millimeters of cigarettes smoked as the dependent variable. Jack's SPSS data file has scores on these two variables for the 40 cases.

Understanding Bivariate Linear Regression

[...] nonlinear relationships can exist between the predictor and criterion. On the other hand, if random-effects assumptions hold, the only type of statistical relationship that can exist between two variables is a linear one. Regardless of choice of assumptions, it is important to examine a bivariate scatterplot of the [...]

Fixed-Effects Model Assumptions for Bivariate Linear Regression

As discussed earlier, there are two potential sets of assumptions to be considered: those for a fixed-effects model and those for a random-effects model. The following are assumptions for a fixed-effects model.

Assumption 1: The Dependent Variable Is Normally Distributed in the Population for Each Level of the Independent Variable

In many applications with a moderate or larger sample size, the test of the slope may yield reasonably accurate p values even when the normality assumption is violated.
To the extent that population distributions are not normal and sample sizes are small, the p values may be invalid. In addition, the power of this test may be reduced if the population distributions are nonnormal.

Assumption 2: The Population Variances of the Dependent Variable Are the Same for All Levels of the Independent Variable

To the extent that this assumption is violated and the sample sizes differ among the levels of the independent variable, the resulting p value for the overall F test is not trustworthy.

Assumption 3: The Cases Represent a Random Sample from the Population, and the Scores Are Independent of Each Other from One Individual to the Next

The significance test for regression analysis will yield inaccurate p values if the independence assumption is violated.

Random-Effects Model Assumptions for Bivariate Linear Regression

Assumption 1: The X and Y Variables Are Bivariately Normally Distributed in the Population

If the variables are bivariately normally distributed, each variable is normally distributed ignoring the other variable, and each variable is normally distributed at every level of the other variable. The significance test for bivariate regression yields, in most cases, relatively valid results in terms of Type I errors when the sample is moderate to large in size. If X and Y are bivariately normally distributed, the only type of relationship that exists between these variables is linear.

Assumption 2: The Cases Represent a Random Sample from the Population, and the Scores on Each Variable Are Independent of Other Scores on the Same Variable

The significance test for regression analysis will yield inaccurate p values if the independence assumption is violated.
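Fixed-effects Assumption 2 is easiest to picture in a design like Jack's, where X takes a few fixed levels (0 to 3 cups of coffee). As a rough sketch only, using hypothetical scores rather than Jack's data file, one can group the Y scores by level of X and compare the within-level sample variances. The largest-to-smallest variance ratio computed here (the quantity behind Hartley's F_max) is used as an informal gauge; it is not part of the SPSS Linear Regression output described in this lesson.

```python
from collections import defaultdict
from statistics import variance


def variances_by_level(xs, ys):
    """Group Y scores by each distinct X level and return {level: sample variance}."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    # statistics.variance uses the n - 1 denominator; each level needs >= 2 scores.
    return {level: variance(scores) for level, scores in sorted(groups.items())}


def max_variance_ratio(level_variances):
    """Largest-to-smallest variance ratio; values near 1 suggest similar spreads."""
    values = list(level_variances.values())
    return max(values) / min(values)


# Hypothetical scores (NOT Jack's data): cups of coffee (X) and
# millimeters of cigarettes smoked (Y), three participants per level.
cups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
mm_smoked = [40, 44, 48, 50, 56, 62, 58, 60, 62, 70, 78, 86]

level_vars = variances_by_level(cups, mm_smoked)
print(level_vars)
print(round(max_variance_ratio(level_vars), 2))
```

A large ratio combined with unequal numbers of cases per level is the situation in which, as noted above, the p value for the overall F test should not be trusted.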
Effect Size Statistics for Bivariate Linear Regression

This lesson focuses on using linear regression to evaluate how well a single independent variable predicts a dependent variable. However, linear regression is a more general procedure that assesses how well one or more independent variables predict a dependent variable. Consequently, SPSS reports strength-of-relationship statistics that are useful for regression analyses with multiple predictors. Four correlational indices are presented in the output for the Linear Regression procedure: the Pearson product-moment correlation coefficient (r), the multiple correlation coefficient (R), its squared value (R²), and the adjusted R². However, there is considerable redundancy among these statistics for the single-predictor case: R = |r|, R² = r², and the adjusted R² is approximately equal to R². Accordingly, the only correlational indices we need to report in our manuscript for a bivariate regression model are r and r².

The Pearson product-moment correlation coefficient ranges in value from −1 to +1. A positive value suggests that as the independent variable X increases, the dependent variable Y increases. A zero value indicates that as X increases, Y neither increases nor decreases. A negative value indicates that as X increases, Y decreases. Values closer to −1 or +1 indicate stronger linear relationships. By convention, correlation coefficients of .10, .30, and .50, irrespective of sign, are interpreted as small, medium, and large coefficients, respectively. However, the interpretation of strength of relationship should depend on the research context.

By squaring r, we obtain an index that directly tells us how well we can predict Y from X. r² indicates the proportion of Y variance that is accounted for by its linear relationship with X. Alternatively, r²
can be conceptualized as the proportional reduction in error that we achieve by including X in the regression equation in comparison with not including X in the regression equation.

Other strength-of-relationship indices may be reported for bivariate regression problems. For example, SPSS gives Std. Error of the Estimate on the output. The standard error of estimate is an index indicating how large the typical error is in predicting Y from X. It is a useful index over and above correlational indices because it indicates how badly we predict the dependent variable scores in the metric of these scores. In comparison, correlational statistics are unitless indices and, therefore, are abstract and difficult to interpret.

The Data Set

The data set used to illustrate bivariate linear regression is named Lesson 33 Data File 1 on the Web at http://www.prenhall.com/greensalkind. It presents data from our strength-injury example. The variables in the data set are in Table 39.

Table 39
Variables in Lesson 33 Data File 1

Variable   Definition
quads      A measure of strength primarily associated with the quadriceps
gluts      A measure of strength of the muscles in the upper part of the back of the leg and the buttocks
abdoms     A measure of strength of the muscles of the abdomen and the lower back
arms       A measure of strength of the muscles of the arms and the shoulders
grip       An assessment of the hand-grip strength
injury     Overall injury index based on the records kept by the participants

The Research Question

The research question addresses the relationship between two variables: What is the linear equation that predicts the extent of physical injury from body strength for elderly women, and how accurately does this equation predict the extent of physical injuries?

Conducting a Bivariate Linear Regression Analysis

The data set includes the dependent variable for the regression analysis (injury) but does not include the independent variable, an overall index of body strength.
This index can be calculated by creating z scores for the five strength measures and adding them together. See Lesson 19 for a more detailed explanation of creating an overall variable from standardized variables. To conduct a bivariate linear regression analysis, follow these steps:

1. Create the total strength variable named ztotstr by z-scoring the five individual strength measures and summing them.
2. Click Analyze, click Regression, then click Linear. You will see the Linear Regression dialog box shown in Figure 200.
3. Click injury, then click ► to move it to the Dependent box.
4. Click ztotstr, then click ► to have it appear in the Independent(s) box.
5. Click Statistics. You will see the Linear Regression: Statistics dialog box shown in Figure 201.

Figure 201. The Linear Regression: Statistics dialog box.

6. Click Confidence intervals and Descriptives. Make sure that Estimates and Model fit are also selected.
7. Click Continue.
8. Click OK.

Selected SPSS Output for Bivariate Linear Regression

The results of the bivariate linear regression analysis are shown in Figure 202. The Bs, as labeled on the output in the Unstandardized Coefficients box, are the additive constant (145.80) and the slope weight (−4.89) of the regression equation used to predict the dependent variable from the independent variable. Accordingly, the regression or prediction equation is as follows:

$\text{Predicted Overall Injury} = -4.89\,(\text{Overall Strength}) + 145.80$

The slope weight indicates that greater overall strength predicts lower scores on the overall injury index. It should be noted that the 95% confidence interval for the slope is fairly wide (−7.74 to −2.04), but is negative throughout the range of the interval.
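The SPSS steps above can be mirrored outside SPSS. The sketch below is a minimal plain-Python illustration using small hypothetical scores in place of Lesson 33 Data File 1, so it does not reproduce the −4.89 and 145.80 reported above. It z-scores each strength measure with the sample (n − 1) standard deviation (our assumption about how the z scores are formed), sums them into a composite named ztotstr as in step 1, and fits the least-squares line with $B_{\text{slope}} = \sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$ and $B_{\text{constant}} = \bar{y} - B_{\text{slope}}\bar{x}$. It also computes r, r², and the standard error of estimate, and checks that the slope for standardized scores equals r.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardize scores using the sample (n - 1) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def fit_line(xs, ys):
    """Least-squares slope and additive constant for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson product-moment correlation between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def std_error_of_estimate(xs, ys, slope, constant):
    """sqrt(SS_residual / (n - 2)): typical prediction error in Y's own units."""
    ss_res = sum((y - (slope * x + constant)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_res / (len(xs) - 2))


# Hypothetical scores for six women (NOT Lesson 33 Data File 1).
measures = {
    "quads": [20, 25, 30, 22, 28, 35],
    "gluts": [18, 24, 27, 20, 26, 33],
    "abdoms": [15, 19, 25, 17, 22, 28],
    "grip": [22, 26, 29, 21, 27, 31],
    "arms": [12, 16, 20, 14, 18, 23],
}
injury = [160, 150, 128, 158, 140, 118]

# Step 1: ztotstr = sum of the five z-scored strength measures for each woman.
z_by_measure = [z_scores(v) for v in measures.values()]
ztotstr = [sum(zs) for zs in zip(*z_by_measure)]

b_slope, b_constant = fit_line(ztotstr, injury)
r = pearson_r(ztotstr, injury)
see = std_error_of_estimate(ztotstr, injury, b_slope, b_constant)

# With both variables standardized, the slope equals r and the constant is 0.
beta, zero_const = fit_line(z_scores(ztotstr), z_scores(injury))

print(f"Predicted injury = {b_slope:.2f} * ztotstr + {b_constant:.2f}")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, SEE = {see:.2f}, beta = {beta:.3f}")
```

Because the composite has a mean of zero, the additive constant equals the mean injury score. Using the population standard deviation (`pstdev`) in `z_scores` would rescale ztotstr and therefore $B_{\text{slope}}$, but would leave r, r², beta, and the standard error of estimate unchanged.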
[Figure 202 output: Correlations, Model Summary, ANOVA, and Coefficients tables (predictor: ztotstr; dependent variable: injury)]

Figure 202. The results of the bivariate linear regression analysis.

A standardized regression equation can be computed if the independent and dependent variables are transformed to z scores with a mean of 0 and a standard deviation of 1:

$\hat{Z}_{\text{Injury}} = -.32\,Z_{\text{Overall Strength}}$

For bivariate regression analysis based on standardized scores, the slope (in the Standardized Coefficients box) is always equal to the correlation coefficient, and the additive constant must be equal to zero. The standardized slope is labeled Beta on the SPSS output. Based on the magnitude of the correlation coefficient, we can conclude that overall strength is moderately related to injury level in this elderly sample. Eleven percent (r² = .106) of the variance of the injury index is associated with overall strength.

The hypothesis test of interest evaluates whether the independent variable predicts the dependent variable in the population. More specifically, it assesses whether the population correlation coefficient is equal to zero or, alternatively, whether the population slope is equal to zero.
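The test of the slope and its confidence interval can be checked by hand from values reported for this analysis. A minimal sketch, assuming a standard error of the slope of about 1.437 (the value legible in the Coefficients table of Figure 202, and consistent with 4.89 / 3.40 ≈ 1.44) and a two-tailed .05 critical value of t with 98 degrees of freedom of about 1.984 (taken from a t table rather than computed, to keep the code dependency-free): t is the slope divided by its standard error, the single-predictor ANOVA F equals t², and the 95% interval is the slope plus or minus t_crit times the standard error.

```python
def slope_test(b, se, t_crit):
    """t test of H0: population slope = 0, its equivalent F, and a CI for b."""
    t = b / se
    f = t ** 2  # with one predictor, the ANOVA F equals the squared t
    margin = t_crit * se
    return t, f, (b - margin, b + margin)


# b comes from this lesson's output; se and t_crit (df = 98) are the assumptions above.
t, f, (lower, upper) = slope_test(b=-4.89, se=1.437, t_crit=1.984)
print(f"t(98) = {t:.2f}, F(1, 98) = {f:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
print("Interval excludes zero:", lower > 0 or upper < 0)
```

The computed F of about 11.58 differs from the reported 11.59 only because the slope and its standard error are rounded to a few decimals.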
The significance test of the slope appears in two places for a bivariate regression analysis: the F test reported as part of the ANOVA table and the t test associated with the independent variable in the Coefficients table. They yield the same p value because they are identical tests: F(1, 98) = 11.59, p < .01, and t(98) = −3.40, p < .01. In addition, the fact that the 95% confidence interval for the slope does not contain the value of zero indicates that the null hypothesis should be rejected at the .05 level.

Using SPSS Graphs to Display the Results

Graphs can be useful for interpreting the results of a regression analysis. We describe two types of graphs: the bivariate scatterplot and the plot of predicted and residual values.

Creating a Bivariate Scatterplot

The results of the bivariate regression analysis can be summarized using a bivariate scatterplot. Conduct the following steps to create a simple bivariate scatterplot:

1. Click Graphs, click Legacy Dialogs, and then click Scatter/Dot.
2. Click Simple Scatter, and then click Define.
3. Click injury and click ► to move it to the Y Axis box.
4. Click ztotstr and click ► to move it to the X Axis box.
5. Click OK.

Once you have created a scatterplot showing the relationship between the number of injuries and the total strength index, you can add a regression line by conducting the following steps:

1. Double-click on the chart to select it for editing, and maximize the chart editor.
2. Click on any of the data points in the scatterplot to highlight the data points.
3. Click Elements from the main menu, and click on Fit Line at Total.
4. Click Close if a Properties dialog box is open.

You will see the scatterplot (with some additional edits) in Figure 203.

[Figure 203 plot: Number of Injuries (y-axis) by Total Strength Index (x-axis), with fitted regression line]

Figure 203. Scatterplot between number of injuries and total body strength.

An examination of the plot allows us to assess how accurately the regression equation predicts the dependent variable scores.
In this case, the equation offers some predictability, but points fall far off the line, indicating poor prediction for those points.

Creating a Plot of Predicted and Residual Values

The plot of predicted and residual values may form a pattern that indicates that an assumption has been violated. For example, a U-shaped plot would suggest that the two variables are nonlinearly related. Or, if the residuals are tightly clustered for some values of the predicted scores and widely varying for other values, one might suspect that the homogeneity-of-variance assumption has been violated. A plot of predicted and residual values can be created by following these steps:

1. Click Analyze, click Regression, and then click Linear.
2. The appropriate options should already be selected. If not, conduct steps 3 through 7 as described in the section labeled Conducting a Bivariate Linear Regression Analysis.
3. Click Plots in the Linear Regression dialog box. You will see the Linear Regression: Plots dialog box shown in Figure 204.

Figure 204. The Linear Regression: Plots dialog box.

4. Click ZRESID, then click ► to move it to the Y box.
5. Click ZPRED, then click ► to move it to the X box.
6. Click Continue.
7. Click OK.

The edited graph is shown in Figure 205. There is no apparent pattern to the scatterplot that would make us conclude that the assumptions have been violated.

An APA Results Section

A linear regression analysis was conducted to evaluate the prediction of the physical injury index from the overall strength index for elderly women. The scatterplot for the two variables, as shown in Figure 203, indicates that the two variables are linearly related such that as overall strength increases the overall injury index decreases.
The regression equation for predicting the overall injury index is

$\text{Predicted Overall Injury} = -4.89\,(\text{Overall Strength}) + 145.80$

The 95% confidence interval for the slope, −7.74 to −2.04, does not contain the value of zero, and therefore overall strength is significantly related to the overall injury index. As hypothesized, elderly women who are stronger tended to have lower overall injury scores. Accuracy in predicting the overall injury index was moderate. The correlation between the strength index and the injury index was −.32. Approximately 11% of the variance of the injury index was accounted for by its linear relationship with the strength index.

[Figure 205 plot: Standardized Residual (y-axis) by Regression Standardized Predicted Value (x-axis)]

Figure 205. Scatterplot depicting the relationship between standardized predicted and residual injury scores.

Exercises

The data for Exercises 1 through 4 are in the data file named Lesson 33 Exercise File 1 on the Web at http://www.prenhall.com/greensalkind. The data are based on the following research problem.

Peter was interested in determining if children who hit a bobo doll more frequently would display more or less aggressive behavior on the playground. He was given permission to observe 10 boys in a nursery school classroom. Each boy was encouraged to hit a bobo doll for 5 minutes. The number of times each boy struck the bobo doll was recorded (bobo). Next, Peter observed the boys on the playground for an hour and recorded the number of times each boy struck a classmate (peer).

1. Conduct a linear regression to predict the number of times a boy would strike a classmate from the number of times the boy hit a bobo doll.
From the output, identify the following:
a. Slope associated with the predictor
b. Additive constant for the regression equation
c. Mean number of times they struck a classmate
d. Correlation between the number of times they hit the bobo doll and the number of times they struck a classmate
e. Standard error of estimate
2. What is the relationship between the multiple R and the bivariate correlation between the predictor and the criterion?
3. Create a scatterplot of the relationship between the two variables. Plot the regression line on the graph. What can you tell from this graph about the predictability of the dependent variable?
4. Write a Results section based on your analyses.

The data for Exercises 5 through 7 are found in the data file named Lesson 33 Exercise File 2 on the Web at http://www.prenhall.com/greensalkind. The data are based on the following research problem.

Betsy is interested in determining whether the number of publications by a professor can be predicted from work ethic. She has access to a sample of 50 social science professors who were at the same university for a 10-year period. Betsy has collected data on the number of publications each professor has (num_pubs). She also has scores that reflect professors' work ethic (work_eth). These scores range from 1 to 50, with 50 indicating a very strong work ethic.

5. Conduct a bivariate linear regression to evaluate Betsy's research question. From the output, identify the following:
a. Significance test to assess the predictability of number of publications from work ethic
b. Regression equation
c. Correlation between number of publications and work ethic
6. Create a scatterplot of the predicted and residual scores, using the steps described in Using SPSS Graphs to Display the Results. What does this graph tell you about your analyses?
7. Write a Results section based on your analyses.
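The standardized predicted values and standardized residuals plotted in Figure 205, and requested in Exercise 6, can also be computed directly. This is a minimal sketch that assumes, as we understand SPSS's definitions, that ZPRED z-scores the predicted values and that ZRESID divides each residual by the standard error of estimate. The scores are hypothetical and are not taken from Lesson 33 Exercise File 2.

```python
from math import sqrt
from statistics import mean, stdev


def zpred_zresid(xs, ys):
    """Return (standardized predicted values, standardized residuals) for y on x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    constant = my - slope * mx
    predicted = [slope * x + constant for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    # Standard error of estimate: sqrt(SS_residual / (n - 2)).
    see = sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
    m_pred, s_pred = mean(predicted), stdev(predicted)
    zpred = [(p - m_pred) / s_pred for p in predicted]
    zresid = [e / see for e in residuals]
    return zpred, zresid


# Hypothetical work-ethic scores (X) and publication counts (Y).
work_eth = [10, 15, 20, 25, 30, 35, 40, 45]
num_pubs = [3, 5, 4, 8, 9, 7, 12, 11]

zpred, zresid = zpred_zresid(work_eth, num_pubs)
for zp, zr in zip(zpred, zresid):
    print(f"ZPRED = {zp:6.2f}   ZRESID = {zr:6.2f}")
```

Plotting zresid against zpred with any plotting library gives the residual plot; a U shape or a spread that widens or narrows across the predicted values would raise the nonlinearity and homogeneity-of-variance concerns described in Creating a Plot of Predicted and Residual Values.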
