
Interpreting Regression Output

Suppose I am investigating the relationship between types of cars and their fuel efficiency in miles per gallon. My hypothesis is that luxury models are gas guzzlers, and I test it using 1978 automobile data. I use weight as a proxy for luxury, since I expect luxury cars to be heavier; it also seems plausible that heavier cars use more gas. In the Command window, type:

sysuse auto

This loads the sample data into Stata. Next, type:

regress mpg weight

Stata displays an analysis of variance (ANOVA) table along with the regression results: the ANOVA table is at the top left, and the regression coefficients are at the bottom. The dependent variable, miles per gallon (mpg), is named at the top left of the coefficient table. Weight is measured in pounds. The estimated coefficients for weight and for the constant are shown in the Coef. column; Std. Err. is the standard error, t is the t test statistic, P>|t| is the p value, and the last two columns give the 95% confidence interval. The results can be written as a regression equation:

$\text{predicted MPG} = 39.44 - 0.006 \times \text{WEIGHT}$

For each one-pound increase in weight, predicted miles per gallon decreases by 0.006, and this effect is statistically significant at least at the 99% level (a p value displayed as 0.000 is less than 0.0005). The standard error is very small relative to the coefficient, so the absolute value of the t statistic (the coefficient divided by its standard error) is large. You can read statistical significance from the p value: if it is less than 0.05, the coefficient is significant at the 95% level, and if it is less than 0.01, it is significant at the 99% level. The constant (_cons) is the intercept of the regression line, or its starting point: predicted mpg is about 39 for a car weighing zero pounds. This has no literal meaning, since no car weighs nothing; the intercept is an extrapolation far outside the observed weights that positions the line, not the average mpg of the cars in the sample.
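To make the slope and intercept concrete, here is a minimal Python sketch (not Stata) that plugs weights into the fitted equation, using the rounded coefficients from the text, and applies the p value rule of thumb described above. The names predicted_mpg and significance_label are illustrative and not part of Stata.

```python
# Rounded coefficients from `regress mpg weight`, as given in the text:
# predicted MPG = 39.44 - 0.006 * WEIGHT
INTERCEPT = 39.44
SLOPE_WEIGHT = -0.006  # change in predicted mpg per additional pound


def predicted_mpg(weight_lbs: float) -> float:
    """Predicted miles per gallon for a car of the given weight in pounds."""
    return INTERCEPT + SLOPE_WEIGHT * weight_lbs


def significance_label(p_value: float) -> str:
    """Apply the text's rule of thumb to a reported p value."""
    if p_value < 0.01:
        return "significant at the 99% level"
    if p_value < 0.05:
        return "significant at the 95% level"
    return "not significant at conventional levels"


if __name__ == "__main__":
    # A hypothetical 3,000-pound car.
    print(round(predicted_mpg(3000), 2))
    # A car 1,000 pounds heavier is predicted to get about 6 mpg less.
    print(round(predicted_mpg(4000) - predicted_mpg(3000), 2))
    # Stata's "0.000" means p < 0.0005, so use 0.0005 as an upper bound.
    print(significance_label(0.0005))
```

Evaluating predicted_mpg(0) simply returns the intercept, which is why the constant is read as the prediction for a weightless car.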

The top right corner lists summary information for the ANOVA and the regression. The analysis uses 74 observations. The F test statistic, which tests whether all slope coefficients are jointly zero, has 1 numerator degree of freedom and 72 denominator degrees of freedom; it is about 134 and is statistically significant at the 99% level because its p value is 0.000. I will come back to R-squared and adjusted R-squared with the next model. Root MSE is the square root of the mean squared error (MS for Residual in the ANOVA table); it estimates the standard deviation of the error term, the part of mpg that the model does not explain.

What I did so far is a simple regression with a single predictor. Now I want to control for whether a car is a U.S. or a non-U.S. model, in addition to its weight, when predicting miles per gallon. That makes it a multiple regression. A variable with only two categories, like U.S. versus non-U.S. models, is called a dummy variable, and it is easiest to interpret when coded 0 and 1. Here, the variable foreign is coded 0 for U.S. (domestic) cars and 1 for non-U.S. (foreign) cars. To fit the model, type:

regress mpg weight foreign
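A minimal Python sketch of how a 0/1 dummy enters a prediction, assuming the rounded coefficients of the two-predictor equation given below (41.68, -0.0066 and -1.65). The function name and the 3,000-pound example car are illustrative only.

```python
# Rounded coefficients from `regress mpg weight foreign`, as given in the text:
# predicted MPG = 41.68 - 0.0066 * WEIGHT - 1.65 * FOREIGN
INTERCEPT = 41.68
B_WEIGHT = -0.0066  # change in predicted mpg per additional pound
B_FOREIGN = -1.65   # shift applied when foreign == 1


def predicted_mpg(weight_lbs: float, foreign: int) -> float:
    """foreign must be coded 0 (domestic) or 1 (foreign), as in the auto data."""
    if foreign not in (0, 1):
        raise ValueError("foreign must be coded 0 or 1")
    return INTERCEPT + B_WEIGHT * weight_lbs + B_FOREIGN * foreign


if __name__ == "__main__":
    domestic = predicted_mpg(3000, foreign=0)
    imported = predicted_mpg(3000, foreign=1)
    # Holding weight fixed, the gap is exactly the dummy's coefficient.
    print(round(imported - domestic, 2))
```

Because foreign is 0 or 1, its coefficient acts as a vertical shift of the regression line: both groups share the weight slope, and foreign cars sit 1.65 mpg lower at every weight.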

$\text{predicted MPG} = 41.68 - 0.0066 \times \text{WEIGHT} - 1.65 \times \text{FOREIGN}$

Plug in 0 for foreign to get the predicted mpg of a domestic car, and 1 for a foreign car: at the same weight, predicted mpg is 1.65 lower for foreign cars than for domestic cars. Controlling for whether a car is foreign, heavier models still use more gas: each one-pound increase in weight lowers predicted mpg by 0.0066. Notice that foreign is not statistically significant at any conventional level in this model. So can we conclude that foreign, after all, does not matter for mpg?

Here it is very important to distinguish statistical significance from substantive significance. The p value is the probability of obtaining an estimate at least as far from zero as the one in the sample if the null hypothesis of no relationship were true; it does not tell you how large or how important an effect is. Statistical significance can also change when you add observations or improve the fit of the model. Later you can see how the significance of foreign changes after an adjustment to the model.

In the earlier model, R-squared was 0.65, meaning that about 65% of the variance in mpg is explained by the model. In this regression R-squared is 0.6627, so adding one variable explains about one percentage point more of the variance in mpg. Adjusted R-squared corrects R-squared for the number of predictors relative to the sample size. R-squared never decreases when you add variables, whereas adjusted R-squared penalizes each additional predictor and can fall if the new variable adds little. This is useful when you have many variables and a small sample. The formula is

$\bar{R}^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$

where n is the number of observations and k is the number of predictors (not counting the constant). In the earlier model adjusted R-squared was 0.6467, and in the current model it is 0.6532, so the current model still explains mpg slightly better.

We jumped right into regression, but running a regression rests on a whole series of assumptions. In your own study, check the data to see whether the regression assumptions are met. UCLA has very good websites that discuss regression diagnostics.
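The fit statistics in Stata's top-right table can be reproduced from R-squared, n and k alone. This Python sketch (function names are illustrative) applies the adjusted R-squared formula and the standard overall F statistic, F = (R²/k) / ((1 − R²)/(n − k − 1)), assuming the unrounded R-squared values Stata reports for the two models (0.6515 and 0.6627) and n = 74.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """1 - (1 - R^2) * (n - 1) / (n - k - 1), where k counts predictors only."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


if __name__ == "__main__":
    n = 74  # observations in the auto data
    print(round(adjusted_r_squared(0.6515, n, k=1), 4))  # mpg on weight
    print(round(adjusted_r_squared(0.6627, n, k=2), 4))  # mpg on weight and foreign
    print(round(overall_f(0.6515, n, k=1), 1))           # F(1, 72), first model
```

Rounded to four decimals, the adjusted values come out to 0.6467 and 0.6532, matching the text, and the first model's F rounds to 134.6, consistent with the F of about 134 read from the output.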
