
Impact of transmission type on fuel efficiency in mtcars

20 June 2015
Executive Summary
We study the mtcars data set that ships with R to determine the relationship between miles per gallon mpg and
transmission type am (manual/automatic). We evaluate several candidate models to explore the relationship, finally settling
on mpg~wt*factor(am) based on our selection strategy. Using this model, we find that manual transmission offers
better mpg for cars lighter than ~2,808 lbs. A two-sided t-test gives a 95% confidence interval of 3.2-11.3 miles per
gallon for the mpg advantage of manual over automatic transmission, averaged across all car weights (under sample
constraints). We also look at the residual variation in the chosen linear model.
Exploratory Data Analysis and choosing the regression model
We load the data and explore the correlation between various terms in the data set. ?mtcars provides the required
variable descriptions for the terms in the data set.
library('ggplot2');library('xtable');data(mtcars);options(scipen = 999);
cr <- as.data.frame(cor(mtcars)); tab <- xtable(cr[1:4,],
caption = "Correlation table for mtcars (top 4 rows)")
print.xtable(tab, floating = TRUE ,comment = FALSE)

         mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg     1.00  -0.85  -0.85  -0.78   0.68  -0.87   0.42   0.66   0.60   0.48  -0.55
cyl    -0.85   1.00   0.90   0.83  -0.70   0.78  -0.59  -0.81  -0.52  -0.49   0.53
disp   -0.85   0.90   1.00   0.79  -0.71   0.89  -0.43  -0.71  -0.59  -0.56   0.39
hp     -0.78   0.83   0.79   1.00  -0.45   0.66  -0.71  -0.72  -0.24  -0.13   0.75
Table 1: Correlation table for mtcars (top 4 rows)


The correlation table excerpt above shows which variables are most correlated with miles per gallon mpg. As we
wish to study the effect of transmission am, it is also included in the exploration. We choose to look at wt, hp
and cyl as they are highly correlated with mpg (-0.87, -0.78 and -0.85 respectively).
Figures 1, 2 and 3 show the relationships between mpg and weight wt in 1000 lbs (Figure 1), horsepower
hp (Figure 2) and number of cylinders in the engine cyl (Figure 3). The linear fits with 95% confidence intervals also help
guide the choice of candidate models for our linear regression.
At a cursory glance, we might also consider displacement disp (in cu. in.) as a variable highly related to the outcome
mpg, but it is strongly correlated with cyl (0.90), so we do not consider it separately.
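The ranking used above can also be read off programmatically. The following is a minimal sketch, not part of the original analysis; the names num and mpg_cor are illustrative.

```r
# Keep only numeric columns so the snippet also works if label columns were added
num <- mtcars[sapply(mtcars, is.numeric)]

# Correlation of every other variable with mpg, ordered by absolute strength
mpg_cor <- cor(num)["mpg", colnames(num) != "mpg"]
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor, 2))

# Correlation between disp and cyl, the reason disp is not considered separately
disp_cyl <- cor(mtcars$disp, mtcars$cyl)
print(round(disp_cyl, 2))
```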
Choosing the regression models
In order to choose the appropriate model, we proceed as follows:
- Fit a model with the variable having the highest correlation with mpg (and also include am).
- Create subsequent models, each adding one more term that has relatively high correlation with the outcome
mpg, then run anova as a nested model comparison (F-test) and compare the consecutive p-values.
Thus we choose 4 models, f1, f2, f3 and f4, as shown below, with the anova results in Table 2.
f1 <- lm(mpg ~ wt + factor(am), data = mtcars); f2 <- lm(mpg ~ wt * factor(am), data = mtcars)
f3 <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)
f4 <- lm(mpg ~ wt * factor(am) + factor(cyl) + hp , data = mtcars)
anv <- as.data.frame(anova(f1,f2,f3,f4))
tab1 <- xtable(anv, caption = "Anova for choosing regression model from f1, f2, f3, f4")
digits(tab1)<- 5; print.xtable(tab1, floating = TRUE, comment = FALSE)

     Res.Df        RSS       Df  Sum of Sq         F   Pr(>F)
1  29.00000  278.31970
2  28.00000  188.00767  1.00000   90.31203  17.30489  0.00033
3  26.00000  137.99173  2.00000   50.01593   4.79183  0.01731
4  25.00000  130.47184  1.00000    7.51990   1.44090  0.24124
Table 2: Anova for choosing regression model from f1, f2, f3, f4
Examining the p-values in Table 2, we can see that there is benefit in considering an interaction between wt and am
when estimating mpg, as in model f2. The comparison between f1 and f2 yields a p-value of 0.0003283, which
is well below a typical Type I error rate of 0.05. Adding hp (f4) shows no benefit (p-value 0.2412). Adding factor(cyl)
(f3) yields a p-value of 0.0173, which is below 0.05 but much weaker evidence than for the interaction term. We
therefore retain the simpler model f2 for further study of the effect of transmission am on mpg.
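The p-values discussed above can be pulled out of the anova result directly. This is a minimal sketch that refits the four models so it runs on its own; the name p_vals is illustrative.

```r
# Refit the four nested models from the text
f1 <- lm(mpg ~ wt + factor(am), data = mtcars)
f2 <- lm(mpg ~ wt * factor(am), data = mtcars)
f3 <- lm(mpg ~ wt * factor(am) + factor(cyl), data = mtcars)
f4 <- lm(mpg ~ wt * factor(am) + factor(cyl) + hp, data = mtcars)

# Sequential F-tests; the first row has no comparison, so its p-value is NA
anv <- anova(f1, f2, f3, f4)
p_vals <- anv[["Pr(>F)"]]
names(p_vals) <- c("f1", "f2", "f3", "f4")

print(signif(p_vals, 4))
print(p_vals < 0.05)  # which added terms pass the 0.05 threshold
```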
Inferring from the chosen model (model f2)
Model Estimation
To answer our questions of interest we plot the relationship between mpg and wt with color representing am in Figure 4.
The grey slope line shows the direct relationship between mpg and wt without considering am while the two horizontal
lines represent the mean mpg for the two transmission types. Thus from an average perspective the manual transmission
has a higher mpg = 24.39 than that of automatic transmission, which is mpg = 17.15. However, there is significant overlap
between the points and as a result a clear relationship cannot be inferred visually.
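The two group means quoted above can be reproduced with a short sketch; grp_mean is an illustrative name, and the labels assume the ?mtcars coding of am (0 = automatic, 1 = manual).

```r
# Mean mpg per transmission type; tapply returns groups in the order am = 0, 1
grp_mean <- tapply(mtcars$mpg, mtcars$am, mean)
names(grp_mean) <- c("Automatic", "Manual")
print(round(grp_mean, 2))
```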
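# Two-sample t-test of mpg: automatic (am == 0) vs. manual (am == 1)
tt <- t.test(x = mtcars[mtcars$am == 0,1], y = mtcars[mtcars$am == 1,1])
# Car names ordered by leverage (hat values) under model f2, highest first
hval <- hatvalues(f2); topcar <- names(hval[order(hval, decreasing = TRUE)])
# Slope of mpg on wt: automatic (coef 2) and manual (coef 2 + interaction coef 4)
mpgChangeAuto <- f2$coeff[2]; mpgChangeMan <- f2$coeff[2] + f2$coeff[4]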
This can be analysed further with a two-sided t-test, which gives a p-value of 0.0014 and a 95% confidence interval of
(-11.28, -3.21) for the difference in mean mpg (automatic minus manual). This indicates a measurable impact of
transmission on mpg. Continuing from Figure 4, the two regression lines for automatic and manual transmission show
that mpg decreases more rapidly with increasing weight for a car with manual transmission than for one with automatic.
Based on the model coefficients, mpg changes by -3.79 per 1000 lbs increase in weight for automatic transmission and
by -9.08 per 1000 lbs increase in weight for manual transmission. Also, as weight increases, cars tend to have automatic
rather than manual transmission, which means group status (manual or automatic) partially matters. (Note: this assumes
that the car sample was not chosen in such a manner that heavier cars had automatic transmission.) The mpg benefit of
manual transmission is nullified beyond wt = 2.81, where the two regression lines intersect.
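The slopes and the crossover weight follow from the coefficients of mpg ~ wt * factor(am). The manual line differs from the automatic line by b3 + b4 * wt, so the two lines meet where that difference is zero, at wt = -b3 / b4. This is a minimal sketch with illustrative names (b, slope_auto, slope_manual, wt_cross).

```r
f2 <- lm(mpg ~ wt * factor(am), data = mtcars)

# Coefficient order: intercept, wt, factor(am)1, wt:factor(am)1
b <- unname(coef(f2))

slope_auto   <- b[2]         # change in mpg per 1000 lbs, automatic
slope_manual <- b[2] + b[4]  # change in mpg per 1000 lbs, manual

# Manual minus automatic prediction is b[3] + b[4] * wt; it is zero at the crossover
wt_cross <- -b[3] / b[4]

print(round(c(auto = slope_auto, manual = slope_manual, crossover = wt_cross), 2))
```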
Model Characteristics
In Figure 5 we examine the model fit for model f2. The Residuals vs Fitted plot (plot 1) shows no obvious
heteroskedasticity, but there is considerable residual variation in the middle of the data set. Fiat 128, Merc 240D
and Toyota Corolla stand out as outliers with large residuals, but as per the Residuals vs Leverage plot (plot 4) they
have low leverage. In the Normal QQ plot, the residuals closely follow the normal distribution, but at the higher positive
quantiles we see a skewness (negative) in the error distribution. Among the cars with the highest leverage, Maserati
Bora tops the list with a hat value of 0.37.
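The leverage ranking can be checked with a short sketch. For reference it also prints the average leverage, which for a linear model equals the number of coefficients divided by the number of observations (here 4/32).

```r
f2 <- lm(mpg ~ wt * factor(am), data = mtcars)

# Hat values measure how far each car sits from its group's centre in wt
hval <- hatvalues(f2)

# Three cars with the highest leverage
print(round(head(sort(hval, decreasing = TRUE), 3), 2))

# Average leverage, equal to (number of coefficients) / (number of observations)
print(mean(hval))
```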
Conclusion
Answering the questions: based on model f2, we can say that
(1) A manual transmission is better for mpg when the weight of the car is less than 2808.12 lbs. Beyond that, automatic
transmission offers better mpg.
(2) Overall, across all car weights, manual transmission offers approximately 3.2 - 11.3 more miles per
gallon than automatic (as per our t.test inference).
(3) Our conclusion is based upon the model f2 we chose, and the residual variance may impact the final result. Our model
choice was also influenced by our need to observe the impact of am on mpg, whose correlation with mpg is actually lower
than that of wt, hp, disp and cyl.
(4) To infer the difference in mpg we have used a t.test, and the assumption is that there are no confounders that
impact the obtained result.
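The interval quoted in point (2) can be recovered from the t-test object. This is a minimal sketch; ci and manual_gain are illustrative names. The test is set up as automatic minus manual, so the signs are flipped to express the gain from manual transmission.

```r
# Two-sample t-test: x = automatic (am == 0), y = manual (am == 1)
tt <- t.test(x = mtcars$mpg[mtcars$am == 0], y = mtcars$mpg[mtcars$am == 1])

ci <- as.numeric(tt$conf.int)  # 95% interval for mean(automatic) - mean(manual)
manual_gain <- rev(-ci)        # same interval expressed as manual - automatic

print(signif(tt$p.value, 2))
print(round(ci, 2))
print(round(manual_gain, 2))
```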

Appendix
tr <- c("Automatic","Manual")
mtcars$trans <- tr[mtcars$am + 1]
qplot(x = wt, y = mpg, data = mtcars, color = trans, geom = c("point", "smooth"),
method = "lm", main = "Figure 1: Miles per Gallon mpg vs. Car wt (in 1000lbs)")

Figure 1: Miles per Gallon mpg vs. Car wt (in 1000lbs)
(Plot: mpg on the y-axis against wt on the x-axis, with points and linear fits colored by trans: Automatic, Manual.)

qplot(x = hp, y = mpg, data = mtcars, color = trans, geom = c("point", "smooth"),
method = "lm", main = "Figure 2: Miles per Gallon mpg vs. Horse Power hp")

Figure 2: Miles per Gallon mpg vs. Horse Power hp
(Plot: mpg on the y-axis against hp on the x-axis, with points and linear fits colored by trans: Automatic, Manual.)

qplot(x = cyl, y = mpg, data = mtcars, color = trans, geom = c("point", "smooth"),
method = "lm", main = "Figure 3: Miles per Gallon mpg vs. No. of cylinders cyl")

Figure 3: Miles per Gallon mpg vs. No. of cylinders cyl
(Plot: mpg on the y-axis against cyl on the x-axis, with points and linear fits colored by trans: Automatic, Manual.)

f0 <- lm(mpg ~ wt, data = mtcars)


g <- ggplot(data = mtcars, aes(wt,mpg))
g <- g + geom_point(aes(color = trans))
g <- g + geom_hline(aes(yintercept = mean(mtcars[mtcars$am==0,1])),color ="dark grey")
g <- g + geom_hline(aes(yintercept = mean(mtcars[mtcars$am==1,1])),color ="dark grey")
g <- g + geom_abline(intercept = f0$coeff[1], slope = f0$coeff[2], color = "grey47")
g <- g + geom_abline(intercept = f2$coeff[1], slope = f2$coeff[2], color = "salmon")
g <- g + geom_abline(intercept = f2$coeff[1] + f2$coeff[3], slope = f2$coeff[2] + f2$coeff[4], color = "
g <- g + labs(title = "Figure 4: Regression Model Effects for linear model f2")
g

Figure 4: Regression Model Effects for linear model f2
(Plot: mpg on the y-axis against wt on the x-axis, with points colored by trans: Automatic, Manual.)

par(mfrow = c(2,2), oma = c(2,2,4,2))
plot(f2, sub.caption = "Figure 5: Linear model f2 characteristics")

Figure 5: Linear model f2 characteristics
(Four diagnostic panels: Residuals vs Fitted, Normal QQ, Scale-Location, and Residuals vs Leverage with Cook's distance
contours. Labelled points: Fiat 128, Merc 240D and Toyota Corolla; in Residuals vs Leverage: Fiat 128, Toyota Corolla
and Chrysler Imperial.)