
DELHI TECHNOLOGICAL UNIVERSITY

ECONOMETRICS PROJECT REPORT

Linear Regression Analysis on Mileage of Heavy Trucks and their Fuel Consumption

Group Members:
Prateek Singh (2K13/EL/062)
Ravi Kumar (2K13/EL/069)
Ravi Ranjan (2K13/EL/070)
Rohan Mathur (2K13/EL/072)
Vishal Dabas (2K13/EL/094)

Submitted To:
Dr. Seema Singh
LINEAR REGRESSION ANALYSIS PROJECT

ABSTRACT
A simple linear regression analysis is conducted on a set of linearly trending data to quantify how
much of the variability in the observations the fitted line explains. The data chosen are the Mileage
of Heavy Trucks for the years 1949 to 2010 (xi) and the Fuel Consumption for that year (yi). The data
are analyzed with the regression tool in Microsoft Excel to test whether they can be used to predict
the fuel consumption for a given mileage. The observations are first separated into training data and
test data. The regression is fitted on the training data, and the fitted line is used to predict fuel
consumption for any value of Mileage within the training data range. A hypothesis test on the slope
parameter is also carried out to check that the relationship is statistically significant. The fitted
line is then applied to the test data, the error of prediction and the mean absolute and mean relative
errors are calculated, and conclusions are drawn from the results.


DATA

Data collection is the most important step in a simple regression analysis. The dataset consists of
an independent variable (xi) and a dependent variable (yi), and a scatter plot of the two shows
whether a linear regression line is appropriate.

Figure 1. Scatterplot of Mileage of heavy truck vs Fuel consumption

The data collection requires a dataset of at least 50 observations of a continuous dependent and a
continuous independent variable (U.S Department of Transportation, 2012). Here, the independent
variable is Mileage of Heavy Truck (xi) and the dependent variable is Fuel Consumption (yi). The
value of R2 gives the proportion of the variability in yi that the model explains. The R2 value must
not be too small, as a small value may affect the hypothesis testing of the model. The data chosen
for the linear regression analysis consist of 62 observations of Mileage of Heavy Truck (xi) vs Fuel
Consumption (yi), shown in the scatterplot above.

To split the data, a random value is assigned to each observation using the RAND() function in Excel.
The random values are copied and pasted back as values only, so that they no longer recalculate, and
the observations are sorted by the random numbers from smallest to largest. The training data are 80
percent of the 62 observations, i.e. 50 observations (0.8 x 62 = 49.6, rounded up), and the test data
are the remaining 12 observations (about 20 percent). The regression analysis is carried out on the 50
training observations, which are listed in the Appendix, using the Data -> Data Analysis tool. The
prediction analysis is carried out on the test data.
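
The splitting procedure described above can be sketched in Python. This is only an illustration of the same idea (random key per row, sort, take the first 80 percent), not the Excel workflow itself; the function name `split_train_test`, the fixed seed and the placeholder rows are assumptions made for the example.

```python
import random

def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle rows by a random key and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    # Attach a uniform random key to each row, like a RAND() helper column.
    keyed = [(rng.random(), row) for row in rows]
    # Sort by the random key, smallest to largest.
    keyed.sort(key=lambda pair: pair[0])
    shuffled = [row for _, row in keyed]
    n_train = round(train_fraction * len(rows))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder rows: the 62 years 1949..2010 stand in for the full observations.
rows = list(range(1949, 2011))
train, test = split_train_test(rows)
print(len(train), len(test))  # 50 12
```

Rows with the smallest random keys go to the training set and the remaining rows form the test set, so every observation lands in exactly one of the two.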


Simple Linear Regression Analysis

The input ranges of X and Y for the training data set are entered in the Data Analysis -> Regression
tool. The default confidence level of 95% is used, and the line fit plot, residual plot and normal
probability plot options are ticked. The generated output of the analysis is shown in Table 1. The
equation of the fitted regression line is y = 0.178x - 271.48
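
For readers who want to see what the regression tool computes, the least-squares slope and intercept can be sketched with the closed-form formulas b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar. The helper name `fit_line` is hypothetical, and the example uses only the first five training observations from the Appendix, so its coefficients will differ from the 50-observation fit in Table 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# First five training observations from the Appendix (illustration only).
xs = [26602, 10537, 18736, 10768, 26092]
ys = [4477, 1341, 3447, 1303, 4221]
b0, b1 = fit_line(xs, ys)
print(f"fitted line on the 5-point subset: y = {b1:.4f}x + ({b0:.2f})")
```

Passing all 50 training pairs to the same function should give coefficients close to those reported by Excel.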

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.973630211
R Square             0.947955788
Adjusted R Square    0.946871534
Standard Error       274.3523783
Observations         50

ANOVA
              df    SS             MS             F            Significance F
Regression    1     65807340.7     65807340.7     874.29276    1.83285E-32
Residual      48    3612922.919    75269.22748
Total         49    69420263.62

                          Coefficients    Standard Error    t Stat          P-value      Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept                 -271.4825557    118.9314621       -2.282680721    0.0269201    -510.6102872    -32.3548242    -510.610287     -32.35482424
Mileage of Heavy truck    0.17802848      0.006020895       29.56844198     1.833E-32    0.16592266      0.1901343      0.16592266      0.190134301

Table 1. Excel Output of the Training dataset

Interpretation of the slope: If the Mileage of the heavy truck increases by one mile per vehicle,
we predict that the fuel consumption increases by approximately 0.178 gallons per vehicle.

Figure 2. Fitted line plot overlaid on the scatterplot

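Interpretation of the intercept: If the Mileage of the heavy truck is zero miles per vehicle, we
predict that the fuel consumption is -271.48 gallons per vehicle. A negative consumption has no
physical meaning; x = 0 lies far outside the observed range of Mileage, so the intercept only fixes
the position of the fitted line.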

F-test of the training data: The hypotheses for the slope parameter are
$H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$. From the ANOVA table of the training data, the F value
(874.29) is far greater than the critical value $F_{0.05;\,1,\,48} \approx 4.04$. So, we reject H0.

P-value of the F-test: The value of the F statistic is 874.2927609, and its p-value (Significance F
in Table 1) is 1.83E-32, i.e. < 0.00001. The p-value is less than the significance level of 0.05, so
we reject H0. The slope parameter is therefore not zero.
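
The F statistic can be checked directly from the ANOVA entries of Table 1. The short sketch below recomputes F = MSR / MSE and also compares it with the square of the slope's t statistic, since the two agree in simple linear regression. The variable names are chosen for this example only.

```python
# Values copied from the ANOVA and coefficient rows of Table 1.
ss_regression = 65807340.7
ss_residual = 3612922.919
df_regression, df_residual = 1, 48

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # this is the MSE
f_stat = ms_regression / ms_residual

# t statistic of the slope = coefficient / standard error.
t_slope = 0.17802848 / 0.006020895

print(round(f_stat, 2), round(t_slope ** 2, 2))  # both about 874.29
```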

Conclusion: There is strong evidence that the slope parameter for the given dataset is not zero.
We have statistically significant evidence at $\alpha = 0.05$ that there is a linear relation between
the mileage and the fuel consumption of heavy trucks. This supports using the fitted line to predict
fuel consumption within the range of the training data.

Interpretation of the R2 value: From the scatter plot of the data plotted for all the observations,
the R2 value is 0.9468. This value is the proportion of the variance of the data that the model
explains: 94.68% of the variability in fuel consumption is explained by the linear model with Mileage
of Heavy Truck as the independent variable. For the training data alone, Table 1 reports an R Square
of 0.9480.
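
As a cross-check, R Square and Adjusted R Square for the training fit can be recomputed from the sums of squares in Table 1. This is a minimal sketch using the standard definitions R2 = SSR / SST and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1); the variable names are assumptions made for the example.

```python
# Sums of squares from the ANOVA part of Table 1.
ss_regression = 65807340.7
ss_total = 69420263.62
n, k = 50, 1  # observations and number of predictors

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 4), round(adj_r_squared, 4))  # 0.948 0.9469
```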

Residual Plot: The vertical axis of the residual plot shows the residuals, and the horizontal axis
shows the independent variable, i.e. Mileage of Heavy Truck for the different years. The data points
are scattered (randomly dispersed) around the horizontal axis, which indicates that the linear
regression model is appropriate.

Figure 3. Residual Plot of the data

Confidence interval and prediction interval:

Take a value of x0 within the x-range of the training data, say x0 = 10000. The predicted value from
the equation of the fitted line is $\hat{y} = 0.178(10000) - 271.48 = 1508.52$.

The two-sided confidence interval for the mean response $E[Y \mid x = x_0]$ is given by

$\hat{y}|_{x=x_0} \pm t_{\alpha/2,\,n-2} \cdot \mathrm{s.e.}\{\hat{y}|_{x=x_0}\}$

Standard error of the slope: $\mathrm{s.e.}\{b_1\} = \sqrt{MSE / S_{xx}}$

The value of MSE (mean squared error) from the ANOVA table is 75269.2274792751 and the value of
$S_{xx} = \sum x_i^2 - n\bar{x}^2 = 2076325186$, which gives s.e.{b1} = 0.00602, in agreement with
Table 1.

The standard error of the fitted value at x0 is
$\mathrm{s.e.}\{\hat{y}|_{x=x_0}\} = \sqrt{MSE \left( 1/n + (x_0 - \bar{x})^2 / S_{xx} \right)} = 65.0528$,
and $t_{\alpha/2,\,n-2} = t_{0.025,\,48} = 2.011$.

So, the confidence interval is (1377.69881, 1639.3411). Thus, we are 95% confident that the mean
fuel consumption at x0 = 10000 lies in this interval.
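
The arithmetic of the confidence interval can be reproduced in a few lines. This sketch simply plugs in the figures quoted in this section (rounded coefficients, the standard error 65.0528 and the critical value 2.011); the variable names are assumptions made for the example.

```python
# Rounded coefficients of the fitted line and the chosen x0.
b0, b1 = -271.48, 0.178
x0 = 10000
y_hat = b0 + b1 * x0            # predicted fuel consumption at x0

se_mean = 65.0528               # standard error of the fitted value at x0
t_crit = 2.011                  # t(0.025, 48) as quoted in the text

half_width = t_crit * se_mean
ci_lower, ci_upper = y_hat - half_width, y_hat + half_width
print(round(y_hat, 2), round(ci_lower, 2), round(ci_upper, 2))  # 1508.52 1377.7 1639.34
```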


Prediction interval: The prediction error is
$\mathrm{p.e.}\{\hat{y}|_{x=x_0}\} = \sqrt{MSE + (\mathrm{s.e.}\{\hat{y}|_{x=x_0}\})^2} = 281.959$.

So, the prediction interval is $\hat{y}|_{x=x_0} \pm t_{\alpha/2,\,n-2} \cdot \mathrm{p.e.}\{\hat{y}|_{x=x_0}\}$.

Therefore, the prediction interval for the given value of x0 is (941.4996, 2075.54). We are 95%
confident that a new observation of fuel consumption at x0 lies in this interval. The prediction
interval is always wider than the confidence interval.
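
The prediction interval can be verified the same way. The sketch below combines the MSE with the standard error of the fitted value to obtain the prediction error and then the interval; all inputs are the figures quoted in this section and the variable names are assumptions made for the example.

```python
import math

mse = 75269.2274792751          # mean squared error from the ANOVA table
se_mean = 65.0528               # standard error of the fitted value at x0
t_crit = 2.011                  # t(0.025, 48) as quoted in the text
y_hat = 1508.52                 # predicted fuel consumption at x0 = 10000

# Prediction error adds the variance of a single new observation (MSE)
# to the variance of the fitted value.
pred_error = math.sqrt(mse + se_mean ** 2)

pi_lower = y_hat - t_crit * pred_error
pi_upper = y_hat + t_crit * pred_error
print(round(pred_error, 3), round(pi_lower, 2), round(pi_upper, 2))  # 281.959 941.5 2075.54
```

Because the prediction error always exceeds the standard error of the fitted value, the resulting interval is wider than the confidence interval for the mean at the same x0.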


Prediction Analysis and Conclusions

The test data set with the corresponding predicted values and errors is shown in the table:

        Mileage of   Fuel                             Error of      Absolute     Relative     Absolute
        Heavy truck  Consumption  Random   Predicted  prediction    error of     error of     relative error
Year    (xi)         (yi)         value    value (ŷ)  P.E = y - ŷ   prediction   prediction   of prediction
1960    26609        4174         0.85866  4465.66    -291.6641     291.66405    -0.06988     0.069876
1976    15167        2722         0.87599  2428.67    293.3323      293.33232    0.107764     0.107764
1957    10682        1281         0.87981  1630.21    -349.2121     349.2121     -0.27261     0.272609
1995    10963        1283         0.89824  1680.24    -397.238      397.23796    -0.30962     0.309616
1986    22143        3821         0.90461  3670.59    150.409       150.409      0.039364     0.039364
1993    10769        1288         0.91498  1645.7     -357.7005     357.70053    -0.27772     0.277718
1994    10395        1380         0.93487  1579.12    -199.1181     199.11806    -0.14429     0.144288
2003    19931        3647         0.96242  3276.79    370.2069      370.20693    0.10151      0.10151
1968    28573        4387         0.97485  4815.31    -428.311      428.31104    -0.09763     0.097632
1951    10545        1242         0.9843   1605.82    -363.8223     363.82226    -0.29293     0.292933
1977    27023        4057         0.98496  4539.37    -482.3676     482.36764    -0.1189      0.118898
1966    26014        4352         0.99553  4359.74    -7.737392     7.737392     -0.00178     0.001778

MAPE = 307.59327    MARE = 0.152832


Table 2. Test data set with corresponding predicted values

The table shows, for each xi of the test data, the predicted value from the fitted line (ŷ), the
error of prediction (P.E = y - ŷ), the absolute error of prediction (A.P.E = |P.E|), the relative
error of prediction (R.E = P.E / |y|) and the absolute relative error of prediction (A.R.E = |R.E|).

MAPE (Mean Absolute Error of Prediction) from the test data is 307.59327. MAPE measures how far
the predicted values are from the actual values on average, in the same units as the data. Here the
model describes fuel consumption as a function of the Mileage of heavy trucks over the years. The
MAPE value gives the typical error in a predicted value, which is quite high here: a prediction of
fuel consumption for a given Mileage deviates from the actual value by about 307.593 units on
average, and this weakens the model.

MARE (Mean Absolute Relative Error of Prediction) from the test data prediction analysis is 0.15283.
MARE measures the predictive power of the model using the observations in the test data: it is the
average of the absolute relative errors of prediction of y, i.e. the average fraction by which a
predicted fuel consumption differs from the actual value. The lower the value of MARE, the higher
the predictive power. Here the MARE is 15.28%, which is quite large and indicates that the model
predicts the test observations less accurately. (Thomas Bruckmann, 2014)

This implies that the prediction model for Mileage vs Fuel Consumption is likely to be less accurate,
owing to the high values of MAPE and MARE. Predictions of fuel consumption from the Mileage of Heavy
Truck should therefore be made with these errors in mind.
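
The two error measures discussed above can be recomputed from the test observations. The sketch below is an illustration only: it assumes the fitted coefficients rounded to slope 0.178028 and intercept -271.483 (a rounding of Table 1 that appears to reproduce the predicted values in Table 2), and the variable names are chosen for the example.

```python
# (Mileage xi, Fuel Consumption yi) for the 12 test observations in Table 2.
test_data = [
    (26609, 4174), (15167, 2722), (10682, 1281), (10963, 1283),
    (22143, 3821), (10769, 1288), (10395, 1380), (19931, 3647),
    (28573, 4387), (10545, 1242), (27023, 4057), (26014, 4352),
]

# Assumed rounding of the Table 1 coefficients.
b0, b1 = -271.483, 0.178028

# Error of prediction P.E = y - y_hat for each test observation.
errors = [y - (b0 + b1 * x) for x, y in test_data]

# Mean of the absolute errors (called MAPE in this report).
mape = sum(abs(e) for e in errors) / len(errors)

# Mean of the absolute relative errors |P.E| / |y| (MARE).
mare = sum(abs(e) / abs(y) for e, (_, y) in zip(errors, test_data)) / len(errors)

print(round(mape, 2), round(mare, 4))  # about 307.59 and 0.1528
```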

General Conclusion:

A prediction model for Fuel Consumption as a function of the Mileage of Heavy Trucks over
consecutive years has been developed using simple linear regression. On the training data, the
fitted line predicts fuel consumption accurately within the x-range of the training data. On the test
data, however, the prediction errors are quite high, which reduces the predictive power of the model.
Thus, the model cannot be used reliably for prediction because of the large prediction errors.


APPENDIX
Table of the training data with the corresponding fitted values and residuals

Observation Year Mileage of Heavy truck (xi) Fuel Consumption (yi) Predicted Fuel Consumption (Fitted values) Residuals
1 1970 26602 4477 4464.431077 12.56892
2 1965 10537 1341 1604.403541 -263.404
3 1956 18736 3447 3064.059051 382.9409
4 2009 10768 1303 1645.52812 -342.528
5 1979 26092 4221 4373.636552 -152.637
6 1996 20597 3570 3395.370053 174.6299
7 1973 26514 4315 4448.76457 -133.765
8 1955 26235 4385 4399.094624 -14.0946
9 2006 10693 1333 1632.175984 -299.176
10 1963 23603 3953 3930.523664 22.47634
11 1949 15370 2775 2464.815186 310.1848
12 1964 18502 3380 3022.400386 357.5996
13 2008 26274 4037 4406.037735 -369.038
14 2000 14117 2519 2241.7455 277.2545
15 1954 10576 1293 1611.346652 -318.347
16 1967 27071 4642 4547.926434 94.07357
17 1997 13565 2467 2143.473779 323.5262
18 2004 12789 2294 2005.323679 288.6763
19 1987 9712 1080 1457.530045 -377.53
20 1981 19016 3565 3113.907025 451.093
21 1998 25231 4304 4220.35403 83.64597
22 1983 22550 3967 3743.059675 223.9403
23 1990 28093 4215 4729.871541 -514.872
24 1953 13484 2459 2129.053472 329.9465
25 1971 26262 4309 4403.901393 -94.9014
26 1999 10511 1309 1599.7748 -290.775
27 1975 27032 4218 4540.983323 -322.983
28 1991 10316 1229 1565.059247 -336.059
29 2002 25838 4202 4328.417318 -126.417
30 2001 10702 1328 1633.77824 -305.778
31 1985 25617 4391 4289.073024 101.927
32 1972 18045 3263 2941.041371 321.9586
33 1961 21083 3769 3481.891894 287.1081
34 1959 10851 1387 1660.304484 -273.304
35 1962 10408 1389 1581.437867 -192.438
36 1982 23349 3937 3885.30443 51.69557
37 2010 22485 3736 3731.487823 4.512177
38 1969 12402 2240 1936.426657 303.5733
39 1988 22926 3776 3809.998383 -33.9984
40 1980 16700 3002 2701.593065 300.4069
41 1950 12537 2250 1960.460502 289.5395
42 1952 28290 4398 4764.943151 -366.943
43 2005 15438 2764 2476.921123 287.0789
44 1978 25397 4135 4249.906758 -114.907
45 1992 25373 4210 4245.634074 -35.6341
46 2007 10554 1337 1607.430025 -270.43
47 1989 14995 2708 2398.054506 309.9455
48 1958 24229 4047 4041.969493 5.030507
49 1974 10774 1304 1646.596291 -342.596
50 1984 14780 2657 2359.778383 297.2216

Table 3. Training data with corresponding fitted values and residual
