
Multinomial Logistic Regression To Estimate The Influence Of Accident Factors On Accident Severity


Muhammad Husaini Nadri
Universiti Sains Malaysia, 11500 USM, Penang, Malaysia

Abstract: According to the Department for Transport, road traffic accidents are responsible for more than 3000 deaths per year in the UK. Although progress is being made in a number of areas, the number of vehicles involved in accidents has not fallen in line over the years. This study focuses on identifying the factors that contribute to the severity of UK traffic accidents. A dataset of UK traffic accidents from Kaggle, covering 2005 to 2007, 2009 to 2011, and 2012 to 2014, with 1.6 million instances and 34 attributes, was chosen for the analysis. A nominal multinomial logistic regression model was built. This type of regression analysis was used because of the mixed nature of the data. Multinomial regression was used to compare the accident severity levels of fatal injury, injury, and property damage only (PDO). The influential factors that affect accident severity include Light Conditions, Day of Week, Road Type, Road Class, Road Surface Condition, and Weather Conditions. The analysis shows that different factors have a statistically significant impact on accident severity.

Keywords: accident severity, multinomial regression

I. INTRODUCTION

Traffic accidents happen every day. Traffic crashes may result in injury, death, and property damage. Zaloshnja et al. (2004) stated that each crash costs $59,153 to $88,483 in year 2000 dollars. Their cost components include medical costs, emergency service costs, property damage, loss of productivity, monetized value of pain and suffering, and loss of quality of life due to injury or death. Accident severity is of special concern to researchers in traffic safety, since this research is aimed not only at the prevention of accidents but also at the reduction of their severity. One way to accomplish the latter is to identify the most probable factors that affect accident severity. A variety of factors contribute to the risk of an accident, including weather condition, road surface condition, light condition, day of week, speed limit, and road type.

This study aims to examine factors that are believed to have a higher potential for serious injury or death. Other factors were not examined because of substantial limitations in the data obtained from accident reports. Logistic regression was used in this study to estimate the effect of the statistically significant factors on accident severity. Multinomial logistic regression was used to determine the influential factors that contribute to accident severity.

II. LITERATURE REVIEW

A. Introduction

A detailed literature review of related studies was performed to collect information and build background knowledge for the thesis. The review is based on published research notes and journal papers on factors contributing to accidents. During the review, the available methodologies for analyzing influential factors for traffic crashes were also surveyed. In the research conducted on this topic so far, multinomial logistic regression models are the most common models used to analyze influential factors for traffic crashes.

B. Influential Factors

Giuliano et al. (2009) combined descriptive statistics and statistical modeling to analyze the factors associated with accidents in the state of California. From the descriptive investigation, it was observed that the fewest accidents occurred in the winter and early spring (January, February, and April) and that most accidents occurred during the late summer and early fall (August, September, and October). It was also observed that very few accidents happened during the late night and early morning, but that the accident rate tended to rise throughout the morning and peak in the early afternoon. Additionally, the researchers noticed a crash pattern by day of the week. The data indicated that accidents tended to be more frequent on weekdays, with a minimal accident rate over the weekend.

Chang and Mannering (1999) also state that roadway conditions, alcohol use, injury, restraint use, and weather are factors contributing to accidents. The speed variable also increased the likelihood of possible injury.

C. Logistic Regression Model

Nassar et al. (1997) developed an integrated accident risk model (ARM) for policy decisions using risk factors affecting both accident occurrences on road sections and severity of injury to occupants involved in the accidents. Using
negative binomial regression and a sequential binary logit formulation, they developed models that are practical and easy to use. Mercier et al. (1997) used logistic regression to determine whether age or gender (or both) was a factor influencing the severity of injuries suffered in head-on automobile collisions on rural highways.

Kim et al. (1995) built a structural model relating driver characteristics and behaviour to type of crash and injury severity. They explained that the structural model helps to clarify the role of driver characteristics and behaviour in the causal sequence leading to more severe injuries. They estimated the effects of various factors in terms of odds multipliers, that is, how much each factor increases or decreases the odds of more severe crash types and injuries.

Logistic regression methods have become an essential element of any data analysis concerned with the relationship between a dependent variable and one or more independent variables. A study by Bham et al. (2012) used a multinomial logistic (MNL) model to inspect the differences in accident contributing factors for six collision types and to identify the most significant factors contributing to accident severity. The multinomial model's estimation results indicate that the risk of a multivehicle crash was higher during weekdays, while the risk of a single-vehicle collision was higher over the weekend. The results also show that single-vehicle accidents were significantly associated with night time and wet conditions. The model also indicated that roadway grades and the presence of curves increased the severity of crashes.

Environmental factors such as the weather, the type of roadway, and the area surrounding a roadway also contribute to heavy vehicle crashes and crash severities. Khorashadi et al. (2005) investigated heavy vehicle crash severity in urban and rural areas. Their study used a multinomial logistic (MNL) model to model four outcomes of heavy vehicle crash severity in urban and rural conditions. It found some striking differences between the two area types and their respective models. Most notable was that the different models contained different variables. Additionally, variables shared by both models typically differed in magnitude and impact. These findings underscore the difference between urban and rural large truck crash severities and suggest that complex interactions between driver and other measurable environmental factors play a significant role in the demands placed on the driver in rural versus urban areas.

III. METHODOLOGY

The current study analyzed UK traffic accident data from the UK Department of Transport database. The data covered traffic crashes in the UK from 2005 to 2007, 2009 to 2011, and 2012 to 2014. In the model building part of the methodology, environmental factors such as road conditions, speed limit, light conditions, road surface condition, and others were used to determine the most influential factors for accident severity. After model building, the fitted regression model was tested to evaluate the results.

A. Data Description

The data set used in this study was derived from a sample of 1.5 million instances involved in serious accidents reported in traffic police records in the UK. Only the severity of the accident was considered for the purpose of this study, since the goal was to identify the factors that might affect accident severity. The dependent variable is accident severity, which has three levels.

The dependent variable, accident severity, was coded into three categories: property-damage-only (PDO), injury, and fatal injury. It is coded as 1 for PDO, 2 for injury, and 3 for fatal injury. The percentage of fatal crashes is 84.02%, the percentage of injury crashes is 13%, and the percentage of property-damage-only (PDO) crashes is 2.98%. A multinomial variable was created for crash severity and used as the dependent variable.

Figure 1 Distribution of Accident Severity

B. Logistic Regression

Logistic regression is a type of probabilistic statistical classification model that is used to predict the outcome of a categorical dependent variable based on one or more predictor variables. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually continuous, by using probability scores as the predicted values of the dependent variable. A chi-square test is used to indicate how well the logistic regression model fits the data.
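As a rough illustration of how a logit model turns a linear predictor into probability scores, the sketch below uses the standard logit form ln(P / (1 - P)) = a + bX and its multinomial extension with a reference category. The function names and all coefficient values are hypothetical and chosen only to exercise the functions; they are not estimates from this study.

```python
import math


def probability_from_logit(a, b, x):
    """Binary case: P(Y=1) implied by ln(P / (1 - P)) = a + b * x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def log_odds(p):
    """Natural log of the odds P / (1 - P)."""
    return math.log(p / (1.0 - p))


def multinomial_probabilities(linear_predictors):
    """Multinomial case with a reference category.

    linear_predictors holds one value per non-reference category
    (for example PDO and injury). The reference category (for
    example fatal injury) has its linear predictor fixed at 0.
    Returns the non-reference probabilities followed by the
    reference probability.
    """
    exps = [math.exp(v) for v in linear_predictors]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


# Hypothetical coefficients (assumptions, not fitted values).
p = probability_from_logit(a=-1.0, b=0.5, x=2.0)  # a + b*x = 0
probs = multinomial_probabilities([-3.0, -1.5])
print(p, probs, sum(probs))
```

Fixing the reference category's linear predictor at 0 is what makes the remaining coefficients interpretable as log odds relative to that category, which is how odds ratios against a reference severity level are read.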
1) Logistic Regression Equation

The logistic formulas are stated in terms of the probability that Y=1, which is referred to as P. The probability that Y is 0 is 1-P.

Equation 1: ln(P / (1 - P)) = a + bX

The ln symbol refers to a natural logarithm. P can also be computed from the regression equation. Therefore, if we know the regression equation, we can, theoretically, calculate the expected probability that Y=1 for a given value of X.

2) Logistic Regression Model Fit

a) Maximum Likelihood

Rather than finding the best-fitting line by minimizing the squared residuals, as ordinary least squares (OLS) does, logistic regression uses maximum likelihood (ML). ML is a way of finding the smallest possible deviance between the observed and predicted values using calculus. With ML, the computer uses different "iterations" in which it tries different solutions until it gets the smallest possible deviance, or best fit. Once it has found the best solution, it provides a final value for the deviance, which is usually referred to as "negative two log likelihood". According to Pedazur and Hosmer (1989), the deviance statistic is called -2LL and it can be thought of as a chi-square value.

C. Analysis of Data

A multinomial logistic regression model was developed. The test of parallel lines was used to check which type of multinomial logistic regression model should be used. Stepwise selection was used to select the best explanatory variables for the model; these variables were considered the influential factors. The significance level and chi-square were presented for each explanatory variable in the likelihood ratio tests and maximum likelihood estimates. They are important criteria for checking whether an explanatory variable is an influential factor for the dependent variable, accident severity.

IV. RESULTS

Several environmental factors that are considered to affect accident severity were analyzed using logistic regression in order to determine the most significant ones. The multinomial logistic regression model, which uses the maximum likelihood estimation method, was applied to estimate statistically the effects of these variables on the occurrence of the accident severity levels. Predictor variables were tested at the 95% confidence level (a significance level of 0.05). The first part of this chapter presents the descriptive results of the UK traffic accident data analyzed in this study. The second part discusses the results of the multinomial logistic regression modelling, and the third part discusses how the influential factors affect the results.

A. Results of Logistic Regression Modelling for Predictor Variables

The response variable in the dataset, which consisted of three levels of crash severity, was modelled as a nominal variable. 38 predictor variables were selected from the dataset, six of which were grouped variables. The main objective of this task was to investigate the complex relationships between accident severity and the 38 selected predictor variables using logistic regression modelling. The final product of this analysis was the identification of the influential factors that should be used in the regression model for predicting accident severity.

The first part of this analysis checked whether the dependent variable, accident severity, is nominal by using the test of parallel lines. In the second part, the logistic regression model was built using stepwise selection. The final results contained the variables that were selected by the stepwise selection procedure.

B. Dependent Variable Crash Severity Checking

a) Test of Parallel Lines Procedure

Model              Chi-Square   df   Sig.
Null Hypothesis
General            65.226       34   .001

From the test of parallel lines results shown in Table 4.1 above, the significance level is less than 0.05, so the result is significant. Therefore the dependent variable, accident severity, is treated as nominal, and the nominal multinomial logistic regression model was built.

b) Accident Severity Frequency

            Frequency   Percent   Valid Percent
Valid   1   19441       2.98      2.98
        2   204504      13        13
        3   1280205     84.02     84.02
Total       1504150     100.0     100.0

From the dependent variable crash severity frequency table shown above, a total of 1504150 observations were used, of which 1.3% involved property-damage-only (PDO), 13.6% involved injury, and 85.1% involved fatal injury.
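The shares quoted in the sentence above can be recomputed directly from the frequency counts. The short sketch below does that arithmetic; the variable and function names are illustrative, and the only inputs are the three counts from the frequency table.

```python
# Frequency counts by coded severity level, taken from the frequency table.
counts = {1: 19441, 2: 204504, 3: 1280205}


def severity_percentages(level_counts):
    """Return each level's share of all observations, in percent."""
    total = sum(level_counts.values())
    return {level: 100.0 * n / total for level, n in level_counts.items()}


total = sum(counts.values())
percentages = severity_percentages(counts)
rounded = {level: round(pct, 1) for level, pct in percentages.items()}
print(total, rounded)  # 1504150 {1: 1.3, 2: 13.6, 3: 85.1}
```

The counts sum to the stated total of 1504150 observations, and the rounded shares agree with the 1.3%, 13.6%, and 85.1% given in the text.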
C. Nominal Logistic Regression Model Building

1) Stepwise Selection Procedure

Table 1 Stepwise Summary

From the summary of stepwise selection shown in the table, the stepwise selection procedure removed one variable, Pedestrian Crossing Physical Facilities, when building the multinomial logistic regression model. The variable was removed using the Wald statistic criterion in backward stepwise selection, because its significance value was higher than 0.05. This variable is not statistically significant to the model and does not have an influential relationship with the dependent variable, accident severity.

2) Maximum Likelihood Estimates

Table 2 Maximum Likelihood Estimates

The chi-square is the important criterion in the maximum likelihood estimates for identifying the influential factors for the model. If the Sig. value (p-value) is less than 0.05, the factor is significant to the dependent variable, accident severity. From the table, it can be seen that Day of Week, Light Conditions, Road Type, and Road Class are statistically significant to the dependent variable, since all of their Sig. values are less than 0.05.

3) Odds Ratio Estimates

Table 3 Odds Ratio Estimates

                                                                            95% Confidence Interval for Exp(B)
Parameter                                                       Odds Ratio  Lower Bound  Upper Bound

Accident_Severity (a) = 1
Intercept
[Weather_Conditions=Fine with high winds]                       .216        .023         2.044
[Weather_Conditions=Fog or mist]                                6.930E-8    .000         .b
[Weather_Conditions=Raining with high winds]                    .664        .132         3.347
[Weather_Conditions=Raining without high winds]                 .395        .120         1.297
[Weather_Conditions=Snowing with high winds]                    18.650      1.507        230.806
[Weather_Conditions=Snowing without high winds]                 1.267       .119         13.424
[Day_of_Week=1]                                                 .875        .435         1.759
[Day_of_Week=2]                                                 1.146       .621         2.114
[Day_of_Week=3]                                                 .694        .353         1.364
[Day_of_Week=4]                                                 .861        .450         1.646
[Day_of_Week=5]                                                 .997        .533         1.867
[Day_of_Week=6]                                                 1.382       .772         2.477
[Light_Conditions=Darkeness: No street lighting]                6.031       3.891        9.349
[Light_Conditions=Darkness: Street lights present and lit]      1.463       .951         2.249
[Light_Conditions=Darkness: Street lights present but unlit]    3.003E-7    .000         .b
[Road_Surface_Conditions=Dry]                                   1.868       .566         1.329
[Road_Surface_Conditions=Flood (Over 3cm of water)]             5.179       .423         63.347
[Road_Surface_Conditions=Frost/Ice]                             .584        .124         2.763
[Road_Surface_Conditions=Snow]                                  1.696E-7    .000         .b
[Road_Type=Dual carriageway]                                    1.492       .174         12.785
[Road_Type=One way street]                                      2.602E-7    .000         .b
[Road_Type=Roundabout]                                          .380        .034         4.193
[Road_Type=Single carriageway]                                  1.167       .140         9.743
[Road_Type=Slip road]                                           .962        .051         18.131
[@1st_Road_Class=1]                                             1.913       .800         4.574
[@1st_Road_Class=2]                                             1.748E-7    .000         .b
[@1st_Road_Class=3]                                             1.896       1.165        3.084
[@1st_Road_Class=4]                                             2.230       1.250        3.978
[@1st_Road_Class=5]                                             1.060       .482         2.331

Accident_Severity (a) = 2
Intercept
[Weather_Conditions=Fine with high winds]                       .656        .227         1.896
[Weather_Conditions=Fine without high winds]                    1.137       .594         2.175
[Weather_Conditions=Fog or mist]                                .304        .037         2.537
[Weather_Conditions=Other]                                      .713        .287         1.773
[Weather_Conditions=Raining with high winds]                    .588        .173         1.999
[Weather_Conditions=Raining without high winds]                 1.184       .580         2.417
[Weather_Conditions=Snowing with high winds]                    1.607       .124         20.869
[Weather_Conditions=Snowing without high winds]                 1.172       .244         5.635
[Day_of_Week=1]                                                 .978        .716         1.336
[Day_of_Week=2]                                                 .737        .540         1.005
[Day_of_Week=3]                                                 .591        .431         .811
[Day_of_Week=4]                                                 .781        .577         1.057
[Day_of_Week=5]                                                 .763        .561         1.036
[Day_of_Week=6]                                                 .942        .707         1.255
[Light_Conditions=Darkeness: No street lighting]                2.026       1.474        2.784
[Light_Conditions=Darkness: Street lighting unknown]            1.218       .462         3.207
[Light_Conditions=Darkness: Street lights present and lit]      1.462       1.196        1.788
[Light_Conditions=Darkness: Street lights present but unlit]    1.032       .233         4.568
[Road_Surface_Conditions=Dry]                                   1.213       .952         1.545
[Road_Surface_Conditions=Flood (Over 3cm of water)]             1.057       .000         .b
[Road_Surface_Conditions=Frost/Ice]                             1.269       .613         2.626
[Road_Surface_Conditions=Snow]                                  2.311       .496         10.761
[Road_Type=Dual carriageway]                                    1.017       .345         2.993
[Road_Type=One way street]                                      1.034       .313         3.417
[Road_Type=Roundabout]                                          .782        .256         2.395
[Road_Type=Single carriageway]                                  1.379       .482         3.942
[Road_Type=Slip road]                                           .876        .195         3.932
[@1st_Road_Class=1]                                             .911        .516         1.610
[@1st_Road_Class=2]                                             1.066       .127         8.928
[@1st_Road_Class=3]                                             1.239       1.007        1.524
[@1st_Road_Class=4]                                             1.008       .752         1.352
[@1st_Road_Class=5]                                             1.201       .880         1.641

Table 3 shows the odds ratios for the predictors. The reference group for this table is accident severity 3. If the odds ratio is less than 1, the outcome is more likely to be in the referent group.

PDO relative to fatal injury

Weather conditions of Fine with high winds, Fog or mist, Raining with high winds, and Raining without high winds have a higher tendency to increase accident severity to fatal injury. All working days from Monday to Friday have odds ratios below 1, which indicates that fatal injuries are more likely. All light conditions, which include no street lighting, street lighting unknown, street lights present and lit, and street lights present but unlit, are more likely to result in the injury severity level, since the odds ratios are greater than 1. Accidents on Dry and Flood (Over 3cm of water) road surfaces are likely to cause property damage only, but accidents on Frost and Snow road surfaces are more likely to result in fatal injuries. Accidents at roundabouts are more likely to be fatal because the odds ratio is less than 1. However, for Dual carriageway, One way street, and Single carriageway, accident severity is more likely to be limited to injury because the odds ratio is greater than 1.
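When reading rows of Table 3, it can help to check each odds ratio against its confidence interval. The sketch below assumes the intervals are Wald-type, that is Exp(B) with bounds exp(B ± z·SE), which is an assumption about how the table was produced rather than something stated in the text. Under that assumption the odds ratio is the geometric mean of its bounds, B and its standard error can be recovered from the reported values, and a factor level is significant at the 0.05 level only when the interval excludes 1. The function names are illustrative.

```python
import math

# Two-sided 95% standard normal quantile (assumes Wald-type intervals).
Z_95 = 1.959963984540054


def recover_coefficient(odds_ratio, lower, upper, z=Z_95):
    """Recover B = ln(Exp(B)) and its standard error from a reported CI."""
    b = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return b, se


def geometric_mean(lower, upper):
    """For exp(B +/- z*SE) bounds this equals Exp(B)."""
    return math.sqrt(lower * upper)


def ci_excludes_one(lower, upper):
    """True when the interval does not contain an odds ratio of 1."""
    return lower > 1.0 or upper < 1.0


# Severity 1 rows from Table 3.
# [Weather_Conditions=Snowing with high winds]: 18.650 (1.507, 230.806)
b, se = recover_coefficient(18.650, 1.507, 230.806)
snow_significant = ci_excludes_one(1.507, 230.806)   # True
# [Day_of_Week=1]: .875 (.435, 1.759)
day1_significant = ci_excludes_one(0.435, 1.759)     # False
print(b, se, snow_significant, day1_significant)
```

A check of this kind separates levels whose odds ratio merely lies above or below 1 from those whose interval actually excludes 1, which is the stronger basis for calling a factor influential.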
Injury relative to fatal injury

Accidents at roundabouts are more likely to be fatal because the odds ratio is less than 1. However, for Dual carriageway, One way street, and Single carriageway, accident severity is more likely to be limited to injury because the odds ratio is greater than 1. Weather conditions of Raining without high winds, Snowing with high winds, and Snowing without high winds have a higher tendency to result in injury. The light conditions of no street lighting and street lights present but unlit have odds ratios below 1, which indicates that fatal injuries are more likely. All working days from Monday to Friday have odds ratios below 1, which indicates that fatal injuries are more likely. Accidents on Dry and Flood (Over 3cm of water) road surfaces are likely to cause injury only, but accidents on Frost and Snow road surfaces are more likely to result in fatal injuries.

V. CONCLUSIONS AND FUTURE RECOMMENDATIONS

This study examined the influential factors for UK traffic accidents. The main objective of the current study was to determine the influential factors that contribute significantly to the accident severity levels when an accident happens. Based on this objective, data from 2005 to 2007, 2009 to 2011, and 2012 to 2014 obtained from the UK Department of Transport was used for this analysis. In this study, a nominal multinomial logistic regression model was built to investigate the characteristics of injury and fatality in the UK.

The multinomial logistic regression model was used because it has the ability to detect influential factors for accident severity. The dependent variable was accident severity, which was coded into three categories: property-damage-only (PDO), injury, and fatal injury. The explanatory variables include several traffic and environmental factors. They are considered to have an effect on accident severity and were analyzed using statistical modeling techniques in order to determine the most significant ones. A total of eight variables were selected for exploratory analysis to investigate the characteristics of the predictor variables and screen out the most influential ones. A multinomial logistic regression model was built to investigate the influential factors for accident severity in the UK.

The results of the current study suggest that several explanatory variables influence the likelihood that an accident results in fatal injury. The influential factors include Light Conditions, Day of Week, Road Type, Road Class, Road Surface Condition, and Weather Conditions. Accidents in weather conditions of Fine with high winds, Fog or mist, Raining with high winds, and Raining without high winds have a higher tendency to increase accident severity to fatal injury. Day of week also influences accident severity: accidents from Monday to Friday are more likely to cause fatal injuries. The light conditions of no street lighting and street lights present but unlit are more likely to result in fatal injuries. Accidents on Dry and Flood (Over 3cm of water) road surfaces are likely to cause property damage and injury only, but accidents on Frost and Snow road surfaces are more likely to result in fatal injuries. No other factors were considered significant for the likelihood of a given accident severity.

Therefore, it is recommended that drivers who get involved in, or more correctly, drivers who cause motorcycle-motor vehicle crashes should be educated on the influential factors affecting such crashes in order to take effective countermeasures.

Future Recommendation

For future work, the following recommendations can be carried out.
1. Other models, such as log-linear models and nested logistic models, can be used to predict accident severity.
2. Different variables, such as different crash types and different roadways, can be used to produce a better model.

REFERENCES

[1] Zaloshnja, E., & Miller, T. (2004). Costs of large truck-involved crashes in the United States. Accident Analysis and Prevention, 36, 801-808.
[2] Giuliano, G., Zhou, J., McFerrin, P., & Miller, M. (2009). Commercial Motor Vehicles' Safety: A California Perspective. Institute of Transportation Studies, University of California, Berkeley. Retrieved November 2012 from: http://www.path.berkeley.edu/PATH/Publications/PDF/PWP/2010/PWP-2010-01.pdf
[3] Chang, L., & Mannering, F. (1999). Analysis of injury severity and vehicle occupancy in truck- and non-truck-involved accidents. Accident Analysis and Prevention, 31(5), 579-592.
[4] Bham, G.H., Bhanu, J.S., & Manepalli, U.R. (2012). Multinomial Logistic Regression Model for Single-Vehicle and Multivehicle Collisions on Urban U.S. Highways in Arkansas. Journal of Transportation Engineering, 138(6), 786-797.
[5] Khorashadi, A., Niemeier, D., Shankar, V., & Mannering, F. (2005). Differences in Rural and Urban Driver-Injury Severities in Accidents Involving Large-Trucks: An Exploratory Analysis. Accident Analysis and Prevention, 37(5), 910-921.
[6] Nassar, S.A., Saccomanno, F.F., & Shortreed, J.H. (1997). Integrated Risk Model (ARM) of Ontario. Presented at the 76th Annual Meeting of the Transportation Research Board, Washington, DC.
[7] Mercier, C.R., Shelley, M.C., Rimkus, J., & Mercier, J.M. (1997). Age and gender as predictors of injury severity in head-on highway vehicular collisions. In: Transportation Research Record 1581, TRB, National Research Council, Washington, DC.
[8] Kim, K., Lawrence, N., Richardson, J., & Li, L. (1995). Personal and behavioural predictors of automobile crash and injury severity. Accident Analysis and Prevention, 27(4), 469-481.
[9] Hosmer, D.W., & Lemeshow, S. (1989). Applied Logistic Regression. New York, NY: John Wiley & Sons, Inc.
