ISM6423 Project Proposal

Project Title
Data Mining on Traffic
Group Information
Group Name: JohnSnow
Group Members:
Chenliang Gong: gchenlia@ufl.edu
Hui Zhao: huizhao@ufl.edu
Rong Jian: jrong@ufl.edu
Yujia Yang: yujia.yang@ufl.edu
Zhuo Liu: zhuoliu_mario@yahoo.com
Zi Yang: yz1991@ufl.edu
Introduction
Traffic accidents, as well as their injury severity, can be affected by higher highway speed
limits and by changes in a state's traffic laws. This project aims to identify the critical factors
related to traffic accidents and traffic fatalities in California from 1981 to 1989, using monthly
time series data and a panel data model.
This paper will choose several independent variables and related dependent variables to
build different regression models in order to analyze and estimate the correlations and
relationships between these variables.
Research Questions
Does the state unemployment rate affect traffic fatalities?
How did the traffic fatality trend change after the seat belt law became effective?
How did the traffic fatality trend change after the speed limit law became effective?
How does the impact of the seat belt law differ across types of accidents?
What is the accident trend across the months of a year?
Research Objectives
To identify and evaluate the impact of traffic law changes on traffic fatalities.
To analyze the traffic data on different variables, including accident types, speed, time trend, etc.
To analyze the traffic data and identify the variables that yield the best predictive model.
Methodology
The research approach will include a comprehensive study of data mining techniques
from various sources, including but not limited to the research by McCarthy, as well as books and
online sources. The data set used in this research, which was also used by McCarthy, was obtained
online. The data set will be analyzed using the data analysis tool STATA. Monthly dummy variables
will be evaluated to show how traffic accidents vary across months. Reasonable variables will then
be selected to perform the regression analysis. Based on the values of the coefficients, the number
of variables will be reduced to find the best-fitting predictive model.
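As a minimal sketch of the monthly dummy variables mentioned above, the Python code below builds eleven month indicators for a monthly series starting in January 1981, with January as the omitted base month. The function name `month_dummies` and the choice of January as the base are our assumptions; in STATA itself, factor-variable notation such as `i.month` (with `month` a hypothetical variable name) would generate equivalent indicators.

```python
def month_dummies(n_months, start_month=1):
    """Return one row of 11 month indicators per monthly observation.

    January is the omitted base month, so in each row the entry at
    index k - 2 equals 1 when the observation falls in month k (k = 2..12).
    """
    rows = []
    for i in range(n_months):
        month = (start_month - 1 + i) % 12 + 1  # calendar month, 1..12
        rows.append([1 if month == k else 0 for k in range(2, 13)])
    return rows


# 108 monthly observations: January 1981 through December 1989.
dummies = month_dummies(108)
print(len(dummies), len(dummies[0]))  # 108 11
print(dummies[0])  # January: all zeros (base month)
print(dummies[6])  # July: a single 1 at index 5
```

Dropping one month keeps the dummies from being perfectly collinear with the regression intercept (the dummy variable trap); each remaining coefficient then measures that month's difference from January.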

Data Mining on Traffic


1. Introduction

Traffic accidents, as well as their injury severity, can be affected by higher highway speed
limits and by changes in a state's traffic laws. This research attempts to identify the critical
factors related to traffic accidents and traffic fatalities in the state of California, using panel data
from 1981 to 1989.
This paper chose several independent variables and related dependent variables to build
different regression models in order to analyze and estimate the correlations and relationships
between these variables. The unemployment rate, road type, the implementation of the speed limit
law and the seat belt law, and interaction terms will be evaluated through regression analysis.
2. Data Selection and Estimation
This paper focuses on effects on the number of accidents, or on the share of a particular kind of
accident among all accidents. The following variables are treated as dependent:
prcfat: percentage of fatal accidents among total accidents
ushighw or highr: number of accidents on U.S. highways divided by the total number of accidents
countyr: number of accidents on county roads divided by the total number of accidents
stater: percentage of state road accidents among total accidents
Factors such as the unemployment rate, the implementation of a law, and the interaction (cross)
term of the two are considered to affect the number of accidents or the accident rate:
us or unemb: product of the unemployment rate and beltlaw
us1 or unems: product of the unemployment rate and spdlaw
spdlaw: dummy variable for implementation of the speed law, equal to 1 if the law is
implemented and 0 otherwise
beltlaw: dummy variable for implementation of the belt law, defined in the same way
unem: unemployment rate
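The dependent shares and interaction terms defined above are simple transformations of monthly counts. A minimal sketch, assuming a hypothetical monthly record whose field names (`total_acc`, `fatal_acc`, and so on) are ours rather than the data set's; every share is returned as a fraction of total accidents, so multiply by 100 to express it as a percentage:

```python
from dataclasses import dataclass


@dataclass
class MonthRecord:
    # Hypothetical field names, not the original data set's column names.
    total_acc: int    # all accidents in the month
    fatal_acc: int    # fatal accidents
    ushighw_acc: int  # accidents on US highways
    county_acc: int   # accidents on county roads
    state_acc: int    # accidents on state roads
    unem: float       # unemployment rate
    spdlaw: int       # 1 if the speed law is in effect, else 0
    beltlaw: int      # 1 if the belt law is in effect, else 0


def derived_variables(r):
    """Compute the dependent shares and the interaction terms for one month."""
    return {
        "prcfat": r.fatal_acc / r.total_acc,     # fatal share
        "highr": r.ushighw_acc / r.total_acc,    # US highway share
        "countyr": r.county_acc / r.total_acc,   # county road share
        "stater": r.state_acc / r.total_acc,     # state road share
        "unemb": r.unem * r.beltlaw,             # unem x belt law
        "unems": r.unem * r.spdlaw,              # unem x speed law
    }


# Hypothetical month (illustrative numbers only).
example = MonthRecord(total_acc=40000, fatal_acc=800, ushighw_acc=2000,
                      county_acc=9000, state_acc=6000,
                      unem=6.0, spdlaw=1, beltlaw=0)
print(derived_variables(example))
```

Because each interaction term is zero whenever its law is not in effect, its coefficient only picks up variation in unemployment during months when that law applies.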
First, a general description of the data is useful for identifying potential research directions.
Line Chart 1 is created to show the trend in the number of fatal accidents from January
1981 to December 1989 and how the number of accidents was affected by the implementation of
the laws. As the line chart shows, the number of fatal accidents peaks in July, August, or
September and falls to its lowest point in January or February every year. Therefore, statewide
fatal accidents might be a good candidate for inclusion in our regression model.

Figure 1. Line Chart 1
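The peak-and-trough pattern described for Line Chart 1 can also be checked mechanically rather than by eye. The sketch below assumes a plain list of monthly counts starting in January 1981; the function name and the sample profile are hypothetical and are not the study's data:

```python
def peak_and_trough_months(series, start_year=1981):
    """Return (year, peak month, trough month) for each full calendar year.

    `series` holds monthly values starting in January of `start_year`;
    months are numbered 1..12, and ties go to the earliest month.
    """
    results = []
    for y in range(len(series) // 12):
        year_vals = series[12 * y: 12 * y + 12]
        peak = max(range(12), key=lambda m: year_vals[m]) + 1
        trough = min(range(12), key=lambda m: year_vals[m]) + 1
        results.append((start_year + y, peak, trough))
    return results


# Hypothetical seasonal profile with a mild upward trend (not real data).
profile = [300, 290, 320, 330, 350, 370, 400, 410, 395, 360, 330, 310]
series = [v + 5 * y for y in range(9) for v in profile]
for year, peak, trough in peak_and_trough_months(series):
    print(year, peak, trough)  # each year: peak in month 8, trough in month 2
```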


Line Chart 2 shows the number of accidents on US highways, county roads, and state routes
from January 1981 to December 1989 in California. According to the chart, the number of
accidents on county roads is the highest throughout the period, while the number of accidents on
US highways is always the lowest. Therefore, separate regression analyses can be conducted for
these three types of roads.


Figure 2. Line Chart 2
Line Chart 3 shows the relationship between the number of fatal accidents and the unemployment
rate. The overall number of fatal accidents increases gradually from January 1981 to December
1989, while the unemployment rate decreases gradually after reaching its highest point in
February 1983. Therefore, the unemployment rate might also be an interesting variable to include
in the regression analysis. The unemployment rate is expected to have a negative effect on
accidents, because people have fewer opportunities to travel when unemployment is high. The
speed limit law and seat belt law are expected to reduce the number of accidents after they became
effective. The analysis covers the time trend from January 1981 to December 1989.

Figure 3. Line Chart 3

3. Model Selection

3.1 Relationship between the unemployment rate and fatal accidents regarding the seat belt law:
According to the linear regression, the independent variable us, which is the interaction
between the unemployment rate and the dummy variable for the seat belt law, is statistically
significant. Meanwhile, the independent variable unem, the unemployment rate, is not
statistically significant. Therefore, the regression supports only the conclusion that implementing
the seat belt law helped the California government reduce the fatal accident rate.
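As a self-contained sketch of what a regression like the one in 3.1 computes, the Python code below fits ordinary least squares by solving the normal equations and reports t-statistics with two-sided p-values from a normal approximation (a simplification of the exact t distribution). The model form prcfat = b0 + b1·unem + b2·us + error, the switch date, and the simulated data are assumptions for illustration only; they are not the study's data or estimates, and the actual analysis was run in STATA.

```python
import math
import random


def ols(y, X):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows (one tuple of regressors per observation).
    Returns (coefficients, standard errors); index 0 is the intercept.
    """
    n = len(y)
    Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Z[0])
    xtx = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Gauss-Jordan elimination on [X'X | I] yields (X'X)^-1.
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(k)]
           for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(Z[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, se


random.seed(1)
n = 108  # January 1981 to December 1989
unem = [random.uniform(5.0, 10.0) for _ in range(n)]
beltlaw = [0] * 60 + [1] * 48  # hypothetical switch date
us = [u * b for u, b in zip(unem, beltlaw)]
# Simulated outcome: us lowers prcfat; unem has no effect by construction.
prcfat = [1.0 - 0.05 * s + random.gauss(0.0, 0.02) for s in us]

beta, se = ols(prcfat, list(zip(unem, us)))
for name, b, s in zip(["_cons", "unem", "us"], beta, se):
    t = b / s
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    print(f"{name:6s} coef={b: .4f}  t={t: .2f}  p={p:.3f}")
```

A coefficient is called statistically significant here when its p-value falls below the chosen level (commonly 0.05), which is the reading applied to us and unem above.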

3.2 Relationship between the unemployment rate and fatal accidents regarding the speed law:

According to the linear regression, neither of the independent variables is statistically
significant, so the coefficients cannot explain the relationship between the fatal accident rate
and these variables.
3.3 Relationship between accidents on highway and the implementation of speed law and seat belt law

3.4 Relationship between accidents on county road and the implementation of speed law and seat belt law

3.5 Relationship between accidents on state road and the implementation of speed law and seat belt law
The three regression models in 3.3 to 3.5 show the relationship between accidents on three
different types of road and the implementation of the speed law and seat belt law.
Regression model 3.3 shows the relationship between accidents on US highways and the
implementation of the speed law and seat belt law. Here, a new variable is generated: the
percentage of accidents on US highways relative to the total number of accidents. The analysis
does not use the raw number of US highway accidents because the number of accidents may
increase over time; using the share of total accidents avoids this problem.
After running the regression, the coefficients on both the speed law and the seat belt law in
regression 3.3 are statistically significant according to their p-values. The seat belt law
coefficient indicates that the share of accidents on highways decreased by 0.003% when the seat
belt law was implemented. However, the speed law coefficient is counterintuitive, because it
indicates that the share of accidents on highways increased by 0.002% when the speed law was
implemented.
The same regression analysis is conducted for county roads and state routes in regression models
3.4 and 3.5. The results show that the seat belt law is associated with fewer accidents on all three
types of roads, while the speed law does little to reduce accidents on these roads except on county
roads. However, other factors that the models omit might explain this result better.
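Reading a law dummy's coefficient as a before/after change rests on a simple identity: when a share such as highr is regressed on a single 0/1 dummy with an intercept, the OLS slope equals the mean share in months with the law in effect minus the mean share in months without it. (With both dummies in the model, each coefficient is instead a difference conditional on the other law.) A minimal sketch with hypothetical monthly shares, not the study's data:

```python
def dummy_slope(y, d):
    """OLS slope of y on a single 0/1 dummy d (with an intercept)."""
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    cov = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var = sum((di - d_bar) ** 2 for di in d)
    return cov / var


def mean_difference(y, d):
    """Mean of y when d == 1 minus mean of y when d == 0."""
    on = [yi for yi, di in zip(y, d) if di == 1]
    off = [yi for yi, di in zip(y, d) if di == 0]
    return sum(on) / len(on) - sum(off) / len(off)


# Hypothetical US highway shares, four months before and four after a law.
highr = [0.052, 0.050, 0.051, 0.049, 0.047, 0.046, 0.048, 0.045]
beltlaw = [0, 0, 0, 0, 1, 1, 1, 1]
print(round(dummy_slope(highr, beltlaw), 6))      # -0.004
print(round(mean_difference(highr, beltlaw), 6))  # -0.004
```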

3.6 Relationship among accidents on highways, the implementation of speed law and seat belt
law, and the unemployment rate.

The two interaction variables test the influence of the belt law and the speed law on accident
rates on US highways. The implementation of the belt law decreases the accident rate by
0.00045%, while the implementation of the speed law increases it by 0.00033%; both effects are
statistically significant. The unemployment rate does not have a significant effect on accident
rates.
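Assuming the model in 3.6 regresses highr on unem and the two interaction terms only (an assumption; the text does not state the full specification), the effect of each law enters through the unemployment rate. Holding unem and spdlaw fixed, the implied change in the highway accident rate when the belt law takes effect is:

```latex
\text{highr}_t = \beta_0 + \beta_1\,\text{unem}_t
  + \beta_2\,(\text{unem}_t \times \text{beltlaw}_t)
  + \beta_3\,(\text{unem}_t \times \text{spdlaw}_t) + u_t

E[\text{highr}_t \mid \text{beltlaw}_t = 1] - E[\text{highr}_t \mid \text{beltlaw}_t = 0]
  = \beta_2\,\text{unem}_t
```

Under this form, the interaction coefficients are changes per unit of unemployment, so the implied effect of each law scales with the unemployment rate at the time.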
3.7 Relationship among accidents on county roads, the implementation of speed law and seat belt
law, and the unemployment rate.

3.8 Relationship among accidents on state roads, the implementation of speed law and seat belt
law, and the unemployment rate.

Using the same model with different data, we obtained two regression models, for state roads and
county roads. In both, the belt law has a negative relationship with accidents, with coefficients of
-0.0011% and -0.0017% respectively, and both are significant. The speed law leads to two
different conclusions. On state roads it has a positive and significant effect on accident rates,
while on county roads we cannot say that it is negatively related to accident rates because it is not
significant. The unemployment rate has a positive effect on accident rates, with coefficients of
0.0015% and 0.0005% respectively.
3.9 Panel data and fixed effects
In the previous models, we could not obtain a model convincing enough to show the relationship
between accident rates and the unemployment rate. One potential problem is that the
unemployment rate may be correlated with the error term, so it is worth trying a new model.
The following are the outcomes of two new regression models that use year as a dummy variable.
The basic idea is to include a dummy variable for a given year, with the aim of obtaining a better
model of the relationship between accident rates and unemployment rates.
The form of the model is as follows:
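Assuming the standard two-period setup implied by the descriptions below, with each calendar month m treated as a cross-sectional unit observed in an early and a later year (t = 1, 2), d_t a dummy equal to 1 for the later year, and a_m an unobserved month-specific effect (this notation is ours), the model and its first-differenced form would be:

```latex
\text{prcfat}_{mt} = \beta_0 + \delta_0\, d_t + \beta_1\,\text{unem}_{mt} + a_m + u_{mt},
  \qquad t \in \{1, 2\}

\Delta\text{prcfat}_m = \delta_0 + \beta_1\,\Delta\text{unem}_m + \Delta u_m,
  \qquad m = 1, \dots, 12
```

Differencing removes a_m, so any correlation between the unemployment rate and this fixed month effect no longer biases the estimate of β1.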

Relationship between the unemployment rate difference and the fatal accident rate difference in
12 months between 1981 and 1984
The X Variable is the difference in unemployment between 1981 and 1984. Its p-value is 0.95,
which means we cannot conclude that there is a negative relationship between the change in
unemployment and the change in accident rates before the two laws were implemented.

Relationship between the unemployment rate difference and the fatal accident rate difference in
12 months between 1981 and 1987
Delta unem is the difference in unemployment between 1981 and 1987. Its p-value is 0.09,
which is closer to 0.05 but still not significant at the 5% level. This means we also cannot
conclude that the change in unemployment decreases the change in accident rates, even after the
two laws were implemented.
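The two first-difference regressions above can be sketched in a few lines: subtract each month's early-year value from its later-year value, then run a simple OLS of the 12 changes in the fatal accident rate on the 12 changes in unemployment. The function name and the made-up values below are hypothetical illustrations, not the study's data. With 12 observations and two estimated parameters, the slope's t-statistic has 10 degrees of freedom, so |t| must exceed about 2.228 for significance at the 5% level (two-sided).

```python
import math


def first_difference_regression(y_early, y_late, x_early, x_late):
    """Regress the monthly changes in y on the monthly changes in x.

    Each argument is a list of 12 monthly values for one year. Returns the
    intercept, the slope, and the slope's t-statistic from simple OLS.
    """
    dy = [b - a for a, b in zip(y_early, y_late)]
    dx = [b - a for a, b in zip(x_early, x_late)]
    n = len(dy)
    mx, my = sum(dx) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in dx)
    slope = sum((u - mx) * (v - my) for u, v in zip(dx, dy)) / sxx
    intercept = my - slope * mx
    resid = [v - intercept - slope * u for u, v in zip(dx, dy)]
    sigma2 = sum(e * e for e in resid) / (n - 2)  # residual variance
    t_stat = slope / math.sqrt(sigma2 / sxx)
    return intercept, slope, t_stat


# Made-up monthly values for two years (illustration only).
unem_a = [9.0, 9.2, 9.5, 9.8, 9.9, 10.0, 10.1, 10.2, 10.4, 10.5, 10.6, 10.7]
unem_b = [6.9, 6.8, 6.7, 6.6, 6.5, 6.4, 6.3, 6.3, 6.2, 6.1, 6.0, 5.9]
fat_a = [0.88, 0.85, 0.87, 0.90, 0.92, 0.95, 0.97, 0.99, 0.96, 0.93, 0.90, 0.89]
fat_b = [0.90, 0.86, 0.89, 0.91, 0.95, 0.97, 1.01, 1.00, 0.98, 0.95, 0.92, 0.90]
b0, b1, t = first_difference_regression(fat_a, fat_b, unem_a, unem_b)
print(f"intercept={b0:.4f} slope={b1:.4f} t={t:.2f}")
```

The intercept estimates the year effect δ0 from the differenced model, and the slope estimates the effect of a change in unemployment on the change in the fatal accident rate.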
4. Conclusion & Recommendations
According to the results of the linear regressions and hypothesis tests in the previous sections:
1). The fatal accident rate has been lower since the seat belt law was implemented. The speed law
and the unemployment rate have no significant impact on the fatal accident rate.

2). Regardless of road type, the seat belt law has a statistically significant impact on reducing
the total number of accidents, while the speed law does not have a statistically significant impact
on the total number of accidents.
3). The seat belt law has a statistically significant impact on reducing the total number of
accidents, while the speed law does not have a statistically significant impact on the total number
of accidents. However, unemployment has a statistically significant impact on increasing the total
number of accidents on all three types of roads.
Recommendations:
1). Send more officers on state roads.
2). Increase the fine for not buckling up.
3). Cancel the speeding violation.