
de Leur and Sayed

Claims Prediction Model for Road Safety Evaluation


By:

Paul de Leur, P.Eng.
1) Senior Highway Safety Engineer, BC Ministry of Transportation and Highways, 940 Blanshard Street, Victoria, BC, Canada, V8W 9T5
2) Project Engineer, Insurance Corporation of British Columbia, 1120 Pembroke Street, Victoria, BC, Canada, V8T 1J5
Telephone: (250) 386-4222; Facsimile: (250) 386-4223; E-mail: pdeleur@ii.ca

Tarek Sayed, Ph.D., P.Eng.
Associate Professor of Civil Engineering, University of British Columbia, 2324 Main Mall, Vancouver, BC, Canada, V6T 1Z4
Telephone: (604) 822-4379; Facsimile: (604) 822-6901; E-mail: tsayed@civil.ubc.ca

Key Words: Road Safety Improvement Programs; Auto Insurance Claims; Road Safety Analysis; Prediction Models


ABSTRACT

Road safety analysis is typically undertaken using traffic collision data. However, collision data often suffers from quality and reliability problems, which can inhibit the ability of road safety engineers to evaluate and analyze road safety performance. An alternate source of data that characterizes the events of a traffic collision is the records that become available from an auto insurance claim. In settling an auto insurance claim, a claim adjuster must assess and determine the circumstances of the event, recording the important contributing factors that led to the crash. As such, there is an opportunity to access and use claims data in road safety engineering analysis. This paper presents the results of an initial attempt to use auto insurance claims records in road safety evaluation by developing and applying a claim prediction model. The prediction model provides an estimate of the number of auto insurance claims that can be expected at signalized intersections in the Vancouver area of British Columbia, Canada. A discussion of the usefulness and application of the claim prediction model is provided, together with a recommendation on how the claims data could be utilized in the future.


1.0

INTRODUCTION

Road safety engineers use accident data to identify hazardous locations, to diagnose road safety problems, to develop options to improve road safety, to evaluate the economic feasibility of safety improvements, and for road safety research. However, a serious problem with accident data currently exists in the province of British Columbia, Canada. The problem stems from the deterioration of motor vehicle accident data, as collisions are not being attended and/or recorded in a systematic manner by police officials. Consequently, the reliability and usefulness of the crash data is often considered suspect. If accident data is faulty, whether from a reduction in reporting levels or from deterioration in quality, the ability of road safety professionals to engineer solutions to address problems may be severely jeopardized. This paper describes an initial attempt to utilize auto insurance claims records in road safety evaluation by developing and applying a claim prediction model. The model provides an estimate of the number of auto insurance claims that can be expected at signalized intersections in the Vancouver area of British Columbia, Canada. The claim prediction model relates claim frequency to traffic volumes and is developed and used in road safety engineering analysis to identify and rank hazardous locations. These results are then compared to the hazardous locations identified by collision records.

2.0

BACKGROUND

2.1

Accident Data

In British Columbia, a form called the MV104 has been the principal tool used by police officials to collect and report information concerning traffic accidents. The form contains data elements that describe the characteristics of an accident, including information on


the persons involved, vehicle data, roadway data, and environmental information. Because of their extensive training, police officers are capable of understanding the dynamics of a crash and the behavioral characteristics of drivers. This knowledge, when transferred onto a crash report, produces an extremely valuable data source for engineering diagnostics. Although recognized as critical to the business of road safety, the accident data in BC has been degrading in recent years. The main reason for the degradation in collision data is a reduction in crash reporting levels, caused by reduced resources within provincial police agencies. Table 1 shows the frequency of police-attended crashes (MV104) and reported claims to illustrate the deterioration of police collision data over time. Compounding the problem is a lack of consistency among police forces in maintaining a specific level of accident reporting. Each police jurisdiction is given the autonomy to decide what level of collision reporting is appropriate. In jurisdictions where traffic safety is a high priority, reporting levels are maintained, but in areas with a low traffic safety priority, a significant reduction in the reporting of collisions has occurred.

Table 1: Sample of Claims Records and Deteriorating Police Reported Collisions


| Year | Fatal: Claims | Fatal: MV104 | Injury: Claims | Injury: MV104 | PDO: Claims | PDO: MV104 |
|------|--------------:|-------------:|---------------:|---------------:|------------:|-----------:|
| 1993 | 461 | 442 | 49,546 | 32,393 | 151,899 | 60,984 |
| 1994 | 443 | 458 | 53,581 | 33,337 | 153,709 | 63,362 |
| 1995 | 448 | 411 | 57,401 | 32,679 | 170,208 | 60,393 |
| 1996 | 417 | 357 | 59,442 | 27,146 | 193,808 | 40,785 |
| 1997 | 387 | 340 | 57,244 | 21,064 | 202,870 | 26,981 |
| 1998 | 389 | 365 | 58,417 | 19,948 | 209,145 | 22,097 |
| Total | 2,545 | 2,373 | 333,631 | 166,569 | 1,081,639 | 274,607 |
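The decline in police reporting shown in Table 1 can be summarized as a reporting rate, the number of MV104 reports per claim. A quick check in Python, using the property-damage-only (PDO) columns from the table:

```python
# Police-reporting rate for property-damage-only (PDO) incidents,
# computed as MV104 reports divided by claims, using Table 1 values.
pdo_by_year = {  # year: (claims, MV104 reports)
    1993: (151899, 60984),
    1995: (170208, 60393),
    1998: (209145, 22097),
}

reporting_rate = {year: mv104 / claims
                  for year, (claims, mv104) in pdo_by_year.items()}

for year in sorted(reporting_rate):
    print(year, f"{reporting_rate[year]:.1%}")
```

The rate falls from roughly 40% in 1993 to below 11% by 1998, which is the deterioration the authors describe.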


2.2

Auto Insurance Claims Data

An alternate source of data that characterizes the events of a collision is the records available from an auto insurance claim. In settling an auto insurance claim, a claim adjuster must make an assessment of the circumstances of the event and thereby record the important contributing factors that led to the occurrence of the crash. In most jurisdictions in North America, auto insurance companies are privately owned, and obtaining claims data would be difficult, if not impossible. However, in British Columbia, a public auto insurance company, the Insurance Corporation of British Columbia (ICBC), handles most insurance claims and centrally warehouses the data. As such, there is an opportunity to evaluate the usefulness of claims data in road safety analysis. After a collision occurs, the auto insurance claimant reports the claim to ICBC, and a claim representative obtains the incident information. Once a claim is opened, considerable information is obtained from the claimant regarding the collision. This includes driver-related information, in an attempt to understand the behavior and condition of the driver at the time of the crash. Vehicle damage is assessed by conducting an inspection to determine whether any vehicle-related feature may have contributed to the crash (i.e., defective brakes or steering). Finally, information is

obtained regarding the driving environment at the time of the crash (i.e., the road surface condition, traffic characteristics, or road design issues) that may have contributed to the collision occurrence. In essence, this exercise attempts to answer the who, what, how, where, when, and why questions concerning a crash. The claims data is contained in a central database maintained by ICBC. The database is designed to support the settlement of insurance claims rather than engineering analysis; therefore, extracting specific information to assist in


engineering efforts can be problematic. Perhaps the greatest challenge is the lack of information detailing the specific location of an incident, stemming from the absence of an easy-to-use, universal location-referencing system. The data is also not efficiently warehoused: records are largely comprised of free-form text that does not support significant querying or processing, making the data difficult to manipulate and extract. However, ICBC is currently undertaking a project to resolve these data issues. This project, called the Crash Crime Contravention (CCC) Project, is expected to produce useful claims information that can be easily extracted, manipulated, and used specifically for road safety analysis and evaluation. For this research endeavor, the data was subjected to a significant amount of post-processing, resulting in very useful and accurate claims data. Each auto insurance claim record was reviewed, and important information was extracted and verified. In addition, since an auto insurance claim can be generated by an event that should not be attributed to a road safety problem (i.e., auto theft, vandalism, or windshield damage), these non-collision-related claims were removed from the sample. The process used to obtain the claims and collision data for this research was quite involved; it is described below, together with a figure to graphically illustrate the process.

1. First, all of the claims data was obtained from ICBC for the required locations (i.e., signalized intersections in the cities of Vancouver and Richmond).

2. Second, the data was screened to eliminate all non-collision-related claims (i.e., auto theft, vandalism, windshield damage, etc.).

3. Third, the collision-related claims were matched by intersection location and time of day to obtain the number of claim-based collisions. This matching step is necessary because one collision often involves more than one claim.


4. The fourth step involved obtaining copies of the police-attended collision records for the specified locations from the two municipalities (i.e., copies of the MV104 accident report form).

5. Finally, the police-attended collisions were matched against the claim-based collisions at each intersection. As expected, when these two data sources were compared, the matching process was considerably less than perfect. The collision data used for this research resulted from the summation of the following three outcomes:

Outcome 1: The collision frequency at each intersection includes the total number of matches between claim-based collisions and police-attended collisions.

Outcome 2: The collision frequency at each intersection includes the non-matched, claim-based collisions. Many more claim-based collisions existed than police-reported crashes, for several reasons. Some collisions fall below the threshold necessary for the police to report a collision (in BC, defined as $1,000 in property damage). Other collisions were reportable but were simply not reported by or to the police. In addition, occasional coding errors made by the police rendered some collision records unusable.

Outcome 3: The collision frequency at each intersection includes collisions that were reported by police but for which no claim records existed. This could occur if the vehicle was not insured by ICBC (e.g., an out-of-province or privately insured vehicle). However, this mismatch did not account for a large proportion of the total incidents.
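The matching in steps 3 and 5 can be sketched with pandas. The column names, the one-hour time bin, and the sample records below are illustrative assumptions, not ICBC's actual schema:

```python
import pandas as pd

# Hypothetical claim records: several claims can stem from one collision.
claims = pd.DataFrame({
    "intersection_id": [101, 101, 101, 102, 102],
    "incident_time": pd.to_datetime([
        "1995-03-01 08:30", "1995-03-01 08:35",  # same crash, two claimants
        "1995-06-10 17:00", "1995-03-02 12:00", "1995-08-20 09:15"]),
})

# Step 3: collapse claims that share a location and (binned) time of day
# into single claim-based collisions.
claims["time_bin"] = claims["incident_time"].dt.floor("h")
collisions_from_claims = (claims
    .drop_duplicates(subset=["intersection_id", "time_bin"])
    .groupby("intersection_id").size()
    .rename("claim_based_collisions"))

print(collisions_from_claims.to_dict())
```

Step 5 would repeat the same idea against the police records, keeping matched and unmatched rows separate so the three outcomes can be summed.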


The data process and outcomes are presented graphically in Figure 1.


[Figure 1 flow chart: Obtain claim records → Screen out non-collision claims (theft, vandalism) → Collision-based claims records → Match collision claim records to obtain collision frequency → Collisions (matched claims records). In parallel: Obtain police-reported collisions → Match police collisions and claim-based collision records, yielding (1) matched claim and police collisions, (2) claim-based collisions only, and (3) police-based collisions only. Items (1) and (2) form the claims data used for research; items (1), (2), and (3) sum to (4) the total collisions, the collision data used for research.]

Figure 1: Illustration to Describe the Data Compilation Process

To conclude, it was determined that the claims-based collision data in addition to the police attended collisions provided a better estimate of the true number of collisions occurring at each intersection, and thus, was used for this research. It must also be emphasized that claims data is available and is becoming more accessible in BC due to the public nature of ICBC. Therefore, claims data offers an alternative to deteriorating police reported collisions for road safety evaluation.


3.0

CLAIMS DATA USED FOR MODELING

The data used for this research project was generated from a listing of 108 signalized intersections, located within the cities of Vancouver and Richmond in British Columbia, Canada. Associated with each intersection are the total number of claims, the total number of collisions and the traffic volumes on each of the intersection streets (the major and minor roadways). As mentioned, the claims and collision data was obtained from ICBC and municipal police records, while the traffic volumes, given in the average annual daily traffic (AADT), were obtained from the Engineering Departments of each City. The time period for the claims and collision data included all incidents that occurred from January 1, 1995 to December 31, 1997. The traffic volume data represents the average daily traffic, averaged over three years. Table 2 provides a statistical summary of the data used for this research.

Table 2: Statistical Summary of Auto Claims, Collision and Traffic Volume Data

| Variable | Minimum Value | Maximum Value | Mean Value | Standard Deviation |
|----------|--------------:|--------------:|-----------:|-------------------:|
| Major Road AADT | 10,816 | 68,043 | 35,593 | 11,384 |
| Minor Road AADT | 4,607 | 39,616 | 20,239 | 8,222 |
| Total Claims | 12 | 389 | 172.13 | 88.10 |
| Total Crashes | 10 | 292 | 125.02 | 63.74 |
| Injury Crashes | 7 | 147 | 53.14 | 29.16 |


4.0

MODELING ROAD SAFETY

Models relating traffic collisions to traffic volumes have been the focus of numerous studies. In contrast, no studies have related the frequency of auto insurance claims to traffic volume. The methodology used for modeling claims is considered similar to that used for collisions. In general, there are two main approaches that can be used to model road safety: conventional linear regression, or the generalized linear modeling approach (GLIM). Conventional linear regression assumes a Normal error structure, whereas GLIM assumes a non-Normal error structure (usually Poisson or negative binomial). Historically, many researchers developed collision prediction models using conventional linear regression. However, several researchers (Hauer et al., 1988; Miaou and Lum, 1993) have shown that conventional linear regression models lack the distributional properties to adequately describe collisions. This inadequacy is due to the random, discrete, non-negative, and typically sporadic nature of traffic collision occurrence. GLIM overcomes these shortcomings, and recognizing these advantages, the GLIM approach is utilized in this study.

The GLIM approach used in this paper is based on the work of Hauer (Hauer, 1988) and Kulmala (Kulmala, 1995). Assume that $Y$ is a random variable describing the number of insurance claims (or collisions) at an intersection in a specific time period, and that $y$ is the observation of this variable during that period. The mean of $Y$ is $\Lambda$, which can itself be regarded as a random variable. Then, for $\Lambda = \lambda$, $Y$ is Poisson distributed with parameter $\lambda$. Since each site has its own regional characteristics with a unique mean claim (or collision) frequency $\lambda$, Hauer (Hauer, 1988) has shown that for an imaginary group of sites with similar characteristics, $\Lambda$ follows a gamma distribution (with parameters $\kappa$ and $\kappa/\mu$), with mean and variance:

$$E(\Lambda) = \mu; \quad Var(\Lambda) = \frac{\mu^2}{\kappa}$$   (Equation 1)

Hauer (Hauer, 1988) and Kulmala (Kulmala, 1995) have also shown that the point probability function of $Y$ is given by the negative binomial distribution, with expected value and variance:

$$E(Y) = \mu; \quad Var(Y) = \mu + \frac{\mu^2}{\kappa}$$   (Equation 2)

There are several approaches to estimate the parameter $\kappa$ of the negative binomial distribution. The macro library of the GLIM software (NAG, 1996) contains three methods: maximum likelihood, mean $\chi^2$, and mean deviance. The method of maximum likelihood has been the most widely used. The statistical background of the three methods is given in Lawless (Lawless, 1987).
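As a sketch of the maximum likelihood method for $\kappa$, the negative binomial log-likelihood (dropping terms constant in $\kappa$) can be maximized numerically with scipy. The observed counts and fitted site means below are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def nb_loglik(kappa, y, mu):
    """Negative binomial log-likelihood in kappa (terms constant in kappa
    are dropped), with the fitted site means mu held fixed."""
    return np.sum(gammaln(y + kappa) - gammaln(kappa)
                  + kappa * np.log(kappa / (kappa + mu))
                  + y * np.log(mu / (kappa + mu)))

# Invented observed counts and fitted mean frequencies for six sites:
y_nb = np.array([5.0, 18.0, 40.0, 95.0, 160.0, 300.0])
mu_nb = np.array([10.0, 30.0, 60.0, 110.0, 150.0, 220.0])

fit = minimize_scalar(lambda k: -nb_loglik(k, y_nb, mu_nb),
                      bounds=(0.1, 100.0), method="bounded")
kappa_hat = fit.x
print(round(kappa_hat, 2))
```

In a full fit, the means and $\kappa$ would be re-estimated iteratively, which is what the GLIM macro automates.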

4.1

Model Structure

The model structure used in this study relates the frequency of auto insurance claims (or the frequency of collisions) to the product of the traffic flows entering the intersection. In some cases, the sum of the traffic flows entering the intersection is used instead of the product. However, it has been shown (Hauer, 1988) that a model utilizing the product of traffic flows provides a better representation of the relationship between collisions (or claims) and the traffic flows at intersections. In this


model structure, claim frequency is a function of the product of the two traffic flows, each raised to a specific power (usually less than one). The model form is shown in equation (3).

$$E(\Lambda) = a_0\, V_1^{a_1}\, V_2^{a_2}$$   (Equation 3)

where: $E(\Lambda)$ = expected auto insurance claim frequency,
$V_1$ / $V_2$ = major / minor road traffic volume (AADT), and
$a_0$, $a_1$, $a_2$ = model parameters.

4.2

Model Development

First, model parameters are estimated based on a Poisson error structure and a dispersion parameter (d) is calculated as follows:

$$d = \frac{\text{Pearson } \chi^2}{n - p}$$   (Equation 4)

where: $n$ = the number of observations, and
$p$ = the number of model parameters.

The Pearson $\chi^2$ statistic is used to assess the significance of GLIM models (described further in a subsequent section) and is defined in equation (5).

$$\text{Pearson } \chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - E(\Lambda_i)\right)^2}{Var(y_i)}$$   (Equation 5)

where: $y_i$ = observed number of claims at an intersection,
$E(\Lambda_i)$ = predicted number of claims obtained from the model, and
$Var(y_i)$ = the variance of the observed claims.


If d is approximately equal to 1.0, then the assumed error structure approximately fits the Poisson distribution. If d is greater than 1.0, then the data has greater dispersion than is explained by the Poisson distribution, and a further analysis using a negative binomial error structure is required. In this case, the parameters are estimated using an iterative process based on the maximum likelihood estimate (Hauer, 1988). This iterative process has been added to the macro library of GLIM (NAG, 1996).
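Equations (4) and (5) are easy to compute directly. In the sketch below, the observed counts and Poisson-fitted means are invented for illustration:

```python
import numpy as np

def dispersion(y, mu, p):
    """Dispersion parameter d = Pearson chi-square / (n - p), where
    Var(y_i) = mu_i under the Poisson assumption (equations 4 and 5)."""
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)
    return pearson_chi2 / (len(y) - p)

# Invented observed claim counts and Poisson-fitted means at six sites:
y_obs = np.array([12.0, 40.0, 95.0, 160.0, 240.0, 389.0])
mu_fit = np.array([20.0, 35.0, 110.0, 150.0, 250.0, 350.0])

d = dispersion(y_obs, mu_fit, p=3)
print(round(d, 2))  # d well above 1.0 points to the negative binomial
```

A value of $d$ near 1.0 would justify stopping with the Poisson fit; here the data are overdispersed, so the negative binomial refit described above would be triggered.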

4.3

Testing the Significance of GLIM Models

Several measures can be used to assess the significance of GLIM models. One commonly used measure is the Pearson $\chi^2$ statistic defined in Section 4.2 (Model Development). The Pearson $\chi^2$ statistic follows the $\chi^2$ distribution with $n - p - 1$ degrees of freedom. There are also several useful subjective, graphical measures that can be used to test a model's goodness of fit. The first method is to plot the average of the squared residuals versus the predicted frequency. For a well-fitted model, all points should lie around the variance function line defined in equation (2) for the negative binomial distribution. The second subjective graphical method is to calculate the Pearson residual (PR) and plot it against the predicted collision (or claim) frequency. PR is defined as the difference between the predicted and observed collision (or claim) frequency divided by the standard deviation (Bonneson and McCoy, 1997). For a well-fitted model, the Pearson residuals should be clustered around zero over the range of $E(\Lambda)$ (Bonneson and McCoy, 1997). The formulation is shown in equation (6).

$$PR_i = \frac{E(\Lambda_i) - y_i}{\sqrt{Var(y_i)}}$$   (Equation 6)


where: $E(\Lambda_i)$ = predicted number of claims from the claim model,
$y_i$ = observed number of claims at an intersection, and
$\sqrt{Var(y_i)}$ = the standard deviation of the observed claims.

Finally, the statistical significance of the model variables can be assessed using the t-ratio test. The t-ratio is the ratio between the estimated GLIM parameter coefficient and its standard error. For a variable to be significant at the 95% level of confidence, the t-ratio should be greater than 1.96.
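Both checks are simple to compute. The sketch below assumes the negative binomial variance of equation (2); the site values and standard error are illustrative:

```python
import math

def pearson_residual(y, mu, kappa):
    """Pearson residual (equation 6): predicted minus observed, divided by
    the standard deviation, with Var(y) = mu + mu**2/kappa (equation 2)."""
    return (mu - y) / math.sqrt(mu + mu ** 2 / kappa)

def is_significant(coef, std_err):
    """t-ratio test: coefficient over its standard error, 95% confidence."""
    return abs(coef / std_err) > 1.96

# Illustrative site: 150 observed claims against a prediction of 125,
# with kappa = 5.55 as reported for the claims model; the standard
# error of 0.12 is an invented value.
print(round(pearson_residual(150, 125, 5.55), 2))
print(is_significant(coef=0.8256, std_err=0.12))
```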

5.0

RESULTS OF THE GLIM MODELING

Two prediction models were developed from the claim and collision data obtained. The models are assumed to follow the negative binomial distribution, implemented within the GLIM software package through a macro designed by NAG (NAG, 1996). The objectives of the two models are defined as follows:

Model 1: Predicts the total number of claims per year at an urban intersection based on major and minor road volumes.

Model 2: Predicts the total number of collisions per year at an urban intersection based on major and minor road volumes.

Table 3 presents the two prediction models, detailing the model parameters. Each model predicts the annual frequency of claims or collisions based on the average daily traffic in thousands of vehicles per day. Also included in the table are measures used to define the goodness of fit of each model: the t-ratio test, the $\kappa$ value, and the Pearson $\chi^2$ statistic. These measures indicate that the two models have a relatively good fit, and the t-ratios calculated for all independent variables are significant.


Table 3: Developed Claim and Collision Prediction Models


| Model No. | Model Formulation | t-ratio | $\kappa$ | Pearson $\chi^2$ ($\chi^2$ test) |
|-----------|-------------------|---------|---------:|--------------------------------:|
| 1 | Total Claims Model: Claims/3 yrs = 2.7429 (AADT_maj rd / 1000)^0.8256 (AADT_mnr rd / 1000)^0.4028 | $a_0$ = 2.3, $a_1$ = 6.8, $a_2$ = 4.3 | 5.55 | 105 (129) |
| 2 | Total Collision Model: Collisions/3 yrs = 2.1366 (AADT_maj rd / 1000)^0.8256 (AADT_mnr rd / 1000)^0.3793 | $a_0$ = 1.7, $a_1$ = 6.8, $a_2$ = 4.1 | 5.36 | 101 (129) |
Two figures are also used to demonstrate the goodness of fit for each model. Figure 2 depicts the relationship between the variance of the observed frequency and the average squared residuals. Each point represents the average of the predicted frequency for a sequenced group of intersections. Figure 3 shows the relationship between the predicted frequency and the Pearson residual. Figures 2 and 3 are associated with the claim prediction model (Model 1). A similar goodness of fit relationship was found for the other prediction model developed.


[Figure: scatter plot of Average Squared Residuals (0 to 16,000) versus Predicted Claims (0 to 300)]

FIGURE 2: Predicted Claims versus Average Squared Residuals

[Figure: scatter plot of Pearson Residuals (-4 to 4) versus Predicted Claims (0 to 300)]

FIGURE 3: Predicted Claims versus Pearson Residuals

As evidenced by the figures, a reasonably good fit was achieved. Figure 2, showing the squared residuals (averaged for groups of 10 sequential intersections) versus the predicted frequency, indicates a well-fitted model. Figure 3, showing the Pearson residuals versus the predicted frequency, illustrates that the residuals are clustered around zero over the range of predicted values, indicating a valid model. Overall, the two models are considered valid and fit the observed data very well.


6.0

APPLICATIONS OF THE CLAIM PREDICTION MODEL

Given that the models have been shown to be valid, two applications using the developed claim prediction model will be presented. The first application will show how the claim prediction model can be used to identify problematic locations or locations with a higher than expected number of auto insurance claims. The second application demonstrates how the claim prediction model can be used to prioritize or rank the problematic locations identified.

6.1

Location Specific Prediction: The Empirical Bayes Refinement

Due to the randomness inherent in the occurrence of a collision (and therefore a claim), it is important to deploy statistical techniques that can effectively account for this randomness when identifying problematic locations. A statistical technique known as the Empirical Bayes (EB) refinement can be used for this purpose. Two clues are available to determine the safety performance of a location: its traffic and road characteristics, and its historical collision (or claim) frequency (Hauer, 1992; Brüde and Larsson, 1988). The EB approach makes use of both of these clues. It refines the estimate of the expected number of claims at a location by combining the observed number of claims with the predicted number of claims obtained from the GLIM model, yielding a more accurate, location-specific safety estimate. The EB estimate provides the expected number of claims for any intersection and can be calculated using equation (7) (Hauer, 1992).

$$EB_{\text{safety estimate}} = \alpha\, E(\Lambda) + (1 - \alpha)\, \text{count}$$   (Equation 7)

$$\alpha = \frac{1}{1 + \dfrac{Var(E(\Lambda))}{E(\Lambda)}}$$   (Equation 8)


where: count = observed number of claims,
$E(\Lambda)$ = predicted claims, estimated by the GLIM model, and
$Var(E(\Lambda))$ = variance of the GLIM estimate.

Since $Var(E(\Lambda)) = E(\Lambda)^2 / \kappa$, equation (7) can be rearranged to yield equation (9), and the variance of the EB estimate is calculated using equation (10).

$$EB_{\text{safety estimate}} = \frac{\kappa}{\kappa + E(\Lambda)}\, E(\Lambda) + \frac{E(\Lambda)}{\kappa + E(\Lambda)}\, \text{count}$$   (Equation 9)

$$Var(EB_{\text{safety estimate}}) = \left(\frac{E(\Lambda)}{\kappa + E(\Lambda)}\right)^2 (\kappa + \text{count})$$   (Equation 10)
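Equations (9) and (10) reduce to a weighted average and can be sketched in a few lines. The site values below are invented, with $\kappa = 5.55$ taken from the claims model:

```python
def eb_estimate(kappa, predicted, count):
    """EB safety estimate and its variance (equations 9 and 10)."""
    w = kappa / (kappa + predicted)           # weight alpha (equation 8)
    eb = w * predicted + (1.0 - w) * count    # equation 9
    var = (1.0 - w) ** 2 * (kappa + count)    # equation 10
    return eb, var

# Invented site: the model predicts 150 claims but 200 were observed.
eb_val, eb_var = eb_estimate(kappa=5.55, predicted=150.0, count=200)
print(round(eb_val, 1), round(eb_var, 1))
```

With a small $\kappa$ relative to the predicted frequency, the weight on the model prediction is small and the EB estimate stays close to the observed count, as the formula implies.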

6.2

Identifying Hazardous Locations

A problematic location is defined as any location that exhibits a significantly higher number of auto insurance claims than a specified normal value. The EB refinement method improves the location-specific prediction and is therefore used to identify hazardous locations. The EB refinement method identifies problem sites according to the following four-step process (Higle and Witkowski, 1988; Bélanger, 1994).

1) Estimate the predicted number of claims and its variance for the intersection, using the appropriate GLIM model. The estimate follows a gamma distribution (the prior distribution) with parameters $\kappa$ and $\kappa / E(\Lambda)$, where:

$$E(\Lambda) = \mu \quad \text{and} \quad \kappa = \frac{E(\Lambda)^2}{Var(E(\Lambda))}$$   (Equation 11)


2) Determine the appropriate point of comparison based on the mean and variance values obtained in step (1). Usually the 50th percentile ($P_{50}$) or the mean $E(\Lambda)$ is used as the point of comparison. $P_{50}$ is calculated such that:

$$\int_0^{P_{50}} \frac{\left(\kappa / E(\Lambda)\right)^{\kappa}\, \lambda^{\kappa - 1}\, e^{-(\kappa / E(\Lambda))\lambda}}{\Gamma(\kappa)}\, d\lambda = 0.5$$   (Equation 12)

3) Calculate the EB safety estimate and its variance from equations (9) and (10). The estimate follows a gamma distribution (the posterior distribution) with parameters $\kappa_1$ and $\theta_1$ defined as follows:

$$\kappa_1 = \frac{EB^2}{Var(EB)} = \kappa + \text{count} \quad \text{and} \quad \theta_1 = \frac{EB}{Var(EB)} = \frac{\kappa}{E(\Lambda)} + 1$$   (Equation 13)

The probability density function of the posterior distribution is then given by:

$$f_{EB}(\lambda) = \frac{\left(\kappa / E(\Lambda) + 1\right)^{(\kappa + \text{count})}\, \lambda^{\kappa + \text{count} - 1}\, e^{-(\kappa / E(\Lambda) + 1)\lambda}}{\Gamma(\kappa + \text{count})}$$   (Equation 14)

4) Identify the location as claim-prone if there is a significant probability that the location's safety estimate exceeds the $P_{50}$ value. Thus, the location is claim-prone if:

$$1 - \int_0^{P_{50}} \frac{\left(\kappa / E(\Lambda) + 1\right)^{(\kappa + \text{count})}\, \lambda^{\kappa + \text{count} - 1}\, e^{-(\kappa / E(\Lambda) + 1)\lambda}}{\Gamma(\kappa + \text{count})}\, d\lambda \geq \delta$$   (Equation 15)

where: $\delta$ is the desired confidence level (usually selected as 0.95).
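The four steps can be sketched with scipy's gamma distribution. The site values below are invented, with $\kappa = 5.55$ from the claims model:

```python
from scipy.stats import gamma

def is_claim_prone(kappa, predicted, count, delta=0.95):
    """Steps 1-4: prior gamma(kappa, scale=predicted/kappa); posterior
    gamma(kappa + count, rate = kappa/predicted + 1); flag the site if
    the posterior probability of exceeding P50 is at least delta (eq. 15)."""
    p50 = gamma.ppf(0.5, a=kappa, scale=predicted / kappa)      # equation 12
    post_scale = 1.0 / (kappa / predicted + 1.0)                # equation 13
    prob_above = gamma.sf(p50, a=kappa + count, scale=post_scale)
    return prob_above >= delta

# Invented site: 200 observed claims against a prediction of 150.
print(is_claim_prone(kappa=5.55, predicted=150.0, count=200))
```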


Figure 4 illustrates the process of identifying hazardous locations in a graphical form. The prior distribution represents what is normal, obtained from the predicted frequency (GLIM model). The posterior distribution represents what is occurring,

obtained from the observed EB frequency estimate. The shaded area is the probability that the observed EB estimate is less than the P50 value of what is normally expected. This technique was applied to all 108 intersections used in this study. In total, 40 locations were identified as claim-prone.
[Figure: overlaid gamma distributions of claim frequency, showing the prior distribution of the expected claim frequency (what is normal), the posterior distribution of the actual claim frequency (what is occurring), the P50 of the predicted claim frequency, and a shaded area representing the probability of having claims < P50]
FIGURE 4: Identification of Hazardous Locations

6.3

Ranking Hazardous Locations

Once problem sites are identified, it is important for road authorities to rank the locations in terms of priority for scheduled treatment. Ranking problem sites will enable the road authority to establish an effective road safety program, ensuring the efficient use of the limited funding available for road safety. Two techniques that reflect different priority objectives for a road authority can be used (Sayed and Rodriguez, 1999).


The first ranking criterion is to calculate the ratio between the EB estimate and the predicted frequency obtained from the GLIM model (a risk-minimization objective). The ratio represents how far the intersection deviates from a normal safety performance value, with a higher ratio representing a more hazardous location. The second criterion, the cost-effectiveness objective, is to calculate the difference between the EB estimate and the predicted frequency (from the GLIM model) for each hazardous location. The difference between these two values is an effective indicator of the expected safety benefit, measured by the potential reduction in claim frequency. To test the application of the claim prediction model, a comparison between the ranks obtained from the claim model (Model 1) and the ranks obtained from the collision model (Model 2) was undertaken. The EB estimates for the observed claim and total collision frequencies were calculated for each intersection. In addition, the predicted claim and total collision frequencies were calculated using the two prediction models and the corresponding road volumes. The difference and the ratio between the EB estimates and predicted values were calculated for claims and total collisions, and each site was then ranked. The results of this comparative analysis are presented graphically in Figure 5 and Figure 6. Each figure shows the agreement in the ranking of problematic sites between the claims results and the total collision results, comparing the EB estimate and the predicted value for both ranking techniques (difference and ratio). The level of agreement between the two ranking methods for claims versus total collisions is considered very good.
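The two criteria can be sketched as follows; the EB and predicted frequencies are invented for illustration:

```python
# Each site maps to (EB estimate, predicted frequency); both values
# here are invented for illustration.
sites = {"A": (210.0, 150.0), "B": (95.0, 40.0), "C": (130.0, 115.0)}

# Risk-minimization objective: rank by the ratio EB / predicted.
by_ratio = sorted(sites, key=lambda s: sites[s][0] / sites[s][1], reverse=True)

# Cost-effectiveness objective: rank by the difference EB - predicted,
# the potential reduction in claim frequency.
by_diff = sorted(sites, key=lambda s: sites[s][0] - sites[s][1], reverse=True)

print(by_ratio)  # ['B', 'A', 'C']
print(by_diff)   # ['A', 'B', 'C']
```

Note that the two objectives can order the same sites differently: site B deviates most in relative terms, while site A offers the largest absolute claim reduction.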


[Figure: scatter plot of rank agreement, Rank ICBC Claims versus Rank Collisions]

FIGURE 5: Rank Agreement, EB - Predicted: Claims vs. Collisions

[Figure: scatter plot of rank agreement, Rank ICBC Claims versus Rank Collisions]

FIGURE 6: Rank Agreement, EB / Predicted: Claims vs. Collisions


7.0

SUMMARY AND CONCLUSIONS

This paper has presented preliminary results on the use of auto insurance claims data in evaluating road safety. The motivation for this research was to address problems with collision data and to determine whether claims data could be used for the engineering diagnosis of road safety. The main research objective was to develop and apply a claim prediction model for use in road safety engineering. This research used data for 108 urban, signalized intersections located in the Vancouver area of British Columbia. The data was used to develop two prediction models: one to predict the number of claims and a second to predict the total number of collisions, based on the major and minor traffic volumes entering the intersection. The generalized linear modeling approach (GLIM) was used to develop the models. The significance of the models was evaluated using the $\kappa$ value, the Pearson $\chi^2$ statistic, and the t-ratio test. Two graphical techniques were also presented to demonstrate the goodness of fit of the models. Overall, the two prediction models are valid and fit the observed data very well. Two applications of the claim prediction model demonstrated its usefulness in road safety analysis: the identification of problem locations and the ranking of those locations. The Empirical Bayes (EB) refinement approach was used to improve the reliability of location-specific predictions, thereby improving the application of the claim prediction model. Overall, the results produced by the claims data appear very encouraging for use in evaluating road safety. The results suggest that claims data may be used in place of degrading collision data, and that it can be evaluated and applied in a manner similar to collision records.


8.0

REFERENCES

Bélanger, C. (1994). Estimation of Safety of Four-Leg Unsignalized Intersections, Transportation Research Record 1467, National Research Council, Washington, D.C., pp. 23-29.

Bonneson, J. A. and McCoy, P. T. (1993). Estimation of Safety at Two-Way Stop-Controlled Intersections on Rural Highways, Transportation Research Record 1401, National Research Council, Washington, D.C., pp. 83-89.

Bonneson, J. A. and McCoy, P. T. (1997). Effect of Median Treatment on Urban Arterial Safety: An Accident Prediction Model, Transportation Research Record 1581, National Research Council, Washington, D.C., pp. 27-36.

Brüde, U. and Larsson, J. (1988). The Use of Prediction Models for Eliminating Effects Due to Regression-to-the-Mean in Road Accident Data, Accident Analysis and Prevention, Vol. 20, No. 4, pp. 299-310.

Hauer, E., Ng, J. C. N. and Lovell, J. (1988). Estimation of Safety at Signalized Intersections, Transportation Research Record 1185, National Research Council, Washington, D.C., pp. 48-61.

Hauer, E. (1992). Empirical Bayes Approach to the Estimation of Unsafety: The Multivariate Regression Method, Accident Analysis and Prevention, Vol. 24, No. 5, pp. 457-477.

Higle, J. L. and Witkowski, J. M. (1988). Bayesian Identification of Hazardous Locations, Transportation Research Record 1185, National Research Council, Washington, D.C., pp. 24-31.

ICBC (1999). Crash, Crime, Contravention Project: Police Crash Process Review Sub-Project Interim Report, Insurance Corporation of BC, Vancouver, BC, Canada, p. 19.

Kulmala, R. (1995). Safety at Rural Three- and Four-Arm Junctions: Development of Accident Prediction Models, Technical Research Centre of Finland, VTT Publications 233, Espoo.

Lawless, J. F. (1987). Negative Binomial and Poisson Regression, The Canadian Journal of Statistics, Vol. 15, No. 3, pp. 209-225.


Miaou, S. and Lum, H. (1993). Modeling Vehicle Accident and Highway Geometric Design Relationships, Accident Analysis and Prevention, Vol. 25, No. 6, pp. 689-709.

NAG (1996). GLIM 4 Macro Library Manual: Release 2, Numerical Algorithms Group (NAG), The Royal Statistical Society.

Sayed, T. and Rodriguez, F. (1999). Accident Prediction Models for Urban Unsignalized Intersections in British Columbia, Transportation Research Record 1665, Transportation Research Board, pp. 93-99.
