CLRIC: Collecting Lane-Based Road Information Via Crowdsourcing

Luliang Tang, Xue Yang, Zhen Dong, and Qingquan Li

Abstract: Lane-based road network information, such as the number and locations of traffic lanes on a road, plays an important role in intelligent transportation systems. In this paper, we propose a Collecting Lane-based Road Information via Crowdsourcing (CLRIC) method, which automatically extracts the detailed lane structure of roads from crowdsourcing data collected by vehicles. First, CLRIC filters the high-precision GPS data from the raw trajectories based on region growing clustering with prior knowledge. Second, CLRIC mines the number and locations of traffic lanes through an optimized constrained Gaussian mixture model. Experiments are conducted with taxi GPS trajectories in Wuhan, China, and the results, evaluated against satellite images and human interpretation, show that CLRIC recovers detailed road networks with the number and locations of traffic lanes.

Index Terms: Lane-based road information, crowdsourcing data, high-precision GPS data filtering, spatiotemporal GPS trajectories.

Manuscript received June 7, 2015; revised November 1, 2015 and January 10, 2016; accepted January 19, 2016. This work was supported by the National Science Foundation of China under Grants 41571430, 41271442, and 40801155. The Associate Editor for this paper was Z.-H. Mao.

L. Tang, X. Yang, and Z. Dong are with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China (e-mail: yangxue@whu.edu.cn).

Q. Li is with the Department of Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University, Shenzhen 518060, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITS.2016.2521482

1524-9050 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

I. INTRODUCTION

Lane-based road network information (such as the number and locations of traffic lanes) is crucial for ensuring reliable and safe driving in next-generation navigation, especially for intelligent transportation systems (ITS) such as advanced driver assistance systems and autonomous driving. At the same time, the number of lanes is important for inferring the type of road and for estimating traffic flow capacity. At present, lane-based information such as the number, turn rules, and locations of traffic lanes is usually acquired from high-definition video/images, laser point clouds, or DGPS/INS (Differential GPS/Inertial Navigation System) trajectories with accuracies of about 0.5–4 m [1]–[4]. Manual and semi-manual methods are time-consuming and labor-intensive [5].

Crowdsourcing is a low-cost and efficient way to extract useful information from data acquired by crowd participants or volunteers. The crowdsourcing method has been successfully applied in OpenStreetMap (OSM) for road-level map construction, which uses the Global Positioning System (GPS) for localization [6]–[9]. However, low-end GPS devices and urban canyons with tall buildings reduce the position accuracy of GPS data to about 10–15 m in urban areas. It is therefore challenging to extract lane-based road information from low-precision crowdsourcing GPS data.

In this paper, we propose CLRIC: collecting lane-level road network information via crowdsourcing. CLRIC can automatically extract the detailed lane structure of roads using crowdsourcing GPS data collected by vehicles. CLRIC is based on two key observations. The first observation is that, according to GPS error analysis [10], high-precision GPS trajectories with accuracies of about 3 m still exist in raw vehicle trajectories. Thus, region growing clustering with prior knowledge (RGCPK) is used in the CLRIC system to select high-precision GPS data from low-precision raw GPS data. The second observation is that vehicle trajectories contain abundant information regarding road networks [11]–[13], traffic conditions [14], [15], points of interest, and driving behaviors [16], [17]. The detailed lane structure of roads can also be mined by clustering methods, based on the assumption that GPS trajectories tend to cluster near the center of each lane with some spread. Thus, CLRIC fits an optimized constrained Gaussian mixture model to perpendicular cross sections of the trajectory vectors across a road, and from it determines the exact number and locations of traffic lanes. In summary, the contributions of this paper are the following.

1) We present a trajectory optimization method to select high-precision data from crowdsourcing GPS data. The position accuracy of the selected data is about 3 m.
2) We design a lane information mining model based on an optimized constrained Gaussian mixture method.

The remainder of this article is organized as follows. In Section II, related studies on trajectory optimization and traffic lane information extraction from GPS trajectories are reviewed. In Section III, CLRIC is fully described. In Section IV, a series of experiments on Wuhan data sets is used to demonstrate the advantages and effectiveness of CLRIC. In Section V, some conclusions and directions for future research are given.

II. RELATED WORK

GPS does not work perfectly in urban areas. When GPS receivers are used in street canyons with tall buildings, shadowing and multipath effects result in low positional accuracy. Besides, gathering GPS data via crowdsourcing is relatively

optional compared with professional collection, so the raw GPS data are mixed with many outliers. At present, there are several ways to optimize raw trajectories, such as filtering, map matching, and clustering. Filtering is suitable where high-sampling-rate trajectory data is particularly noisy, or when it is necessary to derive other quantities such as speed or direction from trajectory data [18]. Map matching is another way to optimize raw trajectory data, in which each trajectory is matched to the road centerline it corresponds to [19]. In addition, some researchers have proposed using clustering methods to remove outliers. In [14], the authors used a kernel density method to identify outliers and remove them. The authors of [4] sort all the data points in ascending order of their distance from the median and then choose 95% of the sorted data points as the experimental data. However, all these methods [4], [14], [18], [19] have their defects. Filtering is sensitive to the sampling rate of GPS data, so it is poorly suited to GPS data with a low sampling rate. Map matching is valid for road-level information extraction such as road network updating and traffic flow detection, but it is useless for lane-based road information extraction because each GPS point is matched to the road centerline. Besides, the existing clustering methods [4], [14] depend on parameter settings and cannot remove outliers that are mixed into high-density point clusters.

Extracting information from pre-processed GPS data is the key issue in the geographic domain. In this study, we focus on lane-based information extraction from GPS data. There has also been work on completely automated methods aimed at inferring road maps from crowdsourcing data. Those methods include matching GPS traces to prototypical shapes [20], using an incremental method to process GPS traces that can be used to generate road maps [21], [22], and applying clustering methods or artificial algorithms to extract the road network from GPS traces [23]–[25]. Besides, OpenStreetMap uses user-contributed GPS trajectories to create free digital maps that are open for editing from registered users. Likewise, WikiMapia, Google Maps, and other map applications let users update maps. The methods proposed in [20]–[25] can generate and update road-level maps from crowdsourcing data, while detailed road network generation has gradually shifted down to lane-based road network information such as the number and locations of traffic lanes.

Lane-based information extraction from vehicle trajectories starts with differential GPS data and concludes with a refinement of an existing map, including finding lanes and lane transitions through the intersections [26], [27]. This process involves smoothing and filtering the GPS data, matching it to an existing map, spline fitting for the road centerlines, clustering to find lanes, and refinement of the intersection geometry [28]. The authors of [29] proposed using vehicle trajectories collected by mobile phones equipped with GPS and MEMS (Micro-Electro-Mechanical System) sensors to generate lane-level road maps in open areas. The lane-level information was extracted by statistically analyzing the probability density distribution of trajectories based on non-parametric kernel density estimation. However, the methods discussed in [26]–[29] are based on the assumption that GPS trajectories from different lanes are well separated. For low-precision crowdsourcing GPS data, this assumption is seriously violated, and we therefore propose CLRIC to extract lane structure from a mass of low-precision crowdsourcing GPS data in urban areas.

III. COLLECTING LANE-BASED ROAD INFORMATION VIA CROWDSOURCING

The overview of the CLRIC system is shown in Fig. 1. As seen in Fig. 1, CLRIC includes two steps:

Step 1) select high-precision data from crowdsourcing data based on region growing clustering with prior knowledge (RGCPK). The positional accuracy of the selected data can approach 3 m.


Step 2) mine lane-based road information from the selected high-precision data, such as the number and locations of traffic lanes, using an optimized constrained Gaussian mixture model and road construction rules.

A. Crowdsourcing Data: GPS Trajectories

In the real world, the trajectory of a moving object is continuous and is usually called the original route. However, it is hard to acquire continuous trajectories through existing positioning techniques or to store them in a database. Thus, a feasible alternative taken in GPS-enabled services is to store only a set of sampled positions of a trajectory, as a tuple $Trace_i = \langle p_1, p_2, \ldots, p_n \rangle$, where $p_i$ is a GPS-sampled spatial-temporal point, itself a tuple $\langle x_i, y_i, t_i, s_i \rangle$ $(i = 1, 2, \ldots, n)$, as shown in Fig. 2(a).

Here $t_i$ is the time stamp when the data is collected, $(x_i, y_i)$ is the geographic location of the moving object, and $s_i$ contains some extra features of the moving object such as the vehicle number, driving direction, and speed. This kind of sampled trajectory is called a raw GPS trajectory. For a trajectory vector $v_i = \langle p_i, p_{i+1} \rangle$, $p_i$ and $p_{i+1}$ are regarded as the start point and the end point respectively, and the direction from $p_i$ to $p_{i+1}$ is regarded as the vector direction of $v_i$ (Fig. 2(b)).

Fig. 2. Trajectories and trajectory vectors. (a) and (b) show the trajectory and trajectory vector respectively, where N indicates north and $\theta$ is the angle between north and vector $v_i$.

For lane-based information extraction, trace information such as location, driving direction, and collection time is very important. However, due to the errors caused by data sampling and encryption in GPS navigation services, many GPS records are not precise and thus generally need to be optimized.

B. RGCPK: Region Growing Clustering With Prior Knowledge

Most drivers keep to the centerline of a lane and complete lane changes in a short time. The GPS trajectory reflects this tendency of the vehicle. Thus, high-precision data has two features: tracking points with high positional accuracy always cluster together along the centerline of each lane; and the angle between two adjacent trajectory vectors of a trajectory will not change much unless the vehicle is at an intersection or changing lanes [30]. Based on this observation, RGCPK clusters trajectories based on their similarity and then selects high-precision data from the clusters. The prior knowledge for RGCPK is extracted from the similarity between high-precision DGPS trajectories and synchronized low-precision GPS trajectories. The positioning accuracy of a DGPS trajectory and its synchronized GPS trajectory is about 0.5 m and 10–15 m respectively, and their sampling interval is 1 s. To evaluate the similarity between a DGPS trajectory vector and its synchronized GPS trajectory vector, a novel vector similarity evaluation model (VSEM) is presented.

1) The Similarity Evaluation Model of Trajectory Vectors: The difference between two trajectory vectors is reflected in the vector direction and in the vertical (perpendicular) distance between the start points of the vectors. As shown in Fig. 3, $v_i = \langle (x_i, y_i), (x_{i+1}, y_{i+1}) \rangle$ and $v_j = \langle (x_j, y_j), (x_{j+1}, y_{j+1}) \rangle$ are two different trajectory vectors. The angles to north of $v_i$ and $v_j$ are $\theta_i$ and $\theta_j$ respectively. The magnitude of the vectors is not discussed here because we only care about the location accuracy of trajectories, not the sampling rate and speed of the moving objects, which determine the magnitude.

Fig. 3. The similarity evaluation of trajectory vectors.

Thus, the similarity measure in this paper is defined in the form of linear weighting:

$$sim(v_i, v_j) = \lambda_1 e^{-diff_{Hd}} + \lambda_2 e^{-diff_{\theta_{ij}}} \tag{1}$$

where $diff_{Hd}$ and $diff_{\theta_{ij}}$ represent the vertical difference and angular difference of $v_i$ and $v_j$ respectively, and $\lambda_1$ and $\lambda_2$ are the weights of the vertical-distance difference and the angular difference, with $\lambda_1 + \lambda_2 = 1$. In general the similarity of two vectors ranges from 0 to 1, with a value of 1 for two identical vectors and a value of 0 for two completely separated vectors. We define $diff_{Hd}$ and $diff_{\theta_{ij}}$ as follows:

$$diff_{Hd} = \frac{\max(Hd_{ij}, Hd_{ji})}{Dis_{conne}} \tag{2}$$

$$diff_{\theta_{ij}} = 1 - \cos(\theta) \tag{3}$$

where $Dis_{conne}$ is a constant decided by the lane width and used to constrain the similarity of vectors on the same lane, since GPS traces tend to cluster near the center of each lane; $Hd_{ij}$ is the vertical distance from the starting point of $v_i$ to $v_j$, and $Hd_{ji}$ is computed in the same way; $\theta$ is the angular difference of $v_i$ and $v_j$ (Fig. 3), which can be estimated as:

$$\theta = |\theta_i - \theta_j| \tag{4}$$

$$Hd_{ij} = \frac{|x_i (y_{j+1} - y_j) + y_i (x_j - x_{j+1}) + (x_{j+1} y_j - x_j y_{j+1})|}{\sqrt{(y_{j+1} - y_j)^2 + (x_j - x_{j+1})^2}} \tag{5}$$

$$Hd_{ji} = \frac{|x_j (y_{i+1} - y_i) + y_j (x_i - x_{i+1}) + (x_{i+1} y_i - x_i y_{i+1})|}{\sqrt{(y_{i+1} - y_i)^2 + (x_i - x_{i+1})^2}} \tag{6}$$

The similarity calculation is used not only for prior knowledge extraction from DGPS and low-precision GPS data but also for clustering and selecting crowdsourced data. Thus, we use the correlation of the vertical-distance difference and of the angular difference with the measuring errors to estimate $\lambda_1$ and $\lambda_2$.
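As a concrete reading of (1)–(6), the similarity of two trajectory vectors can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the planar (projected, metric) coordinates, and the values `lam1 = 0.5` and `dis_conne = 3.5` used in the example are all hypothetical, and both vectors are assumed to have nonzero length.

```python
import math

def heading(p, q):
    """Angle of the vector p->q measured from north (+y axis), in radians."""
    return math.atan2(q[0] - p[0], q[1] - p[1])

def perp_distance(pt, a, b):
    """Perpendicular distance from pt to the line through a and b, as in (5) and (6)."""
    num = abs(pt[0] * (b[1] - a[1]) + pt[1] * (a[0] - b[0])
              + (b[0] * a[1] - a[0] * b[1]))
    den = math.hypot(b[1] - a[1], a[0] - b[0])  # assumes a != b
    return num / den

def similarity(vi, vj, lam1, dis_conne):
    """sim(vi, vj) from (1); each vector is ((x, y), (x_next, y_next))."""
    lam2 = 1.0 - lam1                            # weights sum to 1
    hd_ij = perp_distance(vi[0], vj[0], vj[1])   # start of vi to the line of vj
    hd_ji = perp_distance(vj[0], vi[0], vi[1])   # start of vj to the line of vi
    diff_hd = max(hd_ij, hd_ji) / dis_conne      # (2)
    theta = abs(heading(*vi) - heading(*vj))     # (4)
    diff_theta = 1.0 - math.cos(theta)           # (3)
    return lam1 * math.exp(-diff_hd) + lam2 * math.exp(-diff_theta)

# Two parallel, north-pointing vectors one assumed lane width (3.5 m) apart.
v_a = ((0.0, 0.0), (0.0, 10.0))
v_b = ((3.5, 0.0), (3.5, 10.0))
example = similarity(v_a, v_b, lam1=0.5, dis_conne=3.5)
```

Because the angular term uses $1 - \cos(\theta)$, the sketch needs no special handling when the raw heading difference wraps past 180 degrees.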


Given a set of GPS trajectories $T = \{Trace_1, Trace_2, \ldots, Trace_s\}$, its synchronized DGPS trajectories are denoted as $DT = \{Dt_1, Dt_2, \ldots, Dt_s\}$. The position accuracy of $DT$ and $T$ is 0.5 m and 10–15 m respectively. So $T$ is regarded as the measurements, and the high-accuracy $DT$ is taken as the true value of $T$. Let $Trace_i = \{p_1, p_2, \ldots, p_n\}$, $Dt_i = \{rp_1, rp_2, \ldots, rp_n\}$, $Trace_i \in T$, $Dt_i \in DT$, $i = 1, 2, \ldots, s$; their trajectory vectors are denoted as $Tv_i = \{v_1, v_2, \ldots, v_{n-1}\}$ and $Dv_i = \{rv_1, rv_2, \ldots, rv_{n-1}\}$. The differences in vertical distance and angle between $Tv_i$ and $Dv_i$ are calculated and denoted as $D_i = \{d_1, d_2, \ldots, d_{n-1}\}$ and $A_i = \{a_1, a_2, \ldots, a_{n-1}\}$, $i = 1, 2, \ldots, s$. The measurement error of $T$ can also be written as $\delta_i = \{\delta_1, \delta_2, \ldots, \delta_n\}$, $\delta_j = |p_j - rp_j|$, $i = 1, 2, \ldots, s$, $j = 1, 2, \ldots, n$. Then $\lambda_1$ and $\lambda_2$ are estimated as:

$$\lambda_1 = \frac{\sum_{i=1}^{s} r_D}{\sum_{i=1}^{s} r_D + \sum_{i=1}^{s} r_A} \tag{7}$$

$$\lambda_2 = 1 - \lambda_1 \tag{8}$$

where $r_D$ and $r_A$ are the correlations of $D_i$ with $\delta_i$ and of $A_i$ with $\delta_i$, and can be estimated from the covariance matrix.

2) The Prior Knowledge of RGCPK Extraction: The similarity $Sim\langle Trace_i, Dt_i \rangle$ of $Trace_i$ and $Dt_i$ can be computed according to the similarity evaluation model, and the results are recorded as $Trace_i = \{(p_1, s_1), (p_2, s_2), \ldots, (p_n, s_n)\}$, where $s_j$ is the similarity value of $v_j$ and $rv_j$; $s_j$ is recorded as the similarity value of $p_j$ according to the tendency of the moving object, with $s_{n-1} = s_n$, $j = 1, 2, \ldots, n$, as shown in Fig. 4.

Fig. 4. Similarity of vector pairs. The similarities of some vector pairs are shown in the yellow rectangle window.

A high similarity value indicates a higher positional accuracy of the tracking point. Assume that $Ts = \{Ts_1, Ts_2, \ldots, Ts_l\}$ is the set of similarity thresholds and $ST_i$ represents the data set that satisfies $Ts_i$, $ST_i \subseteq T$. Then the percentage of $ST_i$ is calculated as:

$$Per_i = \frac{N(ST_i)}{N(T)} \tag{9}$$

where $Per_i$ represents the percentage of $ST_i$, and $N(ST_i)$ and $N(T)$ are the numbers of tracking points of $ST_i$ and $T$, respectively. The measuring error of $Ts_i$ is calculated as:

$$\delta_{T_i} = \frac{\sum \delta}{N(ST_i)} \tag{10}$$

where $\delta_{T_i}$ is the measuring error of $Ts_i$, and $\sum \delta$ is the sum of the measuring errors of the tracking points in $ST_i$, $i = 1, 2, \ldots, l$. The results $Ts_i$, $\delta_{T_i}$, and $Per_i$ are recorded as $RST_i = \langle Ts_i, \delta_{T_i}, Per_i \rangle$ and regarded as the prior knowledge of RGCPK.

3) The Principle of RGCPK: The prior knowledge includes $T$ and $Per$ derived from $RST_j = \langle Ts_j, \delta_{T_j}, Per_j \rangle$, $j = 1, 2, \ldots, l$. $T$ is regarded as the threshold of clustering, and $Per$ is used to select high-precision data from the entire set of clusters. The key steps of RGCPK are as follows:

Step 1: Label all trajectory vectors as un-clustered and set the current cluster label $C_{CL} = 0$;
Step 2: Randomly select a trajectory vector labeled un-clustered from the trajectory as the seed trajectory vector $v_s$ and label it as $C_L(v_s) = C_{CL}$, as shown in Fig. 5(a).
Step 3: Search for the adjacent trajectory vector of $v_s$, denoted by $v_{sn}$. $v_s$ and $v_{sn}$ are merged into one cluster, and $v_{sn}$ is labeled as $C_L(v_{sn}) = C_{CL}$, if they satisfy $Sim(v_s, v_{sn}) > T$, where $Sim(v_s, v_{sn})$ is the similarity between $v_s$ and $v_{sn}$, and $T$ is the similarity threshold.
Step 4: Take $v_{sn}$ as the seed trajectory vector $v_s$. Return to Step 3, as shown in Fig. 5(b).
Step 5: One cluster is acquired when the seed trajectory vector $v_s$ cannot be merged with its adjacent trajectory vectors $v_{sn}$. Then, let $C_{CL} = C_{CL} + 1$, and return to Step 2.
Step 6: Repeat Steps 2–5 until all the trajectory vectors in the trajectory are labeled, as shown in Fig. 5(c).
Step 7: Treat the resulting clusters as trajectory vectors and merge them further using the same steps. The final result of clustering is shown in Fig. 5(d).
Step 8: After clustering, $Per$ is used as the selectivity for data selection. The proportion of the total tracking points held by each cluster is computed. All clusters are sorted in descending order of their proportion, and the proportions are summed from the first value until the accumulated value reaches $Per$. The clusters aggregated in this way are regarded as the high-precision data and selected from all clusters, as shown in Fig. 5(e).

C. Lane-Based Road Information Collection

1) The Principle of Number and Locations of Traffic Lanes Detection: We fit a Constrained Gaussian Mixture Model (CGMM) to perpendicular cross sections of the traces across the road, based on the assumption that GPS trajectories tend to cluster near the center of each lane with some spread due to GPS noise and other vagaries. The CGMM can be defined as:

$$p(x) = \sum_{j=1}^{k} w_j \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu_j)^2}{2\sigma^2}\right) \tag{11}$$
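The mixture density in (11) can be evaluated directly, which is useful for checking a fitted model against a histogram of cross-section offsets. The following is a minimal sketch, not the authors' code: the function name `cgmm_pdf` and the three-lane example values (centers 3.5 m apart, a shared standard deviation of 1 m, weights 0.3/0.4/0.3) are illustrative assumptions only.

```python
import math

def cgmm_pdf(x, weights, means, sigma):
    """Evaluate p(x) in (11): a Gaussian mixture whose components share one sigma."""
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return sum(w * norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
               for w, mu in zip(weights, means))

# Hypothetical three-lane cross section, offsets in metres from the road centerline.
lane_weights = [0.3, 0.4, 0.3]
lane_centers = [-3.5, 0.0, 3.5]
density_at_center = cgmm_pdf(0.0, lane_weights, lane_centers, sigma=1.0)
```

Sharing a single `sigma` across components is what distinguishes this density from an unconstrained Gaussian mixture, in which every component would carry its own variance.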


In (11), $k$ is the number of Gaussian components. Each component corresponds to one lane, providing an automatic lane count; $w_1, \ldots, w_k$ are the weights of the components, corresponding to the relative traffic volume in each lane. The weights have to be positive and normalized, that is, $w_j > 0$, $j = 1, 2, \ldots, k$, and $w_1 + w_2 + \cdots + w_k = 1$. The parameters $\mu_1, \ldots, \mu_k$ are the means of the trajectories for each component and equal the centerline of each lane; $\sigma$ is the standard deviation of the trajectories for each component and is set to the same value for all components because adjacent lanes usually have the same width. The expectation-maximization (EM) algorithm is used to infer the unknown parameters $\theta_j^{(m)} = (w_j^{(m)}, \mu_j^{(m)}, \sigma^{(m)})$, where $m$ is the number of iterations. The concrete procedure for obtaining $\theta_j^{(m)}$ follows [4].

The key task in the CGMM is selecting the number of components. A common practice is to estimate $\theta_k = \{w_i, \mu_i, \sigma\}$, $i = 1, 2, \ldots, k$, for a set of values of $k$, and then select the $k$ that minimizes the following function:

$$R_{srm}(p(x_i|\theta_k)) = \frac{1}{n}\sum_{i=1}^{n} L(x_i, p(x_i|\theta_k)) + \lambda J(p(x_i|\theta_k)) \tag{12}$$

$$k = \arg\min_{k} R_{srm}(p(x_i|\theta_k)) \tag{13}$$

$$L(x_i, p(x_i|\theta_k)) = -\log(p(x_i|\theta_k)) \tag{14}$$

where $R_{srm}$ is the structural risk (structural risk minimization, SRM), $L(x_i, p(x_i|\theta_k))$ is the empirical risk model used to evaluate the goodness of fit, $J(p(x_i|\theta_k))$ is a regularization term that penalizes complex models, and $\lambda > 0$ is the regularization parameter. Equation (12) expresses a trade-off between model fitness and model complexity in order to achieve good generalization.

The authors of [4] proposed a new regularizer, $R_{LS}$, based on the relation between the number of lanes and the total spread of trajectories; their test results show its advantages over other methods such as the Akaike information criterion ($R_{AIC}$) and the Bayesian information criterion ($R_{BIC}$).

Fig. 6. The raw tracking points. (a) shows the real image data of an intersection and (b) shows the raw GPS records around it.

However, low-end GPS devices and urban canyons cause GPS tracking points to deviate from their original positions. As shown in Fig. 6(b), a large number of tracking points deviate from their original positions, and their proportion of the total is far more than 5%. This makes it difficult to confirm the number of lanes from GPS data at 10–15 m accuracy.

In this paper, we present a new regularizer $J(p(x_i|\theta_k))$ for confirming the number of lanes, as shown in equation (15).

$$J(p(x_i|\theta_k)) = J_{TSW}(p(x_i|\theta_k)) = \left(\frac{D_W}{k} - \Delta\mu_k\right)^2 \tag{15}$$

[Equation (16): closed-form expression for $\Delta\mu_k$; not recoverable from this copy, see [4].]

where $D_W$ is the spread width of the optimized trajectories, $D_W/k$ refers to the lane width, and $\Delta\mu_k$ is the spacing between two adjacent $\mu_j$, which equals the detected lane width, $j = 1, 2, \ldots, k$. Equation (15) measures the degree of consistency between $D_W/k$ and $\Delta\mu_k$. Equation (16) shows the


calculation of $\Delta\mu_k$. The reasoning process and the explanation of the parameters are as in [4]. According to (11), the parameter $\mu_i$ reflects the location of each lane, as shown in Fig. 7.

Fig. 7. Lane centerline. (a) is the fitting result of CGMM and the location of each lane is shown in (b).

2) Identification and Optimization of the Number of Lanes: The accuracy of lane number detection has a great impact on the extraction of lane locations and turning rules. Given a set of trajectories $AT$ that start from $Intersection_1$ and end at $Intersection_2$, the number of traffic lanes is detected based on the CGMM.

As shown in Fig. 8, we choose a rectangular window to sample cross sections; the length and width of the rectangular window are set as $rh$ and $rw$. We then fit a CGMM to the intersections between the GPS trajectories and a sampling line perpendicular to the road centerline, confirm the number of lanes according to (12)–(16), and record the results as $Nlane_i$, $i = 1, 2, \ldots, t$.

In addition, the spread width $D_W$ of the optimized trajectories in each sampling cross section is acquired by the Width Detection Algorithm. The direction of the road centerline and the sampling line are set as the horizontal axis and the longitudinal axis, respectively. The intersection between the sampling line and the road centerline is set as the origin, as shown in Fig. 9. The width $w$ of the sliding window can be set to any value within the scope of the desired precision, and the length of the sliding window is the same as the length of the rectangular window.

Fig. 9. Width detection method.

Width Detection Algorithm
  /* Initialization */
  Coordinate origin: $Ro_i$; $i = 1, 2, \ldots, t$.
  Horizontal axis: the direction of the road centerline;
  Longitudinal axis: $Uy_i = 0$; $Dy_i = 0$;
  Sliding window: length = $rh$; width = $w$; Proportion = 0;
  /* Assignment */
  for each sampling cross section, do
    repeat
      move the sliding window along the positive direction and the negative direction of the longitudinal axis, accumulating the Proportion
      (Proportion = number of points currently in the sliding window / all points in the current sampling cross section)
    until Proportion == 100%
    set $Dw_i = \max|Uy_i| + \max|Dy_i|$;
    set the coordinate origin to $Ro_{i+1}$; $Uy_{i+1} = 0$; $Dy_{i+1} = 0$;
  end for

In most cases, the value of $Nlane_i$ between two intersections remains the same except where lanes are added at an intersection, as shown in Fig. 10. But some incorrect classifications of the number of lanes still exist, due to the effects of uncertain traffic flows in each lane or inaccuracies in the GPS data.

Fig. 10. The construction rule of road. Part1 and Part2 are the most likely parts of the road to add lanes.
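One way to read the Width Detection Algorithm for a single sampling cross section is sketched below in Python. It is an interpretation under stated assumptions, not the authors' implementation: the function name `spread_width` is hypothetical, and the tracking points are assumed to have already been projected to signed perpendicular offsets (in metres) from the origin $Ro_i$ on the road centerline.

```python
def spread_width(offsets, w):
    """Spread width Dw of one cross section, quantized to the window width w.

    offsets: signed perpendicular distances (m) of the tracking points from
             the road centerline; positive is one side, negative the other.
    w:       sliding-window width (m), i.e. the resolution of the result.
    """
    if w <= 0:
        raise ValueError("window width must be positive")
    n = len(offsets)
    if n == 0:
        return 0.0
    up = down = 0.0                                # Uy_i and Dy_i
    covered = sum(1 for y in offsets if y == 0.0)  # points lying on the origin
    while covered < n:                             # until Proportion == 100%
        if any(y > up for y in offsets):           # points remain on the positive side
            covered += sum(1 for y in offsets if up < y <= up + w)
            up += w
        if any(y < -down for y in offsets):        # points remain on the negative side
            covered += sum(1 for y in offsets if -(down + w) <= y < -down)
            down += w
    return up + down                               # Dw_i = max|Uy_i| + max|Dy_i|

# Hypothetical cross section with a 0.5 m window.
dw = spread_width([-6.2, -3.0, 0.0, 1.5, 4.9], w=0.5)
```

Each side stops advancing as soon as no point lies beyond it, so an empty window between two groups of points does not end the sweep early. Dividing the result by a candidate lane count $k$ gives the $D_W/k$ term that (15) compares with $\Delta\mu_k$.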


Thus, we present a method to optimize the results of lane number extraction, as follows.

First, $Nlane_{i+1}$ is compared with $Nlane_i$ and $Nlane_{i+2}$: $Nlane_{i+1}$ is replaced by $Nlane_i$ when $Nlane_i$ and $Nlane_{i+1}$ are different and $Nlane_i$ and $Nlane_{i+2}$ are the same.

Secondly, the results from the first step are clustered according to the value of $Nlane_i$ and their arrangement; for instance, $Nlane_e, Nlane_{e+1}, Nlane_{e+2}, \ldots, Nlane_{e+c}$ are clustered when their values are the same, $e < t$, $e + c < t$. Each cluster also corresponds to a number of lanes. Assume there are $s$ clusters, recorded as $C_j = \langle Nl_j, nc_j \rangle$, where $Nl_j$ is the number of lanes of cluster $C_j$ and $nc_j$ is the total number of $Nlane_i$ that belong to $C_j$, $j = 1, 2, \ldots, s$.

Finally, $C_{j+1}$ is compared with $C_j$: $Nl_{j+1}$ of $C_{j+1}$ is replaced by $Nl_j$ of $C_j$ when $Nl_{j+1}$ and $Nl_j$ are different and $nc_{j+1} < cv$, where $cv$ is a constraint value that depends largely on the road construction rules.

IV. EXPERIMENTS

CLRIC includes two steps: high-precision data selection and lane-based information extraction. Thus, to evaluate the performance of CLRIC, we used two different types of data sets: a training data set of ten shuttle vehicle traces and a test data set of thousands of taxi GPS traces. Both data sets were collected in urban areas. The training data set was used to extract prior knowledge for high-precision data selection and to verify the performance of the region growing clustering with prior knowledge method. The test data set was used as the main data source for lane-based information extraction. The two data sets are introduced as follows.

The training data set was collected by shuttle vehicles. Each shuttle vehicle was equipped with a GPS logger and an Inertial Measurement Unit (IMU) that recorded two kinds of traces: GPS traces based on the GPS single-point positioning technique, and synchronized DGPS traces based on differential global positioning technology. The positional accuracy of the GPS and DGPS data in urban areas was about 10–15 m and 0.5 m respectively. The sampling interval for the training data set was 1 s. The data collection period for the shuttle vehicles was seven days. We obtained about 40 thousand GPS and DGPS points, shown in Fig. 11. The prior knowledge for high-precision data selection from the crowdsourcing data was extracted from part of the training data set by analyzing the similarity of the DGPS data and its synchronized GPS data. The remaining training trajectory data were used to evaluate the performance of RGCPK.

Fig. 11. The collection of DGPS trajectories and synchronized GPS trajectories. (a) shows the driving region of the shuttle vehicles; (b) is a magnification of (a), in which the black points and blue points represent GPS data and DGPS data respectively.

The test data set was collected by thousands of taxis in Wuhan using the single-point positioning technique; the GPS devices were placed at the center of the taxi roofs. The sampling interval of the taxi traces ranged from 10 s to 20 s, while their positioning accuracy ranged from 10 m to 15 m in urban areas. Each taxi recorded traces for an average of 14 days. We collected about 200 billion GPS points, shown in Fig. 12. According to each tracking point location and heading direction, we got about

Fig. 12. Crowdsourcing data collected by taxis. (a) indicates the road network for taxi driving and (b) shows the raw trajectories collected by taxis.


1000 trajectory segments based on the clustering and partitioning method [31], and the number of trajectories in each trajectory segment ranges from 100 to 1000. Those trajectory segments represent one-way roads that start from one intersection and end at another intersection.

A. Region Growing Clustering With Prior Knowledge (RGCPK) for High-Precision Data Selection

RGCPK is the first step in the CLRIC system. We took a part of the trajectories from the training data set as samples, as shown in Fig. 13. The weights in the similarity evaluation model are estimated by (7) and (8), as shown in Table I.

Fig. 13. A part of the training data set. Red marks and blue marks represent GPS data and its high-precision synchronized DGPS data.

TABLE I
THE WEIGHTS OF SIMILARITY EVALUATION MODEL

The similarity degree of each tracking point is estimated according to the similarity evaluation model of trajectory vectors. We analyzed the measured error ($\delta_T$, in m) and the proportion of the data set ($Per_i$) under different similarity thresholds ($Ts$). The similarity threshold ($Ts$), the measurement error ($\delta_T$, in m), and the percentage of the total data set it represents ($Per$) are shown in Table II.

TABLE II
THE PRIOR KNOWLEDGE EXTRACTION

In Table II, $Ts$, $\delta_T$, and $Per$ represent the similarity threshold, measurement error, and selectivity, respectively; $\delta_T$ also indicates the expected quality of the selected data.

We picked 0.71 as the clustering threshold for an expected quality of 3 m, and 55.4% is taken as the selectivity for data selection from all clusters based on RGCPK. Fig. 14 shows the results of RGCPK for the test data set and a part of the training data set. The red solid points and black empty circles represent the high-precision data and outliers, respectively.

B. Detection of the Number and Locations of Traffic Lanes

The initial estimates $w_j^{(0)}$, $\mu_j^{(0)}$, $\sigma^{(0)}$ $(j = 1, \ldots, k)$ are assigned as follows: $k = 2, 3, 4, 5$ because there are four types of traffic lanes designated by the road construction standards in China; $\mu_1^{(0)}, \ldots, \mu_k^{(0)}$ are the centerlines for each possible lane and [...]

[...] rectangle window were set to 5 m and 50 m based on the road construction standards; the width of a one-way road is smaller than 50 m in China, and adding a lane on a road generally occurs within 50 m of an intersection.

Given a set of optimized trajectories $ST$ that start from one intersection and end at another intersection, with 120 thousand tracking points, the number and locations of lanes were detected as follows:

1) The Number of Traffic Lanes Detected Using Optimized Constraint Gaussian Mixture Model (CGMM): As shown in Fig. 15, we trained CGMMs with $k = 2, 3, 4, 5$ on a sample $s_1$ from $ST$ containing 4,000 data points. The densities of those CGMMs and their individual components, multiplied by the number of data points, are shown in (a), (b), (c), and (d), respectively. The number of lanes of $s_1$ was identified using equations (12)–(16).

The number of lanes of $s_1$ is 4 according to the computational results. We then fit CGMMs with $k = 2, 3, 4, 5$ to each sample from $ST$ and calculated the number of lanes of all samples, as shown in Fig. 16. The true number of lanes of $ST$ was obtained by field observation. From Fig. 16, we see that incorrect lane number identifications exist in the results.

2) Result Optimization: The process of result optimization was used to improve the accuracy of lane number identification. As shown in Fig. 17, incorrect identifications are re-identified by the result optimization algorithm. The constraint value $cv$ is set to 3 because the road construction standard stipulates that lanes are added within 50 m of an intersection and at bus stops, and the length of a bus stop is more than 10 m.

As a complement to the lane number measures, Fig. 18 presents a visualization of the location of each lane of $ST$, where red lines depict the centerline of each lane.

From Fig. 18, we see that the accuracy of the location of each lane depends largely on the accuracy of lane number identification. Incorrect identifications of the number of lanes lead to wrongly placed lane locations. Misidentifications of the number of lanes seen in Fig. 18 also indicate that our method has difficulty dealing with data sets from com-

acquired by segmenting the road from road centerlines with plex intersections; that have different traffic flows between adja-

equal interval (set to 3.5 m); (0) is the half width of a lane and cent lanes caused by the traffic lights; or where there are driving

set as 1.75 m according to road construction standards in China. restrictions. In addition, the centerlines of each lane on a one-

Values of regularization parameter and other parameters in way road cannot be connected without gaps using our method.
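As a rough illustration of the initialization and fitting steps described in Section B, the following sketch places the initial lane centerlines 3.5 m apart around the road centerline, starts from a half lane width of 1.75 m, and fits mixtures with k = 2, 3, 4, 5 to cross-road offsets of tracking points. It is not the authors' CGMM: it uses plain EM with one shared standard deviation, assumes equal initial weights, omits the regularization term, and selects k with the Bayesian information criterion as a stand-in for equations (12)–(16), which are not reproduced here. All function names are hypothetical.

```python
import math

LANE_INTERVAL_M = 3.5  # spacing of the initial centerlines (from the text)
SIGMA0_M = 1.75        # initial half lane width (from the text)


def initial_estimates(k, road_center):
    """Equal weights (an assumption), centerlines 3.5 m apart, sigma = 1.75 m."""
    means = [road_center + (j - (k - 1) / 2.0) * LANE_INTERVAL_M for j in range(k)]
    return [1.0 / k] * k, means, SIGMA0_M


def log_likelihood(xs, weights, means, sigma):
    """Log-likelihood of the offsets under a shared-sigma Gaussian mixture."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * ((x - m) / sigma) ** 2) / norm
                for w, m in zip(weights, means))
        total += math.log(max(p, 1e-300))
    return total


def fit_mixture(xs, k, n_iter=100):
    """Plain EM for a 1-D Gaussian mixture with one shared sigma."""
    weights, means, sigma = initial_estimates(k, sum(xs) / len(xs))
    for _ in range(n_iter):
        # E-step: responsibility of each lane component for each offset.
        resp = []
        for x in xs:
            row = [w * math.exp(-0.5 * ((x - m) / sigma) ** 2)
                   for w, m in zip(weights, means)]
            s = sum(row) or 1e-300
            resp.append([r / s for r in row])
        # M-step: update weights, centerlines, and the shared sigma.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [n / len(xs) for n in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        var = sum(r[j] * (x - means[j]) ** 2
                  for r, x in zip(resp, xs) for j in range(k)) / len(xs)
        sigma = math.sqrt(max(var, 1e-6))
    return weights, means, sigma


def select_lane_count(xs, candidates=(2, 3, 4, 5)):
    """Pick k by BIC (a stand-in criterion, not the paper's equations)."""
    best_k, best_bic = None, float("inf")
    for k in candidates:
        weights, means, sigma = fit_mixture(xs, k)
        n_params = 2 * k  # (k - 1) weights + k means + 1 shared sigma
        bic = (-2.0 * log_likelihood(xs, weights, means, sigma)
               + n_params * math.log(len(xs)))
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

Here `xs` is assumed to be a list of signed perpendicular offsets (in meters) of the selected tracking points from the road centerline; computing those offsets from raw GPS coordinates is outside this sketch.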

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

Fig. 14. High-precision data selection; (a) shows the selection result for the test data set, and (b) shows the selection result for a part of the training data set. The red-solid points and black-empty circles represent high-precision data and outliers, respectively.

Fig. 15. CGMM results. (a), (b), (c) and (d) indicate the overview of k = 2, k = 3, k = 4, k = 5, respectively.

Fig. 16. The lane detection results. The black-solid points and red-empty circles represent the true value and detection results for the number of lanes.

Fig. 17. The optimized results of the number of lanes detection. The blue-empty circles and black-solid points represent the optimized results and true value of the number of lanes.
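The comparisons plotted in Figs. 16 and 17 amount to checking, segment by segment, whether the detected lane count matches the field-observed count. A minimal sketch of that accuracy measure is given below; the function name and the example counts are made up for illustration and are not taken from the experiments.

```python
def lane_count_accuracy(detected, truth):
    """Fraction of road segments whose detected lane count equals the true count."""
    if len(detected) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(d == t for d, t in zip(detected, truth)) / len(truth)


# Hypothetical counts for four segments: three of the four match.
print(lane_count_accuracy([2, 3, 4, 4], [2, 3, 4, 3]))  # prints 0.75
```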

Fig. 18. The locations of each lane, where the yellow line depicts the centerline of each lane and the white line shows an incorrectly identified lane centerline.

C. Quantitative Evaluation

1) The Performance of Region Growing Clustering With Prior Knowledge Method (RGCPK): To evaluate the performance of the proposed RGCPK, we implemented RGCPK on two data sets, seen in Fig. 19. The first data set, taken from the training data set, was used to estimate the accuracy of the selected data, and the other was used to identify the performance of RGCPK for lane number identification.

The measurement errors of the selected data points were computed against the synchronized DGPS data. The results show that the position accuracy of the selected data reaches 3.02 ± 1.2 m, where 3.02 m is the average value and 1.2 m is the standard deviation.

For the test data set, we could not estimate the position accuracy of the selected data because there was no high-precision synchronized DGPS data. Thus, the performance of RGCPK for crowdsourcing data was evaluated by comparing the accuracy of lane number identification in the test data set before and after optimization. Based on this, the accuracy of lane number identification for the data selected from the crowdsourcing data is 85.2%, whereas the accuracy of lane number identification for the raw test data set only reaches 42.3%.

Fig. 19. The results of RGCPK using data from the training data set, where red-empty circles depict the selected data marks and blue points show the synchronized DGPS data points of those selected data points.

2) Quantitative Evaluation for Lane Number Identification: The quantitative evaluation of our lane number identification was done by comparing it to human-interpreted results. This comparison shows that the proposed method CLRIC achieved good performance in extracting lane numbers, with an overall accuracy of 85.2%; however, there was also a 14.8% chance of incorrectly identifying the number of lanes. In urban areas, it is a challenging task to mine lane-based information from low-precision crowdsourcing data, especially for roads with complex intersections, viaducts, or tunnels. In summary, the reasons for incorrect identification include: our method cannot distinguish trajectories on roads on and below viaducts, since the experimental data had no elevation information, so the lane information for overlapping roads in the study area was misclassified; the number of lanes can be missed because of GPS signal loss in tunnels; and our method has difficulty dealing with data sets from complex intersections mixed with viaducts.

In addition, to evaluate the performance of CLRIC, we presented a qualitative comparison of lane number identification based on CLRIC, Constraint Gaussian Mixture Model (CGMM) [4], and Kernel Density Estimation (KDE) [29]. The results of the lane number identification comparisons using the test data set are shown in Table III. According to these results, the CLRIC system employing region growing clustering with prior knowledge and optimized constraint Gaussian … compared to two other classification models. The comparison experiment also shows that the methods in [4] and [29] rely too much on positional accuracy and sampling rate.

TABLE III

LANE NUMBER IDENTIFICATION COMPARISONS

3) Evaluation for the Location of Lane Extraction: To evaluate the location of each extracted lane, we randomly selected about 200 trajectory segments from the test data set as the sample data and compared the detected lane width of those samples with the actual lane width. The actual lane width of those samples was obtained by field measurement. Based on the measurement results, there are two basic types of lane width in the sample data: 3.75 m (type1) and 3.5 m (type2). From Fig. 20, most detected lane widths are higher than the actual value for both type1 and type2. The standard deviation of the detected lane width for type1 and type2 is 0.1904 and 0.2337, respectively. The average difference between the detected and true lane width for type1 and type2 is 0.3564 m and 0.4585 m, respectively, which indicates that the lane locations extracted for type1 are slightly better than those for type2, but neither is fully close to the actual lane width.

Fig. 20. Lane width comparison between detecting value and actual value.

These differences between the detected and actual values for the different types of lane width are caused by canyon streets, which make the positional accuracy of GPS data differ between areas. In urban areas, the positional accuracy of GPS data collected in streets with low-rise buildings is better than that of data collected in streets with high-rise buildings. In future work, we will study this problem in depth.

V. CONCLUSION

In this paper, we propose CLRIC, an automated method to extract lane-based information, such as the number and locations of traffic lanes on a road, via crowdsourcing. CLRIC filters the high-precision GPS data from the raw trajectories using region growing clustering with prior knowledge, and mines the number and locations of traffic lanes through an optimized constrained Gaussian mixture model, which promises to be a low-cost and real-time way to collect lane-based road information. However, the proposed method still has room for improvement, and our future work will focus on the extraction of lane information under complex road environments such as tunnels and overpasses.


REFERENCES

[1] A. B. Hillel, R. Lerner, D. Levi, and G. Raz, "Recent progress in road and lane detection: A survey," Mach. Vis. Appl., vol. 25, no. 3, pp. 727–745, Apr. 2014.

[2] M. Thuy and F. León, "Lane detection and tracking based on lidar data," Metrol. Meas. Syst., vol. 17, no. 3, pp. 311–321, 2010.

[3] B. Yang, Z. Dong, and W. Dai, "Hierarchical extraction of urban objects from mobile laser scanning data," ISPRS J. Photogramm. Remote Sens., vol. 99, pp. 45–57, Jan. 2015.

[4] Y. Chen and J. Krumm, "Probabilistic modeling of traffic lanes from GPS traces," in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2010, pp. 81–88.

[5] A. G. O. Yeh et al., "Hierarchical polygonization for generating and updating lane-based road network information for navigation from road markings," Int. J. Geographical Inf. Sci., vol. 29, no. 9, pp. 1–24, 2015.

[6] B. Zhou et al., "ALIMC: Activity landmark-based indoor mapping via crowdsourcing," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 5, pp. 1–11, Oct. 2015.

[7] N. D. Lane, S. B. Eisenman, M. Musolesi, E. Miluzzo, and A. T. Campbell, "Urban sensing systems: Opportunistic or participatory?" in Proc. 9th Workshop Mobile Comput. Syst. Appl., 2008, pp. 11–16.

[8] M. Haklay and P. Weber, "OpenStreetMap: User-generated street maps," IEEE Pervasive Comput., vol. 7, no. 4, pp. 12–18, Oct.–Dec. 2008.

[9] B. Hull et al., "Cartel: A distributed mobile sensor computing system," in Proc. 4th Int. Conf. Embedded Netw. Sens. Syst., 2006, pp. 125–138.

[10] H. W. Mckenzie, C. L. Jerde, D. R. Visscher, E. H. Merrill, and M. A. Lewis, "Inferring linear feature use in the presence of GPS measurement error," Environ. Ecol. Stat., vol. 16, no. 4, pp. 531–546, 2009.

[11] J. Wang et al., "A novel approach for generating routable road maps from vehicle GPS trajectories," Int. J. Geographical Inf. Sci., vol. 29, no. 1, pp. 69–91, Jan. 2014.

[12] L. Tang, F. Huang, X. Zhang, and H. Xu, "Road network change detection based on floating car data," J. Netw., vol. 7, no. 7, pp. 1063–1070, 2012.

[13] P. Yin et al., "Mining GPS data for trajectory recommendation," in Advances in Knowledge Discovery and Data Mining. New York, NY, USA: Springer-Verlag, 2014, pp. 50–61.

[14] C. de Fabritiis, R. Ragona, and G. Valenti, "Traffic estimation and prediction based on real time floating car data," in Proc. IEEE 11th Int. ITSC, 2008, pp. 197–203.

[15] L. Tang, X. Chang, and Q. Li, "Public travel route optimization based on ant colony optimization algorithm and taxi GPS data," Chin. J. Highway Transp., vol. 24, no. 2, pp. 89–95, 2011.

[16] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, "Mining interesting locations and travel sequences from GPS trajectories," in Proc. Int. World Wide Web Conf., 2009, pp. 791–800.

[17] D. Sun et al., "Urban travel behavior analyses and route prediction based on floating car data," Transp. Lett. Int. J. Transp. Res., vol. 6, no. 3, pp. 118–125, Jul. 2014.

[18] W. C. Lee and J. Krumm, "Trajectory preprocessing," in Computing With Spatial Trajectories. New York, NY, USA: Springer-Verlag, 2011, pp. 3–33.

[19] S. Brakatsoulas, D. Pfoser, R. Salas, and C. Wenk, "On map-matching vehicle tracking data," in Proc. 31st Int. Conf. Very Large Data Bases, 2005, pp. 853–864.

[20] Y. Yanagisawa, J. Akahani, and T. Satoh, "Shape-based similarity query for trajectory of mobile objects," in Proc. 4th Int. Conf. Mobile Data Manage., Melbourne, Vic., Australia, Jan. 21–24, 2003, pp. 63–77.

[21] R. Bruntrup, S. Edelkamp, S. Jabbar, and B. Scholz, "Incremental map generation with GPS traces," in Proc. IEEE Intell. Transp. Syst., Sep. 13–15, 2005, pp. 574–579.

[22] J. Li, Q. Qin, C. Xie, and Y. Zhao, "Integrated use of spatial and semantic relationships for extracting road networks from floating car data," Int. J. Appl. Earth Observ. Geoinf., vol. 19, no. 10, pp. 238–247, 2012.

[23] A. Fathi and J. Krumm, "Detecting road intersections from GPS traces," in Geographic Information Science. Berlin, Germany: Springer-Verlag, 2010, pp. 56–69.

[24] G. Agamennoni, J. Nieto, and E. M. Nebot, "Robust inference of principal road paths for intelligent transportation systems," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 1, pp. 298–308, Mar. 2011.

[25] L. Cao and J. Krumm, "From GPS traces to a routable road map," in Proc. 17th ACM SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2009, pp. 3–12.

[26] S. Rogers, P. Langley, and C. Wilson, "Mining GPS data to augment road models," in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, New York, NY, USA, 1999, pp. 104–113.

[27] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, "Constrained k-means clustering with background knowledge," in Proc. 18th ICML, San Francisco, CA, USA, 2001, pp. 577–584.

[28] S. Edelkamp and S. Schrödl, "Route planning and map inference with global positioning trajectories," Comput. Sci. Perspective, vol. 2598, pp. 128–151, 2003.

[29] A. Uduwaragoda, A. S. Perera, and S. A. D. Dias, "Generating lane level road data from vehicle trajectories using kernel density estimation," in Proc. IEEE 16th Int. Annu. ITSC, Oct. 6–9, 2013, pp. 384–391.

[30] X. Liu et al., "Road recognition using coarse-grained vehicular traces," HP Lab., Palo Alto, CA, USA, 2012, pp. 1–10.

[31] J. G. Lee and J. Han, "Trajectory clustering: A partition-and-group framework," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2007, pp. 593–604.

Luliang Tang received the Ph.D. degree from Wuhan University, Wuhan, China, in 2007. He is currently a Professor with Wuhan University. His research interests include space–time GIS, GIS for transportation, and change detection.

Xue Yang received the M.Eng. degree from Wuhan University, Wuhan, China, in 2013. She is currently working toward the Ph.D. degree in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. Her research interests include intelligent transportation system, spatiotemporal data analysis, and information mining.

Zhen Dong received the M.Eng. degree from Wuhan University, Wuhan, China, in 2013. He is currently working toward the Ph.D. degree in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University. His research interests include intelligent transportation system, computer vision, and LiDAR data processing.

Qingquan Li received the Ph.D. degree in geographic information system (GIS) and photogrammetry from Wuhan Technical University of Surveying and Mapping, Wuhan, China, in 1998. He is currently a Professor with Shenzhen University, Guangdong, China, and Wuhan University, Wuhan. His research areas include dynamic data modeling in GIS, surveying engineering, and intelligent transportation system.