Geoff Wright
In March 2004, DARPA sponsored the first Grand Challenge event. Fifteen autonomous
ground vehicles attempted to complete a 142-mile desert course, but the furthest distance
reached was 7 miles. The second event, in 2005, was more successful: four autonomous
vehicles completed a 132-mile desert route under the required 10-hour limit,
and the prize went to “Stanley” from Stanford University.
On November 3rd, 2007, MIT will compete in the new race: the DARPA Urban Challenge.
Autonomous ground vehicles will race a 60-mile course through a mock city
environment and will be required, in particular, to:
• Merge into moving traffic
• Navigate traffic circles
• Negotiate busy intersections
• Change lanes in moving traffic
• Avoid static and dynamic obstacles (the first challenge had no dynamic obstacles)
This project focuses on the detection of lane markers to help with the above
problems. Some of the causes of difficulty in lane marker detection are:
• Colour variations, dirty roads
• Lighting variations
• Temporary occlusion due to other vehicles
• Temporary glare from headlights
• Confusing items such as black-and-white striped clothing, flags, or patterns cast by
trees.
Problem Definition
The requirement was to develop a robust algorithm to take in video footage from a live
camera or a log file and, in real time, detect all lane markers visible in each frame. The
current algorithm is an HSV box filter, which is good enough for the splinter
robot in the lab to follow fluorescent green tape, but it is not robust to lighting changes,
occlusion, dirty lane markers, noisy images, or headlight glare, amongst other things.
The System Architecture diagrams in Appendix F show how lane marking detection fits
in with the rest of the work.
Related Work
The following papers contain ideas which contributed towards the success of this project.
An Integrated, Robust Approach to Lane Marking Detection and Tracking, 2004 IEEE
Intelligent Vehicles Symposium, Joel McCall
• Steerable Filters
TextonBoost: Joint Appearance, Shape and Context Modelling for Multi-Class
Object Recognition and Segmentation, Microsoft Research Ltd
• Learning model, using ground truth data pairs. Could be useful for calibration.
Planned Approach
Assumptions
To simplify the problem, the following assumptions were made:
• Flat world (though this could be relaxed using LIDAR or prior topographical data)
• All lane markers are yellow or white
• Lane markers have a fixed (small) range of widths
Sample Input
Idealised Output
The code modules for the rest of the system communicate over the LCM framework,
so a new LCM type was created called lane_marker_t. This type contains the following
information:
• A set of lane marker objects. Each object has:
o A probability that it is a white lane marker
o A probability that it is a yellow lane marker
o A list of control points, uniformly spaced, in the local coordinate frame
o An identifying number
• Timestamp
• Sequence Number
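Assuming the type is marshalled much like standard lcm-gen C output, the layout above might mirror the following C structs. The field names here are illustrative, not the team's actual definitions:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical C mirror of the lane_marker_t LCM type described
   above; field names are illustrative, not the team's definitions. */

typedef struct {
    double x, y;              /* local coordinate frame, metres */
} control_point_t;

typedef struct {
    double  p_white;          /* probability it is a white marker */
    double  p_yellow;         /* probability it is a yellow marker */
    int32_t num_points;
    control_point_t *points;  /* uniformly spaced control points */
    int32_t id;               /* identifying number */
} lane_marker_obj_t;

typedef struct {
    int64_t utime;            /* timestamp */
    int32_t sequence;         /* sequence number */
    int32_t num_markers;
    lane_marker_obj_t *markers;
} lane_marker_t;
```

In the real LCM workflow, a definition like this would live in a .lcm file and the C bindings would be generated by lcm-gen.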
Methods
Rectification onto the ground plane was a key image processing function. The
pin-hole camera model was used:
Figure 1.1: Pin-hole Camera Model
The camera parameters (focal length and CCD resolution) were obtained from the
MIT Team Wiki, along with the inclination and position of the camera relative to
the vehicle and ground plane. A combined rectification and interpolation map is
calculated during initialisation that maps each point on the rectified image back to
four pixels (which may share a location) on the raw image. A weightings
map is also calculated. This procedure allows rapid real-time calculation of
the plan-view image, on the order of 4*n*m operations, where n*m is the input
video resolution.
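As a sketch of how the precomputed maps might be applied per frame (the map-building projection through the pin-hole model is omitted, and the array layouts are assumptions):

```c
/* Applying a precomputed rectification map: for rectified pixel k,
   idx[4k..4k+3] give the four neighbouring raw-image offsets and
   wgt[4k..4k+3] the bilinear weights (summing to 1), so each frame
   costs roughly 4*n*m multiply-adds, matching the figure above. */

void apply_rectification(const unsigned char *raw,
                         const int *idx, const float *wgt,
                         unsigned char *out, int npix_out)
{
    for (int k = 0; k < npix_out; k++) {
        float acc = 0.0f;
        for (int j = 0; j < 4; j++)
            acc += wgt[4 * k + j] * raw[idx[4 * k + j]];
        out[k] = (unsigned char)(acc + 0.5f);  /* round to nearest */
    }
}
```

All per-pixel trigonometry is paid once at initialisation; the per-frame cost is just the four weighted reads per output pixel.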
Figure 1.2: Width Filter Kernel (positive centre of width w, negative flanks of width w/2; w = lane marker width)

The width filter implements a 2-dimensional convolution of each pixel in the image
with the kernel in Figure 1.2. This gives a positive response for anything with a local
intensity profile that goes dark-bright-dark at the scale of w; otherwise the response
is negative. It is thought that at large distances from the vehicle the width filter
starts to pick up interpolation effects.
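A minimal 1-D sketch of such a kernel, assuming the flank and centre weights are balanced so that a uniform region gives exactly zero response:

```c
/* Build a 1-D dark-bright-dark kernel for lane-marker width w
   (in pixels, assumed even here): +1/w over the central w samples,
   -1/(2*(w/2)) over w/2 samples on each flank, so the kernel sums
   to zero.  A uniform patch then scores 0 and a bright stripe of
   width w scores strongly positive, as described in the text. */

void build_width_kernel(float *k, int w)
{
    int half = w / 2;                 /* flank width, per Figure 1.2 */
    int len  = w + 2 * half;          /* total kernel length */
    for (int i = 0; i < len; i++) {
        if (i < half || i >= half + w)
            k[i] = -1.0f / (float)(2 * half);  /* dark flanks */
        else
            k[i] =  1.0f / (float)w;           /* bright centre */
    }
}

/* Response of the kernel centred on a patch of the same length. */
float kernel_response(const float *patch, const float *k, int len)
{
    float r = 0.0f;
    for (int j = 0; j < len; j++)
        r += k[j] * patch[j];
    return r;
}
```

The full filter applies this profile in 2-D; the 1-D version above is just the cross-section that produces the dark-bright-dark selectivity.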
Implementation Strategy
Please refer to Appendix E for the process flow that was planned at the start of the
project. The yellow box on the left of the diagram describes the image processing
algorithm, and the purple box on the right of the diagram describes the higher level
processing over a number of images.
Current Results
The system is tuned for low misses and high false positives, because there is much scope
in the future for higher-level processing to remove the false positives, as will be
described later in this section.
In uncluttered urban scenes, e.g. straight sections of road, empty intersections, or
dual carriageways, the results are excellent, with 90-100% of lane markers being detected
within 30m of the vehicle. “Busy” scenes, such as intersections with a number
of cars queuing directly in front of the vehicle or colourful advertisements lining the
sides of the road, tend to lead to false positives. A discussion of higher-level processing
to remove these false positives follows.
Success cases
Appendix D, Figures D.1–D.4, demonstrates typical performance in uncluttered scenes.
Note that the purple and dark-red speckled false positives in Figure D.1 could easily be
removed with a simple RANSAC implementation.
Figure D.4 demonstrates good performance in spite of the car in front. This is an
example where optical flow data, giving knowledge of where the car obstacle is, would
screen out the false positive on the top-left edge of the car in front.
Failure modes
The main failure modes currently are glare from headlights (see Appendix C) and
general spurious data points. It is thought that the vast majority of these false positives
can be characterised and eliminated at the curve-fitting stage. Currently, the spline
algorithm is very simplistic: sort the data points by distance from the centre of the object,
and sample the list to obtain the spline nodes. This gives a large reduction in the amount of
data, but loses certain crucial information such as the variance in orientation of the data
points. Responses from headlights, glare and spurious data points generally have a dataset
with high variance in orientation that is much further removed from a straight line
than the response due to a real lane marker. Hence, more work is required on RANSAC
analysis to fit curves and estimate the “line-lyness” of each potential lane marker object,
at a higher level than the per-pixel basis used for the analysis so far.
There is another failure mode, illustrated in Appendix A, Figures A.4 and
A.5. The right-hand lane marker has not been detected in this frame, and it is thought that
this traces back to the paint filter. The current paint filter does not implement the
normalisation by average image brightness outlined in the plan in Appendix E. Thus,
the parameters are tuned to a particular level of image brightness, and when an individual
frame is a little dark the paint filter does not do its job. Adding this normalisation is not
difficult, but some thought is needed as to the best method of calibration.
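One possible form of that normalisation, assuming a simple per-frame gain that maps each frame's mean intensity onto a reference level so the paint filter's fixed thresholds keep working on dark frames (ref_mean is a hypothetical tuning parameter):

```c
/* Sketch of average-brightness normalisation for the paint filter:
   scale the frame so its mean intensity equals ref_mean.  How
   ref_mean itself is calibrated is the open question noted above. */

void normalise_brightness(float *img, int npix, float ref_mean)
{
    float sum = 0.0f;
    for (int i = 0; i < npix; i++)
        sum += img[i];
    float mean = sum / (float)npix;
    if (mean < 1e-6f)
        return;                        /* avoid dividing by ~0 */
    float gain = ref_mean / mean;
    for (int i = 0; i < npix; i++)
        img[i] *= gain;
}
```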
Further Development
The future development needs discussed so far are:
• Change curve fitting to RANSAC to eliminate false positives due to car
headlights and any high variance responses.
• Add average brightness level normalisation to fix missed lane markers due to
cloud cover or streetlights.
There is a rich dataset from the LIDAR sensors which could be combined with the
rectification algorithm to give some level of 3-D structure. This road-location cue could
be used to assign stronger probability to lane markers within the likely road position.
The RSS II Optical Flow project produced excellent results that give cues for buildings,
trees, and cars (and the corners of lane markers). This data should definitely be incorporated
into lane marker perception because it gives a more defined model of the road location
than is currently available. The data could eliminate all false positives outside of the road
area.
One feature that has not been discussed so far is the utilisation of previous frames. The
original process flow had a decaying-memory framework whereby a map of lanes within,
say, 50m of the vehicle is populated by the image processing algorithm.
The probability of each lane marker on the map decays on each clock count, but can be
increased by superposition of new detections. A key advantage is that high-probability
lane markers occluded by, for example, a large vehicle will persist for a number of frames.
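The decay-and-superposition update might be sketched as follows, where lambda and the clamping model are assumptions rather than the planned parameters:

```c
/* Decaying lane-map memory: each cell holds a lane-marker
   probability.  Every clock count the whole map decays by a factor
   lambda in (0,1); fresh detections are superposed on top, clamped
   to 1, so markers briefly occluded by a large vehicle persist. */

void decay_map(float *p, int ncells, float lambda)
{
    for (int i = 0; i < ncells; i++)
        p[i] *= lambda;
}

void reinforce(float *p, int cell, float detection)
{
    p[cell] += detection;              /* superpose new evidence */
    if (p[cell] > 1.0f)
        p[cell] = 1.0f;                /* keep it a probability */
}
```

With this scheme, the number of frames an occluded marker survives is set directly by lambda and the detection threshold.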
Introspection
RSS II has taught me a great deal about image processing in a fun and instructive
environment, while providing some level of value to the MIT DARPA Urban Challenge
Team.
The classroom component comprised a number of lectures giving insight into other
teams’ experiences in previous years, appropriate testing procedures for large
multi-disciplinary projects, and general management techniques for software projects
involving a large code base and/or a large number of people.
The pre-project labs gave a gentle introduction to the LCM message infrastructure and
the splinter robots for small-scale tests, and gave the chance to get to grips with the C
language for those with limited experience. The learning curve accelerated: the second
lab was considerably more challenging than the first, but the workload was similar, which
is a testament to the learning rate of the course.
During the final project I enjoyed the freedom of choosing my own area of interest and
experimenting with new ideas. As well as learning a wide variety of image processing
techniques, I also felt that my project-management skills improved, particularly in
prioritising a large scope down to an achievable workload that delivered basic functionality.
Conclusion
Robust lane marker detection is a challenging problem best solved by utilising a wide
range of cues, as described in the preceding pages. The methods used for this
project have been largely successful, but there are still too many false positives. In the
next few months I aim to implement all of the further developments described in the
previous section, achieve 0% misses, and minimise false positives. In the medium term,
I aim for the code to be used in the MIT vehicle for the 2007 competition, and to find
inspiration for my Masters thesis next year.
Appendix A: Screenshots Set I, Illustrating Process Flow
Figure A.1: a typical uncluttered raw image from the logged video.
Figure A.2: the rectified version of the above, cropped at 20m distance.
Figure A.3: the response from the width filter
Figure A.4: strong responses from the width filter, superimposed on the rectified image
Figure A.5: segmentation of width filtered response based on edge direction.
Appendix B: Screenshots Set II, Illustrating Process Flow
Figure B.1: Raw image, more cluttered.