
Autonomous Exploration of Urban Environments using Unmanned Aerial Vehicles


Benjamin Adler
Institute of Technical Aspects of Multimodal Systems, University of Hamburg, 22527 Hamburg, Germany
e-mail: adler@informatik.uni-hamburg.de
Junhao Xiao
Department of Automatic Control, National University of Defense Technology, 410073 Changsha, China
e-mail: junhao.xiao@ieee.org
Jianwei Zhang
Institute of Technical Aspects of Multimodal Systems, University of Hamburg, 22527 Hamburg, Germany
e-mail: zhang@informatik.uni-hamburg.de
Received 6 July 2013; accepted 24 April 2014

This paper introduces a custom-built unmanned aerial vehicle that is capable of autonomous exploration in urban environments. It consists of a multicopter, an inertial navigation system, and two two-dimensional laser range finders. In addition to a description of the hardware architecture and the individual components being used, the authors also discuss challenges and problems that arose during its construction, as well as optimizations and workarounds applied in the course of its development. Also presented is the software architecture, with a focus on a novel algorithm capable of generating multiple next best views, sorted by achievable information gain. Although the approach was designed for application on airborne platforms in urban environments, it works directly on raw point clouds and thus can be used with any sensor generating spatial occupancy information (e.g., LIDAR, RGBD, or time-of-flight cameras). To satisfy constraints introduced by real-time operation on unmanned aerial vehicles, the algorithm is implemented on a highly parallel single-instruction multiple-data architecture and benchmarked using graphics processing units from multiple hardware generations, using data from real flights. It is also compared with the previous, CPU-based proof of concept. As the underlying hardware imposes limitations with regard to memory access and concurrency, the necessary data structures and further performance considerations are explained in detail. The open-source code for this paper is available at http://www.github.com/benadler/octocopter/. © 2014 Wiley Periodicals, Inc.

1. INTRODUCTION

While robots have become a very common sight in industrial environments ever since their first application for manufacturing tasks in the middle of the preceding century, their adoption in other fields of human labor has been excruciatingly slow. This is surprising at first, as machines have shown superior performance in comparison to humans in many disciplines: often, they are faster, more precise, more durable, and even cheaper in the long term. In stark contrast to that, most of today's robots are still incapable of performing even the simplest tasks in an acceptable time: how else can it be explained that there are no robots helping us in the supermarket or at the gas station? As Danish author Tor Nørretranders puts it,

It is not that difficult to build computers capable of playing chess or doing sums. Computers find it easy to do what we learned at school. But computers have a very hard time learning what children learn before they start school: [...] navigating a backyard, recognizing a face; seeing.

For the age of robots to truly begin, humans will have to find a way to teach them what children learn before they start school: sense, remember, and recognize their environment; manipulate objects; and communicate. The farther away from industrial manufacturing and other highly structured environments robots are to be of help, the more important these skills become.
For this reason, autonomous environment sensing and model learning is one of the most fundamental challenges in mobile robotics. Especially in field robotics, the availability of a map is essential for many tasks such as localization, path planning, navigation, and manipulation. Consequently, vast amounts of research have been directed at this field. A very good example for progress is the ability of simultaneous localization and mapping (SLAM): a little over a decade ago, first-generation SLAM approaches as presented in Thrun, Burgard, Fox, Hexmoor, and Mataric (1998) were limited to robots moving on planes and in highly structured indoor environments.

During the past decade, the algorithms advanced, keeping pace with the physical development of robots: better locomotion has allowed robots to become far more maneuverable and escape indoor environments, better energy sources have allowed for longer operation, and better computers allow for faster speed. Just two years ago, algorithms were presented that were capable of localizing robots with six degrees of freedom while mapping three- or even four-dimensional space in real-time (Newcombe et al., 2011).

Figure 1. Subtasks of automatic model learning, from Fairfield (2009).

Still, for many real-world applications in the field, mapping an outdoor environment by simply defining a region of interest in three-dimensional (3D) space and leaving the details of the procedure to an autonomous robot would constitute a considerable improvement.
After successfully implementing localization and mapping on an unmanned aerial vehicle to enable the generation of maps even in regions that are unnavigable for ground-based robots, our research focus has been on advancing our approach from mere mapping to exploration. A large part of the difference is the generation of goal configurations in order to maximize the system's information gain. Determining this sensor placement is a generalization of the nondeterministic polynomial-time (NP)-hard art gallery problem, and it was named the next-best-view (NBV) problem by Connolly (1985). While SLAM has been researched intensively in recent decades, next-best-view planning has not received nearly as much attention.
Because our experimental airborne platform features a flight-time of only 15 min, NBVs need to be determined quickly. In contrast to generating a single NBV for a given input, computation of multiple NBVs sorted by achievable information gain is preferred, as this enables the creation of trajectories that include all NBV-derived waypoints in an order optimized to allow catenation by the robot in minimal time.
The work described in this paper builds on our previously published result (Adler, Xiao, & Zhang, 2013). It extends the paper by presenting the hardware architecture in more detail and adding descriptions of the software modules. The approach for the generation of next best views is explained more thoroughly. Furthermore, new algorithms for creating collision-free flight paths between computed waypoints were added. Waypoints and flight paths of larger point clouds are presented in a more detailed results section.
This paper is organized as follows: The next section presents a short overview of work in this and related fields. In Section 3, the general architecture of the hardware and software setup is introduced. Section 4 starts with the idea behind the approach for generating waypoints providing high information gain, and it continues to detail the data structures and algorithms necessary to do so. Section 5 discusses how the generated waypoints are used to create collision-free paths through the environment, followed by a description in Section 6 of the motion control systems required to let the unmanned aerial vehicle (UAV) follow the aforementioned paths. In Section 7, we show waypoints and paths generated for point clouds captured during real flights, and we analyze the real-time applicability of the algorithms. Finally, the paper is concluded in Section 8 by summarizing the results and discussing future work.
Videos showing the closed-loop application of the presented approach for exploration in real-time are available at http://tams.informatik.uni-hamburg.de/videos/.

2. RELATED WORK

Fairfield (2009) groups the tasks of localization, mapping, and planning into their combinations SLAM, exploration, active localization, and, unifying all three problems, active SLAM (see Figure 1). On the one hand, this model does not allow unambiguous classification of all existing approaches due to the lack of clear definitions, and it leaves room for debate concerning, e.g., whether localization and planning (active localization) can exist without a map. Still, it serves as a helpful guide. In the next paragraphs, we will focus on the state of the art in motion planning and its application to autonomous model learning.
The art gallery problem was researched extensively by O'Rourke in 1987, and it still serves as a foundation for many related approaches. Understandably, the presented algorithms mostly operated on polygons in two-dimensional space. Because the problem statement presumes the map to be known a priori, these works do not present sufficient solutions for NBV planning in unknown environments.
There have been multiple publications surveying active perception planning for reconstruction and inspection, the most recent being Scott, Roth, and Rivest (2003), which classifies methods as either surface-based, volume-based, or global. We detect the absence of information (missing geometry) using a surface-based approach while rating the possible information gain of sensor poses using volumetric data structures.


Makarenko, Williams, Bourgault, and Durrant-Whyte (2002) include all subtasks of autonomous exploration in the problem domain and, while focusing on localization quality, attempt to find a balance with information gain and motion cost. However, the algorithm relies on assumptions about unknown parts of the map, and it was evaluated using only a simulation of a robot in a two-dimensional environment. The text further notes that while undoubtedly the optimal plan must take into account the expected integral payoffs (e.g., information gain along the path), the complexity of the problem effectively precludes this approach.
Reasoning over yet-unexplored spaces using probabilistic methods, such as that in Potthast and Sukhatme (2011), yields helpful output so long as the environment to be scanned follows the assumptions made beforehand, e.g., flat tabletops and nondegenerate shapes. Unfortunately, real-world outdoor scenarios are not necessarily flat and are often more complex than tabletops with cutlery.
Bryson and Sukkarieh (2006) define a mutual information gain resulting from traveling to a goal as the difference between the entropies of the distributions about the estimated states before and after making the observations. This definition requires knowledge about the probability distribution after making the observations at a given goal in an unmapped environment, which cannot be foreseen and thus must be simulated under the assumption that there exists a certain available feature density [...] in terms of average number of features per map grid area.
Null and Sinzinger (2006) introduce a dichotomy: the interior NBV problem places the sensor within the geometry that is to be mapped; indoor scenarios are an obvious instance of this class. The exterior NBV problem is defined by the sensor being placed outside of the object to be reconstructed, like a statue or a typical household object. Here, problems arise mainly from self-occlusions of nonconvex objects. Surface normals are computed either from voxel structures or reconstructed mesh surfaces (an expensive operation in itself), letting the sensor align itself with normals at the border between scanned and unscanned regions. For both problems, the paper presents algorithms based on voxel data structures and ray casting that limit the range sensor to a constant height. Because their solution to the exterior NBV problem limits the possible angles of the sensor-pose to point toward the object, it is orders of magnitude faster than the algorithm for interior NBV planning, which cannot limit the search space in such comfortable ways. Still, even the faster algorithm requires almost 12 s of processing time for a grid of 50³ voxels, so increasing the grid's resolution quickly proves problematic. Furthermore, the strict separation of interior and exterior planning does not fully accommodate the case of a robot mapping an outdoor environment, which can contain geometry from both classes.
Because the amount of computational resources has not increased as fast as required by the integration of the third dimension into planning algorithms, a common strategy is to reduce the dimensionality of the learned map. In simulation and real-world experiments, Strand and Dillmann (2010) condense a scanned point cloud to a two-dimensional grid for navigation and planning. At the same time, the ability to explore unstructured environments with slopes and overhangs is lost.
Blaer and Allen (2007) research ground-based outdoor reconstruction based on building voxel-grids from acquired data and ray traversal for NBV computation, requiring an a priori 2D map with which they plan a minimal set of sufficient covering views. To manage large-scale outdoor scenes, the voxel sizes are increased to 1 m³ and they are marked as seen if they contain at least one point from the iteratively acquired point cloud. Candidate NBV locations are computed by finding occupied voxels that intersect the ground-plane but are marked as free on the a priori 2D map. New sensor poses are then generated by using ray-tracing to count the number of boundary unseen voxels (unseen voxels adjacent to at least one empty voxel) visible from all candidate locations.
A frontier-based approach is commonly used in many NBV problems (Shade & Newman, 2011; Yamauchi, 1998), because it yields sensor-poses located between known and unknown regions. On the one hand, these poses offer safe reachability, because the path planner can compute a trajectory through known parts of the environment. On the other hand, given that the pose is oriented toward the unmapped environment, it will allow the sensor to deliver valuable information, advancing the mapping process. To compute such a pose, knowledge about frontiers has to be derived from the underlying data structure. When using two-dimensional grid maps, frontier cells are defined as unknown cells adjacent to free cells, and in this way a global frontier map can be produced (Mobarhani, Nazari, Tamjidi, & Taghirad, 2011).
For our platform (see Figures 2 and 3), localization quality does not depend on classical map features in a way common to most SLAM algorithms, as we rely on global navigation satellite system (GNSS)-based localization. However, this does not mean that localization quality is constant throughout the environment. Indeed, the number of satellites in view as well as their geometric constellation are an important factor for the precision of the computed position. When mapping very close to buildings or in between high-rise architecture, obstruction of satellites becomes so important that motion planning must consider both satellite orbits and local geometry.
Following the realization that the localization used on our vehicle is sufficiently precise in typical urban environments, we optimize the exploration strategy toward the more application-oriented metric of map completeness (not just coverage) as well as mapping duration instead of localizability.
Three-dimensional environment mapping is often implemented using laser scanners and time-of-flight cameras, so point clouds are a very common type of sensor data.


(a) First configuration, in which the laser scanner's FOV is oriented vertically and the inertial measurement unit is fixed to the scanner for precise alignment. (b) Second configuration with the scanner's FOV oriented downwards and perpendicular to the principal direction of travel. The inertial measurement unit is mounted using vibration dampeners for improved INS performance.
Figure 2. Previous hardware configurations of the experimental flying platform with mounted GNSS-antenna and -receiver, laser scanners, inertial measurement unit (IMU), and processor-board. The red boom points forward.

Figure 3. The current setup includes a second laser scanner for obstacle avoidance and mapping. Both the laser scanners and the IMU are mounted together with the battery on a stiff and light-weight carbon-fiber sandwich-board. The resulting module makes up 40% of the UAV's weight and is fixed to the vehicle's frame using vibration dampeners. This yields less drift in the INS solution, more precise sensor alignment, and better crash survivability.

Unfortunately, information about exploration boundaries is hard to generate from point clouds. Applying a plain spatial subdivision to point clouds to form a 3D occupancy grid map might appear to be a logical next step, extending Mobarhani's definition into the third dimension. Constantly updating such a grid quickly becomes a burden on the processing pipeline, as all rays scanned by the laser scanner have to update all the cells through which they travel. This makes resolutions in the centimeter range quickly become unfeasible. Furthermore, a height limit has to be imposed manually to keep the robot from mapping unknown (and empty) regions in the sky.

3. EXPERIMENTAL PLATFORM

3.1. Hardware

3.1.1. Unmanned Aerial Vehicle

The UAV is built upon an Okto2 multirotor helicopter from the mikrokopter project (see Figure 3). Using a four-cell 5,000 mAh LiPo battery, its eight motors enable the platform to lift a payload of about 600 grams for up to 16 min of normal flight. See Table I for a list of components and their weight, as well as Figure 4(a) for a graph describing the relation between payload and flight time.


Table I. Platform components and weight.

Component                                   Weight (g)
2x Hokuyo UTM-30LX                          426
IMU XSens MTi                               54
SensorBoard                                 41
AtomBoard incl. case and WiFi antennas      126
GNSS receiver incl. case                    97
GNSS antenna                                17
GNSS antenna cable & mount                  16
Shielding (aluminum foil)                   15
Sum payload                                 792

Payload                                     792
LiPo 4S 16.8 V 5 Ah                         511
Mikrokopter Okto2                           1,180
Sum platform                                2,483

Judging by the weight of the payload alone, using the QuadroXL four-motor version does seem to be the obvious choice, as cost and expected flight times compare favorably to the Okto2 model. On the other hand, using a larger vehicle with more motors than strictly necessary has proven to be advantageous in many regards: in terms of flight dynamics, the mounted sensors experience a more stable, smoother flight that is less prone to wind and small turbulence. Furthermore, eight slow-running motors have been shown to induce fewer vibrations into the IMU than four motors of a smaller version that lift almost the same weight. Although the vehicle has never had a motor or controller fail during flight, it has lost single propellers due to contact with trees, walls, and the ground. As the flight behavior was not visibly affected by this loss, a missing propeller was often detected only after landing. Especially during development of the high-level flight-controller, the added redundancy in the octocopter's propulsion has saved the vehicle from countless crashes, quickly paying off the higher initial investment.

3.1.2. Onboard Computing

For low-level flight-control, we use the FlightCtrl ME 2.1 flight controller, designed by the mikrokopter project. It is based on an ATMEGA 1284P microcontroller, it features MEMS gyroscopes and accelerometers for stabilization, and it communicates with all eight brushless motor controllers using an I2C bus. Its source code is published and free to use for noncommercial purposes.
The platform also carries an onboard computer, weighing just 110 g including its case. It contains a 32-bit Intel Atom Z530 single-core processor with two logical cores (Hyper-Threading) running at a clock of 1.6 GHz, with 1 GB of RAM and 8 GB of nonvolatile flash memory on a micro-SD card. Equipped with seven USB ports, two serial ports, and IEEE 802.11n WiFi, it connects all devices onboard the vehicle and allows for fast communication with the base station, with a reduced data rate even when the line-of-sight between both is obstructed.

3.1.3. Inertial Navigation System

For self-localization, the UAV is equipped with a Septentrio AsterX2i commercial INS system, featuring an XSens MTi MEMS IMU and a dual-frequency GNSS(1) RTK(2) receiver supporting GPS and GLONASS constellations. Altogether, the IMU, GNSS-receiver, -antenna, breakout board, cables, and case weigh just 184 g. As RTK-GNSS uses constant updates of satellite observations from a GNSS base to correct for errors induced in the transmission of the satellite vehicles' SiS,(3) additional infrastructure is required to feed these differential corrections to the rover in-flight. The INS also retrieves the IMU's values at 50 Hz and fuses them with 10 Hz GNSS PVT(4) solutions. Because the MEMS-grade IMU exhibits a drift of more than 15 degrees per hour (earth's rotational velocity), it is unable to reliably measure earth's rotation and thus it is unable to perform static alignment. To solve the system's heading after startup, the trajectories of IMU and GNSS have to be aligned while the vehicle moves forward (kinematic alignment). This happens during the first seconds of flight. In the next phase, covariances of the fused solution are minimized by flying curves, accelerating, and stopping. The duration of this phase depends on the desired covariance thresholds and the motions experienced by the INS, but it usually takes less than 30 s.

(1) Global Navigation Satellite System
(2) Real Time Kinematic
(3) signal-in-space
(4) position, velocity, time

Because the IMU is of MEMS-grade precision, is very lightweight, and is mounted on a vibrating platform, the values read by the INS are comparably noisy. Vibration damping, an eternal challenge for UAV-mounted sensors, becomes even more difficult because the frequency of vibration depends highly on the motor speeds. Figure 5 shows the raw values received by the IMU at the beginning of a flight. Clearly visible are the noisy readings while parked (green section), vibrations experienced from the motors spinning up (yellow) and during flight (red).
When requesting position and attitude from the INS at intervals shorter than the minimum GNSS measurement interval of 100 ms, fused measurements (GNSS and IMU combined) will be interleaved with data that are produced by double-integrating IMU readings onto them. In the case of strong vibrations, this leads to unusable integrated poses, as depicted in Figure 6. While using these data for sensor fusion with the scanned range data is somewhat detrimental to point cloud accuracy, it can have disastrous effects when used in a flight controller that contains a derivative component.
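To illustrate why vibration noise is so damaging here, the following minimal sketch (our illustration, not the vendor's proprietary filter) shows dead-reckoning between two GNSS fixes by double integration; any bias b in the measured acceleration grows into a position error of 0.5·b·t²:

    // Minimal sketch: dead-reckoning between GNSS fixes by double integration.
    // Structure and names are illustrative assumptions.
    struct State { double px, py, pz, vx, vy, vz; };

    // Propagate the state with one accelerometer sample (assumed to be rotated
    // into the world frame and gravity-compensated), time step dt in seconds.
    void integrateImuSample(State& s, double ax, double ay, double az, double dt)
    {
        s.px += s.vx * dt + 0.5 * ax * dt * dt; // position error grows with t^2
        s.py += s.vy * dt + 0.5 * ay * dt * dt;
        s.pz += s.vz * dt + 0.5 * az * dt * dt;
        s.vx += ax * dt;                        // velocity error grows with t
        s.vy += ay * dt;
        s.vz += az * dt;
    }

At 50 Hz output with 10 Hz GNSS PVT solutions, every fused pose is followed by four such integrated poses, which is exactly the pattern visible in Figure 6.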


(a) Flight time of an Okto2 model (8 motors, total weight 1,770 g including a 4-cell 5 Ah LiPo battery) versus payload. (b) Flight time of a QuadroXL model (4 motors, total weight 1,480 g including a 4-cell 5 Ah LiPo battery) versus payload.
Figure 4. Graph depicting the achievable flight times versus the payload carried by differently sized multirotors, courtesy of the mikrokopter project. While these numbers are theoretical, they match the experienced real-world flight times very closely.

Figure 5. Raw accelerations experienced without motion (green), during motor-spinup (yellow), idling (white), and takeoff (red).

Given sufficient satellite reception, the system's position is solved to a precision of 5 cm in RTK fixed mode, while roll and pitch angles exhibit maximum errors of about 0.5°. The precision of the heading angle depends on the amount of motion the vehicle experiences and usually converges to a maximum error of less than 1.0°.

3.1.4. LIDAR Sensors

Initially, a single Hokuyo UTM-30LX laser range finder was mounted to the front arm with its field of view aligned vertically, as depicted in Figure 2(a), and connected to the aforementioned onboard computer. Its laser scanned a front-facing planar field of view of 270°, resulting in a 90° blind spot in its back, which aligned well with the available field of view at that mounting position. To fully take advantage of this setup, a flight controller was created to reach waypoints by pitching and rolling toward them, while constantly yawing at the same time. This way, the scanner would have been capable of scanning the ground below as well as obstacles ahead of the UAV. While this idea worked satisfactorily in simulation, the actual INS was not capable of delivering precise heading information by fusing the sensor trajectories of the GNSS receiver and the IMU, because its proprietary filters were created under the assumption that the vehicle would move mostly forward. Also, with the LIDAR scanner and IMU mounted directly to the boom, both were subjected to strong vibrations originating in the UAV's motors, possibly reducing the scanner's life expectancy and adding noise to the IMU's values.
In a second version, the laser scanner remained mounted to the vehicle's forward arm, but with its front side facing downward and the blind spot directed upward [as shown in Figure 2(b)]. The flight controller was adapted to fly forward while yawing and rolling toward given waypoints.


Figure 6. Visualization of the vehicle trajectory during liftoff, exhibiting considerable IMU drift and noise. Poses are sampled at
50 Hz; each fused pose (red) is followed by 4 IMU-integrated poses (blue).

This resulted in precise attitude solutions from the INS, but it made detection of obstacles ahead of the vehicle difficult. The IMU was moved to the vehicle's center and installed using vibration dampeners. While this reduced the noise in its readings considerably, it decoupled the laser scanner from the IMU, resulting in a detrimental effect of unknown magnitude on the registration accuracy. Like the first revision, the scanner moved the system's center of gravity away from its geometrical center, which was partly compensated for by moving the battery.
The final setup (Figure 3) was created in order to accommodate a second, forward-looking laser scanner for better obstacle detection. Both scanners' points are fed into the same point cloud, also accelerating the scanning procedure. The comparatively lightweight onboard computer and INS were now installed above, while the heavy combination of two laser scanners, the IMU, and the battery was placed below the center of gravity. By using carbon-fiber sandwich boards, the necessary board is extremely warp-resistant and weighs just 41 g. It directly couples the IMU with both LIDAR sensors and is less prone to vibrations as it also holds the battery. Still, it is installed using four vibration dampeners.
Each scanner delivers distance values of 1,080 rays with 0.25° angular resolution at a rate of 40 Hz and a range of up to 30 m. According to Demski, Mikulski, and Koteras (2013), the precision of measured distances depends on the reflectivity and color of the sampled material, but it generally shows relative errors of less than 1%. Interestingly, the scanner additionally suffers from drift within the first 50 min of operation: after startup, returned distances are up to 10 mm shorter than ground truth.
Although every scan of 1,080 rays contains a time stamp, its value is generated using a local clock that starts at zero when the laser scanner is powered up. The manufacturer describes methods of synchronizing clocks, relying on the assumption that data are transferred on the USB with relatively low jitter. But because communication is implemented using USB bulk transfer modes, no timing guarantees are given by USB specifications. In effect, synchronization is precise only to a few milliseconds and depends on bus utilization. A synchronization error of, e.g., 5 ms, a scanned distance of 25 m, and an angular vehicle velocity of 100°/s leads to an angular error of 0.5° and, by solving for the length of the third leg of the resulting isosceles triangle, a positional error of about 0.22 m for that point. These numbers also serve as both an example of the imprecision induced by the INS error specifications of 1.0 and 0.5 degrees for yaw, pitch, and roll, as well as the reason why using LIDAR sensors with an extended range is not always helpful.
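Written out, the angular error is the product of angular velocity and synchronization error, and the positional error is the base of the isosceles triangle spanned by the two ray directions at the scanned distance:

    \theta = \omega \, \Delta t_{sync} = 100^{\circ}/\mathrm{s} \times 0.005\,\mathrm{s} = 0.5^{\circ}

    e = 2\,d\,\sin(\theta/2) = 2 \times 25\,\mathrm{m} \times \sin(0.25^{\circ}) \approx 0.22\,\mathrm{m}.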


Figure 7. Diagram of hardware and interconnections used for both base and rover.

To enable better time synchronization, the scanner also features a SYNC signal on a dedicated pin. It is pulled low for 1 ms whenever the laser traverses the scanner's rear position. By connecting this signal to one of the event-in pins of the GNSS receiver and configuring the latter to emit a time stamp on falling edges, each scan is time-stamped with a precision of better than 10 ns, using the same GPS-derived clock as for the INS measurements' time stamps. Assuming a constant rotational velocity of the LIDAR's mirror, each single ray can be temporally associated with the received poses with microsecond precision. In our experience, the advantage of extremely precise timing measurements for highly dynamic mobile platforms, even in the temporary absence of satellite coverage, is often overlooked.
The disadvantage of this mechanism is that the generation of 40 time stamp packets per second does affect the CPU load of the INS, which is a critical parameter and must be kept below a vendor-specified threshold. When further INS output is enabled (e.g., for diagnostic purposes), higher CPU load can be a reason for the INS being unable to reach precise positioning modes or align successfully. To alleviate this problem, a 74HC4040 binary counter was installed between the laser scanner and the event-in pin. By attaching the SYNC signal to CLOCK and the event-in pin to one of the counter's parallel output pins, the SYNC signal's frequency can be divided by 2 (pin Q1) to 4,096 (pin Q12). This setup enables registration of the scanner's rotational phase while saving the limited computational resources of the embedded system of the INS.
As soon as the INS solves time, position, and attitude precisely, the onboard computer fuses data from all sensors in real-time, creating a point cloud that is streamed to the ground station during flight.

3.1.5. Electromagnetic Interference

The flight controller, the laser scanner, and especially the onboard computer emit electromagnetic interference on the L1 (1,575.42 MHz) and especially L2 (1,227.6 MHz) bands, which is received by the GNSS antenna, often causing the GNSS signal-to-noise ratios to drop below acceptable thresholds. This forces the INS to rely on double-integrated IMU sensor readings for long periods of time, introducing considerable errors into the solutions of position and attitude. To rectify this problem without access to expensive equipment such as spectrum analyzers, all sources of radiation were wrapped in aluminum foil. Wires to and from the onboard computer were found to act as antennas, emitting further interference, which was fixed by running the affected cables through ferrite rings. Small parts of the interference remain, as large parts of the laser scanners' surface must remain unshielded for obvious reasons. Moving the GNSS antenna upwards and away from the scanner has reduced the remaining effect of this problem so much that GNSS reception has become reliable. On the downside, the added shielding has imposed the need to monitor the temperatures of the onboard computer and motor controllers during flight.
For a complete diagram of the hardware used in this paper, please see Figure 7.

3.2. Software

During development, a custom software stack was created, consisting of three main modules: a simulator, a base station, and a rover program to run on the UAV. Because all algorithms must run in real-time and some parts (like the high-level flight controller) run on very constrained hardware, all software is implemented using C++. A simple TCP/IP-based protocol was designed, with the rover program and simulator acting as a server and accepting connections from the base station. The protocol implements streaming of scanned points to the base station and allows exchange of other information; see Table II for a complete listing.
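For illustration, a minimal sketch of how a single point of the lidar points message can be represented; only the float4 x/y/z/w semantics come from Table II, while the type and field names are our assumptions:

    // One point of the "lidar points" message (assumed names). Matches the
    // float4 layout of Table II: x/y/z hold the scanned position, w holds
    // the squared distance between scanner and point.
    struct LidarPoint
    {
        float x, y, z; // position in the global coordinate system
        float w;       // squared scanner-to-point distance
    };
    static_assert(sizeof(LidarPoint) == 16, "packed as four floats");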


Table II. The TCP/IP-based protocol for communication between rover and base (directions inferred from the message descriptions).

Message | Direction | Description
UAV status | Rover to Base | battery voltage, motion-controller state, air pressure, etc.
INS status | Rover to Base | number of visible/usable satellites, positioning mode, integration mode, CPU load, age of differential corrections, solution covariances, etc.
differential corrections | Base to Rover | differential correction data for RTK-GNSS in RTCMv3 format
lidar points | Rover to Base | points from one scanner's sweep in float4 format (x/y/z/w), where the w-component indicates the squared distance between scanner and point
pose | Rover to Base | the UAV's current position and attitude
controller gains | Base to Rover | sets PID gains of the high-level motion controller
controller gains | Rover to Base | notification that the high-level motion controller has changed the gains (due to a request from the base station)
waypoint reached | Rover to Base | notification that the UAV has reached the next waypoint
waypoint list | Base to Rover | a list of generated waypoints, to be navigated by the UAV
flightstate | Rover to Base | a new flight state from the motion controller due to changed waypoint availability or flight-state restriction
flightstate restriction | Rover to Base | flight-state restriction has changed (because the pilot actuated the switch on the remote control)
flightcontroller values | Rover to Base | new debug values from the high-level motion controller, visualized in the base station
scanner state | Base to Rover | enables/disables the onboard laser scanners
logmessage | Rover to Base | a message from the UAV, to be displayed in the base station user interface
ping | both directions | used to test connectivity

The simulator (see Figure 8) was created first in order to allow tests of how different scanner setups and flight controller implementations would affect the speed and quality of point cloud generation. Much attention was paid to achieving a realistic simulation: the physics engine (Bullet) computes forces based on precise weights of the components, while the lift-forces are determined based on experimental data relating the propellers' rotational velocity and thrust. To simulate motion induced by wind and gusts, data captured by an ultrasonic anemometer from a local weather station were also incorporated into the simulation.
The base station program (depicted in Figure 9) hosts almost all of the algorithms presented in this paper, as it runs on hardware offering far more computational resources than what is available onboard the UAV. After generating flight paths based on the received point cloud, the corresponding waypoints are sent back to the rover.
The rover program was created last: it runs on the UAV and interfaces with its FlightControl-board, the INS, and the laser scanners. It also implements the same communication protocol as the simulator, so the base station can connect to either the simulator or the rover and operate on simulated or real data (see Figure 10). In addition to the communication performed between base station and simulator, the rover also sends vehicle- and INS-status to the base. The base station retrieves low-datarate RTK differential corrections (from about 0.5 to 3 kb/s, depending on the number of GNSS constellations supported and thus satellites observed) from the GNSS reference station using a separate 2G, 3G, or serial connection and forwards them to the rover. Also, a high-level motion-controller is implemented, processing incoming waypoints into control output for the UAV's low-level controller running on the FlightControl-board (see Section 6 for details). During flight, all sensor data and generated outputs from the high-level motion-controller are logged to disk. The base station program can open these files for later replay, greatly improving post-flight visualization, validation, and debugging.

4. COMPUTING NEXT BEST VIEWS

The algorithm for the generation of next best views is inspired by other researchers' contributions concerned with creating watertight 3D models of real-world environments (such as, e.g., Holenstein, Zlot, & Bosse, 2011): watertightness is not only a desirable property for completely reconstructed models, but it is also a helpful test for finding gaps that have remained throughout the mapping process. The following sections focus on executing this test as quickly as possible in order to find gaps of a desired minimum size and then derive useful waypoints from the results in the following stages.


Figure 8. Screen shot of the simulator. The UAV can be controlled manually via joystick as well as autonomously using a custom
high-level flight-controller that approaches generated waypoints.

As an initial setup, the algorithm requires a predefined bounding box b that contains both the UAV and the environment to be mapped. A 3D uniform grid G_IG of information gain subdivides b, with each cell carrying a scalar value indicating the information gain achievable by scanning it. A few seconds after takeoff, the point cloud that has been streamed to the base station is downsampled (see Section 4.2) into a sparse version, called the collider cloud C. Gaps in this cloud are then detected by using a particle system, which simulates pouring water in the form of N_P particles (p ∈ P) over C (see Figure 11). We can postulate that whenever a p ∈ P first collides with a c ∈ C and later arrives at b's bottom plane, it has successfully passed through a gap in C. The algorithm stores the position P_col of every particle p's last collision with C. Whenever a particle p reaches the bounding box's bottom plane, P_col is looked up and, if present, the information gain value of G_IG's cell containing p_col is increased. Leaving reachability concerns aside, cells of G_IG in which many particles slide through gaps in the point cloud intuitively represent possible waypoints.
This algorithm has previously been implemented in a similar form on the CPU using the Bullet physics library (Coumans, 2013): the point cloud was managed using a fast dynamic bounding volume tree based on axis-aligned bounding boxes for the broad-phase collision detection. While the implementation proved to be a working concept, it turned out to be far too slow to handle the enormous number of collision tests necessary for rapid detection of small gaps. Thus, the algorithm was ported to CUDA running on a graphics card, and optimized for real-time applicability.

4.1. Handling Point Clouds

When the INS is delivering precise poses, all points from the laser scanner are fused into a global coordinate system on the rover and streamed to the base station in batches. Given a maximum of 1,080 points per scan at a scan rate of 40 Hz, the theoretical point rate is 43,200 points per second per scanner. Some of the rays are reflected by the vehicle's own booms, propellers, and landing gear, returning distances below 0.5 m. Other rays directed upward rarely receive reflections, as they often point into the sky. After filtering values that have very close or no reflections, an average of 600 points per scan remain to be fused, yielding a point rate of about 48,000 points per second from both scanners.


Figure 9. Screen shot of the base-station program, running on a notebook computer in the field. Depicted are the point cloud,
streamed in real-time from the vehicle, as well as GNSS status indicators and flight planning controls.

Figure 10. Communication between the main software modules. The base station connects to either the rover (UAV) or a simulator,
permitting tests of the employed algorithms in simulation before they are tested on the UAV.

As a single point is currently represented using four IEEE 754 single-precision floats, this translates to a memory requirement of 0.77 MB per second. Given the maximum flight time of 15 min, allocating 700 megabytes of memory on the graphics card for the dense point cloud is sufficient, even when downsampling is disabled. Thus, on the base station, points are saved to disk and uploaded into the GPU's memory space by appending them to a vertex buffer object VBO_dense. OpenGL's VBOs can be mapped into CUDA address space, avoiding copies between interleaving visualization and processing stages.

Figure 11. Overview of the process: (a) predefined bounding box b and the initial point cloud from the onboard laser scanner. (b) 16 k particles p ∈ P in gray being poured over the downsampled cloud of colliders c ∈ C in blue. (c) One time step of the simulation, with falling particles p ∈ P that have collided with at least one c ∈ C in red, and others remaining gray. (d)-(f) Overlay of a visualization of G_IG over sparse and dense point clouds, showing cells promising higher information gain in more saturated red.


(a) A particle being collided against an unprocessed, noisy, and dense point cloud for tests of permeability. Collision tests will be performed between the particle and every collider in its cell, resulting in many unnecessary memory transactions, calculations, and comparisons. (b) The same point cloud, sparsed to reduce the required number of collision tests. Because the sparsing algorithm accounts for the particle's size, permeability remains unchanged. (c) The same point cloud, sparsed too much. Even though the cloud remains mathematically impermeable to the particle, the discrete nature of the simulation can allow the particle to pass through the colliders.
Figure 12. Different levels of downsampling applied to the collider cloud C (stored in VBO_C) and its effects on performance and correctness.

4.2. Data Reduction on the GPU

As shown in Section 7, the particle simulation's bottlenecks are collision detection and processing. In every iteration of the simulation, this process is first executed between the simulated particles themselves, then executed between the particles and the collider cloud. Thus, optimizing it becomes key to detecting gaps and generating waypoints as quickly as possible.
In the first stage, particles are collided against each other in order to simulate a fluid. While optimizing for speed by reducing the number of particle/particle collisions is possible, it would mean changing the particle system's behavior: if collision checks were skipped between some neighboring particles, they would be allowed to penetrate each other, making the simulated fluid lose the desired property of being noncompressible. Trying to save collision tests by executing them in comparably coarse time steps would first allow the particles to interpenetrate more deeply, which would then cause large repelling forces to throw them apart in the next collision detection phase. For this reason, all neighboring particles must have pairwise collision checks applied to them, and decreasing the total number of particles remains the only way to reduce the computational effort of particle/particle collision checking.
In the second stage, particles are collided against points of the collider cloud, testing its permeability. Depending on the distance between the UAV's range scanner and an object being scanned, as well as the number of scan passes, that object's surface can be sampled using very high point densities. Colliding particles against all of those points [as depicted in Figure 12(a)] causes many pairwise collision checks to be executed. Because only the permeability of the collider cloud is tested, it is possible to increase the distance d_max between neighboring colliders (reducing the cloud's density) as long as particles of radius r are kept from passing through the cloud [see Figure 12(b)]. Mathematically, this is guaranteed as long as the margin m between the particle center and the baseline (2D) or plane (3D) between colliders remains larger than zero. This geometric constraint for nonpermeability can be trivially formulated using the Pythagorean theorem:

    m = \sqrt{r^2 - \left(\frac{d_{max}}{2}\right)^2} > 0.

Theoretically, the constraint m ≥ 0 could be chosen, effectively setting d_max = 2r. In practice, a particle of radius r requires the distance d_max between two colliders to remain less than 2r for two reasons: first, particles tend to fall between very sparse colliders and get stuck, meaning that they no longer move and fulfill their purpose of sampling the represented surface. Secondly, the simulation is performed in discrete time steps, meaning that the distance traveled by particle p in between collision checks is a function of its velocity p_v and the simulation's time step Δt. As soon as their product exceeds m, a particle can pass through the collider cloud by the offset applied due to its velocity in the integration phase [see Figure 12(c)]. Thus,

    p_v \cdot \Delta t < \sqrt{r^2 - \left(\frac{d}{2}\right)^2}

must hold to prevent false waypoints from being generated.
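Both requirements can be combined into a single feasibility check when choosing the collider spacing; the sketch below is ours, with illustrative names:

    #include <cmath>

    // Check the nonpermeability constraints for collider spacing d, particle
    // radius r, maximum particle speed v and simulation time step dt.
    bool cloudRemainsImpermeable(float d, float r, float v, float dt)
    {
        if (d >= 2.0f * r)
            return false;                  // particle fits between colliders
        const float m = std::sqrt(r * r - (d * 0.5f) * (d * 0.5f));
        return v * dt < m;                 // discrete step must not tunnel through
    }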


Figure 13. (a) A dense point cloud, as streamed from the rover during flight. (b) The same point cloud, cropped to the region to
be mapped (predefined bounding box b) and sparsed to reduce the required number of collision tests.

The sparse collider cloud is stored in another, smaller vertex buffer object VBO_C on the graphics card. As previously mentioned, points are stored using four floats instead of three for reasons of better alignment in GPU memory. The fourth component, w, stores the distance between the scanned point and the range scanner. Because the positional accuracy of the platform's localization is far superior to its orientational accuracy, the distance offers a good metric for the point's precision, which will be used in the following processing stages.
When new waypoints have to be generated, the point cloud in VBO_dense is downsampled into a sparse version C located in VBO_C, as shown in Figure 13. An overview of this process is given in Figure 14: after being cleared, VBO_C is filled with all points in VBO_dense that are located within b, until either all points in VBO_dense are copied or VBO_C is full.
Next, the colliders in the target buffer are reduced. The classic approach for implementations on the CPU is to sequentially iterate through all the points, deleting close neighbors using algorithms and data structures optimized for radius-based neighbor queries. Our implementation is similar in that it uses a uniform grid as a spatial decomposition technique [see Figure 15(a)], but different in that it is done in parallel on all available streaming multiprocessors. A spatial hash table based on the uniform grid G_SHT (the subscript SHT denotes spatial hash table) with N_GSHT = G_SHT,x × G_SHT,y × G_SHT,z cells is created to enable efficient access to neighboring colliders.
The SHT requires two vectors, C_index and C_cellhash, associating each collider's index in C_pos with the hash of G_SHT's cell containing it. This is shown in Figure 15(b), and it enables us to find all particles within a given grid cell during the following downsampling stage. The hash table for C_pos is constructed in step 3 by launching one thread for every c ∈ C. Afterward, it is sorted according to the CellHash column, using a fast radix sort on the GPU (similar to Satish, Harris, & Garland, 2009).
Step 4 sorts colliders according to their grid cell's hash value, thereby moving geometric neighbors into adjacent memory locations.
Afterward, step 5 uses one thread per collider to populate a preallocated vector CellFirst of N_GSHT integers, where CellFirst[i] stores the first row of the sorted spatial hash table referencing a collider in the grid cell with hash i. If the cell does not contain any colliders, its value is set to UINT_max. In analogous fashion, CellLast is populated so that CellLast[i] stores the last row of the sorted spatial hash table referencing a collider that is contained in the grid cell with hash i. An example is shown in Figure 15(c). On completion, finding all colliders in grid cell i boils down to accessing C_pos using all indices found in C_index[CellFirst[i]] to C_index[CellLast[i]].
For optimization, a locality-preserving hashing function is used to assign hash values to G_SHT's cells. Sorting positions according to the hash values of the cells containing them increases the probability of fetching positions of other particles in the same and neighboring grid cells from neighboring memory locations, maximizing memory bandwidth utilization by using coalesced access patterns. Presorting colliders also best leverages the L2 caches for global memory access that emerged with CUDA compute capability 2.0, as collider positions will already be cached when neighboring threads need to fetch their positions in order to execute collision tests against them.
Now, one thread is launched for every collider: it iterates over all other colliders in the same and neighboring cells (operating in adjacent memory locations to exploit the GPU's cache) and checks whether they are located closer than the threshold distance (usually defined to be slightly less than the particle's diameter, as explained above). If so, the collider with the larger w-component (which is likely to be less precise) gets overwritten with values of (0/0/0/0).
In step 7, the values in VBO_C can finally be compacted, in effect removing all zero-points. This free space allows more points from the dense point cloud to be appended and reduced again, as decided in step 8.
In contrast to the following particle simulation, which needs to be executed many hundreds of times, this code is executed only once in preparation for waypoint generation, totaling fewer than 20 calls during typical flights. Although the procedure contains many steps that process large amounts of data, the massively parallel execution allows downsampling of the point cloud typically within less than 100 ms. For benchmarks, please see Section 7.
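Condensed into CUDA, the reduction kernel could look like the sketch below; getNeighborCellHashes() is an assumed helper returning the hashes of the own and adjacent cells, and tie-breaking for colliders with equal w is omitted:

    // One thread per collider: scan own and neighboring cells and zero out
    // the less precise one of two close colliders (sketch under the
    // structures described above).
    __global__ void reduceColliders(float4* colliderPos,
                                    const unsigned int* cellFirst,
                                    const unsigned int* cellLast,
                                    const unsigned int* colliderIndex,
                                    float thresholdSquared,
                                    unsigned int numColliders)
    {
        const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numColliders) return;
        const float4 own = colliderPos[i];
        if (own.w == 0.0f) return; // already removed

        unsigned int cells[27];
        const unsigned int numCells = getNeighborCellHashes(own, cells); // assumed
        for (unsigned int c = 0; c < numCells; ++c) {
            const unsigned int h = cells[c];
            if (cellFirst[h] == 0xFFFFFFFFu) continue; // empty cell
            for (unsigned int row = cellFirst[h]; row <= cellLast[h]; ++row) {
                const unsigned int j = colliderIndex[row];
                if (j == i) continue;
                const float4 other = colliderPos[j];
                const float dx = own.x - other.x;
                const float dy = own.y - other.y;
                const float dz = own.z - other.z;
                if (dx * dx + dy * dy + dz * dz < thresholdSquared
                        && own.w >= other.w) {
                    colliderPos[i] = make_float4(0, 0, 0, 0); // drop less precise point
                    return;
                }
            }
        }
    }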


Figure 14. A flow-chart describing the process of downsampling the point cloud in VBO_dense to a sparse version C stored in VBO_C.

Independent of the threshold distance for neighbor removal, there is no guarantee that the holes developing during downsampling are still small enough to keep particles from passing through. Ensuring this would require a computationally involved analysis of each point's neighborhood, slowing down execution. Given that particle size defines the minimum size of gaps to be detected, we have found that using their radius as the threshold distance for neighbor removal prevents those degenerate cases, as long as the source point cloud is sufficiently dense and evenly sampled in this region. Since the latter condition is exactly what the algorithm is designed to test, there is no reason to implement the aforementioned neighborhood analysis.

4.3. Testing for Watertightness on the GPU

The downsampling of the collider cloud described in the previous section might seem to be a performance optimization conceptually unrelated to the particle simulation. In fact, it is not: downsampling point clouds and simulating large amounts of particles share a large and important part of computational logic: neighbor searching. Thus, the highly parallel data structures supporting fast access to neighbors for downsampling can also be used to enable high-speed particle simulation, as seen in Figure 16. Instead of removing colliders that are closer than a threshold distance, we add repelling forces to particles that have approached to within less than their diameter.
Section 4.2 already introduced the sparse collider cloud stored in VBO_C, holding a vector of collider positions C_pos. Testing this point cloud for watertightness involves simulating N_P particles being poured over it, requiring further data structures to be allocated in the graphics card's memory, as shown in Figure 17.
Using these structures, a single iteration of the particle simulation is processed as described in Algorithm 1:

Algorithm 1 Massively parallel test for watertightness of C

1:  // Build spatial hash table for particles
2:  for each core i < N_P do in parallel
3:      P_index[i] ← i
4:      P_cellhash[i] ← GETCELLHASH(G_SHT, P_pos[i])
5:  end for
6:  RADIXSORTKEYVALUE(P_cellhash, P_index)
7:
8:  // Build cell lookup table for particles
9:  allocate sharedHash[N_P]
10: for each core i < N_P do in parallel


The cell lookup table is populated in lines 919: one


11: sharedHash[i + 1] Pindex [i] thread per particle reads Pcellhash [threadId] into the tempo-
12: SYNCHRONIZETHREADS rary cellHash[threadId], located in the given thread-blocks
13: if sharedHash[i]
= sharedH ash[i 1] then shared memory space. In this way, each cell hash is fetched
14: PcellF irst [Pcellhash [i]] i from global memory only once. After synchronization of
15: if i > 0
all threads in the warp (ensuring that all hashes have been
16: PcellLast [sharedHash[i + 0]] i
loaded), they are compared against the cell-hash of the pre-
17: end if
vious particle in cellHash[threadId-1]. Because Pcellhash is
18: end if
sorted, a failed comparison means that the previous particle
19: end for
20: is located in a different cell, allowing CellFirst and CellLast
21: // Repeat lines 219 for colliders SHT and cell to be populated.
22: // lookup table if collider cloud C changed To detect and process collisions, NP threads fetch
23: Ppos [threadId] and compute the hash value of GSHT s re-
24: // collide particles spective cell. They then iterate through their own and all
25: for each core i < NP do in parallel 33 1 = 26 neighboring cells in the grids of the SHTs for
26: force (0, 0, 0) both other particles and colliders (lines 2544), reusing the
27: cellHashes GETNEIGHBORHASHES (Ppos [i]) vectors built in Section 4.2. To ensure that particles in non-
28: for each h cellHashes do neighboring cells cannot collide, their diameter must be less
29: // collide p against colliders than the grid-cells smallest side. For every cell visited, Cell-
30: for j CcellF irst [h], CcellLast [h] do First and CellLast are used to quickly access the indices
31: force += COLLIDEDEM (Ppos [i], Cpos [j ]) of contained particles and colliders. When collisions occur,
32: end for Pvel [threadId] is updated using forces computed by the dis-
33: // save particle pos in case of collider-collision crete elements method (Harada, 2007), and Ppos [threadId]
34: if force
= (0, 0, 0) then is copied to Pcol [threadId] (line 35), allowing this particles
35: Pcol [i] Ppos [i] last collision to be retrieved in case it reaches the bounding
36: end if boxs bottom plane in the next step.
The particle motion is integrated by launching NP threads: each kernel first updates Pvel[threadId] according to a given time step Δt, gravity g ∈ R³, and a global damping value. It also collides Ppos[threadId] against the inner sides of b, confining the particle to the bounding box. Then, Ppos[threadId] is updated according to Pvel[threadId] and Δt and used to check whether Ppos[threadId] has reached b's bottom plane. If so, that particle's last collision is looked up from Pcol[threadId] and is, if not null, used to increment the information gain of GIG's cell containing it in line 54.
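A corresponding integration kernel might look as follows. This is a sketch only: the bounding-box collision of line 49 is omitted, and the information-gain grid is widened here from the one byte per cell of Table IV to 32 bits so that the increment of line 54 can use atomicAdd; Grid and getCellHash are reused from the sketch above:

    __global__ void integrateMotion(float4* pos, float4* vel, float4* col,
                                    unsigned int* infoGain, Grid gIG,
                                    float3 bMin, float3 bMax,
                                    float3 gravity, float damping, float dt,
                                    unsigned int n)
    {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 v = vel[i];
        float4 p = pos[i];

        // Lines 48 and 50: apply gravity and global damping, advance the position.
        v.x = damping * (v.x + gravity.x * dt);
        v.y = damping * (v.y + gravity.y * dt);
        v.z = damping * (v.z + gravity.z * dt);
        p.x += v.x * dt; p.y += v.y * dt; p.z += v.z * dt;

        // Lines 51-56: a particle falling through the bottom plane respawns at the
        // top; if it collided before, its last collision position votes for the
        // information gain of the cell containing it.
        if (p.y < bMin.y) {
            p.y = bMax.y;
            float4 c = col[i];
            if (c.x != 0.0f || c.y != 0.0f || c.z != 0.0f) {
                atomicAdd(&infoGain[getCellHash(gIG, c)], 1u);
                col[i] = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
            }
        }
        pos[i] = p;
        vel[i] = v;
    }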
After multiple iterations, the particle simulation is terminated (see Section 7 for a discussion of termination criteria). GIG's cells are sorted in order of decreasing information gain values, and their respective positions in R³ are computed. After close waypoints are merged, the results are passed to the path planner.
5. COMPUTING TRAJECTORIES

As input, the path planner requires the UAV's current position, the sparse collider cloud, and a list of waypoints.

In the first step, another uniform 3D grid is created, whereby one byte per cell is allocated in GPU memory. Each cell's value is initialized to zero. The sparse collider cloud's points are then processed in a parallel manner, so that every cell containing a point is set to a value of 255 (depicted in dark gray in Figure 18, left), in effect creating a three-dimensional occupancy grid.


Figure 15. Data structures in GPU memory, required for downsampling the dense point cloud into the collider cloud C in VBOC.

(a) Diagram of colliders c ∈ C from the dense point cloud, located in cells of uniform grid GSHT (the diagram is inaccurate in that it shows only a two-dimensional grid for clarity and visualizes points as spheres). Cells are labeled using their hash values Ccellhash (in this case, the hash value is simply the cell's row-major index). Colliders are labeled using their index Cindex in Cpos.

(b) Spatial hash table, associating every collider with its cell's hash value in the uniform grid. After construction, this table is sorted according to the CellHash column (NC rows):

    Index | CellHash (Ccellhash) | Collider Index (Cindex)
      0   |          4           |           5
      1   |          4           |           3
      2   |          5           |           7
      3   |          6           |           2
      4   |          6           |           9
      5   |          7           |           4
      6   |          8           |           1

(c) The collider cell lookup table, populated using the spatial hash table: for each cell's hash value, CellFirst and CellLast store the first and last row of the spatial hash table referencing colliders in that cell. This allows a fast, memory-coalesced access to all colliders in a grid cell. The value UINTmax in CellFirst denotes empty cells, while * in CellLast consequently denotes an undefined value (NGSHT rows):

    Cell Hash | Cindex First (CellFirst) | Cindex Last (CellLast)
        0     |         UINTmax          |           *
        1     |         UINTmax          |           *
        2     |         UINTmax          |           *
        3     |         UINTmax          |           *
        4     |            0             |           1
        5     |            2             |           2
        6     |            3             |           4
        7     |            5             |           5
        8     |            6             |           6

When using a 26 (8 in 2D) neighborhood for path finding, generated paths can traverse diagonally between occupied cells, causing the vehicle to pass unsafe parts of the environment. As seen in the middle of Figure 18, dilation circumvents this problem by wrapping those cell patterns with occupied cells, also adding a safety margin between vehicle and geometry. Thus, for increased safety, the occupancy grid is dilated in the next step: each thread processes one cell, looks at the 26 (8 in 2D) neighboring cells, and assigns a value of 254 if it finds an occupied neighbor.


Figure 16. Additional vectors in GPU memory, required for particle simulation. Their structure is completely analogous to that described in Figure 15.

(a) Particles p ∈ P (with radius, shown in gray) and colliders c ∈ C (i.e., points from the downsampled point cloud, blue) located in cells of uniform grid GSHT (showing a two-dimensional grid for clarity). Cells are labeled using their hash values, while particles and colliders are labeled with their index in Ppos and Cpos, respectively.

(b) Spatial hash table, associating every particle with its cell's hash value in the uniform grid. After construction, this table is sorted according to the CellHash column (NP rows):

    Index | CellHash (Pcellhash) | Particle Index (Pindex)
      0   |          0           |           6
      1   |          4           |           9
      2   |          4           |           2
      3   |          5           |           1
      4   |          7           |           3

(c) The particle cell lookup table, populated using the spatial hash table: for each cell's hash value, CellFirst and CellLast store the first and last row of the spatial hash table referencing particles in that cell. This allows a fast, memory-coalesced access to all particles in a grid cell. The value UINTmax in CellFirst denotes empty cells, while * in CellLast consequently denotes an undefined value (NGSHT rows):

    Cell Hash | Pindex First (CellFirst) | Pindex Last (CellLast)
        0     |            0             |           0
        1     |         UINTmax          |           *
        2     |         UINTmax          |           *
        3     |         UINTmax          |           *
        4     |            1             |           2
        5     |            3             |           3
        6     |         UINTmax          |           *
        7     |            4             |           4
        8     |         UINTmax          |           *

Because the spatial grid usually contains more cells than the GPU has streaming multiprocessors, the grid is processed in (spatial) batches. If dilated cells were also assigned a value of 255, threads executed in consecutive warps would be unable to discriminate between occupied and dilated neighbor cells, in effect dilating the occupied cells even further.
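A sketch of one such dilation pass (one thread per cell; the kernel name and the int3-based grid description are illustrative):

    __global__ void dilateOccupancyGrid(unsigned char* grid, int3 dim)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z * blockDim.z + threadIdx.z;
        if (x >= dim.x || y >= dim.y || z >= dim.z) return;

        int idx = (z * dim.y + y) * dim.x + x;
        if (grid[idx] != 0) return; // only free cells can become dilated

        for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0 && dz == 0) continue;
                    int nx = x + dx, ny = y + dy, nz = z + dz;
                    if (nx < 0 || ny < 0 || nz < 0 ||
                        nx >= dim.x || ny >= dim.y || nz >= dim.z) continue;
                    // Only 255 counts as occupied; dilated cells (254) are ignored,
                    // so batches running later do not dilate the dilation.
                    if (grid[(nz * dim.y + ny) * dim.x + nx] == 255) {
                        grid[idx] = 254;
                        return;
                    }
                }
    }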


Figure 17. Four vectors of float4 are allocated, storing NC collider positions Cpos as well as NP particle positions Ppos, the same amount of particle velocities Pvel, and particle/collider collision-positions Pcol in GPU memory. Also, memory for NGIG scalar cell-values of a grid GIG of information gain is allocated.

    Idx | Cpos | Ppos | Pvel | Pcol | GIG
     0  | xyzw | xyzw | xyzw | xyzw |  0
     1  | xyzw | xyzw | xyzw | xyzw |  0
     2  | xyzw | xyzw | xyzw | xyzw |  0
     3  | xyzw | xyzw | xyzw | xyzw |  0
    ... | ...  | ...  | ...  | ...  | ...
Next, the list of waypoints is processed: because waypoints are generated near edges of the point cloud, they are usually placed in occupied cells of the occupancy grid and must be moved to neighboring free cells; because the main LIDAR's field-of-view is oriented downward, the UAV is expected to map the gaps by flying above them. All waypoints are processed in parallel: while a waypoint's containing grid cell is occupied, the cell above its current position is checked. If it is also occupied, its eight horizontal neighbors are probed. This process repeats until either a free cell is encountered (and the waypoint is placed in its center) or the distance between the original waypoint and the current position exceeds the LIDAR's range of 30 m, in which case the waypoint is deleted. The result is an occupancy grid with a list of waypoints in its free cells.
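A compact CUDA sketch of this per-waypoint search (one thread per waypoint; Grid and getCellHash are reused from the earlier sketch, bounds checks are omitted, and flagging deleted waypoints through the w component is merely an illustrative convention):

    __global__ void raiseWaypoints(const unsigned char* occupancy, Grid g,
                                   float4* waypoints, unsigned int n)
    {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        const float4 original = waypoints[i];
        float4 p = original;

        // While the containing cell is occupied (255) or dilated (254), try the
        // cell above; the full version also probes the eight horizontal neighbors
        // before climbing further.
        while (occupancy[getCellHash(g, p)] != 0) {
            p.y += g.cellSize.y;
            if (p.y - original.y > 30.0f) { // beyond the LIDAR's range
                waypoints[i].w = -1.0f;     // mark as deleted; compacted on the host
                return;
            }
        }
        waypoints[i] = p; // the full version snaps to the free cell's center
    }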
Ignoring the occupancy grid, a CPU-based TSP-solver is used to reorder the waypoints into the shortest path. Because we produce fewer than 20 waypoints at a time, this computes in a few milliseconds.

The occupancy-grid-cell containing the UAV's current position is marked as the start cell and its value is set to 1, while the grid-cell of the first waypoint becomes the goal cell. The path from start to goal is found by launching one thread per grid-cell to look up its own cell's value and, if zero, the values of the neighboring 26 (8 in 2D) cells. If their minimum is nonzero, that minimum is incremented by 1 and saved in the cell. This process is repeated until the goal-cell has received a nonzero value (shown in Figure 18, right) or a maximum number of iterations has been reached, indicating that no path was found. If a path was found, a single thread starts at the goal cell and collects the 3D world-positions of the cells it passes while hopping toward neighboring cells with smaller values, eventually reaching the start cell. These world-positions are converted to waypoints, and path planning continues with start and goal assigned the previous goal and the next waypoint, respectively. If no path was found, the first waypoint is deleted and path planning continues with the next waypoint. When path planning is completed, the collected waypoints are sent to the rover's high-level motion controller.

During autonomous flight, the occupancy grid is constantly updated using the collider cloud's new points. At a configurable interval (currently 2 s), the grid-cells of upcoming waypoints are checked and, if found to be occupied, the path is replanned. An example of this scenario is shown in the accompanying video.
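One propagation iteration of this wavefront expansion can be sketched as follows (an illustrative kernel: 255/254 mark occupied/dilated cells as above, free visited cells hold distance values 1 to 253, and updates are done in place for brevity, whereas a production version might double-buffer the grid):

    __global__ void propagateWavefront(unsigned char* grid, int3 dim, int3 goal,
                                       int* goalValue)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z * blockDim.z + threadIdx.z;
        if (x >= dim.x || y >= dim.y || z >= dim.z) return;

        int idx = (z * dim.y + y) * dim.x + x;
        if (grid[idx] != 0) return; // occupied, dilated, or already visited

        // Smallest nonzero value among the 26 neighbors; 254/255 are excluded so
        // the wave never crosses obstacles or the safety margin.
        unsigned char best = 0;
        for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0 && dz == 0) continue;
                    int nx = x + dx, ny = y + dy, nz = z + dz;
                    if (nx < 0 || ny < 0 || nz < 0 ||
                        nx >= dim.x || ny >= dim.y || nz >= dim.z) continue;
                    unsigned char v = grid[(nz * dim.y + ny) * dim.x + nx];
                    if (v > 0 && v < 254 && (best == 0 || v < best)) best = v;
                }

        if (best > 0 && best < 253) { // distances saturate below the 254 marker
            grid[idx] = best + 1;     // the start cell holds the value 1
            if (x == goal.x && y == goal.y && z == goal.z)
                *goalValue = best + 1; // signal to the host that the goal is reached
        }
    }

The host relaunches this kernel until goalValue becomes nonzero or the iteration limit is exceeded, then lets a single thread walk back from the goal along decreasing values.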

Figure 18. Parallelized path planning using a uniform occupancy grid. Left: Initialized occupancy grid with start and goal cells
marked and cells containing colliders being marked as occupied. Center: After dilation, searching for traversable cells in the
diagonal neighborhood is safe. Right: Populated occupancy grid for a path between start and goal after eight iterations. In the next
iteration, the goal cell will receive a value of 10, terminating the search.


6. MOTION CONTROL FOR AUTONOMOUS FLIGHT

Multirotor UAVs require complex motion control because they react to control-input in a nonlinear fashion and they can be affected by potentially large disturbances (wind). They are also underactuated and nonholonomic, as the number of controllable degrees of freedom is less than their total DOFs. Since roll and pitch determine lateral velocity and thus position in Cartesian space, their attitude cannot be controlled independently, posing additional constraints.

Table III. Different states of low- and high-level motion controllers, based on flight-state restriction and the presence of waypoints.

    Flight-state restriction | Channel switch value | Low-level controller state (ExternControl) | High-level controller state (waypoints absent) | High-level controller state (waypoints present)
    UserControl              | −107                 | Disabled                                    | UserControl                                    | UserControl
    Hover                    | 20                   | Enabled                                     | Hover                                          | Hover
    ApproachWaypoint         | 128                  | Enabled                                     | Hover                                          | ApproachWaypoint
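The selection logic of Table III can be summarized in a few lines of host code. This is only a sketch: the thresholds approximate the switch values printed in the table, and the function name is ours, not from the rover sources:

    enum class FlightState { UserControl, Hover, ApproachWaypoint };

    // Map the flight-state-restriction switch and waypoint availability to the
    // high-level controller state of Table III.
    FlightState highLevelState(int switchValue, bool waypointsPresent)
    {
        if (switchValue < 0)    // ≈ −107: ExternControl disabled, pilot in charge
            return FlightState::UserControl;
        if (switchValue < 100 || !waypointsPresent) // ≈ 20, or no waypoints queued
            return FlightState::Hover;
        return FlightState::ApproachWaypoint;       // ≈ 128 with waypoints present
    }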

As mentioned in Section 3.1.2, the Mikrokopter UAV comes with a FlightCtrl-board that integrates a MEMS-grade IMU and implements a low-level PID motion controller. In the absence of control input, it keeps the platform's heading constant and pitch/roll at zero degrees.

The FlightCtrl can read up to 12 channels from the remote control. Each has a resolution of eight bits, producing signed values from −127 to 128. Values for thrust, yaw, pitch, and roll can be sent using the remote control or using an open protocol (Buss & Busker, 2012) on its serial port at a rate of roughly 20 Hz, with acknowledgment packets being generated for every such motion-control packet. This is called ExternControl in MikroKopter-parlance. Using the values read from the gyroscopes, accelerometers, and an internal mixing-table, control values are converted to motor-speeds and sent to the brushless controllers. Documentation on the implementation of the low-level controller does not exist. Although its source-code is open, it is written mostly in German and contains few comments. A deeper analysis can be found in Sa and Corke (2011).

The four channels used for thrust, yaw, pitch, and roll are mapped to the respective sticks, while one channel is mapped to a tristate switch used for flight-state restriction. The FlightCtrl is configured to activate ExternControl when this channel's value is above 128. Even when activated, for safety reasons the FlightCtrl will use the remote control's thrust value as an upper bound to the thrust specified on the serial port. This configuration makes simple and safe testing of autonomous flying possible, as it allows the pilot to quickly overrule the high-level motion controller in case of problems. The high-level, PID-based motion controller is implemented in the rover program on the UAV. It reads the flight state restriction channel using the same serial port and changes states accordingly; the resulting flight states for both controllers are listed in Table III. States transition automatically from ApproachWaypoint to Hover when there are no more waypoints and back when new waypoints arrive.

When entering Hover state, the UAV's current position is set as the target, and values for pitch, roll, and thrust are computed to steer toward that target. For thrust, the controller simply determines the error between the current and desired height, while for pitch and roll, a desired velocity over the pitch and roll axes is derived from the vector toward the target. This velocity is proportional to the distance, but limited according to another channel's value (between 0 and 2 m/s, set by a potentiometer on the remote control). Then, the current and desired velocity are used to compute the error. Velocity-based control was implemented after flying in windy conditions had demonstrated that position-based control for pitch and roll was insufficient to reach the target position, and adding an integral component to counter this introduced complications when the UAV changed orientation. For yaw, a nonlinear P-component is used to ensure that the front is always pointed away from the position of liftoff. Thus, when the pilot (assumed to be positioned near the position of liftoff) switches from Hover to the UserControl state, manually controlling pitch and roll does not require knowledge of the UAV's orientation.

ApproachWaypoint state is similar to Hover state, with the next waypoint being the target. It differs from Hover state in that the controller first yaws toward the next waypoint and then pitches forward, as the INS produces a more precise attitude solution when flying forward. To avoid slow and clumsy cycles of "orient toward target, fly forward, and slow down," the controller starts yawing and rolling toward the second queued waypoint shortly before it has reached the first. The approach is only slowed when closing in on the last queued waypoint or when the second queued waypoint will be situated behind the UAV.

The update frequency of the high-level controller is limited by the availability of reliable information about position and attitude. As mentioned in Section 3.1.3, the inertial navigation system is capable of producing sequences of one fused pose followed by four IMU-integrated poses at a rate of 50 total poses per second. In this configuration, the UAV showed unpredictable and dangerous behavior during tests, which was later determined to be caused by the controller's derivative component reacting to the UAV's position seemingly jumping between the last IMU-integrated pose and the next, fused pose by up to 20 cm (see Figure 6). As a result, the INS was configured to deliver only fused poses, albeit at a lower rate of 10 Hz. As evidenced in the accompanying video, this rate is still sufficient for flight control.

Proportional and derivative gains for thrust, yaw, pitch, and roll have been determined by performing test flights in different wind conditions, with the UAV attached to a long sea-fishing rod for safety.
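As an illustration of this cascaded scheme, a condensed hover update might look as follows (plain C++; gain values and names are hypothetical placeholders rather than the tuned constants, and the derivative terms are omitted for brevity):

    #include <algorithm>
    #include <cmath>

    struct Vec3 { float x, y, z; }; // y is up, matching b.min.y/b.max.y above

    // One controller update. toTarget: vector from UAV to hover target in the body
    // frame; velocity: current velocity; maxSpeed: limit selected via the
    // potentiometer channel (0..2 m/s).
    void hoverUpdate(const Vec3& toTarget, const Vec3& velocity, float maxSpeed,
                     float& pitchCmd, float& rollCmd, float& thrustCmd)
    {
        const float kpThrust = 0.8f;   // hypothetical gains
        const float kpVelocity = 0.5f;

        // Thrust: proportional to the height error only.
        thrustCmd = kpThrust * toTarget.y;

        // Desired horizontal velocity is proportional to the distance to the
        // target, but clamped to the externally selected speed limit.
        float dist = std::sqrt(toTarget.x * toTarget.x + toTarget.z * toTarget.z);
        float desiredSpeed = std::min(dist, maxSpeed);
        float desiredVx = dist > 0.0f ? toTarget.x / dist * desiredSpeed : 0.0f;
        float desiredVz = dist > 0.0f ? toTarget.z / dist * desiredSpeed : 0.0f;

        // Pitch and roll steer the velocity error toward zero.
        pitchCmd = kpVelocity * (desiredVz - velocity.z);
        rollCmd  = kpVelocity * (desiredVx - velocity.x);
    }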


Figure 19. Closeup of the UAV scanning near people on the campus. A single scan is shown, visualizing every ray that received
a reflection in red.

7. RESULTS

7.1. Localization and Environmental Conditions

Currently, the system relies on the INS providing accurate position and attitude solutions, requiring reception of signals on different frequencies from a sufficient number of satellites. During INS startup, achieving centimeter-precise positioning requires at least four GPS satellites to be visible, with each supplying signals on L1 and L2 frequencies for both coarse acquisition and carrier phase measurements, which works reliably in all but the most occluded local horizons. Once the carrier phase ambiguities have been solved, RTK fixed mode is entered and GLONASS satellites are added to the solution, making it more robust against loss-of-lock of single satellites. Because of limitations in the firmware, the INS is unable to start centimeter-precision positioning using signals from mixed constellations, e.g., three GPS and three GLONASS satellites. Once the PVT is supported by both GPS and GLONASS, we experience very robust positioning even very close to buildings (about 1 m distance), indicating very reliable multipath mitigation. Of course, loss of GNSS-based PVT is not unheard of: the combination of few visible satellites (because of unfortunate orbit parameters and the comparatively high latitude of Hamburg), electromagnetic interference experienced with previous hardware designs, and temporary obstructions, e.g., caused by flying below large trees, can prove troublesome. The first problem is mitigated by supporting multiple constellations, and the second by shielding; the third problem was only partly solved by using a dilated occupancy grid to keep a minimum distance from potentially shadowing structures. Because the IMU is unable to support precise positioning in the case of GNSS outages, the base station notifies the pilot in this case, allowing manual control.

While the precision of the point cloud produced has not been investigated thoroughly, comparing distances measured in the point cloud with those measured manually has revealed errors of no more than 25 cm. While this may not be sufficiently accurate for many traditional surveying applications, the data produced are precise enough to be used for motion planning, collision avoidance, and some applications of surface reconstruction. Figure 19 gives an impression of the resulting cloud by showing the UAV


[Figure 20: simulation step duration in ms (logarithmic ordinate, 1 to 100,000) plotted against the number of particles (1 k to 256 k) for the Intel Xeon E3-1245 (single core), the NVIDIA Quadro 2000, and the NVIDIA GTX 670.]

Figure 20. This graph shows the maximum time required for a single time step of the simulation containing 16 k colliders against different numbers of particles. Visualization was disabled in all tests; note the logarithmic scale of the ordinate axis. The collision stage took 89.1% of the shader-processors' time, while particle-system integration only took 0.6%. The run time of the proof-of-concept implementation for the CPU is up to three orders of magnitude longer than those of the GPU-based implementations.

scanning near persons in-flight. The video shows the UAV scanning a street light twice within roughly a minute, where the lamp's mast shows up roughly 20 cm away from the first scan in the point cloud.

The high-level motion controller is able to navigate the UAV reliably through open terrain and accommodates the inertial navigation system's special requirements with regard to flight dynamics. The UAV has been tested at outside temperatures down to −10 °C and the motion controller is known to handle wind gusts of up to 10.8 m/s (the video shows a flight on October 1st, 2013, with the temperature at 12 °C and an average wind of 4.2 m/s with gusts up to 10.8 m/s, according to the weather station of the nearby Hamburg Airport).

7.2. Generated Waypoints and Paths

Adler, Xiao, and Zhang (2012) describe the CPU-based implementation for particle simulation and also provide quantitative results by comparing the scanned surface over time using the algorithm against results obtained using scan-line passes. This implementation was tested on an Intel Xeon E3-1245 CPU, clocked at 3.30 GHz. It is an unoptimized, single-threaded version that delegates collision detection and handling to the Bullet Physics library. Afterward, it queries for collisions that occurred during processing in order to manage a structure similar to Pcol. Activating visualization causes further slowdowns, as usage of OpenGL immediate mode requires that all geometry is reuploaded to the device for every frame. As shown in Figure 20, testing a point cloud with 10 k points for watertightness using 1 to 64 k particles (without visualization) required between 0.7 and 44 s for each simulation step.

The GPU implementation was tested on a NVIDIA Quadro 2000 graphics card with 192 CUDA Fermi cores clocked at 625 MHz as well as a NVIDIA GTX 670 card, providing 1344 CUDA Kepler cores running at a clock of 980 MHz. It is up to three orders of magnitude faster, taking between 2 and 12 ms. This is because integration of motion, collision detection and handling, as well as point cloud visualization using the OpenGL core profile are very suitable for processing on single instruction multiple data (SIMD) architectures. As expected, profiling shows that the particle/particle and particle/collider collisions are by far the most computationally demanding stages, requiring 90% of the time spent on the streaming multiprocessors.

Further results are presented based on data gathered during a short flight over the campus of the Department of Computer Science of the University of Hamburg.


Figure 21. (a) Part of the campus of the Department of Computer Science at the University of Hamburg. (b) Point cloud resulting
from about 100 s of flight on campus, aligned with the viewpoint of the above photograph.


Figure 22. (a) Particle simulation in progress; colliding particles with a sparse collider cloud to test the latter for watertightness. (b) Visualization of the grid of information gain. More saturated cells indicate higher information gain and are more likely to be processed into waypoints.


Figure 23. (a) The occupancy grid from the opposite side, with 32³ cells superimposed over the dense point cloud. Occupied cells are shown in dark gray, dilated cells in light gray. (b) Grid cells containing high information gain values were converted to waypoint candidates. Afterward, close candidates were merged and a collision-free trajectory was generated from the vehicle's position (in the center) to the best candidate.

Both the campus and the resulting point cloud (colored according to height) are displayed in Figure 21. After about 100 s of flight within a bounding box of 64 × 64 × 32 m, motion planning is executed: Figure 22(a) shows 32 k particles of 0.5 m radius being poured over 36 k colliders of the downsampled point cloud. After 5 s, the particle simulation is stopped and the resulting values in the grid of information gain are presented in Figure 22(b). A total of 100 cells containing the highest information gain are converted into waypoint candidates with 3D world-positions. Candidates closer than 5.0 m are merged and passed to the path planner.

Next, the path planner builds a 3D occupancy grid and dilates it as described in Section 5 and depicted in Figure 23(a). A collision-free path from start to goal cell is found and passed to the rover program in the form of waypoints, as seen in Figure 23(b). Using a NVIDIA GeForce GTX 670, the complete process of motion planning takes between 500 and 600 ms using an occupancy grid of 32³


Figure 24. Grid of information gain with 256 × 32 × 256 cells, projected as in Figure 11(f). Maximum information gain is shown normalized using a heat map visualization (a) after 400, (b) after 700, and (c) after 1,000 simulation steps.

cells. While using a finer grid is possible, the UAV's diameter of more than 1 m (including propellers) makes a cell size of 2–3 m seem sensible.

The path planner is based on a simple grid-based search algorithm, as the underlying parallel architecture lessens the advantage of best-first approaches such as A*, especially when the number of grid cells to search in each iteration does not exceed the number of available multiprocessors. As with most geometry-based algorithms, the planner is resolution-complete, meaning that if a path exists, it will be found if the grid's resolution is sufficiently large. Because our approach exploits the grid cells' size to introduce a safety margin, false negatives are theoretically possible, but they result practically from the compromise between reachability and safety.

Because the particle simulation causes the information gain in GIG's cells to steadily increase during simulation, termination criteria are not apparent: a tradeoff must be found between short run times for rapid generation of results and longer run times that allow a better sampling of the point cloud's gaps. Figure 24 presents the normalized information gain values of the point cloud depicted in Figure 11(f) at different steps into the particle simulation. While a visible difference exists between the information gain values at 400 and 700 steps into the simulation [(a) and (b)], the normalized information gain remains almost constant during the following simulation steps [(b) and (c)]. Thus, for the targeted application on UAVs, using a bounding box b with a size of 64³ m³, 16 k particles (p ∈ P) with a radius of 0.25 m and 64 k colliders (c ∈ C) allows the generation of multiple NBVs in less than 3 s using a NVIDIA GTX 670 graphics card.

7.3. Success and Failure Analysis

Testing point clouds for watertightness requires correctly tuned parameters; in particular, the ratio between point cloud density and particle size must be considered. The process also relies on sensible initial particle placement, meaning that the bounding box b must completely contain the point cloud to be tested when particles are placed at the top. Interior NBV problems have not been tested, but particles would have to be spawned within the point cloud, not above.

Difficult scenarios such as gaps in high walls or below carports can remain undetected when a low number of particles prevents the simulation from filling the point cloud to a sufficient height or when the particle size makes reaching gaps impossible. This can be rectified by initializing the simulation with large particles and restarting it with decreasing size when no waypoints can be generated. For this reason, we consider the approach probabilistically and resolution-complete.

7.4. Scalability

Within the particle simulation, most routines such as radix sort, construction of cell lookup tables, and particle integration run in linear time with respect to the number of particles, while collision-detection can become O(n²) in the worst-case scenario. Processing n points from the sparse point cloud into an occupancy grid of resolution r takes O(n) time and O(r³) space, while path planning through the grid takes O(r³) time and space. To improve scalability, waypoint generation, path planning, and obstacle avoidance have been designed to work within the constraints of a bounding box b. Independent of how the number of particles and colliders might be able to scale with future GPU generations, the approach is intrinsically scalable to larger terrains by simply shifting b to yet unexplored spaces once no more gaps can be found. Currently, the user can either shift b manually in the base station's user interface, or instruct the path planner to automatically center b at the UAV's current position when it approaches its boundaries.


Table IV. Memory allocated in GPU memory space.

    Data       | Type   | Size (bytes) | Count
    Pvel       | float4 | 16           | NP
    Pcol       | float4 | 16           | NP
    Ppos       | float4 | 16           | NP
    Pindex     | uint   | 4            | NP
    Pcellhash  | uint   | 4            | NP
    CellFirstP | uint   | 4            | NGSHT
    CellLastP  | uint   | 4            | NGSHT
    Cpos       | float4 | 16           | NC
    Cindex     | uint   | 4            | NC
    Ccellhash  | uint   | 4            | NC
    CellFirstC | uint   | 4            | NGSHT
    CellLastC  | uint   | 4            | NGSHT
    GIG        | char   | 1            | NGIG
tasks would be to execute particle/particle collisions on one
The amount of memory required on the graphics card is graphics card and particle/collider positions on the other.
determined by the number of points in the collider cloud, After the collision-induced changes of the particles veloc-
NC , the number of particles, NP , the number of cells NGSHT ities are computed on each card, they would simply have
in both spatial hash tables, as well as the number of cells to be added to the other cards particle velocities after each
for the global grid of information gain, NGIG . The respective iteration. Because the time spent visualizing the dense point
data structures and their size are detailed in Table IV. As an cloud, particles, colliders, and the grid of information gain
example, to generate waypoints for a point cloud with 64 k is non-negligible (but not included in the results above),
points using 128 k particles, we use two spatial hash tables another option is to separate data visualization from data
for particles and colliders with NGSHT = 128 64 128 cells processing: during flight, the dense point cloud could be
each, while the computed information gain is stored in a rendered (and newly appended points can be reduced) on
grid with NGIG = 256 32 256 cells. This results in a total one graphics card, with the result being sent directly to the
memory requirement of only 26.5 Mb. second card, using direct memory transfers to completely
bypass the CPU.
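As a plausibility check, this total can be recomputed from Table IV with NP = 131,072, NC = 65,536, NGSHT = 1,048,576, and NGIG = 2,097,152 (using 1 MB = 2^20 bytes): the three float4 particle vectors occupy 3 · 16 · NP = 6 MB, Pindex and Pcellhash another 2 · 4 · NP = 1 MB, Cpos 16 · NC = 1 MB, Cindex and Ccellhash 0.5 MB, the four cell lookup vectors 4 · 4 · NGSHT = 16 MB, and GIG 1 · NGIG = 2 MB, which sums to the quoted 26.5 MB.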
8. CONCLUSION AND OUTLOOK

This paper has presented the following contributions:

- a light-weight and fast airborne platform, capable of quickly scanning urban environments by producing georeferenced point clouds of configurable density.
- a novel algorithm finding multiple next best views, working directly on unorganized point clouds.
- an implementation of said algorithm that both exploits the enormous computational power available on today's massively parallel hardware and scales well with the number of available processors.
- a fast and simple GPU-based algorithm for processing the generated NBVs into collision-free sequences of waypoints.

The authors plan to further improve the presented approach in terms of efficiency. Because most parts of the implementation are memory-bound, the impact of using half-floats (i.e., binary16 in IEEE 754 parlance) for at least the collider positions will be researched, as it allows doubling both capacity and perceived memory bandwidth. This data format has been supported by OpenGL since version 3.0. CUDA supports half-floats merely as a storage format, but conversion to single-precision floats requires only a single instruction. Especially when applied to outdoor scenarios, precision in the centimeter range for the colliders is deemed sufficient for gap detection. The particles' collision positions will also be converted to half-floats, but this is expected to have less of an effect, since updates to these values are comparatively rare. Whether particle positions and velocities can also be stored with lower precision needs to be investigated, as slight changes in the particle system's parameters often translate to large changes in the particles' behavior.

Another option for even faster execution is the possibility of using multiple graphics cards. While embarking on this path strongly constrains the choice of mobile hardware accommodating this setup, a separation of the subtasks performed seems possible. The most promising separation of tasks would be to execute particle/particle collisions on one graphics card and particle/collider collisions on the other. After the collision-induced changes of the particles' velocities are computed on each card, they would simply have to be added to the other card's particle velocities after each iteration. Because the time spent visualizing the dense point cloud, particles, colliders, and the grid of information gain is non-negligible (but not included in the results above), another option is to separate data visualization from data processing: during flight, the dense point cloud could be rendered (and newly appended points can be reduced) on one graphics card, with the result being sent directly to the second card, using direct memory transfers to completely bypass the CPU.

To further optimize memory access patterns, it is planned to compare the performance to other cell-hashing functions that are expected to provide better locality than the currently used simple serial hashing function. Tests are currently being done with Hilbert- and Z-order curves (Morton code), as suggested by Green (2012).

We have successfully developed a planar surface-based outdoor mapping system in our previous work (Xiao, Adler, Zhang, & Zhang, 2013) that is fast, accurate, and robust compared to state-of-the-art algorithms, but it is not fully autonomous because a human operator is required for view planning. It is therefore interesting to embed this NBV planning algorithm in the system for autonomous exploration tasks. Especially for application on UAVs, extensions allowing anticipation of collisions between the robot and the fused point cloud are currently being researched. Whenever C is updated, one thread per cell can be employed to check potentially colliding points in every grid cell that is traversed by the planned trajectory within milliseconds. If at least one thread detects a potential collision, the path needs to be replanned.
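For illustration, the Z-order alternative mentioned above would replace the row-major index computation of getCellHash by bit interleaving. The following sketch assumes 10 bits per axis (grids of up to 1,024³ cells) and uses the well-known bit-spreading constants; it is not code from the octocopter repository:

    // Spread the lower 10 bits of v so that two zero bits separate consecutive bits.
    __device__ unsigned int expandBits(unsigned int v)
    {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // Interleave the cell coordinates: cells that are close in space receive close
    // hash values, improving the locality of the sorted particle/collider accesses.
    __device__ unsigned int mortonCellHash(unsigned int x, unsigned int y, unsigned int z)
    {
        return (expandBits(x) << 2) | (expandBits(y) << 1) | expandBits(z);
    }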


REFERENCES

Adler, B., Xiao, J., & Zhang, J. (2012). Towards autonomous airborne mapping of urban environments. In 2012 IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) (pp. 77–82).

Adler, B., Xiao, J., & Zhang, J. (2013). Finding next best views for autonomous UAV mapping through GPU-accelerated particle simulation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1056–1061). Tokyo.

Blaer, P., & Allen, P. K. (2007). Data acquisition and view planning for 3-D modeling tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 417–422). San Diego, CA.

Bryson, M., & Sukkarieh, S. (2006). Active airborne localisation and exploration in unknown environments using inertial SLAM. In IEEE Aerospace Conference (p. 13).

Buss, H., & Busker, I. (2012). Mikrokopter serial protocol. http://www.mikrokopter.de/ucwiki/en/SerialProtocol/ (online; accessed 12 October 2012).

Connolly, C. (1985). The determination of next best views. In IEEE International Conference on Robotics and Automation (vol. 2, pp. 432–435).

Coumans, E. (2013). Bullet physics library. http://bulletphysics.org/ (online; accessed 19 February 2013).

Demski, P., Mikulski, M., & Koteras, R. (2013). Characterization of Hokuyo UTM-30LX laser range finder for an autonomous mobile robot. In A. Nawrat, K. Simek, & A. Swierniak (eds.), Advanced Technologies for Intelligent Systems of National Border Security, Studies in Computational Intelligence (vol. 440, pp. 143–153).

Fairfield, N. (2009). Localization, mapping, and planning in 3D environments. Ph.D. thesis, The Robotics Institute, Carnegie Mellon University.

Green, S. (2012). Particle simulation using CUDA. Santa Clara, CA: NVIDIA.

Harada, T. (2007). Real-time rigid body simulation on GPUs. In H. Nguyen (ed.), GPU Gems 3. Boston: Addison-Wesley.

Holenstein, C., Zlot, R., & Bosse, M. (2011). Watertight surface reconstruction of caves from 3D laser data. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 3830–3837). San Francisco.

Makarenko, A., Williams, S., Bourgault, F., & Durrant-Whyte, H. (2002). An experiment in integrated exploration. In IEEE/RSJ International Conference on Intelligent Robots and Systems (vol. 1, pp. 534–539). Lausanne, Switzerland.

Mobarhani, A., Nazari, S., Tamjidi, A. H., & Taghirad, H. (2011). Histogram based frontier exploration. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1128–1133). San Francisco.

Newcombe, R. A., Davison, A. J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., Molyneaux, D., Hodges, S., Kim, D., & Fitzgibbon, A. (2011). KinectFusion: Real-time dense surface mapping and tracking. In 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 127–136). Basel, Switzerland.

Null, B. D., & Sinzinger, E. D. (2006). Next best view algorithms for interior and exterior model acquisition. In Proceedings of the Second International Conference on Advances in Visual Computing, Volume Part II, ISVC'06 (pp. 668–677). Berlin, Heidelberg: Springer-Verlag.

O'Rourke, J. (1987). Art gallery theorems and algorithms. New York: Oxford University Press.

Potthast, C., & Sukhatme, G. S. (2011). A probabilistic framework for next best view estimation in a cluttered environment. Available at http://robotics.usc.edu/potthast/vua2012.pdf, retrieved on January 13th, 2013.

Sa, I., & Corke, P. (2011). Estimation and control for an open-source quadcopter. In Proceedings of the Australasian Conference on Robotics and Automation.

Satish, N., Harris, M., & Garland, M. (2009). Designing efficient sorting algorithms for manycore GPUs. In IEEE International Symposium on Parallel Distributed Processing (pp. 1–10). Rome.

Scott, W. R., Roth, G., & Rivest, J.-F. (2003). View planning for automated three-dimensional object reconstruction and inspection. ACM Computing Surveys, 35(1), 64–96.

Shade, R., & Newman, P. (2011). Choosing where to go: Complete 3D exploration with stereo. In IEEE International Conference on Robotics and Automation (pp. 2806–2811). Shanghai.

Strand, M., & Dillmann, R. (2010). Grid based next best view planning for an autonomous robot in indoor environments. In Proceedings of the Fourth International Workshop on Robotics for Risky Interventions and Environmental Surveillance-Maintenance (RISE2010), Sheffield, UK.

Thrun, S., Burgard, W., Fox, D., Hexmoor, H., & Mataric, M. (1998). A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning (pp. 29–53).

Xiao, J., Adler, B., Zhang, H., & Zhang, J. (2013). Planar segments based 3D point cloud registration in outdoor environments. Journal of Field Robotics, 30(4), 552–582.

Yamauchi, B. (1998). Frontier-based exploration using multiple robots. Agents (pp. 47–53).