
Soft Comput (2017) 21:3673–3685

DOI 10.1007/s00500-015-1983-z

METHODOLOGIES AND APPLICATION

Supervised and semi-supervised classifiers for the detection of flood-prone areas

Giorgio Gnecco1 · Rita Morisi1 · Giorgio Roth2 · Marcello Sanguineti3 · Angela Celeste Taramasso2

Published online: 9 January 2016


© Springer-Verlag Berlin Heidelberg 2016

Abstract Supervised and semi-supervised machine-learning techniques are applied and compared for the recognition of the flood hazard. The learning goal consists in distinguishing between flood-exposed and marginal-risk areas. Kernel-based binary classifiers using six quantitative morphological features, derived from data stored in digital elevation models, are trained to model the relationship between morphology and the flood hazard. According to the experimental outcomes, such classifiers are appropriate tools when one is interested in performing an initial low-cost detection of flood-exposed areas, to be possibly refined in successive steps by more time-consuming and costly investigations by experts. The use of these automatic classification techniques is valuable, e.g., in insurance applications, where one is interested in estimating the flood hazard of areas for which limited labeled information is available. The proposed machine-learning techniques are applied to the basin of the Italian Tanaro River. The experimental results show that, for this case study, semi-supervised methods outperform supervised ones when, the number of labeled examples being the same for the two cases, only a few labeled examples are used, together with a much larger number of unsupervised ones.

Keywords Kernel-based binary classifiers · Supervised and semi-supervised learning · Morphological features · Digital elevation models · Flood hazard

Communicated by V. Loia.

Correspondence: Marcello Sanguineti, marcello.sanguineti@unige.it

Giorgio Gnecco, giorgio.gnecco@imtlucca.it · Rita Morisi, rita.morisi@imtlucca.it · Giorgio Roth, giorgio.roth@unige.it · Angela Celeste Taramasso, a.c.taramasso@unige.it

1 Institute for Advanced Studies (IMT), Piazza San Ponziano 6, 55100 Lucca, Italy
2 Department of Civil, Chemical and Environmental Engineering (DICCA), University of Genova, Via Montallegro 1, 16145 Genova, Italy
3 Department of Computer Science, Bioengineering, Robotics, and Systems Engineering (DIBRIS), University of Genova, Via Opera Pia 13, 16145 Genova, Italy

1 Introduction

Among natural risks, floods are particularly relevant, as is witnessed by the frequency of inundation events, together with all the associated negative consequences on society and local economies. The still limited availability of flood hazard information makes it very important to construct maps of flood-exposed areas, which should be one of the first steps in any analysis of the flood risk. In general, this kind of analysis begins with an investigation of hydrological data and continues by modeling the evolution, in both space and time, of the flooding process itself (Guzzetti et al. 2005; Horritt and Bates 2002; Hunter et al. 2007). Unfortunately, techniques for the recognition of flood-exposed areas by experts down to very small scales (e.g., the scale of a single building) are very time-consuming and costly, as they require the acquisition of information which is not easily available for all the areas of interest, together with the intervention of the experts themselves. Hence, even nowadays, the mapping of flood-exposed areas is far from being complete. At the same time, in recent years, the availability of new technologies, such as radar and laser altimetry and GPS, has led to the development of digital elevation models (DEMs), which have become standard tools of analysis for geomorphologists and hydrologists (Bates et al. 2003; Gallant and Dowling 2003; Giannoni et al. 2005; Dodov and Foufoula-Georgiou 2006; Nardi et al. 2008; Santini et al. 2009). Since DEMs make automatically available several morphological and hydrological features (such as drainage areas, stream channels, and valley bottoms), they have replaced more time-consuming manual techniques. So, among other applications, nowadays DEMs are used for the identification of flood-exposed areas (see, e.g., Noman et al. 2001; Hjerdt et al. 2004; Nardi et al. 2006; Manfreda et al. 2011).
The larger availability of measured data, compared with past acquisition techniques, has encouraged the application of machine-learning techniques to flood risk analysis, too. This is done with the aim of using at best the available information to construct flood hazard maps, or simply to suggest which areas should be subject to a more detailed investigation by experts (see, e.g., Degiorgis et al. 2012, 2013). The application of machine-learning techniques to flood risk identification is particularly needed when a large portion of the data made available by DEMs is unlabeled. This is the case, e.g., when the cost of labeling is high, as a detailed flood risk analysis by experts usually requires a specific study for each area of interest.

In this framework, here we propose the application of semi-supervised machine-learning techniques, namely Laplacian support vector machines (Belkin et al. 2006), to flood risk analysis, with the goal of making a better use of the large portion of unlabeled data which is often made available by DEMs. Likewise in our previous works (Degiorgis et al. 2012, 2013), we make use of a variety of morphological features for the detection of the local flood hazard: specifically, the contributing area, the distance from the closest potential flood source, the local slope, the relative and absolute elevation of the site, and its concavity. The main difference with respect to the two above-mentioned papers consists in the additional exploitation of unsupervised information in the training of the learning machine. At the same time, we compare the cases of supervised and semi-supervised training information.

Likewise in Degiorgis et al. (2012, 2013), the machine-learning techniques are applied to the basin of the Italian Tanaro River. This has been selected as a case study due to the coexistence therein of different kinds of natural morphologies (from alluvial to mountain environments), together with the presence of civil and industrial settlements and their infrastructures. Indeed, the Tanaro basin includes environments that are representative of a much larger geographical region, which includes large portions of Italy and southern Europe, with the exception of arid areas and major rivers.

The paper is organized as follows. Section 2 describes the available dataset and details the features used for the data analysis. Section 3 defines the classification tasks and the experimental settings. Section 4 describes two possible approaches to the classification tasks, one based on a completely supervised set of data and the other on both supervised and unsupervised sets, and summarizes the main ideas of the proposed approach. Section 5 details the experimental design. Section 6 presents the results, which are discussed in Sect. 7. Some technical details about the machine-learning techniques exploited in the paper (support vector machines, manifold regularization, and Laplacian support vector machines) are collected in three appendices.

2 Dataset and features

We are provided with a dataset made of 187,306 labeled data points belonging to the Tanaro basin area. Each data point is described by a set of eight features ("feature vector"): its latitude (α) and longitude (β), its distance from the nearest stream (D), its elevation relative to the nearest stream (H), the local surface curvature (∇²H), the local contributing area (A), the local slope (S), and its absolute elevation (E).

The data points are initially divided into two classes, one representing marginal-hazard areas and the other one representing flood-prone areas. The first class is identified by the label 0. The data points belonging to the second class are further divided into three subclasses related to the hazard level of the area: the label 1_l stands for low-hazard level, 1_m for medium-hazard level, and 1_h for high-hazard level. Of the total 187,306 labeled data points in the dataset, 55,521 belong to flood-prone areas (without taking into account their hazard level), while the other 131,785 come from marginal-hazard areas. The 55,521 data points associated with flood-prone areas are further divided by hazard level: 16,659 are labeled as areas subject to a low flood hazard, 20,099 are subject to a medium flood hazard, and the remaining 18,763 are subject to a high flood hazard. In some of the experiments presented and discussed later, we will not distinguish among the 55,521 flood-prone data points according to their hazard level, but we will group them in the same class "flood-prone areas," giving them the label 1.
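To make the class structure concrete, the following Python sketch shows one possible encoding of the labels for the two binary tasks used later ("0 vs 1" and "0 vs 1_h"); the array names and the synthetic stand-in data are our own illustrative assumptions, since the paper does not specify a storage format.

```python
import numpy as np

# Hypothetical layout: one row per data point, with the eight features
# [alpha, beta, D, H, curvature, A, S, E]; hazard codes in {0, 1, 2, 3}
# stand for {marginal, low (1_l), medium (1_m), high (1_h)}.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))           # stand-in for the 187,306 points
hazard = rng.integers(0, 4, size=1000)   # stand-in hazard codes

# "0 vs 1" task: the three flood-prone subclasses are merged into label 1.
y_0_vs_1 = (hazard > 0).astype(int)

# "0 vs 1_h" task: keep only marginal-hazard and high-hazard points.
mask = (hazard == 0) | (hazard == 3)
X_0_vs_1h = X[mask]
y_0_vs_1h = (hazard[mask] == 3).astype(int)
```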


3 Classification tasks and experimental settings

The goal consists in training several binary classifiers to learn to distinguish between data points belonging to two different classes of hydrologic areas (e.g., "marginal-hazard areas" versus "flood-prone areas," or "marginal-hazard areas" versus "high-hazard areas"). This is performed by using a subset of the dataset at our disposal for training/validation and another subset for testing purposes. In more detail, the idea is to study, for different kinds of binary classifiers and different experimental settings, the classification performance of the trained classifiers, i.e., their capability to correctly identify the class associated with a data point given as input to the classifier ("input data point"). In doing this, we have to distinguish between data points used for training the classifier ("training examples") and data points not used during the training process ("test examples"). A more detailed description of the kinds of classifiers adopted in this study is reported in Sect. 4 and in "Appendices 1 and 3."

The investigations in Degiorgis et al. (2012) and Degiorgis et al. (2013) were limited to a restricted family of classifiers, and their focus was on the classification performance on the set of training examples ("training set"). This was due to the simplicity of the classifiers considered therein (i.e., in machine-learning terms, their small "model complexity"), which guaranteed, with large confidence, a small difference between the classification performance on the training set and the one on the set of test examples ("test set"). Another novelty of the present work with respect to Degiorgis et al. (2012) and Degiorgis et al. (2013) is that, as detailed in the following, here we also exploit unsupervised examples to train the classifiers.

In order to compare the classification performance of different classifiers, we have performed several experiments to simulate and reproduce different realistic case studies. First, we have considered two different binary classification problems, which differ for the choice of one of the two classes:

• binary classification of marginal-hazard (class 0) areas versus "high-hazard" (class 1_h) areas ("0 vs 1_h" classification problem);
• binary classification of marginal-hazard (class 0) areas versus "flood-prone" (class 1) areas ("0 vs 1" classification problem).

For each of such binary classification problems, we have trained and tested the classifiers by using two different choices for the percentage of data points belonging to the high-hazard (respectively, flood-prone) class over the total number of training/test data. More precisely, we have considered the two following experimental settings.

• In the first setting, an unbalanced dataset is used, extracted from the original one, and characterized by a larger amount of high-hazard (respectively, flood-prone) data points than marginal-hazard data points: indicatively, 70 % of high-hazard (respectively, flood-prone) training/test data points versus 30 % of marginal-hazard training/test data points ("70 % positive, 30 % negative" case). This choice of the distribution of the positive and negative objects does not reflect the composition of the original dataset made of 187,306 data points, but it is motivated by applications (such as insurance ones) where a classification error on a flood-prone area has more negative consequences than a classification error on a marginal-hazard area.
• The second experimental setting uses a completely balanced dataset, which is again extracted from the original one: 50 % of the training/test data is made of data points coming from high-hazard (respectively, flood-prone) areas and the other 50 % is made of marginal-hazard data points ("50 % positive, 50 % negative" case). In this context, the two classes of objects have the same importance.

In order to reduce the time required to train the classifiers and to keep the dimension of the training problem relatively small (hence, avoiding the use of more complex training procedures, optimized for huge datasets), for each kind of experiment we have used only a relatively small subset of the entire dataset to train the classifiers. Implementation details are given later in Sect. 6.
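As a concrete illustration of the two settings, the sketch below draws a subsample with a prescribed share of positive data points; the helper name and the NumPy-based implementation are our own assumptions, not the authors' code.

```python
import numpy as np

def sample_with_positive_fraction(X, y, n, pos_frac, rng):
    """Draw n points such that a fraction pos_frac of them is positive,
    e.g., pos_frac=0.7 for the '70 % positive, 30 % negative' setting."""
    n_pos = int(round(n * pos_frac))
    pos = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=False)
    neg = rng.choice(np.flatnonzero(y == 0), size=n - n_pos, replace=False)
    idx = rng.permutation(np.concatenate([pos, neg]))
    return X[idx], y[idx]

# Example: a balanced set of 4000 points ("50 % positive, 50 % negative").
# X_bal, y_bal = sample_with_positive_fraction(X, y, 4000, 0.5, rng)
```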
4 The proposed approach

In the binary classification problems we are dealing with, two different kinds of data can be used (possibly simultaneously) to train the classifiers:

• input data points provided to the classifier together with the class they belong to ("supervised data");
• input data points provided to the classifier without the class they belong to ("unsupervised data").

In many real situations, only a few labeled data are available, because usually the process of labeling requires special devices, expensive and time-consuming experiments, or human annotators. Instead, a huge amount of unlabeled data is often available. Indeed, acquiring unlabeled data is generally much easier, less expensive, and faster than the process of acquisition of labeled data. This is particularly evident for labeling the hazard level of a hydrologic area, which could be done, e.g., by a risk-analysis study performed by an expert, or by an analysis of historical series. In general, the acquisition of label information for this kind of data is costly and time-consuming. Thus, methods able to exploit not only the information coming from the labeled data, but also the one related to the unlabeled data are extremely useful. For this reason, among the problems considered in this work, we have simulated possible realistic scenarios characterized by only a few labeled data and a much larger number of unlabeled data, with the aim of determining a classification method that is able to deal efficiently with this kind of situation.
Before describing in detail the experiments performed and the results achieved, let us provide a short description of the general machine-learning framework that we adopt. The input data provided to the classifier are arranged in an input data matrix X with n rows, corresponding to the n input data points x_i, i = 1, …, n, and m columns; here, m is the number of features describing each input data point. In our situation, for instance, n is smaller than or equal to 187,306, while m is smaller than or equal to 8 (details are provided later in Sect. 5). To each object x_i, its own label is associated, i.e., the class it belongs to, which is denoted by y_i, i = 1, …, n. Thus, we refer to each single data point as a pair (x_i, y_i), i = 1, …, n. Since we always deal with binary classification problems, we can assume that either y_i = 0 or y_i = 1 holds. For instance, for the "0 vs 1" binary classification problem, the data points belonging to the three flood-prone hazard levels are grouped into the same "positive" class 1, while the marginal-hazard data points compose the "negative" class 0. Hence, in this case, y_i = 1 for an input data point x_i corresponding to a flood-prone area, whereas y_i = 0 if x_i belongs to a marginal-hazard area. The labels y_i are collected into a vector Y with n components. A well-known example of a binary supervised classifier is the support vector machine (SVM) (Cristianini and Shawe-Taylor 2000), whose basic description is shortly reported in "Appendix 1" for the reader's convenience.

As mentioned before, in many real situations, such as the process of labeling the hazard level of a hydrologic area, a huge amount of unlabeled data is available, due to the large cost of the labeling process. In this context, the dataset is composed of:

• l labeled objects {(x_i, y_i), i = 1, …, l}, described by a matrix X^(L) with l rows and m columns, corresponding to the first l input data points x_i, i = 1, …, l, and a vector Y^(L) of corresponding labels, with l components y_i, i = 1, …, l;
• u unlabeled objects {x_j, j = l+1, …, l+u}, associated with a second matrix X^(U) with u rows and m columns.

Clearly, the class of the u unlabeled objects is unknown to the classifier (particularly, in its training phase). Learning methods that are able to deal simultaneously with both the l labeled and the u unlabeled data in the training phase are commonly known as "semi-supervised learning" techniques. The goal of such techniques is to exploit potential class-related information coming from the unlabeled data to find a classifier that is better, in terms of the percentage of correctly classified data, than a classifier trained in a supervised way, using the labeled data alone (Zhu and Goldberg 2009). In particular, in the following we shall apply one of such semi-supervised techniques, namely the Laplacian support vector machine (LapSVM), developed in Belkin et al. (2006). We refer the reader to "Appendix 2" for the explanation of the main idea behind such a method (i.e., the principle of manifold regularization) and to "Appendix 3" for a description of LapSVM and its comparison with the original SVM method.

We aim at building a classification model able to reach satisfactory results for the hazard-level classification in a "few labeled data points" regime. At the same time, we are interested in obtaining a quite general classification model, one that can be used in as many situations as possible. Indeed, it is extremely useful to determine a classification model, trained on data belonging to a particular subregion of the river basin, that is able to perform well not only when tested on input data coming from the same subregion, but also from a different subregion of the same basin or even from another basin.

In the following sections, we consider these two classification methods:

• a LapSVM, as a representative state-of-the-art semi-supervised model;
• a completely supervised SVM classifier, for comparison purposes.

By simulating the LapSVM semi-supervised scenario, we shall investigate whether a satisfactory classification performance can be achieved by using only a few labeled data, together with a much larger amount of unlabeled data. This will be compared with the case where training is restricted to the labeled data above and an SVM is used, and with the situation in which also the originally unlabeled data used to train the LapSVM are provided to the SVM, together with their labels (of course, assuming that one can obtain such information, at some additional cost). Since, in our context, the whole dataset is labeled, we have decided not to provide to the classifier the labels of some training examples, in order to simulate the semi-supervised scenario. In this way, we deal with them as unlabeled data, while we provide the remaining input data points to the classifier, together with their labels.
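A minimal sketch of this label-hiding step (in Python; the authors' implementation is in MATLAB, so the helper below is purely illustrative):

```python
import numpy as np

def hide_labels(X, y, n_labeled, rng):
    """Simulate the semi-supervised scenario: keep the labels of n_labeled
    randomly chosen points and withhold the labels of the remaining ones."""
    idx = rng.permutation(len(y))
    lab, unlab = idx[:n_labeled], idx[n_labeled:]
    X_L, Y_L = X[lab], y[lab]   # labeled part: matrix X^(L), vector Y^(L)
    X_U = X[unlab]              # unlabeled part: matrix X^(U)
    return X_L, Y_L, X_U
```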
5 Experimental design

As mentioned in Sect. 2, each object in the dataset is described by eight features. However, when training the classifiers (both the supervised and the semi-supervised ones), we have decided not to provide the latitude and longitude features to them. Indeed, taking into account such information would likely prevent the classifiers from learning a correct classification in regions geographically distant from the ones associated with the training data points. Moreover, in the case of SVM/LapSVM classifiers using certain kernel functions (see "Appendix 1" for the definition of the kernel function), the latitude and longitude features are expected to be misleading for the classification of data points coming from sufficiently separated regions (e.g., this would happen in the case of the linear kernel; see again "Appendix 1" for its definition). However, as described later, in some cases we have still used the latitude and longitude features to separate geographically the training and test sets in a preprocessing phase, before training the classifiers. After having reduced in this way the number of features from 8 to 6, we have subsequently normalized the values assumed by each of the remaining features to the interval [−1, 1]. This has been performed in order to give a priori the same importance to each feature in the training phase.
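A sketch of this normalization step, assuming a simple min-max rescaling of each feature to [−1, 1] (the text does not state how constant features are handled; here the span defaults to 1 to avoid division by zero):

```python
import numpy as np

def rescale_to_pm1(X):
    """Min-max normalization of each feature (column) to [-1, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return 2.0 * (X - x_min) / span - 1.0
```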
Then, we have designed and run different experiments in order to test the performance of both the supervised and the semi-supervised classifiers, dealing with case studies representing realistic situations. In particular, for each binary classification problem considered (i.e., either the "0 vs 1_h" or the "0 vs 1" problem), we have performed N different realizations of the experiments, as detailed in the following. First, we have partitioned the entire dataset into two subsets S1 and S2, from which the training set and the test set have been, respectively, extracted (as described later in this section). This partitioning has been obtained using the following three procedures.

• The first procedure consists in randomly assigning, with the same probability, each point of the entire dataset either to S1 or to S2; clearly, in this way the two sets share no data.
• The second procedure exploits a threshold on the latitude feature to partition the input data. More precisely, the data points whose latitude is smaller than or equal to a given threshold are assigned to the set S1, whereas the ones whose latitude is greater than the same threshold are assigned to the set S2.
• The third procedure exploits a threshold on the longitude feature to partition the input data. The data points whose longitude is greater than or equal to a given threshold are assigned to the set S1, while those with longitude smaller than the same threshold are assigned to the set S2.

Although no overlap between the sets S1 and S2 occurs for any of the three procedures, in the first case it may still happen that some data points in S1 are very similar to data points in S2, as they may belong to adjacent geographical areas. In the second case, however, this issue is likely limited to data points for which the latitude feature is near the threshold. A similar remark holds for the third case, referring to the longitude feature rather than the latitude one. Hence, the last two cases are intended to simulate a possible realistic situation where the sets S1 and S2 (hence, also the training and test sets) are composed of data belonging to different subregions.
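The three partitioning procedures can be summarized as follows (a sketch; the function signature and the handling of the threshold are our own illustrative choices):

```python
import numpy as np

def split_S1_S2(X, lat, lon, how, thr=None, rng=None):
    """Partition the dataset into S1 and S2 following Sect. 5:
    'random'    - each point goes to S1 or S2 with probability 1/2;
    'latitude'  - S1 gets the points with lat <= thr, S2 the others;
    'longitude' - S1 gets the points with lon >= thr, S2 the others."""
    if how == "random":
        in_s1 = rng.random(len(X)) < 0.5
    elif how == "latitude":
        in_s1 = lat <= thr
    elif how == "longitude":
        in_s1 = lon >= thr
    else:
        raise ValueError(f"unknown procedure: {how}")
    return X[in_s1], X[~in_s1]
```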
The experiments were performed by following the procedure described next. First, for each of the N realizations, a set TR was randomly extracted from the set S1, independently for each realization. Then, each type of classifier was trained by taking a subset of TR as the training set (details are provided in Sect. 6) and tested on a different test set, which was also randomly extracted from the set S2, independently for each realization. Thus, for each type of classifier, N training sets and N test sets were considered during the entire procedure. Note that, for each realization, the training sets for the different classifiers are related, as they come from the same set TR, whereas the test set is the same for each type of classifier, in order to obtain comparable results among the different learning methods. Finally, the classification errors computed at each realization, i.e., the "test errors", were averaged over the N realizations. The choice of non-overlapping training and test sets allows one to reduce overfitting problems (Theodoridis and Koutroumbas 2009), preventing the trained classifiers from reducing their generalization capability. In addition, when compared with the case of a fixed test set, the procedure of randomly generating N different test sets reduces the bias in the overall classification performance, producing results that are statistically more accurate.

The SVM and LapSVM classifiers considered in the following are characterized by a series of parameters to be tuned. Indeed, for both cases we have used a Gaussian kernel (see "Appendix 1" for its definition), which contains an internal width parameter σ to be chosen. Similarly, for the LapSVM one has to tune also the number k of nearest neighbors that are used to build the edge set E of the graph that is used inside the associated optimization problem (see "Appendix 3"). Such parameters have to be tuned externally, as they assume fixed values in the optimization problems associated with SVM and LapSVM training (see, respectively, formula (1) in "Appendix 1" and formula (7) in "Appendix 3"). In particular, for the SVM classifier with the Gaussian kernel, as can be argued from "Appendix 1," the parameters are:

• the width σ of the Gaussian kernel (formula (6) in "Appendix 1");
• the regularization parameter γ_A (formula (1) in "Appendix 1").

Similarly, as shown in "Appendices 2 and 3," the parameters of the LapSVM classifier with the Gaussian kernel are:

• k: the number of nearest neighbors considered for the construction of the graph ("Appendix 2");
• t: the parameter used inside the definition of the Gaussian weights for the edges of the graph ("Appendix 2");
• the width σ of the Gaussian kernel (formula (6) in "Appendix 1");
• the regularization parameters γ_A and γ_I (formula (7) in "Appendix 3").

For some parameters, a cross-validation procedure (Bishop 2006) was carried out, in order to find the values that are most suitable for the particular context and situation investigated. In particular, after having determined the best set of parameters, i.e., the one corresponding to the smallest average validation error, the model was trained again on the entire training set, by using such optimal parameters. Finally, the obtained model was evaluated on the test set, which had not been previously used in the overall training/validation process.

In particular, during the cross-validation procedure, the training objects are equally distributed among the K folds, in order to make each fold as representative as possible of the entire training set. More precisely, in order to have the same percentage of labeled and unlabeled objects in each group, at first we divided all the u unlabeled objects into K folds, assuming that u is a multiple of K. Similarly, both the positive and negative labeled objects (l_P objects for the "positive" class, l_N objects for the "negative" class) were partitioned into K different groups, assuming that both l_P and l_N are multiples of K. Thus, each fold is composed of l_P/K positive labeled objects, l_N/K negative labeled objects, and u/K unlabeled objects.
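One way to build such folds is sketched below; it assumes, as in the text, that l_P, l_N, and u are multiples of K, and additionally that the training objects are ordered as positives, then negatives, then unlabeled (this ordering is our own assumption, for illustration only).

```python
import numpy as np

def make_folds(l_P, l_N, u, K, rng):
    """Return K index arrays, each with l_P/K positive, l_N/K negative,
    and u/K unlabeled objects."""
    assert l_P % K == 0 and l_N % K == 0 and u % K == 0
    pos = rng.permutation(l_P)
    neg = l_P + rng.permutation(l_N)
    unl = l_P + l_N + rng.permutation(u)
    folds = []
    for k in range(K):
        folds.append(np.concatenate([
            pos[k * l_P // K:(k + 1) * l_P // K],
            neg[k * l_N // K:(k + 1) * l_N // K],
            unl[k * u // K:(k + 1) * u // K],
        ]))
    return folds
```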
In order to highlight the effect of the unsupervised examples when moving from the SVM to the LapSVM classifier, the parameters that are in common to both the SVM and the LapSVM were assigned the same values for both models. More precisely, the following procedure was followed. First, the value of the width σ of the Gaussian kernel was fixed. Then, the cross-validation procedure detailed above was carried out, in order to find the best value of the regularization parameter γ_A for the SVM. Then, the parameters σ and γ_A of the LapSVM were fixed to the same values chosen for the SVM (i.e., its a priori fixed value for σ, and the value of γ_A chosen by the first cross-validation performed for the SVM). Moreover, the value of the Gaussian weight parameter t of the LapSVM was fixed to the same value chosen for the parameter σ, as the two parameters have similar meanings. Then, for the LapSVM, the cross-validation was performed in order to find the best values for the remaining parameters k (the number of nearest neighbors used to build the associated graph) and γ_I (the additional regularization parameter), which are involved in the "semi-supervised component" of the optimization problem associated with the LapSVM; see formula (7) in "Appendix 3." The motivation for the whole procedure is that both parameters σ and γ_A appear in the "supervised component" of such an optimization problem (as formula (7) in "Appendix 3" and formula (6) in "Appendix 1" show), which is in common with the optimization problem associated with the SVM (see formulas (1) and (6) in "Appendix 1"). As the procedure assigns the same values to such common parameters, any difference in classification performance between the two classifiers is likely to be ascribed mainly to the absence/presence of unsupervised training examples.

6 Experimental results

We implemented the method described in Sect. 5 in MATLAB. The code is mainly based on the library lapsvm from Melacci and Belkin (2012), available at http://sourceforge.net/projects/lapsvmp/. We report the results obtained by generating, for each type of experiment, N = 10 realizations. As mentioned in Sect. 5, the test set is randomly and independently generated at each realization and is made of T = 4000 objects. In particular, in the "70 % positive, 30 % negative" case it is made of 2800 objects from the positive class and 1200 from the negative one. Similarly, still in the "70 % positive, 30 % negative" case, the training set, randomly extracted during each realization, is composed of 140 labeled objects for the positive class, 60 labeled objects for the negative class, and u = 3800 unlabeled objects, which are generated from labeled objects simply by removing their labels before presenting them to the classifier. In addition, 2660 of these unlabeled objects come from the positive class and 1140 from the negative one (but this additional information is not provided to the classifier). In the "50 % positive, 50 % negative" case, instead, the test set is made of 2000 objects for each class, and the training set is composed of 100 labeled objects for each class and u = 3800 unlabeled objects, which are generated as described before. In addition, 1900 of these unlabeled objects come from the positive class and 1900 from the negative one (and again, this additional information is not provided to the classifier).

Concerning the completely supervised SVM classifier, in the following tables we report two kinds of classification results. The first column shows the results achieved by the SVM trained only with l = 200 labeled objects (the same labeled objects used also to train the LapSVM), while the third column reports the results obtained by training the SVM on all the 4000 objects composing the training set (i.e., reinserting the labels in its unlabeled objects). Finally, the second column refers to the semi-supervised LapSVM, trained with l = 200 labeled objects and u = 3800 unlabeled objects.

As regards the choices of the parameters in the classifiers, the width σ of the Gaussian kernel has been set to 0.5, and also the parameter t of the Gaussian edge weight has been set to 0.5. Concerning the cross-validation procedure, we have set K = 5. Then, a first cross-validation has been performed on the regularization parameter γ_A, restricting the values considered for such a parameter to the three-element set {10^−3, 10^−2, 10^−1}. As already mentioned, we have decided to fix the parameters σ and γ_A to the same values for both classifiers, in order not to have substantial differences in the supervised part of the optimization problems associated, respectively, with the completely supervised SVM classifier and the semi-supervised LapSVM classifier. Then, a second cross-validation procedure has been performed for the additional parameters of the LapSVM model.

Table 1  "0 vs 1_h" binary classification problem, when the sets S1 and S2 are obtained by a random partitioning of the dataset (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 93 ± 0.5      92 ± 0.4    96 ± 0.2       | 92 ± 1        91 ± 0.5    95 ± 0.4
  TP       | 95 ± 0.3      98 ± 0.5    98 ± 0.4       | 93 ± 4        95 ± 1      96 ± 0.4
  TN       | 89 ± 1        78 ± 0.2    90 ± 2         | 91 ± 2        87 ± 1      93 ± 0.6

Table 2  "0 vs 1_h" binary classification problem, when the sets S1 and S2 are defined, respectively, as the subset of objects with latitude greater than or equal to 2750 (expressed in pixel units) and the subset of objects with latitude smaller than the same threshold (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 89 ± 2        91 ± 1      95 ± 0.8       | 89 ± 2        91 ± 1      94 ± 0.9
  TP       | 97 ± 5        99 ± 0.5    97 ± 0.9       | 90 ± 5        98 ± 1      95 ± 3
  TN       | 75 ± 10       73 ± 5      91 ± 2         | 89 ± 5        83 ± 4      92 ± 1

Table 3  "0 vs 1_h" binary classification problem, when the sets S1 and S2 are defined, respectively, as the subset of objects with longitude smaller than 2400 (expressed in pixel units) and the subset of objects with longitude greater than the same threshold (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 89 ± 3        91 ± 0.8    94 ± 0.6       | 88 ± 2        90 ± 0.7    93 ± 0.3
  TP       | 90 ± 6        99 ± 0.7    97 ± 1         | 88 ± 5        96 ± 2      96 ± 0.7
  TN       | 87 ± 3        75 ± 3      89 ± 1         | 90 ± 3        84 ± 3      91 ± 0.6

Table 4  "0 vs 1" binary classification problem, when the sets S1 and S2 are obtained by a random partitioning of the dataset (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 89 ± 1        90 ± 0.7    93 ± 0.4       | 88 ± 0.3      87 ± 0.3    90 ± 0.5
  TP       | 90 ± 2        93 ± 1      98 ± 0.3       | 90 ± 2        92 ± 2      93 ± 0.5
  TN       | 86 ± 2        80 ± 2      77 ± 2         | 86 ± 1        83 ± 2      86 ± 0.7

In the specific experiments, the following possible choices for such parameters have been examined during cross-validation: the number k of neighbors needed for the graph construction in the LapSVM has been chosen inside the three-element set {5, 7, 10}, whereas the value of the second regularization parameter γ_I has been chosen inside the three-element set {10^−1, 1, 10}. Like the set used for γ_A, such sets have been chosen with small cardinality, to limit the computational time needed to perform the cross-validation procedure.

Tables 1, 2, 3, 4, 5, and 6 show the results achieved by both the supervised and the semi-supervised classifiers. We report the results obtained for the different case studies described in the sections above. The tables show the mean accuracies obtained by averaging the test accuracies over the N = 10 different realizations, together with their empirical standard deviations. The reported quantities are defined, respectively, as:

• the percentage of data points in the test set correctly classified by the trained classifier (accuracy);
• the true positive rate TP, which is the ratio between the number of "positive" data points in the test set that have been correctly classified by the trained classifier as "positive" data points and the total number of "positive" data points in the test set;
• the true negative rate, defined as TN = 1 − FP, where FP stands for the false positive rate, i.e., the ratio between the number of "negative" data points in the test set that have been erroneously classified by the trained classifier as "positive" data points and the total number of "negative" data points in the test set.
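These three quantities can be computed directly from the true and predicted test labels, as in the following sketch (0/1 label convention as in Sect. 4):

```python
import numpy as np

def binary_rates(y_true, y_pred):
    """Accuracy, true positive rate (TP), and true negative rate
    (TN = 1 - FP), as defined above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    tp = np.mean(y_pred[y_true == 1] == 1)
    tn = np.mean(y_pred[y_true == 0] == 0)
    return accuracy, tp, tn
```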
Table 5  "0 vs 1" binary classification problem, when the sets S1 and S2 are defined, respectively, as the subset of objects with latitude greater than or equal to 2750 (expressed in pixel units) and the subset of objects with latitude smaller than the same threshold (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 86 ± 5        90 ± 1      90 ± 2         | 80 ± 2        83 ± 3      83 ± 2
  TP       | 93 ± 8        99 ± 1      98 ± 3         | 93 ± 5        98 ± 1      96 ± 4
  TN       | 68 ± 8        65 ± 8      65 ± 2         | 67 ± 6        68 ± 8      70 ± 2

Table 6  "0 vs 1" binary classification problem, when the sets S1 and S2 are defined, respectively, as the subset of objects with longitude smaller than 2400 (expressed in pixel units) and the subset of objects with longitude greater than the same threshold (all entries in %)

           | 70 % positive, 30 % negative             | 50 % positive, 50 % negative
           | SVM (l=200)   LapSVM      SVM (l=4000)   | SVM (l=200)   LapSVM      SVM (l=4000)
  Accuracy | 87 ± 2        88 ± 1      89 ± 0.7       | 84 ± 2        85 ± 0.5    83 ± 0.5
  TP       | 88 ± 4        91 ± 2      95 ± 0.7       | 89 ± 5        91 ± 1      92 ± 0.7
  TN       | 83 ± 5        80 ± 2      69 ± 2         | 79 ± 5        79 ± 2      74 ± 0.8

As an illustrative example, Figs. 1, 2, and 3 show, respectively:

– the ground truth;
– the classifications produced by an SVM trained on l = 200 labeled data points, when the scenario considered is the one reported in Table 4, with 50 % positive training data points and 50 % negative training data points, and the sets S1 and S2 were obtained by a random partitioning of the entire data set. In particular, the results were obtained by choosing an SVM with the best parameters (in terms of the validation error) among the N different classifiers trained during the N realizations;
– the classifications produced by a LapSVM trained on l = 200 labeled data points and u = 3800 unlabeled data points (50 % positive, 50 % negative), when the scenario is the one reported in Table 4 and the sets S1 and S2 were obtained by a random partitioning of the entire data set. The classifier chosen is one of the N different semi-supervised classifiers trained during the N realizations, with the "best" set of parameters in terms of the validation error.

Fig. 1 Ground truth

For illustrative purposes, Figs. 2 and 3 report the classification results obtained by the two best classifiers on a larger dataset, which contains also examples whose labels are completely unknown (not only to each learning machine, during its training), simply because their labels are not available in such an extended dataset. So, they differ from the 3800 "unlabeled" examples used by the LapSVM, which, as already reported, are originally "labeled" examples, whose label has only been "hidden" to the learning machine during its training.

In addition, to better investigate the behavior of the TP and FP rates on the test set, we evaluated their variations with respect to the composition of the training set, focusing on the case in which the three classifiers are applied to the "0 vs 1" binary classification problem and the sets S1 and S2 are defined by means of a threshold on the latitude (i.e., the same scenario of the results reported in Table 5). In particular, we varied the percentage of positive and negative data points in both the training and test sets, both of which have been randomly extracted from S1 and S2. More precisely, we considered seven different situations, where the percentage of positive data points over the total is 20, 30, 40, 50, 60, 70, and 80 %, respectively.

For each type of situation, N = 10 different realizations were considered, and both the TP and FP rates were averaged over the N realizations. We display the obtained results by means of the receiver operating characteristic (ROC) curves shown in Fig. 4. In such a figure, the percentage of positive data points over the total is indicated only for the curve representing the results of the SVM trained with l = 200 labeled data points; clearly, the same labeling of the percentages holds for the other two curves.

Fig. 2 Output of the SVM classifier (l = 200)

Fig. 3 Output of the LapSVM classifier

Fig. 4 ROC curves

Tables 2, 3, 4, 5, and 6 show a LapSVM accuracy equal to or even larger than the SVM accuracy (except for the case shown in Table 4, right column), when the latter was trained with the same number of labeled objects (but a smaller number of total examples, as the SVM does not use the unsupervised examples in its training). The improvements are larger in the unbalanced case, where 70 % of the test set is made of positive objects, while the remaining 30 % is made of negative data points. Moreover, we can see that the LapSVM accuracy always improves when moving from the totally balanced situation ("50 % positive, 50 % negative") to the unbalanced one ("70 % positive, 30 % negative"), except for the case shown in Table 2, where the accuracy remains the same. In addition, the true positive (TP) rates reported in the different tables highlight that the LapSVM performs better than the SVM trained with the same number of labeled objects in detecting the actual high-hazard/flood-prone areas. This is particularly important for applications such as insurance ones. In this context, indeed, the wrong classification of a positive object as a negative one is much less desirable than the wrong classification of a negative object as a positive one. Thus, if one has to choose between a higher TP rate with a slightly higher FP rate and a lower TP rate with a slightly lower FP rate, the first case is preferred: a smaller percentage of positive objects is in fact misclassified in the first case. However, the results reported in the third column of the tables show that a better classification performance is usually obtained for the SVM when this is trained using a number of labeled data points equal to the number of (labeled and unlabeled) data employed by the LapSVM itself. This was expected, as such an SVM models a case in which the classifier has at its disposal a larger amount of (costly) information for its training.


7 Discussion and conclusions

This work has been focused on the development of a semi-supervised technique, namely the Laplacian support vector machine (LapSVM), to classify marginal-hazard and high-hazard/flood-prone areas. The emphasis is on understanding the potential of this semi-supervised method when only a few labeled examples are provided, together with a much larger number of unsupervised examples. In order to highlight the capabilities of the LapSVM classifier, we performed different experiments comparing the performances obtained by this technique with the ones achieved by the fully supervised SVM classifier using the same number of labeled data. For comparison purposes, we considered also the case of an SVM classifier trained with a much larger number of (costly) labeled examples.

To understand the generalization capability of the models in various situations, we took into account different cases for the generation of the training and test sets. Besides choosing the training and test sets from the same subregion, in other experiments we separated them by using longitude and latitude information, in order to simulate a possible situation where the classifier is trained with data belonging to a specific subregion and subsequently tested on a different subregion. In particular, from Tables 5 and 6 one realizes that, for the "0 vs 1" binary classification problem, the LapSVM achieves the same or even a better performance than the SVM trained with l = 4000 labeled examples, when the two sets S1 and S2 were obtained by using a threshold either on the latitude or on the longitude and both classifiers were trained by using 50 % positive examples and 50 % negative examples. Hence, the semi-supervised classifier shows a larger generalization capability when the training and test sets are extracted from geographically separated regions. In addition, from Fig. 4 we can infer that the LapSVM performs well also when unbalanced data sets are considered and the error in the detection of the high-hazard/flood-prone areas is penalized more than the error in the detection of marginal-hazard areas. The true positive (TP) rates reported in the different tables highlight that, the number of labeled training examples being the same, the LapSVM performs better than the SVM in detecting the actual flood-prone areas, which is particularly important for certain applications, e.g., insurance ones.

In the data analysis, we used a variety of morphological features for the detection of the local flood hazard. As a future extension, one can take into account also externalities modeled through features related to the local human intervention against floods. We have focused on binary classifiers, as our goal consisted in comparing different classifiers based on the absence/presence of labels associated with some examples (dealt with as supervised by some classifiers, and as unsupervised by others), without taking into account the number of possible values for the labels. Another future development consists in extending the comparisons to the multi-class case, either by directly training multi-class classifiers or by combining several binary classifiers, as in Degiorgis et al. (2012).

Acknowledgements Marcello Sanguineti is a member of the Gruppo Nazionale per l'Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).

Compliance with ethical standards

Conflict of interest The authors declare no conflict of interest.

Appendix 1: Support vector machines

Let a set made of a finite number l of labeled training data {(x_i, y_i), i = 1, …, l} be given, with x_i ∈ R^m and y_i ∈ {−1, 1}. Here, with a slight change of notation with respect to the previous sections, the label −1 (instead of the label 0) is used to denote the "negative" class, while +1 is the "positive" class label. Given a regularization parameter γ_A > 0 and a suitable function space H_K, more precisely, a reproducing kernel Hilbert space (Cristianini and Shawe-Taylor 2000), the (binary) support vector machine (SVM) training problem consists in searching for a classifier f* that solves the following optimization problem: find

$$\min_{f \in \mathcal{H}_K}\ \frac{1}{l} \sum_{i=1}^{l} \bigl(1 - y_i f(x_i)\bigr)_+ + \gamma_A \|f\|_{\mathcal{H}_K}^2. \tag{1}$$

By ‖·‖²_{H_K} we denote the square of the norm in the reproducing kernel Hilbert space H_K, and (1 − y_i f(x_i))_+ is the so-called hinge loss function, which is defined as

$$\bigl(1 - y_i f(x_i)\bigr)_+ := \max\bigl(0,\ 1 - y_i f(x_i)\bigr). \tag{2}$$

The term (1/l) Σ_{i=1}^{l} (1 − y_i f(x_i))_+ in (1) penalizes the classification error on the training set, whereas the term γ_A‖f‖²_{H_K} in (1) enforces a small norm of the optimal solution f* in the reproducing kernel Hilbert space H_K (i.e., typically, high smoothness for f*). Given a (possibly unseen) data point x ∈ R^m, the optimal classifier f* assigns to x the label +1 if f*(x) ≥ 0; otherwise, it assigns to x the label −1.

The optimization problem (1) can be rewritten in the following way: find

$$\min_{f \in \mathcal{H}_K,\ \xi_i \in \mathbb{R}}\ \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_A \|f\|_{\mathcal{H}_K}^2 \tag{3}$$

subject to

$$y_i f(x_i) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0, \qquad \text{for } i = 1, \ldots, l.$$



We denote by K : R^m × R^m → R the (uniquely determined) kernel function associated with the reproducing kernel Hilbert space H_K (Cristianini and Shawe-Taylor 2000). The optimal solution f* of the optimization problem (3) is provided by the representer theorem (Cristianini and Shawe-Taylor 2000) in the following form:

$$f^*(x) = \sum_{i=1}^{l} \alpha_i^* \, K(x, x_i), \tag{4}$$

where the optimal coefficients α_i* ∈ R. Therefore, solving the optimization problem (3) is reduced to determining the finite-dimensional coefficients α_i that minimize its objective, when the function f is constrained to have the form (4). For a reproducing kernel Hilbert space H_K, the kernel K often has a simple expression. This is the case, e.g., of the linear kernel

$$K(x, y) := \langle x, y \rangle_{\mathbb{R}^m}, \tag{5}$$

and of the Gaussian kernel

$$K(x, y) := \exp\left(-\frac{\|x - y\|_{\mathbb{R}^m}^2}{2\sigma^2}\right), \tag{6}$$

where σ > 0 is a fixed width parameter. It often happens that only a small subset of the coefficients α_i* (with respect to their total number l) is different from 0; the input data points x_i associated with nonzero α_i* are called support vectors.
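For illustration, the Gaussian kernel (6) and the representer form (4) of the decision function can be coded as follows (a sketch: the optimal coefficients alpha are assumed to be available from solving problem (1) or (3), which is omitted here).

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), formula (6)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_predict(X_new, X_train, alpha, sigma):
    """f*(x) = sum_i alpha_i* K(x, x_i), formula (4); the predicted label
    is +1 where f*(x) >= 0 and -1 otherwise."""
    f_vals = gaussian_kernel(X_new, X_train, sigma) @ alpha
    return np.where(f_vals >= 0.0, 1, -1)
```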
In practice, a binary SVM classifier can be interpreted as a binary linear classifier in a (possibly infinite-dimensional) auxiliary feature space associated with the reproducing kernel Hilbert space H_K. The mapping between the original feature space R^m and the auxiliary feature space is typically nonlinear. A binary SVM classifier often allows one to separate nonlinearly data points that are not linearly separable in the original feature space.

Appendix 2: Manifold regularization

Manifold regularization is a class of semi-supervised learning techniques, described in Belkin et al. (2006), whose goal consists in exploiting the information contained in the training objects to determine the underlying geometry of the input data, in such a way as to improve the overall classification performance. Manifold regularization aims at capturing such a geometrical structure and exploiting it to build a classifier having better classification performance than a fully supervised one, in situations where only a few labeled data are provided but a much larger set of unlabeled data is available. Indeed, label information is not needed to determine the underlying geometry of the input data; hence, both labeled and unlabeled data can be used to this aim.

A main assumption of manifold regularization is that the input data points are drawn from a probability distribution whose support resides on a Riemannian manifold embedded in the original feature space. A two-dimensional manifold can be thought of as a surface embedded in a higher-dimensional Euclidean space (Do Carmo 1976). The surface of the Earth, for instance, is approximately a two-dimensional manifold embedded in a three-dimensional space. Similar remarks hold for higher-dimensional manifolds. A Riemannian manifold is one on which one can define the "intrinsic distance" between any two points on the manifold itself as their geodesic distance on the manifold, i.e., the length of the shortest path on the manifold between the two points. Manifold regularization assumes that similar labels are expected to be assigned to points x_i and x_j that are close with respect to the intrinsic distance on the Riemannian manifold they lie on. So, determining an approximation of the Riemannian manifold is expected to help the classification process.

In practice, an approximation of the Riemannian manifold can be obtained by using both the labeled and unlabeled input data, exploiting them to build an undirected graph G = (V, E), which provides a discrete model of the manifold itself and which is associated with a symmetric matrix W of suitable nonnegative weights. We recall that a graph is a representation of a set of objects where some of them are connected by links ("edges"); the graph is called undirected if no orientation on such links is defined, while it is weighted if one is given a measure of the strength of the links between pairs of objects ("vertices"). In the context of the present work, the vertices in the set V correspond to the input data points (the feature vectors), while the links between different pairs of input data points form the edge set E. In the context of manifold regularization, one considers weighted undirected graphs; assigning a weight to an edge means defining a measure of similarity between the associated vertices (input data points). Once a similarity measure has been chosen, the larger the similarity between the two input data points x_i and x_j, the stronger their connection in the graph. Due to the basic assumption of manifold regularization reported in the paragraph above, the higher the weight W_ij = W_ji between x_i and x_j, the higher the probability that they belong to the same class.

Determining a suitable similarity measure between every pair of input data points (hence, a suitable weight matrix W) is a challenging task, and several methods have been proposed in the literature to deal with such an issue. In fact, this measure is fundamental to build the graph that models the manifold the data lie on.

123
3684 G. Gnecco et al.

Usually, two choices for the weights W_ij are considered: they could be either binary weights or Gaussian weights. In the first case, the weight between two points is set to 1 if they are sufficiently close in the original feature space; otherwise, it is set to 0. In the second case, one sets the weight between two points x_i and x_j that are connected according to the first method to

$$W_{ij} := e^{-\|x_i - x_j\|^2 / (4t^2)},$$

where t > 0 is a suitable width parameter; all the other weights are defined as being equal to 0. In this work, we have decided to use only the second method to define the weights of the edges. Note that the first one can be considered a limit case of the latter, since it is obtained from it for t → +∞.

Another way to build the graph approximating the Riemannian manifold is described in von Luxburg (2007). One of its main features consists in applying a k-nearest-neighbors procedure to determine the edges of the graph, where k is a user-defined parameter. In a first stage, each input data point is connected to its k nearest input data points, using the Euclidean metric in the original feature space to define the set of k nearest neighbors. In general, this procedure leads to the definition of a directed graph; in order to obtain an undirected one (which is needed by manifold regularization), two possible methods are usually implemented:

• the first method consists in connecting two input data points x_i and x_j if and only if either x_i is among the k nearest neighbors of x_j or, vice versa, x_j is among the k nearest neighbors of x_i;
• the second method, which leads to a less connected (i.e., sparser) graph, creates a link between the two nodes x_i and x_j if and only if x_i is among the k nearest neighbors of x_j and, vice versa, x_j is among the k nearest neighbors of x_i. In this work, we have applied this kind of method, because we prefer to deal with less connected graphs.

Once the topology (i.e., the edge set E) of the graph has been fixed, the weight matrix W can be defined by following one of the two k-nearest-neighbors procedures described in the paragraph above, i.e., by assigning to the edges determined through such procedures either a binary weight or a Gaussian weight.
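A sketch of the construction adopted in this work (mutual k-nearest-neighbor edges with Gaussian weights; dense arrays are used for clarity, although a sparse implementation would scale much better):

```python
import numpy as np

def mutual_knn_gaussian_weights(X, k, t):
    """Weight matrix W: an edge between x_i and x_j exists iff each is
    among the k nearest neighbors of the other; its weight is then the
    Gaussian weight defined in the text, and 0 otherwise."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)          # no self-loops
    nn = np.argsort(sq_dist, axis=1)[:, :k]    # k nearest neighbors per row
    is_nn = np.zeros(sq_dist.shape, dtype=bool)
    np.put_along_axis(is_nn, nn, True, axis=1)
    mutual = is_nn & is_nn.T                   # mutual k-NN condition
    return np.where(mutual, np.exp(-sq_dist / (4.0 * t ** 2)), 0.0)
```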
Figure 5 shows an example of the construction of the graph. Note that, in the figure, we consider only two features for each point, in order to visualize the graph easily.

Fig. 5 Representation of the Riemannian manifold approximation by means of a graph. All the objects (negative, positive, unlabeled) were used to generate the graph. The links between the different points represent the edges of the graph that approximates the manifold the data lie on. The links were generated according to the second method detailed in the text, i.e., a link between the two nodes x_i and x_j has been created if and only if x_i is among the k nearest neighbors of x_j and, vice versa, x_j is among the k nearest neighbors of x_i (the value k = 5 has been chosen). Finally, binary weights have been used for the links.

Appendix 3: Laplacian support vector machines

Likewise in "Appendix 1," we assume that a set made of a finite number l of labeled training data {(x_i, y_i), i = 1, …, l}, with x_i ∈ R^m and y_i ∈ {−1, 1}, is available. We also assume the presence of a second set made of a finite number u of unlabeled training data {x_j, j = l+1, …, l+u}, with x_j ∈ R^m. As in "Appendix 1," H_K denotes a reproducing kernel Hilbert space, whereas γ_A > 0 is a regularization parameter. We also assume that a second regularization parameter γ_I > 0 is given. With these premises, the (binary) Laplacian support vector machine (LapSVM) (Belkin et al. 2006) extends the SVM formulation described in "Appendix 1" by solving the following optimization problem (which is inspired by the principle of manifold regularization; see "Appendix 2"): find

$$\min_{f \in \mathcal{H}_K}\ \frac{1}{l} \sum_{i=1}^{l} \bigl(1 - y_i f(x_i)\bigr)_+ + \gamma_A \|f\|_{\mathcal{H}_K}^2 + \frac{\gamma_I}{(u+l)^2}\, \mathbf{f}^T L\, \mathbf{f}, \tag{7}$$

where f := [f(x_1), …, f(x_{l+u})]^T and L is the graph Laplacian matrix, defined as L := D − W. Here, W denotes a suitable (l+u) × (l+u) symmetric matrix of weights (see "Appendix 2" for one of its possible constructions); its generic element W_ij = W_ji is the weight of the edge between the ith and the jth input data points. D is a diagonal matrix whose diagonal elements are defined as D_ii := Σ_{j=1}^{l+u} W_ij. Likewise in "Appendix 1," the goal of the term (1/l) Σ_{i=1}^{l} (1 − y_i f(x_i))_+ in (7) is to penalize the classification error on the training set, whereas the term γ_A‖f‖²_{H_K} in (7) enforces a small norm of the optimal solution f* in the reproducing kernel Hilbert space H_K (i.e., typically, high smoothness for f*). Finally, the term (γ_I/(u+l)²) f^T L f enforces smoothness of the optimal solution f* also with respect to the graph approximation of the Riemannian manifold.
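The graph Laplacian and the intrinsic term of (7) can be computed as follows (a sketch using dense arrays):

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, where D is diagonal with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def intrinsic_penalty(f_vals, W):
    """The term f^T L f of formula (7), up to the factor
    gamma_I / (u + l)^2; f_vals = [f(x_1), ..., f(x_{l+u})]."""
    return f_vals @ graph_laplacian(W) @ f_vals
```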


The expression of the optimal solution f* of problem (7) follows again from another form of the representer theorem and is given by

$$f^*(x) := \sum_{i=1}^{l+u} \alpha_i^* \, K(x, x_i), \tag{8}$$

for suitable optimal coefficients α_i* ∈ R. Again, solving the optimization problem (7) is reduced to determining the finite-dimensional coefficients α_i that minimize its objective, when the function f has the form (8).

References

Bates PD, Marks KJ, Horritt MS (2003) Optimal use of high resolution topographic data in flood inundation models. Hydrol Process 17:537–557
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
Degiorgis M, Gnecco G, Gorni S, Roth G, Sanguineti M, Taramasso AC (2012) Classifiers for the detection of flood-prone areas using remote sensed elevation data. J Hydrol 470–471:302–315
Degiorgis M, Gnecco G, Gorni S, Roth G, Sanguineti M, Taramasso AC (2013) Flood hazard assessment via threshold binary classifiers: the case study of the Tanaro basin. Irrig Drain 62:1–10
Do Carmo MP (1976) Differential geometry of curves and surfaces. Prentice-Hall, Englewood Cliffs
Dodov BA, Foufoula-Georgiou E (2006) Floodplain morphometry extraction from a high-resolution digital elevation model: a simple algorithm for regional analysis studies. IEEE Geosci Remote Sens Lett 3:410–413
Gallant JC, Dowling TI (2003) A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour Res 39:1347–1360
Giannoni F, Roth G, Rudari R (2005) A procedure for drainage network identification from geomorphology and its application to the prediction of the hydrologic response. Adv Water Resour 28:567–581
Guzzetti F, Stark CP, Salvati P (2005) Evaluation of flood and landslide risk to the population of Italy. Environ Manag 36:15–36
Hjerdt KN, McDonnell JJ, Seibert J, Rodhe A (2004) A new topographic index to quantify downslope controls on local drainage. Water Resour Res 40. doi:10.1029/2004WR003130
Horritt MS, Bates PD (2002) Evaluation of 1D and 2D numerical models for predicting river flood inundation. J Hydrol 268:87–99
Hunter NM, Bates PD, Horritt MS, Wilson MD (2007) Simple spatially-distributed models for predicting flood inundation: a review. Geomorphology 90:208–225
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416
Manfreda S, Di Leo M, Sole A (2011) Detection of flood-prone areas using digital elevation models. J Hydrol Eng 16(10):781–790. doi:10.1061/(ASCE)HE.1943-5584.0000367
Melacci S, Belkin M (2012) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184
Nardi F, Vivoni ER, Grimaldi S (2006) Investigating a floodplain scaling relation using a hydrogeomorphic delineation method. Water Resour Res 42(9). doi:10.1029/2005WR004155
Nardi F, Grimaldi S, Santini M, Petroselli A, Ubertini L (2008) Hydrogeomorphic properties of simulated drainage patterns using digital elevation models: the flat area issue. Hydrol Sci J 53:1176–1193
Noman NS, Nelson EJ, Zundel AK (2001) Review of automated floodplain delineation from digital terrain models. J Water Resour Plan Manag 127(6):394–402
Santini M, Grimaldi S, Nardi F, Petroselli A, Rulli MC (2009) Pre-processing algorithms and landslide modelling on remotely sensed DEMs. Geomorphology 113:110–125
Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Academic Press, New York
Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan & Claypool Publishers, San Rafael