Abstract

Abstract: The problem of unsupervised classification of a satellite image into a number of homogeneous regions can be viewed as the task of clustering the pixels in the intensity space. This paper proposes a novel approach that combines a recently proposed multiobjective fuzzy clustering scheme with a support vector machine (SVM) classifier to yield improved solutions. The multiobjective technique is first used to produce a set of nondominated solutions. The nondominated set is then used to find some high-confidence points using a fuzzy voting technique. The SVM classifier is thereafter trained on these high-confidence points. Finally, the remaining points are classified using the trained classifier. Results demonstrating the effectiveness of the proposed technique are provided for numeric remote sensing data described in terms of feature vectors. Moreover, two remotely sensed images of the cities of Bombay and Calcutta have been classified using the proposed technique to establish its utility.
Index Terms: Fuzzy clustering, multiobjective optimization (MOO), remote sensing imagery, support vector machine (SVM).
I. INTRODUCTION
FOR REMOTE sensing applications, classification is an important task that partitions the pixels in the images into homogeneous regions, each of which corresponds to some particular landcover type. The problem of pixel classification is often posed as clustering in the intensity space [1]. Clustering [2] is a popular unsupervised pattern classification technique that partitions a set of $n$ objects into $K$ groups based on some similarity/dissimilarity metric, where the value of $K$ may or may not be known a priori. A fuzzy clustering algorithm produces a $K \times n$ membership matrix $U(X) = [u_{ki}]$, $k = 1, \ldots, K$ and $i = 1, \ldots, n$, where $u_{ki}$ denotes the membership degree of pattern $x_i$ to cluster $C_k$. For probabilistic nondegenerate clustering, $0 < u_{ki} < 1$ and $\sum_{k=1}^{K} u_{ki} = 1$ for $1 \le i \le n$ [3]. A large amount of uncertainty and imprecision is associated with satellite images, since a pixel represents an area of the land space that may not necessarily belong to a single landcover type. Therefore, the application of the principles of fuzzy set theory in the domain of pixel classification is useful. Genetic algorithms (GAs) [4] are popularly used in pixel classification [1]. Recently, a multiobjective fuzzy genetic clustering technique for pixel classification has been proposed in [5] that simultaneously optimizes the Xie-Beni (XB) index [6] and the fuzzy C-means (FCM) [3] measure ($J_m$). In multiobjective optimization (MOO) [5], [7]-[10], search is performed over a number of, often conflicting, objective functions. Instead of yielding a single best solution, in MOO, the final solution set contains a number of nondominated Pareto-optimal solutions. A challenging issue in MOO is obtaining a final solution from the set of Pareto-optimal solutions. In this regard, a novel method using a support vector machine (SVM) [11] classifier is proposed in this paper. The procedure uses the points that are given a high membership degree to a particular class by a majority of the nondominated solutions to train the SVM classifier. The remaining points are classified by the trained classifier.
There are some related works in the literature. In [12], a feature selection method using both particle swarm optimization (PSO) and GA, taking SVM as a wrapper, is presented. In [13], classification of multisensor data based on fusion of SVMs is proposed. In [14], a semisupervised multitemporal classification technique based on constrained multiobjective GA (MOGA) is presented that updates the ground-truth information through an automatic estimation process. In [15], multiobjective PSO is used for model selection of SVMs for semisupervised regression that exploits the unlabeled samples from satellite images. A study on classification of hyperspectral satellite images by SVM is done in [16]. In [17], a technique for running a semisupervised SVM for change detection in satellite imagery is presented. In [18], kernel methods for the integration of heterogeneous sources of information for multitemporal classification of satellite images are proposed. A context-sensitive clustering based on a graph-cut initialized expectation maximization algorithm for satellite image classification is proposed in [19]. In [20] and [21], urban satellite image classification has been done by a fuzzy possibilistic classifier and by fusion of multiple classifiers, respectively. A transductive SVM-based semisupervised classifier for remote sensing imagery is proposed in [22]. Unlike these supervised and semisupervised approaches, here, we propose an unsupervised multiobjective classification technique that utilizes the strength of the SVM classifier and is applied for satellite image segmentation. The performance of the MOGA clustering followed by SVM classification (MOGA-SVM) is demonstrated on two numeric image data sets. MOGA-SVM is also applied on two IRS satellite images of the cities of Bombay and Calcutta. The superiority of the proposed technique as compared to MOGA clustering, the FCM algorithm, and a single-objective GA is demonstrated both quantitatively and qualitatively.
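The selection of high-confidence points described in this introduction can be illustrated with a minimal sketch. It is not the paper's fuzzy voting procedure, whose details are not given in this section: a plain majority vote stands in for it, the function name `high_confidence_points` and the threshold of 0.9 are hypothetical, and the cluster indices are assumed to be already aligned across the nondominated solutions.

```python
def high_confidence_points(solutions, threshold=0.9):
    """Pick points that most solutions assign confidently to the same cluster.

    solutions: list of K x n membership matrices (each a list of K rows),
               one per nondominated solution; cluster indices are assumed
               to be consistent across solutions (an assumption).
    threshold: hypothetical membership cutoff for a "confident" vote.
    Returns a dict {point index: cluster index} for the selected points.
    """
    n = len(solutions[0][0])
    selected = {}
    for i in range(n):
        votes = {}
        for U in solutions:
            column = [row[i] for row in U]  # memberships of point i
            k_best = max(range(len(column)), key=column.__getitem__)
            if column[k_best] >= threshold:
                votes[k_best] = votes.get(k_best, 0) + 1
        if votes:
            label, count = max(votes.items(), key=lambda kv: kv[1])
            # Keep the point only if a strict majority of solutions agree.
            if count > len(solutions) / 2:
                selected[i] = label
    return selected
```

The selected points and their labels would then serve as the training set of the SVM, and every point left out of the returned dictionary would be labeled by the trained classifier.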
Abstract: A semisupervised support vector machine is presented for the classification of remote sensing images. The method exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. The method learns a suitable kernel directly from the image and thus avoids assuming a priori signal relations by using a predefined kernel structure. Good results are obtained in image classification examples when few labeled samples are available. The method scales almost linearly with the number of unlabeled samples and provides out-of-sample predictions.
Index Terms: Bagged and cluster kernels, image classification, kernel methods, support vector (SV) machine (SVM).
I. INTRODUCTION
THE problem of remote sensing image classification is very challenging given the typically low rate of labeled pixels per spectral band. Supervised classifiers such as support vector machines (SVMs) [1] excel in using the labeled information and have demonstrated very good performance in multispectral, hyperspectral, and multisource image classification [2]-[4]. However, when little labeled information is available, the underlying probability distribution function of the image is not properly captured, and a risk of poor generalization certainly exists. Modeling the data structure by exploiting the information contained in unlabeled pixels can be done with semisupervised learning (SSL) methods, but in this case, the SVM classifier needs to be reformulated. The framework of SSL is very active and has recently attracted a considerable amount of theoretical as well as applied remote sensing research [5]. Essentially, three different classes of SSL algorithms are encountered in the literature: 1) generative models, which involve estimating the conditional density [6]; 2) low-density separation algorithms, which maximize the margin for labeled and unlabeled samples simultaneously, such as the transductive SVM [7]; and 3) graph-based methods, in which each sample spreads its label information to its neighbors until a global stable state is achieved on the whole data set [8]. Despite the good performance of these methods, some shortcomings are observed. First, the contribution of unlabeled samples is usually trimmed with a critical set of free parameters. Second, pure transductive methods do not yield a final classification function but only predictions for the unlabeled samples, which typically involve a high computational burden. Finally, the complexity involved in training these methods precludes their adoption by the nonexpert user. In this letter, we propose a simple, yet powerful, semisupervised SVM based on cluster kernels. Essentially, we use the SVM with a kernel obtained from clustering all available data, both labeled and unlabeled.
This strategy, which was originally presented in [9] and [10], allows us to obtain robust SVM classifiers with kernels adapted to the intrinsic image features directly learned from the image (or a set of representative images). The classifier, being de facto an SVM, is capable of providing out-of-sample predictions, and the computational burden is only increased by the (typically small) time involved in clustering data with the user's preferred algorithm. Moreover, the training complexity is the same as that involved in the familiar SVM. The method is successfully tested in multispectral and hyperspectral image classification scenarios. The rest of this letter is outlined as follows. Section II fixes notation and briefly reviews the main concepts and properties of SVM and kernels. Noting that the key to obtaining good performance with SVM is a proper design of the kernel structural form, Section III pays attention to the problem of learning the kernel directly from the image and introduces the concepts of cluster and bagged kernels for semisupervised SVM image classification. Section IV presents the data collection, experimental setup, and the obtained results and also analyzes the information encoded in the proposed kernels and induced kernel mappings. Finally, Section V concludes with some remarks and further research directions.
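One way to picture a kernel "obtained from clustering all available data" is the following sketch. It assumes a bagged-kernel construction in which a small k-means is run several times and the kernel value for two samples is the fraction of runs in which they share a cluster. The function names, the number of runs, and the use of k-means are illustrative assumptions; the letter's exact construction is given in its Section III, which is not part of this text.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Tiny k-means on tuples of floats; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def bagged_kernel(points, k, runs=10, seed=0):
    """K[i][j] = fraction of clustering runs in which i and j share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / runs for j in range(n)] for i in range(n)]
```

Because `points` may contain both labeled and unlabeled samples, the resulting matrix encodes cluster structure learned from all of them. How such a matrix is combined with a standard kernel before training the SVM is left open here.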
Abstract: This paper presents two semisupervised one-class support vector machine (OC-SVM) classifiers for remote sensing applications. In one-class image classification, one tries to detect pixels belonging to one of the classes in the image and reject the others. When few labeled pixels of only one class are available, obtaining a reliable classifier is a difficult task. In the particular case of SVM-based classifiers, this task is even harder because the free parameters of the model need to be finely adjusted, but no clear criterion can be adopted. In order to improve the OC-SVM classifier accuracy and alleviate the problem of free-parameter selection, the information provided by unlabeled samples present in the scene can be used. In this paper, we present two state-of-the-art algorithms for semisupervised one-class classification for remote sensing classification problems. The first proposed algorithm is based on modifying the OC-SVM kernel by modeling the data marginal distribution with the graph Laplacian built with both labeled and unlabeled samples. The second one is based on a simple modification of the standard SVM cost function that penalizes more heavily the errors made when classifying samples of the target class. The good performance of the proposed methods is illustrated in four challenging remote sensing image classification scenarios where the goal is to detect one of the classes present in the scene. In particular, we present results for multisource urban monitoring, hyperspectral crop detection, multispectral cloud screening, and change-detection problems. Experimental results show the suitability of the proposed techniques, particularly in cases with few or poorly representative labeled samples.
Index Terms: Change detection, one-class classification, one-class support vector machine (OC-SVM), semisupervised learning (SSL), support vector domain description (SVDD), target detection.
I. INTRODUCTION
IN REMOTE sensing image classification, it is quite common to deal with reduced sets of labeled samples when developing classifiers. Support-vector-machine (SVM)-based classifiers excel in using the labeled information, being (regularized) maximum margin classifiers that are also equipped with an appropriate loss function [1], [2]. However, the SVM is only applicable when labeled samples of all the landcover classes present in the scene are available. When such information is only available for one class of interest (or a few), other techniques should be used. In particular, high interest has been devoted to the following learning frameworks: 1) anomaly detection, where one tries to identify pixels differing significantly from the background; 2) target detection, where the target spectral signature is assumed to be known and the goal is to detect pixels that match the target; and 3) one-class classification, where one tries to detect one class and reject the others. This paper is focused on one-class classification. In the past, several kernel-based methods have been developed for this purpose. The use of kernel methods offers many advantages with regard to other approaches, as they alleviate the curse of dimensionality in hyperspectral images, increase the robustness of the method to noise, and allow flexible and smooth nonlinear mappings [1]. Kernel methods in general, and kernel-based classifiers in particular, rely on the proper definition of a kernel (or similarity) function between samples. In particular, the one-class SVM (OC-SVM) [3] aims at identifying samples of one particular class while rejecting all the others. In the remote sensing literature, the method was originally introduced for anomaly detection [4], [5], then exercised in incomplete and unreliable training data problems [6], and recently engineered for change detection [7]. Nevertheless, when very few or poorly representative training samples are available, the OC-SVM may produce unreliable classification results.
Thus, to deal with this kind of problem, the OC-SVM can be reformulated in the framework of semisupervised learning (SSL) to exploit the information of not only labeled but also unlabeled samples [8]-[10]. In this paper, we introduce two semisupervised OC-SVM methods. The first method, named semisupervised OC-SVM (S2OC-SVM), uses the available supervised information (labeled) and also the data with no a priori class information (unlabeled) to encode some knowledge about the geometry and data distribution. Exploring the shape of the marginal distribution of the data adds significant information that helps to better position the decision boundary. It is worth noting that this kind of procedure can be applied to any kernel-based classification method and results in a modification of the measure of similarity in the kernel space according to the geometry of the unlabeled samples. This is done by including an additional regularization term on the geometry of both labeled and unlabeled samples by using the graph Laplacian [11], [12].
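The graph Laplacian mentioned above can be made concrete with a short sketch. It uses the standard unnormalized definition L = D - W over all samples, labeled and unlabeled alike. The Gaussian edge weights, the width `sigma`, and the function names are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def graph_laplacian(points, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W with Gaussian edge weights.

    points: list of feature tuples (labeled and unlabeled samples together).
    sigma:  hypothetical width of the Gaussian weighting function.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(L, f):
    """Quadratic form f^T L f; small when f varies little between close samples."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))
```

The quadratic form equals half the weighted sum of squared differences of `f` over all pairs, so a regularizer of this kind penalizes decision functions that change sharply between nearby samples, which is the sense in which the geometry of the unlabeled data enters the classifier.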
Abstract: The multitemporal classification of remote sensing images is a challenging problem, in which the efficient combination of different sources of information (e.g., temporal, contextual, or multisensor) can improve the results. In this paper, we present a general framework based on kernel methods for the integration of heterogeneous sources of information. Using the theoretical principles in this framework, three main contributions are presented. First, a novel family of kernel-based methods for multitemporal classification of remote sensing images is presented. The second contribution is the development of nonlinear kernel classifiers for the well-known difference and ratioing change detection methods by formulating them in an adequate high-dimensional feature space. Finally, the presented methodology allows the integration of contextual information and multisensor images with different levels of nonlinear sophistication. The binary support vector (SV) classifier and the one-class SV domain description classifier are evaluated by using both linear and nonlinear kernel functions. Good performance on synthetic and real multitemporal classification scenarios illustrates the generalization of the framework and the capabilities of the proposed algorithms.
Index Terms: Change detection, information fusion, kernel methods, multisource, multitemporal classification, support vector (SV) domain description (SVDD), support vector machine (SVM).
I. INTRODUCTION
THE PROBLEMS of multitemporal image classification and change detection are highly relevant in many domains [1], particularly in the field of remote sensing [2]-[4]. Typical applications include updating digital remote sensing databases, following multiseasonal crop cover phenology, or the automatic detection of growing urbanization. With the increasing multitemporal and multisensor data available from remote sensing platforms, the efficient fusion and exploitation of this unprecedented wealth of data is a critical issue at present. Many methods have been proposed to tackle the problem of multitemporal classification, in general, and of change detection, in particular. However, so far, there is no general methodological framework for combining different sources of information that involve different sensors, time instants, and spatial or contextual extracted features efficiently and with tunable complexity. This is the focus of this paper. On the one hand, multitemporal classification algorithms classify pixels by learning the changing mapping between dates in a temporal sequence of images. When a labeled image dataset is available, supervised classifiers can yield improved performance over unsupervised approaches. Other advantages are their capability to explicitly detect land-cover transitions, robustness to different atmospheric and light conditions at the acquisition times, and their demonstrated ability to process multisensor/multisource images [5]. Many multitemporal supervised methods have been used during the last years, such as evidence reasoning [6], generalized least squares [7], or neural networks [8]-[10]. Nevertheless, several problems are identified in the presented strategies. First, classifiers are, in general, sensitive to the high dimension of pixels in hyperspectral images or to the high-dimensional input space generated by putting together multisensor features at different temporal instants. This stacked approach aggravates the well-known curse of dimensionality, which has lately been alleviated by using support vector (SV) machines (SVMs) in this setting [11], [12]. Second, classifiers can suffer from high false-alarm detection rates when the contextual or textural information of the change is not considered. This is an important issue because in practice, the user is ultimately interested in very precisely detecting both the position and the spatial extent of the class(es) of interest. Multitemporal and multiband synthetic aperture radar (SAR) classification of urban areas using spatial analysis has been successfully addressed with both statistical and neural approaches [13] and at feature and pixel information levels [14]. Third, and very importantly, most methods do not consider the (potentially nonlinear) cross information among pixels (and among features) at different time instants. In fact, the learning paradigm is often violated because the classifier is trained and tested with data coming from different distributions due to differences in atmospheric and light conditions, sensor drifts, etc. To address this problem, several strategies have been presented.
A dynamic approach to link hidden Markov random fields at different dates is used in [15], whereas in [16], scenes are classified by a fuzzy fusion of the spatial and spectral information, while the temporal information is obtained from transition probabilities. Last, but not least, it should be noted that, in most cases, only two dates are considered to illustrate method capabilities, and thus, the performance of the algorithms for long-term operational studies is unclear. In [17], a methodology that encompasses the use of both temporal and contextual information was presented for the classification of long time series of satellite data. The method was based on kriging-integrated variograms and Gaussian maximum likelihood classification and showed very good results. All the aforementioned shortcomings can be simultaneously alleviated with the adequate formulation of novel kernel methods [18], [19], which we focus on in this paper. On the other hand, change detection can be viewed as a particular case of the multitemporal image classification problem. Two main approaches are followed in the literature, namely: 1) postclassification comparison and 2) preclassification enhancement. In the first case, the images of two dates are independently classified and coregistered, and an algorithm is used to identify those pixels whose predicted labels change between dates. In the second case, a single classification is performed on the combined image dataset for the two dates. The postclassification approach can fail, considering that it relies on the accuracy of each independent classifier. Both approaches, however, inherit all the aforementioned problems of the multitemporal image classification scenario. Classical change detection techniques are based on multidate principal-component analysis, temporal image subtraction or ratioing, change-vector analysis, clustering, or cross-correlation analysis [2].
The main ideas underlying these techniques are visualizing, analyzing, or computing the differences among the sample distributions for two dates in a low-dimensional subspace (e.g., two principal components, few bands, etc.). If one detects changes in a (representative-enough) space, then one can analyze the nature of the change by inspecting the spectral signatures involved in it. All these techniques are unsupervised in the sense that they do not require a labeled image at time $t_1$ to learn from and then to extrapolate to the subsequent image at time $t_2$. The early approaches considered intuitive threshold-based image differencing or ratioing; however, this was readily demonstrated to be inefficient. The selection of suitable thresholds under Bayesian criteria has been widely studied [5], [20]. Recently, the Kittler-Illingworth minimum-error thresholding algorithm attained good results for unsupervised SAR change detection [21], whereas a fuzzy hidden Markov chain model has been successfully used in combination with the ratio approach [22]. Furthermore, a full methodology for change detection based on the analysis of the difference vectors in the polar domain has been presented [23], and a method based on studying the evolution of the local statistics using the Kullback-Leibler divergence has been proposed with good results [24]. It is only recently that authors have turned to kernel-based methods for change detection. In [25], a semisupervised oil slick detection was proposed by using the SV domain description (SVDD) classifier in the wavelet domain of SAR images, and in [26], the SV classifier (SVC) for abrupt change detection was presented for detecting buried landmines from ground-penetrating radar data. Not only do these kernel methods allow large-margin classifications, but they also intrinsically match the well-known nonlinear nature of the change [27].
However, none of them has been particularly redesigned to consider cross relations between time instants or to efficiently include contextual and multisource data in the classifier.
In this scenario and attending to the previously identified problems, we present a novel methodological framework that allows us to develop a family of nonlinear classifiers for multitemporal, contextual, and multisource image classification and change detection. In particular, the methods are developed under the framework of kernel methods [18], [19], which has demonstrated good results in high-dimensional image classification [28]-[30]. Certainly, these are important characteristics of kernel methods, which become strictly necessary when multitemporal, multisensor, and contextual features are extracted and need to be combined. In addition, we derive specific formulations for dealing with the peculiarities of the change detection problem by proposing the difference and the ratioing of images in the kernel space. The proposed methodological framework also serves to efficiently integrate different information sources such as optical and SAR data. The rest of this paper is organized as follows. Section II reviews both the framework of the kernel methods, paying special attention to their general properties, and the formulations of the standard SVC and SVDD supervised classifiers. Section III introduces the novel methodological framework for information fusion based on the kernels. The proposed family of kernels is used in Section IV to develop specific kernel classifiers for multitemporal classification and change detection. Section V exploits the presented framework to integrate the contextual, textural, and multisource information in the classifier. Section VI presents the experimental results in both long time series of simulated images and challenging real scenarios of multitemporal image classification and change detection. Finally, Section VII draws some concluding remarks.
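The idea of taking the image difference in the kernel space can be sketched as follows. Expanding the inner product of feature-space differences, (phi(x_i at t2) - phi(x_i at t1)) . (phi(x_j at t2) - phi(x_j at t1)), gives four base-kernel evaluations. This expansion follows from the linearity of the inner product; whether the paper uses exactly this form is not stated in the present section, and the function names below are illustrative.

```python
import math


def linear_kernel(a, b):
    """Plain dot product between two feature tuples."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel; sigma is a hypothetical width parameter."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def difference_kernel(xi_t1, xi_t2, xj_t1, xj_t2, kernel):
    """Inner product of the feature-space differences of pixels i and j.

    xi_t1, xi_t2: features of pixel i at the first and second date.
    xj_t1, xj_t2: features of pixel j at the first and second date.
    kernel:       any base kernel function k(a, b).
    """
    return (kernel(xi_t2, xj_t2) + kernel(xi_t1, xj_t1)
            - kernel(xi_t2, xj_t1) - kernel(xi_t1, xj_t2))
```

With `linear_kernel`, the result reduces to the dot product of the ordinary difference images, that is, classical image differencing. Passing `rbf_kernel` instead gives a nonlinear counterpart without ever computing the feature map explicitly.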
Abstract: Gaussian processes (GPs) represent a powerful and interesting theoretical framework for Bayesian classification. Despite having gained prominence in recent years, they remain an approach whose potentialities are not yet sufficiently known. In this paper, we propose a thorough investigation of the GP approach for classifying multisource and hyperspectral remote sensing images. To this end, we explore two analytical approximation methods for GP classification, namely, the Laplace and expectation-propagation methods, which are implemented with two different covariance functions, i.e., the squared exponential and neural-network covariance functions. Moreover, we analyze how the computational burden of GP classifiers (GPCs) can be drastically reduced without significant losses in terms of discrimination power through a fast sparse-approximation method like the informative vector machine. Experiments were designed aiming also at testing the sensitivity of GPCs to the number of training samples and to the curse of dimensionality. In general, the obtained classification results show clearly that the GPC can compete seriously with the state-of-the-art support vector machine classifier.
Index Terms: Expectation-propagation (EP) method, Gaussian process (GP), hyperspectral imagery, Laplace approximation, sparse classification, support vector machine (SVM).
I. INTRODUCTION
SUPERVISED classification of remote sensing images has received great attention from the remote sensing community for several decades. For such a purpose, many simple and sophisticated techniques have been considered, such as the statistical classifier, the k-nearest neighbor classifier, the artificial neural-network (NN) classifier, and, more recently, the kernel-based classifier [1], [2]. Among the most popular kernel-based classifiers available in the literature, one can find support vector machine (SVM) classifiers [3]. They are based on the margin maximization principle, which aims at providing them with a good generalization capability. SVM classifiers have been used extensively and proved to be successful in dealing with remote sensing data [4]-[14]. Another potentially interesting kernel-based classification approach is the one based on Gaussian processes (GPs) [15]-[20]. In contrast to SVM classifiers, GP classifiers (GPCs) have not yet received sufficient attention from the remote sensing community, despite being theoretically attractive statistical models that permit a fully Bayesian treatment of the considered classification problem. Compared to SVM classifiers, they have the advantage of providing probabilistic outputs rather than discriminant function values. Moreover, they can use evidence for solving the model selection issue in a completely automatic way. The main idea of GPCs is to assume that the probability of belonging to a class label for an input sample is monotonically related to the value of some latent function at that sample. Such a monotonic relationship is defined according to a so-called squashing function. A GP prior characterized by a covariance matrix embedding a set of hyperparameters is placed on this latent function. Inference is made by integrating over the latent function. Since such an integral is analytically intractable, solutions based on Monte Carlo sampling or analytical approximation methods are adopted. The two key analytical approximation algorithms are the Laplace and expectation-propagation (EP) algorithms.
Both approximate the non-Gaussian joint posterior over the latent variables with a Gaussian one. In the Laplace approximation, the Gaussian model is defined with its mean at the maximum point of the posterior and its covariance matrix given by the inverse of the negative Hessian matrix of the log posterior at that point. The identification of this maximum is carried out according to the iterative Newton method. The EP algorithm is a more sophisticated approximation technique that tries to locally minimize the Kullback-Leibler divergence measure between the true posterior and the approximated one. This is done sequentially through the so-called cavity distribution. In the prediction phase, the approximate predictive mean and variance for the (approximated) Gaussian posterior over the latent variable of the considered sample are computed first. Then, the class posterior probability for the sample target is derived either analytically or by approximation, depending on the adopted squashing function. The multiclass implementation of GPCs is obtained through an intrinsic multiclass formulation, which can be complex [21], or simply by decomposition into binary classification problems. In this paper, we propose a thorough investigation of the GPC effectiveness for classifying multisource and hyperspectral remote sensing images. To this end, we designed several experiments aiming also at testing the sensitivity of GPCs to the number of training samples and to the curse of dimensionality. In general, the obtained classification results show clearly that the GPC can compete seriously (sometimes providing better accuracies) with the state-of-the-art SVM classifier.
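The Laplace approximation described above can be illustrated on a deliberately tiny case: a single scalar latent value with a zero-mean Gaussian prior and one observed label under a logistic squashing function. This is a toy sketch under those assumptions, not the multivariate GP classifier of the paper; the function names and the prior variance used in the example are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def laplace_1d(y, prior_var, iters=25):
    """Laplace approximation for one latent value f with prior N(0, prior_var).

    y: observed label, +1 or -1, with likelihood sigmoid(y * f).
    Returns (mode, variance) of the Gaussian approximating the posterior.
    """
    f = 0.0
    for _ in range(iters):
        p = sigmoid(y * f)
        grad = y * (1.0 - p) - f / prior_var      # d/df of the log posterior
        hess = -p * (1.0 - p) - 1.0 / prior_var   # second derivative, always negative
        f -= grad / hess                          # Newton step toward the mode
    p = sigmoid(y * f)
    hess = -p * (1.0 - p) - 1.0 / prior_var
    return f, -1.0 / hess                         # variance = inverse negative Hessian


def log_posterior_gradient(f, y, prior_var):
    """Gradient of the unnormalized log posterior, useful to check the mode."""
    return y * (1.0 - sigmoid(y * f)) - f / prior_var
```

For a positive label the mode moves above zero, and the approximate variance is smaller than the prior variance because the observation adds curvature to the log posterior. In the full GP setting the same Newton iteration runs over the vector of latent values, with the covariance matrix in place of `prior_var`.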
