Abstract: Cloud detection is one of the important tasks for remote sensing image (RSI) preprocessing. In this paper, we utilize the thumbnail (i.e., preview image) of RSI, which contains the information of original multispectral or panchromatic imagery, to extract cloud mask efficiently. Compared with detecting the cloud mask from the original RSI, it is more challenging to detect the cloud mask using thumbnails due to the loss of resolution and spectrum information. To tackle this problem, we propose a cloud detection neural network (CDnet) with an encoder–decoder structure, a feature pyramid module (FPM), and a boundary refinement (BR) block. The FPM extracts the multiscale contextual information without the loss of resolution and coverage; the BR block refines object boundaries; and the encoder–decoder structure gradually recovers segmentation results with the same size as input image. Experimental results on the ZY-3 satellite thumbnails cloud cover validation data set and two other validation data sets (GF-1 WFV Cloud and Cloud Shadow Cover Validation Data and Landsat-8 Cloud Cover Assessment Validation Data) demonstrate that the proposed method achieves high detection accuracy and outperforms several state-of-the-art methods.

Index Terms: Cloud detection, cloud detection neural network (CDnet), deep convolutional neural network (DCNN), satellite imagery, thumbnails.

I. INTRODUCTION

WITH the rapid development of remote sensing technology, high-resolution satellite imagery is readily available and has been widely used in agriculture engineering [1], environmental protection [2], land or mineral resource exploration [3], geographical survey [4], and military reconnaissance [5]. Since nearly 66% of the earth’s surface is covered by clouds [6], most remote sensing imageries would inevitably be contaminated by clouds. Cloud coverage degrades the quality of satellite imagery by preventing the satellite sensor from obtaining clear views of the earth’s surface, thus affecting imagery postprocessing, such as remote sensing image (RSI) classification and segmentation [7], image matching [8], and 3-D surface generation [9]. Hence, it is important to quickly and accurately detect cloud mask to assess the quality of remote sensing imagery.

Most previous works utilized spectral information (far infrared and thermal infrared information) in hyperspectral/multispectral remote sensing imagery to identify and segment clouds. Typical methods include International Satellite Cloud Climatology Project (ISCCP) [10], Clouds from the Advanced Very High Resolution Radiometer (CLAVR) [11], and AVHRR Processing scheme Over clouds, Land and Ocean (APOLLO) [12]. However, some high-resolution remote sensing imageries, e.g., China’s ZY-3 multispectral imagery [7], have only four bands (blue, green, red, and near infrared), which makes it challenging to detect cloud reliably. To improve cloud detection performance, it is necessary to incorporate discriminative features, such as texture, geometry, and ground objects’ size [13]. However, it is difficult to design discriminative features because the high complexity of targets and the large coverage of high-resolution remote sensing imagery usually lead to tremendous computational complexity.

The most straightforward way to reduce computational complexity is working on subsampled images or even thumbnail images (i.e., preview image) [14]. Thumbnail images contain necessary ground objects’ information of original multispectral/hyperspectral or panchromatic images for preview and have smaller sizes. But a thumbnail image generally contains only an RGB image with three bands or even a gray image with only one band. Hence, cloud detection from thumbnail images is more difficult than that from high-resolution hyperspectral or multispectral RSI [15], [16], especially for images with cloud-snow coexistence as shown in Fig. 1. Research work on this line [15], [17]–[20] has achieved high accuracy for cloud detection, but most of them fail to distinguish between cloud and snow. Therefore, it is desirable to develop more powerful feature description and classification techniques in order to obtain accurate cloud detection from thumbnails.

In recent years, the deep neural network has achieved tremendous success in image analysis and recognition, significantly outperforming traditional machine learning across

Manuscript received September 18, 2018; revised January 23, 2019 and March 1, 2019; accepted March 9, 2019. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61771339, Grant 61571322, Grant 61672378, and in part by the Tianjin Science and Technology Program under Grant 17ZXRGGX00160 and Grant 18JCYBJC19200. (Corresponding author: Kun Li.)
J. Yang, J. Guo, and H. Yue are with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: yjy@tju.edu.cn; g_j_h@tju.edu.cn; huanjing.yue@tju.edu.cn).
Z. Liu is with the School of Geology Engineering and Geomatics, Chang’an University, Xi’an 710054, China (e-mail: liuzhiheng@chd.edu.cn).
H. Hu is with the School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China (e-mail: haofeng_hu@tju.edu.cn).
K. Li is with the Tianjin Key Laboratory of Cognitive Computing and Application, School of Computer Science and Technology, Tianjin University, Tianjin 300072, China (e-mail: lik@tju.edu.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2019.2904868
0196-2892 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
YANG et al.: CDnet: CNN-BASED CLOUD DETECTION FOR REMOTE SENSING IMAGERY 3
2) Multiple Image-Based Methods: To tackle the limitation of simple thresholding techniques, using multiple images with temporal information is much more informative. Such methods show that temporal information is able to improve cloud detection results. Zhu and Woodcock [30] estimated a time series model for each pixel based on the robust iteratively reweighted least squares (RIRLS) method. Qian et al. [31] proposed to detect cloud on optical RSI time series using the mean shift algorithm. Gómez-Chova et al. [32] proposed the regularized least squares and kernel regression methods for cloud detection. Hagolle et al. [33] and Goodwin et al. [34] detected cloud using multiple temporal images, assuming that underlying landscape has little change within a short time period, and the pixel values of cloud areas are obviously different. These methods show significant improvement in cloud estimation and achieve high accuracy for the cloud and cloud shadow detection. But image data are not always available for multiperiod observations [35]. Moreover, multitemporal analysis is very sensitive to data quality [28].

3) Learning-Based Methods: To further improve the performance, more and more machine learning methods, including support vector machine (SVM) [36], neural network [37], random forest (RF) [38], maximum likelihood (MaxLike) [39], Markov random field [40], K-nearest neighbor (K-NN) [20], and decision tree theory [41], are used in RSI processing. SVM is the most popular one due to its prominent classification performance. It is applied to many cloud detection cases [36], [42]. As the input of a classifier, hand-crafted features, such as texture/color information [18] and morphological features [29], have difficulty accurately capturing the cloud characteristics under complex environment [35]. This leads us to seek more effective and accurate feature representation methods.

B. Deep Feature-Based Methods

The rapid development of a deep convolutional neural network (DCNN) provides us new available approaches for image processing [21]. DCNNs are able to extract high-level abstract features from input images and significantly improve the accuracy of image classification or recognition. Image semantic segmentation techniques based on deep learning have also achieved amazing performance. On the PASCAL VOC2012 challenge [43], the highest mean intersection over union (MIOU), up to 89%, is achieved by DeeplabV3+ [44]. Similarly, DCNNs have also been introduced to RSI processing, such as ground objects’ classification [45], feature extraction [46], scene classification [47], object detection [17], and super-resolution reconstruction [48]. For cloud detection, Xie et al. [17], Goff et al. [49], and Chen et al. [50] used the fully connected DCNN to detect cloud regions from superpixels obtained by the simple linear iterative cluster (SLIC) method. However, the performance is limited by the presegmented superpixels.

Recent efforts tried to transfer classification and objects’ recognition networks, such as AlexNet [51], GoogLeNet [52], VGG [53], and ResNet [54], into fully convolutional ones by replacing the fully connected layers with convolutional ones. These algorithms achieve significant improvement in terms of segmentation accuracy over traditional methods. The strategies used to further improve cloud detection accuracy can be roughly divided into the following three aspects.

1) Exploit Context Information: Contextual correlation is important for complex scene understanding [25]. To enlarge the receptive field of neural networks, Yu and Koltun [55] used dilated convolution to systematically aggregate multiscale contextual information without losing resolution. Zhan et al. [56] enhanced the network VGG-16 [53] with dilated convolution for cloud detection from satellite images. For its promising performance, dilated convolution has been used in many networks for image semantic segmentation, such as PSPnet [25], RefineNet [57], and Deeplab and its variants [23], [24], [44].

2) Preserve Score Map Resolution: Preserving score map resolution is another main research direction in semantic segmentation. To obtain a score map with the same size as the input image, Kalia et al. [58] proposed the Cloud-CNN network for cloud/shadow detection based on the encoder–decoder architecture [59] evaluated on Himawari-8 AHI and GOES-16 ABI multispectral data. Ozkan et al. [60] proposed a deep pyramid network (DPN) with encoder and generator filter blocks (decoder architecture) for cloud detection from RGB color RSIs. Recently, Zhang et al. [61] proposed a lightweight neural network based on the U-Net model [62] for on-board pixelwise cloud detection on small satellites.

3) Refine Object Boundaries: High-quality segmentation results should be coherent with object boundaries. To refine cloud boundaries, Yue et al. [63] utilized the discrete conditional random field (CRF) [64] to refine segmentation boundaries by exploiting contextual information in cloud segmentation. Zhan et al. [56] exploited low-level visual features to generate sharp and detailed cloud boundaries. To further improve the localization capability near cloud boundaries, Yuan et al. [65] proposed an edge-aware segmentation network with an encoder–decoder structure for cloud detection.

Most existing deep neural networks for cloud detection addressed only one of these aspects. In this paper, we propose the CDnet for cloud detection by simultaneously exploiting multiscale and global contextual information, preserving score map resolution, and refining object boundaries. To this end, we introduce the FPM and BR modules. The FPM combines multiple parallel dilated convolution layers and the GAP block. Dilated convolution with different sampling rates enlarges the field of view of filters and effectively incorporates multiscale context [23]. The GAP block is able to extract image-level features, which helps capture long-range information beyond the capability of dilated convolutional layers. BR is a boundary refinement residual block, which helps refine object boundaries. These strategies make the segmentation results more accurate and reliable.

III. PROPOSED METHOD

In this section, we describe the overall framework of the proposed CDnet, as shown in Fig. 3. The key components, i.e., modified ResNet-50, FPMs, BR blocks, classification layers, and loss function involved in the CDnet, are presented in detail.

Fig. 3. Framework of the proposed CDnet. Red and green arrows represent the 2× downsampling and 2× upsampling operators, respectively. The red rectangular box with “×2” indicates that the upsampling and BR operations are implemented twice. The operator represents the elementwise summation operator. The first three convolution layers in the modified ResNet-50 use convolutions with stride 2, 1, and 1, respectively, and the filter size is 3 × 3.

A. Overall Framework of CDnet

The framework of the proposed CDnet is shown in Fig. 3. We first use the pretrained and modified ResNet-50 network (detailed network structure is shown in Fig. 4) to extract features. Then, the proposed FPMs extract multiscale information at different stages of the modified network. Each FPM is followed by a 3 × 3 convolution (with 512 filters, batch normalization, and ReLU) to reduce the number of channels. Finally, a 1 × 1 convolutional layer is used to generate semantic score maps for each class. Three low-resolution score maps are generated in the middle stages of the CDnet: scoremaps1 and scoremaps2 are 1/8 × 1/8 size of the input image, while scoremaps3 is 1/4 × 1/4 size of the input image. We fuse scoremaps1 with scoremaps2 by an elementwise summation operation and refine the fusion results by a BR operation. The refined fusion results are upsampled to the same resolution as scoremaps3.¹ Subsequently, we further fuse the upsampled results with scoremaps3 by an elementwise summation operation followed by a BR operation to obtain a new scoremap, which is 1/4 × 1/4 size of the input image. Finally, the new scoremap is upsampled to the same size as the input image by two upsampling and BR operations. These key modules of the CDnet are described in Sections III-B–III-D.

B. Modified ResNet-50 Feature Network

Residual networks [54] with skip connection in each block are easier to train and optimize, particularly for very deep networks, and achieve an impressive performance in various vision tasks. For its promising performance in extracting discriminative features, we adopt ResNet-50 as the main structure of the proposed network. As shown in Fig. 3, instead of using a 7 × 7 receptive field with a stride of 2 in the first convolutional layer of original ResNet-50, we use three small filters with 3 × 3 receptive fields (Conv1_x block), i.e., conv1_1 with stride 2, conv1_2, and conv1_3 with stride 1. Such a modification greatly reduces the number of parameters: counted per input-output channel pair, the three 3 × 3 convolutional layers have 3 × (3 × 3) = 27 parameters and the 7 × 7 convolutional layer has 7 × 7 = 49 parameters. In addition, the incorporated three nonlinear 3 × 3 convolutional layers also make the modified ResNet-50 deeper and more discriminative [53] than ResNet-50 (see results in Table I). Besides, all the strides in the convolutional layers of modified ResNet-50 are set to 1 except that the stride of the first convolution layer of the whole network and that of the first convolution layer in Res 3 block are set to 2, as shown in Fig. 3. In Fig. 3, the sizes of output feature maps at the intermediate layers of modified ResNet-50, marked by output size, are given as normalized ratios against the original input size.

The repeated pooling and subsampling operators of the ResNet-50 usually led to the reduction of spatial resolution. The loss of spatial information may be harmful to produce

¹ The upsampling layer in this paper uses bilinear interpolation.
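The size bookkeeping of Section III-A and the parameter comparison of Section III-B can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, each score map is represented solely by its scale relative to the input image, and the parameter count follows the text's convention of counting kernel weights per input-output channel pair (real layer totals also depend on channel widths, which are not given here).

```python
from fractions import Fraction


def trace_score_map_scales():
    """Follow the relative score-map size through the fusion steps of Sec. III-A.

    Hypothetical helper: only sizes are tracked, no pixel values.
    """
    s1 = Fraction(1, 8)  # scoremaps1, 1/8 of the input size
    s2 = Fraction(1, 8)  # scoremaps2, 1/8 of the input size
    s3 = Fraction(1, 4)  # scoremaps3, 1/4 of the input size

    steps = []
    assert s1 == s2  # elementwise summation needs equal sizes
    fused = s1       # summation and BR leave the size unchanged
    steps.append(("fuse scoremaps1 + scoremaps2, BR", fused))

    fused *= 2       # 2x (bilinear) upsampling
    steps.append(("upsample x2", fused))

    assert fused == s3  # now compatible with scoremaps3
    steps.append(("fuse with scoremaps3, BR", fused))

    for _ in range(2):  # two further upsampling + BR operations
        fused *= 2
        steps.append(("upsample x2, BR", fused))
    return steps


def weights_per_channel_pair(kernel_sizes):
    """Sum of k*k kernel weights over a stack of square convolutions."""
    return sum(k * k for k in kernel_sizes)


if __name__ == "__main__":
    for name, scale in trace_score_map_scales():
        print(f"{name:35s} -> {scale} of input size")
    print("three 3x3 layers:", weights_per_channel_pair([3, 3, 3]))
    print("one 7x7 layer:   ", weights_per_channel_pair([7]))
```

The trace ends at scale 1, i.e., the final score map has the same size as the input image, which matches the description of the last two upsampling and BR operations.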
Fig. 7. Example of (a) cloud image, (b) coarse score map T1, and (c) refined score map T2.
Fig. 12. Detailed structures of the three network structure variants. The red and green arrows represent the downsample and upsample operators, respectively. The red rectangular box indicates that upsampling and BR are implemented twice in CDnet-C and three times in CDnet-A and CDnet-B. The operator represents the elementwise sum operators. The first three convolution layers in the modified ResNet-50 use convolutions with stride 2, 1, and 1, respectively, and the filter size is 3 × 3.
C. Ablation Experiments

1) Network Structures: To demonstrate the effectiveness of the proposed CDnet, we first evaluate MRN+FPM and MRN+FPM+BR to investigate the performance of two key components, i.e., FPM and BR block. Specifically, in MRN+FPM, an FPM module is appended to the modified ResNet-50 (MRN), where the scoremap is directly upsampled to the same size as the input image without BR. In MRN+FPM+BR, by contrast, the scoremap is enhanced by a BR operation before upsampling. Second, we design CDnet-FPM, CDnet-BR, and CDnet-FPM-BR to conduct ablation experiments for FPM and BR. In CDnet-FPM, the FPM module is removed from the CDnet. In CDnet-BR, the BR module is removed from the CDnet. In CDnet-FPM-BR, both FPM and BR modules are removed from the CDnet. Third, in order to investigate the effectiveness of the proposed CDnet, we design three variant network structures, i.e., CDnet-A, CDnet-B, and CDnet-C, as shown in Fig. 12.

In this paper, the proposed CDnet consists of MRN and multiple FPM+BR modules, which utilize the features extracted from different scales and levels. In contrast, the networks of MRN+FPM+BR and CDnet-A contain only one FPM+BR module, which can only take advantage of the features at one scale and level. MRN+FPM+BR directly upsamples the scoremap refined by BR to the resolution of the original input image, while CDnet-A gradually upsamples the scoremap
Fig. 13. Comparison of cloud extraction results of CNN-based methods in thumbnails of ZY-3 satellite imagery. (a)–(d) Four RGB thumbnails. (e)–(h) Four gray thumbnails. Among them, (a), (e), and (g) are cloud-only cases, (b), (c), and (f) are cloud-snow coexisting cases, and (d) and (h) are snow-only cases. The sizes of RGB thumbnails and gray thumbnails are 1k × 1k and 3k × 3k, respectively.
2) Quantitative Results: Table III presents quantitative results in terms of OA, MIOU, Kappa, PA, and UA. We note that a low cloud coverage percentage (less than 5%) may cause an apparent reduction in the cloud PA and UA [69]. Therefore, images with cloud coverage less than 5% are not included in the evaluation. Results in Table III indicate
Fig. 14. Comparison of cloud extraction results of classic learning-based methods in thumbnails of ZY-3 satellite imagery. Thumbnails are the same as those in Fig. 13.
TABLE III
CLOUD EXTRACTION ACCURACY (%)

that the proposed CDnet consistently outperforms eight other CNN-based comparison methods in terms of OA, MIoU, Kappa, PA, and UA. Moreover, these CNN-based methods are significantly better than the two traditional methods, since hand-crafted features are not as discriminative as those extracted by CNN-based methods. Nevertheless, the PA of the two traditional methods is higher than that of CNN-based methods, since they tend to classify all white pixels as cloud regions, including snow coverage areas. As a result, their UA values are significantly lower than those of the CNN-based methods.

E. Experiments on GF-1 and Landsat-8 Cloud Cover Assessment Validation Data

1) GF-1 Satellite Image: GF-1 WFV Cloud and Cloud Shadow Cover Validation Data released by the SENDIMAGE Lab includes 108 GF-1 wide field-of-view (WFV) level-2A scenes and their corresponding cloud and cloud shadow reference masks.⁴ In this experiment, we use 40 (train), 40 (val), and 28 (test) scenes for training, validation, and testing, respectively. Data are composed of channels 4, 3, and 2. In the training stage, we divide the large GF-1 satellite imagery into subimages with the size of 321 × 321. In the testing stage, we divide the original image into subimages of size 513 × 513, since our GPU memory is not enough to process large size images. The final result for the whole image is generated by stitching the results of subimages together. The settings for the training and testing are the same as those for ZY-3 satellite thumbnails. We compare our proposed CDnet with the eight CNN-based methods. In addition, the automatic multifeature combined (MFC) cloud detection method [16], which is the baseline method for GF1 data set, is also compared. In this paper, we do not use the low-accuracy traditional methods, i.e., Maxlike [7] and SVM [36], as comparison methods in the additional tests on GF-1 Cloud Cover Assessment Validation Data. The cloud extraction results of four typical GF-1 Satellite imageries are shown in Fig. 15. Table IV shows quantitative results on the testing data set. Both results in Fig. 15 and Table IV suggest that the proposed CDnet achieves the best segmentation accuracy.

2) Landsat-8 Cloud Cover Assessment Validation Data: Original Landsat-8 Cloud Cover Assessment Validation Data contains 96 operational land imager (OLI) thermal infrared sensor (TIRS) terrain-corrected (Level-1T) scenes.⁵ In this experiment, we select 22 Landsat-8 scenes for training and 22 scenes for validation, whose cloud percentages lie in 35%∼100%. The data for training and validation include barren, grass/crops, forest, shrubland, urban, snow/ice, wetlands, water area, and so on. In addition, 11-scene images with the cloud percentage of 35%∼100% and another 20-scene images with the cloud percentage of 5%∼35% are selected for testing.⁶

⁴ http://sendimage.whu.edu.cn/en/mfc-validation-data/
⁵ https://landsat.usgs.gov/landsat-8-cloud-cover-assessment-validation-data
⁶ In order to construct a balanced distributed data set, 21-scene images with the cloud percentage of 0%∼5% in data set are excluded for training, validation, and testing [69].
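The test-time procedure described for the GF-1 data (divide a large scene into 513 × 513 subimages, predict each one, and stitch the results) can be sketched as follows. This is a minimal illustration, not the authors' code. The source does not state whether subimages overlap or how image borders are handled, so the sketch assumes non-overlapping tiles with smaller tiles at the right and bottom edges. The `predict_tile` function is a hypothetical brightness-threshold placeholder standing in for the trained network, and images are plain nested lists of single-band values.

```python
def tile_boxes(height, width, tile=513):
    """Return (top, left, bottom, right) boxes that cover the image without overlap."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((top, left, min(top + tile, height), min(left + tile, width)))
    return boxes


def predict_tile(tile_rows):
    """Placeholder predictor (assumption): marks bright pixels as cloud.

    In the paper this step would be a forward pass of the network.
    """
    return [[1 if value > 200 else 0 for value in row] for row in tile_rows]


def stitch(tile_masks, boxes, height, width):
    """Paste per-tile masks back into one full-size mask."""
    full = [[0] * width for _ in range(height)]
    for mask, (top, left, bottom, right) in zip(tile_masks, boxes):
        for r in range(top, bottom):
            full[r][left:right] = mask[r - top]
    return full


def detect_large_image(image, tile=513):
    """Tile the image, predict every tile, and stitch the tile masks together."""
    height, width = len(image), len(image[0])
    boxes = tile_boxes(height, width, tile)
    tile_masks = [
        predict_tile([row[left:right] for row in image[top:bottom]])
        for (top, left, bottom, right) in boxes
    ]
    return stitch(tile_masks, boxes, height, width)
```

With a purely per-pixel placeholder the stitched mask equals the result of processing the whole image at once; for a real network, predictions near tile borders may differ because each tile sees less context, which is one reason overlapping tiles are sometimes preferred.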
Fig. 15. Comparison of cloud extraction results of different methods in three GF-1 Satellite imageries. From top to bottom, they are thick cloud, thin cloud,
cloud, and snow–cloud coexistence images, respectively.
To verify the performance of CNN-based methods with partial spectral information, channels 3–5 of Landsat-8 satellite image are used in the training and testing. Data preprocessing and experimental setting for training and testing are the same as GF-1 satellite image validation data. In addition, Fmask algorithm (Fmask) [70] is the baseline method for Landsat-8
Fig. 16. Comparison of cloud extraction results of different methods in three Landsat-8 Satellite imageries. From top to bottom, they are typhoon eye cloud, inland desert cloud, snow/ice and wetlands area cloud, and rainforest cloud images, respectively.
Satellite image data set and is also included in the comparison. The cloud extraction results of four typical Landsat-8 satellite images are shown in Fig. 16. Table V shows the quantitative results of the testing data set. Both results in Fig. 16 and Table V suggest that the proposed CDnet network achieves promising cloud detection performance for Landsat-8 satellite image, and it outperforms Fmask and the other eight CNN-based methods.
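The accuracy measures reported in Tables III–V (OA, MIoU, Kappa, PA, and UA) are not defined in this part of the text. The sketch below assumes their standard remote sensing definitions for a binary cloud/clear problem: overall accuracy, mean intersection over union across the two classes, Cohen's kappa, producer's accuracy (recall of the cloud class), and user's accuracy (precision of the cloud class). The function name and dictionary keys are hypothetical, and degenerate inputs (for example, no cloud pixels at all) are not handled.

```python
def cloud_metrics(tp, fp, fn, tn):
    """Compute OA, PA, UA, MIoU and Kappa from binary confusion-matrix counts.

    tp: cloud pixels predicted as cloud     fp: clear pixels predicted as cloud
    fn: cloud pixels predicted as clear     tn: clear pixels predicted as clear
    """
    n = tp + fp + fn + tn
    oa = (tp + tn) / n                     # overall accuracy
    pa = tp / (tp + fn)                    # producer's accuracy (cloud recall)
    ua = tp / (tp + fp)                    # user's accuracy (cloud precision)
    iou_cloud = tp / (tp + fp + fn)
    iou_clear = tn / (tn + fp + fn)
    miou = (iou_cloud + iou_clear) / 2     # mean IoU over the two classes
    # Agreement expected by chance, from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "PA": pa, "UA": ua, "MIoU": miou, "Kappa": kappa}


if __name__ == "__main__":
    # A method that labels many snow pixels as cloud: high PA, low UA.
    print(cloud_metrics(tp=45, fp=30, fn=5, tn=20))
```

Under these definitions, a classifier that marks every white pixel as cloud gains PA (few missed clouds) but loses UA (many false alarms on snow), which is the pattern the text reports for the two traditional methods.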
TABLE IV
CLOUD EXTRACTION ACCURACY (%) OF GF-1 SATELLITE IMAGERY

TABLE V
CLOUD EXTRACTION ACCURACY (%) OF LANDSAT-8 SATELLITE IMAGERY

high consistency, and 81%∼100% almost perfect [71]. Most notably, the CDnet has achieved the best performance, outperforming other CNN-based methods, including the recent Deeplab series [23], [24], [44] and the PSPnet [25]. These results show that the proposed CDnet has a powerful semantic segmentation capability, which can be well used for remote sensing imagery clouds’ detection.
REFERENCES

[1] T. S. Magney, L. A. Vierling, J. U. H. Eitel, D. R. Huggins, and S. R. Garrity, “Response of high frequency photochemical reflectance index (PRI) measurements to environmental conditions in wheat,” Remote Sens. Environ., vol. 173, pp. 84–97, Feb. 2016.
[2] Z. Li et al., “Remote sensing of atmospheric particulate mass of dry PM2.5 near the ground: Method validation using ground-based measurements,” Remote Sens. Environ., vol. 173, pp. 59–68, Feb. 2016.
[3] A. A. Fenta et al., “The dynamics of urban expansion and land use/land cover changes using remote sensing and spatial metrics: The case of Mekelle city of northern Ethiopia,” Int. J. Remote Sens., vol. 38, no. 14, pp. 4107–4129, Jul. 2017.
[4] M. G. Benson and J. L. Faundeen, “The U.S. geological survey, remote sensing, and geoscience data: Using standards to serve us all,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), vol. 3, Jul. 2000, pp. 1202–1204.
[5] Z. G. Wang, Q. Kang, Y. J. Xun, Z. Q. Shen, and C. B. Cui, “Military reconnaissance application of high-resolution optical satellite remote sensing,” in Proc. Int. Symp. Optoelectronic Technol. Appl., vol. 9299, Nov. 2014, Art. no. 9299195.
[6] Y. Zhang, W. B. Rossow, A. A. Lacis, V. Oinas, and M. I. Mishchenko, “Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data,” J. Geophys. Res. Atmos., vol. 109, no. D19, pp. 1–27, Oct. 2004.
[7] F. Yang, J. Guo, H. Tan, and J. Wang, “Automated extraction of urban water bodies from ZY-3 multi-spectral imagery,” Water, vol. 9, no. 2, p. 144, Feb. 2017.
[8] J. Guo, F. Yang, H. Tan, J. Wang, and Z. Liu, “Image matching using structural similarity and geometric constraint approaches on remote sensing images,” J. Appl. Remote Sens., vol. 10, no. 4, Oct. 2016, Art. no. 045007.
[9] Y. Zhang, F. Du, and C. Zhu, “DEM extraction and accuracy assessment based on ZY-3 stereo images,” in Proc. Int. Conf. Comput. Sci. Netw. Technol., Dec. 2012, pp. 1439–1442.
[10] W. B. Rossow and L. C. Garder, “Cloud detection using satellite measurements of infrared and visible radiances for ISCCP,” J. Climate, vol. 6, no. 12, pp. 2341–2369, Dec. 1993.
[11] L. L. Stowe et al., “Global distribution of cloud cover derived from NOAA/AVHRR operational satellite data,” Adv. Space Res., vol. 11, no. 3, pp. 51–54, 1991.
[12] G. Gesell, “An algorithm for snow and ice detection using AVHRR data an extension to the APOLLO software package,” Int. J. Remote Sens., vol. 10, nos. 4–5, pp. 897–905, Apr. 2007.
[13] X. Huang, H. Chen, and J. Gong, “Angular difference feature extraction for urban scene classification using ZY-3 multi-angle high-resolution satellite imagery,” ISPRS J. Photogram. Remote Sens., vol. 135, pp. 127–141, Jan. 2018.
[14] T. Suga, K. Muto, K. Yagi, T. Onodera, Y. Nakada, and K. Takano, “Thumbnail image size for image transmission system using satellite communication in disaster,” Int. Inf. Inst., vol. 18, no. 3, pp. 1019–1027, Mar. 2015.
[15] Q. Zhang and C. Xiao, “Cloud detection of RGB color aerial photographs by progressive refinement scheme,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 7264–7275, Nov. 2014.
[16] Z. Li, H. Shen, H. Li, G. Xia, P. Gamba, and L. Zhang, “Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery,” Remote Sens. Environ., vol. 191, pp. 342–358, Mar. 2017.
[17] F. Xie, M. Shi, Z. Shi, J. Yin, and D. Zhao, “Multilevel cloud detection in remote sensing images based on deep learning,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 8, pp. 3631–3640, Aug. 2017.
[18] E. Başeski and A. Cenaras, “Texture and color based cloud detection,” in Proc. 7th Int. Conf. Recent Adv. Space Technol. (RAST), Jun. 2015, pp. 311–315.
[19] Z. Chen, T. Deng, H. Zhou, and S. Luo, “Cloud detection based on HSI color space and SWT from high resolution color remote sensing imagery,” Proc. SPIE, vol. 8919, Oct. 2013, Art. no. 891907.
[20] M. Xia, W. Lu, J. Yang, Y. Ma, W. Yao, and Z. Zheng, “A hybrid method based on extreme learning machine and k-nearest neighbor for cloud classification of ground-based visible cloud image,” Neurocomputing,
[22] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, “Large kernel matters-improve semantic segmentation by global convolutional network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Apr. 2017, pp. 1743–1751.
[23] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2017.
[24] L. C. Chen, G. Papandreou, F. Schroff, and H. Adam. (2017). “Rethinking atrous convolution for semantic image segmentation.” [Online]. Available: https://arxiv.org/abs/1706.05587
[25] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Mar. 2017, pp. 6230–6239.
[26] Q. J. He, “A daytime cloud detection algorithm for FY-3A/VIRR data,” Int. J. Remote Sens., vol. 32, no. 21, pp. 6811–6822, Jul. 2011.
[27] J. Wei et al., “Dynamic threshold cloud detection algorithms for MODIS and Landsat 8 data,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 566–569.
[28] B. Zhong, W. Chen, S. Wu, L. Hu, X. Luo, and Q. Liu, “A cloud detection method based on relationship between objects of cloud and cloud-shadow for Chinese moderate to high resolution satellite imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 11, pp. 4898–4908, Nov. 2017.
[29] A. Fisher, “Cloud and cloud-shadow detection in SPOT5 HRG imagery with automated morphological feature extraction,” Remote Sens., vol. 6, no. 1, pp. 776–800, 2014.
[30] Z. Zhu and C. E. Woodcock, “Automated cloud, cloud shadow, and snow detection in multitemporal Landsat data: An algorithm designed specifically for monitoring land cover change,” Remote Sens. Environ., vol. 152, pp. 217–234, Sep. 2014.
[31] J. Qian, Y. Luo, Y. Wang, and D. Li, “Cloud detection of optical remote sensing image time series using mean shift algorithm,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 560–562.
[32] L. Gómez-Chova, J. Amorós-López, and G. Camps-Valls, “Cloud masking and removal in remote sensing image time series,” J. Appl. Remote Sens., vol. 11, no. 1, Jan. 2017, Art. no. 015005.
[33] O. Hagolle, M. Huc, D. V. Pascual, and G. Dedieu, “A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENμS, LANDSAT and SENTINEL-2 images,” Remote Sens. Environ., vol. 114, no. 8, pp. 1747–1755, 2010.
[34] N. R. Goodwin, L. J. Collett, R. J. Denham, N. Flood, and D. Tindall, “Cloud and cloud shadow screening across Queensland, Australia: An automated method for Landsat TM/ETM+ time series,” Remote Sens. Environ., vol. 134, pp. 50–65, Jul. 2013.
[35] Z. Shao, J. Deng, L. Wang, Y. Fan, N. Sumari, and Q. Cheng, “Fuzzy autoencode based cloud detection for remote sensing imagery,” Remote Sens., vol. 9, no. 4, p. 311, Mar. 2017.
[36] C. Latry, C. Panem, and P. Dejean, “Cloud detection with SVM technique,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jan. 2008, pp. 448–451.
[37] M. J. Hughes and D. J. Hayes, “Automated detection of cloud and cloud shadow in single-date Landsat imagery using neural networks and spatial post-processing,” Remote Sens., vol. 6, no. 6, pp. 4907–4926, May 2014.
[38] J. Deng, H. Wang, and J. Ma, “An automatic cloud detection algorithm for Landsat remote sensing image,” in Proc. Int. Workshop Earth Observ. Remote Sens. Appl. (EORSA), Jul. 2016, pp. 395–399.
[39] L. Xu, A. Wong, and D. A. Clausi, “A novel Bayesian spatial-temporal random field model applied to cloud detection from remotely sensed imagery,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 4913–4924, Sep. 2017.
[40] Q. Li, W. Lu, J. Yang, and J. Z. Wang, “Thin cloud detection of all-sky images using Markov random fields,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 3, pp. 417–421, May 2012.
[41] B. B. Barnes and C. Hu, “A hybrid cloud detection algorithm to improve MODIS sea surface temperature data quality and coverage over the eastern Gulf of Mexico,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6, pp. 3273–3285, Jun. 2013.
[42] H. Ishida, Y. Oishi, K. Morite, K. Moriwaki, and T. Y. Nakajima, “Development of a support vector machine based cloud detection method for MODIS with the adjustability to various conditions,” Remote Sens. Environ., vol. 205, pp. 309–407, Feb. 2018.
[43] M. Everingham, L. van Gool, C. K. I. Williams, J. Winn, and
vol. 160, pp. 238–249, Jul. 2015. A. Zisserman. (2012). The PASCAL Visual Object Classes Chal-
[21] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, lenge 2012 (VOC2012) Results. [Online]. Available: http://www.pascal-
no. 7553, p. 436, 2015. network.org/challenges/VOC/voc2012/workshop/index.html
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[44] L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. (2018). “Encoder-decoder with atrous separable convolution for semantic image segmentation.” [Online]. Available: https://arxiv.org/abs/1802.02611
[45] Y. Tao, M. Xu, F. Zhang, B. Du, and L. Zhang, “Unsupervised-restricted deconvolutional neural network for very high resolution remote-sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 12, pp. 6805–6823, Dec. 2017.
[46] A. Romero, C. Gatta, and G. Camps-Valls, “Unsupervised deep feature extraction for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1349–1362, Mar. 2016.
[47] E. Li, J. Xia, P. Du, C. Lin, and A. Samat, “Integrating multilayer features of convolutional neural networks for remote sensing scene classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 10, pp. 5653–5665, Oct. 2017.
[48] Y. Luo, L. Zhou, S. Wang, and Z. Wang, “Video satellite imagery super resolution via convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2398–2402, Dec. 2017.
[49] M. L. Goff, J. Y. Tourneret, H. Wendt, M. Ortner, and M. Spigai, “Deep learning for cloud detection,” in Proc. Int. Conf. Pattern Recognit. Syst. (ICPRS), vol. 10, Oct. 2017, pp. 1–6.
[50] Y. Chen, R. Fan, M. Bilal, X. Yang, J. Wang, and W. Li, “Multilevel cloud detection for high-resolution remote sensing imagery using multiple convolutional neural networks,” ISPRS Int. J. Geo-Inf., vol. 7, no. 5, p. 181, May 2018.
[51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst. (ICONIP), Aug. 2012, pp. 1097–1105.
[52] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), May 2015, pp. 1–9.
[53] K. Simonyan and A. Zisserman. (2016). “Very deep convolutional networks for large-scale image recognition.” [Online]. Available: https://arxiv.org/abs/1409.1556
[54] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[55] F. Yu and V. Koltun. (2015). “Multi-scale context aggregation by dilated convolutions.” [Online]. Available: https://arxiv.org/abs/1511.07122
[56] Y. Zhan, J. Wang, J. Shi, G. Cheng, L. Yao, and W. Sun, “Distinguishing cloud and snow in satellite images via deep convolutional network,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 10, pp. 1785–1789, Oct. 2017.
[57] G. Lin, A. Milan, C. Shen, and I. Reid, “RefineNet: Multi-path refinement networks for high-resolution semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5168–5177.
[58] S. G. Subodh Kalia, S. Li, and R. R. Nemani, “DeepSAT’s cloudCNN: A deep neural network for rapid cloud detection from geostationary satellites,” in Proc. Fall Meeting, Aug. 2017, pp. 1589–1596.
[59] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec. 2017.
[60] S. Ozkan, M. Efendioglu, and C. Demirpolat. (2018). “Cloud detection from RGB color remote sensing images with deep pyramid networks.” [Online]. Available: https://arxiv.org/abs/1801.08706
[61] G. X. Zhaoxiang Zhang, A. Iwasaki, and J. Song. (2018). Small Satellite Cloud Detection Based on Deep Learning and Image Compression. [Online]. Available: https://www.preprints.org/manuscript/201802.0103/v1
[62] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234–241.
[63] Z. Yue, X. Fengying, and Z. Jiang, “A cloud detection method for Landsat 8 images based on PCANet,” Remote Sens., vol. 10, no. 6, p. 877, Jun. 2018.
[64] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. (2014). “Semantic image segmentation with deep convolutional nets and fully connected CRFs.” [Online]. Available: https://arxiv.org/abs/1412.7062
[65] K. Yuan, G. Meng, D. Cheng, J. Bai, S. Xiang, and C. Pan, “Efficient cloud detection in remote sensing images using edge-aware segmentation network and easy-to-hard training strategy,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 61–65.
[66] M. Lin, Q. Chen, and S. Yan. (2013). “Network in network.” [Online]. Available: https://arxiv.org/abs/1312.4400
[67] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, Apr. 2017.
[68] Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures, vol. 7700. Berlin, Germany: Springer, 2012.
[69] S. Qiu, B. He, Z. Zhu, Z. Liao, and X. Quan, “Improving Fmask cloud and cloud shadow detection in mountainous area for Landsats 4–8 images,” Remote Sens. Environ., vol. 199, pp. 107–119, Jun. 2017.
[70] Z. Zhu, S. Wang, and C. E. Woodcock, “Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images,” Remote Sens. Environ., vol. 159, pp. 269–277, May 2015.
[71] S. Simon. StATS: What is A Kappa coefficient? (Cohen’s Kappa). [Online]. Available: http://www.pmean.com/definitions/kappa.htm

Jingyu Yang (M’10–SM’17) received the B.E. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2003, and the Ph.D. degree (Hons.) from Tsinghua University, Beijing, in 2009.
He has been a Faculty Member with Tianjin University, Tianjin, China, since 2009, where he is currently a Professor with the School of Electrical and Information Engineering. He was with Microsoft Research Asia (MSRA), Beijing, in 2011, within the MSRA’s Young Scholar Supporting Program, and with the Signal Processing Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2012 and from 2014 to 2015. His research interests include image/video processing, 3-D imaging, and computer vision.
Dr. Yang served as the Special Session Chair in the International Conference on Visual Communications and Image Processing 2016 and the Area Chair in the International Conference on Image Processing 2017. He was selected in the program for New Century Excellent Talents in University (NCET) from the Ministry of Education, China, in 2011, the Reserved Peiyang Scholar Program of Tianjin University in 2014, and the Tianjin Municipal Innovation Talent Promotion Program in 2015.

Jianhua Guo (S’18) received the B.E. degree in surveying and mapping engineering from Anhui Jianzhu University, Hefei, China, in 2014, and the M.A. degree in geodesy and surveying engineering from Liaoning Technical University, Fuxin, China, in 2017. He is currently pursuing the Ph.D. degree with the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. He was a jointly educated student with the Satellite Surveying and Mapping Application Center, Beijing, China, from 2015 to 2017.
His research interests include remote sensing image matching, classification, and segmentation. More details about his work can be found at https://www.researchgate.net/profile/Jianhua_Guo11.

Huanjing Yue (M’17) received the B.S. and Ph.D. degrees from Tianjin University, Tianjin, China, in 2010 and 2015, respectively.
She was an intern with Microsoft Research Asia, Beijing, China, from 2011 to 2015. She visited the Video Processing Laboratory, University of California at San Diego, La Jolla, CA, USA, from 2016 to 2017. She is currently an Assistant Professor with the School of Electrical and Information Engineering, Tianjin University. Her research interests include image processing and computer vision.
Dr. Yue received the Microsoft Research Asia Fellowship Honor in 2013.
Zhiheng Liu received the B.E. degree in surveying and mapping engineering from the Henan University of Engineering, Zhengzhou, China, in 2014, and the master’s degree from the School of Geology Engineering and Geomatics, Chang’an University, Xi’an, China, where he is currently pursuing the Ph.D. degree. He was a jointly educated student with the Satellite Surveying and Mapping Application Center, Beijing, China, from 2015 to 2016.
His research interests include remote sensing geology, image processing, and environmental protection.

Haofeng Hu received the B.S. and Ph.D. degrees from Nankai University, Tianjin, China, in 2002 and 2011, respectively.
He visited the Institute of Optics, French National Center for Scientific Research, Paris, France, from 2011 to 2013. He is currently an Assistant Professor with the School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin. His research interests include optical imaging and polarization imaging technologies.

Kun Li (M’12) received the B.E. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 2006, and the master’s and Ph.D. degrees from Tsinghua University, Beijing, in 2011.
She visited the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2012 and from 2014 to 2015. She is currently an Associate Professor with the School of Computer Science and Technology, Tianjin University, Tianjin, China. Her research interests include dynamic scene 3-D reconstruction and image/video processing.
Dr. Li was selected in the Peiyang Scholar Program of Tianjin University in 2016. She received the Platinum Best Paper Award in IEEE ICME 2017.