1. Introduction
As clouds continuously cover approximately 66% of the Earth's surface, they are ubiquitous in satellite remote sensing images [1,2,3]. The presence of clouds greatly hinders the accuracy and reliability of the inversion of surface and atmospheric parameters in quantitative remote sensing, such as the monitoring of land use and land cover changes and the retrieval of aerosol optical depth, surface temperature, and particulate matter [4,5,6]. Therefore, automatic and efficient cloud detection is of great significance for subsequent remote sensing image analysis.
Numerous cloud detection studies have been conducted; the methods are primarily divided into two categories: physical rule-based methods and machine learning methods. Clouds are characterised by a high reflectivity in the visible and near-infrared bands and a low temperature in the thermal infrared bands [7]. Physical rule-based methods identify cloud and clear-sky pixels based on the spectral differences or low-temperature properties of clouds and the surface in different wavebands. The function of mask (Fmask) algorithm, a representative cloud detection method, produces a cloud detection probability map for Landsat-8 images based on threshold values of both cloud reflection characteristics and brightness temperature [8,9]. The Moderate Resolution Imaging Spectroradiometer (MODIS) cloud mask algorithm also applies the threshold method to obtain MODIS cloud mask products [10,11]. The threshold method has been widely adopted in cloud detection; however, the selection of the bands and thresholds strongly depends on the analysis of spectral differences between clouds and typical surface features. Due to the complex surface environment and the diversity of cloud geometries, it is typically difficult to fully account for all influencing factors when determining the optimal threshold [12]. In addition, the threshold-based method is sensitive to changes in atmospheric conditions and scene attributes and lacks versatility when transferred to other sensors [13]. Clouds change dynamically, and images of the same location obtained at different times differ notably. Therefore, the differences between multi-temporal image pixels can be used to separate cloud and clear-sky pixels. Compared with single-temporal cloud detection, the multi-temporal method can utilise both spectral and time-series information to improve the cloud detection accuracy [14,15]. However, the major limitation of this method is that it requires clear-sky reference images and dense data, which increases the operating costs and limits its applicability in time-sensitive emergencies [16,17]. The physical rule-based and time-difference methods rely primarily on the cloud spectrum, temporal information, and a priori knowledge. Due to the high complexity of the surface environment and the uncertainty of certain parameters, the performance of the threshold and time-difference methods varies across regions and periods, thereby affecting the cloud detection accuracy.
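To make the threshold logic concrete, the sketch below applies a toy bright-and-cold rule in the spirit of Fmask-style tests. The band choice and the threshold values are hypothetical placeholders for illustration only, not the published Fmask thresholds.

```python
import numpy as np

# Illustrative sketch only: a two-test threshold rule in the spirit of
# physical rule-based methods. The thresholds below are made-up values,
# not the actual Fmask parameters.
def naive_cloud_test(visible_reflectance, brightness_temp_k,
                     refl_threshold=0.3, bt_threshold=285.0):
    """Flag a pixel as a cloud candidate if it is both bright in a
    visible band and cold in a thermal band."""
    bright = np.asarray(visible_reflectance) > refl_threshold
    cold = np.asarray(brightness_temp_k) < bt_threshold
    return bright & cold

# A bright, cold pixel passes; a bright, warm (e.g. desert) pixel and a
# dark, cold pixel do not.
mask = naive_cloud_test([0.45, 0.45, 0.10], [270.0, 300.0, 270.0])
```

The weakness discussed above is visible even in this sketch: the same fixed thresholds that accept the first pixel would misbehave over bright or cold surfaces unless retuned per scene and per sensor.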
Machine learning, a data-driven technology, enhances the data learning and analysis capabilities of statistical methods through its strong self-learning and information extraction capabilities, thereby minimising the influence of human factors. Machine learning-based cloud detection treats the separation of cloud and clear-sky pixels as a classification problem. Support vector machine (SVM) and artificial neural network classification models are typically used in cloud detection research [18,19]. These methods can identify and utilise latent cloud features, but they still require parameter selection, model design, and feature extraction, and their performance is constrained by the classification framework, network structure, and model capacity [20].
Deep learning, particularly the convolutional neural network (CNN), a branch of machine learning, has benefitted from deep convolutional features and achieved breakthrough results in image classification tasks [21]. Owing to their powerful learning and feature extraction capabilities, CNNs have also been successfully used in cloud detection research [22,23,24]. In order to overcome the limitations of thin cloud detection, Shao et al. [25] designed a multiscale feature CNN (MF-CNN) that fuses the features extracted from different deep network layers to effectively identify clouds. The U-network (U-net) [26], which combines the shallowest and deepest features of the network, has become a standard deep learning method for image segmentation. Jeppesen et al. [27] and Francis et al. [28] defined cloud detection as a semantic segmentation task and used the U-net network to classify each pixel as cloud or clear sky. Although deep learning approaches have a stronger data mining capability and can achieve more accurate cloud detection results than other methods, challenges remain in their application to cloud detection. First, the lack of training samples directly affects the performance of CNNs [12,27]. Very few datasets, comprising images and manually interpreted masks, are available for cloud detection research. Second, most methods are only applicable to local regions and specific types of satellite images, meaning that different types of sensors generally cannot share datasets. When using deep learning methods for cloud detection on different types of satellite images, each image type must be interpreted separately to obtain training data. This requires professional knowledge and is time-consuming, and the definition of cloud used for manual interpretation varies from person to person [13]. Uncertainty regarding thin clouds, broken clouds, and clear-sky pixels in remote sensing interpretation is not conducive to CNN feature learning and leads to recognition errors. Given the increasing number of satellite image sources, deep learning cloud detection methods suitable for various satellite data must be established. Li et al. [21] proposed a multi-sensor cloud detection method based on a deep neural network (DNN), which can be used for cloud detection in Landsat-5/7/8, Gaofen-1/2/4, Sentinel-2, and Ziyuan-3 images. However, the training dataset of this method contains several types of satellite images, and manual interpretation cannot be avoided. Wieland et al. [16] selected the Landsat Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), Operational Land Imager (OLI), and Sentinel-2 sensors, which share spectral bands, and used the Landsat-8 cloud cover validation data and Spatial Procedures for Automated Removal of Cloud and Shadow (SPARCS) datasets as training data, before applying a CNN to achieve multi-sensor cloud and cloud shadow segmentation. However, this approach is limited to the common spectral range of the different sensors; the unique spectral properties of each sensor are lost, and the method cannot be applied to all sensors.
The spectral characteristics of ground objects result from the interaction between electromagnetic waves and ground surfaces, which is typically reflected in the reflection, thermal radiation, microwave radiation, and scattering characteristics of ground objects in different bands. A spectral library is a collection of reflectance spectra of various ground objects measured by a hyperspectral imaging spectrometer under specific conditions. The spectral library plays a crucial role in accurately interpreting remote sensing imagery by rapidly matching unknown features, improving the classification and recognition level of remote sensing. Gómez-Chova et al. [29] successfully used a spectral library and simulated data to achieve cloud detection in Advanced Along-Track Scanning Radiometer–Medium Resolution Imaging Spectrometer (AATSR-MERIS) images. The combination of a spectral library and a CNN may thus be a promising approach to cloud detection that does not require annotated images.
With the increase in the number of satellites, satellite imagery has become increasingly multi-sourced. Cloud detection methods based on deep learning should therefore be versatile enough to suit multiple types of sensor images while greatly reducing the labelling workload. To achieve this, a novel cloud detection algorithm based on a spectral library and CNN (CD-SLCNN) is proposed in this study.
Figure 1 presents the detailed framework of the proposed algorithm. The CD-SLCNN algorithm consists of three main steps: the establishment of a spectral library, data simulation, and cloud detection based on the residual learning one-dimensional CNN (Res-1D-CNN). The spectral library for cloud detection includes both the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) library [30] and a cloud pixel spectral library. The ASTER spectral library contains a comprehensive collection of spectra and was selected as the typical surface feature spectral library; its details are provided in Section 2.1.1. The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data, with a continuous 0.4–2.5 µm spectral range, were used to establish the cloud pixel library, which can be utilised to explore the subtle differences between clouds and ground features [31]. The pixel dataset consisting of the ASTER spectral library and the cloud pixel spectral library was converted to a multispectral dataset based on the sensor spectral response function (SRF) and the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) model [32]. Cloud detection for different satellite remote sensing images, such as Landsat-8, MODIS, and Sentinel-2, can therefore be conducted using a shared set of spectral libraries, without requiring a new dataset. Finally, the Res-1D-CNN was applied as the classifier to automatically learn and extract pixel spectral information and accurately detect the differences between clouds and ground features.
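The core of the band-simulation step is reducing a continuous spectrum to one multispectral band value via an SRF-weighted average. The minimal sketch below illustrates this idea only; the 10 nm wavelength grid and the triangular SRF are invented for the example, and the coupling with the 6S radiative transfer model described above is not reproduced.

```python
import numpy as np

# Sketch of SRF-based band simulation: a continuous reflectance spectrum
# (e.g. an AVIRIS or ASTER library pixel) is collapsed into a single band
# value by weighting with the sensor's spectral response function.
def simulate_band(spectrum, srf):
    """SRF-weighted average of a spectrum sampled on the same wavelength grid."""
    srf = np.asarray(srf, dtype=float)
    return float(np.sum(srf * spectrum) / np.sum(srf))

wavelengths = np.linspace(0.4, 2.5, 211)       # AVIRIS-like 0.4-2.5 um range
spectrum = np.full_like(wavelengths, 0.6)      # flat, cloud-like reflectance
# Hypothetical triangular SRF for a band centred at 0.56 um (not a real SRF):
srf = np.clip(1.0 - np.abs(wavelengths - 0.56) / 0.05, 0.0, None)
band_value = simulate_band(spectrum, srf)      # a flat spectrum averages to itself
```

Repeating this weighted average for each band of a target sensor is what allows one hyperspectral pixel library to serve Landsat-8, MODIS, and Sentinel-2 alike.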
The remainder of this paper is organised as follows: Section 2 describes the details of the proposed method, Section 3 illustrates the experimental results and analysis, and Section 4 provides the overall summary of this study.
3. Experiment and Results
We conducted cloud detection experiments on Landsat-8, Terra MODIS, and Sentinel-2A data of different regions and at different times in 2013–2020. In these experiments, the total number of samples in the pixel library was 85,394, of which 32,414 were cloud pixels and 52,980 were ground object pixels. Bands 2, 3, 4, 5, 6, 7, and 9 were used for the Landsat-8 images, bands 1–7 for the MODIS images, and bands 2, 3, 4, 8, 11, and 12 for the Sentinel-2A images. The initial spectral library was simulated into cloud detection spectral libraries for the Landsat-8, MODIS, and Sentinel-2A satellite images, respectively. The cloud detection neural network (described in Section 2.3) was used as the classifier to identify clouds and surface features. In addition, the Landsat-8 cloud cover assessment validation data (Landsat-8 Biome) [41] and the MOD35 cloud product [42] were used to validate the performance of the proposed algorithm qualitatively and quantitatively.
The Fmask algorithm, a widely recognised cloud detection algorithm, was used for comparison [43,44]. Random forest (RF) and SVM classifiers, which are classical machine learning algorithms, were also compared with the proposed method. The training data for RF and SVM were the same as those used for the proposed method.
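As a rough illustration of how the RF and SVM baselines can be trained on a shared pixel library, the sketch below fits both classifiers on synthetic two-class spectra. The cluster means, standard deviations, and sample counts are stand-ins invented for the example, not the paper's simulated spectral library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_bands = 7                                      # e.g. the 7 Landsat-8 bands used
# Synthetic stand-in pixels: bright cloud-like spectra vs darker surfaces.
cloud = rng.normal(0.6, 0.05, (200, n_bands))
ground = rng.normal(0.15, 0.05, (200, n_bands))
X = np.vstack([cloud, ground])
y = np.array([1] * 200 + [0] * 200)              # 1 = cloud, 0 = clear sky

# Both baselines see exactly the same training data, mirroring the setup above.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)

probe = np.full((1, n_bands), 0.6)               # a cloud-like test spectrum
```

Because each pixel is classified from its spectrum alone, this setup needs no image-level annotation, which is precisely the property the spectral-library approach exploits.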
3.1. Detection Performance on Landsat-8 Data
For cloud detection, 62 scenes of Landsat-8 satellite images were used; 36 scenes with approximate cloud percentages of 5–100% were from the Landsat-8 Biome dataset, and the other 26 scenes were randomly selected across the globe, covering not only clouds of different densities and sizes but also various land surfaces. The underlying surfaces of the validation data included barren land, grass/crops, forests, shrubland, urban areas, snow/ice, wetlands, and water bodies.
Figure 6 shows four typical cloud detection results for the Landsat-8 images, covering vegetation, sand, mountains, towns, lakes, oceans, and other complex surface types. In each Landsat-8 image, the upper left part shows the false-colour composite image (RGB: bands 5, 4, and 3), and the upper right part displays the corresponding cloud detection result for the entire image, where the white and black parts represent cloud and clear-sky pixels, respectively. For better comparison of the cloud detection results with visual interpretations, the lower left and lower right parts present the enlarged details of the areas marked by yellow boxes in the upper left and upper right parts, respectively. Comparing the cloud detection results with the ground observations, the CD-SLCNN algorithm accurately and reliably identified most of the clouds at different times and in different areas, revealing a large amount of underlying surface information. The CD-SLCNN algorithm performed well for various cloud types and sizes over diverse environments. Thick, thin, and broken clouds were all detected in the enlarged areas in the lower right corners of Figure 6a–d, and the cloud edges were completely extracted.
In order to better analyse the details of the detection results, images obtained over more challenging areas are shown. The left and right sides of Figure 7 show the Landsat-8 false-colour images (RGB: bands 5, 4, and 3) and the corresponding cloud detection results, respectively. In the cultivated land area (Figure 7c), the cloud detection results were not affected by the coverage of various crops, and the clouds were well distinguished. Furthermore, thin clouds over darker surfaces, such as inland water and oceans, were also accurately identified (Figure 7h), indicating acceptable classification results. Bright surfaces have spectral characteristics similar to those of clouds because of their high surface reflectance, particularly in the visible and NIR bands. This challenges traditional cloud detection approaches because it is difficult to determine an appropriate threshold, which can lead to the misidentification of bright surfaces as clouds and to difficulties in accurately detecting thin clouds. The results in Figure 7b,d,g show that the CD-SLCNN algorithm can detect most clouds over bare soils, urban areas, and rocks, respectively, with few cloud omissions and false recognitions, particularly regarding thin and broken clouds. Because the colour and reflectance of snow and ice are similar to those of clouds in the visible bands, cloud recognition over these environments has always been a challenge. However, the CD-SLCNN algorithm uses the Res-1D-CNN to deeply mine the spectral differences and hidden features of clouds and ice/snow in the near-infrared band, distinguishing ice/snow from clouds and obtaining complete cloud contours (Figure 7e). Small and broken clouds are typically omitted in cloud detection; however, the results show that the CD-SLCNN algorithm recognised small and broken clouds well, with a good level of robustness. Notably, the performance of the CD-SLCNN algorithm was slightly inferior in urban areas with low vegetation coverage and bright buildings, with a small number of clear-sky pixels misidentified as clouds.
In order to further assess the effectiveness of the CD-SLCNN algorithm, the cloud extraction results of Fmask 4.0, RF, SVM, and the proposed method on the 36 Landsat-8 Biome scenes were compared with the manual cloud mask. These results are presented in Figure 8, where the first column displays the false-colour composite images (RGB: bands 6, 5, and 4); the second column displays the manual cloud mask; and the third to sixth columns show the cloud detection results of RF, Fmask 4.0, SVM, and the proposed method, respectively. The second, fourth, sixth, and eighth rows show the enlarged details of the areas marked by red boxes in the first, third, fifth, and seventh rows, respectively. In the third to sixth columns, white indicates that the result is consistent with the mask, red indicates misclassification, and less red indicates better cloud detection. The cloud extractions of the RF and Fmask 4.0 algorithms differed significantly from the cloud mask, implying poor cloud recognition, with various omission and commission errors over different underlying surfaces. The results of SVM and the proposed method were similar to the cloud mask, and the overall detection was good. Furthermore, for clouds over water bodies and thin clouds, the CD-SLCNN algorithm performed better than SVM, as it can extract deep information, producing accurate and stable detection results. Overall, the cloud cover areas detected using our method exhibited a high consistency with the cloud mask, indicating promising results. There were fewer misclassified cloud pixels in our results than in those obtained using the competing methods, as observed in the enlarged images in the yellow rectangles in Figure 8.
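The white/red agreement maps used in this comparison amount to a per-pixel equality test between a predicted mask and the reference mask. The sketch below shows this with tiny illustrative masks; the values are invented, not taken from the experiments.

```python
import numpy as np

# Sketch of an agreement map: True where the predicted cloud mask matches
# the reference (rendered white in the figures), False where it does not
# (rendered red). Masks use 1 = cloud, 0 = clear sky.
def agreement_map(pred, ref):
    """Boolean map that is True wherever prediction and reference agree."""
    return np.asarray(pred) == np.asarray(ref)

ref  = np.array([[1, 1, 0],
                 [0, 0, 1]])
pred = np.array([[1, 0, 0],
                 [0, 1, 1]])
agree = agreement_map(pred, ref)
error_rate = 1 - agree.mean()     # fraction of red (misclassified) pixels
```

The "less red is better" reading of the figures corresponds directly to a lower `error_rate` here.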
3.2. Detection Performance on MODIS and Sentinel-2 Data
MODIS data have been widely used for cloud detection, and the MODIS cloud mask (MOD35) is a representative cloud detection product. In this experiment, 25 scenes of Terra MODIS 1B images and the corresponding MOD35 cloud products were used. Furthermore, 20 scenes of Sentinel-2A images were also used for validation.
Figure 9 shows the cloud detection results for the MODIS data over diverse underlying surfaces. The first row presents false-colour images that were synthesised using bands 5, 4, and 3; the second row shows the manual cloud mask; the third and fifth rows display cloud detection results and the MOD35 cloud mask, respectively; the fourth and the last rows show the misclassification of the CD-SLCNN results and the MOD35 cloud mask, respectively. Green denotes the consistency with the manual cloud mask and red implies misclassification (e.g., objects identified as clouds or clouds identified as objects) in the cloud detection results of the CD-SLCNN and MOD35 cloud mask. These results indicated that the CD-SLCNN algorithm can effectively identify clouds over soil, vegetation, urban areas, and water bodies with fewer misclassifications. Based on the abundant features obtained by CNN, the CD-SLCNN algorithm could effectively distinguish the surface, as well as thick, thin, and broken clouds.
Figure 10 presents the cloud detection results for the Sentinel-2A data, including thick clouds (thick and bright white, Figure 10a–c), thin clouds (thin and translucent, Figure 10d,e), and broken clouds (small and irregularly shaped, Figure 10c,f). The first row shows false-colour images synthesised using bands 8, 4, and 3; the second row shows the manual cloud mask; and the third and last rows display the cloud detection results and the difference between the CD-SLCNN results and the manual cloud mask, respectively. Green denotes consistency with the manual cloud mask, whereas red denotes inconsistency. The proposed algorithm performed well on thick clouds, with only subtle differences. The extraction results for thin and broken clouds were slightly inferior; however, a recognition accuracy of at least 80% was still achieved.
The interpretation and analysis conducted on the three validation datasets indicate that the CD-SLCNN algorithm proposed in this study is suitable for multi-satellite remote sensing images and achieves good performance over diverse underlying surfaces. The detection of thin and broken clouds was also improved. Section 3.3 describes the quantitative analysis of the cloud detection results.
3.3. Quantitative Analysis
3.3.1. Landsat-8 Biome
This study used the 36 Landsat-8 Biome scenes for quantitative verification. The cloud mask of the Landsat-8 Biome dataset served as the reference data to evaluate the accuracy of the cloud detection results of RF, SVM, and the proposed algorithm. The mean intersection over union (MIoU), kappa coefficient (Kappa), overall accuracy (OA), producer accuracy (PA), user accuracy (UA), and F1-score were used to quantitatively evaluate the proposed CD-SLCNN method.
Table 1 shows that the CD-SLCNN method outperformed the competing methods in terms of OA, UA, MIoU, and Kappa, achieving the highest Kappa (84.27%), OA (95.60%), UA (96.43%), and MIoU (77.82%) among all competing methods.
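For reference, all of the reported metrics can be derived from a binary confusion matrix with cloud as the positive class. The sketch below uses made-up counts, not the paper's figures, and treats MIoU as the mean of the cloud and clear-sky IoUs, which is the standard two-class reading.

```python
# Evaluation metrics from a 2x2 confusion matrix (cloud = positive class).
# tp/fp/fn/tn are pixel counts; the example counts below are illustrative.
def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    oa = (tp + tn) / total                      # overall accuracy
    pa = tp / (tp + fn)                         # producer accuracy (recall)
    ua = tp / (tp + fp)                         # user accuracy (precision)
    f1 = 2 * pa * ua / (pa + ua)                # harmonic mean of PA and UA
    # Kappa: agreement beyond what chance alone would produce.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (oa - pe) / (1 - pe)
    # Mean IoU over the cloud and clear-sky classes.
    iou_cloud = tp / (tp + fp + fn)
    iou_clear = tn / (tn + fp + fn)
    miou = (iou_cloud + iou_clear) / 2
    return {"OA": oa, "PA": pa, "UA": ua, "F1": f1,
            "Kappa": kappa, "MIoU": miou}

m = metrics(tp=80, fp=10, fn=20, tn=90)         # hypothetical counts
```

Note the asymmetry these definitions encode: PA penalises omitted clouds, UA penalises false cloud pixels, and MIoU penalises both, which is why the tables report them jointly.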
3.3.2. MODIS and Sentinel-2
Here, the 25 Terra MODIS 1B scenes and the 20 Sentinel-2A scenes were quantitatively evaluated. Table 2 and Table 3 present the quantitative validation results of the different methods, from which we observe that the proposed method achieved an outstanding cloud detection performance on both the MODIS and Sentinel-2 data. The OA, Kappa, and MIoU for cloud detection on the MODIS images were 95.36%, 83.78%, and 77.94%, respectively, implying that the proposed method performs well in detecting clouds over diverse underlying surfaces.
Table 3 shows that good results were also obtained for the Sentinel-2A images, with an OA of >94% and a Kappa of >80%. The high PA value (91.61%) implies that most of the clouds in the Sentinel-2A images were well recognised, and the UA (96.84%) demonstrates that there were few misclassifications. The recognition of thin clouds was slightly inferior to that of thick clouds due to the translucency and unclear boundaries of thin clouds.
3.4. Discussion
Qualitative and quantitative evaluations show that the proposed algorithm performs well in cloud detection for various types of remote sensing images. To ensure the wide applicability of the algorithm to multi-satellite remote sensing images, multispectral data simulation technology was adopted. To minimise the errors introduced by the data simulation, all possible influencing factors, such as the atmospheric model, aerosol type, and AOD, were considered. Moreover, the abundant spectral samples and the deep CNN model further promote high-precision cloud detection. Compared with traditional machine learning algorithms, deep learning models have superior nonlinear function approximation capabilities and can extract and express data features well. The combination of data simulation and a deep learning model provides a promising strategy for cloud detection across various types of satellite data. The cloud detection results on the Landsat-8, MODIS, and Sentinel-2A data fully demonstrate the robustness of the proposed algorithm.
Although the algorithm achieves automatic and efficient cloud detection, it has certain limitations. The spectral libraries and data simulation reduce the need for remote sensing interpretation and the influence of human subjectivity; however, they also ignore spatial information, which limits feature learning in complex scenes, such as the spectral mixing of thin clouds and the ground surface or the presence of bright underlying surfaces that may easily be confused with clouds. Weakly supervised cloud detection from remote sensing images has previously attracted attention [45]. In future studies, we will explore the fusion of spectral and spatial information and propose unsupervised, high-precision cloud detection algorithms for multispectral remote sensing data.
4. Conclusions
For efficient and accurate cloud detection across various satellite images, a universal cloud detection algorithm (CD-SLCNN), which combines a spectral library and the Res-1D-CNN, was presented in this paper. The algorithm is supported by the prior spectral library and data simulation: data simulation technology based on the SRF can rapidly generate a spectral library for the sensor to be processed. The Res-1D-CNN model performs cloud detection by automatically learning and extracting the features of cloud and clear-sky pixels, which reduces the effects of human-related factors. The CD-SLCNN method achieved cloud detection with a high accuracy on different types of multispectral images. The experimental results on Landsat-8, Terra MODIS, and Sentinel-2 images show that the proposed method performed well on clouds of different types, sizes, and densities over different surface environments. Furthermore, compared with the Fmask cloud detection results, the quasi-synchronous MODIS cloud mask products (MOD35), and the results of the RF and SVM methods, the proposed method obtained the highest OA, Kappa, and MIoU, and its PA was 2.81%, 3.2%, and 5.57% higher than that of SVM on the Landsat-8, MODIS, and Sentinel-2 data, respectively. The comparison demonstrated that the CD-SLCNN method produces more accurate cloud contours for thick, thin, and broken clouds than the current methods. The dual-channel CNN and the information fusion between different layers enhance the unique features that distinguish clouds from ground objects and enable their comprehensive extraction. In some special scenes, such as ice and snow, the proposed method also achieved good results with few misclassifications, which is of great significance for cloud detection over bright surface environments.
The CD-SLCNN algorithm is applicable to diverse sensor data and only requires the SRFs of the multispectral data, avoiding the time-consuming and labour-intensive manual labelling otherwise needed to establish training datasets for various remote sensing data. In addition, the algorithm is highly automated, can process large remote sensing datasets in batches, and is simple to operate.