1. Introduction
CO2 is one of the major greenhouse gases, and the increase in its atmospheric concentration over the last century has seriously affected the environment on which human survival depends [1]. The 2014 IPCC report shows that the radiative forcing of aerosols is the largest source of uncertainty in climate change assessment. Monitoring and evaluating the spatial distribution and parameters of both CO2 and aerosols have therefore become matters of great concern for scientists, and aerosol and CO2 parameter retrieval schemes are being developed and improved all over the world. Satellite remote sensing enables large-scale observation of the earth and has become one of the main ways to detect CO2 and aerosols [2].
The China Carbon Dioxide Observation Satellite (TANSAT) project is the first scientific experimental satellite for global atmospheric carbon dioxide observation developed entirely by China. It was successfully launched by a CZ-2D carrier rocket at 3:22 on 22 December 2016. TANSAT is a sun-synchronous polar-orbiting satellite with an orbital altitude of 712 km. It carries two key instruments: the carbon dioxide sensor (CDS) and the cloud and aerosol polarization imager (CAPI) [3]. CAPI is designed to identify cloud and aerosol interference in the atmosphere so that CO2 parameters can be retrieved more accurately. CAPI has spatial resolutions of 250 m and 1000 m and a calibration accuracy better than 5%. It operates in two observation modes, nadir (sub-satellite point) observation and sun-glint observation, of which the nadir mode is not affected by sun glint. CAPI is equipped with five bands from the near ultraviolet to the near infrared (0.38, 0.67, 0.87, 1.375, and 1.64 μm), as shown in Table 1, and achieves a compact, lightweight design. Its band configuration is similar to that of the cloud and aerosol imager (TANSO-CAI) on the Greenhouse gases Observing SATellite (GOSAT) [4], but CAPI has an additional 1.375 μm band for cirrus detection. CAPI also has polarization channels at 0.67 and 1.64 μm [5]. Its swath width of 400 km provides a large enough area for detecting the spatial distribution of aerosols and cloud cover.
However, most remote sensing images contain clouds [6]. Clouds interfere with atmospheric retrieval; thin cirrus in particular is difficult to detect, and its presence can greatly degrade the retrieval accuracy of aerosol and carbon dioxide parameters [7]. Accurate cloud detection is therefore an important image preprocessing step for minimizing this impact.
Since the advent of the satellite era, many cloud recognition algorithms have been developed. They fall mainly into three categories: machine learning methods [8,9,10,11,12,13,14,15,16,17,18], cloud texture and spatial feature methods [19,20,21,22,23], and spectral threshold methods [24,25,26,27,28,29,30]. The research status of these methods is classified and summarized in Table 2.
With the increasing application of machine learning in remote sensing, it has become another effective means of cloud detection. However, because machine learning methods usually require a large number of training samples to build a model, the workload is heavy and the generality of the model often cannot be guaranteed. Cloud texture and spatial feature methods detect clouds from the spatial information in the image: clouds have distinctive textures that distinguish them from the background, reflected in the spatial variation of spectral brightness. These algorithms, however, are complex and computationally expensive, which makes it difficult to meet the needs of high-efficiency processing. The spectral threshold method relies on the spectral differences between clouds and the earth's surface in remote sensing images; it is a physical method that separates clouds from the surface by setting radiance (reflectance) thresholds. Because its model is simple and its calculation fast, it can meet the needs of batch image processing. This research therefore adopts the spectral threshold approach to design a cloud detection algorithm suitable for CAPI data.
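As a minimal illustration of the spectral threshold principle described above, the Python sketch below flags a pixel as cloud when its reflectance in a single band exceeds a fixed threshold. The function name `threshold_cloud_mask`, the 2-D list layout, and the threshold and reflectance values are all hypothetical and are not taken from CAPI data or from any published algorithm.

```python
from typing import List

Grid = List[List[float]]


def threshold_cloud_mask(reflectance: Grid, threshold: float) -> List[List[bool]]:
    """Return True (cloud) wherever reflectance exceeds the threshold.

    Clouds are usually brighter than dark surfaces such as water or
    vegetation, so a single fixed threshold can separate the two there.
    """
    return [[value > threshold for value in row] for row in reflectance]


# Hypothetical 2 x 3 scene: four dark surface pixels and two bright cloud pixels.
scene = [
    [0.05, 0.07, 0.62],
    [0.04, 0.55, 0.06],
]
mask = threshold_cloud_mask(scene, threshold=0.3)  # illustrative threshold
print(mask)  # [[False, False, True], [False, True, False]]
```

A single band rarely suffices in practice, because bright surfaces such as sand or snow can exceed the same threshold; this is why threshold methods combine several band tests.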
However, few studies have addressed cloud recognition algorithms for CAPI. Only one scheme, based on data from the Chinese FengYun-3A polar-orbiting meteorological satellite, was proposed by Wang et al. before the satellite was launched [30]. Although the cloud and aerosol unbiased decision intellectual algorithm (CLAUDIA) has been shown to be applicable to data from any remote sensing instrument with corresponding visible to thermal infrared bands [27], it also has limitations. Owing to CAPI's design, only four of its bands, from the visible to the near infrared, can be used with the CLAUDIA method, which makes it difficult to reach the accuracy of cloud detection algorithms that use thermal infrared bands. In addition, cloud detection with the 0.67 and 0.87 μm bands requires a minimum surface reflectance database to ensure retrieval accuracy, and such a database has its own limitations, such as causing image interpretation errors.
To solve these problems, this paper proposes a more effective cloud detection method for CAPI: a threshold detection method using near-ultraviolet to near-infrared bands with thresholds for different underlying surfaces (NNDT). The 0.38 μm band and the ratio of the 0.38 to 1.64 μm bands are used in cloud detection for the first time. First, unique band test combinations are designed for four surface types: ocean, vegetation, desert, and polar regions. Next, CAPI data collected over the four surface types are used to establish threshold values for the cloud detection spectral tests, from which the corresponding fixed thresholds are determined. Finally, to verify and evaluate the effectiveness of the NNDT algorithm, its cloud recognition results are compared visually with the official cloud recognition results of the Moderate Resolution Imaging Spectroradiometer (MODIS) and quantitatively with those of the Second-generation GLobal Imager (SGLI). The experiments show that the algorithm overcomes the above limitations and identifies clouds effectively.
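The overall structure described above, in which each surface type gets its own band test combination with fixed thresholds, can be sketched as a lookup from surface type to a combined test. This is a hypothetical Python illustration, not the NNDT implementation: the `classify` function, the `TESTS` table, the band choices in it, and every threshold value are placeholders chosen only to make the example run; the actual combinations and thresholds are the ones derived in Section 2.

```python
from typing import Callable, Dict, List

# One pixel: band centre in micrometres -> top-of-atmosphere reflectance.
Pixel = Dict[float, float]
# A surface-specific combined test that returns True for a cloudy pixel.
SurfaceTest = Callable[[Pixel], bool]


def classify(pixels: List[Pixel], surfaces: List[str],
             test_by_surface: Dict[str, SurfaceTest]) -> List[bool]:
    """Apply the test combination registered for each pixel's surface type."""
    return [test_by_surface[surface](pixel)
            for pixel, surface in zip(pixels, surfaces)]


# Placeholder band choices and thresholds for illustration only (NOT NNDT values).
TESTS: Dict[str, SurfaceTest] = {
    "ocean": lambda p: p[0.38] > 0.20,
    "vegetation": lambda p: p[0.38] > 0.25,
    "desert": lambda p: p[0.38] > 0.35,
    "polar": lambda p: p[0.38] / p[1.64] < 3.0,
}

pixels = [
    {0.38: 0.12, 1.64: 0.05},  # clear ocean
    {0.38: 0.30, 1.64: 0.40},  # clear desert (bright, but below its threshold)
    {0.38: 0.60, 1.64: 0.45},  # cloud over vegetation
]
flags = classify(pixels, ["ocean", "desert", "vegetation"], TESTS)
print(flags)  # [False, False, True]
```

Keeping the combination logic inside each surface's callable lets a surface use OR, AND, or ratio tests as needed without changing `classify`.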
This paper is organized as follows. Section 2 introduces the NNDT algorithm, including an analysis of the characteristics of individual wavelengths and wavelength combinations and a description of the algorithm flow (preprocessing, algorithm design for different underlying surfaces, and the statistics and determination of each threshold). Section 3 gives the verification results of cloud detection. The experimental results are discussed in Section 4, and Section 5 presents the conclusions and an outlook on future work.
4. Discussion
The visual comparison in Section 3.1 shows that the cloud recognition results of the NNDT algorithm and MYD35 have broadly consistent cloud distributions in all five scenes, although local details differ. Moreover, the cloud amounts of the two results differ by 1% to 5%. Differences common to all scenes arise from cloud displacement during the interval between the CAPI and MODIS overpasses and from cloud distortion caused by differences in observation geometry and by geometric correction errors [55]. Other causes are discussed separately for each scene.
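The cloud-amount difference quoted above is the difference between the fractions of cloudy pixels in the two masks. A minimal Python sketch follows, assuming both masks have already been resampled onto a common grid (an assumption; the regridding step is not shown); the function and variable names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]


def cloud_fraction(mask: Mask) -> float:
    """Fraction of pixels flagged as cloud (True) in a 2-D boolean mask."""
    flat = [flag for row in mask for flag in row]
    return sum(flat) / len(flat)


# Hypothetical 2 x 4 masks on a shared grid.
nndt = [[True, True, False, False],
        [True, False, False, False]]
myd35 = [[True, False, False, False],
         [True, False, False, False]]

difference = cloud_fraction(nndt) - cloud_fraction(myd35)
print(f"{difference:.1%}")  # 12.5%
```

Equal cloud fractions do not imply pixel-by-pixel agreement, since displaced clouds can offset each other, which is why the visual comparison also examines local details.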
In the Pacific Ocean scene, the cloud area identified by the NNDT algorithm agrees more closely with the actual cloud distribution than that of MYD35. In most cases, cloud optical thickness varies gradually in space. Optically thin clouds occur at cloud edges, in holes in the centers of cloud clusters, and between two closely spaced cloud clusters, and they are difficult to see by eye in RGB images. Because the 0.38 μm band is sensitive to the edge details of middle and low clouds and the 1.375 μm band is sensitive to optically thin clouds, the reflectance thresholds of these two bands can objectively establish the presence of such thin clouds. Therefore, once cloud displacement during the overpass time difference and cloud distortion caused by the geometric differences between the two instruments are excluded, the optically thin and cloud-edge pixels that the NNDT algorithm flags as cloud but MYD35 does not demonstrate the advantage of the NNDT algorithm in recognizing optically thin clouds and cloud edge pixels over the ocean.
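The reasoning above, in which either the 0.38 μm test or the 1.375 μm test can establish that a thin or edge cloud is present, can be illustrated with a Python sketch. Combining the two tests with a logical OR is an assumption made here, in keeping with the paper's principle of identifying as many cloud pixels as possible; the function name and all thresholds and reflectances are hypothetical placeholders, not CAPI statistics.

```python
def ocean_cloud_test(r038: float, r1375: float,
                     t038: float = 0.20, t1375: float = 0.02) -> bool:
    """Hypothetical ocean test: cloud if either band exceeds its threshold.

    r038:  0.38 um reflectance, sensitive to the edges of middle and low
           clouds (as stated in the text).
    r1375: 1.375 um reflectance; strong water-vapour absorption suppresses
           the surface signal, so mainly high, thin clouds reflect here.
    Default thresholds are placeholders, not NNDT values.
    """
    return r038 > t038 or r1375 > t1375


clear_ocean = ocean_cloud_test(0.12, 0.004)
thin_cirrus = ocean_cloud_test(0.15, 0.030)  # missed by the 0.38 um test alone
cloud_edge = ocean_cloud_test(0.26, 0.008)   # missed by the 1.375 um test alone
print(clear_ocean, thin_cirrus, cloud_edge)  # False True True
```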
In the bare-ground cloud detection results in Figure 6, small scattered clouds appear more fragmented in the CAPI cloud flag and more continuous in MYD35. The reason is that the band-combination threshold test that the NNDT algorithm uses to prevent bright sandbanks from being recognized as cloud also causes some cloud pixels whose gray values resemble those of the sandbanks to be classified as clear, so the edges of some small scattered clouds are missed. The MODIS cloud recognition algorithm includes thermal infrared cloud tests [32], which compensate for such missed detections of small clouds. These small scattered clouds are optically thick; for high thin clouds, however, the 1.375 μm test is not constrained by this band combination, so no similar missed detections occur.
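The interplay described above can be sketched in Python. The band combination used by NNDT over bright desert surfaces is not reproduced in this section, so it is represented here by a hypothetical caller-supplied function `not_sand`, and the pixel field `"combo"` is a hypothetical precomputed value standing in for that combination. Structuring the low/middle-cloud branch as the 0.38 μm test AND the sand veto, then ORing it with an unvetoed 1.375 μm test, is an assumption that mirrors the behavior described in the text, not the published decision tree.

```python
from typing import Callable, Dict, Union

Pixel = Dict[Union[float, str], float]


def desert_cloud_test(p: Pixel, not_sand: Callable[[Pixel], bool],
                      t038: float = 0.30, t1375: float = 0.02) -> bool:
    """Hypothetical desert test; default thresholds are placeholders."""
    # Low/middle clouds: bright at 0.38 um AND not rejected as a bright sandbank.
    low_cloud = p[0.38] > t038 and not_sand(p)
    # High thin clouds: the 1.375 um test is not subject to the sand veto.
    high_cloud = p[1.375] > t1375
    return low_cloud or high_cloud


def not_sand(p: Pixel) -> bool:
    """Hypothetical stand-in for the band combination (not the NNDT formula)."""
    return p["combo"] < 0.5


scene = {
    "sandbank": {0.38: 0.35, 1.375: 0.005, "combo": 0.9},
    "small cloud edge": {0.38: 0.35, 1.375: 0.008, "combo": 0.7},
    "cirrus": {0.38: 0.20, 1.375: 0.050, "combo": 0.8},
    "thick cloud": {0.38: 0.60, 1.375: 0.010, "combo": 0.1},
}
for name, pixel in scene.items():
    print(name, desert_cloud_test(pixel, not_sand))
```

In this toy scene the sandbank is correctly rejected, but the sand-like small cloud edge is missed as well, while the cirrus pixel is still caught through the unvetoed 1.375 μm branch.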
The six verification scores for the four scenes in Section 3.2 show that all POD values exceed 0.8, most of them close to 1, indicating that the NNDT cloud recognition results largely agree with the official SGLI algorithm results. Except in the ocean scene, the FAR_clear values are all below 0.1, some close to 0, indicating that the NNDT algorithm rarely classifies pixels marked as cloud by SGLI CLFG as clear. In the ocean scene, FAR_clear is slightly larger, at 0.14. This is because the tests use thresholds derived from CAPI data statistics, which are not fully applicable to SGLI data, so the NNDT algorithm misses some cloud pixels; this effect is most obvious over the ocean. FAR_cloud is close to 0 in the other scenes but reaches 0.44 in the polar scene; even so, the close agreement of NNDT-CLFG with the true color images supports the effectiveness of the NNDT algorithm there. The HR values of all scenes lie between 0.86 and 0.99, indicating that the NNDT algorithm has a high cloud recognition hit rate and further supporting its high cloud recognition accuracy.
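The scores discussed above come from a cloud/clear contingency table between the NNDT mask and the SGLI CLFG reference. Their exact definitions are given in Section 3.2, which is not part of this section, so the formulas below are an assumption based on the usual contingency-table conventions and on how the text interprets each score: POD is the fraction of reference cloud pixels detected, FAR_cloud the fraction of NNDT cloud pixels that are clear in the reference, FAR_clear the fraction of NNDT clear pixels that are cloudy in the reference, and HR the overall agreement. The function and class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    pod: float        # hits / reference cloud pixels
    far_cloud: float  # false alarms / pixels the test flags as cloud
    far_clear: float  # misses / pixels the test flags as clear
    hr: float         # (hits + correct negatives) / all pixels


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan  # undefined when the class is empty


def verification_scores(test: Sequence[bool], ref: Sequence[bool]) -> Scores:
    """Compare flattened cloud masks (True = cloud) pixel by pixel."""
    hits = false_alarms = misses = correct_neg = 0
    for t, r in zip(test, ref):
        if t and r:
            hits += 1
        elif t:
            false_alarms += 1
        elif r:
            misses += 1
        else:
            correct_neg += 1
    total = hits + false_alarms + misses + correct_neg
    return Scores(
        pod=_ratio(hits, hits + misses),
        far_cloud=_ratio(false_alarms, hits + false_alarms),
        far_clear=_ratio(misses, misses + correct_neg),
        hr=_ratio(hits + correct_neg, total),
    )


nndt = [True, True, True, False, False, False, False, True]
sgli = [True, True, False, False, False, True, False, True]
print(verification_scores(nndt, sgli))
```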
The verification examples and scores in Section 3.1 and Section 3.2 show that, overall, the NNDT cloud recognition results are consistent in extent and cloud amount with MYD35 and SGLI CLFG over the four underlying surfaces. The NNDT algorithm performs excellently over ocean, vegetation, and snow surfaces, especially in detecting thin clouds and cloud edges, which are difficult to detect. In scenes affected by highly reflective sand, the NNDT algorithm identifies most of the clouds, although its recognition of cloud edge pixels and thin cloud pixels is insufficient. Moreover, the algorithm effectively avoids identifying inland waters as clouds. These results indicate that the 0.38 μm band test can serve as an effective alternative to the 0.67 and 0.87 μm band tests in cloud recognition and does not require a minimum surface reflectance database for the corresponding band. The test on the ratio of the 0.38 to 1.64 μm bands can be used effectively to identify cloud pixels over polar surfaces. Following the principle of identifying as many cloud pixels in the image as possible, the NNDT algorithm proposed in this paper is more concise than CLAUDIA and other algorithms that need to calculate a confidence level. This shows that the NNDT algorithm is reasonable, efficient, and suitable for cloud recognition in CAPI data.
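Why a 0.38/1.64 μm ratio helps over polar snow can be shown with a small Python sketch. Snow is bright at 0.38 μm but, because ice absorbs strongly near 1.6 μm, comparatively dark at 1.64 μm, whereas clouds generally stay brighter at 1.64 μm; the ratio is therefore large for snow and smaller for cloud. Flagging cloud where the ratio falls below a threshold is an assumption about the direction of the test, and the function names, thresholds, and reflectance values are hypothetical illustrations rather than CAPI statistics.

```python
def polar_ratio_test(r038: float, r164: float, max_ratio: float = 3.0) -> bool:
    """Hypothetical polar test: cloud when R(0.38)/R(1.64) < max_ratio.

    Assumes r164 > 0. The default threshold is a placeholder, not an NNDT value.
    """
    return r038 / r164 < max_ratio


def single_band_test(r038: float, threshold: float = 0.5) -> bool:
    """A 0.38 um brightness test alone, for comparison (placeholder threshold)."""
    return r038 > threshold


snow = (0.95, 0.08)   # bright at 0.38 um, dark at 1.64 um
cloud = (0.85, 0.55)  # bright in both bands
print(single_band_test(snow[0]), single_band_test(cloud[0]))  # True True
print(polar_ratio_test(*snow), polar_ratio_test(*cloud))       # False True
```

The single-band test mistakes snow for cloud, while the ratio separates the two; a practical implementation would also guard against near-zero 1.64 μm reflectance.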
5. Conclusions
To meet the cloud detection requirements of aerosol and CO2 retrieval from the TANSAT satellite sensors, an innovative cloud recognition algorithm suitable for CAPI data, NNDT (near-ultraviolet to near-infrared bands with threshold tests for different underlying surfaces), is proposed. The algorithm uses unique band test combinations for four types of underlying surface: ocean, vegetation, desert, and polar regions. The thresholds used in all of its cloud recognition tests are obtained from statistics of 2017 CAPI data over the four underlying surfaces. Two comparison methods are then used to verify and evaluate the effectiveness of the NNDT algorithm: a visual comparison based on CAPI data and the official MODIS cloud recognition results, and a quantitative comparison based on SGLI data and the official SGLI cloud recognition results. For both methods, typical target areas representing the four surface types (ocean, vegetation, desert, polar) were randomly selected for experimental verification. The algorithm achieves good cloud recognition in all scenes, with hit rates between 0.86 and 0.99. Together with the analysis of spectral characteristics, these results preliminarily show that, in CAPI data, the 0.38 μm band distinguishes water, vegetation, and desert surfaces from clouds well, and the ratio of the 0.38 to 1.64 μm bands distinguishes clouds from polar snow excellently.
By combining threshold tests on near-ultraviolet to near-infrared bands, this algorithm avoids the drawbacks of the 0.67 and 0.87 μm minimum reflectance database used in traditional cloud detection algorithms. Moreover, introducing the near-ultraviolet band into cloud detection allows the algorithm to achieve results comparable to those obtained with thermal infrared bands, even without using them. Following the principle of identifying as many cloud pixels as possible, classifying all image pixels as cloud or clear can greatly improve the efficiency and accuracy of data use in aerosol and CO2 retrieval. This algorithm is the first to apply the 0.38 μm band and the ratio of the 0.38 to 1.64 μm bands to cloud recognition, providing a theoretical basis and new ideas for future research.
In this paper, the underlying surfaces are divided into four categories, and the experimental regions selected for algorithm validation differ from those used for threshold statistics. However, the application scenarios in this article cannot cover all of the more detailed and special surface types in the world. In the future, we will focus on studying how thresholds differ between regions. Based on regional spectral characteristics, the underlying surfaces can be grouped in more detail, and corresponding band detection schemes and thresholds can be set. Furthermore, cloud shadow is also a challenge in cloud detection, because surface data covered by cloud shadows also affect how researchers can use the data [56,57,58]. The method proposed in this paper does not address this problem: it classifies all cloud shadow areas as clear, which may affect the use of the data in aerosol retrieval. In future work, we will consider bands and band combinations that can detect cloud shadows, together with the corresponding threshold statistics.