Article

A Novel Machine Learning Algorithm for Cloud Detection Using AERI Measurement Data

1 College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
2 Unit No. 91922 of PLA, Sanya 572099, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(11), 2589; https://doi.org/10.3390/rs14112589
Submission received: 4 April 2022 / Revised: 25 May 2022 / Accepted: 25 May 2022 / Published: 27 May 2022
(This article belongs to the Section Atmospheric Remote Sensing)

Abstract

Infrared hyperspectral remote sensing has been widely used in meteorology, and many scientists have studied inversion methods for meteorological elements such as the thermodynamic profile, boundary layer height, and cloud-base height. In this study, a machine learning method for cloud detection using ground-based infrared hyperspectral radiation data is proposed. Features characterizing outliers, cloudy data, and cloud-free data are extracted from Atmospheric Emitted Radiance Interferometer (AERI) radiation measurements. The cloudy and cloud-free “reference values” are determined from Vaisala CL31 ceilometer observations within the 8 min preceding the corresponding AERI time. A support vector machine (SVM) algorithm is used for training. The dataset comes from the Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) site and North Slope of Alaska (NSA) site from 2015 to 2017, and data from the ARM West Antarctic Radiation Experiment (AWARE) site in 2016 are also analyzed. The instruments used in this paper include the AERI and the ceilometer, among others. The experimental results reveal that the agreement between the cloud detection results of the proposed algorithm and those of the ceilometer is about 93% at each site. However, the agreement decreases for high clouds and optically thin clouds.

1. Introduction

Hyperspectral remote sensing was first proposed in the 1970s and was further developed by geologists and geophysicists [1,2]. Compared with multispectral remote sensing, hyperspectral remote sensing acquires many narrow, contiguous spectral bands and has higher spectral resolution [3]. In addition, it can provide a more in-depth examination of features and details on Earth [4]. Therefore, hyperspectral remote sensing has been widely used in the field of meteorology. Hyperspectral sensors on satellites can provide data covering the globe on a regular basis, containing valuable information on atmospheric temperature and humidity profiles, greenhouse gases, clouds, and other near-surface characteristics [5]. Since the 1990s, satellite-based infrared hyperspectral sounders have been developed, including the polar-orbiting Interferometric Monitor for Greenhouse Gases (IMG) [6], the grating-based Atmospheric Infrared Sounder (AIRS) on Aqua [7,8], the first operational interferometer, the Infrared Atmospheric Sounding Interferometer (IASI) on the Meteorological Operational (MetOp) satellites [9], and the Cross-Track Infrared Sounder (CrIS) on the Suomi National Polar-Orbiting Partnership (Suomi NPP) satellite (replacing NOAA’s legacy broadband IR sounder, HIRS) [10,11]. In addition, China has developed its own infrared hyperspectral sounder, the Geostationary Interferometric Infrared Sounder (GIIRS), which is carried on the FengYun-4 (FY-4) satellite [10,12].
Satellite-based infrared hyperspectral remote sensing has wide spatial coverage and high horizontal resolution, but its observations are significantly affected near the surface. In contrast, a ground-based sounder receives the downward atmospheric radiation, which is less affected by the surface. In 1988, at the Ground-based Atmospheric Profiling Experiment (GAPEX), Smith and Frey placed a High-resolution Interferometer Sounder (HIS) on the ground to observe the sky, which demonstrated the feasibility of using a ground-based sounder to detect the temperature and humidity of the lower atmosphere [13,14]. On the basis of HIS, the first Atmospheric Emitted Radiance Interferometer (AERI) instrument was designed and fabricated in 1989 to measure the atmospheric downward infrared radiation at the Earth’s surface with high absolute accuracy [15].
At present, AERI radiation observations have been used in a wide range of scientific applications, including improving infrared radiative transfer models [16,17], measuring the boundary layer height [18], detecting the characteristics of aerosols and dust in the troposphere [19], and retrieving the content of ozone and other trace gases [20,21]. AERI observations also support cloud remote sensing, including cloud phase [22], cloud-base height [23], effective cloud emissivity [24], and cloud microphysical parameters [25,26]. It is worth noting that AERI observations are sensitive to the evolution of the thermodynamic profile in the boundary layer below 3 km [27]. Therefore, AERI can detect the temperature and humidity profiles near the ground by observing the radiation spectrum [28,29], and a corresponding value-added product, AERIPROF, has been developed [30]. Cloud detection is a necessary step before these applications. However, cloud detection currently relies on the assistance of a ceilometer [31] or lidar [32].
In fact, the atmospheric downward infrared hyperspectral radiation data contain information that can distinguish between cloudy and cloud-free conditions [33]. In recent years, based on typical cloud-free measurement data, Joon-Sik Cho et al. used ±60% of the 760–1000 cm−1 radiation data as the threshold to judge the cloudy condition of AERI measurement results [34]. However, this method has some limitations, because the absolute radiation value is strongly affected by atmospheric conditions. They further selected typical cloud-free data from each of the four seasons as cloud-free thresholds, but this still did not completely solve the problem. Rizzi et al. studied a cloud detection method based on the brightness temperature difference (BTD) between different channels obtained by the FTS REFIR-Prototype for Applications and Development (REFIR-PAD) at Dome C, Antarctica [35]. Both methods have been applied to cloud detection at a single site only, and their applicability to other sites has not been confirmed. In addition, ground-based infrared hyperspectral radiation measurements may contain outliers, which are difficult to eliminate with the above methods. To alleviate these issues, our primary goal in this paper is to define a novel cloud detection method that has high accuracy, generalizes across different sites, and is less affected by precipitable water vapor (PWV). The cloud detection algorithm is complementary to the above methods. In this paper, we use the spectral features of AERI in a specific spectral region to distinguish cloudy and cloud-free spectra based on the support vector machine (SVM).
This paper is organized as follows. Section 2 proposes the features of outliers of AERI and establishes cloudy and cloud-free dataset. Section 3 describes the cloud detection method by SVM algorithm. Section 4 is the core of this study which evaluates the performance of the proposed method and gives uncertainty estimates. Finally, the conclusion and outlook are presented in Section 5.

2. Data Set

2.1. Instrument

The atmospheric downward infrared hyperspectral data are from AERI, a ground-based Fourier Transform Spectrometer (FTS) with a temporal resolution of 8 min in normal sampling mode, which was designed by the University of Wisconsin Space Science and Engineering Center (UW-SSEC) for the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Program [36]. AERI has a spectral measurement range between 520 and 3300 cm−1 with a spectral resolution of 0.5 cm−1, and the field of view of the instrument is 1.3° [37]. The design of the AERI system has enabled the ARM program to deploy it at all permanent sites, including the SGP site (36°36′N, 97°29′W) and the NSA site (71°19′N, 156°36′W) [38,39]. It has also been deployed at many mobile sites, for example during the ARM West Antarctic Radiation Experiment field campaign in 2016 (AWARE site, 77°51′S, 166°40′E) [18,19,40]. The data used in this paper were collected at the ARM SGP site and NSA site from 1 January 2014 to 31 December 2017. Data from the AWARE site in 2016 are also used.
The Vaisala CL31 ceilometer is an all-weather automatic device with a 10 kHz pulsed indium gallium arsenide (InGaAs) diode laser. It transmits near-infrared light pulses at 910 ± 10 nm with a pulse width of 100 ns [41]. The ceilometer receives the signal scattered by cloud, rain, dust, and fog, and it produces a 7700 m vertical backscatter profile with 10 m range gates every 16 s [42]. The signal is strong and stable over the whole measurement range, which ensures good quality of the received signal [43].
The Total Sky Imager (model TSI-660) is an automatic daytime imager that provides time series of hemispheric sky images during daylight hours at a 30 s sampling interval [44]. The particle size and velocity disdrometer (Parsivel2; manufactured by Ott Hydromet GmbH, Kempten, Germany) directly measures particle size and provides derived parameters such as radar reflectivity and particle number density at an interval of 1 min [45]. The Microwave Radiometer (MWR) Retrieval (MWRRET) value-added product provides PWV data every 20 s [46]. The Radiative Flux Analysis (RADFLUXANAL) value-added product, based on the longwave downward radiation [47], provides cloud optical depths (COD) every 20 s during daytime.

2.2. Data Quality Control

The instrument may produce outliers during measurement, and an AERI measurement identified as an outlier cannot be used for cloud detection. According to the spectral features of the obvious outliers, we divide the outliers into four cases (see Figure 1). In Figure 1a, the spectrum contains many negative radiance values, which may be a sign of measurement problems. In Figure 1b, the AERI spectrum resembles a smooth curve. A possible cause is that the AERI hatch is closed during precipitation events, which prevents the sensor from receiving the downward infrared radiation from the atmosphere [48]. When the temperatures of the ambient and hot blackbodies of AERI are unstable, the measurements may contain excessive noise [36,37], as shown in Figure 1c,d. For comparison, two typical examples without noise are shown in Figure 1e,f. Excessive noise can be eliminated by setting a threshold on the standard deviation in specific spectral channels. Based on the features of the outliers, five feature parameters are extracted as the criteria for identifying AERI outliers, which are shown in Table 1.

2.3. Cloudy and Cloud-Free Data Set

After data quality control, the training set and testing set are established. The specific steps are as follows:
  • After eliminating the outliers, we perform time-matching between the Vaisala CL31 ceilometer and AERI to obtain the sample set Q1. The observation period of AERI is 8 min, so the spectrum obtained by AERI reflects the average state of the atmosphere or cloud over the previous 8 min. Therefore, we select the ceilometer data within the 8 min preceding the corresponding AERI time.
  • We use the ceilometer data to label each AERI measurement as cloudy or cloud-free. If more than 95% of the matched ceilometer data indicate cloudy (or cloud-free) conditions, the AERI measurement at the corresponding time is labeled cloudy (or cloud-free); these labeled samples form the sample set Q2.
  • The Q2 sample sets of 2014 at SGP site, 2014 at NSA site, and one-third of cloudy data (about 6000 groups) and cloud-free data (about 6000 groups) randomly selected from AWARE site in 2016 are taken as the training set, which are recorded as Q2_Train_SGP, Q2_Train_NSA, and Q2_Train_AWARE, respectively. To maintain the sample balance of the training set, we randomly selected 6000 groups of cloudy data and 6000 groups of cloud-free data from SGP site and NSA site and put them into the training set. Data from 2015 to 2017 at the SGP and NSA sites, and the rest of the data from the AWARE site, are used as the testing set. The data information is shown in Table 2.
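The time-matching and 95% agreement rule in the steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of seconds as the time unit, the closed window boundaries, and the choice to return None for mixed scenes are all assumptions.

```python
from bisect import bisect_left, bisect_right

WINDOW_S = 8 * 60   # AERI observation period of 8 min, in seconds
AGREEMENT = 0.95    # fraction of ceilometer returns that must agree

def label_aeri_sample(aeri_time, ceil_times, ceil_cloudy):
    """Label one AERI spectrum as 'cloudy', 'cloud-free', or None.

    aeri_time   : time stamp of the AERI spectrum, in seconds
    ceil_times  : sorted ceilometer time stamps, in seconds
    ceil_cloudy : booleans, True where the ceilometer reports cloud
    """
    # Ceilometer samples in the 8 min before the AERI time
    # (both window ends are included here, which is an assumption).
    lo = bisect_left(ceil_times, aeri_time - WINDOW_S)
    hi = bisect_right(ceil_times, aeri_time)
    window = ceil_cloudy[lo:hi]
    if not window:
        return None  # no matching ceilometer data
    cloudy_fraction = sum(window) / len(window)
    if cloudy_fraction > AGREEMENT:
        return "cloudy"
    if 1.0 - cloudy_fraction > AGREEMENT:
        return "cloud-free"
    return None  # mixed scene, assumed to be left out of Q2
```

With one ceilometer profile every 16 s, each window holds about 30 profiles, so a single disagreeing profile still leaves the agreement above 95%, while two do not.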

3. Method

3.1. Features of Cloudy and Cloud-Free Conditions

The measurement data of Q2_Train_SGP, Q2_Train_NSA, and Q2_Train_AWARE are used to summarize the features that distinguish cloudy from cloud-free scenes. Here, we take as examples the cloud-free spectra at 1743 Coordinated Universal Time (UTC) on 21 January 2016 at the AWARE site, 1006 UTC on 3 February 2014 at the SGP site, and 2112 UTC on 13 January 2014 at the NSA site, and the cloudy spectra at 0929 UTC on 21 January 2016 at the AWARE site, 2006 UTC on 3 February 2014 at the SGP site, and 1224 UTC on 13 January 2014 at the NSA site, as shown in Figure 2. The ceilometer echo maps in Figure 2b,c show the sky conditions at the times of the spectra given in Figure 2a. It can be seen that the cloudy spectra correspond to times when the ceilometer shows a strong backscattered signal, while the cloud-free spectra correspond to times when the ceilometer backscatter is weak. Based on the two kinds of spectra, 12 features are selected to distinguish cloudy from cloud-free scenes, which are shown in Table 3 and Figure 2a.
However, the slope and intercept of fitting line in 780–920 cm−1 band radiation are particularly noteworthy. There are many peaks in this range, and the slope and intercept of the fitting line cannot directly reflect the cloud-free spectrum. The range is divided into eight small wavebands, and then we calculate the mean radiation values of these small bands. The final slope and intercept of the fitting line are obtained by fitting the average radiation of each small band. The eight small wavebands are as follows: 780–783 cm−1, 786–790 cm−1, 815–820 cm−1, 830–835 cm−1, 842–846 cm−1, 857–864 cm−1, 895–900 cm−1, and 915–920 cm−1 (Purple background parts in Figure 2a).
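The sub-band averaging and line fitting described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, ordinary least squares is assumed for the fit, and the mean wavenumber of each sub-band is assumed as its x-coordinate (the text does not say whether the band mean or band center is used).

```python
# The eight sub-bands listed in the text, in cm^-1.
SUB_BANDS = [(780, 783), (786, 790), (815, 820), (830, 835),
             (842, 846), (857, 864), (895, 900), (915, 920)]

def band_means(wavenumbers, radiances, bands=SUB_BANDS):
    """Return (mean wavenumber, mean radiance) for each sub-band."""
    points = []
    for lo, hi in bands:
        sel = [(w, r) for w, r in zip(wavenumbers, radiances) if lo <= w <= hi]
        if not sel:
            raise ValueError(f"no channels in {lo}-{hi} cm^-1")
        points.append((sum(w for w, _ in sel) / len(sel),
                       sum(r for _, r in sel) / len(sel)))
    return points

def fit_line(points):
    """Ordinary least-squares slope and intercept through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx
```

Calling `fit_line(band_means(wavenumbers, radiances))` then gives one slope and one intercept per spectrum, computed from eight points rather than from every channel between 780 and 920 cm−1.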
The average value ($u$) and standard deviation ($\sigma$) of each of the 12 features of the training set and testing set are further calculated. The features of the training set and testing set are then normalized with Equation (1) as follows [49].
$$F_{\mathrm{norm},i} = \frac{F_i - u_i}{\sigma_i} \qquad (1)$$
where $F_i$ is the corresponding feature in Table 3 used for training, $F_{\mathrm{norm},i}$ denotes the normalized feature, and $i$ is the serial number from 1 to 12.
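As a small illustration, Equation (1) is read here as subtracting the mean $u_i$ and dividing by the standard deviation $\sigma_i$ of each feature. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, pstdev

def standardize(values, u=None, sigma=None):
    """Apply (F - u) / sigma to one feature column.

    If u or sigma is not supplied, it is computed from `values`
    (population standard deviation; this choice is an assumption).
    """
    u = mean(values) if u is None else u
    sigma = pstdev(values) if sigma is None else sigma
    if sigma == 0:
        raise ValueError("feature has zero spread and cannot be scaled")
    return [(v - u) / sigma for v in values]
```

A feature scaled this way has zero mean and unit standard deviation, so its values are not confined to a fixed interval.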
To understand how each input influences the classification accuracy, we use the random forest algorithm to calculate the “Gini importance” of each input in the training set, which gives the weights of the different inputs for cloud detection shown in Figure 3 [50]. As can be seen from Figure 3, feature 1 and feature 2 have the largest weights among the 12 features used for classification. Therefore, the slope and intercept of the 740–760 cm−1 band radiation play the most important role in the classification between cloudy and cloud-free samples. Furthermore, to improve the accuracy of the algorithm, reduce the risk of overfitting, and reduce the computation time of the model, we need to determine the best set of inputs. In Section 3.3, following the ranking of feature weights in Figure 3, we use from 1 to 12 features as input to build cloud detection models so as to determine the best input features.
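For intuition about the quantity behind "Gini importance", the sketch below computes the Gini impurity decrease of a single threshold split on one feature. In a random forest, such decreases are accumulated over every node that splits on a feature and averaged over the trees; that aggregation is not reproduced here, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = cloudy, 0 = cloud-free)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n          # fraction of cloudy samples
    return 2.0 * p * (1.0 - p)   # equals 1 - p^2 - (1 - p)^2

def impurity_decrease(feature, labels, threshold):
    """Weighted Gini impurity decrease of the split feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    return (gini(labels)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))
```

A feature whose threshold separates the two classes cleanly yields the largest decrease (0.5 for a balanced binary sample), whereas an uninformative feature yields a decrease near zero.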

3.2. Validation Methods

In this paper, we use a 2 × 2 confusion matrix (Table 4) to analyze the testing results [51]. The percent correct (PC) describes the agreement between the proposed algorithm and the ceilometer detection method, as follows.
$$PC = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$$
The true positive rate (TPR) gives the proportion of the data detected as cloudy by the ceilometer that are also detected as cloudy by AERI. It describes the sensitivity of the proposed algorithm for cloudy data detection, as follows.
$$TPR = \frac{TP}{TP + FN} \times 100\%$$
Similarly, the true negative rate (TNR) gives the proportion of the data detected as cloud-free by the ceilometer that are also detected as cloud-free by AERI. It describes the sensitivity of the proposed algorithm for cloud-free data detection, as follows.
$$TNR = \frac{TN}{FP + TN} \times 100\%$$
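The three scores can be computed directly from paired classifications. The sketch below assumes that "positive" means cloudy and that the ceilometer supplies the reference label, as in the definitions above; the function names are hypothetical.

```python
def confusion_counts(predicted, reference):
    """Count TP, FP, FN, TN with cloudy (True or 1) as the positive class.

    predicted : AERI-based classification for each sample
    reference : ceilometer classification for each sample
    """
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif not p and r:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def scores(tp, fp, fn, tn):
    """Return PC, TPR, and TNR in percent."""
    pc = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    tpr = 100.0 * tp / (tp + fn)
    tnr = 100.0 * tn / (fp + tn)
    return pc, tpr, tnr
```

For example, five samples with two true positives, one false positive, one false negative, and one true negative give PC = 60%, TPR ≈ 66.7%, and TNR = 50%.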

3.3. Cloud Detection Algorithm

Using AERI measurement data for cloud detection can be regarded as a classification problem, and the SVM algorithm is applied in this paper. SVM is an effective tool for small-sample, nonlinear, high-dimensional problems. Its core idea is to use a separating hyperplane as the basis of discrimination and to maximize the margin between the classes [52]. The main process of establishing the cloud detection algorithm is as follows (Figure 4):
  • Following the ranking of feature weights in Figure 3, we select a specific number of features, from 1 to 12. The selected features are used as the input of the SVM algorithm, while the cloud detection results of the ceilometer are used as reference values.
  • From each of Q2_Train_AWARE, Q2_Train_SGP, and Q2_Train_NSA, 1000 groups of cloudy data and 1000 groups of cloud-free data are randomly selected to form the training set (P1); the remaining data are used as the validation set (see Appendix A).
  • P1 is fed into the SVM algorithm to build the cloud detection model. The radial basis function (RBF) kernel is applied, and the kernel parameter g and penalty factor C are obtained with the grid search method [53]. The search range of C and g is $2^{-8}$ to $2^{8}$, and the search step is $2^{0.8}$. A cloud detection model is built for each pair of g and C searched, and the classification results of the validation set are then obtained with each model.
  • Determine the best g and C to obtain the best cloud detection model. We calculate the validation accuracy of every model, and the model with the maximum accuracy is considered the best cloud detection model.
  • Check whether the number of features in step 1 is greater than 12. If not, repeat steps 1, 2, 3, and 4.
  • We compare the detection models corresponding to different input features to determine the final cloud detection model. The corresponding input features are used as the best training features. Table 5 displays the accuracy on the validation set (PC) and the ability to detect cloudy data (TPR) and cloud-free data (TNR) of the cloud detection models for 1 to 12 features. The PC of the model increases with the number of features. However, when the number of features is greater than 9, the accuracy of the model decreases slightly, which may be caused by redundancy between features. At the same time, TPR and TNR are close to their maxima when the number of features is 9. This shows that the model has the best detection performance when the number of inputs is 9. Therefore, we finally selected features 1, 2, 3, 4, 6, 7, 8, 9, and 10 in Table 3 as the best training features. The corresponding penalty factor C and kernel parameter g are 5.278 and 1.741, respectively.
  • Use the final cloud detection model to obtain the cloud detection results from testing set.
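The grid search over C and g in the steps above can be sketched as follows, reading the search range as 2^−8 to 2^8 with a multiplicative step of 2^0.8. This is a sketch under stated assumptions rather than the authors' implementation: `evaluate` is a hypothetical callable that trains an RBF-kernel SVM on P1 with the given C and g and returns the validation accuracy (for instance, it could wrap scikit-learn's `SVC(kernel="rbf", C=C, gamma=g)`, if g is taken to correspond to `gamma`).

```python
def exponent_grid(lo=-8.0, hi=8.0, step=0.8):
    """Candidate values 2**lo, 2**(lo + step), ..., 2**hi."""
    n = int(round((hi - lo) / step))
    return [2.0 ** (lo + k * step) for k in range(n + 1)]

def grid_search(evaluate, grid=None):
    """Return (best accuracy, C, g) over all pairs in the grid.

    evaluate : hypothetical callable, evaluate(C, g) -> validation accuracy
    """
    grid = exponent_grid() if grid is None else grid
    best = None
    for C in grid:
        for g in grid:
            acc = evaluate(C, g)
            # Keep the first pair that reaches the highest accuracy.
            if best is None or acc > best[0]:
                best = (acc, C, g)
    return best
```

Under this reading the grid has 21 values per parameter (441 pairs), and the reported optimum lies on it: 2^2.4 ≈ 5.278 for C and 2^0.8 ≈ 1.741 for g.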

4. Results and Discussion

This section uses the cloud detection algorithm introduced in Section 3.3 to classify the testing set. Table 6 displays the percent correct, true positive rate, and true negative rate of the classification results of the cloud detection algorithm relative to the ceilometer detection method. The PC between our method and the ceilometer detection method is about 93%, indicating that the proposed algorithm performs well. The algorithm also generalizes well, with similarly high accuracy obtained at the SGP site, NSA site, and AWARE site. The TPR is more than 92%, which means the proposed method identifies more than 92% of the cloudy events. Likewise, the proposed method detects about 92% of the cloud-free events (TNR ≈ 92%). Nevertheless, the proposed algorithm misjudges some cases. We further divide the samples classified inconsistently by the proposed algorithm and the ceilometer into two categories. Category (1): AERI detects cloud-free, but the ceilometer detects cloudy. Category (2): AERI detects cloudy, but the ceilometer detects cloud-free (Table 6). To better understand the performance of cloud detection based on AERI measurement data, we analyze the results for different precipitable water vapor (PWV) values, different cloud-base height ranges, and different CODs. At the end of this section, we also evaluate the accuracy of the proposed algorithm under mist, thick fog, and blowing snow under clear sky.

4.1. Performance Evaluation in Different PWVs

Water vapor molecules strongly attenuate infrared radiation in the water vapor absorption bands, which interferes with cloud detection from AERI measurement data [54]. Finding a cloud detection algorithm that is less affected by water vapor is also a purpose of this paper. Therefore, before selecting features for distinguishing cloudy from cloud-free cases, we identify the water vapor absorption bands, which are 538–588 cm−1 and 1250–1350 cm−1 [55], and we avoid these channels in the selected features in Table 3. The MWRRET value-added product gives PWV data every 20 s, and we perform time-matching between AERI and this product. According to the distribution of PWV, the data are divided into three ranges: PWV < 1 cm, 1 cm ≤ PWV ≤ 2.5 cm, and PWV > 2.5 cm. The proportion of the data in each range relative to the total data is given in parentheses in Table 7. Compared with the SGP site, PWV in polar regions is significantly lower, especially at the AWARE site in Antarctica, where the PWV is always less than 1 cm. The PC between the proposed algorithm and the ceilometer detection method changes only slightly across the PWV ranges, as shown in Table 7. Therefore, the cloud detection algorithm is not significantly affected by PWV and has a high recognition rate for clouds in both high and low humidity.
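The PWV stratification can be sketched as a per-bin percent correct. The sketch assumes PWV is expressed in centimetres (inferred from the statement that PWV at the AWARE site is always below 1 cm) and that each AERI sample has already been matched to a PWV value; the function names and bin labels are hypothetical.

```python
def pwv_bin(pwv_cm):
    """Assign a PWV value (assumed to be in cm) to one of the three ranges."""
    if pwv_cm < 1.0:
        return "PWV < 1"
    if pwv_cm <= 2.5:
        return "1 <= PWV <= 2.5"
    return "PWV > 2.5"

def pc_by_pwv(pwv, predicted, reference):
    """Percent correct of AERI versus ceilometer labels within each PWV range."""
    totals, hits = {}, {}
    for w, p, r in zip(pwv, predicted, reference):
        key = pwv_bin(w)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + (1 if p == r else 0)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}
```

Ranges with no samples, such as the two higher ranges at a site where PWV never reaches 1 cm, simply do not appear in the returned dictionary.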

4.2. Performance Evaluation in Different Cloud-Base Height Ranges

Figure 5 displays the proportions of the total data, the data classified consistently by AERI and the ceilometer (consistent data), and category (1) in different cloud-base height ranges at the SGP site, NSA site, and AWARE site. In category (2), the ceilometer detects the data as cloud-free, so no cloud-base height is available and the distribution for category (2) is not displayed in Figure 5. The NSA site and AWARE site are located in polar regions. Compared with the SGP site (mid-latitude region), their clouds are lower, and the cloud-base height is mostly below 4000 m. The cloud-base height at the SGP site is relatively evenly distributed across heights, and only the proportion of clouds below 1000 m is slightly larger. However, the distribution of category (1) is very different from that of the total data. At the SGP site, the data of category (1) are mostly concentrated above 4000 m, while the proportion below 3000 m is smaller. At the NSA and AWARE sites, the proportion of high cloud in category (1) is also significantly higher than that in the total data. This means that AERI detects lower clouds better than higher clouds, which is consistent with the statistical results in Figure 5d. The proposed algorithm has high recognition ability for clouds below 4000 m (PC is greater than 90%), but for clouds above 6000 m, PC decreases significantly. In addition, the recognition accuracy of the proposed algorithm for high clouds at the SGP site is higher than that in polar regions. The reason for the different recognition performance of AERI for clouds at different heights may be that different types of clouds have different dimensions, opacities, and other properties [56]. In Figure 6a, we take the SGP site on 1 January 2015 as an example to compare the spectra corresponding to different cloud-base heights. The spectra with cloud-base heights of 1501 m (0357 UTC), 5685 m (0055 UTC), and 6514 m (0007 UTC) and the cloud-free spectrum (0237 UTC) are selected. Figure 6b displays the ceilometer backscatter at the times corresponding to the spectra in Figure 6a. The atmospheric downward infrared hyperspectral radiation is significantly stronger under low cloud. In contrast, the downward radiation under high cloud is small and almost coincides with that of the cloud-free scene. Therefore, the recognition rate of AERI for high clouds is reduced.

4.3. Performance Evaluation in Different CODs

The temporal resolution of the CODs from the RADFLUXANAL value-added product is about 20 s, and we perform time-matching between the AERI measurement data and the CODs. As shown in Figure 7, the distributions of the total data and the consistent data are basically the same, while category (1) is concentrated at small COD. Compared with the SGP site, the COD in polar regions is smaller. At the AWARE site, in particular, the COD is no more than 10. Figure 7d shows that the proposed algorithm maintains a high accuracy at the AWARE site under any COD (PC is about 95%). At the SGP site, the accuracy of the proposed algorithm depends strongly on COD. When the COD is less than 5, PC drops below 80%, making accurate identification of cloud difficult. From Section 4.2 and Section 4.3, we find that optically thin clouds and high clouds lead the proposed algorithm to misjudge cloudy scenes as cloud-free at the SGP site. Although the proposed algorithm has good recognition ability for optically thin clouds in polar regions, its performance for high clouds there is not ideal. The recognition performance of AERI for different types of clouds may be related to the proportions of different cloud-base heights and different CODs in the training set at the different sites.

4.4. Case Studies

The above sections analyze the cases in which the proposed algorithm fails to detect cloud. In fact, the proposed algorithm also recognizes some cloud-free scenes as cloudy. This may be closely related to the occurrence of fog, blowing snow (in polar regions), and other phenomena. The Antarctic Meteorological Research Center (AMRC) at the University of Wisconsin-Madison provides the weather phenomena recorded by human observers at McMurdo Station in three-hour periods [57]. Human observers reported mist, thick fog, and blowing snow in clear sky from 1900 UTC to 2100 UTC on 14 December, from 1500 UTC to 1800 UTC on 9 October, and from 2200 UTC to 2400 UTC on 8 April, respectively.

4.4.1. Mist: 14 December

On 14 December, there was mist with a thickness of nearly 200 m from 1900 UTC to 2100 UTC. The backscattering of ceilometer was relatively strong at the corresponding time. The TSI image reveals the situation of clouds in the sky, and we can see that it is cloud-free during that time. The proposed algorithm does not recognize mist as cloudy. Therefore, mist has little effect on the cloud detection accuracy of the proposed algorithm (Figure 8).

4.4.2. Thick Fog: 9 October

On 9 October, the AWARE site experienced thick fog between 1500 UTC and 1800 UTC. The TSI image shows that it was cloud-free during this time period, but the visibility was very poor. The low-level backscattering of the ceilometer was extremely strong, indicating that the particle number density of the fog was large. Thick fog significantly increases the atmospheric infrared radiation received by AERI, which leads the proposed algorithm to judge the thick fog as cloudy (Figure 9).

4.4.3. Blowing Snow in Clear Sky: 8 April

At about 1900 UTC on 8 April (black dotted line in Figure 10), the ceilometer cloud detection result was cloud-free, but the proposed algorithm recognized the scene as cloudy. Human observers recorded blowing snow under clear sky during this time period. The ceilometer showed strong backscattering at low levels, and the Parsivel2 data reveal that the particle number density exceeded 26 m−3 mm−1. The proposed algorithm misjudges the blowing snow layer near the ground as cloud.

5. Conclusions

The purpose of this paper is to reveal the potential of the machine learning approach for a fast, universal, and automated cloud detection algorithm. We use AERI measurement data from the SGP site, NSA site, and AWARE site to build a cloud detection algorithm based on the support vector machine (SVM) algorithm, and we select features outside the water-vapor-sensitive channels to avoid the interference caused by water vapor. The algorithm is tested on the data of the ARM SGP site and NSA site from 2015 to 2017 and the rest of the data of the AWARE site in 2016, and the testing results are compared with the data measured by the ceilometer. The percent correct between the cloud detection algorithm and the ceilometer detection is about 93% every year at each site. This accuracy indicates that cloudy and cloud-free samples can be detected accurately based on the selected features. The proposed algorithm also generalizes well to cloud scenes at different sites. However, the experimental results show that the ability of the proposed algorithm to detect high clouds and optically thin clouds decreases at the SGP site, and that it struggles to accurately detect high clouds in polar regions. This may be related to the proportions of different clouds in the training set. The proposed algorithm can accurately distinguish cloudy from cloud-free conditions in the presence of mist. However, in thick fog or blowing snow, the proposed algorithm misjudges cloud-free scenes as cloudy. In future research, we will increase the number of samples of high clouds and optically thin clouds to improve the recognition rate of the proposed algorithm for these clouds. We will also compare the proposed algorithm with the cloud detection results of millimeter wave cloud radar to further verify the accuracy.

Author Contributions

Conceptualization, L.L. and J.Y.; methodology, L.L., J.Y. and S.L.; software, J.Y.; validation, L.L., J.Y., S.L., S.H. and Q.W.; writing—original draft preparation, L.L.; writing—review and editing, L.L.; and supervision, J.Y. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (Grant 2021YFC2802501), the National Natural Science Foundation of China (Grant 41875025, 62105367 and 42175154) and the Hunan Provincial Natural Science Foundation of China (Grant 2021JJ10047 and 2020JJ4662).

Data Availability Statement

The data of ARM SGP site, NSA site, and AWARE site (including AERI, Vaisala CL31 ceilometer, TSI, etc.) can be downloaded at https://adc.arm.gov/discovery/#/ (accessed on 19 January 2022). Human observation data can be downloaded at https://minds.wisconsin.edu/handle/1793/79355 (accessed on 19 January 2022).

Acknowledgments

The authors would like to thank the U.S. Department of Energy (DOE) Atmospheric Radiation Measurement program for providing the data of the SGP site, NSA site, and AWARE site, and the Antarctic Meteorological Research Center (AMRC) and the McMurdo Station Meteorology Office for the meteorological data recorded by human observers.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Number of Samples Selected for Training

To determine the optimal number of training samples, 200, 500, 1000, 2000, 3000, 4000, and 5000 groups of cloudy data and the same numbers of groups of cloud-free data are randomly selected from each of Q2_Train_AWARE, Q2_Train_SGP, and Q2_Train_NSA for training. Using the training features in Table 3, Table A1 shows the classification results of the testing set based on the proposed algorithm for different numbers of training samples. When the number of training samples is small, the accuracy of the model increases as the training set grows. However, once the number of training samples reaches a certain level, the accuracy on the testing set changes little. Considering the accuracy of the model and the time cost of training, 1000 groups of cloudy data and 1000 groups of cloud-free data from each of Q2_Train_AWARE, Q2_Train_SGP, and Q2_Train_NSA are selected for training (a total of 6000 groups).
Table A1. The classification results of testing set based on the proposed algorithm under different number of samples for training. The number in each PC column header indicates the number of samples used for training.

| Site | Data Set | PC (%), 1200 | PC (%), 3000 | PC (%), 6000 | PC (%), 12,000 | PC (%), 18,000 | PC (%), 24,000 | PC (%), 30,000 |
|---|---|---|---|---|---|---|---|---|
| SGP | Q2_Test2015_SGP | 87.16 | 91.30 | 94.50 | 94.83 | 94.99 | 95.13 | 95.15 |
| SGP | Q2_Test2016_SGP | 85.99 | 89.75 | 93.23 | 93.36 | 93.50 | 93.61 | 93.62 |
| SGP | Q2_Test2017_SGP | 86.81 | 90.66 | 94.04 | 94.25 | 94.38 | 94.46 | 94.49 |
| NSA | Q2_Test2015_NSA | 87.11 | 90.36 | 93.70 | 94.33 | 94.45 | 94.47 | 94.49 |
| NSA | Q2_Test2016_NSA | 86.27 | 89.42 | 92.81 | 93.38 | 93.74 | 93.77 | 93.77 |
| NSA | Q2_Test2017_NSA | 85.86 | 89.08 | 92.49 | 92.97 | 93.17 | 93.21 | 93.25 |
| AWARE | Q2_Test_AWARE | 86.30 | 91.27 | 93.95 | 94.35 | 94.62 | 94.72 | 94.74 |
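The sampling scheme described in Appendix A can be sketched as follows. This is only an illustration under stated assumptions: the function names (`balanced_subset`, `build_training_set`, `total_training_size`), the fixed random seed, and the toy sample pools are hypothetical and are not taken from the paper. The sketch draws the same number of cloudy and cloud-free groups from each of the three training pools and reproduces the training-set sizes in the header of Table A1.

```python
import random

# Groups drawn per class (cloudy / cloud-free) and per site, as listed in Appendix A.
SIZES_PER_CLASS = [200, 500, 1000, 2000, 3000, 4000, 5000]
SITES = ["AWARE", "SGP", "NSA"]


def balanced_subset(cloudy, cloud_free, n_per_class, rng):
    """Draw n_per_class cloudy and n_per_class cloud-free samples without replacement."""
    picked = rng.sample(cloudy, n_per_class) + rng.sample(cloud_free, n_per_class)
    rng.shuffle(picked)
    return picked


def build_training_set(pools, n_per_class, seed=0):
    """pools maps a site name to a (cloudy_samples, cloud_free_samples) pair of lists."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability only
    training = []
    for site in SITES:
        cloudy, cloud_free = pools[site]
        training.extend(balanced_subset(cloudy, cloud_free, n_per_class, rng))
    return training


def total_training_size(n_per_class, n_sites=len(SITES)):
    """Two classes per site, n_per_class groups per class."""
    return 2 * n_sites * n_per_class


# Toy pools standing in for Q2_Train_AWARE, Q2_Train_SGP, and Q2_Train_NSA.
toy_pools = {
    site: (
        [(site, "cloudy", i) for i in range(6000)],
        [(site, "cloud-free", i) for i in range(6000)],
    )
    for site in SITES
}

for n in SIZES_PER_CLASS:
    print(n, len(build_training_set(toy_pools, n)), total_training_size(n))
```

With 1000 groups per class and site, the sketch yields the 6000-group training set selected in Appendix A; each of the seven training sets would then be passed to the classifier and scored on the testing sets.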

References

  1. Goetz, A.F.H. Three decades of hyperspectral remote sensing of the Earth: A personal view. Remote Sens. Environ. 2009, 113, S5–S16. [Google Scholar] [CrossRef]
  2. Macdonald, J.S.; Ustin, S.L.; Schaepman, M.E. The contributions of Dr. Alexander F.H. Goetz to imaging spectrometry. Remote Sens. Environ. 2009, 113, S2–S4. [Google Scholar] [CrossRef]
  3. Govender, M.; Chetty, K.; Bulcock, H. A review of hyperspectral remote sensing and its application in vegetation and water resource studies. Water S A 2007, 33, 145–151. [Google Scholar] [CrossRef] [Green Version]
  4. Vorovencii, I. The hyperspectral sensors used in satellite and aerial remote sensing. Bull. Transilv. Univ. Bras. Ser. II 2009, 2, 51–56. [Google Scholar]
  5. Wang, L.; Han, Y.; Han; Jin, X.; Chen, Y.; Tremblay, D.A. Radiometric consistency assessment of hyperspectral infrared sounders. Atmos. Meas. Tech. 2015, 8, 4831–4844. [Google Scholar] [CrossRef] [Green Version]
  6. Shimoda, H.; Ogawa, T. Interferometric monitor for greenhouse gases (IMG). Adv. Space Res. 1994, 25, 937–946. [Google Scholar] [CrossRef]
  7. Chahine, M.T.; Pagano, T.S.; Aumann, H.H.; Atlas, R.; Granger, S. AIRS: Improving Weather Forecasting and Providing New Data on Greenhouse Gases. Bull. Am. Meteorol. Soc. 2006, 87, 911–926. [Google Scholar] [CrossRef] [Green Version]
  8. Tobin, D.C.; Revercomb, H.E.; Knuteson, R.O.; Lesht, B.M.; Strow, L.L.; Hannon, S.E.; Feltz, W.F.; Moy, L.A.; Fetzer, E.J.; Cress, T.S. Atmospheric Radiation Measurement site atmospheric state best estimates for Atmospheric Infrared Sounder temperature and water vapor retrieval validation. J. Geophys. Res. Atmos. 2006, 111. [Google Scholar] [CrossRef] [Green Version]
  9. Hilton, F.; Armante, R.; August, T.; Barnet, C.; Zhou, D. Hyperspectral earth observation from IASI. Bull. Am. Meteorol. Soc. 2012, 93, 347–370. [Google Scholar] [CrossRef]
  10. Menzel, W.P.; Schmit, T.J.; Zhang, P.; Li, J. Satellite-Based Atmospheric Infrared Sounder Development and Applications. Bull. Am. Meteorol. Soc. 2018, 99, 583–603. [Google Scholar] [CrossRef]
  11. Han, Y.; Revercomb, H.; Cromp, M.; Gu, D.; Johnson, D.; Mooney, D.; Scott, D.; Strow, L.; Bingham, G.; Borg, L.; et al. Suomi NPP CrIS measurements, sensor data record algorithm, calibration and validation activities, and record data quality. J. Geophys. Res. Atmos. 2013, 118, 12734–12748. [Google Scholar] [CrossRef]
  12. Guo, Q.; Lu, F.; Wei, C.; Zhang, Z.; Yang, J. Introducing the New Generation of Chinese Geostationary Weather Satellites, Fengyun-4. Bull. Am. Meteorol. Soc. 2017, 98, 1637–1658. [Google Scholar] [CrossRef]
  13. Smith, W.L.; Frey, R. On cloud altitude determinations from High Resolution Interferometer Sounder (HIS) observations. J. Appl. Meteorol. 1990, 29, 658–662. [Google Scholar] [CrossRef] [Green Version]
  14. Diak, G.R.; Scheuer, C.J.; Whipple, M.S.; Smith, W.L. Remote sensing of land-surface energy balance using data from the high-resolution interferometer sounder (HIS): A simulation study. Remote Sens. Environ. 1994, 48, 106–118. [Google Scholar] [CrossRef]
  15. Knuteson, R.O.; Revercomb, H.E.; Best, F.A.; Ciganovich, N.C.; Dedecker, R.G.; Dirkx, T.P.; Ellington, S.C.; Feltz, W.F.; Garcia, R.K.; Howell, H.B.; et al. Atmospheric emitted radiance interferometer. part I: Instrument design. J. Atmos. Ocean. Technol. 2004, 21, 1763–1776. [Google Scholar] [CrossRef]
  16. Tobin, D.C.; Best, F.A.; Brown, P.D.; Clough, S.A.; Dedecker, R.G.; Ellingson, R.G.; Garcia, R.K.; Howell, H.B.; Knuteson, R.O.; Mlawer, E.J.; et al. Downwelling spectral radiance observations at the SHEBA ice station: Water vapor continuum measurements from 17 to 26 μm. J. Geophys. Res. Atmos. 1999, 104, 2081–2092. [Google Scholar] [CrossRef]
  17. Turner, D.D.; Tobin, D.C.; Clough, S.A.; Brown, P.D.; Ellingson, R.G.; Mlawer, E.J.; Knuteson, R.O.; Revercomb, H.E.; Shippert, T.R.; Smith, W.L. The QME AERI LBLRTM: A Closure Experiment for Downwelling High Spectral Resolution Infrared Radiance. J. Atmos. Sci. 2004, 61, 2657–2675. [Google Scholar] [CrossRef]
  18. Ye, J.; Liu, L.; Wang, Q.; Hu, S.; Li, S. A Novel Machine Learning Algorithm for Planetary Boundary Layer Height Estimation Using AERI Measurement Data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1002305. [Google Scholar] [CrossRef]
  19. Hansell, R.A.; Liou, K.N.; Ou, S.C.; Tsay, S.C.; Ji, Q.; Reid, J.S. Remote sensing of mineral dust aerosol using AERI during the UAE(2): A modeling and sensitivity study. J. Geophys. Res. Atmos. 2008, 113, D18202. [Google Scholar] [CrossRef]
  20. Yurganov, L.; Mcmillan, W.; Wilson, C.; Fischer, M.; Biraud, S.; Sweeney, C. Carbon monoxide mixing ratios over Oklahoma between 2002 and 2009 retrieved from Atmospheric Emitted Radiance Interferometer spectra. Atmos. Meas. Tech. 2010, 3, 1319–1331. [Google Scholar] [CrossRef] [Green Version]
  21. Spnkuch, D.; Dhler, W.; Güldner, J.; Schulz, E. Estimation of the amount of tropospheric ozone in a cloudy sky by ground-based Fourier-transform infrared emission spectroscopy. Appl. Opt. 1998, 37, 3133–3142. [Google Scholar] [CrossRef] [PubMed]
  22. Turner, D.D.; Ackerman, S.A.; Baum, B.A.; Revercomb, H.E.; Yang, P. Cloud Phase Determination Using Ground-Based AERI Observations at SHEBA. J. Appl. Meteorol. 2003, 42, 701–715. [Google Scholar] [CrossRef] [Green Version]
  23. Rowe, P.M.; Cox, C.J.; Walden, V.P. Toward autonomous surface-based infrared remote sensing of polar clouds: Cloud-height retrievals. Atmos. Meas. Tech. 2016, 9, 3641–3659. [Google Scholar] [CrossRef] [Green Version]
  24. Pan, L.J.; Lu, D.R. Cloud Base Height and Effective Cloud Emissivity Retrieval with Ground-Based Infrared Interferometer. Atmos. Ocean. Sci. Lett. 2012, 5, 439–444. [Google Scholar] [CrossRef] [Green Version]
  25. Turner, D.D. Ground-based infrared retrievals of optical depth, effective radius, and composition of airborne mineral dust above the Sahel. J. Geophys. Res. 2008, 113. [Google Scholar] [CrossRef] [Green Version]
  26. Garrett, T.J.; Zhao, C. Ground-based remote sensing of thin clouds in the Arctic. Atmos. Meas. Tech. 2013, 6, 1227–1243. [Google Scholar] [CrossRef] [Green Version]
  27. Feltz, W.F.; Smith, W.L.; Howell, H.B.; Knuteson, R.O.; Revercomb, H.E. Near-Continuous Profiling of Temperature, Moisture, and Atmospheric Stability Using the Atmospheric Emitted Radiance Interferometer (AERI). J. Appl. Meteorol. 2003, 42, 584–597. [Google Scholar] [CrossRef]
  28. Kang, S.H.; Goo, T.Y.; Ou, M.L. Improvement of AERIT/qRetrievals and Their Validation at Anmyeon-Do, South Korea. J. Atmos. Ocean. Technol. 2013, 30, 1433–1446. [Google Scholar] [CrossRef]
  29. Feltz, W.F.; Knuteson, R.O.; Smith, W.L. Meteorological applications of temperature and water vapor retrievals from the ground-based Atmospheric Emitted Radiance Interferometer (AERI). J. Appl. Meteorol. 1998, 37, 857–875. [Google Scholar] [CrossRef]
  30. Feltz, W.F.; Turner, D.D.; Howell, H.B.; Smith, W.L.; Knuteson, R.O.; Woolf, H.M.; Comstock, J.; Sivaraman, C.; Mahon, R.; Halter, T. Retrieving Temperature and Moisture Profiles from AERI Radiance Observations: AERIPROF Value-Added Product Technical Description. DOE/SC-ARM/TR-066.1. Available online: https://www.arm.gov/publications/tech_reports/doe-sc-arm-tr-066.1.pdf (accessed on 19 January 2022).
  31. Münkel, C.; Eresmaa, N.; Rasanen, J.; Karppinen, A. Retrieval of mixing height and dust concentration with lidar ceilometer. Bound. -Layer Meteorol. 2007, 124, 117–128. [Google Scholar] [CrossRef]
  32. Sassen, K. Advances in polarization diversity lidar for cloud remote sensing. Proc. IEEE 1994, 82, 1907–1914. [Google Scholar] [CrossRef]
  33. Li, J. Accounting for overlap of fractional cloud in infrared radiation. Q. J. R. Meteorol. Soc. 2000, 126, 3325–3342. [Google Scholar] [CrossRef]
  34. Cho, J.S.; Goo, T.Y.; Shin, J. Improvement of the Site-Specific Cloud Filtering Method Using AERI Spectrum at Anmyeon island, South Korea. In Proceedings of the 2015 NDACC-IRWG & TCCON Meeting, Toronto, ON, Canada, 8–12 June 2015. [Google Scholar]
  35. Rizzi, R.; Arosio, C.; Maestri, T.; Palchetti, L.; Bianchini, G.; Guasta, M.D. One year of downwelling spectral radiance measurements from 100 to 1400 cm−1 at Dome Concordia: Results in clear conditions. J. Geophys. Res. Atmos. 2016, 121, 10937–10953. [Google Scholar] [CrossRef] [Green Version]
  36. Knuteson, R.O.; Revercomb, H.E.; Best, F.A.; Ciganovich, N.C.; Dedecker, R.G.; Dirkx, T.P.; Ellington, S.C.; Feltz, W.F.; Garcia, R.K.; Howell, H.B.; et al. Atmospheric emitted radiance interferometer. part II: Instrument performance. J. Atmos. Ocean. Technol. 2004, 21, 1777–1789. [Google Scholar] [CrossRef]
  37. Demirgian, J.; Dedecker, R. Atmospheric Emitted Radiance Interferometer (AERI) Handbook; ARM TR-054; U.S. Dept. Energy: Washington, DC, USA, 2005. Available online: https://www.arm.gov/publications/tech_reports/handbooks/aeri_handbook.pdf (accessed on 19 January 2022).
  38. Stokes, G.M.; Schwartz, S.E. The Atmospheric Radiation Measurement (ARM) Program: Programmatic Background and Design of the Cloud and Radiation Test Bed. Bull. Am. Meteorol. Soc. 1994, 75, 1201–1221. [Google Scholar] [CrossRef]
  39. Turner, D.D.; Blumberg, W.G. Improvements to the AERIoe Thermodynamic Profile Retrieval Algorithm. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 12, 1339–1354. [Google Scholar] [CrossRef]
  40. Lubin, D.; Zhang, D.; Silber, I.; Scott, R.C.; Kalogeras, P.; Battaglia, A.; Bromwich, D.H.; Cadeddu, M.; Eloranta, E.; Fridlind, A.; et al. AWARE: The Atmospheric Radiation Measurement (ARM) West Antarctic Radiation Experiment. Bull. Am. Meteorol. Soc. 2020, 101, E1069–E1091. [Google Scholar] [CrossRef] [Green Version]
  41. Young, J.S.; Whiteman, C.D. Laser Ceilometer Investigation of Persistent Wintertime Cold-Air Pools in Utah’s Salt Lake Valley. J. Appl. Meteorol. Climatol. 2015, 54, 752–765. [Google Scholar] [CrossRef]
  42. Morris, V.R. Ceilometer Instrument Handbook; DOE/SC-ARM-TR-020; U.S. Dept. Energy: Washington, DC, USA, 2016. Available online: https://www.arm.gov/publications/tech_reports/handbooks/ceil_handbook.pdf (accessed on 19 January 2022).
  43. Martucci, G.; Milroy, C.; O’Dowd, C.D. Detection of Cloud-Base Height Using Jenoptik CHM15K and Vaisala CL31 Ceilometers. J. Atmos. Ocean. Technol. 2010, 27, 305–318. [Google Scholar] [CrossRef]
  44. Morris, V.R. Total Sky Imager (TSI) Handbook; ARM TR-017; U.S. Dept. Energy: Washington, DC, USA, 2005. Available online: https://www.arm.gov/publications/tech_reports/handbooks/tsi_handbook.pdf (accessed on 19 January 2022).
  45. Bartholomew, M.J. Laser Disdrometer Instrument Handbook; DOE/SC-ARM-TR-137; U.S. Dept. Energy: Washington, DC, USA, 2020. Available online: https://www.arm.gov/publications/tech_reports/handbooks/ldis_handbook.pdf (accessed on 19 January 2022).
  46. Gaustad, K.L.; Turner, D.D.; McFarlane, S.A. MWRRET Value-Added Product: The Retrieval of Liquid Water Path and Precipitable Water Vapor from Microwave Radiometer (MWR) Data Sets; DOE/SC-ARM/TR-081.2; U.S. Dept. Energy: Washington, DC, USA, 2011. Available online: https://www.arm.gov/publications/tech_reports/doe-sc-arm-tr-081.2.pdf (accessed on 19 January 2022).
  47. Riihimaki, L.D.; Gaustad, K.L.; Long, C.N. Radiative Flux Analysis (RADFLUXANAL) Value-Added Product: Retrieval of Clear-Sky Broadband Radiative Fluxes and Other Derived Values; DOE/SC-ARM-TR-228; U.S. Dept. Energy: Washington, DC, USA, 2019. Available online: https://www.arm.gov/publications/tech_reports/doe-sc-arm-tr-228.pdf (accessed on 19 January 2022).
  48. Mariani, Z.; Strong, K.; Palm, M.; Lindenmaier, R.; Adams, C.; Zhao, X.; Savastiouk, V.; McElroy, C.T.; Goutail, F.; Drummond, J.R. Year-round retrievals of trace gases in the Arctic using the Extended-range Atmospheric Emitted Radiance Interferometer. Atmos. Meas. Tech. 2013, 6, 1549–1565. [Google Scholar] [CrossRef] [Green Version]
  49. Zou, Z.X.; Yang, Y.M.; Fan, Z.Q.; Tang, H.M.; Zou, M.; Hu, X.L.; Xiong, C.R.; Ma, J.W. Suitability of data preprocessing methods for landslide displacement forecasting. Stoch. Environ. Res. Risk Assess. 2020, 34, 1105–1119. [Google Scholar] [CrossRef]
  50. Menze, B.H.; Kelm, M.B.; Masuch, R.; Himmelreich, U.; Bachert, P.; Petrich, W.; Hamprecht, F.A. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinform. 2009, 10, 213. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  51. Joro, S.; Hyvarinen, O.; Kotro, J. Comparison of Satellite Cloud Masks with Ceilometer Sky Conditions in Southern Finland. J. Appl. Meteorol. Climatol. 2010, 49, 2508–2526. [Google Scholar] [CrossRef]
  52. Li, Y.Z.; Xie, P.C.; Tang, Z.S.; Jiang, T.; Qi, P.H. SVM-Based Sea-Surface Small Target Detection: A False-Alarm-Rate-Controllable Approach. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1225–1229. [Google Scholar] [CrossRef] [Green Version]
  53. Wang, X.; Zhang, F.; Kung, H.T.; Johnson, V.C.; Latif, A. Extracting soil salinization information with a fractional-order filtering algorithm and grid-search support vector machine (GS-SVM) model. Int. J. Remote Sens. 2019, 41, 953–973. [Google Scholar] [CrossRef]
  54. Loehnert, U.; Turner, D.D.; Crewell, S. Ground-Based Temperature and Humidity Profiling Using Spectral Infrared and Microwave Observations. Part I: Simulated Retrieval Performance in Clear-Sky Conditions. J. Appl. Meteorol. Climatol. 2008, 48, 1017–1032. [Google Scholar] [CrossRef]
  55. Ou, S.C.; Chen, Y.; Liou, K.N.; Cosh, M.; Brutsaert, W. Satellite remote sensing of land surface temperatures: Application of the atmospheric correction method and split-window technique to data of ARM-SGP site. Int. J. Remote Sens. 2002, 23, 5177–5192. [Google Scholar] [CrossRef]
  56. Chang, F.L.; Coakley, J.A. Relationships between Marine Stratus Cloud Optical Depth and Temperature: Inferences from AVHRR Observations. J. Clim. 2007, 20, 2022–2036. [Google Scholar] [CrossRef] [Green Version]
  57. Loeb, N.A.; Kennedy, A. Blowing Snow at McMurdo Station, Antarctica During the AWARE Field Campaign: Surface and Ceilometer Observations. J. Geophys. Res. Atmos. 2021, 126, e2020JD033935. [Google Scholar] [CrossRef]
Figure 1. Blue lines indicate the obvious outliers of AERI (a–d). Gray background parts represent the corresponding features in Table 1. Error-case1 can be found based on feature 5, error-case2 based on features 1 and 2, error-case3 based on features 1–4, and error-case4 based on features 3 and 4. Red lines show two typical ideal examples without noise (e,f).
Figure 2. (a) Comparison of typical cloud-free and cloudy spectra in the three sites. The background parts in (a) represent the corresponding features in Table 3. (b) Echo map of ceilometer on 21 January 2016 of AWARE site. (c) Echo map of ceilometer on 3 February 2014 of SGP site. (d) Echo map of ceilometer on 13 January 2014 of NSA site.
Figure 3. The weight of different features (F) in Table 3 for distinguishing cloudy data and cloud-free data in the training set.
Figure 4. Research process of the cloud detection algorithm based on AERI measurement data.
Figure 5. The proportion of total data, consistent data, and category (1) under different cloud-base heights of total testing set at (a) SGP site, (b) NSA site, and (c) AWARE site. (d) The PC of SGP site, NSA site, and AWARE site under different cloud-base heights.
Figure 6. (a) Measurement of AERI under different cloud-base heights. (b) Backscattered echo of ceilometer from 0000 UTC to 0500 UTC on 1 January 2015. Background parts represent the corresponding time of the spectra in (a).
Figure 7. The proportion of total data, consistent data, and category (1) under different CODs of total testing set at (a) SGP site, (b) NSA site, and (c) AWARE site. (d) The PC of SGP site, NSA site, and AWARE site under different CODs.
Figure 8. (a) Backscattered echo of ceilometer on 14 December 2016. Blue dotted line corresponds to 1900 UTC, red dotted line corresponds to 2000 UTC, and black dotted line corresponds to 2100 UTC. (b) TSI images at 1900 UTC, 2000 UTC, and 2100 UTC on 14 December 2016.
Figure 9. (a) Backscattered echo of ceilometer on 9 October 2016. Blue dotted line corresponds to 1630 UTC, red dotted line corresponds to 1700 UTC, and black dotted line corresponds to 1800 UTC. (b) TSI images at 1630 UTC, 1700 UTC, and 1800 UTC on 9 October 2016.
Figure 10. (a) Backscattered echo map of ceilometer and (b) the average particle number density of parsivel2 on 8 April 2016. Black dotted line corresponds to 1900 UTC.
Table 1. Spectral features of obvious outliers corresponding to gray background parts in Figure 1. RU is the radiation unit (mW sr⁻¹ m⁻² (cm⁻¹)⁻¹).

| Number | Features |
|---|---|
| 1 | The slope of 1000–1040 cm⁻¹ band radiation is less than −0.2 |
| 2 | The intercept of 1000–1040 cm⁻¹ band radiation is greater than 300 RU |
| 3 | The standard deviation of 857–862 cm⁻¹ band radiation is greater than 10 RU |
| 4 | The standard deviation of 894–902 cm⁻¹ band radiation is greater than 5 RU |
| 5 | The number of the points with radiation less than 0 RU in 520–1800 cm⁻¹ is more than 5 |
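The outlier criteria in Table 1 can be expressed as a short screening routine. The sketch below is hedged: the names `linear_fit`, `band`, `outlier_flags`, and `is_outlier` are hypothetical, the slope and intercept are assumed to come from an ordinary least-squares fit of radiance against wavenumber, the standard deviation is assumed to be the population form, and flagging a spectrum when any single criterion is met is an assumption rather than a rule stated in the table.

```python
from statistics import pstdev


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def band(wn, rad, lo, hi):
    """Wavenumbers (cm^-1) and radiances (RU) of the channels with lo <= wn <= hi."""
    pairs = [(w, r) for w, r in zip(wn, rad) if lo <= w <= hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def outlier_flags(wn, rad):
    """Evaluate the five criteria of Table 1; keys are the feature numbers."""
    x, y = band(wn, rad, 1000.0, 1040.0)
    slope, intercept = linear_fit(x, y)  # fit against wavenumber is an assumption
    _, y3 = band(wn, rad, 857.0, 862.0)
    _, y4 = band(wn, rad, 894.0, 902.0)
    _, y5 = band(wn, rad, 520.0, 1800.0)
    return {
        1: slope < -0.2,
        2: intercept > 300.0,
        3: pstdev(y3) > 10.0,  # population standard deviation is an assumption
        4: pstdev(y4) > 5.0,
        5: sum(1 for r in y5 if r < 0.0) > 5,
    }


def is_outlier(wn, rad):
    """Assumed rule: a spectrum is rejected if any criterion is met."""
    return any(outlier_flags(wn, rad).values())
```

Here `wn` and `rad` are equal-length sequences holding one AERI spectrum; returning the individual flags makes it possible to see which criterion identified a given error case.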
Table 2. Cloud detection database. The data in parentheses indicates 6000 groups of cloud-free samples and 6000 groups of cloudy samples of Q2_Train_SGP and Q2_Train_NSA for training after reselection.

| Site | Data Set | Total Samples | Cloud-Free Samples | Cloudy Samples |
|---|---|---|---|---|
| SGP | Q2_Train_SGP | 51,074 (12,000) | 35,497 (6000) | 15,577 (6000) |
| SGP | Q2_Test2015_SGP | 50,921 | 35,346 | 15,575 |
| SGP | Q2_Test2016_SGP | 53,478 | 38,288 | 15,190 |
| SGP | Q2_Test2017_SGP | 51,291 | 35,925 | 15,366 |
| NSA | Q2_Train_NSA | 34,829 (12,000) | 14,304 (6000) | 20,525 (6000) |
| NSA | Q2_Test2015_NSA | 37,117 | 14,232 | 22,885 |
| NSA | Q2_Test2016_NSA | 31,840 | 12,160 | 19,680 |
| NSA | Q2_Test2017_NSA | 32,172 | 11,568 | 20,604 |
| AWARE | Q2_Train_AWARE | 12,000 | 6000 | 6000 |
| AWARE | Q2_Test_AWARE | 32,573 | 17,732 | 14,841 |
Table 3. Twelve selected features for distinguishing cloudy or cloud-free data. Features 1, 2, and 5–12 correspond to the black background parts of Figure 2a, while features 3 and 4 correspond to the purple background parts.

| Number | Features |
|---|---|
| 1 | The slope of 740–760 cm⁻¹ band radiation |
| 2 | The intercept of 740–760 cm⁻¹ band radiation |
| 3 | The slope of 780–920 cm⁻¹ band radiation |
| 4 | The intercept of 780–920 cm⁻¹ band radiation |
| 5 | The slope of 1000–1040 cm⁻¹ band radiation |
| 6 | The intercept of 1000–1040 cm⁻¹ band radiation |
| 7 | The slope of 1050–1070 cm⁻¹ band radiation |
| 8 | The ratio between 784.5 cm⁻¹ band radiation and the average radiation in 781.5–782.5 cm⁻¹ band |
| 9 | The ratio between 791.5 cm⁻¹ band radiation and the average radiation in 789.2–790.2 cm⁻¹ band |
| 10 | The ratio between 1174 cm⁻¹ and 1170 cm⁻¹ band radiation |
| 11 | The ratio between 1187 cm⁻¹ and 1185 cm⁻¹ band radiation |
| 12 | The ratio between 1198 cm⁻¹ and 1195 cm⁻¹ band radiation |
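As an illustration of how the twelve features in Table 3 might be computed from a single spectrum, a minimal sketch is given below. Several choices in it are assumptions and not statements from the paper: the helper names (`fit_line`, `in_band`, `band_mean`, `at`, `cloud_features`) are hypothetical, slopes and intercepts are taken from a least-squares fit of radiance against wavenumber, "the ratio between A and B" is read as A divided by B, and a single-wavenumber radiance is taken from the nearest available channel.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def in_band(wn, rad, lo, hi):
    """Channels with lo <= wavenumber <= hi (cm^-1)."""
    x = [w for w in wn if lo <= w <= hi]
    y = [r for w, r in zip(wn, rad) if lo <= w <= hi]
    return x, y


def band_mean(wn, rad, lo, hi):
    """Average radiance over a band."""
    _, y = in_band(wn, rad, lo, hi)
    return sum(y) / len(y)


def at(wn, rad, target):
    """Radiance of the channel nearest to the target wavenumber (assumption)."""
    i = min(range(len(wn)), key=lambda k: abs(wn[k] - target))
    return rad[i]


def cloud_features(wn, rad):
    """Return the twelve features in the order of Table 3."""
    s1, i1 = fit_line(*in_band(wn, rad, 740.0, 760.0))
    s3, i3 = fit_line(*in_band(wn, rad, 780.0, 920.0))
    s5, i5 = fit_line(*in_band(wn, rad, 1000.0, 1040.0))
    s7, _ = fit_line(*in_band(wn, rad, 1050.0, 1070.0))
    return [
        s1, i1, s3, i3, s5, i5, s7,
        at(wn, rad, 784.5) / band_mean(wn, rad, 781.5, 782.5),
        at(wn, rad, 791.5) / band_mean(wn, rad, 789.2, 790.2),
        at(wn, rad, 1174.0) / at(wn, rad, 1170.0),
        at(wn, rad, 1187.0) / at(wn, rad, 1185.0),
        at(wn, rad, 1198.0) / at(wn, rad, 1195.0),
    ]
```

The resulting twelve-element vector is the kind of input that would be passed, one vector per spectrum, to the classifier.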
Table 4. 2 × 2 confusion matrix of the comparison between our method and the ceilometer detection method. In this table, TP is the number of the events that the ceilometer reports cloudy and AERI reports cloudy. FP is the number of the events that ceilometer reports cloud-free but AERI reports cloudy. FN is the number of the events that ceilometer reports cloudy but AERI reports cloud-free. TN is the number of the events that ceilometer reports cloud-free and AERI reports cloud-free.

| Our Method (AERI) | Ceilometer Detection: Cloudy | Ceilometer Detection: Cloud-Free |
|---|---|---|
| Cloudy | TP (True Positive) | FP (False Positive) |
| Cloud-free | FN (False Negative) | TN (True Negative) |
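The scores reported alongside this confusion matrix can be illustrated with a short calculation. The sketch assumes the usual definitions, which are not restated in this section: PC as the percentage of all events on which AERI and the ceilometer agree, TPR = TP/(TP + FN), and TNR = TN/(TN + FP). The function name `agreement_scores` is hypothetical. The example counts are back-computed for Q2_Test2015_SGP from Tables 2 and 6 under the further assumption that Category (1) counts FN events and Category (2) counts FP events.

```python
def agreement_scores(tp, fp, fn, tn):
    """Percent correct, true positive rate, and true negative rate, all in percent."""
    total = tp + fp + fn + tn
    return {
        "PC": 100.0 * (tp + tn) / total,
        "TPR": 100.0 * tp / (tp + fn),
        "TNR": 100.0 * tn / (tn + fp),
    }


# Q2_Test2015_SGP: 15,575 cloudy and 35,346 cloud-free samples (Table 2).
# Assumption: Category (1) = FN = 1133 and Category (2) = FP = 1667 (Table 6).
fn, fp = 1133, 1667
tp = 15575 - fn
tn = 35346 - fp

scores = agreement_scores(tp, fp, fn, tn)
print({name: round(value, 2) for name, value in scores.items()})
# PC, TPR, and TNR round to 94.50, 92.73, and 95.28
```

Under these assumptions the rounded values agree with the PC, TPR, and TNR listed for Q2_Test2015_SGP in Table 6, which supports this reading of the two categories.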
Table 5. The detection results of the validation set of the cloud detection model corresponding to the number of features from 1 to 12.

| Number of Features | PC (%) | TPR (%) | TNR (%) |
|---|---|---|---|
| 1 | 84.78 | 83.15 | 86.41 |
| 2 | 87.02 | 87.64 | 86.40 |
| 3 | 88.39 | 89.15 | 87.63 |
| 4 | 89.41 | 89.97 | 88.85 |
| 5 | 90.54 | 91.99 | 89.08 |
| 6 | 91.36 | 92.69 | 90.02 |
| 7 | 91.62 | 93.05 | 90.18 |
| 8 | 92.26 | 93.51 | 91.01 |
| 9 | 92.79 | 93.51 | 92.07 |
| 10 | 92.74 | 93.55 | 91.92 |
| 11 | 92.72 | 93.46 | 91.99 |
| 12 | 92.72 | 93.37 | 92.06 |
Table 6. The classification results of testing set based on the proposed algorithm.

| Site | Data Set | Total Samples | PC (%) | TPR (%) | TNR (%) | Category (1) | Category (2) |
|---|---|---|---|---|---|---|---|
| SGP | Q2_Test2015_SGP | 50,921 | 94.50 | 92.73 | 95.28 | 1133 | 1667 |
| SGP | Q2_Test2016_SGP | 53,478 | 93.23 | 91.91 | 93.75 | 1228 | 2392 |
| SGP | Q2_Test2017_SGP | 51,291 | 94.04 | 92.61 | 94.65 | 1136 | 1922 |
| NSA | Q2_Test2015_NSA | 37,117 | 93.70 | 94.22 | 92.86 | 1322 | 1016 |
| NSA | Q2_Test2016_NSA | 31,840 | 92.81 | 93.38 | 91.88 | 1301 | 987 |
| NSA | Q2_Test2017_NSA | 32,172 | 92.49 | 92.95 | 91.67 | 1452 | 964 |
| AWARE | Q2_Test_AWARE | 32,573 | 93.95 | 94.76 | 93.27 | 778 | 1193 |
Table 7. Consistency between the cloud detection results derived by AERI and ceilometer at different sites under different PWV ranges. The percentage in parentheses indicates the proportion between the data in the selected PWV range and the total data in the corresponding years of different sites.

| Site | PWV (cm) | 2015 PC, % (Proportion) | 2016 PC, % (Proportion) | 2017 PC, % (Proportion) |
|---|---|---|---|---|
| SGP | <1 | 96.22 (20.63%) | 93.22 (24.58%) | 95.78 (20.74%) |
| SGP | 1–2.5 | 93.51 (37.37%) | 94.07 (37.97%) | 94.48 (43.23%) |
| SGP | >2.5 | 94.43 (42.00%) | 92.11 (37.45%) | 92.80 (36.03%) |
| NSA | <1 | 93.73 (66.83%) | 95.27 (70.22%) | 94.06 (62.46%) |
| NSA | 1–2.5 | 97.76 (32.43%) | 95.01 (27.86%) | 97.99 (33.85%) |
| NSA | >2.5 | 97.66 (0.74%) | 95.09 (1.92%) | 97.91 (3.68%) |
| AWARE | <1 | | 95.09 (100%) | |
| AWARE | 1–2.5 | | NaN | |
| AWARE | >2.5 | | NaN | |
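The stratification by PWV used in Table 7 can be sketched as below. The function name `pc_by_pwv`, the handling of values that fall exactly on a bin edge (1 cm is placed in the 1–2.5 bin, 2.5 cm in the >2.5 bin), and the use of NaN for an empty bin are assumptions made for illustration; the last choice mirrors the NaN entries shown for the AWARE site.

```python
import math

# (label, lower edge, upper edge) in cm; edge handling is an assumption.
PWV_BINS = [("<1", 0.0, 1.0), ("1-2.5", 1.0, 2.5), (">2.5", 2.5, math.inf)]


def pc_by_pwv(pwv_cm, aeri_cloudy, ceil_cloudy):
    """Return {bin label: (PC in percent, share of all samples in percent)}."""
    n_total = len(pwv_cm)
    result = {}
    for label, lo, hi in PWV_BINS:
        idx = [i for i, p in enumerate(pwv_cm) if lo <= p < hi]
        if not idx:
            # No data in this PWV range: PC is undefined.
            result[label] = (math.nan, 0.0)
            continue
        agree = sum(1 for i in idx if aeri_cloudy[i] == ceil_cloudy[i])
        result[label] = (100.0 * agree / len(idx), 100.0 * len(idx) / n_total)
    return result


# Toy example: four dry-atmosphere samples, three of which agree.
example = pc_by_pwv(
    [0.2, 0.5, 0.8, 0.9],
    [True, True, False, False],
    [True, False, False, False],
)
print(example)
```

In the toy example all samples fall in the driest bin, so the two wetter bins have no defined PC, which is the situation reported for the AWARE site.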
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Liu, L.; Ye, J.; Li, S.; Hu, S.; Wang, Q. A Novel Machine Learning Algorithm for Cloud Detection Using AERI Measurement Data. Remote Sens. 2022, 14, 2589. https://doi.org/10.3390/rs14112589
