3. Results and Discussion
The MTMF values of the study region can be decomposed into two normal distributions, where one normal distribution is generally considered to represent a geological phenomenon or a class of features. In the gray-scale image produced by MTMF, the cloud information is enhanced and other non-cloud information is weakened. Sub-distribution diagrams of the mixed monolithic sieving model are shown in Figure 5.
The improved mixed monolithic sieving model uses the MTMF values as its basic data. Running the model in MATLAB yields the normal distribution diagrams (Figure 5), and the basic parameters of each sub-distribution are listed in Table 3, Table 4, Table 5 and Table 6. In summary, the gray-scale image obtained from MTMF can be divided into two categories, i.e., clouds and non-clouds. Pixels with low MTMF values contain no target component and correspond to the background, whereas pixels with higher MTMF values are considered to contain a larger target component, i.e., the cloud information we want to extract. The following tables give the parameters of each sub-distribution in the four images quantitatively.
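To make the decomposition step concrete, the following is a minimal Python sketch of fitting two normal distributions to one-dimensional values and reading off the mean, standard deviation, and weight of each. It is not the improved mixed monolithic sieving model or its MATLAB implementation, whose details are not given here; a plain expectation-maximization (EM) Gaussian mixture is used as a stand-in, and the function name `fit_two_gaussians` and the synthetic input are hypothetical.

```python
import numpy as np


def fit_two_gaussians(values, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative stand-in)."""
    x = np.asarray(values, dtype=float).ravel()
    # Start from the lower and upper quartiles so component 1 begins on the left.
    mu = np.percentile(x, [25.0, 75.0])
    sigma = np.full(2, x.std())
    weight = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted density of every value under each component.
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        total = np.maximum(dens.sum(axis=1), np.finfo(float).tiny)
        resp = dens / total[:, None]
        # M-step: re-estimate weight, mean, and standard deviation.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Stop once the log-likelihood no longer changes appreciably.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll
    order = np.argsort(mu)  # report the left-hand distribution first
    return mu[order], sigma[order], weight[order]


# Synthetic stand-in for MTMF values (not data from the study).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(15000.0, 2700.0, 7500),
                         rng.normal(28000.0, 3500.0, 2500)])
means, stds, weights = fit_two_gaussians(sample)
print(means, stds, weights)
```

In this sketch the component with the lower mean plays the role of normal distribution 1 (background) and the one with the higher mean the role of normal distribution 2 (cloud).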
Table 3 shows the parameter values of each sub-distribution for image A after processing by the improved mixed monolithic sieving model. The normal distribution on the left is normal distribution 1, and the one on the right is normal distribution 2. The mean value of the data within normal distribution 1 is 15,037.78, and the mean value of the data within normal distribution 2 is 28,165.49. The weights of the left and right sub-distributions are 0.75 and 0.25, respectively, indicating that the background information carries substantially more weight than the cloud information, because most clouds in the image are small punctiform clouds. The standard deviations of normal distribution 1 and normal distribution 2 are 2715.31 and 3492.23, respectively. The standard deviation reflects the dispersion of the data relative to the mean, so the data within normal distribution 2 are more dispersed than those within normal distribution 1 and are not uniformly distributed, which is caused by the disorderly distribution of small punctiform clouds in normal distribution 2. The threshold ranges of the two sub-distributions in Table 3 are from the minimum to 22,971.68 and from 22,971.68 to the maximum, where the threshold is obtained automatically by the algorithm. The separation of the two sub-distributions is 1.78 and 0.89, respectively, indicating that the first sub-distribution is easier to separate from the overall data than the second one, and that the extraction of cloud information is relatively difficult. The overlap rates of normal distribution 1 and normal distribution 2 are 17.35% and 21.42%, respectively, which also confirms the cloud–snow confusion phenomenon.
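The text states only that the threshold is obtained automatically, without giving the rule. As an illustration of how a threshold can follow from two fitted sub-distributions, the sketch below uses one common convention, which is an assumption here and not the authors' rule: take the point between the two means where the weighted normal densities are equal. The function names are hypothetical, and the result need not reproduce the tabulated value of 22,971.68.

```python
import math


def weighted_density(x, mu, sigma, weight):
    """Weighted normal density weight * N(x; mu, sigma)."""
    z = (x - mu) / sigma
    return weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_crossing(mu1, sigma1, w1, mu2, sigma2, w2):
    """Point between mu1 and mu2 where the two weighted densities are equal.

    Assumes mu1 != mu2. Taking logarithms of
    w1 * N(x; mu1, sigma1) = w2 * N(x; mu2, sigma2)
    gives the quadratic a*x**2 + b*x + c = 0 solved below.
    """
    a = 1.0 / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma1 ** 2)
    b = mu1 / sigma1 ** 2 - mu2 / sigma2 ** 2
    c = (mu2 ** 2 / (2.0 * sigma2 ** 2) - mu1 ** 2 / (2.0 * sigma1 ** 2)
         + math.log((w1 * sigma2) / (w2 * sigma1)))
    if sigma1 == sigma2:
        roots = [-c / b]  # equal spreads: the equation is linear
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            raise ValueError("the weighted densities never cross")
        root = math.sqrt(disc)
        roots = [(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)]
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    inside = [r for r in roots if lo <= r <= hi]
    if not inside:
        raise ValueError("no crossing lies between the two means")
    return inside[0]  # first crossing found between the means


# Parameters of image A as reported in Table 3.
threshold = density_crossing(15037.78, 2715.31, 0.75, 28165.49, 3492.23, 0.25)
print(threshold)
```

Under this convention, pixels above the returned value would be labeled cloud and pixels below it background.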
Table 4 lists the corresponding parameter values for image B. The mean value in the table is the mean of the data within each normal distribution; the mean values of the two sub-distributions are 15,680.72 and 26,966.67, respectively. The weight is a relative measure that expresses the importance of a component relative to the whole. The two weight values are 0.69 and 0.31, respectively, so the background information containing snow and ice accounts for a larger proportion and clouds account for a relatively small proportion. The standard deviation reflects the dispersion of a set of data relative to its mean: a large standard deviation indicates that many values lie far from the mean, whereas a small standard deviation indicates that most values lie close to it. The standard deviations of normal distribution 1 and normal distribution 2 are 2803.01 and 7390.65, respectively, which means that the data in the second normal distribution are more dispersed than those in the first and are not evenly distributed. The improved mixed monolithic sieving model derives the threshold range automatically in order to extract the target features; the threshold ranges of the two normal distributions are from the minimum to 20,859.38 and from 20,859.38 to the maximum, respectively. The separation is the degree of differentiation between the features to be extracted and the background information. The separation of the two sub-distributions is 1.02 and 0.72, respectively, indicating that the first sub-distribution is easier to separate from the overall data than the second one. The overlap rate is the ratio of the overlapped data in a sub-distribution to the overall data; for the two sub-distributions it is 61.18% and 30.79%, respectively.
Table 5 gives the parameter values of each sub-distribution of the improved mixed monolithic sieving model for image C. The mean values of the data for the two normal distributions are 22,526.57 and 31,229.21. The weight of normal distribution 1 is 0.55 and the weight of normal distribution 2 is 0.45, indicating that the weights of the background information and the cloud information are similar, with the background slightly larger. The standard deviations of normal distribution 1 and normal distribution 2 are 4491.07 and 6844.61, respectively, and the distribution of clouds in satellite images is discrete and irregular. The standard deviation reflects the dispersion of the data relative to the mean, so the data within normal distribution 2 are more dispersed than those within normal distribution 1, and the cloud information data are not uniformly distributed. This is due to the large differences in the morphological characteristics and spatial distribution of the clouds within normal distribution 2. The threshold ranges of the two sub-distributions in Table 5 are from the minimum to 26,712.65 and from 26,712.65 to the maximum, where the threshold is given automatically by the algorithm. The separation of the two sub-distributions is 0.56 and 0.33, respectively, indicating that it is relatively difficult to extract cloud information in complex background areas. The overlap rates of normal distribution 1 and normal distribution 2 are 77.76% and 51.02%, respectively, and the superposition of clouds and snow in the image adds to the difficulty of extracting clouds from the complex background.
The mean values of the data for the two normal distributions in Table 6 are 8761.75 and 16,423.35, respectively. The weight values of the two normal distributions are 0.36 and 0.64, respectively, because the northern part of image D contains a large number of thin clouds and the southern part has irregularly distributed punctiform clouds. The standard deviations of normal distribution 1 and normal distribution 2 are 9724.56 and 3352.91, respectively. The large dispersion of the background arises because it contains snow and ice in the high mountains, complex high-mountain terrain under snow and ice conditions in which clouds are hard to separate, patchy snow areas whose brightness varies with the light conditions and solar azimuth, frozen rivers with irregular and high reflectance on the underlying surface, and features that change within one hydrological year, such as snowfields with multiple snow/non-snow cycles. All of these increase the difference between the background information and the overall data mean. The improved mixed monolithic sieving model derives the threshold range automatically in order to extract the target features; the threshold ranges of the two normal distributions in Table 6 are from the minimum to 9724.56 and from 9724.56 to the maximum, respectively. The separation is the degree of differentiation between the features to be extracted and the background information. The separation of the two sub-distributions is 1.53 and 1.46, respectively, indicating that the first sub-distribution is more easily separated from the overall data than the second one. The ratio of the overlapping data of the two sub-distributions to the overall data is 27.43% and 8.66%, respectively, which is beneficial to the subsequent cloud extraction in high mountainous and complex terrain areas.
To verify the effectiveness of this method, we compare the proposed approach with the ANN and Fmask approaches, which are two algorithms with good cloud extraction results [7, 20].
Figure 6 shows the four regions of interest with their cloud extraction results. The first column in Figure 6 displays the original images, which cover various surface environments, including bright backgrounds, mountains, ice, and snow. The second column shows false color composite images, with bands 7, 4, and 2 assigned to red, green, and blue, respectively, which help the reader visually differentiate between snow/ice and clouds. Columns 3–5 show the cloud extraction results of ANN, Fmask, and the proposed method, where yellow represents the extracted cloud information and black represents the non-cloud underlying surface. ANN and Fmask were run in ENVI 5.4 software. The details are shown in the figure below.
By comparison, the ANN and Fmask methods appear to overestimate the cloud cover, as shown in Figure 6, which confirms that snow and ice in the background are easily misclassified as clouds. Visual inspection shows that the method proposed in this paper extracts cloud information more effectively and reduces the confusion between clouds and snow/ice. The quantitative results for the four complex background areas containing both clouds and snow are shown in Table 7. Three accuracy measures (PA, UA, and OA) were used to assess the results of each algorithm.
The verification samples (reference masks) were obtained by visual interpretation of each scene in Adobe Photoshop CS6. The results of the proposed method are better than those of the Fmask algorithm and the ANN method, especially for UA, which is significantly improved compared with the other two methods. This is because the proposed method reduces the misjudgment of snow and ice as clouds and therefore the occurrence of cloud and snow confusion. The proposed method achieves high precision in areas where clouds are difficult to extract: in the cloud screening results, PA exceeds 80%, UA exceeds 80%, and OA exceeds 90%. Therefore, this method addresses both the misjudgment of background ice and snow as clouds and the omission of very small punctiform clouds, and it also gives better results for cloud–snow separation in complex terrain areas.
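For reference, the three accuracy measures can be computed from a predicted cloud mask and a reference mask as in the minimal Python sketch below. It assumes that PA, UA, and OA denote the usual producer's accuracy, user's accuracy, and overall accuracy for the cloud class; the function name `cloud_accuracy` and the toy masks are hypothetical and are not data from Table 7.

```python
import numpy as np


def cloud_accuracy(predicted, reference):
    """Return (PA, UA, OA) for the cloud class from two masks of equal shape."""
    pred = np.asarray(predicted, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(pred & ref)     # cloud detected as cloud
    fp = np.count_nonzero(pred & ~ref)    # non-cloud (e.g., snow) detected as cloud
    fn = np.count_nonzero(~pred & ref)    # cloud that was missed
    tn = np.count_nonzero(~pred & ~ref)   # non-cloud detected as non-cloud
    # Producer's accuracy: share of reference cloud pixels that were detected.
    pa = tp / (tp + fn) if (tp + fn) else float("nan")
    # User's accuracy: share of detected cloud pixels that are truly cloud.
    ua = tp / (tp + fp) if (tp + fp) else float("nan")
    # Overall accuracy: share of all pixels labeled correctly.
    oa = (tp + tn) / pred.size
    return pa, ua, oa


# Toy example: five pixels, one non-cloud pixel wrongly labeled as cloud.
pa, ua, oa = cloud_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
print(pa, ua, oa)
```

Because snow or ice misjudged as cloud counts as a false positive, it lowers UA directly, which is consistent with UA being the measure that improves most when cloud and snow confusion is reduced.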