1. Introduction
Detailed land use/land cover (LULC) information at global and regional scales is essential for many applications, such as natural resource protection, sustainable development, and climate change research [1,2,3]. Remote sensing data are widely used to obtain LULC information [4]. However, for large, spatially heterogeneous areas, achieving high accuracy in LULC classification with satellite data alone is difficult [5]. With the growing availability of auxiliary datasets, researchers have attempted to improve the accuracy of LULC classification by combining satellite data with auxiliary datasets [6].
The importance of combining auxiliary features (also known as auxiliary data or multi-source data) with remote sensing data to improve classification accuracy at both regional and global scales has been a topic of interest for 40 years [7,8]. Studies have built on the assumption that the distribution of vegetation is directly or indirectly related to natural factors [9,10]. Therefore, topography, soil type, and water sources can be used as auxiliary features to explain the distribution of vegetation and improve the accuracy of land use classification [11]. In addition to natural factors, remote sensing indices and time series of remote sensing images are also used as auxiliary features for LULC classification [12]. Due to the complexity of the real ground surface, the classification accuracy for a given land-use type varies considerably among remote sensing indices [13]. Therefore, multiple remote sensing indices are often combined as auxiliary features to distinguish specific LULC types [14]. A remote sensing time series (RSTS) comprises long-term repeated observations of the same area [15]. Such data can effectively capture the changes in spectral characteristics caused by different LULC types or plant growth cycles (phenological characteristics); therefore, RSTS can also effectively improve the accuracy of LULC classification [16]. Spectral–temporal metrics, a commonly used form of RSTS, are often adopted as main or auxiliary features in LULC classification [17].
Although auxiliary features can improve the accuracy of LULC classification, two main obstacles restrict the wider use of open-source spatial datasets [18]. First, the availability of free and open spatial datasets is limited, and these datasets must be requested and obtained from multiple sources. Second, due to computing and storage limits, only a limited number of features can be used in classification models [11]. Google Earth Engine (GEE) provides a powerful cloud-based platform for planetary-scale geospatial analysis that can directly access multi-petabyte satellite imagery and various types of geospatial datasets [19]. GEE is widely used for high-precision land use classification at regional, national, and even global scales [20].
Much previous research using auxiliary features to improve classification accuracy with medium-resolution satellite imagery has focused on pixel-based classification and less on object-based classification [21,22]. LULC classification can be divided into two types according to the classification unit: pixel-based image analysis and object-based image analysis (OBIA) [23]. LULC types are often difficult to separate spectrally due to low inter-class separability and high intra-class variability [24]. In such cases, auxiliary features can improve the accuracy of LULC classification because they provide information beyond the spectrum [25]. Nevertheless, a comparative analysis of how different auxiliary features improve the classification accuracy of pixel- and object-based classification methods calls for careful study [26]. To ensure that our experiments can be reproduced in data-scarce areas, we chose open-source datasets that can be used directly in GEE to calculate the auxiliary features, including multiple remote sensing indices, terrain characteristics, soil characteristics, distance to the water source, and spectral–temporal metrics. We then analyzed the performance of the different auxiliary features in both pixel- and object-based LULC classifications.
The objectives of this study are to identify (1) which type of auxiliary feature is most useful in pixel- and object-based models and (2) the similarities and differences in how the various auxiliary features improve the classification accuracy of the pixel- and object-based classification methods. To achieve these objectives, we examined the performance of 14 random forest (RF) classification models (seven pixel-based and seven object-based) built from six common types of auxiliary features. Our results provide insights for improving classification accuracy in areas with strong landscape heterogeneity (similar to the Yangtze River Delta) and advice on selecting the optimal combination of auxiliary features to achieve high-precision LULC mapping.
3. Methods
Figure 3 illustrates the overall process of this experiment. First, with the support of multi-source LULC products and following the principles of “stable state” and “consistent classification” [9], the reference data were generated and then divided into independent training and verification samples. Second, we segmented the pre-processed Landsat-8 OLI image, generated image objects, and obtained the feature values of each object. Third, we tested the performance and validated the classification accuracy of 14 classification models, including seven pixel-based methods and seven object-based methods. Finally, we assessed the importance of the different feature types for these models.
3.1. Field Data Collection and Sampling
A sufficient number of training and verification samples is required for RF classification, and the accuracy of the sample points directly affects the accuracy of the LULC classification results. Our existing land use survey data covered only 2008 and 2010, and field survey data were lacking; therefore, new reference data had to be collected. Current methods to increase the number of training data points mainly include field surveys and manual visual interpretation. However, applying either approach over large areas with high landscape heterogeneity is labor-intensive and therefore highly expensive [48]. To overcome these issues and generate a highly reliable reference dataset, we improved the sample point generation method of Yunfeng Hu [49] and propose a new scheme to obtain high-precision sample points. The proposed scheme has the following steps (a minimal sketch of steps (2)–(5) follows the list):
(1) Reclassify all LULC products of the study area, such as MCD12Q1, GlobCover2009, and Land survey data, into nine types (see Table S1).
(2) Overlay the MCD12Q1 products of the study area from 2008 to 2015, and extract the relatively stable areas where the LULC types did not change (see Supplementary Materials Part 3).
(3) Within the relatively stable areas, randomly generate 10,000 points for each LULC type.
(4) Overlay the randomly generated points with the GlobCover2009 and 2008 Land survey data, remove the points whose classifications are inconsistent, and retain the points that are consistent across the multi-source LULC products.
(5) For each LULC type, select the consistent points at 1500-m intervals to avoid spatial autocorrelation between sampling points, then visually compare and verify the selected points against Google Earth images and the 2010 Land survey data. In total, 18,878 sample points were retained; we randomly selected 70% of them as training samples and used the remainder as verification samples.
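The following is a minimal sketch of steps (2)–(4) in the GEE Python API. The dataset IDs, region bounds, and sampling scale are illustrative assumptions; the nine-type reclassification of step (1) is assumed to have been applied beforehand, and the 1500-m spacing filter and visual verification of step (5) are easier to perform after exporting the filtered points, so they are omitted here.

```python
import ee
ee.Initialize()

# (2) Stability mask: keep pixels whose MCD12Q1 class is identical in 2008-2015.
years = range(2008, 2016)
mcd = [ee.Image('MODIS/006/MCD12Q1/%d_01_01' % y).select('LC_Type1')
       for y in years]
first = mcd[0]
stable = ee.ImageCollection([img.eq(first) for img in mcd[1:]]) \
    .reduce(ee.Reducer.min())                 # 1 only where all years agree
stable_lulc = first.updateMask(stable)

# (3) Randomly generate 10,000 points per LULC type within the stable area.
region = ee.Geometry.Rectangle([115.5, 27.0, 123.5, 35.5])  # rough YRD box
points = stable_lulc.stratifiedSample(
    numPoints=10000, classBand='LC_Type1', region=region,
    scale=500, geometries=True)

# (4) Keep only points where GlobCover2009 (already reclassified to the same
# nine-type scheme) carries the same label as MCD12Q1.
globcover = ee.Image('ESA/GLOBCOVER_L4_200901_200912_V2_3').select('landcover')
points = globcover.reduceRegions(points, ee.Reducer.first(), scale=300) \
    .filter(ee.Filter.equals(leftField='LC_Type1', rightField='first'))
```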
3.2. LULC Classification Methods
Random forest is a machine learning method proposed by Breiman [50]; it is an ensemble of decision trees in which each tree contributes one vote, and the final classification or prediction is obtained by voting [51]. A large number of studies have shown that RF produces relatively high classification accuracy in LULC classification [33,52,53]. At the same time, the RF method has the advantages of easy parameterization as well as the ability to manage collinear features and high-dimensional data [54]. We used the RF classification algorithm because it can be applied to both pixel- and object-based LULC classification and is highly robust against overfitting and outliers [55].
For the object-based classification models, in contrast to previous studies, we did not use texture features, for two main reasons. First, our main remote sensing images for classification were Landsat-8 OLI data, which have a medium spatial resolution at which texture features are not readily observable. Second, one of our research objectives was to compare the performance of the auxiliary features in pixel- and object-based classification; since texture features are not used in pixel-based classification, for consistency we also excluded them from the object-based classification models.
To compare the pixel- and object-based classifications, we designed 14 RF classification models: M1–M7 are pixel-based and M1′–M7′ are object-based (see Table 3). Models M1 and M1′ include only spectral features, while M2 and M2′ combine the spectral features with multiple remote sensing indices. Models M3 and M3′ combine the spectral features with terrain features, while M4 and M4′ combine them with the distance to the water source. Models M5 and M5′ combine the spectral features with soil characteristics, while M6 and M6′ combine them with phenological features. Finally, models M7 and M7′ include all of the features.
3.2.1. Pixel-Based RF Classification
The RF classification algorithm was implemented on the GEE platform. The training data were used to train the RF classifier, and the verification data were used to evaluate the classification error. When using the RF models in GEE, two parameters must be set: the number of decision trees to create (numberOfTrees) and the minimum size of a terminal node (minLeafPopulation). The LULC classification was carried out using different values of numberOfTrees and minLeafPopulation, and the optimum parameters were chosen based on the overall classification accuracy. For numberOfTrees, we began with 10 and increased it to 100 in steps of 10, and then from 100 to 1000 in steps of 100. For minLeafPopulation, we began with 5 and increased its value by 1 in each step up to 25. Finally, through repeated comparative experiments, we set numberOfTrees to 100 and minLeafPopulation to 10 in all models. A minimal sketch of this search is given below.
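As a rough illustration, the stepwise search described above can be written in the GEE Python API as follows; `training`, `validation`, `bands`, and the property name `landcover` are assumed placeholders, and for brevity the sketch evaluates a full grid rather than tuning one parameter at a time as done in the study.

```python
def evaluate(num_trees, min_leaf):
    """Train an RF with the given parameters and return the validation OA."""
    clf = ee.Classifier.smileRandomForest(
        numberOfTrees=num_trees, minLeafPopulation=min_leaf
    ).train(features=training, classProperty='landcover',
            inputProperties=bands)
    matrix = validation.classify(clf).errorMatrix('landcover', 'classification')
    return matrix.accuracy().getInfo()

# numberOfTrees: 10..90 in steps of 10, then 100..1000 in steps of 100;
# minLeafPopulation: 5..25 in steps of 1.
tree_grid = list(range(10, 100, 10)) + list(range(100, 1001, 100))
leaf_grid = range(5, 26)
scores = {(t, l): evaluate(t, l) for t in tree_grid for l in leaf_grid}
print('Best (numberOfTrees, minLeafPopulation):', max(scores, key=scores.get))
```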
The RF classification model is robust against high-dimensional and collinear data. However, removing redundant features through feature reduction can further improve the LULC classification accuracy of the RF model [56]. Here, we used the Recursive Feature Elimination (RFE) method to remove redundant features, repeating the iterations for each group of features to obtain more stable results. The RFE method ranks the importance of all features in the classification model. In each iteration, the highest-ranked features were retained, the least important feature was eliminated, and the model was then reconstructed and re-evaluated. As there is no built-in RFE function in GEE, we developed an independent RFE procedure and, through manual trial and error, obtained the optimal feature combination for each model (see Supplementary Materials Part 5).
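Because GEE has no built-in RFE, a manual elimination loop of the kind described above might look like the following sketch; here the per-feature ranking comes from `classifier.explain()`, and the sample collections and property names are assumptions carried over from the previous sketch.

```python
def rfe(training, validation, bands, min_features=3):
    """Iteratively drop the least important feature; return the best subset."""
    best_bands, best_oa, current = list(bands), 0.0, list(bands)
    while len(current) >= min_features:
        clf = ee.Classifier.smileRandomForest(
            numberOfTrees=100, minLeafPopulation=10
        ).train(training, 'landcover', current)
        oa = validation.classify(clf) \
            .errorMatrix('landcover', 'classification').accuracy().getInfo()
        if oa > best_oa:
            best_oa, best_bands = oa, list(current)
        # Rank the remaining features and eliminate the least important one.
        importance = ee.Dictionary(clf.explain()).get('importance').getInfo()
        current.remove(min(importance, key=importance.get))
    return best_bands, best_oa
```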
3.2.2. Object-Based Random Forest Classification
For object-based classification, we first performed remote sensing image segmentation. The purpose of image segmentation is to extract image objects that are as homogeneous as possible, such as contiguous cropland, lakes, and rivers. When segmenting images, we mainly emphasized spectral homogeneity. GEE mainly supports three image segmentation algorithms: K-means, G-means, and Simple Non-Iterative Clustering (SNIC) [14]. We conducted preliminary comparison experiments with each of these three algorithms and calculated the accuracy of the LULC classification based on the different segmentation results [57]. The SNIC algorithm produced the best LULC classification results because its segmentation can be controlled through user parameters, so we selected SNIC as our image segmentation algorithm. SNIC is a superpixel boundary and image segmentation algorithm (a superpixel is a small cluster of connected pixels into which an image is simplified). The algorithm is non-iterative and enforces connectivity from the outset. SNIC is also less memory-intensive and faster than the other two algorithms, while its segmentation accuracy can be easily controlled by setting its parameters.
The SNIC algorithm is controlled by four user-defined parameters: compactness, connectivity, neighborhoodSize, and seeds [58]. The parameter settings were based on repeated iterations combined with visual evaluation. An initial value for the seeds parameter is provided by GEE [59]: the seedGrid function generates the initial seeds. We then calculated the standard deviation and the maximum spectral distance between the average value of each generated object and the original image over the initial seeds, reinserted seeds for objects with a large spectral standard deviation or a large maximum spectral distance, superimposed them on the original seeds to generate the final seeds, and regenerated the image objects. The parameters of SNIC were determined by repeated iterations: compactness, connectivity, and neighborhoodSize were set to 1, 8, and 256, respectively. Finally, 1,349,374 image objects were generated. The spectral standard deviation between all image objects and the original image was less than 0.25, and the maximum spectral distance was less than 1 pixel (see Supplementary Material Table S4). A minimal sketch of this segmentation step follows.
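The segmentation with the parameter values reported above can be sketched in the GEE Python API as follows; `image` is the pre-processed Landsat-8 composite, and the initial seed spacing is an illustrative assumption (the final seeds in our scheme are refined iteratively as described).

```python
# Initial regular seed grid; a 36-pixel spacing between seeds is an
# assumed starting value.
seeds = ee.Algorithms.Image.Segmentation.seedGrid(36)

snic = ee.Algorithms.Image.Segmentation.SNIC(
    image=image,
    compactness=1,        # parameter values used in this study
    connectivity=8,
    neighborhoodSize=256,
    seeds=seeds)

clusters = snic.select('clusters')  # one integer label per image object
```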
After completing the SNIC segmentation, we obtained the spectral features of all segmented objects in the study area as well as the average values of all other auxiliary features within each object. We then used the average values within each object, combined with the sample points, to train the models (a sketch of this averaging step is given below). As with the pixel-based RF models, we used the stepwise increment method to set the parameters of the object-based RF models, with numberOfTrees set to 100 and minLeafPopulation set to 10. The RFE method was used to obtain the optimal feature combination for all object-based classification models.
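A sketch of the per-object averaging, assuming `aux` is a multi-band image of the auxiliary features, `clusters` is the SNIC label band from the previous sketch, and `sample_points` carries a `landcover` property:

```python
# Replace each pixel with the mean of its image object, per auxiliary band.
object_means = aux.addBands(clusters).reduceConnectedComponents(
    reducer=ee.Reducer.mean(), labelBand='clusters', maxSize=256)

# Sample the object-mean values at the reference points to train the model.
training_obj = object_means.sampleRegions(
    collection=sample_points, properties=['landcover'], scale=30)
```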
3.2.3. Classification Accuracy Assessment and Statistical Comparison
Existing studies show that slightly different classification results are produced when the same RF model is run repeatedly with the same classification parameters and input data [34]. To overcome this instability in the LULC classification results, we ran each classification model 49 times and then merged the classification results according to the “minority obeys the majority” (majority vote) principle to generate the final LULC classification results, as sketched below.
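A minimal sketch of this ensemble step, assuming `stack` is the image holding the model’s feature bands and `training` and `bands` are as before; here the run-to-run variation is introduced explicitly through the RF seed parameter.

```python
# Classify 49 times with different RF seeds and take the per-pixel mode.
runs = []
for seed in range(49):
    clf = ee.Classifier.smileRandomForest(
        numberOfTrees=100, minLeafPopulation=10, seed=seed
    ).train(training, 'landcover', bands)
    runs.append(stack.classify(clf))

final_map = ee.ImageCollection(runs).reduce(ee.Reducer.mode())
```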
Here, we applied the training and verification sample points to calculate the corresponding confusion matrices. To quantitatively analyze the accuracy of the LULC classification, we then derived the producer’s accuracy (PA), user’s accuracy (UA), overall accuracy (OA), and kappa coefficient from each confusion matrix; a minimal sketch in GEE follows.
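In GEE, these metrics can be read directly from the error matrix; `validated` below is a classified verification FeatureCollection as in the earlier sketches.

```python
matrix = validated.errorMatrix('landcover', 'classification')
oa = matrix.accuracy()            # overall accuracy
kappa = matrix.kappa()            # kappa coefficient
pa = matrix.producersAccuracy()   # per-class producer's accuracy
ua = matrix.consumersAccuracy()   # per-class user's accuracy
print(oa.getInfo(), kappa.getInfo())
```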
In the pixel-based classification methods, the confusion matrices were calculated based on the number of pixels. In the object-based classification methods, the confusion matrix can be obtained based on either the number of objects [60] or the area of the objects [61]. To compare the performance of the auxiliary features between the pixel- and object-based classification methods, we generated the confusion matrices as consistently as possible: each object was treated as one element in the object-based method, and the confusion matrix was generated according to the number of elements.
3.3. Feature Importance Comparison
In this study, the feature importance of the RF classification models was measured in two ways: with the Statistical Machine Intelligence and Learning Engine (SMILE) and with the Mean Decrease in Accuracy (MDA) [62]. SMILE can be used directly in GEE as a feature importance measurement method. However, previous studies have shown that using SMILE to measure feature importance in RF models requires a relatively balanced distribution of the reference data [63]. Because we generated our reference data to reflect the actual proportions of the LULC types, our reference data were unbalanced, and using SMILE alone could therefore lead to incorrect results. MDA quantifies the importance of a variable by measuring the change in prediction accuracy when the values of the variable are randomly permuted relative to the original observations [62]. To ensure the accuracy of the feature importance analysis, we used the MDA method, as its accuracy is not affected by the reference data [64]. First, we applied the RFE method to remove the least important features according to the OA, continuing the classification until the OA reached its highest value. Second, we analyzed the importance of each feature for every classification model based on SMILE. Finally, we applied the MDA method to further calculate the feature importance of each optimized model. Some features did not appear important at the model level because they may only have a substantial impact on specific LULC types. To achieve the highest overall classification accuracy, we therefore further tested all of the features to ensure that the final selected features yielded the highest OA across all LULC types.
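GEE itself has no built-in permutation importance, so the MDA computation can be reproduced offline; the following sketch uses scikit-learn’s `permutation_importance` on an exported sample table and is an illustration of the technique, not the authors’ exact implementation.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# X_train/X_test, y_train/y_test: feature matrix and labels exported from
# the GEE sample points; feature_names matches the columns of X.
rf = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

# MDA: mean drop in accuracy when each feature is randomly permuted.
mda = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for name, imp in sorted(zip(feature_names, mda.importances_mean),
                        key=lambda t: -t[1]):
    print(f'{name}: {imp:.4f}')
```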
4. Results
4.1. Pixel- and Object-Based Feature Importance Comparison
Figure 4 and Figure 5 show the top ten most important features of each model. As the main goal was to assess the performance of the different auxiliary data types in the pixel- and object-based classification models, the most important features of the base models (M1 and M1′) were not relevant and are not provided here. Furthermore, our spectral–temporal metrics were considered as a whole, so the most important features of the M6 and M6′ models are likewise not provided.
In M2, except for NIR_MD (median of NIR) and PC3_MD (median of the third principal component), all of the top features were remote sensing indices. In M3, the median elevation was the most important feature, and the median slope and aspect ranked 7th and 8th, respectively. In M4, the distance to the water source ranked only 7th, i.e., it was not particularly important. In M5, the median topsoil pH (2nd in the feature importance ranking), the median topsoil sand content (8th), and the soil texture class (10th) were all in the top ten. Finally, in M7, which includes all of the features, the auxiliary features dominated the top ten; among them, the median elevation was the most important, closely followed by the maximum of the TCA (TCA_Max) and the soil bulk density.
Different from M2, the remote sensing indices in the M2′ model included only the median of the TCB (4th), the median of the TCW (7th), the maximum of the NBR (8th), and the maximum of the TCG (10th). Similar to M3, the median elevation was also the most important feature in M3′, while the median slope and aspect were the 5th and 4th most important features, respectively. The difference between M4 and M4′ was that the distance to the water source was the most important feature in M4′. Similar to M5, the median topsoil pH (1st), the median soil texture class (5th), and the median topsoil sand content (9th) were in the top ten for M5′. Finally, in M7′, the auxiliary features dominated the top ten; among them, the maximum of the TCA was the most important, while the elevation was only the 4th most important.
Comparing Figure 4 and Figure 5, we can observe that the importance of the same auxiliary features differs between the pixel- and object-based classification models. Nevertheless, the two groups of models remain highly similar: apart from the differences in the top ten important features for the multiple remote sensing index models, the top ten important features of the other models largely coincide.
4.2. Classification Accuracy Improvements from Different Types of Auxiliary Features
To analyze the performance of the different auxiliary features in both pixel- and object-based classification, the classification model with the optimum feature composition was applied to the study area. On this basis, Table 4 lists the confusion-matrix summary for the pixel-based classifications, and Table 5 lists that for the object-based method; Tables S2 and S3 list the detailed confusion matrices.
From Table 4, we can observe that introducing any type of auxiliary feature improves the OA. The classification model using only the spectral features (M1) served as the baseline model, with the lowest OA at 91.51%. The pixel-based model integrating all of the features (M7) had the highest OA, improving the OA by approximately 2.45%. Among the single types of auxiliary features, the soil features (M5) yielded the greatest improvement in the OA, i.e., more than 2%. The phenological features (M6), multiple remote sensing index features (M2), and topographic features (M3) had similar effects, each increasing the OA by approximately 1.2%. The distance to water bodies (M4) had the smallest effect, increasing the OA by less than 1%.
Similar to the pixel-based method, we analyzed the performance of the different auxiliary features in the object-based models. Table 5 lists the OAs of the object-based models (see Table S3 for details). All of the auxiliary features clearly improve the classification accuracy in the object-based classification models. The spectral-features model (M1′), as the baseline model, had the lowest classification accuracy at 94.03%. The model combining all of the auxiliary features (M7′) was the best, achieving 96.01%, an increase of approximately 2%. The main reason the OA of M7′ did not improve as much as that of the pixel-based model (M7) is that the baseline model (M1′) already had a higher classification accuracy. The classification model based on the terrain features (M3′) was the second best, with an OA increase of approximately 1.7%. The OAs of the soil features (M5′) and phenological features (M6′) models increased by approximately 1%. The remaining auxiliary features improved the OA only slightly, by less than 1%.
In general, our results indicate that the use of free, open-access, global-scale auxiliary features in GEE can clearly improve the overall classification accuracy in areas with high landscape heterogeneity for both pixel- and object-based models.
4.3. Comparison of Pixel- and Object-Based Classification Results in Different Terrain Areas
To compare the details of the pixel- and object-based classification maps, we selected two typical regions as case areas (see Figure 6, Figure 7 and Figure 8). The typical region in Figure 6 and Figure 7 is located in Jiangsu Province. The main landform in this region is plain, and the main LULC types include cropland, built-up land, water bodies, and some forestland. This region contains a large number of mixed LULC types, resulting in unclear image segmentation boundaries and in different spectral characteristics for the same LULC type; we therefore also performed a visual comparison analysis for this region. The case area in Figure 8 is located in Zhejiang Province, which is dominated by mountainous terrain. The main LULC types include forest and cropland, as well as some built-up land and water bodies.
Figure 6 shows the LULC maps generated by the pixel-based models. The pixel-based models achieved similar results for the plain areas; however, the resulting LULC maps contain some “salt-and-pepper” noise. Furthermore, these pixel-based models often misclassify mountainous areas and their shadowed areas, or areas where water bodies are mixed with other LULC types. First, the M7 model, which applied all of the features, showed the best accuracy. Second, M6 and M2 applied the phenological features and multiple remote sensing index features, respectively; the accuracy of these two models was somewhat lower than that of M7. Third, the accuracy of M5, which considered soil features, was lower than that of M2, M6, and M7; this may be caused by the low spatial resolution of the soil features. Finally, M3, based on topographic features, had the worst performance. The selected typical region is plain, with little topographic variation; therefore, M3 could not significantly improve the classification accuracy here.
Figure 7 shows the object-based classification results. The object-based classification methods reduce the “salt-and-pepper” noise, thus yielding higher OAs; in general, they achieved better classification results. First, similar to the pixel-based results, the best classification result was still obtained using all of the features. The phenological features and the multiple remote sensing indices were the next most effective, while the effect of the terrain and soil characteristics was less clear.
Although the OA of M1′ is higher than that of M1, this model was unable to clearly express the correct spatial distribution details of the ground objects. When only spectral features are used with medium-resolution remote sensing images, an object-based classification model may not be suitable: because the average value of the image object is used instead of each pixel value, the differences between features decrease. With the addition of other features, the differences between features increase again, and fine classification can be carried out effectively. Therefore, other types of auxiliary features should be added to object-based classification at medium resolution. This not only improves the overall classification accuracy but also recovers the true spatial details.
In the plain areas, the results of the pixel- and object-based classification models were relatively similar. To highlight the differences in the classification results, we selected a mountainous region as a typical area for comparison (see Figures S1 and S2 for the complete classification results). Figure 8 compares selections of the classification results for the models with the highest OAs, M7 (pixel-based) and M7′ (object-based). The classification result of M7 contains some “salt-and-pepper” noise, whereas M7′ effectively reduces its impact. Therefore, the object-based classification results are generally better than the pixel-based results.
Figure 8 shows that the pixel-based classification result is broadly similar to the object-based classification result. The A1 region in Figure 8 is a transition area from mountain to plain, and A2 is a typical mountainous area. The classification results at these two locations reveal that the pixel-based result contains many small image spots, while the object-based result contains fewer. In the A1 region, the pixel-based result contains many small patches of forest, while the object-based result contains relatively few. In the A2 region, the pixel-based result contains abundant cropland, while the object-based result contains significantly less. In the A1 (A2) region, cropland accounted for 81.39% (20.37%) and 90.75% (9.8%) of the area in the pixel- and object-based models, respectively. In general, unlike the pixel-based classification method, the object-based method generates a cleaner LULC map. The object-based result is more similar to the reference data, although it does not detect small objects, such as croplands in the A1 region. The object-based classification results are generally satisfactory, but their main disadvantage is that the boundaries of the image objects constrain the classification results: where there is no clear difference between the boundaries of ground objects, the fineness of the classification results is limited.
6. Conclusions
In this study, we examined the effect of six types of auxiliary features in GEE on the accuracy of 14 RF classification models (seven pixel-based and seven object-based); the main conclusions are as follows:
(1) Auxiliary features, such as multiple remote sensing indices, topographic features, soil features, distance to the water source, and phenological features, can improve the OA in heterogeneous landscapes. Combining Landsat-8 OLI imagery with the various auxiliary features used in this study effectively improved the accuracy of the LULC classification: the OA of the pixel-based (object-based) method increased from 91.51% to 94.20% (94.03% to 96.01%).
(2) The performance of the auxiliary features differed between the pixel- and object-based models. In the pixel-based models, soil features improved the classification accuracy the most, whereas in the object-based models, topographic features performed best. In the classification models combining all of the features, the topographic features had the greatest effect on improving the classification accuracy in both the pixel- and object-based models.
(3) We further found that, when only spectral data were used, the object-based classification method achieved a higher OA but was unable to represent small objects. Therefore, when object-based classification models are applied to medium-resolution remote sensing images (such as Landsat data), other types of auxiliary features should be used.
The auxiliary data used in GEE in this study show significant potential for improving the accuracy of LULC classification over large areas with highly heterogeneous landscapes. However, we note that the dominant factors vary across different terrain areas. In this study, we treated the YRD as a whole and did not subdivide it based on terrain. In future studies, we will further investigate the similarities and differences in the optimal combination of auxiliary features under different topographic conditions.