Article

Aquaculture Ponds Identification Based on Multi-Feature Combination Strategy and Machine Learning from Landsat-5/8 in a Typical Inland Lake of China

1
Key Laboratory of Land and Sea Ecological Governance and Systematic Regulation, Shandong Academy for Environmental Planning, Jinan 250101, China
2
Academician Workstation for Big Data in Ecology and Environment, Environment Research Institute, Shandong University, Qingdao 266003, China
3
Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2024, 16(12), 2168; https://doi.org/10.3390/rs16122168
Submission received: 29 April 2024 / Revised: 12 June 2024 / Accepted: 13 June 2024 / Published: 15 June 2024
(This article belongs to the Special Issue Remote Sensing for Geology and Mapping)

Abstract

Inland aquaculture ponds, as an important land use type, have brought great economic benefits to local people but have also caused many environmental problems that threaten regional ecological security. Therefore, understanding the spatiotemporal pattern of aquaculture ponds and its potential influence on water quality is vital for the sustainable development of inland lakes. In this study, based on Landsat-5/8 images, three types of features (spectral, index, and texture features) and five machine learning algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), artificial neural network (ANN), k-nearest neighbor (KNN), and Gaussian naive Bayes (GNB), were combined to identify aquaculture ponds and other primary land use types around a typical inland lake of China. The results demonstrated that the XGBoost algorithm integrating all three feature types performed best among all combinations of algorithms and features, with an overall accuracy of up to 96.15%. In particular, the texture features provided useful information beyond the spectral features, allowing more accurate separation of aquaculture ponds from other land use types and thus improving land use mapping in complex inland lakes. Next, this study examined the trend in aquaculture pond area and found a segmented increase, namely a sharp increase during 1984–2003 followed by a slow rise since 2003. A positive correlation detected between the area of aquaculture ponds and phytoplankton population dynamics suggests a likely influence of aquaculture activity on lake water quality. This study provides an important scientific basis for the sustainable management and ecological protection of inland lakes.

1. Introduction

Land use and cover change (LUCC) is currently a hot area of global environmental change, which closely links human society and natural ecological processes and has a profound impact on human survival and development [1,2]. Aquaculture ponds, one of the important land use types, serve as one of the main sources of animal protein and are increasingly contributing to food security in Asia’s populous inland cities. Specifically, China’s aquaculture production accounted for approximately 60% of the world total until 2020 [3,4]. As the size of ponds has increased considerably, intensive aquaculture has caused serious destructive effects on local environments, such as the decrease in water quality, the decline in biodiversity, and the loss of services provided by aquatic ecosystems [5,6,7]. Therefore, understanding the expansion pattern of aquaculture ponds in inland lakes and its influences on local environments is of great importance to the healthy development of human–natural ecosystems.
Previous studies have sought to improve the accuracy of land use classification in two ways: incorporating multi-source features and developing new classification algorithms. Multi-source features mainly include band reflectance, remote sensing indices, and texture characteristics; fusing them provides more comprehensive surface information and thus enhances land use mapping. For example, Chen et al. (2017) [8] jointly employed Landsat-8 OLI, MODIS, HJ-1A, and ASTER DEM data to perform land cover classification in Beijing by integrating temporal, spectral, angular, and topographic information, which achieved a 4.53% higher overall accuracy (OA) than using only OLI data. Li et al. (2023) [9] explored the scaling effect of image spatial resolution on land cover classification from the perspectives of mixed-pixel decomposition and spatial heterogeneity based on GF-2, SPOT-6, Sentinel-2, and Landsat-8, and showed that GF-2 and SPOT-6 had the best classification performance, with an OA of up to 92.81%. In addition, Feyisa et al. (2014) [10] proposed the Automated Water Extraction Index (AWEI), which improved the classification accuracy (kappa = 0.98) of shaded and dark surface areas in New Zealand that are usually difficult to classify correctly with conventional methods. Huang et al. (2015) [11] integrated texture features and DEM data using a BP artificial neural network and achieved high accuracy in remote sensing image classification and land use change detection (OA = 95.08%).
In terms of algorithms applied to land use classification, traditional classification models such as the maximum likelihood method [12], the K-means method [13,14], and the k-nearest neighbor algorithm (KNN) [15] dominated early research. However, with the rapid development of pattern recognition and machine learning, some intelligent algorithms such as support vector machine (SVM) [16] and neural networks [17] have gradually come to the forefront, presenting higher accuracy and effectiveness than the traditional parametric methods in land use classification. Tree-based models, especially those equipped with learning methods such as random forest (RF) [18] and Extreme Gradient Boosting Tree (XGBoost) [19], have attracted widespread attention for their excellent performance and ease of use. In addition, feature selection plays a key role in improving classification accuracy [20]. By removing irrelevant or redundant features, model performance could be largely optimized without losing important information. It has been proven that proper feature selection has a significant impact on the final classification accuracy [21].
Inland lake aquaculture ponds are often overlooked or omitted in existing land use classification systems, mainly because their special nature and complexity make them difficult to delineate with traditional land use types. As a way of utilizing water, lake aquaculture ponds have functions and characteristics that differ from those of general land use types such as water bodies or agricultural land. Specifically, aquaculture ponds often appear as regularly shaped, isolated, and enclosed bodies of water [22,23]. Their water quality is strongly affected by aquaculture activities, such as feed delivery and fish excretion, which significantly increase the material circulation efficiency in the pond [24]. Because of this abundant nutrient supply, many aquaculture ponds become covered by vegetation and phytoplankton in the growing season, which makes them hard to distinguish from land vegetation.
Traditional land use classification mainly focuses on distinguishable land utilization modes, such as water bodies, agriculture, forests, and built-up areas, while it lacks a finer categorization of water use modes in the land–water transition zone. Normally, in the existing land classification system, lake aquaculture ponds are often categorized as an unspecified type or ignored. This fails to meet the needs of the scientific protection and sustainable development of lake resources. To accurately evaluate the intensity of aquaculture ponds, and to detect and quantify the distribution and change trends of aquaculture ponds, we selected a typical inland lake with a long history of fish pond culture in northern China to (1) implement different classifiers based on multilevel feature fusion for LUCC mapping and change detection for the selected 10 years of Landsat data from 1984 to 2022 and (2) explore the spatiotemporal pattern of aquaculture ponds and other associated land use types and their potential impacts on local water quality.

2. Materials and Methods

2.1. Study Area

Nansi Lake (116°34′–117°21′E, 34°27′–35°20′N) is one of the most important freshwater lakes in North China (Figure 1), which is not only the main fishery base of Shandong Province but also a critical intermediate lake on the east route of the South-to-North Water Transfer Project. It is approximately 126 km long from north to south and 5–25 km wide from east to west. The central part of the lake is slightly narrower, while the northern and southern parts are broader, forming a teardrop shape. The average water depth in the lake is 2 m. The study region belongs to a warm temperate semi-humid monsoon climate zone with an average annual temperature of about 13.7 °C and an average annual precipitation of 695.2 mm. More than 70% of the annual precipitation falls in the flood season from June to September. Pit-pond culture is the dominant fishery type in Nansi Lake and a vital part of the local economy. Aquaculture ponds constructed by setting up dike banks near the shore are mainly distributed in the water of Nansi Lake. However, with the continuous economic development, the land use structure of the lake area has undergone significant changes in the past several decades. The explosive growth in local population and economic development and the rapid expansion in aquaculture ponds have led to a continuous decrease in the arable land and waters and unavoidably resulted in a certain degree of ecological imbalance in this area.

2.2. Data

Landsat TM/OLI (L2) images from 1984 to 2022 provided by the United States Geological Survey (USGS) were used in this study to carry out land cover classification. To accurately separate aquaculture ponds from nearshore vegetation, we selected a total of 10 cloud-free winter images, which avoids the spectral confusion caused by plants growing in the ponds during the growing season. The acquisition dates of these images were 7 February 1987, 27 January 1989, 24 December 1993, 5 February 1998, 31 January 2002, 20 December 2003, 26 January 2006, 29 November 2013, 10 December 2017, and 5 December 2021. To comprehensively analyze the long-term changes in LUCC around the lake, we set a 5 km buffer zone around the vector extent of the lake. When determining the land cover types, we considered the actual land use status in the Nansi Lake area and the potential relationship between different land use types and the dynamic changes in aquaculture ponds. Accordingly, five land cover types were identified in the current study, namely farmland, water, aquaculture pond, built-up land, and others (primarily forests and bare land). In addition, the phytoplankton density data derived from Wang et al. (2024) [25] were adopted to study the possible influence of aquaculture activity on lake water quality. Wang et al. developed an RF-based model that quantifies the ecological status of Nansi Lake from Landsat-8 OLI images and obtained a high prediction accuracy.

2.3. Samples Collection

In remote sensing image classification, sample quality is crucial to the final mapping accuracy. Following the principle of full frame selection, all Landsat TM/OLI images covering the study area were comprehensively visually interpreted. Special attention was paid to the selection of representative pixels for each type of land use to ensure that the samples could truly reflect the spectral characteristics and spatial distribution of each type of feature. After strict screening and calibration, 11,489 sample points evenly distributed throughout the study area were finally identified, including 1877 samples for farmland, 2649 samples for water, 3102 samples for aquaculture pond, 1122 samples for built-up land, and 2739 samples for others. This approach fully considered the balance of the scale of each LUCC type and thus could avoid classification bias due to excessive differences in the number of samples. Then, we randomly divided them into training and test datasets with an 8:2 ratio to ensure that the model could be adequately trained and its classification performance effectively evaluated.
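The 8:2 random split described above can be sketched as follows. This is a minimal illustration that uses the per-class sample counts from the text; the function name `random_split` and the seed are assumptions, and the text does not say whether the split was stratified by class, so a plain random permutation is used here.

```python
import numpy as np

# Per-class sample counts reported in the text.
CLASS_COUNTS = {
    "farmland": 1877,
    "water": 2649,
    "aquaculture pond": 3102,
    "built-up land": 1122,
    "others": 2739,
}


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Return shuffled (train_idx, test_idx) index arrays (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # the seed is an arbitrary choice
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


total = sum(CLASS_COUNTS.values())  # matches the 11,489 samples in the text
train_idx, test_idx = random_split(total)
print(total, len(train_idx), len(test_idx))
```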

2.4. Classification Features

First, this study selected the three visible bands, the near-infrared band and the two short-wave infrared bands, corresponding to the B1, B2, B3, B4, B5, and B7 bands of Landsat5 TM and the B2, B3, B4, B5, B6, and B7 bands of Landsat8 OLI, as the direct features to capture the difference among land use types in spectral characteristics. Second, we considered the Enhanced Vegetation Index (EVI) and the Modified Normalized Difference Water Index (MNDWI) as two other keys to enhance the discrepancy among the targeted objects. The EVI could reduce the atmospheric effects and address the saturation issue in the area of high vegetation coverage found in the traditional normalized difference vegetation index [26,27]. The expression is as follows:
$$\mathrm{EVI} = 2.5 \times \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + 6 \times \rho_{red} - 7.5 \times \rho_{blue} + 1}$$
where $\rho_{nir}$, $\rho_{red}$, and $\rho_{blue}$ are the atmospherically corrected reflectance values for the near-infrared, red, and blue bands, respectively.
MNDWI can eliminate the effect of terrain difference and solve the problem of noise in water body identification [28,29]. The expression is as follows:
$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR1}}{\mathrm{Green} + \mathrm{SWIR1}}$$
where Green and SWIR1 are the reflectance values in the green band and short-wave infrared band 1, respectively.
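As an illustration, the two indices can be computed per pixel from reflectance arrays. This is a sketch under stated assumptions: the inputs are already scaled surface reflectance, and the function names and the zero-denominator handling are illustrative choices rather than details from the study.

```python
import numpy as np


def evi(nir, red, blue):
    """Enhanced Vegetation Index from surface reflectance arrays."""
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def mndwi(green, swir1):
    """Modified Normalized Difference Water Index."""
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    denom = green + swir1
    # Assumption: pixels where both bands are zero (e.g. fill values) become NaN.
    safe = np.where(denom == 0, 1.0, denom)
    return np.where(denom == 0, np.nan, (green - swir1) / safe)


# A vegetated pixel (high NIR) and a water pixel (low SWIR1), illustrative values.
print(evi(0.4, 0.1, 0.05), mndwi(0.1, 0.02))
```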
In addition, we adopted the gray-level co-occurrence matrix (GLCM) method, which extracts texture information by calculating the gray-level spatial relationship between pixels, to further improve the classification accuracy [30]. Texture features play a crucial role in recognizing ground object types, especially in distinguishing ground objects with similar spectral features but different spatial features. For example, aquaculture ponds and water bodies are similar in spectral reflectance, but their texture features, such as regularity, roughness, and grain size, may differ significantly. By introducing texture features, we can depict the spatial structure of ground objects more accurately and thus improve classification accuracy. In total, eight texture metrics were calculated: mean, variance, homogeneity, contrast, dissimilarity, entropy, angular second moment, and correlation. These variables provide an effective tool for quantifying surface irregularities and are essential for distinguishing different land cover categories (Table 1). For the texture analysis, we used a 3 × 3 window to traverse the entire image pixel by pixel and quantized the image to 64 gray levels to capture detailed texture information. With these settings, we extracted 48 texture features from the original image (eight metrics for each of the six spectral bands).
However, too many features may increase computational complexity and degrade classification performance. To reduce the feature dimension and retain the most important information, we performed principal component analysis (PCA) on these texture features. PCA is a statistical tool that transforms the original features into new, uncorrelated features, called principal components, through a linear transformation. Its purpose is to aggregate the data into a smaller set of features that explains as much of the variance as possible. By calculating the covariance matrix of the texture feature matrix, PCA determines the directions that maximize the variance in the data, that is, the main directions of data change. Each principal component is a linear combination of the original features, with the first principal component explaining the largest variance in the data, the second explaining the largest portion of the remaining variance, and so on. The first five principal components were selected for the subsequent land use classification.
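The covariance-based PCA step can be sketched as below. This is an illustrative NumPy version, not the authors' implementation: the 48-column input is a synthetic stand-in for the texture features, the function name `pca_reduce` is hypothetical, and the features are only mean-centered because the text does not say whether they were also standardized.

```python
import numpy as np


def pca_reduce(features, n_components=5):
    """Project (n_samples, n_features) data onto its leading principal components."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                # directions of largest variance
    explained = eigvals[order] / eigvals.sum()    # explained variance ratios
    return centered @ components, explained


# Synthetic, correlated stand-in for 48 texture features (assumption).
rng = np.random.default_rng(0)
texture = rng.normal(size=(500, 48)) @ rng.normal(size=(48, 48))
scores, explained = pca_reduce(texture, n_components=5)
print(scores.shape, explained.round(3))
```

The returned scores are mutually uncorrelated, and the explained-variance ratios are sorted from largest to smallest, which is the property the text relies on when keeping only the first five components.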

2.5. Classification Algorithms

In this study, we employed two tree-based machine learning models, RF and XGBoost; two classical models, KNN and Gaussian naive Bayes (GNB); and artificial neural network (ANN) for land cover classification. The RF model is the most widely used classification model in LUCC classification, with proven accuracy [31,32]. The XGBoost model stands out in the field of machine learning due to its efficient processing of large-scale data [33,34]. ANN is one of the most commonly used non-parametric classification techniques, renowned for its strong generalization capabilities [35]. KNN and GNB are both traditional machine learning algorithms that are computationally simple, run efficiently, and perform well in land object identification with high homogeneity [36,37].

2.5.1. RF

RF is an ensemble classifier built from decision trees. Each tree is grown independently, and each node split considers a user-defined number of randomly selected features, which ensures model diversity. The training data for each tree are drawn through a bagging (bootstrap sampling) strategy, and the final classification is obtained by majority voting [38]. In this study, we set the number of trees (n_estimators) to 100 to ensure that the model had sufficient diversity, and we set the maximum number of features considered at each split (max_features) to the square root of the total number of features to balance model complexity against the risk of overfitting.
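A configuration matching these two settings might look as follows in scikit-learn. The text does not name the software used, so the library choice is an assumption (the parameter names happen to match), and the 13-feature data set is synthetic. The impurity-based `feature_importances_` attribute is shown only as one common way to rank features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 600 samples with 13 features, two of which carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 13))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 100 trees; sqrt(n_features) candidate features at each split, as in the text.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X[:480], y[:480])                 # 8:2 train/test split

accuracy = rf.score(X[480:], y[480:])    # overall accuracy on held-out samples
importances = rf.feature_importances_    # impurity-based scores, summing to 1
print(round(accuracy, 3), importances.round(3))
```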

2.5.2. XGBoost

XGBoost, a machine learning algorithm based on the Gradient Boosting Decision Tree framework, stands out for its flexibility, efficiency, and strong performance in Kaggle machine learning competitions [19]. By introducing a regularization term, XGBoost smooths the final learned weights, which helps to avoid overfitting and thus improves learning accuracy. In addition, XGBoost supports parallel and distributed computing, which significantly accelerates training. In this study, we set the number of boosting iterations (n_estimators) to 100, the maximum depth of each decision tree (max_depth) to 10, and learning_rate to 1.
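For reference, the regularized objective behind this description can be written as below, following the standard formulation of XGBoost [19]. The symbols are the conventional ones rather than notation from this article: $l$ is the loss, $f_k$ the $k$-th tree, $T$ its number of leaves, $w$ its leaf weights, and $\gamma$, $\lambda$ the regularization strengths.

```latex
\mathcal{L} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}

% Additive update at boosting round t with learning rate (shrinkage) \eta:
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i)
```

With learning_rate set to 1, as in this study, $\eta = 1$ and each new tree is added without shrinkage, so the penalty $\Omega$ and the depth limit are the main controls on model complexity.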

2.5.3. KNN

KNN is an instance-based method for classification and regression [39]. It does not rely on an explicit model training process. Instead, it measures the distance from an unknown sample to all samples in the training set, finds the K closest training samples, and assigns the category that receives the most votes among the labels of those K samples [40]. After experimental validation, K was set to 20 so that the model makes full use of the information from neighboring samples while limiting the influence of noisy data on the classification results.
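The voting procedure can be sketched in a few lines of NumPy. This is an illustration only: the Euclidean distance is an assumption (the text does not name the metric), the function name `knn_predict` is hypothetical, and the two-cluster data are synthetic.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=20):
    """Label each query sample by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)


# Two well-separated synthetic clusters (assumption, for illustration).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)
knn_labels = knn_predict(X_demo, y_demo, [[0, 0], [5, 5]], k=20)
print(knn_labels)
```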

2.5.4. GNB

GNB is a probabilistic machine learning algorithm based on Bayes’ theorem. It makes two assumptions: the features are conditionally independent given the category, and the feature variables of each category follow a normal (Gaussian) distribution. By calculating the mean and variance of the feature variables of each category, the algorithm can estimate the probability that an unknown sample belongs to each category [41,42]. In our dataset, the distribution of most features is approximately normal, which provides a reasonable basis for applying the GNB model (Figure S1). In this study, the prior was set to None and var_smoothing was set to 1 × 10⁻⁹.
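To make the calculation concrete, here is a compact NumPy sketch of Gaussian naive Bayes. It is an assumption-laden illustration rather than the authors' code: the class name `TinyGaussianNB` is hypothetical, class priors are estimated from class frequencies, and the smoothing term is added as a fraction of the largest feature variance, which is one common way to apply a var_smoothing setting.

```python
import numpy as np


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes classifier (illustrative only)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class mean and variance of every feature.
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        var = np.array([X[y == c].var(axis=0) for c in self.classes_])
        # Small additive term for numerical stability (assumed convention).
        self.var_ = var + self.var_smoothing * X.var(axis=0).max()
        # Class priors estimated from class frequencies (assumption).
        self.log_prior_ = np.log([(y == c).mean() for c in self.classes_])
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Independence assumption: sum the per-feature Gaussian log-densities.
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff**2 / self.var_[None])
        scores = log_lik.sum(axis=2) + self.log_prior_[None]
        return self.classes_[np.argmax(scores, axis=1)]


rng = np.random.default_rng(0)
X_gnb = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])
y_gnb = np.array([0] * 100 + [1] * 100)
gnb_labels = TinyGaussianNB().fit(X_gnb, y_gnb).predict([[0, 0, 0], [4, 4, 4]])
print(gnb_labels)
```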

2.5.5. ANN

The ANN classification algorithm learns the relationship between input features and output categories through a training process. During training, the network calculates outputs through forward propagation and then computes output errors through the backpropagation algorithm, updating network weights based on these errors [43]. ANN classification algorithms typically consist of an input layer, one or more hidden layers, and an output layer, with each neuron using an activation function (Sigmoid, Tanh, ReLU, etc.) to determine whether to activate, introducing non-linear factors that enable the neural network to learn and model complex non-linear relationships [44]. In this study, we employed Multilayer Perceptron (MLP) as the neural network architecture, selected the logistic function as the activation function, and chose the lbfgs optimizer to refine the weights.
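A multilayer perceptron with the stated activation and optimizer could be set up as follows in scikit-learn. The library choice is an assumption (the text only gives the option names), and the hidden layer size, iteration limit, and synthetic data are placeholders because the text does not report them.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a 13-feature, two-class problem (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 13))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

mlp = MLPClassifier(
    hidden_layer_sizes=(16,),   # assumed; not reported in the text
    activation="logistic",      # logistic (sigmoid) activation, as in the text
    solver="lbfgs",             # L-BFGS weight optimizer, as in the text
    max_iter=500,               # assumed iteration limit
    random_state=0,
)
mlp.fit(X[:320], y[:320])
pred = mlp.predict(X[320:])
print(pred[:10])
```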

2.6. Analysis

Based on spectral features, index features, and texture features, we constructed feature schemes from different combinations of these feature types (Table 2) and trained KNN, GNB, ANN, RF, and XGBoost classification models on each scheme.
In order to objectively and systematically evaluate the performance of different classification algorithms and feature schemes, we used a variety of statistical metrics for quantitative analysis. Specifically, we calculated the confusion matrix of each model based on the training and testing datasets to visualize the model’s classification effect on each category of samples. On this basis, we further quantified the OA, which can directly reflect the proportion of objects correctly classified by the model, providing us with an intuitive performance measurement.
In addition, to evaluate model performance more comprehensively, we also introduced the Kappa coefficient, which describes the degree of agreement between the model’s classification results and the reference data beyond what would be expected by chance. We further calculated the Producer Accuracy (PA) and recall for each class, which assess, respectively, how reliable the predicted positive examples are and how many of the real positive examples are recovered.
Finally, we adopted the F1 score as a comprehensive evaluation metric, which combines the information of precision and recall and can fully reflect the comprehensive performance of the model in the classification task. Through the comprehensive analysis of these metrics, we are able to more objectively and comprehensively assess the performance of different classification algorithms and feature schemes in LUCC classification.
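All of these metrics can be derived from a single confusion matrix, as the sketch below shows. It assumes that rows are reference classes and columns are predicted classes, and it computes a per-class precision to correspond to the "prediction of positive examples" described above. The function name `accuracy_metrics` and the 2 × 2 example matrix are illustrative, not taken from the study.

```python
import numpy as np


def accuracy_metrics(cm):
    """OA, Kappa, per-class precision, recall and F1 from a confusion matrix.

    Assumes rows are reference classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / total                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    precision = diag / cm.sum(axis=0)   # correct share of each predicted class
    recall = diag / cm.sum(axis=1)      # recovered share of each reference class
    f1 = 2 * precision * recall / (precision + recall)
    return {"oa": oa, "kappa": kappa, "precision": precision, "recall": recall, "f1": f1}


# Illustrative two-class matrix (not data from the study).
metrics = accuracy_metrics([[50, 10], [5, 35]])
print(metrics)
```

A class that is never predicted has a zero column sum, so its precision is undefined; a production version would need to handle that case explicitly.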

3. Results

3.1. Feature Selection and Feature Importance

In this study, feature importance was assessed for the three feature schemes using RF (Figure 2). Among the 13 features assessed, the index feature EVI had the highest importance score of 0.16, indicating that EVI played a key role in classification prediction. The spectral features NIR and SWIR1 also exhibited high importance scores, reflecting their effectiveness in distinguishing different LUCC types. For texture features, the importance scores of the principal components PC1 and PC2 were relatively higher than PC3, PC4, and PC5. In comparison, the effects of visible bands are relatively weaker than those of index and texture features.

3.2. Feature Profile Comparison

To clearly understand the differences among the five target classes, we systematically compared their spectral features, index features, and principal components of texture features (Figure 3). In the visible bands, the reflectance distributions of the five classes were relatively similar and generally in the range of 0–0.2. This indicates that the classes are difficult to distinguish effectively from visible-band reflectance alone. In contrast, in the non-visible bands (NIR, SWIR1, and SWIR2), the spectral properties of the five classes differed markedly. Specifically, farmland and built-up areas had higher reflectance in the NIR and SWIR bands than aquaculture ponds, water, and others. Water in particular had lower reflectance than aquaculture ponds, which helps to distinguish the two. Regarding index features, farmland had the highest EVI values and the lowest MNDWI values. The EVI and MNDWI values of aquaculture ponds lay between those of water and those of built-up areas and others. As for texture profiles, only PC1 and PC2 exhibited obvious differences among the five classes, providing an effective basis for LUCC classification. In contrast, the distributions of PC3–PC5 largely overlapped across classes, suggesting that they contributed little to improving classification accuracy.

3.3. Accuracy Comparison of Different Classification Models

The performance of the classification models was evaluated on the test data (Table 3). Among the classifiers, the XGBoost and RF models performed best, reaching accuracies of up to 96.15% and 95.92%, respectively. In contrast, the GNB model performed the worst, with an accuracy below 65%. Among the feature schemes examined, scheme 5, which combined spectral, index, and texture features, outperformed all other schemes overall, especially with the XGBoost classifier. Scheme 3, which included only spectral and texture features, was only slightly less effective than scheme 5, indicating that the index features added little information beyond the spectral and texture features. Scheme 4, which contained only index and texture features, had lower classification accuracy than scheme 5, confirming the critical role of spectral features in the classification task.
We then used the XGBoost classifier under scheme 5 to evaluate the accuracy for each land use type and present the corresponding normalized confusion matrix (Figure 4). Among the land use types, built-up land had the lowest classification accuracy, with PA and recall of 91.3% and 88.63%, respectively. In contrast, farmland had the highest recall, reaching 97.19%, while the ‘other’ type had the highest PA, at 97.24% (Figure 4b). Apart from built-up land, the correct classification ratio of the land use types was generally higher than 0.9. The correct classification ratio for aquaculture ponds was 0.93, which compares well with the other land use types, and their most frequent misclassification was as water, at a proportion of 0.03 (Figure 4a). These results indicate the high accuracy of XGBoost under scheme 5 in classifying most land use types.
In terms of visualization, compared to scheme 1, scheme 2, and scheme 4, the patch integrity of surface objects predicted under scheme 5 was significantly improved and the confusion of categories was significantly reduced (Figure 5). Especially in the categorization of aquaculture ponds, scheme 5-based prediction significantly refined the continuity and completeness of their distribution.

3.4. Land Cover Changes in Nansi Lake

Based on the results predicted by the XGBoost classifier under scheme 5, this study analyzed the changes in area of the five land cover types around Nansi Lake between 1987 and 2021 (Figure 6b). Built-up land and aquaculture ponds have expanded considerably since 1987. Specifically, the area of built-up land grew from 120 km² in 1987 to 296 km² in 2021, while the area of aquaculture ponds surged from 48 km² to 842 km², with a sharp increase during 1984–2003. In contrast, the ‘other’ type shrank greatly, from the dominant cover type to about 215 km² by 2021. Farmland showed a slight overall expansion, with a sudden drop around 2003. The area of lake water fluctuated dramatically during the study period, but no significant trend was found. Spatially, the distribution of aquaculture ponds after 2002 has shown pronounced geographic clustering, primarily in the western and central regions of Nansi Lake (Figure 6a).
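Class areas such as these are typically obtained by counting classified pixels and multiplying by the pixel footprint. The sketch below assumes the nominal 30 m Landsat multispectral pixel size and a small integer label map; the function name, class codes, and demo array are hypothetical, and the text does not describe how the areas were actually tabulated.

```python
import numpy as np

PIXEL_SIZE_M = 30.0  # nominal Landsat TM/OLI multispectral pixel size


def class_areas_km2(label_map, n_classes):
    """Area of each class in km^2 from an integer-coded classification map."""
    counts = np.bincount(np.asarray(label_map).ravel(), minlength=n_classes)
    return counts * PIXEL_SIZE_M**2 / 1e6  # m^2 per pixel converted to km^2


# Tiny illustrative label map with class codes 0..4 (assumption).
demo_map = np.array([[0, 0, 1], [2, 2, 2]])
areas = class_areas_km2(demo_map, n_classes=5)
print(areas)
```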

3.5. Relationship between Water Quality and the Expansion of Aquaculture Ponds

This study analyzed the potential impact of aquaculture pond expansion on water quality, expressed through phytoplankton abundance in Nansi Lake, by means of correlation analysis (Figure 7). Phytoplankton abundance decreased after 2003 even as the area of aquaculture ponds continued to increase. Nonetheless, the annual fluctuations of the two variables were positively correlated (R = 0.5), suggesting that the water quality of the lake may be partially affected by aquaculture activity.

4. Discussion

In this study, we performed land use classification with Landsat-5/8 images around Nansi Lake by means of the KNN, GNB, ANN, RF, and XGBoost algorithms under multilevel features. Our results showed that the XGBoost algorithm, especially when combined with texture features, achieved the highest classification accuracy of up to 96.15%. This accuracy is higher than those reported in several related studies. For example, Talukdar et al. (2020) [35] used the RF algorithm to classify a riparian landscape in India and obtained a kappa coefficient of 0.89. Similarly, Abbas and Jaber (2020) [45] used WorldView-2 imagery and the SVM algorithm to classify land use in Hilla City in Babylon, Iraq, and obtained an overall classification accuracy of 94.48% and a kappa coefficient of 0.9, which are lower than in the current study. Xia et al. (2020) [46] extracted aquaculture ponds in Shanghai by integrating existing multi-source remote sensing data on the Google Earth Engine platform and combining multi-threshold connected component segmentation with the random forest algorithm, reaching an OA of 91.8%, which is also lower than in the current study. This study not only confirms the key role of multiple feature integration in improving classification results, but also highlights the great potential of advanced machine learning algorithms in land use classification.
The key to achieving such a high classification accuracy in this study is the introduction of texture features combined with effective dimensionality reduction. Texture features enable the model to better distinguish objects with similar spectra but large texture differences, especially specific surface object types such as aquaculture ponds. The PCA method effectively reduced the information redundancy among features and improved the classification ability. The first principal component accounts for the largest variance in the texture data, the second explains the most variance among the remaining components, and so on. The first five principal components collectively represent approximately 99.99% of the shape information of all land cover types and contributed more to distinguishing aquaculture ponds and farmlands, which have regular shapes, from other types. The overall accuracy of RF under scheme 4 is 92.03%, indicating that satisfactory classification can be achieved even when using only index features and texture principal components. This may be attributed to the EVI, a vegetation index that integrates information from the near-infrared, red, and blue bands and therefore outperforms single-band data in classification. Similarly, the MNDWI, as a water body index, effectively distinguishes water from other land use types by utilizing information from the green and shortwave infrared bands. It is important to note that although the index features (EVI and MNDWI) scored highly in the feature importance evaluation, their contribution to the final classification accuracy when combined with spectral and texture features is not significant. The reason may be that the spectral information is already sufficient and strongly collinear with the index features, so adding the indices provides little additional useful information for the final classification.
Nevertheless, the contribution of spectral features to the classification process remains substantial. The overall accuracy of the RF classification model based on spectral features alone (scheme 1) reached 92.51%, highlighting the key role of spectral features in distinguishing different land cover types. In the feature importance evaluation, the NIR band scored highly and is particularly effective in differentiating vegetation types, while the SWIR band is particularly useful for identifying water types.
From the perspective of land use change, this study revealed dramatic conversions among land use types around Nansi Lake. With the advancement of urbanization, natural land types such as bare land and forest land have gradually been converted to other uses. At the same time, the expansion of built land and aquaculture ponds reflects the increasing demand for land due to population growth and urban expansion. Notably, since 2003 the expansion of aquaculture ponds in Nansi Lake has slowed, while the water quality of the lake has gradually improved. This change may be closely related to the government's policy of keeping the lake natural, which compelled many farmers to reduce aquaculture activities in order to restore the integrity of the lake ecosystem. With the reduction in fertilizer application, a decrease in phytoplankton density was detected. The pond area has continued to rise slowly while water quality has improved, probably because the embankments of many aquaculture ponds remain in place even though fishery activity has ceased under strict local government management. However, the relationship between aquaculture ponds and changes in water quality in the lake region is complex. Water quality is the result of multiple factors, including climate, hydrology, land use, and human activities. Although the change in aquaculture pond area has some impact on water quality, it is only one of many contributing factors. Therefore, a more comprehensive investigation is needed to fully understand the causes of water quality evolution.
Despite the results of this study, some limitations need to be noted. First, high-precision classification relies heavily on the accurate selection of training and validation samples. In this study, we selected samples by visual interpretation, which is inevitably affected by subjective factors. Different interpreters may assign different types to features in the same area according to their own experience and judgment, leading to mislabeled samples and thus lower classification accuracy. In addition, because local land use types are complex, especially in transition areas, the unclear boundaries between types make it particularly difficult to select high-purity pixels. Second, the current study relies mainly on traditional feature selection methods, which may not fully exploit the potential information in the data. Future research could use deep learning techniques to automatically extract and select the most discriminative features.

5. Conclusions

To address the difficulty of identifying aquaculture ponds in the Nansi Lake region, this study integrated multi-level features into different machine learning algorithms and achieved high-precision land use classification, with a highest overall accuracy of 96.15%, overcoming limitations of traditional methods. The results show that the land use pattern in the region has changed greatly, and natural land such as bare land and forest land has largely been replaced by aquaculture ponds and built-up land. At the same time, we found that phytoplankton density was correlated with changes in the area of aquaculture ponds, suggesting that the expansion of ponds and the reduction in local farming intensity may have changed the hydrological environment of the lake.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs16122168/s1, Figure S1: The data distribution presents the texture features of five target objects prior to Principal Component Analysis (PCA). These texture features are derived from eight texture attributes extracted from the six selected spectral bands using the Gray Level Co-occurrence Matrix (GLCM) method. The attributes include Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Angular Second Moment, and Correlation.

Author Contributions

Conceptualization, G.X. and S.R.; methodology, X.B.; formal analysis, X.B.; investigation, Y.P., Y.L. (Yi Li), C.Z., Y.L. (Yang Liu) and J.L.; writing—original draft preparation, G.X. and X.B.; writing—review and editing, S.R., L.F., J.C., J.M., X.W., G.W. and Q.W.; funding acquisition, S.R. All authors have read and agreed to the published version of the manuscript.

Funding

This study was specially funded by the National Key R&D Program of China (Grant No. 2022YFC3204400) and the Key Laboratory of Land and Sea Ecological Governance and Systematic Regulation.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dehong, D.; Yinfang, S.; Lin, S.; Genxia, W. Remote Sensing Technology’s Applied Research and Development Direction in Land-Use and Land-Cover Change (LUCC). In Proceedings of the 2012 2nd International Conference on Remote Sensing, Environment and Transportation Engineering, Nanjing, China, 1–3 June 2012; pp. 1–4.
  2. Wang, X.; Dong, X.; Liu, H.; Wei, H.; Fan, W.; Lu, N.; Xu, Z.; Ren, J.; Xing, K. Linking land use change, ecosystem services and human well-being: A case study of the Manas River Basin of Xinjiang, China. Ecosyst. Serv. 2017, 27, 113–123.
  3. Cao, L.; Naylor, R.; Henriksson, P.; Leadbitter, D.; Metian, M.; Troell, M.; Zhang, W. China’s aquaculture and the world’s wild fisheries. Science 2015, 347, 133–135.
  4. Naylor, R.; Fang, S.; Fanzo, J. A global view of aquaculture policy. Food Policy 2023, 116, 102422.
  5. Jiang, Q.; Bhattarai, N.; Pahlow, M.; Xu, Z. Environmental sustainability and footprints of global aquaculture. Resour. Conserv. Recycl. 2022, 180, 106183.
  6. Luo, J.; Pu, R.; Ma, R.; Wang, X.; Lai, X.; Mao, Z.; Zhang, L.; Peng, Z.; Sun, Z. Mapping long-term spatiotemporal dynamics of pen aquaculture in a shallow lake: Less aquaculture coming along better water quality. Remote Sens. 2020, 12, 1866.
  7. Sun, Z.; Luo, J.; Gu, X.; Qi, T.; Xiao, Q.; Shen, M.; Ma, J.; Zeng, Q.; Duan, H. Policy-driven opposite changes of coastal aquaculture ponds between China and Vietnam: Evidence from Sentinel-1 images. Aquaculture 2023, 571, 739474.
  8. Chen, B.; Huang, B.; Xu, B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J. Photogramm. Remote Sens. 2017, 124, 27–39.
  9. Li, R.; Gao, X.; Shi, F.; Zhang, H. Scale Effect of Land Cover Classification from Multi-Resolution Satellite Remote Sensing Data. Sensors 2023, 23, 6136.
  10. Feyisa, G.L.; Meilby, H.; Fensholt, R.; Proud, S.R. Automated Water Extraction Index: A new technique for surface water mapping using Landsat imagery. Remote Sens. Environ. 2014, 140, 23–35.
  11. Huang, D.-M.; Wei, C.-T.; Yu, J.-C.; Wang, J.-L. A method of detecting land use change of remote sensing images based on texture features and DEM. In Proceedings of the International Conference on Intelligent Earth Observing and Applications, Guilin, China, 23–24 October 2015; pp. 613–618.
  12. Wang, H.; Zhao, H.; Li, W. Land-use Classification of Zhanghe River Basin Using the Maximum Likelihood and Decision Tree Method. In Proceedings of the 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Xiamen, China, 19–21 August 2014; pp. 322–327.
  13. Li, Y.; Wu, H. A Clustering Method Based on K-Means Algorithm. In Proceedings of the International Conference on Solid State Devices and Materials Science (SSDMS), Macao, China, 1–2 April 2012; pp. 1104–1109.
  14. Papa, J.P.; Papa, L.P.; Pereira, D.R.; Pisani, R.J. A Hyperheuristic Approach for Unsupervised Land-Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2333–2342.
  15. Heydari, S.S.; Mountrakis, G. Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 Landsat sites. Remote Sens. Environ. 2018, 204, 648–658.
  16. He, T.; Sun, Y.-J.; Xu, J.-D.; Wang, X.-J.; Hu, C.-R. Enhanced land use/cover classification using support vector machines and fuzzy k-means clustering algorithms. J. Appl. Remote Sens. 2014, 8, 083636.
  17. Shakya, A.; Biswas, M.; Pal, M. Parametric study of convolutional neural network based remote sensing image classification. Int. J. Remote Sens. 2021, 42, 2663–2685.
  18. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  19. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  20. Lu, D.; Weng, Q. A survey of image classification methods and techniques for improving classification performance. Int. J. Remote Sens. 2007, 28, 823–870.
  21. Pal, M.; Foody, G.M. Feature Selection for Classification of Hyperspectral Data by SVM. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2297–2307.
  22. Duan, Y.; Li, X.; Zhang, L.; Chen, D.; Ji, H. Mapping national-scale aquaculture ponds based on the Google Earth Engine in the Chinese coastal zone. Aquaculture 2020, 520, 734666.
  23. Zeng, Z.; Wang, D.; Tan, W.; Yu, G.; You, J.; Lv, B.; Wu, Z. RCSANet: A full convolutional network for extracting inland aquaculture ponds from high-spatial-resolution images. Remote Sens. 2020, 13, 92.
  24. Zhang, M.; Dong, J.; Gao, Y.; Liu, Y.; Zhou, C.; Meng, X.; Li, X.; Li, M.; Wang, Y.; Dai, D. Patterns of phytoplankton community structure and diversity in aquaculture ponds, Henan, China. Aquaculture 2021, 544, 737078.
  25. Wang, W.; Chen, J.; Fang, L.; Yinglan, A.; Ren, S.; Men, J.; Wang, G. Remote sensing retrieval and driving analysis of phytoplankton density in the large storage freshwater lake: A study based on random forest and Landsat-8 OLI. J. Contam. Hydrol. 2024, 261, 104304.
  26. Gu, Z.; Zhang, Z.; Yang, J.; Wang, L. Quantifying the influences of driving factors on vegetation EVI changes using structural equation model: A case study in Anhui province, China. Remote Sens. 2022, 14, 4203.
  27. Pôças, I.; Calera, A.; Campos, I.; Cunha, M. Remote sensing for estimating and mapping single and basal crop coefficientes: A review on spectral vegetation indices approaches. Agric. Water Manag. 2020, 233, 106081.
  28. Bhangale, U.; More, S.; Shaikh, T.; Patil, S.; More, N. Analysis of surface water resources using Sentinel-2 imagery. Procedia Comput. Sci. 2020, 171, 2645–2654.
  29. Li, L.; Su, H.; Du, Q.; Wu, T. A novel surface water index using local background information for long term and large-scale Landsat images. ISPRS J. Photogramm. Remote Sens. 2021, 172, 59–78.
  30. Duan, M.; Song, X.; Liu, X.; Cui, D.; Zhang, X. Mapping the soil types combining multi-temporal remote sensing data with texture features. Comput. Electron. Agric. 2022, 200, 107230.
  31. Wu, H.; Lin, A.; Xing, X.; Song, D.; Li, Y. Identifying core driving factors of urban land use change from global land cover products and POI data using the random forest method. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102475.
  32. Zhang, F.; Yang, X. Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection. Remote Sens. Environ. 2020, 251, 112105.
  33. Abdi, A.M. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GIScience Remote Sens. 2020, 57, 1–20.
  34. Quan, Y.; Hutjes, R.W.; Biemans, H.; Zhang, F.; Chen, X.; Chen, X. Patterns and drivers of carbon stock change in ecological restoration regions: A case study of upper Yangtze River Basin, China. J. Environ. Manag. 2023, 348, 119376.
  35. Talukdar, S.; Singha, P.; Mahato, S.; Pal, S.; Liou, Y.-A.; Rahman, A. Land-use land-cover classification by machine learning classifiers for satellite observations—A review. Remote Sens. 2020, 12, 1135.
  36. Duran, Z.; Ozcan, K.; Atik, M.E. Classification of photogrammetric and airborne lidar point clouds using machine learning algorithms. Drones 2021, 5, 104.
  37. Jiang, W.; Zhang, M.; Long, J.; Pan, Y.; Ma, Y.; Lin, H. HLEL: A wetland classification algorithm with self-learning capability, taking the Sanjiang Nature Reserve I as an example. J. Hydrol. 2023, 627, 130446.
  38. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
  39. Usman, M.; Ejaz, M.; Nichol, J.E.; Farid, M.S.; Abbas, S.; Khan, M.H. A Comparison of Machine Learning Models for Mapping Tree Species Using WorldView-2 Imagery in the Agroforestry Landscape of West Africa. ISPRS Int. J. Geo-Inf. 2023, 12, 142.
  40. Ge, G.; Shi, Z.; Zhu, Y.; Yang, X.; Hao, Y. Land use/cover classification in an arid desert-oasis mosaic landscape of China using remote sensed imagery: Performance assessment of four machine learning algorithms. Glob. Ecol. Conserv. 2020, 22, e00971.
  41. Islam, M.R.; Nahiduzzaman, M. Complex features extraction with deep learning model for the detection of COVID19 from CT scan images using ensemble based machine learning approach. Expert Syst. Appl. 2022, 195, 116554.
  42. Li, D.; Hu, S.; He, W.; Zhou, B.; Peng, J.; Wang, K. The area prediction of western North Pacific Subtropical High in summer based on Gaussian Naive Bayes. Clim. Dyn. 2022, 59, 3193–3210.
  43. Mao, W.; Lu, D.; Hou, L.; Liu, X.; Yue, W. Comparison of machine-learning methods for urban land-use mapping in Hangzhou city, China. Remote Sens. 2020, 12, 2817.
  44. Ghayour, L.; Neshat, A.; Paryani, S.; Shahabi, H.; Shirzadi, A.; Chen, W.; Al-Ansari, N.; Geertsema, M.; Pourmehdi Amiri, M.; Gholamnia, M. Performance evaluation of sentinel-2 and landsat 8 OLI data for land cover/use classification using a comparison between machine learning algorithms. Remote Sens. 2021, 13, 1349.
  45. Abbas, Z.; Jaber, H.S. Accuracy assessment of supervised classification methods for extraction land use maps using remote sensing and GIS techniques. IOP Conf. Ser. Mater. Sci. Eng. 2020, 745, 012166.
  46. Xia, Z.; Guo, X.; Chen, R. Automatic extraction of aquaculture ponds based on Google Earth Engine. Ocean. Coast. Manag. 2020, 198, 105348.
Figure 1. Geographical location and overview of Nansi Lake.
Figure 2. Importance ranking of 13 feature variables.
Figure 3. Quantity difference among five targeted objects in (a) spectral features, (b) index features, and (c) texture features.
Figure 4. Comparison of classification accuracy for land use types identified with XGBoost scheme 5: (a) confusion matrix; (b) PA and recall for each land use type.
Figure 5. LUCC mapping and details comparison: (a,b) are two partial details presented to compare the land use classification performance among different models.
Figure 6. Land cover changes from 1984 to 2021 based on XGBoost scheme 5: (a) temporal variation; (b) spatial distribution.
Figure 7. Correlation of aquaculture pond area and phytoplankton density. The correlation coefficient (a) and regression equation (b) between aquaculture pond area and phytoplankton density.
Table 1. Characteristics and description of selected GLCM.

| GLCM | Description |
|---|---|
| Mean | Reflects the degree of regularity of the texture. |
| Variance | Measures the dispersion of the gray-level distribution to emphasize the visual edges of land cover patches. |
| Homogeneity | Measures the local gray-level homogeneity of an image. |
| Contrast | Reflects the total amount of local gray-level changes in an image. |
| Dissimilarity | Similar to contrast, if the local contrast is higher, the dissimilarity is also higher. |
| Entropy | Measures the amount of information contained in an image, representing the degree of non-uniformity or complexity of textures within the image. |
| Angular Second Moment | Measures the uniformity of the image gray-level distribution, reflecting the degree of uniformity of the image gray-level distribution and the coarseness of the texture. |
| Correlation | Measures the linear relationship of gray levels, describing the degree of similarity between elements in rows or columns. |
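Several of the texture attributes described in Table 1 can be illustrated with a small pure-Python sketch. It builds a symmetric, normalized gray-level co-occurrence matrix for a single pixel offset and evaluates the usual Haralick-style formulas for five of the attributes. The window size, offset, and gray-level quantization used in the study are not given in this section, so the one-pixel horizontal offset and the tiny two-level images below are assumptions for illustration only.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1  # pair (a, b)
                counts[b][a] += 1  # mirrored pair makes the matrix symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_stats(p):
    """Five texture attributes computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


# Hypothetical 2 x 2 patches: a flat surface and a checkerboard.
flat = texture_stats(glcm([[1, 1], [1, 1]], levels=2))
checker = texture_stats(glcm([[0, 1], [1, 0]], levels=2))
```

The flat patch gives zero contrast and maximal homogeneity, while the checkerboard gives high contrast and lower homogeneity. This is the kind of difference that separates smooth water surfaces from more heterogeneous land covers.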
Table 2. Experimental feature schemes.

| Schemes | Feature Variables |
|---|---|
| Scheme 1: spectral feature | Blue band, green band, red band, NIR band, SWIR1 band, SWIR2 band |
| Scheme 2: spectral feature + index feature | Blue band, green band, red band, NIR band, SWIR1 band, SWIR2 band, EVI, MNDWI |
| Scheme 3: spectral feature + texture feature | Blue band, green band, red band, NIR band, SWIR1 band, SWIR2 band, PC1, PC2, PC3, PC4, PC5 |
| Scheme 4: index feature + texture feature | EVI, MNDWI, PC1, PC2, PC3, PC4, PC5 |
| Scheme 5: spectral feature + index feature + texture feature | Blue band, green band, red band, NIR band, SWIR1 band, SWIR2 band, EVI, MNDWI, PC1, PC2, PC3, PC4, PC5 |
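The five schemes in Table 2 are combinations of three feature groups, which can be written down compactly. This is only a sketch of how the schemes could be encoded before training a classifier; the variable names, the `select_features` helper, and the sample pixel values are hypothetical and not taken from the study.

```python
SPECTRAL = ["Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2"]
INDEX = ["EVI", "MNDWI"]
TEXTURE = ["PC1", "PC2", "PC3", "PC4", "PC5"]

# Scheme number -> ordered list of feature names, following Table 2.
SCHEMES = {
    1: SPECTRAL,
    2: SPECTRAL + INDEX,
    3: SPECTRAL + TEXTURE,
    4: INDEX + TEXTURE,
    5: SPECTRAL + INDEX + TEXTURE,
}


def select_features(sample, scheme):
    """Return the feature vector of one pixel under the given scheme."""
    return [sample[name] for name in SCHEMES[scheme]]


# Invented pixel: every feature set to a placeholder value.
pixel = {name: 0.0 for name in SPECTRAL + INDEX + TEXTURE}
pixel["EVI"] = 0.4
pixel["PC1"] = 1.2
vector = select_features(pixel, scheme=4)
```

Scheme 5 contains 13 variables, which matches the 13 feature variables ranked in Figure 2.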
Table 3. Comparison of classification accuracy of different models.

| Model | OA (%) | Kappa | PA (%) | Recall (%) | F1 |
|---|---|---|---|---|---|
| KNN scheme 1 | 84.93 | 0.80 | 84.69 | 84.93 | 0.85 |
| KNN scheme 2 | 85.26 | 0.80 | 84.99 | 85.26 | 0.85 |
| KNN scheme 3 | 85.51 | 0.81 | 85.36 | 85.51 | 0.85 |
| KNN scheme 4 | 85.74 | 0.81 | 85.59 | 85.74 | 0.86 |
| KNN scheme 5 | 85.83 | 0.81 | 85.68 | 85.83 | 0.86 |
| GNB scheme 1 | 59.68 | 0.48 | 60.68 | 59.68 | 0.58 |
| GNB scheme 2 | 60.57 | 0.49 | 64.34 | 60.57 | 0.61 |
| GNB scheme 3 | 58.01 | 0.45 | 59.79 | 58.01 | 0.55 |
| GNB scheme 4 | 59.44 | 0.46 | 58.22 | 59.44 | 0.56 |
| GNB scheme 5 | 60.90 | 0.50 | 65.83 | 60.90 | 0.61 |
| ANN scheme 1 | 69.74 | 0.59 | 67.79 | 69.74 | 0.68 |
| ANN scheme 2 | 70.85 | 0.60 | 69.72 | 70.85 | 0.68 |
| ANN scheme 3 | 75.57 | 0.67 | 74.66 | 75.57 | 0.75 |
| ANN scheme 4 | 76.50 | 0.68 | 75.76 | 76.50 | 0.76 |
| ANN scheme 5 | 76.00 | 0.67 | 74.93 | 76.00 | 0.75 |
| RF scheme 1 | 92.51 | 0.90 | 92.43 | 92.51 | 0.92 |
| RF scheme 2 | 92.28 | 0.90 | 92.20 | 92.28 | 0.92 |
| RF scheme 3 | 95.92 | 0.95 | 95.91 | 95.92 | 0.96 |
| RF scheme 4 | 92.03 | 0.89 | 91.98 | 92.03 | 0.92 |
| RF scheme 5 | 95.66 | 0.94 | 95.65 | 95.66 | 0.96 |
| XGBoost scheme 1 | 91.62 | 0.92 | 91.62 | 91.62 | 0.92 |
| XGBoost scheme 2 | 92.01 | 0.89 | 91.93 | 92.01 | 0.92 |
| XGBoost scheme 3 | 96.07 | 0.95 | 96.05 | 96.07 | 0.96 |
| XGBoost scheme 4 | 91.32 | 0.88 | 91.27 | 91.32 | 0.91 |
| XGBoost scheme 5 | 96.15 | 0.95 | 96.14 | 96.15 | 0.96 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xie, G.; Bai, X.; Peng, Y.; Li, Y.; Zhang, C.; Liu, Y.; Liang, J.; Fang, L.; Chen, J.; Men, J.; et al. Aquaculture Ponds Identification Based on Multi-Feature Combination Strategy and Machine Learning from Landsat-5/8 in a Typical Inland Lake of China. Remote Sens. 2024, 16, 2168. https://doi.org/10.3390/rs16122168

APA Style

Xie, G., Bai, X., Peng, Y., Li, Y., Zhang, C., Liu, Y., Liang, J., Fang, L., Chen, J., Men, J., Wang, X., Wang, G., Wang, Q., & Ren, S. (2024). Aquaculture Ponds Identification Based on Multi-Feature Combination Strategy and Machine Learning from Landsat-5/8 in a Typical Inland Lake of China. Remote Sensing, 16(12), 2168. https://doi.org/10.3390/rs16122168
