4.2. Simulation Accuracy of the Global LULC Dataset PLUS Model before and after Correction
The land use data, parameters (constraint matrix and domain weights in the target year), and land use demand quantity for each year in the Ili–Balkhash Basin were imported into the PLUS model, and the land use status of the basin in 2020 was simulated based on the land use data for different years. This yielded the final simulation results for the LULC datasets in the basin before and after correction. In the validation simulation phase, the land use demand was taken from the land use quantities in the actual 2020 LULC data; in the future prediction simulation phase, it was obtained by CA–Markov prediction (i.e., the Markov chain part of the PLUS–Markov future simulation). This work used three simulation periods with different time scales for both the uncorrected and corrected data, and the period with the best simulation effect was selected as the base data for the study area’s future simulations.
4.2.1. Corrected Global LULC Data Accuracy Check
(1) Uncorrected Global LULC Dataset PLUS Simulation
Each global LULC dataset is produced with a single set of classification criteria and algorithms. Although the data differ from the actual situation in some details, the accuracy of a given product's LULC data for different years in the Ili–Balkhash Basin remains within a broadly similar range, and the trend of LULC variation across years can reflect the actual LULC variation in the basin from a macroscopic perspective over a long time series. This internal consistency of the uncorrected LULC product ensures that the PLUS model runs smoothly and that the data are not affected by subsequent human adjustments. To show more objectively the simulation and prediction performance of the PLUS model itself in the basin based on the two global datasets, and to discuss the applicability of the model, the uncorrected LULC data for each time period were input into the model for simulation and prediction. The effectiveness and accuracy of these simulations and predictions serve as one of the reference bases for model setting and data selection for LULC spatial simulation in the basin.
The actual land use quantity of the GLC_FCS30 data in 2020 was used as the land use demand, and the simulation results were obtained by inputting land use data from different time periods and combining the parameters (Figure 8). The simulation results for different time periods using the GLC_FCS30 product showed fewer changes in the various types of LULC patches compared with the target year (2020 GLC_FCS30). In the southern part of the Ili River Valley, there were fewer patches of cropland and barren areas in the simulation results for each period than in the target year. Among the simulations based on land use data at different time intervals, the results for 2000–2020 (Figure 8b) were poor. The areas of urban and built-up land and croplands were smaller than in the target year, while the area of croplands in the northeastern part of the basin was excessive. The simulation of water bodies for western Lake Balkhash produced more barren area pixels than the target year. The simulation results based on data from 2010–2020 (Figure 8c) were relatively close to the actual data but had the same problems, only to a lesser extent. The distribution and sizes of the various land use types, especially urban and built-up land, in the simulation results based on 2015–2020 data (Figure 8d) were closer to the actual data. The GLC_FCS30 data showed an interspersed, non-homogeneous distribution of pixels in each class, and, therefore, there were more interspersed spots in each class in the simulation results.
The uncorrected GlobeLand30 land use data for each year, the parameters (land use quantity, constraint matrix, and domain weights for the target year), and the land use demand were imported into the PLUS model to obtain the 2020 land use simulation results based on data for different year intervals (Figure 9). The 2020 simulation result based on the 2000 data had an oversized grassland area, which was clearly inconsistent with the actual situation in the target year. Each LULC type in the simulation results based on 2010 data was almost consistent with the target year, although the simulated area of urban and built-up land was slightly larger than that in the target year.
We used Kappa coefficients and FoM coefficients to quantitatively measure the accuracy of simulations with uncorrected GLC_FCS30 and GlobeLand30 data combined with the PLUS model (Table 5). When the PLUS model was used to simulate the original 2020 GLC_FCS30 land use data, the simulation based on 2015 data had the highest Kappa coefficient (more than 0.75) and a relatively high FoM coefficient (0.074). The simulation based on 2000 data had the lowest Kappa coefficient (0.69) but the highest FoM coefficient among the three simulations (0.083). GlobeLand30 simulations based on 2000 data were less accurate than those based on GLC_FCS30 data. The GlobeLand30 simulation using data from the last decade (2010–2020) exhibited a high Kappa coefficient (0.86) and a high FoM coefficient, and its results were nearly identical to those of the target year.
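For readers who wish to reproduce this kind of accuracy check, the two indices can be sketched as follows. This is a minimal illustration and not the procedure used in this work: it assumes the commonly used definitions (Cohen's Kappa from overall and chance agreement; FoM as hits divided by the sum of hits, misses, wrong hits, and false alarms), and the function names and the integer-coded input arrays are hypothetical.

```python
import numpy as np

def kappa(actual, simulated):
    """Cohen's Kappa between an actual and a simulated categorical map."""
    actual = np.asarray(actual).ravel()
    simulated = np.asarray(simulated).ravel()
    classes = np.union1d(actual, simulated)
    # Observed agreement: share of pixels with identical classes.
    po = np.mean(actual == simulated)
    # Chance agreement: product of the class proportions, summed over classes.
    pe = sum(np.mean(actual == c) * np.mean(simulated == c) for c in classes)
    return (po - pe) / (1.0 - pe)

def figure_of_merit(start, actual, simulated):
    """FoM = hits / (hits + misses + wrong hits + false alarms)."""
    start = np.asarray(start).ravel()
    actual = np.asarray(actual).ravel()
    simulated = np.asarray(simulated).ravel()
    observed_change = actual != start
    simulated_change = simulated != start
    both = observed_change & simulated_change
    hits = np.sum(both & (actual == simulated))        # change simulated correctly
    wrong_hits = np.sum(both & (actual != simulated))  # change to the wrong class
    misses = np.sum(observed_change & ~simulated_change)
    false_alarms = np.sum(~observed_change & simulated_change)
    total = hits + misses + wrong_hits + false_alarms
    return hits / total if total else float("nan")
```

Because FoM only counts pixels that changed in either the actual or the simulated map, it can remain small even when Kappa is high, which is consistent with the pattern of values reported above.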
(2) Uncorrected Global LULC Dataset PLUS–Markov Chain Forecasts
The land use demand quantities for future years are obtained by forecasting, so it was also necessary to examine how the simulation accuracy varies when forecast demand is used. In this work, the land use demand quantity for future years was obtained through the CA–Markov chain forecasting module embedded in the PLUS model. Before making the actual projections, historical land use data were combined with the CA–Markov chain to obtain predicted land use demand values for 2020. These values were input into the PLUS model for simulation to verify whether the PLUS simulation accuracy changed when combined with the forecast values. The simulation validation for the combined CA–Markov chain prediction used two periods of historical LULC data for prediction and one period of existing LULC data for validation. Therefore, with 2020 taken as the validation data, two periods of land use data at 5- or 10-year intervals preceding 2020 were selected as the basis for the prediction.
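The idea of deriving a demand forecast from two historical maps can be illustrated with a generic first-order Markov chain. The sketch below is an assumption-laden illustration and does not reproduce the internals of the module embedded in PLUS: class codes are assumed to run from 0 to n_classes - 1, each step is assumed to equal the interval between the two input maps, and the function name is hypothetical.

```python
import numpy as np

def markov_demand(earlier, later, n_classes, steps=1):
    """Project per-class pixel counts forward from two categorical maps."""
    earlier = np.asarray(earlier).ravel()
    later = np.asarray(later).ravel()
    # Count pixel transitions from each earlier class (row) to each later class (column).
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(counts, (earlier, later), 1)
    # Row-normalise the counts into transition probabilities.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums,
                      out=np.zeros((n_classes, n_classes), dtype=float),
                      where=row_sums > 0)
    # Classes absent from the earlier map are assumed to persist unchanged.
    empty = np.flatnonzero(row_sums.ravel() == 0)
    probs[empty, empty] = 1.0
    # Start from the later map's class totals and apply the transition matrix.
    demand = np.bincount(later, minlength=n_classes).astype(float)
    for _ in range(steps):
        demand = demand @ probs
    return demand
```

Under these assumptions, passing the 2010 and 2015 maps with `steps=1` would give a 2020 demand estimate in pixel counts, which could then be compared with the class totals of the actual 2020 map.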
The 2020 prediction obtained based on GLC_FCS30 land use data for different time periods was used as the land use demand, and the simulation results were obtained by combining the parameters (Figure 10). The simulation results differed only slightly from the simulations using the actual land use quantities in 2020 as the land use demand. There were problems with inaccurate cropland simulation in the southern Ili River Valley in China and smaller patches of urban and built-up land in the southern part of the basin. The prediction simulation results based on the predicted values derived from 2010–2015 were relatively accurate for croplands, and the remaining land categories also only differed slightly from the actual data.
The simulation results for 2020 based on the GlobeLand30 recent data combined with the predicted values obtained using the CA–Markov chain for the land use demand are shown in Figure 11. The simulation results based on land use data with a 10-year interval varied slightly from the GlobeLand30 2020 LULC status for urban and built-up land, while the simulations for the other land use types were similar to the GlobeLand30 2020 LULC status.
The accuracies of the uncorrected GLC_FCS30 and GlobeLand30 simulations are shown in Table 6. Compared to the results using the 2020 actual land use quantity as the land use demand, the Kappa coefficients of the GLC_FCS30 predictions did not vary by more than 0.01. The GlobeLand30 data had the highest consistency (Kappa, 0.85) and the highest FoM coefficient (0.015), and the accuracy was not much different from that of the GlobeLand30 2010–2020 simulations without CA–Markov chain prediction.
4.2.2. Corrected Simulation and Prediction Accuracy of the Global LULC Dataset
(1) Corrected Global LULC Dataset PLUS Simulation
Both corrected datasets resulted in improved regional accuracy and produced results that were closer to the actual conditions of the Ili–Balkhash Basin than the uncorrected datasets. The corrected data may have different effects on the model simulation and prediction results. Therefore, the two corrected land use datasets were input into the PLUS model to further investigate the PLUS model simulation accuracy in the study area using the available data.
The corrected GLC_FCS30 simulation results with the actual land use quantity in 2020 used as the land use demand are shown in Figure 12. The simulation results for each period were similar to the 2020 LULC data, with almost no issue of excessively small urban and built-up land patches. However, there were still inaccuracies in simulating the croplands in the southern part of the Ili River Valley. In the simulation results based on the 2000–2020 data, the development of barren area patches was still not consistent with the actual data, and the simulated barren area patches still intruded too far into the area at the western end of Lake Balkhash. The simulation results based on the 2010–2020 and 2015–2020 data were in general agreement with the actual 2020 data.
The simulation results based on the corrected GlobeLand30 data using the actual land use quantity in 2020 as the land use demand are shown in Figure 13. Because only the 2000 data were corrected, the corrected GlobeLand30 data only produced one phase of simulation results. In the simulation results for the corrected 2000–2020 data, the spatial distribution of barren areas and grasslands tended to be correct, but the barren area patches still differed significantly from the actual 2020 data, and there were obvious errors in their simulation. The simulated forest patches in the northwestern part of the basin developed abnormally, and the urban and built-up land areas were clearly smaller than in the target year, while the simulated cropland area in the basin was neither too large nor too small and was more consistent with the actual data.
The Kappa and FoM coefficient values of the two corrected global land use datasets simulated using the PLUS model are shown in Table 7. The Kappa coefficients of all three simulations for the corrected GLC_FCS30 data increased by approximately 0.2 to 0.3, and the FoM coefficients also increased to different degrees with values of approximately 0.1. Meanwhile, the Kappa coefficient of the simulation based on the last decade of data (2010–2020) reached more than 0.75, indicating high consistency. The Kappa coefficient of the present status simulation based on the 2015–2020 data exhibited the highest consistency at 0.79, while the FoM coefficient of this result was higher than that of the simulation based on the last decade of data (2010–2020).
The Kappa coefficient of the corrected GlobeLand30 2000–2020 simulation decreased by 0.12 compared with the uncorrected simulation for the same period, to only 0.58, indicating lower consistency with the target year. The FoM coefficient was 0.013, which is lower than those of the uncorrected data simulations. The 2010–2020 data were not corrected, so only the simulation accuracy of the 2000–2020 time period changed after correction, and the simulations based on data from the last decade (2010–2020) still had the highest Kappa coefficients and relatively high FoM coefficients.
(2) Corrected Global LULC Dataset PLUS–Markov Chain Forecasts
As in Section 4.2.1, part (2), the projected land use demand quantities for 2020 were obtained by CA–Markov chain prediction based on land use data for different time periods, and the resulting simulations were compared with the actual results for 2020 to measure the simulation accuracy. The corrected GLC_FCS30 data combined with CA–Markov chain predicted simulation results for 2020 are shown in Figure 14. The overall simulated cropland and urban and built-up land areas in the southern part of the basin were still small. For forests, the predicted simulation results for 2020 based on the 2000–2010 data were closer to the actual situation. In contrast, the prediction simulation based on 2010–2015 data was closer to reality for croplands, urban and built-up lands, and barren areas, with relatively good results.
The results of the corrected GlobeLand30 data combined with CA–Markov chain 2020 forecast simulations are shown in Figure 15. The simulation of urban and built-up lands in the Ili River Valley area was slightly smaller than the actual situation, and the simulations of the rest of the land types were almost identical to the actual situation.
The accuracies of the corrected prediction results for both data products combined with CA–Markov chain prediction are shown in Table 8. The Kappa coefficient of the corrected GLC_FCS30 prediction results based on 2010–2015 data improved to 0.77, and the FoM coefficient increased by 0.001 (0.008). The Kappa coefficient of the prediction results based on corrected GLC_FCS30 2000–2010 data remained unchanged, while the FoM coefficient improved. The corrected GlobeLand30 data had the highest Kappa coefficient of 0.84, which was slightly lower than the Kappa coefficient of 0.85 for the simulation with actual data used for the land use demand, but still indicated high consistency. Meanwhile, the FoM coefficient was higher, at 0.4, compared with the uncorrected prediction simulation results. This shows that the dataset accuracy improvement process also has a positive effect on the simulation accuracy. Moreover, both the PLUS simulations and the prediction simulations incorporating Markov chain prediction achieved high accuracy and were close to the actual land use status.
4.3. Prediction
The PLUS model provides linear regression and Markov chain methods to predict the future land use demand of each class. The CA–Markov chain method was used in this study to generate the predictions. The simulation and prediction validation process based on uncorrected and corrected LULC data in this work has shown the ability of the two global land use products combined with the PLUS model to simulate LULC in the Ili–Balkhash Basin. CA–Markov chain predictions require only two periods of data, and therefore we used the corrected GLC_FCS30 data for the last five years (2015–2020) and the GlobeLand30 data for the last ten years (2010–2020) for the forecast. The LULC data for these two time periods achieved the best results and accuracy across the uncorrected and corrected simulation and prediction results. Therefore, they were used to predict the LULC development in the Ili–Balkhash Basin for the next five years and the next ten years under a natural scenario (Figure 16).
The 2025 prediction results for the corrected GLC_FCS30 data are shown in Figure 16a. The spatial pattern of land use has not changed significantly, and the grasslands category is distributed throughout the basin, with a large area of grasslands in the northern part of the basin. Barren areas are concentrated in the vicinity of Lake Balkhash, with a vast barren area south of the lake and in the Ili River Valley. Croplands are concentrated in the plain in the southeastern part of the basin and in the vicinity of urban peripheries. Forests are distributed in the mountainous area in the southeastern part of the basin. The increases or decreases in the various types of patches in the basin under the 2025 simulation are not significant. The land use transfer matrix (Table 9) indicates that grasslands have the largest transfer out, with a reduction of 8478.34 km², mostly to croplands and barren areas, with very small transfers to water bodies and urban and built-up lands. The expansion of croplands, covering an area of 4348.35 km², is the largest of any class, with grasslands being the primary contributor to this expansion. Additionally, approximately 207.08 km² of forests are converted into croplands. Urban and built-up lands have a transfer area of 20.84 km², which accounts for less than 0.0001% of the total basin area and approximately 0.01% of the urban and built-up land area in 2020. At the same time, there is almost no conversion of urban and built-up lands to other land types. Overall, urban and built-up lands increase in 2025, with a net addition of 15.47 km² that is predominantly sourced from grasslands. There is a reciprocal transfer between grasslands and water bodies. From 2020 to 2025, the transfer area from grasslands to water bodies is greater than the transfer area from water bodies to grasslands, which amounts to only 181.95 km².
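A transfer matrix of the kind discussed here can be derived by cross-tabulating two categorical rasters. The following is a minimal sketch under stated assumptions rather than the tool used to produce the tables in this work: class codes are assumed to be integers from 0 to n_classes - 1, every pixel is assumed to have the same area (for example, 0.0009 km² for a 30 m pixel), and the function names are hypothetical.

```python
import numpy as np

def transfer_matrix_km2(before, after, n_classes, pixel_area_km2):
    """Area (km2) moving from each class in `before` (rows) to each class in `after` (columns)."""
    before = np.asarray(before).ravel()
    after = np.asarray(after).ravel()
    # Encode each (from, to) pair as a single index and count occurrences.
    pair_index = before * n_classes + after
    counts = np.bincount(pair_index, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes) * pixel_area_km2

def transferred_out_km2(matrix):
    """Area leaving each class: row total minus the unchanged (diagonal) area."""
    return matrix.sum(axis=1) - np.diag(matrix)

def net_change_km2(matrix):
    """Net gain (+) or loss (-) of each class: column total minus row total."""
    return matrix.sum(axis=0) - matrix.sum(axis=1)
```

In this layout, the off-diagonal row sum corresponds to the transfer out of a class, while the difference between column and row totals corresponds to its net increase or decrease.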
The GlobeLand30 prediction results for 2030 are shown in Figure 16b. The spatial distribution of each LULC type does not change drastically compared with the 2025 forecast simulation result. There is a significant increase in urban and built-up lands, concentrated in the southern part of the basin. There is also an expansion of urban and built-up lands and barren areas in the northeastern part of the basin. The forested area of the GlobeLand30 product is significantly lower than that of the uncorrected GLC_FCS30 data, so the simulation data still maintain a small forested area. The area transfer matrix (Table 10) shows that all types of land use are converted, but the transfer amounts are small. Grasslands still have the largest reduction, amounting to 1321.16 km², mainly converted to croplands, barren areas, and forests. A total of 744 km² of croplands are converted into urban and built-up lands, whereas only a negligible area of urban and built-up lands (11.99 km²) is converted into croplands. The overall growth of urban and built-up lands therefore remains significant in the 2030 prediction results, with a net increase of 781.62 km². The extent of barren areas increases, primarily sourced from grasslands; this increase also includes a conversion area of 110.96 km² from water bodies to barren areas. Moreover, a small area of forests is converted into croplands and urban and built-up lands. The reciprocal conversion between grasslands and water bodies still exists, but the total area converted from land types other than grasslands to grasslands is only 58.02 km².
In general, the transfer matrices calculated from the simulated results for the Ili–Balkhash Basin in 2025 and 2030 show that the transformations mainly occur between grasslands, barren areas, croplands, and urban and built-up lands, while the transformation areas of other land types remain relatively small. During the prediction process, no restrictions were imposed on the reciprocal conversions between barren areas and most other land types. Nevertheless, in both predicted years the extent of barren areas continues to expand, indicating that under a natural scenario the area of barren land will persistently increase. At the same time, there is almost no conversion from other land types to grasslands. The area of grasslands decreases more than that of any other land type in both predicted years, with only a minimal amount of water bodies converting to grasslands. Under the natural scenario without any external forces, there is significant growth in urban and built-up lands. In the simulation results for both time periods, croplands are primarily derived from a substantial conversion of grasslands. The increase in urban and built-up lands over the next five years mainly originates from the conversion of grasslands, while over the next ten years it predominantly takes the form of expansion onto croplands. From a five-year perspective, the growth of croplands will be more pronounced, whereas over the ten-year timeframe the expansion and changes in urban and built-up lands will be more significant.