Next Article in Journal
China’s Largest City-Wide Lockdown: How Extensively Did Shanghai COVID-19 Affect Intensity of Human Activities in the Yangtze River Delta?
Previous Article in Journal
Accuracy Assessment of the Positioning of a Swarm of Underwater Vehicles in Relation to Four Surface Vehicles Using the TDOA Method
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Extraction of Cotton Information with Optimized Phenology-Based Features from Sentinel-2 Images

1
Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China
2
College of Geography and Environmental Sciences, Zhejiang Normal University, Jinhua 321004, China
3
College of Surveying and Mapping and Geographic Science, Liaoning Technical University, Fuxin 123000, China
4
College of Resources and Environmental, University of Chinese Academy of Sciences, Beijing 100049, China
5
CAS Research Center for Ecology and Environment of Central Asia, Urumqi 830011, China
6
Suzhou Argo Space Technology Co., Ltd., Suzhou 215000, China
7
Institute of Genetics and Plant Experimental Biology of the Academy of Sciences of the Republic of Uzbekistan, Tashkent 100047, Uzbekistan
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2023, 15(8), 1988; https://doi.org/10.3390/rs15081988
Submission received: 27 February 2023 / Revised: 1 April 2023 / Accepted: 5 April 2023 / Published: 10 April 2023

Abstract

:
The spatial distribution of cotton fields is primary information for national farm management, the agricultural economy and the textile industry. Therefore, accurate cotton information at the regional scale is required with a rapid increase due to the chance provided by the huge amounts of satellite images accumulated in recent decades. Research has started to introduce the phenology characteristics shown at special growth phases of cotton but frequently focuses on limited vegetation indices with less consideration on the whole growth period. In this paper, we investigated a set of phenological and time-series features with optimization depending on each feature permutation’s importance and redundancy, followed by its performance evaluation through the cotton extraction using the Random Forest (RF) classifier. Three sets of 31 features are involved: (1) phenological features were determined by the biophysical and biochemical characteristics in the spectral space of cotton during each of its five distinctive phenological stages, which were identified from 2307 representative cotton samples using 21,237 Sentinel-2 images; (2) three typical vegetation indices were functionalized into time-series features by harmonic analysis; (3) three terrain factors were derived from the digital elevation model. Our analysis of feature determination revealed that the most valuable discriminators for cotton involve the boll opening stage and harmonic coefficients. Moreover, both qualitative and quantitative validation were performed to evaluate the retrieval of the optimized features-based cotton information. Visual examination of the map exhibited high spatial consistency and accurate delineation of the cotton field. Quantitative comparison indicates that classification of RF-coupled optimized features achieves improved overall accuracy 5.53% higher than that which works with either the limited vegetation indices. Compared with all 31 features, the optimized features realized greater identification accuracy while using only about half the number of features. Compared with test samples, the cotton map achieved an overall accuracy greater than 98% and a kappa more than 0.96. Further comparison of the cotton map area at the county-level showed a high level of consistency with the National Bureau of Statistics data from 2020, with R2 over 0.96, RMSE no more than 14.62 Kha and RRMSE less than 17.78%.

Graphical Abstract

1. Introduction

Accurate information regarding the area and spatial distribution of crops is of practical importance for food security, global change and sustainable development [1,2,3]. Cotton, as one of the most crucial cash crops, accounts for approximately 79% of the world’s natural fibers and generates at least $600 billion in economic benefits annually for the textile industry [4,5,6,7]. Furthermore, cotton is a significant oilseed crop, and its seeds can be improved for food and fodder [8]. More than 100 countries worldwide cultivate cotton [9]. Especially in China—the world’s largest cotton producer and consumer—cotton has played an essential role in the economy and livelihood [10]. Moreover, cotton cultivation area provides fundamental data for yield estimation and can guide agricultural management, garment manufacturing development, international trade and other fields [11]. Therefore, accurate and timely monitoring strategies for cotton planting area are a prerequisite for agricultural management, trade and livelihood.
Traditional methods for cotton area monitoring have higher accuracy at small scales, such as field surveys, statistical sampling and visual interpretation. However, those are generally time-consuming and labor intensive when applied to regional scales [12,13]. Remote sensing has been proven to be an effective tool for mapping large areas of cotton fields [14,15,16]. Currently, optical remote sensing cotton mapping can be classified into three categories based on different methods: single-time phase, time-series image single feature parameter and time-series image multiple phenological feature approaches. The single time-phase method generally uses spectral features from one or a few periods, focusing on retrieving temporal profiles of cotton fields with prominent spectral differences from the background. Min et al. [11] employed China–Brazil Earth Resources Satellite (CBERS) and Huanjing (HJ) images of the cotton bud stage combined with a decision tree classifier to extract cotton area from Dingzhuang Town, Shandong Province. The single-time phase approach is more interpretable and more straightforward. However, the number of available images, the spectral similarity of different land types and spatial and temporal heterogeneity pose challenges to its application at the regional scale [17,18]. The integration of cotton’s phenological information and seasonal patterns into the remote sensing identification process could reduce the phenomenon of “different things sharing the similar spectrum” in single-phase identification, potentially improving cotton’s spectral separability from other land cover types and providing the possibility to improve classification accuracy further [1,19,20,21]. The time-series single-feature parameter method commonly adopts only one index to represent the whole growing season of cotton, such as Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI) and other spectral indices. For example, Zhong et al. [22] used phenology indicators extracted from Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI time series in combination with a decision tree classifier for cotton mapping in the San Joaquin Valley, California. Further, some new methods of processing single features, such as Fast Fourier transform (or harmonic regression) and spatio-temporal fusion methods, are increasingly applied in cotton remote sensing extraction. Time-series dynamic change information extracted from remote sensing images utilizing harmonic regression has also proven to be one of the effective features in identifying cotton [10,23]. The utilization of a single time-series features in remote sensing images to extract information on phenological changes confers obvious benefits in regions with a simpler cropping structure. However, due to limited knowledge of cotton phenological characteristics at the remote sensing scale and computational processing efficiency, indices such as NDVI and EVI are predominantly utilized. While these indices characterize vegetation greenness and can be adapted to some phenological stages of cotton, they may not be optimal feature variables for all phenological stages, such as during the boll opening stage where grayish-white visual features are present, potentially reducing the spectral separability of cotton from other land cover types. Therefore, there is urgent to develop multiple phenological features suitable for different phenological stages of cotton and to improve the classification accuracy, particularly in large areas and regions with complex planting structure.
Numerous studies have reported successful regional-scale mapping of cotton and other crops using multiple phenological features [24,25,26,27]. Google Earth Engine (GEE) has enhanced the efficiency of remote sensing image processing. It objectively facilitates the development of time-series multiple phenological features due to the high-speed parallel computing resources based on Google’s computational infrastructure available to users [28,29]. For example, Han et al. [30], based on GEE, employed multi-spectral bands and spectral indices derived from Landsat5,8 and Sentinel-2, respectively, combined with a Random Forest (RF) classifier to extract cotton from the Aogan-Kuche River Oasis in Xinjiang, China. Despite existing methods for cotton extraction based on time-series multiple phenological features having achieved a certain level of success, they did not devise suitable features based on cotton’s physical and biochemical characteristics during various phenological stages and still depend on limited vegetation indices and band reflectance. This limited representation of cotton’s phenology events may increase classification uncertainty, particularly when dealing with crops with similar growing seasons such as cotton and maize [1]. In addition, further investigation is necessary to determine the optimal features for identifying cotton and the possibility of using less costly yet more reasonable feature combinations to obtain higher classification accuracy. In recent advancement, Ni et al. [31] proposed an enhanced pixel-based phenological feature composed method (Eppf-CM) using Sentinel-2 time-series data and, for the first time, employed a combination of features that matched the phenological characteristics of multiple key growth stages for mapping paddy rice in Northeast China. Although their method achieved considerable accuracy, three weaknesses still exist when applying it to cotton mapping: (1) cotton and rice have distinctly different phenological periods; (2) only multiple spectral indices for several phenological stages were considered, ignoring the full utilization of time-series dynamic change information, which has been proved to be one of the most valuable features for cotton extraction; and (3) the optimal combination of features for cotton mapping remains unclear.
The objectives of this study were to develop an optimized time-series multiple phenological feature in GEE with Sentinel-2 imagery to: (1) take full advantage of multiple representative phenological stages and time-series dynamic change information of cotton, (2) evaluate the relative importance and collinearity among features to capture the optimal combination and (3) generate an accurate cotton map at a resolution of 10 m for Xinjiang, the largest cotton cultivation region in China (http://www.fao.org/faostat, accessed on 2 October 2022).

2. Study Area and Data

2.1. Study Area

The study area is located in Xinjiang Province, which is the largest provincial administrative region and cotton cultivation area in China, covering an area of 1.66 million km2 (Figure 1). Xinjiang has a temperate continental arid and semi-arid climate with an average annual precipitation of approximately 170 mm. The frost-free period in the region is 140~220 days, and the annual sunlight duration is between 2300 and 3200 h. The major crops cultivated in the area include cotton, wheat, corn and tomatoes. Cotton production in Xinjiang was 5,160,700 tons in 2020, accounting for 87% of national production (http://www.stats.gov.cn, accessed on 2 October 2022). The Tianshan Mountains divide the entire study area into two sub-regions with significantly different natural conditions in northern Xinjiang (including northern Xinjiang and northwestern Xinjiang, NX and NWX) and southern Xinjiang (SX), which are considered separately in this study for the production of the cotton map based on the natural zoning of Chinese agriculture (http://www.resdc.cn, accessed on 2 October 2022). Previous studies have demonstrated that locally adaptive models are more accurate than using a single model over a wide area [32,33]. Thus, we trained feature optimization and classification models separately in the southern and northern Xinjiang regions.
The growth duration of cotton from sowing to picking is typically divided into five stages: sowing, seedling, squaring, flowering and boll development and the boll opening stage [10]. In this region, cotton is generally sown in mid to late April and harvested in early October, after undergoing two major stages of vegetative and reproductive growth. Pictures of cotton at different growth duration are shown in Figure 2, collected from the Internet. Cotton phenology observation data were collected from the China National Meteorological Information Center (http://data.cma.cn/, accessed on 1 September 2020) and the China Agricultural Information Network (http://www.agri.cn/, accessed on 1 September 2020).

2.2. Remote Sensing Imagery

Considering their high spatial, temporal and spectral resolution and their greater sensitivity to surface changes compared to Level-1C data [34], the Sentinel-2 Multi-spectral Instrument (MSI) Level-2A were employed for this study, providing Bottom of Atmosphere reflectance images derived from Level-1C data. A total of 21,237 scenes Sentinel-2 images covering the study area during the 2020 major crop growth duration (70–300 Julian date, doy) were collected from GEE. In addition, widely applied 30 m DEM data provided by Shuttle Radar Topography Mission (SRTM) were collected from GEE to participate in cotton classification [35].
All Sentinel-2 Level -2A data used in this paper were clipped and mosaiced, followed by removing clouds and rare land classes. Clouds were masked by combining the minimizing thresholds at the scene-level and the cloud masks at the pixel-level. Scene-level thresholds are set via the CLOUDY_PIXEL_PERCENTAGE attribute of the Sentinel-2 image in GEE, and pixel-level cloud masks are generated via the QA60 band. The whole growth duration of cotton was divided into five representative phenological stages based on time-series spectral analysis of cotton. Hence, after cloud masking, the scene-level thresholds of northern Xinjiang were set to 20%, 10%, 20%, 15% and 15%, and the thresholds of southern Xinjiang were set to 30%, 15%, 25%, 15% and 15%. This way, the full coverage of the vegetation in the study area and the minimization of the cloud threshold could be satisfied simultaneously.
There are small portions of this area being water bodies, impervious surfaces and forests. These land categories occupy a tiny percentage of the area within the land cover of the study area, and it is expensive to generate sufficient samples. Therefore, water bodies, impervious surfaces and forested lands were extracted from GSW, GISD and CLCD in 2020, respectively. The three rare land classes were reprojected, resampled to 10 m spatial resolution and then overlaid to generate rare land masks. Finally, the DEM data are resampled to a 10 m resolution and reprojected to the “World Geodetic System 1984 (EPSG:4326)” to ensure that all remote sensing data can be used together.

2.3. Ancillary Data

2.3.1. World Cover 2020

The European Space Agency’s World Cover produced a global land cover product for 2020 with a 10 m resolution [36]. World Cover provided 11 land categories defined by the Land Cover Classification System (LCCS), with an overall classification accuracy of 74.4%. These data are used to generate partial training and test samples automatically.

2.3.2. China Land Cover Dataset (CLCD)

CLCD is a 30 m resolution China land cover product developed by Wuhan University from 1990 to 2020 [37]. The overall accuracy of CLCD is 79.31%, further evaluated by the third-party test sample showing that the overall accuracy of CLCD is better than MCD12Q1, ESACCI_LC, FROM_GLC and GlobeLand30. These data are employed to generate partial training and test samples automatically, as well as to create forest masks.

2.3.3. Global Surface Water Dataset (GSW)

GSW contains maps of surface water from 1984 to 2020 [38] obtained from Landsat 5, 7 and 8 images with 30 m resolution. Errors of omission overall were less than 5%, and the commission errors were less than 1%. We selected the 2020 Yearly Water Classification in GEE, merging the seasonal and permanent water within it to mask the water pixels.

2.3.4. Global Impervious Surface Dynamic Dataset (GISD)

GISD is an accurate global 30 m impervious-surface dynamic distribution data utilizing time-series Landsat imagery from 1985 to 2020, produced by the Aerospace Information Research Institute, Chinese Academy of Sciences, with an overall accuracy of 90.1% [39]. We selected the 2020 GISD to generate an impervious-surface mask.

2.4. Reference Data

2.4.1. Sample Data

Sample data were employed to train the classification models for cotton and for evaluation, obtained from automated sampling points, visual interpretation and field surveys (Figure 1). Firstly, 6777 cotton samples and 6646 non-cotton samples (other crops, such as wheat, maize and tomatoes) were collected from visual interpretation of Google Earth very high resolution (VHR) data supplemented with field surveys. The above set was divided by stratified random sampling into a training set (80%) and a validation set (20%). Secondly, 2675 non-cotton samples (non-cropland, including grassland, bare land and snow/ice) were automatically collected from existing land cover products based on hexagon grids (Figure 1), which have been proven effective by many studies [37,40,41]. CLCD 2020 and World Cover 2020 were processed by reprojection and reclassification, followed by bilinear resampling to 10 m resolution and then overlaying into one raster data. Based on the overlayed raster data, 2675 non-cotton samples were randomly sampled from 126 hexagon grids (Figure 1) covering non-desert regions of the study area, of which 75% were added to the training set, and 25% were added to the validation set. Finally, 831 random test samples (31 cotton and 797 non-cotton samples) covering the study area were added through visual interpretation of VHR data to the validation set.

2.4.2. Statistical Data

Cotton area was collected from the Xinjiang Uygur Autonomous Region Bureau of Statistics, China and the Xinjiang Production and Construction Corps, China (https://tjj.xinjiang.gov.cn/, accessed on 2 October 2022). The cotton area statistics of all county-level cities in the study area during 2020 were obtained. It should be noted that there are fourteen Production and Construction Corps in Xinjiang whose area data are independent of the Autonomous Region statistics, and their administrative boundaries are blurred. Since this paper is based on administrative vector boundaries for comparative analysis of each county-level city area, this study combines each Corps with adjacent county-level cities into one county-level city for cotton area calculation. Based on this, a total of 79 county-level statistical units were obtained to verify the consistency between the cotton classification results and statistical data in this study.

3. Methods

Firstly, raw data were processed before analysis and classification. Secondly, the time-series multiple phenological feature (Tmpf) for cotton was extracted. Then, Tmpf was optimized by sorting the importance and redundancy between features. Finally, two RF classifiers (northern and southern Xinjiang) based on sample data and optimized features were used for cotton classification, and performance assessment was conducted. The general classification framework is shown in Figure 3.

3.1. Time-Series Multiple Phenological Feature

3.1.1. Five Representative Phenological Stages Features

To enhance the spectral separability of cotton from other land covers, five representative phenological time windows were retrieved (sowing, seedling and squaring, flowering, boll development, boll opening) by the time-series analysis of the eight spectral indices and red-edge indices (Table 1). The stage of seedling and squaring were combined due to the similarities in cotton’s appearance and phenological development during these stages, characterized by vegetation growth predominating.
These indices were selected for the following reasons: (1) due to their successful for bare soil mapping and monitoring, NDSI, Bare Soil Index (BSI) were selected to detect the bare soil signal during the sowing stage (SOS) [42,43,44]. (2) NDVI and EVI were selected for its sensitivity to vegetation development to retrieve the rapid development of the seedling and squaring stages (SES) [45]. (3) The water requirement of cotton reaches its growth stage peak during the flowering and boll development stage (FBS), and LSWI was selected to identify the entire FBS to its greater sensitivity to leaf water content. FBS could be divided into two parts: the flowering stage (FLS) and the boll development stage (BDS); REPI was employed to differentiate these due to high sensitivity to changes in chlorophyll content [46]. (4) Signals of crop senescence are generally characterized by a decrease in chlorophyll content and an increase in carotenoids. PSRI and Structure Intensive Pigment Vegetation Index (SIPI) can respond to changes in carotenoid and chlorophyll content and be selected to identify the start of cotton senescence at the boll opening stage (BOS).
The cotton spectral time-series profiles in Figure 4 are derived from 2307 representative cotton pixels. To save cost and obtain representative pixel points, we randomly sampled 50% of the 5421 cotton field training samples within the study area. Then, we selected 2307 pure cotton pixels evenly distributed throughout the random sample points based on Google Earth VHR visual interpretation. For each cotton pixel, all available Sentinel-2 images covering the study area in 2020 were employed to calculate the spectral indices after 70% scene-level cloud screening and cloud pixel masks. Considering that a relatively high cloud threshold may lead to more cloud contamination and thus generate data noise that affects the reliability of the observations, this paper filters out the disturbance of extreme outliers through the outlier test (Equation (1)).
o t s = {   f c s > Q 3 + 3 × ( Q 3 Q 1 )   f c s < Q 1 3 × ( Q 3 Q 1 )
ots is the set of sample points of extreme outliers on a day; fcs is the set of features on that day; and Q3 and Q1 are the upper and lower quartiles of each feature, respectively.
The light blue ribbon (Figure 4) indicates the double standard deviation of reliable cotton observations, and the blue line is the median value of all reliable observations on each doy. We further employed the Savitzky–Golay (SG) algorithm for the blue line, the curve after smoothing as a black line [47]. We set the length of the filter window of SG to 10 and set the order of the polynomial used to fit the samples to 2, according to a previous study [31]. The five representative phenological stages were retrieved based on Figure 4 and a priori knowledge of cotton phenology:
In SOS, cotton at this stage is in the state of sown or not yet sown, mainly characterized by a significant soil signal. Based on the continuous high-values of BSI and NDSI in Figure 4a,b, SOS was determined to be 75–135 doy.
In SES, the cotton was dominated by vegetative growth, with rapid development of roots, stems, leaves and branches and gradual increase of canopy closure. At the pixel scale, it is shown as a continuous increase in chlorophyll signal that can be identified by the rapid increase in NDVI and EVI on 135–180 doy in Figure 4c,d.
In FBS, cotton turned into vegetative and reproductive growth concurrently which could be divided into FLS and BDS.
In FLS, cotton turned into vegetative and reproductive growth concurrently, but vegetative growth was greater than reproductive growth during this stage. Cotton leaf area and water demand were at the peak of growth duration, with numerous buds and gradually increased flowers. The pixel scale was characterized by a gradual saturation of chlorophyll and water signals, corresponding to the gradual saturation of LSWI. Especially, REPI, highly sensitive to changes in chlorophyll content, reached a local peak (210 doy). Therefore, FLS was determined to 180–210 doy.
In BDS, reproductive growth was dominant at this stage, mainly demonstrated by many flowers and bolls developing. The increment of cotton leaf area slowed down, and the nutrient delivered to reproductive organs gradually increased. The pixel scale was characterized by a slight decrease in chlorophyll content from the peak. However, the chlorophyll content and water requirement were still at stable high values, which corresponded to the start of a local peak in REPI to the start of a rapid decline in LSWI (210–250 doy).
In BOS, cotton bolls open, and the cotton plant gradually senesces during this stage, with a rapid decrease in chlorophyll content. Further, vegetative growth gradually stopped, and reproductive growth also slowed down. PSRI and SIPI in Figure 4g,h rapidly rose at 250–300 doy. Therefore, BOS was determined to be 250–300 doy.
Ten indices were selected to composite new features with adequate phenological information (Table 1). BSI and NDSI were selected in SOS. NDVI and EVI were selected in SES. LSWI, NDRE and REPI were selected in FLS and BDS. PSRI, SIPI and Enhanced Blossom Index (EBI) were selected in BOS. We employed the 85th percentile composite for all vegetation indices (NDVI, EVI, NDRE) and EBI. The median composite was used for the other indices. The median composite could avoid interference caused by residual clouds (high reflectance) and shadows (low reflectance). The 85th percentile composite method could reduce outliers and obtain the peak of greenness and whiteness [48]. Reasons for selecting the above indices were as follows: (1) a total of eight indices, NDSI, BSI, NDVI, EVI, LSWI, REPI, SIPI and PSRI, were selected due to their outstanding performance in cotton phenological periods analysis. (2) The Normalized Difference Red Edge Index (NDRE) was selected for FLS and BDS due to its highly sensitive to chlorophyll content [49]. More importantly, it was difficult to fully characterize the gray–white visual signal of cotton during BOS by the greenness vegetation index. EBI has proven to be a reliable spectral index for quantifying almond bloom phenology [50]. The gray–white characteristics of almond bloom were similar to the opening of boll; hence, EBI was introduced to BOS.

3.1.2. Time-Series Dynamic Change Features

Multiple Sentinel-2 images of the crop at the same location during different growth stages provide an opportunity to capture continuous time-series change information. Examples of successful methods for extracting time-series features from discrete time-series and applying them to crop classification include double-sigmoid [51] and harmonic regression [52,53], etc. We selected harmonic regression because it can be easily and seamlessly deployed over an extensive area in GEE through linear regression functions and can match vegetation phenology fluctuations [54]. Previous studies have shown that harmonic coefficients are excellent features for crop classification [23,53,55]. This paper applied harmonic regression to EVI, NDVI, LSWI time-series (Equation (2)):
S I i = a i + b i 1 cos ( 3 π t ) + b i 2 sin ( 3 π t ) + c i 1 cos ( 6 π t ) + c i 2 sin ( 6 π t )
The coefficient a i measures the overall vegetation phenology; coefficients b i 1 and b i 2   represent general changes in vegetation phenology; and coefficients c i 1 and c i 2 represent minor fluctuations in the vegetation phenology. t represents the time during the growth duration when the images were taken and are expressed as a fraction between 0 (March 15) and 1 (October 20).
Fifteen coefficients of EVI, NDVI and LSWI time-series were extracted by harmonic regression. In addition, the widely applied terrain factor derived from DEM, slope and aspect were used together with elevation for cotton classification. We finally collected a total of 31 candidate features for cotton mapping. These features included: (1) phenological information covering all representative reproductive stages of cotton and (2) time-series change information within the cotton growth duration and terrain features. These features captured the specific phenological information of cotton in different dimensions, thus having robustness to represent the characteristics of the cotton at pixel scale comprehensively.

3.2. Feature Optimization

Feature optimization (or selection) plays a crucial role in determining the efficiency of machine learning models. While we constructed a time-series multiple phenological feature set suitable for cotton classification, it remains unclear which features yield superior performance. Numerous features may lead to unnecessary computational burden, resulting in overfitting and reduced interpretability. Additionally, high correlations may exist, complicating the model while providing only redundant information [34]. To retrieve the more important subset of candidate features with low collinearity in northern and southern Xinjiang, we developed a two-step data-driven machine learning method. In the first phase, we evaluated the relative importance of each candidate feature using a ten-average permutation importance (PI) metric. This involved permuting the column of a single predictor feature and calculating the accuracy based on RF embedded out-of-bag (OOB) samples. We chose OOB instead of the training set as the accuracy measure to maximize the number of available samples as an accuracy measure, while also ensuring the generalization of the model. The feature’s importance was determined the difference between the baseline and the drop in overall accuracy caused by permuting the column [56]. This process was repeated ten times to obtain more robust results, and each feature’s final importance was the average of total records. After sorting the importance in descending order, we sequentially input the top n (ranging from 1 to 31) features into the RF classifier for training. Then, we recorded the ten-fold cross-validation accuracy (CV) for each group. The feature combinations that reach the highest CV among the top n were selected and input to the second phase for collinearity elimination. In this phase, the collinearity between features was evaluated using the Spearman correlation coefficient [57]. The lower-importance features with correlation coefficients greater than 0.8 were iteratively eliminated. Spearman correlation converts multiple variables to ranked values and then operates the standard Pearson correlation on those.

3.3. Classification

Two RF classifiers combined with the optimized features were employed to produce cotton maps of northern and southern Xinjiang. RF was an ensemble machine learning algorithm based on the bootstrap aggregating (bagging) technique [56]. It has proven successful in many remote sensing applications, including land cover change, crop mapping and soil components estimation, due to its reduced susceptibility to over-fitting [32,58]. Two parameters were adjusted in assessing features’ importance: (1) n_estimators—the number of trees and (2) min_samples_leaf—the minimum number of samples required to be at a terminal leaf. These parameters were set to 300 and 10, repectively, based on previous studies [34]. In addtion, three parameters were adjusted based on CV using the grid search method: (3) max_depth—the maximum depth of a single tree, (4) max_features—the number of features to consider when looking for the best split and (5) min_samples_split—the minimum number of samples required to split an internal node. Other parameters were deployed as default. After local parameterization, two trained RF models were transferred to GEE through the geemap [59]. The whole work of parameterization and classification was based on the Python 3.8 scikit-learn library and geemap. Further, a “despeckler” algorithm was applied to alleviate the salt-and-pepper noise in pixel-scale classification results [60]. For cotton patches smaller than 0.25 hectares, the classification results were updated by a square kernel-based majority filter with a radius of 20 m and iterations equal to 10. This approach effectively classified most isolated pixels as background and filled small gaps within cotton fields.

3.4. Performance Assessment

We designed four feature scenarios to demonstrate the superiority of the cotton-specific Tmpf. These scenarios include NDVI and EVI in five representative phenological stages (FC1), multiple phenological features in five phenological stages (FC2), the time-series multiple phenological features of cotton (FC3) and the optimized features (FC4). Each feature scenario’s CV was calculated under the same conditions of control training samples and RF classifier’s parameters. Further, we also calculate its overall accuracy (OA), Precision (Pre), Recall (Rec) and kappa coefficient based on the same validation samples for multi-level comparative analysis.
The cotton classification results were assessed across three levels: congruence with high-resolution spatial imagery, accuracy through validation samples and consistency with statistical data. OA, Rec, Pre and Kappa were calculated using 4214 validation samples (composed of 1389 cotton samples and 2825 non-cotton samples). In addition, the cotton map area was compared with statistical data, while coefficients of determination (R2), root-mean-square error (RMSE) and relative root-mean-square error (RRMSE) were calculated at the county-level city scales. The cotton map’s area was determined by counting the number of pixels, with each pixel covering an area of 100 square meters.

4. Results

4.1. Variable Importance and Optimized Features

Figure 5 depicts the separability of features for cotton and non-cotton (except for terrain features) in southern and northern Xinjiang, based on the training set, to increase interpretability. Nearly all features exhibit clear distinguishability in both areas. However, the distributions of NDVI and EVI composited in SES show significant overlap, likely due to the rapid greening of most vegetation during this period. In the enhanced spectral features of five representative stages in southern Xinjiang (Figure 5a–c), the features with higher separability were concentrated in FBS and BOS, including LSWI within FLS (LSWI_3) and LSWI within BDS (LSWI_4), EBI, SIPI and PSRI. Similarly, in northern Xinjiang, the enhanced spectral features with high distinguishability were also concentrated in BOS and FBS, including LSWI_3, LSWI_4, SIPI, PSRI and EBI (Figure 5e–f). Although the overlap between NDVI and EVI has improved slightly in southern Xinjiang, it remains a minor discriminative feature.
In the time-series dynamic change features of southern Xinjiang, the most distinguishable features are the harmonic coefficients of the sin (3πt) term of EVI, NDVI and LSWI time-series (EVI_sin3, NDVI_sin3 and LSWI_sin3), followed by the harmonic coefficients of the cos (3πt) term (Figure 5d). In northern Xinjiang, the highest distinguishable coefficients in time-series dynamic features are those of the cos (3πt) term, followed by the constant term coefficients (Figure 5h). The harmonic coefficients of the sin (3πt), cos (3πt) and constant term reflect the overall and general change in vegetation phenology, allowing for relatively high separability. Overall, the features developed in this paper exhibit high separability and consistency in both regions.
Figure 6 displays the average permutation importance of variables and the ten-fold cross-validation accuracy of different feature combinations in both southern and northern Xinjiang regions. In southern Xinjiang, the results show that the accuracy peaks (red point, 0.9614) after including the top 24 features (Figure 6a). Subsequent addition of features resulted in a decline in accuracy. Additionally, EVI_sin3 exhibited the highest importance among the variables. Enhanced spectral features that exhibited high importance were mainly concentrated in BOS and FBS, including EBI, REPI within BDS (REPI_4), LSWI_4 and SIPI. This is consistent with the information provided in the boxplots of southern Xinjiang (Figure 5a–c). Futhermore, terrain factors such as elevation also contributed significantly. In northern Xinjiang, the accuracy peaked (red point, 0.9839) after incorporating the top 18 variables (Figure 6b). Similarly, the features with higher importance were concentrated in BOS, FBS and harmonic coefficients that represented the overall and general change of vegetation phenology, such as PSRI, EBI, SIPI, the harmonic coefficients of the cos (3πt) term of LSWI (LSWI_cos3) and the harmonic coefficients of the constant term of EVI (EVI_constant). In contrast to southern Xinjiang, elevation in northern Xinjiang played a more significant role. This may be attributed to the relatively concentrated cotton cultivation in northern Xinjiang, resulting in a more pronounced difference in altitude between cotton and non-cotton areas.
The optimized features consist of a subset of variables with higher contribution and lower redundancy. The features of southern and northern Xinjiang reached the peak accuracy at the first 24 and first 18 variables (Figure 6a,b). To reduce the colinearity between features, Spearman correlation coefficients were calculated for the selected feature combinations in each region, and variables with correlations greater than 0.8 and lower importance were iteratively eliminated. In southern Xinjiang, for instance, the correlation between the harmonic coefficients of the cos (6πt) term of EVI (EVI_cos6) and the harmonic coefficients of the cos (6πt) term of NDVI (NDVI_cos6) was 0.97, and NDVI_cos6 was removed due to its lower importance (Figure S1). In northern Xinjiang, for example, both SIPI and PSRI characterized the senescence signal of cotton with a correlation coefficient of 0.96, and the less important PSRI was removed (Figure S2). The optimized features for southern Xinjiang included 16 variables, while for northern Xinjiang, 11 variables were selected based on their Spearman correlation coefficients (Table S1). By utilizing optimized features with eliminated high-correlations, superior CVs were obtained (0.9620 and 0.9839 for southern and northern Xinjiang, respectively) compared to using only features with higher importance.

4.2. Accuracy Analysis of Different Feature Scenarios

Table 2 presents specific information for the four feature scenarios. NDVI and EVI exhibit poor stability in detecting cotton phenological stages and are limited to identifying only three stages: SES, FBS and BOS. To ensure the fairness of the comparative analysis, NDVI and EVI were composited within the five representative phenological stages to extract a total of ten features as FC1. Overall, accuracy metrics for FC1 to FC4 are all above 90%, except for Kappa for FC1 in northern Xinjiang, which is 86.48%, and these metrics increase sequentially (Figure 7). In northern Xinjiang, FC2, which is generated using the same phenological periods as FC1 but with additional features, prominently improved accuracy compared to FC1 that only utilizes NDVI and EVI. For instance, CV increases from 92.53% to 96.73%, and Kappa jumps from 86.48% to 93.10% (Figure 7a). FC3 combines time-series dynamic features with FC2, and all accuracy metrics are further improved compared to FC2. For example, OA increases from 96.59% to 98.72%. FC4 has fewer feature counts than FC3 but achieves the highest accuracy. The accuracy of FC4 improves remarkably compared to FC1, with OA increasing from 93.33% to 98.86%. The results for southern Xinjiang are highly consistent with those for northern Xinjiang, but all accuracies are lower than those for northern Xinjiang (Figure 7b).

4.3. Accuracy Assessment of Classification Results

Figure 8 displays the cotton map generated by the RF classifier combined with the optimized features and post-processing method. Visual verification revealed that the delineation of cotton fields was accurate and consistent with the high-resolution images and could be distinguished effectively from other land types (Figure 8b–h). In northern Xinjiang, cotton fields were predominantly distributed across the plains between 43°N to 45°N, while in southern Xinjiang, they were more scattered. The confusion matrix and corresponding metrics were calculated for the 4214 test samples and are presented in Table 3. The cotton map achieved an overall accuracy of 98.46%, with a Kappa value of 0.96. Additionally, the Precision for cotton was 97.13%, and the Recall was higher at 98.19%.
The classification results indicate that the cotton-mapped area was 2305.17 Kha, which is only 7.8% smaller than the provincial-level statistical area of 2501.90 Kha. At the county-level scale, the statistical and classification data showed a high degree of agreement, with an R2 of 0.9677, an RMSE of 14.62 Kha and an RRMSE of 17.78% across 79 county-level cities (Figure 9). These findings suggest that the cotton map accurately reflects the cotton-growing area, both at the provincial and county level.

5. Discussion

5.1. Advantages of the Optimized Tmpf

In this study, we developed a Tmpf with an optimization depending on each feature permutation importance and redundancy to identification of cotton using Sentinel-2 time-series and demonstrated its performance over northern and southern Xinjiang with differences in climate, terrain and cultivation structure. It obtained considerable accuracy, mainly due to two advantages compared with existing methods.

5.1.1. Reasonability of Tmpf

The Tmpf feature collection, which is specifically designed for cotton, integrates five typical cotton phenological stages and time-series change information. In contrast, existing methods only utilize a portion of these features or rely on a limited vegetation index to characterize entire phenological periods. As an illustration, Hu et al. [25] primarily employed NDVI and EVI as the input features for the RF classifier to detect cotton in northern Xinjiang. Our results have demonstrated that incorporating multiple spectral features at different phenological stages can significantly enhance the classification accuracy compared to relying solely on NDVI and EVI throughout the entire phenological period. Further, some studies have solely employed individual phenological stages for cotton extraction [27,61], which could heighten the probability of confusion between cotton and other crops with similar spectral characteristics (e.g., maize and cotton) [1].
Tmpf can enhance the spectral differences between cotton and other land types. During SOS, cotton fields are covered with soil, while overwintered crops like winter wheat are in the greening stage. Cotton can be easily detected by BSI and NDSI, which are responsive to soil minerals [43,44]. During SES, there is a significant overlap between the spectra of cotton and other vegetation. However, the rapidly increasing canopy closure during this period can effectively differentiate cotton from non-vegetation areas by NDVI and EVI. In FLS, cotton’s chlorophyll content and water demand reach their peak during growth duration. Most spring crops mature at this stage with marked differences from cotton, such as spring wheat and tomato [25]. In BDS, cotton retains high greenness and water demand, while maize is either in the milky maturity or the senescence stages with yellowing and drying leaves [1]. Thus, during FLS and BDS, NDRE and REPI were used to detect the changes in chlorophyll, while LSWI was employed to identify the signal of water. During BOS, cotton bolls open and gradually senesce, while most crops have already been harvested. Therefore, EBI was introduced to detect the signal of cotton’s boll, and SIPI and PSRI were introduced to enhance the signal of senesces [46,50,62]. Furthermore, to further enhance the cotton identification ability, we integrated the harmonic coefficients extracted from NDVI, LSWI and EVI time-series, which can capture general changes and small fluctuations in vegetation phenology [63] and potentially reflect the time-series dynamic change of cotton. This integration further improved the accuracy of cotton classification when compared to using representative phenological stage features alone. Overall, Tmpf effectively captures the phenological characteristics of cotton at five growth stages and incorporates time-series dynamic features, enabling it to effectively distinguish cotton from other land types such as winter wheat, maize, spring wheat, tomato and non-vegetation.
Although Tmpf is suitable for cotton mapping, there are some weaknesses. One weakness is that the information on mulching of cotton fields was not incorporated into the features. In Xinjiang, the introduction of drip irrigation under mulch technology has led to a high percentage of cotton covered with mulch. This study focused only on the bare soil signal of cotton at the sowing stage and did not account for the mulch signal, which could limit the spectral separability. Another potential limitation is that the impact of colored cotton was not assessed. Cotton includes a small amount of colored cotton, such as green and brown, and the suitability of Tmpf for such cotton has not been fully demonstrated. Future studies may benefit from the development of indices characterizing the mulch signal and further investigation of the effectiveness of Tmpf in the main production areas of colored cotton.

5.1.2. Performance of Feature Optimization Method

The feature optimization method (OM) used in this study enables ranking the importance of features and eliminates collinearity among them. Previous studies for crop classification have generally used the Mean Decrease in Impurity (MDI) feature importance [1,32,64,65]. However, MDI tends to features with high cardinality and might ignore the importance of certain features with high-correlation. The permutation importance (PI) applied in our study can partially address these issues [66]. The ten times average PI employed in this paper provides more robust results. We used the out-of-bag (OOB) sample instead of the validation sample to evaluate feature PI, as it can ensure generalization while saving samples. Additionally, we iteratively removed highly correlated features with less importance based on the Spearman correlation coefficient to identify the optimal features for cotton classification.
The analysis of the feature scenarios for both southern and northern Xinjiang indicates that the optimized features can obtain higher accuracy using a smaller number of features (16/31 and 11/31) compared to the complete set of Tmpf. The optimized feature set exhibits high interpretability, in addition to the improvement in classification accuracy. As Table S1 shows, the optimized features displayed slight variations between the two regions, potentially attributed to the differences in sample points distribution and natural environment. However, there was a high overall similarity, with the feature of boll opening stage and the feature indicating the overall change in vegetation phenology yielding more contributions. In representative phenological stage features, EBI, PSRI and SIPI of the boll opening stage had relatively high importance, followed by the features within the flower and boll development stage (Figure 6). These features are indicative of boll opening and senescence signals, suggesting that the boll opening stage is the most critical phenological stage for cotton identification. This finding is consistent with previous studies and is in line with cotton’s actual physical and biochemical characteristics [25,27]. In time-series dynamic features, the harmonic coefficients of cos (3πt), sin (3πt) and constant terms were found to be relatively more important, as demonstrated in Figure 6. This suggests that features representing the overall and general changes in vegetation phenology contain critical information [63] and are therefore more beneficial for identifying cotton. Terrain factors, such as elevation, contribute to both regions but are evidently more significant in northern Xinjiang than in southern Xinjiang. This may be attributed to the concentration of cotton fields in plain areas in the north, in contrast to the more dispersed distribution in the south. The concentrated distribution leads to similar elevation within the cotton class, which widens the differences with non-cotton classes and induces OM to use elevation as the preferred feature. In summary, OM allows for the selection of a smaller subset of features to achieve improved classification accuracy. The results of feature importance analysis highlight the boll opening stage as a critical phenological stage for cotton identification, underscoring the importance of accounting for the overall and general change of vegetation phenology.
Despite the advantages of OM mentioned above, it is important to acknowledge its limitations in comparison to feature optimization results obtained using other classifiers (such as SVM or C5.0). As an individual classifier may introduce errors in feature selection, combining the results of multiple classifiers through voting can yield a more robust outcome. Therefore, we will employ a multi-classifier feature selection approach to improve the accuracy of our results in future study.

5.2. Uncertainty of Cotton Map and Future Perspectives

Despite that the cotton map at a 10 m resolution in Xinjiang using the optimized fatures has demonstrated high accuracy, some limitations and uncertainty still exist. The first limitation is the availablity of images of crucial phenological stages. The omissions of cloud contamination pixels caused by cloud detection algorithms may also affect classification accuracy. Although linear interpolation of time series data is commonly used to ensure continuous data availablity, missing inflection point images can result in inaccurate reflectance measurements [67]. To address this issue in future research, combining Sentinel-2 data with other sources of remote sensing imagery (such as Landsat) to obtain more comprehensive observations may prove to be a valuable approach. Secondly, it is worth noting that some noise points caused by the pixel-based classification method may adversely impact accuracy. While we have attempted to mitigate this issue by using a morphology-based post-processing algorithm, its dependence on empirical parameters can result in uncertainty regarding the accuracy of the results. Therefore, future studies may benefit from the adoption of an object-oriented classification method, which can minimize the impact of noise points resulting from the pixel-based classification method. Thirdly, only the RF classifier has been considered in this study, with other classifiers such as SVM, C5.0 and ANN not being taken into account. It is difficult to identify a single classifier that performs optimally in all situations [68]. Therefore, future research may benefit from the use of a multiple classifier system or classifier ensemble approach, which can improve the accuracy of the cotton map by considering the outputs of different classifiers.

6. Conclusions

This study investigated a set of time-series multiple phenological features with an optimization depending on each feature permutation importance and redundancy to identification of cotton at a regional scale using Sentinel-2 imagery and RF classifier in GEE. This feature set integrates phenological information during five representative phenological stages, and time-series change features extracted by harmonic analysis have improved cotton field identification accuracy compared to using limited vegetation indices throughout the whole phenological period. Further, our features optimization method was able to retrieve a superior combination of features, resulting in greater accuracy with fewer features than the total feature set. The result of feature importance highlights the boll opening stage as a critical phenological stage for cotton identification, emphasizing that the harmonic coefficients representing the overall and general change in vegetation phenology cannot be ignored. Ultimately, the optimized features, in conjunction with RF, were utilized to generate a cotton map at 10 m spatial resolution in Xinjiang, China, for the year 2020. The qualitative and quantitative assessment results demonstrate that the cotton map displays high spatial consistency with high-resolution images and effective delineation of field parcels, achieves an accurate validation accuracy with test samples and exhibits a strong correlation with statistical data. Overall, we expect that the cotton-specific time-series multiple phenological features could advance the development of cotton mapping by incorporating phenological knowledge and providing a novel perspective for other crop extraction studies. Furthermore, we aspire that the generated cotton maps will serve as the fundamental baseline data for agricultural resource management, economic trade management and food security.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs15081988/s1, Figure S1: Spearman correlation coefficient matrix of important variables for southern Xinjiang. Figure S2: Spearman correlation coefficient matrix of important variables for northern Xinjiang. Table S1: Detailed information on four feature scenarios.

Author Contributions

Conceptualization, Y.T. and Y.S.; methodology, Y.T. and Y.S.; data organization, C.S. and H.W.; writing—original draft preparation, Y.T. and L.F.; writing—review and editing, Y.L., X.C. and A.N.; Validation, R.U. and S.B.; supervision, Y.S.; funding acquisition, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (no. 2020YFA0608501), Talent recruited program of the Chinese Academy of Science (no. Y938091), National Natural Science Foundation of China (no. 42071351), and the project-supporting discipline innovation team of Liaoning Technical University (no. LNTU20TD-23).

Data Availability Statement

All data in this article can be obtained by reasonably contacting the corresponding author.

Acknowledgments

Firstly, we are very grateful to Google Earth Engine (https://earthengine.google.com/, accessed on 2 October 2022) for their free services. Then, we appreciate the geemap package (https://geemap.org/, accessed on 2 October 2022) for helping us with data processing. Thirdly, we are also thankful to scholars for their analysis of the feature importance (https://explained.ai/rf-importance/index.html, accessed on 2 October 2022). Finally, we sincerely thank the editor and reviewers for their time and effort in providing detailed comments that helped improve the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hu, Q.; Sulla-Menashe, D.; Xu, B.; Yin, H.; Tang, H.; Yang, P.; Wu, W. A phenology-based spectral and temporal feature selection method for crop mapping from satellite time series. Int. J. Appl. Earth Obs. Geoinf. 2019, 80, 218–229. [Google Scholar] [CrossRef]
  2. Heupel, K.; Spengler, D.; Itzerott, S. A Progressive Crop-Type Classification Using Multitemporal Remote Sensing Data and Phenological Information. PFG J. Photogramm. Remote Sens. Geoinf. Sci. 2018, 86, 53–69. [Google Scholar] [CrossRef] [Green Version]
  3. Foley, J.A.; DeFries, R.; Asner, G.P.; Barford, C.; Bonan, G.; Carpenter, S.R.; Chapin, F.S.; Coe, M.T.; Daily, G.C.; Gibbs, H.K.; et al. Global Consequences of Land Use. Science 2005, 309, 570–574. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Alves, A.N.; Souza, W.S.R.; Borges, D.L. Cotton pests classification in field-based images using deep residual networks. Comput. Electron. Agric. 2020, 174, 105488. [Google Scholar] [CrossRef]
  5. Khan, M.A.; Wahid, A.; Ahmad, M.; Tahir, M.T.; Ahmed, M.; Ahmad, S.; Hasanuzzaman, M. World Cotton Production and Consumption: An Overview. In Cotton Production and Uses: Agronomy, Crop Protection, and Postharvest Technologies; Ahmad, S., Hasanuzzaman, M., Eds.; Springer: Singapore, 2020; pp. 1–7. [Google Scholar] [CrossRef]
  6. Razzaq, A.; Zafar, M.M.; Ali, A.; Hafeez, A.; Batool, W.; Shi, Y.; Gong, W.; Yuan, Y. Cotton germplasm improvement and progress in Pakistan. J. Cotton Res. 2021, 4, 1. [Google Scholar] [CrossRef]
  7. Townsend, T.; Sette, J. Natural Fibres and the World Economy. In Natural Fibres: Advances in Science and Technology Towards Industrial Applications; Springer: Dordrecht, The Netherlands, 2016; pp. 381–390. [Google Scholar]
  8. Chen, Z.J.; Scheffler, B.E.; Dennis, E.; Triplett, B.A.; Zhang, T.; Guo, W.; Chen, X.; Stelly, D.M.; Rabinowicz, P.D.; Town, C.D.; et al. Toward sequencing cotton (Gossypium) genomes. Plant Physiol. 2007, 145, 1303–1310. [Google Scholar] [CrossRef] [Green Version]
  9. OECD/FAO. OECD-FAO Agricultural Outlook 2016–2025; OECD Publishing: Paris, France, 2016. [Google Scholar] [CrossRef]
  10. Xun, L.; Zhang, J.; Cao, D.; Wang, J.; Zhang, S.; Yao, F. Mapping cotton cultivated area combining remote sensing with a fused representation-based classification algorithm. Comput. Electron. Agric. 2021, 181, 105940. [Google Scholar] [CrossRef]
  11. Min, L.; Gengxing, Z.; Yuanwei, Q. Extraction and Monitoring of Cotton Area and Growth Information Using Remote Sensing at Small Scale: A Case Study in Dingzhuang Town of Guangrao County, China. In Proceedings of the 2011 International Conference on Computer Distributed Control and Intelligent Environmental Monitoring, Changsha, China, 19–20 February 2011; pp. 816–823. [Google Scholar]
  12. Carletto, C.; Gourlay, S.; Winters, P. From Guesstimates to GPStimates: Land Area Measurement and Implications for Agricultural Analysis. J. Afr. Econ. 2015, 24, 593–628. [Google Scholar] [CrossRef] [Green Version]
  13. Zhang, J.; Feng, L.; Yao, F. Improved maize cultivated area estimation over a large scale combining MODIS–EVI time series data and crop phenological information. ISPRS J. Photogramm. Remote Sens. 2014, 94, 102–113. [Google Scholar] [CrossRef]
  14. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402. [Google Scholar] [CrossRef]
  15. Boryan, C.; Yang, Z.; Mueller, R.; Craig, M. Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358. [Google Scholar] [CrossRef]
  16. Defourny, P.; Bontemps, S.; Bellemans, N.; Cara, C.; Dedieu, G.; Guzzonato, E.; Hagolle, O.; Inglada, J.; Nicola, L.; Rabaute, T.; et al. Near real-time agriculture monitoring at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world. Remote Sens. Environ. 2019, 221, 551–568. [Google Scholar] [CrossRef]
  17. Blickensdörfer, L.; Schwieder, M.; Pflugmacher, D.; Nendel, C.; Erasmi, S.; Hostert, P. Mapping of crop types and crop sequences with combined time series of Sentinel-1, Sentinel-2 and Landsat 8 data for Germany. Remote Sens. Environ. 2022, 269, 112831. [Google Scholar] [CrossRef]
  18. Preidl, S.; Lange, M.; Doktor, D. Introducing APiC for regionalised land cover mapping on the national scale using Sentinel-2A imagery. Remote Sens. Environ. 2020, 240, 111673. [Google Scholar] [CrossRef]
  19. Bargiel, D. A new method for crop classification combining time series of radar images and crop phenology information. Remote Sens. Environ. 2017, 198, 369–383. [Google Scholar] [CrossRef]
  20. Han, J.; Zhang, Z.; Cao, J.; Luo, Y. Mapping rapeseed planting areas using an automatic phenology- and pixel-based algorithm (APPA) in Google Earth Engine. Crop J. 2022, 10, 1483–1495. [Google Scholar] [CrossRef]
  21. Dong, J.; Xiao, X.; Menarguez, M.A.; Zhang, G.; Qin, Y.; Thau, D.; Biradar, C.; Moore, B., 3rd. Mapping paddy rice planting area in northeastern Asia with Landsat 8 images, phenology-based algorithm and Google Earth Engine. Remote Sens. Environ. 2016, 185, 142–154. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Zhong, L.; Hawkins, T.; Biging, G.; Gong, P. A phenology-based approach to map crop types in the San Joaquin Valley, California. Int. J. Remote Sens. 2011, 32, 7777–7804. [Google Scholar] [CrossRef]
  23. Wang, S.; Azzari, G.; Lobell, D.B. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens. Environ. 2019, 222, 303–317. [Google Scholar] [CrossRef]
  24. Al-Shammari, D.; Fuentes, I.; Whelan, B.M.; Filippi, P.; Bishop, T.F.A. Mapping of Cotton Fields Within-Season Using Phenology-Based Metrics Derived from a Time Series of Landsat Imagery. Remote Sens. 2020, 12, 3038. [Google Scholar] [CrossRef]
  25. Hu, T.; Hu, Y.; Dong, J.; Qiu, S.; Peng, J. Integrating Sentinel-1/2 Data and Machine Learning to Map Cotton Fields in Northern Xinjiang, China. Remote Sens. 2021, 13, 4819. [Google Scholar] [CrossRef]
  26. Peña-Barragán, J.M.; Ngugi, M.K.; Plant, R.E.; Six, J. Object-based crop identification using multiple vegetation indices, textural features and crop phenology. Remote Sens. Environ. 2011, 115, 1301–1316. [Google Scholar] [CrossRef]
  27. Wang, N.; Zhai, Y.; Zhang, L. Automatic Cotton Mapping Using Time Series of Sentinel-2 Images. Remote Sens. 2021, 13, 1355. [Google Scholar] [CrossRef]
  28. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170. [Google Scholar] [CrossRef]
  29. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  30. Han, L.; Ding, J.; Wang, J.; Zhang, J.; Xie, B.; Hao, J. Monitoring Oasis Cotton Fields Expansion in Arid Zones Using the Google Earth Engine: A Case Study in the Ogan-Kucha River Oasis, Xinjiang, China. Remote Sens. 2022, 14, 225. [Google Scholar] [CrossRef]
  31. Ni, R.; Tian, J.; Li, X.; Yin, D.; Li, J.; Gong, H.; Zhang, J.; Zhu, L.; Wu, D. An enhanced pixel-based phenological feature for accurate paddy rice mapping with Sentinel-2 imagery in Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 178, 282–296. [Google Scholar] [CrossRef]
  32. You, N.; Dong, J.; Huang, J.; Du, G.; Zhang, G.; He, Y.; Yang, T.; Di, Y.; Xiao, X. The 10-m crop type maps in Northeast China during 2017-2019. Sci. Data 2021, 8, 41. [Google Scholar] [CrossRef]
  33. Xie, Y.; Lark, T.J.; Brown, J.F.; Gibbs, H.K. Mapping irrigated cropland extent across the conterminous United States at 30 m resolution using a semi-automatic training approach on Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2019, 155, 136–149. [Google Scholar] [CrossRef]
  34. Jin, Z.; Azzari, G.; You, C.; Di Tommaso, S.; Aston, S.; Burke, M.; Lobell, D.B. Smallholder maize area and yield mapping at national scales with Google Earth Engine. Remote Sens. Environ. 2019, 228, 115–128. [Google Scholar] [CrossRef]
  35. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, 1–33. [Google Scholar] [CrossRef] [Green Version]
  36. Zanaga, D.; Van De Kerchove, R.; De Keersmaecker, W.; Souverijns, N.; Brockmann, C.; Quast, R.; Wevers, J.; Grosu, A.; Paccini, A.; Vergnaud, S.; et al. ESA WorldCover 10 m 2020, v100; The European Space Agency (ESA): Paris, France, 2021. [Google Scholar]
  37. Yang, J.; Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925. [Google Scholar] [CrossRef]
  38. Pekel, J.F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422. [Google Scholar] [CrossRef] [PubMed]
  39. Zhang, X.; Liu, L.; Zhao, T.; Gao, Y.; Chen, X.; Mi, J. GISD30: Global 30 m impervious-surface dynamic dataset from 1985 to 2020 using time-series Landsat imagery on the Google Earth Engine platform. Earth Syst. Sci. Data 2022, 14, 1831–1856. [Google Scholar] [CrossRef]
  40. Cao, B.; Yu, L.; Naipal, V.; Ciais, P.; Li, W.; Zhao, Y.; Wei, W.; Chen, D.; Liu, Z.; Gong, P. A 30 m terrace mapping in China using Landsat 8 imagery and digital elevation model based on the Google Earth Engine. Earth Syst. Sci. Data 2021, 13, 2437–2456. [Google Scholar] [CrossRef]
  41. Zhu, X.; Xiao, G.; Zhang, D.; Guo, L. Mapping abandoned farmland in China using time series MODIS NDVI. Sci. Total Environ. 2021, 755, 142651. [Google Scholar] [CrossRef]
  42. Diek, S.; Fornallaz, F.; Schaepman, M.E. Barest Pixel Composite for Agricultural Areas Using Landsat Time Series. Remote Sens. 2017, 9, 1245. [Google Scholar] [CrossRef] [Green Version]
  43. Hongmei, Z.; Xiaoling, C. Use of normalized difference bareness index in quickly mapping bare areas from TM/ETM+. In Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS ‘05), Seoul, Republic of Korea, 29 July 2005; pp. 1666–1668. [Google Scholar]
  44. Chen, W.; Liu, L.; Zhang, C.; Wang, J.; Wang, J.; Pan, Y. Monitoring the seasonal bare soil areas in beijing using multi-temporal TM images. In Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium, Anchorage, AK, USA, 20–24 September 2004. [Google Scholar]
  45. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef] [Green Version]
  46. Curran, P.J.; Windham, W.R.; Gholz, H.L. Exploring the relationship between reflectance red edge and chlorophyll concentration in slash pine leaves. Tree Physiol. 1995, 15, 203–206. [Google Scholar] [CrossRef]
  47. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky–Golay filter. Remote Sens. Environ. 2004, 91, 332–344. [Google Scholar] [CrossRef]
  48. Holben, B.N. Characteristics of maximum-value composite images from temporal AVHRR data. Int. J. Remote Sens. 2007, 7, 1417–1434. [Google Scholar] [CrossRef]
  49. Barnes, E.M. Coincident detection of crop water stress, nitrogen status and canopy density using ground-based multispectral data. In Proceedings of the 5th International Conference on Precision Agriculture, Bloomington, MN, USA, 16–19 July 2000. [Google Scholar]
  50. Chen, B.; Jin, Y.; Brown, P. An enhanced bloom index for quantifying floral phenology using multi-scale remote sensing observations. ISPRS J. Photogramm. Remote Sens. 2019, 156, 108–120. [Google Scholar] [CrossRef]
  51. Zhong, L.; Yu, L.; Li, X.; Hu, L.; Gong, P. Rapid corn and soybean mapping in US Corn Belt and neighboring areas. Sci. Rep. 2016, 6, 36240. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  52. Ghazaryan, G.; Dubovyk, O.; Löw, F.; Lavreniuk, M.; Kolotii, A.; Schellberg, J.; Kussul, N. A rule-based approach for crop identification using multi-temporal and multi-sensor phenological metrics. Eur. J. Remote Sens. 2018, 51, 511–524. [Google Scholar] [CrossRef]
  53. Jakubauskasa, M.E.; Legates, D.R.; Kastens, J.H. Crop identification using harmonic analysis of time-series AVHRR NDVI data. Comput. Electron. Agric. 2002, 37, 127–139. [Google Scholar] [CrossRef]
  54. Sun, C.; Li, J.; Cao, L.; Liu, Y.; Jin, S.; Zhao, B. Evaluation of Vegetation Index-Based Curve Fitting Models for Accurate Classification of Salt Marsh Vegetation Using Sentinel-2 Time-Series. Sensors 2020, 20, 5551. [Google Scholar] [CrossRef]
  55. Mingwei, Z.; Qingbo, Z.; Zhongxin, C.; Jia, L.; Yong, Z.; Chongfa, C. Crop discrimination in Northern China with double cropping systems using Fourier analysis of time-series MODIS data. Int. J. Appl. Earth Obs. Geoinf. 2008, 10, 476–485. [Google Scholar] [CrossRef]
  56. Breiman, L. Random Forest. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  57. Pirie, W. Spearman Rank Correlation Coefficient. In Encyclopedia of Statistical Sciences; John Wiley & Sons, Inc.: Hoboken, NY, USA, 2006. [Google Scholar] [CrossRef]
  58. Zhang, Y.; Liang, S.; Zhu, Z.; Ma, H.; He, T. Soil moisture content retrieval from Landsat 8 data using ensemble learning. ISPRS J. Photogramm. Remote Sens. 2022, 185, 32–47. [Google Scholar] [CrossRef]
  59. Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. J. Open Source Softw. 2020, 5, 2305. [Google Scholar] [CrossRef]
  60. Deines, J.M.; Kendall, A.D.; Crowley, M.A.; Rapp, J.; Cardille, J.A.; Hyndman, D.W. Mapping three decades of annual irrigation across the US High Plains Aquifer using Landsat and Google Earth Engine. Remote Sens. Environ. 2019, 233, 111400. [Google Scholar] [CrossRef]
  61. Xun, L.; Zhang, J.; Cao, D.; Yang, S.; Yao, F. A novel cotton mapping index combining Sentinel-1 SAR and Sentinel-2 multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2021, 181, 148–166. [Google Scholar] [CrossRef]
  62. Anderegg, J.; Yu, K.; Aasen, H.; Walter, A.; Liebisch, F.; Hund, A. Spectral Vegetation Indices to Track Senescence Dynamics in Diverse Wheat Germplasm. Front. Plant Sci. 2019, 10, 1749. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  63. Sun, C.; Li, J.; Liu, Y.; Liu, Y.; Liu, R. Plant species classification in salt marshes using phenological parameters derived from Sentinel-2 pixel-differential time-series. Remote Sens. Environ. 2021, 256, 112320. [Google Scholar] [CrossRef]
  64. Howard, D.M.; Wylie, B.K. Annual Crop Type Classification of the US Great Plains for 2000 to 2011. Photogramm. Eng. Remote Sens. 2014, 80, 537–549. [Google Scholar] [CrossRef]
  65. Laliberte, A.S.; Browning, D.M.; Rango, A. A comparison of three feature selection methods for object-based classification of sub-decimeter resolution UltraCam-L imagery. Int. J. Appl. Earth Obs. Geoinf. 2012, 15, 70–78. [Google Scholar] [CrossRef]
  66. Strobl, C.; Boulesteix, A.L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 25. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  67. Pan, L.; Xia, H.; Zhao, X.; Guo, Y.; Qin, Y. Mapping Winter Crops Using a Phenology Algorithm, Time-Series Sentinel-2 and Landsat-7/8 Images, and Google Earth Engine. Remote Sens. 2021, 13, 2510. [Google Scholar] [CrossRef]
  68. Du, P.; Xia, J.; Zhang, W.; Tan, K.; Liu, Y.; Liu, S. Multiple Classifier System for Remote Sensing Image Classification: A Review. Sensors 2012, 12, 4764–4792. [Google Scholar] [CrossRef]
Figure 1. Study area of the Xinjiang showing location of the cotton/non-cotton samples. The entire Xinjiang was divided by the natural zoning of Chinese agriculture into northern Xinjiang (including NX and NWX) and southern Xinjiang (SX). Note: This map is based on the standard map with the approval number GS (2020)4619 downloaded from the standard map service website of the Ministry of Natural Resources of China, and the base map has not been modified.
Figure 1. Study area of the Xinjiang showing location of the cotton/non-cotton samples. The entire Xinjiang was divided by the natural zoning of Chinese agriculture into northern Xinjiang (including NX and NWX) and southern Xinjiang (SX). Note: This map is based on the standard map with the approval number GS (2020)4619 downloaded from the standard map service website of the Ministry of Natural Resources of China, and the base map has not been modified.
Remotesensing 15 01988 g001
Figure 2. Photographs of cotton at different growth stages.
Figure 2. Photographs of cotton at different growth stages.
Remotesensing 15 01988 g002
Figure 3. The general framework of cotton mapping. NDSI is Normalized Difference Soil Index, REPI is Red Edge Position Index, PSRI is Three-band Plant Senescence Reflectance Index, LSWI is Land Surface Water Index.
Figure 3. The general framework of cotton mapping. NDSI is Normalized Difference Soil Index, REPI is Red Edge Position Index, PSRI is Three-band Plant Senescence Reflectance Index, LSWI is Land Surface Water Index.
Remotesensing 15 01988 g003
Figure 4. Annual time-series profiles of cotton for eight spectral indices. (ah) represent NDSI, BSI, NDVI, EVI, LSWI, REPI, SIPI and PSRI.
Figure 4. Annual time-series profiles of cotton for eight spectral indices. (ah) represent NDSI, BSI, NDVI, EVI, LSWI, REPI, SIPI and PSRI.
Remotesensing 15 01988 g004
Figure 5. Boxplots of each feature for cotton and non-cotton in southern and northern Xinjiang. LSWI_3 and LSWI_4 represent LSWIs that are composited within FLS and BDS, respectively. NDVI_cos3 refers to the cos (3πt) term coefficient of the NDVI time series, while NDVI_constant corresponds to the constant term coefficient of the NDVI time series. The remaining variables hold similar meanings. (ac) and (eg) demote the representative phenological stages features in southern and northern Xinjiang, respectively, (d) and (h) represent harmonic coefficients in southern and northern Xinjiang, respectively.
Figure 5. Boxplots of each feature for cotton and non-cotton in southern and northern Xinjiang. LSWI_3 and LSWI_4 represent LSWIs that are composited within FLS and BDS, respectively. NDVI_cos3 refers to the cos (3πt) term coefficient of the NDVI time series, while NDVI_constant corresponds to the constant term coefficient of the NDVI time series. The remaining variables hold similar meanings. (ac) and (eg) demote the representative phenological stages features in southern and northern Xinjiang, respectively, (d) and (h) represent harmonic coefficients in southern and northern Xinjiang, respectively.
Remotesensing 15 01988 g005
Figure 6. The variable permutation importance measures in descending order and the average cross-validation accuracy of different feature combinations. (a) represents southern Xinjiang. (b) represents northern Xinjiang. The cross-validation accuracy peaks at the features with red color.
Figure 6. The variable permutation importance measures in descending order and the average cross-validation accuracy of different feature combinations. (a) represents southern Xinjiang. (b) represents northern Xinjiang. The cross-validation accuracy peaks at the features with red color.
Remotesensing 15 01988 g006
Figure 7. Multiple accuracy metrics for different feature scenarios. (a) represents northern Xinjiang. (b) represents southern Xinjiang.
Figure 7. Multiple accuracy metrics for different feature scenarios. (a) represents northern Xinjiang. (b) represents southern Xinjiang.
Remotesensing 15 01988 g007
Figure 8. Spatial distribution of cotton in Xinjiang. (bh) represent the case regions. The specific locations are marked in (a) with red boxes.
Figure 8. Spatial distribution of cotton in Xinjiang. (bh) represent the case regions. The specific locations are marked in (a) with red boxes.
Remotesensing 15 01988 g008
Figure 9. Comparison of cotton area in county-level municipalities between Sentinel-2 map and official statistics.
Figure 9. Comparison of cotton area in county-level municipalities between Sentinel-2 map and official statistics.
Remotesensing 15 01988 g009
Table 1. Formulas of the selected spectral indices with the corresponding phenological stages.
Table 1. Formulas of the selected spectral indices with the corresponding phenological stages.
Spectral IndicesFormulasComposite MethodPhenological Stage
BSI ( S W I R 1 + R ) ( N I R + B ) ( S W I R 1 + R ) + ( N I R + B ) medianSOS
NDSI S W I R 1 N I R S W I R 1 + N I R medianSOS
NDVI N I R R N I R + R 85th percentileSES
EVI 2.5 N I R R N I R + 6     R 7.5     B + 1 85th percentileSES
LSWI N I R S W I R 1 N I R + S W I R 1 medianFLS, BDS
NDRE N I R R e d 2 N I R + R e d 2 85th percentileFLS, BDS
REPI 705 + 35 R + R e d 3 2 R e d 1 R e d 2 R e d 1 medianFLS, BDS
PSRI R B R e d 2 medianBOS
SIPI N I R B N I R R medianBOS
EBI R + G + B G B ( R B + 1 ) 85th percentileBOS
Table 2. Detailed information on four feature scenarios.
Table 2. Detailed information on four feature scenarios.
Feature ScenariosFeature SetNumber
FC1NDVI and EVI in five phenological stages10
FC2Multiple phenological features in five phenological stages13
FC3Time-series multiple phenological features 31
FC4The optimized features of time-series multiple phenological features11 & 16
Table 3. Confusion matrix of the cotton map for Xinjiang.
Table 3. Confusion matrix of the cotton map for Xinjiang.
Classification ResultGround TruthPre (%)Rec (%)OA(%)Kappa(%)
CottonNon-Cotton
Cotton13534097.1398.1998.4696.51
Non-cotton25279699.1198.59
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tian, Y.; Shuai, Y.; Shao, C.; Wu, H.; Fan, L.; Li, Y.; Chen, X.; Narimanov, A.; Usmanov, R.; Baboeva, S. Extraction of Cotton Information with Optimized Phenology-Based Features from Sentinel-2 Images. Remote Sens. 2023, 15, 1988. https://doi.org/10.3390/rs15081988

AMA Style

Tian Y, Shuai Y, Shao C, Wu H, Fan L, Li Y, Chen X, Narimanov A, Usmanov R, Baboeva S. Extraction of Cotton Information with Optimized Phenology-Based Features from Sentinel-2 Images. Remote Sensing. 2023; 15(8):1988. https://doi.org/10.3390/rs15081988

Chicago/Turabian Style

Tian, Yuhang, Yanmin Shuai, Congying Shao, Hao Wu, Lianlian Fan, Yaoming Li, Xi Chen, Abdujalil Narimanov, Rustam Usmanov, and Sevara Baboeva. 2023. "Extraction of Cotton Information with Optimized Phenology-Based Features from Sentinel-2 Images" Remote Sensing 15, no. 8: 1988. https://doi.org/10.3390/rs15081988

APA Style

Tian, Y., Shuai, Y., Shao, C., Wu, H., Fan, L., Li, Y., Chen, X., Narimanov, A., Usmanov, R., & Baboeva, S. (2023). Extraction of Cotton Information with Optimized Phenology-Based Features from Sentinel-2 Images. Remote Sensing, 15(8), 1988. https://doi.org/10.3390/rs15081988

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop