1. Introduction
The cover-management factor (C-factor, land-use factor) is a term used in soil erosion management to describe the impact of land use and management on soil loss rates [
1,
2]. It is one of the most critical soil erosion risk factors that policymakers and farmers can influence to help reduce soil loss rates. The land-use factor is critical in regulating soil erosion and other environmental processes [
3]. Mathematical models have been developed to incorporate the land-use factor to better understand and predict these processes. One of the earliest models is the Universal Soil Loss Equation (USLE) [
4,
5], which was later corrected and modified to become the Revised Universal Soil Loss Equation (RUSLE) [
6,
7] and the Modified Universal Soil Loss Equation (MUSLE) [
8]. These models use various factors, including slope, soil type, rainfall intensity, and land use, to estimate soil erosion rates. Another model, the Water Erosion Prediction Project (WEPP) model [
9,
10,
11], also incorporates land use as a direct factor in its erosion calculations. Land-use data are a crucial component of these erosion models and are typically obtained from remote sensing data, land-use maps, and other geographic information systems. These data can be represented in a variety of ways, including as vector data (point, line, or polygon features), in the case of analysis of land use change at a farm or municipality level, or as raster data (grid cells with a specified value), in the case of analysis at a state or global level. In addition to the direct incorporation of land-use data in erosion models, the indirect effects of land use on soil erosion can also be considered. For example, land use changes can influence precipitation runoff and sediment transport [
12]. Physically-based models that incorporate these indirect effects have been developed, such as the Soil and Water Assessment Tool (SWAT) [
13]. These models are beneficial for large-scale assessments of erosion processes and the impact of land use change on these processes.
The study of global land use is an essential area of research, especially with regard to its effect on soil erosion and other exogenous processes [
14]. To better understand the impact of land use on erosion, researchers have developed a range of global land-use models derived from remote sensing data from satellites. These models vary in their spatial resolution and temporal coverage, depending on the type of sensor data used in their construction. One of the best-known global land-use models is the MCD12Q1.061-MODIS Land Cover Type Yearly Global [
15]. This model has a spatial resolution of 500 m and it has been updated annually since 2001. Another well-known model is the GlobCover: Global Land Cover Map [
16], which has a spatial resolution of 300 m. The European Space Agency/Copernicus also provides the Copernicus Global Land Cover Layers [
17], which have a spatial resolution of 100 m and are available free of charge. With the advancement of the Landsat program and improvements in software algorithms for processing and analyzing satellite images, the number of models with a 30 m resolution is increasing. Examples of these models include the Global Land Cover and Land Use Change [
18] GLC_FCS30-2015 [
19], and the GlobeLand30 model [
20]. These models have an accuracy rate of 70% or higher when recognizing different types of land use, as compared with independent sampling. Recently, models based on multispectral data from the Sentinel-2 program have also been developed. These models have the highest resolution (10 m) and can reflect recent land use changes. Examples of these models include the ESA WorldCover 10 m [
21], ESRI 10 m Annual Land Use Land Cover [
22], and the Dynamic World model developed by the World Resources Institute and Google [
23].
Land use models are commonly used in erosion modeling studies to simulate the impact of land use change on soil erosion processes. These models provide high spatial and temporal resolution information on land use and land cover changes, but typically, they need more detailed information on crop type and field-specific tillage practices [
24]. This is a critical limitation because crop type and tillage practices significantly impact soil erosion and sediment yield [
25]. Crop models are designed to provide detailed information on crops grown in specific fields, including their phenology, growth patterns, and responses to environmental conditions. This information is essential for understanding the impact of land use change on erosion and sediment yield, as different crops have different root systems, aboveground biomass, and leaf area index, which all influence erosion processes [
26,
27].
Several open-use crop recognition products are available to researchers and practitioners. The USDA NASS Cropland Data Layers (CDL) [
28] is a well-known example, providing annual crop data for the United States with a 30 m resolution derived from Landsat data since 1997. The Canada AAFC Annual Crop Inventory product [
29] provides a similar annual inventory map for North America, also derived from Landsat data. More recently, the development of the Sentinel program, particularly Sentinel-1 radar data, has made it possible to produce crop recognition with a much higher spatial resolution. The EUCROPMAP 2018 model from the Joint Research Center (JRC) [
30] is one such example, providing crop recognition at a 10 m resolution based on Sentinel-1 radar data. The classification accuracy of the EUCROPMAP model has been reported to range from 70 to 95 percent, with some crop types having higher accuracy levels than others. However, the classification accuracy of different crops is uneven. For example, for the EUCROPMAP product, the user accuracy of different crops varies from 28% to 90%. For the USDA NASS Cropland Data Layers model, this varies from 28% to 97%, and for the Canada AAFC Annual Crop Inventory, depending on the year, the figure varies from 64–87% to 74–90%, from 2011 to 2013 [
31].
Crop recognition from remote sensing data has been a significant area of research for several decades now. Several methods based on machine learning algorithms have been developed for crop recognition using geospatial and remote sensing data [
32,
33]. The products developed for crop recognition are generally based on state statistics geodata, which enables the creation of a training sample distributed throughout the study area [
34]. Automatic crop identification is a complex task with several challenges. One of the primary challenges in automatic crop identification is the inherent variability and complexity of spectral characteristics exhibited by different crops [
35]. Factors such as growth stage, weather conditions, and agricultural practices contribute to the unique spectral signatures of crops. This variability makes it difficult to accurately establish universal algorithms and models to identify crops [
32,
33]. Certain crops possess similar spectral signatures, leading to spectral signature overlap [
35]. Distinguishing between these crops based solely on spectral data becomes challenging, as their characteristic reflectance patterns may exhibit significant similarities [
35]. Discriminating between crops with similar spectral characteristics, such as wheat and barley, requires advanced algorithms that consider additional contextual information [
36]. Remote sensing data from satellites often need to improve their spatial resolution [
32,
33]. This limitation poses challenges when differentiating between crops occupying small areas, or when adjacent crops share the same pixel [
35]. The coarse spatial resolution may hinder the accurate identification and delineation of individual crops within mixed cropping systems [
34,
35]. Agricultural fields are often heterogeneous, as multiple crops or mixtures coexist [
34]. This heterogeneity introduces complexities when attempting to accurately identify and quantify each crop within a given field [
34]. Overcoming this challenge requires methods that account for the spatial variability and mixed composition of crops within a field [
34]. The success of automatic crop identification relies heavily upon the availability of sufficient and diverse training data [
34]. A significant challenge concerns the ability to obtain representative and high-quality training datasets encompassing a wide range of crops, geographic regions, and environmental conditions [
34]. The scarcity of such data limits the development and validation of robust algorithms for crop identification [
34]. Crops undergo distinct seasonal changes in spectral characteristics as they progress through different growth stages [
33,
36]. These changes can affect the accuracy of crop identification algorithms, particularly during transitions between growth phases or during the ripening stage [
33,
36]. Accounting for the temporal dynamics and capturing the phenological variations of crops is essential accurate and reliable identification [
33,
36]. Various factors, such as cloud cover, atmospheric haze, and shadows, can interfere with the quality and availability of satellite imagery used for crop identification [
32]. These interferences introduce noise and artifacts in the data, which affect the accuracy of crop classification algorithms [
32]. Mitigating the impact of these interferences through preprocessing techniques and data fusion approaches is essential for improving crop identification accuracy [
32].
Traditionally, machine learning techniques such as decision trees (e.g., Canada AAFC Annual Crop Inventory until 2015), ensemble bagging algorithms (e.g., Canada AAFC Annual Crop Inventory after 2015, and EUCROPMAP, 2018), and boosting algorithms (e.g., Catboost library used to create the ESA WorldCover 10 m model) have been used for crop recognition. However, due to the high computational resource requirements, neural networks are rarely used in crop recognition, especially when high-resolution data are used. Neural networks are more commonly used for regional or local models, which consider not only the variability of pixel brightness in multispectral data, but also other complex parameters, depending on the type of neural network. For example, convolutional neural networks have been used to recognize objects on space images based on their shape and texture [
37,
38]. Multilayer perceptron (MLP) has been used for both land-use type recognition [
39] and the spatial and temporal extrapolation of the obtained results [
40,
41,
42]. Recurrent neural networks have been used for sequence analysis [
33,
43,
44,
45], such as the time series analysis of vegetation indices like the Normalized Difference Vegetation Index (NDVI), which is an indicator of vegetation health. It can be used to compare the conditions of the same species at different times of the year, different years, or in different areas within a region or field. Regarding crop recognition, the classification is conducted pixel-by-pixel, with each pixel having the predicted class value and the probability of belonging to a particular class. This indirectly allows for the estimation of areas of fields with perennial grasses underplanted with the main crop, for instance. Object-based recognition is not produced at the global or national level. The methods using MLP and recurrent networks will be discussed in more detail in the relevant section.
A spatially distributed training sample covering all groups of land use types or crops is required to train recognition algorithms effectively. However, although it is possible to manually prepare a training sample for land use types using satellite imagery, it is not possible to do so for crop types due to the visual similarities between different crops. This challenge is intensified when analyzing large areas with varying natural and climatic conditions, such as large countries or the territory of the European Union.
Climatic and soil variations can significantly impact the phenological characteristics of vegetation, potentially shifting the time of growth onset, maximum growth level, and rate of green mass drying. Within-field differences in soil properties can also lead to differences in phenology, making it challenging to accurately identify crops [
46].
To overcome these challenges, it is crucial to develop methods that consider not only the values of spectral brightness and their variance, but also the nature of the variability during the growing season. This becomes a fundamental task without spatial information of crop types over large areas.
One possible solution to this problem is to incorporate additional information, such as climatic data, into the detection algorithms. For example, a study by Shendryk et al. [
47] used a combination of satellite imagery and climate data to accurately classify crops in a large agricultural region. They found that incorporating information concerning temperature, rainfall, and other climatic variables improved crop classification accuracy compared with using satellite imagery alone. Climate data help detect crops in satellite images because it can provide information about temperature, precipitation, and other environmental factors that affect crop growth and health. This information can identify areas where crops are thriving or struggling, which can help farmers make better decisions about planting, irrigation, and other management practices. For example, increases in regional temperatures due to climate change, especially in the tropics, can lead to heat stress for all crops. Many crops start feeling stressed at temperatures above 90 to 95 degrees Fahrenheit (32 to 35 degrees Celsius), although this will vary depending on crop type and water availability [
48].
Another approach involves using advanced machine learning techniques, such as deep learning, to effectively distinguish between crops with similar phenologies. For example, a study by Turkoglu et al. [
49] developed a deep learning-based algorithm that used multispectral satellite imagery to detect different crops in a mixed-crop field accurately. The algorithm effectively separated crops with similar phenologies, such as corn and soybean, by using information from multiple spectral bands and the temporal variability of vegetation growth.
Accurately assessing soil erosion requires spatial data from the locations of fields with specific crops, so that they can be used in a training sample. Therefore, this study aims to develop and validate a crop and crop group recognition methodology for large areas, with diverse natural conditions, without training data. There are several prerequisites for solving this problem. In the realm of remote sensing for agricultural applications, the presence and quality of in situ data are pivotal for determining the precision of crop mapping across diverse regions. Such data not only aid in the training phase, but also in the validation of crop mapping algorithms. However, the extent of available in situ data can differ markedly between regions, leading to variations in mapping accuracy.
Regions characterized by sparse in situ data often grapple with the challenge of formulating precise crop mapping algorithms. The efficacy of these algorithms is intertwined with localized insights into management strategies, crop phenology, and historical cropping trends. Moreover, the accuracy of crop mapping is contingent upon the spatial and temporal relevance of the in situ data employed during classifier training. Data that are either misrepresentative or insufficient can inadvertently introduce errors into the mapping outcomes.
A significant portion of crop mapping research is localized, heavily reliant on field data, and consequently, it lacks the versatility to be applied seamlessly across different regions. The process of in situ data acquisition, particularly in expansive agricultural zones, can be both time consuming and financially demanding, thereby restricting its availability. Additionally, certain regions might experience constraints when accessing ground truth data, which can, in turn, compromise the integrity of crop mapping results.
To navigate these challenges, there is an imperative need to enhance both the accessibility and caliber of in situ data. Innovative data collection methodologies, such as citizen science or crowdsourcing, could be explored. Furthermore, fostering a culture of data collaboration and sharing can potentially mitigate the inherent limitations of leveraging remote sensing for crop mapping. Another approach, which is proposed in this paper, concerns the use of crop data in areas where the collection and systematization of data at the locations of the plots is conducted at a qualitatively high level for crop mapping in areas with similar physiographic conditions and cultivated crops.
2. Materials and Methods
2.1. Study Area
The European part of Russia (EPR) (
Figure 1) serves as the focus of this study, and it covers an area of approximately 4 million sq. km, encompassing over 2400 km, north to south. An online geoportal, “The River Basins of the European Territory of Russia” (
http://bassepr.kpfu.ru/, accessed on 1 January 2023), has been developed to provide a comprehensive understanding of the environmental conditions of the territory, including topography, climate, hydrology, soils, land use, and human-induced impact.
The landscape of the EPR is diverse and encompasses a range of zones, from tundra to temperate deserts. The elevation varies from −30 m to 830 m, with 40% of the territory having altitudes ranging between 120 m and 180 m. The distribution of climatic elements within the EPR is zonal, with the average annual temperature ranging from −8 °C in the northeast to +12–14 °C on the Black Sea coast and the Caspian lowland. Moving from west to east into the mainland, the climate changes due to the transformation of Atlantic air masses, resulting in longitudinal landscape differentiation.
The annual precipitation in the EPR is significant in the west, approximately 700 mm, and it decreases in the northern and southeastern regions following the law of latitudinal zoning. This reduction in precipitation results in a reduction of forest cover and swampiness from north to south. The southern part of the EPR, where highly fertile soils (Chernozems) are located, has undergone intensive cultivation for a prolonged period. Arable lands are primarily located in the steppe, forest-steppe, mixed, and broad-leaved landscapes, and they cover an area of approximately 530,000 sq. km [
50].
Soils in the EPR are predominantly Chernozems, and they have a high level of fertility, making them ideal for agriculture [
51]. However, the intensive cultivation of these lands has led to degradation and soil erosion in some areas. The climate of the EPR also affects soil characteristics, with the cold and dry conditions in the northern regions resulting in permafrost and the formation of bog soils. The mild and humid conditions in the western regions contribute to development of Greyzems and Phaeozems.
The study area is a region actively involved in the agricultural activities of Russia, where the dominant types of crops are wheat, barley, sunflowers, and sugar beet.
Wheat and barley are typically sown in spring (April–May) and harvested in late summer or early autumn (August–September). These cereal crops constitute a significant portion of sown areas and are integral to the region’s agricultural production. Sugar beet and sunflowers are also sown in spring and harvested in the fall. These crops also play a crucial role in the region’s agriculture, with sunflowers often grown within crop rotation schemes to improve soil health. The primary production and cultivation methods include using modern agritechnologies such as minimum tillage, fertilization, and pest management, as well as utilizing crop rotation to maintain soil fertility and production sustainability.
2.2. Remote Sensing Data
MOD13Q1 [
52] and MYD13Q1 [
53] data from MODIS satellite imagery were used for crop recognition.
MODIS is a key instrument aboard NASA’s Terra and Aqua satellites, launched in 1999 and 2002. MODIS provides a range of data products for various scientific disciplines, including atmospheric science, oceanography, hydrology, and land use and land cover studies.
MOD13Q1 and MYD13Q1 are two of the many MODIS data products that provide information about the state of vegetation on the Earth’s surface. MOD13Q1 is obtained from the Terra satellite, whereas MYD13Q1 is obtained from the Aqua satellite. Both products provide 16-day composites of the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) as measures of vegetation health and productivity. NDVI and EVI are indices that help quantify the amount of green vegetation in an area by comparing the reflectance levels of visible and near-infrared light. NDVI values range from −1 to 1, with higher values indicating a greater density of green vegetation. Similarly, EVI values range from −1 to 1, with higher values indicating greater green vegetation cover.
The MOD13Q1 and MYD13Q1 products have a spatial resolution of 250 m. This makes them useful for the large-scale monitoring of vegetation, such as tracking the progress of crop growth, monitoring the impact of drought, and assessing the distribution of vegetation types over large areas. Two scenes, wherein MOD13Q1 and MYD13Q1 were used to cover the agriculturally developed part of Canada, and seven scenes wherein corresponding products were required to cover the European part of Russia, were examined.
The TerraNorte RLC-2016 product is an annual map of terrestrial ecosystems in Russia, produced from remote sensing MODIS data, that depicts the spatial distribution of the main types of land cover with a spatial resolution of 230 m. The use of TerraNorte RLC-2016, in this case, was to identify arable lands within the European part of Russia (EPR) and to mask the areas occupied by arable land so as not to classify areas knowingly used for non-agricultural uses [
54,
55].
2.3. Training Sample
The study aimed to identify and recognize crops in a specific study area. Of more than 40 crops mentioned in the government statistics, those that occupied the majority of the cultivated areas were selected for recognition, including winter cereals, spring cereals, legumes, buckwheat, corn, sunflower, sugar beet, potatoes, perennial grasses, and fallow land. The other crop types were found to be rare in the study area.
Geospatial data were needed to locate fields with recognizable crops to create a robust training sample. However, there needed to be more reliable, publicly available georeferenced data on the composition of cropland in the study area. As a result, reference sites for the presence of crops were identified using the Territories–Analogues data from Canada, which have similar landscapes and climatic conditions to the study area.
The Territories–Analogues data were obtained from the Annual Crop Inventory project of the Canadian Ministry of Agriculture and Food, which provides publicly available raster layers of land use/land cover at a spatial resolution of 30 m, including crop type. The project used satellite data from Landsat, Sentinel, and RADARSAT-2 for processing and classification (
Figure 2).
The data of the Territories–Analogues were used in the training sample, using the MODIS pixels (250 m) that covered only one type of crop when they were spatially aligned with the 30 m land use raster of the Annual Crop Inventory. The NDVI, EVI layers, and QA (quality assessment) layer values were taken from the MOD13Q1 and MYD13Q1 products for 2015 and 2016 for each of the MODIS pixels. In this study, we used multi-temporal remote sensing data to differentiate between various types of crops. This differentiation was based on the sequence of values of spectral vegetation indices during the vegetation season. Spectral vegetation indices are calculated using different spectral ranges (bands) of remote sensing data and they serve as reliable indicators of vegetation activity on the Earth’s surface.
The most commonly used spectral vegetation index is the Normalized Difference Vegetation Index (NDVI), which is calculated using the following formula: NDVI = (NIR − Red)/(NIR + Red). NIR represents the near-infrared band (841–876 nm), and Red represents the red band (459–479 nm). Another widely used vegetation index is the Enhanced Vegetation Index (EVI), which is designed to minimize the impact of soil and atmospheric conditions on the vegetation signal. EVI is calculated, as follows: EVI = G·(NIR-Red)/(NIR + C1·Red − C2·Blue + L). In this formula, Blue represents the blue band (650–680 nm); L represents the canopy background adjustment; C1 and C2 are coefficients of the aerosol resistance term; and G is the scaling factor. For MODIS data, the coefficients are set as L = 1, C1 = 6, C2 = 7.5, and G = 2.5.
30 NDVI, 30 EVI, and 30 QA grids were used each year from 22 March to 19 November, with a time step of 8 days. The NDVI and EVI values on a pixel were considered to be acceptable if the QA value on the pixel was 0 (“good data”) or 1 (“marginal data”). Missing data were filled via cubic spline interpolation (in individual pixels for specific dates). In general, the quality of remote sensing data for the territory of Canada and European Russia was assessed as good and comparable (
Table 1).
The result was a training sample, a multidimensional dataset with pixels of the references as elements (rows) and crop code values and NDVI and EVI values as variables (columns). NDVI and EVI time series were constructed for each pixel of the training sample, reflecting the seasonal dynamics of these indices (with a time step of 8 days). The constructed NDVI and EVI time series were then fed into the recognition algorithm for training.
Spectral vegetation indices, such as NDVI and EVI, are frequently used as input data for analyzing and interpreting vegetation cover using remote sensing data. The seasonal patterns of NDVI and EVI values for different crops are unique. We constructed and analyzed time series graphs of NDVI and EVI for the pixels of the training sample, which depicted the changes in these indices from late March to mid-November. We smoothed these time series using Generalized Additive Models [
56] to obtain a generalized data representation.
The results, shown in
Figure 3, reveal the specific seasonal variability of vegetation indices for ten different crops. The graphs prove that each crop has a distinct seasonal pattern due to variations in phenology and agrotechnical activities such as sowing and harvesting. For instance, winter cereals and perennial grasses reach their peak photosynthetic activity earlier than other crops, around the 150th day of the year, and they experience a rapid decline in vegetation indices by the time of harvest in August (around the 220th day of the year). These distinct patterns in the time series of NDVI and EVI make using these data as inputs for crop recognition algorithms promising.
European Russia, the agriculturally developed part of Russia, is characterized by a wide range of geographic and climatic conditions. The region is primarily dominated by the East European Plain, with a climate that varies from humid continental to subarctic, significantly influencing agricultural practices. The soil composition is predominantly Chernozem, known for its high fertility and organic matter content, making it suitable for agriculture.
The main crops grown in European Russia include wheat, barley, oats, rye, corn, and sunflower seeds. The region is also known for its production of potatoes and vegetables. The growing season varies depending on the crop and the specific geographic location within the region. However, the general growing season is from late April to early October, with yields heavily influenced by weather conditions, particularly the amount and distribution of rainfall during the growing season.
Canada has a wide range of geographic conditions, from the fertile plains of Alberta and Saskatchewan to the mountainous regions of British Columbia. The climate varies from temperate on the west coast to subarctic and arctic in the north. Soil types also vary widely across the country, with Chernozem in the prairies, Luvisol in the northern regions, and Podzol in the boreal forest regions.
The major crops grown in Canada include wheat, canola, barley, corn for grain, and soybeans. The country is also a major producer of fruits, vegetables, and other specialty crops. The growing season in Canada is generally shorter than in European Russia due to the colder climate, typically from May to September. However, this can vary depending on the geographic location and crop.
Summarized information concerning the study area and the comparison of the Territories–Analogues is presented in
Table 2.
Despite the slight differences in geography and climate, both regions have developed robust agricultural sectors that use their unique conditions to grow various crops. Weather conditions heavily influence yields in both regions during the growing season, soil health, and the implementation of modern agricultural practices.
2.4. Recognition Methods
Crop recognition from remote sensing data is an essential aspect of precision agriculture, supplying valuable information for decision-making related to crop management, yield prediction, and environmental monitoring [
61]. Several machine learning techniques have been proposed for crop recognition from remote sensing data, with Random Forest (RF), Multilayer perceptron (MLP), and Long Short-Term Memory (LSTM) recurrent neural networks being one of the most widely used algorithms. The training and classification pipeline implementing these methods was developed in Python 3.7 using the Keras v.2.3.1 + TensorFlow v.2.2.0 framework, with GPU (graphics processing unit), and the data preprocessing, postprocessing, and analysis programs were developed in R v.3.4.4.
2.4.1. Random Forest
RF is a machine learning technique that builds an ensemble of decision trees, wherein each tree makes a prediction, and the final prediction is the average of the predictions made by all the trees. RF is a type of bagging algorithm, and the term “bagging” stands for bootstrap aggregating. In RF, the bootstrapped samples are drawn with replacements from the training data and are used to train a separate decision tree. Each tree is trained on a different subset of the data, and the final prediction is made based on the predictions of all the trees.
One of the main advantages of RF is that it can handle linear and non-linear relationships between the features and the target. This means that RF is flexible enough to model complex relationships between the variables and it can be applied to various problems. Additionally, RF can handle missing values and is relatively insensitive to noisy features. Another advantage of RF is that it supplies a measure of feature importance, which allows it to identify the most essential features for the classification task. This can be useful when conducting crop recognition from remote sensing data, where there may be many features, and it can be challenging to determine which ones are the most important.
However, one of the limitations of RF is that it is computationally intensive, especially when compared with other machine learning algorithms such as linear regression or logistic regression. RF also requires a large amount of memory, which can be an issue when working with large datasets.
There have been some studies that have used RF for crop recognition from remote sensing data. For example, a study by [
62] used RF to classify crops in the Midwestern United States based on remote sensing data obtained from Landsat imagery. In this study, RF performed better than other machine learning algorithms, including decision trees and support vector machines.
Another study by [
63] used RF to classify crops in the Nile Delta region of Egypt based on remote sensing data obtained from MODIS imagery. The study found that RF could accurately classify different crop types and provide valuable information for decision-making related to crop management and yield prediction.
2.4.2. Multilayer Perceptron (MLP)
Multilayer Perceptron (MLP) [
64] is a type of Artificial Neural Network (ANN) used for supervised learning tasks, such as classification and regression. MLP comprises one or more layers of processing nodes, also known as neurons, which are connected via weighted links. The input layer receives the input features, whereas the output layer provides the predicted class labels or continuous values. The hidden layers between the input and output layers transform the input into a representation suitable for the prediction task. Each neuron in the MLP model performs a simple calculation that combines the inputs with a set of weights and applies a non-linear activation function, such as the sigmoid or rectified linear unit (ReLU) function, to the result. The weights in the MLP model are optimized during the training process by minimizing the difference between the predicted outputs and actual target values. This optimization process typically uses backpropagation algorithms, such as gradient descent or variations thereof.
MLP has several advantages, such as the ability to model non-linear relationships, handle many input features, and perform well in many practical applications. However, MLP also has several disadvantages, such as the risk of overfitting, the need for large amounts of training data, and difficulties in interpreting the internal representation learned by the model.
MLP can be used in remote sensing applications, such as crop recognition, to predict the class labels of pixels in satellite or aerial imagery. MLP can learn complex patterns in the data by combining multiple layers and non-linear activation functions. Additionally, MLP can provide probabilistic estimates for each class, which can be used to improve the reliability and interpretability of the predictions.
However, it should be noted that MLP is only sometimes the best choice for remote sensing classification tasks, especially when dealing with high-resolution data. MLP requires a lot of computational resources and can be slow to train and predict. In addition, MLP may need help to capture meaningful spatial and spectral relationships in the data, especially when the data are highly heterogeneous or contains a large amount of noise. Other machine learning methods, such as convolutional neural networks or decision trees, may perform better in such cases.
2.4.3. Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) algorithm used to learn long-term dependencies in data sequences. LSTM networks can learn patterns in data that span long distances and can help solve tasks such as language modeling, time-series prediction, and classification using remote sensing data. Extended Short-Term Memory Networks are a type of RNN used in various tasks such as machine translation, speech recognition, and image captioning. The LSTM architecture was introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber [
65]. This architecture was designed to overcome the vanishing gradient problem that plagues traditional RNNs. The LSTM architecture achieves this by introducing a memory cell that keeps track of information over long periods, and it utilizes gated recurrent units. Gated recurrent units comprise an input gate, a forget gate, and an output gate. These gates control what information is passed on to the cell state and the output. Since their introduction, LSTMs have been used in many research areas and they have achieved state-of-the-art performance in many tasks. For example, they have been used in natural language processing (NLP) tasks such as machine translation and language modeling, as well as in computer vision tasks such as image captioning and object detection. They have also been used for robotics tasks such as robot arm control and navigation. On classification from remote sensing data, LSTM networks can learn the timesteps in a remote sensing data sequence and classify the data points. This type of task is often referred to as temporal classification. LSTM networks can outperform traditional methods, such as Support Vector Machines (SVMs) because they can learn long-term temporal dependencies.
One of the significant advantages of using LSTM networks for temporal classification is that they require less data preprocessing than traditional methods, making them more suitable for remote sensing data. Additionally, the long-term temporal dependencies learned by LSTM networks can help to reduce false positives in classification results. The main disadvantage of using LSTM networks for temporal classification is that they are more computationally expensive than traditional methods. Additionally, they can be prone to overfitting, especially when a limited amount of training data are available.
2.4.4. Recognition Methodology
As mentioned above, crop recognition was performed using three algorithms, and for each, hyperparameter optimization and cross-validation were performed. For this purpose, tenfold cross-validation with a hyperparameter search was used. This procedure was performed using the means of the scikit-learn library. In the first stage, the whole dataset was randomly divided into three parts, as follows: training, validation, and test datasets. These parts were based on a ratio of 70%, 20%, and 10%, respectively. The following set of parameters was used for the MLP method: number of neurons in the hidden layer equal to 100; ReLU activation function for the hidden layer; Adam solver for weight optimization; 0.9 for estimates of the first-moment vector; and 0.999 for estimates of the second-moment vector.
For the Random Forest method, the parameters of the classifiers were chosen, as follows: 500 trees in the forest; seven was the maximum depth of the tree; the minimum number of samples required to split an internal node was 10; and the minimum number of samples required to be a leaf node was 10.
In our study, we also employed recurrent neural network (RNN) architecture with long short-term memory (LSTM) units to capture the temporal dependencies in our data. The LSTM model was designed to avoid the vanishing gradient problem that is commonly encountered when training standard RNNs.
The input to the model was a sequence of data with a shape defined by the number of timesteps and the dimensionality of the data at each timestep (input_shape = (timesteps, data_dim)). The first layer of the model was an LSTM layer with 32 hidden units that returns the full sequence of hidden states for each timestep. This layer was followed by a second LSTM layer with 64 hidden units, which returned the full sequence of hidden states.
A unique feature of our model is the inclusion of a ‘skip’ connection, a concept popularized by ResNet models. This skip connection was implemented as a third LSTM layer with 64 hidden units that took the output of the first LSTM layer as input. The output of this layer was then combined with the output of a fourth and fifth LSTM layer (both with 64 hidden units) using an addition operation.
The combined output was fed into a sixth LSTM layer with 256 hidden units. However, unlike the previous LSTM layers, this layer only returned the last hidden state for each sample. This output was then passed through three fully connected (Dense) layers with 512, 128, and ncls units, respectively. The first two Dense layers used the ReLU activation function, whereas the last layer used the softmax activation function to output class probabilities.
The Keras functional API defines the model by specifying the input and output, as follows: model = Model(input, x). This architecture allowed the model to learn both short-term and long-term dependencies in the data, making it suitable for time series classification tasks and other tasks where temporal dependencies are essential.
2.5. Accuracy Assessment
The classification quality was assessed using two methods. The first method involved a pixel-by-pixel comparison of the classification results with the training sample, conducted with cross-validation. The pixel-by-pixel comparison was carried out based on the contingency matrices and the calculation of such classification quality metrics as Recall, Precision, F1-score, and Accuracy.
Recall (TPR) is the true positive rate; in other words, the proportion of true positives correctly classified. Precision (PPV) is the positive predictive value; in other words, the proportion of correct positive predictions. False Negatives (FN) occurred when our model incorrectly classified a pixel belonging to a certain crop type as not belonging to that crop type. Conversely, False Positives (FP) were instances wherein our model incorrectly identified a pixel belonging to a certain crop type when it did not. In our multiclass classification task, we extended the concepts of False Positives (FP) and False Negatives (FN) so that they could be applied outside of their traditional binary classification context. For each crop type, a False Positive indicated an instance where our model incorrectly predicted a pixel as belonging to that crop type when it belonged to a different one. Conversely, a False Negative was an instance where our model incorrectly predicted a pixel as not belonging to a certain crop type when it did. These metrics were calculated individually for each crop type in a ‘One-vs.-All’ manner, and then, they were averaged to provide a single measure of our model’s performance. This approach allowed us to understand not only the overall accuracy of our model, but also its performance concerning each crop type. The choice of averaging method, either ‘micro’ or ‘macro’ was determined based on whether we wanted each class (macro-averaging), or each instance (micro-averaging), to have equal weight when calculating these metrics. This nuanced evaluation helped us to identify specific areas where the model’s performance could be improved. The F1-score measures a model’s overall performance, calculated as the harmonic mean of precision and recall. Accuracy is the fraction of correct classifications.
The second method involved comparing the model (obtained from the results of crop recognition) and the real areas (presented in the state statistics database) of arable land occupied by a particular crop in spatial units such as municipal districts. For each crop, statistical characteristics concerning errors (differences between real and modelled areas occupied by crops in the districts) were calculated, as follows: mean error (ME), mean absolute error (MAE), root mean square error (RMSE), weighted average percentage error (WAPE), standard error of estimation (SE), and median error (MdE). These metrics are robust in terms of outliers.
The mean error (ME) is the average of the absolute values of the differences between the real and model areas. The mean absolute error (MAE) is the average of the absolute values of the differences between the real and model areas without regard to sign. The root mean square error (RMSE) is the square root of the average squared differences between the real and model areas. The weighted average percentage error (WAPE) is the average of the percentage errors weighted by the real areas. The standard error of estimation (SE) is the standard deviation of the errors. The median error (MdE) is the median of the errors.
where N is the number of municipal districts,
and
are the real and model areas of arable land occupied by a specific crop in the i-th district.
3. Results
Pixel classification, with the allocation of ten target crop classes (winter cereals, spring cereals, legumes, buckwheat, corn, sunflower, sugar beet, potatoes, perennial grasses, and fallow land), was implemented using three methods—MLP, RF, and LSTM. NDVI and EVI (MODIS) time series data for 2015 and 2016 were analyzed. For each year, the prepared composite of multi-temporal MODIS data with 60 layers (30 NDVI and 30 EVI layers) and the training data set were fed into the algorithms as input. In total, two MODIS scenarios concerning the Territories–Analogues, and 7 MODIS scenarios from the study area of the EPR were analyzed.
For each of the two territories (Territories–Analogues and the study area), the results of the recognition are presented in the form of six raster layers, the arable land pixels of which are assigned the codes of the classes of the recognized crop; these were obtained via three methods (MLP, RF, and LSTM), and for two years, respectively. For example,
Figure 4 shows some of the classification results from 2015, which concern the study area, and which used the LSTM method.
The quality of interpretation was assessed by comparing the classification results for the Territories–Analogues, pixel-by-pixel, with the training samples. A total of six classification results were obtained, covering two years (2015 and 2016), and using three methods (MLP, RF, LSTM) for each year. The comparison was conducted using cross-validation.
The recognition quality metrics (1)–(4) for each method were calculated based on contingency matrices and summarized over two years. The results are shown in
Table 3. In this table, the “weighted avg.” indicator represents the average quality indicators and it considers the representation of crops in the territory.
The second approach to assess the quality of crop recognition was applied to the results obtained in the study area. Only the results obtained using the LSTM method were considered, as it showed noticeably better results in the Territories–analogues. The recognition results were two classification rasters, covering the two years under consideration (2015 and 2016) in the study area.
To assess the quality of crop recognition, real (ground) data (represented in the state statistics database) and model data (obtained from the recognition results) were compared in terms of spatial units, such as municipalities. Open data from the Federal State Statistics Service (Rosstat) were used, specifically, the Database of Indicators of Municipalities [
66]. Queries in this database were used to obtain information concerning the areas of arable land occupied by different crops in the municipal districts of the subjects of the Russian Federation. This information was then geocoded using the vector layer of municipal districts in the study area, which included 1120 districts with arable land. The resulting classification raster was sampled on the vector layer of districts. The model areas were calculated for each of the 1120 districts (i.e., the areas occupied by recognizable crops, according to the interpretation results). This procedure was repeated for each of the two years under consideration.
The consistency of the crop recognition results was verified by comparing the values of the areas against different crops estimated from the recognition results, and the Rosstat data concerning the areas of arable land occupied by these crops in the 1120 municipal districts. The comparison was limited to significant crop groups as per the state statistics (Winter cereals, Spring cereals, Legumes). The statistical characteristics (5)–(10) of the errors (differences between the real and modelled areas of crops in the districts), summarized over the two years under consideration, are given in
Table 4.
Figure 5 shows the frequency histograms of these errors.
The thematic maps of
Figure 6 show the spatial distribution of the real (based on Rosstat data) and modelled (predicted) values of the area, using one of the crops in the municipal districts of the study area. These maps provide a visual assessment of the quality of interpretation, and they reveal the spatial specifics of the agreement between the modelled and real values.
4. Discussion
The recognition quality evaluation showed that the LSTM model outperformed the RF and MLP models in recognition of ten crop classes. The overall accuracy of the correct recognition of ten crop classes using the LSTM method reaches 0.937 on the training set with cross-validation. In contrast, the RF method demonstrates an accuracy of 0.897 and an MLP of −0.401.
The F1-score, a metric that evaluates the accuracy and sensitivity of the method, further confirms the superiority of the LSTM model with a score of 0.941 compared with 0.876 for the RF method and 0.458 for the MLP method. The MLP method, as a whole, showed the weakest results. The RF method recognized crop classes that were abundantly represented in the Territories–analogues (spring cereals, perennial grasses) well, but it did not recognize crops that were poorly represented (buckwheat, corn, sunflower, sugar beets, potatoes) well, despite attempts to balance the sample. This was due to insufficient efforts to balance the training data in the implementation of the RF algorithm, and training with rebalancing could improve the quality of this model.
The LSTM method demonstrated excellent recognition quality with high precision and recall values for crops. Its recall value of more than 0.95 indicates a correct recognition of more than 95% with regard to the pixels occupied by these crops; conversely, its precision value of more than 0.96 means that more than 96% of the pixels in the class correspond with the target crop. The LSTM method also performed well for rare crops, with high recall values and lower precision values, indicating correct recognition of almost all pixels occupied by these crops, but it also flagged many false positive pixels.
The advantage of the LSTM method lies in its ability to analyze sequences of data and time series data, such as NDVI and EVI, making it an ideal choice for classification, processing, and prediction based on time series data. The LSTM method is also relatively robust with regard to gaps in the time series, making it a compelling option for sequence analysis.
The LSTM method was validated on the arable land of the study area by comparing its detection results with state statistical data from 1120 EPR municipal districts for the main crops of winter cereals, spring cereals, and legumes. The comparison showed a high degree of agreement between the actual and predicted values of areas with regard to the different crops in the regions. The LSTM method slightly underestimated the presence of spring cereals and overestimated the presence of legumes.
The effectiveness of the LSTM in capturing patterns in time series data for crop detection has been demonstrated in numerous case studies worldwide. For example, studies in India [
67,
68] and Brazil [
69] used LSTM to classify different crop types, and they achieved better results than other machine learning algorithms. Similarly, a study in the United States combined remote sensing data with deep learning algorithms, including LSTMs, to achieve high accuracy in crop mapping [
70].
Although the landscape and climatic conditions in Canada may differ from those in the European part of Russia, the approach of using analog data from similar landscapes and climatic conditions has proven effective for remote sensing studies. The similarities between the territories in Canada and the study area in the European part of Russia was considered regarding vegetation types, topography, climate, and land use patterns.
The two key agrarian regions are Canada and European Russia, located in roughly the same latitudinal belts (40 to 60 degrees north latitude).
The Canadian agricultural belt is in a temperate climate zone with fairly mild winters and warm summers; the temperature ranges from −10 to 30 °C, depending on the season and region. In comparison, the European part of Russia has a moderately continental climate, with colder winters and warmer but shorter summers. Temperatures in this region range from −20 to 30 °C. The soil cover in both regions is characterized by fertile soils, which are typical of a temperate climate. Chernozems and Greyzems are common in Canada, whereas in European Russia, Chernozems, Podzols, and Greyzems are common.
Vegetation in both regions includes temperate broadleaved and coniferous forests, steppes, and prairies, which are ideal for agricultural use.
Canada and European Russia grow many of the same crops, such as wheat, barley, and corn. In both regions, wheat and barley are usually sown in spring (April–May) and harvested in late summer or early fall (August–September).
Additional crops, such as corn in Canada and sugar beets and sunflowers in Russia, are also usually sown in spring and harvested in early or mid-autumn.
Agroecological conditions in Canada and European Russia also have many similarities. Both regions have fertile soils suitable for temperate climates, and both are located in temperate climate zones, which are ideal conditions for many crops. Comparing the vegetation time series of the main crops of both territories involves analyzing the growth and development of crops in different periods, in accordance with climatic and soil conditions.
Based on the data presented, we found similarities between Canada and the European part of Russia in terms of seasonal patterns, similarity of crops, and climatic and soil conditions. These factors provide a basis for comparing the vegetation time series of the main crops of both territories. However, it is worth remembering that there are also differences in terms of agroecological conditions and agrotechnical practices, which may affect the vegetation indices and their interpretation. The latter is challenging to consider when using the MODIS RS data with a resolution of 250 m, therefore, we neglected these parameters in this stage of the study.
The NDVI and EVI time series were constructed to reflect the seasonal dynamics of these indices, and the constructed NDVI and EVI time series were then fed into the recognition algorithm for training purposes. Therefore, the classification of crop types in the European part of Russia is based on the analysis of the NDVI and EVI time series’ seasonal dynamics and the crop type information obtained from the Canadian data, rather than on the physical properties of the study area. Canadian data can be used as a form of transfer learning, where knowledge gained from one domain (in this case, Canadian crop types) is applied to another related domain (crop types in the European part of Russia). As far as we know, similar approaches have yet to be used, distinguishing this work in terms of novelty.
The TerraNorte RLC-2016 product provides valuable insight into the distribution and characteristics of different land cover types within the EPR, including arable lands. By using TerraNorte RLC-2016 to identify and mask the arable lands, the study focused specifically on agricultural land use and excluded non-agricultural land uses from the analysis. The study conducted its classification based on the land cover types identified by TerraNorte RLC-2016 was used to mask the areas occupied by arable land. For example, the study may have identified the pixels in TerraNorte RLC-2016 that correspond with arable land cover types (such as cropland or pasture) and then masked those pixels to exclude them from further analysis. This approach allows for significant computing optimization when analyzing big data over vast areas and the possibility of scaling the study results.
The recognition quality assessment highlights the potential of the LSTM method for crop mapping and the importance of using time-series data in remote sensing to improve accuracy. The superior overall accuracy and F1 score of the LSTM model, and the high correlation between the modelled and real values, demonstrate its reliability as a promising approach for crop mapping in different regions. Further comparative studies with different crops and regions are needed to validate the results and assess the global potential of the LSTM method.
The results of crop identification aggregated in the districts are in average agreement with the Rosstat data for the European part of Russia. Nevertheless, the quality of recognition in the analogous territories is relatively high. Using the LSTM algorithm with training data specific to the study area may further improve the agreement between the model’s results and the Rosstat data. Although more than recognition accuracy may be required for the reliable estimation of crop presence in specific locations, it is acceptable for solving regional scale tasks. The crop detection results were used to study soil erosion processes, a critical issue in nature management. Incorporating the results into the Universal Soil Loss Equation (USLE) model, the intensity of soil erosion loss in the European part of Russia could be estimated. An essential factor in the USLE model is the C-factor, a ratio comparing soil loss from land, using a given crop, with the corresponding loss from fallow land. The C-factor depends on the soil-conserving properties of the crop, which vary throughout the season based on crop growth stages and agronomic practices [
71,
72]. The resulting crop maps were used to calculate the C-factor for each month of a particular year from 2014 to 2019. A two-step process was used to estimate the C-factor. First, MODIS remote sensing data for a given year were used to automatically identify the crops grown on the cropland. Then, the C-factor in the pixels of the cropland on a given date was estimated based on the values of the soil conservation coefficients of the identified crops and the dynamics of these coefficients during the warm part of the year.
The average C-factor value for the European part of Russia was 0.401; for the forest zone of the landscape, it was −0.262; for the forest-steppe zone, it was −0.362; and for the steppe zone, it was −0.454. The obtained results correlate well with the results of previous field studies, and they provide current (based on 2014–2019 data) C-factor estimates for rainfall erosion (monthly, annual) with high spatial detail (250 m) [
14].
This innovative approach to soil erosion estimation could provide valuable insights into the complex interactions between crops, vegetation, and soil loss, and it has the potential to inform conservation management strategies and decision-making processes [
73].
The intricacies of crop mapping and agricultural research are deeply intertwined with the methodologies and tools employed [
74]. Our study, focused on the European part of Russia, has illuminated a spectrum of challenges and potential pathways for enhancement. A primary constraint was the resolution limitation of the MODIS images, which, at 250 m [
74], is suitable for regional analyses, but it falls short for detailed, large-scale studies. Such a resolution might inadvertently overlook micro-level variations in crop patterns, which are essential for nuanced agricultural decision-making.
Another significant challenge was the absence of comprehensive field data, making it difficult to verify the pixel-by-pixel accuracy of the classification [
75]. Although our findings show a commendable correlation with statistical data concerning cultivated areas, such indirect measures cannot substitute the precision and authenticity of direct field verification [
76]. Additionally, the study’s methodology might not be adept at capturing swift temporal changes, potentially missing out on insights into seasonal variations and short-term agricultural shifts.
Regarding future research, there is a palpable need for integrating higher resolution satellite images, like those from Landsat and Sentinel-2. Such images can offer a richer, more detailed view of the landscape, capturing nuances that might have been previously missed. Ground-truthing, or field verification, remains a cornerstone for remote sensing research, and future endeavors in this domain should prioritize this aspect [
75]. This ensures a harmonious alignment between satellite-derived findings and on-ground realities.
The Asian part of Russia, with its rich agricultural heritage, presents a promising avenue for future research [
77]. Extending the methodologies and insights from this study to that region can offer a comprehensive understanding of Russia’s agricultural dynamics. Moreover, the rapid advancements in machine learning and artificial intelligence present a plethora of opportunities. By integrating these techniques, we can refine classification accuracy, anticipate future trends, and delve deeper into the socio-economic implications of observed crop patterns. Collaborative research, involving farmers, agricultural experts, and regional authorities, can further enrich the study, offering a multi-dimensional perspective [
78].
In conclusion, although our study has charted significant progress in terms of understanding crop patterns in the European part of Russia, the road ahead is filled with opportunities [
76]. By addressing current limitations and leveraging advanced technologies and collaborative insights, we can set the stage for a more informed and sustainable agricultural future [
78].