1. Introduction
Excessive soil erosion is a significant environmental issue that is particularly acute within agricultural landscapes. The topsoil losses within agricultural regions have been shown to be extensive, reducing soil productivity and resulting in severe economic impacts [
1,
2]. Sediment and nutrient runoff also have a negative impact on the downstream infrastructure [
3] and ecology [
4] that can be costly to remediate. Consequently, understanding the magnitude and patterns of soil erosion within agriculture landscapes is an important pursuit that will positively impact society and the environment.
Due to the complex interactions that govern sediment erosion and transport [
5], monitoring erosion within large regions is challenging, and models that enable this require a variety of biophysical input parameters [
6]. One model that has had success in this regard is the Daily Erosion Project (DEP), which generates daily estimates of precipitation-related soil loss for over 6000 small watersheds (USGS Hydrologic Unit Code 12) across the Midwestern United States [
7]. The DEP accomplishes this via an implementation of the Water and Erosion Prediction Project (WEPP) [
8] model that utilizes radar-derived precipitation data and additional weather data, LiDAR-derived terrain data, field-level crop rotation data [
9], and SSURGO soil property data [
10] to ascertain most of the input parameters needed for the WEPP model. While the WEPP input parameters that are sourced from these data are relatively well constrained, information regarding the crop residue cover conditions of agricultural fields has proven to be difficult to obtain while maintaining accuracy, resolution, and coverage [
11]. The DEP implementation of the WEPP infers one of six tillage regimes and estimates time-based changes in residue cover, soil bulk density, soil roughness, etc., based on the minimum observed residue at one or more time periods during the tillage calendar [
8]. It is thus understood that the residue cover estimates within the DEP/WEPP may not match the observed residue cover by humans or sensors in some fields due to cloud coverage or model errors, but this was selected as the most practical method of estimating the tillage practices required by the WEPP and similar models like RUSLE2 and SWAT.
The tillage information needed to estimate crop residue cover and/or assign one of the six DEP tillage practices could potentially come from either tillage surveys or crop residue cover measurements. Tillage surveys, which are traditionally conducted by driving a transect across a geographic area such as a county and observing the tillage practices at a defined interval, were routinely conducted by the United States Natural Resource Conservation Service (NRCS) nationally from 1982 through 2004 as part of the Crop Residue Management by the Conservation Technology Information Center (CRM; CTIC) survey and are still conducted in some locations. The field-level data recorded while driving these transects are typically divided into residue-cover- and technique-based tillage classes as defined by the survey methodology (CTIC), such as conventional till (<15% at planting), reduced till (15–30%), and strip till, ridge till, and no till (all > 30%), potentially allowing one to estimate the mean residue cover observed for each tillage class. Alternatively, some surveys classify the field residue cover conditions into a 10% crop residue cover bin (e.g., 0–10%, 10–20% … 90–100%). Since this method does not accommodate the break at 15% that is found in the NRCS and CTIC conventional and reduced tillage system definitions, it is not utilized as often. Both methods are conducted via the “windshield”, meaning the observer does not leave the vehicle, which enables quick collection of data, but these estimates often exhibit bias [
12,
13,
14], which is likely due to the human difficulty in converting viewing angles from oblique to nadir [
13], resulting in recommendations such as holding one or more observers consistent across multiple county surveys [
15]. New techniques to standardize the windshield assessment process are being developed [
16] but are not yet widely applied. Along with these “windshield” methods, in-field methods are also used to estimate crop residue cover. In-field crop residue cover measurements can be collected via a multitude of methods, including tape transects, spiked wheels, and point-count photo analysis surveys [
17]. These methods all have inherent relative advantages and disadvantages but are generally more accurate than the “windshield” methods. However, the in-field techniques tend to achieve lower levels of adoption due to the larger amount of time and effort they take to complete.
The time constraints of observing the residue cover in millions of fields of 500,000+ km
2 necessitate the use remote sensing, which many recent studies have demonstrated the effectiveness of in terms of using single- or multi-band indexes within regression or machine learning models to predict crop residue cover [
18,
19,
20,
21,
22,
23].
To minimize the computational expenses and training data variability associated with increasing geographic extent, a problem not unique to residue cover studies [
24], an important recent development is the creation of cloud-based geospatial analysis platforms such as the Google Earth Engine [
25]. The Google Earth Engine hosts all the major publicly available Earth observation satellite datasets as well as some aerial datasets (e.g., US National Agricultural Imagery Project; NAIP) and provides an optimized computational environment that greatly reduces computation times when compared to running similar analyses on standard desktop machines. The Google Earth Engine introduces some level of image abstraction, which can result in the loss of insight into the processing steps, such as easily identifying which image date within a time range contains specific pixel values. The Google Earth Engine can also help to overcome the difficulties associated with the temporal and spatial variations in soils and crop residue cover over large regions by incorporating remote sensing derivatives that are sensitive to changes in the photosynthetic vegetation and soil moisture content within the analysis or removing problematic data, as was shown before [
21,
26], thus expanding the training data coverage to multiple non-contiguous regions over multiple years.
In this study, our primary objective was to test multiple sources of training data and determine the minimum quality of crop residue fraction that is needed to produce reliable crop residue cover models via the creation and comparison of models trained using residue cover measurements collected using a variety of methods. Our secondary objective was to test the ability of different crop residue cover data sources to identify the endpoints of the residue cover spectrum. Between 2010 and 2017, the authors and cooperators collected crop residue survey data using three different survey techniques (windshield tillage surveys, windshield binned residue surveys, and photo analysis surveys) across three Midwestern US states where corn and soybeans are commonly grown. These survey data were used in conjunction with a Google Earth Engine spatial analysis and a local linear regression analysis to assess whether each survey type is capable of producing reliable regression models. Additionally, we tested the utility of photo analysis survey data that were retroactively binned into 10% residue cover bins. We hope that this study will help future researchers to select the crop residue cover data collection strategy that is most appropriate for their study design when creating remote-sensing-based residue cover models.
2. Materials and Methods
2.1. Data Collection
To investigate the suitability of windshield tillage surveys, windshield binned residue surveys, and photographic analysis surveys as sources of training data for a crop residue cover model, we and our colleagues have collected and aggregated residue cover estimates via various survey methods throughout the Midwestern United States. All survey data were collected during the springtime between 2010 and 2017 within the states of Nebraska, Kansas, and Iowa, USA.
Table 1 provides a summary of collection years and states for each survey dataset and
Figure 1. provides a map of the survey locations. Iowa data were collected using two different survey techniques depending on the year and location surveyed, while Nebraska and Kansas survey data were collected using consistent regional surveys.
The Iowa residue cover surveys were conducted using both windshield binned surveys and photo analysis surveys between 2013 and 2017 within four different survey campaigns. The 2013 windshield binned residue cover survey was conducted by Agren, Inc. (Carroll, IA, USA), an agricultural conservation planning business, during the month of May via observations of residue cover in all fields within the South Fork of the Iowa River, an HUC8 watershed in north central Iowa. Observations were collected by trained personnel and classified into 10% residue category bins (e.g., 0–10%, 10–20%, …, 90–100%). The 2016 binned residue cover survey was conducted by watershed coordinators as part of the Iowa Watershed Approach flood management project. This data collection was coordinated by the Iowa Department of Natural Resources using personnel with similar backgrounds and similar procedures regarding the South Fork data from 2013. Photo analysis surveys were conducted in 2015, 2016, and 2017 after planting (late April through early June) using the reference methods described by Gelder et al. [
19]. All field photos were taken from approximately 1 m directly above the field (
Figure 2), and four to five images were collected for each field at random locations at least 30 m apart throughout the field. Trained observers placed grids over the images on a computer screen and the presence or absence of crop residue was noted for 100 evenly spaced grid intersections. Crop residue cover for each image was defined as the percent of intersections that corresponded to the presence of crop residue, and field crop residue cover value was defined as the mean crop residue cover of all the images from that field.
Nebraska surveys were conducted using the CRM windshield tillage technique and were conducted in late April through early June 2016 using the methods developed by the Nebraska state office of the NRCS. These surveys were conducted via observing the type of tillage found within fields at one-mile intervals along a predefined route. For both the Nebraska and Kansas windshield tillage surveys, the tillage practice present in each field was recorded as conventional tillage (<15% residue coverage), reduced tillage (15–30% residue coverage), conservation tillage (30–50% residue coverage), or no till (>50% residue coverage). This information was collected using a GPS-enabled tablet to assist in accurate geolocation during data entry.
The Kansas surveys were also windshield tillage surveys and were conducted by a team of 3–4 trained personnel who drove a pre-determined route through each county, stopping at intersections and determining the tillage practice present in each field. Approximately 450 fields per county per year were examined, both in fall and again in spring, but only the spring data were used in our analysis.
Before the Kansas and Nebraska survey data could be used to form regression relationships, the tillage categories needed to be converted to numeric crop residue cover values. To facilitate this, the observed tillage type and the Cropland Data Layer (CDL) crop type from the previous year were used to convert tillage practice observations to estimates of spring crop residue cover using the conversions of Bull [
27] (
Table 2). The windshield tillage surveys were performed using these converted crop residue cover values.
In addition to these surveys, the photo analysis survey data were artificially binned into 10% bins (e.g., 0–10%, 10–20%, …, 90–100%) to create a binned photo analysis dataset. This dataset was used to determine if photo analysis residue cover data classified to a coarser resolution could be usable as training data.
2.2. Geospatial Analysis
For the geospatial analysis, residue cover survey data were associated with field polygons on a local computer before they were uploaded to Google Earth Engine, where the satellite imagery analysis was conducted. A tabular version of the results from this analysis was downloaded to a local computer, where the final regression analysis was conducted within a Python 3.4 computational environment. A graphical summary of our analysis workflow can be found in
Figure 3, and all Google Earth Engine/Python code used in this study can be found at the GitHub repository:
https://github.com/dailyerosion/Williams_et_al_2024_residue_cover (accessed 19 August 2024). An in-depth description of our workflow follows.
Residue cover photographic survey data were first recorded as pointwise observations. However, since crop residue cover is likely to vary most significantly at the agricultural field scale, each point was associated with the appropriate agricultural field polygon sourced from the Agricultural Conservation Planning Framework database [
9]. This was conducted so that the analysis could be conducted on the field scale for all analyses since all windshield data were only collected as a field average. In the case of the photo analysis surveys, the mean value of all points within a field was used as the field’s residue cover value. All field polygons were buffered inward by 30 meters to limit the influence of edge effects, then uploaded to Google Earth Engine where the satellite analysis was conducted.
Within Google Earth Engine, time series data from the two most relevant Landsat Earth observation missions (Landsat 7 and 8) were used to calculate spectral indexes that relate to the three key components of residue cover observations: the amount of residue cover, emergent photosynthetic vegetation, and soil moisture. To capture these properties, we utilized the Normalized Difference Tillage Index (NDTI), Normalized Difference Vegetation Index (NDVI), and the Normalized Difference Moisture Index (NDMI). The NDTI focuses primarily on soil surface differences. This is essential for calculating the residue cover percentages. The NDVI and NDMI are primarily included and used for filtering out different pixels for being too green or too wet. Those two ensure that the final pixels being reviewed and analyzed are the most likely to have the residue that is being searched for. The equations used to calculate these indexes can be found in
Table 3. Using the pixel quality bands that are a component of the Landsat 7 and Landsat 8 data, bitwise masks were used to remove pixels containing clouds or snow prior to analysis. For Landsat 7, bits 3, 5, and 7 were used, and for Landsat 8 bits 3 and 5 were used. Once clouds and snow were removed, the Landsat 7 and 8 surface reflectance datasets were harmonized using the methodology of Roy et al. [
28], and the resulting dataset was used to calculate our spectral indices. Both Landsat 7 and 8 mission data have ground sampling distances (GSDs) of 30 × 30 m.
To capture all possible images that could correspond to the time of spring planting, when the DEP model expects minimum residue cover to be measured [
7], all images that were collected between 1 March and 15 June each year were included. On average, this results in 6 cloud-free images for Landsat 7 and 8 combined, but this varied based on year and location (
Table 4). It was not possible for us to determine the time of planting at each pixel, so instead we used the minimum NDTI value of each year’s image stack to produce the NDTI value used in our regression analysis. The minimum NDTI was selected based on previous work by Gelder et al. [
19] and Zheng et al. [
14], which both showed that using the minimum value of the image stack accurately corresponds to the residue cover at the time of planting.
Similarly to Beeson et al. [
29], we controlled for extreme NDTI values by removing NDTI values associated with high levels of emergent vegetation and/or excess soil moisture. In keeping with Beeson et al., we used an NDVI threshold to remove pixels overly influenced by emergent vegetation, but unlike their study we used an NDMI threshold instead of NEXRAD rainfall data to remove pixels with high soil moisture. While this is a slightly different metric, we expect that our NDMI threshold approach will behave similarly to their NEXRAD approach and has the advantage of exclusively relying on Landsat data.
To determine the most appropriate threshold values, we tested different NDVI/NDMI thresholds and determined the optimal threshold values based on model performance and data coverage. We calculated the average NDVI and NDMI index values within each field for each Landsat image available within our sensing period. We then used these values to create distributions for each index (
Figure 4 and
Figure 5). Using these distributions, we assessed the utility of using the 50%, 60%, 70%, 80%, and 90% quantile values for the NDVI/NDMI index threshold values. We decided that using an 80% quantile threshold provided the best balance between coverage and data quality, and we used this threshold value for both NDVI and NDMI. These 80% threshold values were 0.129 and −0.081 for NDMI and NDVI, respectively. In addition to using these thresholds to discard low-quality data, any raster cells that did not have data available from three valid Landsat images after thresholding were also discarded, ensuring that the minimum NDTI calculation is based on at least three image captures.
The use of SSURGO soil survey data [
30] as a regression factor was considered but not implemented due to multiple reasons. Soil color has been shown to have an impact on residue cover determination [
20,
31], but SSURGO data are not currently available as a Google Earth Engine dataset, and others have shown that SSURGO texture data may not improve model performance in a remote sensing context [
29]. Additionally, SSURGO mapping units were designed with minimum areas of 1–5 ha, and complexes (mixed soil series) are mapped in areas where soil series are not readily separated [
32]; thus, they were not designed to be accurate at the GSD of current Landsat images.
Once time series values were calculated at the pixel level, they were then aggregated to the field level by taking the median summarized index value within the inward-buffered field polygon. Aggregating to the field level by taking the minimum field index value was not considered because this would likely include data from in-field depressions whose high soil water content biases NDTI and NDVI values [
19,
20] and mean values would likely include high-residue-cover waterways or other conservation practices. In addition to the indexes described above, the modal Cropland Data Layer (CDL) value for each field during the previous year was used to assign a crop type to each field.
2.3. Regression Analysis
While other crop types were present in small numbers within our training data, we chose to focus our analysis on corn (
Zea mays) and soybean (
Glycine max) fields, which composed 52% and 42% of our survey data, respectively. The remaining 6% of our data, which was composed of 17 different crop types, was discarded. Even though corn and soybean crop residues have different structures (
Figure 2), we found little difference in their NDTI response for a given crop residue cover percentage in initial trials, and thus performed the regression for both crops at once. This may be due to the fact that long-term low-tillage-intensity fields in corn–soybean rotations have significant corn residue even in the year directly after soybeans.
To assess the quality of the data produced by the various survey techniques, we conducted four separate linear regression analyses between each of the four survey types used and the satellite-data-derived NDTI values. We then used the quality of these regressions to determine the utility of using each survey type as training data for a remote-sensing-based residue cover model. We assessed model fit for all regressions via the coefficient of determination (r2).
We are also interested in the feasibility of using each type of training data for discriminating between the end-member tillage types (e.g., differentiating between no till and conventional till, corresponding to maximal or minimal residue cover) and thus performed a secondary analysis to assess each survey type’s utility in this context. For this analysis, we removed data not associated with either no till or conventional till fields from each dataset, then compared the distributions of each dataset’s NDTI values between the two remaining tillage types. For the windshield tillage dataset, the original tillage type classifications were used to distinguish between no till and conventional till fields. For the remaining datasets, any field with less than 10% observed residue cover was classified as a conventional till field and any field with greater than 55% observed residue cover was classified as a no till field. The resulting distributions were assessed based on their degree of separation using the Kolmogorov–Smirnov test, a non-parametric test of the dissimilarity between two distributions.
4. Discussion
4.1. Recommended Use Cases of “Windshield” Survey Datasets
While windshield tillage and windshield binned residue surveys are widely conducted, as evidenced by the large number of observations we were able to assemble, our analysis shows that they perform poorly when used to predict crop residue cover as a continuous variable. This is disappointing because datasets such as these are widely available throughout the United States, have existed for many years, and thus could be a valuable training dataset. Both windshield survey datasets had high levels of variability, which led to training data that were distributed over the entire range of predicted crop residue cover values for a single tillage class or residue cover bin, making it difficult for the models to identify the relationship observed in the photo analysis dataset.
Although neither windshield dataset produced reliable regressions, the distributions in
Figure 7 show that binned residue data can generally be grouped into “low intensity” and “high intensity” tillage categories that have distinct minimum NDTI value distributions, although the photo analysis data have less overlap, which we hypothesize is due to the improved observation angle [
13]. The regression results were not unexpected because they are likely due to the inherently inaccurate data conversion process ([
27]; Bull et al.) between categorical information and quantitative data. This separability of low/high-intensity tillage suggests that windshield data may likely be useful in an end-member tillage classification approach if the quality can be maintained. We do not believe that this contradicts the findings by others [
13,
22] who found that some high-quality tillage class training datasets could be used to develop models that accurately estimate the tillage classes on validation fields. This finding is particularly relevant to crop residue cover approaches that employ spectral unmixing or otherwise require information on the end-member characteristics of tilled fields [
29,
33]. In this context, windshield binned residue surveys should be able to reliably represent the end-member characteristics of high-tillage and low-tillage fields without the need to employ more rigorous survey techniques. However, actual residue cover values, and not the standard three or four tillage classes, are preferred by the DEP due to the improved ability to discriminate between the six DEP tillage classes that may only differ by a few percent in terms of the residue cover at planting in the more intense tillage systems.
4.2. Recommended Use Cases of Photo Analysis Survey Datasets
As expected, the photo analysis survey datasets produced the most reliable models of crop residue cover. Unlike the windshield methodologies, the photo analysis survey observes the field from above, samples multiple portions of a given field, and quantitatively assesses the amount of crop residue cover in each location. All these factors combine to create a more accurate representation of in-field crop residue cover, and, consequently, this technique returned the best correlations with the remote sensing estimates. As noted in the Introduction, however, this is also the most labor-intensive technique, so the number of data points collected to establish the crop residue models will also need to be balanced against the time and monetary expenses of this approach. Alternative photo collection and photo analysis techniques that maintain the same level of data quality but decrease the cost of collection will likely need to be developed before we observe the widespread adoption of this survey technique.
In contrast to the windshield data, the retroactively binned photo analysis survey produced a regression that performed almost equally well compared to the photo analysis survey data. This finding suggests two things. First, the issues found within the windshield binned residue data do not inherently result from binning data, and efforts to improve the accuracy of windshield binned residue surveys [
16] may make them suitable for use within a regression framework. Second, those photo analysis techniques that increase the percent crop residue cover speed for a given image that also sacrifice a small amount of accuracy may be worth pursuing. For example, in our analysis of photos, we recorded the presence or absence of crop residue in 100 points per image. Based on these results, it is likely that it would be possible to reduce the number of points observed per image significantly without reducing the efficacy of the resulting crop residue cover model, which would in turn enable us to collect more training data in the same amount of time. Additionally, machine vision approaches [
34] that can reliably classify residue cover images into 10% bins could offer a way to quickly collect large quantities of reliable training data.
4.3. Model Limitations
The best-performing model was able to produce linear regression with an r
2 of 0.56, which resulted in residue cover maps that correlate with the known variances in crop residue cover percentages across Iowa. While this article covers the effects of the input survey data on the resulting model, the quality and quantity of the available remote sensing data are also important factors that influence these models. For example, it is well documented that local variations in confounding factors such as soil moisture, emergent vegetation, and soil type/color can have a significant impact on the accuracy of the resulting remote sensing models [
35,
36,
37], which could explain the reduced correlation between the NDTI and residue cover below 30% in
Figure 6; thus, accuracy should be assessed when estimating residue cover on soil orders not sampled in this study. The NDTI values in
Figure 7 across all the survey types indicate similar minimums and maximums across the Iowa, Kansas, and Nebraska soils, indicating that the regressions across the entire region are generally valid but may have decreased accuracy when compared to locally derived regressions. Generally, we succeeded in reducing the impact of the two most important confounding factors (soil moisture and emergent vegetation) for estimating residue cover; however, better characterization of these factors would likely result in model improvements.
In addition, since we are using minimum spring NDTI values to predict crop residue cover and requiring a minimum of three valid images, we are limited by the number of remote sensing observations we have in a given location. As shown in
Table 4, we have on average five valid image dates between 1 March and 15 June; thus, some locations will not have enough observations in a given year to generate an estimate. To overcome this, several possibilities exist. First, we could incorporate data from a wider variety of satellite platforms. In particular, the Harmonized Landsat Sentinel-2 dataset is promising for our work, and we look forward to the full development of the Sentinel 2 component within the Google Earth Engine. Second, we could utilize estimates from prior or subsequent years and assume similar practices existed in those years. Third, the use of alternative satellite technologies such as Synthetic Aperture Radar (SAR) [
23] singly, or in conjunction with optical remote sensing, could expand the image opportunities during periods of cloud cover. In particular, the upcoming launch of the NISAR SAR satellite may provide interesting possibilities since it will be collecting fully polarimetric SAR data over our study region.
Although generally accurate in terms of determining the mean field residue cover, we observe in
Figure 9, a detail of
Figure 8 in central Hamilton County, Iowa, that the current model overestimates the range of field-level residue variation. The field to the south of the farmstead has significant soil organic and soil moisture content variation, and the model overestimates the variation in residue cover. The model-estimated field ranges (maximum–minimum) were roughly 30–40% Y, and the observed field range was 20%. We believe this is due to the linear regression model minimizing the errors for the mean field residue cover versus mean NDTI. We theorize that future research could improve the regressions by considering the soil property variation within each field and how that soil variation modifies the NDTI response to residue cover. This is likely to improve the low-residue-cover performance of the model as the soil background dominates the reflected spectra instead of the more spatially uniform residue signature.
4.4. Potential Cost-Effective Survey Designs
The major limitation of photo analysis surveys is the cost of transporting the survey staff to a diverse set of geographic locations to perform surveys and the cost of performing the point count photo analyses for the collected survey images. However, there may be ways to reduce both costs to the point that it is efficient to conduct photo analysis surveys at broad scales. First, the prevalence of GPS-enabled smartphones with high-quality camera systems means there are essentially no additional equipment costs associated with performing these surveys. With minimal training that could consist of a single instructional video, conservation groups or citizen scientists could be trained and asked to perform these surveys within their local area. They could then send these survey photos back to a centralized organization such as a university or conservation organization where the point-count surveys could be centrally performed. This approach could dramatically reduce the cost of obtaining raw survey photos and result in the collection of much more data.
In addition, our retroactively binned photo analysis results suggest that it may be reasonable to employ photo analysis techniques that speed up the process of measuring the crop residue cover percentages in survey photos, even if they sacrifice some accuracy. In the past few years, the ability of machine learning models to perform these types of analyses has improved rapidly [
34], and using acceptable models would dramatically increase the speed of the photo analysis process. Thus, by combining distributed partner-led survey campaigns and the utilization of machine learning models for survey photo analysis, the efficiency of these photo surveys could approach the efficiency of the conventional windshield surveys while providing much better data.