Article

Gap Filling Cloudy Sentinel-2 NDVI and NDWI Pixels with Multi-Frequency Denoised C-Band and L-Band Synthetic Aperture Radar (SAR), Texture, and Shallow Learning Techniques

by
Kristofer Lasko
Geospatial Research Laboratory, Engineer Research and Development Center, Alexandria, VA 22315, USA
Remote Sens. 2022, 14(17), 4221; https://doi.org/10.3390/rs14174221
Submission received: 28 June 2022 / Revised: 2 August 2022 / Accepted: 24 August 2022 / Published: 27 August 2022
(This article belongs to the Section Remote Sensing Image Processing)

Abstract:
Multispectral imagery provides unprecedented information on Earth system processes; however, data gaps due to clouds and shadows are a major limitation. The Normalized-Difference Vegetation Index (NDVI) and Normalized-Difference Water Index (NDWI) are two spectral indexes employed for monitoring vegetation phenology, land-cover change and more. Synthetic Aperture Radar (SAR), with its cloud-penetrating ability, can fill data gaps using coincident imagery. In this study, we evaluated C-band Sentinel-1, L-band Uninhabited Aerial Vehicle SAR (UAVSAR) and texture for gap filling using efficient machine learning regression algorithms across three seasons. Multiple models were evaluated, including Support Vector Machine, Random Forest, Gradient Boosted Trees and an ensemble of models. The gap-filling ability of SAR was evaluated with Sentinel-2 imagery acquired on the same date as both SAR sensors in September, as well as 3 days and 8 days later. Sentinel-1 and Sentinel-2 imagery from the winter and spring seasons were also evaluated. Because SAR imagery contains noise, we compared two robust de-noising methods and evaluated their performance against a refined Lee speckle filter. Mean Absolute Error (MAE) rates of the cloud gap-filling model were assessed across different dataset combinations and land covers. The results indicated that de-noised Sentinel-1 SAR and UAVSAR with GLCM texture provided the highest predictive ability, with random forest R2 = 0.91 (±0.014), MAE = 0.078 (±0.003) (NDWI) and R2 = 0.868 (±0.015), MAE = 0.094 (±0.003) (NDVI) during September. The highest errors were observed over bare ground and forest, while the lowest errors were over herbaceous and woody wetland. Results for the January and June imagery, without UAVSAR, were weaker at R2 = 0.60 (±0.036), MAE = 0.211 (±0.005) (NDVI) and R2 = 0.61 (±0.043), MAE = 0.209 (±0.005) (NDWI) for January, and R2 = 0.72 (±0.018), MAE = 0.142 (±0.004) (NDVI) and R2 = 0.77 (±0.022), MAE = 0.125 (±0.004) (NDWI) for June.
Ultimately, the results suggest that de-noised C-band SAR with texture metrics can accurately predict NDVI and NDWI for gap filling cloudy pixels during most seasons. These shallow machine learning models are trained and applied far more rapidly than computationally intensive deep learning or time-series methods.

1. Introduction

Multispectral satellite remote sensing, with its consistent repeat-pass observations, has been routinely used for characterizing and monitoring the land surface over long time periods. Such studies include monitoring forest disturbance and recovery [1,2], decadal-scale forest cover change [3,4], continental-scale cropland extent [5], continuous time-series based land cover change monitoring [6,7] and many other large-scale studies. More recently, land surface studies have included multi-sensor fusion to mitigate temporal and data gap limitations through the integrated use of Sentinel-2 and Landsat 8 moderate resolution imagery with data provided through the Harmonized Landsat and Sentinel-2 Surface Reflectance Dataset (HLS) [8,9]. The improved temporal resolution provided by HLS has enabled multiple ecosystem studies to better monitor change dynamics that are time sensitive. For example, one study monitored time-sensitive central U.S. grassland landscape dynamics with HLS because these ecosystems respond rapidly to land disturbances and weather, which ultimately impact nutrient cycling, greenhouse gas emissions and other ecosystem processes [10]. The combined use of disparate multispectral datasets can result in drastic improvements with local satellite revisit times under 7 days in many instances as previously mentioned. Unfortunately, these quick revisit times can lose utility in cases where cloud cover consistently obscures pixels.
Many regions of the world experience cloud cover for extended periods of time, which negatively impacts the ability of multispectral satellites to monitor the Earth's surface. For example, a study showed that much of Eastern North America experiences cloud cover in over 60% of daily Moderate Resolution Imaging Spectroradiometer (MODIS) satellite observations [11]. Moreover, equatorial areas of South America, Africa and Southeast Asia including Brazil, the Democratic Republic of the Congo and Indonesia experience cloud cover over 75% of the time [11]. Large swaths of boreal and northern latitude areas also experience similar cloudy conditions, thereby significantly restricting multispectral land surface observations. Moreover, areas of Southeast Asia such as Vietnam experience an average maximum of 20–25 continuous days of smoke or cloud cover during March, resulting in high temporal uncertainty for burned area mapping [12]. Another study demonstrated that many agricultural regions of the world have pervasive cloud cover, especially during key early-to-mid growing season stages [13]. Vegetation phenology studies require dense time series of imagery, and even short data gaps can be detrimental. Machine learning regression algorithms and harmonic regression methods can predict spectral index values by gap-filling and fitting between several dates of observations, but consecutive data gaps affect the reliability of predictions [14]. Croplands and other land cover types are often characterized over time using common spectral indexes such as the Normalized Difference Vegetation Index (NDVI), Leaf Area Index (LAI), Enhanced Vegetation Index (EVI), or Normalized Difference Water Index (NDWI). Multi-date interpolation methods are efficient but can be limited as they do not rely on direct observations for the given date.
Studies have mitigated data gaps in multispectral imagery and resultant spectral indexes using a variety of methods. Commonly, multiple dates of imagery are composited to fill in pixels with cloud cover, resulting in cloud free composites [15,16,17]. Unfortunately, these composites can include pixels with vastly different dates and varied seasonal phenomena due to consistent cloud cover in some areas. Some studies have used multi-sensor imagery to create 15-day gap-filled composites [18]. Although these composites can have high accuracy, authors report lower accuracies in areas with extended periods of data gaps.
More computationally intensive methods for data gap filling have emerged in recent years due to increased applications of machine and deep learning. Synthetic Aperture Radar (SAR), with its cloud-penetrating ability, has been leveraged for data gap filling using a variety of methods. Interferometric coherence was found capable of predicting NDVI with a moderate-strength relationship across farmland and villages [19]. Machine learning regression, such as random forest or support vector machine, used to predict NDVI from SAR resulted in a maximum R2 of about 0.71 across several different crop types [20]. More recent studies using hyper-local machine learning achieved higher R2 (0.92) for croplands across multiple sites [21]. Some studies found predictive difficulties using random forests with low R2 in some regions [22]. Different types of recurrent neural networks have been leveraged for predicting NDVI with SAR under cloudy conditions with some success across several land cover types, including benefits from temporal and textural information [23,24]. Other spectral indexes such as LAI have been successfully predicted using SAR and a modified water cloud model, with Vertical transmit, Horizontal receive (VH) polarization found to be a better predictor than Vertical transmit, Vertical receive (VV) at C-band frequency [25].
Generative Adversarial Networks (GANs) have become prevalent and well-suited for predicting multispectral imagery obstructed by cloud cover. A variety of studies using GANs have found R2 in excess of 0.92 for predicting individual spectral bands on satellites such as Sentinel-2 using various frequencies of SAR [26,27,28,29,30,31]. GANs are well suited to this application due to their unsupervised, adversarial nature and their training via backpropagation alone. There has also been successful cloud removal on Sentinel-2 image bands using Convolutional Neural Networks (CNNs), with RMSE as low as 0.042 on locally trained and predicted images with SAR [32].
In an era with the development of multi-frequency space-based SAR systems such as NASA-ISRO SAR (NISAR), integrating multiple SAR frequencies for data gap filling offers unique advantages because each frequency is variably sensitive to vegetation canopies. Limited work has shown the interoperability of C-band and L-band SAR with monitoring across land cover types such as forest type mapping [33], forest biomass estimation with better performance from L-band [34] or soil moisture monitoring or downscaling [35,36]. Moreover, studies have found both C-band and L-band acceptable for mapping cropland, but with C-band offering higher accuracy [37] and variations in vegetative canopy penetration based on polarization and frequency [38].
While the aforementioned deep and shallow learning methods have achieved high accuracy for data gap filling of multispectral imagery, many of the methods are computationally intensive, rely on a time series of imagery or evaluated only single-frequency SAR for gap filling of spectral indexes. Moreover, many studies did not compare how well prominent SAR de-noising techniques can gap-fill cloudy pixels in spectral indexes. The present study addresses some of these shortcomings. Accordingly, this study evaluates the predictive performance of multi-frequency C-band and L-band SAR for data gap filling of the NDVI and NDWI spectral indexes from Sentinel-2 at 10 m spatial resolution, using a balanced set of randomly selected training data across major land cover types in a mixed temperate forest, coastal plains ecoregion with a variety of land cover types found globally. Different data and band combinations are evaluated, as well as the use of texture metrics and the evaluation of imagery from overlapping acquisition dates and dates which differ by several days. Error rates are also computed and summarized across different land cover types. This study uses several different machine learning regression models with locally trained data and quantifies the predictive ability of varying data combinations. Ultimately, this study aims to quantify the utility of a fast and efficient method using multi-frequency SAR and texture metrics for data gap filling of spectral indexes across multiple land cover types to improve continuous land cover mapping and monitoring applications. Compared with previous studies, this study offers novelty by evaluating the predictive performance of multi-frequency SAR with texture using computationally efficient methods. Moreover, this study evaluates how model accuracy changes when SAR and multispectral imagery are acquired on the same date or different dates, and how different robust de-noising techniques impact model accuracy.

2. Study Area and Datasets

2.1. Study Area

The study area is located along the Pee Dee River, South Carolina, USA, within the coastal plains and temperate forest ecoregion of North America. The site covers an area of approximately 1100 km2. The study area contains a river, a coastal area along the Atlantic Ocean and several large tracts of woody wetlands, which include cypress trees along the river and occupy nearly 49% of the study area; croplands, grasslands and forested areas are common as well. The study area contains several small urban areas, including the town of Garden City Beach in the eastern quadrant, and it borders Georgetown to the south. The study area was primarily selected because it contains coincident Sentinel-1 SAR, Uninhabited Aerial Vehicle SAR (UAVSAR) and Sentinel-2 multispectral imagery all from the same acquisition date. Other study areas could not be located with the same three datasets acquired on the same date. Table 1 shows the different datasets acquired over the study area.

2.2. Sentinel-2 Imagery

Sentinel-2 multispectral imagery is provided by the European Space Agency (ESA), collected from two satellites in constellation with an approximately 5-day repeat overpass at the equator and more frequent overpasses at higher latitudes. Sentinel-2 provides multiple spectral bands at 10 m, 20 m and 60 m spatial resolutions. The red, green, blue and near-infrared (NIR) bands are available at 10 m, while the remaining bands, including red-edge and shortwave-infrared, are provided with the 20 m and 60 m datasets only. The Sentinel-2 imagery was acquired from EarthExplorer (https://earthexplorer.usgs.gov/, accessed on 1 December 2021) at Level-1C/Top of Atmosphere (TOA). In order to mitigate atmospheric effects and obtain the scene classification layer (SCL) with cloud mask, the imagery was converted into surface reflectance/Level-2A using Sen2Cor version 2.09 [39].
For this study, the 10 m imagery was used. Imagery was acquired and processed over the study area for three dates: 18 September 2018, 21 September 2018 and 26 September 2018. The 18 September 2018 imagery coincides with the acquisition date of the SAR imagery, while the 21 September 2018 and 26 September 2018 imagery are used to evaluate how cloud gap-filling model strength varies when same-date imagery is not available. Sentinel-2 imagery was also acquired on 9 January 2019 (winter) and 15 June 2019 (late spring) for further seasonal evaluation.

2.3. UAVSAR

The UAVSAR is a fully-polarimetric, airborne L-band SAR system designed for single and repeat-pass interferometry and tomography [40]. The system has been operational since 2007 with data collected in campaigns across the world. UAVSAR has been leveraged for a variety of ecological applications such as forest structure and biomass mapping or stand age [41,42,43].
For this study, we acquired the UAVSAR imagery collected over the Pee Dee River, South Carolina, USA (Flight line ID 15100) from 18 September 2018, which was freely provided courtesy of NASA/JPL-Caltech (https://uavsar.jpl.nasa.gov/, accessed on 1 December 2021). The multi-looked and ground range projected dataset in sigma nought backscatter was used. The final image contained an azimuth spacing of 0.6 m and was resampled to a 10 m resolution and aligned with the Sentinel-2 and Sentinel-1 pixels to maintain spatial consistency. The VVVV and HHHV polarizations were acquired and converted to decibels where applicable. UAVSAR imagery is limited and was only available during September 2018 at this location. This L-band SAR operates at a long wavelength capable of penetrating most vegetative canopies and is not affected by atmospheric phenomena such as clouds or aerosols. Airborne SAR such as UAVSAR is slightly less influenced by speckle noise than spaceborne SAR due to the shorter distance between sensor and ground, as well as higher spatial resolution pixels that are multi-looked to 10 m.

2.4. Sentinel-1

Sentinel-1 is a C-band (5.4 GHz) SAR maintained by the European Space Agency (ESA) and operating in a near-polar, sun-synchronous orbit with a constellation of two satellites [44]. The constellation offers a 6-day local repeat pass and collects imagery in several modes including the Interferometric Wide swath (IW) mode, which was used for this study. The IW imagery has a nominal spatial resolution of 5 × 20 m but was reprocessed into a nominal 10 m resolution with square pixels aligned with the Sentinel-2 imagery. We processed the IW imagery using the SNAP (Sentinel Application Platform) software. Because Sentinel-1 operates in the C-band, it is capable of penetrating clouds and aerosols and is therefore not significantly affected by atmospheric effects.
For this study, imagery was acquired over the Pee Dee River, South Carolina coinciding with the limited UAVSAR collection. We acquired imagery from the Alaska Satellite Facility for the following dates: 13 September 2018, 18 September 2018, 19 September 2018 and 24 September 2018, primarily across path 150, frame 104. Sentinel-1 imagery was also acquired during the winter (10 January 2019) and spring (15 June 2019) seasons for further seasonal evaluation. Ground-range-detected imagery was acquired for each date and calibrated into sigma nought backscatter using SNAP. Multilooking, terrain correction using the SRTM 30 m DEM, thermal noise reduction and a refined Lee filter were all applied, resulting in projected VV and VH polarized imagery with sigma nought backscatter scaled in dB. Additionally, imagery at the single look complex level was acquired on 18 September 2018, 10 January 2019 and 15 June 2019 for the purpose of obtaining the wave covariance matrix (C2) and complex data. Additional description of dataset processing is included in the Methods section.

2.5. Land Cover

A Sentinel-2 land cover dataset was created over the study area using the semi-automated land cover classification tool with overall accuracy of about 85% [45]. The land cover classes were aggregated into water, herbaceous wetlands, woody wetlands, bare ground, scrub/shrub/grasses, trees and built-up area. The land cover dataset was used for stratification of training points as well as evaluation of errors by land cover type. Maps of the datasets acquired over the study area are shown in Figure 1.

3. Methods

This study evaluates multi-frequency, cloud-penetrating SAR (C-band and L-band) for the purpose of predicting spectral indexes such as NDVI and NDWI in order to fill gaps due to cloudy pixels. Coincident imagery from Sentinel-1, UAVSAR and Sentinel-2 was acquired at a study site with the same acquisition date. Because spaceborne SAR such as the C-band Sentinel-1 contains a large amount of inherent speckle noise, we also compare the predictive performance of three de-noising techniques: (1) speckle filtering using a refined Lee filter, (2) multi-temporal (MT) speckle filtering and (3) a covariance matrix noise reduction method (C2) described in a subsequent section. Further, texture metrics are computed and evaluated in conjunction with the SAR imagery. An equalized stratified random sample is generated based on the land cover layer, which enables a balanced training dataset (an equal number of random samples per class). The training data are then used to fit machine learning regression models including Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosted Trees (GBT), compared with multiple linear regression (MLR) as well as an ensemble of the three most accurate models. The accuracy of the models is evaluated by withholding 20% of the stratified random sample points for evaluation. Varying numbers of training sample points were generated and evaluated in order to determine the effect on accuracy. Additionally, because it is not always possible to obtain SAR imagery acquired on the same date as multispectral imagery, we evaluated predictive performance on offset dates (3 days and 8 days). This paper aims to produce a computationally efficient and accurate method for filling data gaps within a single Sentinel-2 scene which could be scaled in the future to include additional scenes. Figure 2 shows a generalized overview of the workflow of this study.

3.1. SAR De-Noising

Spaceborne SAR contains granular noise known as speckle, caused by constructive and destructive interference during the coherent processing of backscattered signals from multiple distributed targets. It is less noticeable in imagery acquired at high spatial resolution such as the UAVSAR. For this study, we de-noised the Sentinel-1 SAR using three methods. The first is a refined Lee filter, which builds on the basic Lee filter by locally altering the sliding window based on a K-Nearest Neighbor (KNN) clustering algorithm to make it more robust [46]. The second method uses multitemporal speckle filtering based on processed GRDH SAR images acquired on 13, 18, 19 and 24 September 2018 over the study area. The multitemporal speckle filter obtains the local mean of a given pixel's calibrated intensity across the time period of the images provided and is used in conjunction with a refined Lee filter as implemented in the snappy Python library [47]. Current multitemporal speckle filter methods do not guarantee mitigation of effects from vegetation or land cover change during the given period. Therefore, to lessen this effect, only imagery within a small timespan (e.g., 2 weeks or less) is used. The third method calculates noise-reduced intensities applied to the complex SAR products obtained from Single Look Complex (SLC) imagery, which are then used to obtain the noise-reduced wave covariance matrix (C2) [48]. The three mentioned de-noising techniques were conducted using the SNAP software version 8.0 [49]. Lastly, the SAR imagery was resampled to 10 m resolution and pixels were aligned to the Sentinel-2 multispectral imagery. The Sentinel-1 imagery was clipped to match the extent of the UAVSAR image strip.
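The temporal-averaging idea at the core of the multitemporal filter can be sketched as follows. This is a simplified illustration, not the SNAP implementation (which additionally applies the refined Lee spatial step); the single-look speckle model (unit-mean multiplicative gamma noise) and the constant scene are assumptions for demonstration only.

```python
import numpy as np

def multitemporal_mean_filter(stack):
    """Simplified multitemporal de-noising: replace each pixel's intensity
    with its mean over the short image time series. Speckle is roughly
    independent between dates, so averaging T acquisitions reduces its
    variance by about 1/T."""
    return stack.mean(axis=0)

rng = np.random.default_rng(0)
truth = np.full((64, 64), 0.1)  # constant "true" backscatter intensity (assumed)
# Single-look speckle modeled as unit-mean multiplicative gamma noise (assumption)
speckle = rng.gamma(shape=1.0, scale=1.0, size=(4, 64, 64))
stack = truth * speckle          # four noisy acquisitions within a short timespan
filtered = multitemporal_mean_filter(stack)
```

Averaging only four dates already roughly halves the speckle standard deviation, which is why a short time series (two weeks or less here) suffices while also limiting land cover change between acquisitions.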
Texture metrics for Sentinel-1 and UAVSAR were created using the C11 band (VH) from the Sentinel-1 covariance matrix and the HHHV band from UAVSAR. Gray-Level Co-Occurrence Matrices (GLCM) were generated for both bands. The GLCM is a histogram of co-occurring greyscale values at a given offset within the image, from which second-order texture metrics can be computed. Specifically, we used a 5 × 5 window size to obtain the GLCM variance for each pixel.
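As an illustration of the texture computation, the GLCM variance for a single 5 × 5 window can be sketched as below. The window values, 16-level quantization and single (0, 1) pixel offset are illustrative assumptions, not the exact configuration used in SNAP.

```python
import numpy as np

def glcm_variance(window, levels=16, offset=(0, 1)):
    """GLCM variance for one window: quantize to grey levels, build the
    co-occurrence histogram at the given pixel offset, normalize it to a
    probability matrix P, then compute sum_ij (i - mu)^2 * P(i, j) with
    mu = sum_ij i * P(i, j)."""
    edges = np.linspace(window.min(), window.max(), levels + 1)[1:-1]
    q = np.digitize(window, edges)          # grey levels 0..levels-1
    dr, dc = offset
    P = np.zeros((levels, levels))
    rows, cols = q.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            P[q[r, c], q[r + dr, c + dc]] += 1
    P /= P.sum()
    i = np.arange(levels)[:, None]
    mu = (i * P).sum()
    return (((i - mu) ** 2) * P).sum()

# Hypothetical 5 x 5 window of VH backscatter (dB)
win = np.array([[-18.0, -17.5, -12.0, -11.8, -12.1],
                [-17.9, -17.7, -12.2, -11.9, -12.0],
                [-18.1, -17.6, -12.1, -12.0, -11.7],
                [-17.8, -17.9, -12.3, -11.9, -12.2],
                [-18.0, -17.4, -12.0, -11.8, -12.1]])
var5x5 = glcm_variance(win)
```

In practice this computation slides over every pixel of the C11 and HHHV bands, producing a texture raster used as an additional model predictor.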

3.2. Spectral Indexes

The Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) are two commonly used indexes for monitoring vegetation phenology and surface water extent, respectively. The NDVI is a band ratio using the near-IR and red bands, while the NDWI uses the near-IR and green bands. The spectral indexes were calculated using the 10 m Sentinel-2 bands, which were converted into surface reflectance using the Sen2Cor version 2.09 software. The Sentinel-2 imagery was clipped to match the narrow UAVSAR image strip.
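Both indexes are simple normalized band ratios and can be computed as follows; the small reflectance arrays are hypothetical stand-ins for the 10 m Sentinel-2 green, red and NIR surface-reflectance bands.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), guarding against division by zero."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    denom = a + b
    return np.where(denom != 0, (a - b) / np.where(denom == 0, 1, denom), 0.0)

# Hypothetical surface-reflectance values for two pixels
green = np.array([[0.10, 0.20]])
red   = np.array([[0.05, 0.30]])
nir   = np.array([[0.40, 0.25]])

ndvi = normalized_difference(nir, red)    # NDVI = (NIR - red) / (NIR + red)
ndwi = normalized_difference(green, nir)  # NDWI = (green - NIR) / (green + NIR)
```

Both indexes are bounded to [-1, 1], which makes MAE values directly comparable across dates and land cover types.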

3.3. Training Sample Generation and Prediction of Indexes

For the purpose of generating training sample points to predict the NDVI and NDWI in order to fill cloud gaps, the cloud and cloud shadow pixels were masked using the provided SCL which is originally generated at 20 m but was resampled to the 10 m imagery. This layer contains bit masks for several phenomena. Using the SCL, we masked out thin cirrus, high probability cloud, medium probability cloud and cloud shadow. Considering that the SCL typically misses outer portions of clouds and shadows, a 120 m buffer was applied to better mask cloud and cloud shadow. Additionally, the SCL layer falsely masks out a limited number of pixels over water and built-up areas which we manually corrected.
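The 120 m buffer corresponds to 12 pixels at 10 m resolution and can be approximated with morphological dilation of the cloud/shadow mask; a minimal sketch using a hypothetical mask:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Hypothetical 10 m SCL-derived cloud/shadow mask (True = masked pixel)
mask = np.zeros((50, 50), dtype=bool)
mask[20:25, 20:25] = True

# 120 m buffer = 12 pixels at 10 m; grow the mask with 12 iterations of
# binary dilation (default 4-connected structuring element)
buffered = binary_dilation(mask, iterations=12)
```

The buffered mask then excludes the thin cloud edges and shadow fringes that the SCL typically misses before training points are drawn.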
After cloud and cloud shadow masking, we tested generating varying numbers of training points ranging from 50 to 3500 (50-point step) in order to measure the effect on accuracy using cross-validation with shuffle split (25% of points withheld for validation) against the predicted and observed spectral index values. The training sample count evaluation was based on using the UAVSAR HHHV, VVVV, HHHV GLCM variance and Sentinel-1 C11, C22 and C11 GLCM variance with each of the machine learning regressors. In order to generate training points that would not bias the models, the points were generated based on equalized stratified random sampling across the land cover type map with points generated equally on surface water, herbaceous wetlands, woody wetlands, trees, scrub/shrub/low vegetation, and built-up and bare ground. For each random point, the values of the overlapping pixel from each layer from UAVSAR, Sentinel-1 and the Sentinel-2 spectral indexes were collected. The model performance and accuracy were evaluated using cross-validation with shuffle split (n_splits = 5, test_size = 25%) method used on the equalized stratified randomly selected training points [50].
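The equalized stratified sampling step can be sketched as below; the land cover array and class labels are hypothetical, but the principle of drawing an equal number of random pixel locations per class matches the procedure described above.

```python
import numpy as np

def equalized_stratified_sample(landcover, n_per_class, rng):
    """Draw an equal number of random pixel locations from each land
    cover class so that no class dominates model training."""
    rows, cols = [], []
    for cls in np.unique(landcover):
        r, c = np.nonzero(landcover == cls)
        pick = rng.choice(len(r), size=n_per_class, replace=False)
        rows.append(r[pick])
        cols.append(c[pick])
    return np.concatenate(rows), np.concatenate(cols)

rng = np.random.default_rng(42)
# Hypothetical land cover map with 7 classes (0-6), as in this study
landcover = rng.integers(0, 7, size=(200, 200))
r, c = equalized_stratified_sample(landcover, n_per_class=50, rng=rng)
```

The sampled (row, column) pairs are then used to extract coincident SAR, texture and spectral-index values for model fitting.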
The sample points for each step were used to fit machine learning regression models to predict the spectral indexes. We evaluated the performance of different machine learning models. Several models in the Python library scikit-learn were evaluated, including SVM, RF and GBT [51]. They were compared with multiple linear regression (MLR). SVM determines the optimal hyperplane across n-dimensional space and uses the best-fit function as the result. We applied the SVM with the radial basis function kernel and the regularization parameter set to 1.5, with remaining parameters as default. RF fits a series of weak-learner decision trees with each split evaluated using mean squared error [52]. The final result is based on an average of the ensemble of decision trees. We applied the RF with 500 trees and the maximum number of features considered at a split equal to the square root of the feature count. GBT iteratively builds upon the deficiencies of the previous tree and minimizes the gradient of the loss function. The GBT was applied with 500 estimators and the Friedman mean squared error loss function, with all other parameters as default. MLR was applied with default parameters. We selected 500 trees for the RF and GBT during initial model tuning, where we tested several different tree counts including 100, 500 and 1000. Increasing from 100 to 500 trees offered a slight model improvement (e.g., R2 = 0.857 vs. R2 = 0.861), whereas increasing from 500 to 1000 trees yielded negligible change but nearly doubled the model fit time.
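A minimal scikit-learn sketch of this model comparison, using the parameter choices stated above (RBF-kernel SVM with C = 1.5, RF with 500 trees and square-root feature subsetting, GBT with 500 estimators) and the shuffle-split cross-validation scheme; the synthetic predictors and target are illustrative stand-ins, not the study's samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical predictors standing in for the SAR bands and texture metrics
X = rng.normal(size=(600, 6))
# Synthetic nonlinear target standing in for a spectral index
y = np.sin(3 * X[:, 0]) + 0.3 * X[:, 2] + 0.05 * rng.normal(size=600)

models = {
    "RF":  RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0),
    "GBT": GradientBoostingRegressor(n_estimators=500, random_state=0),
    "SVM": SVR(kernel="rbf", C=1.5),
    "MLR": LinearRegression(),
}
# Cross-validation with shuffle split and 25% withheld, as described above
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="r2").mean()
          for name, m in models.items()}
```

On a nonlinear target like this, the tree-based and kernel models are expected to outperform MLR, mirroring the pattern reported in the Results.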
Subsequently, different dataset combinations were evaluated for predicting NDVI and NDWI using the optimal number of points determined from the previous section. The performance was evaluated for the different de-noising techniques and their impact on the models for predicting NDVI and NDWI. The techniques were: refined Lee speckle-filtered Sentinel-1 bands versus the more robustly de-noised Sentinel-1 bands with multitemporal (MT) de-noising applied from the 13, 18, 19 and 24 September imagery, and the Sentinel-1 bands de-noised via the SLC covariance matrix (C2) [48]. The performance of L-band UAVSAR was assessed with and without the Sentinel-1 imagery. Additionally, we evaluated the performance of individual bands for predicting the spectral indexes. In all of these cases, performance was evaluated using R2 to determine the amount of variance explained and Mean Absolute Error (MAE) to measure the error rate of the model. MAE was selected over Root Mean Square Error (RMSE) because it weights all errors equally and is more robust against outliers. The error rates and R2 were assessed using 20% of the equalized random stratified points which were withheld from training.
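The rationale for MAE over RMSE can be illustrated with a toy residual vector containing a single outlier; the numbers below are hypothetical.

```python
import numpy as np

residuals = np.array([0.05, -0.04, 0.06, -0.05, 0.05])  # typical index errors
with_outlier = np.append(residuals, 0.80)               # one badly predicted pixel

def mae(e):
    return np.abs(e).mean()

def rmse(e):
    return np.sqrt((e ** 2).mean())

# RMSE squares residuals, so a single outlier inflates it proportionally
# more than MAE, which weights every error equally
mae_ratio = mae(with_outlier) / mae(residuals)
rmse_ratio = rmse(with_outlier) / rmse(residuals)
```

Because spectral indexes are bounded to [-1, 1], a handful of mis-masked cloud-edge pixels can otherwise dominate an RMSE-based summary.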
In addition to evaluating the models using imagery from September 2018, we also acquired Sentinel-1 and Sentinel-2 imagery (UAVSAR was not available) during other seasons in order to evaluate model performance during different vegetative stages and environmental conditions. Specifically, we acquired imagery during winter (Sentinel-1 imagery from 10 January 2019; Sentinel-2 imagery from 9 January 2019) and late spring (Sentinel-1 and Sentinel-2 imagery both from 15 June 2019), as well as the original imagery from 18 September 2018 during late summer/early autumn. We calculated the random forest model performance and the MAE, as well as scatterplots of the observed and predicted model values, in order to interpret the results and any observable bias in the model.

4. Results

Each dataset was individually assessed for predicting NDVI and NDWI. Table 2 shows the R2 and MAE for each one using random forest regression with 3000 equalized stratified random samples based on the land cover layer and the datasets acquired on 18 September 2018. Of the three SAR de-noising techniques evaluated, the individual predictive performance for NDVI was about the same for the S1-MT (R2 = 0.786 (±0.016), MAE = 0.120 (±0.002)) and S1-C2 methods (R2 = 0.791 (±0.02), MAE = 0.118 (±0.005)), but lower for the more commonly used S1 refined Lee method (R2 = 0.719 (±0.01), MAE = 0.137 (±0.003)). A similar trend was observed with the NDWI dataset as shown in Table 2. Interestingly, the L-band UAVSAR imagery had relatively lower predictive ability than any of the C-band Sentinel-1 datasets. This is likely because L-band has a longer wavelength which is known to penetrate vegetative canopies and thus has more soil contribution to the backscatter signal, whereas the C-band backscatter response is mainly from the vegetative cover, especially in dense vegetation (e.g., NDVI > 0.6) [36,38,53]. The most accurate combination of datasets for predicting NDVI was S1-C2 (C11 and C22), C11 GLCM variance, UAVSAR (VVVV and HHHV) and UAVSAR HHHV texture (R2 = 0.868 (±0.015), MAE = 0.094 (±0.003)), whereas NDWI had a slightly better accuracy with the same dataset at R2 = 0.910 (±0.014), MAE = 0.078 (±0.003). It should be noted that the accuracy difference between the two de-noising techniques (S1-C2 and S1-MT) was not statistically significant. The GLCM variance used as a texture metric provided a major performance improvement when used in conjunction with L-band UAVSAR; however, the accuracy still remained below all of the Sentinel-1 models with texture. Inclusion of texture with the other datasets (i.e., S1-MT and S1-C2) resulted in less substantial model improvement, as shown in Table 2.
Additionally, using an ensemble of the three most accurate models (RF, GBT and SVM) with 3000 training samples and cross-validation, the R2 was slightly worse for NDVI than the RF alone (0.866 (±0.016) vs. 0.868 (±0.015)), but the MAE was the same at 0.094 (±0.003) for both. A similar pattern was observed for NDWI (ensemble R2 = 0.904 (±0.016) vs. RF R2 = 0.910 (±0.014); ensemble MAE = 0.081 (±0.004) vs. RF MAE = 0.078 (±0.003)).
Using one of the most accurate datasets (S1-C2, UAVSAR and GLCM textures) determined from the previous section for NDVI and NDWI, we evaluated the impact that the number of equalized stratified random training samples (based on the land cover dataset) had on the regression, as well as how that affects the different regression algorithms. The results are shown with a 90% confidence interval in Figure 3. Ultimately, for most of the algorithms, the R2 and MAE seem to stabilize after exceeding 1000 sample points for either dataset. The RF has the highest R2 and lowest MAE across most of the tests, followed very closely by GBT and then SVM. All models routinely exceeded the performance of MLR.
Maps of the original spectral indexes, the values predicted using the random forest with S1-C2, UAVSAR and textures, and the absolute error per pixel are shown in Figure 4. As can be seen in the original spectral indexes, cloud cover obscures a large portion of the study area; however, the SAR and texture metrics are able to accurately fill in the data gaps. Visually, the errors appear to be highest adjacent to cloud cover, as shown in the figure. This is due to the built-in Sentinel-2 SCL cloud mask not masking out the entirety of the cloudy pixels or shadows. Figure 5 shows a zoomed-in portion of the study area providing a closer look at spatial performance. For each spectral index, the first sub-plot shows the original index, the second shows the predicted index using the SAR data layers, the third shows the original spectral index with cloudy pixels filled in with SAR-predicted values and the last shows the absolute error between the predicted and original spectral values. The third sub-plot represents the practical use of this combined SAR-multispectral algorithm for gap filling cloudy pixels. We also assessed the errors by land cover type and included how the errors vary based on the dataset combination that was selected. The errors for the most accurate dataset of S1-C2, UAVSAR and textures showed that MAE was lowest across woody wetlands for both NDWI and NDVI (MAE = 0.05 vs. 0.08, respectively), likely due to SAR sensitivity to moisture. Error rates were highest for trees, bare ground and built-up pixels (Figure 6), likely attributable to C-band SAR not penetrating the dense tree canopy, as well as color differences in bare ground and built-up areas (e.g., different pavement or building colors). Interestingly, the inclusion of texture with S1-C2 reduced errors primarily over water, whereas the inclusion of texture with UAVSAR reduced errors across all land cover types with the largest improvement in built-up areas.
The former is attributed to the very consistent, low-value texture across water pixels, while the latter is attributed to texture representing vegetative and urban structure better than L-band backscatter alone (i.e., L-band often penetrates to the land surface).
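The gap-filling workflow illustrated in Figures 4 and 5 (train on cloud-free pixels, predict the masked ones, composite) might look like the following minimal sketch, with synthetic arrays standing in for the co-registered SAR/texture stack and the SCL-masked spectral index:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
h, w, bands = 80, 80, 5
stack = rng.normal(size=(h, w, bands))              # stand-in SAR + texture bands
ndvi = np.tanh(stack @ np.array([0.5, 0.3, -0.2, 0.4, 0.1]))
cloudy = rng.random((h, w)) < 0.3                   # simulated cloud/shadow mask
ndvi_obs = np.where(cloudy, np.nan, ndvi)           # index with NaN under cloud

X = stack.reshape(-1, bands)
y = ndvi_obs.ravel()
clear = ~np.isnan(y)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[clear], y[clear])                          # train on clear pixels only

filled = ndvi_obs.copy()
filled[cloudy] = rf.predict(X[~clear])              # predict into the gaps
```

Clear pixels retain their original multispectral values; only the cloud-masked pixels receive SAR-predicted values, matching the third sub-plot of Figure 5.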
The importance of the variables for the most accurate dataset in the RF was assessed using Mean Decrease in Impurity (MDI), which measures the total decrease in node impurity weighted by the probability of reaching that node, averaged across all trees of the forest [54]. Ultimately, the Sentinel-1 C11 band was by far the most important variable for both NDVI and NDWI, as shown in Figure 7. This is likely because the C11 band represents the VH cross-polarization, which is more sensitive to differences in vegetation than the co-polarized VV band [55].
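In scikit-learn, the MDI scores described above are exposed as the `feature_importances_` attribute of a fitted forest. A toy sketch follows; the band names and data are illustrative, not the study's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
names = ["S1_C11", "S1_C22", "UAVSAR_VVVV", "UAVSAR_HHHV", "GLCM_var"]
X = rng.normal(size=(2000, len(names)))
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + 0.05 * rng.normal(size=2000)  # first band dominant by design

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, mdi in sorted(zip(names, rf.feature_importances_), key=lambda p: -p[1]):
    print(f"{name:12s} MDI={mdi:.3f}")              # MDI scores sum to 1
```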
In addition to evaluating the UAVSAR, Sentinel-1 and Sentinel-2 imagery all acquired on 18 September 2018, we also evaluated Sentinel-2 imagery acquired on 21 and 26 September 2018 against the 18 September 2018 UAVSAR and Sentinel-1 data using the RF. For NDVI, we found R2 and MAE of 0.830 (±0.021) and 0.085 (±0.004) for the 21st and 0.743 (±0.022) and 0.103 (±0.002) for the 26th; NDWI was slightly better, with R2 and MAE of 0.864 (±0.016) and 0.072 (±0.002) on the 21st and 0.788 (±0.020) and 0.086 (±0.002) on the 26th. This result suggests that SAR imagery can still be used to predict spectral indexes even when its acquisition date does not coincide with the multispectral imagery. However, this would not hold in an area undergoing significant precipitation or land cover change over just several days. We also evaluated the predictive performance of the individual bands against NDVI and NDWI, comparing linear regression and random forest regression as shown in Figure 8. Unsurprisingly, R2 was highest for most variables on 18 September. Interestingly, NDWI linear regression errors were highest on the 18th, whereas RF errors were typically lowest on that same date. NDVI showed similar patterns, with the 18th having the highest predictive ability. GLCM variance texture errors were very high for the 18th and 21st but less so for the 26th with RF, though the pattern was weaker with linear regression. For both NDWI and NDVI, C-band VV polarization resulted in lower MAE than VH, and L-band HHHV had slightly lower MAE than VVVV, except on imagery acquired on the 26th.
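The single-variable comparison behind Figure 8 can be sketched by fitting each band alone with linear regression and a random forest and reporting R2/MAE. Data and band names here are synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
names = ["VV", "VH", "GLCM_var"]                    # placeholder band names
X = rng.normal(size=(3000, 3))
y = np.tanh(0.6 * X[:, 0] + 0.3 * X[:, 1] ** 2)     # nonlinear in the 2nd band
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for i, name in enumerate(names):
    for label, model in [("LR", LinearRegression()),
                         ("RF", RandomForestRegressor(n_estimators=100,
                                                      random_state=0))]:
        model.fit(X_tr[:, [i]], y_tr)               # one predictor at a time
        p = model.predict(X_te[:, [i]])
        print(f"{name:9s} {label} R2={r2_score(y_te, p):.2f} "
              f"MAE={mean_absolute_error(y_te, p):.2f}")
```

Comparing the LR and RF rows per band separates how much signal a band carries from how much of that signal is nonlinear, which is where RF gains over LR.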
Lastly, the Sentinel-1 and Sentinel-2 imagery from late spring, winter and late summer/early autumn were evaluated using the most accurate model determined from the previous section (S1-C2) with a random forest, in order to test whether seasonal effects play a role in model accuracy. Each model was computed using the Sentinel-1 C11 and C22 bands with GLCM variance texture. Ultimately, the winter (10 January 2019) and spring (15 June 2019) results had lower accuracy than the 18 September 2018 model, and NDVI accuracy was lower than NDWI accuracy. The September 2018 NDWI model had the highest accuracy (R2 = 0.88), followed by the June 2019 model (R2 = 0.77) and then the winter 2019 model (R2 = 0.61). Additional NDVI and MAE results for these models are shown in Table 3. Scatterplots of observed and predicted NDWI and NDVI for the three dates are shown in Figure 9. For the 18 September 2018 imagery, modelled NDVI tends to be overestimated at NDVI > 0.4 and NDVI < −0.4, while modelled NDWI is mostly accurate, with slight overestimation between −0.25 and 0.50. The winter (10 January 2019) modelled NDVI and NDWI tend to be overestimated, and overall accuracy is reduced. The spring (15 June) modelled NDVI tends to be underestimated for NDVI > 0.6 and overestimated for most of the remaining values; June modelled NDWI tends to be underestimated at NDWI > 0.25 but overestimated for NDWI < −0.50. The variation may be attributed to a combination of factors such as changes in wetland area, vegetation canopy density and soil moisture.
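The over- and under-estimation patterns read off the scatterplots can be quantified by binning the observed index values and computing the mean bias (predicted minus observed) per bin. The sketch below uses synthetic values and illustrates how a model compressed toward the mean overestimates low values and underestimates high ones:

```python
import numpy as np

rng = np.random.default_rng(4)
observed = rng.uniform(-0.8, 0.8, size=5000)        # stand-in observed NDVI
predicted = 0.8 * observed + 0.05 + 0.1 * rng.normal(size=5000)  # compressed fit

bins = np.linspace(-0.8, 0.8, 9)                    # eight equal-width bins
idx = np.digitize(observed, bins)
for b in range(1, len(bins)):
    sel = idx == b
    bias = (predicted[sel] - observed[sel]).mean()  # mean over-/under-estimation
    print(f"[{bins[b - 1]:+.1f}, {bins[b]:+.1f})  bias = {bias:+.3f}")
```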

5. Discussion

The results from this study align well with those reported in the literature. For example, one study predicted NDVI using convolutional neural networks (CNNs) with a correlation coefficient as high as 0.87 using Sentinel-1 and Sentinel-2 [56]. Another study using Gaussian process regression with same-date Sentinel-1 and Sentinel-2 imagery achieved an R2 of 0.90 for predicting LAI across a small 140 km2 study site in Northern Spain consisting mostly of dry and irrigated cropland; even higher R2 (>0.96) was observed with multi-output Gaussian process regression, which accounts for multiple dates of imagery instead of just one at a time as used in this study [57]. Another study over maize fields found a linear R2 between NDVI and Sentinel-1 VH of 0.69, and even lower for VV polarization [58]. While that accuracy is lower than in our study, it is consistent with our finding that the cross-polarized channel is more sensitive to vegetation than the co-polarized one. A study over St. Petersburg, Russia found an MAE of 0.05 for predicting NDVI in cloudy pixels with Sentinel-1 SAR; that algorithm achieved higher accuracy than ours but required a time series for prediction [59]. Another study predicted NDVI using Sentinel-1 and found an R2 as high as 0.79 in rice fields; in line with this study, they also found that random forest offered the best performance [20]. Deep learning frameworks trained on a time series of imagery can offer high accuracy for predicting NDVI: one study achieved MAE < 0.05 and R2 > 0.83 for predicting NDVI with offset imagery [23]. In contrast to SAR, some studies have had success gap-filling Landsat imagery using MODIS time series and a Savitzky-Golay filter, with RMSE below 0.001 [60].
The mentioned studies achieved varying accuracies. Often, the studies with higher accuracy leveraged a time series of imagery, deep learning, or both, which demands substantially more data and computational time for model training. By contrast, the studies achieving lower accuracy often used standard machine learning algorithms such as random forest, but predicted over individual agricultural land cover classes with temporally offset imagery, which can be affected by changing environmental conditions (e.g., precipitation) that reduce accuracy. Such conditions are difficult to predict accurately with imagery from different dates.
The limitations of this study include its restriction to a single location, owing to the current lack of coincident imagery for all three datasets (UAVSAR, Sentinel-1 and Sentinel-2) on the same date. Future work could also explore variations in machine learning model parameters. Based on findings in the literature, the predictive potential of UAVSAR and Sentinel-1 likely varies across regions with different proportions of land cover types and different ecoregions. Conversely, this study contributed some novelty by using equalized stratified random sampling based on land cover type to generate a balanced training set. Moreover, it evaluated how parameters such as SAR speckle de-noising, the inclusion of texture metrics and the imagery date influence accuracy, which had not previously been reported for this application.

6. Conclusions

This study evaluated the potential of C-band Sentinel-1 SAR and L-band UAVSAR for predicting NDVI and NDWI under cloud-obscured Sentinel-2 multispectral pixels. Three different SAR de-noising techniques were compared: (1) a standard approach using a refined Lee speckle filter, (2) multitemporal filtering (MT) with a refined Lee speckle filter and (3) covariance matrix (C2) de-noising from SLC imagery. The impact of GLCM variance texture was also evaluated. This study found that the combined use of C-band and L-band SAR can accurately predict NDVI and NDWI across multiple general land cover types for gap-filling cloud-obscured pixels. Many previous studies using shallow learning techniques did not use robust de-noising techniques or GLCM texture, both of which yielded model improvements as shown in this study.
Multiple machine learning models were evaluated, but the most accurate proved to be a random forest with R2 = 0.910 (±0.014), MAE = 0.078 (±0.003) for NDWI and R2 = 0.868 (±0.015), MAE = 0.094 (±0.003) for NDVI. This model used the most accurate dataset combination, which included S1-C2 (C11 and C22), the UAVSAR VVVV and HHHV bands and GLCM variance. The inclusion of texture yielded a major improvement when using only a single-frequency SAR (e.g., C-band or L-band only), with R2 increasing from 0.66 (±0.032) to 0.77 (±0.021) for predicting NDVI with texture (UAVSAR) and from 0.72 (±0.01) to 0.79 (±0.02) for Sentinel-1 refined Lee speckle filtering with texture for predicting NDVI. Noticeably smaller improvements were found when texture was included with the other two more robust SAR de-noising methods. Employing an ensemble of the three most accurate models (RF, GBT and SVM) resulted in negligible changes to accuracy, at the expense of increased model fit time.
This study also found that the number of training samples plays a role in accuracy, with model accuracy tending to stabilize after 1000 training samples. This study further evaluated the potential of the two SAR sensors to predict NDWI and NDVI using multispectral imagery acquired 3 days and 8 days after SAR acquisition. These results showed that the model still yields moderately strong predictive power on imagery from different dates (for NDVI, R2 = 0.830 (±0.021) and MAE = 0.085 (±0.004) for 3 days later and R2 = 0.743 (±0.022) and MAE = 0.103 (±0.002) for 8 days later). Slightly higher accuracy was found for NDWI.
NDVI and NDWI models based on Sentinel-1 SAR from January (winter) and June (spring/summer) were also evaluated and compared to the September (late summer/early autumn) results. This analysis found that the predictive capability of Sentinel-1 SAR for both NDVI and NDWI varies by season, with relatively low predictive capability in winter but moderate capability in spring/summer, suggesting this algorithm would be most effective on imagery acquired when vegetation is not in senescence.
Overall, this study found that C-band and L-band SAR together offer high accuracy in predicting NDVI or NDWI, but that de-noised C-band SAR with texture may be sufficiently accurate on its own; L-band UAVSAR did not offer substantial improvement when paired with de-noised C-band SAR and texture. Ultimately, these methods and results could support scalable and efficient gap filling of clouds for more complete land surface observations in applications including vegetation phenology and land cover change studies.

Funding

This research was funded under the Engineer Research and Development Center (ERDC) Basic Research Portfolio through Program Element PE 0601102A/T14/ST1409. Any opinions expressed in this paper are those of the author and are not to be construed as official positions of the funding agency. The use of trade, product, or firm names in this document is for descriptive purposes only and does not imply endorsement by the U.S. Government. Permission to publish was approved by the ERDC Public Affairs Office.

Data Availability Statement

This study uses free and publicly accessible data. The Sentinel-1 SAR imagery was acquired from the Alaska Satellite Facility (ASF) (https://search.asf.alaska.edu/#/, accessed on 1 December 2021), the UAVSAR imagery was acquired from NASA/Caltech-JPL (https://uavsar.jpl.nasa.gov/) and the Sentinel-2 imagery was acquired from the USGS Earth Explorer website (https://earthexplorer.usgs.gov/).

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Kennedy, R.E.; Yang, Z.; Cohen, W.B. Detecting trends in forest disturbance and recovery using yearly Landsat time series: 1. LandTrendr—Temporal segmentation algorithms. Remote Sens. Environ. 2010, 114, 2897–2910. [Google Scholar] [CrossRef]
  2. DeVries, B.; Decuyper, M.; Verbesselt, J.; Zeileis, A.; Herold, M.; Joseph, S. Tracking disturbance-regrowth dynamics in tropical forests using structural change detection and Landsat time series. Remote Sens. Environ. 2015, 169, 320–334. [Google Scholar] [CrossRef]
  3. Kim, D.H.; Sexton, J.O.; Noojipady, P.; Huang, C.; Anand, A.; Channan, S.; Feng, M.; Townshend, J.R. Global, Landsat-based forest-cover change from 1990 to 2000. Remote Sens. Environ. 2014, 155, 178–193. [Google Scholar] [CrossRef]
  4. Hansen, M.C.; Potapov, P.V.; Moore, R.; Hancher, M.; Turubanova, S.A.; Tyukavina, A.; Thau, D.; Stehman, S.V.; Goetz, S.J.; Loveland, T.R.; et al. High-resolution global maps of 21st-century forest cover change. Science 2013, 342, 850–853. [Google Scholar] [CrossRef]
  5. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N. Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2 and Landsat-8 data on Google Earth Engine. Remote Sens. 2017, 9, 1065. [Google Scholar] [CrossRef]
  6. Zhu, Z.; Woodcock, C.E. Continuous change detection and classification of land cover using all available Landsat data. Remote Sens. Environ. 2014, 144, 152–171. [Google Scholar] [CrossRef]
  7. Wang, Z.; Yao, W.; Tang, Q.; Liu, L.; Xiao, P.; Kong, X.; Zhang, P.; Shi, F.; Wang, Y. Continuous change detection of forest/grassland and cropland in the Loess Plateau of China using all available Landsat data. Remote Sens. 2018, 10, 1775. [Google Scholar] [CrossRef]
  8. Claverie, M.; Ju, J.; Masek, J.G.; Dungan, J.L.; Vermote, E.F.; Roger, J.-C.; Skakun, S.V.; Justice, C. The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sens. Environ. 2018, 219, 145–161. [Google Scholar] [CrossRef]
  9. Masek, J.; Ju, J.; Roger, J.C.; Skakun, S.; Claverie, M.; Dungan, J. Harmonized Landsat/Sentinel-2 products for land monitoring. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8163–8165. [Google Scholar]
  10. Zhou, Q.; Rover, J.; Brown, J.; Worstell, B.; Howard, D.; Wu, Z.; Gallant, A.L.; Rundquist, B.; Burke, M. Monitoring landscape dynamics in central US grasslands with harmonized Landsat-8 and Sentinel-2 time series data. Remote Sens. 2019, 11, 328. [Google Scholar] [CrossRef]
  11. Wilson, A.M.; Jetz, W. Remotely sensed high-resolution global cloud dynamics for predicting ecosystem and biodiversity distributions. PLoS Biol. 2016, 14, e1002415. [Google Scholar] [CrossRef]
  12. Lasko, K. Incorporating Sentinel-1 SAR imagery with the MODIS MCD64A1 burned area product to improve burn date estimates and reduce burn date uncertainty in wildland fire mapping. Geocarto Int. 2021, 36, 340–360. [Google Scholar] [CrossRef]
  13. Whitcraft, A.K.; Vermote, E.F.; Becker-Reshef, I.; Justice, C.O. Cloud cover throughout the agricultural growing season: Impacts on passive optical earth observations. Remote Sens. Environ. 2015, 156, 438–447. [Google Scholar] [CrossRef]
  14. Belda, S.; Pipia, L.; Morcillo-Pallarés, P.; Verrelst, J. Optimizing gaussian process regression for image time series gap-filling and crop monitoring. Agronomy 2020, 10, 618. [Google Scholar] [CrossRef]
  15. Griffiths, P.; van der Linden, S.; Kuemmerle, T.; Hostert, P. A pixel-based Landsat compositing algorithm for large area land cover mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2088–2101. [Google Scholar] [CrossRef]
  16. Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C.; Hobart, G.W. An integrated Landsat time series protocol for change detection and generation of annual gap-free surface reflectance composites. Remote Sens. Environ. 2015, 158, 220–234. [Google Scholar] [CrossRef]
  17. Man, C.D.; Nguyen, T.T.; Bui, H.Q.; Lasko, K.; Nguyen, T.N.T. Improvement of land-cover classification over frequently cloud-covered areas using Landsat 8 time-series composites and an ensemble of supervised classifiers. Int. J. Remote Sens. 2018, 39, 1243–1255. [Google Scholar] [CrossRef]
  18. Vuolo, F.; Ng, W.T.; Atzberger, C. Smoothing and gap-filling of high resolution multi-spectral time series: Example of Landsat data. Int. J. Appl. Earth Obs. Geoinf. 2017, 57, 202–213. [Google Scholar] [CrossRef]
  19. Bai, Z.; Fang, S.; Gao, J.; Zhang, Y.; Jin, G.; Wang, S.; Zhu, Y.; Xu, J. Could vegetation index be derive from synthetic aperture radar?–the linear relationship between interferometric coherence and NDVI. Sci. Rep. 2020, 10, 6749. [Google Scholar] [CrossRef]
  20. Mohite, J.D.; Sawant, S.A.; Pandit, A.; Pappula, S. Investigating the Performance of Random Forest and Support Vector Regression for Estimation of Cloud-Free Ndvi Using SENTINEL-1 SAR Data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 1379–1383. [Google Scholar] [CrossRef]
  21. Pelta, R.; Beeri, O.; Tarshish, R.; Shilo, T. Sentinel-1 to NDVI for Agricultural Fields Using Hyperlocal Dynamic Machine Learning Approach. Remote Sens. 2022, 14, 2600. [Google Scholar] [CrossRef]
  22. Haupt, S.; Engelbrecht, J.; Kemp, J. Predicting modis EVI from SAR parameters using random forests algorithms. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 4382–4385. [Google Scholar]
  23. Garioud, A.; Valero, S.; Giordano, S.; Mallet, C. Recurrent-based regression of Sentinel time series for continuous vegetation monitoring. Remote Sens. Environ. 2021, 263, 112419. [Google Scholar] [CrossRef]
  24. Jing, R.; Duan, F.; Lu, F.; Zhang, M.; Zhao, W. An NDVI Retrieval Method Based on a Double-Attention Recurrent Neural Network for Cloudy Regions. Remote Sens. 2022, 14, 1632. [Google Scholar] [CrossRef]
  25. Yadav, V.P.; Prasad, R.; Bala, R. Leaf area index estimation of wheat crop using modified water cloud model from the time-series SAR and optical satellite data. Geocarto Int. 2021, 36, 791–802. [Google Scholar] [CrossRef]
  26. Grohnfeldt, C.; Schmitt, M.; Zhu, X. A conditional generative adversarial network to fuse SAR and multispectral optical data for cloud removal from Sentinel-2 images. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1726–1729. [Google Scholar]
  27. Singh, P.; Komodakis, N. Cloud-gan: Cloud removal for sentinel-2 imagery using a cyclic consistent generative adversarial networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1772–1775. [Google Scholar]
  28. Bermudez, J.D.; Happ, P.N.; Feitosa, R.Q.; Oliveira, D.A. Synthesis of multispectral optical images from SAR/optical multitemporal data using conditional generative adversarial networks. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1220–1224. [Google Scholar] [CrossRef]
  29. Fuentes Reyes, M.; Auer, S.; Merkle, N.; Henry, C.; Schmitt, M. Sar-to-optical image translation based on conditional generative adversarial networks—Optimization, opportunities and limits. Remote Sens. 2019, 11, 2067. [Google Scholar] [CrossRef]
  30. Gao, J.; Yuan, Q.; Li, J.; Zhang, H.; Su, X. Cloud removal with fusion of high resolution optical and SAR images using generative adversarial networks. Remote Sens. 2020, 12, 191. [Google Scholar] [CrossRef]
  31. Darbaghshahi, F.N.; Mohammadi, M.R.; Soryani, M. Cloud removal in remote sensing images using generative adversarial networks and SAR-to-optical image translation. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4105309. [Google Scholar] [CrossRef]
  32. Zhang, X.; Qiu, Z.; Peng, C.; Ye, P. Removing cloud cover interference from Sentinel-2 imagery in Google Earth Engine by fusing Sentinel-1 SAR data with a CNN model. Int. J. Remote Sens. 2022, 43, 132–147. [Google Scholar] [CrossRef]
  33. Mitchell, A.L.; Tapley, I.; Milne, A.K.; Williams, M.L.; Zhou, Z.S.; Lehmann, E.; Caccetta, P.; Lowell, K.; Held, A. C-and L-band SAR interoperability: Filling the gaps in continuous forest cover mapping in Tasmania. Remote Sens. Environ. 2014, 155, 58–68. [Google Scholar] [CrossRef]
  34. Huang, X.; Ziniti, B.; Torbick, N.; Ducey, M.J. Assessment of forest above ground biomass estimation using multi-temporal C-band sentinel-1 and polarimetric L-band PALSAR-2 data. Remote Sens. 2018, 10, 1424. [Google Scholar] [CrossRef]
  35. Ghafari, E.; Walker, J.P.; Das, N.N.; Davary, K.; Faridhosseini, A.; Wu, X.; Zhu, L. On the impact of C-band in place of L-band radar for SMAP downscaling. Remote Sens. Environ. 2020, 251, 112111. [Google Scholar] [CrossRef]
  36. Hamze, M.; Baghdadi, N.; El Hajj, M.M.; Zribi, M.; Bazzi, H.; Cheviron, B.; Faour, G. Integration of L-Band Derived Soil Roughness into a Bare Soil Moisture Retrieval Approach from C-Band SAR Data. Remote Sens. 2021, 13, 2102. [Google Scholar] [CrossRef]
  37. Kraatz, S.; Torbick, N.; Jiao, X.; Huang, X.; Robertson, L.D.; Davidson, A.; McNairn, H.; Cosh, M.H.; Siqueira, P. Comparison between Dense L-Band and C-Band Synthetic Aperture Radar (SAR) Time Series for Crop Area Mapping over a NISAR Calibration-Validation Site. Agronomy 2021, 11, 273. [Google Scholar] [CrossRef]
  38. El Hajj, M.; Baghdadi, N.; Bazzi, H.; Zribi, M. Penetration analysis of SAR signals in the C and L bands for wheat, maize, and grasslands. Remote Sens. 2018, 11, 31. [Google Scholar] [CrossRef]
  39. Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for sentinel-2. In Image and Signal Processing for Remote Sensing XXIII; International Society for Optics and Photonics: Bellingham, WA, USA, 2017; Volume 10427, p. 1042704. [Google Scholar]
  40. Hensley, S.; Wheeler, K.; Sadowy, G.; Jones, C.; Shaffer, S.; Zebker, H.; Miller, T.; Heavey, B.; Chuang, E.; Chao, R.; et al. The UAVSAR instrument: Description and first results. In Proceedings of the 2008 IEEE Radar Conference, Rome, Italy, 26–30 May 2008; pp. 1–6. [Google Scholar]
  41. Pinto, N.; Simard, M.; Dubayah, R. Using InSAR coherence to map stand age in a boreal forest. Remote Sens. 2012, 5, 42–56. [Google Scholar] [CrossRef]
  42. Simard, M.; Hensley, S.; Lavalle, M.; Dubayah, R.; Pinto, N.; Hofton, M. An empirical assessment of temporal decorrelation using the uninhabited aerial vehicle synthetic aperture radar over forested landscapes. Remote Sens. 2012, 4, 975–986. [Google Scholar] [CrossRef]
  43. Fatoyinbo, L.; Pinto, N.; Hofton, M.; Simard, M.; Blair, B.; Saatchi, S.; Lou, Y.; Dubayah, R.; Hensley, S.; Armston, J.; et al. The 2016 NASA AfriSAR campaign: Airborne SAR and Lidar measurements of tropical forest structure and biomass in support of future satellite missions. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 4286–4287. [Google Scholar]
  44. Geudtner, D.; Torres, R.; Snoeij, P.; Davidson, M.; Rommen, B. Sentinel-1 system capabilities and applications. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 1457–1460. [Google Scholar]
  45. Lasko, K.D.; Sava, E. Semi-automated land cover mapping using an ensemble of support vector machines with moderate resolution imagery integrated into a custom decision support tool. ERDC Libr. 2021. [Google Scholar] [CrossRef]
  46. Yommy, A.S.; Liu, R.; Wu, S. SAR image despeckling using refined Lee filter. In Proceedings of the 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 26–27 August 2015; Volume 2, pp. 260–265. [Google Scholar]
  47. Quegan, S.; Le Toan, T.; Yu, J.J.; Ribbes, F.; Floury, N. Multitemporal ERS SAR analysis applied to forest mapping. IEEE Trans. Geosci. Remote Sens. 2000, 38, 741–753. [Google Scholar] [CrossRef]
  48. Mascolo, L.; Lopez-Sanchez, J.M.; Cloude, S.R. Thermal noise removal from polarimetric Sentinel-1 data. IEEE Geosci. Remote Sens. Lett. 2021, 19, 4009105. [Google Scholar] [CrossRef]
  49. Zuhlke, M.; Fomferra, N.; Brockmann, C.; Peters, M.; Veci, L.; Malik, J.; Regner, P. SNAP (sentinel application platform) and the ESA sentinel 3 toolbox. In Proceedings of the Sentinel-3 for Science Workshop, Venice, Italy, 2–5 June 2015; Volume 734, p. 21. [Google Scholar]
  50. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538. [Google Scholar]
  51. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  52. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  53. Bousbih, S.; Zribi, M.; El Hajj, M.; Baghdadi, N.; Lili-Chabaane, Z.; Gao, Q.; Fanise, P. Soil moisture and irrigation mapping in A semi-arid region, based on the synergetic use of Sentinel-1 and Sentinel-2 data. Remote Sens. 2018, 10, 1953. [Google Scholar] [CrossRef]
  54. Archer, K.J.; Kimes, R.V. Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 2008, 52, 2249–2260. [Google Scholar] [CrossRef]
  55. Lasko, K.; Vadrevu, K.P.; Tran, V.T.; Justice, C. Mapping double and single crop paddy rice with Sentinel-1A at varying spatial scales and polarizations in Hanoi, Vietnam. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 498–512. [Google Scholar] [CrossRef]
  56. Mazza, A.; Gargiulo, M.; Scarpa, G.; Gaetano, R. Estimating the ndvi from sar by convolutional neural networks. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1954–1957. [Google Scholar]
  57. Pipia, L.; Muñoz-Marí, J.; Amin, E.; Belda, S.; Camps-Valls, G.; Verrelst, J. Fusing optical and SAR time series for LAI gap filling with multioutput Gaussian processes. Remote Sens. Environ. 2019, 235, 111452. [Google Scholar] [CrossRef]
  58. Alvarez-Mozos, J.; Villanueva, J.; Arias, M.; Gonzalez-Audicana, M. Correlation Between NDVI and Sentinel-1 Derived Features for Maize. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 6773–6776. [Google Scholar]
  59. Sarafanov, M.; Kazakov, E.; Nikitin, N.O.; Kalyuzhnaya, A.V. A machine learning approach for remote sensing data gap-filling with open-source implementation: An example regarding land surface temperature, surface albedo and NDVI. Remote Sens. 2020, 12, 3865. [Google Scholar] [CrossRef]
  60. Chen, Y.; Cao, R.; Chen, J.; Liu, L.; Matsushita, B. A practical approach to reconstruct high-quality Landsat NDVI time-series data by gap filling and the Savitzky–Golay filter. ISPRS J. Photogramm. Remote Sens. 2021, 180, 174–190. [Google Scholar] [CrossRef]
Figure 1. Overview of the study area located in coastal South Carolina, USA. The UAVSAR, de-noised Sentinel-1 and Sentinel-2 multispectral imagery are all shown. The land cover type map used for training data stratification and error evaluation is also shown.
Figure 2. Simplified overview of the study. The graphic shows the SAR datasets that were evaluated, how training and validation data for the machine learning models was obtained and the different analyses that were conducted for predicting the spectral index values for cloud-obscured pixels.
Figure 3. The proportion of model variance explained (R2) and Mean Absolute Error (MAE) are shown for both NDVI and NDWI. Each subplot shows the performance of the respective machine learning model and how it varies based on the number of sample points used to train the model. The shading adjacent to each line represents the 90% confidence interval of the estimate.
Figure 4. The original spectral indexes are shown for NDVI (top row) and NDWI (bottom row) with the model estimated pixel values (2nd column), combined multispectral and SAR-based cloud-filled indexes (3rd column) and the absolute error between the original spectral index and the SAR-predicted spectral index values computed per pixel. No data areas containing cloud or cloud shadow appear as white color in the absolute error map.
Figure 5. This view zooms in on a portion of the study area containing errors for NDVI at the top and NDWI at the bottom. The first sub-plot shows the original NDVI which contains clouds and cloud shadow, the 2nd subplot (top-right) shows the NDVI values estimated using SAR, the 3rd subplot (bottom-left) shows the original NDVI with cloud and cloud shadow pixels gap-filled with SAR-predicted NDVI values and the 4th subplot (bottom-right) shows the absolute error per pixel. Most errors are observed adjacent to clouds or over agricultural lands with mid-range NDVI or NDWI.
Figure 6. The breakdown of Mean Absolute Error (MAE) across the different land cover types for each of the dataset combinations applied to the random forest model.
Figure 7. Mean Decrease in Impurity (MDI) for each of the variables used in the random forest model with de-noised Sentinel-1 and UAVSAR. The C11 band (obtained from VH-polarization) was found to be the most important band compared with the rest.
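MDI is the impurity-based importance exposed by scikit-learn's random forests as `feature_importances_`. A self-contained sketch on synthetic data (the actual model used the SAR bands and texture as predictors):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins for predictors such as C11, C22, UAVSAR bands, texture
X = rng.normal(size=(500, 4))
# Target dominated by the first predictor, analogous to C11 dominating NDVI
y = 0.7 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
mdi = rf.feature_importances_  # MDI, normalized to sum to 1
```

Ranking `mdi` reproduces the kind of bar chart shown in Figure 7; here the first (dominant) predictor receives the highest importance.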
Figure 8. The top subplot shows NDVI and the bottom subplot shows NDWI. R2 and MAE are shown for models trained on each variable individually to evaluate that variable's contribution to model performance. The metrics were computed for 18 September 2018, when all three datasets (UAVSAR, Sentinel-1 and Sentinel-2) had imagery, as well as for multispectral imagery offset in time from the SAR imagery (21 September 2018 and 26 September 2018). Linear Regression (LR) was compared against Random Forest (RF) for each dataset and date.
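The single-variable comparison in Figure 8 can be reproduced by fitting LR and RF on one predictor at a time and scoring each fit. A sketch on synthetic data (fit and scored on the same samples for brevity, so RF scores are optimistic; the paper's evaluation protocol may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))                       # stand-ins for SAR variables
y = np.tanh(X[:, 0]) + 0.1 * rng.normal(size=400)   # nonlinear response

results = {}
for i in range(X.shape[1]):
    xi = X[:, i:i + 1]                              # one predictor at a time
    for name, model in [("LR", LinearRegression()),
                        ("RF", RandomForestRegressor(n_estimators=50, random_state=0))]:
        pred = model.fit(xi, y).predict(xi)
        results[(i, name)] = (r2_score(y, pred), mean_absolute_error(y, pred))
```

Each `(variable, model)` entry then supplies one pair of bars (R2, MAE) for the figure.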
Figure 9. Scatterplots of observed NDVI and NDWI as a function of SAR-predicted NDVI and NDWI for each of the three different seasons of imagery.
Table 1. SAR and multispectral datasets used in the first segment of this study are shown with the de-noising technique applied (if applicable) and the polarizations that were used. All of the SAR datasets were resampled to a common 10 m resolution and aligned with the Sentinel-2 multispectral imagery.
| Platform | Date | Dataset | De-Noise Technique | Resampled Resolution |
|---|---|---|---|---|
| UAVSAR L-band | 18 September 2018 | HHHV, VVVV | N/A | 10 m |
| UAVSAR L-band | 18 September 2018 | HHHV GLCM Variance | N/A | 10 m |
| Sentinel-1 C-band | 18 September 2018 | VV & VH | Refined Lee filter | 10 m |
| Sentinel-1 C-band | 18 September 2018 | VV & VH | Multi-temporal (MT) filter/noise reduction | 10 m |
| Sentinel-1 C-band | 18 September 2018 | C11 & C22 | Covariance (CV) matrix noise reduction | 10 m |
| Sentinel-1 C-band | 18 September 2018 | C11 GLCM Variance | Covariance (CV) matrix noise reduction | 10 m |
| Sentinel-2A | 18 September 2018 | NDVI, NDWI | N/A | 10 m |
| Sentinel-2A | 21 September 2018 | NDVI, NDWI | N/A | 10 m |
| Sentinel-2B | 26 September 2018 | NDVI, NDWI | N/A | 10 m |
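For reference, the two spectral indexes in Table 1 are simple normalized band ratios. A sketch assuming Sentinel-2 band B8 (NIR) and B4 (red) for NDVI, and the McFeeters green/NIR formulation of NDWI with B3 (green); the paper's exact NDWI band choice should be checked against its methods section:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); Sentinel-2 bands B8 and B4."""
    return (nir - red) / (nir + red)

def ndwi(green, nir):
    """McFeeters NDWI = (Green - NIR) / (Green + NIR); bands B3 and B8."""
    return (green - nir) / (green + nir)
```

Both indexes are bounded in [-1, 1], which is why MAE values on the order of 0.08-0.21 (Tables 2 and 3) are directly comparable across dates.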
Table 2. Comparison of RF model performance for the different dataset combinations for both spectral indexes.
| Datasets | Index | R2 (±SD) | MAE (±SD) |
|---|---|---|---|
| S1 (C11/C22), UAVSAR, and texture | NDVI | 0.868 (±0.015) | 0.094 (±0.003) |
| S1 MTdn, UAVSAR, texture | NDVI | 0.866 (±0.015) | 0.094 (±0.004) |
| S1, UAVSAR, texture | NDVI | 0.859 (±0.018) | 0.097 (±0.004) |
| S1 (C11/C22), UAVSAR | NDVI | 0.849 (±0.013) | 0.100 (±0.003) |
| S1, UAVSAR | NDVI | 0.830 (±0.014) | 0.106 (±0.002) |
| S1 (C11/C22), texture | NDVI | 0.816 (±0.022) | 0.109 (±0.005) |
| S1 MTdn, texture | NDVI | 0.816 (±0.015) | 0.111 (±0.004) |
| S1 (C11/C22) | NDVI | 0.791 (±0.02) | 0.118 (±0.005) |
| S1, texture | NDVI | 0.789 (±0.022) | 0.119 (±0.004) |
| S1 MTdn | NDVI | 0.786 (±0.016) | 0.120 (±0.002) |
| UAVSAR, texture | NDVI | 0.768 (±0.021) | 0.118 (±0.004) |
| S1 | NDVI | 0.719 (±0.01) | 0.137 (±0.003) |
| UAVSAR | NDVI | 0.656 (±0.032) | 0.135 (±0.003) |
| S1 (C11/C22), UAVSAR, and texture | NDWI | 0.910 (±0.014) | 0.078 (±0.003) |
| S1 MTdn, UAVSAR, texture | NDWI | 0.904 (±0.015) | 0.079 (±0.004) |
| S1, UAVSAR, texture | NDWI | 0.902 (±0.017) | 0.080 (±0.004) |
| S1 (C11/C22), UAVSAR | NDWI | 0.884 (±0.014) | 0.086 (±0.003) |
| S1 (C11/C22), texture | NDWI | 0.877 (±0.018) | 0.090 (±0.004) |
| S1 MTdn, texture | NDWI | 0.870 (±0.014) | 0.092 (±0.004) |
| S1, UAVSAR | NDWI | 0.860 (±0.015) | 0.092 (±0.003) |
| S1, texture | NDWI | 0.854 (±0.018) | 0.098 (±0.003) |
| S1 (C11/C22) | NDWI | 0.840 (±0.016) | 0.100 (±0.005) |
| S1 MTdn | NDWI | 0.831 (±0.019) | 0.103 (±0.002) |
| UAVSAR, texture | NDWI | 0.784 (±0.02) | 0.106 (±0.003) |
| S1 | NDWI | 0.771 (±0.009) | 0.118 (±0.003) |
| UAVSAR | NDWI | 0.645 (±0.047) | 0.127 (±0.002) |
Table 3. Variation in RF-based predicted NDVI and NDWI accuracy based on the different seasons of imagery.
| Date | Dataset | Index | R2 (±SD) | MAE (±SD) |
|---|---|---|---|---|
| 15 June 2019 | S1 (C11/C22), texture | NDVI | 0.719 ± 0.018 | 0.142 ± 0.004 |
| 15 June 2019 | S1 (C11/C22), texture | NDWI | 0.774 ± 0.022 | 0.125 ± 0.004 |
| 10 January 2019 | S1 (C11/C22), texture | NDVI | 0.60 ± 0.036 | 0.211 ± 0.005 |
| 10 January 2019 | S1 (C11/C22), texture | NDWI | 0.609 ± 0.043 | 0.209 ± 0.005 |
| 18 September 2018 | S1 (C11/C22), texture | NDWI | 0.877 ± 0.018 | 0.090 ± 0.004 |
| 18 September 2018 | S1 (C11/C22), texture | NDVI | 0.816 ± 0.022 | 0.109 ± 0.005 |
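The R2 ± SD values reported in Tables 2 and 3 imply repeated evaluation; one common way to obtain such a mean and standard deviation is k-fold cross-validation, sketched here on synthetic data (the paper's exact resampling scheme is not shown in this excerpt):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))                    # stand-ins for SAR predictors
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
# Five folds -> five held-out R2 values, summarized as mean +/- SD
r2_scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
r2_mean, r2_sd = r2_scores.mean(), r2_scores.std()
```

The same call with `scoring="neg_mean_absolute_error"` yields the MAE ± SD column.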

Lasko, K. Gap Filling Cloudy Sentinel-2 NDVI and NDWI Pixels with Multi-Frequency Denoised C-Band and L-Band Synthetic Aperture Radar (SAR), Texture, and Shallow Learning Techniques. Remote Sens. 2022, 14, 4221. https://doi.org/10.3390/rs14174221
