1. Introduction
Peatlands are characterized by persistent soil saturation at or near the surface [
1] and develop primarily in cool climates [
2]. They perform many important environmental functions, including the regulation of carbon and water cycling from local to global scales [
3,
4]. Hydrological parameters, such as surface soil moisture and depth to water table, control the rate at which atmospheric carbon is absorbed and released from northern peatlands [
5]. Therefore, measurements or simulations of soil moisture and water table depth are key inputs to carbon models [
6,
7,
8]. Such models often use point-scale measurements of these parameters, which are extrapolated across peatland landscapes. Measurement or interpolation errors can lead to unrealistic estimates of peatland hydrology and errors in estimates of greenhouse gas outputs [
7]. Furthermore, these point measurements are not practical across large areas and may not be valid at larger scales or over time [
9,
10,
11]. Techniques are required to bridge the gap between point-scale field observations and the much broader scale of observation required for decision making. Using remote sensing, information can be captured both at broader spatial scales and repeatedly over time.
Synthetic Aperture Radar (SAR) is an active microwave remote sensing technique that shows promise in the application of monitoring hydrologic conditions [
12,
13,
14,
15]. Microwave sensors are sensitive to the dielectric constant (permittivity) of their target (i.e., surface materials). As water has a very high dielectric constant (~80) [
16], soil surfaces that are wetter produce a stronger response in SAR than drier surfaces [
17]. This means that SAR should be sensitive to variation in surface soil moisture and therefore is often used to create soil moisture retrieval models. SAR is also sensitive to vegetation structure and water content, which can be confounded with the soil moisture component of the signal [
18].
Typical approaches used to estimate surface moisture conditions with SAR include empirical, semi-empirical, and physically based models [
11,
19]. The focus of this research is on the use of the more commonly used empirical approach. To build a robust model capable of accurately predicting soil moisture at locations or times outside the data used in the model, the full variability of potential conditions must be captured in the field data. This may require collecting observations that span the full range of hydrologic conditions occurring within the study site, thus requiring longer-term measurement campaigns. Statistical techniques allow multiple dates to be pooled into one large dataset, thereby increasing the sample size, such that a “global” model can be fit to explain variability over time and space. However, repeated observations at fixed locations can be temporally autocorrelated [
20], meaning that a measurement taken at (x,y)
i¸|T
j is similar to the measurement acquired at (x,y)
i¸|T
j+1 (where x,y represents the coordinate at sampling site i; T represents time and j represents sequential sampling campaigns). Ignoring inherent autocorrelation and treating these data as a random sample is termed “pseudoreplication in time” and can lead to artificially low standard errors and confidence intervals for estimated parameters and/or inflated Type I error [
21]. Quantifying temporal autocorrelation can be difficult when data are collected at irregular intervals and when sample sizes are small (i.e., a small number of sampling dates). However, in a case where data is measured repeatedly over time (sometimes referred to as repeated measures or longitudinal data) at the same locations, there will most certainly be some level of temporal autocorrelation.
If data collected on individual dates are not independent of each other, mixed-effects models can be used to account for pseudo-replication in time [
20]. These models contain both fixed effects (in this case, the independent remote sensing and landscape variables used to predict the dependent variable) and random effects; hence the name “mixed” effects. Either model slopes and/or intercepts can be allowed to vary randomly for “block” variables in the model, and resulting in residuals that are not dependent upon the hierarchical structure of the data (e.g., the
Subject or the specific location/pixel) thereby avoiding violations of independence in the data points (Laird and Ware, 1982). Block variables are categorical variables such as the class of an object (e.g., peatland class). In biological and remote sensing activities, the identification code for the location where measurements are repeatedly acquired can be considered a random block variable.
Nakagawa and Schielzeth [
22] showed that estimates of explained variance (i.e., R
2 values) can be generated for both the fixed and random effects of a mixed effects model. The marginal R
2 indicates the amount of variation explained by the fixed effects, and the conditional R
2 is the amount of variation explained by both the fixed and random effects together. Therefore, mixed effects models allow the addition of fixed effects (i.e., remote sensing data) to be assessed in relation to block variables, and can provide an understanding of the unique contribution of fixed effects to the overall predictive strength of the model.
The most common objective in remote sensing is to make predictions of a variable of interest at new locations across space or time, yet few authors have used mixed-effects models for this purpose. Of the few examples [
23,
24,
25,
26,
27], none of these examples include SAR or multi-sensor remote sensing, and none address the relative addition of remote sensing data as a fixed effect to the model. In this study, we test Linear Mixed Effects Models (LMEs) and specifically assess the contribution of multi-source remote sensing (SAR, Moderate Resolution Imaging Spectroradiometer (MODIS) and Light Detection and Ranging (LiDAR)) while accounting for the effect of peatland class, measurement sites, and dates of observation. The specific goal of this research was to build predictive models of soil moisture (volumetric water content; VWC) within a peatland across both space and time, with a focus on the ability to predict soil moisture in images not used for model fitting. In general, to produce a strong, valid model of a dependent variable, the unexplained variance must be minimized and the true “signal” maximized. SAR data is inherently noisy due to speckle, but also due to the effect of spatially and temporally variable vegetation and spatially variable surface roughness, making the effect of soil moisture on SAR backscatter (i.e., the “signal”) difficult to detect. Covariates are often measured in an attempt to explain other influences on the SAR signal (e.g., such as vegetation). We collected covariates from a variety of sources and at different scales (Drought Code, Day of Year, number of days since last rain, LiDAR vegetation density, MODIS Normalized Difference Vegetation Index (NDVI) composite data).
This study builds on previous work by Millard and Richardson. Millard and Richardson [
28], created an ecosystem map that has been used to spatially determine Class for each data point in our models and predictions. In Millard and Richardson [
29], it was found that spatial models of VWC based on imagery of a single date were generally poor, with several of the dates tested resulting in poor relationships between soil moisture (VWC) and SAR parameters. However, in that study field measured sample sizes on each date were small (
n = 32 on most dates, fewer on others) and VWC was often highly skewed. When data are pooled across all dates, the sample size is much larger (
n = 249, across 32 sites and seven dates) and the pooled VWC data are close to normal distribution. However, these data cannot be treated as independent as they are repeated measures, which violates an assumption of linear regression. New to this study, we assess the presence of temporal autocorrelation in field-measured VWC and document trials of monitoring VWC over both space and time using multi-sensor remote sensing data (SAR, MODIS, LiDAR) alongside covariates in linear mixed effects models. We assess the relative contribution of these remote sensing data as a fixed effect in relation to contribution of environmental covariates as random effects. Through this temporal analysis, spatial predictions of VWC were also created and all model predictions are validated to provide unbiased estimates of model error. The specific objectives of this research were:
Determine the strength and significance of temporal autocorrelation in field measured VWC in a temperate peatland;
Use linear mixed effects models to predict VWC from remotely sensed imagery over time;
Perform independent validation of linear mixed effects models to objectively quantify predictive accuracy of remote sensing-derived surface soil moisture.
2. Study Area
This research was conducted in a peatland, locally referred to as “Alfred Bog”, located near the town of Alfred, Ontario, Canada (
Figure 1); it is similar to the nearby, and more well-known Mer Bleue Bog (
Figure 1) which has been the subject of scientific examination and monitoring for over 20 years [
30,
31,
32,
33,
34]. Alfred Bog consists of several different types of peatland: (1) Shrub Bog: domed,
Sphagnum and shrub dominated bog. The large proportion of hummocks results in a very dry surface with little variability in wetness throughout the summer. The dome of the bog rises approximately 2.5 m above the rest of the peatland and approximately 8 m above the surrounding agricultural areas (
Figure 2). (2) Treed Bog: black spruce (
Picea mariana) treed bog is highly variable in wetness and vegetation height and density and (3) Fen: patterned poor-fen with strings (raised linear feature) and flarks (depressions that may seasonally form pools of water between strings) (
Figure 2). Ponding within the fen is highly variable within and between years based on local rainfall.
Alfred Bog is similar in vegetation and landscape characteristics to boreal peatlands and therefore the methods developed at this site could potentially be adapted to northern peatlands (
Figure 1) that are thought to be at risk of undergoing changes in hydrological regimes induced by climate change. For a more in-depth description of the study site, please see [
28,
29].
4. Results
Temporal autocorrelation analysis indicated that both VWC and SAR datasets exhibited moderate autocorrelation (ρ = 0.1–0.6 depending on the site) at a lag of 1 and 2 time intervals, with lag 1 always being greater than lag 2. At greater lag differences autocorrelation was somewhat reduced, but still moderate at many sites (e.g., ρ < 0.4).
Using the pooled data in linear mixed effects models generally did not lead to strong predictive power for soil moisture. The model-estimated RMSE and R
2 values were promising for all models, but independent validation indicated that these models were not strong predictors of VWC outside the data used to train them. In many cases, negative independently validated R
2 values were produced, indicating poor models. Negative explained variance is possible when the variation in the residuals is larger than the variance in the data used in creating the model. In the model reporting the highest R
2 (model A), calculation of explained variance using a random selection of data points indicated that this model did perform quite well, however, it is important to note that this is not an independent validation. All models resulted in a low marginal R
2 (the unique contribution explained by fixed effects) but higher conditional R
2 (the variance explained by fixed and random effects together). Model E (SAR alone as a fixed effect,
Class as random effect) had significantly lower marginal R
2 than model G (a similar model including LiDAR and MODIS, as well as SAR, as fixed effects), but the independent validation of model E indicated it was a better predictor. Additionally, while model G indicated relatively high marginal and conditional R
2, the independently validated R
2 indicated poor predictability. The model using both
Date and
Class as random effects (model C) and the model using remote sensing covariates (model G,
Class as a random effect), both produced the low (positive) average independently validated RMSE (
Table 2).
The maps of predicted VWC confirm that model results are highly dependent on random terms (
Figure 4). Where
Class was used as a random effect, the pattern of the three classes is easily identified in the predicted maps. In model C, the standard deviation of the intercept for
Class (σ = 18.7) is much larger than the standard deviation for the
Date intercept (σ = 11.3) indicating that there are more differences between the classes than there are between dates. In Model A, where
Site,
Date and
Class were all used as random effects,
Site had the smallest standard deviation (σ = 67) and
Class (σ = 409) the largest (with the standard deviation of
Date = 188). This indicates that the measurements collected at each site over time exhibit lower variability than those collected across sites on a specific
Date, and the largest variability occurs within each of the given
Class land cover types.
5. Discussion
This research addresses variability in peatland soil moisture across both space and time. While previous studies have generally focused on spatially predicting soil moisture using remotely sensed data of a single date, the aim was to build a model that could predict soil moisture using images acquired in the future (i.e., for monitoring soil moisture spatially and temporally). Throughout the literature, several authors have used empirical models to predict spatial and temporal variation in soil moisture from SAR data. Millard and Richardson [
29] summarized the literature (see [
29]
Table 1) and reported R
2 values ranging from 0–0.93 for relationships between SAR and soil moisture, however, none of these studies addressed temporal autocorrelation. More specifically to peatland environments and applications, Jacombe et al. [
14] found relationships ranging from R
2 0.0–0.53, and Millard and Richardson [
29] produced models ranging from R
2 0.14 and 0.66. Similar to these previous studies, the research presented here resulted in wide-ranging R
2 values (from R
2 = 0.18–0.89), but through independent validation it was determined these estimates of explained variance were inflated and resulted in R
2 ranging from 0–0.30 (independently validated using a with-held date). The literature does not indicate common use of mixed effects models to account for repeated measures issues and non-independence in remote sensing or field data used to create models from remote sensing data. Of the examples found, both [
24,
28] calculated pseudo-R
2 values but did not compute the R
2 components associated with the fixed and random effects. Neither of these studies used
Class-specific measurements but used mixed models to account for autocorrelation in repeated measurements. This could be explained by the relatively new ability to measure marginal and conditional explained variance [
22] in common statistical analysis packages [
45].
The ability to predict a physical variable from an independent image that is not used in building the predictive model is an important problem in remote sensing, as being able to do so allows remotely monitor landscapes. However, we have demonstrated here that there are several challenges to using the empirical methods that are traditionally used in non-temporal analysis and important aspects of the data that must be taken into consideration.
5.1. Temporal Autocorrelation in Repeatedly-Measured Data
In practice, the field-measured data used in building models for prediction of the dependent variable over time are often collected on only a few dates, and detecting temporal autocorrelation within these data may be difficult. We demonstrate here that temporal autocorrelation can be present in data, even if it is difficult to visually identify through traditional plotting of time series data. However, it should be assumed that if data are collected repeatedly at the same locations over time there is likely to be autocorrelation between the data points, and mixed effects models are one method that can be used for prediction or the assessment of relationships. Other authors have built models using data that were collected repeatedly over time at the same locations without an assessment of temporal autocorrelation (e.g., [
46]); therefore, estimates of the significance of these models may be inflated [
21].
5.2. New Insights Using Mixed Effects Models
In previous research [
29], we used more traditional empirical modeling (bivariate linear regression) to assess a similar set of VWC and remote sensing data to that used here. That approach was restrictive, in that data collected from each date must be assessed independently of all other dates, resulting in small sample sizes and skewed residuals in models. It was hypothesized that by pooling the data (thereby increasing sample size and incorporating data from a wider range of values in a single model) that the predictive strength would increase. In order to do so, the mixed effects approach was used to account for non-independence in the temporal data. This approach also allowed the assessment of the contribution of different remote sensing datasets to predictive strength, which is a fundamentally important exercise in the promotion and advancement of the operational use of remote sensing for environmental monitoring. In addition, this approach allowed the contribution of different categorical variables and environmental covariates to be quantified.
The mixed effects approach also allowed the different effects of block variables to be assessed, which cannot be done using traditional linear modelling approaches. Using the Subject (in this case Site, or Class of measurement) as a random effect in models allows any non-independence between the measurements of each subject to be accounted for. The Date the measurement was acquired can also have a profound effect on the measurements acquired due to subject-wide variability in climate and weather on each specific date. Here, Date was found to be an important predictor, which highlights the importance of accounting for the variable relationships that may occur on different dates. However, the intercept of Class had a higher standard deviation than Date, meaning that there are greater differences in peatland soil moisture between Class than between Date, and measurements collected over time will exhibit lower variability than those collected across sites on a specific date.
Assessment of the marginal and conditional R
2 of the different models allowed the relative components of the different variables to be assessed. Although the SAR polarimetric parameter (Freeman Durden Power due to Rough Surface Scattering) is theoretically related to surface conditions including soil moisture, this variable explained little variance in all models as compared to the random effect block factors (
Site,
Class,
Date). All models resulted in a low marginal R
2 (the variance explained by fixed effects) but high conditional R
2 (the variance explained by fixed + random effects). This means that the fixed effects (remote sensing parameters: SAR, MODIS, LiDAR) explain little variability in the models and most of the variability in VWC can be attributed to differences in VWC between
Dates,
Classes or in the
Site (depending on the specific model generated). Remote sensing data are fixed effects model parameters, but as they allow spatial and temporal prediction and large scale monitoring, are not without value. However, this highlights a larger problem with the use of SAR to predict VWC in peatlands at the pixel level. In SAR data, the within-class variability is often high due to the effect of SAR speckle [
47]. But, in the field-measured soil moisture, the between-class variability is often greater than the within-class variability over time, making temporal prediction difficult. This also means that by simply knowing the class and date that a data point or pixel belongs to, a reasonable estimate of the VWC can be predicted at any given point regardless of the fixed effect (SAR) value. However, this is dependent on the requirement of field data to inform the model of the mean VWC of each class.
Environmental covariates explained a relatively large proportion of variability in models (indicated by high marginal R
2). While these variables are “site-wide”, their addition could explain weather-specific or season-specific trends in time series data that may not be evident due to spatial variability within each SAR image. This is important, as data collected from a meteorological station nearby could explain a hidden temporal component. Future studies should look at a larger time series of data. For example, data spaced more frequently in time or a dataset that spans several growing seasons could capture more of the variability in hydrologic and weather conditions (see
Section 5.3.3 below). Remote sensing covariates (MODIS and LiDAR) also explained a relatively large proportion of the variability in models where they were used. Without SAR data, LiDAR and MODIS together resulted in a marginal R
2 = 0.14. The addition of SAR data to models containing both the remote sensing covariates and environmental covariates resulted in the highest marginal R
2 (0.51). However, none of these models indicated positive independently validated R
2. SAR and the remote sensing covariates alone did result in a low but positive independently validated R
2, which shows promise in the use of these covariates for soil moisture monitoring.
5.3. Limitations
5.3.1. Small Sample Size
As in most research that requires field data collection, the number of samples collected for this analysis was a limiting factor. In a previous study [
29], where single date models were created (9 models, each model’s
n ≤ 32), it was also found that sample size was as an issue relating to poor model strength. It was hypothesized that pooling data from several dates (i.e., increasing sample size and variability in the overall sample) would lead to a stronger model. Although in this research 249 site measurements were collected, these were not statistically independent, and therefore the analysis that could be performed was somewhat restricted. For example, LMEs allow both random slopes and intercepts to be modeled. We used peatland Class as our subject, and allowed each class to be modeled with a different intercept. Since fen sites were generally wetter than bog sites, this difference in wetness can be captured by allowing variable intercepts per class. The size of the dataset used here did not allow for random slopes to also be modeled due to low degrees of freedom. The slope of relationship between soil moisture and SAR may also vary between classes due to variable standing water and saturated conditions between these two classes on any given date and over time. We also believe that sample size led to overfit models because of the spatial variability in the SAR data and the variability in temporal environmental conditions between the image acquisition dates. This resulted in a large divergence between model R
2 and independently validated R
2 and stresses the importance of reporting independently validated model results.
5.3.2. Independent Validation with Random Effects
As was demonstrated in
Table 2, it can be difficult to independently validate the predictions of linear mixed effects models. Where a random effect is included in a model, each unique instance of that variable must also exist in the predicted data, or else the globally estimated parameters will be used. Therefore, when
Date is used as a random effect, it is not ideal to independently validate models based on a withheld date. Using a withheld
Site will allow a spatial estimate of the model strength to be independently assessed but the validation dataset will contain all dates that data were collected at that site and, therefore, the independent validation does not provide insight into the model’s ability to predict temporally (i.e., it cannot be used to predict VWC from imagery on dates not used to create the model). Also, when
Date,
Class and
Site were together used as random effects in a single model, it is impossible to truly independently validate as a random sample of data points must be used.
Additionally, since temporal autocorrelation is inherent in data collected at the same locations over time, using a withheld date to validate may not truly represent an independent validation. The only way to properly independently validate would be to fit a model using data from one study area and use it to predict the dependent variable at another completely independent study area at a different time, thereby avoiding both spatial and temporal autocorrelation. This makes it difficult to use these types of models to monitor a dependent variable such as soil moisture using remotely sensed imagery.
5.3.3. Temporal Prediction of VWC
The spatial and temporal differences of between- and within-class variability are much more pronounced in the field data than the SAR data. In general, the SAR data are noisy and show considerable overlap between classes on specific dates and between all classes over time. This could be a result of speckle in the SAR data, although the processing methods reduced speckle significantly. Additionally, data collection was restricted to a single growing season and we acquired images and field data every 24 days. This was done because the same incident angle is only acquired every 24 days with RADARSAT-2 and we aimed to exclude the effect of variable SAR incident angles and pass directions. Therefore, each date that we collected data exhibited somewhat different hydrological and vegetation conditions than the other dates. One way to avoid this is to collect SAR data more frequently (as will be available with the upcoming Radarsat Constellation Mission) or to collect data over more than one growing season. This may also reduce the importance of Date as a random effect and increase the importance of environmental variables as fixed effects. Using Date as a random effect allows each date to have a variable intercept which could explain any offsets seen between images due to weather and lighting (i.e., in passive optical imagery only) conditions. However, as similar weather and lighting conditions are recorded more than once in the dataset, the specific date will become less important and the environmental phenomenon causing these differences will become more important.
Although Root Mean Square Error of models to predict VWC was sometimes high (e.g., RMSE was >20%), predicted VWC follows similar temporal trends over time as measured VWC when Date is included as a random effect in the model. Therefore, with a dataset that captures more temporal variability in soil moisture and vegetation conditions, independently validated R2 should reach levels similar to conditional R2. In that case, at sites where the peatland class is known, models based on SAR backscatter could be used to produce spatially variable estimates of VWC on a given date, but only if some monitoring data are available on each image acquisition date. In practical terms, after a significant data collection campaign to capture conditions both spatially and temporally (similar to the data collection here), prediction on future dates requires the installation of self-logging, in-situ VWC sensors. While this may be useful in smaller and more accessible peatlands such as Alfred Bog and Mer Bleue, it is not practical in more remote and expansive peatland complexes.
6. Conclusions
We quantified the spatial and temporal variability of soil moisture and SAR data in a north-temperate peatland complex and assessed predictive accuracy of empirical soil moisture retrieval and its potential for operational monitoring of peatland hydrology. Linear mixed effects models (LMEs) were used to build predictive models of soil moisture, as an approach to overcome temporal autocorrelation in soil moisture conditions. LMEs allow the effects of data collected at the same locations across various dates (i.e., repeated measures) to be accounted for, as well as the effects of landcover class.
Non-independence of repeated measures of field data in time series data is not well addressed in the remote sensing literature, and we demonstrate the use of mixed effects models to quantify the relative contribution of both remotely sensed data and landscape-level environmental covariates. Our findings indicate that LMEs are appropriate to address this prevalent issue. Although the models resulted in high R2 values (e.g., R2 = 0.24–0.89), most of the variability was explained through random effects (block factors of Site, Date, or Class), and fixed effects (SAR data) contributed comparatively less to the models (marginal R2 = 0.01–0.07 for SAR in temporal models). Landscape-level covariates explained much of the variability in models (i.e., marginal R2 increased to 0.16 when these variables were included), but because these data vary only temporally but not spatially throughout the peatland, their explanatory power cannot be assessed using traditional modelling methods. Remotely sensed data provides both a spatial and temporal component to the prediction of VWC which are invaluable for ecosystem monitoring, provided the issue of statistical non-independence is adequately addressed within the modelling approach.