1. Introduction
Soil moisture (SM) is an important state variable governing the partitioning of rainfall into runoff and water that infiltrates the soil. Although the water contained in soil is only a tiny fraction of all the water on the Earth, it influences important extreme events, such as floods and droughts [
1]. SM is existing soil water stored among soil particles and pores. Moreover, SM is a hydrometric factor that plays a crucial role in the exchange of water and energy in the land and atmosphere. Notably, SM has been studied in the agricultural field regarding plant growth, water resources field for rainfall-runoff, and meteorological field regarding interactions between the atmosphere and land [
2].
There are multiple ways to estimate SM, including in situ networks and satellite remote sensing. Traditional in situ measurements provide valuable information on SM at different soil depths. Many field techniques are available, such as oven drying, neutron probe, time/frequency domain reflectometry (TDR/FDR), and capacitance measurements [
3]. TDR, tension-measuring, and gravimetric methods are available to measure SM through ground observations. However, these methods are expensive and time consuming when used on large areas. Most allow point measurements, which provide information on the SM content at specific points only. In addition, reliability of the point data is also poor due to the short observation period as well as many missing data. Therefore, in situ SM is regarded as the true value of SM and is commonly used as a reference to validate remotely sensed SM retrieval [
4,
5,
6]. However, in situ SM data of point scale are difficult to use as spatial SM [
7,
8,
9] in large areas.
The estimation of spatial SM can be divided into direct measurement using microwave satellites and indirect measurement using land surface parameters related to SM without using microwave satellite data. First, microwave satellites are widely used because they allow the continuous observation of spatial SM over a wide area [
10,
11]. However, there are several limitations to directly and immediately using SM data based on satellites: (1) the relatively low spatiotemporal resolution of the data; (2) the only regional optimization algorithms for satellite soil moisture estimation are available from the National Aeronautics and Space Administration (NASA), European Space Agency (ESA), and Japan Aerospace Exploration Agency (JAXA); (3) the satellite data have not been calibrated and verified; and (4) radio interference. Nevertheless, many studies have been actively conducted to calibrate spatial SM based on satellite data compared to point-based observations of SM over wide areas [
12,
13,
14]. For example, using the AMSR satellite, Kim et al. [
15] obtained AMSR2 spatial SM data and performed a correction process using observed SM data. In addition, other studies have proposed microwave satellites SM downscaling method using MODIS and in situ data [
16,
17,
18,
19]. Second, spatial SM can be estimated from land surface parameters such as LST and NDVI without using microwave satellite data [
20]. Moreover, LST has a unique relationship with spatial SM [
21]. Examples of applications as non-microwave satellites using moderate resolution imaging spectroradiometer (MODIS) LST data are as follows. Cai et al. [
22] obtained SM retrieval form Terra MODIS L1B data. Kim et al. [
23] improved the spatial SM of AMSE-E through the integration of MODIS land surface temperature (LST), the enhanced vegetation index (EVI) and albedo. MODIS data have been successfully used to increase the accuracy of spatial SM estimates.
Although MODIS LST has been used in various ways, MODIS LST should be regenerated using gap-filling and correction processes, which can convert global data to region data. Notably, daily MODIS LST datasets (MOD11_L2) have missing values due to clouds and atmospheric conditions [
24]. In some previous studies, MODIS LST was corrected by spatial interpolation and geostatistical methods [
24,
25]. To overcome this issue, geostatistical methods (e.g., kriging, inverse distance weighting and spline methods) can be applied to ground measurements by matching the spatial scale with satellite-based products [
26,
27]. In rainfall research, various strategies for combining satellite-based and observed data have been widely used to overcome the limitation of areal representativeness of point scale measurements and the high variability in satellite-based datasets. This method, which is called “conditional merging (CM)” in Sinclari et al. [
28], uses the radar field to estimate the error associated with the ordinary kriging method based on rain gauges and to correct it [
29]. In general, previous studies that have used the CM method yielded reasonable results compared to ground-based measurements and exhibited improved spatial and temporal variability [
30].
The overall goal of this study was to estimate spatial SM based on MODIS LST and normalized difference vegetation index (NDVI) data via multiple linear regression (MLR) analysis (
Figure 1). The specific objectives of the study were as follows: (1) to correct MODIS LST using the CM method; (2) to develop the MLR model using corrected MODIS LST, MODIS NDVI data, and interpolated precipitation (PCP); (3) to generate the spatial distribution of SM using normalized regression coefficients; (4) to assess spatial SM estimates and (5) to compare the SM index (SMI) and standardized precipitation index (SPI) to evaluate the usability of spatial SM.
3. Results and Discussion
3.1. Corrected MODIS LST Data
The leave-one-out cross-validation method was used to assess the performance of the CM technique in predicting the LST values at ungauged locations. In this technique, observed LST stations are assumed to be nonexistent at given gauge locations, and a spatial interpolation technique (CM) is applied to obtain the values at these points. Then, the estimated values obtained from the CM technique are compared to the original values at all measurement locations. All observed daily LST data measured by 71 stations of KMA from January 2013 to December 2015 (
Figure 5).
For verifying LST by CM, some LST stations were assumed to be ungauged stations for verifying LST in green ellipses of
Figure 5. LST stations used for verification considered three conditions: (1) the locations in coastal and inland regions; (2) the proximity of the SM observation station; and (3) each land use. From that, 129, 192 and 238 stations are selected by coastal and agricultural area (129), coastal and pasture area (192), and inland and forest area (238) for verifying LST.
After excluding three stations, the CM technique was applied to calculate the corrected LST distribution. The coefficient of determination (
R2) values were high at all stations, including verified stations, and varied from 0.89 to 0.99. Overall, the calibration results at the three points excluded from the CM process showed good results between observed and simulated values (
Figure 6). Finally, monthly spatial distribution map of corrected LST was generated by the CM technique from 2013 to 2015.
Figure 7 shows monthly spatial distribution maps of corrected LST from April to September in drought years (2013–2015).
3.2. Optimal Regression Equation and Regression Coefficients
To determine the optimal regression scenario, the regression coefficients of MLR equations were assessed (
Table 4). Moreover, scenario 9 was additionally added to assess how the result of corrected MODIS LST would be affected in
R2. In the correlation analysis,
R2 was used to assess the results. The weighted effect of each regression coefficient on SM is given in
Table 4. Scenario 1 only used MODIS LST and PCP
n to estimate SM, and the resulting
R2 was 0.22. However,
R2 increased to 0.24 and 0.45 when LST, PCP
n, PCP
n-1, PCP
n-2, PCP
n-3, PCP
n-4, and PCP
n-5 were included. From this result, Scenario 8, which included nine independent variables, was selected as the optimal regression equation. Additionally, the
R2 of Scenario 9 decreased by 0.06 when original MODIS LST was applied, compared with the result of Scenario 8.
From an optimal regression equation, the regression coefficients of the MLR model were re-estimated based on the four soil classes (clay, loam, sand, silt) and four seasons (spring, summer, autumn, winter) considering the antecedent precipitation for up to five days. The regression coefficients of optimal scenario were divided into 16 regression coefficients over four seasons and four soil types (
Table 5). Each regression coefficient was presented with the
p-value to confirm the statistical significance, and the coefficient of determination (
R2) was used for correlation analysis. Overall, the
p-values of all the regression coefficients were less than 0.05, except for the autumn and winter regression coefficients of NDVI for silt, indicating statistical significance. The reason for the high
p-values of the regression coefficients of NDVI for silt in autumn and winter is that most of the observations corresponding to silt are provided by the RDA and the data are insufficient (data were available for a period of less than a year).
The MLR coefficients showed that R2 increased from 0.07 to 0.24 when we estimated the coefficient seasonally compared with the coefficients that were not estimated seasonally (0.22 to 0.43). These results indicate that the correlation between observed soil moisture and estimated SM was improved by considering seasonal and soil type patterns. Nevertheless, the reason for the low R2 of less than 0.5 was that the soil classification for each observation site was not complete. It was difficult to classify the soil classes of the RDA’s stations (from No. 19 to 58) because these stations were not stabilized; the stations have only been measuring SM since December 2014. In addition, the classification of 12 soil types into four soil types also contributed to the poor accuracy. Therefore, if we add more soil moisture observations or extend the observation period and refine the soil classification to more than four soil types, the simulation results would improve.
The seasonal regression coefficients showed the highest correlation in all four soil classes in spring and summer. Because of the characteristics of the monsoon climate in South Korea, where precipitation is concentrated in summer, there are many values that are effective for regression analysis based on antecedent precipitation, and it is estimated that the correlation is high. However, the reason for the low R2 of the four soil types in winter is that SM data in winter are associated with high uncertainty because the soil is frozen, which can cause instrument errors.
3.3. Soil Moisture by Regression Coefficients of Optimal Scenario
Using the optimal scenario, the accuracy of SM estimation was evaluated via comparison to observed SM at 58 stations. SM was estimated using the regression coefficients, and the accuracy with respect to the observed SM was verified by
R2 and root mean square error (RMSE) (
Table 6). The graphs between observed SM and estimated SM at major stations are illustrated in
Figure 8.
R2 and RMSE for all soil types ranged from 0.30 to 0.76 (R2) and 0.46% to 12.21% (RMSE). The overall R2 and RMSE were greater than 0.4 (R2), indicating a constant correlation. Most of the RMSEs were less than 5.0% (RMSE), but the RMSEs at Stations 1 and 2 were greater than 9.0% (RMSE). The main errors may have been associated with the artificial water supply. Unlike other stations, these two stations are located near upland crop and paddy field areas. Therefore, the observed SM was likely influenced by the agricultural water supply in addition to precipitation during the irrigation period from April to June.
Notably, R2 and RMSE in winter were poor, as illustrated by the many missing values, and uncertainty exists in observations due to freezing and mechanical errors in the soil. Thus, the prediction accuracy is low in winter due to the difficulty of establishing an appropriate regression model. Additionally, some observations with low correlations did not fit the soil properties. After refining the soil classes in further research, it is expected that the R2 and RMSE of these observatories, as well as those of the other observation sites, will increase because of the characteristics of the regression model. Currently, 43 of the 58 stations are occupied by loam and clay, and these 43 stations are calculated using one regression equation for each season. Therefore, as mentioned above, improvement of the soil classification is necessary.
3.4. Distribution of Estimated Soil Moisture
The SM distribution by soil type was estimated using normalized regression coefficients (
Table 7). The monthly distribution maps of spatial SM and spatial precipitation (PCP) were generated from 2013 to 2015 (
Figure 9). From the results of the monthly distribution map, spring drought was severe until March due to the absence of rain in 2013 and 2015. As seen from
Figure 9, there was no rainfall in other regions, except for northeastern South Korea, until January. Therefore, SM was relatively low compared with other regions. Because PCP was observed throughout South Korea from March to April, SM increased by 5–6%. However, SM in May decreased due to the absence of rain. These results show that SM depends on spatial PCP pattern.
In particular, in the June SM map, the SM in the western part of the Korean Peninsula sharply decreased because there was no rainfall for the three months after March from 2013 to 2015. This trend is due to the monsoon climate of the area. All areas of South Korea are affected by monsoons, and wet and dry seasons occur each year, with seasonal variations in precipitation. Usually, June through August (summer) is the wet season, and most of yearly rainfall occurs during this period. Approximately 30% of the annual rainfall occurs in the other 9 months.
3.5. Comparison of Drought Index
The SMI and SPI in 2015, an extreme drought year, are illustrated in
Figure 10. The SPI was extremely low (dry) in Gyeonggi and Gangwon Provinces (northern part of South Korea) on 1 January 2015. In March and May, SMI and SPI values show that drought was alleviated by rain. However, the SMI and SPI levels remained severe in Gyeonggi and Gangwon Provinces. This drought was resolved by large-scale rainfall events in July, and the SMI approached zero in southern regions where rainfall occurred. After July, the soil moisture naturally increased and decreased according to the precipitation. This pattern is consistent with the tendency of SPI. Coupling with SPI, SMI can be used as meteorological drought index in forested area and agricultural drought index in cultivation area.
4. Conclusions
This study estimated the spatial SM of South Korea from January 2013 to December 2015 using an MLR model and MODIS satellite data and evaluated the results by comparison with observed SM data at 58 stations. From the original MODIS LST data, daily spatial LST was corrected using CM technique. Additional satellite data (NDVI of Terra MODIS) were used to reflect vegetation variation. The observed precipitation measured from AWSs of the KMA considered during the simulation period was interpolated using the IDW method to match the spatial resolution of 1 km. Although the USDA textural classification, which divides soil into 12 classes, is one of the most widely used soil classification systems, the soil was classified into four types (loam, sand, clay and silt) based on the largest proportion of soil in South Korea. Finally, the regression coefficients of the MLR model were estimated seasonally considering the five-day antecedent precipitation. The primary results are summarized as follows:
The R2 of MODIS LST corrected by CM were between 0.83 and 0.99 at all LST stations. The results showed the values were generally accurate compared to the observed LST.
The p-values of all the regression coefficients were less than 0.05, except for a few coefficients of NDVI for silt, indicating statistical significance. The R2 values of the regression coefficients for the 4 soil classes were between 0.28 and 0.67. The reason for the low R2 values of less than 0.5 is that the soil classification for each observation site was not completely accurate.
The seasonal regression coefficients showed the highest correlation in all four soil classes in summer due to the characteristics of the monsoon climate in South Korea, where precipitation is concentrated in summer. There are many values that are effective for regression analysis based on antecedent precipitation.
When we simulated SM using the estimated regression coefficients, the overall R2 was greater than 0.4 at most observation sites (approximately 66%), except for some observations. Therefore, as mentioned above, improvements in soil and season classification are necessary.
For distributing spatial SM, normalized regression coefficients were estimated using min–max normalization. Normalization typically showed relationship which independent variable has the largest influence on the dependent variable in MLR analysis. In this study, the normalization was performed using min–max normalization.
In the spatial soil moisture distribution, simulated SM tends to increase and decrease with precipitation. This tendency is more clearly seen in the SMI map, where the SMI decreases from −2 to −3, indicating a weak drought. From March to April in 2014, PCP was observed throughout South Korea. Thus, SM also increased by 5–6%. These results showed that approximately 60% of the drought areas predicted by the SMI and SPI overlapped.
The result of the CM technique in this study showed that the accuracy of MODIS LST data improved by 20–30%. In regression analysis, the most important variables for estimating SM include 2-day, 3-day, 4-day and 5-day antecedent precipitation, LST and NDVI. SM exhibits spatially different patterns, even in areas with the same land use and soil characteristics. Overall, this study develops a high-resolution and accurate spatial distribution of SM for the first time based on satellite data. The results of this study showed that spatial resolution improved by 90% (10 km to 1 km) and R
2 increased by 62% (0.30 to 0.49) compared with the spatial resolution (10 km) and
R2 (0.30) of a previous study [
48] based on AMSR2 satellite data. Although machine learning methods such as MLR and artificial neural network (ANN) may achieve better accuracy, there are still some limitations. The MLR model is trained with a large number of samples, and the more training samples that are available, the better the model fits the data. This fact draws on a requirement for the abundance of historical satellite data and contemporary in situ SM observations. If the satellite data and in situ SM archive is not abundant enough, then the relation values cannot be fully represented by historical observation pairs [
49]. Notably, this study is the first to spatially estimate SM using MODIS LST corrected using the CM technique in South Korea. Therefore, this study provides a framework for accurate SM prediction in ungauged areas. Future research study could be improved if the soil classification is further subdivided and the soil moisture regression coefficient is simulated based on more observed data.