1. Introduction
Coarse-resolution satellite data obtained, for example, from the Advanced Very High Resolution Radiometer (AVHRR) [1], Système Pour l'Observation de la Terre (SPOT) Vegetation (VGT) [2] and the Moderate Resolution Imaging Spectroradiometer (MODIS) [3] are widely used in areas such as land cover and land use mapping [4,5], crop mapping and yield forecasting [6,7], global change [8], vegetation trend and phenology estimation [9,10], disaster monitoring [11,12,13,14], atmospheric environment monitoring [15,16,17] and water environment monitoring [18]. The return cycle of these satellites is one to two days, making them suitable for dynamic monitoring of land surface processes; AVHRR in particular provides the longest time series among global satellite measurements [19]. However, the spatial resolutions of these data are no finer than 250 m. When land objects are smaller than the spatial resolution of the images acquired by these sensors, the recorded signals are often a mixture of different land cover types, which makes these data difficult to apply to high spatial resolution surface process monitoring. Medium spatial resolution satellite data, such as data from Landsat and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), can also be used for dynamic monitoring of land surface processes. However, because of the long return cycles of these satellites (16 days or more) and the influence of clouds, the rate at which they can obtain useful data is very low [20], and it is difficult to use their data to monitor rapid changes in land surface processes. Therefore, these satellite data are often used only for annual dynamic analyses, including the spatiotemporal dynamic analysis of ecosystems such as wetlands [21,22], forests [23,24,25], water [26], crops [27] and cities [28], for monitoring plant phenology [29] and for land management [30,31]. There is a lack of satellite data with spatial and temporal resolutions high enough to monitor rapid changes in land surface processes.
A solution to this problem is to combine coarse and medium spatial resolution satellite data to generate synthetic satellite data with high spatial and temporal resolutions. This approach is known as spatial and temporal data fusion, and several such methods have recently been proposed. Gao et al. [32] introduced the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) for blending MODIS and Landsat imagery. Several studies have applied STARFM, mainly in coniferous areas, for urban environmental variable extraction, vegetated dry-land ecosystem monitoring, public health studies and daily land surface temperature generation [33,34,35,36,37]. Zhu et al. [38] proposed an enhanced STARFM (ESTARFM) for complex heterogeneous regions. Emelyanova et al. [39] assessed the accuracy of STARFM and ESTARFM for two landscapes with contrasting spatial and temporal dynamics. Jarihani et al. [40] evaluated the accuracy of STARFM and ESTARFM in downscaling MODIS indices to the spatial resolution of Landsat. Other researchers have proposed methods based on linear mixing models [41,42,43]. Wu et al. [44] proposed the Spatial and Temporal Data Fusion Approach (STDFA), which calculates the surface reflectance of fine-resolution pixels from the mean reflectance of each land cover class, disaggregated using unmixing methods. They also applied this method to the estimation of high spatial and temporal resolution leaf area index [45] and land surface temperature [46] data. Gevaert and García-Haro [47] compared STARFM with an unmixing-based algorithm and concluded that STARFM and ESTARFM are more suitable for complex heterogeneous regions, whereas unmixing-based methods such as STDFA are more suitable for downscaling the spectral characteristics of the medium-resolution input imagery [47].
The spatial and temporal data fusion approaches proposed to date have mainly focused on the fusion of Landsat and MODIS data. However, with the recent launch of new satellites, these methods need to be validated for the new sensors. In recent years, China has launched two moderate-resolution satellites, the Huanjing satellite (HJ) and the Gaofen satellite no. 1 (GF-1). Wei et al. [48] compared the data quality of the HJ charge-coupled device (CCD) and Landsat Thematic Mapper (TM) sensors and found that the radiometric accuracy, clarity and signal-to-noise ratio (SNR) of the HJ CCD data were lower than those of the Landsat TM data. Such validation is particularly important for Chinese satellite data because of this lower data quality.
To address this problem, the objectives of the present study are: (1) to validate the applicability of ESTARFM and STDFA to HJ and GF satellite data and (2) to analyse the influence of MODIS data of different spatial resolutions on the application of ESTARFM and STDFA.
Figure 1. Locations of the study areas.
3. Methods
3.1. Model Introduction
3.1.1. ESTARFM
ESTARFM was proposed to improve the STARFM algorithm for the accurate prediction of surface reflectance in heterogeneous landscapes, using the observed reflectance trend between two points in time and spectral unmixing theory [38]. According to the linear mixture model, the change in coarse-resolution reflectance from $t_0$ to $t_k$ can be expressed as:

$$C(t_k) - C(t_0) = a \sum_{i=1}^{M} f_i \left[ F_i(t_k) - F_i(t_0) \right] \quad (1)$$

where $C(t_0)$ and $C(t_k)$ are the coarse-resolution reflectances at times $t_0$ and $t_k$, $F_i(t_0)$ and $F_i(t_k)$ are the fine-resolution reflectances of the $i$-th land cover class at times $t_0$ and $t_k$, $M$ is the total number of endmembers, $f_i$ is the fraction of the $i$-th land type and $a$ is the sensor calibration coefficient. ESTARFM assumes that the change in the reflectance of each land type is linear with time:

$$F_k(t) = F_k(t_0) + h_k (t - t_0) \quad (2)$$

where $F_k(t)$ is the fine-resolution reflectance of the $k$-th endmember at time $t$ and $h_k$ is its rate of change, which is assumed to be constant from time $t_0$ to $t_k$. Then, Equations (1) and (2) can be rewritten as:

$$F_k(t_k) - F_k(t_0) = h_k (t_k - t_0) \quad (3)$$

$$C(t_k) - C(t_0) = a (t_k - t_0) \sum_{i=1}^{M} f_i h_i \quad (4)$$

Substituting Equation (3) into Equation (4), the ratio $v_k$ of the change in reflectance of the $k$-th endmember to the change in reflectance of a coarse pixel can be described as:

$$v_k = \frac{F_k(t_k) - F_k(t_0)}{C(t_k) - C(t_0)} = \frac{h_k}{a \sum_{i=1}^{M} f_i h_i} \quad (5)$$

Equation (5) can be rewritten as:

$$F(x, y, t_k) = F(x, y, t_0) + v(x, y) \left[ C(x, y, t_k) - C(x, y, t_0) \right] \quad (6)$$

where $(x, y)$ is the position of the target pixel.
By introducing additional information from the neighbouring pixels to reduce the influence of land cover changes, surface heterogeneity and solar geometry/bi-directional reflectance distribution function (BRDF) changes, a weighted ESTARFM prediction can be determined as Equation (7):

$$F(x_{w/2}, y_{w/2}, t_k) = F(x_{w/2}, y_{w/2}, t_0) + \sum_{i=1}^{k} W_i \, v_i \left[ C(x_i, y_i, t_k) - C(x_i, y_i, t_0) \right] \quad (7)$$

where $w$ is the size of the search window; $W_i$ is the weight determined by the spectral difference $S_i$ and the temporal difference $T_i$ between the fine- and low-resolution data, and by the location distance $d_i$ between the target pixel $(x_{w/2}, y_{w/2})$ and the candidate pixel $(x_i, y_i)$; and $k$ is the number of similar pixels $(x_i, y_i)$ in the window $w$. These parameters are calculated as follows:

$$S_i = \left| F(x_i, y_i, t_0) - C(x_i, y_i, t_0) \right| \quad (8)$$

$$T_i = \left| C(x_i, y_i, t_0) - C(x_i, y_i, t_k) \right| \quad (9)$$

$$d_i = 1 + \frac{\sqrt{(x_{w/2} - x_i)^2 + (y_{w/2} - y_i)^2}}{w/2} \quad (10)$$

$$W_i = \frac{1 / (S_i \, T_i \, d_i)}{\sum_{i=1}^{k} 1 / (S_i \, T_i \, d_i)} \quad (11)$$
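To make the weighted prediction concrete, the following is a minimal Python sketch of Equation (7) for a single target pixel, assuming the conversion coefficients $v$, the normalised weights $W_i$ and the positions of the similar pixels have already been computed and that all images are co-registered on the fine-resolution grid; the function and variable names are illustrative and not taken from the original ESTARFM implementation.

```python
import numpy as np

def estarfm_predict_pixel(fine_t0, coarse_t0, coarse_tk, v, weights,
                          sim_rows, sim_cols, target):
    """Sketch of the weighted ESTARFM prediction in Equation (7) for one target pixel.

    fine_t0, coarse_t0, coarse_tk : 2-D reflectance arrays resampled to the fine grid
    v                             : 2-D array of conversion coefficients (Equation (5))
    weights                       : 1-D normalised weights W_i of the similar pixels
    sim_rows, sim_cols            : indices of the similar pixels in the search window
    target                        : (row, col) of the pixel being predicted
    """
    # temporal change of the coarse-resolution reflectance at every similar pixel
    delta_coarse = coarse_tk[sim_rows, sim_cols] - coarse_t0[sim_rows, sim_cols]
    # weighted sum of W_i * v_i * (C(t_k) - C(t_0)) over the similar pixels
    correction = np.sum(weights * v[sim_rows, sim_cols] * delta_coarse)
    r, c = target
    return fine_t0[r, c] + correction
```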
3.1.2. STDFA
STDFA is based on a linear mixing model, which assumes that the reflectance of each coarse spatial resolution pixel is a linear combination of the responses of each land cover class contributing to the mixture [49]:

$$C(t) = \sum_{i=1}^{M} f_i \, \bar{r}_i(t) + \xi(t) \quad (12)$$

where $C(t)$ is the coarse-resolution reflectance at time $t$, $f_i$ is the fraction of land cover class $i$ within the coarse pixel, $\bar{r}_i(t)$ is the mean reflectance of land type $i$ at time $t$ and $\xi(t)$ is the residual error term. By inputting the fraction data $f_i$ extracted from the land cover map, the mean reflectance of land cover class $i$ can be calculated by solving Equation (12) using the ordinary least squares technique. Then, based on the assumption that the temporal variation of every fine-resolution pixel in the same class is the same, STDFA predicts the synthetic high spatial resolution imagery as:

$$r(x, y, t_k) = r(x, y, t_0) + \left[ \bar{r}_i(t_k) - \bar{r}_i(t_0) \right] \quad (13)$$

where $r(x, y, t_0)$ and $r(x, y, t_k)$ are the fine-resolution reflectances of pixel $(x, y)$, belonging to class $i$, at times $t_0$ and $t_k$.
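The two STDFA steps described above (unmixing by ordinary least squares, then adding the per-class temporal change to the base image) can be sketched in Python as follows; the class-fraction matrix, class map and function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def class_mean_reflectance(fractions, coarse_reflectance):
    """Solve the linear mixing model of Equation (12) by ordinary least squares.

    fractions          : (n_coarse_pixels, n_classes) class-fraction matrix f_i
    coarse_reflectance : (n_coarse_pixels,) coarse reflectance for one band and date
    returns            : (n_classes,) mean reflectance of each land cover class
    """
    mean_refl, *_ = np.linalg.lstsq(fractions, coarse_reflectance, rcond=None)
    return mean_refl

def stdfa_predict(fine_t0, class_map, mean_t0, mean_tk):
    """Sketch of Equation (13): add each class's temporal change to the base image.

    fine_t0          : 2-D fine-resolution reflectance at t0 (base image)
    class_map        : 2-D integer array of class labels (0 .. n_classes-1)
    mean_t0, mean_tk : (n_classes,) class mean reflectances unmixed at t0 and tk
    """
    change = np.asarray(mean_tk) - np.asarray(mean_t0)  # per-class temporal change
    # every fine pixel of a class receives the same change, per the STDFA assumption
    return fine_t0 + change[class_map]
```

In practice the unmixing in Equation (12) would be solved separately for each band and each MODIS acquisition date before the prediction step is applied.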
3.2. Model Application
According to Equation (7), three images are needed for ESTARFM: a fine-resolution image acquired at time $t_0$, called the base image, and two low-resolution images acquired at times $t_0$ and $t_k$, called the time series low-resolution data. Two pairs of fine- and low-resolution data acquired at times $t_0$ and $t_l$ are also required to calculate the spectral similarity index. According to Equation (12), three images are needed for STDFA: a fine-resolution image acquired at time $t_0$, called the base image, and two low-resolution images acquired at times $t_0$ and $t_k$, called the time series low-resolution data. Two fine-resolution images acquired at times $t_0$ and $t_l$ are also required for classification. The outputs of ESTARFM and STDFA are synthetic fine-resolution data at time $t_k$. In this study, HJ-1 CCD and GF-1 WFV data acquired on 3 October 2013 were used as the base images. MODIS images acquired on 3 October 2013 and 7 October 2013 were used as the time series low-resolution data. Two pairs of HJ-1 CCD and MODIS data, or GF-1 WFV and MODIS data, acquired on 3 October 2013 and 15 October 2013 were used to calculate the spectral similarity index for ESTARFM. Two HJ-1 CCD or GF-1 WFV images, acquired on 3 October 2013 and 15 October 2013, were used for classification in STDFA. The outputs of ESTARFM and STDFA are synthetic HJ-1 CCD and GF-1 WFV data for 7 October 2013. These input and output roles are summarised in the sketch below.
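For concreteness, the input and output roles described above can be written as a simple configuration; the dates and roles are taken from the text, while the dictionary layout and key names are only illustrative.

```python
# Input and output roles used in this study (HJ-1 CCD case; the GF-1 WFV case is
# identical with GF-1 WFV scenes in place of the HJ-1 CCD scenes). Key names are
# illustrative only.
fusion_inputs = {
    "base_image":          "HJ-1 CCD, 3 October 2013",
    "time_series_low_res": ["MODIS, 3 October 2013", "MODIS, 7 October 2013"],
    # ESTARFM: two fine/low pairs for the spectral similarity index
    "similarity_pairs_estarfm": ["HJ-1 CCD + MODIS, 3 October 2013",
                                 "HJ-1 CCD + MODIS, 15 October 2013"],
    # STDFA: two fine images for classification
    "classification_images_stdfa": ["HJ-1 CCD, 3 October 2013",
                                    "HJ-1 CCD, 15 October 2013"],
    "output":              "synthetic HJ-1 CCD, 7 October 2013",
}
```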
3.3. Validation of Results
Since the objective of the ESTARFM and STDFA methods is to generate synthetic fine-resolution data, actual HJ-1 CCD and GF-1 WFV data acquired on 7 October 2013 were used to validate the algorithms, following the methods proposed by Wu et al. [44]. First, the results were qualitatively evaluated using visual interpretation: the greater the similarity between the synthetic and actual fine-resolution data, the higher the accuracy of the model. Second, the results were quantitatively evaluated using correlation analysis. Parameters such as the correlation coefficient (r), variance (Var), mean absolute difference (MAD), bias and root mean square error (RMSE) were calculated to quantitatively evaluate the precision of these models. A higher value of r and lower values of Var, MAD, bias and RMSE indicate higher accuracy.
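A minimal sketch of this quantitative evaluation is given below, assuming the synthetic and actual bands are co-registered arrays restricted to valid pixels; because the paper does not spell out the formulas, the usual definitions are assumed, with Var taken as the variance of the pixel-wise differences.

```python
import numpy as np

def validation_metrics(actual, synthetic):
    """Compare a synthetic band with the actual band acquired on the prediction date."""
    a = np.asarray(actual, dtype=float).ravel()
    s = np.asarray(synthetic, dtype=float).ravel()
    diff = s - a
    return {
        "r":    np.corrcoef(a, s)[0, 1],      # correlation coefficient
        "Var":  np.var(diff),                 # variance of the differences (assumed definition)
        "MAD":  np.mean(np.abs(diff)),        # mean absolute difference
        "bias": np.mean(diff),                # mean difference (synthetic minus actual)
        "RMSE": np.sqrt(np.mean(diff ** 2)),  # root mean square error
    }
```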
3.4. Accuracy Comparison for the Fusion Results Using 250 m and 500 m MODIS Data
There are two daily MODIS surface reflectance products: MOD09GQ (250 m) and MOD09GA (500 m). Both can be used in spatial and temporal data fusion, but it remains unclear whether the difference in spatial resolution between these two products affects the fusion accuracy. To answer this question, we applied STDFA and ESTARFM using MOD09GQ data in Kuche and Luntai and compared the fusion results with those obtained using MOD09GA data. The correlation analysis method was used to evaluate the similarity between the actual fine-resolution data and the synthetic fine-resolution data generated from MOD09GQ and MOD09GA inputs. By comparing the r, Var, MAD, bias and RMSE values, we can analyse the influence of the spatial resolution difference between these two products.
3.5. Comparison of Actual NDVI and NDVI Calculated Using Synthetic Data
The synthetic red- and near-infrared (NIR)-band data were generated using STDFA and ESTARFM in Kuche and Luntai, allowing Normalized Difference Vegetation Index (NDVI) images for these two study areas to be calculated. The correlation analysis method was then used to evaluate the similarity between the NDVI image calculated from the synthetic fine-resolution data and the NDVI image calculated from the actual fine-resolution data.
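A short sketch of this comparison is shown below, assuming red and NIR reflectance arrays from the actual and synthetic images and reusing the hypothetical validation_metrics helper sketched in Section 3.3.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + 1e-10)  # small constant guards against division by zero

# The NDVI images from synthetic and actual data are then compared with the same
# correlation analysis as in Section 3.3, e.g.:
# metrics = validation_metrics(ndvi(red_actual, nir_actual), ndvi(red_syn, nir_syn))
```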
5. Discussion
This study applied and demonstrated the use of ESTARFM and STDFA for combining HJ CCD, GF-1 WFV and MODIS data to generate synthetic high spatial resolution data. Although the quality of HJ CCD data is lower than that of Landsat data, both the ESTARFM and STDFA algorithms can generate daily synthetic high spatial resolution data accurately, with r values higher than 0.8643 and RMSEs lower than 0.0360. As the MODIS sensor acquires daily images, ESTARFM and STDFA can be used to increase the proportion of useful HJ CCD and GF-1 WFV data. For example, the proportion of useful MODIS data in Luntai for 2013 was 48.49%, while the proportions of useful HJ CCD and GF-1 WFV data were only 15.07% and 10.29%, respectively. Owing to the resulting higher proportion of useful data, this method is potentially useful for high spatial resolution environmental process monitoring and can be applied in natural resource damage assessments and in environmental policy and management. However, some issues should be addressed in the application of this method:
(1) Influence of satellite data quality. The test areas for the HJ-MODIS and GF-MODIS fusions in Kuche were the same, while in Luntai the test area for the GF-MODIS fusion was slightly larger than that for the HJ-MODIS fusion. Therefore, the influence of satellite data quality on the data fusion can only be analysed for the Kuche area. As an early Chinese moderate-resolution satellite, HJ provides data of lower quality than GF-1. The positional error of HJ CCD data is greater than 1 km, while that of GF-1 WFV data is less than 100 m. Furthermore, the SNR of HJ CCD data is much lower than that of GF-1 WFV data.
Table 8 lists the differences between Table 5 and Table 4, showing the influence of the sensor differences on the model accuracy. From Table 8, we can see that the influence of the sensor differences is less significant for STDFA than for ESTARFM, because STDFA has better noise immunity. Similar results were found for the fusion of ASTER and MODIS land surface temperature products [46].
Table 8. Influence of sensor differences on model accuracy.
| Parameters | STDFA Blue | STDFA Green | STDFA Red | STDFA NIR | ESTARFM Blue | ESTARFM Green | ESTARFM Red | ESTARFM NIR |
|---|---|---|---|---|---|---|---|---|
| r | 0.0165 | 0.0045 | 0.0018 | 0.0283 | 0.0206 | 0.0104 | 0.0077 | 0.0574 |
| Var | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | −0.0001 | 0.0000 | 0.0000 |
| MAD | −0.0012 | −0.0015 | −0.0003 | −0.0005 | −0.0017 | −0.0025 | −0.0018 | −0.0022 |
| RMSE | −0.0230 | −0.0143 | 0.0088 | −0.0142 | −0.0205 | −0.0096 | −0.0038 | −0.0009 |
| bias | −0.0238 | −0.0147 | 0.0113 | −0.0156 | −0.0213 | −0.0096 | −0.0034 | 0.0009 |
(2) Influence of bidirectional reflectance distribution function (BRDF) changes. For different data acquisition dates, the solar and satellite azimuth and zenith angles are different (Table 9). This leads to changes in the BRDF and hence in the information received by the sensor. More seriously, it can lead to differences in the shadow direction and length for the same object on different dates. Figure 4 shows an example of the difference in shadow direction and length for trees on 3 October 2013, 7 October 2013 and 15 October 2013 as recorded by the GF-1 WFV sensor. In the shaded area, the difference in NIR-band reflectance between the actual and synthetic GF-1 WFV data reached 0.0676 for STDFA and 0.0638 for ESTARFM. In the unshaded area, however, the differences were only 0.0078 for STDFA and 0.0153 for ESTARFM. Further study of this problem is therefore required.
Table 9. Sensor and solar geometry information for the GF-1 WFV data.
| Image Information | 3 October 2013 | 7 October 2013 | 15 October 2013 |
|---|---|---|---|
| Receive time (local) | 13:34:11 | 13:33:24 | 13:31:47 |
| Solar azimuth (°) | 166.147 | 166.601 | 167.415 |
| Solar zenith (°) | 42.4117 | 40.9225 | 37.9947 |
| Satellite azimuth (°) | 235.912 | 121.314 | 106.892 |
| Satellite zenith (°) | 89.1838 | 88.3384 | 84.7204 |
Figure 4. An example of the difference in shadow direction and length.
(3) Spectral fusion and cloud contamination. The ESTARFM and STDFA algorithms were proposed to enhance the temporal resolution of high spatial resolution data, and high spatial and temporal resolution remote sensing data can be generated using these methods. However, neither method considers spectral fusion, so they can only be used for spatial and temporal fusion; combining them with spectral fusion methods in the future could also improve the spectral resolution of the sensors [50]. In addition, these methods fuse optical images, which are easily affected by cloudy weather. Although they can increase the proportion of useful data, a significant proportion of the data will still be contaminated by cloud. The development of optical and radar data fusion algorithms is therefore an important direction for multi-source remote sensing research.