1. Introduction
The land surface temperature (LST), generally defined as the radiative skin temperature of the ground, is closely related to the radiative budget and energy fluxes between the atmosphere and the ground [
1,
2,
3,
4]. LST plays an important role in the estimation of climate models, environmental models and evapotranspiration models, as well as the calculation of drought indices, soil moisture contents and mortality rates [
5,
6,
7,
8,
9,
10,
11,
12,
13,
14].
Compared with LST measurements at ground stations, satellite remote sensing observations have the advantages of easy acquisition and complete spatial coverage over large areas. Typical LST products include Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), Moderate Resolution Imaging Spectroradiometer (MODIS) and Meteosat Second Generation Spinning Enhanced Visible and Infrared Imager (MSG-SEVIRI) datasets, with spatial resolutions of 90 m, 1 km and 3 km, respectively [
15,
16,
17]. Among them, the MODIS LST product is the most widely used and best suited for our research because of its appropriate spatial resolution (1 km), high temporal resolution (four overpasses per day), wide coverage (globe), and high retrieval accuracy (approximately better than 1 K). The MODIS instruments were launched on the Sun-synchronous satellites of Terra and Aqua in December 1999 and May 2002, respectively [
18]. The MODIS LST products are generated with bands 31 and 32 of MODIS’s 36 spectral using the split window algorithm [
2]. The latest version C6 MODIS LST products have different spatial resolutions of 1 km, 6 km, and 0.05° and different temporal resolutions of daily, eight days and monthly. The MOD11A1 and MYD11A1 are 1 km daily Level 3 products in those MODIS LST products. The transit time of Terra corresponding to MOD11A1 is about 10:30 (22:30), while the transit time of Aqua corresponding to MYD11A1 is about 13:30 (1:30). They are both processed into sinusoidal projection and stored in tiles containing 1200 rows and 1200 columns. The quality of MODIS products is continuing to improve, from more than 2 K in the previous versions to less than 2 K (within ±1 K in most cases) in the C6 version [
4]. MODIS LST products have been widely used in LST research [
19,
20,
21]. However, the MODIS LST product can only provide usable values under clear-sky conditions, and its spatial integrity is thus affected by clouds or other atmospheric disturbances. Taking China as an example, more than half of the pixels per day have no observations on average, and these gaps seriously hinder the application of the MODIS LST product.
Several gap-filling methods have been developed to reconstruct LSTs under cloudy conditions to obtain spatiotemporally-continuous LST products. In general, these methods can be divided into two main groups: clear-sky LST [
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30] and cloudy-sky LST [
31,
32,
33,
34,
35,
36,
37] methods. Clear-sky LST represents the retrieved LST assuming no cloud effects, whereas cloudy-sky LST represents the actual LST of the reconstruction considering cloud effects. Usually, clear-sky LST is slightly higher than cloudy-sky LST. The methods for reconstructing cloudy-sky LST, mostly based on surface energy balances, often use passive microwave remote sensing data or require ground station measurements or shortwave radiation products. Nonetheless, microwave data have a coarse spatial resolution and an accuracy that needs improvement. Moreover, ground station measurements or shortwave radiation products with a high spatial and temporal resolution are difficult to obtain. This study focused on reconstructing the clear-sky LST, first, because improving the accuracy of clear-sky LST is conducive to further determining the cloudy-sky LST better, and second, because clear-sky LST can be directly applied to research fields such as numerical weather prediction [
38], the identification of diurnal patterns of urban heat islands [
39], and calculation of the Temperature-Vegetation Dryness Index (TVDI) or Temperature-Vegetation-soil Moisture Dryness Index (TVMDI) [
40,
41].
The methods for clear-sky LST may be divided into four categories, according to the underlying principles: considering temporal correlation, considering spatial correlation, considering auxiliary information, and the hybrid method. Details of each category are as follows: (1) LST has a temporal correlation because, for the same pixel at different times, the surface properties are the same and only different weather factors, such as solar radiation and wind speed, cause LST differences. Therefore, the first category reconstructs LST based on the temporal correlation using temporal interpolation methods or methods that employ correlations at different times [
22,
23]; (2) LST also has a spatial correlation because different pixels at the same time have the same weather factors and different surface properties (such as elevation and land cover), but only the surface properties cause LST differences. The second category thus reconstructs LST based on the spatial correlation using spatial interpolation methods [
19,
24]; (3) in addition, LST is affected by related factors such as elevation and NDVI. The third category thus estimates the missing LST using the empirical relationship between LST and the auxiliary information, which has a similar spatiotemporal resolution to LST and a better spatial coverage integrity than LST [
20,
21]; (4) finally, the fourth category is hybrid methods that combine two or three of the above methods, such as spatiotemporal gap-filling methods or spatial interpolation methods that consider auxiliary information [
25,
26,
27,
28,
29,
30]. In general, the hybrid approach is the most promising. Considering only temporal correlation is not suitable for regions with high spatial heterogeneity. If only spatial correlation is taken into account, the results will be inaccurate for areas that have large weather changes in a short period of time. If only auxiliary information is considered, the accuracy of regression and the uncertainty of auxiliary information will affect the final results. In previous studies, there have been relatively few methods suitable for LST reconstruction in large areas. In a region as large as China, where climate change is complex, spatial heterogeneity is high, and auxiliary information has considerable uncertainty, a method is needed that can comprehensively and reasonably consider time correlation, spatial correlation, and auxiliary information, and the uncertainty of auxiliary information should also be considered.
Bayesian maximum entropy (BME) is a spatiotemporal statistical method proposed by Christakos that can provide a systematic and rigorous framework for incorporating hard data, soft data and other sources of information into the estimation of variables [
42,
43]. BME has several attractive features; it does not need to make any assumptions regarding the linearity of the estimator, the normality of the underlying probability laws, or the homogeneity of the spatial distribution. Moreover, BME is capable of considering uncertainties contained in the data. The method has been successfully applied to numerous areas, such as air pollution, soil properties, water demand and disease [
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61]. It has also achieved good results in the gap-filling of remote sensing data [
62,
63,
64]. The BME method is suitable for our research because it can not only take advantage of the temporal correlation and spatial correlation of the LST, but can also explicitly consider the uncertainties of the auxiliary information.
In this study, we applied the BME-based interpolation method to reconstruct 1 km resolution daily clear-sky LST for China’s landmass considering temporal correlation, spatial correlation and auxiliary information. The goals of this article are to (1) examine the feasibility of the BME method to reconstruct LST for the whole of China, (2) discuss the accuracy of the BME method for different land cover types, and (3) compare the BME method with other commonly used LST reconstruction methods.
4. Discussion
4.1. Accuracy Analysis
The mean RMSEs in this study were 1.6 K for the daytime and 1.2 K for the nighttime, which were slightly lower than the RMSEs of 3.3 K for the daytime and 2.7 K for the nighttime in Li’s study in a comparably large area [
30]. The single point RMSE of approximately 0.5 K is comparable with the RMSE of approximately 2 K under cloud-free conditions in Duan’s study, which selected four ground points for validation [
31]. The accuracy of this method is acceptable for large areas with complex geographical and climatic conditions. In addition, this method has a high accuracy in estimating urban LST and can be applied to urban heat island research.
There were different accuracies for different land cover types, which indicates that the accuracy is affected by land cover [
34]. The accuracy of barren land and forest was lower than that of urban and cropland because the terrain of barren and forest is more complex, and the spatial heterogeneity is greater. Therefore, the model accuracy can be improved by dividing various terrain regions and then adopting different covariance models for various regions.
The accuracy decreased with the increase of NDVI and average LST because a high NDVI and high temperature usually represent the summer climate in China, when the cloud cover is large and the distribution is concentrated, which results in large LST gaps. The completeness has only a slight impact on the accuracy, possibly because the accuracy is influenced not only by the number of missing pixels, but also by their maximum diameter and distribution characteristics [
69].
When constructing soft data, the accuracy of multiple linear regression does not affect the accuracy; this may be due to the ability of BME to fully consider the uncertainty of soft data. The average regression R2 in this study was nearly 0.6, and in the subsequent application of the BME method to LST reconstruction, when the average regression R2 in the construction process of soft data is greater than 0.6, it is unnecessary to adopt more complex regression methods to improve the regression accuracy.
4.2. Suggestions for Method Selection
We can learn the characteristics of each LST reconstruction method from
Figure 11 and in combination with previous studies. The accuracy of spatial interpolation models is usually higher than that of temporal interpolation models [
70]. The time interpolation models have some difficulties in capturing extreme values, and their accuracy is relatively low. Spatiotemporal gap-filling methods are often unable to fill all the missing values at one time, and one usually needs to iterate several times until all the missing values are filled. Spatial interpolation methods, especially the ones that consider auxiliary information, have a high accuracy, but usually take a relatively long time for calculations [
25,
26]. The study area in this study was large. To balance the calculation time and accuracy, we did not select the spatiotemporal covariance model, but rather the spatial covariance model, to consider the spatial dependence characteristics and the simple 15 day mean value to consider the time dependence characteristics. The spatiotemporal covariance model can be selected in small areas [
34,
71]. With the development of computing power and multi-core parallelism in the future, the computing speed will become faster.
Suggestions on how to choose an appropriate method to reconstruct LST are as follows: (1) If one hopes for a short computation time and simple computation steps, one can choose the time interpolation method; (2) if a high precision and simple calculation steps are required, the spatial interpolation method is recommended; (3) if one wants to balance the calculation speed and accuracy, we suggest using the spatiotemporal gap-filling method. All of the above three methods can consider introducing auxiliary data to improve the accuracy. In addition, note that the Kriging spatial interpolation method achieved good results in this study area and that Kriging is thus a simple method worth trying. The BME method that we used is a spatial interpolation method that considers auxiliary information, and its precision was very high in this study area.
4.3. Exploration of the Accuracy Improvement
In this part, we will explore methods to improve the accuracy and model applicability based on the assumptions of the method. One assumption of this method was that LST changed linearly in a short time. We thus subtracted the 15 day mean LST from the LST of the constructed day to consider the time correlation. Doing so can also make the data closer to a normal distribution and thus replace the step of trend removal in other BME studies. We selected 18 October, 2015 as an example (there were relatively more MODIS observed LSTs available on that day), as shown in
Figure 12. The green part shows that the LST of all pixels on this day presents a non-normal distribution, while the purple part exhibits an approximately normal distribution after subtraction calculation (LST–the mean LST of 15 days) is performed. This operation achieved the desired effect. However, as can be seen from
Figure 8, the time variation of LST exhibited both an overall trend and fluctuation. According to the limitation of this study’s assumption, we may improve the accuracy by taking better account of the time correlation. We can do so by introducing the Annual Temperature Cycle (ATC) model, which is a general and smooth curve description of the LST annual change. Considering the time correlation with the LST of the estimated day minus the LST of the corresponding point on the ATC curve, theoretically, will be more accurate than considering the time correlation with the LST of the estimated day minus the LST of the 15 day mean value. We plan to conduct follow-up studies with regards to this.
The second assumption of this method was that one omnidirectional covariance model of the constructed day can be used in the whole study area. Here, we want to explore the directivity of the covariance model. We constructed omnidirectional and directional covariance models for 18 October, 2015 (
Table 2,
Figure 13). As can be seen from the parameters of directional covariance in the study area, the overall characteristic is that the directional covariance is significantly affected by longitude and latitude, and the latitude direction changes faster than the longitude direction.
Omnidirectional and directional covariance models were input into the method in this study to calculate the results, and 10,000 points were randomly selected during the day and night for accuracy verification and comparison. The results show that after considering the directionality of the covariance, the accuracy in the daytime is slightly improved, and the accuracy of the night is not improved (
Table 3). It is suggested that directionality can be considered during the daytime in the following research.
In addition, large study areas will have problems with spatial covariance models that differ in different regions. First of all, however, we have not explored how the covariance model changed in different regions of China’s mainland. Moreover, if we want to fill in the LST of the whole study area at once, more detailed regional division may make the model more complicated. How to apply different covariance models in different subregions of the study area is challenging and worthy of further exploration.
The third assumption of this method was that the NDVI of each day can be represented by the NDVI data of its adjacent 7 days. We used 15 day NDVI data because it had values on all pixels to ensure that soft data can be constructed on all LST missing pixels. When the LST was reconstructed on the 15th day of each month in 2015, the nearest NDVI data were selected on 9 January, 10 February, 14 March, 15 April, 17 May, 18 June, 20 July, 21 August, 22 September, 8 October, 8 November and 11 December. There were no results to prove that the closer the reconstruction date was to the obtained date of NDVI, the higher the fitting accuracy and the final LST accuracy were. We believe it was feasible to reconstruct the LST daily with a 15 day NDVI product. Lastly, we cannot accurately calculate the impact of the uncertainty of NDVI datasets on the uncertainty of the final results, which is a limitation of this method.
4.4. Recommendations for Future Studies
Due to the characteristics of the BME method, we cannot accurately determine the uncertainty of the results. At present, we can only obtain the conclusion that at a 1 km spatial resolution, the accuracy of reconstructing the daily LST of China’s landmass with the BME method is acceptable.
For daytime or nighttime LST reconstruction on a certain day, one covariance model can be adopted in the whole research area on that day, which can achieve a reasonable accuracy. If one wants to use this model later, we suggest that it be directly used in the same study area and at the same spatiotemporal resolutions. If other areas or other resolutions are studied, an accuracy analysis should be conducted first to see whether it can meet the actual requirements. The soft data should also be reconstructed according to the geographic and data characteristics of the other study areas. One can try to use the four independent variables in this paper or may introduce other auxiliary data, such as soil moisture and temperature. In the future, we can try to introduce the ATC model, divide different subregions and adopt different covariance models to improve the accuracy of the BME method presented in this study.
When using the BMElib package, it is important to be aware of some parameter settings. The maximum effective distance can be set to a value similar to the range in the covariance model. In order to balance the calculation accuracy and calculation time, we recommend that the maximum number of hard data points should not exceed 50 and the maximum number of soft data points should not exceed 5.
Although clear-sky LST can be applied in some research, the real surface LST is still needed in many fields. This study calculated clear-sky LST, and if one wants to obtain cloudy-sky LST from clear-sky LST, one can refer to Zeng’s method [
69]. Adding microwave and ground observation data can also be considered.
5. Conclusions
The MODIS LST product has many missing values over wide areas, which hinders its practical application. In this study, we reconstructed the seamless 1 km resolution daily clear-sky LST for China’s landmass based on the BME method, considering spatiotemporal correlation and taking auxiliary data as soft data. The average RMSE was 1.6 K for the daytime and 1.2 K for the nighttime, with the mean absolute error (MAE) of 1.1 K for the daytime and 0.8 K for the nighttime, and the corresponding R2 of 0.92 for the daytime and 0.98 for the nighttime.
This method has the following advantages: (1) It simultaneously considers spatiotemporal correlation and auxiliary data and has a high accuracy in a large area. It has the ability to capture extreme values; (2) the data in this method are easy to obtain and process; (3) simple linear regression is used to construct soft data, and there is no need to adopt more complex regression methods to improve the regression accuracy, as long as the average regression R2 is greater than 0.6; (4) even if the diameter of the missing area is large or the continuous missing time is long, this method does not need multiple step-by-step calculations to gradually fill in the missing pixels, and can estimate all the missing pixels at one time.
There are also some limitations for this method: (1) This method is not applicable when an accuracy of less than 1 K across the entire Chinese landmass is required; (2) when using the method in other study areas and spatiotemporal scales, it is necessary to first consider whether the hypothesis of LST linearity change in a short time and one omnidirectional covariance model can be applied to the entire study area are valid; (3) the method cannot quantitatively calculate the influence of the uncertainty of NDVI and DEM data on the uncertainty of the results; (4) the clear-sky LST should be converted to cloudy-sky LST if the real LST is required.
The results of this study provide a data basis for daily LST analysis and subsequent relevant studies in large areas of China. For the method, its high accuracy and great ability to capture extreme values in urban areas can help improve urban heat island research. It can also be applied to the reconstruction of missing LST values of other years, other regions and other spatial resolutions (such as Landsat), as well as the estimation of missing values of other geographical attributes.