1. Introduction
Air temperature plays an important role in the interaction between the land surface and the atmosphere and has a crucial impact on many atmospheric processes [1,2]. Temperature participates in the physical, biological, and microbial processes of the atmosphere [3] and affects the growth and development of plants [4]. It is also an important input parameter for calculating the vegetation water stress index, analyzing vegetation drought, and estimating vegetation transpiration and net primary productivity [5,6,7,8,9].
As collections of instruments that measure atmospheric conditions to help study the weather and climate of a specific location, weather stations have become the most commonly used means of assessing temperature [10,11]. However, converting the scattered point measurements of weather stations into spatially continuous temperature introduces errors and uncertainties. Moreover, the layout of weather stations is restricted by terrain, environment, and economic development, resulting in an uneven distribution that leaves some areas uncovered. With the development of satellite remote sensing, great progress has been made in retrieving temperature from satellite images [12,13,14,15]. However, satellite transit times are limited, and satellite images cannot fully cover a given area with high spatial–temporal resolution, making it difficult to accurately reflect the characteristics and changes of temperature in that area [16].
ERA5, published by the European Centre for Medium-Range Weather Forecasts (ECMWF), is the latest reanalysis dataset and provides estimates of a large number of land variables, including temperature, on a regular latitude–longitude grid at 0.1° × 0.1° resolution [17]. Because it covers the period from 1950 to near-real time and provides hourly data, it has been widely used in recent years as a data source for temperature modeling and ecological parameter calculation. However, when ERA5 is used to calculate the temperature at an arbitrary location, the commonly used bilinear interpolation inevitably introduces errors and uncertainties, which reduce the accuracy of the temperature estimated in the research area [18,19].
In this study, we propose a method based on an artificial neural network (ANN) to refine the temperature derived from ERA5. The method makes full use of the temperature interpolated from the ERA5 gridded data and the corresponding values measured by weather stations, and it uses the ANN to model the relationship between them.
2. Methodology
To calculate the temperature at any location, ERA5 reanalysis data should be downloaded from https://cds.climate.copernicus.eu/ (accessed on 1 August 2022). Then, after the four nearest grid points are determined, bilinear interpolation is used to calculate the temperature estimate for the target location. Specifically, the formula for bilinear interpolation is as follows:

$$
\hat{T}(l,\varphi)=\frac{(l_2-l)(\varphi_2-\varphi)\,T(P_1)+(l-l_1)(\varphi_2-\varphi)\,T(P_2)+(l_2-l)(\varphi-\varphi_1)\,T(P_3)+(l-l_1)(\varphi-\varphi_1)\,T(P_4)}{(l_2-l_1)(\varphi_2-\varphi_1)}
$$

where $l$ and $\varphi$ denote the longitude and latitude of the target location, respectively. $P_1$, $P_2$, $P_3$, and $P_4$ represent the four grid points closest to the target location, where $P_1$ and $P_2$, as well as $P_3$ and $P_4$, are on the same latitude, and $P_1$ and $P_3$, as well as $P_2$ and $P_4$, are on the same longitude. Therefore, the latitude and longitude of $P_1$ are $\varphi_1$ and $l_1$, respectively, and the corresponding values for $P_4$ are $\varphi_2$ and $l_2$, respectively. $T$ refers to the temperature, and $\hat{T}$ is the value calculated from the ERA5 gridded data.
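The interpolation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the grid is assumed to be a regular array ordered by ascending latitude (rows) and longitude (columns), so a downloaded field stored in another order would first need reordering, and targets outside the grid are rejected rather than extrapolated.

```python
import math


def bilinear(lon, lat, lon1, lon2, lat1, lat2, t11, t21, t12, t22):
    """Bilinear interpolation inside one grid cell.

    Corner values: t11 at (lon1, lat1), t21 at (lon2, lat1),
    t12 at (lon1, lat2), t22 at (lon2, lat2).
    """
    area = (lon2 - lon1) * (lat2 - lat1)
    return ((lon2 - lon) * (lat2 - lat) * t11
            + (lon - lon1) * (lat2 - lat) * t21
            + (lon2 - lon) * (lat - lat1) * t12
            + (lon - lon1) * (lat - lat1) * t22) / area


def interpolate_at(lon, lat, lon0, lat0, step, grid):
    """Interpolate a regular grid to (lon, lat).

    grid[i][j] holds the value at latitude lat0 + i*step and longitude
    lon0 + j*step (both ascending).
    """
    n_lat, n_lon = len(grid), len(grid[0])
    if not (lon0 <= lon <= lon0 + (n_lon - 1) * step
            and lat0 <= lat <= lat0 + (n_lat - 1) * step):
        raise ValueError("target lies outside the grid")
    # Lower-left corner of the enclosing cell; clamp so that points on the
    # last row/column still have an upper neighbour.
    j = min(math.floor((lon - lon0) / step), n_lon - 2)
    i = min(math.floor((lat - lat0) / step), n_lat - 2)
    lon1, lat1 = lon0 + j * step, lat0 + i * step
    return bilinear(lon, lat, lon1, lon1 + step, lat1, lat1 + step,
                    grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])


# Toy 3 x 3 grid with 1-degree spacing; T = 2*lon + 3*lat + 1 is linear,
# so bilinear interpolation reproduces it exactly.
grid = [[2 * j + 3 * i + 1 for j in range(3)] for i in range(3)]
print(interpolate_at(0.5, 0.25, 0.0, 0.0, 1.0, grid))  # 2.75
```

For a 0.1° × 0.1° field, `step` would be 0.1 and `lon0`, `lat0` the coordinates of the south-west grid corner.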
Designed to simulate the way the human brain analyzes and processes information, an artificial neural network (ANN) is widely used in regression, classification, and geoscience [20,21,22,23]. It is composed of an input layer, an output layer, and one or more hidden layers, and it can model the relationship between input and output variables without the limitations of linear models [24,25]. The architecture of an ANN can be adapted to the task itself, and no assumptions are imposed on the distribution of the input data, which enables ANNs, with their self-learning capability, to produce better results as more data become available. The numbers of neurons and hidden layers are considered the most important parameters of an ANN: as they increase, the learning accuracy increases, but the generalization ability decreases.
Based on the characteristics of bilinear interpolation and ANNs described above, we used an ANN to refine the temperature calculated by bilinear interpolation, that is, to construct the relationship between the interpolated temperature and the measured temperature (recorded by the weather stations). Specifically, four variables, namely the latitude, longitude, and altitude of each site and its interpolated temperature, were selected as the input parameters of the input layer, and the corresponding measured temperature was used as the output parameter of the output layer.
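One possible way to assemble these four inputs is sketched below with NumPy. The function names are hypothetical, and scaling each input to [−1, 1] is our assumption (the text does not describe any preprocessing); it is a common choice for tanh units. The scaling limits are taken from the training stations only, so the testing stations do not influence the model.

```python
import numpy as np


def build_inputs(lat, lon, alt, t_interp):
    """One row per station-day: [latitude, longitude, altitude, interpolated T]."""
    return np.column_stack([lat, lon, alt, t_interp]).astype(float)


def fit_scaler(x):
    """Column-wise minimum and maximum, computed on training data only."""
    return x.min(axis=0), x.max(axis=0)


def scale(x, lo, hi):
    """Map each column linearly from [lo, hi] to [-1, 1]."""
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return 2.0 * (x - lo) / span - 1.0


def unscale(z, lo, hi):
    """Inverse of `scale`, e.g. to turn network outputs back into kelvin."""
    return (z + 1.0) / 2.0 * (hi - lo) + lo


# Made-up values for two hypothetical stations.
X = build_inputs([35.2, 39.8], [107.1, 113.9], [1200.0, 800.0], [281.3, 276.5])
lo, hi = fit_scaler(X)
Z = scale(X, lo, hi)
```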
3. Experiment and Results
In our experiment, three subregions, Ordos (Inner Mongolia Autonomous Region), Shanbei (Shaanxi Province), and Shanxi Province, were selected as the research area, which is shown in Figure 1. This region is considered a high-intensity coal mining area in China, and its ecological monitoring and protection have received increasing attention [26]. Research on temperature refinement in this area can therefore provide a more accurate data source for the inversion of various environmental parameters. As shown in Figure 1, there are 145 weather stations (represented by circles) in the research area. The daily temperature of these stations was obtained from the China Meteorological Data Network (http://data.cma.cn/ accessed on 1 January 2020), which provided the measured temperature used both as the output values of the ANN model and as the true values in the subsequent verification. Correspondingly, gridded temperature with a 0.1° × 0.1° resolution covering 34.5° N to 41° N in latitude and 106.5° E to 114.5° E in longitude was obtained from the ERA5 reanalysis data, from which the interpolated temperature at each weather station was determined; this interpolated temperature is one of the input parameters of the ANN model. The weather stations were divided into two groups: training sites, accounting for 4/5 of the stations, were used to construct the ANN-based temperature refinement model, and testing sites, accounting for 1/5, were used as references. To better demonstrate the applicability of the proposed refinement method, 10 experiments were conducted, with the testing stations randomly selected in each experiment. Note that the testing stations in each experiment were required to be evenly distributed over the research area. For example, the blue and red circles in Figure 1 represent the training and testing stations, respectively, in one of the experiments.
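The 4/5 to 1/5 station split repeated over 10 experiments can be sketched as a simple random split. This is an assumption-laden illustration: the station IDs 0 to 144 are hypothetical, and the sketch does not enforce the even spatial distribution of testing stations required above (that would need, for example, sampling within spatial blocks).

```python
import random


def split_stations(station_ids, n_experiments=10, test_fraction=0.2, seed=0):
    """Return one (train, test) pair of station lists per experiment."""
    rng = random.Random(seed)                  # seeded for reproducibility
    ids = list(station_ids)
    n_test = round(len(ids) * test_fraction)
    splits = []
    for _ in range(n_experiments):
        test = set(rng.sample(ids, n_test))    # draw testing stations without replacement
        train = [s for s in ids if s not in test]
        splits.append((train, sorted(test)))
    return splits


splits = split_stations(range(145))
train, test = splits[0]
print(len(train), len(test))  # 116 29
```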
In each experiment, the measured and interpolated temperatures at the training sites from 2011 to 2015 were used to construct the refinement model, and then internal and external accuracy tests were conducted. The internal accuracy test uses the constructed model to estimate the temperature at the training sites from 2011 to 2015 and compares the estimates with the true values. The external accuracy test compares the temperature estimated by the model at the testing sites in 2016 with the true values. The proposed temperature refinement method and the traditional temperature interpolation method are called Scheme #1 and Scheme #2, respectively. In Scheme #1, the ANN structure was determined after repeated training: four nodes in the input layer (equal to the number of input parameters), two hidden layers with four nodes each, and a single node in the output layer. The learning rate, error threshold, and maximum number of training iterations were set to 0.02, 0.001, and 6000, respectively, and the hyperbolic tangent function and the Levenberg–Marquardt algorithm were selected as the activation and training functions, respectively. In Scheme #2, the temperature was obtained directly by bilinear interpolation, and its accuracy from 2011 to 2015 and in 2016 corresponds to the internal and external accuracy, respectively.
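The network configuration described above (4-4-4-1 layout, tanh hidden units, Levenberg–Marquardt training, error threshold 0.001, at most 6000 iterations) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: a linear output node is assumed, the Jacobian is approximated by forward differences for brevity (analytic backpropagation would be faster), the error threshold is interpreted as a mean-squared-error goal, and the fixed learning rate of 0.02 has no counterpart here because the step size is controlled by the adaptive damping factor `mu`. The training data in the usage lines are synthetic.

```python
import numpy as np

LAYERS = (4, 4, 4, 1)  # input, two hidden layers of four nodes, output


def n_params(layers=LAYERS):
    """Number of weights and biases in the fully connected network."""
    return sum((a + 1) * b for a, b in zip(layers[:-1], layers[1:]))


def forward(w, X, layers=LAYERS):
    """Evaluate the network for a flat parameter vector w; tanh hidden units, linear output."""
    h, k = X, 0
    for idx, (a, b) in enumerate(zip(layers[:-1], layers[1:])):
        W = w[k:k + a * b].reshape(a, b)
        k += a * b
        bias = w[k:k + b]
        k += b
        h = h @ W + bias
        if idx < len(layers) - 2:  # hidden layers only
            h = np.tanh(h)
    return h.ravel()


def jacobian(w, X, eps=1e-6):
    """Forward-difference Jacobian of the outputs with respect to w."""
    f0 = forward(w, X)
    J = np.empty((X.shape[0], w.size))
    for j in range(w.size):
        w_step = w.copy()
        w_step[j] += eps
        J[:, j] = (forward(w_step, X) - f0) / eps
    return J


def train_lm(X, y, goal=1e-3, max_epochs=6000, mu=1e-3, seed=0):
    """Levenberg-Marquardt minimisation of the mean squared error."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.5, size=n_params())
    r = forward(w, X) - y
    mse = np.mean(r ** 2)
    for _ in range(max_epochs):
        if mse < goal:
            break
        J = jacobian(w, X)
        A, g = J.T @ J, J.T @ r
        while mu < 1e10:
            # Damped Gauss-Newton step: (J^T J + mu I) dw = -J^T r
            w_new = w + np.linalg.solve(A + mu * np.eye(w.size), -g)
            r_new = forward(w_new, X) - y
            mse_new = np.mean(r_new ** 2)
            if mse_new < mse:      # accept and move towards Gauss-Newton
                w, r, mse = w_new, r_new, mse_new
                mu = max(mu / 10, 1e-10)
                break
            mu *= 10               # reject and move towards gradient descent
        else:
            break                  # no damping value improved the fit
    return w, mse


# Synthetic, already-scaled data standing in for the four inputs and the target.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 4))
y = 0.5 * np.tanh(X @ np.array([0.8, -0.5, 0.3, 0.6]))
w, mse = train_lm(X, y, max_epochs=50)
```

Because a step is only accepted when it lowers the error, the training error never increases; `mu` shrinks after successful steps and grows after rejected ones.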
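Two statistical quantities, i.e., root mean square error (RMSE) and compound relative error (CRE), were selected as criteria to conduct the comparisons. The formulas are shown as follows:

$$
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{T}_i-T_i\right)^2}
$$

$$
\mathrm{CRE}=\frac{\sum_{i=1}^{N}\left(\hat{T}_i-T_i\right)^2}{\sum_{i=1}^{N}\left(T_i-\bar{T}\right)^2}
$$

where $\hat{T}_i$ refers to the temperature calculated by the two schemes, $T_i$ denotes the true values of the temperature, and $\bar{T}$ represents their mean value. $N$ is the number of samples.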
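The two criteria can be computed as follows. The CRE form used here, the sum of squared errors divided by the sum of squared deviations of the true temperatures from their mean, is our reading of the definitions in the text and is an assumption; the input values are made up.

```python
import math
import statistics


def rmse(calc, true):
    """Root mean square error between calculated and true temperatures."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(calc, true)) / len(true))


def cre(calc, true):
    """Sum of squared errors over the sum of squared deviations of the true values from their mean."""
    mean_t = sum(true) / len(true)
    sse = sum((c - t) ** 2 for c, t in zip(calc, true))
    sst = sum((t - mean_t) ** 2 for t in true)
    return sse / sst


calc, true = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
sigma = statistics.pstdev(true)  # population standard deviation of the true values
print(rmse(calc, true), cre(calc, true))
```

With this form, CRE equals (RMSE/σ)², where σ is the population standard deviation of the true temperatures, so the CRE is dimensionless and expresses the error relative to the natural variability of temperature.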
The differences between the calculated and reference temperatures were computed for both schemes in the internal and external accuracy tests. The RMSE and CRE for the 10 experiments are shown in Figure 2, in which different circle styles and colors denote the different schemes and test types, respectively. Scheme #1 clearly outperforms Scheme #2 in both the internal and external accuracy tests. As indicated by the solid and dashed blue lines, the mean RMSE values of Scheme #2 are 2.76 and 2.61 K in the internal and external accuracy tests, respectively. The corresponding values for Scheme #1 are 1.59 and 1.76 K, representing improvements of approximately 42% and 33% over Scheme #2. For the CRE, the mean values in the internal and external accuracy tests are 0.021 and 0.026 for Scheme #1 and 0.062 and 0.057 for Scheme #2. Note that Scheme #1 performs slightly worse in the external accuracy test than in the internal accuracy test, whereas the opposite is true for Scheme #2. This is because Scheme #2 interpolates the temperature without any modeling: both of its accuracy tests simply compare the interpolated temperature with the true temperature, and the internal test, which includes more stations, shows the worse performance.
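The relative improvements quoted above follow directly from the mean RMSE values, computed as the RMSE reduction of Scheme #1 divided by the RMSE of Scheme #2:

$$
\frac{2.76-1.59}{2.76}=\frac{1.17}{2.76}\approx 0.424\ (\approx 42\%),\qquad
\frac{2.61-1.76}{2.61}=\frac{0.85}{2.61}\approx 0.326\ (\approx 33\%)
$$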
The empirical distribution functions of the temperature differences for the two schemes are plotted in Figure 3, which shows the percentage of temperature differences falling in each range in the internal and external accuracy tests. The maximum temperature difference reached 7.6 and 8.1 K for Scheme #1 in the internal and external accuracy tests, respectively, compared with 17.3 and 13.9 K for Scheme #2. In the internal accuracy test, 80% and 58% of the temperature differences were smaller than 2 K for Schemes #1 and #2, respectively; in the external accuracy test, the percentages were 78% and 60%. For differences smaller than 4 K, the percentages increased to 98% and 87% in the internal test and 97% and 87% in the external test for Schemes #1 and #2, respectively. This demonstrates that the proposed temperature refinement method yields a better distribution of temperature differences than the traditional method.
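The percentages read from such empirical distribution functions can be computed as sketched below. We assume the differences are taken in absolute value, since the text does not state the sign convention; the function names and input values are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F(x) = fraction of values <= x (empirical distribution function)."""
    xs = sorted(values)
    return lambda x: bisect_right(xs, x) / len(xs)


def difference_summary(calc, true, thresholds=(2.0, 4.0)):
    """Maximum absolute difference and share of absolute differences below each threshold (K)."""
    diffs = [abs(c - t) for c, t in zip(calc, true)]
    shares = {th: sum(d < th for d in diffs) / len(diffs) for th in thresholds}
    return max(diffs), shares


max_diff, shares = difference_summary([1.0, 2.0, 3.0, 10.0], [0.0, 0.0, 0.0, 0.0])
```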
To further illustrate the performance of the two schemes, the RMSE and CRE of each station in the internal and external accuracy tests were calculated and are shown as boxplots in Figure 4. In both the internal (right panel) and external (left panel) accuracy tests, the boxes and whiskers of Scheme #1 (in red) are shorter than those of Scheme #2 (in blue) for both RMSE and CRE, indicating a better statistical distribution for the proposed refinement method. The middle 50% of the RMSE values lay between 1.19 and 1.72 K and between 1.23 and 1.88 K for Scheme #1 in the internal and external accuracy tests, respectively, compared with 1.63 to 2.85 K and 1.51 to 2.80 K for Scheme #2. This advantage is also reflected in the CRE boxplots in the lower panel, where the ranges are 0.013 to 0.030, 0.020 to 0.074, 0.012 to 0.026, and 0.024 to 0.075 from left to right. Moreover, the RMSE and CRE statistics of the external accuracy test for the ten experiments are listed in Table 1 to compare the performance of the two schemes. The values within square brackets are the minimum and maximum, and the value outside the brackets is the mean. Scheme #1 significantly outperformed Scheme #2 in every experiment in terms of the minimum, maximum, and mean values. The performance of the proposed refinement method is stable across the ten experiments, showing improvements of approximately 30% and 55% in RMSE and CRE, respectively, over Scheme #2.
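The per-station statistics behind such boxplots can be sketched as follows: an RMSE is computed for each station, and the box spans the lower to upper quartile of these values. The record layout and function names are hypothetical, and the `"inclusive"` quantile method is our choice; it matches the linear interpolation that NumPy's `percentile` uses by default.

```python
import math
import statistics
from collections import defaultdict


def per_station_rmse(records):
    """records: iterable of (station_id, calculated_T, true_T) tuples."""
    squared = defaultdict(list)
    for sid, calc, true in records:
        squared[sid].append((calc - true) ** 2)
    return {sid: math.sqrt(sum(v) / len(v)) for sid, v in squared.items()}


def box_limits(values):
    """Lower and upper quartile, i.e. the range holding the middle 50% of values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


records = [("A", 1.0, 0.0), ("A", 3.0, 0.0), ("B", 2.0, 2.0)]
station_rmse = per_station_rmse(records)
```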
4. Discussion
The second experiment, which had the largest RMSE and CRE for Scheme #1, was selected to show the specific performance of the two schemes. The RMSE and CRE of the two schemes at each site are shown in Figure 5 and Figure 6, respectively, revealing the spatial variation in their accuracy. The upper and lower panels of both figures correspond to the internal and external accuracy tests, respectively. Both statistics of Scheme #1 are clearly better than those of Scheme #2 at all sites in both the internal and external accuracy tests. In terms of RMSE, Scheme #1 achieved an accuracy better than 3.5 K at all sites in the internal accuracy test, whereas 21% of the sites had an RMSE larger than 3.5 K for Scheme #2. In the external accuracy test, the corresponding values are 3.0 K for Scheme #1 and 31% for Scheme #2. The results of Scheme #2 show geographic differences, with poor accuracy in the northeast of the research area, reflecting spatial differences in the accuracy of the ERA5 gridded data. The proposed refinement method effectively overcomes this problem and obtains highly accurate temperature estimates at all sites in the study area. The spatial distribution of CRE in Figure 6 shows the same pattern as the RMSE in Figure 5.
Further, the histograms of the temperature residuals, i.e., the calculated temperature minus the reference temperature, are shown in Figure 7, together with their standard deviation (SD), mean, and median. All indicators of Scheme #1 are better than those of Scheme #2 in both the internal and external accuracy tests. For example, the SD, median, and mean are 1.87, −0.28, and −0.24 K, respectively, for Scheme #1 in the external accuracy test, compared with 2.18, 1.50, and 1.59 K for Scheme #2. The residuals of both schemes are approximately normally distributed, and those of the proposed refinement method are more concentrated around zero. For Scheme #2, the residuals are centered around approximately +1 K in both the internal and external accuracy tests, indicating that the temperature interpolated by the traditional method has a positive systematic bias. In addition, 81% and 71% of the residuals fall between −2 and 2 K for Scheme #1 in the internal and external accuracy tests, respectively, compared with 59% and 55% for Scheme #2.
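The residual summary statistics can be sketched as below, with residuals defined as calculated minus reference temperature. Whether the SD is the sample or the population version is not stated in the text; the sketch assumes the sample SD. The function name and input values are hypothetical.

```python
import statistics


def residual_stats(calc, true, band=2.0):
    """Summary of residuals r = calculated - reference temperature (K)."""
    res = [c - t for c, t in zip(calc, true)]
    return {
        "sd": statistics.stdev(res),   # sample SD (n - 1), an assumption
        "mean": statistics.fmean(res),
        "median": statistics.median(res),
        "within_band": sum(-band <= r <= band for r in res) / len(res),
    }


stats = residual_stats([1.0, 2.0, 3.0, 7.0], [0.0, 0.0, 0.0, 0.0])
```

A positive mean and median, as reported for Scheme #2, indicate that the calculated temperature is systematically too high.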
5. Conclusions
It is crucial to obtain highly accurate temperature estimates over a given area, especially in regions that are difficult for weather stations to cover. The gridded temperature derived from ERA5 reanalysis data makes it possible to interpolate temperature at arbitrary locations, but this process tends to introduce errors and uncertainties. To improve the accuracy of the interpolated temperature, we established an ANN-based temperature refinement method that makes full use of the temperature measured at weather stations and the temperature interpolated from ERA5.
The high-intensity coal mining area comprising Ordos, Shanbei, and Shanxi Province was selected as the research area, and ten experiments were conducted. The numerical results, in terms of RMSE and CRE, showed that the proposed refinement method outperformed the traditional method in both the internal and external accuracy tests. Specifically, the proposed method achieved mean RMSEs of 1.59 K and 1.76 K in the internal and external accuracy tests, respectively, over the ten experiments, which were 1.17 K and 0.85 K smaller than those of the traditional method, representing improvements of approximately 42% and 33%. Moreover, the proposed method effectively reduces the geographic differences in the accuracy of temperature interpolation and obtains highly accurate temperature estimates at all sites in the research area. However, the experiments were conducted in only one area; thus, the effectiveness of the proposed method in other regions, especially mountainous areas, still needs to be verified. In addition, multisource data such as the global atmospheric reanalysis data of the China Meteorological Administration (CMA) could be incorporated in follow-up research to further improve the accuracy of the refinement model.