1. Introduction
Temperature profiles describe atmospheric temperature at different altitudes; they are an essential meteorological characteristic of the atmosphere and the basis for accurately retrieving atmospheric trace gases. They also represent the thermal conditions of the atmosphere, and monitoring their distribution and variability is crucial for researching climatic phenomena such as the greenhouse effect [1]. Studying climate radiative forcing requires global-scale observations of temperature, and the accurate study and description of atmospheric evolution, climate change, numerical forecasting, short-range warning, weather modification, and other meteorological support services rely on the real-time and effective detection of atmospheric temperature profiles.
In 2008, China established the National Key Science and Technology Infrastructure Project, “Integrated Ground-based Monitoring Meridian Chain for the Eastern Hemisphere Space Environment”, in order to establish comprehensive observatory stations for continuous monitoring of the middle and upper atmosphere from the Earth’s surface [
2]. The middle atmosphere is a crucial aspect in the transition to the space environment between the Sun and the Earth. The phenomenon of temperature inversion in the middle atmospheric layer has a significant impact on the launch probability of missiles and satellites, the probability of accurate orbital access, and the operational lifetime of such technology; therefore, monitoring the temperature of the middle layer is of great importance for scientific research and aerospace activities [
3]. The mesosphere evolves more slowly than the troposphere, and its downward transport may lead to continuous and predictable changes at the surface. Thus, it is also important to observe or predict the temperature of the mesosphere for weather studies involving the troposphere or surface [4].
Currently, there are four primary sources of atmospheric temperature profiles: satellite-based measurements, airborne measurements, ground-based measurements, and reanalysis data. Among them, satellite-based measurements provide a wide range of observations and are continuous on the time scale, but their observation time spans are shorter compared to those of reanalysis data.
Between 2008 and 2021, China successfully launched five ‘Fengyun-3’ series meteorological satellites (FY-3A, FY-3B, FY-3C, FY-3D, FY-3E), with a microwave hygrometer, microwave thermometer, microwave imager, and other payloads on board [
5,
6]. However, FY-3A and FY-3B are currently out of operation. The Fengyun (FY)-3C satellite carries the microwave humidity and temperature sounder (MWHTS), which began taking measurements on 30 September 2013. MWHTS observes the vertical distribution of atmospheric temperature and moisture [
7]. The Advanced Technology Microwave Sounder (ATMS) is on board the Suomi National Polar-orbiting Partnership (Suomi NPP) satellite, which was launched in 2011 as the preparatory satellite for the next-generation U.S. polar-orbiting weather satellite system, the Joint Polar Satellite System (JPSS). ATMS combines the functions of the microwave thermometer AMSU-A and the microwave hygrometer AMSU-B/MHS and detects the vertical distribution of atmospheric temperature and humidity [8].
Airborne measurement is expensive and lacks temporal continuity, so it is generally not used. Ground-based measurement includes measurements made with microwave radiometers, sounding balloons, and other methods. A ground-based microwave radiometer provides good temporal continuity for temperature profile observation, but it is strongly affected by weather; under cloudy conditions in particular, the uncertainty of the cloud absorption coefficient increases the error or even causes the measurement to fail. Conventional sounding is the most reliable and representative approach for measuring atmospheric temperature profiles, but it is limited in terms of temporal continuity, station coverage, and cost. RAOB (Radiosonde Observation) sounding data are the actual measurements from radiosondes at global weather stations from the National Oceanic and Atmospheric Administration (NOAA)-National Environmental Satellite Data and Information Service (NESDIS) operational meteorological database archive [
9], whose soundings have high confidence and representativeness and are generally used for data accuracy validation; for example, Ma, Y. et al. (2020) used RAOB data to verify the accuracy of atmospheric temperature data retrieved from AIRS for the Taklamakan Desert [
10]. In addition, platforms in various countries now use assimilation and other technologies to fuse satellite, ground-based, aircraft, ship, and other data in order to produce reanalysis data with good temporal continuity and wide spatial coverage. Reanalysis is the process of reprocessing a series of observational data using an assimilation system; it typically produces data with a wide variety of parameters, a long duration, and wide spatial coverage, and its role in climate monitoring applications is now widely recognized. As a result, reanalysis data are increasingly used in agriculture, weather monitoring, energy, oceanography, and other fields. The temporal and spatial resolutions of reanalysis datasets have been increasing, and their time spans have grown from a decade to more than 100 years.
Data assimilation is a method of incorporating new observations into the dynamic operation of a numerical model, taking into account the spatial and temporal distributions of the data and the errors in the observed and background fields; it is therefore equally applicable to data fusion. Common data assimilation methods include successive correction, optimal interpolation, three- and four-dimensional variational methods (3DVar and 4DVar), and Kalman filters. Understanding 3DVar and 4DVar requires Bayesian theory and maximum likelihood estimation, as well as some basic variational theory, whereas understanding the ensemble Kalman filter requires the ideas behind theories such as minimum variance estimation [
11,
12]. The successive correction method and the optimal interpolation method are similar in that both take the difference between the observed values and the values at the points to be assimilated, interpolate this difference to those points, and obtain the final results as analysis values. The difference between them is that, in the optimal interpolation method, the weight function is calculated by minimizing the analysis variance. Therefore, the biggest improvement of the optimal interpolation method over the successive correction method is that the weights take into account the correlation between observational errors and the correlation between different observations. This avoids the arbitrary weight selection that occurs in the successive correction method [
13,
14]. The variational assimilation method uses the numerical model as a dynamical constraint. It reduces data fusion to the problem of finding the extremum of an objective function that characterizes the deviations of the analysis field from the observed field and the background field [
15]. If the objective function is defined in three dimensions (excluding the time dimension), then it is a three-dimensional variational method; if it is defined in four dimensions, it corresponds to a four-dimensional variational method. Due to the great computational effort required for the four-dimensional variational method, it is relatively rare in the operational application of data fusion. Based on the sounding data, Cai Yi et al. (2017) used the optimal interpolation method to correct the atmospheric profile of MODIS inversion. The profile accuracy can be effectively improved in areas with corresponding ground stations. However, the method is no longer applicable in places where ground stations are missing [
16]. S. Mahagammulla Gamage et al. (2020) combined the measurements of Raman lidar with ERA5 data. For this approach, the authors used the one-dimensional variational method, based on the optimal interpolation method, to finally obtain the fused product, in which the initial separate products were improved to some extent [
17].
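The optimal interpolation update discussed above can be illustrated with a minimal sketch. It assumes the standard textbook form of the analysis equation, x_a = x_b + K (y - H x_b) with gain K = B H^T (H B H^T + R)^-1, which is not necessarily the exact formulation used in this paper. The function name, the three-level profile, and the diagonal error covariances B and R are hypothetical values chosen only for illustration.

```python
import numpy as np

def optimal_interpolation(x_b, y, H, B, R):
    """Textbook OI analysis: x_a = x_b + K (y - H x_b).

    x_b : background (initial) field, shape (n,)
    y   : observations, shape (m,)
    H   : observation operator mapping state to observation space, shape (m, n)
    B   : background error covariance, shape (n, n)
    R   : observation error covariance, shape (m, m)
    """
    # Innovation covariance S = H B H^T + R
    S = H @ B @ H.T + R
    # Gain K = B H^T S^-1, computed with a linear solve instead of an explicit inverse
    K = np.linalg.solve(S.T, (B @ H.T).T).T
    return x_b + K @ (y - H @ x_b)

# Hypothetical three-level temperature profile (K), observed at levels 0 and 2 only
x_b = np.array([280.0, 250.0, 220.0])
y = np.array([282.0, 221.0])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
B = 4.0 * np.eye(3)  # assumed: uncorrelated background errors, variance 4 K^2
R = 1.0 * np.eye(2)  # assumed: uncorrelated observation errors, variance 1 K^2

x_a = optimal_interpolation(x_b, y, H, B, R)
```

With these assumed variances the analysis moves most of the way toward the observations at the observed levels and leaves the unobserved level unchanged, because B carries no correlation between levels. A non-diagonal B would spread the correction to neighbouring levels.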
Usually, the reanalysis data cover the whole world, such as with ERA5, MERRA-2, JRA55, NCEP, and other datasets, and they are better than sounding data in terms of spatial and temporal resolution, which makes them very suitable for analyzing spatial and temporal changes over long periods. Currently, data from the National Oceanic and Atmospheric Administration (NOAA)/National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) are available. The National Aeronautics and Space Administration (NASA) Global Modeling and Assimilation Office (GMAO) and the latest reanalysis programs of the Japan Meteorological Agency (JMA) additionally provide a rich collection of climate data products [
18]. Among them, the temperature profiles of ERA5 and MERRA-2 have relatively high accuracy and are widely used. Robert M. Graham et al. (2019) used AWI radiosonde observations to assess the accuracy of five global atmospheric reanalysis datasets from the Fram Strait, including temperature profiles from ERA5, ERA-I, JRA-55, MERRA-2, and CFSv2. The ERA5 temperature profile demonstrates the best accuracy among all five reanalyses, with a correlation coefficient of 0.96 and a deviation of 0.3 °C from the actual value. In contrast, MERRA-2 exhibits the second-best accuracy, with a correlation coefficient of 0.95 and a deviation of 0.5 °C from the actual value [
19].
Among the currently known temperature profile products, ERA5 has high temporal and horizontal resolutions, which can reach 1 h and 0.25°, respectively [
20]. In terms of vertical distribution, MERRA-2 has the finest relative pressure distribution, at 42 levels [
18]. By comparison, JRA-55 has a temporal resolution of 6 h and a spatial resolution of 125 km, and NCEP-DOE AMIP-II has a temporal resolution of 6 h and a spatial resolution of 250 km; both are relatively coarse. Hence, ERA5 and MERRA-2 are used for data fusion in this research [
21,
22]. Moreover, most of the pressure layers unique to MERRA-2 lie in the stratosphere and mesosphere. Although seasonal weather predictions were previously based mainly on data from the troposphere, meteorological data from the middle atmosphere are becoming increasingly non-negligible in climate change predictions. Since the turn of the 21st century, data modeling studies have increasingly incorporated stratospheric and even mesospheric information. While the spatial and temporal resolutions of ERA5 are high, it lacks some information about the mesosphere.
Therefore, in this paper, the efficient optimal interpolation method is chosen for data fusion. It takes full advantage of the high observational accuracy of RAOB sounding data and combines the horizontal resolution of ERA5 with the vertical distribution of MERRA-2, optimally fusing the two datasets, ERA5 and MERRA-2, in order to avoid the problem of the discontinuity of sounding data. The result is a spatially continuous, high-accuracy fused product with a horizontal resolution of 0.25° and 45 pressure layers. The final fused temperature profile data can be used for meteorological studies.
3. Fusion of Temperature Profile Data
3.1. Data Fusion Process
In this paper, we take full advantage of the high observation accuracy of RAOB sounding data, combine the advantages of the horizontal resolution of ERA5 and the vertical distribution of MERRA-2, and adopt the optimal interpolation method to fuse the data of each pixel in order to obtain a fused product with high spatial resolution and accuracy. Specifically, the coarser horizontal resolution of MERRA-2 is first extended to 0.25° by interpolation. Although the vertical distribution of MERRA-2 is finer than that of ERA5, some of ERA5’s pressure layers are not available in MERRA-2. An empirical equation between the temperature of an unknown pressure layer and the temperature of a known pressure layer is therefore solved using both MERRA-2 and ERA5, so that ERA5, RAOB, and MERRA-2 are all extended to 45 pressure layers. In addition, with RAOB taken as the true value, the accuracies of ERA5 and MERRA-2 are first verified for different regions and pressure layers, and the one with better accuracy is taken as the accurate and reasonable initial condition (initial field) under different conditions. Finally, ERA5 and MERRA-2, with 45 pressure layers and 0.25° horizontal resolution, are used in the optimal interpolation calculation in order to obtain the final fused results; the flowchart is shown in Figure 2.
3.2. Regional Division
Most of the RAOB sounding data are recorded twice a day, at UTC 0700 and UTC 1900. In this study, the RAOB sounding data and ERA5 temperature profile data at UTC 0700 are downloaded. Since the temporal resolution of the MERRA-2 temperature profile is 3 h, MERRA-2 data are not available at UTC 0700, so UTC 0600 is temporarily used instead. To ensure that the data in each region are optimally fused, the accuracy of the ERA5 and MERRA-2 temperature profiles was calculated in different regions for all seasons in 2019. Based on the experimental results, the globe was divided into regions as shown in Figure 3. The black dots in this figure show the distribution of some of the RAOB sounding balloon sites around the world; the sounding sites are not spatially continuous, but they are distributed in various locations around the world.
3.3. Data Evaluation
In order to fuse ERA5 and MERRA-2 optimally, a preliminary evaluation of the data was first carried out. Based on the optimal interpolation method, the data with higher accuracy are used to correct the data with relatively poor accuracy, thus achieving optimal data fusion. MERRA-2 and ERA5 were matched to the RAOB data, the error was calculated at each point, and the errors were then averaged within each region according to the regional division in Figure 3 and analyzed. The results are shown in
Table 1 and
Table 2. The analysis found that the accuracy of ERA5 is higher than that of MERRA-2 in most regions and seasons; the blue-bold font indicates the regions where the accuracy of MERRA-2 is better. To ensure that the optimal fusion is obtained for each pressure layer, the experiment further explored the errors of different pressure layers over different seasons, as shown in
Figure 4, which shows the error profiles in winter 2019, where columns 1 and 3 represent the RMSE of ERA5, and columns 2 and 4 represent the RMSE of MERRA-2. The horizontal lines in each figure represent the data error, and the longer horizontal line represents the larger error. It is found that ERA5 and MERRA-2 have their advantages in different pressure layers. Therefore, this paper will use different correction methods for different pressure layers and regions. For the pressure layers where ERA5 demonstrates a higher accuracy, MERRA-2 will be used as the initial value, and ERA5 will be used as the observed value with which to correct MERRA-2. Moreover, for the pressure layers where MERRA-2 demonstrates a higher accuracy, ERA5 is used as the initial value and MERRA-2 as the observed value with which to correct ERA5.
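The role assignment described above, in which the more accurate dataset in each pressure layer acts as the observation and the other as the initial value, can be sketched as follows. The function names, the tie-breaking rule (ERA5 is treated as the observation when the two RMSE values are equal), and the sample numbers are assumptions made for illustration.

```python
import numpy as np

def era5_as_observation(rmse_era5, rmse_merra2):
    """Return a boolean mask per pressure layer: True where ERA5 is the observed value.

    ERA5 is the observed value where its RMSE against RAOB is lower;
    ties go to ERA5 (an assumption, not stated in the text).
    """
    return np.asarray(rmse_era5, dtype=float) <= np.asarray(rmse_merra2, dtype=float)

def split_fields(t_era5, t_merra2, era5_is_obs):
    """Build the initial-value and observed-value profiles layer by layer."""
    initial = np.where(era5_is_obs, t_merra2, t_era5)   # less accurate dataset
    observed = np.where(era5_is_obs, t_era5, t_merra2)  # more accurate dataset
    return initial, observed

# Hypothetical RMSE (K) against RAOB for two pressure layers
mask = era5_as_observation([1.0, 3.0], [2.0, 2.5])

# Hypothetical temperatures (K) for the same two layers
initial, observed = split_fields(np.array([280.0, 220.0]),
                                 np.array([281.0, 222.0]),
                                 mask)
```

In practice the mask would be computed separately for each region and season, since the evaluation above shows that the more accurate dataset changes with both.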
3.4. Data Fusion
The basic steps for the data fusion described in this paper include the following:
(1) Using existing data, a look-up table for converting known pressure layer temperatures to unknown pressure layer temperatures was calculated, and the datasets were converted to 45 pressure layers.
(2) The MERRA-2 temperature profile data were interpolated to a global 0.25° grid.
(3) Based on the RAOB temperature data for all seasons in 2019, the accuracies of ERA5 and MERRA-2 were evaluated in different regional and pressure layers. Thereafter, the optimal fusion was applied to different pressure layers and regions.
(4) Algorithm results were evaluated for accuracy using RAOB sounding data.
By investigating the vertical distribution of the pressure levels in ERA5 and MERRA-2, it was found that the merged set of both should have 45 pressure levels: 1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 725, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 225, 200, 175, 150, 125, 100, 70, 50, 40, 30, 20, 10, 7, 5, 4, 3, 2, 1, 0.7, 0.5, 0.4, 0.3, and 0.1 hPa. The corresponding look-up tables were therefore created by polynomial fitting between the temperature of a known pressure layer and the temperature of an unknown pressure layer, using the two datasets ERA5 and MERRA-2. The look-up tables, expressed as fitted equations, were then applied to the RAOB, ERA5, and MERRA-2 datasets in order to find the temperature at each unknown pressure layer. Finally, the vertical distribution of all datasets was extended to 45 layers for subsequent calculations.
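The look-up table idea can be sketched as a polynomial fit between the temperature at a known pressure layer and the temperature at a layer that a dataset lacks. The text does not state the polynomial degree or which known layer is paired with each unknown layer, so the degree, the function names, and the synthetic training data below are assumptions.

```python
import numpy as np

def fit_layer_lookup(t_known, t_target, degree=2):
    """Fit t_target as a polynomial in t_known; returns coefficients, highest power first.

    t_known  : temperatures (K) at a layer present in every dataset
    t_target : temperatures (K) at the layer to be filled, taken from a
               dataset that does contain it
    """
    return np.polyfit(t_known, t_target, degree)

def apply_layer_lookup(coeffs, t_known):
    """Estimate the missing-layer temperature from the known-layer temperature."""
    return np.polyval(coeffs, t_known)

# Synthetic training pairs (hypothetical): a linear relation between two layers
t_known_train = np.linspace(240.0, 280.0, 50)
t_target_train = 0.9 * t_known_train + 20.0

coeffs = fit_layer_lookup(t_known_train, t_target_train)

# Fill the missing layer of another dataset from its known-layer temperature
t_filled = apply_layer_lookup(coeffs, np.array([250.0]))
```

One set of coefficients would be stored for each missing layer, which together play the role of the look-up table. Because the fit is purely empirical, extrapolating beyond the temperature range seen in training should be treated with caution.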
In order to validate the accuracy of this method, the RAOB data stored at UTC 0000 on 1 January 2020 were read. The real time of the data was calculated to be UTC 0700, so the temperature profile data of the datasets at the Beijing site (116°20′E, 39°56′N) at UTC 0700 on 1 January 2020 were read and plotted. In
Figure 5a, the temperature profiles of MERRA-2, ERA5, and RAOB are shown.
After processing using the look-up table method, the temperature profiles of the datasets in the Beijing area at UTC 0700 on 1 January 2020 were drawn again for comparison, as shown in
Figure 5b. It was found that the temperature profile results generated using this method were in normal order, and, therefore, all temperature profile data were converted to 45 pressure layers using the empirical formulae obtained. The temperature profiles of the datasets in the middle layer in Beijing are shown in
Figure 5c, and their shapes are also roughly similar.
The RAOB sounding data, processed into 45 pressure levels, were subsequently used to verify the ERA5 and MERRA-2 temperature profile data at the same time and with the same number of layers. The results are shown in
Figure 6a, where the blue line represents the error of MERRA-2, and the red line represents the error of ERA5. The mean RMSE of the ERA5 temperature profile is approximately 5.7 K, and its mean MAE is approximately 4.3 K. The mean RMSE of the MERRA-2 temperature profile is approximately 11.7 K, and its mean MAE is approximately 8.9 K. The mean RMSE and MAE of ERA5 are therefore approximately 6.0 K and 4.6 K smaller, respectively, than those of MERRA-2.
Figure 6b shows the error of the temperature profile in the middle layer. The mean RMSE value of the ERA5 temperature profile is approximately 7.8 K, and the mean MAE value is approximately 5.9 K. The mean RMSE value of the MERRA-2 temperature profile is approximately 7.9 K, and the mean MAE value is approximately 5.9 K. The error in the mesosphere is large compared to that in the troposphere, probably because the true value in the mesosphere is calculated by the look-up table, which exhibits some uncertainty.
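The two error measures used above follow their usual definitions and can be computed as in this short sketch. Skipping missing levels through NaN-aware means is an assumption about how gaps in a sounding might be handled, and the sample profile is hypothetical.

```python
import numpy as np

def rmse(estimate, truth):
    """Root-mean-square error, ignoring levels where either value is NaN."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))

def mae(estimate, truth):
    """Mean absolute error, ignoring levels where either value is NaN."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.nanmean(np.abs(diff)))

# Hypothetical reanalysis and RAOB temperatures (K) at three pressure levels
reanalysis = np.array([281.0, 252.0, 219.0])
raob = np.array([280.0, 250.0, 220.0])

profile_rmse = rmse(reanalysis, raob)
profile_mae = mae(reanalysis, raob)
```

RMSE weights large deviations more heavily than MAE, which is why the two values reported for each dataset differ.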
Since the spatial resolution of MERRA-2 is 0.5° × 0.625°, it needs to be interpolated to 0.25°. After interpolation, the horizontal resolution of both MERRA-2 and ERA5 is 0.25°. Finally, the two datasets, each with a horizontal resolution of 0.25° and 45 pressure layers, were substituted into the optimal interpolation formula described in
Section 2.3.1 in order to calculate the fused product results.
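The horizontal regridding step can be sketched with bilinear interpolation. The text does not name the interpolation scheme, so bilinear interpolation is an assumption here, as are the function name and the small test grid. The sketch handles one pressure layer at a time and does not wrap around in longitude.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field of shape (len(src_lat), len(src_lon)).

    All coordinate arrays must be ascending. Target points outside the
    source range take the nearest edge value (np.interp behaviour).
    """
    # Step 1: interpolate along longitude for every source latitude row
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate along latitude for every target longitude column
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T

# Small hypothetical patch with MERRA-2-like spacing (0.5 deg lat, 0.625 deg lon)
src_lat = np.array([0.0, 0.5, 1.0])
src_lon = np.array([0.0, 0.625, 1.25])
field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]  # synthetic linear field

# Target grid with 0.25 deg spacing in both directions
dst_lat = np.arange(0.0, 1.25, 0.25)
dst_lon = np.arange(0.0, 1.5, 0.25)

regridded = regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon)
```

Interpolation increases the grid density but adds no new information, so the effective resolution of the regridded MERRA-2 field remains that of the source grid.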
5. Conclusions
In this paper, we proposed a fusion method based on optimal interpolation to yield atmospheric temperature profiles with high spatiotemporal resolution from RAOB, ERA5, and MERRA-2 data. The method takes advantage of the high observation accuracy of RAOB sounding data, combines the horizontal resolution of ERA5 with the vertical distribution of MERRA-2, and adopts the optimal interpolation method to produce fused data with a horizontal resolution of 0.25°, a vertical distribution of 45 pressure levels, and high accuracy. The following conclusions were drawn:
A polynomial fitting method was used to develop an empirical formula for converting known pressure layer temperatures to unknown pressure layer temperatures, allowing the data to be filled with 45 pressure layers.
Relative to the MERRA-2 temperature profile, the fused product reduces the RMSE by 6.0 K and the MAE by 5.0 K; relative to the ERA5 temperature profile, it reduces the RMSE by 0.3 K and the MAE by 0.4 K.
By comparing the values of the fused data with the RAOB sounding data and the RMSE (K) at different stations, it was found that the fused data were already very close to the RAOB sounding data. Additionally, the fused data were the finest, in terms of both horizontal resolution and vertical distribution, of any directly downloadable product available. Therefore, this method has the potential to be applied to meteorological studies in order to provide finer temperature profile products.
Since the time difference between the MERRA-2 initial temperature profile data and RAOB is one hour, this will affect the accuracy of the final fused product. Therefore, we will consider interpolating the MERRA-2 data in time in the future, and then feeding ERA5 and MERRA-2 data for the same time into the algorithm, so that hour-by-hour fused results can be calculated. In addition, the current fused results can only be verified against the daily RAOB data at the same moment; in the future, RAOB data combined with ground-based microwave radiometer observations can be used to validate the accuracy of the fused data.
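The temporal interpolation proposed as future work could take the following form. This is only a sketch: it assumes simple linear interpolation between the two 3-hourly MERRA-2 fields that bracket the target time, and the function name and sample temperatures are hypothetical.

```python
import numpy as np

def interpolate_in_time(t0_hours, field0, t1_hours, field1, target_hours):
    """Linearly interpolate between two fields valid at t0_hours and t1_hours."""
    if not (t0_hours < t1_hours and t0_hours <= target_hours <= t1_hours):
        raise ValueError("target time must lie between two distinct bracketing times")
    # Weight of the later field grows from 0 at t0_hours to 1 at t1_hours
    w = (target_hours - t0_hours) / (t1_hours - t0_hours)
    return (1.0 - w) * np.asarray(field0, dtype=float) + w * np.asarray(field1, dtype=float)

# Hypothetical MERRA-2 temperatures (K) at two pressure layers, UTC 0600 and UTC 0900
merra2_0600 = np.array([280.0, 220.0])
merra2_0900 = np.array([283.0, 217.0])

# Estimate the field at UTC 0700 to match the RAOB and ERA5 time
merra2_0700 = interpolate_in_time(6.0, merra2_0600, 9.0, merra2_0900, 7.0)
```

Linear interpolation ignores the diurnal cycle within the 3 h window, so a more elaborate scheme may be preferable near the surface.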