1. Introduction
Methane (CH4) is a significant greenhouse gas with a global warming potential 28 times greater than that of carbon dioxide [1], and the importance of monitoring it in the atmosphere on a global scale is increasingly recognized. Major absorption bands of CH4 are located in the short-wave IR (centered at ~1.65 μm and ~3.3 μm) and mid-IR (centered at ~7.7 μm) spectral regions. Spaceborne instruments like the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) aboard the European Space Agency (ESA)’s Environmental Satellite (ENVISAT), the Thermal and Near-infrared Sensor for Carbon Observation–Fourier Transform Spectrometer (TANSO-FTS) aboard the Greenhouse Gases Observing Satellite (GOSAT), and the Tropospheric Monitoring Instrument (TROPOMI) aboard the Sentinel-5 Precursor satellite (Sentinel-5P) measure the absorption of solar radiation by CH4 in the short-wave IR band and provide near-surface sensitivity [2,3,4]. Both SCIAMACHY and GOSAT measure strong CH4 absorption signals around 1.65 μm. TROPOMI measures CH4 mixing ratios using the absorption information from the oxygen A-band (760 nm) and the short-wave IR band (2.3–2.4 μm). GOSAT also provides measurements in the thermal infrared band between 5.5 and 14.3 μm, which are used to derive thermal IR CH4 data [5,6]. Compared with short-wave IR observations, spectral measurements in the 7.7 μm mid-IR band are more sensitive to mid- to upper-tropospheric CH4. IR sounders like the AIRS, IASI, and CrIS all measure the absorption of thermal IR radiation in the 7.7 μm band and therefore can provide important complementary information needed for global CH4 monitoring.
Existing IR sounders have limited measurement sensitivity and instrument spectral resolution. Consequently, fully resolving weak CH4 signals becomes difficult, which affects the accuracy of both the total concentration and the vertical profile distribution of the retrieved atmospheric CH4. CH4 absorption lines in the mid-IR band overlap with those of other gases, such as water vapor and nitrous oxide (N2O), making it even more difficult to isolate and accurately quantify CH4 concentrations. Retrieval studies using the AIRS and IASI have shown that the retrieved upper tropospheric CH4 can easily have a bias error ranging from 1 to 4% [7,8]. Studies have shown inconsistencies between the CH4 profiles retrieved from AIRS and GOSAT TIR measurements, as well as inconsistencies between the CH4 profiles retrieved from AIRS and IASI measurements [9,10]. Moreover, the global spatial distribution of these CH4 retrieval products remains to be validated.
CH4 retrieval studies using hyper-spectral IR sounder measurements have been mostly based on the optimal estimation methodology (OEM) following Rodgers’ formalism [11]. A priori knowledge of CH4 is critically needed to complement the information content from the IR sounder measurements. Xiong et al. used latitudinal-dependent CH4 first-guess profiles derived from the monthly averaged results of in situ aircraft observations, ground-based flask network measurements, satellite observations, and the atmospheric transport model TM3 [7]. García et al. retrieved CH4 from IASI measurements using mean profiles from WACCM (Whole Atmosphere Community Climate Model-version 5, https://www2.acom.ucar.edu/gcm/waccm), averaged on a 1.9° × 2.5° grid for the 2004–2006 period, as the first-guess profiles [12]. An ad hoc Tikhonov–Phillips slope constraint was used to maintain the vertical shape of the CH4 profiles during the retrieval process. De Wachter et al. also used the a priori profile and the covariance matrix constraint derived from WACCM, but a single global climatological a priori was used in their study [8]. Siddans et al. used a fixed value of 1.75 ppmv in the troposphere as the a priori state and two years of zonal mean values from the TOMCAT chemical transport model to construct a priori error statistics for IASI CH4 retrievals [13]. Razavi et al. used the a priori derived from the “Laboratoire de Météorologie Dynamique” global climate model that constrains north–south latitudinal gradients of CH4 distribution [14].
OEM usually assumes a Gaussian distribution of the possible solution around its a priori state, as well as a linear relationship between the change in CH4 concentration from the a priori state and the associated change in observed radiance. Considering the complexity of the inverse relationship to be established for CH4 retrieval, Crevoisier et al. used a neural network-based non-linear inference method to retrieve CH4 from IASI observations [15]. The neural network-based approach, in theory, can be used to represent a non-linear training-prediction relationship, but the prediction accuracy largely depends on the representativeness of the training samples. The neural network-based approach also lacks the error estimation that is provided by the OEM-based scheme. The approach was trained using simulated data. The difference between simulated radiances and the observations was simply addressed via a bias correction based on one year of data over the tropical region. Considering the scene-dependent nature of the simulation error, a large discrepancy between the simulation and the observation likely remains and inevitably contributes to the retrieval error.
In this paper, we present a novel CH4 retrieval methodology based on the spectral fingerprinting approach. Such an approach utilizes spectral information from IR sounder measurements to derive a scene-dependent a priori state and corresponding constraint for each individual CH4 retrieval. Well-defined a priori information will greatly improve the linearity of the inverse relationship between the atmospheric CH4 and the spectral radiances observed, as well as the Gaussian distribution characteristics of the solution around the a priori state. The spectral fingerprinting scheme is based on a pre-constructed database that comprises an ample set of representative reference states, along with their corresponding radiative kernels. In this approach, a clustering method based on machine learning is employed to stratify and identify the scene-dependent a priori state within the pre-constructed database. Spectral radiances serve as the predictors. The a priori state is identified via radiance spectral matching, and the corresponding radiative kernel is then used to find the fingerprinting solution via an OEM-based linear inversion scheme. Details about the spectral fingerprinting scheme will be introduced, with in-depth technical insights into the construction of the fingerprinting database also being provided. The application of the fingerprinting methodology on SNPP CrIS observations will be demonstrated with the derived CH4 profiles validated using both CH4 reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and in situ measurement data from the Atmospheric Tomography Mission (ATom) [16].
The CH4 fingerprinting algorithm study presented herein contributes to the efforts to improve the single field-of-view sounder atmospheric product (SiFSAP) [17]. As a novel algorithm developed to complement other sounder products, such as CLIMCAPS and AIRS version 7 [18,19], the SiFSAP system produces hyper-spectral IR sounder Level 2 data at an instrument-native spatial resolution and provides high-accuracy spectral fitting to the top of the atmosphere (TOA) sounder observations by directly simulating cloud scattering. In order to maximize the information content from the measurements (i.e., to minimize the use of a priori information), a global climatology-based a priori constraint is used in the SiFSAP retrieval algorithm. We purposely designed the SiFSAP algorithm using a relaxed a priori global climatology so that the retrieval results are more sensitive to the small climate signals caught by the TOA spectral radiance observations. High-quality SiFSAP Level 2 products of atmospheric temperature, water vapor, and other trace gases such as ozone (O3) have been used for various dynamics and climate studies [20]. However, the uncertainty in CH4 data at a localized, instantaneous scale can be potentially large because the CH4 information provided by the measurements can be very limited for a significant percentage of cases. The fingerprinting algorithm can be used to generate a high-quality first-guess and scene-dependent a priori covariance constraint to improve the CH4 retrieval in the SiFSAP system for near-real-time applications. It will be implemented to produce the next version of SiFSAP.
2. The Spectral Fingerprinting Methodology
The spectral fingerprinting methodology has been widely used for the characterization and quantification of biological materials, chemical components, mineral analysis, and remote sensing [21,22,23,24]. The concept is based on the fact that the spectral feature of a measured signal can be used for the constitutive component analysis of target samples. The analysis usually involves the classification of spectral features associated with known constituents so that the constitutive component can be identified by characterizing the similarity between the measured signal and the spectral features of a prescribed constituent. When the measured spectral signal of a target sample matches a known reference spectrum, the constituents of the target sample, $x$, are instantly identified using the reference database. In broad terms, spectral fingerprinting integrates both spectral classification and spectral matching procedures. Recent corresponding research in the field of remote sensing has predominantly centered on the application of spectral fingerprinting in the analysis of hyperspectral images [25,26,27].
If the target signal is composed of multiple constituents, the spectral fingerprinting also involves the decomposition of the total measured signal into the spectral components of its individual constituents. The technique has been used for the optimal detection and attribution of climate change signals in the outgoing spectra of TOA radiances [28,29,30,31,32]. In those studies, the spectral fingerprints are the anomalies of outgoing spectral radiances, and the constituents are the changes in the TOA spectra that are associated with different feedbacks and forcings. The attribution of those spectral fingerprints to the change in essential climate variables involves the use of radiative kernels and a linear inversion scheme. The spectral fingerprinting scheme can be expressed as:
$$\Delta R = K \, \Delta x + \varepsilon \qquad (1)$$
where $\Delta R$ is the spectral fingerprints, $K$ is the radiative kernel, $\Delta x$ is the change in the geophysical variables, and $\varepsilon$ is the fingerprinting residual term. $\Delta x$ can be derived from the spectral fingerprints as follows:
$$\Delta x = \left( K^{T} S_{\varepsilon}^{-1} K + S_{a}^{-1} \right)^{-1} K^{T} S_{\varepsilon}^{-1} \, \Delta R \qquad (2)$$
where $S_{\varepsilon}$ is the covariance of the residual term $\varepsilon$, and $S_{a}$ is the covariance constraint of $\Delta x$. In the satellite-based remote sensing of trace gases, spectral fingerprints are established as variations in the observed spectral radiances, $\Delta R$, with respect to the reference spectrum $R_0$. These variations are exclusively attributed to the changes in the trace gas profile, $\Delta x$, from the reference profile $x_0$. The radiative kernel $K$ defines the linear relationship between $\Delta R$ and $\Delta x$.
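As a minimal numerical sketch of the linear inversion step, assuming the standard OEM form dx = (K^T S_eps^-1 K + S_a^-1)^-1 K^T S_eps^-1 dR, the snippet below recovers a known profile change from a synthetic, noise-free fingerprint. The function name `fingerprint_increment`, the array sizes, and the covariance values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def fingerprint_increment(delta_r, K, S_eps, S_a):
    """Solve dx = (K^T S_eps^-1 K + S_a^-1)^-1 K^T S_eps^-1 delta_r.

    delta_r : (n_chan,) spectral fingerprint (observed minus reference radiance)
    K       : (n_chan, n_lev) radiative kernel (Jacobian)
    S_eps   : (n_chan, n_chan) covariance of the fingerprinting residual
    S_a     : (n_lev, n_lev) covariance constraint on the profile change
    """
    s_eps_inv_k = np.linalg.solve(S_eps, K)        # S_eps^-1 K
    s_eps_inv_r = np.linalg.solve(S_eps, delta_r)  # S_eps^-1 delta_r
    lhs = K.T @ s_eps_inv_k + np.linalg.inv(S_a)
    rhs = K.T @ s_eps_inv_r
    return np.linalg.solve(lhs, rhs)


# Synthetic test case (all values hypothetical).
rng = np.random.default_rng(0)
n_chan, n_lev = 40, 5
K = rng.normal(size=(n_chan, n_lev))
dx_true = np.array([0.02, -0.01, 0.03, 0.0, -0.02])
delta_r = K @ dx_true                 # noise-free fingerprint
S_eps = 1e-6 * np.eye(n_chan)         # small residual variance
S_a = np.eye(n_lev)                   # loose constraint

dx_hat = fingerprint_increment(delta_r, K, S_eps, S_a)
```

With a small residual covariance and a loose constraint, the estimate is dominated by the measurement term; tightening `S_a` pulls the solution toward zero change, that is, toward the reference profile.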
The spectral fingerprinting of CH4 using the IR sounder measurements includes both classification and one-step linear inversion procedures. The classification analysis is carried out to build a predictive model based on a reference database that includes the representative radiance spectra, the collocated CH4 profile data, and corresponding radiative kernels. The classification of IR spectral measurements is based on their spectral characteristics associated with the CH4 absorption. A given IR sounder observation can therefore be automatically assigned to one of the predefined classes. The CH4 profile $x_0$, the corresponding spectrum $R_0$, the radiative kernel $K$, and the fingerprinting residual covariance $S_{\varepsilon}$ of the assigned class are then used in the subsequent inversion procedure. The final solution $x$ is given by adjusting $x_0$ to account for the small spectral difference $\Delta R$ between the measurements $R$ and $R_0$:
$$x = x_0 + \left( K^{T} S_{\varepsilon}^{-1} K + S_{a}^{-1} \right)^{-1} K^{T} S_{\varepsilon}^{-1} \left( R - R_0 \right) \qquad (3)$$
where the radiative kernel $K$ is derived from the radiative transfer calculation as the Jacobian, i.e., the derivative of the TOA radiance with respect to the changes in $x$, for the assigned class.
The formulas shown in Equations (2) and (3) follow the standard OEM scheme. Further details regarding the establishment of $x_0$, $R_0$, and $K$, along with the construction of covariance matrices $S_{\varepsilon}$ and $S_{a}$, will be elaborated upon in Section 3.2.2.
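The two-stage procedure described in this section (assign the observation to a class, then apply a one-step linear adjustment) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a nearest-spectrum match in Euclidean distance stands in for the machine-learning classifier, the database is random toy data, and the names `match_class` and `fingerprint_retrieval` are hypothetical.

```python
import numpy as np


def match_class(r_obs, class_spectra):
    """Return the index of the reference spectrum closest to r_obs.

    Stand-in for the classification step: plain Euclidean nearest match.
    """
    dist = np.linalg.norm(class_spectra - r_obs, axis=1)
    return int(np.argmin(dist))


def fingerprint_retrieval(r_obs, class_spectra, class_profiles,
                          class_kernels, S_eps, S_a):
    """Pick the matching class, then adjust its profile with one linear step."""
    c = match_class(r_obs, class_spectra)
    r0, x0, K = class_spectra[c], class_profiles[c], class_kernels[c]
    lhs = K.T @ np.linalg.solve(S_eps, K) + np.linalg.inv(S_a)
    rhs = K.T @ np.linalg.solve(S_eps, r_obs - r0)
    return c, x0 + np.linalg.solve(lhs, rhs)


# Toy reference database: 3 classes, 30 channels, 4 levels (hypothetical).
rng = np.random.default_rng(1)
n_class, n_chan, n_lev = 3, 30, 4
class_profiles = 1.8 + 0.05 * rng.normal(size=(n_class, n_lev))   # ppmv
class_kernels = rng.normal(size=(n_class, n_chan, n_lev))
class_spectra = 10.0 * rng.normal(size=(n_class, n_chan))
S_eps = 1e-6 * np.eye(n_chan)
S_a = 0.01 * np.eye(n_lev)

# Simulate an observation that departs slightly from class 2.
x_true = class_profiles[2] + np.array([0.01, -0.02, 0.015, 0.0])
r_obs = class_spectra[2] + class_kernels[2] @ (x_true - class_profiles[2])

c, x_hat = fingerprint_retrieval(r_obs, class_spectra, class_profiles,
                                 class_kernels, S_eps, S_a)
```

The sketch relies on the spectral difference between the observation and the matched reference being small, which is what keeps the single linear step adequate.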
4. Results and Validation
The left panels in Figure 2 and Figure 3 show two days’ worth of SNPP CrIS CH4 fingerprinting results (only descending orbital results are presented for simplicity). The fingerprinting results are compared with the global reanalysis dataset of atmospheric composition produced by CAMS. CAMS reanalysis provides sub-daily data interpolated to a regular 0.75° × 0.75° lat/lon grid. The CAMS reanalysis is produced using 4-DVar data assimilation in ECMWF’s Integrated Forecasting System (IFS), which uses comprehensive inventory, climatological, and chemical modeling datasets to initialize and constrain CH4 emissions, natural sources/sinks, and chemical sinks [36]. The accuracy of the CAMS CH4 results has been assessed in the CAMS greenhouse gas technical note [37]. When comparing the global mean values of CAMS CH4 with observations, the bias of the difference is generally small, with an uncertainty value of 0.01 ppm. Specifically, both the bias and the standard deviation values of the difference between the CH4 tropospheric profiles and the NOAA AirCore data are below 0.05 ppm. The largest difference in surface and tropospheric column CH4 between the CAMS results and the Total Carbon Column Observing Network (TCCON) data, with an averaged magnitude of up to 2.5%, is observed at mid- and high-latitude TCCON sites. The difference between CAMS results and data from the Network for the Detection of Atmospheric Composition Change (NDACC) is about 0.4% across all NDACC sites. CAMS reanalysis data are not recommended for investigating a local emission change or quantifying the changes in the atmospheric CH4 growth rate, but these data can be used to characterize synoptic spatial variability and the seasonal cycle of CH4 [37].
We compared the global distribution of the upper to middle tropospheric CH4 volume mixing ratio (VMR) characterized by the CrIS fingerprinting results with the corresponding daily mean values from the CAMS reanalysis. The side-by-side comparisons illustrated in Figure 2 and Figure 3 show good correlations between the two results concerning the latitudinal gradient and several large-scale thermodynamic characteristics. Satellite-based CrIS observations of CH4 are inevitably affected by cloud contamination. As a result, certain areas with high concentrations of CH4 in the tropical region, as indicated in the CAMS reanalysis, are not accurately reflected in the fingerprinting results. The systematic differences between the two sets of results, most notably in the southern polar region, fall within the range of 1–3%. Nevertheless, the changes in CH4 concentration from winter to summer in 2017, depicted through the difference between Figure 2 and Figure 3, are consistent in both datasets.
Figure 4 illustrates the latitudinal variations in CH4 concentration at different altitudes. Consistent latitudinal patterns spanning from the Southern Hemisphere to the Arctic region can be observed in the daily results of the CAMS reanalysis and CrIS fingerprinting. Both results also capture similar seasonal changes in CH4 profiles. Generally, there is a positive tropospheric CH4 increment in the Southern Hemisphere, contrasting with a negative increment in the Northern Hemisphere. The contrast between the CH4 profiles on January 11 and July 9 reveals a significant increase in CH4 concentration within the 100–200 hPa vertical region over Antarctica (90–60°S) and the northern mid-latitude zone (30–60°N).
Hyperspectral sensors like the AIRS, IASI, and CrIS only have measurement sensitivity to CH4 profiles within a limited vertical region. The AIRS is known to be sensitive to the upper troposphere (200–300 hPa) in the tropics and the middle troposphere (400–500 hPa) in the polar regions [7]. Similarly, studies on the IASI indicate its sensitivity to mid–upper tropospheric CH4 [13,14,15]. CLIMCAPS exclusively provides CH4 mass mixing ratios at 400 hPa, situated near the sensitivity peak defined by the algorithm and the spectral characteristics of the CrIS instrument. Despite the limited information from the measurements, the latitudinal distribution of the CH4 profiles derived by fingerprinting SNPP CrIS measurements, as illustrated in Figure 2, Figure 3 and Figure 4, generally agrees with the CAMS reanalysis. This underscores the benefit of a scene-dependent a priori scheme that can be precisely constructed using the spectral fingerprinting methodology.
Under the fingerprinting scheme, the scene-dependent a priori information obtained through machine learning not only supplements the vertical information but also enhances the retrieval accuracy, particularly in areas where CH4 signals are relatively weak. Physical retrieval algorithms like CLIMCAPS and AIRS version 7 are susceptible to errors in both the forward model and information from measurements. Addressing those errors becomes more challenging in regions covered by cloud and lacking thermal contrast. The fingerprinting methodology uses an a priori scheme obtained via machine learning to effectively constrain the impact of measurement errors in those regions.
Figure 5 illustrates the latitudinal distribution of mid-tropospheric CH4 (200–500 hPa) from the SNPP CrIS fingerprinting results of two days (11 January and 9 July 2017), along with the results from SNPP CLIMCAPS and AIRS version 7. Using CAMS reanalysis and CT data as the reference, we can see that the daily latitudinal variations in CH4 from AIRS version 7 and CLIMCAPS are unrealistically large. The global-scale CH4 concentration is underestimated in AIRS version 7. CLIMCAPS, on the other hand, overestimates CH4 concentration over the subtropical region and underestimates it over the Antarctic region during the winter. In comparison with the operational sounder retrieval results, the fingerprinting results provide more reasonable estimates of both the magnitude of the CH4 concentration and the latitudinal distribution on a global scale. These improvements in the accuracy of CH4 data could significantly benefit studies that emphasize daily-scale geographical distributions.
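A latitudinal distribution of a 200–500 hPa layer mean, of the kind compared here, could be computed along the following lines. This is only a sketch: the unweighted layer average, the 10° latitude bins, and the function name `zonal_layer_mean` are assumptions made for illustration, and the input values are invented.

```python
import numpy as np


def zonal_layer_mean(lat, p_hpa, ch4, p_top=200.0, p_bot=500.0, bin_deg=10.0):
    """Average CH4 over a pressure layer, then over latitude bins.

    lat   : (n_obs,) latitudes in degrees
    p_hpa : (n_lev,) pressure grid in hPa
    ch4   : (n_obs, n_lev) CH4 profiles
    Returns bin centers and the zonal mean (NaN where a bin is empty).
    """
    layer = (p_hpa >= p_top) & (p_hpa <= p_bot)
    layer_mean = ch4[:, layer].mean(axis=1)   # simple unweighted layer mean
    edges = np.arange(-90.0, 90.0 + bin_deg, bin_deg)
    n_bins = len(edges) - 1
    idx = np.clip(np.digitize(lat, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=layer_mean, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        zonal = np.where(counts > 0, sums / counts, np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, zonal


# Four made-up profiles on a five-level grid (ppmv).
lat = np.array([-75.0, -72.0, 5.0, 45.0])
p_hpa = np.array([100.0, 200.0, 300.0, 500.0, 700.0])
ch4 = np.array([
    [1.60, 1.70, 1.72, 1.74, 1.80],
    [1.60, 1.72, 1.74, 1.76, 1.80],
    [1.60, 1.80, 1.82, 1.84, 1.90],
    [1.60, 1.84, 1.86, 1.88, 1.90],
])
centers, zonal = zonal_layer_mean(lat, p_hpa, ch4)
```

A pressure-weighted layer average would be a natural refinement; the unweighted mean is used here only to keep the example short.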
The SNPP CrIS CH4 profiles were further validated using airborne measurement data. Three years’ (2016–2018) worth of ATom data were used as the reference dataset. ATom observations provide continuous in situ measurements of CH4 at various altitudes ranging from 1.2 km to 12 km. We began by selecting SNPP CrIS measurements that fall within a 12 h window of individual ATom observations. Subsequently, we generated collocated CrIS results through a two-dimensional space interpolation process. We excluded samples where the horizontal distance between an ATom observation and the nearest CrIS footprint exceeds 1 degree (~100 km). Each collocated sample’s fingerprinting result, representing an individual CH4 vertical profile, was then aligned with individual in situ results measured at different altitudes using vertical interpolation. ATom deployed flights over several months during a campaign year. The total number of collocated samples used for each year falls within the range of several hundred thousand.
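The collocation criteria above (a 12 h time window, a 1 degree distance cutoff, and vertical interpolation to the aircraft level) can be sketched as below. Several simplifications are assumed and should not be read as the authors' procedure: the nearest footprint is used instead of a two-dimensional spatial interpolation, distance is a flat-Earth approximation in degrees, interpolation is linear in pressure, and the function name `collocate_one` and all sample values are hypothetical.

```python
import numpy as np


def collocate_one(t_h, lat, lon, p_hpa,
                  cris_t_h, cris_lat, cris_lon, cris_p_hpa, cris_ch4,
                  max_dt_h=12.0, max_dist_deg=1.0):
    """Return the CrIS CH4 value matched to one aircraft sample, or None.

    cris_p_hpa must be increasing (top of atmosphere to surface) for np.interp.
    """
    in_window = np.abs(cris_t_h - t_h) <= max_dt_h
    if not in_window.any():
        return None
    # Longitude difference wrapped to [-180, 180), scaled by cos(latitude).
    dlon = (cris_lon - lon + 180.0) % 360.0 - 180.0
    dist = np.hypot(cris_lat - lat, dlon * np.cos(np.radians(lat)))
    dist = np.where(in_window, dist, np.inf)
    i = int(np.argmin(dist))
    if dist[i] > max_dist_deg:
        return None
    # Vertical interpolation of the selected profile to the aircraft pressure.
    return float(np.interp(p_hpa, cris_p_hpa, cris_ch4[i]))


# Three made-up CrIS footprints; the third is outside the time window.
cris_p_hpa = np.array([100.0, 200.0, 300.0, 500.0, 700.0, 1000.0])
cris_t_h = np.array([0.0, 5.0, 30.0])
cris_lat = np.array([10.0, 10.5, 10.1])
cris_lon = np.array([-150.0, -150.2, -150.0])
cris_ch4 = np.array([
    [1.70, 1.75, 1.80, 1.84, 1.86, 1.88],
    [1.71, 1.76, 1.81, 1.85, 1.87, 1.89],
    [1.60, 1.60, 1.60, 1.60, 1.60, 1.60],
])

matched = collocate_one(3.0, 10.4, -150.1, 400.0,
                        cris_t_h, cris_lat, cris_lon, cris_p_hpa, cris_ch4)
rejected = collocate_one(3.0, 40.0, -150.1, 400.0,
                         cris_t_h, cris_lat, cris_lon, cris_p_hpa, cris_ch4)
```

In this toy case the second footprint is the nearest one inside the time window, and the sample at 40°N is rejected because no footprint lies within 1 degree.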
Figure 6, Figure 7 and Figure 8 show the CrIS CH4 results along with the collocated ATom in situ observations. It is evident that the fingerprinting results effectively capture the fluctuations in CH4 concentration as the aircraft traversed different regions, collecting measurements at diverse altitudes.
Statistics regarding the difference between the CrIS fingerprinting and in situ measurement results are detailed in Table 2. Both the bias and the standard deviation values below 300 hPa are around 1% or smaller. The bias of the CrIS–ATom difference is comparable to what was reported in another CrIS–ATom inter-comparison study which used CH4 results from the NOAA-Unique Combined Atmospheric Processing System (NUCAPS) [38]. However, it is noted that our temporal collocation criterion of ±12 h is much more relaxed compared with the ±1.5 h criterion used for the NUCAPS CH4 study. Such a choice is based on the consideration that the change in atmospheric CH4 concentration in the troposphere is predominantly less than 1% (root mean square difference) within a 12 h timeframe, as evidenced by statistics derived from CT-CH4 and CAMS reanalysis data. Importantly, our study benefits from a sample size that is two orders of magnitude larger than that used in the NUCAPS CH4 study.
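For reference, bias and standard deviation expressed in percent, as quoted above, are typically computed from the relative difference between retrieved and in situ values. The sketch below assumes that convention; the function name `relative_difference_stats` and the sample values are invented for illustration and are not the Table 2 data.

```python
import numpy as np


def relative_difference_stats(retrieved, in_situ):
    """Return (bias, standard deviation) of the relative difference in percent."""
    rel = 100.0 * (retrieved - in_situ) / in_situ
    return float(rel.mean()), float(rel.std(ddof=1))


# Made-up collocated pairs (ppmv): retrievals are 1%, 0%, 2%, and 1% high.
in_situ = np.array([1.80, 1.85, 1.90, 1.82])
retrieved = in_situ * np.array([1.01, 1.00, 1.02, 1.01])

bias_pct, std_pct = relative_difference_stats(retrieved, in_situ)
```

Grouping the collocated pairs by pressure layer or region before calling the function would give per-layer or per-region statistics.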
A similar satellite–airborne inter-comparison study validated CH4 results retrieved from the Aura Tropospheric Emission Spectrometer (TES) and AIRS radiances using ATom data [39]. This study implemented coincidence criteria of 9 h and 50 km, closer to the criteria used in our study. Additionally, specific quality control was applied to screen out low-sensitivity and cloudy cases, leading to a low yield rate (~1/4) of samples [39]. The difference between AIRS and in situ results exhibited a bias of ~3% over the ocean and ~4% over land before bias correction, with a standard deviation of less than 2% [39]. Both the NUCAPS and the TES-AIRS CH4 validation studies involved the use of averaging kernel correction to take into account the relative difference introduced by averaging kernel ‘smoothing’ [40]. However, applying averaging kernel correction to ATom data necessitates extending ATom CH4 measurements within the limited vertical region to match the complete profile measured by the sounders. It can be difficult to assess errors introduced by extending aircraft measurements using the assumed ‘true’ profiles typically obtained from global atmospheric chemistry models [7,10,40]. Nalli et al. [38] cautioned that applying averaging kernel correction for an inter-comparison study between satellite-based and airborne measurements can be misleading under certain conditions when the retrieval system has little to no measurement sensitivity to CH4. Based on these considerations, we did not implement averaging kernel correction in our study. Therefore, the difference between the spectral fingerprinting results and ATom results includes the null-space errors.
The error statistics listed in Table 2 reflect the accuracy of the CH4 derived under all-sky conditions without the imposition of a carefully designed quality control scheme to filter out low-sensitivity samples with potentially large errors. Despite the absence of a quality control scheme and the retention of null-space errors, the precision and accuracy of our results are still comparable to, or even better than, the NUCAPS and TES-AIRS results. This suggests the potential of using spectral fingerprinting methodology to enhance the CH4 data products from existing hyper-spectral sounder missions.
Table 2 shows that the errors in the lower troposphere are more pronounced in the northern mid-latitude region than in the other regions. Compared with the ATom observations, the fingerprinting results for individual CrIS footprints demonstrate a consistently positive bias globally. The most substantial errors, both in terms of bias and standard deviation, can be observed in the upper troposphere to lower stratosphere region (above 250 hPa) over the Canadian wetlands and Greenland regions. These discrepancies can be attributed to the limitations in the vertical resolution of CrIS spectral measurements, which prevent the precise sensing of the rapid decrease in methane concentrations at upper-troposphere to lower-stratosphere altitudes in the Arctic region. This challenge is particularly notable in regions where the tropopause height can be as low as 300 hPa.
5. Conclusions and Future Work
Compared to ground-based measurements, which serve as anchor observations for long-term CH4 trends and inter-annual variations in the atmosphere, satellite-based measurements play a crucial role in assessing the geographical distribution of CH4 concentrations globally. Accurate information about the sources and sinks of CH4 is essential for climate models to predict atmospheric CH4 concentrations. However, existing CH4 data products based on infrared (IR) sounder observations face limitations in accurately capturing the global geographical distribution characteristics of CH4. A major challenge arises in areas where satellite-based measurements lack information content due to factors like insufficient thermal contrast or cloud blockage. To address this challenge, scene-dependent a priori information is required to enhance CH4 retrieval in these areas. While high-quality CH4 information can be obtained from CH4 data assimilation systems like CT and CAMS reanalysis, spectral radiance information from IR sounders has yet to be adequately assimilated in those systems. Additionally, data latency, especially in applications for environmental monitoring, hinders the direct use of CH4 data from these systems to provide a priori constraints for physical retrievals.
We have developed a spectral fingerprinting scheme to tackle the challenges of retrieving CH4 from satellite-based IR hyper-spectral sounder measurements. The scheme shares many similarities with a data assimilation system, but it differs in that it uses a machine learning-based model to initialize the a priori background and an optimized scheme to enhance the spectral fingerprints of CH4. The fingerprinting algorithm follows a ‘lazy learning’ methodology to efficiently identify a group of matched CH4 profiles using the optimized CrIS spectral radiances as the predictors. The a priori information that largely retains the accuracy and characteristics of CT can be provided in a real-time or near-real-time manner via a classification scheme based solely on the spectral radiance measured by sounders. A final solution for the CH4 profile can be obtained by using a radiative kernel-based optimal inversion procedure to fit the optimized spectral radiance signals from CrIS measurements. This combination of machine learning and radiative kernel-based inversion has the potential to offer advantages in terms of accuracy and computational efficiency in the context of sounder-based CH4 retrieval.
We have demonstrated that the CH4 retrieved from SNPP CrIS observations via the fingerprinting method can generally capture the vertical and spatial distribution characteristics at a global scale and in different seasons, using the CAMS CH4 reanalysis data as the reference. A validation study carried out using in situ ATom data demonstrated that both the systematic error and the uncertainty associated with the derived CH4 profiles at various altitudes in the tropospheric region range from less than 1% to no more than 2%.
The fingerprinting scheme leverages SiFSAP to generate radiative kernels under all-sky conditions. The results and error statistics presented herein are associated with individual CrIS observations under both clear- and cloudy-sky conditions. Quality temperature, water vapor, surface, and cloud properties from the SiFSAP used in the offline training ensure the accuracy of the radiative kernels at the individual footprint scale of the CrIS observations, thereby safeguarding the accuracy of the CH4 fingerprinting at the fine spatial–temporal scale.
This paper has demonstrated the potential of using satellite-based spectral observations to facilitate the instantaneous monitoring of height-resolved methane distribution. This study showcases the effectiveness of employing machine learning to overcome the challenges posed by modeling and measurement errors in a standard retrieval scheme. Our future work will focus on enhancing the accuracy of the a priori background state by exploring more sophisticated machine learning models. Additionally, efforts will be made to integrate fingerprinting results as the a priori information for the iterative physical retrieval procedure used for the production of SiFSAP CH4.