1. Introduction
Advancements in space technology have significantly increased both the diversity and quantity of remote sensors, resulting in an abundant array of remote sensing image resources [
1]. Achieving both fine spectral resolution and a wide instantaneous field of view in remote sensing imaging while maintaining a high signal-to-noise ratio is challenging. This limitation means that a single remote sensor often cannot simultaneously capture images with both high spatial resolution and high spectral resolution [
2]. Remote sensing image fusion technology integrates images with different spatial and spectral resolutions to produce a composite image that combines both high spatial and multispectral resolution. This technique enables the fused image to present detailed information with enhanced spatial clarity and spectral richness [
3]. High-quality, quantitative remote sensing image fusion must not only achieve high spatial and spectral resolution but also ensure accurate spectral quantification; that is, physical quantities of the Earth's surface, such as radiance or reflectance within the sensor's response wavelength range, must be derivable from the pixel values. Quantitative fusion of panchromatic and multispectral color remote sensing images has widespread applications, offering a reliable foundation for research in areas such as land cover change [
4], terrestrial ecosystem monitoring [
5], and the monitoring and classification of forests and crops [
6,
7]. Additionally, government agencies can leverage these high-quality fused images to develop policies that address environmental changes and ensure the sustainable management of land resources [
8].
To achieve the quantitative fusion of remote sensing images, researchers have undertaken significant efforts. For example, in 2011 [
9], an improved additive wavelet transform was introduced for fusion, capable of preserving both radiometric and geometric information. In 2012 [
10], a multisensor image fusion method based on a hidden Markov tree and a pulse-coupled neural network (PCNN) was proposed. This method uses a PCNN to select the maximum value for low-pass coefficients and a saliency-based rule for directional coefficients, addressing minor distortions in structural components. In 2013 [
11], the authors proposed a fusion method based on variational wavelets, which perform well with highly heterogeneous data. In 2014 [
12], the authors explored contourlet representation and introduced an adjustable contourlet transform for effective fusion, averaging the low-pass coefficients and selecting the absolute maximum value for directional coefficients. This method corrects minor structural distortions and radiometric blurring. In 2015 [
13], a region division strategy was used in the shearlet domain for pansharpening, applying region-correlation-based fusion rules to all decomposed coefficients. Significant spatial enhancement was observed as a result of these region-based rules. In 2016 [
14], a pansharpening method in the shearlet domain was proposed, considering regional correlation metrics and applying local-region-based fusion rules to approximation coefficients and gradient-based fusion rules to directional coefficients. The results demonstrate that this method effectively preserves structural and radiometric information. In 2018, [
15] conducted a spatiotemporal fusion study using deep convolutional neural networks (CNNs) in the context of massive remote sensing data. In 2019, [
16] presented a region-based fusion scheme for combining panchromatic, multispectral, and synthetic aperture radar images. Temporal data fusion and high-spatial-resolution methods were used to generate synthetic Landsat imagery by combining Landsat and MODIS data. In 2021 [
17], a new region-based fusion method combining the non-subsampled contourlet transform (NSCT) and particle swarm optimization (PSO) was proposed. This method applies a maximum-based rule to the approximation layer, separates the band-pass coefficients into smooth and edge regions, and uses PSO for the edge regions and a maximum-based rule for the remaining components. It performs well in spatial enhancement and mitigates minor blurring effects.
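Many of the transform-domain approaches above share the same skeleton: decompose the panchromatic and (upsampled) multispectral bands, merge the coefficients with simple rules, and invert the transform. The following minimal Python sketch illustrates that skeleton only, not any specific cited method; it assumes PyWavelets is available, that the multispectral band has already been upsampled to the panchromatic grid, and the function name, the Haar wavelet, and the single decomposition level are illustrative choices.

import numpy as np
import pywt

def dwt_fuse(pan, ms_band, wavelet="haar"):
    # Single-level 2D DWT of both inputs (assumed co-registered, same size).
    cA_p, (cH_p, cV_p, cD_p) = pywt.dwt2(pan, wavelet)
    cA_m, (cH_m, cV_m, cD_m) = pywt.dwt2(ms_band, wavelet)

    # Low-pass rule: average the approximation coefficients.
    cA = 0.5 * (cA_p + cA_m)

    # Detail rule: keep the coefficient with the larger absolute value.
    pick = lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b)
    details = (pick(cH_p, cH_m), pick(cV_p, cV_m), pick(cD_p, cD_m))

    # Reconstruct the fused band.
    return pywt.idwt2((cA, details), wavelet)

Applying dwt_fuse to each multispectral band separately yields a fused multiband image; the region-based and optimization-based variants reviewed above differ mainly in how the coefficient-selection rules are designed.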
In recent years, deep-learning-based fusion methods have been categorized into three types based on the supervisory paradigms employed during the training process: unsupervised, self-supervised, and supervised approaches [
18]. Supervised methods utilize ground truth values to guide the training processes, while unsupervised approaches construct loss functions by constraining the similarity between the fusion results and the source images. Self-supervised algorithms are commonly associated with the AutoEncoder (AE)-based framework. In 2020, [
19] introduced the pansharpening generative adversarial network (Pan-GAN), the first method to explore the unsupervised fusion of multispectral and panchromatic images. This approach incorporates two discriminators that create adversarial relationships between the fusion result and the two source images, each assessing the fidelity of spectral and spatial information, respectively. Ref. [
20] introduced a semantic-aware real-time image fusion network (SeAFusion) in the same year. This study employs a cascading approach that integrates a fusion module with a semantic segmentation module, allowing the semantic loss to guide the flow of high-level semantic information back to the fusion module. Additionally, it proposes modules such as the gradient residual dense block (GRDB) to bridge the gap between image fusion and high-level vision tasks. The concept of self-supervised edge-attention guidance for image fusion (EAGIF) was proposed in [
21], utilizing a coarse-to-fine deep architecture to learn multiscale features from multimodal images. It also designs an edge-guided attention mechanism based on these multiscale features to focus the fusion process on common structures, thereby enhancing detail recovery while attenuating noise. Supervised fusion methods, as proposed in [
22,
23,
24,
25], illustrate the advancements that deep learning has contributed to image fusion techniques. In 2017, [
22] proposed the pansharpening deep network architecture (PanNet), which employs residual learning to shift network training to the high-frequency domain. This approach allows the network to focus on learning high-frequency structural information, thereby enhancing the spatial quality of the fusion results. Ref. [
23] proposed the super-resolution-guided progressive pansharpening neural network (SRPPNN), which incorporates two specific structural designs: a super-resolution module and progressive learning. These features enable the network to continuously capture spatial details at various scales and progressively integrate them into the upsampled multispectral images. Ref. [
24] proposed the gradient projection-based pansharpening neural network (GPPNN), which investigates generative models for panchromatic and multispectral images. This approach explores the spatial and spectral degradation processes and uses them as priors to guide the optimization of neural networks, thereby enhancing fusion performance. In 2022, [
25] introduced GTP-PNet, which employs a specialized transformation network (TNet) to model the spectral degradation process. This method establishes a more accurate nonlinear regression relationship between multispectral and panchromatic images in the gradient domain, and this relationship is used as a prior to constrain the preservation of spatial structures, thereby ensuring a balance between spectral and spatial information.
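To make the residual-learning idea concrete, the sketch below outlines a PanNet-style network in which high-pass-filtered panchromatic and upsampled multispectral inputs drive a small convolutional branch whose output is added back to the upsampled multispectral image. It is a schematic under our own assumptions rather than any published architecture: the box-filter high-pass, the layer widths, and the class name ResidualPansharpen are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def high_pass(x, k=5):
    # Subtract a local box-filtered mean so that mostly structural detail remains.
    return x - F.avg_pool2d(x, k, stride=1, padding=k // 2)

class ResidualPansharpen(nn.Module):
    def __init__(self, ms_bands=4, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ms_bands + 1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, ms_bands, 3, padding=1),
        )

    def forward(self, pan, ms):
        # Upsample the multispectral image to the panchromatic resolution.
        ms_up = F.interpolate(ms, size=pan.shape[-2:], mode="bicubic", align_corners=False)
        # The network sees only high-frequency content and predicts a detail residual.
        hf = torch.cat([high_pass(pan), high_pass(ms_up)], dim=1)
        return ms_up + self.body(hf)

Because the spectral content comes from ms_up and the network only adds high-frequency structure, training concentrates on spatial detail, which is the motivation for residual learning in the high-frequency domain.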
The research discussed above primarily focuses on improving fusion results through various approaches, including advancements in image decomposition methods [
9,
11,
12,
14], the establishment of stringent criteria for selecting fusion coefficients [
10,
13,
17], enhancing the accuracy of nonlinear spectral fitting [
22,
25], and optimizing detail and spectral quantitative guidance [
19,
20,
21,
23,
24]. However, several problems remain in quantitative remote sensing image fusion. The information in a remote sensing image is a combination of atmospheric and surface contributions. Atmospheric effects not only reduce contrast and blur fine textures in panchromatic images [
1], but also distort spectral information and decrease the quantitative accuracy of multispectral color images [
26]. As a result, fused images obtained under these conditions exhibit blurred spatial details and errors in spectral information, failing to meet the requirements of quantitative remote sensing image fusion. Most research treats atmospheric information as an inherent part of the image data and employs mathematical methods to develop newer and more efficient fusion algorithms, while the impact of atmospheric effects is often overlooked. This neglect prevents significant improvements in the spatial detail and spectral quantification of fused images. Additionally, aerosol observation methods are relatively limited, and aerosols exhibit strong spatiotemporal heterogeneity [
27]. This results in aerosol parameters that often do not align temporally and spatially with the remote sensing images, leading to residuals in atmospheric correction results [
28].
The primary objective of this study is to explore methods for obtaining highly synchronous aerosol optical depth (AOD) and column water vapor (CWV) measurements and integrating these atmospheric parameters as source images for fusion with panchromatic and multispectral remote sensing images. This approach aims to mitigate the impact of atmospheric effects on the quantitative fusion of remote sensing images. The study examines the interaction between atmospheric correction and five widely used fusion methods: principal component analysis (PCA) [
3], intensity-hue-saturation (IHS) [
29], Laplacian pyramid (LP) [
30], discrete wavelet transform (DWT) [
4], and non-subsampled contourlet transform (NSCT) [
31]. It investigates how factors such as the choice of fusion methods, atmospheric correction, the synchronization of atmospheric parameters, and the timing of atmospheric correction affect the quality of fused remote sensing images. The findings provide a valuable reference for establishing robust quantitative fusion processes. The data for this study were obtained from airborne flight experiments involving multiple remote sensors, enabling a systematic comparison of spatial detail and spectral quantitative accuracy in the fused images.
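As a concrete illustration of the correction-before-fusion pipeline examined here, the Python sketch below applies a 6S-style per-pixel correction to each band using coefficient maps assumed to be precomputed from the synchronized AOD and CWV images, and then passes the corrected images to a fusion routine. The coefficient maps xa, xb, xc, the function names, and the fuse callback (standing in for any of the five methods above) are placeholders, not the exact implementation used in this study.

import numpy as np

def correct_band(radiance, xa, xb, xc):
    # 6S-style conversion from at-sensor radiance to surface reflectance:
    # y = xa * L - xb,  rho = y / (1 + xc * y), with per-pixel coefficient maps.
    y = xa * radiance - xb
    return y / (1.0 + xc * y)

def correct_then_fuse(pan, ms_bands, coeffs_pan, coeffs_ms, fuse):
    # Atmospheric correction is applied to every source image first...
    pan_refl = correct_band(pan, *coeffs_pan)
    ms_refl = np.stack([correct_band(b, *c) for b, c in zip(ms_bands, coeffs_ms)])
    # ...and only then are the corrected images fused (PCA, IHS, LP, DWT, or NSCT).
    return fuse(pan_refl, ms_refl)

Swapping the two steps, i.e., fusing first and correcting afterwards, is the alternative ordering whose effect on fusion quality is examined in this study.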
5. Conclusions
This study investigates the impact of 6S atmospheric correction combined with five image fusion methods (PCA, IHS, DWT, LP, and NSCT) on the performance of fused images, focusing on four aspects: the choice of fusion method, the implementation of atmospheric correction, the synchronization of atmospheric parameters, and the timing of atmospheric correction.
The experimental results for each fusion method indicate that the choice of fusion technique significantly affects the spatial details of remote sensing fused images. In the three experimental scenarios, selecting an appropriate fusion method can enhance the spatial detail evaluation values of the fused images by factors of 1.149 to 2.739, 1.056 to 2.597, and 1.057 to 2.111, respectively. However, the choice of fusion method has a relatively minor impact on the spectral quantification of remote sensing images. The maximum reductions in DTR for black targets, white targets, cement surfaces, and grassland were from 193.26% to 175.20%, from 40.77% to 36.70%, from 41.92% to 35.02%, and from 68.41% to 51.97%, respectively.
Experiments employing different atmospheric parameters for atmospheric correction demonstrate that atmospheric correction improves the performance of fused images across all methods, with synchronized atmospheric parameter maps yielding better results than single-value atmospheric parameters. In the three experimental scenarios, atmospheric correction enhances the spatial details of the PCA, IHS, DWT, NSCT, and LP fused images by factors ranging from 1.12 to 1.38, 1.06 to 1.98, 1.03 to 1.89, 1.02 to 1.73, and 1.06 to 2.03, respectively. Additionally, atmospheric correction reduces the DTR for the four types of features (black targets, white targets, cement surfaces, and grassland), with maximum reductions from 163.7% to 9.4%, from 36.7% to 6.3%, from 35.0% to 10.0%, and from 51.9% to 5.4%, respectively. Furthermore, experiments conducted at different timings of atmospheric correction indicate that placing the atmospheric correction step before the fusion step yields better results.
In summary, integrating synchronized atmospheric parameter distribution images into the fusion framework prior to remote sensing image fusion, along with selecting fusion methods based on multiscale decomposition, such as NSCT, can significantly enhance both the spatial details and spectral quantification of fused images. The findings of this study provide a technical pathway for achieving the high-quality quantitative fusion of remote sensing images.
While this research demonstrates the impact of 6S atmospheric correction and five fusion techniques on the performance of fused images, future work will explore deep-learning-based methods. Deep learning approaches not only hold promise for image fusion but may also address challenges related to atmospheric correction and aerosol parameter inversion, necessitating further investigation.