1. Introduction
Panchromatic sharpening, known simply as pan-sharpening, is a broadly used radiometric transformation-based image fusion and enhancement technique that can merge a high-spatial-resolution panchromatic (PAN) image and a lower-spatial-resolution multispectral (MS) image to approximate a finer-spatial-resolution MS image (MSf) [
1]. In other words, pan-sharpening can increase the spatial resolution of a MS image. Though most MS satellite images have a high spectral resolution, they often lack high spatial resolution because the input spectral signal received by the MS sensor is divided among multiple reflectance bands. PAN images, on the other hand, have high spatial resolution precisely because they integrate the energy of a wide spectral region, typically the visible to near-infrared region, to which silicon is sensitive, to allow for finer-spatial parsing of the available energy (i.e., electromagnetic waves). Pan-sharpening requires that input PAN and MS images be spatially co-registered, be in the same spatial reference system, be fully overlapped, have the same dimension (i.e., ground footprint), and have an assumption of identical scene conditions between the two image sets. It is most common, therefore, to merge PAN and MS images that are obtained coincidently from the same platform or even sensor (e.g., Landsat), such that the geometry and scene conditions are inherently identical.
Pan-sharpening is most frequently used to enhance the visual representation of a remotely sensed image. For example, most of the high-spatial-resolution images used in platforms such as Google Earth are created with pan-sharpening techniques [
2]. These enhanced images allow users to distinguish and observe finer-scale features on the ground more reliably and accurately [
3]. Pan-sharpening is designed to combine the high-spatial-resolution detail from PAN bands with the low-spatial-resolution but high-spectral-resolution detail from MS bands, creating an image which has not only high spatial resolution but also high spectral resolution [
4]. Regardless of how well these conditions are satisfied, pan-sharpening inherently alters the values of multispectral bands, likely explaining its limited use as a preprocessing step for automated machine interpretation.
Over the past decades, researchers have developed a myriad of pan-sharpening algorithms that can be generally classified into one of three categories: (1) component substitution (CS), (2) multiresolution analysis (MRA), and (3) variational optimization (VO) [
5]. A detailed description for each of these categories was provided in a study performed by Meng et al. [
5], but a brief summary is given here. Mathematically, CS methods are the simplest, and they work by first conducting principal component analysis (PCA) on both the MS and PAN images and then normalizing the histograms of each by matching the spectral component from the MS image with the structural component from the PAN one to produce a single fused image with the principal spectral and structural information from both input images. MRA methods split the spatial information from the MS image and PAN image into two-band pass channels, known as approximations (i.e., low-frequency channel) and details (i.e., high-frequency channel). The high-frequency channels of the PAN image are injected into the corresponding interpolated MS bands at the same resolution as the PAN image, and, subsequently, the pan-sharpened MS data are reconstructed from the set of frequency bands [
6]. VO methods approach pan-sharpening as an inverse optimization problem whereby a fused image is derived by generating an energy functional from the input MS and PAN images that is then passed into an optimization algorithm. In other words, VO methods focus on building the most effective and suitable models to characterize the correlation between the spectral information from the MS image with the spatial information in the PAN image [
5,
7,
8,
9]. A review of current industry-standard GIS software revealed that VO methods have yet to be widely adopted. Similarly, machine- and deep-learning approaches (see Reference [
10]), which typically utilize a VO approach, are not readily available in industry-standard software packages.
The performance variability of each algorithm is determined largely by the geographic context of a given scene and the application orientation, or analytical goals at hand. This represents a significant gap in the extant pan-sharpening literature in that most pan-sharpening algorithm performance evaluation studies were focused mainly on one or a couple of geographic contexts [
5]. As it stands, selecting the most effective pan-sharpening algorithm is an important decision, and performance evaluation is an essential step. In recent years, many studies have developed various metrics for performance evaluation, with most focused on examining the quality of the pan-sharpened image in terms of color preservation, spatial fidelity, and spectral fidelity [
11,
12,
13,
14,
15,
16,
17,
18]. However, because all pan-sharpening algorithms modify the pixel values of the original MS image, image products derived from the original MS image and pan-sharpened MS image can be expected to diverge. Index-specific performance evaluations, such as this evaluation on the calculation of normalized difference vegetation index (NDVI), hold promise to optimize pan-sharpening-algorithm selection for machine interpretation workflows which are dependent on deriving reliable indices. Thus, this research was focused on introducing a statistical framework (
Figure 1) which can be used to assess the relative variability introduced in postprocessed pan-sharpened imagery and, ultimately, can be extensible to multiple application orientations, such as normalized difference water index (NDWI).
A review of the extant literature reveals that a framework for selecting among the myriad of widely adopted pan-sharpening algorithms in industry-standard software is lacking. The intellectual significance of this research lies in developing a framework for context-based pan-sharpening algorithm selection from a statistical perspective. Using one of the most common and frequently used spectral indices as a surrogate, the normalized difference vegetation index (NDVI), this research evaluated the performance of several pan-sharpening algorithms from a statistical perspective, or, more specifically, empirical cumulative distribution function (eCDF), which calculates the cumulative distribution of a sample’s empirical measure for a given variable value. This method is predicated on the assumption that differences in the overall spectral and radiometric information between an original and a fused image are the result of the pan-sharpening algorithm employed. The degree of spatial and spectral distortion introduced in a fused image can have significant impacts on the reliability of a derived index [
19,
20]. Thus, a comparison between the cumulative distribution of NDVI values in the original and fused images will allow us to quantify the degree of distortion introduced in the pan-sharpening process. The premise is that an ideal pan-sharpening algorithm will produce the same cumulative distribution of NDVI values of a given region from a MSf image as the original MS image. In other words, MSf NDVI’s empirical cumulative distribution pattern will be identical to that of NDVI calculated the original MS image.
4. Results and Discussion
A total of 30 eCDF plots were created for Landsat and proprietary satellite images since there are five areas of interest (AOIs) and six pan-sharpening methods. As aforementioned, NDVI values range from −1.0 to 1.0; however, the x-axis values in these graphs only range from −0.5 to 1.0 to enhance visualization of the cumulative NDVI distributions where there is the most variance. It should be noted that the eCDF values (probability values) range from 0.0 to 1.0, and, therefore, the y-axis values in these graphs range from 0.0 to 1.0.
Interpreting
Figure 4, we can see that all methods, except for GS, can be used to effectively pan-sharpen the entire collection of Landsat images (Rio de Janeiro; San Francisco; Stockholm; Washington, D.C.; and Tripoli) in the context of the NDVI, because the discrepancy between the baseline distribution and the post-distribution is visually minimal. The GS method cannot be used to effectively pan-sharpen any of the Landsat images. A further examination disclosed that the GS method involves an approximation process that could cause the discrepancy. To wit, the GS method approximates a low-spatial-resolution image from the high-spatial-resolution PAN image based on derived weights for each band. Weight selection is critical to the approximation process, and different band weights will create completely different low-spatial-resolution images. Currently, there are no well-established guidelines on how to select the weights for different satellite images, and most users use the software’s default weights for the approximation process. This could result in inappropriate weight selections for Landsat images, ultimately leading to the discrepancy between the baseline distribution pattern and the post-distribution pattern.
Further exploration of the eCDF graphs reveals that the GS method has varying impacts on different Landsat images. The degree of variation is seemingly unassociated with the relative percentage of vegetation within each scene, but the eCDF curves tend to converge near the extreme low and high ends, where cumulative values are close to 0% and 100%, respectively. This is likely because GS decorrelation creates a vector of n dimensions based on the number of pixels in each band, thereby substantially increasing the potential sources of distortion.
There are no generalizable patterns between pre- and post-sharpened NDVI datasets derived from using proprietary imagery (
Figure 4). That being said, the effective pan-sharpening methods vary for proprietary satellite images in the context of NDVI. For Rio de Janeiro, which is a Worldview-4 satellite image, the effective pan-sharpening methods are Esri, IHS, and SM in the context of NDVI, because the discrepancy between the baseline distribution and the post-distribution is visually minimal. Not surprisingly, both the Brovey and GS methods do not exhibit high performance, as a result of their computational complexity, as discussed above. The WRM method also shows some discrepancy. For San Francisco, none of the methods was effective in the context of NDVI; however, WRM may be effective because the discrepancy between the baseline distribution and the post-distribution is relatively small. Even so, further examination revealed that the discrepancy is large around the NDVI value of 0, potentially underestimating vegetation in built or barren areas. However, the Brovey, Esri, IHS, GS, and SM methods are not effective at all due to the large discrepancy. For Stockholm, the effective pan-sharpening methods are Brovey, Esri, IHS, and SM in the context of NDVI. However, the GS method and WRM method are not effective at all, due to the large discrepancy. For Washington, D.C., none of the pan-sharpening methods was effective in the context of NDVI since the discrepancy between the baseline distribution and the post-distribution is very large. That being said, when the Worldview-4 images for the Washington, D.C., area are used for pan-sharpening and subsequently used for NDVI analysis, Brovey, Esri, GS, IHS, SM, and WRM methods are not effective. Referring back to the GlobCover LU/LC distribution, we see that close to 95% of the Washington, D.C., scene consists of vegetation cover, followed by Rio de Janeiro (59%) and San Francisco (25%).
The results reveal that all tested NDVI eCDF distributions have varied results, which are summarized in
Table 5. The KS statistic,
D, is the absolute maximum distance between both eCDFs. For each pan-sharpened NDVI dataset,
D statistics with an associated
p-value less than 0.05 (95% confidence level) are statistically dissimilar from the pre-sharpened NDVI calculation, meaning that relatively smaller
D values and corresponding
p-values greater than 0.05 suggest that an algorithm is effective.
To summarize, all methods, except for GS, can be used to effectively pan-sharpen the entire collection of Landsat images (Rio de Janeiro; San Francisco; Stockholm; Washington, D.C.; and Tripoli) in the context of NDVI, because the p-values are less than 0.01 (99% confidence interval). It is also noteworthy that, despite the visual similarity between the pre-sharpened NDVI dataset and the one derived from WRM in the eCDF plot, the KS test reveals small but statistically significant (p < 0.01) dissimilarities in all scenes but Rio de Janeiro. However, for proprietary satellite images, the KS test results closely resemble the visual analysis detailed above. For Rio de Janeiro, the effective methods in the context of NDVI are SM, IHS, and Esri, while the ineffective methods are WRM, GS, and Brovey. For Stockholm, the effective pan-sharpening methods are Brovey, Esri, IHS, and SM in the context of NDVI, while the GS method and WRM method are not effective. None of the tested methods was effective in the context of NDVI for San Francisco; Washington, D.C.; and Tripoli. However, the intention of the current study is not to make specific recommendations as to which pan-sharpening algorithm is most appropriate given the geographic context, because the ultimate objective of each practitioner may vary within the domain of interpreting VIs.
One possible explanation is that, the more a given scene is dominated by vegetation cover, the less accurate are the NDVI computations when derived from pan-sharpened imagery, with no extensible rule about which algorithm is superior or inferior. To test this, we assessed the relationship between the distribution of NDVI values and percentage of vegetation cover for each scene. Because the distribution of NDVI values for all scenes is not normally distributed, the difference between median NDVI values of the pre- and post-fused images is used to represent the variation introduced by each pan-sharpening algorithm. These differences and the percent vegetation cover present in each scene were transposed into the dataset displayed in
Table 6.
The SW normality test indicates that the distributions of percent vegetative cover and differences in median NDVI are not normally distributed. As such, the Spearman rank correlation coefficient was calculated between the percent vegetation cover and differenced median NDVI for each algorithm. The results, which are displayed in
Table 7, indicate that, as vegetation cover increases, so too does the difference in median NDVI. The ρ values indicate a weak to moderate correlation (weak ρ = 0.1 = 0.3; moderate ρ = 0.3–0.5) that varies by the algorithm employed. Moreover, following the results of the KS test, this relationship is noticeably less pronounced when using the Brovey and GS algorithms, thus further suggesting an increased introduction of random distortion in the output PSf MS image. A simple linear regression between the percent of vegetation cover and difference in median NDVI shows that this relationship is significant (
p = 0.002), with an R
2 of 0.2144. Model diagnostics suggest that the linear regression is acceptable—the residuals are normally distributed with only negligible evidence of kurtosis, and only one point (derived by using the Washington, D.C., proprietary image) is an outlier, but with little to no leverage. When the data points for the GS and Brovey algorithms are removed, the model significance increases somewhat, while the R
2 remains unchanged.
These relatively low correlations and R2 values are deemed acceptable, and they are even expected, in this context. First, the global land-cover classification is derived from the ESA GlobCover product, which has a spatial resolution of approximately 500 m. Second, the GlobCover product is an average LU/LC classification over time, while the scenes analyzed in the present study are at a single discrete moment. Finally, an ideal pan-sharpening algorithm is characterized by minimal distortion, which can stem from a vast array of potential sources of error or uncertainty. Vegetation cover, as it stands, can still reasonably account for 21.4% of the model variability in difference in median NDVI over and above the grand mean.
Additionally, the effect of differential bandwidth between MS and PAN bands cannot be overlooked. All three sensors that were analyzed in the present study have contrasting bandwidths between the MS and PAN bands, so it follows that spectral information is lost during the pan-sharpening process. Matsuoka et al. [
40] conducted a sensitivity analysis exploring the effect of varying band position and bandwidth across multiple sensors on pan-sharpened images, finding that discrepancies in bandwidth do impart varying degrees. It is noteworthy that Matsuoka et al. identify a causal relationship between land-cover type and variability in fused image quality, specifically relating this variability to low contrast between vegetation and open water when employing the GS algorithm, regardless of sensor. This result is effectively replicated in the present analysis. To wit, GS is consistently outperformed by all other algorithms employed herein.
5. Conclusions
Pan-sharpening is an effective tool for increasing the visual interpretability of remotely sensed imagery in that it increases spatial resolution and provides a better visualization of a relatively low-spatial-resolution MS satellite image, using a single high-spatial-resolution PAN image. Moreover, pan-sharpening can improve the accuracy of image analysis, feature extraction, and modeling and classification [
41]. The spatial and spectral context of a scene is often overlooked within the body of application-oriented pan-sharpening literature [
5]. We highlighted one specific application in earth observation, NDVI, and systematically test the effects of multiple commercially available pan-sharpening algorithms on the calculation of NDVI. The assumption of the current study is that overall differences between the NDVI calculated from raw and pan-sharpened imagery is due to spectral or radiometric distortion, or loss of image fidelity, during the pan-sharpening process. The premise, therefore, is that an effective pan-sharpening algorithm will produce an image whose NDVI’s cumulative distribution pattern is similar or close to the original MS image.
We quantified the deviation introduced in the NDVI calculations after the pan-sharpening process and found that the deviation was increased by the spatial resolution of the input imagery and land-cover type—in this instance, the presence and abundance of vegetation. Similarly, error increases in geographic contexts with a higher degree of vegetative land cover. As such, we provide the following recommendations and procedure to assist users in selecting the most effective pan-sharpening algorithm for calculating the NDVI within their given context and application. First, avoid the GS and Brovey algorithms in favor of Esri, IHS, or SM for images with relatively coarse spatial resolution, such as Landsat, and exercise caution when selecting any pan-sharpening algorithm for imagery with a sub-meter ground sampling distance (GSD). Second, quantify the difference between the NDVI datasets calculated by using both the pre- and post-sharpened input imagery and select the algorithm that introduces the least distortion. Finally, the proposed analytical framework to quantitatively compare eCDF plots of the NDVI derived from using both pre- and post-fused imagery can assist with the qualitative identification of which algorithm introduces error and at which particular NDVI ranges of interest to the task at hand.
The framework proposed in the current study may also be extended to selecting the most effective pan-sharpening algorithms for different application contexts, such as soil or water indices. A limitation of the current framework is that the KS statistic represents the severity of difference between the eCDF functions being compared. It does not, however, quantify the overall extent or source of variance. Future research ought to focus on identifying the specific sources of distortion introduced by using any given pan-sharpening algorithm, the application of a sensitivity analysis to assess the effects of different band weights in pan-sharpening algorithms, and on the application of novel methods to the problem of pan-sharpening, which is outside the scope of the present study.