1. Introduction
Multimodal signal processing for sensor fusion is increasingly important in image sensing. Sensor fusion can combine beneficial information from different sensors to generate a richer single image. Image signal fusion approaches have various applications: RGB and infrared image fusion [1,2,3], RGB and multispectral image fusion [4], intercolor RGB signal fusion [5,6], RGB and depth fusion [7,8], RGB and light fusion [9], RGB and computed edge fusion [10], different focal image fusion [11,12], CT and MRI signal fusion for medical image processing [3], Retinex-based enhancement [13], SAR and multispectral image fusion [14], and general signal fusion [15].
Filtering is a basic tool for handling such multimodal signals. Multilateral filtering, a type of edge-preserving filtering, successfully handles multiple signals. Edge-preserving filtering with additional guidance information, called joint edge-preserving filtering, has recently attracted attention from image processing and computational photography researchers for sensor fusion. Joint edge-preserving filtering helps transfer the major characteristics of guidance images, which are not the filtering targets themselves. Various applications use these filters, including flash/no-flash photography [16,17], up-sampling/super-resolution [18], compression noise removal [19], alpha matting [20], haze removal [21], rain removal [22], depth refinement [23,24], stereo matching [25,26], and optical flow estimation [27].
Joint/cross bilateral filtering [16,17] is a seminal work in joint edge-preserving filtering. The filter is naturally derived from bilateral filtering [28] by computing the kernel weight from a guidance image instead of the input filtering image. This formulation reflects the edge information of the guidance image (e.g., RGB, infrared, and hyperspectral images) in the filtering target image (e.g., an RGB image, alpha mask, depth map, or optical flow field).
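To make this formulation concrete, the following is a minimal sketch of a grayscale joint bilateral filter in Python/NumPy. The function name, parameters, and Gaussian kernels are illustrative assumptions, not the implementation used in this paper; the point is only that the range weight is computed from the guidance image rather than the filtering target.

```python
import numpy as np

def joint_bilateral_filter(src, guide, radius=5, sigma_s=3.0, sigma_r=0.1):
    """Naive joint (cross) bilateral filter for grayscale float images in [0, 1].

    The range weight is computed from `guide` instead of `src`, so the edges
    of the guidance image are transferred to the filtering target.
    """
    H, W = src.shape
    out = np.zeros_like(src)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))   # fixed spatial Gaussian kernel
    pad_src = np.pad(src, radius, mode='reflect')
    pad_gui = np.pad(guide, radius, mode='reflect')
    for y in range(H):
        for x in range(W):
            s_win = pad_src[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            g_win = pad_gui[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range weight from the guidance image (the joint/cross formulation).
            rng = np.exp(-((g_win - guide[y, x]) ** 2) / (2.0 * sigma_r ** 2))
            w = spatial * rng
            out[y, x] = np.sum(w * s_win) / np.sum(w)
    return out
```

Setting `guide = src` recovers ordinary bilateral filtering as a special case.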
We can expect a stronger edge-preserving effect by using multiple guidance images (e.g., a set of multi-sensor signals), and recent devices (e.g., infrared/hyperspectral cameras and depth sensors) allow us to capture not only RGB images but also infrared, hyperspectral, depth, and other images. These images have edge information and signal characteristics different from those of RGB images. Such multiple guidance information is helpful for improving signal visibility and the signal-to-noise ratio [29,30]. In other cases, we can use a self-generated image or inversely rendered maps as additional guidance signals [31,32].
There are two categories of image filtering with multiple guidance images: high-dimensional filtering [30,33,34,35,36,37] and multilateral filtering [29,31,32,38,39]. The former uses additive logic and the latter multiplicative logic for the additional kernels. The main difference is the severity of the restriction used to compute the kernel weight. The restriction of the additive logic is looser than that of the multiplicative logic; hence, high-dimensional filtering can robustly smooth out noise or rich textures. By contrast, since the restriction of the multiplicative logic is severe, multilateral filtering produces fewer blurred regions. Each filtering method has advantages and disadvantages, but multilateral filtering is preferred when a sharp edge-preserving effect is required.
A critical issue of edge-preserving filtering for multimodal sensing is computational time. This is because sensing is the gateway to all subsequent processing, and signal processing during sensing is expected to operate in real time. Therefore, many researchers have proposed acceleration methods for edge-preserving filtering. In particular, the acceleration of bilateral filtering has been actively discussed. The bilateral grid [40,41] is the seminal approach, and Yang et al. [42,43] extended it to constant-time bilateral filtering. Yang's method [43] is sufficiently efficient for grayscale images, and recent work further accelerates the bilateral filter [44,45,46]; however, these methods are inefficient for color images. There are several proposals [34,47,48] to approximate and accelerate bilateral filtering for higher-dimensional (color) images. Furthermore, hardware-friendly methods have been proposed [49,50,51]. However, these approaches restrict the kernel weight to kernels defined by the Gaussian distribution. In contrast to the bilateral filtering accelerations, other efficient edge-preserving filters that are not limited to the Gaussian distribution have been proposed. Guided image filtering [20], domain transform filtering [33], and adaptive manifold filtering [30] are representative examples. These filters are based on assumptions different from Gaussian smoothing but have excellent edge-preserving effects and efficiency. Note that these filters handle signals with similar characteristics better than signals with different modalities and characteristics.
Multiple guidance images provide richer information for various applications; however, these efficient methods cannot handle multiple guidance images individually. Therefore, we propose an efficient algorithm for accelerating multilateral filtering, which is designed for filtering with multiple guidance images. Furthermore, we extend the efficient edge-preserving filters so that they can exploit multiple guidance images.
Our algorithm is based on the fact that n-lateral filtering can be represented by a summation of (n−1)-lateral filterings. Therefore, when multilateral filtering is recursively expanded in this way, it reduces to constant-time filtering, since 1-lateral filtering is spatial filtering.
Figure 1 shows an overview of the proposed filter algorithm. The proposed filter, named decomposed multilateral filtering (DMF), recursively decomposes multilateral (n-lateral) filtering by splatting into (n−1)-lateral filtering until it becomes a constant-time filter. Then, the results of constant-time filtering for the decomposed components are merged into the result of multilateral filtering.
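The following sketch illustrates this recursive decomposition under several assumptions (Gaussian kernels, grayscale guidance images normalized to [0, 1], and linear interpolation between quantization levels). It is a conceptual reading of Figure 1, not the construction given in Section 5; all names and parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decomposed_multilateral_filter(src, guides, sigma_s, sigma_rs, n_levels=8):
    """Conceptual sketch of decomposed multilateral filtering (DMF).

    An n-lateral filter (one spatial kernel plus one range kernel per guidance
    image) is decomposed over quantization levels of the last guidance image
    into a set of (n-1)-lateral filters; the recursion bottoms out at plain
    spatial (Gaussian) filtering.

    src      : H x W float image (filtering target)
    guides   : list of H x W float guidance images in [0, 1]
    sigma_s  : spatial Gaussian parameter
    sigma_rs : list of range Gaussian parameters, one per guidance image
    """
    if not guides:
        # 1-lateral case: pure spatial filtering, constant time per pixel.
        return gaussian_filter(src, sigma_s)

    guide, sigma_r = guides[-1], sigma_rs[-1]
    levels = np.linspace(0.0, 1.0, n_levels)
    responses = []
    for level in levels:
        # Splatting: range weights of this guidance image toward the level.
        w = np.exp(-(guide - level) ** 2 / (2.0 * sigma_r ** 2))
        num = decomposed_multilateral_filter(w * src, guides[:-1], sigma_s, sigma_rs[:-1], n_levels)
        den = decomposed_multilateral_filter(w,       guides[:-1], sigma_s, sigma_rs[:-1], n_levels)
        responses.append(num / np.maximum(den, 1e-8))

    # Merging: linearly interpolate between the two nearest levels per pixel.
    responses = np.stack(responses)
    idx = guide * (n_levels - 1)
    lo = np.clip(np.floor(idx).astype(int), 0, n_levels - 2)
    frac = idx - lo
    yy, xx = np.indices(src.shape)
    return (1 - frac) * responses[lo, yy, xx] + frac * responses[lo + 1, yy, xx]
```

In this sketch, each recursion level replaces one range kernel by a set of splatted images, so the leaves consist only of spatial filterings whose per-pixel cost is independent of the kernel radius, which is the constant-time property discussed above.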
The contributions of this paper are summarized as follows:
1. Introducing a constant-time algorithm for multilateral filtering (Section 5);
2. Extending various filters (e.g., guided image filtering [20] and domain transform filtering [33]) to deal with multiple guidance information (Section 6.1);
3. Proposing a multilateral extension to filters that use the filtering output as a guidance image, such as the rolling guidance filter [52] (Section 6.2).
2. Related Work
Due to physical constraints, a single image sensor cannot simultaneously capture rich information such as resolution, wavelength range, focus, dynamic range, and scene features. Image fusion is one way to solve this problem. Research on image fusion is active, with the number of papers increasing each year, as well as many survey papers [53,54,55,56,57,58,59,60,61,62]. Image fusion involves smoothing, denoising, enhancement, sharpening, super-resolution, and blending for multiple signals to obtain the desired signal. Image fusion is mainly divided into digital photography image fusion and multimodal image fusion.
Digital photography image fusion combines images taken by the same sensor with different sensor settings and includes multi-focus image fusion, multi-exposure image fusion, multi-temporal image fusion, and multi-view image fusion. In multi-focus image fusion, an all-in-focus image is synthesized from images taken at different focus settings, and in multi-exposure image fusion, a wide dynamic range image is synthesized from images taken with different exposure settings. Multi-exposure image fusion also includes the use of different external flash environments. Multi-temporal image fusion synthesizes signals that vary along a time axis, while multi-view image fusion synthesizes signals from camera motion or multiple cameras capturing a scene.
Multimodal image fusion combines different characteristics of multiple sensors into one, including RGB-IR fusion, multi-hyperspectral-panchromatic image fusion, RGB-depth/LiDAR fusion, and medical image fusion (CT, PET, MRI, SPECT, X-ray), etc. In RGB-IR fusion, visible images are combined with IR images, taking advantage of the high contrast of IR and the good texture characteristics of RGB in the visible region. It also combines images using the different wavelength bands that can be captured with external flashes. In multi-hyperspectral-panchromatic image fusion, sensors that acquire images in different wavelength bands and resolutions are combined, and each sensor often has a different resolution and noise sensitivity; the objective is to improve the resolution and noise sensitivity of each sensor. RGB-depth/LiDAR fusion corrects depth sensor output using RGB images, including upsampling of depth information, interpolation of missing values, contour correction, and noise reduction. Medical image fusion integrates the output of various medical sensors into the same dimension to assist in diagnosis.
Among these image fusion methods, those that improve the acquired signal are called sharpening fusion, which aims at signal denoising, sharpening, contrast improvement, and resolution improvement. Various tools are used in image fusion, such as weighted smoothing filtering, morphology filtering, principal component analysis (PCA), the Laplacian pyramid, the discrete cosine transform (DCT), the discrete Fourier transform (DFT), the discrete wavelet transform (DWT), etc. This paper is an extension of the weighted smoothing approach. In particular, the proposed method extends existing smoothing/weighted smoothing methods to guided smoothing and has a wide range of applications.
3. Preliminaries
In this section, we review the previous work on constant-time bilateral filtering proposed by Yang et al. [42,43]. Bilateral filtering [28] is a representative edge-preserving smoothing filter defined in a finite impulse response (FIR) manner. This filtering achieves an edge-preserving effect by filtering in the range and spatial domains; thus, its filtering kernel weights are derived from a product of spatial and range weights based on the Gaussian distribution. Let the input and output images be denoted as $I, J : \mathcal{S} \to \mathcal{R}$, where $\mathcal{S} \subset \mathbb{R}^2$ is the spatial domain, $\mathcal{R} \subset \mathbb{R}^d$ is the range domain, and $d$ is the color range dimension (generally, $\mathcal{S} \subset \mathbb{Z}^2$, $\mathcal{R} = [0, 255]^d$, and $d \in \{1, 3\}$), respectively. Bilateral filtering is formulated as follows:
$$
J_{\boldsymbol{p}} = \frac{\sum_{\boldsymbol{q} \in N(\boldsymbol{p})} w_s(\boldsymbol{p}, \boldsymbol{q})\, w_r(I_{\boldsymbol{p}}, I_{\boldsymbol{q}})\, I_{\boldsymbol{q}}}{\sum_{\boldsymbol{q} \in N(\boldsymbol{p})} w_s(\boldsymbol{p}, \boldsymbol{q})\, w_r(I_{\boldsymbol{p}}, I_{\boldsymbol{q}})}, \tag{1}
$$
where $\boldsymbol{p}, \boldsymbol{q} \in \mathcal{S}$ represent a target pixel and a neighboring pixel of $\boldsymbol{p}$, respectively. $I_{\boldsymbol{p}}, I_{\boldsymbol{q}} \in \mathcal{R}$ are the pixel values at $\boldsymbol{p}$ and $\boldsymbol{q}$. $N(\boldsymbol{p}) \subset \mathcal{S}$ is a set of neighboring pixels of $\boldsymbol{p}$. $w_s$ and $w_r$ are weight functions based on the Gaussian distribution whose smoothing parameters are $\sigma_s$ and $\sigma_r$, respectively. Here, we can formulate joint bilateral filtering [16,17] by replacing $I$ in the range weight of (1) with an arbitrary additional guidance image $G$.
Naïve bilateral filtering is an $O(r^2)$ per-pixel algorithm, where $r$ is the filtering kernel radius; thus, the computational cost grows rapidly as the kernel size increases. Several constant-time-per-pixel algorithms for bilateral filtering have been proposed to solve this problem. In particular, the algorithm proposed by Yang et al. [42,43] is the basis of the proposed method.
Yang et al. proposed a constant-time algorithm by extending the bilateral grid [40,41]. The algorithm decomposes bilateral filtering into a set of spatial filterings that can be computed in constant time (e.g., box filtering using an integral image [63,64] and recursive Gaussian filtering [65,66,67,68]). The decomposition is conducted by computing principle bilateral filtered image components (PBFICs) [43] from the input or guidance image. Since arbitrary range filtering weights can generate PBFICs, the algorithm can compute the bilateral filtering response for an arbitrary range kernel.
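For reference, the following is a compact sketch of this decomposition for a grayscale image, using a box filter (SciPy's `uniform_filter`) as a stand-in for any constant-time spatial filter. The quantization and interpolation are simplified relative to [43], and all names and parameters are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # constant-time box filter (stand-in)

def constant_time_bilateral(img, radius=8, sigma_r=0.1, n_levels=8):
    """Sketch of PBFIC-style constant-time bilateral filtering.

    For each quantization level of the intensity range, a range-weight image
    is spatially filtered (numerator and denominator); the output is obtained
    by interpolating the per-level responses at each pixel's own intensity.
    `img` is an H x W float image in [0, 1].
    """
    size = 2 * radius + 1
    levels = np.linspace(0.0, 1.0, n_levels)
    pbfics = []
    for level in levels:
        w = np.exp(-(img - level) ** 2 / (2.0 * sigma_r ** 2))  # range weights for this level
        num = uniform_filter(w * img, size=size)                # spatially filtered numerator
        den = uniform_filter(w, size=size)                      # spatially filtered denominator
        pbfics.append(num / np.maximum(den, 1e-8))              # PBFIC for this level
    pbfics = np.stack(pbfics)

    # Per-pixel linear interpolation between the two nearest levels.
    idx = img * (n_levels - 1)
    lo = np.clip(np.floor(idx).astype(int), 0, n_levels - 2)
    frac = idx - lo
    yy, xx = np.indices(img.shape)
    return (1 - frac) * pbfics[lo, yy, xx] + frac * pbfics[lo + 1, yy, xx]
```

The cost is a fixed number of spatial filterings regardless of the kernel radius, which is what makes the algorithm constant time per pixel.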
Yang’s algorithm [43] was further extended to multichannel images in [42]. The extended algorithm handles multichannel images by preparing multichannel PBFICs for combinations of pixel values in each channel. However, this extension requires uniform processing for all channels; in other words, we cannot filter each channel differently. This suggests room for extending the algorithm to multichannel or multiple guidance images whose channels have different characteristics.
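To see why uniform multichannel processing becomes expensive, assume (hypothetically) that every channel is quantized into the same number of levels; the number of PBFIC combinations then grows exponentially with the number of channels, as the toy count below illustrates.

```python
# Number of multichannel PBFICs when every channel is quantized uniformly.
# K levels per channel and c channels give K**c combinations, each requiring
# one constant-time spatial filtering pass (an illustrative estimate; the
# exact count depends on the quantization actually used in [42]).
def num_multichannel_pbfics(levels_per_channel: int, channels: int) -> int:
    return levels_per_channel ** channels

print(num_multichannel_pbfics(8, 1))  # 8 PBFICs for a grayscale guidance image
print(num_multichannel_pbfics(8, 3))  # 512 PBFICs for an RGB guidance image
```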
Our algorithm is inspired by Yang’s algorithm [42,43], which represents bilateral filtering by a set of spatial filterings. In contrast, our algorithm decomposes a filter for multichannel images into arbitrary constant-time filters.
4. Relationship between Multilateral Filtering and Higher-Dimensional Filtering
In this section, we compare the filtering properties of multilateral filtering (MF) and high-dimensional filtering (HDF). The main difference between them is the logic used to compute the filtering weight. The weight of multilateral filtering $w_{\mathrm{MF}}$ is computed by the multiplicative logic from a spatial weight and the range weights of multiple guidance images:
$$
w_{\mathrm{MF}}(\boldsymbol{p}, \boldsymbol{q}) = w_s(\boldsymbol{p}, \boldsymbol{q}) \prod_{i=1}^{m} w_{r_i}\!\left(G^{i}_{\boldsymbol{p}}, G^{i}_{\boldsymbol{q}}\right),
$$
where $w_{r_i}$ is a filtering weight for the $i$-th guidance image $G^{i} : \mathcal{S} \to \mathcal{R}_i$, where $\mathcal{R}_i$ is the range domain of $G^{i}$, and $m$ is the number of guidance images. An early work on MF was proposed by Choudhury and Tumblin [32]. Each range weight $w_{r_i}$ for the guidance image is individually defined to represent the characteristics of the image.
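A direct transcription of the multiplicative weight above for a single pixel pair might look as follows (the Gaussian range kernels and all parameter names are illustrative assumptions; each guidance image could use a differently shaped kernel in practice).

```python
import numpy as np

def mf_weight(p, q, guides, sigma_s, sigma_rs):
    """Multiplicative (multilateral) weight for a pixel pair.

    p, q     : (y, x) integer pixel coordinates
    guides   : list of m grayscale guidance images (H x W float arrays)
    sigma_s  : spatial Gaussian parameter
    sigma_rs : list of m range Gaussian parameters, one per guidance image
    """
    dy, dx = p[0] - q[0], p[1] - q[1]
    w = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))  # spatial weight w_s
    for G, sigma_r in zip(guides, sigma_rs):                 # one range kernel per guidance image
        diff = G[p] - G[q]
        w *= np.exp(-(diff * diff) / (2.0 * sigma_r ** 2))   # kernels are multiplied, not added
    return w
```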
HDF’s weight $w_{\mathrm{HDF}}$ is computed by the additive logic:
$$
w_{\mathrm{HDF}}(\boldsymbol{p}, \boldsymbol{q}) = f\!\left(\lVert \boldsymbol{x}_{\boldsymbol{p}} - \boldsymbol{x}_{\boldsymbol{q}} \rVert\right),
$$
where $f$ denotes an arbitrary weight function at the pixel $\boldsymbol{q}$; $\lVert \cdot \rVert$ denotes an arbitrary norm function; $\boldsymbol{x}_{\boldsymbol{p}} \in \mathbb{R}^{D}$ denotes higher-dimensional information consisting of spatial and range information, e.g., $\boldsymbol{x}_{\boldsymbol{p}} = (x_{\boldsymbol{p}}, y_{\boldsymbol{p}}, r_{\boldsymbol{p}}, g_{\boldsymbol{p}}, b_{\boldsymbol{p}})^{\top}$ in an RGB image, and $D$ is the size of $\boldsymbol{x}_{\boldsymbol{p}}$. The work of Gastal and Oliveira [30] is a successful extension of HDF with multiple guidance information. They exploited additional guidance information to increase the dimensionality of the higher-dimensional vector $\boldsymbol{x}$.
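For contrast, the additive weight above can be transcribed as follows, assuming for illustration an exponential kernel $f$ applied to the Euclidean norm of the per-dimension-scaled difference vector; other choices of $f$ and the norm are possible.

```python
import numpy as np

def hdf_weight(p, q, guides, sigma_s, sigma_rs):
    """Additive (high-dimensional) weight for a pixel pair.

    The spatial and range differences are stacked into one high-dimensional
    vector x_p - x_q (scaled per dimension), and a single kernel f is applied
    to its norm, instead of multiplying separate per-guidance kernels.
    """
    diffs = [(p[0] - q[0]) / sigma_s, (p[1] - q[1]) / sigma_s]  # spatial part of x_p - x_q
    for G, sigma_r in zip(guides, sigma_rs):                    # range part, one entry per guidance image
        diffs.append((G[p] - G[q]) / sigma_r)
    dist = np.linalg.norm(diffs)          # an arbitrary norm (here: L2)
    return np.exp(-dist)                  # an arbitrary kernel f (here: exponential)
```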
The two logics differ in the severity of the restriction used to compute the kernel weight; the multiplicative logic's restriction is more severe than that of the additive logic. This difference affects the edge-preserving performance.
Figure 2 shows examples of HDF and MF weights. HDF assigns low weights broadly, retaining some weight even for guidance pixels that are hardly relevant to the target pixel. In contrast, MF assigns a low weight whenever any guidance value differs from that of the target pixel, even if another guidance value is similar. In this way, MF has a strong edge-preserving effect; hence, it is preferred when sharp edge preservation is important.