Communication

Efficient Re-Parameterization Residual Attention Network for Nonhomogeneous Image Dehazing

1 School of Ocean Information Engineering, Jimei University, Xiamen 361021, China
2 Fujian Provincial Key Laboratory of Oceanic Information Perception and Intelligent Processing, Xiamen 361021, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(6), 3739; https://doi.org/10.3390/app13063739
Submission received: 1 February 2023 / Revised: 26 February 2023 / Accepted: 8 March 2023 / Published: 15 March 2023
(This article belongs to the Special Issue Recent Advances in Image Processing)

Abstract

Real-world nonhomogeneous haze brings challenges to image restoration. More efforts are needed to remove dense haze and thin haze simultaneously and efficiently. However, most existing dehazing methods do not pay attention to the complex distributions of haze and usually suffer from a low runtime speed. To tackle such problems, we present an efficient re-parameterization residual attention network (RRA-Net), whose design has three key aspects. Firstly, we propose a training-time multi-branch residual attention block (MRAB), where multi-scale convolutions in different branches cope with the nonuniformity of haze and are converted into a single-path convolution during inference. It also features local residual learning with improved spatial attention and channel attention, allowing dense and thin haze to be attended to differently. Secondly, our lightweight network structure cascades six MRABs followed by a long skip connection with attention and a fusion tail. Overall, our RRA-Net only has about 0.3M parameters. Thirdly, two new loss functions, namely the Laplace pyramid loss and the color attenuation loss, help train the network to recover details and colors. The experimental results show that the proposed RRA-Net performs favorably against state-of-the-art dehazing methods on real-world image datasets, including both nonhomogeneous haze and dense homogeneous haze. A runtime comparison under the same hardware setup also demonstrates the superior efficiency of the proposed network.

1. Introduction

Clear visibility is crucial to the success of outdoor computer vision tasks [1]. However, the quality of outdoor images often deteriorates due to haze, i.e., the presence of smoke, dust, fumes, mist, and other floating particles in the atmosphere. Such haze reduces the performance of subsequent visual analysis. Moreover, the haze distribution often has a non-homogeneous character in many real scenes, posing additional challenges [2]. Single-image dehazing, a fundamental low-level vision task, aims to recover the latent haze-free image and has attracted increasing attention in the computer vision community over the past few decades [3,4,5,6,7,8].
In the early dehazing methods [9,10,11], image priors are usually used to estimate important parameters of the imaging model for hazy scenes. Then, image dehazing is fulfilled by solving the inverse problem of such an imaging model. However, these image priors are not always valid in complex, real hazy scenes. Additionally, the nonhomogeneous distribution of haze is certainly one factor that adds to the difficulties.
Recently, deep-learning-based methods [12,13,14,15,16,17,18,19,20] have directly learned the latent clean image from a single hazy image in an end-to-end manner and have shown promising performance in haze removal. However, current deep-learning-based dehazing models often have a large number of parameters (e.g., the recent two-branch neural network (TBNN) [19] and the Dehamer [20] network have around 50.40M and 132.45M parameters, respectively) or suffer from a low runtime speed. In addition, a lightweight network that can deal with the complex distributions of nonhomogeneous haze both effectively and efficiently is of great value. To this end, we design an efficient re-parameterization residual attention network featuring a training-time multi-branch residual attention block, an end-to-end lightweight network structure, and two new loss functions to achieve a good balance between dehazing performance and model complexity. We conduct comparative experiments on both non-homogeneous and homogeneous haze scenes to demonstrate the superiority of the proposed RRA-Net. An ablation study is also conducted to validate the main modules of the proposed network.
Our main contributions are summarized as follows:
(1) A training-time multi-branch residual attention block is designed for coping with the non-uniformity of haze.
(2) An end-to-end lightweight network structure (with only about 0.3M parameters) is proposed, which cascades six MRABs followed by a long skip connection with attention and a fusion tail.
(3) Two novel loss functions, namely the Laplace pyramid loss and the color attenuation loss, are employed to train our RRA-Net.

2. Related Works

There are mainly two types of methods for single-image dehazing: prior-based methods and deep-learning-based methods.

2.1. Prior-Based Methods

This type of dehazing method utilizes prior information, based on observations about the characteristics of haze-free or haze-degraded images, to solve the inverse problem of the atmospheric scattering model. He et al. [9] proposed the dark channel prior (DCP) for single-image haze removal. The DCP is based on the observation that the pixels of a haze-free image tend to have very low intensity in at least one color (red, green, or blue) channel. Tan [10] developed a cost function in the framework of Markov random fields, based on two basic observations: haze-free images have more contrast, and the airlight tends to be smooth. Zhu et al. [11] presented the color attenuation prior, which states that the relationship among the scene depth, the brightness, and the saturation is linear. They estimated the depth map of the scene using this prior and a supervised linear model. Berman et al. [21] observed that the pixels belonging to a given color cluster in a haze-free image are often non-local, which was dubbed the non-local prior. Bui et al. [22] constructed color ellipsoids statistically fitted to haze pixel clusters in RGB space and then calculated the transmission values through color ellipsoid geometry. Yuan et al. [23] proposed a confidence prior to accurately estimate scene transmission for image dehazing. However, these hand-crafted priors do not always hold for diverse real-world scenes with different haze statistics. For example, the DCP is unreliable for sky regions or white objects.

2.2. Deep-Learning-Based Methods

With the availability of large-scale datasets and the development of deep neural networks, data-driven deep-learning-based methods have achieved promising results in single-image dehazing.
At first, deep learning was used to estimate variables of the atmospheric scattering model in Equation (1) or certain derived variables [24,25,26,27]. For example, Cai et al. [24] proposed DehazeNet, which takes in a hazy image and outputs its transmission map. Apart from the transmission map, the global atmospheric light also needs to be estimated separately in order to recover a haze-free image via the atmospheric scattering model. Li et al. [25] reformulated the atmospheric scattering model and designed the all-in-one dehazing network (AOD-Net) to estimate a K variable that effectively combines the global atmospheric light and transmission map into one.
Later on, end-to-end convolutional neural networks (CNNs) were devised to directly estimate the haze-free image from a haze-degraded image [12,13,14,15,16,17,18,19]. Liu et al. [12] proposed GridDehazeNet, which implements a novel attention-based multi-scale estimation on a grid network for single-image dehazing. Qin et al. [13] presented the feature fusion attention network (FFA-Net) in which features at different levels are fused by an attention-based feature fusion structure combining channel attention with a pixel attention mechanism. While FFA-Net considers different channel-wise features and an uneven haze distribution, it carries out convolutional operations at the resolution of the original image, resulting in a large amount of calculation. Dong et al. [14] designed the multi-scale boosted dehazing network (MSBDN) with dense feature fusion based on the U-Net architecture, which incorporates the strengthen–operate–subtract boosting strategy in the decoder of the model. Das et al. [15] proposed a fast deep multi-patch hierarchical network (DMPHN) to restore non-homogeneous hazed images by aggregating features from multiple image patches from different spatial sections of the hazed image. Yu et al. [19] introduced a two-branch neural network to separately deal with non-uniformly distributed haze and the limited data challenge and fuse features from these two branches. Bu et al. [16] designed a generative adversarial network with residual guided filters that effectively obtains the contour information of a hazy image. Apart from designing more powerful CNNs, advanced learning techniques are also exploited for single-image dehazing. For example, contrastive learning is used to exploit both the information of hazy images and clear images as negative and positive samples [18]. While achieving promising dehazing results, these networks often have a large number of parameters or perform poorly on nonhomogeneous haze situations in the real world. Unlike the representative end-to-end CNN-based dehazing techniques, the proposed method takes into consideration both a lightweight network design and the ability to cope with the nonuniformity of haze. Specifically, we use multi-scale convolutions and an improved attention mechanism to remove dense haze and thin haze simultaneously.
Recently, visual transformers have also been utilized for image dehazing due to their global modeling capabilities [20,28,29,30,31]. To take advantage of both visual transformers and CNNs, Xu et al. [28] proposed a transformer–convolution fusion dehazing network. Guo et al. [20] brought a haze-density-related prior into the transformer via a novel transmission-aware 3D position-embedding module and modulated the CNN features by learning modulation matrices conditioned on transformer features, instead of simple feature addition or concatenation. However, this transformer relies on large-scale training data for optimal performance. Gao et al. [29] proposed a transformer-based channel attention module combined with a spatial attention module to enhance a CNN-based backbone network. Despite the impressive performance improvements brought by combining transformers and CNNs, these hybrid networks usually have a much higher complexity, with more parameters and a slower runtime speed, hindering their application in real-time scenarios, especially when dehazing high-definition images is necessary.

3. Materials and Methods

Despite the development of single-image dehazing techniques, effectively and efficiently recovering haze-free images under nonhomogeneous haze remains a technical challenge. Thus, we propose the following methodology.

3.1. Case Study and Definitions

Nonhomogeneous haze is common in real-world hazy weather conditions. One example of a hazy image taken in such a scene is shown in Figure 1a, where the haze density is nonuniform across the image. It can be seen that the image suffers from low contrast and detail loss. Additionally, the degree of image degradation is closely related to the haze density.
It is well-established that such image degradation caused by haze can be formulated by the atmospheric scattering model [9]:
I(x) = J(x) t(x) + A (1 − t(x)),   (1)
where I(x) is the observed hazy image, J(x) is the latent haze-free image, t(x) is the transmission map, and A is the global atmospheric light. In previous studies, the transmission map t is sometimes assumed to decay exponentially with the scene depth d, i.e., t(x) = e^{−β d(x)}, where β is the scattering coefficient of the atmosphere. However, this relation is invalid in nonhomogeneous hazy scenes, since the assumed transmission map is only distance-dependent and hence inconsistent with the randomness of nonhomogeneous haze.
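As a concrete illustration of this degradation model (not part of the proposed network), the following sketch synthesizes a hazy image from a clean one; the transmission map and atmospheric light passed in are hypothetical inputs:

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Apply the atmospheric scattering model I = J * t + A * (1 - t).

    J: clean image, float array of shape (H, W, 3) with values in [0, 1]
    t: transmission map, float array of shape (H, W) with values in (0, 1]
    A: global atmospheric light, scalar or length-3 array in [0, 1]
    """
    t = t[..., None]  # broadcast the transmission over the color channels
    return J * t + np.asarray(A) * (1.0 - t)

# Example with a hypothetical constant transmission and white atmospheric light.
J = np.random.rand(480, 640, 3)
I = synthesize_haze(J, t=np.full((480, 640), 0.6), A=1.0)
```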
Given an observed hazy image I(x), the task of single-image dehazing aims to recover the latent haze-free image J(x). To this end, we take a deep learning approach and propose an end-to-end RRA-Net model to tackle the challenge of nonhomogeneous degradations. In the meantime, high dehazing efficiency is taken into consideration to ensure its practicality in real-time applications.

3.2. Proposed Model

Figure 2 shows the architecture of our RRA-Net, which has a lightweight, shallow structure. Specifically, we design a novel training-time MRAB as the basic block. Our RRA-Net cascades 6 MRABs, followed by a long skip connection with attention and a fusion tail. In this subsection, the MRAB is introduced first, followed by the details of the RRA-Net’s lightweight structure and the loss function used during training.

3.2.1. Training-Time Multi-Branch Residual Attention Block

Inspired by RepVGG [32], we design a training-time multi-branch structure with an improved batch normalization (BN) strategy. It consists of one 3 × 3 convolution, one 1 × 1 convolution, and one identity connection, as shown in Figure 3. This parallel multi-branch structure allows the convolutions of different receptive field sizes to extract features at different scales and to cope with the nonuniformity of haze. During inference, the multi-branch convolutions and the following BN are converted into a single-path 3 × 3 convolution through the technique of re-parameterization, largely reducing the inference time.
Furthermore, unlike RepVGG, we do not use a BN layer in each branch; rather, we apply a single BN layer after summing the outputs of the three branches to stabilize training. Since BN alters the internal feature statistics of each sample, we believe that placing individual BN layers in different branches would degrade the connection between the features of different branches more seriously, incurring a performance loss.
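A minimal PyTorch sketch of this training-time layout is given below; the channel count and LeakyReLU slope are illustrative assumptions rather than the authors' exact configuration:

```python
import torch.nn as nn

class MultiBranchConv(nn.Module):
    """Training-time branches (3x3 conv, 1x1 conv, identity) summed before a single BN."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)      # one BN after the branch summation
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        y = self.conv3(x) + self.conv1(x) + x   # identity branch is the raw input
        return self.act(self.bn(y))
```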
The MRAB also utilizes local residual learning with attention. The local residual connection allows features from haze-free areas to pass directly without processing so that the multi-branch convolutions can deal with the hazy areas in the image. Additionally, our attention module, consisting of an improved spatial attention (SA) layer and a channel attention (CA) layer, allows dense and thin hazy areas in the image to be attended to differently.
In Section 3.3, detailed ablation studies are conducted to analyze the MRAB and verify the effectiveness of its structure and the BN strategy. Below, we provide the details of this basic block.

Inference-Time Re-Parameterization of MRAB

During inference, the multi-branch convolutions and the following BN are converted into a single-path 3 × 3 convolution through the technique of re-parameterization. Let X denote the input of the MRAB, and Y denote the output of BN. The process of re-parameterization can be described as:
Y = BN(W_3 ∗ X + W_1 ∗ X + X),   (2)
where W_3 denotes the kernel of the 3 × 3 convolution branch, and W_1 denotes the kernel of the 1 × 1 convolution branch. Since the identity branch can be expressed as a particular 1 × 1 convolution, and a 1 × 1 convolution can be considered a 3 × 3 convolution with zero padding, the re-parameterization during inference is:
Y = BN(W_3 ∗ X + W_1 ∗ X + X) = BN(W_3 ∗ X + W_{1→3} ∗ X + W_{identity→3} ∗ X) = BN((W_3 + W_{1→3} + W_{identity→3}) ∗ X),   (3)
where W_{1→3} and W_{identity→3} denote the equivalent 3 × 3 kernels of the 1 × 1 convolution branch and the identity branch after re-parameterization.
Furthermore, the BN in the above equation can also be integrated into the convolution during inference. Let {W, b} represent the 3 × 3 convolution kernel and bias obtained by the three-way branch fusion in Equation (3), and let μ, σ, γ, and β represent the accumulated mean, standard deviation, learned scaling factor, and bias of the BN layer, respectively. Then, the convolution kernel W′ and bias b′ obtained after integrating BN are:
W′ = (γ/σ) W,   b′ = γ (b − μ)/σ + β.   (4)
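The fusion can be written compactly in PyTorch; the sketch below assumes the bias-free training-time branches of the previous sketch and uses the accumulated BN statistics:

```python
import torch
import torch.nn.functional as F

def reparameterize(conv3, conv1, bn):
    """Fuse the 3x3, 1x1, and identity branches plus the following BN into one 3x3 conv.

    conv3, conv1: bias-free nn.Conv2d branches with equal input/output channels
    bn: the nn.BatchNorm2d applied after summing the branches
    Returns the (weight, bias) of an equivalent single 3x3 convolution.
    """
    channels = conv3.weight.size(0)
    # Pad the 1x1 kernel to a 3x3 kernel (zeros around the center tap).
    w1_as_3x3 = F.pad(conv1.weight, [1, 1, 1, 1])
    # The identity branch is a 3x3 kernel with a 1 at the center of its own channel.
    w_id = torch.zeros_like(conv3.weight)
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    w = conv3.weight + w1_as_3x3 + w_id           # branch fusion, Equation (3)

    # Fold BN with zero branch bias: W' = (gamma / sigma) * W, b' = beta - mu * gamma / sigma (Equation (4)).
    sigma = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / sigma
    return w * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale
```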

Spatial Attention

Since the haze distribution often has a non-homogeneous character in many real scenes, a spatial attention mechanism is advantageous for dealing with such a situation. We design an improved spatial attention module, in which a 3D MaxPool, a 1 × 1 convolution, a ReLU, and another 1 × 1 convolution are applied sequentially, and a Sigmoid operation then outputs the attention weights:
Y_{sa} = σ(Conv(ReLU(Conv(MaxPool(X))))),   (5)
where X is the input feature, σ is the Sigmoid function, and Y_{sa} denotes the spatial attention weights. The output Y is the element-wise product of the input X and the spatial attention weights Y_{sa}:
Y = Y_{sa} ⊙ X.   (6)
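A possible PyTorch realization of this attention path is sketched below; the number of channels kept after the 3D max-pool is an assumption, since the text does not specify it:

```python
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Improved spatial attention: 3D max-pool over channels, two 1x1 convs, Sigmoid."""
    def __init__(self, channels=64, pooled_channels=8):
        super().__init__()
        self.pool = nn.MaxPool3d((channels // pooled_channels, 1, 1))
        self.body = nn.Sequential(
            nn.Conv2d(pooled_channels, pooled_channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(pooled_channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Pool along the channel axis: (N, C, H, W) -> (N, pooled_channels, H, W).
        pooled = self.pool(x.unsqueeze(1)).squeeze(1)
        weights = self.body(pooled)     # spatial attention map of shape (N, 1, H, W)
        return x * weights              # element-wise re-weighting, Equation (6)
```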

Channel Attention

Different channels of a feature map often have different degrees of importance; thus, the channel attention mechanism is necessary for low-level vision tasks, such as dehazing. We follow the channel attention design in [13].
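For completeness, a sketch of FFA-Net-style channel attention is given below; the reduction ratio is an assumption:

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """FFA-Net-style channel attention: global average pooling, two 1x1 convs, Sigmoid."""
    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # per-channel global statistics
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)     # re-weight each channel of the feature map
```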

3.2.2. Details of RRA-Net’s Lightweight Structure

In RRA-Net, as shown in Figure 2, a 3 × 3 convolution is first applied to the hazy image, outputting a feature map of shape 64 × H × W, where H and W are the height and width of the input image. Then, a 3 × 3 convolution with stride 2 is used to downsample the feature map, which doubles the receptive fields of the subsequent convolutions. After that, 6 MRABs are applied sequentially to gradually extract features, followed by a pixel shuffle layer, which upscales the feature map back to the original spatial size H × W and reduces the number of channels to 16. To avoid the loss of shallow features, a long skip connection with attention introduces shallow features before the final fusion tail. Specifically, the feature map after the pixel shuffle passes through a 5 × 5 convolution, a spatial attention layer, and a channel attention layer. In the meantime, the shallow features from the first 3 × 3 convolution are shrunk along the channel dimension and added through the long skip connection. The final fusion tail consists of reflection padding, a 7 × 7 convolution, and a residual connection. The reflection padding alleviates the image boundary distortion caused by the subsequent large-kernel convolution. The 7 × 7 convolution outputs the residual Y_{res} between the recovered clean image RRA(I_{hazy}) and the input hazy image I_{hazy}. Thus, the output of the proposed RRA-Net is
RRA(I_{hazy}) = Y_{res} + I_{hazy}.   (7)
Such a multi-stage fusion strategy has the advantages of combining shallow and deep features and avoiding the instability caused by the upsampling process. The structure of RRA-Net achieves a good balance between the dehazing performance and model complexity. Overall, it only has about 0.3M parameters.
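Putting the pieces together, the data flow described above can be sketched as follows, reusing the attention and multi-branch sketches given earlier; the layer arguments are chosen to match the stated shapes and are otherwise assumptions (even input sizes are also assumed so the residual addition matches):

```python
import torch.nn as nn

class RRANetSketch(nn.Module):
    """Condensed sketch of the RRA-Net data flow (not the authors' exact implementation)."""
    def __init__(self, base=64):
        super().__init__()
        self.head = nn.Conv2d(3, base, 3, padding=1)                # shallow features, 64 x H x W
        self.down = nn.Conv2d(base, base, 3, stride=2, padding=1)   # halve the spatial size
        self.body = nn.Sequential(*[MultiBranchConv(base) for _ in range(6)])  # 6 MRAB-like blocks
        self.up = nn.PixelShuffle(2)                                # back to H x W with 16 channels
        self.refine = nn.Sequential(nn.Conv2d(base // 4, base // 4, 5, padding=2),
                                    SpatialAttention(base // 4, 4),
                                    ChannelAttention(base // 4, 4))
        self.skip = nn.Conv2d(base, base // 4, 1)                   # shrink the shallow features
        self.tail = nn.Sequential(nn.ReflectionPad2d(3), nn.Conv2d(base // 4, 3, 7))

    def forward(self, hazy):
        shallow = self.head(hazy)
        feat = self.up(self.body(self.down(shallow)))
        feat = self.refine(feat) + self.skip(shallow)               # long skip connection
        return self.tail(feat) + hazy                               # residual output, Equation (7)
```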

3.2.3. Loss Function

To recover high-frequency details and colors, we design two new loss functions, namely the Laplace pyramid loss and the color attenuation loss. During training, the total loss function is
L_Θ = L_1 + α_1 L_{Laplace} + α_2 L_{CA},   (8)
where Θ denotes the parameters of RRA-Net, L_1 is the L1-norm loss, L_{Laplace} is the Laplace pyramid loss, L_{CA} is the color attenuation loss, and α_1, α_2 are the balancing coefficients. It is worth noting that we do not adopt the commonly used perceptual loss or GAN loss.

Laplace Pyramid Loss

The Laplace pyramid is commonly used to extract high-frequency features from an image. To calculate the Laplace pyramid loss, the recovered clean image RRA(I_{hazy}) and its ground truth I_{gt} undergo the same process to build two 3-layer Laplace pyramids. Let G_1 denote RRA(I_{hazy}) or I_{gt}. G_1 is downsampled 3 times successively by Gaussian filtering to generate G_k, k = 2, 3, 4. The Laplace pyramid is constructed as LP_k = G_k − u(G_{k+1}), k = 1, 2, 3, where u is the bilinear interpolation upsampling operation. Finally, the proposed Laplace pyramid loss is calculated as
L_{Laplace} = (1/N) Σ_{k=1}^{3} Σ_{i=1}^{N} ‖ LP_k(RRA(I_{hazy}^i)) − LP_k(I_{gt}^i) ‖_2^2.   (9)
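A possible implementation of this loss is sketched below; the particular Gaussian kernel and the use of a mean (rather than a sum) over pyramid elements are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def gaussian_downsample(x):
    """Blur with a fixed 3x3 Gaussian kernel and downsample by a factor of 2."""
    k = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]],
                     device=x.device, dtype=x.dtype) / 16.0
    k = k.view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    x = F.conv2d(x, k, padding=1, groups=x.size(1))   # depthwise Gaussian filtering
    return F.avg_pool2d(x, 2)

def laplace_pyramid(x, levels=3):
    """Return the Laplacian bands LP_k = G_k - u(G_{k+1}) for k = 1..levels."""
    bands = []
    for _ in range(levels):
        down = gaussian_downsample(x)
        up = F.interpolate(down, size=x.shape[-2:], mode='bilinear', align_corners=False)
        bands.append(x - up)
        x = down
    return bands

def laplace_loss(pred, target):
    return sum(F.mse_loss(p, t)
               for p, t in zip(laplace_pyramid(pred), laplace_pyramid(target)))
```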

Color Attenuation Loss

The saturation and brightness of an image are influenced by nonhomogeneous haze. The saturation of a hazy area is usually decreased because colors fade under haze, while the brightness of a hazy area is usually increased. A color attenuation prior was presented in [11], in which the haze concentration P(x) at position x can be calculated by the following formula:
P(x) = S(I(x)) / V(I(x)) − 1,   (10)
where S(I(x)) is the saturation at position x of image I, and V(I(x)) is the brightness at position x of image I. Inspired by this prior, we designed a color attenuation loss L_{CA} to regulate RRA-Net to recover the saturation and brightness. L_{CA} is calculated as:
L_{CA} = α (1/N) Σ_{i=1}^{N} ‖ S(RRA(I_{hazy}^i)) − S(I_{gt}^i) ‖_2^2 + β (1/N) Σ_{i=1}^{N} ‖ V(RRA(I_{hazy}^i)) − V(I_{gt}^i) ‖_2^2,   (11)
where α and β are the balancing coefficients. Empirically, the values of α and β are set to 1 and 0.5, respectively.
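The saturation and value (brightness) terms can be computed directly from RGB tensors, as in the sketch below (the small epsilon is an assumption added for numerical stability):

```python
import torch.nn.functional as F

def saturation_value(img, eps=1e-6):
    """Per-pixel HSV saturation and value for an RGB tensor of shape (N, 3, H, W) in [0, 1]."""
    v, _ = img.max(dim=1)           # value (brightness) = max over R, G, B
    mn, _ = img.min(dim=1)
    s = (v - mn) / (v + eps)        # saturation
    return s, v

def color_attenuation_loss(pred, target, alpha=1.0, beta=0.5):
    s_p, v_p = saturation_value(pred)
    s_t, v_t = saturation_value(target)
    return alpha * F.mse_loss(s_p, s_t) + beta * F.mse_loss(v_p, v_t)
```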

3.3. Experimental Validation

To evaluate the performance of RRA-Net on the single-image dehazing task, we compare it with other state-of-the-art methods both quantitatively and qualitatively. To this end, we construct the training dataset to train the models and the testing dataset for model evaluation. Two evaluation metrics are calculated to compare different models. We also conduct a series of ablation studies to verify the effectiveness of the main components of the proposed RRA-Net.
In this subsection, the datasets and evaluation metrics used in the experiments are introduced first, followed by some implementation details. Finally, the comparative experiments and ablation studies are described.

3.3.1. Datasets

Since RRA-Net is designed to dehaze realistic images, we train and test it on real-world hazy image datasets. We combine the images from the I-Haze [33] dataset, the O-Haze [34] dataset, and the NH-Haze (2020) [2] dataset to form a training set. The I-Haze dataset contains 35 pairs of hazy and corresponding haze-free (ground-truth) indoor images. The O-Haze dataset consists of 45 pairs of hazy and corresponding haze-free outdoor images. In practice, the hazy images were captured in the presence of real haze generated by professional haze machines. The NH-Haze (2020) dataset is the first non-homogeneous image-dehazing dataset and contains 55 outdoor image pairs, of which 50 pairs are usually used for training and 5 pairs for testing. The non-homogeneous haze was introduced into each scene using a professional haze generator that imitates real hazy conditions. In our experiments, the constructed training set consists of 130 image pairs in total: 35 pairs from the I-Haze dataset, 45 pairs from the O-Haze dataset, and 50 pairs from the NH-Haze (2020) dataset.
As for the test dataset, we use the NH-Haze (2021) [35] dataset and the Dense-Haze [36] dataset. The NH-Haze (2021) dataset is a non-homogeneous realistic dataset with pairs of real hazy and corresponding haze-free images. Additionally, the Dense-Haze dataset is characterized by dense and homogeneous hazy scenes.

3.3.2. Evaluation Metrics

To quantitatively evaluate the performance of our RRA-Net, we adopt the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [37] as evaluation metrics.
The PSNR is the ratio between the maximum possible power of a signal and the power of the distortion that affects the quality of its representation. It is defined via the mean squared error (MSE). Given an H × W dehazed image I_{dehazed} and its corresponding haze-free image I_{gt}, the MSE is defined as
MSE = (1/(HW)) Σ_x ‖ I_{dehazed}(x) − I_{gt}(x) ‖_2^2,   (12)
where x indexes the pixel positions of the image. Then, the PSNR (in dB) is defined as
PSNR = 10 · log_{10}(255^2 / MSE).   (13)
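For 8-bit images, the PSNR above can be computed as follows (a per-element MSE is assumed, which is the common convention for color images):

```python
import numpy as np

def psnr(dehazed, gt):
    """PSNR in dB for 8-bit images with identical shapes and values in [0, 255]."""
    mse = np.mean((dehazed.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```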
On the other hand, the SSIM is a quality assessment metric based on the degradation of structural information. It extracts three key features from an image, namely the luminance, contrast, and structure. The comparison between I d e h a z e d and I g t is performed using three comparison functions on the basis of these three features. Then, the SSIM is defined as
SSIM = [l(I_{dehazed}, I_{gt})]^α · [c(I_{dehazed}, I_{gt})]^β · [s(I_{dehazed}, I_{gt})]^γ,   (14)
where l ( I d e h a z e d , I g t ) , c ( I d e h a z e d , I g t ) , and s ( I d e h a z e d , I g t ) are the luminance, contrast, and structure comparison functions, respectively, as defined in [37], and α > 0 , β > 0 , and γ > 0 are the parameters used to adjust the relative importance of the three components. We set α = β = γ = 1 in the experiments since this setting is commonly used.
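In practice, the SSIM can be computed with scikit-image (a version of 0.19 or later is assumed for the channel_axis argument; the random arrays are placeholders for actual images):

```python
import numpy as np
from skimage.metrics import structural_similarity

dehazed = np.random.randint(0, 256, (1200, 1600, 3), dtype=np.uint8)  # placeholder dehazed image
gt = np.random.randint(0, 256, (1200, 1600, 3), dtype=np.uint8)       # placeholder ground truth
score = structural_similarity(dehazed, gt, channel_axis=-1, data_range=255)
```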

3.3.3. Implementation Details

During training, since there are only 130 pairs of hazy images and ground truths, the training dataset is augmented with random rotations and horizontal flips. Then, we divide each hazy image and its ground truth into image patches of size 128 × 128 and feed them to RRA-Net. The network is trained for 5 × 10^5 steps on the training dataset. We use the Adam optimizer, whose parameters β_1 and β_2 are set to 0.9 and 0.999, respectively. The initial learning rate is set to 6 × 10^{−4}, and we adopt the cyclical learning rate strategy [38] to adjust the learning rate from the initial value to 1.2 × 10^{−3} with a step size of 10. The RRA-Net model is implemented in PyTorch [39] and run on 2 RTX 2080Ti GPUs.
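A sketch of this optimization setup in PyTorch is shown below; the placeholder model and the interpretation of the step size as iterations per half-cycle are assumptions:

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder standing in for RRA-Net
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, betas=(0.9, 0.999))
# Cyclical learning rate oscillating between the initial value and 1.2e-3.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=6e-4, max_lr=1.2e-3,
    step_size_up=10, cycle_momentum=False)    # cycle_momentum must be off for Adam

for step in range(500000):
    # forward pass, loss computation, and loss.backward() would go here
    optimizer.step()
    scheduler.step()
```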
During testing, we pass each whole image from the testing dataset into RRA-Net and other models directly and average the PSNR and SSIM metrics over the testing dataset.

3.3.4. Comparative Experiments and Ablation Studies

Comparison with State-of-the-Art Methods

We compare our RRA-Net with 7 single-image dehazing methods, including the traditional DCP [9] and SOTA deep-learning-based methods. Among the latter type of methods, AOD-Net [25], FFA-Net [13], MSBDN [14], DMPHN [15], and TBNN [19] are based on CNNs, and the Dehamer [20] is based on a transformer. The evaluations are conducted on the NH-Haze (2021) and Dense-Haze datasets.
For each compared model, the PSNR and SSIM metrics are calculated between the dehazed results and the haze-free images of the testing set. A higher PSNR score or a higher SSIM score indicates a better dehazing performance.
To validate the efficiency of the proposed RRA-Net, we also conduct a comparative runtime test. Specifically, images of size 1600 × 1200 are passed into each compared model, and the average processing FPS of each model is calculated. A higher FPS score indicates a better dehazing speed and, hence, a higher efficiency.
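The runtime measurement can be reproduced with a simple timing loop of the following form (the warm-up and the number of timed runs are assumptions):

```python
import time
import torch

def measure_fps(model, runs=50, size=(1, 3, 1200, 1600), device='cuda'):
    """Average frames per second for whole-image inference at 1600 x 1200."""
    model = model.to(device).eval()
    x = torch.rand(size, device=device)
    with torch.no_grad():
        for _ in range(5):              # warm-up so one-time initialization is excluded
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        torch.cuda.synchronize()
    return runs / (time.time() - start)
```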

Ablation Studies

To validate our design choices for RRA-Net, we conduct a series of ablation studies on modules of the MRAB structure, two newly designed loss functions, and the batch normalization strategy. In each ablation study, the PSNR and SSIM metrics are calculated to show how these design choices affect the dehazing performance of RRA-Net.

4. Results

In this section, the results of the comparative experiments are presented and discussed first to evaluate the dehazing performance of the proposed RRA-Net. Then, the results of the ablation studies are given to validate several design choices for the proposed model.

4.1. Results of Comparative Experiments

As shown in Table 1, our RRA-Net outperforms all of the listed SOTA methods on both datasets in terms of the PSNR and SSIM metrics. For the NH-Haze (2021) dataset with nonhomogeneous haze scenes, RRA-Net has a 0.5 dB lead in PSNR over the second-best method. Additionally, note that RRA-Net works well on both nonhomogeneous hazy images and homogeneous dense hazy images. For the Dense-Haze dataset, RRA-Net achieves a PSNR of 15.78 dB, exceeding the second-best method by 0.37 dB.
Visual comparisons of the recovered results on nonhomogeneous hazy scenes are given in Figure 4 and Figure 5. It can be seen that RRA-Net produces results with less residual haze and preserves more image details and contrast. For example, in Figure 4, the results of the FFA-Net and TBNN methods still contain some residual haze; the result of MSBDN loses color saturation to some extent and has a whitish appearance, while the result of DMPHN introduces visible artifacts in the sky area. In addition, the Dehamer method cannot effectively remove the nonhomogeneous haze from either sample given in Figure 4 and Figure 5.
A visual comparison of the homogeneous dense hazy scene is given in Figure 6. It can be seen that in such a case, RRA-Net is also capable of producing a result with less residual haze.
Apart from producing better dehazing results, RRA-Net is designed to be lightweight from the beginning. Under the same configuration, it has only about 0.3M parameters and runs at 166 FPS when dehazing a 1600 × 1200 image on an RTX 2080Ti GPU. As can be seen from Table 1, RRA-Net runs faster than the other methods, except for AOD-Net, which falls behind in terms of the PSNR and SSIM metrics. The speed comparison is conducted under the same hardware setup, demonstrating the superior efficiency of the proposed network.

4.2. Results of Ablation Studies

The ablation studies are conducted to validate several key choices in the design of the proposed RRA-Net, whose results are described as follows.

4.2.1. Effect of MRAB Structure

We first verify the effectiveness of different modules of the basic MRAB via ablation experiments on the NH-Haze (2021) dataset.
The results are listed in the top rows of Table 2. In the S1 setting, the MRAB consists of only the multi-branch convolutions and the LeakyReLU layer. On top of S1, we apply the other components to form different settings: S2 (adding the BN layer and attention module) and RRA-Net (adding the BN layer, attention module, and local residual connection). It can be seen that the inclusion of the BN layer and attention mechanism greatly improves the dehazing performance of the network, and the local residual connection is also beneficial.

4.2.2. Effect of Two Novel Loss Functions

Secondly, the effectiveness of the two newly proposed loss functions is also verified by the ablation study, as listed in the bottom rows of Table 2. In the S3 setting, only the L 1 loss function is used during training. On top of that, the L CA loss function is added to the S4 setting, which regulates the network to recover the saturation and brightness. It can be seen that S4 enjoys a moderate PSNR gain, while suffering a slight loss in terms of the SSIM. Since L CA is calculated in a pixel-wise way, it does not focus on the recovery of the structural information, which is evaluated by the SSIM. Finally, in the RRA-Net setting, both the L CA and L Laplace loss functions are used during training. With the help of L Laplace , which is designed to protect high-frequency features, both the PSNR and SSIM metrics are lifted substantially from 19.21 dB and 0.746 to 19.813 dB and 0.765, respectively, as shown in the last row of Table 2. Through this ablation study, we find that using a combination of L CA and L Laplace can boost the dehazing performance effectively since these two loss functions are designed from different perspectives.

4.2.3. Effect of the Batch Normalization Strategy

During training, we use a single BN layer after summing up the outputs of the three convolution branches, rather than applying individual BN layers to each branch as in RepVGG. Table 3 shows the performance of these two BN strategies. It can be seen that the BN strategy we propose results in much higher PSNR and SSIM scores, demonstrating that applying a single BN after the multi-branch convolutions stabilizes training more effectively and improves upon the original strategy in RepVGG.

5. Conclusions and Further Studies

This paper proposes an efficient RRA-Net to perform single-image dehazing. The contributions of the work are a training-time multi-branch residual attention block, an end-to-end lightweight network structure, and two new loss functions. The multi-branch convolutions in the MRAB are able to deal with the non-uniformity of haze. Additionally, owing to its lightweight design, RRA-Net only has about 0.3M parameters, achieving a good balance between dehazing performance and model complexity. The experimental results demonstrate that RRA-Net outperforms other SOTA methods on both realistic nonhomogeneous and homogeneous hazy image datasets.
With the help of its lightweight network structure, RRA-Net is able to dehaze 1600 × 1200 HD images at a speed of 166 fps on a computer with 2 RTX 2080Ti GPUs, demonstrating its application potential in real-time dehazing. RRA-Net can not only produce high-quality clean images for better human perception but also contribute to the success of subsequent real-world vision tasks in hazy weather, such as object detection and autonomous driving.
Despite the superiority of RRA-Net, it is CNN-based and lacks the ability for global modeling. As mentioned earlier, there have recently been efforts to take advantage of both visual transformers and CNNs. However, such hybrid networks usually have a much higher complexity. Thus, it is worth investigating how to bestow RRA-Net with the merits of transformers while keeping the network as efficient as possible.

Author Contributions

Conceptualization, E.C. and T.Y.; methodology, E.C. and T.Y.; software, T.Y. and J.J.; validation, J.J. and L.T.; formal analysis, Q.Y. and L.T.; investigation, E.C. and T.Y.; resources, Q.Y.; data curation, T.Y.; writing—original draft preparation, T.Y., J.J. and Q.Y.; project administration, Q.Y.; funding acquisition, E.C. and Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Fujian Province of China under grant no. 2021J01867, the Scientific Research Foundation of Jimei University under grant no. ZQ2018012, and the Xiamen Municipal Bureau of Ocean Development under grant no. 22CZB013HJ04.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://data.vision.ee.ethz.ch/cvl/ntire18/o-haze/ (accessed on 15 January 2021), https://data.vision.ee.ethz.ch/cvl/ntire18//i-haze/ (accessed on 15 January 2021), https://data.vision.ee.ethz.ch/cvl/ntire20/nh-haze/ (accessed on 1 April 2021), and https://data.vision.ee.ethz.ch/cvl/ntire19//dense-haze/ (accessed on 15 January 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AOD-Net: All-in-One Dehazing Network
BN: Batch Normalization
CA: Color Attenuation
CNN: Convolutional Neural Network
DCP: Dark Channel Prior
DMPHN: Deep Multi-Patch Hierarchical Network
FFA-Net: Feature Fusion Attention Network
GAN: Generative Adversarial Network
MRAB: Multi-branch Residual Attention Block
MSBDN: Multi-Scale Boosted Dehazing Network
PSNR: Peak Signal-to-Noise Ratio
RRA-Net: Re-parameterization Residual Attention Network
SSIM: Structural Similarity Index
TBNN: Two-Branch Neural Network
TCAM: Transformer-based Channel Attention Module

References

  1. Agrawal, S.C.; Jalal, A.S. A Comprehensive Review on Analysis and Implementation of Recent Image Dehazing Methods. Arch. Comput. Methods Eng. 2022, 29, 4799–4850.
  2. Ancuti, C.O.; Ancuti, C.; Timofte, R. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 444–445.
  3. Gui, J.; Cong, X.; Cao, Y.; Ren, W.; Zhang, J.; Zhang, J.; Tao, D. A Comprehensive Survey on Image Dehazing Based on Deep Learning. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Virtual, 19–26 August 2021; pp. 4426–4433.
  4. Singh, D.; Kumar, V. A comprehensive review of computational dehazing techniques. Arch. Comput. Methods Eng. 2019, 26, 1395–1413.
  5. Li, Y.; You, S.; Brown, M.S.; Tan, R.T. Haze visibility enhancement: A survey and quantitative benchmarking. Comput. Vis. Image Underst. 2017, 165, 3.
  6. Harish Babu, G.; Venkatram, N. A survey on analysis and implementation of state-of-the-art haze removal techniques. J. Vis. Commun. Image Represent. 2020, 72, 102912.
  7. Wang, W.; Yuan, X. Recent advances in image dehazing. IEEE/CAA J. Autom. Sin. 2017, 4, 410–436.
  8. Ye, T.; Zhang, Y.; Jiang, M.; Chen, L.; Liu, Y.; Chen, S.; Chen, E. Perceiving and Modeling Density for Image Dehazing. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2022; Springer: Cham, Switzerland, 2022; pp. 130–145.
  9. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
  10. Tan, R.T. Visibility in bad weather from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
  11. Zhu, Q.; Mai, J.; Shao, L. Single Image Dehazing Using Color Attenuation Prior. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Nottingham, UK, 2014.
  12. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. GridDehazeNet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323.
  13. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915.
  14. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2157–2167.
  15. Das, S.D.; Dutta, S. Fast deep multi-patch hierarchical network for nonhomogeneous image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 482–483.
  16. Bu, Q.; Luo, J.; Ma, K.; Feng, H.; Feng, J. An enhanced pix2pix dehazing network with guided filter layer. Appl. Sci. 2020, 10, 5898.
  17. Chen, Z.; Wang, Y.; Yang, Y.; Liu, D. PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 7180–7189.
  18. Wu, H.; Qu, Y.; Lin, S.; Zhou, J.; Qiao, R.; Zhang, Z.; Xie, Y.; Ma, L. Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10551–10560.
  19. Yu, Y.; Liu, H.; Fu, M.; Chen, J.; Wang, X.; Wang, K. A Two-Branch Neural Network for Non-Homogeneous Dehazing via Ensemble Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, USA, 20–25 June 2021; pp. 193–202.
  20. Guo, C.L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image Dehazing Transformer with Transmission-Aware 3D Position Embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 5812–5820.
  21. Berman, D.; Avidan, S.; Treibitz, T. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1674–1682.
  22. Bui, T.M.; Kim, W. Single Image Dehazing Using Color Ellipsoid Prior. IEEE Trans. Image Process. 2018, 27, 999–1009.
  23. Yuan, F.; Zhou, Y.; Xia, X.; Qian, X.; Huang, J. A confidence prior for image dehazing. Pattern Recognit. 2021, 119, 108076.
  24. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
  25. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778.
  26. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 154–169.
  27. Li, Z.; Zheng, C.; Shu, H.; Wu, S. Dual-Scale Single Image Dehazing via Neural Augmentation. IEEE Trans. Image Process. 2022, 31, 6213–6223.
  28. Xu, J.; Chen, Z.X.; Luo, H.; Lu, Z.M. An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network. Sensors 2023, 23, 43.
  29. Gao, G.; Cao, J.; Bao, C.; Hao, Q.; Ma, A.; Li, G. A Novel Transformer-Based Attention Network for Image Dehazing. Sensors 2022, 22, 3428.
  30. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844.
  31. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general U-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 17683–17693.
  32. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742.
  33. Ancuti, C.O.; Ancuti, C.; Timofte, R.; Vleeschouwer, C.D. I-HAZE: A dehazing benchmark with real hazy and haze-free indoor images. arXiv 2018, arXiv:1804.05091v1.
  34. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 754–762.
  35. Ancuti, C.O.; Ancuti, C.; Vasluianu, F.A.; Timofte, R. NTIRE 2021 nonhomogeneous dehazing challenge report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA, 19–25 June 2021; pp. 627–646.
  36. Ancuti, C.O.; Ancuti, C.; Sbert, M.; Timofte, R. Dense haze: A benchmark for image dehazing with dense-haze and haze-free images. In Proceedings of the IEEE International Conference on Image Processing (IEEE ICIP 2019), Taipei, Taiwan, 22–25 September 2019.
  37. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  38. Smith, L.N. Cyclical learning rates for training neural networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; IEEE: New York, NY, USA, 2017; pp. 464–472.
  39. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch 2017. Available online: https://openreview.net/pdf?id=BJJsrmfCZ (accessed on 15 January 2020).
Figure 1. Visual result of the proposed method on NH-Haze (2021) dataset. Our method is able to produce haze-free images with high perceptual quality.
Figure 2. The efficient RRA-Net architecture. It cascades 6 MRABs, followed by a long skip connection with attention and a fusion tail.
Figure 3. Structures of MRAB during training (left) and during inference (right).
Figure 4. Qualitative comparison of dehazing results on one nonhomogeneous hazy image from NH-Haze (2021) dataset.
Figure 5. Qualitative comparison of dehazing results on another nonhomogeneous hazy image from NH-Haze (2021) dataset.
Figure 6. Qualitative comparison of dehazing results on an image with dense haze from the Dense-Haze dataset.
Table 1. Quantitative comparisons with SOTA methods on the realistic datasets. Best results are underlined. The second-best results are in Bold.
Method | NH-Haze (2021) PSNR↑ | NH-Haze (2021) SSIM↑ | Dense-Haze PSNR↑ | Dense-Haze SSIM↑ | Param | Runtime (fps) @1600 × 1200
DCP [9] (TPAMI’10) | 10.57 | 0.52 | 10.06 | 0.3856 | - | -
AOD-Net [25] (TIP’17) | 14.104 | 0.552 | 13.34 | 0.4244 | 0.002M | 2598.3
FFA-Net [13] (AAAI’20) | 19.1 | 0.748 | 14.31 | 0.4797 | 4.68M | 1.28
MSBDN [14] (CVPR’20) | 19.31 | 0.759 | 15.41 | 0.4858 | 31.35M | 36.84
DMPHN [15] (CVPRW’20) | 18.184 | 0.745 | 14.01 | 0.4436 | 5.424M | 135.46
TBNN [19] (CVPR’21) | 15.622 ¹ | 0.707 | 14.60 | 0.4829 | 50.40M | - ¹
Dehamer [20] (CVPR’22) | 13.940 ¹ | 0.646 | 14.63 | 0.4996 | 132.4M | - ¹
RRA-Net (Ours) | 19.813 | 0.765 | 15.78 | 0.5154 | 0.3M | 166.11
¹ For fair comparison, we conducted the tests for all methods on the same computer with 2 RTX 2080Ti GPUs by passing whole images into the models. For the TBNN and Dehamer models, a GPU “out of memory” issue occurred during these tests because of their much higher model complexities. Thus, we used a patching strategy to quantitatively evaluate these two models. Note that this strategy may incur slight losses of performance. Accordingly, the runtimes of these two models are not reported.
Table 2. Ablation studies on main modules of MRAB (top rows) and on two new loss functions (bottom rows). B N , A M , and L R stand for batch normalization, attention module (spatial attention and channel attention), and local residual connection in MRAB, respectively. Best results are underlined.
Setting (modules / losses used) | PSNR↑ | SSIM↑
S1 (multi-branch convolutions + LeakyReLU only) | 18.797 | 0.733
S2 (S1 + BN + AM) | 19.532 | 0.754
RRA-Net (S1 + BN + AM + LR) | 19.813 | 0.765
S3 (L_1 loss only) | 19.21 | 0.746
S4 (L_1 + L_CA) | 19.34 | 0.741
RRA-Net (L_1 + L_CA + L_Laplace) | 19.813 | 0.765
Table 3. Ablation study on different BN strategies. Best results are underlined.
BN Strategy | PSNR↑ | SSIM↑
Individual BN on each branch | 18.627 | 0.688
A single BN after multi-branch | 19.813 | 0.765

Share and Cite

Chen, E.; Ye, T.; Jiang, J.; Tong, L.; Ye, Q. Efficient Re-Parameterization Residual Attention Network for Nonhomogeneous Image Dehazing. Appl. Sci. 2023, 13, 3739. https://doi.org/10.3390/app13063739
