1. Introduction
In the past few years, unmanned surface vehicles (USVs) have become widely used in civil domains such as maritime search and rescue, in addition to playing a significant role in national defense [1,2,3,4]. However, sea navigation conditions are complex, and ocean images captured in hazy weather are characterized by low contrast, few high-frequency components, and a large amount of missing detail [5,6,7]. This hinders subsequent high-level vision tasks. It is therefore very important for the USV perception system to obtain high-definition images in hazy maritime areas [8,9,10].
The reduction in image quality caused by maritime haze can be approximately expressed by Equation (1) [11,12]:

$$I(x) = J(x)\,t(x) + A\big(1 - t(x)\big) \tag{1}$$

where $I(x)$ represents the hazy image acquired by the imaging device, $J(x)$ represents the haze-free image, $A$ represents the atmospheric light value, and $t(x)$ represents the transmission map [13]. The inverse restoration of the physical degradation process described in Equation (1) is called maritime image dehazing; it is a highly ill-posed problem because both the transmission map and the global atmospheric light are unknown. To address this difficult issue, a wide variety of image dehazing methods have been proposed in recent years. These methods are broadly classified into traditional prior-based methods and modern deep-learning-based methods; the primary distinction between the two is that the image priors of the former are handcrafted, while those of the latter are learned automatically.
Traditional dehazing methods mainly rely on prior knowledge. For example, He et al. [14] proposed the dark channel prior (DCP), based on the observation that local patches of haze-free outdoor images almost always contain some pixels with a very low gray value in at least one color channel. Using this prior, the transmission map is solved, and the atmospheric scattering model is then applied for image dehazing. This method achieved a good effect at the time, but it causes color distortion in some scenes. Meng et al. [15] proposed an effective regularization method based on a boundary constraint, which can solve the problem of low image brightness. Zhu et al. [16] proposed the color attenuation prior (CAP). These early dehazing methods achieve good results in specific scenes and made a great contribution to the development of dehazing technology. However, because most of them rely on prior information, their performance is naturally limited by how well the adopted assumptions/priors match the target scenarios. Traditional methods therefore often fail to achieve the expected effect.
Deep learning techniques attempt to directly regress the final haze-free image or the intermediate transmission map, which overcomes the limitation of any specific prior. Benefiting from big data and from the strong representation and end-to-end learning abilities of deep convolutional neural networks (CNNs), many CNN-based image dehazing methods have been proposed [17,18,19,20,21,22,23,24,25,26] and have achieved superior performance and robustness. For example, DehazeNet, an end-to-end trainable deep CNN model, was proposed by Cai et al. [27]; it learns the transmission map directly from a hazy image and outperforms contemporary prior-based methods and the random forest model. A multiscale CNN (MSCNN) was suggested by Ren et al. [17] to learn transmission maps in a fully convolutional manner, exploring a multiscale structure for coarse-to-fine regression. The densely connected pyramid dehazing network (DCPDN) was proposed by Zhang et al. [18] to learn the transmission map and atmospheric light concurrently; the method also utilizes an adversarial loss based on generative adversarial networks [19] to supervise the dehazing network. One problem with these CNN-based methods is that all of them require accurate transmission maps and atmospheric light. Li et al. [20] did not estimate an intermediate transmission map; instead, they proposed an end-to-end CNN model, the all-in-one dehazing network (AOD-Net), that learns haze-free images directly from hazy images. Although AOD-Net reformulates the haze imaging model by integrating the transmission and atmospheric light into a single variable, that intermediate variable must still be estimated accurately, so AOD-Net still rests on the physical model in Equation (1). Learning image priors through deep learning has largely removed the limitations of traditional methods, but it still follows the traditional dehazing model: if the image prior cannot be estimated accurately, low-quality results will still follow.
Different from CNN methods that estimate intermediate variables, the networks proposed in [21,22,23,24,25] are constructed on the principle of image fusion. Instead of estimating the intermediate parameters of the atmospheric scattering model, they learn relevant image features, fuse those features within the network, and then directly restore the haze-free image. In general, fusing features from different levels improves network performance. To implement this idea, Reference [21] used feature pyramids to combine low-level and high-level semantic feature maps, and Reference [18] used densely connected pyramid networks to achieve feature fusion at all scales. Dong et al. [22] proposed MSBDN-DFF based on the U-Net architecture with dense feature fusion; the network integrates the dense feature fusion (DFF) module into the U-Net architecture to compensate for the spatial information missing from high-resolution features. Singh et al. [23] proposed a novel generative adversarial network structure, the back-projected pyramid network (BPPNet), in which spatial context is preserved through iterative U-Net blocks and multi-scale structural information is fused through a novel pyramid convolution block. Most of these feature fusion methods are implemented by crudely overlaying pixels, treating every level of the feature map identically and easily losing important feature information. Therefore, Ren et al. [24] proposed a gated fusion network (GFN) using an encoder–decoder architecture: learning from three pre-processed versions of the original image, the gated fusion network automatically determines the proportion of each version's features in the output image and fuses them to recover the haze-free image. Tan et al. [25] proposed a multi-branch deep fusion network (MBDF-Net) for 3D object detection, designing a simple adaptive attention fusion (AAF) module in the feature extraction phase to suppress regions of non-interest and to fuse features. Considering real-time requirements, Shi et al. [26] proposed an efficient and concise multi-branch fusion convolutional network for remote sensing image scene classification, in which SE and CBAM attention modules extract shallow and deep features, respectively, and fusion is performed according to the generated attention weights. A gated context aggregation network (GCANet) was suggested by Chen et al. [28] to directly reconstruct the final haze-free image; the method proposes a more lightweight and concise gated fusion sub-network that fuses features at several levels, assigning importance weights to the features at different levels. Compared with conventional approaches, these deep-learning-based methods circumvent the low restoration quality caused by image prior estimation errors and improve the flexibility of feature fusion. However, the lack of differentiated treatment of channel, pixel, and spatial features during feature extraction limits the representational capability of deep networks.
To address the above problems, based on the principle of image fusion and the characteristics of marine hazy images, this paper proposes a new end-to-end multi-branch gated fusion network (MGFNet) for maritime image dehazing that directly produces haze-free images. The network introduces three residual attention modules, each incorporated into a different branch network, to concentrate on key features and refine image details. To determine the relative importance of the branch feature maps, a gated fusion sub-network adaptively learns a weight map for each branch and combines the feature maps according to these importance weights. Compared with other dehazing methods, MGFNet performs outstandingly in preserving image colors and recovering information in thick haze regions. Experiments show that MGFNet outperforms previous image dehazing methods both qualitatively and quantitatively. A comprehensive ablation study is also provided to verify the significance and necessity of each component. Further, the proposed MGFNet is applied to a hazy sea–skyline detection task, where it performs better than previous state-of-the-art networks.
The following are the paper’s contributions:
① Three different residual attention modules are proposed and applied to different branch networks based on U-Net [29], and three inputs are extracted from the original image through these branches (see the sketch following this list). The first input is obtained by incorporating the residual channel attention branch, which aims to weaken the color distortion caused by atmospheric light scattering. The second input is obtained by fusing the residual spatial attention branch to enhance contrast and produce better global visibility. To recover detail in thick haze regions, the third input fuses residual pixel attention and focuses on thick-haze pixel regions.
② A gated fusion sub-network is proposed to adaptively learn the weight maps of the corresponding branches, determining the importance of the different branch feature maps, and to fuse the feature maps according to their importance weights. The aim is to seamlessly fuse the branch feature maps while retaining their distinctive features, enhancing network performance.
The structure of this paper is as follows:
Section 1 introduces the dehazing model and discusses previous traditional and modern deep-learning-based dehazing methods;
Section 2 discusses the proposed multi-branch gated fusion network and the components of the network;
Section 3 introduces the loss function for the training network;
Section 4 compares and analyzes the results of the comparative experiments;
Section 5 applies the proposed method to sea–skyline detection to further verify the advanced nature of the method;
Section 6 compares and analyzes average model runtime;
Section 7 concludes this paper.
3. Loss Function
Since the objective evaluation metric PSNR now serves as the primary measure of image dehazing quality, the mean absolute error (MAE) or mean-squared error (MSE) is typically utilized as the loss function to optimize the network. However, both the L1 and L2 losses are pixel-wise losses that do not take global information into account, so over-smoothing problems are usually encountered. Many loss functions have been proposed to address this, such as the perceptual loss [30] and adversarial loss [19], and even compound losses that combine multiple loss functions. To focus on the trade-off between human perception and metric scores, a new loss function is proposed to optimize MGFNet, as shown in Equation (11):

$$\mathcal{L}_{total} = \mathcal{L}_{P\text{-}S} + \mathcal{L}_{edge} \tag{11}$$
where $X$ represents the degraded image; $B$ represents the batch size of the training data; $C$ is the number of feature channels; $H$ and $W$ are the image height and width; and $Y$ represents the haze-free image. The total loss function consists of $\mathcal{L}_{P\text{-}S}$ and $\mathcal{L}_{edge}$. $\mathcal{L}_{P\text{-}S}$
is the proposed loss consisting of the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index, using only these two standard metrics as parameters to achieve better visual perception with fewer parameters, as shown in Equation (12):

$$\mathcal{L}_{P\text{-}S} = \sqrt{\big(1 - \mathrm{SSIM}(\hat{Y}, Y)\big)^{2} + \varepsilon^{2}} - \lambda \, \mathrm{PSNR}(\hat{Y}, Y) \tag{12}$$
where $\hat{Y}$ denotes the dehazed output of the network and the parameter $\lambda$ is a constant empirically set to 0.005. The second term, $\mathcal{L}_{edge}$, is the edge loss adopted from [31]; it constrains the difference between the high-frequency components of the predicted haze-free image and of the ground-truth image. It can be expressed as Equation (13):

$$\mathcal{L}_{edge} = \sqrt{\big\lVert \Delta(\hat{Y}) - \Delta(Y) \big\rVert^{2} + \varepsilon^{2}} \tag{13}$$
where $\Delta$ denotes the Laplace operator. The constants $\varepsilon$ in Equations (12) and (13) were set to 0.05.
We also compared the proposed loss with other common loss functions (e.g., the MAE loss, the MSE loss, a combined MAE–MSE loss, and the adversarial loss) to demonstrate its superiority. The comparison results are shown in Table 2.
As Table 2 shows, when different loss functions are used for training under the same parameters and compilation environment, the model trained with the proposed loss function achieves higher PSNR and SSIM than the models trained with the other losses, reflecting the superiority of the proposed loss.
5. Effect Evaluation of Sea–Skyline Detection on Dehazing Network
The sea–skyline is an important cue for USV vision systems perceiving the surrounding environment. Accurate sea–skyline detection can correctly divide the sea and sky regions, which is important for the safe navigation of USVs on the sea surface and for target detection [33,34,35,36]. Liang et al. [37] proposed a sea–skyline detection method for complex environments: it locates the sea–sky region based on texture features and uses the OTSU algorithm to obtain an adaptive segmentation threshold, generating a group of sea–skyline candidate points; finally, a simple clustering method selects appropriate points and fits a line through them. This method can accurately detect the sea–skyline against complex backgrounds with many clouds and waves, but it does not perform well in maritime haze, mainly because image contrast is low and texture features are not obvious enough under such conditions. In hazy maritime weather, haze removal is therefore an important pre-processing step for sea–skyline detection; in other words, the accuracy of sea–skyline detection is a valid metric for estimating the performance of a dehazing network. The Hough transform [38] and Radon transform [39] are computationally cheap and resistant to interference, and fitting edge pixels to the sea–skyline with the Hough transform satisfies the fundamental needs of the USV. Therefore, in this paper, edge pixels were fitted using the Hough transform, and image quality was evaluated by counting the accuracy of sea–skyline detection in the images before and after dehazing.
Figure 9 shows the effect of the sea–skyline detection.
A statistical analysis of the sea–skyline detection results was performed on a set of maritime surface images. The judgment criterion is as follows: when the detected sea–skyline lies within five pixels of the real sea–skyline, the detection is considered correct; otherwise, it is judged incorrect. The real sea–skyline is labeled manually. The quantitative detection accuracy is computed as in Equation (17):

$$p = \frac{N_c}{N_t} \times 100\% \tag{17}$$

where $p$ denotes the accuracy of sea–skyline detection, $N_c$ denotes the number of correctly detected images, and $N_t$ denotes the total number of selected maritime surface images. To evaluate whether the proposed network is effective, 20 real hazy maritime surface images were randomly selected, and the above sea–skyline detection algorithm was applied to the images before and after haze removal.
Figure 10 shows the visual comparison of each network after dehazing, and the accuracy comparison is shown in Table 5.
According to the experimental findings, the proposed maritime image dehazing network can, to a certain extent, alleviate the color spots and overall color dimming produced by other networks. The recovered images restore more detail in hazy regions and can provide higher-quality maritime images to the USV perception system.
Table 5 also demonstrates that the sea–skyline detection accuracy on images processed by the proposed network and by AOD-Net reaches 95% in both cases, better than the other networks.
Figure 10 shows that both the proposed network and AOD-Net outperformed the other networks in the sharpness of the recovered sea–skyline region, which is favorable for sea–skyline identification, but the proposed network was stronger at recovering overall image quality. Sea–skyline detection in hazy weather further validates the excellence and efficiency of the proposed network for maritime image dehazing in USV perception systems.