1. Introduction
When imaging fixed, fully submerged objects through a fluctuating air-water surface with a fixed camera, the acquired images often suffer severe distortions caused by the surface fluctuation. Furthermore, refraction occurs when light passes through the air-water interface, especially at a wavy water surface [
1,
2]. The refraction angle is determined by the refractive index of the medium and the angle of incidence. In addition, the attenuation caused by scattering and absorption along the imaging path also degrades the observation. Suspended particles in the medium, such as organic matter, mineral salts, and microorganisms, continually absorb beam energy and alter the propagation path of light [
3,
4]. Hence, underwater image restoration is more challenging than similar problems in other environments.
Research on this challenging problem has been carried out for decades. Some researchers regard the wavefront distortion caused by the fluctuating surface as a major factor in image degradation. Therefore, the adaptive optics approach, popular in astronomical imaging, was first applied to underwater image correction. Holohan and Dainty [
5] assumed that the distortions are mainly low-frequency aberration and phase shift, and proposed a simplified model based on adaptive optics.
There are several methods using ripple estimation technology for underwater reconstruction. A simple algorithm was proposed in [
6], which reconstructs a 3D model of the water surface ripples from optical flow estimation and statistical motion features. In [
7], an algorithm based on the dynamic nature of the water surface was presented, which uses cyclic waves and circular ripples to express local aberration. Tian et al. [
8] proposed a model-based tracking algorithm: a distortion model based on the wave equation is established and then fitted to the frames to estimate the shape of the surface and restore the underwater scene.
According to Cox–Munk law [
9], if the water surface is sufficiently large and calm, the surface normals approximately follow a Gaussian distribution. Inspired by this law, some approaches seek the center of the distribution of patches across the image sequence as the undistorted (orthoscopic) patch [
10]. Many strategies related to lucky regions have been proposed to deal with the problem. In [
11], graph embedding was applied to frame patches to compute local distances between them, and a shortest-path algorithm selected the patches that form the undistorted image. Donate et al. [
12,
13] proposed a similar method, in which motion blur and geometric distortion were modeled separately and the K-means algorithm replaced the shortest-path algorithm. Wen et al. [
14] combined bispectrum analysis with lucky region selection and smoothed edge transitions using patch fusion. A restoration method based on optical flow and lucky regions was proposed by Kanaev et al. [
15]. Lucky patches are selected by an image quality metric and by computing a nonlinear gain coefficient of the current frame at each point. Later, in [
16,
17], Kanaev et al. improved the resolution of their algorithm by developing structure-tensor-oriented image quality metrics. Recently, in [
18], building on lucky region fusion, Zhang et al. put forward a method in which distorted frames are input and restored frames are output simultaneously; the reconstruction quality is improved by successively updating with each subsequent distorted frame.
Registration technology [
19,
20,
21,
22], originally used to recover images degraded by atmospheric turbulence, has also been applied to this problem. Oreifej et al. [
20] presented a two-stage nonrigid registration approach to overcome the structural turbulence of waves. In the first stage, a Gaussian blur is applied to the sequence frames to improve the registration, and in the second stage, rank minimization is used to remove sparse noise. In our previous research [
21], an iterative robust registration algorithm was employed to overcome the structural turbulence of the waves by registering each frame to a reference frame. The high-quality reference frame is reconstructed by the patches selected from the sequence frames and a blind deconvolution algorithm is performed to improve the reference frame. Halder et al. [
22] proposed a registration approach that registers the image sequence against the sharpest frame to obtain pixel shift maps.
Motivated by deep learning technology, Li et al. [
23] introduced a trained convolutional neural network to dewarp dynamic refraction, which requires a large number of training samples and training cycles.
Recently, James et al. [
24] assumed that water fluctuation possesses spatiotemporal smoothness and periodicity. Based on this hypothesis, compressive sensing was combined with a local polynomial image representation. Later, based on the periodicity, a Fourier-based pre-processing was proposed to correct the apparent distortion [
25].
Most of the above studies aim to restore the whole image from an underwater sequence. In some cases, however, only the structural information of the image is required, such as when reading underwater cable numbers. In this paper, a new image restoration approach for underwater image sequences using an image registration algorithm is proposed. In our approach, lucky patch fusion is employed to discard the patches with more severe warping, and a guided filter is then used to enhance object boundaries in the fused image. An iterative registration algorithm, which registers the frames against the enhanced image, removes most of the distortions. After registration, unstructured sparse noise is eliminated by principal component analysis (PCA) and patch fusion to produce an undistorted image sequence and frame. Experiments show that the proposed method achieves better structural information reconstruction than existing registration approaches [
20,
21].
The remaining part of this paper proceeds as follows:
Section 2 presents an overview of the proposed method and introduces it in detail,
Section 3 shows the results of the experiment,
Section 4 analyzes and discusses the experimental results, and
Section 5 concludes the paper.
3. Results
In the experiment, the proposed method was implemented in MATLAB (MathWorks, Natick, MA, USA). The basic data sets used were the same as Tian's in [
8]. The source code is publicly available at the following link:
https://github.com/tangyugui1998/Reconstruction-of-distorted-underwater-images. To verify the performance of the proposed method, we made a comparison with our previous study [
21] and Oreifej et al. [
20], whose source codes are also available online. The same image sequences were processed by each of the three methods, and for all three, the maximum number of iterations was set to five.
The test data sets are "checkboard", "Large fonts", "Middle fonts", and "Small fonts". Each data set is composed of 61 frames. The frame size and the patch size used in patch fusion were manually preset before running the algorithm, as shown in
Table 1.
Algorithm 1: Structural Information Reconstruction
Input: Distorted image sequence
Output: Undistorted image sequence and dewarped image
While the registration error is above the threshold do
    Step 1: Lucky patches fusion
        for each patch sequence: select the least-distorted patches and fuse them into a single frame
    Step 2: Guided filter: enhance the edges of the fused frame to form the reference
    Step 3: Image registration: register every frame against the reference and update the sequence
End
Step 4: Post-processing: remove sparse noise by PCA and patches fusion
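The loop structure of Algorithm 1 can be sketched as follows. This is a minimal skeleton, not the paper's MATLAB implementation: the three callables stand in for Steps 1-3, and the convergence measure shown is a simple mean deviation from the sequence mean.

```python
import numpy as np

def restore_sequence(frames, fuse_patches, guide_filter, register,
                     max_iters=5, tol=0.025):
    """Skeleton of Algorithm 1. fuse_patches, guide_filter, and register
    are hypothetical hooks for Steps 1-3, not the paper's actual code."""
    seq = [np.asarray(f, dtype=float) for f in frames]
    for _ in range(max_iters):
        fused = fuse_patches(seq)                    # Step 1: lucky patches fusion
        reference = guide_filter(fused)              # Step 2: edge-enhanced reference
        seq = [register(f, reference) for f in seq]  # Step 3: registration
        mean = np.mean(seq, axis=0)
        # Convergence: mean absolute deviation of the sequence from its mean
        err = float(np.mean([np.abs(f - mean).mean() for f in seq]))
        if err < tol:
            break
    return seq   # Step 4 (PCA + patch fusion post-processing) is applied afterwards
```

With trivial placeholder steps (mean-frame fusion, identity filter, register-to-reference), the loop converges after a single iteration.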
The registration results are shown in
Figure 2. The results indicate that all three methods can compensate to some extent for the distortion, defocusing, and double images of an underwater image sequence, while our method performs better in some details, especially on the three text data sets. With the Oreifej method and our previous method, adjacent letters are blurred together (marked with red rectangles). There are also some ghosting and misalignment in the marked red regions, such as in Oreifej's results for "Large fonts" and "checkboard" and in our previous method's result for "Large fonts". In contrast, the proposed method avoids these problems, and the letters in its results are clearer and easier to observe. On the other hand, for some data sets such as "checkboard", the proposed method is inferior to our previous method in regional restoration (marked with green rectangles), and some letters in the "Large fonts" result suffer from slight aberration. Although the areas marked with green rectangles are more distorted than with the other methods, letter observation is not dramatically affected.
In addition, to compare the running times of the three methods, we measured the processing time on a laptop computer (Intel Core i5-9300H, 8 GB RAM). The "checkboard" data set was used, and the running time of each step of the proposed method was as follows: the patch fusion step took 4.609 s, and the guided filter step took 0.016 s. The comparison of each registration step of the three methods is shown in
Figure 3. Each iteration of the Oreifej method fluctuated around 260 s, whereas the iteration time of the proposed method decreases as the iterations proceed and was shorter than that of the Oreifej method in all but the first iteration. Furthermore, the average running time of the proposed method was 249.859 s, less than the 263.060 s of the Oreifej method. Compared with our previous method, whose running time rebounded noticeably with an average of 255.522 s, the proposed method clearly kept shortening its iteration time as the iterations went on. Overall, the proposed method maintained a clear pattern of deceleration in iteration time, which becomes increasingly useful as processor hardware acceleration improves.
4. Discussion
4.1. Better Quality Reference Image
Usually, a true image to serve as the registration reference cannot be obtained in a real video system, so the most important step in image registration is the choice of the reference image. As mentioned in the introduction, the mean frame or the sharpest frame of the sequence has usually been used as the reference. However, if the mean frame is so blurred that image features are lost, or the sharpest frame has severe geometric distortion that cannot be ignored, the registration is seriously restricted. To address this, Oreifej selected the mean of the image sequence as the reference and blurred the sequence frames using a blur kernel estimated from the sequence. When the reference image and the sequence are at the same blur level, the whole registration process is guided toward the sharper regions. However, in the Oreifej method, the blurred frames tend to introduce unexpected local deterioration of image quality. Some edges in the sequence are blurred and shifted by the added blurring, and the same regions in the mean frame are also ambiguous and warped. The registration of these blurred regions is then directed to other regions, and the registration of the boundaries is sacrificed. It should be noted that the loss and confusion of edge information are probably irreversible, and the distortion and misalignment may be aggravated as the iterations increase.
In the proposed method, a higher-quality reference frame is reconstructed first, by discarding severely distorted parts and enhancing the edges. Compared with the mean frame, the reconstructed reference frame largely avoids introducing artificial misalignment, and its pixels are closer to their true positions. The registration is then guided toward the sharper boundaries through the preservation and enhancement of edges. The comparison between the reconstructed reference image and the mean reference image used by the other methods described above is shown in
Figure 4. It is clear that the reference produced by the proposed method has more distinct letter edges.
4.2. Analysis of Restoration Results
The results shown in
Figure 2 suggest that the proposed method performs better than our previous study and the Oreifej method in reconstructing adjacent letters. The Oreifej method takes the mean frame of the sequence as its reference, and the originally sharp letter boundaries in the image sequence are destroyed by the blur added to the frames. It is then difficult for the registration to act precisely on the text edges, and mismatches may occur; the blur and double images caused by these mismatches worsen as the iterations proceed. Our previous study adds a deblurring step to the reference image before registration, which makes the registration pay too much attention to the non-boundary parts and neglect restoration of the borders. In other words, Oreifej adds blurring to the original sequence and registers the blurred sequence against the mean frame of the original sequence, whereas our previous method registers the original sequence against a reference reconstructed by lucky patch fusion and blind deconvolution. However,
Figure 2 shows that the above methods fail to deal with the blurring of structural information, especially the outlines of adjacent letters. We believe the reason is that the preferentially registered regions, which are relatively sharper, are random and irregular, because sharper-region generation is uncertain and there is no explicit information priority. Therefore, considering that the integrity of structural edge information improves the visual effect, we decided to deliberately raise the status of edge registration and guarantee the priority of boundary reconstruction, so as to recover the edge information to a greater extent. We spliced the less-distorted patches into a single image, then used the guided filter to keep the boundary information with large gradient changes and ensure that the character edges were sharper than the non-edges. The registration of edges took precedence over other positions, and the reference quality improved steadily with the iterations. Nevertheless, because the nonuniform geometric degradation in the first iteration could not be completely compensated by lucky patch fusion, and the guided filter directed the registration to focus on structural details instead of regional distortion, some shifted boundaries were mistaken for correct ones and partial distortion was retained, causing the slight deformation of some letters seen in the "Large fonts" result. Although some local distortions remain in the results, slight distortion hinders observation less than blurring does.
To quantify the results of all methods, gradient magnitude similarity deviation (GMSD) [
31], and feature similarity (FSIM) [
32], were chosen as quality metrics. Both are full-reference metrics, which reflect the difference between the evaluated image and the ideal image, and are defined as follows:
GMSD estimates an image quality score from the gradient magnitude, which conveys structural information. It can be written as
\[ \mathrm{GMS}(i) = \frac{2\, m_r(i)\, m_d(i) + c}{m_r^2(i) + m_d^2(i) + c}, \qquad \mathrm{GMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{GMS}(i) - \mathrm{GMSM} \right)^2}, \]
where \(m_r\) and \(m_d\) denote the gradient magnitudes of the reference image and the distorted image, respectively, \(c\) is a constant, \(N\) is the number of pixels, and GMSM denotes the mean of the gradient magnitude similarity (GMS) map.
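A compact NumPy sketch of the standard GMSD computation is given below. It assumes 8-bit-range grayscale inputs and, for brevity, omits the 2x downsampling pre-step used in the original GMSD paper; the helper `_filter3` is an illustrative name.

```python
import numpy as np

def _filter3(img, k):
    # 3x3 correlation with edge padding (sign flips do not affect magnitudes).
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def gmsd(ref, dist, c=170.0):
    """Gradient Magnitude Similarity Deviation (lower is better)."""
    ref, dist = np.asarray(ref, float), np.asarray(dist, float)
    prewitt = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], float) / 3.0
    m_r = np.hypot(_filter3(ref, prewitt), _filter3(ref, prewitt.T))
    m_d = np.hypot(_filter3(dist, prewitt), _filter3(dist, prewitt.T))
    gms = (2.0 * m_r * m_d + c) / (m_r**2 + m_d**2 + c)   # similarity map
    return float(np.std(gms))   # deviation of GMS from its mean (GMSM)
```

For identical images the GMS map is 1 everywhere, so the deviation is zero.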
FSIM emphasizes the structural features of visual interest to estimate frame quality. It can be expressed as
\[ \mathrm{FSIM} = \frac{\sum_{x \in \Omega} S_L(x)\, \mathrm{PC}_m(x)}{\sum_{x \in \Omega} \mathrm{PC}_m(x)}, \]
where \(\mathrm{PC}_m\) represents the phase congruency, \(\Omega\) denotes the whole image, and \(S_L(x)\) denotes the similarity determined by the phase congruency similarity and the gradient magnitude similarity.
Furthermore, the underwater image quality measure (UIQM) [
33] is employed as a no-reference quality metric, considering that the true images of "Large fonts" and "checkboard" are unavailable. The metric is a weighted combination of a colorfulness measure (UICM), a sharpness measure (UISM), and a contrast measure (UIConM). In our experiment, the data sets are grayscale images, so the UICM is identically zero.
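The combination step of UIQM can be sketched as below, assuming the linear weights published in the original UIQM paper; the component scores (UICM, UISM, UIConM) must be computed separately by their own procedures.

```python
def uiqm(uicm, uism, uiconm, weights=(0.0282, 0.2953, 3.5753)):
    """Weighted combination of the three UIQM component measures.

    The default weights are those reported by Panetta et al. for UIQM.
    For grayscale input, the colorfulness term UICM is identically zero.
    """
    c1, c2, c3 = weights
    return c1 * uicm + c2 * uism + c3 * uiconm
```

With UICM fixed at zero, the ranking between methods is driven entirely by sharpness (UISM) and contrast (UIConM), as in our experiments.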
The comparison results of image quality are presented in
Table 2. It shows that the proposed method outperforms our previous method and is close to the Oreifej method on the full-reference metrics. On most of the no-reference metrics, the proposed method achieves the best values, indicating that the sharpness and contrast of the whole image are significantly improved. A higher UISM value shows that the image contains richer structural edge information and details, and a greater UIConM value reflects that the image is better suited to the human visual system. Among all the metrics, UIQM, which reflects underwater image quality from several aspects, is based on the human visual system and is specifically designed for evaluating underwater images, whereas GMSD and FSIM correlate less well with people's intuitive perception. We therefore prefer UIQM as the main indicator. Although GMSD and FSIM indicate residual distortions in the proposed method's results, the sharp letter separation and details, indicated by the higher contrast and sharpness, contribute more to subsequent observation than removing every distortion would.
Furthermore, from
Table 2, we found that the proposed method performs better in UISM, UIConM, and UIQM, which means it is suitable for applications that emphasize the human visual effect of the restored images and structural boundary information, such as underwater text images.
4.3. Number of Iterations and Running Time
To improve efficiency, a convergence error, which measures the difference between the frames and the mean of the current sequence, is applied to determine the end of the registration process. Its value is obtained from Equation (14), and the threshold is set to 0.025, the same as Oreifej [
20]. Once this error falls below the threshold, the registration is considered stable and the post-processing is carried out. The comparison of convergence results is listed in
Table 3.
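One plausible form of this stopping criterion is sketched below; the exact normalization of Equation (14) is not reproduced here, so treat this as an assumption-laden illustration of the idea rather than the paper's formula.

```python
import numpy as np

def convergence_error(frames):
    """Deviation of the registered sequence from its temporal mean.

    A hypothetical form of the convergence measure: mean absolute
    deviation from the mean frame, normalized by the mean intensity
    so that a fixed threshold (e.g., 0.025) is scale-free.
    """
    stack = np.stack([np.asarray(f, float) for f in frames])
    mean = stack.mean(axis=0)
    return float(np.abs(stack - mean).mean() / (np.abs(mean).mean() + 1e-12))
```

Registration would then iterate until `convergence_error(seq) < 0.025` or the preset maximum number of iterations is reached.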
From
Table 3, we can see that the proposed method requires clearly fewer iterations than the other methods, and none of the data sets exceeds the preset maximum number of iterations.
We believe the number of iterations remains high in the Oreifej method because frame blurring is used. In our previous study, the higher-quality reference frame could speed up the registration; however, as the overall sharpness of the reference frame keeps improving, the tendency of the registration to be guided toward the sharper regions weakens, slowing the later iterations. In the proposed method, by contrast, the registration uses a processed reference frame whose edges are preserved and whose non-edges are smoothed simultaneously. The edges of the reference remain sharper than the other parts as the iterations go on, further shortening the registration time. This is also shown by the decreasing trend of the iteration time in
Figure 3.
Therefore, the proposed method computes faster and has greater development potential when the number of iterations is fixed or unknown.
4.4. Analysis of Patch Fusion
Figure 5 shows the registration results of using either the mean frame or the fused frame, after guided filtering, as the reference frame. In the mean-to-frames registration there are distortions and double images (marked by the red rectangle) that hamper recognition and even cause letters to overlap, whereas the results of our proposed method express more accurate structural information. This shows that the patch fusion step must be carried out to eliminate the severely distorted parts before guided filtering, so as to enhance the correct object boundaries as much as possible and reduce the errors occurring in the registration step.
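The patch fusion step can be sketched as follows, using a single-window SSIM score against the mean frame to rank patches. The patch size and the number of kept patches here are illustrative defaults, not the paper's per-dataset settings, and `ssim_patch` is a simplified, whole-patch SSIM rather than the windowed version.

```python
import numpy as np

def ssim_patch(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Single-window SSIM between two patches (8-bit intensity range).
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

def lucky_fusion(frames, patch=32, keep=3):
    """Fuse the least-distorted ('lucky') patches into a single frame,
    scoring each frame's patch by SSIM against the mean-frame patch."""
    stack = np.stack([np.asarray(f, float) for f in frames])
    mean = stack.mean(axis=0)
    out = np.zeros_like(mean)
    h, w = mean.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tiles = stack[:, i:i + patch, j:j + patch]
            ref = mean[i:i + patch, j:j + patch]
            scores = [ssim_patch(t, ref) for t in tiles]
            best = np.argsort(scores)[-keep:]       # highest-SSIM patches
            out[i:i + patch, j:j + patch] = tiles[best].mean(axis=0)
    return out
```

Averaging only the top-ranked patches discards the severely warped ones before the guided filtering step.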
4.5. Analysis of Guided Filter
Compared with other popular edge-preserving algorithms such as the bilateral filter, the guided filter behaves better near edges and has a fast, non-approximate linear-time implementation. Furthermore, the guided filter step, which enhances the boundaries of the structural information, is the key factor determining restoration quality, especially for "Small fonts" and "Tiny fonts". After fusing the sequence, the letters are closely spaced and their edges are mildly shifted and blurred, resulting in a small variance within windows centered on boundary points. If the selected regularization parameter is too large, the guided filter mistakes the boundaries between letters for "flat areas" to be smoothed, and the loss of boundary information makes the letters indistinguishable; if it is too small, the guided filter mistakenly preserves the ghosting as gradient edges, which may cause registration errors.
We tested different combinations of the regularization parameter ε and the window size r, and the reconstruction results are compared in
Figure 6. If ε is not selected properly, the boundaries are deleted or blurred, as marked by the red rectangle, while the choice of r has little effect on the results. We therefore find that ε plays a greater role in edge restoration than r. In this case, ε and r are set to 0.022 and 4, respectively, to achieve a better reconstruction result.
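The roles of ε and the window size can be seen directly in a minimal single-channel implementation of He et al.'s guided filter, sketched below. The defaults follow the values chosen above (ε = 0.022, window size 4, interpreted here as the window radius, which is an assumption).

```python
import numpy as np

def _box_mean(img, r):
    # Mean over a (2r+1) x (2r+1) window with edge padding.
    p = np.pad(img, r, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    n = 2 * r + 1
    for i in range(n):
        for j in range(n):
            out += p[i:i + h, j:j + w]
    return out / (n * n)

def guided_filter(guide, src, r=4, eps=0.022):
    """Single-channel guided filter (He et al.), self-guided when guide == src."""
    I, p = np.asarray(guide, float), np.asarray(src, float)
    mean_I, mean_p = _box_mean(I, r), _box_mean(p, r)
    cov_Ip = _box_mean(I * p, r) - mean_I * mean_p
    var_I = _box_mean(I * I, r) - mean_I ** 2
    a = cov_Ip / (var_I + eps)   # large eps: a -> 0, the window is smoothed
    b = mean_p - a * mean_I      # small eps: high-variance edges are kept
    return _box_mean(a, r) * I + _box_mean(b, r)
```

In the self-guided case the filter acts as an edge-preserving smoother: low-variance (flat) windows are averaged while high-variance (edge) windows pass through, which is exactly the trade-off that ε controls.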
4.6. Other Underwater Text Data Set
To further verify the robustness of the proposed method, we dealt with a real text data set in underwater scenes [
24]. The data set contains 101 frames, and the size of each frame is 512 × 512. The result is also compared with other registration algorithms [
20,
21]. As shown in
Figure 7, the proposed method still outperforms the other registration strategies, and the boundaries between letters are successfully restored without blurring.
4.7. Comparison with a Nonregistration Method
The discussions above have shown that the proposed method converges faster and behaves better than the other registration strategies. To verify its performance more objectively, the proposed method was also compared with a state-of-the-art technique that does not use registration. Considering reproducibility, the James method [
24], whose source code is available online, was selected as the reference method. The same "Large fonts" data set with 61 frames was processed by the proposed method and the James method.
Figure 8 shows one frame of the data set and the results of the two methods. We found that the James method failed to handle details under strong disturbance, while the proposed method reconstructed better boundaries and more details. The comparison illustrates that the periodicity hypothesis may fail in severely disturbed environments. Instead, we treat all distortions as local deformations that can be corrected through registration, achieving better recovery in strongly disturbed situations.
5. Conclusions
This article proposes a new structural information restoration method for underwater image sequences using an image registration algorithm. Different from previous studies [
20,
21], the proposed method follows a new registration strategy: it gives higher priority to edge information before registration and intentionally guides the registration to focus on the boundary areas of interest. First, the regions with less misalignment across the input warped sequence, picked using SSIM as the metric, are fused into a single frame. Then the guided filter is employed to recognize and preserve the gradient borders of the fused image while blurring the other areas. During the iterative registration, with the output of the guided filter as the reference, most unexpected fringe distortions can be corrected, and the structural edge information is restored to a greater extent. As the iterations proceed, the quality of the boundaries improves progressively and tends to stabilize. Finally, to remove the random noise and produce an undistorted frame, the PCA algorithm and patch fusion are applied to the registered frames.
In the experiments, we tested and compared our method with other registration strategies [
20,
21]. These methods fail to deal with the contour relationship between adjacent letters, producing fuzzy blocks that restrict character recognition, whereas our method effectively restores more prominent letter boundaries. Meanwhile, as the registration iterations increase, the running time of our method keeps shortening, which makes it more advantageous in real scenes where the number of iterations is unknown. In comparison with a nonregistration method [
24], our method also copes better with highly disturbed underwater scenes. In the future, the application of edge priority to underwater observation will be studied further.