1. Introduction
Underwater images contain rich information and have become an important source for a variety of technological and scientific research fields [1], such as underwater image enhancement [2], underwater color restoration [3], and underwater object detection [4]. However, the applicability of generic classic models to real-time underwater robotic vision remains limited. Underwater imagery suffers from poor visibility resulting from the attenuation of the propagated light, mainly due to absorption and scattering effects [5]. The turbid underwater environment makes object detection a challenging task. For decades, numerous attempts have been made to restore and enhance the visibility of such corrupted images.
Since the deterioration of underwater scenes results from a combination of multiplicative and additive processes, conventional enhancement strategies such as gamma adjustment [6], histogram equalization [7], and color correction [8] are of limited use, and dedicated underwater restoration strategies have sprung up in recent years. Several foggy-scene dehazing techniques [9] have also received great attention. However, underwater imagery is even more difficult to handle because the extinction caused by scattering depends on the light wavelength of each color component. The absorption substantially reduces the color information in underwater scenes, which results in a foggy object appearance and contrast degradation.
On the other hand, detecting the most valuable object in the underwater scene has become an important subject of research. The human visual system has an effective attention mechanism to find the most important information in the visual scene. Computer vision aims to imitate this mechanism in two research branches: eye fixation detection and salient object detection. Our work focuses on the second one and aims at detecting the most salient object in the underwater domain-specific field, where the scene may be composed of several regions with different characteristics.
Eye fixation prediction models typically try to detect a small set of points that represent human eye fixations, while salient object detection focuses on segmenting the whole extent of the salient object. Complete and accurate salient object detection is not only a challenging job in underwater applications but also important for visual detection, scene comprehension, visual recognition, and other computer vision tasks [10,11]. Research on saliency measures may be classified into two groups according to the two human attention mechanism branches. The first saliency computation model was defined by Itti [12], which first brought a biological saliency model into the computer vision field. The model extracts color, intensity, and orientation features to calculate saliency contrast information with visual biological mechanisms. After this biological model was proposed, many computational biological models sprang up, which may be referred to as bottom-up models. Since bottom-up saliency methods largely depend on biological mechanisms and only calculate local feature contrast, they can only detect edge and contour information without the complete body of the object. Liu [13] proposed the first algorithm to address this bottom-up limitation and brought salient object detection, which focuses on detecting the whole region of the object, into computer vision. Liu computes multi-scale color contrast on the original image and fuses the local feature with the global feature through a CRF. Finally, the salient object is obtained by searching for the object region that meets a threshold. Since the result mostly depends on a handcrafted threshold, detection performance varies across scenes.
Traditional salient object detection often utilizes bottom-up saliency measures to make a coarse detection. These methods search the saliency map with a greedy algorithm, or use the grab-cut segmentation algorithm to find the salient object once the region meets a handcrafted object threshold. To avoid the high time cost of the greedy search used before, Lampert [14] uses the ESS algorithm instead. Achanta [15] segments the original image into pieces with the mean-shift algorithm and then uses an adaptive threshold method to find regions whose saliency value is at least twice the average saliency of the whole image. These methods largely depend on handcrafted thresholds, but neither of them gives a measure that can find the best searching threshold. To avoid the problem of choosing the best handcrafted threshold, Luo [16] computes the saliency density of the image and then searches for the rectangular window that contains the maximum saliency density. Shi [17] constructs a model that detects a region by maximizing the saliency value within it. The final salient object is obtained by iterating the ESS algorithm, and the absence of handcrafted parameters is an advantage of the method. However, this advanced searching method not only consumes a large amount of time but also has no accordance with the human fixation mechanism. Cheng [18] defines a color contrast calculation with sparse quantification and a smoothing term to obtain global saliency. The model initializes the detection process by binarizing the saliency map, and the final object is obtained by iterating the grab-cut algorithm with erosion and dilation operations. As mentioned above, both the eye fixation methods and the salient object detection methods encounter the spatial structure problem. In our work, a proposed spatial coherence optimization algorithm is utilized instead of the CRF and ESS algorithms to preserve the spatial structure information.
With the popularity of deep learning, many methods leverage the advantages of convolutional neural networks for the segmentation task. Li [19] made an early trial with a CNN to extract multi-scale deep features; instead of using a CRF to enhance spatial coherence, a super-pixel level saliency map is calculated. To address the blurred-edge problem caused by FCNs, Hou [20] applied deeply supervised learning with short connections to obtain a more accurate saliency map with edge information. Luo [21] utilizes a CNN as a feature extractor that captures both global and local features. Liu [22] proposes pooling modules and performs joint training with edge detection. Qin [23] builds a CNN with both a feature extraction and a refinement module to adjust the final saliency map. Zhao [24] performs multi-scale feature extraction with a CNN emphasized by spatial and channel attention. Since FCN methods lead to blurred edges that degrade detection precision, Zhou [25] extracts and fuses features with two streams to integrate saliency maps, contour cues, and their correlation. Pang [26] builds a multi-scale interactive network to enhance feature fusion. Li [27] designs a joint training network for camouflaged object detection. The fusion of side outputs from multi-scale deep features increases segmentation performance. Inspired by the multi-scale features used in deep learning, both global and local saliency contrast information are considered in our work to obtain a more accurate salient object location.
The saliency map generated by traditional saliency measures is vague and does not delineate an accurate, complete region of the salient object, while supervised measures largely depend on extensive prior knowledge and training tricks. To address these problems, a novel fusion underwater salient object detection (FUSOD) algorithm is proposed. An improved color restoration algorithm is utilized to restore the original image and enhance the detection precision. Both the global feature contrast information and the local eye fixation feature are considered to calculate the multi-scale saliency, which increases the object localization accuracy. A novel spatial coherence optimization algorithm is proposed to preserve the spatial structure information. The proposed FUSOD algorithm detects the most salient underwater object as a complete region, not only its edge and corner information, in complex underwater images.
The contributions of the fusion underwater salient object detection algorithm are summarized below:
- (1)
An improved color restoration method is utilized to compensate for the red color channel and correct the turbid scene in underwater images, which improves the performance of underwater salient object detection.
- (2)
A novel spatial optimization algorithm is proposed to enhance spatial coherence. The algorithm optimizes the super-pixel level saliency and allows the whole framework to preserve more spatial structure information.
- (3)
A novel fusion underwater salient object detection algorithm is proposed, which fully exploits the characteristics of underwater scenes and comprehensively considers the global and local contrast information in the underwater image. The proposed algorithm can detect the underwater salient object completely and accurately. Experiment results on the USOD dataset show that, in both qualitative and quantitative evaluations, the proposed FUSOD algorithm achieves comparatively higher performance than the other traditional salient object detection algorithms.
The rest of the paper is organized as follows. In Section 2, the proposed fusion algorithm is described in detail. In Section 3, simulation results are presented. Conclusions are presented in Section 4.
2. The Proposed FUSOD Algorithm
As mentioned above, underwater imagery suffers from poor visibility resulting from the attenuation of the propagated light. Traditional salient object detection methods may only segment the object with edge or contour information, while supervised measures largely depend on prior knowledge and training tricks. To solve these problems, an improved underwater color restoration method is utilized to compensate for the red color channel. Both the global and local contrast features are fully considered to obtain a more accurate multi-scale fusion saliency map. Finally, the proposed spatial optimization algorithm is utilized to smooth the fused saliency map, which enhances the spatial coherence information. The proposed algorithm detects the underwater salient object as a complete region, not only its edge and contour information. The algorithm is organized into three parts: Section 2.1 demonstrates the color restoration algorithm for underwater images, Section 2.2 describes the calculation of the multi-scale fusion saliency map, and Section 2.3 demonstrates the spatial optimization algorithm. The whole framework of the proposed FUSOD algorithm is shown in Figure 1. The result of each step of the entire framework can be seen in Figure 2.
2.1. Underwater Color Restoration
The underwater scene differs from common transmission media and is mainly affected by the properties of the target objects and the camera lens characteristics. Research [5] shows that seawater gradually absorbs different wavelengths of light according to the complex circumstances of the sea and the sunlight. Since red light has the longest wavelength, it is the first to be absorbed, followed by orange and yellow. Evidence shows that the green and blue channels are relatively well preserved in underwater images, which means they contain complementary color information that can compensate for the strong attenuation of the red channel.

The green channel and the blue channel are therefore utilized to compensate for the red channel at each pixel. First, the mean value of each color channel is calculated, where $\bar{I}_r$, $\bar{I}_g$, and $\bar{I}_b$ refer to the mean value of the red channel, green channel, and blue channel, respectively:

$\bar{I}_c = \frac{1}{|I|}\sum_{x \in I} I_c(x), \quad c \in \{r, g, b\}$
Then the red color is compensated depending on the values of the blue and green channels (Equation (2)), where $I_g$, $I_b$, and $I_r$ represent the green, blue, and red channels of the image $I$, $I_{rc}$ is defined as the compensated red color channel, and $\alpha$ is a constant parameter appropriate for various illumination conditions and acquisition settings.
Finally, the comprehensively compensated underwater image is obtained by combining the compensated red channel with the original green and blue channels, which is described as $I^{c} = (I_{rc}, I_g, I_b)$.
After the red channel attenuation has been compensated, the Gray-World algorithm is utilized to estimate and compensate for the illuminant color cast, yielding the white-balanced image $I^{wb}$. Finally, a gamma correction filter is utilized to enhance the color contrast of the underwater image:

$I^{res} = \Gamma\left(I^{wb}\right)$

where $\Gamma(\cdot)$ denotes the gamma filter and $I^{res}$ refers to the final pixel-level restored underwater image. The white balance algorithm makes the restored underwater image brighter, and the gamma correction shifts much of the color toward the bright side.
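For clarity, a minimal NumPy sketch of the restoration pipeline described above is given below. The exact compensation equation is not reproduced here; the sketch follows an Ancuti-style red-channel compensation that is consistent with the description, and the function name, the default alpha, and the gamma value are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def restore_underwater_color(img, alpha=0.2, gamma=1.2):
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    r_mean, g_mean, b_mean = r.mean(), g.mean(), b.mean()

    # Compensate the strongly attenuated red channel with information
    # from the better-preserved green and blue channels (sketch only;
    # the exact formula of Equation (2) is not reproduced here).
    r_comp = (r
              + alpha * (g_mean - r_mean) * (1.0 - r) * g
              + alpha * (b_mean - r_mean) * (1.0 - r) * b)
    comp = np.stack([np.clip(r_comp, 0.0, 1.0), g, b], axis=-1)

    # Gray-World white balance: scale each channel so its mean matches
    # the global mean, removing the residual color cast.
    global_mean = comp.mean()
    gains = global_mean / (comp.reshape(-1, 3).mean(axis=0) + 1e-8)
    balanced = np.clip(comp * gains, 0.0, 1.0)

    # Gamma correction (exponent < 1) stretches the colors toward the
    # bright side, as described in the text.
    return np.power(balanced, 1.0 / gamma)
```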
The result of each step of the improved underwater color restoration is shown in Figure 3. The underwater color restoration makes the colors brighter, which is beneficial for the subsequent color contrast saliency calculation and improves the performance of underwater salient object detection.
2.2. Multi-Scale Fusion Saliency
To obtain a more accurate saliency map, both the global and local contrast information of the underwater image are fully considered. The global contrast saliency computes, for each pixel, the color contrast against every quantized color in the whole image, yielding the global saliency extent of each pixel. The local contrast saliency imitates the human eye fixation mechanism, scanning the whole image with a local color contrast calculation window; the saliency extent of each pixel is calculated by comparing the pixel with its surrounding sample pixels. The final multi-scale fusion saliency map is obtained by linear aggregation of the two.
2.2.1. Global Contrast Saliency
Inspired by the assumption [18] that not all colors in the image are useful for the detection task, color quantification is conducted on the restored underwater image $I^{res}$. The color attribute is obtained after quantification, where $C = \{c_1, c_2, \ldots, c_n\}$ represents the quantified RGB color field, $c_i$ corresponds to the $i$-th color, and $n$ denotes the number of colors in the quantified RGB color field. A pixel $p_i$ that has a low frequency of occurrence during the quantification process may adopt a similar color to an adjacent pixel $p_j$ which has a high frequency of occurrence. Color similarity is defined by a color distance $D(p_i, p_j)$ calculated between pixels:
$D(p_i, p_j) = \sqrt{\left(R(p_i) - R(p_j)\right)^2 + \left(G(p_i) - G(p_j)\right)^2 + \left(B(p_i) - B(p_j)\right)^2}$

where $R(p_i)$ represents the red color channel of the pixel $p_i$, and $G(p_i)$ and $B(p_i)$ have the same meaning for the green and blue channels. Then the sparsity $w_i$ of each color is calculated after the color quantification.
Then, each pixel obtains its corresponding color attribute through the color distance calculation $D(p_i, p_j)$. In addition, each pixel obtains its own color sparsity $w_i$ through the color sparsity calculation.
Since the CIELab space best simulates human perception, the global contrast saliency is defined by calculating, in the CIELab space, the color contrast of each pixel against all other pixels in the whole image, which is defined as:

$S_c(p_i) = \sum_{c_j \in C} \left\| Lab(c_i) - Lab(c_j) \right\|_2$

where $c_i$ is the quantized color of pixel $p_i$ and $Lab(\cdot)$ denotes the CIELab value of a quantized color. Then, a Min-Max normalization of $S_c$ is calculated to obtain the normalized color contrast $\hat{S}_c$.
The color quantification and sparsity calculation do not affect the detection precision while reducing the computation of the global color contrast. By multiplying each pixel's color sparsity $w_i$, the final global contrast saliency map $S_{global}$ is obtained. The heat map of the global contrast saliency is shown in Figure 4.
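The following Python sketch illustrates the global contrast computation just described: colors are quantized, the contrast of each quantized color against all others is accumulated in CIELab space, and the result is normalized and mapped back to pixels. It is a simplified approximation rather than the paper's implementation; the number of quantization bins and the folding of the sparsity weighting into a frequency term are assumptions.

```python
import numpy as np
from skimage.color import rgb2lab

def global_contrast_saliency(img, bins=12):
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    # Quantize each RGB channel into `bins` levels and index every pixel
    # by its quantized color.
    q = np.clip((img * bins).astype(int), 0, bins - 1)
    color_idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    labels, counts = np.unique(color_idx, return_counts=True)
    freq = counts / counts.sum()

    # Mean CIELab value of each quantized color that actually occurs.
    lab = rgb2lab(img).reshape(-1, 3)
    flat_idx = color_idx.ravel()
    lab_means = np.stack(
        [lab[flat_idx == lab_id].mean(axis=0) for lab_id in labels])

    # Contrast of color i: frequency-weighted CIELab distance to every
    # other quantized color (the sparsity weighting is approximated by
    # the frequency term in this sketch).
    dists = np.linalg.norm(
        lab_means[:, None, :] - lab_means[None, :, :], axis=-1)
    contrast = (dists * freq[None, :]).sum(axis=1)

    # Min-max normalization, then map color-level saliency back to pixels.
    contrast = (contrast - contrast.min()) / (np.ptp(contrast) + 1e-8)
    lut = dict(zip(labels.tolist(), contrast.tolist()))
    return np.vectorize(lut.get)(color_idx).astype(float)
```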
2.2.2. Local Contrast Saliency
There is much evidence showing that human eye fixation tends to fall on the center of the visual scene while ignoring the surrounding areas. This phenomenon is called the center–surround principle. The local contrast saliency is estimated by a center–surround approach [28,29], which can locate the target object accurately. A rectangular window is established and jointly divided into a center window $W_c$ and a surround window $W_s$. The window is then slid over the whole image to calculate the local contrast saliency. The calculation can be formulated as a Bayesian center–surround model that computes, for each pixel $x$ under a certain feature $f$, the probability of being salient or not. When the sample pixel is salient, it is labeled $x_s = 1$; $p(f \mid W_c)$ refers to the likelihood of the sample pixel's feature in the center window, $p(f \mid W_s)$ refers to the likelihood in the surround window, and $f$ denotes the color feature utilized to calculate the contrast saliency information.
The color contrast in the CIELab color space is calculated by a local Gaussian filter [29]. The center window $W_c$ is squeezed to a single pixel to sparsify the pixel samples and reduce computation. The center window likelihood $p(f \mid W_c)$ can then be expanded as:

$p(f \mid W_c) = \frac{1}{N_c}\sum_{i=1}^{N_c} \exp\left(-\frac{\left\| f - f_i^{c} \right\|^2}{2\sigma^2}\right)$

where $N_c$ is the number of sample pixels in the center window, $\sigma$ is a standard deviation, and $f_i^{c}$ refers to the $f$ feature of the $i$-th sample pixel in the center window $W_c$.
The surround window likelihood $p(f \mid W_s)$ can be expanded in the same way:

$p(f \mid W_s) = \frac{1}{N_s}\sum_{j=1}^{N_s} \exp\left(-\frac{\left\| f - f_j^{s} \right\|^2}{2\sigma^2}\right)$

where $N_s$ refers to the number of sample pixels and $f_j^{s}$ describes the $f$ feature of the $j$-th sample pixel in the surround window $W_s$. The squeezed circle module [29] is used to filter and calculate the likelihood of the surround window of the image. The assumption that the saliency prior is equal for every pixel is adopted, so the prior probability density is set to a constant [28]. Finally, with the Bayesian integration of prior and likelihood information, the local contrast saliency $S_{local}$ is obtained. The heat map of the local contrast saliency is shown in Figure 5.
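As a rough illustration of the center–surround idea, the sketch below approximates the Bayesian formulation with two Gaussian filters of different scales in CIELab space: a pixel is locally salient when its small-scale (center) response differs from its large-scale (surround) statistics. The two sigma values are illustrative assumptions, and the kernel-density likelihoods of the full model are not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.color import rgb2lab

def local_contrast_saliency(img, sigma_center=2.0, sigma_surround=16.0):
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    lab = rgb2lab(img)
    # Smooth each CIELab channel at a small (center) and a large
    # (surround) scale.
    center = np.stack(
        [gaussian_filter(lab[..., c], sigma_center) for c in range(3)],
        axis=-1)
    surround = np.stack(
        [gaussian_filter(lab[..., c], sigma_surround) for c in range(3)],
        axis=-1)

    # A pixel is locally salient when its center response differs strongly
    # from the statistics of its surround window.
    sal = np.linalg.norm(center - surround, axis=-1)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)
```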
2.2.3. Saliency Fusion
Following the aggregation algorithm proposed by A. Borji [11], the multi-scale saliency is fused by linear aggregation. Given a set of $m$ saliency maps $\{S_1, S_2, \ldots, S_m\}$ computed from an image $I$, the aggregated saliency value $S(z)$ at pixel $z$ of $I$ is modeled as the probability:

$S(z) = P\left(y_z = 1 \mid S_1(z), \ldots, S_m(z)\right) \propto \sum_{i=1}^{m} \zeta\left(S_i(z)\right) + \epsilon$

where $S_i(z)$ represents the saliency value of the pixel $z$ in the saliency map $S_i$, $y_z$ is a binary random variable taking the value 1 if $z$ is a salient pixel and 0 otherwise, and $\epsilon$ is a constant. The paper [11] implemented three different options for the function $\zeta$ in Equation (16).
For efficient calculation, a modified fusion strategy of the global and local contrast saliency $S_{fusion}$ is conducted by linear aggregation with a weighting parameter $\beta$:

$S_{fusion} = \beta \, S_{global} + (1 - \beta) \, S_{local}$

where $\beta$ is a trade-off parameter for balancing the global and local contrast saliency. The parameter $\beta$ is set to 0.3 for the underwater domain-specific field after extensive trials.
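A minimal sketch of this fusion step is shown below; whether $\beta$ weights the global or the local map in Equation (17) is an assumption of the sketch.

```python
import numpy as np

def fuse_saliency(s_global, s_local, beta=0.3):
    """Linearly aggregate the normalized global and local saliency maps."""
    fused = beta * s_global + (1.0 - beta) * s_local
    # Re-normalize the fused map to [0, 1].
    return (fused - fused.min()) / (np.ptp(fused) + 1e-8)
```

With beta set to 0.3, the local map contributes more, which is consistent with the observation in Section 3.3 that the local contrast saliency provides more location information.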
2.3. Spatial Coherence Optimization
Traditional salient object detection methods often detect the object with vague edge points or corner information, without the complete body of the salient object. Super-pixel segmentation is therefore utilized to preserve the spatial coherence information, and an energy minimization function is used to optimize the saliency between neighboring super-pixel regions.
With the assumption that human vision tends to perceive similar perceptual parts of the scene as a whole, the super-pixels used in this paper are treated as such perceptual parts, which not only imitates the human eye mechanism but also preserves the spatial coherence information of the image. The SLIC algorithm is utilized to segment the image into multiple super-pixels $\{sp_1, sp_2, \ldots, sp_K\}$, and each super-pixel contains complete edge and spatial coherence information. Meanwhile, the pixels within a super-pixel have a union compatibility property, which means each pixel inside a super-pixel can be represented as a linear or affine combination of all the other points. Since the pixels in a super-pixel attract human eye fixation in a similar way, the mean saliency value is obtained by averaging the pixel-level saliency inside each super-pixel, which gives the super-pixel level saliency $S_{sp}$:

$S_{sp}(sp_k) = \frac{1}{|sp_k|} \sum_{z \in sp_k} S_{fusion}(z)$

where $sp_k$ represents the super-pixel region and $z$ denotes a specific point in the super-pixel region.
Since the persistence of vision on an image varies from person to person, the generated saliency map can only provide a kind of prior information about the salient object; it cannot describe the whole body of the object. We therefore smooth the previously calculated super-pixel saliency values toward the complete region of the salient object with an energy minimization function $E(S)$, under the hypothesis that the saliency values of two adjacent super-pixels should not differ too much if they have similar color features. The super-pixel saliency map is optimized by the smoothness term of the energy minimization function. The compactness restriction of the smoothness term in the RGB space makes the foreground of the saliency map stand out more from the background, and the saliency gradient between adjacent super-pixels does not change too much:

$E(S) = \sum_{k} \left(S_k - S_{sp}(sp_k)\right)^2 + \sum_{(k, l) \in \mathcal{N}} w_{kl} \left(S_k - S_l\right)^2$

where $E(S)$ represents the energy minimization function, $S_{sp}(sp_k)$ denotes the mean saliency value of each super-pixel calculated before, $\mathcal{N}$ is the set containing pairs of adjacent super-pixels $sp_k$ and $sp_l$, $c_k$ and $c_l$ correspond to the mean RGB values of the two adjacent super-pixels, and $d(c_k, c_l)$ calculates the color distance between the two adjacent super-pixel regions. The weight $w_{kl}$ decreases with the color distance $d(c_k, c_l)$, so it is larger if the color attributes of the two adjacent super-pixels are similar.
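The sketch below illustrates the optimization just described, assuming SLIC super-pixels, a Gaussian color-similarity weight, and a simple Jacobi-style iteration of the quadratic energy; the number of segments, the weighting parameters, and the iteration count are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from skimage.segmentation import slic

def spatial_coherence_optimization(img, sal, n_segments=300,
                                   lam=0.5, sigma_color=0.1, iters=20):
    """img: float RGB in [0, 1]; sal: fused pixel-level saliency map."""
    labels = slic(img, n_segments=n_segments, start_label=0)
    n = labels.max() + 1

    # Super-pixel level saliency and mean RGB color.
    sp_sal = np.array([sal[labels == k].mean() for k in range(n)])
    sp_rgb = np.array([img[labels == k].mean(axis=0) for k in range(n)])

    # Adjacency from horizontally / vertically neighboring pixels.
    adj = np.zeros((n, n), dtype=bool)
    adj[labels[:, :-1], labels[:, 1:]] = True
    adj[labels[:-1, :], labels[1:, :]] = True
    adj |= adj.T
    np.fill_diagonal(adj, False)

    # Smoothness weights: large when adjacent super-pixels have similar color
    # (a Gaussian of the color distance is assumed here).
    color_dist = np.linalg.norm(sp_rgb[:, None] - sp_rgb[None, :], axis=-1)
    w = np.where(adj, np.exp(-color_dist ** 2 / (2 * sigma_color ** 2)), 0.0)

    # Jacobi-style iteration minimizing
    # E(S) = sum_k (S_k - sp_sal_k)^2 + lam * sum_{k~l} w_kl (S_k - S_l)^2.
    s = sp_sal.copy()
    for _ in range(iters):
        s = (sp_sal + lam * (w @ s)) / (1.0 + lam * w.sum(axis=1))

    # Project the optimized super-pixel saliency back to the pixel grid.
    out = s[labels]
    return (out - out.min()) / (np.ptp(out) + 1e-8)
```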
Finally, the saliency map of the fusion underwater salient object detection algorithm is obtained by iterating the energy minimization function. The final fused underwater saliency map optimized by the energy minimization function performs better than traditional salient object detection methods: the saliency gradient between adjacent super-pixels does not change abruptly, and the foreground of the object in the final saliency map stands out from the background. The spatial coherence optimization step thus makes the final detection more accurate. The result of each step of the proposed FUSOD algorithm is shown in Figure 6. The heat map of each step of the entire framework shows that the final segmented object preserves spatial structure information thanks to the proposed spatial optimization algorithm.
3. Results
In this section, we first briefly describe the experiment implementation details, the benchmark dataset, and the evaluation metrics. Then, experiments on the publicly available dataset are conducted to evaluate and analyze the performance of the proposed fusion underwater salient object detection algorithm. Both qualitative and quantitative evaluation experiments are conducted to verify the proposed FUSOD algorithm. The proposed underwater color restoration algorithm and spatial coherence optimization algorithm can serve as plug-in modules for any marine optical detection task; their contributions are analyzed through ablation experiments. Experiments are also conducted to analyze the effect of the hyper-parameters set in the underwater color restoration and the multi-scale saliency fusion algorithm.
The proposed and compared saliency methods are run on an Intel i5 8400 CPU with MATLAB R2020b. The official hyper-parameter settings of the benchmark [11] are followed, so the experiment results on this dataset are credible.
USOD [30] is a challenging dataset for evaluating underwater SOD methods. The dataset combines subsets of three underwater datasets, namely USR-248 [31], UIEB [32], and EUVP [33]. It contains 300 natural underwater images which are exhaustively compiled to ensure diversity in the object categories, waterbody, optical distortions, and aspect ratio of the salient objects. The combination of underwater datasets makes the USOD dataset more complex and difficult for underwater salient object detection tasks. The dataset provides binary segmentation images as ground truth for the evaluation of segmentation tasks.
The quantitative evaluation of the salient object detection algorithm against other traditional saliency methods is conducted based on the following widely used evaluation criteria:
- (1)
Precision-Recall (PR): PR is a standard performance metric and is complementary to mean absolute error. It is evaluated by binarizing the predicted saliency maps with a threshold sliding from 0 to 255 and then performing a bin-wise comparison with the ground truth values. A saliency map $S$ is first converted to a binary mask $M$, which is compared with the ground truth $G$:

$Precision = \frac{|M \cap G|}{|M|}, \quad Recall = \frac{|M \cap G|}{|G|}$
Achanta [15] proposes an image-dependent adaptive threshold for binarizing $S$, which is computed as twice the mean saliency of $S$:

$T_a = \frac{2}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} S(x, y)$
- (2)
F-measure: F-measure is an overall performance measurement computed as the weighted harmonic mean of the precision and recall, where the parameter $\beta^2$ is often set to 0.3:

$F_{\beta} = \frac{\left(1 + \beta^2\right) \cdot Precision \cdot Recall}{\beta^2 \cdot Precision + Recall}$

A larger F-measure score indicates better performance.
- (3)
MAE: MAE is defined as the average pixel-wise absolute error between the predicted saliency map $S$ and the binary ground truth $G$, both normalized in the range $[0, 1]$. A smaller MAE indicates better performance.

$MAE = \frac{1}{H \times W} \sum_{y=1}^{H} \sum_{x=1}^{W} \left| S(x, y) - G(x, y) \right|$

where $H$ and $W$ refer to the height and width of the saliency map, respectively.
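For reference, the sketch below computes the adaptive-threshold precision, recall, F-measure (with beta squared set to 0.3), and MAE for a single image; a full PR curve would repeat the precision/recall computation while sweeping the binarization threshold from 0 to 255. The helper name and the epsilon terms are illustrative assumptions.

```python
import numpy as np

def evaluate(sal, gt, beta2=0.3):
    """sal: saliency map; gt: binary ground-truth mask, both in [0, 1]."""
    sal = (sal - sal.min()) / (np.ptp(sal) + 1e-8)
    gt = gt > 0.5

    # Image-dependent adaptive threshold: twice the mean saliency value.
    mask = sal >= 2.0 * sal.mean()

    tp = np.logical_and(mask, gt).sum()
    precision = tp / (mask.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    f_measure = ((1 + beta2) * precision * recall
                 / (beta2 * precision + recall + 1e-8))

    # Mean absolute error between the normalized saliency map and the mask.
    mae = np.abs(sal - gt.astype(float)).mean()
    return precision, recall, f_measure, mae
```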
To improve the readability of the paper, a nomenclature table of the abbreviations used in the algorithm is presented in Table 1. Each line describes an abbreviation used in the subsequent experiments.
3.1. Experiment of Fusion Underwater Salient Object Detection Algorithm
Eight traditional saliency methods are chosen from the state-of-the-art benchmark: CA [34], COV [35], FES [29], SEG [28], SeR [36], SIM [37], SUN [38], and SWD [39]. Details of the qualitative evaluation and the quantitative evaluation are given below.
3.1.1. Qualitative Evaluation
The proposed fusion underwater salient object detection algorithm (FUSOD) and the eight compared salient object detection measures are evaluated on different underwater scenes to imitate the complex natural underwater environment and to evaluate the robustness of the algorithms. Several representative results are analyzed below to illustrate the detection problems of traditional SOD methods. Qualitative experiment results can be found in Figure 7.
As mentioned before, a vague foreground and incomplete detection results are the main drawbacks of traditional SOD algorithms. From the qualitative results above, the traditional bottom-up saliency method CA detects only edge and contour information without the complete body of the salient object. The COV and FES methods focus on locating the object but produce vague detection results; however, this coarse location information can still be utilized as prior knowledge in many computer vision tasks. SEG uses a CRF to preserve spatial coherence information, but yields low detection quality on some images and may detect non-salient background regions. SeR and SIM segment a blurry foreground containing a lot of background information. The SUN method utilizes bottom-up saliency to predict human eye fixation incorporated with top-down information; however, the result contains many points arising from background statistics. The SWD method also focuses on predicting human eye fixation while missing the complete target object; it utilizes biological mechanisms and evaluates spatially weighted dissimilarity, so its detection results concentrate on the center of the view. The proposed fusion underwater salient object detection algorithm (FUSOD) generates a better saliency map, benefiting from the color restoration algorithm, the multi-scale contrast calculation, and the spatial coherence optimization algorithm. The saliency maps generated by the proposed algorithm show that both the location and the complete shape of the object are detected accurately.
3.1.2. Quantitative Evaluation
Standard evaluations, such as the PR curve, F-measure, and MAE, are utilized to evaluate the quantitative performance of the SOD algorithms in the underwater domain-specific field. Quantitative evaluation results are presented in Figure 8, Figure 9, and Table 2. Experiment results show that SIM, SUN, and SeR have comparatively low performance because they only use local saliency information without an optimization refinement process. Consistent with the qualitative results above, SIM detects a lot of background information, which increases the MAE score and indicates false detections. The FES method has comparatively good localization capability but still produces many erroneous responses. CA combines local and global features to detect the most salient eye fixation points but loses the spatial information of the target object. The combination of multi-scale fusion saliency and spatial coherence optimization allows the proposed FUSOD algorithm not only to preserve the complete region of the object body but also to achieve high detection precision.
3.2. Ablation Study of Underwater Color Restoration
3.2.1. Qualitative and Quantitative Analysis of Underwater Color Restoration
The proposed fusion underwater salient object detection algorithm aims at accurately segmenting the most salient object in a domain-specific area with its complete body. The preprocessing of the turbid image is important for the subsequent color contrast calculation. Therefore, we set up an ablation experiment to analyze the effect of the color restoration algorithm on the proposed FUSOD algorithm. The detailed experiment results are shown in Figure 10, Figure 11, and Table 3.
The qualitative experiment results above show that the restored underwater image is somewhat brighter than the original view. The corrected colors move toward the bright side, which benefits the subsequent color contrast calculation.
The red plot (FUSOD) represents the proposed fusion underwater salient object detection algorithm with the color restoration preprocessing step, while the blue one (FUSOD-ucr) denotes the fusion underwater SOD algorithm without the underwater color restoration preprocessing. The quantitative evaluation results show that the detection precision is slightly lower if the color restoration preprocessing is omitted from the whole framework. The color feature is used throughout the detection process, including the global contrast calculation, the local contrast calculation, and the spatial coherence optimization. Underwater images are distinct from common images, and color attenuation has an important effect on optics-based applications and research. The color restoration algorithm does not hugely enhance the detection performance, but the color compensation and correction benefit many downstream tasks.
3.2.2. Experiment of Hyper-Parameter in Underwater Color Restoration
To find the best $\alpha$ in Equation (2), an experiment is set up with different values of $\alpha$. From the PR curve and F-measure score in Figure 12 and Table 4, the best $\alpha$ may be set to 0.2. The experiment results demonstrate that the detection performance changes only slightly while the hyper-parameter $\alpha$ varies from 0.1 to 0.5. Evidence [5] shows that $\alpha$ equal to 0.1 may give a better color restoration result. We set $\alpha$ to 0.2 according to the experiment and the underwater domain-specific field.
3.3. Experiment of Hyper-Parameter in Multi-Scale Saliency Fusion
The constant parameter $\beta$ in Equation (17) is a trade-off parameter used to balance the multi-scale contrast saliency. The heat map of each step in the entire framework shows that the local contrast saliency may provide more location information than the global contrast saliency. The PR curve and F-measure in Figure 13 and Table 5 demonstrate that $\beta$ may be set to 0.3 to obtain the best performance.
3.4. Ablation Study of Spatial Coherence Optimization
The proposed spatial optimization algorithm combines super-pixel segmentation with an energy minimization function to optimize the coarse multi-scale fusion saliency. The method enhances the spatial coherence and makes the final saliency map segment the complete body of the salient object. It follows the physical rule that the whole system is stable when the value of the energy function is lowest, so the final FUSOD result is optimized by iterating the energy minimization. We set up an ablation experiment to compare the coarse pixel-level multi-scale fusion saliency, the super-pixel level saliency without energy minimization, and the final optimized FUSOD algorithm. Experiment results can be found in Figure 14, Figure 15, and Table 6.
FUSOD refers to the final optimized fusion underwater salient object detection algorithm, FUSOD-p denotes the pixel-level coarse multi-scale fusion saliency map, and FUSOD-sp represents the super-pixel level saliency method without energy minimization optimization. From the qualitative evaluation results presented in Figure 14, the coarse multi-scale fusion saliency segments the salient object using global contrast information and local eye fixation features; this coarse saliency map is computed pixel by pixel from the image. Its foreground segmentation covers more of the full extent of the salient object than the traditional bottom-up saliency methods, meaning that both fixation points and region information are detected from the image. The super-pixel level saliency map preserves the spatial coherence information. However, both of the former saliency maps contain many false background detections, which lead to low precision and a comparatively high MAE score. Finally, with the help of the energy minimization optimization, the saliency map produced by the FUSOD algorithm segments a foreground that clearly stands out from the background, and both the foreground and background regions are smooth inside.
The quantitative evaluation results above show that the coarse multi-scale fusion saliency map (FUSOD-p) has low performance compared with the other two methods. The MAE column shows that the super-pixel level saliency segmentation (FUSOD-sp) achieves a comparatively high F-measure but retains many false detections. The optimized FUSOD algorithm achieves high detection precision with a low MAE score.