1. Introduction
In navigation, search, and tracking tasks in marine environments, infrared target detection technology plays a crucial role owing to advantages such as long detection range and high concealment [1, 2, 3]. Infrared imaging systems can obtain distance and shape information by receiving thermal radiation [4], and then produce infrared images for different tasks through subsequent processing. However, constrained by detector performance, complex weather conditions, and the inherent fluctuations of the marine environment, infrared images typically have only a narrow grayscale range, with low contrast and a limited signal-to-noise ratio at long imaging distances, which significantly increases the difficulty of target detection.
In infrared ship detection tasks, targets can be roughly divided into point targets or small targets (isotropic, with an area of less than 9 × 9 pixels), area targets (typically having some shape and contour information and a relatively uniform grayscale distribution, but lacking texture), and larger targets (larger area, with rich texture and contour information). For point target detection, researchers have exploited the isotropic shape and strong contrast of point targets, developing a series of efficient and reliable methods such as curvature filtering [5, 6] and the Local Contrast Measure (LCM) [7, 8]. Larger targets usually occupy a significant area in the image and have rich contour and texture details; they are easier to detect than point and area targets, but extracting the entire ship completely remains a challenge.
Area target ships typically appear as patches with a uniform grayscale distribution and regular shapes in infrared images. If the grayscale of the ship is greater (smaller) than that of its local surroundings, it is called a bright (dark) polarity target. Existing detection methods for area target ships can be roughly divided into histogram-based methods, background modeling methods, feature-based methods, and deep learning-based methods. Histogram-based detection methods rely on the grayscale distribution of pixels in the whole image, dividing the image into foreground and background categories under certain criteria. For different scenarios, researchers have developed various histogram transformation methods to adjust the grayscale distribution of images [4, 9, 10] and enhance the contrast of targets, and then used methods such as Otsu [11, 12], maximum entropy [13, 14], and their improved forms for segmentation. In addition, clustering methods [15, 16] have been introduced into infrared ship detection tasks to achieve complete extraction of larger targets. Mean shift has been used to smooth infrared images, enhancing the contrast of ship targets [17, 18]. However, these methods often place strict requirements on the grayscale distribution of the image and the target, making it difficult to determine a reasonable segmentation threshold in complex scenarios with unknown target grayscale polarity, interference similar to real targets, or an irregular histogram distribution. Background modeling methods estimate the background in various ways to separate the target from it. The Infrared Patch Image (IPI) model [19] is based on the assumption that the target is sparse and the background is low-rank; thus, by dividing, re-organizing, decomposing, and reconstructing the image, small-sized targets in a stable background can be detected. Subsequently, researchers improved the reconstruction and decomposition methods of sub-images to enhance the detection capability of the IPI model in complex scenarios [20, 21]. Apart from this, Gaussian mixture models have been used to model the background [22, 23, 24], achieving prediction and reproduction of dynamic backgrounds. These methods are reasonably robust in scenarios with severe fluctuations but struggle to distinguish irregular fish scale reflections, sun-glint, and other interference in complex scenarios.
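As a minimal illustration of the histogram-based segmentation discussed above, the following sketch computes an Otsu threshold by maximizing the between-class variance of the gray-level histogram. It is a textbook formulation, not the implementation of any cited work; the function name `otsu_threshold` and the two toy pixel lists are assumptions made purely for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the low class, pixels > t the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # low class still empty
        w1 = total - w0
        if w1 == 0:
            break             # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical toy data: 100 pixels, 10 of which belong to the ship.
bright_ship = [10] * 90 + [200] * 10   # bright ship on a dark sea
dark_ship = [200] * 90 + [10] * 10     # dark ship on a bright sea

t_bright = otsu_threshold(bright_ship)
t_dark = otsu_threshold(dark_ship)
above_bright = sum(p > t_bright for p in bright_ship)
above_dark = sum(p > t_dark for p in dark_ship)
```

In this toy case the rule "foreground = pixels above the threshold" selects the 10 ship pixels for the bright ship but the 90 background pixels for the dark ship, which is one way an unknown target polarity complicates threshold-based segmentation.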
In infrared images, ships often have one or more features that distinguish them from the background. By quantitatively analyzing these features and segmenting images according to certain criteria, feature-based detection algorithms can be formed. Top-Hat filtering was first introduced into infrared ship detection tasks in [25], using morphological filters to suppress clutter in the image while preserving bright blobs, thereby detecting bright ships. On this basis, researchers have improved the morphological operations, for example by using annular structural elements to enhance the detection capability for small-sized targets [26, 27] or by introducing multi-scale algorithms to adaptively detect targets of different sizes [28]. In addition, features such as contours [29, 30] and gradients [31, 32, 33] have been widely used in ship target detection, and many researchers combine multiple features [15, 34] to enhance the robustness of algorithms in different scenarios. Dark ships have also been studied in depth. Dong et al. [35] calculated saliency maps with an inverse difference of Gaussians filter, making dark blobs stand out in the saliency map, and then extracted potential ships by segmenting the saliency map with an adaptive threshold. However, this method struggles to distinguish ships from narrow dark bands on the sea surface or small-sized fish scale patterns with clear edges.
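The top-hat idea mentioned above can be sketched on a single image row. The white top-hat (signal minus its morphological opening) keeps bright structures narrower than the structuring element, and its dual, the black top-hat (closing minus signal), keeps narrow dark structures. This is a generic one-dimensional sketch with a flat structuring element of half-width `r`; the helper names and the toy row are assumptions for illustration and do not reproduce the filters of the cited works.

```python
def erode(signal, r):
    """Flat erosion: minimum over a window of half-width r (truncated at borders)."""
    n = len(signal)
    return [min(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(signal, r):
    """Flat dilation: maximum over a window of half-width r (truncated at borders)."""
    n = len(signal)
    return [max(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def white_tophat(signal, r):
    """Signal minus its opening: responds to narrow bright structures."""
    opened = dilate(erode(signal, r), r)
    return [s - o for s, o in zip(signal, opened)]


def black_tophat(signal, r):
    """Closing minus the signal: responds to narrow dark structures."""
    closed = erode(dilate(signal, r), r)
    return [c - s for c, s in zip(closed, signal)]


# Hypothetical row: sea level 50, a 2-pixel bright blob (90), a 2-pixel dark blob (20).
row = [50] * 5 + [90, 90] + [50] * 5 + [20, 20] + [50] * 5

bright_response = white_tophat(row, 2)  # non-zero only at the bright blob
dark_response = black_tophat(row, 2)    # non-zero only at the dark blob
```

Because each response is non-zero for only one polarity, a detector built on a single top-hat variant implicitly assumes the polarity of the ship.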
By using Grayscale Morphological Reconstruction (GMR) operations to preserve and suppress bright or dark blobs in infrared images, Li et al. [36] achieved parallel detection of bright and dark ships. However, this method also has difficulty determining segmentation thresholds in complex scenarios with “significant” interference. To address the interference of island and reef backgrounds, Chen et al. [37] calculated an improved structural tensor of the multi-scale grayscale morphological reconstruction results of the original infrared image as a guide, merging the prominent regions in the Gaussian-filtered image to detect bright polarity ships of different sizes. However, the improved structural tensor proposed in that work struggles to distinguish ships from fish scale pattern interference of similar size. Ding et al. [4] proposed an improved histogram equalization combined with gradient information (MHEEF) to preprocess infrared images of backlit scenes, enhancing the contrast of dark ships, and then used a dual-scale, dual-mode Local Contrast Measure (LCMDSM) to extract targets. All of the above methods use only single-frame image information to detect ships. Considering that ships, as man-made objects, exhibit temporal and spatial stability in continuously captured image sequences, Wang et al. [31] proposed an improved wavelet transform to suppress time-varying background clutter while tracking stable ships with pipeline filtering. Similarly, Li et al. [38] first extracted the Maximally Stable Extremal Regions (MSERs) in the infrared image and then suppressed clutter through region matching between adjacent frames, so that stable bright and dark polarity ships were detected at the same time. However, when the sea surface fluctuates violently and the ship is small, it is challenging to stably match the MSERs containing ship targets that are extracted directly from different frames, which may lead to frequent missed detections. Zhang et al. [39] developed a “detection-tracking-detection” method for detecting small-sized bright ships in infrared images, first extracting regions where targets may exist in single-frame images by difference of Gaussians filtering and adaptive threshold segmentation, and then suppressing interference through continuous frame matching. In addition, a re-detection step for potentially missed targets was designed to further improve the robustness of the method.
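The multi-frame idea described above, keeping only candidates that reappear at nearly the same place in consecutive frames, can be sketched as follows. This is a simplified illustration and not the matching scheme of any cited work: candidates are assumed to be axis-aligned boxes `(x0, y0, x1, y1)`, overlap is measured by intersection over union, and the names `stable_candidates`, `min_iou`, and `min_hits` as well as their default values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def stable_candidates(frames, min_iou=0.3, min_hits=3):
    """Keep boxes of the last frame that overlap a box in enough earlier frames."""
    *history, current = frames
    kept = []
    for box in current:
        # Count earlier frames containing at least one sufficiently overlapping box.
        hits = sum(
            any(iou(box, prev) >= min_iou for prev in frame)
            for frame in history
        )
        if hits >= min_hits:
            kept.append(box)
    return kept


# Hypothetical candidates per frame: one slowly drifting ship plus transient clutter.
frames = [
    [(10, 10, 20, 16), (40, 5, 44, 8)],
    [(11, 10, 21, 16), (70, 30, 73, 33)],
    [(11, 11, 21, 17)],
    [(12, 11, 22, 17), (55, 20, 58, 23)],
]
detections = stable_candidates(frames)
```

Here the drifting ship box is retained because it overlaps a candidate in all three earlier frames, whereas the clutter box in the last frame has no counterpart and is discarded. When small ships on a rough sea produce unstable regions, the overlap test fails more often, which corresponds to the missed detections noted above.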
As a trend in many research fields, deep learning-based methods are data-driven: through data annotation, reward and punishment mechanisms, and iterative training, neural networks learn to mine and refine deeper, more abstract features from a large amount of data and eventually achieve efficient and accurate detection. Early on, such methods were mainly applied to infrared target detection tasks with a space-based observer [40, 41]. In 2018, Zhou et al. [42] proposed a one-stage network to learn features from multi-resolution infrared images, achieving reliable detection of ships in large infrared images. In 2022, Long et al. [43] introduced a visual attention mechanism into the YOLOv5 network architecture and used dilated convolution to enlarge the receptive field, achieving the recognition of infrared ships against a calm sea surface with islands and reefs. By combining a manually designed feature extractor with deep learning methods, Yao et al. [44] designed a multi-dimensional information fusion network to accurately identify small-sized bright ships in infrared images. In 2023, Deng et al. [45] published FMR-YOLO, an infrared ship rotating target detection algorithm, in which a Weighted Feature Pyramid Network Based on Extended Convolution (DWFPN) was proposed and rotation detection was introduced, achieving an average accuracy of up to 92.7%. Considering the complexity of deep learning methods and the difficulty of deploying them on small devices, Gao et al. [46] proposed a lightweight model for detecting infrared ships by replacing the backbone of YOLOv5 with MobileNetV3, which greatly improved computational efficiency and achieved the same detection performance as the YOLOv5m model while reducing the parameter size by 83%. In 2024, Wang et al. [47] proposed an improved detection model based on YOLOv5s for detecting infrared ship targets in coastal areas with high ship density and significant differences in target scale; a feature fusion module was designed to strengthen the feature fusion of the network, and SPD-Conv and Soft-NMS were adopted to improve the detection accuracy of small targets in low-resolution images and to reduce missed detections under dense occlusion. In addition to improving the model design, Wang et al. [48] introduced infrared multi-band fusion technology to improve detection accuracy with fewer parameters, achieving inference speeds close to 60 frames per second on embedded devices. Apart from this, many researchers [49, 50, 51] have applied deep learning-based methods to ship detection in infrared remote sensing images, continuously improving model performance and detection results. However, deep learning-based methods need a large amount of training data to ensure the reliability of the neural network. For example, the authors of [43] reported using 4079 out of 4533 infrared images for training and testing on the remaining images. On the one hand, publicly available infrared sea surface image datasets vary greatly and are limited in number. On the other hand, manually annotating a large number of infrared images with poor contrast, missing target details, and a low signal-to-noise ratio requires considerable manpower and resources, which still poses challenges for deep learning-based methods [37].
In summary, the current challenges in detecting infrared ships are as follows. First, most existing methods are designed for relatively simple scenarios with calm seas and usually assume a single grayscale polarity for the target, whereas in actual sea surface scenarios the polarity of ship targets in infrared images is often unknown owing to variations in sea conditions, illumination, and detector position, which may result in missed detections. Second, in complex scenarios there may be interference of different sizes, such as islands, artificial structures, bright and dark bands, fish scale patterns, and even clouds, which may be more prominent than real targets in the saliency maps of various features. As a result, methods such as adaptive threshold segmentation or Otsu may struggle to find segmentation parameters that balance accuracy and completeness. Finally, in some scenarios the temporal and spatial stability of ships is not fully utilized, although these features could assist infrared ship detection tasks.
In this paper, we make the following assumptions regarding area target ships in maritime scenarios:
- (1) Ship Polarity: Ships can exhibit either bright polarity or dark polarity. Specifically, their grayscale values are either higher (bright) or lower (dark) than those of the local background.
- (2) Uniform Grayscale Distribution: The grayscale distribution of ships is uniform across the infrared image sequences.
- (3) Temporal and Spatial Stability: Ships demonstrate temporal and spatial stability in infrared image sequences. In other words, their grayscale distribution and shape remain nearly constant over time.
To address the issues above, this paper first proposes an infrared image smoothing method that combines Grayscale Morphological Reconstruction (GMR) and Relative Total Variation (RTV) to suppress noise and enhance the contrast of ships. Subsequently, the Maximally Stable Extremal Regions (MSERs) in the image are extracted as candidate targets. Finally, shape features and spatiotemporal characteristics are integrated to discriminate between ships and interference, achieving the detection of bright and dark ships in complex scenarios.
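How morphological reconstruction can keep blobs of one polarity while suppressing the other can be sketched in one dimension. The code below uses the standard textbook opening by reconstruction and closing by reconstruction; whether these match the exact GMR operators used in this paper is an assumption, and the function names, the half-width `r`, and the toy row are hypothetical.

```python
def _erode(f, r):
    """Flat erosion with a window of half-width r (truncated at borders)."""
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def _dilate(f, r):
    """Flat dilation with a window of half-width r (truncated at borders)."""
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening_by_reconstruction(f, r):
    """Remove bright structures narrower than the window; keep everything else intact."""
    marker = _erode(f, r)
    while True:
        # Geodesic dilation: grow the marker by one pixel, but never above f.
        grown = [min(d, m) for d, m in zip(_dilate(marker, 1), f)]
        if grown == marker:
            return marker
        marker = grown


def closing_by_reconstruction(f, r):
    """Remove dark structures narrower than the window; keep everything else intact."""
    marker = _dilate(f, r)
    while True:
        # Geodesic erosion: shrink the marker by one pixel, but never below f.
        shrunk = [max(e, m) for e, m in zip(_erode(marker, 1), f)]
        if shrunk == marker:
            return marker
        marker = shrunk


# Hypothetical row: sea level 50, a narrow bright blob (90) and a narrow dark blob (20).
sea_row = [50] * 5 + [90, 90] + [50] * 5 + [20, 20] + [50] * 5

dark_channel = opening_by_reconstruction(sea_row, 2)    # bright blob flattened
bright_channel = closing_by_reconstruction(sea_row, 2)  # dark blob flattened
```

In this toy row, the opening by reconstruction flattens the bright blob and leaves the dark blob untouched, while the closing by reconstruction does the opposite, so the two outputs can serve as separate dark and bright channels.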
4. Conclusions
In this paper, we introduce a novel infrared image smoothing technique that combines GMR and RTV. In addition, a detection method that considers the grayscale uniformity of ships and integrates shape and spatiotemporal features is established for detecting bright and dark ships in complex maritime scenarios. First, the input infrared images undergo OGMR (CGMR) to preserve dark (bright) blobs while suppressing those of the opposite polarity, and the result is then smoothed with RTV to reduce clutter and enhance the contrast of the ship. Subsequently, Maximally Stable Extremal Regions (MSERs) are extracted from the smoothed image as candidate targets, and the results from the bright and dark channels are merged. Shape features are then used to eliminate clutter interference, yielding single-frame detection results. Finally, exploiting the spatiotemporal stability of ships and the fluctuation of clutter, true targets are identified through a multi-frame matching strategy. Experimental results demonstrate that the proposed method outperforms ITDBE, MRMF, and TFMSER in seven image sequences, achieving accurate and effective detection of bright and dark polarity ship targets. Our method avoids adaptive threshold segmentation, which may struggle in complex maritime scenes. Instead, RTV is introduced into the preprocessing of infrared images to better suppress fish scale patterns and improve ship detection, and underutilized features such as grayscale uniformity and spatiotemporal stability are combined, which we hope provides new ideas for infrared ship detection tasks.
Despite the good detection results of the proposed method, some shortcomings remain. First, the proposed method is aimed at ships with distinct polarities (bright and dark) and may exhibit unsatisfactory performance when encountering ships with an uneven grayscale distribution, potentially leading to incomplete or missed detections. Future work could incorporate methods such as watershed segmentation [56] and region growing [57] to refine the detection of ships with uneven grayscale distributions. Second, the proposed method avoids adaptive threshold segmentation because of its limited adaptability in complex scenarios, so a simple and effective evaluation mechanism for describing the scenario and adjusting the parameters of the method is still needed. For example, an evaluation method based on the statistics of image blocks, as in [18], could adaptively guide mean shift filters. We will therefore focus on scenario evaluation mechanisms as a key direction for future studies. Lastly, with the rapid advancement of deep learning, we hope to integrate deep learning-based methods into infrared ship target detection tasks. This may involve using deep learning to refine and summarize the shape features of targets, potentially replacing traditional manually designed features to achieve more effective and robust ship detection. Furthermore, using deep learning to explore image features at a deeper and more abstract level enables the extraction of richer semantic information, which can help distinguish different components within complex maritime scenarios (such as the sea surface, islands, and the sky) and construct image models and field data that accurately describe such scenarios. These insights may provide new ideas for more effective and robust detection.