Article

LH-YOLO: A Lightweight and High-Precision SAR Ship Detection Model Based on the Improved YOLOv8n

1 School of Microelectronics, Xi’an Jiaotong University, Xi’an 710049, China
2 School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4340; https://doi.org/10.3390/rs16224340
Submission received: 1 October 2024 / Revised: 15 November 2024 / Accepted: 16 November 2024 / Published: 20 November 2024

Abstract

Synthetic aperture radar (SAR) is widely applied to ship detection because it generates high-resolution images under diverse weather conditions and offers penetration capabilities, making SAR images a valuable data source. However, detecting multi-scale ship targets in complex backgrounds leads to false positives and missed detections, posing challenges for lightweight and high-precision algorithms. There is an urgent need to improve both the accuracy and the deployability of such algorithms. This paper introduces LH-YOLO, a YOLOv8n-based, lightweight, and high-precision SAR ship detection model. We propose a lightweight backbone network, StarNet-nano, and employ element-wise multiplication to construct a lightweight feature extraction module, LFE-C2f, for the neck of LH-YOLO. Additionally, a reused and shared convolutional detection (RSCD) head is designed using a weight sharing mechanism. These enhancements significantly reduce model size and computational demands while maintaining high precision. LH-YOLO features only 1.862 M parameters, representing a 38.1% reduction compared to YOLOv8n. It exhibits a 23.8% reduction in computational load while achieving a mAP50 of 96.6% on the HRSID dataset, which is 1.4% higher than YOLOv8n. Furthermore, it demonstrates strong generalization on the SAR-Ship-Dataset with a mAP50 of 93.8%, surpassing YOLOv8n by 0.7%. LH-YOLO is well-suited for environments with limited resources, such as embedded systems and edge computing platforms.

1. Introduction

SAR images are generated by synthetic aperture radar systems with the advantage of being independent of light and weather conditions, thereby enabling imaging at all times, regardless of environmental factors [1]. Unlike optical images, SAR actively emits electromagnetic waves and receives reflected signals from targets, allowing for image generation without relying on sunlight or other natural light sources [2]. SAR uses microwaves with long wavelengths and strong penetration capabilities, allowing it to penetrate clouds, fog, and even vegetation. Consequently, SAR images have significant advantages in detection tasks and are widely applied in ocean monitoring, disaster response, surface deformation, and various other fields. Compared to optical images, SAR images provide more reliable data under adverse conditions, making them particularly suitable for long-term and large-scale monitoring tasks. SAR ship detection systems are applied to maritime surveillance, and their significance is increasing rapidly [3,4,5]. The ship detection algorithms for SAR images can be divided into traditional algorithms and deep learning algorithms [6].
In SAR ship detection, one of the most classical methods among traditional algorithms is the constant false alarm rate (CFAR) [7], primarily focusing on the extraction of simple image features using empirical techniques. It has been well developed, giving rise to a series of algorithms, such as VTM-CFAR [8], OS-CFAR [9], and CMLD-CFAR [10]. The core mechanism of these algorithms is based on statistical principles, enabling target detection by analyzing the contrast difference between the ship targets and the background clutter. During this process, an adaptive threshold is established to ensure stable detection across varying clutter environments, effectively reducing the false alarm rate. Similarly, there are other traditional algorithms such as algorithms based on superpixel segmentation [11,12]. However, traditional methods, like the CFAR, rely on manual modeling to characterize clutter background, posing significant challenges in complex scenarios. For instance, in complex maritime conditions or nearshore environments, dynamic changes in clutter characteristics can complicate model parameter adjustments, adversely affecting the algorithm’s detection accuracy and generalization ability [13]. Another limitation is that these algorithms typically address only simple features, making it challenging to capture more complex target morphology and motion information. Traditional algorithms often exhibit poor performance in complex situations, such as ship occlusion, overlapping, or dense target distribution, further limiting their effectiveness in practical applications.
Recent advancements in artificial intelligence (AI) have revealed the significant potential of deep learning algorithms in the domain of visual perception applications [14]. To enhance the robustness of SAR ship detection algorithms in complex scenes, deep learning techniques have been progressively adopted and have become the mainstream approach in SAR ship detection [15]. The YOLO series of single-stage target detection algorithms has been widely adopted in SAR image detection tasks owing to its superior speed and accuracy [16,17,18,19,20]. Building upon this foundation, researchers have proposed various improvements. For instance, Guo et al. [21] optimized YOLOv5s and applied it to ship detection tasks. They enhanced the model’s ability to accurately detect targets of varying scales in intricate environments by adaptively fusing multi-scale features. DBW-YOLO [22] introduces a feature enhancement module based on deformable convolutional networks to provide a more comprehensive representation of ship features. YOLO-OSD [23] adopts a data-driven hybrid approach that leverages the dataset’s statistical properties to construct an efficient model architecture. Other single-stage detection algorithms, including SSD [24] and RetinaNet [25], have also demonstrated robust performance in SAR image detection. SSD significantly simplifies the detection process and boosts inference speed by predicting bounding boxes and categories directly from the feature map, thus avoiding the region proposal step found in two-stage algorithms. The core innovation of RetinaNet lies in its introduction of the focal loss function, which improves the model’s capacity to handle both complex and simple samples by dynamically modulating the loss contribution of different samples. Focal loss mitigates the issue of extreme foreground–background class imbalance by down-weighting the loss assigned to easy-to-classify samples, allowing the model to focus training on complex examples.
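For illustration, the following is a minimal sketch of how focal loss down-weights easy examples; the α = 0.25 and γ = 2 values are the commonly used defaults from [25], and the function name is ours, not code from the works cited above.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss sketch: well-classified samples (p_t close to 1) are
    down-weighted by (1 - p_t)^gamma, so training concentrates on hard examples."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing factor
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```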
In addition, two-stage target detection algorithms, including Faster R-CNN [26], Mask R-CNN [27], and Cascade R-CNN [28], have been widely applied in SAR image detection. To begin with, Faster R-CNN improves target detection accuracy through the incorporation of a region proposal network (RPN), which also significantly accelerates detection speed. By integrating the RPN and designing shared convolutional layers, this method optimizes detection efficiency while maintaining high precision, making it a milestone in the field of target detection. Inspired by this, He et al. developed Mask R-CNN, which inherited Faster R-CNN’s high precision advantage and introduced instance segmentation, successfully combining object detection and semantic segmentation to provide richer target information. Subsequently, Cascade R-CNN improves detection accuracy through a multi-stage cascade detection strategy that continuously refines bounding box predictions. The cascade mechanism effectively addresses the challenges that single-stage detectors struggle to resolve, further enhancing detection accuracy and robustness. However, its model size remains relatively large.
Substantial advances have been made in improving ship detection performance in SAR imagery. However, current SAR ship detection algorithms still struggle with insufficient accuracy and excessive complexity in practical deployment [29]. Developing a lightweight, high-precision model for multi-scale SAR ship detection therefore remains a significant challenge. This is particularly evident in terms of parameter redundancy, as SAR images are grayscale and provide less information than optical images. Sophisticated detection network architectures may prove inefficient in such detection tasks, leading to convolutional redundancy during ship recognition. Researchers have investigated various solutions in this field. Feng et al. [30] introduced LPEDet, an anchor-free SAR ship detection algorithm with lightweight position enhancement, built upon the YOLOX framework. They restructured the multi-scale backbone network NLCNet to optimize both detection speed and precision. Yasir et al. [31] proposed the lightweight HGNetv2 backbone, SLim-neck, and an efficient EMSConvP lightweight decoder head, reducing the computational complexity and achieving a 133.1% improvement in the frame rate. Gao et al. [32] proposed DSSM-LightNet, utilizing the SLim-neck structure with GSConv and VoVGSCSP modules for optimized feature fusion. Cui et al. [33] designed an algorithm using the multi-objective firefly optimization algorithm (MFA) for network architecture optimization, generating a set of lightweight network architectures by iteratively refining probability lists to achieve high detection accuracy while ensuring a small model size. Consequently, improving ship detection performance while emphasizing lightweight model optimization remains the central focus of current research, aiming to balance computational complexity and detection performance. This approach holds promise for resource-constrained scenarios, such as edge computing platforms.
To address these challenges, this paper presents LH-YOLO, a lightweight and high-precision SAR ship detection network based on YOLOv8n, which incorporates element-wise multiplication and model compression techniques to minimize redundant parameters and computational demands. LH-YOLO additionally introduces a weight sharing mechanism to enhance parameter utilization efficiency. These optimization strategies significantly decrease both network parameters and computational load while enhancing detection performance on SAR ship datasets. Consequently, LH-YOLO achieves efficient detection while ensuring deployment convenience and recognition stability in practical applications. The contributions are summarized as follows:
  • The lightweight StarNet-nano structure is designed for LH-YOLO’s backbone. By optimizing the structure of the convolutional modules and reducing layer complexity, StarNet-nano retains the key feature extraction capabilities while significantly decreasing the parameter count and computational overhead, adhering to the principle of balancing lightweight design with efficiency and maintaining high model precision.
  • We also introduce the LFE-C2f structure in the neck of LH-YOLO, effectively reducing model parameters while maintaining performance comparable to that of YOLOv8n. LFE-C2f reduces redundant convolutional computations by adopting a branching feature fusion design, ensuring improved overall model efficiency. The core operation of the LFE-C2f architecture is element-wise multiplication, which maps input features to a high-dimensional nonlinear feature space, significantly enhancing the model’s feature representation capabilities.
  • To mitigate the high computational load of the YOLOv8 detection head, we designed a reused and shared convolutional detection (RSCD) head, employing a weight sharing mechanism to improve parameter utilization. This design reduces the parameter count while improving the performance of the detection head. Overall, the LH-YOLO model has a relatively small parameter count of only 1.862 M, which is 1.144 M fewer than YOLOv8n, representing a 38.1% decrease. Despite this substantial reduction, LH-YOLO’s precision in detecting SAR ships surpasses that of YOLOv8n, achieving a mAP50 that is 1.4% higher on the HRSID dataset.
The remainder of this paper is structured as follows. Section 2 details the structure of the YOLOv8 model and describes the proposed LH-YOLO model. Section 3 provides details of the experimental setup, presents ablation experiments with these three improvements, and offers a performance comparison between LH-YOLO and other models. Section 4 discusses the advantages and limitations of LH-YOLO. Finally, Section 5 presents a brief conclusion.

2. Methodology

2.1. The Architecture of YOLOv8

YOLOv8 is a state-of-the-art (SOTA) model that supports various visual AI applications, such as detection, classification, and segmentation, offering both high performance and flexibility [19]. It comprises three main components: a backbone network that extracts features, a neck that refines and processes those features, and a detection head that produces the final predictions.
The architecture is illustrated in Figure 1. At the start of inference, the input is first fed into the backbone for feature extraction. The extracted feature maps are then processed by the neck, which generates three levels of scale-specific feature maps using the PANet architecture [34] and is primarily composed of C2f, Bottleneck, and upsample modules. Finally, the head uses three decoupled heads for the final detection output.
The backbone network of YOLOv8 is based on an improved version of the cross stage partial network (CSPNet) structure known as CSPDarkNet53, which is designed to reduce computational complexity and enhance feature representation by dividing the features into two segments for independent processing and subsequent fusion [35]. This approach minimizes computational load and memory demands while preserving detection accuracy. It employs the C2f, Bottleneck, and SPPF modules to enhance feature extraction. The SPPF module captures features at multiple scales and aggregates them through max pooling, enhancing the network’s robustness against variations in target size and improving detection performance [36]. The backbone integrates ResNet principles [37], utilizing residual connections extensively to support deeper networks. This mitigates the issue of vanishing gradients during training and facilitates model convergence. The neck network incorporates a bidirectional feature fusion pathway to effectively transfer and optimize features across layers, enhancing detection performance. The head module processes multi-scale feature maps to handle variable target sizes, enabling robust detection at different scales. However, it still faces challenges related to high computational load and model complexity, limiting its deployment convenience in resource-constrained scenarios.
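For reference, the following is a minimal sketch of an SPPF-style module (three serial 5 × 5 max-pooling layers whose outputs are concatenated and fused), written under the assumption of typical YOLO-style channel handling; it is illustrative and not the exact YOLOv8 implementation.

```python
import torch
import torch.nn as nn

class SPPFSketch(nn.Module):
    """SPPF sketch: serial max pooling approximates pooling at multiple scales,
    and the pooled maps are concatenated before a 1x1 fusion convolution."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hid, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = nn.Conv2d(c_hid * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```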
The YOLOv8 network consists of five model variants: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, which are differentiated by model size. The primary distinctions lie in model complexity and parameter count. YOLOv8n is the smallest model, offering the lowest computational complexity and the fastest inference speed, making it ideal for edge deployment with rapid identification capabilities. Furthermore, since SAR images are grayscale and contain less information than optical images, complex detection frameworks are not always well suited for SAR ship target detection. They may introduce unnecessary convolutional redundancy, potentially reducing efficiency in detecting SAR ships. To address these issues, we improved YOLOv8n by incorporating element-wise multiplication and model compression techniques, designing a lightweight backbone called StarNet-nano, replacing the C2f module with an enhanced LFE-C2f module, and proposing a higher-performance detection head called RSCD.

2.2. The Proposed LH-YOLO Structure

In this subsection, we will emphasize the distinctions between the proposed LH-YOLO and YOLOv8n, followed by a description of the three primary improvements in LH-YOLO.
As shown in Figure 2, a new network architecture is proposed. We designed an SAR ship detection model named LH-YOLO, which is based on YOLOv8n. Three major improvements were incorporated into LH-YOLO. First, a newly designed StarNet-nano backbone was incorporated into the backbone network of LH-YOLO, constructed based on the next-generation SOTA lightweight model StarNet [38]. We established a lightweight four-stage backbone, significantly reducing parameter count and computational load while maintaining performance. Second, we enhanced the core module in the neck network, the C2f module, by developing the LFE-C2f module, which reduces parameters and improves feature representation capabilities through element-wise multiplication that maps features into a high-dimensional implicit feature space. Third, we designed the RSCD head. By employing a weight-sharing mechanism, the RSCD head requires fewer convolutional layers, reducing the parameter count of the detection head. Additionally, the reuse of parameters addresses the limitation of low parameter utilization in the head network.
LH-YOLO improves detection accuracy while also achieving a significant reduction in the model’s parameter count. Furthermore, architectural optimizations lead to a notable decrease in computational demands compared to YOLOv8n. These modifications achieve the goal of lightweight design while maintaining high precision, making the model particularly suitable for resource-constrained scenarios. LH-YOLO has also been applied to several datasets, such as HRSID and the SAR-Ship-Dataset, demonstrating excellent performance, which indicates the strong generalization capabilities of the model.

2.2.1. Structure of the Lightweight StarNet-Nano Backbone Network

StarNet can achieve efficient feature representation without requiring complex network designs. Its unique capability lies in performing computations in low-dimensional feature spaces while implicitly considering high-dimensional features. This capability can be leveraged to enhance the backbone of YOLOv8n, providing efficient computation and improved feature representation, which contributes to higher detection performance in resource-constrained scenarios. It is suitable for deployment in efficient and compact networks rather than the large models traditionally used. Moreover, through multi-layer star operations, StarNet can recursively increase the implicit feature dimensions, approaching infinite dimensionality. For networks with extensive widths and depths, this property significantly enhances the expressiveness of the features. Consequently, the quality of feature extraction can be markedly improved through appropriate depth and width design, thereby enhancing the accuracy of SAR ship detection. Therefore, StarNet is well suited for enhancing YOLOv8n in a lightweight manner.
Four versions of StarNet are introduced in [38]: StarNet-s1, StarNet-s2, StarNet-s3, and StarNet-s4. Unlike these versions, we have designed a more lightweight architecture, StarNet-nano, further minimizing the number of convolutional layers and reducing layer complexity. StarNet-nano is designed with a four-stage hierarchical architecture, as prior studies have shown that this structure yields optimal performance. Moreover, four stages, combined with the SPPF module, enable the extraction of five feature levels within the YOLOv8n framework, which can seamlessly replace the original backbone structure without altering the neck and head components of YOLOv8n. Each stage employs a convolutional layer for downsampling and star blocks for feature extraction. The star block serves as the fundamental module of the StarNet-nano network, as detailed in Figure 3a. A depthwise convolution with a 7 × 7 kernel is incorporated into the last section of the star block, providing a larger receptive field to extract features and capture complex spatial relationships [39]. The use of a 7 × 7 depthwise convolution simplifies the model architecture by preserving important features while reducing the depth of the network. The channel expansion factor is fixed at 4, and the network width doubles at each stage. To enhance efficiency, batch normalization is applied in the star block to replace layer normalization. Additionally, the GELU activation function in the star block is replaced with ReLU6 [40], which helps prevent the activation values from becoming excessively large, thereby improving the model’s stability and efficiency on resource-constrained devices.
In the star block, the detailed computation process of its core operation is depicted in Figure 3b. The input feature map $X_1$ passes through two different fully connected layers, one of which is followed by a ReLU6 activation. After obtaining the two output feature matrices, element-wise multiplication of the two feature matrices is performed. The mathematical expressions are as follows:
$X_1 = \mathrm{BN}\left[ f_{dwconv7\times7}\left( X_{IN} \right) \right]$ (1)
$X_2 = \left\{ \mathrm{ReLU6}\left[ f_{conv1\times1}\left( X_1 \right) \right] \right\} * f_{conv1\times1}\left( X_1 \right)$ (2)
$X_{OUT} = f_{dwconv7\times7}\left\{ \mathrm{BN}\left[ f_{conv1\times1}\left( X_2 \right) \right] \right\} \oplus X_{IN}$ (3)
where $f_{conv1\times1}(x)$ represents the fully connected layer, $f_{dwconv7\times7}(x)$ represents the depthwise convolution operation with a kernel size of 7 × 7, and $\mathrm{BN}(x)$ denotes batch normalization. $\oplus$ represents the element-wise addition of two feature matrices, and $*$ denotes element-wise multiplication.
The most crucial aspect is represented in Equation (2), where the ReLU6 activation is applied only to the output feature maps of one branch. This approach leverages the nonlinearity introduced by the element-wise multiplication, allowing the nonlinear activation to be eliminated from parts of the CNN model and thereby reducing the model’s complexity and computational demands. Therefore, the module shown in Figure 3a is used as the fundamental module for constructing StarNet-nano.
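To make the data flow of Equations (1)–(3) concrete, the following is a minimal PyTorch sketch of a star block; the expansion factor of 4, the 7 × 7 depthwise convolutions, ReLU6, and batch normalization follow the description above, while the layer names and exact placement of normalization are illustrative assumptions.

```python
import torch.nn as nn

class StarBlock(nn.Module):
    """Star block sketch: depthwise conv -> two 1x1 branches -> element-wise
    product (the star operation) -> 1x1 projection -> depthwise conv -> residual."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.dw1 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.bn1 = nn.BatchNorm2d(dim)
        self.f1 = nn.Conv2d(dim, dim * expansion, 1)   # branch followed by ReLU6
        self.f2 = nn.Conv2d(dim, dim * expansion, 1)   # linear branch
        self.act = nn.ReLU6()
        self.g = nn.Sequential(nn.Conv2d(dim * expansion, dim, 1),
                               nn.BatchNorm2d(dim))
        self.dw2 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)

    def forward(self, x_in):
        x1 = self.bn1(self.dw1(x_in))                  # Eq. (1)
        x2 = self.act(self.f1(x1)) * self.f2(x1)       # Eq. (2): element-wise product
        return self.dw2(self.g(x2)) + x_in             # Eq. (3): residual addition
```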
To further reduce the parameter count and model complexity while meeting accuracy requirements, we independently designed the lightweight backbone StarNet-nano based on the StarNet prototype. The final structure of the backbone is detailed in Figure 2. StarNet-nano is made up of four stages, each containing a different number of star blocks. A 3 × 3 convolutional layer is placed at the beginning of each stage, after which the feature map is fed into the star blocks. The first stage includes 1 star block, the second stage has 2, the third stage contains 4, and the fourth stage contains 1 star block. Compared to StarNet-s1, StarNet-nano incorporates 7 fewer star blocks, further reducing parameters while even enhancing inference accuracy.
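Under the stage layout stated above (1, 2, 4, and 1 star blocks, a 3 × 3 stride-2 convolution at the start of each stage, and width doubling per stage), a rough assembly could look like the sketch below, reusing the StarBlock sketch from the previous listing; the stem width of 32 channels and the single-channel SAR input are assumptions made purely for illustration.

```python
import torch.nn as nn

def make_stage(c_in, c_out, n_blocks):
    """One StarNet-nano stage sketch: 3x3 stride-2 downsampling conv + star blocks."""
    layers = [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
              nn.BatchNorm2d(c_out), nn.ReLU6()]
    layers += [StarBlock(c_out) for _ in range(n_blocks)]   # StarBlock from the sketch above
    return nn.Sequential(*layers)

class StarNetNanoSketch(nn.Module):
    def __init__(self, c_stem=32, blocks=(1, 2, 4, 1)):
        super().__init__()
        # SAR input assumed single-channel here
        self.stem = nn.Sequential(nn.Conv2d(1, c_stem, 3, stride=2, padding=1),
                                  nn.BatchNorm2d(c_stem), nn.ReLU6())
        widths = [c_stem * 2 ** (i + 1) for i in range(len(blocks))]  # width doubles each stage
        self.stages = nn.ModuleList()
        c_prev = c_stem
        for w, n in zip(widths, blocks):
            self.stages.append(make_stage(c_prev, w, n))
            c_prev = w

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)        # multi-scale feature maps handed to the neck/SPPF
        return feats
```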

2.2.2. Lightweight Feature Extraction Module

In the YOLOv8n architecture, the C2f module plays a crucial role in implementing cross stage partial fusion. As a specialized convolution module, it performs operations such as feature transformation, branch processing, and feature fusion, enabling it to extract and transform input features and produce outputs with enhanced representational capabilities.
The C2f module structure in YOLOv8n’s neck is illustrated in Figure 4a. First, the input features undergo a transformation through a 1 × 1 convolutional layer, improving the expressiveness of the model’s features. The resulting output is then split into two parts along the channel dimension. One part is sent directly to the output, while the other part is processed by a DarknetBottleneck module to extract deeper features. This C2f module in the neck section does not include shortcut connections. The results of these two parts are concatenated along the channel dimension to enhance feature diversity. The concatenated feature maps are then compressed by a second 1 × 1 convolutional layer, producing the required output channels for subsequent processing. The specific process can be described as follows:
$X_1, X_3 = \mathrm{Split}\left[ f_{conv1\times1}\left( X_{IN} \right) \right]$ (4)
$X_2 = f_{conv3\times3}\left[ f_{conv3\times3}\left( X_1 \right) \right]$ (5)
$X_{OUT} = f_{conv1\times1}\left[ \mathrm{Concat}\left( X_1, X_2, X_3 \right) \right]$ (6)
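A simplified sketch of this split–bottleneck–concatenate flow (Equations (4)–(6), without shortcut connections) is given below; the single two-convolution bottleneck and the channel widths are illustrative simplifications of the actual C2f module.

```python
import torch
import torch.nn as nn

class C2fLite(nn.Module):
    """Simplified C2f sketch (neck variant, no shortcut): 1x1 conv -> channel
    split -> bottleneck on one half -> concatenate -> 1x1 fusion conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_hid = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * c_hid, 1)
        self.bottleneck = nn.Sequential(                   # DarknetBottleneck stand-in
            nn.Conv2d(c_hid, c_hid, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c_hid, c_hid, 3, padding=1), nn.SiLU())
        self.cv2 = nn.Conv2d(3 * c_hid, c_out, 1)

    def forward(self, x_in):
        x1, x3 = self.cv1(x_in).chunk(2, dim=1)            # Eq. (4): split
        x2 = self.bottleneck(x1)                           # Eq. (5): two 3x3 convs
        return self.cv2(torch.cat([x1, x2, x3], dim=1))    # Eq. (6): concatenate and fuse
```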
Although the C2f module was designed to enhance performance, its increased structural complexity and feature fusion operations may render the network more sensitive to input noise, particularly in complex environments such as ocean monitoring, potentially impacting detection accuracy. Additionally, the parameters of the C2f modules account for a significant portion of YOLOv8n’s neck network. To address these issues while achieving lightweight functionality, we designed the LFE-C2f module as an improvement, as illustrated in Figure 4b.
The new LFE-C2f module fuses different subspace features via element-wise multiplication. The main idea is to improve the arithmetic process of Equation (5), as expressed in Equation (7):
$X_2 = X_1 \oplus f_{dwconv7\times7}\left( f_{conv1\times1}\left\{ \mathrm{ReLU6}\left[ f_{conv1\times1}\left( f_{dwconv7\times7}\left( X_1 \right) \right) \right] * f_{conv1\times1}\left( f_{dwconv7\times7}\left( X_1 \right) \right) \right\} \right)$ (7)
This operation demonstrates significant potential in performance and efficiency due to its capacity to map inputs to high-dimensional, nonlinear feature spaces [41]. Equation (7) implicitly implements feature extraction in higher dimensions without the need to increase network depth. Depthwise convolutions are also employed to reduce computational load, and the entire module effectively balances the dual goals of minimizing computational demands and enhancing feature representation. After incorporating our LFE-C2f design, the model’s parameter count was reduced to 84.2% of YOLOv8n’s while maintaining detection precision.
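Read this way, with the leading $X_1$ term taken as a residual addition (our reading of Equation (7)), the replaced branch is essentially the star-style operation applied inside the C2f split. A compact, hypothetical sketch reusing the C2fLite and StarBlock listings above:

```python
class LFEC2fSketch(C2fLite):
    """Hypothetical LFE-C2f: the DarknetBottleneck branch of the simplified C2f
    is swapped for a star-style branch, i.e. element-wise multiplication of two
    1x1 projections of a depthwise-convolved input plus a residual connection."""
    def __init__(self, c_in, c_out):
        super().__init__(c_in, c_out)
        self.bottleneck = StarBlock(c_out // 2)   # Eq. (7) in place of Eq. (5)
```

Unlike the stacked 3 × 3 convolutions of Equation (5), this branch expands features through element-wise multiplication, which is where the implicit high-dimensional nonlinear mapping comes from.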

2.2.3. Reused and Shared Convolutional Detection Head

YOLO models typically employ separate detection heads for features extracted at multiple levels, which can result in inefficient utilization of model parameters. Additionally, the decoupled detection head of YOLOv8n incurs a significant computational load, accounting for approximately 25% of the total computation. Therefore, enhancing the structure of the detection head is crucial for achieving lightweight design.
The detection head of YOLOv8n, illustrated in Figure 5a, consists of three groups, each utilizing features from the P3, P4, and P5 levels for recognition, respectively. It employs the currently widely adopted decoupled-head structure, where the classification and recognition tasks are handled separately. In this decoupled-head structure, one head focuses on target recognition, evaluated by a bounding box regression loss function (Bbox.Loss) that consists of complete IoU (CIoU) loss and distribution focal loss (DFL) [42]. The other head manages classification, evaluated by the classification loss function (Cls.Loss). Each head is preceded by three convolutional layers, and the convolutional parameters are not shared. To achieve a lightweight design, we redesigned the detection head and introduced a weight sharing mechanism to merge and share the convolutional layers in front of the two heads within each decoupled head, thereby reducing the overall parameter count. Additionally, some convolutional layers from the three independent detection heads of YOLOv8n were reused to improve parameter utilization.
We designed the RSCD head, as shown in Figure 5b. This idea arose from the observation that objects detected across various feature levels exhibit similarity in proportional scale sizes [25,43]. At the first step, we shared the two 3 × 3 Conv layers before Conv-4*reg_max in the recognition head and the two 3 × 3 Conv layers before Conv-num_class in the classification head, reducing the number of 3 × 3 Conv layers from 4 to 2 at each feature level. Additionally, we reused the first two 3 × 3 Conv layers across different feature levels, which reduced the total count of 3 × 3 Conv layers from 6 to 1 across the three feature levels. The Conv-4*reg_max and Conv-num_class layers for each feature level were also shared. Notably, when sharing the first 3 × 3 Conv layer, it is necessary to insert a 1 × 1 Conv layer in front of it. This additional layer ensures channel alignment, enabling input feature maps from various feature levels to be processed by the same 3 × 3 Conv layer. To address the issue that target scales detected by each recognition head may differ, a scale layer must be added to enable features after the Conv-4*reg_max layer to be scaled individually for each feature level. When sharing the Conv-4*reg_max layer, this is achieved by multiplying the output feature maps by a separate coefficient. Throughout the structure, we adopt the design concept of NASFPN, as features differ between stages, necessitating the inclusion of a normalization layer [44]. Directly introducing BN in the shared 3 × 3 Conv layer can lead to errors, while incorporating group normalization (GN) will increase inference overhead. To maintain accuracy, we enable convolutional layers to be shared within the RSCD head, while keeping BN layers independent for each stage.
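The sketch below illustrates the sharing scheme described above: a per-level 1 × 1 alignment convolution, a 3 × 3 convolution whose weights are shared across the P3/P4/P5 levels with level-specific BN, shared Conv-4*reg_max and Conv-num_class prediction layers, and a learnable per-level scale on the regression output. The channel widths, the single shared 3 × 3 layer, and the class count are illustrative assumptions rather than the authors’ exact configuration.

```python
import torch
import torch.nn as nn

class RSCDHeadSketch(nn.Module):
    """Reused/shared detection head sketch: convolution weights are shared across
    feature levels, while BN statistics and a regression scale stay per-level."""
    def __init__(self, in_channels=(64, 128, 256), hid=64, reg_max=16, num_classes=1):
        super().__init__()
        # per-level 1x1 convs align channel counts so one shared 3x3 conv fits all levels
        self.align = nn.ModuleList(nn.Conv2d(c, hid, 1) for c in in_channels)
        self.norms = nn.ModuleList(nn.BatchNorm2d(hid) for _ in in_channels)  # independent BN per level
        self.shared_conv = nn.Conv2d(hid, hid, 3, padding=1)      # weights shared across levels
        self.reg_pred = nn.Conv2d(hid, 4 * reg_max, 1)            # shared Conv-4*reg_max
        self.cls_pred = nn.Conv2d(hid, num_classes, 1)            # shared Conv-num_class
        self.scales = nn.Parameter(torch.ones(len(in_channels)))  # per-level scale factors

    def forward(self, feats):                                     # feats: [P3, P4, P5]
        outputs = []
        for i, x in enumerate(feats):
            x = self.align[i](x)
            x = torch.relu(self.norms[i](self.shared_conv(x)))    # shared weights, per-level BN
            box = self.reg_pred(x) * self.scales[i]               # scale regression per level
            cls = self.cls_pred(x)
            outputs.append((box, cls))
        return outputs
```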
With the RSCD head, multiple convolutional layers are shared, significantly reducing parameters compared to YOLOv8n’s detection head. Our network reduces parameters to 78.6% of the original without any loss in accuracy, while improving mAP50 by approximately 1% across different datasets. These improvements enable the detection head to operate with fewer parameters and a lower computational load while improving accuracy. Reusing and sharing convolutional layers drastically reduces the parameter count, yielding a lightweight model that is convenient to deploy on systems with limited resources.

3. Experiments

The proposed LH-YOLO was evaluated through ablation and comparison experiments, and this section presents an analysis of its performance.

3.1. Implementation Details

3.1.1. Platform

The experimental configurations for this work are outlined below. A cloud server was utilized to perform all of these experiments, equipped with an NVIDIA RTX 4090 (24 GB) graphics processing unit (GPU). The server operated on the Ubuntu 20.04.6 LTS operating system. Python 3.8.16 was used for training, while CUDA 12.1 was utilized for computational acceleration. YOLOv8n served as the foundation for enhancing our entire network architecture, which was derived from publicly available source code. During training, a batch size of 16 was used, and both datasets were trained for 300 epochs. The Adam optimizer was employed with a learning rate of 0.01. No pre-trained weights were utilized for the models. Since one of the main features of LH-YOLO is its lightweight design, its training cost is lower than that of other models, as reflected in its parameter count and computational complexity. On the HRSID dataset, training LH-YOLO took approximately 2.5 h on the NVIDIA RTX 4090 GPU.

3.1.2. Datasets

The high-resolution SAR images dataset (HRSID) and the SAR-Ship-Dataset were used to validate our neural network model’s performance. Both datasets capture echo information from a single electromagnetic band. Pixels in SAR images represent radar wave reflections from ground targets. Rough surfaces tend to generate stronger backscatter signals, while smooth surfaces typically fail to produce detectable echoes, causing rough targets to appear as bright spots or blocks in SAR images. The primary challenge in ship identification lies in distinguishing ship information from ground features, as both appear as bright spots and bright blocks, while the sea surface shows as a dark region in SAR images. In Table 1, the description of these two datasets is presented.
  • HRSID
    The HRSID dataset comprises 5604 cropped SAR images and 16,951 ships. All images are 800 × 800 pixels in size. Ships in the dataset are categorized into three classes based on pixel area: small ships (<482 pixels), medium ships (482–1452 pixels), and large ships (>1452 pixels). Specifically, the dataset includes 9242 small ships, 7388 medium ships, and 321 large ships [45]. The SAR images in the HRSID exhibit very high spatial resolution and provide complex backgrounds, making them a vital source for developing high-precision ship detection algorithms.
  • SAR-Ship-Dataset
    The SAR-Ship-Dataset comprises 43,819 cropped SAR images and 59,535 ships. All images are 256 × 256 pixels in size. Specifically, the dataset includes 35,695 small ships, 23,660 medium ships, and 180 large ships [46]. The dataset provides a variety of images captured by different SAR sensors, covering different camera angles and geographical regions. Given that the dataset originates from multiple SAR sensors, significant variations in image quality and resolution exist, requiring the algorithm to handle data from various sources. Furthermore, the detection algorithm must demonstrate strong scale invariance to effectively handle ship targets of varying sizes and shapes. In conclusion, the dataset highlights image diversity and a wide range of application scenarios, covering various sensors and sea regions, making it ideal for developing detection algorithms with strong generalization capabilities.

3.1.3. Evaluation Metrics

To comprehensively and effectively evaluate the detection accuracy of LH-YOLO in our experiments, we utilized three key metrics: precision (P), recall (R), and mean average precision ( m A P ). Each metric is defined by the following formulas.
$P = \dfrac{TP}{TP + FP}$ (8)
$R = \dfrac{TP}{TP + FN}$ (9)
$AP = \int_{0}^{1} P(R)\, dR$ (10)
$mAP = \dfrac{1}{N} \sum_{i=1}^{N} AP(i)$ (11)
where $TP$ represents true positives, referring to ships correctly detected by LH-YOLO; $FP$ represents the number of false positives, indicating non-ships incorrectly detected as ships by LH-YOLO; and $FN$ represents the number of false negatives, reflecting ships incorrectly identified as non-ships by LH-YOLO. $P$ indicates the proportion of detections that are actual ships, while $R$ is the proportion of ships correctly identified among all actual ships. $AP$ evaluates the precision across different recall rates and is calculated from the precision and recall. $mAP_{50}$ is the mean average precision when the intersection over union (IoU) threshold is 50%.
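As a small numeric illustration of Equations (8)–(11), the snippet below computes precision and recall from TP/FP/FN counts and approximates AP by numerically integrating an interpolated precision–recall curve; the counts and curve points are invented purely for illustration.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Eqs. (8)-(9): precision and recall from detection counts."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Eq. (10) approximated numerically: AP = integral of P(R) dR, using
    all-point interpolation so precision is non-increasing in recall."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]          # enforce monotone precision
    return float(np.sum(np.diff(r) * p[1:]))

# Toy numbers for illustration only
prec, rec = precision_recall(tp=90, fp=5, fn=10)      # P ~ 0.947, R = 0.900
ap = average_precision(np.array([0.5, 0.7, 0.9]), np.array([0.98, 0.95, 0.90]))
print(prec, rec, ap)      # for a single class, mAP is the mean of the per-class APs (Eq. 11)
```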
Additionally, we used three metrics for comparison: the number of parameters, floating point operations (FLOPs), and model size, which were utilized to assess the lightweight capabilities of our proposed LH-YOLO more thoroughly. The parameters and FLOPs of a convolutional layer are expressed by the following equations:
$Params = \left( k_h \times k_w \times C_{in} \right) \times C_{out}$ (12)
$FLOPs = \left( k_h \times k_w \times C_{in} \times C_{out} \right) \times \left( H \times W \right)$ (13)
where $k_h$ and $k_w$ represent the kernel’s height and width, $C_{in}$ and $C_{out}$ indicate the number of input and output channels, and $H$ and $W$ represent the dimensions of the feature map. Parameters refer to the model’s learnable variables, which are continuously refined during training to minimize the loss. FLOPs quantify the floating point operations performed during inference, serving as a metric for the computational complexity or cost of a specific operation. The model size reflects its complexity and the required storage space. An increase in parameters, FLOPs, and model size results in higher hardware resource consumption.
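As a quick check on Equations (12) and (13), the snippet below computes both quantities for one convolutional layer; the example layer shape is arbitrary and biases are ignored, as in the equations.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Eq. (12): learnable weights of a convolutional layer (bias ignored)."""
    return k_h * k_w * c_in * c_out

def conv_flops(k_h, k_w, c_in, c_out, h, w):
    """Eq. (13): multiply operations over an H x W output feature map."""
    return k_h * k_w * c_in * c_out * h * w

# Example: a 3x3 convolution, 64 -> 128 channels, on an 80 x 80 feature map
print(conv_params(3, 3, 64, 128))           # 73,728 parameters
print(conv_flops(3, 3, 64, 128, 80, 80))    # 471,859,200 operations
```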

3.2. Ablation Experiment

Compared to the baseline YOLOv8n network, LH-YOLO introduces several architectural improvements. We experimentally assessed how all these enhancements would affect the model and interact with each other. The performance outcomes of StarNet-nano in the backbone, LFE-C2f module in the neck, and the RSCD head were assessed through ablation experiments on the HRSID and SAR-Ship-Dataset. The experiments utilized different combinations of these improvements, leading to eight sub-experiments for each dataset.
The results are presented in Table 2. On the HRSID dataset, the designed StarNet-nano backbone reduced the model size by 9.5% and decreased the parameter count by 10.0%, while the model’s mAP50 remained nearly the same as YOLOv8n’s, highlighting the advantage of StarNet-nano in lightweight design. After integrating the LFE-C2f module into the neck network of YOLOv8n, the mAP50 showed a slight improvement compared to YOLOv8n, with the model size reduced by 14.3%, the parameter count reduced by 15.8%, and the computational load decreased by 15.1%. This result demonstrates the effectiveness of the LFE-C2f module in lightweighting, as it maps the input feature map to a very high-dimensional nonlinear feature space without increasing computational demands or the number of parameters. Moreover, after incorporating our designed RSCD head network, the experimental data indicate a 1.3% improvement in the mAP50, along with the most significant model size reduction of 20.6%, a parameter decrease of 21.4%, and a computational load reduction of 19.0%, owing to the reused and shared convolutional layer weights. This improvement also yielded the highest precision. Each of these three enhancements contributed to improving the overall performance of LH-YOLO.
Notably, our three improvements were optimized for the backbone, neck, and head networks, respectively, ensuring no conflicts in terms of precision, model size, parameter count, or computational demands when incorporated into our proposed model. The results of our combined experiments highlight this synergy. Combining any two of the three improvements consistently resulted in enhanced performance in terms of both parameter count and computational load, which is attributable to each individual enhancement. In summary, combining all three optimizations led to a 1.4% improvement in mAP50 compared to YOLOv8n, with a total model size reduction of 34.9%, a parameter reduction of 38.1%, and a computational load decrease of 23.8%. This proves that the LH-YOLO model successfully maintains high detection accuracy while remaining lightweight.
The visualization results in Figure 6 clearly demonstrate that the LH-YOLO model outperformed YOLOv8n in detecting ships on the HRSID dataset. In the figure, green boxes indicate ships that are detected correctly by the model, blue boxes represent incorrectly detected ships, and red boxes signify missed detections. Whether detecting small ships in offshore images or ships in nearshore images with complex backgrounds, LH-YOLO’s recognition results in Figure 6c were found to be superior to those of YOLOv8n in Figure 6b. This illustrates the stable performance of our model in complex sea conditions. For detecting small target ships, LH-YOLO significantly outperformed YOLOv8n. The improved performance can be largely attributed to the enhanced feature extraction capabilities provided by the LFE-C2f module in our improved method.
To further demonstrate the generalization ability of LH-YOLO, the same experiments were performed on the SAR-Ship-Dataset. As shown in Table 2, the results exhibited a trend similar to those from the HRSID dataset, with a 0.7% improvement in mAP50. LH-YOLO’s model size and computational load were also significantly reduced compared to YOLOv8n. The visualization results in Figure 7 also reveal that the LH-YOLO model outperformed YOLOv8n in various complex environments on the SAR-Ship-Dataset, further validating its effectiveness in practical applications. In summary, LH-YOLO achieved superior detection accuracy compared to the original YOLOv8n model while decreasing its parameter count, model size, and computational load, and it demonstrated strong generalization capability.

3.3. Comparison with Other Methods

We carried out in-depth experiments on both the HRSID and SAR-Ship-Dataset to offer a more thorough evaluation of LH-YOLO’s performance. These experiments aimed to demonstrate the superior effectiveness of our LH-YOLO model for ship detection tasks. The evaluation included single-stage algorithms such as YOLOv3-Tiny [18], YOLOv5, YOLOv8n, and YOLOv10n [17], as well as two-stage algorithms such as Faster R-CNN, Cascade R-CNN, and Mask R-CNN, as shown in Table 3. We have also included comparisons with the latest lightweight models, such as ESarDet [47], Improved YOLOx-Tiny [48], YOLO-Lite [49], and SHIP-YOLO [50]. In the final row, LH-YOLO represents our proposed model. To ensure a fair comparison, we maintained consistent parameter settings across the various networks during training. This approach facilitated a clearer understanding of the advantages and effectiveness of LH-YOLO in ship detection tasks.
Table 3 demonstrates that LH-YOLO achieved the highest accuracy with the fewest parameters on both datasets compared to the other detection networks. On the HRSID dataset, LH-YOLO achieved a mAP50 of 96.6%, significantly outperforming other mainstream models, including a 0.9% increase over the second-highest model, YOLOv10n (95.7%). Models like YOLOv5 and YOLOv8n achieved mAP50s of 94.5% and 95.2%, respectively, while Faster R-CNN, Cascade R-CNN, and Mask R-CNN all had mAP50s below 90%. As for parameter count, LH-YOLO required only 1.862 M parameters, significantly fewer than Faster R-CNN (41.753 M), Cascade R-CNN (69.395 M), and Mask R-CNN (44.396 M). ESarDet achieved an mAP50 of 93.2%, which was lower than LH-YOLO’s 96.6%. Additionally, ESarDet had a higher parameter count, indicating that LH-YOLO achieved superior detection precision with a more compact architecture. As for the improved YOLOx-Tiny, while it had 0.398 M fewer parameters than LH-YOLO, its detection accuracy was significantly lower, with an mAP50 of 86.8% compared to LH-YOLO’s 96.6%. LH-YOLO maintained high precision with minimal network parameters and computational demands, enhancing its feasibility and flexibility for practical applications, especially in resource-constrained scenarios such as embedded systems or edge computing platforms.
On the SAR-Ship-Dataset, LH-YOLO also performed well, achieving a mAP50 of 93.8%, comparable to YOLOv8n (93.1%), Faster R-CNN (93.5%), and YOLO-Lite (92.1%). However, LH-YOLO’s parameter count and computational load were significantly lower than those of these models, showcasing its design efficiency. Although SHIP-YOLO achieved a slightly higher mAP50 of 96.6%, its parameter count was 34.3% greater than that of LH-YOLO, indicating that the accuracy gain came at a cost to model size and computational efficiency. LH-YOLO’s high accuracy is attributed to its innovative network architecture, particularly the optimized weight sharing mechanism and feature extraction module, which enhance the detection capability and decrease the model size. Additionally, LH-YOLO excelled in both precision and recall, which were higher than those of YOLOv8n and most of the mainstream models. Furthermore, comparing the performance of the networks across the different datasets reveals that Faster R-CNN and Mask R-CNN exhibited the poorest generalization ability. In contrast, the performance of LH-YOLO on these datasets demonstrates its robust generalization capability, making it suitable for high-precision detection tasks under practical ocean conditions.
The visualization results of the various models on the HRSID and SAR-Ship-Dataset are illustrated in Figure 8, facilitating a clearer comparison. All of these models show some cases of bright spots being misidentified as ships and missed detections in complex backgrounds, with YOLOv3-Tiny exhibiting the most missed ship targets and YOLOv8n having the highest number of misidentified ship targets. In contrast, LH-YOLO demonstrated minimal instances of misidentified and missed ship targets, clearly maintaining superior detection precision compared to the other models. This further illustrates that LH-YOLO achieved superior performance while maintaining a lightweight architecture.
The feasibility and excellent detection performance of the model have been clearly illustrated, making it well suited for real-time monitoring on devices with limited resources. The LH-YOLO model exhibited excellent performance on both datasets, leading in precision, parameter count, and computational efficiency. This indicates that LH-YOLO has strong potential for target detection tasks, particularly in scenarios with limited computational resources.

4. Discussion

Deep learning algorithms, particularly CNNs, have shown considerable success in image recognition tasks and have proven highly effective in processing SAR images [51]. This paper introduced LH-YOLO, a lightweight, high-precision ship detection model. It achieved high detection accuracy with a mAP50 of 96.6% on the HRSID dataset, which was 1.4% higher than the baseline YOLOv8n. At the same time, it maintained a balance between model size and detection performance, with the parameter count reduced by 38.1% and the computational load reduced by 23.8%. Additionally, the generalization ability of LH-YOLO was validated on the SAR-Ship-Dataset. It outperformed the baseline YOLOv8n model across both public SAR image datasets. Due to its lightweight design, our model is well suited for deployment in resource-constrained scenarios.
The LH-YOLO model incorporates the designed StarNet-nano structure as the backbone network. This improvement achieves a balance between computational complexity and performance through a four-stage architecture, enabling efficient computation and enhanced feature representation and supporting higher detection performance in resource-constrained scenarios. By minimizing redundant operations, StarNet-nano not only alleviates the hardware burden but also accelerates inference, enhancing its performance in real-time detection tasks. The designed LFE-C2f module was integrated into the neck, employing element-wise multiplication to map input features into a high-dimensional nonlinear feature space without increasing computational complexity. Additionally, feature fusion was applied to obtain richer and more expressive feature representations. By improving the nonlinear representation of features, the model becomes more robust in handling complex target detection tasks. In the detection head, a weight sharing mechanism was employed to significantly reduce model parameters, and weight parameter reuse was employed to enhance parameter utilization. Furthermore, to maintain accuracy, we allowed the RSCD head to share the convolutional layers while keeping the BN layers independent and computed separately.
The entire network significantly reduced the parameter count and eliminated unnecessary structures while ensuring an improvement in the mAP50. In conclusion, the proposed model demonstrates superior performance, and its lightweight design effectively reduces the computational demands, facilitating convenient deployment in resource-constrained scenarios, such as embedded systems and edge computing platforms.
This study demonstrates that the LH-YOLO model offers significant advantages in SAR image detection tasks while also presenting certain limitations. More effective methods can still be explored to enhance computational efficiency and precision. Ship detection in SAR images poses unique challenges compared to conventional detection tasks, as the environmental conditions are highly variable and complex, often involving extreme weather such as dense cloud cover, thunderstorms, snow, and rain. Furthermore, the diversity in ship sizes, shapes, and types adds complexity to the detection task. To enhance ship detection in complex environments, richer data augmentation strategies, such as random cropping, rotation, and noise addition, are planned for use during training [52,53,54]. Incorporating a self-attention mechanism or channel attention module would enable the model to focus more on important regions in the image, enhancing its sensitivity to ships. To further optimize model lightweighting, we will explore adjustments to the StarNet-nano backbone, as well as advanced techniques like model distillation and pruning [55,56]. Given the growing demand for deployment on edge platforms, we aim to deliver a SAR ship detection model that remains lightweight and high-precision to meet the needs of practical applications in resource-constrained environments.

5. Conclusions

In this paper, we proposed a lightweight and high-precision deep learning method for SAR ship detection named LH-YOLO. LH-YOLO improves the YOLOv8n baseline through three key optimizations. First, a new lightweight backbone, StarNet-nano, was designed, significantly reducing the number of parameters and computational load while maintaining accuracy. Second, a lightweight LFE-C2f architecture was developed to address the issue of the C2f modules’ parameters occupying a significant portion of YOLOv8n’s neck. Third, an RSCD head was proposed to support weight parameter reuse and sharing, resolving the underutilization of parameters in YOLOv8n’s detection head.
The network model was evaluated on two public datasets, HRSID and the SAR-Ship-Dataset, with the results indicating that LH-YOLO achieves better performance in terms of accuracy, model size, parameter count, and computational complexity. On the HRSID dataset, compared to the YOLOv8n benchmark, LH-YOLO reduced the parameters and computational load by 38.1% and 23.8%, respectively, while improving mAP50 by 1.4%. Additionally, the model achieved strong generalization on the SAR-Ship-Dataset. This demonstrates the model’s ability to balance high detection precision and model efficiency for SAR ships. The lightweight nature of LH-YOLO allows for easier integration into resource-constrained systems. This paper is intended to provide valuable guidance to researchers in the field of ship detection and to enhance inference performance in practical applications.
In future work, we will continue to enhance the feature extraction capabilities of this lightweight architecture and improve small ship detection, aiming for superior SAR ship detection performance.

Author Contributions

Conceptualization, Q.C. and H.C.; methodology, Q.C.; software, Q.C.; validation, Q.C., H.C. and S.W.; formal analysis, Q.C. and Y.W.; investigation, Q.C. and H.C.; resources, Q.C.; data curation, Z.C.; writing—original draft preparation, Q.C.; writing—review and editing, Q.C., H.C., H.F. and F.L.; visualization, Q.C.; supervision, F.L. and Q.C.; project administration, F.L. and Q.C.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Aeronautical Science Foundation of China (ASFC-20184370012) and by the National Natural Science Foundation of China (No. 61474093).

Data Availability Statement

This paper used the HRSID and SAR-Ship-Dataset. These are the High-Resolution SAR Images Dataset (HRSID) (https://github.com/chaozhong2010/HRSID, accessed on 19 September 2024) and the SAR-Ship-Dataset (https://github.com/CAESAR-Radi/SAR-Ship-Dataset, accessed on 19 September 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef]
  2. Wysocki, K.; Niewińska, M. Counteracting imagery (IMINT), optoelectronic (EOIMINT) and radar (SAR) intelligence. Sci. J. Mil. Univ. Land Forces 2022, 54, 222–244. [Google Scholar] [CrossRef]
  3. Agrawal, S.; Khairnar, G.B. A comparative assessment of remote sensing imaging techniques: Optical, sar and lidar. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-5/W3, 1–6. [Google Scholar] [CrossRef]
  4. Li, J.; Xu, C.; Su, H.; Gao, L.; Wang, T. Deep Learning for SAR Ship Detection: Past, Present and Future. Remote Sens. 2022, 14, 2712. [Google Scholar] [CrossRef]
  5. Alexandre, C.; Devillers, R.; Mouillot, D.; Seguin, R.; Catry, T. Ship Detection with SAR C-Band Satellite Images: A Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 14353–14367. [Google Scholar] [CrossRef]
  6. Yasir, M.; Jianhua, W.; Mingming, X.; Hui, S.; Zhe, Z.; Shanwei, L.; Colak, A.T.I.; Hossain, M.S. Ship detection based on deep learning using SAR imagery: A systematic literature review. Soft Comput. 2023, 27, 63–84. [Google Scholar] [CrossRef]
Figure 1. The overall structure of YOLOv8. YOLOv8 originates from the open-source code made available by Ultralytics. “×2” means there are two columns of ConvModule.
Figure 2. The structure of LH-YOLO.
Figure 3. (a) The fundamental module of the StarNet-nano network. (b) Detailed description of the star operation.
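For readers unfamiliar with the star operation referenced in this caption, the PyTorch sketch below illustrates a generic StarNet-style block in which two parallel 1 × 1 convolution branches are fused by element-wise multiplication. It is a minimal sketch following the published StarNet design rather than the exact StarNet-nano configuration used in LH-YOLO; the expansion ratio, kernel sizes, and activation are illustrative assumptions.

```python
import torch
import torch.nn as nn


class StarBlock(nn.Module):
    """Minimal sketch of a StarNet-style block: two parallel 1x1-conv branches
    fused by element-wise multiplication (the "star" operation)."""

    def __init__(self, channels: int, expansion: int = 3):
        super().__init__()
        hidden = channels * expansion
        # Depthwise convolutions mix spatial information before and after the star operation.
        self.dw1 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.f1 = nn.Conv2d(channels, hidden, 1)  # branch 1 (passed through the activation)
        self.f2 = nn.Conv2d(channels, hidden, 1)  # branch 2 (kept linear)
        self.g = nn.Conv2d(hidden, channels, 1)   # projection back to the input width
        self.dw2 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.act = nn.ReLU6()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        x = self.dw1(x)
        x = self.act(self.f1(x)) * self.f2(x)  # star operation: element-wise product of the branches
        x = self.dw2(self.g(x))
        return identity + x  # residual connection keeps the block easy to stack


if __name__ == "__main__":
    # A 64-channel feature map keeps its spatial size and channel count through the block.
    y = StarBlock(64)(torch.randn(1, 64, 128, 128))
    print(y.shape)  # torch.Size([1, 64, 128, 128])
```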
Figure 4. (a) The C2f module in the neck of YOLOv8n. (b) The proposed LFE-C2f module.
Figure 5. (a) The framework of the decoupled head of YOLOv8n. (b) The framework of the RSCD head.
Figure 6. Comparison of detection results on the HRSID dataset. Green boxes mark correctly detected ships, blue boxes mark false detections, and red boxes mark ships that were missed.
Figure 7. Comparison of detection results on the SAR-Ship-Dataset.
Figure 8. Visual comparison of detection results for LH-YOLO and three other detection models on the HRSID and SAR-Ship-Dataset: (a) Ground Truth; (b) YOLOv3-Tiny; (c) YOLOv5; (d) YOLOv10n; (e) the proposed LH-YOLO.
Table 1. Description of the HRSID and SAR-Ship-Dataset.
Parameter                 | HRSID                              | SAR-Ship-Dataset
Data sources              | Sentinel-1B; TerraSAR-X; TanDEM-X  | GF-3; Sentinel-1
Image size (pixels)       | 800 × 800                          | 256 × 256
Number of images          | 5604                               | 43,819
Number of ships (small)   | 9242                               | 35,695
Number of ships (medium)  | 7388                               | 23,660
Number of ships (large)   | 321                                | 180
Resolution (m)            | 0.5, 1, 3                          | 3∼25
Table 2. Ablation experiments on the HRSID and SAR-Ship-Dataset.
Dataset          | StarNet-nano | LFE-C2f | RSCD | Precision | Recall | mAP50 | Model Size (MB) | Params (M) | FLOPs (G)
HRSID            |              |         |      | 0.940     | 0.901  | 0.952 | 6.3             | 3.006      | 12.6
HRSID            | ✓            |         |      | 0.937     | 0.899  | 0.949 | 5.7             | 2.705      | 12.5
HRSID            |              | ✓       |      | 0.938     | 0.902  | 0.955 | 5.4             | 2.532      | 10.7
HRSID            |              |         | ✓    | 0.954     | 0.906  | 0.965 | 5.0             | 2.363      | 10.2
HRSID            | ✓            | ✓       |      | 0.946     | 0.904  | 0.958 | 5.3             | 2.505      | 12.0
HRSID            | ✓            |         | ✓    | 0.943     | 0.886  | 0.957 | 4.4             | 2.062      | 10.2
HRSID            |              | ✓       | ✓    | 0.947     | 0.901  | 0.962 | 4.2             | 1.956      | 9.6
HRSID            | ✓            | ✓       | ✓    | 0.952     | 0.908  | 0.966 | 4.1             | 1.862      | 9.6
SAR-Ship-Dataset |              |         |      | 0.882     | 0.885  | 0.931 | 6.2             | 3.006      | 1.3
SAR-Ship-Dataset | ✓            |         |      | 0.890     | 0.881  | 0.931 | 5.7             | 2.705      | 1.2
SAR-Ship-Dataset |              | ✓       |      | 0.885     | 0.878  | 0.927 | 5.3             | 2.532      | 1.1
SAR-Ship-Dataset |              |         | ✓    | 0.892     | 0.894  | 0.937 | 4.9             | 2.363      | 1.0
SAR-Ship-Dataset | ✓            | ✓       |      | 0.890     | 0.886  | 0.931 | 5.3             | 2.505      | 1.2
SAR-Ship-Dataset | ✓            |         | ✓    | 0.893     | 0.884  | 0.935 | 4.4             | 2.062      | 1.0
SAR-Ship-Dataset |              | ✓       | ✓    | 0.886     | 0.888  | 0.932 | 4.2             | 1.956      | 1.0
SAR-Ship-Dataset | ✓            | ✓       | ✓    | 0.895     | 0.889  | 0.938 | 4.0             | 1.862      | 1.0
"✓" means the specific improvement was incorporated in these sub-experiments. The bold numbers represent the best performance achieved among these experiments.
Table 3. Comparisons of LH-YOLO and other models on the HRSID and SAR-Ship-Dataset. Bolded numbers indicate the best indicators.
Dataset          | Model                  | Precision | Recall | mAP50 | Params (M)
HRSID            | YOLOv3-Tiny            | 0.943     | 0.822  | 0.904 | 12.128
HRSID            | YOLOv5                 | 0.925     | 0.893  | 0.945 | 2.503
HRSID            | YOLOv8n                | 0.940     | 0.901  | 0.952 | 3.006
HRSID            | YOLOv10n               | 0.934     | 0.883  | 0.957 | 2.265
HRSID            | Faster R-CNN           | 0.911     | 0.871  | 0.875 | 41.753
HRSID            | Cascade R-CNN          | 0.902     | 0.890  | 0.894 | 69.395
HRSID            | Mask R-CNN             | 0.894     | 0.858  | 0.863 | 44.396
HRSID            | ESarDet *              | -         | -      | 0.932 | 6.200
HRSID            | Improved YOLOx-Tiny *  | 0.936     | -      | 0.868 | 1.464
HRSID            | LH-YOLO (ours)         | 0.934     | 0.883  | 0.966 | 1.862
SAR-Ship-Dataset | YOLOv3-Tiny            | 0.885     | 0.822  | 0.892 | 12.128
SAR-Ship-Dataset | YOLOv5                 | 0.887     | 0.864  | 0.916 | 2.503
SAR-Ship-Dataset | YOLOv8n                | 0.886     | 0.886  | 0.931 | 3.006
SAR-Ship-Dataset | YOLOv10n               | 0.877     | 0.879  | 0.926 | 2.265
SAR-Ship-Dataset | Faster R-CNN           | 0.824     | 0.943  | 0.935 | 41.753
SAR-Ship-Dataset | Cascade R-CNN          | 0.836     | 0.945  | 0.936 | 63.395
SAR-Ship-Dataset | Mask R-CNN             | 0.819     | 0.942  | 0.937 | 44.396
SAR-Ship-Dataset | YOLO-lite              | 0.948     | 0.881  | 0.921 | 7.640
SAR-Ship-Dataset | SHIP-YOLO              | 0.932     | 0.928  | 0.966 | 2.500
SAR-Ship-Dataset | LH-YOLO (ours)         | 0.895     | 0.889  | 0.938 | 1.862
"*" marks models that are not open source; "-" means the corresponding paper does not report the specific value of the metric.