1. Introduction
Maritime transport has propelled the growth of coastal industries and international trade, providing a steady supply for modern economic development [1]. Global trade exchanges have become more active and maritime transport volume has surged rapidly. However, this continuous progress has also presented mounting challenges for maritime supervision. Ship detection serves as an essential means within the maritime supervision system, enabling swift ship positioning and tracking. Synthetic Aperture Radar (SAR) is widely acknowledged as one of the most extensively employed techniques for maritime supervision due to its exceptional capability to operate continuously under diverse weather conditions [2]. Given the importance of real-time ship detection, the field of maritime surveillance requires a rapid and efficient SAR ship detection method.
Numerous efficient and accurate methods have been developed for ship detection, and scholars have extensively researched real-time detection using various forms of SAR data, including SAR imagery, raw SAR echoes, and SAR range-compressed data. The earliest and most extensive research on SAR ship detection concerns detection in SAR imagery. Based on the presence or absence of artificially designed features, this field can be categorized into traditional methods and deep learning-based methods. Traditional ship detection methods rely primarily on manually crafted features, with the constant false alarm rate (CFAR) algorithm being the most representative [3]. CFAR uses the statistical properties of the background to determine an appropriate threshold through rigorous statistical analysis, thereby achieving a binary classification between ships and background. Apart from CFAR, there are detection techniques based on global thresholding [4], visual saliency [5], domain transforms [6], etc. However, these conventional methods exhibit limited robustness in detection performance and involve pixel-level traversal during processing, rendering them inadequate for real-time applications. Deep learning ship detection methods employ deep neural networks for automated feature extraction. Owing to the robust feature extraction capabilities of convolutional neural networks (CNNs), CNN-based detection methods have garnered significant research attention, giving rise to two-stage detection networks exemplified by Faster R-CNN [7] and one-stage detection networks exemplified by YOLOv1 [8]. Moreover, the Transformer [9], an attention-based architecture, has also proved applicable: Transformer-based networks such as the Vision Transformer (ViT) [10] and the Detection Transformer (DETR) [11] have gradually emerged within computer vision. These deep neural networks have been progressively integrated into SAR ship detection, showcasing precise detection performance and efficient detection times.
Although deep learning-based methods can rapidly detect ships in SAR images, the real-time detection of SAR ships is often constrained by imaging time in practical application scenarios. During the SAR imaging process, the echo data must be transmitted to the ground station; only after various intricate and time-consuming operations, such as range cell migration correction, azimuth pulse compression, and radiometric correction, can the SAR imagery products essential for subsequent ship detection be obtained. Therefore, the overall time required for ship detection using SAR images encompasses both the detection time and the time needed to acquire a fully focused SAR image product.
In light of the inherent limitations introduced by the SAR imaging process, researchers have attempted to move the detection step earlier and explore on-board detection before imaging to enhance the real-time performance of SAR ship detection. Several scholars [12,13] have attempted detection based on the analysis of raw SAR echo data. However, due to the low peak signal-to-noise ratio of raw SAR echoes, accurate detection of ship targets is challenging. Consequently, there has been growing interest in detecting ships in the partially focused range-compressed domain. Joshi et al. [14] argued that the real-time requirements of ship detection cannot be met by relying solely on fully focused SAR images, since additional processing effort is needed to generate them; the use of range-compressed data is therefore relatively appealing. Subsequently, CFAR [14], Faster R-CNN [15], YOLO [16], LSTM [17], Inception [18], and other ship detection methods commonly employed for SAR imagery have also been applied to range-compressed data. Nevertheless, these methods were originally designed for optical images, and their network structures were not redesigned when applied directly to range-compressed data. As a result, their performance may fall short of expectations, since the properties of range-compressed SAR images differ significantly from those of optical remote sensing images and conventional SAR representations of ships. Additionally, these range-compressed-domain methods do not address the challenges of optimizing computational efficiency and reducing parameter counts. It is therefore imperative to design an efficient, lightweight ship detection method based on the characteristics of the range-compressed domain.
In this paper, we propose FastRCDet, a novel lightweight network that achieves performance comparable to mainstream lightweight detection networks while significantly reducing the number of parameters and floating-point operations (FLOPs). The key contributions of this research can be summarized as follows:
We propose a novel lightweight detection network framework for ship detection in the SAR range-compressed domain, considering both the limited resources of embedded platforms and the unique characteristics of the SAR range-compressed domain to design an optimized network structure.
Considering the specific geometric characteristics of the SAR range-compressed domain and combining them with a lightweight network design concept, the Lightweight Adaptive Network (LANet) is proposed as the backbone. To address the large scale and high aspect ratio of ships in the range-compressed domain, Adaptive Kernel Convolution (AKConv) is introduced as a fundamental component of the backbone, allowing the receptive field to adapt its shape to the characteristics of ship data in this domain. LANet outperforms existing lightweight networks in ship detection within the range-compressed domain, exhibiting a smaller model, faster speed, and more precise detection.
To further enhance the efficiency and simplicity of network models, an innovative single Multi-Scale Fusion Head (MSFH) is proposed. We incorporate an Atrous Spatial Pyramid Pooling (ASPP) module to effectively fuse feature maps from multiple scales and reduce computational complexity. This module effectively combines features at different scales to better capture detailed information of the target object. Compared to traditional multi-detection head structures, this design significantly reduces the number of parameters in the network model and minimizes computational resource consumption while maintaining high-precision detection results.
To enhance the adaptability of ship detection in the range-compressed domain of SAR, a novel loss function, Direction IoU (DIoU), tailored to the target's shape characteristics, is proposed. By meticulously designing the angular cost component, this function ensures higher predicted bounding box accuracy in the range dimension through an increased cost for horizontal movement. Consequently, our model exhibits exceptional detection performance even on SAR range-compressed images.
To assess the performance of our proposed network, we conduct experiments using publicly available ship datasets in the range-compressed domain. In comparison to mainstream lightweight detection networks, our proposed network achieves significant reduction in parameter count and computational requirements without compromising detection performance.
The rest of this paper is organized as follows. Section 2 reviews related work, providing a literature review of lightweight networks, lightweight ship detection methods in SAR imagery, and ship detection methods in the SAR range-compressed domain. Section 3 presents the theoretical explanation of the range-compressed domain, accompanied by a brief analysis of its characteristics. Section 4 elaborates on the proposed method in detail. Section 5 presents the evaluation metrics, experimental design, experimental results, and their analysis. Section 6 concludes the paper.
2. Related Works
Performing ship detection prior to image acquisition from airborne or spaceborne SAR payload platforms reduces data transmission and enables a rapid determination of the ship’s position. Considering the limited availability of computing resources, it is imperative to develop a lightweight neural network model for smart ship detection that can effectively operate on resource-constrained edge computing platforms. Therefore, this section provides a comprehensive literature review encompassing three key aspects: (1) lightweight networks; (2) lightweight ship detection methods in SAR imagery; and (3) ship detection methods in the SAR range-compressed domain.
2.1. Lightweight Networks
Although deep learning models are effective, they typically impose high computational and storage requirements. Consequently, relevant scholars have been actively investigating lightweight networks.
2.1.1. Convolution Neural Networks
Currently, convolutional neural networks (CNNs) are predominantly utilized for object detection and recognition tasks. Since AlexNet [19], CNNs have progressively evolved into deeper and more intricate architectures. However, their efficiency in terms of time and memory has not necessarily improved. In real-world scenarios, real-time computation with limited computational resources is imperative, necessitating a delicate trade-off between speed and accuracy. To achieve higher efficiency, numerous lightweight CNNs have been studied. MobileNets [20,21,22], proposed by Google, are a series of lightweight networks that effectively reduce model parameters and computational complexity while only marginally compromising precision. ShuffleNets [23,24], introduced by the Megvii Institute in 2017, demonstrate efficient performance on mobile devices. Huawei's Noah's Ark Laboratory proposed an end-to-end network, GhostNet [25], which introduces an innovative Ghost module specifically designed to generate additional feature maps efficiently. EfficientNets [26,27], proposed by Google Brain, achieve efficient model design through the uniform scaling of network depth, width, and resolution, effectively reducing computational complexity and parameter count while maintaining high accuracy.
2.1.2. Vision Transformer and Variants
Since the proposal of ViT, which extends the application of the Transformer from machine translation or prediction [
28] to computer vision, there has been a growing research interest in ViT. Subsequent studies have focused on enhancing ViT through improvements in lightweight model design. LeViT [
29], proposed by Facebook AI Research, stands as a prominent benchmark among lightweight ViT models. It effectively amalgamates the global information transmission capabilities of Transformers with the local structural perception capabilities of CNNs. Wang et al. [
30] proposed the Pyramid Vision Transformer (PVT), which integrates the pyramid structure of convolutional neural networks with the global receptive field of Transformers. PVT is specifically designed to address challenges such as low resolution, high computational requirements, and memory overhead associated with traditional Transformers when handling demanding prediction tasks. Chen et al. [
31] have proposed a novel network called Mobile-Former, which effectively parallelizes the functionalities of MobileNet and Transformers while establishing a bidirectional connection between them. Mehta et al. [
32] proposed a network named MobileViT that aims to integrate the strengths of CNN and ViT.
2.2. Lightweight Ship Detection in SAR Imagery
Lightweight ship detection in SAR imagery is a critical technology for maritime surveillance and search and rescue operations. Although traditional deep learning models are effective, their high computational and storage demands limit their suitability for deployment on resource-constrained platforms.
To address these challenges, recent advancements have focused on developing lightweight neural networks that maintain high detection accuracy while significantly reducing model size and computational complexity. These networks employ innovative techniques such as optimized convolutional layers, attention mechanisms, and advanced loss functions to efficiently process SAR images and accurately detect ships under various environmental conditions. The objective is to achieve real-time processing speeds and robust performance even in cluttered maritime scenes.
Currently, there exist two categories of lightweight SAR ship detection methods. One approach adapts a mainstream object detection network, making modifications that reduce its computational burden. Miao et al. [33] proposed an improved lightweight RetinaNet model for ship detection in SAR images; by replacing shallow convolutional layers and reducing the number of deep convolutional layers in the backbone, adding spatial and channel attention modules, and applying a K-means clustering algorithm to adjust the model parameters, the computation and parameter count are significantly reduced. Li et al. [34] proposed a novel detection framework based on Faster R-CNN to enhance detection speed, improving its feature extraction, recognition, and positioning networks to strengthen recognition and localization capabilities. Yu et al. [35] proposed a lightweight ship detection network based on YOLOX-s [36], which eliminates the computationally intensive pyramid structure and establishes a streamlined network relying on first-order features to enhance detection efficiency. Liu et al. [37] presented YOLOv7oSAR, a lightweight approach for ship detection in SAR imagery that advances YOLOv7 [38] by introducing a lightweight rotated-box structure to reduce computational cost and adopting specific loss functions to enhance accuracy.
Another type of approach redesigns a new network framework. Wang et al. [39] proposed a lightweight network for ship target detection in SAR images, addressing the high cost of traditional deep CNNs through a network structure optimization algorithm based on a multi-objective firefly algorithm. Ren et al. [40] presented YOLO-Lite, a lightweight network based on the YOLO [41] framework; by devising a feature-enhanced backbone and incorporating specialized modules, accurate SAR ship detection across diverse backgrounds is achieved while maintaining low computational overhead. Chang et al. [42] developed MLSDNet, a lightweight multi-class SAR detection network that leverages adaptive scale-distributed attention and a streamlined backbone to enhance target detection performance. Tian et al. [43] proposed the LFer-Net detector to address challenges in ship detection such as low resolution, small target size, and densely arranged ships. Zhou et al. [44] introduced EGTB-Net, a lightweight SAR ship detection network that integrates Transformer and feature enhancement techniques to improve both detection speed and accuracy.
2.3. Ship Detection in the SAR Range-Compressed Domain
Joshi et al. [14] proposed using the CFAR algorithm to detect ships outside the clutter region of the range-Doppler domain, a robust method for dealing with different clutter environments; however, it may not scale effectively to larger datasets or more diverse environmental conditions. Leng et al. [45] proposed a ship detection method based on SAR range-compressed data that uses Complex Signal Kurtosis (CSK) to pre-screen potential ship areas and CNN-based classification to detect potential target areas. The combined use of CSK and CNN could increase computing demands, potentially hampering real-time processing.
In recent years, Gao et al. [17] proposed utilizing LSTM networks from natural language processing to detect two-dimensional objects as multiple one-dimensional sequences in the range-compressed domain and achieved preliminary results; however, further experiments are required to validate the real-time capability and speed of this detector, and since LSTMs are primarily designed for sequential data, their application to spatial tasks is conceptually challenging and requires extensive fine-tuning. Loran et al. [15] introduced an airborne SAR ship detector based on the Faster R-CNN network in the range-Doppler domain. Nevertheless, these methods lack a comprehensive analysis of the features of the range-compressed domain and fail to integrate them into their approaches. Zeng et al. [18] presented a ship target detection method in the SAR range-compressed domain using a novel Inception-Text convolutional neural network model; however, this method can only provide an approximate measurement of the range dimension, and further experimentation is required to verify its applicability.
4. Methods
In this section, a new lightweight network is proposed for ship detection in the SAR range-compressed domain, named FastRCDet, which is designed to be efficient and rapid. Instead of building upon an existing network as a baseline, we have reengineered the network specifically considering ship characteristics in the SAR range-compressed domain. FastRCDet introduces a lightweight backbone, utilizes a single detection head with a novel loss function, and incorporates an anchor-free algorithm to form our detection framework.
4.1. Overall Structure and Process
The overall network structure of FastRCDet is illustrated in Figure 2. It is an anchor-free, single-stage ship detector for the range-compressed domain. To cater to the specific characteristics of ship data in this domain, our design deviates from the conventional structure comprising input, backbone, neck, detection head, and output components. Instead, we adopt a simplified framework consisting of four fundamental modules: input, lightweight backbone, lightweight single head, and output. This approach not only simplifies the network structure but also significantly reduces computational complexity.
Specifically, during the process of forward propagation, we initially subject the imagery in the range-compressed domain to the Lightweight Adaptive Network (LANet), which has been meticulously designed, thereby obtaining a deep and high-dimensional feature map. Subsequently, the feature maps originating from diverse spatial scales are effectively fused through a novel single Multi-Scale Fusion Head (MSFH) for accurate ship prediction.
In the process of backpropagation, a new loss function is proposed by analyzing the shape characteristics of ship images in the range-compressed domain. The resulting loss function, named Direction IoU (DIoU), facilitates network convergence by aligning the predicted bounding box with the vertical line of the ground truth box during training, improving the model's perception of the position and shape of the ship.
4.2. Backbone: Lightweight Adaptive Network (LANet)
4.2.1. Adaptive Kernel Convolution
Considering the distinctive characteristics of ships in the range-compressed domain, we propose adjusting the shape of the convolution kernel to modify the receptive field region and enhance extraction accuracy. We incorporate AKConv [48], whose shape adjustability offers significant advantages in addressing the defects of traditional convolution: it enhances the capture of surrounding-area information while reducing parameter count and computational burden, yielding better results and higher efficiency in image processing tasks.
Traditional convolution exhibits limitations in processing range-compressed domain images, and we introduce AKConv into the backbone to address them; its specific structure is illustrated in Figure 3. Firstly, conventional convolution operations are confined to a fixed-size window and cannot capture information beyond its boundaries, so important details may be lost when dealing with larger targets. In contrast, AKConv adjusts the shape of its kernel through learned offsets, adapting flexibly to windows of varying sizes and shapes during convolution and thereby capturing additional information from the surrounding area. Furthermore, as the window size k increases, traditional convolution incurs a substantial increase in parameters, heightening model complexity and computational demands and making overfitting more likely. AKConv mitigates these issues by dynamically learning offsets and adjusting its shape; thus it reduces model complexity and computational burden while enhancing generalization.
AKConv stands out for accommodating an arbitrary number of parameters, setting it apart from traditional convolution methods that rely on fixed kernel sizes such as 3 × 3 or 5 × 5. As depicted in Figure 4, AKConv allows the number of kernel parameters to be set to any value, including but not limited to 1, 2, 3, and 4, which endows it with enhanced flexibility for model design. In addition, the initial kernel shape is not constrained by convention: while conventional convolutions typically employ square or rectangular kernels, AKConv admits a wider range of shapes and sizes. Figure 5 illustrates how different shapes and sizes can be utilized as initial kernels when employing n convolution parameters.
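To make this mechanism concrete, the sketch below shows one way an offset-learning convolution of this kind can be written in PyTorch. It is a simplified stand-in for AKConv [48] under our own assumptions (the module name, the 3 × 3 offset predictor, the vertical initial sampling pattern, and the 1 × 1 mixing layer are all illustrative choices), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AKConvSketch(nn.Module):
    """Offset-learning convolution with an arbitrary number n of sampling
    points (a simplified AKConv-style layer; hypothetical, for illustration)."""

    def __init__(self, in_ch: int, out_ch: int, num_params: int = 5):
        super().__init__()
        self.n = num_params
        # Predicts a 2D offset for each of the n sampling points at every pixel.
        self.offset_pred = nn.Conv2d(in_ch, 2 * num_params, 3, padding=1)
        # Mixes the n resampled copies of the input into the output channels.
        self.mix = nn.Conv2d(in_ch * num_params, out_ch, 1)
        # Initial pattern: a vertical line of n points (one plausible choice
        # for the elongated, azimuth-defocused targets discussed in the text).
        base = torch.arange(num_params, dtype=torch.float32) - (num_params - 1) / 2
        self.register_buffer("base_y", base)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        offsets = self.offset_pred(x).view(b, self.n, 2, h, w)
        # Identity sampling grid in normalized [-1, 1] coordinates, (H, W, 2).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1)
        samples = []
        for i in range(self.n):
            # Initial pattern plus learned offset, converted from pixels to
            # normalized grid units, then bilinear resampling of the input.
            dx = offsets[:, i, 0] * 2.0 / max(w - 1, 1)
            dy = (offsets[:, i, 1] + self.base_y[i]) * 2.0 / max(h - 1, 1)
            shifted = grid.unsqueeze(0) + torch.stack((dx, dy), dim=-1)
            samples.append(F.grid_sample(x, shifted, align_corners=True))
        return self.mix(torch.cat(samples, dim=1))
```

Because the number of sampling points n is a free argument rather than a k × k constraint, the parameter count grows linearly in n; for example, AKConvSketch(64, 128, num_params=5)(torch.randn(1, 64, 32, 32)) produces a (1, 128, 32, 32) output.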
4.2.2. Adaptive Kernel Block (AK Block)
We have developed two types of blocks within LANet, namely the basic block and the Adaptive Kernel (AK) block. The schematic representation of the basic block is illustrated in Figure 6a, while Figure 6b depicts the architecture of the AK block. The basic block consists of an AKConv layer and two 1 × 1 convolution layers, and, inspired by ResNet [49], a shortcut connection is incorporated between the input and output of the block.
In the proposed LANet, the network structure is constructed by sequentially connecting an AK block and three basic blocks through merging layers. The design aims to fully exploit the unique advantages of the AKConv layer while complementing it with conventional convolution operations.
Through this deep convolutional architecture, image feature extraction becomes more comprehensive and accurate, capturing the rich and diverse information present in the image. The basic block provides the essential functions of normal convolution and effectively extracts deep features, while the AK block introduces the AKConv layer, enhancing the model's ability to capture intricate relationships between details and local regions while preserving its original information extraction capability. The components of the network are tightly integrated and play a pivotal role in information transmission and the acquisition of abstract representations, progressing efficiently from low-level visual features to high-level semantic representations and ultimately generating high-dimensional features with rich semantic information and good generalization performance.
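A minimal PyTorch sketch of this block structure follows, under stated assumptions: the normalization and activation layers (BatchNorm + SiLU), the stride-2 "merging" convolutions, and the single-channel input are our guesses at unstated details, and the AK-block mid-layer would be an AKConv-style module such as the AKConvSketch above.

```python
import torch
import torch.nn as nn

class BlockSketch(nn.Module):
    """Mid-layer sandwiched by two 1x1 convolutions with a ResNet-style
    shortcut [49]. A plain 3x3 conv as `mid` plays the role of the basic
    block; an AKConv-style layer plays the role of the AK block."""

    def __init__(self, ch: int, mid: nn.Module):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.SiLU(),
            mid, nn.BatchNorm2d(ch), nn.SiLU(),
            nn.Conv2d(ch, ch, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # shortcut between block input and output

class LANetSketch(nn.Module):
    """One AK block followed by three basic blocks, joined by stride-2
    'merging' convolutions (the exact merging layer is not specified)."""

    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        stages, in_ch = [], 1  # single-channel range-compressed input assumed
        for w in widths:
            stages.append(nn.Conv2d(in_ch, w, 3, stride=2, padding=1))
            # The first stage would use an AKConv-style mid-layer (AK block);
            # a plain 3x3 conv stands in here so the sketch runs standalone.
            stages.append(BlockSketch(w, nn.Conv2d(w, w, 3, padding=1)))
            in_ch = w
        self.features = nn.Sequential(*stages)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)  # deep, high-dimensional feature map
```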
4.3. Detection Head: Multi-Scale Fusion Head (MSFH)
Most object detection networks use multiple detection heads to adapt to objects of different scales: shallow, high-resolution heads handle small objects, while deep, low-resolution heads handle large objects, in a divide-and-conquer strategy. In fact, the key to detecting objects of different scales is the size of the receptive field, because objects of different scales require different receptive fields and each layer of the model has a receptive field of a different size. FPN, for example, integrates and fuses features with different receptive fields.
In this paper, a novel single detection head named the Multi-Scale Fusion Head (MSFH) is proposed. It draws on the idea of YOLOF [50] and uses a parallel grouped-convolution structure similar to Inception [51] to fuse features with different receptive fields. In this way, a single detection head can adapt to objects of different scales, and the network structure is simplified to reduce computation. The structure of the detection head is illustrated in Figure 7.
Specifically, in the MSFH module we employ Atrous Spatial Pyramid Pooling (ASPP) [52] to fuse multi-scale features. This module replaces the conventional pooling operation with multiple parallel dilated convolution layers featuring different dilation rates. The features extracted by each parallel convolution layer are processed in separate branches and fused to generate the final outcome. The core concept is to divide the input into distinct levels, each incorporating parallel dilated convolutions with varying dilation rates, and to concatenate the outputs of all levels into a comprehensive feature representation. Because it combines outputs from diverse levels, the MSFH module can effectively extract features from images of various sizes, preserving image information more comprehensively. The background, position, and category information of the ship target are then obtained by three parallel 5 × 5 convolution layers, respectively.
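The following is a minimal PyTorch sketch of such a head, assuming illustrative channel widths and dilation rates (the paper does not state them here); the three 5 × 5 branches correspond to the background/objectness, position, and category outputs mentioned above.

```python
import torch
import torch.nn as nn

class MSFHSketch(nn.Module):
    """ASPP-fused single detection head (hypothetical configuration)."""

    def __init__(self, in_ch: int = 256, mid_ch: int = 128, rates=(1, 2, 4, 8)):
        super().__init__()
        # Parallel dilated 3x3 convolutions: each rate sees a different
        # receptive field, which is the ASPP idea of [52].
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(mid_ch * len(rates), mid_ch, 1)
        # Three parallel 5x5 heads for background/objectness, box position,
        # and category (a single 'ship' class here).
        self.obj = nn.Conv2d(mid_ch, 1, 5, padding=2)
        self.box = nn.Conv2d(mid_ch, 4, 5, padding=2)
        self.cls = nn.Conv2d(mid_ch, 1, 5, padding=2)

    def forward(self, x: torch.Tensor):
        fused = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return self.obj(fused), self.box(fused), self.cls(fused)
```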
4.4. Loss Function: Direction IoU (DIoU)
The object detection loss function, as depicted in (7), is decomposed into two components: an Intersection over Union (IoU) loss and a category loss:

$$L = L_{IoU} + L_{cls} \tag{7}$$

where, since the proposed method in our study solely focuses on ship position detection without considering categorization, $L_{cls}$ is zero and will not be discussed further.

Firstly, in object detection, IoU, initially employed to quantify the agreement between the predicted bounding box and the ground truth bounding box, is defined as

$$IoU = \frac{|A \cap B|}{|A \cup B|} \tag{8}$$

where $A$ and $B$ represent any two polygonal boxes. In early target detection networks, the loss function used during backpropagation was defined as

$$L_{IoU} = 1 - IoU \tag{9}$$
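For reference, the standard axis-aligned IoU of (8) can be computed as follows (a plain-Python sketch; the box format (x1, y1, x2, y2) is assumed), so that the loss of (9) is simply 1 - iou(pred, gt):

```python
def iou(a, b):
    """Axis-aligned IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```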
The ship range-compressed data differ significantly from conventional optical remote sensing images and SAR images, necessitating a redesign of the loss function to enhance network training. To better suit ship detection in the range-compressed domain of SAR, we devise a novel IoU loss function named Direction IoU (DIoU) that aligns more effectively with the target's shape characteristics. The expression of DIoU is given in (10), where $\Delta$ represents the distance loss and $\Omega$ represents the shape loss. The following provides the precise definition and calculation of each in turn.
4.4.1. Distance Loss
As depicted in Figure 1, the SAR range-compressed domain image demonstrates range-dimensional focusing of the ship while exhibiting azimuth-dimensional defocusing. It is therefore crucial to ensure utmost precision in detecting range cells while allowing a certain level of compromise in azimuthal accuracy. Consequently, during network training, the prediction of bounding boxes should be treated meticulously with respect to horizontal movement, whereas vertical movement can be handled with relatively more flexibility. In conclusion, it is imperative to devise a loss function that fully accounts for the impact of the distance between the predicted and ground-truth bounding boxes when the angle $\alpha$ approaches 0 and minimizes its influence as $\alpha$ trends towards $\pi/2$.
The calculation of the proposed distance loss is illustrated by (11), where $\Lambda$ represents the angle cost; the calculation of $\Lambda$ is presented in (12). $\rho_x$ ($\rho_y$) represents the squared ratio of the difference between the center-point abscissa (ordinate) of the ground-truth box and that of the predicted box to the width (height) of the smallest enclosing rectangle, as depicted in (13).
As depicted in Figure 8, $\alpha$ represents the azimuth angle between the ground truth bounding box and the predicted bounding box, $c_h$ denotes the vertical displacement between their center points, $c_w$ signifies their horizontal separation, and $\sigma$ indicates the distance between the two center points. The variables $C_w$ and $C_h$ represent the width and height, respectively, of the minimum enclosing box encompassing both bounding boxes.
where $(b_{cx}, b_{cy})$ and $(b_{cx}^{gt}, b_{cy}^{gt})$ represent the center-point coordinates of the predicted box and the ground-truth box, respectively.
As $\alpha$ approaches 0, it can be observed in Figure 9c that $\Lambda$ tends to approach 0. In Formula (11), when the distance ratio remains constant, a smaller value of $\Lambda$ corresponds to a smaller exponential term, indicating a greater distance loss, as shown in Figure 9d. Consequently, in order to achieve the same low loss during descent, the predicted bounding box needs to undergo more displacement towards the ground-truth bounding box as $\alpha$ increases.
4.4.2. Shape Loss
The calculation of the shape loss is presented in (17). The term $\omega_w$ ($\omega_h$) is defined as the absolute difference between the widths (heights) of the ground-truth and predicted bounding boxes, divided by the maximum of the two widths (heights). The geometric relationship between the ground truth bounding box and the predicted bounding box is shown in Figure 10. The calculation of $\omega_w$ ($\omega_h$) is presented in (18).
By combining Formulas (9)–(11) and (17), the final expression of the DIoU loss is obtained.
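For readability, one plausible assembled form is sketched below, assuming the distance and shape terms enter with the equal weighting used by SIoU-style losses; the exact weighting and the modified angle cost $\Lambda$ are defined by Equations (10)–(12) and (17) of the original, so this is a reading aid rather than the authors' formula.

```latex
% Hedged sketch: an SIoU-style assembly is assumed, not confirmed by the text;
% gamma couples the modified angle cost \Lambda into the distance term.
L_{DIoU} = 1 - IoU + \frac{\Delta + \Omega}{2},
\qquad
\Delta = \sum_{t \in \{x,\,y\}} \bigl(1 - e^{-\gamma \rho_t}\bigr),
\qquad
\Omega = \sum_{t \in \{w,\,h\}} \bigl(1 - e^{-\omega_t}\bigr)^{\theta}
```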
4.5. Anchor-Free
Among the conventional object detection algorithms, the anchor-based algorithm is a prevalent approach. This technique necessitates performing an anchor-bias operation on the dataset, which involves clustering the width and height of annotated objects to derive a set of prior dimensions. Subsequently, during model training, the network optimizes its predicted bounding box dimensions based on this set of priors.
However, our proposed FastRCDet adopts an anchor-free algorithm. In contrast to anchor-based algorithms, it eliminates the need to predefine anchor boxes from prior width and height statistics of the ground truth. Instead, it transforms target detection into a key-point detection problem: the target is treated as a single point represented by its center coordinates, and detection is achieved by regressing the center point and its distances to the sides of the target box.
This simplification streamlines post-processing and reduces computational overhead. Another distinction is that, whereas anchor-based algorithms typically associate N anchors with each feature point on a feature map, our anchor-free method associates only one candidate box with each feature point. Consequently, it also offers significant advantages in inference speed.
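As an illustration of this center-plus-distances formulation, the sketch below decodes FCOS-style regression maps into boxes; the (left, top, right, bottom) parameterisation and the stride-based grid are assumptions, since the paper does not spell out its exact encoding.

```python
import torch

def decode_anchor_free(reg: torch.Tensor, stride: int) -> torch.Tensor:
    """Decode per-location distances into boxes.

    reg: (B, 4, H, W) predicted pixel distances (left, top, right, bottom)
    from each feature-map location to the box sides.
    Returns (B, H*W, 4) boxes as (x1, y1, x2, y2) in image coordinates.
    """
    b, _, h, w = reg.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=reg.device, dtype=torch.float32),
        torch.arange(w, device=reg.device, dtype=torch.float32),
        indexing="ij",
    )
    # Each feature point maps to the centre of its stride-sized image cell.
    cx, cy = (xs + 0.5) * stride, (ys + 0.5) * stride
    left, top, right, bottom = reg.unbind(dim=1)          # each (B, H, W)
    boxes = torch.stack((cx - left, cy - top, cx + right, cy + bottom), dim=-1)
    return boxes.view(b, h * w, 4)
```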
5. Experiments
To validate the efficacy of FastRCDet, an extensive array of experiments was conducted on an available ship dataset within the range-compressed domain of SAR. This section initially outlines the dataset preparation, experimental setup, evaluation metrics, and training specifics. Subsequently, ablation experiments were performed to substantiate the effectiveness of each proposed module. Furthermore, a comparative analysis was carried out to juxtapose the proposed FastRCDet with the most commonly used detectors.
5.1. Dataset
FastRCDet is a supervised deep learning detector. It is crucial to have an adequate number of range-compressed domain data samples with target labeling in order to ensure that the trained network model possesses strong reasoning capabilities, specifically for detecting ships in test images.
In this study, we employ RCShip [53], a publicly available ship detection dataset in the range-compressed domain. Table 1 presents the relevant parameters of the images. The dataset consists of 18,672 images of 1024 × 1024 pixels each, showing distinct ships over extensive sea-surface areas. The samples encompass diverse ships across various scenarios and have been meticulously annotated by experts to ensure precise and dependable reference data.
During the training phase, utilizing these labeled compressed domain data samples enables the network model to better comprehend and identify various types of targets. Consequently, it effectively guides the network model in acquiring the capability to differentiate and locate each target within test images.
5.2. Implementation Details
The experiments in this paper were conducted on the same equipment, whose specific environmental parameters are presented in Table 2. For network training, a batch size of 16 was employed, with an input image size of 1024 × 1024. Training ran for 200 epochs, starting from an initial learning rate of 0.001 and gradually decreasing to a final learning rate of 0.0001. Stochastic Gradient Descent (SGD) was used as the optimizer, with a momentum of 0.937 and a weight decay of 0.0005.
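In PyTorch terms, the stated optimizer settings correspond to a configuration like the sketch below; the model is a placeholder, and the linear shape of the decay from 0.001 to 0.0001 is our assumption (the text only gives the two endpoints).

```python
import torch

def make_optimizer(model: torch.nn.Module, epochs: int = 200):
    """SGD with the stated hyperparameters plus a 0.001 -> 0.0001 LR decay."""
    opt = torch.optim.SGD(
        model.parameters(),
        lr=1e-3,             # initial learning rate
        momentum=0.937,
        weight_decay=5e-4,
    )
    # end_factor 0.1 scales the initial 1e-3 down to the final 1e-4.
    sched = torch.optim.lr_scheduler.LinearLR(
        opt, start_factor=1.0, end_factor=0.1, total_iters=epochs
    )
    return opt, sched
```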
5.3. Evaluation Metrics
The performance evaluation of the implemented detector incorporates the F1-score, average precision (AP), frames per second (FPS), parameters (Params), and floating-point operations (FLOPs) as evaluation metrics. These metrics are computed as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

$$FLOPs = \sum_{i=1}^{L} M_i^2 \, K_i^2 \, C_{i-1} \, C_i$$

where $TP$ represents the count of accurately detected ships, $FP$ denotes false alarms (the number of non-ship targets mistakenly identified), and $FN$ stands for missed alarms (the count of undetected ships). $N_I$ represents the total number of images processed when measuring FPS, $M_i$ represents the spatial dimension of the $i$-th feature map, $C_i$ the number of channels of the $i$-th feature map, $K_i$ the size of the $i$-th convolution kernel, and $L$ the total number of layers.
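A minimal sketch of the count-based part of these metrics follows; TP/FP/FN are assumed to come from IoU-based matching of detections to labels at a fixed threshold (e.g., 0.5), which the text does not restate here.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```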
5.4. Experimental Results
5.4.1. Comparison Experiments between the Lightweight Detection Methods and FastRCDet
In the experiments, we conducted a comparative analysis between FastRCDet and other state-of-the-art CNN-based object detectors, including a two-stage network (Faster R-CNN), one-stage networks (YOLOv4-tiny, YOLOv5-n, and YOLOv7-tiny), and the anchor-free network YOLOX-nano. The specific results can be found in Table 3.
According to Table 3, the various detection networks exhibit clearly different performance. Among the lightweight models in particular, the proposed FastRCDet demonstrates conspicuous advantages.
Firstly, in terms of accuracy, FastRCDet achieved a relatively high level compared to the lightweight models YOLOv4-tiny, YOLOv5-s, and YOLOv7-tiny, with its F1-score reaching 68.81%. This surpasses the performance of YOLOv5-s (47.55%) and YOLOv7-tiny (49.09%), indicating that FastRCDet exhibits superior overall performance in target detection tasks. Secondly, in terms of computational efficiency, FastRCDet achieves a frame rate (FPS) of 38.02, surpassing the respective rates of 30.03 for YOLOv4-tiny and 24.03 for YOLOv5-s. This improvement highlights its suitability for real-time scenarios with stringent time constraints.
The FastRCDet model also demonstrates competitiveness in terms of model size and computation, with a mere 2.49 M parameters and 8.73 GFLOPs required. This confers a significant advantage for deployment in resource-constrained environments. In summary, as a lightweight ship detection network, FastRCDet not only exhibits exceptional accuracy and recall rate but also offers substantial benefits in real-time performance and resource utilization. These attributes render it suitable for application scenarios that demand stringent computing resources and rapid response times.
According to the experimental results presented in Table 3, FastRCDet achieves a superior detection speed and the most lightweight model size. Among models of similar size, YOLOv4-tiny and YOLOX-tiny are the closest contenders. To provide visual evidence, we include the test results of FastRCDet alongside these two models in Figure 11 and Figure 12. Figure 11 depicts scenarios with sparser ship distributions, while Figure 12 shows scenarios with denser ship distributions. Notably, across the eight selected scenarios, FastRCDet performs on par with YOLOX-tiny, whereas YOLOv4-tiny exhibits more false and missed detections, in line with our actual test findings.
In summary, the FastRCDet model achieves rapid detection speed with minimal model parameters and computational requirements, while maintaining performance that is comparable to the state-of-the-art real-time detector YOLOX.
5.4.2. Comparison Experiments between the Mainstream Lightweight Backbone and LANet
To validate the efficacy of LANet in ship detection, we conducted experiments employing diverse backbones while keeping the detection head constant. The experimental findings are presented in Table 4.
According to Table 4, LANet, as proposed in this paper, exhibits significant superiority over mainstream lightweight backbones (MobileNetv2 [21], GhostNetv2 [25], and ShuffleNetv2 [24]) when employing the same detection head (MSFH + DIoU). Specifically, LANet achieves an AP50 of 77.12%, surpassing the other backbone networks (48.71% for MobileNetv2, 49.26% for GhostNetv2, and 42.13% for ShuffleNetv2). These results demonstrate that LANet enhances detection accuracy in ship detection tasks by a substantial margin.
The performance of LANet on the AP50:95 indicator was also notable, reaching 39.70% against the other backbones (17.30% for MobileNetv2, 18.42% for GhostNetv2, and 13.21% for ShuffleNetv2), which demonstrates the robustness of LANet across a wider range of IoU thresholds.
Furthermore, LANet exhibits remarkable computational efficiency, surpassing other backbone networks with an impressive FPS of 38.02 (compared to MobileNetv2’s 33.34, GhostNetv2’s 27.45, and ShuffleNetv2’s 15.55). This exceptional performance enables real-time target detection at significantly faster speeds, making it well-suited for scenarios requiring prompt responses.
In conclusion, LANet, proposed as the novel backbone, exhibits remarkable performance advantages for ship detection tasks, encompassing high precision, efficiency, and adaptability. It is well-suited for application scenarios that necessitate real-time inspections at high speeds while ensuring accuracy.
5.4.3. Comparison Experiments of the YOLOX Head and CIoU Loss
To evaluate the efficacy of MSFH in ship detection, we conducted comparative experiments utilizing the detection head of YOLOX [36], a widely acknowledged state-of-the-art anchor-free detector. Simultaneously, to assess the effectiveness of DIoU, we employed the CIoU (Complete IoU) loss function for comparison while keeping the backbone consistent. The experimental findings are presented in Table 5.
- (1)
YOLOX Head vs. MSFH:
According to Table 5, the YOLOX head paired with the two loss functions (CIoU and DIoU) yields 58.24%/20.35% and 60.30%/22.61% on the AP50/AP50:95 metrics, respectively. MSFH, using the same two loss functions under identical conditions, achieves significantly better results of 71.26%/35.19% and 77.12%/39.70% on the same indicators, demonstrating superior detection accuracy compared to the YOLOX head.
The YOLOX head achieves 25.36 FPS, while MSFH outperforms it at 38.02 FPS, indicating superior processing speed as well. Moreover, the parameter count is 9.02 M for the YOLOX head versus only 2.49 M for MSFH, indicating lower model complexity, and the computational cost is significantly higher for the YOLOX head at 34.62 GFLOPs compared to 8.73 GFLOPs for MSFH.
In conclusion, from the perspective of the detection head, MSFH outperforms YOLOX Head in all metrics, indicating that MSFH exhibits higher accuracy and efficiency in ship detection tasks in the SAR range-compressed domain.
- (2)
CIoU vs. DIoU:
The experimental results in Table 5 demonstrate the significant impact of the choice of loss function. Specifically, under the YOLOX head, DIoU outperforms CIoU, increasing AP50 from 58.24% to 60.30% and AP50:95 from 20.35% to 22.61%.
When employing MSFH, CIoU yields an AP50 of 71.26% and an AP50:95 of 35.19%, whereas DIoU achieves superior performance with an AP50 of 77.12% and an AP50:95 of 39.70%. These results indicate that DIoU outperforms CIoU on both indicators under MSFH.
From the perspective of the loss function, DIoU outperforms CIoU across all metrics, thereby suggesting that employing the DIoU loss is more appropriate for ship detection in the range-compressed domain.
5.4.4. Ablation Experiments to Examine the Effect of Neck
To ascertain whether the neck is dispensable for ship detection in the range-compressed domain of SAR, we conducted experiments with different necks while maintaining otherwise consistent conditions. The experimental results are presented in Table 6.
By eliminating the neck structure and utilizing LANet as the backbone, a remarkable improvement in frame rate to 38.02 FPS was achieved, surpassing the performance of FPN (15.86 FPS) and PAFPN (14.01 FPS). This underscores the exceptional efficiency of processing extensive data without relying on the neck structure.
The number of parameters in the no-neck configuration is merely 2.49 M, accompanied by a computation of 8.73 GFLOPs, which significantly contrasts with the parameter count and computation involved when employing the neck structure. This not only diminishes hardware requisites and operational expenses but also enhances the model’s applicability within resource-constrained environments.
Despite the absence of the neck structure, our proposed method exhibits commendable detection accuracy. The achieved AP50 of 77.12% and AP50:95 of 39.70% are comparable to those obtained with a neck, underscoring that the neck-free configuration significantly enhances real-time performance and computational efficiency without compromising precision.
The advantage of the neck-free structure is particularly suitable for applications that require rapid processing of large amounts of data. High frame rates and low latency improve the system’s response speed, helping real-time ship monitoring systems operate more efficiently.
6. Conclusions
To address the inadequate lightweighting of existing methods for the SAR range-compressed domain, we propose FastRCDet, a novel ship detection method for the range-compressed domain. It simplifies the network structure and reduces parameter count and computational complexity, thereby enabling real-time ship detection on board. The specific innovations are outlined below:
A new lightweight backbone, LANet. LANet employs adaptive kernel convolution to dynamically adjust convolution parameters, thereby reducing the number of model parameters and enhancing the effectiveness of depth feature extraction in the range compression domain.
A new single detection head, MSFH. By integrating feature maps of different scales, MSFH adeptly adapts to ship shape features within the range-compressed domain, thereby mitigating the potential degradation of detection performance caused by reduced network models.
A new loss function, DIoU. Considering the geometric attributes of ships, a new loss function called DIoU is designed to enhance the adaptability of our network to range-compressed domain characteristics.
To validate the feasibility and efficacy of our approach, ship detection experiments were conducted employing publicly available range-compressed datasets. Our results demonstrate that FastRCDet achieves a significant reduction in parameter count and computational complexity, with about a 27% improvement in real-time detection speed compared to existing methods.
Future research will further optimize the network structure to improve detection performance and real-time capability. We will also carry out targeted optimization for the characteristics of embedded edge platforms, including but not limited to adopting more advanced algorithms, increasing hardware resources, and improving imaging algorithms. The model demonstrates accurate results in validation; however, collecting more appropriate ship data is necessary to enrich the dataset and to evaluate and refine the network thoroughly. Through these efforts, we hope to make deep learning technology serve SAR ship detection better and to provide a reference for the radar community.