Article

BurgsVO: Burgs-Associated Vertex Offset Encoding Scheme for Detecting Rotated Ships in SAR Images

School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(3), 388; https://doi.org/10.3390/rs17030388
Submission received: 3 December 2024 / Revised: 13 January 2025 / Accepted: 14 January 2025 / Published: 23 January 2025

Abstract

Synthetic Aperture Radar (SAR) is a crucial remote sensing technology with significant advantages, and ship detection in SAR imagery has garnered considerable attention. However, existing ship detection methods often overlook feature extraction, and the unique imaging mechanism of SAR images hinders the direct application of conventional natural-image feature extraction techniques. Moreover, oriented bounding box-based detection methods often prioritize accuracy at the expense of additional parameters and computation, increasing computational load and model complexity. To address these issues, we propose a novel two-stage detector, the Burgs-rooted vertex offset encoding scheme (BurgsVO), for detecting rotated ships in SAR images. BurgsVO consists of two key modules: the Burgs equation heuristics module, which facilitates feature extraction, and the average diagonal vertex offset (ADVO) encoding scheme, which significantly reduces computational costs. Specifically, the Burgs equation module integrates temporal information with spatial data for effective feature aggregation, establishing a strong foundation for subsequent object detection. The ADVO encoding scheme reduces parameters through anchor transformation, leveraging geometric similarities between quadrilaterals and triangles to further reduce computational costs. Experimental results on the RSSDD and RSDD benchmarks demonstrate that the proposed BurgsVO outperforms state-of-the-art detectors in both accuracy and efficiency.

1. Introduction

Synthetic Aperture Radar (SAR) is an effective active microwave sensor that operates under various weather conditions, making it an indispensable tool for maritime traffic control and search and rescue operations [1]. Its broad applications present numerous research challenges and directions, highlighting its importance in these domains.
In recent years, ship detection in SAR imagery has gained increasing attention as a critical task. Traditional ship detection methods include ship structure analysis [2], saliency-driven techniques [3], and threshold-based methods [4,5]. Although these methods have provided foundational solutions, advancements in deep learning and the availability of comprehensive SAR datasets, such as RSDD [6] and RSSDD [7], have facilitated the development and widespread application of advanced detectors. These detectors include single-stage detectors [8,9] and two-stage detectors [10,11,12], which offer superior accuracy compared to traditional methods [13]. Despite these advancements, ship detection in SAR images continues to face challenges, primarily due to the prevalent use of horizontal bounding boxes (HBB) [13,14,15]. In SAR images, densely packed and elongated ships can cause overlapping HBBs, complicating target differentiation and background management. Additionally, ships near the shore are often affected by speckle noise, further hindering detection.
To address these challenges, researchers have proposed oriented bounding boxes (OBB) [16], which represent ship shape and orientation more precisely, reducing overlap and enhancing detection accuracy. OBB-based ship detection methods are typically categorized into keypoint-based and anchor-based approaches. Keypoint-based methods [17,18,19] offer rapid detection but may generalize poorly owing to complex loss functions. Anchor-based methods [20,21] improve precision and recall but can introduce boundary discontinuities in orientation predictions. Although certain solutions employ additional modules or angle transformations [22,23,24] to address these issues, they often increase parameters and computational costs. Despite the high potential of two-stage ship detection methods, research remains limited, and existing approaches continue to face the following two key challenges.
Challenge I: Feature Extraction: SAR images present unique imaging mechanisms and noise characteristics, such as speckle noise, that challenge traditional feature extraction methods. Additionally, ships in SAR images often appear against complex backgrounds, such as other vessels, sea waves, and surface textures, requiring the feature extraction module to effectively separate ship signals from background interference. Current methods often lack rigorous heuristic principles, while incorporating guidance from partial differential equations could significantly improve feature aggregation.
Challenge II: OBB Encoding Scheme: Many OBB methods have improved the performance of single-stage [8,9] and anchor-free detectors [25,26] and addressed boundary problems. For instance, RSDet++ [27] employs a modulated rotation loss to address loss discontinuities, though this may compromise predictive performance. Similarly, a polar coordinate encoding method [28] adds computational burden by introducing extra hyperparameters. Accordingly, a new encoding scheme based on anchor transformation is needed to mitigate these challenges while keeping costs low.
Based on the above considerations, we propose a Burgs-rooted vertex offset encoding scheme (BurgsVO) for detecting rotated ships in SAR images, which significantly enhances speed while maintaining accuracy. The Burgs-equation heuristics module addresses the limitations of existing heuristic methods by leveraging spatial information to complement detection data, optimizing feature extraction and improving detection accuracy. In our two-stage detection process, we design a novel encoding scheme based on the average diagonal vertex offset (ADVO). This scheme transforms rotated bounding boxes, defined by center coordinates, dimensions, and an angle, into a new format comprising the external rectangle's center coordinates and dimensions plus two sliding offsets, represented by six parameters. By simplifying the encoding process and reducing the number of anchors, this approach effectively decreases the computational load during model training and detection, making the training process more efficient and precise. Extensive experiments on the RSSDD and RSDD datasets demonstrate that the proposed BurgsVO surpasses state-of-the-art (SOTA) methods in both detection accuracy and speed, achieving efficient and accurate ship detection in complex SAR imaging environments.
In summary, this paper contributes in three key areas:
1.
We propose a novel two-stage ship detection model for SAR images built on a dedicated feature extraction module and an efficient encoding scheme. Experiments on the RSSDD and RSDD datasets demonstrate that the proposed BurgsVO enhances speed while maintaining accuracy.
2.
We design a network module heuristically derived from the second-order Burgess differential equation. By integrating contextual and spatial information, this method compensates for detection data deficiencies, improving feature extraction and enabling more effective ship target detection.
3.
We develop an ADVO encoding scheme, which converts rotated anchors to horizontal anchors, accelerating model convergence and reducing computational burden.
The remainder of this paper is organized as follows: Section 2 briefly reviews target detection methods related to the proposed approach and the application of neural partial differential equations. Section 3 provides a detailed description of the proposed model. Section 4 presents the experimental results and analysis, and Section 5 discusses the ablation study and complexity analysis. Section 6 concludes the paper.

2. Related Work

2.1. Ship Detection Methods Based on OBB

Traditional SAR ship detection methods generally involve several stages, including image preprocessing, segmentation of land and ocean areas, and extraction of candidate regions. Various methods have been proposed [29,30,31], each offering unique advantages. However, these approaches often face limitations in characterization and generalization, especially in constrained or challenging scenarios.
Recent advances in deep learning-based object detectors have led to the development of numerous methods [7,32,33], which can be broadly classified into anchor-free and anchor-based approaches. Anchor-free methods, such as BBAVectors [19], integrate CenterNet with point-based encoding to represent OBBs, aiming to achieve precise detection through rotated bounding boxes. In contrast, anchor-based methods enhance recall rates by incorporating anchor priors, which are particularly effective for detecting small objects. However, these methods often face challenges such as computational redundancy and boundary discontinuity. To address these issues, new approaches have been developed, including angle classification techniques [34,35] and the rotation-equivariant detector ReDet [36], which extracts rotation-equivariant features and applies a rotation-invariant RoI alignment to obtain rotation-invariant representations.
In the field of ship detection in SAR images, OBB-based methods are frequently used to address issues related to boundary discontinuity and to enhance detection speed. These methods include both anchor-free and single-stage detectors. Researchers have investigated a range of advanced techniques to improve performance, such as multi-scale feature fusion [37], frequency focusing modules [38], and multi-stage anchor mechanisms [39]. While these techniques have demonstrated effectiveness in various scenarios, they often necessitate manual tuning of anchors. This manual adjustment process complicates the calibration phase and extends the learning period. Additionally, manual tuning can sometimes result in performance degradation due to boundary discontinuities, which has led to increased interest in anchor-free detectors. Recent advancements include deep learning techniques that focus on key points, alongside methods utilizing elliptical encoding and dynamic key points, all designed to enhance detection performance [25].
Although two-stage detectors exhibit significant potential, research in this area remains relatively limited. Two-stage detectors typically involve a two-step process: generating region proposals followed by classification and refinement to improve accuracy. For example, some strategies in rotation region proposal networks use multi-angle anchors to refine OBB detection results [40]. Nevertheless, many two-stage detectors rely on a large number of anchors to improve recall rates, which can result in increased computational overhead and reduced real-time performance. Therefore, there exists considerable potential for further research and advancements in two-stage detectors, with the aim of enhancing detection efficiency while minimizing computational costs.

2.2. Neural Partial Differential Equations

Modern deep network designs often draw inspiration from Partial Differential Equations (PDEs) [41], showcasing their versatility across various applications. For instance, nonlinear anisotropic diffusion has been optimized for image denoising [42], and techniques enhancing discriminant and rotation-invariant properties have been developed for image classification [43]. The complex mathematical derivations involved in integrating PDEs into deep networks have led researchers to explore Ordinary Differential Equations (ODEs) and neural networks as more accessible alternatives. However, many existing feature extraction methods are tailored for natural images and are not directly applicable to ship detection in SAR images due to their unique imaging mechanisms. Therefore, developing a robust and interpretable module is essential. By leveraging temporal information, we can compensate for spatial limitations and enhance feature aggregation.

2.3. Oriented Bounding Box Encoding Scheme

Traditional OBB detection methods typically rely on rectangles, while quadrilateral detection techniques, such as Gliding Vertex [18] and RSDet [44], are also commonly employed. However, both of these methods lack robust calibration measures, which can lead to decreased detection accuracy. In earlier research, methods such as rectangular box regression [25] have attempted to address this by defining a constrained quadrilateral through the convex hull formed by key points. Despite this, the approach can still result in irregular quadrilaterals that do not perfectly align with the actual object boundaries. To address these limitations, researchers have introduced oriented Region-based Convolutional Neural Networks (RCNN) [45], which aim to convert parallelogram proposals into rotated rectangle candidates. However, this approach occasionally produces calibration rectangles that fail to accurately match the true OBB due to the presence of redundant background areas and minor directional deviations. Researchers [46] have proposed deriving the skew factor using the offsets of the top-left two points of the OBB. While this approach reduces computational load, it can result in discrepancies between the HBB and the generated bounding box. Recent studies have proposed various methods for ship detection in remote sensing images, focusing on small ship detection [47], oriented target identification [48], and precise aerial target prediction [49]. These approaches address key detection challenges, offering new insights and technical support for advancing related fields. Accordingly, creating a novel encoding scheme centered on anchor transformation is crucial for speeding up model convergence and minimizing computational load.

3. Method

We provide an overview of the proposed BurgsVO in Figure 1. Our model is mainly composed of an image processing module inspired by the Burgess equation and a ship detection module based on the ADVO encoding scheme. To enhance ship detection, we introduce the Burgess equation-inspired module to improve image detail and use spatial information to address temporal data gaps. Additionally, the ADVO encoding scheme enhances the speed and accuracy of ship detection during the HBB-to-OBB conversion process.

3.1. Burgess Equation Heuristic Module

In nearshore environments, SAR imaging is often plagued by speckle noise from complex backgrounds like decks and reefs, which disrupts ship detection. To tackle this, we develop the Burgess equation heuristic module. This module extracts contextual information and leverages spatial data to address temporal deficiencies, enhancing feature aggregation. The Burgess equation heuristic module effectively reduces noise by adhering to the Burgess equation’s principles. It combines spatial feature extraction with temporal data integration to compensate for static image limitations, improving ship target identification in complex backgrounds. Furthermore, the module operates with precision, avoiding additional errors in feature extraction and enhancement. This approach significantly enhances the robustness of the ship detection system, ensuring high performance in difficult conditions and marking a substantial advancement in overcoming background noise in SAR imagery.
Specifically, given feature image u, the one-dimensional Burgess equation is as follows:
$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2},$$
where $\partial u/\partial t$ is the first-order time derivative, $\partial u/\partial x$ is the first-order spatial derivative, $\partial^2 u/\partial x^2$ is the second-order spatial derivative, and $\nu$ is the coefficient coupling time and space. The left-hand side contains the one-dimensional nonlinear convection term and the right-hand side the diffusion term; when $\nu = 0$ the equation reduces to the one-dimensional nonlinear convection equation, also known as the non-viscous Burgess equation. Based on this, we use a forward difference for the time derivative, a backward difference for the first spatial derivative, and a central difference for the second spatial derivative, which yields the following equation:
$$\frac{u_i^{n+1} - u_i^n}{\Delta t} + u_i^n\,\frac{u_i^n - u_{i-1}^n}{\Delta x} = \nu\,\frac{u_{i+1}^n - 2u_i^n + u_{i-1}^n}{\Delta x^2},$$
where the superscript $n$ indexes the time dimension and the subscript $i$ indexes the space dimension. After simplification:
$$u_i^{n+1} = u_i^n - u_i^n\,\frac{\Delta t}{\Delta x}\left(u_i^n - u_{i-1}^n\right) + \nu\,\frac{\Delta t}{\Delta x^2}\left(u_{i+1}^n - 2u_i^n + u_{i-1}^n\right),$$
where we take the step size $\Delta x = 1$. To make the module run more smoothly, we set $\nu = 1$, so that $\Delta t = \Delta x\,\nu = 1$. The formula above can then be rewritten as follows:
$$u_i^{n+1} = u_i^n - u_i^n\left(u_i^n - u_{i-1}^n\right) + \left(u_{i+1}^n - 2u_i^n + u_{i-1}^n\right) = u_i^n - u_i^n\,\Delta u_i^n + \Delta u_{i+1}^n - \Delta u_i^n,$$
where $\Delta u_i^n = u_i^n - u_{i-1}^n$ denotes the backward difference.
From this, $u_i^{n+1}$ is obtained strictly from the values at the neighboring nodes.
To enhance the intuitiveness of our heuristic module, we reformulate the equation into a more comprehensible form, as illustrated in Figure 2. This reformulation clarifies how information can be extracted from the data points $u_{i-1}^n$, $u_i^n$, and $u_{i+1}^n$, thereby facilitating better feature aggregation. By applying a linear transformation to these three data points, we ensure the robustness of the transformation process. This approach enables the integration of temporal information with spatial data, compensating for any detection data deficiencies and capturing more detailed information. Consequently, this refined method enhances feature extraction quality, enabling more accurate ship target detection in subsequent analysis stages while minimizing redundant computations and optimizing processing speed.
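For intuition, the following is a minimal NumPy sketch of one explicit step of the node-wise update above, applied along a one-dimensional feature response; the function name and the replicate padding at the borders are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def burgers_step(u):
    """One explicit update u^{n+1} from u^n using the discretized rule above
    (forward time, backward space, central second difference) with
    dx = dt = nu = 1. Border values are replicated (an assumed padding)."""
    u_prev = np.concatenate(([u[0]], u[:-1]))    # u_{i-1}^n
    u_next = np.concatenate((u[1:], [u[-1]]))    # u_{i+1}^n
    convection = u * (u - u_prev)                # u_i^n (u_i^n - u_{i-1}^n)
    diffusion = u_next - 2.0 * u + u_prev        # u_{i+1}^n - 2 u_i^n + u_{i-1}^n
    return u - convection + diffusion

# Usage: a few iterations over a noisy 1-D feature profile
u = np.array([0.10, 0.90, 1.00, 0.20, 0.05])
for _ in range(3):
    u = burgers_step(u)
```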

3.2. ADVO-Based Two-Stage Detector

In the two-stage detection process, we propose a new encoding scheme based on the Average Diagonal Vertex Offset (ADVO), grounded in the mathematical principle of similarity. The choice to use a two-stage ADVO-based detector over a single-stage one is based on performance, complexity, and accuracy, particularly in handling orientation and geometric transformations. Single-stage detectors struggle with rotated bounding boxes, requiring numerous anchors to cover various rotations and scales, which increases computational load and reduces accuracy. In contrast, the two-stage method separates anchor box generation from fine-tuning detection. The first stage (ORPN) reduces anchor numbers with a new encoding scheme and generates rotated proposals, while the second stage (ORCNN) refines these proposals, improving classification and localization. This approach simplifies the encoding and reduces the number of anchor points, effectively lowering the computational burden during training and detection.

3.2.1. ORPN

The current OBB-based detection methods introduce an additional parameter, the angle $\theta$, which increases the number of preset anchors and the computational load, leading to lower regression rates and boundary discontinuity issues. To address these challenges, we leverage the mathematical properties of triangles and quadrilaterals to propose a new scheme for converting HBBs to OBBs and design a novel OBB encoding method named ADVO. As depicted in Figure 3, the red rectangle denotes the OBB of the target, while the black box represents the attached HBB. The four green dots signify the OBB's vertices $(v_t, v_r, v_b, v_l)$. The light pink dot marks the upper-left vertex $v_{lt}^{o}$ of the HBB, and the light blue dot represents the lower-right vertex $v_{rb}^{o}$ of the HBB. The black lines $a$ and $b$ depict the offsets of the upper vertex $v_t$ and the left vertex $v_l$ relative to the upper-left point of the predicted HBB (black). Similarly, $c$ and $d$ represent the offsets of the lower vertex $v_b$ of the red OBB and the right vertex $v_r$ relative to the lower-right point of the predicted HBB (black). Due to inaccuracies in the sine and cosine calculations of the external rectangle vertex coordinates, errors between $a$ and $c$, and between $b$ and $d$, may occur. To mitigate this, the paired offsets are averaged: $(L_1^g, L_2^g) = ((a + c) \times 0.5,\ (b + d) \times 0.5)$. $(x_o^g, y_o^g, w_o^g, h_o^g)$ denote the center coordinates, width, and height of the HBB. Thus, the OBB of such a GT can be represented by a six-dimensional vector: $OBB = (x_o^g, y_o^g, w_o^g, h_o^g, L_1^g, L_2^g)$. While ORPN appears conceptually straightforward, its strength lies in its direct decoding representation scheme, ADVO. Next, we outline the process of generating an oriented rectangular proposal from an ADVO representation (Algorithm 1).
Algorithm 1 ADVO Encoding
Input: Ground-truth boxes (GTs): GT's OBB $(x_o^g, y_o^g, w_o^g, h_o^g, L_1^g, L_2^g)$;
        Anchors: $a\,(a_x, a_y, a_w, a_h)$;
Steps: 1. The actual values obtained from the network, used as ground truth in the regression loss function, are the ratios of the Ground Truth (GT) OBB to each level's anchor: $\delta = (d_x, d_y, d_w, d_h, d_{L_1}, d_{L_2})$;
    2. The ratios of the predicted proposals to the current anchor: $\delta' = (d_x', d_y', d_w', d_h', d_{L_1}', d_{L_2}')$;
    3. Target outcome: generate a single oriented proposal represented by ADVO: $x_o^p = d_x' \cdot a_w + a_x$, $y_o^p = d_y' \cdot a_h + a_y$, $w_o^p = a_w \cdot e^{d_w'}$, $h_o^p = a_h \cdot e^{d_h'}$, $L_1^p = d_{L_1}' \cdot w_o^p$, $L_2^p = d_{L_2}' \cdot h_o^p$;
Output: The oriented proposal represented by the ADVO scheme: $(x_o^p, y_o^p, w_o^p, h_o^p, L_1^p, L_2^p)$.
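The ground-truth side of this encoding, converting the four OBB vertices into the six ADVO parameters described above, can be sketched as follows; the vertex ordering convention (top/right/bottom/left, with y increasing downward) and the function name are assumptions for illustration, not the authors' code.

```python
def advo_encode(v_t, v_r, v_b, v_l):
    """Encode an OBB, given by its top/right/bottom/left vertices (x, y)
    lying on its enclosing HBB, into the six ADVO parameters
    (x_o, y_o, w_o, h_o, L1, L2)."""
    xs = [v_t[0], v_r[0], v_b[0], v_l[0]]
    ys = [v_t[1], v_r[1], v_b[1], v_l[1]]
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)   # enclosing HBB corners
    x_o, y_o = 0.5 * (x1 + x2), 0.5 * (y1 + y2)           # HBB center
    w_o, h_o = x2 - x1, y2 - y1                           # HBB width and height
    a = v_t[0] - x1     # horizontal offset of v_t from the top-left corner
    b = v_l[1] - y1     # vertical offset of v_l from the top-left corner
    c = x2 - v_b[0]     # horizontal offset of v_b from the bottom-right corner
    d = y2 - v_r[1]     # vertical offset of v_r from the bottom-right corner
    L1 = 0.5 * (a + c)  # average of the two diagonal horizontal offsets
    L2 = 0.5 * (b + d)  # average of the two diagonal vertical offsets
    return x_o, y_o, w_o, h_o, L1, L2
```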
ADVO encoding scheme: In the initial stage, horizontal anchors are transformed into horizontal proposals to facilitate the regression needed for the subsequent stage. During the ORPN training phase, horizontal anchors are presupposed as four-dimensional vectors $a = (a_x, a_y, a_w, a_h)$, where $a_x$ and $a_y$ denote the central coordinates, and $a_w$ and $a_h$ represent the width and height of the anchor, respectively.
In the regression branch of the network during training, the actual learned values, which are fed into the regression part of the loss function, correspond to the ratio of the Ground Truth (GT)'s OBB relative to each level's anchor: $\delta = (d_x, d_y, d_w, d_h, d_{L_1}, d_{L_2})$.
During the testing phase of ORPN, a fully convolutional network consisting of a 3 × 3 convolutional layer and two parallel 1 × 1 convolutional layers is employed to predict proposal categories and their positions. One of the 1 × 1 convolutional layers serves as a regression branch that outputs the ratios of the predicted proposals relative to the current anchor: $\delta' = (d_x', d_y', d_w', d_h', d_{L_1}', d_{L_2}')$.
The two calculation processes involve reciprocal affine transformations, with the specific conversion formula for the inverse process during the testing stage as follows:
$$x_o^p = d_x' \cdot a_w + a_x,\quad y_o^p = d_y' \cdot a_h + a_y,\qquad w_o^p = a_w \cdot e^{d_w'},\quad h_o^p = a_h \cdot e^{d_h'},\qquad L_1^p = d_{L_1}' \cdot w_o^p,\quad L_2^p = d_{L_2}' \cdot h_o^p,$$
where $(x_o^p, y_o^p)$ are the center coordinates of the predicted horizontal proposal (HBB), and $w_o^p$ and $h_o^p$ are its width and height. $L_1^p$ and $L_2^p$ are the average offsets between the four vertices of the generated oriented proposal (OBB) and the upper-left and lower-right vertices of the predicted horizontal proposal, respectively. Finally, the oriented proposal is represented by the ADVO parameters $(x_o^p, y_o^p, w_o^p, h_o^p, L_1^p, L_2^p)$, where $(x_o^p, y_o^p, w_o^p, h_o^p)$ denote the center coordinates, width, and height of the external HBB of the OBB, and $(L_1^p, L_2^p)$ are the two offsets of the OBB vertices relative to the upper-left and lower-right vertices of the HBB. It is worth noting that the regression precision of the oriented proposal represented by ADVO is crucial for the second stage (ORCNN), as it directly affects the similarity factor $r$ and the positioning of the four vertex coordinates $(v_t, v_r, v_b, v_l)$.
Next, we describe the calculation of the four vertex coordinates of the OBB in detail; the process is shown in Figure 4. Initially, the OBB of a proposal predicted by the ORPN network is represented by $(x_o^p, y_o^p, w_o^p, h_o^p, L_1^p, L_2^p)$. To find the coordinates of its four vertices, we first convert its horizontal external HBB $(x_o^p, y_o^p, w_o^p, h_o^p)$ into the corner representation $(x_1, y_1, x_2, y_2)$ given by $v_{lt}$ (top-left vertex) and $v_{br}$ (bottom-right vertex), calculated as follows:
$$x_1 = x_o^p - w_o^p/2,\quad y_1 = y_o^p - h_o^p/2,\qquad x_2 = x_o^p + w_o^p/2,\quad y_2 = y_o^p + h_o^p/2.$$
Then, using the two offsets $L_1^p$ and $L_2^p$, we can derive the similarity factor $r$:
$$r = \frac{L_2^p}{L_1^p}.$$
The geometric properties and similarity factor are then applied to derive the other two vertices of the two Oriented Bounding Boxes (OBB2 and OBB1), as shown in Figure 4.
The coordinates of the four vertices of OBB1 can be expressed as follows:
$$\begin{aligned}
v_{t1} &= \left(x_2 - w_o^p + L_1^p,\ \ y_2 - L_2^p - (w_o^p - L_1^p)/r\right)\\
v_{r1} &= \left(x_2,\ \ y_2 - L_2^p\right)\\
v_{b1} &= \left(x_2 - L_1^p,\ \ y_2\right)\\
v_{l1} &= \left(x_1,\ \ y_2 - (w_o^p - L_1^p)/r\right).
\end{aligned}$$
The coordinates of the four vertices of OBB2 can be expressed as follows:
$$\begin{aligned}
v_{t2} &= \left(x_1 + L_1^p,\ \ y_1\right)\\
v_{r2} &= \left(x_2,\ \ y_1 + (w_o^p - L_1^p)/r\right)\\
v_{b2} &= \left(x_1 + w_o^p - L_1^p,\ \ y_1 + L_2^p + (w_o^p - L_1^p)/r\right)\\
v_{l2} &= \left(x_1,\ \ y_1 + L_2^p\right).
\end{aligned}$$
The four vertex coordinates of the OBB of the final oriented proposal, $(v_t, v_r, v_b, v_l)$, are obtained as the means of the corresponding vertex coordinates of OBB1 and OBB2. The calculation process is as follows:
$$v_t = (v_{t1} + v_{t2}) \times 0.5,\quad v_r = (v_{r1} + v_{r2}) \times 0.5,\quad v_b = (v_{b1} + v_{b2}) \times 0.5,\quad v_l = (v_{l1} + v_{l2}) \times 0.5.$$
With this rectangular box encoding, we implement the regression of oriented proposals through the predicted proposals represented by ADVO (Algorithm 2) (six parameters) and convert from $(x_o^p, y_o^p, w_o^p, h_o^p, L_1^p, L_2^p)$ to the common four-vertex representation $(v_t, v_r, v_b, v_l)$ (eight parameters) for the feature alignment module of the next stage (ORCNN): Rotated RoI.
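The decoding chain just described can be summarized in a short Python sketch; it assumes image coordinates with y increasing downward, and the helper name is illustrative rather than the authors' implementation.

```python
def advo_decode(x_o, y_o, w_o, h_o, L1, L2):
    """Decode the six ADVO parameters of an oriented proposal into its four
    OBB vertices (v_t, v_r, v_b, v_l), following the formulas above."""
    # Corners of the enclosing HBB
    x1, y1 = x_o - w_o / 2.0, y_o - h_o / 2.0
    x2, y2 = x_o + w_o / 2.0, y_o + h_o / 2.0
    # Similarity factor
    r = L2 / L1
    # OBB1: vertices tied to the bottom-right corner of the HBB
    vt1 = (x2 - w_o + L1, y2 - L2 - (w_o - L1) / r)
    vr1 = (x2, y2 - L2)
    vb1 = (x2 - L1, y2)
    vl1 = (x1, y2 - (w_o - L1) / r)
    # OBB2: vertices tied to the top-left corner of the HBB
    vt2 = (x1 + L1, y1)
    vr2 = (x2, y1 + (w_o - L1) / r)
    vb2 = (x1 + w_o - L1, y1 + L2 + (w_o - L1) / r)
    vl2 = (x1, y1 + L2)
    # Average the two estimates vertex-wise
    avg = lambda p, q: ((p[0] + q[0]) * 0.5, (p[1] + q[1]) * 0.5)
    return avg(vt1, vt2), avg(vr1, vr2), avg(vb1, vb2), avg(vl1, vl2)
```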
Regression part of the loss function: To supervise ORPN, we assign binary labels $p^* \in \{0, 1\}$ to positive and negative samples. We use the outer rectangle of the Ground Truth (GT)'s OBB to form the proportional supervised offsets $\delta = (d_x, d_y, d_w, d_h, d_{L_1}, d_{L_2})$, given by the following affine transformation:
$$\begin{aligned}
d_x &= (x_o^g - x_o^p)/w_o^g, & d_y &= (y_o^g - y_o^p)/h_o^g,\\
d_w &= \log\left(w_o^g/w_o^p\right), & d_h &= \log\left(h_o^g/h_o^p\right),\\
d_{L_1} &= L_1^g/w_o^g, & d_{L_2} &= L_2^g/h_o^g,
\end{aligned}$$
where $(x_o^p, y_o^p, w_o^p, h_o^p)$ is the external HBB of the predicted proposal, $(x_o^g, y_o^g, w_o^g, h_o^g)$ is the external rectangle (HBB) of the GT's OBB, and $(L_1^g, L_2^g)$ are the offsets of the GT OBB's vertices with respect to the vertices of its external HBB. We then have the following ORPN loss function:
$$L = \frac{1}{N}\sum_{i=1}^{N} L_{\mathrm{cls}}\left(p_i, p_i^*\right) + \frac{1}{N}\sum_{i=1}^{N} p_i^*\, L_{\mathrm{reg}}\left(\delta_i', \delta_i\right),$$
where $i$ is the index of the anchor and $N$ is the total number of anchors. $p_i$ is the predicted classification score and $p_i^*$ is the classification label from the GT. To evaluate the bias between labels and predictions, we use the cross-entropy loss as $L_{\mathrm{cls}}$, and $\delta_i'$ denotes the predicted proportional shift $\delta' = (d_x', d_y', d_w', d_h', d_{L_1}', d_{L_2}')$. The regression loss $L_{\mathrm{reg}}$ is the smooth L1 loss [11]:
$$L_{\mathrm{reg}}\left(\delta_i', \delta_i\right) =
\begin{cases}
0.5\left(\delta_i' - \delta_i\right)^2, & \left|\delta_i' - \delta_i\right| < 1\\
\left|\delta_i' - \delta_i\right| - 0.5, & \text{otherwise},
\end{cases}$$
where $\delta_i$ denotes the proportional shift between the GT's OBB and the anchor: $\delta = (d_x, d_y, d_w, d_h, d_{L_1}, d_{L_2})$.
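As a rough illustration of the supervision described above, the sketch below computes the proportional ADVO offsets for one GT-proposal pair and applies a smooth L1 loss in PyTorch; the function name and example numbers are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def advo_regression_targets(gt, proposal):
    """Proportional offsets delta = (dx, dy, dw, dh, dL1, dL2) of a GT OBB
    (in ADVO form) with respect to the external HBB of a proposal, following
    the affine transformation above. Both inputs are 6-element tensors
    [x_o, y_o, w_o, h_o, L1, L2]."""
    xg, yg, wg, hg, L1g, L2g = gt
    xp, yp, wp, hp, _, _ = proposal
    dx = (xg - xp) / wg
    dy = (yg - yp) / hg
    dw = torch.log(wg / wp)
    dh = torch.log(hg / hp)
    dL1 = L1g / wg
    dL2 = L2g / hg
    return torch.stack([dx, dy, dw, dh, dL1, dL2])

# Usage: smooth L1 between a (random) predicted shift and the target shift
pred = torch.randn(6)
target = advo_regression_targets(torch.tensor([50., 60., 30., 20., 8., 6.]),
                                 torch.tensor([48., 59., 28., 22., 0., 0.]))
loss = F.smooth_l1_loss(pred, target)
```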
Algorithm 2 ADVO Decoding
Input: The oriented proposal represented by ADVO: $(x_o^p, y_o^p, w_o^p, h_o^p, L_1^p, L_2^p)$;
        Anchors: $a\,(a_x, a_y, a_w, a_h)$;
Steps: 1. Calculate the corner coordinates of the external HBB: $x_1 = x_o^p - w_o^p/2$, $y_1 = y_o^p - h_o^p/2$, $x_2 = x_o^p + w_o^p/2$, $y_2 = y_o^p + h_o^p/2$;
    2. Calculate the similarity factor: $r = L_2^p / L_1^p$;
    3. Calculate the vertices of the two oriented bounding boxes (OBB1 and OBB2):
        $v_{t1} = (x_2 - w_o^p + L_1^p,\ y_2 - L_2^p - (w_o^p - L_1^p)/r)$, $v_{r1} = (x_2,\ y_2 - L_2^p)$, $v_{b1} = (x_2 - L_1^p,\ y_2)$, $v_{l1} = (x_1,\ y_2 - (w_o^p - L_1^p)/r)$;
        $v_{t2} = (x_1 + L_1^p,\ y_1)$, $v_{r2} = (x_2,\ y_1 + (w_o^p - L_1^p)/r)$, $v_{b2} = (x_1 + w_o^p - L_1^p,\ y_1 + L_2^p + (w_o^p - L_1^p)/r)$, $v_{l2} = (x_1,\ y_1 + L_2^p)$;
    4. Calculate the four vertex coordinates of the oriented proposal's OBB: $v_t = (v_{t1} + v_{t2}) \times 0.5$, $v_r = (v_{r1} + v_{r2}) \times 0.5$, $v_b = (v_{b1} + v_{b2}) \times 0.5$, $v_l = (v_{l1} + v_{l2}) \times 0.5$;
Output: Coordinates of the four vertices of the oriented proposal's OBB: $(v_t, v_r, v_b, v_l)$.

3.2.2. ORCNN

Since common target detection often encounters mismatches between the RoI and the target, resulting in inconsistent classification confidence and detection accuracy, the next step is to project each oriented rectangular proposal onto the feature map $F$ to obtain a Rotated RoI ($RRoI$).
Transformation of feature maps: Specifically, each $RRoI$ is divided into $K \times K$ bins to generate a new feature map $F'$ with dimensions $K \times K \times C$ (the default is 7 × 7 × 2). For the feature of the $c$-th channel ($1 \le c \le C$) of each bin with index $(i, j)$ ($0 \le i, j \le K - 1$), the formula is as follows:
$$F_c'(i, j) = \sum_{(x, y)\, \in\, \mathrm{bin}(i, j)} F_{i, j, c}\left(R(x, y, \theta)\right) / n,$$
where $n$ is the number of sampling locations in each bin, and $R(\cdot)$ is calculated as follows [50]:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x - w_r/2 \\ y - h_r/2 \end{pmatrix} +
\begin{pmatrix} x_r \\ y_r \end{pmatrix},$$
where $(x_r, y_r)$ denotes the center of the RRoI, and $w_r$ and $h_r$ denote its width and height.
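A minimal sketch of this coordinate transform, mapping a sampling location inside an RRoI onto the feature map, is given below; the function name and argument order are assumptions for illustration only.

```python
import math

def rroi_point(x, y, x_r, y_r, w_r, h_r, theta):
    """Map a sampling location (x, y) inside a rotated RoI onto the feature
    map, following the rotation transform above: shift to the RRoI center,
    rotate by theta, then translate to the RRoI center (x_r, y_r)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = x - w_r / 2.0, y - h_r / 2.0
    xp = cos_t * dx - sin_t * dy + x_r
    yp = sin_t * dx + cos_t * dy + y_r
    return xp, yp
```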
Loss function of ORCNN: The regression loss is the widely used smooth L1 loss, and the classification loss is the cross-entropy loss:
$$L_{\mathrm{cls}} = -\sum_{i=1}^{n} c \log c_p,$$
where $c$ and $c_p$ denote the ground-truth label and the predicted score, respectively.

4. Experiment

The experimental settings are designed to evaluate SAR ship detection methods under diverse conditions, utilizing RSSDD and RSDD datasets for comprehensive testing. Advanced hardware and optimization techniques ensure robust training, while key performance metrics rigorously assess the models’ accuracy, efficiency, and generalization, guaranteeing reliable and practical outcomes.

4.1. Experimental Settings

4.1.1. Dataset

RSSDD [7] is the earliest open-source SAR image dataset and a challenging public benchmark for SAR ship detection that has contributed significantly to the development of oriented detectors; it includes 1160 images and 2456 ship objects. The images in RSSDD were captured by the TerraSAR-X, RadarSat-2, and Sentinel-1 sensors, with image sizes ranging from 200 to 700 pixels and resolutions from 1 to 15 m. We apply MMRotate's preprocessing to rescale images to 512 × 512 for both training and testing. The dataset is split into inshore and offshore scenarios: inshore images often contain significant speckle noise, requiring effective noise reduction, while offshore images have a uniform background but smaller ship targets, challenging small object detection. For evaluation, the test set is divided into inshore and offshore subsets.
The RSDD dataset [6] is a publicly available SAR ship detection resource, designed to overcome the limitations of existing inclined bounding box datasets and to aid in algorithm development and practical applications. It includes 127 scenes with 7000 images and 10,263 ship instances, covering various imaging modes, polarizations, and resolutions. Annotations are created through automated labeling combined with manual refinement, using the OpenCV long-edge definition method in COCO format to capture target center points, edge lengths, and rotation angles. Generalization tests show that models trained with the RSDD dataset perform well on other datasets and uncropped large images, demonstrating robust generalization and practical value. This dataset is a valuable asset for advancing ship detection research in SAR imagery and related algorithms.

4.1.2. Experimental Details

Our single-GPU training setup includes Python 3.8.16, PyTorch 1.10.1, GCC 7.3, CUDA 11.3, and MMRotate 1.0. Training employs the AdamW optimizer with a momentum of 0.9, a weight decay of 0.0001, and an initial learning rate of $2.5 \times 10^{-4}$. The BurgsVO model is trained for up to 180 epochs on the RSSDD dataset and 36 epochs on the RSDD dataset using a cosine decay learning rate strategy, which reduces the learning rate to 0.05 of its initial value after reaching half of the maximum epochs.
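For readers reproducing this setup in plain PyTorch, a rough sketch of a comparable optimizer and cosine schedule is shown below; the model stand-in, epoch count, and scheduler details only approximate the stated configuration and are not the authors' exact MMRotate config.

```python
import torch

# Illustrative optimizer/scheduler setup approximating the settings above
# (AdamW, lr 2.5e-4, weight decay 1e-4, cosine decay to 0.05x of the initial lr).
model = torch.nn.Linear(10, 2)   # placeholder standing in for the detector
max_epochs = 36                  # e.g., the RSDD schedule
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-4,
                              betas=(0.9, 0.999), weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=max_epochs, eta_min=2.5e-4 * 0.05)

for epoch in range(max_epochs):
    # ... one training epoch over the SAR dataset would go here ...
    optimizer.step()    # placeholder step so the loop runs as written
    scheduler.step()
```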
Performance is evaluated using the VOC mAP metric for the RSSDD dataset and the COCO metric for the RSDD dataset. Additional metrics, including recall, precision, and frames per second (FPS), provide a comprehensive evaluation. Recall assesses the model’s ability to detect true positives, precision measures detection accuracy, and FPS reflects computational efficiency.
Evaluation Metrics: Precision (Pd), recall (Rd), mean average precision (mAP), and F1 are commonly employed metrics for model evaluation. For precision and recall, their calculations are as follows:
$$P_d = \frac{TP}{TP + FP}, \qquad R_d = \frac{TP}{TP + FN},$$
where the variable T P signifies the quantity of targets that have been accurately identified. F P represents the instances of false alarms. F N refers to the targets that were not detected. F P S refers to the quantity of images the model can analyze every second, serving as a metric to gauge the speed of detection among various detectors, and is defined as follows:
$$FPS = \frac{1}{Time_s},$$
where $Time_s$ is the average detection time for each image.
The AP metric provides a numerical assessment of a detector’s comprehensive detection efficacy by computing the area beneath the precision–recall graph. It is capable of gauging the detector’s overall performance across a range of thresholds and is articulated as follows:
$$AP = \int_{0}^{1} P_d(R_d)\, \mathrm{d}R_d.$$
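For completeness, a small sketch of how the area under the precision-recall curve can be approximated numerically (a VOC-style all-point interpolation) is shown below; this is a generic approximation, not necessarily the exact evaluation code used in the experiments.

```python
import numpy as np

def average_precision(precision, recall):
    """Approximate the area under the precision-recall curve (the AP
    integral above) with all-point interpolation; inputs are assumed to be
    sorted by increasing recall."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]          # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Usage with a toy precision-recall curve
ap = average_precision(np.array([1.0, 0.8, 0.6]), np.array([0.2, 0.5, 0.9]))
```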

4.2. Comparison of Representational Methods

In this section, we compare our method to the most current representative methods, including one-stage detectors such as S2ANet [51], R3Det [52], KFiou [53], and two-stage detectors such as Orient-RCNN [45] and LPST-Det [54]. To ensure consistency in network scale, other detectors also employ a 152-layer ResNet [55] as their backbone.

4.2.1. Specific Comparison and Analysis

As shown in Table 1, our method achieves the highest precision, recall, and mAP in the nearshore scenarios of the RSSDD dataset, and it also delivers excellent precision and mAP in the offshore scenes. These results are attributed to the synergistic effects of the Burgess equation-inspired module and the ADVO encoding module. Notably, the module inspired by the Burgess equation plays a crucial role in complex nearshore environments. It excels in feature extraction by effectively suppressing speckle noise while preserving detailed structural information, thus providing a more accurate foundation for subsequent detection results. Furthermore, the ADVO encoding scheme significantly reduces computational costs while maintaining accuracy, thereby enhancing overall system performance. In summary, the proposed BurgsVO not only delivers superior accuracy but also optimizes computational efficiency, demonstrating its robustness and effectiveness across different scenarios.

4.2.2. Visual Results

Figure 5 visually compares different methods using images from various RSSDD scenes, highlighting their performance in detecting and identifying targets. Observations reveal that methods other than ours exhibit noticeable instances of false positives and missed detections, which can significantly impact the reliability of target detection. The proposed BurgsVO, in contrast, consistently demonstrates low rates of both missed detections and false alarms across different scenarios, including both nearshore and offshore environments. This is particularly evident in challenging cases involving complex ship and coral scattering.
In the first image, other methods are seen to produce false positives and miss several detections, indicating their reduced accuracy in certain contexts. The second image shows inaccuracies due to ship scattering, where other methods misidentify the targets. The third image highlights issues related to inaccuracies and the failure to effectively recognize noise caused by scattered ships, further underscoring the limitations of other methods. In the fourth image, which depicts a distant scene, only the proposed BurgsVO successfully avoids the common pitfall of mistakenly identifying scattered noise from an underwater coral reef as a legitimate target.
We also present a comparative analysis of experimental results between our model and other representative methods on the RSDD dataset in Figure 6. The visual results indicate that, in these complex scenarios, other methods consistently suffer from false detections and missed detections. In contrast, the proposed BurgsVO demonstrates superior performance, highlighting the network's capability for accurate detection in challenging conditions. More specifically, the other methods miss detections in the nearshore scenes of the first and third images, and they produce false alarms in the offshore scene of the fifth image, indicating relatively low accuracy in some special circumstances. In the second and fourth images, the other methods all yield false positives, whereas only our method detects the targets correctly.
To evaluate the impact of ship size on the proposed method, experiments were conducted on remote sensing images with varying ship sizes (Figure 7, left). The analysis also examined significant size variations within the same image, with results shown in the right part of Figure 7. The findings show that the algorithm remains stable despite size changes, thanks to the Burgess equation heuristic module’s integration of spatial and temporal information and the adaptive adjustment of the ADVO encoding scheme. These factors reduce anchor point reliance, improve target box regression accuracy, and optimize localization through similarity factors, ensuring stable detection across scales.

5. Discussion

The ablation study and performance analysis confirm the proposed model’s effectiveness in SAR ship detection. The synergy between the Burgs-inspired heuristic module and ADVO achieves a peak mAP of 0.926. Speed and complexity evaluations further demonstrate a balance of high accuracy and processing efficiency. The model outperforms existing methods in speed among two-stage detectors while maintaining competitive accuracy, highlighting its effectiveness for complex SAR detection tasks.

5.1. Ablation Study

To evaluate the efficacy of each component in our model, we conduct four ablation experiments on the RSSDD dataset, and the results for each experiment group are presented in Table 2. The Burgess equation heuristic module alone raises detection accuracy from 0.886 mAP to 0.908 mAP, while ADVO alone raises it further to 0.919 mAP. When the two components are combined, the model achieves its highest performance of 0.926 mAP, underscoring their synergistic effectiveness.

5.2. Speed and Complexity Analysis

Under identical experimental conditions, we evaluate the speed and accuracy of various methods. Our approach, with its unique feature extraction and efficient encoding scheme, reduces the number of anchors while improving image quality, excelling in both speed and accuracy (in Table 3). Our method achieves the highest speed while also delivering excellent detection accuracy, reaching 0.926 mAP.
As shown in Figure 8, the proposed method outperforms the traditional one-stage approach in processing speed. Unlike the one-stage method, which performs direct object detection, the two-stage method first aggregates features to extract compact, representative information, providing a solid foundation for efficient detection in the second stage. This reduces computational complexity, improving overall speed. In contrast, the one-stage method lacks feature aggregation, leading to more complex computations. Additionally, by reducing the number of anchor points in the second stage, the proposed method further alleviates the computational burden, resulting in faster processing.
As can be seen from Figure 8, our method's advanced feature extraction capabilities ensure that image quality is significantly enhanced, contributing to more precise and reliable detection outcomes. The reduction in the number of anchors, combined with the efficient encoding scheme, optimizes both processing speed and accuracy. While the speed improvement over single-stage detectors is modest, our method significantly enhances accuracy. This is because, although the Burgess equation-inspired module effectively enhances feature aggregation, its inherent nonlinear computations introduce delays. Additionally, the ADVO encoding scheme reduces anchor points and improves localization accuracy, but the two-stage design, particularly the training and inference processes of ORPN and ORCNN, still demands significant computational resources. As a result, despite the method's accuracy and robustness, speed improvements are limited by the model's architectural design. Among two-stage detectors, its high mAP demonstrates effective target detection and classification, even in complex scenarios. Overall, the proposed BurgsVO represents a balanced solution that achieves both high-speed performance and exceptional accuracy in detection tasks.

6. Conclusions

This paper introduces a novel two-stage detector, BurgsVO, for ship detection in SAR images. First, we incorporate the Burgess equation heuristics module into feature extraction and utilize spatial information to compensate for temporal detection data deficiencies, enhancing feature aggregation for improved ship detection. Additionally, we design an efficient ADVO encoding scheme to streamline anchor points, thereby reducing the computational load and significantly enhancing processing speed. Experimental results on the RSSDD and RSDD datasets demonstrate that the proposed BurgsVO outperforms representative approaches in terms of both detection accuracy and speed, achieving an optimal balance between the two metrics.

Author Contributions

Conceptualization, M.Z.; Methodology, Y.L. (Yaofei Li); Software, J.G.; Writing—review & editing, X.G.; Supervision, Y.L. (Yunsong Li). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 92470108, Grant 62272363; in part by the Young Elite Scientists Sponsorship Program by China Association for Science and Technology (CAST) under Grant 2021QNRC001; in part by the Joint Laboratory for Innovation in Satellite-Borne Computers and Electronics Technology Open Fund 2023 under Grant 2024KFKT001-1.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shi, H.; Zhang, B.; Wang, Y.; Cui, Z.; Chen, L. SAR-to-Optical Image Translating Through Generate-Validate Adversarial Networks. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  2. Duan, C.; Hu, W.; Du, X. SAR image based geometrical feature extraction of ships. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 24–29 July 2011; pp. 2547–2550. [Google Scholar]
  3. Wang, Y.; Liu, H. A hierarchical ship detection scheme for high-resolution SAR images. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4173–4184. [Google Scholar] [CrossRef]
  4. Xing, X.; Chen, Z.; Zou, H.; Zhou, S. A fast algorithm based on two-stage CFAR for detecting ships in SAR images. In Proceedings of the 2009 2nd Asian-Pacific Conference on Synthetic Aperture Radar, Xi’an, China, 26–30 October 2009; pp. 506–509. [Google Scholar]
  5. Leng, X.; Ji, K.; Yang, K.; Zou, H. A bilateral CFAR algorithm for ship detection in SAR images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1536–1540. [Google Scholar] [CrossRef]
  6. Xu, C.; Su, H.; Li, J.; Liu, Y.; Yao, L.; Gao, L.; Yan, W.; Wang, T. RSDD-SAR: Rotated ship detection dataset in SAR images. J. Radar 2022, 11, 581–599. [Google Scholar]
  7. Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6. [Google Scholar] [CrossRef]
  8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  9. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
  12. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic r-cnn: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 260–275. [Google Scholar]
  13. Deng, Z.; Sun, H.; Zhou, S.; Zhao, J. Learning Deep Ship Detector in SAR Images From Scratch. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4021–4039. [Google Scholar] [CrossRef]
  14. Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense attention pyramid networks for multi-scale ship detection in sar images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997. [Google Scholar] [CrossRef]
  15. Wang, S.; Cai, Z.; Yuan, J. Automatic SAR Ship Detection Based on Multi-Feature Fusion Network in Spatial and Frequency Domain. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4102111. [Google Scholar]
  16. Zand, M.; Etemad, A.; Greenspan, M. Oriented Bounding Boxes for Small and Freely Rotated Objects. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  17. Zhu, Y.; Du, J.; Wu, X. Adaptive Period Embedding for Representing Oriented Objects in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7247–7257. [Google Scholar] [CrossRef]
  18. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
  19. Yi, J.; Wu, P.; Liu, B.; Huang, Q.; Qu, H.; Metaxas, D. Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 2149–2158. [Google Scholar] [CrossRef]
  20. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
  21. Zhang, Z.; Guo, W.; Zhu, S.; Yu, W. Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1745–1749. [Google Scholar] [CrossRef]
  22. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8231–8240. [Google Scholar] [CrossRef]
  23. Wang, A.; Wang, H.; Huang, Z.; Zhao, B.; Li, W. Directional Alignment Instance Knowledge Distillation for Arbitrary-Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  24. Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 2384–2399. [Google Scholar] [CrossRef]
  25. Zhang, J.; Xing, M.; Sun, G.C.; Li, N. Oriented Gaussian function-based box boundary-aware vectors for oriented ship detection in multiresolution SAR imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15. [Google Scholar] [CrossRef]
  26. Gao, F.; Huo, Y.; Sun, J.; Yu, T.; Hussain, A.; Zhou, H. Ellipse encoding for arbitrary-oriented SAR ship detection based on dynamic key points. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–28. [Google Scholar] [CrossRef]
  27. Qian, W.; Yang, X.; Peng, S.; Zhang, X.; Yan, J. RSDet++: Point-based modulated loss for more accurate rotated object detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7869–7879. [Google Scholar] [CrossRef]
  28. He, Y.; Gao, F.; Wang, J.; Hussain, A.; Yang, E.; Zhou, H. Learning polar encodings for arbitrary-oriented ship detection in SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3846–3859. [Google Scholar] [CrossRef]
  29. Pappas, O.; Achim, A.; Bull, D. Superpixel-Level CFAR Detectors for Ship Detection in SAR Imagery. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1397–1401. [Google Scholar] [CrossRef]
  30. Gao, F.; Ma, F.; Wang, J.; Sun, J.; Yang, E.; Zhou, H. Visual Saliency Modeling for River Detection in High-Resolution SAR Imagery. IEEE Access 2018, 6, 1000–1014. [Google Scholar] [CrossRef]
  31. Gao, G.; Ouyang, K.; Luo, Y.; Liang, S.; Zhou, S. Scheme of Parameter Estimation for Generalized Gamma Distribution and Its Application to Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1812–1832. [Google Scholar] [CrossRef]
  32. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  33. Zhang, T.; Zhang, X.; Ke, X.; Zhan, X.; Shi, J.; Wei, S.; Pan, D.; Li, J.; Su, H.; Zhou, Y.; et al. LS-SSDD-v1. 0: A deep learning dataset dedicated to small ship detection from large-scale Sentinel-1 SAR images. Remote Sens. 2020, 12, 2997. [Google Scholar] [CrossRef]
  34. Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 677–694. [Google Scholar]
  35. Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15819–15829. [Google Scholar]
  36. Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795. [Google Scholar]
  37. Zhao, S.; Liu, Q.; Yu, W.; Lv, J. A Single-Stage Arbitrary-Oriented Detector Based on Multiscale Feature Fusion and Calibration for SAR Ship Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8179–8198. [Google Scholar] [CrossRef]
  38. Zhang, L.; Liu, Y.; Zhao, W.; Wang, X.; Li, G.; He, Y. Frequency-Adaptive Learning for SAR Ship Detection in Clutter Scenes. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  39. An, Q.; Pan, Z.; Liu, L.; You, H. DRBox-v2: An Improved Detector With Rotatable Boxes for Target Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8333–8349. [Google Scholar] [CrossRef]
  40. Pan, Z.; Yang, R.; Zhang, Z. MSR2N: Multi-stage rotational region based network for arbitrary-oriented ship detection in SAR images. Sensors 2020, 20, 2340. [Google Scholar] [CrossRef]
  41. Guo, P.; Huang, K.; Xu, Z. Partial Differential Equations is All You Need for Generating Neural Architectures—A Theory for Physical Artificial Intelligence Systems. arXiv 2021, arXiv:2103.08313. [Google Scholar]
  42. Fang, C.; Zhao, Z.; Zhou, P.; Lin, Z. Feature learning via partial differential equation with applications to face recognition. Pattern Recognit. 2017, 69, 14–25. [Google Scholar] [CrossRef]
  43. Chen, Y.; Yu, W.; Pock, T. On learning optimized reaction diffusion processes for effective image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA, 7–12 June 2015; pp. 5261–5269. [Google Scholar]
  44. Qian, W.; Yang, X.; Peng, S.; Guo, Y.; Yan, J. Learning modulated loss for rotated object detection. arXiv 2019, arXiv:1911.08299. [Google Scholar] [CrossRef]
  45. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF international conference on computer vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
  46. Guo, P.; Celik, T.; Liu, N.; Li, H.C. Break through the border restriction of horizontal bounding box for arbitrary-oriented ship detection in SAR images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  47. Zhang, H.; Wen, S.; Wei, Z.; Chen, Z. High-Resolution Feature Generator for Small Ship Detection in Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5617011. [Google Scholar] [CrossRef]
  48. Tan, Z.; Jiang, Z.; Yuan, Z.; Zhang, H. OPODet: Toward Open World Potential Oriented Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5645313. [Google Scholar] [CrossRef]
  49. Tan, Z.; Jiang, Z.; Guo, C.; Zhang, H. WSODet: A weakly supervised oriented detector for aerial object detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
  50. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
  51. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–11. [Google Scholar] [CrossRef]
  52. Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
  53. Yang, X.; Zhou, Y.; Zhang, G.; Yang, J.; Wang, W.; Yan, J.; Zhang, X.; Tian, Q. The kfiou loss for rotated object detection. arXiv 2022, arXiv:2201.12558. [Google Scholar]
  54. Yang, Z.; Xia, X.; Liu, Y.; Wen, G.; Zhang, W.E.; Guo, L. LPST-Det: Local-Perception-Enhanced Swin Transformer for SAR Ship Detection. Remote Sens. 2024, 16, 483. [Google Scholar] [CrossRef]
  55. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. The overall framework of our method.
Figure 2. The structure of the Burgess equation heuristic module.
Figure 3. Illustration of an OBB represented by an ADVO.
Figure 4. Decoding regression diagram of ADVO.
Figure 5. Visualization of the detection results of different methods on RSSDD. Red rectangles indicate the actual ship targets. Green and purple rectangles represent the detection results of five comparative methods and our method, respectively.
Figure 6. Visualization of the detection results of different methods on RSDD. Red rectangles indicate the actual ship targets. Green and blue rectangles represent the detection results of five comparative methods and our method, respectively.
Figure 7. Algorithm Performance under ship size variations. Red rectangles indicate the actual ship targets, and purple rectangles represent the detection results of our method.
Figure 8. Speed vs. accuracy on the RSSDD test set.
Table 1. Comparison of different methods on the RSSDD test dataset. Bold items indicate optimal values in a column, and underlined items indicate suboptimal values in a column.

| Method | Backbone | Stage | Inshore Recall | Inshore Precision | Inshore mAP | Offshore Recall | Offshore Precision | Offshore mAP | All Scenes mAP |
|---|---|---|---|---|---|---|---|---|---|
| S2ANet [51] | R152 | One | 81.7 | 53.0 | 77.0 | 93.0 | 58.7 | 92.0 | 88.6 |
| R3Det [52] | R152 | One | 80.9 | 35.2 | 76.8 | 91.5 | 32.9 | 90.6 | 87.4 |
| KFiou [53] | R152 | One | 81.7 | 38.8 | 76.0 | 93.8 | 52.1 | 92.9 | 89.2 |
| Orient-RCNN [45] | R152 | Two | 84.0 | 82.7 | 80.6 | 92.0 | 93.9 | 91.6 | 89.1 |
| LPST-Det [54] | R152 | Two | 84.7 | 73.5 | 81.3 | 94.8 | 92.2 | 94.4 | 91.2 |
| Our Method | CSPNext | Two | 90.4 | 89.6 | 88.3 | 94.1 | 87.4 | 93.0 | 92.6 |
Table 2. Ablation study.

| ID | Burgs-Inspired | ADVO | mAP |
|---|---|---|---|
| 1 | | | 88.6 |
| 2 | ✓ | | 90.8 |
| 3 | | ✓ | 91.9 |
| 4 | ✓ | ✓ | 92.6 |
Table 3. Speed vs. accuracy on the RSSDD dataset. Bold items represent the best while underlined items represent the runner-up.

| Methods | All Scenes mAP | FPS |
|---|---|---|
| S2ANet [51] | 88.6 | 24.3 |
| R3Det [52] | 87.4 | 22 |
| KFiou [53] | 89.2 | 24.4 |
| Orient-RCNN [45] | 89.1 | 23.8 |
| LPST-Det [54] | 91.2 | 23.6 |
| Our Method | 92.6 | 25.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
