Oil spill pollution spreads over wide areas and is difficult to clean up; early detection of the spill area is therefore of great significance both to marine ecology and to the subsequent cleanup work. Synthetic Aperture Radar (SAR) is an active system that observes and images its target. Unlike traditional optical remote sensors, SAR is unaffected by clouds, rain, and other environmental conditions and is capable of round-the-clock, all-weather imaging [1,2]. It is therefore widely used in both military and civilian fields, and is recognized at home and abroad as one of the most suitable sensors for monitoring sea surface oil spills. When the sea surface is covered by an oil film, the film suppresses the formation of short gravity waves and capillary waves and damps the natural motion of the surface, which weakens Bragg scattering in the oil spill region. In SAR images, the oil spill area therefore appears as a dark patch that contrasts with the surrounding sea surface [3,4].
Despite these significant advantages, SAR images are usually characterized by high noise and low contrast. Traditional SAR image processing methods, including thresholding, morphological processing, and classical machine learning algorithms, are often ineffective on such images [5]. These methods typically rely on preset parameters and hand-crafted feature extraction, which adapt poorly to complex and changing environments. In recent years, with the rapid development of artificial intelligence, deep learning techniques have been applied to sea surface oil spill segmentation in SAR images.
Long et al. [6] proposed Fully Convolutional Networks (FCNs) for the semantic segmentation of images, based on deconvolution and skip connections, in 2015. In the same year, Ronneberger et al. [7] proposed the U-Net network based on the FCN, which outperforms other networks even when trained with less image data and ensures that the output image keeps the same size as the input. Owing to this advantage, U-Net was applied to the detection of marine oil spills from SAR images [8,9]. In 2016, Xu et al. [10] chained two independent artificial neural networks (ANNs) to detect marine oil spills from SAR images sequentially. The first ANN, a backpropagation neural network (BPNN), segments the SAR images to identify dark spots caused by oil spills or look-alikes; based on the extracted statistical features, the second BPNN then distinguishes oil spills from look-alikes. In 2017, Song et al. [11] combined multiple fully polarimetric SAR features with an optimized wavelet neural network (WNN) classifier and validated the method on two sets of fully polarimetric RADARSAT-2 SAR data; the experimental results demonstrated its effectiveness and applicability to marine oil spill classification. In 2018, Chen et al. [12] developed the DeepLabV3+ network for semantic segmentation. This network introduces a distinctive encoder–decoder structure to better capture spatial information and boundary details: the encoder extracts features through multi-scale context aggregation, while the decoder recovers the spatial resolution. This encoder–decoder structure has since been widely adopted in new oil spill detection models. In 2019, Krestenitis et al. [13] applied U-Net, LinkNet, DeepLabV3+, and other mainstream segmentation models, with replaced backbone networks, to SAR sea surface oil spill detection, and the results showed that DeepLabV3+ had the best overall performance. In 2020, Zeng et al. [14] proposed an oil spill convolutional network (OSCNet) with 12 weight layers, whose increased depth allows it to extract richer features and to learn oil spill details from the dataset more effectively than hand-crafted features. In 2021, Shaban et al. [15] proposed a two-part deep learning framework. The first part is a convolutional neural network (CNN) with a Frost filter, whose output is split according to whether oil spill pixels account for more than 1% of the image; images above this threshold are passed to the second part, while those below it are discarded. The second part uses a five-layer U-Net optimized with a generalized Dice loss function, and the oil spill pixel recognition accuracy reached 84%.
Optical remote sensing images also play an important role in oil spill detection. Seydi et al. [16] developed a new framework for oil spill detection in optical remote sensing images based on a multiscale multidimensional residual kernel convolutional neural network, which combines two-dimensional and one-dimensional multiscale residual blocks. Li et al. [17] proposed a deep learning framework for remote sensing image registration based on predicting the probability of the semantic spatial position distribution; by optimizing the subpixel matching position and determining the semantic spatial probability distribution of the matching template, it effectively overcomes the sensitivity of traditional methods to radiometric disparity and their long processing times. In 2022, Wang et al. [18] designed the oil spill detection network BO-DRNet using polarimetric features. Its architecture is based on DeepLabV3+ with the backbone replaced by ResNet-18, which enables the network to obtain more complete detailed features of the oil spill, while Bayesian optimization (BO) is used to tune the hyperparameters. The experimental results show that BO-DRNet achieves an average accuracy of 74.69% and an average Dice coefficient of 85.51%. In 2023, to address the limited size of SAR oil spill datasets, Fan et al. [19] designed a Multi-task Generative Adversarial Network (MTGAN) oil spill detection model that both distinguishes oil spills from look-alike regions and segments the oil spill regions within a single framework. The network needs only a small number of oil spill images for training, and the experimental results show that the proposed MTGAN framework outperforms other models in oil spill classification and semantic segmentation. In 2024, Li et al. [20] designed an oil spill segmentation network based on U-Net that consists mainly of a multi-convolutional layer (MCL), which extracts the basic feature information of the SAR image, and a feature extraction module (FEM), which further extracts and fuses the different levels of feature maps generated by the U-Net decoder. With these operations, the network learns rich global and local contextual information, improving the segmentation accuracy of the oil spill region.
In the literature above, the identification of oil spills in SAR images relies mainly on computer image processing and pattern recognition, without exploiting physical information such as the polarization and phase of SAR images. One of the current state-of-the-art approaches is to combine the physical information of SAR images with neural networks for oil spill detection: polarimetric SAR data provide a rich set of polarization features that make it possible to distinguish different types of sea surface phenomena [21,22].
The unique imaging mechanism of SAR means that SAR images are characterized by high noise and low contrast; as a result, existing methods are not fully suited to SAR image segmentation. To address these problems, this paper first applies the original DeepLabV3+ model, whose backbone network is Xception, to sea surface oil spill detection in SAR images. Experimental results show that the detection performance of Xception–DeepLabV3+ is unsatisfactory, so Xception is replaced with the lightweight feature extraction network MobileNetV2, which yields better detection performance. Although the overall performance of the MobileNetV2–DeepLabV3+ model improves significantly, it is still deficient in extracting the details of the oil spill area. For this reason, this paper introduces the spatial and channel Squeeze and Excitation (scSE) module into the MobileNetV2 backbone and the Atrous Spatial Pyramid Pooling (ASPP) module. Meanwhile, to address the category imbalance between the oil spill region and the sea surface background, a joint BCE + Dice loss function is adopted. The scSE module strengthens the model's attention to the channel and spatial information of the oil spill region by enhancing feature representation, while the joint BCE + Dice loss handles the category imbalance and optimizes the boundary details, significantly improving the detection of the oil spill area and, in particular, the extraction of its boundary details.
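To make the last two components concrete, the following NumPy sketch illustrates one scSE block applied to a single feature map and the joint BCE + Dice loss computed on per-pixel probabilities. This is an illustrative re-implementation, not the paper's code: the layer shapes, the reduction ratio, the element-wise maximum used to merge the two SE branches, and the equal weighting of the two loss terms are all assumptions.

```python
import numpy as np

def channel_se(x, w1, w2):
    """Channel SE: squeeze spatial dims, then re-weight channels.
    x: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    z = x.mean(axis=(1, 2))                       # global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)                   # FC + ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))           # FC + sigmoid -> (C,)
    return x * s[:, None, None]                   # scale each channel

def spatial_se(x, w, b):
    """Spatial SE: 1x1 conv to a single map, sigmoid gate per pixel.
    x: (C, H, W); w: (C,); b: scalar."""
    q = 1.0 / (1.0 + np.exp(-(np.tensordot(w, x, axes=1) + b)))  # (H, W)
    return x * q[None, :, :]                      # scale each pixel

def scse(x, w1, w2, w, b):
    """Concurrent spatial and channel SE; branches merged by
    element-wise max (one of the common aggregation choices)."""
    return np.maximum(channel_se(x, w1, w2), spatial_se(x, w, b))

def bce_dice_loss(pred, target, eps=1e-7):
    """Joint BCE + Dice loss on per-pixel probabilities in [0, 1].
    BCE supervises every pixel; the Dice term counteracts the
    oil/background class imbalance by scoring region overlap."""
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    dice = 1.0 - (2.0 * np.sum(p * target) + eps) / (np.sum(p) + np.sum(target) + eps)
    return bce + dice                             # equal weighting assumed
```

In a real model the scSE weights are learned and the block is inserted after backbone and ASPP stages; here random weights suffice to show that the block preserves the feature map's shape while re-weighting it.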