Retinal vessels carry rich biological information and are the only blood vessels in the human body that can be clearly visualized by non-invasive means. Retinal vessel segmentation can be used to characterize the morphology of retinal vessels, such as their length, width, branching pattern, and angle. Current medical research suggests that retinal vasculopathy may precede cardiovascular and metabolic diseases, such as hypertension, coronary artery disease, and diabetes, so that retinal vessel segmentation can serve as a basis for diagnosing related diseases [1,2,3,4,5]. However, the complex distribution and course of retinal vessels, their large variation in size, the interference of lesions, and the low illumination and imaging resolution of fundus cameras make it difficult to segment retinal vessels completely [6]. Therefore, retinal vessel segmentation has remained a challenging and active topic in the field of retinal image analysis.
In recent years, deep learning has proved powerful in the field of image segmentation [7,8,9,10], and researchers have proposed many neural networks for vessel segmentation with good results. In 2015, Long et al. first proposed semantic segmentation with fully convolutional networks [11], on the basis of which a variety of semantic segmentation networks have emerged [12]. Among them, the UNet [13] (U-shaped convolutional network) model has gradually become a focus of the medical image segmentation field because of its good segmentation performance, and many improved network models based on UNet have since appeared. In 2018, Alom et al. combined the strengths of UNet, residual networks, and the recurrent convolutional neural network (RCNN) [14] to propose RU-Net and R2U-Net [15]. In 2019, Gu et al. proposed CE-Net [16], which replaces the traditional UNet encoding and decoding blocks with pretrained ResNet-34 [17] blocks and applies dense atrous convolution and residual multi-kernel pooling at the bottleneck of the network, achieving good results in retinal vessel segmentation. In 2020, Sinha et al. [18] extracted global context information through ResNet blocks in the encoding phase and used the feature maps produced by guided attention modules (spatial and channel self-attention) in the decoding phase as the final segmentation results. In the same year, Li et al. proposed IterNet [19], which makes retinal vessel segmentation more coherent by cascading a UNet with several mini-UNets connected by dense skip connections [20] to prevent overfitting. In 2021, Wu et al. proposed SCS-Net [21], which replaces the traditional four stages of downsampling with three. SCS-Net performs scale-aware feature fusion through 3 × 3 convolutions with different strides at the bottleneck, incorporates attention mechanisms into the skip connections, and applies semantic supervision by decoding feature maps from different stages for the final output; it has achieved outstanding results on common retinal vessel datasets. In 2022, a novel full-resolution network (FR-UNet) [22] was proposed, which expands horizontally and vertically through a multi-resolution convolution interaction mechanism to address the loss of spatial information in traditional U-shaped segmentation networks; its feature aggregation module integrates multi-scale feature maps from adjacent stages to supplement high-level contextual information, and modified residual blocks continuously learn multi-resolution representations to obtain pixel-accurate prediction maps. Also in 2022, Liu et al. proposed ResDO-UNet [23], which introduces a residual DO-conv [24] (ResDO-conv) network as the backbone, a pooling fusion block (PFB) for non-linear fusion during pooling operations, and an attention fusion block (AFB) for multi-scale feature representation in the skip connections. In 2023, Wei et al. proposed OCE-Net [25], which captures both the orientation and the context information of vessels and fuses the two to improve segmentation accuracy. Also in 2023, Khan et al. proposed a multi-resolution contextual network (MRC-Net) [26] for retinal vessel segmentation, which extracts multi-scale features to learn contextual dependencies between semantically different features and uses bidirectional recurrent learning to model former–latter and latter–former dependencies; another key idea is training in an adversarial setting to improve foreground segmentation by optimizing region-based scores, which boosts the Dice score and the corresponding Jaccard index while keeping the number of trainable parameters comparatively low. In 2024, He et al. proposed MU-Net [27], which employs a multi-scale residual convolution module (MRCM) to extract image features at different granularities and uses residual learning to improve feature utilization and reduce information loss; selective kernel units (SKUs) are introduced into the skip connections to obtain multi-scale features through soft attention, and a residual attention module (RAM) is constructed in the decoder stage to further extract vascular features and improve processing speed. In the same year, Tan et al. proposed anisotropic perceptive convolution (APC) and an anisotropic enhancement module (AEM) to model visual cortex cells and their orientation-selection mechanism, along with a novel network named W-shaped deep matched filtering (WS-DMF) [28]; the network has a W-shaped framework in which the DMF, built on a multilayer aggregation of APCs, enhances vascular features and suppresses pathological ones, AEMs embedded in the DMF refine the orientation and position information of high-dimensional features, and an orientation anisotropic loss (OAL) is introduced to strengthen the ability of APC to perceive linear textures such as blood vessels. In 2024, Ding et al. proposed RCAR-UNet [29], a retinal vessel segmentation network with a novel rough channel attention mechanism: deep neural networks learn complex features while rough sets handle uncertainty, yielding rough neurons from which a rough channel attention module is built and embedded in the UNet skip connections to integrate high-level and low-level features; residual connections additionally transmit low-level features to higher levels, enhancing feature extraction and aiding gradient back-propagation during training. Also in 2024, Liu et al. proposed IMFF-Net [30], a multi-scale feature fusion segmentation network based on a four-layer U-shaped architecture, which improves segmentation performance by enhancing the utilization of both high-level and low-level features through multi-scale feature fusion.
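Multi-scale feature fusion recurs in several of the networks surveyed above (e.g., SCS-Net, FR-UNet, IMFF-Net). As a rough, framework-agnostic sketch of the basic operation, assuming nearest-neighbour upsampling and channel concatenation (the specific fusion rules of those networks differ and are not reproduced here):

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(features):
    """Bring every (C_i, H_i, W_i) feature map to the finest spatial
    resolution present, then concatenate along the channel axis.
    Assumes all spatial sizes divide the finest size evenly."""
    target_h = max(f.shape[1] for f in features)
    aligned = []
    for f in features:
        factor = target_h // f.shape[1]
        aligned.append(upsample_nn(f, factor) if factor > 1 else f)
    return np.concatenate(aligned, axis=0)

# Illustrative use: a fine decoder map and a coarse one.
fine = np.ones((2, 8, 8))     # 2 channels at 8x8
coarse = np.zeros((4, 4, 4))  # 4 channels at 4x4
fused = fuse_multiscale([fine, coarse])  # shape (6, 8, 8)
```

In real networks the concatenated map is usually passed through a further convolution (often 1 × 1) so that high-level context and low-level detail are mixed rather than merely stacked; `upsample_nn` and `fuse_multiscale` are illustrative names, not functions from any of the cited works.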
The main contributions of this paper are summarized as follows:
(1) A plenary attention mechanism is proposed to enable the network to better learn and attend to vessel structures of different shapes and sizes.
(2) DropBlock_Diagonal, a variant of DropBlock better suited to retinal vessel datasets, is proposed and added to the traditional convolutional block to prevent the network from overfitting.
(3) The feature maps containing vessel details at different scales, obtained from each stage of the decoder, are merged to further improve the final vessel segmentation results.
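The section does not detail DropBlock_Diagonal, but the idea behind contribution (2) can be illustrated with a minimal NumPy sketch. Standard DropBlock zeroes a square region around each randomly sampled seed; the diagonal variant below is a hypothetical reading of DropBlock_Diagonal, assuming it zeroes oblique bands to better match the elongated, slanted shape of retinal vessels. The function names and the simplified seed probability `gamma` are illustrative, not the paper's implementation.

```python
import numpy as np

def dropblock_mask(h, w, block_size=5, drop_prob=0.1, seed=None):
    """DropBlock-style binary mask: sample seed points, then zero a
    block_size x block_size square around each seed."""
    rng = np.random.default_rng(seed)
    # Simplified seed rate so the expected dropped area is near drop_prob.
    gamma = drop_prob / (block_size ** 2)
    seeds = rng.random((h, w)) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(0, i - half): i + half + 1,
             max(0, j - half): j + half + 1] = 0.0
    return mask

def dropblock_diagonal_mask(h, w, block_size=5, drop_prob=0.1, seed=None):
    """Hypothetical diagonal variant: zero a diagonal run of block_size
    pixels centred on each seed instead of a square block."""
    rng = np.random.default_rng(seed)
    gamma = drop_prob / block_size  # only block_size pixels per seed
    seeds = rng.random((h, w)) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        for k in range(-half, half + 1):
            r, c = i + k, j + k
            if 0 <= r < h and 0 <= c < w:
                mask[r, c] = 0.0
    return mask
```

During training such a mask would be multiplied element-wise onto a feature map (with rescaling by the kept fraction), forcing the network to rely on more than one contiguous stretch of a vessel.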