Retinal vessels carry rich biological information and are the only blood vessels in the human body that can be clearly visualized by non-invasive means. Retinal vessel segmentation can be used to characterize the morphology of retinal vessels, such as their length, width, branching pattern, and angle. Current medical research suggests that retinal vasculopathy may precede cardiovascular and metabolic diseases, such as hypertension, coronary artery disease, and diabetes, so that retinal vessel segmentation can serve as a basis for diagnosing related diseases [1,2,3,4,5]. However, the complex distribution and course of retinal vessels, their large variation in size, the interference of lesions, and the low illumination and imaging resolution of fundus cameras make it difficult to segment retinal vessels completely [6]. Therefore, retinal vessel segmentation has remained a challenging and active topic in the field of retinal image analysis.
In recent years, deep learning has proved powerful in the field of image segmentation [7,8,9,10], and researchers have proposed many neural networks for vessel segmentation with good results. In 2015, Long et al. first proposed semantic segmentation with fully convolutional networks [11], on the basis of which a variety of semantic segmentation networks have emerged [12]. Among them, the UNet [13] (U-shaped convolutional network) model has gradually become a focus of the medical image segmentation field because of its good segmentation performance, and many improved network models based on UNet have since appeared. In 2018, Alom et al. combined the strengths of UNet, residual networks, and the recurrent convolutional neural network (RCNN) [14] to propose RU-Net and R2U-Net [15]. In 2019, Gu et al. proposed CE-Net [16], which replaces the traditional UNet encoding and decoding blocks with pretrained ResNet-34 [17] blocks and applies dense atrous convolution and residual multi-kernel pooling at the bottleneck of the network, achieving good results in retinal vessel segmentation. In 2020, Sinha et al. [18] extracted global context information through ResNet blocks in the encoding phase and used the feature maps produced by guided attention modules (spatial and channel self-attention) in the decoding phase as the final segmentation results. In the same year, Li et al. proposed IterNet [19], which makes retinal vessel segmentation more coherent by cascading a UNet with several mini-UNets connected by dense skip connections [20] to prevent overfitting. In 2021, Wu et al. proposed SCS-Net [21], which replaces the traditional four stages of downsampling with three. SCS-Net performs scale-aware feature fusion through 3 × 3 convolutions with different strides at the bottleneck, incorporates attention mechanisms into the skip connections, and applies semantic supervision by decoding feature maps from different stages for the final output; it has achieved outstanding results on common retinal vessel datasets. In 2022, a novel full-resolution network (FR-UNet) [22] was proposed, which expands horizontally and vertically through a multi-resolution convolution interaction mechanism to address the loss of spatial information in traditional U-shaped segmentation networks; its feature aggregation module integrates multi-scale feature maps from adjacent stages to supplement high-level contextual information, and modified residual blocks continuously learn multi-resolution representations to obtain pixel-accurate prediction maps. Also in 2022, Liu et al. proposed ResDO-UNet [23], which introduces a residual DO-conv [24] (ResDO-conv) network as the backbone, a pooling fusion block (PFB) for non-linear fusion during pooling operations, and an attention fusion block (AFB) for multi-scale feature representation in the skip connections. In 2023, Wei et al. proposed OCE-Net [25], which captures both the orientation and the context information of vessels and fuses the two to improve segmentation accuracy. Also in 2023, Khan et al. proposed a multi-resolution contextual network (MRC-Net) [26] for retinal vessel segmentation, which extracts multi-scale features to learn contextual dependencies between semantically different features and uses bidirectional recurrent learning to model former–latter and latter–former dependencies; another key idea is training in an adversarial setting to improve foreground segmentation by optimizing region-based scores, which boosts the Dice score and the corresponding Jaccard index while keeping the number of trainable parameters comparatively low. In 2024, He et al. proposed MU-Net [27], which employs a multi-scale residual convolution module (MRCM) to extract image features at different granularities and uses residual learning to improve feature utilization and reduce information loss; selective kernel units (SKUs) are introduced into the skip connections to obtain multi-scale features through soft attention, and a residual attention module (RAM) is constructed in the decoder stage to further extract vascular features and improve processing speed. In the same year, Tan et al. proposed anisotropic perceptive convolution (APC) and an anisotropic enhancement module (AEM) to model visual cortex cells and their orientation-selection mechanism, along with a novel network named W-shaped deep matched filtering (WS-DMF) [28]; the network has a W-shaped framework in which the DMF, built on a multilayer aggregation of APCs, enhances vascular features and suppresses pathological ones, AEMs embedded in the DMF refine the orientation and position information of high-dimensional features, and an orientation anisotropic loss (OAL) is introduced to strengthen the ability of APC to perceive linear textures such as blood vessels. In 2024, Ding et al. proposed RCAR-UNet [29], a retinal vessel segmentation network with a novel rough channel attention mechanism: deep neural networks learn complex features while rough sets handle uncertainty, yielding rough neurons from which a rough channel attention module is built and embedded in the UNet skip connections to integrate high-level and low-level features; residual connections additionally transmit low-level features to higher levels, enhancing feature extraction and aiding gradient back-propagation during training. Also in 2024, Liu et al. proposed IMFF-Net [30], a multi-scale feature fusion segmentation network based on a four-layer U-shaped architecture, which improves segmentation performance by enhancing the utilization of both high-level and low-level features through multi-scale feature fusion.
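Multi-scale feature fusion recurs in several of the networks surveyed above (e.g., SCS-Net, FR-UNet, IMFF-Net). As a rough, framework-agnostic sketch of the basic operation, assuming nearest-neighbour upsampling and channel concatenation (the specific fusion rules of those networks differ and are not reproduced here):

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(features):
    """Bring every (C_i, H_i, W_i) feature map to the finest spatial
    resolution present, then concatenate along the channel axis.
    Assumes all spatial sizes divide the finest size evenly."""
    target_h = max(f.shape[1] for f in features)
    aligned = []
    for f in features:
        factor = target_h // f.shape[1]
        aligned.append(upsample_nn(f, factor) if factor > 1 else f)
    return np.concatenate(aligned, axis=0)

# Illustrative use: a fine decoder map and a coarse one.
fine = np.ones((2, 8, 8))     # 2 channels at 8x8
coarse = np.zeros((4, 4, 4))  # 4 channels at 4x4
fused = fuse_multiscale([fine, coarse])  # shape (6, 8, 8)
```

In real networks the concatenated map is usually passed through a further convolution (often 1 × 1) so that high-level context and low-level detail are mixed rather than merely stacked; `upsample_nn` and `fuse_multiscale` are illustrative names, not functions from any of the cited works.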
The main contributions of this paper are summarized as follows:
(1) A plenary attention mechanism is proposed to enable the network to better learn and attend to vessel structures of different shapes and sizes.
(2) DropBlock_Diagonal, a variant of DropBlock better suited to retinal vessel datasets, is proposed and added to the traditional convolutional block to prevent the network from overfitting.
(3) The feature maps containing vessel details at different scales, obtained from each stage of the decoder, are merged to further improve the final vessel segmentation results.
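The section does not detail DropBlock_Diagonal, but the idea behind contribution (2) can be illustrated with a minimal NumPy sketch. Standard DropBlock zeroes a square region around each randomly sampled seed; the diagonal variant below is a hypothetical reading of DropBlock_Diagonal, assuming it zeroes oblique bands to better match the elongated, slanted shape of retinal vessels. The function names and the simplified seed probability `gamma` are illustrative, not the paper's implementation.

```python
import numpy as np

def dropblock_mask(h, w, block_size=5, drop_prob=0.1, seed=None):
    """DropBlock-style binary mask: sample seed points, then zero a
    block_size x block_size square around each seed."""
    rng = np.random.default_rng(seed)
    # Simplified seed rate so the expected dropped area is near drop_prob.
    gamma = drop_prob / (block_size ** 2)
    seeds = rng.random((h, w)) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(0, i - half): i + half + 1,
             max(0, j - half): j + half + 1] = 0.0
    return mask

def dropblock_diagonal_mask(h, w, block_size=5, drop_prob=0.1, seed=None):
    """Hypothetical diagonal variant: zero a diagonal run of block_size
    pixels centred on each seed instead of a square block."""
    rng = np.random.default_rng(seed)
    gamma = drop_prob / block_size  # only block_size pixels per seed
    seeds = rng.random((h, w)) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        for k in range(-half, half + 1):
            r, c = i + k, j + k
            if 0 <= r < h and 0 <= c < w:
                mask[r, c] = 0.0
    return mask
```

During training such a mask would be multiplied element-wise onto a feature map (with rescaling by the kept fraction), forcing the network to rely on more than one contiguous stretch of a vessel.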