To better demonstrate the segmentation performance of the proposed improved U-Net-based network on retinal blood vessel images, comparative experiments and ablation experiments were conducted to verify its effectiveness.
This section first describes the relevant evaluation indicators, then tests the basic segmentation network structure and compares it with the proposed method.
3.1. Evaluation Indicators
To better highlight the effectiveness of the proposed model, several evaluation indicators, including accuracy, F1-score, sensitivity, specificity, and precision, were used to evaluate its ability to segment retinal vascular images.
Acc (Accuracy): This reflects the proportion of correctly classified blood vessel and background pixels to the total number of pixels (see Equation (1)).
Here, TP and TN denote correctly segmented retinal vessel pixels and background pixels, respectively, while FP denotes background pixels incorrectly segmented as vessels and FN denotes vessel pixels incorrectly segmented as background.
F1-score: This measures the accuracy of a binary classification model by taking both precision and recall into account; it is the harmonic mean of precision and recall (see Equation (2)).
Se (Sensitivity): Also known as the true positive rate (TPR), this represents the proportion of retinal vessel pixels that are correctly identified (see Equation (3)).
Sp (Specificity): Also known as the true negative rate (TNR), this represents the proportion of correctly classified background pixels to the total number of background pixels (see Equation (4)).
Pr (Precision): This represents the proportion of correctly segmented retinal vessel pixels to the total number of pixels segmented as retinal vessels (see Equation (5)).
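For reference, and assuming they match Equations (1)–(5) (which are referenced above but not reproduced here), the standard definitions of these indicators in terms of TP, TN, FP, and FN are:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F_1 = \frac{2 \times Pr \times Se}{Pr + Se},$$
$$Se = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad Pr = \frac{TP}{TP + FP}.$$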
3.2. Experimental Setup
The proposed model was implemented in the PyTorch framework. The number of training epochs was set to 200. We used the SGD optimizer, with the learning rate set to 1 × 10⁻², the momentum to 0.9, and the weight decay to 1 × 10⁻⁴. The batch size was set to 4. In addition, to speed up network training and testing, an Nvidia GeForce RTX5000 TI card was used for the above experimental process (see Figure 7).
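The following is a minimal PyTorch sketch of this training configuration. The single-layer stand-in network, the random tensors, and the variable names are illustrative assumptions, not the authors' implementation; only the optimizer settings (SGD, learning rate 1 × 10⁻², momentum 0.9, weight decay 1 × 10⁻⁴, batch size 4, 200 epochs) are taken from the text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in network and data: the real model is the improved U-Net-based network
# described in the paper; a single conv layer and random tensors keep this sketch
# self-contained and runnable.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)
images = torch.rand(8, 3, 64, 64)              # placeholder retinal image patches
labels = torch.randint(0, 2, (8, 64, 64))      # placeholder vessel/background masks
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

# Optimizer settings reported in Section 3.2.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # the full criterion also adds a Dice term (Section 3.2.2)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

for epoch in range(200):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```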
For the loss function, the sum of cross-entropy loss and Dice loss was selected. Cross-entropy loss, a common loss function in semantic segmentation, not only captures the difference between the predicted and true probability distributions but also measures classifier performance at a finer level of detail.
3.2.1. Cross-Entropy Loss
Cross-entropy is a very important concept in information theory, whose main purpose is to measure the difference between two probability distributions. For image segmentation, cross-entropy loss is calculated by the average cross entropy of all pixels. The definition of cross-entropy loss is:
If Ω denotes the pixel region of the image, with height a, width b, and K classes, then
$$L_{CE} = -\frac{1}{a \times b}\sum_{i \in \Omega}\sum_{k=1}^{K} y_{i,k}\,\log\left(p_{i,k}\right),$$
where $y_{i,k}$ is the ground-truth label and $p_{i,k}$ is the predicted probability of pixel $i$ belonging to class $k$ (see Equation (6)).
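As an illustration (not taken from the paper), the per-pixel averaged cross-entropy above corresponds to PyTorch's built-in cross-entropy with mean reduction; the toy tensors below are assumptions used only to make the sketch self-contained.

```python
import torch
import torch.nn.functional as F

# Toy example: K = 2 classes (vessel / background) over a 4x4 pixel region Omega.
logits = torch.randn(1, 2, 4, 4)             # raw network outputs, shape (N, K, a, b)
target = torch.randint(0, 2, (1, 4, 4))      # ground-truth class of each pixel

# Equation (6): average over all a*b pixels of -log p_{i,k} at the true class k.
log_p = F.log_softmax(logits, dim=1)                      # per-pixel log-probabilities
manual_ce = -log_p.gather(1, target.unsqueeze(1)).mean()  # pick true-class entries, average

# PyTorch's built-in loss computes the same quantity with mean reduction.
builtin_ce = F.cross_entropy(logits, target)
print(manual_ce.item(), builtin_ce.item())                # identical up to float error
```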
3.2.2. Dice Loss
The Dice loss is defined as
$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|},$$
where X denotes the pixel labels of the ground-truth retinal vessel segmentation and Y denotes the pixel categories of the retinal vessel segmentation predicted by the model (see Equation (7)).
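A minimal soft-Dice sketch combined with cross-entropy is given below as a hedged illustration of the loss described in this section; the `dice_loss` and `combined_loss` functions, the smoothing constant, and the equal weighting of the two terms are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for the vessel (foreground) class, in the spirit of Equation (7):
    1 - 2|X ∩ Y| / (|X| + |Y|), with predicted probabilities standing in for Y."""
    probs = torch.softmax(logits, dim=1)[:, 1]   # predicted vessel probability map
    target = target.float()
    inter = (probs * target).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)

def combined_loss(logits, target):
    # Cross-entropy plus Dice, as selected for training in Section 3.2.
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

# Toy usage on a random 2-class prediction over a 4x4 region.
logits = torch.randn(1, 2, 4, 4)
target = torch.randint(0, 2, (1, 4, 4))
print(combined_loss(logits, target).item())
```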
3.3. Ablation Experiments
To demonstrate the effectiveness of each component, ablation experiments are used to study the contribution of each module to the segmented images. First, the basic U-Net network is tested; then, the ResNest Block and DFB network structures are added and their performance is analyzed. The results of the ablation experiments are shown in Table 4.
The ablation results show that, compared with the basic network structure, the proposed model improves on all indicators, especially the F1-score and Pr. In Table 4, √ indicates that a module was added and × indicates that it was not.
On the DRIVE dataset, the F1-score improved from 0.8278 to 0.8687, and Pr improved from 0.7287 to 0.7863. When the DFB module was added, the F1-score, Sp, and Pr indicators all improved, demonstrating the effectiveness of the proposed DFB module.
On the CHASE_DB1 dataset, the performance indicators likewise show that the proposed model is effective. The gains in the key indicators F1-score and Pr are the most pronounced, and the other indicators also outperform those of the basic network model.
The ablation experiments on the two datasets show that adding the DFB module improves all performance indicators except Acc and Se. Although Acc and Se decrease slightly after the DFB module is added, the overall performance of the proposed model improves, which demonstrates its validity for retinal image segmentation.
3.4. Comparison Test with Other Models
To better highlight the effect of the proposed model, we conducted a series of experiments on the two public datasets, showing the retinal blood vessel segmentation results and the corresponding indicators. The results were compared with those of more advanced models through quantitative and qualitative analyses of the relevant indicators.
As shown in Table 5, on the DRIVE dataset the F1-score of the proposed improved U-Net-based network reaches 0.8687, the Sp score reaches 0.9703, and the Pr score reaches 0.7863, all higher than the scores of the other models. Compared with the best competing results, the Acc score differs from the best score by 0.0080 and the Se score by 0.0842.
On the CHASE_DB1 dataset, as shown in Table 6, the proposed improved U-Net-based network also obtains relatively good results: the F1-score reaches 0.8358, the Sp score reaches 0.9720, and the Pr score reaches 0.7330, which are likewise the highest among the compared models. Compared with the best competing results, the Acc score differs from the best score by 0.0060 and the Se score by 0.1207.
The network structure thus produces a clear segmentation improvement on both datasets. In the comparison experiments, the F1-score, Sp, and Pr of the improved U-Net-based model are the highest on both the DRIVE and CHASE_DB1 datasets, whereas the changes in Acc and Se are less pronounced. In particular, Acc is not significantly lower than that of other advanced models, so the overall improvement remains effective.
The Se results are less satisfactory, and Se even decreases slightly in the ablation and comparison tests above, which may be caused by an excessive proportion of background noise pixels. When the background noise is high, vessel pixels may be mis-segmented as background, increasing FN in Equation (3) and thereby decreasing Se. In addition, other noise introduced during the identification process may be classified as background pixels, reducing the measured proportion of correctly identified retinal vessel pixels, which may also lead to a slight decrease in the Se indicator.
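To make the effect concrete with illustrative (not measured) counts: if $TP = 900$ vessel pixels are correctly identified while background noise pushes $FN$ from 100 to 150, sensitivity drops from
$$Se = \frac{900}{900 + 100} = 0.900 \quad\text{to}\quad Se = \frac{900}{900 + 150} \approx 0.857,$$
even though the correctly detected vessel pixels themselves are unchanged.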
To verify the inference that background noise causes the decrease in the Se indicator observed in the above experiments, we apply morphological opening and closing operations to the images to remove background pixel noise and analyze the influence of noise on the Se indicator.
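A minimal sketch of such a denoising step using OpenCV morphological opening and closing is shown below; the file names and the 3 × 3 structuring element are assumptions, not details taken from the paper.

```python
import cv2
import numpy as np

# Load a binary segmentation result (vessel = 255, background = 0); hypothetical file name.
seg = cv2.imread("segmentation_result.png", cv2.IMREAD_GRAYSCALE)

kernel = np.ones((3, 3), np.uint8)  # assumed 3x3 structuring element

# Opening removes small isolated foreground specks; closing fills small holes and gaps.
opened = cv2.morphologyEx(seg, cv2.MORPH_OPEN, kernel)
denoised = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)

cv2.imwrite("segmentation_denoised.png", denoised)
```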
The analysis of the Se performance indicator after denoising in Table 7 shows that Se rises after denoising, which supports the above inference. On the whole, the proposed model achieves better segmentation performance.
We performed the same analysis for Acc and found that the denoised Acc metric also increased. The results are presented in Table 8.