1. Introduction
Retinal blood vessels are the only part of the systemic vasculature that can be observed non-invasively. Changes in retinal blood vessels, such as vessel width, angle and branching shape, can serve as a diagnostic basis for vascular-related diseases [1,2]; for example, patients with glaucoma may develop neurovascular atrophy, microaneurysms and cotton wool spots [3,4]. Therefore, the segmentation of retinal blood vessels in retinal images is of great significance [5]. However, retinal images are easily affected by varying illumination, so the same retinal vessel can appear with different qualities; the contrast between capillaries and the background is small and easily confused; and doctors with different levels of experience often segment the same retinal vessel differently, which takes effort and time. These factors make retinal blood vessel segmentation difficult, so machine learning-based retinal vessel segmentation methods are urgently needed [6,7].
Machine learning-based retinal vessel segmentation methods can be divided into deep learning-based methods and non-deep learning-based methods. Among the non-deep learning methods, a Gaussian filter was first proposed to segment blood vessel images, exploiting vessel features to overcome segmentation difficulties such as low local vessel contrast [8]. A bar-selective Combination of Shifted Filter Responses (B-COSFIRE) filter [9] with selective response was proposed to automatically segment blood vessel trees, and this filter achieved good results. Building on these, a segmentation method combining multi-scale matched filtering and double thresholding was presented, which could reduce the influence of background noise [10]. Beyond filter-based methods, a segmentation method with an iterative adaptive threshold was proposed, which iteratively discovered new vascular pixels through a global threshold on the enhanced image and an adaptive threshold on the residual image [11]. In [12], the authors presented a K-Nearest Neighbor (KNN)-based approach that used sequential forward feature selection to distinguish vascular from non-vascular pixels. In [13], a retinal vessel segmentation method based on morphological component analysis (MCA) was proposed to solve the problem of false-positive vessels. However, non-deep learning-based retinal vessel segmentation methods often rely on empirical feature extraction, which leads to poor segmentation results. For this reason, deep learning-based methods were proposed for segmentation.
In order to enhance segmentation performance, more and more deep learning-based methods have been proposed in the field of retinal vessel segmentation. In [14], a deep neural network was used to segment retinal blood vessels, with pre-processing by global contrast normalization, zero-phase whitening, geometric transformations and Gamma correction; the method showed good resistance to the central vascular reflex phenomenon and segmented blood vessels well. Guo et al. [15] proposed a bottom-top short connections deeply supervised network (BTS-DSN), which transmits semantic and structural information of retinal images through short connections in the network. In [16], the authors proposed a retinal blood vessel segmentation method combining Convolutional Neural Network (CNN) quantization and pruning: quantization was applied to the fully connected layers and a pruning strategy to the convolutional layers, forming an efficient network and improving segmentation accuracy. Wu et al. [17] employed SCS-Net (Scale and Context Sensitive Network), which designed an adaptive feature fusion (AFF) module to guide efficient fusion between adjacent hierarchical features and capture more semantic information. The methods introduced above are based on traditional convolutional neural networks. The current mainstream is a symmetrical Fully Convolutional Network (FCN) for retinal vessel segmentation called U-Net [18], which has achieved good results in this field. On the basis of U-Net, Wu et al. [19] proposed a new lightweight retinal vessel segmentation model called Vessel-Net, which adds an efficient inception residual convolution block to the U-Net structure; the model combines the advantages of U-Net and the residual module to improve feature representation, greatly improving segmentation performance. Mou et al. [20] introduced a new curvilinear structure segmentation network (CS2-Net), which includes a self-attention mechanism in the encoder and decoder; the two attention modules are used to enhance inter-class discrimination and intra-class responsiveness, and experiments on several datasets verified the effectiveness of the method. Kwon et al. [21] proposed an advanced adversarial training method to defend against the interference of unknown adversarial examples; the authors used the Fast Gradient Sign Method (FGSM) to generate adversarial examples within a fixed range of values for training, and experiments showed that a U-Net model trained on different adversarial examples was more robust to unknown adversarial examples. Qilong Fu et al. [22] proposed a Multi-Scale Convolutional Neural Network with Attention Mechanisms (MSCNN-AM). The model introduces separable convolutions with varying dilation rates, which can better capture global and multi-scale vessel information; meanwhile, to reduce false-positive predictions for tiny vessel pixels, attention mechanisms are adopted so that MSCNN-AM pays more attention to retinal vessel pixels rather than background pixels, and experiments showed that the model achieved good results. In [23], the authors proposed a multiscale channel attention network based on an encoder-decoder architecture for fundus retinal vessel segmentation; the redesigned encoder extracts the multi-scale structural information of retinal vessels by fusing multi-scale features, which weakens the influence of complex vessel morphology on segmentation performance. Results on the DRIVE and CHASE_DB1 datasets showed that the network segments the capillaries in fundus retinal images well. So far, networks evolved from U-Net have dominated the field of retinal blood vessel segmentation, but they also have certain limitations. Specifically, U-Net divides the fundus retinal image into image blocks and then inputs them into the network; it does not carefully extract the features of each pixel, so it loses detailed features of the retinal vessel image.
In order to solve this problem, this paper proposes a multiple multi-scale neural network knowledge transfer and integration method for accurate pixel-level retinal blood vessel segmentation. The method has the following characteristics: (1) it fuses retinal features of different sizes and depths via multiple multi-scale neural networks to obtain better segmentation results; (2) it uses knowledge transfer to avoid excessively long network training; (3) it uses network integration to obtain better segmentation results. Combining these three characteristics, the segmentation effect can be improved effectively.
The remainder of the paper is organized as follows: Section 2 introduces the details of the proposed method; experimental results are reported in Section 3; discussions and conclusions are presented in Section 4 and Section 5, respectively.
2. The Proposed Method
The proposed method includes data pre-processing, multiple multi-scale network knowledge transfer and integration, neural network training, and network testing.
2.1. Data Pre-Processing
For simplicity, we assume that the retinal blood vessel dataset is $D$; the retinal blood vessel images in the dataset are denoted as $I_1, I_2, \ldots, I_N$, and the corresponding label images are denoted as $G_1, G_2, \ldots, G_N$. Taking $I_n$ as an example, $H$ and $W$ are the length and width of $I_n$, respectively. Firstly, the image $I_n$ is padded, and the padded image is represented by $I'_n$; the length and width of $I'_n$ are denoted by $H'$ and $W'$, respectively. With extracted image blocks of size $k \times k$ ($k$ odd), the size of the padded retinal image $I'_n$ is given by Equation (1):

$H' = H + k - 1, \quad W' = W + k - 1.$     (1)
Secondly, starting from the first pixel of the retinal blood vessel image, the entire n-th image is traversed to obtain image blocks of size $k \times k$; the set of blocks extracted from the n-th image is denoted as $B_n = \{b_n^1, b_n^2, \ldots, b_n^{H \times W}\}$, where each block represents the attributes of its center pixel.
Thirdly, the label of each image block is obtained from the corresponding label image via the same traversal; all of the labels for the n-th label image are denoted as $Y_n = \{y_n^1, y_n^2, \ldots, y_n^{H \times W}\}$. An image block and its corresponding label compose a sample pair, denoted as $(b_n^i, y_n^i)$.
Taking the public fundus retinal image dataset DRIVE as an example, the dataset contains 40 fundus retinal images, divided into 20 training images and 20 test images, each with a corresponding manually annotated label image.
The whole set of sample pairs is extracted by the above three steps and is denoted as $S = \{(b_n^i, y_n^i) \mid n = 1, \ldots, N;\ i = 1, \ldots, H \times W\}$, where $N$ denotes the number of retinal vessel images.
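To make the three pre-processing steps concrete, the following is a minimal Python/NumPy sketch of per-pixel patch extraction. The function name, the zero-padding mode, and the requirement that $k$ be odd are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def extract_patches(image, label, k):
    """Extract one k x k patch per pixel, paired with that pixel's label.

    Assumes k is odd so that every patch has a well-defined center pixel.
    image: (H, W, C) retinal image; label: (H, W) binary vessel mask.
    """
    pad = (k - 1) // 2
    # Pad so the patch around a border pixel stays inside the image;
    # the padded size is (H + k - 1) x (W + k - 1), matching Equation (1).
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
    patches, labels = [], []
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patches.append(padded[i:i + k, j:j + k])
            labels.append(label[i, j])
    return np.asarray(patches), np.asarray(labels)
```

Applied to the n-th image, the returned arrays correspond to $B_n$ and $Y_n$ above, one sample pair per pixel.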
2.2. Multiple and Multi-Scale Networks Knowledge Transfer and Integration
2.2.1. Multiple Networks Integration
VGG-16 is a classic network model for ImageNet classification and is often applied in transfer learning tasks [24,25]. The VGG-16 network is composed of 13 convolutional layers, five pooling layers and three fully connected layers. In this paper, we use the first 2, 4 and 7 convolutional layers of VGG-16 to establish three different models as the multiple networks, called VGG-16-2, VGG-16-4 and VGG-16-7, respectively.
The detailed structures of VGG-16-2, VGG-16-4 and VGG-16-7 are shown in Table 1.
As shown in Table 1, conv3-64 indicates that the size of the convolution kernel is 3 × 3 and the number of channels is 64; FC-1024 indicates a fully connected layer with 1024 hidden nodes. The schematic diagrams of the three network structures are shown in Figure 1, Figure 2 and Figure 3.
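As an illustration of how the three sub-networks can be derived from one backbone, the sketch below truncates a Keras VGG-16 at its 2nd, 4th and 7th convolutional layers and attaches an FC-1024 head as described for Table 1. The Keras layer names, the 32 × 32 input size, and the sigmoid output are assumptions made for this sketch; the paper's exact head and patch size may differ.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

# VGG-16's convolutional layers in order; the first 2, 4 and 7 yield
# VGG-16-2, VGG-16-4 and VGG-16-7, respectively.
CUT_LAYERS = {'VGG-16-2': 'block1_conv2',
              'VGG-16-4': 'block2_conv2',
              'VGG-16-7': 'block3_conv3'}

def build_subnet(name, input_shape=(32, 32, 3)):
    # Load ImageNet-pre-trained convolutional weights (no FC head).
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=input_shape)
    x = base.get_layer(CUT_LAYERS[name]).output    # truncate the backbone
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation='relu')(x)   # FC-1024 as in Table 1
    out = layers.Dense(1, activation='sigmoid')(x) # vessel vs. background
    return Model(base.input, out, name=name)
```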
The integrated model includes the three different networks VGG-16-2, VGG-16-4 and VGG-16-7, as shown in Figure 4. The final classification result is determined by the multiple-network integration, as shown in Equation (2):

$C = \begin{cases} 1, & \text{if } \sum_{i=1}^{M} c_i \geq \lceil M/2 \rceil \\ 0, & \text{otherwise} \end{cases}$     (2)

In Equation (2), $C$ represents the final classification result, $c_i$ represents the classification result of the $i$-th network classifier, and $M$ represents the number of network classifiers. When the sum of the classification results of the multiple network classifiers is greater than or equal to $\lceil M/2 \rceil$, the final classification result is 1, which represents a blood vessel pixel; otherwise, the pixel is a background pixel. The schematic diagram of the multi-network integration is shown in Figure 4.
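A minimal NumPy sketch of the voting rule in Equation (2), assuming a simple majority threshold; the function and variable names are illustrative.

```python
import numpy as np

def integrate(results):
    """Majority-vote integration of M binary classifiers (Equation (2)).

    results: array of shape (M, n_pixels) with entries in {0, 1}.
    Returns 1 (vessel) where at least ceil(M/2) classifiers vote 1.
    """
    M = results.shape[0]
    votes = results.sum(axis=0)
    return (votes >= np.ceil(M / 2)).astype(np.uint8)
```

With the three networks above ($M = 3$), a pixel is labeled as vessel when at least two of VGG-16-2, VGG-16-4 and VGG-16-7 classify it as vessel.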
2.2.2. Multiple and Multi-Scale Networks Integration
In order to extract features of different sizes, the input blocks can have different sizes. If each pixel is represented by image blocks of different sizes, the same node of the same layer in the same neural network can produce different features for that pixel. For this reason, we use groups of image blocks with different sizes as the network input to obtain different features. The structure of the multiple and multi-scale network integration model is shown in Figure 5.
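One way to realize the multi-scale input, reusing the extract_patches sketch from Section 2.1, is to build one group of patches per scale for the same center pixels. The scale values below are placeholders, as the paper's block sizes are determined by its experiments.

```python
def extract_multiscale_patches(image, label, scales=(33, 49, 65)):
    """Return one group of per-pixel patches for each block size.

    Every group indexes the same center pixels in the same order, so
    the k x k group feeds the k x k input branch of the integrated model.
    """
    groups = {}
    for k in scales:
        patches, labels = extract_patches(image, label, k)
        groups[k] = patches
    return groups, labels  # labels are identical across scales
```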
2.2.3. Knowledge Transfer
A transfer learning algorithm is usually used to reduce network training. The VGG-16 network was designed for the ImageNet classification dataset, on which it achieves effective results. By taking a VGG-16 network pre-trained on ImageNet and fine-tuning it on retinal blood vessel images, an effective network model can be obtained after only a short training period.
In order to reduce the training time, transfer learning is adopted: the parameters of the convolutional layers of VGG-16 pre-trained on the ImageNet dataset are reused, and the retinal blood vessel images are used to fine-tune the CNN models for 10 iterations before testing.
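A hedged sketch of this fine-tuning step, building on the build_subnet sketch above (whose ImageNet weights are loaded at construction time). Reading the paper's "10 iterations" as 10 epochs, along with the optimizer and batch size, are assumptions of this sketch.

```python
def finetune(model, patches, labels):
    """Fine-tune a pre-trained sub-network on retinal vessel patches."""
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    # The convolutional layers start from ImageNet weights, so a short
    # run suffices ("10 iterations" read here as 10 epochs).
    model.fit(patches, labels, batch_size=128, epochs=10)
    return model
```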