Figure 1.
(a) Framework of the Mask R-CNN method used for retinal and choroid layer segmentation of OCT images. (b) Faster R-CNN workflow: the first stage receives the image and extracts features at different scales, then generates anchors for object detection at each feature map, keeps the anchors with the most retinal layer information (non-background), and classifies the object inside each selected region of interest.
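To make the anchor-generation step in (b) concrete, here is a minimal sketch of how anchor shapes can be derived from a base size, a set of scales, and a set of width/height ratios. This is an illustrative example, not the paper's implementation; the function name and parameter values are hypothetical.

```python
import numpy as np

def make_anchors(base_size, ratios, scales):
    """Generate (w, h) anchor shapes for one feature-map location.

    With ratio = width / height and a fixed area (base_size * scale)**2:
        h = sqrt(area / ratio), w = h * ratio
    """
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:
            h = np.sqrt(area / ratio)
            w = h * ratio
            anchors.append((w, h))
    return np.array(anchors)

# Wide, flat anchors suit thin retinal layers (large width/height ratios).
print(make_anchors(base_size=32, ratios=[1, 4, 16], scales=[1, 2]))
```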
Figure 2.
Example of an OCT image from the retinal dataset (a,b) and choroidal boundaries dataset (c,d). The left plots show the boundaries of interest, while the right plots show their corresponding class maps, which are extracted by the network.
Figure 3.
Retinal and choroid layer ratio (width/height) histograms for the dataset. The ratio is calculated from a frame (i.e., bounding box) that covers the maximum horizontal (width) and vertical (height) extent of each layer in the raw OCT images, without any pre-processing or flattening of the retinal area. The plots show the distribution of the ratios for each layer of the dataset and informed the selection of the anchor ratios hyper-parameter.
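As a sketch of how such ratios can be computed, the following hypothetical helper derives the width/height ratio from the tightest bounding box of a binary layer mask; the mask shape and values are illustrative assumptions, not taken from the dataset.

```python
import numpy as np

def layer_ratio(mask):
    """Width/height ratio of the tightest bounding box around a binary layer mask."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1   # maximum vertical extent in pixels
    width = xs.max() - xs.min() + 1    # maximum horizontal extent in pixels
    return width / height

# Example: a thin, wide synthetic "layer" 10 px tall spanning 500 px.
mask = np.zeros((496, 512), dtype=bool)
mask[200:210, 6:506] = True
print(layer_ratio(mask))  # 50.0 -> motivates wide custom anchor ratios
```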
Figure 4.
Example of the five crops obtained with “specific_crop” over an annotated OCT image used for training. The main crops (A–C) are joined with the support of the auxiliary crops (D,E) to produce a full image for performance evaluation.
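The exact crop geometry of “specific_crop” is not detailed in this section; the sketch below only illustrates the general idea of main tiles plus auxiliary tiles centred on the seams, later stitched back into a full image. All sizes and function names are hypothetical.

```python
import numpy as np

def specific_crop_sketch(image, n_main=3):
    """Hypothetical illustration: split an image into n_main horizontal tiles
    (A-C) plus auxiliary tiles (D, E) centred on the seams between them."""
    h, w = image.shape[:2]
    step = w // n_main
    main = [image[:, i * step:(i + 1) * step] for i in range(n_main)]
    aux = [image[:, i * step - step // 2: i * step + step // 2]
           for i in range(1, n_main)]
    return main, aux

def stitch(main):
    """Rejoin the main crops; predictions over the aux crops would then
    overwrite the seam regions, where per-crop predictions are least reliable."""
    return np.concatenate(main, axis=1)

image = np.random.rand(496, 512)
main, aux = specific_crop_sketch(image)
assert stitch(main).shape[1] == (512 // 3) * 3
```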
Figure 5.
Graphical example of two different OCT images (Raw Image), showing their retinal layer ground truth (G.T.) and the segmentation output probability maps obtained with the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods on each image. Each row shows the corresponding segmentation map for the OCT image shown in columns (a,b).
Figure 6.
Graphical example of two different OCT images (Raw Image), showing their retinal layer ground truth boundary annotations (G.T.) and the boundaries extracted from the probability maps obtained with the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods on each image. Each row shows the extracted boundaries for the OCT image shown in columns (a,b).
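A common way to extract a boundary trace from a probability map is to take, per image column, the first row whose probability exceeds a threshold. The sketch below shows this generic approach; it is not necessarily the post-processing used here, and the threshold value is an assumption.

```python
import numpy as np

def top_boundary(prob_map, threshold=0.5):
    """Per-column row index of the first pixel where the layer probability
    exceeds the threshold; columns with no hit are returned as NaN."""
    binary = prob_map >= threshold
    hit = binary.any(axis=0)
    # argmax on a boolean column returns the first True row.
    rows = binary.argmax(axis=0).astype(float)
    rows[~hit] = np.nan
    return rows

prob = np.zeros((8, 5))
prob[3:, :] = 0.9           # layer starts at row 3 in every column
print(top_boundary(prob))   # [3. 3. 3. 3. 3.]
```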
Figure 7.
Graphical example of two different OCT images (Raw Image), showing their full retina and choroid tissue ground truth (G.T.) and the segmentation output probability maps obtained with the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods on each image. Each row shows the corresponding segmentation map for the OCT image shown in columns (a,b).
Table 1.
Mean (standard deviation) Dice coefficient (%) for each individual retinal layer and overall, for different model initializations. Each value represents the mean of four independent runs.
| Init. | ILM-NFL | NFL-IPL | IPL-OPL | OPL-ELM | ELM-ISOS | ISOS-RPE | Overall |
|---|---|---|---|---|---|---|---|
| Scratch/All | 94.85 (0.08) | 91.04 (0.67) | 87.78 (0.21) | 88.27 (0.37) | 87.50 (0.59) | 88.90 (1.03) | 92.03 (0.10) |
| COCO/All | 95.82 (0.46) | 92.09 (0.67) | 90.56 (1.24) | 90.61 (0.87) | 90.37 (0.98) | 91.89 (0.52) | 93.70 (0.59) |
| COCO/3+ | 96.11 (0.04) | 92.79 (0.34) | 91.28 (0.24) | 91.17 (0.41) | 90.91 (0.26) | 92.15 (0.20) | 94.13 (0.09) |
| COCO/4+ | 95.46 (0.38) | 91.93 (0.24) | 89.72 (0.56) | 90.17 (0.51) | 89.21 (0.70) | 90.16 (0.96) | 93.14 (0.42) |
| ImageNet/All | 95.73 (0.15) | 92.02 (0.17) | 89.36 (0.52) | 89.58 (0.20) | 89.96 (0.39) | 91.43 (0.17) | 93.32 (0.14) |
| ImageNet/3+ | 95.52 (0.28) | 91.01 (1.27) | 88.57 (1.70) | 89.22 (0.81) | 89.98 (0.74) | 90.73 (1.39) | 93.01 (0.68) |
| ImageNet/4+ | 95.39 (0.57) | 91.01 (0.99) | 88.58 (1.81) | 88.57 (2.51) | 88.89 (1.62) | 90.11 (1.81) | 92.68 (1.02) |
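For reference, the Dice coefficient reported in Tables 1–3 and 5 follows the standard definition for binary masks, as in this minimal sketch (the helper name and example arrays are illustrative):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice = 2|P ∩ T| / (|P| + |T|), reported here as a percentage."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 100.0 * 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice(a, b), 2))  # 66.67
```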
Table 2.
Mean (standard deviation) Dice coefficient (%) comparison between the default and custom bounding box ratios and sizes, for each individual retinal layer and overall. Each value represents the mean of four independent runs.
| B.Box | ILM-NFL | NFL-IPL | IPL-OPL | OPL-ELM | ELM-ISOS | ISOS-RPE | Overall |
|---|---|---|---|---|---|---|---|
| Default | 94.89 (0.22) | 89.73 (0.63) | 90.28 (0.54) | 83.24 (2.51) | 60.76 (8.28) | 54.83 (7.36) | 86.71 (1.55) |
| Custom | 96.11 (0.04) | 92.79 (0.34) | 91.28 (0.24) | 91.17 (0.41) | 90.91 (0.26) | 92.15 (0.20) | 94.13 (0.09) |
Table 3.
Mean (standard deviation) Dice coefficient (%) of the Mask R-CNN (COCO/3+ initialization and custom ratios) compared with the benchmark U-Net method and the pretrained FCN and DeeplabV3 methods, for each individual retinal layer and overall. Each value represents the mean of four independent runs.
| Method | ILM-NFL | NFL-IPL | IPL-OPL | OPL-ELM | ELM-ISOS | ISOS-RPE | Overall |
|---|---|---|---|---|---|---|---|
| U-Net | 94.32 (0.32) | 96.74 (0.21) | 93.44 (0.60) | 97.84 (0.32) | 94.78 (0.74) | 98.06 (0.12) | 96.58 (0.30) |
| Mask R-CNN | 96.11 (0.04) | 92.79 (0.34) | 91.28 (0.24) | 91.17 (0.41) | 90.91 (0.26) | 92.15 (0.20) | 94.13 (0.09) |
| FCN | 90.97 (0.12) | 93.75 (0.11) | 87.75 (0.04) | 95.97 (0.02) | 85.59 (0.06) | 95.43 (0.01) | 93.25 (0.04) |
| DeeplabV3 | 90.62 (0.14) | 93.82 (0.09) | 87.23 (0.19) | 95.99 (0.05) | 82.78 (0.18) | 95.38 (0.05) | 92.98 (0.01) |
Table 4.
Mean signed and absolute boundary errors (in pixels) for the retinal layer boundaries obtained after the corresponding post-processing steps for the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods. The average inference time per image (in seconds) is also presented. Each value represents the mean of four independent runs.
| Method | ILM | NFL | IPL | OPL | ELM | ISOS | RPE | Average | Time/Img. (s) |
|---|---|---|---|---|---|---|---|---|---|
| Mean signed error |  |  |  |  |  |  |  |  |  |
| U-Net | −0.494 (0.067) | −0.408 (0.080) | −0.491 (0.050) | −0.623 (0.175) | −0.426 (0.165) | −0.500 (0.020) | −0.489 (0.117) | −0.490 (0.019) | 8.21 (0.597) |
| Mask R-CNN | −0.136 (0.064) | −0.018 (0.127) | 0.042 (0.043) | 0.054 (0.086) | −0.031 (0.038) | 0.042 (0.049) | −0.559 (0.154) | −0.087 (0.080) | 3.25 (0.008) |
| FCN | 0.492 (0.013) | 0.368 (0.008) | 0.500 (0.005) | −0.640 (0.044) | 0.438 (0.019) | 0.515 (0.025) | 0.497 (0.016) | 0.310 (0.017) | 2.87 (0.048) |
| DeeplabV3 | 0.092 (0.022) | 0.439 (0.020) | 0.355 (0.040) | 0.390 (0.035) | 0.247 (0.025) | 0.521 (0.027) | 0.561 (0.020) | 0.372 (0.025) | 4.83 (0.353) |
| Mean absolute error |  |  |  |  |  |  |  |  |  |
| U-Net | 0.631 (0.040) | 0.760 (0.006) | 0.694 (0.032) | 0.822 (0.127) | 0.556 (0.059) | 0.559 (0.015) | 0.624 (0.084) | 0.664 (0.029) | 8.21 (0.597) |
| Mask R-CNN | 0.969 (0.004) | 0.962 (0.016) | 0.817 (0.006) | 0.814 (0.013) | 0.548 (0.006) | 0.513 (0.004) | 0.862 (0.068) | 0.784 (0.017) | 3.25 (0.008) |
| FCN | 0.839 (0.011) | 0.860 (0.007) | 0.904 (0.003) | 0.971 (0.032) | 0.616 (0.009) | 0.719 (0.002) | 0.744 (0.019) | 0.808 (0.010) | 2.87 (0.048) |
| DeeplabV3 | 1.082 (0.048) | 1.060 (0.014) | 0.873 (0.014) | 0.797 (0.010) | 0.634 (0.007) | 0.680 (0.012) | 0.732 (0.009) | 0.837 (0.015) | 4.83 (0.353) |
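The signed and absolute boundary errors in Tables 4, 6, and 9 can be illustrated with a short sketch: given per-column predicted and ground-truth boundary rows, the signed error keeps direction while the absolute error measures magnitude. The convention (pred − gt) and the NaN handling below are assumptions for illustration:

```python
import numpy as np

def boundary_errors(pred_rows, gt_rows):
    """Mean signed and mean absolute boundary error in pixels.

    With diff = pred - gt and row indices growing downward, a negative
    signed error means the predicted boundary sits above the ground truth.
    """
    diff = np.asarray(pred_rows, float) - np.asarray(gt_rows, float)
    diff = diff[~np.isnan(diff)]  # skip columns without a valid trace
    return diff.mean(), np.abs(diff).mean()

pred = [100.2, 101.0, 99.5, np.nan]
gt = [100.0, 100.0, 100.0, 100.0]
print(boundary_errors(pred, gt))  # (0.233..., 0.566...)
```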
Table 5.
Mean (standard deviation) Dice coefficient (%) of the Mask R-CNN (COCO/3+ initialization and custom ratios) compared with the benchmark U-Net method and the pretrained FCN and DeeplabV3 methods for segmentation of the full retina (ILM-RPE) and choroid (RPE-CSI) regions. Each value represents the mean of four independent runs.
| Method | ILM-RPE | RPE-CSI | Overall |
|---|---|---|---|
| U-Net | 99.50 (0.01) | 97.94 (0.07) | 98.79 (0.03) |
| Mask R-CNN | 97.21 (0.19) | 96.66 (0.23) | 96.99 (0.07) |
| FCN | 98.86 (0.01) | 97.72 (0.05) | 98.35 (0.02) |
| DeeplabV3 | 98.87 (0.03) | 97.79 (0.02) | 98.46 (0.01) |
Table 6.
Mean signed and absolute boundary errors (in pixels) for the retinal and choroidal layer boundaries obtained after the corresponding post-processing steps for the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods. The average inference time per image (in seconds) is also presented. Each value represents the mean of four independent runs.
| Method | ILM | RPE | CSI | Average | Time/Img. (s) |
|---|---|---|---|---|---|
| Mean signed error |  |  |  |  |  |
| U-Net | −0.456 (0.040) | −0.447 (0.073) | −0.861 (0.432) | −0.588 (0.163) | 8.82 (0.622) |
| Mask R-CNN | −0.063 (0.133) | 0.125 (0.119) | −0.219 (0.452) | −0.052 (0.183) | 1.91 (0.010) |
| FCN | 0.526 (0.036) | 0.461 (0.276) | −0.381 (0.122) | 0.202 (0.144) | 1.83 (0.024) |
| DeeplabV3 | 0.527 (0.034) | 0.376 (0.053) | 0.362 (0.095) | 0.422 (0.039) | 2.89 (0.017) |
| Mean absolute error |  |  |  |  |  |
| U-Net | 0.601 (0.015) | 0.596 (0.041) | 2.539 (0.097) | 1.245 (0.043) | 8.82 (0.622) |
| Mask R-CNN | 0.960 (0.017) | 0.941 (0.017) | 2.284 (0.081) | 1.395 (0.037) | 1.91 (0.010) |
| FCN | 0.744 (0.015) | 0.870 (0.050) | 2.193 (0.069) | 1.269 (0.044) | 1.83 (0.024) |
| DeeplabV3 | 0.736 (0.016) | 0.763 (0.020) | 2.079 (0.023) | 1.192 (0.016) | 2.89 (0.017) |
Table 7.
Mean (standard deviation) pixel accuracy (Acc.), precision (Pr.), recall (Rc.), and specificity (Sp.) overall performance of the Mask R-CNN, U-Net, FCN, and DeeplabV3 methods for the six retinal regions (ILM-NFL, NFL-IPL, IPL-OPL, OPL-ELM, ELM-ISOS, ISOS-RPE) and for retina/choroid segmentation. Each value represents the mean of four independent runs.
| Metric | Mask R-CNN | U-Net | FCN | DeeplabV3 |
|---|---|---|---|---|
| Retinal regions |  |  |  |  |
| Acc. | 99.64 (0.01) | 99.74 (0.01) | 99.67 (0.01) | 99.43 (0.01) |
| Pr. | 94.22 (0.14) | 95.60 (0.03) | 93.14 (0.02) | 98.23 (0.02) |
| Rc. | 90.76 (0.13) | 94.78 (0.05) | 93.37 (0.04) | 98.28 (0.01) |
| Sp. | 99.87 (0.01) | 99.88 (0.01) | 99.83 (0.01) | 99.66 (0.04) |
| Retina/Choroid |  |  |  |  |
| Acc. | 99.25 (0.04) | 99.55 (0.01) | 99.51 (0.01) | 99.43 (0.01) |
| Pr. | 99.08 (0.08) | 98.51 (0.24) | 98.27 (0.07) | 99.34 (0.02) |
| Rc. | 95.84 (0.29) | 98.38 (0.20) | 98.24 (0.03) | 99.33 (0.01) |
| Sp. | 99.85 (0.01) | 99.75 (0.04) | 99.70 (0.01) | 99.51 (0.01) |
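The four pixel-wise metrics of Table 7 follow the usual confusion-matrix definitions; a minimal binary sketch (helper name and example arrays are illustrative):

```python
import numpy as np

def pixel_metrics(pred, target):
    """Binary pixel-wise accuracy, precision, recall, and specificity (%)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    tn = np.logical_and(~pred, ~target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    acc = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp)
    rc = tp / (tp + fn)
    sp = tn / (tn + fp)
    return tuple(100.0 * m for m in (acc, pr, rc, sp))

pred = np.array([1, 1, 0, 0, 1, 0])
target = np.array([1, 0, 0, 0, 1, 1])
print(pixel_metrics(pred, target))  # (66.67, 66.67, 66.67, 66.67)
```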
Table 8.
Cross-dataset analysis of the full retinal tissue segmentation results, presenting the mean (standard deviation) Dice coefficient (%). Three training/testing combinations were included for each model: (i) training and testing only on the original dataset (O.D./O.D.), (ii) training with the AMD dataset images added (O.D. + AMD) and testing only on O.D., and (iii) training with the AMD dataset images added (O.D. + AMD) and testing only on AMD. O.D. denotes the original healthy dataset imaged with the Spectralis device, and the AMD dataset denotes the pathology dataset imaged with the Bioptigen device.
| Trained/Tested Data | Mask R-CNN | U-Net | FCN | DeeplabV3 |
|---|---|---|---|---|
| O.D./O.D. | 97.21 (0.19) | 99.50 (0.01) | 98.86 (0.01) | 98.87 (0.03) |
| O.D. + AMD/O.D. | 97.41 (0.26) | 99.35 (0.01) | 99.61 (0.01) | 99.60 (0.01) |
| O.D. + AMD/AMD | 96.40 (0.40) | 99.55 (0.02) | 99.83 (0.01) | 99.83 (0.01) |
Table 9.
Cross-dataset analysis of the full retinal tissue segmentation results, presenting the mean (standard deviation) signed and absolute boundary errors (in pixels). Three training/testing combinations were included for each model: (i) training and testing only on the original dataset (O.D./O.D.), (ii) training with the AMD dataset images added (O.D. + AMD) and testing only on O.D., and (iii) training with the AMD dataset images added (O.D. + AMD) and testing only on AMD. O.D. denotes the original healthy dataset imaged with the Spectralis device, and the AMD dataset denotes the pathology dataset imaged with the Bioptigen device.
| Trained/Tested | Mask R-CNN (ILM) | Mask R-CNN (RPE) | U-Net (ILM) | U-Net (RPE) | FCN (ILM) | FCN (RPE) | DeeplabV3 (ILM) | DeeplabV3 (RPE) |
|---|---|---|---|---|---|---|---|---|
| Mean signed error |  |  |  |  |  |  |  |  |
| O.D./O.D. | −0.063 (0.133) | 0.125 (0.119) | −0.456 (0.040) | −0.447 (0.073) | 0.526 (0.036) | 0.461 (0.276) | 0.527 (0.034) | 0.376 (0.053) |
| O.D. + AMD/O.D. | 0.451 (0.270) | −0.169 (0.010) | 0.312 (0.007) | 0.121 (0.102) | 0.265 (0.148) | 0.199 (0.103) | 0.175 (0.084) | 0.151 (0.074) |
| O.D. + AMD/AMD | 1.299 (0.398) | −0.975 (0.810) | −0.034 (0.078) | 0.030 (0.134) | −0.093 (0.032) | −0.140 (0.044) | 0.035 (0.061) | −0.018 (0.140) |
| Mean absolute error |  |  |  |  |  |  |  |  |
| O.D./O.D. | 0.960 (0.017) | 0.941 (0.017) | 0.601 (0.015) | 0.596 (0.041) | 0.744 (0.015) | 0.870 (0.050) | 0.736 (0.016) | 0.763 (0.020) |
| O.D. + AMD/O.D. | 1.250 (0.316) | 0.995 (0.187) | 0.629 (0.002) | 0.559 (0.052) | 0.507 (0.060) | 0.497 (0.037) | 0.482 (0.020) | 0.511 (0.026) |
| O.D. + AMD/AMD | 1.887 (0.308) | 2.434 (0.159) | 0.343 (0.019) | 0.408 (0.014) | 0.387 (0.011) | 0.523 (0.024) | 0.421 (0.004) | 0.557 (0.021) |