4.4. Cornea Segmentation Results
For the first part of this experiment, we built SCED networks for segmenting the cornea area using corneal fluorescein-stained images as inputs. Five SCED networks with different configurations were tested, as shown in Table 1. For the encoder network, the input images were resized to a fixed resolution. Each convolutional layer in the encoder produced 64 feature maps and was followed by a Rectified Linear Unit (ReLU) activation function and max pooling. The kernel size of the convolutional layers varied from 7 to 11, and the depth of the CNN layers varied from 3 to 5. Dropout layers were included to prevent overfitting. The decoder network used upsampling layers to restore the spatial resolution, while its other parameters were defined the same as in the encoder network. The decoder's output layer consisted of one feature map with a sigmoid activation function.
The model depth and other parameters were referenced from the work by Juš Lozej et al. [9]. In this experiment, aiming to obtain a smooth shape of the predicted cornea area, we increased the kernel size to expand the receptive field. The differences among the models' architectures are as follows: SCED1–3 used a kernel size of 9 with model depths of 3, 4, and 5, respectively; SCED4 increased the kernel size to 11 with a model depth of 3; and SCED5 employed a different kernel size in each layer. In the SCED5 encoder network, the kernel size for depth layers 1 and 2 was configured as 11, for layers 3 and 4 as 9, and for layer 5 as 7, with the same setup maintained in the corresponding layers of the decoder network. The memory (VRAM) and training time per epoch required to train each model are also presented in Table 1.
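Table 1 reports only the high-level hyperparameters, so the following Keras sketch illustrates one plausible SCED configuration (a SCED1-like network with 64 feature maps, kernel size 9, and depth 3). The input resolution, pooling stride, upsampling factor, and dropout rate are assumptions, since they are not restated here.

```python
from tensorflow.keras import layers, models

def build_sced(input_shape=(256, 256, 3), feature_maps=64, kernel_size=9, depth=3):
    """Minimal sketch of a Simple CNN Encoder-Decoder (SCED) for cornea segmentation.

    The paper specifies 64 feature maps, kernel sizes of 7-11, and depths of 3-5;
    the input size, pooling stride, and dropout rate below are assumptions.
    """
    inputs = layers.Input(shape=input_shape)
    x = inputs

    # Encoder: Conv + ReLU + MaxPooling + Dropout, repeated `depth` times.
    for _ in range(depth):
        x = layers.Conv2D(feature_maps, kernel_size, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)   # pooling stride assumed
        x = layers.Dropout(0.2)(x)                     # dropout rate assumed

    # Decoder: UpSampling + Conv + ReLU, mirroring the encoder.
    for _ in range(depth):
        x = layers.UpSampling2D(size=(2, 2))(x)        # upsampling factor assumed
        x = layers.Conv2D(feature_maps, kernel_size, padding="same", activation="relu")(x)

    # Single-channel sigmoid output: per-pixel cornea probability.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return models.Model(inputs, outputs, name=f"SCED_k{kernel_size}_d{depth}")

model = build_sced()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```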
To benchmark cornea segmentation performance, we implemented two U-Net models. The first U-Net model followed the original design proposed by Ronneberger et al. [
32], while the second used transfer learning by combining the pretrained ResNet-50 encoder [
42] with U-Net’s decoder structure. The details of U-Net and U-Net–ResNet-50 are provided in
Table 1.
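For reference, a U-Net with a pretrained ResNet-50 encoder can be assembled roughly as follows. This is a sketch, not the authors' implementation: the skip-connection layer names follow the standard Keras ResNet-50 naming, and the decoder filter counts are assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_unet_resnet50(input_shape=(256, 256, 3)):
    """Sketch of a U-Net decoder on top of a pretrained ResNet-50 encoder."""
    backbone = ResNet50(include_top=False, weights="imagenet", input_shape=input_shape)

    # Feature maps used as skip connections (standard Keras ResNet-50 layer names).
    skip_names = ["conv1_relu", "conv2_block3_out", "conv3_block4_out", "conv4_block6_out"]
    skips = [backbone.get_layer(name).output for name in skip_names]

    x = backbone.output  # bottleneck feature map at 1/32 of the input resolution
    for skip, filters in zip(reversed(skips), [512, 256, 128, 64]):  # filter counts assumed
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    # Final upsampling back to the input resolution, then a sigmoid mask.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return models.Model(backbone.input, outputs, name="unet_resnet50")
```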
Table 1. Cornea segmentation model architectures and resources required for training.
| Model Name | Feature Maps | Kernel Size | Depth | Trainable Parameters | Memory Usage [GB] | Training Time per Epoch [s/epoch] |
|---|---|---|---|---|---|---|
| SCED1 | 64 | 9 | 3 | 2,338,561 | 8.7 | 63 |
| SCED2 | 64 | 9 | 4 | 3,332,241 | 8.8 | 67 |
| SCED3 | 64 | 9 | 5 | 3,997,761 | 8.9 | 69 |
| SCED4 | 64 | 11 | 3 | 3,493,121 | 8.8 | 74 |
| SCED5 | 64 | 11, 11, 9, 9, 7 | 4 | 4,029,889 | 8.9 | 77 |
| U-Net | 64, 128, 256, 512, 1024 | 3 | 4 | 36,605,317 | 21.0 | 96 |
| U-Net-ResNet-50 | 64, 128, 256, 512, 1024, 2048 | 1, 3, 7 * | 5 ** | 73,378,625 | 11.3 | 110 |
The training and validation accuracy and loss curves for the cornea segmentation models, illustrated in
Figure 7, provide crucial insights into the models’ learning process and performance. These graphs display the accuracy and loss metrics for both training and validation sets throughout the training epochs. The data presented in these graphs represent the first fold of a three-fold cross-validation process for all models. The plots reveal that all SCED models reach convergence within 30 to 40 epochs. Among the SCED model variants, the SCED3 model stands out by converging the fastest and achieving the highest accuracy on the validation set. Conversely, U-Net and U-Net–ResNet-50 show signs of overfitting, as evidenced by the discrepancy between training and validation performance. This can be observed in the accuracy and loss curves, where the training set accuracy continues to improve while the validation loss progressively increases from the early epochs. In this study, model selection is based on the criterion of minimal validation loss observed during the training process.
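Selecting the model with the lowest validation loss can be implemented with a standard checkpoint callback. The sketch below assumes the `model` from the earlier sketch and NumPy arrays for one fold of the three-fold split; the file name, epoch count, and batch size are placeholders rather than the authors' settings.

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Keep only the weights from the epoch with the lowest validation loss.
checkpoint = ModelCheckpoint(
    "sced_fold1_best.h5",          # placeholder file name
    monitor="val_loss",
    save_best_only=True,
    mode="min",
)

# x_train/y_train and x_val/y_val are assumed to come from one cross-validation fold.
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,                    # illustrative training length
    batch_size=8,                  # illustrative batch size
    callbacks=[checkpoint],
)
```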
As mentioned in Section 3, the raw outputs from the SCED networks are postprocessed using image processing techniques and the HCT. In the experiment, we refer to SCED networks with this postprocessing as the Simple CNN Encoder–Decoder Model with Hough Circle Transform (SCED-HT). First, we applied morphological erosion with a circular kernel of radius 4 to separate excess pixels from the main predicted cornea area. Then, the connected component technique was applied to remove noise areas smaller than 400 pixels. After obtaining the desired area, the cornea region was expanded back to its original size using morphological dilation with the same circular kernel. Subsequently, we applied the HCT to locate the entire circle under the specified parameters: a fixed accumulator resolution, a minimum distance of 40 pixels between the centers of detected circles, and a detected circle radius restricted to 120–1600 pixels.
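A minimal OpenCV sketch of this postprocessing chain is shown below. The kernel radius, minimum component size, minimum center distance, and radius range follow the values stated above; the accumulator resolution (`dp`), the binarization threshold, and the HCT edge/accumulator thresholds (`param1`, `param2`) are placeholders, since they are not specified here.

```python
import cv2
import numpy as np

def postprocess_cornea_mask(prob_map, threshold=0.5):
    """Sketch of the SCED-HT postprocessing chain described in the text."""
    mask = (prob_map > threshold).astype(np.uint8) * 255  # binarization threshold assumed

    # Circular structuring element of radius 4 (9x9 ellipse).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    eroded = cv2.erode(mask, kernel)

    # Remove connected components smaller than 400 pixels.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(eroded)
    cleaned = np.zeros_like(eroded)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= 400:
            cleaned[labels == i] = 255

    # Expand the region back to roughly its original extent.
    dilated = cv2.dilate(cleaned, kernel)

    # Hough Circle Transform over the cleaned mask.
    circles = cv2.HoughCircles(
        dilated, cv2.HOUGH_GRADIENT,
        dp=1,                    # accumulator resolution (placeholder)
        minDist=40,
        param1=100, param2=20,   # placeholder thresholds
        minRadius=120, maxRadius=1600,
    )
    return dilated, circles  # circles: array of (x, y, r) candidates, or None
```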
Figure 7. Training and validation accuracy and loss curves for cornea segmentation models.
All circles obtained from this process were evaluated using the objective function defined in Equation (1). The circle that maximized the objective function was selected to represent the cornea area. In the experiment, we determined the best value of the hyperparameter in Equation (1) by varying it between 4.6 and 5.8 and calculating the IoU of the detected cornea areas for each value. We first identified the optimal value using 120 additional ocular staining images that were not included in the SUSTech-SYSU dataset; a value of 5.2 achieved the highest IoU for cornea segmentation. This value was also verified with 712 images from the SUSTech-SYSU dataset, which provided the same result. The relation between the IoU and the hyperparameter value is shown in Figure 8. The optimal value of 5.2 was used for the entire experiment.
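Because Equation (1) is not reproduced in this section, the sketch below only illustrates the selection and tuning procedure: each HCT candidate circle is scored by a placeholder `objective_fn` standing in for Equation (1), the best-scoring circle per image is kept, and the mean IoU over the tuning images determines the optimal hyperparameter value (named `alpha` here purely for illustration; the sweep step of 0.2 is also an assumption).

```python
import numpy as np

def circle_mask(shape, x, y, r):
    """Binary mask of a filled circle, used to compute IoU against the label."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    return ((xx - x) ** 2 + (yy - y) ** 2) <= r ** 2

def iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def select_circle(candidates, seg_mask, alpha, objective_fn):
    """Pick the candidate circle that maximises the objective function (Equation (1))."""
    return max(candidates, key=lambda c: objective_fn(c, seg_mask, alpha))

def sweep_alpha(images, objective_fn, alphas=np.arange(4.6, 5.9, 0.2)):
    """Grid-search the hyperparameter over tuning images.

    `images` is a list of (segmentation mask, HCT candidate circles, ground-truth
    cornea mask) tuples; `objective_fn` is a placeholder for Equation (1).
    """
    results = {}
    for alpha in alphas:
        ious = []
        for seg_mask, candidates, gt_mask in images:
            x, y, r = select_circle(candidates, seg_mask, alpha, objective_fn)
            ious.append(iou(circle_mask(gt_mask.shape, x, y, r), gt_mask))
        results[round(float(alpha), 1)] = float(np.mean(ious))
    return results  # the value with the highest mean IoU is the optimum
```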
To evaluate the performance of the objective function for circle selection, we calculated the percentage of the selected circles based on their IoU rankings among the candidates. From the graph in Figure 9, the circle that maximized the objective function was the circle with the highest IoU among all candidates in 67.79% of cases; 84.53% of the selected circles were among the two highest-ranked candidates, and 92.27% were within the top three highest-ranked candidates.
To assess the segmentation performance, the SCED and SCED-HT models were evaluated based on accuracy, IoU, and MAPE of cornea size.
Table 2 displays the results from all models in this experiment. The values following MAPE represent the standard deviation of MAPE, indicating the variability of the error in predicted cornea area around the mean in percentage. From the results, the SCED3 model has the highest accuracy and IoU compared to other methods, with 96.97% and 90.21%, respectively. The SCED2 model has the lowest MAPE at 7.28%. Overall, SCED models perform slightly better than SCED-HT models with the same architecture. The samples of cornea segmentation results for our work are illustrated in
Figure 10.
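For clarity, the micro-averaged IoU and the MAPE of cornea size can be computed roughly as follows; this is a sketch of the usual definitions, and the authors' exact implementation may differ in detail.

```python
import numpy as np

def micro_iou(preds, gts):
    """Micro-averaged IoU: pool intersections and unions over the whole test set."""
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(preds, gts))
    return 100.0 * inter / union

def cornea_size_mape(preds, gts):
    """MAPE (and its standard deviation) of the predicted cornea area in pixels."""
    errors = [abs(p.sum() - g.sum()) / g.sum() * 100.0 for p, g in zip(preds, gts)]
    return float(np.mean(errors)), float(np.std(errors))
```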
Figure 8. Cornea segmentation IoU for different hyperparameter values in the proposed objective function (Equation (1)).
Figure 9. Percentage of circles selected by the objective function (Equation (1)) according to their IoU ranking among the top five candidate circles.
Table 2. Evaluation results for the individual cornea segmentation models.
| Method | Accuracy (%) | Micro IoU (%) | MAPE (Cornea Size) (%) |
|---|---|---|---|
| SCED1 | 96.32 | 88.29 | 8.02 ± 13.22 |
| SCED2 | 96.74 | 89.50 | 7.28 ± 11.25 |
| SCED3 | 96.97 | 90.21 | 7.45 ± 12.48 |
| SCED4 | 96.39 | 88.50 | 8.14 ± 13.07 |
| SCED5 | 96.62 | 89.18 | 7.55 ± 11.19 |
| SCED1-HT | 96.40 | 88.37 | 9.11 ± 14.11 |
| SCED2-HT | 96.36 | 88.18 | 9.06 ± 13.54 |
| SCED3-HT | 96.49 | 88.56 | 8.58 ± 13.28 |
| SCED4-HT | 96.39 | 88.31 | 9.09 ± 15.44 |
| SCED5-HT | 96.48 | 88.56 | 8.90 ± 12.67 |
Figure 10. Sample results from the cornea segmentation algorithm.
Finally, we combined all segmentation results from all SCED networks using a voting ensemble. The results from the ensemble models are shown in
Table 3. The EM-5-SCED is an ensemble model of SCED1 to SCED5, while the EM-5-SCED-HT is an ensemble model of SCED1-HT to SCED5-HT. The results from the ensemble models outperform their individual models. The EM-5-SCED has the best performance for cornea segmentation in this experiment, achieving 97.08% accuracy, 90.58% IoU in micro-average, and 6.93% MAPE for cornea size. For the EM-5-SCED-HT results, the accuracy and IoU are comparable to EM-5-SCED but cannot yield a lower MAPE value, as EM-5-SCED-HT employs the HCT to identify the cornea area based on the segmentation results from EM-5-SCED. When EM-5-SCED generates inaccurate cornea shapes but still closely approximates cornea sizes, employing the HCT to determine the cornea region can result in incorrect areas, leading to errors in cornea size.
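A voting ensemble of binary masks can be implemented as a per-pixel majority vote; the sketch below assumes simple majority voting over the five SCED predictions, which is one common reading of a voting ensemble.

```python
import numpy as np

def voting_ensemble(binary_masks):
    """Majority-vote ensemble of per-model binary segmentation masks.

    `binary_masks` is a list of H x W arrays with values {0, 1}; a pixel is
    labelled cornea when more than half of the models agree (no ties occur
    with an odd number of models such as the five SCED networks).
    """
    stacked = np.stack(binary_masks, axis=0)
    votes = stacked.sum(axis=0)
    return (votes > len(binary_masks) / 2).astype(np.uint8)

# Example: combine the five SCED predictions for one image (variable names illustrative).
# em_5_sced_mask = voting_ensemble([sced1_mask, sced2_mask, sced3_mask, sced4_mask, sced5_mask])
```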
To evaluate the efficacy of our proposed method, we conducted a comprehensive comparative analysis using both image processing and deep learning approaches for cornea segmentation. The first comparison focused on the image processing method developed by Isam Abu Qasmieh et al. [
11], who proposed techniques for cornea and corneal ulcer segmentation. We implemented the process as described in their study to extract the cornea area for direct comparison with our method. The image processing method utilizes the Otsu thresholding algorithm [
43] and morphological operators to determine the eye border pixels. Then, the upper and lower eye border pixels are fitted with Gielis curves [
44]. Finally, the cornea region is obtained from the enclosed circle formed by the upper and lower eye border curves. The second approach involved the deep learning-based SLIT-Net model developed by Jessica Loo et al. [
45]. This model, based on the Mask R-CNN architecture [
46], is designed for ocular staining image segmentation, including the limbus (cornea region). In the experiment, we utilized the SLIT-Net model originally provided by the authors, which was trained with 133 slit-lamp images from the University of Michigan Kellogg Eye Center (Ann Arbor, MI, USA) and Aravind Eye Care System (Madurai, India) without additional training on the SUSTech-SYSU dataset to generate cornea segmentation results for evaluation. Both methods were tested using the same dataset employed in our study to ensure a consistent comparison. Additionally, we included the U-Net and U-Net–ResNet-50 models in the analysis to establish a comprehensive baseline for performance comparison. The results reveal that the EM-5-SCED and EM-5-SCED-HT variants of our proposed method both achieve superior cornea segmentation performance compared to all other examined methods. This enhanced performance is evident across multiple metrics, including Accuracy, IoU, and MAPE in cornea size estimation, as presented in
Table 3.
Table 3. Comparison of the cornea segmentation performance across various methods.
| Method | Accuracy (%) | Micro IoU (%) | MAPE (Cornea Size) (%) |
|---|---|---|---|
| Image Processing Method [11] | 88.53 | 66.89 | 25.26 ± 34.34 |
| SLIT-Net [45] | 94.10 | 82.02 | 14.64 ± 20.54 |
| U-Net | 95.99 | 87.24 | 9.28 ± 16.17 |
| U-Net-ResNet-50 | 95.95 | 86.85 | 9.74 ± 16.30 |
| EM-5-SCED | 97.08 | 90.58 | 6.93 ± 11.19 |
| EM-5-SCED-HT | 96.70 | 89.23 | 8.14 ± 11.71 |
Figure 11 illustrates the cornea segmentation results of the eight methods: Image Processing Method [
11], SLIT-Net [
45], U-Net, U-Net–ResNet-50, SCED3, SCED3-HT, EM-5-SCED, and EM-5-SCED-HT. These samples demonstrate that when the cornea boundary is clearly visible in the ocular staining image, all methods accurately segment the cornea area, as evidenced in samples (a), (b), and (c). However, when the cornea boundary is unclear, as in samples (e), (f), (g), and (h), the segmentation performance varies among the models. The U-Net–ResNet-50 model effectively segments clearly defined areas, but exhibits reduced performance in regions with complex and ambiguous image features. In contrast, the high-resolution U-Net model provides excessive detail, resulting in significant oversegmentation. The SLIT-Net and SCED models yield results with comparable shapes, although SLIT-Net demonstrates lower performance. Moreover, the SLIT-Net model significantly underperforms compared to the U-Net model. This discrepancy may be due to the differences in the training dataset. In this experiment, the U-Net model was trained and validated on the SUSTech-SYSU dataset, whereas the SLIT-Net model was trained on a different dataset and subsequently validated on the SUSTech-SYSU dataset for comparison. The SCED and EM-5-SCED models demonstrate superior performance in identifying the overall cornea area, which is suitable for subsequent circle detection tasks, despite some segmentation errors. The segmentation results observed in these sample images align with the performance comparisons presented in
Table 3. Furthermore, we compared the segmentation results from the SCED model with the SCED-HT method, which converts the segmented area to a circular shape. When the segmentation results from SCED accurately represent the cornea areas or resemble a circular shape, as in samples (a), (b), and (c), the outputs of SCED and SCED-HT are similar. However, when the segmented results are approximately circular but include small excess areas, as in samples (d) and (e), SCED-HT can remove excess areas and identify more accurate cornea regions compared to SCED. In cases such as samples (f) and (g), where the segmented areas are irregularly shaped, using the segmentation results from SCED directly may yield better outcomes than applying the HCT to determine the boundary of the cornea region.
Finally, we compared the cornea segmentation results from the proposed method with other methods, as shown in
Table 4. The details of each method are as follows:
CANNY and RANSAC: In this method, the original ocular staining images were first converted into grayscale, then Canny Edge Detection [
47] was applied to identify the cornea edges. Next, RANdom SAmple Consensus (RANSAC) [
48] was used to detect circles from the edge images.
CANNY and HCT: The same procedure as in the first method, except with the HCT used for circle detection. The circle with the highest accumulator value was selected as the output.
EM-5-SCED and RADON: First, the cornea segments were determined from the ocular staining images by using the EM-5-SCED model and extracting the contour of the cornea segments. Then, Radon Transform-based circle detection was applied to detect the circles, as proposed by Okman et al. [
49].
EM-5-SCED and RANSAC: The same procedure as the third method, except with RANSAC used for circle detection.
PROPOSED without objective function: Our proposed approach utilizes the HCT to detect circles from the contour of the cornea segments. To demonstrate the effect of our proposed circle selection process, in this method the circle with the highest accumulator value is selected as the output instead of the circle maximizing the objective function in Equation (1).
PROPOSED with objective function: This method represents the full version of the proposed cornea segmentation process, incorporating the EM-5-SCED model, the HCT, and the objective function in Equation (1).
Figure 11. Comparison of cornea segmentation results. (a–c) are sample images with clearly visible cornea areas; (d–h) are sample images without clearly visible cornea areas. (Green = True Positive, Blue = False Negative, Red = False Positive).
Table 4. Comparison of various circle detection methods for the cornea segmentation process.
| Method | Accuracy (%) | Micro IoU (%) | MAPE (Cornea Size) (%) |
|---|---|---|---|
| CANNY and RANSAC | 79.76 | 57.45 | 77.19 ± 64.78 |
| CANNY and HCT | 80.03 | 58.23 | 76.09 ± 59.48 |
| EM-5-SCED and RADON | 95.13 | 84.30 | 11.62 ± 24.20 |
| EM-5-SCED and RANSAC | 96.48 | 88.75 | 9.64 ± 14.41 |
| PROPOSED (EM-5-SCED and HCT) without objective function | 96.21 | 87.77 | 9.73 ± 13.81 |
| PROPOSED (EM-5-SCED and HCT) with objective function | 96.70 | 89.23 | 8.14 ± 11.71 |
In the Canny edge detection process used in Methods 1 and 2, the grayscale image is first blurred with a Gaussian filter. The image gradients are then calculated for each pixel, and only pixels with local maxima of the gradient magnitude along the gradient direction are retained as edge candidates. Double thresholding classifies candidate pixels into strong and weak edges using high and low threshold values, respectively. The final edge image consists of the strong edges plus the weak edges connected to strong edges. In this experiment, we varied the standard deviation of the Gaussian filter across 5, 10, 15, 20, and 25. Additionally, we varied the high and low threshold values across 6, 12, 18, 24, 30, and 36, pairing each lower value with its equivalent and all higher values to create a range of threshold combinations. We selected the combination of Gaussian filter standard deviation and high and low threshold values that achieved the highest average precision in cornea edge detection. The optimal parameters for the Canny edge detection process were a Gaussian filter standard deviation of 10 with both the low and high thresholds set to 12.
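The parameter search described above can be sketched as a grid search over the listed sigmas and threshold pairs. Here `canny` from scikit-image stands in for the edge detector, and `edge_precision` is a simplified stand-in for the authors' average-precision score (their exact scoring is not reproduced here).

```python
from itertools import combinations_with_replacement
import numpy as np
from skimage.feature import canny

def edge_precision(pred_edges, gt_edges):
    """Simplified stand-in for the average-precision score against labelled cornea edges."""
    tp = np.logical_and(pred_edges, gt_edges).sum()
    return tp / pred_edges.sum() if pred_edges.sum() else 0.0

def sweep_canny_params(gray_images, gt_edge_maps):
    """Grid-search the Canny parameters over the value ranges described in the text."""
    sigmas = [5, 10, 15, 20, 25]
    thresholds = [6, 12, 18, 24, 30, 36]
    best_params, best_score = None, -1.0
    for sigma in sigmas:
        # Each low threshold is paired with itself and all higher values.
        for low, high in combinations_with_replacement(thresholds, 2):
            scores = []
            for img, gt in zip(gray_images, gt_edge_maps):
                edges = canny(img, sigma=sigma, low_threshold=low, high_threshold=high)
                scores.append(edge_precision(edges, gt))
            if np.mean(scores) > best_score:
                best_params, best_score = (sigma, low, high), float(np.mean(scores))
    return best_params, best_score
```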
For RANSAC-based circle detection, let P represent the set of pixels used for circle fitting, which includes either the edge pixels in Method 1 or the contour of the cornea segments in Method 4. The algorithm starts by randomly selecting three pixel positions from P and determining the circle C that best fits these points. The distance between every pixel in P and the circle C is then calculated, and the inlier pixels, i.e., those with distances less than a threshold, are counted and stored. In this experiment, the inlier threshold is set to 2. This process of random sampling, circle fitting, and inlier counting is repeated for a predetermined number of iterations T, which is calculated as follows [50]:

T = log(1 − p) / log(1 − w^n),

where p denotes the desired probability of success, w denotes the percentage of inliers in the dataset, and n denotes the number of points required to define the model (for circle detection, n = 3). After T iterations are completed, the circle with the highest inlier count is returned as the final result. In this experiment, we set the desired probability of success p to 0.95. For Method 1, the percentage of inliers w was 1.87%, which leads to a total of 458,117 iterations; for Method 4, where the percentage of inliers w was 13.24%, we used 1289 iterations.
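A self-contained sketch of this RANSAC circle detector, including the iteration formula above, could look as follows; the inlier threshold of 2 and n = 3 follow the text, while the three-point circle fit uses the standard circumcenter formula.

```python
import numpy as np

def fit_circle_3pts(p1, p2, p3):
    """Circle (cx, cy, r) through three points via the circumcenter equations."""
    ax, ay = p1; bx, by = p2; cx_, cy_ = p3
    d = 2 * (ax * (by - cy_) + bx * (cy_ - ay) + cx_ * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear sample, no unique circle
    ux = ((ax**2 + ay**2) * (by - cy_) + (bx**2 + by**2) * (cy_ - ay)
          + (cx_**2 + cy_**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx_ - bx) + (bx**2 + by**2) * (ax - cx_)
          + (cx_**2 + cy_**2) * (bx - ax)) / d
    return ux, uy, np.hypot(ax - ux, ay - uy)

def ransac_circle(points, p=0.95, w=0.1324, n=3, inlier_thresh=2.0, rng=None):
    """RANSAC circle detection with iteration count T = log(1 - p) / log(1 - w^n)."""
    rng = rng or np.random.default_rng()
    T = int(np.ceil(np.log(1 - p) / np.log(1 - w ** n)))
    pts = np.asarray(points, dtype=float)
    best_circle, best_inliers = None, -1
    for _ in range(T):
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        circle = fit_circle_3pts(*sample)
        if circle is None:
            continue
        cx, cy, r = circle
        dist = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
        inliers = int((dist < inlier_thresh).sum())
        if inliers > best_inliers:
            best_circle, best_inliers = circle, inliers
    return best_circle, best_inliers
```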
The results in
Table 4 reveal that the proposed method provides more accurate results in cornea detection across all metrics compared to conventional methods. Examples of the circle detection process and results for each method are shown in
Figure 12. The first two conventional methods, which apply Canny edge detection directly to the original ocular staining images, provide low performance. This is because edge detection algorithms tend to detect many edges other than the cornea, especially the eyelid, whose curve resembles a circular shape. These issues are illustrated in Figure 12, Methods 1 and 2. Consequently, the RANSAC and HCT algorithms are unable to accurately detect the cornea area. The experimental results indicate that presegmentation of the corneal area prior to circle detection significantly improves cornea detection performance compared to direct application of circle detection algorithms on the original images. Additionally, using the objective function for circle selection with the HCT algorithm in the proposed method demonstrates a marked improvement in cornea detection performance. This refined approach outperforms the conventional method of selecting the circle with the highest accumulator value, further validating the effectiveness of the proposed technique.
Figure 12. Sample results of various circle detection methods for cornea segmentation. (Green = True Positive, Blue = False Negative, Red = False Positive).
4.5. Corneal Ulcer Segmentation Results
In the second part of the experiment, we built corneal ulcer segmentation models based on the original U-Net architecture [
32]. The four model configurations used in the experiment are shown in
Table 5. The input images were resized to a fixed resolution, the kernel sizes of the convolutional layers were either 3 or 5, and ReLU was used as the activation function of the convolutional layers. The depth of the encoder and decoder networks in all models was four layers. For the encoder networks, the number of feature maps in the first layer was set to 32 or 64 and then doubled in each subsequent set of convolutional layers. A max pooling layer was applied in the encoder of every model. In the decoder networks, the number of feature maps in each convolutional layer followed the reverse sequence of the encoder networks, and upsampling was performed with a Conv2DTranspose layer in the decoder of every model. A concatenation layer was included in the decoder networks to combine the skip-connected feature maps from the encoder networks. The final layer consisted of one feature map with a sigmoid activation function.
Table 5. Corneal ulcer segmentation model architecture.
| Model | Feature Maps | Kernel Size | Depth | Trainable Parameters |
|---|---|---|---|---|
| UNET1 | 32, 64, 128, 256, 512 | 5 | 4 | 21,706,949 |
| UNET2 | 64, 128, 256, 512, 1024 | 5 | 4 | 86,811,013 |
| UNET3 | 32, 64, 128, 256, 512 | 3 | 4 | 9,154,245 |
| UNET4 | 64, 128, 256, 512, 1024 | 3 | 4 | 36,605,317 |
We compared three methods for corneal ulcer segmentation. The first method, named UNET-R-I, uses a conventional U-Net model with raw cornea mask multiplication in the inference phase. The model for segmenting the corneal ulcer area is based on the U-Net configuration in
Table 5. The model outputs are multiplied with the cornea segmentation mask from EM-5-SCED. This step is applied to filter out the predicted corneal ulcer area that falls outside the boundary of the predicted cornea area. The remaining corneal ulcer region is the result of this process. This method is used as a baseline for comparison to our proposed method.
The second method is our proposed U-Net model with HCT cornea mask multiplication in the learning phase, named UNET-H-L. This model uses the same U-Net configuration in
Table 5 as the baseline model. The key differences between our proposed method and the baseline approach are: (1) instead of applying a raw cornea mask, we apply the circular cornea mask obtained from EM-5-SCED-HT, and (2) instead of applying a mask in the inference phase, the cornea mask is applied in both the learning and inference phases.
The last method, named UNET-R-L, utilizes the same models as the proposed method in
Figure 4, but replaces the circular cornea mask input from EM-5-SCED-HT with the raw predicted cornea mask from EM-5-SCED. The raw predicted cornea mask is considered a soft restriction in the learning phase, as it contains continuous values between 0 and 1, whereas the circular cornea mask is considered a hard restriction in the learning phase due to its binary values of 0 or 1. This method aims to observe the effect of using different masks during the learning phase.
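The mask multiplication shared by the UNET-H-L and UNET-R-L methods in the learning phase can be sketched as a two-input Keras model in which the cornea mask gates the U-Net output; `build_unet` is a placeholder for the UNET1–UNET4 configurations in Table 5, and the input resolution is an assumption.

```python
from tensorflow.keras import layers, models

def build_masked_unet(build_unet, input_shape=(256, 256, 3)):
    """Sketch of applying a cornea mask in the learning phase (UNET-H-L / UNET-R-L).

    `build_unet` is a placeholder returning a U-Net that outputs a 1-channel sigmoid
    map for `input_shape`. For UNET-H-L the mask is the binary circular mask from
    EM-5-SCED-HT (hard restriction); for UNET-R-L it is the raw continuous-valued
    mask from EM-5-SCED (soft restriction).
    """
    image_in = layers.Input(shape=input_shape, name="stained_image")
    mask_in = layers.Input(shape=input_shape[:2] + (1,), name="cornea_mask")

    unet = build_unet(input_shape)            # UNET1-UNET4 configuration from Table 5
    ulcer_map = unet(image_in)                # per-pixel corneal ulcer probability

    # Restrict predictions to the cornea region before the loss is computed.
    restricted = layers.Multiply(name="masked_ulcer")([ulcer_map, mask_in])

    return models.Model([image_in, mask_in], restricted, name="unet_masked_learning")

# Training then passes [images, cornea_masks] as inputs and ulcer labels as targets;
# the same masks are reused in the inference phase.
```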
Figure 13 presents graphs showing the training and validation accuracy and loss curves for the corneal ulcer segmentation models in the experiment. These graphs illustrate the accuracy and loss values for both training and validation processes across all training epochs. The data represent the first fold of a three-fold cross-validation process for six sample models: UNET2-R-I, UNET4-R-I, UNET2-R-L, UNET4-R-L, UNET2-H-L, and UNET4-H-L. The graphs reveal that the UNET2-R-L, UNET4-R-L, UNET2-H-L, and UNET4-H-L models, all of which incorporate the cornea mask in the learning phase, converge faster than the conventional UNET2-R-I and UNET4-R-I models. This improved convergence occurs because these models focus on learning from the fluorescein-stained areas within the cornea region. Additionally, the validation set accuracy is significantly higher for these models due to the elimination of excess areas from the corneal ulcer segmentation through the use of the cornea mask. The selection criterion for the final model is the achievement of the least validation loss over the course of training.
The performance of all methods is evaluated by comparing the predicted corneal ulcer area with the labeled images in terms of Accuracy, DSC, Sen, and IoU. Additionally, the MAE is used to compare the percentage of ulcerated area on cornea between the predicted and actual data. The values following MAE represent the standard deviation of the MAE, indicating the variability of the error around the mean in the predicted percentage of ulcerated area on the cornea. The results from this experiment are shown in
Table 6. For segmentation performance, UNET2-R-I performs the best among all baseline models (UNET-R-I group), with an Accuracy of 99.32%, DSC of 89.22%, Sen of 86.91%, IoU of 80.54%, and MAE of 2.56%. The best model of the UNET-R-L group is UNET2-R-L, which segments corneal ulcers with an Accuracy of 99.27%, DSC of 88.64%, Sen of 88.53%, IoU of 79.60%, and MAE of 2.60%. In the proposed method (UNET-H-L group), UNET4-H-L provides the best performance, achieving an Accuracy of 99.35%, DSC of 89.87%, Sen of 89.17%, IoU of 81.60%, and MAE of 2.51%.
Figure 13. Training and validation accuracy and loss curves for the corneal ulcer segmentation models.
Comparing the results from the three main methods with similar U-Net architectures, UNET-H-L provides superior results in all metrics except for the MAE of UNET2-R-I and UNET2-R-L, which are slightly better than UNET2-H-L. The UNET-R-I and UNET-R-L methods demonstrate comparable performance in corneal ulcer segmentation tasks. This is because both methods use similarly shaped cornea masks to eliminate the fluorescein-stained areas outside the segmented cornea region. As a result, the outcomes do not differ significantly.
In terms of memory (VRAM), the usage for each model depends on the trainable parameters, as shown in
Table 5 and
Table 6. Models using the UNET2 configuration have the highest number of parameters and require the longest training time per epoch across all model groups. Regarding training time per epoch, the UNET-R-L and UNET-H-L groups are quite similar. Meanwhile, models in both of these groups take longer times to train per epoch compared to models in the UNET-R-I group with the same configuration. This is because an additional image (the cornea mask) needs to be loaded and processed in the model learning phase.
In the context of varying model architectures, utilizing a kernel size of 5 generally yields better results than a kernel size of 3 across all metrics within the same model architecture for all methods except for UNET4-H-L, which uses a kernel size of 3. This method achieves better Accuracy, DSC, IoU, and MAE values than UNET2-H-L, which uses a kernel size of 5, by 0.03%, 0.36%, 0.58%, and 0.14%, respectively. For feature map size, models using larger feature map sizes tend to outperform models using smaller feature map sizes in all metrics except for the Sen of UNET4-R-I, which is lower than UNET3-R-I (refer to
Table 5 and
Table 6). Moreover, utilizing a larger feature map with a kernel size of 5 results in UNET2-R-I and UNET2-R-L being the most efficient models in the UNET-R-I and UNET-R-L groups, respectively. However, the proposed UNET4-H-L method, which applies a kernel size of 3 and a large feature map size, provides better outcomes than UNET2-H-L, which applies a kernel size of 5 and the same feature map size. As a result, UNET4-H-L stands out as the best method in this phase.
Table 6. Comparison of all individual corneal ulcer segmentation models.
| Method | Accuracy (%) | Micro DSC (%) | Micro Sen (%) | Micro IoU (%) | MAE (%) | Memory Usage | Training Time per Epoch |
|---|---|---|---|---|---|---|---|
| UNET1-R-I | 99.25 | 88.21 | 86.37 | 78.91 | 2.74 ± 5.33 | 20.8 GB | 38 s/epoch |
| UNET2-R-I | 99.32 | 89.22 | 86.91 | 80.54 | 2.56 ± 5.09 | 21.1 GB | 42 s/epoch |
| UNET3-R-I | 99.22 | 87.87 | 86.63 | 78.36 | 2.82 ± 5.52 | 14.6 GB | 36 s/epoch |
| UNET4-R-I | 99.29 | 88.57 | 85.43 | 79.48 | 2.82 ± 5.80 | 21.0 GB | 40 s/epoch |
| UNET1-R-L | 99.20 | 87.54 | 86.90 | 77.85 | 2.87 ± 5.90 | 20.8 GB | 44 s/epoch |
| UNET2-R-L | 99.27 | 88.64 | 88.53 | 79.60 | 2.60 ± 5.11 | 21.1 GB | 49 s/epoch |
| UNET3-R-L | 99.21 | 87.55 | 85.73 | 77.85 | 2.80 ± 5.26 | 14.6 GB | 42 s/epoch |
| UNET4-R-L | 99.23 | 87.86 | 86.37 | 78.34 | 2.69 ± 4.98 | 21.0 GB | 47 s/epoch |
| UNET1-H-L | 99.29 | 88.87 | 88.31 | 79.96 | 2.74 ± 5.37 | 20.8 GB | 44 s/epoch |
| UNET2-H-L | 99.32 | 89.51 | 90.15 | 81.02 | 2.65 ± 5.31 | 21.1 GB | 50 s/epoch |
| UNET3-H-L | 99.26 | 88.38 | 87.57 | 79.17 | 2.77 ± 5.38 | 14.6 GB | 42 s/epoch |
| UNET4-H-L | 99.35 | 89.87 | 89.17 | 81.60 | 2.51 ± 4.68 | 21.0 GB | 49 s/epoch |
Next, we employed a voting ensemble to combine the corneal ulcer segmentation results from four models within each method and evaluate their performance. Accordingly, EM-4-UNET-R-I was an ensemble of models from UNET1-R-I to UNET4-R-I, EM-4-UNET-R-L was an ensemble of models from UNET1-R-L to UNET4-R-L, and EM-4-UNET-H-L was an ensemble of models from UNET1-H-L to UNET4-H-L. The results of the ensemble approach in
Table 7 demonstrate that the proposed method (EM-4-UNET-H-L) outperforms the conventional method (EM-4-UNET-R-I) in terms of all metrics. Accuracy increases by 0.05%, DSC increases by 1.04%, Sen increases by 3.07%, IoU increases by 1.72%, and MAE decreases by 0.19%. Moreover, using the segmented cornea area from HCT multiplication in the learning phase (EM-4-UNET-H-L) yields better corneal ulcer segmentation result than using the raw predicted cornea mask (EM-4-UNET-R-L) for all metrics. Notably, Accuracy increases by 0.07%, DSC increases by 1.14%, Sen increases by 2.22%, IoU increases by 1.88%, and MAE decreases by 0.13%. The voting ensemble technique enhances the segmentation performance of the UNET-H-L method, which can be observed by comparing the results of UNET4-H-L, with the best performance in the UNET-H-L group, to EM-4-UNET-H-L. The accuracy increases from 99.35% to 99.37%, the DSC increases from 89.87% to 90.00%, and the IoU increases from 81.60% to 81.83%. However, Sen decreases from 89.17% to 88.19%, and the MAE remains similar. Based on these results, the proposed method (EM-4-UNET-H-L) outperforms all other corneal ulcer segmentation methods in this experiment.
Table 7. Comparison of the corneal ulcer segmentation ensemble models.
| Method | Accuracy (%) | Micro DSC (%) | Micro Sen (%) | Micro IoU (%) | MAE (%) | Memory Usage | Training Time per Epoch |
|---|---|---|---|---|---|---|---|
| EM-4-UNET-R-I | 99.32 | 88.96 | 85.12 | 80.11 | 2.70 ± 5.28 | 21.1 GB | 39 s/epoch |
| EM-4-UNET-R-L | 99.30 | 88.86 | 85.97 | 79.95 | 2.64 ± 4.78 | 21.1 GB | 46 s/epoch |
| EM-4-UNET-H-L | 99.37 | 90.00 | 88.19 | 81.83 | 2.51 ± 4.63 | 21.1 GB | 46 s/epoch |
Sample results from the highest-performing instance of each method are shown in
Figure 14. This comparison illustrates that the proposed UNET4-H-L method segments the corneal ulcer region more accurately than UNET2-R-I and UNET2-R-L. This can be observed from the true positive areas in UNET4-H-L, which more comprehensively fill the corneal ulcer area in the eye images. Furthermore, the false negative and false positive areas from UNET4-H-L are slightly smaller than the outcomes from UNET2-R-I and UNET2-R-L. These effects are also evident in the ensemble model results. Upon visual examination, EM-4-UNET-H-L segments the corneal ulcer area most accurately; these observations are consistent with the experimental results in
Table 6 and
Table 7.
Figure 14. Comparison of corneal ulcer segmentation results. (Green = True Positive, Blue = False Negative, Red = False Positive).
We summarize the results from our corneal ulcer segmentation approach in comparison to conventional methods in
Table 8 and
Table 9. Each method is described in detail as follows:
U-Net: This method uses the UNET2 model, which provided the best result of all U-Net models in this experiment, to segment the corneal ulcer area from the dataset. This method does not apply the cornea mask to filter out the segmented corneal ulcer area outside the cornea area.
U-Net-ResNet-50: We developed U-Net–ResNet-50 by utilizing the encoder network from the ResNet-50 backbone combined with the decoder network from the U-Net architecture to segment the corneal ulcer area. This method also does not apply the cornea mask.
U-Net with raw cornea mask: The results of this method were derived from UNET2-R-I, which uses the corneal ulcer area from the UNET2 model and applies a raw cornea mask from EM-5-SCED during the inference phase to remove the excess corneal ulcer area outside the cornea area.
Ensemble U-Net models with cornea mask: This method uses the ensemble segmented corneal ulcer areas from UNET1-R-I to UNET4-R-I (EM-4-UNET-R-I), which applies a raw cornea mask from EM-5-SCED in the inference phase.
Proposed method: The results of the proposed method are from the ensemble segmented corneal ulcer area from UNET1-H-L to UNET4-H-L (EM-4-UNET-H-L), which applies the cornea mask from EM-5-SCED-HT in the learning phase and inference phase.
The comparison included the maximum memory usage evaluated during training and the total training time, representing the end-to-end process for training each method. For conventional methods, U-Net outperforms U-Net with the ResNet-50 backbone across all performance metrics. However, the U-Net model requires more resources in terms of training time and memory usage. Notably, the U-Net-ResNet-50 model underperforms compared to the other methods due to overfitting during model training. In comparing U-Net and U-Net with raw cornea masks, the results show that utilizing the predicted cornea mask to eliminate excess predicted corneal ulcer areas improves segmentation results across all metrics. However, this improvement comes at the cost of increased training time due to the additional step of training the cornea segmentation models. The proposed method demonstrates superior performance, outperforming all conventional methods across all metrics used in this experiment. Compared to the ensemble U-Net models with raw cornea masks, which have the same number of trainable parameters and consume the same maximum memory, the proposed method not only provides better performance but also requires less training time due to faster convergence. These results demonstrate that applying HCT cornea mask multiplication in the model learning phase can improve both performance outcomes and resource efficiency.
Table 8. Comparison of corneal ulcer segmentation performance across various methods.
| Method | Accuracy (%) | Micro DSC (%) | Micro Sen (%) | Micro IoU (%) | MAE (%) |
|---|---|---|---|---|---|
| U-Net | 99.24 | 88.15 | 87.56 | 78.88 | 2.68 ± 5.23 |
| U-Net-ResNet-50 | 98.95 | 82.68 | 77.82 | 70.48 | 4.29 ± 9.89 |
| U-Net with raw cornea mask | 99.32 | 89.22 | 86.91 | 80.54 | 2.56 ± 5.09 |
| Ensemble U-Net models with raw cornea mask | 99.32 | 88.96 | 85.12 | 80.11 | 2.70 ± 5.28 |
| Proposed method | 99.37 | 90.00 | 88.19 | 81.83 | 2.51 ± 4.63 |
Table 9. Comparison of resources and time required to create corneal ulcer segmentation models.
| Method | Trainable Parameters | Maximum Memory Usage | Total Training Time |
|---|---|---|---|
| U-Net | 86,811,013 | 21.1 GB | 230 min |
| U-Net-ResNet-50 | 73,378,625 | 11.4 GB | 70 min |
| U-Net with raw cornea mask | 104,002,586 | 21.1 GB | 446 min |
| Ensemble U-Net models with raw cornea mask | 171,469,097 | 21.1 GB | 922 min |
| Proposed method | 171,469,097 | 21.1 GB | 788 min |
In
Figure 15, the graph represents a scatter plot between the actual and predicted percentage of ulcerated area on the cornea obtained from our best model (EM-4-UNET-H-L). From 354 images analyzed, 94.65% (335 images) fall within a margin of error of ±10%, while 86.72% (307 images) fall within a margin of error of ±5%. The coefficient of determination (r-squared) is calculated as 0.94. These results satisfy our expectations as to the accuracy and reliability of the proposed method. However, the experiment still provides some erroneous results. We highlight and analyze some of the inaccurately predicted outcomes, labeled as (a), (b), and (c), in
Figure 15 and
Figure 16 to identify the causes of these inconsistencies. In these cases, significant errors in determining the percentage of the ulcerated area on the cornea arose from opposing errors in determining the cornea area and corneal ulcer area, explained as follows: in case (a), while the results from cornea segmentation are smaller than the actual cornea area, the outcome of corneal ulcer segmentation is larger than the actual corneal ulcer area; in case (b), the predicted cornea area is too large compared to the actual cornea area, but the predicted corneal ulcer area is slightly smaller than the actual corneal ulcer area; finally, in case (c), the results from cornea segmentation provide a moderately large area compared to the actual cornea area, but the outcome from corneal ulcer segmentation is considerably smaller than the actual corneal ulcer area.
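The quantities reported here (the percentage of ulcerated area, the within-margin fractions, and r-squared) can be computed roughly as follows. This is a sketch only: the r-squared is taken against the identity line between predicted and actual percentages, which is an assumption since the exact definition used is not stated.

```python
import numpy as np

def ulcerated_percentage(ulcer_mask, cornea_mask):
    """Percentage of the cornea area covered by the segmented ulcer."""
    cornea_px = cornea_mask.sum()
    ulcer_px = np.logical_and(ulcer_mask, cornea_mask).sum()
    return 100.0 * ulcer_px / cornea_px if cornea_px else 0.0

def agreement_stats(pred_pct, true_pct):
    """Fractions of images within +/-5% and +/-10% error margins, plus r-squared."""
    pred, true = np.asarray(pred_pct, float), np.asarray(true_pct, float)
    err = np.abs(pred - true)
    within_5 = float((err <= 5).mean())
    within_10 = float((err <= 10).mean())
    ss_res = np.sum((true - pred) ** 2)          # residuals against the identity line
    ss_tot = np.sum((true - true.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return within_5, within_10, r_squared
```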
Figure 15. Scatter plot showing the predicted and actual percentage of ulcerated area on the cornea from EM-4-UNET-H-L.
Figure 16. Case (a) represents a case of predicting a higher percentage of ulcerated area than the actual value displayed in Figure 15; cases (b,c) represent cases of predicting a lower percentage of ulcerated area than the actual value displayed in Figure 15. (Green = True Positive, Blue = False Negative, Red = False Positive).