Exploring the Potential of Ensembles of Deep Learning Networks for Image Segmentation
Abstract
1. Introduction
2. Literature Review
2.1. Ensemble Approaches
2.2. Ensemble Combination Strategies
2.3. Ensembles in Deep Learning
- For different tasks, including image classification, detection, and segmentation;
- In several application domains, including healthcare, speech analysis, forecasting, fraud prevention, and information retrieval.
3. Materials and Methods
- Polyp-PVT [11] represents a transformer-based architecture, offering a different approach to feature extraction and context modeling, which complements CNNs;
- HSNet [12] is a hybrid architecture that combines CNN and transformer components, exploiting the advantages of both, resulting in a broader range of feature representations and contextual information.
3.1. Loss Functions
- Dice-Based Loss Functions:
  - The Generalized Dice Loss is a multiclass variant of the Dice Loss;
  - The Focal Generalized Dice Loss is the focal version of the Generalized Dice Loss, emphasizing hard-to-segment regions while downplaying well-segmented areas;
  - The Log-Cosh Dice Loss is a combination of the Dice Loss and the Log-Cosh function, applied with the purpose of smoothing the loss curve.
- Tversky-Based Loss Functions:
  - The Tversky Loss is derived from the Tversky index, a weighted generalization of the Dice coefficient designed to deal with unbalanced classes;
  - The Focal Tversky Loss is a variant of the Tversky Loss in which a modulating factor ensures that the model focuses on hard samples instead of properly classified examples;
  - The Log-Cosh Focal Tversky Loss applies the same smoothing idea to the Focal Tversky Loss.
- Structural Similarity-Based Loss Functions:
  - The SSIM Loss is obtained from the Structural Similarity (SSIM) index, usually adopted to evaluate the quality of an image;
  - The MS-SSIM Loss is a variant defined using the multiscale structural similarity (MS-SSIM) index.
- Boundary-Based Loss Functions:
  - The Boundary Enhancement Loss ($L_{BE}$) explicitly focuses on the boundary areas during training. A Laplacian filter is used to generate strong responses around the boundaries and zero everywhere else; see [13] for details. We gather the Dice Loss, the Boundary Enhancement Loss, and the Structure Loss together, weighted appropriately: $L_{BEL} = L_{Dice} + \lambda_1 L_{BE} + \lambda_2 L_{Structure}$, with the weights $\lambda_1$ and $\lambda_2$ set as in [13];
  - The Structure Loss is a combination of the weighted Intersection over Union loss ($L_{wIoU}$) and the weighted binary cross-entropy loss ($L_{wBCE}$); we refer the reader to [10] for details. The weights in this loss function are determined by the importance of each pixel, which is calculated from the difference between the center pixel and its surrounding pixels. To give more importance to the binary cross-entropy term, we weighted it by 2, as suggested in [10]: $L_{Structure} = L_{wIoU} + 2\,L_{wBCE}$.
- Combined Loss Functions: the losses described above can be combined in different ways, with each component given the same weight of 1; we evaluated four such combinations, each defined as the unweighted sum of a subset of the losses above (a code sketch of representative losses follows this list).
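To make the definitions above concrete, the following PyTorch-style sketch implements three representatives: the Dice Loss, the Focal Tversky Loss, and the Structure Loss with the factor of 2 on the binary cross-entropy term. It is a minimal illustration, not the code used in the experiments: the tensor layout, the smoothing constant, the Tversky and focal hyperparameters, and the 31 × 31 neighborhood used for the pixel-importance weights are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # pred: sigmoid probabilities, target: binary ground truth, both (B, 1, H, W).
    inter = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    # The Tversky index weights false negatives (alpha) and false positives (beta);
    # the focal exponent gamma shifts the emphasis toward hard samples.
    tp = (pred * target).sum(dim=(2, 3))
    fn = ((1 - pred) * target).sum(dim=(2, 3))
    fp = (pred * (1 - target)).sum(dim=(2, 3))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** gamma).mean()

def structure_loss(logits, target):
    # Pixel importance: deviation of each pixel from its neighborhood mean,
    # so pixels near object boundaries receive larger weights (cf. [10]).
    weight = 1 + 5 * torch.abs(
        F.avg_pool2d(target, kernel_size=31, stride=1, padding=15) - target)
    wbce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    wbce = (weight * wbce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))
    pred = torch.sigmoid(logits)
    inter = (pred * target * weight).sum(dim=(2, 3))
    union = ((pred + target) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wiou + 2 * wbce).mean()  # BCE term weighted by 2, as in the text
```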
3.2. Data Augmentation
- Data Augmentation 1 (DA1) [13] is obtained through horizontal flip, vertical flip, and 90° rotation;
- Data Augmentation 2 (DA2) [13] consists of 13 operations, some changing the color of an image and some changing its shape;
- Data Augmentation 3 (DA3) is a variant of the approach used in [12]. It uses multiscale strategies (i.e., scale factors 1.25, 1, 0.75) to alleviate the sensitivity of the network to scale variability. Simultaneously, a random perspective transform is applied to the input image with a probability of 0.5, together with random color adjustment with a probability of 0.2. While DA1 and DA2 do not include randomness, DA3 uses a different training set for each network. The application of this data augmentation technique substantially amplifies result variability within the network, consequently fostering greater diversity among ensemble constituents.
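A possible realization of DA3 is sketched below with torchvision; the base resolution, jitter strengths, and perspective distortion are illustrative assumptions (the exact settings follow [12]), and for segmentation the geometric operations must be applied identically to the ground-truth mask.

```python
import random
import torchvision.transforms as T

def da3(image, base_size=352):
    # Multiscale strategy: pick one of the three scale factors per sample.
    scale = random.choice([0.75, 1.0, 1.25])
    size = int(base_size * scale)
    pipeline = T.Compose([
        T.Resize((size, size)),
        # Random perspective with probability 0.5, as described above.
        T.RandomPerspective(distortion_scale=0.5, p=0.5),
        # Random color adjustment with probability 0.2.
        T.RandomApply([T.ColorJitter(brightness=0.4, contrast=0.4,
                                     saturation=0.4, hue=0.1)], p=0.2),
    ])
    # NOTE: for segmentation, the geometric operations (resize, perspective)
    # must be applied with identical parameters to the ground-truth mask.
    return pipeline(image)
```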
3.3. Performance Metrics
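The results in Section 4 are reported with the Dice coefficient and, where indicated, the Intersection over Union (IoU). A minimal NumPy sketch of both metrics on binary masks follows; the binarization threshold is an assumption.

```python
import numpy as np

def dice_iou(pred, target, thr=0.5):
    # pred: predicted probability map, target: binary ground-truth mask.
    p = (pred >= thr).astype(bool)
    t = target.astype(bool)
    inter = np.logical_and(p, t).sum()
    dice = 2 * inter / (p.sum() + t.sum() + 1e-6)
    iou = inter / (np.logical_or(p, t).sum() + 1e-6)
    return dice, iou
```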
3.4. Datasets and Testing Protocols
3.4.1. Polyp Segmentation (POLYP)
- The Kvasir-SEG dataset comprises medical images that have been meticulously labeled and verified by medical professionals. These images depict various segments of the digestive system, showcasing both healthy and diseased tissue. The dataset encompasses images with varying resolutions, ranging from 720 × 576 pixels to 1920 × 1072 pixels, organized into folders based on their content. Some of these images also include a small picture-in-picture display indicating the position of the endoscope within the body;
- The CVC-ColonDB dataset consists of images designed to offer a diverse range of polyp appearances, maximizing dataset variability;
- CVC-T serves as the test set of a larger dataset named CVC-EndoSceneStill;
- The ETIS-Larib dataset comprises 196 colonoscopy images;
- CVC-ClinicDB encompasses images extracted from 31 videos of colonoscopy procedures. Expert annotations identify the regions affected by polyps, and ground truth data are also available for light reflections. The images in this dataset are uniformly sized at 576 × 768 pixels.
3.4.2. Skin Segmentation (SKIN)
3.4.3. Leukocyte Segmentation (LEUKO)
3.4.4. Butterfly Identification (BFLY)
3.4.5. Microorganism Identification (EMICRO)
3.4.6. Ribs Segmentation (RIBS)
3.4.7. Locust Segmentation (LOC)
3.4.8. Portrait Segmentation (POR)
3.4.9. Camouflaged Segmentation (CAM)
4. Experimental Results
- In Section 4.1, different methods for building an ensemble of DeepLabV3+ models are tested and compared;
- In Section 4.2, ensembles of different topologies are tested, and the different methods for building the output masks of HarDNet, HSN, and PVT are compared.
4.1. Experiments: DeepLabV3+
- Initial learning rate = 0.01;
- Number of epochs = 10 or 15 (depending on the data augmentation; see below);
- Momentum = 0.9;
- L2Regularization = 0.005;
- Learning Rate Drop Period = 5;
- Learning Rate Drop Factor = 0.2;
- Shuffle training images at every epoch;
- Optimizer = SGD (stochastic gradient descent).
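The list above maps onto a standard SGD configuration with step decay. The PyTorch sketch below mirrors those values; the model, data, and loss are runnable placeholders (the experiments use DeepLabV3+, for which torchvision's deeplabv3_resnet50 stands in here), not the original setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models.segmentation import deeplabv3_resnet50

# Runnable placeholders: random tensors and a stock segmentation model.
model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=1)
images = torch.randn(8, 3, 224, 224)
masks = torch.randint(0, 2, (8, 1, 224, 224)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=4, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,       # initial learning rate
                            momentum=0.9, weight_decay=0.005)  # L2 regularization
# Drop the learning rate by a factor of 0.2 every 5 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.2)
criterion = torch.nn.BCEWithLogitsLoss()

for epoch in range(10):  # 10 or 15 epochs, depending on the data augmentation
    for x, y in loader:  # shuffle=True reshuffles the training images every epoch
        optimizer.zero_grad()
        loss = criterion(model(x)["out"], y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```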
- ERN18(N) is an ensemble of N RN18 networks trained with DA1;
- ERN50(N) is an ensemble of N RN50 networks trained with DA1;
- ERN101(N) is an ensemble of N RN101 networks trained with DA1;
- E101(10) is an ensemble of 10 RN101 models trained with DA1 and five different loss functions; the final output is obtained by fusing two RN101 models for each of the five losses;
- EM(10) is a similar ensemble, built on the same five losses as E101(10), but the two networks sharing a given loss were trained once using DA1 and once using DA2;
- EM2(10) is similar to the previous ensemble, but with one of the five loss functions replaced by a different one;
- In EM2(5)_DAx, five RN101 networks were trained using the five losses of EM2(10); all five networks were trained using data augmentation DAx;
- EM3(10) is similar to the previous ensemble, but with another substitution in the set of loss functions.
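In all these ensembles, the members are combined at the level of the output masks. A minimal sketch of the fusion step is given below, assuming the average rule over the per-pixel probability maps followed by thresholding; both the rule and the threshold are illustrative assumptions.

```python
import torch

def fuse_masks(prob_maps, thr=0.5):
    # prob_maps: list of (H, W) tensors of per-pixel probabilities,
    # one per ensemble member. Average rule, then binarize.
    stacked = torch.stack(prob_maps)  # (N, H, W)
    return (stacked.mean(dim=0) >= thr).float()
```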
- Among stand-alone networks, RN101 obtained the best average performance, but in RIBS (a small training set), it performed worse than the others. This probably happened because it is a larger network than RN18 and RN50, thus it requires a larger training set for better tuning;
- ERN101(10) always outperformed RN101(1);
- E101(10) outperformed ERN101(10) with a p-value of 0.0078 (Wilcoxon signed rank test) and EM(10) outperformed E101(10) with a p-value of 0.0352. For the sake of space, we have not reported the performance obtained from individual losses. In any case, there was no winner: the various losses led to similar performances;
- EM3(10) obtained the highest average performance, but the p-value was quite high: it outperformed EM(10) with a p-value of 0.1406 and EM2(10) with a p-value of 0.2812;
- There was no statistical difference between the performance of EM2(5)_DA1 and EM2(5)_DA2. Instead, EM2, using both data augmentation methods, achieved better performance (on average) than EM2(5)_DA1 and EM2(5)_DA2.
4.2. Experiments: Combining Different Topologies
- LRa: a constant learning rate;
- LRb: the initial rate, decayed to a lower value after 10 epochs;
- LRc: the initial rate, decayed to a lower value after 30 epochs.
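As an illustration, the three strategies map naturally onto step schedules; in this sketch the decay factor is an assumption, and the base rate comes from whatever optimizer the scheduler wraps.

```python
import torch

def make_scheduler(optimizer, strategy):
    # LRa keeps the optimizer's base rate; LRb and LRc decay it once,
    # after 10 and 30 epochs, respectively. The decay factor is an assumption.
    drop_epoch = {"LRa": None, "LRb": 10, "LRc": 30}[strategy]
    factor = 0.1
    return torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=lambda epoch: 1.0 if drop_epoch is None or epoch < drop_epoch
        else factor)
```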
- Fusion: the combination of all the nets while varying the DA and LR strategy;
- Baseline Ensemble: a fusion of nine networks (the same size as Fusion) obtained by retraining the DA3-LRc configuration nine times;
- Fusion obtained the best performance, outperforming (on average) both the stand-alone approaches and the previous best ensemble (SOTAEns);
- There was no clear winner among the different data augmentation approaches and learning rate strategies;
- The proposed Fusion ensemble always improved on the Baseline Ensemble except in CAM. In this dataset, there was a significant difference in performance between LRc and the other learning strategies; combining only the three networks based on LRc (i.e., using the three data augmentations coupled with LRc), both HSN and PVT obtained a Dice of 0.830, outperforming the Baseline Ensemble.
- Ens1: EM3(10) ⊖ Fusion(FH) ⊖ Fusion(PVT) ⊖ Fusion(HSN). See Figure 2;
- Ens2: Fusion(FH) ⊖ Fusion(PVT) ⊖ Fusion(HSN);
- Ens3: Fusion(PVT) ⊖ Fusion(HSN).
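All three ensembles fuse the component outputs at the mask level; the sketch below illustrates one plausible reading, averaging the per-network probability maps while keeping them soft (the idea behind the SM variant in the tables) before a final threshold. The function, weighting, and threshold are assumptions for illustration.

```python
import torch

def fuse_topologies(mask_sets, soft=True, thr=0.5):
    # mask_sets: dict mapping topology name -> (N, H, W) tensor of member
    # probability maps. With soft=True the maps are averaged directly,
    # avoiding excessively sharp masks; otherwise each is binarized first.
    maps = [m if soft else (m >= thr).float() for m in mask_sets.values()]
    fused = torch.cat(maps).mean(dim=0)
    return (fused >= thr).float()

# Example with hypothetical outputs of the three fused topologies:
ens2 = fuse_topologies({"FH": torch.rand(9, 352, 352),
                        "PVT": torch.rand(9, 352, 352),
                        "HSN": torch.rand(9, 352, 352)})
```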
5. Discussion
5.1. Performance and Ensemble Comparison
5.2. Data Augmentation and Learning Rate Strategies
5.3. Comparative Analysis with the State of the Art
5.4. Overall Contribution
6. Conclusions
- The fusion of different convolutional and transformer networks can achieve state-of-the-art (SOTA) performance;
- The application of diverse approaches to the learning rate strategy is a viable method to build a set of segmentation networks;
- The integration of transformers (HSN and PVT) in an ensemble can be enhanced by modifying the way the final segmentation map is obtained, thereby avoiding excessively sharp masks.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hao, S.; Zhou, Y.; Guo, Y. A brief survey on semantic segmentation with deep learning. Neurocomputing 2020, 406, 302–321.
- Wang, S.; Mu, X.; Yang, D.; He, H.; Zhao, P. Attention guided encoder-decoder network with multi-scale context aggregation for land cover segmentation. IEEE Access 2020, 8, 215299–215309.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-Net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:cs.CV/2010.11929.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 568–578.
- Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774.
- Huang, C.H.; Wu, H.Y.; Lin, Y.L. HarDNet-MSEG: A Simple Encoder-Decoder Polyp Segmentation Neural Network that Achieves over 0.9 Mean Dice and 86 FPS. arXiv 2021, arXiv:cs.CV/2101.07172.
- Dong, B.; Wang, W.; Fan, D.P.; Li, J.; Fu, H.; Shao, L. Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. arXiv 2023, arXiv:eess.IV/2108.06932.
- Zhang, W.; Fu, C.; Zheng, Y.; Zhang, F.; Zhao, Y.; Sham, C.W. HSNet: A hybrid semantic network for polyp segmentation. Comput. Biol. Med. 2022, 150, 106173.
- Nanni, L.; Lumini, A.; Loreggia, A.; Formaggio, A.; Cuza, D. An Empirical Study on Ensemble of Segmentation Approaches. Signals 2022, 3, 341–358.
- Nanni, L.; Loreggia, A.; Lumini, A.; Dorizza, A. A Standardized Approach for Skin Detection: Analysis of the Literature and Case Studies. J. Imaging 2023, 9, 35.
- Nanni, L.; Fantozzi, C.; Loreggia, A.; Lumini, A. Ensembles of Convolutional Neural Networks and Transformers for Polyp Segmentation. Sensors 2023, 23, 4688.
- Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39.
- Polikar, R. Ensemble Based Systems in Decision Making. IEEE Circuits Syst. Mag. 2006, 6, 21–45.
- Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258.
- Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227.
- Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
- Valiant, L.G. A Theory of the Learnable. Commun. ACM 1984, 27, 1134–1142.
- Kearns, M.; Valiant, L.G. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata. J. ACM 1994, 41, 67–95.
- Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. 1979, 7, 1–26.
- Alexandre, L.A.; Campilho, A.C.; Kamel, M. On combining classifiers using sum and product rules. Pattern Recognit. Lett. 2001, 22, 1283–1289.
- Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 833–851.
- Lumini, A.; Nanni, L. Fair comparison of skin detection approaches on publicly available datasets. Expert Syst. Appl. 2020, 160, 113677.
- Phung, S.L.; Bouzerdoum, A.; Chai, D. Skin segmentation using color pixel classification: Analysis and comparison. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 148–154.
- Liu, Y.; Cao, F.; Zhao, J.; Chu, J. Segmentation of White Blood Cells Image Using Adaptive Location and Iteration. IEEE J. Biomed. Health Inform. 2017, 21, 1644–1655.
- Filali, I.; Achour, B.; Belkadi, M.; Lalam, M. Graph ranking based butterfly segmentation in ecological images. Ecol. Inform. 2022, 68, 101553.
- Zhao, P.; Li, C.; Rahaman, M.M.; Xu, H.; Ma, P.; Yang, H.; Sun, H.; Jiang, T.; Xu, N.; Grzegorzek, M. EMDS-6: Environmental Microorganism Image Dataset Sixth Version for Image Denoising, Segmentation, Feature Extraction, Classification, and Detection Method Evaluation. Front. Microbiol. 2022, 13, 829027.
- Nguyen, H.C.; Le, T.T.; Pham, H.H.; Nguyen, H.Q. VinDr-RibCXR: A Benchmark Dataset for Automatic Segmentation and Labeling of Individual Ribs on Chest X-Rays. arXiv 2021, arXiv:eess.IV/2107.01327.
- Liu, L.; Liu, M.; Meng, K.; Yang, L.; Zhao, M.; Mei, S. Camouflaged locust segmentation based on PraNet. Comput. Electron. Agric. 2022, 198, 107061.
- Park, H.; Sjösund, L.L.; Yoo, Y.; Kwak, N. ExtremeC3Net: Extreme Lightweight Portrait Segmentation Networks using Advanced C3-modules. arXiv 2019, arXiv:cs.CV/1908.03093.
- Yan, J.; Le, T.N.; Nguyen, K.D.; Tran, M.T.; Do, T.T.; Nguyen, T.V. MirrorNet: Bio-Inspired Camouflaged Object Segmentation. IEEE Access 2021, 9, 43290–43300.
- Li, W.; Zhao, Y.; Li, F.; Wang, L. MIA-Net: Multi-information aggregation network combining transformers and convolutional feature learning for polyp segmentation. Knowl.-Based Syst. 2022, 247, 108824.
- Wu, Y.H.; Liu, Y.; Zhan, X.; Cheng, M.M. P2T: Pyramid Pooling Transformer for Scene Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 11, 12760–12771.
- Liu, F.; Hua, Z.; Li, J.; Fan, L. DBMF: Dual Branch Multiscale Feature Fusion Network for polyp segmentation. Comput. Biol. Med. 2022, 151, 106304.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6877–6886.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
- Zhang, Y.; Liu, H.; Hu, Q. TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021; de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng, Y., Essert, C., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 14–24.
- Kim, T.; Lee, H.; Kim, D. UACANet: Uncertainty Augmented Context Attention for Polyp Segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, MM’21, Virtual Event, 20–24 October 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2167–2175.
- Wei, J.; Hu, Y.; Zhang, R.; Li, Z.; Zhou, S.K.; Cui, S. Shallow Attention Network for Polyp Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021, Strasbourg, France, 27 September–1 October 2021; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 12901.
- Zhao, X.; Zhang, L.; Lu, H. Automatic Polyp Segmentation via Multi-scale Subtraction Network. arXiv 2021, arXiv:2108.05082.
- Park, K.B.; Lee, J.Y. SwinE-Net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and Swin Transformer. J. Comput. Des. Eng. 2022, 9, 616–632.
- Song, P.; Li, J.; Fan, H. Attention based multi-scale parallel network for polyp segmentation. Comput. Biol. Med. 2022, 146, 105476.
- Xia, Y.; Yun, H.; Liu, Y.; Luan, J.; Li, M. MGCBFormer: The multiscale grid-prior and class-inter boundary-aware transformer for polyp segmentation. Comput. Biol. Med. 2023, 167, 107600.
Short Name | Name | #Samples |
---|---|---|
Kvasir | Kvasir-SEG dataset | 100 |
ColDB | CVC-ColonDB | 380 |
CVC-T | CVC-EndoSceneStill | 300 |
ETIS | ETIS-Larib | 196 |
ClinicDB | CVC-ClinicDB | 612 |
Short Name | Name | #Samples |
---|---|---|
Prat | Pratheepan | 78 |
MCG | MCG-skin | 1000 |
UC | UChile DB-skin | 103 |
CMQ | Compaq | 4675 |
SFA | SFA | 1118 |
HGR | Hand Gesture Recognition | 1558 |
Sch | Schmugge dataset | 845 |
VMD | Human Activity Recognition | 285 |
ECU | ECU Face and Skin Detection | 2000 |
VT | VT-AAST | 66 |
Method | POLYP | SKIN | LEUKO | BFLY | EMICRO | RIBS | LOC | POR | CAM |
---|---|---|---|---|---|---|---|---|---|
RN18(1) | 0.806 | 0.865 | 0.897 | 0.960 | 0.908 | 0.827 | 0.812 | 0.980 | 0.624 |
RN50(1) | 0.802 | 0.871 | 0.895 | 0.968 | 0.909 | 0.818 | 0.835 | 0.979 | 0.665 |
RN101(1) | 0.808 | 0.871 | 0.915 | 0.976 | 0.918 | 0.776 | 0.830 | 0.981 | 0.717 |
ERN18(10) | 0.821 | 0.866 | 0.913 | 0.963 | 0.913 | 0.842 | 0.830 | 0.981 | 0.672 |
ERN50(10) | 0.807 | 0.872 | 0.897 | 0.969 | 0.918 | 0.839 | 0.840 | 0.980 | 0.676 |
ERN101(10) | 0.834 | 0.878 | 0.925 | 0.978 | 0.919 | 0.779 | 0.838 | 0.982 | 0.734 |
E101(10) | 0.842 | 0.880 | 0.925 | 0.980 | 0.921 | 0.785 | 0.841 | 0.984 | 0.747 |
EM(10) | 0.851 | 0.883 | 0.936 | 0.983 | 0.924 | 0.833 | 0.854 | 0.985 | 0.740 |
EM2(10) | 0.851 | 0.883 | 0.943 | 0.984 | 0.925 | 0.846 | 0.859 | 0.986 | 0.731 |
EM2(5)_DA1 | 0.836 | 0.881 | 0.928 | 0.982 | 0.921 | 0.800 | 0.841 | 0.985 | 0.742 |
EM2(5)_DA2 | 0.847 | 0.869 | 0.948 | 0.985 | 0.920 | 0.860 | 0.842 | 0.983 | 0.700 |
EM3(10) | 0.852 | 0.883 | 0.945 | 0.985 | 0.925 | 0.856 | 0.860 | 0.986 | 0.728 |
Method | POLYP | SKIN | LEUKO | BFLY | EMICRO | RIBS | LOC | POR | CAM |
---|---|---|---|---|---|---|---|---|---|
EM(10) | 0.787 | 0.798 | 0.887 | 0.966 | 0.869 | 0.714 | 0.769 | 0.971 | 0.630 |
EM2(10) | 0.790 | 0.799 | 0.897 | 0.969 | 0.870 | 0.734 | 0.778 | 0.972 | 0.621 |
EM3(10) | 0.791 | 0.798 | 0.899 | 0.970 | 0.872 | 0.749 | 0.780 | 0.972 | 0.617 |
Method | DA | LR | POLYP | SKIN | EMICRO | CAM |
---|---|---|---|---|---|---|
HarDNet | DA1 | LRa | 0.828 | 0.873 | 0.912 | 0.700 |
HarDNet | DA1 | LRb | 0.821 | 0.858 | 0.905 | 0.667 |
HarDNet | DA1 | LRc | 0.795 | 0.869 | 0.909 | 0.712 |
HarDNet | DA2 | LRa | 0.852 | 0.870 | 0.912 | 0.715 |
HarDNet | DA2 | LRb | 0.826 | 0.854 | 0.905 | 0.665 |
HarDNet | DA2 | LRc | 0.846 | 0.872 | 0.910 | 0.710 |
HarDNet | DA3 | LRa | 0.828 | 0.853 | 0.907 | 0.653 |
HarDNet | DA3 | LRb | 0.832 | 0.839 | 0.904 | 0.613 |
HarDNet | DA3 | LRc | 0.828 | 0.865 | 0.904 | 0.694 |
Fusion | DA1,2,3 | LRa,b,c | 0.868 | 0.883 | 0.921 | 0.726 |
SOTAEns | — | — | 0.863 | 0.886 | 0.916 | — |
Method | DA | LR | SM | POLYP | SKIN | EMICRO | CAM |
---|---|---|---|---|---|---|---|
PVT | DA1 | LRa | No | 0.857 | 0.874 | 0.919 | 0.788 |
PVT | DA1 | LRb | No | 0.850 | 0.844 | 0.914 | 0.743 |
PVT | DA1 | LRc | No | 0.861 | 0.877 | 0.919 | 0.810 |
PVT | DA2 | LRa | No | 0.862 | 0.845 | 0.917 | 0.742 |
PVT | DA2 | LRb | No | 0.847 | 0.854 | 0.912 | 0.743 |
PVT | DA2 | LRc | No | 0.862 | 0.876 | 0.917 | 0.813 |
PVT | DA3 | LRa | No | 0.855 | 0.875 | 0.917 | 0.765 |
PVT | DA3 | LRb | No | 0.851 | 0.856 | 0.916 | 0.718 |
PVT | DA3 | LRc | No | 0.871 | 0.883 | 0.918 | 0.817 |
Fusion | DA1,2,3 | LRa,b,c | No | 0.884 | 0.892 | 0.925 | 0.813 |
Fusion | DA1,2,3 | LRa,b,c | Yes | 0.885 | 0.892 | 0.926 | 0.814 |
Baseline Ensemble | DA3 | LRc | — | 0.880 | 0.886 | 0.921 | 0.829 |
SOTAEns | — | — | — | 0.877 | 0.883 | 0.922 | — |
Method | DA | LR | SM | POLYP | SKIN | EMICRO | CAM |
---|---|---|---|---|---|---|---|
HSN | DA1 | LRa | No | 0.847 | 0.873 | 0.919 | 0.776 |
HSN | DA1 | LRb | No | 0.852 | 0.816 | 0.916 | 0.742 |
HSN | DA1 | LRc | No | 0.860 | 0.873 | 0.919 | 0.817 |
HSN | DA2 | LRa | No | 0.857 | 0.873 | 0.921 | 0.742 |
HSN | DA2 | LRb | No | 0.849 | 0.850 | 0.918 | 0.743 |
HSN | DA2 | LRc | No | 0.873 | 0.873 | 0.919 | 0.814 |
HSN | DA3 | LRa | No | 0.866 | 0.863 | 0.922 | 0.782 |
HSN | DA3 | LRb | No | 0.854 | 0.856 | 0.913 | 0.697 |
HSN | DA3 | LRc | No | 0.866 | 0.876 | 0.924 | 0.800 |
Fusion | DA1,2,3 | LRa,b,c | No | 0.881 | 0.885 | 0.926 | 0.813 |
Fusion | DA1,2,3 | LRa,b,c | Yes | 0.882 | 0.886 | 0.926 | 0.812 |
Baseline Ensemble | DA3 | LRc | — | 0.876 | 0.879 | 0.923 | 0.820 |
SOTAEns | — | — | — | 0.879 | 0.879 | — | — |
Method | POLYP | SKIN | EMICRO | CAM |
---|---|---|---|---|
Ens1 | 0.886 | 0.892 | 0.927 | 0.817 |
Ens2 | 0.887 | 0.893 | 0.927 | 0.812 |
Ens3 | 0.886 | 0.894 | 0.927 | 0.805 |
[13] | 0.874 | 0.893 | 0.926 | — |
[14] | — | 0.895 | — | — |
[15] | 0.885 | — | — | — |
Method | Kvasir IoU | Kvasir Dice | ClinDB IoU | ClinDB Dice | ColDB IoU | ColDB Dice | ETIS IoU | ETIS Dice | CVC-T IoU | CVC-T Dice | Avg IoU | Avg Dice |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Ens2 | 0.883 | 0.927 | 0.893 | 0.935 | 0.766 | 0.840 | 0.762 | 0.833 | 0.834 | 0.899 | 0.828 | 0.887 |
HSNet [12] | 0.877 | 0.926 | 0.905 | 0.948 | 0.735 | 0.81 | 0.734 | 0.808 | 0.839 | 0.903 | 0.818 | 0.879 |
MIA-Net [36] | 0.876 | 0.926 | 0.899 | 0.942 | 0.739 | 0.816 | 0.725 | 0.8 | 0.835 | 0.9 | 0.815 | 0.877 |
P2T [37] | 0.849 | 0.905 | 0.873 | 0.923 | 0.68 | 0.761 | 0.631 | 0.7 | 0.805 | 0.879 | 0.768 | 0.834 |
DBMF [38] | 0.886 | 0.932 | 0.886 | 0.933 | 0.73 | 0.803 | 0.711 | 0.79 | 0.859 | 0.919 | 0.814 | 0.875 |
HarDNet [10] | 0.857 | 0.912 | 0.882 | 0.932 | 0.66 | 0.731 | 0.613 | 0.677 | 0.821 | 0.887 | 0.767 | 0.828 |
PraNet, from [10] | 0.84 | 0.898 | 0.849 | 0.899 | 0.64 | 0.709 | 0.567 | 0.628 | 0.797 | 0.871 | 0.739 | 0.801 |
SFA, from [10] | 0.611 | 0.723 | 0.607 | 0.7 | 0.347 | 0.469 | 0.217 | 0.297 | 0.329 | 0.467 | 0.422 | 0.531 |
U-Net++, from [10] | 0.743 | 0.821 | 0.729 | 0.794 | 0.41 | 0.483 | 0.344 | 0.401 | 0.624 | 0.707 | 0.57 | 0.641 |
U-Net, from [10] | 0.746 | 0.818 | 0.755 | 0.823 | 0.444 | 0.512 | 0.335 | 0.398 | 0.627 | 0.71 | 0.581 | 0.652 |
SETR [39] | 0.854 | 0.911 | 0.885 | 0.934 | 0.69 | 0.773 | 0.646 | 0.726 | 0.814 | 0.889 | 0.778 | 0.847 |
TransUnet [40] | 0.857 | 0.913 | 0.887 | 0.935 | 0.699 | 0.781 | 0.66 | 0.731 | 0.824 | 0.893 | 0.785 | 0.851 |
TransFuse [41] | 0.87 | 0.92 | 0.897 | 0.942 | 0.706 | 0.781 | 0.663 | 0.737 | 0.826 | 0.894 | 0.792 | 0.855 |
UACANet [42] | 0.859 | 0.912 | 0.88 | 0.926 | 0.678 | 0.751 | 0.678 | 0.751 | 0.849 | 0.91 | 0.789 | 0.85 |
SANet [43] | 0.847 | 0.904 | 0.859 | 0.916 | 0.67 | 0.753 | 0.654 | 0.75 | 0.815 | 0.888 | 0.769 | 0.842 |
MSNet [44] | 0.862 | 0.907 | 0.879 | 0.921 | 0.678 | 0.755 | 0.664 | 0.719 | 0.807 | 0.869 | 0.778 | 0.834 |
Polyp-PVT [11] | 0.864 | 0.917 | 0.889 | 0.937 | 0.727 | 0.808 | 0.706 | 0.787 | 0.833 | 0.9 | 0.804 | 0.869 |
SwinE-Net [45] | 0.87 | 0.92 | 0.892 | 0.938 | 0.725 | 0.804 | 0.687 | 0.758 | 0.842 | 0.906 | 0.803 | 0.865 |
AMNet [46] | 0.865 | 0.912 | 0.888 | 0.936 | 0.69 | 0.762 | 0.679 | 0.756 | - | - | - | - |
MGCBFormer [47] | 0.885 | 0.931 | 0.915 | 0.955 | 0.731 | 0.807 | 0.747 | 0.819 | 0.851 | 0.913 | 0.826 | 0.885 |
Method | Prat | MCG | UC | CMQ | SFA | HGR | Sch | VMD | ECU | VT | AVG |
---|---|---|---|---|---|---|---|---|---|---|---|
Ens2 | 0.928 | 0.896 | 0.913 | 0.870 | 0.956 | 0.972 | 0.804 | 0.770 | 0.956 | 0.861 | 0.893 |
HarDNet | 0.908 | 0.881 | 0.911 | 0.832 | 0.948 | 0.962 | 0.772 | 0.661 | 0.942 | 0.832 | 0.865 |
PVT | 0.919 | 0.891 | 0.906 | 0.860 | 0.950 | 0.970 | 0.806 | 0.726 | 0.950 | 0.849 | 0.883 |
HSN | 0.921 | 0.898 | 0.908 | 0.854 | 0.954 | 0.966 | 0.778 | 0.659 | 0.951 | 0.860 | 0.876 |