Medical Image Classifications Using Convolutional Neural Networks: A Survey of Current Methods and Statistical Modeling of the Literature
Abstract
1. Introduction
1.1. Background and Context
1.2. Importance of CNN for Medical Image Classification
1.3. Objectives of the Study
1.4. What Distinguishes the Current Study from Previously Published Review Papers?
2. Review of CNN Algorithms and Methods
2.1. Basic Architectures of CNNs
- The input layer is the first layer which receives the input image.
- The convolutional layer is the core layer of the CNN architecture, where the convolution operation is performed using a set of learnable kernels or filters to extract features from the input data, such as edges, corners and textures [25]. Feature extraction may involve strides and padding along with the kernels (1).
- The activation function introduces non-linearity to capture complex relationships in the data, and it is applied element-wise to the output of the convolutional layer.
- The pooling layer is applied to reduce the spatial dimensions (width and height) of the feature maps obtained from the convolution layer [16] by performing down-sampling.
- The fully connected layer is used to learn high-level representations by combining features learned from the previous layers.
- The output layer is the last layer which produces the desired output based on the task at hand.
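The layer stack above can be sketched numerically. The following is a minimal NumPy illustration (a toy image and a hand-picked edge kernel, not a trained network) of one pass through a convolution, activation, pooling and fully connected step:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution (stride 1, no padding) of a 2-D image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise activation introducing non-linearity."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Down-sample feature maps by taking the maximum over size x size windows."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Input layer: a toy 8x8 grayscale "image" (an intensity ramp)
image = np.arange(64, dtype=float).reshape(8, 8)
# Convolutional layer: one 3x3 vertical-edge kernel (learned in a real CNN)
kernel = np.array([[-1., 0., 1.], [-1., 0., 1.], [-1., 0., 1.]])
features = max_pool(relu(conv2d(image, kernel)))  # activation + pooling
# Fully connected / output layer: flatten and apply a weight vector
weights = np.ones(features.size) / features.size
score = features.ravel() @ weights
print(features.shape)  # (3, 3)
```

In a real network the kernels and the fully connected weights are learned by back-propagation; this sketch only shows the data flow between the layers listed above.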
2.2. Improvements in Architectural Designs of CNNs
2.3. Activation Functions Used in CNNs
2.4. Popular Frameworks
2.5. Ensemble Approaches for CNN Models
2.6. Hyperparameters of CNNs Used for Medical Image Analyses
2.6.1. Hyperparameter Tuning and Optimization Methods
- Optimization algorithms, such as particle swarm optimization [93]; black-box, gradient-based or Bayesian algorithms (e.g., surrogate-based [94,95] or asymmetric kernel function methods [96]); genetic or custom genetic algorithms [97]; the artificial bee colony algorithm [98]; the firefly algorithm [99]; and the Broyden–Fletcher–Goldfarb–Shanno algorithm (for iteratively solving unconstrained nonlinear optimization problems).
- The orthogonal array tuning method [2], the adaptive hyperparameter tuning and the covariance matrix adaptation evolution strategy.
- Simulated annealing, the KNN approach, per-parameter regularization and the EVO technique (used to obtain accurate optimized values by hybridizing exploitation and exploration).
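As a sketch of the tuning loop these methods plug into, the snippet below runs exhaustive grid search and random search over two hyperparameters. The objective `validation_accuracy` is a hypothetical stand-in; in practice it would train a CNN and score it on a validation split:

```python
import itertools
import random

def validation_accuracy(lr, batch_size):
    """Hypothetical objective: peaks at lr=1e-3, batch_size=32.
    A real objective would train and evaluate a CNN for this configuration."""
    return 1.0 - abs(lr - 1e-3) * 100 - abs(batch_size - 32) / 1000

search_space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64]}

# Grid search: evaluate every combination in the space
best = max(itertools.product(search_space["lr"], search_space["batch_size"]),
           key=lambda p: validation_accuracy(*p))
print(best)  # (0.001, 32)

# Random search: sample the same space without enumerating it fully,
# useful when the grid is too large to evaluate exhaustively
random.seed(0)
samples = [(random.choice(search_space["lr"]),
            random.choice(search_space["batch_size"])) for _ in range(5)]
best_random = max(samples, key=lambda p: validation_accuracy(*p))
```

The more advanced methods listed above (Bayesian, evolutionary, swarm-based) replace these enumeration or sampling strategies with informed proposals, but they optimize the same kind of validation objective.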
2.6.2. Tuning of Parameters
2.6.3. Benchmarking of Model Performances
Performance Metrics Used in Evaluating CNN Models
The confusion matrix (2) relates predicted to actual classes:

|                    | Actual Positive      | Actual Negative      |
|--------------------|----------------------|----------------------|
| Predicted Positive | True Positives (TP)  | False Positives (FP) |
| Predicted Negative | False Negatives (FN) | True Negatives (TN)  |
- Classification accuracy is the percentage of correctly classified instances out of the total number of instances in the dataset (3).
- Sensitivity and specificity are measures of the true positive rate and true negative rate, respectively. Sensitivity measures the proportion of correctly identified actual-positives (4), and specificity measures the proportion of correctly identified actual-negatives (5).
- The F1 score is a measure of the balance between precision and recall, which are metrics that evaluate the accuracy and completeness of the model’s predictions, respectively. That is, the F1 score (6) is the harmonic mean of precision and recall and is used to evaluate the performance of CNNs in binary classification tasks.
- Mean squared error (MSE) measures the average squared difference between the predicted and actual values (7) and is particularly relevant for evaluating quantitative or regression tasks.
- Area under the curve (AUC) is a summary measure of the ROC curve representing the probability that a randomly chosen negative instance will be ranked lower than a randomly chosen positive instance. It is commonly used to evaluate the overall performance of CNNs in binary classification tasks.
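Assuming the equation numbering used above ((3)–(7)), all of these metrics except AUC can be computed directly from the confusion-matrix counts. The snippet below is a minimal sketch with illustrative counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics (3)-(6) computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)                    # (3)
    sensitivity = tp / (tp + fn)                                  # (4) recall / TPR
    specificity = tn / (tn + fp)                                  # (5) TNR
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # (6) harmonic mean
    return accuracy, sensitivity, specificity, f1

def mse(y_true, y_pred):
    """Mean squared error (7) for regression-style outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative counts for a binary classifier evaluated on 200 images
acc, sens, spec, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(acc, 3), round(sens, 3), round(spec, 3), round(f1, 3))
# 0.85 0.8 0.9 0.842
```

AUC, by contrast, requires the classifier's ranked scores rather than a single thresholded confusion matrix, which is why libraries compute it from the full ROC curve.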
2.7. Data Pre-Processing Methods
- Data distillation methods: uniform experiment design method, highlighting, background filling, resizing, noise reduction, the Gabor filter model, image defect detection and implicit differentiation;
- Optical flow image processing;
- Sliding window data-level approach [109];
- Flattening and normalizing data in a task-specific manner;
- One-hot vector encoding method;
- Frequency-based tokenization [110];
- Training-validation-testing splits.
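A minimal sketch of three of the steps above (flattening and normalization, one-hot vector encoding, and the training-validation-testing split), using synthetic data in place of a real image set; the 70/15/15 ratio is illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((100, 4, 4))         # one hundred toy 4x4 "images"
y = rng.integers(0, 3, size=100)    # labels for three hypothetical classes

# Flatten each image, then normalize every feature to zero mean / unit variance
X_flat = X.reshape(len(X), -1)
X_norm = (X_flat - X_flat.mean(axis=0)) / (X_flat.std(axis=0) + 1e-8)

# One-hot vector encoding of the integer labels
y_onehot = np.eye(3)[y]

# Training-validation-testing split (70 / 15 / 15) over shuffled indices
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = np.split(idx, [70, 85])
X_train, y_train = X_norm[train_idx], y_onehot[train_idx]
print(X_train.shape, y_train.shape)  # (70, 16) (70, 3)
```

Note that normalization statistics should be computed on the training portion only in a leakage-free pipeline; here they are computed globally for brevity.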
2.8. Image Datasets Relevant for Medical Themes
2.9. Data Augmentation for Training a Robust CNN Diagnostic Model for Cases with Insufficient Training Data
Enhancing CNN-Based Image Classification for Rare Diseases through Data Augmentation
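A minimal sketch of common label-preserving augmentations (flips, rotation, random crop, additive noise) applied to one synthetic image; real pipelines typically use library transforms, and medical imaging requires care that augmentations do not distort diagnostically relevant structure:

```python
import numpy as np

def augment(image, rng):
    """Generate simple label-preserving variants of one training image."""
    variants = [
        image,                 # original
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image),       # 90-degree rotation
    ]
    # Random crop back to the original size after zero-padding by 2 pixels
    padded = np.pad(image, 2)
    dy, dx = rng.integers(0, 5, size=2)
    variants.append(padded[dy:dy + image.shape[0], dx:dx + image.shape[1]])
    # Additive Gaussian noise with small variance
    variants.append(image + rng.normal(0.0, 0.01, image.shape))
    return variants

rng = np.random.default_rng(0)
image = rng.random((16, 16))       # stand-in for a rare-disease training image
augmented = augment(image, rng)
print(len(augmented))  # 6
```

Each call multiplies the effective size of a scarce training set, which is the core idea behind augmenting rare-disease datasets before CNN training.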
3. Machine Learning-Assisted Statistical Modeling of the Literature (Pertaining to CNN Application for Medical Image Understanding)
3.1. Literature Search Strategy
3.2. Statistical Modeling and Visualization
4. Results from Statistical Modeling
5. Discussion
5.1. Highlights of Current Practices
5.2. Implications for Clinical Practice
5.3. Gaps in the Current State of CNN Application for Medical Image Understanding
5.4. Trends and Future Directions
- The development of specialized and efficient CNN architectures, including methods for automatically designing them (e.g., evolving arbitrary CNNs) to discover and understand new, effective architectures tailored to learning specific representations robustly.
- Designing methods that can be readily used to automatically optimize CNN architectures for personalized medicine.
- Designing new domain-agnostic CNN algorithms that support transfer learning, learn reliably from small datasets, or learn online, e.g., in combination with reinforcement learning.
- Exploring new activation functions for efficient and robust learning, including on small datasets (to mitigate the issue of labeled data scarcity).
- Designing more efficient 3D CNNs.
- The dynamic selection of misclassified negative samples during training to improve performance and to speed up learning.
- Addressing privacy, data security and the prevention of adversarial data poisoning.
5.5. Limitations of the Review
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
References
- Razzak, M.I.; Naz, S.; Zaib, A. Deep learning for medical image processing: Overview, challenges and the future. In Classification in BioApps: Automation of Decision Making; Springer: Berlin/Heidelberg, Germany, 2018; pp. 323–350. [Google Scholar]
- Salih, O.; Duffy, K.J. Optimization Convolutional Neural Network for Automatic Skin Lesion Diagnosis Using a Genetic Algorithm. Appl. Sci. 2023, 13, 3248. [Google Scholar] [CrossRef]
- Salehi, A.W.; Khan, S.; Gupta, G.; Alabduallah, B.I.; Almjally, A.; Alsolai, H.; Siddiqui, T.; Mellit, A. A Study of CNN and Transfer Learning in Medical Imaging: Advantages, Challenges, Future Scope. Sustainability 2023, 15, 5930. [Google Scholar] [CrossRef]
- Sarvamangala, D.; Kulkarni, R.V. Convolutional neural networks in medical image understanding: A survey. Evol. Intell. 2022, 15, 1–22. [Google Scholar] [CrossRef]
- Cheng, Y.; Zhao, C.; Neupane, P.; Benjamin, B.; Wang, J.; Zhang, T. Applicability and Trend of the Artificial Intelligence (AI) on Bioenergy Research between 1991–2021: A Bibliometric Analysis. Energies 2023, 16, 1235. [Google Scholar] [CrossRef]
- Al Fryan, L.H.; Shomo, M.I.; Alazzam, M.B. Application of Deep Learning System Technology in Identification of Women’s Breast Cancer. Medicina 2023, 59, 487. [Google Scholar] [CrossRef]
- Alaba, S. Image Classification using Different Machine Learning Techniques. TechRxiv 2023. [Google Scholar] [CrossRef]
- Chan, H.-P.; Samala, R.K.; Hadjiiski, L.M.; Zhou, C. Deep learning in medical image analysis. Deep Learn. Med. Image Anal. Chall. Appl. 2020, 1213, 3–21. [Google Scholar]
- Inamullah; Hassan, S.; Alrajeh, N.A.; Mohammed, E.A.; Khan, S. Data Diversity in Convolutional Neural Network Based Ensemble Model for Diabetic Retinopathy. Biomimetics 2023, 8, 187. [Google Scholar] [CrossRef] [PubMed]
- Fu, Y.; Lei, Y.; Wang, T.; Curran, W.J.; Liu, T.; Yang, X. Deep learning in medical image registration: A review. Phys. Med. Biol. 2020, 65, 20TR01. [Google Scholar] [CrossRef]
- El-Ghany, S.A.; Azad, M.; Elmogy, M. Robustness Fine-Tuning Deep Learning Model for Cancers Diagnosis Based on Histopathology Image Analysis. Diagnostics 2023, 13, 699. [Google Scholar] [CrossRef]
- Equbal, A.; Masood, S.; Equbal, I.; Ahmad, S.; Khan, N.Z.; Khan, Z.A. Artificial intelligence against COVID-19 Pandemic: A Comprehensive Insight. Curr. Med. Imaging 2023, 19, 1–18. [Google Scholar]
- Fehling, M.K.; Grosch, F.; Schuster, M.E.; Schick, B.; Lohscheller, J. Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS ONE 2020, 15, e0227791. [Google Scholar] [CrossRef] [PubMed]
- Krupička, R.; Mareček, S.; Malá, C.; Lang, M.; Klempíř, O.; Duspivová, T.; Široká, R.; Jarošíková, T.; Keller, J.; Šonka, K. Automatic substantia nigra segmentation in neuromelanin-sensitive MRI by deep neural network in patients with prodromal and manifest synucleinopathy. Physiol. Res. 2019, 68, S453–S458. [Google Scholar] [CrossRef] [PubMed]
- Lin, X. Research of Convolutional Neural Network on Image Classification. Highlights Sci. Eng. Technol. 2023, 39, 855–862. [Google Scholar] [CrossRef]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A. A survey of the recent architectures of deep convolutional neural networks. arXiv 2019, arXiv:1901.06032. [Google Scholar] [CrossRef]
- Abdelrazik, M.A.; Zekry, A.; Mohamed, W.A. Efficient Hybrid Algorithm for Human Action Recognition. J. Image Graph. 2023, 11, 72–81. [Google Scholar] [CrossRef]
- Jussupow, E.; Spohrer, K.; Heinzl, A.; Gawlitza, J. Augmenting medical diagnosis decisions? An investigation into physicians’ decision-making process with artificial intelligence. Inf. Syst. Res. 2021, 32, 713–735. [Google Scholar] [CrossRef]
- Liu, J.-W.; Zuo, F.-L.; Guo, Y.-X.; Li, T.-Y.; Chen, J.-M. Research on improved wavelet convolutional wavelet neural networks. Appl. Intell. 2021, 51, 4106–4126. [Google Scholar] [CrossRef]
- Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of image classification algorithms based on convolutional neural networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
- Tripathi, K.; Gupta, A.K.; Vyas, R.G. Deep residual learning for image classification using cross validation. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 1525–1530. [Google Scholar] [CrossRef]
- Wang, W.; Yang, X.; Li, X.; Tang, J. Convolutional-capsule network for gastrointestinal endoscopy image classification. Int. J. Intell. Syst. 2022, 37, 5796–5815. [Google Scholar] [CrossRef]
- Lim, M.; Lee, D.; Park, H.; Kang, Y.; Oh, J.; Park, J.-S.; Jang, G.-J.; Kim, J.-H. Convolutional Neural Network based Audio Event Classification. KSII Trans. Internet Inf. Syst. 2018, 12, 2748–2760. [Google Scholar]
- Wang, W.; Yang, Y.; Wang, X.; Wang, W.; Li, J. Development of convolutional neural network and its application in image classification: A survey. Opt. Eng. 2019, 58, 040901. [Google Scholar] [CrossRef]
- Kao, C.-C. Optimizing FPGA-Based Convolutional Neural Network Performance. J. Circuits Syst. Comput. 2023, 32, 2350254. [Google Scholar] [CrossRef]
- Jain, A.; Singh, R.; Vatsa, M. On detecting GANs and retouching based synthetic alterations. In Proceedings of the 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), Redondo Beach, CA, USA, 22–25 October 2018; pp. 1–7. [Google Scholar]
- Wang, T.; Lan, J.; Han, Z.; Hu, Z.; Huang, Y.; Deng, Y.; Zhang, H.; Wang, J.; Chen, M.; Jiang, H. O-Net: A novel framework with deep fusion of CNN and transformer for simultaneous segmentation and classification. Front. Neurosci. 2022, 16, 876065. [Google Scholar] [CrossRef] [PubMed]
- Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057. [Google Scholar] [CrossRef]
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018, Proceedings; Springer Nature: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
- Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. Doubleu-net: A deep convolutional neural network for medical image segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 28–30 July 2020; pp. 558–564. [Google Scholar]
- Shen, J.; Tao, Y.; Guan, H.; Zhen, H.; He, L.; Dong, T.; Wang, S.; Chen, Y.; Chen, Q.; Liu, Z. Clinical Validation and Treatment Plan Evaluation Based on Autodelineation of the Clinical Target Volume for Prostate Cancer Radiotherapy. Technol. Cancer Res. Treat. 2023, 22, 15330338231164883. [Google Scholar] [CrossRef] [PubMed]
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
- Farooq, M.; Hafeez, A. Covid-resnet: A deep learning framework for screening of COVID-19 from radiographs. arXiv 2020, arXiv:2003.14395. [Google Scholar]
- Shehab, L.H.; Fahmy, O.M.; Gasser, S.M.; El-Mahallawy, M.S. An efficient brain tumor image segmentation based on deep residual networks (ResNets). J. King Saud Univ.-Eng. Sci. 2021, 33, 404–412. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Iandola, F.; Moskewicz, M.; Karayev, S.; Girshick, R.; Darrell, T.; Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids. arXiv 2014, arXiv:1404.1869. [Google Scholar]
- Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 11–19. [Google Scholar]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; pp. 272–284. [Google Scholar]
- Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584. [Google Scholar]
- Dalmaz, O.; Yurt, M.; Çukur, T. ResViT: Residual vision transformers for multimodal medical image synthesis. IEEE Trans. Med. Imaging 2022, 41, 2598–2614. [Google Scholar] [CrossRef] [PubMed]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
- Li, Z.; Li, D.; Xu, C.; Wang, W.; Hong, Q.; Li, Q.; Tian, J. TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation. In Artificial Neural Networks and Machine Learning–ICANN 2022: 31st International Conference on Artificial Neural Networks, Bristol, UK, September 6–9, 2022, Proceedings; Part IV; Springer: Cham, Switzerland, 2022; pp. 781–792. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Valanarasu, J.M.J.; Oza, P.; Hacihaliloglu, I.; Patel, V.M. Medical transformer: Gated axial-attention for medical image segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I; Springer: Cham, Switzerland, 2021; pp. 36–46. [Google Scholar]
- Dai, Y.; Gao, Y.; Liu, F. Transmed: Transformers advance multi-modal medical image classification. Diagnostics 2021, 11, 1384. [Google Scholar] [CrossRef]
- Wang, Z.; Min, X.; Shi, F.; Jin, R.; Nawrin, S.S.; Yu, I.; Nagatomi, R. SMESwin Unet: Merging CNN and Transformer for Medical Image Segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, 18–22 September 2022, Proceedings, Part V; Springer: Cham, Switzerland, 2022; pp. 517–526. [Google Scholar]
- Khairandish, M.O.; Sharma, M.; Jain, V.; Chatterjee, J.M.; Jhanjhi, N. A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 2022, 43, 290–299. [Google Scholar] [CrossRef]
- Pham, Q.-D.; Nguyen-Truong, H.; Phuong, N.N.; Nguyen, K.N.; Nguyen, C.D.; Bui, T.; Truong, S.Q. Segtransvae: Hybrid cnn-transformer with regularization for medical image segmentation. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022; pp. 1–5. [Google Scholar]
- Dastider, A.G.; Sadik, F.; Fattah, S.A. An integrated autoencoder-based hybrid CNN-LSTM model for COVID-19 severity prediction from lung ultrasound. Comput. Biol. Med. 2021, 132, 104296. [Google Scholar] [CrossRef]
- Yu, Z.; Lee, F.; Chen, Q. HCT-net: Hybrid CNN-transformer model based on a neural architecture search network for medical image segmentation. Appl. Intell. 2023, 53, 19990–20006. [Google Scholar] [CrossRef]
- Sun, Q.; Fang, N.; Liu, Z.; Zhao, L.; Wen, Y.; Lin, H. HybridCTrm: Bridging CNN and transformer for multimodal brain image segmentation. J. Healthc. Eng. 2021, 2021, 7467261. [Google Scholar] [CrossRef] [PubMed]
- Sangeetha, S.; Mathivanan, S.K.; Karthikeyan, P.; Rajadurai, H.; Shivahare, B.D.; Mallik, S.; Qin, H. An enhanced multimodal fusion deep learning neural network for lung cancer classification. Syst. Soft Comput. 2024, 6, 200068. [Google Scholar]
- Sharif, M.I.; Li, J.P.; Khan, M.A.; Kadry, S.; Tariq, U. M3BTCNet: Multi model brain tumor classification using metaheuristic deep neural network features optimization. Neural Comput. Appl. 2022, 36, 95–110. [Google Scholar] [CrossRef]
- Haque, R.; Hassan, M.M.; Bairagi, A.K.; Shariful Islam, S.M. NeuroNet19: An explainable deep neural network model for the classification of brain tumors using magnetic resonance imaging data. Sci. Rep. 2024, 14, 1524. [Google Scholar] [CrossRef]
- Swain, A.K.; Swetapadma, A.; Rout, J.K.; Balabantaray, B.K. Classification of non-small cell lung cancer types using sparse deep neural network features. Biomed. Signal Process. Control 2024, 87, 105485. [Google Scholar] [CrossRef]
- Morais, M.; Calisto, F.M.; Santiago, C.; Aleluia, C.; Nascimento, J.C. Classification of breast cancer in Mri with multimodal fusion. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 18–21 April 2023; pp. 1–4. [Google Scholar]
- Kaya, M. Feature fusion-based ensemble CNN learning optimization for automated detection of pediatric pneumonia. Biomed. Signal Process. Control 2024, 87, 105472. [Google Scholar] [CrossRef]
- Abrantes, J.; Bento e Silva, M.J.N.; Meneses, J.P.; Oliveira, C.; Calisto, F.M.G.F.; Filice, R.W. External validation of a deep learning model for breast density classification. ECR 2023. [CrossRef]
- Diogo, P.; Morais, M.; Calisto, F.M.; Santiago, C.; Aleluia, C.; Nascimento, J.C. Weakly-Supervised Diagnosis and Detection of Breast Cancer Using Deep Multiple Instance Learning. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 18–21 April 2023; pp. 1–4. [Google Scholar]
- Han, Q.; Qian, X.; Xu, H.; Wu, K.; Meng, L.; Qiu, Z.; Weng, T.; Zhou, B.; Gao, X. DM-CNN: Dynamic Multi-scale Convolutional Neural Network with uncertainty quantification for medical image classification. Comput. Biol. Med. 2024, 168, 107758. [Google Scholar] [CrossRef]
- He, Y.; Gao, Z.; Li, Y.; Wang, Z. A lightweight multi-modality medical image semantic segmentation network base on the novel UNeXt and Wave-MLP. Comput. Med. Imaging Graph. 2024, 111, 102311. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Qureshi, S.A.; Raza, S.E.A.; Hussain, L.; Malibari, A.A.; Nour, M.K.; Rehman, A.u.; Al-Wesabi, F.N.; Hilal, A.M. Intelligent ultra-light deep learning model for multi-class brain tumor detection. Appl. Sci. 2022, 12, 3715. [Google Scholar] [CrossRef]
- Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T.-S. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv 2017, arXiv:1708.04617. [Google Scholar]
- Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247. [Google Scholar]
- Wang, R.; Fu, B.; Fu, G.; Wang, M. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, Halifax, NS, Canada, 13–17 August 2017; pp. 1–7. [Google Scholar]
- Watanabe, S.; Hori, T.; Karita, S.; Hayashi, T.; Nishitoba, J.; Unno, Y.; Soplin, N.E.Y.; Heymann, J.; Wiesner, M.; Chen, N. Espnet: End-to-end speech processing toolkit. arXiv 2018, arXiv:1804.00015. [Google Scholar]
- Pratap, V.; Hannun, A.; Xu, Q.; Cai, J.; Kahn, J.; Synnaeve, G.; Liptchinsky, V.; Collobert, R. Wav2letter++: A fast open-source speech recognition system. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 6460–6464. [Google Scholar]
- Dai, J.J.; Ding, D.; Shi, D.; Huang, S.; Wang, J.; Qiu, X.; Huang, K.; Song, G.; Wang, Y.; Gong, Q. Bigdl 2.0: Seamless scaling of ai pipelines from laptops to distributed cluster. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21439–21446. [Google Scholar]
- Thongprayoon, C.; Kaewput, W.; Kovvuru, K.; Hansrivijit, P.; Kanduri, S.R.; Bathini, T.; Chewcharat, A.; Leeaphorn, N.; Gonzalez-Suarez, M.L.; Cheungpasitporn, W. Promises of big data and artificial intelligence in nephrology and transplantation. J. Clin. Med. 2020, 9, 1107. [Google Scholar] [CrossRef]
- Jayasinghe, W.L.P.; Deo, R.C.; Ghahramani, A.; Ghimire, S.; Raj, N. Deep multi-stage reference evapotranspiration forecasting model: Multivariate empirical mode decomposition integrated with the boruta-random forest algorithm. IEEE Access 2021, 9, 166695–166708. [Google Scholar] [CrossRef]
- Nazari, E.; Biviji, R.; Roshandel, D.; Pour, R.; Shahriari, M.H.; Mehrabian, A.; Tabesh, H. Decision fusion in healthcare and medicine: A narrative review. Mhealth 2022, 8, 8. [Google Scholar] [CrossRef]
- Santoso, I.B.; Adrianto, Y.; Sensusiati, A.D.; Wulandari, D.P.; Purnama, I.K.E. Ensemble Convolutional Neural Networks with Support Vector Machine for Epilepsy Classification Based on Multi-Sequence of Magnetic Resonance Images. IEEE Access 2022, 10, 32034–32048. [Google Scholar] [CrossRef]
- Liu, N.; Shen, J.; Xu, M.; Gan, D.; Qi, E.-S.; Gao, B. Improved cost-sensitive support vector machine classifier for breast cancer diagnosis. Math. Probl. Eng. 2018, 2018, 3875082. [Google Scholar] [CrossRef]
- Qureshi, A.S.; Roos, T. Transfer learning with ensembles of deep neural networks for skin cancer detection in imbalanced data sets. Neural Process. Lett. 2022, 55, 4461–4479. [Google Scholar] [CrossRef]
- Li, X.; Xiong, H.; Chen, Z.; Huan, J.; Xu, C.-Z.; Dou, D. “In-Network Ensemble”: Deep Ensemble Learning with Diversified Knowledge Distillation. ACM Trans. Intell. Syst. Technol. (TIST) 2021, 12, 1–19. [Google Scholar] [CrossRef]
- Mukherjee, D.; Dhar, K.; Schwenker, F.; Sarkar, R. Ensemble of deep learning models for sleep apnea detection: An experimental study. Sensors 2021, 21, 5425. [Google Scholar] [CrossRef] [PubMed]
- SureshKumar, M.; Perumal, V.; Yuvaraj, G.; Rajasekar, S.J.S. Detection of Pneumonia from Chest X-Ray images using Machine Learning. Concurr. Eng.-Res. Appl. 2022, 30, 325–334. [Google Scholar]
- Cui, W.; Liu, Y.; Li, Y.; Guo, M.; Li, Y.; Li, X.; Wang, T.; Zeng, X.; Ye, C. Semi-supervised brain lesion segmentation with an adapted mean teacher model. In Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, 2–7 June 2019, Proceedings 26; Springer: Cham, Switzerland, 2019; pp. 554–565. [Google Scholar]
- Toda, R.; Oda, M.; Hayashi, Y.; Otake, Y.; Hashimoto, M. Improved method for COVID-19 classification of complex-architecture CNN from chest CT volumes using orthogonal ensemble networks. In Proceedings of the SPIE Medical Imaging, San Diego, CA, USA, 19–24 February 2023; p. 124650D. [Google Scholar]
- Chen, Y.-M.; Chen, Y.J.; Ho, W.-H.; Tsai, J.-T. Classifying chest CT images as COVID-19 positive/negative using a convolutional neural network ensemble model and uniform experimental design method. BMC Bioinform. 2021, 22, 147. [Google Scholar] [CrossRef] [PubMed]
- Thomas, J.B.; KV, S.; Sulthan, S.M.; Al-Jumaily, A. Deep Feature Meta-Learners Ensemble Models for COVID-19 CT Scan Classification. Electronics 2023, 12, 684. [Google Scholar] [CrossRef]
- Liu, S.; Xie, Y.; Jirapatnakul, A.; Reeves, A.P. Pulmonary nodule classification in lung cancer screening with three-dimensional convolutional neural networks. J. Med. Imaging 2017, 4, 041308. [Google Scholar] [CrossRef] [PubMed]
- Bazgir, O.; Ghosh, S.; Pal, R. Investigation of REFINED CNN ensemble learning for anti-cancer drug sensitivity prediction. Bioinformatics 2021, 37, i42–i50. [Google Scholar] [CrossRef]
- Patane, A.; Kwiatkowska, M. Calibrating the classifier: Siamese neural network architecture for end-to-end arousal recognition from ECG. In Machine Learning, Optimization, and Data Science: 4th International Conference, LOD 2018, Volterra, Italy, 13–16 September 2018, Revised Selected Papers 4; Springer: Cham, Switzerland, 2019; pp. 1–13. [Google Scholar]
- Wen, L.; Ye, X.; Gao, L. A new automatic machine learning based hyperparameter optimization for workpiece quality prediction. Meas. Control 2020, 53, 1088–1098. [Google Scholar] [CrossRef]
- Gu, B.; Liu, G.; Zhang, Y.; Geng, X.; Huang, H. Optimizing large-scale hyperparameters via automated learning algorithm. arXiv 2021, arXiv:2102.09026. [Google Scholar]
- Liu, Y.; Li, Q.; Cai, D.; Lu, W. Research on the strategy of locating abnormal data in IOT management platform based on improved modified particle swarm optimization convolutional neural network algorithm. Authorea Prepr. 2023. [CrossRef]
- Ait Amou, M.; Xia, K.; Kamhi, S.; Mouhafid, M. A Novel MRI Diagnosis Method for Brain Tumor Classification Based on CNN and Bayesian Optimization. Healthcare 2022, 10, 494. [Google Scholar] [CrossRef]
- Saeed, T.; Loo, C.K.; Kassim, M.S.S. Ensembles of deep learning framework for stomach abnormalities classification. CMC Comput. Mater. Contin. 2022, 70, 4357–4372. [Google Scholar] [CrossRef]
- AlBahar, A.; Kim, I.; Yue, X. A robust asymmetric kernel function for Bayesian optimization, with application to image defect detection in manufacturing systems. IEEE Trans. Autom. Sci. Eng. 2021, 19, 3222–3233. [Google Scholar] [CrossRef]
- Thavasimani, K.; Srinath, N.K. Hyperparameter optimization using custom genetic algorithm for classification of benign and malicious traffic on internet of things-23 dataset. Int. J. Electr. Comput. Eng. 2022, 12, 4031. [Google Scholar] [CrossRef]
- Ozcan, T.; Basturk, A. Performance improvement of pre-trained convolutional neural networks for action recognition. Comput. J. 2021, 64, 1715–1730. [Google Scholar] [CrossRef]
- Korade, N.B.; Zuber, M. Stock Price Forecasting using Convolutional Neural Networks and Optimization Techniques. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 378–385. [Google Scholar] [CrossRef]
- Ghawi, R.; Pfeffer, J. Efficient hyperparameter tuning with grid search for text categorization using KNN approach with BM25 similarity. Open Comput. Sci. 2019, 9, 160–180. [Google Scholar] [CrossRef]
- Sinha, A.; Khandait, T.; Mohanty, R. A gradient-based bilevel optimization approach for tuning hyperparameters in machine learning. arXiv 2020, arXiv:2007.11022. [Google Scholar]
- Florea, A.-C.; Andonie, R. Weighted random search for hyperparameter optimization. arXiv 2020, arXiv:2004.01628. [Google Scholar] [CrossRef]
- Nayak, D.R.; Padhy, N.; Mallick, P.K.; Bagal, D.K.; Kumar, S. Brain tumour classification using noble deep learning approach with parametric optimization through metaheuristics approaches. Computers 2022, 11, 10. [Google Scholar] [CrossRef]
- Passos, L.A.; Papa, J.P. A metaheuristic-driven approach to fine-tune deep Boltzmann machines. Appl. Soft Comput. 2020, 97, 105717. [Google Scholar] [CrossRef]
- Ergen, T.; Mirza, A.H.; Kozat, S.S. Energy-Efficient LSTM Networks for Online Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3114–3126. [Google Scholar] [CrossRef] [PubMed]
- Mujahid, M.; Rustam, F.; Álvarez, R.; Luis Vidal Mazón, J.; Díez, I.d.l.T.; Ashraf, I. Pneumonia Classification from X-ray Images with Inception-V3 and Convolutional Neural Network. Diagnostics 2022, 12, 1280. [Google Scholar] [CrossRef] [PubMed]
- Subramanian, B.; Muthusamy, S.; Thangaraj, K.; Panchal, H.; Kasirajan, E.; Marimuthu, A.; Ravi, A. A new method for detection and classification of melanoma skin cancer using deep learning based transfer learning architecture models. Res. Sq. 2022, preprint. [Google Scholar] [CrossRef]
- Gaur, L.; Bhatia, U.; Jhanjhi, N.; Muhammad, G.; Masud, M. Medical image-based detection of COVID-19 using deep convolution neural networks. Multimed. Syst. 2021, 29, 1729–1738. [Google Scholar] [CrossRef] [PubMed]
- Suresh, V.; Janik, P.; Rezmer, J.; Leonowicz, Z. Forecasting solar PV output using convolutional neural networks with a sliding window algorithm. Energies 2020, 13, 723. [Google Scholar] [CrossRef]
- Bhandari, N.; Khare, S.; Walambe, R.; Kotecha, K. Comparison of machine learning and deep learning techniques in promoter prediction across diverse species. PeerJ Comput. Sci. 2021, 7, e365. [Google Scholar] [CrossRef] [PubMed]
- Kumar, A.; Kim, J.; Lyndon, D.; Fulham, M.; Feng, D. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE J. Biomed. Health Inform. 2016, 21, 31–40. [Google Scholar] [CrossRef] [PubMed]
- Cifci, M.A.; Hussain, S.; Canatalay, P.J. Hybrid Deep Learning Approach for Accurate Tumor Detection in Medical Imaging Data. Diagnostics 2023, 13, 1025. [Google Scholar] [CrossRef]
- Kalantar, R.; Lin, G.; Winfield, J.M.; Messiou, C.; Lalondrelle, S.; Blackledge, M.D.; Koh, D.-M. Automatic segmentation of pelvic cancers using deep learning: State-of-the-art approaches and challenges. Diagnostics 2021, 11, 1964. [Google Scholar] [CrossRef]
- Li, J.; Han, D.; Wang, X.; Yi, P.; Yan, L.; Li, X. Multi-sensor medical-image fusion technique based on embedding bilateral filter in least squares and salient detection. Sensors 2023, 23, 3490. [Google Scholar] [CrossRef] [PubMed]
- Boikos, C.; Imran, M.; De Lusignan, S.; Ortiz, J.R.; Patriarca, P.A.; Mansi, J.A. Integrating Electronic Medical Records and Claims Data for Influenza Vaccine Research. Vaccines 2022, 10, 727. [Google Scholar] [CrossRef] [PubMed]
- Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar] [CrossRef] [PubMed]
- Yoo, J.; Kang, S. Class-Adaptive Data Augmentation for Image Classification. IEEE Access 2023, 11, 26393–26402. [Google Scholar] [CrossRef]
- Takahashi, R.; Matsubara, T.; Uehara, K. Data augmentation using random image cropping and patching for deep CNNs. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2917–2931. [Google Scholar] [CrossRef]
- Alkhairi, P.; Windarto, A.P. Classification Analysis of Back propagation-Optimized CNN Performance in Image Processing. J. Syst. Eng. Inf. Technol. (JOSEIT) 2023, 2, 8–15. [Google Scholar]
- Feshawy, S.; Saad, W.; Shokair, M.; Dessouky, M. Proposed Approaches for Brain Tumors Detection Techniques Using Convolutional Neural Networks. Int. J. Telecommun. 2022, 2, 1–14. [Google Scholar] [CrossRef]
- Alsmirat, M.; Al-Mnayyis, N.; Al-Ayyoub, M.; Asma’A, A.-M. Deep learning-based disk herniation computer aided diagnosis system from MRI axial scans. IEEE Access 2022, 10, 32315–32323. [Google Scholar] [CrossRef]
- Wei, R.; Zhou, F.; Liu, B.; Bai, X.; Fu, D.; Li, Y.; Liang, B.; Wu, Q. Convolutional neural network (CNN) based three dimensional tumor localization using single X-ray projection. IEEE Access 2019, 7, 37026–37038. [Google Scholar] [CrossRef]
- Gowdra, N.; Sinha, R.; MacDonell, S. Examining and mitigating kernel saturation in convolutional neural networks using negative images. In Proceedings of the IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 465–470. [Google Scholar]
- Van Eck, N.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef]
- Yu, Y.; Li, Y.; Zhang, Z.; Gu, Z.; Zhong, H.; Zha, Q.; Yang, L.; Zhu, C.; Chen, E. A bibliometric analysis using VOSviewer of publications on COVID-19. Ann. Transl. Med. 2020, 8, 816. [Google Scholar] [CrossRef]
- Wickham, H. ggplot2. Wiley Interdiscip. Rev. Comput. Stat. 2011, 3, 180–185. [Google Scholar] [CrossRef]
- Islam, M.Z.; Islam, M.M.; Asraf, A. A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inform. Med. Unlocked 2020, 20, 100412. [Google Scholar] [CrossRef] [PubMed]
- Munir, K.; Elahi, H.; Ayub, A.; Frezza, F.; Rizzi, A. Cancer diagnosis using deep learning: A bibliographic review. Cancers 2019, 11, 1235. [Google Scholar] [CrossRef] [PubMed]
- Abdou, M.A. Literature review: Efficient deep neural networks techniques for medical image analysis. Neural Comput. Appl. 2022, 34, 5791–5812. [Google Scholar] [CrossRef]
- Yao, X.; Wang, X.; Wang, S.-H.; Zhang, Y.-D. A comprehensive survey on convolutional neural network in medical image analysis. Multimed. Tools Appl. 2020, 81, 41361–41405. [Google Scholar] [CrossRef]
- Summers, R. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [Google Scholar]
- Abbas, A.; Abdelsamea, M.M.; Gaber, M.M. Detrac: Transfer learning of class decomposed medical images in convolutional neural networks. IEEE Access 2020, 8, 74901–74913. [Google Scholar] [CrossRef]
- Xu, L.; Huang, J.; Nitanda, A.; Asaoka, R.; Yamanishi, K. A novel global spatial attention mechanism in convolutional neural network for medical image classification. arXiv 2020, arXiv:2007.15897. [Google Scholar]
- Khan, A.H.; Abbas, S.; Khan, M.A.; Farooq, U.; Khan, W.A.; Siddiqui, S.Y.; Ahmad, A. Intelligent model for brain tumor identification using deep learning. Appl. Comput. Intell. Soft Comput. 2022, 2022, 8104054. [Google Scholar] [CrossRef]
- Mahjoubi, M.A.; Hamida, S.; El Gannour, O.; Cherradi, B.; El Abbassi, A.; Raihani, A. Improved Multiclass Brain Tumor Detection using Convolutional Neural Networks and Magnetic Resonance Imaging. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 406–414. [Google Scholar] [CrossRef]
- Pham, C.-H.; Tor-Díez, C.; Meunier, H.; Bednarek, N.; Fablet, R.; Passat, N.; Rousseau, F. Multiscale brain MRI super-resolution using deep 3D convolutional networks. Comput. Med. Imaging Graph. 2019, 77, 101647. [Google Scholar] [CrossRef]
- Papandrianos, N.; Papageorgiou, E.; Anagnostis, A.; Feleki, A. A deep-learning approach for diagnosis of metastatic breast cancer in bones from whole-body scans. Appl. Sci. 2020, 10, 997. [Google Scholar] [CrossRef]
- Serte, S.; Serener, A.; Al-Turjman, F. Deep learning in medical imaging: A brief review. Trans. Emerg. Telecommun. Technol. 2022, 33, e4080. [Google Scholar] [CrossRef]
- Ahmed, M.; Du, H.; AlZoubi, A. An ENAS based approach for constructing deep learning models for breast cancer recognition from ultrasound images. arXiv 2020, arXiv:2005.13695. [Google Scholar]
- Kugunavar, S.; Prabhakar, C. Convolutional neural networks for the diagnosis and prognosis of the coronavirus disease pandemic. Vis. Comput. Ind. Biomed. Art 2021, 4, 12. [Google Scholar] [CrossRef] [PubMed]
- Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
- Singh, S.P.; Wang, L.; Gupta, S.; Goli, H.; Padmanabhan, P.; Gulyás, B. 3D deep learning on medical images: A review. Sensors 2020, 20, 5097. [Google Scholar] [CrossRef] [PubMed]
- Agrawal, T.; Gupta, R.; Narayanan, S. On evaluating CNN representations for low resource medical image classification. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1363–1367. [Google Scholar]
- Tran, D.T.; Iosifidis, A.; Gabbouj, M. Improving efficiency in convolutional neural networks with multilinear filters. Neural Netw. 2018, 105, 328–339. [Google Scholar] [CrossRef]
- Hegde, K.; Agrawal, R.; Yao, Y.; Fletcher, C.W. Morph: Flexible acceleration for 3d cnn-based video understanding. In Proceedings of the 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 20–24 October 2018; pp. 933–946. [Google Scholar]
- Hasenstab, K.A.; Huynh, J.; Masoudi, S.; Cunha, G.M.; Pazzani, M.; Hsiao, A. Feature Interpretation using Generative Adversarial Networks (FIGAN): A Framework for Visualizing a CNN’s Learned Features. IEEE Access 2023, 11, 5144–5160. [Google Scholar] [CrossRef]
- Fielding, B.; Lawrence, T.; Zhang, L. Evolving and ensembling deep CNN architectures for image classification. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
CNN Design | Description | Specific Application |
---|---|---|
U-net [28] U-net++ [29] | U-shaped network design or a nested U-net architecture. | For the segmentation of medical images [30,31,32]. |
attention U-net [33] | Attention gate (AG) model. | Automatically learns to focus on structures of varying sizes and shapes. |
ResNet [34,35,36] | A deep residual learning network (a shortcut connection model to significantly reduce the difficulty of training very deep CNNs). | Aims to simplify very deep networks by introducing a residual block that sums two input signals. |
FC-DenseNet [37,38] | Fully convolutional DenseNet developed by the composition of dense blocks and pooling operations in which the up-sampling path was introduced to restore the input resolution. | For semantic image segmentation. |
ViT [39,40] | Vision transformer. | For image segmentation. |
Swin Transformer [41] | Hierarchical vision transformer using shifted windows (uses a sliding window to limit self-attention calculations to non-overlapping partial windows). | Serves as a general-purpose backbone for medical image segmentation and classification. |
Swin UNETR [42] UNETR [43] | Shifted windows UNet transformers (Swin UNet Transformers): pretrained, large-scale and self-supervised 3D models for data annotation (tailored for 3D segmentation and directly use volumetric data). | Pretrained framework tailored for self-supervised tasks in 3D medical image analysis. |
ResViT [44] | Residual vision transformers. | Generative adversarial network for multi-modal medical image synthesis. |
TransUNet [45] | Embeds transformers in the down-sampling path to extract the information in the original image. | Addresses the lack of high-level detail. |
TFCNs [46] | Transformers for fully convolutional DenseNet. | To tackle the problem of high-precision medical image segmentation by introducing a ResLinear-transformer and convolutional linear attention block to FC-DenseNet. |
SETR [47] | Segmentation transformer. | A pure transformer (without convolution and resolution reduction) to encode an image as a sequence of patches. |
Deformable DETR [48] | Fully end-to-end object detector with a simple architecture combining a CNN backbone with a transformer encoder–decoder. | Mitigates the slow convergence and high complexity issues of DETR. |
Medical Transformer [49] | Gated axial attention for medical image segmentation. | Operates on the whole image and patches to learn global and local features. |
O-Net [27] | Framework with deep fusion of CNN and transformer. | For simultaneous segmentation and classification. |
TransMed [50] | Combines CNN and transformer to efficiently extract low-level features of images. | Multi-modal medical image classification. |
SMESwin Unet [51] | Superpixel and MCCT-based channel-wise cross-fusion transformer (CCT) coupled with multi-scale semantic features and attention maps (Swin UNet). | For medical image segmentation. |
CNN-SVM hybrid [52] | Threshold segmentation approach. | For tumor detection and classification of MRI brain images. |
SegTransVAE [53] | Hybrid CNN–transformer with regularization. | For medical image segmentation. |
autoencoder-hybrid CNN-LSTM model [54] | Hybrid of CNN with RNN. | For COVID-19 severity prediction from lung ultrasounds. |
HCT-Net [55] | Hybrid CNN–transformer model based on a neural architecture search network. | A neural architecture search network for medical image segmentation. |
HybridCTrm [56] | Bridging CNNs and transformers. | For multimodal image segmentation. |
MFDNN [57] | MFDNN (multimodal fusion deep neural network) integrates different modalities (medical imaging, genomics, clinical data) to enhance lung cancer diagnostic accuracy. | Used for lung cancer classification by integrating clinical data, electronic health records and multimodal approaches (to improve the accuracy and reliability of lung cancer diagnosis). |
M3BTCNet [58] | This architecture uses metaheuristic optimization of deep neural network features. | For multimodal brain tumor classification. |
NeuroNet19 [59] | Uses VGG19 as its backbone and incorporates an inverted pyramid pooling module (iPPM) to capture multi-scale feature maps (to extract both local and global image contexts). Local interpretable model-agnostic explanations (LIME) are used to highlight the features or areas focused on while predicting individual images. | An explainable deep neural network model for the classification of brain tumors (using MRI data). |
sparse deep neural network features [60] | Designed based on dense neural networks (VGG-16 and ResNet-50) and sparse neural networks (Inception v3). | For the detection and classification of non-small cell lung cancer types. |
3D CNN Multimodal Framework [61] | The framework comprises a 3D CNN for each modality, whose predictions are then combined using a late fusion strategy based on Dempster–Shafer theory. | Classification of MRI images with multimodal fusion. This multimodal framework processes all the available MRI data in order to reach a diagnosis. |
Feature fusion-based ensemble CNN learning optimization [62] | An ensemble CNN framework incorporating optimal feature fusion: multiple CNN models with different architectures are trained on the dataset using fine-tuning and transfer learning techniques. | For the automated detection of pneumonia. Learning optimization is achieved by iteratively eliminating irrelevant features from the fully connected layer of each CNN model using chi-square and mRMR methods. Optimal feature sets are then concatenated to enhance feature vector diversity for classification. |
External validation of a deep learning model [63] | This model is based on ResNet-18 to automatically assess the mammographic breast density (for each mammogram), providing a quantitative measure of the breast tissue composition. | For breast density classification. |
Weakly Supervised Deep Multiple Instance Learning [64] | This is a two-stage framework based on deep multiple instance learning. It requires only global labels (weak supervision). | For diagnosis and detection of breast cancer. This approach provides classification of the whole volume and of each slice and the 3D localization of lesions through heatmaps. |
DM-CNN [65] | Dynamic multi-scale CNN containing four sub-modules: a dynamic multi-scale feature fusion module (DMFF), hierarchical dynamic uncertainty-quantifying attention (HDUQ-Attention), a multi-scale fusion pooling method (MF Pooling) and a multi-objective loss (MO loss). | For medical image classification with uncertainty quantification. DMFF selects convolution kernels according to the feature maps of each level for information fusion. HDUQ-Attention has a tuning block that adjusts the attention weight according to the information of each layer, plus a Monte Carlo (MC) dropout structure for quantifying uncertainty. MF Pooling speeds up computation and prevents overfitting, and the MO loss provides fast optimization and a good classification effect. |
lightweight multi-modality UNeXt and Wave-MLP semantic segmentation network [66] | The wave block module in Wave-MLP replaces the Tok-MLP module in UNeXt. The phase term in the wave block can dynamically aggregate tokens to improve segmentation accuracy. An attention gate (AG) module at the skip connection suppresses irrelevant feature representations, and the focal Tversky loss is added to handle both binary and multi-class classification tasks. | For multi-modality medical image semantic segmentation. |
MobileNets [67] | Efficient CNNs based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. | For mobile and embedded vision applications. Use cases include object detection, fine-grained classification, face attributes and large-scale geo-localization. |
UL-BTD [68] | An automated ultra-light brain tumor detection (UL-BTD) system based on an ultra-light deep learning architecture (UL-DLA) for deep features, integrated with highly distinctive textural features extracted by a gray-level co-occurrence matrix. | For multiclass brain tumor detection. It forms a hybrid feature space for tumor detection using a support vector machine, leading to high prediction accuracy and optimal false negatives with a limited network size that fits within the average GPU resources of a modern PC. |
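Several designs in the table above (ResNet, TransUNet, SegTransVAE) build on the residual shortcut that sums a block's transformed output with its input. The following is a minimal NumPy sketch of that summation, assuming a single-channel feature map and a single 3 × 3 kernel; `conv2d` and `residual_block` are illustrative names, not taken from any cited implementation:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def residual_block(x, kernel):
    """ResNet-style block: output is F(x) + x (identity shortcut).
    'Same' padding keeps the spatial size so the two signals can be summed."""
    pad = kernel.shape[0] // 2
    xp = np.pad(x, pad)                       # zero-pad to preserve size
    fx = np.maximum(conv2d(xp, kernel), 0.0)  # convolution followed by ReLU
    return fx + x                             # the shortcut connection
```

Because the padding preserves the spatial size, F(x) and x can be added element-wise; this is the property the ResNet row refers to when it describes "a residual block that sums two input signals".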
Activation Function | Equation | Graphical Representation * | Short Description |
---|---|---|---|
Rectified Linear Unit (ReLU) | ReLU(x) = max(0, x) i.e., ReLU(x) = {0, if x ≤ 0, x, if x > 0} | Computationally efficient and helps alleviate the vanishing gradient problem, allowing for faster training and improved network performance. | |
Leaky ReLU | Leaky ReLU(x) = max(alpha × x, x) = {x, if x > 0, alpha × x, if x ≤ 0} x is the input, and alpha is a small positive constant (determines the slope for negative input values). | alpha = 0.1 | Addresses the issue of “dead neurons” by allowing small negative values instead of setting them to zero; provides some gradient flow for negative inputs during backpropagation. |
Parametric ReLU (PReLU) | PReLU(x) = {x, if x > 0, alpha × x, if x ≤ 0}; Alpha is a parameter that can be learned during the training process (controls the slope for negative input values). | Similar to leaky ReLU (although alpha here is a parameter to be learned and optimized). | During training, the alpha parameter is updated through backpropagation, enabling the network to learn the optimal value for each neuron. Adjusting the slope for negative inputs can lead to improved performance and better representation learning. |
Randomized Leaky ReLU (RReLU) | RReLU(x) = {x, if x > 0; a × x, if x ≤ 0}, where the slope a is randomly sampled from a uniform distribution during training and fixed to a predefined value during testing. This introduces a form of regularization and can help prevent overfitting. | Similar to leaky ReLU. | A variation of leaky ReLU that randomly samples the slope from a uniform distribution during training. |
Exponential Linear Unit (ELU) | ELU(x) = {x, if x > 0, alpha × (exp(x) − 1), if x ≤ 0} Alpha is a hyperparameter (controls the behavior of the function); ELU captures more nuanced information from negative inputs and alleviates the vanishing gradient problem. | alpha = 1.0 | Smooths negative inputs by using an exponential function; the exponential smoothing helps reduce the impact of noisy activations. |
Scaled Exponential Linear Unit (SELU) | SELU(x) = λ × {x, if x > 0; alpha × (exp(x) − 1), if x ≤ 0} λ (the scale) and alpha are predefined constants (λ ≈ 1.0507, alpha ≈ 1.6733) chosen to keep the mean and variance of the activations close to 0 and 1, respectively. SELU has the property of self-normalization, which can lead to improved performance and stability in deep neural networks. | SELU, by adjusting the mean and variance, takes care of internal normalization. Gradients can be used to adjust the variance (needs a region with a gradient > 1 to increase it). | SELU applies a λ-scaled ELU: negative inputs are transformed with an exponential slope, and the scale factor λ stabilizes the activations and ensures self-normalization. The mean and standard deviation of the outputs are driven toward approximately 0 and 1, respectively, which helps address the vanishing/exploding gradient problem. |
Swish | SWISH (x) = x × sigmoid(beta × x) Beta is a hyperparameter that controls the behavior of the function. Higher values of beta can lead to more pronounced non-linearity, while lower values can make it closer to the identity function. | beta = 0.5 | Combines the linearity of the identity function (x) with the non-linearity of the sigmoid function (for positive inputs: retains the linearity; for negative inputs, the output towards zero is dampened due to the sigmoid function). It performs well in CNNs. |
SWISH-RELU | SWISH-RELU(x) = x × sigmoid(beta × x) if x > 0 SWISH-RELU(x) = x if x ≤ 0 The advantage of SWISH-RELU is that it retains the desirable properties of Swish, such as the smoothness and non-monotonic behavior, while also providing a fallback to ReLU for negative inputs. This fallback mitigates the problem of dead neurons and vanishing gradients associated with the standard Swish activation function. | beta = 0.1 | The Swish activation function with a ReLU fallback is a Swish and ReLU hybrid. The sigmoid introduces a smooth non-linearity, while the ReLU fallback ensures that the activation does not completely vanish for negative inputs. SWISH-RELU performs well in CNNs for image classification. |
Gaussian Error Linear Unit (GELU) | GELU(x) = 0.5x × (1 + erf(x/sqrt(2))) This is smooth and non-monotonic. x is the input and erf is the error function used to model cumulative distribution. | erf = 0.3 | GELU has a smooth and non-linear behavior that can help capture complex patterns and gradients; it performs well in NLP and CNNs. It is computationally more expensive than ReLU due to the involvement of erf but improves the performance in certain scenarios. |
Softmax | Given an input vector of x = [x1, x2, …, xn], the Softmax function computes the probability pi for each element xi as: Softmax(xi) = exp(xi)/sum(exp(xj)) for j = 1 to n The highest probability class is selected as the predicted class label. | Boundaries vary based on the xi and xj values. | Used as the final activation function in the output layer for multi-class classification tasks (takes a vector of real numbers inputs and outputs a vector of probabilities between 0 and 1 that sum up to 1). Enables the network to assign probabilities to each class, indicating the model’s confidence for each class prediction. |
Hyperbolic Tangent (Tanh) | tanh(x) = (exp(x) − exp(−x))/(exp(x) + exp(−x)) A non-linear function symmetric around the origin (squeezes the input value into a range between −1 and 1). | Useful for tasks that require outputs in the range of −1 to 1 or for modeling symmetric patterns. Suffers from the “vanishing gradient” problem, where the gradient becomes extremely small for inputs with very high absolute values. | |
Sigmoid (logistic) | sigmoid(x) = 1/(1 + exp(−x)) A non-linear function that squeezes the input value into a range between 0 and 1. Suffers from the “vanishing gradient” problem, where the gradient becomes extremely small for inputs with very high or very low absolute values. | Maps any real-valued number to a value between 0 and 1, with values close to 0 representing the lower end of the range and values close to 1 representing the upper end (suitable for binary classification tasks or probabilistic outputs). | |
Softplus | Softplus(x) = log(1 + exp(x)) Designed to be a smooth and differentiable approximation of the ReLU function, which is non-differentiable at x = 0. Commonly used in variational autoencoders (VAEs) and some recurrent neural networks (RNNs). | Has similar properties to ReLU, where positive inputs are passed through unchanged, while negative inputs are mapped to small positive values. It introduces non-linearity to the network, allows for the modeling of complex patterns, and provides smoother gradients than ReLU (facilitates better training and convergence). | |
Mish | Mish(x) = x × tanh(softplus(x)) Mish does not have a closed-form derivative and is often approximated or numerically computed during backpropagation. It introduces non-linear behavior, captures complex patterns and alleviates the vanishing gradient problem. | Performs well in image classification and NLP. | Mish combines the non-linearity of the softplus function with the smoothness of the hyperbolic tangent function (has a similar shape to the Swish activation function but with a gentler slope for negative inputs). |
Inverse Square Root Unit (ISRU) | ISRU(x, alpha) = x/sqrt(1 + alpha × x^2) Alpha is a positive constant that determines the steepness and shape of the ISRU function; a larger alpha value results in a steeper curve, while a smaller alpha value leads to a more gradual curve. The square root and normalization ensure that the output remains within a reasonable range. | alpha = 0.5 | ISRU is used as an alternative to sigmoid or tanh in situations where a more gradual transition from low to high activations is desired, but it is not widely used in deep learning models. |
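Most activation functions in the table reduce to a few lines of array code. The NumPy sketch below is illustrative rather than a reference implementation; the alpha and beta defaults follow the values annotated in the table's plots:

```python
import numpy as np

def relu(x):
    # max(0, x): zero for non-positive inputs, identity otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # small negative slope alpha avoids "dead neurons"
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential smoothing of negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=0.5):
    # x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

def softmax(x):
    # subtract the max for numerical stability; output sums to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()
```

For example, `softmax(np.array([1.0, 2.0, 3.0]))` yields a probability vector summing to 1, as required for the multi-class output layer described in the Softmax row.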
Framework * (Repository) | Developed (Maintained) | Short Description | Numbers of Hits for Each Search Engine | ||
---|---|---|---|---|---|
Google Scholar | PubMed | IEEE Xplore | |||
TensorFlow | Google Brain Team | An end-to-end machine learning platform. | 231,000 | 364 | 2154 |
PyTorch | Meta AI | Based on the Torch library. | 82,600 | 196 | 650 |
Theano | Montreal Institute for Learning Algorithms | Allows the definition, optimization, and efficient evaluation of mathematical expressions involving multi-dimensional arrays. | 29,600 | 23 | 69 |
Keras | François Chollet | Provides a Python interface for ANNs, e.g., TensorFlow. | 697,000 | 182 | 893 |
MXNet | Apache Software Foundation | Scalable, allows fast model training and supports multiple programming languages. | 7930 | 5 | 59 |
Caffe/Caffe2 | University of California, Berkeley | A lightweight, modular and scalable deep learning framework. | 7410 | No results | No results
Chainer | Preferred Networks, Inc., Tokyo, Japan | A collection of tools to train and run neural networks for computer vision tasks. | 5700 | 9 | 52 |
CNTK | Microsoft | Describes neural networks as a series of computational steps via a directed graph. | 19,800 | 8 | 28 |
Torchnet | PyTorch TNT | An abstraction to train neural networks (for logging and visualizing, loading and training). | 97 | 61 | No results
JAX | Google | Provides interfaces to compute convolutions across data. | 143,000 | 2692 | 113
EfficientNet | Mingxing Tan and Quoc V. Le (ICML 2019) | A CNN architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient. | 23,700 | 325 | 648
SRnet | Niwhskal/SRNet | A twin discriminator GAN that can edit text in any image while maintaining context of the background, font style and color. | 2960 | 9 | 34 |
LFnet | Learning Local Features from Images | Deep architecture to learn local features and can be trained end-to-end with just a collection of images, from scratch, without hand-crafted priors. | 1150 | 5 | 6 |
Horovod | The Linux Foundation | Distributed deep learning training framework for TensorFlow, Keras, PyTorch and Apache MXNet. | 1840 | No results | 40 |
Attention Factorization Machine (AFM) | Jun Xiao et al. [69] | Learning the weight of feature interactions via attention networks. | 18,500 | No results | 3 |
Neural Factorization Machine NFM-PT | Xiangnan He and Tat-Seng Chua | For sparse predictive analytics (or prediction under sparse settings). | 4480 | 2 | 1 |
Deep Factorization Machine (DeepFM) | Guo, H et al. [70] | Combines the power of factorization machines for recommendation and deep learning for feature learning with no need of feature engineering besides raw features. | 2340 | 3 | 20 |
Deep Cross-Network (DCN) | Wang, R et al. [71] | Applies feature crossing networks at each layer that do not require manual feature engineering, and hence, it is more efficient in learning certain bounded-degree feature interactions. | 581 | 1 | 4 |
Trax | Google Brain Team | An end-to-end library for deep learning that focuses on clear code and speed. | 20,800 | 156 | 38 |
Kaldi | DNN in KALDI | An open-source speech recognition toolkit. | 47,100 | 73 | 174 |
OpenSeq2Seq | NVIDIA/OpenSeq2Seq | A TensorFlow-based toolkit for sequence-to-sequence models. | 127 | No results | No results
ESPNet/ESPNet | Watanabe, S et al. [72] | An end-to-end toolkit for speech processing, recognition and text-to-speech translation. | 3270 | 140 | 30 |
wav2letter++ | Pratap, V et al. [73] | A fast open-source deep learning speech recognition framework. | 4 | No results | 3 |
Elephas | Max Pumperla and Daniel Cahall | An extension of Keras, which allows the running of distributed deep learning models at scale with Spark. | 52,700 | 786 | 3 |
Tfaip/tfaip | Python community | Research framework for developing, organizing, and deploying deep learning models powered by TensorFlow. | 26 | 5 | No results |
BigDL | Dai, J et al. [74] | A distributed deep learning library for Apache Spark (fast, distributed and secure AI for big data). | 455 | 1684 | 10 |
Hyperparameter * | Description |
---|---|
Learning rate | Controls the step size at each iteration during training and influences how quickly the model learns. |
Number of epochs | Determines the number of times the entire dataset is passed through the network during training. |
Batch size | Specifies the number of training examples in each mini-batch used for updating the model’s parameters. |
Batch normalization | A normalization technique that helps stabilize the learning process by normalizing the inputs of each layer. |
Optimizer type | Selects the optimization algorithm used to update the model’s weights based on the computed gradients (e.g., Adam, SGD). |
Loss function | Defines the objective function used to measure the difference between predicted and actual values (e.g., categorical cross-entropy, mean squared error). |
Activation function | Applies non-linearity to the output of a neuron and determines the range of values that can be produced by the layer (e.g., ReLU, sigmoid). |
Dropout rate | Controls the probability of randomly setting a fraction of the input units to 0 during training, reducing overfitting. |
Hyperparameter | Description
---|---
Weight initialization strategy | Determines how the initial weights of the model are set before training begins.
Number of layers | Specifies the depth or the number of layers in the CNN architecture.
Filter/kernel size | Defines the spatial extent of the filters (convolutional kernels) used to scan the input data.
Pooling type | Determines the downsampling operation applied to reduce the spatial dimensions of the feature maps (e.g., max pooling, average pooling).
Pooling size | Specifies the size of the pooling window used for downsampling.
Stride | Defines the step size at which the filter/kernel moves horizontally or vertically when performing convolutions or pooling.
Padding | Determines whether and how extra border pixels are added to the input data before performing convolutions or pooling.
Learning rate decay | Reduces the learning rate over time to allow for finer adjustments during training.
Weight decay | Adds a penalty term to the loss function to discourage large weights, reducing overfitting.
Data augmentation | Applies random transformations to the training data, such as rotation, flipping or zooming, to increase the diversity of examples and improve generalization.
Transfer learning | Uses pre-trained models on large-scale datasets as a starting point for training on a specific task, saving training time and potentially improving performance.
Early stopping | Stops the training process if the validation loss does not improve over a certain number of epochs, preventing overfitting and saving computational resources.
Learning rate schedule | Specifies how the learning rate is adjusted during training, such as by reducing it after a certain number of epochs or based on a predefined schedule.
Initialization of biases | Determines how the biases of the model’s layers are initialized.
Learning rate warm-up | Gradually increases the learning rate at the beginning of training to stabilize the optimization process.
Image normalization | Specifies how the input images are normalized (e.g., mean subtraction, scaling to a certain range).
Network architecture | Defines the overall structure of the CNN model, including the arrangement and types of layers (e.g., VGG, ResNet, Inception).
Number of filters per layer | Determines the depth of the feature maps produced by each convolutional layer.
Dilated convolutions | Allow the network to have a larger receptive field without increasing the number of parameters.
Weight sharing | Shares weights across different parts of the network to reduce the number of parameters and improve generalization.
Learning rate annealing | Gradually decreases the learning rate during training to fine-tune the model’s parameters.
Input image size | Specifies the size of the input images to the CNN model.
Number of convolutional layers | Determines the depth or capacity of the CNN; the appropriate number depends on the task, the size and diversity of the dataset and the computational resources.
Number of fully connected layers | Used to map the high-level features to the desired output. The number of neurons or units in each fully connected layer is another hyperparameter.
Momentum | Used in optimization algorithms (e.g., SGD) and can improve the convergence speed and stability of CNN training by accumulating momentum from past gradients.
Inverted (inverse) dropout | Scales the retained activations during training so that no rescaling is needed at test time, making inference faster.
L2 regularization | Adds a squared-weight penalty to the loss to shrink weights and reduce overfitting (unlike L1 regularization, it does not yield sparse feature representations).
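Several of the schedule-related entries above (learning rate warm-up, decay/annealing and early stopping) can be made concrete with a short sketch. The following is purely illustrative and not drawn from any surveyed paper; the function names and the constants (base_lr = 0.1, warmup_epochs = 5, decay_rate = 0.9, patience = 3) are hypothetical defaults.

```python
def learning_rate(epoch, base_lr=0.1, warmup_epochs=5, decay_rate=0.9):
    """Linear warm-up for the first `warmup_epochs`, then exponential decay."""
    if epoch < warmup_epochs:
        # Warm-up: ramp the rate linearly up to base_lr to stabilize training.
        return base_lr * (epoch + 1) / warmup_epochs
    # Annealing: multiply by decay_rate for every epoch past the warm-up.
    return base_lr * (decay_rate ** (epoch - warmup_epochs + 1))

def should_stop(val_losses, patience=3):
    """Early stopping: halt once the best validation loss is `patience` epochs old."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return (len(val_losses) - 1) - best_epoch >= patience

# Usage
lr_start = learning_rate(0)   # ≈ 0.02 (first warm-up step)
lr_peak = learning_rate(4)    # ≈ 0.1  (warm-up reaches base_lr)
lr_decayed = learning_rate(5) # ≈ 0.09 (first annealed value)
stop = should_stop([1.0, 0.8, 0.9, 0.95, 0.99])  # best loss is 3 epochs old
```

In practice these schedules are supplied by the framework (e.g., learning-rate schedulers and early-stopping callbacks), but the underlying logic matches this sketch.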
Dataset Name (Hyperlinked URL as of 21 January 2024) | Theme | Description |
---|---|---|
Medical Information Mart for Intensive Care III (MIMIC-III) | Critical care | Electronic health records (EHR) of ICU patients, including clinical notes, demographics, lab results and imaging reports |
The Cancer Genome Atlas (TCGA) | Cancer genomics | Images and genomic and clinical data for various cancer types, including gene expression, epigenetic marks, mutations and clinical outcomes. |
NIH Chest X-ray dataset | Chest X-ray imaging | An open dataset of chest X-ray images labeled for common thoracic diseases, including pneumonia and lung cancer, often used for developing and evaluating image classification models. |
Alzheimer’s Disease Neuroimaging Initiative (ADNI) | Neuroimaging (Alzheimer’s disease) | Longitudinal MRI and PET imaging data for Alzheimer’s disease research. |
Diabetic Retinopathy Detection | Ophthalmology | Fundus images for diabetic retinopathy classification, used to develop algorithms for automated disease detection. |
PhysioNet Challenge | Various cardiology and physiological signals | Datasets from PhysioNet challenges cover a variety of themes, including heart rate, blood pressure and electrocardiogram (ECG) signals. |
Multimodal Brain Tumor Segmentation Challenge (BraTS) | Neuroimaging (brain tumor) | MRI images for brain tumor segmentation, challenging researchers to develop algorithms for tumor detection and segmentation. |
UCI Machine Learning Repository-Health Datasets | Various | A collection of health-related datasets covering different topics, including diabetes, heart disease and liver disorders. |
PhysioNet/MIMIC-CXR Database | Chest X-ray imaging | A dataset of chest X-ray images with associated radiology reports, supporting research in chest radiography. |
Skin Cancer Classification-Refugee Initiative (SCC-RI) | Dermatology | A dataset of skin lesion images for skin cancer classification, focusing on refugee populations. |
Federal interagency traumatic brain injury research (FITBIR) | Traumatic brain injury | Imaging, clinical and molecular datasets from traumatic brain injury patients.
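Images drawn from repositories such as these are typically normalized before being fed to a CNN (the "Image normalization" entry in the hyperparameter table). A minimal, illustrative sketch of zero-mean, unit-variance scaling follows; the `normalize` helper and the toy pixel values are hypothetical, and real pipelines compute per-channel statistics over the entire training set.

```python
from statistics import mean, pstdev

def normalize(pixels):
    """Zero-mean, unit-variance scaling of a flat list of pixel intensities.

    Implements the mean-subtraction and scaling steps listed in the
    hyperparameter table.
    """
    mu, sigma = mean(pixels), pstdev(pixels)
    if sigma == 0:
        # Constant image: centering is all that can be done.
        return [p - mu for p in pixels]
    return [(p - mu) / sigma for p in pixels]

# Usage with a toy 2x2 grayscale "image" (hypothetical intensities)
image = [0.0, 64.0, 128.0, 192.0]
normed = normalize(image)  # result has mean 0 and unit standard deviation
```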
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mohammed, F.A.; Tune, K.K.; Assefa, B.G.; Jett, M.; Muhie, S. Medical Image Classifications Using Convolutional Neural Networks: A Survey of Current Methods and Statistical Modeling of the Literature. Mach. Learn. Knowl. Extr. 2024, 6, 699-735. https://doi.org/10.3390/make6010033