An Improved Deep Learning Framework for Multimodal Medical Data Analysis
Abstract
- The article presents an improved deep learning model for diagnosing tuberculosis on a new multimodal medical dataset.
- The article uses a cross-modal transformer approach to fuse the different modalities and provide a unified feature representation for disease diagnosis.
- The results reveal that the proposed approach outperformed traditional approaches available for multimodal medical diagnosis.
1. Introduction
2. Data Set Description
3. Proposed Methodology
3.1. Proposed Framework
3.1.1. Module 1: Clinical Data Processing
3.1.2. Image Data Processing Module
3.1.3. Cross-Modal Transformer
Algorithm 1 Pseudocode for generating enhanced feature vectors for improved classification
3.1.4. Disease Probability Prediction Module
Algorithm 2 Pseudocode for classification using Multi-layer Dense Neural Network
3.2. Model’s Classification Performance Measure
- TP, TN, FP, and FN:
- TP (True Positive): The number of patients with tuberculosis in the original dataset who are correctly identified.
- TN (True Negative): The total number of patients who are not suffering from tuberculosis and are correctly identified as such.
- FP (False Positive): The total number of patients who are not suffering from tuberculosis but whom the model classified as tuberculosis cases.
- FN (False Negative): The number of patients who were suffering from tuberculosis but were not identified and were marked as healthy.
- Prediction Accuracy: Accuracy is a crucial parameter in distinguishing between patient and healthy cases. It is the proportion of correctly classified cases (true positives and true negatives) among all cases, as shown in Equation (15).
- True Positive Rate (TPR) and False Positive Rate (FPR): Also referred to as the sensitivity of a classifier, TPR measures the proportion of correctly identified positive events. FPR, on the other hand, is the probability of false rejection of the null hypothesis. It divides the number of negative events falsely classified as positive by the total number of true negative events. TPR can be calculated as Equation (16), and FPR can be calculated as Equation (17).
- Precision: Precision is a widely used metric for evaluating the performance of a classification algorithm. It measures the exactness of the algorithm’s results, i.e., the proportion of predicted positive cases that are truly positive. Mathematically, precision can be expressed as shown in Equation (18).
- Specificity: The classifier’s specificity refers to its ability to accurately predict negative cases within a given dataset. This can be calculated as a metric of performance as given in Equation (19):
- F-measure (F-score): The F-measure, also known as the F-score, is a metric used to evaluate the accuracy of a classifier test [18]. It is calculated by taking into account both precision and recall, and is defined as the harmonic mean of these two values. The F-score ranges from 0 to 1, with 1 being the ideal value and 0 being the worst. F-score can be calculated as given in Equation (20).
- Matthews correlation coefficient (MCC): The Matthews correlation coefficient (MCC) is a balanced metric used to measure the quality of binary classification when the predicted variable has only two values [19]. In our scenario, the target attribute has two class values: ‘TB’ and ‘NTB’. The MCC is particularly useful when dealing with imbalanced classes. Its value ranges between −1 and +1: +1 indicates a perfect prediction, 0 an average (random) prediction, and −1 total disagreement between prediction and observation. MCC can be calculated as Equation (21).
- Receiver operating characteristic (ROC) curve: The ROC curve is a crucial measure for evaluating classifier accuracy [20]. Originally developed in signal detection theory, it shows the relation between hit and false-alarm rates in the presence of channel noise. Nowadays, ROC analysis is widely used in machine learning as a valuable tool for visualizing classifier performance. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR). To evaluate classifier performance, we calculate the AUC (area under the ROC curve). An AUC value close to 1 indicates excellent performance, while a value less than 0.5 indicates poor performance.
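As a concrete illustration, all of the metrics defined above follow directly from the four confusion-matrix counts. The sketch below is ours, not the authors’ code; the helper name and the example counts are hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics of Equations (15)-(21) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                       # Eq. (15)
    tpr = tp / (tp + fn)                               # Eq. (16): sensitivity / recall
    fpr = fp / (fp + tn)                               # Eq. (17): false-alarm rate
    precision = tp / (tp + fp)                         # Eq. (18)
    specificity = tn / (tn + fp)                       # Eq. (19)
    f_score = 2 * precision * tpr / (precision + tpr)  # Eq. (20): harmonic mean
    mcc = (tp * tn - fp * fn) / math.sqrt(             # Eq. (21): balanced correlation
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy, "tpr": tpr, "fpr": fpr,
        "precision": precision, "specificity": specificity,
        "f_score": f_score, "mcc": mcc,
    }

# Hypothetical counts for a balanced test split
m = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print({k: round(v, 4) for k, v in m.items()})
```

Note that specificity equals 1 − FPR, and the F-score reduces to 2·TP / (2·TP + FP + FN), which is why the TB and NTB rows of the comparison table mirror each other.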
4. Experiments and Results
4.1. Experimental Setup
4.2. Multimodal Data Processing and Fusion
4.3. Ablation Study and Performance Comparison with Existing Models
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Tuberculosis. Available online: https://www.who.int/news-room/fact-sheets/detail/tuberculosis (accessed on 10 December 2023).
- Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep learning-enabled medical computer vision. NPJ Digit. Med. 2021, 4, 5.
- Aiadi, O.; Khaldi, B. A fast lightweight network for the discrimination of COVID-19 and pulmonary diseases. Biomed. Signal Process. Control 2022, 78, 103925.
- Guan, B.; Yao, J.; Zhang, G. An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection. In Neural Computing and Applications; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–14.
- Boulahia, S.Y.; Amamra, A.; Madi, M.R.; Daikh, S. Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition. Mach. Vis. Appl. 2021, 32, 121.
- Pandeya, Y.R.; Lee, J. Deep learning-based late fusion of multimodal information for emotion classification of music video. Multimed. Tools Appl. 2021, 80, 2887–2905.
- Xu, T.; Zhang, H.; Huang, X.; Zhang, S.; Metaxas, D.N. Multimodal deep learning for cervical dysplasia diagnosis. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer: Berlin/Heidelberg, Germany, 2016; pp. 115–123.
- Schulz, S.; Woerl, A.C.; Jungmann, F.; Glasner, C.; Stenzel, P.; Strobl, S.; Fernandez, A.; Wagner, D.C.; Haferkamp, A.; Mildenberger, P.; et al. Multimodal deep learning for prognosis prediction in renal cancer. Front. Oncol. 2021, 11, 788740.
- Vale-Silva, L.A.; Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Sci. Rep. 2021, 11, 13505.
- Joo, S.; Ko, E.S.; Kwon, S.; Jeon, E.; Jung, H.; Kim, J.Y.; Chung, M.J.; Im, Y.H. Multimodal deep learning models for the prediction of pathologic response to neoadjuvant chemotherapy in breast cancer. Sci. Rep. 2021, 11, 18800.
- Steyaert, S.; Pizurica, M.; Nagaraj, D.; Khandelwal, P.; Hernandez-Boussard, T.; Gentles, A.J.; Gevaert, O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat. Mach. Intell. 2023, 5, 351–362.
- Ivanova, O.N.; Melekhin, A.V.; Ivanova, E.V.; Kumar, S.; Zymbler, M.L. Intermediate fusion approach for pneumonia classification on imbalanced multimodal data. Bull. South Ural. State Univ. Ser. Comput. Math. Softw. Eng. 2023, 12, 19–30.
- Kumar, S.; Ivanova, O.; Melyokhin, A.; Tiwari, P. Deep-learning-enabled multimodal data fusion for lung disease classification. Inform. Med. Unlocked 2023, 42, 101367.
- Lu, Z.H.; Yang, M.; Pan, C.H.; Zheng, P.Y.; Zhang, S.X. Multi-modal deep learning based on multi-dimensional and multi-level temporal data can enhance the prognostic prediction for multi-drug resistant pulmonary tuberculosis patients. Sci. One Health 2022, 1, 100004.
- Zhou, H.Y.; Yu, Y.; Wang, C.; Zhang, S.; Gao, Y.; Pan, J.; Shao, J.; Lu, G.; Zhang, K.; Li, W. A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics. Nat. Biomed. Eng. 2023, 7, 743–755.
- Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
- Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127.
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061.
- Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA)-Protein Struct. 1975, 405, 442–451.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
| Clinical Feature | Value Type | Values |
|---|---|---|
| Temperature | Continuous | 97.1–103.3 °F |
| Diastolic blood pressure | Continuous | 50–103 mmHg |
| Systolic blood pressure | Continuous | 100–178 mmHg |
| Blood sugar (fasting) | Continuous | 80–170 mg/dL |
| Heart rate | Continuous | 70–128 bpm |
| Patient’s height | Continuous | 137–181 cm |
| Hemoglobin | Continuous | 6.0–19.3 g/dL |
| Body weight | Continuous | 35–118 kg |
| Pleuritic chest pain | Categorical | Y/N |
| Sputum test | Categorical | P/N |
| Breathlessness | Categorical | Y/N |
| Loss of appetite | Categorical | Y/N |
| Weight loss | Categorical | Y/N |
| Cough (dry/haemoptysis) | Categorical | D/H |
| Malaise | Categorical | Y/N |
| Smoking habit | Categorical | Y/N |
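To make the role of this feature table concrete, clinical records like these could be vectorized by min-max scaling the continuous values over the ranges in the Values column and binary-encoding the categorical flags. This is a hypothetical sketch of such preprocessing; the paper’s actual clinical data processing module is not reproduced here, and the record keys, letter codes, and feature subset below are our own illustration.

```python
# Hypothetical encoding of a clinical record using ranges from the feature table.
CONTINUOUS_RANGES = {  # (min, max) taken from the "Values" column
    "temperature": (97.1, 103.3),   # °F
    "heart_rate": (70, 128),        # bpm
    "hemoglobin": (6.0, 19.3),      # g/dL
}
BINARY_CODES = {"Y": 1, "N": 0, "P": 1, "D": 0, "H": 1}  # illustrative letter codes

def encode_patient(record):
    """Turn one clinical record (dict) into a flat numeric feature vector."""
    vector = []
    for name, (lo, hi) in CONTINUOUS_RANGES.items():
        vector.append((record[name] - lo) / (hi - lo))  # min-max scale to [0, 1]
    for name in ("pleuritic_chest_pain", "sputum_test", "cough"):
        vector.append(BINARY_CODES[record[name]])       # categorical flag -> 0/1
    return vector

patient = {"temperature": 100.2, "heart_rate": 99, "hemoglobin": 12.0,
           "pleuritic_chest_pain": "Y", "sputum_test": "P", "cough": "H"}
print(encode_patient(patient))
```

Scaling continuous vitals into a common [0, 1] range keeps features such as blood pressure (order of 100) from dominating features such as hemoglobin (order of 10) in the fused representation.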
| NN Architecture | Accuracy (%) | Loss |
|---|---|---|
| | 95.59 | 0.0768 |
| | 94.9 | 0.0985 |
| | 94.6 | 0.1005 |
| | 93.95 | 0.1039 |
| | 90.86 | 0.1186 |
| Model | Accuracy | TPR/Recall | FPR | Specificity | Precision | F-Measure | MCC | ROC | Class |
|---|---|---|---|---|---|---|---|---|---|
| Early Fusion | 0.8454 | 0.7777 | 0.1058 | 0.8941 | 0.8409 | 0.8080 | 0.6804 | 0.8359 | TB |
| | 0.8454 | 0.8941 | 0.2222 | 0.7777 | 0.8482 | 0.8705 | 0.6804 | 0.8359 | NTB |
| Late Fusion | 0.8687 | 0.8065 | 0.0865 | 0.9134 | 0.8702 | 0.8372 | 0.7290 | 0.8600 | TB |
| | 0.8687 | 0.9134 | 0.1934 | 0.8065 | 0.8677 | 0.8900 | 0.7290 | 0.8600 | NTB |
| Hybrid Fusion | 0.8594 | 0.7985 | 0.0966 | 0.9033 | 0.8560 | 0.8262 | 0.7097 | 0.8509 | TB |
| | 0.8594 | 0.9033 | 0.2014 | 0.7985 | 0.8616 | 0.8820 | 0.7097 | 0.8509 | NTB |
| IRENE [15] | 0.9443 | 0.9187 | 0.0372 | 0.9627 | 0.9467 | 0.9325 | 0.8854 | 0.9407 | TB |
| | 0.9443 | 0.9627 | 0.0812 | 0.9187 | 0.9427 | 0.9526 | 0.8854 | 0.9407 | NTB |
| Proposed Model | 0.9559 | 0.9328 | 0.0280 | 0.9719 | 0.9599 | 0.9461 | 0.9086 | 0.9524 | TB |
| | 0.9559 | 0.9719 | 0.0671 | 0.9328 | 0.9526 | 0.9622 | 0.9086 | 0.9524 | NTB |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kumar, S.; Sharma, S. An Improved Deep Learning Framework for Multimodal Medical Data Analysis. Big Data Cogn. Comput. 2024, 8, 125. https://doi.org/10.3390/bdcc8100125