Non-IID Medical Imaging Data on COVID-19 in the Federated Learning Framework: Impact and Directions
Abstract
1. Introduction
- This study identifies the applicability and benefits of using FL to process distributed COVID-19 lung imaging data and to train models on it across various imaging modalities and equipment. It also identifies the imaging types and modes available in distributed hospitals and medical institutions.
- It provides an overview of the FL system and describes implementation variables and practical constraints in the medical field. It also reviews progress in developing FL frameworks for training on medical images and identifies areas that require further effort to overcome the pitfalls that degrade distributed learning performance.
- This article provides detailed descriptions of the data heterogeneity issue, identifies the metrics that might be affected by that issue, and offers a mathematical description of the problem for each type of skewness, along with valuable research directions to mitigate the impact of data heterogeneity.
- It concisely outlines other prevalent FL issues to offer a comprehensive perspective for research on the FL environment.
- This study uses imaging data to outline potential avenues for future research on how COVID-19 affects the lungs and other internal organs, referencing ongoing studies that consider relevant factors from medical and radiological standpoints.
2. Procedure
3. Related Works
4. FL Opportunities for COVID-19 Lung Imaging
4.1. Data Availability
4.2. Cold-Start Problem
4.3. Time and Cost of Processing
4.4. Security and Privacy
5. COVID-19 Medical Imaging Data
6. Federated Learning Overview
- Medical imaging data management is costly. Many medical institutions lack the infrastructure needed to manage their imaging data according to standard requirements. This creates an emerging challenge for implementing federated learning in research: it limits the number of data sources that can be selected to provide training data.
- Typically, only medium-to-large hospitals or medical research institutions maintain repositories of standardized medical images. Drawing training data from such sources lets deep learning models concentrate on valuable features, avoid incorporating weak features from low-quality data, and rely on trustworthy participants. As a result, the local model updates received from distributed sites may be more reliable [52], which can help the global model reach satisfactory accuracy in fewer rounds.
- Medical image datasets contain highly sensitive patient information. If an FL application omits privacy-preserving methods, such as differential privacy or homomorphic encryption of the shared weights, the exchanged updates may leak patient or institutional information. At the same time, these methods greatly increase the computational overhead of training, because medical imaging models can exceed 10 million weights [56].
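The sketch below makes the privacy/overhead trade-off concrete. It applies the Gaussian mechanism used in differentially private FL to a client's weight update before the update is shared: the update is clipped to a bounded L2 norm, and Gaussian noise is added to every coordinate. This is a minimal Python/NumPy illustration and is not the method of any study reviewed here. The function names and the `clip_norm` and `noise_multiplier` values are hypothetical. They are not calibrated to a formal (ε, δ) privacy budget, which would also require a privacy accountant. Homomorphic encryption is not shown.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Rescale a flattened weight update so its L2 norm is at most clip_norm."""
    update = np.asarray(update, dtype=np.float64)
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    return update

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update, then add Gaussian noise with standard deviation
    noise_multiplier * clip_norm to every coordinate (Gaussian mechanism).
    Parameter values are illustrative, not tuned to a privacy budget."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_update(update, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape)
    return clipped + noise

# Usage: a stand-in for one site's flattened weight delta (size chosen arbitrarily).
rng = np.random.default_rng(0)
local_update = rng.normal(size=10_000)
shared = privatize_update(local_update, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(shared.shape, np.linalg.norm(clip_update(local_update, 1.0)))
```

One noise value is drawn per weight, so the extra work grows linearly with model size. This is one reason the overhead becomes noticeable for models with millions of parameters.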
7. Data Heterogeneity Issue in Medical Imaging
7.1. Non-IID Types
7.1.1. Quantity Skew
- Using augmentation to expand an image dataset is a simple and common way to train the model more intensively on the same data features. It works by applying transformations such as zooming and rotation. However, the transformations used to generate data are not always effective for training and may even degrade the model's performance [69].
- In aggregation-based strategies, quantity skew is mitigated by assigning each client a learning rate or number of batches based on the quantity of its data [70].
- The FedAMP model is resistant to quantity skew because its aggregation weights are learned adaptively throughout training [49].
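FedAvg (McMahan et al.) is a useful reference point for these quantity-skew remedies. In FedAvg, the server weights each client's model by that client's share n_k/n of all training samples, so large sites dominate the global model. The minimal NumPy sketch below implements only this sample-size weighting. It is not the per-client learning-rate or batch-number scheme of [70], nor FedAMP's learned weights, and the function name and example numbers are hypothetical.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Average client weight arrays, weighting each by its share n_k / n
    of the total number of training samples (FedAvg-style aggregation)."""
    sizes = np.asarray(client_sizes, dtype=np.float64)
    if sizes.ndim != 1 or len(sizes) != len(client_weights) or np.any(sizes <= 0):
        raise ValueError("need one positive sample count per client")
    coeffs = sizes / sizes.sum()
    stacked = np.stack([np.asarray(w, dtype=np.float64) for w in client_weights])
    # Contract the client axis: sum over k of coeffs[k] * stacked[k]
    return np.tensordot(coeffs, stacked, axes=1)

# Usage: a large hypothetical site (900 images) and a small one (100 images).
hospital_a = np.array([1.0, 1.0])
hospital_b = np.array([3.0, 5.0])
print(fedavg_aggregate([hospital_a, hospital_b], [900, 100]))  # ≈ [1.2, 1.4]
```

With equal sample counts, the function reduces to a plain mean. With the 900/100 split, the small site contributes only 10% of the aggregate, and this is the imbalance that quantity-skew remedies try to correct.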
7.1.2. Label Distribution Skew
- The first direction is applied before training: data are preprocessed to ensure a uniform distribution across sites using local augmentation [51], GANs [52], or the synthetic minority oversampling technique (SMOTE) [71]. However, these solutions require more communication between parties to fine-tune the number of labels and the distribution of images at each site. They may also leak information from participant data while improving accuracy only slightly, by approximately 1.6% [68].
- The second direction improves convergence among the locally updated models. It monitors local updates per batch in FedBN [72] or per round in FedProx [55], or normalizes both local and global updates in HarmoFL [65]. These methods reportedly mitigate bias efficiently and improve the global model's generalizability. However, they may harm the model's personalization. In other words, the global model may perform poorly at a given site because it does not fit that site's local population data.
- The third direction examines local updates on the server side before accepting them. This may rely on calculating acceptance priorities [30], voting methods [7], or smart contracts in blockchain-based systems [50,73]. However, these methods add computational overhead.
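The proximal term that FedProx adds to each client's local objective works as follows. A client does not minimize its local loss F_k(w) alone. Instead, it minimizes F_k(w) + (mu/2) * ||w - w_t||^2, where w_t is the current global model, so local training is pulled back toward the global weights. In this minimal NumPy sketch, a least-squares loss stands in for an imaging model's loss. The function name, the synthetic data and the values of `mu`, `lr` and `epochs` are illustrative assumptions.

```python
import numpy as np

def fedprox_local_update(global_w, X, y, mu=0.01, lr=0.1, epochs=50):
    """Full-batch gradient descent on a least-squares loss plus the FedProx
    proximal term (mu / 2) * ||w - global_w||^2, starting from global_w."""
    global_w = np.asarray(global_w, dtype=np.float64)
    w = global_w.copy()
    m = X.shape[0]
    for _ in range(epochs):
        grad_loss = X.T @ (X @ w - y) / m   # gradient of (1 / 2m) * ||Xw - y||^2
        grad_prox = mu * (w - global_w)     # pulls w back toward the global model
        w -= lr * (grad_loss + grad_prox)
    return w

# Usage on one hypothetical site's synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
global_w = np.zeros(5)                                   # global model from the server
w_plain = fedprox_local_update(global_w, X, y, mu=0.0)   # plain local GD, as in FedAvg
w_prox = fedprox_local_update(global_w, X, y, mu=1.0)    # proximal term active
print(np.linalg.norm(w_plain - global_w), np.linalg.norm(w_prox - global_w))
```

Setting mu = 0 recovers the plain local gradient descent used in FedAvg. A larger mu keeps the local model closer to the global one. This limits client drift under label skew, but it can cost the personalization discussed under the second direction.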
7.1.3. Extreme Label Skew
- Yang et al. [63] used a semi-supervised method to label unlabeled data and reported satisfactory accuracy. Their method could also help enforce uniform label names across sites in an FL framework.
- To address the word-variant issue, in which the same finding is recorded under different terms, radiologists could also analyze metadata using natural language processing (NLP) [37].
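The sketch below gives a simple illustration of harmonizing label names across sites before federated training. It normalizes raw label strings (case, spaces and hyphens) and then maps them through a shared synonym table. It is a rule-based stand-in; it is neither the semi-supervised method of [63] nor an NLP pipeline. The synonym table and function names are hypothetical. They assume a three-class scheme (COVID-19, pneumonia, healthy) like the one used by several studies in Appendix A. Labels that the table does not recognize return None, so they can be reviewed rather than silently relabeled.

```python
import re

# Hypothetical shared synonym table, keyed by normalized label strings.
# Viral and bacterial pneumonia are merged because a three-class scheme is assumed.
CANONICAL_LABELS = {
    "covid19": "COVID-19",
    "covid": "COVID-19",
    "sarscov2": "COVID-19",
    "pneumonia": "pneumonia",
    "viralpneumonia": "pneumonia",
    "bacterialpneumonia": "pneumonia",
    "normal": "healthy",
    "healthy": "healthy",
    "nofinding": "healthy",
}

def normalize_key(raw):
    """Lower-case a raw label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def harmonize_label(raw):
    """Map a site-specific label to the shared label set, or None if unknown."""
    return CANONICAL_LABELS.get(normalize_key(raw))

site_labels = ["COVID-19", "covid19", "SARS-CoV-2", "Viral Pneumonia", "No Finding", "Lung Opacity"]
print([harmonize_label(s) for s in site_labels])
# ['COVID-19', 'COVID-19', 'COVID-19', 'pneumonia', 'healthy', None]
```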
7.1.4. Data Acquisition Protocol Skew
7.1.5. Modality Skew
7.1.6. Feature Skew
7.2. Bias Generation Factors
7.2.1. Training Model
7.2.2. Aggregative Strategy
Application | FL Architecture | Measured Metrics | Skewness Type | Aggregative Strategy | Preprocessing |
---|---|---|---|---|---|
Classification of lung diseases | Central | Generalization | Data acquisition | FedAvg | CycleGAN method [69] |
Classification of lung diseases | Central | Generalization | Quantity/label distribution | FedAvg | SMOTE [71] |
Classification of lung diseases | Central | Personalization | Extreme label | FedAvg | GAN with augmentation method [60] |
Classification of lung diseases | Central | Personalization | Data acquisition | FedBN | Lung segmentation, image normalization, and data augmentation [49] |
Classification of lung diseases | Central | Personalization | Quantity/label distribution | FedAvg under smart contract | Training-sample size computed from the class ratio in the test set [70] |
Classification of lung diseases | P2P | Generalization | Quantity/label distribution | FedAvg | Augmentation [68]; change of FL hyperparameter settings [75] |
Classification of lung diseases | P2P | Generalization | Data acquisition | FedAvg | Vision transformer model [85] |
Classification of lung diseases | P2P | Generalization | Data acquisition | Delegated Proof-of-Stake (DPoS) | GAN [77] |
Segmentation of lung infections | Central | Personalization | Data acquisition | FedAvg with local adaptation epoch | Spatial normalization and scaling [67] |
Segmentation of lung infections | P2P | Generalization | Data acquisition | FedAvg weighted by computational cost | Spatial and signal normalization with segmentation [30] |
Bounding box of lung lesions | Central | Generalization vs. personalization | Data acquisition | FedAvg | Normalization and data augmentation [59] |
Labeling and annotating data | Central | Generalization | Quantity/label distribution | Self-adaptive aggregation method | CLAHE applied to data and metadata transfer [74] |
Labeling and annotating data | Central | Generalization | Data acquisition | FedAvg | Augmentation [63] |
Oxygen prediction | Central | Generalization and personalization | Data acquisition | FedAvg | Normalization and augmentation of distributed data [64] |
Severity diagnosis | Central | Generalization | IID | FedAvg | Not mentioned [13] |
Severity diagnosis | P2P | Generalization and personalization | Data acquisition | FedAvg with timer for generated ledger | Capsule network for segmentation and classification with blockchain technology [73] |
8. Common FL Challenges
8.1. Communication Issues
8.2. Privacy and Security Issues
8.3. System Resource Issues
9. Results and Discussion
10. Recommendations and Directions
11. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Study | Aim | Application | Contributions | Limitations |
---|---|---|---|---|
Xu et al., 2020 [13] | Compares the accuracy of FL models with that of six radiologists in a diagnosis task and addresses the lack of generalization of local models. | Diagnosing CT lung images with four infection labels: COVID-19, viral pneumonia, bacterial pneumonia, and healthy. | Their FL model achieved sensitivity and specificity comparable to those of six radiologists. They conducted real FL experiments with data from three hospitals in Wuhan. | There was a trade-off between performance and communication: 16 hours were required to complete 200 training rounds. |
Zhang et al., 2021 [79] | Improves communication efficiency using dynamic fusion-based federated learning. | Diagnosing X-ray and CT lung images with three infection labels: COVID-19, pneumonia, and healthy. | They reduced communication overhead by scaling the uploaded model down to 1/16 of the time needed by galaxy FL for complicated models while maintaining satisfactory accuracy. | They did not consider reverse-engineering attacks in their solution. |
Feki et al., 2021 [68] | Investigates the properties and specificities of FL settings, including non-IID and unbalanced data distributions. | Diagnosing chest X-ray images with two infection labels: COVID-19 and healthy. | They found the following: increasing the number of rounds could improve model accuracy; more participants led to faster convergence and reduced the need for more rounds; and label distribution skew led to worse performance than quantity skew. | They reported results on a small dataset containing only 108 COVID-19-positive chest X-ray images. |
Liu et al., 2021 [53] | Compares the performance in FL of four DL models on COVID-19 X-ray images: COVIDNet, ResNeXt, MobileNet-v2, and ResNet18. | Diagnosing X-ray lung images with three infection labels: COVID-19, pneumonia, and healthy. | They found ResNeXt has the best performance in images with COVID-19 labels. | Models were trained on data containing only 2% COVID-19 labels, which may provide unreliable results without considering non-IID issues. |
Jabłecki et al., 2021 [83] | Measures the impact of the non-IID issue on the accuracy of FL models. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They found the following: more local epochs increased GPU time without significantly affecting accuracy; non-IID data degraded accuracy from 0.923 to 0.39; and EfficientNetB0 achieved the best performance. | The first round took the longest because TensorFlow constructs the execution graph at the beginning of training. However, they neglected the impact of the limited availability of GPU resources on Google Colab. |
Dou et al., 2021 [59] | Improves the generalizability of automated lesion-progression estimation using data from four hospitals in Germany and China, with test results compared against radiologists' reports. | Quantifying lesions in COVID-19 CT images. | They found that increasing data size is important to mitigate model bias and to improve generalizability across diverse training data associated with different imaging scanners and annotation protocols. | Testing one CT image took 40 ms per round, but they did not consider reverse-engineering attacks. |
Qayyum et al., 2022 [80] | Attempts to address the heterogeneity of imaging modalities and reduce computational overhead by using edge nodes to cluster each modality type with a different model for automatic diagnosis of COVID-19. | Diagnosing chest X-ray and ultrasound images with binary classification of COVID-19 and normal. | They found that comparable results can be obtained by sharing the same model across different modalities. The generalizability of the global model can be improved, and hospitals with limited resources can still benefit from this collaborative learning method. | They did not describe how the data were distributed across clients and clusters in their experiments, and privacy was not guaranteed. They stated improved FL latency as an aim of the study but reported no results on it. |
Yang et al., 2021 [63] | Evaluates FL performance with heterogeneity of data acquisition skew and unlabeled data by training on data from China, Italy, and Japan. | Segmenting and annotating lesions on lungs infected by COVID-19 using CT images. | They reported the importance of data augmentation strategies for computing consistency loss, which improves the generalizability of model. They described the need to tune the trade-off between aggregation frequency and communication cost based on the applications. | They did not solve the problem of how to improve models with non-IID issues and mitigate or detect bias during FL. |
Bai et al., 2021 [69] | Aims to improve generalizability by collecting data from five hospitals and testing the FL method under highly heterogeneous data. | Diagnosing chest X-ray images with three infection labels: non-COVID-19 viral and bacterial pneumonia, COVID-19, and healthy. | They reported the computational cost (FLOPs) of different models. | They acknowledged that their study did not address bias or the dropout of participants during training rounds. |
Kumar et al., 2021 [77] | Attempts to eliminate the single central point using a fully decentralized blockchain and HE. | Segmentation and classification to detect COVID-19. | They introduced a new dataset containing 34,006 CT scan slices from 89 patients and 28,395 COVID-19-positive CT scans. The global model achieved an accuracy of 84.21 ± 0.43. | They did not report blockchain latency or attempt to minimize the cost of the solution. |
Abdul et al., 2021 [54] | Studies the impact of FL hyperparameters on the accuracy and loss of the global model during testing. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They found that the softmax activation function and the SGD optimizer gave the best prediction accuracy and loss. | They reported that increasing data size and the number of rounds had limited impact. However, these results cannot be generalized because they are inconsistent with other studies [50,73]. |
Zhang et al., 2021 [60] | Attempts to address data availability and data privacy by using generative adversarial networks to generate fake chest X-ray images and applying DP to the gradient weights. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They demonstrated that generating fake images improved global accuracy by 0.84% and reduced loss by 3.0%. They achieved high performance with a low noise ratio and reported satisfactory results even with non-IID data. | They reported non-IID results for label distribution skew only and did not consider other types of skewness. |
Kumar et al., 2021 [30] | Proposes a data normalization method to address data heterogeneity within a blockchain-based approach. | Segmentation and classification to detect COVID-19. | Their method achieved the highest sensitivity and lowest specificity. They reported the negative impact of communication costs as the number of participants increased. | The configuration procedure was not explained clearly. |
Dong et al., 2021 [74] | Attempts to annotate unlabeled data with a federated contrastive learning framework with two modules: metadata transfer module and self-adaptive aggregation module. | Labeling unlabeled data with two infection labels: COVID-19 and healthy. | They reduced annotation costs while utilizing only 3% of labeled data in training to achieve 90% accuracy. Their aggregation module outperformed the FedAvg method consistently, even with non-IID issues, while metadata transfer improved performance. | They did not apply any privacy-preserving method to guarantee privacy. |
Dayan et al., 2021 [64] | Uses data from 20 distributed sites to predict outcomes at 24 and 72 h from time of initial presentation to the emergency room and predicts mechanical ventilation treatment or death at 24 h for symptomatic patients with COVID-19 using inputs of vital signs, laboratory data, and chest X-ray images. | Predicting future oxygen requirements. | FL provided comparable performance even when only 25% of weight updates were shared. Personalization could be improved by fine-tuning local parameters. Participant diversity improved generalizability by 38%. | They did not refer to the time/cost of computations. |
Nguyen et al., 2021 [61] | Attempts to fix data availability and data privacy by using generative adversarial networks to generate fake chest X-ray images in edge cloud computing. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They improved the generalizability of models. | They did not apply any privacy-preserving methods to guarantee privacy. |
Lo et al., 2021 [70] | Attempts to enhance the accountability and fairness of FL by using a blockchain-based smart contract system and a weighted fair data sample algorithm. | Diagnosing chest X-ray images with four infection labels: COVID-19, pneumonia, lung opacity, and healthy. | They found the following: More stable and faster convergence rate than ResNet50 models. Blockchain-based smart contracts provided satisfying performance with accountability. Weighted fair data improved performance in cases of distribution skew. | They did not apply any privacy-preserving methods to guarantee privacy. |
Bhattacharya et al., 2022 [66] | Uses three different data sources to preserve the non-IID nature of the data. | Diagnosing chest X-ray images with two infection labels: COVID-19 and healthy. | They found that personalization improved, with each client's model performing well on test data from the same source. However, they found that generalizability could be improved by averaging the weights into a global model. | They did not apply any privacy-preserving method to avoid privacy attacks. They did not describe the configuration process or the hardware of the system. |
Ho et al., 2022 [75] | Aims to improve the privacy and accuracy of COVID-19 detection models using an FL model with X-ray images and symptom data. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They found that SPP-CNN with 3×3 had higher accuracy because it extracts more spatial details. Non-IID data reduced accuracy by 14% to 24%. A larger batch size achieved faster convergence. DP noise reduced accuracy by only 0.17%. | They did not address the lack of data quantity with any preprocessing method. Their dataset contained only 3616 COVID-19-positive images versus 10,192 normal images. |
Durga et al., 2022 [87] | Combines a model of capsule networks and extreme learning machines (ELMs) to improve the accuracy of segmentation and COVID-19 detection. | Segmentation and classification to detect COVID-19. | The ensemble of capsule networks and ELMs produced the best accuracy in detecting COVID-19 from multiple datasets and was superior to other algorithms. | In the first phase, each hospital uploads image datasets for collaborative learning. In the second phase, hospitals share the locally trained model weights with the blockchain and use FL to aggregate all local models into a global model. Uploading images to the BC involves high costs and threatens privacy. |
Chowdhury et al., 2023 [76] | Proposes a web application to help users detect COVID-19 in a few seconds by uploading a single chest X-ray image. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They found that the Xception model outperforms other models. | They did not apply any privacy-preserving method to avoid privacy attacks. Also, they did not consider non-IID. |
Kumar et al., 2022 [73] | Attempts to improve fully decentralized FL by using distributed blockchain ledgers that share weights with HE. | Diagnosing chest X-ray images with three infection labels: COVID-19, pneumonia, and healthy. | They proposed a method to ensure the quality of the model and the learned data. Dropping any FL participant may affect model performance because the weights of the local models diverge from the global model. HE caused a smaller reduction in accuracy than DP. | They acknowledged the latency caused by the blockchain and encryption computations as a limitation. |
Wang et al., 2022 [62] | Attempts to remove FL's dependence on a third party by using blockchain technology. | Diagnosing CT lung images with two infection labels: COVID-19 and healthy. | They found that the asynchronous FL method achieved similar performance when using non-IID datasets. They reported results with different link capacities and found that increasing link capacity may decrease iteration delay. | They reported difficulty in ensuring the quality of each locally updated model because the operation was consistent for each local node. However, such quality was measured by Kumar et al. [73]. |
Kandati and Gadekallu, 2023 [90] | Aims to reduce communication cost using a swarm optimization algorithm. | Diagnosing X-ray images with three labels: Normal, COVID, and Viral Pneumonia. | They found that swarm optimization is effective only with small datasets and a small number of participants. | Their algorithm took longer to converge the global model and required a huge search space. |
References
- Coronavirus Disease (COVID-19). Available online: https://www.who.int/emergencies/diseases/novel-coronavirus-2019 (accessed on 25 May 2024).
- WHO Coronavirus (COVID-19) Dashboard|WHO Coronavirus (COVID-19) Dashboard with Vaccination Data. Available online: https://covid19.who.int/ (accessed on 3 May 2024).
- Halawa, S.; Pullamsetti, S.S.; Bangham, C.R.M.; Stenmark, K.R.; Dorfmüller, P.; Frid, M.G.; Butrous, G.; Morrell, N.W.; de Jesus Perez, V.A.; Stuart, D.I.; et al. Potential Long-Term Effects of SARS-CoV-2 Infection on the Pulmonary Vasculature: A Global Perspective. Nat. Rev. Cardiol. 2022, 19, 314–331.
- Li, R.; Pei, S.; Chen, B.; Song, Y.; Zhang, T.; Yang, W.; Shaman, J. Substantial Undocumented Infection Facilitates the Rapid Dissemination of Novel Coronavirus (SARS-CoV-2). Science 2020, 368, 489–493.
- Williamson, E.J.; Walker, A.J.; Bhaskaran, K.; Bacon, S.; Bates, C.; Morton, C.E.; Curtis, H.J.; Mehrkar, A.; Evans, D.; Inglesby, P.; et al. Factors Associated with COVID-19-Related Death Using OpenSAFELY. Nature 2020, 584, 430–436. [Google Scholar] [CrossRef]
- Aljondi, R.; Alghamdi, S. Diagnostic Value of Imaging Modalities for COVID-19: Scoping Review. J. Med. Internet Res. 2020, 22, e19673. [Google Scholar] [CrossRef]
- Bahadur, T.; Verma, K.; Kumar, B.; Jain, D. Coronavirus Disease (COVID-19) Detection in Chest X-Ray Images Using Majority Voting Based Classifier Ensemble. Expert Syst. Appl. 2021, 165, 113909. [Google Scholar]
- Sarma, K.V.; Harmon, S.; Sanford, T.; Roth, H.R.; Xu, Z.; Tetreault, J.; Xu, D.; Flores, M.G.; Raman, A.G.; Kulkarni, R.; et al. Federated Learning Improves Site Performance in Multicenter Deep Learning without Data Sharing. J. Am. Med. Inform. Assoc. 2021, 28, 1259–1264. [Google Scholar] [CrossRef]
- Shen, M.; Deng, Y.; Zhu, L.; Du, X.; Guizani, N. Privacy-Preserving Image Retrieval for Medical IoT Systems: A Blockchain-Based Approach. IEEE Netw. 2019, 33, 27–33. [Google Scholar] [CrossRef]
- Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, Privacy-Preserving and Federated Machine Learning in Medical Imaging. Nat. Mach. Intell. 2020, 2, 305–311. [Google Scholar] [CrossRef]
- Li, L.; Qin, L.; Xu, Z.; Yin, Y.; Wang, X.; Kong, B.; Bai, J.; Lu, Y.; Fang, Z.; Song, Q.; et al. Using Artificial Intelligence to Detect COVID-19 and Community-Acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy. Radiology 2020, 296, E65–E71. [Google Scholar] [CrossRef]
- Raisaro, J.L.; Marino, F.; Troncoso-Pastoriza, J.; Beau-Lejdstrom, R.; Bellazzi, R.; Murphy, R.; Bernstam, E.V.; Wang, H.; Bucalo, M.; Chen, Y.; et al. SCOR: A Secure International Informatics Infrastructure to Investigate COVID-19. J. Am. Med. Inform. Assoc. 2020, 27, 1721–1726. [Google Scholar] [CrossRef]
- Xu, Y.; Ma, L.; Yang, F.; Chen, Y.Y.; Ma, K.; Yang, J.; Yang, X.; Chen, Y.Y.; Shu, C.; Fan, Z.; et al. A Collaborative Online AI Engine for CT-Based COVID-19 Diagnosis. medRxiv 2020. [Google Scholar] [CrossRef]
- Mbunge, E.; Akinnuwesi, B.; Fashoto, S.G.; Metfula, A.S.; Mashwama, P. A Critical Review of Emerging Technologies for Tackling COVID-19 Pandemic. Hum. Behav. Emerg. Technol. 2021, 3, 25–39. [Google Scholar] [CrossRef]
- Thompson, P.M.; Stein, J.L.; Medland, S.E.; Hibar, D.P.; Vasquez, A.A.; Renteria, M.E.; Toro, R.; Jahanshad, N.; Schumann, G.; Franke, B.; et al. The ENIGMA Consortium: Large-Scale Collaborative Analyses of Neuroimaging and Genetic Data. Brain Imaging Behav. 2014, 8, 153–182. [Google Scholar] [CrossRef]
- Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The Future of Digital Health with Federated Learning. NPJ Digit. Med. 2020, 3, 1–7. [Google Scholar] [CrossRef]
- Darzidehkalani, E.; Ghasemi-rad, M.; van Ooijen, P.M.A. Federated Learning in Medical Imaging: Part II: Methods, Challenges, and Considerations. J. Am. Coll. Radiol. 2022, 19, 975–982. [Google Scholar] [CrossRef]
- Darzidehkalani, E.; Ghasemi-rad, M.; van Ooijen, P.M.A. Federated Learning in Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems. J. Am. Coll. Radiol. 2022, 19, 969–974. [Google Scholar] [CrossRef]
- Xu, J.; Glicksberg, B.S.; Su, C.; Walker, P.; Bian, J.; Wang, F. Federated Learning for Healthcare Informatics. J. Healthc. Inform. Res. 2021, 5, 1–19. [Google Scholar] [CrossRef]
- Yoo, J.H.; Jeong, H.; Lee, J.; Chung, T.M. Federated Learning: Issues in Medical Application. In Future Data and Security Engineering, Proceedings of the 8th International Conference, FDSE 2021, Virtual Event, 24–26 November 2021; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2021; Volume 13076 LNCS, pp. 3–22. [Google Scholar] [CrossRef]
- Peiffer-Smadja, N.; Maatoug, R.; Lescure, F.-X.; D’Ortenzio, E.; Pineau, J.; King, J.-R. Machine Learning for COVID-19 Needs Global Collaboration and Data-Sharing. Nat. Mach. Intell. 2020, 2, 293–294. [Google Scholar] [CrossRef]
- Shuja, J.; Alanazi, E.; Alasmary, W.; Alashaikh, A. COVID-19 Open Source Data Sets: A Comprehensive Survey. Appl. Intell. 2021, 51, 1296–1325. [Google Scholar] [CrossRef]
- Mondal, M.R.H.; Bharati, S.; Podder, P.; Kamruzzaman, J. Deep Learning and Federated Learning for Screening COVID-19: A Review. BioMedInformatics 2023, 3, 691–713. [Google Scholar] [CrossRef]
- Hwang, S.O.; Majeed, A. Analysis of Federated Learning Paradigm in Medical Domain: Taking COVID-19 as an Application Use Case. Appl. Sci. 2024, 14, 4100. [Google Scholar] [CrossRef]
- Hernandez-cruz, N.; Saha, P.; Sarker, M.K.; Noble, J.A. Review of Federated Learning and Machine Learning-Based Methods for Medical Image Analysis. Big Data Cogn. Comput. 2024, 8, 99. [Google Scholar] [CrossRef]
- Naz, S.; Phan, K.T.; Chen, Y.P.P. A Comprehensive Review of Federated Learning for COVID-19 Detection. Int. J. Intell. Syst. 2022, 37, 2371–2392. [Google Scholar] [CrossRef]
- Banabilah, S.; Aloqaily, M.; Alsayed, E.; Malik, N.; Jararweh, Y. Federated Learning Review: Fundamentals, Enabling Technologies, and Future Applications. Inf. Process. Manag. 2022, 59, 103061. [Google Scholar] [CrossRef]
- Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated Learning in Mobile Edge Networks: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063. [Google Scholar] [CrossRef]
- Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.; et al. Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans. Nat. Mach. Intell. 2021, 3, 199–217. [Google Scholar] [CrossRef]
- Kumar, R.; Khan, A.A.; Kumar, J.; Golilarz, N.A.; Zhang, S.; Ting, Y.; Zheng, C.; Wang, W. Blockchain-Federated-Learning and Deep Learning Models for COVID-19 Detection Using CT Imaging. IEEE Sens. J. 2021, 21, 16301–16314. [Google Scholar] [CrossRef]
- Loddo, A.; Pili, F.; di Ruberto, C. Deep Learning for COVID-19 Diagnosis from CT Images. Appl. Sci. 2021, 11, 8227. [Google Scholar] [CrossRef]
- Frid-Adar, M.; Amer, R.; Gozes, O.; Nassar, J.; Greenspan, H. COVID-19 in CXR: From Detection and Severity Scoring to Patient Disease Monitoring. IEEE J. Biomed. Health Inform. 2021, 25, 1892–1903. [Google Scholar] [CrossRef]
- Tartaglione, E.; Barbano, C.A.; Berzovini, C.; Calandri, M.; Grangetto, M. Unveiling COVID-19 from Chest x-Ray with Deep Learning: A Hurdles Race with Small Data. Int. J. Environ. Res. Public Health 2020, 17, 6933. [Google Scholar] [CrossRef]
- World Health Organization. A Timeline of WHO’s COVID-19 Response in the WHO European Region: A Living Document (Version 3.0, from 31 December 2019 to 31 December 2021); Licence: CC BY-NC-SA 3.0 IGO; World Health Organization: Geneva, Switzerland, 2022. [Google Scholar]
- Mortality Analyses—Johns Hopkins Coronavirus Resource Center. Available online: https://coronavirus.jhu.edu/data/mortality (accessed on 8 May 2024).
- Pang, J.; Huang, Y.; Xie, Z.; Li, J.; Cai, Z. Collaborative City Digital Twin for the COVID-19 Pandemic: A Federated Learning Solution. Tsinghua Sci. Technol. 2021, 26, 759–771. [Google Scholar] [CrossRef]
- Ng, D.; Lan, X.; Yao, M.M.S.; Chan, W.P.; Feng, M. Federated Learning: A Collaborative Effort to Achieve Better Medical Imaging Models for Individual Sites That Have Small Labelled Datasets. Quant. Imaging Med. Surg. 2021, 11, 852–857.
- Privacy|HHS.Gov. Available online: https://www.hhs.gov/hipaa/for-professionals/privacy/index.html (accessed on 8 November 2022).
- Processing—General Data Protection Regulation (GDPR). Available online: https://gdpr-info.eu/issues/processing/ (accessed on 8 November 2022).
- Yi, P.H.; Wei, J.; Kim, T.K.; Shin, J.; Sair, H.I.; Hui, F.K.; Hager, G.D.; Lin, C.T. Radiology “Forensics”: Determination of Age and Sex from Chest Radiographs Using Deep Learning. Emerg. Radiol. 2021, 28, 949–954.
- Qian, F.; Zhang, A. The Value of Federated Learning during and Post-COVID-19. Int. J. Qual. Health Care 2021, 33, mzab010.
- Banda, J.M.; Tekumalla, R.; Wang, G.; Yu, J.; Liu, T.; Ding, Y.; Artemova, E.; Tutubalina, E.; Chowell, G. A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration. Epidemiologia 2021, 2, 315–324.
- Xia, T.; Spathis, D.; Brown, C.; Chauhan, J.; Grammenos, A.; Han, J.; Hasthanasombat, A.; Bondareva, E.; Dang, T.; Floto, A.; et al. COVID-19 Sounds: A Large-Scale Audio Dataset for Digital Respiratory Screening. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), Virtual-only Conference, August 2021; pp. 1–13.
- Kvak, D.; Bendik, M.; Chromcova, A. Towards Clinical Practice: Design and Implementation of Convolutional Neural Network-Based Assistive Diagnosis System for COVID-19 Case Detection from Chest X-Ray Images. arXiv 2022, arXiv:2203.10596.
- Golubev, A. Dicom Network Implementation and Usage in the Context of the Covid-19 Pandemic. Arch. Balk. Med. Union 2021, 56, 80–87.
- Aiello, M.; Esposito, G.; Pagliari, G.; Borrelli, P.; Brancato, V.; Salvatore, M. How Does DICOM Support Big Data Management? Investigating Its Use in Medical Imaging Community. Insights Imaging 2021, 12, 164.
- Tsai, E.B.; Simpson, S.; Lungren, M.P.; Hershman, M.; Roshkovan, L.; Colak, E.; Erickson, B.J.; Shih, G.; Stein, A.; Kalpathy-Cramer, J.; et al. The RSNA International COVID-19 Open Radiology Database (RICORD). Radiology 2021, 299, E204–E213.
- Vayá, M.d.l.I.; Saborit, J.M.; Montell, J.A.; Pertusa, A.; Bustos, A.; Cazorla, M.; Galant, J.; Barber, X.; Orozco-Beltrán, D.; García-García, F.; et al. BIMCV COVID-19+: A Large Annotated Dataset of RX and CT Images from COVID-19 Patients. arXiv 2020, arXiv:2006.01174.
- Peng, L.; Luo, G.; Walker, A.; Zaiman, Z.; Jones, E.K.; Gupta, H.; Kersten, K.; Burns, J.L.; Harle, C.A.; Magoc, T.; et al. Evaluation of Federated Learning Variations for COVID-19 Diagnosis Using Chest Radiographs from 42 US and European Hospitals. J. Am. Med. Inform. Assoc. 2023, 30, 54–63.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 22 April 2017; Volume 54, p. 10.
- Darzidehkalani, E. Federated Learning in Medical Image Analysis. Pattern Recognit. 2024, 151, 110424.
- Shyu, C.; Putra, K.T.; Chen, H.; Tsai, Y.; Hossain, K.S.M.T.; Jiang, W.; Shae, Z. A Systematic Review of Federated Learning in the Healthcare Area: From the Perspective of Data Properties and Applications. Appl. Sci. 2021, 11, 11191.
- Liu, B.; Yan, B.; Zhou, Y.; Yang, Y.; Zhang, Y. Experiments of Federated Learning for COVID-19 Chest X-Ray Images. arXiv 2020, arXiv:2007.05592.
- Abdul Salam, M.; Taha, S.; Ramadan, M. COVID-19 Detection Using Federated Machine Learning. PLoS ONE 2021, 16, e0252573.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization In Heterogeneous Networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- Kaissis, G.; Ziller, A.; Passerat-Palmbach, J.; Ryffel, T.; Usynin, D.; Trask, A.; Lima, I.; Mancuso, J.; Jungmann, F.; Steinborn, M.M.; et al. End-to-End Privacy Preserving Deep Learning on Multi-Institutional Medical Imaging. Nat. Mach. Intell. 2021, 3, 473–484.
- Guha Roy, A.; Siddiqui, S.; Pölsterl, S.; Navab, N.; Wachinger, C. BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning. arXiv 2019, arXiv:1905.06731.
- Li, X.; Gu, Y.; Dvornek, N.; Staib, L.H.; Ventola, P.; Duncan, J.S. Multi-Site FMRI Analysis Using Privacy-Preserving Federated Learning and Domain. Med. Image Anal. 2020, 65, 101765.
- Dou, Q.; So, T.Y.; Jiang, M.; Liu, Q.; Vardhanabhuti, V.; Kaissis, G.; Li, Z.; Si, W.; Lee, H.H.C.; Yu, K.; et al. Federated Deep Learning for Detecting COVID-19 Lung Abnormalities in CT: A Privacy-Preserving Multinational Validation Study. NPJ Digit. Med. 2021, 4, 60.
- Zhang, L.; Shen, B.; Barnawi, A.; Xi, S.; Kumar, N.; Wu, Y. FedDPGAN: Federated Differentially Private Generative Adversarial Networks Framework for the Detection of COVID-19 Pneumonia. Inf. Syst. Front. 2021, 23, 1403–1415.
- Nguyen, D.C.; Ding, M.; Pathirana, P.N.; et al. Federated Learning for COVID-19 Detection with Generative Adversarial Networks in Edge Cloud Computing. IEEE Internet Things J. 2021, 9, 10257–10271.
- Wang, Z.; Cai, L.; Zhang, X.; Choi, C.; Su, X. A COVID-19 Auxiliary Diagnosis Based on Federated Learning and Blockchain. Comput. Math. Methods Med. 2022, 2022, 7078764.
- Yang, D.; Xu, Z.; Li, W.; Myronenko, A.; Roth, H.R.; Harmon, S.; Xu, S.; Turkbey, B.; Turkbey, E.; Wang, X.; et al. Federated Semi-Supervised Learning for COVID Region Segmentation in Chest CT Using Multi-National Data from China, Italy, Japan. Med. Image Anal. 2021, 70, 101992.
- Dayan, I.; Roth, H.R.; Zhong, A.; Harouni, A.; Gentili, A.; Abidin, A.Z.; Liu, A.; Costa, A.B.; Wood, B.J.; Tsai, C.S.; et al. Federated Learning for Predicting Clinical Outcomes in Patients with COVID-19. Nat. Med. 2021, 27, 1735–1743.
- Jiang, M.; Wang, Z.; Dou, Q. HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images. Proc. AAAI Conf. Artif. Intell. 2022, 36, 1087–1095.
- Bhattacharya, A.; Gawali, M.; Seth, J.; Kulkarni, V. Application of Federated Learning in Building a Robust COVID-19 Chest X-Ray Classification Model. arXiv 2022, arXiv:2204.10505.
- Zhou, J.; Zhou, L.; Wang, D.; Xu, X.; Li, H.; Chu, Y.; Han, W.; Gao, X. Personalized and Privacy-Preserving Federated Heterogeneous Medical Image Analysis with PPPML-HMI. Comput. Biol. Med. 2024, 169, 107861.
- Feki, I.; Ammar, S.; Kessentini, Y.; Muhammad, K. Federated Learning for COVID-19 Screening from Chest X-Ray Images. Appl. Soft Comput. 2021, 106, 107330.
- Bai, X.; Wang, H.; Ma, L.; Xu, Y.; Gan, J.; Fan, Z.; Yang, F.; Ma, K.; Yang, J.; Bai, S.; et al. Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence. Nat. Mach. Intell. 2021, 3, 1081–1089.
- Lo, S.K.; Liu, Y.; Lu, Q.; Wang, C.; Xu, X.; Paik, H.-Y.; Zhu, L. Blockchain-Based Trustworthy Federated Learning Architecture. arXiv 2021, arXiv:2108.06912.
- Malik, H.; Naeem, A.; Naqvi, R.A.; Loh, W.K. DMFL_Net: A Federated Learning-Based Framework for the Classification of COVID-19 from Multiple Chest Diseases Using X-Rays. Sensors 2023, 23, 743.
- Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. FedBN: Federated Learning on Non-IID Features via Local Batch Normalization. arXiv 2021, arXiv:2102.07623.
- Kumar, R.; Kumar, J.; Aman, A.; Ali, H.; Bernard, C.M.; Ullah, R.; Zeng, S. Blockchain and Homomorphic Encryption Based Privacy-Preserving Model Aggregation for Medical Images. Comput. Med. Imaging Graph. 2022, 102, 102139.
- Dong, N.; Voiculescu, I. Federated Contrastive Learning for Decentralized Unlabeled Medical Images. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2021, Proceedings of the 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2021; Volume 12903 LNCS, pp. 378–387.
- Ho, T.T.; Tran, K.D.; Huang, Y. FedSGDCOVID: Federated SGD COVID-19 Detection under Local Differential Privacy Using Chest X-Ray Images and Symptom Information. Sensors 2022, 22, 3728.
- Chowdhury, D.; Banerjee, S.; Sannigrahi, M.; Dey, A.; Dhar, A.; Chakraborty, A.; Das, A. Federated Learning Based Covid-19 Detection. Expert Syst. 2023, 40, e13173.
- Kumar, R.; Wang, W.; Yuan, C.; Kumar, J.; Zheng, C.; Aman, A. Blockchain Based Privacy-Preserved Federated Learning for Medical Images: A Case Study of COVID-19 CT Scans. arXiv 2021, arXiv:2104.10903.
- Florescu, L.M.; Streba, C.T.; Şerbănescu, M.S.; Mămuleanu, M.; Florescu, D.N.; Teică, R.V.; Nica, R.E.; Gheonea, I.A. Federated Learning Approach with Pre-Trained Deep Learning Models for COVID-19 Detection from Unsegmented CT Images. Life 2022, 12, 958.
- Zhang, W.; Zhou, T.; Lu, Q.; Wang, X.; Zhu, C.; Sun, H.; Wang, Z.; Lo, S.K.; Wang, F.-Y. Dynamic Fusion-Based Federated Learning for COVID-19 Detection. IEEE Internet Things J. 2021, 8, 15884–15891.
- Qayyum, A.; Ahmad, K.; Ahsan, M.A.; Al-Fuqaha, A.; Qadir, J. Collaborative Federated Learning For Healthcare: Multi-Modal COVID-19 Diagnosis at the Edge. IEEE Open J. Comput. Soc. 2022, 3, 1–10.
- Adhikari, R.; Settles, C. Secure Federated Learning Approaches to Diagnosing COVID-19. arXiv 2024, arXiv:2401.12438.
- Kareem, A.; Liu, H.; Velisavljevic, V. A Federated Learning Framework for Pneumonia Image Detection Using Distributed Data. Healthc. Anal. 2023, 4, 100204.
- Jabłecki, P.; Ślazyk, F.; Malawski, M. Federated Learning in the Cloud for Analysis of Medical Images—Experience with Open Source Frameworks. In Clinical Image-Based Procedures, Distributed and Collaborative Learning, Artificial Intelligence for Combating COVID-19 and Secure and Privacy-Preserving Machine Learning, Proceedings of the 10th Workshop, CLIP 2021, Second Workshop, DCL 2021, First Workshop, LL-COVID19 2021, and First Workshop and Tutorial, PPML 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, 27 September and 1 October 2021; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2021; Volume 12969 LNCS, pp. 111–119.
- Darzi, E.; Sijtsema, N.M.; van Ooijen, P.M.A. A Comparative Study of Federated Learning Methods for COVID-19 Detection. Sci. Rep. 2024, 14, 3944.
- Sun, G.; Shu, H.; Shao, F.; Racharak, T.; Kong, W.; Pan, Y.; Dong, J.; Wang, S.; Nguyen, L.M.; Xin, J. FKD-Med: Privacy-Aware, Communication-Optimized Medical Image Segmentation via Federated Learning and Model Lightweighting Through Knowledge Distillation. IEEE Access 2024, 12, 33687–33704.
- Balachandar, N.; Chang, K.; Kalpathy-Cramer, J.; Rubin, D.L. Accounting for Data Variability in Multi-Institutional Distributed Deep Learning for Medical Imaging. J. Am. Med. Inform. Assoc. 2020, 27, 700–708.
- Durga, R.; Poovammal, E. FLED-Block: Federated Learning Ensembled Deep Learning Blockchain Model for COVID-19 Prediction. Front. Public Health 2022, 10, 892499.
- Jothimurugesan, E.; Hsieh, K.; Wang, J.; Joshi, G.; Gibbons, P.B. Federated Learning under Distributed Concept Drift. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; Volume 206, pp. 5834–5853.
- Chetoui, M.; Akhloufi, M.A. Federated Learning for COVID-19 Detection. Computers 2023, 12, 106.
- Kandati, D.R.; Gadekallu, T.R. Federated Learning Approach for Early Detection of Chest Lesion Caused by COVID-19 Infection Using Particle Swarm Optimization. Electronics 2023, 12, 710.
Attack Name | Description of Impact | Mitigation Methods |
---|---|---|
Reconstruction attacks | Image features are recovered from the locally updated model weights that a client shares. | Differential privacy (DP) or homomorphic encryption (HE). |
Model poisoning | A local model trained on fake labels or irrelevant data is uploaded to degrade the global model. | Measuring the quality of local updates; this remains an open problem in FL systems. |
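The two mitigation columns of the table can be illustrated with a minimal Python sketch that uses only the standard library. It makes several assumptions. Each client update is treated as a flat list of floats. The function names (`clip_update`, `dp_noisy_mean`, `screen_updates`) and the thresholds are hypothetical choices for illustration and are not the authors' method. The DP part covers only server-side clipping plus Gaussian noise, in the style of DP-FedAvg, and performs no privacy accounting. Homomorphic encryption is not shown. The poisoning check is a single distance-to-median heuristic, which fits the table's point that measuring update quality is still unresolved.

```python
import math
import random
from statistics import median

Vector = list[float]  # assumption: a client update is a flat list of weight deltas


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip_update(update: Vector, max_norm: float) -> Vector:
    """Scale an update so its L2 norm is at most max_norm (bounds one client's influence)."""
    norm = l2_norm(update)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_noisy_mean(updates: list[Vector], max_norm: float,
                  noise_multiplier: float, rng: random.Random) -> Vector:
    """Clip each update, average, then add Gaussian noise (DP-FedAvg-style sketch).

    Assumption: central DP applied by a trusted server. The noise standard
    deviation is noise_multiplier * max_norm / n, i.e. scaled to the clipping
    bound over a fixed number of clients. No (epsilon, delta) accounting is done.
    """
    n = len(updates)
    clipped = [clip_update(u, max_norm) for u in updates]
    mean = [sum(col) / n for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm / n
    return [m + rng.gauss(0.0, sigma) for m in mean]


def screen_updates(updates: list[Vector], factor: float = 3.0) -> list[int]:
    """Hypothetical poisoning screen: keep updates close to the coordinate-wise median.

    An update is kept if its L2 distance to the median update is at most
    factor times the median of all such distances.
    """
    center = [median(col) for col in zip(*updates)]
    dists = [l2_norm([a - b for a, b in zip(u, center)]) for u in updates]
    cutoff = factor * median(dists)
    return [i for i, d in enumerate(dists) if d <= cutoff]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three similar benign updates and one large, suspicious update (index 3).
    updates = [
        [0.10, -0.20, 0.05],
        [0.12, -0.18, 0.04],
        [0.09, -0.21, 0.06],
        [5.00, 5.00, -5.00],
    ]
    kept = screen_updates(updates)
    print("kept client indices:", kept)  # prints: kept client indices: [0, 1, 2]
    aggregate = dp_noisy_mean([updates[i] for i in kept], max_norm=1.0,
                              noise_multiplier=1.0, rng=rng)
    print("noisy aggregate:", aggregate)
```

The screen uses the coordinate-wise median as its reference point because one extreme update cannot shift it far, whereas it can shift the mean a long way. Clipping before adding noise matters: it bounds how much any single client can contribute, and only then does a fixed noise scale mask each individual's contribution.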
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alhafiz, F.S.; Basuhail, A.A. Non-IID Medical Imaging Data on COVID-19 in the Federated Learning Framework: Impact and Directions. COVID 2024, 4, 1985-2016. https://doi.org/10.3390/covid4120140