Comment on Uzun Ozsahin et al. COVID-19 Prediction Using Black-Box Based Pearson Correlation Approach. Diagnostics 2023, 13, 1264
1. Introduction
2. Elements of the Comments
2.1. ML Methodology
- Definition of the research question, including the ML models to be used, the meaning of the dependent variables (outputs) to be predicted, and the motivation for the analysis.
- Detailed description of the independent (input) variables, including the descriptive statistics of the dataset. The feature selection and/or feature extraction processes are also specified here.
- Detailed description of data pre-processing, including data cleaning, normalization or scaling of the data, and handling of outliers or null values.
- Detailed description of the validation approach, specifying how the dataset is partitioned into training, validation, and test sets. Separating training and test data is critical to avoid double-dipping [8] bias in the estimation of generalization performance (also known as external validation).
- Definition of the performance measures and the experimental design, including the statistical tests used to analyze the experimental results and to draw conclusions from the study.
- Exhaustive report of all computational experimental results, with a discussion relating them to the relevant literature. Many papers provide supplementary material when the comprehensive report is too long for the main text.
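To make the double-dipping bias concrete, the following Python sketch (our own illustration, not taken from the commented paper or from [8]) generates a target and many candidate features that are all independent Gaussian noise, so every true correlation is zero. Selecting the feature most correlated with the target and reporting that correlation on the same data produces a large spurious value, whereas selecting on a training half and evaluating on a held-out test half does not. The function names `pearson` and `double_dipping_demo`, the use of Pearson correlation as the selection score, the sample sizes, and the seed are hypothetical choices made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def double_dipping_demo(n_samples=40, n_features=200, n_trials=20, seed=0):
    """Compare a double-dipped estimate with a held-out estimate on pure noise.

    All features and the target are independent Gaussian noise (hypothetical
    setup), so the true correlation of any feature with the target is zero.
    Returns the mean |r| of both protocols over n_trials repetitions.
    """
    rng = random.Random(seed)
    half = n_samples // 2
    dipped, held_out = [], []
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        X = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
        # Double dipping: choose and evaluate the "best" feature on the same data.
        dipped.append(max(abs(pearson(f, y)) for f in X))
        # Proper protocol: choose on the training half, evaluate on the test half.
        best = max(X, key=lambda f: abs(pearson(f[:half], y[:half])))
        held_out.append(abs(pearson(best[half:], y[half:])))
    return sum(dipped) / n_trials, sum(held_out) / n_trials


dipped, held_out = double_dipping_demo()
print(f"double-dipped |r|: {dipped:.2f}, held-out |r|: {held_out:.2f}")
```

The double-dipped estimate comes out much larger than the held-out one, even though no feature carries any information about the target; the exact values depend on the seed and the sizes chosen.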
2.2. AI Writing
2.3. Result Reporting and Visualization
2.4. Discussion and Conclusions
2.5. Open Science
- Open access to research results: papers should be open access to all readers.
- Open access to the data: the data used for the study should be accessible without restrictions. This is especially important when the conclusions of the study may influence policy makers, and even more so when high-impact policies may harm people.
- Open access to code: the code should be ready for third-party validation of the claimed results. If the code is not functional, the claims of the paper cannot be sustained.
- Reproducibility: in the case of ML studies, reproducibility encompasses access to the code and the original data in a form that allows the claimed results to be reproduced. Non-reproducible studies should not be taken into consideration for policy making.
- Open peer review
- Ethical conduct of research, which also includes ethical paper writing, i.e., no misleading conclusions unsupported by evidence and no improper use of AI tools.
3. Specific Comments
3.1. ML Methodology
3.1.1. Research Questions
3.1.2. Dataset
3.1.3. Data Pre-Processing
3.1.4. Validation
3.1.5. Experimental Design
3.1.6. Results Reporting
- Assuming that each row contains correlation values (normalized as percentages) among the variables identified by the color code at the bottom, there is no way to ensure that the considered correlations sum to 100%. It appears that, to compensate, the authors represent the missing percentage as a form of self-correlation (a bar segment in the same color as the variable), which is a highly irregular procedure.
- The “dependent” variables seem to be the cumulative cases and deaths in Israel and Greece, with one plot for each combination. However, the “independent” variables are weekly counts that go up and down, while the dependent variables always go up (they are monotonically increasing). Hence, computing the correlation between these time series has no meaning: the correlation with a monotonically increasing series mostly reflects whether the other series happens to share its upward trend, not any relationship between the variables. Despite this, the authors report positive correlations for almost all pairs of variables (if our interpretation is correct).
- For the case plots (Figures 2 and 4), the independent variables are hospital admissions and intensive care unit (ICU) admissions. These variables are, in fact, dependent on the number of cases. In other words, there is a causal link between (weekly) cases and these variables, as cases may become (with some delay) hospital admissions, and hospital admissions may become ICU admissions. Hence, using them as explanatory variables for the (total) cases is senseless.
- For the death plots (Figures 3 and 5), the independent variables are themselves death count variables, which is senseless. “Total deaths” and “total deaths per million” are perfectly correlated variables; in fact, they are the same variable up to a scale factor. The same is true for “new deaths” and “new deaths per million”, and for “new deaths smoothed” and “new deaths smoothed per million”. The correlation within each of these pairs should be 100%, but the authors report other values. In any case, only one variable from each pair should be used in any regression or correlation analysis. Furthermore, computing correlations among death variables that carry the same information is meaningless.
- Similarly, in the case plots (Figures 2 and 4), “weekly hospital admissions” and “weekly hospital admissions per million” are perfectly collinear, as are “weekly ICU admissions” and “weekly ICU admissions per million”. The values reported in the plots appear to be completely disconnected from reality.
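The objections listed above can be checked with a few lines of Python. The sketch below uses small made-up series, not the data used in the commented paper, and computes Pearson correlations to illustrate that (i) a cumulative series can have zero correlation with the very weekly counts that generate it, while correlating strongly with an unrelated increasing series; (ii) a delayed, rescaled copy of the cases (a hypothetical stand-in for admissions) is perfectly correlated with the lagged cases; (iii) a per-million variable correlates exactly with the raw count; and (iv) correlations with a common target need not sum to 100%. The helper names, the 10% admission fraction, the one-week delay, and the population of 9.5 million are assumptions chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cumulative(xs):
    """Running total of a sequence, e.g. weekly counts -> total counts."""
    total, out = 0, []
    for v in xs:
        total += v
        out.append(total)
    return out


# Hypothetical weekly case counts that go up and down with no trend.
weekly = [5, 0] * 10
# The cumulative series is fully determined by the weekly counts ...
total = cumulative(weekly)
# ... yet, in this constructed example, their Pearson correlation is exactly 0.
r_weekly_total = pearson(weekly, total)

# An unrelated increasing series (the week index) correlates strongly with
# the cumulative series only because both go up.
week_index = list(range(len(weekly)))
r_index_total = pearson(week_index, total)

# Hypothetical admissions: 10% of the previous week's cases. Against cases
# lagged by one week the correlation is 1: they add nothing beyond cases.
admissions = [0.0] + [0.1 * v for v in weekly[:-1]]
r_lagged = pearson(weekly[:-1], admissions[1:])

# A "per million" variable is the raw count divided by a constant
# (hypothetical population of 9.5 million), so the correlation is 1.
per_million = [v / 9.5 for v in total]
r_scaled = pearson(total, per_million)

# Correlations with a common target need not add up to 1 (100%).
r_sum = sum(pearson(total, v) for v in (total, per_million, week_index))

print(f"weekly vs total:       {r_weekly_total:.3f}")
print(f"week index vs total:   {r_index_total:.3f}")
print(f"lagged admissions:     {r_lagged:.3f}")
print(f"total vs per million:  {r_scaled:.3f}")
print(f"sum of correlations:   {r_sum:.3f}")
```

With these inputs, the weekly counts have zero correlation with their own running total, the week index correlates at about 0.996 with it, the lagged admissions and the per-million pair correlate at 1, and the three correlations with the total sum to almost 3, i.e., about 300%.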
3.2. AI Writing
- Abstract: 85.9%
- Introduction: 54.29%
- Materials and Methods: Data 45.13%; Models 54.82%
- Application of Results and Discussion: 17.03%
- Conclusions: 0%
3.3. Discussion and Conclusions
3.4. Open Science
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ML | Machine Learning
AI | Artificial Intelligence
ANN | Artificial Neural Network
MLR | Multivariate Linear Regression
ICU | Intensive Care Unit
References
1. Koelle, K.; Martin, M.A.; Antia, R.; Lopman, B.; Dean, N.E. The changing epidemiology of SARS-CoV-2. Science 2022, 375, 1116–1121.
2. Troiano, G.; Nardi, A. Vaccine hesitancy in the era of COVID-19. Public Health 2021, 194, 245–251.
3. Gomes, R.; Kamrowski, C.; Langlois, J.; Rozario, P.; Dircks, I.; Grottodden, K.; Martinez, M.; Tee, W.Z.; Sargeant, K.; LaFleur, C.; et al. A Comprehensive Review of Machine Learning Used to Combat COVID-19. Diagnostics 2022, 12, 1853.
4. Uzun Ozsahin, D.; Precious Onakpojeruo, E.; Bartholomew Duwa, B.; Usman, A.G.; Isah Abba, S.; Uzun, B. COVID-19 Prediction Using Black-Box Based Pearson Correlation Approach. Diagnostics 2023, 13, 1264.
5. Deo, R.C. Machine Learning in Medicine. Circulation 2015, 132, 1920–1930.
6. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768.
7. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
8. Ball, T.M.; Squeglia, L.M.; Tapert, S.F.; Paulus, M.P. Double Dipping in Machine Learning: Problems and Solutions. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 2020, 5, 261–263.
9. Salvagno, M.; Taccone, F.S.; Gerli, A.G. Can artificial intelligence help for scientific writing? Crit. Care 2023, 27, 75.
10. Bertram, M.G.; Sundin, J.; Roche, D.G.; Sánchez-Tójar, A.; Thoré, E.S.J.; Brodin, T. Open science. Curr. Biol. 2023, 33, R792–R797.
11. Manuel Graña, G.B. A Critical Reading of “COVID-19 Prediction Using Black-Box Based Pearson Correlation Approach”. 2023. Available online: https://zenodo.org/records/8411191 (accessed on 27 October 2024).
12. DeLisi, L.E. Editorial: Where have all the reviewers gone?: Is the peer review concept in crisis? Psychiatry Res. 2022, 310, 114454.
13. Berquist, T.H. Journal publication ethics 201: Culture in crisis? AJR Am. J. Roentgenol. 2010, 194, 553.
14. West, J.D.; Bergstrom, C.T. Misinformation in and about science. Proc. Natl. Acad. Sci. USA 2021, 118, e1912444117.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Graña, M.; Badiola-Zabala, G.; Cano-Escalera, G. Comment on Uzun Ozsahin et al. COVID-19 Prediction Using Black-Box Based Pearson Correlation Approach. Diagnostics 2023, 13, 1264. Diagnostics 2024, 14, 2528. https://doi.org/10.3390/diagnostics14222528