The New York City COVID-19 Spread in the 2020 Spring: A Study on the Potential Role of Particulate Using Time Series Analysis and Machine Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
2.2. Methodologies
2.2.1. Granger Causality
2.2.2. Machine Learning
3. Results
3.1. Granger Analysis: Results
3.2. Machine Learning: Results
Algorithm 1 Dataset preparation algorithm.

    Input: raw data with number of infections and pollution values per county
    Output: time series with pollution levels and infection threshold exceedances per county
    begin
        in_data = []
        out_data = []
        for each county in counties do
            i = 0
            for each day from 04/03 to 22/03 do
                out_data[county][i] = infections[county][day] > threshold
                in_data[county][i] = []
                for each lag from 0 to 7 do
                    in_data[county][i].append(pollution[county][day-14+lag])
                end for
                i++
            end for
        end for
    end
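
Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout (county, then date, then value), the function name `prepare_dataset`, and the synthetic demo values are hypothetical and are not taken from the study. Each sample pairs one infection day with the eight pollution values recorded from 14 to 7 days earlier.

```python
from datetime import date, timedelta


def prepare_dataset(infections, pollution, threshold, start, end,
                    offset=14, n_lags=8):
    """Build lagged pollution windows and threshold-exceedance labels.

    infections, pollution: dict county -> dict date -> value
    (an assumed layout, chosen for illustration only).
    """
    in_data, out_data = {}, {}
    for county in infections:
        in_data[county], out_data[county] = [], []
        day = start
        while day <= end:
            # Label: did the infections on this day exceed the threshold?
            out_data[county].append(infections[county][day] > threshold)
            # Features: pollution on day-14, day-13, ..., day-7.
            window = [pollution[county][day - timedelta(days=offset - lag)]
                      for lag in range(n_lags)]
            in_data[county].append(window)
            day += timedelta(days=1)
    return in_data, out_data


# Synthetic demo data (hypothetical values, not from the study).
start, end = date(2020, 3, 4), date(2020, 3, 22)
first_pollution_day = start - timedelta(days=14)
n_pollution_days = (end - first_pollution_day).days + 1
pollution = {"A": {first_pollution_day + timedelta(days=k): 5.0 + 0.1 * k
                   for k in range(n_pollution_days)}}
infections = {"A": {start + timedelta(days=k): 10 * k
                    for k in range((end - start).days + 1)}}

in_data, out_data = prepare_dataset(infections, pollution, threshold=100,
                                    start=start, end=end)
print(len(in_data["A"]), len(in_data["A"][0]), first_pollution_day)
```

With these dates, the earliest pollution value used is the one for 19 February 2020, i.e. 14 days before 4 March.
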
Algorithm 2 Training and (county) cross-validation algorithm.

    Input: time series with pollution levels and infection threshold exceedances per county
    Output: prediction accuracy (F1 score) for COVID-19 infections in each county
    begin
        results = []
        for each county in counties do
            input_train = in_data[!county]
            output_train = out_data[!county]
            input_validation = in_data[county]
            output_validation = out_data[county]
            model.train(input_train, output_train)
            output_pred = model.test(input_validation)
            results.append(f1_score(output_pred, output_validation))
        end for
    end
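
The leave-one-county-out loop of Algorithm 2 can be sketched in plain Python as below. The sketch makes several assumptions that are not part of the original: the `NearestNeighbour` class is a deliberately simple 1-nearest-neighbour stand-in for the classifiers used in the study, `f1_score` is a hand-written binary F1 that returns 0.0 when there are no positive labels or predictions, and the toy data are invented.

```python
def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN) for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


class NearestNeighbour:
    """Hypothetical 1-NN placeholder exposing train/test like the pseudocode."""

    def train(self, X, y):
        self.X, self.y = list(X), list(y)

    def test(self, X):
        preds = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds


def leave_one_county_out(in_data, out_data, make_model):
    """Train on all counties but one, validate on the held-out county."""
    results = {}
    for county in in_data:
        others = [c for c in in_data if c != county]
        X_train = [x for c in others for x in in_data[c]]
        y_train = [y for c in others for y in out_data[c]]
        model = make_model()  # fresh model for every fold
        model.train(X_train, y_train)
        y_pred = model.test(in_data[county])
        results[county] = f1_score(out_data[county], y_pred)
    return results


# Toy data (invented): eight lagged pollution values per sample.
in_data = {"A": [[5.0] * 8, [7.0] * 8],
           "B": [[5.5] * 8, [7.5] * 8],
           "C": [[4.5] * 8, [8.0] * 8]}
out_data = {"A": [False, True], "B": [False, True], "C": [False, True]}

results = leave_one_county_out(in_data, out_data, NearestNeighbour)
print(results)
```

Creating a fresh model inside the loop keeps each fold independent, so no information from the held-out county leaks into training.
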
4. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
County | Density | Population |
---|---|---|
New York | 72,056 | 1,632,480 |
Kings | 37,252 | 2,600,750 |
Bronx | 21,132 | 1,437,870 |
Queens | 34,194 | 2,298,510 |
Richmond | 8149 | 474,101 |
Nassau | 4763 | 1,356,560 |
Westchester | 2250 | 968,815 |
Rockland | 1866 | 323,686 |
Suffolk | 1632 | 1,487,900 |
Monroe | 1132 | 744,248 |
Onondaga | 596 | 464,242 |
Oneida | 190 | 230,782 |
County | PM2.5 (µg/m³), 19/02–08/03 | PM2.5 (µg/m³), 19/02–31/07 | Increase | Percentage | Days with PM2.5 > 10, 19/02–08/03 | Days with PM2.5 > 10, 19/02–31/07 |
---|---|---|---|---|---|---|
New York | 7.48 | 6.63 | +0.85 | +12.8% | 4 | 10 |
Kings | 6.27 | 5.28 | +0.99 | +18.8% | 2 | 8 |
Bronx | 7.62 | 6.47 | +1.15 | +17.8% | 5 | 15 |
Queens | 6.87 | 5.79 | +1.08 | +18.7% | 4 | 14 |
Richmond | 6.86 | 5.63 | +1.23 | +21.8% | 5 | 12 |
Nassau | 6.52 | 5.18 | +1.34 | +25.9% | 3 | 7 |
Westchester | 6.08 | 4.77 | +1.31 | +27.5% | 2 | 4 |
Rockland | 6.14 | 4.82 | +1.32 | +27.4% | 3 | 6 |
Suffolk | 6.35 | 5.4 | +0.95 | +17.6% | 2 | 9 |
Monroe | 7.11 | 6.15 | +0.96 | +15.6% | 3 | 13 |
Onondaga | 8.51 | 6.38 | +2.13 | +33.4% | 5 | 11 |
Oneida | 6.63 | 4.38 | +2.25 | +51.4% | 3 | 6 |
Average | | | | 24.04% | | |
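
The Increase and Percentage columns of the PM2.5 table are consistent with a simple relative difference between the two period values: increase = (19/02–08/03 value) − (19/02–31/07 value), and percentage = 100 × increase / (19/02–31/07 value). The short check below recomputes both columns from the tabulated values; the formula is inferred from the numbers rather than stated in the text, and the variable names are illustrative.

```python
# (county, PM2.5 19/02-08/03, PM2.5 19/02-31/07, reported increase, reported %)
rows = [
    ("New York", 7.48, 6.63, 0.85, 12.8),
    ("Kings", 6.27, 5.28, 0.99, 18.8),
    ("Bronx", 7.62, 6.47, 1.15, 17.8),
    ("Queens", 6.87, 5.79, 1.08, 18.7),
    ("Richmond", 6.86, 5.63, 1.23, 21.8),
    ("Nassau", 6.52, 5.18, 1.34, 25.9),
    ("Westchester", 6.08, 4.77, 1.31, 27.5),
    ("Rockland", 6.14, 4.82, 1.32, 27.4),
    ("Suffolk", 6.35, 5.40, 0.95, 17.6),
    ("Monroe", 7.11, 6.15, 0.96, 15.6),
    ("Onondaga", 8.51, 6.38, 2.13, 33.4),
    ("Oneida", 6.63, 4.38, 2.25, 51.4),
]


def relative_increase(early, full):
    """Return (absolute increase, percentage increase relative to `full`)."""
    increase = early - full
    return increase, 100.0 * increase / full


# Largest gap between recomputed and reported values, per column.
max_increase_gap = 0.0
max_percentage_gap = 0.0
for county, early, full, reported_inc, reported_pct in rows:
    inc, pct = relative_increase(early, full)
    max_increase_gap = max(max_increase_gap, abs(inc - reported_inc))
    max_percentage_gap = max(max_percentage_gap, abs(pct - reported_pct))
    print(f"{county:12s} {inc:+.2f} {pct:+.1f}%")
```

Small gaps in the percentage column are expected, since the tabulated inputs are already rounded to two decimals.
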
County | Infections on 17/03 | Infections on 18/03 | Infections on 19/03 | Infections on 20/03 |
---|---|---|---|---|
New York | 69 | 161 | 335 | 437 |
Kings | 39 | 264 | 273 | 674 |
Bronx | 29 | 123 | 154 | 191 |
Queens | 38 | 123 | 336 | 519 |
Richmond | 11 | 26 | 33 | 116 |
Nassau | 24 | 52 | 186 | 385 |
Westchester | 157 | 158 | 261 | 292 |
Rockland | 9 | 8 | 23 | 48 |
Suffolk | 22 | 31 | 62 | 193 |
Monroe | 1 | 4 | 13 | 5 |
Onondaga | 1 | 0 | 3 | 3 |
Oneida | 0 | 0 | 2 | 0 |
Overall Average | 122.8 |
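
The overall average reported in the last row of the infections table can be reproduced by averaging all 48 county-day values (12 counties over 4 days). The snippet below is a small arithmetic check; the dictionary name is illustrative.

```python
# Daily infections per county for 17/03, 18/03, 19/03 and 20/03.
daily_infections = {
    "New York": [69, 161, 335, 437],
    "Kings": [39, 264, 273, 674],
    "Bronx": [29, 123, 154, 191],
    "Queens": [38, 123, 336, 519],
    "Richmond": [11, 26, 33, 116],
    "Nassau": [24, 52, 186, 385],
    "Westchester": [157, 158, 261, 292],
    "Rockland": [9, 8, 23, 48],
    "Suffolk": [22, 31, 62, 193],
    "Monroe": [1, 4, 13, 5],
    "Onondaga": [1, 0, 3, 3],
    "Oneida": [0, 0, 2, 0],
}

total = sum(sum(values) for values in daily_infections.values())
n_values = sum(len(values) for values in daily_infections.values())
overall_average = total / n_values

print(total, n_values, round(overall_average, 1))  # 5894 48 122.8
```
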
County | Infections 04/03–20/03 | Infections 04/03–21/03 | Infections 04/03–22/03 |
---|---|---|---|
New York | <10⁻⁴ | 0.0518 | 0.0902 |
Kings | <10⁻⁴ | 0.0003 | <10⁻⁴ |
Bronx | <10⁻⁴ | 0.0011 | 0.0003 |
Queens | <10⁻⁴ | 0.0002 | 0.1283 |
Richmond | <10⁻⁴ | <10⁻⁴ | <10⁻⁴ |
Nassau | <10⁻⁴ | <10⁻⁴ | 0.0018 |
Westchester | <10⁻⁴ | <10⁻⁴ | <10⁻⁴ |
Rockland | 0.0071 | 0.0 | <10⁻⁴ |
Suffolk | <10⁻⁴ | <10⁻⁴ | <10⁻⁴ |
Monroe | <10⁻⁴ | 0.0058 | 0.001 |
Onondaga | <10⁻⁴ | <10⁻⁴ | <10⁻⁴ |
Oneida | <10⁻⁴ | <10⁻⁴ | <10⁻⁴ |
Algorithm | Hyper-Parameter | Value |
---|---|---|
KNN | N Neighbors | 5 |
KNN | Weights | Uniform |
SVC | C | 1 |
SVC | Kernel | RBF |
SVC | Degree | 3 |
SVC | Gamma | 1/8 |
MLP | Hidden Layer | 1 |
MLP | Hidden Layer size | 100 |
MLP | Max Epochs | 500 |
MLP | Activation Function | ReLU |
MLP | Optimization Algorithm | Adam |
MLP | Batch Size | 16 |
MLP | Learning Rate | 0.001 |
ET | N Estimators | 50 |
ET | Criterion | Gini |
ET | Min Samples Split | 2 |
ET | Min Samples Leaf | 1 |
ET | Max Features | |
ET | Bootstrap | False |
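
For readers who want to reuse these settings, the hyper-parameter table can be written down as a plain Python configuration. The keyword names below assume scikit-learn style estimators (`KNeighborsClassifier`, `SVC`, `MLPClassifier`, `ExtraTreesClassifier`); the source does not name the library, so this mapping is an assumption. Max Features is omitted because no value is given for it.

```python
# Hyper-parameters from the table, keyed by algorithm.
# Keyword names follow scikit-learn conventions (an assumption, see above).
HYPER_PARAMETERS = {
    "KNN": {
        "n_neighbors": 5,
        "weights": "uniform",
    },
    "SVC": {
        "C": 1,
        "kernel": "rbf",
        "degree": 3,      # only relevant for polynomial kernels
        "gamma": 1 / 8,   # equals 1 / (number of lagged pollution features)
    },
    "MLP": {
        "hidden_layer_sizes": (100,),  # one hidden layer of 100 units
        "max_iter": 500,               # maximum number of epochs
        "activation": "relu",
        "solver": "adam",
        "batch_size": 16,
        "learning_rate_init": 0.001,
    },
    "ET": {
        "n_estimators": 50,
        "criterion": "gini",
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "bootstrap": False,
    },
}

for name, params in HYPER_PARAMETERS.items():
    print(name, params)
```

Under that assumption, each entry could be unpacked into the matching estimator constructor, for example `SVC(**HYPER_PARAMETERS["SVC"])`.
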
County | KNN | SVC | MLP | ET | Avg. per County |
---|---|---|---|---|---|
New York | 1 | 1 | 0.95 | 0.82 | 0.94 |
Kings | 0.95 | 0.8 | 1 | 0.79 | 0.89 |
Bronx | 0.85 | 1 | 0.95 | 0.82 | 0.91 |
Queens | 0.9 | 0.89 | 0.89 | 0.89 | 0.89 |
Richmond | 0.87 | 0.87 | 0.87 | 0.91 | 0.88 |
Nassau | 0.8 | 0.7 | 0.95 | 0.89 | 0.84 |
Rockland | 0.77 | 0.82 | 0.82 | 0.82 | 0.81 |
Westchester | 0.95 | 0.83 | 0.76 | 0.76 | 0.83 |
Suffolk | 0.9 | 0.85 | 0.85 | 0.9 | 0.88 |
Avg per algorithm | 0.89 | 0.86 | 0.89 | 0.84 | |

County | KNN | SVC | MLP | ET | Avg. per County |
---|---|---|---|---|---|
Monroe | 0.85 | 0.85 | 0.88 | 0.91 | 0.87 |
Onondaga | 0.85 | 0.88 | 0.91 | 1 | 0.91 |
Oneida | 0.91 | 0.94 | 1 | 0.94 | 0.95 |
Avg per algorithm | 0.87 | 0.89 | 0.93 | 0.95 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mirri, S.; Roccetti, M.; Delnevo, G. The New York City COVID-19 Spread in the 2020 Spring: A Study on the Potential Role of Particulate Using Time Series Analysis and Machine Learning. Appl. Sci. 2021, 11, 1177. https://doi.org/10.3390/app11031177