New Virus Variant Detection Based on the Optimal Natural Metric
Highlights
- We introduce a new algorithm designed for the automatic detection of emerging virus variants.
- The algorithm was tested on real datasets including SARS-CoV-2 and HIV-1, demonstrating nearly 100% precision in identification.
- Our approach enables the efficient identification of new virus variants based solely on sequence data, eliminating the need for biologists to pinpoint key viral regions.
- Our method pushes the boundaries of alignment-free techniques, expanding their application from classifying within known categories to recognizing new categories.
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. The Optimal Natural Metric
2.3. New Virus Detection Method
Algorithm 1 New variant detection method
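As a rough illustration of distance-based novelty detection in an alignment-free setting, the sketch below flags a sequence as a candidate new variant when its nearest-neighbor (NN) distance to a set of known sequences exceeds a threshold. It is not the authors' Algorithm 1. It assumes the plain 12-dimensional natural vector (count, mean position, and scaled second central moment per nucleotide) with Euclidean distance in place of the optimal natural metric, and a leave-one-out quantile as the threshold rule. The function names and the `quantile` parameter are hypothetical.

```python
import math


def natural_vector(seq):
    """12-dim natural vector: for each of A, C, G, T the count, the mean
    position (1-indexed), and the scaled second central moment."""
    seq = seq.upper()
    n = len(seq)
    vec = []
    for base in "ACGT":
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # Absent nucleotide: all three components are set to zero.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mu = sum(positions) / n_k
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        vec.extend([float(n_k), mu, d2])
    return vec


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nn_distance(vec, reference):
    """Distance from vec to its nearest neighbor in the reference set."""
    return min(euclidean(vec, r) for r in reference)


def fit_threshold(reference, quantile=0.99):
    """Assumed rule: empirical quantile of leave-one-out NN distances
    computed inside the set of known sequences."""
    loo = sorted(
        nn_distance(v, reference[:i] + reference[i + 1:])
        for i, v in enumerate(reference)
    )
    idx = min(len(loo) - 1, int(quantile * len(loo)))
    return loo[idx]


def is_new_variant(seq, reference, threshold):
    """Flag seq as a candidate new variant if it lies farther from every
    known sequence than the fitted threshold."""
    return nn_distance(natural_vector(seq), reference) > threshold
```

In this sketch the only inputs are raw sequences: `reference` is built by applying `natural_vector` to each known genome, and no alignment or manually chosen genomic region is involved. The quantile controls the trade-off between flagging known variants by mistake and missing genuinely new ones.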
3. Results
3.1. SARS-CoV-2
3.2. HIV-1
3.3. Orthocoronavirinae
3.4. Time Complexity Analysis
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
COVID-19 | Coronavirus Disease 2019 |
SARS-CoV-2 | Severe Acute Respiratory Syndrome Coronavirus-2 |
HIV-1 | Human Immunodeficiency Virus-1 |
WHO | World Health Organization |
VOC | Variants of Concern |
VOI | Variants of Interest |
NCBI | National Center for Biotechnology Information |
NN | Nearest Neighbor |
Appendix A
k-mer/Order | 0 | 1 | 2 |
---|---|---|---|
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 |
Dataset | Type I Error | Type II Error |
---|---|---|
SARS-CoV-2 | 0.94% | 0.96% |
HIV-1 | 0.94% | 0.87% |
Orthocoronavirinae | 0.03% | 0.98% |
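For reference, error rates of this kind can be computed from per-sequence outcomes as sketched below. The sketch assumes the null hypothesis is "the sequence belongs to a known variant", so a Type I error is a known-variant sequence flagged as new and a Type II error is a new-variant sequence that goes unflagged. That reading of the table, and the function name `error_rates`, are assumptions.

```python
def error_rates(is_truly_new, flagged_new):
    """Return (type_i, type_ii) as fractions.

    type_i  = known-variant sequences flagged as new / all known-variant sequences
    type_ii = new-variant sequences not flagged      / all new-variant sequences
    """
    pairs = list(zip(is_truly_new, flagged_new))
    n_known = sum(1 for truth, _ in pairs if not truth)
    n_new = sum(1 for truth, _ in pairs if truth)
    false_pos = sum(1 for truth, flag in pairs if not truth and flag)
    false_neg = sum(1 for truth, flag in pairs if truth and not flag)
    type_i = false_pos / n_known if n_known else float("nan")
    type_ii = false_neg / n_new if n_new else float("nan")
    return type_i, type_ii


# Toy example: four known-variant sequences and two new-variant sequences.
truth = [False, False, False, False, True, True]
flags = [False, True, False, False, True, False]
print(error_rates(truth, flags))  # (0.25, 0.5)
```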
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, H.; Yau, S.S.-T. New Virus Variant Detection Based on the Optimal Natural Metric. Genes 2024, 15, 891. https://doi.org/10.3390/genes15070891