Article

Genomic Diversity and Evolution of SARS-CoV-2 Lineages in Pakistan

1 Institute of Microbiology, University of Veterinary and Animal Sciences, Lahore 54000, Pakistan
2 Department of Epidemiology, University of Veterinary and Animal Sciences, Lahore 54000, Pakistan
3 Department of Clinical Sciences, Sub Campus Jhang, University of Veterinary and Animal Sciences, Lahore 54000, Pakistan
4 Veterinary Research Institute, Lahore 53810, Pakistan
* Author to whom correspondence should be addressed.
Viruses 2023, 15(7), 1450; https://doi.org/10.3390/v15071450
Submission received: 23 May 2023 / Revised: 25 June 2023 / Accepted: 26 June 2023 / Published: 27 June 2023
(This article belongs to the Special Issue Viral Genetic Variation)

Abstract

The emergence of SARS-CoV-2 variants has posed a challenge to disease control efforts worldwide. This study explored the genomic diversity and phylogenetic relationship of SARS-CoV-2 variants reported in Pakistan. Our objective was to understand the transmission dynamics of different lineages within the country. We retrieved and analyzed spike protein sequences from Pakistan and compared them with reference sequences reported worldwide. Our analysis revealed the clustering of Pakistan-origin isolates in nine different clades representing different regions worldwide, suggesting the transmission of multiple lineages within the country. We found 96 PANGO lineages of SARS-CoV-2 in Pakistan, and 64 of these corresponded to 4 WHO-designated variants: Alpha, Beta, Delta, and Omicron. The most dominant variants in Pakistan were Alpha (B.1.1.7), Beta (B.1.351), Delta (B.1.617.2, AY.108), and Omicron (BA.2.75, BA.5.2), and the N-terminal domain and receptor binding regions were the most hypervariable regions of the spike gene. Compared to the reference strain, characteristic substitutions were found in dominant variants. Our findings emphasize the importance of continuously monitoring and assessing nucleotide and residue substitutions over time to understand virus evolutionary trends better and devise effective disease control interventions.

1. Introduction

SARS-CoV-2, a novel coronavirus, was first identified in 2019 and rapidly became a global public health concern. It was first reported in Pakistan in February 2020, and by mid-2020 the disease had spread nationwide, with clinical cases observed throughout the country. As of 7 March 2023, SARS-CoV-2 had infected 1,577,213 individuals in Pakistan, resulting in 30,643 deaths [source: https://covid.gov.pk/ (accessed on 7 March 2023)].
The spike protein is a critical structural component of SARS-CoV-2 that mediates viral entry into host cells. It comprises 1273 amino acids organized into two subunits, S1 and S2. The S1 subunit contains the receptor-binding domain (RBD), which mediates attachment of the virus to the ACE2 receptor on host cells, whereas the S2 subunit mediates fusion of the viral and host cell membranes, enabling the virus to enter the cell [1]. S1 is less conserved than S2, presumably because of its frequent interactions with host receptors and antibodies. The N-terminal domain (NTD) and the RBD are two highly immunogenic regions of S1 and are the primary targets of neutralizing monoclonal and polyclonal antibodies. The ongoing spread of SARS-CoV-2 in immunocompetent populations has given rise to novel variants that have evolved to adapt to the host.
The spike protein is therefore a key determinant of the success or failure of COVID-19 vaccines. Accordingly, the World Health Organization (WHO) classifies SARS-CoV-2 variants based on the frequency and location of substitutions in the spike protein.
COVID-19 has spread through Pakistan in five waves to date. The first wave began in late May 2020, peaked in mid-June with the highest numbers of newly confirmed cases and daily deaths, and ended rapidly in mid-July. The second wave began in early November 2020 with the emergence of the Beta and Mu variants. The third wave, associated with the emergence of the highly transmissible Delta variant, was reported in mid-March 2021. Pakistan then experienced two additional waves: the fourth from July to September 2021 and the fifth from January to March 2022 [2]. The Delta and Omicron variants were the dominant SARS-CoV-2 variants during Pakistan's fourth and fifth COVID-19 waves, respectively.
The succession of variants across COVID-19 waves has produced different patterns of disease occurrence over time. This may be attributed to the emergence of escape mutants that cause vaccine failure in vaccinated individuals or reinfection in previously infected individuals, with each variant showing a different rate of disease occurrence. The lack of data on the genetic and antigenic characteristics of the prevailing variants/lineages, and on their spatial distribution in susceptible populations, has made it challenging to develop and implement effective disease control interventions. It is therefore crucial to understand the ongoing genetic evolution of the prevailing variants, coupled with a comparative residue analysis that includes the reference vaccine strains used in the country. With this background, we explored the COVID-19 sequence database to identify circulating lineages and clades and to assess the ongoing evolution of SARS-CoV-2 in Pakistan. We used spike gene sequence data for phylogenomic clustering, identification of substitutions, and analysis of evolutionary trends among the dominant lineages/variants circulating in Pakistan and reference prototypes reported worldwide.

2. Methodology

2.1. Sequence Dataset

A total of 1496 nucleotide sequences of the SARS-CoV-2 spike protein gene from Pakistan were obtained from the NCBI database [https://www.ncbi.nlm.nih.gov/sars-cov-2/ (accessed on 5 March 2023)] (Table S1). The sequences were initially grouped into 96 lineages and 1 unclassified group according to the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) nomenclature. The complete spike gene sequences were retrieved and evaluated for sequence quality; 38 lineages whose sequences contained ambiguous bases (>1% N) were excluded from the phylogenetic and comparative analyses to minimize the risk of false-positive results. The final dataset therefore included SARS-CoV-2 strains representing 58 lineages. The reference sequence hCoV-19/Wuhan/WIV04/2019 (NC_045512.2) was retrieved from the NCBI database [https://www.ncbi.nlm.nih.gov/sars-cov-2/ (accessed on 5 March 2023)]. The sequences were aligned using the ClustalW multiple alignment method implemented in BioEdit® version 7.2 [3].
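The ambiguity filter described above can be sketched as follows. This is a minimal illustration, assuming that each lineage is represented by one spike gene sequence and that only the letter N counts as ambiguous (other IUPAC codes and alignment gaps are ignored). The function names and example sequences are hypothetical and are not part of the authors' workflow.

```python
from __future__ import annotations


def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of non-gap positions that are 'N' (case-insensitive)."""
    bases = [b for b in seq.upper() if b != "-"]  # ignore alignment gaps
    if not bases:
        return 1.0  # treat an empty sequence as fully ambiguous
    return sum(1 for b in bases if b == "N") / len(bases)


def passes_qc(seq: str, max_n_fraction: float = 0.01) -> bool:
    """Keep a sequence only if its N content does not exceed the threshold (1% by default)."""
    return ambiguous_fraction(seq) <= max_n_fraction


def filter_lineages(reps: dict[str, str], max_n_fraction: float = 0.01) -> dict[str, str]:
    """Drop lineages whose representative sequence exceeds the N threshold."""
    return {lin: s for lin, s in reps.items() if passes_qc(s, max_n_fraction)}


# Hypothetical example: the second sequence is 5% N and is excluded.
reps = {"BA.2": "ATG" * 100, "XBB.1": "ATG" * 95 + "N" * 15}
print(sorted(filter_lineages(reps)))  # ['BA.2']
```

The threshold is inclusive, so a sequence with exactly 1% N is kept. This matches a reading of ">1%" as the exclusion criterion.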

2.2. Bioinformatics Analysis

For the phylogenetic analysis, 58 complete spike gene sequences representing the different lineages circulating in Pakistan and 158 sequences from other geographical regions were used as input. A spike-gene-based phylogenetic tree was constructed using the FastTree algorithm in Geneious v7.1.9 (Biomatters Ltd., Auckland, New Zealand).
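FastTree itself is an external program and is not reproduced here. As a simpler way to inspect the aligned input before tree building, the sketch below computes uncorrected pairwise p-distances (the share of differing sites). This is a different, distance-based measure and not FastTree's approximate maximum likelihood method. The sketch assumes the sequences are already aligned to equal length and skips sites where either sequence has a gap or N. Names and sequences are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of differing sites among sites without gaps or N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip gap and ambiguous columns for this pair
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    names = sorted(aln)
    return {(p, q): p_distance(aln[p], aln[q]) for p, q in combinations(names, 2)}


# Hypothetical toy alignment.
aln = {"ref": "ACGTACGTAC", "s1": "ACGTACGTAA", "s2": "ACGTNCGTAC"}
for pair, d in distance_matrix(aln).items():
    print(pair, round(d, 3))
```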
Comparative residue analysis of the spike protein was also performed using the 58 complete spike sequences, one per lineage, to evaluate the putative role of substitutions. The open reading frame (ORF) was predicted by translating the nucleotide sequences of the entire spike coding region into amino acid sequences in BioEdit®. The amino acid sequences were aligned using the ClustalW multiple alignment method implemented in BioEdit®. All substitutions in the spike protein were identified manually in BioEdit® and with the GISAID CoVsurver mutations application [https://gisaid.org/database-features/covsurver-mutations-app/ (accessed on 15 March 2023)] and were mapped onto the spike protein gene structure, which was drawn and edited using Adobe Illustrator® (version 27.1).
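The translation and substitution-calling steps can be illustrated with a short sketch. It assumes the standard genetic code (NCBI translation table 1), an ungapped coding sequence for translation, and protein sequences that have already been aligned (for example, with ClustalW). Substitutions are numbered on the reference and written in the usual notation (e.g., D614G, H69del). Insertions relative to the reference are skipped, and unknown residues (X) are ignored. The helper names are hypothetical, and the sketch does not reproduce BioEdit or CoVsurver.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an ungapped coding sequence; unknown codons (e.g. containing N) become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def call_substitutions(ref_aa: str, query_aa: str) -> list[str]:
    """List changes between two aligned proteins, numbered on the reference (e.g. 'D614G')."""
    if len(ref_aa) != len(query_aa):
        raise ValueError("protein sequences must be aligned to equal length")
    changes: list[str] = []
    pos = 0
    for r, q in zip(ref_aa, query_aa):
        if r == "-":
            continue  # insertion relative to the reference: not reported in this sketch
        pos += 1
        if q in (r, "X"):
            continue  # identical or unknown residue
        changes.append(f"{r}{pos}del" if q == "-" else f"{r}{pos}{q}")
    return changes


print(translate("ATGTTTGATTAA"))  # MFD*
print(call_substitutions("MFDK", "MF-R"))  # ['D3del', 'K4R']
```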

3. Results

3.1. SARS-CoV-2 Spike Protein Sequences

The spike gene sequences of SARS-CoV-2 retrieved from the NCBI SARS-CoV-2 database (https://www.ncbi.nlm.nih.gov/sars-cov-2/, accessed on 5 March 2023) originated from different geographic regions of Pakistan; their province-wide distribution is presented in Figure 1.
The largest number of sequences was submitted from Sindh (n = 846), followed by Punjab (n = 610), Khyber Pakhtunkhwa (KPK; n = 36), Azad Jammu and Kashmir (AJK; n = 2), and Gilgit-Baltistan (n = 2) (Figure 1). We also examined the temporal relationship between the prevalence of COVID-19 variants and the mortality rate in Pakistan, tracing the changing variant landscape within the country alongside the corresponding mortality rate (Figure 2).
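The regional counts above sum to the full dataset of 1496 sequences. Converting them to shares makes the imbalance explicit. This is a small arithmetic check that uses only the counts reported in the text:

```python
# Sequence counts per region as reported in the text.
counts = {"Sindh": 846, "Punjab": 610, "KPK": 36, "AJK": 2, "Gilgit-Baltistan": 2}

total = sum(counts.values())
shares = {region: round(100 * n / total, 2) for region, n in counts.items()}

print(total)   # 1496
print(shares)  # {'Sindh': 56.55, 'Punjab': 40.78, 'KPK': 2.41, 'AJK': 0.13, 'Gilgit-Baltistan': 0.13}
```

Sindh and Punjab together account for just over 97% of the available sequences.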

3.2. Lineage Categorization of SARS-CoV-2 Genome Sequences

A total of 96 PANGO-designated lineages circulated in Pakistan between March 2020 and March 2023 (Table 1, Table 2 and Table 3). Of these, 64 lineages correspond to 4 WHO-designated variants: Alpha, Beta, Delta, and Omicron (Table 1 and Table 2). The GISAID clades corresponding to these circulating lineages, based on the available SARS-CoV-2 sequence dataset from Pakistan, are also shown in Table 1, Table 2 and Table 3.
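Assigning PANGO lineages to WHO variant labels is essentially a prefix lookup. Descendant lineages extend the parent name (e.g., BA.5.2 descends from BA.5), and some descendants carry aliases (Q for B.1.1.7 sublineages, AY for B.1.617.2 sublineages). The table below is a hypothetical, incomplete mapping that covers only lineage names mentioned in this article. A real analysis would rely on the official PANGO alias list and the WHO labels.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, incomplete prefix table limited to lineages named in this article.
# Q.* are aliases of B.1.1.7 sublineages, AY.* of B.1.617.2, BA.* of B.1.1.529;
# BQ.* descends from BA.5 and XBB.* is a recombinant of BA.2 sublineages.
WHO_LABELS = {
    "B.1.1.7": "Alpha", "Q": "Alpha",
    "B.1.351": "Beta",
    "B.1.617.2": "Delta", "AY": "Delta",
    "B.1.1.529": "Omicron", "BA": "Omicron", "BQ": "Omicron", "XBB": "Omicron",
}


def who_label(lineage: str) -> Optional[str]:
    """Return the WHO label if the lineage equals or descends from a listed prefix."""
    for prefix, label in WHO_LABELS.items():
        # Require a '.' after the prefix so that, for example, B.1.1.70 would not
        # be treated as a descendant of B.1.1.7.
        if lineage == prefix or lineage.startswith(prefix + "."):
            return label
    return None


lineages = ["B.1.1.7", "Q.4", "B.1.351", "AY.108", "BA.5.2", "XBB.1.5", "B.1.36", "C.23"]
labelled = {lin: who_label(lin) for lin in lineages}
print(sum(label is not None for label in labelled.values()))  # 6
```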

3.3. GISAID Clades Distribution of SARS-CoV-2 Sequences of Pakistan

The S protein gene sequences of SARS-CoV-2 from Pakistan were clustered into nine clades, while some sequences could not be assigned to any clade. Among these nine clades, GRA was the most prevalent (34.27%; Table 1), followed by GRY (23.71%), GH (16.26%), GK (8.24%), G (5.27%), GR (3.47%), L (1.3%), O (1.08%), and S (0.43%) (Table 2). A total of 14.09% of the sequences were under monitoring and were not assigned to any clade (Figure 3 and Table 3).

3.4. Phylogenetic Analysis

The maximum likelihood phylogenetic tree of the S protein gene sequences of the 58 PANGO lineages from Pakistan was annotated according to GISAID clade and PANGO lineage (Figure 3). The lineages were distributed among the clades as follows: L (B and B.4), O (B.1.260, B.6, and B.6.6), S (A), G (B.1.617.2), GH (B.1, B.1.351, B.1.36, B.1.36.24, B.1.36.31, B.1.36.34, B.1.471, and B.1.525), GK (AY.106, AY.108, AY.126, AY.127, AY.46, AY.46.2, AY.55, and AY.65), GR (AE.4, B.1.1, B.1.1.1, B.1.1.413, and C.23), GRA (B.1.1.529, BA.1, BA.1.1, BA.1.1.1, BA.1.1.18, BA.1.15.1, BA.1.17.2, BA.1.18, BA.1.21, BA.2, BA.2.12.1, BA.2.14, BA.2.15, BA.2.3, BA.2.36.1, BA.2.5, BA.2.75.1, BA.4, BA.4.1, BA.5.2, BA.5.2.1, BA.5.2.16, BA.5.2.20, BA.5.2.28, BA.5.2.3, BA.5.2.9, BA.5.3, BA.5.3.1, BA.5.5, BA.5.6, BQ.1, BQ.1.5, BQ.1.1, BQ.1.13, XBB.1, XBB.1.11.1, XBB.1.12, XBB.1.5, XBB.1.5.9, XBB.1.9, XBB.1.9.1, XBB.1.9.2, XBB.2, XBB.2.1, and XBB.2.4), and GRY (B.1.1.7 and Q.4). The maximum likelihood phylogenetic tree of the Pakistani and global sequences is presented in Figure 4.

3.5. Mutational Diversity in Spike Protein of SARS-CoV-2

The mutational analysis revealed that D614G was the most frequent amino acid change, occurring in 79.5% of the lineages from Pakistan. Other substitutions present in >10% of the lineage samples included T478K (59.09%); G142D (56.81%); N501Y and P681H (50% each); S477N and E484A (47.72%); V211G, G339D, K417N, N440K, C498R, Y505H, H655Y, N679K, D796Y, C954H, and N969K (45.45%); S373P and S375F (44.45%); T19I, L24S, T376A, and D405N (40.90%); S371F, R408S, and F486V (29.54%); V68I (27.27%); N764K (20.45%); D950N (13.63%); and T19R, E156G, N460K, C493R, and P681R (11.36%). Rare substitutions in subvariants (2.27% each) included R346K, K444R, L452C, E484K, N658S, A570T, A570D, T572I, D574V, N658Y, S673T, A701V, S704L, T716I, K814R, A845S, A846V, I850L, D950H, S982A, A1020S, T1117I, D1118H, E1182C, and E120C (Table 4).
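The frequencies reported above express how many of the analyzed lineages carry a given substitution. The sketch below shows a minimal version of that tally. It assumes each lineage is summarized by one set of substitution labels, so a lineage is counted once per substitution. The lineage names and profiles are made-up toy data, not the study's results.

```python
from __future__ import annotations

from collections import Counter


def substitution_frequencies(lineage_subs: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of lineages carrying each substitution (each lineage counted once)."""
    n = len(lineage_subs)
    counts = Counter(s for subs in lineage_subs.values() for s in subs)
    return {s: 100 * c / n for s, c in counts.most_common()}


# Toy profiles for four hypothetical lineages (not data from this study).
profiles = {
    "lin1": {"D614G", "N501Y"},
    "lin2": {"D614G"},
    "lin3": {"D614G", "T478K"},
    "lin4": set(),
}
print(substitution_frequencies(profiles)["D614G"])  # 75.0
```

Because the denominator is the number of lineages rather than the number of sequences, a lineage represented by many sequences carries the same weight as a rare one.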

3.6. Comparison of Mutational Substitutions in Spike Protein in Dominant Lineages of SARS-CoV-2 in Pakistan

Pakistan experienced five waves of SARS-CoV-2 infection. The sequence data reveal that six lineages/WHO variants (B.1.1.7/Alpha, B.1.351/Beta, B.1.617.2/Delta, AY.108/Delta, BA.2.75/Omicron, and BA.5.2/Omicron) were dominant during March 2020–March 2023, and the current study also describes the genetic diversity of these lineages/variants. Compared with the Wuhan reference sequence, B.1.1.7/Alpha carried 10 amino acid substitutions (99.4% identity) and had a prevalence of 23.3%. The numbers of substitutions and percentage identities of the other lineages were as follows: B.1.351/Beta (10, 99.4%), B.1.617.2/Delta (13, 99.1%), AY.108/Delta (11, 99.3%), BA.2.75/Omicron (33, 97.4%), and BA.5.2/Omicron (25, 98.3%) (Table 5 and Figure 4). Most of the substitutions were located in the N-terminal domain and the receptor-binding domain of the spike. Details of these substitutions by gene region are presented in Table 5 and Figure 5, and the SARS-CoV-2 spike protein structures of these dominant lineages with their characteristic substitutions are shown in Figure 6.
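Grouping substitutions by spike region, as in Table 5, amounts to comparing each residue number with domain boundaries. The sketch below uses approximate boundaries that are common in the literature: NTD about 14 to 305, RBD about 319 to 541, the PRRAR S1/S2 cleavage motif at 681 to 685, and S2 from 686 to 1273 of the 1273-residue spike. These boundaries vary between sources and are assumptions here, not the values the authors used. The function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Approximate residue ranges on the 1273-residue spike; boundaries differ between
# sources, so these values are adjustable assumptions rather than the authors' ones.
REGIONS = [
    ("signal peptide", 1, 13),
    ("NTD", 14, 305),
    ("RBD", 319, 541),
    ("S1/S2 site", 681, 685),  # PRRAR cleavage motif
    ("S2", 686, 1273),
]


def region_of(label: str) -> str:
    """Map a label such as 'N501Y' or 'H69del' to a spike region by residue number."""
    match = re.fullmatch(r"[A-Z](\d+)(?:[A-Z*]|del)", label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label}")
    pos = int(match.group(1))
    for name, start, end in REGIONS:
        if start <= pos <= end:
            return name
    # Remaining S1 positions lie between the named domains (e.g. around 570 or 614).
    return "S1 (other)" if 1 <= pos <= 685 else "outside spike"


def tally_regions(labels: list[str]) -> Counter:
    """Count substitutions per region."""
    return Counter(region_of(lab) for lab in labels)


print(tally_regions(["H69del", "N501Y", "E484K", "D614G", "P681H"]))
```

Under these boundaries, residues such as A570 and D614 fall in S1 but outside both the NTD and the RBD.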

4. Discussion

The aim of this study was to assess the diversity of SARS-CoV-2 PANGO lineages and GISAID clades in Pakistan. We sought to understand the evolution of SARS-CoV-2 in the country by identifying amino acid substitutions in the spike protein. Additionally, we analyzed the phylogenetic relationship of Pakistan's SARS-CoV-2 spike protein sequences with global sequences. A total of 1496 SARS-CoV-2 sequences have been uploaded to the NCBI database since the first sequence was collected in Pakistan in March 2020. Notably, Sindh province reported the highest number of sequences (n = 846), which we attribute to its more efficient disease-reporting surveillance system and genome sequencing capacity. Our analysis revealed the B.4 and B lineages in the first three sequences from Pakistan, and we investigated the ongoing evolution of these lineages by identifying amino-acid-changing substitutions in the spike protein. These findings provide valuable insights into the genetic diversity and evolution of SARS-CoV-2 in Pakistan, which can aid in developing effective disease control interventions [4]. The B.4 lineage was the first lineage reported in Iran and played a significant role in the COVID-19 pandemic there [5].
The abundance of various SARS-CoV-2 variants in Pakistan has become a subject of increasing concern as the pandemic has progressed. Early in the pandemic, the original (ancestral) virus dominated, but several variants of concern (VOCs) and variants of interest (VOIs) subsequently emerged [https://gisaid.org/hcov-19-variants-dashboard/ (accessed on 28 May 2023)]. VOCs such as Alpha, Beta, Gamma, and Delta have displayed increased transmissibility and have been associated with a higher risk of severe disease and mortality. These variants have often outcompeted their predecessors, leading to significant shifts in the viral landscape. It is therefore crucial to assess their prevalence and monitor their impact on mortality rates to guide public health strategies in Pakistan. Investigations into the mortality rates associated with different SARS-CoV-2 variants have revealed notable patterns: certain variants, such as Alpha and Delta, have been linked to increased mortality compared with earlier variants [https://outbreak.info/location-reports?loc=PAK/ (accessed on 28 May 2023)]. This suggests that the genetic changes in these VOCs might contribute to higher virulence or evasion of the host immune response. Moreover, some new variants have shown reduced susceptibility to therapeutic interventions, potentially affecting patient outcomes. However, mortality rates are also influenced by other factors, including healthcare capacity, population demographics, and vaccination rates, so the observed associations between variant abundance and mortality should be interpreted within the broader context of the pandemic.
This study investigated the diversity of PANGO lineages and GISAID clades in Pakistan and identified six dominant lineages/variants up to March 2023, among a total of 96 PANGO lineages circulating in the country. The dominant lineages/variants were B.1.1.7/Alpha, B.1.351/Beta, B.1.617.2/Delta, AY.108/Delta, BA.2.75/Omicron, and BA.5.2/Omicron. The B.1.1.7 lineage (Alpha variant), which emerged in England in September 2020, was first reported in Pakistan in December 2020 and became the dominant lineage in the country; the high number of sequences submitted in April 2021 (n = 108) further supports its dominance. Previous studies have also reported its higher transmissibility and virulence compared with other variants [6,7,8]. Our analysis revealed that this lineage carried more substitutions, which may have contributed to its higher transmission and mortality rates. The increased number of B.1.1.7 sequences could be attributed to its higher reproduction number, as observed in England [6,7,8].
The B.1.351 lineage, also known as the Beta variant, was first identified in South Africa in late 2020 and is widely known as the South African variant. In Pakistan, this lineage was first identified in May 2021, likely owing to travel by overseas Pakistanis from regions where the variant was prevalent. Previous research has indicated that B.1.351 is more transmissible than the B.1.1.7/Alpha variant [9,10,11].
The emergence of different SARS-CoV-2 lineages and variants is a matter of global concern. One of the major lineages reported in Pakistan is B.1.617.2 (Delta variant), first reported in India, which different studies have described as highly contagious and virulent, making it a serious public health concern. Similarly, the AY.108 lineage of the Delta variant was represented by a high number of sequences (n = 76) from Pakistan, highlighting the need to monitor and track its prevalence. Omicron lineages have also been dominant in Pakistan, including BA.2.75, which was first reported in India in mid-2022. Another major Omicron lineage, BA.5.2, was first detected in 2022 and reached a high frequency (40%) in the United States.
In summary, B.1.1.7/Alpha, B.1.351/Beta, B.1.617.2/Delta, AY.108/Delta, BA.2.75/Omicron, and BA.5.2/Omicron were the dominant lineages in Pakistan until March 2023. The B.1.1.7/Alpha lineage, first reported in England in September 2020, was dominant in Pakistan in the initial months. However, previous studies reported that B.1.351/Beta is more transmissible than B.1.1.7/Alpha, so the emergence of B.1.351/Beta in Pakistan in May 2021 is a matter of concern.
SARS-CoV-2 has undergone numerous amino-acid-changing substitutions, particularly in the spike protein, which plays a key role in viral entry into host cells. The Global Initiative on Sharing All Influenza Data (GISAID) database indicates that the S protein has accumulated more than 1229 amino acid substitutions. Identifying and tracking these substitutions in different lineages and variants is crucial for understanding their potential impact on viral transmission, virulence, and vaccine efficacy, and continuous surveillance of SARS-CoV-2 lineages and variants is therefore necessary to develop effective public health strategies against the pandemic [12]. The current study investigated substitutions in the spike protein gene of SARS-CoV-2 in Pakistan to understand the evolution of the virus in the country. Substitutions were analyzed in all reported lineages from Pakistan, including the dominant lineages/variants, and across all regions of the spike protein of these dominant lineages/variants.
In the B.1.1.7/Alpha lineage, three deletions (H69del, V70del, and Y144del) were observed in the N-terminal domain of the spike protein. The substitutions N501Y (in the RBD), A570D, and D614G were highly prevalent (>70%) and were considered characteristic of B.1.1.7. In addition, P681H was identified as a characteristic mutation in the S1/S2 region.
In the B.1.351/Beta lineage, the substitutions D80A and D215G and the deletions L242del, A243del, and L244del were observed in the NTD. The RBD substitutions K417N, E484K, and N501Y, together with D614G, were highly prevalent (>80%) and were considered characteristic of this lineage. For the Delta variant, two lineages (B.1.617.2 and AY.108) were analyzed: eight substitutions were found in the NTD of B.1.617.2 and six in that of AY.108, while three substitutions were found in the RBD of both lineages. L452R and T478K were newly observed in the Delta variant and, owing to their high prevalence, were considered characteristic substitutions of these lineages.
For the Omicron variant, two lineages (BA.2.75 and BA.5.2) were analyzed: 11 substitutions were found in the NTD of BA.2.75, but only 3 in that of BA.5.2. In the RBD, 17 substitutions were detected; G446S and N460K were found only in BA.2.75 and were therefore considered characteristic substitutions of this Omicron lineage.
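Calling a substitution characteristic of one lineage, as done for G446S and N460K in BA.2.75, corresponds to a set difference: take the substitutions of that lineage and remove every substitution seen in any other lineage being compared. The sketch below is a minimal version of this, assuming each lineage is summarized by a set of substitution labels. The abbreviated RBD profiles are illustrative only and are not the complete profiles from this study.

```python
from __future__ import annotations


def characteristic(lineage: str, profiles: dict[str, set[str]]) -> set[str]:
    """Substitutions found in `lineage` but in none of the other lineages compared."""
    others = set().union(*(subs for name, subs in profiles.items() if name != lineage))
    return profiles[lineage] - others


# Abbreviated, illustrative RBD profiles (not the full profiles from this study).
profiles = {
    "BA.2.75": {"G446S", "N460K", "N501Y"},
    "BA.5.2": {"L452R", "F486V", "N501Y"},
}
print(sorted(characteristic("BA.2.75", profiles)))  # ['G446S', 'N460K']
```

The result depends on which lineages are included in the comparison. Adding more lineages can only shrink the set of characteristic substitutions.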
The present study provides insights into the substitutions present in the spike protein gene of SARS-CoV-2 in Pakistan and the characteristic substitutions of different lineages/variants. These findings can aid in the development of effective vaccines and treatments against COVID-19. Based on previous studies, the spike protein N-terminal domain (NTD) of SARS-CoV-2 contains epitopes that are targeted by neutralizing antibodies generated by the host’s adaptive immune system [13,14]. Moreover, the NTD has been implicated in the interaction between the spike protein and glycosylated components of the host cell surface, which plays a crucial role in host cell adherence. Consequently, it is crucial to monitor the genetic diversity of the NTD to track the emergence of new viral variants. Our investigation indicates that specific loop components of the NTD, which have evolved differently among various SARS-CoV-2 clades, exhibit significant mutation rates. This finding underscores the importance of continuous surveillance and monitoring of the NTD of the spike protein to better understand the virus’s evolutionary dynamics and its potential impact on disease transmission and severity [13,14].
Substitutions in the receptor-binding domain (RBD) of the spike (S) protein play a critical role in both the infectivity of SARS-CoV-2 variants and their susceptibility to antibodies. Previous research has identified seven dominant RBD substitutions (G339D, N440K, L452R, S477N, T478K, E484K, and N501Y) that have been shown to enhance interactions between positively charged residues and ACE2 [15]. The D614G substitution, which lies outside the RBD, was the most prevalent (79%) and has been associated with increased viral transmission; the replacement of aspartic acid with glycine at residue 614 has been hypothesized to enhance the virus's infectivity [16]. These findings suggest that monitoring the emergence and prevalence of RBD substitutions is essential for understanding the evolution of SARS-CoV-2 variants and their potential impact on disease transmission and severity. Further investigation is needed to fully elucidate the functional consequences of these substitutions and their impact on the efficacy of existing therapeutics and vaccine strategies.
This study has several limitations. First, it analyzed only the spike protein of SARS-CoV-2 sequences from Pakistan, so the findings may not extend to other genes or be generalizable to other countries or regions. Second, because the virus evolves rapidly, the analyzed sequences may not capture all mutations that have arisen since sampling. Third, although the study provides insights into the evolutionary trend of SARS-CoV-2 in Pakistan, it does not address the clinical or epidemiological implications of the observed mutations. Finally, the structural analysis of the spike protein was limited to substitutions in the dominant lineages; future studies could explore other substitutions that may affect viral behavior. Caution should therefore be exercised when interpreting these results, and further research is required to fully understand the implications of the observed mutations.
In conclusion, the present study sheds light on the rapid evolution of SARS-CoV-2 in Pakistan through the analysis of 96 lineages representing the country's diverse clades. Our findings reveal numerous variations and amino acid substitutions over time, indicating a highly dynamic viral landscape, and our comparative residue analysis provides detailed insights into the dominant lineages and their structural changes. These results have implications for understanding the evolutionary trend of SARS-CoV-2 in Pakistan and beyond. By identifying the pattern of substitutions in dominant lineages, this study highlights the importance of continuously monitoring the rapidly evolving virus and adapting control measures accordingly. Ultimately, our research provides valuable information for developing effective strategies to combat the ongoing pandemic.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15071450/s1; Table S1: A Catalogue of SARS-CoV-2 Sequences Originating from Pakistan.

Author Contributions

M.W.A., T.Y. and M.Z.S. designed the study; M.W.A., M.A.A. and M.Z.S. collected data and samples; M.W.A., M.F.S. and N.M. performed the analysis; M.W.A., M.N., A.N. and N.M. conducted the formal analysis; T.Y., A.A.A. and M.H.M. provided overall supervision of the project; M.W.A. and M.Z.S. were responsible for writing the original draft; M.Z.S., A.N., N.S., R.T., N.M. and M.A.A. were responsible for reviewing and editing the drafts. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was approved by the Institutional Review Board (or Ethics Committee) of UVAS (DR/459).

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed during the current study were retrieved from and are available in the NCBI repository [https://www.ncbi.nlm.nih.gov/sars-cov-2/ (accessed on 5 March 2023)]. The accession numbers and details of these SARS-CoV-2 sequences are shown in the Supplementary Materials.

Acknowledgments

We thank all the laboratories that submitted their sequences to the NCBI platform. We are also grateful to the NCBI team for managing this platform.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Banerjee, A.K.; Begum, F.; Ray, U.J.P. Mutation hot spots in Spike protein of COVID-19. Preprints 2020, 2020, 2020040281.
  2. Sayeed, M.A. 258. Comparison of Different COVID Waves during COVID-19 Pandemic: A Retrospective Study from a Dedicated COVID-19 Facility of Karachi Pakistan. Open Forum Infect. Dis. 2022, 9, ofac492.336.
  3. Hall, T. BioEdit: An important software for molecular biology. GERF Bull. Biosci. 2011, 2, 60–61.
  4. Basheer, A.; Zahoor, I. Genomic epidemiology of SARS-CoV-2 divulge B.1, B.1.36, and B.1.1.7 as the most dominant lineages in the first, second, and third waves of SARS-CoV-2 infections in Pakistan. Microorganisms 2021, 9, 2609.
  5. Fattahi, Z.; Mohseni, M.; Jalalvand, K.; Aghakhani Moghadam, F.; Ghaziasadi, A.; Keshavarzi, F.; Yavarian, J.; Jafarpour, A.; Mortazavi, S.E.; Ghodratpour, F. SARS-CoV-2 outbreak in Iran: The dynamics of the epidemic and evidence on two independent introductions. Transbound. Emerg. Dis. 2022, 69, 1375–1386.
  6. Davies, N.G.; Abbott, S.; Barnard, R.C.; Jarvis, C.I.; Kucharski, A.J.; Munday, J.D.; Pearson, C.A.; Russell, T.W.; Tully, D.C.; Washburne, A.D. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 2021, 372, eabg3055.
  7. Kraemer, M.U.; Hill, V.; Ruis, C.; Dellicour, S.; Bajaj, S.; McCrone, J.T.; Baele, G.; Parag, K.V.; Battle, A.L.; Gutierrez, B. Spatiotemporal invasion dynamics of SARS-CoV-2 lineage B.1.1.7 emergence. Science 2021, 373, 889–895.
  8. Frampton, D.; Rampling, T.; Cross, A.; Bailey, H.; Heaney, J.; Byott, M.; Scott, R.; Sconza, R.; Price, J.; Margaritis, M. Genomic characteristics and clinical effect of the emergent SARS-CoV-2 B.1.1.7 lineage in London, UK: A whole-genome sequencing and hospital-based cohort study. Lancet Infect. Dis. 2021, 21, 1246–1256.
  9. Williams, H.; Hutchinson, D.; Stone, H. Watching Brief: The evolution and impact of COVID-19 variants B.1.1.7, B.1.351, P.1 and B.1.617. Glob. Biosecurity 2021, 3.
  10. Roquebert, B.; Trombert-Paolantoni, S.; Haim-Boukobza, S.; Lecorche, E.; Verdurme, L.; Foulongne, V.; Sofonea, M.T.; Alizon, S. The SARS-CoV-2 B.1.351 lineage (VOC β) is outgrowing the B.1.1.7 lineage (VOC α) in some French regions in April 2021. Eurosurveillance 2021, 26, 2100447.
  11. Shen, X.; Tang, H.; Pajon, R.; Smith, G.; Glenn, G.M.; Shi, W.; Korber, B.; Montefiori, D.C. Neutralization of SARS-CoV-2 variants B.1.429 and B.1.351. N. Engl. J. Med. 2021, 384, 2352–2354.
  12. Guruprasad, K. Mutations in human SARS-CoV-2 spike proteins, potential drug binding and epitope sites for COVID-19 therapeutics development. Curr. Res. Struct. Biol. 2022, 4, 41–50.
  13. Harvey, W.T.; Carabelli, A.M.; Jackson, B.; Gupta, R.K.; Thomson, E.C.; Harrison, E.M.; Ludden, C.; Reeve, R.; Rambaut, A.; COVID-19 Genomics UK (COG-UK) Consortium. SARS-CoV-2 variants, spike mutations and immune escape. Nat. Rev. Microbiol. 2021, 19, 409–424.
  14. Klinakis, A.; Cournia, Z.; Rampias, T. N-terminal domain mutations of the spike protein are structurally implicated in epitope recognition in emerging SARS-CoV-2 strains. Comput. Struct. Biotechnol. J. 2021, 19, 5556–5567.
  15. da Costa, C.H.S.; de Freitas, C.A.B.; Alves, C.N.; Lameira, J. Assessment of mutations on RBD in the Spike protein of SARS-CoV-2 Alpha, Delta and Omicron variants. Sci. Rep. 2022, 12, 8540.
  16. Zhang, L.; Jackson, C.B.; Mou, H.; Ojha, A.; Rangarajan, E.S.; Izard, T.; Farzan, M.; Choe, H. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. BioRxiv 2020.
Figure 1. Province-wise distribution of SARS-CoV-2 sequences from Pakistan retrieved from the NCBI SARS-CoV-2 database (https://www.ncbi.nlm.nih.gov/sars-cov-2/, accessed on 5 March 2023), covering March 2020 to March 2023. The map was edited using Adobe Illustrator (version 27.1).
Figure 2. (a) Temporal dynamics of SARS-CoV-2 variant abundance in Pakistan (source: https://gisaid.org/hcov-19-variants-dashboard/, accessed on 5 March 2023) and (b) the corresponding trend in mortality rate over time (source: https://outbreak.info/location-reports?loc=PAK, accessed on 5 March 2023).
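Panel (a) is taken from the GISAID variants dashboard. As a hedged illustration of what "variant abundance over time" means, the sketch below computes, for each calendar month, the fraction of sequences assigned to each variant. The input records and the function name `monthly_variant_shares` are hypothetical, and this is not claimed to be the dashboard's own method.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_variant_shares(records):
    """Map 'YYYY-MM' -> {variant: fraction of that month's sequences}.

    `records` is an iterable of (ISO collection date, variant label) pairs.
    """
    counts = defaultdict(Counter)
    for collected, variant in records:
        d = date.fromisoformat(collected)  # raises ValueError on malformed dates
        counts[f"{d.year:04d}-{d.month:02d}"][variant] += 1
    # Normalize each month's counts so the variant shares sum to 1.
    return {
        month: {v: n / sum(c.values()) for v, n in c.items()}
        for month, c in sorted(counts.items())
    }


# Hypothetical records, for illustration only (not data from this study).
demo = [("2021-05-03", "Alpha"), ("2021-05-20", "Delta"),
        ("2021-05-21", "Delta"), ("2021-06-01", "Delta")]
for month, shares in monthly_variant_shares(demo).items():
    print(month, {v: round(s, 2) for v, s in shares.items()})
```

Plotting these monthly fractions as stacked areas gives a curve of the same kind as panel (a); binning by week instead of month only changes the key built from the date.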
Figure 3. Percentage distribution of GISAID clades among SARS-CoV-2 sequences from Pakistan (data downloaded on 5 March 2023). The clade names (S, L, G, O, GH, GR, GRA, GK, and GRY) are assigned by GISAID on the basis of the marker mutations that define each clade. Each clade is shown in a distinct color, indicating its relative prevalence among the sampled sequences from Pakistan.
Figure 4. Maximum likelihood phylogenetic tree showing the genetic relationships among SARS-CoV-2 sequences from Pakistan, which are highlighted in red. The tree resolves 9 distinct clades, 1 unclassifiable group (denoted “NA”), and 58 PANGO lineages. Global SARS-CoV-2 genomes carrying the highest numbers of mutations are grouped into their respective clades. Light green, dark green, black, purple, red, yellow, dark teal, and blue circles mark the distribution of Pakistani samples within their respective clades.
Figure 5. Schematic of the SARS-CoV-2 spike gene and the substitutions found in the dominant lineages/variants in Pakistan, constructed in Adobe Illustrator. (a) Structure of the spike gene; (b–g) substitutions across the spike regions of (b) B.1.1.7/Alpha, (c) B.1.351/Beta, (d) B.1.617.2/Delta, (e) AY.108/Delta, (f) BA.2.75/Omicron, and (g) BA.5.2/Omicron.
Figure 6. SARS-CoV-2 spike protein with the characteristic substitutions of the dominant lineages/variants in Pakistan. The figure was generated with the GISAID CoVsurver mutations application (https://gisaid.org/database-features/covsurver-mutations-app/). (a) B.1.1.7/Alpha with A570D, T716I, and S982A; (b) B.1.351/Beta with K417N, E484K, N501Y, and A701V; (c) B.1.617.2/Delta with A222V and L452R; (d) AY.108/Delta with T95I and L452R; (e) BA.2.75/Omicron with I210V, V213G, G257S, G339H, Y505H, D574Y, and D614G; and (f) BA.5.2/Omicron with S373F, R408S, D796Y, and Q954H.
Table 1. Distribution of the GRA clade (Omicron) and corresponding lineages of SARS-CoV-2 detected in Pakistan.
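| GISAID Clade | No. of Sequences | Percentage (%) | PANGOLIN Lineage | No. of Sequences | Percentage (%) | WHO Variant | Geographical Distribution (Province) |
|---|---|---|---|---|---|---|---|
| GRA | 474 | 34.27 | B.1.1.529 | 3 | 0.22 | Omicron | Sindh |
| | | | BA.1 | 22 | 1.59 | Omicron | Sindh |
| | | | BA.1.1 | 44 | 3.18 | Omicron | Sindh |
| | | | BA.1.1.1 | 2 | 0.14 | Omicron | Sindh |
| | | | BA.1.1.18 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.1.15.1 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.1.17.2 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.1.18 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.1.21 | 2 | 0.14 | Omicron | Sindh |
| | | | BA.2 | 61 | 4.41 | Omicron | Sindh |
| | | | BA.2.10.1 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.12.1 | 3 | 0.22 | Omicron | Sindh |
| | | | BA.2.14 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.15 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.3 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.36.1 | 2 | 0.14 | Omicron | Sindh |
| | | | BA.2.5 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.75.1 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.2.75.2 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.4 | 7 | 0.51 | Omicron | Sindh |
| | | | BA.4.1 | 8 | 0.58 | Omicron | Sindh |
| | | | BA.5.1 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2 | 205 | 14.82 | Omicron | Sindh |
| | | | BA.5.2.1 | 37 | 2.68 | Omicron | Sindh |
| | | | BA.5.2.16 | 8 | 0.58 | Omicron | Sindh |
| | | | BA.5.2.18 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2.20 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2.28 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2.3 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2.56 | 2 | 0.14 | Omicron | Sindh |
| | | | BA.5.2.6 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.2.9 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.3 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.3.1 | 1 | 0.07 | Omicron | Sindh |
| | | | BA.5.5 | 2 | 0.14 | Omicron | Sindh |
| | | | BA.5.6 | 2 | 0.14 | Omicron | Sindh |
| | | | BQ.1 | 3 | 0.22 | Omicron | Sindh |
| | | | BQ.1.1 | 3 | 0.22 | Omicron | Sindh |
| | | | BQ.1.13 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.1 | 18 | 1.3 | Omicron | Punjab, Sindh |
| | | | XBB.1.11.1 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.1.12 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.1.5 | 4 | 0.29 | Omicron | Punjab, Sindh |
| | | | XBB.1.5.9 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.1.9 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.1.9.1 | 5 | 0.36 | Omicron | Punjab, Sindh |
| | | | XBB.1.9.2 | 3 | 0.22 | Omicron | Punjab, Sindh |
| | | | XBB.2 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.2.1 | 1 | 0.07 | Omicron | Sindh |
| | | | XBB.2.4 | 1 | 0.07 | Omicron | Sindh |
| Total | 474 | | | | | | |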
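Table 1 lists many deep sublineages; for example, nine BA.5.2.x sublineages appear alongside BA.5.2 itself. A minimal sketch, assuming coarser counts are wanted: truncate each dotted PANGO name to a fixed number of components and sum. The rows are transcribed from Table 1, while `BA5_ROWS`, `collapse`, and `rollup` are hypothetical names. This is plain string truncation, not official PANGO alias resolution, so a lineage written under an alias (such as BQ.1, a BA.5 descendant) is not grouped with BA.5.

```python
from collections import Counter

# BA.5-derived rows transcribed from Table 1 (lineage -> number of sequences).
BA5_ROWS = {
    "BA.5.1": 1, "BA.5.2": 205, "BA.5.2.1": 37, "BA.5.2.16": 8,
    "BA.5.2.18": 1, "BA.5.2.20": 1, "BA.5.2.28": 1, "BA.5.2.3": 1,
    "BA.5.2.56": 2, "BA.5.2.6": 1, "BA.5.2.9": 1, "BA.5.3": 1,
    "BA.5.3.1": 1, "BA.5.5": 2, "BA.5.6": 2,
}


def collapse(lineage: str, depth: int) -> str:
    """Keep only the first `depth` dot-separated components of a PANGO name.

    Alias prefixes (e.g. BQ, XBB) are NOT expanded, so only lineages that
    share the same written prefix end up in the same group.
    """
    return ".".join(lineage.split(".")[:depth])


def rollup(rows: dict[str, int], depth: int) -> Counter:
    """Sum sequence counts after collapsing each lineage to `depth` components."""
    totals: Counter = Counter()
    for lineage, n in rows.items():
        totals[collapse(lineage, depth)] += n
    return totals


print(rollup(BA5_ROWS, 3))
print(rollup(BA5_ROWS, 2))
```

At depth 3, BA.5.2 and its nine listed sublineages merge into a single group of 258 sequences; at depth 2, all 265 BA.5-prefixed sequences in these rows fall under BA.5.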
Table 2. Distribution of the L, O, S, G, GH, GK, GR, and GRY clades and corresponding lineages of SARS-CoV-2 detected in Pakistan, with WHO variant names.
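| GISAID Clade | No. of Sequences | Percentage (%) | PANGOLIN Lineage | No. of Sequences | Percentage (%) | WHO Variant | Geographical Distribution (Province) |
|---|---|---|---|---|---|---|---|
| L | 18 | 1.3 | B | 14 | 1.01 | - | KPK, Sindh, and Punjab |
| | | | B.4 | 4 | 0.29 | - | Sindh, Punjab, and Gilgit |
| O | 15 | 1.08 | B.1.260 | 1 | 0.07 | - | Sindh |
| | | | B.6 | 13 | 0.94 | - | Punjab, Sindh, KPK, and AJK |
| | | | B.6.6 | 1 | 0.07 | - | Sindh |
| S | 6 | 0.43 | A | 6 | 0.43 | - | Punjab and Sindh |
| G | 73 | 5.27 | B.1.617.2 | 73 | 5.28 | Delta | Punjab and Sindh |
| GH | 225 | 16.26 | B.1 | 127 | 9.18 | Beta | KPK, Sindh, and Punjab |
| | | | B.1.351 | 36 | 2.6 | Beta | Punjab and Sindh |
| | | | B.1.36 | 24 | 1.74 | - | Punjab and Sindh |
| | | | B.1.36.24 | 3 | 0.22 | - | Punjab |
| | | | B.1.36.31 | 10 | 0.72 | - | Punjab and Sindh |
| | | | B.1.36.34 | 2 | 0.14 | - | Punjab |
| | | | B.1.471 | 21 | 1.52 | - | Punjab and Sindh |
| | | | B.1.525 | 2 | 0.14 | - | Sindh |
| GK | 114 | 8.24 | AY.106 | 1 | 0.07 | Delta | Sindh |
| | | | AY.108 | 85 | 6.15 | Delta | Punjab, Sindh, and KPK |
| | | | AY.126 | 6 | 0.43 | Delta | Punjab and Sindh |
| | | | AY.127 | 12 | 0.87 | Delta | Sindh |
| | | | AY.46 | 1 | 0.07 | Delta | Sindh |
| | | | AY.46.2 | 1 | 0.07 | Delta | Sindh |
| | | | AY.55 | 6 | 0.43 | Delta | Punjab and Sindh |
| | | | AY.65 | 2 | 0.14 | Delta | Punjab and Sindh |
| GR | 48 | 3.47 | AE.4 | 1 | 0.07 | - | Sindh |
| | | | B.1.1 | 21 | 1.52 | - | Punjab and Sindh |
| | | | B.1.1.1 | 21 | 1.52 | - | Punjab, KPK, and AJK |
| | | | B.1.1.413 | 1 | 0.07 | - | Sindh |
| | | | C.23 | 4 | 0.29 | - | Punjab |
| GRY | 328 | 23.71 | B.1.1.7 | 326 | 23.57 | Alpha | Punjab and Sindh |
| | | | BQ.1.5 | 1 | 0.07 | Alpha | Sindh |
| | | | B.1.1.75 | 1 | 0.07 | Alpha | Punjab |
| Total | 827 | | | | | | |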
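Each clade-level count in Table 2 is the sum of that clade's lineage rows. The sketch below recomputes the clade totals from the lineage counts as a simple consistency check; the counts are transcribed from the table, while the names `TABLE2`, `REPORTED`, and `clade_totals` are hypothetical.

```python
# Lineage counts transcribed from Table 2, keyed by GISAID clade.
TABLE2 = {
    "L": {"B": 14, "B.4": 4},
    "O": {"B.1.260": 1, "B.6": 13, "B.6.6": 1},
    "S": {"A": 6},
    "G": {"B.1.617.2": 73},
    "GH": {"B.1": 127, "B.1.351": 36, "B.1.36": 24, "B.1.36.24": 3,
           "B.1.36.31": 10, "B.1.36.34": 2, "B.1.471": 21, "B.1.525": 2},
    "GK": {"AY.106": 1, "AY.108": 85, "AY.126": 6, "AY.127": 12,
           "AY.46": 1, "AY.46.2": 1, "AY.55": 6, "AY.65": 2},
    "GR": {"AE.4": 1, "B.1.1": 21, "B.1.1.1": 21, "B.1.1.413": 1, "C.23": 4},
    "GRY": {"B.1.1.7": 326, "BQ.1.5": 1, "B.1.1.75": 1},
}

# Clade-level counts as printed in the same table.
REPORTED = {"L": 18, "O": 15, "S": 6, "G": 73, "GH": 225,
            "GK": 114, "GR": 48, "GRY": 328}


def clade_totals(table: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the per-lineage sequence counts within each clade."""
    return {clade: sum(lineages.values()) for clade, lineages in table.items()}


totals = clade_totals(TABLE2)
for clade, n in totals.items():
    status = "ok" if n == REPORTED[clade] else f"mismatch (reported {REPORTED[clade]})"
    print(f"{clade}: {n} {status}")
print("Total:", sum(totals.values()))
```

Every clade sum matches its reported count, and the grand total equals the 827 sequences in the table's Total row.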
Table 3. Distribution of Unclassified Group (NA) SARS-CoV-2 Sequences and Lineages Detected in Pakistan.
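| GISAID Clade | No. of Sequences | Percentage (%) | PANGOLIN Lineage | No. of Sequences | Percentage (%) | WHO Variant | Geographical Distribution (Province) |
|---|---|---|---|---|---|---|---|
| NA | 195 | 14.09 | Q.4 | 1 | 0.07 | - | Punjab |
| | | | BE.3 | 3 | 0.22 | - | Sindh |
| | | | BF.5 | 2 | 0.14 | - | Sindh |
| | | | BN.1 | 1 | 0.07 | - | Sindh |
| | | | BN.1.3 | 2 | 0.14 | - | Sindh |
| | | | BN.1.3.4 | 1 | 0.07 | - | Sindh |
| | | | BN.1.4 | 1 | 0.07 | - | Sindh |
| | | | BV.2 | 6 | 0.43 | - | Sindh |
| | | | BY.1 | 1 | 0.07 | - | Sindh |
| | | | CK.1 | 44 | 3.18 | - | Sindh |
| | | | CK.1.2 | 1 | 0.07 | - | Sindh |
| | | | CK.2 | 13 | 0.94 | - | Sindh |
| | | | CK.2.1 | 7 | 0.51 | - | Sindh |
| | | | CR.1 | 2 | 0.14 | - | Sindh |
| | | | CT.1 | 1 | 0.07 | - | Sindh |
| | | | Unclassifiable | 109 | 7.88 | - | Punjab, Sindh, and KPK |
| Total | 195 | | | | | | |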
Table 4. Amino acid substitutions detected in SARS-CoV-2 sequences from Pakistan, with their frequency of occurrence and percentage.
| Substitutions | Frequency of Occurrence (N = 44) | Percentage (%) |
|---|---|---|
| D614G | 35 | 79.54 |
| T478K | 26 | 59.09 |
| G142D | 25 | 56.81 |
| N501Y and P681H | 22 | 50 |
| S477N and E484A | 21 | 47.72 |
| V211G, G339D, K417N, N440K, Q498R, Y505H, H655Y, N679K, D796Y, Q954H, and N969K | 20 | 45.45 |
| S373P and S375F | 20 | 45.45 |
| T19I, L24S, T376A, and D405N | 18 | 40.90 |
| S371F, R408S, and F486V | 13 | 29.54 |
| V68I | 12 | 27.27 |
| N764K | 9 | 20.45 |
| D950N | 6 | 13.63 |
| T19R, E156G, N460K, Q493R, and P681R | 5 | 11.36 |
| R346K, K444R, L452Q, E484K, N658S, A570T, A570D, T572I, D574V, N658Y, S673T, A701V, S704L, T716I, K814R, A845S, A846V, I850L, D950H, S982A, A1020S, T1117I, D1118H, E1182Q, and E1202Q | 1 | 2.27 |
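Each percentage in Table 4 is the frequency divided by N = 44 and multiplied by 100. The printed values match when the quotient is truncated, rather than rounded, to two decimals (35/44 appears as 79.54, whereas rounding would give 79.55). A minimal sketch of that calculation, with `truncated_percentage` as a hypothetical helper name:

```python
def truncated_percentage(count: int, total: int) -> str:
    """Return 100 * count / total, truncated (not rounded) to two decimals.

    Integer arithmetic avoids floating-point edge cases in the truncation.
    Assumes non-negative counts and a positive total.
    """
    hundredths = count * 10_000 // total  # percentage expressed in hundredths
    return f"{hundredths // 100}.{hundredths % 100:02d}"


N = 44  # number of sequences summarized in Table 4
for substitution, frequency in [("D614G", 35), ("T478K", 26), ("G142D", 25), ("N764K", 9)]:
    print(substitution, frequency, truncated_percentage(frequency, N))
```

The same helper gives 45.45 for a frequency of 20 and 2.27 for a frequency of 1.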
Table 5. Mutational diversity in the spike protein of the B.1.1.7/Alpha, B.1.351/Beta, B.1.617.2/Delta, AY.108/Delta, BA.2.75/Omicron, and BA.5.2/Omicron lineages/variants in Pakistan.
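| S Gene Region | B.1.1.7/Alpha | B.1.351/Beta | B.1.617.2/Delta | AY.108/Delta | BA.2.75/Omicron | BA.5.2/Omicron |
|---|---|---|---|---|---|---|
| NTD | H69del | D80A | T19R | T19R | T19I | T19I |
| | V70del | D215G | - | T95I | L24del | - |
| | Y144del | L242del | - | - | A27S | - |
| | - | A243del | G142D | G142D | G142D | G142D |
| | - | L244del | E156G | E156G | K147E | - |
| | - | - | F157del | F157del | W152R | - |
| | - | - | R158del | R158del | F157L | - |
| | - | - | V159X | - | I210V | - |
| | - | - | Y160X | - | V213G | - |
| | - | - | A222V | - | G257S | - |
| | - | - | - | - | G339H | G339D |
| RBD | - | - | - | - | S371F | S371F |
| | - | - | - | - | S373P | S373P |
| | - | - | - | - | S375F | S375F |
| | - | - | - | - | T376A | T376A |
| | - | - | - | - | D405N | D405N |
| | - | - | - | - | R408S | R408S |
| | - | K417N | - | - | K417N | K417N |
| | - | - | - | - | N440K | N440K |
| | - | - | L452R | L452R | S477N | S477N |
| | - | - | T478K | T478K | T478K | T478K |
| | - | E484K | - | - | E484A | E484A |
| | - | - | - | - | Q498R | Q498R |
| | N501Y | N501Y | - | - | N501Y | N501Y |
| | A570D | - | - | - | Y505H | Y505H |
| | D614G | D614G | D614G | D614G | D614G | D614G |
| SD2 | - | - | - | - | H655Y | H655Y |
| | - | - | - | - | N679K | N679K |
| S1/S2 | P681H | - | P681H | P681H | P681H | P681H |
| UH | T716I | A701V | - | - | N764K | N764K |
| FP | - | - | - | - | D796Y | D796Y |
| HR1 | S982A | - | D950N | D950N | Q954H | Q954H |
| | - | - | - | - | N969K | N969K |
| CD | D1118H | - | - | - | - | - |
| No. of Substitutions | 10 | 10 | 13 | 11 | 33 | 25 |
| % AA Identity | 99.4% | 99.4% | 99.1% | 99.3% | 97.4% | 98.3% |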
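Table 5 and Figure 6 use standard notation for changes relative to the reference: reference residue, position, and query residue (e.g., D614G), with deletions written as, e.g., H69del. Figure 6 was generated with GISAID CoVsurver; the sketch below is not that tool, only a simplified illustration assuming the reference and query protein sequences are already pairwise-aligned. The names `call_mutations` and `percent_identity` are hypothetical, and the identity definition used (identical columns divided by alignment length) is one common choice. The table does not state which definition produced its % AA identity values, so this is not claimed to reproduce them.

```python
def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """List substitutions and deletions of an aligned query relative to a reference.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Positions are 1-based coordinates in the ungapped reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    calls: list[str] = []
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # insertion in the query: no reference coordinate, ignored here
        ref_pos += 1
        if q == "-":
            calls.append(f"{r}{ref_pos}del")  # deletion, e.g. H69del
        elif q != r:
            calls.append(f"{r}{ref_pos}{q}")  # substitution, e.g. D614G; 'X' gives e.g. V159X
    return calls


def percent_identity(ref_aln: str, qry_aln: str) -> float:
    """Identical non-gap columns divided by alignment length, as a percentage."""
    if len(ref_aln) != len(qry_aln) or not ref_aln:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = sum(1 for r, q in zip(ref_aln, qry_aln) if r == q and r != "-")
    return 100.0 * identical / len(ref_aln)


# Toy example (not a real spike sequence): one substitution and one deletion.
ref, qry = "ACDEFG", "ACNE-G"
print(call_mutations(ref, qry))              # ['D3N', 'F5del']
print(round(percent_identity(ref, qry), 2))  # 66.67
```

Counting the calls per lineage gives a "No. of Substitutions" figure in the sense of Table 5, which tallies deletions alongside substitutions.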
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Aziz, M.W.; Mukhtar, N.; Anjum, A.A.; Mushtaq, M.H.; Ashraf, M.A.; Nasir, A.; Shahid, M.F.; Nawaz, M.; Shabbir, M.Z.; Sarwar, N.; et al. Genomic Diversity and Evolution of SARS-CoV-2 Lineages in Pakistan. Viruses 2023, 15, 1450. https://doi.org/10.3390/v15071450

Chicago/Turabian Style

Aziz, Muhammad Waqar, Nadia Mukhtar, Aftab Ahamd Anjum, Muhammad Hassan Mushtaq, Muhammad Adnan Ashraf, Amar Nasir, Muhammad Furqan Shahid, Muhammad Nawaz, Muhammad Zubair Shabbir, Noreen Sarwar, and et al. 2023. "Genomic Diversity and Evolution of SARS-CoV-2 Lineages in Pakistan" Viruses 15, no. 7: 1450. https://doi.org/10.3390/v15071450

APA Style

Aziz, M. W., Mukhtar, N., Anjum, A. A., Mushtaq, M. H., Ashraf, M. A., Nasir, A., Shahid, M. F., Nawaz, M., Shabbir, M. Z., Sarwar, N., Tanvir, R., & Yaqub, T. (2023). Genomic Diversity and Evolution of SARS-CoV-2 Lineages in Pakistan. Viruses, 15(7), 1450. https://doi.org/10.3390/v15071450

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
