Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations

Mbarek, Hamdi; Cocca, Massimiliano; Al-Sarraj, Yasser; Saad, Chadi; Mezzavilla, Massimo; AlMuftah, Wadha; Cocciadiferro, Dario; Novelli, Antonio; Quinti, Isabella; AlTawashi, Azza; Salvaggio, Salvino; AlThani, Asma; Novelli, Giuseppe; Ismail, Said I.

doi:10.3390/genes12111842

Open AccessArticle

Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations

by

Hamdi Mbarek

^1,*,†

,

Massimiliano Cocca

^2,*,†

,

Yasser Al-Sarraj

^1,†

,

Chadi Saad

¹

,

Massimo Mezzavilla

²

,

Wadha AlMuftah

¹

,

Dario Cocciadiferro

³,

Antonio Novelli

³

,

Isabella Quinti

⁴,

Azza AlTawashi

⁵,

Salvino Salvaggio

⁵,

Asma AlThani

¹,

Giuseppe Novelli

⁶

and

Said I. Ismail

¹

Qatar Genome Program, Qatar Foundation Research, Development and Innovation, Qatar Foundation, Doha P.O. Box 5825, Qatar

²

Institute for Maternal and Child Health, IRCCS ‘Burlo Garofolo’, 34137 Trieste, Italy

³

Translational Cytogenomics Research Unit, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy

⁴

Department of Molecular Medicine, Sapienza University of Rome, 00185 Rome, Italy

⁵

VP RDI Office, Qatar Foundation Research, Development and Innovation, Qatar Foundation, Doha P.O. Box 5825, Qatar

⁶

Department of Biomedicine and Prevention, Tor Vergata University of Rome, 00133 Rome, Italy

^*

Authors to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Genes 2021, 12(11), 1842; https://doi.org/10.3390/genes12111842

Submission received: 30 September 2021 / Revised: 10 November 2021 / Accepted: 12 November 2021 / Published: 22 November 2021

(This article belongs to the Special Issue COVID-19 and Molecular Genetics)

Download

Browse Figures

Versions Notes

Abstract

:

Host genomic information, specifically genomic variations, may characterize susceptibility to disease and identify people with a higher risk of harm, leading to better targeting of care and vaccination. Italy was the epicentre for the spread of COVID-19 in Europe, the first country to go into a national lockdown and has one of the highest COVID-19 associated mortality rates. Qatar, on the other hand has a very low mortality rate. In this study, we compared whole-genome sequencing data of 14398 adults and Qatari-national to 925 Italian individuals. We also included in the comparison whole-exome sequence data from 189 Italian laboratory-confirmed COVID-19 cases. We focused our study on a curated list of 3619 candidate genes involved in innate immunity and host-pathogen interaction. Two population-gene metric scores, the Delta Singleton-Cohort variant score (DSC) and Sum Singleton-Cohort variant score (SSC), were applied to estimate the presence of selective constraints in the Qatari population and in the Italian cohorts. Results based on DSC and SSC metrics demonstrated a different selective pressure on three genes (MUC5AC, ABCA7, FLNA) between Qatari and Italian populations. This study highlighted the genetic differences between Qatari and Italian populations and identified a subset of genes involved in innate immunity and host-pathogen interaction.

Keywords:

COVID-19; population genetics; genetic constraints; COVID-19 severity

1. Introduction

COVID-19 continues to spread worldwide, with over four million deaths to date and rising. However, this global spread is coupled with stark anomalies in morbidity and mortality. These differences can be seen not only between different populations but also within the same population [1,2,3,4]. While most of these differences can be attributed to sociodemographic and clinical factors, this is also a unique opportunity to assess associations with host genomes. Host genomic information, specifically genomic variations, may characterize susceptibility to disease and identify people with a higher risk of harm, leading to better targeting of care and vaccination [5,6,7]. In addition, characterizing these host factors may help identifying and development of adapted drugs and vaccines [8,9,10]. The scientific community came together with several efforts to investigate how the genomic variation in the host affects disease susceptibility and progress [11,12]. So far, these large consortia efforts have led to the identification of over 20 loci associated with susceptibility or severity of the disease [13].

Italy was the epicentre for the spread of COVID-19 in Europe and the first country to go into a national lockdown. It had one of the highest COVID-19 associated mortality rates in Europe [4]. At the time of writing, almost five million cases have been confirmed, with a death toll of more than 130 thousand people (infection fatality rate = 2.7%). Qatar, on the other hand, despite having one of the highest worldwide numbers of laboratory-confirmed cases (36,729 cases per million, by July 2020), has a very low mortality rate (infection fatality rate = 0.91 per 10,000 persons, by July 2020, per WHO COVID-19 mortality classification) [14]. Studies have even suggested that some communities in Qatar have reached herd immunity for SARS-CoV-2 at a proportion of infection of 65–70% [15]. With the development of the first generation of RNA based vaccines [16,17], along with the more standard adenovirus-based solutions [18,19], the end of the pandemic seems to be in sight, though we are aware that this is just the beginning of a more bearable coexistence with the virus.

Differences in terms of fatality rate or disease prevalence between population groups, besides the socio-economic factors, could also be imputed to the patients’ genetic background, and, as mentioned above, several studies are investigating the host genetic contribution to the disease susceptibility and severity [11,12,13]. In this study, we focused on genes involved in the immune response, combining them with a dataset of 1500 proteins mostly involved in COVID-19 disease [20] and a subset of genes already identified as linked to COVID-19 susceptibility and progression [7]. We applied, on this set of genes, a prioritization method based on ultra-rare and population-specific variants. With this approach, we aim to identify a group of genes showing different signs of selective pressure in our study cohorts. Our hypothesis is that those genes can provide information to understand the pandemic progression and maybe help towards therapy.

2. Materials and Methods

2.1. Population Description

The Qatari Cohort: The Qatar Genome Program (QGP) [21] is a population-based project launched by the Qatar Foundation to generate a large-scale whole-genome sequence (WGS) dataset, in combination with comprehensive phenotypic information collected by the Qatar Biobank (QBB) [22]. All subjects included in the analysis were of Qatari Middle Eastern Arabian ancestry [23]. In this study, we use a cohort of 14,398 individuals with an average coverage of 30X. Data preprocessing and downstream quality control analyses for WGS data were conducted as recommended by the Covid19 Host Genetics Initiative study protocol [7].

Italian Genetic Isolated cohorts: Three Italian cohorts belonging to the Italian network of Genetic Isolates (INGI) were involved in this study due to the availability of whole genome-sequence data. The selected populations localized in three different geographical areas of Italy: North-West (Val Borbera-VBI), North-East (Friuli Venezia Giulia-FVG) and South-East (Carlantino-CAR); In each cohort, a wide range of phenotypic data is available for each participant (e.g., anthropometric traits, blood tests, sensory impairment, taste and food preferences, extensive personal and familial anamnesis). A total of 925 samples with low coverage (4X to 10X) WGS data were selected for the analyses [24]

Italian COVID-19 positive samples: a cohort of 189 individuals which tested positive for the SARS-CoV-2 infection and collected at the Bambino Gesu’ hospital in Rome was included in the study to provide information on the pattern of genetic variation in a group of selected genes in an outbred Italian cohort. Whole Exome Sequencing data was generated by the University of Tor Vergata from peripheral blood. The samples are clustered in three groups, based on the disease severity: severe, extremely severe and asymptomatic [25].

All data analyzed was aligned to the reference genome’s GRCh38 release, and functional annotations were obtained using the Ensembl VEP tool [26].

2.2. Principal Component Analysis

To highlight the study cohorts’ population structure level, we performed a principal component analysis (PCA) using KING software [27]. Plink v1.9 software [28] was used to convert data from vcf to plink binary format. QGP and each INGI cohort results were projected into the 1000Genomes Project data [29]. To highlight the peculiar ancestry structure of the Qatari population, we also performed an ancestry inference analysis using the software KING.

2.3. Genes Selection and Prioritization Analyses

Literature curation process, Genomics England (GEL) panel expert and Ingenuity Variant Analysis (IVA): The candidate gene generation process, the initial candidate gene list ranking and curation are conveyed on the recent literature review to extract a list of genes involved in innate immunity and host-pathogen interaction. The primary gene list is curated according to the knowledge-literature base by the Ingenuity® Variant Analysis™ software from QIAGEN [30] and the viral gene panel expert from Genomics England (GEL) [31]. This list includes a total of 3617 genes (Table S1).

Candidate genes were annotated with the most common gene-ranking metrics using the loss of function intolerance score (pLI) [32] and the Residual Variation Intolerance Score (RVIS) [33]. In addition, we selected a list of 25 genes (Table S2) that were recently associated with COVID-19 susceptibility and severity [7] and overlapped with the primary list, extracting a subset of genes that underwent further analyses.

Population-based Gene constraints: Two population-based gene metric scores, the Delta Singleton-Cohort variant score (DSC–accounting for the difference in singletons between coding and non-coding regions) and Sum Singleton-Cohort variant score (SSC—accounting for the sum of singletons variants in the coding and non-coding regions), were adopted to estimate the presence of specific pressures selection in the Qatari population as well as the Italian Isolated cohorts [34]. Only variants with a QUAL value above 30 were used in the calculation to limit the inclusion of genotyping error for variants with allele count (AC) equal to 1. Only scores calculated on canonical transcripts were selected. Only genes with scores values lower or equal to −2 and greater or equal to 2 were retained in each population. These values represent the significant threshold that allows us to discriminate between a gene under constraint (DSC or SSC score ≤−2) or under relaxation (DSC or SSC score ≥ 2). Since we aim to compare two populations with different structures and high levels of inbreeding, we also calculated the same set of scores for the closest ancestry populations of each study cohort. We used data from the gnomAD v3.1 [35] call set, including 1000Genomes project samples and extracted information on the EUR, AFR and SAS superpopulations subset. The EUR subset was used as reference for the Italian samples [36], while the AFR and SAS subsets for the Qatari cohort [23]. We defined two different levels of comparison: at the ancestry level, in which we selected all genes showing concordant selective signals between our study cohorts and their closest ancestry population, and at the population level, in which we selected only genes showing different behaviour between our target population and the reference. Since we were dealing with three Italian populations, comparing only with one reference, we selected genes that satisfied our criteria in at least one of the three target Italian populations.

On the other hand, we used two different references for the Qatari target population, so we selected all genes that meet our criteria in at least one reference population. Finally, we proceeded with the comparisons between our target populations, performing three sets of comparisons: population-specific, population-specific vs. ancestry related and ancestry related comparisons (Table S3). Each comparison was performed separately for SSC and DSC scores. Finally, we generated a list of genes overlapping between SSC and DSC comparisons to select those genes that consistently showed opposite behaviour in terms of selection or relaxation in our target populations. We used the Fisher’s test to compare DSC and SSC scores distributions between study cohorts, and reference populations. We performed a Shapiro-Wilk test to assess the normality of the score distribution in each cohort and an enrichment test to assess whether there was an enrichment in relaxed or constrained genes in our target populations vs. the selected reference populations.

2.4. WES COVID-19 Cohorts

Using Whole Exome Sequence data from a cohort of COVID-19 positive samples (n = 189), we calculated singletons count and singletons density in the coding regions of the genes belonging to the shortlist generated, adjusting by sample size, and compared using Fisher’s test against data from the other target and reference populations. In this cohort, each sample was characterized by a disease severity code. The disease severity classes are defined as follows: (1) Asymptomatic/Paucisymptomatic, (2) Severe, (3) Critical/life-threatening [25]. We used this information to investigate if we could identify any contribution of the singleton burden of the prioritized genes to the classification. A multinomial analysis with R was performed using age, gender and the singleton count as explanatory variables. We also analyzed the contribution of the prioritized genes to the outcome (Survived/Deceased) with a logistic regression model and the same covariates used in the multinomial analysis. We performed the analyses using both the whole-gene singleton count and the coding regions singleton count. A summary of the phenotype information is available in Table S4.

3. Results

3.1. Population Stratification

As expected, the PCA analysis (Figure 1) showed a clear differentiation between QGP and INGI (European) ancestry. The Italian cohorts clustered with the European samples from the 1000Genomes Project reference data, while the QGP samples overlapped with clusters from different populations (AFR, SAS, AMR, EUR). Using the ancestry inference function provided by the KING software, we confirmed the presence of different sub-population clusters in the QGP cohort, highlighting that a considerable proportion of the analyzed samples (more than 4000 samples) belong to a ‘missing’ super population cluster (Figure S1). This is mainly due to the absence of population from the Near East in the 1000 Genomes Project data. This outcome confirms results already obtained by different studies on the Italian populations [36] and on the first subset of nearly 6000 samples of the Qatari population [37].

3.2. Population Based Gene Prioritization

For each population and each gene in the selected subset, we calculated two scores related to the presence of cohort-singletons variants. In (Figure 2), we show the distribution, among the 3617 genes selected, of the DSC score ((Figure 2) top panel) and the SSC score ((Figure 2) bottom panel), in each study cohort and the selected reference groups (EUR, AFR and SAS). We compared each target population score distribution with the relevant reference population (Table S5). All the INGI populations are significantly different from the reference EUR population for both scores. In contrast, the Qatari population significantly differs from the reference populations (AFR and SAS) in DSC score distribution, but not for the SSC score distribution. This pattern is also confirmed in the enrichment analyses of relaxed and constrained genes, in target populations vs. reference populations, of relaxed and constrained genes. Exact Fisher’s tests show enrichment in constrained genes in all the target populations when comparing DSC scores (Table S6). Similarly, if we consider the SSC scores, all target populations do not show significant enrichment in constrained genes. Regarding the relaxed genes, though, we detected a significant enrichment in both DSC and SSC scores for the Italian populations (CAR, VBI and FVG) but not for the QGP cohort (Table S6).

We used a threshold of −2 to define significant constraint and a threshold of +2 to define a significant relaxation signal [34]. Results for the comparisons between our target populations are summarized in Table 1 and Table 2. Regarding the DSC score, we identified six genes with a signature of constraint in the QGP population and relaxation in at least one of the Italian populations. Two of those genes (TTN and LRP1B) are results of population-specific comparisons, and one (RICTOR) is the outcome of an ancestry-related comparison. Eight genes showed an opposite pattern of relaxation in the Qatari cohort and constraint in at least one Italian cohort. Among them, RYR3 is the result of an ancestry related comparison. When comparing our target populations based on the overall burden of singletons in each gene (SSC score), we identified a total of 35 genes that behave differently between the Qatari population and at least one Italian population (Table 2). Seventeen of those genes showed a pattern of constraint in the Qatar population and relaxation in at least one Italian cohort. The HELZ gene was the only one arising from a population-specific comparison. The remaining eighteen genes showed a pattern of relaxation in the QGP dataset but a significant constraint in at least one of the other targets. In this subset, the CELSR2 gene is the result of a population-specific comparison. Since our focus is to identify genes that consistently show different selection signals among our cohorts, we selected a subset of genes for which both DSC and SSC scores are concordant: ABCA7, FLNA, MUC5AC (Table 3). Those three genes showed a consistent relaxation pattern in the QGP cohort while being always characterized by strong signals of constraints in at least one Italian cohort. Interestingly, FLNA shows a significant signal of constraint in CAR and VBI cohorts while remaining neutral in the FVG dataset. The trend for the constraint signal is also replicated in all the reference cohorts selected. Data from other outbred populations from the 1000Genome project (EAS and AMR) confirm the trend of constraints (Table S7). ABCA7 repeats the pattern observed for FLNA, in terms of target populations, with a significant constraint signal in the FVG cohort, and a trend of constraint in the CAR cohort, while being neutral in the VBI cohort. This time though, we can see how the outbred reference populations, plus the remaining super populations of 1000Genomes, are all in agreement, showing relaxation signals. Lastly, the MUC5AC gene shows a consistent pattern of significant constraint signal in all the Italian cohorts, but conversely, always a significantly relaxed pattern in all other populations.

3.3. COVID-19 Cohort Analysis

Next, we included a cohort of 189 COVID-19 positive samples (TOV cohort) and calculated the number of singleton variants in this subset for the three genes of interest. Table 4 shows the results of the comparisons with the other study populations and the reference populations. If we consider the whole gene, we can see how, for the FLNA gene, the TOV cohort shows a small difference in the singleton density when compared to the FVG cohort and a more significant difference with the VBI and QGP cohorts, while ABCA7 and MUC5AC genes have consistently a significantly different pattern when compared with all reference and target populations (Table S8). If we consider only the coding part of each gene, we confirm the minor differences in the FLNA gene between the TOV cohort and the VBI and QGP cohorts. We also confirm the results for ABCA7 and MUC5AC (Table S9).

To investigate if the burden of singletons in the prioritized genes could contribute to the disease severity classification, we performed multinomial logistic regression analyses. Disease severity class was the response variable, and age, gender and singleton count the explanatory variables. When we consider the contribution of the burden of singleton in the whole gene, the multinomial analyses showed that only age and gender are important predictors for the disease severity classification (Table S10, Figure S2). Using the burden of singletons in the coding regions as parameters in the regression model resulted in age and gender being significant predictors for being in class 2 vs. class 1 (p-values: 6.09 ×

10^{- 7}

and 0.04463 respectively) and belonging to class 3 vs. class 1 (p-values: 6.42 ×

10^{- 10}

and 0.02035). Age resulted in a significant predictor of being in class 2 vs. class 3 (p-value: 0.008617). Regarding genes contribution, the model highlighted only the FLNA gene as a significant predictor for being in class 2 vs. class 1 and for being in class 3 vs. class 2 (all p-values: < 2.2 ×

10^{- 16}

) (Table S11).

We also retrieved the disease outcome (Survived/Deceased) information and used the same parameters to perform a logistic regression analysis. As a result, we can still see that age is a major predictor for the outcome (p-value: 1.50 × 10⁻¹⁰) (Figure S3). We can also see a contribution of the ABCA7 gene, but only when we consider the number of singletons in the whole gene (p-value: 0.0228) (Tables S12 and S13).

4. Discussion

Since the H1N1 influenza pandemic in 1918, the ongoing COVID-19 pandemic is the most severe emergency we have met globally.

For the scientific community, this emergency has been a wake-up call to join forces to fight back, investigate the effect of the virus on patients’ health, and understand the infection’s molecular mechanisms. All this ongoing effort is producing knowledge that is driving therapy and vaccine development. In this context, we focused on population-based statistics to characterize a subset of genes involved in the inflammation/immune response biological process. These statistics were obtained by analyzing populations with different ancestries and levels of inbreeding and consanguinity.

We performed a comparison between our study cohorts and their matched reference populations, according to principal component analysis (PCA) results. The comparisons within our target populations allowed us to show the different patterns in genetic constraints for a large subset of genes involved in the immune response, leading to the prioritization of a group of genes that we could define as “the most differentiated” in terms of signatures of genetic constraints. These differences are most prominent for three genes (MUC5AC, ABCA7, FLNA), which harbor a pattern of relaxation in the QGP cohort with respect to other cohorts analyzed. This different pattern of relaxation could be a hint for a different impact of the role of these genes in different populations.

Two of the genes, MUC5AC and FLNA, have already been linked to the COVID-19 host response to different degrees. The MUC5AC gene is a gel-forming mucin expressed in the lungs in response to infectious agents. This protein plays a protective role against inhaled pathogens, like influenza [38]. A recent study [39] compared levels of MUC5AC, MUC1 and MUC1-CT between critical ill COVID-19 patients and healthy controls, finding a significantly higher level of those proteins in the patients’ mucus. It is also worth noting that another recent work from Kousathanas et al. reported a significant genome-wide association between variants in the MUC1 gene and critical illness caused by [9,39].

The second gene, FLNA, codes for the Filamin A protein, which has been identified as a putative interaction candidate with coronaviruses S protein and is involved in the coronavirus replication cycle [40]. A recent study showed that the FLNA gene is part of the host protein-protein interaction (PPI) network for the SARS-CoV-2 virus and among the targets of different drugs under development [41]. A loss-of-function mutation of the FLNA gene was reported in family adults with emphysema [42].

There is no study showing a direct link between the ABCA7 gene and COVID-19 yet, but it has been proven that it is highly expressed in the reticuloendothelial system and modulates the phagocytosis activity [43,44], though its function, like many other ABC-transporters, has yet to be clarified. Interestingly the BioGRID interactome database [45] lists physical interactions of ABCA7 with ADBR2, C5AR2 and SGTB, among others. Each one of these genes has been linked to COVID-19 host response in previous studies, in terms of interaction [46], therapy [47] and severity in case of pre-existing health conditions [48].

Altogether with the information available on the prioritized genes and the knowledge of the different evolution of the pandemic between Qatar and Italy, we performed a proof-of-concept analysis. Using the information provided by a cohort of COVID-19 positive samples from Italy, to try to identify, if present, the contribution of the amount of ultra-rare variants in those genes to the outcome of the disease (Survived/Deceased) and the severity (Asymptomatic/Paucisymptomatic, Severe, Critical/life-threatening).

From a cohort-based perspective, we can see differences in the distribution of singletons in the COVID-19 positive samples regarding our study populations and their reference populations. This outcome suggests that those three genes could play a role in the description of the cohort and that investigating rare genetic variations occurring in those genetic regions could be a starting point to complete the characterization of those samples. With a subsequent approach, we applied logistic regression analyses to investigate the impact of the singleton burden in the three prioritized genes on the disease outcome and disease severity. While the contribution of age and sex is explicit and expected, these analyses suggested that the burden of singletons carried by each patient in the ABCA7 gene could predict a worse outcome together with age. From the point of view of the disease severity, the burden of singleton in the FLNA gene could help discriminate samples with distinct levels of disease severity. In this case, though, our findings seem to be inconsistent since we find that having a lower burden of singletons is a predictor of developing a severe reaction. However, a high singleton burden is a predictor of developing a critical reaction. This finding can be better explained by looking at the distribution of singletons in our cohort stratified by disease severity. For the FLNA gene, all samples belonging to class 2 of severity do not carry any singleton. This feature could be introduced by one of the limitations of this study: the sample size of the COVID-19 positive cohort. Increasing the number of cases will undoubtedly allow us to have a better estimate of the singleton distribution. Moreover, in our model, we didn’t include any of the risk factors that are already linked to a diverse response to the infection.

One last limitation could be represented by the inclusion of only one cohort of COVID-19 positive samples, for which only Whole Exome sequence data was available. We chose to include this cohort due to the phenotypical characterization, which allowed us to investigate our hypothesis of a genetic contribution to the disease severity prioritised genes. Nevertheless, for all the cohorts involved, information on the COVID-19 affected samples is already being collected. That will allow us to produce more precise results with further analyses.To our knowledge, this is the first study performing a whole-genome population-level comparison between Arabian and European populations, both differently affected by the pandemic. Recent similar studies focused only on the ACE2 receptor and populations from the 1000Genomes Project [49] or compared allele frequencies on covid-19 related genes in the Brazilian population with data from the 1000Genome and gnomAD datasets [50].

With the development of new vaccines against SARS-CoV-2 infection, we are bound to see a decrease in adverse disease outcomes and disease severity among the immunized populations. However, our work could be a starting point to better prioritize genes that could be therapeutical targets in different populations. Moreover, with the increased knowledge obtained thanks to the many studies that focused on understanding virus-host interaction, we could extend our method to any new similar threat that should arise in the future.

5. Conclusions

We were able to identify three candidate genes that could be further investigated for their role in the COVID-19 infection, and we want to stress the message that harnessing the information provided by rare genetic variants, in this still evolving context, is proving increasingly useful to explain the different outcomes of this disease.

Supplementary Materials

The following are available online https://www.mdpi.com/article/10.3390/genes12111842/s1. Figure S1: Inferred ancestry. Using the king software inferred ancestry function, we were able to correctly infer the ancestry of our target populations, matching the already available data. Figure S2: Disease severity classification in the TOV cohort. The box plots show that age is still a major discriminant factor related to disease severity, in the TOV cohort. Disease severity classes are defined as (1) Asymptomatic/Paucisymptomatic, (2) Severe, (3) Critical/life-threatening. Figure S3: Disease outcome classification in the TOV cohort. The box plots show the clear contribution of age, in the definition of the disease outcome classes. Table S1: Initial candidate gene list of 3617 genes curated according to the knowledge-literature base by the Ingenuity® Variant Analysis™ software (https://www.qiagenbioinformatics.com/products/ingenuity-variant-analysis (accessed on 15 November 2021)) from QIAGEN, Inc. (IVA) and the viral gene panel expert from Genomics England (GEL). Table S2: List of genes recently associated with COVID-19 susceptibility and severity. Table S3: Details of the comparisons defined between Target and Reference populations Table S4: Summary statistics of the phenotype information available for the TOV cohort. Table S5: Comparison between scores distribution of target populations vs. reference populations Table S6: Enrichment tests in target populations vs. reference populations. Table S7: DSC and SSC scores for FLNA, MUC5AC and ABCA7 in all 1000Genomes super populations. Table S8: Comparison of Gene Singleton Density between the COVID-19 positive cohort (TOV) and other target and reference populations. All singleton counts have been adjusted considering the sample size of each cohort. Table S9: Comparison of Coding Singleton Density between the COVID-19 positive cohort (TOV) and other target and reference populations. All singleton counts have been adjusted considering the sample size of each cohort. Table S10: Multinomial analysis results, comparing classes of disease severity, considering the burden of singletons in the whole gene. Class 1: Asymptomatic/Paucisymptomatic; Class 2: Severe; Class 3: Critical/life-threatening. Only significant predictors are reported. Table S11: Multinomial analysis results, comparing classes of disease severity, considering the burden of singletons in the coding regions of each gene. Class 1: Asymptomatic/Paucisymptomatic; Class 2: Severe; Class 3: Critical/life-threatening. Only significant predictors are reported. Table S12: Logistic analysis results for disease outcome classification. Only significant predictors are reported. Table S13: Logistic analysis results for disease outcome classification considering the burden of singletons in the coding regions of each gene. Only significant predictors are reported.

Author Contributions

Conceptualization, M.C. and H.M.; Methodology, M.C., M.M.; Formal Analysis, M.C., Y.A.-S. and C.S., H.M.; Resources, D.C., A.N., I.Q., G.N., W.A., A.A. (Azza AlTawashi), S.S., A.A. (Asma AlThani), S.I.I.; Writing—Original Draft Preparation, H.M., M.C, Y.A.-S., C.S., M.M.; Writing—Review & Editing, H.M., M.C., Y.A.-S., C.S., M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported in part by a grant of Regione Lazio (Italy, Progetti di Gruppi di Ricerca 2020 A0375-2020-36663 GecoBiomark) to G.N. and A.N. and Rome Foundation (Italy, Prot 317A/I) to GN. This work was also supported by IRCCS Burlo Garofolo of Trieste (5X1000 funding).

Institutional Review Board Statement

The recruitment and sequencing of participants from the Qatar Biobank (QBB) were approved by the Hamad Medical Corporation Ethics Committee in 2011 and continued with QBB IRB from 2017 onwards and renewed annually (protocol IRB-A-QBB-2019-0017). The genetic analyses were approved by the QBB IRB protocol E-2020-QBB-Res-ACC-0226-0130.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Access to the QGP data used for this study is through a dedicated portal by QGP (Accession ID: QF-QGP-RES-PUB-0226). The informed consent given by the study participants does not cover posting of participant level phenotype and genotype data of Qatar Biobank (QBB)/Qatar Genome Project (QGP) in public databases. Access to QBB/QGP data can be obtained through an established ISO-certified process by submitting a project request at https://www.qatarbiobank.org.qa/research/how-apply (accessed on 15 November 2021) which is subject to approval by the QBB IRB committee. The genetic data belonging to INGI cohorts and analyzed in this manuscript have been submitted to the European Variation Archive (EVA) are accessible in Variant Call Format (VCF) at the following link: https://www.ebi.ac.uk/ena/data/view/PRJEB33648 (accessed on 15 November 2021).

Acknowledgments

We would like to thank the Qatar Biobank for data collection and phenotyping. The Qatar Genome Program and Qatar Biobank are both Research, Development & Innovation entities within Qatar Foundation for Education, Science and Community Development. We would like to thank the inhabitants and local administrations of Friuli Venezia Giulia Region, Carlantino and Val Borbera that made these studies possible.

Conflicts of Interest

The authors declare no conflict of interest.

References

Price-Haywood, E.G.; Burton, J.; Fort, D.; Seoane, L. Hospitalization and Mortality among Black Patients and White Patients with COVID-19. N. Engl. J. Med. 2020, 382, 2534–2543. [Google Scholar] [CrossRef] [PubMed]
Al Kuwari, H.M.; Rahim, H.F.A.; Abu-Raddad, L.J.; Abou-Samra, A.B.; Al Kanaani, Z.; Al Khal, A.; Butt, A.A.; Al Kuwari, E.; Al Marri, S.; Al Masalmani, M.; et al. Epidemiological investigation of the first 5685 cases of SARS-CoV-2 infection in Qatar, 28 February–18 April 2020. BMJ Open 2020, 10. [Google Scholar] [CrossRef] [PubMed]
Omrani, A.S.; Almaslamani, M.A.; Daghfal, J.; Alattar, R.A.; Elgara, M.; Shaar, S.H.; Ibrahim, T.B.H.; Zaqout, A.; Bakdach, D.; Akkari, A. The first consecutive 5000 patients with Coronavirus Disease 2019 from Qatar: A nation-wide cohort study. BMC Infect. Dis. 2020, 20. [Google Scholar] [CrossRef]
Weekly Epidemiological Update on COVID-19. Available online: https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19 (accessed on 31 August 2021).
Milne, R. Societal considerations in host genome testing for COVID-19. Genet. Med. 2020, 22, 1464–1466. [Google Scholar] [CrossRef]
Pairo-Castineira, E.; Clohisey, S.; Klaric, L.; Bretherick, A.D.; Rawlik, K.; Pasko, D.; Baillie, J.K.; Walker, S.; Parkinson, N.; Fourman, M.H.; et al. Genetic mechanisms of critical illness in COVID-19. Nature 2021, 591, 92–98. [Google Scholar] [CrossRef]
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 2021. [Google Scholar] [CrossRef]
Asano, T.; Boisson, B.; Onodi, F.; Matuozzo, D.; Moncada-Velez, M.; Zhang, P.; Meertens, L.; Bolze, A.; Casanova, J.L.; Gervais, A.; et al. X-linked recessive TLR7 deficiency in ∼1 old with life-threatening COVID-19. Sci. Immunol. 2021, 6, eabl4348. [Google Scholar] [CrossRef]
Kousathanas, A.; Pairo-Castineira, E.; Rawlik, K.; Stuckey, A.; Odhams, C.A.; Walker, S.; Russell, C.D.; Malinauskas, T.; Millar, J.; Elliott, K.S.; et al. Whole genome sequencing identifies multiple loci for critical illness caused by COVID-19. medRxiv 2021, 15. [Google Scholar] [CrossRef]
Degenhardt, F.; Ellinghaus, D.; Juzenas, S.; Lerga-Jaso, J.; Wendorff, M.; Maya-Miles, D.; Uellendahl-Werth, F.; ElAbd, H.; Arora, J.; Özer, O.; et al. New susceptibility loci for severe COVID-19 by detailed GWAS analysis in European populations. medRxiv 2021, 9. [Google Scholar] [CrossRef]
The COVID-19 Host Genetics Initiative. A global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. EJHG 2020, 28, 715–718. [Google Scholar] [CrossRef]
Bioscience Genomics Involved in a Genetic Study on COVID-19-Bioscience Institute. Available online: https://bioinst.com/en/bioscience-genomics-involved-in-a-genetic-study-on-covid-19/ (accessed on 15 November 2021).
Colona, V.L.; Vasilou, V.; Watt, J.; Novelli, G.; Reichardt, J.K. Update on human genetic susceptibility to COVID-19: Susceptibility to virus and response. Hum. Genom. 2021, 15. [Google Scholar] [CrossRef]
Abu-Raddad, L.J.; Chemaitelly, H.; Ayoub, H.H.; Al Kanaani, Z.; Al Khal, A.; Al Kuwari, E.; Butt, A.A.; Coyle, P.; Jeremijenko, A.; Kaleeckal, A.H.; et al. Characterizing the Qatar advanced-phase SARS-CoV-2 epidemic. Sci. Rep. 2021, 11, 1–15. [Google Scholar] [CrossRef] [PubMed]
Jeremijenko, A.; Chemaitelly, H.; Ayoub, H.H.; Alishaq, M.; Abou-Samra, A.B.; Ajmi, J.A.A.A.; Ansari, N.A.A.A.; Kanaani, Z.A.; Khal, A.A.; Kuwari, E.A.; et al. Herd Immunity against Severe Acute Respiratory Syndrome Coronavirus 2 Infection in 10 Communities, Qatar-Volume 27, Number 5–May 2021-Emerging Infectious Diseases journal-CDC. Emerg. Infect. Dis. 2021, 27, 1343–1352. [Google Scholar] [CrossRef] [PubMed]
Polack, F.P.; Thomas, S.J.; Kitchin, N.; Absalon, J.; Gurtman, A.; Lockhart, S.; Gruber, W.C.; Marc, G.P.; Moreira, E.D.; Zerbini, C.; et al. Safety and Efficacy of the BNT162b2 mRNA COVID-19 Vaccine. N. Engl. J. Med. 2020, 383, 2603–2615. [Google Scholar] [CrossRef] [PubMed]
Baden, L.R.; El Sahly, H.M.; Essink, B.; Kotloff, K.; Frey, S.; Novak, R.; Zaks, T.; Diemert, D.; Spector, S.A.; Rouphael, N.; et al. Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine. N. Engl. J. Med. 2021, 384, 403–416. [Google Scholar] [CrossRef]
Voysey, M.; Clemens, S.A.C.; Madhi, S.A.; Weckx, L.Y.; Folegatti, P.M.; Aley, P.K.; Angus, B.; Baillie, V.L.; Barnabas, S.L.; Bhorat, Q.E.; et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: An interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet 2021, 397, 99–111. [Google Scholar] [CrossRef]
Logunov, D.Y.; Dolzhikova, I.V.; Shcheblyakov, D.V.; Tukhvatulin, A.I.; Zubkova, O.V.; Dzharullaeva, A.S.; Kovyrshina, A.V.; Lubenets, N.L.; Grousova, D.M.; Erokhova, A.S.; et al. Safety and efficacy of an rAd26 and rAd5 vector-based heterologous prime-boost COVID-19 vaccine: An interim analysis of a randomised controlled phase 3 trial in Russia. Lancet 2021, 397, 671–681. [Google Scholar] [CrossRef]
Stolfi, P.; Manni, L.; Soligo, M.; Vergni, D.; Tieri, P. Designing a Network Proximity-Based Drug Repurposing Strategy for COVID-19. Front. Cell Dev. Biol. 2020, 0, 1021. [Google Scholar] [CrossRef]
Mbarek, H.; Gandhi, G.D.; Selvaraj, S.; Al-Muftah, W.; Badji, R.; Al-Sarraj, Y.; Saad, C.; Darwish, D.; Alvi, M.; Fadl, T.; et al. Qatar Genome: Insights on Genomics from the Middle East. medRxiv 2021. [Google Scholar] [CrossRef]
Al Thani, A.; Fthenou, E.; Paparrodopoulos, S.; Al Marri, A.; Shi, Z.; Qafoud, F.; Afifi, N. Qatar Biobank Cohort Study: Study Design and First Results. Am. J. Epidemiol. 2019, 188, 1420–1433. [Google Scholar] [CrossRef]
Razali, R.M.; Rodriguez-Flores, J.; Ghorbani, M.; Naeem, H.; Aamer, W.; Aliyev, E.; Jubran, A.; Clark, A.G.; Fakhro, K.A.; Mokrab, Y. Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes. Nat. Commun. 2021, 12, 1–16. [Google Scholar] [CrossRef]
Cocca, M.; Barbieri, C.; Concas, M.P.; Robino, A.; Brumat, M.; Gandin, I.; Trudu, M.; Sala, C.F.; Vuckovic, D.; Girotto, G.; et al. A bird’s-eye view of Italian genomic variation through whole-genome sequencing. Eur. J. Hum. Genet. 2020, 28, 435–444. [Google Scholar] [CrossRef] [PubMed]
Latini, A.; Agolini, E.; Novelli, A.; Borgiani, P.; Giannini, R.; Gravina, P.; Smarrazzo, A.; Dauri, M.; Andreoni, M.; Rogliani, P.; et al. COVID-19 and genetic variants of protein involved in the SARS-CoV-2 entry into the host cells. Genes 2020, 11, 1010. [Google Scholar] [CrossRef] [PubMed]
McLaren, W.; Gil, L.; Hunt, S.E.; Riat, H.S.; Ritchie, G.R.S.; Thormann, A.; Flicek, P.; Cunningham, F. The Ensembl Variant Effect Predictor. Genome Biol. 2016, 17, 1–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Manichaikul, A.; Mychaleckyj, J.C.; Rich, S.S.; Daly, K.; Sale, M.; Chen, W.M. Robust relationship inference in genome-wide association studies. Bioinformatics 2010, 26, 2867–2873. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Chang, C.C.; Chow, C.C.; Tellier, L.C.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience 2015, 4, 7. [Google Scholar] [CrossRef] [PubMed]
Auton, A.; Abecasis, G.R.; Altshuler, D.M.; Durbin, R.M.; Bentley, D.R.; Chakravarti, A.; Clark, A.G.; Donnelly, P.; Eichler, E.E.; Flicek, P.; et al. A global reference for human genetic variation. Nature 2015, 526, 68–74. [Google Scholar] [CrossRef] [Green Version]
QIAGEN-Inc. Ingenuity Variant Analysis (IVA). Available online: https://www.qiagenbioinformatics.com/products/ingenuity-variant-analysis (accessed on 15 November 2021).
Martin, A.R.; Williams, E.; Foulger, R.E.; Leigh, S.; Daugherty, L.C.; Niblock, O.; Leong, I.U.S.; Smith, K.R.; Gerasimenko, O.; Haraldsdottir, E.; et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat. Genet. 2019, 51, 1560–1565. [Google Scholar] [CrossRef]
Lek, M.; Karczewski, K.J.; Minikel, E.V.; Samocha, K.E.; Banks, E.; Fennell, T.; O’Donnell-Luria, A.H.; Ware, J.S.; Hill, A.J.; Cummings, B.B.; et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016, 536, 285–291. [Google Scholar] [CrossRef] [Green Version]
Petrovski, S.; Wang, Q.; Heinzen, E.L.; Allen, A.S.; Goldstein, D.B. Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes. PLoS Genet. 2013, 9, e1003709. [Google Scholar] [CrossRef]
Mezzavilla, M.; Cocca, M.; Guidolin, F.; Gasparini, P. A population-based approach for gene prioritization in understanding complex traits. Hum. Genet. 2020, 139, 647–655. [Google Scholar] [CrossRef] [PubMed]
Karczewski, K.J.; Francioli, L.C.; Tiao, G.; Cummings, B.B.; Alföldi, J.; Wang, Q.; Collins, R.L.; Laricchia, K.M.; Ganna, A.; Birnbaum, D.P.; et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020, 581, 434–443. [Google Scholar] [CrossRef]
Esko, T.; Mezzavilla, M.; Nelis, M.; Borel, C.; Debniak, T.; Jakkula, E.; Julia, A.; Karachanak, S.; Khrunin, A.; Kisfali, P.; et al. Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity. Eur. J. Hum. Genet. 2013, 21, 659–665. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Thareja, G.; Al-Sarraj, Y.; Belkadi, A.; Almotawa, M.; Ismail, S.; Al-Muftah, W.; Badji, R.; Mbarek, H.; Darwish, D.; Fadl, T.; et al. Whole genome sequencing in the Middle Eastern Qatari population identifies genetic associations with 45 clinically relevant traits. Nat. Commun. 2021, 12. [Google Scholar] [CrossRef] [PubMed]
Ehre, C.; Worthington, E.N.; Liesman, R.M.; Grubb, B.R.; Barbier, D.; O’Neal, W.K.; Sallenave, J.M.; Pickles, R.J.; Boucher, R.C. Overexpressing mouse model demonstrates the protective role of Muc5ac in the lungs. Proc. Natl. Acad. Sci. USA 2012, 109, 16528–16533. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Lu, W.; Liu, X.; Wang, T.; Liu, F.; Zhu, A.; Lin, Y.; Luo, J.; Ye, F.; He, J.; Zhao, J.; et al. Elevated MUC1 and MUC5AC mucin protein levels in airway mucus of critical ill COVID-19 Patients. J. Med. Virol. 2021, 93, 582–584. [Google Scholar] [CrossRef]
Trincone, A.; Schwegmann-Weßels, C. Looking for a needle in a haystack: Cellular proteins that may interact with the tyrosine-based sorting signal of the TGEV S protein. Virus Res. 2015, 202, 3–11. [Google Scholar] [CrossRef]
Chakrabarty, B.; Das, D.; Bulusu, G.; Roy, A. Network-Based Analysis of Fatal Comorbidities of COVID-19 and Potential Therapeutics. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 1271–1280. [Google Scholar] [CrossRef] [PubMed]
Valentin, V.; Bervar, J.F.; Vincent-Delorme, C.; Smol, T.; Wemeau, L.; Remy, M.; Le Rouzic, O.; Chenivesse, C. Filamin A Mutations: A New Cause of Unexplained Emphysema in Adults? Chest 2021, 159, e131–e135. [Google Scholar] [CrossRef]
Jehle, A.W.; Gardai, S.J.; Li, S.; Linsel-Nitschke, P.; Morimoto, K.; Janssen, W.J.; Vandivier, R.W.; Wang, N.; Greenberg, S.; Dale, B.M.; et al. ATP-binding cassette transporter A7 enhances phagocytosis of apoptotic cells and associated ERK signaling in macrophages. J. Cell Biol. 2006, 174, 547–556. [Google Scholar] [CrossRef] [Green Version]
Iwamoto, N.; Abe-Dohmae, S.; Sato, R.; Yokoyama, S. ABCA7 expression is regulated by cellular cholesterol through the SREBP2 pathway and associated with phagocytosis. J. Lipid Res. 2006, 47, 1915–1927. [Google Scholar] [CrossRef] [Green Version]
Oughtred, R.; Rust, J.; Chang, C.; Breitkreutz, B.J.; Stark, C.; Willems, A.; Boucher, L.; Leung, G.; Kolas, N.; Zhang, F.; et al. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021, 30, 187–200. [Google Scholar] [CrossRef] [PubMed]
Zhou, Y.; Hou, Y.; Shen, J.; Mehra, R.; Kallianpur, A.; Culver, D.A.; Gack, M.U.; Farha, S.; Zein, J.; Comhair, S.; et al. A network medicine approach to investigation and population-based validation of disease manifestations and drug repurposing for COVID-19. PLoS Biol. 2020, 18. [Google Scholar] [CrossRef] [PubMed]
Woodruff, T.M.; Shukla, A.K. The Complement C5a-C5aR1 GPCR Axis in COVID-19 Therapeutics. Trends Immunol. 2020. [Google Scholar] [CrossRef]
Lee, M.E.; Chang, Y.; Ahmadinejad, N.; Johnson-Agbakwu, C.E.; Bailey, C.; Liu, L. COVID-19 Mortality is Associated with Impaired Innate Immunity in Pre-existing Health Conditions. bioRxiv 2021. [Google Scholar] [CrossRef]
Cao, Y.; Li, L.; Feng, Z.; Wan, S.; Huang, P.; Sun, X.; Wen, F.; Huang, X.; Ning, G.; Wang, W. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations. Cell Discov. 2020. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Secolin, R.; de Araujo, T.K.; Gonsales, M.C.; Rocha, C.S.; Naslavsky, M.; Marco, L.D.; Bicalho, M.A.; Vazquez, V.L.; Zatz, M.; Silva, W.A.; et al. Genetic variability in COVID-19-related genes in the Brazilian population. Hum. Genome Var. 2021, 8. [Google Scholar] [CrossRef]

Figure 1. PCA plot of the QGP and INGI cohorts projected onto 1000Genomes Project data. As expected, the first two principal components already show the separation between the QGP and the INGI cohorts and the overlap with the selected populations for the ancestry-related comparisons.

Figure 2. Distributions of the prioritization scores. Violin plots of the distributions of DSC (top panel) and SSC (bottom panel) scores in the subset of selected genes for all target populations (CAR, FVG, VBI, QGP) and all reference outbred populations (AFR, EUR, SAS) from 1000Genomes project.

Table 1. Results from comparison of DSC scores between target cohorts (CAR, FVG, VBI, QGP) and the relevant reference superpopulations from the 1000 Genomes Project (EUR, AFR, SAS). The last column refers to the nature of the comparison carried out, as detailed in Supplementary Table S3.

		DSC Score
Transcript ID	Gene Name	QGP	CAR	FVG	VBI	EUR	AFR	SAS	Comparison
ENST00000369850	FLNA	3.854	−2.435	0.272	−2.510	−2.399	−2.166	−1.879	C5
ENST00000350763	TNC	3.370	−3.792	1.388	−0.631	2.666	1.838	2.187	C4
ENST00000389048	ALK	2.575	3.651	0.290	−4.098	3.388	3.212	2.852	C4
ENST00000263094	ABCA7	2.566	−0.433	−2.168	0.071	3.020	2.681	2.150	C4
ENST00000647814	ABCC2	2.528	−3.466	0.467	3.004	2.562	2.877	2.508	C4
ENST00000621226	MUC5AC	2.435	−2.404	−2.017	−2.892	3.477	3.500	3.032	C4
ENST00000634891	RYR3	2.229	−3.377	−1.554	−3.586	−2.431	2.639	−3.449	C8
ENST00000542267	FBXL17	2.026	−1.232	0.180	−2.658	3.086	0.266	2.477	C4
ENST00000589042	TTN	−2.242	−2.595	3.411	4.584	−2.388	−1.965	3.498	C1
ENST00000357387	RICTOR	−2.369	−2.206	−0.033	2.407	2.181	1.284	−4.070	C7
ENST00000561890	MUC22	−2.472	−1.682	−1.632	2.156	−2.562	−2.191	−2.425	C3
ENST00000336596	EPHA3	−3.001	2.139	−3.421	−1.535	−3.704	1.921	−2.814	C3
ENST00000648947	INO80	−3.444	−1.309	2.424	−2.928	−3.439	−2.692	−0.727	C3
ENST00000389484	LRP1B	−4.888	−4.916	3.192	2.602	−4.245	−2.136	3.231	C1

Table 2. Results from comparison of SSC scores between target cohorts (CAR, FVG, VBI, QGP) and the relevant reference superpopulations from the 1000 Genomes Project (EUR, AFR, SAS). The last column refers to the nature of the comparison carried out, as detailed in Supplementary Table S3.

		SSC Score
Transcript ID	Gene Name	QGP	CAR	FVG	VBI	EUR	AFR	SAS	Comparison
ENST00000378473	PLCB4	−4.524	2.917	−3.907	−0.413	−4.272	−2.873	−3.079	C3
ENST00000366574	RYR2	−4.347	3.792	−4.694	−5.053	−3.704	−2.318	−2.087	C3
ENST00000315872	ROCK2	−3.680	3.822	−2.372	−1.149	−4.160	−3.248	−3.665	C3
ENST00000361445	MTOR	−3.371	−0.975	0.100	2.659	−3.378	−4.276	−3.888	C3
ENST00000358691	HELZ	−3.131	4.249	1.658	−3.729	−3.437	−2.929	3.413	C1
ENST00000355286	EYA4	−3.000	−1.900	2.671	−1.486	−2.087	−2.589	−0.756	C3
ENST00000381501	TEC	−2.996	−2.615	−1.656	3.085	−2.497	−2.427	−0.767	C3
ENST00000265382	PIP5K1B	−2.952	2.574	−0.576	−2.746	−3.246	−3.197	−1.583	C3
ENST00000359015	MAP3K5	−2.758	2.108	0.850	1.431	−3.555	−2.472	−3.293	C3
ENST00000335670	RORA	−2.586	−2.953	1.176	2.228	−2.499	−2.743	−0.526	C3
ENST00000370056	VAV3	−2.523	3.467	1.312	1.038	−3.282	−2.646	−1.406	C3
ENST00000381298	IL6ST	−2.522	−1.224	3.859	2.542	−2.120	−1.466	−2.644	C3
ENST00000432237	CD163	−2.506	−1.404	2.793	−2.306	−2.419	−0.629	−2.156	C3
ENST00000392552	GPR155	−2.338	−1.261	−1.336	2.338	−2.417	−1.608	−2.585	C3
ENST00000382292	SACS	−2.324	−4.408	3.917	2.284	−3.530	−2.726	−2.082	C3
ENST00000392132	XRCC5	−2.176	−2.147	2.722	−1.257	−2.673	−2.107	−1.787	C3
ENST00000313708	EBF1	−2.068	2.253	−1.422	−0.914	−2.980	−1.665	−3.222	C3
ENST00000400841	CRLF2	2.036	−1.347	−1.496	−2.083	2.581	2.185	1.052	C4
ENST00000369850	FLNA	2.058	−3.158	−0.860	−4.025	−3.073	−3.351	−3.097	C5
ENST00000344327	TRPC6	2.062	−3.776	−2.671	−2.711	−3.382	0.278	−2.242	C5
ENST00000263317	NOX4	2.225	−2.716	−2.554	−2.717	2.134	3.532	3.770	C4
ENST00000403662	CSF2RB	2.237	−2.363	1.702	0.620	2.613	0.319	2.782	C4
ENST00000297494	NOS3	2.243	1.436	−2.178	−0.851	2.109	2.460	2.455	C4
ENST00000295598	ATP1A1	2.258	−2.679	0.547	0.930	−2.204	−1.886	−2.448	C5
ENST00000085219	CD22	2.288	0.576	0.028	−2.311	2.368	−0.600	2.142	C4
ENST00000305877	BCR	2.397	−1.338	2.028	−3.021	3.994	2.923	3.631	C4
ENST00000333149	TRIM50	2.501	2.275	1.138	−2.022	2.271	1.372	3.197	C4
ENST00000271332	CELSR2	2.522	3.651	−2.581	2.455	2.443	−2.129	2.936	C2
ENST00000447648	TECPR1	2.666	−2.351	1.822	−1.669	2.777	3.213	−0.027	C4
ENST00000324856	ARID1A	3.434	−3.120	−2.683	−1.845	−2.085	−2.053	0.835	C5
ENST00000263094	ABCA7	3.796	−1.601	−2.581	1.004	2.591	3.325	3.998	C4
ENST00000372923	DNM1	3.941	−2.514	−1.222	−0.710	−2.066	−1.575	−2.077	C5
ENST00000621226	MUC5AC	3.965	−3.705	−3.751	−4.601	3.679	3.267	4.244	C4
ENST00000533211	SPTBN2	4.531	−2.266	1.209	−1.893	2.589	2.191	2.778	C4
ENST00000529681	MUC5B	4.744	3.085	1.483	−2.070	4.884	4.396	5.095	C4

Table 3. List of genes with a concordant signature of selection between DSC and SSC scores, after the comparison between target cohorts (CAR, FVG, VBI, QGP) and the relevant reference superpopulations from the 1000 Genomes Project (EUR, AFR, SAS).

		DSC Score							SSC Score
Transcript ID	Gene Name	QGP	CAR	FVG	VBI	EUR	AFR	SAS	QGP	CAR	FVG	VBI	EUR	AFR	SAS
ENST00000369850	FLNA	3.854	−2.435	0.272	−2.510	−2.399	−2.166	−1.879	2.058	−3.158	−0.860	−4.025	−3.073	−3.351	−3.097
ENST00000263094	ABCA7	2.566	−0.433	−2.168	0.071	3.020	2.681	2.150	3.796	−1.601	−2.581	1.004	2.591	3.325	3.998
ENST00000621226	MUC5AC	2.435	−2.404	−2.017	−2.892	3.477	3.500	3.032	3.965	−3.705	−3.751	−4.601	3.679	3.267	4.244

Table 4. Comparison of Singleton burden between the COVID-19 positive cohort (TOV) and other target and reference populations. The reported p-values refer to the comparison between whole gene singleton burden (“p-value whole gene” column) and coding regions singletons burden (“p-value CDS region” column). All singleton counts have been adjusted considering the sample size of each cohort.

Transcript ID	Gene Name	Cohort	p-Value Whole Gene	p-Value CDS Regions
ENST00000369850	FLNA	CAR	0.630140	0.409653
		FVG	0.046901	0.316565
		VBI	0.000458	0.013015
		QGP	0.000028	0.039803
		EUR	0.312323	0.786342
		AFR	0.878006	0.787767
		SAS	0.200408	0.813561
ENST00000263094	ABCA7	CAR	3.2746 × 10⁻¹¹	2.5959 × 10⁻⁷
		FVG	3.1278 × 10⁻²³	1.0535 × 10⁻¹⁷
		VBI	7.1607 × 10⁻²¹	2.0060 × 10⁻¹⁶
		QGP	6.2413 × 10⁻⁶³	1.5606 × 10⁻⁴⁰
		EUR	4.4467 × 10⁻¹⁰	1.0966 × 10⁻⁸
		AFR	1.7360 × 10⁻⁸	5.3713 × 10⁻⁹
		SAS	2.3435 × 10⁻⁴	3.7924 × 10⁻⁶
ENST00000621226	MUC5AC	CAR	7.4274 × 10⁻¹²	8.4692 × 10⁻¹¹
		FVG	1.8836 × 10⁻³¹	2.7623 × 10⁻²⁴
		VBI	2.3148 × 10⁻³⁶	3.2118 × 10⁻³⁰
		QGP	1.5512 × 10⁻⁷⁶	1.2101 × 10⁻⁶⁴
		EUR	3.3142 × 10⁻⁶	4.0071 × 10⁻⁸
		AFR	2.9701 × 10⁻⁸	1.0241 × 10⁻¹⁰
		SAS	7.2316 × 10⁻³	9.9471 × 10⁻⁵

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mbarek, H.; Cocca, M.; Al-Sarraj, Y.; Saad, C.; Mezzavilla, M.; AlMuftah, W.; Cocciadiferro, D.; Novelli, A.; Quinti, I.; AlTawashi, A.; et al. Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations. Genes 2021, 12, 1842. https://doi.org/10.3390/genes12111842

AMA Style

Mbarek H, Cocca M, Al-Sarraj Y, Saad C, Mezzavilla M, AlMuftah W, Cocciadiferro D, Novelli A, Quinti I, AlTawashi A, et al. Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations. Genes. 2021; 12(11):1842. https://doi.org/10.3390/genes12111842

Chicago/Turabian Style

Mbarek, Hamdi, Massimiliano Cocca, Yasser Al-Sarraj, Chadi Saad, Massimo Mezzavilla, Wadha AlMuftah, Dario Cocciadiferro, Antonio Novelli, Isabella Quinti, Azza AlTawashi, and et al. 2021. "Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations" Genes 12, no. 11: 1842. https://doi.org/10.3390/genes12111842

APA Style

Mbarek, H., Cocca, M., Al-Sarraj, Y., Saad, C., Mezzavilla, M., AlMuftah, W., Cocciadiferro, D., Novelli, A., Quinti, I., AlTawashi, A., Salvaggio, S., AlThani, A., Novelli, G., & Ismail, S. I. (2021). Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations. Genes, 12(11), 1842. https://doi.org/10.3390/genes12111842

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations

Abstract

1. Introduction

2. Materials and Methods

2.1. Population Description

2.2. Principal Component Analysis

2.3. Genes Selection and Prioritization Analyses

2.4. WES COVID-19 Cohorts

3. Results

3.1. Population Stratification

3.2. Population Based Gene Prioritization

3.3. COVID-19 Cohort Analysis

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI