Article

Robust Sampling of Defective Pathways in Alzheimer’s Disease. Implications in Drug Repositioning

by Juan Luis Fernández-Martínez 1,2,*, Óscar Álvarez-Machancoses 1,2, Enrique J. deAndrés-Galiana 1,3, Guillermina Bea 1 and Andrzej Kloczkowski 4,5

1 Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
2 DeepBioInsights, C/Federico García Lorca, 18, 33007 Oviedo, Spain
3 Department of Informatics and Computer Science, University of Oviedo, C/Federico García Lorca, 18, 33007 Oviedo, Spain
4 Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
5 Department of Pediatrics, The Ohio State University, Columbus, OH 43205, USA
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2020, 21(10), 3594; https://doi.org/10.3390/ijms21103594
Submission received: 27 April 2020 / Revised: 9 May 2020 / Accepted: 13 May 2020 / Published: 19 May 2020
(This article belongs to the Section Molecular Biophysics)

Abstract

We present an analysis of the defective genetic pathways of Late-Onset Alzheimer’s Disease (LOAD) compared to Mild Cognitive Impairment (MCI) and Healthy Controls (HC) using different sampling methodologies. These algorithms sample the uncertainty space that is intrinsic to any highly underdetermined phenotype prediction problem, by looking for the minimum-scale signatures (header genes) corresponding to different random holdouts. The biological pathways can be identified by performing posterior analysis of these signatures established via cross-validation holdouts and plugging the set of most frequently sampled genes into different ontological platforms. That way, the effect of helper genes, whose presence might be due to the high degree of underdeterminacy of these experiments and to data noise, is reduced. Our results suggest that common pathways for Alzheimer’s disease and MCI are mainly related to viral mRNA translation, influenza viral RNA transcription and replication, gene expression, mitochondrial translation, and metabolism, with these results being highly consistent regardless of the comparative methods. The cross-validated predictive accuracies achieved for the LOAD and MCI discriminations were 84% and 81.5%, respectively. The difference between LOAD and MCI could not be clearly established (74% accuracy). The most discriminatory genes of the LOAD-MCI discrimination are associated with proteasome mediated degradation and G-protein signaling. Based on these findings we have also performed drug repositioning using the Dr. Insight package, proposing the following typologies of drugs: isoquinoline alkaloids, antitumor antibiotics, phosphoinositide 3-kinase (PI3K) inhibitors, autophagy inhibitors, antagonists of the muscarinic acetylcholine receptor, and histone deacetylase inhibitors. We believe that the potential clinical relevance of these findings should be further investigated and confirmed with other independent studies.

1. Introduction

Alzheimer’s disease (AD) is the most common cause of dementia associated with aging, causing the loss of intellectual and social skills. The causes of Alzheimer’s are not yet fully understood. The Early-Onset form of Alzheimer’s Disease (EOAD) is due to mutations on chromosome 21, causing the formation of abnormal amyloid precursor protein (APP) [1], and mutations on chromosomes 14 and 1 leading to abnormal presenilins 1 and 2, which are involved in the cleavage of APP [2,3]. However, the causes of Late-Onset Alzheimer’s Disease (LOAD) are not yet completely understood. The prevalent view is that they likely include a combination of genetic, environmental, and lifestyle factors that determine the risk of developing the disease.
The main milestones in the genetic analysis of LOAD range from the early descriptions of apolipoprotein E (APOE) gene polymorphisms in LOAD and APP gene mutations in EOAD in 1991 to the presenilin 1 (PS1) and presenilin 2 (PS2) gene mutations in EOAD in 1995. Since 2009, new polymorphisms influencing the risk of LOAD have been reported every year. Ricciarelli et al. (2004) investigated gene expression in Alzheimer’s disease and aging, reporting 314 genes that were differentially expressed in LOAD cerebral cortex, and confirming via reverse transcription polymerase chain reaction (RT-PCR) the increased expression of the interferon-induced protein 3 in LOAD brains [4]. Kong et al. (2009) performed independent component analysis (ICA) of Alzheimer’s DNA microarray gene expression data [5]. They identified more than 50 significant genes with high expression levels in severe LOAD, representing immunity-related proteins, metal binding proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeleton proteins, cellular binding proteins, and ribosomal proteins. Our understanding of the etiology and pathogenesis of LOAD has been predominantly influenced by recent developments in genetics [6]. In particular, genome-wide association studies (GWAS) identified over 20 genetic loci associated with LOAD.
Recent genetic data continue to support the amyloid hypothesis of LOAD, with protective variants being found in the amyloid gene, and both common low-risk and rare high-risk variants being discovered in genes that are part of the amyloid response pathways. These data support the view that genetic variability in how the brain responds to amyloid deposition is a potential therapeutic target for the disease, and are consistent with the notion that anti-amyloid therapies should be initiated early in the disease process (Hardy et al. 2014) [7]. Genome-wide association studies have implicated genes related to three main ontological pathways: (1) Endosomal vesicle recycling (phosphatidylinositol binding clathrin assembly protein (PICALM) coding gene, bridging integrator-1 protein (BIN1) coding gene). (2) Cholesterol and lipid metabolism (apolipoprotein E (APOE) gene, clusterin protein (CLU) coding gene, and ATP-binding cassette subfamily A member 7 protein (ABCA7) coding gene). (3) Innate immune system (complement C3b/C4b receptor 1 (CR1) gene, membrane-spanning 4-domains subfamily A (MS4A) gene cluster, triggering receptor expressed on myeloid cells 2 (TREM2) gene) [8].
Giri et al. (2016) overviewed the genes associated with LOAD, noting that the majority of genes associated with LOAD cluster roughly within three main pathways: lipid metabolism, inflammatory response and endocytosis [9]. Additionally, several signaling pathways associated with LOAD might modulate various processes, such as the reduction of amyloid-β aggregation and inflammation, regulation of mitochondrial dynamics, and increased neuronal activity [10].
The difficulty of defining common mechanisms that could be responsible for the development of LOAD is partly due to the high degree of underdeterminacy that is present in all genetic experiments. Besides, no single gene can describe complex diseases such as Alzheimer’s, which are caused by a combination of genetic, environmental, and lifestyle factors. Complex diseases do not obey the standard Mendelian patterns of inheritance, and it is well known that genes involved in complex diseases work in synergy (see for instance [11]).
Pathway and network analysis is a good approach to understanding omics data, and allows finding distinct cellular processes and signaling pathways that are associated with the set of differentially expressed genes. Pathway analysis needs databases with pathway collections and interaction networks, and programming packages to analyze the data. The most popular freely available public collections of pathways and interaction networks are the Kyoto Encyclopedia of Genes and Genomes (KEGG) [12] and REACTOME [13]. Pathway and network analysis of cancer genomes is currently used for better understanding of various types of tumors [14]. Dimitrakopoulos and Beerenwinkel (2017) reviewed several computational methods for the identification of cancer genes and the analysis of pathways [15]. For AD, Mizuno et al. (2012) developed a publicly available pathway map called AlzPathway (http://alzpathway.org/) that comprehensively catalogs signaling pathways in AD using CellDesigner [16]. AlzPathway is currently composed of 1347 molecules and 1070 reactions in neuron, blood-brain barrier, presynaptic, postsynaptic, astrocyte, and microglial cells and their cellular localizations. There are still some outstanding challenges concerning both annotations and methodologies [17]. The annotation challenges are due to the low resolution of available databases, while the methodological challenges mainly concern finding the set of genes that are indeed related to the disease and understanding the dynamical nature of biological systems and the effect of external stimuli.
In this paper, we try to address the first methodological challenge related to the phenotype prediction problem, i.e., the development of robust computational methods for linking the cause (genotype) and the effect (phenotype). Researchers typically use sets of differentially expressed genes, but fold change is sensitive to the presence of noise in genetic data and to wrong class assignment of the samples [18].
The holdout sampler [19] looks for different equivalent high discriminatory genetic networks that are related to the uncertainty space of the classifier that is used to predict the phenotype. The holdout sampler generates different random 75/25 data bags (or holdouts): 75% of the data in each bag is used for learning and 25% for blind validation. For each of these bags the small-scale genetic signatures (header genes) are determined. The posterior analysis consists of finding the most frequently sampled genes taking into account all the highly predictive networks, that is, the small-scale genetic signatures with high validation accuracy. The biological pathways can be identified performing posterior analysis of these signatures established during the cross-validation holdouts and plugging the set of most frequently sampled genes into ontological platforms. That way, the effect of helper genes whose presence might be due to noise or to the high degree of underdeterminacy of these experiments is damped. As we briefly explain in the next section, this algorithm is inspired by the sampling of the equivalence region of a regression problem using bootstrapping (random data sampling with replacement) to find different sets of equivalent predicting parameters.
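As an illustration only, the holdout procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the published implementation [19]: the function names, the ranking of genes by Fisher's ratio (with the definition given in the code comment), the 1-nearest-neighbor classifier, and the `max_genes` and `min_accuracy` defaults are all hypothetical choices made for the example.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(a, b):
    # Assumed definition: squared mean difference over summed class variances.
    denom = pvariance(a) + pvariance(b)
    diff = (mean(a) - mean(b)) ** 2
    if denom == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def rank_genes(X, y, rows):
    # Rank gene indices by decreasing Fisher's ratio, using training rows only.
    scores = []
    for g in range(len(X[0])):
        a = [X[i][g] for i in rows if y[i] == 0]
        b = [X[i][g] for i in rows if y[i] == 1]
        scores.append((fisher_ratio(a, b), g))
    return [g for _, g in sorted(scores, key=lambda s: -s[0])]


def accuracy_1nn(X, y, train, test, genes):
    # Blind validation: classify each test sample by its nearest training sample.
    hits = 0
    for t in test:
        nearest = min(train, key=lambda i: sum((X[i][g] - X[t][g]) ** 2 for g in genes))
        hits += y[nearest] == y[t]
    return hits / len(test)


def holdout_sampler(X, y, n_holdouts=100, train_frac=0.75,
                    max_genes=10, min_accuracy=0.9, seed=0):
    # Returns {gene index: sampling frequency} over the highly predictive holdouts.
    rng = random.Random(seed)
    counts, kept = Counter(), 0
    for _ in range(n_holdouts):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        cut = int(train_frac * len(X))
        train, test = idx[:cut], idx[cut:]
        if len({y[i] for i in train}) < 2:
            continue  # both classes are needed to rank genes
        ranked = rank_genes(X, y, train)
        # Smallest prefix of the ranking reaching the best validation accuracy.
        best_acc, best_k = -1.0, 0
        for k in range(1, min(max_genes, len(ranked)) + 1):
            acc = accuracy_1nn(X, y, train, test, ranked[:k])
            if acc > best_acc:
                best_acc, best_k = acc, k
        if best_acc >= min_accuracy:
            counts.update(ranked[:best_k])
            kept += 1
    return {g: c / kept for g, c in counts.items()} if kept else {}
```

The returned frequencies play the role of the posterior sampling frequencies: genes above a chosen threshold would then be submitted to an ontological platform for pathway enrichment.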
We show the application of this algorithm to the analysis of the genetic pathways involved in LOAD and mild cognitive impairment (MCI), obtaining an unexpected association with influenza viral RNA transcription and replication as the main mechanisms in LOAD and MCI development. Neurodegenerative diseases could be induced by chronic and viral infections that may lead to a loss of neural tissue in the central nervous system. Rare instances have been published in which acute severe encephalitic viral diseases directly cause transient symptomatic Parkinson’s disease [20]. Besides, in the comparison of the LOAD patients vs. healthy controls (HC) we have also compared the altered genetic pathways derived by using various sampling algorithms to test the hypothesis of biological invariance [21], that is, the genetic pathways that are involved in the disease development should be independent of the numerical algorithm (classifier and sampling algorithm) that is used to unravel them. LOAD and MCI could not be clearly discriminated, and their joint (LOAD + MCI) difference from healthy controls can be established with a common list of genes found in their individual comparisons (LOAD vs. HC and MCI vs. HC). This is somewhat expected, as MCI may represent an early stage of LOAD and the genetic mechanism may be the same.
Finally, based on the results of these analyses we show some implications for drug repositioning for LOAD-MCI.

2. Results

2.1. Comparison of Late-Onset Alzheimer’s Disease (LOAD) Patients and Healthy Controls

Table 1 shows the list of the most discriminatory genes of the phenotypes of the LOAD patients compared with HC. In this case the predictive accuracy is estimated by Leave-One-Out Cross-Validation (LOOCV), averaging the LOOCV predictive accuracy over all samples of the validation dataset in each bag, and uses a simple k-Nearest-Neighbor classifier in the reduced set of highly discriminatory genes (small-scale signature).
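A minimal sketch of such an accuracy estimate is given below, assuming a Euclidean distance and a majority vote among the k nearest samples. The function name `loocv_knn_accuracy`, the default `k=3`, and the data layout (a list of expression rows plus a list of class labels) are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter


def loocv_knn_accuracy(X, y, genes, k=3):
    # Leave each sample out in turn and predict it from the remaining ones,
    # using only the gene indices in the reduced signature `genes`.
    hits = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        # Sort the remaining samples by squared Euclidean distance to sample i.
        others.sort(key=lambda j: sum((X[j][g] - X[i][g]) ** 2 for g in genes))
        votes = Counter(y[j] for j in others[:k])
        hits += votes.most_common(1)[0][0] == y[i]
    return hits / len(X)
```

For example, calling `loocv_knn_accuracy(X, y, genes=[0, 1])` would evaluate a two-gene signature made of the first two columns of `X`.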
Table 2 shows the most frequently sampled genes (sampling frequency higher than 0.7%) detected by the holdout algorithm. We also provide the mean expression in each group, the fold change, the Fisher’s ratio, and the sampling frequency. Table 3 shows the same analysis using the Fisher’s ratio sampler (with a sampling frequency higher than 0.35) and Table 4 shows the results obtained with Random Forest (with a sampling frequency higher than 0.19). The sampling frequencies vary with the sampling algorithm.
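For reference, the two per-gene statistics reported in these tables are commonly defined as below. These formulas are stated as an assumption, since the text does not give them explicitly: for gene j, \mu_{j1}, \mu_{j2} denote the mean expression in the two classes and \sigma_{j1}^2, \sigma_{j2}^2 the corresponding variances.

```latex
\mathrm{FR}_j = \frac{(\mu_{j1} - \mu_{j2})^2}{\sigma_{j1}^2 + \sigma_{j2}^2},
\qquad
\mathrm{FC}_j = \log_2\!\left(\frac{\mu_{j1}}{\mu_{j2}}\right)
```

Under this definition, a Fisher's ratio above 1 means that the squared distance between the class means exceeds the summed within-class variances.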
Table 5 shows the pathway analysis using the list of most highly sampled genes together with the summary of the results obtained for all the comparisons performed in this paper. The scores relative to these pathways are given in Table S1 as supplementary material. We have used the sampled genes with a frequency higher than 0.2, since this set contains enough discriminatory genes to perform the pathway enrichment. This sampling frequency has been judged to be optimum in the enrichment analysis. Figure 1 shows the correlation network between the most discriminatory genes of the LOAD vs. healthy control phenotype and serves to explain how the most discriminatory genes are interrelated and control gene expression.

2.2. Comparison of Mild Cognitive Impairment (MCI) Patients and Healthy Controls

Table 6 shows the list of the most discriminatory genes for the MCI vs. control phenotype discrimination. Table 7 shows the most discriminatory genes sampled in different networks for the Control vs. MCI phenotype. In this case we have considered a sampling frequency higher than 0.38%. Table S2 (given in Supplementary Materials) shows the scores relative to the main genetic pathways involved in MCI vs. HC. Figure 2 shows the correlation network between the most discriminatory genes of the MCI phenotype.

2.3. Comparison of MCI and Alzheimer’s Disease (AD) Patients

Table 8 shows the list of most discriminatory genes for the MCI vs. LOAD phenotype discrimination. Table 9 shows the most discriminatory genes sampled in different networks for the LOAD vs. MCI phenotype with a sampling frequency higher than 0.5%. Table S3 (given in Supplementary Materials) shows the scores relative to the main genetic pathways in the discrimination between MCI and LOAD. Figure 3 shows the correlation network between the most discriminatory genes of the LOAD/MCI phenotype.

2.4. Comparison of MCI+LOAD Patients with Healthy Controls

Table 10 shows the list of the most discriminatory genes for the MCI + LOAD vs. HC phenotype discrimination. Table 11 shows the most discriminatory genes sampled in different networks for the LOAD + MCI vs. HC phenotype with a sampling frequency higher than 0.5%. Table S4 (given in Supplementary Materials) shows the scores relative to the main genetic pathways involved in the MCI + LOAD vs. HC discrimination. This last classification has been performed since the difference between MCI and LOAD cannot be easily established.
Table 12 summarizes the most important findings of the main comparisons.

3. Discussion

3.1. LOAD vs. Healthy Controls Classification

In the LOAD vs. HC comparison, the maximum Fisher’s ratio obtained was 1.29 and corresponds to MRPL51. Only six genes have a Fisher’s ratio higher than 1. This fact gives an idea of the discriminatory power of gene expression data. The highest accuracy (79.52%) was obtained with just the first two genes: MRPL51 and CETN. This accuracy was increased up to 84% by using a genetic signature of 21 genes that have been sampled within the set of most discriminatory genes. This signature is shown in Table 12, which summarizes the results. The most frequently sampled genes were RPL36AL, MRPL51, LOC401206, and RPS27A (Table 2). These genes are underexpressed in LOAD.
RPL36AL is a protein-coding gene involved in metabolism and viral mRNA translation. It has been found that the overexpression of this gene is associated with cellular proliferation in hepatocellular carcinoma [22]. MRPL51 is also a protein-coding gene, related to mitochondrial translation. Recently it has been found that mitochondrial genes are altered early in blood in LOAD and MCI, showing reduced expression of OXPHOS (oxidative phosphorylation) genes. RPS27A is also a protein-coding gene, involved in the interferon gamma signaling pathway and in activated TLR4 signaling involved in immune responses. CETN (Centrin-1) is a protein-coding gene that plays a fundamental role in microtubule organization. LOC401206 is an uncharacterized gene.
The main pathways with high scoring matches found in the LOAD vs. HC discrimination are viral mRNA translation, influenza viral RNA transcription and replication, gene expression, mitochondrial translation, rRNA processing, and metabolism (Table S1). Other pathways involved are organelle biogenesis, HIV life cycle, antigen presentation, and TCR signaling. Table 5 compares these pathways to those found by the Fisher’s ratio sampler and the Random Forest sampler. It can be observed that all the samplers identified similar pathways. The most important biological processes involved (see Table 12) also point in the same direction.
The correlation network (Figure 1) shows one unique link relating the header gene MRPL51 to NDUFA1, which is involved in the mitochondrial respiratory chain pathway for the generation of cellular energy. The correlation between both genes is very high and positive (0.94). The main sub-tree develops under LOC646200 (RPL22-60S ribosomal protein L22-heparin binding protein HBp15). This protein can bind specifically to Epstein–Barr virus-encoded small RNAs.
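The text does not state how the correlation network is built. One possible construction, assumed here purely for illustration, grows a tree from the header gene by repeatedly attaching the remaining gene that has the highest absolute Pearson correlation with a gene already in the tree. The function names and the dictionary layout (gene symbol mapped to a list of expression values) are hypothetical.

```python
import math


def pearson(u, v):
    # Pearson correlation coefficient of two equally long expression vectors.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def correlation_tree(expr, header):
    # expr: {gene symbol: expression values}; returns edges (parent, child, r).
    in_tree = [header]
    remaining = [g for g in expr if g != header]
    edges = []
    while remaining:
        # Pick the strongest link between a placed gene and an unplaced gene.
        parent, child = max(
            ((p, c) for p in in_tree for c in remaining),
            key=lambda pc: abs(pearson(expr[pc[0]], expr[pc[1]])),
        )
        edges.append((parent, child, pearson(expr[parent], expr[child])))
        in_tree.append(child)
        remaining.remove(child)
    return edges
```

In such a tree, a link like the one between MRPL51 and NDUFA1 would appear as a single edge carrying the signed correlation (0.94 in the text).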

3.2. MCI vs. Healthy Controls Classification

In the MCI vs. HC comparison, the maximum Fisher’s ratio was 1.47, and corresponds to TAX1BP1. The highest accuracy (77.17%) was obtained with the first 59 most discriminatory genes. Table 6 shows only genes of this signature with Fisher’s ratio greater than 1. This accuracy was increased up to 81.5% by using a genetic signature of 13 genes (given in the summary Table 12) that have been sampled within the set of most discriminatory genes.
The most important sampled genes (Table 7) are underexpressed in MCI samples with respect to healthy controls. Only DENND1C and SULT1A3 are overexpressed in MCI. DENND1C is a protein-coding gene involved in clathrin-mediated endocytosis, which seems to play an important role in AD [23]. SULT1A3 is a protein-coding gene related to sulfotransferase enzymes that catalyze the sulfate conjugation of many hormones and neurotransmitters. It has been found that Alzheimer’s subjects have a significantly lower SULT1A3 copy number compared to healthy controls [24]. Induction of SULT1A3 significantly protects cells from dopamine neurotoxicity. The effect of dopamine in AD has been discussed in the literature [25]. The number of most frequently sampled genes has increased with respect to the LOAD vs. HC discrimination, but the sampling frequencies of these genes have decreased. This result could be interpreted in the sense that the MCI vs. HC discrimination is more ambiguous than the LOAD vs. HC discrimination, and might correspond to the fact that MCI is an earlier stage than LOAD.
The pathways with higher scores found by the holdout sampler are given in Table S2 and are similar to those of the LOAD vs. HC comparison (viral mRNA translation, gene expression, influenza viral RNA transcription and replication, metabolism of proteins, rRNA processing in the nucleus and cytosol, ubiquitin-proteasome proteolysis, antigen processing, cell cycle checkpoints, metabolism, HIV life cycle, mitotic metaphase and anaphase, CLEC7A (Dectin-1) signaling, cellular senescence, TCR signaling, etc.). The main biological processes involved are also similar to the LOAD vs. HC comparison. Although not shown in the paper, the other samplers also identified similar pathways.
The header gene in the correlation network (Figure 2), TAX1BP1, is positively correlated to TMCO1 and VAMP7; VAMP7 is correlated only to DENND1C. TAX1BP1 encodes an HTLV-1 tax1 binding protein that interacts with TNFAIP3 (tumor necrosis factor (TNF) alpha induced protein 3) and inhibits TNF-induced apoptosis. Among its related pathways are apoptosis, autophagy, and the innate immune system. The degradation of this protein by caspase-3-like family proteins is associated with apoptosis induced by TNF. This protein might also have a role in the inhibition of inflammatory signaling pathways. Interestingly, the caspases have also been found to be involved in amyotrophic lateral sclerosis (ALS).
The most important sub-tree in the correlation network concerns TMCO1, which plays a key role in calcium homeostasis. Dysregulation of calcium homeostasis in Alzheimer’s disease has been pointed out by Kawahara (2004), Small (2009), and Brawek and Garaschuk (2014) [26,27,28]. TMCO1 is positively correlated to RPL17 (Ribosomal Protein L17). Among its related pathways are viral mRNA translation and MAPK signaling. VAMP7 is a protein-coding gene involved in the targeting and/or fusion of transport vesicles to their target membrane during transport of proteins. This gene has multiple interesting functions according to the Human Gene Database.

3.3. MCI vs. LOAD Classification

MCI represents an early stage of AD, and therefore the genetic background of LOAD might be shared by an important subgroup of MCI subjects. However, not all MCI subjects convert to LOAD since they may also convert to other diseases or remain stable over time. Therefore, this comparison should be interpreted as a temporal difference between both conditions, and the analysis was done even though it was not expected to find highly discriminatory genes.
The highest accuracy (68%) was obtained with the first 118 genes, showing that, as expected, LOAD and MCI could not be clearly discriminated. Besides, to support this fact, only the first three genes in Table 8 have a Fisher’s ratio greater than 0.5: RPS4Y1, HLA-DRB1, and JARID1D. This accuracy was increased up to 74% by using a genetic signature of 8 genes that have been sampled within the set of most discriminatory genes: RPS4Y1, FAM96A, HS.460758, CLEC2A, SNORA25, SMCHD1, GIMAP2, MAP2K2. Some of these genes are related to the RANK (Receptor activator of nuclear factor-kappa B)-signaling pathway. The most important pathway involves the regulation of apoptosis by protein ubiquitination and degradation, which is one of the major mechanisms to regulate apoptotic cell death (Yang and Yu, 2003) [29]. A regulated balance between cell survival and apoptosis is essential for normal development and homeostasis of multicellular organisms. Defects in control of this balance may contribute to autoimmune disease, neurodegeneration, and cancer. A second important pathway concerns regulation of stress-activated MAP kinases by G protein signaling, which has an important role in the immune system [30] and has been a target in LOAD [31]. Besides, it is interesting to observe that within the set of most frequently sampled genes shown in Table 9 some genes are underexpressed and others overexpressed. This differs from the previous comparisons (MCI and LOAD vs. HC), where most of the discriminatory genes were underexpressed. The main pathway related to the underexpressed genes is chromatin regulation/acetylation. The epigenetic alterations in AD have been outlined by Sanchez-Mut and Gräff (2015) [32]. The overexpressed genes are related to the regulation of activated PAK-2p34 by proteasome mediated degradation as main pathway. In both cases, pathways related to the immune system response are involved.
The correlation network (Figure 3) has one main sub-tree relating RPS4Y1 and XIST, and two other final nodes (JARID1D and HS.546019). XIST (X-inactive specific transcript) is an RNA gene on the X chromosome of placental mammals that acts as a major effector of the X inactivation process. The X-chromosome instability phenotype in LOAD has been studied by Bajić et al. (2009) [33]. Additionally, Barati and Mansour (2015) [34] pointed out that XIST has the highest level of overexpression in LOAD using microarray expression techniques.

3.4. MCI+LOAD vs. Healthy Controls Classification

The maximum Fisher’s ratio (1.22) in this comparison (Table 10) corresponds to LOC401206 (RPS25P6), which according to AceView (NCBI) is an intronless pseudogene derived from the RPS25 gene. RPS25 (Ribosomal Protein S25) is a protein-coding gene. Among its related pathways are viral mRNA translation and activation of the mRNA. Within the set of most discriminatory genes we found MRPL1 and TAX1BP1, which also appeared in the LOAD and MCI vs. HC individual comparisons. Only SULT1A3 is overexpressed in MCI and LOAD in the set of highly discriminatory genes with a Fisher’s ratio (FR) greater than 0.8. The major pathways (shown in Table S4, provided as supplementary material) and biological processes involved are similar to those of the LOAD and MCI vs. HC comparisons, reinforcing the viral hypothesis found in previous comparisons.

3.5. Implications in Drug Repositioning

Regarding therapeutics, GeneAnalytics identified bortezomib and carfilzomib as potential compounds acting on some of the genes that are involved in the deregulated LOAD and MCI pathways. Bortezomib and carfilzomib are proteasome inhibitors used for multiple myeloma and mantle cell lymphoma. Nevertheless, GeneAnalytics does not provide information on whether these compounds act in the right direction to regulate genes involved in LOAD and MCI. Therefore, this finding indicates only the genetic processes and the genes that are involved.
A more detailed drug repositioning was performed via Dr. Insight (Chan et al., 2019), which uses the Connectivity Map (CMAP) transcriptomic experiments in different types of cell lines [35,36].
Table 13 shows the results of drug repositioning via Dr. Insight, using the list of most discriminatory genes identified by the holdout sampler. This analysis highlighted the importance of different compounds analyzed in breast cancer cell lines (MCF7) and one compound in a prostate cancer cell line (PC3). In this table we also show, for each drug, the pathways that are affected by the genes whose expressions are increased or decreased as deduced from CMAP.
Emetine and its desmethyl analog cephaeline are isoquinoline alkaloids, a large class of nitrogen-rich natural compounds. Isoquinoline alkaloids are not a structurally homogeneous group, and their properties depend on the different degrees of oxygenation and intramolecular arrangements. Emetine is used for the treatment of amebiasis and is a component of ipecac syrup. Emetine is both myotoxic and cardiotoxic. Some isoquinoline alkaloids, such as galantamine, have been approved to treat MCI and other memory impairments. Its mechanism of action consists of cholinesterase inhibition that prevents the breakdown of the neurotransmitter acetylcholine, increasing its level in the synaptic cleft and in brain areas lacking cholinergic neurons. This treatment does not cure the condition but slows the rate of cognitive decline. Recently, different isoquinoline alkaloids have been studied as potential compounds for the treatment of LOAD [37,38].
This drug affects the tumor necrosis factor (TNF)-related apoptosis-inducing ligand (TRAIL), which has been shown to be upregulated in HIV-1-infected and immune-activated macrophages. TRAIL is also induced in neurons by beta-amyloid protein, an important pathogenic factor in Alzheimer’s disease [39].
It has been shown that neutralization of TNFSF10 ameliorates functional outcome in a murine model of Alzheimer’s disease [40,41]. Besides, there is evidence that necroptosis is a major driver of neuronal cell death in neurodegenerative diseases [42]. Additionally, among the pathways associated with genes whose expression has been decreased, the most important are related to cellular senescence and cellular response to stress.
Tanespimycin is an antitumor antibiotic used for the treatment of leukemia and different types of solid cancer, whose mechanism of action consists of inhibiting Hsp90 (heat shock protein 90), a chaperone protein that assists other proteins to fold properly, stabilizes proteins against heat stress, and aids in protein degradation. Oxidative stress may cause Hsp60 structure modifications leading to loss of Hsp60 functions, with the consequences of protein misfolding, aggregation, and deposition. Besides, Hsp90 downregulation may induce the reduction of Tau hyperphosphorylation and aggregation and may trigger the so-called stress response. That is, in the presence of cellular stress and Hsp90 inhibitors, Heat Shock Factor 1 (HSF-1) protein dissociates from the chaperone and reaches the nucleus, inducing the activation of heat shock genes and of the stress response via the production of Hsp90, Hsp70, and Hsp40, restoring protein homeostasis [43].
The pathways associated with the genes whose expression has been increased are linked to the regulation of HSF1 heat shock response and unfolded protein response. The genes with decreased expression control apoptosis and signaling by TGF-beta receptor complex through a phosphorylated receptor SMAD (R-SMAD).
Wortmannin is a potent PI3K inhibitor that inhibits different phosphoinositide 3-kinase enzymes, which are part of the PI3K/AKT/mTOR pathway, an important signaling pathway for many cellular functions such as growth control, metabolism, and translation initiation. Wortmannin and LY-294002 are autophagy inhibitors [44]. The main pathways related to the genes with increased expression are: toxicity of botulinum toxin type D, Insulin Receptor Signaling (IRS) activation, estrogen biosynthesis, collagen biosynthesis, and growth hormone receptor signaling. Botulinum toxin is a neurotoxic protein produced by the bacterium Clostridium botulinum that prevents the release of the neurotransmitter acetylcholine from axon endings.
The main pathways related to the genes whose expression is decreased are RNA modification in the nucleus, regulation of TP53 activity through methylation, fatty acyl-CoA biosynthesis, and galactose catabolism.
Biperiden is a muscarinic antagonist that blocks the activity of the muscarinic acetylcholine receptor, and has effects in the central and peripheral nervous systems. It has been used in the treatment of Parkinsonism [45]. The pathways of the genes with increased expression are related to the Nuclear Pore Complex (NPC), which is the largest protein complex in the cell, and also to the HIV life cycle, while the genes whose expression has been decreased are related to the secretin family receptors, which are involved in numerous key neurotransmitter systems in the brain and seem to be disrupted in Alzheimer’s disease (AD) hemostasis, and also to the regulation of the adaptive immune system [46].
Trichostatin A is a histone deacetylase inhibitor (HDACi). Acetylated histones and DNA methylation play important role in Multiple Sclerosis (MS) [47,48]. The overexpressed genes are related to glutamate neurotransmitter release cycle, antigen activation of B-cell receptor and Collapsin Response Mediator Proteins (CRMPs) in Sema3A signaling that modulate the immune system in neurological disorders with inflammatory components [49]. Glutamate neurotransmission is critical for synaptic plasticity and survival of neurons. However, excessive activity causes excitotoxicity and promotes cell death. This constitutes a potential mechanism of neurodegeneration occurred in Alzheimer’s disease [50]. The glutamate pathway has been found to be crucial in the development of Chronic Fatigue Syndrome (CFS) in cancer patients treated with radiotherapy [51].
The genes with decreased expression are related to SMAD transcription factors, which play a key role in the most versatile cytokine signaling pathways via the Transforming Growth Factor-β (TGFβ) pathway. The role of these factors in AD has been investigated [52]. Cyclin D, synthesized in G1 phase, is involved in regulating cell cycle progression, and is activated through phosphorylation. Overexpression of cell cycle proteins of peripheral lymphocytes (CDK2, CDK4, CDK6, cyclin B, and cyclin D) was observed in AD patients [53]. Finally, the last pathway concerns the transcriptional activation of mitochondrial biogenesis, which controls the energy-generating functions of mitochondria in accordance with the metabolic demands. The last drug that has been repositioned with a high score is LY-294002, which is a PI3K inhibitor, like wortmannin. The main pathways are associated with RNA polymerase promoter and adaptive immune system response, among others.
Other drugs that appear to be involved with lower scores are: saquinavir-MCF7 ($5.2 \times 10^{-6}$), sirolimus-MCF7 ($1 \times 10^{-7}$), and valproic acid-PC3 ($5 \times 10^{-5}$, $2 \times 10^{-4}$). Saquinavir is an antiretroviral drug used to treat or prevent HIV/AIDS. Saquinavir is a protease inhibitor: it blocks the viral protease, the enzyme that cleaves protein molecules into smaller fragments. Sirolimus (also known as rapamycin) is an immunosuppressant that inhibits activation of T-cells and B-cells by reducing their sensitivity to interleukin-2 (IL-2) through mTOR inhibition. Finally, valproic acid is a medication primarily used to treat epilepsy and bipolar disorder. Proposed mechanisms for valproic acid include increasing brain levels of gamma-aminobutyric acid (GABA), blocking voltage-gated sodium channels, and inhibiting histone deacetylases.

4. Materials and Methods

We performed the pathway analysis of a cohort of patients with Late-Onset Alzheimer’s Disease (LOAD), mild cognitive impairment (MCI), and healthy subjects (controls) (HC). The data source was a set of Alzheimer case-control samples originating from the EU-funded AddNeuroMed Cohort, which is a large cross-European AD biomarker study relying on human blood as the source of RNA (Sood et al., 2015) [54]. The dataset contains 38,323 probes and 329 samples (145 LOAD, 80 MCI, and 104 healthy controls). These authors identified a set of 150 probe sets that could be used in a diagnosis of Alzheimer’s disease (AD) based on gene expression. This multi-tissue RNA signature, extracted from peripheral blood samples, could be used as a diagnostic tool. Nevertheless, its predictive value has recently been discussed and has received some criticism [55,56].
We performed a retrospective cohort study using a novel machine learning and uncertainty methodology to sample different combinations of highly predictive genes for four different phenotype prediction problems, identifying the discriminatory genetic pathways in each case: LOAD vs. HC; MCI vs. HC, MCI vs. LOAD and MCI + LOAD vs. HC. Identification of the deregulated (or defective) genetic pathways is critical to establish personalized therapies. The potential relevance of these findings could be further investigated in clinical studies.

4.1. Sampling Defective Pathways in Phenotype Prediction Problems

To understand the uncertainty in phenotype prediction problems and the need for robust sampling methods, we first present the uncertainty inherent in a simple linear regression.
The least squares fitting of a linear model $y^*(x) = a_0 + a_1 x$ to a set of experimental data $(x_1, y_1), (x_2, y_2), \ldots, (x_s, y_s)$ consists of finding the parameters $\mathbf{m} = (a_0, a_1)$ so that the distance between the observed data $\mathbf{y}^{obs} = (y_1, \ldots, y_s)$ and the corresponding predictions $\mathbf{y}^{pre}(\mathbf{m}) = (y_1^*(\mathbf{m}), \ldots, y_s^*(\mathbf{m}))$ is minimum according to the Euclidean distance in $\mathbb{R}^s$. The regression problem is equivalent to finding the least squares solution of the linear system of equations $F\mathbf{m} = \mathbf{y}^{obs}$, where the matrix $F = [\mathbf{1}_s, \mathbf{x}]$ depends on the abscissas of the data points $\mathbf{x} = (x_1, \ldots, x_s)$.
The uncertainty analysis of the least squares solution consists in sampling the family of equivalent models $\mathbf{m} = (a_0, a_1)$ that fit the observed data $\mathbf{y}^{obs}$ within the same error bounds:
$$M_{tol} = \left\{ \mathbf{m} = (a_0, a_1) : \frac{\|\mathbf{y}^{obs} - \mathbf{y}^{pre}(\mathbf{m})\|_2}{\|\mathbf{y}^{obs}\|_2} < tol \right\}. \tag{1}$$
Fernández-Martínez et al. (2012, 2013) demonstrated that the topography of the data error cost function corresponds to a straight flat elongated valley if the inverse problem is linear, whereas in the nonlinear case, the cost function topography consists of one or more curvilinear valleys (or basins) of low misfits eventually connected by saddle points. In this simple linear regression problem, the equivalent models belong to an ellipse whose axes and orientations are related to the eigenvalues and eigenvectors of the matrix $F^T F$ [57,58,59].
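As an illustration, the following minimal sketch computes the ellipse of equivalent models for a straight-line fit from the eigendecomposition of $F^T F$. It relies on the identity $\|F\mathbf{m} - \mathbf{y}\|^2 = \|F(\mathbf{m} - \mathbf{m}_{LS})\|^2 + \|\mathbf{r}_{LS}\|^2$, which holds because the least squares residual $\mathbf{r}_{LS}$ is orthogonal to the columns of $F$. The function name `uncertainty_ellipse`, the synthetic data and the 20% tolerance are assumptions made for this example and are not taken from the authors' software.

```python
import numpy as np

def uncertainty_ellipse(x, y, tol):
    """Centre, semi-axes and axis directions of the region
    ||F m - y||_2 / ||y||_2 <= tol for the model y = a0 + a1 * x."""
    F = np.column_stack([np.ones(len(x)), x])
    m_ls, *_ = np.linalg.lstsq(F, y, rcond=None)
    r2 = np.sum((y - F @ m_ls) ** 2)            # squared least squares residual
    rhs = (tol * np.linalg.norm(y)) ** 2 - r2    # misfit budget left for m - m_ls
    if rhs <= 0:
        raise ValueError("tol is below the least squares misfit: empty region")
    eigvals, eigvecs = np.linalg.eigh(F.T @ F)   # eigenvalues in ascending order
    semi_axes = np.sqrt(rhs / eigvals)           # smallest eigenvalue -> longest axis
    return m_ls, semi_axes, eigvecs

# Synthetic data (hypothetical): y = 1 + 2 x plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

tol = 0.2
m_ls, semi_axes, axes = uncertainty_ellipse(x, y, tol)

# A model at the tip of the longest axis lies exactly on the boundary of the region
F = np.column_stack([np.ones(x.size), x])
m_edge = m_ls + semi_axes[0] * axes[:, 0]
boundary_misfit = np.linalg.norm(y - F @ m_edge) / np.linalg.norm(y)
print(m_ls, semi_axes, boundary_misfit)
```

The longest axis corresponds to the smallest eigenvalue of $F^T F$, that is, to the combination of parameters that the data constrain the least.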
This mathematical result might sound somewhat technical, but its importance lies in the fact that the uncertainty space of any decision problem has a deterministic structure. Furthermore, it can be shown that a simple way of sampling these equivalent model parameters in a linear regression problem consists of finding the least-squares solutions of different data bags. That is, solving a set of different regression problems with partial information, for example, with 75% of the observed data used for fitting and the remaining 25% for validation (evaluating the predictive accuracy of this set of parameters). This procedure is called the 75/25 holdout (bootstrap) technique.
Figure 4 shows a numerical example of this procedure for a simple linear regression problem. The figure shows the ellipse of model uncertainty for a relative misfit of 20%, and the different sets of parameters obtained by least squares fitting in the different bagging experiments. It can be observed that these sets belong to the region of uncertainty and that the sampling is denser along the axis of maximum uncertainty of this ellipse.
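The bagging experiment described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code used to produce Figure 4: the function names, the number of bags (200), the noise level and the random seeds are all assumptions made for the example.

```python
import numpy as np

def holdout_regression_samples(x, y, n_bags=200, train_frac=0.75, seed=0):
    """Least squares solutions of random 75/25 holdouts of (x, y).
    Returns the sampled models (a0, a1) and their validation misfits."""
    rng = np.random.default_rng(seed)
    s = len(x)
    n_train = int(round(train_frac * s))
    F = np.column_stack([np.ones(s), x])
    models = np.empty((n_bags, 2))
    val_misfit = np.empty(n_bags)
    for b in range(n_bags):
        perm = rng.permutation(s)
        train, val = perm[:n_train], perm[n_train:]
        # Fit with 75% of the data ...
        m, *_ = np.linalg.lstsq(F[train], y[train], rcond=None)
        models[b] = m
        # ... and evaluate the relative misfit on the remaining 25%
        val_misfit[b] = np.linalg.norm(y[val] - F[val] @ m) / np.linalg.norm(y[val])
    return models, val_misfit

def relative_misfit(m, x, y):
    """Relative misfit of model m on the complete data set."""
    F = np.column_stack([np.ones(len(x)), x])
    return np.linalg.norm(y - F @ m) / np.linalg.norm(y)

# Synthetic data (hypothetical): y = 1 + 2 x plus Gaussian noise
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

models, val_misfit = holdout_regression_samples(x, y)
full_misfits = np.array([relative_misfit(m, x, y) for m in models])

# Reference: least squares solution using all the data
F_all = np.column_stack([np.ones(x.size), x])
m_all, *_ = np.linalg.lstsq(F_all, y, rcond=None)
ls_misfit = relative_misfit(m_all, x, y)

# Fraction of sampled models inside the 20% region of equivalence
print(np.mean(full_misfits < 0.2), ls_misfit)
```

Each holdout solution has a misfit on the complete data that is at least that of the full least squares solution, so the cloud of sampled models surrounds the least squares estimate inside the region of equivalence.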
Similarly, a phenotype prediction problem can be viewed as a generalized regression between the discriminatory sets of genes that characterize the given phenotype and the sets of sample classes that form the training data set [11]. As in the linear regression problem, one of the main obstacles in the analysis of genetic data is the absence of a conceptual model that relates the different genes/probes to the class prediction (phenotype). For this reason, a classifier $L^*(\mathbf{g})$ has to be constructed, as an algorithm that maps the set of genetic signatures $\mathbf{g}$ to the set of classes into which the phenotype is divided:
$$L^*(\mathbf{g}): \mathbf{g} \in \mathbb{R}^s \rightarrow C = \{\{HC, LOAD\}; \{HC, MCI\}; \{HC, LOAD + MCI\}; \{LOAD, MCI\}\}. \tag{2}$$
In this paper we have performed four different comparisons: LOAD vs. HC, MCI vs. HC, LOAD vs. MCI, and LOAD + MCI vs. HC, to better understand the molecular mechanisms involved in LOAD and the main differences between LOAD and MCI.
Finding the discriminatory genetic signatures corresponding to $L^*(\mathbf{g})$ can be interpreted as a generalized regression problem of the observed sample class vector $\mathbf{c}^{obs}$ with respect to the genetic signatures. For that purpose, the modeling is divided into two steps: learning and validation. The learning process consists of taking a subset of samples $T$ (training data set) whose class vector $\mathbf{c}^{obs}$ is known, and finding the subset of genetic signatures $\mathbf{g}$ of minimum size that maximizes the learning accuracy (the percentage of samples with correctly predicted class):
$$Acc(\mathbf{g}) = 100 - \|L^*(\mathbf{g}) - \mathbf{c}^{obs}\|_1, \tag{3}$$
Here $\|L^*(\mathbf{g}) - \mathbf{c}^{obs}\|_1$ stands for the prediction error (in percentage). In practice, the predictive accuracy of a genetic signature is established via cross-validation. Typically, leave-one-out cross-validation (LOOCV) is used in order to exploit all the available genetic samples. The genetic signature with the highest accuracy and the least number of discriminatory genes is named the smallest-scale signature. The validation consists in predicting the class of a new sample (whose class is unknown) using the genetic signatures that have been found during the learning process. The stability of the smallest-scale signature found at the learning step can be established via a holdout sampler procedure, where different samples for the training and validation sets are randomly selected.
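The accuracy measure and its LOOCV estimate can be sketched as follows. This is an illustration under stated assumptions: classes are coded as 0/1, the prediction error is taken as the percentage of misclassified samples, and a simple k-nearest-neighbor rule stands in for the classifier $L^*(\mathbf{g})$. The function names and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, c_train, X_test, k=1):
    """k-nearest-neighbor prediction for binary labels 0/1 (use odd k;
    a tie in the vote would be resolved in favor of class 0)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = c_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

def accuracy(c_pred, c_obs):
    """Acc = 100 - prediction error, the error being the percentage of
    samples whose predicted class differs from the observed one."""
    return 100.0 - 100.0 * np.mean(c_pred != c_obs)

def loocv_accuracy(X, c_obs, k=1):
    """Leave-one-out cross-validation: each sample is predicted by a
    classifier trained on all the other samples."""
    n = len(c_obs)
    preds = np.empty(n, dtype=int)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = knn_predict(X[keep], c_obs[keep], X[i:i + 1], k)[0]
    return accuracy(preds, c_obs)

# Toy signature with a single gene and two well-separated classes
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
c = np.array([0, 0, 0, 1, 1, 1])
loocv_acc = loocv_accuracy(X, c)
print(loocv_acc)
```

In a real analysis, `X` would hold the expression values of the genes in the candidate signature (one column per gene), and signatures of increasing size would be compared through this accuracy.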
It is important to remark that phenotype prediction problems always have a high degree of underdeterminacy, since the number of monitored probes (genes) is much greater than the number of samples from human subjects enrolled in clinical trials. Therefore, the learning step involves several lists of genes with similar predictive accuracy. These genes might be involved in the genetic pathways associated with the disease. For a given classifier, the smallest-scale signature is the one that has the least number of discriminatory genes with the highest predictive accuracy. This knowledge is very important for early diagnosis and treatment optimization. Due to noise in the genetic data and class assignment, some of these highly discriminatory signatures might be false, containing genes uninvolved in the genetic pathways [18].
To deal with this uncertainty problem, we use a bootstrapping methodology to sample the genes that are discriminatory in different holdouts and perform posterior analysis of these discriminatory networks to unravel the biological pathways that are involved in the disease development. This algorithm has been named the holdout sampler [19] and has been used to sample the uncertainty space in various inverse problems [60,61]. It has also been successfully applied in phenotype prediction and drug design [62,63,64,65]. In addition, the predicted pathways are compared with those obtained using other sampling algorithms such as the Fisher’s ratio sampler [66] and Random Forest [67]. Random Forests (RF) are ensembles of random decision trees used for classification. RF have been used for phenotype prediction and uncertainty analysis by Pang et al. [68]. These algorithms clearly outperform Bayesian Networks (BN) [69], which sample the posterior distribution of the genetic signatures related to the phenotype prediction, $P(\mathbf{g}/\mathbf{c}^{obs})$, according to Bayes’ rule:
$$P(\mathbf{g}/\mathbf{c}^{obs}) \sim P(\mathbf{g})\, P(\mathbf{c}^{obs}/\mathbf{g}), \tag{4}$$
Here $P(\mathbf{g})$ is the prior distribution of the genetic signatures, which is uniform in the set of discriminatory genes, or proportional to their discriminatory power, and $P(\mathbf{c}^{obs}/\mathbf{g})$ is the likelihood of the genetic signature $\mathbf{g}$, that depends on its predictive accuracy $Acc(\mathbf{g})$ as follows:
$$P(\mathbf{c}^{obs}/\mathbf{g}) = k\, e^{Acc(\mathbf{g})}. \tag{5}$$
BNs have been used for the analysis of defective pathways by Jiang et al. [70] to discover gene interactions and by Su et al. [71] to analyze epigenetic modifications that affect the diseases. BNs have not been used in this paper because they are very ineffective for pathway sampling and also very time consuming, due to the optimization procedure required to find the optimum probability factorization. This procedure is inadequate for pathway sampling since the set of possible probabilistic factorizations of the uncertainty space is not unique. In fact, all the samplers implicitly follow Equation (4) with different prior probabilities defined for different sets of discriminatory genes. Additionally, the likelihood $P(\mathbf{c}^{obs}/\mathbf{g})$ depends on the type of classifier and on the cost function that is used to define the predictive accuracy $Acc(\mathbf{g})$.
For all the algorithms used in this paper the flowchart is shown in Figure 5 and described as follows:
Data bagging: different random 75/25 data bags (holdouts) are generated, where 75% of the data is used for learning and 25% for validation. In the present study, we have used 1000 different bags.
Gene selection: for each of these bags the genes are selected and the classifier is built. The set of genes that are used by the different classifiers is first restricted to the most discriminatory ones to avoid the impact of genes that do not contribute mechanistically to the phenotype discrimination. The genes with a Fisher’s ratio higher than a minimum cut-off (0.5 in this case) are selected. The accuracy is calculated on the validation set. This way of proceeding is based on the fact that variables with high discriminatory power serve to span the main features of the classification, while variables with the lowest discriminatory ratios account for the details in the discrimination. This method determines the minimum amount of high-frequency details (helper genes) that are needed to optimally discriminate between classes, promoting the header genes, which are those that explain the phenotype in a robust way. This methodology [11,18] has been successfully applied to the bioinformatics modeling of high-dimensional Omics data [72,73], and in the analysis of medical decision problems using hospital data [74,75].
Posterior analysis: once the data bagging simulation and analysis have been finished, the posterior analysis consists of finding all the minimum-size signatures whose validation predictive accuracy is higher than a given tolerance. In this case, we have considered all the holdouts with predictive accuracy higher than 80%. Finally, we performed frequency analysis of these lists to find the most frequently sampled genes and to establish the links with defective genetic pathways in the Reactome Pathway Database.
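The three steps above can be sketched as follows on synthetic data. This is a simplified illustration and not the authors' implementation. It assumes a two-class problem coded as 0/1, a Fisher's ratio defined as $(\mu_0 - \mu_1)^2 / (\sigma_0^2 + \sigma_1^2)$ per gene, a 1-nearest-neighbor classifier, and nested signatures built from the Fisher's ratio ranking. The function names, the `max_genes` limit and the use of 100 bags (instead of the 1000 used in the paper) are choices made for the example.

```python
import numpy as np
from collections import Counter

def fisher_ratio(X, c):
    """Fisher's ratio of every gene (column of X) for labels 0/1."""
    X0, X1 = X[c == 0], X[c == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12   # guard against zero variance
    return num / den

def nn_predict(X_train, c_train, X_test):
    """1-nearest-neighbor prediction with the Euclidean distance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return c_train[np.argmin(d, axis=1)]

def holdout_sampler(X, c, n_bags=100, train_frac=0.75, fr_cutoff=0.5,
                    acc_tol=80.0, max_genes=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(c)
    n_train = int(round(train_frac * n))
    counts, n_kept = Counter(), 0
    for _ in range(n_bags):
        # 1. Data bagging: random 75/25 split
        perm = rng.permutation(n)
        tr, va = perm[:n_train], perm[n_train:]
        if len(np.unique(c[tr])) < 2:
            continue
        # 2. Gene selection: keep genes above the Fisher's ratio cut-off,
        #    ranked by decreasing discriminatory power
        fr = fisher_ratio(X[tr], c[tr])
        ranked = np.argsort(fr)[::-1]
        ranked = ranked[fr[ranked] > fr_cutoff][:max_genes]
        best_acc, best_sig = -1.0, None
        for size in range(1, len(ranked) + 1):
            sig = ranked[:size]
            pred = nn_predict(X[tr][:, sig], c[tr], X[va][:, sig])
            acc = 100.0 * np.mean(pred == c[va])
            if acc > best_acc:          # strict ">" keeps the smallest signature
                best_acc, best_sig = acc, sig
        # 3. Posterior analysis: keep only signatures above the tolerance
        if best_sig is not None and best_acc > acc_tol:
            counts.update(int(j) for j in best_sig)
            n_kept += 1
    return counts, n_kept

# Synthetic expression matrix (hypothetical): 60 samples, 50 genes,
# only genes 0 and 1 differ between the two classes
rng = np.random.default_rng(1)
c = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 50))
X[c == 1, :2] += 4.0

counts, n_kept = holdout_sampler(X, c)
print(n_kept, counts.most_common(5))
```

The most frequently sampled genes returned by `counts.most_common` would then be submitted to an ontological platform for pathway analysis.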

5. Conclusions

In this paper, we present a novel algorithm to sample defective pathways involved in phenotype prediction problems that are highly underdetermined. The methodology consists of sampling different equivalent high-discriminatory genetic networks that are associated with the uncertainty space of the k-NN classifier that is used to separate the LOAD, MCI, and HC classes. To perform this task, the algorithm looks for the minimum scale signatures (header genes) corresponding to different 75/25 random holdouts. This methodology is related to bootstrapping techniques [76]. The biological pathways can be identified by performing posterior analysis, finding the most frequently sampled genes in these minimum scale signatures, and plugging these sets into ontological platforms. That way, the effect of helper genes whose presence might be due to noise or to the high degree of underdeterminacy of these experiments is damped. The proposed algorithm is very fast and simple to implement. However, a major problem is that predicted pathways may depend on the number of genes that are provided. Our experience recommends first using a sufficient number of representative genes. The second recommendation is to perform alternative analyses using variable numbers of genes to establish the cutoff frequency.
We show the application of this methodology to the analysis of the defective pathways in AD and MCI compared to healthy individuals. The results show common pathways for AD and MCI patients that are related to viral mRNA translation, influenza viral RNA transcription and replication, gene expression, mitochondrial translation, rRNA processing and metabolism. These pathways seem to be very consistent both for LOAD and MCI. The viral pathways have never been cited before among key pathogenic pathways involved in AD [9].
The predictive accuracies to discriminate LOAD and MCI from healthy controls were 84% and 81.5%, respectively. In addition, LOAD and MCI could not be clearly discriminated (74% accuracy). The most discriminatory genes of the LOAD-MCI discrimination are associated with regulation of apoptosis by protein ubiquitination and degradation and regulation of stress-activated MAP kinases by G protein signaling. This fact implies that MCI and LOAD can be diagnosed using similar pathways. We have also provided the list of genes that optimally discriminate LOAD and MCI from HC, which include the discriminatory genes that were found in individual discriminations of LOAD and MCI vs. HC.
In conclusion, the retrospective analysis of this LOAD-MCI dataset using novel sampling approaches provides a new working hypothesis for the main genetic mechanisms involved in LOAD-MCI. Based on these findings we propose a set of repositioned drugs by using CMAP methodologies. The main drugs belong to the category of isoquinoline alkaloids, antitumor antibiotics, PI3K and autophagy inhibitors, antagonists of the muscarinic acetylcholine receptor, and histone deacetylase inhibitors. The mechanisms of action of these drugs are also detailed and correlated with specific pathways.
We believe that the potential clinical relevance of these findings should be further investigated and confirmed with clinical studies of other independent cohorts using similar platforms to avoid possible biases.

Supplementary Materials

Supplementary materials can be found at https://www.mdpi.com/1422-0067/21/10/3594/s1.

Author Contributions

Algorithm development and software, J.L.F.-M., Ó.Á.-M., E.J.d.-G. and G.B.; conceptualization and original draft preparation, J.L.F.-M. and A.K.; project administration, A.K.; review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSF grant DBI 1661391, and NIH grants R01 GM127701 and R01 GM127701-01S1.

Acknowledgments

We acknowledge financial support from NSF grant DBI 1661391, and NIH grants R01 GM127701 and R01 GM127701-01S1.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goate, A.; Chartier-Harlin, M.C.; Mullan, M.; Brown, J.; Crawford, F.; Fidani, L.; Giuffra, L.; Haynes, A.; Irving, N.; James, L.; et al. Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer’s disease. Nature 1991, 349, 704–706.
  2. Cruts, M.; Van Broeckhoven, C. Presenilin mutations in Alzheimer’s disease. Hum. Mutat. 1998, 11, 183–190.
  3. De Strooper, B. Loss-of-function presenilin mutations in Alzheimer disease. Talking Point on the role of presenilin mutations in Alzheimer disease. EMBO Rep. 2007, 8, 141–146.
  4. Ricciarelli, R.; d’Abramo, C.; Massone, S.; Marinari, U.M.; Pronzato, M.A.; Tabaton, M. Microarray Analysis in Alzheimer’s Disease and Normal Aging. Life 2004, 56, 349–354.
  5. Kong, W.; Mou, X.; Liu, Q.; Chen, Z.; Vanderburg, C.R.; Rogers, J.T.; Huang, X. Independent component analysis of Alzheimer’s DNA microarray gene expression data. Mol. Neurodegener. 2009, 4, 5.
  6. Tanzi, R.E. The Genetics of Alzheimer Disease. Cold Spring Harbor Perspect. Med. 2012, 2, a006296.
  7. Hardy, J.; Bogdanovic, N.; Winblad, B.; Portelius, E.; Andreasen, N.; Cedazo-Minguez, A.; Zetterberg, H. Pathways to Alzheimer’s disease. J. Intern. Med. 2014, 275, 296–303.
  8. Jones, L.; Holmans, P.A.; Hamshere, M.L.; Harold, D.; Moskvina, V.; Ivanov, D.; Pocklington, A.; Abraham, R.; Hollingworth, P.; Sims, R.; et al. Genetic evidence implicates the immune system and cholesterol metabolism in the aetiology of Alzheimer’s disease. PLoS ONE 2010, 5, e13950.
  9. Giri, M.; Zhang, M.; Lü, Y. Genes associated with Alzheimer’s disease: An overview and current status. Clin. Interv. Aging 2016, 11, 665–681.
  10. Godoy, J.A.; Rios, J.A.; Zolezzi, J.M.; Braidy, N.; Inestrosa, N.C. Signaling pathway cross talk in Alzheimer’s disease. Cell Commun. Signal. 2014, 12, 23.
  11. De Andrés-Galiana, E.J.; Fernández-Martínez, J.L.; Sonis, S.T. Design of biomedical robots for phenotype prediction problems. J. Comp. Biol. 2016, 23, 678–692.
  12. Ogata, H.; Goto, S.; Sato, K.; Fujibuchi, W.; Bono, H.; Kanehisa, M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27, 29–34.
  13. Vastrik, I.; D’Eustachio, P.; Schmidt, E.; Geeta, J.T.; Gopal, G.; Croft, D.; de Bono, B.; Gillespie, M.; Jassal, B.; Lewis, S.; et al. Reactome: A Knowledge Base of Biologic Pathways and Processes. Genome Biol. 2007, 8, R39.
  14. Creixell, P.; Reimand, J.; Haider, S.; Wu, G.; Shibata, T.; Vazquez, M.; Mustonen, V.; Gonzalez-Perez, A.; Pearson, J.; Sander, C.; et al. Pathway and network analysis of cancer genomes. Nat. Methods 2015, 12, 615–621.
  15. Dimitrakopoulos, C.M.; Beerenwinkel, N. Computational approaches for the identification of cancer genes and pathways. WIREs Syst. Biol. Med. 2017, 9, e1364.
  16. Mizuno, S.; Iijima, R.; Ogishima, S.; Kikuchi, M.; Matsuoka, Y.; Ghosh, S.; Miyamoto, T.; Miyashita, A.; Kuwano, R.; Tanaka, H. AlzPathway: A comprehensive map of signaling pathways of Alzheimer’s disease. BMC Syst. Biol. 2012, 6, 52.
  17. Khatri, P.; Sirota, M.; Butte, A.J. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput. Biol. 2012, 8, e1002375.
  18. DeAndrés-Galiana, E.J.; Fernández-Martínez, J.L.; Sonis, S.T. Sensitivity analysis of gene ranking methods in phenotype prediction. J. Biomed. Inf. 2016, 64, 255–264.
  19. Cernea, A.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Fernández-Ovies, F.J.; Fernández-Muñiz, Z.; Alvarez-Machancoses, Ó.; Saligan, L.; Sonis, S.T. Sampling defective pathways in phenotype prediction problems via the Holdout sampler. In International Conference on Bioinformatics and Biomedical Engineering; Springer: Cham, Switzerland, 2018; Volume 10814, pp. 24–32.
  20. Karim, S.; Mirza, Z.; Kamal, M.A.; Abuzenadah, A.M.; Azhar, E.I.; Al-Qahtani, M.H.; Damanhouri, G.A.; Ahmad, F.; Gan, S.H.; Sohrab, S.S. The role of viruses in neurodegenerative and neurobehavioral diseases. CNS Neurol. Disord. Drug Targets 2014, 13, 1213–1223.
  21. Alvarez, O.; Fernández-Martínez, J.L. The importance of Biological Invariance in Drug Design. J. Sci. Technol. Res. 2019, 18, 13211–13212.
  22. Kim, J.H.; You, K.R.; Kim, I.H.; Cho, B.H.; Kim, C.Y.; Kim, D.G. Over-expression of the ribosomal protein L36a gene is associated with cellular proliferation in hepatocellular carcinoma. Hepatology 2004, 39, 129–138.
  23. Wu, F.; Yao, P.J. Clathrin-mediated endocytosis and Alzheimer’s disease: An update. Ageing Res. Rev. 2009, 8, 147–149.
  24. Butcher, N.J.; Horne, M.K.; Mellick, G.D.; Fowler, C.J.; Masters, C.L.; Minchin, R.F. Sulfotransferase 1A3/4 copy number variation is associated with neurodegenerative disease. Pharmacogenom. J. 2017, 18, 209–214.
  25. Martorana, A.; Koch, G. Is dopamine involved in Alzheimer’s disease? Front. Aging Neurosci. 2014, 6, 252.
  26. Kawahara, M. Disruption of calcium homeostasis in the pathogenesis of Alzheimer’s disease and other conformational diseases. Curr. Alzheimer Res. 2004, 1, 87–95.
  27. Small, D.H. Dysregulation of Calcium Homeostasis in Alzheimer’s Disease. Neurochem. Res. 2009, 34, 1824.
  28. Brawek, B.; Garaschuk, O. Network-wide dysregulation of calcium homeostasis in Alzheimer’s disease. Cell Tissue Res. 2014, 357, 427.
  29. Yang, Y.; Yu, X. Regulation of apoptosis: The ubiquitous way. FASEB J. 2003, 17, 790–799.
  30. Huang, G.; Shi, L.Z.; Chi, H. Regulation of JNK and p38 MAPK in the immune system: Signal integration, propagation and termination. Cytokine 2009, 48, 161–169.
  31. Bogoyevitch, M.A.; Boehm, I.; Oakley, A.; Ketterman, A.J.; Barr, R.K. Targeting the JNK MAPK cascade for inhibition: Basic science and therapeutic potential. Biochim. Biophys. Acta 2004, 1697, 89–101.
  32. Sanchez-Mut, J.V.; Gräff, J. Epigenetic Alterations in Alzheimer’s Disease. Front. Behav. Neurosci. 2015, 9, 347.
  33. Bajić, V.; Spremo-Potparević, B.; Živković, L.; Siedlak, S.L.; Casadeus, G.; Smith, M.A. The X-chromosome instability phenotype in Alzheimer’s disease: A clinical sign of accelerating aging? Med. Hypotheses 2009, 73, 917–920.
  34. Barati, M.; Mansour, E. A Gene Expression Profile of Alzheimer’s Disease Using Microarray Technology. Zahedan J. Res. Med. Sci. 2016, 18, e7950.
  35. Lamb, J.; Crawford, E.D.; Peck, D.; Modell, J.W.; Blat, I.C.; Wrobel, M.J.; Lerner, J.; Brunet, J.P.; Subramanian, A.; Ross, K.N.; et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 2006, 313, 1929–1935.
  36. Chan, J.; Wang, X.; Turner, J.A.; Baldwin, N.E.; Gu, J. Breaking the paradigm: Dr Insight empowers signature-free, enhanced drug repurposing. Bioinformatics 2019, 35, 2818–2826.
  37. Chlebek, J.; Novák, Z.; Kassemová, D.; Šafratová, M.; Kostelník, J.; Malý, L.; Ločárek, M.; Opletal, L.; Hošt’álková, A.; Hrabinová, M.; et al. Isoquinoline Alkaloids from Fumaria officinalis L. and Their Biological Activities Related to Alzheimer’s Disease. Chem. Biodivers. 2016, 13, 91–99.
  38. Hostalkova, A.; Marikova, J.; Opletal, L.; Korabecny, J.; Hulcova, D.; Kunes, J.; Novakova, L.; Perez, D.I.; Jun, D.; Kucera, T.; et al. Isoquinoline Alkaloids from Berberis vulgaris as Potential Lead Compounds for the Treatment of Alzheimer’s Disease. J. Nat. Prod. 2019, 82, 239–248.
  39. Huang, Y.; Erdmann, N.; Peng, H.; Zhao, Y.; Zheng, J. The role of TNF related apoptosis-inducing ligand in neurodegenerative diseases. Cell. Mol. Immunol. 2005, 2, 113–122.
  40. Cantarella, G.; Di Benedetto, G.; Puzzo, D.; Privitera, L.; Loreto, C.; Saccone, S.; Giunta, S.; Palmeri, A.; Bernardini, R. Neutralization of TNFSF10 ameliorates functional outcome in a murine model of Alzheimer’s disease. Brain 2015, 138, 203–216.
  41. Frenkel, D. A new TRAIL in Alzheimer’s disease therapy. Brain 2014, 138, 8–10.
  42. Zhang, S.; Tang, M.-b.; Luo, H.-y.; Shi, C.-h.; Xu, Y.-m. Necroptosis in neurodegenerative diseases: A potential therapeutic target. Cell Death Dis. 2017, 8, e2905.
  43. Campanella, C.; Pace, A.; Bavisotto, C.; Marzullo, P.; Marino Gammazza, A.; Buscemi, S.; Palumbo Piccionello, A. Heat Shock Proteins in Alzheimer’s Disease: Role and Targeting. Int. J. Mol. Sci. 2018, 19, 2603.
  44. Blommaart, E.F.; Krause, U.; Schellens, J.P.; Vreeling-Sindelarova, H.; Meijer, A.J. The phosphatidylinositol 3-kinase inhibitors wortmannin and LY294002 inhibit autophagy in isolated rat hepatocytes. Eur. J. Biochem. 1997, 243, 240–246.
  45. Jackisch, R.; Kruchen, A.; Sauermann, W.; Hertting, G.; Feuerstein, T.J. The antiparkinsonian drugs budipine and biperiden are use-dependent (uncompetitive) NMDA receptor antagonists. Eur. J. Pharmacol. 1994, 264, 207–211.
  46. Thathiah, A.; De Strooper, B. The role of G protein-coupled receptors in the pathology of Alzheimer’s disease. Nat. Rev. Neurosci. 2011, 12, 73–87.
  47. Hua, L.L.; Wen-Na, P.; Yu, D.; Jing-Jing, L.; Xiang-Rong, T. Action of trichostatin A on Alzheimer’s disease-like pathological changes in SH-SY5Y neuroblastoma cells. Neural Regen. Res. 2020, 15, 293–301.
  48. Nunes, M.J.; Moutinho, M.; Gama, M.J.; Rodrigues, E. Trichostatin A, a histone deacetylase inhibitor, modulates signaling pathways involved in the control of neuronal cholesterol metabolism. Alzheimer’s Dement. J. Alzheimer’s Assoc. 2013, 9, 360.
  49. Quintremil, S.; Medina-Ferrer, F.; Puente, J.; Pando, M.E.; Valenzuela, M.A. Roles of Semaphorins in Neurodegenerative Diseases. In Neurons—Dendrites and Axons; Hernández-Aguilar, M.E., Aranda-Abreu, G.E., Eds.; IntechOpen: London, UK, 2019.
  50. Wang, R.; Reddy, P.H. Role of Glutamate and NMDA Receptors in Alzheimer’s Disease. J. Alzheimers Dis. 2017, 57, 1041–1048.
  51. Feng, L.R.; Fernández-Martínez, J.L.; Zaal, K.J.M.; Wolff, B.S.; Saligan, L.N. mGluR5 mediates post-radiotherapy fatigue development in cancer patients. Transl. Psychiatry 2018, 8, 110.
  52. Docagne, F.; Gabriel, C.; Lebeurrier, N.; Lesné, S.; Hommet, Y.; Plawinski, L.; Mackenzie, E.T.; Vivien, D. Sp1 and Smad transcription factors co-operate to mediate TGF-beta-dependent activation of amyloid-beta precursor protein gene transcription. Biochem. J. 2004, 383, 393–399.
  53. Kim, H.; Kwon, Y.A.; Ahn, I.S.; Kim, S.; Kim, S.; Jo, S.A.; Kim, D.K. Overexpression of Cell Cycle Proteins of Peripheral Lymphocytes in Patients with Alzheimer’s Disease. Psychiatry Investig. 2016, 13, 127–134.
  54. Sood, S.; Gallagher, I.J.; Lunnon, K.; Rullman, E.; Keohane, A.; Crossland, H.; Phillips, B.E.; Cederholm, T.; Jensen, T.; van Loon, L.J.C.; et al. A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status. Genome Biol. 2015, 16, 185.
  55. Pilling, L.C.; Harries, L.W.; Hernandez, D.G.; Singleton, A.B.; Kuchel, G.A.; Ferrucci, L.; Melzer, D. The reported healthy ageing gene expression score: Lack of predictive value in two cohorts. bioRxiv 2015.
  56. Jacob, L.; Speed, T.P. The healthy ageing gene expression signature for Alzheimer’s disease diagnosis: A random sampling perspective. bioRxiv 2018.
  57. Fernández-Martínez, J.L.; Fernández-Muñiz, Z.; Tompkins, M.J. On the topography of the cost functional in linear and nonlinear inverse problems. Geophysics 2012, 77, W1–W5.
  58. Fernández-Martínez, J.L.; Pallero, J.L.G.; Fernández-Muñiz, Z.; Pedruelo-González, L.M. From Bayes to Tarantola: New insights to understand uncertainty in inverse problems. J. Appl. Geophys. 2013, 98, 62–72.
  59. Fernández-Martínez, J.L.; Fernández-Muñiz, Z. The curse of dimensionality in inverse problems. J. Comput. Appl. Math. 2019, in press.
  60. Fernández-Muñiz, Z.; Khaniani, H.; Fernández-Martínez, J.L. Data kit inversion and uncertainty analysis. J. Appl. Geophys. 2019, 161, 228–238.
  61. Cernea, A.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Galvan, J.A.; Garcia-Pravia, C. Analysis of Clinical Prognostic Variables for Triple Negative Breast Cancer Histological Grading and Lymph Node Metastasis. J. Med. Inform. Decis. Mak. 2018, 1, 14.
  62. DeAndrés-Galiana, E.J.; Bea, G.; Fernández-Martínez, J.L.; Saligan, L.N. Analysis of defective pathways and drug repositioning in Multiple Sclerosis via machine learning approaches. Comput. Biol. Med. 2019, 115, 103492.
  63. Cernea, A.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Fernández-Muñiz, Z.; Bermejo-Millo, J.C.; González-Blanco, L.; Solano, J.J.; Abizanda, P.; Coto-Montes, A.; Caballero, B. Prognostic networks for unraveling the biological mechanisms of Sarcopenia. Mech. Ageing Dev. 2019, 182, 111129.
  64. Eiro, N.; Cid, S.; Fernández, B.; Fraile, M.; Cernea, A.; Sánchez, R.; Andicoechea, A.; deAndrés-Galiana, E.J.; Gónzalez, L.O.; Fernández-Muñiz, Z.; et al. MMP11 expression in intratumoral inflammatory cells in breast cancer. Histopathology 2019, 75, 916–930.
  65. Fernández-Martínez, J.L.; Álvarez, O.; deAndrés-Galiana, E.J.; de la Viña, J.F.S.; Huergo, L. Robust sampling of altered pathways for drug repositioning reveals promising novel therapeutics for inclusion body Myositis. J. Rare Dis. Res. Treat. 2019, 4, 7–15.
  66. Cernea, A.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Fernández-Ovies, F.J.; Fernández-Muñiz, Z.; Alvarez-Machancoses, Ó.; Saligan, L.; Sonis, S.T. Sampling Defective Pathways in Phenotype Prediction Problems via the Fisher’s Ratio Sampler. In International Conference on Bioinformatics and Biomedical Engineering; Springer: Cham, Switzerland, 2018; Volume 10814, pp. 15–23.
  67. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  68. Pang, H.; Lin, A.; Holford, M.; Enerson, B.E.; Lu, B.; Lawton, M.P.; Floyd, E.; Zhao, H. Pathway analysis using random forests classification and regression. Bioinformatics 2006, 22, 2028–2036.
  69. Cernea, A.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Fernández-Ovies, F.J.; Fernández-Muñiz, Z.; Alvarez-Machancoses, Ó.; Saligan, L.; Sonis, S.T. Comparison of different sampling algorithms for phenotype prediction. In International Conference on Bioinformatics and Biomedical Engineering; Springer: Cham, Switzerland, 2018; Volume 10814, pp. 33–45.
  70. Jiang, X.; Barmada, M.M.; Visweswaran, S. Identifying genetic interactions in genome-wide data using Bayesian networks. Genet. Epidemiol. 2010, 34, 575–581.
  71. Su, C.; Andrew, A.; Karagas, M.R.; Borsuk, M.E. Using Bayesian networks to discover relations between genes, environment and disease. BioData Min. 2013, 6, 6.
  72. Saligan, L.N.; Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Sonis, S.T. Supervised classification by filter methods and recursive feature elimination predicts risk of radiotherapy-related fatigue in patients with prostate cancer. Cancer Inf. 2014, 13, 141–152.
  73. Fernández-Martínez, J.L.; deAndrés-Galiana, E.J.; Sonis, S.T. Genomic data integration in chronic lymphocytic leukemia. J. Gene Med. 2017, 19, e2936. [Google Scholar] [CrossRef]
  74. deAndrés-Galiana, E.J.; Fernández-Martínez, J.L.; Luaces, O.; Del Coz, J.J.; Fernandez, R.; Solano, J.; Nogués, E.A.; Zanabilli, Y.; Alonso, J.M.; Payer, A.R.; et al. On the prediction of Hodgkin Lymphoma treatment response. Clin. Transl. Oncol. 2015, 17, 612–619. [Google Scholar] [CrossRef]
  75. deAndrés-Galiana, E.J.; Fernández-Martínez, J.L.; Luaces, O.; del Coz, J.J.; Huergo-Zapico, L.; Acebes-Huerta, A.; González, S.; González-Rodríguez, A.P. Analysis of clinical prognostic variables for Chronic Lymphocytic Leukemia decision-making problems. J. Biomed. Inform. 2016, 60, 342–351. [Google Scholar] [CrossRef] [PubMed]
  76. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26. [Google Scholar] [CrossRef]
Figure 1. Correlation network for the LOAD vs. Healthy Control phenotype.
Figure 2. Correlation network for MCI vs. Healthy Control phenotype.
Figure 3. Correlation network for MCI vs. LOAD phenotypes.
Figure 4. Linear regression model. Ellipse of uncertainty for a relative misfit of 15%, together with the sets of model parameters $(a_0, a_1)$ found in the different bagging experiments. These models sample the region of uncertainty within the 15% relative-misfit ellipse. The example illustrates that the list of genes that equally explain a phenotype is not unique. One simple method to sample these highly discriminatory genetic networks is to perform random holdouts, look for the minimum-scale signature that best explains the phenotype in each holdout, and find the most frequently sampled genes in these signatures. In phenotype prediction problems, these genes play the role of the points (model parameters) located within the ellipse of this simple regression problem.
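The idea behind Figure 4 can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes synthetic data, bootstrap resampling as the bagging step, and a relative misfit defined as ‖prediction − data‖ / ‖data‖. The function names `bagged_line_fits` and `relative_misfit` and all numerical values are hypothetical.

```python
import numpy as np

def bagged_line_fits(x, y, n_bags=200, seed=0):
    """Fit y ~ a0 + a1*x on bootstrap resamples; return an (n_bags, 2) array of [a0, a1]."""
    rng = np.random.default_rng(seed)
    n = len(x)
    design = np.column_stack([np.ones(n), x])    # columns: intercept, slope
    params = np.empty((n_bags, 2))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)         # draw data points with replacement
        params[b] = np.linalg.lstsq(design[idx], y[idx], rcond=None)[0]
    return params

def relative_misfit(params, x, y):
    """Relative misfit (assumed definition) of each [a0, a1] pair on the full data set."""
    design = np.column_stack([np.ones(len(x)), x])
    residuals = params @ design.T - y            # shape (n_models, n_points)
    return np.linalg.norm(residuals, axis=1) / np.linalg.norm(y)

# Synthetic data: every number here is an illustrative assumption.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.4, size=x.size)

params = bagged_line_fits(x, y)
misfit = relative_misfit(params, x, y)
inside = misfit <= 0.15                          # models inside the 15% misfit region
print(f"{inside.mean():.0%} of bagged models lie inside the 15% relative-misfit region")
```

Each row of `params` is one point in the $(a_0, a_1)$ plane. Plotting those points against the 15% misfit contour would give a picture analogous to Figure 4, with the bagged models scattered around the least-squares solution.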
Figure 5. Flow chart of the machine learning methodology.
Table 1. Late-onset Alzheimer’s Disease (LOAD) vs. healthy controls (HC). List of most discriminatory genes with a Fisher’s ratio higher than 0.8. All these genes are underexpressed in LOAD.

| Gene | Mean-HC | Std-HC | Mean-AD | Std-AD | FC | FR (log) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| MRPL51 | 703.3 | 183.20 | 472.4 | 121.55 | 0.57 | 1.29 | 78.71 |
| CETN2 | 867.0 | 153.21 | 683.0 | 118.88 | 0.34 | 1.19 | 79.52 |
| LOC401206 | 18,073.7 | 5002.52 | 12,083.0 | 4278.87 | 0.58 | 1.17 | 79.12 |
| RPL36AL | 7168.5 | 2178.39 | 4527.8 | 1711.00 | 0.66 | 1.16 | 78.71 |
| LOC646200 | 3563.2 | 1854.66 | 1760.4 | 1074.05 | 1.02 | 1.09 | 79.12 |
| RPS25 | 18,110.8 | 5359.11 | 11,732.2 | 4441.81 | 0.63 | 1.04 | 77.91 |
| RPA3 | 450.8 | 111.78 | 332.4 | 80.43 | 0.44 | 0.99 | 77.11 |
| RPS27A | 17,344.8 | 3217.71 | 13,010.2 | 3989.02 | 0.41 | 0.96 | 77.11 |
| LOC653658 | 1550.3 | 883.60 | 823.9 | 599.03 | 0.91 | 0.93 | 77.11 |
| LOC648000 | 2445.1 | 1824.27 | 1119.2 | 1114.32 | 1.13 | 0.92 | 75.90 |
| LOC650276 | 5630.2 | 3807.08 | 2744.5 | 2278.71 | 1.04 | 0.90 | 75.10 |
| MRPL33 | 401.1 | 80.33 | 328.1 | 56.05 | 0.29 | 0.88 | 74.70 |
| RPL17 | 3505.5 | 2669.20 | 1567.4 | 1607.24 | 1.16 | 0.87 | 74.30 |
| CALML4 | 542.0 | 125.05 | 411.4 | 87.25 | 0.40 | 0.87 | 74.30 |
| RPL36AL | 2909.1 | 975.74 | 1789.1 | 743.33 | 0.70 | 0.86 | 74.70 |
| TOMM7 | 3249.9 | 2034.70 | 1607.3 | 1223.54 | 1.02 | 0.85 | 74.30 |
| PSMC2 | 815.7 | 229.18 | 610.1 | 179.02 | 0.42 | 0.84 | 78.31 |
| COX17 | 1047.2 | 329.17 | 713.6 | 192.86 | 0.55 | 0.83 | 74.30 |
| SNRPB2 | 869.3 | 239.44 | 649.1 | 161.49 | 0.42 | 0.83 | 77.51 |
| RPL6 | 14,487.9 | 4110.82 | 11,020.4 | 3290.07 | 0.39 | 0.82 | 75.90 |
| LOC731365 | 4091.6 | 1437.09 | 2900.3 | 1191.93 | 0.50 | 0.81 | 77.11 |
| ATP5J2 | 1179.6 | 251.01 | 961.4 | 170.49 | 0.30 | 0.80 | 78.71 |
| LOC646483 | 4230.1 | 1677.96 | 2885.7 | 1243.92 | 0.55 | 0.80 | 76.71 |
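The two ranking statistics reported in Table 1 can be illustrated with a short calculation. The FC values in the table are consistent with a base-2 logarithm of the ratio Mean-HC/Mean-AD. The Fisher's ratio below uses a common definition, (μ₁ − μ₂)² / (σ₁² + σ₂²), which is an assumption here and is not stated in this section. The column label "FR (log)" suggests that the published values were computed on log-transformed expression, so the raw-scale result need not match the table. The function names are hypothetical.

```python
import math

def log2_fold_change(mean_ref, mean_case):
    """Base-2 log of the ratio of group means (positive when the case group is lower)."""
    return math.log2(mean_ref / mean_case)

def fisher_ratio(mean_a, std_a, mean_b, std_b):
    """Assumed definition: squared mean difference over the sum of the variances."""
    return (mean_a - mean_b) ** 2 / (std_a ** 2 + std_b ** 2)

# MRPL51 row of Table 1: Mean-HC, Std-HC, Mean-AD, Std-AD.
fc = log2_fold_change(703.3, 472.4)
fr_raw = fisher_ratio(703.3, 183.20, 472.4, 121.55)

print(round(fc, 2))       # 0.57, matching the FC column
print(round(fr_raw, 2))   # raw-scale value; the table reports a log-scale figure
```

A positive FC under this convention corresponds to underexpression in LOAD, which agrees with the table caption.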
Table 2. LOAD vs. HC. Holdout sampler.

| Gene | Mean-HC | Mean-AD | FC | FR | Frequency (%) |
|---|---|---|---|---|---|
| RPL36AL | 7168.47 | 4527.83 | 0.66 | 1.08 | 2.31 |
| MRPL51 | 703.29 | 472.42 | 0.57 | 1.01 | 2.29 |
| CETN2 | 867.00 | 683.01 | 0.34 | 1.05 | 2.27 |
| LOC401206 | 18,073.70 | 12,082.96 | 0.58 | 1.12 | 2.26 |
| RPS27A | 17,344.77 | 13,010.21 | 0.41 | 1.07 | 2.18 |
| LOC646200 | 3563.18 | 1760.36 | 1.02 | 0.53 | 1.99 |
| RPS25 | 18,110.85 | 11,732.22 | 0.63 | 0.98 | 1.92 |
| LOC653658 | 1550.32 | 823.92 | 0.91 | 0.61 | 1.83 |
| RPA3 | 450.84 | 332.38 | 0.44 | 0.76 | 1.80 |
| RPL36AL | 2909.07 | 1789.05 | 0.70 | 0.88 | 1.54 |
| LOC648000 | 2445.09 | 1119.19 | 1.13 | 0.42 | 1.44 |
| LOC731365 | 4091.55 | 2900.26 | 0.50 | 0.65 | 1.44 |
| LOC650276 | 5630.21 | 2744.46 | 1.04 | 0.56 | 1.42 |
| COX17 | 1047.23 | 713.58 | 0.55 | 0.62 | 1.37 |
| CALML4 | 542.04 | 411.37 | 0.40 | 0.69 | 1.37 |
| PSMC2 | 815.75 | 610.09 | 0.42 | 0.68 | 1.33 |
| RPL17 | 3505.48 | 1567.42 | 1.16 | 0.42 | 1.30 |
| MRPL33 | 401.15 | 328.14 | 0.29 | 0.79 | 1.28 |
| ATP5J2 | 1179.56 | 961.40 | 0.30 | 0.72 | 1.25 |
| SNRPB2 | 869.31 | 649.12 | 0.42 | 0.63 | 1.24 |
| ATP5EP2 | 11,802.46 | 7906.31 | 0.58 | 0.69 | 1.21 |
| MRPL33 | 1461.80 | 1115.56 | 0.39 | 0.63 | 1.18 |
| ATP5O | 1523.69 | 952.53 | 0.68 | 0.53 | 1.16 |
| TOMM7 | 3249.90 | 1607.26 | 1.02 | 0.41 | 1.10 |
| LOC646483 | 4230.07 | 2885.69 | 0.55 | 0.59 | 1.08 |
| RPS17 | 5544.05 | 3406.53 | 0.70 | 0.63 | 1.03 |
| METAP2 | 595.99 | 449.62 | 0.41 | 0.56 | 1.03 |
| RPL6 | 14,487.88 | 11,020.43 | 0.39 | 0.74 | 1.02 |
| GNL2 | 388.71 | 320.97 | 0.28 | 0.71 | 1.01 |
| ING3 | 376.85 | 282.53 | 0.42 | 0.56 | 1.00 |
| RPL6 | 6640.20 | 4629.36 | 0.52 | 0.60 | 0.99 |
| PSMC6 | 462.83 | 343.36 | 0.43 | 0.52 | 0.97 |
| FXR1 | 297.54 | 249.31 | 0.26 | 0.66 | 0.93 |
| TINP1 | 1437.50 | 1018.48 | 0.50 | 0.54 | 0.91 |
| NDUFA1 | 3597.83 | 1604.43 | 1.17 | 0.32 | 0.84 |
| LOC648622 | 2989.30 | 1499.56 | 1.00 | 0.48 | 0.79 |
| CCDC90B | 498.24 | 419.10 | 0.25 | 0.69 | 0.76 |
| DNAJA1 | 1628.19 | 993.45 | 0.71 | 0.38 | 0.75 |
| RPL31 | 1107.21 | 589.02 | 0.91 | 0.28 | 0.74 |
| SSBP1 | 922.99 | 754.56 | 0.29 | 0.75 | 0.74 |
| LOC285900 | 1831.82 | 1197.38 | 0.61 | 0.41 | 0.71 |

Most discriminatory genes sampled in different networks for the control vs. LOAD phenotype (sampling frequency higher than 0.7%). The sampling frequency is the number of times that a gene appears, relative to the whole set of sampled genes. In this case, a sampling frequency of 2.31% in a set of 43,202 sampled genes over 1000 holdouts implies that this gene has been sampled 997 times. The sampling frequency could also be defined with respect to the number of holdouts, in which case the first gene would have a sampling frequency of 99.7%. In either case, this figure is a way of ranking the relative importance of each gene. We also provide the mean expression in each group, the fold change, the Fisher’s ratio, and the sampling frequency. All these genes are underexpressed in LOAD.
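The two definitions of sampling frequency described in the note to Table 2 can be checked with a few lines of arithmetic. This is a minimal sketch: the helper `sampling_frequency`, the toy signatures, and the choice to count a gene at most once per holdout are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def sampling_frequency(signatures):
    """Map gene -> (count, % of all sampled genes, % of holdouts).

    `signatures` holds one list of genes per holdout; a gene is counted
    at most once per holdout (an assumption of this sketch).
    """
    counts = Counter(gene for sig in signatures for gene in set(sig))
    total_sampled = sum(counts.values())
    n_holdouts = len(signatures)
    return {
        gene: (c, 100.0 * c / total_sampled, 100.0 * c / n_holdouts)
        for gene, c in counts.items()
    }

# Toy example with four holdouts (gene names reused purely for illustration).
toy = [["MRPL51", "CETN2"], ["MRPL51", "RPA3"], ["MRPL51", "CETN2", "RPA3"], ["CETN2"]]
print(sampling_frequency(toy)["MRPL51"])   # (3, 37.5, 75.0)

# Figures quoted in the note to Table 2: 997 appearances among 43,202 sampled
# genes over 1000 holdouts.
print(f"{100 * 997 / 43202:.2f}%")   # 2.31%
print(f"{100 * 997 / 1000:.1f}%")    # 99.7%
```

Both normalizations give the same ranking of genes, since they differ only by a constant factor.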
Table 3. LOAD vs. HC. Fisher’s sampler. Main genes found by the Fisher’s ratio sampler with a sampling frequency greater than 0.35.

| Gene | Mean-HC | Mean-AD | FC | FR | Frequency |
|---|---|---|---|---|---|
| HSP90AA1 | 2538.09 | 1641.60 | 0.63 | 0.54 | 0.50 |
| PSMC6 | 463.77 | 342.41 | 0.44 | 0.55 | 0.49 |
| LOC646483 | 4228.56 | 2893.52 | 0.55 | 0.53 | 0.48 |
| RPL6 | 6647.43 | 4635.21 | 0.52 | 0.55 | 0.48 |
| TMSB10 | 11,572.19 | 9995.74 | 0.21 | 0.53 | 0.48 |
| ARPC3 | 5021.07 | 3521.16 | 0.51 | 0.54 | 0.46 |
| RPL39 | 5655.35 | 3956.08 | 0.52 | 0.54 | 0.46 |
| RAB37 | 862.35 | 1065.21 | −0.30 | 0.56 | 0.45 |
| SNRPB2 | 867.06 | 650.20 | 0.42 | 0.55 | 0.44 |
| RPS20 | 5026.68 | 3907.36 | 0.36 | 0.57 | 0.44 |
| NDUFA4 | 1343.43 | 847.62 | 0.66 | 0.53 | 0.44 |
| GNL3 | 272.90 | 234.36 | 0.22 | 0.55 | 0.44 |
| TCEAL4 | 430.12 | 363.54 | 0.24 | 0.59 | 0.44 |
| ATP5F1 | 1737.34 | 1318.29 | 0.40 | 0.56 | 0.43 |
| PCM1 | 524.83 | 453.96 | 0.21 | 0.54 | 0.42 |
| PCMT1 | 1268.60 | 1055.06 | 0.27 | 0.57 | 0.41 |
| RARS | 580.36 | 501.85 | 0.21 | 0.58 | 0.41 |
| BOLA3 | 416.09 | 354.09 | 0.23 | 0.56 | 0.41 |
| KIAA0913 | 882.61 | 1035.95 | −0.23 | 0.56 | 0.40 |
| IGBP1 | 501.94 | 404.70 | 0.31 | 0.56 | 0.40 |
| SCFD1 | 638.10 | 539.69 | 0.24 | 0.58 | 0.40 |
| APBB3 | 770.11 | 910.99 | −0.24 | 0.58 | 0.39 |
| TRABD | 2274.77 | 2621.10 | −0.20 | 0.58 | 0.38 |
| TROVE2 | 312.54 | 351.15 | −0.17 | 0.59 | 0.37 |
| NXF1 | 1187.73 | 1373.17 | −0.21 | 0.59 | 0.36 |
| TAX1BP1 | 1297.46 | 961.12 | 0.43 | 0.60 | 0.35 |
| RPS27 | 17,768.68 | 12,737.75 | 0.48 | 0.58 | 0.35 |
| SDCCAG10 | 312.28 | 266.72 | 0.23 | 0.60 | 0.35 |
| C11ORF10 | 3247.37 | 2775.15 | 0.23 | 0.57 | 0.35 |
| ACAT1 | 392.11 | 314.89 | 0.32 | 0.57 | 0.35 |
| LOC729466 | 3578.28 | 2644.66 | 0.44 | 0.59 | 0.35 |
| RPS17 | 5544.94 | 3407.06 | 0.70 | 0.60 | 0.35 |
Table 4. LOAD vs. HC. Random Forest Sampler. Main genes found by the Random Forest sampler with a sampling frequency greater than 0.19.

| Gene | Mean-HC | Mean-AD | FC | FR | Frequency |
|---|---|---|---|---|---|
| LOC401206 | 14.08 | 13.48 | 0.06 | 1.17 | 0.25 |
| MRPL51 | 9.41 | 8.84 | 0.09 | 1.29 | 0.25 |
| CETN2 | 9.74 | 9.40 | 0.05 | 1.19 | 0.24 |
| MRPL33 | 10.47 | 10.09 | 0.05 | 0.76 | 0.24 |
| RPS27A | 14.06 | 13.60 | 0.05 | 0.96 | 0.24 |
| RPL36AL | 11.41 | 10.69 | 0.10 | 0.86 | 0.24 |
| RPL32 | 13.43 | 13.12 | 0.03 | 0.70 | 0.23 |
| SNTB2 | 9.35 | 9.63 | −0.04 | 0.74 | 0.23 |
| RPL36AL | 12.74 | 12.05 | 0.08 | 1.16 | 0.23 |
| LOC388720 | 14.19 | 13.74 | 0.05 | 0.67 | 0.23 |
| RPS25 | 14.08 | 13.43 | 0.07 | 1.04 | 0.22 |
| BOLA2 | 8.08 | 8.32 | −0.04 | 0.71 | 0.21 |
| ATP6V1E1 | 10.73 | 10.31 | 0.06 | 0.64 | 0.21 |
| PIGF | 8.20 | 8.02 | 0.03 | 0.55 | 0.21 |
| SSBP1 | 9.82 | 9.53 | 0.04 | 0.77 | 0.21 |
| RPL6 | 13.76 | 13.37 | 0.04 | 0.82 | 0.21 |
| NXF1 | 10.19 | 10.41 | −0.03 | 0.35 | 0.21 |
| PSMC2 | 9.61 | 9.20 | 0.06 | 0.84 | 0.21 |
| SCFD1 | 9.30 | 9.06 | 0.04 | 0.75 | 0.21 |
| MRPS17 | 8.11 | 7.91 | 0.03 | 0.63 | 0.21 |
| COX17 | 9.96 | 9.43 | 0.08 | 0.83 | 0.21 |
| ARPC3 | 12.20 | 11.73 | 0.06 | 0.63 | 0.20 |
| RPS27 | 13.96 | 13.48 | 0.05 | 0.44 | 0.20 |
| CWF19L2 | 8.12 | 7.91 | 0.04 | 0.56 | 0.20 |
| AK2 | 8.87 | 8.69 | 0.03 | 0.47 | 0.20 |
| NDUFA4 | 10.11 | 9.54 | 0.08 | 0.61 | 0.20 |
| RARS | 9.15 | 8.95 | 0.03 | 0.59 | 0.20 |
| BOLA3 | 8.68 | 8.44 | 0.04 | 0.64 | 0.20 |
| GNL3 | 8.07 | 7.86 | 0.04 | 0.54 | 0.20 |
| CALML4 | 9.05 | 8.66 | 0.06 | 0.87 | 0.20 |
| PCM1 | 9.02 | 8.80 | 0.03 | 0.49 | 0.20 |
| CDC26 | 9.53 | 9.25 | 0.04 | 0.68 | 0.20 |
| DNAJC7 | 8.53 | 8.34 | 0.03 | 0.31 | 0.19 |
| SULT1A3 | 8.37 | 8.63 | −0.04 | 0.68 | 0.19 |
| SDCCAG10 | 8.27 | 8.05 | 0.04 | 0.54 | 0.19 |
| MRFAP1L1 | 9.82 | 9.47 | 0.05 | 0.55 | 0.19 |
Table 5. LOAD vs. Control. Pathways analysis obtained via different genetic samplers.

| Sampler | LOAD vs. Healthy Control |
|---|---|
| Holdout sampler | Viral mRNA translation, Influenza viral RNA transcription and replication, Gene expression, Mitochondrial translation, rRNA processing in the nucleus and cytosol, Metabolism of proteins, Organelle biogenesis, HIV life cycle, Antigen, TCR signaling |
| Fisher’s sampler | Translation, Influenza life cycle, SRP-dependent co-translational protein targeting to membrane, Peptide chain elongation, Infectious disease, Influenza infection, Influenza viral RNA transcription and replication. |
| Random Forest | Selenoamino acid metabolism, Viral mRNA translation, Peptide chain elongation, Selenocysteine synthesis, Eukaryotic translation termination. |
Table 6. Mild Cognitive Impairment (MCI) vs. HC. List of most discriminatory genes with Fisher’s ratio higher than 1.0. In this list only two genes are overexpressed in MCI (RPS4Y1 and DENND1C). Overexpressed genes are shown in bold.

| Gene | Mean-HC | Std-HC | Mean-MCI | Std-MCI | FC | FR | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| TAX1BP1 | 1300.3 | 479.32 | 833.5 | 326.66 | 0.64 | 1.47 | 69.02 |
| **RPS4Y1** | 1388.9 | 1508.67 | 1567.6 | 1416.45 | −0.17 | 1.32 | 69.02 |
| LOC401206 | 18,073.7 | 5002.52 | 12,518.9 | 3979.85 | 0.53 | 1.32 | 71.74 |
| RPL17 | 3505.5 | 2669.20 | 1222.9 | 1204.77 | 1.52 | 1.24 | 67.93 |
| ATP5F1 | 1737.2 | 487.80 | 1187.0 | 358.51 | 0.55 | 1.21 | 69.57 |
| SNX2 | 828.8 | 220.12 | 625.0 | 130.12 | 0.41 | 1.20 | 70.65 |
| SUB1 | 274.8 | 67.03 | 216.7 | 32.54 | 0.34 | 1.18 | 71.20 |
| LOC648622 | 2989.3 | 2024.66 | 1182.5 | 870.25 | 1.34 | 1.14 | 69.57 |
| **DENND1C** | 415.9 | 93.32 | 504.0 | 82.49 | −0.28 | 1.14 | 70.11 |
| LOC648000 | 2445.1 | 1824.27 | 882.1 | 741.59 | 1.47 | 1.14 | 70.65 |
| VBP1 | 442.5 | 127.57 | 329.6 | 78.39 | 0.42 | 1.14 | 69.02 |
| LOC650276 | 5630.2 | 3807.08 | 2312.9 | 2015.90 | 1.28 | 1.12 | 69.57 |
| PSMC2 | 815.7 | 229.18 | 584.5 | 161.75 | 0.48 | 1.12 | 69.02 |
| VBP1 | 597.2 | 200.94 | 422.0 | 128.84 | 0.50 | 1.12 | 69.02 |
| LYPLAL1 | 389.9 | 69.42 | 318.2 | 49.62 | 0.29 | 1.12 | 69.57 |
| HIGD1A | 472.3 | 134.84 | 354.7 | 85.62 | 0.41 | 1.11 | 70.11 |
| ATP6V1G1 | 1137.9 | 494.51 | 653.5 | 253.77 | 0.80 | 1.10 | 69.57 |
| RPL31 | 1107.2 | 801.24 | 440.5 | 334.26 | 1.33 | 1.10 | 68.48 |
| PNRC2 | 873.2 | 273.86 | 621.0 | 211.62 | 0.49 | 1.09 | 69.02 |
| HSP90AA1 | 2542.9 | 1091.30 | 1496.2 | 747.87 | 0.77 | 1.07 | 70.11 |
| RPS3A | 1492.7 | 948.63 | 752.6 | 625.26 | 0.99 | 1.07 | 70.65 |
| NDUFA4 | 1342.2 | 772.90 | 645.2 | 291.73 | 1.06 | 1.07 | 70.65 |
| MRPL3 | 537.4 | 197.46 | 366.9 | 120.14 | 0.55 | 1.07 | 71.20 |
| RPA3 | 450.8 | 111.78 | 323.4 | 65.80 | 0.48 | 1.06 | 70.11 |
| MRPL33 | 401.1 | 80.33 | 315.2 | 53.04 | 0.35 | 1.06 | 68.48 |
| LOC647340 | 1228.3 | 530.94 | 754.3 | 349.22 | 0.70 | 1.06 | 68.48 |
| VAMP7 | 544.7 | 134.08 | 420.3 | 89.05 | 0.37 | 1.05 | 67.93 |
| PSMC6 | 462.8 | 179.02 | 313.5 | 116.15 | 0.56 | 1.04 | 67.93 |
| ANAPC13 | 1056.1 | 248.28 | 773.8 | 178.28 | 0.45 | 1.03 | 67.93 |
| RPS25 | 18,110.8 | 5359.11 | 12,048.6 | 3951.72 | 0.59 | 1.03 | 67.39 |
| VPS29 | 987.2 | 340.14 | 668.5 | 209.15 | 0.56 | 1.03 | 67.93 |
| ACAT1 | 392.9 | 104.10 | 299.3 | 65.65 | 0.39 | 1.03 | 67.93 |
| RPS3A | 1050.6 | 618.94 | 581.0 | 402.60 | 0.85 | 1.02 | 67.93 |
| CRBN | 468.9 | 127.79 | 352.1 | 92.43 | 0.41 | 1.02 | 69.57 |
| HINT1 | 1849.4 | 1022.09 | 1046.0 | 741.48 | 0.82 | 1.00 | 71.74 |
| RPS27 | 1562.3 | 1128.84 | 651.8 | 471.56 | 1.26 | 1.00 | 71.20 |
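Table 7. MCI vs. HC. Holdout sampler.

| Gene | Mean-HC | Mean-MCI | FC | FR | Frequency (%) |
|---|---|---|---|---|---|
| **DENND1C** | 415.91 | 503.99 | −0.28 | 1.02 | 0.41 |
| LOC648622 | 2989.30 | 1182.47 | 1.34 | 0.68 | 0.41 |
| LOC401206 | 18,073.70 | 12,518.85 | 0.53 | 1.22 | 0.41 |
| **SULT1A3** | 337.68 | 413.27 | −0.29 | 0.93 | 0.41 |
| ATP5F1 | 1737.25 | 1186.98 | 0.55 | 1.04 | 0.41 |
| SNX2 | 828.84 | 624.98 | 0.41 | 1.05 | 0.40 |
| HSP90AA1 | 2542.86 | 1496.21 | 0.77 | 0.79 | 0.40 |
| LOC650276 | 5630.21 | 2312.93 | 1.28 | 0.65 | 0.40 |
| LYPLAL1 | 389.93 | 318.16 | 0.29 | 0.94 | 0.40 |
| TAX1BP1 | 1300.26 | 833.54 | 0.64 | 1.29 | 0.40 |
| NDUFA4 | 1342.20 | 645.23 | 1.06 | 0.73 | 0.40 |
| PSMC2 | 815.75 | 584.52 | 0.48 | 0.88 | 0.40 |
| ANAPC13 | 1056.07 | 773.77 | 0.45 | 0.96 | 0.40 |
| RPS3A | 1492.70 | 752.64 | 0.99 | 0.62 | 0.39 |
| TINP1 | 1437.50 | 951.69 | 0.59 | 0.73 | 0.39 |
| VAMP7 | 544.66 | 420.27 | 0.37 | 1.04 | 0.39 |
| RPS3A | 1050.62 | 581.04 | 0.85 | 0.61 | 0.39 |
| RPS27 | 1562.31 | 651.80 | 1.26 | 0.45 | 0.39 |
| HIGD1A | 472.27 | 354.74 | 0.41 | 0.93 | 0.38 |
| MITD1 | 429.65 | 343.49 | 0.32 | 0.90 | 0.38 |
| MRPL3 | 537.40 | 366.90 | 0.55 | 0.77 | 0.38 |
| RPA3 | 450.84 | 323.39 | 0.48 | 0.83 | 0.38 |
| IGBP1 | 503.89 | 384.59 | 0.39 | 0.85 | 0.38 |
| LOC648000 | 2445.09 | 882.14 | 1.47 | 0.53 | 0.38 |
| SLC35A1 | 559.43 | 431.56 | 0.37 | 0.84 | 0.38 |
| PSMC6 | 462.83 | 313.50 | 0.56 | 0.74 | 0.38 |
| RARS | 579.64 | 476.57 | 0.28 | 0.88 | 0.38 |
| UBE2E1 | 932.67 | 641.56 | 0.54 | 0.73 | 0.38 |
| TMEM126B | 448.51 | 336.73 | 0.41 | 0.76 | 0.38 |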
Most discriminatory genes sampled in different networks for the control vs. MCI phenotype (sampling frequency higher than 0.38%). We also provide the mean of the expression in each group, the fold change, the Fisher’s ratio, and the sampling frequency. Overexpressed genes are shown in bold.
Table 8. MCI vs. LOAD. List of most discriminatory genes with Fisher’s ratio higher than 0.25.

| Gene | Mean-MCI | Std-MCI | Mean-AD | Std-AD | FC | FR | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| RPS4Y1 | 1567.6 | 1416.45 | 1065.8 | 1351.05 | 0.56 | 1.45 | 62.22 |
| HLA-DRB1 | 923.5 | 830.82 | 699.0 | 686.50 | 0.40 | 1.08 | 63.11 |
| JARID1D | 281.4 | 112.52 | 243.9 | 109.28 | 0.21 | 0.83 | 62.22 |
| HS.546019 | 229.1 | 63.27 | 258.8 | 74.76 | −0.18 | 0.56 | 62.22 |
| FYB | 2356.8 | 737.83 | 2861.7 | 823.29 | −0.28 | 0.50 | 64.00 |
| XIST | 265.0 | 87.63 | 324.2 | 140.46 | −0.29 | 0.44 | 61.78 |
| ZFX | 215.7 | 19.00 | 228.8 | 20.44 | −0.09 | 0.37 | 64.44 |
| CCDC32 | 226.1 | 17.09 | 236.4 | 20.15 | −0.06 | 0.36 | 62.67 |
| SP3 | 345.7 | 102.33 | 417.9 | 141.78 | −0.27 | 0.36 | 64.89 |
| CREB5 | 977.0 | 380.80 | 1243.8 | 457.64 | −0.35 | 0.34 | 67.11 |
| STK17B | 357.1 | 71.26 | 402.5 | 97.13 | −0.17 | 0.33 | 65.78 |
| LOC653855 | 182.4 | 7.36 | 178.7 | 7.71 | 0.03 | 0.32 | 67.56 |
| ADAM10 | 235.2 | 28.78 | 253.8 | 35.35 | −0.11 | 0.32 | 66.67 |
| ACSL4 | 370.0 | 107.89 | 451.5 | 158.92 | −0.29 | 0.31 | 64.89 |
| RNF13 | 476.8 | 113.40 | 558.3 | 151.25 | −0.23 | 0.30 | 64.00 |
| FAM96A | 612.2 | 160.47 | 703.0 | 209.88 | −0.20 | 0.29 | 64.89 |
| GNG10 | 501.8 | 157.68 | 643.7 | 277.03 | −0.36 | 0.29 | 64.44 |
| FBXO11 | 872.8 | 212.75 | 990.0 | 224.71 | −0.18 | 0.29 | 64.89 |
| HS.538581 | 182.3 | 8.11 | 178.1 | 7.04 | 0.03 | 0.29 | 65.78 |
| WDR5 | 325.9 | 26.74 | 312.1 | 29.54 | 0.06 | 0.28 | 66.22 |
| HS.537004 | 331.4 | 62.88 | 371.1 | 81.60 | −0.16 | 0.27 | 66.67 |
| CBFB | 389.1 | 83.57 | 433.8 | 89.55 | −0.16 | 0.27 | 66.22 |
| CENTB2 | 390.9 | 79.87 | 437.6 | 93.86 | −0.16 | 0.27 | 65.33 |
| EML3 | 1472.5 | 261.47 | 1333.6 | 277.31 | 0.14 | 0.27 | 64.89 |
| HS.460758 | 169.5 | 6.21 | 174.0 | 7.99 | −0.04 | 0.27 | 65.78 |
| HS.566890 | 207.3 | 12.17 | 202.4 | 12.36 | 0.03 | 0.26 | 67.11 |
| ACD | 395.8 | 48.72 | 369.8 | 51.79 | 0.10 | 0.26 | 65.33 |
| DDX3X | 1225.7 | 390.46 | 1497.1 | 501.69 | −0.29 | 0.26 | 66.22 |
| BAP1 | 333.0 | 39.25 | 317.3 | 39.28 | 0.07 | 0.26 | 65.78 |
| AZIN1 | 601.1 | 121.81 | 679.0 | 160.32 | −0.18 | 0.26 | 64.89 |
| HS.130036 | 284.1 | 55.50 | 318.8 | 67.07 | −0.17 | 0.26 | 65.33 |
| DDX46 | 284.7 | 34.04 | 300.7 | 35.77 | −0.08 | 0.26 | 66.67 |
| RCBTB2 | 204.5 | 16.37 | 211.9 | 16.37 | −0.05 | 0.26 | 66.22 |
| AMD1 | 544.7 | 142.10 | 613.1 | 166.94 | −0.17 | 0.26 | 65.78 |
Table 9. MCI vs. LOAD. Most discriminatory genes sampled in different networks for the LOAD vs. MCI phenotype. Genes with sampling frequency higher than 0.5.

| Gene | Mean-LOAD | Mean-MCI | FC | FR | Frequency |
|---|---|---|---|---|---|
| HLA-DRB1 | 699.02 | 923.49 | −0.4 | 0.39 | 2.04 |
| FYB | 2861.74 | 2356.84 | 0.28 | 0.45 | 1.91 |
| HS.546019 | 258.81 | 229.12 | 0.18 | 0.44 | 1.87 |
| CCDC32 | 236.43 | 226.06 | 0.06 | 0.35 | 1.81 |
| SP3 | 417.86 | 345.68 | 0.27 | 0.26 | 1.7 |
| RPS4Y1 | 1065.75 | 1567.62 | −0.56 | 0.72 | 1.68 |
| ZFX | 228.76 | 215.66 | 0.09 | 0.36 | 1.62 |
| XIST | 324.24 | 265 | 0.29 | 0.27 | 1.46 |
| CREB5 | 1243.82 | 976.99 | 0.35 | 0.27 | 1.41 |
| ACSL4 | 451.53 | 370.02 | 0.29 | 0.22 | 1.33 |
| LOC653855 | 178.65 | 182.43 | −0.03 | 0.31 | 1.25 |
| ADAM10 | 253.83 | 235.2 | 0.11 | 0.3 | 1.19 |
| JARID1D | 243.87 | 281.42 | −0.21 | 0.59 | 1.14 |
| RNF13 | 558.32 | 476.84 | 0.23 | 0.26 | 1.14 |
| CBFB | 433.83 | 389.13 | 0.16 | 0.26 | 1.06 |
| STK17B | 402.48 | 357.12 | 0.17 | 0.29 | 1.04 |
| FBXO11 | 990.02 | 872.76 | 0.18 | 0.27 | 1.02 |
| PRKY | 231.85 | 260.04 | −0.17 | 0.16 | 0.94 |
| HS.130036 | 318.83 | 284.09 | 0.17 | 0.22 | 0.94 |
| HS.537004 | 371.06 | 331.41 | 0.16 | 0.25 | 0.83 |
| ACD | 369.76 | 395.81 | −0.1 | 0.27 | 0.73 |
| GNG10 | 643.73 | 501.82 | 0.36 | 0.17 | 0.71 |
| DDX3X | 1497.14 | 1225.72 | 0.29 | 0.24 | 0.71 |
| FAM96A | 703.02 | 612.23 | 0.2 | 0.2 | 0.69 |
| CENTB2 | 437.57 | 390.9 | 0.16 | 0.27 | 0.69 |
| HS.538581 | 178.06 | 182.28 | −0.03 | 0.28 | 0.69 |
| WDR5 | 312.12 | 325.87 | −0.06 | 0.28 | 0.64 |
| SNORA25 | 340.48 | 319.2 | 0.09 | 0.2 | 0.62 |
| BAP1 | 317.32 | 333 | −0.07 | 0.25 | 0.62 |
| HS.460758 | 174.03 | 169.52 | 0.04 | 0.27 | 0.58 |
| B2M | 11,415.82 | 9954.21 | 0.2 | 0.19 | 0.56 |
Table 10. MCI + LOAD vs. HC.

| Gene | Mean HC | Std HC | Mean MCI-AD | Std MCI-AD | FC | FR | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| LOC401206 | 18,073.7 | 5002.52 | 12,237.9 | 4171.3 | 0.56 | 1.22 | 73.86 |
| MRPL51 | 703.3 | 183.2 | 480.2 | 121.31 | 0.55 | 1.19 | 76.9 |
| TAX1BP1 | 1300.3 | 479.32 | 915.3 | 368.37 | 0.51 | 1.04 | 76.29 |
| RPS25 | 18,110.8 | 5359.11 | 11,844.7 | 4267.77 | 0.61 | 1.04 | 76.6 |
| LOC650276 | 5630.2 | 3807.08 | 2591 | 2194.12 | 1.12 | 1.02 | 75.68 |
| RPL36AL | 7168.5 | 2178.39 | 4716.9 | 1728.29 | 0.6 | 1.02 | 74.47 |
| RPA3 | 450.8 | 111.78 | 329.2 | 75.53 | 0.45 | 1.01 | 75.99 |
| LOC646200 | 3563.2 | 1854.66 | 1764.9 | 1008.41 | 1.01 | 1.01 | 75.08 |
| LOC648000 | 2445.1 | 1824.27 | 1034.9 | 1002.56 | 1.24 | 1.00 | 75.08 |
| RPL17 | 3505.5 | 2669.2 | 1444.9 | 1483.19 | 1.28 | 1.00 | 75.68 |
| MRPL33 | 401.1 | 80.33 | 323.5 | 55.23 | 0.31 | 0.91 | 75.38 |
| LOC653658 | 1550.3 | 883.6 | 801.8 | 531.34 | 0.95 | 0.91 | 75.99 |
| CETN2 | 867 | 153.21 | 700.3 | 121.75 | 0.31 | 0.91 | 76.29 |
| CALML4 | 542 | 125.05 | 410 | 83.48 | 0.4 | 0.89 | 76.6 |
| PSMC2 | 815.7 | 229.18 | 601 | 173.15 | 0.44 | 0.89 | 76.9 |
| RPS27A | 17,344.8 | 3217.71 | 13,249.4 | 3883.87 | 0.39 | 0.88 | 74.77 |
| TOMM7 | 3249.9 | 2034.7 | 1547.5 | 1174.84 | 1.07 | 0.87 | 74.77 |
| RPS17 | 5544.1 | 2933.07 | 3241.9 | 2085.19 | 0.77 | 0.87 | 75.99 |
| SNX2 | 828.8 | 220.12 | 651.9 | 168.48 | 0.35 | 0.87 | 75.68 |
| PSMC6 | 462.8 | 179.02 | 332.7 | 132.18 | 0.48 | 0.87 | 76.6 |
| ATP5F1 | 1737.2 | 487.8 | 1272.6 | 411.82 | 0.45 | 0.87 | 76.9 |
| TINP1 | 1437.5 | 490.98 | 994.7 | 405.24 | 0.53 | 0.86 | 75.99 |
| SNRPB2 | 869.3 | 239.44 | 641.7 | 151.97 | 0.44 | 0.85 | 77.2 |
| LYPLAL1 | 389.9 | 69.42 | 330.2 | 57.48 | 0.24 | 0.85 | 75.38 |
| RPL31 | 1107.2 | 801.24 | 536.2 | 491.17 | 1.05 | 0.84 | 75.38 |
| RPL36AL | 2909.1 | 975.74 | 1803.5 | 691.87 | 0.69 | 0.83 | 75.08 |
| RPL6 | 14,487.9 | 4110.82 | 10,956.5 | 3335.45 | 0.4 | 0.83 | 75.68 |
| LOC648622 | 2989.3 | 2024.66 | 1386.8 | 1104.76 | 1.11 | 0.83 | 75.38 |
| ATP5O | 1523.7 | 602.04 | 939 | 420.63 | 0.7 | 0.83 | 75.38 |
| **SULT1A3** | 337.7 | 65.75 | 406.1 | 64.14 | −0.27 | 0.82 | 75.99 |
| GNL2 | 388.7 | 72.09 | 319.2 | 46.16 | 0.28 | 0.82 | 75.68 |
| LOC646483 | 4230.1 | 1677.96 | 2869.1 | 1207.02 | 0.56 | 0.82 | 75.99 |
| ACAT1 | 392.9 | 104.1 | 309.3 | 70.1 | 0.35 | 0.82 | 75.68 |
| MRPL33 | 1461.8 | 356.33 | 1113.6 | 260.92 | 0.39 | 0.82 | 75.68 |
| LOC388532 | 902.7 | 532.22 | 499.6 | 301.65 | 0.85 | 0.81 | 75.68 |
| COX17 | 1047.2 | 329.17 | 719.8 | 183.95 | 0.54 | 0.81 | 75.99 |
| SSBP1 | 923 | 180.74 | 750.8 | 148.85 | 0.3 | 0.81 | 76.29 |

List of most discriminatory genes with Fisher’s ratio higher than 0.8. Only one gene (SULT1A3, in boldface) is overexpressed in MCI + LOAD. This list contains several discriminatory genes found in each individual comparison (LOAD vs. HC and MCI vs. HC).
Table 11. MCI + LOAD vs. HC. Most discriminatory genes sampled in different networks for the MCI + LOAD vs. HC phenotype. Genes with sampling frequency higher than 0.36.

| Gene | Mean HC | Mean MCI-AD | FC | FR | Freq. |
|---|---|---|---|---|---|
| LOC653658 | 1550.32 | 761.68 | 1.03 | 0.68 | 0.37 |
| LOC648622 | 2989.3 | 1182.47 | 1.34 | 0.68 | 0.37 |
| PPP2R3C | 726.33 | 561.48 | 0.37 | 0.8 | 0.37 |
| LOC648000 | 2445.09 | 882.14 | 1.47 | 0.53 | 0.37 |
| SULT1A3 | 337.68 | 413.27 | −0.29 | 0.93 | 0.37 |
| MITD1 | 429.65 | 343.49 | 0.32 | 0.9 | 0.37 |
| LOC650276 | 5630.21 | 2312.93 | 1.28 | 0.65 | 0.37 |
| VAMP7 | 544.66 | 420.27 | 0.37 | 1.04 | 0.37 |
| SNX2 | 828.84 | 624.98 | 0.41 | 1.05 | 0.37 |
| MRPL3 | 537.4 | 366.9 | 0.55 | 0.77 | 0.37 |
| ATP5F1 | 1737.25 | 1186.98 | 0.55 | 1.04 | 0.37 |
| EIF2A | 414.86 | 328.84 | 0.34 | 0.82 | 0.37 |
| NDUFA4 | 1342.2 | 645.23 | 1.06 | 0.73 | 0.37 |
| DENND1C | 415.91 | 503.99 | −0.28 | 1.02 | 0.37 |
| PSMC2 | 815.75 | 584.52 | 0.48 | 0.88 | 0.37 |
| LOC401206 | 18,073.7 | 12,518.85 | 0.53 | 1.22 | 0.37 |
| TAX1BP1 | 1300.26 | 833.54 | 0.64 | 1.29 | 0.37 |
| ANAPC13 | 1056.07 | 773.77 | 0.45 | 0.96 | 0.37 |
| SSBP1 | 922.99 | 743.88 | 0.31 | 0.92 | 0.37 |
| RPS3A | 1492.7 | 752.64 | 0.99 | 0.62 | 0.37 |
| LYPLAL1 | 389.93 | 318.16 | 0.29 | 0.94 | 0.37 |
| PNRC2 | 873.24 | 620.99 | 0.49 | 0.93 | 0.37 |
| CRBN | 468.9 | 352.14 | 0.41 | 0.86 | 0.37 |
| HSP90AA1 | 2542.86 | 1496.21 | 0.77 | 0.79 | 0.37 |
| PSMC6 | 462.83 | 313.5 | 0.56 | 0.74 | 0.36 |
| RPL17 | 3505.48 | 1222.94 | 1.52 | 0.56 | 0.36 |
| HIGD1A | 472.27 | 354.74 | 0.41 | 0.93 | 0.36 |
| DYNLT3 | 260.74 | 216.35 | 0.27 | 0.74 | 0.36 |
| RPS27 | 1562.31 | 651.8 | 1.26 | 0.45 | 0.36 |
| GPBP1 | 610.75 | 470.14 | 0.38 | 0.79 | 0.36 |
| IGBP1 | 503.89 | 384.59 | 0.39 | 0.85 | 0.36 |
| MRPL33 | 401.15 | 315.19 | 0.35 | 0.96 | 0.36 |
| INPPL1 | 780.71 | 958.41 | −0.3 | 0.86 | 0.36 |
| SLC35A1 | 559.43 | 431.56 | 0.37 | 0.84 | 0.36 |
| UBE2E1 | 932.67 | 641.56 | 0.54 | 0.73 | 0.36 |
Table 12. Main results obtained for all the comparisons via the holdout sampler.

| Item | LOAD vs. HC | MCI vs. HC | LOAD vs. MCI |
|---|---|---|---|
| Most Predictive Genetic Signature | MRPL51, CETN2, LOC401206, RPA3, PSMC2, ATP5J2, LOC648622, SNTB2, LSM, EIF3E, DNAJA1, RPAP3, RPS17, ERCC5, LOC401397, RPS3A, SNRPD2, CCDC34, LOC440567, ATP5H, ANXA1. | TAX1BP1, LOC401206, RPL17, ATP5F1, LOC648622, VBP1, LOC650276, VBP1, ATP6V1G1, VAMP7, RPS25, RPS3A, LOC646483. | RPS4Y1, FAM96A, HS.460758, CLEC2A, SNORA25, SMCHD1, GIMAP2, MAP2K2 |
| Predictive Accuracy | 84% | 81.5% | 74% |
| Pathways (High score matches) | Viral mRNA Translation, Influenza Viral RNA Transcription and Replication, Gene Expression, Mitochondrial translation, rRNA Processing in the nucleus and cytosol, Metabolism, Metabolism of proteins, Organelle Biogenesis, HIV Life Cycle, Antigen, TCR signaling. | Viral mRNA Translation, Gene Expression, Influenza Viral RNA Transcription and Replication, Metabolism of proteins, rRNA Processing in the Nucleus and cytosol, Ubiquitin-Proteasome Proteolysis, Antigen Processing, Cell cycle checkpoints, Metabolism, HIV Life Cycle, Mitotic Metaphase and Anaphase, CLEC7A (Dectin-1 signaling), Cellular Senescence, TCR signaling,... | Regulation of Activated PAK-2p34 By Proteasome Mediated Degradation, G-protein Signaling Regulation of P38 and JNK Signaling Mediated By G-proteins |
| Biological Processes (High score matches) | Translation, Nuclear-transcribed mRNA Catabolic Process, SRP-dependent Cotranslational Protein Targeting to Membrane, Proton Transport, Translation initiation, Viral Transcription, ATP synthesis, Mitochondrial Translation Termination and Elongation, ATP Biosynthetic Process, rRNA Processing,... | Translation, Nuclear-transcribed mRNA Catabolic Process, Translation initiation, termination and elongation, SRP-dependent Cotranslational Protein Targeting to Membrane, Viral Transcription, NIK/NF-KappaB Signaling, Regulation of mRNA Stability, | Positive Regulation of G1/S Transition of Mitotic Cell Cycle, Regulation of mRNA Stability. |
| Molecular Functions (High score matches) | Structural Constituent of Ribosome, Poly(A) RNA Binding, ATPase Activity, RNA binding, Protein Binding, ATP Synthase Activity. | Poly(A) RNA Binding, Protein Binding, RNA binding, Structural Constituent of Ribosome, ATP Activity. | Protein Binding, Translation Initiation Factor Binding. |
Table 13. Main drugs identified by Robust Sampling of genetic pathways via Dr. Insight.

| Drug | Cell Line | Dose | Pathways (Increased Expression) | Pathways (Decreased Expression) |
|---|---|---|---|---|
| Cephaeline | MCF7 | 1 × 10⁻⁷ | TRAIL signaling/Regulation of necroptotic cell death/Regulated Necrosis/RIP-regulated Necrosis/Interleukin-19,20,22 | Cellular Senescence/TP53 regulates transcription of genes involved in G1 cell cycle arrest/Cellular responses to stress |
| Tanespimycin | MCF7 | 1 × 10⁻⁶ | Cellular response to heat stress/Regulation of HSF1 heat shock response/Unfolded protein response. | Apoptotic Execution/Processing of intronless pre-mRNAs/Signaling by TGF-beta receptor complex. |
| Wortmannin | MCF7 | 1 × 10⁻⁸ | Toxicity of botulinum toxin type D/IRS activation/Estrogen biosynthesis/Collagen biosynthesis/Growth hormone receptor signaling | RNA modification in the nucleus/Regulation of TP53 activity through methylation/Fatty acyl-CoA biosynthesis/Galactose catabolism |
| Biperiden | MCF7 | 1.15 × 10⁻⁵ | Nuclear Pore Complex disassembly/Intraflagellar Transport/HIV life cycle | Secretin family receptors/Hemostasis/Adaptive immune system |
| Trichostatin A | PC3 | 1 × 10⁻⁷/1 × 10⁻⁶ | Glutamate neurotransmitter release/Antigen activates B-cell receptor/CRMPs in Sema3A signaling | SMAD2/SMAD3:SMAD4 regulates transcription/G1 phase/Cyclin D associated events in G1/Transcriptional activation of mitochondrial biogenesis |
| LY-294002 | MCF7 | 1 × 10⁻⁷/1 × 10⁻⁵ | RNA polymerase I promoter opening/DNA methylation/Adaptive immune system/GPCR downstream signaling | Nucleotide-like (purinergic) receptors/P2Y receptors/Adaptive immune system |

