A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology
Abstract
1. Introduction
2. Overview of DNA Sequencing Methods
2.1. Whole Genome Sequencing
2.2. Whole-Exome Sequencing (WES) and RNA Sequencing (RNA-seq)
3. Overview of RNA-Seq Bioinformatics
3.1. Data Quality Control and Pre-Processing
3.2. Read Alignment/Mapping
3.3. Quantification
3.4. Alternative Splicing Analysis
3.5. Visualization
3.6. Data Integration
4. Open-Source Tools
5. Data Repositories
6. Microarray/RNAseq Data Repositories
Integration of Multiple Repositories for Cancer Research
7. Data Manipulation and Structuring
Structuring
8. Data Analysis
Pathway and Network Analysis Tools
9. Using Predictive Algorithms
9.1. Data Labeling and Supervised Learning
9.2. Data Labeling Beyond Cancer Research
9.3. Data for Predictive Models
9.4. Training ML Models
9.5. Machine-Learning Tools and Languages
9.6. Validation and Reproducibility of ML Models
10. Applications of Bioinformatics Tools in Cancer Research
11. Ethics in Bioinformatics
11.1. Data Privacy and Security
11.2. Informed Consent
11.3. Data Sharing
11.4. Potential Misuse of Genetic Information
11.5. AI and Machine Learning
12. Advancements, Challenges, and Future Directions
12.1. Key Findings and Applications
12.2. Data Challenges
12.3. Challenges in Biomarker Identification, Validation, Clinical Implications, and Ethics
12.4. Health Equity in Multi-Modal Cancer Research Challenges
12.5. Future Directions and Emerging Technologies
12.5.1. Single-Cell and Spatial Omics
12.5.2. Long-Read Sequencing Technologies
12.5.3. Machine Learning Approaches
12.5.4. Clinical Integration
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Network Analysis Tool | Description |
---|---|
ARACNE | A tool for inferring gene regulatory networks from gene expression data. It employs mutual information-based methods to identify direct regulatory relationships. |
WGCNA (Weighted Gene Co-Expression Network Analysis) | Used to identify co-expression modules within large gene expression datasets. It helps uncover gene networks related to specific biological processes or conditions. |
GeneMANIA | A tool that integrates various data sources to predict and visualize gene function and interactions. It helps users understand the functional relationships between genes in the context of specific biological processes. |
VisANT | Network visualization and analysis tool that enables the exploration of biological pathways, gene interactions, and molecular networks. |
BioGRID | Offers tools for network analysis. It helps users explore physical and genetic interactions within a network context. |
NetworkAnalyst | Integrated platform for network-based analysis that supports various types of omics data. It provides tools for network visualization, enrichment analysis, and pathway analysis. |
RegulatoryNetworks | Focuses on the reconstruction and analysis of gene regulatory networks. It utilizes transcription factor binding site data to infer regulatory interactions. |
GRNsight | Web-based tool for visualizing and analyzing gene regulatory networks. It helps users explore transcriptional interactions and regulatory relationships. |
Cytoscape.js | JavaScript library for network visualization that can be integrated into web applications to display and analyze gene networks interactively. |
PathVisio | Offers plugins for network analysis. It allows users to draw, edit, and analyze biological pathways and networks. |
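
The co-expression idea behind the WGCNA entry in the table above can be illustrated with a short sketch. This is not the WGCNA R package and not the authors' method; it is a minimal pure-Python illustration, assuming an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power. The gene names, the expression values, the function names, and the choice of `beta=6` are all hypothetical and chosen only for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair.

    expr maps a gene name to its expression values across samples.
    beta is an assumed soft-thresholding power; in practice it is tuned.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = a
            adj[h][g] = a  # the network is undirected
    return adj


def connectivity(adj):
    """Sum of adjacencies per gene (a simple measure of how 'hub-like' it is)."""
    return {g: sum(neighbors.values()) for g, neighbors in adj.items()}


# Hypothetical toy data: three genes measured in five samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks GENE_A closely
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to GENE_A
}

adj = soft_threshold_adjacency(expr, beta=6)
k = connectivity(adj)

for gene in expr:
    print(gene, round(k[gene], 4))
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why the tightly correlated pair dominates the connectivity of this toy network. Module detection, which the table describes as the purpose of WGCNA, would require a clustering step on top of this adjacency and is not shown here.
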
Category | Tool | URL | Description |
---|---|---|---|
Repositories | TCGA | https://www.cancer.gov/ccg/research/genome-sequencing/tcga (Accessed on 31 October 2023) | Multi-omics data from over 20,000 primary cancer and matched normal samples spanning 33 cancer types |
Repositories | ICGC | https://dcc.icgc.org/ (Accessed on 31 October 2023) | 55 cancer genomics projects with tools to analyze and visualize data |
Repositories | GDC | https://gdc.cancer.gov/ (Accessed on 31 October 2023) | Developed by the NIH/NCI; includes TCGA and TARGET data |
Repositories | Gene Expression Omnibus (GEO) | https://www.ncbi.nlm.nih.gov/geo/ (Accessed on 31 October 2023) | Stores processed data files, including RNA-Seq and ChIP-Seq |
Repositories | ArrayExpress | https://www.ebi.ac.uk/biostudies/arrayexpress (Accessed on 31 October 2023) | Stores high-throughput genomics data |
Repositories | European Nucleotide Archive (ENA) | https://www.ebi.ac.uk/ena/browser/home (Accessed on 31 October 2023) | Stores raw data files in FASTQ format |
Repositories | Sequence Read Archive (SRA) | https://www.ncbi.nlm.nih.gov/sra (Accessed on 31 October 2023) | Stores raw data files in SRA format |
Repositories | Dryad Digital Repository | https://datadryad.org/stash (Accessed on 31 October 2023) | Open-access repository of medical research data |
Repositories | Figshare | https://figshare.com/ (Accessed on 31 October 2023) | Cross-disciplinary open-access repository for academic research |
Repositories | Harvard Dataverse Network | https://dataverse.harvard.edu/ (Accessed on 31 October 2023) | Multi-disciplinary data storage center |
Repositories | Kaggle | https://www.kaggle.com/ (Accessed on 31 October 2023) | Platform for data science training, competitions, and datasets |
Repositories | Network Data Exchange | https://home.ndexbio.org/about-ndex/ (Accessed on 31 October 2023) | Repository for network biology data |
Repositories | Open Science Framework | https://osf.io/ (Accessed on 31 October 2023) | Platform for collaborating on research projects |
Repositories | GenoVault | https://github.com/bioinformatics-cdac/GenoVault (Accessed on 31 October 2023) | Cloud-based repository for NGS data |
Repositories | UK Biobank | https://www.ukbiobank.ac.uk/ (Accessed on 31 October 2023) | Large-scale biomedical research database |
Tools for data analysis and visualization | cBioPortal | https://www.cBioPortal.org/ (Accessed on 31 October 2023) | Visualization and analysis of data from cancer genomics projects |
Tools for data analysis and visualization | COSMIC | https://cancer.sanger.ac.uk/cosmic (Accessed on 31 October 2023) | Database of somatic mutations in cancer |
Tools for data analysis and visualization | IGV | https://software.broadinstitute.org/software/igv/ (Accessed on 31 October 2023) | High-performance genome browser for visualizing and analyzing large-scale genomic data |
Tools for data analysis and visualization | Regulome Explorer | https://explorer-cancerregulome.systemsbiology.net/ (Accessed on 31 July 2024) | Exploring and analyzing regulatory elements in the genome |
Tools for data analysis and visualization | UCSC Genome Browser | https://genome.ucsc.edu/ (Accessed on 31 October 2023) | Provides access to a vast collection of genomic data and annotations |
Tools for data analysis and visualization | Bioconductor | https://www.bioconductor.org/ (Accessed on 31 October 2023) | Open-source software project for the analysis and comprehension of high-throughput genomics data |
Tools for data analysis and visualization | Cytoscape | https://cytoscape.org/ (Accessed on 31 October 2023) | Network analysis and visualization tool |
Tools for data analysis and visualization | Gene Ontology | http://geneontology.org/ (Accessed on 31 October 2023) | Standardized system for annotating genes and their functions in different organisms |
Tools for data analysis and visualization | UALCAN | https://ualcan.path.uab.edu/ (Accessed on 31 October 2023) | Web portal for in-depth analysis of cancer transcriptome data |
Tools for data analysis and visualization | DAVID | https://david.ncifcrf.gov/ (Accessed on 31 October 2023) | Functional annotation and enrichment analysis of gene lists |
Tools for data analysis and visualization | HumanBase (GIANT) | https://hb.flatironinstitute.org/ (Accessed on 31 October 2023) | Exploring human genomic data and conducting large-scale integrative analysis |
Tools for data analysis and visualization | CEDER | https://ieeexplore.ieee.org/document/6205734 (Accessed on 31 October 2023) | Detection of differentially expressed genes |
Tools for data analysis and visualization | CPTRA | https://pubmed.ncbi.nlm.nih.gov/19811681/ (Accessed on 31 October 2023) | Package for analyzing transcriptome sequencing data |
Tools for data analysis and visualization | Bioconductor | https://www.bioconductor.org/ (Accessed on 31 October 2023) | Open-source software for genomic data analysis |
Tools for analyzing biomarker signatures from omics data | limma | https://bioconductor.org/packages/release/bioc/html/limma.html (Accessed on 31 October 2023) | Statistical package for the analysis of microarray and RNA-seq data |
Tools for analyzing biomarker signatures from omics data | caret | https://cran.r-project.org/web/packages/caret/index.html (Accessed on 31 October 2023) | R package for training and evaluating ML models |
Tools for analyzing biomarker signatures from omics data | netClass | https://doi.org/10.1093/bioinformatics/btu025 (Accessed on 31 October 2023) | A tool for classifying biological samples using network-based features |
Tools for analyzing biomarker signatures from omics data | WGCNA | https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/ | Identifying gene modules and their relationships in high-throughput data |
Somatic variants interpretation | MyCancerGenome | https://www.mycancergenome.org/ (Accessed on 31 October 2023) | Understanding cancer genomics and personalized cancer treatment options |
Somatic variants interpretation | CIViC | https://civicdb.org/welcome (Accessed on 31 October 2023) | Treatment options for cancer patients based on their unique tumor DNA |
Somatic variants interpretation | TARGET | https://www.cancer.gov/ccg/research/genome-sequencing/target (Accessed on 31 October 2023) | Molecular characterization of childhood cancers |
Somatic variants interpretation | CGI | https://www.genomicinterpretation.org/ (Accessed on 31 October 2023) | Genomic alterations in cancer and their potential clinical relevance |
Somatic variants interpretation | ClinicalTrials.gov | https://www.clinicaltrials.gov/ (Accessed on 31 October 2023) | An online database that provides information on clinical trials |
Somatic variants interpretation | EUCTR | https://www.clinicaltrialsregister.eu/ (Accessed on 31 October 2023) | Database containing information on clinical trials conducted in the European Union |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Clark, A.J.; Lillard, J.W., Jr. A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology. Genes 2024, 15, 1036. https://doi.org/10.3390/genes15081036