Common Methods for Phylogenetic Tree Construction and Their Implementation in R
Abstract
1. Introduction
2. The Popular Methods for Inferring Phylogenetic Trees
2.1. Distance-Based Method
2.2. Maximum Parsimony (MP) Method
2.3. Maximum Likelihood (ML) Method
2.4. Bayesian Inference (BI) Method
3. Advanced Computational Integrative Methods for Inferring Phylogenetic Trees
3.1. Concatenation Phylogeny Method
3.2. Coalescence Phylogeny Method
4. Construction and Evaluation of Phylogenetic Trees in the R Language Environment
4.1. Implementation of Distance-Based Methods in R
- Listing 1. The code for implementation of the neighbor-joining method in R.
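As a minimal sketch of a neighbor-joining workflow (not the authors' listing), the following assumes the `ape` package is installed and uses its bundled `woodmouse` alignment as stand-in data. The K80 substitution model and the 100 bootstrap replicates are illustrative choices, not recommendations from the source.

```r
# Assumption: the 'ape' package is installed (install.packages("ape")).
library(ape)

# Example alignment shipped with ape; stands in for the user's own sequences.
data(woodmouse)

# Pairwise evolutionary distances under the Kimura 2-parameter (K80) model.
d <- dist.dna(woodmouse, model = "K80")

# Build a single unrooted tree from the distance matrix by neighbor joining.
nj_tree <- nj(d)

# Bootstrap support: resample alignment columns and rebuild the tree each time.
set.seed(1)
bs <- boot.phylo(nj_tree, woodmouse,
                 FUN = function(x) nj(dist.dna(x, model = "K80")),
                 B = 100, quiet = TRUE)

# Draw the tree and label internal nodes with their bootstrap counts.
plot(nj_tree, main = "NJ tree (K80 distances)")
nodelabels(bs, cex = 0.6)
```

To use other data, an alignment could be loaded with `read.dna("alignment.fasta", format = "fasta")`, where the file name is hypothetical.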
4.2. Implementation of MP Method in R
- Listing 2. The code for implementation of the maximum parsimony method in R.
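A maximum parsimony search can be sketched along the following lines (an illustration, not the authors' listing). It assumes the `phangorn` package is installed and uses its bundled `Laurasiatherian` alignment. Starting from an NJ tree built on Hamming distances is one common choice, taken here as an assumption.

```r
# Assumption: the 'phangorn' package is installed (install.packages("phangorn")).
library(phangorn)

# Example alignment (class phyDat) shipped with phangorn.
data(Laurasiatherian)

# Starting tree: a quick NJ tree from Hamming (p-) distances.
start_tree <- NJ(dist.hamming(Laurasiatherian))

# Parsimony score of the starting tree: the number of changes it requires.
start_score <- parsimony(start_tree, Laurasiatherian)

# Heuristic search that rearranges the tree to lower the parsimony score.
mp_tree <- optim.parsimony(start_tree, Laurasiatherian, trace = 0)
mp_score <- parsimony(mp_tree, Laurasiatherian)

# The optimized tree should need no more changes than the starting tree.
cat("start:", start_score, " optimized:", mp_score, "\n")
```

Because the search is heuristic, it can stop at a local optimum; repeating it from different starting trees is a common safeguard.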
4.3. Implementation of ML Method in R
- Listing 3. The code for implementation of the maximum likelihood method in R.
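The sketch below illustrates one way to fit a maximum likelihood tree (again, not the authors' listing). It assumes the `phangorn` package and its bundled `Laurasiatherian` alignment. The HKY model and the NNI rearrangements are illustrative assumptions; in practice the model would be chosen by a model-selection step.

```r
# Assumption: the 'phangorn' package is installed (install.packages("phangorn")).
library(phangorn)

# Example alignment (class phyDat) shipped with phangorn.
data(Laurasiatherian)

# Starting tree: NJ on maximum likelihood distances (JC69 by default).
start_tree <- NJ(dist.ml(Laurasiatherian))

# Likelihood of the starting tree under the default JC69 model.
fit0 <- pml(start_tree, data = Laurasiatherian)

# Optimize branch lengths, HKY model parameters, and topology (NNI moves).
fit <- optim.pml(fit0, model = "HKY", optNni = TRUE,
                 control = pml.control(trace = 0))

# Compare log-likelihoods before and after optimization.
print(logLik(fit0))
print(logLik(fit))

# The optimized tree is stored in the fitted object.
ml_tree <- fit$tree
```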
4.4. Implementation of BI Method in R
- Listing 4. The code for implementation of the Bayesian inference method in R.
4.5. Building the Consensus Phylogenetic Tree Using Multiple Genes in R
- Listing 5. The code for implementation of concatenation phylogeny method in R.
- Listing 6. The code for implementation of coalescence phylogeny method in R.
5. Summary and Perspectives
R Package | Description | Source | Reference |
---|---|---|---|
ape | Providing both utility functions for reading and writing data and manipulating phylogenetic trees, as well as several advanced methods for phylogenetic and evolutionary analysis. | CRAN * | [92] |
phangorn | Estimating phylogenetic trees and networks using maximum likelihood, maximum parsimony, distance methods, and Hadamard conjugation; offering methods for tree comparison, model selection, and visualization of phylogenetic networks. | CRAN * | [93] |
babette | Providing an alternative workflow to BEAST2; conducting complex Bayesian phylogenetics easily and reproducibly from R. | GitHub | [106] |
BAMMtools | Reconstructing and visualizing changes in evolutionary rates through time and across clades in a Bayesian statistical framework. | CRAN * | [107] |
apex | Implementing new object classes for storing and handling multi-gene data. | CRAN * | [108] |
phytools | Concentrating on phylogenetic comparative biology; including numerous techniques for visualizing, analyzing, manipulating, reading or writing, and inferring phylogenetic trees. | CRAN * | [109] |
ggtree | Annotating phylogenetic trees with their associated data of different types and from various sources. | Bioconductor | [97] |
RPANDA | Characterizing and comparing phylogenies using spectral densities; fitting models of diversification to phylogenies. | CRAN * | [110] |
TreeSearch | Dataset construction and validation; phylogenetic search (including with inapplicable data); the interrogation of optimal tree sets. | CRAN * | [111] |
paleotree | Analyzing combined paleontological and phylogenetic data sets, particularly the time-scaling of phylogenetic trees that include extinct fossil lineages. | CRAN * | [112] |
treeman | Providing a new class, TreeMan, that represents phylogenetic trees as a list structure for more efficient manipulation; aiming for tree manipulation that is both intuitive and computationally efficient within the R environment. | GitHub | [113] |
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sanderson, M.J.; Driskell, A.C. The challenge of constructing large phylogenetic trees. Trends Plant Sci. 2003, 8, 374–379.
2. Hug, L.A.; Baker, B.J.; Anantharaman, K.; Brown, C.T.; Probst, A.J.; Castelle, C.J.; Butterfield, C.N.; Hernsdorf, A.W.; Amano, Y.; Ise, K.; et al. A new view of the tree of life. Nat. Microbiol. 2016, 1, 16048.
3. Abaza, S. What is and why do we have to know the phylogenetic tree? Parasitol. United J. 2020, 13, 68–71.
4. de Queiroz, K. Nodes, branches, and phylogenetic definitions. Syst. Biol. 2013, 62, 625–632.
5. Dissanayake, A.; Bhunjun, C.; Maharachchikumbura, S.; Liu, J. Applied aspects of methods to infer phylogenetic relationships amongst fungi. Mycosphere 2020, 11, 2652–2676.
6. Gupta, M.K.; Gouda, G.; Sabarinathan, S.; Donde, R.; Rajesh, N.; Pati, P.; Rathore, S.K.; Behera, L.; Vadde, R. Phylogenetic analysis. In Bioinformatics in Rice Research: Theories and Techniques; Springer: Singapore, 2021; pp. 179–207.
7. Feng, H.; Liu, M.; Wang, B.; Feng, J.; Han, J.; Liu, J. HCPC: A New Parsimonious Clustering Method based on Hierarchical Characters for Morphological Phylogenetic Reconstruction. Res. Sq. 2021.
8. Mc, C.E.; Verdeflor, L.; Weinsztok, A.; Wiles, J.R.; Dorus, S. Exploratory Activities for Understanding Evolutionary Relationships Depicted by Phylogenetic Trees: United but Diverse. Am. Biol. Teach. 2020, 82, 333–337.
9. Jetz, W.; Thomas, G.H.; Joy, J.B.; Hartmann, K.; Mooers, A.O. The global diversity of birds in space and time. Nature 2012, 491, 444–448.
10. Hinchliff, C.E.; Smith, S.A.; Allman, J.F.; Burleigh, J.G.; Chaudhary, R.; Coghill, L.M.; Crandall, K.A.; Deng, J.; Drew, B.T.; Gazis, R.; et al. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc. Natl. Acad. Sci. USA 2015, 112, 12764–12769.
11. Denamur, E.; Clermont, O.; Bonacorsi, S.; Gordon, D. The population genetics of pathogenic Escherichia coli. Nat. Rev. Microbiol. 2021, 19, 37–54.
12. Smith, S.D.; Pennell, M.W.; Dunn, C.W.; Edwards, S.V. Phylogenetics is the New Genetics (for Most of Biodiversity). Trends Ecol. Evol. 2020, 35, 415–425.
13. Lee, M.S.; Palci, A. Morphological Phylogenetics in the Genomic Age. Curr. Biol. CB 2015, 25, R922–R929.
14. Lemmon, E.M.; Lemmon, A.R. High-throughput genomic data in systematics and phylogenetics. Annu. Rev. Ecol. Evol. Syst. 2013, 44, 99–121.
15. Morel, B.; Williams, T.A.; Stamatakis, A. Asteroid: A new algorithm to infer species trees from gene trees under high proportions of missing data. Bioinformatics 2023, 39, btac832.
16. James, T.Y.; Stajich, J.E.; Hittinger, C.T.; Rokas, A. Toward a Fully Resolved Fungal Tree of Life. Annu. Rev. Microbiol. 2020, 74, 291–313.
17. Ashkenazy, H.; Sela, I.; Levy Karin, E.; Landan, G.; Pupko, T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst. Biol. 2019, 68, 117–130.
18. Francis, W.R.; Canfield, D.E. Very few sites can reshape the inferred phylogenetic tree. PeerJ 2020, 8, e8865.
19. Talavera, G.; Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007, 56, 564–577.
20. Williams, T.A.; Heaps, S.E. An introduction to phylogenetics and the tree of life. In Methods in Microbiology; Elsevier: Amsterdam, The Netherlands, 2014; Volume 41, pp. 13–44.
21. Desper, R.; Gascuel, O. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol. Biol. Evol. 2004, 21, 587–598.
22. Wang, Z.; Sun, J.; Gao, Y.; Xue, Y.; Zhang, Y.; Li, K.; Zhang, W.; Zhang, C.; Zu, J.; Zhang, L. Fusang: A framework for phylogenetic tree inference via deep learning. Nucleic Acids Res. 2023, 51, 10909–10923.
23. Balaban, M.; Jiang, Y.; Roush, D.; Zhu, Q.; Mirarab, S. Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol. Ecol. Resour. 2022, 22, 1213–1227.
24. Vaz, C.; Nascimento, M.; Carriço, J.A.; Rocher, T.; Francisco, A.P. Distance-based phylogenetic inference from typing data: A unifying view. Brief. Bioinform. 2021, 22, bbaa147.
25. Coorens, T.H.; Spencer Chapman, M.; Williams, N.; Martincorena, I.; Stratton, M.R.; Nangalia, J.; Campbell, P.J. Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples. Nat. Protoc. 2024, 1–21.
26. Scossa, F.; Fernie, A.R. Ancestral sequence reconstruction—An underused approach to understand the evolution of gene function in plants? Comput. Struct. Biotechnol. J. 2021, 19, 1579–1594.
27. Ojha, K.K.; Mishra, S.; Singh, V.K. Computational molecular phylogeny: Concepts and applications. In Bioinformatics; Academic Press: New York, NY, USA, 2022; pp. 67–89.
28. Kapli, P.; Yang, Z.; Telford, M.J. Phylogenetic tree building in the genomic age. Nat. Rev. Genet. 2020, 21, 428–444.
29. Mount, D.W. Distance methods for phylogenetic prediction. CSH Protoc. 2008, 2008, pdb.top33.
30. Davidson, R.; Martín Del Campo, A. Combinatorial and Computational Investigations of Neighbor-Joining Bias. Front. Genet. 2020, 11, 584785.
31. Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987, 4, 406–425.
32. Kuhner, M.K.; Felsenstein, J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 1994, 11, 459–468.
33. Godini, R.; Fallahi, H. A brief overview of the concepts, methods and computational tools used in phylogenetic tree construction and gene prediction. Meta Gene 2019, 21, 100586.
34. Tamura, K.; Nei, M.; Kumar, S. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc. Natl. Acad. Sci. USA 2004, 101, 11030–11035.
35. Zhang, L.-N.; Rong, C.-H.; He, Y.; Guan, Q.; He, B.; Zhu, X.-W.; Liu, J.-N.; Chen, H.-J. A bird’s eye view of the algorithms and software packages for reconstructing phylogenetic trees. Zool. Res. 2013, 34, 640–650.
36. Santiago-Alarcon, D.; Tapia-McClung, H.; Lerma-Hernández, S.; Venegas-Andraca, S.E. Quantum aspects of evolution: A contribution towards evolutionary explorations of genotype networks via quantum walks. J. R. Soc. Interface 2020, 17, 20200567.
37. Farris, J.S. Methods for computing Wagner trees. Syst. Biol. 1970, 19, 83–92.
38. Fitch, W.M. Toward defining the course of evolution: Minimum change for a specific tree topology. Syst. Biol. 1971, 20, 406–416.
39. Liu, D.K.; Tu, X.D.; Zhao, Z.; Zeng, M.Y.; Zhang, S.; Ma, L.; Zhang, G.Q.; Wang, M.M.; Liu, Z.J.; Lan, S.R.; et al. Plastid phylogenomic data yield new and robust insights into the phylogeny of Cleisostoma-Gastrochilus clades (Orchidaceae, Aeridinae). Mol. Phylogenetics Evol. 2020, 145, 106729.
40. Azouri, D.; Abadi, S.; Mansour, Y.; Mayrose, I.; Pupko, T. Harnessing machine learning to guide phylogenetic-tree search algorithms. Nat. Commun. 2021, 12, 1983.
41. Felsenstein, J. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 1981, 17, 368–376.
42. Jukes, T.H.; Cantor, C.R. Evolution of protein molecules. Mamm. Protein Metab. 1969, 3, 21–132.
43. Wascher, M.; Kubatko, L. Consistency of SVDQuartets and Maximum Likelihood for Coalescent-Based Species Tree Estimation. Syst. Biol. 2021, 70, 33–48.
44. Kimura, M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 1980, 16, 111–120.
45. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–526.
46. Hasegawa, M.; Kishino, H.; Yano, T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 1985, 22, 160–174.
47. Tavaré, S. Some probabilistic and statistical problems on the analysis of DNA sequence. Lect. Math. Life Sci. 1986, 17, 57.
48. Jacob, S.S.; Sengupta, P.P.; Chandu, A.G.S.; Shamshad, S.; Yogisharadhya, R.; Sudhagar, S.; Ramesh, P. Existence of genetic lineages within Asian genotype of Taenia solium-Genetic characterization based on mitochondrial and ribosomal DNA markers. Transbound. Emerg. Dis. 2022, 69, 2256–2265.
49. Heaps, S.E.; Nye, T.M.; Boys, R.J.; Williams, T.A.; Embley, T.M. Bayesian modelling of compositional heterogeneity in molecular phylogenetics. Stat. Appl. Genet. Mol. Biol. 2014, 13, 589–609.
50. Amiroch, S.; Pradana, M.S.; Irawan, M.I.; Mukhlash, I. Maximum Likelihood Method on The Construction of Phylogenetic Tree for Identification the Spreading of SARS Epidemic. In Proceedings of the 2018 International Symposium on Advanced Intelligent Informatics (SAIN), Yogyakarta, Indonesia, 29–30 August 2018; pp. 137–141.
51. Rannala, B.; Yang, Z. Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. J. Mol. Evol. 1996, 43, 304–311.
52. Flouri, T.; Huang, J.; Jiao, X.; Kapli, P.; Rannala, B.; Yang, Z. Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent. Mol. Biol. Evol. 2022, 39, msac161.
53. Nascimento, F.F.; Reis, M.D.; Yang, Z. A biologist’s guide to Bayesian phylogenetic analysis. Nat. Ecol. Evol. 2017, 1, 1446–1454.
54. Cornuault, J.; Sanmartín, I. A road map for phylogenetic models of species trees. Mol. Phylogenetics Evol. 2022, 173, 107483.
55. Spade, D.A. Geometric ergodicity of a Metropolis-Hastings algorithm for Bayesian inference of phylogenetic branch lengths. Comput. Stat. 2020, 35, 2043–2076.
56. Csősz, S.; Loss, A.C.; Fisher, B.L. Exploring the diversity of the Malagasy Ponera (Hymenoptera: Formicidae) fauna via integrative taxonomy. Org. Divers. Evol. 2023, 23, 917–927.
57. Larget, B.; Simon, D.L. Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees. Mol. Biol. Evol. 1999, 16, 750.
58. Whidden, C.; Matsen, F.A., IV. Quantifying MCMC exploration of phylogenetic tree space. Syst. Biol. 2015, 64, 472–491.
59. Inagaki, Y.; Nakajima, Y.; Sato, M.; Sakaguchi, M.; Hashimoto, T. Gene sampling can bias multi-gene phylogenetic inferences: The relationship between red algae and green plants as a case study. Mol. Biol. Evol. 2009, 26, 1171–1178.
60. Lax, G.; Kolisko, M.; Eglit, Y.; Lee, W.J.; Yubuki, N.; Karnkowska, A.; Leander, B.S.; Burger, G.; Keeling, P.J.; Simpson, A.G.B. Multigene phylogenetics of euglenids based on single-cell transcriptomics of diverse phagotrophs. Mol. Phylogenetics Evol. 2021, 159, 107088.
61. Kanzi, A.M.; Trollip, C.; Wingfield, M.J.; Barnes, I.; Van der Nest, M.A.; Wingfield, B.D. Phylogenomic incongruence in Ceratocystis: A clue to speciation? BMC Genom. 2020, 21, 362.
62. Williams, T.A.; Cox, C.J.; Foster, P.G.; Szöllősi, G.J.; Embley, T.M. Phylogenomics provides robust support for a two-domains tree of life. Nat. Ecol. Evol. 2020, 4, 138–147.
63. Pardo-De la Hoz, C.J.; Magain, N.; Piatkowski, B.; Cornet, L.; Dal Forno, M.; Carbone, I.; Miadlikowska, J.; Lutzoni, F. Ancient Rapid Radiation Explains Most Conflicts Among Gene Trees and Well-Supported Phylogenomic Trees of Nostocalean Cyanobacteria. Syst. Biol. 2023, 72, 694–712.
64. Shen, X.X.; Li, Y.; Hittinger, C.T.; Chen, X.X.; Rokas, A. An investigation of irreproducibility in maximum likelihood phylogenetic inference. Nat. Commun. 2020, 11, 6096.
65. Zhao, P.; Kakishima, M.; Uzuhashi, S.; Ishii, H. Multigene phylogenetic analysis of inter- and intraspecific relationships in Venturia nashicola and V. pirina. Eur. J. Plant Pathol. 2012, 132, 245–258.
66. Abeysundera, M.; Field, C.; Gu, H. Phylogenetic Analysis Based on Spectral Methods. Mol. Biol. Evol. 2012, 29, 579–597.
67. Bi, G.; Mao, Y.; Xing, Q.; Cao, M. HomBlocks: A multiple-alignment construction pipeline for organelle phylogenomics based on locally collinear block searching. Genomics 2018, 110, 18–22.
68. Steenwyk, J.L.; Li, Y.; Zhou, X.; Shen, X.X.; Rokas, A. Incongruence in the phylogenomics era. Nat. Rev. Genet. 2023, 24, 834–850.
69. Wolsan, M.; Sato, J.J. Effects of data incompleteness on the relative performance of parsimony and Bayesian approaches in a supermatrix phylogenetic reconstruction of Mustelidae and Procyonidae (Carnivora). Cladistics Int. J. Willi Hennig Soc. 2010, 26, 168–194.
70. Rannala, B.; Yang, Z. Phylogenetic inference using whole genomes. Annu. Rev. Genom. Hum. Genet. 2008, 9, 217–231.
71. Zou, X.-H.; Song, G. Conflicting gene trees and phylogenomics. J. Syst. Evol. 2008, 46, 795.
72. Delsuc, F.; Brinkmann, H.; Philippe, H. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 2005, 6, 361–375.
73. Bininda-Emonds, O.R.; Sanderson, M.J. Assessment of the accuracy of matrix representation with parsimony analysis supertree construction. Syst. Biol. 2001, 50, 565–579.
74. Zhao, T.; Zwaenepoel, A.; Xue, J.-Y.; Kao, S.-M.; Li, Z.; Schranz, M.E.; Van de Peer, Y. Whole-genome microsynteny-based phylogeny of angiosperms. Nat. Commun. 2021, 12, 3498.
75. Cotton, J.A.; Wilkinson, M. Majority-rule supertrees. Syst. Biol. 2007, 56, 445–452.
76. Delucchi, E.; Hoessly, L.; Paolini, G. Impossibility Results on Stability of Phylogenetic Consensus Methods. Syst. Biol. 2020, 69, 557–565.
77. Goloboff, P.A.; Pol, D. Semi-strict supertrees. Cladistics Int. J. Willi Hennig Soc. 2002, 18, 514–525.
78. Fischer, M.; Hendriksen, M. Refinement-stable Consensus Methods. arXiv 2021, arXiv:2102.04502.
79. Lapointe, F.-J.; Cucumel, G. The Average Consensus Procedure: Combination of Weighted Trees Containing Identical or Overlapping Sets of Taxa. Syst. Biol. 1997, 46, 306–312.
80. Mavrodiev, E.V.; Williams, D.M.; Ebach, M.C. On the Typology of Relations. Evol. Biol. 2019, 46, 71–89.
81. Lu, L.; Sun, M.; Zhang, J.; Li, H.; Lin, L.; Yang, T.; Chen, M.; Chen, Z. Tree of life and its applications. Biodivers. Sci. 2014, 22, 3–20.
82. Jiang, X.; Edwards, S.V.; Liu, L. The Multispecies Coalescent Model Outperforms Concatenation Across Diverse Phylogenomic Data Sets. Syst. Biol. 2020, 69, 795–812.
83. Retief, J.D. Phylogenetic analysis using PHYLIP. Methods Mol. Biol. 2000, 132, 243–258.
84. Wilgenbusch, J.C.; Swofford, D. Inferring evolutionary trees with PAUP*. In Current Protocols in Bioinformatics; Wiley: Hoboken, NJ, USA, 2003; Chapter 6, Unit 6.4.
85. Guindon, S.; Dufayard, J.F.; Lefort, V.; Anisimova, M.; Hordijk, W.; Gascuel, O. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol. 2010, 59, 307–321.
86. Ronquist, F.; Teslenko, M.; van der Mark, P.; Ayres, D.L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M.A.; Huelsenbeck, J.P. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 2012, 61, 539–542.
87. Tamura, K.; Stecher, G.; Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol. Biol. Evol. 2021, 38, 3022–3027.
88. Xiang, C.Y.; Gao, F.; Jakovlić, I.; Lei, H.P.; Hu, Y.; Zhang, H.; Zou, H.; Wang, G.T.; Zhang, D. Using PhyloSuite for molecular phylogeny and tree-based analyses. iMeta 2023, 2, e87.
89. Huber, W.; Carey, V.J.; Gentleman, R.; Anders, S.; Carlson, M.; Carvalho, B.S.; Bravo, H.C.; Davis, S.; Gatto, L.; Girke, T.; et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 2015, 12, 115–121.
90. Giorgi, F.M.; Ceraolo, C.; Mercatelli, D. The R Language: An Engine for Bioinformatics and Data Science. Life 2022, 12, 648.
91. Yu, G.; Smith, D.K.; Zhu, H.; Guan, Y.; Lam, T.T.Y. ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 2017, 8, 28–36.
92. Paradis, E.; Schliep, K. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 2019, 35, 526–528.
93. Schliep, K.P. phangorn: Phylogenetic analysis in R. Bioinformatics 2011, 27, 592–593.
94. Galili, T. dendextend: An R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 2015, 31, 3718–3720.
95. Wang, L.G.; Lam, T.T.; Xu, S.; Dai, Z.; Zhou, L.; Feng, T.; Guo, P.; Dunn, C.W.; Jones, B.R.; Bradley, T.; et al. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Mol. Biol. Evol. 2020, 37, 599–603.
96. Yu, G. Data Integration, Manipulation and Visualization of Phylogenetic Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022.
97. Xu, S.; Li, L.; Luo, X.; Chen, M.; Tang, W.; Zhan, L.; Dai, Z.; Lam, T.T.; Guan, Y.; Yu, G. Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta 2022, 1, e56.
98. Wilkinson, L. ggplot2: Elegant Graphics for Data Analysis by WICKHAM, H. Biometrics 2011, 67, 678–679.
99. Cock, P.J.; Antao, T.; Chang, J.T.; Chapman, B.A.; Cox, C.J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422–1423.
100. Sukumaran, J.; Holder, M.T. DendroPy: A Python library for phylogenetic computing. Bioinformatics 2010, 26, 1569–1571.
101. Hao, J.; Ho, T.K. Machine learning made easy: A review of scikit-learn package in python programming language. J. Educ. Behav. Stat. 2019, 44, 348–361.
102. Ketkar, N.; Moolayil, J.; Ketkar, N.; Moolayil, J. Introduction to pytorch. In Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch; Apress: Berkeley, CA, USA, 2021; pp. 27–91.
103. Jombart, T. adegenet: A R package for the multivariate analysis of genetic markers. Bioinformatics 2008, 24, 1403–1405.
104. Bilderbeek, R.J.; Laudanno, G.; Etienne, R.S. Quantifying the impact of an inference model in Bayesian phylogenetics. Methods Ecol. Evol. 2021, 12, 351–358.
105. Zou, Z.; Zhang, H.; Guan, Y.; Zhang, J. Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies. Mol. Biol. Evol. 2020, 37, 1495–1507.
106. Bilderbeek, R.J.C.; Etienne, R.S. babette: BEAUti 2, BEAST2 and Tracer for R. Methods Ecol. Evol. 2018, 9, 2034–2040.
107. Rabosky, D.L.; Grundler, M.; Anderson, C.; Title, P.; Shi, J.J.; Brown, J.W.; Huang, H.; Larson, J.G. BAMMtools: An R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods Ecol. Evol. 2014, 5, 701–707.
108. Jombart, T.; Archer, F.; Schliep, K.; Kamvar, Z.; Harris, R.; Paradis, E.; Goudet, J.; Lapp, H. apex: Phylogenetics with multiple genes. Mol. Ecol. Resour. 2017, 17, 19–26.
109. Revell, L.J. phytools 2.0: An updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 2024, 12, e16505.
110. Morlon, H.; Lewitus, E.; Condamine, F.L.; Manceau, M.; Clavel, J.; Drury, J. RPANDA: An R package for macroevolutionary analyses on phylogenetic trees. Methods Ecol. Evol. 2016, 7, 589–597.
111. Smith, M.R. TreeSearch: Morphological phylogenetic analysis in R. bioRxiv 2021.
112. Bapst, D.W. paleotree: An R package for paleontological and phylogenetic analyses of evolution. Methods Ecol. Evol. 2012, 3, 803–807.
113. Bennett, D.J.; Sutton, M.D.; Turvey, S.T. treeman: An R package for efficient and intuitive manipulation of phylogenetic trees. BMC Res. Notes 2017, 10, 30.
114. Burgstaller-Muehlbacher, S.; Crotty, S.M.; Schmidt, H.A.; Reden, F.; Drucks, T.; von Haeseler, A. ModelRevelator: Fast phylogenetic model estimation via deep learning. Mol. Phylogenetics Evol. 2023, 188, 107905.
115. Sarkar, R. Low distortion delaunay embedding of trees in hyperbolic plane. In Proceedings of the International Symposium on Graph Drawing, Eindhoven, The Netherlands, 21–23 September 2011; pp. 355–366.
116. Matsumoto, H.; Mimori, T.; Fukunaga, T. Novel metric for hyperbolic phylogenetic tree embeddings. Biol. Methods Protoc. 2021, 6, bpab006.
117. Jiang, Y.; Tabaghi, P.; Mirarab, S. Learning Hyperbolic Embedding for Phylogenetic Tree Placement and Updates. Biology 2022, 11, 1256.
118. Macaulay, M.; Darling, A.; Fourment, M. Fidelity of hyperbolic space for Bayesian phylogenetic inference. PLoS Comput. Biol. 2023, 19, e1011084.
119. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
120. Lubiana, T.; Lopes, R.; Medeiros, P.; Silva, J.C.; Goncalves, A.N.A.; Maracaja-Coutinho, V.; Nakaya, H.I. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput. Biol. 2023, 19, e1011319.
Algorithm | Principle | Hypothesis | Criteria for Selecting the Final Tree | Scope of Application |
---|---|---|---|---|
NJ * | Minimum evolution: minimizing the total branch length of the phylogenetic tree. | BME branch length estimation model: ensuring general statistical consistency of the minimum-length phylogeny and non-negativity of its branch lengths [21]. | Only a single tree is constructed. | Short sequences with small evolutionary distances and few informative sites. |
MP | Maximum-parsimony criterion: minimizing the number of evolutionary steps required to explain the data set. | No model required. | The phylogenetic tree requiring the smallest number of base (or amino acid) substitutions. | Sequences with high similarity; sequences for which it is difficult to design an appropriate evolutionary model. |
ML | Maximizing the likelihood value. | The sites in the alignment are independent; each branch is allowed to evolve at a different rate. | The phylogenetic tree with the maximum likelihood value. | A small number of distantly related sequences. |
BI | Bayes' theorem. | Continuous-time Markov substitution model: the substitution probability depends only on the current nucleotide, not on past nucleotides. | The phylogenetic tree sampled most frequently in the MCMC run. | A small number of sequences. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zou, Y.; Zhang, Z.; Zeng, Y.; Hu, H.; Hao, Y.; Huang, S.; Li, B. Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineering 2024, 11, 480. https://doi.org/10.3390/bioengineering11050480