Computational Methods for Secondary Metabolite Discovery

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (30 April 2021) | Viewed by 16727

Special Issue Editors

Dr. Alexey Gurevich
Guest Editor
Center for Algorithmic Biotechnology, Saint Petersburg State University, St. Petersburg, Russia
Interests: bioinformatics; chemoinformatics; computational mass spectrometry; metabolomics; natural products; dereplication/annotation; genome/metagenome assembly

Dr. Hosein Mohimani
Guest Editor
Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
Interests: computational metabolomics and metagenomics; natural product discovery; microbiome analysis

Special Issue Information

Dear Colleagues,

Secondary metabolites (SMs) are a rich source of industrially important substances such as medicines and pesticides. The latest breakthroughs in biotechnology, in particular high-throughput genome sequencing and mass spectrometry, have enabled the acquisition of huge volumes of data on SMs and their producers (bacteria, fungi, and plants). However, the lack of proper computational methods for processing these amounts of data prevents SM discovery from becoming a routine pipeline.

This Special Issue is devoted to computational techniques for analyzing metabolomics data. Topics include, but are not limited to, dereplication/annotation of known SMs in high-resolution mass spectrometry data, discovery of novel compounds using multi-omics approaches, machine learning techniques to increase the sensitivity and specificity of SM search tools, genome mining methods to reveal SM biosynthesis, visualization of metabolomics data, and novel databases of SMs. Manuscripts from both software developers and researchers applying existing computational tools to SM discovery are welcome.

Dr. Alexey Gurevich
Dr. Hosein Mohimani
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • secondary metabolites
  • natural products
  • bioinformatics
  • chemoinformatics
  • computational mass spectrometry
  • genome mining
  • machine learning
  • software
  • algorithms
  • metabolomics data analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)


Research

20 pages, 1957 KiB  
Article
Nerpa: A Tool for Discovering Biosynthetic Gene Clusters of Bacterial Nonribosomal Peptides
by Olga Kunyavskaya, Azat M. Tagirdzhanov, Andrés Mauricio Caraballo-Rodríguez, Louis-Félix Nothias, Pieter C. Dorrestein, Anton Korobeynikov, Hosein Mohimani and Alexey Gurevich
Metabolites 2021, 11(10), 693; https://doi.org/10.3390/metabo11100693 - 11 Oct 2021
Cited by 12 | Viewed by 4889
Abstract
Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes.
(This article belongs to the Special Issue Computational Methods for Secondary Metabolite Discovery)
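The core idea behind linking a BGC to a known NRP, comparing the residue sequence predicted from the cluster with the monomer sequences of known peptides, can be illustrated with a toy global-alignment ranker. This is a minimal sketch, not Nerpa's scoring model: it assumes linear peptides, exact residue-name matches, and hypothetical constants (`MATCH`, `MISMATCH`, `GAP`); the residue lists and compound names (`nrp_A` and so on) are illustrative placeholders.

```python
# Hypothetical scoring constants; Nerpa's actual scoring differs.
MATCH = 1.0
MISMATCH = -1.0
GAP = -1.0


def align_score(predicted: list[str], known: list[str]) -> float:
    """Needleman-Wunsch global alignment score between two residue sequences."""
    n, m = len(predicted), len(known)
    # dp[i][j] = best score for aligning predicted[:i] with known[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if predicted[i - 1] == known[j - 1] else MISMATCH
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align two residues
                           dp[i - 1][j] + GAP,      # gap in the known NRP
                           dp[i][j - 1] + GAP)      # gap in the prediction
    return dp[n][m]


def rank_candidates(predicted: list[str], database: dict[str, list[str]]):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, align_score(predicted, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    bgc_prediction = ["Leu", "Asp", "Thr", "Gly"]  # hypothetical BGC prediction
    known_nrps = {                                 # hypothetical database
        "nrp_A": ["Leu", "Asp", "Thr", "Gly"],
        "nrp_B": ["Leu", "Ser", "Thr"],
        "nrp_C": ["Phe", "Pro"],
    }
    for name, score in rank_candidates(bgc_prediction, known_nrps):
        print(name, score)
```

A real tool must also handle modified, cyclic, and branched monomers and uncertain substrate predictions, which a plain string alignment ignores.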

22 pages, 3738 KiB  
Article
A Compositional Model to Predict the Aggregated Isotope Distribution for Average DNA and RNA Oligonucleotides
by Annelies Agten, Piotr Prostko, Melvin Geubbelmans, Youzhong Liu, Thomas De Vijlder and Dirk Valkenborg
Metabolites 2021, 11(6), 400; https://doi.org/10.3390/metabo11060400 - 18 Jun 2021
Cited by 3 | Viewed by 2474
Abstract
Structural modifications of DNA and RNA molecules play a pivotal role in epigenetic and posttranscriptional regulation. To characterise these modifications, more and more MS- and MS/MS-based tools for the analysis of nucleic acids are being developed. To identify an oligonucleotide in a mass spectrum, it is useful to compare the obtained isotope pattern of the molecule of interest to the one that is theoretically expected based on its elemental composition. However, this is not straightforward when the identity of the molecule under investigation is unknown. Here, we present a modelling approach for the prediction of the aggregated isotope distribution of an average DNA or RNA molecule when a particular (monoisotopic) mass is available. For this purpose, a theoretical database of all possible DNA/RNA oligonucleotides up to a mass of 25 kDa is created, and the aggregated isotope distribution for the entire database of oligonucleotides is generated using the BRAIN algorithm. Since this isotope information is compositional in nature, the modelling method is based on the additive log-ratio analysis of Aitchison. As a result, a univariate weighted polynomial regression model of order 10 is fitted to predict the first 20 isotope peaks for DNA and RNA molecules. The performance of the prediction model is assessed by using a mean squared error approach and a modified Pearson’s χ² goodness-of-fit measure on experimental data. Our analysis has indicated that the variability in spectral accuracy contributed more to the errors than the approximation of the theoretical isotope distribution by our proposed average DNA/RNA model. The prediction model is implemented as an online tool. An R function can be downloaded to incorporate the method in custom analysis workflows to process mass spectral data.
(This article belongs to the Special Issue Computational Methods for Secondary Metabolite Discovery)
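The compositional step mentioned in the abstract can be illustrated with Aitchison's additive log-ratio (alr) transform, which maps D positive proportions (here, relative isotope peak intensities) to D−1 unconstrained coordinates that can be regressed on mass, and maps predictions back to a composition summing to 1. A minimal sketch, assuming the last peak serves as the reference part (the paper's actual reference choice is not stated here); the polynomial regression itself is not shown, and the peak intensities in the usage lines are hypothetical.

```python
import math


def alr(parts: list[float]) -> list[float]:
    """Additive log-ratio transform: log(x_i / x_D) for i < D (last part is the reference)."""
    if any(p <= 0 for p in parts):
        raise ValueError("alr requires strictly positive parts")
    ref = parts[-1]
    return [math.log(p / ref) for p in parts[:-1]]


def alr_inverse(coords: list[float]) -> list[float]:
    """Map alr coordinates back to a composition that sums to 1."""
    exps = [math.exp(c) for c in coords] + [1.0]  # reference part has log-ratio 0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical relative intensities of the first four isotope peaks
    peaks = [0.50, 0.30, 0.15, 0.05]
    coords = alr(peaks)
    print([round(c, 3) for c in coords])
    print([round(p, 3) for p in alr_inverse(coords)])
```

In a workflow like the one described, each alr coordinate would be modelled as a function of mass, and `alr_inverse` would turn the predicted coordinates into a valid isotope distribution.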

15 pages, 2939 KiB  
Article
Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB
by Marc Feuermann, Emmanuel Boutet, Anne Morgat, Kristian B. Axelsen, Parit Bansal, Jerven Bolleman, Edouard de Castro, Elisabeth Coudert, Elisabeth Gasteiger, Sébastien Géhant, Damien Lieberherr, Thierry Lombardot, Teresa B. Neto, Ivo Pedruzzi, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Alan Bridge and on behalf of the UniProt Consortium
Metabolites 2021, 11(1), 48; https://doi.org/10.3390/metabo11010048 - 12 Jan 2021
Cited by 3 | Viewed by 4109
Abstract
The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study and the discovery of natural products. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
(This article belongs to the Special Issue Computational Methods for Secondary Metabolite Discovery)
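A programmatic metabolite-name query of the kind the abstract describes can be sketched against the UniProt REST API. A minimal sketch, assuming the `https://rest.uniprot.org/uniprotkb/search` endpoint and its `query`, `fields`, `format` and `size` parameters; the query string ("erythromycin AND reviewed:true") and the chosen return fields are hypothetical examples and should be checked against the UniProt query documentation. Federated SPARQL queries, also mentioned in the abstract, are not shown.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed UniProtKB search endpoint of the UniProt REST API
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fields: list[str], size: int = 25) -> str:
    """Build a UniProtKB search URL that returns tab-separated results."""
    params = {
        "query": query,
        "fields": ",".join(fields),
        "format": "tsv",
        "size": size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_tsv(url: str) -> list[list[str]]:
    """Download the TSV response and split it into rows of columns (needs network access)."""
    with urlopen(url, timeout=30) as response:
        text = response.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines() if line]


if __name__ == "__main__":
    # Hypothetical query: reviewed (Swiss-Prot) entries matching a metabolite name
    url = build_search_url("erythromycin AND reviewed:true",
                           ["accession", "protein_name", "organism_name"])
    print(url)
    # With network access, the first rows could be inspected via:
    # for row in fetch_tsv(url)[:5]:
    #     print(row)
```

Keeping URL construction separate from the download makes the query easy to inspect or reuse with other tools before any request is sent.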

27 pages, 3825 KiB  
Article
SeMPI 2.0—A Web Server for PKS and NRPS Predictions Combined with Metabolite Screening in Natural Product Databases
by Paul F. Zierep, Adriana T. Ceci, Ilia Dobrusin, Sinclair C. Rockwell-Kollmann and Stefan Günther
Metabolites 2021, 11(1), 13; https://doi.org/10.3390/metabo11010013 - 29 Dec 2020
Cited by 18 | Viewed by 4132
Abstract
Microorganisms produce secondary metabolites with a remarkable range of bioactive properties. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of biosynthetic gene clusters by genome mining. On the other hand, for many natural products with resolved structures, the encoding biosynthetic gene clusters have not been identified yet. Of those secondary metabolites, the scaffolds of nonribosomal peptides and polyketides (type I modular) can be predicted due to their building block-like assembly. SeMPI v2 provides a comprehensive prediction pipeline, which includes the screening of the scaffold in publicly available natural compound databases. The screening algorithm was designed to detect homologous structures even for partial, incomplete clusters. The pipeline allows linking of gene clusters to known natural products and therefore also provides a metric to estimate the novelty of the cluster if a matching scaffold cannot be found. Whereas currently available tools attempt to provide comprehensive information about a wide range of gene clusters, SeMPI v2 aims to focus on precise predictions. Therefore, the cluster detection algorithm, including building block generation and domain substrate prediction, was thoroughly refined and benchmarked, to provide high-quality scaffold predictions. In a benchmark based on 559 gene clusters, SeMPI v2 achieved comparable or better results than antiSMASH v5. Additionally, the SeMPI v2 web server provides features that can help to further investigate a submitted gene cluster, such as the incorporation of a genome browser, and the possibility to modify a predicted scaffold in a workbench before the database screening.
(This article belongs to the Special Issue Computational Methods for Secondary Metabolite Discovery)
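The idea of screening a predicted scaffold against a compound database and flagging a cluster as potentially novel when nothing matches can be illustrated with a simplified building-block comparison. This is a stand-in, not SeMPI's structure-based screening: it reduces each scaffold to a multiset of building blocks and scores it with a multiset Tanimoto (Jaccard) similarity. The block labels (`mal`, `mmal` for hypothetical malonyl and methylmalonyl units), the `NOVELTY_THRESHOLD` value, and the compound entries are all hypothetical.

```python
from collections import Counter

NOVELTY_THRESHOLD = 0.5  # hypothetical cut-off below which a cluster is flagged as novel


def tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto (Jaccard) similarity between two building-block counts."""
    shared = sum((a & b).values())  # element-wise minimum of counts
    union = sum((a | b).values())   # element-wise maximum of counts
    return shared / union if union else 0.0


def screen(predicted: list[str], database: dict[str, list[str]]):
    """Return (best compound, its score, novelty flag) for a predicted scaffold."""
    query = Counter(predicted)
    best_name, best_score = None, 0.0
    for name, blocks in database.items():
        score = tanimoto(query, Counter(blocks))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, best_score < NOVELTY_THRESHOLD


if __name__ == "__main__":
    predicted_scaffold = ["mal", "mmal", "mmal", "Thr"]  # e.g. from a partial cluster
    compounds = {
        "cmpd_1": ["mal", "mmal", "mmal", "Thr", "Ser"],
        "cmpd_2": ["Gly", "Gly"],
    }
    print(screen(predicted_scaffold, compounds))
```

A partial cluster still scores reasonably against its full product here (4 of 5 blocks shared), which mirrors the goal of tolerating incomplete clusters; real screening would compare chemical structures rather than block counts.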
