Application of Machine Learning Algorithms for Biological Data and Biological Systems

A special issue of Processes (ISSN 2227-9717). This special issue belongs to the section "Biological Processes and Systems".

Deadline for manuscript submissions: closed (31 October 2024) | Viewed by 14460

Special Issue Editor

Department of Chemical Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
Interests: machine learning; biomaterials; computer simulations; polymers; Li-ion batteries; ionic liquids

Special Issue Information

Dear Colleagues,

Biological data are collected from a variety of sources, including humans, animals, and viruses. Research on these data has emerged in many fields, such as genetics, proteomics, and healthcare applications, to name a few. Thanks to advances in computer technology, machine learning techniques have made it more feasible to extract useful information from raw biological data and to decipher intricate biological data.

The purpose of this Special Issue is to publish recent developments in machine learning applications for biological data. We invite researchers to submit their research articles and reviews to this Special Issue. Example topics include (but are not limited to) the following, which relate machine learning applications to biological data:

  • Use of machine learning algorithms for investigating biological systems;
  • New machine learning algorithms;
  • Genetics/genomics;
  • Proteomics;
  • Medical image analysis and diagnosis;
  • Drug discovery/development;
  • Healthcare applications.

Dr. Tong Gao
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Processes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • artificial intelligence
  • biological data
  • genetics
  • genomics
  • proteomics
  • bioimage
  • drug discovery

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

14 pages, 1438 KiB  
Article
Mus4mCPred: Accurate Identification of DNA N4-Methylcytosine Sites in Mouse Genome Using Multi-View Feature Learning and Deep Hybrid Network
by Xiao Wang, Qian Du and Rong Wang
Processes 2024, 12(6), 1129; https://doi.org/10.3390/pr12061129 - 30 May 2024
Cited by 2 | Viewed by 884
Abstract
N4-methylcytosine (4mC) is a critical epigenetic modification that plays a pivotal role in the regulation of a multitude of biological processes, including gene expression, DNA replication, and cellular differentiation. Traditional experimental methods for detecting DNA N4-methylcytosine sites are time-consuming, labor-intensive, and costly, making them unsuitable for large-scale or high-throughput research. Computational methods for identifying DNA N4-methylcytosine sites enable the rapid and cost-effective analysis of DNA 4mC sites across entire genomes. In this study, we focus on the identification of DNA 4mC sites in the mouse genome. Although there are already some computational methods that can predict DNA 4mC sites in the mouse genome, there is still significant room for improvement in accurately predicting them due to their inability to fully capture the multifaceted characteristics of DNA sequences. To address this issue, we propose a new deep learning predictor called Mus4mCPred, which utilizes multi-view feature learning and deep hybrid networks for accurately predicting DNA 4mC sites in the mouse genome. The predictor Mus4mCPred firstly employed different encoding methods to extract the feature vectors of DNA sequences, then input these features generated by different encoding methods into various hybrid deep learning models for the learning and extraction of more sophisticated representations of these features, and finally fused the extracted multi-view features to serve as the final features for DNA 4mC site prediction in the mouse genome. Multi-view features enabled the more comprehensive capture of data characteristics, enhancing the feature representation of DNA sequences. The independent test results showed that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews’ correlation coefficient (MCC) were 0.7688, 0.9375, 0.8531, and 0.7165, respectively. The predictor Mus4mCPred outperformed other state-of-the-art methods, achieving the accurate identification of 4mC sites in the mouse genome.

23 pages, 11574 KiB  
Article
Discovery of Natural Compound-Based Lead Molecule against Acetyltransferase Type 1 Bacterial Enzyme from Morganella morgani Using Machine Learning-Enabled Molecular Dynamics Simulation
by Meshari Alazmi and Olaa Motwalli
Processes 2024, 12(6), 1047; https://doi.org/10.3390/pr12061047 - 21 May 2024
Viewed by 926
Abstract
Drug-resistant Morganella morganii, a rod-shaped, Gram-negative, facultatively anaerobic bacillus belonging to the Enterobacteriaceae family, is a growing worldwide health concern due to its association with high morbidity and mortality rates. Recent advancements in machine learning, particularly AlphaFold 2’s protein structure prediction using local physics and pattern recognition, have aided research efforts. This study focuses on the enzymatic activity of aminoglycoside N6′-acetyltransferase (aacA7), a critical transferase enzyme in bacteria that confers resistance to aminoglycosides. AacA7 modifies aminoglycoside molecules by catalyzing the acetylation of their 6′-amino group using acetyl-CoA, rendering antibiotics like kanamycin, neomycin, tobramycin, and amikacin inactive. We propose that Doripenem and OncoglabrinolC can interact with aacA7, potentially modifying its enzymatic activity. Molecular docking analysis of aacA7 with 22 drug targets revealed OncoglabrinolC as the most promising candidate, exhibiting a binding energy of −12.82 kcal/mol. These two top candidates, OncoglabrinolC and Doripenem, were then subjected to 100 ns of molecular dynamics simulations to assess their dynamic conformational features. Furthermore, the PredictSNP consensus classifier was used to predict the impact of mutations on aacA7 protein functionality. The study also investigated the interaction of wild-type and mutant aacA7 proteins with both Doripenem and OncoglabrinolC. These findings provide valuable insights into the binding behavior of OncoglabrinolC and Doripenem as potential lead molecules for repurposing against aacA7, potentially reducing the pathogenicity of Morganella morganii.

12 pages, 1272 KiB  
Article
PreSubLncR: Predicting Subcellular Localization of Long Non-Coding RNA Based on Multi-Scale Attention Convolutional Network and Bidirectional Long Short-Term Memory Network
by Xiao Wang, Sujun Wang, Rong Wang and Xu Gao
Processes 2024, 12(4), 666; https://doi.org/10.3390/pr12040666 - 26 Mar 2024
Cited by 1 | Viewed by 1164
Abstract
The subcellular localization of long non-coding RNA (lncRNA) provides important insights and opportunities for an in-depth understanding of cell biology, revealing disease mechanisms, drug development, and innovation in the biomedical field. Although several computational methods have been proposed to identify the subcellular localization of lncRNA, it remains difficult to predict the subcellular localization of lncRNA accurately with these methods. In this study, a new deep-learning predictor called PreSubLncR has been proposed for accurately predicting the subcellular localization of lncRNA. This predictor firstly used the word embedding model word2vec to encode the RNA sequences, and then combined multi-scale one-dimensional convolutional neural networks with attention and bidirectional long short-term memory networks to capture the different characteristics of various RNA sequences. This study used multiple RNA subcellular localization datasets for experimental validation, and the results showed that our method has higher accuracy and robustness compared with other state-of-the-art methods. It is expected to provide more in-depth insights into cell function research.

12 pages, 1282 KiB  
Article
CrossTx: Cross-Cell-Line Transcriptomic Signature Predictions
by Panagiotis Chrysinas, Changyou Chen and Rudiyanto Gunawan
Processes 2024, 12(2), 332; https://doi.org/10.3390/pr12020332 - 3 Feb 2024
Viewed by 1304
Abstract
Predicting the cell response to drugs is central to drug discovery, drug repurposing, and personalized medicine. To this end, large datasets of drug signatures have been curated, most notably the Connectivity Map (CMap). A multitude of in silico approaches have also been formulated, but strategies for predicting drug signatures in unseen cells (cell lines not in the reference datasets) are still lacking. In this work, we developed a simple-yet-efficacious computational strategy, called CrossTx, for predicting the drug transcriptomic signatures of an unseen target cell line using drug transcriptome data of reference cell lines and unlabeled transcriptome data of the target cells. Our strategy involves the combination of Predictor and Corrector steps. The Predictor generates cell-line-agnostic drug signatures using the reference dataset, while the Corrector produces target-cell-specific drug signatures by projecting the signatures from the Predictor onto the transcriptomic latent space of the target cell line. Testing different Predictor–Corrector functions using the CMap revealed the combination of averaging (Mean) as a Predictor and Principal Component Analysis (PCA) followed by Autoencoder (AE) as a Corrector to be the best. Yet, using Mean as a Predictor and PCA as a Corrector achieved comparatively high accuracy with much lower computational requirements when compared to the best combination.
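
The Mean-Predictor and PCA-Corrector combination described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, a single principal axis (found by power iteration) stands in for the target cell line's latent space, and the choice to project the signature without re-adding the target mean is an assumption made here for simplicity.

```python
from math import sqrt


def mean_predictor(reference_signatures):
    """Average one drug's signatures across reference cell lines (gene by gene)."""
    n = len(reference_signatures)
    return [sum(column) / n for column in zip(*reference_signatures)]


def first_principal_axis(samples, iterations=200):
    """Unit vector of the leading principal axis, found by power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(column) / n for column in zip(*samples)]
    centered = [[v - m for v, m in zip(row, mean)] for row in samples]
    # Fixed start vector keeps the sketch deterministic; a random start is more usual.
    axis = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        # Apply the (unnormalized) covariance matrix: X^T (X axis).
        scores = [sum(a * b for a, b in zip(row, axis)) for row in centered]
        updated = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = sqrt(sum(v * v for v in updated))
        if norm == 0.0:
            raise ValueError("no variance along the current direction")
        axis = [v / norm for v in updated]
    return axis


def pca_corrector(signature, target_profiles):
    """Project a signature onto the leading principal axis of the target profiles."""
    axis = first_principal_axis(target_profiles)
    coefficient = sum(s * a for s, a in zip(signature, axis))
    return [coefficient * a for a in axis]


# Toy data (hypothetical): 2 reference cell lines, 4 unlabeled target profiles, 3 genes.
agnostic = mean_predictor([[2.0, 0.0, 1.0], [4.0, 2.0, 3.0]])
target = [[0.0, 0.0, 5.0], [1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [3.0, 3.0, 5.0]]
print(pca_corrector(agnostic, target))
```

In the toy data the third gene never varies in the target cell line, so the corrected signature carries no change for it. A realistic Corrector would retain many principal components rather than one.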

21 pages, 6870 KiB  
Article
A Hybrid Feature-Selection Method Based on mRMR and Binary Differential Evolution for Gene Selection
by Kun Yu, Wei Li, Weidong Xie and Linjie Wang
Processes 2024, 12(2), 313; https://doi.org/10.3390/pr12020313 - 1 Feb 2024
Cited by 1 | Viewed by 1496
Abstract
The selection of critical features from microarray data as biomarkers holds significant importance in disease diagnosis and drug development. It is essential to reduce the number of biomarkers while maintaining their performance to effectively minimize subsequent validation costs. However, the processing of microarray data often encounters the challenge of the “curse of dimensionality”. Existing feature-selection methods face difficulties in effectively reducing feature dimensionality while ensuring classification accuracy, algorithm efficiency, and optimal search space exploration. This paper proposes a hybrid feature-selection algorithm based on an enhanced version of the Max Relevance and Min Redundancy (mRMR) method, coupled with differential evolution. The proposed method improves the quantization functions of mRMR to accommodate the continuous nature of microarray data attributes, utilizing them as the initial step in feature selection. Subsequently, an enhanced differential evolution algorithm is employed to further filter the features. Two adaptive mechanisms are introduced to enhance early search efficiency and late population diversity, thus reducing the number of features and balancing the algorithm’s exploration and exploitation. The results highlight the improved performance and efficiency of the hybrid algorithm in feature selection for microarray data analysis.
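
As background for the mRMR step mentioned in this abstract, the following sketch shows the classic greedy mRMR criterion (relevance to the class label minus mean redundancy with already selected features) on discretized data. It is not the enhanced variant proposed in the paper, which adapts mRMR to continuous microarray values and adds differential evolution; the function names and the toy gene columns are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr(features, labels, k):
    """Greedily select up to k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, column in features.items():
            if name in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = (
                sum(mutual_information(column, features[s]) for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:  # strict comparison keeps the first feature on ties
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy, already-discretized expression levels (hypothetical); g2 duplicates g1.
genes = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],
    "g2": [0, 0, 0, 0, 1, 1, 1, 1],
    "g3": [0, 0, 1, 1, 0, 0, 1, 1],
}
classes = [0, 0, 0, 1, 1, 1, 1, 1]
print(mrmr(genes, classes, k=2))
```

The duplicate gene g2 is as relevant as g1 but fully redundant with it, so the second pick goes to the weaker yet complementary g3. This is the behavior that makes mRMR a natural first filtering stage before a more expensive search.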

17 pages, 2025 KiB  
Article
Development of a Novel Multi-Modal Contextual Fusion Model for Early Detection of Varicella Zoster Virus Skin Lesions in Human Subjects
by McDominic Chimaobi Eze, Lida Ebrahimi Vafaei, Charles Tochukwu Eze, Turgut Tursoy, Dilber Uzun Ozsahin and Mubarak Taiwo Mustapha
Processes 2023, 11(8), 2268; https://doi.org/10.3390/pr11082268 - 27 Jul 2023
Cited by 5 | Viewed by 5225
Abstract
Skin lesion detection is crucial in diagnosing and managing dermatological conditions. In this study, we developed and demonstrated the potential applicability of a novel mixed-scale dense convolution, self-attention mechanism, hierarchical feature fusion, and attention-based contextual information technique (MSHA) model for skin lesion detection using digital skin images of chickenpox and shingles lesions. The model adopts a combination of unique architectural designs, such as a mixed-scale dense convolution layer, self-attention mechanism, hierarchical feature fusion, and attention-based contextual information, enabling the MSHA model to capture and extract relevant features more effectively for chickenpox and shingles lesion classification. We also implemented an effective training strategy to enhance the model's capacity to learn and represent the relevant features in the skin lesion images. We evaluated the performance of the novel model in comparison to state-of-the-art models, including ResNet50, VGG16, VGG19, InceptionV3, and ViT. The results indicated that the MSHA model outperformed the other models with accuracy and loss of 95.0% and 0.104, respectively. Furthermore, it exhibited superior performance in terms of true-positive and true-negative rates while maintaining low false-positive and false-negative rates. The MSHA model’s success can be attributed to its unique architectural design, effective training strategy, and better capacity to learn and represent the relevant features in skin lesion images. The study underscores the potential of the MSHA model as a valuable tool for the accurate and reliable detection of chickenpox and shingles lesions, which can aid in timely diagnosis and appropriate treatment planning for dermatological conditions.

18 pages, 3391 KiB  
Article
Accelerating SARS-CoV-2 Vaccine Development: Leveraging Novel Hybrid Deep Learning Models and Bioinformatics Analysis for Epitope Selection and Classification
by Zubaida Said Ameen, Hala Mostafa, Dilber Uzun Ozsahin and Auwalu Saleh Mubarak
Processes 2023, 11(6), 1829; https://doi.org/10.3390/pr11061829 - 16 Jun 2023
Cited by 3 | Viewed by 2148
Abstract
It is essential to use highly antigenic epitope areas, since the development of peptide vaccines heavily relies on the precise design of epitope regions that can elicit a strong immune response. Choosing epitope regions experimentally for the production of the SARS-CoV-2 vaccine can be time-consuming, costly, and labor-intensive. Scientists have created in silico prediction techniques based on machine learning to find these regions, to cut down the number of candidate epitopes that might be tested in experiments, and, as a result, to lessen the time-consuming process of their mapping. However, the tools and approaches involved continue to have low accuracy. In this work, we propose a hybrid deep learning model based on a convolutional neural network (CNN) and long short-term memory (LSTM) for the classification of peptides into epitopes or non-epitopes. Numerous transfer learning strategies were utilized, and the fine-tuned method gave the best result, with an AUC of 0.979, an F1 score of 0.902, and 95.1% accuracy, which was far better than the performance of the model trained from scratch. The experimental results obtained show that this model has superior performance when compared to other methods trained on IEDB datasets. Using bioinformatics tools such as ToxinPred, VaxiJen, and AllerTop2.0, the toxicities, antigenicities, and allergenicities, respectively, of the predicted epitopes were determined. In silico cloning and codon optimization were used to successfully express the vaccine in E. coli. This work will help scientists choose the best epitope for the development of the COVID-19 vaccine, reducing cost and labor and thereby accelerating vaccine production.