Molecules

Research

16 pages, 7229 KiB

Open AccessArticle

Reconstructing Phylogeny by Aligning Multiple Metabolic Pathways Using Functional Module Mapping

by Yiran Huang, Cheng Zhong, Hai Xiang Lin, Jianyi Wang and Yuzhong Peng

Molecules 2018, 23(2), 486; https://doi.org/10.3390/molecules23020486 - 23 Feb 2018

Cited by 2 | Viewed by 4344

Comparison of metabolic pathways provides a systematic way for understanding the evolutionary and phylogenetic relationships in systems biology. Although a number of phylogenetic methods have been developed, few efforts have been made to provide a unified phylogenetic framework that sufficiently reflects the metabolic features of organisms. In this paper, we propose a phylogenetic framework that characterizes the metabolic features of organisms by aligning multiple metabolic pathways using functional module mapping. Our method transforms the alignment of multiple metabolic pathways into constructing the union graph of pathways, builds mappings between functional modules of pathways in the union graph, and infers phylogenetic relationships among organisms based on module mappings. Experimental results show that the use of functional module mapping enables us to correctly categorize organisms into main categories with specific metabolic characteristics. Traditional genome-based phylogenetic methods can reconstruct phylogenetic relationships, whereas our method can offer in-depth metabolic analysis for phylogenetic reconstruction, which can add insights into traditional phyletic reconstruction. The results also demonstrate that our phylogenetic trees are closer to the classic classifications in comparison to existing classification methods using metabolic pathway data.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

15 pages, 704 KiB

Open AccessArticle

The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes

by Xinguo Lu, Xing Li, Ping Liu, Xin Qian, Qiumai Miao and Shaoliang Peng

Molecules 2018, 23(2), 183; https://doi.org/10.3390/molecules23020183 - 24 Jan 2018

Cited by 26 | Viewed by 5701

With advances in next-generation sequencing (NGS) technologies, large volumes of multiple types of high-throughput genomics data are available. A great challenge in exploring cancer progression is to identify the driver genes among the variant genes by analyzing and integrating multi-type genomics data. Breast cancer is known as a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, assessment of prognosis and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and utilized a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, their gene expression data were used to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes and that a classifier using these genes achieved better performance than classifiers using genes identified by other methods.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
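
The abstract above validates the selected genes with 10-fold cross-validation. As a minimal sketch of how such a split can be generated, the following Python code partitions sample indices into k folds and yields train/test index lists. The function names `k_fold_indices` and `train_test_splits` are hypothetical, and the contiguous (unshuffled) folds are an assumption made for brevity; the paper does not state how its folds were built.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 25 patient samples, 10 folds (sample count chosen for illustration).
splits = list(train_test_splits(25, k=10))
```

In practice the sample order would usually be shuffled (and often stratified by subtype) before splitting, so that each fold reflects the class balance of the whole cohort.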

225 KiB

Open AccessArticle

Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics

by Xiaohui Lin, Chao Li, Yanhui Zhang, Benzhe Su, Meng Fan and Hai Wei

Molecules 2018, 23(1), 52; https://doi.org/10.3390/molecules23010052 - 26 Dec 2017

Cited by 79 | Viewed by 5439

Feature selection is an important topic in bioinformatics. Defining informative features from complex high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

3181 KiB

Open AccessArticle

Extracting Fitness Relationships and Oncogenic Patterns among Driver Genes in Cancer

by Xindong Zhang, Lin Gao and Songwei Jia

Molecules 2018, 23(1), 39; https://doi.org/10.3390/molecules23010039 - 25 Dec 2017

Cited by 1 | Viewed by 4070

Driver mutations provide a fitness advantage to cancer cells, and their accumulation increases the fitness of cancer cells and accelerates cancer progression. This work seeks to extract patterns accumulated by driver genes (“fitness relationships”) in tumorigenesis. We introduce a network-based method for extracting the fitness relationships of driver genes by modeling the network properties of the “fitness” of cancer cells. Colon adenocarcinoma (COAD) and skin cutaneous malignant melanoma (SKCM) are employed as case studies. Consistent results derived from different background networks suggest the reliability of the identified fitness relationships. Additionally, co-occurrence analysis and pathway analysis reveal the functional significance of the fitness relationships in signal transduction. In addition, a subset of driver genes called the “fitness core” is recognized for each case. Further analyses indicate the functional importance of the fitness core in carcinogenesis, and provide potential therapeutic opportunities in medicinal intervention. Fitness relationships characterize the functional continuity among driver genes in carcinogenesis, and suggest new insights in understanding the oncogenic mechanisms of cancers, as well as providing guiding information for medicinal intervention.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

2204 KiB

Open AccessArticle

HIGA: A Running History Information Guided Genetic Algorithm for Protein–Ligand Docking

by Boxin Guan, Changsheng Zhang and Yuhai Zhao

Molecules 2017, 22(12), 2233; https://doi.org/10.3390/molecules22122233 - 15 Dec 2017

Cited by 2 | Viewed by 4584

Protein-ligand docking is an essential part of computer-aided drug design, and it identifies the binding patterns of proteins and ligands by computer simulation. Although the Lamarckian genetic algorithm (LGA) has demonstrated excellent performance on protein-ligand docking problems, it cannot memorize the history information that it has accessed, which makes discovering some promising solutions laborious. This article presents a novel optimization algorithm (HIGA), which is based on LGA, for solving protein-ligand docking problems with the aim of overcoming the drawback mentioned above. A running history information guided model, which includes CE crossover, ED mutation, and a BSP tree, is applied in the method. The novel algorithm is more efficient at finding the lowest energy of protein-ligand docking. We evaluate the performance of HIGA in comparison with GA, LGA, EDGA, CEPGA, SODOCK, and ABC, and the results indicate that HIGA outperforms the other search algorithms.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

324 KiB

Open AccessArticle

Multi-Objective Optimization Algorithm to Discover Condition-Specific Modules in Multiple Networks

by Xiaoke Ma, Penggang Sun and Jianbang Zhao

Molecules 2017, 22(12), 2228; https://doi.org/10.3390/molecules22122228 - 14 Dec 2017

Cited by 5 | Viewed by 3301

Advances in biological technologies make it possible to generate data for multiple conditions simultaneously. Discovering the condition-specific modules in multiple networks has great merit in understanding the underlying molecular mechanisms of cells. The available algorithms transform the multiple networks into a single-objective optimization problem, which is criticized for its low accuracy. To address this issue, a multi-objective genetic algorithm for condition-specific modules in multiple networks (MOGA-CSM) is developed to discover the condition-specific modules. By using artificial networks, we demonstrate that MOGA-CSM outperforms state-of-the-art methods in terms of accuracy. Furthermore, MOGA-CSM discovers stage-specific modules in breast cancer networks based on The Cancer Genome Atlas (TCGA) data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm provide an effective way to analyze multiple networks.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

1853 KiB

Open AccessArticle

Developing an Agent-Based Drug Model to Investigate the Synergistic Effects of Drug Combinations

by Hongjie Gao, Zuojing Yin, Zhiwei Cao and Le Zhang

Molecules 2017, 22(12), 2209; https://doi.org/10.3390/molecules22122209 - 14 Dec 2017

Cited by 15 | Viewed by 3773

The growth and survival of cancer cells are closely related to their surrounding microenvironment. To understand the regulation under the impact of anti-cancer drugs and their synergistic effects, we have developed a multiscale agent-based model that can investigate the synergistic effects of drug combinations with three innovations. First, it explores the synergistic effects of drug combinations in a huge dose combinational space at the cell line level. Second, it can simulate the interaction between cells and their microenvironment. Third, it employs both local and global optimization algorithms to train the key parameters and validate the predictive power of the model by using experimental data. The research results indicate that our multicellular system can not only describe the interactions between the microenvironment and cells in detail, but also predict the synergistic effects of drug combinations.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

4366 KiB

Open AccessArticle

Detection of Network Motif Based on a Novel Graph Canonization Algorithm from Transcriptional Regulation Networks

by Jialu Hu and Xuequn Shang

Molecules 2017, 22(12), 2194; https://doi.org/10.3390/molecules22122194 - 10 Dec 2017

Cited by 19 | Viewed by 4095

Network motifs are patterns of complex networks occurring significantly more frequently than those in random networks. They have been considered as fundamental building blocks of complex networks. Therefore, the detection of network motifs in transcriptional regulation networks is a crucial step in understanding the mechanism of transcriptional regulation and network evolution. The search for network motifs is similar to solving subgraph searching problems, which have been proven to be NP-complete. To quickly and effectively count subgraphs of a large biological network, we propose a novel graph canonization algorithm based on resolving sets. This method has been implemented in a command line interface (CLI) program sgip using the SeqAn library. Compared to Babai’s algorithm, this approach has a tighter complexity bound, $o(\exp(\sqrt{n} \log^{2} n + 4 \log n))$, on strongly regular graphs. Results on several simulated datasets and transcriptional regulation networks indicate that sgip outperforms nauty on many graph cases. The source code of sgip is freely accessible at https://github.com/seqan/seqan/tree/master/apps/sgip and the binary code at http://packages.seqan.de/sgip/.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
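
To illustrate why graph canonization helps with motif counting, here is a minimal Python sketch that assigns every small directed subgraph a canonical label, so that isomorphic subgraphs can be tallied under one key. It uses brute-force enumeration of all node relabelings, which is only feasible for very small subgraphs; it is not the resolving-set algorithm implemented in sgip, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(nodes, edges):
    """Smallest sorted edge tuple over all relabelings of a directed graph."""
    nodes = list(nodes)
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        candidate = tuple(sorted((relabel[u], relabel[v]) for u, v in edges))
        if best is None or candidate < best:
            best = candidate
    return best


def is_weakly_connected(nodes, edges):
    """True if the graph is connected when edge directions are ignored."""
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(nodes)


def count_subgraph_classes(edges, size=3):
    """Count induced, weakly connected subgraphs of `size` nodes per isomorphism class."""
    all_nodes = sorted({n for edge in edges for n in edge})
    counts = Counter()
    for subset in combinations(all_nodes, size):
        chosen = set(subset)
        induced = [(u, v) for u, v in set(edges) if u in chosen and v in chosen]
        if is_weakly_connected(subset, induced):
            counts[canonical_form(subset, induced)] += 1
    return counts


# Toy regulatory network: one feed-forward loop (1->2, 2->3, 1->3) plus an edge 3->4.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
toy_counts = count_subgraph_classes(toy_edges, size=3)
feed_forward = canonical_form("abc", [("a", "b"), ("b", "c"), ("a", "c")])
chain = canonical_form("abc", [("a", "b"), ("b", "c")])
```

Because the label is the minimum over all relabelings, two subgraphs receive the same label exactly when they are isomorphic. The factorial cost of this brute-force search is what motivates faster canonization algorithms for larger graphs.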

2297 KiB

Open AccessArticle

A Seed Expansion Graph Clustering Method for Protein Complexes Detection in Protein Interaction Networks

by Jie Wang, Wenping Zheng, Yuhua Qian and Jiye Liang

Molecules 2017, 22(12), 2179; https://doi.org/10.3390/molecules22122179 - 8 Dec 2017

Cited by 9 | Viewed by 4631

Most proteins perform their biological functions while interacting as complexes. The detection of protein complexes is an important task not only for understanding the relationship between functions and structures of biological networks, but also for predicting the function of unknown proteins. We present a new nodal metric that integrates a node's local topological information. The metric reflects how well the node represents a cluster of a protein-protein interaction (PPI) network within a larger local neighborhood. Based on the metric, we propose a seed-expansion graph clustering algorithm (SEGC) for protein complex detection in PPI networks. A roulette wheel strategy is used in the selection of the seed to enhance the diversity of clustering. For a candidate node u, we define its closeness to a cluster C, denoted as NC(u, C), by combining the density of the cluster C and the connection between the node u and C. In SEGC, a cluster, which initially consists of only a seed node, is extended by recursively adding nodes from its neighbors according to the closeness, until all neighbors fail the process of expansion. We compare the F-measure and accuracy of the proposed SEGC algorithm with other algorithms on Saccharomyces cerevisiae protein interaction networks. The experimental results show that SEGC outperforms other algorithms under full coverage.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
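
The seed-expansion idea in the abstract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SEGC algorithm itself: the `closeness` function below is a hypothetical stand-in for NC(u, C) (it uses only the fraction of cluster members adjacent to the candidate and ignores cluster density), the roulette wheel in `pick_seed` is weighted by node degree rather than by the paper's nodal metric, and the threshold value is arbitrary.

```python
import random


def closeness(node, cluster, adj):
    """Hypothetical stand-in for NC(u, C): share of cluster members adjacent to `node`."""
    return len(adj[node] & cluster) / len(cluster)


def pick_seed(adj, rng):
    """Roulette-wheel selection: nodes are drawn with probability proportional to degree."""
    nodes = sorted(adj)
    weights = [len(adj[n]) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def expand_from_seed(seed, adj, threshold=0.5):
    """Grow a cluster from `seed` until no neighboring node is close enough."""
    cluster = {seed}
    while True:
        # Candidates are neighbors of the current cluster that are not yet members.
        frontier = set().union(*(adj[n] for n in cluster)) - cluster
        if not frontier:
            break
        best = max(sorted(frontier), key=lambda u: closeness(u, cluster, adj))
        if closeness(best, cluster, adj) < threshold:
            break  # every neighbor fails the expansion test
        cluster.add(best)
    return cluster


# Toy interaction network: two triangles joined by the single edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
cluster_a = expand_from_seed("a", adj)
cluster_e = expand_from_seed("e", adj)
random_seed_node = pick_seed(adj, random.Random(0))
```

On this toy graph, expansion from either triangle stops at the bridge, because the node on the other side is adjacent to only one of the three cluster members.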

1541 KiB

Open AccessArticle

A Robust Manifold Graph Regularized Nonnegative Matrix Factorization Algorithm for Cancer Gene Clustering

by Rong Zhu, Jin-Xing Liu, Yuan-Ke Zhang and Ying Guo

Molecules 2017, 22(12), 2131; https://doi.org/10.3390/molecules22122131 - 2 Dec 2017

Cited by 16 | Viewed by 4864

Detecting genomes with similar expression patterns using clustering techniques plays an important role in gene expression data analysis. Non-negative matrix factorization (NMF) is an effective method for the clustering analysis of gene expression data. However, the NMF-based method is performed within the Euclidean space, and it is usually inappropriate for revealing the intrinsic geometric structure of data space. In order to overcome this shortcoming, Cai et al. proposed a novel algorithm, called graph regularized non-negative matrix factorization (GNMF). Motivated by the topological structure of the GNMF-based method, we propose an improved graph regularized non-negative matrix factorization (GNMF) to facilitate the display of the geometric structure of data space. Robust manifold non-negative matrix factorization (RM-GNMF) is designed for cancer gene clustering, leading to an enhancement of the GNMF-based algorithm in terms of robustness. We combine the $l_{2,1}$-norm NMF with spectral clustering to conduct wide-ranging experiments on three known datasets. Clustering results indicate that the proposed method outperforms the previous methods, which displays the latest application of the RM-GNMF-based method in cancer gene clustering.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
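
For readers unfamiliar with the baseline that GNMF and RM-GNMF build on, the following is a minimal Python sketch of plain NMF with the classic multiplicative update rules for the squared Frobenius error. It does not include the graph regularizer or the $l_{2,1}$-norm loss described in the abstract; the helper names, the toy matrix, the iteration count, and the small `eps` added to the denominators are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy expression matrix (rows: genes, columns: samples) with two visible blocks.
v = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 4.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
]
w, h, errors = nmf(v, rank=2)
# Assign each sample (column) to the component with the largest coefficient in H.
labels = [max(range(2), key=lambda k: h[k][j]) for j in range(len(v[0]))]
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, and the columns of H can be read as soft cluster memberships for the samples.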

1825 KiB

Open AccessArticle

An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer

by Xi Yang, Chengkun Wu, Kai Lu, Lin Fang, Yong Zhang, Shengkang Li, Guixin Guo and YunFei Du

Molecules 2017, 22(12), 2116; https://doi.org/10.3390/molecules22122116 - 1 Dec 2017

Cited by 3 | Viewed by 4648

Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the “allocate-when-needed” paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))

1092 KiB

Open AccessArticle

Cancer Classification Based on Support Vector Machine Optimized by Particle Swarm Optimization and Artificial Bee Colony

by Lingyun Gao, Mingquan Ye and Changrong Wu

Molecules 2017, 22(12), 2086; https://doi.org/10.3390/molecules22122086 - 29 Nov 2017

Cited by 44 | Viewed by 6046

Intelligent optimization algorithms have advantages in dealing with complex nonlinear problems, offering good flexibility and adaptability. In this paper, the FCBF (Fast Correlation-Based Feature selection) method is used to filter irrelevant and redundant features in order to improve the quality of cancer classification. Then, we perform classification based on SVM (Support Vector Machine) optimized by PSO (Particle Swarm Optimization) combined with ABC (Artificial Bee Colony) approaches, which is represented as PA-SVM. The proposed PA-SVM method is applied to nine cancer datasets, including five datasets of outcome prediction and a protein dataset of ovarian cancer. By comparison with other classification methods, the results demonstrate the effectiveness and the robustness of the proposed PA-SVM method in handling various types of data for cancer classification.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
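
As a rough illustration of the PSO component mentioned in the abstract above, the following Python sketch runs a basic particle swarm over a two-dimensional search space. It is not the PA-SVM method: there is no ABC step and no SVM, and the quadratic `toy_error` function is a hypothetical stand-in for a cross-validation error measured over two SVM hyperparameters on a log scale. The inertia and acceleration constants are common textbook defaults, not values taken from the paper.

```python
import random


def pso(objective, bounds, n_particles=20, iterations=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_error(params):
    """Hypothetical error surface with its minimum at (3, -5)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 5.0) ** 2


search_box = [(-10.0, 10.0), (-10.0, 10.0)]
best_params, best_error, history = pso(toy_error, search_box)
```

In a real hyperparameter search, `objective` would train and cross-validate a classifier for each candidate parameter vector, which makes every evaluation far more expensive than in this toy setting.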

610 KiB

Open AccessArticle

Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image

by Kele Xu, Dawei Feng and Haibo Mi

Molecules 2017, 22(12), 2054; https://doi.org/10.3390/molecules22122054 - 23 Nov 2017

Cited by 219 | Viewed by 11618

The automatic detection of diabetic retinopathy is of vital importance, as it is the main cause of irreversible vision loss in the working-age population in the developed world. The early detection of diabetic retinopathy can be very helpful for clinical treatment; although several different feature extraction approaches have been proposed, the classification task for retinal images is still tedious even for trained clinicians. Recently, deep convolutional neural networks have shown superior performance in image classification compared to previous handcrafted feature-based image classification methods. Thus, in this paper, we explored the use of deep convolutional neural network methodology for the automatic classification of diabetic retinopathy using color fundus images, and obtained an accuracy of 94.5% on our dataset, outperforming the results obtained by using classical approaches.

(This article belongs to the Special Issue Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017))
