Article

Entropy-Based Graph Clustering of PPI Networks for Predicting Overlapping Functional Modules of Proteins

1
Department of Biostatistics, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
2
National Health Big Data Clinical Research Institute, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
3
Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
4
Division of Digital Healthcare, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
*
Author to whom correspondence should be addressed.
Entropy 2021, 23(10), 1271; https://doi.org/10.3390/e23101271
Submission received: 10 August 2021 / Revised: 25 September 2021 / Accepted: 25 September 2021 / Published: 28 September 2021
(This article belongs to the Special Issue Networks and Systems in Bioinformatics)

Abstract

Functional modules can be predicted using genome-wide protein–protein interactions (PPIs) from a systematic perspective. Various graph clustering algorithms have been applied to PPI networks for this task. In particular, the detection of overlapping clusters is necessary because a protein is involved in multiple functions under different conditions. Graph entropy (GE) is a novel metric to assess the quality of clusters in a large, complex network. In this study, the unweighted and weighted GE algorithms are evaluated to validate their ability to predict functional modules. To measure clustering accuracy, the clustering results are compared to protein complexes and Gene Ontology (GO) annotations as references. We demonstrate that the GE algorithm is more accurate in overlapping clusters than the other competing methods. Moreover, we confirm the biological feasibility of the proteins that occur most frequently in the set of identified clusters. Finally, novel proteins for the additional annotation of GO terms are revealed.

1. Introduction

A functional module is a discrete entity whose function is separable from those of other modules. Functional modules overlap with each other because a protein performs multiple functions under different conditions [1]. A protein complex is a multiprotein unit composed of several proteins linked by non-covalent bonds. A protein can be included as a subunit in multiple complexes of oligomeric structures. Functional modules or protein complexes can be predicted using protein–protein interactions (PPIs) from a systematic perspective. PPIs can be represented as a network, which is an undirected graph. The discovery of the entire set of functional modules from genome-wide PPI networks is an important goal of functional genomics [2]. Detecting overlapping clusters is also useful for predicting functional modules at the genome scale [3].
Various graph clustering algorithms have been applied to biological networks. Graph clustering algorithms can be divided into two groups: partition-based and local search algorithms. Partition-based algorithms search for the optimal partitioning of a graph. For example, Markov clustering (MCL) [4] is a partition-based clustering algorithm for weighted networks. This algorithm strengthens and weakens the connections iteratively through Markov chains to determine the optimal partition. InfoMap [5,6] is also a partition-based clustering algorithm that was originally designed for directed and weighted networks. However, it can be applied to undirected graphs by considering all the edges as bidirectional. InfoMap uses an entropy metric to determine the optimal partition by minimizing both the local entropy per cluster and the global entropy. Hierarchical algorithms can be included in the category of partition-based clustering. They repeatedly merge the closest subgraphs or recursively divide a graph into two subgraphs to achieve the best partition. The major characteristic of these partition-based clustering algorithms is that they are unable to detect overlapping clusters, that is, two or more clusters do not share any nodes.
Local search algorithms repeatedly search for the best cluster in a local area to generate a set of clusters. They use their own modularity functions for the local optimization. MCODE [7] is one of the most prevalent graph clustering algorithms for biological networks. This algorithm follows the seed growth procedure for a local search. For each cluster, the selected seed node grows by adding neighbors that have a score above a given threshold. As a severe disadvantage, MCODE requires the setting of many parameters for scoring and adjusting the cluster growth, and the clustering results are sensitive to these parameter settings. This indicates that the MCODE algorithm is unsuitable for unsupervised learning. CFinder [8] is a local search algorithm that uses the clique percolation technique. CFinder finds the cliques with k nodes, called k-cliques, and iteratively merges them if they share (k − 1) nodes. Because CFinder must search for cliques to find each cluster, its efficiency and scalability are typically limited, particularly if the network is large and has complex connectivity. The graph entropy (GE) algorithm [9,10] also performs a local search following the seed growth procedure. For each cluster, the selected seed node grows based on the novel metric of GE. Because these local search algorithms find each cluster independently, the resultant clusters can overlap.
Recent studies have emphasized the importance of detecting overlapping clusters. Recently proposed methods include overlapping MCL, which iteratively forms overlapping clusters [11]; the overlapping cluster generator, which uses an extended modularity for overlapping clusters [12]; pairwise constraint non-negative matrix tri-factorization, which finds overlapping functional modules through matrix factorization [13]; and a method that forms nested clusters with a greedy search algorithm [14].
In this study, we evaluated how well the unweighted and weighted GE algorithms predict functional modules from PPI networks. The accuracy of the GE algorithm was compared with that of competing graph clustering algorithms. We also assessed the contributions of overlapping clusters in terms of functional module prediction.

2. Materials and Methods

2.1. PPI Datasets

We used two datasets, STRING and BioPlex, as PPI networks for Homo sapiens. The STRING database [15,16] provides broadly integrated interactions and a confidence score for each interaction. The confidence score of STRING [17] corresponds to the probability of finding a linked protein within the same pathway in KEGG [18]. For PPIs from STRING, we used only the physical links with confidence scores of 700 or higher. The BioPlex network [19] consists of PPIs obtained using high-throughput affinity-purification mass spectrometry. Each protein in both datasets was identified by a unique, capitalized gene symbol. After removing redundant links and self-loops, 4,338,217 and 118,162 links remained in the STRING and BioPlex networks, respectively.
To analyze the weighted networks, we used the confidence scores of PPIs in STRING as the probabilistic weights of edges. We also applied topological weights to PPIs in STRING and BioPlex by computing the ratios of the common neighboring nodes using the Jaccard index.
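The topological weighting can be illustrated with a minimal sketch. It assumes that the weight of an edge (u, v) is the Jaccard index of the neighbor sets of u and v, and that a node is not counted as its own neighbor; the text does not specify this detail, so it is an assumption. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map, dropping self-loops and duplicate links."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        adj[u].add(v)
        adj[v].add(u)
    return adj

def jaccard_weights(edges):
    """Return {(u, v): |N(u) & N(v)| / |N(u) | N(v)|} for every edge with u < v.

    Assumption: N(x) is the open neighborhood (x itself is not included).
    """
    adj = build_adjacency(edges)
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                common = adj[u] & adj[v]
                union = adj[u] | adj[v]
                weights[(u, v)] = len(common) / len(union)
    return weights

# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
weights = jaccard_weights(toy_edges)
```

In this toy network the edge A-B lies inside the triangle and receives a weight of 1/3, whereas the pendant edge C-D has no common neighbor and receives a weight of 0.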

2.2. References

To measure clustering accuracy on the PPI networks, we compared the clustering results with protein complexes and Gene Ontology (GO) annotation data. Protein complexes were collected from both large- and small-scale experimental results in CORUM [20] and PCDq [21]. The integrated dataset included 2576 distinct proteins.
GO [22] is the most widely referenced ontology database unifying biological representation and provides annotations of molecular products to the biological descriptions based on published evidence. For GO annotations, we combined the terms of the biological process and molecular function sub-ontologies to make 16,588 reference clusters. GOATOOLS [23] was used to identify each description for annotation.

2.3. Graph Clustering Algorithms

2.3.1. Graph Entropy Algorithm

A previous study [9,10] introduced a graph clustering algorithm based on the information-theoretic definition of GE, which is a measure of modularity in a graph. Suppose an undirected graph is partitioned into k subgraphs, $C_1, C_2, \dots, C_k$; the entropy of each node v is computed as follows:
$$e(v) = -\sum_{i=1}^{k} p(x_i) \log p(x_i) \tag{1}$$
where $p(x_i)$ is the ratio of the edges between v and the nodes of $C_i$ to all edges of v. The entropy $e(G)$ of a graph G is then defined as the sum of the node entropies for all nodes in G.
$$e(G) = \sum_{i=1}^{N} e(v_i) \tag{2}$$
where N is the total number of nodes in G. The lowest GE in this equation indicates the highest modularity of the partition of G.
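The two definitions above can be sketched in a few lines. The sketch assumes base-2 logarithms (the base is not stated in the text) and the usual convention that subgraphs receiving none of a node's edges contribute nothing to the sum. The function names, the `membership` map, and the toy graph are hypothetical.

```python
import math

def node_entropy(v, adj, membership):
    """Equation (1): entropy of how the edges of v are spread over the subgraphs."""
    counts = {}
    for u in adj[v]:
        counts[membership[u]] = counts.get(membership[u], 0) + 1
    degree = len(adj[v])
    # Subgraphs with no edge from v are absent from `counts` (0 * log 0 = 0).
    return -sum((c / degree) * math.log2(c / degree) for c in counts.values())

def graph_entropy(adj, membership):
    """Equation (2): sum of the node entropies over all nodes of the graph."""
    return sum(node_entropy(v, adj, membership) for v in adj)

# Hypothetical toy graph: two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

good = graph_entropy(adj, {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1})  # cut at the bridge
bad = graph_entropy(adj, {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1})   # cut through a triangle
```

Cutting the toy graph at the bridge gives a lower graph entropy than cutting through a triangle, which matches the statement that lower GE corresponds to higher modularity.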
The GE algorithm employs the seed–growth procedure, which selects a node as an initial seed cluster and grows the seed cluster to optimize graph modularity. The definition of GE is applied to the local optimization of a cluster. The graph is partitioned into two subgraphs: a seed cluster and the other part. The seed cluster grows to search for the lowest GE. Here, the entropy of each node is computed as follows:
$$e(v) = -p(x_i) \log p(x_i) - p(x_o) \log p(x_o) \tag{3}$$
where $p(x_i)$ is the ratio of the edges between v and any node inside the seed cluster, and $p(x_o)$ is the ratio of the edges between v and any node outside the seed cluster.
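As a quick numerical check of Equation (3), consider a hypothetical node v with four edges, three of which lead to nodes inside the seed cluster. The calculation assumes base-2 logarithms, which the text does not state explicitly.

```latex
p(x_i) = \tfrac{3}{4}, \qquad p(x_o) = \tfrac{1}{4}, \qquad
e(v) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4}
     \approx 0.311 + 0.500 = 0.811
```

Under the convention $0 \log 0 = 0$, a node whose edges all lie on one side of the cluster boundary has $e(v) = 0$, and the maximum $e(v) = 1$ is reached when its edges are split evenly between the inside and the outside.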
In the GE algorithm, the seed–growth process iterates to find a set of clusters. Because each cluster is generated independently, the resultant clusters can overlap with each other even though a seed node is selected outside the clusters that are found during the preceding iterations. A stepwise description of the GE algorithm is provided below. Similar to the definition of a neighbor of a node v as a node linked to v, the neighbor of a cluster C is defined as a node outside C that is linked to any node in C.
  1. Select a seed node. Among the nodes that are not in the output clusters from Step 6, select the one with the highest degree as the seed node.
  2. Form an initial seed cluster including the seed node and its neighbors.
  3. Delete each neighbor of the seed node iteratively from the seed cluster if GE decreases. Check the neighbors in descending order of their degrees.
  4. Add each neighbor of the seed cluster iteratively into the seed cluster if GE decreases. Check the neighbors in descending order of their degrees.
  5. Output the seed cluster if partitioning the graph by the cluster results in the lowest GE.
  6. Repeat Steps 1–5 to output a set of clusters until no seed node remains.
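The stepwise description can be sketched as follows. This is an illustrative reading of the steps, not the authors' implementation: it assumes base-2 logarithms, breaks degree ties by node label, repeats the addition step until no further neighbor lowers GE, and always outputs the grown cluster. All function names and the toy graph are hypothetical.

```python
import math

def node_entropy(v, adj, cluster):
    """Equation (3): entropy of v's edges split between inside and outside the cluster."""
    degree = len(adj[v])
    if degree == 0:
        return 0.0
    p_in = sum(1 for u in adj[v] if u in cluster) / degree
    # 0 * log(0) is taken as 0, so terms with p == 0 are skipped.
    return -sum(p * math.log2(p) for p in (p_in, 1.0 - p_in) if p > 0.0)

def local_entropy(adj, cluster, nodes):
    """Sum of node entropies over `nodes` only."""
    return sum(node_entropy(v, adj, cluster) for v in nodes)

def try_toggle(adj, cluster, x):
    """Move x into or out of the cluster and keep the move only if GE decreases.

    Toggling x changes the entropy of the neighbors of x only, so comparing the
    local sum over adj[x] is equivalent to comparing the entropy of the whole graph.
    """
    before = local_entropy(adj, cluster, adj[x])
    cluster.symmetric_difference_update({x})
    if local_entropy(adj, cluster, adj[x]) < before:
        return True
    cluster.symmetric_difference_update({x})  # revert the move
    return False

def by_degree(adj, nodes):
    """Descending degree; ties are broken by node label (an assumption)."""
    return sorted(nodes, key=lambda n: (-len(adj[n]), n))

def grow_cluster(adj, seed):
    cluster = {seed} | adj[seed]                      # Step 2: seed plus its neighbors
    for x in by_degree(adj, adj[seed]):               # Step 3: try removing neighbors
        try_toggle(adj, cluster, x)
    added = True
    while added:                                      # Step 4: try adding outer neighbors
        added = False
        frontier = {u for v in cluster for u in adj[v]} - cluster
        for x in by_degree(adj, frontier):
            added = try_toggle(adj, cluster, x) or added
    return cluster

def ge_clustering(adj):
    clusters, covered = [], set()
    while len(covered) < len(adj):                    # Step 6: until no seed remains
        seed = by_degree(adj, set(adj) - covered)[0]  # Step 1: highest-degree free node
        cluster = grow_cluster(adj, seed)
        clusters.append(cluster)                      # Step 5: output the cluster
        covered |= cluster
    return clusters

# Hypothetical toy graph: two triangles joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
clusters = ge_clustering(adj)
```

On this toy graph the sketch recovers the two triangles as separate clusters. Because every cluster is grown against the full graph rather than the uncovered remainder, a node already assigned to one cluster can be added to a later one, which is how overlaps arise on denser networks.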

2.3.2. Weighted GE Algorithm

The GE algorithm can be applied to a weighted graph. The weight of each edge indicates the strength of the interaction. To detect strongly connected clusters from a weighted graph, the equation for node entropy must be extended. In this study, we tested two methods to compute the weighted entropy of each node. The first method weights the two terms of Equation (3) using the sums of the edge weights.
$$e(v) = -W_i \cdot p(x_i) \log p(x_i) - W_o \cdot p(x_o) \log p(x_o) \tag{4}$$
In this equation, $W_i = \sum_{i=1}^{m} w_i$, where w is an edge weight, and m is the number of edges of v that are linked to the nodes inside the seed cluster. In other words, the weight $W_i$ is the sum of the edge weights between v and the nodes inside the seed cluster. Similarly, $W_o = \sum_{i=1}^{k} w_i$, where k is the number of edges of v that are linked to the nodes outside the seed cluster. The weighted GE algorithm using Equation (4) is referred to as GE with multiplied weights (GE-MW). The second method replaces the ratios of edges in Equation (3) with the weighted ratios.
$$e(v) = -\frac{W_i}{W_i + W_o} \log \frac{W_i}{W_i + W_o} - \frac{W_o}{W_i + W_o} \log \frac{W_o}{W_i + W_o} \tag{5}$$
The weighted GE algorithm using Equation (5) is referred to as GE with weighted ratios (GE-WR).
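The two weighted node entropies can be compared with a short sketch. It assumes base-2 logarithms and a weighted adjacency map of the form `{node: {neighbor: weight}}`; the function names and the toy weights are hypothetical.

```python
import math

def _plogp(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * math.log2(p) if p > 0.0 else 0.0

def split_weights(v, wadj, cluster):
    """Edge weights of v, separated into edges ending inside and outside the cluster."""
    inside = [w for u, w in wadj[v].items() if u in cluster]
    outside = [w for u, w in wadj[v].items() if u not in cluster]
    return inside, outside

def entropy_mw(v, wadj, cluster):
    """Equation (4), GE-MW: unweighted edge ratios, each term scaled by W_i or W_o."""
    inside, outside = split_weights(v, wadj, cluster)
    degree = len(inside) + len(outside)
    if degree == 0:
        return 0.0
    p_in, p_out = len(inside) / degree, len(outside) / degree
    return -sum(inside) * _plogp(p_in) - sum(outside) * _plogp(p_out)

def entropy_wr(v, wadj, cluster):
    """Equation (5), GE-WR: edge ratios replaced by ratios of summed weights."""
    inside, outside = split_weights(v, wadj, cluster)
    w_in, w_out = sum(inside), sum(outside)
    total = w_in + w_out
    if total == 0:
        return 0.0
    return -_plogp(w_in / total) - _plogp(w_out / total)

# Hypothetical node v: two strong links into the cluster and one weak link out of it.
wadj = {"v": {"a": 0.9, "b": 0.9, "c": 0.2}}
cluster = {"v", "a", "b"}
mw = entropy_mw("v", wadj, cluster)
wr = entropy_wr("v", wadj, cluster)
```

For this node the unweighted entropy of Equation (3) is about 0.918, whereas GE-WR gives about 0.469 because the weak outgoing link carries only 10% of the total weight. GE-MW gives about 0.808; its value is not bounded by 1 because each term is scaled by a sum of weights.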

2.4. Evaluation of Clustering Accuracy

The metrics of average F-score (Equation (6)) and average precision (Equation (7)) were used to evaluate the accuracy of clusters in comparison with protein complexes or GO annotations. For each cluster, the highest value over all references was taken, and these values were averaged over all resultant clusters. In this evaluation, we excluded the proteins that do not exist in PPIs from the references. We also excluded the proteins that do not exist in the reference set from the clusters.
Let the clusters be a set $\{C_1, C_2, \dots, C_l\}$ and the reference be a set $\{r_1, r_2, \dots, r_p\}$. The precision of a cluster $C_i$ compared to $r_j$ can be expressed as follows: $P_{ij} = |C_i \cap r_j| / |C_i|$. The recall of a cluster $C_i$ compared to $r_j$ can be expressed as follows: $R_{ij} = |C_i \cap r_j| / |r_j|$. The average F-score $\bar{F}$ and average precision $\bar{P}$ are expressed using Equations (6) and (7), respectively.
$$\bar{F} = \frac{1}{l} \sum_{i=1}^{l} \max_{j=1}^{p} \frac{2 \times P_{ij} \times R_{ij}}{P_{ij} + R_{ij}}, \quad i = 1, 2, \dots, l \tag{6}$$
$$\bar{P} = \frac{1}{l} \sum_{i=1}^{l} \max_{j=1}^{p} P_{ij}, \quad i = 1, 2, \dots, l \tag{7}$$
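Equations (6) and (7) translate directly into code. The sketch below assumes that clusters and references are non-empty Python sets that have already been filtered as described above; the function names and the toy data are hypothetical.

```python
def precision(cluster, ref):
    """P_ij = |C_i & r_j| / |C_i|"""
    return len(cluster & ref) / len(cluster)

def recall(cluster, ref):
    """R_ij = |C_i & r_j| / |r_j|"""
    return len(cluster & ref) / len(ref)

def f_score(cluster, ref):
    """Harmonic mean of precision and recall; 0 when the two sets are disjoint."""
    p, r = precision(cluster, ref), recall(cluster, ref)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

def average_scores(clusters, references):
    """Return (average F-score, average precision) as in Equations (6) and (7)."""
    best_f = [max(f_score(c, r) for r in references) for c in clusters]
    best_p = [max(precision(c, r) for r in references) for c in clusters]
    return sum(best_f) / len(clusters), sum(best_p) / len(clusters)

# Hypothetical toy data.
clusters = [{"A", "B", "C"}, {"D", "E"}]
references = [{"A", "B"}, {"C", "D", "E", "F"}]
avg_f, avg_p = average_scores(clusters, references)
```

Note that the best-matching reference is chosen separately for the F-score and for precision, so the two maxima for a given cluster may come from different references.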
In order to measure the proportion of functionally homogeneous modules in the set of clusters, we referred to a related previous study [24]. Among all clusters, we measured the proportion of the clusters with precision of 0.6 or greater in comparison with GO annotations.
The feasibility of overlapping cluster detection was also evaluated. A node in a PPI network that appears twice or more in the set of clusters is defined as an overlapping node, and a cluster that includes at least one overlapping node is defined as an overlapping cluster. We evaluated the accuracy of the overlapping clusters obtained through each method by comparing them with the references.
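The definitions of overlapping nodes and overlapping clusters can be made concrete with a small sketch. It assumes that each cluster is a Python set of protein identifiers; the function names and the toy clusters are hypothetical.

```python
from collections import Counter

def overlapping_nodes(clusters):
    """Nodes that appear in two or more clusters."""
    counts = Counter(node for cluster in clusters for node in cluster)
    return {node for node, n in counts.items() if n >= 2}

def overlapping_clusters(clusters):
    """Clusters that contain at least one overlapping node."""
    shared = overlapping_nodes(clusters)
    return [cluster for cluster in clusters if cluster & shared]

# Hypothetical toy clusters: C is shared by the first two clusters.
clusters = [{"A", "B", "C"}, {"C", "D"}, {"E", "F"}]
shared = overlapping_nodes(clusters)
overlap = overlapping_clusters(clusters)
ratio = len(overlap) / len(clusters)  # ratio of overlapping clusters
```

Here C is the only overlapping node, so two of the three clusters are overlapping clusters.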

3. Results

3.1. Experimental Settings

We implemented the unweighted and weighted GE algorithms, MCODE, CFinder, InfoMap, and MCL for accuracy comparison. Among the selected algorithms, GE and InfoMap do not have any parameters. However, the other methods require parameter settings. We used the same parameter settings for these unsupervised methods on two datasets, STRING and BioPlex. The recommended values from the original study that introduced each method were applied.
MCODE requires many parameter settings. The “degree cutoff” parameter controls the minimum degree of a node to be scored, the “node density cutoff” parameter describes a density threshold for neighbors of a current cluster to be added, the “node score cutoff” parameter controls the score of a node to be added to the current cluster, the “k-core” parameter filters out the clusters that do not contain the nodes of at least k degrees, and the “max depth” parameter limits the distance from the seed node. Our experimental settings for MCODE parameter values are as follows: we set the degree cutoff to 2, node density cutoff to 0.1, node score cutoff to 0.2, and k-core to 2.
CFinder requires a k value to search for k-cliques. It can also run on a weighted network with an intensity parameter. A clique is added to a cluster only if its intensity, the geometric mean of the edge weights in the clique, is larger than a threshold. We used k = 3 and intensity threshold = 0 for unweighted networks and intensity threshold = 0.5 for weighted networks.
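The intensity criterion can be illustrated with a short sketch. This is not CFinder code; it only computes the geometric mean of the edge weights of a clique as described above, assuming strictly positive weights stored in a dictionary keyed by unordered node pairs. The function name and the toy weights are hypothetical.

```python
import math
from itertools import combinations

def clique_intensity(clique, weight):
    """Geometric mean of the weights of all edges between members of the clique.

    `weight` maps frozenset({u, v}) to a strictly positive edge weight.
    """
    logs = [math.log(weight[frozenset(pair)]) for pair in combinations(clique, 2)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical 3-clique with one weak edge.
weight = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.2,
}
intensity = clique_intensity(["A", "B", "C"], weight)
passes = intensity > 0.5  # threshold used for weighted networks in this study
```

The geometric mean is sensitive to weak edges: lowering the B-C weight from 0.2 to 0.1 would drop the intensity from about 0.52 to about 0.42, below the 0.5 threshold.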
MCL requires inflation as a parameter that controls the extent of strengthening and weakening. This parameter influences the granularity of clusters. We set inflation = 3. The CDlib [25] and NetworkX [26] libraries were used to implement MCL and InfoMap.
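The effect of the inflation parameter can be seen in a minimal sketch of the inflation operator applied to a single column of the MCL transition matrix. This is the standard operator from the MCL literature written in plain Python, not the library implementation used in this study; the function name and the example column are hypothetical.

```python
def inflate(column, r):
    """Raise each transition probability to the power r, then renormalize to sum to 1."""
    powered = [p ** r for p in column]
    total = sum(powered)
    return [p / total for p in powered]

# Hypothetical column of transition probabilities out of one node.
column = [0.5, 0.3, 0.2]
inflated = inflate(column, 3)  # inflation = 3, as set in this study
```

With inflation = 3, the strongest transition grows from 0.5 to about 0.78 while the weakest shrinks from 0.2 to 0.05. Larger inflation values sharpen this contrast faster and tend to produce finer-grained clusters.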

3.2. Clustering Results

We excluded singletons, that is, clusters having only one node, from the obtained clusters. The number of clusters and the average cluster size are compared in Table 1. On the STRING PPI dataset, which is a large network, the GE and MCL algorithms generated a larger number of clusters than the others. This suggests that they are reliable methods for genome-wide analysis of large networks. Conversely, such a number could not be obtained for the small network, the BioPlex PPI dataset. The unweighted GE algorithm removed a large number of singletons obtained from the small network.
For the reference datasets, the average size of protein complexes was 4.4, and that of GO annotations was 19.9. The clustering results of GE, CFinder, and MCL had an average size similar to those values. However, the clustering results of MCODE and InfoMap had a significantly larger average size than the references.

3.3. Accuracy Evaluation of Clusters

The performance of the selected graph clustering algorithms was evaluated by comparing their clustering results with two reference datasets, protein complexes and GO annotations. Table 2 shows the $\bar{F}$ scores, and Table 3 shows the $\bar{P}$ scores and the proportion of functionally homogeneous modules. For the STRING dataset, the GE, MCODE, InfoMap, and MCL algorithms were applied. Because CFinder is not suitable for application to a large complex network, it could not be tested with the STRING dataset. In our experiment, CFinder did not complete within 50 h on a machine with a Core i9 CPU, 32 GB of DDR4 memory, and an RTX 3070 GPU. However, for the BioPlex dataset, CFinder and the above four methods were applied. The elapsed time of CFinder for the BioPlex dataset was 2 h 6 min, whereas GE produced an entire set of clusters in 7 min.
To evaluate the edge weighting, three cases of unweighted, probabilistic, and topological weights were examined. The confidence scores of PPIs in the STRING dataset were used as the probabilistic weights. In summary, for the STRING dataset, all three cases were applied, whereas two cases—unweighted and topological weights—were applied for the BioPlex dataset. To implement weighted GE, we used the GE-WR in Equation (5).
In the $\bar{F}$-score evaluation in Table 2, for the STRING dataset, GE, InfoMap, and MCL excelled in comparison with protein complexes, and GE and InfoMap stood out in comparison with GO annotations. For the BioPlex dataset, GE, CFinder, and MCL excelled in comparison with protein complexes, and GE and MCL excelled in comparison with GO annotations. That is, the GE algorithm ranked among the top methods in all four cases. The $\bar{P}$ scores in Table 3 showed a similar pattern. It is also noteworthy that the proportion of functionally homogeneous modules among the clusters from the GE algorithm is among the highest. This means that most clusters from the GE algorithm are composed of proteins with the same function.

3.4. Accuracy Evaluation of Overlapping Clusters

Among the graph clustering algorithms selected for our experiment, the partition-based methods of InfoMap and MCL are unable to detect overlapping clusters. For GE, MCODE, and CFinder, the ratios of overlapping clusters are listed in Table 4. MCODE produced the highest ratio of overlapping clusters. However, the number of clusters of MCODE was significantly smaller than that of GE.
We evaluated the accuracy of the overlapping clusters collected from the clustering results of GE, MCODE, and CFinder. Table 5 shows the $\bar{F}$ scores measured for overlapping clusters only. It can be observed that as the number of clusters is reduced, the overall accuracy decreases. For the STRING dataset, GE using probabilistic weights had the highest $\bar{F}$ score (0.375) compared to protein complexes, and unweighted GE had the highest $\bar{F}$ score (0.537) compared to GO annotations. For the BioPlex dataset, CFinder using topological weights had the highest $\bar{F}$ score (0.432) compared to protein complexes, and GE using topological weights had the highest $\bar{F}$ score (0.359) compared to GO annotations. Overall, in the evaluation of overlapping clusters, GE exhibited the best performance.
We also compared the precision of the overlapping clusters to assess whether the members of an overlapping cluster were included in the same protein complex or GO annotation. As shown in Table 6, the unweighted GE method showed the highest precision for all PPI datasets and references. For the STRING dataset, the average precision was 0.308 compared to protein complexes and 0.880 compared to GO annotations. For the BioPlex dataset, the average precision was 0.575 compared to protein complexes and 0.915 compared to GO annotations. When the overlapping clusters are compared to GO annotations, the $\bar{P}$ scores in Table 6 are remarkably higher than the $\bar{F}$ scores in Table 5. This result was caused by the relatively large size of the GO annotations used as a reference, as well as the large number of GO annotations.
For a more detailed comparison of the accuracy of overlapping clusters, Figure 1 and Figure 2 show the distributions of the values from each accuracy metric using boxplots. The distributions of F-scores of the overlapping clusters compared to protein complexes and GO annotations (shown as GOA) are displayed in Figure 1. The distributions of precision scores of the overlapping clusters are also examined in Figure 2. In the case of the median and mean values, results similar to those in Table 5 and Table 6 can be confirmed. Overall, the distributions demonstrate that unweighted GE and GE with probabilistic weights have higher accuracy than the other cases.
From the precision comparison in Figure 2, it can be seen that the comparison with GO annotations of Figure 2b shows a higher precision value than the comparison with protein complexes of Figure 2a due to larger and more reference clusters of GO annotations than protein complexes. That is, precision is higher because the average size of the reference clusters of GO annotations (19.9) is significantly larger than that of protein complexes (4.4), and the number of reference clusters of GO annotations (16,588) is greater than the number of protein complexes (2576). Larger and more references give a comparative advantage of higher precision.

3.5. Biological Aspects of Clusters from GE Algorithms

To examine the biological aspects of the clustering results from the unweighted and weighted GE algorithms, the STRING dataset was used because of the larger number of PPIs. We considered two aspects: First, the biological suitability of the proteins that appeared most frequently in the clusters was investigated with reference to previous studies, as described in Table 7. Second, novel members of known functional modules from GO were predicted based on the clustering results, as shown in Table 8. Unlike the previous calculation, the F-scores compared to GO annotations were obtained without removing the exclusive proteins from the reference. Newly discovered proteins in the clusters with an F-score greater than 0.9 were treated as novel members.
Table 7 shows that two high-frequency classes of proteins (Rab1 and ITSN) are involved in the regulation of many other proteins, and the third class of proteins, CaM, is involved in increasing the interaction affinity of many proteins. These functional descriptions explain why overlapping proteins appear so frequently across clusters.
In Table 8, two GO terms with an F-score close to 1, that is, GO:0019054 and GO:0070125, were selected. In the case of GO:0019054, its function is described as modulation by virus of host cellular process, and the missing element (KPNA6) can be filled in the Karyopherin proteins; this is easy to reveal intuitively, as confirmed by connections on the PPI network. A recent study [33] also reported that KPNA6 is necessary for replicating viruses such as Zika virus.
In the case of GO:0070125, its function is described as mitochondrial translational elongation, and the newly appeared members were different for each algorithm; therefore, the F-score is also different. First, the novel proteins common to all algorithms are AC004556.3, AC139530.2, HDDC3, HIBCH, ICT1, MRRF, MTIF2, MTIF3, MTRF1L, RPL23L, and RPMS17. Among them, ICT1 has been reported as a putative factor for mitochondrial translational release [34]. A previous study [35] also reported that MTRF1 is a mitochondrial translational release factor, and MRRF is required for ribosome recycling at the termination of mitochondrial translation. Another study [36] reported that MTIF2 and MTIF3 are two initiation factors involved in mitochondrial translation. As mitochondrial ribosomal proteins, RPL23L and RPMS17 are aliases of MRPL23 and MRPS17 from GO annotation, respectively.
Among the novel members of GO:0070125, the exclusive proteins for each method were as follows. Using unweighted GE, the three exclusive proteins from GO annotation were MRPL23, MRPL58, and TSFM, and the two exclusive novel proteins were C12ORF65 and MTG2. C12ORF65, also known as mitochondrial translation release factor in rescue (MTRFR), has been reported to prevent aberrant translation during elongation [37]. MTG2, also known as GTPBP5 according to the HGNC symbols, has been reported to be required for mitochondrial translation [38]. Using GE with probabilistic weights, the two exclusive proteins from GO annotation were MRPL23 and MRPL58, and the two exclusive novel proteins were C12ORF65 and MTG2, identical to those determined using unweighted GE. Using GE with topological weights, the three exclusive proteins from GO annotation were MRPL23, MRPL58, and TSFM, and the three exclusive novel proteins were GUF1, PDF, and SOD2. According to the gene nomenclature, GUF1 is known as a translation factor, mitochondrial, or GTP-binding elongation factor; PDF is known as peptide deformylase, mitochondrial; and SOD2 is known as superoxide dismutase 2, mitochondrial.

4. Discussion and Conclusions

GE is a novel metric to quantify the modularity of a set of subgraphs (i.e., clusters) in a large, complex network. The GE-based graph clustering algorithm, which iteratively performs a local search to detect an optimal cluster with the lowest GE, was recently proposed. This algorithm can also be extended to the versions for a weighted network, a graph with edge weights. By applying the unweighted and weighted GE algorithms to PPI networks and evaluating their performance, this study confirms their validity for predicting functional modules of proteins.
Unlike other networks, the major property to be considered in a PPI network is modularity. We applied the GE algorithm to a random network with the same number of nodes and edges (generated with both the Erdős–Rényi method [39] and the Knuth method [40]) and found that no clusters were created and only singletons were left. Even in the case of a node with a high degree, most of its neighbors are eliminated during the node removal stage of the GE algorithm because of low modularity. In other words, no modular structure can be detected in random networks because their connections are random, whereas in PPI networks, clusters of proteins can be detected because similar proteins tend to be linked together as protein complexes or functional modules.
Our clustering results have two major implications. First, the GE algorithms are particularly suitable for genome-wide analysis of PPI networks. Their clustering results most closely represent the reference datasets, a set of protein complexes at the genome scale and comprehensive functional modules in GO annotations, in terms of the number of clusters and the average cluster size. Second, the GE algorithm is suitable for predicting functional modules. That is, it belongs to the upper group in a comparison of prediction accuracy and homogeneity of functional modules and has a comparative advantage in accuracy, especially in a comparison of overlapping clusters.
We propose the following two implications from a biological perspective: First, a protein that occurs in many overlapping clusters can be biologically justified based on the functions it performs. Our results confirmed that such proteins are involved in regulation or interaction affinity. A previous study [12] reported that such proteins are involved in regulating and binding activity. It has also been reported that a large number of proteins are involved in the regulation of endocytosis and cell signaling [41]. Second, we propose novel proteins for annotating the GO terms. This may imply the discovery of novel pathways, such as disease–gene associations [42,43].
Finally, the following limitations were identified in this study. The STRING database has an advantage in that it supports a vast amount of interactome data. However, it provides Ensembl protein IDs, whereas gene symbols are commonly used in other datasets. There might be a limit to completely converting Ensembl protein IDs into gene symbols or vice versa. If the aliases for the gene symbols of all proteins are investigated and standardized, better results can be confirmed. There are also cases in which publication bias is inevitable in known PPI networks [44], and attempts to overcome this issue are still insufficient for a genome-wide study [45]. If these limitations are overcome, more accurate and useful results can be expected for research on genome-wide large PPI networks.

Author Contributions

Conceptualization, H.J. and Y.-R.C.; data curation, H.J., Y.K., Y.-S.J. and Y.-R.C.; formal analysis, H.J., Y.K. and Y.-S.J.; funding acquisition, Y.-R.C.; investigation, H.J. and Y.-R.C.; methodology, H.J. and Y.-R.C.; project administration, D.R.K. and Y.-R.C.; resources, D.R.K. and Y.-R.C.; software, H.J., Y.K. and Y.-S.J.; supervision, Y.-R.C.; validation, H.J., Y.K. and Y.-S.J.; visualization, H.J.; writing—original draft preparation, H.J., Y.K., Y.-S.J. and Y.-R.C.; writing—review and editing, H.J. and Y.-R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government, the Ministry of Science and ICT (No. 2021R1A2C101194611).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

STRING, https://string-db.org/cgi/download, accessed on 17 February 2021; BioPlex, https://bioplex.hms.harvard.edu/interactions.php, accessed on 17 February 2021; GO annotations, http://geneontology.org/docs/download-go-annotations/, accessed on 27 February 2021.

Acknowledgments

We would like to thank Joon-Hyung Sohn of Wonju Medical University’s Central Laboratory for his advice on the use of the data and its biological significance.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PPI        Protein–Protein Interaction
GE         Graph Entropy
GO         Gene Ontology
$\bar{F}$  Average F-Score
$\bar{P}$  Average Precision Score
MCL        Markov Clustering

References

  1. Barabasi, A.L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113.
  2. Pereira-Leal, J.B.; Enright, A.J.; Ouzounis, C.A. Detection of functional modules from protein interaction networks. Proteins Struct. Funct. Bioinform. 2004, 54, 49–57.
  3. Pereira-Leal, J.B.; Levy, E.D.; Teichmann, S.A. The origins and evolution of functional modules: Lessons from protein complexes. Philos. Trans. R. Soc. Biol. Sci. 2006, 361, 507–517.
  4. Enright, A.J.; Van Dongen, S.; Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30, 1575–1584.
  5. Rosvall, M.; Bergstrom, C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 2008, 105, 1118–1123.
  6. Bohlin, L.; Edler, D.; Lancichinetti, A.; Rosvall, M. Community detection and visualization of networks with the map equation framework. In Measuring Scholarly Impact; Springer: Berlin/Heidelberg, Germany, 2014; pp. 3–34.
  7. Bader, G.D.; Hogue, C.W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 2003, 4, 1–27.
  8. Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435, 814–818.
  9. Kenley, E.; Cho, Y. Entropy-Based Graph Clustering: Application to Biological and Social Networks. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, Vancouver, BC, Canada, 11–14 December 2011; pp. 1116–1121.
  10. Kenley, E.; Cho, Y. Detecting protein complexes and functional modules from protein interaction networks: A graph entropy approach. Proteomics 2011, 11, 3835–3844.
  11. Shih, Y.K.; Parthasarathy, S. Identifying functional modules in interaction networks through overlapping Markov clustering. Bioinformatics 2012, 28, i473–i479.
  12. Becker, E.; Robisson, B.; Chapple, C.E.; Guénoche, A.; Brun, C. Multifunctional proteins revealed by overlapping clustering in protein interaction network. Bioinformatics 2012, 28, 84–90.
  13. Liu, G.; Chai, B.; Yang, K.; Yu, J.; Zhou, X. Overlapping functional modules detection in PPI network with pair-wise constrained non-negative matrix tri-factorisation. IET Syst. Biol. 2018, 12, 45–54.
  14. Nepusz, T.; Yu, H.; Paccanaro, A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat. Methods 2012, 9, 471–472.
  15. Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K.P.; et al. STRING v10: Protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015, 43, D447–D452.
  16. Szklarczyk, D.; Morris, J.H.; Cook, H.; Kuhn, M.; Wyder, S.; Simonovic, M.; Santos, A.; Doncheva, N.T.; Roth, A.; Bork, P.; et al. The STRING database in 2017: Quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 2016, 45, D362–D368.
  17. Von Mering, C.; Jensen, L.J.; Snel, B.; Hooper, S.D.; Krupp, M.; Foglierini, M.; Jouffre, N.; Huynen, M.A.; Bork, P. STRING: Known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005, 33, D433–D437.
  18. Kanehisa, M.; Goto, S.; Kawashima, S.; Okuno, Y.; Hattori, M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32, D277–D280.
  19. Huttlin, E.L.; Ting, L.; Bruckner, R.J.; Gebreab, F.; Gygi, M.P.; Szpyt, J.; Tam, S.; Zarraga, G.; Colby, G.; Baltier, K.; et al. The BioPlex network: A systematic exploration of the human interactome. Cell 2015, 162, 425–440.
  20. Ruepp, A.; Waegele, B.; Lechner, M.; Brauner, B.; Dunger-Kaltenbach, I.; Fobo, G.; Frishman, G.; Montrone, C.; Mewes, H.W. CORUM: The comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 2010, 38, D497–D501.
  21. Kikugawa, S.; Nishikata, K.; Murakami, K.; Sato, Y.; Suzuki, M.; Altaf-Ul-Amin, M.; Kanaya, S.; Imanishi, T. PCDq: Human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-Invitational protein-protein interactions integrative dataset. BMC Syst. Biol. 2012, 6, S7.
  22. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019, 47, D330–D338.
  23. Klopfenstein, D.; Zhang, L.; Pedersen, B.S.; Ramírez, F.; Vesztrocy, A.W.; Naldi, A.; Mungall, C.J.; Yunes, J.M.; Botvinnik, O.; Weigel, M.; et al. GOATOOLS: A Python library for Gene Ontology analyses. Sci. Rep. 2018, 8, 10872.
  24. Liu, G.; Wang, H.; Chu, H.; Yu, J.; Zhou, X. Functional diversity of topological modules in human protein-protein interaction networks. Sci. Rep. 2017, 7, 16199.
  25. Rossetti, G.; Milli, L.; Cazabet, R. CDLIB: A python library to extract, compare and evaluate communities from complex networks. Appl. Netw. Sci. 2019, 4, 52.
  26. Hagberg, A.A.; Schult, D.A.; Swart, P.J. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Pasadena, CA, USA, 19–24 August 2008; pp. 11–15.
  27. Cavieres, V.A.; Cerda-Troncoso, C.; Rivera-Dictter, A.; Castro, R.I.; Luchsinger, C.; Santibañez, N.; Burgos, P.V.; Mardones, G.A. Human Golgi phosphoprotein 3 is an effector of RAB1A and RAB1B. PLoS ONE 2020, 15, e0237514.
  28. Mizuno-Yamasaki, E.; Rivera-Molina, F.; Novick, P. GTPase networks in membrane traffic. Annu. Rev. Biochem. 2012, 81, 637–659.
  29. Hunter, M.P.; Russo, A.; O’Bryan, J.P. Emerging roles for intersectin (ITSN) in regulating signaling and disease pathways. Int. J. Mol. Sci. 2013, 14, 7829–7852.
  30. Herrero-Garcia, E.; O’Bryan, J.P. Intersectin scaffold proteins and their role in cell signaling and endocytosis. Biochim. Biophys. Acta (BBA)-Mol. Cell Res. 2017, 1864, 23–30.
  31. Boczek, N.J.; Gomez-Hurtado, N.; Ye, D.; Calvert, M.L.; Tester, D.J.; Kryshtal, D.O.; Hwang, H.S.; Johnson, C.N.; Chazin, W.J.; Loporcaro, C.G.; et al. Spectrum and Prevalence of CALM1-, CALM2-, and CALM3-Encoded Calmodulin Variants in Long QT Syndrome and Functional Characterization of a Novel Long QT Syndrome–Associated Calmodulin Missense Variant, E141G. Circ. Cardiovasc. Genet. 2016, 9, 136–146.
  32. Chin, D.; Means, A.R. Calmodulin: A prototypical calcium sensor. Trends Cell Biol. 2000, 10, 322–328.
  33. Yang, L.; Wang, R.; Yang, S.; Ma, Z.; Lin, S.; Nan, Y.; Li, Q.; Tang, Q.; Zhang, Y.J. Karyopherin alpha 6 is required for replication of porcine reproductive and respiratory syndrome virus and zika virus. J. Virol. 2018, 92, e00072-18.
  34. Richter, R.; Rorbach, J.; Pajak, A.; Smith, P.M.; Wessels, H.J.; Huynen, M.A.; Smeitink, J.A.; Lightowlers, R.N.; Chrzanowska-Lightowlers, Z.M. A functional peptidyl-tRNA hydrolase, ICT1, has been recruited into the human mitochondrial ribosome. EMBO J. 2010, 29, 1116–1125.
  35. Hansen, L.L.; Jørgensen, R.; Justesen, J. Assignment of the Human Mitochondrial Translational Release Factor 1 (MTRF1) to Chromosome 13q14.1→q14.3 and of the Human Mitochondrial Ribosome Recycling Factor (MRRF) to Chromosome 9q32→q34.1 With Radiation Hybrid Mapping. Cytogenet. Cell Genet. 2000, 88, 91–92.
  36. Rudler, D.L.; Hughes, L.A.; Perks, K.L.; Richman, T.R.; Kuznetsova, I.; Ermer, J.A.; Abudulai, L.N.; Shearwood, A.M.J.; Viola, H.M.; Hool, L.C.; et al. Fidelity of translation initiation is required for coordinated respiratory complex assembly. Sci. Adv. 2019, 5, eaay2118.
  37. Desai, N.; Yang, H.; Chandrasekaran, V.; Kazi, R.; Minczuk, M.; Ramakrishnan, V. Elongational stalling activates mitoribosome-associated quality control. Science 2020, 370, 1105–1110.
  38. Maiti, P.; Antonicka, H.; Gingras, A.C.; Shoubridge, E.A.; Barrientos, A. Human GTPBP5 (MTG2) fuels mitoribosome large subunit maturation by facilitating 16S rRNA methylation. Nucleic Acids Res. 2020, 48, 7924–7943.
  39. Renyi, E. On random graph. Publ. Math. 1959, 6, 290–297.
  40. Knuth, D.E. Art of Computer Programming, Volume 2: Seminumerical Algorithms; Addison-Wesley Professional: Boston, MA, USA, 2014.
  41. Sorkin, A.; Von Zastrow, M. Endocytosis and signalling: Intertwining molecular networks. Nat. Rev. Mol. Cell Biol. 2009, 10, 609–622.
  42. Afiqah-Aleng, N.; Altaf-Ul-Amin, M.; Kanaya, S.; Mohamed-Hussein, Z.A. Graph cluster approach in identifying novel proteins and significant pathways involved in polycystic ovary syndrome. Reprod. Biomed. Online 2020, 40, 319–330.
  43. Eguchi, R.; Karim, M.B.; Hu, P.; Sato, T.; Ono, N.; Kanaya, S.; Altaf-Ul-Amin, M. An integrative network-based approach to identify novel disease genes and pathways: A case study in the context of inflammatory bowel disease. BMC Bioinform. 2018, 19, 1–12.
  44. Schaefer, M.H.; Serrano, L.; Andrade-Navarro, M.A. Correcting for the study bias associated with protein–protein interaction measurements reveals differences between protein degree distributions from different cancer types. Front. Genet. 2015, 6, 260.
  45. Luck, K.; Kim, D.K.; Lambourne, L.; Spirohn, K.; Begg, B.E.; Bian, W.; Brignall, R.; Cafarelli, T.; Campos-Laborie, F.J.; Charloteaux, B.; et al. A reference map of the human binary protein interactome. Nature 2020, 580, 402–408.
Figure 1. Boxplots of F-score distributions of overlapping clusters. (a) F-scores of overlapping clusters compared to protein complexes. (b) F-scores of overlapping clusters compared to GO annotations. The green solid triangles and orange lines represent the mean and median values of the distributions, respectively.
Figure 2. Boxplots of precision distributions of overlapping clusters. (a) Precision of overlapping clusters compared to protein complexes. (b) Precision of overlapping clusters compared to GO annotations. The green solid triangles and orange lines represent the mean and median values of the distributions, respectively.
Table 1. Clustering results of the selected graph clustering algorithm.
PPI NetworkAlgorithmWeightNumber of ClustersAverage Cluster Size
STRINGGEUnweighted99519.9
Probabilistic98220.5
Topological44429.9
MCODEUnweighted29871.2
Probabilistic85172.7
Topological29665.2
InfoMapUnweighted21744.0
Probabilistic21943.6
Topological9079.3
MCLUnweighted10618.9
Probabilistic78111.2
Topological4789.2
BioPlexGEUnweighted1264.2
Topological11887.8
MCODEUnweighted113237.8
Topological23770.5
CFinderUnweighted82313.1
Topological14510.4
InfoMapUnweighted51427.2
Topological79109.5
MCLUnweighted30873.2
Topological1547.0
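Table 2. F̄ scores of clusters from the selected graph clustering algorithms.
| PPI Network | Algorithm | Weight | F̄ Score with Protein Complexes | F̄ Score with GO Annotations |
|---|---|---|---|---|
| STRING | GE | Unweighted | 0.567 | 0.554 |
| STRING | GE | Probabilistic | 0.572 | 0.552 |
| STRING | GE | Topological | 0.521 | 0.517 |
| STRING | MCODE | Unweighted | 0.465 | 0.487 |
| STRING | MCODE | Probabilistic | 0.297 | 0.397 |
| STRING | MCODE | Topological | 0.483 | 0.494 |
| STRING | InfoMap | Unweighted | 0.597 | 0.587 |
| STRING | InfoMap | Probabilistic | 0.570 | 0.584 |
| STRING | InfoMap | Topological | 0.593 | 0.600 |
| STRING | MCL | Unweighted | 0.547 | 0.495 |
| STRING | MCL | Probabilistic | 0.538 | 0.494 |
| STRING | MCL | Topological | 0.592 | 0.533 |
| BioPlex | GE | Unweighted | 0.506 | 0.435 |
| BioPlex | GE | Topological | 0.536 | 0.389 |
| BioPlex | MCODE | Unweighted | 0.316 | 0.254 |
| BioPlex | MCODE | Topological | 0.396 | 0.318 |
| BioPlex | CFinder | Unweighted | 0.483 | 0.377 |
| BioPlex | CFinder | Topological | 0.661 | 0.433 |
| BioPlex | InfoMap | Unweighted | 0.440 | 0.296 |
| BioPlex | InfoMap | Topological | 0.372 | 0.388 |
| BioPlex | MCL | Unweighted | 0.534 | 0.378 |
| BioPlex | MCL | Topological | 0.643 | 0.456 |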
Table 3. P̄ scores and the proportion of functionally homogeneous modules among clusters from the selected graph clustering algorithms.
| PPI Network | Algorithm | Weight | P̄ Score with Protein Complexes | P̄ Score with GO Annotations | Proportion of Functionally Homogeneous Modules (%) |
|---|---|---|---|---|---|
| STRING | GE | Unweighted | 0.542 | 0.902 | 98.2 |
| STRING | GE | Probabilistic | 0.543 | 0.902 | 98.2 |
| STRING | GE | Topological | 0.482 | 0.881 | 97.5 |
| STRING | MCODE | Unweighted | 0.432 | 0.856 | 94.3 |
| STRING | MCODE | Probabilistic | 0.238 | 0.806 | 94.1 |
| STRING | MCODE | Topological | 0.451 | 0.875 | 95.6 |
| STRING | InfoMap | Unweighted | 0.692 | 0.932 | 93.6 |
| STRING | InfoMap | Probabilistic | 0.660 | 0.929 | 93.7 |
| STRING | InfoMap | Topological | 0.580 | 0.892 | 95.6 |
| STRING | MCL | Unweighted | 0.583 | 0.895 | 92.5 |
| STRING | MCL | Probabilistic | 0.578 | 0.903 | 96.4 |
| STRING | MCL | Topological | 0.619 | 0.943 | 97.7 |
| BioPlex | GE | Unweighted | 0.775 | 0.911 | 93.2 |
| BioPlex | GE | Topological | 0.577 | 0.791 | 82.3 |
| BioPlex | MCODE | Unweighted | 0.296 | 0.690 | 79.6 |
| BioPlex | MCODE | Topological | 0.377 | 0.769 | 81.0 |
| BioPlex | CFinder | Unweighted | 0.538 | 0.745 | 74.5 |
| BioPlex | CFinder | Topological | 0.714 | 0.930 | 97.9 |
| BioPlex | InfoMap | Unweighted | 0.413 | 0.658 | 64.1 |
| BioPlex | InfoMap | Topological | 0.356 | 0.795 | 86.1 |
| BioPlex | MCL | Unweighted | 0.657 | 0.754 | 64.0 |
| BioPlex | MCL | Topological | 0.787 | 0.947 | 95.8 |
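Table 4. Proportion of overlapping clusters from graph clustering algorithms.
| PPI Network | Algorithm | Weight | Number of Clusters | Number of Overlapping Clusters | Proportion (%) |
|---|---|---|---|---|---|
| STRING | GE | unweighted | 995 | 778 | 78.1 |
| STRING | GE | probabilistic | 982 | 756 | 76.9 |
| STRING | GE | topological | 444 | 277 | 62.3 |
| STRING | MCODE | unweighted | 298 | 250 | 83.8 |
| STRING | MCODE | probabilistic | 85 | 80 | 94.1 |
| STRING | MCODE | topological | 296 | 230 | 77.7 |
| BioPlex | GE | unweighted | 126 | 55 | 43.6 |
| BioPlex | GE | topological | 1188 | 854 | 71.8 |
| BioPlex | MCODE | unweighted | 113 | 112 | 99.1 |
| BioPlex | MCODE | topological | 237 | 192 | 81.0 |
| BioPlex | CFinder | unweighted | 823 | 783 | 95.1 |
| BioPlex | CFinder | topological | 145 | 41 | 28.2 |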
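The Proportion (%) column of Table 4 can be reproduced from the two cluster counts. The following is a minimal sketch, not the authors' code: the function name `overlap_proportion` is hypothetical, and truncating (rather than rounding) to one decimal place is an assumption inferred from the tabulated values, for example 756/982 = 76.98% is listed as 76.9.

```python
def overlap_proportion(num_clusters: int, num_overlapping: int) -> float:
    """Percentage of overlapping clusters, truncated to one decimal place.

    Truncation is an assumption inferred from the values in Table 4.
    """
    if num_clusters <= 0:
        raise ValueError("num_clusters must be positive")
    # Work in tenths of a percent with integer arithmetic so that the
    # truncation step is not affected by floating-point error.
    tenths = (1000 * num_overlapping) // num_clusters
    return tenths / 10


# Counts taken from Table 4 (STRING network, GE algorithm).
print(overlap_proportion(995, 778))  # unweighted
print(overlap_proportion(982, 756))  # probabilistic weights
```

Integer division is used here so that a ratio sitting just below a decimal boundary is not pushed across it by rounding.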
Table 5. F̄ scores of overlapping clusters from GE, MCODE, and CFinder.
| Graph Clustering Algorithm | F̄ Score with Protein Complexes | F̄ Score with GO Annotations |
|---|---|---|
| STRING GE unweighted | 0.371 | 0.537 |
| STRING GE with probabilistic weights | 0.375 | 0.535 |
| STRING GE with topological weights | 0.307 | 0.470 |
| STRING MCODE unweighted | 0.269 | 0.438 |
| STRING MCODE with probabilistic weights | 0.174 | 0.390 |
| STRING MCODE with topological weights | 0.270 | 0.432 |
| BioPlex GE unweighted | 0.381 | 0.342 |
| BioPlex GE with topological weights | 0.321 | 0.359 |
| BioPlex MCODE unweighted | 0.159 | 0.231 |
| BioPlex MCODE with topological weights | 0.219 | 0.248 |
| BioPlex CFinder unweighted | 0.244 | 0.347 |
| BioPlex CFinder with topological weights | 0.432 | 0.349 |
Table 6. P̄ scores of overlapping clusters from GE, MCODE, and CFinder.
| Graph Clustering Algorithm | P̄ Score with Protein Complexes | P̄ Score with GO Annotations |
|---|---|---|
| STRING GE unweighted | 0.308 | 0.880 |
| STRING GE with probabilistic weights | 0.308 | 0.880 |
| STRING GE with topological weights | 0.244 | 0.817 |
| STRING MCODE unweighted | 0.207 | 0.793 |
| STRING MCODE with probabilistic weights | 0.119 | 0.764 |
| STRING MCODE with topological weights | 0.211 | 0.807 |
| BioPlex GE unweighted | 0.575 | 0.915 |
| BioPlex GE with topological weights | 0.319 | 0.718 |
| BioPlex MCODE unweighted | 0.117 | 0.610 |
| BioPlex MCODE with topological weights | 0.168 | 0.659 |
| BioPlex CFinder unweighted | 0.232 | 0.659 |
| BioPlex CFinder with topological weights | 0.455 | 0.847 |
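Table 7. Functions of overlapping proteins with high frequency in the clusters generated by GE.
| Overlapping Protein | Frequency of Appearance: STRING GE Unweighted | Frequency of Appearance: STRING GE with Probabilistic Weights | Frequency of Appearance: STRING GE with Topological Weights | Reported Function |
|---|---|---|---|---|
| RAB1A | 60 | 60 | 10 | Rab1 proteins regulate vesicular transport [27]. Rab GTPases regulate membrane traffic and are involved in many cell types [28]. |
| RAB1B | 60 | 60 | 10 | Rab1 proteins regulate vesicular transport [27]. Rab GTPases regulate membrane traffic and are involved in many cell types [28]. |
| ITSN1 | 55 | 54 | 10 | Intersectins (ITSNs) regulate endocytosis and cell signaling [29]. ITSNs may regulate the interactions of various functions [30]. |
| ITSN2 | 55 | 54 | 10 | Intersectins (ITSNs) regulate endocytosis and cell signaling [29]. ITSNs may regulate the interactions of various functions [30]. |
| CALM1 | 53 | 51 | 15 | Calmodulin (CaM) is an essential protein for calcium ion sensing and signal transduction [31]. CaM enhances the interaction affinity of many proteins [32]. |
| CALM2 | 53 | 51 | 15 | Calmodulin (CaM) is an essential protein for calcium ion sensing and signal transduction [31]. CaM enhances the interaction affinity of many proteins [32]. |
| CALM3 | 53 | 51 | 15 | Calmodulin (CaM) is an essential protein for calcium ion sensing and signal transduction [31]. CaM enhances the interaction affinity of many proteins [32]. |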
Table 8. Proposing novel proteins for additional annotation to GO terms.
| GO Term | GO Name | GO Annotated Proteins | Novel Proteins | Algorithm | F-Score |
|---|---|---|---|---|---|
| GO:0019054 | modulation by virus of host cellular process | KPNA1, KPNA2, KPNA3, KPNA4, KPNA5, KPNA7, KPNB1 | KPNA6 | GE unweighted | 0.933 |
| GO:0019054 | modulation by virus of host cellular process | (same as above) | KPNA6 | GE with probabilistic weights | 0.933 |
| GO:0070125 | mitochondrial translational elongation | AURKAIP1, CHCHD1, DAP3, ERAL1, GADD45GIP1, GFM1, GFM2, MRPL1, MRPL10, MRPL11, MRPL12, MRPL13, MRPL14, MRPL15, MRPL16, MRPL17, MRPL18, MRPL19, MRPL2, MRPL20, MRPL21, MRPL22, MRPL23, MRPL24, MRPL27, MRPL28, MRPL3, MRPL30, MRPL32, MRPL33, MRPL34, MRPL35, MRPL36, MRPL37, MRPL38, MRPL39, MRPL4, MRPL40, MRPL41, MRPL42, MRPL43, MRPL44, MRPL45, MRPL46, MRPL47, MRPL48, MRPL49, MRPL50, MRPL51, MRPL52, MRPL53, MRPL54, MRPL55, MRPL57, MRPL58, MRPL9, MRPS10, MRPS11, MRPS12, MRPS14, MRPS15, MRPS16, MRPS17, MRPS18A, MRPS18B, MRPS18C, MRPS2, MRPS21, MRPS22, MRPS23, MRPS24, MRPS25, MRPS26, MRPS27, MRPS28, MRPS30, MRPS31, MRPS33, MRPS34, MRPS35, MRPS36, MRPS5, MRPS6, MRPS7, MRPS9, OXA1L, PTCD3, TSFM, TUFM | AC004556.3, AC139530.2, C12ORF65, HDDC3, HIBCH, ICT1, MRRF, MTG2, MTIF2, MTIF3, MTRF1L, RPL23L, RPMS17 | GE unweighted | 0.914 |
| GO:0070125 | mitochondrial translational elongation | (same as above) | AC004556.3, AC139530.2, C12ORF65, HDDC3, HIBCH, ICT1, MRRF, MTG2, MTIF2, MTIF3, MTRF1L, RPL23L, RPMS17 | GE with probabilistic weights | 0.920 |
| GO:0070125 | mitochondrial translational elongation | (same as above) | AC004556.3, AC139530.2, GUF1, HDDC3, HIBCH, ICT1, MRRF, MTIF2, MTIF3, MTRF1L, PDF, RPL23L, RPMS17, SOD2 | GE with topological weights | 0.910 |
Bold typefaces indicate a common set of novel proteins from each algorithm.
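The F-score of 0.933 listed for GO:0019054 in Table 8 is consistent with the usual harmonic mean of precision and recall. The sketch below checks this under two assumptions that are not stated in this section: that the F-score is defined in that standard way, and that the cluster consists of exactly the seven annotated proteins plus KPNA6. The function name `f_score` is hypothetical.

```python
def f_score(cluster: set, reference: set) -> float:
    """Harmonic mean of precision and recall between a cluster and a reference set."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)   # fraction of the cluster that is annotated
    recall = overlap / len(reference)    # fraction of annotated proteins recovered
    return 2 * precision * recall / (precision + recall)


# GO:0019054 annotated proteins, as listed in Table 8.
annotated = {"KPNA1", "KPNA2", "KPNA3", "KPNA4", "KPNA5", "KPNA7", "KPNB1"}
# Assumed cluster: every annotated protein plus the proposed novel protein.
cluster = annotated | {"KPNA6"}

score = f_score(cluster, annotated)
print(round(score, 3))
```

With precision 7/8 and recall 7/7, the score is 14/15, which matches the tabulated value to three decimals. The GO:0070125 rows cannot be checked the same way from the table alone, because the exact cluster membership is not listed.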
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Jeong, H.; Kim, Y.; Jung, Y.-S.; Kang, D.R.; Cho, Y.-R. Entropy-Based Graph Clustering of PPI Networks for Predicting Overlapping Functional Modules of Proteins. Entropy 2021, 23, 1271. https://doi.org/10.3390/e23101271
