Symmetry and Complexity in Gene Association Networks Using the Generalized Correlation Coefficient
Abstract
1. Introduction
2. Advancements and Applications of the Generalized Correlation Coefficient
2.1. Theoretical Foundations and Developments of the Generalized Correlation Coefficient
- A cross-product term, which captures the mutual influence between differences in pairs of the variables X and Y.
- A term that quantifies the squared influence of differences in the variable X.
- A term that measures the squared influence of differences in the variable Y.
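Taken together, the three quantities above suggest the product-moment structure of the generalized correlation coefficient. The following is a hedged sketch rather than the paper's exact formula: g denotes a generic kernel applied to pairwise differences, and the sums run over all pairs of observations.

```latex
\rho_g \;=\;
\frac{\sum_{i<j} g(X_i - X_j)\, g(Y_i - Y_j)}
     {\sqrt{\sum_{i<j} g^{2}(X_i - X_j)\;\sum_{i<j} g^{2}(Y_i - Y_j)}}
```

Under this sketch, the identity kernel g(u) = u yields a Pearson-type coefficient, while the sign kernel g(u) = sign(u) yields a Kendall-type coefficient, consistent with the role the text assigns to the flexibility parameter.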
2.2. Practical Implementations and Computational Refinements of GCC
3. Simulation Study
3.1. Simulation Design
- Case 1—Standard bivariate normal distribution without contamination. Samples were drawn from a bivariate normal distribution with fixed means and variances and with correlation coefficients representing no correlation, moderate correlation, and high correlation, respectively. This case evaluates the estimators under ideal conditions, with no contamination and different correlation strengths.
- Case 2—Bivariate normal distribution with shifted means. To assess the robustness of the estimators to location shifts, we generate samples from a bivariate normal distribution with shifted means, keeping the same correlation coefficients as in Case 1. This case evaluates the effect of mean shifts on estimator performance.
- Case 3—Bivariate normal distribution with increased variance. To investigate the impact of increased variability, we generate samples from a bivariate normal distribution with variances four times greater than in the previous cases, the correlation coefficients remaining unchanged. This case simulates scenarios with the high variability common in biological data.
- Case 4—Contaminated bivariate normal distribution. In this case, we create a mixture consisting of 60% of a bivariate normal distribution with high correlation and 40% of a bivariate normal distribution with no correlation. This case evaluates the performance of the estimators in the presence of heterogeneous subpopulations with differing correlation patterns.
- Case 5—Mixture of bivariate normal distributions. To simulate the heterogeneous data commonly observed in gene expression analysis, we generate samples from a mixture of two bivariate normal distributions with different means and/or covariances. The mixture proportions considered are 10%, 30%, and 50%, with a weak correlation. This case evaluates the performance of the estimators when data arise from different subpopulations with distinct correlation patterns.
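The sampling schemes above can be sketched in a few lines. This is a hedged illustration: the specific means, variances, and correlation values are placeholders, since the paper's exact parameter grid is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def bvn(n, mean, var, rho, rng):
    """Draw n samples from a bivariate normal with common per-coordinate
    variance 'var' and correlation 'rho'."""
    cov = np.array([[var, rho * var], [rho * var, var]])
    return rng.multivariate_normal(mean, cov, size=n)

def mixture(n, p, comp_a, comp_b, rng):
    """Contaminated sample: each row comes from comp_a with probability p,
    otherwise from comp_b (the scheme of Cases 4 and 5)."""
    mask = rng.random(n) < p
    return np.where(mask[:, None], comp_a(n), comp_b(n))

n = 500
rho = 0.9  # illustrative high-correlation value, not the paper's exact grid

case1 = bvn(n, [0, 0], 1.0, rho, rng)                     # Case 1: baseline
case2 = bvn(n, [2, 2], 1.0, rho, rng)                     # Case 2: shifted means
case3 = bvn(n, [0, 0], 4.0, rho, rng)                     # Case 3: 4x variance
case4 = mixture(n, 0.6,
                lambda m: bvn(m, [0, 0], 1.0, rho, rng),  # 60% correlated
                lambda m: bvn(m, [0, 0], 1.0, 0.0, rng),  # 40% uncorrelated
                rng)                                       # Case 4
```

Case 5 follows the same `mixture` pattern with components that also differ in their means and/or covariances.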
3.2. Simulation Results
- Case 1—Standard bivariate normal distribution without contamination. The RMSE values for each estimator are reported in Table 1 for the different correlation values, flexibility parameters, and sample sizes. The results presented in Table 1 lead to the following key observations:
- Superior performance of GCC-ML—Across all correlation levels and sample sizes, the ML estimator consistently achieves the lowest RMSE. This demonstrates its robustness and accuracy, particularly for small to moderate sample sizes. The GCC-ML estimator effectively handles different correlation structures, making it a reliable choice in both low- and high-correlation scenarios.
- Convergence with increasing sample size—As the sample size increases, all estimators show a reduction in RMSE, indicating convergence toward the true correlation. For large samples, the RMSE differences between estimators narrow, but GCC-ML continues to exhibit a slight advantage.
- Impact of correlation strength—In high-correlation settings, all estimators improve, with markedly lower RMSE values, reflecting better performance for strong linear relationships. This improvement is more pronounced for large sample sizes, where RMSE values decrease rapidly.
- Effect of the flexibility parameter—This parameter influences the sensitivity of the estimators to different types of dependencies. At its lowest value (similar to the Kendall tau), the RMSE is higher for small sample sizes, indicating sensitivity to rank-based measures. As the parameter increases, the estimators capture more linear dependencies, leading to a decrease in RMSE. Intermediate values offer a balance between rank-based and moment-based correlation properties.
- Relative performance of GCC-U and the adjusted Spearman estimator—While both generally exhibit higher RMSE than GCC-ML, their performance improves with large sample sizes. In small samples, GCC-U tends to slightly overestimate the correlation, especially in low-correlation settings. In addition, the adjusted Spearman estimator tends to underestimate the correlation, particularly for moderate correlations.
These findings indicate that, while the three estimators are consistent as the sample size increases, the ML estimator provides the most reliable and accurate estimates over a wide range of scenarios. The choice of the flexibility parameter should be based on the underlying correlation structure and the desired sensitivity to linear or non-linear dependencies. - Case 2—Bivariate normal distribution with shifted means. In this case, we evaluate the robustness of the estimators when data are drawn from a bivariate normal distribution with shifted means, reflecting deviations commonly encountered in real-world datasets, such as gene expression profiles. Samples were generated while maintaining the same correlation levels as in Case 1. The shift in means introduces an additional layer of complexity, testing the ability of the estimators to adapt to changes in location. Although the variances remain constant, the altered central tendency requires the estimators to perform effectively under different distributional settings. RMSE values for each estimator are presented in Table 2. Key observations from the results in Table 2 are the following:
- Similar to Case 1, the GCC-ML estimator consistently shows the lowest RMSE values across most correlation levels and sample sizes, demonstrating robustness to mean shifts. The estimator remains stable even under these non-standard conditions, with minimal sensitivity to the shifted means, especially for high values of the flexibility parameter (closer to the Pearson correlation), where RMSE is lowest across all sample sizes.
- Both GCC-U and the adjusted Spearman estimator are more affected by the mean shift, particularly for small sample sizes. Their RMSE values increase slightly compared to Case 1, reflecting reduced ability to adapt to location shifts. This effect is more noticeable for low values of the flexibility parameter, suggesting that rank-based estimators are more sensitive to shifts in location. The adjusted Spearman estimator tends to underestimate the correlation but shows less sensitivity to the mean shift than the GCC-U estimator.
- In high-correlation scenarios, the estimators exhibit lower RMSE, confirming their ability to capture strong relationships despite the mean shift. The ML estimator displays the least variability across different flexibility values, maintaining its advantage. For moderate correlations, the mean shift has a more pronounced effect on the adjusted Spearman estimator, which exhibits higher RMSE than the ML and U-statistic-based estimators.
- As sample sizes increase, RMSE values for all estimators decrease, and the differences between them become less pronounced. For the largest sample sizes, RMSE values converge across all flexibility values, but the ML estimator continues to perform slightly better, particularly for small and moderate samples.
- The flexibility parameter continues to influence estimator performance. At its highest value (similar to the Pearson correlation), the estimators are essentially unaffected by the mean shift. At its lowest value (similar to the Kendall tau), however, the impact of the mean shift is evident, particularly for the GCC-U estimator. Low values of the parameter show high sensitivity to location shifts, reflecting the rank-based nature of the estimator in such settings.
The introduction of mean shifts provided valuable insights into the robustness of the estimators. While all estimators converged as sample sizes increased, the ML estimator consistently outperformed the others across a wide range of conditions. The mean shift had a noticeable impact on the GCC-U and adjusted Spearman estimators, particularly for small sample sizes and low values of the flexibility parameter. These findings highlight the importance of choosing an appropriate flexibility value based on the data structure and the expected behavior of the estimators under non-standard conditions such as location shifts. - Case 3—Bivariate normal distribution with increased variance. In this case, samples are drawn from a bivariate normal distribution in which the variances of both variables are increased fourfold. This case simulates the high-variability conditions often observed in genomics and biological data, where variability can obscure underlying correlation patterns. Table 3 presents the RMSE results for this case, considering the same range of correlation coefficients, flexibility parameters, and sample sizes as before. Key observations from the results in Table 3 include the following:
- The increase in variance produces greater dispersion, making correlation estimation more challenging. This is reflected in slightly higher RMSE values, particularly for small sample sizes, compared with the previous cases.
- Despite the higher variance, the GCC-ML estimator continues to exhibit the lowest RMSE across most scenarios, consistent with previous observations. However, in certain conditions, such as low correlation combined with a low flexibility value, the adjusted Spearman estimator may display slightly lower RMSE. This emphasizes the robustness of the ML estimator in varied data conditions, though the adjusted Spearman estimator remains a competitive alternative in some settings.
- As in previous cases, RMSE values for all estimators decrease as the sample size grows, indicating consistency and convergence toward the true correlation. For large sample sizes, the differences between estimators become less pronounced, though the GCC-ML estimator maintains a slight advantage.
- The flexibility parameter continues to play a relevant role in estimator performance. At low values (similar to the Kendall tau), the RMSE tends to be higher for small sample sizes, reflecting greater sensitivity to rank-based associations. As the parameter increases to 1 (similar to the Pearson correlation), the estimators better capture linear relationships, resulting in lower RMSE values.
- While GCC-U and the adjusted Spearman estimator show slightly higher RMSE than GCC-ML, their performance improves as the sample size increases. For small sample sizes, the GCC-U estimator tends to overestimate the correlation, particularly in low-correlation scenarios. Additionally, the adjusted Spearman estimator tends to underestimate it, especially at moderate correlation levels.
Therefore, while the increased variance leads to slightly higher RMSE values for all estimators, GCC-ML continues to demonstrate superior performance across all conditions. The choice of the flexibility parameter remains crucial, influencing the sensitivity of the estimators to different types of dependencies; in particular, intermediate values provide balanced performance across linear and rank-based correlations. - Case 4—Contaminated bivariate normal distribution. In this case, we model contamination through a mixture of bivariate normal distributions, with 60% of the data drawn from a bivariate normal distribution with high correlation and 40% from a bivariate normal distribution with zero correlation. This setup introduces uncorrelated observations, effectively acting as outliers, and reflects scenarios commonly observed in real-world data. The results of this case are summarized in Table 4. Key observations from Table 4 include the following:
- In the no-correlation setting, both the GCC-ML and GCC-U estimators tend to overestimate the correlation when the sample size is small. However, as the sample size increases, these estimators converge toward the true value, with the GCC-ML estimator showing marginally lower RMSE values. The adjusted Spearman estimator consistently yields the smallest RMSE, demonstrating strong robustness to contamination in this scenario.
- At moderate correlation, the GCC-ML estimator underestimates the true correlation, particularly for small sample sizes. Conversely, the GCC-U estimator tends to overestimate it in small samples; as the sample size increases, its performance improves and its RMSE decreases. The adjusted Spearman estimator continues to perform well, although it slightly underestimates the true correlation across all sample sizes.
- For high correlation, the GCC-ML estimator consistently underestimates the true correlation, though its variability is reduced compared to the moderate-correlation case. The GCC-U estimator tends to overestimate when the sample size is small, but this tendency diminishes with large sample sizes. The adjusted Spearman estimator exhibits a slight underestimation but demonstrates less variability than the other estimators for large sample sizes.
- Contamination influences the estimators differently depending on the flexibility parameter. For small values (closer to the Kendall tau), the estimators tend to be more robust, with the adjusted Spearman estimator the most robust. As the parameter increases toward the Pearson correlation, the estimators become more sensitive to outliers, leading to higher RMSE values, particularly for the GCC-ML and GCC-U estimators in small samples.
- As observed in previous cases, RMSE values decrease as the sample size grows, reflecting consistency and convergence toward the true correlation. The GCC-ML estimator continues to hold an advantage for large sample sizes, while the adjusted Spearman estimator shows greater stability across different flexibility values.
The results of Case 4 highlight the influence of contamination on estimator performance. Although the GCC-ML estimator generally performs well, its sensitivity to outliers is more pronounced for small sample sizes and high flexibility values. The adjusted Spearman estimator exhibits greater robustness under these conditions, particularly in moderate- and high-correlation settings. - Case 5—Mixture of bivariate normal distributions. In this case, we simulate heterogeneity by generating samples from a mixture of two bivariate normal distributions with different means and/or covariances. The mixture proportions considered are 10%, 30%, and 50%, and the performance of the estimators is evaluated for a weak correlation. The flexibility parameter is assessed at three values, covering a range from rank-based to moment-based correlation measures. The results of this case are summarized in Table 5.
- With 10% contamination, the GCC-ML estimator tends to slightly underestimate the correlation, particularly when the sample size is small; however, as the sample size increases, all estimators converge to the true value. The GCC-U estimator exhibits more variability, especially for small sample sizes and high flexibility values. The adjusted Spearman estimator shows a slight underestimation but converges as the sample size increases.
- With 30% contamination, the GCC-ML estimator tends to slightly overestimate the correlation for small sample sizes. The GCC-U estimator also shows some overestimation in small samples but improves as the sample size grows. The adjusted Spearman estimator underestimates the correlation across all flexibility values, although its performance improves considerably with large sample sizes.
- With 50% contamination, the GCC-ML and GCC-U estimators exhibit high variability for small sample sizes, particularly at high flexibility values, with both tending to overestimate the correlation. The adjusted Spearman estimator remains consistent, slightly underestimating the true value but showing much lower variability as the sample size increases.
- As contamination levels increase from 10% to 50%, all estimators show increased variability, particularly for small sample sizes. For large sample sizes, however, the RMSE values decrease, indicating convergence toward the true value. The estimators generally perform better at lower contamination levels, and the impact of contamination is most pronounced for high flexibility values.
- The flexibility parameter affects the estimators' performance: at low values (similar to the Kendall tau), the estimators tend to be more robust against contamination, especially for large sample sizes, whereas at high values (similar to the Pearson correlation) they become more sensitive to it, resulting in higher RMSE values, particularly for small sample sizes.
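The Monte Carlo protocol behind these RMSE tables can be sketched as follows. Since the paper's GCC-ML and GCC-U estimators have no standard-library implementation, this hedged illustration uses the classical Pearson coefficient plus the normal-theory-adjusted Spearman and Kendall estimators as stand-ins; the parameter values are placeholders, not the paper's exact grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def rmse_table(n, rho, p_contam=0.0, n_rep=200):
    """Monte Carlo RMSE of three classical correlation estimators under the
    contamination scheme of Cases 4-5: with probability p_contam a row is
    replaced by an uncorrelated draw."""
    cov = [[1.0, rho], [rho, 1.0]]
    errs = {"pearson": [], "spearman_adj": [], "kendall_adj": []}
    for _ in range(n_rep):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        bad = rng.random(n) < p_contam
        if bad.any():
            x[bad] = rng.standard_normal((bad.sum(), 2))  # uncorrelated rows
        r_p = stats.pearsonr(x[:, 0], x[:, 1])[0]
        r_s = stats.spearmanr(x[:, 0], x[:, 1])[0]
        tau = stats.kendalltau(x[:, 0], x[:, 1])[0]
        errs["pearson"].append(r_p - rho)
        # Normal-theory adjustments mapping rank correlations back to rho:
        errs["spearman_adj"].append(2.0 * np.sin(np.pi * r_s / 6.0) - rho)
        errs["kendall_adj"].append(np.sin(np.pi * tau / 2.0) - rho)
    return {k: float(np.sqrt(np.mean(np.square(v)))) for k, v in errs.items()}

clean = rmse_table(n=100, rho=0.9)                 # a Case 1-like setting
dirty = rmse_table(n=100, rho=0.9, p_contam=0.4)   # a Case 4-like setting
```

Comparing `clean` with `dirty` reproduces the qualitative pattern discussed above: contamination inflates RMSE for all estimators, most visibly for the moment-based one.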
4. Relevance Networks and Advanced Statistical Applications
4.1. Data Collection and Relevance Network Methodology
4.2. Integration of Advanced Statistical Methods in RN Analysis
5. Conclusions
- The adaptability of the generalized correlation coefficient to various data complexities, demonstrating robustness and sensitivity in gene association network analysis.
- The influence of the flexibility parameter on network topology, where low values of this parameter lead to sparser networks, emphasizing the strongest correlations.
- The detection of unique interactions using the Spearman correlation, not captured by any configuration of the generalized correlation coefficient, underscoring the importance of applying multiple correlation measures for comprehensive data analysis.
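The network-construction step underlying these conclusions is essentially thresholding a pairwise correlation matrix. The sketch below is a hedged illustration with synthetic data and the Pearson coefficient standing in for the generalized correlation coefficient; raising the threshold (like lowering the flexibility parameter) yields a sparser network that keeps only the strongest associations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 6 "genes" x 40 samples; genes 0-2 are co-regulated
# (shared signal plus noise), genes 3-5 are independent noise.
base = rng.normal(size=40)
expr = np.vstack([base + rng.normal(scale=0.4, size=40) for _ in range(3)]
                 + [rng.normal(size=40) for _ in range(3)])

corr = np.corrcoef(expr)                  # gene-by-gene correlation matrix
threshold = 0.7                           # illustrative cutoff, not the paper's
adj = np.abs(corr) >= threshold           # adjacency by thresholding
np.fill_diagonal(adj, False)              # no self-edges

# Edge list of the relevance network (upper triangle only).
edges = [(i, j) for i in range(adj.shape[0])
         for j in range(i + 1, adj.shape[0]) if adj[i, j]]
```

With this toy data, only the co-regulated genes 0-2 should be connected; sweeping `threshold` shows how the network topology sparsifies as the cutoff rises.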
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Correlation | Flexibility | Estimator | RMSE across increasing sample sizes ||||
---|---|---|---|---|---|---|---
0 | 0 | (GCC-ML) | 0.1097 | 0.0892 | 0.0864 | 0.0849 | 0.0840 |
(GCC-U) | 0.1528 | 0.0735 | 0.0649 | 0.0594 | 0.0574 | ||
(Adjusted Spearman) | 0.2246 | 0.0925 | 0.0642 | 0.0407 | 0.0284 | ||
(GCC-ML) | 0.3252 | 0.1315 | 0.0905 | 0.0612 | 0.0465 | ||
(GCC-U) | 0.2969 | 0.1253 | 0.0869 | 0.0554 | 0.0386 | ||
(Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 | ||
1 | (GCC-ML) | 0.3450 | 0.1405 | 0.0967 | 0.0654 | 0.0497 | |
(GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 | ||
(Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 | ||
0 | (GCC-ML) | 0.2361 | 0.0931 | 0.0640 | 0.0432 | 0.0328 | |
(GCC-U) | 0.2404 | 0.0944 | 0.0649 | 0.0411 | 0.0286 | ||
(Adjusted Spearman) | 0.2187 | 0.0905 | 0.0628 | 0.0406 | 0.0288 | ||
(GCC-ML) | 0.3252 | 0.1315 | 0.0905 | 0.0612 | 0.0465 | ||
(GCC-U) | 0.2969 | 0.1253 | 0.0869 | 0.0554 | 0.0386 | ||
(Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 | ||
1 | (GCC-ML) | 0.3450 | 0.1405 | 0.0967 | 0.0654 | 0.0497 | |
(GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 | ||
(Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 | ||
0 | (GCC-ML) | 0.0631 | 0.0343 | 0.0281 | 0.0243 | 0.0228 | |
(GCC-U) | 0.1380 | 0.0470 | 0.0314 | 0.0196 | 0.0135 | ||
(Adjusted Spearman) | 0.1418 | 0.0542 | 0.0378 | 0.0258 | 0.0199 | ||
(GCC-ML) | 0.3252 | 0.1315 | 0.0905 | 0.0612 | 0.0465 | ||
(GCC-U) | 0.2969 | 0.1253 | 0.0869 | 0.0554 | 0.0386 | ||
(Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 | ||
1 | (GCC-ML) | 0.3450 | 0.1405 | 0.0967 | 0.0654 | 0.0497 | |
(GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 | ||
(Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 |
Correlation | Flexibility | Estimator | RMSE across increasing sample sizes ||||
---|---|---|---|---|---|---|---
0 | 0 | (GCC-ML) | 0.2017 | 0.1618 | 0.1404 | 0.1086 | 0.0955 |
(GCC-U) | 0.2725 | 0.0774 | 0.0554 | 0.0336 | 0.0241 | ||
(Adjusted Spearman) | 0.2246 | 0.0925 | 0.0642 | 0.0407 | 0.0284 | ||
(GCC-ML) | 0.2104 | 0.1702 | 0.1487 | 0.1165 | 0.1023 | ||
(GCC-U) | 0.2829 | 0.0853 | 0.0601 | 0.0385 | 0.0289 | ||
(Adjusted Spearman) | 0.2301 | 0.0975 | 0.0682 | 0.0437 | 0.0302 | ||
1 | (GCC-ML) | 0.2186 | 0.1785 | 0.1558 | 0.1243 | 0.1088 | |
(GCC-U) | 0.2934 | 0.0912 | 0.0643 | 0.0415 | 0.0321 | ||
(Adjusted Spearman) | 0.2378 | 0.1028 | 0.0723 | 0.0469 | 0.0326 | ||
0 | (GCC-ML) | 0.2785 | 0.2371 | 0.1953 | 0.1456 | 0.1264 | |
(GCC-U) | 0.3154 | 0.1352 | 0.0941 | 0.0600 | 0.0418 | ||
(Adjusted Spearman) | 0.2187 | 0.0905 | 0.0628 | 0.0406 | 0.0288 | ||
(GCC-ML) | 0.2902 | 0.2501 | 0.2103 | 0.1557 | 0.1352 | ||
(GCC-U) | 0.3315 | 0.1439 | 0.1003 | 0.0640 | 0.0446 | ||
(Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 | ||
1 | (GCC-ML) | 0.3105 | 0.2802 | 0.2403 | 0.1804 | 0.1607 | |
(GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 | ||
(Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 | ||
0 | (GCC-ML) | 0.1891 | 0.1563 | 0.1352 | 0.1014 | 0.0882 | |
(GCC-U) | 0.1380 | 0.0470 | 0.0314 | 0.0196 | 0.0135 | ||
(Adjusted Spearman) | 0.1418 | 0.0542 | 0.0378 | 0.0258 | 0.0199 | ||
(GCC-ML) | 0.2003 | 0.1655 | 0.1428 | 0.1087 | 0.0934 | ||
(GCC-U) | 0.1487 | 0.0501 | 0.0334 | 0.0212 | 0.0147 | ||
(Adjusted Spearman) | 0.1472 | 0.0585 | 0.0406 | 0.0276 | 0.0206 | ||
1 | (GCC-ML) | 0.2104 | 0.1745 | 0.1504 | 0.1156 | 0.0998 | |
(GCC-U) | 0.1578 | 0.0532 | 0.0356 | 0.0228 | 0.0158 | ||
(Adjusted Spearman) | 0.1539 | 0.0621 | 0.0434 | 0.0298 | 0.0223 |
Correlation | Flexibility | Estimator | RMSE across increasing sample sizes ||||
---|---|---|---|---|---|---|---
0 | 0 | (GCC-ML) | 0.3258 | 0.1807 | 0.1404 | 0.1086 | 0.0955 |
(GCC-U) | 0.2725 | 0.1457 | 0.1278 | 0.1167 | 0.1126 | ||
(Adjusted Spearman) | 0.2518 | 0.1434 | 0.1275 | 0.1174 | 0.1137 | ||
(GCC-ML) | 0.4190 | 0.2331 | 0.1774 | 0.1308 | 0.1103 | ||
(GCC-U) | 0.3277 | 0.1773 | 0.1520 | 0.1359 | 0.1299 | ||
(Adjusted Spearman) | 0.3235 | 0.1771 | 0.1532 | 0.1375 | 0.1316 | ||
1 | (GCC-ML) | 0.4539 | 0.2535 | 0.1914 | 0.1383 | 0.1142 | |
(GCC-U) | 0.3492 | 0.1880 | 0.1589 | 0.1405 | 0.1337 | ||
(Adjusted Spearman) | 0.1638 | 0.1405 | 0.1420 | 0.1435 | 0.1438 | ||
0 | (GCC-ML) | 0.4648 | 0.2602 | 0.1959 | 0.1405 | 0.1151 | |
(GCC-U) | 0.2309 | 0.1127 | 0.0963 | 0.0836 | 0.0786 | ||
(Adjusted Spearman) | 0.2415 | 0.1054 | 0.0857 | 0.0690 | 0.0626 | ||
(GCC-ML) | 0.2810 | 0.2546 | 0.2534 | 0.2521 | 0.2514 | ||
(GCC-U) | 0.2309 | 0.1127 | 0.0963 | 0.0836 | 0.0786 | ||
(Adjusted Spearman) | 0.2415 | 0.1054 | 0.0857 | 0.0690 | 0.0626 | ||
1 | (GCC-ML) | 0.2810 | 0.2546 | 0.2534 | 0.2521 | 0.2514 | |
(GCC-U) | 0.2309 | 0.1127 | 0.0963 | 0.0836 | 0.0786 | ||
(Adjusted Spearman) | 0.2415 | 0.1054 | 0.0857 | 0.0690 | 0.0626 | ||
0 | (GCC-ML) | 0.2902 | 0.2501 | 0.2103 | 0.1557 | 0.1352 | |
(GCC-U) | 0.3315 | 0.1439 | 0.1003 | 0.0640 | 0.0446 | ||
(Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 | ||
(GCC-ML) | 0.3105 | 0.2802 | 0.2403 | 0.1804 | 0.1607 | ||
(GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 | ||
(Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 | ||
1 | (GCC-ML) | 0.3233 | 0.1393 | 0.0976 | 0.0622 | 0.0437 | |
(GCC-U) | 0.3428 | 0.1493 | 0.1048 | 0.0668 | 0.0470 | ||
(Adjusted Spearman) | 0.3434 | 0.1497 | 0.1050 | 0.0670 | 0.0471 |
|  |  | Estimator |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| 0 | 0 | (GCC-ML) | 0.2489 | 0.1067 | 0.0754 | 0.0523 | 0.0417 |
|  |  | (GCC-U) | 0.2480 | 0.0981 | 0.0676 | 0.0428 | 0.0298 |
|  |  | (Adjusted Spearman) | 0.2246 | 0.0925 | 0.0642 | 0.0407 | 0.0284 |
|  |  | (GCC-ML) | 0.3433 | 0.1537 | 0.1092 | 0.0760 | 0.0606 |
|  |  | (GCC-U) | 0.3154 | 0.1352 | 0.0941 | 0.0600 | 0.0418 |
|  |  | (Adjusted Spearman) | 0.3129 | 0.1336 | 0.0931 | 0.0592 | 0.0414 |
|  | 1 | (GCC-ML) | 0.3645 | 0.1652 | 0.1176 | 0.0820 | 0.0653 |
|  |  | (GCC-U) | 0.3315 | 0.1439 | 0.1003 | 0.0640 | 0.0446 |
|  |  | (Adjusted Spearman) | 0.3330 | 0.1437 | 0.1003 | 0.0639 | 0.0446 |
|  | 0 | (GCC-ML) | 0.2361 | 0.0931 | 0.0640 | 0.0432 | 0.0328 |
|  |  | (GCC-U) | 0.2404 | 0.0944 | 0.0649 | 0.0411 | 0.0286 |
|  |  | (Adjusted Spearman) | 0.2187 | 0.0905 | 0.0628 | 0.0406 | 0.0288 |
|  |  | (GCC-ML) | 0.3252 | 0.1315 | 0.0905 | 0.0612 | 0.0465 |
|  |  | (GCC-U) | 0.2969 | 0.1253 | 0.0869 | 0.0554 | 0.0386 |
|  |  | (Adjusted Spearman) | 0.2993 | 0.1271 | 0.0884 | 0.0572 | 0.0407 |
|  | 1 | (GCC-ML) | 0.3450 | 0.1405 | 0.0967 | 0.0654 | 0.0497 |
|  |  | (GCC-U) | 0.3097 | 0.1315 | 0.0912 | 0.0582 | 0.0406 |
|  |  | (Adjusted Spearman) | 0.3171 | 0.1356 | 0.0943 | 0.0611 | 0.0434 |
|  | 0 | (GCC-ML) | 0.0631 | 0.0343 | 0.0281 | 0.0243 | 0.0228 |
|  |  | (GCC-U) | 0.1380 | 0.0470 | 0.0314 | 0.0196 | 0.0135 |
|  |  | (Adjusted Spearman) | 0.1418 | 0.0542 | 0.0378 | 0.0258 | 0.0199 |
|  |  | (GCC-ML) | 0.0525 | 0.0280 | 0.0228 | 0.0195 | 0.0181 |
|  |  | (GCC-U) | 0.0940 | 0.0331 | 0.0225 | 0.0142 | 0.0099 |
|  |  | (Adjusted Spearman) | 0.1357 | 0.0456 | 0.0310 | 0.0208 | 0.0158 |
|  | 1 | (GCC-ML) | 0.0480 | 0.0254 | 0.0205 | 0.0175 | 0.0163 |
|  |  | (GCC-U) | 0.0839 | 0.0286 | 0.0193 | 0.0122 | 0.0085 |
|  |  | (Adjusted Spearman) | 0.1301 | 0.0417 | 0.0281 | 0.0187 | 0.0142 |
| Contamination |  | Estimator |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| 10% | 0 | (GCC-ML) | 0.2238 | 0.0991 | 0.0768 | 0.0550 | 0.0456 |
|  |  | (GCC-U) | 0.2495 | 0.1115 | 0.0864 | 0.0619 | 0.0514 |
|  |  | (Adjusted Spearman) | 0.3117 | 0.1427 | 0.1107 | 0.0795 | 0.0660 |
|  |  | (GCC-ML) | 0.2685 | 0.1090 | 0.0771 | 0.0482 | 0.0341 |
|  |  | (GCC-U) | 0.3224 | 0.1417 | 0.1025 | 0.0649 | 0.0468 |
|  |  | (Adjusted Spearman) | 0.3444 | 0.1694 | 0.1339 | 0.0959 | 0.0762 |
|  | 1 | (GCC-ML) | 0.3128 | 0.1328 | 0.0943 | 0.0593 | 0.0421 |
|  |  | (GCC-U) | 0.3321 | 0.1424 | 0.1013 | 0.0637 | 0.0453 |
|  |  | (Adjusted Spearman) | 0.3327 | 0.1428 | 0.1015 | 0.0639 | 0.0454 |
| 30% | 0 | (GCC-ML) | 0.2179 | 0.1085 | 0.0875 | 0.0663 | 0.0566 |
|  |  | (GCC-U) | 0.2431 | 0.1220 | 0.0984 | 0.0746 | 0.0638 |
|  |  | (Adjusted Spearman) | 0.3041 | 0.1559 | 0.1261 | 0.0958 | 0.0819 |
|  |  | (GCC-ML) | 0.2779 | 0.1146 | 0.0802 | 0.0498 | 0.0360 |
|  |  | (GCC-U) | 0.3423 | 0.1594 | 0.1158 | 0.0733 | 0.0537 |
|  |  | (Adjusted Spearman) | 0.3745 | 0.2098 | 0.1703 | 0.1253 | 0.1051 |
|  | 1 | (GCC-ML) | 0.3189 | 0.1371 | 0.0961 | 0.0603 | 0.0437 |
|  |  | (GCC-U) | 0.3383 | 0.1471 | 0.1032 | 0.0647 | 0.0469 |
|  |  | (Adjusted Spearman) | 0.3389 | 0.1474 | 0.1035 | 0.0649 | 0.0471 |
| 50% | 0 | (GCC-ML) | 0.2188 | 0.1132 | 0.0898 | 0.0697 | 0.0584 |
|  |  | (GCC-U) | 0.2440 | 0.1272 | 0.1010 | 0.0785 | 0.0658 |
|  |  | (Adjusted Spearman) | 0.3050 | 0.1626 | 0.1293 | 0.1008 | 0.0844 |
|  |  | (GCC-ML) | 0.2835 | 0.1177 | 0.0823 | 0.0518 | 0.0364 |
|  |  | (GCC-U) | 0.3572 | 0.1709 | 0.1227 | 0.0788 | 0.0563 |
|  |  | (Adjusted Spearman) | 0.3957 | 0.2349 | 0.1849 | 0.1395 | 0.1109 |
|  | 1 | (GCC-ML) | 0.3233 | 0.1393 | 0.0976 | 0.0622 | 0.0437 |
|  |  | (GCC-U) | 0.3428 | 0.1493 | 0.1048 | 0.0668 | 0.0470 |
|  |  | (Adjusted Spearman) | 0.3434 | 0.1497 | 0.1050 | 0.0670 | 0.0471 |
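To make the contamination scenario of Case 4 concrete, the sketch below draws samples from a 60%/40% mixture of bivariate normal distributions (a highly correlated main component mixed with an uncorrelated one) and estimates the Monte Carlo mean squared error of the sample Spearman coefficient against the correlation of the dominant component. This is only an illustration of the simulation mechanism: the GCC-based estimators of the paper are not reproduced here, and the correlation values, sample sizes, and replication counts are placeholders rather than the settings used in the study.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2024)

def sample_mixture(n, rho_main=0.9, rho_noise=0.0, p_main=0.6):
    """Draw n points from a mixture of two standard bivariate normals:
    a fraction p_main with correlation rho_main, the rest with rho_noise."""
    from_main = rng.random(n) < p_main
    cov_main = [[1.0, rho_main], [rho_main, 1.0]]
    cov_noise = [[1.0, rho_noise], [rho_noise, 1.0]]
    xy = np.empty((n, 2))
    xy[from_main] = rng.multivariate_normal(
        [0, 0], cov_main, size=int(from_main.sum()))
    xy[~from_main] = rng.multivariate_normal(
        [0, 0], cov_noise, size=int((~from_main).sum()))
    return xy

def mc_mse(n, reps=1000, rho_target=0.9):
    """Monte Carlo MSE of the sample Spearman coefficient, measured
    against the correlation of the uncontaminated main component."""
    est = np.empty(reps)
    for r in range(reps):
        xy = sample_mixture(n)
        est[r] = spearmanr(xy[:, 0], xy[:, 1]).correlation
    return float(np.mean((est - rho_target) ** 2))

# Placeholder sample sizes; the MSE stays bounded away from zero because
# the contaminating component biases the estimate toward a weaker correlation.
for n in (25, 50, 100):
    print(n, round(mc_mse(n), 4))
```

Because the contaminating component persists at every sample size, increasing n shrinks only the variance term of the MSE, not the squared bias, which mirrors the plateauing error patterns visible in the contamination table above.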
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ospina, R.; Xavier, C.M.; Esteves, G.H.; Espinheira, P.L.; Castro, C.; Leiva, V. Symmetry and Complexity in Gene Association Networks Using the Generalized Correlation Coefficient. Symmetry 2024, 16, 1510. https://doi.org/10.3390/sym16111510