Symmetry in Scientific Collaboration Networks: A Study Using Temporal Graph Data Science and Scientometrics
Abstract
1. Introduction
- A new methodology to perform scientific production analysis that considers symmetry in collaboration networks for evaluating and analyzing the cohesion of research groups;
- A temporal and experimental process that incorporates symmetry in network metrics and embedding features to cluster and investigate data, providing a more comprehensive analysis;
- A practical analysis on a dataset from a Brazilian graduate program in computer and electrical engineering, demonstrating the usefulness of considering symmetry in scientific collaboration networks in evaluating the scientific production of research groups.
2. Related Works
3. Proposed Methodology
4. Materials and Methods
4.1. Goal Definition
4.2. Planning
4.2.1. Participant and Artifact Selection
4.2.2. Research Questions
- Is it possible to evaluate a research group based on the centrality metrics of co-authorship networks?
- Based on centrality metrics, is it possible to assess the evolution of a research group over time?
- Do isolated temporal networks tend to have a large number of subgroups isolated from the strongly connected component of a collaborative network, and consequently low cohesion indices?
- Can the number of connected components be considered as a metric for the cohesion analysis of research groups?
- Can the chosen research group be considered cohesive or is it becoming more cohesive?
- Can the use of graph embeddings help in the detection of collaboration patterns in temporal networks?
- Can the use of node embeddings help in the detection of patterns of collaborations in a research group?
- Is there any correlation between the group members’ lines of research and their patterns of scientific collaboration?
- Is the proposed methodology suitable to support the tasks of evaluation and cohesion analysis of research groups?
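The first questions above rely on centrality metrics and on the number of connected components of yearly co-authorship networks. The sketch below illustrates both on toy data using only the Python standard library. The `snapshots` data, the function names, and the choice of degree centrality are assumptions made for illustration and are not taken from the study. NetworkX provides equivalent routines (`networkx.degree_centrality`, `networkx.number_connected_components`).

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship(papers):
    """Build an undirected adjacency map from a list of author lists."""
    adj = defaultdict(set)
    for authors in papers:
        for a in authors:
            adj[a]  # touching the key registers single-author nodes too
        for a, b in combinations(sorted(set(authors)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def connected_components(adj):
    """Return the connected components as a list of node sets (DFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical yearly snapshots: year -> papers, each paper a list of authors.
snapshots = {
    2010: [["A", "B"], ["C"], ["D", "E"]],
    2011: [["A", "B", "C"], ["C", "D"], ["D", "E"]],
}

summary = {}
for year, papers in sorted(snapshots.items()):
    adj = build_coauthorship(papers)
    comps = connected_components(adj)
    summary[year] = {
        "n_components": len(comps),
        # Share of nodes in the largest component, a simple cohesion proxy.
        "largest_share": max(len(c) for c in comps) / len(adj),
        "centrality": degree_centrality(adj),
    }
```

In this toy data the network goes from three components in 2010 to one in 2011, which is the kind of trend the cohesion questions ask about. A falling component count together with a growing largest-component share would suggest that a group is becoming more cohesive.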
4.2.3. Instrumentation
- Python Data Science ecosystem (pandas (https://pandas.pydata.org), NumPy (https://numpy.org), Matplotlib (https://matplotlib.org), seaborn (https://seaborn.pydata.org), scikit-learn (https://scikit-learn.org) and others), provided by the Anaconda platform (https://www.anaconda.com) or Google Colab (https://colab.research.google.com);
- Anaconda’s Jupyter Lab;
- NetworkX (https://networkx.org) and Gephi (https://gephi.org), a library and a tool, respectively, for modeling, visualizing, analyzing and manipulating complex networks;
- Node2Vec (https://snap.stanford.edu/node2vec/) and Graph2Vec (https://github.com/benedekrozemberczki/karateclub) libraries for conversion of graphs into vector structures;
- Data Version Control;
- The researchers’ statistics, the scholarly production associated with the chosen research group, and the proposed methodology, discussed in Section 3;
- The Jupyter Notebooks that contain all source code for performing the data analysis, which are available in a GitHub repository (https://github.com/breno-madruga/evaluation-research-groups).
4.3. Operation
5. Results and Discussion
5.1. Conventional Analysis
5.2. Temporal Networks’ Embeddings Analysis
5.3. Group’s Members Analysis
Algorithm 1. Weighted voting process.
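The body of Algorithm 1 is not reproduced here, so the following is only a generic sketch of a weighted vote and not the authors' procedure. It assumes that each period yields one predicted label for a group member and that each period carries a numeric weight. The function name, the period keys, the labels used in the example, and the weights are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: {period: label} predicted for one member in each period.
    weights:     {period: weight}; periods without a weight count as 0.
    Ties are broken alphabetically by label so the result is deterministic.
    """
    scores = defaultdict(float)
    for period, label in predictions.items():
        scores[label] += weights.get(period, 0.0)
    # max() keeps the first maximal item, so sorting fixes the tie-break.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical inputs for a single member (illustrative values only).
predictions = {
    "2010-2012": "Computer Engineering",
    "2013-2016": "Automation and Systems",
    "2017-2020": "Automation and Systems",
    "2021-2022": "Computer Engineering",
}
weights = {"2010-2012": 0.5, "2013-2016": 0.7, "2017-2020": 0.7, "2021-2022": 0.6}

winner = weighted_vote(predictions, weights)
```

With these inputs the two labels each receive two votes, but the weights favor "Automation and Systems" (1.4 against 1.1). This is the practical difference between a weighted vote and a plain majority vote.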
6. Threats to Validity
- Collection bias: Relevant manuscripts may have been published after the retrieval date, and supplementary data sources containing relevant works from the evaluated research group may not have been considered. To mitigate this problem, the main bibliographic database widely used by the research community was employed.
- Indexing bias: This threat could not be mitigated, since the data in the datasets used are classified and maintained by external parties, which is beyond the scope of the proposed solution.
- Ethical approval: Given that this was a metadata analysis of published manuscripts, no ethics committee permission was necessary.
- External validity: The evaluation and cohesion analysis of a research group covered the period from January 2010 to 14 October 2022. However, some relevant work may not yet have been indexed, or may be indexed in databases other than the one used. Therefore, the conclusions cannot be generalized to validate the complete effectiveness of the proposed methodology. Nevertheless, the results are relevant for outlining future investigations into research group evaluation and cohesion analysis.
7. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Sugimoto, C.R.; Larivière, V. Measuring Research: What Everyone Needs to Know; Oxford University Press: Oxford, UK, 2018.
- [2] Amat, C.B.; Perruchas, F. Evolving cohesion metrics of a research network on rare diseases: A longitudinal study over 14 years. Scientometrics 2016, 108, 41–56.
- [3] Vinkler, P. The Evaluation of Research by Scientometric Indicators; Chandos Publishing: Oxford, UK, 2010.
- [4] Franceschini, F.; Maisano, D. Structured evaluation of the scientific output of academic research groups by recent h-based indicators. J. Inf. 2011, 5, 64–74.
- [5] Mryglod, O.; Holovatch, Y.; Kenna, R. Big fish and small ponds: Why the departmental h-index should not be used to rank universities. Scientometrics 2022, 127, 3279–3292.
- [6] Kudelka, M.; Plato, J.; Krömer, P. Author evaluation based on H-index and citation response. In Proceedings of the 2016 International Conference on Intelligent Networking and Collaborative Systems (INCoS), Ostrava, Czech Republic, 7–9 September 2016; pp. 375–379.
- [7] Montazerian, M.; Zanotto, E.D.; Eckert, H. A new parameter for (normalized) evaluation of H-index: Countries as a case study. Scientometrics 2019, 118, 1065–1078.
- [8] Menczer, F.; Fortunato, S.; Davis, C.A. A First Course in Network Science; Cambridge University Press: Cambridge, UK, 2020.
- [9] Wang, D.; Barabási, A.L. The Science of Science; Cambridge University Press: Cambridge, UK, 2021.
- [10] Jeon, H.J.; Lee, O.J.; Jung, J.J. Is performance of scholars correlated to their research collaboration patterns? Front. Big Data 2019, 2, 1–10.
- [11] Wiechetek, Ł.; Pastuszak, Z. Academic social networks metrics: An effective indicator for university performance? Scientometrics 2022, 127, 1381–1401.
- [12] Camargo, L.S.d.; Barbosa, R.R. Bibliometria, Cientometria e um possível caminho para a Construção de Indicadores e Mapas da Produção Científica. PontodeAcesso 2018, 12, 109–125.
- [13] Moral-Munoz, J.A.; López-Herrera, A.G.; Herrera-Viedma, E.; Cobo, M.J. Science Mapping Analysis Software Tools: A Review. In Springer Handbook of Science and Technology Indicators; Glänzel, W., Moed, H.F., Schmoch, U., Thelwall, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 159–185.
- [14] Ju, H.; Zhou, D.; Blevins, A.S.; Lydon-Staley, D.M.; Kaplan, J.; Tuma, J.R.; Bassett, D.S. Historical growth of concept networks in Wikipedia. Collect. Intell. 2022, 1.
- [15] Keramatfar, A.; Rafiee, M.; Amirkhani, H. Graph Neural Networks: A bibliometrics overview. Mach. Learn. Appl. 2022, 10, 100401.
- [16] Zweig, K.A. Network Analysis Literacy; Springer: Berlin/Heidelberg, Germany, 2016.
- [17] Zinoviev, D. Complex Network Analysis in Python: Recognize-Construct-Visualize-Analyze-Interpret; Pragmatic Bookshelf: North Carolina, NC, USA, 2018.
- [18] Grohe, M. Word2vec, Node2vec, Graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Portland, OR, USA, 14–19 June 2020; PODS’20. Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–16.
- [19] Grover, A.; Leskovec, J. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; KDD’16. Association for Computing Machinery: New York, NY, USA, 2016; pp. 855–864.
- [20] Narayanan, A.; Chandramohan, M.; Venkatesan, R.; Chen, L.; Liu, Y.; Jaiswal, S. graph2vec: Learning Distributed Representations of Graphs. arXiv 2017, arXiv:1707.05005.
- [21] Santos, B.S.; Silva, I.; Lima, L.; Endo, P.T.; Alves, G.; Ribeiro-Dantas, M.d.C. Discovering temporal scientometric knowledge in COVID-19 scholarly production. Scientometrics 2022, 127, 1609–1642.
- [22] Kuprieiev, R.; Skshetry; Petrov, D.; Rowlands, P.; Redzyński, P.; da Costa-Luis, C.; Schepanovski, A.; Gao; de la Iglesia Castro, D.; Shcheklein, I.; et al. DVC: Data Version Control-Git for Data & Models. Zenodo. February 2023. Available online: https://doi.org/10.5281/zenodo.3677553 (accessed on 20 February 2023).
- [23] Santos, B.S.; Júnior, M.C.; da Paixão, B.C.; Santos, R.M.; Nascimento, A.V.R.P.; dos Santos, H.C.; Filho, W.H.L.; de Medeiros, A.S.L. Comparing Text Mining Algorithms for Predicting Irregularities in Public Accounts. In Proceedings of the XI Brazilian Symposium on Information Systems SBSI 2015, Goiania, Goias, Brazil, 26–29 June 2015; Brazilian Computer Society: Porto Alegre, Brazil, 2015; pp. 667–674.
- [24] Santos, B.S.; Silva, I.; Melo, E. Metodologia orientada a ciência de dados em grafos para avaliação de PPGs. In Proceedings of the XV Simpósio Brasileiro de Automação Inteligente (SBAI 2021), Virtual, 17–19 October 2021; Sociedade Brasileira de Automática: Rio Grande, Rio Grande do Sul, Brazil, 2021; pp. 1998–2005.
- [25] Basili, V.R.; Weiss, D.M. A Methodology for Collecting Valid Software Engineering Data. IEEE Trans. Softw. Eng. 1984, SE-10, 728–738.
- [26] van Solingen, D.R.; Berghout, E.W. The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development; McGraw-Hill: New York, NY, USA, 1999.
- [27] CAPES. CAPES: Institutional Page. 2022. Available online: https://www.gov.br/capes/pt-br/acesso-a-informacao/institucional/historia-e-missao (accessed on 18 October 2022).
- [28] CAPES. CAPES: Quadrennial Evaluation. 2022. Available online: https://www.gov.br/capes/pt-br/acesso-a-informacao/acoes-e-programas/avaliacao/sobre-a-avaliacao/avaliacao-o-que-e/sobre-a-avaliacao-conceitos-processos-e-normas/conceito-avaliacao (accessed on 18 October 2022).
- [29] Cai, H.; Zheng, V.W.; Chang, K.C.C. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637.
- [30] Gu, W.; Tandon, A.; Ahn, Y.Y.; Radicchi, F. Principled approach to the selection of the embedding dimension of networks. Nat. Commun. 2021, 12, 1–10.
- [31] Longa, A. Graph Embedding in 2D. Master’s Thesis, Università degli Studi di Trento, Trento, Italy, 2019.
- [32] Bonaccorso, G. Hands-On Unsupervised Learning with Python; Packt Publishing Ltd.: Birmingham, UK, 2019.
- [33] Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media: Sebastopol, CA, USA, 2016.
- [34] Patel, A.A. Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data; O’Reilly Media: Sebastopol, CA, USA, 2019.
- [35] Bramer, M. Principles of Data Mining; Springer: Berlin/Heidelberg, Germany, 2016.
- [36] Zhou, S.; Yuan, P.; Liu, L.; Jin, H. MGTag: A Multi-Dimensional Graph Labeling Scheme for Fast Reachability Queries. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 1372–1375.
- [37] Agrawal, G.; Deng, Y.; Park, J.; Liu, H.; Chen, Y.C. Building Knowledge Graphs from Unstructured Texts: Applications and Impact Analyses in Cybersecurity Education. Information 2022, 13, 526.
- [38] Santos, B.; Silva, I.; Costa, D.G. Research Group Dataset. Dataset Version 2, Mendeley Data. 2022. Available online: https://doi.org/10.17632/rwfd6p6xsd (accessed on 20 February 2023).
Study | Analysis Focus | Metrics | Temporal | DVC |
---|---|---|---|---|
[2] | Cohesion Analysis of groups | CNA metrics | Yearly | No |
[4] | Evaluation of groups | h-index and its derivatives | No | No |
[6] | Evaluation of groups | h-index and its derivatives | No | No |
[10] | Evaluation of groups | Embeddings | No | No |
[7] | Evaluation of groups | h-index and MZE-index | No | No |
[5] | Evaluation of groups | h-index and its derivatives | No | No |
[11] | Evaluation of groups | CNA metrics | No | No |
Proposed approach | Evaluation and Cohesion Analysis of groups | CNA metrics and Embeddings | Yearly | Yes |
ID | Research Line | Number of Members |
---|---|---|
 | Automation and Systems | 10 |
 | Computer Engineering | 11 |
 | Telecommunication | 5 |
Period | Network Metrics | Node Embedding | ||||||
---|---|---|---|---|---|---|---|---|
Accuracy | Accuracy | |||||||
2010–2012 | 52% | 0% | 67% | 57% | 56% | 0% | 69% | 67% |
2013–2016 | 58% | 61% | 67% | 0% | 69% | 50% | 77% | 80% |
2017–2020 | 58% | 61% | 67% | 0% | 69% | 57% | 77% | 67% |
2021–2022 | 58% | 50% | 71% | 0% | 69% | 67% | 73% | 67% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Santos, B.S.; Silva, I.; Costa, D.G. Symmetry in Scientific Collaboration Networks: A Study Using Temporal Graph Data Science and Scientometrics. Symmetry 2023, 15, 601. https://doi.org/10.3390/sym15030601