Collaboration System for Multidisciplinary Research with Essential Data Analysis Toolkit Built-In
Abstract
1. Introduction
- First, we identify the needs, pain points, and challenges directly with the end user and propose an incremental solution, usable from its first version, that covers the basic needs in information management, supports fuzzy search of documents and metadata, and transparently computes essential statistics and displays information about the datasets.
- Second, we implement tools for visualizing n variables, computing linear and nonlinear regressions for any selected pair of numerical variables, and displaying information in several distinct forms, with particular emphasis on t-SNE and PCA plots for supervised and unsupervised database clusters.
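To make the regression tooling in the second item concrete, the following is a minimal sketch of fitting a linear and a nonlinear model to one pair of numerical variables. It is illustrative only: the function names `linear_fit` and `exponential_fit` are hypothetical, and the exponential model is an assumed example of a nonlinear regression, not necessarily one of the models offered by the platform.

```python
import math
from typing import List, Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long columns with at least two rows")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def exponential_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a * exp(b * x) through a linear fit on (x, ln y); requires y > 0.

    The returned R^2 is measured in log space, not on the original scale.
    """
    if any(yi <= 0 for yi in y):
        raise ValueError("the log transform requires strictly positive y values")
    log_y: List[float] = [math.log(yi) for yi in y]
    b, ln_a, r_squared = linear_fit(x, log_y)
    return math.exp(ln_a), b, r_squared


# Made-up sample columns standing in for two user-selected variables.
xs = [0.0, 1.0, 2.0, 3.0]
print(linear_fit(xs, [1.0, 3.0, 5.0, 7.0]))
print(exponential_fit(xs, [2.0 * math.exp(0.5 * t) for t in xs]))
```

Linearizing the exponential model keeps the sketch dependency-free; a general nonlinear fit would typically use an iterative least-squares solver instead.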
2. Theoretical Background
2.1. Collaborating System Infrastructure
2.2. Multidimensional Data Analysis and Visualisation
3. Materials and Methods
3.1. Hybrid Methodology
3.2. Requirements and Techniques Definition
3.2.1. Resource Management
3.2.2. Fuzzy Search Method
3.2.3. Multidimensional Data Visualization and Statistics Characterization
3.2.4. Knowledge-Sharing Implementation Tools
3.3. System Implementation
- Frontend. This component is a microservice that generates and serves the Web application to every connected client. This microservice has been designed to support horizontal scaling, as it only serves as a stateless proxy between the user interface and the data-processing backend.
- Data-processing backend. This component is a microservice that serves user requests through a Representational State Transfer (REST) Application Programming Interface (API). It handles information, uploads, data storage, analysis, and processing. Communication with the Relational Database Management System (RDBMS) goes through two distinct Object Relational Mapping (ORM) interfaces: the first handles the platform’s data and state, while the second creates and manipulates user-generated databases. This microservice has not been designed to be thread-safe. Because information processing is stateful, the microservice currently supports only vertical scaling, that is, increasing the resources available for processing the information.
- Relational Database Management System. This component is external to the application and can be deployed either as a vertically scaled physical or virtual server or as a cluster of vertically scaled, load-balanced servers to enable horizontal scaling. The platform employs a MySQL-compatible database engine with two schemas: the first stores the platform’s data, comprising user records and document metadata and pointers, and the second stores raw user databases.
- File storage backend. This component is external to the application and has been implemented as a simple storage service (S3) that does not require any external cloud-dependent API; instead, the file storage backend stores flat (that is, non-hierarchical) data identified by their UUID, as described in [39,40], whose metadata is stored in the platform database.
- User authentication and login process. Access to the platform is provided by a username-and-password authentication mechanism. User credentials are stored in the application database, with passwords protected using the PBKDF2 key-derivation function specified in PKCS #5, as detailed in Moriarty et al. [41]. Every time a user successfully authenticates, the current date and time in UTC are stored in the database in a dedicated “last login” field. This field is used to determine whether a user has never logged in since the account was created, in which case a periodic background housekeeping task deletes the user entry after one day. Upon successful authentication, the user dashboard is displayed. The dashboard contains quick links to recently uploaded and anchored information, each a pointer to a document or database, as shown in Figure 2.
- Data analysis and visualization component. This component allows users to upload tabular data in several standard formats, which the data-processing backend interprets and stores as intermediate tables in the database backend. In the databases section of the user interface, users can select a dataset from the dropdown menu. Three data display methods exist: preview, tabular display, and graphical format.
- Document manager. This component allows users to upload files of arbitrary format. Some Web-supported multimedia formats (images and PDF documents) can also be previewed directly on the platform, either in the right-side panel UI element or in a new browser tab or window. These previews are not downloaded onto the user’s computer but are cached in memory. As previously described, a text box UI element at the top of the screen offers fuzzy-matching search to ease use of the system.
- Researcher directory. This component allows the user to search for the contact information of a specific researcher, either by name or by current association with a module. As in the document manager, a text box at the top of the screen offers search with fuzzy matching, as previously described.
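The fuzzy-matching search boxes in the document manager and researcher directory can be illustrated with a minimal sketch based on edit distance. This is an assumption-laden example rather than the platform's implementation: the function names, the 0.6 threshold, the scaling of the Levenshtein distance by the longer string's length, and the sample names are all hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1]; 1.0 means identical, ignoring case."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_search(query: str, names: List[str],
                 threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    hits = []
    for name in names:
        # Score the query against the full name and against each word in it.
        candidates = [name] + name.split()
        score = max(similarity(query, candidate) for candidate in candidates)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up directory entries; a misspelled query still finds the intended person.
directory = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
print(fuzzy_search("Turng", directory))
```

Scoring each word separately lets a query for a surname match a full-name entry, at the cost of one distance computation per word.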
3.4. System Testing and Feedback
3.4.1. Project Members’ Test
3.4.2. Testing by Researchers of a Multidisciplinary Project
4. Results
4.1. Researcher’s Test
4.2. Product Owner Test
4.3. Case of Study
5. Discussion
6. Conclusions
7. Patents
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Shaoshao, X.Y.; Wu, J.; Deng, C.; Li, P.G.; Feng, C.X.J. A web-enabled collaborative quality management system. J. Manuf. Syst. 2006, 25, 95–107.
- Yoon, S.W.; Matsui, M.; Yamada, T.; Nof, S.Y. Analysis of effectiveness and benefits of collaboration modes with information and knowledge-sharing. J. Intell. Manuf. 2011, 22, 101–112.
- Akhavan, P.; Rahimi, A.; Mehralian, G. Developing a model for knowledge sharing in research centres. Vine 2013, 43, 357–393.
- Lubell, M. Collaborative partnerships in complex institutional systems. Curr. Opin. Environ. Sustain. 2015, 12, 41–47.
- Franz, M.; Lopes, C.T.; Huck, G.; Dong, Y.; Sumer, O.; Bader, G.D. Cytoscape.js: A graph theory library for visualisation and analysis. Bioinformatics 2016, 32, 309–311.
- Van Rossum, G.; Drake, F.L., Jr. The Python Language Reference; Python Software Foundation: Wilmington, DE, USA, 2014.
- Singhal, S.; Jena, M. A study on Weka tool for data preprocessing, classification, and clustering. IJITEE 2013, 2, 250–253.
- Kulkarni, E.G.; Kulkarni, R.B. Weka, powerful tool in data mining. IJCA 2016, 975, 8887.
- Attwal, K.P.S.; Dhiman, A.S. Exploring data mining tool Weka and using Weka to build and evaluate predictive models. Adv. Appl. Math. Sci. 2020, 19, 451–469.
- Mullon, P.A.; Ngoepe, M. An integrated framework to elevate information governance to a national level in South Africa. Rec. Manag. J. 2019, 29, 103–116.
- Janssen, M.; Brous, P.; Estevez, E.; Barbosa, L.S.; Janowski, T. Data governance: Organising data for trustworthy Artificial Intelligence. Gov. Inf. Q. 2020, 37, 101493.
- Taivalsaari, A.; Mikkonen, T.; Ingalls, D.; Palacz, K. Web Browser as an Application Platform. In Proceedings of the 2008 34th Euromicro Conference Software Engineering and Advanced Applications, Parma, Italy, 3–5 September 2008; pp. 293–302.
- Taivalsaari, A.; Mikkonen, T.; Pautasso, C.; Systä, K. Comparing the Built-In Application Architecture Models in the Web Browser. In Proceedings of the IEEE International Conference on Software Architecture (ICSA), Gothenburg, Sweden, 3–7 April 2017; pp. 51–54.
- Waseem, M.; Liang, P.; Shahin, M. A Systematic Mapping Study on Microservices Architecture in DevOps. J. Syst. Soft. 2020, 170, 110798.
- Qin, X.; Luo, Y.; Tang, N.; Li, G. Making data visualisation more efficient and effective: A survey. VLDB J. 2020, 29, 93–117.
- Ou, J.; Zhu, L.J. trackViewer: A bioconductor package for interactive and integrative visualisation of multi-omics data. Nat. Methods 2019, 16, 453–454.
- Morales, J.; Echavarría, F. Methodology to explore open data of road crashes using Data Science: Case Medellín. Ingeniare 2019, 27, 495–509.
- Medina-Quispe, F.; Castillo-Rojas, W.; Villegas, C. Metrics for the support of visual exploration of components in data mining models. Ingeniare 2020, 28, 596–611.
- Da Silva Lopes, M.A.; Dória Neto, A.D.; De Medeiros Martins, A. Parallel t-SNE Applied to Data Visualization in Smart Cities. IEEE Access 2020, 8, 11482–11490.
- Kopecká, M.; Hájek, M.; Jiménez-Alfaro, B.; Chytrý, M. The T-SNE Algorithm as a Tool to Improve the Quality of Reference Data Used in Accurate Mapping of Heterogeneous Non-forest Vegetation. Remote Sens. 2020, 12, 39.
- Liu, H.; Yang, J.; Ye, M.; James, S.C.; Tang, Z.; Dong, J.; Xing, T. Using T-distributed Stochastic Neighbor Embedding (t-SNE) for Cluster Analysis and Spatial Zone Delineation of Groundwater Geochemistry Data. J. Hydrol. 2021, 597, 126146.
- Figma Co. Available online: https://www.figma.com/ (accessed on 20 April 2023).
- Shore Labs. Kanban Tool. Available online: https://kanbantool.com/es/metodologia-kanban (accessed on 20 April 2023).
- GitHub Inc. GitHub Platform. Available online: https://github.com/ (accessed on 24 April 2023).
- Sutherland, J. Future of scrum: Parallel pipelining of sprints in complex projects. In Proceedings of the Agile Development Conference (ADC’05), Denver, CO, USA, 24–29 July 2005; pp. 90–99.
- Swimm Team: Popular Collaborative Coding Practices. Available online: https://swimm.io/learn/code-collabo-ration/code-collaboration-styles-tools-and-best-practices/ (accessed on 2 July 2023).
- Villamizar, M.; Garcés, O.; Castro, H.; Verano, M.; Salamanca, L.; Casallas, R.; Gil, S. Evaluating the Monolithic and the Microservice Architecture Pattern to Deploy Web Applications in the Cloud. In Proceedings of the 2015 10th Computing Colombian Conference (10CCC), Bogota, Colombia, 21–25 September 2015; pp. 583–590.
- Jain, S.; Seeja, K.R.; Jindal, R. A fuzzy ontology framework in information retrieval using semantic query expansion. Int. J. Inf. Manag. Data Insights 2021, 1, 100009.
- Baeza-Yates, R.A.; Perleberg, C.H. Fast and Practical Approximate String Matching. Inf. Process. Lett. 1996, 59, 21–27.
- Yujian, L.; Bo, L. A normalised Levenshtein distance metric. IEEE TPAMI 2007, 29, 1091–1095.
- Schmidt, J. Usage of Visualisation Techniques in Data Science Workflows. VISIGRAPP 2020, 3, 309–316.
- Yalim, C.; Handley Holy, A.H. The effectiveness of visualisation techniques for supporting decision-making. In Proceedings of the Modeling, Simulation and Visualization Student Capstone Conference 2023, Suffolk, VA, USA, 20 April 2023.
- Bisong, E. Matplotlib and Seaborn. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress: Berkeley, CA, USA, 2019; pp. 151–165.
- Kurita, T. Principal Component Analysis (PCA). In Computer Vision; Springer: Cham, Switzerland, 2020.
- van der Maaten, L.J.P.; Hinton, G.E. Visualizing Data using t-SNE. JMLR 2008, 9, 2579–2605.
- Hildenbrand, T.; Meyer, J. Intertwining lean and design thinking: Software product development from empathy to shipment. In Software for People, Management for Professional; Maedche, A., Botzenhardt, A., Neer, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 217–237.
- Srivastava, A.; Bhardwaj, S.; Saraswat, S. SCRUM model for agile methodology. In Proceedings of the 2017 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India, 5–6 May 2017; pp. 864–869.
- Sandhu, R.S. Role-Based Access Control. In Advances in Computers; Zelkowitz, M.V., Ed.; Elsevier: Amsterdam, The Netherlands, 1998; Volume 46, pp. 237–286. ISBN 978-0-12-012146-5.
- ITU-T X.667; Information Technology—Procedures for the Operation of Object Identifier Registration Authorities: Generation of Universally Unique Identifiers and Their Use in Object Identifiers. ITU-T X-Series Recommendations; Telecommunication Standardization Sector of ITU (ITU-T): Geneva, Switzerland, 2012.
- Leach, P.J.; Salz, R.; Mealling, M.H. RFC 4122—A Universally Unique Identifier (UUID) URN Namespace. Internet Engineering Task Force. 2005. Available online: https://www.irtf.org/ (accessed on 10 November 2023).
- Moriarty, K.; Kaliski, B.; Rusch, A. PKCS #5: Password-Based Cryptography Specification Version 2.1; IETF: 2017. Available online: https://www.ietf.org/ (accessed on 10 November 2023).
- Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
- Dirección General de Información en Salud. Datos Abiertos. Available online: http://www.dgis.salud.gob.mx/contenidos/basesdedatos/Datos_Abiertos_gobmx.html (accessed on 16 August 2023).
- De Paula, D.; Cormican, K.; Dobrigkeit, F. From Acquaintances to Partners in Innovation: An Analysis of 20 Years of Design Thinking’s Contribution to New Product Development. IEEE Trans. Eng. Manag. 2022, 69, 1664–1677.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Garay-Jiménez, L.I.; Romero-Lujambio, J.F.; Santiago-Horta, A.; Tovar-Corona, B.; Gómez-Miranda, P.; Mata-Rivera, M.F. Collaboration System for Multidisciplinary Research with Essential Data Analysis Toolkit Built-In. Information 2023, 14, 626. https://doi.org/10.3390/info14120626