Opening Software Research Data 5Ws+1H
Abstract
1. Introduction
2. Why Should Software Engineers Share Their Software Engineering Research Data?
3. Who Should Be Considered during the Process of Opening Software Engineering Research Data?
4. What Type of Software Engineering Research Data Should Be Shared?
- Findability: Software Research Data and their associated metadata should be easy for both humans and machines to find.
- Accessibility: Software Research Data should be understandable and retrievable through standardized protocols, with the original research purpose clearly defined.
- Interoperability: Software Research Data should be executable and able to integrate seamlessly into other software artifacts.
- Reusability: Software Research Data should be both executable and modifiable, ensuring ongoing relevance.
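As an illustration, the four criteria above can be approximated by coarse checks on a software metadata record. The sketch below is a hypothetical illustration, not a normative FAIR assessment; the field names (`identifier`, `access_url`, `provenance`, etc.) are our own assumptions rather than terms mandated by the FAIR principles:

```python
# Hypothetical sketch: coarse FAIR checks on a software metadata record.
# Field names are illustrative assumptions, not a normative vocabulary.

def fair_report(record: dict) -> dict:
    """Return a pass/fail flag for each FAIR principle."""
    return {
        # Findability: a persistent identifier and descriptive keywords exist.
        "findable": bool(record.get("identifier")) and bool(record.get("keywords")),
        # Accessibility: a retrieval URL over a standard protocol is given.
        "accessible": str(record.get("access_url", "")).startswith("https://"),
        # Interoperability: declared formats or interfaces others can integrate with.
        "interoperable": bool(record.get("file_formats")) or bool(record.get("api")),
        # Reusability: an explicit license and provenance permit modification and reuse.
        "reusable": bool(record.get("license")) and bool(record.get("provenance")),
    }

record = {
    "identifier": "10.5281/zenodo.0000000",  # placeholder DOI, not a real deposit
    "keywords": ["software engineering", "open data"],
    "access_url": "https://example.org/record/1",
    "file_formats": ["text/csv"],
    "license": "CC0-1.0",
    "provenance": "produced during the reported study",
}
print(fair_report(record))  # all four checks pass for this record
```

Real assessments (e.g., against the FAIR4RS principles [45]) involve community-specific metrics rather than boolean field checks, but the sketch shows how each principle maps onto concrete metadata.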
- Static metadata: Primarily includes data produced during the planning, designing, and maintaining phases of software development. This type of metadata provides general information necessary to recreate the described system, without including the executables or runtime results.
- Dynamic-Runtime metadata: Includes metadata that can be executed to reproduce either the entire system or the research outcomes. These data mostly come from the development, implementation, and testing phases, where the system executes in real or simulated environments. These types may also include actions focused on reproduction, replication, verification, validation, repeatability, and/or utility [48].
- Software name.
- Programming language.
- Version information.
- File system metadata (e.g., file sizes, number of files, etc.).
- Licenses applied to any material (e.g., text, image, multimedia, software); these can include the GPL, MIT, LGPL (GNU Lesser General Public License), Creative Commons licenses, etc. [54].
- Identifiers, i.e., unique, persistent, and machine-actionable identifiers (e.g., DOI, ARK, or PURL).
- Locations where the metadata is stored.
- Categorization information (e.g., application category, keywords, etc.).
- Availability information (e.g., release dates, embargo, etc.).
- Relational metadata (e.g., software is part of another work).
- High-level description (e.g., abstract, README, or other text description).
- Dependency information.
- References to work the software is built on or relates to.
- Software metrics (e.g., quality metrics like code coverage, etc.).
- Development metrics (e.g., pertaining to issues, pull requests, etc.).
- Usage metrics (e.g., downloads, stars, citations, etc.).
- Funding information.
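Many of the fields above map directly onto existing software metadata vocabularies such as CodeMeta. A minimal sketch follows; all values are placeholders, and the exact term set should be verified against the CodeMeta specification:

```python
import json

# Minimal software metadata record loosely following CodeMeta terms.
# All values are placeholders; verify term names against the CodeMeta spec.
metadata = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",             # software name
    "programmingLanguage": "Python",             # programming language
    "version": "1.2.0",                          # version information
    "license": "https://spdx.org/licenses/MIT",  # license as an SPDX URL
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # placeholder persistent identifier
    "keywords": ["software engineering", "research data"],   # categorization information
    "datePublished": "2024-09-15",               # availability information
    "description": "High-level description of what the tool does.",
    "softwareRequirements": ["numpy>=1.26"],     # dependency information
    "funder": {"@type": "Organization", "name": "Example Funding Agency"},  # funding information
}

# A codemeta.json file at the repository root makes these fields machine-actionable.
with open("codemeta.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```

Keeping such a file under version control alongside the source makes the static metadata harvestable by repositories and citation tooling.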
5. When Is the Right Time to Share Software Engineering Research Data?
6. Where Should Software Engineers Deposit Their Software Engineering Research Data?
- Transparency: To be transparent about specific repository services and data holdings that are verifiable by publicly accessible evidence. When choosing the method of deposition, researchers need to make sure that there are no limitations or restrictions regarding the use of the deposited data. Being able to easily assess whether a repository can handle sensitive data responsibly would also inform their decision on whether to utilize the available data services.
- Responsibility: To be responsible for ensuring the authenticity and integrity of data holdings and for the reliability and persistence of its service. Before sharing your data, you need to make sure that it complies with any restrictions imposed by the chosen means of sharing. In several cases, verifying data and metadata integrity may take time. Both depositors and users must have confidence that the data will remain accessible over time and thus can be cited and referenced in scholarly publications.
- User Focus: To ensure that the data management norms and expectations of target user communities are met. Before sharing data, researchers need to identify the project’s target group. They may need to monitor and identify users’ expectations before depositing data. Moreover, it is important to remember that different types of users reach data through different routes [9].
- Sustainability: To sustain services and preserve data holdings for the long term. The software’s lifetime can be long for well-maintained projects, or end quickly if the task it was supposed to do is not needed anymore or if another software does it in a better way. While the software is easily replaceable, metadata, on the other hand, is an important digital artifact that should be preserved [58] along with datasets in order to properly verify or reproduce [59] published findings. Metadata might become less interesting with time, but it cannot be replaced as it is connected to one particular experiment at that particular time.
- Technology: To provide infrastructure and capabilities to support secure, persistent, and reliable services. Without active maintenance, data deteriorates and disappears at an alarming rate, making digital data far more disposable than traditional information sources [56].
7. How Should Software Engineers Share the Software Engineering Research Data?
- Purpose: Provide a concise description of the research and the produced software, outlining its purpose and the intended audience. Specify the primary function it serves and who would benefit most from using it.
- Types of Data Description: Clearly describe the types and formats of data collected or generated in the project, including any existing data being reused and its origin.
- Contextual Details (Metadata): Provide the contextual details and metadata necessary to make the data meaningful to others, ensuring documentation and data quality.
- Storage, Backup, and Security: Outline the storage infrastructure, backup procedures, and security measures to protect data against loss, corruption, or unauthorized access.
- Provisions for Protection/Privacy, Legal, and Ethical Aspects: Address privacy and legal considerations, including provisions for data protection, compliance with ethical guidelines, and compliance with relevant laws and regulations. When choosing licensing, engineers need to keep in mind that data and software should be licensed under different licenses [68]. For scientific data content, the most widely used licenses are the Creative Commons licenses. CC licenses are a good option for works such as articles, books, working papers, and reports, while a dedication to the public domain using CC0 is recommended for datasets and databases. The list of possible choices includes CC licenses, Open Data Commons licenses, CC0, and the CC Public Domain Mark. On the software licensing side, licenses are divided into three groups: permissive licenses (MIT License, FreeBSD License, the New BSD License, Apache License 2.0, and Artistic License 2.0), weak copyleft licenses (GNU LGPL, Mozilla Public License 2.0, Eclipse Public License 1.0, Common Development and Distribution License 1.0), and strong copyleft licenses (GNU GPL), plus an additional group of network copyleft licenses (Affero GPL). The choice of licenses is therefore relatively broad [54]. Engineers also need to ensure that the chosen license is compatible with the licenses of any libraries or dependencies used.
- Policies for Reuse: Define policies for data reuse, including access requirements, sharing agreements, and licensing terms to facilitate responsible data sharing.
- Policies for Access and Sharing: Establish policies and procedures for accessing and sharing research data with collaborators, institutions, or the broader scientific community. Explain how users can cite your research in their work. Provide links to citation guidelines that users should follow.
- Plan for Archiving and Preservation: Define a plan for the long-term archiving and preservation of data, specifying repositories or archives to deposit the data and ensuring its accessibility over time. Special attention must be paid when a reusable software component nears the end of its life. Discuss partnerships, sponsorships, or funding sources that will help support the software over time. Describe the level of support that will be provided to users of the software. This includes specifying the types of support (e.g., bug fixes, updates, technical assistance) and how it will be organized.
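The software license groups discussed under the privacy/legal item above can be sketched as a small lookup that flags, very coarsely, when combining two licenses deserves closer review. This is a naive illustration and not legal advice; real compatibility checking requires dedicated license tooling [54,68], and the grouping and rule below are our own simplification:

```python
# Naive sketch of the license groups discussed above; not legal advice.
# SPDX identifiers are used as keys; the ordering rule is a simplification.
LICENSE_GROUPS = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "Artistic-2.0": "permissive",
    "LGPL-3.0-only": "weak copyleft",
    "MPL-2.0": "weak copyleft",
    "EPL-1.0": "weak copyleft",
    "CDDL-1.0": "weak copyleft",
    "GPL-3.0-only": "strong copyleft",
    "AGPL-3.0-only": "network copyleft",
}

ORDER = ["permissive", "weak copyleft", "strong copyleft", "network copyleft"]

def needs_review(project_license: str, dependency_license: str) -> bool:
    """Flag combinations where a dependency is 'more copyleft' than the project."""
    proj = ORDER.index(LICENSE_GROUPS[project_license])
    dep = ORDER.index(LICENSE_GROUPS[dependency_license])
    return dep > proj  # a stricter dependency warrants a closer look

print(needs_review("MIT", "GPL-3.0-only"))   # True: copyleft dependency in a permissive project
print(needs_review("GPL-3.0-only", "MIT"))   # False: permissive code is broadly reusable
```

Tools such as license selectors and checkers [54,68] encode the actual compatibility rules; the point here is only that the group ordering in the text lends itself to automation.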
- Argos: Developed by OpenAIRE to facilitate Research Data Management activities. The service is an open platform designed to manage, validate, monitor, and maintain data management plans (DMPs) and datasets. It allows users to view publicly released DMPs and datasets, and to build DMPs that can be kept private or publicly released and exchanged between infrastructures.
- DMPRoadmap: The tool is jointly provided by the Digital Curation Center (DCC), and the University of California Curation Center (UC3). DMPRoadmap assists the collaborative creation and maintenance of different versions of DMPs, provides guidance on management issues, and supports various formats exportation. The service is free at the point of use for researchers to develop DMPs.
- DMPonline: DMPonline is a web-based open tool that relies on the DMPRoadmap code-base. The tool offers a number of templates that represent the requirements of different funders and institutions. Users are asked three questions at the outset to determine the appropriate template. Guidance by researcher funders, universities, and disciplines is provided to help researchers interpret and answer the questions.
- DMPTool: DMP Tool provides a click-through wizard to create a DMP that complies with the requirements of the funders, with direct links to the funder websites, help text to answer questions and resources for best practices in data management.
- Data Stewardship Wizard: Developed by ELIXIR, this is one of the DMP tools recommended by the European Commission. It brings together data stewards and researchers to compose DMPs efficiently. Researchers are guided through composing the DMP, which can then be exported using a selected template and format, including machine-actionable formats.
- RDMO: Offering an API, the RDMO supports researchers with the systematic planning, organization, and implementation of data management throughout the course of a research project.
- DAMAP: The tool supports managing both data and code along the research data life-cycle by generating DMPs using a guided ten-step questionnaire.
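Several of the tools above can export machine-actionable DMPs. The following is a minimal sketch of such a record, loosely modeled on the RDA DMP Common Standard; the field names should be verified against that standard, and all values are placeholders:

```python
import json

# Skeleton of a machine-actionable DMP (maDMP), loosely following the
# RDA DMP Common Standard. Field names are assumptions to verify; values
# are placeholders, not a real deposit.
madmp = {
    "dmp": {
        "title": "DMP for an example software engineering study",
        "created": "2024-09-15T00:00:00Z",
        "dmp_id": {"identifier": "https://doi.org/10.0000/placeholder", "type": "doi"},
        "dataset": [
            {
                "title": "Developer interaction logs",
                "personal_data": "no",      # flags drive the privacy/legal section
                "sensitive_data": "no",
                "distribution": [
                    {
                        "title": "CSV export",
                        "access_url": "https://example.org/deposit/1",
                        "license": [
                            {
                                "license_ref": "https://spdx.org/licenses/CC0-1.0",
                                "start_date": "2024-09-15",  # embargo end / availability date
                            }
                        ],
                    }
                ],
            }
        ],
    }
}

print(json.dumps(madmp, indent=2)[:60])  # serializable, hence exchangeable between tools
```

Because the record is plain JSON, it can be validated, versioned, and exchanged between planning tools and repositories rather than living only in a PDF.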
8. Use Case Scenario: Opening Software Research
8.1. Use Case Description
8.2. Description of Research Study
8.3. Description of Data
8.4. Why Should We Share Our Software Engineering Research Data?
8.5. Who Should Be Considered during OUR Process of Opening Software Engineering Research Data?
8.6. What Type of Software Engineering Research Data Should WE Share?
8.7. When Is the Right Time to Share OUR Software Engineering Research Data?
8.8. Where Should WE Deposit OUR Software Engineering Research Data?
8.9. How Should WE Share the Software Engineering Research Data?
8.10. Use Case Conclusion
9. Discussion
9.1. Implications for Researchers and Practitioners
9.2. Future Work
- Support and promote the guidelines: Identifying key organizations and institutions that will actively endorse and promote the guidelines within their networks.
- Provide training on the guidelines: Providing resources for researchers, data managers, and developers to put the principles into practice.
- Use the guidelines: Encouraging researchers and developers to apply these recommendations in their research process.
10. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Russell, A.L. Open Standards and the Digital Age: History, Ideology, and Networks; Cambridge Studies in the Emergence of Global Enterprise; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
- Critchlow, T.; Kleese, K. Data-Intensive Science; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
- Woelfle, M.; Olliaro, P.; Todd, M. Open science is a research accelerator. Nat. Chem. 2011, 3, 745–748. [Google Scholar] [CrossRef] [PubMed]
- Open Science Definition | FOSTER. Available online: https://www.fosteropenscience.eu/foster-taxonomy/open-science-definition (accessed on 15 September 2024).
- Budapest Open Access Initiative. Available online: https://www.budapestopenaccessinitiative.org/read/ (accessed on 15 September 2024).
- Kelley, A.; Garijo, D. A framework for creating knowledge graphs of scientific software metadata. Quant. Sci. Stud. 2021, 2, 1423–1446. [Google Scholar] [CrossRef]
- Anzt, H.; Kuehn, E.; Flegar, G. Crediting pull requests to open source research software as an academic contribution. J. Comput. Sci. 2021, 49, 101278. [Google Scholar] [CrossRef]
- Li, Z.; Yu, Y.; Zhou, M.; Wang, T.; Yin, G.; Lan, L.; Wang, H. Redundancy, context, and preference: An empirical study of duplicate pull requests in OSS projects. IEEE Trans. Softw. Eng. 2020, 48, 1309–1335. [Google Scholar] [CrossRef]
- Hucka, M.; Graham, M. Software search is not a science, even among scientists: A survey of how scientists and engineers find software. J. Syst. Softw. 2018, 141, 171–191. [Google Scholar] [CrossRef]
- Katz, D.S.; Gruenpeter, M.; Honeyman, T. Taking a fresh look at FAIR for research software. Patterns 2021, 2, 100222. [Google Scholar] [CrossRef]
- Ojala, M.; Cohn, M.L. Software Maintenance as Materialization of Common Knowledge. Engag. Sci. Technol. Soc. 2023, 9, 165–185. [Google Scholar] [CrossRef]
- Zaragozí, B.M.; Trilles, S.; Navarro-Carrión, J.T. Leveraging Container Technologies in a GIScience Project: A Perspective from Open Reproducible Research. ISPRS Int. J. Geo-Inf. 2020, 9, 138. [Google Scholar] [CrossRef]
- Herala, A.; Kasurinen, J.; Vanhala, E. Views on Open Data Business from Software Development Companies. J. Theor. Appl. Electron. Commer. Res. 2018, 13, 91–105. [Google Scholar] [CrossRef]
- Krishnamurthy, S. A managerial overview of open source software. Bus. Horizons 2003, 45, 47–56. [Google Scholar] [CrossRef]
- Geiger, R.S.; Howard, D.; Irani, L. The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proc. ACM Hum.-Comput. Interact. 2021, 5, 175. [Google Scholar] [CrossRef]
- Terzi, A.; Christou, O.; Bibi, S.; Angelidis, P. Software Reuse and Evolution in JavaScript Applications. In Proceedings of the 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Gran Canaria, Spain, 31 August–2 September 2022; pp. 263–269. [Google Scholar] [CrossRef]
- Jackson, M. Software Deposit: What to Deposit. 2018. Available online: https://doi.org/10.5281/zenodo.1327325 (accessed on 15 September 2024).
- Pashchenko, I.; Plate, H.; Ponta, S.E.; Sabetta, A.; Massacci, F. Vulnerable open source dependencies: Counting those that matter. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, USA, 11–12 October 2018. [Google Scholar] [CrossRef]
- Alarcon, G.M.; Gibson, A.M.; Walter, C.; Gamble, R.F.; Ryan, T.J.; Jessup, S.A.; Boyd, B.E.; Capiola, A. Trust Perceptions of Metadata in Open-Source Software: The Role of Performance and Reputation. Systems 2020, 8, 28. [Google Scholar] [CrossRef]
- Katz, D.; Niemeyer, K.; Smith, A.; Anderson, W.; Boettiger, C.; Hinsen, K.; Hooft, R.; Hucka, M.; Lee, A.; Löffler, F.; et al. Software vs. data in the context of citation. PeerJ Prepr. 2016, 4, e2630v1. [Google Scholar]
- Tenopir, C.; Allard, S.; Douglass, K.; Aydinoglu, A.U.; Wu, L.; Read, E.; Manoff, M.; Frame, M. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 2011, 6, e21101. [Google Scholar] [CrossRef]
- Wnuk, K.; Pfahl, D.; Callele, D.; Karlsson, E.A. How can open source software development help requirements management gain the potential of open innovation: An exploratory study. In Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, USA, 19–20 September 2012; pp. 271–280. [Google Scholar] [CrossRef]
- Ho-Quang, T.; Hebig, R.; Robles, G.; Chaudron, M.R.; Fernandez, M.A. Practices and Perceptions of UML Use in Open Source Projects. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), Buenos Aires, Argentina, 20–28 May 2017; pp. 203–212. [Google Scholar] [CrossRef]
- Hebig, R.; Quang, T.H.; Chaudron, M.R.V.; Robles, G.; Fernandez, M.A. The quest for open source projects that use UML: Mining GitHub. In Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, New York, NY, USA, 2–7 October 2016; pp. 173–183. [Google Scholar] [CrossRef]
- Di Gangi, P.M.; Wasko, M. Steal my idea! Organizational adoption of user innovations from a user innovation community: A case study of Dell IdeaStorm. Decis. Support Syst. 2009, 48, 303–312. [Google Scholar] [CrossRef]
- Open Data Report | Elsevier. Available online: https://www.elsevier.com/about/open-science/research-data/open-data-report (accessed on 15 September 2024).
- Runeson, P.; Söderberg, E.; Höst, M. A conceptual framework and recommendations for open data and artifacts in empirical software engineering. In Proceedings of the 1st IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering, Lisbon, Portugal, 16 August 2024; pp. 68–75. [Google Scholar]
- Kipling, R. Just So Stories; Macmillan & Co.: London, UK, 1902. [Google Scholar]
- Imarah, T.S.; Jaelani, R. ABC Analysis, Forecasting and Economic Order Quantity (EOQ) Implementation to Improve Smooth Operation Process. Dinasti Int. J. Educ. Manag. Soc. Sci. 2020, 1, 319–325. [Google Scholar]
- Hart, G. The five W’s: An old tool for the new task of task analysis. Tech. Commun. 1996, 43, 139–145. [Google Scholar]
- Abdulkadir, S.; Aliyu, H.O. ReQueclass: A Framework for Classifying Requirement Elicitation Questions Based on Kipling’s Technique and Zachman’s Enterprise Framework—A Guide for Software Requirement Engineers; i-manager Publications: Tamil Nadu, India, 2018. [Google Scholar]
- Terzi, A.; Bibi, S.; Tsitsimiklis, N.; Angelidis, P. Using Code from ChatGPT: Finding Patterns in the Developers’ Interaction with ChatGPT. In Proceedings of the International Conference on Software and Software Reuse; Springer: Cham, Switzerland, 2024; pp. 137–152. [Google Scholar]
- Schmidt, B.; Gemeinholzer, B.; Treloar, A. Open Data in Global Environmental Research: The Belmont Forum’s Open Data Survey. PLoS ONE 2016, 11, e0146695. [Google Scholar] [CrossRef]
- Data, S.; Astell, M. Benefits of Open Research Data Infographic. 2017. Available online: https://doi.org/10.6084/m9.figshare.5179006.v3 (accessed on 15 September 2024).
- Jackson, M. Software Deposit: Why Deposit Software. 2018. Available online: https://doi.org/10.5281/zenodo.1327333 (accessed on 15 September 2024).
- Pasquetto, I.V.; Sands, A.E.; Borgman, C.L. Exploring openness in data and science: What is “open”, to whom, when, and why? Proc. Assoc. Inf. Sci. Technol. 2015, 52, 1–2. [Google Scholar] [CrossRef]
- Reilly, S.; Schallier, W.; Schrimpf, S.; Smit, E.; Wilkinson, M. Report on Integration of Data and Publications. 2011. Available online: https://doi.org/10.5281/zenodo.8307 (accessed on 15 September 2024).
- Pasquetto, I.; Randles, B.; Borgman, C. On the Reuse of Scientific Data. Data Sci. J. 2017, 16, 8. [Google Scholar] [CrossRef]
- Sitek, D.; Bertelmann, R. Open Access: A State of the Art. In Opening Science: The Evolving Guide on How the Internet Is Changing Research, Collaboration and Scholarly Publishing; Bartling, S., Friesike, S., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 139–153. [Google Scholar] [CrossRef]
- Kazman, R.; Goldenson, D.; Monarch, I.; Nichols, W.; Valetto, G. Evaluating the Effects of Architectural Documentation: A Case Study of a Large Scale Open Source Project. IEEE Trans. Softw. Eng. 2016, 42, 220–260. [Google Scholar] [CrossRef]
- Ding, W.; Liang, P.; Tang, A.; Van Vliet, H.; Shahin, M. How Do Open Source Communities Document Software Architecture: An Exploratory Survey. In Proceedings of the 2014 19th International Conference on Engineering of Complex Computer Systems, Tianjin, China, 4–7 August 2014; pp. 136–145. [Google Scholar] [CrossRef]
- Gandhi, R.; Germonprez, M.; Link, G.J. Open Data Standards for Open Source Software Risk Management Routines: An Examination of SPDX. In Proceedings of the 2018 ACM International Conference on Supporting Group Work, New York, NY, USA, 7–10 January 2018; pp. 219–229. [Google Scholar] [CrossRef]
- Open Science: Purpose, Benefits, and What It Means for You. Available online: https://blog.theopenscholar.com/en/open-science-purposebenefits (accessed on 15 September 2024).
- McKiernan, E.; Bourne, P.; Brown, C.T.; Buck, S.; Kenall, A.; Lin, J.; McDougall, D.; Nosek, B.; Ram, K.; Soderberg, C.; et al. How open science helps researchers succeed. eLife 2016, 5, e16800. [Google Scholar] [CrossRef] [PubMed]
- Costello, M.J. Motivating Online Publication of Data. BioScience 2009, 59, 418–427. [Google Scholar] [CrossRef]
- Enders, T.; Satzger, G.; Fassnacht, M.; Wolff, C. Why should I share? Exploring benefits of open data for private sector organizations. In Proceedings of the Pacific Asia Conference on Information Systems, Taipei, Taiwan, 5–9 July 2022; Volume 1. [Google Scholar]
- Barker, M.; Chue Hong, N.; Katz, D.; Lamprecht, A.; Martinez-Ortiz, C.; Psomopoulos, F.; Harrow, J.; Castro, L.; Gruenpeter, M.; Martinez, P.; et al. Introducing the FAIR Principles for research software. Sci. Data 2022, 9, 622. [Google Scholar] [CrossRef]
- Hasselbring, W.; Carr, L.; Hettrick, S.; Packer, H.; Tiropanis, T. From FAIR research data toward FAIR and open research software. Inf. Technol. 2020, 62, 39–47. [Google Scholar] [CrossRef]
- Gil, Y.; Ratnakar, V.; Garijo, D. OntoSoft: Capturing Scientific Software Metadata. In Proceedings of the 8th International Conference on Knowledge Capture, New York, NY, USA, 7–10 October 2015. [Google Scholar] [CrossRef]
- Martinez-Ortiz, C.; Martinez Lavanchy, P.; Sesink, L.; Olivier, B.G.; Meakin, J.; de Jong, M.; Cruz, M. Practical Guide to Software Management Plans. 2023. Available online: https://doi.org/10.5281/zenodo.7589725 (accessed on 15 September 2024).
- Smith, A.M.; Katz, D.S.; Niemeyer, K.E. Software citation principles. PeerJ Comput. Sci. 2016, 2, e86. [Google Scholar] [CrossRef]
- Lamprecht, A.L.; Garcia, L.; Kuzak, M.; Martinez, C.; Arcila, R.; Martin Del Pico, E.; Dominguez Del Angel, V.; Sandt, S.; Ison, J.; Martinez, P.; et al. Towards FAIR principles for research software. Data Sci. 2020, 3, 37–59. [Google Scholar] [CrossRef]
- Druskat, S.; Bertuch, O.; Juckeland, G.; Knodel, O.; Schlauch, T. Software publications with rich metadata: State of the art, automated workflows and HERMES concept. arXiv 2022, arXiv:2201.09015. [Google Scholar]
- Kapitsaki, G.M.; Tselikas, N.D.; Foukarakis, I.E. An insight into license tools for open source software systems. J. Syst. Softw. 2015, 102, 72–87. [Google Scholar] [CrossRef]
- Dorta-González, P.; González-Betancor, S.M.; Dorta-González, M.I. To what extent is researchers’ data-sharing motivated by formal mechanisms of recognition and credit? Scientometrics 2021, 126, 2209–2225. [Google Scholar] [CrossRef]
- Shah, U.A.; Hussain, M.; Saddiqa, M.; Yar, M.S. Problems and Challenges in the Preservation of Digital Contents: An Analytical Study. Libr. Philos. Pract. 2021, 2021, 5628. [Google Scholar]
- Strecker, D. Quantitative Assessment of Metadata Collections of Research Data Repositories. Master’s Thesis, Humboldt-Universität zu Berlin, Philosophische Fakultät, Berlin, Germany, 2021. [Google Scholar] [CrossRef]
- Rollins, N.D.; Barton, C.M.; Bergin, S.; Janssen, M.A.; Lee, A. A computational model library for publishing model documentation and code. Environ. Model. Softw. 2014, 61, 59–64. [Google Scholar] [CrossRef]
- Peng, R.D. Reproducible research in computational science. Science 2011, 334, 1226–1227. [Google Scholar] [CrossRef] [PubMed]
- Gousios, G.; Vasilescu, B.; Serebrenik, A.; Zaidman, A. Lean GHTorrent: GitHub data on demand. In Proceedings of the 11th Working Conference on Mining Software Repositories, Hyderabad, India, 31 May–1 June 2014; pp. 384–387. [Google Scholar]
- Hyppölä, J.; Essen von, J.; Keskitalo, E.P. Beyond Open Access—Tools and methods for open research. In Proceedings of the AcademicMindTrek’15, Tampere, Finland, 22–24 September 2015; pp. 206–209. [Google Scholar] [CrossRef]
- Hasselbring, W.; Carr, L.; Hettrick, S.; Packer, H.; Tiropanis, T. Open source research software. Computer 2020, 53, 84–88. [Google Scholar] [CrossRef]
- Di Cosmo, R. Archiving and referencing source code with Software Heritage. In Proceedings of the Mathematical Software–ICMS 2020: 7th International Conference, Braunschweig, Germany, 13–16 July 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 362–373. [Google Scholar]
- Walters, W.H. Data journals: Incentivizing data access and documentation within the scholarly communication system. Insights 2020, 33, 18. [Google Scholar] [CrossRef]
- von Suchdoletz, D.; Brettschneider, P.; Axtmann, A.; Heber, M.; Oberländer, L.; Leendertse, J.; Schumm, I.; Brandt, O.; Schmidt, K.; Gertis, L.; et al. Sicherstellung der Reproduzierbarkeit von Forschungsergebnissen durch Bewahrung des Zugriffs auf Forschungssoftware. Bausteine Forschungsdatenmanagement 2023, 5, 1–13. [Google Scholar] [CrossRef]
- Chue Hong, N.P.; Crouch, S. What Is a Software Management Plan and How Can It Help Your Project? 2021. Available online: https://doi.org/10.5281/zenodo.5648418 (accessed on 15 September 2024).
- Gomez-Diaz, T.; Romier, G. Research Software Management Plan Template, V3.2. Bilingual Document (FR/EN). Available online: https://hal.science/hal-01802565/document (accessed on 15 September 2024).
- Kamocki, P.; Straňák, P.; Sedlák, M. The Public License Selector: Making Open Licensing Easier. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016); Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., et al., Eds.; European Language Resources Association (ELRA): Paris, France, 2016. [Google Scholar]
- Xiao, T.; Treude, C.; Hata, H.; Matsumoto, K. Devgpt: Studying developer-chatgpt conversations. In Proceedings of the 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR); IEEE: New York, NY, USA, 2024; pp. 227–230. [Google Scholar]
- NAIST. Available online: https://github.com/NAIST (accessed on 15 September 2024).
- ChatGPT Shared Links FAQ | OpenAI Help Center. Available online: https://help.openai.com/en/articles/7925741-chatgpt-shared-links-faq (accessed on 15 September 2024).
- White, J.; Hays, S.; Fu, Q.; Spencer-Smith, J.; Schmidt, D.C. Chatgpt prompt patterns for improving code quality, refactoring, requirements elicitation, and software design. In Generative AI for Effective Software Development; Springer: Berlin/Heidelberg, Germany, 2024; pp. 71–108. [Google Scholar]
- GitHub-Anasterzia/gptchallenge. Available online: https://github.com/Anasterzia/gptchallenge (accessed on 15 September 2024).
- B2SHARE. Available online: https://b2share.eudat.eu/records/db2ef5890fa44c7a85af366a50de73b9 (accessed on 15 September 2024).
- Meijer, I.; Costas, R.; Zahedi, Z.; Wouters, P. The Value of Research Data—Metrics for Datasets from a Cultural and Technical Point of View. A Knowledge Exchange Report; Knowledge Exchange: Copenhagen, Denmark, 2016. [Google Scholar]
Artifact (What) | Benefits of Sharing (Why) |
---|---|
Phase 1: Project Initiation | |
Project Plan | - Provides clarity on the research objectives, methodology, and expected outcomes in advance. - Users are aware of upcoming milestones, which helps in coordinating efforts. - Funders are aware of how their resources will be utilized, increasing the likelihood of future support and ensuring confidence in the research. - Prevents duplication of effort by informing others of ongoing work in the field [8,38,39]. |
Phase 2: Analysis and Detailed Planning | |
Business Requirement Documentation (BRD) User Requirement Specifications Software Requirement Specifications (SRS) | - Sharing requirements ensures they cannot be used as a differentiator by competitors. - Researchers can inject new requirements into the open source community by outsourcing the cost of prototype development. - Public sharing of documents facilitates automatic extraction of requirements based on evolving needs [22,25]. |
Configuration Management (CM) Plan | - Increases reliability over the project’s longevity. |
Phase 3: Design | |
Detailed Design Specifications (DDS) | - Provides a better understanding of the software structure. - Defines a set of constraints on subsequent implementation. - Provides consistency between intention and implementation. - Enables faster verification and increases user acceptance. - Improves communication and reduces ambiguity regarding structure and use. - Helps new contributors onboard effectively. - Provides a better understanding of the system’s costs and delivery time [23,24,40,41]. |
Phase 4: Software Construction | |
Unit Code Software packages Artifacts | The research on Open Software has thoroughly analyzed the benefits of sharing source code, including enhanced quality, improved security, flexibility of use, reduced costs for improvements, increased collaboration, and better user support. Additionally, we propose the following benefits: - Reduces code redundancy, preventing duplicated efforts. - Increases recognition and researchers’ reputation. - Expands opportunities for reusability, allows users to replicate, validate, and build upon existing work [7,8,9,19]. |
Phase 5: Testing | |
Code Inspection Test Summary Report | - Clarifies the intended behavior of the software at the time it was last programmed. - Helps ensure that proposed changes do not break other projects within the ecosystem. - Distributes the labor of maintaining code quality, reducing the burden on code reviewers. - Supports validation of submitted code and identification of errors by more contributors, increasing the likelihood of detecting and resolving bugs. - Enables test modifications and improvements without disrupting the main development process, supporting continuous integration and quality assurance [8,11,12,13,14]. |
User Acceptance | - Demonstrates a commitment to meeting users’ needs and addressing their concerns. - Involves users more deeply in the development process. - Increases research reputation and discoverability [9,19] |
Risk Management | - Simplifies internal routines by providing external feedback. - Facilitates the exchange of standards by aligning practices with widely accepted benchmarks. - Allows researchers to compare their local routines with other implementations, leading to adaptation and potential improvement [42]. |
Phase 6: Implementation | |
Production Environment Live System | - Provides users with real-time experience, increasing trust in the system. - With more people putting the system under stress, bugs and errors are discovered and addressed faster [7,8,9] |
Phase 7: Maintenance | |
Dependencies Tech Debt | - Informs users on necessary third-party components, supporting easier implementation. - External feedback leads to updating outdated or risky dependencies. - Increases code security by identifying unused or vulnerable dependencies. - Supports learning and adaptation by helping users manage dependencies better. - Reduces the cost of fixing vulnerabilities related to outdated dependencies [16,17,18]. |
Maintenance Guide | - Helps new contributors easily adapt, aiding in long-term project sustainability. - Gains users’ trust by providing clear maintenance practices. - Distributes maintenance tasks, reducing the burden on individual maintainers as the project scales. - Enhances the project’s longevity by ensuring consistent upkeep and adaptability within the software ecosystem [8,11,15]. |
Customer’s Review / Software metrics / Development metrics / Usage metrics | - Raises awareness of the software’s quality. - Makes the product more appealing through transparent metrics. - Enables better evaluation and validation of the project. - Increases willingness to use the data. - Builds reputation, leading to more views and increased discoverability. - Increases loyalty [9,19] |
Audience (Who) | Why It Is Beneficial | Where: Generalist Data Repositories | Where: Discipline-Specific Repositories | Where: (Data) Journals
---|---|---|---|---
SE Academic Researchers | - Additional publications. - Greater citation rate. - Wider recognition among peers. - Invitations to collaborate. - Invitations to provide consultancy [45]. | ✓ | ✓ | ✓ |
SE in the private sector | - Cost and effort reduction due to reuse. - Ready-made solutions requiring alterations to fit the sector’s needs. - Increased availability of verified data. - Acquiring new skills and knowledge [46]. | ✓ | ✓ | 
Data Scientists | - Access to extended open data sets. - Reduced time and effort. - High-quality, multi-format data. - Reduced data redundancy. | ✓ | ✓ | 
Developers | - Acquiring new skills and knowledge. - Reduced software redundancy. - Easy access to innovative tools. - Better understanding of code structure. - Faster development. | ✓ | ✓ | |
Maintainers | - Acquiring new skills and knowledge. - Data undergoes testing by a diverse pool of users, leading to faster identification and rectification of bugs. - Reduced reliance on outdated sources. - Greater pool of automated test cases. - Reduction of vulnerable code. | ✓ | ✓ | ✓ |
Business owners | - Businesses strengthen their innovative capabilities. - Better informed employers. - Greater pool of users [46]. | ✓ | ||
Research Funders | - Better financial return from research investment. - Increased reputation. - Building network. - Invitations to collaborate. - Invitations to provide consultancy. - New opportunities for funds. | ✓ | ✓ | 
Publishers | - Independent verification and qualification of research. - Increased reputation can elevate the publisher’s impact. - Related research is likely to gain attention. | ✓ | |
Affiliations | - Independent verification and qualification of research. - Increased reputation. - Higher impact. - Attracting better funding opportunities. | ✓ | ✓ | |
Research Participants | - Access to data. - Better control over shared content. - Increased trust. | ✓ | |
Public | - Access to knowledge. - Greater support. - Easier access to resources. - Better understanding. - Reduced communication barriers. - Increased trust. | ✓ |
Artifact (What) | Increasing FAIR Principles | Static (S)/Dynamic Runtime (DR) | Why: Replication of Research | Why: Contribute to Software | Why: Present Funding Results | Why: Use Data for New Project | Why: Evaluate Contribution | Why: Store Software Entry | Where: Informal Sharing among Peers | Where: Formal Sharing to Consortium | Where: Funder’s Website | Where: Discipline-Specific Repositories | Where: Generalist Data Repositories | Where: (Data) Journals as Part of Research Paper
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Phase 1: Project Initiation | ||||||||||||||
Project Plan | F | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Phase 2: Analysis and Detailed Planning | ||||||||||||||
Business Requirement Documentation (BRD) | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Software Requirement Specifications (SRS) | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
User Requirement Specifications | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Configuration Management (CM) Plan | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Phase 3: Design | ||||||||||||||
Detailed Design Specifications (DDS) | A | DR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Phase 4: Software Construction | ||||||||||||||
Unit Code | R | DR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Software packages Artifacts | R | DR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Phase 5: Testing | ||||||||||||||
Code Inspection | I | DR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Test Summary Report | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
User Acceptance | F | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Risk Management | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
Phase 6: Implementation | ||||||||||||||
Production Environment | F | DR | ✓ | ✓ | ✓ | ✓ | ||||||||
Live System | R | DR | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Phase 7: Maintenance | ||||||||||||||
Dependencies | I | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Tech Debt | I | S | ✓ | ✓ | ✓ |
Maintenance Guide | R | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Customer’s Review | F | S | ✓ | ✓ | ✓ | ✓ | ||||||||
Software metrics | A | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
Development metrics | A | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Usage metrics | A | S | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
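The “Increasing FAIR Principles” column above relies on each artifact being described by machine-actionable static metadata — the software name, programming language, version, license, persistent identifier, and categorization keywords discussed earlier. As an illustration only, such a record could be serialized in a CodeMeta-style JSON file; every field value below is a hypothetical placeholder (including the DOI), and the context URL is assumed from the CodeMeta 3.0 convention rather than taken from the paper:

```python
import json

# Hypothetical static-metadata record in the CodeMeta/schema.org style.
# All values are placeholders for illustration; none refer to a real project.
record = {
    "@context": "https://w3id.org/codemeta/3.0",   # assumed CodeMeta 3.0 context URL
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",               # software name
    "programmingLanguage": "Python",               # programming language
    "version": "1.0.0",                            # version information
    "license": "https://spdx.org/licenses/MIT.html",          # applied license
    "identifier": "https://doi.org/10.5281/zenodo.0000000",   # placeholder persistent identifier
    "keywords": ["software engineering", "open data"],        # categorization information
}

def to_codemeta_json(rec: dict) -> str:
    """Serialize the record to pretty-printed JSON, as it would appear in codemeta.json."""
    return json.dumps(rec, indent=2, sort_keys=True)

print(to_codemeta_json(record))
```

A record like this, stored alongside the artifact, is what allows both humans and harvesting machines to find and categorize the deposit.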
Access | Name (Where) | Link | Supported Artifacts (What) | When: Beginning of Project | When: Midpoint of Project | When: Completion of Project
---|---|---|---|---|---|---
Generalist Data Repository | ||||||
Open Access | Kaggle | https://www.kaggle.com/ (accessed on 15 September 2024) | Datasets, Notebooks (Jupyter notebooks, R or Python scripts), pre-trained ML models | ✓ | ✓ | ✓ |
GitHub | https://github.com/ (accessed on 15 September 2024) | Static and Dynamic Metadata, raw Source code, test cases, and documents in any format | ✓ | ✓ | ✓ | |
EUDAT | https://www.eudat.eu/ (accessed on 15 September 2024) | Dynamic Metadata, Software, workflows | ✓ | ✓ | ||
Zenodo | https://zenodo.org/ (accessed on 15 September 2024) | Static and Dynamic Metadata, Software, Pre-publications | ✓ | ✓ | ||
ISBSG | https://www.isbsg.org/about-isbsg/ (accessed on 15 September 2024) | Software projects with documentation | ||||
OSF | https://osf.io/ (accessed on 15 September 2024) | Files, research data, code, protocols | ✓ | ✓ | ||
OLOS | https://access.olos.swiss/portal/#/home (accessed on 15 September 2024) | Research data, Static Metadata, Source code | ✓ | ✓ | ||
SIR | https://sir.csc.ncsu.edu/portal/index.php (accessed on 15 September 2024) | Software systems, artifacts, test suites, scripts | ✓ | ✓ | ||
Dataverse | https://dataverse.org/ (accessed on 15 September 2024) | Static and Dynamic Metadata, Software, Research data, Publications | ✓ | |||
SourceForge | https://sourceforge.net/ (accessed on 15 September 2024) | Static Metadata, Open source and paid software projects | ✓ | |||
Proprietary | Dryad | https://datadryad.org/stash (accessed on 15 September 2024) | Any form of Research Files, including compressed archives | ✓ | ✓ | |
Figshare | https://figshare.com/ (accessed on 15 September 2024) | Any form of Static or multimedia Research Output, excluding the source code | ✓ | ✓ | ||
Discipline-/Subject-Specific Data Repository | ||||||
Open Access | PROMISE | http://promise.site.uottawa.ca/SERepository/ (accessed on 15 September 2024) | Datasets and tools to serve researchers in building predictive software models | ✓ | ✓ | |
Software Heritage Dataset | https://docs.softwareheritage.org/index.html (accessed on 15 September 2024) | Source code from software projects and development forges | ✓ | ✓ | ||
Only for data use, no contribution | Qualitas Corpus | http://www.qualitascorpus.com/ (accessed on 15 September 2024) | Collection of software systems intended for empirical studies of code artifacts | ✓ | ✓ | ✓
Bug Prediction Dataset | https://bug.inf.usi.ch/index.php (accessed on 15 September 2024) | Collection of models and metrics of software systems and their histories | ✓ | ✓ | ✓ | |
Ultimate Debian Database (UDD) | http://udd.debian.org/ (accessed on 15 September 2024) | Aspects of Debian in the same SQL database | ✓ | ✓ | ✓ | |
CiteSeerx | https://csxstatic.ist.psu.edu/home (accessed on 15 September 2024) | Digital library, includes scientific papers, algorithms, data, metadata, services, techniques, and software | ✓ | ✓ | ✓ | |
Software Engineering Data Repository for Research and Education | http://analytics.jpn.org/SEdata/ (accessed on 15 September 2024) | Datasets and tools to serve researchers in building predictive software models | ✓ | ✓ | ✓ | |
Data Journals | ||||||
Open Access | Data in Brief | https://www.sciencedirect.com/journal/data-in-brief (accessed on 15 September 2024) | Short articles, Static and Dynamic Metadata, Software | ✓ | ✓ | ✓ |
SoftwareX | https://www.sciencedirect.com/journal/softwarex (accessed on 15 September 2024) | Scientific paper, Software | ✓ | ✓ | ✓ | |
Data | https://www.mdpi.com/journal/data (accessed on 15 September 2024) | Static and Dynamic Metadata, Scientific paper | ✓ | ✓ | ✓ | |
Journal of Open Research Software (JORS) | https://openresearchsoftware.metajnl.com/ (accessed on 15 September 2024) | Software Metapapers, Research Software | ✓ | ✓ | ||
Journal of Open Source Software (JOSS) | https://joss.theoj.org/ (accessed on 15 September 2024) | Scientific papers, research software packages | ✓ | ✓ | ||
Scientific Data | https://www.nature.com/sdata/ (accessed on 15 September 2024) | Short articles, Static Metadata | ✓ | ✓ | ||
Software Impacts | https://www.sciencedirect.com/journal/software-impacts (accessed on 15 September 2024) | Static and Dynamic Metadata, Software, Short description (Software research Impact, Use cases) | ✓ | |||
SIAM Journal on Scientific Computing (SISC) | https://epubs.siam.org/journal/sisc/ (accessed on 15 September 2024) | Scientific papers, documented Research Impact and Use cases | ✓ |
When | Who | Why | What | Where |
---|---|---|---|---|
Beginning of Project | Authors | Collaboration, Disclosure, Establishing a foundation for data accuracy | Raw Data and Preprocessed Data (original JSON files, Python scripts, data analysis procedures, preprocessing actions) | Natively on machines and in a GitHub private repository |
Midpoint of Project | Research Team, Software Researchers | Cross-Validation, Reproducibility, Verification of research, Proofreading, Ensuring real-time collaboration | Processed Data (assembled dataset, Python scripts, Weka files, SPSS files, complete data analysis procedures and methodologies) | GitHub public repository |
Completion of Project | Software Researchers, Businesses, Educational Institutions, Learners, Financial Sponsors, Publishers | Knowledge Dissemination, Enhancing transparency, Supporting future research, Informing curriculum development | Final Data (new dataset in CSV file, Python scripts, ARFF file, SPSS file) and Research Paper (tables, graphs, metrics, and analytic research methodology) | Journal and EUDAT repository |
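At the completion stage described above, the final data (CSV dataset, Python scripts, ARFF file) are deposited with a journal and the EUDAT repository. A common supporting practice is to bundle the files and record a checksum per file so that downstream users can verify the integrity of what they retrieve. The sketch below is a minimal, hypothetical illustration — the file names and contents are placeholders, and no specific repository API is assumed:

```python
import hashlib
import io
import zipfile

def sha256_of(data: bytes) -> str:
    """SHA-256 hex digest used to verify a deposited file's integrity."""
    return hashlib.sha256(data).hexdigest()

def package_artifacts(files: dict[str, bytes]) -> tuple[bytes, dict[str, str]]:
    """Bundle artifact files into an in-memory ZIP and record a checksum per file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    checksums = {name: sha256_of(data) for name, data in files.items()}
    return buf.getvalue(), checksums

# Hypothetical final-data files mirroring the case study's artifact types.
artifacts = {
    "dataset.csv": b"id,metric\n1,0.92\n",
    "analysis.py": b"print('analysis')\n",
    "dataset.arff": b"@relation example\n",
}
archive, checksums = package_artifacts(artifacts)
```

Publishing the checksum manifest next to the archive lets any reader confirm that the files they downloaded from the repository match what the authors deposited.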
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Terzi, A.; Bibi, S. Opening Software Research Data 5Ws+1H. Software 2024, 3, 411-441. https://doi.org/10.3390/software3040021