DataPLAN: A Web-Based Data Management Plan Generator for the Plant Sciences
Round 1
Reviewer 1 Report
This paper describes a tool for the automated generation of DMPs using pre-defined templates, specifically in the plant sciences. The tool is publicly available and I had a chance to test it myself.
The tool is very intuitive and creates an overall good impression. I think it is also relevant to the designated community. However, the structure, focus, and the way of reporting in the paper must be improved.
Major:
The introduction is lengthy and is not clear:
- what are the specific needs of the plant community?
- why the mentioned tools and solutions are not sufficient?
- how is your solution different and what is new in your approach compared to the previous work?
- what research you did to create the tool, i.e. what was the research methodology, how you evaluated your solution, etc.?
- an outline of what this paper is about, i.e. what should the readers expect to find in it
Section 2.3. comes out of the blue. I guess it is a contribution of one of the co-authors that you decided to merge "somewhere". It provides details on clustering and comparison of algorithms which seems completely unrelated to the topic of the paper. There is also only one short sentence that tries to explain why this was needed at all and it is still not clear to me how it influenced the design of the tool. I think you need to create new sections that make it explicit: (1) what the specific requirements of the plant sciences community are, (2) what is the systematic approach you followed to design the tool and how all actions, e.g. like the clustering, influenced its design.
Section 2.4.1. must be deleted, because it is confusing and unnecessary. FAIR principles in most cases apply to data (at least you don't refer to FAIR principles for software, etc.). Based on the flow of the paper and the section header I would expect to learn how your tool helps in making data FAIR. Instead, you describe how you make the tool itself FAIR. Why do you do this? Was this a requirement of the plant community? Even if it was, then the description is far-fetched. For example: "lightweight design and fast loading contribute to accessibility" - I don't see any sub-principle in the original FAIR paper by Wilkinson et al. that would postulate this... Same for the interoperability - the fact that you use JSON, does not mean that it is interoperable. Following some well-known standards would likely make it. I have checked your outputs and the JSON includes a basic stub from the example provided by RDA and a list of your own properties – not very interoperable.
3. Discussion - it includes many words that I would rather see in a marketing leaflet and not in a research paper. They have no justification and are self-praising, e.g. "versatile but light-weight tool" - how was this evaluated? who said that?
You also write in the introduction of this section that your tool fulfils the requirements of the Horizon and DFG templates - where is the evidence?
The comparison with other tools is lengthy and it provides a lot of seemingly relevant differences. However, it seems that you're not distinguishing between software and a service. A lot of tools you used in the comparison can be self-hosted, or a custom instance can be hosted only for a dedicated community. Tools like DSW allow you to have your own custom knowledge model. There is also the tool DAMAP that provides pre-defined answers similar to yours (not mentioned in the comparison). In my opinion, you are trying very hard to argue that developing yet another tool was necessary. I am OK with a new tool. I think there is room for more tools. However, you should argue this necessity based on the needs of your community or the specificity of the use case you consider, and not by trying to prove that some tools require you to log in, while yours does not...
Sections 3.2.1 - 3.3.3 write about things in the future tense and thus I understand that this is future work that was not evaluated in any way. It should be removed or moved to a future work section.
Section 4.2 gives a glimpse of the methodology you applied to develop the tool. It should appear much earlier in the paper. This is what makes the paper interesting to a broader audience, i.e. they can benefit from your approach and do things in a similar way. This is what sets a research paper apart from a marketing leaflet!
4.3. Testing - this does not replace evaluation! We need to see a case study in which your tool was used to produce (and reuse!) the DMP. Currently the paper offers no evidence of whether the tool helped the designated community achieve the original goals and to what extent!
When talking about the reuse of information for DMPs, one cannot ignore the existence of the RDA recommendation for maDMPs. Most of the tools you mention support it and you also have parts of it in your work (but you don't name it properly...). The main difference between RDA and your work is that you seem to still work with text descriptions, while RDA models information. Am I right? Or do you have some internal data model that organises facts and not text? I looked at your JSON export and it uses a very basic set of fields from the RDA recommendation, but has a lot of placeholder fields and also custom fields from your tool. Since the community seems to be moving towards maDMPs, it is not clear to me to what extent your tool is compatible with it and if not, then why.
Minor:
Abstract: First sentence of the abstract is already confusing! "RDM is a system for..." - DataPlan may be a system for this, but RDM is not.
Abstract: in general it is not clear why you want to reuse text formulations across DMPs. I suspect that this is for the researchers' convenience, but it is nowhere made explicit. In other words, it is not clear from the abstract what the goal of your work is and who benefits from it and how.
Introduction: you talk a lot about "standardisation" or rather its lack. I only understood much later what you mean. Most readers will assume that there is standardisation, because we only have a few DMP templates (not many like it used to be around 2010)! You refer to standardisation in the way people answer questions in DMPs. Similar to my comment regarding the abstract, it is not clear why this is needed and what it actually means.
Line 82: "...DMPs results in limited reusability" - given that you refer to FAIR principles already in the introduction, most people will understand that you mean reusability of DATA described by a DMP, not the reusability of a DMP itself. Such formulations are really confusing.
Conclusion: “we applied machine learning to develop DMP tools” – this is buzzwording and it is not true; you used some clustering algorithms to compare two text documents… you did not develop the tool using ML…
Most of the references are links to repositories, templates, and documents. Please update the references with proper research papers that deal with the topic of this work, and please explain how similar/different your work is compared to recent developments.
The English in terms of grammar is fine. The problem is how you interpret and use certain terms, e.g. "no standardisation" or FAIRness - see the comments above.
Author Response
We thank Reviewer 1 for diligently evaluating our tool and providing us with comprehensive feedback. In response to the insightful comments, we have thoroughly revised the paper to improve its structure, organization, focus, and reporting approach.
Major:
The introduction is lengthy and is not clear:
Response: We agree and have considerably shortened the introduction.
- what are the specific needs of the plant community?
Response: Many data types are specific to the plant sciences or are shared with other disciplines but have aspects that are unique to plants. For example, growth and biomass accumulation occur in all living organisms, but organ-specific growth parameters, such as leaf and root biomass and the number of flowers and fruits, are unique to plants. Photosynthetic performance is shared with some microbes but is measured in a unique way in plants. High-throughput phenotyping in plants generates large, high-dimensional datasets based on different types of imaging among other data types, and these must be annotated with plant-specific ontologies. Plant taxonomy is another example, particularly in the context of plant-microbe interactions. These rich and diverse datasets must also be merged with fairly universal data types (such as genomics, transcriptomics, proteomics and metabolomics data), generating heterogeneous workflows. RDM platforms for plants therefore need to accommodate these unique aspects. This is now stated in the revised introduction (lines 143-168 in the tracked changes document).
- why the mentioned tools and solutions are not sufficient?
Response: DMP tools without plant-specific examples are of limited use to the plant sciences community; in some cases, they provide specific examples focusing on disciplines other than plants. DataPLAN purposefully uses plant-specific examples to overcome this limitation. We have now explained the need for this new tool in the introduction (lines 125-141 in the tracked changes document).
- how is your solution different and what is new in your approach compared to the previous work?
Response: DataPLAN provides a user-friendly, step-by-step wizard in a web-based tool that enables users to generate multiple DMPs in the different formats demanded by different funding bodies, while only needing to answer one set of questions. We focused deliberately on this approach, and the user-friendliness was rated highly in a recent study (https://doi.org/10.52825/cordi.v1i.338). We initially designed DataPLAN for the plant sciences because it is integrated within the NFDI DataPLANT framework. However, it can be adapted to other research disciplines, which is why we make the code freely accessible. The DataPLANT ecosystem offers a diverse set of tools and services to assist users with various aspects of RDM, including standardized (meta)data annotation using controlled vocabularies and ontologies. We have now made this clearer in the revised manuscript (lines 319-403 in the tracked changes document).
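To illustrate the principle, the following is a minimal sketch of the one-questionnaire, many-templates idea. It is not DataPLAN's actual source code: the template strings, placeholder names and example answers are hypothetical.

```typescript
// One set of answers is substituted into several funder-specific
// templates via named {{placeholder}} markers.

type Answers = Record<string, string>;

const templates: Record<string, string> = {
  "Horizon Europe":
    "Data will be deposited in {{repository}} and annotated with {{ontology}} terms.",
  "DFG": "Storage: {{repository}}. Metadata annotation: {{ontology}}.",
};

// Replace each {{key}} placeholder with the matching answer.
function renderDmp(template: string, answers: Answers): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) =>
    answers[key] ?? `[missing: ${key}]`,
  );
}

// One set of answers yields one DMP text block per funding-body template.
const answers: Answers = { repository: "e!DAL-PGP", ontology: "Plant Ontology" };
for (const [funder, template] of Object.entries(templates)) {
  console.log(`${funder}: ${renderDmp(template, answers)}`);
}
```

Here the same two answers produce both a Horizon Europe and a DFG text block; DataPLAN's real templates additionally distinguish different text types, as described in the revised manuscript.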
- what research you did to create the tool, i.e. what was the research methodology, how you evaluated your solution, etc.?
Response: We manually compared the RDM content and the questions posed by the selected funding agencies and programs in order to create DMPs. We then gathered the responses needed to prepare the DMPs and manually answered all of the questions. The results were compared with existing DMPs from prior projects. This information is now provided in the revised manuscript (lines 239-250 in the tracked changes document).
- an outline of what this paper is about, i.e. what should the readers expect to find in it
Response: As requested, we have summarized the content of the manuscript (lines 205-221 in the tracked changes document).
Section 2.3. comes out of the blue. I guess it is a contribution of one of the co-authors that you decided to merge "somewhere". It provides details on clustering and comparison of algorithms which seems completely unrelated to the topic of the paper. There is also only one short sentence that tries to explain why this was needed at all and it is still not clear to me how it influenced the design of the tool. I think you need to create new sections that make it explicit: (1) what the specific requirements of the plant sciences community are, (2) what is the systematic approach you followed to design the tool and how all actions, e.g. like the clustering, influenced its design.
Response: We agree that the detailed background of the STS method is unnecessary and we have removed it from the manuscript. We have reorganized the manuscript to cover the points suggested.
Section 2.4.1. must be deleted, because it is confusing and unnecessary. FAIR principles in most cases apply to data (at least you don't refer to FAIR principles for software, etc.). Based on the flow of the paper and the section header I would expect to learn how your tool helps in making data FAIR. Instead, you describe how you make the tool itself FAIR. Why do you do this? Was this a requirement of the plant community? Even if it was, then the description is far-fetched. For example: "lightweight design and fast loading contribute to accessibility" - I don't see any sub-principle in the original FAIR paper by Wilkinson et al. that would postulate this... Same for the interoperability - the fact that you use JSON, does not mean that it is interoperable. Following some well-known standards would likely make it. I have checked your outputs and the JSON includes a basic stub from the example provided by RDA and a list of your own properties – not very interoperable.
Response: We appreciate this detailed analysis of our work and we respond to each point in turn.
- We have included the correct citation of the “FAIR principles for software” because we would like to specify that DataPLAN is FAIR as software. This is not a requirement of the plant community per se, but it would be a good start for any research data software to have a dedicated section detailing the implementation of the FAIR principles, and we would like to lead by example.
- Our tool only provides helpful RDM practices to make data FAIR. As a DMP tool, we only give guidance and recommendations to our users on how to best abide by the FAIR principles for their data, but we cannot give instructions that cover every use case. We have added a section that explains the FAIR RDM practices that are included in the DataPLAN tool and how the tool can help to make data FAIR in general (lines 319-345 in the tracked changes document). We added the reference to the FAIR principles for software (line 538) and modified the header by adding “of software” to each relevant subsection.
- We have toned down the assessment of interoperability (lines 570-581 in the tracked changes document). Currently, we only use maDMP JSON for metadata, whereas the raw data for the DMP are stored in the HTML file. The maDMP JSON metadata standard is still evolving, so we aim to follow the standard and add as much information as possible to the JSON in the future (lines 758-761 in the tracked changes document).
- Discussion - it includes many words that I would rather see in a marketing leaflet and not in a research paper. They have no justification and are self-praising, e.g. "versatile but light-weight tool" - how was this evaluated? who said that?
Response: We have revised the discussion to make it more objective and have provided justification for any claims made (e.g. “DataPLAN is a client-side web-based application with less than 1 MB of source code.”). We have also added Lighthouse testing results (lines 556-569 in the tracked changes document).
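For reference, such an audit can be reproduced programmatically with the Lighthouse Node API (npm packages "lighthouse" and "chrome-launcher"); the sketch below is illustrative, and the audited URL of the public DataPLAN instance is an assumption.

```typescript
import lighthouse from "lighthouse";
import * as chromeLauncher from "chrome-launcher";

async function audit(url: string): Promise<void> {
  // Launch a headless Chrome instance for Lighthouse to drive.
  const chrome = await chromeLauncher.launch({ chromeFlags: ["--headless"] });
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      onlyCategories: ["performance", "accessibility"],
    });
    if (!result) throw new Error("Lighthouse returned no result");
    // Category scores are reported on a 0-1 scale; print as percentages.
    for (const [name, category] of Object.entries(result.lhr.categories)) {
      console.log(`${name}: ${Math.round((category.score ?? 0) * 100)}`);
    }
  } finally {
    await chrome.kill();
  }
}

// Assumed URL of the hosted DataPLAN instance.
audit("https://plan.nfdi4plants.org").catch(console.error);
```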
You also write in the introduction of this section that your tool fulfils the requirements of the Horizon and DFG templates - where is the evidence?
Response: We have modified the sentence to state that the tool has been modeled according to templates and questionnaires provided by Horizon Europe and DFG to facilitate the preparation of DMPs.
The comparison with other tools is lengthy and it provides a lot of seemingly relevant differences. However, it seems that you're not distinguishing between software and a service. A lot of tools you used in the comparison can be self-hosted, or a custom instance can be hosted only for a dedicated community. Tools like DSW allow you to have your own custom knowledge model. There is also the tool DAMAP that provides pre-defined answers similar to yours (not mentioned in the comparison). In my opinion, you are trying very hard to argue that developing yet another tool was necessary. I am OK with a new tool. I think there is room for more tools. However, you should argue this necessity based on the needs of your community or the specificity of the use case you consider, and not by trying to prove that some tools require you to log in, while yours does not...
Response: Again we are very grateful for this detailed critique and we respond to each point in turn:
- In the comparison of tools, we highlighted some features that improve accessibility and reduce user workload. We have rephrased the paragraph on offline usage (line 556 in the tracked changes document), which is achieved by pressing Ctrl+S in the browser to save a local HTML file. We have also changed the text describing other tools that “…do not offer prewritten templates because, as tools (but not service providers), they are applicable to a wide range of research domains and need to be self-hosted or customized to provide specific and practical RDM solutions...” (lines 563-668 in the tracked changes document).
- The exclusion of DAMAP was an oversight and we have now included it in the text and the comparison table. We have also stated that it provides “prewritten answers”.
- We have also explained the focus on plant communities in more detail (lines 729-739 in the tracked changes document).
Sections 3.2.1 - 3.3.3 write about things in the future tense and thus I understand that this is future work that was not evaluated in any way. It should be removed or moved to a future work section.
Response: We agree, and we have changed this part of the manuscript into an outlook section. Reviewer 2 also commented on this issue and we have merged the changes.
Section 4.2 gives a glimpse of the methodology you applied to develop the tool. It should appear much earlier in the paper. This is what makes the paper interesting to a broader audience, i.e. they can benefit from your approach and do things in a similar way. This is what sets a research paper apart from a marketing leaflet!
Response: We very much appreciate this feedback and have moved the section into the results as suggested (Section 3.1.1).
4.3. Testing - this does not replace evaluation! We need to see a case study in which your tool was used to produce (and reuse!) the DMP. Currently the paper offers no evidence of whether the tool helped the designated community achieve the original goals and to what extent!
Response: DataPLAN has been used mainly for internal projects thus far in order to test its capabilities and limitations. We are only now at the stage where we feel confident in releasing it to the broader research community. Indeed, one of the purposes of this manuscript is to raise awareness and encourage new users. This means we have no external case studies to present. However, we have created a comprehensive example DMP using DataPLAN, which serves as a demonstration of the tool’s capabilities. This is included in the supplemental files accompanying the manuscript. We believe that this practical example will provide readers with a clear understanding of how DataPLAN can be used to produce DMPs effectively. For external reuse of DMPs between different projects and tools, we plan to be fully compliant with maDMP in the future (also see the next response).
When talking about the reuse of information for DMPs, one cannot ignore the existence of the RDA recommendation for maDMPs. Most of the tools you mention support it and you also have parts of it in your work (but you don't name it properly...). The main difference between RDA and your work is that you seem to still work with text descriptions, while RDA models information. Am I right? Or do you have some internal data model that organises facts and not text? I looked at your JSON export and it uses a very basic set of fields from the RDA recommendation, but has a lot of placeholder fields and also custom fields from your tool. Since the community seems to be moving towards maDMPs, it is not clear to me to what extent your tool is compatible with it and if not, then why.
Response:
- Currently, only the user input and metadata are stored in the JSON file. We use a minimal subset of the maDMP recommendation for metadata storage.
- We did not apply the maDMP information model to all of our data because some template data is stored in the HTML file. We will look for ways to store the HTML content in maDMP correctly.
- Our internal data model is based on placeholders and different text types (lines 283-294 and 358-402 in the tracked changes document).
- The current JSON is intended for internal reuse and storage of the answers. We plan to use maDMP to facilitate data reuse between different tools (an abridged sketch of such an export is shown below).
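For concreteness, a minimal maDMP-style export might look as follows. The field names follow the RDA DMP Common Standard, but all values are illustrative placeholders, not real project data; tool-specific answers would additionally need a namespaced extension block, which is what currently appears as custom fields in our export.

```typescript
// Abridged maDMP export sketch (RDA DMP Common Standard field names,
// illustrative placeholder values).
const madmp = {
  dmp: {
    title: "Example plant-science DMP generated with DataPLAN",
    created: "2023-08-01T12:00:00Z",
    modified: "2023-08-01T12:00:00Z",
    language: "eng",
    dmp_id: { identifier: "https://example.org/dmp/1", type: "url" },
    contact: {
      name: "Jane Researcher", // hypothetical contact
      mbox: "jane.researcher@example.org",
      contact_id: {
        identifier: "https://orcid.org/0000-0000-0000-0000",
        type: "orcid",
      },
    },
    ethical_issues_exist: "no",
    dataset: [
      {
        title: "High-throughput phenotyping image set",
        personal_data: "no",
        sensitive_data: "no",
        dataset_id: { identifier: "urn:example:dataset:1", type: "other" },
      },
    ],
  },
};

console.log(JSON.stringify(madmp, null, 2));
```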
Minor:
Abstract: First sentence of the abstract is already confusing! "RDM is a system for..." - DataPlan may be a system for this, but RDM is not.
Response: The abstract has been completely revised to summarize the content of the manuscript.
Abstract: in general it is not clear why you want to reuse text formulations across DMPs. I suspect that this is for the researchers' convenience, but it is nowhere made explicit. In other words, it is not clear from the abstract what the goal of your work is and who benefits from it and how.
Response: We have now included a statement explaining that the lack of reusable components in DMPs is inefficient because additional time is required by researchers to write unique DMPs for different projects. DataPLAN addresses this issue by allowing the same text inputs to be used for multiple DMPs that meet the criteria mandated by different funding agencies.
Introduction: you talk a lot about "standardisation" or rather its lack. I only understood much later what you mean. Most readers will assume that there is standardisation, because we only have a few DMP templates (not many like it used to be around 2010)! You refer to standardisation in the way people answer questions in DMPs. Similar to my comment regarding the abstract, it is not clear why this is needed and what it actually means.
Response: We have adjusted our descriptions and explanations to provide greater clarity and precision on this issue (lines 65-74, Figure 1c, in the tracked changes document).
Line 82: "...DMPs results in limited reusability" - given that you refer to FAIR principles already in the introduction, most people will understand that you mean reusability of DATA described by a DMP, not the reusability of a DMP itself. Such formulations are really confusing.
Response: We have removed this confusing sentence to improve clarity.
Conclusion: “we applied machine learning to develop DMP tools” – this is buzzwording and it is not true; you used some clustering algorithms to compare two text documents… you did not develop the tool using ML…
Response: As stated earlier, we have now removed the detailed STS analysis from the manuscript.
Most of the references are links to repositories, templates, and documents. Please update the references with proper research papers that deal with the topic of this work, and please explain how similar/different your work is compared to recent developments.
Response: We have updated the references as suggested. For example, we have cited the following publications:
Blumesberger, S. et al. FAIR Data Austria – Abstimmung der Implementierung von FAIR Tools und Services. Mitteilungen der VÖB 74, 102–120 (2021)
Miksa, T., Oblasser, S. & Rauber, A. Automating research data management using machine-actionable data management plans. ACM Transact. Manag. Inf. Syst. 13, 18 (2022)
Cardoso, J. et al. DCSO: towards an ontology for machine-actionable data management plans. J. Biomed. Semant. 13, 21 (2022)
Reviewer 2 Report
1. The introduction is unclear, has many acronyms, and the context and objective of the research are not understood. It is necessary to add what methods or methodologies have been used and what results have been obtained in order to understand what new knowledge is being proposed.
2. The method used for the research is missing; the results section mixes the methods with the description of the case and the results. It is necessary to structure the manuscript, since it is not understood how the texts are analyzed even though data-analytics techniques and models are used.
3. In Materials and Methods it is not understood what the methods are. It is advisable to write this in a clearer way.
4. Conclusions need to be improved.
5. It is necessary to improve the structure and add a diagram showing the architecture, to better understand the research.
6. How useful is it? What new knowledge does it generate?
7. Add future research lines
Moderate editing of English language required
Author Response
We thank Reviewer 2 for diligently evaluating our tool and providing us with comprehensive feedback.
- The introduction is unclear, has many acronyms, and the context and objective of the research are not understood. It is necessary to add what methods or methodologies have been used and what results have been obtained in order to understand what new knowledge is being proposed.
Response: We have fully rewritten the introduction (lines 112-221 in the tracked changes document) and have added methods (lines 222-306 in the tracked changes document) and results (lines 308-569 in the tracked changes document).
- The method used for the research is missing; the results section mixes the methods with the description of the case and the results. It is necessary to structure the manuscript, since it is not understood how the texts are analyzed even though data-analytics techniques and models are used.
Response: Thank you for your valuable feedback. In response to your advice, we have restructured the manuscript to create a clear distinction between methods and results.
- In Materials and Methods it is not understood what the methods are. It is advisable to write this in a clearer way.
Response: We have restructured and rewritten the entire methods section to improve clarity and we have added Figure 3, a comprehensive flow chart illustrating the DataPLAN architecture and core functions, to provide a visual overview.
- Conclusions need to be improved.
Response: We have revised the conclusions as suggested. For example, we removed the first paragraph (which repeated the description of our tool) and rephrased the remainder, including the replacement of “standardized answers” with “prewritten reusable answers”.
- It is necessary to improve the structure and add a diagram showing the architecture, to better understand the research.
Response: Thank you for your comment. The structure has been changed (for example, we added more explanation of the text contents in lines 358-403 of the tracked changes document) and we added Figure 3 to illustrate the architecture of DataPLAN.
- How useful is it? What new knowledge does it generate?
Response: DataPLAN does not generate any new knowledge, but this is not its purpose. Its purpose is to streamline the preparation of DMPs for projects in the plant sciences, enabling multiple DMPs fitting the requirements of different funding agencies to be prepared from a single set of input questions. This saves time and effort when research groups/laboratories need to prepare DMPs with similar contents for different funding organizations. We have quantified the technical accessibility by adding Lighthouse testing results (lines 566-570 in the tracked changes document). The user-friendliness of DataPLAN was also rated highly in a recent study (https://doi.org/10.52825/cordi.v1i.338).
- Add future research lines
Response: We have now included an outlook section as suggested (lines 754-783 in the tracked changes document).
Reviewer 3 Report
This article investigates the issue that the standardization of data management plan (DMP) content in plant science is not as clear as the standardization of research data management (RDM) practices and data/metadata. The article proposes a web-based plant science data management plan generator, DataPLAN. Specifically, this is a tool that combines questionnaire surveys with prewritten standardized responses. The questionnaire is packaged in a serverless single-page web application, which can then generate standardized responses from the DMP template.
Pros:
- Considering the findability, accessibility, interoperability, and reusability of data: the DataPLAN proposed in the article is a tool that combines questionnaires with prewritten standardized responses. The authors encapsulate the questionnaire in a serverless single-page web application, which can then generate standardized responses from DMP templates.
- Considering the compatibility of the tool: DataPLAN has created standardized templates for generating DMPs, which currently meet the requirements of three funding agencies. DataPLAN also provides a practical user guide to promote RDM methods.
- Considering the practicality of the tool: the DataPLAN web application is open source and does not require an internet connection. By using DataPLAN, the workload associated with creating, updating, and adhering to DMPs is significantly reduced.
Minor comments:
- The arrangement of the introduction section of the article is not reasonable enough. It only describes some background knowledge and does not discuss the advantages of DMPs and RDM mentioned in the article, nor does it focus on the advantages of the proposed tool. Therefore, it is recommended that the authors arrange the structure of the article more reasonably.
- The authors should consider referring to some papers related to their topic, such as “BSMD: A blockchain-based secure storage mechanism for big spatio-temporal data”, and “Multiple cloud storage mechanism based on blockchain in smart homes”.
- In the experimental section, the content is too simple and lacks the analysis that should be included in an experimental section, such as comparative experiments on time or storage costs, which require further comparison with existing methods. I hope the authors can supplement the experimental content and use pictures to demonstrate the advantages and disadvantages of the proposed tool.
Minor editing
Author Response
Comments and Suggestions for Authors
This article investigates the issue that the standardization of data management plan (DMP) content in plant science is not as clear as the standardization of research data management (RDM) practices and data/metadata. The article proposes a web-based plant science data management plan generator, DataPLAN. Specifically, this is a tool that combines questionnaire surveys with prewritten standardized responses. The questionnaire is packaged in a serverless single-page web application, which can then generate standardized responses from the DMP template.
We are sincerely appreciative of the time and effort contributed by Reviewer 3, and we highly value the insightful comments and valuable advice provided.
Pros:
- Considering the findability, accessibility, interoperability, and reusability of data: the DataPLAN proposed in the article is a tool that combines questionnaires with prewritten standardized responses. The authors encapsulate the questionnaire in a serverless single-page web application, which can then generate standardized responses from DMP templates.
- Considering the compatibility of the tool: DataPLAN has created standardized templates for generating DMPs, which currently meet the requirements of three funding agencies. DataPLAN also provides a practical user guide to promote RDM methods.
- Considering the practicality of the tool: the DataPLAN web application is open source and does not require an internet connection. By using DataPLAN, the workload associated with creating, updating, and adhering to DMPs is significantly reduced.
Minor comments:
- The arrangement of the introduction section of the article is not reasonable enough. It only describes some background knowledge and does not discuss the advantages of DMPs and RDM mentioned in the article, nor does it focus on the advantages of the proposed tool. Therefore, it is recommended that the authors arrange the structure of the article more reasonably.
Response: We have rewritten the introduction to address these comments, including the advantages of DMPs and RDM (lines 66-220 in the tracked changes document).
- The authors should consider referring to some papers related to their topic, such as “BSMD: A blockchain-based secure storage mechanism for big spatio-temporal data”, and “Multiple cloud storage mechanism based on blockchain in smart homes”.
Response: We appreciate your kind gesture in sharing additional papers for us to review and potentially reference. After careful consideration, we have determined that the current connections between our manuscript and the provided literature may not be sufficiently robust. Nevertheless, we will certainly explore the possibility of incorporating references to these papers in our future publications when relevant.
- In the experimental section, the content is too simple and lacks the analysis that should be included in an experimental section, such as comparative experiments on time or storage costs, which require further comparison with existing methods. I hope the authors can supplement the experimental content and use pictures to demonstrate the advantages and disadvantages of the proposed tool.
Response: We have added Figure 1c and Figure 3 to explain the tool’s architecture and core functionality more clearly. We have also provided an extensive comparison of our tool and others developed to fulfill a similar purpose. Comparative experiments are not appropriate in this scenario because there are no quantitative metrics that can be used to rank the performance of different tools – they will all produce DMPs given user input. The differences are qualitative, reflecting variations in functionality and usability, which are covered by our comparative descriptions in the discussion.
Round 2
Reviewer 1 Report
Well done. I am glad that my comments were helpful.
Author Response
Thank you again for your valuable feedback. We are happy that our revisions have addressed all of your comments.
Reviewer 2 Report
There is a lack of descriptions of the figures, and the conclusions should state what lines of future research can be developed.
Minor editing of English language required
Author Response
Comment: There is a lack of descriptions of the figures, and the conclusions should state what lines of future research can be developed.
Response: We have added more content to the captions of Figures 1-4 and also added more descriptions in the main text. For example:
The caption of Figure 1 has been rewritten to:
“Figure 1. DMPs prepared for multiple projects can be merged if they all use standardized RDM and have reusable metadata and raw data. (a) DMPs encompass RDM practices for raw data and metadata. The content of a DMP is dependent on the RDM practices used. (b) Although DMPs encompass reusable standardized RDM practices in similar plant-related projects (blue, green and yellow boxes) [1], the contents of DMPs prepared for different projects or funding agencies are disconnected [6,7], and users must provide input (red boxes) multiple times even though the DMPs only have minor differences. (c) If similar or standardized RDM practices (blue, green and yellow boxes) are used in different projects, the content of different DMPs can be merged. The merged content can be provided for use in diverse projects with different funding agencies and programs. This reduces the user input (red box) compared to that shown in panel (b).”
The caption of Figure 2 has been rewritten to:
“Figure 2. DataPLAN template and questionnaire design. Step 1: Manual checking and answering of questions in the DFG and Horizon Europe questionnaires. Step 2: Generation of reusable answer building-blocks for each funding body. We ensure the answers comply with existing metadata standards, data types and RDM platforms so that they can be reused between different projects and funding bodies. Step 3: Design of the questions displayed by DataPLAN followed by matching them with the reusable answers generated in step 2.”
For Figure 3, in lines 161-163 of the main text, we added:
“As shown in Figure 3a, we stored the interface, functions and reusable DMP content in a single HTML file. The interface is exposed whereas the other content is hidden.”
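A minimal sketch of this single-file arrangement (illustrative only; the element IDs, attributes and content below are hypothetical, not DataPLAN's actual markup):

```typescript
// Reusable DMP content lives in a hidden <template> element inside the
// same HTML page as the questionnaire, so saving the page (Ctrl+S)
// preserves the whole tool for offline use.
document.body.innerHTML = `
  <form id="questionnaire"><!-- exposed interface --></form>
  <template id="dmp-content">
    <section data-funder="DFG">Storage: {{repository}}.</section>
  </template>`;

// <template> content is inert and invisible until it is read out here.
const store = document.getElementById("dmp-content") as HTMLTemplateElement;
const dfgBlock = store.content.querySelector('[data-funder="DFG"]');
console.log(dfgBlock?.textContent?.trim());
```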
In lines 184-186, we added:
“Figure 3b shows four diamond blocks. These decide which rectangular blocks need to run searches.”
The caption of Figure 4 has been rewritten to:
“Figure 4. The web-based user interface of DataPLAN. The left panel displays a live preview of the DMP while the right panel displays the DataPLAN questionnaire. In the left panel, the static text is shown in black, the user-selected text is shown in green, and the user-written text has yellow highlights. In the right panel, text inputs are indicated by red lines, and reusable answers in the checkbox format are indicated by blue lines. The red lines connect all the answers given by the text input, which is located at the top of the questionnaire in the right-hand panel, and a blue line connects the answer associated with selecting the EU project option in the checkbox. This connection between the left and right panels is also animated when DataPLAN is used.”
We have added future development plans in the conclusions.
“In the future, DataPLAN will be maintained and updated by DataPLANT and IBG-4, to integrate new technologies and to deepen the synchronization of evolving DataPLANT tools, such as Swate, DataHub, and ARCcommander.”