1. Introduction
The last three decades have been marked by major developments and innovations in digital technologies. These have supported the newly created paradigms of e-science and open science, as well as activities related to research data such as creating, collecting, storing, sharing and using data, making modern science increasingly digital and data-intensive [1]. Since the 1990s and the advent of e-science, research data have become a popular topic of many theoretical and practical scientific endeavors. During the same period, scientific and professional papers and books have been published describing the positive impact of research data on the development of modern science. Consequently, the value of research data has increased and, for many, they have become a new currency in science [2]. Additionally, the increased quantity of research data created a strong demand for suitable infrastructure: hardware, software, data management policies and other supporting documents and, of course, human resources for supervising the process of research data management. Suitable infrastructure, as the most crucial component, supports scientists by helping them focus on research itself rather than on its technical and administrative aspects, such as manipulating large quantities of data.
Although science has been advancing technically and structurally through the second part of the 20th and the beginning of the 21st century, activities related to research data in digital format still lack the ease and speed with which researchers could create and collect large, complex datasets and manage them so that they could be openly available. Due to the lack of adequate human (professional), technical (infrastructural) and document (guidelines and policies) support, researchers fall behind in the development and acquisition of knowledge and skills necessary to ensure data quality, integrity, shareability, discoverability and reuse over time [3]. Furthermore, researchers cannot always spend enough time on research data management without neglecting other activities. “Managing research data is time-consuming, costly and tedious, requiring additional resources that are not available to all researchers” [4] (p. 23). To make research data management better and easier, with the aim of providing open access to research data and thereby achieving broader availability and accessibility of the data and better cooperation between individual researchers, academic institutions and society must build uniform support for all researchers. This support includes the necessary technical infrastructure (individual, institutional and national), research data management policies and guidelines, and training for researchers in research data management procedures. All these activities require a choice of good technical resources, time, good organization, professionally trained human resources (in IT and information sciences) and substantial long-term funding. The outcome of these activities will create a difference between research communities that are able to manage research data and those that are not.
This paper focuses on the management and use of data created by Croatian scientists working in all scientific disciplines. Research data are “at the very heart of the knowledge life cycle and are a central ingredient to the scholarships of discovery, integration, teaching, engagement” etc. [5] (p. 345). The aim of data management is not only to facilitate the long-term preservation of research data but also to make these data available for two crucial activities: possible sharing and possible reuse by interested scientists. The latter activity adds the most value to research data in general. Access to research data is provided mostly by sharing the data with other people directly upon their request, or by depositing them in general-purpose institutional digital repositories or in specialized data repositories.
4. Findings
The findings are divided into four sections (A–D) in accordance with the structure of the questionnaire.
4.1. Use and Non-Use of Other Researchers’ Data
The first section (Section A) of the research aimed at identifying whether the respondents used other scientists’ research data or not.
A total of 584 respondents answered the initial questions about using or not using other scientists’ research data. A total of 428 respondents (73.3%) used the research data of other scientists while 156 respondents (26.7%) did not use other scientists’ research data.
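The two shares reported above follow directly from the counts. As a minimal sketch of that arithmetic (the helper name `share_percent` is hypothetical and introduced only for illustration; it is not part of the study's analysis), the percentages can be recomputed as follows:

```python
def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# Counts reported in Section A of the findings.
answered = 584   # respondents who answered the initial question
users = 428      # used other scientists' research data
non_users = 156  # did not use other scientists' research data

# The two groups should account for every respondent who answered.
assert users + non_users == answered

print(share_percent(users, answered))      # 73.3
print(share_percent(non_users, answered))  # 26.7
```

Rounding to one decimal place matches the precision used in the text; a different rounding rule would be an equally reasonable choice.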
As an addition to this question, the scientists in this research study were asked about their years of service. The median length of service was 20 years. The years of service were compared to the number of respondents who used other scientists’ research data (not all respondents chose to provide their years of service): 1–10 years of service, 82 respondents; 11–20 years, 185 respondents; 21–30 years, 114 respondents; 31–40 years, 65 respondents; and 41+ years, 16 respondents. The years of service of the respondents who did not use other scientists’ research data were the following: 1–10 years of service, 35 respondents; 11–20 years, 59 respondents; 21–30 years, 35 respondents; 31–40 years, 23 respondents; and 40+ years, three respondents. Again, not all respondents provided data on their years of service.
The number of scientists in this research study who claimed they used other scientists’ research data is high. It is impossible to back up such results with actual numbers of the exchanged research data because researchers use other scientists’ data from different resources, sometimes internally (within the same lab, department or university) and frequently there is no exact proof of these activities. This is the reason why this result should be investigated further by conducting interviews with scientists.
Of the 156 respondents who did not use other scientists’ research data, 153 gave their reasons for not doing so (three left this question unanswered; multiple answers were possible).
Table 1 shows the detailed reasons for not using other scientists’ data. Predefined reasons were offered to the respondents. About three-quarters of the respondents who did not use other scientists’ data did not have a need for them, while other respondents encountered obstacles such as paywalls, membership requirements, passwords or licenses. Scientists have commonly met such obstacles in recent decades when trying to access published papers and books. Now the same obstacles have been extended to research data.
After answering this question, the respondents were asked to go to section D of the questionnaire related to storing research data and providing access to them.
4.2. Sending Requests for Research Data to Other Scientists
The next section (section B) of the questionnaire was dedicated to sending requests for research data of other scientists.
Direct contact with other scientists for gaining access to their research data was not the first choice of the respondents (N = 182) (Table 2). Another large group consisted of scientists who sent 2–3 requests (N = 163), while the rest of the respondents sent far fewer requests.
The respondents were also asked to list the sources from which they received research data (Table 3). Their answers suggest that they used multiple resources for finding research data before accessing them.
Next, the respondents were asked to estimate the number of their own requests sent to other scientists to gain access to research data (Table 4).
Not all the delivered requests were answered positively: 0 (16 respondents); 1 (30 respondents); 2 (22 respondents); 3 (24 respondents); 4 (five respondents); 5 (eight respondents); 6 (three respondents); 7 (one respondent); 10 (six respondents); 15 (one respondent). Some respondents added their own answers instead of choosing pre-defined answers: “a few” (four respondents); “less than half” (two respondents); “half” (three respondents); “almost all” (22 respondents); “all” (116 respondents).
The next four questions were oriented toward estimating the importance of research data for different phases of research in general: acquiring ideas for new research (N = 580), preparation of research (N = 575), execution of research (N = 580) and verifying the quality of research (N = 579).
Figure 1 shows the importance of research data in four aspects of research: developing ideas for new research, preparation of research, implementation of research and, finally, increasing the quality of research. The respondents rated increasing the quality of research as the most important of these uses of research data.
In the last question in this section, the respondents were asked whether they had ever tried to repeat the scientific research of another scientist based on that scientist’s available research data (Table 5).
The reproducibility of research is a problem that has been well described in the scientific literature. Scientists who repeat another scientist’s research would most certainly like to obtain results identical to those of the original research or analysis.
This was the last question in this section that was aimed only at scientists who use other scientists’ research data.
The next section (section C) was oriented toward the frequency of use of other scientists’ research data as a template for new research, comparison with other research and quality control of research.
4.3. Providing Access to Own Research Data
The remaining part of the research study results included all the respondents, regardless of whether they used other scientists’ research data or not.
The respondents were asked to estimate the number of requests they received for their own research data in the last five years (Table 6).
Almost half of the respondents did not receive any request from other scientists for their research data, while over half of the respondents (cumulatively) received one or more requests.
The next part of this research study was oriented toward providing access to research data for other scientists (the second part of the questionnaire).
In addition to the information found in Table 7, the respondents who opened access to their research data (N = 459) were divided according to their working experience (years of service), regardless of the mode of providing access: 1–10 years of service, 82 respondents; 11–20 years, 169 respondents; 21–30 years, 127 respondents; 31–40 years, 69 respondents; and 40+ years, nine respondents. Next, the respondents who did not open access to their research data (N = 152) were divided according to their years of service: 1–10 years of service, 35 respondents; 11–20 years, 60 respondents; 21–30 years, 36 respondents; 31–40 years, 19 respondents; and 40+ years, two respondents.
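The counts by years of service can be turned into a share of respondents in each band who opened access to their data. The following is a minimal sketch of that calculation, not part of the original analysis: the function name `sharing_rates` and the dictionary layout are assumptions made for illustration, and only the counts come from the text. The band counts sum to 456 and 152, so a few respondents in the first group appear not to have reported their years of service.

```python
# Counts reported in the text, keyed by years-of-service band.
opened = {"1-10": 82, "11-20": 169, "21-30": 127, "31-40": 69, "40+": 9}
not_opened = {"1-10": 35, "11-20": 60, "21-30": 36, "31-40": 19, "40+": 2}


def sharing_rates(yes: dict, no: dict) -> dict:
    """Percentage of respondents in each band who opened access to their data."""
    rates = {}
    for band, yes_count in yes.items():
        band_total = yes_count + no[band]
        rates[band] = round(100 * yes_count / band_total, 1)
    return rates


rates = sharing_rates(opened, not_opened)
for band, rate in rates.items():
    print(f"{band} years: {rate}% opened access")
```

With these counts the share rises from 70.1% in the 1–10 band to 81.8% in the 40+ band. The last band contains only 11 respondents, so its share should be read with caution, and no statistical test is implied by this descriptive calculation.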
While scientists shared their research data partially or fully, they also identified problems they encountered or feared encountering during the sharing of their own research data with other scientists (Table 8).
Research data sharing depends on the good will of scientists, but it also depends on a potential system of recognition received for this activity.
Receiving recognition for sharing research data is one of the most important moments in scientists’ careers, as it acknowledges scientists’ efforts in certain areas.
Table 9 shows that almost none of the respondents received recognition for sharing their research data, which does not motivate scientists to start sharing their research data in the future.
The results presented in Table 10 indicate that the respondents are aware of the practical benefits of opening access to research data, such as researchers’ visibility, creating partnerships, transparency, quality and citations, which were the first five choices on the list. Advancement in academic career and receiving recognition for opening/providing access to research data were selected least often. The previous question indicated that recognition for opening access to research data was either completely lacking or inadequate; in this question, recognition was likewise ranked very low and perceived as less important by the respondents.
To be able to provide access to research data, researchers must be offered some type of training/education about the process of research data management.
Based on the results in Table 11, researchers should be offered education about research data management if the scientific community in general expects them to share their research data on a wider scale.
The next section (section D of these findings and the third part of the questionnaire) in the research study was dedicated to data archiving, a general and crucial precondition for data sharing and data reuse.
4.4. Research Data Storing and Archiving
The first question in this section was about devices or places for storing research data.
Table 12 suggests that the respondents most frequently store their research data on their own computer at work, which is potentially very dangerous in the case of computer failure. Additionally, they store research data on network drives in the cloud, which can also be dangerous if the network connection is lost, in the case of a security breach or due to another type of failure to access the data. Storing research data on offline external drives (and possibly in different locations) is a much better solution than storing data only on one’s computer at work or on a shared computer. Digital repositories are not so popular, although a growing number of general-purpose and data-only repositories are available worldwide to scientists from different countries.
File format is a highly important element in research data archiving. The choice of data file format (Table 13) depends on the area of science in which different devices are used for the acquisition of data (by taking a record of a phenomenon, by recording measurements, etc.).
Table 13 shows the great variety of file formats. Some of them are standardized and appear in all areas of science, while others can be found only in particular areas of science and relate to some type of laboratory instrument paired with a computer, etc.
Data archiving is one of the most time-consuming items in a scientist’s monthly workload. The results in Table 14 show that more than one-third of the respondents spend less than one hour monthly on data archiving, and that more than three-quarters of the respondents (cumulatively) spend up to 5 h monthly. Although the time spent on data archiving may differ from one research area to another and may depend on the type of research, this cumulative share is a positive result.
Data archiving requires knowledge about online or offline storage systems, file formats, metadata creation for data description and institutional policies for data archiving. The Croatian scientists were asked to rate their knowledge about research data archiving. A total of 644 respondents provided answers on a scale from 1 (no knowledge at all) to 5 (excellent knowledge). The results in Table 15 indicate that they have acquired a certain amount of knowledge about data archiving, but there is still room to reach a more advanced level.
5. Discussion
Empirical research studies of this type are a good base for solving different problems in research data management and could enable comparative analyses of conditions for work in different areas of science.
The results of this research study on Croatian scientists identified several problems in the use and management of research data. Roughly three-quarters of the respondents claimed they used other scientists’ research data, which is a very good result. Still, they ran into obstacles such as paywalls, memberships in institutions or technical issues, which are all globally present problems that remain unsolved. They therefore require additional efforts on the part of the different stakeholders involved in scientific endeavors. Some obstacles, like paywalls, require cooperation with information aggregators in the commercial sector and are subject to long-term negotiations with commercial publishers. Technical issues are more easily solved but can sometimes be expensive.
Generally, some of the encountered problems are local and infrastructural and can be solved rather easily (e.g., digital research data repositories), while other problems are global and require more money, time and effort to be solved on the international level (e.g., publishing fees and research data storing as a supplement to books and articles).
Almost half of the respondents did not obtain data by sending requests to other scientists, either because they did not have a need for research data or because they did not receive any answer from the scientists to whom they sent requests. The latter problem could be alleviated by storing data in open data repositories, which would reduce the time spent on direct communication with other scientists.
The use of other scientists’ research data by a larger number of scientists is practical, as a single researcher cannot discover all the problems present in some areas of research.
Also on the practical side, most of the research data that the respondents use in their analyses or new research come from their colleagues from the same department or institution, who are easily accessible and are reliable sources of research data. Research data also come from colleagues from outside the country with varying but mostly positive outcomes.
This research study showed that the Croatian scientists consider research data to be important in several cases: for obtaining ideas for new research and for its preparation and execution, which is expected if one uses research data as a starting point. The most important quality of research data for the respondents is that they are a means of verifying and increasing research quality. The problem of poor reproducibility of research has been extensively addressed in the scientific literature, but a solution has not been found yet. One evident reason is the need for hyperproduction of scientific output in the form of published books and articles as proof of one’s scientific abilities. The hyperproduction diminishes research quality and shifts the focus to bureaucratic criteria for academic advancement. A growing number of research studies therefore remain irreproducible due to different problems related to low quality, which was partially confirmed by this research study. A significant number of researchers in this research study did not try to reproduce another scientist’s research by using that scientist’s research data, while one-quarter of the respondents claim that they managed to reproduce research and obtain exactly the same results. Clearly, these results are not good.
Archiving data is another big and important topic in global science covered by this research study. The Croatian scientists who participated in this research study store research data on their own desktop computers, which could lead to a disaster in the case of hardware failure. Besides desktop computers, there are other technical solutions for storing research data, such as cloud infrastructure and external drives. Each has its own problems, but they are still more reliable for long-term storing/archiving of research data than desktop computers used by one or even many scientists. One possible solution is that cloud infrastructure could be better marketed to scientists.
Knowledge about data archiving has not yet reached a desirable level at which scientists will have advanced knowledge on this topic. Their knowledge is currently at an intermediate level. In Croatia, scientists have an opportunity to store research data in the national digital repository system, specially made for the Croatian academic community.
The proper storing of research data enables data sharing and data reuse. However, data sharing is not a straightforward process since there are many administrative, legal and technical obstacles and fears, such as misuse of research data, excessive consumption of time while sharing data with other people, idea theft, etc. These are very serious problems that are present globally in academic communities and remain unresolved. The same problems are present in the Croatian academic community and were recognized by the scientists participating in this research study. Another area that can be further researched is the benefits of opening research data to other scientists. The benefits were also clearly recognized by the respondents but are hardly present in their daily work. This is especially true of receiving a reward for data sharing, which practically never occurs.
Finally, education on how to manage research data has become necessary, yet close to three-quarters of the respondents did not receive any form of education. Education in research data management will help them to overcome obstacles and spend less time on research data management.
This research study confirmed the hypothesis according to which the Croatian scientists who participated in this research study are only moderately ready for data sharing and data reuse and thus only partially fulfil their research mission in society. They will have to put in more effort to achieve better results in research data management and to make data sharing and data reuse more easily doable, but they should also receive recognition and rewards for doing so. This study also provided answers to all three research questions, as discussed in this part of the paper.