Reducing Consumer Uncertainty: Towards an Ontology for Geospatial User-Centric Metadata
Abstract
1. Introduction
- O1. Sufficiency of language: support for capturing and representing metadata and fitness-for-use descriptions of a dataset at various levels of granularity, i.e., the dataset, its features and its attributes.
- O2. Utility of language: support for communicating fitness-for-use of datasets by enabling both producers and users of geospatial data to generate metadata and fitness-for-use descriptions using a single model.
- O3. Extensibility of language: support for domain-independent and widely adopted standards and vocabularies for describing metadata, in order to facilitate interoperability between metadata descriptions from spatial and nonspatial domains, thereby making spatial data sources searchable on open data portals.
- (i) Eliciting user and producer views on geospatial data quality and fitness-for-use by conducting (a) semi-structured interviews with spatial data producers and users from a variety of domains and applications; and (b) a survey to collect data from more diverse GIS communities. The survey aimed to examine findings from our industry engagements and to potentially identify additional quality themes (from the comments and suggestions) for assessing and evaluating spatial data quality and fitness-for-use;
- (ii) Qualitative and quantitative analysis of our industry engagements to identify the gaps that exist between internal quality (producer supplied) and external quality (consumer described). This analysis informed the design of the Geospatial User-Centric Metadata ontology;
- (iii) Design of the Geospatial User-Centric Metadata (GUCM) ontology for communicating metadata and fitness-for-use of geospatial data sources to users, enabling them to make informed data source selection decisions.
1.1. Motivation and Significance
1.2. Related Work
2. Materials and Methods
2.1. Semi-Structured Interviews with Spatial Data Producers and Users
2.2. Data Quality Survey
- Questions 1–10 captured general information about the participants (e.g., identifying participants as spatial data users, producers or both; identifying participants’ level of expertise in geospatial data and metadata);
- Questions 11–17 and 26–28 aimed to gather participants’ views on geospatial data quality elements and sub-elements (outlined in Table 1);
- Questions 18–25 and 29 aimed to gather participants’ views on requirements for assessing fitness-for-use of datasets (outlined in Table 2);
- Questions 30 and 31 aimed to elicit participants’ perspectives on the usefulness of a geospatial user-centric metadata vocabulary for communicating dataset quality and fitness-for-use;
- Question 32 captured comments and suggestions on a geospatial user-centric metadata vocabulary, information that it should communicate and its potential role in enabling users to identify datasets that are fit for their intended purposes.
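As a quick consistency check, the question groupings above partition the 32-item survey exactly, with no gaps or overlaps. A small Python sketch (illustrative only; the group labels are our own shorthand, not from the survey instrument):

```python
# Each group lists the question numbers assigned to it above.
groups = {
    "participant background": list(range(1, 11)),                         # Q1-10
    "data quality elements": list(range(11, 18)) + list(range(26, 29)),   # Q11-17, 26-28
    "fitness-for-use requirements": list(range(18, 26)) + [29],           # Q18-25, 29
    "vocabulary usefulness": [30, 31],                                    # Q30-31
    "comments and suggestions": [32],                                     # Q32
}

all_questions = sorted(q for qs in groups.values() for q in qs)
assert all_questions == list(range(1, 33))  # every question covered exactly once
```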
3. Results
3.1. Qualitative Analysis of the Interviews
3.2. Analysis of Survey Results
- 80% of participants identified themselves as both user and producer of spatial data;
- Most participants work in either “agriculture, forestry, and fishing” (26%) or “other services” (33%). We used the Australian and New Zealand Standard Industrial Classification (https://www.abs.gov.au/ausstats/abs@.nsf/0/20C5B5A4F46DF95BCA25711F00146D75?opendocument) to identify the industry represented by each participant;
- 93% of participants use data from external data providers;
- 86% of participants have a range of data sources to choose from;
- 46% of participants have worked with geospatial data for two to nine years;
- 93% of participants make data source selection decisions based on prior knowledge and experience;
- 53% of participants find selecting datasets that fit their needs a challenging task;
- 80% of participants consider metadata records or other supporting information when selecting data sources;
- 53% of participants believe that up to 25% manual effort is involved in understanding fitness-for-use of data sources;
- 6.7% of participants believe that metadata describing dataset quality do not follow any standards; 26.7% believe that such metadata are not provided; 33.3% believe that metadata that describe data quality are incomplete; and 33.3% believe that this metadata follow widely adopted standards;
- 53% of participants indicated that their organization has been impacted by not understanding (or misunderstanding) fitness-for-use of a dataset at least once.
- Attribute/thematic accuracy: only one participant, from the financial and insurance services domain, scored this quality element low (2);
- Logical consistency: only one participant, from the professional, scientific, and technical services domain, scored this quality element low (2);
- Completeness: only one participant, from the professional, scientific, and technical services domain, scored this quality element low (2);
- Lineage/provenance: only one participant, from the “other services” domain, scored this quality element low (2);
- Compliance with international standards: two participants, from the “other services” domain, scored this requirement low (2);
- Community advice and recommendations (user feedback): only one participant, from the “other services” domain, scored this requirement low (2);
- Reputation of dataset provider (producer profile): two participants (one from the “administrative and support services” domain, the other from the “other services” domain) scored this requirement very low (1);
- Quantitative quality information: only one participant, from the “other services” domain, scored this requirement low (2);
- Overall reliability: only one participant, from the “other services” domain, scored this requirement 4; all other participants scored it higher;
- Relevancy: only one participant, from the “professional, scientific, and technical services” domain, scored this requirement 4; all other participants scored it higher;
- Data dictionary (a description of a dataset and its components, i.e., feature types, attribute types and their relationships): only one participant, from the “administrative and support services” domain, scored this requirement low (2).
3.3. Geospatial User Centric Metadata Ontology
3.3.1. Dataset Schema
3.3.2. Interoperable Metadata
3.3.3. User Feedback
4. Discussion
4.1. Contributions to Knowledge
- Capturing metadata in structured form—as outlined in the introduction section, one of the objectives of this study is to make spatial data sources searchable on open data portals (O3). The GUCM ontology captures and represents metadata and fitness-for-use descriptions of spatial datasets, using concepts from domain-independent and widely adopted vocabularies and ontologies. The structured metadata described and captured by the GUCM ontology can be published to Open Data Portals and the Web of Data [18], providing a means to search and discover spatial data based on metadata and fitness-for-use criteria, in addition to facilitating interoperability between spatial and nonspatial metadata on open data platforms.
- Representing producer-supplied and user-described metadata using a single model—as outlined in the introduction section, one of the objectives of this study (O2) is to enable both producers and users of geospatial data to describe metadata and fitness-for-use descriptions of datasets using a single model. Internal quality, modelled by the Dataset Schema component, and external quality, modelled by the User Feedback component, are captured and represented using the same model, rather than separate producer and user models. As mentioned in Section 3.3, in order to ensure the integrity and trustworthiness of metadata descriptions, the model differentiates between metadata that is solely created and maintained by producers (Dataset Schema), and metadata created by users, producers and experts (User Feedback).
- Enabling metadata description at various levels of granularity—one of the objectives of this study (O1) is to facilitate metadata and fitness-for-use descriptions for datasets and their components. The hierarchical structure of the ontology enables metadata and fitness-for-use descriptions to target various components of a dataset; i.e., dataset, feature type or attribute type. This, in turn, facilitates dataset search and discovery based on metadata and usage descriptions for specific components of a dataset.
- Facilitating communication and discussion between geospatial data producers and users—the GUCM ontology enables producers and users of geospatial data to generate metadata and fitness-for-use descriptions using the same model (O2). This in turn facilitates communication and discussion between geospatial data producers and users. The User Feedback component of GUCM facilitates communication and discussion between spatial data users, producers and experts (Figure 2). In addition, metadata captured by the User Feedback component can be used to improve producer-supplied metadata (Dataset Schema) over time. This will render the producer-supplied metadata more relevant to users’ specific needs and requirements.
- Providing contextual information for metadata—the GUCM ontology captures profiles of users that describe their experiences with spatial data sources and contribute to user-centric metadata by sharing their insights and implicit knowledge of data sources. In addition, the ontology captures the applications and domains within which metadata are described. The user profiles and application domain information associated with metadata can be used to put metadata and fitness-for-use descriptions in context when assessing the suitability of data sources for specific uses and purposes.
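The granularity and single-model ideas above (O1, O2) can be illustrated with a short, self-contained Python sketch. This is not the GUCM ontology itself, which is an OWL model; every class and field name below is hypothetical, chosen only to mirror the dataset/feature-type/attribute-type hierarchy and the producer/user/expert provenance split:

```python
from dataclasses import dataclass, field

# Hypothetical classes mirroring GUCM's granularity levels (O1) and its use of
# one model for both producer-supplied and user-described metadata (O2).

@dataclass
class QualityStatement:
    author_role: str   # "producer", "user" or "expert": provenance kept explicit
    element: str       # e.g. "lineage", "completeness", "positional accuracy"
    text: str

@dataclass
class AttributeType:
    name: str
    statements: list = field(default_factory=list)

@dataclass
class FeatureType:
    name: str
    attributes: dict = field(default_factory=dict)
    statements: list = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    features: dict = field(default_factory=dict)
    statements: list = field(default_factory=list)

# One model holds producer metadata and user feedback; author_role preserves
# the Dataset Schema / User Feedback distinction described in Section 3.3.
roads = Dataset("roads")
roads.features["Road"] = FeatureType("Road")
roads.features["Road"].attributes["surface"] = AttributeType("surface")

roads.statements.append(
    QualityStatement("producer", "lineage", "Digitised from state base mapping."))
roads.features["Road"].attributes["surface"].statements.append(
    QualityStatement("user", "completeness", "surface is often missing for unsealed roads."))

def feedback(ds: Dataset) -> list:
    """Collect user-authored statements at every level of granularity."""
    out = [s for s in ds.statements if s.author_role == "user"]
    for ft in ds.features.values():
        out += [s for s in ft.statements if s.author_role == "user"]
        for at in ft.attributes.values():
            out += [s for s in at.statements if s.author_role == "user"]
    return out

assert len(feedback(roads)) == 1  # only the attribute-level user comment
```

Keeping `author_role` on every statement is what lets a single model serve both producer-only metadata and community feedback while preserving the integrity distinction between the two.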
4.2. Study Limitations
- We took great care to ensure that our industry engagements for eliciting user requirements represented a broad spectrum of industries (please see Section 2.1 for a complete list of participating industries); however, we were unable to arrange interviews with some industries such as the military. Future work will aim to include industries that were underrepresented or missing in the requirements gathering phase of this study.
- We engaged geospatial users and producers from diverse GIS communities in our local context, i.e., Australia and New Zealand. However, in order to create an all-encompassing solution for assessing fitness-for-use of geospatial data, requirements elicitation should include a wider group of geospatial users and producers from around the globe. For example, the Spatial Data Quality Working Group of the Open Geospatial Consortium Technical Committee (http://www.opengeospatial.org/projects/groups/dqdwg), which conducted the geospatial data quality online survey in 2008 (http://portal.opengeospatial.org/files/?artifact_id=30415), used a randomized sampling technique to reach a large number of GIS users and vendors, with respondents from seven continents. Our future work will also focus on extending the collaboration initiated with our European partners during this research initiative. More specifically, we will continue to collaborate with the Quality Knowledge Exchange Network (QKEN (https://eurogeographics.org/knowledge-exchange/qken/)) of EuroGeographics, in order to share insights and experiences and uncover additional informational aspects of spatial data that are influential for assessing fitness-for-use of geospatial data sources. This information will be used to refine the GUCM model, leading in turn to a more inclusive vocabulary that enables spatial data users to assess fitness-for-use of geospatial data.
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Devillers, R.; Stein, A.; Bédard, Y.; Chrisman, N.; Fisher, P.; Shi, W. Thirty years of research on spatial data quality: Achievements, failures, and opportunities. Trans. GIS 2010, 14, 387–400.
2. Goodchild, M.F. Sharing imperfect data. In Sharing Geographic Information; Onsrud, H.J., Rushton, G., Eds.; Centre for Urban Policy Research: New Brunswick, NJ, USA, 1995; pp. 413–425.
3. Devillers, R.; Jeansoulin, R. Spatial data quality: Concepts. In Fundamentals of Spatial Data Quality; Devillers, R., Jeansoulin, R., Eds.; ISTE Ltd.: London, UK, 2006; pp. 31–42.
4. Arnold, L. Spatial Data Supply Chain and End User Frameworks: Towards an Ontology for Value Creation. In GeoValue Workshop; Curtin University: Perth, Australia, 2016.
5. Goodchild, M.F. Foreword. In Fundamentals of Spatial Data Quality; Devillers, R., Jeansoulin, R., Eds.; ISTE Ltd.: London, UK, 2006; pp. 13–16.
6. Chrisman, N.R. The error component in spatial data. In Geographical Information Systems: Overview Principles and Applications; Maguire, D.A., Goodchild, M.F., Rhind, D.W., Eds.; Longman: White Plains, NY, USA, 1991; pp. 165–174.
7. Ivánová, I.; Morales, J.; de By, R.A.; Beshe, T.S.; Gebresilassie, M.A. Searching for spatial data resources by fitness-for-use. J. Spat. Sci. 2013, 58, 15–28.
8. Gahegan, M. The Grid: Bringing Data Producers and Consumers Closer? In NIEeS Workshop on Activating Metadata; Cambridge University Press: Cambridge, UK, 2005.
9. Longhorn, R.A. Geospatial standards, interoperability, metadata semantics and spatial data infrastructure. In NIEeS Workshop on Activating Metadata; Cambridge University Press: Cambridge, UK, 2005.
10. Comber, A.J.; Fisher, P.F.; Wadsworth, R.A. User-focused metadata for spatial data, geographical information and data quality assessments. In Proceedings of the 10th AGILE International Conference on Geographic Information Science, Aalborg University, Aalborg, Denmark, 8–11 May 2007; pp. 1–13.
11. Goodchild, M.F. Putting research into practice. In Quality Aspects of Spatial Data Mining; Stein, A., Shi, W., Bijker, W., Eds.; CRC Press: Boca Raton, FL, USA, 2009; pp. 345–356.
12. Brown, M.; Sharples, S.; Harding, J.; Parker, C.; Bearman, N.; Maguire, M.; Forrest, D.; Haklay, M.; Jackson, M. Usability of geographic information: Current challenges and future directions. Appl. Ergon. 2013, 44, 855–865.
13. Boin, A.T.; Hunter, G.J. Do spatial data consumers really understand data quality information? In Proceedings of the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Lisbon, Portugal, 5–7 July 2006; pp. 215–224.
14. Comber, A.J.; Fisher, P.F.; Wadsworth, R.A. Approaches for Providing User Relevant Metadata and Data Quality Assessments. In Geographical Information Science Research UK Conference (GISRUK); National Centre for Geocomputation, National University of Ireland: Maynooth, Ireland, 2007; pp. 79–82.
15. Goodchild, M.F. The future of digital earth. Ann. GIS 2012, 18, 93–98.
16. Ellul, C.; Foord, J.; Mooney, J. Making metadata usable in a multi-national research setting. Appl. Ergon. 2013, 44, 909–918.
17. Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Sci. Am. 2001, 284, 34–43.
18. Bizer, C.; Heath, T.; Berners-Lee, T. Linked data: The story so far. In Semantic Services, Interoperability and Web Applications: Emerging Concepts; IGI Global: Hershey, PA, USA, 2011; pp. 205–227.
19. Lee, Y.W.; Pipino, L.L.; Funk, J.D.; Wang, R.Y. Journey to Data Quality; Massachusetts Institute of Technology: Cambridge, MA, USA, 2006.
20. Wang, R.Y.; Strong, D.M. Beyond accuracy: What data quality means to data consumers. J. Manag. Inf. Syst. 1996, 12, 5–33.
21. Batini, C.; Scannapieco, M. Data and Information Quality; Springer International Publishing: Cham, Switzerland, 2016; Volume 43.
22. Debattista, J.; Lange, C.; Auer, S.; Cortis, D. Evaluating the quality of the LOD cloud: An empirical investigation. Semant. Web 2018, 9, 859–901.
23. Debattista, J.; Clinton, E.; Brennan, R. Assessing the quality of geospatial linked data–experiences from Ordnance Survey Ireland (OSi). In Proceedings of the SEMANTiCS Conference, Vienna, Austria, 11–13 September 2018.
24. Attard, J.; Brennan, R. A semantic data value vocabulary supporting data value assessment and measurement integration. In Proceedings of the 20th International Conference on Enterprise Information Systems, Madeira, Portugal, 21–24 March 2018.
25. ISO. ISO 9000:2015 Quality Management Systems—Fundamentals and Vocabulary; ISO: Geneva, Switzerland, 2015.
26. ISO. ISO 19157:2013 Geographic Information—Data Quality; ISO: Geneva, Switzerland, 2013.
27. Congalton, R.G. Accuracy assessment and validation of remotely sensed and other spatial information. Int. J. Wildland Fire 2001, 10, 321–328.
28. ISO. ISO 19115-1:2014 Geographic Information—Metadata—Part 1: Fundamentals; ISO: Geneva, Switzerland, 2014.
29. ISO. ISO 19158:2012 Geographic Information—Quality Assurance of Data Supply; ISO: Geneva, Switzerland, 2012.
30. da Silva, J.R.; Castro, J.A.; Ribeiro, C.; Honrado, J.; Lomba, Â.; Gonçalves, J. Beyond INSPIRE: An ontology for biodiversity metadata records. In Proceedings of the OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, Amantea, Italy, 27–31 October 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 597–607.
31. European Commission. Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 Establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). Off. J. Eur. Union 2007, 50, 1–14.
32. ISO. ISO 19115:2003 Geographic Information—Metadata; ISO: Geneva, Switzerland, 2003.
33. Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101.
34. Miles, M.B.; Huberman, A.M.; Saldana, J. Qualitative Data Analysis: A Methods Sourcebook; Sage Publications: Thousand Oaks, CA, USA, 2018.
35. George, D.; Mallery, P. IBM SPSS Statistics 23 Step by Step: A Simple Guide and Reference; Routledge: London, UK, 2016.
36. Gennari, J.H.; Musen, M.A.; Fergerson, R.W.; Grosso, W.E.; Crubézy, M.; Eriksson, H.; Noy, N.F.; Tu, S.W. The evolution of Protégé: An environment for knowledge-based systems development. Int. J. Hum. Comput. Stud. 2003, 58, 89–123.
37. Gašević, D.; Djurić, D.; Devedžić, V. Model Driven Engineering and Ontology Development; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
Data Quality Elements | Data Quality Sub-Elements | Direct Quote from Participant |
---|---|---|
Accuracy A measure of difference between the produced spatial data and the real world that it represents. It is a relative measure and often depends on some defined specification of a true value. Accuracy of data could be measured in terms of horizontal and vertical accuracy of captured data, correctness of object classifications (e.g., a road should not be misclassified as a river), and time stamp applied to the entities in the dataset. | Positional/Spatial Accuracy The difference between the recorded location of a feature in a spatial database or in a map and its actual location on the ground, or its location on a source of known higher accuracy. Positional accuracy can be refined to horizontal and vertical accuracy as it applies to horizontal and vertical positions of captured data. | “That level of quality and positional accuracy, can make a world of difference. So, like you were saying, when it comes to flood data, the positional accuracy of that flood study is more important than my household level geocode position. Which, if it’s give or take, one or two metres out of position, is less important, to me, than that, you know, to need a perfect flood study.” |
 | Attribute/Thematic Accuracy Denotes the correctness of object “classifications” and the level of precision of attribute “descriptions” in the produced data. For instance, a line in a dataset that denotes a river can be misclassified as a road; or a farm object can have the farmer or crops descriptions missing from it. | “As I said, one data set could be used for any, there might be fifty to a hundred columns of attributes within the data, but, when the individual components and the attributes of the data set are used by different people in different parts, it’s just not very good…” |
 | Temporal Accuracy Indicates the time stamp applied to the entities in the dataset. It is the difference between encoded dataset values and the true temporal values of the measured entities. It only applies when the dataset has a temporal (time) dimension in the form of [x, y, z, t]. This type of accuracy is identical to the accuracy of a time measurement. | “So, the quality, we have to have a good understanding as I said, of those input data sets and how they have changed and developed over time so that we can spot errors in those data sets, so they come through, so we are not building mesh block where we shouldn’t be building mesh blocks.” |
Completeness Measures the omission error in the data and its compliance with data capture specification, dataset coverage, and at the level of currency required by the update policy. Highly generalised data can be accepted as complete if it complies with its specification of coverage, classification and verification. | | “Yes, completeness, when I think of completeness I think about, well yes, I know that [name of the organisation removed] doesn’t have every address in the country that is actively used... There are many locations associated with the address and the data has to provide a type for each of the different locations associated with an individual address. So that it can be implemented appropriately for the business use.” |
Logical Consistency Consistency as a general term is defined as the absence of conflicts or contradictions in a dataset. Logical consistency relates to structures and attributes of geospatial data and defines compatibility between dataset objects – e.g., variables used adhere to the appropriate limits or types. | | “… sometimes there’s no consistency between different producers on how that metadata is produced. That’s one thing, but then in terms of the attributes, the consistency that I was referring to was, in the example, was that, how it was actually, the definition that defined it, there may not necessarily be consistency there and that needs to be understood. For example, the first subclass that I was mentioning was ground water, surface water, or might be a meteorological station” |
Relevancy Relevancy (perceived relevancy) of a specific dataset to a user’s intended uses and business purposes. | | “So, um, it’s not that those points are wrong because they are correctly centroids of cadastral parcels, it’s just, um, an alignment issue between our definition of the coast which is derived from GSIS Australia and the national cadastral. I think there is the succinct story, or you could provide, we could provide information on which points they were, and those points could be tagged, and people could deal with them in the appropriate way for their business use.” |
Currency Currency is also known as timeliness (up-to-dateness) of data. Currency of a dataset differs from temporal accuracy, which relates to the time stamp applied to entities in a dataset. | | “[a dataset] that is updated weekly, so that we can have confidence that we have the most current representation of parcel information of title and ownership of parcels of land.” “So, you know, updating your addresses, updating data sets, it’s not just the accuracy, it’s also the currency.” |
Reliability The extent to which a user perceives a dataset to be trustworthy. Factors such as reputation and credibility of the producer contribute to the user’s perceived reliability of the dataset. A producer profile (if one exists) can help communicate the reputation of the producer and the overall reliability of the dataset. The producer’s identity alludes to how users perceive trustworthiness of a dataset. | | “…there is a degree of trust and knowledge that the data is fit for purpose and we sort of iterated through various different issues with the data and solved those issues as we have gone along. We don’t, to be honest, we don’t analyse the metadata that we get from [name of the organisation removed] because those questions are raised in a, in the quarterly meetings. I suppose, what is important is a knowledge that this is the best data that is available and then a good understanding of the actual limitations of that data.” |
Lineage (provenance information) Historical information such as the source, producer, content, methods of collecting and processing, and geographic coverage of the data product. | | “So, the metadata that is provided for those contains a lot of history, around where the data set originated from, what kind of sources were used to initially create it, and how it’s updated, so that, I guess the history information is interesting to know how the data set came into being… The information around how the data set is maintained and updated is important, the frequency with which it’s updated and knowing how the other authorities’ information is fed into their process and then into their database.” |
Cost Financial cost of a dataset for a user, considering their own financial circumstances (e.g., a user is able to and willing to pay more for a dataset, which better suits their intended purposes). | | “I guess there is more choice in say, imagery or Lidar, but that’s more of a cost issue and a licencing issue and an ability to cost share with other authorities to obtain that.” |
Requirements for Assessing Fitness-For-Use of a Dataset | Direct Quote from Participants |
---|---|
Producer profile: the producer’s profile can present information on reputation of dataset producer/provider. The information could contribute to the user’s perceived reliability of dataset. Users tend to rely on spatial data from producers who they know. | “Yeah, so, definitely that information around the domain that they are working in and a small number of classifications of their abilities. So, intermediate or advanced or… would be beneficial” |
Dataset citation information: Some publications and journal articles report data quality checks, dataset use and evaluation which are useful for assessing quality of a dataset. | “… It definitely would be useful to know, that if it was actually used in publications and what not… Because, if it’s used and people say, oh that’s accurate, well, how is that known? So, in some ways, if there is that, you know, validation by journals, that actually can become quite useful.” |
Data dictionary: information on every field, allowed values, types, formats, etc. | “So, we wouldn’t even start looking at it (at a dataset) … We’d probably dive straight into the data dictionary.” “The data dictionary that is provided for the data set goes a long way towards enabling someone to understand how to use it for their purposes.” |
Quantitative quality information: providing a numeric quantification of some data quality aspects by creating a specification for the dataset or comparing it with other accepted reference sources (e.g., external vocabularies such as UncertML present statistical/quantitative definition of uncertainty). This quantitative quality information can cover information about spatial and temporal resolution; spatial and temporal scale; geometric correctness; horizontal, vertical and absolute accuracy; precision; error estimates; and uncertainty. | “Yes, and estimates around the, so, estimates of the accuracy in terms of percentages for new data that is added and estimates on the allowable errors for the historical information which is mostly digitised.” “[We] will be driven by a process that will allow people to be able to, have some, sort of, standardisation in quantifying the quality and fitness for purpose and use of data.” |
Soft knowledge: Producer’s comments (textual statements) that could help to evaluate fitness-for-use of a data product, such as comments on the overall quality of a data product, any known data errors, and potential use. This information could be updated periodically by the producer. | “The metadata statement is fairly complex to use, and I think trying to provide a more user-friendly description of those products and services, is exactly where producers need to go.” |
Compliance with standards: Dataset’s compliance with national (if any) and international standards such as ISO 19157:2013, ISO 19115-1:2014, ISO 19115-2:2009, and Dublin Core. | “[Many data producers] conform with OGC and ISO standards. [However] it would be fair to say that for anybody who is trying to read an ISO compliant metadata statement, related to a data set, is not only just confusing, but really doesn’t, you know, you can get lost in that.” |
User ratings of the dataset (as a part of peer reviews and feedback): quality ratings in the form of quality stars (e.g., four out of five quality stars) or any similar form of rating that conveys a quick visual feedback on overall quality of a dataset. Such rating is different from feedback and advice (from users and producer of a dataset) that is in the form of textual statements and can express more in-depth feedback on quality of the dataset. | “In a way, I think it (a rating system) would [be] beneficial. You may have a rating about quality.” “… allowing data custodians access to a template of the processes, to be able to describe, and rate the quality and fitness for purpose of datasets that are being populated into that, and that’s web services as well. So, it’s an emerging space.” |
Community recommendations and advice (as a part of peer reviews and feedback): textual or verbal feedback from community of users on the quality of a dataset and advice on fitness-for-use of the dataset. It could also provide the underlying rationale for a rating (e.g., quality star rating) of a dataset. The interactions (e.g., brief Q&A and discussions) among the users could be via an online interactive tool (e.g., a discussion forum) that is specifically designed for this purpose or via other means of communication such as email and face-to-face meetings. | “We need to put some structure encoding around this so, as you say, it is queryable and people can make better use of people’s understanding.” “Allowing people to enter limitations that they have encountered would be, I could see the benefits of that, to kind of, generating a feedback to the supplier and capturing people’s experience.” |
Independent expert reviews: expert value judgments from other organizations or businesses who are not the producer and user of a specific dataset, but have expert knowledge that could provide value judgments on the general quality, errors, domain of application of a specific dataset, etc. | “Most of the data sources that I get are from government agencies. So, there is already inherent, I guess, the assumption, that are of a certain credibility and alike. But that being said, I also use engineering drawings and get drawings from engineers...” |
Data Quality Element or Sub-Element | Frequency |
---|---|
Positional/Spatial Accuracy | 5 |
Attribute/Thematic Accuracy | 4 |
Temporal Accuracy | 4 |
Completeness | 3 |
Logical Consistency | 4 |
Relevancy | 5 |
Currency | 4 |
Reliability | 5 |
Lineage (provenance information) | 5 |
Cost | 3 |
Requirement for Assessing Fitness-for-Use | Frequency |
---|---|
Producer profile (reputation of the producer) | 4 |
Dataset citation information | 4 |
Data dictionary | 5 |
Quantitative quality information | 5 |
Soft knowledge | 6 |
Compliance with standards | 3 |
User ratings of the dataset (as a part of peer reviews and feedback) | 5 |
Community recommendations and advice (as a part of peer reviews and feedback) | 5 |
Independent expert reviews | 4 |
Data Quality Element | N | Range | Minimum | Maximum | Mean | Std. Error | Std. Deviation | Variance |
---|---|---|---|---|---|---|---|---|
Positional/Spatial Accuracy | 15 | 4 | 3 | 7 | 5.73 | 0.345 | 1.335 | 1.781 |
Attribute/Thematic Accuracy | 15 | 5 | 2 | 7 | 5.80 | 0.327 | 1.265 | 1.600 |
Temporal Accuracy | 15 | 3 | 4 | 7 | 5.93 | 0.267 | 1.033 | 1.067 |
Logical Consistency | 15 | 5 | 2 | 7 | 5.20 | 0.355 | 1.373 | 1.886 |
Completeness | 15 | 5 | 2 | 7 | 5.40 | 0.412 | 1.595 | 2.543 |
Currency (timeliness) | 15 | 3 | 4 | 7 | 6.13 | 0.256 | 0.990 | 0.981 |
Lineage/Provenance | 15 | 5 | 2 | 7 | 5.20 | 0.380 | 1.474 | 2.171 |
Cost of Quality (Financial) | 15 | 3 | 4 | 7 | 5.40 | 0.254 | 0.986 | 0.971 |
Overall Reliability of Data | 15 | 3 | 4 | 7 | 6.07 | 0.228 | 0.884 | 0.781 |
Relevancy | 15 | 3 | 4 | 7 | 6.47 | 0.236 | 0.915 | 0.838 |
Valid N (listwise) | 15 |
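The descriptive statistics above are internally consistent: in each row, the standard deviation is the square root of the variance, and the standard error is the standard deviation divided by the square root of N. A quick Python check against two rows of the table (values copied from the Positional/Spatial Accuracy and Currency rows):

```python
import math

# Reported figures from two rows of the descriptive statistics table.
rows = {
    "Positional/Spatial Accuracy": dict(n=15, se=0.345, sd=1.335, var=1.781),
    "Currency (timeliness)":       dict(n=15, se=0.256, sd=0.990, var=0.981),
}

for name, r in rows.items():
    # SD = sqrt(variance), rounded to the table's 3 decimal places
    assert round(math.sqrt(r["var"]), 3) == r["sd"], name
    # SE = SD / sqrt(N)
    assert round(r["sd"] / math.sqrt(r["n"]), 3) == r["se"], name
```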
Fitness-for-Use Requirement | N | Range | Minimum | Maximum | Mean | Std. Error | Std. Deviation | Variance |
---|---|---|---|---|---|---|---|---|
Experts’ Review | 15 | 5 | 2 | 7 | 4.67 | 0.361 | 1.397 | 1.952 |
Compliance with Standards | 15 | 5 | 1 | 6 | 4.07 | 0.431 | 1.668 | 2.781 |
Community Advice and Recommendations (User Feedback) | 15 | 4 | 2 | 6 | 4.27 | 0.300 | 1.163 | 1.352 |
Producer Profile (Reputation) | 15 | 5 | 2 | 7 | 5.27 | 0.358 | 1.387 | 1.924 |
Dataset Citations | 15 | 6 | 1 | 7 | 3.13 | 0.533 | 2.066 | 4.267 |
Quantitative Quality Information | 15 | 5 | 2 | 7 | 5.60 | 0.349 | 1.352 | 1.829 |
Soft Knowledge | 15 | 5 | 2 | 7 | 4.80 | 0.312 | 1.207 | 1.457 |
User Ratings | 15 | 5 | 1 | 6 | 3.67 | 0.347 | 1.345 | 1.810 |
Data Dictionary | 15 | 5 | 2 | 7 | 5.53 | 0.350 | 1.356 | 1.838 |
Valid N (listwise) | 15 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ziaimatin, H.; Nili, A.; Barros, A. Reducing Consumer Uncertainty: Towards an Ontology for Geospatial User-Centric Metadata. ISPRS Int. J. Geo-Inf. 2020, 9, 488. https://doi.org/10.3390/ijgi9080488