Data, Volume 10, Issue 1 (January 2025) – 10 articles

Cover Story: Reactive heritage digital twins (RHDTs) have revolutionised cultural heritage management by creating dynamic, data-rich replicas of cultural objects. Empowered by ontologies and semantic graphs, RHDTs interlink cultural documentation, historical contexts, and real-time sensor data, offering a comprehensive representation of heritage entities. Exploring multiple synergies between artificial intelligence and ontologies demonstrates how their integration advances RHDTs, with cutting-edge data analysis, semantic organisation, and predictive capabilities. Transparent and explainable AI processes of semantic data redefine the role of RHDTs in cultural heritage monitoring and preservation, forging an innovative, sustainable framework for safeguarding invaluable cultural assets for future generations.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
12 pages, 1763 KiB  
Data Descriptor
A Comprehensive Parcel-Level Dataset on Farmland Assessment: Addressing Grid-Cell Data Bias Estimation
by Wai Yan Siu, Man Li and Arthur J. Caplan
Data 2025, 10(1), 10; https://doi.org/10.3390/data10010010 - 17 Jan 2025
Abstract
Grid-cell data are increasingly used in research due to the growing availability and accessibility of remote sensing products. However, grid-cell data often fail to represent the actual decision-making unit, leading to biased estimates in socio-economic analysis. To address this, this paper presents a comprehensive parcel-level dataset for Salt Lake County, Utah, spanning from 2008 to 2018. This dataset combines detailed spatial and temporal data on land ownership, land use, and preferential farmland tax assessments under the Greenbelt program. Compiled from multiple geospatial sources, the dataset includes nearly 200,000 parcel-year observations, providing valuable insights into landowner decision-making and the impact of tax abatement incentives at the decision-making level. This resource is beneficial for researchers, educators, and practitioners in sustainable development, environmental studies, and farmland conservation.

21 pages, 2822 KiB  
Article
Credit Evaluation of Technology-Based Small and Micro Enterprises: An Innovative Weighting Method Based on Machine Learning and AHP
by Bingya Wu, Zhihui Hu, Zhouyi Gu, Yuxi Zheng and Jiayan Lv
Data 2025, 10(1), 9; https://doi.org/10.3390/data10010009 - 14 Jan 2025
Abstract
Technology-based small and micro enterprises play a crucial role in national economic and social development. Managing their credit risk effectively is key to ensuring their healthy growth. This study is based on corporate credit management theory and Wu’s three-dimensional credit theory. It clarifies the credit concept and measurement logic of these enterprises, considering their unique development characteristics in China. A credit evaluation system is constructed, and an innovative method combining machine learning with comprehensive evaluation is proposed. This approach aims to assess the credit status of technology-based small and micro enterprises in a thorough and objective manner. The study finds that, first, the credit level of these enterprises is currently moderate, with little variation. Second, financial information remains a key factor in credit evaluation. Third, the ML-AHP (Machine Learning-Analytic Hierarchy Process) combined weighting method effectively integrates subjective experience with objective data, providing a more rational assessment. The findings provide theoretical references and practical guidance for the healthy development of technology-based small and micro enterprises, early credit risk warning, and improved financing efficiency.

17 pages, 3498 KiB  
Review
Application of Google Earth Engine to Monitor Greenhouse Gases: A Review
by Damar David Wilson, Gebrekidan Worku Tefera and Ram L. Ray
Data 2025, 10(1), 8; https://doi.org/10.3390/data10010008 - 11 Jan 2025
Abstract
Google Earth Engine (GEE) is a cloud-based platform revolutionizing geospatial analysis by providing access to vast satellite datasets and computational capabilities for monitoring environmental and societal issues. It incorporates machine learning (ML) techniques and algorithms as part of its tools for analyzing and processing large geospatial data. This review explores the diverse applications of GEE in monitoring and mitigating greenhouse gas emissions and uptakes. GEE is a cloud-based platform built on Google’s infrastructure for analyzing and visualizing large-scale geospatial datasets. It offers large datasets for monitoring greenhouse gas (GHG) emissions and understanding their environmental impact. By leveraging GEE’s capabilities, researchers have developed tools and algorithms to analyze remotely sensed data and accurately quantify GHG emissions and uptakes. This review examines progress and trends in GEE applications, focusing on monitoring carbon dioxide (CO2), methane (CH4), and nitrous oxide/nitrogen dioxide (N2O/NO2) emissions. It discusses the integration of GEE with different machine learning methods and the challenges and opportunities in optimizing algorithms and ensuring data interoperability. Furthermore, it highlights GEE’s role in pinpointing emission hotspots, as demonstrated in studies monitoring uptakes. By providing insights into GEE’s capabilities for precise monitoring and mapping of GHGs, this review aims to advance environmental research and decision-making processes in mitigating climate change.

9 pages, 2730 KiB  
Data Descriptor
Cholec80-Boxes: Bounding Box Labelling Data for Surgical Tools in Cholecystectomy Images
by Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, Herag Arabian, Alberto Battistel, Paul David Docherty, Hisham ElMoaqet, Thomas Neumuth and Knut Moeller
Data 2025, 10(1), 7; https://doi.org/10.3390/data10010007 - 8 Jan 2025
Abstract
Surgical data analysis is crucial for developing and integrating context-aware systems (CAS) in advanced operating rooms. Automatic detection of surgical tools is an essential component in CAS, as it enables the recognition of surgical activities and an understanding of the contextual status of the procedure. Acquiring surgical data is challenging due to ethical constraints and the complexity of establishing data recording infrastructures. For machine learning tasks, there is also the large burden of data labelling. Although a relatively large dataset, namely Cholec80, is publicly available, it is limited to binary label data corresponding to surgical tool presence. In this work, 15,691 frames from five videos from the dataset have been labelled with bounding boxes for surgical tool localisation. These newly labelled data support future research in developing and evaluating object detection models, particularly in the laparoscopic image data analysis domain.

15 pages, 1414 KiB  
Data Descriptor
Self-Reported Data for Sustainable Development from People Living in Rural and Remote Areas
by Salem Ahmed Alabdali, Salvatore Flavio Pileggi and Gnana Bharathy
Data 2025, 10(1), 6; https://doi.org/10.3390/data10010006 - 8 Jan 2025
Abstract
This paper describes a dataset for the Sustainable Development of remote and rural areas. Version 1.0 includes self-reported data, with a total of 212 valid responses collected in 2024 across different sectors (education, healthcare, and business) from people living in rural and remote areas in Saudi Arabia. The structured survey is understood to support research endeavors and policy making, looking at the peculiar characteristics of those regions. The 40 core questions, in addition to the detailed demographic questions, aim to capture different perspectives and perceptions on innovative and sustainable solutions. Overall, the dataset offers valuable strategic insights to be integrated with other sources of information, as well as the opportunity to incrementally generate extensive and diverse knowledge in the field. The major limitation is inherently related to the local context, as the data come from the most educated persons with access to digital resources. Additionally, the dataset may be considered relatively small, and there is some gender imbalance due to cultural factors.

14 pages, 6079 KiB  
Data Descriptor
The EDI Multi-Modal Simultaneous Localization and Mapping Dataset (EDI-SLAM)
by Peteris Racinskis, Gustavs Krasnikovs, Janis Arents and Modris Greitans
Data 2025, 10(1), 5; https://doi.org/10.3390/data10010005 - 7 Jan 2025
Abstract
This paper accompanies the initial public release of the EDI multi-modal SLAM dataset, a collection of long tracks recorded with a portable sensor package. These include two global shutter RGB camera feeds, LiDAR scans, as well as inertial and GNSS data from an RTK-enabled IMU-GNSS positioning module, both as satellite fixes and internally fused interpolated pose estimates. The tracks are formatted as ROS1 and ROS2 bags, with separately available calibration and ground truth data. In addition to the filtered positioning module outputs, a second form of sparse ground truth pose annotation is provided using independently surveyed visual fiducial markers as a reference. This enables the meaningful evaluation of systems that directly incorporate data from the positioning module into their localization estimates, and serves as an alternative when the GNSS reference is disrupted by intermittent signals or multipath scattering. In this paper, we describe the methods used to collect the dataset, its contents, and its intended use.

20 pages, 2508 KiB  
Article
Optimizing Parkinson’s Disease Prediction: A Comparative Analysis of Data Aggregation Methods Using Multiple Voice Recordings via an Automated Artificial Intelligence Pipeline
by Zhengxiao Yang, Hao Zhou, Sudesh Srivastav, Jeffrey G. Shaffer, Kuukua E. Abraham, Samuel M. Naandam and Samuel Kakraba
Data 2025, 10(1), 4; https://doi.org/10.3390/data10010004 - 2 Jan 2025
Abstract
Patient-level grouped data are prevalent in public health and medical fields, and multiple instance learning (MIL) offers a framework to address the challenges associated with this type of data structure. This study compares four data aggregation methods designed to tackle the grouped structure in classification tasks: post-mean, post-max, post-min, and pre-mean aggregation. We developed a customized AI pipeline that incorporates twelve machine learning algorithms along with the four aggregation methods to detect Parkinson’s disease (PD) using multiple voice recordings from individuals available in the UCI Machine Learning Repository, which includes 756 voice recordings from 188 PD patients and 64 healthy individuals. Seven performance metrics—accuracy, precision, sensitivity, specificity, F1 score, AUC, and MCC—were utilized for model evaluation. Various techniques, such as Bag Over-Sampling (BOS), cross-validation, and grid search, were implemented to enhance classification performance. Among the four aggregation methods, post-mean aggregation combined with XGBoost achieved the highest accuracy (0.880), F1 score (0.922), and MCC (0.672). Furthermore, we identified potential trends in selecting aggregation methods that are suitable for imbalanced data, particularly based on their differences in sensitivity and specificity. These findings provide meaningful implications for the further exploration of grouped imbalanced data.
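The four aggregation methods named in this abstract can be illustrated with a minimal sketch. It assumes the usual reading of the terms: "post" aggregation combines per-recording predicted probabilities into one score per patient after a model has been applied, while "pre-mean" averages each patient's feature vectors before any model is trained. The function names and the toy data are hypothetical and are not taken from the paper's pipeline.

```python
from collections import defaultdict
from statistics import mean


def post_aggregate(probabilities, patient_ids, how="mean"):
    """Combine per-recording probabilities into one score per patient.

    `how` selects post-mean, post-max, or post-min aggregation.
    """
    reducers = {"mean": mean, "max": max, "min": min}
    reduce_fn = reducers[how]
    grouped = defaultdict(list)
    for pid, prob in zip(patient_ids, probabilities):
        grouped[pid].append(prob)
    return {pid: reduce_fn(probs) for pid, probs in grouped.items()}


def pre_mean_aggregate(features, patient_ids):
    """Average the feature vectors of each patient before modelling."""
    grouped = defaultdict(list)
    for pid, row in zip(patient_ids, features):
        grouped[pid].append(row)
    # zip(*rows) iterates column-wise over one patient's recordings.
    return {pid: [mean(col) for col in zip(*rows)] for pid, rows in grouped.items()}


# Hypothetical toy data: two patients with three recordings each.
patient_ids = ["a", "a", "a", "b", "b", "b"]
probabilities = [0.75, 0.5, 0.25, 0.25, 0.125, 0.375]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
            [0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]

post_mean = post_aggregate(probabilities, patient_ids, "mean")
post_max = post_aggregate(probabilities, patient_ids, "max")
post_min = post_aggregate(probabilities, patient_ids, "min")
pre_mean = pre_mean_aggregate(features, patient_ids)
```

With a decision threshold of 0.5, post-max flags a patient if any single recording looks positive and post-min only if all recordings do, which is one way the methods can trade sensitivity against specificity.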

11 pages, 2058 KiB  
Data Descriptor
Synthetic Dataset for Analyzing Geometry-Dependent Optical Properties of All-Pass Micro-Ring Resonators
by Sebastian Valencia-Garzon, Esteban Gonzalez-Valencia, Nelson Gómez-Cardona, Andres Calvo-Salcedo, J. A. Jaramillo-Villegas, Jorge Montoya-Cardona and Erick Reyes-Vera
Data 2025, 10(1), 3; https://doi.org/10.3390/data10010003 - 30 Dec 2024
Abstract
This study focuses on the analysis of the spectral response of all-pass micro-ring resonators (MRRs), which are essential in photonic device applications such as telecommunications, sensing, and optical frequency comb generation. The aim of this work is to generate a synthetic dataset that explores the spectral characteristics of the expected transmission spectra of MRRs by varying their structural parameters. Using numerical simulations, the dataset will allow the optimization of MRR performance metrics such as free spectral range (FSR), full width at half maximum (FWHM), and quality factor (Q-factor). The results confirm that variations in geometric configurations can significantly affect MRR performance, and the dataset provides valuable insights into the optimization process. Furthermore, machine learning techniques can be applied to the dataset to automate and improve the design process, reducing simulation times and increasing accuracy. This work contributes to the development of photonic devices by providing a broad dataset for further analysis and optimization.
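For orientation, the three metrics named in this abstract are commonly related by the standard textbook expressions below. The symbols are assumptions introduced here, not notation from the paper: \lambda_{res} is a resonance wavelength, n_g the group index of the ring waveguide, and L the ring's round-trip length.

```latex
% Standard ring-resonator relations (symbols are assumptions, not from the paper)
\mathrm{FSR} \approx \frac{\lambda_{res}^{2}}{n_g L},
\qquad
Q = \frac{\lambda_{res}}{\mathrm{FWHM}}
```

Under these relations a shorter round-trip length widens the FSR, and a narrower resonance (smaller FWHM) gives a higher Q-factor, which is consistent with the abstract's point that geometry affects performance.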

11 pages, 1926 KiB  
Data Descriptor
Minisatellite Isolation and Minisatellite Molecular Marker Development in Citrus limon (L.) Osbeck
by Oleg S. Alexandrov and Dmitry V. Romanov
Data 2025, 10(1), 2; https://doi.org/10.3390/data10010002 - 28 Dec 2024
Abstract
Minisatellites are widespread tandem DNA repeats in the genome with a monomer length of 10 to 100 bp. The high variability of minisatellite loci makes them attractive for the development of molecular markers. Minisatellites are used as markers according to three strategies: marking of digested genomic DNA with minisatellite-based probes; amplification with primers based on the sequences of the minisatellites themselves; and amplification with primers designed for borders upstream and downstream of the minisatellite locus. In this study, a minisatellite dataset was obtained from the analysis of the Citrus limon (L.) Osbeck genome using Tandem Repeat Finder (TRF) and GMATA software. The minisatellite loci found were used to develop molecular markers that were tested in GMATA using electronic PCR (e-PCR). The obtained dataset includes sequences of extracted minisatellites and their characteristics (start and end nucleotide positions on the chromosome, length of monomer, number of repetitions, and length of array), as well as sequences of developed primers, expected lengths of amplicons, and e-PCR results. The presented dataset can be used for the marking of lemon samples according to any of the three strategies. It provides a useful basis for lemon variety certification, identification of samples, verification of collections, lemon genome mapping, saturation of already created maps, the study of lemon genome architecture, etc.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Volume)

24 pages, 2074 KiB  
Article
Artificial Intelligence and Ontologies for the Management of Heritage Digital Twins Data
by Achille Felicetti and Franco Niccolucci
Data 2025, 10(1), 1; https://doi.org/10.3390/data10010001 - 26 Dec 2024
Abstract
This study builds upon the Reactive Heritage Digital Twin paradigm established in prior research, exploring the role of artificial intelligence in expanding and enhancing its capabilities. After providing an overview of the ontological model underlying the RHDT paradigm, this paper investigates the application of AI to improve data analysis and predictive capabilities of Heritage Digital Twins in synergy with the previously defined RHDTO semantic model. The structured nature of ontologies is highlighted as essential for enabling AIs to operate transparently, minimising hallucinations and other errors that are characteristic challenges of these technologies. New classes and properties within RHDTO are introduced to represent the AI-enhanced functions. Finally, some case studies are provided to illustrate how integrating AI within the RHDT framework can contribute to enriching the understanding of cultural information through interconnected data and facilitate real-time monitoring and preservation of cultural objects.
(This article belongs to the Section Information Systems and Data Management)