
Data, Volume 8, Issue 4 (April 2023) – 11 articles

Cover Story: Combining a high spatial resolution (10 m) with a revisit period of 5 days, optical satellite data from Sentinel-2 (ESA, Copernicus) have strong potential to document changes on the Earth’s surface. We applied an optimized cross-correlation method to image pairs within an automated processing chain, which also includes post-processing modules for data filtering and aggregation, to produce glacier surface flow velocity maps at the scale of the European Alps for the period 2015–2021. Although the period covered is rather short, the resulting 50 m resolution maps allow significant changes in glacier flow velocity to be documented, which is essential information on the state of mountain glaciers in the current context of environmental change.
20 pages, 2073 KiB  
Article
Digital Twin Application and Bibliometric Analysis for Digitization and Intelligence Studies in Geology and Deep Underground Research Areas
by Eun-Young Ahn and Seong-Yong Kim
Data 2023, 8(4), 73; https://doi.org/10.3390/data8040073 - 20 Apr 2023
Cited by 2 | Viewed by 2460
Abstract
As deep underground digital twins have not yet been established worldwide, this study extracted keywords from national or city-led digital twin practices and from the elements of digital twins and, through these keywords, selected research papers and topics that could contribute to the establishment of deep underground digital twins in the future. We applied the concept of digital twins in geology and underground research to collect 1702 papers from the Web of Science and conducted semantic network analysis and topic modeling. The keywords digital, three dimensions, and real time were placed in the middle of the word network and have many links. Artificial intelligence, deep learning, and neural networks all showed a low degree of centrality. Topic modeling using latent Dirichlet allocation (LDA) highlighted topics related to topography, geological structure, and rock distribution, which are the basic data for building a deep underground digital twin, and identified topics related to earthquakes/vibrations, landslides, groundwater, and volcanoes. Energy resources and space utilization have emerged as the main themes.

14 pages, 587 KiB  
Data Descriptor
Collecting and Pre-Processing Data for Industry 4.0 Implementation Using Hydraulic Press
by Radim Hercik and Radek Svoboda
Data 2023, 8(4), 72; https://doi.org/10.3390/data8040072 - 15 Apr 2023
Cited by 1 | Viewed by 1878
Abstract
More and more activities are being undertaken to implement the Industry 4.0 concept in industrial practice. One of the biggest challenges is the digitization of existing industrial systems and heavy industry operations, where there is huge potential for optimizing and managing these processes more efficiently. This requires collecting large amounts of data and then understanding and evaluating the data so that value can be derived from them. This paper focuses on the collection and local pre-processing of data from an industrial hydraulic press, and on their subsequent transfer to the cloud, to create a comprehensive dataset that forms the basis for further digitization of the operation. The novelty lies mainly in the process of collecting and pre-processing large amounts of data within an edge computing framework. In the pre-processing step, data normalization methods are applied that allow the data to be logically sorted, tagged, and linked, and also efficiently compressed, thus dynamically creating a complex dataset for later use in process digitization.

10 pages, 2302 KiB  
Data Descriptor
Proteomic Shotgun and Targeted Mass Spectrometric Datasets of Cerebrospinal Fluid (Liquor) Derived from Patients with Vestibular Schwannoma
by Svetlana Novikova, Natalia Soloveva, Tatiana Farafonova, Olga Tikhonova, Vadim Shimansky, Ivan Kugushev and Victor Zgoda
Data 2023, 8(4), 71; https://doi.org/10.3390/data8040071 - 6 Apr 2023
Cited by 1 | Viewed by 1973
Abstract
Vestibular schwannomas are relatively rare intracranial tumors compared to other brain tumors. Data on their molecular features, especially on the schwannoma proteome, are scarce. A total of 41 cerebrospinal fluid (liquor) samples were obtained during the surgical removal of vestibular schwannoma. The resulting peptide samples were analyzed by shotgun LC-MS/MS high-resolution mass spectrometry. The same peptide samples were spiked with 148 stable isotopically labeled peptide standards (SIS), followed by alkaline fractionation and scheduled multiple reaction monitoring (MRM) for quantitative analysis. The natural counterparts of the SIS peptides were mapped onto 111 proteins that were Food and Drug Administration (FDA)-approved for diagnostic use. As a result, 525 proteins were identified in liquor samples by shotgun LC-MS/MS with high confidence (at least two peptides per protein, FDR < 1%). Absolute quantitative concentrations were obtained for 54 FDA-approved proteins detected in at least five experimental samples. Since there is a lack of data on the molecular landscape of vestibular schwannoma, the obtained datasets are unique and among the first in their field.

8 pages, 3691 KiB  
Data Descriptor
Clinical Trial Data on the Mechanical Removal of 14-Day-Old Dental Plaque Using Accelerated Micro-Droplets of Air and Water (Airfloss)
by Yumi C. Del Rey, Pernille D. Rikvold, Karina K. Johnsen and Sebastian Schlafer
Data 2023, 8(4), 70; https://doi.org/10.3390/data8040070 - 31 Mar 2023
Viewed by 1727
Abstract
Novel strategies to combat dental biofilms aim at reducing biofilm stability with the ultimate goal of facilitating mechanical cleaning. To test the stability of dental biofilms, they need to be subjected to a defined mechanical stress. Here, we employed an oral care device (Airfloss) that emits microbursts of compressed air and water to apply a defined mechanical shear to 14-day-old dental plaque in 20 healthy participants with no signs of oral diseases (clinical trial no. NCT05082103). Exclusion criteria included pregnancy or nursing, use of oral prostheses, retainers, or orthodontic appliances, and recent antimicrobial or anti-inflammatory therapy. Plaque accumulation before and after treatment was assessed using fluorescence images of disclosed dental plaque on the central incisor, first premolar, and first molar in the third quadrant (120 images). For each tooth, the pre- and post-treatment plaque percentage index (PPI) and Turesky modification of the Quigley-Hein plaque index (TM-QHPI) were recorded. The mean TM-QHPI significantly decreased after treatment (p = 0.03; one-sample sign test), but no significant difference between the mean pre- and post-treatment PPI was observed (p = 0.09; one-sample t-test). These data are of value for researchers who seek to apply a defined mechanical shear to remove and/or disrupt dental biofilms.

13 pages, 2488 KiB  
Data Descriptor
Froth Images from Flotation Laboratory Test in Magotteaux Cell
by Carlos Yantén, Willy Kracht, Gonzalo Díaz, Pía Lois-Morales and Alvaro Egaña
Data 2023, 8(4), 69; https://doi.org/10.3390/data8040069 - 31 Mar 2023
Viewed by 2638
Abstract
Froth flotation is a widely used method for the concentration of sulfide minerals. The structure of the superficial froth, together with the operational conditions under which the process is carried out, is an indicator of flotation performance. The aim of this study is to explore how the different operational conditions that can be managed in a flotation plant directly influence the observable characteristics of the superficial froth. For this purpose, a froth image database was created using a special laboratory cell designed to emulate the conditions seen in an industrial flotation cell. The database contains 2250 images distributed in 45 categories; each category has a specific combination of testing conditions, and the main visual characteristics are observed. It also includes a methodology used to assess the quality of each corresponding category.
(This article belongs to the Section Information Systems and Data Management)

13 pages, 3390 KiB  
Data Descriptor
Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English
by Thuraya Omran, Baraa Sharef, Crina Grosan and Yongmin Li
Data 2023, 8(4), 68; https://doi.org/10.3390/data8040068 - 30 Mar 2023
Cited by 1 | Viewed by 2299
Abstract
Sentiment analysis is an application of natural language processing (NLP) that requires a machine learning algorithm and a dataset. In some cases, datasets are scarce, particularly for Arabic dialects and specifically the Bahraini ones, which necessitates an approach such as translation, where a rich source language is exploited to create the target language dataset. In this study, a dataset of Amazon product reviews in Bahraini dialects is presented. This dataset was generated using two cascading stages of translation: a machine translation followed by a manual one. Machine translation was applied using Google Translate to translate English Amazon product reviews into Standard Arabic. The resulting Arabic reviews were then manually translated into Bahraini dialects by qualified native speakers using customized forms. The resulting parallel dataset of English, Standard Arabic, and Bahraini dialects is called English_Modern Standard Arabic_Bahraini Dialects product reviews for sentiment analysis “E_MSA_BDs-PR-SA”. The dataset is balanced, composed of 2500 positive and 2500 negative reviews. The sentiment analysis process was implemented using a stacked LSTM deep learning model. The Bahraini dialect product dataset can be used in a transfer learning process for the sentiment analysis of other datasets in Bahraini dialects.
(This article belongs to the Special Issue Sentiment Analysis in Social Media Data)

5 pages, 6254 KiB  
Data Descriptor
NGS Reads Dataset of Sunflower Interspecific Hybrids
by Maksim S. Makarenko and Vera A. Gavrilova
Data 2023, 8(4), 67; https://doi.org/10.3390/data8040067 - 27 Mar 2023
Cited by 1 | Viewed by 1849
Abstract
The sunflower (Helianthus annuus), a member of the family Asteraceae, is a crop grown worldwide for consumption by humans and livestock. Interspecific hybridization is widespread in sunflowers, both in wild populations and in commercial breeding. The current dataset comprises 250 bp and 76 paired-end NGS reads for six interspecific sunflower hybrids (F1). The dataset aims to expand the genomic information available for Helianthus species and to benefit genetic research; it is useful for investigating the features of alloploids and for studying nuclear–organelle interactions. Mitochondrial genomes of the perennial sunflower hybrids H. annuus × H. strumosus and H. annuus × H. occidentalis were assembled and compared with parental forms.

18 pages, 5798 KiB  
Data Descriptor
Satellite-Derived Annual Glacier Surface Flow Velocity Products for the European Alps, 2015–2021
by Antoine Rabatel, Etienne Ducasse, Romain Millan and Jérémie Mouginot
Data 2023, 8(4), 66; https://doi.org/10.3390/data8040066 - 27 Mar 2023
Cited by 4 | Viewed by 2549
Abstract
Documenting glacier surface flow velocity from a longer-term perspective is highly relevant to evaluate the past and current state of glaciers worldwide. For this purpose, satellite data are widely used to obtain region-wide coverage of glacier velocity data. Well-established image correlation methods allow for the automated measurement of glacier surface displacements from satellite data (optical and radar) acquired at different dates. Although computationally expensive, image correlation is nowadays relatively simple to implement and allows two-dimensional displacement measurements. Here, we present a data set of annual glacier surface flow velocity maps at the European Alps scale, covering the period 2015–2021 at a 50 m × 50 m resolution. This data set has been quantified by applying the normalized cross-correlation approach on Sentinel-2 optical data. Parameters of the cross-correlation method (e.g., window size, sampling resolution) have been optimized, and the results have been validated by comparing them with in situ data on monitored glaciers showing an RMSE of 10 m/yr. These data can be used to evaluate glacier dynamics and its spatial and temporal evolution (e.g., quantify mass fluxes or calving) or can be used as an input for model calibration/validation or for the early detection of regional hazards associated with glacier destabilization.
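
To make the normalized cross-correlation idea in this abstract more concrete, here is a minimal sketch in Python. It is not the authors' processing chain: the function names, the window size, the search radius, and the synthetic test image are all assumptions chosen for illustration, and the 10 m pixel size simply reflects the nominal Sentinel-2 resolution. A template from the first image is compared against shifted windows in the second image, and the offset with the highest correlation score is taken as the displacement.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation score of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def match_displacement(ref, tgt, row, col, window=16, search=5):
    """Brute-force search for the (d_row, d_col) offset that maximizes NCC.

    `window` and `search` are illustrative values, not the tuned
    parameters mentioned in the abstract.
    """
    template = ref[row:row + window, col:col + window]
    best = (-2.0, 0, 0)  # (score, d_row, d_col); NCC is always >= -1
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            if r < 0 or c < 0 or r + window > tgt.shape[0] or c + window > tgt.shape[1]:
                continue  # candidate window falls outside the image
            score = ncc(template, tgt[r:r + window, c:c + window])
            if score > best[0]:
                best = (score, dr, dc)
    return best


def displacement_to_velocity(dr, dc, pixel_size_m=10.0, dt_days=365.25):
    """Convert a pixel offset to a speed in metres per year."""
    distance_m = float(np.hypot(dr, dc)) * pixel_size_m
    return distance_m / (dt_days / 365.25)


# Synthetic example: the second image is the first one shifted by (+2, -3) pixels.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
tgt = np.roll(ref, shift=(2, -3), axis=(0, 1))

score, dr, dc = match_displacement(ref, tgt, row=24, col=24)
speed = displacement_to_velocity(dr, dc, pixel_size_m=10.0, dt_days=365.25)
```

The exhaustive search keeps the sketch short; it also illustrates why the abstract calls image correlation computationally expensive, since every template is scored against every candidate offset.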

21 pages, 4416 KiB  
Article
CyL-GHI: Global Horizontal Irradiance Dataset Containing 18 Years of Refined Data at 30-Min Granularity from 37 Stations Located in Castile and León (Spain)
by Llinet Benavides Cesar, Miguel Ángel Manso Callejo, Calimanut-Ionut Cira and Ramon Alcarria
Data 2023, 8(4), 65; https://doi.org/10.3390/data8040065 - 26 Mar 2023
Cited by 4 | Viewed by 2987
Abstract
Accurate solar forecasting now relies on advances in the field of artificial intelligence and on the availability of databases with large amounts of information on meteorological variables. In this paper, we present the methodology applied to produce a large-scale, public solar irradiance dataset, CyL-GHI, containing refined data from 37 stations located in the Spanish region of Castile and León (Spanish: Castilla y León, or CyL). In addition to the data cleaning steps, the procedure features steps that add meteorological and geographical variables to complement the initial data. The resulting dataset is delivered both in raw format and with the quality processing applied, and continuously covers 18 years (from 1 January 2002 to 31 December 2019) with a temporal resolution of 30 min. CyL-GHI can be of great importance in studies focused on the spatio-temporal characteristics of solar irradiance data, because the geographical information included enables a regional analysis of the phenomena (the 37 stations cover a land area larger than 94,226 km²). Three popular artificial intelligence algorithms were then optimised and tested on CyL-GHI, and their performance values are offered as baselines against which other forecasting implementations can be compared. Furthermore, the ERA5 values corresponding to the studied area were analysed and compared with the performance values delivered by the trained models. Including previous observations from neighbouring stations as input to an optimised Random Forest model (a spatio-temporal approach) improved the predictive capability of the machine learning models by almost 3%.

18 pages, 5189 KiB  
Article
Improving an Acoustic Vehicle Detector Using an Iterative Self-Supervision Procedure
by Birdy Phathanapirom, Jason Hite, Kenneth Dayman, David Chichester and Jared Johnson
Data 2023, 8(4), 64; https://doi.org/10.3390/data8040064 - 25 Mar 2023
Cited by 1 | Viewed by 2207
Abstract
In many non-canonical data science scenarios, obtaining, detecting, attributing, and annotating enough high-quality training data is the primary barrier to developing highly effective models. Moreover, in many problems that are not sufficiently defined or constrained, manually developing a training dataset can often overlook interesting phenomena that should be included. To this end, we have developed and demonstrated an iterative self-supervised learning procedure, whereby models are successfully trained and applied to new data to extract new training examples that are added to the corpus of training data. Successive generations of classifiers are then trained on this augmented corpus. Using low-frequency acoustic data collected by a network of infrasound sensors deployed around the High Flux Isotope Reactor and Radiochemical Engineering Development Center at Oak Ridge National Laboratory, we test the viability of our proposed approach to develop a powerful classifier with the goal of identifying vehicles from continuously streamed data and differentiating these from other sources of noise such as tools, people, airplanes, and wind. Using a small collection of exhaustively manually labeled data, we test several implementation details of the procedure and demonstrate its success regardless of the fidelity of the initial model used to seed the iterative procedure. Finally, we demonstrate the method’s ability to update a model to accommodate changes in the data-generating distribution encountered during long-term persistent data collection.
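
The iterative procedure described in this abstract (train a model, apply it to new data, add the extracted examples to the training corpus, retrain) can be sketched as a generic self-training loop. The sketch below is only an illustration under loud assumptions: the nearest-centroid classifier, the margin-based confidence rule, the threshold of 2.0, and the synthetic two-class data are all stand-ins, and none of them is taken from the paper.

```python
import numpy as np


def fit_centroids(X, y):
    """Stand-in classifier: one centroid per class (classes 0 and 1)."""
    return np.stack([X[y == k].mean(axis=0) for k in (0, 1)])


def predict_with_margin(centroids, X):
    """Predict the nearest centroid; the margin is the gap between the two distances."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    margin = np.abs(d[:, 0] - d[:, 1])
    return labels, margin


def self_train(X_seed, y_seed, X_pool, rounds=3, min_margin=2.0):
    """Grow the training corpus with confidently pseudo-labeled pool examples."""
    X_train, y_train = X_seed.copy(), y_seed.copy()
    remaining = X_pool.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train)
        labels, margin = predict_with_margin(centroids, remaining)
        keep = margin >= min_margin  # hypothetical confidence rule
        if not keep.any():
            break
        X_train = np.vstack([X_train, remaining[keep]])
        y_train = np.concatenate([y_train, labels[keep]])
        remaining = remaining[~keep]
    # Final generation of the model, trained on the augmented corpus.
    return fit_centroids(X_train, y_train), len(y_train)


# Synthetic data: two well-separated Gaussian classes in two dimensions.
rng = np.random.default_rng(42)


def make_class(mean, n):
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))


X_seed = np.vstack([make_class(-3.0, 5), make_class(3.0, 5)])
y_seed = np.array([0] * 5 + [1] * 5)
X_pool = np.vstack([make_class(-3.0, 200), make_class(3.0, 200)])
y_pool = np.array([0] * 200 + [1] * 200)  # used only to score the result

centroids, n_train = self_train(X_seed, y_seed, X_pool)
pred, _ = predict_with_margin(centroids, X_pool)
accuracy = float((pred == y_pool).mean())
```

Only examples that clear the confidence rule are added in each round, which is one simple way to limit the label noise that a weak seed model would otherwise feed back into later generations.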

10 pages, 19729 KiB  
Data Descriptor
Batik Nitik 960 Dataset for Classification, Retrieval, and Generator
by Agus Eko Minarno, Indah Soesanti and Hanung Adi Nugroho
Data 2023, 8(4), 63; https://doi.org/10.3390/data8040063 - 24 Mar 2023
Cited by 5 | Viewed by 3232
Abstract
Batik is one of the traditional heritages of Indonesia, with each batik motif having a profound cultural and philosophical significance. This article introduces the Batik Nitik 960 dataset from Yogyakarta, Indonesia. The dataset was extracted from a piece of fabric with 60 Nitik patterns. The dataset was supplied by the Paguyuban Pecinta Batik Indonesia (PPBI) Sekar Jagad Yogyakarta collection of Winotosasto Batik, and the data were extracted from the APIPS Gallery. Each of the 60 categories in the collection contains 16 photographs, for a total of 960 images. The photographs were acquired with a Sony Alpha a6400, illuminated with a Godox SK II 400, and the data were compressed using the JPG file format. Each category contains four motifs rotated by 90, 180, and 270 degrees. Thus, the total number of images per motif is 16. Each class has a specific philosophical significance associated with the motif’s origins. This dataset aims to enable the training and evaluation of machine learning models for classification, retrieval, or generation of a new batik pattern using a generative adversarial network. To our knowledge, this study is the first to present a freely accessible Batik Nitik dataset equipped with philosophical significance.
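
The image counts in this abstract can be checked with a short calculation. The sketch below assumes one possible reading of the description: each of the 60 categories holds four motif patches, and each patch appears in four orientations (the original plus rotations of 90, 180, and 270 degrees). The constant names and the small integer array standing in for an image patch are hypothetical.

```python
import numpy as np

N_CATEGORIES = 60            # Nitik patterns, as stated in the abstract
PATCHES_PER_CATEGORY = 4     # assumed reading of "four motifs" per category
ROTATIONS = (0, 90, 180, 270)  # degrees; 0 is the unrotated original


def rotations_of(patch):
    """Return the patch in each orientation (np.rot90 turns counterclockwise)."""
    return [np.rot90(patch, k=angle // 90) for angle in ROTATIONS]


# A small non-square array stands in for an image patch.
patch = np.arange(12).reshape(3, 4)
views = rotations_of(patch)

images_per_category = PATCHES_PER_CATEGORY * len(ROTATIONS)
total_images = N_CATEGORIES * images_per_category
```

Under this reading, the per-category and overall totals agree with the 16 photographs per category and 960 images reported in the abstract.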
