Data, Volume 4, Issue 2 (June 2019) – 43 articles

Cover Story: The daily production of clinical and biomedical research data can help enable data-driven healthcare through novel biomedical discoveries, improved diagnostic processes, epidemiology, and education. Finding and gaining access to these data and relevant metadata to achieve these goals, however, remains a challenge. These data sources are often geographically distributed, have diverse characteristics, and are controlled by a host of logistical and legal factors that require appropriate governance and access-control guarantees. The primary desirable dataset properties are thus that the data should be findable, accessible, interoperable, and reusable (FAIR). In the proposed research work, we introduce and describe an abstract framework, the Data integration and indexing System (DiiS), that models these ideal goals and could be a step toward supporting data-driven research.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF form. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
13 pages, 685 KiB  
Data Descriptor
Reduced Collatz Dynamics Data Reveals Properties for the Future Proof of Collatz Conjecture
by Wei Ren
Data 2019, 4(2), 89; https://doi.org/10.3390/data4020089 - 21 Jun 2019
Cited by 3 | Viewed by 5510
Abstract
The Collatz conjecture is also known as the 3x + 1 conjecture. For verifying the conjecture, we designed an algorithm that can output reduced dynamics (the sequence of 3 × x + 1 or x/2 computations that occur from a starting integer to the first integer smaller than the starting integer) and original dynamics of integers (from a starting integer to 1). Notably, the starting integer has no upper bound; that is, extremely large integers with a length of about 100,000 bits, e.g., 2^100000 − 1, can be verified for the Collatz conjecture, which is much larger than the current upper bound (about 2^60). We analyze the properties of those data (e.g., reduced dynamics) and discover the following laws: reduced dynamics is periodic, and the period is the length of its reduced dynamics; the count of x/2 computations equals the minimal integer that is not less than the count of (3 × x + 1)/2 computations times ln(1.5)/ln(2). In addition, we observe that all integers are partitioned regularly into halves, iteratively, as reduced dynamics lengthen; thus, given a reduced dynamics, we can compute, by a proposed algorithm, the residue class that presents this reduced dynamics. This creates a one-to-one mapping between a reduced dynamics and a residue class. These observations from data reveal the properties of reduced dynamics, which are proved mathematically in our other papers (see references). If it can be proved that every integer has reduced dynamics, then every integer will have original dynamics (i.e., the Collatz conjecture will be true). The data set includes the reduced dynamics of all odd positive integers in [3, 99999999] whose remainder is 3 when divided by 4, the original dynamics of some extremely large integers, and all computer source code in C that implements our proposed algorithms for generating the data (i.e., reduced or original dynamics). Full article
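As an illustration of the reduced dynamics described above, the following Python sketch computes the operation sequence from an odd starting integer down to the first smaller integer and prints the operation counts next to the ceiling relation quoted in the abstract. It is a minimal reimplementation for exploration, not the authors' published C code, and the test integers are arbitrary.

```python
import math

def reduced_dynamics(n):
    """Reduced dynamics of an odd n > 1: the operation sequence applied until the
    trajectory first drops below the starting integer. 'I' marks a (3x + 1)/2 step
    on an odd value, 'O' marks a plain x/2 step."""
    assert n > 1 and n % 2 == 1
    ops, x = [], n
    while x >= n:
        if x % 2 == 1:
            x = (3 * x + 1) // 2
            ops.append('I')
        else:
            x //= 2
            ops.append('O')
    return ops

for n in (3, 7, 27, 2**100 - 1):   # arbitrary odd starting integers; Python big ints impose no upper bound
    ops = reduced_dynamics(n)
    i, j = ops.count('I'), ops.count('O')
    # print the count of x/2 steps next to the relation reported in the abstract
    print(i, j, math.ceil(i * math.log(1.5) / math.log(2)))
```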
12 pages, 5808 KiB  
Data Descriptor
The Ionic Liquid Property Explorer: An Extensive Library of Task-Specific Solvents
by Vishwesh Venkatraman, Sigvart Evjen and Kallidanthiyil Chellappan Lethesh
Data 2019, 4(2), 88; https://doi.org/10.3390/data4020088 - 21 Jun 2019
Cited by 21 | Viewed by 7478
Abstract
Ionic liquids have a broad spectrum of applications ranging from gas separation to sensors and pharmaceuticals. Rational selection of the constituent ions is key to achieving tailor-made materials with functional properties. To facilitate the discovery of new ionic liquids for sustainable applications, we have created a virtual library of over 8 million synthetically feasible ionic liquids. Each structure has been evaluated for its task suitability using data-driven statistical models calculated for 12 highly relevant properties: melting point, thermal decomposition, glass transition, heat capacity, viscosity, density, cytotoxicity, CO2 solubility, surface tension, and electrical and thermal conductivity. For comparison, values of six properties computed using the quantum chemistry-based equilibrium thermodynamics COSMO-RS method are also provided. We believe the data set will be useful for future efforts directed towards targeted synthesis and optimization. Full article
(This article belongs to the Special Issue Machine Learning and Materials Informatics)
8 pages, 492 KiB  
Data Descriptor
Dataset on Substrate-Borne Vibrations of Constrictotermes cyphergaster (Blattodea: Isoptera) Termites
by Lívia Fonseca Nunes, Paulo Fellipe Cristaldo, Pedro Sérgio Silva, Leonardo Bonato Felix, Danilo Miranda Ribeiro and Og DeSouza
Data 2019, 4(2), 87; https://doi.org/10.3390/data4020087 - 19 Jun 2019
Viewed by 3822
Abstract
Here we present data on distinct stimuli as elicitors of substrate-borne vibrations performed by groups of termites belonging to the species Constrictotermes cyphergaster (Blattodea: Isoptera: Termitidae: Nasutitermitinae). The study consisted of assays where termite workers and soldiers were exposed to different airborne stimuli, and the vibrations thereby elicited were captured by an accelerometer attached under the floor of the arena in which the termites were confined. A video camera was also used as a visual complement. The data provided here contribute to filling a gap that currently exists in published datasets on termite communication. Full article
25 pages, 2658 KiB  
Article
A Topology Based Spatio-Temporal Map Algebra for Big Data Analysis
by Sören Gebbert, Thomas Leppelt and Edzer Pebesma
Data 2019, 4(2), 86; https://doi.org/10.3390/data4020086 - 18 Jun 2019
Cited by 7 | Viewed by 6666
Abstract
Continental and global datasets based on earth observations or computational models challenge existing map algebra approaches. The available datasets differ in their spatio-temporal extents and their spatio-temporal granularity, which makes it difficult to process them as time series data in map algebra expressions. To address this issue, we introduce a new, topology-based map algebra approach. This topology-based map algebra uses spatio-temporal topological operators (STTOP and STTCOP) to specify spatio-temporal operations between topologically related map layers of different time-series data. We have implemented several topology-based map algebra tools in the open source geoinformation system GRASS GIS and its open source cloud processing engine actinia. We demonstrate the application of our topology-based map algebra by solving real-world big data problems using a single algebraic expression. This included the massively parallel computation of the NDVI from a series of 100 Sentinel-2A scenes organized as earth observation data cubes. The processing was performed and benchmarked on a many-core computer setup and in a distributed container environment. The design of our topology-based map algebra allows us to deploy it as a standardized service in the EU Horizon 2020 project openEO. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)
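The NDVI use case mentioned in the abstract can be pictured with a minimal per-scene computation. The sketch below applies the NDVI formula with numpy on placeholder red and near-infrared arrays; it only illustrates the formula itself, not the GRASS GIS/actinia map algebra expressions the authors actually deploy.

```python
import numpy as np

# Placeholder reflectance rasters standing in for one Sentinel-2A scene's
# red and near-infrared bands (real data would be read from the data cube).
rng = np.random.default_rng(0)
red = rng.random((1024, 1024), dtype=np.float32)
nir = rng.random((1024, 1024), dtype=np.float32)

# NDVI = (NIR - Red) / (NIR + Red), with a guard against division by zero.
denom = nir + red
ndvi = np.where(denom > 0, (nir - red) / denom, np.nan)
print(float(np.nanmean(ndvi)))
```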
4 pages, 203 KiB  
Editorial
Special Issue on the Curative Power of Medical Data
by Daniela Gîfu, Diana Trandabăț, Kevin Cohen and Jingbo Xia
Data 2019, 4(2), 85; https://doi.org/10.3390/data4020085 - 14 Jun 2019
Cited by 3 | Viewed by 2911
Abstract
With the massive amounts of medical data made available online, language technologies have proven to be indispensable in processing biomedical and molecular biology literature, health data, or patient records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and the discovery of structured clinical information and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, “Curative Power of Medical Data” in Data, is to gather innovative approaches for the exploitation of biomedical data using semantic web technologies and linked data by developing community involvement in biomedical research. This Special Issue contains four surveys covering a wide range of topics, from the analysis of biomedical article writing style to automatically generating tests from medical references, constructing a gold-standard biomedical corpus, and visualizing biomedical data. Full article
(This article belongs to the Special Issue Curative Power of Medical Data)
12 pages, 242 KiB  
Article
Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014
by Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse and Su-I Hou
Data 2019, 4(2), 84; https://doi.org/10.3390/data4020084 - 10 Jun 2019
Cited by 14 | Viewed by 4174
Abstract
As a pedagogical demonstration of Twitter data analysis, a case study of HIV/AIDS-related tweets around World AIDS Day, 2014, was presented. This study examined whether Twitter users from countries with various income levels responded differently to World AIDS Day. The performance of support vector machine (SVM) models as classifiers of relevant tweets was evaluated. A manual coding of 1826 randomly sampled HIV/AIDS-related original tweets from November 30 through December 2, 2014 was completed. Logistic regression was applied to analyze the association between the World Bank-designated income level of users’ self-reported countries and Twitter contents. To identify the optimal SVM model, 1278 (70%) of the 1826 sampled tweets were randomly selected as the training set, and 548 (30%) served as the test set. Another 180 tweets were separately sampled and coded as the held-out dataset. Compared with tweets from low-income countries, tweets from the Organization for Economic Cooperation and Development countries had 60% lower odds of mentioning epidemiology (adjusted odds ratio, aOR = 0.404; 95% CI: 0.166, 0.981) and three times the odds of mentioning compassion/support (aOR = 3.080; 95% CI: 1.179, 8.047). Tweets from lower-middle-income countries had 79% lower odds than tweets from low-income countries of mentioning HIV-affected sub-populations (aOR = 0.213; 95% CI: 0.068, 0.664). The optimal SVM model identified relevant tweets from the held-out dataset of 180 tweets with an F1 score of 0.72. This study demonstrated how students can be taught to analyze Twitter data using manual coding, regression models, and SVM models. Full article
(This article belongs to the Special Issue Big Data and Digital Health)
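A classroom-scale sketch of the SVM relevance classifier described above might look like the following scikit-learn pipeline. The tweets and labels are invented stand-ins for the manually coded sample, and the 70/30 split only mirrors the proportions reported in the abstract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented stand-ins for manually coded tweets: 1 = relevant to HIV/AIDS, 0 = not.
tweets = ["World AIDS Day event at the clinic", "Great football match tonight",
          "Free HIV testing this week", "New phone arrived today",
          "Support people living with HIV", "Traffic is terrible again"]
labels = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, train_size=0.7, stratify=labels, random_state=0)

# TF-IDF features fed into a linear-kernel SVM classifier.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(X_train, y_train)
print(list(zip(X_test, model.predict(X_test))))
```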
11 pages, 650 KiB  
Article
CaosDB—Research Data Management for Complex, Changing, and Automated Research Workflows
by Timm Fitschen, Alexander Schlemmer, Daniel Hornung, Henrik tom Wörden, Ulrich Parlitz and Stefan Luther
Data 2019, 4(2), 83; https://doi.org/10.3390/data4020083 - 10 Jun 2019
Cited by 4 | Viewed by 6440
Abstract
We present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data in a FAIR way. Its primary purpose is the management of data from the biomedical sciences, from both simulations and experiments, during the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around the scientists’ workflows and practices and thus support changes in workflow and data structure. Nevertheless, it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of its data model, the CaosDB Server, and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it. Full article
10 pages, 1900 KiB  
Data Descriptor
Homisland-IO: Homogeneous Land Use/Land Cover over the Small Islands of the Indian Ocean
by Christophe Révillion, Artadji Attoumane and Vincent Herbreteau
Data 2019, 4(2), 82; https://doi.org/10.3390/data4020082 - 8 Jun 2019
Cited by 6 | Viewed by 4828
Abstract
Many small islands are located in the southwestern Indian Ocean. These islands have their own environmental specificities and very fragmented landscapes. Generic land use products developed from low- and medium-resolution satellite images are not suitable for studying these small territories. This is why we have developed a land use/land cover product, called Homisland-IO, based on remote sensing processing of high-spatial-resolution satellite images acquired by the SPOT 5 satellite between December 2012 and July 2014. This product was produced using an object-based classification process. Its overall accuracy is 86%. Homisland-IO is freely accessible through a web portal and is thus available for future use. Full article
12 pages, 1368 KiB  
Article
Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer
by Raihanul Bari Tanvir, Tasmia Aqila, Mona Maharjan, Abdullah Al Mamun and Ananda Mohan Mondal
Data 2019, 4(2), 81; https://doi.org/10.3390/data4020081 - 5 Jun 2019
Cited by 10 | Viewed by 4635
Abstract
Two graph theoretic concepts—cliques and bipartite graphs—are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal goes to the next group of genes related to the second stage of the cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers—breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma multiforme (GBM)—are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, combining these cliques, three different biomarker modules are developed—maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as essential genes for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet-lab experiments for further elucidating the complex mechanism of cancer. Full article
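The first two steps of this pipeline, building a co-expression network from gene pairs with Pearson correlation ≥ 0.9 and enumerating cliques, can be sketched with numpy and networkx as below. The expression matrix is synthetic, and the module-assembly and validation stages of the paper are not reproduced.

```python
import numpy as np
import networkx as nx

# Synthetic expression matrix: rows are samples, columns are genes.
rng = np.random.default_rng(1)
expr = rng.normal(size=(60, 10))
genes = [f"gene{i}" for i in range(expr.shape[1])]

corr = np.corrcoef(expr, rowvar=False)      # pairwise Pearson correlations between genes
G = nx.Graph()
G.add_nodes_from(genes)
for a in range(len(genes)):
    for b in range(a + 1, len(genes)):
        if corr[a, b] >= 0.9:               # edge threshold quoted in the abstract
            G.add_edge(genes[a], genes[b])

# Maximal cliques are the raw material for the clique/bipartite biomarker modules.
print([clique for clique in nx.find_cliques(G) if len(clique) > 1])
```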
9 pages, 499 KiB  
Data Descriptor
Survey Data for Measuring Musical Creativity and the Impact of Information
by Petros Kostagiolas, Charilaos Lavranos and Panagiotis Manolitzas
Data 2019, 4(2), 80; https://doi.org/10.3390/data4020080 - 1 Jun 2019
Cited by 2 | Viewed by 3666
Abstract
This paper presents data on the analysis of Webster’s model of creative thinking in music products, and the impact of information on musical creativity. For this purpose, a specially designed closed-ended structured questionnaire was developed and distributed. The questionnaire was completed by 238 musicians and was analyzed using the Statistical Package for the Social Sciences (SPSS), version 22.0. The data are presented through descriptive and inferential statistics, principal component analysis for variable reduction, and, finally, bivariate regression analysis. The data provide information on Webster’s model of creative thinking in music as well as on the impact of music information on musical creativity. The survey results indicate that the overall sense of musical creativity includes conceptional and replicational musical creativity components. These are significantly positively correlated with music information. Musicians’ sense of creativity is affected by the availability of music information when dealing with various musical creative activities. Full article
7 pages, 1360 KiB  
Article
Visualization of Myocardial Strain Pattern Uniqueness with Respect to Activation Time and Contractility: A Computational Study
by Borut Kirn
Data 2019, 4(2), 79; https://doi.org/10.3390/data4020079 - 24 May 2019
Viewed by 2900
Abstract
Speckle tracking echography is used to measure myocardial strain patterns in order to assess the state of myocardial tissue. Because electro-mechanical coupling in myocardial tissue is complex and nonlinear, and because of measurement errors, the uniqueness of strain patterns is questionable. In this study, the uniqueness of strain patterns was visualized in order to reveal characteristics that may improve their interpretation. A computational model of sarcomere mechanics was used to generate a database of 1681 strain patterns, each simulated with a different set of sarcomere parameters: time of activation (TA) and contractility (Con). TA and Con ranged from −100 ms to 100 ms and from 2% to 202%, respectively, in 41 steps each, thus forming a two-dimensional 41 × 41 parameter space. Uniqueness of a strain pattern was assessed by using a cohort of similar strain patterns defined by a measurement error. The cohort members were then visualized in the parameter space. Each cohort formed one connected component (or blob) in the parameter space; however, large differences in the shape, size, and eccentricity of the blobs were found for different regions in the parameter space. The blobs were elongated along the TA direction (±50 ms) when contractility was low, and along the Con direction (±50%) when contractility was high. The uniqueness of the strain patterns can be assessed and visualized in the parameter space. The strain patterns in the studied database are not degenerate, because a cohort of similar strain patterns forms only one connected blob in the parameter space. However, the elongation of the blobs means that estimates of TA when contractility is low and of Con when contractility is high have high uncertainty. Full article
(This article belongs to the Special Issue Biological Data Visualization)
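The blob analysis described in the abstract amounts to thresholding a 41 × 41 grid of pattern distances by the measurement error and labeling connected components. The sketch below does exactly that on a synthetic distance surface; the real distances come from the sarcomere-mechanics simulations, which are not reproduced here.

```python
import numpy as np
from scipy import ndimage

# Synthetic 41 x 41 grid of distances between each simulated strain pattern
# (activation time TA on one axis, contractility Con on the other) and a
# chosen reference pattern; the anisotropic scaling elongates the blob.
ta = np.linspace(-100, 100, 41)          # ms
con = np.linspace(2, 202, 41)            # percent
TA, CON = np.meshgrid(ta, con)
distance = np.hypot((TA - 20) / 100, (CON - 60) / 400)

measurement_error = 0.25
cohort = distance < measurement_error            # patterns indistinguishable from the reference
labels, n_blobs = ndimage.label(cohort)          # connected components ("blobs") in parameter space
print(n_blobs, "blob(s), sizes:", np.bincount(labels.ravel())[1:])
```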
5 pages, 343 KiB  
Data Descriptor
Data for Fish Stock Assessment Obtained from the CMSY Algorithm for all Global FAO Datasets
by Arnaud Hélias
Data 2019, 4(2), 78; https://doi.org/10.3390/data4020078 - 24 May 2019
Cited by 7 | Viewed by 3808
Abstract
Assessing the state of fish stocks requires the determination of descriptors. These correspond to the absolute and relative (to the carrying capacity of the habitat) fish biomass in the ecosystem, and the absolute and relative (to the intrinsic growth rate of the population) fishing mortality resulting from catches. This allows, among other things, the catch to be compared with the maximum sustainable yield. Some fish stocks are well described and monitored, but for many data-limited stocks, catch time series remain the only source of data. Recently, an algorithm (CMSY) has been proposed that allows an estimation of stock assessment variables from catch and resilience. In this paper, we provide stock reference points for almost 5000 fish stocks across all global fisheries reported by Food and Agriculture Organization (FAO) major fishing area. These data come from the CMSY algorithm for 42% of the stocks (75% of the global reported fish catch) and are estimated by aggregated values for the remaining 58%. Full article
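For readers unfamiliar with the reference points involved, the sketch below lists those of the Schaefer surplus-production model that CMSY-type assessments build on. The formulas are standard textbook relations, and r and k here are hypothetical values rather than outputs of the CMSY runs behind this dataset.

```python
def schaefer_reference_points(r, k):
    """Reference points of the Schaefer surplus-production model:
    MSY = r*k/4, biomass at MSY = k/2, fishing mortality at MSY = r/2.
    Illustrative only; CMSY itself estimates r and k from catch time
    series and resilience priors."""
    return {"MSY": r * k / 4.0, "B_msy": k / 2.0, "F_msy": r / 2.0}

print(schaefer_reference_points(r=0.5, k=100000.0))   # hypothetical stock parameters
```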
14 pages, 9575 KiB  
Article
A New Crop Spectral Signatures Database Interactive Tool (CSSIT)
by Mohamad M. Awad, Bassem Alawar and Rana Jbeily
Data 2019, 4(2), 77; https://doi.org/10.3390/data4020077 - 24 May 2019
Cited by 8 | Viewed by 6339
Abstract
In many countries, commodities provided by the agriculture sector play an important role in the economy. Securing food is one aspect of this role, which can be achieved when decision makers are supported by appropriate tools. The need for cheap, fast, and accurate tools with high temporal resolution and global coverage has encouraged decision makers to use remote sensing technologies. Field spectroradiometers with high spectral resolution can substantially improve crop mapping by reducing similarities between different crop types that grow under similar ecological conditions. This is done by recording fine details of the crop’s interaction with sunlight. These details can improve recognition of the same crop even with variation in crop chemistry and structure. This paper presents a new crop spectral signatures database interactive tool (CSSIT) for the major crops in the Eastern Mediterranean Basin, such as wheat and potato. The CSSIT’s database combines different data, such as spectral signatures for different periods of crop growth stages and many physical and chemical parameters for crops, such as leaf area index (LAI) and chlorophyll-a content (CHC). In addition, the CSSIT includes functions for calculating indices from spectral signatures for a specific crop and user-interactive dialog boxes for displaying the spectral signatures of a specific crop at a specific period of time. Full article
(This article belongs to the Special Issue Smart Farming: Monitoring Sensor Data)
13 pages, 925 KiB  
Article
Ensemble Based Classification of Sentiments Using Forest Optimization Algorithm
by Mehreen Naz, Kashif Zafar and Ayesha Khan
Data 2019, 4(2), 76; https://doi.org/10.3390/data4020076 - 23 May 2019
Cited by 12 | Viewed by 3888
Abstract
Feature subset selection is a process to choose a set of relevant features from a high-dimensionality dataset to improve the performance of classifiers. The meaningful words extracted from data form a set of features for sentiment analysis. Many evolutionary algorithms, like the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, and computational performance can still be improved. This research presents a solution to the feature subset selection problem for classification of sentiments using ensemble-based classifiers. It consists of a hybrid technique of minimum redundancy and maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of individual classifiers. The Forest Optimization Algorithm as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used in the ensemble methods for the UCI repository datasets are the k-Nearest Neighbor (k-NN) and Naïve Bayes (NB) classifiers. For the classification of sentiments, a 15–20% improvement has been recorded. The dataset used for classification of sentiments is Blitzer’s dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM) classifiers, with an accuracy of 95% on the sentiment classification task. Full article
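The ensemble stage can be illustrated with scikit-learn's VotingClassifier over k-NN, NB, and SVM. The reviews below are invented placeholders for Blitzer's electronics reviews, and the mRMR/FOA feature-selection step that is the paper's core contribution is omitted.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented placeholder reviews: 1 = positive sentiment, 0 = negative.
reviews = ["great battery life", "terrible screen quality",
           "works perfectly out of the box", "broke after a week"]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a hard-voting ensemble of the three classifiers.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(estimators=[("knn", KNeighborsClassifier(n_neighbors=1)),
                                 ("nb", MultinomialNB()),
                                 ("svm", SVC())],
                     voting="hard"))
ensemble.fit(reviews, labels)
print(ensemble.predict(["excellent product", "very disappointing purchase"]))
```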
28 pages, 3950 KiB  
Article
A Novel Hybrid Model for Stock Price Forecasting Based on Metaheuristics and Support Vector Machine
by Mojtaba Sedighi, Hossein Jahangirnia, Mohsen Gharakhani and Saeed Farahani Fard
Data 2019, 4(2), 75; https://doi.org/10.3390/data4020075 - 22 May 2019
Cited by 50 | Viewed by 9913
Abstract
This paper presents a new model for the accurate forecasting of a stock’s future price. Stock price forecasting is one of the most complicated problems, in view of the high volatility of stock exchanges, and it is also a key issue for traders and investors. Many forecasting models have been developed by academic researchers to predict stock prices. Nevertheless, a review of past research reveals several shortcomings in previous approaches, namely: (1) stringent statistical hypotheses are essential; (2) human intervention takes part in the predicting process; and (3) an appropriate range is difficult to discover. To address these problems, we provide a new integrated approach based on the Artificial Bee Colony (ABC), the Adaptive Neuro-Fuzzy Inference System (ANFIS), and the Support Vector Machine (SVM). ABC is employed to optimize the technical indicators for forecasting instruments. To achieve a more precise approach, ANFIS is applied to predict long-run price fluctuations of the stocks. SVM is applied to create the nexus between the stock price and the technical indicators and to further decrease the forecasting errors of the presented model, whose performance is examined by five criteria. The comparative results, obtained by running the model on datasets taken from the 50 largest companies of the U.S. stock exchange from 2008 to 2018, clearly demonstrate that the suggested approach outperforms the other methods in accuracy and quality. The findings show that our model is a successful instrument for stock price forecasting and will assist traders and investors in identifying stock price trends; it also represents an innovation in algorithmic trading. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)
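As a rough sense of how a support vector regressor maps technical indicators to a next-day price, consider the sketch below. It uses two toy moving-average indicators on a simulated price series and leaves out the ABC optimization and ANFIS components that the paper combines with the SVM.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(size=300))   # simulated daily closing prices

def sma(x, w):                                   # simple moving average, a toy technical indicator
    return np.convolve(x, np.ones(w) / w, mode="valid")

sma5, sma20 = sma(prices, 5), sma(prices, 20)
X = np.column_stack([sma5[-len(sma20):], sma20])[:-1]   # indicators known the day before the target
y = prices[20:]                                          # next-day closing price

model = SVR(kernel="rbf").fit(X[:-50], y[:-50])          # hold out the last 50 days for a quick check
print(model.predict(X[-50:])[:5])
```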
18 pages, 16435 KiB  
Article
A Study on Visual Representations for Active Plant Wall Data Analysis
by Kahin Akram Hassan, Yu Liu, Lonni Besançon, Jimmy Johansson and Niklas Rönnberg
Data 2019, 4(2), 74; https://doi.org/10.3390/data4020074 - 21 May 2019
Cited by 4 | Viewed by 5137
Abstract
The indoor climate is closely related to human health, well-being, and comfort. Thus, an understanding of the indoor climate is vital. One way to improve the indoor climate is to place an aesthetically pleasing active plant wall in the environment. By collecting data using sensors placed in and around the plant wall, both the indoor climate and the status of the plant wall can be monitored and analyzed. This manuscript presents a user study with domain experts in this field, with a focus on the representation of such data. The experts explored these data with a Line graph, a Horizon graph, and a Stacked area graph to better understand the status of the active plant wall and the indoor climate. Qualitative measures were collected with a think-aloud protocol and semi-structured interviews. The study resulted in four categories of analysis tasks: Overview, Detail, Perception, and Complexity. The Line graph was found to be preferred for providing an overview, and the Horizon graph for detailed analysis, revealing patterns and showing discernible trends, while the Stacked area graph was generally not preferred. Based on these findings, directions for future research are discussed and formulated. The results and future directions of this research can facilitate the analysis of multivariate temporal data, both for domain users and visualization researchers. Full article
15 pages, 32770 KiB  
Data Descriptor
A Multi-Year Data Set of Beach-Foredune Topography and Environmental Forcing Conditions at Egmond aan Zee, The Netherlands
by Gerben Ruessink, Christian S. Schwarz, Timothy D. Price and Jasper J. A. Donker
Data 2019, 4(2), 73; https://doi.org/10.3390/data4020073 - 21 May 2019
Cited by 14 | Viewed by 5722
Abstract
Coastal dunes offer numerous functions to society, such as sea defense and recreation, and host unique habitats with high biodiversity. Research on coastal dune dynamics has traditionally focused on the erosional impact of short-duration (hours to days), high-wave storm events on the most seaward dune, called the foredune. In contrast, research data on its subsequent slow (months to years), wind-driven recovery are rather rare, yet essential to aid studying wind-driven processes, identifying the most relevant wind-forcing conditions, and testing and improving dune-growth models. The present data set contains 39 digital elevation models and 11 orthophotos of the beach-foredune system near Egmond aan Zee, The Netherlands. The novelty of the data set lies in the combination of long-term observations (6 years; January 2013 to January 2019), with high temporal (intervals of 2–4 months) and spatial resolution (1 × 1 m) covering an extensive spatial domain (1.4 km alongshore). The 25-m high foredune eroded substantially in October 2014, with a maximum recession of 75 m³/m, and subsequently recovered with a rate of approximately 15 m³/m/yr, although with substantial alongshore variability. The data set is supplemented with high-frequency time series of offshore wave, water level, and wind characteristics, as well as various annual subtidal cross-shore profiles, to facilitate its future application in coastal dune research. Full article
17 pages, 2311 KiB  
Data Descriptor
Climate Data to Undertake Hygrothermal and Whole Building Simulations Under Projected Climate Change Influences for 11 Canadian Cities
by Abhishek Gaur, Michael Lacasse and Marianne Armstrong
Data 2019, 4(2), 72; https://doi.org/10.3390/data4020072 - 21 May 2019
Cited by 66 | Viewed by 5791
Abstract
Buildings and homes in Canada will be exposed to unprecedented climatic conditions in the future as a consequence of global climate change. To improve the climate resiliency of existing and new buildings, it is important to evaluate their performance over current and projected future climates. Hygrothermal and whole building simulation models, which are important tools for assessing performance, require continuous climate records, at high temporal frequency, of a wide range of climate variables such as solar radiation, cloud cover, wind, humidity, rainfall, temperature, and snow cover. In this study, climate data that can be used to assess the performance of building envelopes under current and projected future climates, concurrent with 2 °C and 3.5 °C increases in global temperatures, are generated for 11 major Canadian cities. The datasets capture the internal variability of the climate as they comprise 15 realizations of the future climate generated by dynamically downscaling future projections from the CanESM2 global climate model and thereafter bias-correcting them with reference to observations. An assessment of the bias-corrected projections suggests, as a consequence of global warming, future increases in temperature and precipitation, and decreases in snow cover and wind speed, for all cities. Full article
19 pages, 6610 KiB  
Article
Isolation, Characterization, and Agent-Based Modeling of Mesenchymal Stem Cells in a Bio-construct for Myocardial Regeneration Scaffold Design
by Diana Victoria Ramírez López, María Isabel Melo Escobar, Carlos A. Peña-Reyes, Álvaro J. Rojas Arciniegas and Paola Andrea Neuta Arciniegas
Data 2019, 4(2), 71; https://doi.org/10.3390/data4020071 - 19 May 2019
Cited by 3 | Viewed by 3807
Abstract
Regenerative medicine involves methods to control and modify normal tissue repair processes. Polymer and cell constructs are under research to create tissue that replaces the affected area of cardiac tissue after myocardial infarction (MI). The aim of the present study is to evaluate the behavior of differentiated and undifferentiated mesenchymal stem cells (MSCs) in vitro and in silico and to compare the results that both offer when it comes to the design process of biodevices for the treatment of infarcted myocardium in biomodels. To assess in vitro behavior, MSCs are isolated from rat bone marrow and seeded, undifferentiated and differentiated, in multiple scaffolds of a gelled biomaterial. Subsequently, cell behavior is evaluated by trypan blue staining and fluorescence microscopy, which showed that the cells presented high viability and low cell migration in the biomaterial. An agent-based model intended to reproduce the behavior of individual MSCs as closely as possible was developed; the in vitro results are used to identify its parameters, and the model simulates the cellular-level processes of apoptosis, differentiation, proliferation, and migration. Based on the results obtained, the discussion presents suggestions for the design and fabrication of the proposed scaffolds and for how an agent-based model can be helpful for testing hypotheses. It is concluded that assessment of cell behavior through the observation of viability, proliferation, migration, inflammation reduction, and spatial composition in vitro and in silico represents an appropriate strategy for scaffold engineering. Full article
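To make the agent-based idea concrete, the following toy sketch advances a population of cell agents through some of the processes named in the abstract (apoptosis, proliferation, migration). The per-step probabilities are placeholders, not the parameters the authors identified from their in vitro data, and differentiation is omitted for brevity.

```python
import random

# Placeholder per-step probabilities; the study calibrates such parameters
# against the in vitro observations, which are not reproduced here.
P_APOPTOSIS, P_DIVIDE, P_MOVE = 0.01, 0.05, 0.10

def step(cells):
    """Advance every cell agent (stored as an (x, y) grid position) by one step."""
    new_cells = []
    for x, y in cells:
        if random.random() < P_APOPTOSIS:
            continue                                        # cell dies
        if random.random() < P_MOVE:
            x, y = x + random.choice((-1, 0, 1)), y + random.choice((-1, 0, 1))
        new_cells.append((x, y))
        if random.random() < P_DIVIDE:
            new_cells.append((x, y))                        # daughter cell at the same site
    return new_cells

cells = [(0, 0)] * 100                                      # initially seeded cells
for _ in range(50):
    cells = step(cells)
print(len(cells), "cells after 50 steps")
```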
8 pages, 3366 KiB  
Article
A Business Rules Management System for Fixed Assets
by Sabina-Cristiana Necula
Data 2019, 4(2), 70; https://doi.org/10.3390/data4020070 - 17 May 2019
Viewed by 4527
Abstract
The goal of this paper is to discuss the necessity of separating decision rules from the domain model implementation. (1) Background: can rules help to discover hidden connections between data? We propose a separate implementation of decision rules on data about fixed assets for decision support, which will enhance search results. (2) Methods and technical workflow: we used DROOLS (Decision Rules Object Oriented System) to implement decision rules on the subject of accounting decisions on fixed assets. (3) Results: building the model involves the existence of a domain ontology and an ontology for the developed application; the possibility of executing specified inferences; the possibility of extracting information from a database; the possibility of simulations and predictions; and the possibility of addressing fuzzy questions. (4) Conclusions: the rules, the plans, and the business models must be implemented to allow specification of control over concepts. The editing of meta-models must be directed to the user to ensure adaptation, rather than implemented at the level of data control. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)
37 pages, 6301 KiB  
Data Descriptor
Exploration of Youth’s Digital Competencies: A Dataset in the Educational Context of Vietnam
by Anh-Vinh Le, Duc-Lan Do, Duc-Quang Pham, Phuong-Hanh Hoang, Thu-Huong Duong, Hoai-Nam Nguyen, Thu-Trang Vuong, Hong-Kong T. Nguyen, Manh-Toan Ho, Viet-Phuong La and Quan-Hoang Vuong
Data 2019, 4(2), 69; https://doi.org/10.3390/data4020069 - 14 May 2019
Cited by 19 | Viewed by 8793
Abstract
The recent surge of the Fourth Industrial Revolution has set forth demands for a new generation of the labor force with a comprehensive set of skills to meet the standards of the global market. Despite widespread concerns about educational reforms and renovations to enhance the workforce capacity in terms of information and communication technology (ICT) skills, research into the digital proficiencies of students has been limited in Vietnam. This dataset contains 1061 observations on the digital competency level of 10th-grade students in 20 surveyed schools from five provinces in Vietnam. The investigation, joining frequentist and Bayesian analyses, aims to provide valuable insights into the current state of children’s attitudes, behaviors, competency levels, and use of ICT within the Vietnamese educational context. The value of the dataset lies in its proposed scientific framework, which can be replicated in multiple regions and contexts, as well as in the feasibility of categorical regression techniques together with Bayesian statistics for hierarchical regression analysis. Full article
7 pages, 3087 KiB  
Data Descriptor
Stem-Maps of Forest Restoration Cuttings in Pinus ponderosa-Dominated Forests in the Interior West, USA
by Justin P. Ziegler, Chad M. Hoffman, Mike A. Battaglia and William Mell
Data 2019, 4(2), 68; https://doi.org/10.3390/data4020068 - 14 May 2019
Cited by 2 | Viewed by 3805
Abstract
Stem-maps, maps of tree locations with optional associated measurements, are increasingly being used for ecological study in forest and plant sciences. Analyses of stem-map data have led to greater scientific understanding and improved forest management. However, availability of these data for reuse remains limited. We present a description of eight 4-ha stem-maps used in four prior research studies. These stem-maps contain locations and associated measurements of residual trees and stumps measured after forest restoration cuttings in Colorado, Arizona, and New Mexico. Data are published in two file formats to facilitate reuse. Full article
(This article belongs to the Special Issue Forest Monitoring Systems and Assessments at Multiple Scales)
4 pages, 1324 KiB  
Data Descriptor
Point of Sale (POS) Data from a Supermarket: Transactions and Cashier Operations
by Tomasz Antczak and Rafał Weron
Data 2019, 4(2), 67; https://doi.org/10.3390/data4020067 - 11 May 2019
Cited by 7 | Viewed by 18065
Abstract
As queues in supermarkets seem to be inevitable, researchers try to find solutions that can improve and speed up the checkout process. This, however, requires access to real-world data for developing and validating models. With this objective in mind, we have prepared and made publicly available high-frequency datasets containing nearly six weeks of actual transactions and cashier operations from a grocery supermarket belonging to one of the major European retail chains. This dataset can provide insights on how the intensity and duration of checkout operations changes throughout the day and week. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)
20 pages, 5156 KiB  
Data Descriptor
Agro-Climatic Data by County: A Spatially and Temporally Consistent U.S. Dataset for Agricultural Yields, Weather and Soils
by Seong Do Yun and Benjamin M. Gramig
Data 2019, 4(2), 66; https://doi.org/10.3390/data4020066 - 8 May 2019
Cited by 11 | Viewed by 8017
Abstract
Agro-climatic data by county (ACDC) is designed to provide the major agro-climatic variables from publicly available spatial data sources to diverse end-users. ACDC provides USDA NASS annual (1981–2015) crop yields for corn, soybeans, upland cotton and winter wheat by county. Customizable growing degree days for 1 °C intervals between −60 °C and +60 °C, and total precipitation for two different crop growing seasons from the PRISM weather data are included. Soil characteristic data from USDA-NRCS gSSURGO are also provided for each county in the 48 contiguous US states. All weather and soil data are processed to include only data for land being used for non-forestry agricultural uses based on the USGS NLCD land cover/land use data. This paper explains the numerical and geo-computational methods and data generating processes employed to create ACDC from the original data sources. Essential considerations for data management and use are discussed, including the use of the agricultural mask, spatial aggregation and disaggregation, and the computational requirements for working with the raw data sources. Full article
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)
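A common textbook form of the growing-degree-day aggregation that underlies variables like those in ACDC is sketched below. The base temperature, season length, and temperature series are hypothetical, and ACDC's exact interval-based aggregation may differ.

```python
import numpy as np

def growing_degree_days(tmax, tmin, base_c):
    """Season total of daily growing degree days above a base temperature,
    using the common (Tmax + Tmin)/2 - Tbase form truncated at zero."""
    daily_mean = (np.asarray(tmax) + np.asarray(tmin)) / 2.0
    return float(np.sum(np.maximum(daily_mean - base_c, 0.0)))

# Hypothetical daily temperatures (deg C) for one 180-day growing season.
rng = np.random.default_rng(3)
tmax = rng.normal(28, 4, size=180)
tmin = tmax - rng.uniform(5, 12, size=180)
print(growing_degree_days(tmax, tmin, base_c=10))
```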
18 pages, 307 KiB  
Article
Predictive Models of Student College Commitment Decisions Using Machine Learning
by Kanadpriya Basu, Treena Basu, Ron Buckmire and Nishu Lal
Data 2019, 4(2), 65; https://doi.org/10.3390/data4020065 - 8 May 2019
Cited by 22 | Viewed by 15644
Abstract
Every year, academic institutions invest considerable effort and substantial resources to influence, predict and understand the decision-making choices of applicants who have been offered admission. In this study, we applied several supervised machine learning techniques to four years of data on 11,001 students, each with 35 associated features, admitted to a small liberal arts college in California to predict student college commitment decisions. By treating the question of whether a student offered admission will accept it as a binary classification problem, we implemented a number of different classifiers and then evaluated the performance of these algorithms using the metrics of accuracy, precision, recall, F-measure, and area under the receiver operating characteristic (ROC) curve. The results from this study indicate that the logistic regression classifier performed best in modeling the student college commitment decision problem, i.e., predicting whether a student will accept an admission offer, with an AUC score of 79.6%. The significance of this research is that it demonstrates that many institutions could use machine learning algorithms to improve the accuracy of their estimates of entering class sizes, thus allowing more optimal allocation of resources and better control over net tuition revenue. Full article
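A minimal version of this binary-classification setup, a logistic regression scored by area under the ROC curve on a held-out split, is sketched below with synthetic applicant features. The 35-feature shape mirrors the description in the abstract, but the data and the resulting AUC are not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for admitted applicants: 35 features per student,
# label 1 = accepted the admission offer.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 35))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```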
11 pages, 554 KiB  
Data Descriptor
Lateral Root and Nodule Transcriptomes of Soybean
by Sajag Adhikari, Suresh Damodaran and Senthil Subramanian
Data 2019, 4(2), 64; https://doi.org/10.3390/data4020064 - 8 May 2019
Cited by 5 | Viewed by 5018
Abstract
Symbiotic legume nodules and lateral roots arise away from the root meristem via dedifferentiation events. While these organs share some morphological and developmental similarities, whether legume nodules are modified lateral roots is an open question. We dissected emerging nodules, mature nodules, emerging lateral roots and young lateral roots, and constructed strand-specific RNA sequencing (RNAseq) libraries using polyA-enriched RNA preparations. Root sections above and below these organs, devoid of any lateral organs, were used to construct respective control tissue libraries. High sequence quality, predominant mapping to coding sequences, and consistency between replicates indicated that the RNAseq libraries were of a very high quality. We identified genes enriched in emerging nodules, mature nodules, emerging lateral roots and young lateral roots in soybean by comparing global gene expression profiles between each of these organs and adjacent root segments. Potential uses for this high quality transcriptome data set include generation of global gene regulatory networks to identify key regulators; metabolic pathway analyses and comparative analyses of key gene families to discover organ-specific biological processes; and identification of organ-specific alternate spliced transcripts. When combined with other similar datasets, especially from leguminous plants, these analyses can help answer questions on the evolutionary origins of root nodules and relationships between the development of different plant lateral organs. Full article
2 pages, 144 KiB  
Editorial
Semantics in the Deep: Semantic Analytics for Big Data
by Dimitrios Koutsomitropoulos, Spiridon Likothanassis and Panos Kalnis
Data 2019, 4(2), 63; https://doi.org/10.3390/data4020063 - 7 May 2019
Cited by 1 | Viewed by 3151
Abstract
One cannot help but classify the continuous birth and demise of Artificial Intelligence (AI) trends into the everlasting theme of the battle between connectionist and symbolic AI [...] Full article
(This article belongs to the Special Issue Semantics in the Deep: Semantic Analytics for Big Data)
5 pages, 1488 KiB  
Data Descriptor
Transcriptome Dataset of Leaf Tissue in Agave H11648
by Xing Huang, Li Xie, Thomas Gbokie, Jr., Jingen Xi and Kexian Yi
Data 2019, 4(2), 62; https://doi.org/10.3390/data4020062 - 6 May 2019
Cited by 4 | Viewed by 3350
Abstract
Sisal is widely cultivated in tropical areas for fiber production. The main sisal cultivar, Agave H11648 ((A. amaniensis × A. angustifolia) × A. amaniensis), has a relatively scarce molecular basis and no genomic information. Next-generation sequencing technology has offered a great opportunity for functional gene mining in Agave species. Several published Agave transcriptomes have already been reused for gene cloning and selection pressure analysis. There are also other potential uses of the published transcriptomes, such as meta-analysis, molecular marker detection, alternative splicing analysis, multi-omics analysis, genome assembly, weighted gene co-expression network analysis, expression quantitative trait loci analysis, and miRNA target site prediction. In order to make the best of our published transcriptome of the A. H11648 leaf, we here present a data descriptor, with the aim of expanding Agave bioinformation and benefiting Agave genetic research. Full article
10 pages, 901 KiB  
Data Descriptor
Seed Volume Dataset—An Ongoing Inventory of Seed Size Expressed by Volume
by Elsa Ganhão and Luís Silva Dias
Data 2019, 4(2), 61; https://doi.org/10.3390/data4020061 - 1 May 2019
Cited by 15 | Viewed by 5238
Abstract
This paper presents a dataset of seed volumes calculated from length, width, and when available, thickness, abstracted from printed literature—essentially scientific journals and books including Floras and illustrated manuals, from online inventories, and from data obtained directly by the authors or provided by colleagues. Seed volumes were determined from the linear dimensions of seeds using published equations and decision trees. Ways of characterizing species by seed volume were compared and the minimum volume of the seed was found to be preferable. The adequacy of seed volume as a surrogate for seed size was examined and validated using published data on the relationship between light requirements for seed germination and seed size expressed as mass. Full article
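For orientation only, the sketch below computes a seed volume from linear dimensions using a simple ellipsoid approximation. The dataset itself applies the published equations and decision trees cited in the descriptor, which this illustration does not reproduce, and the dimensions shown are hypothetical.

```python
import math

def ellipsoid_volume_mm3(length, width, thickness=None):
    """Seed volume (mm^3) from linear dimensions (mm) under an ellipsoid
    approximation, V = (pi/6) * L * W * T; when thickness is unreported,
    width is reused for the third axis. This is an illustrative assumption,
    not the equations and decision trees used to build the dataset."""
    thickness = width if thickness is None else thickness
    return math.pi / 6.0 * length * width * thickness

print(ellipsoid_volume_mm3(3.2, 1.5))        # hypothetical seed, thickness unknown
print(ellipsoid_volume_mm3(3.2, 1.5, 1.1))   # hypothetical seed with measured thickness
```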
17 pages, 1349 KiB  
Data Descriptor
BrainRun: A Behavioral Biometrics Dataset towards Continuous Implicit Authentication
by Michail D. Papamichail, Kyriakos C. Chatzidimitriou, Thomas Karanikiotis, Napoleon-Christos I. Oikonomou, Andreas L. Symeonidis and Sashi K. Saripalle
Data 2019, 4(2), 60; https://doi.org/10.3390/data4020060 - 1 May 2019
Cited by 29 | Viewed by 7261
Abstract
The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of the utmost importance. To that end, and given that traditional authentication techniques (passwords and/or unlock patterns) have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques consider continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro”, resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting the cognitive skills of individuals. BrainRun embeds a gesture-capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we constructed a dataset containing gestures and sensor data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository. Full article