Data, Volume 3, Issue 4 (December 2018) – 33 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
12 pages, 2579 KiB  
Article
Medi-Test: Generating Tests from Medical Reference Texts
by Ionuț Pistol, Diana Trandabăț and Mădălina Răschip
Data 2018, 3(4), 70; https://doi.org/10.3390/data3040070 - 19 Dec 2018
Cited by 4 | Viewed by 4495
Abstract
The Medi-test system we developed was motivated by the large number of resources available for the medical domain, as well as the number of tests needed in this field (during and after medical school) for evaluation, promotion, certification, etc. Generating questions to support learning and user interactivity has been an interesting and dynamic topic in NLP since the availability of e-book curricula and e-learning platforms. Current e-learning platforms offer increased support for student evaluation, with an emphasis on exploiting automation in both test generation and evaluation. In this context, our system is able to evaluate a student's academic performance in the medical domain. Using medical reference texts as input and supported by a specially designed medical ontology, Medi-test generates different types of questionnaires for the Romanian language. The evaluation includes four types of questions (multiple-choice, fill in the blanks, true/false, and match), can have customizable length and difficulty, and can be automatically graded. A recent extension of our system also allows for the generation of tests which include images. We evaluated our system with a local testing team, but also with a set of medicine students, and user satisfaction questionnaires showed that the system can be used to enhance learning.
(This article belongs to the Special Issue Curative Power of Medical Data)
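The paper does not publish its implementation; as a rough illustration of the "fill in the blanks" question type it describes, the Python sketch below blanks a known term out of a reference sentence. The MEDICAL_TERMS set is a hypothetical stand-in for the paper's medical ontology, and the selection logic is ours, not the authors'.

```python
import random

# Hypothetical mini "ontology": terms the generator may blank out.
# In Medi-test this role is played by a purpose-built medical ontology.
MEDICAL_TERMS = {"insulin", "pancreas", "glucose", "hormone"}

def make_fill_in_blank(sentence: str, rng: random.Random):
    """Turn a reference sentence into a (question, answer) pair by
    blanking one known medical term, if the sentence contains any."""
    words = sentence.split()
    candidates = [w.strip(".,;") for w in words
                  if w.strip(".,;").lower() in MEDICAL_TERMS]
    if not candidates:
        return None
    answer = rng.choice(candidates)
    question = " ".join("_____" if w.strip(".,;") == answer else w
                        for w in words)
    return question, answer

rng = random.Random(42)
print(make_fill_in_blank("Insulin is secreted by the pancreas.", rng))
```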
9 pages, 409 KiB  
Data Descriptor
Data for an Importance-Performance Analysis (IPA) of a Public Green Infrastructure and Urban Nature Space in Perth, Western Australia
by Greg D. Simpson and Jackie Parker
Data 2018, 3(4), 69; https://doi.org/10.3390/data3040069 - 17 Dec 2018
Cited by 17 | Viewed by 4868
Abstract
This Data Descriptor shares the dataset generated by a visitor satisfaction survey of users of a mixed-use public green infrastructure (PGI) space in Perth, Western Australia, that incorporates remnant and reintroduced urban nature (UN). Conducted in the Austral summer of 2016–2017, the survey (n = 393) utilized the technique of Importance-Performance Analysis (IPA) to elucidate the perceptions of PGI users regarding the performance of the amenity and facilities at the study site. There is a growing body of research that reports the innate affinity of humans to natural systems and living things. As humankind has grown exponentially over the past 50 years, humanity has been living an increasingly urbanized lifestyle, and the resulting spreading urban footprints and increased population densities are causing humans to become increasingly disconnected from nature. These conflicting phenomena are driving research to understand the contribution that PGI and UN can make to enhancing the quality of life of urban residents. With diminishing opportunities to acquire or create new PGI spaces within ever-more-densely populated urban centers, understanding, efficiently managing, and continuously improving existing PGI spaces is crucial to accessing the benefits and services that PGI and UN provide. The IPA technique can provide the data necessary to inform an evidence-based approach to managing and resourcing PGI and UN spaces.
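The IPA technique lends itself to a compact worked example. The sketch below computes quadrant placements from per-attribute importance and performance means; the attribute names and ratings are invented, not drawn from the survey dataset.

```python
# A minimal Importance-Performance Analysis (IPA) sketch with made-up
# 1-5 Likert-style scores; the real dataset covers a Perth PGI space.
ratings = {
    # attribute: (importance scores, performance scores) (hypothetical)
    "shade": ([5, 4, 5], [2, 3, 2]),
    "paths": ([4, 4, 3], [4, 5, 4]),
}

def mean(xs):
    return sum(xs) / len(xs)

# Grand means act as the quadrant crosshairs.
imp_mid = mean([mean(i) for i, _ in ratings.values()])
perf_mid = mean([mean(p) for _, p in ratings.values()])

for name, (imp, perf) in ratings.items():
    quadrant = (
        "concentrate here" if mean(imp) >= imp_mid and mean(perf) < perf_mid else
        "keep up the good work" if mean(imp) >= imp_mid else
        "low priority" if mean(perf) < perf_mid else
        "possible overkill"
    )
    print(f"{name}: importance={mean(imp):.2f}, "
          f"performance={mean(perf):.2f} -> {quadrant}")
```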
14 pages, 2770 KiB  
Article
Continuous Genetic Algorithms as Intelligent Assistance for Resource Distribution in Logistic Systems
by Łukasz Wieczorek and Przemysław Ignaciuk
Data 2018, 3(4), 68; https://doi.org/10.3390/data3040068 - 16 Dec 2018
Cited by 8 | Viewed by 3719
Abstract
This paper addresses the problem of resource distribution control in logistic systems influenced by uncertain demand. The considered class of logistic topologies comprises two types of actors—controlled nodes and external sources—interconnected without any structural restrictions. In this paper, the application of continuous-domain genetic algorithms (GAs) is proposed in order to support the optimization of resource reflow in the network channels. GAs allow one to perform simulation-based optimization and provide desirable operating conditions in the face of a priori unknown, time-varying demand. The effectiveness of the inventory management process governed by an order-up-to policy is measured against two objectives—holding costs and service level. Using the network analytical model with the inventory management policy implemented in a centralized way, GAs search a space of candidate solutions to find optimal policy parameters for a given topology. Numerical experiments confirm the analytical assumptions.
(This article belongs to the Special Issue Data Stream Mining and Processing)
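As a rough sketch of this approach, the following minimal continuous-domain GA uses blend crossover and Gaussian mutation, with candidates scored by a noisy toy demand simulation. The demand distribution, cost weights, and bounds are invented stand-ins for the paper's network model.

```python
import random

rng = random.Random(0)

def cost(levels, demand_trials=200):
    """Toy simulation-based objective: holding cost plus a penalty for
    unmet demand under uncertain demand. The evaluation is noisy, as in
    any simulation-based optimization."""
    total = 0.0
    for _ in range(demand_trials):
        demand = [rng.uniform(20, 80) for _ in levels]
        hold = sum(max(s - d, 0) for s, d in zip(levels, demand))
        shortage = sum(max(d - s, 0) for s, d in zip(levels, demand))
        total += 1.0 * hold + 10.0 * shortage   # hypothetical weights
    return total / demand_trials

def continuous_ga(n_nodes=3, pop=30, gens=40, lo=0.0, hi=120.0):
    population = [[rng.uniform(lo, hi) for _ in range(n_nodes)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            w = rng.random()                      # blend (arithmetic) crossover
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            if rng.random() < 0.2:                # Gaussian mutation
                i = rng.randrange(n_nodes)
                child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 5)))
            children.append(child)
        population = parents + children
    return min(population, key=cost)

print([round(s, 1) for s in continuous_ga()])     # order-up-to levels
```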
19 pages, 4268 KiB  
Article
Congestion Adaptive Traffic Light Control and Notification Architecture Using Google Maps APIs
by Sumit Mishra, Devanjan Bhattacharya and Ankit Gupta
Data 2018, 3(4), 67; https://doi.org/10.3390/data3040067 - 14 Dec 2018
Cited by 23 | Viewed by 11387
Abstract
Traffic jams can be avoided by controlling traffic signals according to quickly building congestion with steep gradients on short temporal and small spatial scales. With the rising standards of computational technology, single-board computers, software packages, platforms, and APIs (Application Program Interfaces), it has become relatively easy for developers to create systems for controlling signals and informative systems. Hence, to enhance the power of Intelligent Transport Systems in automotive telematics, in this study we used crowdsourced traffic congestion data from Google to adjust traffic light cycle times with a system that adapts to congestion. One aim of the proposed system is to inform drivers about the status of the upcoming traffic light on their route. Since crowdsourced data are used, the system does not entail the high infrastructure cost associated with sensing networks. A full system module-level analysis is presented for implementation. The proposed system is fail-safe against temporary communication failures. Along with a case study examining congestion levels, generic information processing for the cycle-time decision and status delivery system was tested and confirmed to be viable and quick for a restricted prototype model. The information required was delivered correctly over sustained trials, with an average time delay of 1.5 s and a maximum of 3 s.
(This article belongs to the Special Issue Big Data Challenges in Smart Cities)
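The core control idea, allotting green time in proportion to observed congestion, can be sketched as follows. The `fetch_congestion` helper is a hypothetical placeholder for a crowdsourced-data lookup (the paper uses Google Maps APIs, whose calls are not reproduced here), and the proportional-split rule is illustrative, not the article's exact algorithm.

```python
def fetch_congestion(approach_id: str) -> float:
    """Hypothetical stand-in for querying crowdsourced congestion data;
    returns a congestion level in [0, 1] for one intersection approach."""
    sample = {"north": 0.8, "south": 0.3, "east": 0.5, "west": 0.2}
    return sample[approach_id]

def green_times(approaches, cycle_s=120, min_green_s=10):
    """Split one signal cycle among approaches in proportion to congestion,
    guaranteeing each approach a minimum green time."""
    levels = {a: fetch_congestion(a) for a in approaches}
    total = sum(levels.values()) or 1.0
    spare = cycle_s - min_green_s * len(approaches)
    return {a: round(min_green_s + spare * lvl / total)
            for a, lvl in levels.items()}

print(green_times(["north", "south", "east", "west"]))
```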
9 pages, 1786 KiB  
Article
Similar Text Fragments Extraction for Identifying Common Wikipedia Communities
by Svitlana Petrasova, Nina Khairova, Włodzimierz Lewoniewski, Orken Mamyrbayev and Kuralay Mukhsina
Data 2018, 3(4), 66; https://doi.org/10.3390/data3040066 - 13 Dec 2018
Cited by 3 | Viewed by 3979
Abstract
Similar text fragment extraction from weakly formalized data is a task of natural language processing and intelligent data analysis, used for solving the problem of automatic identification of connected knowledge fields. In order to search for such common communities in Wikipedia, we propose using a logical-algebraic model for similar collocation extraction as an additional stage. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indication of key common, up-to-date Wikipedia communities.
(This article belongs to the Special Issue Data Stream Mining and Processing)
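A minimal sketch of the synonym test underlying such collocation matching, using NLTK's WordNet interface, is shown below. The logical-algebraic model and the Stanford-parser grammatical checks are not reproduced, and the example collocations are invented.

```python
# Requires: pip install nltk, then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def synonymous(word_a: str, word_b: str) -> bool:
    """Two words count as synonymous if they share any WordNet synset."""
    if word_a == word_b:
        return True
    sets_a = set(wn.synsets(word_a))
    return any(s in sets_a for s in wn.synsets(word_b))

def similar_collocation(colloc_a, colloc_b):
    """Collocations match if they align word-by-word through synonyms.
    The paper additionally checks grammatical characteristics obtained
    with the Stanford POS tagger and dependency parser (omitted here)."""
    return len(colloc_a) == len(colloc_b) and all(
        synonymous(a, b) for a, b in zip(colloc_a, colloc_b))

print(similar_collocation(["large", "community"], ["big", "community"]))
```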
5 pages, 179 KiB  
Data Descriptor
Effect of Heat-Producing Needling Technique on the Local Skin Temperature: Clinical Dataset
by Zehuan Liao and Yan Zhao
Data 2018, 3(4), 65; https://doi.org/10.3390/data3040065 - 12 Dec 2018
Cited by 5 | Viewed by 3508
Abstract
The heat-producing needling technique is a special compound manipulation procedure of the acupuncture needle that has been recorded in ancient TCM literature as producing a warm sensation in the body. This randomized, subject-blinded clinical study was performed to examine the effect of heat-producing acupuncture treatment on the local skin temperature at ST36. A total of 30 healthy participants received four successive sessions of heat-producing acupuncture treatment, non-acupoint heat-producing acupuncture treatment, normal stable acupuncture treatment, and non-invasive sham acupuncture treatment at the ST36 acupoint in a random sequence. Within each session, the local ST36 skin temperature and basal body temperature of each participant were measured at 1 min before needle insertion, just after needle insertion and manipulation (if any), 5 min after needle insertion with needle removal immediately after temperature taking, and 5 min after needle removal. Furthermore, the participants were also required to rate the needling and heat sensations felt during the acupuncture needling treatment period on a visual analogue scale from 1 to 10 immediately after each treatment session. This data descriptor presents all the clinical data obtained in the above-mentioned study.
17 pages, 5506 KiB  
Data Descriptor
Short Baseline Observations at Geodetic Observatory Wettzell
by Apurva Phogat, Gerhard Kronschnabl, Christian Plötz, Walter Schwarz and Torben Schüler
Data 2018, 3(4), 64; https://doi.org/10.3390/data3040064 - 10 Dec 2018
Cited by 1 | Viewed by 4272
Abstract
The Geodetic Observatory Wettzell (GOW), jointly operated by the Federal Agency for Cartography and Geodesy (BKG), Germany and the Technical University of Munich, Germany, is equipped with three radio telescopes for Very Long Baseline Interferometry (VLBI). Correlation capability is primarily designed for relative positioning of the three Wettzell radio telescopes, i.e., to derive the local ties between the three telescopes from VLBI raw data in addition to the conventional terrestrial surveys. A computing cluster forming the GO Wettzell Local Correlator (GOWL) was installed in 2017, together with the Distributed FX (DiFX) software correlation package and the Haystack Observatory Postprocessing System (HOPS) for fringe fitting and postprocessing of the output. Data pre-processing includes ambiguity resolution (if necessary) as well as the generation of the geodetic database and NGS card files with νSolve. The final analysis is carried out either with local processing software (LEVIKA short baseline analysis) or with the Vienna VLBI and Satellite (VieVS) software. We present an overview of the scheduling, correlation, and analysis capabilities at GOW and the results obtained so far. The dataset includes auxiliary files (schedule and log files) which contain information about the participating antennas, observed sources, clock offset between formatter and GPS time, cable delay, and meteorological parameters (temperature, barometric pressure, and relative humidity), as well as ASCII files created after fringe fitting and final analysis. The published dataset can be used by researchers and scientists to further explore short baseline interferometry.
(This article belongs to the Special Issue Data in Astrophysics & Geophysics: Research and Applications)
10 pages, 892 KiB  
Article
The Extended Multidimensional Neo-Fuzzy System and Its Fast Learning in Pattern Recognition Tasks
by Yevgeniy Bodyanskiy, Nonna Kulishova and Olha Chala
Data 2018, 3(4), 63; https://doi.org/10.3390/data3040063 - 9 Dec 2018
Cited by 9 | Viewed by 2768
Abstract
Methods of machine learning and data mining are becoming the cornerstone of information technologies, with real-time image and video recognition methods getting more and more attention. While computational system architectures are getting larger and more complex, their learning methods call for changes, as training datasets often reach tens and hundreds of thousands of samples, thereby increasing the learning time of such systems. It is possible to reduce computational costs by tuning the system structure to allow fast, high-accuracy learning algorithms to be applied. This paper proposes a system based on extended multidimensional neo-fuzzy units and its learning algorithm designed for data stream processing tasks. The proposed learning algorithm, based on the information entropy criterion, significantly improves the system's approximating capabilities. Experiments have confirmed the efficiency of the proposed system in solving real-time video stream recognition tasks.
(This article belongs to the Special Issue Data Stream Mining and Processing)
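For readers unfamiliar with the building block, the sketch below implements a plain (non-extended, single-output) neo-fuzzy unit: each input feeds a set of triangular membership functions whose weighted sum forms the output. The gradient training loop is a generic stand-in, not the paper's fast entropy-based algorithm.

```python
def tri(x, centers):
    """Triangular membership degrees over a uniform partition of [0, 1]."""
    step = centers[1] - centers[0]
    return [max(0.0, 1.0 - abs(x - c) / step) for c in centers]

class NeoFuzzyUnit:
    def __init__(self, n_inputs, n_mfs=5):
        self.centers = [j / (n_mfs - 1) for j in range(n_mfs)]
        self.w = [[0.0] * n_mfs for _ in range(n_inputs)]

    def forward(self, x):
        # Output is the sum over inputs of membership-weighted synapses.
        return sum(m * w for xi, ws in zip(x, self.w)
                   for m, w in zip(tri(xi, self.centers), ws))

    def train_step(self, x, target, lr=0.1):
        # Plain gradient step on squared error (generic, for illustration).
        err = self.forward(x) - target
        for i, xi in enumerate(x):
            for j, m in enumerate(tri(xi, self.centers)):
                self.w[i][j] -= lr * err * m

unit = NeoFuzzyUnit(n_inputs=2)
for _ in range(200):                       # learn y = 0.5*(x0 + x1)
    for x in ([0.1, 0.9], [0.4, 0.2], [0.8, 0.6]):
        unit.train_step(x, 0.5 * (x[0] + x[1]))
print(round(unit.forward([0.3, 0.7]), 2))  # close to 0.5 for this target
```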
14 pages, 3186 KiB  
Article
A Novel Neuro-Fuzzy Model for Multivariate Time-Series Prediction
by Alexander Vlasenko, Nataliia Vlasenko, Olena Vynokurova and Dmytro Peleshko
Data 2018, 3(4), 62; https://doi.org/10.3390/data3040062 - 8 Dec 2018
Cited by 7 | Viewed by 3490
Abstract
Time series forecasting can be a complicated problem when the underlying process shows a high degree of complex nonlinear behavior. In some domains, such as financial data, processing related time series jointly can have significant benefits. This paper proposes a novel multivariate hybrid neuro-fuzzy model for forecasting tasks, which is based on, and generalizes, the neuro-fuzzy model with a consequent layer of multi-variable Gaussian units and its learning algorithm. The model is distinguished by a separate consequent block for each output, which is tuned with respect to its own output error only, but benefits from extracting additional information by processing the whole input vector, including lag values of the other variables. Numerical experiments show better accuracy and computational performance than competing models and than separate neuro-fuzzy models for each output, and thus an ability to implicitly handle complex cross-correlation dependencies between variables.
(This article belongs to the Special Issue Data Stream Mining and Processing)
12 pages, 3174 KiB  
Data Descriptor
SolarView: Georgia Solar Adoption in Context
by Jacqueline Hettel Tidwell, Abraham Tidwell, Steffan Nelson and Marcus Hill
Data 2018, 3(4), 61; https://doi.org/10.3390/data3040061 - 7 Dec 2018
Cited by 3 | Viewed by 3517
Abstract
The local-national gap, that is, disparities in the adoption of innovations and policies on a local level in response to national policy implementation, is a problem currently plaguing the adoption of emerging technologies targeted at resolving energy transition issues. These disparities reflect a complex system of technical, economic, social, political, and ecological factors linked to the perceptions held by communities and how they see energy development and national/global policy goals. This dataset is an attempt to bridge the local-national gap regarding solar PV adoption in the State of Georgia (U.S.) by aggregating variables from seven different publicly available sources. The objective of this activity was to design a resource that would help researchers interested in the context underlying solar adoption on the local scale of governance (e.g., the county level). The SolarView database includes information necessary for informing policy-making activities, such as solar installation information, a historical county zip code directory, county-level census data, housing value indexes, renewable energy incentive totals, PV rooftop suitability percentages, and utility rates. As this is a database drawn from multiple sources, incomplete data entries are noted.
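Aggregating county-level variables from heterogeneous public sources is a standard pandas task, and a sketch of the general shape of such a merge follows. The file names and column names are hypothetical placeholders, not the SolarView schema; the outer joins preserve counties with incomplete records, mirroring the noted incomplete entries.

```python
import pandas as pd

installs = pd.read_csv("solar_installations.csv")  # columns: county, kw
census = pd.read_csv("county_census.csv")          # columns: county, population
rates = pd.read_csv("utility_rates.csv")           # columns: county, cents_per_kwh

by_county = (
    installs.groupby("county", as_index=False)["kw"].sum()
    .merge(census, on="county", how="outer")       # outer joins keep counties
    .merge(rates, on="county", how="outer")        # with incomplete records
)
by_county["kw_per_capita"] = by_county["kw"] / by_county["population"]
print(by_county.head())
```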
21 pages, 9752 KiB  
Article
Using Recurrent Procedures in Adaptive Control System for Identify the Model Parameters of the Moving Vessel on the Cross Slipway
by Hanna Rudakova, Oksana Polyvoda and Anton Omelchuk
Data 2018, 3(4), 60; https://doi.org/10.3390/data3040060 - 7 Dec 2018
Cited by 3 | Viewed by 4321
Abstract
The article analyses the problems connected with ensuring the coordinated operation of slipway drives that arise during the launch of a ship. A dynamic load model of the electric drive of the ship's cart is obtained, taking into account the peculiarities of the construction of the ship-lifting complex, which allows us to analyse the influence of external factors and random disturbances during the entire process of launching the ship. A linearized mathematical model of the dynamics of complex vessel movement during descent is developed in the space of states, which allows us to identify the mode of operation of the multi-drive system, taking into account its structure. The efficiency of recurrent identification methods (stochastic approximation and least squares) for estimating the linearized state-space model parameters is analysed. A decision support system has been developed within the automated operational control system, with a module for estimating the situation and synthesizing control, to ensure coherent motion of the complex ship-carts object in a two-phase environment.
(This article belongs to the Special Issue Data Stream Mining and Processing)
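Of the two recurrent identification procedures compared in the article, recursive least squares is the more standard; a textbook implementation, fitted here to an invented two-parameter regression rather than the vessel model, looks like this:

```python
# Textbook recursive least squares (RLS), fitted to y = 2*x1 - 3*x2 + noise.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                  # parameter estimate
P = np.eye(2) * 1000.0               # covariance (large: uninformative prior)

for _ in range(500):
    phi = rng.uniform(-1, 1, size=2)             # regressor vector
    y = 2.0 * phi[0] - 3.0 * phi[1] + rng.normal(0, 0.05)
    k = P @ phi / (1.0 + phi @ P @ phi)          # gain
    theta = theta + k * (y - phi @ theta)        # innovation update
    P = P - np.outer(k, phi @ P)                 # covariance update

print(np.round(theta, 3))                        # ~[ 2. -3.]
```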
19 pages, 1565 KiB  
Article
Real-Time Fuzzy Data Processing Based on a Computational Library of Analytic Models
by Yuriy Kondratenko and Nina Kondratenko
Data 2018, 3(4), 59; https://doi.org/10.3390/data3040059 - 4 Dec 2018
Cited by 6 | Viewed by 3087
Abstract
This work focuses on fuzzy data processing in control and decision-making systems based on the transformation of real time-series and high-frequency data into fuzzy sets, with further implementation of diverse fuzzy arithmetic operations. Special attention was paid to the synthesis of a computational library of horizontal and vertical analytic models for the fuzzy sets that result from fuzzy arithmetic operations. Use of the developed computational library makes it possible to increase the operating speed and accuracy of fuzzy data processing in real time. The library was formed for fuzzy arithmetic operations such as the fuzzy maximum, with the fuzzy sets involved chosen as triangular fuzzy numbers. The analytic models were developed based on the analysis of the intersection points between the left and right branches of the considered triangular fuzzy numbers under different relations between their parameters. Our study introduces a mask for evaluating the relations between the corresponding parameters of the fuzzy numbers, which allows the appropriate model to be selected from the computational library automatically. The simulation results confirm the efficiency of the proposed computational library for different applications.
(This article belongs to the Special Issue Data Stream Mining and Processing)
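The operation at the heart of the library can be illustrated numerically. The sketch below evaluates the fuzzy maximum of two triangular fuzzy numbers level by level via α-cuts; the article's contribution is closed-form analytic models that avoid exactly this pointwise evaluation.

```python
# Numeric alpha-cut sketch of the fuzzy maximum of triangular fuzzy
# numbers given as (a, m, b): left foot, peak, right foot.
def alpha_cut(tfn, alpha):
    a, m, b = tfn
    return a + alpha * (m - a), b - alpha * (b - m)

def fuzzy_max(tfn1, tfn2, levels=5):
    """Interval maximum at each membership level alpha."""
    cuts = []
    for i in range(levels + 1):
        alpha = i / levels
        lo1, hi1 = alpha_cut(tfn1, alpha)
        lo2, hi2 = alpha_cut(tfn2, alpha)
        cuts.append((alpha, max(lo1, lo2), max(hi1, hi2)))
    return cuts

for alpha, lo, hi in fuzzy_max((1, 3, 5), (2, 3.5, 4)):
    print(f"alpha={alpha:.1f}: [{lo:.2f}, {hi:.2f}]")
```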
15 pages, 1910 KiB  
Article
Multi-Agent Big-Data Lambda Architecture Model for E-Commerce Analytics
by Gautam Pal, Gangmin Li and Katie Atkinson
Data 2018, 3(4), 58; https://doi.org/10.3390/data3040058 - 1 Dec 2018
Cited by 14 | Viewed by 6801
Abstract
We study a big-data hybrid-data-processing lambda architecture, which consolidates low-latency real-time frameworks with high-throughput Hadoop batch frameworks over a massively distributed setup. In particular, the real-time and batch-processing engines act as autonomous multi-agent systems in collaboration. We propose a Multi-Agent Lambda Architecture (MALA) for e-commerce data analytics. We address the high-latency problem of Hadoop MapReduce jobs by processing requests that require a quick turnaround at the speed layer. At the same time, the batch layer in parallel provides comprehensive coverage of data by intelligent blending of stream and historical data through the weighted voting method. The cold-start problem of streaming services is addressed through the initial offset from historical batch data. The challenge of high-velocity data ingestion is resolved with distributed message queues. A proposed multi-agent decision-maker component is placed at the MALA stack as the gateway of the data pipeline. We prove the efficiency of our batch model by implementing an array of features for an e-commerce site. The novelty of the model and its key significance is a scheme for multi-agent interaction between batch and real-time agents to produce deeper insights at low latency and at significantly lower costs. Hence, the proposed system is highly appealing for applications involving big data and caters to high-velocity streaming ingestion and a massive data pool.
(This article belongs to the Special Issue Data Stream Mining and Processing)
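The weighted-voting blend of batch and speed layers reduces to a simple combination rule. The toy sketch below shows only the shape of the idea; the scores, item IDs, and the 0.7/0.3 weights are invented placeholders, not MALA's actual components.

```python
def batch_score(item_id: str) -> float:
    """Stand-in for a high-latency, comprehensive batch-layer score."""
    return {"item-1": 0.80, "item-2": 0.40}.get(item_id, 0.0)

def speed_score(item_id: str) -> float:
    """Stand-in for a low-latency score from the streaming layer."""
    return {"item-1": 0.20, "item-2": 0.90}.get(item_id, 0.0)

def blended_score(item_id, w_batch=0.7, w_speed=0.3):
    # Weighted voting: historical coverage dominates while stream data
    # keeps the result fresh; the weights could shift as batch data ages.
    return w_batch * batch_score(item_id) + w_speed * speed_score(item_id)

for item in ("item-1", "item-2"):
    print(item, round(blended_score(item), 2))
```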
21 pages, 835 KiB  
Article
Television Rating Control in the Multichannel Environment Using Trend Fuzzy Knowledge Bases and Monitoring Results
by Olexiy Azarov, Leonid Krupelnitsky and Hanna Rakytyanska
Data 2018, 3(4), 57; https://doi.org/10.3390/data3040057 - 1 Dec 2018
Cited by 2 | Viewed by 2936
Abstract
The purpose of this study is to control the ratio of programs of different genres when forming the broadcast grid, in order to increase and maintain the rating of a channel. In the multichannel environment, television rating control consists of selecting content whose ratings are completely restored after advertising. The hybrid approach to rule set refinement based on fuzzy relational calculus simplifies the construction of expert recommendation systems. By analogy with the problem of inverted pendulum control, the managerial actions aim to retain the balance between fuzzy demand and supply. The increase or decrease trends of demand and supply are described by primary fuzzy relations. The rule-based solutions of fuzzy relational equations connect significance measures of the primary fuzzy terms. Program set refinement by solving fuzzy relational equations makes it possible to avoid content-based selective filtering procedures. The generated solution set corresponds to the granulation of television time, where each solution represents a time slot and the granulated rating of the content. In automated media planning, generating the weekly TV program in the form of a granular solution reduces the time needed to program the channel broadcast grid.
(This article belongs to the Special Issue Data Stream Mining and Processing)
14 pages, 764 KiB  
Article
Russian–German Astroparticle Data Life Cycle Initiative
by Igor Bychkov, Andrey Demichev, Julia Dubenskaya, Oleg Fedorov, Andreas Haungs, Andreas Heiss, Donghwa Kang, Yulia Kazarina, Elena Korosteleva, Dmitriy Kostunin, Alexander Kryukov, Andrey Mikhailov, Minh-Duc Nguyen, Stanislav Polyakov, Evgeny Postnikov, Alexey Shigarov, Dmitry Shipilov, Achim Streit, Victoria Tokareva, Doris Wochele, Jürgen Wochele and Dmitry Zhurov
Data 2018, 3(4), 56; https://doi.org/10.3390/data3040056 - 28 Nov 2018
Cited by 29 | Viewed by 4885
Abstract
Modern large-scale astroparticle setups measure high-energy particles, gamma rays, neutrinos, radio waves, and the recently discovered gravitational waves. Ongoing and future experiments are located worldwide. The data acquired have different formats, storage concepts, and publication policies. Such differences are a crucial point in the era of Big Data and of multi-messenger analysis in astroparticle physics. We propose an open science web platform called ASTROPARTICLE.ONLINE which enables us to publish, store, search, select, and analyze astroparticle data. In the first stage of the project, the following components of a full data life cycle concept are under development: describing, storing, and reusing astroparticle data; software to perform multi-messenger analysis using deep learning; and outreach for students, post-graduate students, and others who are interested in astroparticle physics. Here we describe the concepts of the web platform and the first results obtained, including the metadata structure for astroparticle data, data analysis using convolutional neural networks, a description of the binary data, and the outreach platform for those interested in astroparticle physics. The KASCADE-Grande and TAIGA cosmic-ray experiments were chosen as pilot examples.
(This article belongs to the Special Issue Data in Astrophysics & Geophysics: Research and Applications)
15 pages, 3054 KiB  
Article
Transcriptional Profiles of Secondary Metabolite Biosynthesis Genes and Cytochromes in the Leaves of Four Papaver Species
by Dowan Kim, Myunghee Jung, In Jin Ha, Min Young Lee, Seok-Geun Lee, Younhee Shin, Sathiyamoorthy Subramaniyam and Jaehyeon Oh
Data 2018, 3(4), 55; https://doi.org/10.3390/data3040055 - 28 Nov 2018
Cited by 7 | Viewed by 4887
Abstract
Poppies are well-known plants in the family Papaveraceae that are rich in alkaloids. This family contains 61 species, and in this study we sequenced the transcriptomes of four species’ (Papaver rhoeas, Papaver nudicaule, Papaver fauriei, and Papaver somniferum) leaves. These transcripts were systematically assessed for the expression of secondary metabolite biosynthesis (SMB) genes and cytochromes, and their expression profiles were assessed for use in bioinformatics analyses. This study contributed 265 Gb (13 libraries with three biological replicates) of leaf transcriptome data from three Papaver plant developmental stages. Sequenced transcripts were assembled into 815 Mb of contigs, including 226 Mb of full-length transcripts. The transcripts for 53 KEGG pathways, 55 cytochrome superfamilies, and benzylisoquinoline alkaloid biosynthesis (BIA) were identified and compared to four other alkaloid-rich genomes. Additionally, 22 different alkaloids and their relative expression profiles in three developmental stages of Papaver species were assessed by targeted metabolomics using LC-QTOF-MS/MS. Collectively, the results are given in co-occurrence heat-maps to help researchers obtain an overview of the transcripts and their differential expression in the Papaver development life cycle, particularly in leaves. Moreover, this dataset will be a valuable resource to derive hypotheses to mitigate an array of Papaver developmental and secondary metabolite biosynthesis issues in the future.
16 pages, 3492 KiB  
Article
Performance Analysis of Statistical and Supervised Learning Techniques in Stock Data Mining
by Manik Sharma, Samriti Sharma and Gurvinder Singh
Data 2018, 3(4), 54; https://doi.org/10.3390/data3040054 - 24 Nov 2018
Cited by 40 | Viewed by 6897
Abstract
Nowadays, overwhelming stock data is available, which is only of use if it is properly examined and mined. In this paper, the last twelve years of ICICI Bank's stock data have been extensively examined using statistical and supervised learning techniques. This study may be of great interest for those who wish to mine or study the stock data of banks or any financial organization. Different statistical measures have been computed to explore the nature, range, distribution, and deviation of the data. The descriptive statistical measures assist in finding valuable metrics such as the mean, variance, skewness, kurtosis, p-value, A-squared, and 95% confidence mean interval of ICICI Bank's stock data. Moreover, daily percentage changes occurring over the last 12 years have also been recorded and examined. Additionally, the intraday stock status has been mined using ten different classifiers. The performance of the different classifiers has been evaluated on the basis of various parameters such as accuracy, misclassification rate, precision, recall, specificity, and sensitivity. Based upon these parameters, the predictive results obtained using logistic regression are more acceptable than the outcomes of the other classifiers, whereas naïve Bayes, C4.5, random forest, linear discriminant, and cubic support vector machine (SVM) merely act as random guessing machines. The outstanding performance of logistic regression has been validated using TOPSIS (technique for order preference by similarity to ideal solution) and WSA (weighted sum approach).
(This article belongs to the Special Issue Data Analysis for Financial Markets)
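TOPSIS, used above to validate the classifier ranking, is compact enough to sketch in full. The decision matrix below uses invented metric values and weights, not the paper's results, and treats all criteria as benefit criteria.

```python
import numpy as np

alternatives = ["logistic", "naive_bayes", "c4.5"]
X = np.array([                      # columns: accuracy, precision, recall
    [0.78, 0.77, 0.80],             # (hypothetical values)
    [0.52, 0.50, 0.55],
    [0.51, 0.49, 0.54],
])
w = np.array([0.4, 0.3, 0.3])       # hypothetical criteria weights

R = X / np.linalg.norm(X, axis=0)   # vector-normalize each criterion
V = R * w
ideal, anti = V.max(axis=0), V.min(axis=0)
d_pos = np.linalg.norm(V - ideal, axis=1)   # distance to ideal solution
d_neg = np.linalg.norm(V - anti, axis=1)    # distance to anti-ideal
closeness = d_neg / (d_pos + d_neg)         # rank by relative closeness

for name, c in sorted(zip(alternatives, closeness), key=lambda t: -t[1]):
    print(f"{name}: {c:.3f}")
```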
12 pages, 233 KiB  
Article
Towards the Construction of a Gold Standard Biomedical Corpus for the Romanian Language
by Maria Mitrofan, Verginica Barbu Mititelu and Grigorina Mitrofan
Data 2018, 3(4), 53; https://doi.org/10.3390/data3040053 - 23 Nov 2018
Cited by 4 | Viewed by 3588
Abstract
Gold standard corpora (GSCs) are essential for the supervised training and evaluation of systems that perform natural language processing (NLP) tasks. Currently, most of the resources used in biomedical NLP tasks are in English. Little effort has been reported for other languages, including Romanian, and thus access to such language resources is poor. In this paper, we present the construction of the first morphologically and terminologically annotated biomedical corpus of the Romanian language (MoNERo), meant to serve as a gold standard for biomedical part-of-speech (POS) tagging and biomedical named entity recognition (bioNER). It contains 14,012 tokens distributed in three medical subdomains: cardiology, diabetes, and endocrinology, extracted from books, journals, and blogposts. In order to automatically annotate the corpus with POS tags, we used a Romanian tag set with 715 labels, while disease, anatomy, procedure, and chemical-and-drug labels were manually annotated for bioNER with a Cohen's kappa coefficient of 92.8%, revealing 1877 medical named entities. The automatic annotation of the corpus has been manually checked. The corpus is publicly available and can be used to facilitate the development of NLP algorithms for the Romanian language.
(This article belongs to the Special Issue Curative Power of Medical Data)
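The inter-annotator agreement figure quoted above is Cohen's kappa, which corrects raw agreement for chance; a self-contained sketch, with invented annotator label sequences rather than MoNERo data, is:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

ann1 = ["DISO", "ANAT", "CHEM", "O", "DISO", "O"]   # hypothetical labels
ann2 = ["DISO", "ANAT", "CHEM", "O", "O", "O"]
print(round(cohen_kappa(ann1, ann2), 3))            # 0.769
```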
10 pages, 2395 KiB  
Article
Analysis of Application of Cluster Descriptions in Space of Characteristic Image Features
by Oleksii Gorokhovatskyi, Volodymyr Gorokhovatskyi and Olena Peredrii
Data 2018, 3(4), 52; https://doi.org/10.3390/data3040052 - 14 Nov 2018
Cited by 8 | Viewed by 3154
Abstract
In this paper, we propose an investigation of the properties of structural image recognition methods in the cluster space of characteristic features. Recognition based on key point descriptors like SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), etc., often relies on the search for corresponding descriptor values between an input image and all etalon images, which requires many operations and much time. We describe recognition over previously quantized (clustered) sets of descriptor features. Clustering is performed across the complete set of etalon image descriptors and followed by screening, which allows each etalon image to be represented in vector form as a distribution of clusters. Thanks to such representations, the number of computation and comparison procedures, which are the core of the recognition process, can be reduced tens of times; correspondingly, the preprocessing stage takes additional time for clustering. The implementation of the proposed approach was tested on the Leeds Butterfly dataset. The dependence of recognition performance and processing time on the number of clusters was investigated. It was shown that recognition may be performed up to nine times faster, with only a moderate decrease in recognition quality, compared to searching for correspondences between all existing descriptors in the etalon images and the input image without quantization.
(This article belongs to the Special Issue Data Stream Mining and Processing)
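The quantization step generalizes the classic bag-of-visual-words construction, which the sketch below reproduces using OpenCV's ORB and scikit-learn's k-means. The image file names are placeholders and the 32-cluster codebook size is an arbitrary choice, not the paper's setting.

```python
# Requires: pip install opencv-python scikit-learn numpy
import cv2
import numpy as np
from sklearn.cluster import KMeans

def orb_descriptors(path):
    """ORB descriptors of one image (assumes key points are found)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = cv2.ORB_create(nfeatures=500).detectAndCompute(img, None)
    return des.astype(np.float32)

etalons = ["etalon1.png", "etalon2.png"]            # placeholder files
all_des = np.vstack([orb_descriptors(p) for p in etalons])
codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(all_des)

def cluster_histogram(path):
    """Vector form of an image: its descriptors' distribution over clusters."""
    labels = codebook.predict(orb_descriptors(path))
    return np.bincount(labels, minlength=32) / len(labels)

# Comparing two short histograms replaces matching all descriptor pairs.
h1, h2 = cluster_histogram("etalon1.png"), cluster_histogram("query.png")
print(np.linalg.norm(h1 - h2))
```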
10 pages, 621 KiB  
Data Descriptor
Data on Peer-Reviewed Papers about Green Infrastructure, Urban Nature, and City Liveability
by Greg D. Simpson and Jackie Parker
Data 2018, 3(4), 51; https://doi.org/10.3390/data3040051 - 12 Nov 2018
Cited by 17 | Viewed by 5447
Abstract
This data descriptor summarizes the process applied and data gathered from the contents of 87 peer-reviewed papers/sources reporting on the contribution of public green infrastructure (PGI), in the form of public parks and urban nature spaces, in the context of city liveability and general human health and well-being. These papers were collected in a systematic literature review that informed the design of a questionnaire-based survey of PGI users in Perth, Western Australia. The survey explored visitor satisfaction with the amenities and facilities of the PGI space, and perceptions of the importance of such spaces for city liveability. Papers were sourced by searching over 15,000 databases, including all the major English language academic publishing houses, using the ProQuest Summon® service. Only English language peer-reviewed papers/editorial thought pieces/book chapters that were published since 2000 with the full text available online were considered for this review. The primary search, conducted in December 2016, identified 71 papers, and a supplementary search undertaken in June 2018 identified a further 16 papers that had become discoverable online after the completion of the initial search.
15 pages, 1469 KiB  
Article
Application of Rough Set Theory to Water Quality Analysis: A Case Study
by Maryam Zavareh and Viviana Maggioni
Data 2018, 3(4), 50; https://doi.org/10.3390/data3040050 - 7 Nov 2018
Cited by 11 | Viewed by 4118
Abstract
This work proposes an approach to analyzing water quality data that is based on rough set theory. Six major water quality indicators (temperature, pH, dissolved oxygen, turbidity, specific conductivity, and nitrate concentration) were collected at the outlet of the watershed that contains the George Mason University campus in Fairfax, VA over three years (October 2015–December 2017). Rough set theory is applied to monthly averages of the collected data to estimate one indicator (the decision attribute) based on the remaining indicators and to determine which indicators (conditional attributes) are essential (the core) for predicting the missing indicator. The redundant attributes are identified, the importance degree of each attribute is quantified, and the certainty and coverage of any detected rules are evaluated. Possible decision-making rules are also assessed and the certainty coverage factor is calculated. Results show that the core water quality indicators for the Mason watershed during the study period are turbidity and specific conductivity. In particular, if pH is chosen as the decision attribute, the importance degree of turbidity is higher than that of specific conductivity. If the decision attribute is turbidity, the only indispensable attribute is specific conductivity, and if specific conductivity is the decision attribute, the indispensable attribute besides turbidity is temperature.
(This article belongs to the Special Issue Overcoming Data Scarcity in Earth Science)
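The core computation admits a compact sketch: an attribute is indispensable if dropping it leaves two records that agree on all remaining conditional attributes but disagree on the decision. The discretized records below are invented (chosen so that the toy core mirrors the turbidity/conductivity finding), not the Mason watershed data.

```python
records = [  # (turbidity, conductivity, temperature) -> water-quality class
    (("low",  "low",  "warm"), "good"),
    (("high", "low",  "warm"), "poor"),
    (("low",  "high", "warm"), "poor"),
]
attrs = ["turbidity", "conductivity", "temperature"]

def consistent(keep):
    """True if the kept attribute indices still determine the decision."""
    seen = {}
    for cond, dec in records:
        key = tuple(cond[i] for i in keep)
        if seen.setdefault(key, dec) != dec:
            return False
    return True

# An attribute is in the core if removing it breaks consistency.
core = [attrs[i] for i in range(len(attrs))
        if not consistent([j for j in range(len(attrs)) if j != i])]
print("core attributes:", core)   # ['turbidity', 'conductivity']
```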
21 pages, 6263 KiB  
Article
Adaptive Degradation Prognostic Reasoning by Particle Filter with a Neural Network Degradation Model for Turbofan Jet Engine
by Faisal Khan, Omer F. Eker, Atif Khan and Wasim Orfali
Data 2018, 3(4), 49; https://doi.org/10.3390/data3040049 - 6 Nov 2018
Cited by 13 | Viewed by 4340
Abstract
In the aerospace industry, every minute of downtime because of equipment failure impacts operations significantly. Therefore, efficient maintenance, repair, and overhaul processes that support maximum equipment availability are essential. However, scheduled maintenance is costly and does not track the degradation of the equipment, which could result in unexpected failure. Prognostic Health Management (PHM) provides techniques to monitor the precise degradation of the equipment along with cost-effective reliability. This article presents an adaptive data-driven prognostic reasoning approach, demonstrated on an engineering case study of a turbofan jet engine. The emphasis of this article is on an adaptive data-driven degradation model and on improving the remaining useful life (RUL) prediction performance in condition monitoring of a turbofan jet engine. The RUL prediction results show low prediction errors regardless of operating conditions, which contrasts with a conventional data-driven model (a non-parameterised neural network model), where prediction errors increase as operating conditions deviate from the nominal condition. In this article, a neural network is used to build the nominal model, and a particle filter is used to track the present degradation along with the degradation parameter.
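Tracking a degradation parameter with a particle filter follows the standard sampling-importance-resampling pattern. The sketch below filters the rate of an invented exponential degradation model with Gaussian measurement noise; the paper pairs such a filter with a neural-network nominal model rather than this toy one.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.05
particles = rng.uniform(0.0, 0.2, size=1000)        # rate hypotheses
weights = np.ones_like(particles) / particles.size

for t in range(1, 40):
    health = np.exp(-true_rate * t) + rng.normal(0, 0.02)   # noisy sensor
    predicted = np.exp(-particles * t)
    # Weight particles by measurement likelihood (Gaussian noise model).
    weights *= np.exp(-0.5 * ((health - predicted) / 0.02) ** 2)
    weights /= weights.sum()
    # Resample (with jitter) when the effective sample size collapses.
    if 1.0 / np.sum(weights**2) < particles.size / 2:
        idx = rng.choice(particles.size, size=particles.size, p=weights)
        particles = particles[idx] + rng.normal(0, 0.001, particles.size)
        weights = np.ones_like(particles) / particles.size

print(f"estimated rate: {np.sum(weights * particles):.3f} (true {true_rate})")
# RUL follows by running the fitted model forward to a failure threshold.
```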
15 pages, 15439 KiB  
Article
An Evaluation of the Information Technology of Gene Expression Profiles Processing Stability for Different Levels of Noise Components
by Sergii Babichev
Data 2018, 3(4), 48; https://doi.org/10.3390/data3040048 - 5 Nov 2018
Cited by 40 | Viewed by 3178
Abstract
This paper presents the results of research evaluating the stability of an information technology for gene expression profile processing, using gene expression profiles that contain different levels of noise components. The information technology is presented as a structural block chart that contains all stages of processing the studied data. The hybrid model of objective clustering based on the SOTA algorithm and the technology of gene regulatory network reconstruction were investigated to evaluate their stability against the level of noise components. The simulation results have shown that the hybrid model of objective clustering has a high level of stability to noise components, whereas the technology of gene regulatory network reconstruction is rather sensitive to the noise level. The obtained results indicate the importance of preprocessing gene expression profiles at an early stage of gene regulatory network reconstruction, in order to remove background noise and genes that are non-informative in terms of the used criteria.
(This article belongs to the Special Issue Data Stream Mining and Processing)
12 pages, 473 KiB  
Data Descriptor
Characteristics of Unemployed People, Training Attendance and Job Searching Success in the Valencian Region (Spain)
by Francisco Guijarro
Data 2018, 3(4), 47; https://doi.org/10.3390/data3040047 - 3 Nov 2018
Viewed by 3420
Abstract
The current economic recovery is driven by expansions in many countries, with global economic growth of 3.6% in 2017. However, some countries are still struggling with vulnerable forms of employment and high unemployment rates. Official statistics in Spain reveal that women and older people constitute the core of structural unemployment and are persistently excluded from the employment recovery. This paper contributes a database that includes jobseekers' characteristics, enrollment in training initiatives for the unemployed, and employment contracts for the Valencian region of Spain. Analysing the relations between the involved variables can help researchers shed light on which characteristics are positively related to employment, and thereby encourage political decision-makers to promote initiatives to support vulnerable groups.
14 pages, 2861 KiB  
Article
Development of the Non-Iterative Supervised Learning Predictor Based on the Ito Decomposition and SGTM Neural-Like Structure for Managing Medical Insurance Costs
by Roman Tkachenko, Ivan Izonin, Pavlo Vitynskyi, Nataliia Lotoshynska and Olena Pavlyuk
Data 2018, 3(4), 46; https://doi.org/10.3390/data3040046 - 31 Oct 2018
Cited by 66 | Viewed by 4961
Abstract
The paper describes a new non-iterative linear supervised learning predictor. It is based on the use of Ito decomposition and the neural-like structure of the successive geometric transformations model (SGTM). Ito decomposition (the Kolmogorov–Gabor polynomial) is used to extend the inputs of the SGTM neural-like structure, which provides high approximation properties for solving various tasks. The search for the coefficients of this polynomial is carried out using the fast, non-iterative training algorithm of the SGTM linear neural-like structure. The developed method provides high speed and increased generalization properties. Simulation of the developed method on the medical insurance cost prediction task showed a significant increase in accuracy compared with existing methods (the common SGTM neural-like structure, multilayer perceptron, support vector machine, adaptive boosting, and linear regression). Given the above, the developed method can be used to process large amounts of data from a variety of industries (medicine, materials science, economics, etc.) to improve the accuracy and speed of their processing.
(This article belongs to the Special Issue Data Stream Mining and Processing)
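The input-extension step is easy to make concrete: a second-degree Kolmogorov–Gabor polynomial turns each input vector into all constant, linear, and pairwise-product terms, after which the coefficients can be found in a single non-iterative step. In the sketch below, ordinary least squares stands in for the SGTM neural-like structure's training, and the toy target is invented.

```python
import numpy as np
from itertools import combinations_with_replacement

def kolmogorov_gabor(X, degree=2):
    """Expand inputs with all monomials up to the given degree."""
    n = X.shape[1]
    cols = [np.ones(X.shape[0])]                    # bias term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + 3 * X[:, 0] * X[:, 1]   # toy target

Phi = kolmogorov_gabor(X)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # one-shot fit
print(np.round(coef, 2))    # recovers [1, 2, -1, 0, 3, 0]
```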
10 pages, 224 KiB  
Article
Improving the Quality of Survey Data Documentation: A Total Survey Error Perspective
by Alexander Jedinger, Oliver Watteler and André Förster
Data 2018, 3(4), 45; https://doi.org/10.3390/data3040045 - 29 Oct 2018
Cited by 7 | Viewed by 5202
Abstract
Surveys are a common method in the social and behavioral sciences to collect data on attitudes, personality and social behavior. Methodological reports should provide researchers with a complete and comprehensive overview of the design, collection and statistical processing of the survey data that are to be analyzed. As an important aspect of open science practices, they should enable secondary users to assess the quality and the analytical potential of the data. In the present article, we propose guidelines for the documentation of survey data that are based on the total survey error approach. Considering these guidelines, we conclude that both scientists and data-holding institutions should become more sensitive to the quality of survey data documentation.
16 pages, 2960 KiB  
Article
CRC806-KB: A Semantic MediaWiki Based Collaborative Knowledge Base for an Interdisciplinary Research Project
by Christian Willmes, Finn Viehberg, Sarah Esteban Lopez and Georg Bareth
Data 2018, 3(4), 44; https://doi.org/10.3390/data3040044 - 25 Oct 2018
Cited by 7 | Viewed by 4583
Abstract
In the frame of an interdisciplinary research project that is concerned with data from heterogeneous domains, such as archaeology, cultural sciences, and the geosciences, a web-based Knowledge Base system was developed to facilitate and improve research collaboration between the project participants. The presented system is based on a Wiki that was enhanced with a semantic extension, which makes it possible to store and query structured data within the Wiki. Using an additional open-source tool for schema-driven development of the data model and of the structure of the Knowledge Base improved the collaborative data model development process, as well as the semi-automation of data imports and updates. The paper presents the system architecture, as well as some example applications of a collaborative Wiki-based Knowledge Base infrastructure.
(This article belongs to the Special Issue Semantics in the Deep: Semantic Analytics for Big Data)
26 pages, 12265 KiB  
Article
Short-Term Forecasting of Electricity Supply and Demand by Using the Wavelet-PSO-NNs-SO Technique for Searching in Big Data of Iran’s Electricity Market
by Mesbaholdin Salami, Farzad Movahedi Sobhani and Mohammad Sadegh Ghazizadeh
Data 2018, 3(4), 43; https://doi.org/10.3390/data3040043 - 23 Oct 2018
Cited by 6 | Viewed by 3523
Abstract
The databases of Iran's electricity market store large volumes of data. Retail buyers and retailers will operate in Iran's electricity market in the foreseeable future, when smart grids are implemented thoroughly across Iran. As a result, the electricity market data of the future will be far larger than ever before. If methods are devised to perform quick searches in such large stores of data, it will be possible to improve the forecasting accuracy of important variables in Iran's electricity market. In this paper, available methods were employed to develop a new technique, Wavelet-Neural Networks-Particle Swarm Optimization-Simulation-Optimization (WT-NNPSO-SO), with the purpose of searching the Big Data stored in the electricity market and improving the accuracy of short-term forecasting of electricity supply and demand. The electricity market data exploration approach was based on simulation-optimization algorithms, combined with the Wavelet-Neural Networks-Particle Swarm Optimization (Wavelet-NNPSO) method to improve forecasting accuracy under the assumption that the Length of Training Data (LOTD) increases. In comparison with previous techniques, the runtime of the proposed technique was improved on larger sizes of data due to the use of metaheuristic algorithms. The findings are presented in the Results section.
(This article belongs to the Special Issue Data Stream Mining and Processing)
6 pages, 2920 KiB  
Data Descriptor
Tamarisk and Russian Olive Occurrence and Absence Dataset Collected in Select Tributaries of the Colorado River for 2017
by Anthony G. Vorster, Brian D. Woodward, Amanda M. West, Nicholas E. Young, Robert G. Sturtevant, Timothy J. Mayer, Rebecca K. Girma and Paul H. Evangelista
Data 2018, 3(4), 42; https://doi.org/10.3390/data3040042 - 17 Oct 2018
Cited by 3 | Viewed by 3614
Abstract
Non-native and invasive tamarisk (Tamarix spp.) and Russian olive (Elaeagnus angustifolia) are common in riparian areas of the Colorado River Basin and are regarded as problematic by many land and water managers. Widespread location data showing current distribution of these species, especially data suitable for remote sensing analyses, are lacking. This dataset contains 3476 species occurrence and absence point records for tamarisk and Russian olive along rivers within the Colorado River Basin in Arizona, California, Colorado, Nevada, New Mexico, and Utah. Data were collected in the field in the summer of 2017 with high-resolution imagery loaded on computer tablets. This dataset includes status (live, dead, defoliated, etc.) of observed tamarisk to capture variability in tamarisk health across the basin, in part attributable to the tamarisk beetle (Diorhabda spp.). For absence points, vegetation or land cover were recorded. These data have a range of applications including serving as a baseline for the current distribution of these species, species distribution modeling, species detection with remote sensing, and invasive species management.
12 pages, 8042 KiB  
Data Descriptor
Measurement and Numerical Modeling of Cell-Free Protein Synthesis: Combinatorial Block-Variants of the PURE System
by Paolo Carrara, Emiliano Altamura, Francesca D’Angelo, Fabio Mavelli and Pasquale Stano
Data 2018, 3(4), 41; https://doi.org/10.3390/data3040041 - 14 Oct 2018
Cited by 9 | Viewed by 3314
Abstract
Protein synthesis is at the core of the bottom-up construction of artificial cellular mimics. Intriguingly, several reports have revealed that when a transcription–translation (TX–TL) kit is encapsulated inside lipid vesicles (or water-in-oil droplets), high between-vesicle diversity is observed in terms of protein synthesis rate and yield. Stochastic solute partition can be a major determinant of these observations. In order to verify that varying the concentrations of TX–TL components brings about a variation in the rate and yield of protein production, here we directly measure the performance of variants of the 'PURE system' TX–TL kit. We report and share the kinetic traces of enhanced Green Fluorescent Protein (eGFP) synthesis in bulk aqueous phase for 27 combinatorial block-variants. The eGFP production is a sensitive function of the TX–TL component concentrations in the explored concentration range. By providing direct evidence that protein synthesis yield and rate actually mirror the TX–TL composition, this study supports the above-mentioned hypothesis of stochastic solute partition, without excluding, however, the contribution of other factors (e.g., inactivation of components).