Big Data Cogn. Comput., Volume 2, Issue 4 (December 2018) – 10 articles

Cover Story: Despite applications of Topological Data Analysis (TDA) in data mining, it has not been widely applied to text analytics. Moreover, many text processing algorithms lose the order in which different entities appear or co-appear. Assuming these lost orders are informative features, TDA may play a role in filling this gap. Our new approach analyzes the topology of multi-dimensional time series of same-type entities in long documents. Using topological signatures, we examined 75 novels by six novelists of the Romantic era and tried to predict the author based only on the graph of the main characters (persons) in each novel. We achieved 77% accuracy, demonstrating the value of homological persistence and time series analysis for text classification.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
19 pages, 647 KiB  
Article
Unscented Kalman Filter Based on Spectrum Sensing in a Cognitive Radio Network Using an Adaptive Fuzzy System
by Md Ruhul Amin, Md Mahbubur Rahman, Mohammad Amazad Hossain, Md Khairul Islam, Kazi Mowdud Ahmed, Bikash Chandra Singh and Md Sipon Miah
Big Data Cogn. Comput. 2018, 2(4), 39; https://doi.org/10.3390/bdcc2040039 - 17 Dec 2018
Cited by 8 | Viewed by 4417
Abstract
In this paper, we propose an unscented Kalman filter (UKF)-based cooperative spectrum sensing (CSS) scheme for a cognitive radio network (CRN) using an adaptive fuzzy system. In the proposed scheme, the UKF is first applied to the nonlinear system to minimize the mean square estimation error; second, an adaptive fuzzy logic rule based on an inference engine estimates the local decisions for detecting a licensed primary user (PU), and this is applied at the fusion center (FC). The FC then makes a global decision using a defuzzification procedure based on the proposed algorithm. Simulation results show that the proposed scheme achieves better detection gain than conventional schemes such as the equal gain combining (EGC)-based soft fusion rule and the Kalman filter (KF)-based soft fusion rule under all conditions considered. Moreover, the proposed scheme achieves the lowest global probability of error compared with both the conventional EGC and KF schemes. Full article
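As a rough illustration of the unscented transform at the heart of the UKF scheme described above (a generic one-dimensional sketch under textbook assumptions, not the authors' implementation), sigma points are propagated through the nonlinearity instead of linearizing it:

```python
import math

def unscented_transform(mean, var, f, kappa=2.0):
    """Propagate a 1-D Gaussian (mean, var) through a nonlinearity f
    using weighted sigma points, as the UKF does at each update step."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    weights = [w0, wi, wi]
    y = [f(x) for x in sigma_points]
    y_mean = sum(w * v for w, v in zip(weights, y))
    y_var = sum(w * (v - y_mean) ** 2 for w, v in zip(weights, y))
    return y_mean, y_var

# For x ~ N(0, 1) pushed through f(x) = x^2, the transform recovers the
# true moments E[x^2] = 1 and Var[x^2] = 2 without any linearization.
m, v = unscented_transform(0.0, 1.0, lambda x: x * x)
```

Note that a first-order linearization around the mean would predict an output mean of 0 here; the sigma-point estimate is exact for this quadratic, which is the property that lets the UKF minimize the mean square estimation error on nonlinear systems.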
13 pages, 5797 KiB  
Article
Prototype of Mobile Device to Contribute to Urban Mobility of Visually Impaired People
by Fabrício Rosa Amorim and Fernando Luiz de Paula Santil
Big Data Cogn. Comput. 2018, 2(4), 38; https://doi.org/10.3390/bdcc2040038 - 4 Dec 2018
Viewed by 3584
Abstract
Visually impaired people (VIP) lack aids that facilitate their urban mobility, mainly due to obstacles encountered on their routes. This paper describes the design of AudioMaps, a prototype of cartographic technology for mobile devices. AudioMaps was designed to register the descriptions and locations of points of interest. When a point is registered, the prototype inserts a georeferenced landmark on the screen (based on Google Maps). Then, if the AudioMaps position is near (15 or 5 m from) a previously registered point, it announces, by audio, the remaining distance and a description. For evaluation, a test area located in Monte Carmelo, Brazil, was selected, and the light poles, street corners (with the names of the streets forming the intersections), and crosswalks were registered in AudioMaps. A tactile model, produced manually, was used to give a first image of the area to four sighted people and four VIP, who completed a navigation task in the test area. The results indicate that both the tactile model and the audiovisual prototype can be used by both groups of participants. Above all, the prototype proved to be a viable and promising option for decision-making and spatial orientation in urban environments. New ways of presenting data to VIP or sighted people are described. Full article
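The 15 m / 5 m proximity announcement described in the abstract can be sketched as a distance check against registered points. This is a hypothetical reconstruction (function names, message wording, and the haversine distance choice are assumptions, not AudioMaps code):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def proximity_alert(user, poi, description):
    """Return an audio message when the user is within 15 m (or 5 m)
    of a registered point of interest, else None."""
    d = haversine_m(*user, *poi)
    if d <= 5:
        return f"{description}: {d:.0f} m ahead"
    if d <= 15:
        return f"Approaching {description}: {d:.0f} m"
    return None
```

In use, a registered street corner about 10 m north of the user triggers the outer 15 m announcement, e.g. `proximity_alert((0.0, 0.0), (0.00009, 0.0), "corner of A and B streets")`.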
16 pages, 4852 KiB  
Article
Leveraging Image Representation of Network Traffic Data and Transfer Learning in Botnet Detection
by Shayan Taheri, Milad Salem and Jiann-Shiun Yuan
Big Data Cogn. Comput. 2018, 2(4), 37; https://doi.org/10.3390/bdcc2040037 - 27 Nov 2018
Cited by 30 | Viewed by 6670
Abstract
Advancements in the Internet have enabled connecting more devices to this technology every day, and the emergence of the Internet of Things (IoT) has accelerated this growth. The lack of security in the IoT world makes these devices hot targets for cybercriminals. One of their malicious actions is the botnet attack, one of the main destructive threats, which has been evolving since 2003 into different forms. This attack is a serious threat to the security and privacy of information; its scalability, structure, strength, and strategy are under continuous development, which is why it has survived for decades. A bot is defined as a software application that executes a number of automated tasks (simple but structurally repetitive) over the Internet. Several bots make up a botnet, which infects a number of devices and communicates with its controller, called the botmaster, to get instructions. A botnet executes tasks at a rate that would be impossible for a human being. Nowadays, the activities of bots are concealed among normal web flows and occupy more than half of all web traffic. The largest use of bots is in web spidering (web crawling), in which an automated script fetches, analyzes, and files information from web servers. They also contribute to other attacks, such as distributed denial of service (DDoS), spam, identity theft, phishing, and espionage. A number of botnet detection techniques have been proposed, such as honeynet-based and Intrusion Detection System (IDS)-based techniques. These techniques are no longer effective due to the constant updating of bots and their evasion mechanisms. Recently, botnet detection techniques based on machine/deep learning have been proposed that are more capable than their previously mentioned counterparts. In this work, we propose a deep learning-based engine for botnet detection to be utilized in IoT and wearable devices.
In this system, normal and botnet network traffic data are transformed into images before being fed into a deep convolutional neural network, named DenseNet, with and without transfer learning. The system is implemented in the Python programming language, and the CTU-13 dataset is used for evaluation in one study. According to our simulation results, using transfer learning can improve the accuracy from 33.41% up to 99.98%. In addition, two other classifiers, Support Vector Machine (SVM) and logistic regression, have been used; they showed accuracies of 83.15% and 78.56%, respectively. In another study, we evaluate our system on an in-house live normal dataset and a solely botnet dataset. Similarly, the system performed very well in data classification in these studies. To examine the capability of our system for real-time applications, we measured the system's training and testing times. According to our examination, it takes 0.004868 milliseconds to process each packet from the network traffic data during testing. Full article
(This article belongs to the Special Issue Applied Deep Learning: Business and Industrial Applications)
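The core idea of representing raw traffic bytes as an image for a CNN can be sketched in a few lines. This is a minimal, generic illustration (padding scheme, image side, and function names are assumptions, not the paper's pipeline):

```python
def bytes_to_image(payload: bytes, side: int = 8):
    """Truncate or zero-pad a packet payload to side*side bytes and
    reshape it into a 2-D grid of pixel intensities (0-255), the form a
    convolutional network such as DenseNet expects."""
    flat = list(payload[: side * side])
    flat += [0] * (side * side - len(flat))          # zero-pad short packets
    return [flat[r * side : (r + 1) * side] for r in range(side)]

# A 3-byte payload becomes one bright corner in an otherwise black image.
img = bytes_to_image(b"\x10\x20\x30", side=4)
```

The appeal of this representation is that byte-level patterns of botnet traffic become spatial patterns, so pretrained image networks (the transfer-learning setting the abstract evaluates) can be reused without hand-crafted flow features.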
15 pages, 1245 KiB  
Article
A Model Free Control Based on Machine Learning for Energy Converters in an Array
by Simon Thomas, Marianna Giassi, Mikael Eriksson, Malin Göteman, Jan Isberg, Edward Ransley, Martyn Hann and Jens Engström
Big Data Cogn. Comput. 2018, 2(4), 36; https://doi.org/10.3390/bdcc2040036 - 22 Nov 2018
Cited by 15 | Viewed by 4279
Abstract
This paper introduces a machine learning-based control strategy for energy converter arrays, designed to work under realistic conditions where the optimal control parameter cannot be obtained analytically. The control strategy neither relies on a mathematical model nor needs a priori information about the energy medium. Instead, several identical energy converters are arranged so that they are affected simultaneously by the energy medium. Each device uses a different control strategy, of which at least one has to be the machine learning approach presented in this paper. During operation, all energy converters record their absorbed power and control output; the machine learning device obtains the data from the converter with the highest power absorption and thus learns the best-performing control strategy for each situation. Consequently, the overall network performs better than each individual strategy. This concept is evaluated for wave energy converters (WECs) with numerical simulations and with experiments on physical scale models in a wave tank. In the first of two numerical simulations, the learnable WEC works in an array with four WECs applying a constant damping factor. In the second simulation, two learnable WECs learn from each other. In the first test, the learnable WEC was able to absorb as much as the best constant-damping WEC, while in the second run it absorbed even slightly more. During the physical model test, the artificial neural network (ANN) showed its ability to select the better of two possible damping coefficients based on real-world input data. Full article
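The "copy the best performer in the array" mechanism described above can be sketched with a toy power model. Everything here is illustrative (the power function, the notion of an optimal damping proportional to wave amplitude, and all names are assumptions, not the authors' simulation):

```python
def absorbed_power(damping, wave_amplitude):
    """Toy stand-in for a WEC power model: absorbed power peaks at an
    (unknown to the controller) optimal damping set by the sea state."""
    optimal = 2.0 * wave_amplitude
    return wave_amplitude * max(0.0, 1.0 - abs(damping - optimal))

def learn_best_damping(candidates, wave_amplitude):
    """Model-free step: every converter in the array runs a different
    damping under the same waves; the learner adopts whichever damping
    produced the highest recorded power."""
    powers = {d: absorbed_power(d, wave_amplitude) for d in candidates}
    return max(powers, key=powers.get)

# With four converters holding constant dampings, the learner picks the
# one best matched to the current (unmodelled) sea state.
best = learn_best_damping([1.0, 2.0, 3.0, 4.0], wave_amplitude=1.5)
```

The design point is that no model of `absorbed_power` is ever needed: the learner only compares measured outputs across the array, which is exactly why the strategy works when the optimum cannot be derived analytically.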
17 pages, 634 KiB  
Article
The Next Generation Cognitive Security Operations Center: Network Flow Forensics Using Cybersecurity Intelligence
by Konstantinos Demertzis, Panayiotis Kikiras, Nikos Tziritas, Salvador Llopis Sanchez and Lazaros Iliadis
Big Data Cogn. Comput. 2018, 2(4), 35; https://doi.org/10.3390/bdcc2040035 - 22 Nov 2018
Cited by 28 | Viewed by 6538
Abstract
A Security Operations Center (SOC) can be defined as an organized and highly skilled team that uses advanced computer forensics tools to prevent, detect, and respond to cybersecurity incidents in an organization. The fundamental aspects of an effective SOC are the ability to examine and analyze a vast number of data flows and to correlate several other types of events from a cybersecurity perspective. The supervision and categorization of network flows is an essential process not only for the scheduling, management, and regulation of the network's services, but also for identifying attacks and for the consequent forensic investigations. A serious potential disadvantage of the traditional software solutions used today for network monitoring, and specifically for effective categorization of encrypted or obfuscated network flows (which requires rebuilding message packets in sophisticated underlying protocols), is their demand for computational resources. A further significant shortcoming of these software packages is that they produce high false-positive rates because they lack accurate prediction mechanisms. For all the reasons above, in most cases the traditional software fails completely to recognize unidentified vulnerabilities and zero-day exploitations. This paper proposes a novel intelligence-driven Network Flow Forensics Framework (NF3), with low utilization of computing power and resources, for the Next Generation Cognitive Computing SOC (NGC2SOC), which relies solely on advanced, fully automated intelligence methods. It is an effective and accurate ensemble machine learning forensics tool for network traffic analysis, demystification of malware traffic, and encrypted traffic identification. Full article
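The ensemble idea the abstract names can be illustrated with a majority vote over several weak flow classifiers. This is a generic sketch (the flow fields, the toy rules, and the thresholds are assumptions, not the NF3 implementation):

```python
from collections import Counter

def ensemble_predict(classifiers, flow):
    """Combine several flow classifiers by majority vote: each base
    classifier labels the flow, and the most common label wins."""
    votes = Counter(clf(flow) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three toy base rules over a flow record (field names are illustrative).
by_port = lambda f: "malicious" if f["port"] in {6667, 4444} else "benign"
by_size = lambda f: "malicious" if f["bytes"] > 10_000 else "benign"
by_rate = lambda f: "malicious" if f["pkts_per_s"] > 100 else "benign"

verdict = ensemble_predict([by_port, by_size, by_rate],
                           {"port": 4444, "bytes": 200, "pkts_per_s": 250})
```

Because each base classifier inspects only cheap per-flow statistics rather than reconstructed packet contents, an ensemble like this also reflects the low-resource goal stated for NF3.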
15 pages, 366 KiB  
Article
Big-Crypto: Big Data, Blockchain and Cryptocurrency
by Hossein Hassani, Xu Huang and Emmanuel Silva
Big Data Cogn. Comput. 2018, 2(4), 34; https://doi.org/10.3390/bdcc2040034 - 19 Oct 2018
Cited by 67 | Viewed by 21111
Abstract
Cryptocurrency has been a trending topic over the past decade, pooling tremendous technological power and attracting investments valued at over trillions of dollars on a global scale. Cryptocurrency technology and its network have been endowed with many superior features due to its unique architecture, which also determines its worldwide efficiency, applicability, and data-intensive characteristics. This paper introduces and summarises the interactions between two significant concepts in the digitalized world: cryptocurrency and Big Data. Both subjects are at the forefront of technological research, and this paper focuses on their convergence, comprehensively reviewing the very recent applications and developments after 2016. Accordingly, we aim to present a systematic review of the interactions between Big Data and cryptocurrency and to serve as a one-stop reference directory for researchers with regard to identifying research gaps and directing future explorations. Full article
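The "unique architecture" behind cryptocurrency's data-intensive character is the append-only hash chain, which can be sketched generically (this is textbook blockchain structure, not anything specific to the paper; field names are illustrative):

```python
import hashlib
import json

def make_block(prev_hash, payload):
    """Create a block whose hash commits to both its payload and the
    hash of the previous block, chaining the history together."""
    body = json.dumps({"prev": prev_hash, "data": payload}, sort_keys=True)
    return {"prev": prev_hash, "data": payload,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

def chain_valid(chain):
    """A chain is consistent when every block points at the hash of its
    predecessor; any reordering or tampering breaks the links."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev"] != prev["hash"]:
            return False
    return True

genesis = make_block("0" * 64, "genesis")
block1 = make_block(genesis["hash"], "tx: alice -> bob 1.0")
```

Every participant stores and verifies this ever-growing, tamper-evident ledger, which is precisely what makes public blockchains a natural Big Data source.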
10 pages, 887 KiB  
Article
Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining
by Shafie Gholizadeh, Armin Seyeditabari and Wlodek Zadrozny
Big Data Cogn. Comput. 2018, 2(4), 33; https://doi.org/10.3390/bdcc2040033 - 18 Oct 2018
Cited by 13 | Viewed by 5622
Abstract
Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in filling the resulting gap in the state of the art of text processing. Once provided, the topology of different entities through a textual document may reveal additional information about the document that is not reflected in any features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we show how to extract topological signatures from the text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to utilize these signatures for text classification. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2018)
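Before any topology is computed, the approach needs the document expressed as a multi-dimensional time series of same-type entities. A minimal sketch of that preprocessing step (window size, tokenization, and the example names are assumptions, not the authors' code):

```python
from collections import defaultdict

def entity_time_series(tokens, entities, window=3):
    """Slide a fixed window over the document and count each tracked
    entity's mentions per window, yielding one time series per entity.
    Persistent homology is then applied to this multi-dimensional series."""
    series = defaultdict(list)
    for start in range(0, len(tokens), window):
        chunk = tokens[start : start + window]
        for e in entities:
            series[e].append(chunk.count(e))
    return dict(series)

# Two hypothetical characters tracked through a ten-token "novel".
tokens = "anne met mr darcy then anne wrote to darcy again".split()
ts = entity_time_series(tokens, ["anne", "darcy"], window=3)
```

The point of keeping the series rather than bag-of-words counts is exactly the abstract's claim: the order and co-appearance pattern of the characters survives, and it is this pattern whose topological signature distinguishes one novelist from another.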
30 pages, 3203 KiB  
Review
Data Stream Clustering Techniques, Applications, and Models: Comparative Analysis and Discussion
by Umesh Kokate, Arvind Deshpande, Parikshit Mahalle and Pramod Patil
Big Data Cogn. Comput. 2018, 2(4), 32; https://doi.org/10.3390/bdcc2040032 - 17 Oct 2018
Cited by 41 | Viewed by 7902
Abstract
Data growth in today's world is exponential; many applications generate huge amounts of data streams at very high speed, such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams, and network data. In traditional data mining, the data set is generally static in nature and available many times over for processing and analysis. Data stream mining, however, has to satisfy constraints related to real-time response, bounded and limited memory, single-pass processing, and concept-drift detection. The main problem is identifying the hidden patterns and knowledge needed to understand the context and identify trends in continuous data streams. In this paper, various data stream methods and algorithms are reviewed and evaluated on standard synthetic data streams and real-life data streams. Density micro-clustering and density grid-based clustering algorithms are discussed, and a comparative analysis in terms of various internal and external clustering evaluation methods is performed. It was observed that no single algorithm can satisfy all the performance measures. The performance of these data stream clustering algorithms is domain-specific and requires many parameters for density and noise thresholds. Full article
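The single-pass, bounded-memory constraint is typically met with micro-cluster summaries. A one-dimensional sketch of the standard cluster-feature idea used by density micro-clustering algorithms (a generic illustration, not any one surveyed algorithm):

```python
class MicroCluster:
    """Summarize a cluster by (N, LS, SS): count, linear sum, and square
    sum. These update in O(1) per point, so the stream is seen once and
    memory stays bounded regardless of stream length."""

    def __init__(self):
        self.n, self.ls, self.ss = 0, 0.0, 0.0

    def absorb(self, x: float):
        self.n += 1
        self.ls += x
        self.ss += x * x

    def centre(self):
        return self.ls / self.n

    def radius(self):
        # RMS deviation from the centre, derived from the summary alone.
        return (self.ss / self.n - self.centre() ** 2) ** 0.5

mc = MicroCluster()
for value in [2.0, 4.0, 6.0]:
    mc.absorb(value)
```

Because centre and radius are recoverable from the three stored numbers, algorithms can merge, age, or discard micro-clusters online, which is how they cope with concept drift without revisiting old data.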
16 pages, 1043 KiB  
Article
Constrained Optimization-Based Extreme Learning Machines with Bagging for Freezing of Gait Detection
by Syed Waqas Haider Shah, Khalid Iqbal and Ahmad Talal Riaz
Big Data Cogn. Comput. 2018, 2(4), 31; https://doi.org/10.3390/bdcc2040031 - 15 Oct 2018
Cited by 5 | Viewed by 4218
Abstract
The Internet-of-Things (IoT) is a paradigm shift from slow and manual approaches to fast and automated systems, and it has been deployed for various use-cases and applications in recent times. Many aspects of the IoT can be used to assist elderly individuals. In this paper, we detect the presence or absence of freezing of gait in patients suffering from Parkinson's disease (PD) by using data from body-mounted acceleration sensors placed on the legs and hips of the patients. For accurate detection and estimation, constrained optimization-based extreme learning machines (C-ELM) have been utilized. Moreover, in order to enhance the accuracy even further, C-ELM with bagging (C-ELMBG) has been proposed, which uses the characteristics of least squares support vector machines. The experiments have been carried out on the publicly available Daphnet freezing of gait dataset to verify the feasibility of C-ELM and C-ELMBG. The simulation results show an accuracy above 90% for both methods. A detailed comparison with other state-of-the-art statistical learning algorithms such as linear discriminant analysis, classification and regression trees, random forests, and support vector machines is also presented, where C-ELM and C-ELMBG show better performance in all aspects, including accuracy, sensitivity, and specificity. Full article
(This article belongs to the Special Issue Health Assessment in the Big Data Era)
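The bagging layer that C-ELMBG adds on top of C-ELM can be sketched generically: fit each base learner on a bootstrap resample and combine by majority vote. Everything here is illustrative (a toy threshold learner stands in for C-ELM, and the data are invented):

```python
import random

def fit_threshold(samples):
    """Toy base learner standing in for C-ELM: place a decision
    threshold halfway between the two class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    if not pos or not neg:                   # degenerate resample: fall back
        return lambda x: 1 if x >= 0.5 else 0
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= t else 0

def bagged_ensemble(samples, n_models, seed=0):
    """Bagging: each model sees a bootstrap resample of the training
    set; prediction is the majority vote of all models."""
    rng = random.Random(seed)
    models = [fit_threshold([rng.choice(samples) for _ in samples])
              for _ in range(n_models)]
    def predict(x):
        votes = [m(x) for m in models]
        return 1 if sum(votes) * 2 > len(votes) else 0
    return predict

# Invented 1-D "sensor feature" data: label 1 marks freezing of gait.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
clf = bagged_ensemble(data, n_models=5)
```

Resampling decorrelates the base learners, so the vote reduces the variance of any single learner's threshold, which is the mechanism by which bagging lifts C-ELMBG's accuracy above plain C-ELM.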
20 pages, 7738 KiB  
Article
An Experimental Evaluation of Fault Diagnosis from Imbalanced and Incomplete Data for Smart Semiconductor Manufacturing
by Milad Salem, Shayan Taheri and Jiann-Shiun Yuan
Big Data Cogn. Comput. 2018, 2(4), 30; https://doi.org/10.3390/bdcc2040030 - 21 Sep 2018
Cited by 24 | Viewed by 5711
Abstract
The SECOM dataset contains information about a semiconductor production line, entailing the products that failed the in-house test line and their attributes. This dataset, similar to most semiconductor manufacturing data, contains missing values, imbalanced classes, and noisy features. In this work, the challenges of this dataset are met and many different approaches for classification are evaluated to perform fault diagnosis. We present an experimental evaluation that examines 288 combinations of different approaches involving data pruning, data imputation, feature selection, and classification methods, to find the suitable approaches for this task. Furthermore, a novel data imputation approach, namely “In-painting KNN-Imputation” is introduced and is shown to outperform the common data imputation technique. The results show the capability of each classifier, feature selection method, data generation method, and data imputation technique, with a full analysis of their respective parameter optimizations. Full article
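The family of techniques the paper builds on can be illustrated with plain KNN imputation: fill a missing entry with the mean of that feature over the k nearest rows, measuring distance only on features both rows have observed. This is a hedged sketch of the baseline idea (the paper's "In-painting KNN-Imputation" variant differs in its details; `None` marking missing values is an assumption):

```python
import math

def knn_impute(rows, k=2):
    """Fill each None with the mean of the feature over the k nearest
    donor rows, using distance over jointly observed features."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (dist(row, other), other[j])
                    for o_i, other in enumerate(rows)
                    if o_i != i and other[j] is not None
                )[:k]
                filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled

# The last row's missing value is filled from its two nearest neighbours.
data = [[1.0, 2.0], [1.1, 2.2], [5.0, 9.0], [1.05, None]]
out = knn_impute(data, k=2)
```

Normalizing the distance by the number of shared features keeps rows with many missing entries comparable to complete rows, which matters on data as sparse as SECOM.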