Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

12 pages, 2770 KiB  
Article
Which Way to Cope with COVID-19 Challenges? Contributions of the IoT for Smart City Projects
by Silvia Fernandes
Big Data Cogn. Comput. 2021, 5(2), 26; https://doi.org/10.3390/bdcc5020026 - 16 Jun 2021
Cited by 13 | Viewed by 5772
Abstract
Many activities and sectors have come to a halt due to the COVID-19 crisis. People and workers’ habits and behaviors have changed dramatically, as the use of technologies and connections, virtual reality, and remote support have been enhanced. Businesses and cities have been forced to adapt quickly to the new challenges. Digital technologies have allowed people better access to public services through improved use of resources. Smart cities have significant potential for linking people to work and services as never before. Additionally, technological convergence produces data that can enhance interactions and decisions toward the “new normal”. This paper assesses how prepared Portugal is to respond to the accelerated transformation that this context demands of cities. Portuguese SMEs have developed a good capacity for entrepreneurship and innovation; however, they still lag in converting acquired knowledge into sales and exports, and collaboration at the public-private level remains limited. The acceleration of smart cities through the Internet of Things (IoT) may encourage changes in these areas. A more assertive alignment between emergent technologies and the digitization goals of companies is required. This paper opens a discussion around the major needs and trends of the IoT (and related technologies), since the pandemic has accelerated their uptake. The relationship between innovation and city smartness is examined to assess the main contributing and limiting variables (through the European Innovation Scoreboard) and to clarify future directions toward smarter services. The tourism sector, the country’s largest export activity, is addressed in this context. An analytical framework around this approach (using, for example, Power BI and Azure IoT Hub) can help identify and support the most suitable areas of development in the country. Full article
(This article belongs to the Special Issue Internet of Things (IoT) and Ambient Intelligence)
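
A hedged illustration of one piece of the analytical framework named above: pushing a city sensor reading into Azure IoT Hub with the azure-iot-device Python SDK, from which a Power BI dashboard could be fed. The connection string, device name, and payload fields are assumptions, not taken from the paper.

```python
import json
from azure.iot.device import IoTHubDeviceClient, Message

# Hypothetical connection string of a device registered in an IoT Hub.
CONN_STR = "HostName=<hub>.azure-devices.net;DeviceId=kiosk-01;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.connect()

# Illustrative smart-city telemetry: a foot-traffic reading from one sensor.
reading = {"sensor": "kiosk-01", "visitors": 42, "ts": "2021-06-16T10:00:00Z"}
message = Message(json.dumps(reading))
message.content_type = "application/json"
client.send_message(message)  # downstream analytics (e.g., Power BI) consume the stream
client.shutdown()
```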

12 pages, 1420 KiB  
Article
Structural Differences of the Semantic Network in Adolescents with Intellectual Disability
by Karin Nilsson, Lisa Palmqvist, Magnus Ivarsson, Anna Levén, Henrik Danielsson, Marie Annell, Daniel Schöld and Michaela Socher
Big Data Cogn. Comput. 2021, 5(2), 25; https://doi.org/10.3390/bdcc5020025 - 1 Jun 2021
Cited by 3 | Viewed by 6731
Abstract
The semantic network structure is a core aspect of the mental lexicon and is, therefore, a key to understanding language development processes. This study investigated the structure of the semantic network of adolescents with intellectual disability (ID) and children with typical development (TD) using network analysis. The semantic networks of the participants (n_ID = 66; n_TD = 49) were estimated from the semantic verbal fluency task with the Pathfinder method. The groups were matched on the number of produced words. The average shortest path length (ASPL), the clustering coefficient (CC), and the network’s modularity (Q) of the two groups were compared. A significantly smaller ASPL and Q and a significantly higher CC were found for the adolescents with ID in comparison with the children with TD. Reasons for this might be differences in the language environment and differences in cognitive skills. The quality and quantity of the language input might differ for adolescents with ID due to differences in school curricula and because persons with ID tend to engage in different out-of-school activities compared to TD peers. Future studies should investigate the influence of different language environments on the language development of persons with ID. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
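
For readers unfamiliar with the three measures compared above, a minimal sketch (the Pathfinder estimation step is omitted) of computing ASPL, CC, and modularity Q with networkx on a toy word network:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy semantic network: edges join words produced close together in a verbal
# fluency task (illustrative data, not the study's networks).
G = nx.Graph([("dog", "cat"), ("cat", "mouse"), ("dog", "wolf"),
              ("wolf", "fox"), ("apple", "pear"), ("pear", "dog")])

aspl = nx.average_shortest_path_length(G)            # ASPL
cc = nx.average_clustering(G)                        # clustering coefficient (CC)
Q = modularity(G, greedy_modularity_communities(G))  # modularity (Q)
print(f"ASPL={aspl:.2f}  CC={cc:.2f}  Q={Q:.2f}")
```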

12 pages, 546 KiB  
Article
Without Data Quality, There Is No Data Migration
by Otmane Azeroual and Meena Jha
Big Data Cogn. Comput. 2021, 5(2), 24; https://doi.org/10.3390/bdcc5020024 - 18 May 2021
Cited by 10 | Viewed by 6372
Abstract
Data migration is required to run data-intensive applications. Legacy data storage systems are not capable of accommodating the changing nature of data. In many companies, data migration projects fail because their importance and complexity are not taken seriously enough. Data migration strategies include storage migration, database migration, application migration, and business process migration. Regardless of which migration strategy a company chooses, there should always be a strong focus on data cleansing. Complete, correct, and clean data not only reduce the cost, complexity, and risk of the changeover; they also provide a sound basis for quick, strategic company decisions and are therefore essential for today’s dynamic business processes. Data quality is thus an important issue for companies planning a data migration and must not be overlooked. In order to determine the relationship between data quality and data migration, an empirical study of 25 large German and Swiss companies was carried out to establish the importance of data quality for data migration. In this paper, we present our findings on how data quality plays an important role in data migration plans and must not be ignored. Without acceptable data quality, data migration is impossible. Full article
(This article belongs to the Special Issue Educational Data Mining and Technology)
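
As an illustration of the data-cleansing focus the abstract argues for, a minimal pre-migration quality check, assuming the legacy table can be loaded into a pandas DataFrame; file and column names are hypothetical:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column completeness and distinctness profile of a legacy table."""
    return pd.DataFrame({
        "non_null_ratio": df.notna().mean(),
        "distinct_values": df.nunique(),
    })

legacy = pd.read_csv("legacy_customers.csv")  # hypothetical legacy export
print(quality_report(legacy))
print("duplicate rows:", legacy.duplicated().sum())

# Gate the migration on agreed quality thresholds for critical columns.
assert quality_report(legacy).loc["customer_id", "non_null_ratio"] == 1.0
```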

19 pages, 2049 KiB  
Article
Big Remote Sensing Image Classification Based on Deep Learning Extraction Features and Distributed Spark Frameworks
by Imen Chebbi, Nedra Mellouli, Imed Riadh Farah and Myriam Lamolle
Big Data Cogn. Comput. 2021, 5(2), 21; https://doi.org/10.3390/bdcc5020021 - 5 May 2021
Cited by 15 | Viewed by 6153
Abstract
Big data analysis assumes a significant role in Earth observation using remote sensing images, since the explosion of image data from multiple sensors is used in several fields. Traditional data analysis techniques have various limitations in storing and processing massive volumes of data. Moreover, big remote sensing data analytics demands sophisticated algorithms based on specific techniques to store and process the data in real time or near real time with high accuracy, efficiency, and speed. In this paper, we present a method for storing a huge number of heterogeneous satellite images based on the Hadoop distributed file system (HDFS) and Apache Spark. We also present how deep learning algorithms such as VGGNet and UNet can benefit big remote sensing data processing for feature extraction and classification. The obtained results show that our approach outperforms other methods. Full article
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)
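
A minimal sketch of the storage-plus-extraction pattern described above: tiles stored on HDFS are read with Spark, and a pretrained VGG16 backbone (standing in for the paper's VGGNet/UNet models) extracts one feature vector per image inside each partition. Paths and tile format are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rs-features").getOrCreate()
# Satellite image tiles stored on HDFS, read as raw bytes (Spark 3+).
images = spark.read.format("binaryFile").load("hdfs:///tiles/*.png")

def extract_features(rows):
    # Heavy imports stay inside the function so each executor loads them itself,
    # and the CNN is built once per partition rather than once per image.
    import io
    import numpy as np
    from PIL import Image
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input

    model = VGG16(weights="imagenet", include_top=False, pooling="avg")
    for row in rows:
        img = Image.open(io.BytesIO(row.content)).convert("RGB").resize((224, 224))
        x = preprocess_input(np.expand_dims(np.asarray(img, dtype="float32"), 0))
        yield row.path, model.predict(x, verbose=0)[0]  # 512-d feature vector

features = images.rdd.mapPartitions(extract_features)
features.saveAsPickleFile("hdfs:///tiles-features")  # hypothetical output path
```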

14 pages, 303 KiB  
Article
Traceability for Trustworthy AI: A Review of Models and Tools
by Marçal Mora-Cantallops, Salvador Sánchez-Alonso, Elena García-Barriocanal and Miguel-Angel Sicilia
Big Data Cogn. Comput. 2021, 5(2), 20; https://doi.org/10.3390/bdcc5020020 - 4 May 2021
Cited by 39 | Viewed by 14555
Abstract
Traceability is considered a key requirement for trustworthy artificial intelligence (AI), related to the need to maintain a complete account of the provenance of data, processes, and artifacts involved in the production of an AI model. Traceability in AI shares part of its scope with general-purpose recommendations for provenance such as W3C PROV, and it is also supported to different extents by specific tools used by practitioners as part of their efforts to make data analytic processes reproducible or repeatable. Here, we review relevant tools, practices, and data models for traceability in their connection to building AI models and systems. We also propose some minimal requirements for considering a model traceable according to the assessment list of the High-Level Expert Group on AI. Our review shows that, although a good number of reproducibility tools are available, a common approach is currently lacking, as are shared semantics. In addition, we found that some tools have not yet reached full maturity, while others are falling into obsolescence or near abandonment by their developers, which might compromise the reproducibility of the research entrusted to them. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: 5th Anniversary Feature Papers)
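
To make the W3C PROV connection concrete, a minimal sketch using the prov Python package to record that a model was generated by a training run that used a particular dataset; the names and namespace are illustrative:

```python
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/ml/")

data = doc.entity("ex:training-data-v3")
model = doc.entity("ex:model-v1")
run = doc.activity("ex:training-run-42")
team = doc.agent("ex:analytics-team")

doc.used(run, data)               # the training run consumed the dataset
doc.wasGeneratedBy(model, run)    # the model is an output of that run
doc.wasAttributedTo(model, team)  # accountability for the artifact

print(doc.get_provn())            # serialize in PROV-N notation
```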

16 pages, 2518 KiB  
Article
From Data Processing to Knowledge Processing: Working with Operational Schemas by Autopoietic Machines
by Mark Burgin and Rao Mikkilineni
Big Data Cogn. Comput. 2021, 5(1), 13; https://doi.org/10.3390/bdcc5010013 - 10 Mar 2021
Cited by 11 | Viewed by 6231
Abstract
Knowledge processing is an important feature of intelligence in general and artificial intelligence in particular. To develop computing systems working with knowledge, it is necessary to elaborate the means of working with knowledge representations (as opposed to data), because knowledge is an abstract structure. There are different forms of knowledge representations derived from data. One of the basic forms is called a schema, which can belong to one of three classes: operational, descriptive, and representation schemas. The goal of this paper is the development of theoretical and practical tools for processing operational schemas. To achieve this goal, we use schema representations elaborated in the mathematical theory of schemas and use structural machines as a powerful theoretical tool for modeling parallel and concurrent computational processes. We describe the schema of autopoietic machines as physical realizations of structural machines. An autopoietic machine is a technical system capable of regenerating, reproducing, and maintaining itself by production, transformation, and destruction of its components and the networks of processes downstream contained in them. We present the theory and practice of designing and implementing autopoietic machines as information processing structures integrating both symbolic computing and neural networks. Autopoietic machines use knowledge structures containing the behavioral evolution of the system and its interactions with the environment to maintain stability by counteracting fluctuations. Full article
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)

18 pages, 1958 KiB  
Article
Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19
by Otmane Azeroual and Renaud Fabre
Big Data Cogn. Comput. 2021, 5(1), 12; https://doi.org/10.3390/bdcc5010012 - 9 Mar 2021
Cited by 26 | Viewed by 10791
Abstract
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having briefly recalled the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper, we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize it as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and its IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we notice that they are at work with a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in opposite contexts of models of partial submodels and of models of final exact systems. In part four, we remark that in both these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits with requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions. To the best of our knowledge, they appear to be the most suitable to overcome COVID-19 massive information challenges. Full article
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
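
As a hedged illustration of the kind of processing the paper attributes to Hadoop, a classic Hadoop Streaming word count over unstructured report text, written as one small Python script used as both mapper and reducer; the input path and record format are assumptions:

```python
#!/usr/bin/env python3
# wordcount_streaming.py: usage "wordcount_streaming.py map" or "... reduce".
# Hadoop Streaming would run it as (jar path varies by installation):
#   hadoop jar hadoop-streaming.jar \
#     -mapper "wordcount_streaming.py map" -reducer "wordcount_streaming.py reduce" \
#     -input /covid/reports -output /covid/token_counts   (paths are assumptions)
import sys

def mapper():
    # Emit (token, 1) for every word in the unstructured report text.
    for line in sys.stdin:
        for token in line.strip().lower().split():
            print(f"{token}\t1")

def reducer():
    # Sum counts per token; Hadoop delivers mapper output sorted by key.
    current, total = None, 0
    for line in sys.stdin:
        token, count = line.rsplit("\t", 1)
        if token != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = token, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```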

21 pages, 7004 KiB  
Article
Automatic Defects Segmentation and Identification by Deep Learning Algorithm with Pulsed Thermography: Synthetic and Experimental Data
by Qiang Fang, Clemente Ibarra-Castanedo and Xavier Maldague
Big Data Cogn. Comput. 2021, 5(1), 9; https://doi.org/10.3390/bdcc5010009 - 26 Feb 2021
Cited by 38 | Viewed by 5877
Abstract
In quality evaluation (QE) in the industrial production field, infrared thermography (IRT) is one of the most important techniques for evaluating composite materials due to its low cost, fast inspection of large surfaces, and safety. The application of deep neural networks is a prominent direction in IRT Non-Destructive Testing (NDT). The Achilles heel of training such networks is the need for a large database, and collecting huge amounts of training data is expensive. In NDT with deep learning, the contribution of synthetic data to training in infrared thermography remains relatively unexplored. In this paper, synthetic data from standard Finite Element Models are combined with experimental data to build repositories for Mask Region-based Convolutional Neural Networks (Mask R-CNN), strengthening the neural network so that it learns the essential features of objects of interest and achieves defect segmentation automatically. These results indicate the possibility of merging inexpensive synthetic data with a certain amount of experimental data for training neural networks, in order to achieve compelling performance from a limited collection of annotated experimental data from a real-world thermography experiment. Full article
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)
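
A hedged sketch of a Mask R-CNN starting point for such a pipeline, using torchvision's pretrained model with the box and mask heads replaced for a background-plus-defect label set; the class count and the two-stage synthetic/experimental training schedule are assumptions:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + "defect" (assumed label set)
# Older torchvision releases use pretrained=True instead of weights="DEFAULT".
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-classification head for our label set.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask head likewise.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)

# Training would first use synthetic FEM thermograms, then fine-tune on the
# smaller annotated experimental set, mirroring the strategy in the abstract.
```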

40 pages, 4247 KiB  
Review
IoT Technologies for Livestock Management: A Review of Present Status, Opportunities, and Future Trends
by Bernard Ijesunor Akhigbe, Kamran Munir, Olugbenga Akinade, Lukman Akanbi and Lukumon O. Oyedele
Big Data Cogn. Comput. 2021, 5(1), 10; https://doi.org/10.3390/bdcc5010010 - 26 Feb 2021
Cited by 66 | Viewed by 18103
Abstract
The world population currently stands at about 7 billion and is expected to grow to 9.4 billion by 2030 and around 10 billion by 2050. This burgeoning population continues to drive up the demand for animal food. Moreover, the management of finite resources such as land, the need to reduce livestock contributions to greenhouse gases, and the need to manage inherently complex, highly contextual, and repetitive day-to-day livestock management (LsM) routines are some examples of challenges to overcome in livestock production. The usefulness of the Internet of Things (IoT) in other vertical industries (OVI) shows that its role in LsM will be significant. This work uses the systematic review methodology of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to guide a review of the existing literature on IoT in OVI. The goal is to identify the IoT’s ecosystem, architecture, and technicalities (present status, opportunities, and expected future trends) regarding its role in LsM. Among the identified IoT roles in LsM, the authors found that data will be its main contributor. The traditional approach of reactive data processing will give way to the proactive approach of augmented analytics, providing insights about animal processes. This will undoubtedly free LsM from the drudgery of repetitive tasks and create opportunities for improved productivity. Full article

21 pages, 3920 KiB  
Article
Big Data and Personalisation for Non-Intrusive Smart Home Automation
by Suriya Priya R. Asaithambi, Sitalakshmi Venkatraman and Ramanathan Venkatraman
Big Data Cogn. Comput. 2021, 5(1), 6; https://doi.org/10.3390/bdcc5010006 - 30 Jan 2021
Cited by 26 | Viewed by 9082
Abstract
With the advent of the Internet of Things (IoT), many different smart home technologies are commercially available. However, the adoption of such technologies is slow, as many of them are not cost-effective and focus on specific functions such as energy efficiency. Recently, IoT devices and sensors have been designed to enhance the quality of personal life through their capability to generate continuous data streams that can be used to monitor and make inferences by the user. While smart home devices connect to the home Wi-Fi network, there are still compatibility issues between devices from different manufacturers. Smart devices get even smarter when they can communicate with and control each other. The information collected by one device can be shared with others to achieve an enhanced automation of their operations. This paper proposes a non-intrusive approach to integrating and collecting data from open-standard IoT devices for personalised smart home automation using big data analytics and machine learning. We demonstrate the implementation of our proposed novel technology instantiation approach for achieving non-intrusive IoT-based big data analytics with a use case of a smart home environment. We employ open-source frameworks such as Apache Spark, Apache NiFi and FB-Prophet along with popular vendor tech-stacks such as Azure and DataBricks. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
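
The forecasting ingredient named above, FB-Prophet, can be pictured with a minimal sketch; the CSV and its contents are hypothetical, but `ds`/`y` follow Prophet's input convention:

```python
import pandas as pd
from prophet import Prophet  # packaged as "fbprophet" in older releases

# Hypothetical smart-home sensor log: timestamp column "ds", reading "y".
df = pd.read_csv("living_room_power.csv", parse_dates=["ds"])

m = Prophet(daily_seasonality=True)
m.fit(df)

future = m.make_future_dataframe(periods=48, freq="H")  # next 48 hours
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```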

21 pages, 8633 KiB  
Article
An Exploratory Study of COVID-19 Information on Twitter in the Greater Region
by Ninghan Chen, Zhiqiang Zhong and Jun Pang
Big Data Cogn. Comput. 2021, 5(1), 5; https://doi.org/10.3390/bdcc5010005 - 28 Jan 2021
Cited by 8 | Viewed by 6101
Abstract
The outbreak of COVID-19 led to a burst of information on major online social networks (OSNs). Facing this constantly changing situation, OSNs have become an essential platform for people to express opinions and seek up-to-the-minute information. Thus, discussions on OSNs may become a reflection of reality. This paper aims to figure out how Twitter users in the Greater Region (GR) and related countries reacted differently over time by conducting a data-driven exploratory study of COVID-19 information using machine learning and representation learning methods. We find that tweet volume and COVID-19 cases in the GR and related countries are correlated, but this correlation only exists in a particular period of the pandemic. Moreover, we plot how topics changed in each country and region from 22 January 2020 to 5 June 2020, identifying the main differences between the GR and related countries. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
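
The tweet-volume/case-count correlation can be checked with a few lines; the daily series file and column names are assumptions:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical daily series for the Greater Region, indexed by date.
df = pd.read_csv("gr_daily.csv", parse_dates=["date"], index_col="date")

r, p = pearsonr(df["tweet_volume"], df["covid_cases"])
print(f"overall Pearson r = {r:.2f} (p = {p:.3f})")

# The abstract notes the correlation holds only in part of the pandemic,
# so a rolling window shows where it appears and disappears.
rolling_r = df["tweet_volume"].rolling("30D").corr(df["covid_cases"])
print(rolling_r.dropna().head())
```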

16 pages, 1054 KiB  
Article
NLP-Based Customer Loyalty Improvement Recommender System (CLIRS2)
by Katarzyna Anna Tarnowska and Zbigniew Ras
Big Data Cogn. Comput. 2021, 5(1), 4; https://doi.org/10.3390/bdcc5010004 - 19 Jan 2021
Cited by 23 | Viewed by 6794
Abstract
Structured data on customer feedback is becoming more costly and time-consuming to collect and organize. On the other hand, unstructured opinionated data, e.g., in the form of free-text comments, is proliferating and available on public websites, such as social media websites, blogs, forums, and websites that provide recommendations. This research proposes a novel method to develop a knowledge-based recommender system from unstructured (text) data. The method is based on applying an opinion mining algorithm, extracting an aspect-based sentiment score per text item, and transforming the text into a structured form. An action rule mining algorithm is then applied to the data table constructed from sentiment mining. The proposed application of the method is the problem of improving customer satisfaction ratings. The results obtained from a dataset of customer comments related to repair services were evaluated for accuracy and coverage. Further, the results were incorporated into the framework of a web-based, user-friendly recommender system that advises the business on how to maximally increase its profits by introducing minimal sets of changes to its service. Experiments and evaluation results comparing the structured data-based version of the system, CLIRS (Customer Loyalty Improvement Recommender System), with the unstructured data-based version (CLIRS2) are provided. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
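
A hedged sketch of the text-to-structured-table step (not the paper's exact opinion-mining algorithm): each comment becomes one row of aspect-level sentiment scores, here using TextBlob polarity and a stand-in aspect lexicon:

```python
from textblob import TextBlob  # needs: python -m textblob.download_corpora

# Stand-in aspect lexicon for a repair-service domain (hypothetical).
ASPECTS = {"staff": ["staff", "technician"],
           "price": ["price", "cost", "charge"],
           "speed": ["fast", "slow", "wait"]}

def to_structured_row(comment: str) -> dict:
    """Turn one free-text comment into an aspect-sentiment feature row."""
    row = {aspect: None for aspect in ASPECTS}
    for sentence in TextBlob(comment).sentences:
        lowered = sentence.raw.lower()
        for aspect, keywords in ASPECTS.items():
            if any(k in lowered for k in keywords):
                row[aspect] = sentence.sentiment.polarity  # in [-1, 1]
    return row

print(to_structured_row("The technician was great, but the wait was far too long."))
# Rows like this form the table that action rule mining is applied to.
```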

24 pages, 1556 KiB  
Review
Forecasting Plant and Crop Disease: An Explorative Study on Current Algorithms
by Gianni Fenu and Francesca Maridina Malloci
Big Data Cogn. Comput. 2021, 5(1), 2; https://doi.org/10.3390/bdcc5010002 - 12 Jan 2021
Cited by 91 | Viewed by 14659
Abstract
Every year, plant diseases cause a significant loss of valuable food crops around the world. The plant and crop disease management practices implemented to mitigate damage have changed considerably. Today, through the application of new information and communication technologies, it is possible to predict the onset or a change in the severity of diseases using modern big data analysis techniques. In this paper, we present an analysis and classification of research studies conducted over the past decade that forecast the onset of disease at a pre-symptomatic stage (i.e., symptoms not visible to the naked eye) or at an early stage. We examine the specific approaches and methods adopted, the pre-processing techniques and data used, the performance metrics, and the expected results, highlighting the issues encountered. The results of the study reveal that this practice is still in its infancy and that many barriers need to be overcome. Full article

14 pages, 315 KiB  
Article
eGAP: An Evolutionary Game Theoretic Approach to Random Forest Pruning
by Khaled Fawagreh and Mohamed Medhat Gaber
Big Data Cogn. Comput. 2020, 4(4), 37; https://doi.org/10.3390/bdcc4040037 - 28 Nov 2020
Cited by 4 | Viewed by 4947
Abstract
To make healthcare available and easily accessible, the Internet of Things (IoT), which paved the way for the construction of smart cities, marked the birth of many smart applications in numerous areas, including healthcare. As a result, smart healthcare applications have been and are being developed to provide, using mobile and electronic technology, higher-quality diagnosis of diseases, better treatment of patients, and improved quality of life. Since smart healthcare applications concerned with predicting healthcare data (diseases, for example) rely on predictive healthcare data analytics, such analytics must be as accurate as possible. In this paper, we exploit supervised machine learning methods in classification and regression to improve the performance of the traditional Random Forest on healthcare datasets, both in terms of accuracy and classification/regression speed, in order to produce an effective and efficient smart healthcare application, which we have termed eGAP. eGAP uses the evolutionary game theoretic approach of replicator dynamics to evolve a Random Forest ensemble. Trees of high resemblance in an initial Random Forest are clustered, and clusters then grow and shrink by adding and removing trees using replicator dynamics, according to the predictive accuracy of each subforest represented by a cluster of trees. All clusters start with a number of trees equal to the number of trees in the smallest cluster. Cluster growth is performed using trees that are not initially sampled. The speed and accuracy of the proposed method have been demonstrated in an experimental study on 10 classification and 10 regression medical datasets. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
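
The replicator-dynamics step can be illustrated in a few lines: cluster shares grow in proportion to how much each subforest's accuracy exceeds the population average. The cluster count and fitness values here are made up for illustration.

```python
import numpy as np

def replicator_step(x, fitness):
    """One discrete replicator update: x_i <- x_i * f_i / (x . f)."""
    return x * fitness / (x @ fitness)

x = np.full(5, 0.2)                                 # five equal tree clusters
fitness = np.array([0.81, 0.77, 0.90, 0.70, 0.85])  # subforest accuracies
for _ in range(50):
    x = replicator_step(x, fitness)

print(x.round(3))  # shares concentrate on the most accurate subforests
```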

17 pages, 2977 KiB  
Article
Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport
by Francesco Branda, Fabrizio Marozzo and Domenico Talia
Big Data Cogn. Comput. 2020, 4(4), 36; https://doi.org/10.3390/bdcc4040036 - 27 Nov 2020
Cited by 21 | Viewed by 10018
Abstract
In recent years, the demand for collective mobility services has registered significant growth. In particular, the long-distance coach market underwent an important change in Europe when FlixBus adopted a dynamic pricing strategy, providing low-cost transport services and an efficient and fast information system. This paper presents a methodology, called DA4PT (Data Analytics for Public Transport), for discovering the factors that influence travelers in booking and purchasing bus tickets. Starting from a set of 3.23 million user-generated event logs of a bus ticketing platform, the methodology derives correlation rules between booking factors and ticket purchases. Such rules are then used to train machine learning models for predicting whether or not a user will buy a ticket. The rules are also used to define various dynamic pricing strategies with the purpose of increasing the number of ticket sales on the platform and the related revenue. The methodology reaches an accuracy of 95% in forecasting the purchase of a ticket, with low variance in the results. Exploiting a dynamic pricing strategy, DA4PT is able to increase the number of purchased tickets by 6% and total revenue by 9%, showing the effectiveness of the proposed approach. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
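
A minimal sketch of the purchase-prediction step on features distilled from booking logs; the classifier choice, file, and feature names are assumptions rather than the paper's exact setup:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

events = pd.read_csv("booking_events.csv")  # hypothetical distilled log
X = events[["days_before_departure", "price", "searches_so_far", "is_weekend"]]
y = events["purchased"]  # 1 if the session ended in a ticket purchase

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```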

30 pages, 975 KiB  
Article
Engineering Human–Machine Teams for Trusted Collaboration
by Basel Alhaji, Janine Beecken, Rüdiger Ehlers, Jan Gertheiss, Felix Merz, Jörg P. Müller, Michael Prilla, Andreas Rausch, Andreas Reinhardt, Delphine Reinhardt, Christian Rembe, Niels-Ole Rohweder, Christoph Schwindt, Stephan Westphal and Jürgen Zimmermann
Big Data Cogn. Comput. 2020, 4(4), 35; https://doi.org/10.3390/bdcc4040035 - 23 Nov 2020
Cited by 11 | Viewed by 10218
Abstract
The way humans and artificially intelligent machines interact is undergoing a dramatic change. This change becomes particularly apparent in domains where humans and machines collaboratively work on joint tasks or objects in teams, such as in industrial assembly or disassembly processes. While there is intensive research work on human–machine collaboration in different research disciplines, systematic and interdisciplinary approaches towards engineering systems that consist of or comprise human–machine teams are still rare. In this paper, we review and analyze the state of the art, and derive and discuss core requirements and concepts by means of an illustrative scenario. In terms of methods, we focus on how reciprocal trust between humans and intelligent machines is defined, built, measured, and maintained from a systems engineering and planning perspective in the literature. Based on our analysis, we propose and outline three important areas of future research on engineering and operating human–machine teams for trusted collaboration. For each area, we describe exemplary research opportunities. Full article

17 pages, 11783 KiB  
Article
A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19
by Toni Pano and Rasha Kashef
Big Data Cogn. Comput. 2020, 4(4), 33; https://doi.org/10.3390/bdcc4040033 - 9 Nov 2020
Cited by 122 | Viewed by 17817
Abstract
During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, play a significant role as a meaningful indicator in forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets when developing an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimal preprocessing strategy would help machine learning prediction models achieve better accuracy relative to actual prices. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
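
The VADER scoring and price correlation can be sketched briefly; file and column names are assumptions, and the preprocessing choices the paper compares are omitted:

```python
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = pd.read_csv("btc_tweets.csv", parse_dates=["created_at"])
tweets["compound"] = tweets["text"].map(
    lambda t: analyzer.polarity_scores(t)["compound"])

daily_sentiment = tweets.set_index("created_at")["compound"].resample("D").mean()
prices = pd.read_csv("btc_prices.csv", parse_dates=["date"],
                     index_col="date")["close"]

print("correlation:", daily_sentiment.corr(prices))
```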

20 pages, 2980 KiB  
Article
Using Big and Open Data to Generate Content for an Educational Game to Increase Student Performance and Interest
by Irene Vargianniti and Kostas Karpouzis
Big Data Cogn. Comput. 2020, 4(4), 30; https://doi.org/10.3390/bdcc4040030 - 22 Oct 2020
Cited by 12 | Viewed by 5584
Abstract
The goal of this paper is to utilize available big and open data sets to create content for a board game and a digital game, and to implement an educational environment that improves students’ familiarity with concepts and relations in the data and, in the process, their academic performance and engagement. To this end, we used Wikipedia data to generate content for a Monopoly clone called Geopoly and designed a game-based learning experiment. Our research examines whether this game had any impact on students’ performance, which is related to identifying implied ranking and grouping mechanisms in the game; whether performance is correlated with interest; and whether performance differs across genders. Student performance and knowledge about the relationships contained in the data improved significantly after playing the game, while the positive correlation between student interest and performance illustrated the relationship between them. This was also verified by a digital version of the game, evaluated by the students during the COVID-19 pandemic; initial results revealed that students found the game more attractive and rewarding than a traditional geography lesson. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)
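
A hedged sketch of turning open Wikipedia data into game content: fetching short country summaries via the public REST endpoint to use as tile or card text. The country list and the use of summaries are illustrative; the paper's actual ranking attributes are not reproduced here.

```python
import requests

COUNTRIES = ["Greece", "Italy", "Portugal", "Spain"]  # illustrative board tiles

for name in COUNTRIES:
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{name}"
    page = requests.get(url, timeout=10).json()
    print(page["title"], "->", page["extract"][:80], "...")
```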

27 pages, 2515 KiB  
Review
A Review of Blockchain in Internet of Things and AI
by Hany F. Atlam, Muhammad Ajmal Azad, Ahmed G. Alzahrani and Gary Wills
Big Data Cogn. Comput. 2020, 4(4), 28; https://doi.org/10.3390/bdcc4040028 - 14 Oct 2020
Cited by 107 | Viewed by 16653
Abstract
The Internet of Things (IoT) represents a new technology that enables virtual and physical objects to be connected and to communicate with each other, producing new digitized services that improve our quality of life. While the IoT system provides several advantages, its current centralized architecture introduces numerous issues involving a single point of failure, security, privacy, transparency, and data integrity. These challenges are an obstacle in the way of future developments of IoT applications. Moving the IoT onto one of the distributed ledger technologies may be the correct choice to resolve these issues. Among the common and popular types of distributed ledger technologies is the blockchain. Integrating the IoT with blockchain technology can bring countless benefits. Therefore, this paper provides a comprehensive discussion of integrating the IoT system with blockchain technology. After providing the basics of the IoT system and blockchain technology, a thorough review of their integration is presented, highlighting the benefits of the integration and how the blockchain can resolve the issues of the IoT system. Then, blockchain as a service for the IoT is presented to show how various features of blockchain technology can be implemented as a service for various IoT applications. This is followed by a discussion of the impact of integrating artificial intelligence (AI) on both the IoT and blockchain. Finally, future research directions for IoT with blockchain are presented. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)

18 pages, 2777 KiB  
Article
Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps
by Menglu Li, Rasha Kashef and Ahmed Ibrahim
Big Data Cogn. Comput. 2020, 4(4), 24; https://doi.org/10.3390/bdcc4040024 - 23 Sep 2020
Cited by 16 | Viewed by 5680
Abstract
Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue. For example, illegitimate online transactions can be detected from their patterns with outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level clustering-based outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets from different fields with different sizes and dimensions. Experimental analysis has shown that the proposed MCOD algorithm improves the outlier detection rate compared to traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of fraud/outliers, increasing profitability and/or enhancing business outcomes. Full article
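
The title names self-organizing maps as the clustering backbone, so here is a minimal SOM-based outlier score (distance of each point to its best-matching unit) using the MiniSom package; MiniSom and the 99th-percentile threshold are illustrative stand-ins, not the paper's multi-level algorithm:

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))
data[:5] += 6  # inject a few synthetic outliers

som = MiniSom(10, 10, data.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, 1000)

W = som.get_weights()
scores = np.array([np.linalg.norm(x - W[som.winner(x)]) for x in data])
threshold = np.quantile(scores, 0.99)  # flag the top 1% (assumed cutoff)
print("flagged indices:", np.where(scores > threshold)[0])
```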

19 pages, 3738 KiB  
Article
Keyword Search over RDF: Is a Single Perspective Enough?
by Christos Nikas, Giorgos Kadilierakis, Pavlos Fafalios and Yannis Tzitzikas
Big Data Cogn. Comput. 2020, 4(3), 22; https://doi.org/10.3390/bdcc4030022 - 27 Aug 2020
Cited by 15 | Viewed by 6073
Abstract
Since the task of accessing RDF datasets through structured query languages like SPARQL is rather demanding for ordinary users, there are various approaches that attempt to exploit the simpler and widely used keyword-based search paradigm. However, this task is challenging: there is no clear unit of retrieval and presentation, user information needs are often not clearly formulated, the underlying RDF datasets are often incomplete, and no single presentation method is appropriate for all kinds of information needs. As a means to alleviate these problems, in this paper we investigate an interaction approach that offers multiple presentation methods for the search results (multiple perspectives), allowing the user to easily switch between these perspectives and thus exploit the added value that each perspective offers. We focus on a set of fundamental perspectives, discuss the benefits of each one, compare this approach with related existing systems, and report the results of a task-based evaluation with users. The key finding of the task-based evaluation is that users not familiar with RDF (a) managed to complete the information-seeking tasks (with performance very close to that of the experienced users), and (b) rated the approach positively. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
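
The premise above is that structured querying is too demanding for ordinary users; for contrast, this is the kind of SPARQL a keyword interface must hide, issued with SPARQLWrapper against the public DBpedia endpoint:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?film WHERE {
        ?film a dbo:Film ;
              dbo:director dbr:Stanley_Kubrick .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["film"]["value"])
```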

27 pages, 4915 KiB  
Article
MOBDA: Microservice-Oriented Big Data Architecture for Smart City Transport Systems
by Suriya Priya R. Asaithambi, Ramanathan Venkatraman and Sitalakshmi Venkatraman
Big Data Cogn. Comput. 2020, 4(3), 17; https://doi.org/10.3390/bdcc4030017 - 9 Jul 2020
Cited by 31 | Viewed by 10617
Abstract
Highly populated cities depend heavily on intelligent transportation systems (ITSs) for reliable and efficient resource utilization and traffic management. Current transportation systems struggle to meet different stakeholder expectations while trying their best to optimize resources in providing various transport services. This paper proposes a Microservice-Oriented Big Data Architecture (MOBDA) incorporating data processing techniques, such as predictive modelling, for achieving the smart transportation and analytics microservices required for the smart cities of the future. We postulate key transportation metrics applied to various sources of transportation data to serve this objective. A novel hybrid architecture is proposed to combine stream processing and batch processing of big data for a smart computation of microservice-oriented transportation metrics that can serve the different needs of stakeholders. Development of such an architecture for smart transportation and analytics will improve the predictability of transport supply for transport providers and transport authorities, as well as enhance consumer satisfaction during peak periods. Full article
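
A minimal sketch of the stream-processing half of such a hybrid architecture, computing a windowed per-route delay metric with Spark Structured Streaming; the Kafka topic, schema, and metric are assumptions, and the Kafka connector package must be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("mobda-sketch").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "bus-arrivals")   # hypothetical topic
          .load())

schema = StructType([StructField("route", StringType()),
                     StructField("delay_min", DoubleType()),
                     StructField("ts", TimestampType())])

parsed = (events.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Average delay per route over 5-minute windows, tolerating late events.
delays = (parsed.withWatermark("ts", "10 minutes")
          .groupBy(window("ts", "5 minutes"), "route")
          .agg(avg("delay_min").alias("avg_delay")))

delays.writeStream.outputMode("update").format("console").start()
```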

20 pages, 4375 KiB  
Article
TranScreen: Transfer Learning on Graph-Based Anti-Cancer Virtual Screening Model
by Milad Salem, Aminollah Khormali, Arash Keshavarzi Arshadi, Julia Webb and Jiann-Shiun Yuan
Big Data Cogn. Comput. 2020, 4(3), 16; https://doi.org/10.3390/bdcc4030016 - 29 Jun 2020
Cited by 11 | Viewed by 6847
Abstract
Deep learning’s automatic feature extraction has proven its superior performance over traditional fingerprint-based features in the implementation of virtual screening models. However, these models face multiple challenges in the field of early drug discovery, such as over-training and generalization to unseen data, due to the inherently unbalanced and small datasets. In this work, the TranScreen pipeline is proposed, which utilizes transfer learning and a collection of weight initializations to overcome these challenges. A total of 182 graph convolutional neural networks are trained on molecular source datasets, and the learned knowledge is transferred to the target task for fine-tuning. The target task of p53-based bioactivity prediction, an important factor for anti-cancer discovery, is chosen to showcase the capability of the pipeline. Having trained a collection of source models, three different approaches are implemented to compare and rank them for a given task before fine-tuning. The results show improved model performance in multiple cases, with the best model increasing the area under the receiver operating characteristic curve (ROC-AUC) from 0.75 to 0.91 and the recall from 0.25 to 1. This improvement is vital for practical virtual screening, as it lowers false negatives, and it demonstrates the potential of transfer learning. The code and pre-trained models are made accessible online. Full article
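
The transfer-then-fine-tune pattern can be pictured with a minimal PyTorch sketch; the tiny feed-forward encoder stands in for the paper's graph convolutional networks, and the checkpoint path is hypothetical:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stand-in
        self.head = nn.Linear(64, 1)  # task-specific output (p53 bioactivity)

    def forward(self, x):
        return self.head(self.encoder(x))

model = Net()

# Transfer: load weights learned on a molecular source task; strict=False
# tolerates the head differing between source and target tasks.
state = torch.load("source_model.pt")
model.load_state_dict(state, strict=False)

# Fine-tune: let the new head learn faster than the transferred encoder.
optimizer = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": 1e-4},
    {"params": model.head.parameters(), "lr": 1e-3},
])
```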

23 pages, 3397 KiB  
Article
#lockdown: Network-Enhanced Emotional Profiling in the Time of COVID-19
by Massimo Stella, Valerio Restocchi and Simon De Deyne
Big Data Cogn. Comput. 2020, 4(2), 14; https://doi.org/10.3390/bdcc4020014 - 16 Jun 2020
Cited by 46 | Viewed by 9773
Abstract
The COVID-19 pandemic forced countries all over the world to take unprecedented measures, like nationwide lockdowns. To adequately understand the emotional and social repercussions, a large-scale reconstruction of how people perceived these unexpected events is necessary but currently missing. We address this gap through social media by introducing MERCURIAL (Multi-layer Co-occurrence Networks for Emotional Profiling), a framework which exploits linguistic networks of words and hashtags to reconstruct the social discourse describing real-world events. We use MERCURIAL to analyse 101,767 tweets from Italy, the first country to react to the COVID-19 threat with a nationwide lockdown. The data were collected between 11 and 17 March, immediately after the announcement of the Italian lockdown and the WHO declaring COVID-19 a pandemic. Our analysis provides unique insights into the psychological burden of this crisis, focusing on (i) the Italian official campaign for self-quarantine (#iorestoacasa), (ii) the national lockdown (#italylockdown), and (iii) social denouncement (#sciacalli). Our exploration unveils the emergence of complex emotional profiles, where anger and fear (towards political debates and socio-economic repercussions) coexisted with trust, solidarity, and hope (related to the institutions and local communities). We discuss our findings in relation to mental well-being issues and coping mechanisms, like instigation to violence, grieving, and solidarity. We argue that our framework represents an innovative thermometer of emotional status, a powerful tool for policy makers to quickly gauge feelings in massive audiences and devise appropriate responses based on cognitive data. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
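
The word/hashtag co-occurrence layer at the heart of such a framework reduces to a weighted graph; a minimal sketch with networkx on toy hashtag sets:

```python
import itertools
import networkx as nx

# Toy tweets reduced to hashtag sets (illustrative, not the paper's corpus).
tweets = [{"iorestoacasa", "italylockdown"},
          {"iorestoacasa", "andratuttobene"},
          {"italylockdown", "sciacalli"},
          {"iorestoacasa", "italylockdown"}]

G = nx.Graph()
for tags in tweets:
    for a, b in itertools.combinations(sorted(tags), 2):
        # Edge weight counts how often two hashtags co-occur in a tweet.
        w = G.edges[a, b]["weight"] if G.has_edge(a, b) else 0
        G.add_edge(a, b, weight=w + 1)

print(sorted(G.edges(data="weight"), key=lambda e: -e[2])[:3])
```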

12 pages, 227 KiB  
Case Report
The “Social” Side of Big Data: Teaching BD Analytics to Political Science Students
by Giampiero Giacomello and Oltion Preka
Big Data Cogn. Comput. 2020, 4(2), 13; https://doi.org/10.3390/bdcc4020013 - 5 Jun 2020
Cited by 6 | Viewed by 5092
Abstract
In an increasingly technology-dependent world, it is not surprising that STEM (Science, Technology, Engineering, and Mathematics) graduates are in high demand. This state of affairs, however, has made the public overlook the fact that not only are computing and artificial intelligence naturally interdisciplinary, but a huge portion of generated data comes from human–computer interactions and is thus social in character and nature. Hence, social science practitioners should be in demand too, but this does not seem to be the case. One of the reasons for this situation is that political and social science departments worldwide tend to remain in their “comfort zone” and see their disciplines quite traditionally, but by doing so they cut themselves off from many positions available today. The authors believed that these conditions should and could be changed, and thus over a few years created a specifically tailored course for students in Political Science. This paper examines the experience of the last year of this program, which, after several tweaks and adjustments, is now fully operational. The results and students’ appreciation are quite remarkable. Hence, the authors considered the experience worth sharing, so that colleagues in social and political science departments may feel encouraged to follow and replicate such an example. Full article
21 pages, 2678 KiB  
Article
Artificial Intelligence-Enhanced Predictive Insights for Advancing Financial Inclusion: A Human-Centric AI-Thinking Approach
by Meng-Leong How, Sin-Mei Cheah, Aik Cheow Khor and Yong Jiet Chan
Big Data Cogn. Comput. 2020, 4(2), 8; https://doi.org/10.3390/bdcc4020008 - 27 Apr 2020
Cited by 31 | Viewed by 8755
Abstract
According to the World Bank, a key factor to poverty reduction and improving prosperity is financial inclusion. Financial service providers (FSPs) offering financially-inclusive solutions need to understand how to approach the underserved successfully. The application of artificial intelligence (AI) on legacy data can help FSPs to anticipate how prospective customers may respond when they are approached. However, it remains challenging for FSPs who are not well-versed in computer programming to implement AI projects. This paper proffers a no-coding human-centric AI-based approach to simulate the possible dynamics between the financial profiles of prospective customers collected from 45,211 contact encounters and predict their intentions toward the financial products being offered. This approach contributes to the literature by illustrating how AI for social good can also be accessible for people who are not well-versed in computer science. A rudimentary AI-based predictive modeling approach that does not require programming skills will be illustrated in this paper. In these AI-generated multi-criteria optimizations, analysts in FSPs can simulate scenarios to better understand their prospective customers. In conjunction with the usage of AI, this paper also suggests how AI-Thinking could be utilized as a cognitive scaffold for educing (drawing out) actionable insights to advance financial inclusion. Full article

28 pages, 10611 KiB  
Article
Hydria: An Online Data Lake for Multi-Faceted Analytics in the Cultural Heritage Domain
by Kimon Deligiannis, Paraskevi Raftopoulou, Christos Tryfonopoulos, Nikos Platis and Costas Vassilakis
Big Data Cogn. Comput. 2020, 4(2), 7; https://doi.org/10.3390/bdcc4020007 - 23 Apr 2020
Cited by 15 | Viewed by 6472
Abstract
Advancements in cultural informatics have significantly influenced the way we perceive, analyze, communicate and understand culture. New data sources, such as social media, digitized cultural content, and Internet of Things (IoT) devices, have allowed us to enrich and customize the cultural experience, but at the same time have created an avalanche of new data that needs to be stored and appropriately managed in order to be of value. Although data management plays a central role in driving forward the cultural heritage domain, the solutions applied so far are fragmented, physically distributed, require specialized IT knowledge to deploy, and entail significant IT experience to operate even for trivial tasks. In this work, we present Hydria, an online data lake that allows users without any IT background to harvest, store, organize, analyze and share heterogeneous, multi-faceted cultural heritage data. Hydria provides a zero-administration, zero-cost, integrated framework that enables researchers, museum curators and other stakeholders within the cultural heritage domain to easily (i) deploy data acquisition services (like social media scrapers, focused web crawlers, dataset imports, questionnaire forms), (ii) design and manage versatile customizable data stores, (iii) share whole datasets or horizontal/vertical data shards with other stakeholders, (iv) search, filter and analyze data via an expressive yet simple-to-use graphical query engine and visualization tools, and (v) perform user management and access control operations on the stored data. To the best of our knowledge, this is the first solution in the literature that focuses on collecting, managing, analyzing, and sharing diverse, multi-faceted data in the cultural heritage domain and targets users without an IT background. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

53 pages, 5668 KiB  
Review
Big Data and Its Applications in Smart Real Estate and the Disaster Management Life Cycle: A Systematic Analysis
by Hafiz Suliman Munawar, Siddra Qayyum, Fahim Ullah and Samad Sepasgozar
Big Data Cogn. Comput. 2020, 4(2), 4; https://doi.org/10.3390/bdcc4020004 - 26 Mar 2020
Cited by 93 | Viewed by 25157
Abstract
Big data is the concept of enormous amounts of data being generated daily in different fields due to the increased use of technology and internet sources. Despite the various advancements and the hopes of better understanding, big data management and analysis remain a challenge, calling for more rigorous and detailed research, as well as the identification of methods and ways in which big data can be tackled and put to good use. The existing research falls short in discussing and evaluating the pertinent tools and technologies for analyzing big data efficiently, which calls for a comprehensive and holistic analysis of the published articles to summarize the concept of big data and survey its field-specific applications. To address this gap and keep a recent focus, research articles published in the last decade in top-tier, high-impact journals were retrieved using the search engines of Google Scholar, Scopus, and Web of Science and narrowed down to a set of 139 relevant research articles. Different analyses were conducted on the retrieved papers, including bibliometric analysis, keyword analysis, big data search trends, and the authors' names, countries, and affiliated institutes contributing the most to the field of big data. The comparative analyses show that, conceptually, big data lies at the intersection of the storage, statistics, technology, and research fields and emerged as an amalgam of these four fields with interlinked aspects such as data hosting and computing, data management, data refining, data patterns, and machine learning. The results further show that the major characteristics of big data can be summarized using the seven Vs: variety, volume, variability, value, visualization, veracity, and velocity. Furthermore, the existing methods for big data analysis, their shortcomings, and the possible directions that could be taken to harness technology so that data analysis tools become fast and efficient were also explored. The major challenges in handling big data include efficient storage, retrieval, analysis, and visualization of large heterogeneous data, which can be tackled through authentication such as Kerberos and encrypted files, logging of attacks, secure communication through Secure Sockets Layer (SSL) and Transport Layer Security (TLS), data imputation, building learning models, dividing computations into sub-tasks, checkpointing recursive tasks, and using Solid State Drives (SSD) and Phase Change Material (PCM) for storage. In terms of frameworks for big data management, two major frameworks exist, Hadoop and Apache Spark, which should be used together to capture the holistic essence of the data and make the analyses meaningful and swift. Further field-specific applications of big data in two promising and integrated fields, i.e., smart real estate and disaster management, were investigated, and a framework for field-specific applications, as well as a merger of the two areas through big data, was highlighted. The proposed frameworks show that big data can tackle the ever-present issue of customer regret caused by poor-quality or missing information in smart real estate, increasing customer satisfaction through an intermediate organization that processes and checks the data provided to customers by sellers and real estate managers. Similarly, for disaster risk management, data from social media, drones, multimedia, and search engines can be used to tackle natural disasters such as floods, bushfires, and earthquakes, as well as to plan emergency responses. In addition, a merged framework for smart real estate and disaster risk management shows that big data generated from smart real estate, in the form of occupant data, facilities management, and building integration and maintenance, can be shared with disaster risk management and emergency response teams to help prevent, prepare for, respond to, or recover from disasters. Full article
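As one concrete illustration of the recommended Hadoop-plus-Spark pairing, the hedged sketch below reads records from HDFS (the storage layer) and aggregates them with Spark (the in-memory analysis layer); the HDFS path and column names are assumptions, not artifacts from the review.

```python
# Sketch of the Hadoop + Spark pairing the review recommends: HDFS provides
# distributed storage while Spark performs in-memory analysis on top of it.
# The HDFS URL, file, and columns are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("smart-real-estate-demo").getOrCreate()

# Read listing records stored on HDFS (path is hypothetical).
listings = spark.read.csv("hdfs:///data/listings.csv", header=True,
                          inferSchema=True)

# A simple aggregation of the kind used to flag poor-quality information.
summary = (listings
           .groupBy("suburb")
           .agg(F.avg("price").alias("avg_price"),
                F.count("*").alias("n_listings")))
summary.show()
spark.stop()
```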
34 pages, 637 KiB  
Article
Text Mining in Big Data Analytics
by Hossein Hassani, Christina Beneki, Stephan Unger, Maedeh Taj Mazinani and Mohammad Reza Yeganegi
Big Data Cogn. Comput. 2020, 4(1), 1; https://doi.org/10.3390/bdcc4010001 - 16 Jan 2020
Cited by 178 | Viewed by 29456
Abstract
Text mining in big data analytics is emerging as a powerful tool for harnessing unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining the developments within the published literature over the past years and to provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. Accordingly, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are briefly outlined. Full article
(This article belongs to the Special Issue Knowledge Modelling and Learning through Cognitive Networks)
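For readers new to the area, the following minimal sketch shows one of the standard techniques covered by such surveys, TF-IDF term weighting; the three example documents are invented.

```python
# Minimal text-mining sketch: extract the most salient terms from a small
# corpus with TF-IDF, one of the standard techniques surveyed in reviews
# of this kind. The example documents are invented.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Text mining extracts knowledge from unstructured textual data.",
    "Social media posts and emails are rich sources of unstructured text.",
    "Big data analytics identifies hidden patterns and correlations.",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(docs)

# Print the top-weighted terms per document.
terms = vec.get_feature_names_out()
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: -t[1])[:3]
    print(f"doc {i}:", [t for t, w in top])
```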
17 pages, 768 KiB  
Article
Emotional Decision-Making Biases Prediction in Cyber-Physical Systems
by Alberto Corredera, Marta Romero and Jose M. Moya
Big Data Cogn. Comput. 2019, 3(3), 49; https://doi.org/10.3390/bdcc3030049 - 30 Aug 2019
Cited by 1 | Viewed by 4877
Abstract
This article addresses the challenge of discovering trends in decision-making by capturing emotional data and the influence of possible external stimuli. We conducted an experiment with a significant sample of the workforce and used machine-learning techniques to model the decision-making process. We studied the trends introduced by emotional status and by the external stimuli that make personnel act or report to the supervisor. The main result of this study is a model capable of predicting the bias to act in a specific context. We studied the relationship between emotions and the probability of acting or correcting the system. The main interest of these issues is the ability to influence personnel in advance to make their work more efficient and productive, which would open a whole new line of research for the future. Full article
18 pages, 380 KiB  
Article
Optimal Number of Choices in Rating Contexts
by Sam Ganzfried and Farzana Beente Yusuf
Big Data Cogn. Comput. 2019, 3(3), 48; https://doi.org/10.3390/bdcc3030048 - 27 Aug 2019
Cited by 2 | Viewed by 4414
Abstract
In many settings, people must give numerical scores to entities from a small discrete set, for instance, rating physical attractiveness from 1–5 on dating sites, or papers from 1–10 for conference reviewing. We study the problem of understanding when using a different number of options is optimal. We consider the cases where scores are uniform random and Gaussian. We study computationally when using 2, 3, 4, 5, and 10 options out of a total of 100 is optimal in these models (though our theoretical analysis is for a more general setting with k choices from n total options as well as a continuous underlying space). One may expect that using more options would always improve performance in this model, but we show that this is not necessarily the case, and that using fewer choices, even just two, can surprisingly be optimal in certain situations. While in theory for this setting it would be optimal to use all 100 options, in practice this is prohibitive, and it is preferable to utilize a smaller number of options due to humans' limited computational resources. Our results could have many potential applications, as settings requiring entities to be ranked by humans are ubiquitous. There could also be applications to other fields such as signal or image processing, where input values from a large set must be mapped to output values in a smaller set. Full article
(This article belongs to the Special Issue Computational Models of Cognition and Learning)
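As a hedged illustration of the experimental setup, the toy Monte Carlo below quantizes 100 underlying scores into k rating options and measures reconstruction error for uniform and Gaussian score models. Note that in this deliberately simplified proxy more options always help; the paper's richer model is precisely what produces the counterintuitive result that fewer options can be optimal.

```python
# Toy Monte Carlo in the spirit of the study: quantize 100 underlying
# scores into k rating options and measure reconstruction error. The
# paper's exact loss function may differ; this is an illustrative proxy.
import numpy as np

rng = np.random.default_rng(0)

def mean_error(k, dist, trials=2000, n=100):
    errs = []
    for _ in range(trials):
        x = rng.random(n) if dist == "uniform" else rng.normal(0.5, 0.15, n)
        x = np.clip(x, 0.0, 1.0)
        # Map each score to the midpoint of one of k equal-width buckets.
        bucket = np.minimum((x * k).astype(int), k - 1)
        xhat = (bucket + 0.5) / k
        errs.append(np.mean((x - xhat) ** 2))
    return np.mean(errs)

for dist in ("uniform", "gaussian"):
    for k in (2, 3, 4, 5, 10):
        print(dist, k, round(mean_error(k, dist), 5))
```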
24 pages, 1025 KiB  
Article
PerTract: Model Extraction and Specification of Big Data Systems for Performance Prediction by the Example of Apache Spark and Hadoop
by Johannes Kroß and Helmut Krcmar
Big Data Cogn. Comput. 2019, 3(3), 47; https://doi.org/10.3390/bdcc3030047 - 9 Aug 2019
Cited by 10 | Viewed by 6113
Abstract
Evaluating and predicting the performance of big data applications is required to efficiently size capacities and manage operations. Gaining profound insights into the system architecture, dependencies of components, resource demands, and configurations is difficult for engineers. To address these challenges, this paper presents an approach to automatically extract and transform system specifications to predict the performance of applications. It consists of three components. First, a system- and tool-agnostic domain-specific language (DSL) allows the modeling of performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed into model- and simulation-based performance evaluation tools to allow predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios, such as changing data input and resources. We evaluate our approach by predicting the performance of linear regression and random forest applications from the HiBench benchmark suite. Simulation results of adjusted DSL instances compared to measurement results show accurate predictions, with errors below 15% based on averages for response times and resource utilization. Full article
19 pages, 502 KiB  
Article
Viability in Multiplex Lexical Networks and Machine Learning Characterizes Human Creativity
by Massimo Stella and Yoed N. Kenett
Big Data Cogn. Comput. 2019, 3(3), 45; https://doi.org/10.3390/bdcc3030045 - 31 Jul 2019
Cited by 31 | Viewed by 7573
Abstract
Previous studies have shown how individual differences in creativity relate to differences in the structure of semantic memory. However, the latter is only one aspect of the whole mental lexicon, a repository of conceptual knowledge that is considered to simultaneously include multiple types of conceptual similarities. In the current study, we apply a multiplex network approach to compute a representation of the mental lexicon combining semantics and phonology and examine how it relates to individual differences in creativity. This multiplex combination of 150,000 phonological and semantic associations identifies a core of words in the mental lexicon known as the viable cluster, a kernel containing simpler-to-parse, more general, concrete words acquired early during language learning. We focus on low (N = 47) and high (N = 47) creative individuals' performance in generating animal names during a semantic fluency task. We model this performance as the outcome of a mental navigation on the multiplex lexical network, going within, outside, and in-between the viable cluster. We find that low and high creative individuals differ substantially in their access to the viable cluster during the semantic fluency task. Higher creative individuals tend to access the viable cluster less frequently, with a lower uncertainty/entropy, reaching out to more peripheral words and covering longer multiplex network distances between concepts in comparison to lower creative individuals. We use these differences to construct a machine learning classifier of creativity levels, which achieves an accuracy of 65.0 ± 0.9% and an area under the curve of 68.0 ± 0.8%, both higher than the random expectation of 50%. These results highlight the potential relevance of combining psycholinguistic measures with multiplex network models of the mental lexicon for modelling mental navigation and, consequently, classifying people automatically according to their creativity levels. Full article
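The multiplex construction at the heart of the study can be sketched in a few lines; the tiny word lists and edges below are invented stand-ins for the roughly 150,000 semantic and phonological associations used in the paper.

```python
# Sketch of a two-layer multiplex lexical network (semantic + phonological)
# and a multiplex distance between concepts. The words and edges here are
# invented for illustration; the study uses ~150,000 real associations.
import networkx as nx

semantic = nx.Graph([("cat", "dog"), ("dog", "wolf"), ("cat", "mouse")])
phonological = nx.Graph([("cat", "bat"), ("bat", "rat"), ("rat", "mouse")])

# Multiplex navigation: an edge exists if the words are linked in ANY layer.
multiplex = nx.compose(semantic, phonological)

# Distance between two concepts across the combined layers.
print(nx.shortest_path(multiplex, "wolf", "rat"))
print(nx.shortest_path_length(multiplex, "wolf", "rat"))
```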
43 pages, 64972 KiB  
Article
Future-Ready Strategic Oversight of Multiple Artificial Superintelligence-Enabled Adaptive Learning Systems via Human-Centric Explainable AI-Empowered Predictive Optimizations of Educational Outcomes
by Meng-Leong HOW
Big Data Cogn. Comput. 2019, 3(3), 46; https://doi.org/10.3390/bdcc3030046 - 31 Jul 2019
Cited by 12 | Viewed by 7160
Abstract
Artificial intelligence-enabled adaptive learning systems (AI-ALS) have been increasingly utilized in education. Schools are usually afforded the freedom to deploy the AI-ALS that they prefer. However, even before artificial intelligence autonomously develops into artificial superintelligence in the future, it would be remiss to entirely leave the students to the AI-ALS without any independent oversight of the potential issues. For example, if the students score well in formative assessments within the AI-ALS but subsequently perform badly in paper-based post-tests, or if the relentless algorithm of a particular AI-ALS is suspected of causing undue stress for the students, these issues should be addressed by educational stakeholders. Policy makers and educational stakeholders should collaborate to analyze the data from multiple AI-ALS deployed in different schools to achieve strategic oversight. The current paper provides exemplars to illustrate how this future-ready strategic oversight could be implemented using artificial intelligence-based Bayesian network software to analyze the data from five dissimilar AI-ALS, each deployed in a different school. Besides using descriptive analytics to reveal potential issues experienced by students within each AI-ALS, this human-centric, AI-empowered approach also enables explainable predictive analytics of the students' learning outcomes in paper-based summative assessments after training is completed in each AI-ALS. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
17 pages, 5998 KiB  
Article
RazorNet: Adversarial Training and Noise Training on a Deep Neural Network Fooled by a Shallow Neural Network
by Shayan Taheri, Milad Salem and Jiann-Shiun Yuan
Big Data Cogn. Comput. 2019, 3(3), 43; https://doi.org/10.3390/bdcc3030043 - 23 Jul 2019
Cited by 5 | Viewed by 6044
Abstract
In this work, we propose ShallowDeepNet, a novel system architecture that includes a shallow and a deep neural network. The shallow neural network has the duty of data preprocessing and generating adversarial samples. The deep neural network has the duty of understanding data and information as well as detecting adversarial samples. The deep neural network gets its weights from transfer learning, adversarial training, and noise training. The system is examined on biometric data (fingerprint and iris) and pharmaceutical data (pill images). According to the simulation results, the system is capable of improving the detection accuracy on the biometric data from 1.31% to 80.65% when the adversarial data is used, and to 93.4% when the adversarial data as well as the noisy data are given to the network. The system performance on the pill image data is increased from 34.55% to 96.03% and then to 98.2%, respectively. Training on different types of noise can help in detecting samples from unknown and unseen adversarial attacks. Meanwhile, the system training on the adversarial data as well as noisy data occurs only once; in fact, retraining the system may improve the performance further. Furthermore, training the system on new types of attacks and noise can help in enhancing the system performance. Full article
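The paper generates adversarial samples with its shallow network; as a hedged stand-in, the sketch below uses the common fast gradient sign method (FGSM) to show how adversarial and noise training can be combined in a single training step. The model, optimizer, epsilon, and noise level are assumptions.

```python
# Sketch of the two data-hardening ideas combined in the paper: adversarial
# samples (here via FGSM, a common stand-in) and noisy samples mixed into
# each training batch. Model, optimizer, and constants are assumptions.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Generate adversarial examples with the fast gradient sign method."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_step(model, opt, x, y, noise_std=0.05):
    # Mix clean, adversarial, and Gaussian-noise variants of the batch.
    x_adv = fgsm(model, x, y)
    x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0, 1)
    batch = torch.cat([x, x_adv, x_noisy])
    labels = torch.cat([y, y, y])
    opt.zero_grad()
    loss = F.cross_entropy(model(batch), labels)
    loss.backward()
    opt.step()
    return loss.item()
```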
19 pages, 412 KiB  
Article
Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach
by Gagandeep Kaur, Abhishek Kaushik and Shubham Sharma
Big Data Cogn. Comput. 2019, 3(3), 37; https://doi.org/10.3390/bdcc3030037 - 3 Jul 2019
Cited by 32 | Viewed by 9740
Abstract
The success of YouTube has attracted a lot of users, which results in an increase in the number of comments present on YouTube channels. By analyzing those comments, we could provide insights to YouTubers that would help them deliver better-quality content. YouTube is very popular in India. A majority of the population in India speak and write a mixture of two languages known as Hinglish for casual communication on social media. Our study focuses on the sentiment analysis of Hinglish comments on cookery channels. The unsupervised learning technique DBSCAN was employed in our work to find the different patterns in the comments data. We modelled and evaluated both parametric and non-parametric learning algorithms. Logistic regression with the term frequency vectorizer gave 74.01% accuracy on Nisha Madulika's dataset and 75.37% accuracy on Kabita's Kitchen dataset. Each classifier is statistically tested in our study. Full article
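The best-performing pipeline reported above, a term-frequency vectorizer feeding logistic regression, is easy to sketch; the Hinglish comments and labels below are invented examples, not the study's data.

```python
# Sketch of the pipeline reported above: a term-frequency vectorizer
# feeding logistic regression. The Hinglish comments and labels are
# invented; the study used real YouTube cookery-channel comments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "recipe bahut easy thi, thank you",
    "yeh dish bilkul acchi nahi bani",
    "superb video, aapke recipes best hain",
    "time waste, kuch bhi samajh nahi aaya",
]
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["mast recipe, ghar par try karungi"]))
```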
17 pages, 5150 KiB  
Article
Data-Driven Load Forecasting of Air Conditioners for Demand Response Using Levenberg–Marquardt Algorithm-Based ANN
by Muhammad Waseem, Zhenzhi Lin and Li Yang
Big Data Cogn. Comput. 2019, 3(3), 36; https://doi.org/10.3390/bdcc3030036 - 2 Jul 2019
Cited by 29 | Viewed by 6600
Abstract
Air conditioners (ACs) account for a very high share of the overall electricity consumption in buildings. Therefore, controlling AC power consumption is a significant factor for demand response. With the advancement in the implementation of demand side management techniques and the smart grid, precise AC load forecasting for electrical utilities and end-users is required. In this paper, big data analysis and its applications in power systems are introduced. After this, various load forecasting categories and the various techniques applied for load forecasting in the context of big data analysis in power systems are explored. Then, a Levenberg–Marquardt Algorithm (LMA)-based Artificial Neural Network (ANN) for residential AC short-term load forecasting is presented. This forecasting approach utilizes past hourly temperature observations and AC load as input variables for assessment. Different performance assessment indices have also been investigated. Error formulations show that the LMA-based ANN presents better results in comparison to the Scaled Conjugate Gradient (SCG) and statistical regression approaches. Furthermore, information on AC load is obtainable for different time horizons, such as weekly, hourly, and monthly bases, due to the better prediction accuracy of the LMA-based ANN, which is helpful for efficient demand response (DR) implementation. Full article
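Mainstream machine-learning libraries do not ship a Levenberg–Marquardt trainer for neural networks, so the hedged sketch below trains a tiny one-hidden-layer ANN by casting training as a least-squares problem and solving it with SciPy's LM solver; the synthetic temperature/load data and network size are assumptions.

```python
# Sketch of training a small ANN with the Levenberg-Marquardt algorithm,
# via SciPy's LM least-squares solver. Data is synthetic (hourly
# temperature -> AC load); shapes and constants are illustrative.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
temp = rng.uniform(20, 40, 200)                      # hourly temps (deg C)
load = 0.8 * np.maximum(temp - 24, 0) + rng.normal(0, 0.3, 200)  # AC load

X = (temp[:, None] - temp.mean()) / temp.std()       # standardized input
H = 6                                                # hidden units

def unpack(w):
    w1 = w[:H].reshape(1, H); b1 = w[H:2 * H]
    w2 = w[2 * H:3 * H]; b2 = w[3 * H]
    return w1, b1, w2, b2

def predict(w, X):
    w1, b1, w2, b2 = unpack(w)
    return np.tanh(X @ w1 + b1) @ w2 + b2

def residuals(w):
    return predict(w, X) - load

w0 = rng.normal(0, 0.5, 3 * H + 1)
fit = least_squares(residuals, w0, method="lm")      # Levenberg-Marquardt
print("training RMSE:", np.sqrt(np.mean(fit.fun ** 2)))
```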
30 pages, 2154 KiB  
Review
Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications
by Ifeyinwa Angela Ajah and Henry Friday Nweke
Big Data Cogn. Comput. 2019, 3(2), 32; https://doi.org/10.3390/bdcc3020032 - 10 Jun 2019
Cited by 107 | Viewed by 51338
Abstract
Big data and business analytics are trends that are positively impacting the business world. Past research shows that the data generated in the modern world is huge and growing exponentially. This includes the structured and unstructured data that flood organizations daily. Unstructured data constitute the majority of the world's digital data and include text files, web and social media posts, emails, images, audio, movies, etc. Unstructured data cannot be managed in a traditional relational database management system (RDBMS). Therefore, data proliferation requires a rethinking of techniques for capturing, storing, and processing the data. This is the role big data has come to play. This paper, therefore, is aimed at increasing the attention of organizations and researchers to the various applications and benefits of big data technology. The paper reviews and discusses the recent trends, opportunities, and pitfalls of big data and how it has enabled organizations to create successful business strategies and remain competitive, based on the available literature. Furthermore, the review presents the various applications of big data and business analytics, the data sources generated in these applications, and their key characteristics. Finally, the review not only outlines the challenges for successful implementation of big data projects but also highlights the current open research directions of big data analytics that require further consideration. The reviewed areas of big data suggest that good management and manipulation of large data sets using the techniques and tools of big data can deliver actionable insights that create business value. Full article
18 pages, 1808 KiB  
Article
Automatic Human Brain Tumor Detection in MRI Image Using Template-Based K Means and Improved Fuzzy C Means Clustering Algorithm
by Md Shahariar Alam, Md Mahbubur Rahman, Mohammad Amazad Hossain, Md Khairul Islam, Kazi Mowdud Ahmed, Khandaker Takdir Ahmed, Bikash Chandra Singh and Md Sipon Miah
Big Data Cogn. Comput. 2019, 3(2), 27; https://doi.org/10.3390/bdcc3020027 - 13 May 2019
Cited by 118 | Viewed by 13024
Abstract
In recent decades, human brain tumor detection has become one of the most challenging issues in medical science. In this paper, we propose a model that includes the template-based K-means and improved fuzzy C-means (TKFCM) algorithm for detecting human brain tumors in magnetic resonance imaging (MRI) images. In the proposed algorithm, firstly, the template-based K-means algorithm is used to initialize the segmentation through the careful selection of a template based on the gray-level intensity of the image; secondly, the updated membership is determined by the distances from the cluster centroid to the cluster data points using the fuzzy C-means (FCM) algorithm until it reaches its best result; and finally, the improved FCM clustering algorithm is used for detecting the tumor position by updating the membership function, which is obtained based on different features of the tumor image, including contrast, energy, dissimilarity, homogeneity, entropy, and correlation. Simulation results show that the proposed algorithm achieves better detection of abnormal and normal tissues in the human brain under small differences in gray-level intensity. In addition, this algorithm detects human brain tumors within a very short time: seconds, compared to minutes for other algorithms. Full article
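The two-stage idea, a K-means-style initialization refined by fuzzy C-means memberships, can be sketched on a one-dimensional stand-in for gray-level intensities; all data below is synthetic, and the paper's template selection and texture features are not reproduced.

```python
# Sketch of the two-stage idea: K-means provides an initial segmentation
# of gray-level intensities, then fuzzy C-means refines memberships.
# Operates on a synthetic 1-D intensity array standing in for MRI voxels.
import numpy as np

rng = np.random.default_rng(2)
intensities = np.concatenate([rng.normal(60, 5, 300),    # normal tissue
                              rng.normal(160, 8, 60)])   # bright, tumor-like

def kmeans_init(x, k, iters=10):
    centers = np.quantile(x, np.linspace(0.1, 0.9, k))   # template-like init
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers), axis=1)
        centers = np.array([x[labels == j].mean() for j in range(k)])
    return centers

def fcm(x, centers, m=2.0, iters=30):
    for _ in range(iters):
        d = np.abs(x[:, None] - centers) + 1e-9
        # Standard FCM membership update: u_ij proportional to d_ij^(-2/(m-1)).
        u = d ** (-2 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)
        centers = (u ** m).T @ x / (u ** m).sum(axis=0)
    return u, centers

centers = kmeans_init(intensities, k=2)
u, centers = fcm(intensities, centers)
print("cluster centers:", centers.round(1))
```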
17 pages, 245 KiB  
Article
AI Governance and the Policymaking Process: Key Considerations for Reducing AI Risk
by Brandon Perry and Risto Uuk
Big Data Cogn. Comput. 2019, 3(2), 26; https://doi.org/10.3390/bdcc3020026 - 8 May 2019
Cited by 28 | Viewed by 12412
Abstract
This essay argues that a new subfield of AI governance should be explored that examines the policymaking process and its implications for AI governance. A growing number of researchers have begun working on the question of how to mitigate the catastrophic risks of transformative artificial intelligence, including what policies states should adopt. However, this essay identifies a preceding, meta-level problem: how the space of possible policies is affected by the politics and administrative mechanisms through which those policies are created and implemented. This creates a new set of key considerations for the field of AI governance and should influence the actions of future policymakers. This essay examines some of the theories of the policymaking process, how they compare to current work in AI governance, and their implications for the field at large, and ends by identifying areas of future research. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
6 pages, 189 KiB  
Opinion
The Supermoral Singularity—AI as a Fountain of Values
by Eleanor Nell Watson
Big Data Cogn. Comput. 2019, 3(2), 23; https://doi.org/10.3390/bdcc3020023 - 11 Apr 2019
Cited by 4 | Viewed by 6834
Abstract
This article looks at the problem of moral singularity in the development of artificial intelligence. We are now on the verge of major breakthroughs in machine technology where autonomous robots that can make their own decisions will become an integral part of our way of life. This article presents a qualitative, comparative approach, which considers the differences between humans and machines, especially in relation to morality, and is grounded in historical and contemporary examples. This argument suggests that it is difficult to apply models of human morality and evolution to machines and that the creation of super-intelligent robots that will be able to make moral decisions could have potentially serious consequences. A runaway moral singularity could result in machines seeking to confront human moral transgressions in a quest to eliminate all forms of evil. This might also culminate in an all-out war in which humanity might be defeated. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
20 pages, 3634 KiB  
Article
Pruning Fuzzy Neural Network Applied to the Construction of Expert Systems to Aid in the Diagnosis of the Treatment of Cryotherapy and Immunotherapy
by Augusto Junio Guimarães, Paulo Vitor de Campos Souza, Vinícius Jonathan Silva Araújo, Thiago Silva Rezende and Vanessa Souza Araújo
Big Data Cogn. Comput. 2019, 3(2), 22; https://doi.org/10.3390/bdcc3020022 - 9 Apr 2019
Cited by 20 | Viewed by 5847
Abstract
Human papillomavirus (HPV) infection is related to frequent cases of cervical cancer and genital condyloma in humans. Up to now, numerous methods have come into existence for the prevention and treatment of this disease. In this context, this paper aims to help predict the susceptibility of the patient to two forms of treatment, cryotherapy and immunotherapy. Such predictions facilitate the choice of treatment, which can be painful and embarrassing for patients who have warts on intimate parts. However, while intelligent models generate efficient results, they often do not allow a good interpretation of those results. To address this problem, we present a fuzzy neural network (FNN) method: a hybrid model, capable of solving complex problems and extracting knowledge from the database, is pruned through F-score techniques to perform pattern classification in the treatment of warts and to produce a specialist system based on if/then rules, according to the experience obtained from a database collected through medical research. Finally, binary pattern-classification tests performed with the FNN and compared with other models commonly used for classification tasks achieve greater accuracy than the current state of the art for this type of problem (84.32% for immunotherapy and 88.64% for cryotherapy) and extract fuzzy rules from the problem database. It was found that the hybrid approach based on neural networks and fuzzy systems can be an excellent tool to aid the prediction of cryotherapy and immunotherapy treatments. Full article
(This article belongs to the Special Issue Health Assessment in the Big Data Era)
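The F-score pruning criterion mentioned above has a standard closed form for binary labels; the hedged sketch below ranks synthetic features with it and keeps the strongest ones. The data is invented; in the paper, the features describe patients and warts and the label is treatment outcome.

```python
# Sketch of F-score feature ranking, the criterion used above to prune
# the fuzzy neural network's inputs. All data here is synthetic.
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a binary label (higher = more discriminative)."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += 2 * y                       # make feature 0 informative

scores = f_score(X, y)
keep = np.argsort(scores)[::-1][:3]    # prune to the 3 strongest features
print("F-scores:", scores.round(3), "kept:", keep)
```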
15 pages, 280 KiB  
Communication
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
by David Manheim
Big Data Cogn. Comput. 2019, 3(2), 21; https://doi.org/10.3390/bdcc3020021 - 5 Apr 2019
Cited by 14 | Viewed by 7166
Abstract
An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for the mitigation of these failure modes. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
18 pages, 1484 KiB  
Article
Big Data Management Canvas: A Reference Model for Value Creation from Data
by Michael Kaufmann
Big Data Cogn. Comput. 2019, 3(1), 19; https://doi.org/10.3390/bdcc3010019 - 11 Mar 2019
Cited by 26 | Viewed by 16164
Abstract
Many big data projects are technology-driven and, thus, expensive and inefficient. It is often unclear how to exploit existing data resources and map data, systems, and analytics results to actual use cases. Existing big data reference models are mostly either technological or business-oriented in nature, but do not consistently align both aspects. To address this issue, a reference model for big data management is proposed that operationalizes value creation from big data by linking business targets with technical implementation. The purpose of this model is to provide a goal- and value-oriented framework to effectively map and plan purposeful big data systems aligned with a clear value proposition. Based on an epistemic model that conceptualizes big data management as a cognitive system, the solution space of data value creation is divided into five layers: preparation, analysis, interaction, effectuation, and intelligence. To operationalize the model, each of these layers is subdivided into corresponding business and IT aspects to create a link from use cases to technological implementation. The resulting reference model, the big data management canvas, can be applied to classify and extend existing big data applications and to derive and plan new big data solutions, visions, and strategies for future projects. To validate the model in the context of existing information systems, the paper describes three cases of big data management in existing companies. Full article
23 pages, 284 KiB  
Article
Global Solutions vs. Local Solutions for the AI Safety Problem
by Alexey Turchin, David Denkenberger and Brian Patrick Green
Big Data Cogn. Comput. 2019, 3(1), 16; https://doi.org/10.3390/bdcc3010016 - 20 Feb 2019
Cited by 8 | Viewed by 6812
Abstract
There are two types of artificial general intelligence (AGI) safety solutions: global and local. Most previously suggested solutions are local: they explain how to align or “box” a specific AI (Artificial Intelligence), but do not explain how to prevent the creation of dangerous AI in other places. Global solutions are those that ensure any AI on Earth is not dangerous. The number of suggested global solutions is much smaller than the number of proposed local solutions. Global solutions can be divided into four groups: 1. No AI: AGI technology is banned or its use is otherwise prevented; 2. One AI: the first superintelligent AI is used to prevent the creation of any others; 3. Net of AIs as AI police: a balance is created between many AIs, so they evolve as a net and can prevent any rogue AI from taking over the world; 4. Humans inside AI: humans are augmented or part of AI. We explore many ideas, both old and new, regarding global solutions for AI safety. They include changing the number of AI teams, different forms of “AI Nanny” (non-self-improving global control AI system able to prevent creation of dangerous AIs), selling AI safety solutions, and sending messages to future AI. Not every local solution scales to a global solution or does it ethically and safely. The choice of the best local solution should include understanding of the ways in which it will be scaled up. Human-AI teams or a superintelligent AI Service as suggested by Drexler may be examples of such ethically scalable local solutions, but the final choice depends on some unknown variables such as the speed of AI progress. Full article
(This article belongs to the Special Issue Artificial Superintelligence: Coordination & Strategy)
29 pages, 3400 KiB  
Article
Intelligent Recommender System for Big Data Applications Based on the Random Neural Network
by Will Serrano
Big Data Cogn. Comput. 2019, 3(1), 15; https://doi.org/10.3390/bdcc3010015 - 18 Feb 2019
Cited by 10 | Viewed by 5329
Abstract
Online marketplaces make their profit from advertisements or sales commissions, while businesses have a commercial interest in ranking higher in recommendations to attract more customers. Web users cannot be guaranteed that the products provided by recommender systems within Big Data are either exhaustive or relevant to their needs. This article analyses the product rank relevance provided by different commercial Big Data recommender systems (Grouplens film, Trip Advisor and Amazon); it also proposes an Intelligent Recommender System (IRS) based on the Random Neural Network; the IRS acts as an interface between the customer and the different recommender systems that iteratively adapts to the perceived user relevance. In addition, a relevance metric that combines both relevance and rank is presented; this metric is used to validate and compare the performance of the proposed algorithm. On average, the IRS outperforms the Big Data recommender systems after learning iteratively from its customer. Full article
(This article belongs to the Special Issue Big-Data Driven Multi-Criteria Decision-Making)
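The article's own relevance metric is not reproduced here; as a hedged illustration of combining relevance with rank, the sketch below uses a DCG-style discounted sum, a common choice for this purpose.

```python
# Illustrative rank-aware relevance score: graded relevance summed with a
# logarithmic rank discount (DCG-style). This is a common construction,
# not the article's exact metric.
import math

def rank_discounted_relevance(relevances):
    """Sum graded relevance, discounting items that appear lower in the list."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

# Two recommenders returning the same items in different orders.
print(rank_discounted_relevance([3, 2, 0, 1]))  # relevant items ranked high
print(rank_discounted_relevance([0, 1, 2, 3]))  # relevant items ranked low
```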
14 pages, 2321 KiB  
Review
A Review of Facial Landmark Extraction in 2D Images and Videos Using Deep Learning
by Matteo Bodini
Big Data Cogn. Comput. 2019, 3(1), 14; https://doi.org/10.3390/bdcc3010014 - 13 Feb 2019
Cited by 47 | Viewed by 10420
Abstract
The task of facial landmark extraction is fundamental in several applications involving facial analysis, such as facial expression analysis, identity and face recognition, facial animation, and 3D face reconstruction. Owing to the most recent advances resulting from deep-learning techniques, the performance of methods for facial landmark extraction has been substantially improved, even on in-the-wild datasets. Thus, this article presents an updated survey on facial landmark extraction in 2D images and videos, focusing on methods that make use of deep-learning techniques. An analysis comparing the performance of many approaches is provided, along with an analysis of common datasets, challenges, and future research directions. Full article
17 pages, 468 KiB  
Review
Big Data and Climate Change
by Hossein Hassani, Xu Huang and Emmanuel Silva
Big Data Cogn. Comput. 2019, 3(1), 12; https://doi.org/10.3390/bdcc3010012 - 2 Feb 2019
Cited by 69 | Viewed by 14813
Abstract
Climate science, as a data-intensive subject, has been overwhelmingly affected by the era of big data and the relevant technological revolutions. The big successes of big data analytics in diverse areas over the past decade have also prompted expectations about the efficacy of big data for the big problem: climate change. As an emerging topic, climate change has been at the forefront of big climate data analytics implementations, and exhaustive research has been carried out covering a variety of topics. This paper aims to present an outlook of big data in climate change studies over recent years by investigating and summarising the current status of big data applications in climate change related studies. It is also expected to serve as a one-stop reference directory for researchers and stakeholders, with an overview of this trending subject at a glance, which can be useful in guiding future research and improvements in the exploitation of big climate data. Full article
22 pages, 2173 KiB  
Article
Modelling Early Word Acquisition through Multiplex Lexical Networks and Machine Learning
by Massimo Stella
Big Data Cogn. Comput. 2019, 3(1), 10; https://doi.org/10.3390/bdcc3010010 - 24 Jan 2019
Cited by 23 | Viewed by 5742
Abstract
Early language acquisition is a complex cognitive task. Recent data-informed approaches showed that children do not learn words uniformly at random but rather follow specific strategies based on the associative representation of words in the mental lexicon, a conceptual system enabling human cognitive computing. Building on this evidence, the current investigation introduces a combination of machine learning techniques, psycholinguistic features (i.e., frequency, length, polysemy and class) and multiplex lexical networks, representing the semantics and phonology of the mental lexicon, with the aim of predicting the normative acquisition of 529 English words by toddlers between 22 and 26 months. Classifications using logistic regression and based on four psycholinguistic features achieve the best baseline cross-validated accuracy of 61.7% when half of the words have been acquired. Adding network information through multiplex closeness centrality enhances accuracy (up to 67.7%) more than adding multiplex neighbourhood density/degree (62.4%), multiplex PageRank versatility (63.0%), or the best single-layer network metric, i.e., free association degree (65.2%). Multiplex closeness operationalises the structural relevance of words for semantic and phonological information flow. These results indicate that the whole, global, multi-level flow of information and structure of the mental lexicon influence word acquisition more than single-layer or local network features of words when considered in conjunction with language norms. The highlighted synergy of multiplex lexical structure and psycholinguistic norms opens new ways for understanding human cognition and language processing through powerful and data-parsimonious cognitive computing approaches. Full article
(This article belongs to the Special Issue Computational Models of Cognition and Learning)
29 pages, 3282 KiB  
Article
Fog Computing for Internet of Things (IoT)-Aided Smart Grid Architectures
by Md. Muzakkir Hussain and M.M. Sufyan Beg
Big Data Cogn. Comput. 2019, 3(1), 8; https://doi.org/10.3390/bdcc3010008 - 19 Jan 2019
Cited by 58 | Viewed by 8089
Abstract
The fast-paced development of power systems necessitates the smart grid (SG) to facilitate real-time control and monitoring with bidirectional communication and electricity flows. In order to meet the computational requirements of SG applications, cloud computing (CC) provides flexible resources and services shared in the network, parallel processing, and omnipresent access. Even though the CC model is considered efficient for the SG, it fails to guarantee the Quality-of-Experience (QoE) requirements of SG services, viz. latency, bandwidth, energy consumption, and network cost. Fog Computing (FC) extends CC by deploying localized computing and processing facilities at the edge of the network, offering location-awareness, low latency, and latency-sensitive analytics for the mission-critical requirements of SG applications. By deploying localized computing facilities at the premises of users, it pre-stores cloud data and distributes it to SG users over fast local connections. In this paper, we first examine the current state of cloud-based SG architectures and highlight the motivation(s) for adopting FC as a technology enabler for real-time SG analytics. We also present a three-layer FC-based SG architecture, characterizing its features towards integrating a massive number of Internet of Things (IoT) devices into the future SG. We then propose a cost optimization model for FC that jointly investigates data consumer association, workload distribution, virtual machine placement, and Quality-of-Service (QoS) constraints. The formulated model is a Mixed-Integer Nonlinear Programming (MINLP) problem, which is solved using a Modified Differential Evolution (MDE) algorithm. We evaluate the proposed framework on real-world parameters and show that, for a network with approximately 50% time-critical applications, the overall service latency for FC is nearly half that of the cloud paradigm. We also observe that FC lowers the aggregated power consumption of the generic CC model by more than 44%. Full article
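The reported latency gap can be illustrated with a deliberately simple, queuing-free model in which edge-served requests skip the WAN round trip; all delay figures below are invented assumptions, not the paper's parameters.

```python
# Toy latency model illustrating the fog vs. cloud comparison: requests
# served at the network edge skip the WAN round trip that cloud-served
# requests incur. All delay figures are invented assumptions.
def mean_latency(p_time_critical, edge_ms=10.0, wan_ms=80.0, compute_ms=15.0):
    # Fog: time-critical requests stay at the edge; the rest go to the cloud.
    fog = (p_time_critical * (edge_ms + compute_ms)
           + (1 - p_time_critical) * (edge_ms + wan_ms + compute_ms))
    cloud = edge_ms + wan_ms + compute_ms          # everything crosses the WAN
    return fog, cloud

fog, cloud = mean_latency(p_time_critical=0.5)
print(f"fog: {fog:.1f} ms, cloud: {cloud:.1f} ms, ratio: {fog / cloud:.2f}")
```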