Big Data Cogn. Comput., Volume 6, Issue 4 (December 2022) – 62 articles

Cover Story: The concept of scope was introduced in Social Network Analysis to assess the authoritativeness and convincing ability of a user toward other users. In the past, it has been studied only in some specific contexts. In this paper, we propose a new investigation of scope: we want to assess the scope of the sentiment of a user on a topic. We also propose a multi-dimensional definition of scope. Besides the traditional spatial scope, we introduce the temporal one, which has never been addressed in the literature, and propose a model that allows the concept of scope to be extended to further dimensions in the future. Additionally, we propose an approach, and a related set of parameters, for measuring the scope of the sentiment of a user on a topic in a social network. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
19 pages, 784 KiB  
Review
A Survey on Big Data in Pharmacology, Toxicology and Pharmaceutics
by Krithika Latha Bhaskaran, Richard Sakyi Osei, Evans Kotei, Eric Yaw Agbezuge, Carlos Ankora and Ernest D. Ganaa
Big Data Cogn. Comput. 2022, 6(4), 161; https://doi.org/10.3390/bdcc6040161 - 19 Dec 2022
Cited by 7 | Viewed by 3476
Abstract
Patients, hospitals, sensors, researchers, providers, phones, and healthcare organisations are producing enormous amounts of data in both the healthcare and drug detection sectors. The real challenge in these sectors is to find, investigate, manage, and collect information from patients in order to make their lives easier and healthier, not only by formulating new therapies and understanding diseases, but also by predicting outcomes at earlier stages and making effective decisions. The volumes of data available in the fields of pharmacology, toxicology, and pharmaceutics are constantly increasing, driven by advances in technology that allow for the analysis of ever-larger data sets. Big Data (BD) has the potential to transform drug development and safety testing by providing new insights into the effects of drugs on human health. However, harnessing this potential involves several challenges, including the need for specialised skills and infrastructure. In this survey, we explore how BD approaches are currently being used in the pharmacology, toxicology, and pharmaceutics fields; in particular, we highlight how researchers have applied BD in these fields to address various challenges and establish solutions. A comparative analysis traces the implementation of big data across the three fields. Relevant limitations and directions for future research are emphasised. The pharmacology, toxicology, and pharmaceutics fields are still at an early stage of BD adoption, and many research challenges must be overcome before BD can be employed effectively to address specific issues. Full article

20 pages, 1982 KiB  
Article
Using an Evidence-Based Approach for Policy-Making Based on Big Data Analysis and Applying Detection Techniques on Twitter
by Somayeh Labafi, Sanee Ebrahimzadeh, Mohamad Mahdi Kavousi, Habib Abdolhossein Maregani and Samad Sepasgozar
Big Data Cogn. Comput. 2022, 6(4), 160; https://doi.org/10.3390/bdcc6040160 - 19 Dec 2022
Viewed by 3079
Abstract
Evidence-based policy seeks to use evidence in public policy in a systematic way in a bid to improve decision-making quality. Evidence-based policy cannot work properly and achieve the expected results without accurate, appropriate, and sufficient evidence. Given the prevalence of social media and intense user engagement, the question to ask is whether the data on social media can be used as evidence in the policy-making process. This question gives rise to the debate on what characteristics of data should be considered as evidence. Despite the numerous research studies on social media analysis and policy-making, this domain has not been examined through an "evidence detection" lens. Thus, this study addresses the gap in the literature on how to analyze the big text data produced by social media and how to use it for policy-making based on evidence detection. The present paper seeks to fill the gap by developing and offering a model that can help policy-makers distinguish "evidence" from "non-evidence". To do so, in the first phase of the study, the researchers elicited the characteristics of "evidence" by conducting a thematic analysis of semi-structured interviews with experts and policy-makers. In the second phase, the developed model was tested against six months of data collected from Twitter accounts. The experimental results show that the evidence detection model performed best with a decision tree (DT), which outperformed the other algorithms with an 85.9% accuracy score. The model thus fulfilled the aim of the present study: detecting Twitter posts that can be used as evidence. This study contributes to the body of knowledge by exploring novel models of text processing and offering an efficient method for analyzing big text data. The practical implication of the study also lies in its efficiency and ease of use, which offers the required evidence for policy-makers. Full article
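The best-performing classifier reported above is a decision tree trained on text features. A minimal sketch of that setup, assuming TF-IDF features and scikit-learn (the tweets and labels below are hypothetical placeholders, not the paper's data):

```python
# Classifying tweets as "evidence" vs. "non-evidence" with a decision tree.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

tweets = [
    "Official statistics show a 12% rise in traffic accidents this quarter.",
    "I just feel like the new policy is a disaster.",
    "A survey of 2,000 households reports declining broadband access.",
    "Can't believe how bad the weather is today!",
]
labels = [1, 0, 1, 0]  # 1 = evidence, 0 = non-evidence

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      DecisionTreeClassifier(max_depth=5, random_state=0))
model.fit(tweets, labels)
print(model.predict(["A new report documents a 30% drop in emissions."]))
```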

13 pages, 2240 KiB  
Article
Proposal of Decentralized P2P Service Model for Transfer between Blockchain-Based Heterogeneous Cryptocurrencies and CBDCs
by Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2022, 6(4), 159; https://doi.org/10.3390/bdcc6040159 - 19 Dec 2022
Cited by 5 | Viewed by 3506
Abstract
This paper proposes a solution to the transfer problem between blockchain-based heterogeneous cryptocurrencies and CBDCs, with research derived from an analysis of the existing literature. Interoperability between heterogeneous blockchains has been an obstacle to service diversity and user convenience. Many types of cryptocurrencies are currently trading on the market, and many countries are researching and testing central bank digital currencies (CBDCs). In this paper, existing interoperability studies and solutions between heterogeneous blockchains, and their differences from the proposed service model, are described. To enhance digital financial services and improve user convenience, transfers between heterogeneous cryptocurrencies, between heterogeneous CBDCs, and between a cryptocurrency and a CBDC are all required. This paper proposes an interoperable architecture between heterogeneous blockchains, and a decentralized peer-to-peer (P2P) service model based on that architecture for transfers between blockchain-based heterogeneous cryptocurrencies and CBDCs. Security threats to the proposed service model are identified, and security requirements to prevent them are specified. These threats and requirements should be considered when implementing the proposed service model. Full article

29 pages, 464 KiB  
Article
What Is (Not) Big Data Based on Its 7Vs Challenges: A Survey
by Cristian González García and Eva Álvarez-Fernández
Big Data Cogn. Comput. 2022, 6(4), 158; https://doi.org/10.3390/bdcc6040158 - 14 Dec 2022
Cited by 4 | Viewed by 4328
Abstract
Big Data has changed how enterprises and people manage knowledge and make decisions. However, when talking about Big Data, there are many different definitions of what it is and what it is used for, with many interpretations and disagreements. For these reasons, we have reviewed the literature to compile and propose a possible resolution of the existing discrepancies between the terms Data Analysis, Data Mining, Knowledge Discovery in Databases, and Big Data. In addition, we have gathered the patterns used in Data Mining, the different phases of Knowledge Discovery in Databases, and some definitions of Big Data according to important companies and organisations. Moreover, Big Data has challenges that are sometimes the same as its own characteristics. These characteristics are known as the Vs. Nonetheless, depending on the author, the number of Vs varies from 3 to 5, or even 7, and different authors' 4Vs or 5Vs are not always the same ones. Therefore, in this survey, we reviewed the literature to explain how many Vs have been identified and how they relate to different existing problems. In total, we identified 7Vs, three of which have subtypes. Full article
23 pages, 735 KiB  
Review
Explore Big Data Analytics Applications and Opportunities: A Review
by Zaher Ali Al-Sai, Mohd Heikal Husin, Sharifah Mashita Syed-Mohamad, Rasha Moh’d Sadeq Abdin, Nour Damer, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(4), 157; https://doi.org/10.3390/bdcc6040157 - 14 Dec 2022
Cited by 22 | Viewed by 12745
Abstract
Big data applications and analytics are vital in proposing ultimate strategic decisions. The existing literature emphasizes that big data applications and analytics can empower those who applied them during the COVID-19 pandemic. This paper reviews the existing literature on big data applications pre- and peri-COVID-19. A comparison of how Big Data applications were used before and during the pandemic is presented, and it is expanded to four highly recognized industry fields: Healthcare, Education, Transportation, and Banking. The effectiveness of the four major types of data analytics across these industries is discussed. Hence, this paper provides an illustrative description of the importance of big data applications in the era of COVID-19, as well as aligning the applications to their relevant big data analytics models. This review concludes that applying the ultimate big data applications and their associated data analytics models can overcome the significant limitations faced by organizations during one of the most fateful pandemics worldwide. Future work will conduct a systematic literature review and a comparative analysis of existing Big Data systems and models. Moreover, future work will investigate the critical challenges of Big Data Analytics and applications during the COVID-19 pandemic. Full article

23 pages, 4084 KiB  
Article
Locating Source Code Bugs in Software Information Systems Using Information Retrieval Techniques
by Ali Alawneh, Iyad M. Alazzam and Khadijah Shatnawi
Big Data Cogn. Comput. 2022, 6(4), 156; https://doi.org/10.3390/bdcc6040156 - 13 Dec 2022
Cited by 2 | Viewed by 2319
Abstract
Bug localization is the process through which the buggy source code files are located for a given bug report. It is an overwhelming and time-consuming process, and automating it is key to helping developers and increasing their productivity. Expanding bug reports with more semantics and increasing software understanding using information retrieval and natural language techniques is the way to locate buggy source code files, with the bug report acting as a query and the source code as the search space. This research investigates the effect of segmenting open source files into executable code and comments, as they have a conflicting nature; examines the effect of synonyms on the accuracy of bug localization; and examines the effect of part-of-speech techniques on reducing the manual inspection needed to select appropriate synonyms. This research aims to show that such methods improve the accuracy of bug localization tasks. The approach was evaluated on three open source Java projects, namely Eclipse 3.1, AspectJ 1.0, and SWT 3.1; we implemented a dedicated Java tool for our methodology and conducted several experiments on each project. The experimental results reveal a considerable improvement in recall and precision levels, and the developed methods display an accuracy improvement of 4–10% compared with state-of-the-art approaches. Full article
(This article belongs to the Special Issue Human Factor in Information Systems Development and Management)
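The abstract treats the bug report as a query and the source code as the search space. A minimal sketch of that retrieval step, assuming TF-IDF and cosine similarity (the file contents below are hypothetical placeholders):

```python
# Rank source files by similarity to a bug report.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_files = {
    "Parser.java": "parse token stream syntax tree read input buffer",
    "Renderer.java": "draw widget layout repaint screen update",
    "Cache.java": "evict entry lookup store invalidate memory",
}
bug_report = "application crashes when reading a malformed input buffer"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(source_files.values())
query_vec = vectorizer.transform([bug_report])
scores = cosine_similarity(query_vec, doc_matrix).ravel()

for name, score in sorted(zip(source_files, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```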

17 pages, 1860 KiB  
Article
Explaining Exploration–Exploitation in Humans
by Antonio Candelieri, Andrea Ponti and Francesco Archetti
Big Data Cogn. Comput. 2022, 6(4), 155; https://doi.org/10.3390/bdcc6040155 - 12 Dec 2022
Cited by 1 | Viewed by 1994
Abstract
Human as well as algorithmic searches are performed to balance exploration and exploitation. The search task in this paper is the global optimization of a 2D multimodal function, unknown to the searcher. Thus, the task presents the following features: (i) uncertainty (i.e., information about the function can be acquired only through function observations), (ii) sequentiality (i.e., the choice of the next point to observe depends on the previous ones), and (iii) a limited budget (i.e., a maximum number of sequential choices allowed to the players). The data about human behavior are gathered through a gaming app whose screen represents all the possible locations the player can click on; the associated value of the unknown function is shown to the player. Experimental data are gathered from 39 subjects, each playing 10 different tasks. Decisions are analyzed in a Pareto optimality setting: improvement vs. uncertainty. The experimental results show that the most significant deviations from Pareto rationality are associated with a behavior named "exasperated exploration", close to random search. This behavior shows a statistically significant association with stressful situations occurring when, according to their current belief, the players feel there is no chance to improve over the best value observed so far while the remaining budget runs out. To classify decisions as Pareto or Not-Pareto, an explainable/interpretable Machine Learning model based on Decision Tree learning is developed. The resulting model is used to implement a synthetic human searcher/optimizer, which is then compared against Bayesian Optimization. On half of the test problems, the synthetic human proves more effective and efficient. Full article
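The Pareto analysis above asks whether any alternative choice dominates a decision in both objectives, improvement and uncertainty. A minimal sketch of that test (the candidate values are hypothetical):

```python
# A choice is "Pareto rational" if no other candidate dominates it,
# i.e., no candidate is at least as good in both objectives and
# strictly better in one.
def dominates(a, b):
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def is_pareto_rational(choice, candidates):
    return not any(dominates(other, choice) for other in candidates)

candidates = [(0.8, 0.1), (0.3, 0.9), (0.2, 0.2), (0.6, 0.5)]  # (improvement, uncertainty)
for c in candidates:
    print(c, "Pareto" if is_pareto_rational(c, candidates) else "Not-Pareto")
```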

31 pages, 6664 KiB  
Review
Machine Learning Styles for Diabetic Retinopathy Detection: A Review and Bibliometric Analysis
by Shyamala Subramanian, Sashikala Mishra, Shruti Patil, Kailash Shaw and Ebrahim Aghajari
Big Data Cogn. Comput. 2022, 6(4), 154; https://doi.org/10.3390/bdcc6040154 - 12 Dec 2022
Cited by 12 | Viewed by 8169
Abstract
Diabetic retinopathy (DR) is a medical condition caused by diabetes. The development of retinopathy significantly depends on how long a person has had diabetes. Initially, there may be no symptoms or just a slight vision problem due to impairment of the retinal blood vessels. Later, it may lead to blindness. Recognizing the early clinical signs of DR is very important for intervening in and effectively treating DR. Thus, regular eye check-ups are necessary to direct the person to a doctor for a comprehensive ocular examination and treatment as soon as possible to avoid permanent vision loss. Nevertheless, due to limited resources, universal manual screening is not feasible. As a result, emerging technologies, such as artificial intelligence, enable automatic detection and classification of DR and offer a cost-effective alternative screening methodology. Researchers have been working on artificial-intelligence-based technologies to detect and analyze DR in recent years. This study aimed to investigate the different machine learning styles chosen for diagnosing retinopathy. Thus, a bibliometric analysis was systematically conducted to discover the different machine learning styles used for detecting diabetic retinopathy. The data were exported from popular databases, namely Web of Science (WoS) and Scopus, and analyzed using Biblioshiny and VOSviewer in terms of publications, top countries, sources, subject areas, top authors, trend topics, co-occurrences, thematic evolution, factorial maps, citation analysis, etc., which forms a basis for researchers to identify the research gaps in diabetic retinopathy detection and classification. Full article

22 pages, 2447 KiB  
Article
An Advanced Big Data Quality Framework Based on Weighted Metrics
by Widad Elouataoui, Imane El Alaoui, Saida El Mendili and Youssef Gahi
Big Data Cogn. Comput. 2022, 6(4), 153; https://doi.org/10.3390/bdcc6040153 - 9 Dec 2022
Cited by 14 | Viewed by 4181
Abstract
While big data benefits are numerous, the use of big data requires addressing new challenges related to data processing, data security, and, especially, degradation of data quality. Despite the increased importance of data quality for big data, data quality measurement is currently limited to a few metrics. Indeed, while more than 50 data quality dimensions have been defined in the literature, only 11 dimensions are actually measured. Therefore, this paper aims to extend the measured dimensions by defining four new data quality metrics: Integrity, Accessibility, Ease of manipulation, and Security. Thus, we propose a comprehensive Big Data Quality Assessment Framework based on 12 metrics: Completeness, Timeliness, Volatility, Uniqueness, Conformity, Consistency, Ease of manipulation, Relevancy, Readability, Security, Accessibility, and Integrity. In addition, to ensure accurate data quality assessment, we apply data weights at three data unit levels: data fields, quality metrics, and quality aspects. Furthermore, we define and measure five quality aspects to provide a macro-view of data quality. Finally, an experiment is performed to implement the defined measures. The results show that the suggested methodology allows a more exhaustive and accurate big data quality assessment, defining a weighted quality score based on the 12 metrics and achieving a best quality model score of 9/10. Full article
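The framework weights metrics before aggregating them into a score. A minimal sketch of a weighted quality score on a 0-10 scale (the metric values and weights below are hypothetical placeholders, not the paper's):

```python
# Combine per-metric quality scores with weights into one overall score.
metric_scores = {"Completeness": 9.2, "Timeliness": 8.5, "Uniqueness": 9.8,
                 "Security": 7.9, "Integrity": 9.0}
metric_weights = {"Completeness": 0.3, "Timeliness": 0.1, "Uniqueness": 0.2,
                  "Security": 0.2, "Integrity": 0.2}

overall = (sum(metric_scores[m] * metric_weights[m] for m in metric_scores)
           / sum(metric_weights.values()))
print(f"Weighted quality score: {overall:.2f}/10")
```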

41 pages, 6572 KiB  
Review
A Systematic Literature Review on Diabetic Retinopathy Using an Artificial Intelligence Approach
by Pooja Bidwai, Shilpa Gite, Kishore Pahuja and Ketan Kotecha
Big Data Cogn. Comput. 2022, 6(4), 152; https://doi.org/10.3390/bdcc6040152 - 8 Dec 2022
Cited by 19 | Viewed by 13265
Abstract
Diabetic retinopathy occurs due to long-term diabetes with changing blood glucose levels and has become the most common cause of vision loss worldwide. It has become a severe problem among the working-age group that needs to be addressed early to avoid future vision loss. Artificial-intelligence-based technologies have been utilized to detect and grade diabetic retinopathy at an initial level; early detection allows for proper treatment so that eyesight complications can be avoided. This in-depth analysis details the various methods for diagnosing diabetic retinopathy using blood vessels, microaneurysms, exudates, the macula, optic discs, and hemorrhages. Most trials use fundus images of the retina, taken with a fundus camera. This survey discusses the basics of diabetes, its prevalence, its complications, and artificial intelligence approaches for the early detection and classification of diabetic retinopathy. The research also discusses artificial-intelligence-based techniques such as machine learning and deep learning. New research fields such as transfer learning using generative adversarial networks, domain adaptation, multitask learning, and explainable artificial intelligence in diabetic retinopathy are also considered. A list of existing datasets, screening systems, performance measurements, biomarkers in diabetic retinopathy, potential issues, and challenges faced in ophthalmology is discussed, followed by the future scope and conclusion. To the authors' knowledge, no other literature review has analyzed recent state-of-the-art techniques using the PRISMA approach with artificial intelligence as its core. Full article

21 pages, 3495 KiB  
Article
Innovative Business Process Reengineering Adoption: Framework of Big Data Sentiment, Improving Customers’ Service Level Agreement
by Heru Susanto, Aida Sari and Fang-Yie Leu
Big Data Cogn. Comput. 2022, 6(4), 151; https://doi.org/10.3390/bdcc6040151 - 8 Dec 2022
Cited by 7 | Viewed by 3070
Abstract
Social media is now regarded as the most valuable source of data for trend analysis and innovative business process reengineering preferences. Data made accessible through social media can be utilized for a variety of purposes; for example, an entrepreneur who wants to learn more about the market they intend to enter can uncover their consumers' requirements before launching new products or services. Sentiment analysis and text mining of telecommunication businesses via social media posts and comments are the subject of this study. A proposed framework is utilized as a guideline and tested for sentiment analysis: lexicon-based sentiment categorization serves as the training dataset for a supervised machine learning support vector machine. The results are very promising when compared in terms of accuracy and the quantity of true sentiments detected. This result signifies the usefulness of text mining and sentiment analysis on social media data, and the use of machine learning classifiers for predicting sentiment orientation provides a useful tool for operations and marketing departments. The availability of large amounts of data in this digitally active society is advantageous for sectors such as the telecommunication industry. With text mining and sentiment analysis, these companies can stay two steps ahead with their strategy, make customers happier, and mitigate problems more easily, further adopting innovative business process reengineering for service improvements within the telecommunications industry. Full article
(This article belongs to the Special Issue Advanced Data Mining Techniques for IoT and Big Data)
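The pipeline described above uses lexicon-based labels to supervise a support vector machine. A minimal sketch of that two-stage idea (the tiny lexicon and posts below are hypothetical placeholders, not the paper's resources):

```python
# Stage 1: a sentiment lexicon labels the posts.
# Stage 2: those labels supervise an SVM.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

LEXICON = {"great": 1, "love": 1, "fast": 1, "slow": -1, "terrible": -1, "drop": -1}

def lexicon_label(text):
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return 1 if score >= 0 else 0  # 1 = positive, 0 = negative

posts = ["love the fast new data plan", "terrible signal and slow support",
         "great coverage upgrade", "calls drop all the time"]
labels = [lexicon_label(p) for p in posts]

svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(posts, labels)
print(svm.predict(["the upgrade is great but support is slow"]))
```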

12 pages, 1047 KiB  
Article
EffResUNet: Encoder Decoder Architecture for Cloud-Type Segmentation
by Sunveg Nalwar, Kunal Shah, Ranjeet Vasant Bidwe, Bhushan Zope, Deepak Mane, Veena Jadhav and Kailash Shaw
Big Data Cogn. Comput. 2022, 6(4), 150; https://doi.org/10.3390/bdcc6040150 - 7 Dec 2022
Cited by 7 | Viewed by 2375
Abstract
Clouds play a vital role in Earth's water cycle and the energy balance of the climate system; understanding them and their composition is crucial for comprehending the Earth–atmosphere system. The dataset "Understanding Clouds from Satellite Images" contains cloud pattern images downloaded from NASA Worldview, captured by satellites and divided into four classes, labeled Fish, Flower, Gravel, and Sugar. Semantic segmentation, also known as semantic labeling, is a fundamental yet complex problem in remote sensing image interpretation: assigning pixel-by-pixel semantic class labels to a given picture. In this study, we propose a novel approach for the semantic segmentation of cloud patterns. We began with a simple convolutional neural network-based model and worked our way up to a complex model consisting of a U-shaped encoder-decoder network, residual blocks, and an attention mechanism for efficient and accurate semantic segmentation. As the first architecture of its kind, the model achieved an IoU score of 0.4239 and a Dice coefficient of 0.5557, both of which improve on previous research in this field. Full article
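The two reported metrics, IoU and the Dice coefficient, are standard overlap measures on segmentation masks. A minimal NumPy sketch on binary masks (the masks below are hypothetical):

```python
import numpy as np

def iou(pred, target):
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

def dice(pred, target):
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2 * inter / total if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)    # predicted mask
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)  # ground-truth mask
print(f"IoU:  {iou(pred, target):.4f}")
print(f"Dice: {dice(pred, target):.4f}")
```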

16 pages, 4442 KiB  
Article
Yolov5 Series Algorithm for Road Marking Sign Identification
by Christine Dewi, Rung-Ching Chen, Yong-Cun Zhuang and Henoch Juli Christanto
Big Data Cogn. Comput. 2022, 6(4), 149; https://doi.org/10.3390/bdcc6040149 - 7 Dec 2022
Cited by 21 | Viewed by 5761
Abstract
Road markings and signs provide vehicles and pedestrians with essential information that helps them follow traffic regulations. Road surface markings include pedestrian crossings, directional arrows, zebra crossings, speed limit signs, and other similar signs and text, which are usually painted directly onto the road surface. Road markings fulfill a variety of important functions, such as alerting drivers to potentially hazardous road sections, directing traffic, prohibiting certain actions, and slowing vehicles down. This research paper provides a summary of the Yolov5 algorithm series for road marking sign identification, which includes Yolov5s, Yolov5m, Yolov5n, Yolov5l, and Yolov5x. This study explores a wide range of contemporary object detectors, such as those used to determine the location of road marking signs. Performance metrics monitor important data, including the number of BFLOPS, the mean average precision (mAP), the intersection over union (IoU), and the detection time. Our findings show that Yolov5m is the most stable method compared with the other methods, with 76% precision, 86% recall, and 83% mAP during the training stage. Moreover, Yolov5m and Yolov5l achieve the highest score in the testing stage, an mAP of 87% on average. In addition, we have created a new dataset for road marking signs in Taiwan, called TRMSD. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
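Yolov5 variants like those benchmarked above can be loaded through PyTorch Hub, a documented entry point for the ultralytics/yolov5 repository. A minimal sketch; detecting the paper's road-marking classes would require weights fine-tuned on TRMSD ("best.pt" and the image path below are hypothetical), while the stock yolov5m weights detect generic COCO objects only:

```python
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)
# model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # fine-tuned weights

results = model("road_scene.jpg")   # hypothetical image path
results.print()                     # per-class counts and inference time
print(results.pandas().xyxy[0])     # boxes, confidences, class labels
```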

13 pages, 879 KiB  
Article
Trust-Based Data Communication in Wireless Body Area Network for Healthcare Applications
by Sangeetha Ramaswamy and Usha Devi Gandhi
Big Data Cogn. Comput. 2022, 6(4), 148; https://doi.org/10.3390/bdcc6040148 - 1 Dec 2022
Cited by 7 | Viewed by 2746
Abstract
Wireless Body Area Networks (WBANs), a subset of Wireless Sensor Networks, are an emerging technology. A WBAN is a collection of tiny wireless body sensors with small computational capability, communicating over short distances using ZigBee or Bluetooth, with applications mainly in the healthcare industry, such as remote patient monitoring. Each sensor monitors health factors such as body temperature, pulse rate, ECG, and heart rate, and communicates with the base station or central coordinator for aggregation or data computation. The final data are communicated to remote monitoring devices through the internet or cloud service providers. The main challenges for this technology are energy consumption, secure communication within the network, and the possibility of attacks executed by malicious nodes, which create problems for the network. This paper proposes a suitable trust model for secure communication in WBAN based on node trust and data trust. Node trust is calculated using direct trust calculation and node behaviours, while data trust is calculated using consistent data success and data aging. The performance is compared with existing protocols such as Trust Evaluation (TE)-WBAN and Body Area Network (BAN)-Trust, which are not cryptographic techniques. The protocol is lightweight and has low overhead, and its performance is rated best for throughput, packet delivery ratio, and minimum delay. In extensive simulations, on-off attacks, selfishness attacks, sleeper attacks, and message suppression attacks were prevented. Full article
(This article belongs to the Special Issue Computational Collective Intelligence with Big Data–AI Society)
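Direct node trust of the kind described above is commonly computed as a smoothed success ratio with an aging factor so that stale observations count less. A minimal sketch under those assumptions (the constants and update rule are illustrative, not the paper's exact formulas):

```python
# Blend the latest direct observation with aged historical trust.
def update_trust(successes, failures, last_trust, age_steps, alpha=0.7, decay=0.95):
    total = successes + failures
    direct = successes / total if total else 0.5   # neutral prior when no data
    aged_history = last_trust * (decay ** age_steps)
    return alpha * direct + (1 - alpha) * aged_history

trust = 0.5
for s, f in [(9, 1), (8, 2), (2, 8)]:   # observed interaction windows
    trust = update_trust(s, f, trust, age_steps=1)
    print(f"node trust: {trust:.3f}")
```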

17 pages, 844 KiB  
Article
NLP-Based Bi-Directional Recommendation System: Towards Recommending Jobs to Job Seekers and Resumes to Recruiters
by Suleiman Ali Alsaif, Minyar Sassi Hidri, Imen Ferjani, Hassan Ahmed Eleraky and Adel Hidri
Big Data Cogn. Comput. 2022, 6(4), 147; https://doi.org/10.3390/bdcc6040147 - 1 Dec 2022
Cited by 20 | Viewed by 5525
Abstract
For more than ten years, online job boards have provided their services to both job seekers and employers who want to hire potential candidates. The provided services are generally based on traditional information retrieval techniques, which may not be appropriate for either side: the number of results produced for job seekers may be enormous, so they are required to spend time reviewing them against their search criteria. Reciprocally, recruitment is a crucial process for every organization, and identifying potential candidates and matching them with job offers requires a wide range of expertise and knowledge. This article proposes a reciprocal recommendation based on bi-directional correspondence as a way to support both recruiters' and job seekers' work. Recruiters can find the best-fit candidates for every position in their job postings, and job seekers can find the jobs that best match their resumes. We show how machine learning can solve problems in the natural language processing of text content and compute similarity scores on job offers in major Saudi cities scraped from Indeed. For bi-directional matching, a similarity calculation based on the integration of explicit and implicit job information from the two sides (recruiters and job seekers) is used. The proposed system is evaluated using a resume/job offer dataset, and the performance of the generated recommendations is evaluated using decision support measures. The obtained results confirm that the proposed system can not only solve the problem of bi-directional recommendation but also improve prediction accuracy. Full article
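A minimal sketch of bi-directional matching: resumes and job offers share one TF-IDF space, and each side is ranked against the other over the same similarity matrix (the texts below are hypothetical placeholders, not the paper's similarity model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resumes = ["python data engineer spark sql", "retail sales manager arabic english"]
jobs = ["data engineer needed, python and sql", "store manager, riyadh branch"]

vec = TfidfVectorizer().fit(resumes + jobs)          # one shared vector space
sim = cosine_similarity(vec.transform(resumes), vec.transform(jobs))

for i in range(len(resumes)):                        # resumes -> jobs
    best = sim[i].argmax()
    print(f"resume {i} -> job {best} (score {sim[i, best]:.2f})")
for j in range(len(jobs)):                           # jobs -> resumes
    best = sim[:, j].argmax()
    print(f"job {j} -> resume {best} (score {sim[best, j]:.2f})")
```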

19 pages, 6928 KiB  
Article
Image Fundus Classification System for Diabetic Retinopathy Stage Detection Using Hybrid CNN-DELM
by Dian Candra Rini Novitasari, Fatmawati Fatmawati, Rimuljo Hendradi, Hetty Rohayani, Rinda Nariswari, Arnita Arnita, Moch Irfan Hadi, Rizal Amegia Saputra and Ardhin Primadewi
Big Data Cogn. Comput. 2022, 6(4), 146; https://doi.org/10.3390/bdcc6040146 - 1 Dec 2022
Cited by 8 | Viewed by 2670
Abstract
Diabetic retinopathy is the leading cause of blindness in working-age adults. The increase in the population diagnosed with DR can be prevented by screening and early treatment of eye damage. This screening process can be conducted utilizing deep learning techniques. In this study, the detection of DR severity was carried out using the hybrid CNN-DELM method (CDELM). The CNN architectures used were ResNet-18, ResNet-50, ResNet-101, GoogleNet, and DenseNet, and the learned features were further classified using the DELM algorithm. The comparison of CNN architectures aimed to find the best architecture for fundus image feature extraction. This research also compared the effect of the kernel function on the performance of DELM in fundus image classification. All experiments using CDELM showed maximum results, with an accuracy of 100% on the DRIVE data and the two-class MESSIDOR data, while the best result on the four-class MESSIDOR data reached 98.20%. The advantage of the DELM method compared with the conventional CNN method is that the training time is much shorter: CNN takes an average of 30 min for training, while the CDELM method takes only an average of 2.5 min. Based on accuracy and training time, the CDELM method performed better than the conventional CNN method. Full article
(This article belongs to the Topic Machine and Deep Learning)
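The short training time reported above follows from the extreme-learning-machine design: a fixed random hidden layer whose output weights are solved in closed form rather than by gradient descent. A minimal single-layer ELM sketch, with random placeholders standing in for the CNN features (the shapes and class counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))            # hypothetical CNN feature vectors
y = rng.integers(0, 4, size=200)           # 4 DR severity classes
T = np.eye(4)[y]                           # one-hot targets

n_hidden = 256
W = rng.normal(size=(512, n_hidden))       # random input weights, never trained
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                     # hidden-layer activations
beta = np.linalg.pinv(H) @ T               # closed-form output weights

pred = np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
print("train accuracy:", (pred == y).mean())
```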

15 pages, 627 KiB  
Article
Adverse Drug Reaction Concept Normalization in Russian-Language Reviews of Internet Users
by Alexander Sboev, Roman Rybka, Artem Gryaznov, Ivan Moloshnikov, Sanna Sboeva, Gleb Rylkov and Anton Selivanov
Big Data Cogn. Comput. 2022, 6(4), 145; https://doi.org/10.3390/bdcc6040145 - 29 Nov 2022
Cited by 2 | Viewed by 2331
Abstract
Mapping pharmaceutically significant entities in natural language to standardized terms/concepts is a key task in the development of systems for pharmacovigilance, marketing, and monitoring the use of drugs outside their approved scope. This work estimates the accuracy of mapping adverse reaction mentions to concepts from the Medical Dictionary for Regulatory Activities (MedDRA) in the case of adverse reactions extracted from reviews of pharmaceutical products by Russian-speaking Internet users (the normalization task). The solution we propose is based on a neural network approach using two models: the first for encoding concepts and the second for encoding mentions. Both are pre-trained language models, but the second is additionally tuned for the normalization task using both the Russian Drug Reviews (RDRS) corpus and a set of open English-language corpora automatically translated into Russian. This additional tuning increases the accuracy of mapping adverse drug reaction mentions by 3% on the RDRS corpus. The resulting accuracy for mapping adverse reaction mentions to the preferred terms of MedDRA in RDRS is 70.9% F1-micro. The paper analyzes the factors that affect the accuracy of the task based on a comparison of the RDRS and the CSIRO Adverse Drug Event Corpus (CADEC) corpora. It is shown that the composition of the MedDRA concepts and the number of examples per concept play a key role in solving the task. The proposed model shows a comparable accuracy of 87.5% F1-micro on subsamples of the RDRS and CADEC datasets with the same set of MedDRA preferred terms. Full article
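A minimal sketch of the two-encoder normalization idea: concepts and mentions are embedded, and each mention is mapped to the nearest MedDRA preferred term by cosine similarity (`encode` below is a hypothetical stand-in for the pre-trained language-model encoders used in the paper):

```python
import numpy as np

def encode(texts):
    """Hypothetical encoder: replace with a pre-trained language model."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 64))

concepts = ["Headache", "Nausea", "Dizziness"]          # MedDRA preferred terms
mentions = ["my head was pounding", "felt sick to my stomach"]

C = encode(concepts)
M = encode(mentions)
C /= np.linalg.norm(C, axis=1, keepdims=True)           # unit-normalize so the
M /= np.linalg.norm(M, axis=1, keepdims=True)           # dot product is cosine

for mention, row in zip(mentions, M @ C.T):              # nearest concept wins
    print(f"{mention!r} -> {concepts[row.argmax()]}")
```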

11 pages, 1092 KiB  
Article
ACUX Recommender: A Mobile Recommendation System for Multi-Profile Cultural Visitors Based on Visiting Preferences Classification
by Markos Konstantakis, Yannis Christodoulou, John Aliprantis and George Caridakis
Big Data Cogn. Comput. 2022, 6(4), 144; https://doi.org/10.3390/bdcc6040144 - 28 Nov 2022
Cited by 12 | Viewed by 3109
Abstract
In recent years, Recommendation Systems (RSs) have gained popularity in different scientific fields through the creation of (mostly mobile) applications that deliver personalized services. A mobile recommendation system (MRS) that classifies in situ visitors according to different visiting profiles could act as a mediator between their visiting preferences and cultural content. Drawing on the above, in this paper, we propose the ACUX Recommender (ACUX-R), an MRS for recommending personalized cultural POIs to visitors based on their visiting preferences. ACUX-R experimentally employs the ACUX typology for assigning profiles to cultural visitors. ACUX-R was evaluated through a user study and a questionnaire. The evaluation showed that the proposed ACUX-R satisfies cultural visitors and is capable of capturing their nonverbal visiting preferences and needs. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)

15 pages, 2125 KiB  
Article
Security and Privacy Threats and Requirements for the Centralized Contact Tracing System in Korea
by Sungchae Park and Heung-Youl Youm
Big Data Cogn. Comput. 2022, 6(4), 143; https://doi.org/10.3390/bdcc6040143 - 28 Nov 2022
Cited by 2 | Viewed by 4293
Abstract
As COVID-19 became a worldwide pandemic, contact tracing technologies and information systems were developed for the quick control of infectious diseases in both the private and public sectors. This study aims to strengthen the data subject's security, privacy, and rights in a centralized contact tracing system adopted for a quick response to the spread of infectious diseases due to climate change, increasing cross-border movement, etc. There are several types of contact tracing systems: centralized, decentralized, and hybrid models. This study demonstrates the privacy model for a centralized contact tracing system, focusing on the case in Korea. Hence, we define security and privacy threats to the centralized contact tracing system. The threat analysis involved mapping the threats in ITU-T X.1121, and the defined threats were validated by mapping them with LINDDUN and STRIDE. In addition, this study provides security requirements for each defined threat for more secure utilization of the centralized contact tracing system. Full article

13 pages, 1011 KiB  
Article
Moving Object Detection Based on a Combination of Kalman Filter and Median Filtering
by Diana Kalita and Pavel Lyakhov
Big Data Cogn. Comput. 2022, 6(4), 142; https://doi.org/10.3390/bdcc6040142 - 25 Nov 2022
Cited by 2 | Viewed by 2937
Abstract
The task of determining the distance from one object to another is one of the important tasks solved in robotics systems. Conventional algorithms rely on an iterative process of predicting distance estimates, which results in an increased computational burden. Algorithms used in robotic systems should require minimal time costs and be resistant to the presence of noise. To solve these problems, this paper proposes an algorithm for Kalman combination filtering with a Goldschmidt divisor and a median filter. Software simulation showed an increase in the prediction accuracy of the developed algorithm in comparison with the traditional filtering algorithm, as well as an increase in the speed of the algorithm. The results obtained can be effectively applied in various computer vision systems. Full article
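A minimal sketch of the combination, assuming a scalar state: measurements first pass through a median filter to suppress impulse outliers, then through a simple Kalman filter (the noise parameters are hypothetical, and the paper's Goldschmidt division step is omitted):

```python
import numpy as np
from scipy.signal import medfilt

def kalman_1d(measurements, q=1e-3, r=0.5):
    x, p = measurements[0], 1.0            # initial state and covariance
    estimates = []
    for z in measurements:
        p += q                             # predict
        k = p / (p + r)                    # Kalman gain
        x += k * (z - x)                   # update with measurement z
        p *= 1 - k
        estimates.append(x)
    return np.array(estimates)

true_dist = np.linspace(10, 8, 100)                          # object approaching
noisy = true_dist + np.random.default_rng(1).normal(0, 0.3, 100)
noisy[::17] += 3.0                                            # impulse outliers
smoothed = kalman_1d(medfilt(noisy, kernel_size=5))           # median, then Kalman
print("RMSE:", np.sqrt(np.mean((smoothed - true_dist) ** 2)))
```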

20 pages, 6697 KiB  
Article
Image Segmentation for Mitral Regurgitation with Convolutional Neural Network Based on UNet, Resnet, Vnet, FractalNet and SegNet: A Preliminary Study
by Linda Atika, Siti Nurmaini, Radiyati Umi Partan and Erwin Sukandi
Big Data Cogn. Comput. 2022, 6(4), 141; https://doi.org/10.3390/bdcc6040141 - 25 Nov 2022
Cited by 5 | Viewed by 2931
Abstract
The heart's mitral valve separates the chambers of the heart between the left atrium and left ventricle. Heart valve disease is a fairly common heart disease, and one type is mitral regurgitation, an abnormality of the mitral valve on the left side of the heart that prevents the valve from closing properly. A Convolutional Neural Network (CNN) is a type of deep learning suitable for image analysis. Segmentation is widely used in analyzing medical images because it divides images into simpler ones to facilitate the analysis, separating objects that are not analyzed into the background and objects to be analyzed into the foreground. This study builds a dataset from the data of patients with mitral regurgitation and patients with normal hearts, and heart valve image analysis is performed by segmenting the images of their mitral heart valves. Several types of CNN architecture were applied in this research, including the U-Net, SegNet, V-Net, FractalNet, and ResNet architectures. The experimental results show that the best architecture is U-Net3 in terms of Pixel Accuracy (97.59%), Intersection over Union (86.98%), Mean Accuracy (93.46%), Precision (85.60%), Recall (88.39%), and Dice Coefficient (86.58%). Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)

23 pages, 2848 KiB  
Systematic Review
Physics-Informed Neural Network (PINN) Evolution and Beyond: A Systematic Literature Review and Bibliometric Analysis
by Zaharaddeen Karami Lawal, Hayati Yassin, Daphne Teck Ching Lai and Azam Che Idris
Big Data Cogn. Comput. 2022, 6(4), 140; https://doi.org/10.3390/bdcc6040140 - 21 Nov 2022
Cited by 37 | Viewed by 21549
Abstract
This research aims to study and assess state-of-the-art physics-informed neural networks (PINNs) from different researchers' perspectives. The PRISMA framework was used for a systematic literature review, and 120 research articles from the computational sciences and engineering domain were classified through a well-defined keyword search in the Scopus and Web of Science databases. Through bibliometric analyses, we identified the journal sources with the most publications, the authors with high citations, and the countries with many publications on PINNs. Newly improved techniques developed to enhance PINN performance and to reduce high training costs and slowness, among other limitations, are highlighted, along with the different approaches introduced to overcome these limitations. In this review, we categorize the newly proposed PINN methods into Extended PINNs, Hybrid PINNs, and Minimized Loss techniques. Various potential future research directions are outlined based on the limitations of the proposed solutions. Full article
(This article belongs to the Special Issue Sustainable Big Data Analytics and Machine Learning Technologies)
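A minimal sketch of the core PINN idea reviewed above, assuming PyTorch: a small network u(t) is trained so that the residual of the ODE u'(t) = -u(t) vanishes on collocation points while the initial condition u(0) = 1 enters the loss (the exact solution is e^(-t); the toy problem and hyperparameters are illustrative):

```python
import math
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
t = torch.linspace(0, 2, 100).reshape(-1, 1).requires_grad_(True)  # collocation points

for step in range(2000):
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    physics_loss = ((du + u) ** 2).mean()                    # residual of u' = -u
    ic_loss = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # u(0) = 1
    loss = physics_loss + ic_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print("u(1) =", net(torch.tensor([[1.0]])).item(), "exact:", math.exp(-1.0))
```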

14 pages, 490 KiB  
Article
Lung Cancer Risk Prediction with Machine Learning Models
by Elias Dritsas and Maria Trigka
Big Data Cogn. Comput. 2022, 6(4), 139; https://doi.org/10.3390/bdcc6040139 - 15 Nov 2022
Cited by 61 | Viewed by 14043
Abstract
The lungs are the center of breath control and ensure that every cell in the body receives oxygen. At the same time, they filter the air to prevent the entry of harmful substances and germs into the body. The human body has specially designed defence mechanisms that protect the lungs; however, these are not enough to completely eliminate the risk of the various diseases that affect the lungs. Infections, inflammation, or even more serious complications, such as the growth of a cancerous tumor, can affect the lungs. In this work, we used machine learning (ML) methods to build efficient models for identifying individuals at high risk of developing lung cancer and, thus, making earlier interventions to avoid long-term complications. This article proposes the Rotation Forest model, which achieves high performance as evaluated by well-known metrics, such as precision, recall, F-measure, accuracy, and area under the curve (AUC). More specifically, the evaluation of the experiments showed that the proposed model prevailed with an AUC of 99.3% and an F-measure, precision, recall, and accuracy of 97.1%. Full article
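Rotation Forest trains each tree on a rotated feature space. A compact sketch of the idea, assuming scikit-learn (a faithful Rotation Forest rotates random feature subsets; this simplified variant applies one PCA rotation per tree, and the data are synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
rng = np.random.default_rng(0)
ensemble = []
for i in range(10):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    rot = PCA(random_state=i).fit(X[idx])                  # per-tree rotation
    tree = DecisionTreeClassifier(random_state=i).fit(rot.transform(X[idx]), y[idx])
    ensemble.append((rot, tree))

votes = np.array([tree.predict(rot.transform(X)) for rot, tree in ensemble])
majority = (votes.mean(axis=0) > 0.5).astype(int)          # majority vote
print("train accuracy:", (majority == y).mean())
```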

13 pages, 4030 KiB  
Article
The “Unreasonable” Effectiveness of the Wasserstein Distance in Analyzing Key Performance Indicators of a Network of Stores
by Andrea Ponti, Ilaria Giordani, Matteo Mistri, Antonio Candelieri and Francesco Archetti
Big Data Cogn. Comput. 2022, 6(4), 138; https://doi.org/10.3390/bdcc6040138 - 15 Nov 2022
Cited by 2 | Viewed by 2746
Abstract
Large retail companies routinely gather huge amounts of customer data, which are to be analyzed at a low granularity. To enable this analysis, several Key Performance Indicators (KPIs), acquired for each customer through different channels, are associated with the main drivers of the customer experience. Analyzing samples of customer behavior only through parameters such as the average and variance cannot cope with the growing heterogeneity of customers. In this paper, we propose a different approach in which the samples from customer surveys are represented as discrete probability distributions whose similarities can be assessed by different models. The focus is on the Wasserstein distance, which is generally well defined even when other distributional distances are not, and which provides an interpretable distance metric between distributions. The support of the distributions can be both one- and multi-dimensional, allowing for the joint consideration of several KPIs for each store and leading to a multi-variate histogram. Moreover, the Wasserstein barycenter offers a useful synthesis of a set of distributions and can be used as a reference distribution to characterize and classify behavioral patterns. Experimental results on real data show the effectiveness of the Wasserstein distance in providing global performance measures. Full article
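A minimal sketch of the central point, using SciPy's 1D Wasserstein distance: two stores' KPI samples can have nearly identical means yet very different distributions (the samples below are hypothetical):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
store_a = rng.normal(7.0, 1.0, 500)                 # unimodal scores around 7
store_b = np.concatenate([rng.normal(5, 0.5, 250),  # bimodal: mix of unhappy
                          rng.normal(9, 0.5, 250)])  # and delighted customers

print("means:", store_a.mean().round(2), store_b.mean().round(2))  # nearly equal
print("Wasserstein distance:", wasserstein_distance(store_a, store_b).round(2))
```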

13 pages, 367 KiB  
Article
PSO-Driven Feature Selection and Hybrid Ensemble for Network Anomaly Detection
by Maya Hilda Lestari Louk and Bayu Adhi Tama
Big Data Cogn. Comput. 2022, 6(4), 137; https://doi.org/10.3390/bdcc6040137 - 13 Nov 2022
Cited by 7 | Viewed by 2813
Abstract
As a system capable of monitoring and evaluating illegitimate network access, an intrusion detection system (IDS) profoundly impacts information security research. Machine learning techniques constitute the backbone of IDS, yet developing an accurate detection mechanism remains challenging. This study aims to enhance the detection performance of IDS by using a particle swarm optimization (PSO)-driven feature selection approach and a hybrid ensemble. Specifically, the final feature subsets derived from different IDS datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS-2017, are trained using a hybrid ensemble comprising two well-known ensemble learners, i.e., the gradient boosting machine (GBM) and bootstrap aggregation (bagging). Instead of training a GBM with individual ensemble learning, we train a GBM on a subsample of each intrusion dataset and combine the final class predictions using majority voting. Our proposed scheme led to pivotal refinements over existing baselines, such as TSE-IDS, voting ensembles, weighted majority voting, and other individual ensemble-based IDSs such as LightGBM. Full article
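A minimal sketch of the hybrid ensemble described above: several GBMs, each trained on a bootstrap subsample (the bagging step), combined by majority voting. The PSO feature-selection step is assumed to have already produced the feature matrix, and the data here are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)

members = []
for i in range(7):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bagging subsample
    members.append(GradientBoostingClassifier(random_state=i).fit(X[idx], y[idx]))

votes = np.array([m.predict(X) for m in members])
prediction = (votes.mean(axis=0) > 0.5).astype(int)        # majority vote
print("train accuracy:", (prediction == y).mean())
```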

17 pages, 3101 KiB  
Article
Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
by Sumeth Yuenyong and Konlakorn Wongpatikaseree
Big Data Cogn. Comput. 2022, 6(4), 136; https://doi.org/10.3390/bdcc6040136 - 11 Nov 2022
Viewed by 2012
Abstract
Due to the ubiquitous nature of CCTV cameras that record continuously, there are large amounts of unstructured video data. Often, when these recordings have to be reviewed, it is to look for a specific person who fits a certain description. Currently, this is achieved by manual inspection of the videos, which is both time-consuming and labor-intensive. While person description search is not a new topic, in this work, we make two contributions. First, we improve upon the existing state of the art by proposing unsupervised fine-tuning of the language model that forms the main part of the text branch of person description search models. This led to higher recall values on the standard dataset. Second, we engineered a complete pipeline from video files to quickly searchable objects. Thanks to an approximate nearest neighbor search and some model optimizations, a person description search can be performed such that the result is available immediately when deployed on a standard PC with no GPU, allowing an interactive search. We demonstrate the effectiveness of the system on new data and show that most people in the videos can be successfully discovered by the search. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)

15 pages, 1233 KiB  
Concept Paper
Unleashing the Potentials of Quantum Probability Theory for Customer Experience Analytics
by Havana Rika, Itzhak Aviv and Roye Weitzfeld
Big Data Cogn. Comput. 2022, 6(4), 135; https://doi.org/10.3390/bdcc6040135 - 10 Nov 2022
Cited by 10 | Viewed by 2770
Abstract
In information systems research, the advantages of Customer Experience (CX) and its contribution to organizations are largely recognized. CX analytics evaluate how customers perceive products, ranging from their functional usage to their cognitive states regarding the product, such as emotions, sentiment, and satisfaction. The most recent research in psychology reveals that cognition analytics based on Classical Probability Theory (CPT) and statistical learning, which is used to evaluate people's cognitive states, is limited due to its reliance on rational decision-making. However, the cognitive attitudes of customers are characterized by uncertainty and entanglement, resulting in irrational decision-making bias. What traditional CPT-based data science captures of the cognitive aspects of CX analytics is only a small portion of what should be captured; current CX analytics efforts fall far short of their full potential. In this paper, we set a novel research direction for CX analytics based on Quantum Probability Theory (QPT). QPT-based analytics have been introduced recently in psychology research and provide better cognition assessment under uncertainty with a high level of irrational behavior. Adopting recent advances in the psychology domain, this paper develops a vision and sets a research agenda for expanding the application of CX analytics through QPT to overcome CPT's shortcomings, identifies research areas that contribute to the vision, and proposes elements of a future research agenda. To stimulate debate and research on QPT-CX analytics, we attempt a preliminary characterization of the novel method by introducing a QPT-based rich mathematical framework for CX cognitive modeling based on quantum superposition, the Bloch sphere, and Hilbert space. We demonstrate the implementation of the QPT-CX model with a use case of customers' emotional motivator assessment, implementing a quantum vector space with a set of mathematical axioms for CX analytics. Finally, we outline the key advantages of quantum CX over the classical approach, supporting each with theoretical proof. Full article
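A minimal sketch of the quantum building block mentioned above: a customer attitude represented as a qubit superposition on the Bloch sphere, with outcome probabilities given by the Born rule |⟨e|ψ⟩|² (the angles and state labels below are hypothetical, not the paper's model):

```python
import numpy as np

theta, phi = np.pi / 3, np.pi / 4                        # Bloch-sphere angles
psi = np.array([np.cos(theta / 2),                       # amplitude of |0> ("satisfied")
                np.exp(1j * phi) * np.sin(theta / 2)])   # amplitude of |1> ("unsatisfied")

p_satisfied = abs(psi[0]) ** 2                           # Born rule
p_unsatisfied = abs(psi[1]) ** 2
print(f"P(satisfied)={p_satisfied:.3f}, P(unsatisfied)={p_unsatisfied:.3f}")  # sums to 1
```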
18 pages, 6014 KiB  
Case Report
Comparative Study of Mortality Rate Prediction Using Data-Driven Recurrent Neural Networks and the Lee–Carter Model
by Yuan Chen and Abdul Q. M. Khaliq
Big Data Cogn. Comput. 2022, 6(4), 134; https://doi.org/10.3390/bdcc6040134 - 10 Nov 2022
Cited by 5 | Viewed by 3050
Abstract
The Lee–Carter model is considered one of the most important stochastic mortality prediction models. With the recent developments in machine learning and deep learning, many studies have applied deep learning approaches to time series [...] Read more.
The Lee–Carter model is considered one of the most important stochastic mortality prediction models. With the recent developments in machine learning and deep learning, many studies have applied deep learning approaches to time series mortality rate prediction, but most of them focus only on a comparison between Long Short-Term Memory networks and traditional models. In this study, three different recurrent neural networks, Long Short-Term Memory, Bidirectional Long Short-Term Memory, and Gated Recurrent Unit, are proposed for the task of mortality rate prediction. Departing from the standard country-level mortality rate comparison, this study compares the three deep learning models and the classic Lee–Carter model on nine divisions' yearly mortality data by gender from 1966 to 2015 in the United States. With out-of-sample testing, we found that the Gated Recurrent Unit model showed better average MAE and RMSE values than the Lee–Carter model on 72.2% (13/18) and 66.7% (12/18) of the division–gender datasets, respectively, while the corresponding figures for the Long Short-Term Memory model and the Bidirectional Long Short-Term Memory model are 50%/38.9% (MAE/RMSE) and 61.1%/61.1% (MAE/RMSE), respectively. Considering forecasting accuracy, computing expense, and interpretability together, the Lee–Carter model with ARIMA exhibits the best overall performance, but the recurrent neural networks are also good candidates for mortality forecasting for divisions in the United States. Full article
(This article belongs to the Topic Machine and Deep Learning)
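For readers unfamiliar with the baseline, the Lee–Carter model log m(x, t) = a_x + b_x k_t can be fitted with the standard SVD recipe in a few lines. The sketch below is a generic illustration, not the paper's code, and assumes a matrix of log central death rates such as the US division-level tables used in the study.

```python
import numpy as np

def fit_lee_carter(log_m: np.ndarray):
    """Fit log m(x, t) = a_x + b_x * k_t via SVD.

    log_m: log central death rates, shape (n_ages, n_years).
    Returns (a, b, k) under the usual identification constraints
    sum(b) = 1 and sum(k) = 0.
    """
    a = log_m.mean(axis=1)                 # average age profile a_x
    centered = log_m - a[:, None]
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    b, k = U[:, 0], s[0] * Vt[0]           # rank-1 (first singular triplet)
    b, k = b / b.sum(), k * b.sum()        # rescale so sum(b) = 1
    shift = k.mean()
    a, k = a + b * shift, k - shift        # re-center so sum(k) = 0
    return a, b, k

def forecast_k(k: np.ndarray, horizon: int) -> np.ndarray:
    """Random-walk-with-drift forecast of k_t, the usual companion to
    Lee-Carter (a special case of the ARIMA modeling the paper pairs with it)."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return k[-1] + drift * np.arange(1, horizon + 1)
```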
19 pages, 2082 KiB  
Article
SmartMinutes—A Blockchain-Based Framework for Automated, Reliable, and Transparent Meeting Minutes Management
by Amir Salar Hajy Sheikhi, Elias Iosif, Oleg Basov and Ioanna Dionysiou
Big Data Cogn. Comput. 2022, 6(4), 133; https://doi.org/10.3390/bdcc6040133 - 10 Nov 2022
Cited by 2 | Viewed by 2719
Abstract
The aim of this research work was to investigate the applicability of smart contracts to automating the process of managing meeting minutes. To this end, smartMinutes, a proof-of-concept prototype for automating meeting minutes, was designed, implemented, and validated with [...] Read more.
The aim of this research work was to investigate the applicability of smart contracts to automating the process of managing meeting minutes. To this end, smartMinutes, a proof-of-concept prototype for automating meeting minutes, was designed, implemented, and validated with test cases. The smartMinutes framework improves current meeting minutes practices by providing automation where possible, and doing so in a transparent, flexible, reliable, and tamper-proof manner. The last feature is of paramount importance because meeting minutes offer legal protection, as they are considered official records of the actions taken by an organisation. Additionally, smartMinutes supports meeting agendas with both non-voting and voting items, offering a pool of three voting schemes that execute under three different configurations. One particular configuration, the hidden mode, provides secrecy while at the same time guaranteeing transparency. Full article
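The paper does not detail the mechanics of its hidden mode, but a common way to combine vote secrecy with public verifiability on a ledger is a commit–reveal scheme, sketched below in plain Python purely as an illustration; an actual framework would implement the same two phases inside a smart contract, and the function names here are ours.

```python
import hashlib
import secrets

def commit(vote: str) -> tuple:
    """Commit phase: publish only the hash; keep the salt private."""
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{vote}:{salt}".encode()).hexdigest()
    return digest, salt

def reveal_ok(digest: str, vote: str, salt: str) -> bool:
    """Reveal phase: anyone can verify a vote against its commitment."""
    return hashlib.sha256(f"{vote}:{salt}".encode()).hexdigest() == digest

# During voting, only commitments sit on the ledger (secrecy); after
# voting closes, votes and salts are revealed, so anyone can recount
# the tally (transparency and tamper-evidence).
d, s = commit("approve")
assert reveal_ok(d, "approve", s)
assert not reveal_ok(d, "reject", s)
```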
24 pages, 718 KiB  
Review
An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management
by Athira Nambiar and Divyansh Mundra
Big Data Cogn. Comput. 2022, 6(4), 132; https://doi.org/10.3390/bdcc6040132 - 7 Nov 2022
Cited by 49 | Viewed by 47063
Abstract
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive. Efficient and optimal data analytics gives an organization a competitive edge in its [...] Read more.
Data is the lifeblood of any organization. In today’s world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive. Efficient and optimal data analytics gives an organization a competitive edge in its performance and services. Major organizations generate, collect, and process vast amounts of data, falling under the category of big data. Managing and analyzing the sheer volume and variety of big data is a cumbersome process. At the same time, proper utilization of an organization’s vast collection of information can generate meaningful insights into business tactics. In this regard, two popular data management systems in the area of big data analytics, the data warehouse and the data lake, act as platforms for accumulating the big data generated and used by organizations. Although seemingly similar, the two differ in their characteristics and applications. This article presents a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management. We detail the definitions, characteristics, and related works for the respective data management frameworks. Furthermore, we explain the architecture and design considerations of the current state of the art. Finally, we provide a perspective on the challenges and promising research directions for the future. Full article
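The contrast the survey draws, schema-on-write in a warehouse versus schema-on-read in a lake, can be illustrated in a few lines. The sketch below uses SQLite and JSON lines purely as stand-ins for real warehouse and lake technologies; the table and field names are hypothetical.

```python
import json
import sqlite3

# Warehouse: schema-on-write. Structure is enforced when data is loaded,
# so malformed rows are rejected up front and queries stay fast and simple.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
db.execute("INSERT INTO sales VALUES ('EMEA', 1200.0)")

# Lake: schema-on-read. Raw records are stored as-is (here, JSON lines),
# and each consumer imposes its own schema at query time.
raw_records = [
    '{"region": "EMEA", "amount": 1200.0}',
    '{"region": "APAC", "amount": 800.0, "channel": "web"}',  # extra field is fine
]
total = sum(json.loads(line).get("amount", 0.0) for line in raw_records)
print(total)  # 2000.0 -- the schema is applied only while reading
```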