Big Data Cogn. Comput., Volume 8, Issue 12 (December 2024) – 34 articles

Cover Story: AI is revolutionizing the legal field, enhancing efficiency and fairness in critical tasks like finding similar cases—a cornerstone of legal practice. Our research introduces a groundbreaking semantic search system that identifies comparable legal decisions using concise fact drafts. By analyzing over 1100 Hungarian court decisions, we compared 12 cutting-edge text-embedding models and implemented innovative techniques like last-chunk scaling. Models such as Cohere and BGE-M3 excelled in handling lengthy, complex texts. Validated by legal experts, our system promises to transform legal case retrieval, saving time while improving accuracy.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
18 pages, 1651 KiB  
Article
Sentiment Analysis of Product Reviews Using Machine Learning and Pre-Trained LLM
by Pawanjit Singh Ghatora, Seyed Ebrahim Hosseini, Shahbaz Pervez, Muhammad Javed Iqbal and Nabil Shaukat
Big Data Cogn. Comput. 2024, 8(12), 199; https://doi.org/10.3390/bdcc8120199 - 23 Dec 2024
Viewed by 1645
Abstract
Sentiment analysis via artificial intelligence, i.e., machine learning and large language models (LLMs), is a pivotal tool that classifies sentiments within texts as positive, negative, or neutral. It enables computers to automatically detect and interpret emotions from textual data, covering a spectrum of feelings without direct human intervention. Sentiment analysis is integral to marketing research, helping to gauge consumer emotions and opinions across various sectors. Its applications span analyzing movie reviews, monitoring social media, evaluating product feedback, assessing employee sentiments, and identifying hate speech. This study explores the application of both traditional machine learning and pre-trained LLMs for automated sentiment analysis of customer product reviews. The motivation behind this work lies in the demand for a more nuanced understanding of consumer sentiments that can drive data-informed business decisions. In this research, we applied machine learning-based classifiers, i.e., Random Forest, Naive Bayes, and Support Vector Machine (SVM), alongside the GPT-4 model to benchmark their effectiveness for sentiment analysis. The traditional models showed better results and efficiency in processing short, concise text, with SVM excelling at classifying the sentiment of short comments. However, GPT-4 performed better on more detailed texts, capturing subtle sentiments with higher precision, recall, and F1 scores and uniquely identifying mixed sentiments that the simpler models missed. In conclusion, this study shows that LLMs outperform traditional models in context-rich sentiment analysis by providing not only accurate sentiment classification but also insightful explanations. These results make LLMs a superior tool for customer-centric businesses, helping actionable insights to be derived from textual data. Full article
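To make the classical side of such a benchmark concrete, here is a minimal sketch (on a toy corpus, not the authors' data or pipeline) of TF-IDF features fed to the three traditional classifiers named above:

```python
# Minimal sketch of the classical baselines: TF-IDF features fed to
# Random Forest, Naive Bayes, and SVM. The toy corpus is illustrative,
# not the dataset used in the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

reviews = ["great product, works perfectly", "terrible, broke in a day",
           "okay for the price", "absolutely love it", "waste of money",
           "does the job"] * 10
labels = ["pos", "neg", "neu", "pos", "neg", "neu"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

for name, clf in [("SVM", LinearSVC()),
                  ("Naive Bayes", MultinomialNB()),
                  ("Random Forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_train_vec, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test_vec)))
```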

21 pages, 807 KiB  
Review
Digital Eye-Movement Outcomes (DEMOs) as Biomarkers for Neurological Conditions: A Narrative Review
by Lisa Graham, Rodrigo Vitorio, Richard Walker, Gill Barry, Alan Godfrey, Rosie Morris and Samuel Stuart
Big Data Cogn. Comput. 2024, 8(12), 198; https://doi.org/10.3390/bdcc8120198 - 19 Dec 2024
Viewed by 771
Abstract
Eye-movement assessment is a key component of neurological evaluation, offering valuable insights into neural deficits and underlying mechanisms. This narrative review explores the emerging subject of digital eye-movement outcomes (DEMOs) and their potential as sensitive biomarkers for neurological impairment. Eye tracking has become a useful method for investigating visual system functioning, attentional processes, and cognitive mechanisms. Abnormalities in eye movements, such as altered saccadic patterns or impaired smooth pursuit, can act as important diagnostic indicators for various neurological conditions. The non-invasive nature, cost-effectiveness, and ease of implementation of modern eye-tracking systems make them particularly attractive in both clinical and research settings. Advanced digital eye-tracking technologies and analytical methods enable precise quantification of eye-movement parameters, complementing subjective clinical evaluations with objective data. This review examines how DEMOs could contribute to the localisation and diagnosis of neural impairments, potentially serving as useful biomarkers. By comprehensively exploring the role of eye-movement assessment, this review aims to highlight the common eye-movement deficits seen in neurological injury and disease, using the examples of mild traumatic brain injury and Parkinson's Disease. This review also aims to enhance the understanding of the potential use of DEMOs in the diagnosis, monitoring, and management of neurological disorders, ultimately improving patient care and deepening our understanding of complex neurological processes. Furthermore, we consider the broader implications of this technology in unravelling the complexities of visual processing, attention mechanisms, and cognitive functions. This review summarises how DEMOs could reshape our understanding of brain health and allow for more targeted and effective neurological interventions. Full article

27 pages, 9069 KiB  
Article
Forecasting Human Core and Skin Temperatures: A Long-Term Series Approach
by Xinge Han, Jiansong Wu, Zhuqiang Hu, Chuan Li and Boyang Sun
Big Data Cogn. Comput. 2024, 8(12), 197; https://doi.org/10.3390/bdcc8120197 - 19 Dec 2024
Viewed by 640
Abstract
Human core and skin temperature (Tcr and Tsk) are crucial indicators of human health and are commonly utilized in diagnosing various types of diseases. This study presents a deep learning model that combines a long-term series forecasting method with transfer learning techniques, capable of making precise, personalized predictions of Tcr and Tsk in high-temperature environments with only a small corpus of actual training data. To validate the model in practice, field experiments were conducted in complex environments, and a thorough analysis of the effects of three diverse training strategies on the overall performance of the model was performed. The comparative analysis revealed that the optimized training method significantly improved prediction accuracy for forecasts extending up to 10 min into the future. Specifically, the approach of pretraining the model on in-distribution samples followed by fine-tuning markedly outperformed the other methods in terms of prediction accuracy, with a prediction error within ±0.14 °C for Tcr and within ±0.46 °C for Tsk,mean. This study provides a viable approach for the precise, real-time prediction of Tcr and Tsk, offering substantial support for advancing early warning research of human thermal health. Full article

21 pages, 1364 KiB  
Article
Arabic Opinion Classification of Customer Service Conversations Using Data Augmentation and Artificial Intelligence
by Rihab Fahd Al-Mutawa and Arwa Yousuf Al-Aama
Big Data Cogn. Comput. 2024, 8(12), 196; https://doi.org/10.3390/bdcc8120196 - 19 Dec 2024
Viewed by 623
Abstract
Customer satisfaction is not just a significant factor but a cornerstone for smart cities and the organizations that offer services to their people. It enhances an organization's reputation and profitability and drastically raises the chances of returning customers. Unfortunately, customer support service through online chat is often not rated by customers, which hampers service improvement. This study employs artificial intelligence and data augmentation to predict customer satisfaction ratings from conversations by analyzing the responses of customers and service providers. For the study, the authors obtained actual conversations between customers and real agents from the call center database of Jeddah Municipality that were rated by customers on a scale of 1–5. They trained and tested five prediction models with approaches based on logistic regression, random forest, and ensemble-based deep learning, and fine-tuned two recent pre-trained models: ArabicT5 and SaudiBERT. They then repeated training and testing after applying a data augmentation technique using generative artificial intelligence (GPT-4) to address the class imbalance in the customer conversation data. The study found that the ensemble-based deep learning approach best predicts the five-, three-, and two-class classifications. Moreover, data augmentation improved accuracy, with a 1.69% increase for the ensemble-based deep learning model and a 3.84% increase for the logistic regression model. This study contributes to the advancement of Arabic opinion mining, as it is the first to report performance in determining customer satisfaction levels from Arabic conversation data. The implications are significant, as the findings can be applied to improve customer service in various organizations. Full article
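The paper does not reproduce its augmentation prompts, so the sketch below only illustrates one plausible shape of GPT-4-based minority-class augmentation; the prompt wording, class handling, and use of the `openai` Python client are assumptions, not the authors' pipeline:

```python
# Hedged sketch of minority-class augmentation with GPT-4: paraphrase
# under-represented conversations to rebalance the training set.
# Prompt wording and class handling are illustrative assumptions.
# Requires OPENAI_API_KEY to be set in the environment.
from openai import OpenAI

client = OpenAI()

def paraphrase(conversation: str, n: int = 3) -> list[str]:
    """Ask GPT-4 for n Arabic paraphrases of a rated conversation."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Paraphrase the following Arabic customer-service "
                f"conversation {n} times, preserving its sentiment. "
                "Return one paraphrase per line.\n\n" + conversation
            ),
        }],
    )
    return response.choices[0].message.content.splitlines()

# Augment only the under-represented rating classes (e.g., ratings 1 and 2).
minority = [("...conversation text...", 1)]  # placeholder examples
augmented = [(p, label) for text, label in minority
             for p in paraphrase(text) if p.strip()]
```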

13 pages, 552 KiB  
Article
Mandarin Recognition Based on Self-Attention Mechanism with Deep Convolutional Neural Network (DCNN)-Gated Recurrent Unit (GRU)
by Xun Chen, Chengqi Wang, Chao Hu and Qin Wang
Big Data Cogn. Comput. 2024, 8(12), 195; https://doi.org/10.3390/bdcc8120195 - 18 Dec 2024
Viewed by 890
Abstract
Speech recognition technology is an important branch of artificial intelligence, aiming to transform human speech into computer-readable text. However, speech recognition still faces many challenges, such as noise interference and differences in accent and speech rate. The aim of this paper is to explore a deep learning-based speech recognition method that improves the accuracy and robustness of speech recognition. This paper first introduces the basic principles of speech recognition and the existing mainstream technologies, and then focuses on deep learning-based speech recognition methods. Comparative experiments show that the self-attention mechanism performs best in speech recognition tasks. To further improve speech recognition performance, this paper proposes a deep learning model based on the self-attention mechanism with DCNN-GRU. The model attends dynamically to the input speech by introducing the self-attention mechanism into the neural network model in place of an RNN, combined with a deep convolutional neural network, which improves the model's robustness and recognition accuracy. The experiments use the 170 h Chinese dataset AISHELL-1. Compared with a deep convolutional neural network, the self-attention-based DCNN-GRU model reduces the CER by at least 6%; compared with a bidirectional gated recurrent neural network, it reduces the CER by 0.7%. Finally, experiments on a test set analyzed the factors influencing the CER. The experimental results show that the model performs well in various noise environments and accent conditions. Full article
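As a rough illustration of the architecture described, here is a sketch of a convolutional front-end followed by a GRU and a self-attention layer in PyTorch; all layer sizes and the CTC-style output head are assumptions, not the paper's configuration:

```python
# Sketch of the DCNN-GRU-plus-self-attention idea: convolutional front-end
# over spectrogram frames, a GRU for temporal modelling, and self-attention
# over the GRU outputs. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DCNNGRUAttention(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab=4336, heads=4):
        super().__init__()
        self.conv = nn.Sequential(            # deep convolutional front-end
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        conv_dim = 32 * (n_mels // 4)
        self.gru = nn.GRU(conv_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads,
                                          batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab)  # CTC-style output layer

    def forward(self, mels):                  # mels: (B, T, n_mels)
        x = self.conv(mels.unsqueeze(1))      # (B, 32, T/4, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)  # (B, T/4, conv_dim)
        x, _ = self.gru(x)
        x, _ = self.attn(x, x, x)             # self-attention over frames
        return self.out(x).log_softmax(-1)    # per-frame token logits

logits = DCNNGRUAttention()(torch.randn(2, 200, 80))
print(logits.shape)  # torch.Size([2, 50, 4336])
```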

16 pages, 953 KiB  
Article
Assessing the Guidelines on the Use of Generative Artificial Intelligence Tools in Universities: A Survey of the World’s Top 50 Universities
by Midrar Ullah, Salman Bin Naeem and Maged N. Kamel Boulos
Big Data Cogn. Comput. 2024, 8(12), 194; https://doi.org/10.3390/bdcc8120194 - 18 Dec 2024
Viewed by 1433
Abstract
The widespread adoption of Generative Artificial Intelligence (GenAI) tools in higher education has necessitated the development of appropriate and ethical usage guidelines. This study aims to explore and assess publicly available guidelines covering the use of GenAI tools in universities, following a predefined checklist. We searched and downloaded publicly accessible guidelines on the use of GenAI tools from the websites of the top 50 universities globally, according to the 2025 QS university rankings. From the literature on GenAI use guidelines, we created a 24-item checklist, which was then reviewed by a panel of experts. This checklist was used to assess the characteristics of the retrieved university guidelines. Out of the 50 university websites explored, guidelines were publicly accessible on the sites of 41 institutions. All these guidelines allowed for the use of GenAI tools in academic settings provided that specific instructions detailed in the guidelines were followed. These instructions encompassed securing instructor consent before utilization, identifying appropriate and inappropriate instances for deployment, employing suitable strategies in classroom settings and assessment, appropriately integrating results, acknowledging and crediting GenAI tools, and adhering to data privacy and security measures. However, our study found that only a small number of the retrieved guidelines offered instructions on the AI algorithm (understanding how it works), the documentation of prompts and outputs, AI detection tools, and mechanisms for reporting misconduct. Higher education institutions should develop comprehensive guidelines and policies for the responsible use of GenAI tools. These guidelines must be frequently updated to stay in line with the fast-paced evolution of AI technologies and their applications within the academic sphere. Full article

16 pages, 1558 KiB  
Article
EIF-SlideWindow: Enhancing Simultaneous Localization and Mapping Efficiency and Accuracy with a Fixed-Size Dynamic Information Matrix
by Javier Lamar Léon, Pedro Salgueiro, Teresa Gonçalves and Luis Rato
Big Data Cogn. Comput. 2024, 8(12), 193; https://doi.org/10.3390/bdcc8120193 - 17 Dec 2024
Viewed by 488
Abstract
This paper introduces EIF-SlideWindow, a novel enhancement of the Extended Information Filter (EIF) algorithm for Simultaneous Localization and Mapping (SLAM). Traditional EIF-SLAM, while effective in many scenarios, struggles with inaccuracies in highly non-linear systems or environments characterized by significant non-Gaussian noise. Moreover, the computational complexity of EIF/EKF-SLAM scales with the size of the environment, often resulting in performance bottlenecks. Our proposed EIF-SlideWindow approach addresses these limitations by maintaining a fixed-size information matrix and vector, ensuring constant-time processing per robot step, regardless of trajectory length. This is achieved through a sliding window mechanism centered on the robot's pose, where older landmarks are systematically replaced by newer ones. We assess the effectiveness of EIF-SlideWindow using simulated data and demonstrate that it outperforms standard EIF/EKF-SLAM in both accuracy and efficiency. Our implementation leverages PyTorch for matrix operations, enabling efficient execution on both CPU and GPU, and the code is made available for further exploration and development. Full article
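The core sliding-window operation, marginalizing an old landmark out of the information form so the matrix stays a fixed size, can be sketched as follows; the block layout and sizes are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch: when a new landmark enters, marginalize the oldest
# one out of the information matrix/vector via the Schur complement,
# keeping the state dimension constant.
import numpy as np

def marginalize_block(Lam, eta, i, d=2):
    """Remove state block [i*d:(i+1)*d] from information form (Lam, eta)."""
    idx = np.arange(i * d, (i + 1) * d)
    keep = np.setdiff1d(np.arange(Lam.shape[0]), idx)
    Laa = Lam[np.ix_(keep, keep)]
    Lab = Lam[np.ix_(keep, idx)]
    Lbb_inv = np.linalg.inv(Lam[np.ix_(idx, idx)])
    # Schur complement: information of the kept states after marginalization
    Lam_new = Laa - Lab @ Lbb_inv @ Lab.T
    eta_new = eta[keep] - Lab @ Lbb_inv @ eta[idx]
    return Lam_new, eta_new

# Toy 3-block system (pose + 2 landmarks); drop the oldest landmark (block 1).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
Lam = A @ A.T + 6 * np.eye(6)   # symmetric positive definite
eta = rng.normal(size=6)
Lam_small, eta_small = marginalize_block(Lam, eta, i=1)
print(Lam_small.shape)  # (4, 4) -- constant-size window after replacement
```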

21 pages, 5273 KiB  
Article
Integrating Statistical Methods and Machine Learning Techniques to Analyze and Classify COVID-19 Symptom Severity
by Yaqeen Raddad, Ahmad Hasasneh, Obada Abdallah, Camil Rishmawi and Nouar Qutob
Big Data Cogn. Comput. 2024, 8(12), 192; https://doi.org/10.3390/bdcc8120192 - 16 Dec 2024
Viewed by 1269
Abstract
Background/Objectives: The COVID-19 pandemic, caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), led to significant global health challenges, including the urgent need for accurate symptom severity prediction aimed at optimizing treatment. While machine learning (ML) and deep learning (DL) models have shown promise in predicting COVID-19 severity using imaging and clinical data, there is limited research utilizing comprehensive tabular symptom datasets. This study aims to address this gap by leveraging a detailed symptom dataset to develop robust models for categorizing COVID-19 symptom severity, thereby enhancing clinical decision making. Methods: A unique tabular dataset was created using questionnaire responses from 5654 individuals, including demographic information, comorbidities, travel history, and medical data. Both unsupervised and supervised ML techniques were employed, including k-means clustering to categorize symptom severity into mild, moderate, and severe clusters. In addition, classification models, namely, Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), random forest, and a deep neural network (DNN) were used to predict symptom severity levels. Feature importance was analyzed using the random forest model for its robustness with high-dimensional data and ability to capture complex non-linear relationships, and statistical significance was evaluated through ANOVA and Chi-square tests. Results: Our study showed that fatigue, joint pain, and headache were the most important features in predicting severity. SVM, AdaBoost, and random forest achieved an accuracy of 94%, while XGBoost achieved an accuracy of 96%. DNN showed robust performance in handling complex patterns with 98% accuracy. In terms of precision and recall metrics, both the XGBoost and DNN models demonstrated robust performance, particularly for the moderate class. XGBoost recorded 98% precision and 97% recall, while DNN achieved 99% precision and recall. The clustering approach improved classification accuracy by reducing noise and dimensionality. Statistical tests confirmed the significance of additional features like Body Mass Index (BMI), age, and dominant variant type. Conclusions: Integrating symptom data with advanced ML models offers a promising approach for accurate COVID-19 severity classification. This method provides a reliable tool for healthcare professionals to optimize patient care and resource management, particularly in managing COVID-19 and potential future pandemics. Future work should focus on incorporating imaging and clinical data to further enhance model accuracy and clinical applicability. Full article
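A minimal sketch of the two-stage pipeline, k-means pseudo-labelling followed by supervised classification and feature-importance ranking, on synthetic stand-in features (not the questionnaire data):

```python
# Sketch: k-means groups symptom profiles into three severity clusters,
# then a supervised model learns to predict the derived labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(5654, 12))    # stand-in symptom/comorbidity features

# Unsupervised step: derive mild/moderate/severe pseudo-labels.
severity = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(X)

# Supervised step: train a classifier on the derived labels.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, severity, test_size=0.2, random_state=42, stratify=severity)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Feature-importance ranking, as used in the paper to rank symptoms.
print("top features:", np.argsort(clf.feature_importances_)[::-1][:3])
```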

25 pages, 6393 KiB  
Article
Re-Evaluating Deep Learning Attacks and Defenses in Cybersecurity Systems
by Meaad Ahmed, Qutaiba Alasad, Jiann-Shiun Yuan and Mohammed Alawad
Big Data Cogn. Comput. 2024, 8(12), 191; https://doi.org/10.3390/bdcc8120191 - 16 Dec 2024
Viewed by 891
Abstract
Cybersecurity attacks pose a significant threat to the security of network systems through intrusions and illegal communications. Measuring the vulnerability of cybersecurity systems is crucial for refining overall system security and further mitigating potential security risks. Machine learning (ML)-based intrusion detection systems (IDSs) are mainly designed to detect malicious network traffic. Unfortunately, ML models have recently been demonstrated to be vulnerable to adversarial perturbation, which enables potential attackers to crash the system during normal operation. Among such attacks, generative adversarial networks (GANs) are known as one of the most powerful threats to cybersecurity systems. To address these concerns, it is important to explore new defense methods and understand the nature of different types of attacks. In this paper, we investigate four serious attacks on cybersecurity systems: GAN, Zeroth-Order Optimization (ZOO), kernel density estimation (KDE), and DeepFool attacks. A deep analysis was conducted on these attacks using three different cybersecurity datasets: ADFA-LD, CSE-CICIDS2018, and CSE-CICIDS2019. Our results show that KDE and DeepFool attacks are stronger than GANs in terms of attack success rate and impact on system performance. To demonstrate the effectiveness of our approach, we develop a defensive model using adversarial training, where the DeepFool method is used to generate adversarial examples. The model is evaluated against GAN, ZOO, KDE, and DeepFool attacks to assess the level of system protection against adversarial perturbations. The experiments leveraged a deep learning model as a classifier with the three aforementioned datasets. The results indicate that the proposed defensive model strengthens the system's resilience and mitigates the presented attacks. Full article
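A sketch of the adversarial-training defence loop follows; note that a one-step gradient perturbation (FGSM-style) stands in here for DeepFool's iterative boundary projection, and the classifier and data are toy stand-ins for the IDS datasets:

```python
# Sketch of adversarial training: augment each batch with perturbed
# examples and train on both. The one-step perturbation is a simplified
# stand-in for DeepFool; model and data are toy placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def perturb(x, y, eps=0.05):
    """One-step adversarial example (simplified stand-in for DeepFool)."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

for step in range(100):                # toy training loop
    x = torch.randn(32, 20)            # stand-in traffic features
    y = torch.randint(0, 2, (32,))     # benign/malicious labels
    x_adv = perturb(x, y)
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()
```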

16 pages, 1694 KiB  
Article
Comparative Study of Filtering Methods for Scientific Research Article Recommendations
by Driss El Alaoui, Jamal Riffi, Abdelouahed Sabri, Badraddine Aghoutane, Ali Yahyaouy and Hamid Tairi
Big Data Cogn. Comput. 2024, 8(12), 190; https://doi.org/10.3390/bdcc8120190 - 16 Dec 2024
Viewed by 577
Abstract
Given the daily influx of scientific publications, researchers often face challenges in identifying relevant content amid the vast volume of available information, typically resorting to conventional methods like keyword searches or manual browsing. Utilizing a dataset comprising 1895 users and 3122 articles from the CI&T Deskdrop collection, as well as 7947 users and 25,975 articles from CiteULike-t, we examine the effectiveness of collaborative filtering, content-based, and hybrid recommendation approaches for scientific literature recommendation. These methods automatically generate article suggestions by analyzing user preferences and historical behavior. Our findings, evaluated in terms of accuracy (Precision@K), ranking quality (NDCG@K), and novelty, reveal that the hybrid approach significantly outperforms the other methods, mitigating challenges such as cold-start and sparsity problems. This research offers theoretical insights into recommendation model effectiveness and practical implications for developing tools that enhance content discovery and researcher productivity. Full article
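To illustrate the hybrid approach, here is a minimal sketch that blends a collaborative-filtering score with a content-based similarity score; the matrices, similarity measures, and blend weight are illustrative assumptions, not the paper's exact method:

```python
# Sketch of a hybrid recommender: rank articles by a weighted sum of a
# collaborative-filtering score and a content-based similarity score.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items = 50, 200
interactions = (rng.random((n_users, n_items)) < 0.05).astype(float)
item_profiles = rng.normal(size=(n_items, 64))    # e.g., TF-IDF/embeddings

def recommend(user, alpha=0.6, k=10):
    # Collaborative score: items liked by users with similar histories.
    sim_users = interactions @ interactions[user]  # co-occurrence similarity
    cf_score = sim_users @ interactions
    # Content score: similarity to the centroid of the user's read articles.
    read = interactions[user] > 0
    profile = item_profiles[read].mean(axis=0) if read.any() else None
    cb_score = item_profiles @ profile if profile is not None else 0.0
    score = alpha * cf_score + (1 - alpha) * cb_score
    score[read] = -np.inf                          # don't re-recommend
    return np.argsort(score)[::-1][:k]             # top-k article ids

print(recommend(user=0))
```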

23 pages, 4791 KiB  
Article
An Intelligent Self-Validated Sensor System Using Neural Network Technologies and Fuzzy Logic Under Operating Implementation Conditions
by Serhii Vladov, Victoria Vysotska, Valerii Sokurenko, Oleksandr Muzychuk and Lyubomyr Chyrun
Big Data Cogn. Comput. 2024, 8(12), 189; https://doi.org/10.3390/bdcc8120189 - 13 Dec 2024
Viewed by 669
Abstract
This article presents an intelligent self-validated sensor system developed for dynamic objects and based on the intelligent sensor concept, which ensures autonomous data collection and real-time analysis while adapting to changing conditions and compensating for errors. The research's scientific merit lies in the development of an intelligent self-validated sensor for dynamic objects that integrates adaptive correction algorithms, fuzzy logic, and neural networks to improve sensor accuracy and reliability under changing operating conditions. The proposed intelligent self-validated sensor system provides real-time error compensation, long-term stability, and effective fault diagnostics. Analytical equations are described that consider corrections related to influencing factors, temporal drift, and calibration characteristics, significantly enhancing measurement accuracy and reliability. The application of fuzzy logic allows for refining the scaling coefficient that adjusts the relationship between the measured parameter and influencing factors, utilizing fuzzy inference algorithms. Additionally, monitoring and diagnostics of sensor states through Long Short-Term Memory (LSTM) networks enables effective fault detection. Computational experiments on the TV3-117 engine demonstrated high data-restoring accuracy during forced interruptions, reaching 99.5%. A comparative analysis with alternative approaches confirmed the advantages of using LSTM neural networks to improve measurement quality. Full article

23 pages, 2292 KiB  
Article
Integrating Generative AI in Hackathons: Opportunities, Challenges, and Educational Implications
by Ramteja Sajja, Carlos Erazo Ramirez, Zhouyayan Li, Bekir Z. Demiray, Yusuf Sermet and Ibrahim Demir
Big Data Cogn. Comput. 2024, 8(12), 188; https://doi.org/10.3390/bdcc8120188 - 13 Dec 2024
Viewed by 1222
Abstract
Hackathons have become essential in the software industry, fostering innovation and skill development for both organizations and students. These events facilitate rapid prototyping for companies while providing students with hands-on learning opportunities that bridge theory and practice. Over time, hackathons have evolved from competitive arenas into dynamic educational platforms, promoting collaboration between academia and industry. The integration of artificial intelligence (AI) and machine learning is transforming hackathons, enhancing learning experiences, and introducing ethical considerations. This study examines the impact of generative AI tools on technological decision-making during the 2023 University of Iowa Hackathon. It analyzes how AI influences project efficiency, learning outcomes, and collaboration, while addressing the ethical challenges posed by its use. The findings offer actionable insights and strategies for effectively integrating AI into future hackathons, balancing innovation, ethics, and educational value. Full article

42 pages, 1236 KiB  
Systematic Review
Predictive Models for Educational Purposes: A Systematic Review
by Ahlam Almalawi, Ben Soh, Alice Li and Halima Samra
Big Data Cogn. Comput. 2024, 8(12), 187; https://doi.org/10.3390/bdcc8120187 - 13 Dec 2024
Viewed by 1706
Abstract
This systematic literature review evaluates predictive models in education, focusing on their role in forecasting student performance, identifying at-risk students, and personalising learning experiences. The review compares the effectiveness of machine learning (ML) algorithms such as Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Decision Trees with traditional statistical models, assessing their ability to manage complex educational data and improve decision-making. The search, conducted across databases including ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar, yielded 400 records. After screening and removing duplicates, 124 studies were included in the final review. The findings show that ML algorithms consistently outperform traditional models due to their capacity to handle large, non-linear datasets and continuously enhance predictive accuracy as new patterns emerge. These models effectively incorporate socio-economic, demographic, and academic data, making them valuable tools for improving student retention and performance. However, the review also identifies key challenges, including the risk of perpetuating biases present in historical data, issues of transparency, and the complexity of interpreting AI-driven decisions. In addition, reliance on varying data processing methods across studies reduces the generalisability of current models. Future research should focus on developing more transparent, interpretable, and equitable models while standardising data collection and incorporating non-traditional variables, such as cognitive and motivational factors. Ensuring transparency and ethical standards in handling student data is essential for fostering trust in AI-driven models. Full article

28 pages, 5843 KiB  
Article
An Analysis of Vaccine-Related Sentiments on Twitter (X) from Development to Deployment of COVID-19 Vaccines
by Rohitash Chandra, Jayesh Sonawane and Jahnavi Lande
Big Data Cogn. Comput. 2024, 8(12), 186; https://doi.org/10.3390/bdcc8120186 - 13 Dec 2024
Viewed by 920
Abstract
Anti-vaccine sentiments have been well-known and reported throughout the history of viral outbreaks and vaccination programmes. The COVID-19 pandemic caused fear and uncertainty about vaccines, which has been well expressed on social media platforms such as Twitter (X). We analyse sentiments from the beginning of the COVID-19 pandemic and study the public behaviour on X during the planning, development, and deployment of vaccines expressed in tweets worldwide using a sentiment analysis framework via deep learning models. We provide visualisation and analysis of anti-vaccine sentiments throughout the COVID-19 pandemic. We relate the nature of the sentiments expressed to the number of tweets and monthly COVID-19 infections. Our results show a link between the number of tweets, the number of cases, and the change in sentiment polarity scores during major waves of COVID-19. We also find that the first half of the pandemic had drastic changes in the sentiment polarity scores that later stabilised, implying that the vaccine rollout impacted the nature of discussions on social media. Full article
(This article belongs to the Special Issue Application of Semantic Technologies in Intelligent Environment)

22 pages, 487 KiB  
Article
From Fact Drafts to Operational Systems: Semantic Search in Legal Decisions Using Fact Drafts
by Gergely Márk Csányi, Dorina Lakatos, István Üveges, Andrea Megyeri, János Pál Vadász, Dániel Nagy and Renátó Vági
Big Data Cogn. Comput. 2024, 8(12), 185; https://doi.org/10.3390/bdcc8120185 - 10 Dec 2024
Viewed by 819
Abstract
This research paper presents findings from an investigation of the semantic similarity search task within the legal domain, using a corpus of 1172 Hungarian court decisions. The study establishes the groundwork for an operational semantic similarity search system designed to identify cases with comparable facts using preliminary legal fact drafts. Evaluating such systems often poses significant challenges, given the need for thorough document checks, which can be costly and limit evaluation reusability. To address this, the study employs manually created fact drafts for legal cases, enabling reliable ranking of the original cases within the retrieved documents and quantitative comparison of various vectorization methods. The study compares twelve different text-embedding solutions (the most recent of which became available just a few weeks before the manuscript was written), identifying Cohere's embed-multilingual-v3.0, Beijing Academy of Artificial Intelligence's bge-m3, Jina AI's jina-embeddings-v3, OpenAI's text-embedding-3-large, and Microsoft's multilingual-e5-large models as top performers. To overcome the context window limitation of transformer-based models, we investigated chunking, striding, and last-chunk scaling techniques, with last-chunk scaling significantly improving embedding quality. The results suggest that the effectiveness of striding varies with token count. Notably, striding with 16 tokens yielded optimal results, representing 3.125% of the context window size of the best-performing models. The results also suggest that, among the models with an 8192-token context window, the bge-m3 model is superior to the jina-embeddings-v3 and text-embedding-3-large models at capturing the relevant parts of a document when the text contains a significant amount of noise. The validity of the approach was evaluated and confirmed by legal experts. These insights led to an operational semantic search system for a prominent legal content provider. Full article
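One possible reading of chunking with striding and last-chunk scaling, down-weighting the final, usually shorter chunk when averaging chunk embeddings, is sketched below; the `embed()` stub, the overlap interpretation of striding, and the exact weighting are assumptions based on the abstract, not the authors' released code:

```python
# Sketch of long-document embedding: embed overlapping chunks, then
# average them with the final (usually shorter) chunk down-weighted by
# its relative length ("last-chunk scaling", as we read it).
import numpy as np

def embed(tokens: list[str]) -> np.ndarray:
    """Stub for any text-embedding model call (e.g., bge-m3, Cohere)."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % 2**32)
    return rng.normal(size=384)

def embed_long(tokens, window=512, stride=16):
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        start += window - stride                 # overlap of `stride` tokens
    vectors = np.stack([embed(c) for c in chunks])
    weights = np.ones(len(chunks))
    weights[-1] = len(chunks[-1]) / window       # last-chunk scaling
    vec = (weights[:, None] * vectors).sum(axis=0) / weights.sum()
    return vec / np.linalg.norm(vec)

doc = ["tok"] * 1300
print(embed_long(doc).shape)  # (384,)
```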

15 pages, 5809 KiB  
Article
The Use of Eye-Tracking to Explore the Relationship Between Consumers’ Gaze Behaviour and Their Choice Process
by Maria-Jesus Agost and Vicente Bayarri-Porcar
Big Data Cogn. Comput. 2024, 8(12), 184; https://doi.org/10.3390/bdcc8120184 - 9 Dec 2024
Viewed by 729
Abstract
Eye-tracking technology can assist researchers in understanding motivational decision-making and choice processes by analysing consumers' gaze behaviour. Previous studies have shown that attention is related to choice, as the preferred stimulus is generally the most observed and the last visited before a decision is made. In this work, the relationship between gaze behaviour and decision-making was explored using eye-tracking technology. Images of six wardrobes incorporating different sustainable design strategies were presented to 57 subjects, who were tasked with selecting the wardrobe they intended to keep the longest. The time spent looking at a version was higher when that version was the one chosen. Detailed analyses of gaze plots and heat maps derived from the eye-tracking records were employed to identify different patterns of gaze behaviour during the selection process. These patterns included alternating attention between a few versions or comparing them against a reference, allowing the identification of stimuli that initially piqued interest but were ultimately not chosen, as well as potential doubts in the decision-making process. These findings suggest that the doubts that arise before making a selection warrant further investigation. By identifying stimuli that attract attention but are not chosen, this study provides valuable insights into consumer behaviour and decision-making processes. Full article

18 pages, 529 KiB  
Article
eFC-Evolving Fuzzy Classifier with Incremental Clustering Algorithm Based on Samples Mean Value
by Emmanuel Tavares, Gray Farias Moita and Alisson Marques Silva
Big Data Cogn. Comput. 2024, 8(12), 183; https://doi.org/10.3390/bdcc8120183 - 6 Dec 2024
Viewed by 622
Abstract
This paper introduces a new multiclass classifier called the evolving Fuzzy Classifier (eFC). Starting its knowledge base from scratch, the eFC structure evolves based on a clustering algorithm that can add, merge, delete, or update clusters (= rules) simultaneously while providing class predictions. The procedure for adding clusters uses the procrastination idea to prevent outliers from affecting the quality of learning. Two pruning mechanisms are used to maintain a concise and compact structure: in the first, redundant clusters are merged based on a similarity measure, and in the second, obsolete and unrepresentative clusters are excluded based on an inactivity strategy. The centers of the clusters are adjusted based on the mean value of the attributes. The eFC model was evaluated and compared with state-of-the-art evolving fuzzy systems on 8 randomly selected data streams from the UCI and Kaggle repositories. The experimental results indicate that the eFC outperforms or is at least comparable to alternative state-of-the-art models. Specifically, the eFC achieved an average accuracy 7% to 37% higher than that of the competing classifiers. The results and comparisons demonstrate that the eFC is a promising alternative for classification tasks in non-stationary environments, offering good accuracy, a compact structure, low computational cost, and efficient processing time. Full article
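A sketch of the evolving-cluster bookkeeping follows. It covers only a subset of the eFC's mechanisms (add, incremental mean update, and merge; the procrastination and inactivity strategies are omitted), and the thresholds and distance choices are illustrative assumptions:

```python
# Sketch of evolving clustering: create a cluster when a sample is far
# from all centers, merge centers that become similar, and update centers
# incrementally with the running mean.
import numpy as np

class EvolvingClusters:
    def __init__(self, add_dist=2.0, merge_dist=0.5):
        self.centers, self.counts = [], []
        self.add_dist, self.merge_dist = add_dist, merge_dist

    def update(self, x):
        if not self.centers:
            self.centers.append(x.copy()); self.counts.append(1); return 0
        d = [np.linalg.norm(x - c) for c in self.centers]
        i = int(np.argmin(d))
        if d[i] > self.add_dist:                 # add a new cluster
            self.centers.append(x.copy()); self.counts.append(1)
            return len(self.centers) - 1
        self.counts[i] += 1                      # incremental mean update
        self.centers[i] += (x - self.centers[i]) / self.counts[i]
        self._merge(); return i

    def _merge(self):                            # merge redundant clusters
        for a in range(len(self.centers)):
            for b in range(a + 1, len(self.centers)):
                if np.linalg.norm(self.centers[a] - self.centers[b]) < self.merge_dist:
                    w = self.counts[a] + self.counts[b]
                    self.centers[a] = (self.counts[a] * self.centers[a]
                                       + self.counts[b] * self.centers[b]) / w
                    self.counts[a] = w
                    del self.centers[b], self.counts[b]
                    return

rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
model = EvolvingClusters()
for sample in stream:
    model.update(sample)
print(len(model.centers), "clusters")
```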

33 pages, 1325 KiB  
Article
A Centrality-Weighted Bidirectional Encoder Representation from Transformers Model for Enhanced Sequence Labeling in Key Phrase Extraction from Scientific Texts
by Tsitsi Zengeya, Jean Vincent Fonou Dombeu and Mandlenkosi Gwetu
Big Data Cogn. Comput. 2024, 8(12), 182; https://doi.org/10.3390/bdcc8120182 - 4 Dec 2024
Viewed by 703
Abstract
Deep learning approaches, utilizing Bidirectional Encoder Representation from Transformers (BERT) and advanced fine-tuning techniques, have achieved state-of-the-art accuracies in the domain of term extraction from texts. However, BERT presents some limitations in that it primarily captures the semantic context relative to the surrounding text without considering how relevant or central a token is to the overall document content. There has also been research on applying sequence labeling to contextualized embeddings; however, existing methods often rely solely on local context for extracting key phrases from texts. To address these limitations, this study proposes a centrality-weighted BERT model for key phrase extraction from text using sequence labeling (CenBERT-SEQ). The proposed CenBERT-SEQ model utilizes BERT to represent terms with various contextual embedding architectures and introduces a centrality-weighting layer that integrates document-level context into BERT. This layer leverages document embeddings to influence the importance of each term based on its relevance to the entire document. Finally, a linear classifier layer is employed to model the dependencies between the outputs, thereby enhancing the accuracy of the CenBERT-SEQ model. The proposed CenBERT-SEQ model was evaluated against the standard BERT base-uncased model using three Computer Science article datasets, namely, SemEval-2010, WWW, and KDD. The experimental results show that, although the CenBERT-SEQ and BERT-base models achieved comparably high accuracy, the proposed CenBERT-SEQ model achieved higher precision, recall, and F1-score than the BERT-base model. Furthermore, a comparison with related studies revealed that the proposed CenBERT-SEQ model achieved higher accuracy, precision, recall, and F1-score (95%, 97%, 91%, and 94%, respectively), demonstrating the superior capabilities of the CenBERT-SEQ model in keyphrase extraction from scientific documents. Full article
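The centrality-weighting idea can be sketched as scaling each token embedding by its cosine similarity to a document-level embedding; using the mean token vector as the document embedding, and random arrays as a stand-in for the BERT layer, are assumptions for illustration:

```python
# Sketch: scale each token embedding by its cosine similarity to a
# document-level embedding, so document-central tokens carry more weight
# before the sequence-labeling classifier.
import numpy as np

def centrality_weight(token_embs: np.ndarray) -> np.ndarray:
    """token_embs: (seq_len, dim) contextual embeddings for one document."""
    doc = token_embs.mean(axis=0)                  # document embedding
    doc /= np.linalg.norm(doc)
    norms = np.linalg.norm(token_embs, axis=1, keepdims=True)
    centrality = (token_embs / norms) @ doc        # cosine to document
    return token_embs * centrality[:, None]        # re-weighted tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(128, 768))               # stand-in BERT outputs
weighted = centrality_weight(tokens)
print(weighted.shape)                              # (128, 768)
```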
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

18 pages, 5855 KiB  
Article
Suspension Parameter Estimation Method for Heavy-Duty Freight Trains Based on Deep Learning
by Changfan Zhang, Yuxuan Wang and Jing He
Big Data Cogn. Comput. 2024, 8(12), 181; https://doi.org/10.3390/bdcc8120181 - 4 Dec 2024
Viewed by 615
Abstract
The suspension parameters of heavy-duty freight trains can deviate from their initial design values due to material aging and performance degradation. Traditional multibody dynamics simulation models are usually designed for fixed working conditions, making it difficult for them to adequately analyze the safety status of the vehicle–line system in actual operation. To address this issue, this research proposes a suspension parameter estimation technique based on CNN-GRU. Firstly, a prototype C80 train was utilized to build a multibody dynamics simulation model. Secondly, six key suspension parameters for wheel–rail force were selected using the Sobol global sensitivity analysis method. Then, a CNN-GRU proxy model was constructed, with the actually measured wheel–rail forces as a reference. By combining this approach with NSGA-II (Non-dominated Sorting Genetic Algorithm II), the key suspension parameters were estimated. Finally, the estimated parameter values were applied to the vehicle–line coupled multibody dynamical model and validated. The results show that, with the corrected dynamical model, the relative errors of the simulated wheel–rail forces are reduced from 9.28%, 6.24%, and 18.11% to 7%, 4.52%, and 10.44% for straight, curved, and long, steep uphill conditions, respectively. The precision of the wheel–rail force simulation is thus increased, indicating that the proposed method is effective in estimating the suspension parameters of heavy-duty freight trains. Full article
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

38 pages, 3749 KiB  
Article
Patient Satisfaction with the Mawiidi Hospital Appointment Scheduling Application: Insights from the Information Systems Success Model and Technology Acceptance Model in a Moroccan Healthcare Setting
by Abdelaziz Ouajdouni, Khalid Chafik, Soukaina Allioui and Mourad Jbene
Big Data Cogn. Comput. 2024, 8(12), 180; https://doi.org/10.3390/bdcc8120180 - 3 Dec 2024
Viewed by 1312
Abstract
This article aims to identify the determinants of patient satisfaction with the Mawiidi public portal in Moroccan public hospitals and to assess the effectiveness of its outpatient online booking system, using a model that integrates the Technology Acceptance Model (TAM) with the Information Systems Success Model (ISSM) and adopting a quantitative research methodology. The analysis was based on 348 self-administered questionnaires covering eight key constructs, including information quality, patient satisfaction, perceived ease of use, and privacy protection. The PLS-SEM results verified six of the eleven hypotheses tested, showing that information quality has a positive influence on perceived ease of use, which in turn enhances patient satisfaction. The major factors influencing patients' satisfaction with and trust in online appointment scheduling systems at public hospitals are highlighted. Indeed, privacy protection enhances patient satisfaction and trust. Service quality positively affects satisfaction, but to a lesser degree. Website-related anxiety impacts perceived ease of use, although it has a limited influence on satisfaction. These findings can inform suggestions for hospital managers and portal designers to increase user satisfaction. This study uses a model drawn from the TAM and ISSM frameworks, including cultural and socioeconomic aspects specific to Morocco's healthcare context. Full article

19 pages, 1803 KiB  
Article
Exploring Named Entity Recognition via MacBERT-BiGRU and Global Pointer with Self-Attention
by Chengzhe Yuan, Feiyi Tang, Chun Shan, Weiqiang Shen, Ronghua Lin, Chengjie Mao and Junxian Li
Big Data Cogn. Comput. 2024, 8(12), 179; https://doi.org/10.3390/bdcc8120179 - 3 Dec 2024
Viewed by 790
Abstract
Named Entity Recognition (NER) is a fundamental task in natural language processing that aims to identify and categorize named entities within unstructured text. In recent years, with the development of deep learning techniques, pre-trained language models have been widely used in NER tasks. However, these models still face limitations in terms of their scalability and adaptability, especially when dealing with complex linguistic phenomena such as nested entities and long-range dependencies. To address these challenges, we propose the MacBERT-BiGRU-Self Attention-Global Pointer (MB-GAP) model, which integrates MacBERT for deep semantic understanding, BiGRU for rich contextual information, self-attention for focusing on relevant parts of the input, and a global pointer mechanism for precise entity boundary detection. By optimizing the number of attention heads and global pointer heads, our model achieves an effective balance between complexity and performance. Extensive experiments on benchmark datasets, including ResumeNER, CLUENER2020, and SCHOLAT-School, demonstrate significant improvements over baseline models. Full article

24 pages, 2138 KiB  
Article
A Multimodal Machine Learning Model in Pneumonia Patients Hospital Length of Stay Prediction
by Anna Annunziata, Salvatore Cappabianca, Salvatore Capuozzo, Nicola Coppola, Camilla Di Somma, Ludovico Docimo, Giuseppe Fiorentino, Michela Gravina, Lidia Marassi, Stefano Marrone, Domenico Parmeggiani, Giorgio Emanuele Polistina, Alfonso Reginelli, Caterina Sagnelli and Carlo Sansone
Big Data Cogn. Comput. 2024, 8(12), 178; https://doi.org/10.3390/bdcc8120178 - 3 Dec 2024
Viewed by 750
Abstract
Hospital overcrowding, driven by both structural management challenges and widespread medical emergencies, has prompted extensive research into machine learning (ML) solutions for predicting patient length of stay (LOS) to optimize bed allocation. While many existing models simplify the LOS prediction problem to a classification task, predicting broad ranges of hospital days, an exact day-based regression model is often crucial for precise planning. Additionally, available data are typically limited and heterogeneous, often collected from a small patient cohort. To address these challenges, we present a novel multimodal ML framework that combines imaging and clinical data to enhance LOS prediction accuracy. Specifically, our approach uses the following: (i) feature extraction from chest CT scans via a convolutional neural network (CNN), (ii) their integration with clinically relevant tabular data from patient exams, refined through a feature selection system to retain only significant predictors. As a case study, we applied this framework to pneumonia patient data collected during the COVID-19 pandemic at two hospitals in Naples, Italy—one specializing in infectious diseases and the other general-purpose. Under our experimental setup, the proposed system achieved an average prediction error of only three days, demonstrating its potential to improve patient flow management in critical care environments. Full article
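A minimal sketch of the fusion step, concatenating pooled CNN image features with selected tabular features and regressing the exact length of stay in days; the shapes and the regression head are illustrative assumptions, not the paper's architecture:

```python
# Sketch: fuse CNN-derived image features with selected tabular features
# and regress the exact length of stay (LOS) in days.
import torch
import torch.nn as nn

class LOSFusion(nn.Module):
    def __init__(self, img_dim=512, tab_dim=20):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + tab_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),             # exact LOS in days (regression)
        )

    def forward(self, img_feats, tab_feats):
        return self.head(torch.cat([img_feats, tab_feats], dim=1)).squeeze(1)

img_feats = torch.randn(8, 512)   # e.g., pooled CNN features from chest CT
tab_feats = torch.randn(8, 20)    # selected clinical/exam variables
model = LOSFusion()
loss = nn.L1Loss()(model(img_feats, tab_feats), torch.rand(8) * 30)
print(loss.item())                # MAE in days on a random batch
```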

23 pages, 377 KiB  
Review
Application of Task Allocation Algorithms in Multi-UAV Intelligent Transportation Systems: A Critical Review
by Marco Rinaldi, Sheng Wang, Renan Sanches Geronel and Stefano Primatesta
Big Data Cogn. Comput. 2024, 8(12), 177; https://doi.org/10.3390/bdcc8120177 - 2 Dec 2024
Viewed by 1696
Abstract
Unmanned aerial vehicles (UAVs), commonly known as drones, are being seen as the most promising type of autonomous vehicles in the context of intelligent transportation system (ITS) technology. A key enabling factor for the current development of ITS technology based on autonomous vehicles is the task allocation architecture. This approach allows tasks to be efficiently assigned to robots of a multi-agent system, taking into account both the robots’ capabilities and service requirements. Consequently, this study provides an overview of the application of drones in ITSs, focusing on the applications of task allocation algorithms for UAV networks. Currently, there are different types of algorithms that are employed for task allocation in drone-based intelligent transportation systems, including market-based approaches, game-theory-based algorithms, optimization-based algorithms, machine learning techniques, and other hybrid methodologies. This paper offers a comprehensive literature review of how such approaches are being utilized to optimize the allocation of tasks in UAV-based ITSs. The main characteristics, constraints, and limitations are detailed to highlight their advantages, current achievements, and applicability to different types of UAV-based ITSs. Current research trends in this field as well as gaps in the literature are also thoughtfully discussed. Full article
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
30 pages, 9597 KiB  
Article
PSR-LeafNet: A Deep Learning Framework for Identifying Medicinal Plant Leaves Using Support Vector Machines
by Praveen Kumar Sekharamantry, Marada Srinivasa Rao, Yarramalle Srinivas and Archana Uriti
Big Data Cogn. Comput. 2024, 8(12), 176; https://doi.org/10.3390/bdcc8120176 - 1 Dec 2024
Viewed by 1057
Abstract
In computer vision, recognizing plant images has emerged as a multidisciplinary area of interest. In the last several years, much research has been conducted to automatically determine the type of plant in an image. The challenges in identifying medicinal plants stem from variations in illumination, pose, and orientation. Identification is further complicated by factors such as variations in leaf shape with age and changes in leaf color in response to varying weather conditions. The proposed work uses machine learning techniques and deep neural networks to select appropriate leaf features and determine whether a leaf belongs to a medicinal or non-medicinal plant. This study presents a neural network design, PSR-LeafNet (PSR-LN). PSR-LeafNet is a single network that combines the P-Net, S-Net, and R-Net, all intended for leaf feature extraction using the minimum redundancy maximum relevance (MRMR) approach. The PSR-LN helps obtain the shape, color, venation, and textural features of the leaf. A support vector machine (SVM) is applied to the output of the PSR network to classify the name of the plant; the full design is named PSR-LN-SVM. The advantage of the designed model is that it is well suited to processing larger datasets and provides better results than traditional neural network models. The methodology achieves an accuracy of 97.12% on the MalayaKew dataset, 98.10% on the IMP dataset, and 95.88% on the Flavia dataset, surpassing all existing models in accuracy. These outcomes demonstrate that the suggested method successfully recognizes the leaves of medicinal plants, paving the way for more advanced uses in plant taxonomy and medicine. Full article
(This article belongs to the Special Issue Emerging Trends and Applications of Big Data in Robotic Systems)
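As an illustration of the final classification stage, the sketch below feeds pre-extracted leaf feature vectors, standing in for PSR-LeafNet's shape, color, venation, and texture outputs, into an SVM, mirroring the PSR-LN-SVM design at a high level. The feature extraction is mocked with random data, so this is an assumption-laden outline rather than the authors' network.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for PSR-LeafNet output: one 128-dim feature vector per leaf image
# (shape + color + venation + texture features concatenated).
X = rng.normal(size=(600, 128))
y = rng.integers(0, 2, size=600)  # 1 = medicinal, 0 = non-medicinal (toy labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An SVM on top of the deep features, as in the PSR-LN-SVM design.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"toy accuracy: {clf.score(X_test, y_test):.2f}")
```

Replacing a network's final softmax layer with an SVM in this way is a common design when the extracted features, rather than end-to-end training, carry most of the discriminative signal.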
12 pages, 1482 KiB  
Article
Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models
by Kewei Wu, Yiran Wang, Xiaogang He, Jinyu Yan, Yang Guo, Zhuqing Jiang, Xing Zhang, Wei Wang, Yongping Xiong, Aidong Men and Li Xiao
Big Data Cogn. Comput. 2024, 8(12), 175; https://doi.org/10.3390/bdcc8120175 - 29 Nov 2024
Viewed by 764
Abstract
Currently, closed-set object detection models, represented by YOLO, are widely deployed in industry. However, such closed-set models lack the capacity to discriminate easily confused objects in complex detection scenarios. Open-set object detection models such as GroundingDINO expand the detection range to a certain extent, but they still lag behind closed-set detectors in accuracy and cannot meet the requirements of high-precision detection in practical applications. In addition, existing detection technologies offer limited interpretability: they cannot clearly show users the basis and process behind a detection result, which undermines users’ trust in, and adoption of, those results. To address these deficiencies, we propose a new object detection algorithm based on multi-modal large language models that significantly improves the performance of closed-set detectors on difficult boundary cases while preserving detection accuracy, thereby achieving a semi-open set object detection algorithm. Validated on seven common traffic and industrial-safety scenarios, it achieves significant improvements in both accuracy and interpretability. Full article
(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)
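One plausible reading of such a semi-open set design is a routing scheme in which high-confidence boxes from the closed-set detector pass through unchanged, while uncertain ones are re-judged by a multi-modal LLM that also supplies a rationale. The sketch below illustrates that control flow only; `run_closed_set_detector` and `query_mllm` are hypothetical placeholders, not APIs from the paper.

```python
# Hypothetical sketch: route low-confidence boxes from a closed-set detector
# to a multi-modal LLM for re-judgment and an interpretable rationale.
CONF_THRESHOLD = 0.6

def run_closed_set_detector(image):
    """Placeholder for a YOLO-style detector; returns (box, label, confidence) tuples."""
    return [((10, 10, 50, 50), "helmet", 0.92), ((60, 20, 90, 70), "helmet", 0.41)]

def query_mllm(image, box, candidate_label):
    """Placeholder for an MLLM call that inspects the crop and explains its verdict."""
    return {"label": candidate_label, "keep": True,
            "rationale": "Rounded rigid shell consistent with a safety helmet."}

def semi_open_set_detect(image):
    results = []
    for box, label, conf in run_closed_set_detector(image):
        if conf >= CONF_THRESHOLD:
            results.append((box, label, conf, "closed-set"))
        else:
            verdict = query_mllm(image, box, label)  # hard boundary case -> ask the MLLM
            if verdict["keep"]:
                results.append((box, verdict["label"], conf, verdict["rationale"]))
    return results

print(semi_open_set_detect(image=None))
```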
47 pages, 2104 KiB  
Review
Exploring IoT and Blockchain: A Comprehensive Survey on Security, Integration Strategies, Applications and Future Research Directions
by Muath A. Obaidat, Majdi Rawashdeh, Mohammad Alja’afreh, Meryem Abouali, Kutub Thakur and Ali Karime
Big Data Cogn. Comput. 2024, 8(12), 174; https://doi.org/10.3390/bdcc8120174 - 28 Nov 2024
Viewed by 1582
Abstract
The rise of the Internet of Things (IoT) has driven significant advancements across sectors such as urbanization, manufacturing, and healthcare, all focused on enhancing quality of life and stimulating the global economy. This survey offers an in-depth analysis of the integration of blockchain technology with IoT, covering architectural alignment, applications, security, limitations, scalability, and latency, with particular attention to security, integration techniques, and future research directions. Its primary contributions include a taxonomy of IoT-specific security concerns, an analysis of integration methods, and insights into consensus mechanisms suitable for resource-constrained environments. These findings highlight the unique challenges and opportunities in IoT–blockchain integration and provide a framework for developing secure, efficient, and scalable IoT applications built on blockchain technology, as well as a basis for future research and practical applications. In addition, this survey investigates emerging trends, including AI-driven blockchain for IoT. Full article
(This article belongs to the Special Issue Big Data and Internet of Things in Smart Cities)
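The survey's emphasis on resource-constrained consensus can be motivated with a minimal hash-chained ledger, sketched below under simplifying assumptions: each IoT device only needs a hash function to verify chain integrity, rather than proof-of-work mining. This is a toy illustration of the data structure, not any scheme from the survey.

```python
import hashlib
import json
import time

def make_block(index, payload, prev_hash):
    """Build a block and seal it with a SHA-256 hash over its contents."""
    block = {"index": index, "timestamp": time.time(),
             "payload": payload, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify_chain(chain):
    """Recompute every hash and check each block points at its predecessor."""
    for prev, block in zip(chain, chain[1:]):
        body = {k: v for k, v in block.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["hash"] != recomputed or block["prev_hash"] != prev["hash"]:
            return False
    return True

chain = [make_block(0, {"sensor": "genesis"}, "0" * 64)]
chain.append(make_block(1, {"sensor": "temp", "value": 21.5}, chain[-1]["hash"]))
print("chain valid:", verify_chain(chain))
```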
14 pages, 4503 KiB  
Article
Personality Traits Estimation Based on Job Interview Video Analysis: Importance of Human Nonverbal Cues Detection
by Kenan Kassab and Alexey Kashevnik
Big Data Cogn. Comput. 2024, 8(12), 173; https://doi.org/10.3390/bdcc8120173 - 28 Nov 2024
Viewed by 875
Abstract
In this research, we analyze non-verbal cues and their impact on estimating job performance and hireability from video interviews. We study a variety of non-verbal cues that can be extracted from video interviews, propose a framework that utilizes the extracted features, and combine them with personality traits to estimate sales abilities. Experimenting on the VPTD (Human Face Video Dataset for Personality Traits Detection) dataset, we demonstrate the importance of smiling as a valid indicator for estimating extraversion and sales abilities. We also examine the role of head movements, represented by the rotation angles roll, pitch, and yaw, since they play a crucial role in evaluating personality traits in general, and extraversion and neuroticism in particular. The test results show how these non-verbal cues can serve as assisting features in the proposed approach, providing a valid, reliable, and accurate estimation of sales abilities and job performance. Full article
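A minimal sketch of the feature-fusion idea follows: per-video nonverbal cues (smile ratio plus roll/pitch/yaw motion statistics) are concatenated with Big Five scores to predict a sales-ability rating. All feature values here are synthetic assumptions; the real pipeline extracts them from interview video frames.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200
smile_ratio = rng.uniform(0, 1, n)               # fraction of frames with a smile
head_motion = rng.uniform(0, 30, size=(n, 3))    # spread of roll/pitch/yaw angles (deg)
big_five = rng.uniform(1, 5, size=(n, 5))        # OCEAN personality scores

X = np.column_stack([smile_ratio, head_motion, big_five])
# Toy target loosely tied to smiling and extraversion (big_five[:, 2] = E in OCEAN order).
y = 2.0 * smile_ratio + 0.5 * big_five[:, 2] + rng.normal(scale=0.3, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```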
13 pages, 1963 KiB  
Article
Machine Learning-Driven Dynamic Traffic Steering in 6G: A Novel Path Selection Scheme
by Hibatul Azizi Hisyam Ng and Toktam Mahmoodi
Big Data Cogn. Comput. 2024, 8(12), 172; https://doi.org/10.3390/bdcc8120172 - 27 Nov 2024
Viewed by 708
Abstract
Machine learning is taking on a significant role in materializing the vision of 6G. 6G aspires to support more use cases, handle highly complex tasks, and improve on current 5G and beyond-5G infrastructure. Artificial intelligence (AI) and machine learning (ML) are the natural candidates to support and deliver these aspirations. Traffic steering functions offer many opportunities to enable new use cases and improve overall performance, and the emergence and advancement of non-terrestrial networks is a further driver for an intelligent selection scheme supporting dynamic traffic steering. With their service-based architecture, 5G and 6G are data-driven architectures that can exploit massive transactional data to handle highly complex processes. A highly complex process, a massive volume of data, and a short timeframe call for a scheme based on machine learning techniques. In this paper, we create a scheme that uses massive historical data to drive decisions enabling dynamic traffic steering functions, addressing the future emergence of heterogeneous transport networks and aligning with the Open Radio Access Network (O-RAN). The proposed scheme yields an inference model that can be programmed into telecommunication nodes, providing a novel way to enable dynamic traffic steering for the 6G transport network. The study also identifies an appropriate data size for training a high-performance multi-output classification model that achieves more than 90% accuracy for traffic steering functions. Full article
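The multi-output classification idea can be sketched as follows: from network-state features, a single model jointly predicts several steering decisions (for example, path choice and terrestrial vs. non-terrestrial link). The features, threshold rules, and labels below are synthetic assumptions, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(42)
n = 5000
# Features: latency (ms), jitter (ms), load (0-1), packet loss rate (0-1)
X = np.column_stack([rng.uniform(1, 100, n), rng.uniform(0, 10, n),
                     rng.uniform(0, 1, n), rng.uniform(0, 0.05, n)])
# Two toy steering outputs derived from simple threshold rules.
path = (X[:, 0] + 5 * X[:, 1] > 80).astype(int)   # 0 = primary, 1 = alternate path
link = (X[:, 2] > 0.7).astype(int)                # 0 = terrestrial, 1 = non-terrestrial
Y = np.column_stack([path, link])

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, Y_tr)
print(f"exact-match accuracy: {clf.score(X_te, Y_te):.2%}")  # both outputs must be correct
```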
19 pages, 1428 KiB  
Article
Behavioral Analysis of Android Riskware Families Using Clustering and Explainable Machine Learning
by Mohammed M. Alani and Moatsum Alawida
Big Data Cogn. Comput. 2024, 8(12), 171; https://doi.org/10.3390/bdcc8120171 - 26 Nov 2024
Viewed by 754
Abstract
The Android operating system has become increasingly popular, not only on mobile phones but also on platforms such as Internet-of-Things devices, tablet computers, and wearable devices. Due to its open-source nature and significant market share, Android is an attractive target for malicious actors. One notable security challenge associated with this operating system is riskware: applications that may pose a security threat due to their vulnerability and potential for misuse. Although riskware constitutes a considerable portion of the malware in the Android ecosystem, it has not been studied as extensively as other types of malware such as ransomware and trojans. In this study, we employ machine learning techniques to analyze the behavior of different riskware families and identify similarities in their actions. Furthermore, our research identifies specific behaviors that can be used to distinguish these riskware families. To achieve these insights, we utilize tools such as k-Means clustering, principal component analysis, extreme gradient boosting classifiers, and Shapley additive explanations. Our findings can contribute significantly to the detection, identification, and forensic analysis of Android riskware. Full article
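A sketch of the analysis pipeline named in the abstract follows: cluster riskware behavior vectors with k-Means after PCA, then explain an XGBoost classifier with SHAP. It assumes the `xgboost` and `shap` packages; the feature matrix is synthetic, whereas real inputs would be per-app behavioral features (API calls, permissions, network activity).

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 30))    # 500 apps x 30 behavioral features (synthetic)
y = rng.integers(0, 4, size=500)  # toy riskware-family labels

# 1) PCA + k-Means to surface behavioral similarity between families.
X_2d = PCA(n_components=2).fit_transform(X)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_2d)

# 2) An extreme gradient boosting classifier over the raw features.
model = xgb.XGBClassifier(n_estimators=100, max_depth=4, random_state=0)
model.fit(X, y)

# 3) SHAP values reveal which features drive each family prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])
print("cluster sizes:", np.bincount(clusters))
print("SHAP output shape:", np.shape(shap_values))
```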
24 pages, 1944 KiB  
Article
Investigating Offensive Language Detection in a Low-Resource Setting with a Robustness Perspective
by Israe Abdellaoui, Anass Ibrahimi, Mohamed Amine El Bouni, Asmaa Mourhir, Saad Driouech and Mohamed Aghzal
Big Data Cogn. Comput. 2024, 8(12), 170; https://doi.org/10.3390/bdcc8120170 - 25 Nov 2024
Viewed by 1051
Abstract
Moroccan Darija, a dialect of Arabic, presents unique challenges for natural language processing due to its lack of standardized orthographies, frequent code switching, and status as a low-resource language. In this work, we focus on detecting offensive language in Darija, addressing these complexities. We present three key contributions that advance the field. First, we introduce a human-labeled dataset of Darija text collected from social media platforms. Second, we explore and fine-tune various language models on the created dataset. This investigation identifies a Darija RoBERTa-based model as the most effective approach, with an accuracy of 90% and an F1 score of 85%. Third, we evaluate the best model beyond accuracy by assessing properties such as correctness, robustness, and fairness using metamorphic testing and adversarial attacks. The results highlight potential vulnerabilities in the model’s robustness, with the model being susceptible to attacks such as inserting dots (29.4% success rate), inserting spaces (24.5%), and modifying characters in words (18.3%). Fairness assessments show that while the model is generally fair, it still exhibits bias in specific cases, with a 7% success rate for attacks targeting entities typically subject to discrimination. The key finding is that relying solely on offline metrics such as F1 score and accuracy is insufficient for evaluating machine learning systems. For low-resource languages, the recommendation is to focus on identifying and addressing domain-specific biases and to enhance pre-trained monolingual language models with diverse and noisier data to improve their robustness and generalization in varied linguistic scenarios. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
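The character-level attacks the robustness evaluation describes can be illustrated with a minimal harness: perturb an input (insert dots, insert spaces, swap a character) and count how often a classifier's prediction flips. In the sketch below, `classify` is a hypothetical stand-in for the fine-tuned Darija RoBERTa model, and the samples are toy English strings.

```python
import random

random.seed(0)

def insert_dots(text):
    return ".".join(text)  # a dot between every character

def insert_spaces(text):
    i = random.randrange(1, len(text))
    return text[:i] + " " + text[i:]

def swap_char(text):
    i = random.randrange(len(text))
    return text[:i] + random.choice("abcdefghij") + text[i + 1:]

def classify(text):
    """Placeholder model: flags text containing a toy 'offensive' keyword."""
    return "offensive" if "badword" in text.replace(".", "").replace(" ", "") else "neutral"

def attack_success_rate(samples, attack):
    """Fraction of samples whose predicted label flips under the attack."""
    flips = sum(classify(attack(s)) != classify(s) for s in samples)
    return flips / len(samples)

samples = ["this contains badword here", "a perfectly fine sentence"]
for attack in (insert_dots, insert_spaces, swap_char):
    print(attack.__name__, f"{attack_success_rate(samples, attack):.0%}")
```

This mirrors the paper's finding at a conceptual level: a model can score well offline yet degrade sharply under trivial surface perturbations, which is why the authors measure attack success rates alongside accuracy and F1.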