Journal Description
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 25.3 days after submission; acceptance to publication takes 5.6 days (median values for papers published in this journal in the second half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2023)
Latest Articles
VSA-GCNN: Attention Guided Graph Neural Networks for Brain Tumor Segmentation and Classification
Big Data Cogn. Comput. 2025, 9(2), 29; https://doi.org/10.3390/bdcc9020029 - 31 Jan 2025
Abstract
For the past few decades, brain tumors have had a substantial influence on human life, and they pose severe health risks if not diagnosed and treated in the early stages. Brain tumors are highly diverse, varying extensively in size, type, and location, and this diversity makes it challenging to develop an accurate and reliable diagnostic tool. Several further developments are still required to segment and classify the tumor region effectively and reach an accurate diagnosis. Thus, the purpose of this research is to accurately segment and classify brain tumor Magnetic Resonance Images (MRI) to enhance diagnosis. First, the images are collected from the BraTS 2019, 2020, and 2021 datasets and pre-processed using min–max normalization to eliminate noise. The pre-processed images are then passed to the segmentation stage, where a Variational Spatial Attention with Graph Convolutional Neural Network (VSA-GCNN) is applied to handle the variations in tumor shape, size, and location. The segmented outputs are then passed to feature extraction, where an AlexNet model is used to reduce the dimensionality. Finally, in the classification stage, a Bidirectional Gated Recurrent Unit (Bi-GRU) is employed to classify the brain tumor regions as gliomas and meningiomas. The results show that the proposed VSA-GCNN-BiGRU achieves superior results on the BraTS 2019 dataset in terms of accuracy (99.98%), sensitivity (99.92%), and specificity (99.91%) compared with existing models. On the BraTS 2020 dataset, the proposed VSA-GCNN-BiGRU shows superior results in terms of Dice similarity coefficient (0.4), sensitivity (97.7%), accuracy (98.2%), and specificity (97.4%). On the BraTS 2021 dataset, the proposed VSA-GCNN-BiGRU achieved a specificity of 97.6%, Dice similarity of 98.6%, sensitivity of 99.4%, and accuracy of 99.8%. Overall, the proposed VSA-GCNN-BiGRU supports accurate brain tumor segmentation and classification, which provides clinical significance in MRI compared to existing models.
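The min–max normalization step mentioned in the abstract is a standard linear rescaling; a minimal sketch (not the authors' code, which operates on full MRI volumes) is:

```python
def min_max_normalize(pixels, lo=0.0, hi=1.0):
    """Rescale intensities linearly into [lo, hi]; constant inputs map to lo."""
    mn, mx = min(pixels), max(pixels)
    if mx == mn:
        return [lo for _ in pixels]
    return [lo + (hi - lo) * (p - mn) / (mx - mn) for p in pixels]

# Example: 8-bit pixel intensities rescaled to [0, 1].
normalized = min_max_normalize([0, 51, 102, 255])
```

In practice this is applied per image or per volume before segmentation, so all inputs share a common intensity range.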
Full article
Open Access Article
Lightweight GAN-Assisted Class Imbalance Mitigation for Apple Flower Bud Detection
by Wenan Yuan and Peng Li
Big Data Cogn. Comput. 2025, 9(2), 28; https://doi.org/10.3390/bdcc9020028 - 29 Jan 2025
Abstract
Multi-class object detectors often suffer from the class imbalance issue, where substantial model performance discrepancies exist between classes. Generative adversarial networks (GANs), an emerging deep learning research topic, are able to learn from existing data distributions and generate similar synthetic data, which might serve as valid training data for improving object detectors. The current study investigated the utility of a lightweight unconditional GAN in addressing weak object detector class performance by incorporating synthetic data into real data for model retraining, in an agricultural context. AriAplBud, a multi-growth-stage aerial apple flower bud dataset, was deployed in the study. A baseline YOLO11n detector was first developed based on training, validation, and test datasets derived from AriAplBud. Six FastGAN models were developed based on dedicated subsets of the same YOLO training and validation datasets for different apple flower bud growth stages. The positive sample rates and average instance numbers per image of the synthetic data generated by each of the FastGAN models were investigated based on 1000 synthetic images and the baseline detector at various confidence thresholds. In total, 13 new YOLO11n detectors were retrained specifically for the two weak growth stages, tip and half-inch green, by including synthetic data in the training datasets to increase the total instance numbers to 1000, 2000, 4000, and 8000, respectively, pseudo-labeled by the baseline detector. FastGAN showed its resilience in successfully generating positive samples, despite apple flower bud instances being generally small and randomly distributed in the images. As expected, the positive sample rates of the synthetic datasets were negatively correlated with the detector confidence thresholds, which ranged from 0 to 1. Higher overall positive sample rates were observed for the growth stages with higher detector performance. The synthetic images generally contained fewer detector-detectable instances per image than the corresponding real training images. The best achieved YOLO11n AP improvements in the retrained detectors for tip and half-inch green were 30.13% and 14.02%, respectively, while the best achieved YOLO11n mAP improvement was 2.83%. However, the relationship between synthetic training instance quantity and detector class performance has yet to be determined. GANs were concluded to be beneficial for retraining object detectors and improving their performance. Further studies are still needed to investigate the influence of synthetic training data quantity and quality on retrained object detector performance.
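The positive sample rate examined in the study can be computed directly from detector outputs; a hedged sketch, where the data layout is a hypothetical simplification rather than the authors' format:

```python
def positive_sample_rate(detections_per_image, threshold):
    """Fraction of synthetic images with at least one detection whose
    confidence meets the threshold (i.e., a 'positive sample')."""
    positives = sum(
        1 for confidences in detections_per_image
        if any(c >= threshold for c in confidences)
    )
    return positives / len(detections_per_image)

# Three synthetic images with baseline-detector confidences per detection.
rate = positive_sample_rate([[0.9], [0.3], [0.8, 0.2]], threshold=0.5)
```

Raising the threshold can only shrink the set of positive images, which is the negative correlation the abstract reports.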
Full article
(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)
Open Access Article
Application of Symbolic Classifiers and Multi-Ensemble Threshold Techniques for Android Malware Detection
by Nikola Anđelić, Sandi Baressi Šegota and Vedran Mrzljak
Big Data Cogn. Comput. 2025, 9(2), 27; https://doi.org/10.3390/bdcc9020027 - 29 Jan 2025
Abstract
Android malware detection using artificial intelligence is today a mandatory tool to prevent cyber attacks. To address this problem, the methodology proposed in this paper applies a genetic programming symbolic classifier (GPSC) to obtain symbolic expressions (SEs) that can detect whether an Android application is malware. To find the optimal combination of GPSC hyperparameter values, a random hyperparameter value search (RHVS) method was used, and the GPSC was trained with 5-fold cross-validation (5FCV). The initial, publicly available dataset is highly imbalanced; this problem was addressed by applying various preprocessing and oversampling techniques, creating a large number of balanced dataset variations, and the GPSC was trained on each variation. Since the dataset has many input variables, three different approaches were considered: an initial investigation with all input variables, input variables with high feature importance, and the application of principal component analysis. After the SEs with the highest classification performance were obtained, they were used in threshold-based voting ensembles (TBVEs), and the threshold values were adjusted to improve classification performance. Multiple TBVEs were developed, from which a robust system for Android malware detection was achieved with a highest accuracy of 0.98.
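The threshold-based voting ensemble idea can be sketched as follows; this is an illustrative reconstruction, and the outputs and thresholds shown are assumptions, not the paper's actual symbolic expressions:

```python
def tbve_predict(se_outputs, thresholds, vote_threshold=0.5):
    """Binarize each symbolic-expression output against its own tuned
    threshold, then take a majority-style vote over the binary decisions."""
    votes = [1 if out > thr else 0 for out, thr in zip(se_outputs, thresholds)]
    return 1 if sum(votes) / len(votes) >= vote_threshold else 0

# Three SE outputs voted into a single malware (1) / benign (0) decision.
label = tbve_predict([0.9, 0.2, 0.7], [0.5, 0.5, 0.5])
```

Adjusting the per-expression thresholds and the vote threshold on validation data is the tuning step the abstract describes.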
Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
Open Access Article
Leveraging Open Big Data from R&D Projects with Large Language Models
by Desireé Ruiz, Yudith Cardinale, Abraham Casas and Vanessa Moscardó
Big Data Cogn. Comput. 2025, 9(2), 26; https://doi.org/10.3390/bdcc9020026 - 28 Jan 2025
Abstract
Recent studies have highlighted the potential of Large Language Models (LLMs) to become experts in specific areas of knowledge through the utilization of techniques that enhance their context. Nevertheless, an interesting and underexplored application in the literature is the creation of an LLM that specializes in research projects, as it could streamline the process of project ideation and accelerate the advancement of research initiatives. In this regard, the aim of this work is to develop a tool based on LLM technology capable of assisting the employees of technology centers in answering their queries related to research projects funded under the Horizon 2020 program. By facilitating the identification of suitable funding calls and the formation of consortia with partners meeting specific requirements, tasks that are traditionally time-intensive, the proposed tool has the potential to improve operational efficiency and enable technology centers to allocate their resources more effectively. To improve the model's baseline performance, context extension techniques such as Retrieval-Augmented Generation (RAG) and prompt engineering were explored. Specifically, different RAG approaches and configurations, along with a specialized prompt, were tested on the LLaMA 3 70B model, and their results were compared to those obtained without context extension. The proposed evaluation metrics, which aligned with human judgment while maintaining objectivity, revealed that RAG systems outperformed the standalone LLaMA 3 70B, achieving a rate of optimal responses of up to 46% compared to 0% for the baseline model. These findings emphasize that integrating RAG and prompt engineering pipelines into LLMs can address key limitations, such as generating accurate and informative answers.
Moreover, this study demonstrates the practical feasibility of leveraging advanced LLM configurations to support research-driven organizations, highlighting a pathway for the further development of intelligent tools that enhance productivity and foster innovation in the research domain.
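The retrieval step at the core of a RAG pipeline can be illustrated with a toy bag-of-words ranker; a real system would use dense embeddings of project documents, and the example strings below are hypothetical:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Rank documents by similarity to the query and return the top-k,
    which a RAG pipeline would prepend to the LLM prompt as context."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

top = retrieve("funding call horizon",
               ["horizon 2020 funding call details",
                "consortium partner contact list"])
```

The retrieved passages ground the model's answer in the project corpus, which is what lifts the optimal-response rate over the standalone LLM.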
Full article
Open Access Article
Optimizing Convolutional Neural Network Architectures with Optimal Activation Functions for Pediatric Pneumonia Diagnosis Using Chest X-rays
by Petra Radočaj, Dorijan Radočaj and Goran Martinović
Big Data Cogn. Comput. 2025, 9(2), 25; https://doi.org/10.3390/bdcc9020025 - 27 Jan 2025
Abstract
Pneumonia remains a significant cause of morbidity and mortality among pediatric patients worldwide. Accurate and timely diagnosis is crucial for effective treatment and improved patient outcomes. Traditionally, pneumonia diagnosis has relied on a combination of clinical evaluation and radiologists' interpretation of chest X-rays. However, this process is time-consuming and prone to inconsistencies in diagnosis. The integration of advanced technologies such as Convolutional Neural Networks (CNNs) into medical diagnostics offers the potential to enhance diagnostic accuracy and efficiency. In this study, we conduct a comprehensive evaluation of various activation functions within CNNs for pediatric pneumonia classification using a dataset of 5856 chest X-ray images. The novel Mish activation function was compared with Swish and ReLU and demonstrated superior performance in terms of accuracy, precision, recall, and F1-score in all cases. Notably, InceptionResNetV2 combined with the Mish activation function achieved the highest overall performance, with an accuracy of 97.61%. Although the dataset used may not fully represent the diversity of real-world clinical cases, this research provides valuable insights into the influence of activation functions on CNN performance in medical image analysis, laying a foundation for future automated pneumonia diagnostic systems.
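The three activation functions compared in the study have simple closed forms; a minimal scalar sketch:

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def mish(x):
    """Mish: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))
```

Unlike ReLU, Swish and Mish are smooth and allow small negative outputs, a property often credited for their favorable training behavior.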
Full article
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
Open Access Article
BankNet: Real-Time Big Data Analytics for Secure Internet Banking
by Kaushik Sathupadi, Sandesh Achar, Shinoy Vengaramkode Bhaskaran, Nuruzzaman Faruqui and Jia Uddin
Big Data Cogn. Comput. 2025, 9(2), 24; https://doi.org/10.3390/bdcc9020024 - 26 Jan 2025
Abstract
The rapid growth of Internet banking has necessitated advanced systems for secure, real-time decision making. This paper introduces BankNet, a predictive analytics framework integrating big data tools and a BiLSTM neural network to deliver high-accuracy transaction analysis. BankNet achieves exceptional predictive performance, with a Root Mean Squared Error of 0.0159 and fraud detection accuracy of 98.5%, while efficiently handling data rates up to 1000 Mbps with minimal latency. By addressing critical challenges in fraud detection and operational efficiency, BankNet establishes itself as a robust decision support system for modern Internet banking. Its scalability and precision make it a transformative tool for enhancing security and trust in financial services.
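The Root Mean Squared Error figure reported for BankNet is the standard regression metric; as a quick reference, a generic sketch (not BankNet code):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

Lower is better; an RMSE of 0.0159 means the model's transaction predictions deviate very little from observed values on the chosen scale.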
Full article
Open Access Article
Labeling Network Intrusion Detection System (NIDS) Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models
by Nir Daniel, Florian Klaus Kaiser, Shay Giladi, Sapir Sharabi, Raz Moyal, Shalev Shpolyansky, Andres Murillo, Aviad Elyashar and Rami Puzis
Big Data Cogn. Comput. 2025, 9(2), 23; https://doi.org/10.3390/bdcc9020023 - 26 Jan 2025
Abstract
Analysts in Security Operations Centers (SOCs) are often occupied with time-consuming investigations of alerts from Network Intrusion Detection Systems (NIDSs). Many NIDS rules lack clear explanations and associations with attack techniques, complicating the alert triage and the generation of attack hypotheses. Large Language Models (LLMs) may be a promising technology to reduce the alert explainability gap by associating rules with attack techniques. In this paper, we investigate the ability of three prominent LLMs (ChatGPT, Claude, and Gemini) to reason about NIDS rules while labeling them with MITRE ATT&CK tactics and techniques. We discuss prompt design and present experiments performed with 973 Snort rules. Our results indicate that while LLMs provide explainable, scalable, and efficient initial mappings, traditional machine learning (ML) models consistently outperform them in accuracy, achieving higher precision, recall, and F1-scores. These results highlight the potential for hybrid LLM-ML approaches to enhance SOC operations and better address the evolving threat landscape. By utilizing automation, the presented methods will enhance the efficiency of SOC alert analysis and decrease analysts' workloads.
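The precision, recall, and F1 comparison between the LLMs and ML models rests on the usual confusion-matrix definitions; a minimal sketch for a single binary label:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels
    (1 = technique assigned to a rule, 0 = not assigned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For multi-label ATT&CK tagging, these per-technique scores are typically averaged (micro or macro) across techniques.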
Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
Open Access Article
Evaluating the Effect of Surrogate Data Generation on Healthcare Data Assessment
by Saeid Sanei, Tracey K. M. Lee, Issam Boukhennoufa, Delaram Jarchi, Xiaojun Zhai and Klaus McDonald-Maier
Big Data Cogn. Comput. 2025, 9(2), 22; https://doi.org/10.3390/bdcc9020022 - 26 Jan 2025
Abstract
In healthcare applications, it is often not possible to record sufficient data for deep learning or data-driven classification and feature detection systems due to the patient's condition, various clinical or experimental limitations, or time constraints. On the other hand, data imbalance invalidates many of the test results crucial for clinical approvals. Generating synthetic (artificial or dummy) data has become a potential solution to address this issue. Such data should possess adequate information, properties, and characteristics to mimic real-world data recorded in natural circumstances. Several methods have been proposed for this purpose, and results often show that adding surrogates improves decision-making accuracy. This article evaluates the most recent surrogate data generation and data synthesis methods to investigate the effect of the number of surrogates on classification results. It is shown that the data analysis/classification results improve as the number of surrogates increases, but the improvement plateaus beyond a certain number of surrogates. This finding helps in deciding on the number of surrogates for each strategy, thereby alleviating the computational cost.
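One of the simplest surrogate-generation strategies is random permutation, which preserves the amplitude distribution of a recording while destroying its temporal structure; this is an illustrative sketch only, as the article evaluates several, generally more sophisticated, methods:

```python
import random

def permutation_surrogates(series, n_surrogates, seed=0):
    """Generate surrogates by shuffling the sample order: each surrogate
    keeps the original value distribution but loses temporal ordering."""
    rng = random.Random(seed)
    surrogates = []
    for _ in range(n_surrogates):
        shuffled = list(series)
        rng.shuffle(shuffled)
        surrogates.append(shuffled)
    return surrogates

surrogates = permutation_surrogates([1, 2, 3, 4, 5], n_surrogates=3)
```

Increasing `n_surrogates` enlarges the training set, which is the quantity whose diminishing returns the article investigates.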
Full article
Open Access Article
Long-Term Forecasting of Solar Irradiation in Riyadh, Saudi Arabia, Using Machine Learning Techniques
by Khalil AlSharabi, Yasser Bin Salamah, Majid Aljalal, Akram M. Abdurraqeeb and Fahd A. Alturki
Big Data Cogn. Comput. 2025, 9(2), 21; https://doi.org/10.3390/bdcc9020021 - 25 Jan 2025
Abstract
Forecasting time series data is challenging because the data are complex in nature and therefore difficult to forecast accurately. This study presents the design and development of a novel forecasting system that integrates efficient data processing techniques with advanced machine learning algorithms to improve time series forecasting in the sustainability domain. Specifically, this study focuses on solar irradiation forecasting in Riyadh, Saudi Arabia. Efficient and accurate forecasts of solar irradiation are important for optimizing power production and its smooth integration into the utility grid. This advancement supports Saudi Arabia's Vision 2030, which aims to generate and utilize renewable energy sources to drive sustainable development. The proposed forecasting system has therefore been tailored to the characteristics of the Riyadh region's environment, including high solar intensity, dust storms, and unpredictable weather conditions. After cleaning and filtering, the dataset was pre-processed using the standardization method. The Discrete Wavelet Transform (DWT) was then applied to extract features from the pre-processed data. Next, the extracted features of the solar dataset were split into three subsets: training, testing, and forecasting. Finally, two machine learning techniques were utilized for the forecasting process: the Support Vector Machine (SVM) and Gaussian Process (GP) techniques. The proposed forecasting system was evaluated across different time horizons: one-day, five-day, ten-day, and fifteen-day ahead. Comprehensive evaluation metrics were calculated, including accuracy, stability, and generalizability measures. The outcomes show that the proposed forecasting system provides a more robust and adaptable solution for long-term time series forecasting of the complex patterns of solar irradiation in Riyadh, Saudi Arabia.
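The DWT feature-extraction step can be illustrated with a single-level Haar transform; this is a minimal sketch, and the study's actual wavelet choice and decomposition depth are not specified here:

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT of an even-length signal: returns approximation
    (low-pass) and detail (high-pass) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

approx, detail = haar_dwt([1.0, 1.0, 2.0, 2.0])
```

Intuitively, the approximation coefficients capture the slow trend of the irradiation series, while the detail coefficients capture rapid fluctuations such as passing dust or clouds.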
Full article
Open Access Article
Fit Talks: Forecasting Fitness Awareness in Saudi Arabia Using Fine-Tuned Transformers
by Nora Alturayeif, Deemah Alqahtani, Sumayh S. Aljameel, Najla Almajed, Lama Alshehri, Nourah Aldhuwaihi, Madawi Alhadyan and Nouf Aldakheel
Big Data Cogn. Comput. 2025, 9(2), 20; https://doi.org/10.3390/bdcc9020020 - 23 Jan 2025
Abstract
Understanding public sentiment on health and fitness is essential for addressing regional health challenges in Saudi Arabia. This research employs sentiment analysis to assess fitness awareness by analyzing content from the X platform (formerly Twitter), using a dataset called Saudi Aware, which includes 3593 posts related to fitness awareness. Preprocessing steps such as normalization, stop-word removal, and tokenization ensured high-quality data. The findings revealed that positive sentiments about fitness and health were more prevalent than negative ones, with posts across all sentiment categories being most common in the western region. However, the eastern region exhibited the highest percentage of positive sentiment, indicating a strong interest in fitness and health. For sentiment classification, we fine-tuned two transformer architectures—BERT and GPT—utilizing three BERT-based models (AraBERT, MARBERT, CAMeLBERT) and GPT-3.5. These findings provide valuable insights into Saudi Arabian attitudes toward fitness and health, offering actionable information for public health campaigns and initiatives.
Full article
Open Access Article
Low-Cost Embedded System Applications for Smart Cities
by Victoria Alejandra Salazar Herrera, Hugo Puertas de Araújo, César Giacomini Penteado, Mario Gazziro and João Paulo Carmo
Big Data Cogn. Comput. 2025, 9(2), 19; https://doi.org/10.3390/bdcc9020019 - 22 Jan 2025
Abstract
The Internet of Things (IoT) represents a transformative technology that allows interconnected devices to exchange data over the Internet, enabling automation and real-time decision making in a variety of areas. A key aspect of the success of the IoT lies in its integration with low-resource hardware, such as low-cost microprocessors and microcontrollers. These devices, which are affordable and energy efficient, are capable of handling basic tasks such as sensing, processing, and data transmission. Their low cost makes them ideal for IoT applications in low-income communities where the government is often absent. This review presents several applications—a flood detection system; a monitoring system for analog and digital sensors; an air quality measurement system; a mesh video network for community surveillance; and a real-time fleet management system—that use low-cost hardware such as the ESP32, Raspberry Pi, and Arduino together with the MQTT protocol to implement low-cost monitoring systems that improve the quality of life of people in small cities or communities.
Full article
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)
Open Access Article
Improving Synthetic Data Generation Through Federated Learning in Scarce and Heterogeneous Data Scenarios
by Patricia A. Apellániz, Juan Parras and Santiago Zazo
Big Data Cogn. Comput. 2025, 9(2), 18; https://doi.org/10.3390/bdcc9020018 - 21 Jan 2025
Abstract
Synthetic Data Generation (SDG) is a promising solution for healthcare, offering the potential to generate synthetic patient data closely resembling real-world data while preserving privacy. However, data scarcity and heterogeneity, particularly in under-resourced regions, challenge the effective implementation of SDG. This paper addresses these challenges using Federated Learning (FL) for SDG, focusing on sharing synthetic patients across nodes. By leveraging collective knowledge and diverse data distributions, we hypothesize that sharing synthetic data can significantly enhance the quality and representativeness of generated data, particularly for institutions with limited or biased datasets. This approach aligns with meta-learning concepts, like Domain Randomized Search. We compare two FL techniques, FedAvg and Synthetic Data Sharing (SDS), the latter being our proposed contribution. Both approaches are evaluated using variational autoencoders with Bayesian Gaussian mixture models across diverse medical datasets. Our results demonstrate that while both methods improve SDG, SDS consistently outperforms FedAvg, producing higher-quality, more representative synthetic data. Non-IID scenarios reveal that while FedAvg achieves improvements of 13–27% in reducing divergence compared to isolated training, SDS achieves reductions exceeding 50% in the worst-performing nodes. These findings underscore the potential of synthetic data sharing to reduce disparities between data-rich and data-poor institutions, fostering more equitable healthcare research and innovation.
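The FedAvg baseline compared in the paper aggregates client models by a dataset-size-weighted average of their parameters; a minimal sketch over flat parameter vectors (the paper's models are variational autoencoders, so this is only the aggregation rule, not their architecture):

```python
def fedavg(client_weights, client_sizes):
    """Federated averaging: combine client parameter vectors, weighting
    each client by the size of its local dataset."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with equal data contribute equally to the global model.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 1])
```

The proposed SDS alternative instead shares synthetic patients between nodes; this sketch covers only the FedAvg comparison baseline.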
Full article
(This article belongs to the Special Issue Research on Privacy and Data Security)
Open Access Article
Cognitive Method for Synthesising a Fuzzy Controller Mathematical Model Using a Genetic Algorithm for Tuning
by Serhii Vladov
Big Data Cogn. Comput. 2025, 9(1), 17; https://doi.org/10.3390/bdcc9010017 - 20 Jan 2025
Abstract
In this article, a method for synthesising a fuzzy controller's mathematical model has been developed that uses cognitive computing and a genetic algorithm for automated tuning and adaptation to changing environmental conditions. The technique consists of 12 stages, including creating the control object's mathematical model and tuning the controller coefficients using classical methods. The research pays special attention to the fuzzification of the error parameters and their derivatives, which simplifies the development of logical rules and helps increase system stability. The fuzzy controller parameters were tuned using a genetic algorithm in a computational experiment based on helicopter flight data. The results show an increase in the integral quality criterion from 85.36% to 98.19%, which confirms an increase in control efficiency of 12.83%. The use of the fuzzy controller made it possible to significantly improve the control performance of the helicopter turboshaft engines' gas-generator rotor speed, reducing errors of the first and second types by 2.06 to 12.58 times compared to traditional methods.
Full article
Open Access Article
AI-Driven Mental Health Surveillance: Identifying Suicidal Ideation Through Machine Learning Techniques
by Hesham Allam, Chris Davison, Faisal Kalota, Edward Lazaros and David Hua
Big Data Cogn. Comput. 2025, 9(1), 16; https://doi.org/10.3390/bdcc9010016 - 20 Jan 2025
Abstract
As suicide rates increase globally, there is a growing need for effective, data-driven methods in mental health monitoring. This study leverages advanced artificial intelligence (AI), particularly natural language processing (NLP) and machine learning (ML), to identify suicidal ideation from Twitter data. A predictive model was developed to process social media posts in real time, using NLP and sentiment analysis to detect textual and emotional cues associated with distress. The model aims to identify potential suicide risks accurately, while minimizing false positives, offering a practical tool for targeted mental health interventions. The study achieved notable predictive performance, with an accuracy of 85%, precision of 88%, and recall of 83% in detecting potential suicide posts. Advanced preprocessing techniques, including tokenization, stemming, and feature extraction with term frequency–inverse document frequency (TF-IDF) and count vectorization, ensured high-quality data transformation. A random forest classifier was selected for its ability to handle high-dimensional data and effectively capture linguistic and emotional patterns linked to suicidal ideation. The model’s reliability was supported by a precision–recall AUC score of 0.93, demonstrating its potential for real-time mental health monitoring and intervention. By identifying behavioral patterns and triggers, such as social isolation and bullying, this framework provides a scalable and efficient solution for mental health support, contributing significantly to suicide prevention strategies worldwide.
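The TF-IDF feature extraction mentioned in the abstract weights terms by how concentrated they are in a few documents; a minimal sketch using a smoothed IDF (conventions vary between toolkits, and this is not the study's exact pipeline):

```python
import math
from collections import Counter

def tfidf(docs):
    """Map each document to a {term: tf * idf} dictionary, with
    idf = log(N / df) + 1 so terms appearing in every document still count."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: one count per document
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors

vectors = tfidf(["a b", "a c"])
```

The resulting sparse vectors are the kind of input a classifier such as the study's random forest consumes.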
Full article
Open Access Article
Eliciting Emotions: Investigating the Use of Generative AI and Facial Muscle Activation in Children’s Emotional Recognition
by
Manuel A. Solis-Arrazola, Raul E. Sanchez-Yanez, Ana M. S. Gonzalez-Acosta, Carlos H. Garcia-Capulin and Horacio Rostro-Gonzalez
Big Data Cogn. Comput. 2025, 9(1), 15; https://doi.org/10.3390/bdcc9010015 - 20 Jan 2025
Abstract
This study explores children’s emotions through a novel approach combining Generative Artificial Intelligence (GenAI) and Facial Muscle Activation (FMA). It examines GenAI’s effectiveness in creating facial images that produce genuine emotional responses in children, alongside FMA’s analysis of muscular activation during these expressions. The aim is to determine if AI can realistically generate and recognize emotions similar to human experiences. The study involves generating a database of 280 images (40 per emotion) of children expressing various emotions. For real children’s faces from public databases (DEFSS and NIMH-CHEFS), five emotions were considered: happiness, anger, fear, sadness, and neutral. In contrast, for AI-generated images, seven emotions were analyzed, comprising the previous five plus surprise and disgust. A feature vector is extracted from these images, indicating lengths between reference points on the face that contract or expand based on the expressed emotion. This vector is then input into an artificial neural network for emotion recognition and classification, achieving accuracies of up to 99% in certain cases. This approach offers new avenues for training and validating AI algorithms, enabling models to be trained with artificial and real-world data interchangeably. The integration of both datasets during training and validation phases enhances model performance and adaptability.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
Open Access Article
A Secure Learned Image Codec for Authenticity Verification via Self-Destructive Compression
by
Chen-Hsiu Huang and Ja-Ling Wu
Big Data Cogn. Comput. 2025, 9(1), 14; https://doi.org/10.3390/bdcc9010014 - 15 Jan 2025
Abstract
In the era of deepfakes and AI-generated content, digital image manipulation poses significant challenges to image authenticity, creating doubts about the credibility of images. Traditional image forensics techniques often struggle to detect sophisticated tampering, and passive detection approaches are reactive, verifying authenticity only after counterfeiting occurs. In this paper, we propose a novel full-resolution secure learned image codec (SLIC) designed to proactively prevent image manipulation by creating self-destructive artifacts upon re-compression. Once a sensitive image is encoded using SLIC, any subsequent re-compression or editing attempts will result in visually severe distortions, making the image’s tampering immediately evident. Because the content of an SLIC image is either original or visually damaged after tampering, images encoded with this secure codec hold greater credibility. SLIC leverages adversarial training to fine-tune a learned image codec that introduces out-of-distribution perturbations, ensuring that the first compressed image retains high quality while subsequent re-compressions degrade drastically. We analyze and compare the adversarial effects of various perceptual quality metrics combined with different learned codecs. Our experiments demonstrate that SLIC holds significant promise as a proactive defense strategy against image manipulation, offering a new approach to enhancing image credibility and authenticity in a media landscape increasingly dominated by AI-driven forgeries.
Open Access Article
Predicting Intensive Care Unit Admissions in COVID-19 Patients: An AI-Powered Machine Learning Model
by
A. M. Mutawa
Big Data Cogn. Comput. 2025, 9(1), 13; https://doi.org/10.3390/bdcc9010013 - 14 Jan 2025
Abstract
Intensive Care Units (ICUs) have been in great demand worldwide since the COVID-19 pandemic, necessitating organized allocation. The spike in critical care patients has overloaded ICUs, which, along with prolonged hospitalizations, has increased the workload for medical personnel and led to a significant shortage of resources. The study aimed to improve resource management by quickly and accurately identifying patients who need ICU admission. We designed an intelligent decision support system that employs machine learning (ML) to anticipate COVID-19 ICU admissions in Kuwait. Our algorithm examines several clinical and demographic characteristics to identify high-risk individuals early in illness diagnosis. We used data from 4399 patients to predict ICU admission with predictors such as shortness of breath, high D-dimer values, and abnormal chest X-rays. Any data imbalance was addressed by employing cross-validation along with the Synthetic Minority Oversampling Technique (SMOTE), the feature selection was refined using backward elimination, and the model interpretability was improved using Shapley Additive Explanations (SHAP). We employed various ML classifiers, including support vector machines (SVM). The SVM model surpassed all other models in terms of precision (0.99) and area under the curve (AUC, 0.91). This study investigated the healthcare process during a pandemic, facilitating ML-based decision-making solutions to confront healthcare problems.
(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)
Open Access Article
Quantum-Cognitive Neural Networks: Assessing Confidence and Uncertainty with Human Decision-Making Simulations
by
Milan Maksimovic and Ivan S. Maksymov
Big Data Cogn. Comput. 2025, 9(1), 12; https://doi.org/10.3390/bdcc9010012 - 14 Jan 2025
Abstract
Contemporary machine learning (ML) systems excel in recognising and classifying images with remarkable accuracy. However, like many computer software systems, they can fail by generating confusing or erroneous outputs or by deferring to human operators to interpret the results and make final decisions. In this paper, we employ the recently proposed quantum tunnelling neural networks (QT-NNs) inspired by human brain processes alongside quantum cognition theory to classify image datasets while emulating human perception and judgment. Our findings suggest that the QT-NN model provides compelling evidence of its potential to replicate human-like decision-making. We also reveal that the QT-NN model can be trained up to 50 times faster than its classical counterpart.
Open Access Article
The Data Heterogeneity Issue Regarding COVID-19 Lung Imaging in Federated Learning: An Experimental Study
by
Fatimah Alhafiz and Abdullah Basuhail
Big Data Cogn. Comput. 2025, 9(1), 11; https://doi.org/10.3390/bdcc9010011 - 14 Jan 2025
Abstract
Federated learning (FL) has emerged as a transformative framework for collaborative learning, offering robust model training across institutions while ensuring data privacy. In the context of making a COVID-19 diagnosis using lung imaging, FL enables institutions to collaboratively train a global model without sharing sensitive patient data. A central manager aggregates local model updates to compute global updates, ensuring secure and effective integration. The global model’s generalization capability is evaluated using centralized testing data before dissemination to participating nodes, where local assessments facilitate personalized adaptations tailored to diverse datasets. Addressing data heterogeneity, a critical challenge in medical imaging, is essential for improving both global performance and local personalization in FL systems. This study emphasizes the importance of recognizing real-world data variability before proposing solutions to tackle non-independent and non-identically distributed (non-IID) data. We investigate the impact of data heterogeneity on FL performance in COVID-19 lung imaging across seven distinct heterogeneity settings. By comprehensively evaluating models using generalization and personalization metrics, we highlight challenges and opportunities for optimizing FL frameworks. The findings provide valuable insights that can guide future research toward achieving a balance between global generalization and local adaptation, ultimately enhancing diagnostic accuracy and patient outcomes in COVID-19 lung imaging.
Open Access Article
SqueezeMaskNet: Real-Time Mask-Wearing Recognition for Edge Devices
by
Gibran Benitez-Garcia, Lidia Prudente-Tixteco, Jesus Olivares-Mercado and Hiroki Takahashi
Big Data Cogn. Comput. 2025, 9(1), 10; https://doi.org/10.3390/bdcc9010010 - 10 Jan 2025
Abstract
This paper presents SqueezeMaskNet, a lightweight convolutional neural network designed for real-time recognition of proper and improper mask usage. The model classifies four categories: masks worn correctly, masks covering only the mouth, masks not covering, and no mask. SqueezeMaskNet integrates seamlessly with existing face detection systems, removing the need for retraining. We propose using Fire modules for efficiency, along with attention mechanisms like efficient channel attention (ECA) and squeeze-and-excitation (SE) blocks for improved feature refinement. SqueezeMaskNet achieved 96.7% accuracy on the challenging FineFM test set and ran at 297 FPS on a GPU and up to 96 FPS on edge devices like a Jetson Orin NX. We also introduced ImproperTFM, a subset of real-world images focusing on improper mask usage, which enhanced the model accuracy when combined with FineFM data. Comparative experiments demonstrated SqueezeMaskNet’s superior performance, efficiency, and adaptability compared to MobileNet and EfficientNet, making it a practical solution for mask-wearing recognition across various devices and settings.
Topics
Topic in BDCC, Data, Environments, Geosciences, Remote Sensing
Database, Mechanism and Risk Assessment of Slope Geologic Hazards
Topic Editors: Chong Xu, Yingying Tian, Xiaoyi Shao, Zikang Xiao, Yulong Cui
Deadline: 28 February 2025
Topic in Applied Sciences, BDCC, Future Internet, Information, Sci
Social Computing and Social Network Analysis
Topic Editors: Carson K. Leung, Fei Hao, Giancarlo Fortino, Xiaokang Zhou
Deadline: 30 June 2025
Topic in AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 25 July 2025
Topic in Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2025
Special Issues
Special Issue in BDCC
Research Progress in Artificial Intelligence and Social Network Analysis
Guest Editors: Yong Tang, Chaobo He, Chengzhou Fu
Deadline: 28 February 2025
Special Issue in BDCC
Machine Learning and AI Technology for Sustainable Development
Guest Editors: Wei-Chen Wu, Jason C. Hung, Yuchih Wei, Jui-hung Kao
Deadline: 28 February 2025
Special Issue in BDCC
Advances in System Design and IoT Based Smart City
Guest Editors: Syed Attique Shah, Muhammad Rathore
Deadline: 1 March 2025
Special Issue in BDCC
Fault Diagnosis and Detection Based on Deep Learning
Guest Editors: Pin Liu, Jianyong Zhu, Alfredo Cuzzocrea
Deadline: 31 March 2025