Application of Machine Learning in Data Science and Computational Intelligence

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 May 2025 | Viewed by 4400

Special Issue Editors


Guest Editor
Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece
Interests: artificial intelligence; big data; data analysis; databases; data mining; data structures; machine learning; privacy; security; trust

Guest Editor
Industrial Systems Institute (ISI), Athena Research and Innovation Center, 26504 Patras, Greece
Interests: 5G; 6G; artificial intelligence; deep learning; image processing; IoT; machine learning; MIMO; mmWave; signal processing; wireless communications

Special Issue Information

Dear Colleagues,

Data science is a field of study that focuses on extracting valuable information from noisy data and draws on various disciplines, such as data engineering, data preprocessing, visualization, predictive analytics, data mining, machine learning and statistics. In recent years, interest in the mathematical and theoretical aspects of data science has grown rapidly. This interest is reflected in deterministic and non-deterministic models (the latter including probabilistic models and the subfamily of probabilistic models known as statistical models) that provide guaranteed performance, robustness, and reusable and interpretable results. The digital transformation of information systems has made it feasible to use data science techniques such as artificial intelligence (AI) and machine learning (ML) effectively in a wide range of applications. In addition, combining sensor technology with AI/ML is expected to yield more objective and improved performance, lower costs and more effective overall system management. The aim of this Special Issue is to present high-quality, innovative ideas and research solutions (addressing both theoretical and practical challenges) that facilitate data analysis and modelling with the aid of artificial intelligence and machine learning across various domains and applications.

Dr. Elias Dritsas
Dr. Maria Trigka
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published in the journal as soon as they are accepted and will be listed together on the Special Issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information on manuscript submission are available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data science
  • data mining
  • artificial intelligence
  • machine learning
  • statistics
  • predictive modelling
  • monitoring
  • data analytics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (4 papers)


Research

19 pages, 389 KiB  
Article
Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?
by Ángel Cadena-Bautista, Francisco F. López-Ponce, Sergio Luis Ojeda-Trueba, Gerardo Sierra and Gemma Bel-Enguix
Information 2025, 16(2), 77; https://doi.org/10.3390/info16020077 - 22 Jan 2025
Viewed by 376
Abstract
In this paper, various LLMs are tested in a specific domain using a Retrieval-Augmented Generation (RAG) system. The study focuses on the performance and behavior of the models and was conducted in Spanish. A questionnaire based on the Bible, consisting of questions that vary in reasoning complexity, was created to evaluate the reasoning capabilities of each model. The RAG system matches each question with the most similar passage from the Bible and feeds the pair to each LLM. The evaluation aims to determine whether each model can reason solely with the provided information or whether it disregards the given instructions and relies on its pretrained knowledge.
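The retrieval step described above (match a question with the most similar passage, then pass both to the model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abstract does not say how similarity is computed, so the sketch assumes plain bag-of-words cosine similarity, and the helper names `retrieve` and `build_prompt` and the instruction wording are hypothetical. The toy passages are two verses from Genesis in Spanish.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; \w matches accented
    # letters such as "ó", which matters for Spanish text.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str]) -> str:
    # Hypothetical retriever: return the passage most similar to the question.
    q = Counter(tokenize(question))
    return max(passages, key=lambda p: cosine(q, Counter(tokenize(p))))


def build_prompt(question: str, passage: str) -> str:
    # Hypothetical instruction asking the model to rely only on the passage,
    # which is the behavior the study checks for.
    return (
        "Answer using only the passage below. "
        "If the passage does not contain the answer, say so.\n\n"
        f"Passage: {passage}\n\nQuestion: {question}"
    )
```

For example, with the passages "En el principio creó Dios los cielos y la tierra." and "Y dijo Dios: Sea la luz; y fue la luz.", the question "¿Qué creó Dios en el principio?" retrieves the first verse. RAG systems often replace the bag-of-words score with dense embeddings, while the overall control flow stays the same.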

17 pages, 1865 KiB  
Article
Improving Sentiment Analysis Performance on Imbalanced Moroccan Dialect Datasets Using Resample and Feature Extraction Techniques
by Zineb Nassr, Faouzia Benabbou, Nawal Sael and Touria Hamim
Information 2025, 16(1), 39; https://doi.org/10.3390/info16010039 - 10 Jan 2025
Viewed by 527
Abstract
Sentiment analysis is a crucial component of text mining and natural language processing (NLP), involving the evaluation and classification of text data based on its emotional tone, typically categorized as positive, negative, or neutral. While significant research has focused on structured languages like English, unstructured languages, such as the Moroccan Dialect (MD), face substantial resource limitations and linguistic challenges, making effective sentiment analysis difficult. This study addresses this gap by exploring the integration of data-balancing techniques with machine learning (ML) methods, specifically investigating the impact of resampling techniques and feature extraction methods, including Term Frequency–Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and N-grams. Through rigorous experimentation, we evaluate the effectiveness of these approaches in enhancing sentiment analysis accuracy for the Moroccan dialect. Our findings demonstrate that strategic resampling, combined with the TF-IDF method, significantly improves classification accuracy and robustness. We also explore the interaction between resampling strategies and feature extraction methods, revealing varying levels of effectiveness across different combinations. Notably, the Support Vector Machine (SVM) classifier, when paired with TF-IDF representation, achieves superior performance, with an accuracy of 90.24% and a precision of 90.34%. These results highlight the importance of tailored resampling techniques, appropriate feature extraction methods, and machine learning optimization in advancing sentiment analysis for under-resourced and dialect-heavy languages like the Moroccan dialect, providing a practical framework for future research and development in NLP for unstructured languages.
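The two preprocessing ideas this abstract combines, resampling to balance sentiment classes and TF-IDF weighting, can be illustrated with a minimal plain-Python sketch. It is not the authors' pipeline: the abstract does not specify which resampling techniques were used, so simple random oversampling is assumed, and the whitespace tokenizer and the transliterated Moroccan Dialect examples in the test are hypothetical. The idf formula follows scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`), without that class's L2 row normalization.

```python
import math
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    # Duplicate randomly chosen examples of each smaller class until every
    # class has as many examples as the largest one (assumed technique).
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def tfidf(docs):
    # Raw term counts multiplied by the smoothed inverse document frequency
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1, returned as sparse dicts.
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {token: math.log((1 + n) / (1 + d)) + 1 for token, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
```

Resampling of this kind is normally applied to the training split only, after the train/test split, so that duplicated examples do not leak into the evaluation data; the resulting vectors would then be passed to a classifier such as an SVM.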

14 pages, 405 KiB  
Article
Understanding Online Purchases with Explainable Machine Learning
by João A. Bastos and Maria Inês Bernardes
Information 2024, 15(10), 587; https://doi.org/10.3390/info15100587 - 26 Sep 2024
Viewed by 771
Abstract
Customer profiling in e-commerce is a powerful tool that enables organizations to create personalized offers through direct marketing. One crucial objective of customer profiling is to predict whether a website visitor will make a purchase, thereby generating revenue. Machine learning models are the most accurate means to achieve this objective. However, the opaque nature of these models may deter companies from adopting them. Instead, they may prefer simpler models that allow for a clear understanding of the customer attributes that contribute to a purchase. In this study, we show that companies need not compromise on prediction accuracy to understand their online customers. By leveraging website data from a multinational communications service provider, we establish that the most pertinent customer attributes can be readily extracted from a black box model. Specifically, we show that the features that measure customer activity within the e-commerce platform are the most reliable predictors of conversions. Moreover, we uncover significant nonlinear relationships between customer features and the likelihood of conversion.
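The abstract does not name the technique used to extract the most pertinent attributes from the black-box model. One common model-agnostic option is permutation importance, sketched below as an assumption rather than as the authors' method: shuffle one feature column at a time and measure how much accuracy drops. The toy model and the feature meanings in the test (pages viewed, day of week) are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    # Treat the model as a black box: only predict(row) is called.
    # For each feature j, shuffle column j, re-score, and average the drop in
    # accuracy over n_repeats shuffles. Larger drops mean heavier reliance.
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, because shuffling it never changes a prediction. scikit-learn offers a fuller implementation as `sklearn.inspection.permutation_importance`.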

24 pages, 667 KiB  
Article
Utilizing Multi-Class Classification Methods for Automated Sleep Disorder Prediction
by Elias Dritsas and Maria Trigka
Information 2024, 15(8), 426; https://doi.org/10.3390/info15080426 - 23 Jul 2024
Cited by 1 | Viewed by 1762
Abstract
From infancy onward, human daily life alternates between a period of wakefulness and a period of sleep at night over the 24-hour cycle. Sleep is a normal process necessary for human physical and mental health. A lack of sleep makes it difficult to control emotions and behaviour, reduces productivity at work, and can even increase stress or depression. In addition, poor sleep affects health: when sleep is insufficient, the chances of developing serious diseases greatly increase. Researchers in sleep medicine have identified an extensive list of sleep disorders and have leveraged Artificial Intelligence (AI) to automate their analysis and gain a deeper understanding of sleep patterns and related disorders. In this research, we seek a Machine Learning (ML) solution that efficiently classifies unlabeled instances as Sleep Apnea, Insomnia or Normal (subjects without a specific sleep disorder) by assessing the performance of two well-established strategies for multi-class classification: One-Vs-All (OVA) and One-Vs-One (OVO). Within these strategies, two well-known binary classification models were used: Logistic Regression (LR) and Support Vector Machines (SVMs). Both strategies were validated on a dataset of diverse information related to the profiles (anthropometric data, sleep metrics, lifestyle and cardiovascular health factors) of potential patients and of individuals not exhibiting any specific sleep disorder. Performance was evaluated by comparing the weighted average results across all classes, namely the two sleep disorders and the no-disorder class; accuracy, kappa score, precision, recall, f-measure and Area Under the ROC Curve (AUC) were recorded and compared to identify an effective and robust model and strategy, both class-wise and on average.
The experimental evaluation revealed that, after feature selection, the second-degree polynomial SVM under both strategies was the least complex and most efficient model, recording an accuracy of 91.44%, a kappa score of 84.97%, precision, recall and f-measure of 0.914, and an AUC of 0.927.
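The two decomposition strategies compared in this study differ in how the three classes are split into binary problems. The sketch below shows only the decision rules, assuming each binary model (for example, a second-degree polynomial SVM) has already been trained: OVA picks the class whose one-vs-rest score is highest, and OVO takes a majority vote over the k(k-1)/2 pairwise models. It is an illustration, not the authors' code; the scores and pairwise outcomes in the test are hypothetical, and the tie-breaking rule is a simplification.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["Insomnia", "Normal", "Sleep Apnea"]


def ova_predict(scores: dict[str, float]) -> str:
    # One-Vs-All: one binary model per class scores "this class vs. the rest";
    # predict the class whose model gives the highest score.
    return max(scores, key=scores.get)


def ovo_pairs(classes: list[str]) -> list[tuple[str, str]]:
    # One-Vs-One: one binary model per unordered pair of classes,
    # k * (k - 1) / 2 models for k classes.
    return list(combinations(classes, 2))


def ovo_predict(pair_winners: dict[tuple[str, str], str]) -> str:
    # Each pairwise model votes for one of its two classes; most votes wins.
    # On a tie, Counter.most_common keeps first-encountered order (a simplification).
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]
```

With three classes, both strategies train three binary classifiers, but each OVO model sees only the data of its two classes. In scikit-learn, these decompositions are available as `OneVsRestClassifier` and `OneVsOneClassifier` in `sklearn.multiclass`, and a second-degree polynomial kernel as `SVC(kernel="poly", degree=2)`.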

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Analysis and Evaluation of the Common Agricultural Policy (CAP) Funding Business Processes using Process Mining
Authors: Konstantinos Gkousaris; Alexandros Bousdekis
Affiliation: Department of Informatics and Computer Engineering, University of West Attica, Egaleo, Greece
Abstract: The demand for data scientists who can transform data into valuable insights is rapidly increasing. In the context of process mining, the challenge is to extract relevant information about the actual processes being executed from the vast amount of data available. Process mining aims to discover, monitor, and improve real processes by extracting knowledge from event logs readily available in today’s information systems. In this paper, we propose a process mining approach to analyze and evaluate the efficiency of business processes related to EU funding in the context of CAP. The European Union spends a large part of its budget on the Common Agricultural Policy (CAP). Among these expenditures are direct payments, which are mainly aimed at providing a basic income to farmers regardless of production. The remainder of the CAP budget is earmarked for market and rural development expenditure. The processes governing the distribution of these funds are subject to complex regulations recorded in EU and national state laws. Member States are required to operate an Integrated Administration and Control System. The process examined concerns the processing of applications for EU direct payments to German farmers from the European Agricultural Guarantee Fund. The process is repeated every year with minor differences due to changes in EU regulations.
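Process discovery typically starts from an event log that records a case identifier, an activity name and a timestamp for each event. One basic discovery step, counting how often one activity directly follows another within the same case, can be sketched as below. This is an illustration under stated assumptions, not the authors' approach; the CAP application activities and dates in the test are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    # event_log: iterable of (case_id, activity, timestamp) tuples, where
    # timestamps are mutually comparable (ISO date strings sort correctly).
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))
    dfg = Counter()
    for events in traces.values():
        # Order each case's events by time (ties broken by activity name),
        # then count consecutive activity pairs.
        ordered = [activity for _, activity in sorted(events)]
        dfg.update(zip(ordered, ordered[1:]))
    return dfg
```

The resulting counts form a directly-follows graph, which many discovery algorithms use as their starting point; frequent or unexpected edges point to where the real process deviates from the regulated one.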
