AI, Volume 5, Issue 3 (September 2024) – 38 articles

Cover Story: The advent of large language models has profoundly impacted software development, making the distinction between human-written and AI-generated code ambiguous. This uncertainty is particularly concerning in higher education and professional contexts. Our paper addresses the challenge of distinguishing human-written code from ChatGPT-generated code. By employing a combination of advanced embedding features and supervised learning algorithms, we achieve a remarkable 98% accuracy. Furthermore, we explore model calibration and interpretable techniques. While the latter offer insights into the underlying distinction, their performance is lower, highlighting the importance of code snippet representation. Notably, tests on untrained humans show that their performance barely surpasses random guessing, underlining the need for our models.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
22 pages, 749 KiB  
Article
Improving Distantly Supervised Relation Extraction with Multi-Level Noise Reduction
by Wei Song and Zijiang Yang
AI 2024, 5(3), 1709-1730; https://doi.org/10.3390/ai5030084 - 23 Sep 2024
Viewed by 756
Abstract
Background: Distantly supervised relation extraction (DSRE) aims to identify semantic relations in large-scale texts automatically labeled via knowledge base alignment. It has garnered significant attention due to its high efficiency, but existing methods are plagued by noise at both the word and sentence level and fail to address these issues adequately. The former level of noise arises from the large proportion of irrelevant words within sentences, while noise at the latter level is caused by inaccurate relation labels for various sentences. Method: We propose a novel multi-level noise reduction neural network (MLNRNN) to tackle both issues by mitigating the impact of multi-level noise. We first build an iterative keyword semantic aggregator (IKSA) to remove noisy words, and capture distinctive features of sentences by aggregating the information of keywords. Next, we implement multi-objective multi-instance learning (MOMIL) to reduce the impact of incorrect labels in sentences by identifying the cluster of correctly labeled instances. Meanwhile, we leverage mislabeled sentences with cross-level contrastive learning (CCL) to further enhance the classification capability of the extractor. Results: Comprehensive experimental results on two DSRE benchmark datasets demonstrated that the MLNRNN outperformed state-of-the-art methods for distantly supervised relation extraction in almost all cases. Conclusions: The proposed MLNRNN effectively addresses both word- and sentence-level noise, providing a significant improvement in relation extraction performance under distant supervision. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

14 pages, 905 KiB  
Article
Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition
by Hosam Abduljalil, Ahmed Elhayek, Abdullah Marish Ali and Fawaz Alsolami
AI 2024, 5(3), 1695-1708; https://doi.org/10.3390/ai5030083 - 23 Sep 2024
Viewed by 794
Abstract
Human action recognition (HAR) based on skeleton data is a challenging yet crucial task due to its wide-ranging applications, including patient monitoring, security surveillance, and human–machine interaction. Although numerous algorithms have been proposed to distinguish between various activities, most practical applications require highly accurate detection of specific actions. In this study, we propose a novel, highly accurate spatiotemporal graph autoencoder network for HAR, designated as GA-GCN. Furthermore, an extensive investigation was conducted employing diverse modalities. To this end, a spatiotemporal graph autoencoder was constructed to automatically learn both spatial and temporal patterns from skeleton data. The proposed method achieved accuracies of 92.3% and 96.8% on the NTU RGB+D dataset for cross-subject and cross-view evaluations, respectively. On the more challenging NTU RGB+D 120 dataset, GA-GCN attained accuracies of 88.8% and 90.4% for cross-subject and cross-set evaluations. Overall, our model outperforms the majority of the existing state-of-the-art methods on these common benchmark datasets. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

11 pages, 1221 KiB  
Article
Probabilistic Ensemble Framework for Injury Narrative Classification
by Srushti Vichare, Gaurav Nanda and Raji Sundararajan
AI 2024, 5(3), 1684-1694; https://doi.org/10.3390/ai5030082 - 20 Sep 2024
Viewed by 808
Abstract
In this research, we analyzed narratives from the National Electronic Injury Surveillance System (NEISS) dataset to predict the top two injury codes using a comparative study of ensemble machine learning (ML) models. Four ensemble models were evaluated: Random Forest (RF) combined with Logistic Regression (LR), K-Nearest Neighbor (KNN) paired with RF, LR combined with KNN, and a model integrating LR, RF, and KNN, all utilizing a probabilistic likelihood-based approach to improve decision-making across different classifiers. The combined KNN + LR ensemble achieved an accuracy of 90.47% for the top one prediction, while the KNN + RF + LR model excelled in predicting the top two injury codes with a very high accuracy of 99.50%. These results demonstrate the significant potential of ensemble models to enhance unstructured narrative classification accuracy, particularly for underrepresented cases. The proposed probabilistic ensemble framework also shows promise for improving decision-making in public health and safety, providing a foundation for future research in automated clinical narrative classification and predictive modeling, especially in scenarios with imbalanced data. Full article
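To make the combination strategy concrete, the following is a minimal sketch of a probabilistic ensemble that averages the class-likelihood outputs of KNN, RF, and LR and returns the two most likely injury codes. The classifier settings and data shapes are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch (not the paper's exact pipeline): average predict_proba outputs of three
# classifiers and take the two highest-scoring injury codes per narrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def top2_ensemble_predict(X_train, y_train, X_test):
    models = [
        KNeighborsClassifier(n_neighbors=5),
        RandomForestClassifier(n_estimators=200, random_state=0),
        LogisticRegression(max_iter=1000),
    ]
    probas = []
    for m in models:
        m.fit(X_train, y_train)
        probas.append(m.predict_proba(X_test))       # (n_samples, n_classes), same class order
    avg = np.mean(probas, axis=0)                     # combine likelihoods across classifiers
    top2 = np.argsort(avg, axis=1)[:, -2:][:, ::-1]   # indices of the two most likely codes
    return models[0].classes_[top2]                   # map column indices back to label values
```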

14 pages, 2342 KiB  
Project Report
Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles
by Reanna K. Panagides, Sean H. Fu, Skye H. Jung, Abhishek Singh, Rose T. Eluvathingal Muttikkal, R. Michael Broad, Timothy D. Meakem and Rick A. Hamilton
AI 2024, 5(3), 1670-1683; https://doi.org/10.3390/ai5030081 - 10 Sep 2024
Viewed by 1503
Abstract
Over the past decade, focused ultrasound (FUS) has emerged as a promising therapeutic modality for various medical conditions. However, the exponential growth in the published literature on FUS therapies has made the literature review process increasingly time-consuming, inefficient, and error-prone. Machine learning approaches offer a promising solution to address these challenges. Therefore, the purpose of our study is to (1) explore and compare machine learning techniques for the text classification of scientific abstracts, and (2) integrate these machine learning techniques into the conventional literature review process. A classified dataset of 3588 scientific abstracts related and unrelated to FUS therapies sourced from the PubMed database was used to train various traditional machine learning and deep learning models. The fine-tuned Bio-ClinicalBERT (Bidirectional Encoder Representations from Transformers) model, which we named FusBERT, had comparatively optimal performance metrics with an accuracy of 0.91, a precision of 0.85, a recall of 0.99, and an F1 of 0.91. FusBERT was then successfully integrated into the literature review process. Ultimately, the integration of this model into the literature review pipeline will reduce the number of irrelevant manuscripts that the clinical team must screen, facilitating efficient access to emerging findings in the field. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
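As a rough illustration of the fine-tuning step described above, the sketch below uses the Hugging Face transformers Trainer with the publicly available emilyalsentzer/Bio_ClinicalBERT checkpoint for binary relevance classification of abstracts; the checkpoint name, hyperparameters, and dataset wrapper are assumptions for illustration, not the authors' exact setup.

```python
# Hedged sketch: fine-tune a Bio-ClinicalBERT checkpoint to label abstracts as FUS-related or not.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"  # assumed base checkpoint

def fine_tune_fusbert(train_texts, train_labels):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
    enc = tokenizer(train_texts, truncation=True, padding=True, max_length=512)

    class AbstractDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(train_labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in enc.items()}
            item["labels"] = torch.tensor(train_labels[i])
            return item

    args = TrainingArguments(output_dir="fusbert", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=AbstractDataset()).train()
    return model
```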

22 pages, 10205 KiB  
Article
Perspectives for Generative AI-Assisted Art Therapy for Melanoma Patients
by Lennart Jütte, Ning Wang, Martin Steven and Bernhard Roth
AI 2024, 5(3), 1648-1669; https://doi.org/10.3390/ai5030080 - 6 Sep 2024
Cited by 1 | Viewed by 2004
Abstract
Digital technologies are making their mark in medicine, and especially in art therapy, offering innovative therapeutic interventions for patients, including those with melanoma skin cancer. However, the integration of novel technologies, such as AI-generated art, brings along ethical, psychological, and technical challenges that are viewed differently among therapists. We aim to gauge art therapists’ views on the ethics, applications, and challenges of utilizing AI-generated art from medical images in therapy. The focus is on assessing its applicability and limitations for melanoma patients. Art therapists were surveyed via a questionnaire focusing on their experience, digital tool familiarity, and views on AI in therapy, encompassing ethics, benefits, challenges, and applicability for melanoma. Art therapists have already implemented digital technologies and acknowledged potential therapeutic benefits of creating personalized artworks with generative artificial intelligence. Attention needs to be given to technological hurdles and the necessity for supplementary interventions. Views on the method’s adaptability varied, underscoring a need for tailored, patient-focused applications. Art therapists are welcoming AI-generated art as a promising creative therapeutic tool and acknowledge potential therapeutic benefits. There are ethical, technical, and psychological challenges that must be addressed for application in therapeutic sessions. Therapists should navigate AI integration with sensitivity, adhering to ethical norms around consent and privacy. Future studies should show the therapeutic benefit in practice with emphasis on equipping therapists to manage the technical complexities effectively. Furthermore, it is important to ensure that patients can influence the AI output, allowing for creative moments in the process. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

15 pages, 874 KiB  
Article
Facial Recognition Using Hidden Markov Model and Convolutional Neural Network
by Muhammad Bilal, Saqlain Razzaq, Nirman Bhowmike, Azib Farooq, Muhammad Zahid and Sultan Shoaib
AI 2024, 5(3), 1633-1647; https://doi.org/10.3390/ai5030079 - 6 Sep 2024
Viewed by 937
Abstract
Face recognition (FR) uses a passive approach to person authentication that avoids face-to-face contact. Among different FR techniques, most FR approaches place little emphasis on reducing powerful cryptography and instead concentrate on increasing recognition rates. In this paper, we propose Hidden Markov Model (HMM) and Convolutional Neural Network (CNN) models for FR using the ORL and Yale datasets. Facial images from the given datasets are divided into 3, 4, 5, and 6 portions, corresponding to the number of hidden states used in the HMM model. Quantized levels of eigenvalues and eigenvector coefficients of overlapping blocks of facial images define the observation states of the HMM model. For image selection and rejection, a threshold is calculated using singular value decomposition (SVD). After training the HMM with 3, 4, 5, and 6 hidden states, the recognition accuracies are 96.5%, 98.5%, 98.5%, and 99.5%, respectively, on the ORL database, and 90.6667%, 94.6667%, 94.6667%, and 94.6667% on the Yale database. The CNN model uses convolutional layers, a max-pooling layer, a flattening layer, a dense layer, and a dropout layer. ReLU is used as the activation function in all layers except the last layer, where softmax is used. Cross-entropy is used as the loss function, and the Adam optimizer is used in our proposed algorithm. The proposed CNN model achieved 100% training and testing accuracy on the ORL dataset. On the Yale dataset, the CNN model achieved a training accuracy of 100% and a testing accuracy of 85.71%. Our results show that the HMM model is cost-effective but less accurate, while the CNN model is more accurate than the HMM but has a higher computational cost. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
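The CNN branch described in the abstract (convolution, max-pooling, flattening, dense, and dropout layers; ReLU throughout with a softmax output; cross-entropy loss and the Adam optimizer) can be sketched roughly as below. The layer widths and the ORL-style input of 112×92 grayscale pixels with 40 classes are illustrative assumptions.

```python
# Hedged sketch of the CNN described in the abstract; exact layer sizes are not specified there.
from tensorflow import keras
from tensorflow.keras import layers

def build_face_cnn(input_shape=(112, 92, 1), n_classes=40):   # ORL-like images, 40 subjects
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),               # convolutional layers
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),                      # max-pooling layer
        layers.Flatten(),                                      # flattening layer
        layers.Dense(128, activation="relu"),                  # dense layer
        layers.Dropout(0.5),                                   # dropout layer
        layers.Dense(n_classes, activation="softmax"),         # softmax output
    ])
    model.compile(optimizer="adam",                            # Adam + cross-entropy, as stated
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```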

19 pages, 2777 KiB  
Article
Generative Models Utilizing Padding Can Efficiently Integrate and Generate Multi-Omics Data
by Hyeon-Su Lee, Seung-Hwan Hong, Gwan-Heon Kim, Hye-Jin You, Eun-Young Lee, Jae-Hwan Jeong, Jin-Woo Ahn and June-Hyuk Kim
AI 2024, 5(3), 1614-1632; https://doi.org/10.3390/ai5030078 - 5 Sep 2024
Viewed by 943
Abstract
Technological advances in information-processing capacity have enabled integrated analyses (multi-omics) of different omics data types, improving target discovery and clinical diagnosis. This study proposes novel artificial intelligence (AI) learning strategies for incomplete datasets, common in omics research. The model comprises (1) a multi-omics generative model based on a variational auto-encoder that learns tumor genetic patterns based on different omics data types and (2) an expanded classification model that predicts cancer phenotypes. Padding was applied to replace missing data with virtual data. The embedding data generated by the model accurately classified cancer phenotypes, addressing the class imbalance issue (weighted F1 score: cancer type > 0.95, primary site > 0.92, sample type > 0.97). The classification performance was maintained in the absence of omics data, and the virtual data resembled actual omics data (cosine similarity mRNA gene expression > 0.96, mRNA isoform expression > 0.95, DNA methylation > 0.96). Meanwhile, in the presence of omics data, high-quality, non-existent omics data were generated (cosine similarity mRNA gene expression: 0.9702, mRNA isoform expression: 0.9546, DNA methylation: 0.9687). This model can effectively classify cancer phenotypes based on incomplete omics data with data sparsity robustness, generating omics data through deep learning and enabling precision medicine. Full article
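The padding idea (replacing missing omics blocks with virtual data so every sample has a fixed-length input) might look roughly like the sketch below. The modality names, feature counts, and pad value are placeholders, not the study's actual dimensions.

```python
# Hedged sketch: pad missing omics modalities with a constant block before the VAE encoder.
import numpy as np

OMICS_DIMS = {"mrna": 2000, "isoform": 1500, "methylation": 3000}  # assumed feature counts
PAD_VALUE = 0.0

def assemble_input(sample_omics):
    """sample_omics maps modality name -> 1D array, or None when that modality is missing."""
    blocks = []
    for name, dim in OMICS_DIMS.items():
        values = sample_omics.get(name)
        if values is None:
            blocks.append(np.full(dim, PAD_VALUE))   # virtual data stands in for the missing block
        else:
            blocks.append(np.asarray(values, dtype=float))
    return np.concatenate(blocks)                    # fixed-length vector for the encoder
```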

20 pages, 19697 KiB  
Article
Efficacy Evaluation of You Only Learn One Representation (YOLOR) Algorithm in Detecting, Tracking, and Counting Vehicular Traffic in Real-World Scenarios, the Case of Morelia México: An Artificial Intelligence Approach
by José A. Guzmán-Torres, Francisco J. Domínguez-Mota, Gerardo Tinoco-Guerrero, Maybelin C. García-Chiquito and José G. Tinoco-Ruíz
AI 2024, 5(3), 1594-1613; https://doi.org/10.3390/ai5030077 - 4 Sep 2024
Viewed by 1022
Abstract
This research explores the efficacy of the YOLOR (You Only Learn One Representation) algorithm integrated with the Deep Sort algorithm for real-time vehicle detection, classification, and counting in Morelia, Mexico. The study aims to enhance traffic monitoring and management by leveraging advanced deep learning techniques. The methodology involves deploying the YOLOR model at six key monitoring stations, with varying confidence levels and pre-trained weights, to evaluate its performance across diverse traffic conditions. The results demonstrate that the model is effective compared to other approaches in classifying multiple vehicle types. The combination of YOLOR and Deep Sort proves effective in tracking vehicles and distinguishing between different types, providing valuable data for optimizing traffic flow and infrastructure planning. This innovative approach offers a scalable and precise solution for intelligent traffic management, setting new methodologies for urban traffic monitoring systems. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

19 pages, 37717 KiB  
Article
Detection of AI-Generated Synthetic Images with a Lightweight CNN
by Adrian Lokner Lađević, Tin Kramberger, Renata Kramberger and Dino Vlahek
AI 2024, 5(3), 1575-1593; https://doi.org/10.3390/ai5030076 - 3 Sep 2024
Viewed by 1638
Abstract
The rapid development of generative adversarial networks has significantly advanced the generation of synthetic images, presenting valuable opportunities and ethical dilemmas in their potential misuse across various industries. The necessity to distinguish real from AI-generated content is becoming increasingly critical to preserve the integrity of online data. While traditional methods for detecting fake images resulting from image tampering rely on hand-crafted features, the sophistication of manipulated images produced by generative adversarial networks requires more advanced detection approaches. The lightweight approach proposed here is based on convolutional neural networks that comprise only eight convolutional and two hidden layers that effectively differentiate AI-generated images from real ones. The proposed approach was assessed using two benchmark datasets and custom-generated data from Sentinel-2 imagery. It demonstrated superior performance compared to four state-of-the-art methods on the CIFAKE dataset, achieving the highest accuracy of 97.32%, on par with the highest-performing state-of-the-art method. Explainable AI is utilized to enhance our comprehension of the complex processes involved in synthetic image recognition. We have shown that, unlike authentic images, where activations often center around the main object, in synthetic images, activations cluster around the edges of objects, in the background, or in areas with complex textures. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

17 pages, 6251 KiB  
Article
Effective Hybrid Structure Health Monitoring through Parametric Study of GoogLeNet
by Saleh Al-Qudah and Mijia Yang
AI 2024, 5(3), 1558-1574; https://doi.org/10.3390/ai5030075 - 30 Aug 2024
Cited by 1 | Viewed by 840
Abstract
This paper presents an innovative approach that utilizes infused images from vibration signals and visual inspections to enhance the efficiency and accuracy of structure health monitoring through GoogLeNet. Scrutiny of the structure of GoogLeNet identified four key parameters, and thus, the optimization of GoogLeNet was completed through manipulation of the four key parameters. First, the impact of the number of inception modules on the performance of GoogLeNet revealed that employing eight inception layers achieves remarkable 100% accuracy while requiring less computational time compared to nine layers. Second, the choice of activation function was studied, with the Rectified Linear Unit (ReLU) emerging as the most effective option. Types of optimizers were then researched, which identified Stochastic Gradient Descent with Momentum (SGDM) as the most efficient optimizer. Finally, the influence of learning rate was compared, which found that a rate of 0.001 produces the best outcomes. By amalgamating these findings, a comprehensive optimized GoogLeNet model was found to identify damage cases effectively and accurately through infused images from vibrations and visual inspections. Full article

24 pages, 443 KiB  
Review
Understanding Physics-Informed Neural Networks: Techniques, Applications, Trends, and Challenges
by Amer Farea, Olli Yli-Harja and Frank Emmert-Streib
AI 2024, 5(3), 1534-1557; https://doi.org/10.3390/ai5030074 - 29 Aug 2024
Cited by 2 | Viewed by 6833
Abstract
Physics-informed neural networks (PINNs) represent a significant advancement at the intersection of machine learning and physical sciences, offering a powerful framework for solving complex problems governed by physical laws. This survey provides a comprehensive review of the current state of research on PINNs, highlighting their unique methodologies, applications, challenges, and future directions. We begin by introducing the fundamental concepts underlying neural networks and the motivation for integrating physics-based constraints. We then explore various PINN architectures and techniques for incorporating physical laws into neural network training, including approaches to solving partial differential equations (PDEs) and ordinary differential equations (ODEs). Additionally, we discuss the primary challenges faced in developing and applying PINNs, such as computational complexity, data scarcity, and the integration of complex physical laws. Finally, we identify promising future research directions. Overall, this survey seeks to provide a foundational understanding of PINNs within this rapidly evolving field. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
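To make the core PINN idea concrete, here is a minimal sketch for a toy ODE du/dt = -u with u(0) = 1: a small network u(t) is trained so that its automatic-differentiation derivative satisfies the equation at collocation points while matching the initial condition. The network size, collocation grid, and loss weighting are illustrative choices, not taken from the survey.

```python
# Minimal PINN sketch for du/dt = -u, u(0) = 1 (exact solution: exp(-t)).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_col = torch.linspace(0.0, 2.0, 100).reshape(-1, 1).requires_grad_(True)  # collocation points
t0 = torch.zeros(1, 1)

for step in range(2000):
    opt.zero_grad()
    u = net(t_col)
    du_dt = torch.autograd.grad(u, t_col, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    physics_loss = torch.mean((du_dt + u) ** 2)          # residual of du/dt + u = 0
    ic_loss = (net(t0) - 1.0).pow(2).mean()              # initial condition u(0) = 1
    loss = physics_loss + ic_loss                        # physics-informed total loss
    loss.backward()
    opt.step()
```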

17 pages, 2210 KiB  
Review
A Systematic Literature Review on Parameters Optimization for Smart Hydroponic Systems
by Umar Shareef, Ateeq Ur Rehman and Rafiq Ahmad
AI 2024, 5(3), 1517-1533; https://doi.org/10.3390/ai5030073 - 27 Aug 2024
Viewed by 2190
Abstract
Hydroponics is a soilless farming technique that has emerged as a sustainable alternative. However, new technologies such as Industry 4.0, the internet of things (IoT), and artificial intelligence are needed to keep up with issues related to economics, automation, and social challenges in hydroponics farming. One significant issue is optimizing growth parameters to identify the best conditions for growing fruits and vegetables. These parameters include pH, total dissolved solids (TDS), electrical conductivity (EC), light intensity, daily light integral (DLI), and nutrient solution/ambient temperature and humidity. To address these challenges, a systematic literature review was conducted aiming to answer research questions regarding the optimal growth parameters for leafy green vegetables and herbs and spices grown in hydroponic systems. The review selected a total of 131 papers related to indoor farming, hydroponics, and aquaponics. The majority of the articles focused on technology description (38.5%), artificial illumination (26.2%), and nutrient solution composition/parameters (13.8%). Additionally, the remaining 10.7% of articles focused on the application of sensors, slope, environment, and economy. This comprehensive review provides valuable information on optimized growth parameters for smart hydroponic systems and explores future prospects and the application of digital technologies in this field. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)

21 pages, 8208 KiB  
Article
Seismic Performance Prediction of RC, BRB and SDOF Structures Using Deep Learning and the Intensity Measure INp
by Omar Payán-Serrano, Edén Bojórquez, Julián Carrillo, Juan Bojórquez, Herian Leyva, Ali Rodríguez-Castellanos, Joel Carvajal and José Torres
AI 2024, 5(3), 1496-1516; https://doi.org/10.3390/ai5030072 - 26 Aug 2024
Viewed by 720
Abstract
The motivation for using artificial neural networks in this study stems from their computational efficiency and ability to model complex, high-level abstractions. Deep learning models were utilized to predict the structural responses of reinforced concrete (RC) buildings subjected to earthquakes. For this aim, the dataset for training and evaluation was derived from complex computational dynamic analyses, which involved scaling real ground motion records at different intensity levels (spectral acceleration Sa(T1) and the recently proposed INp). The results, specifically the maximum interstory drifts, were characterized for the output neurons in terms of their corresponding statistical parameters: mean, median, and standard deviation; while two input variables (fundamental period and earthquake intensity) were used in the neural networks to represent buildings and seismic risk. To validate deep learning as a robust tool for seismic predesign and rapid estimation, a prediction model was developed to assess the seismic performance of a complex RC building with buckling restrained braces (RC-BRBs). Additionally, other deep learning models were explored to predict ductility and hysteretic energy in nonlinear single degree of freedom (SDOF) systems. The findings demonstrated that increasing the number of hidden layers generally reduces prediction error, although an excessive number can lead to overfitting. Full article

14 pages, 2577 KiB  
Article
xLSTMTime: Long-Term Time Series Forecasting with xLSTM
by Musleh Alharthi and Ausif Mahmood
AI 2024, 5(3), 1482-1495; https://doi.org/10.3390/ai5030071 - 23 Aug 2024
Viewed by 2473
Abstract
In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting (LTSF), demonstrating significant advancements despite facing challenges such as high computational demands, difficulty in capturing temporal dynamics, and managing long-term dependencies. The emergence of LTSF-Linear, with its straightforward linear architecture, has notably outperformed transformer-based counterparts, prompting a reevaluation of the transformer’s utility in time series forecasting. In response, this paper presents an adaptation of a recent architecture, termed extended LSTM (xLSTM), for LTSF. xLSTM incorporates exponential gating and a revised memory structure with higher capacity that has good potential for LTSF. Our adopted architecture for LTSF, termed xLSTMTime, surpasses current approaches. We compare xLSTMTime’s performance against various state-of-the-art models across multiple real-world datasets, demonstrating superior forecasting capabilities. Our findings suggest that refined recurrent architectures can offer competitive alternatives to transformer-based models in LTSF tasks, potentially redefining the landscape of time series forecasting. Full article
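For context on the "straightforward linear architecture" the abstract contrasts against, a minimal per-channel linear forecaster (in the spirit of LTSF-Linear, not the xLSTMTime model itself) can be sketched as follows; the look-back and horizon lengths are arbitrary choices.

```python
# Hedged sketch of a simple linear LTSF baseline: one linear map from look-back to horizon.
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    def __init__(self, lookback=96, horizon=24):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)   # shared across channels

    def forward(self, x):                          # x: (batch, lookback, channels)
        x = x.transpose(1, 2)                      # (batch, channels, lookback)
        y = self.proj(x)                           # (batch, channels, horizon)
        return y.transpose(1, 2)                   # (batch, horizon, channels)

# Example: forecast 24 future steps of a 7-channel series from the last 96 observations.
pred = LinearForecaster(96, 24)(torch.randn(8, 96, 7))   # -> shape (8, 24, 7)
```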

20 pages, 5693 KiB  
Article
H-QNN: A Hybrid Quantum–Classical Neural Network for Improved Binary Image Classification
by Muhammad Asfand Hafeez, Arslan Munir and Hayat Ullah
AI 2024, 5(3), 1462-1481; https://doi.org/10.3390/ai5030070 - 19 Aug 2024
Viewed by 2027
Abstract
Image classification is an important application for deep learning. With the advent of quantum technology, quantum neural networks (QNNs) have become the focus of research. Traditional deep learning-based image classification involves using a convolutional neural network (CNN) to extract features from the image and a multi-layer perceptron (MLP) network to create the decision boundaries. However, quantum circuits with parameters can extract rich features from images and also create complex decision boundaries. This paper proposes a hybrid QNN (H-QNN) model designed for binary image classification that capitalizes on the strengths of quantum computing and classical neural networks. Our H-QNN model uses a compact, two-qubit quantum circuit integrated with a classical convolutional architecture, making it highly efficient for computation on noisy intermediate-scale quantum (NISQ) devices that are currently leading the way in practical quantum computing applications. Our H-QNN model significantly enhances classification accuracy, achieving a 90.1% accuracy rate on binary image datasets. In addition, we have extensively evaluated baseline CNN and our proposed H-QNN models for image retrieval tasks. The obtained quantitative results exhibit the generalization of our H-QNN for downstream image retrieval tasks. Furthermore, our model addresses the issue of overfitting for small datasets, making it a valuable tool for practical applications. Full article
(This article belongs to the Special Issue Advances in Quantum Computing and Quantum Machine Learning)

16 pages, 8587 KiB  
Article
Prompt Engineering for Knowledge Creation: Using Chain-of-Thought to Support Students’ Improvable Ideas
by Alwyn Vwen Yen Lee, Chew Lee Teo and Seng Chee Tan
AI 2024, 5(3), 1446-1461; https://doi.org/10.3390/ai5030069 - 16 Aug 2024
Viewed by 1712
Abstract
Knowledge creation in education is a critical practice for advancing collective knowledge and fostering innovation within a student community. Students play vital roles in identifying gaps and working collaboratively to improve community ideas from discourse, but idea quality can be suboptimal, affected by a lack of resources or diversity of ideas. The use of generative Artificial Intelligence and large language models (LLMs) in education has allowed work on idea-centric discussions to advance in ways that were previously unfeasible. However, the use of LLMs requires specific skill sets in prompt engineering, relevant to the in-context technique known as Chain-of-Thought (CoT), for generating and supporting improvable ideas in student discourse. A total of 721 discourse turns, consisting of 272 relevant question–answer pairs and 149 threads of student discourse data, were collected from 31 students during a two-day student Knowledge Building Design Studio (sKBDS). Student responses were augmented using the CoT approach, and the LLM-generated responses were compared with students’ original responses. Findings are illustrated using two threads to show that CoT-augmented inputs for the LLMs can generate responses that support improvable ideas in the context of knowledge creation. This study presents work from authentic student discourse and has implications for research and classroom practice. Full article

19 pages, 892 KiB  
Article
Harnessing Generative Artificial Intelligence for Digital Literacy Innovation: A Comparative Study between Early Childhood Education and Computer Science Undergraduates
by Ioannis Kazanidis and Nikolaos Pellas
AI 2024, 5(3), 1427-1445; https://doi.org/10.3390/ai5030068 - 15 Aug 2024
Viewed by 6331
Abstract
The recent surge of generative artificial intelligence (AI) in higher education presents a fascinating landscape of opportunities and challenges. AI has the potential to personalize education and create more engaging learning experiences. However, the effectiveness of AI interventions relies on well-considered implementation strategies. The impact of AI platforms in education is largely determined by the particular learning environment and the distinct needs of each student. Consequently, investigating the attitudes of future educators towards this technology is becoming a critical area of research. This study explores the impact of generative AI platforms on students’ learning performance, experience, and satisfaction within higher education. It specifically focuses on students’ experiences with varying levels of technological proficiency. A comparative study was conducted with two groups from different academic contexts undergoing the same experimental condition to design, develop, and implement instructional design projects using various AI platforms to produce multimedia content tailored to their respective subjects. Undergraduates from two disciplines—Early Childhood Education (n = 32) and Computer Science (n = 34)—participated in this study, which examined the integration of generative AI platforms into educational content implementation. Results indicate that both groups demonstrated similar learning performance in designing, developing, and implementing instructional design projects. Regarding user experience, the general outcomes were similar across both groups; however, Early Childhood Education students rated the usefulness of AI multimedia platforms significantly higher. Conversely, Computer Science students reported a slightly higher comfort level with these tools. In terms of overall satisfaction, Early Childhood Education students expressed greater satisfaction with AI software than their counterparts, acknowledging its importance for their future careers. This study contributes to the understanding of how AI platforms affect students from diverse backgrounds, bridging a gap in the knowledge of user experience and learning outcomes. Furthermore, by exploring best practices for integrating AI into educational contexts, it provides valuable insights for educators and scholars seeking to optimize the potential of AI to enhance educational outcomes. Full article

36 pages, 3308 KiB  
Review
Fractional Calculus Meets Neural Networks for Computer Vision: A Survey
by Cecília Coelho, M. Fernanda P. Costa and Luís L. Ferrás
AI 2024, 5(3), 1391-1426; https://doi.org/10.3390/ai5030067 - 7 Aug 2024
Cited by 1 | Viewed by 1447
Abstract
Traditional computer vision techniques aim to extract meaningful information from images but often depend on manual feature engineering, making it difficult to handle complex real-world scenarios. Fractional calculus (FC), which extends derivatives to non-integer orders, provides a flexible way to model systems with memory effects and long-term dependencies, making it a powerful tool for capturing fractional rates of variation. Recently, neural networks (NNs) have demonstrated remarkable capabilities in learning complex patterns directly from raw data, automating computer vision tasks and enhancing performance. Therefore, the use of fractional calculus in neural network-based computer vision is a powerful method to address existing challenges by effectively capturing complex spatial and temporal relationships in images and videos. This paper presents a survey of fractional calculus neural network-based (FC NN-based) computer vision techniques for denoising, enhancement, object detection, segmentation, restoration, and NN compression. This survey compiles existing FC NN-based approaches, elucidates underlying concepts, and identifies open questions and research directions. By leveraging FC’s properties, FC NN-based approaches offer a novel way to improve the robustness and efficiency of computer vision systems. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
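As a concrete example of the fractional-order operators the survey builds on, the sketch below computes a Grünwald–Letnikov-style fractional difference of a 1D signal; the coefficient recursion is standard, while the step size and example signal are illustrative.

```python
# Hedged sketch: Grünwald–Letnikov fractional difference of order alpha for a 1D signal.
import numpy as np

def gl_fractional_diff(signal, alpha, h=1.0):
    n = len(signal)
    coeffs = np.empty(n)
    coeffs[0] = 1.0
    for k in range(1, n):
        coeffs[k] = coeffs[k - 1] * (1.0 - (alpha + 1.0) / k)   # (-1)^k * binom(alpha, k), recursively
    out = np.zeros(n)
    for i in range(n):
        out[i] = np.dot(coeffs[: i + 1], signal[i::-1]) / h**alpha
    return out

# Example: half-order (alpha = 0.5) derivative of a ramp signal.
print(gl_fractional_diff(np.arange(10, dtype=float), 0.5)[:5])
```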

14 pages, 1254 KiB  
Article
Optimizing Curriculum Vitae Concordance: A Comparative Examination of Classical Machine Learning Algorithms and Large Language Model Architectures
by Mohammed Maree and Wala’a Shehada
AI 2024, 5(3), 1377-1390; https://doi.org/10.3390/ai5030066 - 6 Aug 2024
Viewed by 1361
Abstract
Digital recruitment systems have revolutionized the hiring paradigm, imparting exceptional efficiencies and extending the reach for both employers and job seekers. This investigation scrutinized the efficacy of classical machine learning methodologies alongside advanced large language models (LLMs) in aligning resumes with job categories. Traditional matching techniques, such as Logistic Regression, Decision Trees, Naïve Bayes, and Support Vector Machines, are constrained by the necessity of manual feature extraction, limited feature representation, and performance degradation, particularly as dataset size escalates, rendering them less suitable for large-scale applications. Conversely, LLMs such as GPT-4, GPT-3, and LLAMA adeptly process unstructured textual content, capturing nuanced language and context with greater precision. We evaluated these methodologies utilizing two datasets comprising resumes and job descriptions to ascertain their accuracy, efficiency, and scalability. Our results revealed that while conventional models excel at processing structured data, LLMs significantly enhance the interpretation and matching of intricate textual information. This study highlights the transformative potential of LLMs in recruitment, offering insights into their application and future research avenues. Full article
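A minimal sketch of the kind of classical baseline compared above (TF-IDF features fed to Logistic Regression for job-category prediction) is shown below; the toy resumes and labels are placeholders, and the LLM side of the comparison is not reproduced here.

```python
# Hedged sketch of a classical resume-to-category baseline: TF-IDF + Logistic Regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

resumes = ["python machine learning pandas", "ledger audit balance sheet"]   # placeholder data
labels = ["data_science", "accounting"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(resumes, labels)
print(clf.predict(["deep learning model deployment"]))
```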

20 pages, 1680 KiB  
Article
Teaming Up with an AI: Exploring Human–AI Collaboration in a Writing Scenario with ChatGPT
by Teresa Luther, Joachim Kimmerle and Ulrike Cress
AI 2024, 5(3), 1357-1376; https://doi.org/10.3390/ai5030065 - 5 Aug 2024
Cited by 1 | Viewed by 2948
Abstract
Recent advancements in artificial intelligence (AI) technologies, particularly in generative pre-trained transformer large language models, have significantly enhanced the capabilities of text-generative AI tools—a development that opens new avenues for human–AI collaboration across various domains. However, the dynamics of human interaction with AI-based chatbots, such as ChatGPT, remain largely unexplored. We observed and analyzed how people interact with ChatGPT in a collaborative writing setting to address this research gap. A total of 135 participants took part in this exploratory lab study, which consisted of engaging with ChatGPT to compose a text discussing the prohibition of alcohol in public in relation to a given statement on risky alcohol consumption. During the writing task, all screen activity was logged. In addition to the writing task, further insights on user behavior and experience were gained by applying questionnaires and conducting an additional short interview with a randomly selected subset of 18 participants. Our results reveal high satisfaction with ChatGPT regarding quality aspects, mainly cognitive rather than affect-based trust in ChatGPT’s responses, and higher ratings on perceived competence than on warmth. Compared to other types of prompts, mostly content-related prompts for data, facts, and information were sent to ChatGPT. Mixed-method analysis showed that affinity for technology integration and current use of ChatGPT were positively associated with the frequency of complete text requests. Moreover, prompts for complete texts were associated with more copy–paste behavior. These first insights into co-writing with ChatGPT can inform future research on how successful human–AI collaborative writing can be designed. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

33 pages, 1785 KiB  
Article
Sustainable Machine Vision for Industry 4.0: A Comprehensive Review of Convolutional Neural Networks and Hardware Accelerators in Computer Vision
by Muhammad Hussain
AI 2024, 5(3), 1324-1356; https://doi.org/10.3390/ai5030064 - 1 Aug 2024
Viewed by 2006
Abstract
As manifestations of Industry 4.0 become visible across various applications, one key and opportune area of development is quality inspection processes and defect detection. Over the last decade, computer vision architectures, in particular object detectors, have received increasing attention from the research community, due to their localisation advantage over image classification. However, for these architectural advancements to provide tangible solutions, they must be optimised with respect to the target hardware along with the deployment environment. To this effect, this survey provides an in-depth review of the architectural progression of image classification and object detection architectures with a focus on advancements within Artificially Intelligent accelerator hardware. This will provide readers with an understanding of the present state of architecture–hardware integration within the computer vision discipline. The review also provides examples of the industrial implementation of computer vision architectures across various domains, from the detection of fabric defects to pallet racking inspection. The survey highlights the need for representative hardware-benchmarked datasets for providing better performance comparisons along with envisioning object detection as the primary domain where more research efforts would be focused over the next decade. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

23 pages, 5989 KiB  
Article
Vision Transformers in Optimization of AI-Based Early Detection of Botrytis cinerea
by Panagiotis Christakakis, Nikolaos Giakoumoglou, Dimitrios Kapetas, Dimitrios Tzovaras and Eleftheria-Maria Pechlivani
AI 2024, 5(3), 1301-1323; https://doi.org/10.3390/ai5030063 - 1 Aug 2024
Cited by 1 | Viewed by 1193
Abstract
Detecting early plant diseases autonomously poses a significant challenge for self-navigating robots and automated systems utilizing Artificial Intelligence (AI) imaging. For instance, Botrytis cinerea, also known as gray mold disease, is a major threat to agriculture, particularly impacting significant crops in the Cucurbitaceae and Solanaceae families, making early and accurate detection essential for effective disease management. This study focuses on the improvement of deep learning (DL) segmentation models capable of early detecting B. cinerea on Cucurbitaceae crops utilizing Vision Transformer (ViT) encoders, which have shown promising segmentation performance, in systemic use with the Cut-and-Paste method that further improves accuracy and efficiency addressing dataset imbalance. Furthermore, to enhance the robustness of AI models for early detection in real-world settings, an advanced imagery dataset was employed. The dataset consists of healthy and artificially inoculated cucumber plants with B. cinerea and captures the disease progression through multi-spectral imaging over the course of days, depicting the full spectrum of symptoms of the infection, ranging from early, non-visible stages to advanced disease manifestations. Research findings, based on a three-class system, identify the combination of U-Net++ with MobileViTV2-125 as the best-performing model. This model achieved a mean Dice Similarity Coefficient (mDSC) of 0.792, a mean Intersection over Union (mIoU) of 0.816, and a recall rate of 0.885, with a high accuracy of 92%. Analyzing the detection capabilities during the initial days post-inoculation demonstrates the ability to identify invisible B. cinerea infections as early as day 2 and increasing up to day 6, reaching an IoU of 67.1%. This study assesses various infection stages, distinguishing them from abiotic stress responses or physiological deterioration, which is crucial for accurate disease management as it separates pathogenic from non-pathogenic stress factors. The findings of this study indicate a significant advancement in agricultural disease monitoring and control, with the potential for adoption in on-site digital systems (robots, mobile apps, etc.) operating in real settings, showcasing the effectiveness of ViT-based DL segmentation models for prompt and precise botrytis detection. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
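The segmentation metrics quoted above (Dice similarity coefficient and intersection over union) can be computed on binary masks as in the sketch below; the smoothing constant is an illustrative choice.

```python
# Hedged sketch of Dice (DSC) and IoU on binary segmentation masks (NumPy arrays of 0/1).
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)
```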

15 pages, 5569 KiB  
Article
Comparative Analysis of Machine Learning Techniques Using RGB Imaging for Nitrogen Stress Detection in Maize
by Sumaira Ghazal, Namratha Kommineni and Arslan Munir
AI 2024, 5(3), 1286-1300; https://doi.org/10.3390/ai5030062 - 28 Jul 2024
Viewed by 1612
Abstract
Proper nitrogen management in crops is crucial to ensure optimal growth and yield maximization. While hyperspectral imagery is often used for nitrogen status estimation in crops, it is not feasible for real-time applications due to the complexity and high cost associated with it. Much of the research utilizing RGB data for detecting nitrogen stress in plants relies on datasets obtained under laboratory settings, which limits its usability in practical applications. This study focuses on identifying nitrogen deficiency in maize crops using RGB imaging data from a publicly available dataset obtained under field conditions. We have proposed a custom-built vision transformer model for the classification of maize into three stress classes. Additionally, we have analyzed the performance of convolutional neural network models, including ResNet50, EfficientNetB0, InceptionV3, and DenseNet121, for nitrogen stress estimation. Our approach involves transfer learning with fine-tuning, adding layers tailored to our specific application. Our detailed analysis shows that while vision transformer models generalize well, they converge prematurely with a higher loss value, indicating the need for further optimization. In contrast, the fine-tuned CNN models classify the crop into stressed, non-stressed, and semi-stressed classes with higher accuracy, achieving a maximum accuracy of 97% with EfficientNetB0 as the base model. This makes our fine-tuned EfficientNetB0 model a suitable candidate for practical applications in nitrogen stress detection. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
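The transfer-learning-with-fine-tuning recipe described above can be sketched roughly as follows: an ImageNet-pretrained EfficientNetB0 backbone is frozen and a new head is added for the three nitrogen-stress classes. Image size, head layers, and training settings are assumptions, not the study's exact configuration.

```python
# Hedged sketch: EfficientNetB0 backbone with a new 3-class head for nitrogen stress levels.
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.EfficientNetB0(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3))
base.trainable = False                                   # freeze backbone; unfreeze later to fine-tune

inputs = keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(3, activation="softmax")(x)       # stressed / semi-stressed / non-stressed

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```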

31 pages, 1582 KiB  
Article
Recent Advances in 3D Object Detection for Self-Driving Vehicles: A Survey
by Oluwajuwon A. Fawole and Danda B. Rawat
AI 2024, 5(3), 1255-1285; https://doi.org/10.3390/ai5030061 - 25 Jul 2024
Cited by 1 | Viewed by 2743
Abstract
The development of self-driving or autonomous vehicles has led to significant advancements in 3D object detection technologies, which are critical for the safety and efficiency of autonomous driving. Despite recent advances, several challenges remain in sensor integration, handling sparse and noisy data, and ensuring reliable performance across diverse environmental conditions. This paper comprehensively surveys state-of-the-art 3D object detection techniques for autonomous vehicles, emphasizing the importance of multi-sensor fusion techniques and advanced deep learning models. Furthermore, we present key areas for future research, including enhancing sensor fusion algorithms, improving computational efficiency, and addressing ethical, security, and privacy concerns. The integration of these technologies into real-world applications for autonomous driving is presented by highlighting potential benefits and limitations. We also present a side-by-side comparison of different techniques in a tabular form. Through a comprehensive review, this paper aims to provide insights into the future directions of 3D object detection and its impact on the evolution of autonomous driving. Full article
(This article belongs to the Section AI in Autonomous Systems)

20 pages, 1207 KiB  
Article
A Model for Feature Selection with Binary Particle Swarm Optimisation and Synthetic Features
by Samuel Olusegun Ojo, Juliana Adeola Adisa, Pius Adewale Owolawi and Chunling Tu
AI 2024, 5(3), 1235-1254; https://doi.org/10.3390/ai5030060 - 25 Jul 2024
Viewed by 791
Abstract
Recognising patterns and inferring nonlinearities between data that are seemingly random and stochastic in nature is one of the strong suits of machine learning models. Given a set of features, the ability to distinguish between useful features and seemingly useless features, and thereafter extract a subset of features that will result in the best prediction on data that are highly stochastic, remains an open issue. This study presents a model for feature selection by generating synthetic features and applying Binary Particle Swarm Optimisation with a Long Short-Term Memory-based model. The study analyses the correlation between data and makes use of Apple stock market data as a use case. Synthetic features are created from features that have weak/low correlation to the label, and we analyse how synthetic features that are descriptive of the original features can enhance the model’s predictive capability. The results obtained show that by expanding the dataset to contain synthetic features before applying feature selection, the objective function was better optimised as compared to when no synthetic features were added. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
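The Binary Particle Swarm Optimisation step can be sketched as below: each particle is a 0/1 mask over features, velocities follow the usual PSO update, and a sigmoid of the velocity gives the probability that a feature bit is switched on. The fitness function here is a toy placeholder; in the paper it would be tied to the LSTM-based predictor's performance.

```python
# Hedged sketch of binary PSO for feature selection (sigmoid transfer of velocities to bits).
import numpy as np

rng = np.random.default_rng(0)

def bpso_select(fitness, n_features, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    pos = rng.integers(0, 2, size=(n_particles, n_features))      # particles = feature masks
    vel = rng.normal(0.0, 1.0, size=(n_particles, n_features))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)  # sigmoid transfer
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest

# Toy example: fitness rewards selecting the first three features with a small sparsity penalty.
mask = bpso_select(lambda m: m[:3].sum() - 0.01 * m.sum(), n_features=10)
```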

20 pages, 3755 KiB  
Article
Dynamic Programming-Based White Box Adversarial Attack for Deep Neural Networks
by Swati Aggarwal, Anshul Mittal, Sanchit Aggarwal and Anshul Kumar Singh
AI 2024, 5(3), 1216-1234; https://doi.org/10.3390/ai5030059 - 24 Jul 2024
Viewed by 789
Abstract
Recent studies have exposed the vulnerabilities of deep neural networks to carefully perturbed input data. We propose a novel untargeted white box adversarial attack, the dynamic programming-based sub-pixel score method (SPSM) attack (DPSPSM), a variation of the traditional gradient-based white box approach that constrains the perturbation to a fixed Hamming distance using a dynamic programming-based structure. It is driven by a pixel score metric technique, the SPSM, which is introduced in this paper. In contrast to conventional gradient-based adversarial attacks, which alter entire images almost imperceptibly, the DPSPSM is swift and offers the robustness of manipulating only a small number of input pixels. The presented algorithm quantizes the gradient update with a score generated for each pixel, incorporating contributions from each channel. The results show that the DPSPSM deceives the model with a success rate of 30.45% on the CIFAR-10 test set and 29.30% on the CIFAR-100 test set. Full article
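The exact SPSM scoring rule and dynamic programming structure are not detailed in the abstract. The sketch below illustrates only the general idea it describes: scoring each pixel from the loss gradient aggregated over channels and perturbing just the top-k pixels, which caps the Hamming distance of the change. The absolute-gradient score, the value of k, and the step size `eps` are assumptions, not the paper's method.

```python
# Sketch of a sparse, gradient-scored white-box perturbation: each pixel gets
# a score aggregated over channels from the loss gradient, and only the k
# highest-scoring pixels are altered (bounding the Hamming distance of the
# change). The scoring rule and k are illustrative; the paper's SPSM/DP
# formulation differs in its details.
import torch
import torch.nn.functional as F

def sparse_pixel_attack(model, x, y, k=10, eps=1.0):
    """x: (1, C, H, W) image in [0, 1]; y: (1,) true label tensor."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    grad = x_adv.grad.detach()                     # (1, C, H, W)
    pixel_score = grad.abs().sum(dim=1).flatten()  # per-pixel score over channels
    top = torch.topk(pixel_score, k).indices       # k most influential pixels
    mask = torch.zeros_like(pixel_score)
    mask[top] = 1.0
    mask = mask.view(1, 1, *x.shape[2:])           # broadcast over channels
    x_out = x + eps * grad.sign() * mask           # untargeted step on k pixels
    return x_out.clamp(0.0, 1.0).detach()
```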
24 pages, 7706 KiB  
Article
Computer Vision for Safety Management in the Steel Industry
by Roy Lan, Ibukun Awolusi and Jiannan Cai
AI 2024, 5(3), 1192-1215; https://doi.org/10.3390/ai5030058 - 19 Jul 2024
Viewed by 1760
Abstract
The complex nature of the steel manufacturing environment, characterized by different types of hazards from materials and large machinery, makes objective and automated monitoring critical as a replacement for traditional methods, which are manual and subjective. This study explores the [...] Read more.
The complex nature of the steel manufacturing environment, characterized by different types of hazards from materials and large machinery, makes objective and automated monitoring critical as a replacement for traditional methods, which are manual and subjective. This study explores the feasibility of implementing computer vision for safety management in steel manufacturing, with a case study implementation for automated hard hat detection. The research combines hazard characterization, technology assessment, and a pilot case study. First, a comprehensive review of steel manufacturing hazards was conducted, followed by the application of TOPSIS, a multi-criteria decision analysis method, to select a candidate computer vision system from eight commercially available systems. The pilot study evaluated YOLOv5m, YOLOv8m, and YOLOv9c models on 703 grayscale images from a steel mini-mill, assessing performance through precision, recall, F1-score, mAP, specificity, and AUC metrics. Results showed high overall accuracy in hard hat detection, with YOLOv9c slightly outperforming the others, particularly in detecting safety violations. Challenges emerged in handling class imbalance and accurately identifying absent hard hats, especially given grayscale imagery limitations. Despite these challenges, this study affirms the feasibility of computer vision-based safety management in steel manufacturing, providing a foundation for future automated safety monitoring systems. Findings underscore the need for larger, diverse datasets and advanced techniques to address industry-specific complexities, paving the way for enhanced workplace safety in challenging industrial environments. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
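The abstract names TOPSIS for shortlisting a commercial computer vision system but does not list the criteria, weights, or candidate scores used. The sketch below shows the standard TOPSIS procedure; the three criteria (accuracy, cost, latency), the weights, and the decision matrix are made-up values for illustration only.

```python
# Standard TOPSIS ranking sketch: vector-normalise the decision matrix, weight
# it, measure distances to the ideal and anti-ideal solutions, and rank
# alternatives by relative closeness. Criteria, weights, and scores below are
# illustrative, not the study's actual evaluation data.
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: (alternatives x criteria); benefit[j] is True if larger is better."""
    M = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    norm = M / np.linalg.norm(M, axis=0)           # vector normalisation
    V = norm * w                                   # weighted normalised matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)                 # closeness score per system

# Illustrative example: 3 candidate systems scored on accuracy, cost, latency.
scores = topsis([[0.90, 120, 45],
                 [0.85,  80, 30],
                 [0.92, 150, 60]],
                weights=[0.5, 0.3, 0.2],
                benefit=np.array([True, False, False]))
print(scores.argsort()[::-1])                      # systems ranked best-first
```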
20 pages, 1623 KiB  
Article
Optimization Strategies for Atari Game Environments: Integrating Snake Optimization Algorithm and Energy Valley Optimization in Reinforcement Learning Models
by Sadeq Mohammed Kadhm Sarkhi and Hakan Koyuncu
AI 2024, 5(3), 1172-1191; https://doi.org/10.3390/ai5030057 - 17 Jul 2024
Cited by 2 | Viewed by 1087
Abstract
One of the biggest problems in gaming AI is how to optimize and adapt a deep reinforcement learning (DRL) model, especially when it runs inside complex, dynamic environments like “PacMan”. Existing research has largely concentrated on [...] Read more.
One of the biggest problems in gaming AI is how to optimize and adapt a deep reinforcement learning (DRL) model, especially when it runs inside complex, dynamic environments like “PacMan”. Existing research has largely concentrated on basic DRL approaches without exploiting advanced optimization methods. This paper addresses this gap by proposing an innovative methodology that combines DRL with high-level metaheuristic optimization methods. Specifically, the work refactors DRL models in the “PacMan” domain with the Energy Serpent Optimizer (ESO) for hyperparameter search. These adaptations give the AI agent a major performance boost, with gains in adaptability, response time, and efficiency that become evident in the more complex game space. The work incorporates a metaheuristic optimization algorithm into DRL for Atari gaming AI, an integration that is essential for improving DRL models in general and allows for more efficient and real-time game play. A comprehensive empirical study verifies the capabilities of these algorithms in practice and establishes a state of the art for AI-driven game development. Beyond improving gaming AI, the developments could eventually apply to more sophisticated gaming environments, ongoing improvement of algorithms during execution, real-time adaptation during learning, and potentially robotics and autonomous systems. The study also illustrates the necessity for an even-handed and conscientious application of AI in gaming, specifically regarding questions of fairness and addiction. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
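The update rules of the Energy Serpent Optimizer are not given in the abstract, so the sketch below only shows the general pattern the paper describes: wrapping a DRL training run in a population-based metaheuristic search over hyperparameters. The search space, the `train_and_evaluate` placeholder (which in practice would train an agent on an Atari environment and return its mean episode reward), and the simple exploit-and-perturb update standing in for ESO are all assumptions.

```python
# Generic population-based hyperparameter search around a DRL training run.
# `train_and_evaluate` is a placeholder for training an agent on an Atari
# environment and returning its mean episode reward; the exploit-and-perturb
# update below stands in for the Energy Serpent Optimizer's actual rules.
import random

SEARCH_SPACE = {                        # illustrative hyperparameter ranges
    "learning_rate": (1e-5, 1e-3),
    "discount":      (0.90, 0.999),
    "epsilon_decay": (0.990, 0.9999),
}

def sample():
    return {k: random.uniform(*rng) for k, rng in SEARCH_SPACE.items()}

def perturb(hp, scale=0.1):
    out = {}
    for k, (lo, hi) in SEARCH_SPACE.items():
        out[k] = min(hi, max(lo, hp[k] + random.gauss(0, scale * (hi - lo))))
    return out

def train_and_evaluate(hp):             # placeholder: plug in a real DRL run here
    return -((hp["learning_rate"] - 3e-4) ** 2)   # toy objective for the sketch

def metaheuristic_search(pop_size=8, generations=10):
    pop = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=train_and_evaluate, reverse=True)
        elites = scored[: pop_size // 2]           # keep the best half
        pop = elites + [perturb(random.choice(elites)) for _ in elites]
    return max(pop, key=train_and_evaluate)

best = metaheuristic_search()
print(best)
```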
40 pages, 5912 KiB  
Article
ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models
by Shreyas Bangalore Vijayakumar, Krishna Teja Chitty-Venkata, Kanishk Arya and Arun K. Somani
AI 2024, 5(3), 1132-1171; https://doi.org/10.3390/ai5030056 - 11 Jul 2024
Viewed by 1867
Abstract
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with version differences that often complicate reproducibility [...] Read more.
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with version differences that often complicate reproducibility and unified benchmarking. We propose ConVision Benchmark, a comprehensive framework in PyTorch, to standardize the implementation and evaluation of state-of-the-art CNN and ViT models. This framework addresses common challenges such as version mismatches and inconsistent validation metrics. As a proof of concept, we performed an extensive benchmark analysis on a COVID-19 dataset, encompassing nearly 200 CNN and ViT models, among which DenseNet-161 and MaxViT-Tiny achieved exceptional accuracy with a peak performance of around 95%. Although we primarily used the COVID-19 dataset for image classification, the framework is adaptable to a variety of datasets, enhancing its applicability across different domains. Our methodology includes rigorous performance evaluations, highlighting metrics such as accuracy, precision, recall, F1 score, and computational efficiency (FLOPs, MACs, CPU, and GPU latency). The ConVision Benchmark facilitates a comprehensive understanding of model efficacy, aiding researchers in deploying high-performance models for diverse applications. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
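ConVision's own API is not shown in the abstract. As an illustration of the kind of measurements such a unified benchmark collects, the sketch below times inference latency and counts parameters for a torchvision classification model in PyTorch. The model choice, input size, and repetition count are assumptions, not ConVision's actual configuration.

```python
# Sketch of the latency/size measurements a unified benchmark collects for a
# classification model; the model, input size, and repetition count below are
# illustrative, not ConVision's actual configuration.
import time
import torch
from torchvision import models

def benchmark(model, device="cpu", input_size=(1, 3, 224, 224), reps=20):
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(3):                      # warm-up passes
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(reps):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - start) / reps * 1000
    n_params = sum(p.numel() for p in model.parameters())
    return {"latency_ms": latency_ms, "params_millions": n_params / 1e6}

# Example: measure an untrained DenseNet-161 on CPU.
print(benchmark(models.densenet161(weights=None)))
```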
21 pages, 2574 KiB  
Article
ZTCloudGuard: Zero Trust Context-Aware Access Management Framework to Avoid Medical Errors in the Era of Generative AI and Cloud-Based Health Information Ecosystems
by Khalid Al-hammuri, Fayez Gebali and Awos Kanan
AI 2024, 5(3), 1111-1131; https://doi.org/10.3390/ai5030055 - 8 Jul 2024
Viewed by 1226
Abstract
Managing access between large numbers of distributed medical devices has become a crucial aspect of modern healthcare systems, enabling the establishment of smart hospitals and telehealth infrastructure. However, as telehealth technology continues to evolve and Internet of Things (IoT) devices become more widely [...] Read more.
Managing access between large numbers of distributed medical devices has become a crucial aspect of modern healthcare systems, enabling the establishment of smart hospitals and telehealth infrastructure. However, as telehealth technology continues to evolve and Internet of Things (IoT) devices become more widely used, they are also increasingly exposed to various types of vulnerabilities and medical errors. In healthcare information systems, about 90% of vulnerabilities emerge from medical and human error. As a result, there is a need for additional research and development of security tools to prevent such attacks. This article proposes a zero-trust-based context-aware framework for managing access to the main components of the cloud ecosystem, including users, devices, and output data. The main goal of the proposed framework is to build a scoring system that prevents or alleviates medical errors while distributed medical devices are used in cloud-based healthcare information systems. The framework has two main scoring criteria to maintain the chain of trust. First, it proposes a critical trust score based on cloud-native microservices for authentication, encryption, logging, and authorization. Second, a bond trust scoring system is created to assess the real-time semantic and syntactic analysis of attributes stored in a healthcare information system. The analysis is based on a pre-trained machine learning model that generates the semantic and syntactic scores. The framework also takes into account regulatory compliance and user consent in the creation of the scoring system. The advantage of this method is that it applies to any language and adapts to all attributes, as it relies on a language model rather than a set of predefined and limited attributes. The results show a high F1 score of 93.5%, demonstrating the framework's validity for detecting medical errors. Full article
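The framework's scoring formulas are not given in the abstract. The sketch below only illustrates how a critical trust score (from authentication, encryption, logging, and authorization checks) and a bond trust score (from semantic/syntactic analysis) might be combined into an access decision. The equal weights, the 0.8 threshold, and the field names are assumptions, not the paper's values.

```python
# Illustrative combination of the two scores described in the abstract: a
# critical trust score from cloud-native security checks and a bond trust
# score from semantic/syntactic analysis of stored attributes. The weights
# and the 0.8 threshold are assumptions, not the paper's values.
from dataclasses import dataclass

@dataclass
class SessionContext:
    authenticated: bool
    encrypted: bool
    logged: bool
    authorized: bool
    semantic_score: float     # from a pretrained language model, in [0, 1]
    syntactic_score: float    # from the same analysis pipeline, in [0, 1]

def critical_trust(ctx: SessionContext) -> float:
    checks = [ctx.authenticated, ctx.encrypted, ctx.logged, ctx.authorized]
    return sum(checks) / len(checks)

def bond_trust(ctx: SessionContext) -> float:
    return 0.5 * ctx.semantic_score + 0.5 * ctx.syntactic_score

def allow_access(ctx: SessionContext, threshold: float = 0.8) -> bool:
    """Grant access only when the combined chain-of-trust score is high enough."""
    combined = 0.5 * critical_trust(ctx) + 0.5 * bond_trust(ctx)
    return combined >= threshold

ctx = SessionContext(True, True, True, True, 0.9, 0.85)
print(allow_access(ctx))   # True for this illustrative session
```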