Algorithms for Image Processing and Machine Vision

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Multidisciplinary Applications".

Deadline for manuscript submissions: closed (31 January 2025) | Viewed by 31423

Special Issue Editor


Dr. Arslan Munir
Guest Editor
Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
Interests: artificial intelligence; computer vision; parallel computing; embedded systems; secure and trustworthy systems

Special Issue Information

Dear Colleagues,

Modern image processing transforms an image into a digital form and uses computing systems to process, manipulate, and/or enhance the digital image through various algorithms. Image processing is also a prerequisite for many machine vision tasks, as it preprocesses images and prepares data in a form suitable for machine vision models. Machine vision generally refers to techniques and algorithms that enable computers/machines to understand and make sense of images: it allows machines to extract latent information from visual data and to mimic the human perception of sight with computational algorithms. Active research is ongoing into novel image processing and machine vision algorithms, including deep learning-based algorithms, that enable new and fascinating applications.

This Special Issue targets algorithms for image processing and machine vision. It invites original research articles and reviews relating to the computing, architectures, algorithms, security, and applications of image processing and machine vision. Topics of interest include, but are not limited to, the following:

  • Image interpretation;
  • Object detection and recognition;
  • Spatial artificial intelligence;
  • Event detection and activity recognition;
  • Image segmentation;
  • Video classification and analysis;
  • Face and gesture recognition;
  • Pose estimation;
  • Computational photography;
  • Image security;
  • Vision hardware and/or software architectures;
  • Image/vision acceleration techniques;
  • Monitoring and surveillance;
  • Situational awareness.

Dr. Arslan Munir
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, use the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • machine vision
  • image fusion
  • vision algorithms
  • deep learning
  • stereo vision
  • activity recognition
  • image/video analysis
  • image encryption algorithms
  • computational photography
  • vision hardware/software
  • monitoring and surveillance
  • biometrics
  • robotics
  • augmented reality

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (17 papers)


Research

28 pages, 9307 KiB  
Article
Application Framework and Optimal Features for UAV-Based Earthquake-Induced Structural Displacement Monitoring
by Ruipu Ji, Shokrullah Sorosh, Eric Lo, Tanner J. Norton, John W. Driscoll, Falko Kuester, Andre R. Barbosa, Barbara G. Simpson and Tara C. Hutchinson
Algorithms 2025, 18(2), 66; https://doi.org/10.3390/a18020066 - 26 Jan 2025
Viewed by 471
Abstract
Unmanned aerial vehicle (UAV) vision-based sensing has become an emerging technology for structural health monitoring (SHM) and post-disaster damage assessment of civil infrastructure. This article proposes a framework for monitoring structural displacement under earthquakes by reprojecting image points obtained from UAV-captured videos to the 3-D world space based on the world-to-image point correspondences. To identify optimal features in the UAV imagery, geo-reference targets with various patterns were installed on a test building specimen, which was then subjected to earthquake shaking. A feature point tracking-based algorithm for square checkerboard patterns and a Hough Transform-based algorithm for concentric circular patterns are developed to ensure reliable detection and tracking of image features. Photogrammetry techniques are applied to reconstruct the 3-D world points and extract structural displacements. The proposed methodology is validated by monitoring the displacements of a full-scale 6-story mass timber building during a series of shake table tests. Reasonable accuracy is achieved in that the overall root-mean-square errors of the tracking results are at the millimeter level compared to ground truth measurements from analog sensors. Insights into optimal features for monitoring structural dynamic response are discussed based on statistical analysis of the error characteristics for the various reference target patterns used to track the structural displacements.
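Editor's note: the Hough Transform step for circular targets can be illustrated with OpenCV. The sketch below is ours, not the authors' implementation; the blur size, accumulator parameters, and center-grouping tolerance are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_circular_targets(frame_gray, min_r=5, max_r=60):
    """Detect candidate circular reference targets with the Hough transform.

    A minimal sketch of the idea described in the paper; the authors'
    parameters and concentric-ring grouping logic are not reproduced here.
    """
    blurred = cv2.medianBlur(frame_gray, 5)  # suppress sensor noise
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
        param1=100, param2=30, minRadius=min_r, maxRadius=max_r)
    if circles is None:
        return []
    circles = np.round(circles[0]).astype(int)  # (x, y, r) triples
    # Concentric rings share a center: keep one point per near-coincident center.
    centers = []
    for x, y, r in circles:
        if not any(abs(x - cx) < 5 and abs(y - cy) < 5 for cx, cy in centers):
            centers.append((x, y))
    return centers
```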

23 pages, 205579 KiB  
Article
DDL R-CNN: Dynamic Direction Learning R-CNN for Rotated Object Detection
by Weixian Su and Donglin Jing
Algorithms 2025, 18(1), 21; https://doi.org/10.3390/a18010021 - 4 Jan 2025
Viewed by 743
Abstract
Current remote sensing (RS) detectors often rely on predefined anchor boxes with fixed angles to handle the multi-directional variations of targets. This approach makes it challenging to accurately select regions of interest and extract features that align with the direction of the targets. Most existing regression methods also adopt angle regression to match the attributes of remote sensing detectors. Due to the inconsistent regression direction and massive anchor boxes with a high aspect ratio, the extracted target features change greatly, the loss function changes drastically, and the training is unstable. However, existing RS detectors and regression techniques have not been able to effectively balance the precision of directional feature extraction with the complexity of the models. To address these challenges, this paper introduces a novel approach known as Dynamic Direction Learning R-CNN (DDL R-CNN), which comprises a dynamic direction learning (DDL) module and a boundary center region offset generation network (BC-ROPN). The DDL module pre-extracts the directional features of targets to provide a coarse estimation of their angles and the corresponding weights. This information is used to generate rotationally aligned anchor boxes that better model the directional features of the targets. BC-ROPN represents an innovative method for anchor box regression. It utilizes the central features of the maximum bounding rectangle’s width and height, along with the coarse angle estimation and weights derived from the DDL module, to refine the orientation of the anchor box. Our method has been proven to surpass existing rotating detection networks in extensive testing across two widely used remote sensing detection datasets, namely UCAS-AOD and HRSC2016.
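Editor's note: the geometry behind rotation-aligned anchors is easy to make concrete. The sketch below is a generic illustration under our own parameter choices, not DDL R-CNN code: it builds the corner points of an anchor box from a center, a size, and a coarse angle estimate.

```python
import numpy as np

def rotated_anchor_corners(cx, cy, w, h, theta):
    """Corners of an anchor box rotated by a coarse angle estimate (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])     # (4, 2) corner coordinates

# e.g. a coarse estimate of 30 degrees for a 100x20 high-aspect-ratio target
corners = rotated_anchor_corners(cx=64, cy=64, w=100, h=20, theta=np.deg2rad(30))
```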

21 pages, 5152 KiB  
Article
GAGAN: Enhancing Image Generation Through Hybrid Optimization of Genetic Algorithms and Deep Convolutional Generative Adversarial Networks
by Despoina Konstantopoulou, Paraskevi Zacharia, Michail Papoutsidakis, Helen C. Leligou and Charalampos Patrikakis
Algorithms 2024, 17(12), 584; https://doi.org/10.3390/a17120584 - 19 Dec 2024
Viewed by 1136
Abstract
Generative Adversarial Networks (GANs) are highly effective for generating realistic images, yet their training can be unstable due to challenges such as mode collapse and oscillatory convergence. In this paper, we propose a novel hybrid optimization method that integrates Genetic Algorithms (GAs) to improve the training process of Deep Convolutional GANs (DCGANs). Specifically, GAs are used to evolve the discriminator’s weights, complementing the gradient-based learning typically employed in GANs. The proposed GAGAN model is trained on the CelebA dataset, using 2000 images, to generate 128 × 128 images, with the generator learning to produce realistic faces from random latent vectors. The discriminator, which classifies images as real or fake, is optimized not only through standard backpropagation, but also through a GA framework that evolves its weights via crossover, mutation, and selection processes. This hybrid method aims to enhance convergence stability and boost image quality by balancing local search from gradient-based methods with the global search capabilities of GAs. Experiments show that the proposed approach reduces generator loss and improves image fidelity, demonstrating that evolutionary algorithms can effectively complement deep learning techniques. This work opens new avenues for optimizing GAN training and enhancing performance in generative models.
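Editor's note: the evolutionary loop the paper describes (selection, crossover, and mutation over discriminator weights) can be sketched in a few lines of NumPy. Everything here, from truncation selection to the mutation rates, is an illustrative assumption rather than the GAGAN configuration.

```python
import numpy as np

def evolve_population(weights, fitness, mut_rate=0.01, mut_scale=0.02, rng=None):
    """One GA generation over flattened discriminator weight vectors.

    weights: (P, D) array, one row per candidate; fitness: (P,) scores.
    """
    rng = rng or np.random.default_rng()
    P, D = weights.shape
    order = np.argsort(fitness)[::-1]
    parents = weights[order[: P // 2]]                  # truncation selection
    children = []
    while len(children) < P - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, D)                        # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        mask = rng.random(D) < mut_rate                 # sparse Gaussian mutation
        child[mask] += rng.normal(0.0, mut_scale, mask.sum())
        children.append(child)
    return np.vstack([parents, children])
```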

20 pages, 2003 KiB  
Article
Enhanced Curvature-Based Fabric Defect Detection: An Experimental Study with Gabor Transform and Deep Learning
by Mehmet Erdogan and Mustafa Dogan
Algorithms 2024, 17(11), 506; https://doi.org/10.3390/a17110506 - 5 Nov 2024
Viewed by 841
Abstract
Quality control at every stage of production in the textile industry is essential for maintaining competitiveness in the global market. Manual fabric defect inspections are often characterized by low precision and high time costs, in contrast to intelligent anomaly detection systems implemented in the early stages of fabric production. To achieve successful automated fabric defect identification, significant challenges must be addressed, including accurate detection, classification, and decision-making processes. Traditionally, fabric defect classification has relied on inefficient and labor-intensive human visual inspection, particularly as the variety of fabric defects continues to increase. Despite the global chip crisis and its adverse effects on supply chains, electronic hardware costs for quality control systems have become more affordable. This presents a notable advantage, as vision systems can now be easily developed with the use of high-resolution, advanced cameras. In this study, we propose a discrete curvature algorithm, integrated with the Gabor transform, which demonstrates significant success in near real-time defect classification. The primary contribution of this work is the development of a modified curvature algorithm that achieves high classification performance without the need for training. This method is particularly efficient due to its low data storage requirements and minimal processing time, making it ideal for real-time applications. Furthermore, we implemented and evaluated several other methods from the literature, including Gabor and Convolutional Neural Networks (CNNs), within a unified coding framework. Each defect type was analyzed individually, with results indicating that the proposed algorithm exhibits comparable success and robust performance relative to deep learning-based approaches.
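Editor's note: Gabor filtering, the preprocessing the paper builds on, is available directly in OpenCV. The sketch below is illustrative; the kernel size, wavelength, and orientations are assumptions, not the paper's tuned values.

```python
import cv2
import numpy as np

def gabor_response(img_gray, orientations=(0, 45, 90, 135)):
    """Max response over a small Gabor filter bank, as a defect-salience map.

    The periodic weave yields a flat response; defects break the pattern.
    """
    responses = []
    for angle in orientations:
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0,
                                  theta=np.deg2rad(angle),
                                  lambd=10.0, gamma=0.5, psi=0)
        kern /= np.abs(kern).sum() + 1e-8          # normalize filter energy
        responses.append(cv2.filter2D(img_gray, cv2.CV_32F, kern))
    return np.max(np.stack(responses), axis=0)
```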

18 pages, 5670 KiB  
Article
Improved U2Net-Based Surface Defect Detection Method for Blister Tablets
by Jianmin Zhou, Jian Huang, Jikang Liu and Jingbo Liu
Algorithms 2024, 17(10), 429; https://doi.org/10.3390/a17100429 - 26 Sep 2024
Viewed by 829
Abstract
Aiming at the problem that the surface defects of blister tablets are difficult to detect correctly, this paper proposes a detection method based on the improved U2Net. First, the features extracted from the RSU module of U2Net are enhanced and adjusted using the large kernel attention mechanism, so that the U2Net model strengthens its ability to extract defective features. Second, a loss function combining the Gaussian Laplace operator and the cross-entropy function is designed to strengthen the model’s ability to detect edge defects on the surface of blister tablets. Finally, thresholds are adaptively determined using the local mean and the OTSU method (an adaptive threshold segmentation method) to improve accuracy. The experimental results show that the method proposed in this paper can reach an average accuracy of 99% and an average precision rate of 96.3%; the model test takes only 50 ms per image, which can meet rapid detection requirements. Minor surface defects can also be accurately detected, which is better than other algorithmic models of the same type, proving the effectiveness of this method.
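Editor's note: both thresholding ingredients the abstract names, the local mean and Otsu's method, exist in OpenCV, so the final step can be sketched as follows. The block size, offset, and the AND-combination rule are our assumptions, not the authors' exact scheme.

```python
import cv2

def adaptive_binarize(defect_map_u8):
    """Combine a global Otsu threshold with a local-mean adaptive threshold."""
    _, otsu = cv2.threshold(defect_map_u8, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    local = cv2.adaptiveThreshold(defect_map_u8, 255,
                                  cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, blockSize=31, C=-5)
    return cv2.bitwise_and(otsu, local)   # keep pixels both methods flag
```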

19 pages, 5132 KiB  
Article
Synthetic Face Discrimination via Learned Image Compression
by Sofia Iliopoulou, Panagiotis Tsinganos, Dimitris Ampeliotis and Athanassios Skodras
Algorithms 2024, 17(9), 375; https://doi.org/10.3390/a17090375 - 23 Aug 2024
Viewed by 923
Abstract
The emergence of deep learning has sparked notable strides in the quality of synthetic media. Yet, as photorealism reaches new heights, the line between generated and authentic images blurs, raising concerns about the dissemination of counterfeit or manipulated content online. Consequently, there is a pressing need to develop automated tools capable of effectively distinguishing synthetic images, especially those portraying faces, which is one of the most commonly encountered issues. In this work, we propose a novel approach to synthetic face discrimination, leveraging deep learning-based image compression and predominantly utilizing the quality metrics of an image to determine its authenticity.
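Editor's note: to make the feature idea concrete, the sketch below computes quality metrics after lossy round-trips. Note the substitution: it uses plain JPEG rather than the learned compression model the paper employs, so it illustrates the principle only.

```python
import cv2
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compression_features(img_bgr, qualities=(30, 60, 90)):
    """Quality metrics of an image after lossy round-trips, as classifier input.

    Synthetic and real faces tend to degrade differently under compression.
    """
    feats = []
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    for q in qualities:
        ok, buf = cv2.imencode('.jpg', img_bgr, [cv2.IMWRITE_JPEG_QUALITY, q])
        rec = cv2.cvtColor(cv2.imdecode(buf, cv2.IMREAD_COLOR),
                           cv2.COLOR_BGR2GRAY)
        feats.append(peak_signal_noise_ratio(gray, rec))
        feats.append(structural_similarity(gray, rec))
    return np.array(feats)
```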

19 pages, 7973 KiB  
Article
Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation
by Alexander Khanov, Anastasija Shulzhenko, Anzhelika Voroshilova, Alexander Zubarev, Timur Karimov and Shakeeb Fahmi
Algorithms 2024, 17(8), 366; https://doi.org/10.3390/a17080366 - 21 Aug 2024
Viewed by 783
Abstract
The discrete cosine transform (DCT) is widely used for image and video compression. Lossy algorithms such as JPEG, WebP, BPG and many others are based on it. Multiple modifications of DCT have been developed to improve its performance. One of them is adaptive DCT (ADCT), designed to deal with heterogeneous image structure; it may be found, for example, in the HEVC video codec. Adaptivity means that the image is divided into an uneven grid of squares: smaller ones retain information about details better, while larger squares are efficient for homogeneous backgrounds. The practical use of adaptive DCT algorithms is complicated by the lack of optimal threshold search algorithms for image partitioning procedures. In this paper, we propose a novel method for optimal threshold search in ADCT using a metric based on tonal distribution. We define two thresholds: pm, the threshold defining solid mean coloring, and ps, defining the quadtree fragment splitting. In our algorithm, the values of these thresholds are calculated via polynomial functions of the tonal distribution of a particular image or fragment. The polynomial coefficients are determined using the dedicated optimization procedure on the dataset containing images from the specific domain, urban road scenes in our case. In the experimental part of the study, we show that ADCT allows a higher compression ratio compared to non-adaptive DCT at the same level of quality loss, up to 66% for acceptable quality. The proposed algorithm may be used directly for image compression, or as a core of video compression framework in traffic-demanding applications, such as urban video surveillance systems.
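Editor's note: the quadtree partition driven by two thresholds can be sketched recursively. The homogeneity measure (block standard deviation) and the fixed thresholds below are simplifying assumptions; the paper computes pm and ps from polynomial fits of the tonal distribution, and the per-leaf DCT coding is omitted here.

```python
import numpy as np

def quadtree_split(block, ps, pm, out, y=0, x=0, min_size=8):
    """Recursive quadtree partition: pm selects solid mean coloring,
    ps triggers splitting, echoing the paper's two thresholds."""
    h, w = block.shape
    spread = block.std()
    if spread < pm:                              # homogeneous: flat mean color
        out[y:y + h, x:x + w] = block.mean()
    elif spread > ps and h > min_size:           # heterogeneous: recurse
        hh, hw = h // 2, w // 2
        quadtree_split(block[:hh, :hw], ps, pm, out, y, x, min_size)
        quadtree_split(block[:hh, hw:], ps, pm, out, y, x + hw, min_size)
        quadtree_split(block[hh:, :hw], ps, pm, out, y + hh, x, min_size)
        quadtree_split(block[hh:, hw:], ps, pm, out, y + hh, x + hw, min_size)
    else:                                        # leaf square: DCT-code this
        out[y:y + h, x:x + w] = block            # fragment (coding step omitted)

# usage: out = np.empty_like(img, dtype=float)
#        quadtree_split(img.astype(float), ps=30.0, pm=6.0, out=out)
```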

13 pages, 14978 KiB  
Article
Lester: Rotoscope Animation through Video Object Segmentation and Tracking
by Ruben Tous
Algorithms 2024, 17(8), 330; https://doi.org/10.3390/a17080330 - 30 Jul 2024
Viewed by 1299
Abstract
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks’ contours is simplified with the Douglas–Peucker algorithm. Finally, facial traits, pixelation and a basic rim light effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a simpler and more deterministic approach than diffusion-model-based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also more feasible than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
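Editor's note: the contour-simplification step maps directly onto OpenCV's implementation of Douglas–Peucker (approxPolyDP). The tolerance below, a fraction of each contour's perimeter, is an illustrative assumption.

```python
import cv2

def simplified_contours(mask_u8, epsilon_frac=0.01):
    """Simplify mask contours with the Douglas–Peucker algorithm.

    Fewer vertices give the flat, schematic rotoscope look.
    """
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    simplified = []
    for cnt in contours:
        eps = epsilon_frac * cv2.arcLength(cnt, True)   # scale tolerance to size
        simplified.append(cv2.approxPolyDP(cnt, eps, True))
    return simplified
```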

13 pages, 2390 KiB  
Article
Continuous Recognition of Teachers’ Hand Signals for Students with Attention Deficits
by Ivane Delos Santos Chen, Chieh-Ming Yang, Shang-Shu Wu, Chih-Kang Yang, Mei-Juan Chen, Chia-Hung Yeh and Yuan-Hong Lin
Algorithms 2024, 17(7), 300; https://doi.org/10.3390/a17070300 - 7 Jul 2024
Viewed by 1077
Abstract
In the era of inclusive education, students with attention deficits are integrated into the general classroom. To ensure a seamless transition of students’ focus towards the teacher’s instruction throughout the course and to align with the teaching pace, this paper proposes a continuous recognition algorithm for capturing teachers’ dynamic gesture signals. This algorithm aims to offer instructional attention cues for students with attention deficits. Based on the body landmarks of the teacher’s skeleton obtained using the vision and machine learning-based MediaPipe BlazePose, the proposed method uses simple rules to detect the teacher’s hand signals dynamically and provides three kinds of attention cues (Pointing to left, Pointing to right, and Non-pointing) during the class. Experimental results show that the average accuracy, sensitivity, specificity, precision, and F1 score achieved 88.31%, 91.03%, 93.99%, 86.32%, and 88.03%, respectively. By analyzing non-verbal behavior, our method performs competently, can replace verbal reminders from the teacher, and can be helpful for students with attention deficits in inclusive education.
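Editor's note: a rule of the kind the paper describes can be written against MediaPipe BlazePose's landmark indices. The sketch below is a simplified stand-in: the 0.15 margin and the absence of temporal smoothing are our assumptions, not the published rules.

```python
import mediapipe as mp

L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 11, 12, 15, 16  # BlazePose indices

def classify_hand_signal(rgb_frame, pose):
    """Rule-based pointing-direction cue from BlazePose landmarks.

    Image x runs left-to-right, so the teacher's left appears mirrored.
    """
    result = pose.process(rgb_frame)
    if not result.pose_landmarks:
        return "Non-pointing"
    lm = result.pose_landmarks.landmark
    if lm[R_WRIST].x < lm[R_SHOULDER].x - 0.15:   # arm extended past threshold
        return "Pointing to left"
    if lm[L_WRIST].x > lm[L_SHOULDER].x + 0.15:
        return "Pointing to right"
    return "Non-pointing"

# pose = mp.solutions.pose.Pose(static_image_mode=False)
# label = classify_hand_signal(frame_rgb, pose)
```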

31 pages, 14424 KiB  
Article
Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets
by Mohamed H. Habeb, May Salama and Lamiaa A. Elrefaei
Algorithms 2024, 17(7), 286; https://doi.org/10.3390/a17070286 - 1 Jul 2024
Cited by 2 | Viewed by 2056
Abstract
This work introduces an unsupervised framework for video anomaly detection, leveraging a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with due to their localized field of view. We have utilized a pre-trained ViT as an encoder for feature extraction, which is then processed by the STR attention block to enhance the detection of spatiotemporal relationships among objects in videos. The novelty of this work is utilizing the ViT with the STR attention to detect video anomalies effectively in large and heterogeneous datasets, which is important given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets, i.e., UCSD-Ped2, CUHK Avenue, and ShanghaiTech, achieving area under the receiver operating characteristic curve (AUC ROC) values of 95.6, 86.8, and 82.1, respectively; this demonstrates the model’s superior performance in detecting anomalies compared to state-of-the-art methods, showcasing its potential to significantly enhance automated video surveillance systems. To show the effectiveness of the proposed framework in detecting anomalies in extra-large datasets, we trained the model on a subset of the huge contemporary CHAD dataset that contains over 1 million frames, achieving AUC ROC values of 71.8 and 64.2 for CHAD-Cam 1 and CHAD-Cam 2, respectively, which outperforms the state-of-the-art techniques.
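Editor's note: a generic channel-plus-spatial attention block (CBAM-style) conveys the flavor of re-weighting frame features, though it is not the paper's STR block; the layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention over per-frame feature maps (B, C, H, W)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_mlp(x)            # re-weight channels
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)        # re-weight spatial positions
```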

19 pages, 1087 KiB  
Article
Simple Histogram Equalization Technique Improves Performance of VGG Models on Facial Emotion Recognition Datasets
by Jaher Hassan Chowdhury, Qian Liu and Sheela Ramanna
Algorithms 2024, 17(6), 238; https://doi.org/10.3390/a17060238 - 3 Jun 2024
Cited by 4 | Viewed by 2440
Abstract
Facial emotion recognition (FER) is crucial across psychology, neuroscience, computer vision, and machine learning due to the diversified and subjective nature of emotions, varying considerably across individuals, cultures, and contexts. This study explored FER through convolutional neural networks (CNNs) and Histogram Equalization techniques. It investigated the impact of histogram equalization, data augmentation, and various model optimization strategies on FER accuracy across different datasets like KDEF, CK+, and FER2013. Using pre-trained VGG architectures, such as VGG19 and VGG16, this study also examined the effectiveness of fine-tuning hyperparameters and implementing different learning rate schedulers. The evaluation encompassed diverse metrics including accuracy, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), Area Under the Precision–Recall Curve (AUC-PRC), and Weighted F1 score. Notably, the fine-tuned VGG architecture demonstrated a state-of-the-art performance compared to conventional transfer learning models and achieved 100%, 95.92%, and 69.65% on the CK+, KDEF, and FER2013 datasets, respectively.
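Editor's note: the equalization step itself is a one-liner in OpenCV; the sketch also shows the common contrast-limited (CLAHE) variant for comparison. Whether the authors applied the global or a tile-based variant per dataset is not restated here.

```python
import cv2

def equalize_face(img_gray):
    """Global histogram equalization as FER preprocessing for a VGG backbone."""
    return cv2.equalizeHist(img_gray)   # spreads intensities over the full range

# A contrast-limited, tile-based variant (CLAHE) is a common alternative:
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
# equalized = clahe.apply(img_gray)
```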

16 pages, 2857 KiB  
Article
Enforcing Traffic Safety: A Deep Learning Approach for Detecting Motorcyclists’ Helmet Violations Using YOLOv8 and Deep Convolutional Generative Adversarial Network-Generated Images
by Maged Shoman, Tarek Ghoul, Gabriel Lanzaro, Tala Alsharif, Suliman Gargoum and Tarek Sayed
Algorithms 2024, 17(5), 202; https://doi.org/10.3390/a17050202 - 10 May 2024
Viewed by 2280
Abstract
In this study, we introduce an innovative methodology for the detection of helmet usage violations among motorcyclists, integrating the YOLOv8 object detection algorithm with deep convolutional generative adversarial networks (DCGANs). The objective of this research is to enhance the precision of existing helmet violation detection techniques, which are typically reliant on manual inspection and susceptible to inaccuracies. The proposed methodology involves model training on an extensive dataset comprising both authentic and synthetic images, and demonstrates high accuracy in identifying helmet violations, including scenarios with multiple riders. Data augmentation, in conjunction with synthetic images produced by DCGANs, is utilized to expand the training data volume, particularly focusing on imbalanced classes, thereby facilitating superior model generalization to real-world circumstances. The stand-alone YOLOv8 model exhibited an F1 score of 0.91 for all classes at a confidence level of 0.617, whereas the DCGANs + YOLOv8 model demonstrated an F1 score of 0.96 for all classes at a reduced confidence level of 0.334. These findings highlight the potential of DCGANs in enhancing the accuracy of helmet rule violation detection, thus fostering safer motorcycling practices.
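Editor's note: the detection stage can be sketched with the Ultralytics YOLOv8 API. The dataset YAML name, image path, and training budget below are placeholders; the paper's contribution, mixing DCGAN-synthesized images into the training set, would happen when assembling that dataset.

```python
from ultralytics import YOLO

# "helmet.yaml" is a hypothetical dataset config listing both real frames
# and DCGAN-synthesized images; the file paths here are illustrative.
model = YOLO("yolov8n.pt")                       # pretrained checkpoint
model.train(data="helmet.yaml", epochs=100, imgsz=640)

results = model("motorbike_frame.jpg")           # inference on one frame
for box in results[0].boxes:                     # one box per detected rider/helmet
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```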

16 pages, 808 KiB  
Article
GDUI: Guided Diffusion Model for Unlabeled Images
by Xuanyuan Xie and Jieyu Zhao
Algorithms 2024, 17(3), 125; https://doi.org/10.3390/a17030125 - 18 Mar 2024
Viewed by 2344
Abstract
The diffusion model has made progress in the field of image synthesis, especially in the area of conditional image synthesis. However, this improvement is highly dependent on large annotated datasets. To tackle this challenge, we present the Guided Diffusion model for Unlabeled Images (GDUI) framework in this article. It utilizes the inherent feature similarity and semantic differences in the data, as well as the downstream transferability of Contrastive Language-Image Pretraining (CLIP), to guide the diffusion model in generating high-quality images. We design two semantic-aware algorithms, namely, the pseudo-label-matching algorithm and label-matching refinement algorithm, to match the clustering results with the true semantic information and provide more accurate guidance for the diffusion model. First, GDUI encodes the image into a semantically meaningful latent vector through clustering. Then, pseudo-label matching is used to complete the matching of the true semantic information of the image. Finally, the label-matching refinement algorithm is used to adjust the irrelevant semantic information in the data, thereby improving the quality of the guided diffusion model image generation. Our experiments on labeled datasets show that GDUI outperforms diffusion models without any guidance and significantly reduces the gap between it and models guided by ground-truth labels.
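Editor's note: the clustering step GDUI builds on, pseudo-labels from image embeddings, can be sketched with a pretrained CLIP model and k-means. The checkpoint and cluster count are assumptions, and the paper's label-matching refinement is not reproduced.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from sklearn.cluster import KMeans

def pseudo_labels(images, n_clusters=10):
    """Cluster CLIP image embeddings into pseudo-labels for guidance.

    images: a list of PIL images; returns one integer cluster id per image.
    """
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    with torch.no_grad():
        inputs = proc(images=images, return_tensors="pt")
        emb = model.get_image_features(**inputs)           # (N, 512)
        emb = emb / emb.norm(dim=-1, keepdim=True)         # cosine geometry
    return KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(emb.numpy())
```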

17 pages, 25202 KiB  
Article
Denoising Diffusion Models on Model-Based Latent Space
by Carmelo Scribano, Danilo Pezzi, Giorgia Franchini and Marco Prato
Algorithms 2023, 16(11), 501; https://doi.org/10.3390/a16110501 - 28 Oct 2023
Viewed by 3021
Abstract
With the recent advancements in the field of diffusion generative models, it has been shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. This approach, by abstracting away imperceptible image details and introducing substantial spatial compression, renders the learning of the generative process more manageable while significantly reducing computational and memory demands. In this work, we propose to replace autoencoder coding with a model-based coding scheme based on traditional lossy image compression techniques; this choice not only further diminishes computational expenses but also allows us to probe the boundaries of latent-space image generation. Our objectives culminate in the proposal of a valuable approximation for training continuous diffusion models within a discrete space, accompanied by enhancements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work holds promise for enhancing the adaptability of generative diffusion models across diverse data types beyond the realm of imagery.
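Editor's note: as a toy stand-in for "model-based coding", the sketch below quantizes a blockwise DCT, JPEG-style, to produce discrete latent symbols. The paper's actual codec and quantization design are not reproduced here.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_latent(img_block8, q=16):
    """Quantized 8x8 DCT coefficients as simple discrete latent symbols."""
    coeffs = dctn(img_block8.astype(float), norm="ortho")
    return np.round(coeffs / q).astype(int)     # discrete, compressible code

def dct_decode(symbols, q=16):
    """Approximate reconstruction of the block from its latent symbols."""
    return idctn(symbols * float(q), norm="ortho")
```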

21 pages, 4300 KiB  
Article
Indoor Scene Recognition: An Attention-Based Approach Using Feature Selection-Based Transfer Learning and Deep Liquid State Machine
by Ranjini Surendran, Ines Chihi, J. Anitha and D. Jude Hemanth
Algorithms 2023, 16(9), 430; https://doi.org/10.3390/a16090430 - 8 Sep 2023
Cited by 2 | Viewed by 2637
Abstract
Scene understanding is one of the most challenging areas of research in the fields of robotics and computer vision. Recognising indoor scenes is one of the research applications in the category of scene understanding that has gained attention in recent years. Recent developments in deep learning and transfer learning approaches have attracted considerable attention in addressing this challenging area. In our work, we have proposed a fine-tuned deep transfer learning approach using DenseNet201 for feature extraction and a deep Liquid State Machine model as the classifier in order to develop a model for recognising and understanding indoor scenes. We have included fuzzy colour stacking techniques, colour-based segmentation, and an adaptive World Cup optimisation algorithm to improve the performance of our deep model. Our proposed model is intended to assist the visually impaired and blind in navigating indoor environments and integrating fully into their day-to-day activities. Our proposed work was implemented on the NYU depth dataset and attained an accuracy of 96% for classifying indoor scenes.
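Editor's note: the transfer-learning stage, DenseNet201 as a frozen feature extractor, can be sketched with torchvision. The Liquid State Machine classifier, fuzzy colour stacking, and World Cup optimisation from the paper are not shown.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Frozen DenseNet201 backbone producing per-image feature descriptors.
backbone = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
extractor = nn.Sequential(backbone.features,          # conv feature maps
                          nn.ReLU(inplace=True),
                          nn.AdaptiveAvgPool2d(1),
                          nn.Flatten())               # -> (B, 1920) descriptors
extractor.eval()
with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))    # stand-in input batch
```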

22 pages, 7154 KiB  
Article
A Comprehensive Analysis of Real-Time Car Safety Belt Detection Using the YOLOv7 Algorithm
by Lwando Nkuzo, Malusi Sibiya and Elisha Didam Markus
Algorithms 2023, 16(9), 400; https://doi.org/10.3390/a16090400 - 23 Aug 2023
Cited by 2 | Viewed by 4240
Abstract
Using a safety belt is crucial for preventing severe injuries and fatalities during vehicle accidents. In this paper, we propose a real-time vehicle occupant safety belt detection system based on the YOLOv7 (You Only Look Once version seven) object detection algorithm. The proposed approach aims to automatically detect whether the occupants of a vehicle have buckled their safety belts or not as soon as they are detected within the vehicle. A dataset for this purpose was collected and annotated for validation and testing. By leveraging the efficiency and accuracy of YOLOv7, we achieve near-instantaneous analysis of video streams, making our system suitable for deployment in various surveillance and automotive safety applications. This paper outlines a comprehensive methodology for training the YOLOv7 model using the labelImg tool to annotate the dataset with images showing vehicle occupants. It also discusses the challenges of detecting seat belts and evaluates the system’s performance on a real-world dataset. The evaluation focuses on distinguishing the status of a safety belt between two classes: “buckled” and “unbuckled”. The results demonstrate a high level of accuracy, with a mean average precision (mAP) of 99.6% and an F1 score of 98%, indicating the system’s effectiveness in identifying the safety belt status.

22 pages, 2661 KiB  
Article
Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network
by Hayat Ullah and Arslan Munir
Algorithms 2023, 16(8), 369; https://doi.org/10.3390/a16080369 - 31 Jul 2023
Cited by 3 | Viewed by 1790
Abstract
The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human actions in video streams. While these algorithms have demonstrated impressive performance in activity recognition, they often exhibit a bias towards either model performance or computational efficiency. This biased trade-off between robustness and efficiency poses challenges when addressing complex human activity recognition problems. To address this issue, this paper presents a computationally efficient yet robust approach, exploiting saliency-aware spatial and temporal features for human action recognition in videos. To achieve effective representation of human actions, we propose an efficient approach called the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). Our proposed method utilizes a unified channel-spatial attention mechanism, allowing it to efficiently extract significant human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in capturing spatial receptive fields containing objects within the feature maps. To assess the effectiveness and robustness of our proposed method, we have conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results obtained validate the efficiency of our method, showcasing significant improvements in accuracy of up to 11% as compared to state-of-the-art human action recognition methods. Additionally, our evaluation of inference time reveals that the proposed method achieves up to a 74× improvement in frames per second (FPS) compared to existing approaches, thus showing the suitability and effectiveness of the proposed DA-R3DCNN for real-time human activity recognition.
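Editor's note: the combination the abstract describes, residual 3D convolutions gated by channel-spatial attention, can be sketched in PyTorch. The layer sizes and attention design below are generic assumptions, not the published DA-R3DCNN architecture.

```python
import torch
import torch.nn as nn

class Residual3DAttnBlock(nn.Module):
    """Residual 3D conv block gated by channel and spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels))
        self.channel_gate = nn.Sequential(      # squeeze-and-excite style
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(      # where in (T, H, W) to attend
            nn.Conv3d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, T, H, W)
        y = self.conv(x)
        y = y * self.channel_gate(y) * self.spatial_gate(y)
        return torch.relu(x + y)                # residual connection
```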
