Algorithms for Image Processing and Machine Vision

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Multidisciplinary Applications".

Deadline for manuscript submissions: closed (31 January 2025) | Viewed by 31423

Special Issue Editor


Dr. Arslan Munir
Guest Editor
Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
Interests: artificial intelligence; computer vision; parallel computing; embedded systems; secure and trustworthy systems

Special Issue Information

Dear Colleagues,

Modern image processing transforms an image into a digital form and uses computing systems to process, manipulate, and/or enhance the digital image through various algorithms. Image processing is also a prerequisite for many machine vision tasks, as it preprocesses images and prepares data in a form suitable for machine vision models. Machine vision generally refers to techniques and algorithms that enable computers/machines to understand and make sense of images: it allows machines to extract latent information from visual data and to mimic the human perception of sight with computational algorithms. Active research is ongoing into novel image processing and machine vision algorithms, including deep learning-based algorithms, that enable new and fascinating applications.

This Special Issue targets algorithms for image processing and machine vision. It invites original research articles and reviews relating to the computing, architectures, algorithms, security, and applications of image processing and machine vision. Topics of interest include, but are not limited to, the following:

  • Image interpretation;
  • Object detection and recognition;
  • Spatial artificial intelligence;
  • Event detection and activity recognition;
  • Image segmentation;
  • Video classification and analysis;
  • Face and gesture recognition;
  • Pose estimation;
  • Computational photography;
  • Image security;
  • Vision hardware and/or software architectures;
  • Image/vision acceleration techniques;
  • Monitoring and surveillance;
  • Situational awareness.

Dr. Arslan Munir
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, use the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • machine vision
  • image fusion
  • vision algorithms
  • deep learning
  • stereo vision
  • activity recognition
  • image/video analysis
  • image encryption algorithms
  • computational photography
  • vision hardware/software
  • monitoring and surveillance
  • biometrics
  • robotics
  • augmented reality

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (17 papers)


Research

28 pages, 9307 KiB  
Article
Application Framework and Optimal Features for UAV-Based Earthquake-Induced Structural Displacement Monitoring
by Ruipu Ji, Shokrullah Sorosh, Eric Lo, Tanner J. Norton, John W. Driscoll, Falko Kuester, Andre R. Barbosa, Barbara G. Simpson and Tara C. Hutchinson
Algorithms 2025, 18(2), 66; https://doi.org/10.3390/a18020066 - 26 Jan 2025
Viewed by 471
Abstract
Unmanned aerial vehicle (UAV) vision-based sensing has become an emerging technology for structural health monitoring (SHM) and post-disaster damage assessment of civil infrastructure. This article proposes a framework for monitoring structural displacement under earthquakes by reprojecting image points obtained from UAV-captured videos to the 3-D world space based on the world-to-image point correspondences. To identify optimal features in the UAV imagery, geo-reference targets with various patterns were installed on a test building specimen, which was then subjected to earthquake shaking. A feature point tracking-based algorithm for square checkerboard patterns and a Hough Transform-based algorithm for concentric circular patterns are developed to ensure reliable detection and tracking of image features. Photogrammetry techniques are applied to reconstruct the 3-D world points and extract structural displacements. The proposed methodology is validated by monitoring the displacements of a full-scale 6-story mass timber building during a series of shake table tests. Reasonable accuracy is achieved in that the overall root-mean-square errors of the tracking results are at the millimeter level compared to ground truth measurements from analog sensors. Insights into optimal features for monitoring structural dynamic response are discussed based on statistical analysis of the error characteristics for the various reference target patterns used to track the structural displacements.
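Editor's note: the Hough Transform step for circular targets can be illustrated with OpenCV. The sketch below is ours, not the authors' implementation; the blur size, accumulator parameters, and center-grouping tolerance are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_circular_targets(frame_gray, min_r=5, max_r=60):
    """Detect candidate circular reference targets with the Hough transform.

    A minimal sketch of the idea described in the paper; the authors'
    parameters and concentric-ring grouping logic are not reproduced here.
    """
    blurred = cv2.medianBlur(frame_gray, 5)  # suppress sensor noise
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
        param1=100, param2=30, minRadius=min_r, maxRadius=max_r)
    if circles is None:
        return []
    circles = np.round(circles[0]).astype(int)  # (x, y, r) triples
    # Concentric rings share a center: keep one point per near-coincident center.
    centers = []
    for x, y, r in circles:
        if not any(abs(x - cx) < 5 and abs(y - cy) < 5 for cx, cy in centers):
            centers.append((x, y))
    return centers
```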

23 pages, 205579 KiB  
Article
DDL R-CNN: Dynamic Direction Learning R-CNN for Rotated Object Detection
by Weixian Su and Donglin Jing
Algorithms 2025, 18(1), 21; https://doi.org/10.3390/a18010021 - 4 Jan 2025
Viewed by 743
Abstract
Current remote sensing (RS) detectors often rely on predefined anchor boxes with fixed angles to handle the multi-directional variations of targets. This approach makes it challenging to accurately select regions of interest and extract features that align with the direction of the targets. Most existing regression methods also adopt angle regression to match the attributes of remote sensing detectors. Due to the inconsistent regression direction and massive anchor boxes with a high aspect ratio, the extracted target features change greatly, the loss function changes drastically, and the training is unstable. However, existing RS detectors and regression techniques have not been able to effectively balance the precision of directional feature extraction with the complexity of the models. To address these challenges, this paper introduces a novel approach known as Dynamic Direction Learning R-CNN (DDL R-CNN), which comprises a dynamic direction learning (DDL) module and a boundary center region offset generation network (BC-ROPN). The DDL module pre-extracts the directional features of targets to provide a coarse estimation of their angles and the corresponding weights. This information is used to generate rotationally aligned anchor boxes that better model the directional features of the targets. BC-ROPN represents an innovative method for anchor box regression. It utilizes the central features of the maximum bounding rectangle’s width and height, along with the coarse angle estimation and weights derived from the DDL module, to refine the orientation of the anchor box. Our method has been proven to surpass existing rotating detection networks in extensive testing across two widely used remote sensing detection datasets, namely UCAS-AOD and HRSC2016.
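Editor's note: the geometry behind rotation-aligned anchors is easy to make concrete. The sketch below is a generic illustration under our own parameter choices, not DDL R-CNN code: it builds the corner points of an anchor box from a center, a size, and a coarse angle estimate.

```python
import numpy as np

def rotated_anchor_corners(cx, cy, w, h, theta):
    """Corners of an anchor box rotated by a coarse angle estimate (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # 2-D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])     # (4, 2) corner coordinates

# e.g. a coarse estimate of 30 degrees for a 100x20 high-aspect-ratio target
corners = rotated_anchor_corners(cx=64, cy=64, w=100, h=20, theta=np.deg2rad(30))
```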

21 pages, 5152 KiB  
Article
GAGAN: Enhancing Image Generation Through Hybrid Optimization of Genetic Algorithms and Deep Convolutional Generative Adversarial Networks
by Despoina Konstantopoulou, Paraskevi Zacharia, Michail Papoutsidakis, Helen C. Leligou and Charalampos Patrikakis
Algorithms 2024, 17(12), 584; https://doi.org/10.3390/a17120584 - 19 Dec 2024
Viewed by 1136
Abstract
Generative Adversarial Networks (GANs) are highly effective for generating realistic images, yet their training can be unstable due to challenges such as mode collapse and oscillatory convergence. In this paper, we propose a novel hybrid optimization method that integrates Genetic Algorithms (GAs) to improve the training process of Deep Convolutional GANs (DCGANs). Specifically, GAs are used to evolve the discriminator’s weights, complementing the gradient-based learning typically employed in GANs. The proposed GAGAN model is trained on the CelebA dataset, using 2000 images, to generate 128 × 128 images, with the generator learning to produce realistic faces from random latent vectors. The discriminator, which classifies images as real or fake, is optimized not only through standard backpropagation, but also through a GA framework that evolves its weights via crossover, mutation, and selection processes. This hybrid method aims to enhance convergence stability and boost image quality by balancing local search from gradient-based methods with the global search capabilities of GAs. Experiments show that the proposed approach reduces generator loss and improves image fidelity, demonstrating that evolutionary algorithms can effectively complement deep learning techniques. This work opens new avenues for optimizing GAN training and enhancing performance in generative models.
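Editor's note: the evolutionary loop the paper describes (selection, crossover, and mutation over discriminator weights) can be sketched in a few lines of NumPy. Everything here, from truncation selection to the mutation rates, is an illustrative assumption rather than the GAGAN configuration.

```python
import numpy as np

def evolve_population(weights, fitness, mut_rate=0.01, mut_scale=0.02, rng=None):
    """One GA generation over flattened discriminator weight vectors.

    weights: (P, D) array, one row per candidate; fitness: (P,) scores.
    """
    rng = rng or np.random.default_rng()
    P, D = weights.shape
    order = np.argsort(fitness)[::-1]
    parents = weights[order[: P // 2]]                  # truncation selection
    children = []
    while len(children) < P - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, D)                        # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        mask = rng.random(D) < mut_rate                 # sparse Gaussian mutation
        child[mask] += rng.normal(0.0, mut_scale, mask.sum())
        children.append(child)
    return np.vstack([parents, children])
```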

20 pages, 2003 KiB  
Article
Enhanced Curvature-Based Fabric Defect Detection: An Experimental Study with Gabor Transform and Deep Learning
by Mehmet Erdogan and Mustafa Dogan
Algorithms 2024, 17(11), 506; https://doi.org/10.3390/a17110506 - 5 Nov 2024
Viewed by 841
Abstract
Quality control at every stage of production in the textile industry is essential for maintaining competitiveness in the global market. Manual fabric defect inspections are often characterized by low precision and high time costs, in contrast to intelligent anomaly detection systems implemented in the early stages of fabric production. To achieve successful automated fabric defect identification, significant challenges must be addressed, including accurate detection, classification, and decision-making processes. Traditionally, fabric defect classification has relied on inefficient and labor-intensive human visual inspection, particularly as the variety of fabric defects continues to increase. Despite the global chip crisis and its adverse effects on supply chains, electronic hardware costs for quality control systems have become more affordable. This presents a notable advantage, as vision systems can now be easily developed with the use of high-resolution, advanced cameras. In this study, we propose a discrete curvature algorithm, integrated with the Gabor transform, which demonstrates significant success in near real-time defect classification. The primary contribution of this work is the development of a modified curvature algorithm that achieves high classification performance without the need for training. This method is particularly efficient due to its low data storage requirements and minimal processing time, making it ideal for real-time applications. Furthermore, we implemented and evaluated several other methods from the literature, including Gabor and Convolutional Neural Networks (CNNs), within a unified coding framework. Each defect type was analyzed individually, with results indicating that the proposed algorithm exhibits comparable success and robust performance relative to deep learning-based approaches.
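Editor's note: Gabor filtering, the preprocessing the paper builds on, is available directly in OpenCV. The sketch below is illustrative; the kernel size, wavelength, and orientations are assumptions, not the paper's tuned values.

```python
import cv2
import numpy as np

def gabor_response(img_gray, orientations=(0, 45, 90, 135)):
    """Max response over a small Gabor filter bank, as a defect-salience map.

    The periodic weave yields a flat response; defects break the pattern.
    """
    responses = []
    for angle in orientations:
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0,
                                  theta=np.deg2rad(angle),
                                  lambd=10.0, gamma=0.5, psi=0)
        kern /= np.abs(kern).sum() + 1e-8          # normalize filter energy
        responses.append(cv2.filter2D(img_gray, cv2.CV_32F, kern))
    return np.max(np.stack(responses), axis=0)
```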

18 pages, 5670 KiB  
Article
Improved U2Net-Based Surface Defect Detection Method for Blister Tablets
by Jianmin Zhou, Jian Huang, Jikang Liu and Jingbo Liu
Algorithms 2024, 17(10), 429; https://doi.org/10.3390/a17100429 - 26 Sep 2024
Viewed by 829
Abstract
Aiming at the problem that the surface defects of blister tablets are difficult to detect correctly, this paper proposes a detection method based on the improved U2Net. First, the features extracted from the RSU module of U2Net are enhanced and adjusted using the large kernel attention mechanism, so that the U2Net model strengthens its ability to extract defective features. Second, a loss function combining the Gaussian Laplace operator and the cross-entropy function is designed to strengthen the model’s ability to detect edge defects on the surface of blister tablets. Finally, thresholds are adaptively determined using the local mean and the OTSU method (an adaptive threshold segmentation method) to improve accuracy. The experimental results show that the method proposed in this paper can reach an average accuracy of 99% and an average precision rate of 96.3%; the model test takes only 50 ms per image, which can meet rapid detection requirements. Minor surface defects can also be accurately detected, which is better than other algorithmic models of the same type, proving the effectiveness of this method.
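Editor's note: both thresholding ingredients the abstract names, the local mean and Otsu's method, exist in OpenCV, so the final step can be sketched as follows. The block size, offset, and the AND-combination rule are our assumptions, not the authors' exact scheme.

```python
import cv2

def adaptive_binarize(defect_map_u8):
    """Combine a global Otsu threshold with a local-mean adaptive threshold."""
    _, otsu = cv2.threshold(defect_map_u8, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    local = cv2.adaptiveThreshold(defect_map_u8, 255,
                                  cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, blockSize=31, C=-5)
    return cv2.bitwise_and(otsu, local)   # keep pixels both methods flag
```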

19 pages, 5132 KiB  
Article
Synthetic Face Discrimination via Learned Image Compression
by Sofia Iliopoulou, Panagiotis Tsinganos, Dimitris Ampeliotis and Athanassios Skodras
Algorithms 2024, 17(9), 375; https://doi.org/10.3390/a17090375 - 23 Aug 2024
Viewed by 923
Abstract
The emergence of deep learning has sparked notable strides in the quality of synthetic media. Yet, as photorealism reaches new heights, the line between generated and authentic images blurs, raising concerns about the dissemination of counterfeit or manipulated content online. Consequently, there is a pressing need to develop automated tools capable of effectively distinguishing synthetic images, especially those portraying faces, which is one of the most commonly encountered issues. In this work, we propose a novel approach to synthetic face discrimination, leveraging deep learning-based image compression and predominantly utilizing the quality metrics of an image to determine its authenticity.
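Editor's note: to make the feature idea concrete, the sketch below computes quality metrics after lossy round-trips. Note the substitution: it uses plain JPEG rather than the learned compression model the paper employs, so it illustrates the principle only.

```python
import cv2
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compression_features(img_bgr, qualities=(30, 60, 90)):
    """Quality metrics of an image after lossy round-trips, as classifier input.

    Synthetic and real faces tend to degrade differently under compression.
    """
    feats = []
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    for q in qualities:
        ok, buf = cv2.imencode('.jpg', img_bgr, [cv2.IMWRITE_JPEG_QUALITY, q])
        rec = cv2.cvtColor(cv2.imdecode(buf, cv2.IMREAD_COLOR),
                           cv2.COLOR_BGR2GRAY)
        feats.append(peak_signal_noise_ratio(gray, rec))
        feats.append(structural_similarity(gray, rec))
    return np.array(feats)
```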

19 pages, 7973 KiB  
Article
Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation
by Alexander Khanov, Anastasija Shulzhenko, Anzhelika Voroshilova, Alexander Zubarev, Timur Karimov and Shakeeb Fahmi
Algorithms 2024, 17(8), 366; https://doi.org/10.3390/a17080366 - 21 Aug 2024
Viewed by 783
Abstract
The discrete cosine transform (DCT) is widely used for image and video compression. Lossy algorithms such as JPEG, WebP, BPG and many others are based on it. Multiple modifications of DCT have been developed to improve its performance. One of them is adaptive DCT (ADCT), designed to deal with heterogeneous image structure; it may be found, for example, in the HEVC video codec. Adaptivity means that the image is divided into an uneven grid of squares: smaller ones retain information about details better, while larger squares are efficient for homogeneous backgrounds. The practical use of adaptive DCT algorithms is complicated by the lack of optimal threshold search algorithms for image partitioning procedures. In this paper, we propose a novel method for optimal threshold search in ADCT using a metric based on tonal distribution. We define two thresholds: pm, the threshold defining solid mean coloring, and ps, defining the quadtree fragment splitting. In our algorithm, the values of these thresholds are calculated via polynomial functions of the tonal distribution of a particular image or fragment. The polynomial coefficients are determined using the dedicated optimization procedure on the dataset containing images from the specific domain, urban road scenes in our case. In the experimental part of the study, we show that ADCT allows a higher compression ratio compared to non-adaptive DCT at the same level of quality loss, up to 66% for acceptable quality. The proposed algorithm may be used directly for image compression, or as a core of video compression framework in traffic-demanding applications, such as urban video surveillance systems.
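Editor's note: the quadtree partition driven by two thresholds can be sketched recursively. The homogeneity measure (block standard deviation) and the fixed thresholds below are simplifying assumptions; the paper computes pm and ps from polynomial fits of the tonal distribution, and the per-leaf DCT coding is omitted here.

```python
import numpy as np

def quadtree_split(block, ps, pm, out, y=0, x=0, min_size=8):
    """Recursive quadtree partition: pm selects solid mean coloring,
    ps triggers splitting, echoing the paper's two thresholds."""
    h, w = block.shape
    spread = block.std()
    if spread < pm:                              # homogeneous: flat mean color
        out[y:y + h, x:x + w] = block.mean()
    elif spread > ps and h > min_size:           # heterogeneous: recurse
        hh, hw = h // 2, w // 2
        quadtree_split(block[:hh, :hw], ps, pm, out, y, x, min_size)
        quadtree_split(block[:hh, hw:], ps, pm, out, y, x + hw, min_size)
        quadtree_split(block[hh:, :hw], ps, pm, out, y + hh, x, min_size)
        quadtree_split(block[hh:, hw:], ps, pm, out, y + hh, x + hw, min_size)
    else:                                        # leaf square: DCT-code this
        out[y:y + h, x:x + w] = block            # fragment (coding step omitted)

# usage: out = np.empty_like(img, dtype=float)
#        quadtree_split(img.astype(float), ps=30.0, pm=6.0, out=out)
```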

13 pages, 14978 KiB  
Article
Lester: Rotoscope Animation through Video Object Segmentation and Tracking
by Ruben Tous
Algorithms 2024, 17(8), 330; https://doi.org/10.3390/a17080330 - 30 Jul 2024
Viewed by 1299
Abstract
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks’ contours is simplified with the Douglas–Peucker algorithm. Finally, facial traits, pixelation and a basic rim light effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a simpler and more deterministic approach than diffusion-model-based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also more feasible than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
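Editor's note: the contour-simplification step maps directly onto OpenCV's implementation of Douglas–Peucker (approxPolyDP). The tolerance below, a fraction of each contour's perimeter, is an illustrative assumption.

```python
import cv2

def simplified_contours(mask_u8, epsilon_frac=0.01):
    """Simplify mask contours with the Douglas–Peucker algorithm.

    Fewer vertices give the flat, schematic rotoscope look.
    """
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    simplified = []
    for cnt in contours:
        eps = epsilon_frac * cv2.arcLength(cnt, True)   # scale tolerance to size
        simplified.append(cv2.approxPolyDP(cnt, eps, True))
    return simplified
```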

13 pages, 2390 KiB  
Article
Continuous Recognition of Teachers’ Hand Signals for Students with Attention Deficits
by Ivane Delos Santos Chen, Chieh-Ming Yang, Shang-Shu Wu, Chih-Kang Yang, Mei-Juan Chen, Chia-Hung Yeh and Yuan-Hong Lin
Algorithms 2024, 17(7), 300; https://doi.org/10.3390/a17070300 - 7 Jul 2024
Viewed by 1077
Abstract
In the era of inclusive education, students with attention deficits are integrated into the general classroom. To ensure a seamless transition of students’ focus towards the teacher’s instruction throughout the course and to align with the teaching pace, this paper proposes a continuous recognition algorithm for capturing teachers’ dynamic gesture signals. This algorithm aims to offer instructional attention cues for students with attention deficits. Based on the body landmarks of the teacher’s skeleton obtained using the vision and machine learning-based MediaPipe BlazePose, the proposed method uses simple rules to detect the teacher’s hand signals dynamically and provides three kinds of attention cues (Pointing to left, Pointing to right, and Non-pointing) during the class. Experimental results show that the average accuracy, sensitivity, specificity, precision, and F1 score achieved 88.31%, 91.03%, 93.99%, 86.32%, and 88.03%, respectively. By analyzing non-verbal behavior, our method performs competently, can replace verbal reminders from the teacher, and can be helpful for students with attention deficits in inclusive education.
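Editor's note: a rule of the kind the paper describes can be written against MediaPipe BlazePose's landmark indices. The sketch below is a simplified stand-in: the 0.15 margin and the absence of temporal smoothing are our assumptions, not the published rules.

```python
import mediapipe as mp

L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 11, 12, 15, 16  # BlazePose indices

def classify_hand_signal(rgb_frame, pose):
    """Rule-based pointing-direction cue from BlazePose landmarks.

    Image x runs left-to-right, so the teacher's left appears mirrored.
    """
    result = pose.process(rgb_frame)
    if not result.pose_landmarks:
        return "Non-pointing"
    lm = result.pose_landmarks.landmark
    if lm[R_WRIST].x < lm[R_SHOULDER].x - 0.15:   # arm extended past threshold
        return "Pointing to left"
    if lm[L_WRIST].x > lm[L_SHOULDER].x + 0.15:
        return "Pointing to right"
    return "Non-pointing"

# pose = mp.solutions.pose.Pose(static_image_mode=False)
# label = classify_hand_signal(frame_rgb, pose)
```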

31 pages, 14424 KiB  
Article
Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets
by Mohamed H. Habeb, May Salama and Lamiaa A. Elrefaei
Algorithms 2024, 17(7), 286; https://doi.org/10.3390/a17070286 - 1 Jul 2024
Cited by 2 | Viewed by 2056
Abstract
This work introduces an unsupervised framework for video anomaly detection, leveraging a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with due to their localized field of view. We have utilized a pre-trained ViT as an encoder for feature extraction, which is then processed by the STR attention block to enhance the detection of spatiotemporal relationships among objects in videos. The novelty of this work is utilizing the ViT with the STR attention to detect video anomalies effectively in large and heterogeneous datasets, which is important given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets, i.e., UCSD-Ped2, CUHK Avenue, and ShanghaiTech, achieving area under the receiver operating characteristic curve (AUC ROC) values of 95.6, 86.8, and 82.1, respectively; this demonstrates the model’s superior performance in detecting anomalies compared to state-of-the-art methods, showcasing its potential to significantly enhance automated video surveillance systems. To show the effectiveness of the proposed framework in detecting anomalies in extra-large datasets, we trained the model on a subset of the huge contemporary CHAD dataset that contains over 1 million frames, achieving AUC ROC values of 71.8 and 64.2 for CHAD-Cam 1 and CHAD-Cam 2, respectively, which outperforms the state-of-the-art techniques.
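Editor's note: a generic channel-plus-spatial attention block (CBAM-style) conveys the flavor of re-weighting frame features, though it is not the paper's STR block; the layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention over per-frame feature maps (B, C, H, W)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_mlp(x)            # re-weight channels
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)        # re-weight spatial positions
```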

19 pages, 1087 KiB  
Article
Simple Histogram Equalization Technique Improves Performance of VGG Models on Facial Emotion Recognition Datasets
by Jaher Hassan Chowdhury, Qian Liu and Sheela Ramanna
Algorithms 2024, 17(6), 238; https://doi.org/10.3390/a17060238 - 3 Jun 2024
Cited by 4 | Viewed by 2440
Abstract
Facial emotion recognition (FER) is crucial across psychology, neuroscience, computer vision, and machine learning due to the diversified and subjective nature of emotions, varying considerably across individuals, cultures, and contexts. This study explored FER through convolutional neural networks (CNNs) and Histogram Equalization techniques. It investigated the impact of histogram equalization, data augmentation, and various model optimization strategies on FER accuracy across different datasets like KDEF, CK+, and FER2013. Using pre-trained VGG architectures, such as VGG19 and VGG16, this study also examined the effectiveness of fine-tuning hyperparameters and implementing different learning rate schedulers. The evaluation encompassed diverse metrics including accuracy, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), Area Under the Precision–Recall Curve (AUC-PRC), and Weighted F1 score. Notably, the fine-tuned VGG architecture demonstrated a state-of-the-art performance compared to conventional transfer learning models and achieved 100%, 95.92%, and 69.65% on the CK+, KDEF, and FER2013 datasets, respectively.
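Editor's note: the equalization step itself is a one-liner in OpenCV; the sketch also shows the common contrast-limited (CLAHE) variant for comparison. Whether the authors applied the global or a tile-based variant per dataset is not restated here.

```python
import cv2

def equalize_face(img_gray):
    """Global histogram equalization as FER preprocessing for a VGG backbone."""
    return cv2.equalizeHist(img_gray)   # spreads intensities over the full range

# A contrast-limited, tile-based variant (CLAHE) is a common alternative:
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
# equalized = clahe.apply(img_gray)
```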

16 pages, 2857 KiB  
Article
Enforcing Traffic Safety: A Deep Learning Approach for Detecting Motorcyclists’ Helmet Violations Using YOLOv8 and Deep Convolutional Generative Adversarial Network-Generated Images
by Maged Shoman, Tarek Ghoul, Gabriel Lanzaro, Tala Alsharif, Suliman Gargoum and Tarek Sayed
Algorithms 2024, 17(5), 202; https://doi.org/10.3390/a17050202 - 10 May 2024
Viewed by 2280
Abstract
In this study, we introduce an innovative methodology for the detection of helmet usage violations among motorcyclists, integrating the YOLOv8 object detection algorithm with deep convolutional generative adversarial networks (DCGANs). The objective of this research is to enhance the precision of existing helmet violation detection techniques, which are typically reliant on manual inspection and susceptible to inaccuracies. The proposed methodology involves model training on an extensive dataset comprising both authentic and synthetic images, and demonstrates high accuracy in identifying helmet violations, including scenarios with multiple riders. Data augmentation, in conjunction with synthetic images produced by DCGANs, is utilized to expand the training data volume, particularly focusing on imbalanced classes, thereby facilitating superior model generalization to real-world circumstances. The stand-alone YOLOv8 model exhibited an F1 score of 0.91 for all classes at a confidence level of 0.617, whereas the DCGANs + YOLOv8 model demonstrated an F1 score of 0.96 for all classes at a reduced confidence level of 0.334. These findings highlight the potential of DCGANs in enhancing the accuracy of helmet rule violation detection, thus fostering safer motorcycling practices.
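Editor's note: the detection stage can be sketched with the Ultralytics YOLOv8 API. The dataset YAML name, image path, and training budget below are placeholders; the paper's contribution, mixing DCGAN-synthesized images into the training set, would happen when assembling that dataset.

```python
from ultralytics import YOLO

# "helmet.yaml" is a hypothetical dataset config listing both real frames
# and DCGAN-synthesized images; the file paths here are illustrative.
model = YOLO("yolov8n.pt")                       # pretrained checkpoint
model.train(data="helmet.yaml", epochs=100, imgsz=640)

results = model("motorbike_frame.jpg")           # inference on one frame
for box in results[0].boxes:                     # one box per detected rider/helmet
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```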

16 pages, 808 KiB  
Article
GDUI: Guided Diffusion Model for Unlabeled Images
by Xuanyuan Xie and Jieyu Zhao
Algorithms 2024, 17(3), 125; https://doi.org/10.3390/a17030125 - 18 Mar 2024
Viewed by 2344
Abstract
The diffusion model has made progress in the field of image synthesis, especially in the area of conditional image synthesis. However, this improvement is highly dependent on large annotated datasets. To tackle this challenge, we present the Guided Diffusion model for Unlabeled Images (GDUI) framework in this article. It utilizes the inherent feature similarity and semantic differences in the data, as well as the downstream transferability of Contrastive Language-Image Pretraining (CLIP), to guide the diffusion model in generating high-quality images. We design two semantic-aware algorithms, namely, the pseudo-label-matching algorithm and label-matching refinement algorithm, to match the clustering results with the true semantic information and provide more accurate guidance for the diffusion model. First, GDUI encodes the image into a semantically meaningful latent vector through clustering. Then, pseudo-label matching is used to complete the matching of the true semantic information of the image. Finally, the label-matching refinement algorithm is used to adjust the irrelevant semantic information in the data, thereby improving the quality of the guided diffusion model image generation. Our experiments on labeled datasets show that GDUI outperforms diffusion models without any guidance and significantly reduces the gap between it and models guided by ground-truth labels.
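Editor's note: the clustering step GDUI builds on, pseudo-labels from image embeddings, can be sketched with a pretrained CLIP model and k-means. The checkpoint and cluster count are assumptions, and the paper's label-matching refinement is not reproduced.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from sklearn.cluster import KMeans

def pseudo_labels(images, n_clusters=10):
    """Cluster CLIP image embeddings into pseudo-labels for guidance.

    images: a list of PIL images; returns one integer cluster id per image.
    """
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    with torch.no_grad():
        inputs = proc(images=images, return_tensors="pt")
        emb = model.get_image_features(**inputs)           # (N, 512)
        emb = emb / emb.norm(dim=-1, keepdim=True)         # cosine geometry
    return KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(emb.numpy())
```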

17 pages, 25202 KiB  
Article
Denoising Diffusion Models on Model-Based Latent Space
by Carmelo Scribano, Danilo Pezzi, Giorgia Franchini and Marco Prato
Algorithms 2023, 16(11), 501; https://doi.org/10.3390/a16110501 - 28 Oct 2023
Viewed by 3021
Abstract
With the recent advancements in the field of diffusion generative models, it has been shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. This approach, by abstracting away imperceptible image details and introducing substantial spatial compression, renders the learning of the generative process more manageable while significantly reducing computational and memory demands. In this work, we propose to replace autoencoder coding with a model-based coding scheme based on traditional lossy image compression techniques; this choice not only further diminishes computational expenses but also allows us to probe the boundaries of latent-space image generation. Our objectives culminate in the proposal of a valuable approximation for training continuous diffusion models within a discrete space, accompanied by enhancements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work holds promise for enhancing the adaptability of generative diffusion models across diverse data types beyond the realm of imagery.
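Editor's note: as a toy stand-in for "model-based coding", the sketch below quantizes a blockwise DCT, JPEG-style, to produce discrete latent symbols. The paper's actual codec and quantization design are not reproduced here.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_latent(img_block8, q=16):
    """Quantized 8x8 DCT coefficients as simple discrete latent symbols."""
    coeffs = dctn(img_block8.astype(float), norm="ortho")
    return np.round(coeffs / q).astype(int)     # discrete, compressible code

def dct_decode(symbols, q=16):
    """Approximate reconstruction of the block from its latent symbols."""
    return idctn(symbols * float(q), norm="ortho")
```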

21 pages, 4300 KiB  
Article
Indoor Scene Recognition: An Attention-Based Approach Using Feature Selection-Based Transfer Learning and Deep Liquid State Machine
by Ranjini Surendran, Ines Chihi, J. Anitha and D. Jude Hemanth
Algorithms 2023, 16(9), 430; https://doi.org/10.3390/a16090430 - 8 Sep 2023
Cited by 2 | Viewed by 2637
Abstract
Scene understanding is one of the most challenging areas of research in the fields of robotics and computer vision. Recognising indoor scenes is one of the research applications in the category of scene understanding that has gained attention in recent years. Recent developments in deep learning and transfer learning approaches have attracted considerable attention in addressing this challenging area. In our work, we have proposed a fine-tuned deep transfer learning approach using DenseNet201 for feature extraction and a deep Liquid State Machine model as the classifier in order to develop a model for recognising and understanding indoor scenes. We have included fuzzy colour stacking techniques, colour-based segmentation, and an adaptive World Cup optimisation algorithm to improve the performance of our deep model. Our proposed model is intended to assist the visually impaired and blind in navigating indoor environments and integrating fully into their day-to-day activities. Our proposed work was implemented on the NYU depth dataset and attained an accuracy of 96% for classifying indoor scenes.
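Editor's note: the transfer-learning stage, DenseNet201 as a frozen feature extractor, can be sketched with torchvision. The Liquid State Machine classifier, fuzzy colour stacking, and World Cup optimisation from the paper are not shown.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Frozen DenseNet201 backbone producing per-image feature descriptors.
backbone = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
extractor = nn.Sequential(backbone.features,          # conv feature maps
                          nn.ReLU(inplace=True),
                          nn.AdaptiveAvgPool2d(1),
                          nn.Flatten())               # -> (B, 1920) descriptors
extractor.eval()
with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))    # stand-in input batch
```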

22 pages, 7154 KiB  
Article
A Comprehensive Analysis of Real-Time Car Safety Belt Detection Using the YOLOv7 Algorithm
by Lwando Nkuzo, Malusi Sibiya and Elisha Didam Markus
Algorithms 2023, 16(9), 400; https://doi.org/10.3390/a16090400 - 23 Aug 2023
Cited by 2 | Viewed by 4240
Abstract
Using a safety belt is crucial for preventing severe injuries and fatalities during vehicle accidents. In this paper, we propose a real-time vehicle occupant safety belt detection system based on the YOLOv7 (You Only Look Once version seven) object detection algorithm. The proposed approach aims to automatically detect whether the occupants of a vehicle have buckled their safety belts or not as soon as they are detected within the vehicle. A dataset for this purpose was collected and annotated for validation and testing. By leveraging the efficiency and accuracy of YOLOv7, we achieve near-instantaneous analysis of video streams, making our system suitable for deployment in various surveillance and automotive safety applications. This paper outlines a comprehensive methodology for training the YOLOv7 model using the labelImg tool to annotate the dataset with images showing vehicle occupants. It also discusses the challenges of detecting seat belts and evaluates the system’s performance on a real-world dataset. The evaluation focuses on distinguishing the status of a safety belt between two classes: “buckled” and “unbuckled”. The results demonstrate a high level of accuracy, with a mean average precision (mAP) of 99.6% and an F1 score of 98%, indicating the system’s effectiveness in identifying the safety belt status.

22 pages, 2661 KiB  
Article
Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network
by Hayat Ullah and Arslan Munir
Algorithms 2023, 16(8), 369; https://doi.org/10.3390/a16080369 - 31 Jul 2023
Cited by 3 | Viewed by 1790
Abstract
The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human actions in video streams. While these algorithms have demonstrated impressive performance in activity recognition, they often exhibit a bias towards either model performance or computational efficiency. This biased trade-off between robustness and efficiency poses challenges when addressing complex human activity recognition problems. To address this issue, this paper presents a computationally efficient yet robust approach, exploiting saliency-aware spatial and temporal features for human action recognition in videos. To achieve effective representation of human actions, we propose an efficient approach called the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). Our proposed method utilizes a unified channel-spatial attention mechanism, allowing it to efficiently extract significant human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in capturing spatial receptive fields containing objects within the feature maps. To assess the effectiveness and robustness of our proposed method, we have conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results obtained validate the efficiency of our method, showcasing significant improvements in accuracy of up to 11% as compared to state-of-the-art human action recognition methods. Additionally, our evaluation of inference time reveals that the proposed method achieves up to a 74× improvement in frames per second (FPS) compared to existing approaches, thus showing the suitability and effectiveness of the proposed DA-R3DCNN for real-time human activity recognition.
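Editor's note: the combination the abstract describes, residual 3D convolutions gated by channel-spatial attention, can be sketched in PyTorch. The layer sizes and attention design below are generic assumptions, not the published DA-R3DCNN architecture.

```python
import torch
import torch.nn as nn

class Residual3DAttnBlock(nn.Module):
    """Residual 3D conv block gated by channel and spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels))
        self.channel_gate = nn.Sequential(      # squeeze-and-excite style
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(      # where in (T, H, W) to attend
            nn.Conv3d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, T, H, W)
        y = self.conv(x)
        y = y * self.channel_gate(y) * self.spatial_gate(y)
        return torch.relu(x + y)                # residual connection
```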
