Deep Learning for Computer Vision Application

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 February 2025 | Viewed by 14745

Special Issue Editor


Dr. Hamed Mozaffari
Guest Editor
Research Officer (AI/ML Expert), Construction Research Centre, National Research Council Canada, Ottawa, ON K1A 0R6, Canada
Interests: computer vision; image processing; artificial intelligence; deep learning; medical imaging; thermal imaging; spectroscopy; virtual reality; data analytics and risk assessment; electronics/embedded systems

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) methods, and more specifically deep neural networks (also called deep learning models), have become the core technique for computer vision tasks across various applications. These powerful deep learning models enable state-of-the-art automation in pattern recognition from image data. Their reach is evident in daily life, from the automatic sorting and retrieval of photos in Google Photos to autonomous cars. However, these powerful techniques have not yet been applied to all computer vision tasks. Future studies should seek further applications of AI in our lives, supported by data acquisition and cleaning as well as further model optimization, innovation, and research. In this Special Issue, we are particularly interested in new applications of deep learning in the computer vision field.

Topics of interest include but are not limited to:

  • Image classification using deep learning;
  • Object detection using deep learning;
  • Semantic and instance segmentation using deep learning;
  • Deep learning techniques for generating new images (generative adversarial networks);
  • Employing reinforcement learning for computer vision tasks;
  • Application of deep learning in the Internet of Things (IoT);
  • Application of deep learning in embedded systems, sensor development, and electronics;
  • Computer vision tasks using deep learning (medical image processing, remote sensing, hyperspectral imaging, thermal imaging, space and extraterrestrial observations);
  • Image sequence analysis using deep learning;
  • Deep learning and computer vision for smart and green building, smart industry, and smart devices.

Dr. Hamed Mozaffari
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • convolutional neural network
  • deep learning
  • computer vision
  • artificial intelligence
  • image processing
  • medical image processing
  • internet of things
  • thermal imaging
  • image technologies
  • application of deep learning
  • autonomous vehicles
  • image classification
  • object detection
  • object segmentation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research


16 pages, 5772 KiB  
Article
Optimizing Football Formation Analysis via LSTM-Based Event Detection
by Benjamin Orr, Ephraim Pan and Dah-Jye Lee
Electronics 2024, 13(20), 4105; https://doi.org/10.3390/electronics13204105 - 18 Oct 2024
Viewed by 630
Abstract
The process of manually annotating sports footage is a demanding one. In American football alone, coaches spend thousands of hours reviewing and analyzing videos each season. We aim to automate this process by developing a system that generates comprehensive statistical reports from full-length football game videos. Having previously demonstrated the proof of concept for our system, here, we present optimizations to our preprocessing techniques along with an inventive method for multi-person event detection in sports videos. Employing a long short-term memory (LSTM)-based architecture to detect the snap in American football, we achieve an outstanding LSI (Levenshtein similarity index) of 0.9445, suggesting a normalized difference of less than 0.06 between predictions and ground truth labels. We also illustrate the utility of snap detection as a means of identifying the offensive players’ assuming of formation. Our results exhibit not only the success of our unique approach and underlying optimizations but also the potential for continued robustness as we pursue the development of our remaining system components.
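
The abstract does not spell out the LSI formula, but the reported numbers are consistent with a common definition: one minus the Levenshtein distance normalized by the longer sequence length (1 - 0.9445 ≈ 0.055, i.e., below 0.06). A minimal sketch under that assumption; the function names and the per-frame labeling scheme are ours, not the authors':

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two sequences.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def lsi(pred, truth):
    # Levenshtein similarity index: 1 - normalized edit distance (assumed form).
    if not pred and not truth:
        return 1.0
    return 1.0 - levenshtein(pred, truth) / max(len(pred), len(truth))

# Hypothetical per-frame event labels: 0 = no event, 1 = snap detected.
print(lsi([0, 0, 1, 0, 1], [0, 0, 1, 1, 1]))  # 0.8
```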

19 pages, 9439 KiB  
Article
MFAD-RTDETR: A Multi-Frequency Aggregate Diffusion Feature Flow Composite Model for Printed Circuit Board Defect Detection
by Zhihua Xie and Xiaowei Zou
Electronics 2024, 13(17), 3557; https://doi.org/10.3390/electronics13173557 - 7 Sep 2024
Viewed by 1176
Abstract
To address the challenges of excessive model parameters and low detection accuracy in printed circuit board (PCB) defect detection, this paper proposes a novel PCB defect detection model based on the improved RTDETR (Real-Time Detection, Embedding and Tracking) method, named MFAD-RTDETR. Specifically, the proposed model introduces the designed Detail Feature Retainer (DFR) into the original RTDETR backbone to capture and retain local details. Subsequently, based on the Mamba architecture, the Visual State Space (VSS) module is integrated to enhance global attention while reducing the original quadratic complexity to a linear level. Furthermore, by exploiting the deformable attention mechanism, which dynamically adjusts reference points, the model achieves precise localization of target defects and improves the accuracy of the transformer in complex visual tasks. Meanwhile, a receptive field synthesis mechanism is incorporated to enrich multi-scale semantic information and reduce parameter complexity. In addition, the scheme proposes a novel Multi-frequency Aggregation and Diffusion feature composite paradigm (MFAD-feature composite paradigm), which consists of the Aggregation Diffusion Fusion (ADF) module and the Refiner Feature Composition (RFC) module. It aims to strengthen features with fine-grained awareness while preserving a certain level of global attention. Finally, the Wise IoU (WIoU) dynamic nonmonotonic focusing mechanism is used to reduce competition among high-quality anchor boxes and mitigate the effects of the harmful gradients from low-quality examples, thereby concentrating on anchor boxes of average quality to promote the overall performance of the detector. Extensive experiments are conducted on the PCB defect dataset released by Peking University to validate the effectiveness of the proposed model. The experimental results show that our approach achieves 97.0% and 51.0% in mean Average Precision (mAP)@0.5 and mAP@0.5:0.95, respectively, which significantly outperforms the original RTDETR. Moreover, the model reduces the number of parameters by approximately 18.2% compared to the original RTDETR.
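
For context on the reported metrics: mAP@0.5 counts a detection as correct when its box overlaps a ground-truth box with an IoU of at least 0.5, while mAP@0.5:0.95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). A minimal sketch of those two building blocks; `ap_at` is a hypothetical stand-in for a full precision/recall sweep, not code from the paper:

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); intersection area over union area.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def map_50_95(ap_at):
    # ap_at(t): average precision at IoU threshold t (hypothetical stand-in);
    # mAP@0.5:0.95 averages it over the ten thresholds 0.50, 0.55, ..., 0.95.
    return float(np.mean([ap_at(t) for t in np.arange(0.50, 0.96, 0.05)]))

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 ≈ 0.333
```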

18 pages, 89225 KiB  
Article
Graph Attention Networks and Track Management for Multiple Object Tracking
by Yajuan Zhang, Yongquan Liang, Ahmed Elazab, Zhihui Wang and Changmiao Wang
Electronics 2023, 12(19), 4079; https://doi.org/10.3390/electronics12194079 - 28 Sep 2023
Cited by 1 | Viewed by 1889
Abstract
Multiple object tracking (MOT) constitutes a critical research area within the field of computer vision. The creation of robust and efficient systems, which can approximate the mechanisms of human vision, is essential to enhance the efficacy of multiple object-tracking techniques. However, obstacles such as repetitive target appearances and frequent occlusions cause considerable inaccuracies or omissions in detection. Following the updating of these inaccurate observations into the tracklet, the effectiveness of the tracking model, employing appearance features, declines significantly. This paper introduces a novel method of multiple object tracking, employing graph attention networks and track management (GATM). Utilizing a graph attention network, an attention mechanism is employed to capture the relationships of nodes within the graph as well as node-to-node correlations across graphs. This mechanism allows selective focus on the features of advantageous nodes and enhances discriminability between node features, subsequently improving the performance and robustness of multiple object tracking. Simultaneously, we categorize distinct tracklet states and introduce an efficient track management method, which employs varying processing techniques for tracklets in diverse states. This method can manage occluded tracks in crowded scenes and improves tracking accuracy. Experiments conducted on three challenging public datasets (MOT16, MOT17, and MOT20) demonstrate that our method could deliver competitive performance.
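
GATM's full pipeline is more involved, but the graph attention operation it builds on (the GAT layer of Veličković et al., 2018) is compact: each node's scores over its neighbours pass through a softmax, so features of advantageous nodes receive higher weight. A minimal single-head sketch with random weights standing in for learned parameters; it illustrates the mechanism, not the authors' exact network:

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    # One single-head graph attention layer (Velickovic et al., 2018).
    # h: (N, F) node features; adj: (N, N) adjacency with self-loops;
    # W: (F, F2) shared projection; a: (2 * F2,) attention vector.
    z = h @ W                                         # project node features
    n = z.shape[0]
    e = np.full((n, n), -1e9)                         # non-neighbours masked out
    for i in range(n):
        for j in range(n):
            if adj[i, j] > 0:
                s = a @ np.concatenate([z[i], z[j]])
                e[i, j] = s if s > 0 else slope * s   # LeakyReLU
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)             # softmax over each node's neighbours
    return att @ z                                    # attention-weighted aggregation

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                       # e.g., 4 detections, 8-dim features
adj = np.ones((4, 4))                                 # fully connected affinity graph
out = gat_layer(feats, adj, rng.normal(size=(8, 8)), rng.normal(size=(16,)))
print(out.shape)  # (4, 8)
```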

14 pages, 3711 KiB  
Article
VHR-BirdPose: Vision Transformer-Based HRNet for Bird Pose Estimation with Attention Mechanism
by Runang He, Xiaomin Wang, Huazhen Chen and Chang Liu
Electronics 2023, 12(17), 3643; https://doi.org/10.3390/electronics12173643 - 29 Aug 2023
Cited by 3 | Viewed by 2118
Abstract
Pose estimation plays a crucial role in recognizing and analyzing the postures, actions, and movements of humans and animals using computer vision and machine learning techniques. However, bird pose estimation encounters specific challenges, including bird diversity, posture variation, and the fine granularity of posture. To overcome these challenges, we propose VHR-BirdPose, a method that combines Vision Transformer (ViT) and Deep High-Resolution Network (HRNet) with an attention mechanism. VHR-BirdPose effectively extracts features using Vision Transformer’s self-attention mechanism, which captures global dependencies in the images and allows for better capturing of pose details and changes. The attention mechanism is employed to enhance the focus on bird keypoints, improving the accuracy of pose estimation. By combining HRNet with Vision Transformer, our model can extract multi-scale features while maintaining high-resolution details and incorporating richer semantic information through the attention mechanism. This integration of HRNet and Vision Transformer leverages the advantages of both models, resulting in accurate and robust bird pose estimation. We conducted extensive experiments on the Animal Kingdom dataset to evaluate the performance of VHR-BirdPose. The results demonstrate that our proposed method achieves state-of-the-art performance in bird pose estimation. VHR-BirdPose based on bird images is of great significance for the advancement of bird behaviors, ecological understanding, and the protection of bird populations.
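
The global dependencies the abstract attributes to the Vision Transformer come from scaled dot-product self-attention over patch tokens: every patch attends to every other patch, so spatially distant keypoints can inform one another. A minimal sketch of that operation with random weights in place of learned projections; it illustrates the mechanism, not the exact VHR-BirdPose architecture:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # Scaled dot-product self-attention over a sequence of patch tokens.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])      # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)
    att = np.exp(scores)
    att /= att.sum(axis=-1, keepdims=True)       # softmax: each token attends to all
    return att @ v

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))              # e.g., a 14x14 patch grid, 64-dim
wq, wk, wv = (rng.normal(size=(64, 64)) for _ in range(3))
print(self_attention(tokens, wq, wk, wv).shape)  # (196, 64)
```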

20 pages, 10877 KiB  
Article
SGooTY: A Scheme Combining the GoogLeNet-Tiny and YOLOv5-CBAM Models for Nüshu Recognition
by Yan Zhang and Liumei Zhang
Electronics 2023, 12(13), 2819; https://doi.org/10.3390/electronics12132819 - 26 Jun 2023
Cited by 1 | Viewed by 1611
Abstract
With the development of society, the intangible cultural heritage of Chinese Nüshu is in danger of extinction. To promote the research and popularization of traditional Chinese culture, we use deep learning to automatically detect and recognize handwritten Nüshu characters. To address difficulties such as the creation of a Nüshu character dataset, uneven samples, and difficulties in character recognition, we first build a large-scale handwritten Nüshu character dataset, HWNS2023, by using various data augmentation methods. This dataset contains 5500 Nüshu images and 1364 labeled character samples. Second, in this paper, we propose a two-stage scheme model combining GoogLeNet-tiny and YOLOv5-CBAM (SGooTY) for Nüshu recognition. In the first stage, five basic deep learning models including AlexNet, VGGNet16, GoogLeNet, MobileNetV3, and ResNet are trained and tested on the dataset, and the model structure is improved to enhance the accuracy of recognising handwritten Nüshu characters. In the second stage, we combine an object detection model to re-recognize misidentified handwritten Nüshu characters to ensure the accuracy of the overall system. Experimental results show that in the first stage, the improved model achieves the highest accuracy of 99.3% in recognising Nüshu characters, which significantly improves the recognition rate of handwritten Nüshu characters. After integrating the object recognition model, the overall recognition accuracy of the model reached 99.9%.
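
The two-stage control flow can be sketched compactly. The abstract does not spell out how the first stage's misidentifications are flagged, so the sketch below routes on classifier confidence, which is our assumption; `classify` and `detect` are hypothetical stand-ins for the trained GoogLeNet-tiny and YOLOv5-CBAM models:

```python
def recognize(image, classify, detect, threshold=0.9):
    # Stage 1: the classifier handles the easy, high-confidence cases.
    # classify(image) -> (label, confidence); detect(image) -> label.
    label, confidence = classify(image)
    if confidence >= threshold:
        return label
    # Stage 2: fall back to the object-detection model for uncertain inputs
    # (confidence routing is our assumption, not the paper's stated rule).
    return detect(image)

# Toy stand-ins to show the control flow.
print(recognize("img", classify=lambda x: ("nu-1", 0.55), detect=lambda x: "nu-7"))  # nu-7
```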

20 pages, 9967 KiB  
Article
CNN-Based Fluid Motion Estimation Using Correlation Coefficient and Multiscale Cost Volume
by Jun Chen, Hui Duan, Yuanxin Song, Ming Tang and Zemin Cai
Electronics 2022, 11(24), 4159; https://doi.org/10.3390/electronics11244159 - 13 Dec 2022
Cited by 1 | Viewed by 2413
Abstract
Motion estimation for complex fluid flows via their image sequences is a challenging issue in computer vision. It plays a significant role in scientific research and engineering applications related to meteorology, oceanography, and fluid mechanics. In this paper, we introduce a novel convolutional neural network (CNN)-based motion estimator for complex fluid flows using multiscale cost volume. It uses correlation coefficients as the matching costs, which can improve the accuracy of motion estimation by enhancing the discrimination of the feature matching and overcoming the feature distortions caused by the changes of fluid shapes and illuminations. Specifically, it first generates sparse seeds by a feature extraction network. A correlation pyramid is then constructed for all pairs of sparse seeds, and the predicted matches are iteratively updated through a recurrent neural network, which looks up a multi-scale cost volume from a correlation pyramid via a multi-scale search scheme. Then it uses the searched multi-scale cost volume, the current matches, and the context features as the input features to correlate the predicted matches. Since the multi-scale cost volume contains motion information for both large and small displacements, it can recover small-scale motion structures. However, the predicted matches are sparse, so the final flow field is computed by performing a CNN-based interpolation for these sparse matches. The experimental results show that our method significantly outperforms the current motion estimators in capturing different motion patterns in complex fluid flows, especially in recovering some small-scale vortices. It also achieves state-of-the-art evaluation results on the public fluid datasets and successfully captures the storms in Jupiter’s White Ovals from the remote sensing images.
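
A correlation coefficient used as a matching cost is, in its usual (Pearson) form, mean- and variance-normalized, which is what makes it insensitive to the brightness and contrast changes the abstract mentions. A minimal sketch of such a cost between two feature vectors; the paper's actual features come from its extraction network, and this stand-alone function is ours:

```python
import numpy as np

def correlation_cost(f1, f2, eps=1e-8):
    # Pearson correlation between two feature vectors: subtracting the mean
    # and dividing by the norms makes the score invariant to affine changes
    # in intensity (brightness/contrast). Result is in [-1, 1]; higher = better match.
    a = f1 - f1.mean()
    b = f2 - f2.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

rng = np.random.default_rng(0)
feat = rng.normal(size=32)
print(correlation_cost(feat, 2.0 * feat + 0.5))  # ~1.0 despite the intensity change
```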

Review


30 pages, 4599 KiB  
Review
Deep Learning Innovations in Video Classification: A Survey on Techniques and Dataset Evaluations
by Makara Mao, Ahyoung Lee and Min Hong
Electronics 2024, 13(14), 2732; https://doi.org/10.3390/electronics13142732 - 11 Jul 2024
Cited by 1 | Viewed by 3456
Abstract
Video classification has achieved remarkable success in recent years, driven by advanced deep learning models that automatically categorize video content. This paper provides a comprehensive review of video classification techniques and the datasets used in this field. We summarize key findings from recent research, focusing on network architectures, model evaluation metrics, and parallel processing methods that enhance training speed. Our review includes an in-depth analysis of state-of-the-art deep learning models and hybrid architectures, comparing models to traditional approaches and highlighting their advantages and limitations. Critical challenges such as handling large-scale datasets, improving model robustness, and addressing computational constraints are explored. By evaluating performance metrics, we identify areas where current models excel and where improvements are needed. Additionally, we discuss data augmentation techniques designed to enhance dataset accuracy and address specific challenges in video classification tasks. This survey also examines the evolution of convolutional neural networks (CNNs) in image processing and their adaptation to video classification tasks. We propose future research directions and provide a detailed comparison of existing approaches using the UCF-101 dataset, highlighting progress and ongoing challenges in achieving robust video classification.
