AI | Special Issue : Artificial Intelligence-Based Image Processing and Computer Vision

Research

Jump to: Review

21 pages, 5748 KiB

Open AccessArticle

Automated Audible Truck-Mounted Attenuator Alerts: Vision System Development and Evaluation

by Neema Jakisa Owor, Yaw Adu-Gyamfi, Linlin Zhang and Carlos Sun

AI 2024, 5(4), 1816-1836; https://doi.org/10.3390/ai5040090 - 8 Oct 2024

Viewed by 812

Abstract

Background: The rise in work zone crashes due to distracted and aggressive driving calls for improved safety measures. While Truck-Mounted Attenuators (TMAs) have helped reduce crash severity, the increasing number of crashes involving TMAs shows the need for improved warning systems. Methods: This [...] Read more.

Background: The rise in work zone crashes due to distracted and aggressive driving calls for improved safety measures. While Truck-Mounted Attenuators (TMAs) have helped reduce crash severity, the increasing number of crashes involving TMAs shows the need for improved warning systems. Methods: This study proposes an AI-enabled vision system to automatically alert drivers on collision courses with TMAs, addressing the limitations of manual alert systems. The system uses multi-task learning (MTL) to detect and classify vehicles, estimate distance zones (danger, warning, and safe), and perform lane and road segmentation. MTL improves efficiency and accuracy, making it ideal for devices with limited resources. Using a Generalized Efficient Layer Aggregation Network (GELAN) backbone, the system enhances stability and performance. Additionally, an alert module triggers alarms based on speed, acceleration, and time to collision. Results: The model achieves a recall of 90.5%, an mAP of 0.792 for vehicle detection, an mIOU of 0.948 for road segmentation, an accuracy of 81.5% for lane segmentation, and 83.8% accuracy for distance classification. Conclusions: The results show the system accurately detects vehicles, classifies distances, and provides real-time alerts, reducing TMA collision risks and enhancing work zone safety. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

23 pages, 12844 KiB

Open AccessArticle

Aircraft Skin Damage Visual Testing System Using Lightweight Devices with YOLO: An Automated Real-Time Material Evaluation System

by Kuo-Chien Liao, Jirayu Lau and Muhamad Hidayat

AI 2024, 5(4), 1793-1815; https://doi.org/10.3390/ai5040089 - 29 Sep 2024

Viewed by 990

Abstract

Inspection and material evaluation are some of the critical factors to ensure the structural integrity and safety of an aircraft in the aviation industry. These inspections are carried out by trained personnel, and while effective, they are prone to human error, where even [...] Read more.

Inspection and material evaluation are some of the critical factors to ensure the structural integrity and safety of an aircraft in the aviation industry. These inspections are carried out by trained personnel, and while effective, they are prone to human error, where even a minute error could result in a large-scale negative impact. Automated detection devices designed to improve the reliability of inspections could help the industry reduce the potential effects caused by human error. This study aims to develop a system that can automatically detect and identify defects on aircraft skin using relatively lightweight devices, including mobile phones and unmanned aerial vehicles (UAVs). The study combines an internet of things (IoT) network, allowing the results to be reviewed in real time, regardless of distance. The experimental results confirmed the effective recognition of defects with the mean average precision ([email protected]) at 0.853 for YOLOv9c for all classes. However, despite the effective detection, the test device (mobile phone) was prone to overheating, significantly reducing its performance. While there is still room for further enhancements, this study demonstrates the potential of introducing automated image detection technology to assist the inspection process in the aviation industry. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

14 pages, 905 KiB

Open AccessArticle

Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition

by Hosam Abduljalil, Ahmed Elhayek, Abdullah Marish Ali and Fawaz Alsolami

AI 2024, 5(3), 1695-1708; https://doi.org/10.3390/ai5030083 - 23 Sep 2024

Viewed by 772

Abstract

Human action recognition (HAR) based on skeleton data is a challenging yet crucial task due to its wide-ranging applications, including patient monitoring, security surveillance, and human- machine interaction. Although numerous algorithms have been proposed to distinguish between various activities, most practical applications require [...] Read more.

Human action recognition (HAR) based on skeleton data is a challenging yet crucial task due to its wide-ranging applications, including patient monitoring, security surveillance, and human- machine interaction. Although numerous algorithms have been proposed to distinguish between various activities, most practical applications require highly accurate detection of specific actions. In this study, we propose a novel, highly accurate spatiotemporal graph autoencoder network for HAR, designated as GA-GCN. Furthermore, an extensive investigation was conducted employing diverse modalities. To this end, a spatiotemporal graph autoencoder was constructed to automatically learn both spatial and temporal patterns from skeleton data. The proposed method achieved accuracies of 92.3% and 96.8% on the NTU RGB+D dataset for cross-subject and cross-view evaluations, respectively. On the more challenging NTU RGB+D 120 dataset, GA-GCN attained accuracies of 88.8% and 90.4% for cross-subject and cross-set evaluations. Overall, our model outperforms the majority of the existing state-of-the-art methods on these common benchmark datasets. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

33 pages, 1785 KiB

Open AccessArticle

Sustainable Machine Vision for Industry 4.0: A Comprehensive Review of Convolutional Neural Networks and Hardware Accelerators in Computer Vision

by Muhammad Hussain

AI 2024, 5(3), 1324-1356; https://doi.org/10.3390/ai5030064 - 1 Aug 2024

Viewed by 1935

Abstract

As manifestations of Industry 4.0. become visible across various applications, one key and opportune area of development are quality inspection processes and defect detection. Over the last decade, computer vision architectures, in particular, object detectors have received increasing attention from the research community, [...] Read more.

As manifestations of Industry 4.0. become visible across various applications, one key and opportune area of development are quality inspection processes and defect detection. Over the last decade, computer vision architectures, in particular, object detectors have received increasing attention from the research community, due to their localisation advantage over image classification. However, for these architectural advancements to provide tangible solutions, they must be optimised with respect to the target hardware along with the deployment environment. To this effect, this survey provides an in-depth review of the architectural progression of image classification and object detection architectures with a focus on advancements within Artificially Intelligent accelerator hardware. This will provide readers with an understanding of the present state of architecture–hardware integration within the computer vision discipline. The review also provides examples of the industrial implementation of computer vision architectures across various domains, from the detection of fabric defects to pallet racking inspection. The survey highlights the need for representative hardware-benchmarked datasets for providing better performance comparisons along with envisioning object detection as the primary domain where more research efforts would be focused over the next decade. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

23 pages, 5989 KiB

Open AccessArticle

Vision Transformers in Optimization of AI-Based Early Detection of Botrytis cinerea

by Panagiotis Christakakis, Nikolaos Giakoumoglou, Dimitrios Kapetas, Dimitrios Tzovaras and Eleftheria-Maria Pechlivani

AI 2024, 5(3), 1301-1323; https://doi.org/10.3390/ai5030063 - 1 Aug 2024

Cited by 1 | Viewed by 1167

Abstract

Detecting early plant diseases autonomously poses a significant challenge for self-navigating robots and automated systems utilizing Artificial Intelligence (AI) imaging. For instance, Botrytis cinerea, also known as gray mold disease, is a major threat to agriculture, particularly impacting significant crops in the [...] Read more.

Detecting early plant diseases autonomously poses a significant challenge for self-navigating robots and automated systems utilizing Artificial Intelligence (AI) imaging. For instance, Botrytis cinerea, also known as gray mold disease, is a major threat to agriculture, particularly impacting significant crops in the Cucurbitaceae and Solanaceae families, making early and accurate detection essential for effective disease management. This study focuses on the improvement of deep learning (DL) segmentation models capable of early detecting B. cinerea on Cucurbitaceae crops utilizing Vision Transformer (ViT) encoders, which have shown promising segmentation performance, in systemic use with the Cut-and-Paste method that further improves accuracy and efficiency addressing dataset imbalance. Furthermore, to enhance the robustness of AI models for early detection in real-world settings, an advanced imagery dataset was employed. The dataset consists of healthy and artificially inoculated cucumber plants with B. cinerea and captures the disease progression through multi-spectral imaging over the course of days, depicting the full spectrum of symptoms of the infection, ranging from early, non-visible stages to advanced disease manifestations. Research findings, based on a three-class system, identify the combination of U-Net++ with MobileViTV2-125 as the best-performing model. This model achieved a mean Dice Similarity Coefficient (mDSC) of 0.792, a mean Intersection over Union (mIoU) of 0.816, and a recall rate of 0.885, with a high accuracy of 92%. Analyzing the detection capabilities during the initial days post-inoculation demonstrates the ability to identify invisible B. cinerea infections as early as day 2 and increasing up to day 6, reaching an IoU of 67.1%. This study assesses various infection stages, distinguishing them from abiotic stress responses or physiological deterioration, which is crucial for accurate disease management as it separates pathogenic from non-pathogenic stress factors. The findings of this study indicate a significant advancement in agricultural disease monitoring and control, with the potential for adoption in on-site digital systems (robots, mobile apps, etc.) operating in real settings, showcasing the effectiveness of ViT-based DL segmentation models for prompt and precise botrytis detection. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

24 pages, 7706 KiB

Open AccessArticle

Computer Vision for Safety Management in the Steel Industry

by Roy Lan, Ibukun Awolusi and Jiannan Cai

AI 2024, 5(3), 1192-1215; https://doi.org/10.3390/ai5030058 - 19 Jul 2024

Viewed by 1722

Abstract

The complex nature of the steel manufacturing environment, characterized by different types of hazards from materials and large machinery, makes the need for objective and automated monitoring very critical to replace the traditional methods, which are manual and subjective. This study explores the [...] Read more.

The complex nature of the steel manufacturing environment, characterized by different types of hazards from materials and large machinery, makes the need for objective and automated monitoring very critical to replace the traditional methods, which are manual and subjective. This study explores the feasibility of implementing computer vision for safety management in steel manufacturing, with a case study implementation for automated hard hat detection. The research combines hazard characterization, technology assessment, and a pilot case study. First, a comprehensive review of steel manufacturing hazards was conducted, followed by the application of TOPSIS, a multi-criteria decision analysis method, to select a candidate computer vision system from eight commercially available systems. This pilot study evaluated YOLOv5m, YOLOv8m, and YOLOv9c models on 703 grayscale images from a steel mini-mill, assessing performance through precision, recall, F1-score, mAP, specificity, and AUC metrics. Results showed high overall accuracy in hard hat detection, with YOLOv9c slightly outperforming others, particularly in detecting safety violations. Challenges emerged in handling class imbalance and accurately identifying absent hard hats, especially given grayscale imagery limitations. Despite these challenges, this study affirms the feasibility of computer vision-based safety management in steel manufacturing, providing a foundation for future automated safety monitoring systems. Findings underscore the need for larger, diverse datasets and advanced techniques to address industry-specific complexities, paving the way for enhanced workplace safety in challenging industrial environments. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

40 pages, 5912 KiB

Open AccessArticle

ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models

by Shreyas Bangalore Vijayakumar, Krishna Teja Chitty-Venkata, Kanishk Arya and Arun K. Somani

AI 2024, 5(3), 1132-1171; https://doi.org/10.3390/ai5030056 - 11 Jul 2024

Viewed by 1833

Abstract

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with versions that often complicate reproducibility [...] Read more.

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with versions that often complicate reproducibility and unified benchmarking. We propose ConVision Benchmark, a comprehensive framework in PyTorch, to standardize the implementation and evaluation of state-of-the-art CNN and ViT models. This framework addresses common challenges such as version mismatches and inconsistent validation metrics. As a proof of concept, we performed an extensive benchmark analysis on a COVID-19 dataset, encompassing nearly 200 CNN and ViT models in which DenseNet-161 and MaxViT-Tiny achieved exceptional accuracy with a peak performance of around

95 %

. Although we primarily used the COVID-19 dataset for image classification, the framework is adaptable to a variety of datasets, enhancing its applicability across different domains. Our methodology includes rigorous performance evaluations, highlighting metrics such as accuracy, precision, recall, F1 score, and computational efficiency (FLOPs, MACs, CPU, and GPU latency). The ConVision Benchmark facilitates a comprehensive understanding of model efficacy, aiding researchers in deploying high-performance models for diverse applications. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

31 pages, 13435 KiB

Open AccessArticle

Real-Time Camera Operator Segmentation with YOLOv8 in Football Video Broadcasts

by Serhii Postupaiev, Robertas Damaševičius and Rytis Maskeliūnas

AI 2024, 5(2), 842-872; https://doi.org/10.3390/ai5020042 - 6 Jun 2024

Cited by 2 | Viewed by 2794

Abstract

Using instance segmentation and video inpainting provides a significant leap in real-time football video broadcast enhancements by removing potential visual distractions, such as an occasional person or another object accidentally occupying the frame. Despite its relevance and importance in the media industry, this [...] Read more.

Using instance segmentation and video inpainting provides a significant leap in real-time football video broadcast enhancements by removing potential visual distractions, such as an occasional person or another object accidentally occupying the frame. Despite its relevance and importance in the media industry, this area remains challenging and relatively understudied, thus offering potential for research. Specifically, the segmentation and inpainting of camera operator instances from video remains an underexplored research area. To address this challenge, this paper proposes a framework designed to accurately detect and remove camera operators while seamlessly hallucinating the background in real-time football broadcasts. The approach aims to enhance the quality of the broadcast by maintaining its consistency and level of engagement to retain and attract users during the game. To implement the inpainting task, firstly, the camera operators instance segmentation method should be developed. We used a YOLOv8 model for accurate real-time operator instance segmentation. The resulting model produces masked frames, which are used for further camera operator inpainting. Moreover, this paper presents an extensive “Cameramen Instances” dataset with more than 7500 samples, which serves as a solid foundation for future investigations in this area. The experimental results show that the YOLOv8 model performs better than other baseline algorithms in different scenarios. The precision of 95.5%, recall of 92.7%, mAP50-95 of 79.6, and a high FPS rate of 87 in low-volume environment prove the solution efficacy for real-time applications. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

13 pages, 3061 KiB

Open AccessArticle

Quantifying Visual Differences in Drought-Stressed Maize through Reflectance and Data-Driven Analysis

by Sanjana Banerjee, James Reynolds, Matthew Taggart, Michael Daniele, Alper Bozkurt and Edgar Lobaton

AI 2024, 5(2), 790-802; https://doi.org/10.3390/ai5020040 - 4 Jun 2024

Viewed by 1148

Abstract

Environmental factors, such as drought stress, significantly impact maize growth and productivity worldwide. To improve yield and quality, effective strategies for early detection and mitigation of drought stress in maize are essential. This paper presents a detailed analysis of three imaging trials conducted [...] Read more.

Environmental factors, such as drought stress, significantly impact maize growth and productivity worldwide. To improve yield and quality, effective strategies for early detection and mitigation of drought stress in maize are essential. This paper presents a detailed analysis of three imaging trials conducted to detect drought stress in maize plants using an existing, custom-developed, low-cost, high-throughput phenotyping platform. A pipeline is proposed for early detection of water stress in maize plants using a Vision Transformer classifier and analysis of distributions of near-infrared (NIR) reflectance from the plants. A classification accuracy of 85% was achieved in one of our trials, using hold-out trials for testing. Suitable regions on the plant that are more sensitive to drought stress were explored, and it was shown that the region surrounding the youngest expanding leaf (YEL) and the stem can be used as a more consistent alternative to analysis involving just the YEL. Experiments in search of an ideal window size showed that small bounding boxes surrounding the YEL and the stem area of the plant perform better in separating drought-stressed and well-watered plants than larger window sizes enclosing most of the plant. The results presented in this work show good separation between well-watered and drought-stressed categories for two out of the three imaging trials, both in terms of classification accuracy from data-driven features as well as through analysis of histograms of NIR reflectance. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

13 pages, 9978 KiB

Open AccessArticle

The Eye in the Sky—A Method to Obtain On-Field Locations of Australian Rules Football Athletes

by Zachery Born, Marion Mundt, Ajmal Mian, Jason Weber and Jacqueline Alderson

AI 2024, 5(2), 733-745; https://doi.org/10.3390/ai5020038 - 16 May 2024

Viewed by 1397

Abstract

The ability to overcome an opposition in team sports is reliant upon an understanding of the tactical behaviour of the opposing team members. Recent research is limited to a performance analysts’ own playing team members, as the required opposing team athletes’ geolocation (GPS) [...] Read more.

The ability to overcome an opposition in team sports is reliant upon an understanding of the tactical behaviour of the opposing team members. Recent research is limited to a performance analysts’ own playing team members, as the required opposing team athletes’ geolocation (GPS) data are unavailable. However, in professional Australian rules Football (AF), animations of athlete GPS data from all teams are commercially available. The purpose of this technical study was to obtain the on-field location of AF athletes from animations of the 2019 Australian Football League season to enable the examination of the tactical behaviour of any team. The pre-trained object detection model YOLOv4 was fine-tuned to detect players, and a custom convolutional neural network was trained to track numbers in the animations. The object detection and the athlete tracking achieved an accuracy of 0.94 and 0.98, respectively. Subsequent scaling and translation coefficients were determined through solving an optimisation problem to transform the pixel coordinate positions of a tracked player number to field-relative Cartesian coordinates. The derived equations achieved an average Euclidean distance from the athletes’ raw GPS data of 2.63 m. The proposed athlete detection and tracking approach is a novel methodology to obtain the on-field positions of AF athletes in the absence of direct measures, which may be used for the analysis of opposition collective team behaviour and in the development of interactive play sketching AF tools. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

18 pages, 1201 KiB

Open AccessArticle

Efficient Paddy Grain Quality Assessment Approach Utilizing Affordable Sensors

by Aditya Singh, Kislay Raj, Teerath Meghwar and Arunabha M. Roy

AI 2024, 5(2), 686-703; https://doi.org/10.3390/ai5020036 - 14 May 2024

Cited by 1 | Viewed by 1267

Abstract

Paddy (Oryza sativa) is one of the most consumed food grains in the world. The process from its sowing to consumption via harvesting, processing, storage and management require much effort and expertise. The grain quality of the product is heavily affected [...] Read more.

Paddy (Oryza sativa) is one of the most consumed food grains in the world. The process from its sowing to consumption via harvesting, processing, storage and management require much effort and expertise. The grain quality of the product is heavily affected by the weather conditions, irrigation frequency, and many other factors. However, quality control is of immense importance, and thus, the evaluation of grain quality is necessary. Since it is necessary and arduous, we try to overcome the limitations and shortcomings of grain quality evaluation using image processing and machine learning (ML) techniques. Most existing methods are designed for rice grain quality assessment, noting that the key characteristics of paddy and rice are different. In addition, they have complex and expensive setups and utilize black-box ML models. To handle these issues, in this paper, we propose a reliable ML-based IoT paddy grain quality assessment system utilizing affordable sensors. It involves a specific data collection procedure followed by image processing with an ML-based model to predict the quality. Different explainable features are used for classifying the grain quality of paddy grain, like the shape, size, moisture, and maturity of the grain. The precision of the system was tested in real-world scenarios. To our knowledge, it is the first automated system to precisely provide an overall quality metric. The main feature of our system is its explainability in terms of utilized features and fuzzy rules, which increases the confidence and trustworthiness of the public toward its use. The grain variety used for experiments majorly belonged to the Indian Subcontinent, but it covered a significant variation in the shape and size of the grain. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

20 pages, 6807 KiB

Open AccessArticle

Single Image Super Resolution Using Deep Residual Learning

by Moiz Hassan, Kandasamy Illanko and Xavier N. Fernando

AI 2024, 5(1), 426-445; https://doi.org/10.3390/ai5010021 - 21 Mar 2024

Cited by 1 | Viewed by 3248

Abstract

Single Image Super Resolution (SSIR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SSIR has numerous applications in fields such as medical/satellite imaging, remote target identification and autonomous vehicles. [...] Read more.

Single Image Super Resolution (SSIR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SSIR has numerous applications in fields such as medical/satellite imaging, remote target identification and autonomous vehicles. Compared to interpolation based traditional approaches, deep learning techniques have recently gained attention in SISR due to their superior performance and computational efficiency. This article proposes an Autoencoder based Deep Learning Model for SSIR. The down-sampling part of the Autoencoder mainly uses 3 by 3 convolution and has no subsampling layers. The up-sampling part uses transpose convolution and residual connections from the down sampling part. The model is trained using a subset of the VILRC ImageNet database as well as the RealSR database. Quantitative metrics such as PSNR and SSIM are found to be as high as 76.06 and 0.93 in our testing. We also used qualitative measures such as perceptual quality. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

31 pages, 4849 KiB

Open AccessArticle

MultiWave-Net: An Optimized Spatiotemporal Network for Abnormal Action Recognition Using Wavelet-Based Channel Augmentation

by Ramez M. Elmasry, Mohamed A. Abd El Ghany, Mohammed A.-M. Salem and Omar M. Fahmy

AI 2024, 5(1), 259-289; https://doi.org/10.3390/ai5010014 - 24 Jan 2024

Viewed by 1917

Abstract

Human behavior is regarded as one of the most complex notions present nowadays, due to the large magnitude of possibilities. These behaviors and actions can be distinguished as normal and abnormal. However, abnormal behavior is a vast spectrum, so in this work, abnormal [...] Read more.

Human behavior is regarded as one of the most complex notions present nowadays, due to the large magnitude of possibilities. These behaviors and actions can be distinguished as normal and abnormal. However, abnormal behavior is a vast spectrum, so in this work, abnormal behavior is regarded as human aggression or in another context when car accidents occur on the road. As this behavior can negatively affect the surrounding traffic participants, such as vehicles and other pedestrians, it is crucial to monitor such behavior. Given the current prevalent spread of cameras everywhere with different types, they can be used to classify and monitor such behavior. Accordingly, this work proposes a new optimized model based on a novel integrated wavelet-based channel augmentation unit for classifying human behavior in various scenes, having a total number of trainable parameters of 5.3 m with an average inference time of 0.09 s. The model has been trained and evaluated on four public datasets: Real Live Violence Situations (RLVS), Highway Incident Detection (HWID), Movie Fights, and Hockey Fights. The proposed technique achieved accuracies in the range of

92 %

to

99.5 %

across the used benchmark datasets. Comprehensive analysis and comparisons between different versions of the model and the state-of-the-art have been performed to confirm the model’s performance in terms of accuracy and efficiency. The proposed model has higher accuracy with an average of

4.97 %

, and higher efficiency by reducing the number of parameters by around 139.1 m compared to other models trained and tested on the same benchmark datasets. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

23 pages, 9079 KiB

Open AccessArticle

Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

by Muhammad Ali Shafique, Arslan Munir and Joonho Kong

AI 2023, 4(4), 926-948; https://doi.org/10.3390/ai4040047 - 18 Oct 2023

Cited by 1 | Viewed by 2997

Abstract

Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, [...] Read more.

Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in the training and inference stages. To solve these challenges, various optimization techniques and frameworks have been developed for the efficient performance of deep learning models in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of frameworks that provide quantization techniques. In this paper, we have used different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage along with latency and throughput for graphics processing units (GPUs) in the inference stage. We have applied the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we have utilized the TensorRT framework for the post-training quantization technique using the TensorFlow TensorRT (TF-TRT) application programming interface (API).We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages, the results of which can help developers and researchers to devise and deploy efficient deep learning models for GPUs. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

Review

Jump to: Research

36 pages, 3308 KiB

Open AccessReview

Fractional Calculus Meets Neural Networks for Computer Vision: A Survey

by Cecília Coelho, M. Fernanda P. Costa and Luís L. Ferrás

AI 2024, 5(3), 1391-1426; https://doi.org/10.3390/ai5030067 - 7 Aug 2024

Cited by 1 | Viewed by 1408

Abstract

Traditional computer vision techniques aim to extract meaningful information from images but often depend on manual feature engineering, making it difficult to handle complex real-world scenarios. Fractional calculus (FC), which extends derivatives to non-integer orders, provides a flexible way to model systems with [...] Read more.

Traditional computer vision techniques aim to extract meaningful information from images but often depend on manual feature engineering, making it difficult to handle complex real-world scenarios. Fractional calculus (FC), which extends derivatives to non-integer orders, provides a flexible way to model systems with memory effects and long-term dependencies, making it a powerful tool for capturing fractional rates of variation. Recently, neural networks (NNs) have demonstrated remarkable capabilities in learning complex patterns directly from raw data, automating computer vision tasks and enhancing performance. Therefore, the use of fractional calculus in neural network-based computer vision is a powerful method to address existing challenges by effectively capturing complex spatial and temporal relationships in images and videos. This paper presents a survey of fractional calculus neural network-based (FC NN-based) computer vision techniques for denoising, enhancement, object detection, segmentation, restoration, and NN compression. This survey compiles existing FFC NN-based approaches, elucidates underlying concepts, and identifies open questions and research directions. By leveraging FC’s properties, FC NN-based approaches offer a novel way to improve the robustness and efficiency of computer vision systems. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

21 pages, 683 KiB

Open AccessReview

Few-Shot Fine-Grained Image Classification: A Comprehensive Review

by Jie Ren, Changmiao Li, Yaohui An, Weichuan Zhang and Changming Sun

AI 2024, 5(1), 405-425; https://doi.org/10.3390/ai5010020 - 6 Mar 2024

Cited by 1 | Viewed by 3375

Abstract

Few-shot fine-grained image classification (FSFGIC) methods refer to the classification of images (e.g., birds, flowers, and airplanes) belonging to different subclasses of the same species by a small number of labeled samples. Through feature representation learning, FSFGIC methods can make better use of [...] Read more.

Few-shot fine-grained image classification (FSFGIC) methods refer to the classification of images (e.g., birds, flowers, and airplanes) belonging to different subclasses of the same species by a small number of labeled samples. Through feature representation learning, FSFGIC methods can make better use of limited sample information, learn more discriminative feature representations, greatly improve the classification accuracy and generalization ability, and thus achieve better results in FSFGIC tasks. In this paper, starting from the definition of FSFGIC, a taxonomy of feature representation learning for FSFGIC is proposed. According to this taxonomy, we discuss key issues on FSFGIC (including data augmentation, local and/or global deep feature representation learning, class representation learning, and task-specific feature representation learning). In addition, the existing popular datasets, current challenges and future development trends of feature representation learning on FSFGIC are also described. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Artificial Intelligence-Based Image Processing and Computer Vision

Share This Special Issue

Special Issue Editor

Special Issue Information

Keywords

Benefits of Publishing in a Special Issue

Published Papers (16 papers)

Research

Review

Further Information

Guidelines

MDPI Initiatives

Follow MDPI