Artificial Intelligence in Image and Video Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 January 2025) | Viewed by 11894

Special Issue Editors


Dr. Yue Zhang
Guest Editor
School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications, Beijing 100876, China
Interests: object detection; multimodal learning; machine learning foundation models

Dr. Bin Chen
Guest Editor
International Research Institute for Artificial Intelligence, Harbin Institute of Technology, Shenzhen 518055, China
Interests: artificial intelligence; object detection

Dr. Jinlin Guo
Guest Editor
College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Interests: multimedia information retrieval; computer vision

Prof. Dr. Xueliang Liu
Guest Editor
School of Computing and Information, Hefei University of Technology, Hefei 230009, China
Interests: multimedia information retrieval; computer vision

Special Issue Information

Dear Colleagues,

With the popularity of camera platforms such as smartphones, new energy vehicles, satellites, and drones, the number of images and videos being created has grown exponentially. These massive collections of images and videos are used in many different fields, giving rise to typical tasks such as segmentation, classification, recognition, and tracking. For instance, image and video processing underpins facial recognition, self-driving cars, and geological surveying. However, achieving high performance with artificial intelligence-based image and video interpretation in real-world scenarios remains challenging, largely because of the complex noise, occlusion, and deformation observed in such scenarios. Recently, advances in machine learning and computer vision have shown great potential in practical applications.

This Special Issue aims to publish novel ideas for artificial intelligence in image and video processing. Original research articles and reviews are welcome. We invite researchers, engineers, and scientists to contribute peer-reviewed research addressing open problems including, but not limited to:

  • New architectures and theories for image and video processing;
  • New applications or tasks for image and video processing;
  • Fine-tuning and adaptation for large pretrained models;
  • Deep learning for smartphone images/videos;
  • Deep learning for surveillance images/videos;
  • Deep learning for remote sensing images/videos;
  • Deep learning for drone images/videos;
  • Artificial intelligence content generation and detection.

Dr. Yue Zhang
Dr. Bin Chen
Dr. Jinlin Guo
Prof. Dr. Xueliang Liu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image analysis
  • video analysis
  • large pretrained model
  • surveillance images
  • remote sensing images
  • object detection
  • image segmentation
  • object tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (7 papers)


Research

24 pages, 7528 KiB  
Article
EOS: Edge-Based Operation Skip Scheme for Real-Time Object Detection Using Viola-Jones Classifier
by Cheol-Ho Choi, Joonhwan Han, Hyun Woo Oh, Jeongwoo Cha and Jungho Shin
Electronics 2025, 14(2), 397; https://doi.org/10.3390/electronics14020397 - 20 Jan 2025
Viewed by 440
Abstract
Machine learning-based object detection systems are preferred due to their cost-effectiveness compared to deep learning approaches. Among machine learning methods, the Viola-Jones classifier stands out for its reasonable accuracy and efficient resource utilization. However, as the number of classification iterations increases or the resolution of the input image increases, the detection processing speed may decrease. To address the detection speed issue related to input image resolution, an improved edge component calibration method is applied. Additionally, an edge-based operation skip scheme is proposed to overcome the detection processing speed problem caused by the number of classification iterations. Our experiments using the FDDB public dataset show that our method reduces classification iterations by 24.6157% to 84.1288% compared to conventional methods, except for our previous study. Importantly, our method maintains detection accuracy while reducing classification iterations. This result implies that our method can realize almost real-time object detection when implemented on field-programmable gate arrays.
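The core idea here, gating expensive cascade evaluations on cheap edge statistics, can be sketched in a few lines of Python with OpenCV. The window size, stride, edge threshold, and file path below are illustrative assumptions, not the paper's EOS parameters:

```python
import cv2
import numpy as np

# Toy sketch of an edge-based skip gate for sliding-window detection:
# cheap edge statistics decide whether the costlier cascade runs at all.
def detect_with_edge_skip(gray, cascade, win=64, stride=16, min_edge_ratio=0.05):
    edges = cv2.Canny(gray, 100, 200)          # one edge map per frame
    detections = []
    for y in range(0, gray.shape[0] - win, stride):
        for x in range(0, gray.shape[1] - win, stride):
            window_edges = edges[y:y + win, x:x + win]
            # Skip classification when the window holds too few edge
            # pixels to plausibly contain an object boundary.
            if np.count_nonzero(window_edges) < min_edge_ratio * win * win:
                continue
            roi = gray[y:y + win, x:x + win]
            if len(cascade.detectMultiScale(roi)) > 0:
                detections.append((x, y, win, win))
    return detections

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
print(len(detect_with_edge_skip(gray, cascade)))
```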

19 pages, 3898 KiB  
Article
KARAN: Mitigating Feature Heterogeneity and Noise for Efficient and Accurate Multimodal Medical Image Segmentation
by Xinjia Gu, Yimin Chen and Weiqin Tong
Electronics 2024, 13(23), 4594; https://doi.org/10.3390/electronics13234594 - 21 Nov 2024
Viewed by 752
Abstract
Multimodal medical image segmentation is challenging due to feature heterogeneity across modalities and the presence of modality-specific noise and artifacts. These factors hinder the effective capture and fusion of information, limiting the performance of existing methods. This paper introduces KARAN, a novel end-to-end deep learning model designed to overcome these limitations. KARAN improves feature representation and robustness to intermodal variations through two key innovations: First, KA-MLA, a novel attention block incorporating State Space Model (SSM) and Kolmogorov–Arnold Network (KAN) characteristics into Transformer blocks for efficient, discriminative feature extraction from heterogeneous modalities. Building on KA-MLA, we propose KA-MPE for multi-path parallel feature extraction to avoid multimodal feature entanglement. Second, RanPyramid leverages random convolutions to enhance modality appearance learning, mitigating the impact of noise and artifacts while improving feature fusion. It comprises two components: an Appearance Generator, creating diverse visual appearances, and an Appearance Adjuster, dynamically modulating their weights to optimize model performance. KARAN achieves high segmentation accuracy with lower computational complexity on two publicly available datasets, highlighting its potential to significantly advance medical image analysis.
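One ingredient that translates naturally into code is appearance perturbation via random convolutions, the idea behind RanPyramid's Appearance Generator. Below is a minimal PyTorch sketch under assumed kernel sizes and mixing weight; it is not the authors' implementation:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of random-convolution appearance augmentation; the
# kernel sizes and mixing weight are assumptions for illustration.
def random_conv_appearance(x, kernel_sizes=(1, 3, 5), mix=0.5):
    """x: (N, C, H, W) image batch; returns an appearance-perturbed batch."""
    ks = kernel_sizes[int(torch.randint(len(kernel_sizes), (1,)))]
    c = x.shape[1]
    # A fresh random depthwise kernel on every call yields a new "appearance".
    weight = torch.randn(c, 1, ks, ks, device=x.device) / (ks * ks)
    y = F.conv2d(x, weight, padding=ks // 2, groups=c)
    # Standardize the perturbed image, then blend it with the original.
    y = (y - y.mean()) / (y.std() + 1e-6)
    return mix * x + (1 - mix) * y

batch = torch.rand(2, 3, 128, 128)
print(random_conv_appearance(batch).shape)  # torch.Size([2, 3, 128, 128])
```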

9 pages, 1460 KiB  
Article
Atmospheric Gravity Wave Detection in Low-Light Images: A Transfer Learning Approach
by Beimin Xiao, Shensen Hu, Weihua Ai and Yi Li
Electronics 2024, 13(20), 4030; https://doi.org/10.3390/electronics13204030 - 13 Oct 2024
Viewed by 885
Abstract
Atmospheric gravity waves, as a key fluctuation in the atmosphere, have a significant impact on climate change and weather processes. Traditional observation methods rely on manually identifying and analyzing gravity wave stripe features from satellite images, resulting in a limited number of gravity wave events for parameter analysis and excitation mechanism studies, which restricts further related research. In this study, we focus on the gravity wave events in the South China Sea region and utilize a one-year low-light satellite dataset processed with wavelet transform noise reduction and light pixel replacement. Furthermore, transfer learning is employed to adapt the Inception V3 model to the classification task of a small-sample dataset, performing the automatic identification of gravity waves in low-light images. By employing sliding window cutting and data enhancement techniques, we further expand the dataset and enhance the generalization ability of the model. We compare the results of transfer learning detection based on the Inception V3 model with the YOLO v10 model, showing that the results of the Inception V3 model are greatly superior to those of the YOLO v10 model. The accuracy on the test dataset is 88.2%.
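The transfer-learning recipe described here, freezing an ImageNet-pretrained Inception V3 backbone and retraining only the classification heads, looks roughly like the following torchvision sketch. The two-class setup (gravity wave vs. background) and the hyperparameters are assumptions for illustration:

```python
import torch
import torchvision

# Generic transfer-learning setup: freeze the pretrained backbone, swap
# both classification heads for two classes, and optimize only the new
# parameters. This is a sketch, not the paper's exact configuration.
model = torchvision.models.inception_v3(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.AuxLogits.fc = torch.nn.Linear(model.AuxLogits.fc.in_features, 2)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

model.eval()
print(model(torch.randn(1, 3, 299, 299)).shape)  # torch.Size([1, 2])
```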

21 pages, 25692 KiB  
Article
DDE-Net: Dynamic Density-Driven Estimation for Arbitrary-Oriented Object Detection
by Boyu Wang, Donglin Jing, Xiaokai Xia, Yu Liu, Luo Xu and Jiangmai Cheng
Electronics 2024, 13(15), 3029; https://doi.org/10.3390/electronics13153029 - 1 Aug 2024
Viewed by 1337
Abstract
Compared with general images, objects in remote sensing (RS) images typically exhibit a conspicuous diversity due to their arbitrary orientations. However, many of the prevalent detectors generally apply an inflexible strategy in setting anchor angles, ignoring the fact that the number of possible orientations is predictable. Consequently, their processes integrate numerous superfluous angular considerations and hinder their efficiency. To deal with this situation, we propose a dynamic density-driven estimation network (DDE-Net). We design three core modules in DDE-Net: a density-map and mask generation module (DGM), mask routing prediction module (MRM), and spatial-balance calculation module (SCM). DGM is designed for the generation of a density map and mask, which can extract salient features. MRM is for the prediction of object orientation and corresponding weights, which are used to calculate feature maps. SCM is used to affine transform the convolution kernel, applying an adaptive weighted compute mechanism to enhance the average feature and balance the spatial differences in rotation feature extraction. A broad array of experimental evaluations has shown that our methodology outperforms existing state-of-the-art detectors on common aerial object datasets (DOTA and HRSC2016).
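The observation that "the number of possible orientations is predictable" suggests generating rotated anchors only at a few predicted angles per location rather than at a fixed dense set. A hypothetical PyTorch sketch of that selection step follows; the angle-bin count and k are assumptions, and this is not DDE-Net's MRM:

```python
import torch

# Keep only the top-k predicted orientation bins per spatial location,
# instead of tiling anchors over every possible angle.
def select_anchor_angles(angle_logits, k=3):
    """angle_logits: (N, num_bins, H, W) per-location orientation scores."""
    num_bins = angle_logits.shape[1]
    bin_width = 180.0 / num_bins
    probs = angle_logits.softmax(dim=1)
    weights, idx = probs.topk(k, dim=1)   # (N, k, H, W) each
    angles = idx.float() * bin_width      # bin index -> degrees
    return angles, weights                # build anchors only at these angles

logits = torch.randn(1, 18, 32, 32)       # 18 bins of 10 degrees each
angles, weights = select_anchor_angles(logits)
print(angles.shape, weights.shape)        # torch.Size([1, 3, 32, 32]) twice
```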

25 pages, 8352 KiB  
Article
Real-Time Deepfake Video Detection Using Eye Movement Analysis with a Hybrid Deep Learning Approach
by Muhammad Javed, Zhaohui Zhang, Fida Hussain Dahri and Asif Ali Laghari
Electronics 2024, 13(15), 2947; https://doi.org/10.3390/electronics13152947 - 26 Jul 2024
Cited by 4 | Viewed by 4357
Abstract
Deepfake technology uses artificial intelligence to create realistic but false audio, images, and videos. Deepfake technology poses a significant threat to the authenticity of visual content, particularly in live-stream scenarios where the immediacy of detection is crucial. Existing Deepfake detection approaches have limitations and challenges, prompting the need for more robust and accurate solutions. This research proposes an innovative approach: combining eye movement analysis with a hybrid deep learning model to address the need for real-time Deepfake detection. The proposed hybrid deep learning model integrates two deep neural network architectures, MesoNet4 and ResNet101, to leverage their respective strengths for effective Deepfake classification. MesoNet4 is a lightweight CNN model designed explicitly to detect subtle manipulations in facial images, while ResNet101 handles complex visual data and robust feature extraction. Combining the localized feature learning of MesoNet4 with the deeper, more comprehensive feature representations of ResNet101, our hybrid model achieves enhanced performance in distinguishing between manipulated and authentic videos, a distinction that cannot be made with the naked eye or traditional methods. The model is evaluated on diverse datasets, including FaceForensics++, CelebV1, and CelebV2, demonstrating compelling accuracy results: the hybrid model attains an accuracy of 0.9873 on FaceForensics++, 0.9689 on CelebV1, and 0.9790 on CelebV2, showcasing its robustness and potential for real-world deployment in content integrity verification and video forensics applications.
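A two-backbone late-fusion classifier of this kind can be prototyped compactly in PyTorch. In the sketch below, a shallow convolutional branch stands in for MesoNet4 (whose exact architecture and weights are not reproduced here), concatenated with pooled ResNet101 features; the fusion head and sizes are assumptions:

```python
import torch
import torch.nn as nn
import torchvision

# Illustrative late fusion of a lightweight branch with ResNet101 features.
class HybridDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.light = nn.Sequential(            # shallow, Meso-style stand-in
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        resnet = torchvision.models.resnet101(weights="IMAGENET1K_V2")
        self.deep = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())
        self.head = nn.Linear(16 + 2048, 2)    # real vs. fake logits

    def forward(self, x):
        return self.head(torch.cat([self.light(x), self.deep(x)], dim=1))

model = HybridDetector()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```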

15 pages, 672 KiB  
Article
Improving MLP-Based Weakly Supervised Crowd-Counting Network via Scale Reasoning and Ranking
by Ming Gao, Mingfang Deng, Huailin Zhao, Yangjian Chen and Yongqi Chen
Electronics 2024, 13(3), 471; https://doi.org/10.3390/electronics13030471 - 23 Jan 2024
Cited by 1 | Viewed by 1175
Abstract
MLP-based weakly supervised crowd counting approaches have made significant advancements over the past few years. However, owing to the limited datasets, current MLP-based methods do not consider the problem of region-to-region dependency in the image. For this, we propose a weakly supervised method termed SR2. SR2 consists of three parts: a scale-reasoning module, a scale-ranking module, and a regression branch. In particular, the scale-reasoning module extracts and fuses the region-to-region dependencies and multi-scale features in the image, then sends the fused features to the regression branch to obtain estimated counts; the scale-ranking module is used to better understand the internal information of the image and expand the datasets efficiently, which helps to improve the accuracy of the estimated counts in the regression branch. We conducted extensive experiments on four benchmark datasets. The final results show that our approach achieves better counting performance than other weakly supervised counting networks and even some popular fully supervised counting networks.
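Scale ranking in weakly supervised counting commonly exploits a containment constraint: a crop nested inside a larger crop cannot contain more people. A minimal hinge-style formulation in PyTorch follows; whether SR2's scale-ranking module uses exactly this loss is an assumption:

```python
import torch

# Hinge-style ranking loss for nested crops: the predicted count of an
# inner crop should not exceed that of the outer crop containing it.
def count_ranking_loss(count_outer, count_inner, margin=0.0):
    # Penalize only violations, i.e., inner count exceeding outer count.
    return torch.clamp(count_inner - count_outer + margin, min=0).mean()

outer = torch.tensor([120.0, 48.0])   # predicted counts for full crops
inner = torch.tensor([118.0, 55.0])   # predicted counts for nested sub-crops
print(count_ranking_loss(outer, inner))  # tensor(3.5000): only pair 2 violates
```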

20 pages, 5283 KiB  
Article
Fault Classification and Diagnosis Approach Using FFT-CNN for FPGA-Based CORDIC Processor
by Yu Xie, He Chen, Yin Zhuang and Yizhuang Xie
Electronics 2024, 13(1), 72; https://doi.org/10.3390/electronics13010072 - 22 Dec 2023
Cited by 6 | Viewed by 1428
Abstract
Within the realm of digital signal processing and communication systems, FPGA-based CORDIC (Coordinate Rotation Digital Computer) processors play pivotal roles, applied in trigonometric calculations and vector operations. However, soft errors have become one of the major threats in high-reliability FPGA-based applications, potentially degrading performance and causing system failures. This paper proposes a fault classification and diagnosis method for FPGA-based CORDIC processors, leveraging the Fast Fourier Transform (FFT) and Convolutional Neural Networks (CNNs). The approach involves constructing fault classification datasets, optimizing feature extraction through the FFT to shorten diagnosis time and improve diagnostic accuracy, and employing CNNs for the training and testing of fault diagnosis. Different CNN architectures are tested to explore and construct the optimal fault classifier. Experimental results encompassing simulation and implementation demonstrate improved accuracy and efficiency in fault classification and diagnosis. The proposed method provides fault prediction with an accuracy of more than 98.6% and holds the potential to enhance the reliability and performance of FPGA-based CORDIC circuit systems, surpassing traditional fault diagnosis methods such as Sum of Squares (SoS).
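The FFT-then-CNN pipeline, turning a raw output sequence into a magnitude spectrum and classifying the fault type with a small convolutional network, can be sketched as follows. Layer sizes, the sequence length, and the number of fault classes are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Sketch of an FFT -> 1D-CNN fault classifier.
class FFTCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward(self, signal):                    # signal: (N, L) raw samples
        spectrum = torch.fft.rfft(signal).abs()   # (N, L//2 + 1) magnitudes
        return self.net(spectrum.unsqueeze(1))    # add channel dim for Conv1d

model = FFTCNN()
print(model(torch.randn(8, 1024)).shape)          # torch.Size([8, 4])
```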
