Advances in Image Processing and Computer Vision Based on Machine Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 June 2024)

Special Issue Editor


Dr. Francesco Beritelli
Guest Editor
Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95125 Catania, Italy
Interests: audio signal processing; biometrics; IoT; drone/UAV communications; rainfall estimation and monitoring; post-earthquake geolocation; image processing; computer vision; machine learning-based applications

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to the recent advances in image processing and computer vision. Much of this recent explosion of developments and application areas is due to the powerful capabilities of machine learning algorithms and, more specifically, convolutional neural networks (CNNs).

Computer vision plays an important role in health care (e.g., COVID-19), crime prevention, and the monitoring of hydrogeological instability. This Special Issue aims to present original, unpublished, and breakthrough research in the metaverse and computer vision, focusing on new algorithms and mechanisms such as artificial intelligence, machine learning, and explainable artificial intelligence (XAI). We aim to bring leading scientists and researchers together and to create an interdisciplinary platform for the exchange of computational theories, methodologies, and techniques.

The purpose of this Special Issue is to disseminate research papers or state-of-the-art surveys that pertain to novel or emerging applications in the field of image processing and computer vision based on machine learning algorithms. Papers may contribute to technologies and application areas that have emerged during the past decade. Submissions are particularly welcome in, though not limited to, the areas in the list of keywords below.

Technical Program Committee Member:

Ms. Roberta Avanzato   
E-mail: [email protected]
Homepage: https://www.researchgate.net/profile/Roberta-Avanzato
Affiliation: Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95124 Catania, Italy
Research Interests: rainfall estimation; geolocation; natural disasters; Internet of Things; UAV; computer networking; biomedical signal processing

Dr. Francesco Beritelli
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • image segmentation
  • computer vision
  • deep learning
  • machine learning
  • reinforcement learning
  • classification
  • healthcare applications
  • novel industrial applications
  • high-speed computer vision
  • novel applications for 3D vision
  • object recognition
  • object detection
  • object tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (8 papers)


Research

15 pages, 23607 KiB  
Article
Enhancing Image Copy Detection through Dynamic Augmentation and Efficient Sampling with Minimal Data
by Mohamed Fawzy, Noha S. Tawfik and Sherine Nagy Saleh
Electronics 2024, 13(16), 3125; https://doi.org/10.3390/electronics13163125 - 7 Aug 2024
Abstract
Social networks have become deeply integrated into our daily lives, leading to an increase in image sharing across different platforms. At the same time, robust and user-friendly media editors not only facilitate artistic innovation but also make it easy to create misleading media. This highlights the need for new, advanced techniques for the image copy detection task, which involves evaluating whether photos or videos originate from the same source. This research introduces a novel application of the Vision Transformer (ViT) model to image copy detection on the DISC21 dataset. Our approach strategically samples the extensive DISC21 training set using K-means clustering to obtain a representative subset. Additionally, we apply complex augmentation pipelines of varying intensity during training. Our methodology follows the instance discrimination concept: the Vision Transformer is used as a classifier that maps different augmentations of the same image to the same class. The trained ViT model then extracts descriptors of original and manipulated images, which subsequently undergo post-processing to reduce dimensionality. Our best-performing model, tested on a refined query set of 10K augmented images from the DISC21 dataset, attained a state-of-the-art micro-average precision of 0.79, demonstrating the effectiveness and innovation of our approach.
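
The clustering-based sampling step lends itself to a short illustration. Below is a minimal sketch that keeps one representative image per K-means cluster; the placeholder descriptors, cluster count, and nearest-to-centroid rule are assumptions, not the authors' pipeline.

```python
# Minimal sketch: representative-subset selection via K-means, in the spirit
# of the paper's strategic sampling of DISC21. Descriptors are random
# stand-ins, not real image features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 512))    # stand-in image descriptors

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(features)

# Keep the image closest to each centroid as that cluster's representative.
subset = []
for c, center in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == c)[0]
    dists = np.linalg.norm(features[members] - center, axis=1)
    subset.append(members[np.argmin(dists)])

print(f"selected {len(subset)} representative images out of {len(features)}")
```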

27 pages, 6343 KiB  
Article
Detection and Classification of Obstructive Sleep Apnea Using Audio Spectrogram Analysis
by Salvatore Serrano, Luca Patanè, Omar Serghini and Marco Scarpa
Electronics 2024, 13(13), 2567; https://doi.org/10.3390/electronics13132567 - 29 Jun 2024
Cited by 1
Abstract
Sleep disorders are steadily increasing in the population and can significantly affect daily life. Low-cost and noninvasive systems that can assist the diagnostic process will become increasingly widespread in the coming years. This work investigates and compares the performance of machine learning-based classifiers for the identification of obstructive sleep apnea–hypopnea (OSAH) events, including apnea/non-apnea status classification, apnea–hypopnea index (AHI) prediction, and AHI severity classification. The dataset considered contains recordings from 192 patients. It is derived from a recently released dataset that includes, among other data, audio signals recorded with an ambient microphone placed ∼1 m above the studied subjects, together with accurate apnea/hypopnea event annotations performed by specialized medical doctors. We employ mel spectrogram images extracted from the environmental audio signals as input to a machine learning-based classifier for apnea/hypopnea event classification. The proposed approach is a stacked model that combines a pretrained VGG-like audio classification (VGGish) network with a bidirectional long short-term memory (bi-LSTM) network. Performance analysis was conducted using a 5-fold cross-validation approach, with the patients used for training and validation of the models excluded from the testing step. Comparative evaluations with recently presented methods from the literature demonstrate the advantages of the proposed approach, which can be considered a useful tool for supporting OSAHS diagnosis by means of low-cost devices such as smartphones.
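
The two audio stages can be sketched briefly: mel-spectrogram extraction followed by a bidirectional LSTM head over frame-level embeddings. The VGGish extractor is stubbed with random vectors here, and all shapes and hyperparameters are assumptions, not the authors' settings.

```python
# Minimal sketch, assuming a 16 kHz clip and 128-dim frame embeddings.
import numpy as np
import librosa
import torch
import torch.nn as nn

sr = 16_000
audio = np.random.default_rng(0).normal(size=30 * sr)   # stand-in 30 s clip
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)       # the spectrogram image fed to VGGish

embeddings = torch.randn(1, 50, 128)     # stub for 50 frames of VGGish features

class BiLSTMHead(nn.Module):
    def __init__(self, in_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, time, 2 * hidden)
        return self.fc(out[:, -1])       # classify from the final time step

logits = BiLSTMHead()(embeddings)        # (1, 2): apnea vs. non-apnea scores
```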

25 pages, 940 KiB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 1
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain (about 40%) over the preceding High-Efficiency Video Coding (HEVC) standard at the cost of extremely high computational complexity. This complexity increase is a great challenge for power-constrained applications such as the Internet of video things. For intra coding, VVC performs a brute-force recursive search over both the partition structure of the coding unit (CU), which is based on a quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. We therefore propose optimization strategies for the CU partition decision and the intra mode search to reduce this computational overhead. To address the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, or complex based on their texture characteristics. We then train two random forest classifiers to speed up the RDO-based brute-force recursive search: one directly predicts the optimal partition modes for simple and complex CUs, while the other determines early termination of the partition process for fuzzy CUs. To reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experiments demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1), with an average coding time saving of 70% at only a 1.65% increase in BDBR. Compared to state-of-the-art methods, the proposed method also achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods in lowering VVC intra coding complexity and provides an effective solution for power-constrained applications.
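
The core idea of replacing part of the RDO search with a learned classifier can be sketched compactly. The feature set below (variance and gradient activity) and the toy partition labels are illustrative assumptions, not the paper's features or training data.

```python
# Minimal sketch: a random forest maps simple CU texture features to a
# partition decision, short-circuiting the exhaustive RDO search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def texture_features(block: np.ndarray) -> list:
    gy, gx = np.gradient(block.astype(np.float64))
    return [block.var(),           # texture complexity
            np.abs(gx).mean(),     # horizontal activity
            np.abs(gy).mean()]     # vertical activity

rng = np.random.default_rng(0)
blocks = rng.integers(0, 256, size=(500, 32, 32))   # toy 32x32 CUs
X = np.array([texture_features(b) for b in blocks])
y = rng.integers(0, 6, size=500)   # toy labels standing in for QTMT modes

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("predicted partition mode:", clf.predict(X[:1])[0])
```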

14 pages, 617 KiB  
Article
Automatic Evaluation Method for Functional Movement Screening Based on Multi-Scale Lightweight 3D Convolution and an Encoder–Decoder
by Xiuchun Lin, Yichao Liu, Chen Feng, Zhide Chen, Xu Yang and Hui Cui
Electronics 2024, 13(10), 1813; https://doi.org/10.3390/electronics13101813 - 7 May 2024
Abstract
Functional Movement Screening (FMS) is a test used to evaluate fundamental movement patterns in the human body and identify functional limitations. The challenge in automating FMS assessment is that complex human movements are difficult to model accurately and efficiently. To address this challenge, this paper proposes an automatic evaluation method for FMS based on a multi-scale lightweight 3D convolution encoder–decoder (ML3D-ED) architecture. The method uses a self-built multi-scale lightweight 3D convolution architecture to extract features from videos. The extracted features are then processed by an encoder–decoder architecture and a probabilistic integration technique to predict the final score distribution. Compared with the traditional Two-Stream Inflated 3D ConvNet (I3D) network, this architecture offers better performance and accuracy in capturing advanced human movement features in the temporal and spatial dimensions. Specifically, the ML3D-ED backbone network reduces the number of parameters by 59.5% and the computational cost by 77.7% compared to I3D. Experiments show that ML3D-ED achieves an accuracy of 93.33% on public datasets, an improvement of approximately 9% over the best existing method. This outcome demonstrates the effectiveness of the ML3D-ED architecture and the probabilistic integration technique in extracting advanced human movement features and evaluating functional movements.
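
The multi-scale, lightweight idea can be sketched as parallel depthwise 3D convolutions at several kernel sizes, fused by a pointwise convolution. The channel counts and kernel sizes below are assumptions, not the ML3D-ED configuration.

```python
# Minimal sketch of a multi-scale lightweight 3D convolution block.
import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        # Depthwise convs (groups=channels) keep parameters and FLOPs low.
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, k, padding=k // 2, groups=channels)
            for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv3d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

clip = torch.randn(1, 16, 8, 32, 32)   # (batch, channels, frames, H, W)
print(MultiScale3D()(clip).shape)      # torch.Size([1, 16, 8, 32, 32])
```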

18 pages, 7562 KiB  
Article
Graph- and Machine-Learning-Based Texture Classification
by Musrrat Ali, Sanoj Kumar, Rahul Pal, Manoj K. Singh and Deepika Saini
Electronics 2023, 12(22), 4626; https://doi.org/10.3390/electronics12224626 - 12 Nov 2023
Cited by 2
Abstract
The analysis of textures is an important task in image processing and computer vision because it provides significant data for image retrieval, synthesis, segmentation, and classification. Automatic texture recognition is difficult, however, and necessitates advanced computational techniques due to the complexity and diversity of natural textures. This paper presents a method for classifying textures using graphs, specifically natural and horizontal visibility graphs. The corresponding image natural visibility graph (INVG) and image horizontal visibility graph (IHVG) are used to obtain two features for classifying textures: the clustering coefficient and the degree distribution. Classifiers such as the support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) are used for the categorization. The method is tested on well-known image datasets such as the Brodatz texture and Salzburg texture image (STex) datasets. The results show that the technique outperforms traditional ones and even comes close to matching the performance of convolutional neural networks (CNNs), demonstrating the potential of graph methods for texture classification.
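
The horizontal visibility construction is simple enough to sketch on a single pixel row, along with the two graph features named in the abstract. The O(n²) build and the toy row are illustrative assumptions.

```python
# Minimal sketch: horizontal visibility graph of one pixel row, plus the
# clustering coefficient and degree distribution used as texture features.
import networkx as nx
import numpy as np

def hvg(series: np.ndarray) -> nx.Graph:
    g = nx.Graph()
    g.add_nodes_from(range(len(series)))
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            # Horizontal visibility: every value strictly between i and j
            # must lie below both endpoints.
            if all(series[k] < min(series[i], series[j]) for k in range(i + 1, j)):
                g.add_edge(i, j)
    return g

row = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # one row of a texture patch
g = hvg(row)
print("mean clustering coefficient:", nx.average_clustering(g))
print("degree distribution:", nx.degree_histogram(g))
```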

10 pages, 1247 KiB  
Article
Multi-Modality Tensor Fusion Based Human Fatigue Detection
by Jongwoo Ha, Joonhyuck Ryu and Joonghoon Ko
Electronics 2023, 12(15), 3344; https://doi.org/10.3390/electronics12153344 - 4 Aug 2023
Abstract
Multimodal learning is an expanding research area that aims to achieve a better understanding of data by considering multiple modalities. Multimodal approaches to qualitative data are used to quantitatively validate ground-truth datasets and to discover unexpected phenomena. In this paper, we investigate the effect of multimodal learning schemes on quantitative data to assess its qualitative state: we interpret human fatigue levels by analyzing video, thermal image, and voice data together. The experiments showed that the multimodal approach using all three types of data was more effective than using each dataset individually. As a result, we identified the possibility of predicting human fatigue states.
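
Tensor fusion in the general sense can be sketched as an outer product of 1-padded modality embeddings, which preserves unimodal, bimodal, and trimodal interaction terms in one joint tensor. The embedding sizes and the classifier head below are assumptions, not the paper's model.

```python
# Minimal sketch: three-way tensor fusion of video, thermal, and voice
# embeddings, followed by a linear fatigue classifier.
import torch
import torch.nn as nn

video = torch.randn(1, 8)    # placeholder per-modality embeddings
thermal = torch.randn(1, 8)
voice = torch.randn(1, 8)

def with_one(x):
    # Appending a constant 1 keeps unimodal and bimodal terms in the product.
    return torch.cat([x, torch.ones(x.size(0), 1)], dim=1)

v, t, a = with_one(video), with_one(thermal), with_one(voice)
fused = torch.einsum("bi,bj,bk->bijk", v, t, a).flatten(1)   # (1, 9*9*9)

fatigue_logits = nn.Linear(fused.size(1), 3)(fused)   # e.g., 3 fatigue levels
```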

18 pages, 6534 KiB  
Article
Underwater Image Color Constancy Calculation with Optimized Deep Extreme Learning Machine Based on Improved Arithmetic Optimization Algorithm
by Junyi Yang, Qichao Yu, Sheng Chen and Donghe Yang
Electronics 2023, 12(14), 3174; https://doi.org/10.3390/electronics12143174 - 21 Jul 2023
Cited by 1
Abstract
To overcome the challenges posed by the underwater environment and restore the true colors of marine objects' surfaces, a novel underwater image illumination estimation model, termed the iterative chaotic improved arithmetic optimization algorithm for deep extreme learning machines (IAOA-DELM), is proposed. In this study, the gray edge framework is used to extract color features from underwater images, which serve as input vectors. To address the unstable predictions caused by the random selection of parameters in the DELM, the arithmetic optimization algorithm (AOA) is integrated, and the search segment mapping method is optimized using the hidden layer biases and input layer weights. Furthermore, an iterative chaotic mapping initialization strategy provides the AOA with a better initial search proxy. The IAOA-DELM model computes illumination information from the input color vectors. Experimental evaluations on real underwater images show that the proposed IAOA-DELM illumination correction model achieves an accuracy of 96.07%. Compared to the ORELM, ELM, RVFL, and BP models, it exhibits improvements of 6.96%, 7.54%, 8.00%, and 8.89%, respectively, making it the most effective of the compared illumination correction models.
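
For context, a plain extreme learning machine computes its output weights in closed form once the input weights and biases are fixed; those are exactly the parameters the paper's optimizer tunes. The sketch below shows only this basic ELM piece, with random data, and omits the deep (stacked) and IAOA components.

```python
# Minimal sketch of a plain ELM: random hidden weights, closed-form output
# weights via the pseudoinverse. Inputs/targets are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))    # e.g., gray-edge color feature vectors
Y = rng.normal(size=(200, 3))    # illumination (R, G, B) targets

W = rng.normal(size=(6, 64))     # input weights (random here; IAOA-tuned
b = rng.normal(size=64)          # in the paper, along with these biases)
H = np.tanh(X @ W + b)           # hidden-layer activations
beta = np.linalg.pinv(H) @ Y     # output weights in closed form

pred = np.tanh(X @ W + b) @ beta
print("training MSE:", float(((pred - Y) ** 2).mean()))
```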

25 pages, 9839 KiB  
Article
An Improved Median Filter Based on YOLOv5 Applied to Electrochemiluminescence Image Denoising
by Jun Yang, Junyang Chen, Jun Li, Shijie Dai and Yihui He
Electronics 2023, 12(7), 1544; https://doi.org/10.3390/electronics12071544 - 24 Mar 2023
Cited by 1
Abstract
In many experiments, electrochemiluminescence images captured by smartphones contain considerable noise, which makes it difficult for researchers to accurately analyze the light spot information in the captured images. Removing this noise is therefore very important. In this paper, a Center-Adaptive Median Filter (CAMF) based on YOLOv5 is proposed. Unlike traditional filtering algorithms, CAMF adjusts its window size in real time according to the current pixel position, the center and bounding box of each light spot, and the distances between them. This gives CAMF both strong noise reduction and good protection of light spot detail. In our experiments, CAMF scored 40.47 dB in Peak Signal-to-Noise Ratio (PSNR), 613.28 in Image Enhancement Factor (IEF), and 0.939 in Structural Similarity (SSIM). The results show that CAMF is superior to other filtering algorithms in noise reduction and light spot protection.
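
The detection-guided idea can be sketched with a simple two-level rule: a small median window inside detected spot boxes (preserving detail), a larger one elsewhere (stronger denoising). The fixed window sizes and the toy box are assumptions; the actual CAMF adapts continuously with the distance from each spot center.

```python
# Minimal sketch: an adaptive median filter whose window size depends on
# whether the pixel lies inside a YOLO-style detection box.
import numpy as np

def adaptive_median(img: np.ndarray, boxes) -> np.ndarray:
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            inside = any(x0 <= x < x1 and y0 <= y < y1
                         for x0, y0, x1, y1 in boxes)
            r = 1 if inside else 3    # 3x3 inside spots, 7x7 outside
            win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = np.median(win)
    return out

noisy = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(np.uint8)
cleaned = adaptive_median(noisy, boxes=[(20, 20, 40, 40)])   # one spot box
```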
