Advances in Image Processing and Computer Vision Based on Machine Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 June 2024)

Special Issue Editor


Dr. Francesco Beritelli
Guest Editor
Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95125 Catania, Italy
Interests: audio signal processing; biometrics; IoT; drone/UAV communications; rainfall estimation and monitoring; post-earthquake geolocation; image processing; computer vision; machine learning-based applications

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to the recent advances in image processing and computer vision. Much of this recent explosion of developments and application areas is due to the powerful capabilities of machine learning algorithms and, more specifically, convolutional neural networks (CNNs).

Computer vision plays an important role in health care (e.g., COVID-19), crime prevention, and the monitoring of hydrogeological instability. This Special Issue aims to present original, unpublished, and breakthrough research in the metaverse and computer vision, focusing on new algorithms and mechanisms such as artificial intelligence, machine learning, and explainable artificial intelligence (XAI). We aim to bring leading scientists and researchers together and to create an interdisciplinary platform for the exchange of computational theories, methodologies, and techniques.

The purpose of this Special Issue is to disseminate research papers or state-of-the-art surveys that pertain to novel or emerging applications in the field of image processing and computer vision based on machine learning algorithms. Papers may contribute to technologies and application areas that have emerged during the past decade. Submissions are particularly welcome in, though not limited to, the areas in the list of keywords below.

Technical Program Committee Member:

Ms. Roberta Avanzato   
E-mail: [email protected]
Homepage: https://www.researchgate.net/profile/Roberta-Avanzato
Affiliation: Department of Electrical, Electronics and Computer Engineering (DIEEI), University of Catania, 95124 Catania, Italy
Research Interests: rainfall estimation; geolocation; natural disasters; Internet of Things; UAV; computer networking; biomedical signal processing

Dr. Francesco Beritelli
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • image segmentation
  • computer vision
  • deep learning
  • machine learning
  • reinforcement learning
  • classification
  • healthcare applications
  • novel industrial applications
  • high-speed computer vision
  • novel applications for 3D vision
  • object recognition
  • object detection
  • object tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (8 papers)


Research

15 pages, 23607 KiB  
Article
Enhancing Image Copy Detection through Dynamic Augmentation and Efficient Sampling with Minimal Data
by Mohamed Fawzy, Noha S. Tawfik and Sherine Nagy Saleh
Electronics 2024, 13(16), 3125; https://doi.org/10.3390/electronics13163125 - 7 Aug 2024
Abstract
Social networks have become deeply integrated into our daily lives, leading to an increase in image sharing across different platforms. At the same time, robust and user-friendly media editors not only facilitate artistic innovation but also make it easy to create misleading media. This highlights the need for new, advanced techniques for the image copy detection task, which involves evaluating whether photos or videos originate from the same source. This research introduces a novel application of the Vision Transformer (ViT) model to image copy detection on the DISC21 dataset. Our approach strategically samples the extensive DISC21 training set using K-means clustering to obtain a representative subset. Additionally, we apply complex augmentation pipelines of varying intensity during training. Our methodology follows the instance discrimination concept: the Vision Transformer is used as a classifier that maps different augmentations of the same image to the same class. The trained ViT model then extracts descriptors of original and manipulated images, which subsequently undergo post-processing to reduce dimensionality. Our best-performing model, tested on a refined query set of 10K augmented images from the DISC21 dataset, attained a state-of-the-art micro-average precision of 0.79, demonstrating the effectiveness and innovation of our approach.
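
The clustering-based sampling step lends itself to a short illustration. Below is a minimal sketch that keeps one representative image per K-means cluster; the placeholder descriptors, cluster count, and nearest-to-centroid rule are assumptions, not the authors' pipeline.

```python
# Minimal sketch: representative-subset selection via K-means, in the spirit
# of the paper's strategic sampling of DISC21. Descriptors are random
# stand-ins, not real image features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 512))    # stand-in image descriptors

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(features)

# Keep the image closest to each centroid as that cluster's representative.
subset = []
for c, center in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == c)[0]
    dists = np.linalg.norm(features[members] - center, axis=1)
    subset.append(members[np.argmin(dists)])

print(f"selected {len(subset)} representative images out of {len(features)}")
```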

27 pages, 6343 KiB  
Article
Detection and Classification of Obstructive Sleep Apnea Using Audio Spectrogram Analysis
by Salvatore Serrano, Luca Patanè, Omar Serghini and Marco Scarpa
Electronics 2024, 13(13), 2567; https://doi.org/10.3390/electronics13132567 - 29 Jun 2024
Cited by 1
Abstract
Sleep disorders are steadily increasing in the population and can significantly affect daily life. Low-cost and noninvasive systems that can assist the diagnostic process will become increasingly widespread in the coming years. This work investigates and compares the performance of machine learning-based classifiers for the identification of obstructive sleep apnea–hypopnea (OSAH) events, including apnea/non-apnea status classification, apnea–hypopnea index (AHI) prediction, and AHI severity classification. The dataset considered contains recordings from 192 patients. It is derived from a recently released dataset that includes, among other data, audio signals recorded with an ambient microphone placed ∼1 m above the studied subjects, together with accurate apnea/hypopnea event annotations performed by specialized medical doctors. We employ mel spectrogram images extracted from the environmental audio signals as input to a machine learning-based classifier for apnea/hypopnea event classification. The proposed approach is a stacked model that combines a pretrained VGG-like audio classification (VGGish) network with a bidirectional long short-term memory (bi-LSTM) network. Performance analysis was conducted using a 5-fold cross-validation approach, with the patients used for training and validation of the models excluded from the testing step. Comparative evaluations with recently presented methods from the literature demonstrate the advantages of the proposed approach, which can be considered a useful tool for supporting OSAHS diagnosis by means of low-cost devices such as smartphones.
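
The two audio stages can be sketched briefly: mel-spectrogram extraction followed by a bidirectional LSTM head over frame-level embeddings. The VGGish extractor is stubbed with random vectors here, and all shapes and hyperparameters are assumptions, not the authors' settings.

```python
# Minimal sketch, assuming a 16 kHz clip and 128-dim frame embeddings.
import numpy as np
import librosa
import torch
import torch.nn as nn

sr = 16_000
audio = np.random.default_rng(0).normal(size=30 * sr)   # stand-in 30 s clip
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)       # the spectrogram image fed to VGGish

embeddings = torch.randn(1, 50, 128)     # stub for 50 frames of VGGish features

class BiLSTMHead(nn.Module):
    def __init__(self, in_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, time, 2 * hidden)
        return self.fc(out[:, -1])       # classify from the final time step

logits = BiLSTMHead()(embeddings)        # (1, 2): apnea vs. non-apnea scores
```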

25 pages, 940 KiB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 1
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain (about 40%) over the preceding High-Efficiency Video Coding (HEVC) standard at the cost of extremely high computational complexity. This complexity increase is a great challenge for power-constrained applications such as the Internet of video things. For intra coding, VVC performs a brute-force recursive search over both the partition structure of the coding unit (CU), which is based on a quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. We therefore propose optimization strategies for the CU partition decision and the intra mode search to reduce this computational overhead. To address the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, or complex based on their texture characteristics. We then train two random forest classifiers to speed up the RDO-based brute-force recursive search: one directly predicts the optimal partition modes for simple and complex CUs, while the other determines early termination of the partition process for fuzzy CUs. To reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experiments demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1), with an average coding time saving of 70% at only a 1.65% increase in BDBR. Compared to state-of-the-art methods, the proposed method also achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods in lowering VVC intra coding complexity and provides an effective solution for power-constrained applications.
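
The core idea of replacing part of the RDO search with a learned classifier can be sketched compactly. The feature set below (variance and gradient activity) and the toy partition labels are illustrative assumptions, not the paper's features or training data.

```python
# Minimal sketch: a random forest maps simple CU texture features to a
# partition decision, short-circuiting the exhaustive RDO search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def texture_features(block: np.ndarray) -> list:
    gy, gx = np.gradient(block.astype(np.float64))
    return [block.var(),           # texture complexity
            np.abs(gx).mean(),     # horizontal activity
            np.abs(gy).mean()]     # vertical activity

rng = np.random.default_rng(0)
blocks = rng.integers(0, 256, size=(500, 32, 32))   # toy 32x32 CUs
X = np.array([texture_features(b) for b in blocks])
y = rng.integers(0, 6, size=500)   # toy labels standing in for QTMT modes

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("predicted partition mode:", clf.predict(X[:1])[0])
```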

14 pages, 617 KiB  
Article
Automatic Evaluation Method for Functional Movement Screening Based on Multi-Scale Lightweight 3D Convolution and an Encoder–Decoder
by Xiuchun Lin, Yichao Liu, Chen Feng, Zhide Chen, Xu Yang and Hui Cui
Electronics 2024, 13(10), 1813; https://doi.org/10.3390/electronics13101813 - 7 May 2024
Abstract
Functional Movement Screening (FMS) is a test used to evaluate fundamental movement patterns in the human body and identify functional limitations. The challenge in automating FMS assessment is that complex human movements are difficult to model accurately and efficiently. To address this challenge, this paper proposes an automatic evaluation method for FMS based on a multi-scale lightweight 3D convolution encoder–decoder (ML3D-ED) architecture. The method uses a self-built multi-scale lightweight 3D convolution architecture to extract features from videos. The extracted features are then processed by an encoder–decoder architecture and a probabilistic integration technique to predict the final score distribution. Compared with the traditional Two-Stream Inflated 3D ConvNet (I3D) network, this architecture offers better performance and accuracy in capturing advanced human movement features in the temporal and spatial dimensions. Specifically, the ML3D-ED backbone network reduces the number of parameters by 59.5% and the computational cost by 77.7% compared to I3D. Experiments show that ML3D-ED achieves an accuracy of 93.33% on public datasets, an improvement of approximately 9% over the best existing method. This outcome demonstrates the effectiveness of the ML3D-ED architecture and the probabilistic integration technique in extracting advanced human movement features and evaluating functional movements.
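
The multi-scale, lightweight idea can be sketched as parallel depthwise 3D convolutions at several kernel sizes, fused by a pointwise convolution. The channel counts and kernel sizes below are assumptions, not the ML3D-ED configuration.

```python
# Minimal sketch of a multi-scale lightweight 3D convolution block.
import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        # Depthwise convs (groups=channels) keep parameters and FLOPs low.
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, k, padding=k // 2, groups=channels)
            for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv3d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

clip = torch.randn(1, 16, 8, 32, 32)   # (batch, channels, frames, H, W)
print(MultiScale3D()(clip).shape)      # torch.Size([1, 16, 8, 32, 32])
```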

18 pages, 7562 KiB  
Article
Graph- and Machine-Learning-Based Texture Classification
by Musrrat Ali, Sanoj Kumar, Rahul Pal, Manoj K. Singh and Deepika Saini
Electronics 2023, 12(22), 4626; https://doi.org/10.3390/electronics12224626 - 12 Nov 2023
Cited by 2
Abstract
The analysis of textures is an important task in image processing and computer vision because it provides significant data for image retrieval, synthesis, segmentation, and classification. Automatic texture recognition is difficult, however, and necessitates advanced computational techniques due to the complexity and diversity of natural textures. This paper presents a method for classifying textures using graphs, specifically natural and horizontal visibility graphs. The corresponding image natural visibility graph (INVG) and image horizontal visibility graph (IHVG) are used to obtain two features for classifying textures: the clustering coefficient and the degree distribution. Classifiers such as the support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) are used for the categorization. The method is tested on well-known image datasets such as the Brodatz texture and Salzburg texture image (STex) datasets. The results show that the technique outperforms traditional ones and even comes close to matching the performance of convolutional neural networks (CNNs), demonstrating the potential of graph methods for texture classification.
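
The horizontal visibility construction is simple enough to sketch on a single pixel row, along with the two graph features named in the abstract. The O(n²) build and the toy row are illustrative assumptions.

```python
# Minimal sketch: horizontal visibility graph of one pixel row, plus the
# clustering coefficient and degree distribution used as texture features.
import networkx as nx
import numpy as np

def hvg(series: np.ndarray) -> nx.Graph:
    g = nx.Graph()
    g.add_nodes_from(range(len(series)))
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            # Horizontal visibility: every value strictly between i and j
            # must lie below both endpoints.
            if all(series[k] < min(series[i], series[j]) for k in range(i + 1, j)):
                g.add_edge(i, j)
    return g

row = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # one row of a texture patch
g = hvg(row)
print("mean clustering coefficient:", nx.average_clustering(g))
print("degree distribution:", nx.degree_histogram(g))
```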

10 pages, 1247 KiB  
Article
Multi-Modality Tensor Fusion Based Human Fatigue Detection
by Jongwoo Ha, Joonhyuck Ryu and Joonghoon Ko
Electronics 2023, 12(15), 3344; https://doi.org/10.3390/electronics12153344 - 4 Aug 2023
Abstract
Multimodal learning is an expanding research area that aims to achieve a better understanding of data by considering multiple modalities. Multimodal approaches to qualitative data are used to quantitatively validate ground-truth datasets and to discover unexpected phenomena. In this paper, we investigate the effect of multimodal learning schemes on quantitative data to assess its qualitative state: we interpret human fatigue levels by analyzing video, thermal image, and voice data together. The experiments showed that the multimodal approach using all three types of data was more effective than using each dataset individually. As a result, we identified the possibility of predicting human fatigue states.
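
Tensor fusion in the general sense can be sketched as an outer product of 1-padded modality embeddings, which preserves unimodal, bimodal, and trimodal interaction terms in one joint tensor. The embedding sizes and the classifier head below are assumptions, not the paper's model.

```python
# Minimal sketch: three-way tensor fusion of video, thermal, and voice
# embeddings, followed by a linear fatigue classifier.
import torch
import torch.nn as nn

video = torch.randn(1, 8)    # placeholder per-modality embeddings
thermal = torch.randn(1, 8)
voice = torch.randn(1, 8)

def with_one(x):
    # Appending a constant 1 keeps unimodal and bimodal terms in the product.
    return torch.cat([x, torch.ones(x.size(0), 1)], dim=1)

v, t, a = with_one(video), with_one(thermal), with_one(voice)
fused = torch.einsum("bi,bj,bk->bijk", v, t, a).flatten(1)   # (1, 9*9*9)

fatigue_logits = nn.Linear(fused.size(1), 3)(fused)   # e.g., 3 fatigue levels
```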

18 pages, 6534 KiB  
Article
Underwater Image Color Constancy Calculation with Optimized Deep Extreme Learning Machine Based on Improved Arithmetic Optimization Algorithm
by Junyi Yang, Qichao Yu, Sheng Chen and Donghe Yang
Electronics 2023, 12(14), 3174; https://doi.org/10.3390/electronics12143174 - 21 Jul 2023
Cited by 1
Abstract
To overcome the challenges posed by the underwater environment and restore the true colors of marine objects' surfaces, a novel underwater image illumination estimation model, termed the iterative chaotic improved arithmetic optimization algorithm for deep extreme learning machines (IAOA-DELM), is proposed. In this study, the gray edge framework is used to extract color features from underwater images, which serve as input vectors. To address the unstable predictions caused by the random selection of parameters in the DELM, the arithmetic optimization algorithm (AOA) is integrated, and the search segment mapping method is optimized using the hidden layer biases and input layer weights. Furthermore, an iterative chaotic mapping initialization strategy provides the AOA with a better initial search proxy. The IAOA-DELM model computes illumination information from the input color vectors. Experimental evaluations on real underwater images show that the proposed IAOA-DELM illumination correction model achieves an accuracy of 96.07%. Compared to the ORELM, ELM, RVFL, and BP models, it exhibits improvements of 6.96%, 7.54%, 8.00%, and 8.89%, respectively, making it the most effective of the compared illumination correction models.
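
For context, a plain extreme learning machine computes its output weights in closed form once the input weights and biases are fixed; those are exactly the parameters the paper's optimizer tunes. The sketch below shows only this basic ELM piece, with random data, and omits the deep (stacked) and IAOA components.

```python
# Minimal sketch of a plain ELM: random hidden weights, closed-form output
# weights via the pseudoinverse. Inputs/targets are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))    # e.g., gray-edge color feature vectors
Y = rng.normal(size=(200, 3))    # illumination (R, G, B) targets

W = rng.normal(size=(6, 64))     # input weights (random here; IAOA-tuned
b = rng.normal(size=64)          # in the paper, along with these biases)
H = np.tanh(X @ W + b)           # hidden-layer activations
beta = np.linalg.pinv(H) @ Y     # output weights in closed form

pred = np.tanh(X @ W + b) @ beta
print("training MSE:", float(((pred - Y) ** 2).mean()))
```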

25 pages, 9839 KiB  
Article
An Improved Median Filter Based on YOLOv5 Applied to Electrochemiluminescence Image Denoising
by Jun Yang, Junyang Chen, Jun Li, Shijie Dai and Yihui He
Electronics 2023, 12(7), 1544; https://doi.org/10.3390/electronics12071544 - 24 Mar 2023
Cited by 1
Abstract
In many experiments, electrochemiluminescence images captured by smartphones contain considerable noise, which makes it difficult for researchers to accurately analyze the light spot information in the captured images. Removing this noise is therefore very important. In this paper, a Center-Adaptive Median Filter (CAMF) based on YOLOv5 is proposed. Unlike traditional filtering algorithms, CAMF adjusts its window size in real time according to the current pixel position, the center and bounding box of each light spot, and the distances between them. This gives CAMF both strong noise reduction and good protection of light spot detail. In our experiments, CAMF scored 40.47 dB in Peak Signal-to-Noise Ratio (PSNR), 613.28 in Image Enhancement Factor (IEF), and 0.939 in Structural Similarity (SSIM). The results show that CAMF is superior to other filtering algorithms in noise reduction and light spot protection.
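
The detection-guided idea can be sketched with a simple two-level rule: a small median window inside detected spot boxes (preserving detail), a larger one elsewhere (stronger denoising). The fixed window sizes and the toy box are assumptions; the actual CAMF adapts continuously with the distance from each spot center.

```python
# Minimal sketch: an adaptive median filter whose window size depends on
# whether the pixel lies inside a YOLO-style detection box.
import numpy as np

def adaptive_median(img: np.ndarray, boxes) -> np.ndarray:
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            inside = any(x0 <= x < x1 and y0 <= y < y1
                         for x0, y0, x1, y1 in boxes)
            r = 1 if inside else 3    # 3x3 inside spots, 7x7 outside
            win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = np.median(win)
    return out

noisy = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(np.uint8)
cleaned = adaptive_median(noisy, boxes=[(20, 20, 40, 40)])   # one spot box
```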
