Machine Learning Algorithms for Image Understanding and Analysis

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Evolutionary Algorithms and Machine Learning".

Deadline for manuscript submissions: 28 February 2025 | Viewed by 5940

Special Issue Editor

Special Issue Information

Dear Colleagues,

This Special Issue calls for innovative contributions on developing and applying machine learning algorithms to advance image understanding and analysis. We invite papers covering new algorithms, models, and frameworks using machine learning, computer vision, and AI to extract meaningful information from complex image data across domains and modalities.

Topics of interest include deep neural networks for image recognition and segmentation, graph-based machine learning algorithms, adversarial learning methods, explainable AI models, as well as image analysis techniques for the re-identification and understanding of patterns, activities, relationships, and high-level concepts. Both theoretical developments and applications of machine learning algorithms on image data are within the scope of this Special Issue. Submissions offering new insights into image analysis using machine learning are highly encouraged.

Dr. Paolo Spagnolo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning algorithms
  • deep learning algorithms
  • re-identification
  • pattern recognition

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (6 papers)


Research

18 pages, 1508 KiB  
Article
Adversarial Validation in Image Classification Datasets by Means of Cumulative Spectral Gradient
by Diego Renza, Ernesto Moya-Albor and Adrian Chavarro
Algorithms 2024, 17(11), 531; https://doi.org/10.3390/a17110531 - 19 Nov 2024
Viewed by 342
Abstract
The main objective of a machine learning (ML) system is to obtain a trained model from input data in such a way that it allows predictions to be made on new i.i.d. (Independently and Identically Distributed) data with the lowest possible error. However, how can we assess whether the training and test data have a similar distribution? To answer this question, this paper presents a proposal to determine the degree of distribution shift of two datasets. To this end, a metric for evaluating complexity in datasets is used, which can be applied in multi-class problems, comparing each pair of classes of the two sets. The proposed methodology has been applied to three well-known datasets: MNIST, CIFAR-10 and CIFAR-100, together with corrupted versions of these. Through this methodology, it is possible to evaluate which types of modification have a greater impact on the generalization of the models without the need to train multiple models multiple times, also allowing us to determine which classes are more affected by corruption.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
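The paper's Cumulative Spectral Gradient metric is not reproduced here, but the per-class comparison idea it describes can be illustrated with a much simpler stand-in: a histogram-based symmetric KL divergence computed per class between training-set and test-set samples. This is a minimal sketch under our own simplifications (1-D features instead of images, a plain divergence instead of the paper's complexity metric); the function names `sym_kl` and `per_class_shift` are ours, not the authors'.

```python
import numpy as np

def sym_kl(a, b, bins=32):
    # Histogram both samples over a shared range, then take the
    # symmetrised Kullback-Leibler divergence between the two histograms.
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = (pa + 1e-9) / (pa + 1e-9).sum()
    pb = (pb + 1e-9) / (pb + 1e-9).sum()
    return 0.5 * float(np.sum(pa * np.log(pa / pb)) + np.sum(pb * np.log(pb / pa)))

def per_class_shift(train_x, train_y, test_x, test_y):
    # One score per class: how far the test-set class distribution
    # has drifted from the training-set class distribution.
    return {int(c): sym_kl(train_x[train_y == c], test_x[test_y == c])
            for c in np.unique(train_y)}

rng = np.random.default_rng(0)
train_x = rng.normal(0.0, 1.0, 2000)
train_y = rng.integers(0, 2, 2000)
test_y = rng.integers(0, 2, 2000)
# Class 0 unchanged; class 1 "corrupted" by a mean shift of 2 sigma.
test_x = rng.normal(0.0, 1.0, 2000) + 2.0 * (test_y == 1)

shift = per_class_shift(train_x, train_y, test_x, test_y)
```

As in the paper's analysis of corrupted MNIST/CIFAR variants, the per-class scores make it visible that only the corrupted class has drifted, without training any model.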

21 pages, 3213 KiB  
Article
An Autoencoder-Based Task-Oriented Semantic Communication System for M2M Communication
by Prabhath Samarathunga, Hossein Rezaei, Maheshi Lokumarambage, Thushan Sivalingam, Nandana Rajatheva and Anil Fernando
Algorithms 2024, 17(11), 492; https://doi.org/10.3390/a17110492 - 2 Nov 2024
Viewed by 500
Abstract
Semantic communication (SC) is a communication paradigm that has gained significant attention, as it offers a potential solution to move beyond Shannon’s formulation in bandwidth-limited communication channels by delivering the semantic meaning of the message rather than its exact form. In this paper, we propose an autoencoder-based SC system for transmitting images between two machines over error-prone channels to support emerging applications such as VIoT, XR, M2M, and M2H communications. The proposed autoencoder architecture, with a semantically modeled encoder and decoder, transmits image data as a reduced-dimension vector (latent vector) through an error-prone channel. The decoder then reconstructs the image to determine its M2M implications. The autoencoder is trained for different noise levels under various channel conditions, and both image quality and classification accuracy are used to evaluate the system’s efficacy. A CNN image classifier measures accuracy, as no image quality metric is available for SC yet. The simulation results show that all proposed autoencoders maintain high image quality and classification accuracy at high SNRs, while the autoencoder trained with zero noise underperforms other trained autoencoders at moderate SNRs. The results further indicate that all other proposed autoencoders trained under different noise levels are highly robust against channel impairments. We compare the proposed system against a comparable JPEG transmission system, and results reveal that the proposed system outperforms the JPEG system in compression efficiency by up to 50% and in received image quality with an image coding gain of up to 17 dB.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
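The encode / noisy-channel / decode pipeline the abstract describes can be sketched end to end. The sketch below is not the paper's trained autoencoder: it stands in a random orthonormal linear projection for the learned encoder/decoder, and models the error-prone channel as AWGN at a target SNR (the `awgn` helper and all dimensions are our assumptions).

```python
import numpy as np

def awgn(latent, snr_db, rng):
    # Additive white Gaussian noise channel at a target SNR (in dB).
    sig_power = np.mean(latent ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return latent + rng.normal(0.0, np.sqrt(noise_power), latent.shape)

rng = np.random.default_rng(1)
image = rng.random((16, 16))
x = image.flatten()

# Toy linear "autoencoder": orthonormal basis mapping 256 -> 32 dimensions.
basis = np.linalg.qr(rng.normal(size=(x.size, 32)))[0]

latent = basis.T @ x                          # encoder: reduced-dimension latent vector
received = awgn(latent, snr_db=20, rng=rng)   # error-prone channel
reconstruction = (basis @ received).reshape(image.shape)  # decoder

mse_clean = np.mean((basis @ latent - x) ** 2)
mse_noisy = np.mean((basis @ received - x) ** 2)
```

Because the residual of the clean reconstruction is orthogonal to the latent subspace, the channel noise adds strictly on top of the compression error, which mirrors the paper's observation that reconstruction quality degrades with falling SNR.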

28 pages, 5276 KiB  
Article
Frequency-Domain and Spatial-Domain MLMVN-Based Convolutional Neural Networks
by Igor Aizenberg and Alexander Vasko
Algorithms 2024, 17(8), 361; https://doi.org/10.3390/a17080361 - 17 Aug 2024
Viewed by 849
Abstract
This paper presents a detailed analysis of a convolutional neural network based on multi-valued neurons (CNNMVN) and a fully connected multilayer neural network based on multi-valued neurons (MLMVN), employed here as a convolutional neural network in the frequency domain. We begin by providing an overview of the fundamental concepts underlying CNNMVN, focusing on the organization of convolutional layers and the CNNMVN learning algorithm. The error backpropagation rule for this network is justified and presented in detail. Subsequently, we consider how MLMVN can be used as a convolutional neural network in the frequency domain. It is shown that each neuron in the first hidden layer of MLMVN may work as a frequency-domain convolutional kernel, utilizing the Convolution Theorem. Essentially, these neurons create Fourier transforms of the feature maps that would have resulted from the convolutions in the spatial domain performed in regular convolutional neural networks. Furthermore, we discuss optimization techniques for both networks and compare the resulting convolutions to explore which features they extract from images. Finally, we present experimental results showing that both approaches can achieve high accuracy in image recognition.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
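The Convolution Theorem that the frequency-domain MLMVN relies on can be checked in a few lines: an elementwise product of 2-D DFTs is exactly a circular convolution in the spatial domain, which is why a single complex-weighted neuron per frequency bin can play the role of a convolutional kernel. This is a generic numerical demonstration of the theorem, not the MLMVN itself.

```python
import numpy as np

def circular_conv2d(img, kernel):
    # Direct circular (wrap-around) 2-D convolution, for comparison.
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = sum(kernel[u, v] * img[(i - u) % H, (j - v) % W]
                            for u in range(kh) for v in range(kw))
    return out

rng = np.random.default_rng(2)
img = rng.random((8, 8))
kernel = np.zeros((8, 8))
kernel[:3, :3] = rng.random((3, 3))    # 3x3 kernel zero-padded to image size

# Frequency-domain path: one elementwise multiplication per frequency bin.
freq = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)).real
spatial = circular_conv2d(img, kernel[:3, :3])

agree = np.allclose(freq, spatial)
```

The two paths agree to floating-point precision, which is the property that lets the first hidden layer of MLMVN produce the Fourier transforms of the feature maps a spatial-domain CNN would compute.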

26 pages, 501 KiB  
Article
In-Depth Analysis of GAF-Net: Comparative Fusion Approaches in Video-Based Person Re-Identification
by Moncef Boujou, Rabah Iguernaissi, Lionel Nicod, Djamal Merad and Séverine Dubuisson
Algorithms 2024, 17(8), 352; https://doi.org/10.3390/a17080352 - 11 Aug 2024
Viewed by 1215
Abstract
This study provides an in-depth analysis of GAF-Net, a novel model for video-based person re-identification (Re-ID) that matches individuals across different video sequences. GAF-Net combines appearance-based features with gait-based features derived from skeletal data, offering a new approach that diverges from traditional silhouette-based methods. We thoroughly examine each module of GAF-Net and explore various fusion methods at both the score and feature levels, extending beyond initial simple concatenation. Comprehensive evaluations on the iLIDS-VID and MARS datasets demonstrate GAF-Net’s effectiveness across scenarios. GAF-Net achieves state-of-the-art 93.2% rank-1 accuracy on iLIDS-VID’s long sequences, while MARS results (86.09% mAP, 89.78% rank-1) reveal challenges with shorter, variable sequences in complex real-world settings. We demonstrate that integrating skeleton-based gait features consistently improves Re-ID performance, particularly with long, more informative sequences. This research provides crucial insights into multi-modal feature integration in Re-ID tasks, laying a foundation for the advancement of multi-modal biometric systems for diverse computer vision applications.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
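The score-level vs. feature-level fusion distinction the study explores can be made concrete with tiny appearance and gait vectors. This is a generic illustration of the two fusion families, not GAF-Net's actual modules; the helper names, the two-element toy vectors, and the equal 0.5 weighting are all our assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def feature_fusion(appearance, gait, w=0.5):
    # Feature-level fusion: L2-normalise each modality, then weighted concat.
    a = appearance / np.linalg.norm(appearance)
    g = gait / np.linalg.norm(gait)
    return np.concatenate([w * a, (1 - w) * g])

def score_fusion(score_app, score_gait, w=0.5):
    # Score-level fusion: weighted sum of per-modality similarity scores.
    return w * score_app + (1 - w) * score_gait

# Query vs. gallery: the gait signature matches even though appearance differs
# (e.g. a clothing change between sequences).
query_app, query_gait = np.array([1.0, 0.0]), np.array([0.0, 1.0, 1.0])
gall_app, gall_gait = np.array([0.2, 1.0]), np.array([0.0, 1.0, 0.9])

app_score = cosine(query_app, gall_app)
fused_score = score_fusion(app_score, cosine(query_gait, gall_gait))
fused_cos = cosine(feature_fusion(query_app, query_gait),
                   feature_fusion(gall_app, gall_gait))
```

One design note this sketch makes visible: with equal weights and per-modality normalisation, the cosine of the concatenated features equals the averaged per-modality scores, so the two families only diverge under unequal weighting, learned projections, or different distance metrics, which is where a comparative study like this one becomes informative.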

27 pages, 2251 KiB  
Article
Threshold Active Learning Approach for Physical Violence Detection on Images Obtained from Video (Frame-Level) Using Pre-Trained Deep Learning Neural Network Models
by Itzel M. Abundez, Roberto Alejo, Francisco Primero Primero, Everardo E. Granda-Gutiérrez, Otniel Portillo-Rodríguez and Juan Alberto Antonio Velázquez
Algorithms 2024, 17(7), 316; https://doi.org/10.3390/a17070316 - 18 Jul 2024
Viewed by 1582
Abstract
Public authorities and private companies have used video cameras as part of surveillance systems, and one of their objectives is the rapid detection of physically violent actions. This task is usually performed by human visual inspection, which is labor-intensive. For this reason, different deep learning models have been implemented to remove the human eye from this task, yielding positive results. One of the main problems in detecting physical violence in videos is the variety of scenarios that can exist, which leads to different models being trained on datasets, leading them to detect physical violence in only one or a few types of videos. In this work, we present an approach for physical violence detection on images obtained from video based on threshold active learning, that increases the classifier’s robustness in environments where it was not trained. The proposed approach consists of two stages: In the first stage, pre-trained neural network models are trained on initial datasets, and we use a threshold (μ) to identify those images that the classifier considers ambiguous or hard to classify. Then, they are included in the training dataset, and the model is retrained to improve its classification performance. In the second stage, we test the model with video images from other environments, and we again employ (μ) to detect ambiguous images that a human expert analyzes to determine the real class or delete the ambiguity on them. After that, the ambiguous images are added to the original training set and the classifier is retrained; this process is repeated while ambiguous images exist. The model is a hybrid neural network that uses transfer learning and a threshold μ to detect physical violence on images obtained from video files successfully. In this active learning process, the classifier can detect physical violence in different environments, where the main contribution is the method used to obtain a threshold μ (which is based on the neural network output) that allows human experts to contribute to the classification process to obtain more robust neural networks and high-quality datasets. The experimental results show the proposed approach’s effectiveness in detecting physical violence, where it is trained using an initial dataset, and new images are added to improve its robustness in diverse environments.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
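The core threshold step in such an active-learning loop can be sketched as follows. This is a plausible reading of "use a threshold μ on the neural network output to flag ambiguous images", not the paper's actual criterion: here we assume an image is ambiguous when its top two class probabilities differ by less than μ, and the retraining step is only noted in a comment.

```python
import numpy as np

def flag_ambiguous(probs, mu=0.2):
    # An image is "ambiguous" when its top-2 class probabilities are within mu.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return (top2[:, 1] - top2[:, 0]) < mu

def active_learning_round(model_probs, mu, expert_label):
    # Split a batch into confident predictions and images routed to the expert.
    ambiguous = flag_ambiguous(model_probs, mu)
    confident_idx = np.where(~ambiguous)[0]
    to_expert_idx = np.where(ambiguous)[0]
    # The expert resolves each ambiguous image; those images would then be
    # added to the training set and the classifier retrained (not shown).
    new_labels = {int(i): expert_label(i) for i in to_expert_idx}
    return confident_idx, new_labels

probs = np.array([[0.95, 0.05],   # confident "no violence"
                  [0.55, 0.45],   # ambiguous -> sent to the human expert
                  [0.10, 0.90]])  # confident "violence"
confident, resolved = active_learning_round(probs, mu=0.2,
                                            expert_label=lambda i: 1)
```

Repeating this round until no ambiguous images remain matches the abstract's stopping condition, and the expert-resolved labels are exactly the "high-quality dataset" growth the paper highlights.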

20 pages, 1373 KiB  
Article
A Sparsity-Invariant Model via Unifying Depth Prediction and Completion
by Shuling Wang, Fengze Jiang and Xiaojin Gong
Algorithms 2024, 17(7), 298; https://doi.org/10.3390/a17070298 - 6 Jul 2024
Viewed by 663
Abstract
The development of a sparse-invariant depth completion model capable of handling varying levels of input depth sparsity is highly desirable in real-world applications. However, existing sparse-invariant models tend to degrade when the input depth points are extremely sparse. In this paper, we propose a new model that combines the advantageous designs of depth completion and monocular depth estimation tasks to achieve sparse invariance. Specifically, we construct a dual-branch architecture with one branch dedicated to depth prediction and the other to depth completion. Additionally, we integrate the multi-scale local planar module in the decoders of both branches. Experimental results on the NYU Depth V2 benchmark and the OPPO prototype dataset equipped with the Spot-iToF316 sensor demonstrate that our model achieves reliable results even in cases with irregularly distributed, limited or absent depth information.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
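The intuition behind pairing a completion branch with a prediction branch can be caricatured at the output level: trust completed depth where sparse measurements exist, and fall back to monocular prediction when the input is nearly or entirely empty. This is a deliberately crude stand-in for the paper's learned dual-branch fusion; the `fuse_depth` function, its mask-based rule, and the `min_density` cutoff are all our invention.

```python
import numpy as np

def fuse_depth(pred, completed, valid_mask, min_density=0.01):
    # If almost no depth points arrived, rely on the prediction branch alone;
    # otherwise take the completion branch where measurements exist and the
    # prediction branch everywhere else.
    if valid_mask.mean() < min_density:
        return pred
    return np.where(valid_mask, completed, pred)

rng = np.random.default_rng(3)
pred = rng.random((4, 4))        # monocular depth-prediction branch output
completed = rng.random((4, 4))   # depth-completion branch output

dense_mask = rng.random((4, 4)) > 0.5          # plenty of sparse points
empty_mask = np.zeros((4, 4), dtype=bool)      # absent depth input

fused_dense = fuse_depth(pred, completed, dense_mask)
fused_empty = fuse_depth(pred, completed, empty_mask)
```

The hard switch here is what the paper replaces with a learned combination, which is why its model keeps working as the input sparsity varies between these two extremes.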
