Machine Learning Techniques for Computer Vision

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Internet of Things".

Deadline for manuscript submissions: 10 May 2025 | Viewed by 4562

Special Issue Editors


Prof. Dr. Yonggang Lu
Guest Editor
School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
Interests: machine learning; pattern recognition; computer vision; bioinformatics

Dr. Ivan Serina
Guest Editor
Department of Information Engineering, University of Brescia, Via Branze 38, 25121 Brescia, Italy
Interests: artificial intelligence; AI planning; multi-agent planning; machine learning; neural networks; deep learning; heuristic optimization; heuristic search

Special Issue Information

Dear Colleagues,

Computer vision can be used to enhance the reliability of communications through wireless networks and to improve human–computer interaction. With the rapid development of machine learning techniques, research in computer vision has made significant progress in recent years. Deep learning methods have brought revolutionary breakthroughs in many computer vision tasks. Techniques have been developed to process still images as well as combined video and audio input.

However, the “black box” nature of deep learning models hinders their application to computer vision tasks that involve high-stakes decision making. Developing interpretable deep learning models is therefore desirable in this area. Many post hoc interpretability analysis methods have been developed to understand pre-trained models. Ad hoc interpretability modeling methods have also been developed, such as feature disentanglement and interpretable model extraction using mimic learning.
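As a simple illustration of post hoc interpretability analysis, the following sketch computes a gradient-based saliency map for a pre-trained classifier; the model choice, the random input, and the preprocessing are placeholders rather than a prescribed setup:

```python
# Illustrative post hoc interpretability probe: gradient-based saliency for a
# pre-trained classifier. The model and the random input are assumptions.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder image batch

logits = model(image)
top_class = logits.argmax(dim=1).item()
logits[0, top_class].backward()            # gradient of the top class score w.r.t. pixels

saliency = image.grad.abs().max(dim=1)[0]  # per-pixel importance map, shape (1, 224, 224)
```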

Following the success of transformer-based large language models in natural language processing, transformer architectures have also been used intensively for tackling computer vision tasks. However, due to the high cost of data labeling in special application scenarios, self-supervised learning, few-shot learning, transfer learning, or other new machine learning techniques must be developed to address the lack of labeled data.
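As one common way to cope with scarce labels, transfer learning reuses a backbone pre-trained on a large dataset and trains only a small task-specific head; the sketch below is illustrative, with the number of classes and the dummy batch being assumptions:

```python
# Minimal transfer-learning sketch: freeze a pre-trained backbone and train a
# new classification head on a small labeled set. NUM_CLASSES and the dummy
# batch are placeholders.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 5  # assumed small target task

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                          # freeze the backbone

model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on dummy data
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```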

In addition, applications in autonomous vehicles, augmented reality, and smart cities are driving the evolution of internet infrastructure for computer vision in profound ways. Autonomous vehicles require real-time data processing for navigation and decision making, relying on robust internet connectivity for updates and synchronization. Augmented reality applications demand high-bandwidth connections to stream immersive content seamlessly. Smart city initiatives leverage computer vision for traffic management, surveillance, and infrastructure monitoring, necessitating scalable networks to handle the influx of data. These applications are catalyzing advancements in internet infrastructure to support the growing demands of computer vision technologies in various domains.

The goal of this Special Issue is to provide a platform for researchers to share their new findings and ideas on tackling computer vision tasks through the development of machine learning methods. Submissions may include reviews, surveys, or technical papers that are original and unpublished, with topic areas including, but not limited to, the following:

  • Computer vision for human–computer interaction;
  • Image processing on mobile devices;
  • Self-supervised learning for computer vision;
  • Few-shot learning for computer vision;
  • Transfer learning for computer vision;
  • Interpretable machine learning methods for computer vision;
  • Machine learning methods for image classification;
  • Machine learning methods for object detection;
  • Machine learning methods for semantic segmentation;
  • Machine learning methods for video processing;
  • Machine learning methods for action recognition;
  • Machine learning methods for anomaly detection;
  • Autonomous vehicles and computer vision;
  • Augmented reality and computer vision;
  • Smart cities and computer vision.

Prof. Dr. Yonggang Lu
Dr. Ivan Serina
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • computer vision
  • mobile devices
  • human–computer interaction
  • deep learning
  • image processing
  • video processing
  • few-shot learning
  • transfer learning
  • interpretability
  • image classification
  • object detection
  • semantic segmentation
  • action recognition
  • anomaly detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

14 pages, 2710 KiB  
Article
SPDepth: Enhancing Self-Supervised Indoor Monocular Depth Estimation via Self-Propagation
by Xiaotong Guo, Huijie Zhao, Shuwei Shao, Xudong Li, Baochang Zhang and Na Li
Future Internet 2024, 16(10), 375; https://doi.org/10.3390/fi16100375 - 16 Oct 2024
Viewed by 899
Abstract
Due to the existence of low-textured areas in indoor scenes, some self-supervised depth estimation methods have specifically designed sparse photometric consistency losses and geometry-based losses. However, some of the loss terms cannot supervise all the pixels, which limits the performance of these methods. Some approaches introduce an additional optical flow network to provide dense correspondence supervision, but this overloads the loss function. In this paper, we propose to perform depth self-propagation based on feature self-similarities, where high-accuracy depths are propagated from supervised pixels to unsupervised ones. The enhanced self-supervised indoor monocular depth estimation network is called SPDepth. Since depth self-similarities are significant in a local range, a local window self-attention module is embedded at the end of the network to propagate depths within a window. The depth of a pixel is weighted using its feature correlation scores with other pixels in the same window. The effectiveness of the self-propagation mechanism is demonstrated in experiments on the NYU Depth V2 dataset. The root-mean-squared error of SPDepth is 0.585 and the δ1 accuracy is 77.6%. Zero-shot generalization studies are also conducted on the 7-Scenes dataset and provide a more comprehensive analysis of the application characteristics of SPDepth.
(This article belongs to the Special Issue Machine Learning Techniques for Computer Vision)
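The local window self-attention propagation described in the abstract can be pictured with the following simplified sketch (not the authors' code); the window size, tensor shapes, and dummy inputs are assumptions:

```python
# Simplified depth propagation via local window self-attention: within each
# non-overlapping window, a pixel's depth is re-estimated as a similarity-
# weighted average of the depths in that window. Shapes and window size assumed.
import torch

def propagate_depth(features, depth, window=8):
    """features: (B, C, H, W); depth: (B, 1, H, W); H and W divisible by window."""
    B, C, H, W = features.shape
    # partition feature map and depth map into (window x window) blocks
    f = features.reshape(B, C, H // window, window, W // window, window)
    f = f.permute(0, 2, 4, 3, 5, 1).reshape(B, -1, window * window, C)
    d = depth.reshape(B, 1, H // window, window, W // window, window)
    d = d.permute(0, 2, 4, 3, 5, 1).reshape(B, -1, window * window, 1)

    attn = torch.softmax(f @ f.transpose(-1, -2) / C ** 0.5, dim=-1)  # feature correlation scores
    d_new = attn @ d                                                  # propagated depths

    # reassemble the windows back into a (B, 1, H, W) depth map
    d_new = d_new.reshape(B, H // window, W // window, window, window, 1)
    return d_new.permute(0, 5, 1, 3, 2, 4).reshape(B, 1, H, W)

out = propagate_depth(torch.rand(2, 64, 32, 32), torch.rand(2, 1, 32, 32))  # dummy call
```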

17 pages, 857 KiB  
Article
Enhancing Recognition of Human–Object Interaction from Visual Data Using Egocentric Wearable Camera
by Danish Hamid, Muhammad Ehatisham Ul Haq, Amanullah Yasin, Fiza Murtaza and Muhammad Awais Azam
Future Internet 2024, 16(8), 269; https://doi.org/10.3390/fi16080269 - 29 Jul 2024
Viewed by 1492
Abstract
Object detection and human action recognition have great significance in many real-world applications. Understanding how a human being interacts with different objects, i.e., human–object interaction, is also crucial in this regard, since it enables diverse applications related to security, surveillance, and immersive reality. Thus, this study explored the potential of using a wearable camera for object detection and human–object interaction recognition, which is a key technology for the future Internet and ubiquitous computing. We propose a system that uses an egocentric camera view to recognize objects and human–object interactions by analyzing the wearer’s hand pose. Our novel idea leverages the hand joint data of the user, extracted from the egocentric camera view, to recognize different objects and related interactions. Traditional methods for human–object interaction recognition rely on a third-person, i.e., exocentric, camera view and extract morphological and color/texture-related features; they therefore often fall short when faced with occlusion, camera variations, and background clutter. Moreover, deep learning-based approaches in this regard necessitate substantial data for training, leading to a significant computational overhead. Our proposed approach capitalizes on hand joint data captured from an egocentric perspective, offering a robust solution to the limitations of traditional methods. We propose an innovative machine learning-based technique for feature extraction and description from 3D hand joint data, presenting two distinct approaches: object-dependent and object-independent interaction recognition. The proposed method offered advantages in computational efficiency compared with deep learning methods and was validated using the publicly available HOI4D dataset, where it achieved a best-case average F1-score of 74%. The proposed system paves the way for intuitive human–computer collaboration within the future Internet, enabling applications like seamless object manipulation and natural user interfaces for smart devices, human–robot interaction, virtual reality, and augmented reality.
(This article belongs to the Special Issue Machine Learning Techniques for Computer Vision)
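A generic illustration of the kind of pipeline the abstract describes, i.e., handcrafted features from 3D hand-joint sequences fed to a classical classifier, is sketched below; it is not the paper's method, and the joint count, sequence length, and dummy data are assumptions:

```python
# Hedged illustration: describe a sequence of 3D hand-joint positions with
# simple per-joint statistics and classify the interaction with a standard
# scikit-learn model. Joint count (21), sequence length, and labels are assumed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def describe_sequence(joints):
    """joints: (T, 21, 3) array of 3D hand-joint positions over T frames."""
    mean = joints.mean(axis=0)                              # average joint positions
    std = joints.std(axis=0)                                # spread of each joint
    motion = np.abs(np.diff(joints, axis=0)).mean(axis=0)   # mean per-frame movement
    return np.concatenate([mean, std, motion], axis=None)   # flat feature vector

# dummy training data: 40 sequences, 2 interaction classes
rng = np.random.default_rng(0)
X = np.stack([describe_sequence(rng.normal(size=(30, 21, 3))) for _ in range(40)])
y = rng.integers(0, 2, size=40)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```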

21 pages, 5094 KiB  
Article
TQU-SLAM Benchmark Dataset for Comparative Study to Build Visual Odometry Based on Extracted Features from Feature Descriptors and Deep Learning
by Thi-Hao Nguyen, Van-Hung Le, Huu-Son Do, Trung-Hieu Te and Van-Nam Phan
Future Internet 2024, 16(5), 174; https://doi.org/10.3390/fi16050174 - 17 May 2024
Cited by 1 | Viewed by 1377
Abstract
The problem of data enrichment for training visual SLAM and VO construction models using deep learning (DL) is an urgent problem in computer vision today. DL requires a large amount of data to train a model, and more data covering many different contexts and conditions yields a more accurate visual SLAM and VO construction model. In this paper, we introduce the TQU-SLAM benchmark dataset, which includes 160,631 RGB-D frame pairs. It was collected from the corridors of three interconnected buildings with a total length of about 230 m. The ground-truth data of the TQU-SLAM benchmark dataset were prepared manually, including 6-DOF camera poses, 3D point cloud data, intrinsic parameters, and the transformation matrix between the camera coordinate system and the real world. We also tested the TQU-SLAM benchmark dataset using the PySLAM framework with traditional features such as SHI_TOMASI, SIFT, SURF, ORB, ORB2, AKAZE, KAZE, and BRISK, as well as features extracted with DL such as VGG, DPVO, and TartanVO. The camera pose estimation results are evaluated, and we show that the ORB2 features give the best results (Err_d = 5.74 mm), while the ratio of frames with detected keypoints is best for the SHI_TOMASI feature (r_d = 98.97%). At the same time, we also present and analyze the challenges of the TQU-SLAM benchmark dataset for building visual SLAM and VO systems.
(This article belongs to the Special Issue Machine Learning Techniques for Computer Vision)
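For readers unfamiliar with feature-based visual odometry of the kind evaluated above, the following sketch shows a single ORB-based pose-recovery step with OpenCV; it is not the PySLAM pipeline, and the camera intrinsics K and the image pair are placeholders that would come from the dataset calibration in practice:

```python
# Minimal feature-based VO step: ORB keypoints, brute-force matching, and
# essential-matrix pose recovery between two consecutive frames (OpenCV).
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation and unit-scale translation between the two frames
```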
