Computer Vision and Pattern Recognition with Applications, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 4769

Special Issue Editor


Guest Editor
School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
Interests: computer vision; pattern recognition; multimedia computing

Special Issue Information

Dear Colleagues, 

Computer vision and pattern recognition are fundamental problems in artificial intelligence and a major application domain for mathematical theory and tools. Computer vision enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to act or make recommendations based on that information. Pattern recognition is the process of identifying regularities in data using machine learning algorithms. In recent years, computer vision and pattern recognition have expanded rapidly, and applications built on them can be seen everywhere, e.g., object detection, recognition, segmentation, classification, content generation, and multimedia analysis. In this Special Issue, we aim to assemble recent advances in computer vision, pattern recognition, and related applications.

Prof. Dr. Teng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern classification and clustering
  • machine learning, neural network, and deep learning
  • theory in computer vision and pattern recognition
  • low-level vision, image processing, and machine vision
  • 3D computer vision and reconstruction
  • object detection, tracking, recognition, and action recognition
  • data mining and signal processing
  • multimedia/multimodal analysis and applications
  • biomedical image processing and analysis
  • medical image analysis and applications
  • graph theory and its applications
  • vision analysis and understanding
  • vision applications and systems
  • vision for robots and autonomous driving
  • vision and language

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)

Research

27 pages, 10884 KiB  
Article
Two-Stage Detection and Localization of Inter-Frame Tampering in Surveillance Videos Using Texture and Optical Flow
by Naheed Akhtar, Muhammad Hussain and Zulfiqar Habib
Mathematics 2024, 12(22), 3482; https://doi.org/10.3390/math12223482 - 7 Nov 2024
Viewed by 403
Abstract
Surveillance cameras provide security and protection through real-time monitoring or through the investigation of recorded videos. The authenticity of surveillance videos cannot be taken for granted, and tampering detection is challenging. Existing techniques face significant limitations, including restricted applicability, poor generalizability, and high computational complexity. This paper presents a robust detection system to meet the challenges of frame duplication (FD) and frame insertion (FI) detection in surveillance videos. The system leverages the alterations in texture patterns and optical flow between consecutive frames and works in two stages: first, suspicious tampered videos are detected using motion residual-based local binary patterns (MR-LBPs) and an SVM; second, after eliminating false positives, the precise tampering location is determined using the consistency of aggregated optical flow and the variance in MR-LBPs. The system is extensively evaluated on the large COMSATS Structured Video Tampering Evaluation Dataset (CSVTED), comprising challenging videos with varying tampering quality and complexity levels, and cross-validated on benchmark public-domain datasets. The system exhibits outstanding performance, achieving 99.5% accuracy in detecting and pinpointing tampered regions, while ensuring generalization and wide applicability and maintaining computational efficiency.
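As an illustrative aside (not code from the paper), the first-stage descriptor can be sketched as follows: compute the motion residual between consecutive frames, encode it with 8-neighbour local binary patterns, and histogram the codes into a fixed-length feature vector that an SVM could consume. Function names and parameter choices here are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour local binary pattern codes for the interior pixels of a 2-D array."""
    c = img[1:-1, 1:-1]
    # offsets of the 8 neighbours, clockwise from top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def mr_lbp_histogram(prev_frame, frame):
    """Normalised histogram of LBP codes of the motion residual between two frames."""
    residual = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    hist = np.bincount(lbp_codes(residual).ravel(), minlength=256)
    return hist / hist.sum()  # 256-bin descriptor, one per video frame pair
```

In such a pipeline, the per-frame descriptors would be fed to a binary SVM trained on tampered vs. authentic sequences; the paper's actual feature extraction and localization stages are more elaborate.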

14 pages, 892 KiB  
Article
CC-DETR: DETR with Hybrid Context and Multi-Scale Coordinate Convolution for Crowd Counting
by Yanhong Gu, Tao Zhang, Yuxia Hu and Fudong Nian
Mathematics 2024, 12(10), 1562; https://doi.org/10.3390/math12101562 - 17 May 2024
Viewed by 1073
Abstract
Prevailing crowd counting approaches primarily rely on density map regression. Despite considerable progress, significant scale variations and complex background interference within the same image remain challenging. To address these issues, we propose a novel DETR-based crowd counting framework called Crowd Counting DETR (CC-DETR), which extends the state-of-the-art DETR object detection framework to the crowd counting task. In CC-DETR, a DETR-like encoder–decoder structure (Hybrid Context DETR, i.e., HCDETR) is proposed to tackle complex visual information by fusing features from hybrid semantic levels through a transformer. In addition, we design a Coordinate Dilated Convolution Module (CDCM) to effectively exploit position-sensitive context information at different scales. Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF-QNRF, and NWPU) demonstrate that our model is effective and competitive against SOTA crowd counting models.
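To illustrate the two ingredients the CDCM combines (the module's internal design is not reproduced here), the sketch below appends normalised coordinate channels to a feature map, CoordConv-style, and applies a dilated convolution whose dilation rate widens the receptive field without extra parameters. All names are hypothetical.

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalised y/x coordinate channels to a (C, H, W) feature map."""
    _, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)[:, None].repeat(w, axis=1)
    xs = np.linspace(-1.0, 1.0, w)[None, :].repeat(h, axis=0)
    return np.concatenate([feat, ys[None], xs[None]], axis=0)

def dilated_conv2d(feat, kernel, dilation):
    """'Same'-padded 2-D convolution of a (C, H, W) map with a (C, k, k) kernel
    (k odd) at the given dilation rate; returns one (H, W) response channel."""
    c, h, w = feat.shape
    k = kernel.shape[-1]
    pad = dilation * (k // 2)
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            # shifted view of the padded input, weighted by one kernel tap per channel
            out += np.einsum('chw,c->hw',
                             padded[:, di:di + h, dj:dj + w],
                             kernel[:, i, j])
    return out
```

The coordinate channels give the convolution explicit position information, which is what makes the context "position-sensitive"; stacking several dilation rates covers multiple scales.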

17 pages, 5606 KiB  
Article
LASFormer: Light Transformer for Action Segmentation with Receptive Field-Guided Distillation and Action Relation Encoding
by Zhichao Ma and Kan Li
Mathematics 2024, 12(1), 57; https://doi.org/10.3390/math12010057 - 24 Dec 2023
Viewed by 1183
Abstract
Transformer-based models for action segmentation have achieved high frame-wise accuracy on challenging benchmarks. However, they rely on multiple decoders and self-attention blocks for informative representations, whose huge computing and memory costs remain an obstacle to handling long video sequences and to practical deployment. To address these issues, we design a light transformer model for the action segmentation task, named LASFormer, with a novel encoder–decoder structure based on three key designs. First, we propose receptive field-guided distillation to realize model reduction, which can more generally overcome the gap in semantic feature structure between intermediate features via aggregated temporal dilation convolution (ATDC). Second, we propose a simplified implicit attention that replaces self-attention to avoid its quadratic complexity. Third, we design an efficient action relation encoding module embedded after the decoder, where temporal graph reasoning introduces the inductive bias that adjacent frames are likely to belong to the same class when modeling global temporal relations, and a cross-model fusion structure integrates frame-level and segment-level temporal cues; this avoids over-segmentation without relying on multiple decoders, further reducing computational complexity. Extensive experiments have verified the effectiveness and efficiency of the framework. On the challenging 50Salads, GTEA, and Breakfast benchmarks, LASFormer significantly outperforms current state-of-the-art methods in accuracy, edit score, and F1 score.
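The abstract does not specify the form of the simplified attention, so the sketch below shows a generic linear-complexity alternative to softmax self-attention (a kernel feature-map formulation), purely to illustrate how the quadratic cost can be avoided; it is a stand-in, not LASFormer's actual mechanism.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Linear-complexity attention: with a positive feature map phi (elu+1 here),
    softmax(QK^T)V is approximated by phi(Q)(phi(K)^T V) / (phi(Q) phi(K)^T 1),
    costing O(n d^2) instead of O(n^2 d) for sequence length n."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))  # elu(x)+1 > 0
    qf, kf = phi(q), phi(k)            # (n, d) each
    kv = kf.T @ v                      # (d, d_v): keys/values summarised once
    z = qf @ kf.sum(axis=0)            # (n,): per-query normaliser
    return (qf @ kv) / (z[:, None] + eps)
```

Because `kf.T @ v` is computed once and reused for every query, memory and time grow linearly with the number of frames, which is the property that matters for long video sequences.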

29 pages, 12414 KiB  
Article
OMOFuse: An Optimized Dual-Attention Mechanism Model for Infrared and Visible Image Fusion
by Jianye Yuan and Song Li
Mathematics 2023, 11(24), 4902; https://doi.org/10.3390/math11244902 - 7 Dec 2023
Cited by 1 | Viewed by 1086
Abstract
Infrared and visible image fusion aims to fuse the thermal information of infrared images and the texture information of visible images into images that better comply with human visual perception. However, in existing related work, the fused images have incomplete contextual information and poor fusion results. This paper presents a new image fusion algorithm, OMOFuse. First, the channel and spatial attention mechanisms are optimized by a dual-channel attention (DCA) mechanism and an enhanced spatial attention (ESA) mechanism. Then, an optimized dual-attention mechanism (ODAM) module is constructed to further improve the fusion effect. Moreover, an MO module is used to improve the network's feature extraction capability for contextual information. Finally, the loss function ℒ is composed of three parts: structural similarity loss (SSL), perceptual loss (PL), and gap loss (GL). Extensive experiments on three major datasets demonstrate that OMOFuse outperforms existing image fusion methods in quantitative evaluation and qualitative inspection, with superior generalization capability. Further evidence of the algorithm's effectiveness is provided in this study.
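As a toy sketch of a composite fusion loss (not the paper's ℒ), the code below combines a single-window structural similarity term against both source images with a simple intensity "gap" term; the network-based perceptual loss is omitted since it requires a pretrained model. Names, weights, and the exact form of the gap term are assumptions.

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM between two images scaled to [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def fusion_loss(fused, ir, vis, w=(1.0, 1.0)):
    """Structural-similarity loss against both sources, plus an intensity 'gap'
    term pulling the fused image toward the brighter source at each pixel."""
    ssl = 2.0 - global_ssim(fused, ir) - global_ssim(fused, vis)
    gap = np.mean((fused - np.maximum(ir, vis)) ** 2)
    return w[0] * ssl + w[1] * gap
```

A practical SSIM is computed over sliding windows rather than globally, and the relative weights of the terms would be tuned per dataset; this version only shows how the parts compose into one scalar objective.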