
Computer Vision Sensing and Pattern Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (30 November 2024) | Viewed by 6992

Special Issue Editor


Prof. Dr. Daming Shi
Guest Editor
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
Interests: machine learning; image processing; computer vision; sensor technology

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to computer vision and pattern recognition methods based on deep learning (CVPR-DL). 

In recent years, advances in deep learning and a new wave of artificial intelligence have driven progress in computer vision sensing and pattern recognition. For example, human action recognition based on acceleration sensors has become an emerging research direction in the field of pattern recognition. This Special Issue is oriented towards intelligent algorithms and technologies in computer vision sensing and pattern recognition. Our aim is to share the latest theoretical and technological achievements in these fields and to encourage scientists to publish their experimental and theoretical results. Related application areas include image and video analysis and processing, intelligent sensors, intelligent video surveillance, intelligent visual inspection, intelligent communication control and management, face and fingerprint recognition, autonomous driving, multimedia communication, and security and privacy problems in sensing.

Therefore, in this Special Issue, we invite submissions related to, but not limited to, the following research topics: vision research under new imaging conditions, biologically inspired computer vision research, multi-sensor fusion 3D vision research, visual scene understanding under highly dynamic complex scenes, small-sample target recognition and understanding, and complex behavior semantic understanding. Theoretical and experimental studies are welcome, as are comprehensive review and survey papers.

Prof. Dr. Daming Shi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • pattern recognition
  • intelligent sensing
  • deep learning
  • image and video analysis and processing
  • intelligent sensors
  • intelligent video surveillance
  • intelligent visual inspection
  • security and privacy in sensing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

16 pages, 1222 KiB  
Article
Infrared Small Target Detection Algorithm Based on Improved Dense Nested U-Net Network
by Xinyue Du, Ke Cheng, Jin Zhang, Yuanyu Wang, Fan Yang, Wei Zhou and Yu Lin
Sensors 2025, 25(3), 814; https://doi.org/10.3390/s25030814 - 29 Jan 2025
Viewed by 342
Abstract
Infrared weak and small target detection technology has attracted much attention in recent years and is crucial in application fields such as early warning, monitoring, medical diagnostics, and anti-UAV detection. With the advancement of deep learning, CNN-based methods have achieved promising results in general-purpose target detection due to their powerful modeling capabilities; however, they cannot be applied directly to infrared small targets, because repeated downsampling operations cause small targets to disappear in deep layers. To address these problems, we propose an improved dense nesting and attention infrared small target detection method based on U-Net, called IDNA-UNet. A dense nested interaction module (DNIM) is designed as the feature extraction module to achieve level-by-level feature fusion and to retain small targets' features and detailed positioning information. To integrate low-level features into deeper high-level features, we designed a bottom-up feature pyramid fusion module, which further retains high-level semantic information and target detail information. In addition, a scale- and position-sensitive (SLS) loss is applied at each prediction scale to help the detector locate targets more accurately and distinguish targets of different scales. With IDNA-UNet, the contextual information of small targets can be fully incorporated and exploited through repeated fusion and enhancement. Compared with existing methods, IDNA-UNet achieves significant advantages in the intersection over union (IoU), detection probability (Pd), and false alarm rate (Fa) of infrared small target detection.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
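As an editorial illustration of the dense-nested idea described in the abstract above, the following minimal PyTorch sketch shows how nodes at the same scale can repeatedly fuse with upsampled deeper features so that small-target detail survives downsampling. All module names, channel sizes, and the two-level depth are assumptions for illustration, not the authors' IDNA-UNet implementation.

```python
# Minimal sketch of a dense-nested skip pattern (assumed structure, not IDNA-UNet).
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.body(x)

class DenseNestedStage(nn.Module):
    """One nested stage: a node fuses the same-scale feature with an upsampled
    deeper feature, re-injecting small-target detail lost to downsampling."""
    def __init__(self, ch):
        super().__init__()
        self.down = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.node00 = ConvBlock(ch, ch)
        self.node10 = ConvBlock(ch, ch)       # deeper-scale node
        self.node01 = ConvBlock(2 * ch, ch)   # fuses node00 + upsampled node10

    def forward(self, x):
        x00 = self.node00(x)
        x10 = self.node10(self.down(x00))
        return self.node01(torch.cat([x00, self.up(x10)], dim=1))

if __name__ == "__main__":
    feats = DenseNestedStage(16)(torch.randn(1, 16, 64, 64))
    print(feats.shape)  # torch.Size([1, 16, 64, 64])
```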

17 pages, 4070 KiB  
Article
Efficient Multi-Task Training with Adaptive Feature Alignment for Universal Image Segmentation
by Yipeng Qu and Joohee Kim
Sensors 2025, 25(2), 359; https://doi.org/10.3390/s25020359 - 9 Jan 2025
Viewed by 368
Abstract
Universal image segmentation aims to handle all segmentation tasks within a single model architecture and ideally requires only one training phase. To achieve task-conditioned joint training, a task token needs to be used in the multi-task training to condition the model for specific tasks. Existing approaches generate the task token from a text input (e.g., “the task is panoptic”). However, such text-based inputs merely serve as labels and fail to capture the inherent differences between tasks, potentially misleading the model. In addition, the discrepancy between visual and textual modalities limits the performance gains in existing text-involved segmentation models. Nevertheless, prevailing modality-alignment methods rely on large-scale uni-modal encoders for both modalities and an extremely large amount of paired data for training, and therefore it is hard to apply these existing models to lightweight segmentation models and resource-constrained devices. In this paper, we propose Adaptive Feature Alignment (AFA) integrated with a learnable task token to address these issues. The learnable task token automatically captures inter-task differences from both image features and text queries during training, providing a more effective and efficient solution than a predefined text-based token. To efficiently align the two modalities without introducing extra complexity, we reconsider the differences between a text token and an image token and replace image features with class-specific means in our proposed AFA. We evaluate our model performance on the ADE20K and Cityscapes datasets. Experimental results demonstrate that our model surpasses baseline models in both efficiency and effectiveness, achieving state-of-the-art performance among segmentation models with a comparable amount of parameters.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
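The learnable task token described above can be sketched in a few lines: rather than encoding a fixed text label, a per-task embedding is learned jointly with the model and prepended to the decoder queries. This is a hedged sketch under assumed names and dimensions, not the authors' AFA code.

```python
# Sketch of task-conditioned queries via a learnable per-task embedding (assumed design).
import torch
import torch.nn as nn

class LearnableTaskToken(nn.Module):
    """One learnable embedding per task (e.g., semantic / instance / panoptic),
    prepended to the query sequence so the decoder is task-conditioned."""
    def __init__(self, num_tasks: int, dim: int):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tasks, dim) * 0.02)

    def forward(self, queries: torch.Tensor, task_id: int) -> torch.Tensor:
        # queries: (batch, num_queries, dim)
        tok = self.tokens[task_id].expand(queries.size(0), 1, -1)
        return torch.cat([tok, queries], dim=1)

if __name__ == "__main__":
    cond = LearnableTaskToken(num_tasks=3, dim=256)
    q = torch.randn(2, 100, 256)
    print(cond(q, task_id=2).shape)  # torch.Size([2, 101, 256])
```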

20 pages, 6755 KiB  
Article
MASDF-Net: A Multi-Attention Codec Network with Selective and Dynamic Fusion for Skin Lesion Segmentation
by Jinghao Fu and Hongmin Deng
Sensors 2024, 24(16), 5372; https://doi.org/10.3390/s24165372 - 20 Aug 2024
Viewed by 882
Abstract
Automated segmentation algorithms for dermoscopic images serve as effective tools that assist dermatologists in clinical diagnosis. While existing deep learning-based skin lesion segmentation algorithms have achieved certain success, challenges remain in accurately delineating the boundaries of lesion regions in dermoscopic images with irregular shapes, blurry edges, and occlusions by artifacts. To address these issues, a multi-attention codec network with selective and dynamic fusion (MASDF-Net) is proposed for skin lesion segmentation in this study. In this network, we use the pyramid vision transformer as the encoder to model the long-range dependencies between features, and we innovatively designed three modules to further enhance the performance of the network. Specifically, the multi-attention fusion (MAF) module allows for attention to be focused on high-level features from various perspectives, thereby capturing more global contextual information. The selective information gathering (SIG) module improves the existing skip-connection structure by eliminating the redundant information in low-level features. The multi-scale cascade fusion (MSCF) module dynamically fuses features from different levels of the decoder part, further refining the segmentation boundaries. We conducted comprehensive experiments on the ISIC 2016, ISIC 2017, ISIC 2018, and PH2 datasets. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
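One plausible reading of the SIG module's role, filtering redundant low-level features before they enter the skip connection, is a spatial gate predicted from the high-level feature. The sketch below is an assumption-laden illustration of that general pattern, not the published MASDF-Net module.

```python
# Sketch of a gated skip connection (assumed mechanism, in the spirit of SIG).
import torch
import torch.nn as nn

class GatedSkip(nn.Module):
    """Uses the high-level feature to predict a gate in [0, 1] that suppresses
    redundant activations in the low-level skip feature."""
    def __init__(self, low_ch: int, high_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.gate = nn.Sequential(
            nn.Conv2d(high_ch, low_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        g = self.gate(self.up(high))  # (B, low_ch, H, W) gate map
        return low * g                # pass only what the gate keeps

if __name__ == "__main__":
    low = torch.randn(1, 64, 56, 56)
    high = torch.randn(1, 128, 28, 28)
    print(GatedSkip(64, 128)(low, high).shape)  # torch.Size([1, 64, 56, 56])
```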

14 pages, 9581 KiB  
Article
A Lightweight Model for Real-Time Detection of Vehicle Black Smoke
by Ke Chen, Han Wang and Yingchao Zhai
Sensors 2023, 23(23), 9492; https://doi.org/10.3390/s23239492 - 29 Nov 2023
Viewed by 1462
Abstract
This paper discusses the application of deep learning technology to recognizing vehicle black smoke in road traffic monitoring videos. The use of massive surveillance video data imposes higher demands on the real-time performance of vehicle black smoke detection models. The YOLOv5s model, known for its excellent single-stage object detection performance, has a complex network structure. Therefore, this study proposes a lightweight real-time detection model for vehicle black smoke, named MGSNet, based on the YOLOv5s framework. The research involved collecting road traffic monitoring video data and creating a custom dataset for vehicle black smoke detection by applying data augmentation techniques such as changing image brightness and contrast. The experiment explored three different lightweight networks, namely ShuffleNetv2, MobileNetv3, and GhostNetv1, to reconstruct the CSPDarknet53 backbone feature extraction network of YOLOv5s. Comparative experimental results indicate that reconstructing the backbone network with MobileNetv3 achieved a better balance between detection accuracy and speed. The introduction of the squeeze-and-excitation attention mechanism and inverted residual structure from MobileNetv3 effectively reduced the complexity of black smoke feature fusion. Simultaneously, a novel convolution module, GSConv, was introduced to enhance the expression capability of black smoke features in the neck network. The combination of depthwise separable convolution and standard convolution in the module further reduced the model’s parameter count. After the improvement, the parameter count of the model is compressed to 1/6 of that of the YOLOv5s model. The lightweight vehicle black smoke real-time detection network, MGSNet, achieved a detection speed of 44.6 frames per second on the test set, an increase of 18.9 frames per second compared with the YOLOv5s model. The mAP@0.5 still exceeded 95%, meeting the application requirements for real-time and accurate detection of vehicle black smoke.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
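The GSConv module mentioned in the abstract combines a standard convolution with a cheap depthwise convolution and shuffles the result. The sketch below follows the commonly published GSConv recipe; the exact kernel sizes, activation, and shuffle used in MGSNet are assumptions here.

```python
# Sketch of a GSConv-style block: dense conv + depthwise conv + channel shuffle
# (assumed details, following the general GSConv recipe rather than MGSNet itself).
import torch
import torch.nn as nn

class GSConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        half = out_ch // 2
        self.dense = nn.Sequential(   # standard convolution branch
            nn.Conv2d(in_ch, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half), nn.SiLU(),
        )
        self.cheap = nn.Sequential(   # depthwise branch on the dense output
            nn.Conv2d(half, half, 5, padding=2, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.cheap(a)
        y = torch.cat([a, b], dim=1)
        # channel shuffle so dense and cheap features interleave
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)

if __name__ == "__main__":
    print(GSConvSketch(64, 64)(torch.randn(1, 64, 32, 32)).shape)
```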

18 pages, 1612 KiB  
Article
A Multi-Scale Recursive Attention Feature Fusion Network for Image Super-Resolution Reconstruction Algorithm
by Xiaowei Han, Lei Wang, Xiaopeng Wang, Pengchao Zhang and Haoran Xu
Sensors 2023, 23(23), 9458; https://doi.org/10.3390/s23239458 - 28 Nov 2023
Cited by 4 | Viewed by 1434
Abstract
In recent years, deep convolutional neural networks (CNNs) have made significant progress in single-image super-resolution (SISR) tasks. Despite their good performance, the single-image super-resolution task remains a challenging one due to problems with underutilization of feature information and loss of feature details. In this paper, a multi-scale recursive attention feature fusion network (MSRAFFN) is proposed for this purpose. The network consists of three parts: a shallow feature extraction module, a multi-scale recursive attention feature fusion module, and a reconstruction module. The shallow features of the image are first extracted by the shallow feature extraction module. Then, the feature information at different scales is extracted by the multi-scale recursive attention feature fusion network block (MSRAFFB) to enhance the channel features of the network through the attention mechanism and fully fuse the feature information at different scales in order to improve the network’s performance. In addition, the image features at different levels are integrated through cross-layer connections using residual connections. Finally, in the reconstruction module, the upsampling capability of the deconvolution module is used to enlarge the image while extracting its high-frequency information in order to obtain a sharper high-resolution image and achieve a better visual effect. Through extensive experiments on a benchmark dataset, the proposed network model is shown to have better performance than other models in terms of both subjective visual effects and objective evaluation metrics.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
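The reconstruction module described above, deconvolution-based upsampling plus high-frequency detail extraction, can be illustrated as a residual head: a transposed convolution enlarges the features, and the predicted detail is added to an upsampled copy of the low-resolution input. Names and sizes are assumptions, not the MSRAFFN implementation.

```python
# Sketch of a deconvolution-based reconstruction head (assumed design).
import torch
import torch.nn as nn

class DeconvReconstruction(nn.Module):
    def __init__(self, ch: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.up = nn.ConvTranspose2d(ch, ch, kernel_size=scale * 2,
                                     stride=scale, padding=scale // 2)
        self.to_rgb = nn.Conv2d(ch, 3, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, lr_image: torch.Tensor) -> torch.Tensor:
        detail = self.to_rgb(self.up(feat))  # predicted high-frequency detail
        base = nn.functional.interpolate(lr_image, scale_factor=self.scale,
                                         mode="bicubic", align_corners=False)
        return base + detail                 # sharper high-resolution output

if __name__ == "__main__":
    sr = DeconvReconstruction(64)(torch.randn(1, 64, 48, 48), torch.randn(1, 3, 48, 48))
    print(sr.shape)  # torch.Size([1, 3, 96, 96])
```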

20 pages, 5028 KiB  
Article
Automatic Detection Method for Black Smoke Vehicles Considering Motion Shadows
by Han Wang, Ke Chen and Yanfeng Li
Sensors 2023, 23(19), 8281; https://doi.org/10.3390/s23198281 - 6 Oct 2023
Cited by 2 | Viewed by 1660
Abstract
Various statistical data indicate that mobile source pollutants have become a significant contributor to atmospheric environmental pollution, with vehicle tailpipe emissions being the primary contributor to these mobile source pollutants. The motion shadow generated by motor vehicles bears a visual resemblance to emitted black smoke, making this study primarily focused on the interference of motion shadows in the detection of black smoke vehicles. Initially, the YOLOv5s model is used to locate moving objects, including motor vehicles, motion shadows, and black smoke emissions. The extracted images of these moving objects are then processed using simple linear iterative clustering to obtain superpixel images of the three categories for model training. Finally, these superpixel images are fed into a lightweight MobileNetv3 network to build a black smoke vehicle detection model for recognition and classification. This study breaks away from the traditional approach of “detection first, then removal” to overcome shadow interference and instead employs a “segmentation-classification” approach, ingeniously addressing the coexistence of motion shadows and black smoke emissions. Experimental results show that the Y-MobileNetv3 model, which takes motion shadows into account, achieves an accuracy rate of 95.17%, a 4.73% improvement compared with the N-MobileNetv3 model (which does not consider motion shadows). Moreover, the average single-image inference time is only 7.3 ms. The superpixel segmentation algorithm effectively clusters similar pixels, facilitating the detection of trace amounts of black smoke emissions from motor vehicles. The Y-MobileNetv3 model not only improves the accuracy of black smoke vehicle recognition but also meets the real-time detection requirements.
(This article belongs to the Special Issue Computer Vision Sensing and Pattern Recognition)
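The "segmentation-classification" flow described above can be sketched with off-the-shelf components: SLIC superpixels (scikit-image) to cluster similar pixels, then a small MobileNetV3 classifier over the superpixel image. The three-class head and preprocessing below are assumptions for illustration, not the authors' Y-MobileNetv3 pipeline.

```python
# Sketch of a superpixel-then-classify pipeline (assumed preprocessing and head).
import numpy as np
import torch
from skimage.segmentation import slic
from torchvision.models import mobilenet_v3_small

def superpixel_image(crop: np.ndarray, n_segments: int = 200) -> np.ndarray:
    """Replace each SLIC superpixel with its mean color, clustering similar
    pixels so faint smoke regions become more coherent."""
    labels = slic(crop, n_segments=n_segments, compactness=10, start_label=0)
    out = crop.astype(np.float32).copy()
    for lbl in np.unique(labels):
        mask = labels == lbl
        out[mask] = out[mask].mean(axis=0)
    return out

classifier = mobilenet_v3_small(num_classes=3)  # vehicle / shadow / smoke (assumed)
classifier.eval()

def classify(crop: np.ndarray) -> int:
    sp = superpixel_image(crop)
    x = torch.from_numpy(sp).permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        return int(classifier(x).argmax(dim=1))

if __name__ == "__main__":
    dummy = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
    print(classify(dummy))
```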
