Computer Vision and Machine Learning for Intelligent Sensing Systems—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (15 November 2024) | Viewed by 28857

Special Issue Editor

Dr. Jing Tian
Institute of Systems Science, National University of Singapore, Singapore 119620, Singapore
Interests: computer vision; machine learning; video analytics; multimedia applications

Special Issue Information

Dear Colleagues,

Rapid advances in computer vision and machine learning have enabled intelligent sensing systems to make sense of visual sensory data and tackle complex, challenging real-world problems. This progress has created tremendous opportunities, as well as challenges, in managing and understanding visual sensory data for intelligent sensing systems. Recent advances in machine learning techniques now allow the intelligence of visual sensory data analysis to be boosted significantly, attracting massive research efforts devoted to challenges in areas such as visual surveillance, smart cities, and healthcare. This Special Issue aims to provide a collection of high-quality research articles addressing the broad challenges in both the theoretical and application aspects of computer vision and machine learning for intelligent sensing systems.

The topics of interest include, but are not limited to:

  • Computer vision for intelligent sensing systems
    • Sensing, representation, modeling
    • Restoration, enhancement, and super-resolution
    • Color, multispectral, and hyperspectral imaging
    • Stereoscopic, multiview, and 3D processing
  • Machine learning for intelligent sensing systems
    • Classification, detection, segmentation
    • Action and event recognition, behavior understanding
    • Multimodal machine learning
  • Computer vision applications for healthcare, manufacturing, security and safety, biomedical sciences, and other emerging applications

Dr. Jing Tian
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • computer vision
  • image classification
  • image analysis
  • object detection
  • image segmentation
  • action recognition

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research


25 pages, 7532 KiB  
Article
A Novel Approach to Detect Drones Using Deep Convolutional Neural Network Architecture
by Hrishi Rakshit and Pooneh Bagheri Zadeh
Sensors 2024, 24(14), 4550; https://doi.org/10.3390/s24144550 - 13 Jul 2024
Viewed by 1165
Abstract
Over the past decades, drones have become more attainable by the public due to their widespread availability at affordable prices. Nevertheless, this situation sparks serious concerns in both the cyber and physical security domains, as drones can be employed for malicious activities that threaten public safety. However, detecting drones instantly and efficiently is a very difficult task due to their tiny size and swift flight. This paper presents a novel drone detection method using deep convolutional learning and deep transfer learning. The proposed algorithm employs a new feature extraction network, which is added to a modified You Only Look Once version 2 (YOLOv2) network. The feature extraction model uses bypass connections to learn features from the training sets and solves the "vanishing gradient" problem caused by the increasing depth of the network. The structure of YOLOv2 is modified by replacing the rectified linear unit (ReLU) with a leaky ReLU activation function, which avoids the "dying ReLU" problem, and by adding an extra convolutional layer with a stride of 2 to improve small-object detection accuracy. The additional convolutional layer reduces the spatial dimensions of the feature maps and helps the network focus on larger contextual information while preserving the ability to detect small objects. The model is trained on a custom dataset that contains various types of drones, airplanes, birds, and helicopters under various weather conditions. The proposed model demonstrates notable performance, achieving an accuracy of 77% on the test images with only 5 million learnable parameters, in contrast to the Darknet53 + YOLOv3 model, which achieves 54% accuracy on the same test set despite employing 62 million learnable parameters.
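
For readers who want to experiment with the kind of modification described above, the following minimal PyTorch sketch (not the authors' code) shows a bypass, i.e. residual, block using leaky ReLU together with an extra stride-2 convolution; all channel counts and layer sizes are illustrative assumptions.

```python
# Sketch of the abstract's two modifications: a bypass (residual) block with
# leaky ReLU, and an extra stride-2 convolution that halves spatial size.
import torch
import torch.nn as nn

class BypassBlock(nn.Module):
    """Residual block: the skip connection mitigates vanishing gradients."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)  # leaky ReLU avoids the "dying ReLU" problem

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)      # bypass connection

class DownsampleHead(nn.Module):
    """Extra stride-2 convolution that enlarges the receptive field."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.conv(x))

x = torch.randn(1, 64, 104, 104)               # dummy feature map
y = DownsampleHead(64, 128)(BypassBlock(64)(x))
print(y.shape)                                  # torch.Size([1, 128, 52, 52])
```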

14 pages, 6859 KiB  
Communication
Addressing Challenges in Port Depth Analysis: Integrating Machine Learning and Spatial Information for Accurate Remote Sensing of Turbid Waters
by Xin Li, Zhongqiang Wu and Wei Shen
Sensors 2024, 24(12), 3802; https://doi.org/10.3390/s24123802 - 12 Jun 2024
Cited by 2 | Viewed by 872
Abstract
Bathymetry estimation is essential for various applications in port management, navigation safety, marine engineering, and environmental monitoring. Satellite remote sensing data can rapidly capture the bathymetry of target shallow waters, and researchers have developed various models to invert water depth from satellite data. Geographically weighted regression (GWR) is a common method for satellite-based bathymetry estimation. However, in sediment-laden water environments, especially ports, suspended materials significantly degrade the performance of GWR for depth inversion. This study proposes a novel approach that integrates GWR with Random Forest (RF) techniques, using longitude, latitude, and multispectral remote sensing reflectance as input variables. The approach effectively addresses the challenge of estimating bathymetry in turbid waters by exploiting the strong correlation between water depth and geographical location. The proposed method not only overcomes the limitations posed by turbid waters but also improves the accuracy of depth inversion in such complex aquatic settings, with significant implications for port management, navigational safety, and environmental monitoring in sediment-laden maritime zones.
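
As a rough illustration of the spatially aware regression idea, here is a minimal scikit-learn sketch, not the authors' implementation: it shows only the RF half, with coordinates included as features, and omits the GWR integration. The coordinate ranges, band count, and synthetic depth relationship are invented for the example.

```python
# Random Forest depth regression with longitude, latitude, and multispectral
# reflectance as inputs, so the model can exploit spatial correlation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
lon = rng.uniform(109.0, 109.2, n)        # hypothetical port extent
lat = rng.uniform(18.2, 18.4, n)
bands = rng.uniform(0.0, 0.2, (n, 4))     # reflectance in 4 spectral bands
# synthetic "true" depth with a spatial trend plus noise (illustration only)
depth = 3 + 40 * (lat - 18.2) + 5 * bands[:, 0] + rng.normal(0, 0.5, n)

X = np.column_stack([lon, lat, bands])
X_tr, X_te, y_tr, y_te = train_test_split(X, depth, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f} m")
```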

24 pages, 8032 KiB  
Article
GM-DETR: Research on a Defect Detection Method Based on Improved DETR
by Xin Liu, Xudong Yang, Lianhe Shao, Xihan Wang, Quanli Gao and Hongbo Shi
Sensors 2024, 24(11), 3610; https://doi.org/10.3390/s24113610 - 3 Jun 2024
Cited by 1 | Viewed by 1254
Abstract
Defect detection is an indispensable part of the industrial intelligence process. The introduction of the DETR model marked the successful application of a transformer to defect detection, achieving true end-to-end detection. However, due to the complexity of defect backgrounds, low resolutions can lead to a lack of image detail control and slow convergence of the DETR model. To address these issues, we propose a defect detection method based on an improved DETR model, called GM-DETR. We optimized the DETR model by integrating GAM global attention with CNN feature extraction and feature matching. This optimization reduces defect information diffusion and enhances global feature interaction, improving the network's ability to recognize target defects in complex backgrounds. Next, to filter out unnecessary model parameters, we propose a layer pruning strategy to optimize the decoding layer, thereby reducing the model's parameter count. In addition, to address the poor sensitivity of the original loss function to small differences in defect targets, we replaced the L1 loss in the original loss function with MSE loss to accelerate the network's convergence and improve the model's recognition accuracy. We conducted experiments on a dataset of road pothole defects to further validate the effectiveness of the GM-DETR model. The results demonstrate that the improved model exhibits better performance, with an increase in average precision of 4.9% (mAP@0.5), while reducing the parameter count by 12.9%.
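
The loss substitution is the easiest part to illustrate. The following minimal PyTorch sketch, not the released GM-DETR code, contrasts the original L1 box-regression term of DETR with the MSE replacement; the box tensors are illustrative normalized (cx, cy, w, h) values.

```python
# Swapping the L1 box-regression loss for MSE in a DETR-style criterion.
import torch
import torch.nn.functional as F

pred_boxes = torch.tensor([[0.50, 0.50, 0.20, 0.10]], requires_grad=True)
tgt_boxes  = torch.tensor([[0.52, 0.49, 0.22, 0.11]])

loss_l1  = F.l1_loss(pred_boxes, tgt_boxes)    # original DETR regression term
loss_mse = F.mse_loss(pred_boxes, tgt_boxes)   # GM-DETR replacement

print(f"L1: {loss_l1.item():.4f}  MSE: {loss_mse.item():.4f}")
loss_mse.backward()                            # gradients flow as usual
```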

18 pages, 1567 KiB  
Article
Image Classifier for an Online Footwear Marketplace to Distinguish between Counterfeit and Real Sneakers for Resale
by Joshua Onalaja, Essa Q. Shahra, Shadi Basurra and Waheb A. Jabbar
Sensors 2024, 24(10), 3030; https://doi.org/10.3390/s24103030 - 10 May 2024
Cited by 1 | Viewed by 1280
Abstract
The sneaker industry is continuing to expand at a fast rate and will be worth over USD 120 billion in the next few years. This is, in part, due to social media and online retailers building hype around releases of limited-edition sneakers, which are usually collaborations between well-known global icons and footwear companies. These limited-edition sneakers are typically released in low quantities through an online raffle system, meaning only a few people can obtain them. As expected, this causes their value to skyrocket and has created an extremely lucrative resale market for sneakers. It has also given rise to numerous counterfeit sneakers flooding the resale market, forcing online platforms to hand-verify each sneaker's authenticity, an important but time-consuming procedure that slows the selling and buying process. To speed up the authentication process, Support Vector Machines (SVMs) and a convolutional neural network (CNN) were used to classify images of fake and real sneakers, and their accuracies were compared. The results showed that the CNNs performed much better at this task than the SVMs, with some accuracies over 95%. A CNN is therefore well equipped to serve as a sneaker authenticator and will be of great benefit to the reselling industry.
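
A compact CNN of the kind compared against SVMs in this study can be sketched in a few lines. The following PyTorch model is an illustrative assumption, operating on hypothetical 128 × 128 RGB crops; the paper's exact architecture and input size may differ.

```python
# Small binary CNN: three conv/pool stages followed by a two-logit classifier.
import torch
import torch.nn as nn

class SneakerCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, 2),             # real vs. counterfeit logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SneakerCNN()
logits = model(torch.randn(4, 3, 128, 128))   # batch of 4 dummy images
print(logits.shape)                            # torch.Size([4, 2])
```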

20 pages, 32970 KiB  
Article
Faces in Event Streams (FES): An Annotated Face Dataset for Event Cameras
by Ulzhan Bissarinova, Tomiris Rakhimzhanova, Daulet Kenzhebalin and Huseyin Atakan Varol
Sensors 2024, 24(5), 1409; https://doi.org/10.3390/s24051409 - 22 Feb 2024
Cited by 2 | Viewed by 1974
Abstract
The use of event-based cameras in computer vision is a growing research direction. However, despite existing research on face detection with event cameras, a substantial gap persists in the availability of a large dataset featuring annotations for faces and facial landmarks on event streams, hampering the development of applications in this direction. In this work, we address this issue by publishing the first large and varied dataset (Faces in Event Streams) for face and facial landmark detection in direct event-based camera outputs, with a total duration of 689 min. In addition, this article presents 12 models trained on our dataset to predict bounding box and facial landmark coordinates with an mAP50 score of more than 90%. We also demonstrate real-time detection with an event-based camera using our models.
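
One common way to connect event streams to frame-based detectors, shown below purely as an illustration and not as the authors' preprocessing, is to accumulate (x, y, timestamp, polarity) events over a fixed time window into a two-channel event frame; the sensor resolution and window length are assumptions.

```python
# Accumulate raw events into a two-channel per-polarity count frame.
import numpy as np

def events_to_frame(events, height, width, t_start, t_len):
    """events: structured array with fields x, y, t (microseconds), p (0/1)."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    sel = events[(events["t"] >= t_start) & (events["t"] < t_start + t_len)]
    # one channel per polarity; event counts per pixel
    np.add.at(frame[0], (sel["y"][sel["p"] == 1], sel["x"][sel["p"] == 1]), 1)
    np.add.at(frame[1], (sel["y"][sel["p"] == 0], sel["x"][sel["p"] == 0]), 1)
    return frame

# dummy events on a 346x260 sensor (a typical DAVIS resolution)
dt = np.dtype([("x", "u2"), ("y", "u2"), ("t", "u8"), ("p", "u1")])
ev = np.zeros(1000, dt)
rng = np.random.default_rng(0)
ev["x"], ev["y"] = rng.integers(0, 346, 1000), rng.integers(0, 260, 1000)
ev["t"], ev["p"] = rng.integers(0, 33000, 1000), rng.integers(0, 2, 1000)
print(events_to_frame(ev, 260, 346, 0, 33000).sum())  # 1000.0
```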

15 pages, 1792 KiB  
Article
Rethinking Attention Mechanisms in Vision Transformers with Graph Structures
by Hyeongjin Kim and Byoung Chul Ko
Sensors 2024, 24(4), 1111; https://doi.org/10.3390/s24041111 - 8 Feb 2024
Viewed by 1847
Abstract
In this paper, we propose a new type of vision transformer (ViT) based on graph head attention (GHA). Because the multi-head attention (MHA) of a pure ViT requires many parameters and tends to lose the locality of an image, we replaced MHA with GHA by applying a graph to the attention head of the transformer. Consequently, the proposed GHA maintains both the locality and globality of the input patches and guarantees the diversity of the attention. The proposed GHA-ViT commonly outperforms pure ViT-based models trained from scratch on the small-sized CIFAR-10/100, MNIST, and MNIST-F datasets and the medium-sized ImageNet-1K dataset. A Top-1 accuracy of 81.7% was achieved on ImageNet-1K using GHA-B, a base model with approximately 29 M parameters. In addition, on CIFAR-10/100, the parameter count is reduced 17-fold relative to the existing ViT while performance improves by 0.4%/4.3%, respectively. The proposed GHA-ViT shows promising results in terms of the number of parameters, the number of operations, and the level of accuracy compared with other state-of-the-art lightweight ViT models.
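
The general idea of graph-constrained attention can be sketched as a single attention head whose score matrix is masked by a patch adjacency graph, so each patch attends only to its graph neighbours and locality is preserved. The PyTorch sketch below illustrates this concept only; it is not the authors' GHA formulation, and the 4 × 4 patch grid is an assumption.

```python
# One attention head masked by a patch adjacency graph.
import torch
import torch.nn as nn

class GraphHeadAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.scale = dim ** -0.5

    def forward(self, x, adj):
        # x: (batch, num_patches, dim); adj: (num_patches, num_patches), 0/1
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.transpose(-2, -1)) * self.scale
        scores = scores.masked_fill(adj == 0, float("-inf"))  # keep neighbours
        return scores.softmax(dim=-1) @ v

n, d = 16, 64                       # 4x4 patch grid, embedding dim 64
adj = torch.eye(n)                  # self-loops
for i in range(n):                  # 4-neighbourhood on the patch grid
    r, c = divmod(i, 4)
    for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
        if 0 <= rr < 4 and 0 <= cc < 4:
            adj[i, rr * 4 + cc] = 1
out = GraphHeadAttention(d)(torch.randn(2, n, d), adj)
print(out.shape)                    # torch.Size([2, 16, 64])
```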

21 pages, 2798 KiB  
Article
An Improved YOLOv5-Based Underwater Object-Detection Framework
by Jian Zhang, Jinshuai Zhang, Kexin Zhou, Yonghui Zhang, Hongda Chen and Xinyue Yan
Sensors 2023, 23(7), 3693; https://doi.org/10.3390/s23073693 - 3 Apr 2023
Cited by 35 | Viewed by 9011
Abstract
To date, general-purpose object-detection methods have achieved a great deal. However, challenges such as degraded image quality, complex backgrounds, and the detection of marine organisms at different scales arise when identifying underwater organisms. To solve such problems and further improve the accuracy of relevant models, this study proposes a marine biological object-detection architecture based on an improved YOLOv5 framework. First, the backbone framework of Real-Time Models for object Detection (RTMDet) is introduced. Its core module, the Cross-Stage Partial Layer (CSPLayer), includes a large convolution kernel, which allows the detection network to capture contextual information more precisely and comprehensively. Furthermore, a common convolution layer is added to the stem layer to extract more valuable information from the images efficiently. Then, the BoT3 module with the multi-head self-attention (MHSA) mechanism is added to the neck module of YOLOv5, so that the detection network performs better in scenes with dense targets and the detection accuracy is further improved. The introduction of the BoT3 module represents a key innovation of this paper. Finally, union dataset augmentation (UDA) is performed on the training set using the Minimal Color Loss and Locally Adaptive Contrast Enhancement (MLLE) image augmentation method, and the result is used as the input to the improved YOLOv5 framework. Experiments on the underwater datasets URPC2019 and URPC2020 show that the proposed framework not only alleviates the interference of underwater image degradation but also achieves an mAP@0.5 of 79.8% and 79.4%, improving the mAP@0.5 by 3.8% and 1.1%, respectively, compared with the original YOLOv8 on URPC2019 and URPC2020, demonstrating superior performance for the high-precision detection of marine organisms.
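
A BoTNet-style block, replacing the spatial convolution of a bottleneck with multi-head self-attention over the flattened feature map, conveys the flavor of the BoT3 module described above. The PyTorch sketch below is an illustrative assumption rather than the authors' exact module (positional encodings, for instance, are omitted).

```python
# Bottleneck-transformer-style block: MHSA applied to a flattened feature map.
import torch
import torch.nn as nn

class MHSABottleneck(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)        # (b, h*w, c) token sequence
        out, _ = self.attn(seq, seq, seq)         # global self-attention
        seq = self.norm(seq + out)                # residual + norm
        return seq.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(1, 256, 20, 20)                # a neck-level feature map
print(MHSABottleneck(256)(feat).shape)            # torch.Size([1, 256, 20, 20])
```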

15 pages, 5805 KiB  
Article
Real-Time Forest Fire Detection by Ensemble Lightweight YOLOX-L and Defogging Method
by Jiarun Huang, Zhili He, Yuwei Guan and Hongguo Zhang
Sensors 2023, 23(4), 1894; https://doi.org/10.3390/s23041894 - 8 Feb 2023
Cited by 29 | Viewed by 3451
Abstract
Forest fires can destroy forests and inflict great damage on ecosystems. Fortunately, video-based forest fire detection has achieved remarkable results in enabling timely and accurate fire warnings. However, traditional forest fire detection methods rely heavily on artificially designed features, while CNN-based methods require a large number of parameters. In addition, forest fire detection is easily disturbed by fog. To solve these issues, a forest fire detection method based on a lightweight YOLOX-L and a defogging algorithm, GXLD, is proposed. GXLD uses the dark channel prior to defog the input image and obtain a fog-free image. After a lightweight improvement of YOLOX-L using GhostNet, depthwise separable convolutions, and SENet, we obtain YOLOX-L-Light and use it to detect forest fires in the fog-free image. To evaluate the performance of YOLOX-L-Light and GXLD, mean average precision (mAP) was used to measure detection accuracy and the network parameter count to measure the lightweight effect. Experiments on our forest fire dataset show that the number of parameters of YOLOX-L-Light decreased by 92.6% while its mAP increased by 1.96%. The mAP of GXLD is 87.47%, which is 2.46% higher than that of YOLOX-L, and the average fps of GXLD is 26.33 at an input image size of 1280 × 720. Even in a foggy environment, GXLD can detect forest fires in real time with high accuracy, target confidence, and target integrity. These advantages of defogging, high target confidence, and high target integrity make GXLD well suited for the development of modern forest fire video detection systems.
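
The dark channel prior itself is well documented (He et al.), and a minimal NumPy/SciPy sketch of it is given below: estimate the dark channel, the atmospheric light, and the transmission map, then recover the scene radiance. The patch size and omega follow common defaults; this is a generic sketch, not the GXLD defogging code.

```python
# Single-image defogging via the dark channel prior.
import numpy as np
from scipy.ndimage import minimum_filter

def defog_dcp(img, patch=15, omega=0.95, t0=0.1):
    """img: float32 RGB image in [0, 1], shape (H, W, 3)."""
    # dark channel: per-pixel min over channels, then a local minimum filter
    dark = minimum_filter(img.min(axis=2), size=patch)
    # atmospheric light: mean colour of the brightest 0.1% dark-channel pixels
    n_top = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n_top:], dark.shape)
    A = img[idx].mean(axis=0)
    # transmission estimate, clamped to avoid division blow-up, then recovery
    t = 1.0 - omega * minimum_filter((img / A).min(axis=2), size=patch)
    t = np.clip(t, t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)

foggy = np.random.rand(240, 320, 3).astype(np.float32)  # stand-in image
print(defog_dcp(foggy).shape)                            # (240, 320, 3)
```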

Review


33 pages, 18843 KiB  
Review
Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study
by Hung-Cuong Nguyen, Thi-Hao Nguyen, Rafał Scherer and Van-Hung Le
Sensors 2023, 23(11), 5121; https://doi.org/10.3390/s23115121 - 27 May 2023
Cited by 11 | Viewed by 6267
Abstract
Human activity recognition (HAR) is an important research problem in computer vision, widely applied in building applications for human–machine interaction, monitoring, and related areas. In particular, HAR based on the human skeleton enables intuitive applications, so determining the current state of these studies is very important for selecting solutions and developing commercial products. In this paper, we present a full survey on the use of deep learning to recognize human activity from three-dimensional (3D) human skeleton data. Our research covers four types of deep learning networks for activity recognition based on extracted feature vectors: Recurrent Neural Networks (RNNs), which use extracted activity sequence features; Convolutional Neural Networks (CNNs), which use feature vectors extracted by projecting the skeleton into image space; Graph Convolutional Networks (GCNs), which use features extracted from the skeleton graph and the temporal–spatial function of the skeleton; and Hybrid Deep Neural Networks (Hybrid-DNNs), which combine many other types of features. Our survey covers models, databases, metrics, and results from 2019 to March 2023, presented in chronological order. In particular, we also carried out a comparative study of HAR based on 3D human skeletons on the KLHA3D 102 and KLYOGA3D datasets, analyzing and discussing the results obtained when applying CNN-based, GCN-based, and Hybrid-DNN-based deep learning networks.
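
To make the GCN family concrete, the following minimal PyTorch sketch applies one graph-convolution layer, with a symmetrically normalized adjacency matrix, to 3D skeleton joints; the five-joint toy skeleton is an assumption and not one of the surveyed architectures.

```python
# One graph-convolution layer over skeleton joints: features propagate along bones.
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats, adj):
        super().__init__()
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = a_hat.sum(1).pow(-0.5).diag()  # D^{-1/2}
        self.register_buffer("a_norm", d_inv_sqrt @ a_hat @ d_inv_sqrt)
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, x):
        # x: (batch, joints, feats); mix each joint with its graph neighbours
        return torch.relu(self.linear(self.a_norm @ x))

# toy 5-joint skeleton: hip-spine, spine-neck, neck-left/right shoulder
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
adj = torch.zeros(5, 5)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1
layer = SkeletonGCNLayer(3, 16, adj)                # 3D coords -> 16 features
print(layer(torch.randn(8, 5, 3)).shape)            # torch.Size([8, 5, 16])
```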
