Deep Learning-Based Neural Networks for Sensing and Imaging

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (25 January 2024) | Viewed by 38412

Special Issue Editors

Guest Editor
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
2. Center of Materials Science and Optoelectronics Engineering School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: pattern recognition; image classification; neural network; convolutional network; computer vision; object detection

Guest Editor
The Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China
Interests: computer vision; intelligent big data; data security

Special Issue Information

Dear Colleagues,

Deep learning-based neural networks have brought about a significant transformation in the field of sensing and imaging. These networks have demonstrated exceptional performance in a wide range of applications, including image classification, object detection, and segmentation. With their ability to learn from large amounts of data, deep learning-based neural networks have become a powerful tool for analyzing and interpreting complex sensory information. This Special Issue is dedicated to showcasing the latest research and developments in this field, with a focus on exploring the potential of deep learning-based neural networks for sensing and imaging. It aims to bring together researchers and experts from various disciplines to share their insights and findings, and to foster collaboration and innovation in this rapidly evolving field. By presenting cutting-edge research, the Special Issue will contribute to advancing the state of the art in deep learning-based neural networks for sensing and imaging and will pave the way for new applications and technologies.

This Special Issue covers a wide range of topics related to deep learning-based neural networks for sensing and imaging. The topics include, but are not limited to:

  1. Object detection, tracking, and classification using deep learning;
  2. Deep learning-based segmentation of medical images;
  3. Deep learning-based video analysis and processing;
  4. Deep learning-based sensor fusion for multi-modal sensing;
  5. Deep learning-based feature extraction and representation learning;
  6. Deep learning-based optimization for sensing and imaging;
  7. Deep learning-based hardware and software implementations for sensing and imaging.

Dr. Xin Ning
Prof. Dr. Wenfa Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (20 papers)


Research


21 pages, 13388 KiB  
Article
An Optimized Instance Segmentation of Underlying Surface in Low-Altitude TIR Sensing Images for Enhancing the Calculation of LSTs
by Yafei Wu, Chao He, Yao Shan, Shuai Zhao and Shunhua Zhou
Sensors 2024, 24(9), 2937; https://doi.org/10.3390/s24092937 - 5 May 2024
Viewed by 1023
Abstract
The calculation of land surface temperatures (LSTs) via low-altitude thermal infrared remote (TIR) sensing images at a block scale is gaining attention. However, the accurate calculation of LSTs requires a precise determination of the range of various underlying surfaces in the TIR images, and existing approaches face challenges in effectively segmenting these underlying surfaces. To address this challenge, this study proposes a deep learning (DL) methodology to complete the instance segmentation and quantification of underlying surfaces through a low-altitude TIR image dataset. Mask region-based convolutional neural networks were utilized for pixel-level classification and segmentation with an image dataset of 1350 annotated TIR images of an urban rail transit hub with a complex distribution of underlying surfaces. Subsequently, the hyper-parameters and architecture were optimized for the precise classification of the underlying surfaces. The algorithms were validated using 150 new TIR images, and four evaluation indicators demonstrated that the optimized algorithm outperformed the other algorithms. High-quality segmented masks of the underlying surfaces were generated, and the area of each instance was obtained by counting the true-positive pixels with values of 1. This research promotes the accurate calculation of LSTs based on low-altitude TIR sensing images.
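
The area quantification step described above amounts to counting foreground pixels per predicted mask and scaling by the ground resolution. A minimal sketch (NumPy), where the ground sampling distance gsd_m is a hypothetical parameter derived from flight altitude and camera intrinsics:

```python
import numpy as np

def instance_areas(masks: np.ndarray, gsd_m: float) -> np.ndarray:
    """Compute the area of each segmented instance.

    masks: (N, H, W) binary array, one channel per predicted instance,
           where true-positive pixels have the value 1.
    gsd_m: ground sampling distance in metres per pixel (assumed known
           from the flight altitude and camera model).
    """
    pixel_counts = masks.reshape(masks.shape[0], -1).sum(axis=1)
    return pixel_counts * gsd_m ** 2  # square metres per instance
```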

19 pages, 3464 KiB  
Article
Relative Localization and Circumnavigation of a UGV0 Based on Mixed Measurements of Multi-UAVs by Employing Intelligent Sensors
by Jia Guo, Minggang Gan and Kang Hu
Sensors 2024, 24(7), 2347; https://doi.org/10.3390/s24072347 - 7 Apr 2024
Cited by 2 | Viewed by 836
Abstract
Relative localization (RL) and circumnavigation together constitute a highly challenging problem that is crucial for the safe flight of multi-UAVs (multiple unmanned aerial vehicles). Most methods depend on external infrastructure for positioning. However, in complex environments such as forests, it is difficult to set up such infrastructure. In this paper, an approach to infrastructure-free RL estimation by multi-UAVs is investigated for circumnavigating a slowly drifting UGV0 (unmanned ground vehicle 0), where UGV0 serves as the RL and circumnavigation target. Firstly, a discrete-time direct RL estimator is proposed to ascertain the coordinates of each UAV relative to UGV0 based on intelligent sensing. Secondly, an RL fusion estimation method is proposed to obtain the final estimate of UGV0. Thirdly, an integrated estimation-control scheme is proposed for applying the RL fusion estimation method to circumnavigation. The convergence and performance are analyzed. The simulation results validate the effectiveness of the proposed algorithm for RL fusion estimation and of the integrated scheme.
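
The paper's own estimator and convergence analysis are not reproduced here; as an illustration of the fusion idea, a generic covariance-weighted (information-filter style) combination of per-UAV relative-position estimates might look like this:

```python
import numpy as np

def fuse_relative_estimates(estimates, covariances):
    """Covariance-weighted fusion of per-UAV relative-position estimates.

    A generic fusion sketch, not the paper's estimator.
    estimates:   list of (2,) arrays, each UAV's estimate of UGV0's position.
    covariances: list of (2, 2) error covariance matrices.
    """
    info = np.zeros((2, 2))
    info_state = np.zeros(2)
    for x, P in zip(estimates, covariances):
        P_inv = np.linalg.inv(P)
        info += P_inv              # accumulate information matrices
        info_state += P_inv @ x    # accumulate information states
    fused_cov = np.linalg.inv(info)
    return fused_cov @ info_state, fused_cov
```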

17 pages, 10147 KiB  
Article
Fourier Ptychographic Neural Network Combined with Zernike Aberration Recovery and Wirtinger Flow Optimization
by Xiaoli Wang, Zechuan Lin, Yan Wang, Jie Li, Xinbo Wang and Hao Wang
Sensors 2024, 24(5), 1448; https://doi.org/10.3390/s24051448 - 23 Feb 2024
Viewed by 1193
Abstract
Fourier ptychographic microscopy, as a computational imaging method, can reconstruct high-resolution images but suffers from optical aberration, which affects its imaging quality. For this reason, this paper proposes a network model for simulating the forward imaging process in the TensorFlow framework using samples and coherent transfer functions as the input. The proposed model improves upon the introduced Wirtinger flow algorithm, retaining its central idea, simplifying the calculation process, and optimizing the updates through backpropagation. In addition, Zernike polynomials are used to accurately estimate aberration. The simulation and experimental results show that this method can effectively improve the accuracy of aberration correction, maintain good correction performance in complex scenes, and reduce the influence of optical aberration on imaging quality.
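
For context, the forward imaging process that such a network simulates follows the standard Fourier ptychography model: crop the sample spectrum around the illumination-dependent centre, multiply by the coherent transfer function (optionally including a Zernike-modelled aberration phase), and record low-resolution intensity. A NumPy sketch under simplified assumptions (in-bounds crop, precomputed pupil):

```python
import numpy as np

def forward_lowres_intensity(sample, ctf, kx, ky):
    """Simulate one low-resolution capture for a given LED illumination.

    sample: complex high-resolution object (H, W).
    ctf:    coherent transfer function / pupil (h, w), possibly including
            a Zernike-modelled aberration phase (assumed precomputed).
    kx, ky: spectrum-centre offsets (pixels) set by the illumination angle.
    """
    H, W = sample.shape
    h, w = ctf.shape
    spectrum = np.fft.fftshift(np.fft.fft2(sample))
    cy, cx = H // 2 + ky, W // 2 + kx
    crop = spectrum[cy - h // 2: cy + h - h // 2,
                    cx - w // 2: cx + w - w // 2]
    lowres_field = np.fft.ifft2(np.fft.ifftshift(crop * ctf))
    return np.abs(lowres_field) ** 2  # the camera records intensity only
```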

12 pages, 2700 KiB  
Article
Monitoring Disease Severity of Mild Cognitive Impairment from Single-Channel EEG Data Using Regression Analysis
by Saleha Khatun, Bashir I. Morshed and Gavin M. Bidelman
Sensors 2024, 24(4), 1054; https://doi.org/10.3390/s24041054 - 6 Feb 2024
Cited by 3 | Viewed by 1462
Abstract
A deviation in the soundness of cognitive health is known as mild cognitive impairment (MCI), and it is important to monitor it early to prevent complicated diseases such as dementia, Alzheimer’s disease (AD), and Parkinson’s disease (PD). Traditionally, MCI severity is monitored with manual scoring using the Montreal Cognitive Assessment (MoCA). In this study, we propose a new MCI severity monitoring algorithm with regression analysis of extracted features of single-channel electro-encephalography (EEG) data by automatically generating severity scores equivalent to MoCA scores. We evaluated both multi-trial and single-trial analysis for the algorithm development. For multi-trial analysis, 590 features were extracted from the prominent event-related potential (ERP) points and corresponding time-domain characteristics, and we utilized the lasso regression technique to select the best feature set. The 13 best features were used in the classical regression techniques: multivariate regression (MR), ensemble regression (ER), support vector regression (SVR), and ridge regression (RR). The best results were observed for ER, with an RMSE of 1.6, supported by residual analysis. In single-trial analysis, we extracted a time–frequency plot image from each trial and fed it as an input to the constructed convolutional neural network (CNN). This deep CNN model resulted in an RMSE of 2.76. To our knowledge, this is the first attempt to generate automated scores for MCI severity equivalent to MoCA from single-channel EEG data with multi-trial and single-trial data.
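
A sketch of the multi-trial pipeline's shape, lasso-based selection of the strongest features followed by an ensemble regressor scored with cross-validated RMSE, is shown below; the exact estimators, folds, and hyperparameters of the study are assumptions here (scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def select_and_regress(X, y, n_features=13):
    """X: (n_subjects, 590) ERP/time-domain features; y: MoCA scores."""
    lasso = LassoCV(cv=5).fit(X, y)
    # keep the features with the largest absolute lasso coefficients
    top = np.argsort(np.abs(lasso.coef_))[-n_features:]
    reg = RandomForestRegressor(n_estimators=200, random_state=0)
    rmse = -cross_val_score(reg, X[:, top], y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    return top, rmse
```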

22 pages, 6171 KiB  
Article
Lightweight Detection Method for X-ray Security Inspection with Occlusion
by Zanshi Wang, Xiaohua Wang, Yueting Shi, Hang Qi, Minli Jia and Weijiang Wang
Sensors 2024, 24(3), 1002; https://doi.org/10.3390/s24031002 - 4 Feb 2024
Cited by 3 | Viewed by 1576
Abstract
Identifying the classes and locations of prohibited items is the target of security inspection. However, X-ray security inspection images with insufficient feature extraction, imbalance between easy and hard samples, and occlusion lead to poor detection accuracy. To address the above problems, an object-detection method based on YOLOv8 is proposed. Firstly, an ASFF (adaptive spatial feature fusion) and a weighted feature concatenation algorithm are introduced to fully extract the scale features from input images. In this way, the model can learn further details in training. Secondly, CoordAtt (coordinate attention module), which belongs to the family of hybrid attention mechanisms, is embedded to enhance the learning of features of interest. Then, the slide loss function is introduced to balance the easy samples and the hard samples. Finally, Soft-NMS (non-maximum suppression) is introduced to handle occlusion. The experimental results show that the mAP (mean average precision) reaches 90.2%, 90.5%, and 79.1% on the Easy, Hard, and Hidden sets of the PIDray public test set and 91.4% on the SIXray public test set, respectively. Contrasted with the original model, the mAP of our proposed YOLOv8n model increased by 2.7%, 3.1%, 9.3%, and 2.4%, respectively. Furthermore, the parameter count of the modified YOLOv8n model is roughly only 3 million.
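
Soft-NMS, used above to handle occlusion, decays the scores of overlapping candidate boxes instead of discarding them outright. A minimal Gaussian-penalty re-implementation (parameter values are typical defaults, not the paper's tuned settings):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding.

    boxes: (N, 4) array; scores: (N,) array. Returns kept indices.
    """
    idxs = list(range(len(boxes)))
    scores = scores.copy()
    keep = []
    while idxs:
        m = max(idxs, key=lambda i: scores[i])  # highest remaining score
        keep.append(m)
        idxs.remove(m)
        for i in idxs:
            iou = box_iou(boxes[m], boxes[i])
            scores[i] *= np.exp(-iou ** 2 / sigma)  # Gaussian penalty
        idxs = [i for i in idxs if scores[i] >= score_thresh]
    return keep
```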

17 pages, 27372 KiB  
Article
Cluster2Former: Semisupervised Clustering Transformers for Video Instance Segmentation
by Áron Fóthi, Adrián Szlatincsán and Ellák Somfai
Sensors 2024, 24(3), 997; https://doi.org/10.3390/s24030997 - 3 Feb 2024
Cited by 1 | Viewed by 1377
Abstract
A novel approach for video instance segmentation is presented using semisupervised learning. Our Cluster2Former model leverages scribble-based annotations for training, significantly reducing the need for comprehensive pixel-level masks. We augment a video instance segmenter, for example, the Mask2Former architecture, with a similarity-based constraint loss to handle partial annotations efficiently. We demonstrate that despite using lightweight annotations (only 0.5% of the annotated pixels), Cluster2Former achieves competitive performance on standard benchmarks. The approach offers a cost-effective and computationally efficient solution for video instance segmentation, especially in scenarios with limited annotation resources.
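
A loss of this family can be sketched as a contrastive objective computed only on scribble-annotated pixels: embeddings of pixels from the same scribble are pulled together and those from different scribbles pushed apart. This is one plausible reading of a similarity-based constraint loss, not the paper's exact formulation (PyTorch):

```python
import torch
import torch.nn.functional as F

def scribble_constraint_loss(embeddings, scribble_ids, margin=0.5):
    """Similarity-based constraint loss on annotated pixels only.

    embeddings:   (P, D) features of the P scribble-annotated pixels.
    scribble_ids: (P,) instance id of each annotated pixel.
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t()                                  # cosine similarities
    same = scribble_ids[:, None] == scribble_ids[None, :]
    pos = (1.0 - sim)[same].mean()                   # pull same-instance pairs
    neg = F.relu(sim - margin)[~same].mean()         # push different instances
    return pos + neg
```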

20 pages, 25820 KiB  
Article
Enhanced Out-of-Stock Detection in Retail Shelf Images Based on Deep Learning
by Franko Šikić, Zoran Kalafatić, Marko Subašić and Sven Lončarić
Sensors 2024, 24(2), 693; https://doi.org/10.3390/s24020693 - 22 Jan 2024
Cited by 1 | Viewed by 2443
Abstract
The term out-of-stock (OOS) describes a problem that occurs when shoppers come to a store and the product they are seeking is not present on its designated shelf. Missing products generate huge sales losses and may lead to a declining reputation or the loss of loyal customers. In this paper, we propose a novel deep-learning (DL)-based OOS-detection method that utilizes a two-stage training process and a post-processing technique designed for the removal of inaccurate detections. To develop the method, we utilized an OOS detection dataset that contains a commonly used fully empty OOS class and a novel class that represents the frontal OOS. We present a new image augmentation procedure in which some existing OOS instances are enlarged by duplicating and mirroring themselves over nearby products. An object-detection model is first pre-trained using only augmented shelf images and then fine-tuned on the original data. During inference, the detected OOS instances are post-processed based on their aspect ratio. In particular, the detected instances are discarded if their aspect ratio is higher than the maximum or lower than the minimum instance aspect ratio found in the dataset. The experimental results showed that the proposed method outperforms the existing DL-based OOS-detection methods and detects fully empty and frontal OOS instances with 86.3% and 83.7% average precision, respectively.
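
The aspect-ratio post-processing reduces to a simple filter; ar_min and ar_max must be measured from the training data, as described, rather than assumed:

```python
def filter_by_aspect_ratio(detections, ar_min, ar_max):
    """Discard OOS detections with implausible aspect ratios.

    detections: list of dicts with an (x1, y1, x2, y2) 'box' entry.
    ar_min/ar_max: min/max width-to-height ratio observed in the dataset.
    """
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        ar = (x2 - x1) / max(y2 - y1, 1e-9)
        if ar_min <= ar <= ar_max:
            kept.append(det)
    return kept
```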

16 pages, 17822 KiB  
Article
Efficient Defect Detection of Rotating Goods under the Background of Intelligent Retail
by Zhengming Hu, Xuepeng Zeng, Kai Xie, Chang Wen, Jianbiao He and Wei Zhang
Sensors 2024, 24(2), 467; https://doi.org/10.3390/s24020467 - 12 Jan 2024
Cited by 1 | Viewed by 1293
Abstract
Dynamic visual vending machines are rapidly growing in popularity, offering convenience and speed to customers. However, there is a prevalent issue with consumers damaging goods and then returning them to the machine, severely affecting business interests. This paper addresses the issue from the standpoint of defect detection. Although existing industrial defect detection algorithms, such as PatchCore, perform well, they face challenges, including handling goods in various orientations, detection speeds that do not meet real-time monitoring requirements, and complex backgrounds that hinder detection accuracy. These challenges limit their application in dynamic vending environments. It is crucial to note that efficient visual features play a vital role in memory banks, yet current memory repositories for industrial inspection algorithms do not adequately address the problem of location-specific feature redundancy. To tackle these issues, this paper introduces a novel defect detection algorithm for goods using adaptive subsampling and partitioned memory banks. Firstly, Grad-CAM is utilized to extract deep features, which, in combination with shallow features, mitigate the impact of complex backgrounds on detection accuracy. Next, graph convolutional networks extract rotationally invariant features. The adaptive subsampling partitioned memory bank is then employed to store features of non-defective goods, which reduces memory consumption and enhances training speed. Experimental results on the MVTec AD dataset demonstrate that the proposed algorithm achieves a marked improvement in detection speed while maintaining accuracy comparable to state-of-the-art models.
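
At inference time, memory-bank methods of this kind score a test image by the distance of its patch features to the stored features of non-defective goods. A simplified nearest-neighbour sketch, omitting the paper's adaptive subsampling and location-based partitioning:

```python
import numpy as np

def anomaly_score(patch_features, memory_bank):
    """PatchCore-style scoring against a memory bank of normal features.

    patch_features: (P, D) features of one test image's patches.
    memory_bank:    (M, D) stored features of non-defective goods.
    """
    # distance of each patch to its closest normal feature
    d = np.linalg.norm(patch_features[:, None, :] - memory_bank[None, :, :],
                       axis=-1).min(axis=1)
    return d.max()  # image-level score: the most anomalous patch
```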

20 pages, 2736 KiB  
Article
Deep Learning Based CAPTCHA Recognition Network with Grouping Strategy
by Zaid Derea, Beiji Zou, Asma A. Al-Shargabi, Alaa Thobhani and Amr Abdussalam
Sensors 2023, 23(23), 9487; https://doi.org/10.3390/s23239487 - 29 Nov 2023
Cited by 2 | Viewed by 3098
Abstract
Websites can improve their security and protect against harmful Internet attacks by incorporating CAPTCHA verification, which assists in distinguishing between human users and robots. Among the various types of CAPTCHA, the most prevalent variant involves text-based challenges that are intentionally designed to be easily understandable by humans while presenting a difficulty for machines or robots in recognizing them. Nevertheless, due to significant advancements in deep learning, constructing convolutional neural network (CNN)-based models that possess the capability of effectively recognizing text-based CAPTCHAs has become considerably simpler. In this regard, we present a CAPTCHA recognition method that entails creating multiple duplicates of the original CAPTCHA images and generating separate binary images that encode the exact locations of each group of CAPTCHA characters. These replicated images are subsequently fed into a well-trained CNN, one after another, for obtaining the final output characters. The model possesses a straightforward architecture with a relatively small storage footprint, eliminating the need for CAPTCHA segmentation into individual characters. Following the training and testing of the suggested CNN model for CAPTCHA recognition, the experimental results demonstrate the model’s effectiveness in accurately recognizing CAPTCHA characters.
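
The grouping strategy can be illustrated as follows: the CAPTCHA is duplicated once per character group, each copy paired with a binary image marking that group's location, and the pairs are fed to the CNN one after another. The box format is an assumption for illustration:

```python
import numpy as np

def make_grouped_inputs(captcha, group_boxes):
    """Create one (image, location-mask) input per character group.

    captcha:     (H, W) or (H, W, C) image array.
    group_boxes: list of (x1, y1, x2, y2) boxes, one per character group
                 (the box format is assumed here for illustration).
    """
    H, W = captcha.shape[:2]
    inputs = []
    for (x1, y1, x2, y2) in group_boxes:
        mask = np.zeros((H, W), dtype=np.float32)
        mask[y1:y2, x1:x2] = 1.0          # binary location encoding
        inputs.append(np.dstack([captcha, mask]))  # extra channel for the CNN
    return inputs  # fed to the trained CNN one after another
```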

21 pages, 3858 KiB  
Article
Miner Fatigue Detection from Electroencephalogram-Based Relative Power Spectral Topography Using Convolutional Neural Network
by Lili Xu, Jizu Li and Ding Feng
Sensors 2023, 23(22), 9055; https://doi.org/10.3390/s23229055 - 9 Nov 2023
Cited by 2 | Viewed by 1637
Abstract
Fatigue of miners is caused by intensive workloads, long working hours, and shift-work schedules. It is one of the major factors increasing the risk of safety problems and work mistakes. Examining the detection of miner fatigue is important because it can potentially prevent work accidents and improve working efficiency in underground coal mines. Many previous studies have introduced feature-based machine-learning methods to estimate miner fatigue. This work proposes a method that uses electroencephalogram (EEG) signals to generate topographic maps containing frequency and spatial information. It utilizes a convolutional neural network (CNN) to classify the normal state, critical state, and fatigue state of miners. The topographic maps are generated from the EEG signals and contrasted using power spectral density (PSD) and relative power spectral density (RPSD). These two feature extraction methods were evaluated with four representative deep-learning methods. The results show that RPSD achieves better classification accuracy than PSD with all deep-learning methods. The CNN achieved superior results to the other deep-learning methods, with an accuracy of 94.5%, precision of 97.0%, sensitivity of 94.8%, and F1 score of 96.3%. Our results also show that the RPSD–CNN method outperforms the current state of the art. Thus, this method might be a useful and effective miner fatigue detection tool for coal companies in the near future.
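
Relative power spectral density normalizes each band's power by the total power, which removes inter-subject amplitude differences before the topographic maps are rendered. A sketch using Welch's method (SciPy); the band edges are conventional EEG bands, not necessarily the paper's:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def relative_band_power(eeg, fs):
    """Relative PSD per channel and frequency band.

    eeg: (channels, samples) array; fs: sampling rate in Hz.
    Returns a dict mapping band name -> (channels,) relative power,
    ready to be interpolated over scalp positions into a topographic map.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    total = psd.sum(axis=-1)                # total power per channel
    rel = {}
    for name, (lo, hi) in BANDS.items():
        band = (freqs >= lo) & (freqs < hi)
        rel[name] = psd[:, band].sum(axis=-1) / total
    return rel
```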

16 pages, 7500 KiB  
Article
Illumination-Based Color Reconstruction for the Dynamic Vision Sensor
by Khen Cohen, Omer Hershko, Homer Levy, David Mendlovic and Dan Raviv
Sensors 2023, 23(19), 8327; https://doi.org/10.3390/s23198327 - 9 Oct 2023
Viewed by 1315
Abstract
This work demonstrates a novel, state-of-the-art method to reconstruct colored images via the dynamic vision sensor (DVS). The DVS is an image sensor that indicates only a binary change in brightness, with no information about the captured wavelength (color) or intensity level. However, the reconstruction of the scene’s color could be essential for many tasks in computer vision and DVS applications. We present a novel method for reconstructing a full spatial resolution, colored image utilizing the DVS and an active colored light source. We analyze the DVS response and present two reconstruction algorithms: linear-based and convolutional-neural-network-based. Our two presented methods reconstruct the colored image with high quality, and they do not suffer from any spatial resolution degradation as other methods do. In addition, we demonstrate the robustness of our algorithm to changes in environmental conditions, such as illumination and distance. Finally, we show that we reach state-of-the-art results compared with previous works. We share our code on GitHub.

20 pages, 9691 KiB  
Article
Towards Automated Measurement of As-Built Components Using Computer Vision
by Husein Perez and Joseph H. M. Tah
Sensors 2023, 23(16), 7110; https://doi.org/10.3390/s23167110 - 11 Aug 2023
Cited by 3 | Viewed by 1671
Abstract
Regular inspections during construction work ensure that the completed work aligns with the plans and specifications and that it is within the planned time and budget. This requires frequent physical site observations to independently measure and verify the completion percentage of the construction progress performed over periods of time. The current computer vision techniques for measuring as-built elements predominantly employ three-dimensional laser scanning or three-dimensional photogrammetry modeling to ascertain the geometric properties of as-built elements on construction sites. Both techniques require data acquisition from several positions and angles to generate sufficient information about the element’s coordinates, making the deployment of these techniques on dynamic construction project sites challenging. This paper proposes a pipeline for automating the measurement of as-built components using artificial intelligence and computer vision techniques. The pipeline requires a single image obtained with a stereo camera system to measure the sizes of selected objects or as-built components. The results in this work were demonstrated by measuring the sizes of concrete walls and columns. The novelty of this work is attributed to the use of a single image and a single target for developing a fully automated computer vision-based method for measuring any given object. The proposed solution is suitable for use in measuring the sizes of as-built components in built assets. It has the potential to be further developed and integrated with building information modelling applications for use on construction projects for progress monitoring.
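
The underlying stereo geometry is compact enough to state directly: depth follows from disparity, and metric size follows from depth. The sketch below shows only this relation, not the paper's full pipeline, which first detects and segments the component:

```python
def object_size_from_stereo(pixel_extent, disparity, focal_px, baseline_m):
    """Metric size of an object from a single stereo capture.

    Standard pinhole stereo relations:
        depth = f * B / d,   size = extent_px * depth / f
    pixel_extent: object extent in pixels (e.g. wall width in the image).
    disparity:    pixel disparity of the object between the two views.
    focal_px:     focal length in pixels; baseline_m: camera baseline (m).
    """
    depth_m = focal_px * baseline_m / disparity
    return pixel_extent * depth_m / focal_px
```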

23 pages, 16853 KiB  
Article
A Real-Time Semantic Segmentation Method Based on STDC-CT for Recognizing UAV Emergency Landing Zones
by Bo Jiang, Zhonghui Chen, Jintao Tan, Ruokun Qu, Chenglong Li and Yandong Li
Sensors 2023, 23(14), 6514; https://doi.org/10.3390/s23146514 - 19 Jul 2023
Cited by 2 | Viewed by 2042
Abstract
With the accelerated growth of the UAV industry, researchers are paying close attention to the flight safety of UAVs. When a UAV loses its GPS signal or encounters unusual conditions, it must perform an emergency landing. Therefore, real-time recognition of emergency landing zones on the ground is an important research topic. This paper employs a semantic segmentation approach for recognizing emergency landing zones. First, we created a dataset of UAV aerial images, denoted as UAV-City. A total of 600 UAV aerial images were densely annotated with 12 semantic categories. Given the complex backgrounds, diverse categories, and small UAV aerial image targets, we propose the STDC-CT real-time semantic segmentation network for UAV recognition of emergency landing zones. The STDC-CT network is composed of three branches: detail guidance, small object attention extractor, and multi-scale contextual information. The fusion of detailed and contextual information branches is guided by small object attention. We conducted extensive experiments on the UAV-City, Cityscapes, and UAVid datasets to demonstrate that the STDC-CT method is superior for attaining a balance between segmentation accuracy and inference speed. Our method improves the segmentation accuracy of small objects and achieves 76.5% mIoU on the Cityscapes test set at 122.6 FPS, 68.4% mIoU on the UAVid test set, and 67.3% mIoU on the UAV-City dataset at 196.8 FPS on an NVIDIA RTX 2080Ti GPU. Finally, we deployed the STDC-CT model on Jetson TX2 for testing in a real-world environment, attaining real-time semantic segmentation with an average inference speed of 58.32 ms per image.

20 pages, 10422 KiB  
Article
ESAMask: Real-Time Instance Segmentation Fused with Efficient Sparse Attention
by Qian Zhang, Lu Chen, Mingwen Shao, Hong Liang and Jie Ren
Sensors 2023, 23(14), 6446; https://doi.org/10.3390/s23146446 - 16 Jul 2023
Cited by 1 | Viewed by 2135
Abstract
Instance segmentation is a challenging task in computer vision, as it requires distinguishing objects and predicting dense areas. Currently, segmentation models based on complex designs and large parameters have achieved remarkable accuracy. However, from a practical standpoint, achieving a balance between accuracy and speed is even more desirable. To address this need, this paper presents ESAMask, a real-time segmentation model fused with efficient sparse attention, which adheres to the principles of lightweight design and efficiency. In this work, we propose several key contributions. Firstly, we introduce a dynamic and sparse Related Semantic Perceived Attention mechanism (RSPA) for adaptive perception of different semantic information of various targets during feature extraction. RSPA uses the adjacency matrix to search for regions with high semantic correlation of the same target, which reduces computational cost. Additionally, we design the GSInvSAM structure to reduce redundant calculations of spliced features while enhancing interaction between channels when merging feature layers of different scales. Lastly, we introduce the Mixed Receptive Field Context Perception Module (MRFCPM) in the prototype branch to enable targets of different scales to capture the feature representation of the corresponding area during mask generation. MRFCPM fuses information from three branches of global content awareness, large kernel region awareness, and convolutional channel attention to explicitly model features at different scales. Through extensive experimental evaluation, ESAMask achieves a mask AP of 45.4 at a frame rate of 45.2 FPS on the COCO dataset, surpassing current instance segmentation methods in terms of the accuracy–speed trade-off. In addition, the high-quality segmentation results of our proposed method for objects of various classes and scales can be intuitively observed from the visualized segmentation outputs.

17 pages, 1912 KiB  
Article
Non-Contact Assessment of Swallowing Dysfunction Using Smartphone Captured Skin Displacements
by Nikyta Chesney, Prashanna Khwaounjoo, Maggie-Lee Huckabee and Yusuf Ozgur Cakmak
Sensors 2023, 23(12), 5392; https://doi.org/10.3390/s23125392 - 7 Jun 2023
Cited by 1 | Viewed by 1961
Abstract
Early and accurate dysphagia diagnosis is essential for reducing the risk of associated co-morbidities and mortalities. Barriers to current evaluation methods may alter the effectiveness of identifying at-risk patients. This preliminary study evaluates the feasibility of using iPhone X-captured videos of swallowing as a non-contact dysphagia screening tool. Video recordings of the anterior and lateral necks were captured simultaneously with videofluoroscopy in dysphagic patients. Videos were analyzed using an image registration algorithm (phase-based Savitzky–Golay gradient correlation (P-SG-GC)) to determine skin displacements over hyolaryngeal regions. Biomechanical swallowing parameters of hyolaryngeal displacement and velocity were also measured. Swallowing safety and efficiency were assessed by the Penetration Aspiration Scale (PAS), Residue Severity Ratings (RSR), and the Normalized Residue Ratio Scale (NRRS). Anterior hyoid excursion and horizontal skin displacements were strongly correlated with swallows of a 20 mL bolus (rs = 0.67). Skin displacements of the neck were moderately to very strongly correlated with scores on the PAS (rs = 0.80), NRRS (rs = 0.41–0.62), and RSR (rs = 0.33). This is the first study to utilize smartphone technology and image registration methods to produce skin displacements indicating post-swallow residual and penetration-aspiration. Enhancing screening methods provides a greater chance of detecting dysphagia, reducing the risk of negative health impacts.
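
Phase-based registration estimates displacements from the spectral phase of two frames. As a simplified stand-in for the P-SG-GC algorithm, classic phase correlation illustrates the principle (NumPy):

```python
import numpy as np

def phase_correlation_shift(frame_a, frame_b):
    """Estimate the (dy, dx) displacement between two grayscale frames.

    Classic phase correlation, shown as a simplified stand-in for the
    paper's phase-based Savitzky-Golay gradient correlation (P-SG-GC).
    """
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    cross = Fa * np.conj(Fb)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-9))
    peak = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    shifts = np.array(peak, dtype=float)
    # wrap shifts larger than half the frame to negative displacements
    for i, n in enumerate(corr.shape):
        if shifts[i] > n // 2:
            shifts[i] -= n
    return shifts  # (dy, dx) in pixels; track over frames for displacement
```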

18 pages, 7198 KiB  
Article
Classification and Recognition of Building Appearance Based on Optimized Gradient-Boosted Decision Tree Algorithm
by Mengting Hu, Lingxiang Guo, Jing Liu and Yuxuan Song
Sensors 2023, 23(11), 5353; https://doi.org/10.3390/s23115353 - 5 Jun 2023
Cited by 2 | Viewed by 1704
Abstract
Urban spaces are becoming highly concentrated and land use types increasingly complex, so providing an efficient and scientific identification of building types has become a major challenge in urban architectural planning. This study used an optimized gradient-boosted decision tree algorithm to enhance a decision tree model for building classification. Through supervised classification learning, machine learning training was conducted using a business-type weighted database. We innovatively established a form database to store input items. During parameter optimization, parameters such as the number of nodes, maximum depth, and learning rate were gradually adjusted based on the performance of the validation set to achieve optimal performance under the same conditions. Simultaneously, a k-fold cross-validation method was used to avoid overfitting. The model clusters produced during training correspond to various city sizes. By setting the parameters that determine the land area of a target city, the corresponding classification model can be invoked. The experimental results show that this algorithm has high accuracy in building recognition. In particular, for R-, S-, and U-class buildings, the overall recognition accuracy reaches over 94%.
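
The tuning procedure described, adjusting depth, learning rate, and tree count against a validation set with k-fold cross-validation, maps naturally onto a grid search. An illustrative scikit-learn setup with placeholder grid values, not the study's own ranges:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold

def tune_gbdt(X, y):
    """X: business-type weighted features per building; y: class labels."""
    grid = {
        "max_depth": [3, 5, 7],          # placeholder values
        "learning_rate": [0.05, 0.1, 0.2],
        "n_estimators": [100, 300],
    }
    search = GridSearchCV(
        GradientBoostingClassifier(),
        grid,
        cv=KFold(n_splits=5, shuffle=True, random_state=0),  # k-fold CV
    )
    return search.fit(X, y).best_estimator_
```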

19 pages, 6439 KiB  
Article
Transformer-Based Semantic Segmentation for Extraction of Building Footprints from Very-High-Resolution Images
by Jia Song, A-Xing Zhu and Yunqiang Zhu
Sensors 2023, 23(11), 5166; https://doi.org/10.3390/s23115166 - 29 May 2023
Cited by 5 | Viewed by 3049
Abstract
Semantic segmentation with deep learning networks has become an important approach to the extraction of objects from very high-resolution remote sensing images. Vision Transformer networks have shown significant improvements in performance compared to traditional convolutional neural networks (CNNs) in semantic segmentation. Vision Transformer networks have architectures that differ from those of CNNs. Image patches, linear embedding, and multi-head self-attention (MHSA) are several of their main configurable components. How they should be configured for the extraction of objects in VHR images and how they affect the accuracy of networks are topics that have not been sufficiently investigated. This article explores the role of vision Transformer networks in the extraction of building footprints from very-high-resolution (VHR) images. Transformer-based models with different hyperparameter values were designed and compared, and their impact on accuracy was analyzed. The results show that smaller image patches and higher-dimension embeddings result in better accuracy. In addition, the Transformer-based network is shown to be scalable and can be trained with general-scale graphics processing units (GPUs) with comparable model sizes and training times to convolutional neural networks while achieving higher accuracy. The study provides valuable insights into the potential of vision Transformer networks in object extraction using VHR images.
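
The two hyperparameters the study varies, patch size and embedding dimension, enter the network in its first layer. A minimal PyTorch patch-embedding module makes their roles concrete (default values here are illustrative, not the study's best configuration):

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and linearly embed them.

    patch_size and embed_dim are the knobs discussed above: smaller
    patches and higher-dimension embeddings were reported to give
    better accuracy.
    """
    def __init__(self, patch_size=8, in_ch=3, embed_dim=384):
        super().__init__()
        # a conv with stride == kernel == patch_size is the standard
        # patchify + linear embedding in one operation
        self.proj = nn.Conv2d(in_ch, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, D)
```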

16 pages, 7247 KiB  
Article
SALSA-Net: Explainable Deep Unrolling Networks for Compressed Sensing
by Heping Song, Qifeng Ding, Jingyao Gong, Hongying Meng and Yuping Lai
Sensors 2023, 23(11), 5142; https://doi.org/10.3390/s23115142 - 28 May 2023
Cited by 1 | Viewed by 2268
Abstract
Deep unrolling networks (DUNs) have emerged as a promising approach for solving compressed sensing (CS) problems due to their superior explainability, speed, and performance compared to classical deep network models. However, the CS performance in terms of efficiency and accuracy remains a principal challenge for further improvement. In this paper, we propose a novel deep unrolling model, SALSA-Net, to solve the image CS problem. The network architecture of SALSA-Net is inspired by unrolling and truncating the split augmented Lagrangian shrinkage algorithm (SALSA), which is used to solve sparsity-induced CS reconstruction problems. SALSA-Net inherits the interpretability of the SALSA algorithm while incorporating the learning ability and fast reconstruction speed of deep neural networks. By converting the SALSA algorithm into a deep network structure, SALSA-Net consists of a gradient update module, a threshold denoising module, and an auxiliary update module. All parameters, including the shrinkage thresholds and gradient steps, are optimized through end-to-end learning and are subject to forward constraints to ensure faster convergence. Furthermore, we introduce learned sampling to replace traditional sampling methods so that the sampling matrix can better preserve the feature information of the original signal and improve sampling efficiency. Experimental results demonstrate that SALSA-Net achieves significant reconstruction performance compared to state-of-the-art methods while inheriting the advantages of explainable recovery and high speed from the DUNs paradigm.
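
One unrolled stage of such a network pairs a gradient update on the data-fidelity term with a learned soft-thresholding (denoising) step. The schematic PyTorch stage below follows that structure; the auxiliary update module and the forward constraints on the learned parameters are omitted:

```python
import torch
import torch.nn as nn

class UnrolledStage(nn.Module):
    """One schematic DUN stage: gradient update + learned soft-threshold."""

    def __init__(self):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))    # learned gradient step
        self.theta = nn.Parameter(torch.tensor(0.01))  # learned threshold

    def forward(self, x, y, A):
        """x: (n,) current estimate; y: (m,) measurements; A: (m, n)."""
        # gradient step on the data-fidelity term ||Ax - y||^2
        x = x - self.step * A.t() @ (A @ x - y)
        # soft-threshold denoising (sparsity prior)
        return torch.sign(x) * torch.clamp(x.abs() - self.theta, min=0.0)
```

Stacking a fixed number of such stages and training them end to end is what makes the recovery both fast and interpretable, since each stage mirrors one iteration of the underlying optimization algorithm.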

16 pages, 2811 KiB  
Article
Generating Defective Epoxy Drop Images for Die Attachment in Integrated Circuit Manufacturing via Enhanced Loss Function CycleGAN
by Lamia Alam and Nasser Kehtarnavaz
Sensors 2023, 23(10), 4864; https://doi.org/10.3390/s23104864 - 18 May 2023
Cited by 3 | Viewed by 1694
Abstract
In integrated circuit manufacturing, defects in epoxy drops for die attachments are required to be identified during production. Modern identification techniques based on vision-based deep neural networks require the availability of a very large number of defect and non-defect epoxy drop images. In practice, however, very few defective epoxy drop images are available. This paper presents a generative adversarial network solution to generate synthesized defective epoxy drop images as a data augmentation approach so that vision-based deep neural networks can be trained or tested using such images. More specifically, the so-called CycleGAN variation of the generative adversarial network is used by enhancing its cycle consistency loss function with two other loss functions consisting of learned perceptual image patch similarity (LPIPS) and a structural similarity index metric (SSIM). The results obtained indicate that when using the enhanced loss function, the quality of the synthesized defective epoxy drop images is improved by 59%, 12%, and 131% for the metrics of the peak signal-to-noise ratio (PSNR), universal image quality index (UQI), and visual information fidelity (VIF), respectively, compared to the CycleGAN standard loss function. A typical image classifier is used to show the improvement in the identification outcome when using the synthesized images generated by the developed data augmentation approach.
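
The enhanced objective adds LPIPS and SSIM terms to the usual L1 cycle-consistency loss. A hedged sketch using the public lpips and pytorch-msssim packages; the weightings are placeholders, not the paper's values:

```python
import torch
import lpips                      # pip install lpips
from pytorch_msssim import ssim   # pip install pytorch-msssim

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance network

def enhanced_cycle_loss(real, reconstructed,
                        w_l1=10.0, w_lpips=1.0, w_ssim=1.0):
    """Cycle-consistency loss augmented with LPIPS and SSIM terms.

    real, reconstructed: (B, 3, H, W) tensors scaled to [-1, 1].
    The weights are illustrative placeholders.
    """
    l1 = torch.nn.functional.l1_loss(reconstructed, real)
    lp = lpips_fn(reconstructed, real).mean()
    # SSIM expects non-negative inputs; rescale [-1, 1] -> [0, 1]
    ss = 1.0 - ssim((reconstructed + 1) / 2, (real + 1) / 2, data_range=1.0)
    return w_l1 * l1 + w_lpips * lp + w_ssim * ss
```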

Review


13 pages, 6635 KiB  
Review
Application of Deep Learning and Intelligent Sensing Analysis in Smart Home
by Yi Lu, Lejia Zhou, Aili Zhang, Siyu Zha, Xiaojie Zhuo and Sen Ge
Sensors 2024, 24(3), 953; https://doi.org/10.3390/s24030953 - 1 Feb 2024
Cited by 3 | Viewed by 2489
Abstract
Deep learning technology can improve sensing efficiency and discover potential patterns in data; it has further improved the efficiency of user behavior recognition in the field of smart homes, making the recognition process more intelligent and humanized. This paper analyzes the optical sensors commonly used in smart homes and their working principles through case studies and explores the technical framework of user behavior recognition based on optical sensors. At the same time, CiteSpace (Basic version 6.2.R6) software is used to visualize and analyze the related literature, elaborate the main research hotspots and evolutionary changes of optical sensor-based smart home user behavior recognition, and summarize future research trends. Finally, fully utilizing the advantages of cloud computing technology, such as scalability and on-demand services, and combining typical life situations with the requirements of smart home users, a smart home data collection and processing framework based on elderly fall monitoring scenarios is designed. Based on the comprehensive research results, the application and positive impact of optical sensors in smart home user behavior recognition are analyzed, providing inspiration for future smart home user experience research.
