Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 February 2024) | Viewed by 16308

Special Issue Editors


Guest Editor
School of Automation, Central South University, Changsha 410083, China
Interests: image processing; pattern recognition; artificial intelligence; object detection

Guest Editor
College of Information Science and Technology, Dalian Maritime University, Dalian, China
Interests: image processing; pattern recognition; deep learning; object detection

Guest Editor
Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China
Interests: image processing; pattern recognition; artificial intelligence; object detection

Special Issue Information

Dear Colleagues,

Object detection is a mainstream and challenging branch of computer vision, and it has attracted considerable research attention in recent years because of its close relationship with video analysis and image understanding. Traditional object detection methods are built on handcrafted features and shallow trainable architectures; their performance stagnates easily, since further gains require constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more promising methods have been proposed for object detection. Deep learning-based models are able to learn semantic, high-level, deep features; however, their network architectures, training strategies, and optimization functions are still worth investigating. Because these models are data-driven and rely on their training datasets, it is essential to build object detection datasets and to investigate few-shot learning methods. Furthermore, achieving real-time object detection requires improving model efficiency and designing lightweight models. This Special Issue is aimed at addressing the following topics:

  • Salient object detection, face detection and pedestrian detection.
  • Object detection architecture design.
  • SAR object detection.
  • Infrared object detection.
  • Real-time object detection.
  • Object detection dataset construction.
  • Few-shot object detection.
  • Small object detection.
  • Underwater object detection.
  • Image enhancement and image generation.
  • Data augmentation.

Dr. Junchao Zhang
Dr. Moran Ju
Dr. Xiangyue Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • deep learning
  • dataset building
  • data augmentation
  • image enhancement
  • real-time object detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

22 pages, 5059 KiB  
Article
Multi-Dimensional Information Fusion You Only Look Once Network for Suspicious Object Detection in Millimeter Wave Images
by Zhenhong Chen, Ruijiao Tian, Di Xiong, Chenchen Yuan, Tang Li and Yiran Shi
Electronics 2024, 13(4), 773; https://doi.org/10.3390/electronics13040773 - 16 Feb 2024
Cited by 2 | Viewed by 1135
Abstract
Millimeter wave (MMW) imaging systems have been widely used for security screening in public places due to their ability to detect a variety of suspicious objects, their non-contact operation, and their harmlessness to the human body. In this study, we propose an innovative multi-dimensional information fusion YOLO network that can aggregate and capture multimodal information to cope with the challenges of low resolution and susceptibility to noise in MMW images. In particular, an MMW data information aggregation module is developed to adaptively synthesize a novel type of MMW image, which simultaneously contains pixel, depth, phase, and diverse signal-to-noise information, overcoming the limitation of current MMW images carrying identical pixel information in all three channels. Furthermore, this module is capable of differentiable data enhancement to account for adverse noise conditions in real application scenarios. To fully exploit the aggregated contextual information mentioned above, we propose an asymptotic path aggregation network and combine it with YOLOv8. The proposed method adaptively and bidirectionally fuses deep and shallow features while avoiding semantic gaps. In addition, a multi-view, multi-parameter mapping technique is designed to enhance detection ability. Experiments on measured MMW datasets validate the improvement in object detection achieved by the proposed model. Full article
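
The core aggregation idea, replacing three identical pixel channels with complementary modalities, can be sketched in a few lines. Everything below (the function name, the min-max normalization, the choice of exactly three modalities) is an illustrative assumption, not the paper's actual module, which is learned and differentiable:

```python
import numpy as np

def aggregate_mmw_channels(pixel, depth, phase):
    """Normalize each MMW modality to [0, 1] and stack them into one
    3-channel image, so each channel carries distinct information
    instead of three copies of the same pixel map (hypothetical layout)."""
    def norm(x):
        x = x.astype(np.float64)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return np.stack([norm(pixel), norm(depth), norm(phase)], axis=-1)

# Example: a 4x4 scene with a strong reflector at (1, 2)
pixel = np.zeros((4, 4)); pixel[1, 2] = 10.0
depth = np.full((4, 4), 2.5); depth[1, 2] = 1.0          # reflector is closer
phase = np.random.default_rng(0).uniform(0, np.pi, (4, 4))
fused = aggregate_mmw_channels(pixel, depth, phase)
print(fused.shape)   # (4, 4, 3)
```

A detector consuming `fused` then sees intensity, range, and phase cues jointly rather than a replicated grayscale image.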

14 pages, 4760 KiB  
Article
SC-YOLOv8: A Security Check Model for the Inspection of Prohibited Items in X-ray Images
by Li Han, Chunhai Ma, Yan Liu, Junyang Jia and Jiaxing Sun
Electronics 2023, 12(20), 4208; https://doi.org/10.3390/electronics12204208 - 11 Oct 2023
Cited by 8 | Viewed by 2280
Abstract
X-ray package security check systems are widely used in public places, but they face difficulties in accurately detecting prohibited items due to the stacking and diverse shapes of the objects inside luggage, posing a threat to personal safety in public places. Existing methods for X-ray image object detection suffer from low accuracy and poor generalization, mainly due to the lack of large-scale, high-quality datasets. To address this gap, a novel large-scale X-ray image dataset for object detection, LSIray, is provided, consisting of high-quality X-ray images of luggage and objects of 21 types and sizes. LSIray covers common categories that were neglected in previous research, providing more realistic and richer data resources for X-ray image object detection. To address the problem of poor security inspection performance, an improved model based on YOLOv8 is proposed, named SC-YOLOv8, consisting of two new modules: the CSPnet Deformable Convolution Network Module (C2F_DCN) and the Spatial Pyramid Multi-Head Attention Module (SPMA). C2F_DCN uses deformable convolution, which can adaptively adjust the position and shape of the receptive field to accommodate the diversity of targets. SPMA adopts a spatial pyramid head attention mechanism, which can utilize feature information from different scales and perspectives to enhance the representation of targets. The proposed method is evaluated through extensive experiments on the LSIray dataset and comparisons with existing methods. The results show that the method surpasses the state-of-the-art methods on various indicators. On the LSIray and OPIXray datasets, our SC-YOLOv8 model achieves detection accuracies of 82.7% and 89.2%, improvements of 1.4% and 1.2%, respectively, over the YOLOv8 baseline. This work not only provides valuable data resources, but also offers a novel and effective solution to the X-ray image security check problem. Full article
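
Deformable convolution, the building block behind C2F_DCN, replaces the fixed 3x3 sampling grid with learned per-tap offsets, sampled bilinearly. Below is a minimal single-position, single-channel sketch; the real module operates on full tensors with a learned offset branch, and the function names here are illustrative:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feat (H x W) at a fractional location (y, x),
    clamping coordinates to the map border."""
    H, W = feat.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deform_conv_at(feat, weights, cy, cx, offsets):
    """One output position of a 3x3 deformable convolution: each of the
    nine taps is shifted by its own (dy, dx) offset before sampling."""
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[dy + 1, dx + 1] * bilinear(feat, cy + dy + oy, cx + dx + ox)
            k += 1
    return out

feat = np.arange(25, dtype=float).reshape(5, 5)
zero = [(0.0, 0.0)] * 9
print(deform_conv_at(feat, np.ones((3, 3)), 2, 2, zero))   # 108.0: plain 3x3 sum
```

With all offsets at zero the operation reduces to an ordinary 3x3 convolution, which is why deformable layers are typically initialized that way.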

15 pages, 5006 KiB  
Article
Intelligent Recognition of Smoking and Calling Behaviors for Safety Surveillance
by Jingyuan Zhang, Lunsheng Wei, Bin Chen, Heping Chen and Wangming Xu
Electronics 2023, 12(15), 3225; https://doi.org/10.3390/electronics12153225 - 26 Jul 2023
Viewed by 1373
Abstract
Smoking and calling are two typical behaviors involved in public and industrial safety that usually need to be strictly monitored and even prohibited on many occasions. To resolve the problems of missed and false detections in existing traditional and deep-learning-based behavior-recognition methods, an intelligent recognition method using a multi-task YOLOv4 (MT-YOLOv4) network combined with behavioral priors is proposed. The original YOLOv4 is taken as the baseline network to be improved. Firstly, a K-means++ algorithm is used to re-cluster and optimize the anchor boxes, a set of predefined bounding boxes used to capture the scale and aspect ratio of specific objects. Then, after the shared feature extraction layer of CSPDarknet-53, the network is divided into two branches with the same blocks but independent tasks, i.e., the behavior-detection branch and the object-detection branch, which respectively predict behaviors and their related objects from the input image or video frame. Finally, based on the preliminary predictions of the two branches, comprehensive reasoning rules are established to obtain the final behavior-recognition result. A dataset for smoking and calling detection is constructed for training and testing, and the experimental results indicate that the proposed method achieves a 6.2% improvement in recall and a 2.4% improvement in F1 score at the cost of a slight loss in precision compared to the baseline method, the best performance among the compared methods. It can be deployed in security surveillance systems for unsafe-behavior monitoring and early-warning management in practical scenarios. Full article
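
Anchor re-clustering of the kind described above is commonly done on the (width, height) pairs of the ground-truth boxes with a 1 - IoU distance. The sketch below uses a deterministic farthest-point variant of k-means++ seeding for reproducibility; the paper's exact settings (distance metric, anchor count, initialization) are not reproduced here:

```python
import numpy as np

def iou_wh(box, anchors):
    """IoU between one (w, h) box and an (n, 2) array of anchors,
    with all boxes aligned at a common corner."""
    inter = np.minimum(box[0], anchors[:, 0]) * np.minimum(box[1], anchors[:, 1])
    return inter / (box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter)

def cluster_anchors(boxes, k, iters=50):
    """Re-cluster ground-truth (w, h) boxes into k anchor shapes using
    greedy k-means++-style seeding (farthest box first) and a 1 - IoU
    distance; a common recipe, not necessarily the paper's exact one."""
    anchors = boxes[:1].astype(float)
    while len(anchors) < k:                       # seed with the least-covered box
        d = np.array([1 - iou_wh(b, anchors).max() for b in boxes])
        anchors = np.vstack([anchors, boxes[np.argmax(d)]])
    for _ in range(iters):                        # Lloyd iterations
        assign = np.array([np.argmax(iou_wh(b, anchors)) for b in boxes])
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors

gt = np.array([[10, 12], [11, 10], [9, 11], [100, 95], [98, 102], [105, 100]], dtype=float)
anchors = cluster_anchors(gt, 2)
print(anchors)   # converges to the two cluster means, near (10, 11) and (101, 99)
```

Using IoU rather than Euclidean distance keeps large and small boxes on an equal footing, which is why it is the usual choice for anchor clustering.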

14 pages, 6736 KiB  
Article
A Mask-Wearing Detection Model in Complex Scenarios Based on YOLOv7-CPCSDSA
by Jingyang Wang, Junkai Wang, Xiaotian Zhang and Naiwen Yu
Electronics 2023, 12(14), 3128; https://doi.org/10.3390/electronics12143128 - 19 Jul 2023
Cited by 7 | Viewed by 1704
Abstract
With the rapid development of deep learning technology, many algorithms for mask-wearing detection have achieved remarkable results. However, detection still needs to be improved in complex scenes where targets are too dense or partially occluded. This paper proposes a new mask-wearing detection model: YOLOv7-CPCSDSA. Based on YOLOv7, this model replaces some convolutions of the original model, CatConv, with FasterNet's partial convolution (PConv) to form a CatPConv (CPC) structure, which reduces computational redundancy and memory access. Even as network layers are added, the parameter count decreases. A Small Detection (SD) module is added to the model, which includes upsampling, concat convolution, and MaxPooling structures to enhance the ability to capture small targets, thereby improving detection accuracy. In addition, a Shuffle Attention (SA) mechanism is introduced, which enables the model to adaptively focus on important local information, further improving mask-wearing detection accuracy. This paper uses comparative and ablation experiments on a mask dataset (including many images of complex scenarios) to verify the model's effectiveness. The results show that the mean average precision at an IoU threshold of 0.5 (mAP@0.5) of YOLOv7-CPCSDSA reaches 88.4%, which is 1.9% higher than that of YOLOv7, and its frame rate reaches 75.8 frames per second (FPS), meeting real-time detection requirements. Therefore, YOLOv7-CPCSDSA is suitable for detecting mask-wearing in complex scenarios. Full article
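
FasterNet's partial convolution, which the CPC structure builds on, convolves only a leading fraction of the channels and forwards the rest unchanged; that is where the savings in computation and memory access come from. A plain numpy sketch (the channel ratio, layout, and naming are illustrative, not the library implementation):

```python
import numpy as np

def partial_conv3x3(x, weight, ratio=0.25):
    """FasterNet-style partial convolution sketch: run a 3x3 convolution
    on the first `ratio` of the channels and pass the rest through
    untouched.  x: (C, H, W), weight: (Cp, Cp, 3, 3) with Cp = C*ratio."""
    C, H, W = x.shape
    cp = int(C * ratio)
    out = x.copy()
    pad = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))
    for o in range(cp):                       # convolve only the active slice
        acc = np.zeros((H, W))
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    acc += weight[o, i, dy, dx] * pad[i, dy:dy + H, dx:dx + W]
        out[o] = acc
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w = np.zeros((2, 2, 3, 3)); w[0, 0, 1, 1] = w[1, 1, 1, 1] = 1.0   # identity kernel
print(np.allclose(partial_conv3x3(x, w), x))   # True: identity conv + pass-through
```

With ratio 1/4, the convolution touches 1/16 of the full-channel multiply-accumulates, while the following pointwise layers (not shown) mix information back across all channels.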

20 pages, 7210 KiB  
Article
An Enhanced Lightweight Network for Road Damage Detection Based on Deep Learning
by Hui Luo, Chenbiao Li, Mingquan Wu and Lianming Cai
Electronics 2023, 12(12), 2583; https://doi.org/10.3390/electronics12122583 - 8 Jun 2023
Cited by 8 | Viewed by 1807
Abstract
Achieving accurate and efficient detection of road damage in complex scenes has always been a challenging task. In this paper, an enhanced lightweight network, E-EfficientDet, is proposed. Firstly, a feature extraction enhancement module (FEEM) is designed to increase the receptive field and improve the feature expression capability of the network, allowing richer multi-scale feature information to be extracted. Secondly, to promote the reuse of feature information between different layers of the network and take full advantage of multi-scale context information, four pyramid modules with different structures are designed based on the idea of semi-dense connection, among which the bidirectional feature pyramid network with longitudinal connection (LC-BiFPN) is the most suitable for road damage detection. Finally, to meet road damage detection tasks under different hardware resource constraints, the E-EfficientDet-D0~D2 networks are proposed based on a compound scaling strategy. Experimental results show that the detection accuracy of E-EfficientDet-D0 improves by 2.41% compared with the original EfficientDet-D0 on a publicly available road damage dataset and outperforms networks such as YOLOv5s, YOLOv7-tiny, YOLOv4-tiny, Faster R-CNN, and SSD. Meanwhile, the detection speed of E-EfficientDet-D0 reaches 27.0 FPS, which meets the demand for real-time detection, and the model size is only 32.31 MB, suitable for deployment on mobile devices such as unmanned inspection carts, UAVs, and smartphones. In addition, the detection accuracy of E-EfficientDet-D2 reaches 57.51%, which is 4.39% higher than that of E-EfficientDet-D0, with a model size of 61.78 MB, suitable for practical application scenarios that require higher detection accuracy and offer better hardware performance. Full article
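
The compound scaling strategy referenced above comes from the EfficientDet family: a single coefficient phi jointly grows the BiFPN width and depth, the prediction-head depth, and the input resolution. The constants below follow the public EfficientDet paper (whose published table additionally rounds widths to hardware-friendly values); E-EfficientDet may tune them differently:

```python
def efficientdet_scaling(phi):
    """Compound scaling rules as stated in the EfficientDet paper: one
    coefficient phi drives width, depth, and resolution together, so a
    single family covers D0 (light) through D2 (accurate) and beyond."""
    return {
        "bifpn_width": int(64 * 1.35 ** phi),   # W_bifpn = 64 * 1.35^phi
        "bifpn_depth": 3 + phi,                 # D_bifpn = 3 + phi
        "head_depth": 3 + phi // 3,             # D_head  = 3 + floor(phi/3)
        "input_size": 512 + 128 * phi,          # R_input = 512 + 128*phi
    }

for phi in range(3):   # the D0 through D2 variants discussed above
    print(phi, efficientdet_scaling(phi))
```

Scaling all three dimensions together, rather than just one, is what lets the D0-D2 variants trade accuracy for model size along a single knob.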

14 pages, 10141 KiB  
Article
SuperDet: An Efficient Single-Shot Network for Vehicle Detection in Remote Sensing Images
by Moran Ju, Buniu Niu, Sinian Jin and Zhaoming Liu
Electronics 2023, 12(6), 1312; https://doi.org/10.3390/electronics12061312 - 9 Mar 2023
Cited by 4 | Viewed by 1549
Abstract
Vehicle detection in remote sensing images plays an important role in a wide range of applications. However, it remains a challenging task due to the small size of vehicles in such images. In this paper, we propose an efficient single-shot detector, called SuperDet, which combines a super resolution algorithm with a deep convolutional neural network (DCNN)-based object detector. SuperDet consists of two interconnected modules: the super resolution module and the vehicle detection module. The super resolution module aims to recover a high resolution remote sensing image from its low resolution counterpart. With this module, small vehicles appear at a higher resolution, which helps their detection. Taking the higher resolution image as input, the vehicle detection module extracts features and predicts the location and category of each vehicle. We use a multi-task loss function to train the network end to end. To assess the detection performance of SuperDet, we conducted experiments comparing it with classical object detectors on both the VEDAI and DOTA datasets. Experimental results indicate that SuperDet outperforms the other detectors for vehicle detection in remote sensing images. Full article
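
The two-module pipeline can be illustrated end to end with stand-ins: a nearest-neighbour upscale in place of the learned super resolution module, and a brightness threshold in place of the DCNN detector. Only the structure, upscale first and then detect, reflects SuperDet; every function here is a hypothetical placeholder:

```python
import numpy as np

def upscale2x(img):
    """Placeholder x2 super resolution: nearest-neighbour upsampling
    stands in for SuperDet's learned SR module."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def detect_bright_blobs(img, thresh):
    """Stand-in detector: return (row, col) peaks above a threshold in
    place of the DCNN detection module."""
    return [tuple(p) for p in np.argwhere(img > thresh)]

# A single bright pixel (a tiny "vehicle") covers a 2x2 patch after
# upscaling, giving the detector more pixels to work with: the core
# motivation behind running SR before detection.
low_res = np.zeros((8, 8)); low_res[3, 4] = 1.0
hits = detect_bright_blobs(upscale2x(low_res), 0.5)
print(len(hits))   # 4
```

In the actual network both modules are trained jointly under a multi-task loss, so the SR module learns to produce detail that specifically helps detection.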

14 pages, 4120 KiB  
Article
Saliency Detection Based on Low-Level and High-Level Features via Manifold-Space Ranking
by Xiaoli Li, Yunpeng Liu and Huaici Zhao
Electronics 2023, 12(2), 449; https://doi.org/10.3390/electronics12020449 - 15 Jan 2023
Cited by 2 | Viewed by 2779
Abstract
Saliency detection, an active research direction in image understanding and analysis, has been studied extensively. In this paper, to improve the accuracy of saliency detection, we propose an efficient unsupervised salient object detection method. First, after segmenting the image into parts at different scales, we extract local low-level features for each superpixel, which helps locate the approximate positions of salient objects. Then, we use convolutional neural networks to extract high-level, semantically rich features as complementary features; the low-level and high-level features of each superpixel are combined into a new feature vector used to measure the distance between superpixels. Finally, we use a manifold space-ranking method to calculate the saliency of each superpixel. Extensive experiments on four challenging datasets indicate that the proposed method surpasses state-of-the-art methods and is closer to the ground truth. Full article

14 pages, 4439 KiB  
Article
Infrared Weak and Small Target Detection Based on Top-Hat Filtering and Multi-Feature Fuzzy Decision-Making
by Degui Yang, Zhengyang Bai and Junchao Zhang
Electronics 2022, 11(21), 3549; https://doi.org/10.3390/electronics11213549 - 31 Oct 2022
Cited by 4 | Viewed by 2148
Abstract
Infrared weak and small target detection against complex backgrounds has always been a research hotspot in the fields of area defense and long-range precision strikes. Single-frame detection is particularly difficult owing to factors such as the lack of target motion information, complex backgrounds, and low signal-to-noise ratios. Aiming at the high false alarm rates caused by complex background edges and noise interference in infrared images, this paper proposes an infrared weak and small target detection algorithm based on top-hat filtering and multi-feature fuzzy decision-making. The algorithm first uses a multi-structural-element top-hat operator to filter the original image and then obtains suspected target areas through adaptive threshold segmentation. Secondly, it uses image feature algorithms, such as central pixel contrast, regional gradient, and directional gradient, to extract feature information of each suspected target at multiple scales, and a fuzzy decision method is used for multi-feature fusion to achieve the final target detection. Finally, the performance of the proposed algorithm is compared with several existing algorithms on measured infrared sequence images of five different scenarios. The results show that the proposed algorithm has clear advantages in various performance indicators over the existing algorithms for infrared image sequences under different interference scenarios, especially for complex background types, achieving a lower false alarm rate at the same detection rate while meeting the algorithm's real-time requirements. Full article
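
The white top-hat transform that opens the pipeline subtracts a morphological opening from the image, suppressing smooth background and large structures while keeping small bright blobs. A single-structuring-element numpy sketch (the paper combines multiple structural elements and follows with adaptive thresholding):

```python
import numpy as np

def grey_erode(img, size=3):
    """Grey-scale erosion with a flat square structuring element."""
    r = size // 2
    pad = np.pad(img, r, mode="edge")
    H, W = img.shape
    return np.min([pad[dy:dy + H, dx:dx + W]
                   for dy in range(size) for dx in range(size)], axis=0)

def grey_dilate(img, size=3):
    """Grey-scale dilation with a flat square structuring element."""
    r = size // 2
    pad = np.pad(img, r, mode="edge")
    H, W = img.shape
    return np.max([pad[dy:dy + H, dx:dx + W]
                   for dy in range(size) for dx in range(size)], axis=0)

def white_tophat(img, size=3):
    """White top-hat: image minus its morphological opening.  Flat or
    slowly varying background is removed; small bright blobs that do
    not fit the structuring element survive."""
    return img - grey_dilate(grey_erode(img, size), size)

# A weak 1-pixel target on a sloped background gradient
img = np.tile(0.1 * np.arange(9.0), (9, 1))
img[4, 4] += 1.0
resp = white_tophat(img)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(int(peak[0]), int(peak[1]))   # 4 4: the target dominates the response
```

The background ramp cancels almost exactly in the response, which is why the subsequent threshold segmentation can be simple.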
