Advanced Pattern Recognition & Computer Vision

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 June 2025 | Viewed by 38940

Special Issue Editors


Prof. Dr. Rui Yao
Guest Editor
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Interests: computer vision; deep learning; object segmentation; visual tracking; image classification; remote sensing

Dr. Hancheng Zhu
Guest Editor
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Interests: computer vision; deep learning; image aesthetic quality assessment; affective computing

Special Issue Information

Dear Colleagues,

Pattern recognition and computer vision are long-standing research areas with a wide range of real-world applications. Recently, deep learning has become the core technology for pattern recognition and computer vision tasks. Although deep learning models have achieved remarkable success in fields such as multimedia data recognition and analysis, their strong performance is obtained mainly from a data-driven perspective rather than from interpretable knowledge. Therefore, advanced pattern recognition and computer vision methods are urgently needed in the relevant research fields. Future studies should apply general-purpose deep models with interpretable knowledge learning to pattern recognition and computer vision tasks, such as object detection and image analysis and understanding. In this Special Issue, we are particularly interested in advanced pattern recognition and computer vision approaches.

Topics of interest include but are not limited to:

  • Image classification and segmentation;
  • Object detection and tracking;
  • Image understanding and scene analysis;
  • Image denoising and reconstruction;
  • Psychophysical analysis of visual perception;
  • Image generation and super-resolution;
  • Visual data reduction and compression;
  • Deep learning for computer vision tasks (medical image processing, remote sensing, hyperspectral imaging, thermal imaging);
  • Multimedia affective computing;
  • RGB-D and 3D processing;
  • Interpretable deep learning models.

Prof. Dr. Rui Yao
Dr. Hancheng Zhu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern recognition
  • computer vision
  • deep learning
  • image processing
  • object detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (18 papers)

Research

20 pages, 4849 KiB  
Article
Occlusion Removal in Light-Field Images Using CSPDarknet53 and Bidirectional Feature Pyramid Network: A Multi-Scale Fusion-Based Approach
by Mostafa Farouk Senussi and Hyun-Soo Kang
Appl. Sci. 2024, 14(20), 9332; https://doi.org/10.3390/app14209332 - 13 Oct 2024
Viewed by 959
Abstract
Occlusion removal in light-field images remains a significant challenge, particularly when dealing with large occlusions. An architecture based on end-to-end learning is proposed to address this challenge that interactively combines CSPDarknet53 and the bidirectional feature pyramid network for efficient light-field occlusion removal. CSPDarknet53 acts as the backbone, providing robust and rich feature extraction across multiple scales, while the bidirectional feature pyramid network enhances comprehensive feature integration through an advanced multi-scale fusion mechanism. To preserve efficiency without sacrificing the quality of the extracted feature, our model uses separable convolutional blocks. A simple refinement module based on half-instance initialization blocks is integrated to explore the local details and global structures. The network’s multi-perspective approach guarantees almost total occlusion removal, enabling it to handle occlusions of varying sizes or complexity. Numerous experiments were run on sparse and dense datasets with varying degrees of occlusion severity in order to assess the performance. Significant advancements over the current cutting-edge techniques are shown in the findings for the sparse dataset, while competitive results are obtained for the dense dataset. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
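
For readers unfamiliar with the separable convolutional blocks mentioned in the abstract, the following PyTorch sketch shows a generic depthwise-separable convolution; the channel sizes, normalization, and activation are illustrative choices and not taken from the paper.

    import torch
    import torch.nn as nn

    class SeparableConvBlock(nn.Module):
        """Depthwise-separable convolution: a depthwise 3x3 followed by a
        pointwise 1x1, which costs far fewer multiply-adds than a dense 3x3."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                       padding=1, groups=in_ch, bias=False)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.SiLU()

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    # Example: a 256-channel feature map from a backbone is projected to 128 channels.
    feats = torch.randn(1, 256, 64, 64)
    block = SeparableConvBlock(256, 128)
    print(block(feats).shape)   # torch.Size([1, 128, 64, 64])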

24 pages, 3285 KiB  
Article
SBD-Net: Incorporating Multi-Level Features for an Efficient Detection Network of Student Behavior in Smart Classrooms
by Zhifeng Wang, Minghui Wang, Chunyan Zeng and Longlong Li
Appl. Sci. 2024, 14(18), 8357; https://doi.org/10.3390/app14188357 - 17 Sep 2024
Viewed by 795
Abstract
Detecting student behavior in smart classrooms is a critical area of research in educational technology that significantly enhances teaching quality and student engagement. This paper introduces an innovative approach using advanced computer vision and artificial intelligence technologies to monitor and analyze student behavior in real time. Such monitoring assists educators in adjusting their teaching strategies effectively, thereby optimizing classroom instruction. However, the application of this technology faces substantial challenges, including the variability in student sizes, the diversity of behaviors, and occlusions among students in complex classroom settings. Additionally, the uneven distribution of student behaviors presents a significant hurdle. To overcome these challenges, we propose Student Behavior Detection Network (SBD-Net), a lightweight target detection model enhanced by the Focal Modulation module for robust multi-level feature fusion, which augments feature extraction capabilities. Furthermore, the model incorporates the ESLoss function to address the imbalance in behavior sample detection effectively. The innovation continues with the Dyhead detection head, which integrates three-dimensional attention mechanisms, enhancing behavioral representation without escalating computational demands. This balance achieves both a high detection accuracy and manageable computational complexity. Empirical results from our bespoke student behavior dataset, Student Classroom Behavior (SCBehavior), demonstrate that SBD-Net achieves a mean Average Precision (mAP) of 0.824 with a low computational complexity of just 9.8 G. These figures represent a 4.3% improvement in accuracy and a 3.8% increase in recall compared to the baseline model. These advancements underscore the capability of SBD-Net to handle the skewed distribution of student behaviors and to perform high-precision detection in dynamically challenging classroom environments. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

22 pages, 6038 KiB  
Article
An Enhanced SL-YOLOv8-Based Lightweight Remote Sensing Detection Algorithm for Identifying Broken Strands in Transmission Lines
by Xiang Zhang, Jianwei Zhang and Xiaoqiang Jia
Appl. Sci. 2024, 14(17), 7469; https://doi.org/10.3390/app14177469 - 23 Aug 2024
Viewed by 625
Abstract
Power transmission lines frequently face threats from lightning strikes, severe storms, and chemical corrosion, which can lead to damage in steel–aluminum-stranded wires, thereby seriously affecting the stability of the power system. Currently, manual inspections are relatively inefficient and high risk, while drone inspections are often limited by complex environments and obstacles. Existing detection algorithms still face difficulties in identifying broken strands. To address these issues, this paper proposes SL-YOLOv8, a broken-strand detection method for online intelligent inspection robots based on an improved You Only Look Once version 8 (YOLOv8). By incorporating the Squeeze-and-Excitation Network version 2 (SENet_v2) into the feature fusion network, the method effectively enhances adaptive feature representation by focusing on and amplifying key information, thereby improving the network’s capability to detect small objects. Additionally, the introduction of the LSKblockAttention module, which combines Large Selective Kernels (LSKs) and the attention mechanism, allows the model to dynamically select and enhance critical features, significantly improving detection accuracy and robustness while maintaining model precision. Compared with the original YOLOv8 algorithm, SL-YOLOv8 demonstrates improved recognition accuracy on the Break-ID-1632 and cable damage datasets. The precision is increased by 3.9% and 2.7%, and the recall is increased by 12.2% and 2.3%, respectively, for the two datasets. The mean average precision (mAP) at the Intersection over Union (IoU) threshold of 0.5 is also increased by 4.9% and 1.2%, showing SL-YOLOv8’s effectiveness in accurately identifying small objects in complex situations. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
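
As background for the squeeze-and-excitation idea the abstract builds on, here is a minimal PyTorch sketch of a classic SE gate; it is not the SENet_v2 variant used in the paper, and the reduction ratio is illustrative.

    import torch
    import torch.nn as nn

    class SqueezeExcite(nn.Module):
        """Channel attention: global-average 'squeeze', two-layer 'excitation',
        then per-channel rescaling of the input feature map."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(x.mean(dim=(2, 3)))     # squeeze -> (b, c)
            return x * w.view(b, c, 1, 1)       # excite -> rescale channels

    fused = torch.randn(2, 128, 40, 40)         # e.g. a feature-fusion output
    print(SqueezeExcite(128)(fused).shape)      # torch.Size([2, 128, 40, 40])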

21 pages, 3355 KiB  
Article
All-in-Focus Three-Dimensional Reconstruction Based on Edge Matching for Artificial Compound Eye
by Sidong Wu, Liuquan Ren and Qingqing Yang
Appl. Sci. 2024, 14(11), 4403; https://doi.org/10.3390/app14114403 - 22 May 2024
Viewed by 757
Abstract
An artificial compound eye consists of multiple apertures that allow for a large field of view (FOV) while maintaining a small size. Each aperture captures a sub-image, and multiple sub-images are needed to reconstruct the full FOV. The reconstruction process is depth-related due to the parallax between adjacent apertures. This paper presents an all-in-focus 3D reconstruction method for a specific type of artificial compound eye called the electronic cluster eye (eCley). The proposed method uses edge matching to address the edge blur and large textureless areas existing in the sub-images. First, edges are extracted from each sub-image, and then a matching operator is applied to match the edges based on their shape context and intensity. This produces a sparse matching result that is then propagated to the whole image. Next, a depth consistency check and refinement method is performed to refine the depth of all sub-images. Finally, the sub-images and depth maps are merged to produce the final all-in-focus image and depth map. The experimental results and comparative analysis demonstrate the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
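
To make the depth consistency check mentioned in the abstract concrete, the sketch below cross-checks disparity maps of two adjacent sub-images under a purely horizontal-parallax assumption; the threshold and the integer-disparity simplification are assumptions, not the paper's procedure.

    import numpy as np

    def consistency_mask(disp_left, disp_right, thresh=1.0):
        """Flag pixels whose disparity disagrees with the neighbouring aperture.

        disp_left, disp_right: disparity maps (pixels) of two adjacent sub-images
        under a horizontal-parallax assumption. Returns a boolean mask that is
        True where the left-right check passes."""
        h, w = disp_left.shape
        xs = np.arange(w)[None, :].repeat(h, axis=0)
        ys = np.arange(h)[:, None].repeat(w, axis=1)
        x_in_right = np.clip(xs - disp_left.astype(int), 0, w - 1)
        back = disp_right[ys, x_in_right]         # disparity seen from the other view
        return np.abs(disp_left - back) <= thresh

    # Toy example: two consistent 4x4 disparity maps give an all-True mask.
    d = np.full((4, 4), 2.0)
    print(consistency_mask(d, d).all())   # True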

19 pages, 13672 KiB  
Article
Frequency-Separated Attention Network for Image Super-Resolution
by Daokuan Qu, Liulian Li and Rui Yao
Appl. Sci. 2024, 14(10), 4238; https://doi.org/10.3390/app14104238 - 16 May 2024
Cited by 1 | Viewed by 1138
Abstract
The use of deep convolutional neural networks has significantly improved the performance of super-resolution. Employing deeper networks to enhance the non-linear mapping capability from low-resolution (LR) to high-resolution (HR) images has inadvertently weakened the information flow and disrupted long-term memory. Moreover, overly deep networks are challenging to train, thus failing to exhibit the expressive capability commensurate with their depth. High-frequency and low-frequency features in images play different roles in image super-resolution. Networks based on CNNs, which should focus more on high-frequency features, treat these two types of features equally. This results in redundant computations when processing low-frequency features and causes complex and detailed parts of the reconstructed images to appear as smooth as the background. To maintain long-term memory and focus more on the restoration of image details in networks with strong representational capabilities, we propose the Frequency-Separated Attention Network (FSANet), where dense connections ensure the full utilization of multi-level features. In the Feature Extraction Module (FEM), the use of the Res ASPP Module expands the network’s receptive field without increasing its depth. To differentiate between high-frequency and low-frequency features within the network, we introduce the Feature-Separated Attention Block (FSAB). Furthermore, to enhance the quality of the restored images using heuristic features, we incorporate attention mechanisms into the Low-Frequency Attention Block (LFAB) and the High-Frequency Attention Block (HFAB) for processing low-frequency and high-frequency features, respectively. The proposed network outperforms the current state-of-the-art methods in tests on benchmark datasets. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
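
To illustrate the frequency separation idea, the following sketch splits a feature map into a low-frequency part (average pooling plus upsampling) and a high-frequency residual before routing them through separate branches; this pooling-based split is an assumed stand-in for the paper's FSAB, not its implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def split_frequencies(x, k=4):
        """Low-pass a feature map by average pooling + upsampling; the residual
        keeps edges and fine texture (the 'high-frequency' part)."""
        low = F.interpolate(F.avg_pool2d(x, k), size=x.shape[-2:],
                            mode='bilinear', align_corners=False)
        return low, x - low

    class TwoBranch(nn.Module):
        """Route each band through its own lightweight branch, then recombine."""
        def __init__(self, ch):
            super().__init__()
            self.low_branch = nn.Conv2d(ch, ch, 3, padding=1)
            self.high_branch = nn.Conv2d(ch, ch, 3, padding=1)

        def forward(self, x):
            low, high = split_frequencies(x)
            return self.low_branch(low) + self.high_branch(high)

    x = torch.randn(1, 64, 48, 48)
    print(TwoBranch(64)(x).shape)   # torch.Size([1, 64, 48, 48])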

17 pages, 4706 KiB  
Article
Gender Identification of Chinese Mitten Crab Juveniles Based on Improved Faster R-CNN
by Hao Gu, Ming Chen and Dongmei Gan
Appl. Sci. 2024, 14(2), 908; https://doi.org/10.3390/app14020908 - 21 Jan 2024
Cited by 1 | Viewed by 1489
Abstract
The identification of gender in Chinese mitten crab juveniles is a critical prerequisite for the automatic classification of these crab juveniles. Aiming at the problem that crab juveniles are of different sizes and relatively small, with unclear male and female characteristics and complex background environment, an algorithm C-SwinFaster for identifying the gender of Chinese mitten crab juveniles based on improved Faster R-CNN was proposed. This algorithm introduces Swin Transformer as the backbone network and an improved Path Aggregation Feature Pyramid Network (PAFPN) in the neck to obtain multi-scale high-level semantic feature maps, thereby improving the gender recognition accuracy of Chinese mitten crab male and female juveniles. Then, a self-attention mechanism is introduced into the region of interest pooling network (ROI Pooling) to enhance the model’s attention to the classification features of male and female crab juveniles and reduce background interference on the detection results. Additionally, we introduce an improved non-maximum suppression algorithm, termed Softer-NMS. This algorithm refines the process of determining precise target candidate boxes by modulating the confidence level, thereby enhancing detection accuracy. Finally, the focal loss function is introduced to train the model, reducing the weight of simple samples during the training process, and allowing the model to focus more on samples that are difficult to distinguish. Experimental results demonstrate that the enhanced C-SwinFaster algorithm significantly improves the identification accuracy of male and female Chinese mitten crab juveniles. The mean average precision (mAP) of this algorithm reaches 98.45%, marking a 10.33 percentage point increase over the original model. This algorithm has a good effect on the gender recognition of Chinese mitten crab juveniles and can provide technical support for the automatic classification of Chinese mitten crab juveniles. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
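
Since the abstract relies on the focal loss to down-weight easy samples, here is a minimal binary focal loss in PyTorch for reference; the alpha and gamma values are the commonly used defaults, not necessarily those used in the paper.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Binary focal loss: easy examples (p_t close to 1) are down-weighted
        by (1 - p_t)^gamma so training focuses on hard, ambiguous samples."""
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()

    logits = torch.tensor([2.0, -1.0, 0.5])
    targets = torch.tensor([1.0, 0.0, 1.0])
    print(focal_loss(logits, targets))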

15 pages, 31048 KiB  
Article
Multi-View Masked Autoencoder for General Image Representation
by Seungbin Ji, Sangkwon Han and Jongtae Rhee
Appl. Sci. 2023, 13(22), 12413; https://doi.org/10.3390/app132212413 - 16 Nov 2023
Cited by 1 | Viewed by 1848
Abstract
Self-supervised learning is a method that learns general representation from unlabeled data. Masked image modeling (MIM), one of the generative self-supervised learning methods, has drawn attention for showing state-of-the-art performance on various downstream tasks, though it has shown poor linear separability resulting from the token-level approach. In this paper, we propose a contrastive learning-based multi-view masked autoencoder for MIM, thus exploiting an image-level approach by learning common features from two different augmented views. We strengthen the MIM by learning long-range global patterns from contrastive loss. Our framework adopts a simple encoder–decoder architecture, thus learning rich and general representations by following a simple process: (1) Two different views are generated from an input image with random masking and by contrastive loss, we can learn the semantic distance of the representations generated by an encoder. By applying a high mask ratio, of 80%, it works as strong augmentation and alleviates the representation collapse problem. (2) With reconstruction loss, the decoder learns to reconstruct an original image from the masked image. We assessed our framework through several experiments on benchmark datasets of image classification, object detection, and semantic segmentation. We achieved 84.3% in fine-tuning accuracy on ImageNet-1K classification and 76.7% in linear probing, thus exceeding previous studies and showing promising results on other downstream tasks. The experimental results demonstrate that our work can learn rich and general image representation by applying contrastive loss to masked image modeling. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
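
To make the training objective concrete, the sketch below shows 80% random patch masking and a combined contrastive (InfoNCE) plus reconstruction (MSE) loss on placeholder encoder/decoder outputs; the tensor shapes and the temperature are illustrative, and this is not the authors' code.

    import torch
    import torch.nn.functional as F

    def random_mask(patches, ratio=0.8):
        """Keep a random (1 - ratio) subset of patch tokens; patches is (B, N, D)."""
        b, n, d = patches.shape
        keep = int(n * (1 - ratio))
        idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
        return torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, d))

    def info_nce(z1, z2, temp=0.1):
        """Contrastive loss: matching rows of z1/z2 are positives, the rest negatives."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temp
        return F.cross_entropy(logits, torch.arange(z1.size(0)))

    print(random_mask(torch.randn(8, 196, 768)).shape)   # torch.Size([8, 39, 768])

    # Placeholder encoder/decoder outputs for two masked views of a batch of 8 images.
    z_view1, z_view2 = torch.randn(8, 256), torch.randn(8, 256)
    recon, target = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
    loss = info_nce(z_view1, z_view2) + F.mse_loss(recon, target)
    print(loss.item())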

18 pages, 5209 KiB  
Article
The Machine Vision Dial Automatic Drawing System—Based on CAXA Secondary Development
by Ning Zhang, Fei Li and Enxu Zhang
Appl. Sci. 2023, 13(13), 7365; https://doi.org/10.3390/app13137365 - 21 Jun 2023
Cited by 1 | Viewed by 1405
Abstract
Whether the pointer recognition of an SF6 (sulfur hexafluoride) density controller is correct or not is directly related to enterprise security. Therefore, a pointer recognition system that is highly efficient, precise, and secure is required. In this paper, we set up a detection platform for efficient detection of pointer readings. To detect the pointer with high accuracy, we propose a dial pointer angle detection method based on OpenCV that obtains high-precision dial pointer angle information. We then use Socket communication to transmit the dial pointer information to the secondary development program. Next, we use the ObjectCRX development software package based on Visual Studio 2015 to quickly access the CAXA electronic drawing board graphics database, carry out secondary development of the CAXA electronic drawing board 2021, and comprehensively apply various functions to draw the overall outline of the dial. Finally, we read the dial pointer angle in order to draw the current test product dial CAD drawing. After verification, the average error of pointer angle recognition is 0.69°, which is much lower than the enterprise standard of 2.7°. Based on machine vision technology, we researched and designed an SF6 density controller dial automatic drawing system, and after many tests, the system runs stably and the accuracy meets the needs of enterprises. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
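
As a rough illustration of OpenCV-based pointer angle reading, the sketch below assumes the dial centre is known and that the longest detected Hough line segment belongs to the pointer; the file name, centre coordinates, and thresholds are placeholders, not the paper's settings.

    import cv2
    import numpy as np

    def pointer_angle(gray, center):
        """Return the pointer angle in degrees, measured from the positive x-axis.

        gray: single-channel dial image; center: (cx, cy) of the dial.
        Assumes the longest detected line segment belongs to the pointer."""
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                                minLineLength=30, maxLineGap=5)
        if lines is None:
            return None
        x1, y1, x2, y2 = max(lines[:, 0, :],
                             key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
        # Orient the segment so it points away from the dial centre.
        if np.hypot(x1 - center[0], y1 - center[1]) > np.hypot(x2 - center[0], y2 - center[1]):
            x1, y1, x2, y2 = x2, y2, x1, y1
        return float(np.degrees(np.arctan2(-(y2 - y1), x2 - x1)))  # image y-axis points down

    # img = cv2.imread("dial.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
    # print(pointer_angle(img, center=(240, 240)))          # placeholder dial centre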

28 pages, 10357 KiB  
Article
Modeling of Walking-Gait Parameters and Walking Strategy for Quadruped Robots
by Zhaolu Li, Yumin Song, Xiaoli Zhang, Xiafu Peng and Ning Xu
Appl. Sci. 2023, 13(12), 6876; https://doi.org/10.3390/app13126876 - 6 Jun 2023
Cited by 7 | Viewed by 2782
Abstract
The inspiration for the footed robot was originally derived from biology, and it was an imitation of biological form and movement. In this paper, a bionic-robot dog is designed to reveal the motion characteristics of a quadruped robot mechanism through modeling, model kinematic analysis, and other methods. First, the structural characteristics and movement characteristics of the developed bionic-dog model are studied. The first step is to study the physiological structure of the dog, analyze the function of the dog’s limbs, and then use a high-speed camera to capture the motion of the marked bionic-robot dog and shoot motion video of the bionic-robot dog in different motion states. The effective data of the marked points in the video are extracted using PHOTRON 1.0 software, and the extracted data are analyzed and processed in the software MATLAB R2020a, and finally the structural characteristics and motion laws of the bionic-robot dog are obtained. Then, a bionic-robot-dog experimental platform is built to conduct experiments with three planned gaits (dynamic gait, static gait, and gait transition). The experiments showed that the three gaits were consistent with the planned movements and the bionic-robot dog could perform stable fast-gait walking, slow-gait walking, and quickly complete gait transitions. All three gaits were simulated in ADAMS 2019 software, and the simulation results showed that all three gaits caused the bionic dog robot to move smoothly. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

14 pages, 4191 KiB  
Article
GAN Data Augmentation Methods in Rock Classification
by Gaochang Zhao, Zhao Cai, Xin Wang and Xiaohu Dang
Appl. Sci. 2023, 13(9), 5316; https://doi.org/10.3390/app13095316 - 24 Apr 2023
Cited by 9 | Viewed by 1770
Abstract
In this paper, a data augmentation method, the Conditional Residual Deep Convolutional Generative Adversarial Network (CRDCGAN), based on the Deep Convolutional Generative Adversarial Network (DCGAN), is proposed to address the problem that the accuracy of existing image classification techniques is too low when classifying small-scale rock images. Firstly, the Wasserstein distance is introduced to change the loss function, which makes the training of the network more stable; secondly, conditional information is added, so that the network can generate and discriminate image data with label information; finally, a residual module is added to improve the quality of generated images. The results demonstrate that by applying CRDCGAN to augment the rock image dataset, the accuracy of the classification model trained on this dataset reaches 96.38%, which is 13.39% higher than that of a classification model trained on the non-augmented dataset, and 8.56% and 6.27% higher than that of models trained with the traditional augmentation method and the DCGAN augmentation method, respectively. CRDCGAN thus expands the rock image dataset and effectively improves the accuracy of the rock classification model, to a greater extent than the alternative augmentation methods. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
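
To show how the Wasserstein objective and label conditioning fit together, here is a minimal sketch of the critic and generator losses with a learned class embedding concatenated to the latent code; the network bodies, the residual blocks, and the Lipschitz constraint (weight clipping or a gradient penalty) required by the Wasserstein formulation are omitted, and the class count is illustrative.

    import torch
    import torch.nn as nn

    class LabelEmbed(nn.Module):
        """Concatenate a learned class embedding to the latent code (generator side)."""
        def __init__(self, n_classes, latent_dim, embed_dim=32):
            super().__init__()
            self.embed = nn.Embedding(n_classes, embed_dim)

        def forward(self, z, labels):
            return torch.cat([z, self.embed(labels)], dim=1)

    def critic_loss(critic_real, critic_fake):
        # Wasserstein critic objective: push real scores up, fake scores down.
        return critic_fake.mean() - critic_real.mean()

    def generator_loss(critic_fake):
        # The generator tries to maximise the critic's score on generated samples.
        return -critic_fake.mean()

    z = torch.randn(16, 100)
    labels = torch.randint(0, 7, (16,))          # e.g. 7 rock classes (illustrative)
    z_cond = LabelEmbed(7, 100)(z, labels)       # (16, 132) conditioned latent code
    print(z_cond.shape)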

17 pages, 1876 KiB  
Article
A Blind Image Quality Index for Synthetic and Authentic Distortions with Hierarchical Feature Fusion
by Lingbi Hu, Juan Peng, Tuoxun Zhao, Wei Yu and Bo Hu
Appl. Sci. 2023, 13(6), 3591; https://doi.org/10.3390/app13063591 - 11 Mar 2023
Cited by 2 | Viewed by 1846
Abstract
Blind Image Quality Assessment (BIQA) for synthetic and authentic distortions has attracted much attention in the community, and it is still a great challenge. The existing quality metrics are mildly consistent with subjective perception. Traditional handcrafted quality metrics can easily and directly extract low-level features, which mainly account for the outline, edge, color, texture, and shape features, while ignoring the important deep semantics of the distorted image. In the field of popular deep learning, multilevel features can be acquired easily. However, most of them either use only high-level features, ignoring the shallow features, or they simply combine features at different levels, resulting in limited prediction performance. Motivated by these, this paper presents a novel BIQA for synthetic and authentic distortions with hierarchical feature fusion in a flexible vision-Transformer framework. First, multiscale features are extracted from a strong vision-Transformer backbone. Second, an effective hierarchical feature fusion module is proposed to incorporate the features at different levels progressively. To eliminate redundant information, a simple but effective attention mechanism is employed after each fusion. Third, inspired by the human visual system, local and global features from the fusion features are extracted to represent different granularity distortions. Finally, these local and global features are mapped to the final quality score. Extensive experiments on three authentic image databases and two synthetic image datasets show that the proposed method is superior to the state-of-the-art quality metrics for both single-database testing and cross-database testing. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
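
As an illustration of progressive hierarchical fusion, the sketch below merges pooled multi-level features one level at a time and gates each fused result with a simple sigmoid attention before regressing a quality score; the dimensions and the gating form are assumptions, not the paper's module.

    import torch
    import torch.nn as nn

    class ProgressiveFusion(nn.Module):
        """Fuse a list of multi-level feature vectors one level at a time,
        gating each fused result with a small attention (here, a sigmoid gate)."""
        def __init__(self, dim):
            super().__init__()
            self.mix = nn.Linear(2 * dim, dim)
            self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            self.head = nn.Linear(dim, 1)          # final quality score

        def forward(self, levels):                 # levels: list of (B, dim) tensors
            fused = levels[0]
            for feat in levels[1:]:
                fused = self.mix(torch.cat([fused, feat], dim=1))
                fused = fused * self.gate(fused)   # suppress redundant channels
            return self.head(fused)                # (B, 1) predicted quality

    levels = [torch.randn(4, 256) for _ in range(4)]   # 4 transformer stages, pooled
    print(ProgressiveFusion(256)(levels).shape)        # torch.Size([4, 1])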

23 pages, 5705 KiB  
Article
Integrating Prior Knowledge into Attention for Ship Detection in SAR Images
by Yin Pan, Lei Ye, Yingkun Xu and Junyi Liang
Appl. Sci. 2023, 13(5), 2941; https://doi.org/10.3390/app13052941 - 24 Feb 2023
Cited by 2 | Viewed by 1675
Abstract
Although they have achieved great success in optical images, deep convolutional neural networks underperform for ship detection in SAR images because of the lack of color and texture features. In this paper, we propose a framework that integrates prior knowledge into neural networks by means of the attention mechanism. Because the background of ships is mostly water surface or coast, we use clustering algorithms to generate the prior knowledge map from brightness and density features. The prior knowledge map is later resized and fused with convolutional feature maps by the attention mechanism. Our experiments demonstrate that our framework is able to improve various one-stage and two-stage object detection algorithms (Faster R-CNN, RetinaNet, SSD, and YOLOv4) on three benchmark datasets (SSDD, LS-SSDD, and HRSID). Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
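
To illustrate how a clustering-based prior can act as attention, the sketch below clusters pixel intensities with k-means, treats the brighter cluster as a soft prior, and multiplies the resized prior into a feature map; it uses brightness only (the paper also uses density features), and the fusion rule is an assumption.

    import numpy as np
    import torch
    import torch.nn.functional as F
    from sklearn.cluster import KMeans

    def brightness_prior(gray, n_clusters=2):
        """Cluster pixel intensities and use the brighter cluster as a soft prior
        (ships tend to be brighter than sea clutter in SAR amplitude images)."""
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(gray.reshape(-1, 1))
        bright = np.argmax(km.cluster_centers_.ravel())
        prior = (km.labels_ == bright).astype(np.float32).reshape(gray.shape)
        return torch.from_numpy(prior)

    def fuse(feature_map, prior):
        """Resize the prior to the feature-map resolution and use it as attention."""
        p = F.interpolate(prior[None, None], size=feature_map.shape[-2:],
                          mode='bilinear', align_corners=False)
        return feature_map * (1.0 + p)      # boost responses where the prior is high

    gray = np.random.rand(128, 128).astype(np.float32)   # stand-in SAR chip
    feat = torch.randn(1, 64, 32, 32)
    print(fuse(feat, brightness_prior(gray)).shape)      # torch.Size([1, 64, 32, 32])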

22 pages, 6706 KiB  
Article
Research on Human-Computer Interaction Technology of Large-Scale High-Resolution Display Wall System
by Chen Huang, Yimin Chen, Weiqin Tong, Tao Feng and Mingxing Deng
Appl. Sci. 2023, 13(1), 591; https://doi.org/10.3390/app13010591 - 1 Jan 2023
Cited by 3 | Viewed by 3027
Abstract
As an effective solution for data visualization and analysis, the large-scale high-resolution display wall system has been widely used in various scientific research fields. On the basis of investigating existing system cases and research results, this paper introduces the SHU-VAS system (60 screens, 120 megapixels) for data visualization and analysis. In order to improve the efficiency of human-computer interaction in large-scale high-definition display wall systems, we propose an interaction framework based on gesture and double-precision pointing technology. During the interaction process, an adaptive mode switching method based on user action features is used to switch between rough and precise control modes. In the evaluation experiments, we analyzed the characteristics of different interaction methods in movement and navigation interaction tasks, and verified the effectiveness of the gesture-based interaction framework proposed in this paper. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
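
As a toy illustration of adaptive mode switching, the sketch below flips between rough and precise pointing modes from hand speed with hysteresis; the thresholds and the single speed feature are illustrative, not the paper's action-feature model.

    class ModeSwitcher:
        """Switch between 'rough' (fast, coarse) and 'precise' (slow, fine) pointing
        based on recent hand speed, with hysteresis to avoid flickering."""
        def __init__(self, enter_precise=0.05, exit_precise=0.15):
            self.enter_precise = enter_precise    # m/s: drop below -> precise mode
            self.exit_precise = exit_precise      # m/s: rise above -> rough mode
            self.mode = "rough"

        def update(self, hand_speed):
            if self.mode == "rough" and hand_speed < self.enter_precise:
                self.mode = "precise"
            elif self.mode == "precise" and hand_speed > self.exit_precise:
                self.mode = "rough"
            return self.mode

    sw = ModeSwitcher()
    for v in [0.4, 0.2, 0.04, 0.06, 0.2]:
        print(v, sw.update(v))   # stays precise until speed clearly rises again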

18 pages, 5484 KiB  
Article
Multiple Pedestrian Tracking in Dense Crowds Combined with Head Tracking
by Zhouming Qi, Mian Zhou, Guoqiang Zhu and Yanbing Xue
Appl. Sci. 2023, 13(1), 440; https://doi.org/10.3390/app13010440 - 29 Dec 2022
Cited by 2 | Viewed by 2348
Abstract
In order to reduce the negative impact of severe occlusion in dense scenes on the performance degradation of the tracker, considering that the head is the highest and least occluded part of the pedestrian’s entire body, we propose a new multiobject tracking method for pedestrians in dense crowds combined with head tracking. For each frame of the video, a head tracker is first used to generate the pedestrians’ head movement tracklets, and the pedestrians’ whole body bounding boxes are detected at the same time. Secondly, the degree of association between the head bounding boxes and the whole body bounding boxes are calculated, and the Hungarian algorithm is used to match the above calculation results. Finally, according to the matching results, the head bounding boxes in the head tracklets are replaced with the whole body bounding boxes, and the whole body motion tracklets of the pedestrians in the dense scene are generated. Our method can be performed online, and experiments suggested that our method effectively reduces the negative effects of false negatives and false positives on the tracker caused by severe occlusion in dense scenes. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
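
To make the head-body association step concrete, the sketch below scores each head/body pair by how much of the head box lies inside the body box and matches pairs with the Hungarian algorithm via SciPy; the coverage-based score and the acceptance threshold are assumptions, not the paper's association measure.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def coverage(head, body):
        """Fraction of the head box's area that lies inside the body box.
        Boxes are (x1, y1, x2, y2)."""
        ix1, iy1 = max(head[0], body[0]), max(head[1], body[1])
        ix2, iy2 = min(head[2], body[2]), min(head[3], body[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = (head[2] - head[0]) * (head[3] - head[1])
        return inter / area if area > 0 else 0.0

    def match(heads, bodies, min_score=0.5):
        """Hungarian matching on the coverage matrix; weak pairs are discarded."""
        cost = np.array([[1.0 - coverage(h, b) for b in bodies] for h in heads])
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_score]

    heads = [(10, 10, 20, 20), (50, 12, 60, 22)]
    bodies = [(48, 10, 70, 80), (8, 8, 30, 80)]
    print(match(heads, bodies))   # [(0, 1), (1, 0)]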

22 pages, 2988 KiB  
Article
Efficient Object Detection in SAR Images Based on Computation-Aware Neural Architecture Search
by Chuanyou Li, Yifan Li, Huanyun Hu, Jiangwei Shang, Kun Zhang, Lei Qian and Kexiang Wang
Appl. Sci. 2022, 12(21), 10978; https://doi.org/10.3390/app122110978 - 29 Oct 2022
Cited by 3 | Viewed by 2956
Abstract
Remote sensing techniques are becoming more sophisticated as radar imaging techniques mature. Synthetic aperture radar (SAR) can now provide high-resolution images for day-and-night earth observation. Detecting objects in SAR images is increasingly playing a significant role in a series of applications. In this paper, we address an edge detection problem that applies to scenarios with ship-like objects, where the detection accuracy and efficiency must be considered together. The key to ship detection lies in feature extraction. To efficiently extract features, many existing studies have proposed lightweight neural networks by pruning well-known models in the computer vision field. We found that although different baseline models have been tailored, a large amount of computation is still required. In order to achieve a lighter neural network-based ship detector, we propose Darts_Tiny, a novel differentiable neural architecture search model, to design dedicated convolutional neural networks automatically. Darts_Tiny is customized from Darts. It prunes superfluous operations to simplify the search model and adopts a computation-aware search process to enhance the detection efficiency. The computation-aware search process not only integrates a scheme cutting down the number of channels on purpose but also adopts a synthetic loss function combining the cross-entropy loss and the amount of computation. Comprehensive experiments are conducted to evaluate Darts_Tiny on two open datasets, HRSID and SSDD. Experimental results demonstrate that our neural networks win by at least an order of magnitude in terms of model complexity compared with SOTA lightweight models. A representative model obtained from Darts_Tiny (158 KB model volume, 28 K parameters and 0.58 G computations) yields a faster detection speed such that more than 750 frames per second (800×800 SAR images) could be achieved when testing on a platform equipped with an Nvidia Tesla V100 and an Intel Xeon Platinum 8260. The lightweight neural networks generated by Darts_Tiny are still competitive in detection accuracy: the F1 score can still reach more than 83 and 90, respectively, on HRSID and SSDD. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
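
To illustrate a computation-aware search objective of the kind described, the sketch below adds the expected FLOPs of a DARTS-style mixed operation to the cross-entropy loss; the candidate-operation FLOPs and the trade-off weight are illustrative.

    import torch
    import torch.nn.functional as F

    def synthetic_loss(logits, targets, arch_params, op_flops, lam=1e-9):
        """Cross-entropy plus the expected FLOPs of the mixed operation.

        arch_params: (num_ops,) architecture weights for one edge (as in DARTS);
        op_flops:    (num_ops,) FLOPs of each candidate operation;
        lam:         trades accuracy against computation."""
        ce = F.cross_entropy(logits, targets)
        expected_flops = (torch.softmax(arch_params, dim=0) * op_flops).sum()
        return ce + lam * expected_flops

    logits = torch.randn(8, 2)                       # ship / background
    targets = torch.randint(0, 2, (8,))
    alpha = torch.zeros(4, requires_grad=True)       # 4 candidate ops on one edge
    flops = torch.tensor([0.0, 1e6, 5e6, 2e7])       # e.g. skip, separable convs, ...
    print(synthetic_loss(logits, targets, alpha, flops))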

15 pages, 3853 KiB  
Article
A New Knowledge-Distillation-Based Method for Detecting Conveyor Belt Defects
by Qi Yang, Fang Li, Hong Tian, Hua Li, Shuai Xu, Jiyou Fei, Zhongkai Wu, Qiang Feng and Chang Lu
Appl. Sci. 2022, 12(19), 10051; https://doi.org/10.3390/app121910051 - 6 Oct 2022
Cited by 8 | Viewed by 2447
Abstract
Aiming to address the problems of low detection accuracy, poor reliability, and high cost of manual inspection for conveyor-belt-surface defect detection, in this paper we propose a new conveyor-belt-surface defect detection method based on knowledge distillation. First, a data enhancement method combining GAN and copy–pasting strategies is proposed to expand the dataset and solve the problem of insufficient and difficult-to-obtain samples of conveyor-belt-surface defects. Then, the target detection network, the YOLOv5 model, is pruned to generate a mini-network. A knowledge distillation method for fine-grained feature simulation is used to distill the lightweight detection network YOLOv5n and the pruned mini-network YOLOv5n-slim. The experiments show that our method significantly reduces the number of parameters and the inference time of the model while significantly improving the detection accuracy, reaching up to 97.33% in the detection of conveyor belt defects. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
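
As a sketch of fine-grained feature imitation, the snippet below aligns a thinner student feature map to the teacher with a 1x1 adapter and computes a masked MSE only in regions of interest (for example, near ground-truth defect boxes); the adapter and the mask construction are assumptions, not the paper's distillation scheme.

    import torch
    import torch.nn as nn

    class FeatureImitation(nn.Module):
        """Distil teacher feature maps into a thinner student: a 1x1 adapter aligns
        channel counts, and the MSE is computed only where the mask is 1 (e.g. cells
        near ground-truth defect boxes), keeping the imitation 'fine-grained'."""
        def __init__(self, student_ch, teacher_ch):
            super().__init__()
            self.adapt = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

        def forward(self, student_feat, teacher_feat, mask):
            aligned = self.adapt(student_feat)
            diff = (aligned - teacher_feat) ** 2 * mask          # mask: (B, 1, H, W)
            return diff.sum() / mask.sum().clamp(min=1.0)

    s = torch.randn(2, 64, 20, 20)          # pruned student feature map
    t = torch.randn(2, 128, 20, 20)         # teacher feature map
    m = (torch.rand(2, 1, 20, 20) > 0.7).float()
    print(FeatureImitation(64, 128)(s, t, m))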

Review

24 pages, 26974 KiB  
Review
A Review of Target Recognition Technology for Fruit Picking Robots: From Digital Image Processing to Deep Learning
by Xuehui Hua, Haoxin Li, Jinbin Zeng, Chongyang Han, Tianci Chen, Luxin Tang and Yuanqiang Luo
Appl. Sci. 2023, 13(7), 4160; https://doi.org/10.3390/app13074160 - 24 Mar 2023
Cited by 26 | Viewed by 5086
Abstract
Machine vision technology has dramatically improved the efficiency, speed, and quality of fruit-picking robots in complex environments. Target recognition technology for fruit is an integral part of the recognition systems of picking robots. The traditional digital image processing technology is a recognition method based on hand-designed features, which makes it difficult to achieve better recognition as it results in dealing with the complex and changing orchard environment. Numerous pieces of literature have shown that extracting special features by training data with deep learning has significant advantages for fruit recognition in complex environments. In addition, to realize fully automated picking, reconstructing fruits in three dimensions is a necessary measure. In this paper, we systematically summarize the research work on target recognition techniques for picking robots in recent years, analyze the technical characteristics of different approaches, and conclude their development history. Finally, the challenges and future development trends of target recognition technology for picking robots are pointed out. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

26 pages, 489 KiB  
Review
A Survey: Network Feature Measurement Based on Machine Learning
by Muyi Sun, Bingyu He, Ran Li, Jinhua Li and Xinchang Zhang
Appl. Sci. 2023, 13(4), 2551; https://doi.org/10.3390/app13042551 - 16 Feb 2023
Cited by 1 | Viewed by 2986
Abstract
In network management, network measuring is crucial. Accurate network measurements can increase network utilization, network management, and the ability to find network problems promptly. With extensive technological advancements, the difficulty for network measurement is not just the growth in users and traffic but also the increasingly difficult technical problems brought on by the network’s design becoming more complicated. In recent years, network feature measurement issues have been extensively solved by the use of ML approaches, which are ideally suited to thorough data analysis and the investigation of complicated network behavior. However, there is yet no favored learning model that can best address the network measurement issue. The problems that ML applications in the field of network measurement must overcome are discussed in this study, along with an analysis of the current characteristics of ML algorithms in network measurement. Finally, network measurement techniques that have been used as ML techniques are examined, and potential advancements in the field are explored and examined. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
