Topic Editors

Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea
Department of Computer Science, University of Beira Interior, 6201-001 Covilhã, Portugal

Applied Computer Vision and Pattern Recognition

Abstract submission deadline: closed (31 December 2021)
Manuscript submission deadline: closed (31 March 2022)
Viewed by: 295,964

Topic Information

Dear Colleagues,

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Computer vision tasks include methods for acquiring digital images (through image sensors), processing them, and analyzing them in order to understand their content. In general, it deals with the extraction of high-dimensional data from the real world in order to produce numerical or symbolic information that the computer can interpret. In this interpretation step, computer vision is closely related to pattern recognition.

Pattern recognition, in turn, can be defined as the identification and classification of meaningful patterns in data, based on the extraction and comparison of the characteristic properties or features of the data, typically by means of machine learning algorithms. It is a very important area of research and application, underpinning developments in related fields such as computer vision, image processing, text and document analysis, and neural networks. It is closely related to machine learning and finds applications in rapidly emerging areas such as biometrics, bioinformatics, multimedia data analysis and, more recently, data science.

The Applied Computer Vision and Pattern Recognition topic invites papers on theoretical and applied issues including, but not limited to, the following:

  • Statistical, structural and syntactic pattern recognition;
  • Neural networks, machine learning and deep learning;
  • Computer vision, robot vision and machine vision;
  • Multimedia systems and multimedia content;
  • Bio-signal processing, speech processing, image processing and video processing;
  • Data mining, information retrieval, big data and business intelligence.

This topic will present the results of research describing recent advances in both the computer vision and pattern recognition fields.

Prof. Dr. Antonio Fernández-Caballero
Prof. Dr. Byung-Gyu Kim
Prof. Dr. Hugo Pedro Proença
Topic Editors

Keywords

  • pattern recognition
  • neural networks, machine learning
  • deep learning, artificial intelligence
  • computer vision
  • multimedia
  • data mining
  • signal processing
  • image processing

Participating Journals

| Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC |
| --- | --- | --- | --- | --- | --- |
| Applied Sciences (applsci) | 2.5 | 5.3 | 2011 | 17.8 days | CHF 2400 |
| AI (ai) | 3.1 | 7.2 | 2020 | 17.6 days | CHF 1600 |
| Big Data and Cognitive Computing (BDCC) | 3.7 | 7.1 | 2017 | 18 days | CHF 1800 |
| Mathematical and Computational Applications (mca) | 1.9 | – | 1996 | 28.8 days | CHF 1400 |
| Machine Learning and Knowledge Extraction (make) | 4.0 | 6.3 | 2019 | 27.1 days | CHF 1800 |

Preprints.org is a multidisciplinary platform providing preprint services, dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of the following benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (87 papers)

17 pages, 3304 KiB  
Article
Meta-YOLO: Meta-Learning for Few-Shot Traffic Sign Detection via Decoupling Dependencies
by Xinyue Ren, Weiwei Zhang, Minghui Wu, Chuanchang Li and Xiaolan Wang
Appl. Sci. 2022, 12(11), 5543; https://doi.org/10.3390/app12115543 - 30 May 2022
Cited by 11 | Viewed by 4813
Abstract
Considering the currently low coverage of roadside cooperative devices, automated driving should detect all road markings relevant to driving safety, such as traffic signs, which tend to be of great variety but few in number. In this work, we propose an innovative few-shot object detection framework, Meta-YOLO, whose challenge is to generalize to unseen classes using only a few seen classes. Simply integrating the YOLO mechanism into a meta-learning pipeline encounters problems with computational efficiency and mistaken detections. Therefore, we construct a two-stage meta-learner model that learns the learner initialization, the learner update direction and the learning rate, all in a single meta-learning process. High-fidelity target features improve the performance of the meta-learner, so we also design a feature decorrelation module (FDM), which first transforms non-linear features into computable linear features based on random Fourier features (RFF) and then perceives and removes global correlations by iteratively saving and reloading the features and sample weights of the model. We introduce a three-head module that learns global, local and patch correlations from the category detection result output by the aggregation in the meta-learner, endowing the detector ϕ with multi-scale ability. In our experiments, the proposed algorithm outperforms the three benchmark algorithms and improves the mAP of few-shot detection by 39.8%. Full article
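
The FDM's linearization step is based on random Fourier features (RFF); a minimal sketch of that standard construction follows, assuming a Gaussian kernel (dimensions and names are illustrative, not taken from the paper):

```python
import numpy as np

def random_fourier_features(x, n_components=128, gamma=1.0, seed=0):
    """Approximate a Gaussian-kernel feature map z(x), so that
    z(x) @ z(y) ~= exp(-gamma * ||x - y||^2) (Rahimi & Recht, 2007)."""
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    # Frequencies sampled from the kernel's spectral density, plus random phases.
    w = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(x @ w + b)

# After the mapping, non-linear dependencies between raw features show up as
# ordinary linear correlations, which can then be measured and suppressed.
feats = np.random.rand(32, 64)              # 32 samples, 64-dim features
z = random_fourier_features(feats)          # (32, 128) linearized features
cov = np.cov(z, rowvar=False)               # correlations to be removed
```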

18 pages, 17613 KiB  
Article
Dynamic Anchor: A Feature-Guided Anchor Strategy for Object Detection
by Xing Liu, Huai-Xin Chen and Bi-Yuan Liu
Appl. Sci. 2022, 12(10), 4897; https://doi.org/10.3390/app12104897 - 12 May 2022
Cited by 4 | Viewed by 2431
Abstract
The majority of modern object detectors rely on a set of pre-defined anchor boxes, which enhances detection performance dramatically. Nevertheless, the pre-defined anchor strategy suffers from some drawbacks, especially the complex hyper-parameters of anchors, which seriously affect detection performance. In this paper, we propose a feature-guided anchor generation method named dynamic anchor. Dynamic anchor mainly comprises two structures: the anchor generator and the feature enhancement module. The anchor generator leverages semantic features to predict optimized anchor shapes at the locations in the feature maps where objects are likely to exist; by converting the predicted shape maps into location offsets, the feature enhancement module uses the high-quality anchors to improve detection performance. Compared with the hand-designed anchor scheme, dynamic anchor discards all pre-defined boxes and avoids complex hyper-parameters. In addition, only one anchor box is predicted for each location, which dramatically reduces computation. With ResNet-50 and ResNet-101 as the backbone of the one-stage detector RetinaNet, dynamic anchor achieved gains of 2.1 AP and 1.0 AP, respectively. The proposed dynamic anchor strategy can easily be integrated into anchor-based detectors to replace the traditional pre-defined anchor scheme. Full article
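
As a rough illustration of the idea, not the authors' exact architecture, a feature-guided anchor head can be sketched as a 1×1 convolution that predicts a single anchor shape per feature-map location:

```python
import torch
import torch.nn as nn

class AnchorGenerator(nn.Module):
    """Toy feature-guided anchor head: predicts one anchor shape (w, h)
    per feature-map location instead of using pre-defined boxes."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.shape_pred = nn.Conv2d(in_channels, 2, kernel_size=1)

    def forward(self, feat, stride=8):
        # Exponential keeps widths/heights positive; scaled by the stride.
        wh = stride * torch.exp(self.shape_pred(feat))        # (B, 2, H, W)
        b, _, h, w = wh.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        centers = torch.stack((xs, ys), dim=0).to(feat) * stride + stride // 2
        centers = centers.unsqueeze(0).expand(b, -1, -1, -1)  # (B, 2, H, W)
        # One dynamic anchor (cx, cy, w, h) per location.
        return torch.cat((centers, wh), dim=1)

anchors = AnchorGenerator()(torch.randn(1, 256, 32, 32))      # (1, 4, 32, 32)
```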

17 pages, 14296 KiB  
Article
Lamb Behaviors Analysis Using a Predictive CNN Model and a Single Camera
by Yair González-Baldizón, Madaín Pérez-Patricio, Jorge Luis Camas-Anzueto, Oscar Mario Rodríguez-Elías, Elias Neftali Escobar-Gómez, Hector Daniel Vazquez-Delgado, Julio Alberto Guzman-Rabasa and José Armando Fragoso-Mandujano
Appl. Sci. 2022, 12(9), 4712; https://doi.org/10.3390/app12094712 - 7 May 2022
Cited by 6 | Viewed by 2684
Abstract
Object tracking is the process of estimating, over time, the location of one or more moving elements through an agent (camera, sensor, or other perceptive device). An important application of object tracking is the analysis of animal behavior to estimate animal health. Traditionally, experts in the field have performed this task. However, this approach requires a high level of knowledge in the area and sufficient employees to ensure monitoring quality. Another alternative is the application of sensors (inertial and thermal), which provide precise information to the user, such as location and temperature, among other data. Nevertheless, this type of analysis results in high infrastructure costs and constant maintenance. Another option to overcome these problems is to analyze an RGB image to obtain information from animal tracking. This alternative eliminates the reliance on experts and different sensors, yet it adds the challenge of correctly interpreting image ambiguity. Taking the aforementioned into consideration, this article proposes a methodology to analyze lamb behavior using an approach based on a predictive model and deep learning, with a single RGB camera. The method consists of two stages. First, an architecture for lamb tracking was designed and implemented using a CNN. Second, a predictive model was designed for the recognition of animal behavior. The results obtained in this research indicate that the proposed methodology is feasible and promising. According to the experimental results on the dataset used, the accuracy was 99.85% for detecting lamb activities with YOLOv4, and the proposed predictive model reached a mean accuracy of 83.52% for detecting abnormal states. These results suggest that the proposed methodology can be useful in precision agriculture for taking preventive actions and diagnosing possible diseases or health problems. Full article

20 pages, 5126 KiB  
Article
Driving Fatigue Detection Based on the Combination of Multi-Branch 3D-CNN and Attention Mechanism
by Wenbin Xiang, Xuncheng Wu, Chuanchang Li, Weiwei Zhang and Feiyang Li
Appl. Sci. 2022, 12(9), 4689; https://doi.org/10.3390/app12094689 - 6 May 2022
Cited by 9 | Viewed by 2691
Abstract
Fatigue driving is one of the main causes of traffic accidents today. In this study, a fatigue driving detection system based on a 3D convolutional neural network combined with a channel attention mechanism (Squeeze-and-Excitation module) is proposed. The model obtains grayscale, gradient, and optical flow channels from the input frames. The temporal and spatial information contained in the feature map is extracted by three-dimensional convolution, after which the feature map is fed to the attention mechanism module to optimize the feature weights. The eye aspect ratio (EAR) and mouth aspect ratio (MAR) are used as fatigue analysis criteria and, finally, a full binary tree SVM classifier is used to output the four driving states. In addition, this study uses a frame aggregation strategy to address the frame loss problem and provides application software that records the driver's status in real time while protecting the driver's facial privacy and security. Compared with other classical fatigue driving detection methods, this method extracts features from the temporal and spatial dimensions and optimizes the feature weights using the attention mechanism module, which significantly improves fatigue detection performance. The experimental results show that 95% discriminative accuracy is achieved on the FDF dataset, so the method can be effectively applied to driving fatigue detection. Full article
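
EAR and MAR here denote the eye and mouth aspect ratios commonly used in drowsiness analysis; a minimal sketch of the EAR computation, assuming the usual six-landmark eye layout:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (Soukupova & Cech, 2016):
    EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|). It drops toward zero when
    the eye closes, so a sustained low EAR signals drowsiness."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# A mouth aspect ratio (MAR) is built the same way from mouth landmarks;
# a high MAR sustained over many frames indicates yawning.
eye = np.array([[0, 2], [1, 3], [3, 3], [4, 2], [3, 1], [1, 1]], float)
print(eye_aspect_ratio(eye))  # ~0.5 for an open eye
```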

22 pages, 4508 KiB  
Article
A Framework for Short Video Recognition Based on Motion Estimation and Feature Curves on SPD Manifolds
by Xiaohe Liu, Shuyu Liu and Zhengming Ma
Appl. Sci. 2022, 12(9), 4669; https://doi.org/10.3390/app12094669 - 6 May 2022
Cited by 4 | Viewed by 1883
Abstract
Given the prosperity of video media such as TikTok and YouTube, the need for short video recognition is becoming more and more urgent. A significant feature of short videos is that they contain few scene changes, and the target (e.g., the face of the key person in the video) often runs through the entire clip. This paper presents a new short video recognition algorithm framework that transforms a short video into a family of feature curves on the symmetric positive definite (SPD) manifold as the basis of recognition. Thus far, no similar algorithm has been reported. The experimental results suggest that our method performs better on three challenging databases than seven other related algorithms published in top venues. Full article
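
The paper's exact curve construction is not detailed here; one common way to obtain SPD-valued feature curves from a video is to slide a window over per-frame features and take regularized covariance descriptors, compared under the log-Euclidean metric. A sketch under that assumption:

```python
import numpy as np
from scipy.linalg import logm

def spd_curve(frame_feats, window=8, eps=1e-5):
    """Map a (T, d) sequence of per-frame features to a curve of SPD
    matrices: one regularized covariance descriptor per sliding window."""
    T, d = frame_feats.shape
    curve = []
    for t in range(T - window + 1):
        c = np.cov(frame_feats[t:t + window], rowvar=False)
        curve.append(c + eps * np.eye(d))    # regularize to keep it SPD
    return np.stack(curve)

def log_euclidean_dist(a, b):
    """Distance between two SPD matrices under the log-Euclidean metric."""
    return np.linalg.norm(logm(a) - logm(b), "fro")

feats = np.random.rand(40, 6)                # e.g., 40 frames, 6-dim features
curve = spd_curve(feats)                     # (33, 6, 6) SPD-valued curve
print(log_euclidean_dist(curve[0], curve[-1]))
```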

23 pages, 3130 KiB  
Article
ODEM-GAN: An Object Deformation Enhancement Model Based on Generative Adversarial Networks
by Zeyang Zhang, Zhongcai Pei, Zhiyong Tang and Fei Gu
Appl. Sci. 2022, 12(9), 4609; https://doi.org/10.3390/app12094609 - 3 May 2022
Cited by 2 | Viewed by 1936
Abstract
Object detection has attracted great attention in recent years. Many experts and scholars have proposed efficient solutions to object detection problems and achieved impressive performance. For example, the coordinate-based anchor-free (CBAF) module was proposed recently to predict the category and the box adjustments of an object from its feature part and its contextual part features, which are based on feature maps divided by spatial coordinates. However, these methods do not work very well in some particular situations (e.g., small object detection, scale variation, deformations, etc.), and the accuracy of object detection still needs to be improved. In this paper, to address these problems, we propose ODEM-GAN, built on CBAF, which utilizes generative adversarial networks to implement the detection of deformed objects. Specifically, ODEM-GAN first generates object deformation features and then uses these features to enhance the learning ability of CBAF, improving the robustness of the detection. We also conducted extensive experiments to validate the effectiveness of ODEM-GAN in the simulation of a parachute opening process. The experimental results demonstrate that, with the assistance of ODEM-GAN, the AP score of CBAF for parachute detection is 88.4%; thereby, the accuracy of CBAF in detecting deformed objects significantly increases. Full article

14 pages, 3478 KiB  
Article
Image Classification of Pests with Residual Neural Network Based on Transfer Learning
by Chen Li, Tong Zhen and Zhihui Li
Appl. Sci. 2022, 12(9), 4356; https://doi.org/10.3390/app12094356 - 25 Apr 2022
Cited by 41 | Viewed by 3893
Abstract
Agriculture is regarded as one of the key food sources for humans throughout history. In some countries, more than 90% of the population lives on agriculture. However, pests are regarded as one of the major causes of crop loss worldwide. Accurate and automated technology to classify pests can aid pest detection, with great significance for early preventive measures. This paper proposes a residual convolutional neural network for pest identification based on transfer learning. The IP102 agricultural pest image dataset was adopted as the experimental dataset, with data augmentation achieved through random cropping, color transformation, CutMix and other operations. These augmentations bring strong robustness to factors such as shooting angle, lighting and color changes. The experiments in this study compared the classification accuracy of the ResNeXt-50 (32 × 4d) model under different combinations of learning rate, transfer learning and data augmentation, and also compared the effects of data augmentation on the classification performance of different samples. The results show that classification based on transfer learning is generally superior to training from scratch: transfer learning can greatly improve the model's recognition ability and significantly reduce the training time needed to achieve the same classification accuracy. Choosing the appropriate data augmentation technology is also very important for improving classification accuracy. The classification accuracy reaches 86.95% with the combination of transfer learning + fine-tuning and CutMix. Compared to the original model, the classification accuracy on some smaller classes was significantly improved. Compared with related studies based on the same dataset, the method in this paper achieves higher classification accuracy for more effective application in the field of pest classification. Full article
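
A minimal sketch of the general recipe, an ImageNet-pretrained ResNeXt-50 (32×4d) with a replaced head plus CutMix, follows; the hyperparameters and the 102-class head for IP102 are illustrative, not the paper's exact configuration:

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning: start from ImageNet weights, swap the classifier head.
model = models.resnext50_32x4d(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 102)   # IP102 has 102 pest classes

def cutmix(images, labels, alpha=1.0):
    """CutMix: paste a random crop from a shuffled batch and mix the labels
    in proportion to the pasted area (Yun et al., 2019)."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip([cy - rh // 2, cy + rh // 2], 0, h)
    x1, x2 = np.clip([cx - rw // 2, cx + rw // 2], 0, w)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)     # exact mixed proportion
    return images, labels, labels[perm], lam      # loss: lam*CE(a) + (1-lam)*CE(b)
```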

24 pages, 3689 KiB  
Article
PCB Component Detection Using Computer Vision for Hardware Assurance
by Wenwei Zhao, Suprith Reddy Gurudu, Shayan Taheri, Shajib Ghosh, Mukhil Azhagan Mallaiyan Sathiaseelan and Navid Asadizanjani
Big Data Cogn. Comput. 2022, 6(2), 39; https://doi.org/10.3390/bdcc6020039 - 8 Apr 2022
Cited by 18 | Viewed by 7500
Abstract
Printed circuit board (PCB) assurance in the optical domain is a crucial field of study. Though there are many existing PCB assurance methods using image processing, computer vision (CV), and machine learning (ML), the PCB field is complex and increasingly evolving, so new techniques are required to overcome the emerging problems. Existing ML-based methods outperform traditional CV methods; however, they often require more data, have low explainability, and can be difficult to adapt when a new technology arises. To overcome these challenges, CV methods can be used in tandem with ML methods. In particular, human-interpretable CV algorithms such as those that extract color, shape, and texture features increase PCB assurance explainability. This allows for incorporation of prior knowledge, which effectively reduces the number of trainable ML parameters and, thus, the amount of data needed to achieve high accuracy when training or retraining an ML model. Hence, this study explores the benefits and limitations of a variety of common computer vision-based features for the task of PCB component detection. The study results indicate that color features demonstrate promising performance for PCB component detection. The purpose of this paper is to facilitate collaboration between the hardware assurance, computer vision, and machine learning communities. Full article
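
As an illustration of the kind of human-interpretable color feature such a study evaluates, the sketch below computes a normalized HSV histogram over a candidate component patch (the bin counts and color space are assumptions, not the study's exact settings):

```python
import cv2
import numpy as np

def color_histogram_feature(patch_bgr, bins=8):
    """HSV color histogram for a candidate PCB component patch.
    Interpretable by design: each entry counts pixels in one
    hue/saturation/value cell, so a classifier trained on it can be
    audited, unlike an opaque learned embedding."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins] * 3,
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, None).flatten()    # 512-dim feature vector

patch = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
feat = color_histogram_feature(patch)
```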

16 pages, 3458 KiB  
Article
Comparison of Multilayer Neural Network Models in Terms of Success of Classifications Based on EmguCV, ML.NET and Tensorflow.Net
by Martin Magdin, Juraj Benc, Štefan Koprda, Zoltán Balogh and Daniel Tuček
Appl. Sci. 2022, 12(8), 3730; https://doi.org/10.3390/app12083730 - 7 Apr 2022
Cited by 5 | Viewed by 2279
Abstract
In this paper, we compare three different multilayer neural network models in terms of their success in the classification phase. These models were designed for the EmguCV, ML.NET and Tensorflow.Net libraries, which are currently among the most widely used libraries in the implementation of automatic recognition systems. Using the EmguCV library, we achieved a success rate of 81.95% in the classification of human faces, and with ML.NET, based on the pre-trained ResNet50 model using convolution layers, up to 91.15% accuracy. The success of the classification process was influenced by the time required for training and by the time required for the classification itself. The Tensorflow.Net model did not show sufficient classification ability when classifying using vector distances; its highest classification success rate was only 13.31%. The neural networks were trained on a dataset of 1454 photographs of the faces of 43 people. At a time when neural networks are increasingly used for applications of different natures, it is necessary to choose a model for the classification process that can achieve the required accuracy with the minimum time required for training. The application we created allows users to insert images and create their own datasets, on the basis of which they can train a model with their own parameters. Models can then be saved and integrated into other applications. Full article

11 pages, 1293 KiB  
Article
PIFNet: 3D Object Detection Using Joint Image and Point Cloud Features for Autonomous Driving
by Wenqi Zheng, Han Xie, Yunfan Chen, Jeongjin Roh and Hyunchul Shin
Appl. Sci. 2022, 12(7), 3686; https://doi.org/10.3390/app12073686 - 6 Apr 2022
Cited by 8 | Viewed by 3565
Abstract
Owing to its wide range of applications, 3D object detection has attracted increasing attention in computer vision tasks. Most existing 3D object detection methods are based on LiDAR point cloud data. However, these methods have some limitations in localization consistency and classification confidence, due to the irregularity and sparsity of Light Detection and Ranging (LiDAR) point cloud data. Inspired by the complementary characteristics of LiDAR and camera sensors, we propose a new end-to-end learnable framework named Point-Image Fusion Network (PIFNet) to integrate the LiDAR point cloud and camera images. To resolve the problem of inconsistency in localization and classification, we designed an Encoder-Decoder Fusion (EDF) module to extract the image features effectively while maintaining the fine-grained localization information of objects. Furthermore, a new effective fusion module is proposed to integrate the color and texture features from images and the depth information from the point cloud. This module can alleviate the irregularity and sparsity problems of the point cloud features by capitalizing on the fine-grained information from camera images. In PIFNet, each intermediate feature map is fed into the fusion module to be integrated with its corresponding point-wise features. Furthermore, point-wise features are used instead of voxel-wise features to reduce information loss. Extensive experiments using the KITTI dataset demonstrate the superiority of PIFNet over other state-of-the-art methods, which it outperformed by 1.97% in mean Average Precision (mAP) and by 2.86% in Average Precision (AP) for the hard cases on the KITTI 3D object detection benchmark. Full article

17 pages, 7933 KiB  
Article
3D Pose Recognition System of Dump Truck for Autonomous Excavator
by Ju-hwan Lee, Junesuk Lee and Soon-Yong Park
Appl. Sci. 2022, 12(7), 3471; https://doi.org/10.3390/app12073471 - 29 Mar 2022
Cited by 3 | Viewed by 2992
Abstract
The purpose of an excavator is to dig up materials and load them onto heavy-duty dump trucks. Typically, an excavator is positioned at the rear of the dump truck when loading. In order to automate this process, this paper proposes a system that employs a combined stereo camera and two LiDAR sensors to determine the three-dimensional (3D) position of the truck’s cargo box and to analyze its loading space. Sparse depth information acquired from the two LiDAR sensors is used to detect points on the door of the cargo box and establish the plane on its rear side. Dense depth information of the cargo box acquired from the stereo camera sensor is projected onto the plane of the box’s rear to estimate its initial 3D position. In the next step, the point cloud sampled along the long shaft of the edge of the cargo box is used as the input of the Iterative Closest Point algorithm to calculate a more accurate cargo box position. The data collected from the stereo camera are then used to determine the 3D position of the cargo box and provide an estimate of the volume of the load along with the 3D position of the loading space to the excavator. In order to demonstrate the efficiency of the proposed method, a mock-up of a heavy-duty truck cargo box was created, and the volume of the load in the cargo box was analyzed. Full article
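
The ICP refinement step can be reproduced with an off-the-shelf library such as Open3D; a minimal sketch with placeholder point clouds and a coarse initial pose standing in for the stereo/LiDAR estimates:

```python
import numpy as np
import open3d as o3d

# Hypothetical inputs: points sampled along the cargo box's edge, a
# reference box model, and the coarse pose from the stereo/LiDAR stage.
source = o3d.geometry.PointCloud(
    o3d.utility.Vector3dVector(np.random.rand(500, 3)))
target = o3d.geometry.PointCloud(
    o3d.utility.Vector3dVector(np.random.rand(500, 3)))
init_pose = np.eye(4)                      # coarse initial 3D position

# ICP refinement: iteratively match nearest points and solve for the
# rigid transform minimizing point-to-point distances.
result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.05,      # match search radius (metres)
    init=init_pose,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(result.transformation)               # refined 4x4 cargo-box pose
```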

17 pages, 411 KiB  
Article
An Oversampling Method for Class Imbalance Problems on Large Datasets
by Fredy Rodríguez-Torres, José F. Martínez-Trinidad and Jesús A. Carrasco-Ochoa
Appl. Sci. 2022, 12(7), 3424; https://doi.org/10.3390/app12073424 - 28 Mar 2022
Cited by 22 | Viewed by 3218
Abstract
Several oversampling methods have been proposed for solving the class imbalance problem. However, most of them require searching the k-nearest neighbors to generate synthetic objects. This requirement makes them time-consuming and therefore unsuitable for large datasets. In this paper, an oversampling method for large class imbalance problems that does not require a k-nearest-neighbor search is proposed. According to our experiments on large datasets with different degrees of imbalance, the proposed method is at least twice as fast as the fastest method reported in the literature while obtaining similar oversampling quality. Full article
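
The abstract does not spell the method out, so the sketch below only illustrates the general idea of oversampling without a k-NN search, interpolating randomly chosen minority pairs; it is not the paper's algorithm:

```python
import numpy as np

def random_pair_oversample(minority, n_new, seed=0):
    """Generate synthetic minority-class objects by interpolating randomly
    chosen pairs. Unlike SMOTE-style methods, no k-nearest-neighbor search
    is needed, so the cost stays linear in the number of new objects."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(minority), size=n_new)
    j = rng.integers(0, len(minority), size=n_new)
    t = rng.random((n_new, 1))               # random point on each segment
    return minority[i] + t * (minority[j] - minority[i])

minority = np.random.rand(100, 5)            # minority-class objects
synthetic = random_pair_oversample(minority, n_new=900)   # rebalance
```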

12 pages, 4466 KiB  
Article
BBRefinement: A Universal Scheme to Improve the Precision of Box Object Detectors
by Petr Hurtik, Marek Vajgl and David Hynar
Appl. Sci. 2022, 12(7), 3402; https://doi.org/10.3390/app12073402 - 27 Mar 2022
Viewed by 1898
Abstract
We present a conceptually simple yet powerful and general scheme for refining the predictions of bounding boxes produced by an arbitrary object detector. Our approach was trained separately on single objects extracted from ground truth labels. For inference, it can be coupled with an arbitrary object detector to improve its precision. The method, called BBRefinement, uses a mixture of data consisting of the image crop of an object and the object's class and center. Because BBRefinement works in a restricted domain, it does not have to be concerned with multiscale detection, recognition of the object's class, computing confidence, or multiple detections. Thus, the training is much more effective, improving the performance of SOTA architectures by up to two mAP points on the COCO benchmark. The refinement process is fast; it adds 50–80 ms of overhead to a standard detector on an RTX 2080, so it can run in real time on standard hardware. Finally, we show that BBRefinement can also be applied to COCO's ground truth labels to create new, more precise labels. The link to the source code is provided in the contribution. Full article

11 pages, 8924 KiB  
Article
Non-Maximum Suppression Performs Later in Multi-Object Tracking
by Hong Liang, Ting Wu, Qian Zhang and Hui Zhou
Appl. Sci. 2022, 12(7), 3334; https://doi.org/10.3390/app12073334 - 25 Mar 2022
Cited by 7 | Viewed by 3136
Abstract
Multi-object tracking aims to assign a uniform ID to the same target in continuous frames and is widely used in autonomous driving, security monitoring, etc. In previous work, low-scoring boxes, which inevitably contain occluded targets, were filtered out by Non-Maximum Suppression (NMS) with a confidence threshold at the detection stage. To track occluded targets effectively, in this paper we propose performing NMS later: it works in the tracking rather than the detection stage. More candidate boxes that contain occluded targets are thus reserved for trajectory matching, while unrelated boxes are discarded according to the Intersection over Union (IoU) between the predicted and detected boxes. Furthermore, an unsupervised pre-trained person re-identification (ReID) model is applied to improve domain adaptability, and bicubic interpolation is used to increase the resolution of low-scoring boxes. Extensive experiments on the MOT17 and MOT20 datasets have proven the effectiveness of the proposed method in tracking occluded targets; it achieves an MOTA of 78.3%. Full article
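
A simplified sketch of the late IoU test that decides which low-scoring candidate boxes survive into trajectory matching; the names and threshold are illustrative:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def keep_candidates(detections, predicted_tracks, iou_thresh=0.3):
    """Late filtering: a low-scoring box survives if it overlaps some
    predicted track box, since it likely holds an occluded target; boxes
    unrelated to any trajectory are discarded, instead of letting a
    detection-stage confidence threshold drop them all up front."""
    return [d for d in detections
            if any(iou(d, p) >= iou_thresh for p in predicted_tracks)]
```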

14 pages, 2100 KiB  
Article
Feature Mining: A Novel Training Strategy for Convolutional Neural Network
by Tianshu Xie, Jiali Deng, Xuan Cheng, Minghui Liu, Xiaomin Wang and Ming Liu
Appl. Sci. 2022, 12(7), 3318; https://doi.org/10.3390/app12073318 - 24 Mar 2022
Cited by 5 | Viewed by 2388
Abstract
In this paper, we propose a novel training strategy named Feature Mining for convolutional neural networks (CNNs) that aims to strengthen the network’s learning of the local features. Through experiments, we found that different parts of the feature contain different semantics, while the network will inevitably lose a large amount of local information during feedforward propagation. In order to enhance the learning of the local features, Feature Mining divides the complete feature into two complementary parts and reuses this divided feature to make the network capture different local information; we call the two steps Feature Segmentation and Feature Reusing. Feature Mining is a parameter-free method with a plug-and-play nature and can be applied to any CNN model. Extensive experiments demonstrated the wide applicability, versatility, and compatibility of our method. Full article
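
A minimal sketch of the Feature Segmentation step, assuming a random complementary mask (the paper's actual partition rule may differ):

```python
import torch

def feature_segmentation(feat, p=0.5):
    """Split a feature map into two complementary parts. During training,
    each part is pushed through its own auxiliary head so the network must
    extract semantics from every local region, not just the most
    discriminative one. (The random mask here is an assumption.)"""
    mask = (torch.rand_like(feat[:, :1]) < p).to(feat)   # (B, 1, H, W)
    part_a = feat * mask          # one complementary view of the feature
    part_b = feat * (1 - mask)    # the other; together they cover feat
    return part_a, part_b

feat = torch.randn(2, 256, 7, 7)
a, b = feature_segmentation(feat)
assert torch.allclose(a + b, feat)   # the two parts are complementary
```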

15 pages, 31420 KiB  
Article
Monocular Real Time Full Resolution Depth Estimation Arrangement with a Tunable Lens
by Ricardo Oliva-García, Sabato Ceruso, José G. Marichal-Hernández and José M. Rodriguez-Ramos
Appl. Sci. 2022, 12(6), 3141; https://doi.org/10.3390/app12063141 - 19 Mar 2022
Cited by 4 | Viewed by 4131
Abstract
This work introduces a real-time full-resolution depth estimation device, which allows integral displays to be fed with a real-time light-field. The core principle of the technique is a high-speed focal stack acquisition method combined with an efficient implementation of the depth estimation algorithm, allowing the generation of real-time, high-resolution depth maps. As the procedure does not depend on any custom hardware, the described method can turn any high-speed camera that meets the requirements into a 3D camera with true depth output. The concept was tested with an experimental setup consisting of an electronically variable focus lens, a high-speed camera, and a GPU for processing, plus a control board for lens and image sensor synchronization. The comparison with other state-of-the-art algorithms shows our advantages in computational time and precision. Full article
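
The underlying depth-from-focus rule can be sketched as a per-pixel argmax of a sharpness measure across the focal stack; this is a simplified CPU version, not the paper's real-time GPU implementation:

```python
import cv2
import numpy as np

def depth_from_focal_stack(stack, focus_distances):
    """Classic depth-from-focus: for each pixel, pick the focal slice where
    a local sharpness measure (smoothed squared Laplacian) peaks; the lens
    focus distance of that slice is the depth estimate. stack is (N, H, W)."""
    sharpness = np.stack([
        cv2.GaussianBlur(cv2.Laplacian(img, cv2.CV_32F) ** 2, (5, 5), 0)
        for img in stack
    ])                                        # (N, H, W) focus measure
    best = np.argmax(sharpness, axis=0)       # index of the sharpest slice
    return np.asarray(focus_distances)[best]  # (H, W) full-resolution depth

stack = np.random.rand(16, 120, 160).astype(np.float32)   # 16 focal slices
depth = depth_from_focal_stack(stack, np.linspace(0.2, 2.0, 16))
```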

15 pages, 3589 KiB  
Article
Discrete HMM for Visualizing Domiciliary Human Activity Perception and Comprehension
by Ta-Wen Kuan, Shih-Pang Tseng, Che-Wen Chen, Jhing-Fa Wang and Chieh-An Sun
Appl. Sci. 2022, 12(6), 3070; https://doi.org/10.3390/app12063070 - 17 Mar 2022
Cited by 1 | Viewed by 1607
Abstract
Advances in artificial intelligence-based autonomous applications have led to the advent of domestic robots for smart elderly care; a critical preliminary step for such robots is improving their visual comprehension of human activities. In this paper, discrete hidden Markov models (D-HMMs) are used to investigate human activity recognition. Eleven daily home activities are recorded using a video camera with an RGB-D sensor to collect a dataset composed of 25 skeleton joints per frame, of which only 10 skeleton joints are utilized to efficiently perform human activity recognition. Features of the chosen ten skeleton joints are sequentially extracted as pose sequences for a specific human activity and then processed through coordinate transformation and vectorization into a codebook, before the D-HMM estimates the maximal posterior probability to predict the target activity. In the experiments, the confusion matrix is evaluated on the eleven human activities; furthermore, an extension criterion of the confusion matrix is examined to verify the robustness of the proposed work. The results indicate that D-HMM theory is promising not only for speech signal processing but also for visual signal processing and its applications. Full article
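
Recognition with a D-HMM reduces to scoring the observed codeword sequence under each activity's model; a minimal sketch of the scaled forward algorithm that computes such a log-likelihood:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM via the scaled forward pass.
    obs: codebook indices; pi: (S,) initial probabilities;
    A: (S, S) transition matrix; B: (S, V) emission matrix over codewords."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # propagate, weight by emission
        s = alpha.sum()                      # rescale to avoid underflow
        log_lik += np.log(s)
        alpha /= s
    return log_lik

# Classification: score the sequence under each activity's D-HMM and pick
# the activity whose model yields the maximal likelihood.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_log_likelihood([0, 1, 2, 2], pi, A, B))
```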

13 pages, 2590 KiB  
Article
Double Branch Attention Block for Discriminative Representation of Siamese Trackers
by Jiaqi Xi, Jin Yang, Xiaodong Chen, Yi Wang and Huaiyu Cai
Appl. Sci. 2022, 12(6), 2897; https://doi.org/10.3390/app12062897 - 11 Mar 2022
Cited by 1 | Viewed by 1668
Abstract
Siamese trackers have achieved a good balance between accuracy and efficiency in generic object tracking. However, background distractors degrade the discriminative representation of the target. To suppress the sensitivity of trackers to background distractors, we propose a Double Branch Attention (DBA) block and a Siamese tracker equipped with the DBA block, named DBA-Siam. First, the DBA block concatenates channels of multiple layers from the two branches of the Siamese framework to obtain a rich feature representation. Second, channel attention is applied to the two concatenated feature blocks to selectively enhance the robust features, strengthening the ability to distinguish the target from the complex background. Finally, the DBA block collects the contextual relevance between the Siamese branches and adaptively encodes it into the feature weights of the detection branch for information compensation. Ablation experiments show that the proposed block can enhance the discriminative representation of the target and significantly improve tracking performance. Results on two popular benchmarks show that DBA-Siam performs favorably against its counterparts; compared with the advanced algorithm CSTNet, DBA-Siam improves the EAO by 18.9% on VOT2016. Full article
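
The channel attention applied to the concatenated feature blocks can be sketched as a squeeze-and-excitation style gate; this is the generic form, not necessarily the paper's exact block:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate: globally pool each channel, pass
    the result through a small bottleneck MLP, and reweight channels so
    that features separating target from background distractors dominate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # squeeze: (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)                       # excite: reweight channels

attended = ChannelAttention(512)(torch.randn(1, 512, 25, 25))
```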

11 pages, 4887 KiB  
Article
No-Reference Image Quality Assessment Based on Image Multi-Scale Contour Prediction
by Fan Wang, Jia Chen, Haonan Zhong, Yibo Ai and Weidong Zhang
Appl. Sci. 2022, 12(6), 2833; https://doi.org/10.3390/app12062833 - 10 Mar 2022
Cited by 4 | Viewed by 2252
Abstract
Accurately assessing image quality is a challenging task, especially without a reference image. Most current no-reference image quality assessment methods still require reference images in the training stage, but reference images are usually not available in real scenes. In this paper, we propose a model named MSIQA, inspired by biological vision and convolutional neural networks (CNNs), which does not require reference images in the training or testing phases. The model contains two modules: a multi-scale contour prediction network that simulates the contour response of the human optic nerve to images at different distances, and a central attention peripheral inhibition module inspired by the receptive field mechanism of retinal ganglion cells. The training stage has two steps. In the first step, the multi-scale contour prediction network learns to predict the contour features of images at different scales; in the second step, the model combines the central attention peripheral inhibition module to learn to predict the quality score of an image. In the experiments, our method achieved excellent performance: the Pearson linear correlation coefficient of the MSIQA model on the LIVE database reached 0.988. Full article

26 pages, 1487 KiB  
Review
An Overview on Deep Learning Techniques for Video Compressive Sensing
by Wael Saideni, David Helbert, Fabien Courreges and Jean-Pierre Cances
Appl. Sci. 2022, 12(5), 2734; https://doi.org/10.3390/app12052734 - 7 Mar 2022
Cited by 11 | Viewed by 4405
Abstract
The use of compressive sensing has allowed impressive results to be captured in various applications, such as image and video processing, and it has become a promising direction of scientific research; it also provides extensive application value in optimizing video surveillance networks. In this paper, we introduce recent state-of-the-art video compressive sensing methods based on neural networks and group them into different categories. We compare these approaches by analyzing their network architectures and then present their pros and cons. The general conclusion of the paper identifies open research challenges and points out future research directions. The goal of this paper is to overview the current approaches in image and video compressive sensing and demonstrate their powerful impact on computer vision when well-designed compressive sensing algorithms are used. Full article

27 pages, 29095 KiB  
Article
Anthropometric Ratios for Lower-Body Detection Based on Deep Learning and Traditional Methods
by Jermphiphut Jaruenpunyasak, Alba García Seco de Herrera and Rakkrit Duangsoithong
Appl. Sci. 2022, 12(5), 2678; https://doi.org/10.3390/app12052678 - 4 Mar 2022
Cited by 1 | Viewed by 4420
Abstract
Lower-body detection can be useful in many applications, such as the detection of falls and injuries during exercise. However, it can be challenging to detect the lower body, especially under various lighting and occlusion conditions. This paper presents a novel lower-body detection framework using proposed anthropometric ratios and compares the performance of deep learning (convolutional neural networks and OpenPose) and traditional detection methods. According to the results, the proposed framework helps to successfully detect the accurate boundaries of the lower body under various illumination and occlusion conditions for lower-limb monitoring. The proposed framework of anthropometric ratios combined with convolutional neural networks (A-CNNs) achieves high accuracy (90.14%), while the combination of anthropometric ratios and traditional techniques (A-Traditional) for lower-body detection shows satisfactory performance, with an average accuracy of 74.81%. Although the accuracy of OpenPose (95.82%) is higher than that of the A-CNNs for lower-body detection, the A-CNNs have lower complexity than OpenPose, which is advantageous for lower-body detection and implementation in monitoring systems. Full article

19 pages, 7274 KiB  
Article
RoiSeg: An Effective Moving Object Segmentation Approach Based on Region-of-Interest with Unsupervised Learning
by Zeyang Zhang, Zhongcai Pei, Zhiyong Tang and Fei Gu
Appl. Sci. 2022, 12(5), 2674; https://doi.org/10.3390/app12052674 - 4 Mar 2022
Cited by 1 | Viewed by 1607
Abstract
Traditional video object segmentation often has low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on a large amount of annotated data and high-performance computing units. Therefore, DNNs are not suitable for some special scenarios (e.g., no prior knowledge or powerful computing ability). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on Region-of-Interest (ROI), which utilizes an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of images act as the ROI to represent the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into a classification problem based on the ROI. Third, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and compensate for class imbalance, respectively. We conducted extensive experiments to evaluate the performance of RoiSeg, and the results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes. Full article

19 pages, 7920 KiB  
Article
LW-FIRE: A Lightweight Wildfire Image Classification with a Deep Convolutional Neural Network
by Amila Akagic and Emir Buza
Appl. Sci. 2022, 12(5), 2646; https://doi.org/10.3390/app12052646 - 4 Mar 2022
Cited by 15 | Viewed by 3182
Abstract
Analysis of reports published by the leading national centers for monitoring wildfires and other emergencies reveals that the devastation caused by wildfires has increased 2.96-fold compared to a decade earlier. The reports show that the total number of wildfires is declining; however, their impact on wildlife appears to be more devastating. In recent years, deep neural network models have demonstrated state-of-the-art accuracy on many computer vision tasks. In this paper, we describe the design and implementation of a lightweight wildfire image classification model (LW-FIRE) based on convolutional neural networks. We explore different ways of using the existing dataset to efficiently train a deep convolutional neural network, and we also propose a new method for dataset transformation that increases the number of samples in the dataset and improves the accuracy and generalization of the deep learning model. Experimental results show that the proposed model outperforms state-of-the-art methods and is suitable for real-time classification of wildfire images. Full article

16 pages, 8692 KiB  
Article
Data Extraction Method for Industrial Data Matrix Codes Based on Local Adjacent Modules Structure
by Licheng Liao, Jianmei Li and Changhou Lu
Appl. Sci. 2022, 12(5), 2291; https://doi.org/10.3390/app12052291 - 22 Feb 2022
Cited by 4 | Viewed by 4325
Abstract
A 2D barcode is a reliable way to provide lifetime traceability of parts that are exposed to harsh environments. However, there are considerable challenges in adopting mobile cameras to read symbols directly marked on metal surfaces: images captured by mobile cameras are usually of low quality, with poor contrast due to the reflective surface of 2D barcode symbols. To deal with this problem, a novel deep-learning-based method for reading laser-marked Data Matrix symbols from mobile phone images is proposed. Utilizing the barcode module features, we train different convolutional neural network (CNN) models to learn the colors of two adjacent modules of a Data Matrix symbol. Depending on whether the colors of the two adjacent modules are the same or not, an edge image is built on a square grid of the same size as the barcode. A correction method based on the KM algorithm is used to obtain a corrected edge image, which helps to reconstruct the final barcode image. Experiments carried out on our database show that the proposed algorithm achieves high barcode recognition accuracy. Full article

15 pages, 10361 KiB  
Article
DeepProfile: Accurate Under-the-Clothes Body Profile Estimation
by Shufang Lu, Funan Lu, Xufeng Shou and Shuaiyin Zhu
Appl. Sci. 2022, 12(4), 2220; https://doi.org/10.3390/app12042220 - 21 Feb 2022
Viewed by 4134
Abstract
Accurate human body profiles have many potential applications. Image-based human body profile estimation can be regarded as a fine-grained semantic segmentation problem, which is typically used to locate objects and boundaries in images. However, existing image segmentation methods, such as human parsing, require significant amounts of annotation, and their datasets consider clothes as part of the human body profile; therefore, the results they generate are not accurate when the human subject is dressed in loose-fitting clothing. In this paper, we created and labeled an under-the-clothes human body contour keypoint dataset; we utilized a convolutional neural network (CNN) to extract the contour keypoints, then combined them with a body profile database to generate under-the-clothes profiles. In order to improve the precision of keypoint detection, we propose a short-skip multi-scale dense (SMSD) block in the CNN to keep the details of the image and increase the information flow among different layers. Extensive experiments were conducted to show the effectiveness of our method. Our method achieved better results than state-of-the-art methods, especially when the person was dressed in loose-fitting clothes, with competitive quantitative performance and less annotation effort. We also extended our method to the applications of 3D human model reconstruction and body size measurement. Full article

13 pages, 15516 KiB  
Article
Towards an Approach for Filtration Efficiency Estimation of Consumer-Grade Face Masks Using Thermography
by José Armando Fragoso-Mandujano, Madain Pérez-Patricio, Jorge Luis Camas-Anzueto, Hector Daniel Vázquez-Delgado, Eduardo Chandomí-Castellanos, Yair Gonzalez-Baldizón, Julio Alberto Guzman-Rabasa, Julio Cesar Martinez-Morgan and Luis Enrique Guillén-Ruíz
Appl. Sci. 2022, 12(4), 2071; https://doi.org/10.3390/app12042071 - 16 Feb 2022
Cited by 2 | Viewed by 2712
Abstract
Due to the increasing need for continuous use of face masks caused by COVID-19, it is essential to evaluate the filtration quality that each face mask provides. In this research, an estimation method based on thermal image processing was developed; the main objective was to evaluate the effectiveness of different face masks while they were worn during breathing. For the acquisition of heat distribution images, a thermographic imaging system was built; moreover, a deep learning model detected the leakage percentage of each face mask with a mAP of 0.9345, a recall of 0.842 and an F1-score of 0.82. The results of this research reveal that filtration effectiveness depends on heat loss through the manufacturing material; the proposed estimation method is simple, fast, and can be replicated and operated by people who are not experts in the computer field. Full article

14 pages, 3199 KiB  
Article
Chainlet-Based Ear Recognition Using Image Multi-Banding and Support Vector Machine
by Matthew Martin Zarachoff, Akbar Sheikh-Akbari and Dorothy Monekosso
Appl. Sci. 2022, 12(4), 2033; https://doi.org/10.3390/app12042033 - 16 Feb 2022
Cited by 3 | Viewed by 1862
Abstract
This paper introduces the Chainlet-based Ear Recognition algorithm using Multi-Banding and Support Vector Machine (CERMB-SVM). The proposed technique splits the gray input image into several bands based on the intensity of its pixels, similar to a hyperspectral image. It performs Canny edge detection on each generated normalized band, extracting edges that correspond to the ear shape in each band. The generated binary edge maps are then combined, creating a single binary edge map. The resulting edge map is divided into non-overlapping cells, and the Freeman chain code of each group of connected edges within each cell is determined. A histogram of each group of four contiguous cells is computed, and the generated histograms are normalized and linked together to create a chainlet for the input image. The chainlet histogram vectors of the dataset images are then utilized to train and test a pairwise Support Vector Machine (SVM). Results obtained on two benchmark ear image datasets demonstrate that the proposed CERMB-SVM method achieves considerably higher accuracy than principal component analysis-based techniques, and it also yields greater performance than its anchor chainlet technique and state-of-the-art learning-based ear recognition techniques. Full article
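
A minimal sketch of the Freeman chain code and the per-cell direction histograms from which chainlets are assembled; cell grouping and normalization details are simplified:

```python
import numpy as np

# Freeman 8-connectivity: direction symbol for each (drow, dcol) step.
DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
        (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(points):
    """Encode an ordered path of adjacent (row, col) edge pixels as
    Freeman directions 0..7 (0 = east, counter-clockwise)."""
    return [DIRS[(p2[0] - p1[0], p2[1] - p1[1])]
            for p1, p2 in zip(points, points[1:])]

def direction_histogram(code, bins=8):
    """Histogram of chain-code directions within one cell; normalized groups
    of these histograms over blocks of four contiguous cells are concatenated
    into a chainlet descriptor."""
    h = np.bincount(code, minlength=bins).astype(float)
    return h / max(h.sum(), 1.0)

path = [(5, 5), (5, 6), (4, 7), (4, 8)]        # east, north-east, east
print(freeman_chain_code(path))                # [0, 1, 0]
```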
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
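The banding, edge-map fusion, and chainlet steps can be sketched compactly. The Python fragment below is a simplified illustration under assumed parameters (four intensity bands, 16-pixel cells, 2x2-cell blocks, default Canny thresholds); the paper's exact band boundaries and block layout are not reproduced here.

```python
import cv2
import numpy as np

# 8-direction Freeman chain codes for steps between neighbouring edge pixels
FREEMAN = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
           (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def multiband_edge_map(gray, n_bands=4):
    """Split the grey image into intensity bands, Canny each normalised
    band, and combine the binary edge maps into a single map."""
    edges = np.zeros_like(gray)
    step = 256 // n_bands
    for b in range(n_bands):
        lo, hi = b * step, (b + 1) * step - 1
        band = np.where((gray >= lo) & (gray <= hi), gray, 0).astype(np.uint8)
        band = cv2.normalize(band, None, 0, 255, cv2.NORM_MINMAX)
        edges = np.maximum(edges, cv2.Canny(band, 50, 150))
    return edges

def cell_chain_histogram(cell):
    """Histogram of Freeman codes over connected edge pixels in one cell."""
    hist = np.zeros(8)
    ys, xs = np.nonzero(cell)
    pts = set(zip(xs.tolist(), ys.tolist()))
    for x, y in pts:
        for (dx, dy), code in FREEMAN.items():
            if (x + dx, y + dy) in pts:
                hist[code] += 1
    return hist

def chainlet(edges, cell=16):
    """Concatenate L2-normalised histograms of 2x2 groups of contiguous cells."""
    gh, gw = edges.shape[0] // cell, edges.shape[1] // cell
    grid = np.array([cell_chain_histogram(edges[r*cell:(r+1)*cell,
                                                 c*cell:(c+1)*cell])
                     for r in range(gh) for c in range(gw)]).reshape(gh, gw, 8)
    blocks = [grid[r:r+2, c:c+2].ravel()
              for r in range(gh - 1) for c in range(gw - 1)]
    return np.concatenate([b / (np.linalg.norm(b) + 1e-8) for b in blocks])
```

The resulting chainlet vectors would then train a pairwise SVM (for instance, sklearn.svm.SVC), mirroring the classification stage described in the abstract.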

12 pages, 2532 KiB  
Article
A Serial Attention Frame for Multi-Label Waste Bottle Classification
by Jingyu Xiao, Jiayu Xu, Chunwei Tian, Peiyi Han, Lei You and Shichao Zhang
Appl. Sci. 2022, 12(3), 1742; https://doi.org/10.3390/app12031742 - 8 Feb 2022
Cited by 13 | Viewed by 2729
Abstract
The multi-label recognition of damaged waste bottles has important significance in environmental protection. However, most previous methods perform poorly, especially on damaged waste bottle classification. In this paper, we propose a serial attention frame (SAF) to overcome this drawback. The proposed network architecture includes the following three parts: a residual learning block (RB), a mixed attention block (MAB), and a self-attention block (SAB). The RB uses ResNet to pretrain the SAF to extract more detailed information. To address the effect of the complex backgrounds in waste bottle recognition, a serial attention mechanism containing the MAB and SAB is presented. The MAB extracts more salient category information via the simultaneous use of spatial attention and channel attention. The SAB exploits the obtained features and their parameters to diversify the features and improve the classification results. The experimental results demonstrate that our proposed model exhibited good recognition performance on the collected waste bottle datasets, with eight labels across three attributes (the color, whether the bottle was damaged, and whether the wrapper had been removed), as well as on public image classification datasets. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

10 pages, 1687 KiB  
Article
PTRNet: Global Feature and Local Feature Encoding for Point Cloud Registration
by Cuixia Li, Shanshan Yang, Li Shi, Yue Liu and Yinghao Li
Appl. Sci. 2022, 12(3), 1741; https://doi.org/10.3390/app12031741 - 8 Feb 2022
Cited by 3 | Viewed by 2775
Abstract
Existing end-to-end point cloud registration methods are often inefficient and susceptible to noise. We propose an end-to-end point cloud registration network model, Point Transformer for Registration Network (PTRNet), that considers both local and global features to address these issues. Our model takes point clouds as inputs and applies a Transformer to extract their global features. Using a K-Nearest Neighbor (K-NN) topology, our method then encodes the local features of each point cloud and integrates them with the global features to obtain enhanced global features. Comparative experiments using the ModelNet40 data set show that our method offers better results than other methods, with a mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) between the ground truth and predicted values lower than those of competing methods. In the case of multiple object classes without noise, the rotation average absolute error of PTRNet is reduced to 1.601 degrees and the translation average absolute error to 0.005 units. Compared to other recent end-to-end registration methods and traditional point cloud registration methods, PTRNet has lower error, higher registration accuracy, and better robustness. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
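As a rough illustration of the K-NN local-feature encoding step, the sketch below gathers each point's k nearest neighbours and pools an EdgeConv-style local descriptor. The operator choice and k = 16 are assumptions for illustration, not PTRNet's published layer definitions.

```python
import torch

def knn_indices(points, k=16):
    """Indices of the k nearest neighbours of every point (self excluded)."""
    d = torch.cdist(points, points)                 # (B, N, N) pairwise dists
    return d.topk(k + 1, largest=False).indices[..., 1:]

def local_features(points, feats, k=16):
    """EdgeConv-style local encoding: concatenate each centre feature with
    (neighbour - centre) differences, then max-pool over the neighbours."""
    B, N, C = feats.shape
    idx = knn_indices(points, k)                    # (B, N, k)
    nbrs = torch.gather(feats.unsqueeze(1).expand(B, N, N, C), 2,
                        idx.unsqueeze(-1).expand(B, N, k, C))
    centre = feats.unsqueeze(2).expand(B, N, k, C)
    edge = torch.cat([centre, nbrs - centre], dim=-1)   # (B, N, k, 2C)
    return edge.max(dim=2).values                   # (B, N, 2C) local feature
```

These local descriptors would then be concatenated with the Transformer's global features, as the abstract describes.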

24 pages, 30859 KiB  
Article
Feature Transformation Framework for Enhancing Compactness and Separability of Data Points in Feature Space for Small Datasets
by Mahmoud Maher ElMorshedy, Radwa Fathalla and Yasser El-Sonbaty
Appl. Sci. 2022, 12(3), 1713; https://doi.org/10.3390/app12031713 - 7 Feb 2022
Cited by 4 | Viewed by 3280
Abstract
Compactness and separability of data points are two important properties that contribute to the accuracy of machine learning tasks such as classification and clustering. We propose a framework that enhances the goodness criteria of the two properties by transforming the data points to a subspace in the same feature space, where data points of the same class are most similar to each other. Most related research on feature engineering in the input data space relies on manually specified transformation functions. In contrast, our work utilizes a fully automated pipeline, in which the transformation function is learnt via an autoencoder for extraction of the latent representation and multi-layer perceptron (MLP) regressors for the feature mapping. We tested our framework on both standard small datasets and benchmark-simulated small datasets by taking small fractions of their samples for training. Our framework consistently produced the best results in all semi-supervised clustering experiments based on K-means and different seeding techniques, with regard to clustering metrics and execution time. In addition, it enhances the performance of linear support vector machine (LSVM) and artificial neural network (ANN) classifiers when embedded as a preprocessing step before applying the classifiers. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
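A minimal sketch of the idea, mapping each sample towards its class centroid in a latent space via a learned regressor, is shown below. PCA stands in for the paper's autoencoder, and the layer sizes and latent dimension are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA          # stand-in for the autoencoder
from sklearn.neural_network import MLPRegressor

def fit_feature_transformer(X, y, latent_dim=8):
    """Learn a mapping that pulls samples towards their class centroid in a
    latent space, improving compactness and separability (hypothetical sketch)."""
    enc = PCA(n_components=latent_dim).fit(X)
    Z = enc.transform(X)
    centroids = {c: Z[y == c].mean(axis=0) for c in np.unique(y)}
    targets = np.array([centroids[c] for c in y])   # per-sample regression target
    reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    reg.fit(X, targets)
    return reg          # reg.predict(X_new) yields the transformed features
```

The transformed features could then feed K-means, LSVM, or ANN classifiers, as in the paper's experiments.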

21 pages, 2895 KiB  
Article
Which Features Are More Correlated to Illuminant Estimation: A Composite Substitute
by Yunhui Luo, Xingguang Wang and Qing Wang
Appl. Sci. 2022, 12(3), 1175; https://doi.org/10.3390/app12031175 - 23 Jan 2022
Cited by 1 | Viewed by 1664
Abstract
Computational color constancy (CCC) aims to endow computers or cameras with the capability to remove the color bias effect caused by different scene illuminations. The first step of CCC is illuminant estimation, i.e., calculating the illuminant color for a given image scene. Recently, methods that directly map image features to illuminant estimates have provided an effective and robust solution to this issue. Nevertheless, due to the diversity of image features, it is unclear which features should be selected to model the illuminant color. In this research, a series of artificial features woven into a mapping-based illuminant estimation framework is extensively investigated. This framework employs a multi-model structure and integrates kernel-based fuzzy c-means (KFCM) clustering, non-negative least squares regression (NLSR), and fuzzy weighting. By comparing the resulting performance of different features, the features most correlated to illuminant estimation are identified within the candidate feature set. Furthermore, composite features are designed to achieve outstanding illuminant estimation performance. Extensive experiments are performed on typical benchmark datasets and validate the effectiveness of the proposed method. The proposed method makes illuminant estimation an explicit transformation of suitable image features with regressed and fuzzy weights, offering significant potential for both competitive performance and fast implementation compared with state-of-the-art methods. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
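The NLSR component, taken in isolation, amounts to learning non-negative weights from image features to the illuminant colour. Here is a hedged sketch of that step; the KFCM clustering and fuzzy weighting stages are omitted, and the feature matrix F is an assumed input.

```python
import numpy as np
from scipy.optimize import nnls

def fit_nlsr(F, L):
    """Non-negative least squares per channel: F (n x d) image features,
    L (n x 3) ground-truth illuminants -> weight matrix W (d x 3)."""
    return np.column_stack([nnls(F, L[:, c])[0] for c in range(3)])

def estimate_illuminant(W, f):
    """Map one feature vector to a unit-norm illuminant colour estimate."""
    rgb = f @ W
    return rgb / (np.linalg.norm(rgb) + 1e-12)
```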

19 pages, 1465 KiB  
Article
Automated Recognition of Chemical Molecule Images Based on an Improved TNT Model
by Yanchi Li, Guanyu Chen and Xiang Li
Appl. Sci. 2022, 12(2), 680; https://doi.org/10.3390/app12020680 - 11 Jan 2022
Cited by 3 | Viewed by 2499
Abstract
The automated recognition of optical chemical structures, with the help of machine learning, could speed up research and development efforts. However, historical sources often have some level of image corruption, which reduces the performance to near zero. To address this drawback, we need a dependable algorithm to help chemists further expand their research. This paper reports the results of research conducted for the Bristol-Myers Squibb Molecular Translation competition, which was held on Kaggle and which invited participants to convert old chemical images to their underlying chemical structures, annotated as InChI text; we define this work as molecular translation. We propose a transformer-based model that can be utilized for molecular translation. To better capture the details of the chemical structure, the image features we extract need to be accurate at the pixel level. TNT is one of the existing transformer models that can meet this requirement. This model was originally used for image classification and is essentially a transformer encoder, which cannot be utilized for generation tasks. Moreover, we believe that TNT cannot integrate the local information of images well, so we improve the core module of TNT, the TNT block, and propose a novel module, the Deep TNT block; stacking this module forms an encoder structure, and the vanilla transformer decoder is then used as a decoder, forming a chemical formula generation model based on the encoder–decoder structure. Since molecular translation is an image-captioning task, we named it the Image Captioning Model based on Deep TNT (ICMDT). A comparison with different models shows that our model has benefits in both convergence speed and final description accuracy. We have designed a complete process in the model inference and fusion phase to further enhance the final results. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

12 pages, 2851 KiB  
Communication
A Novel, Automated, and Real-Time Method for the Analysis of Non-Human Primate Behavioral Patterns Using a Depth Image Sensor
by Sang Kuy Han, Keonwoo Kim, Yejoon Rim, Manhyung Han, Youngjeon Lee, Sung-Hyun Park, Won Seok Choi, Keyoung Jin Chun and Dong-Seok Lee
Appl. Sci. 2022, 12(1), 471; https://doi.org/10.3390/app12010471 - 4 Jan 2022
Cited by 1 | Viewed by 2134
Abstract
By virtue of their upright locomotion, similar to that of humans, motion analysis of non-human primates has been widely used in order to better understand musculoskeletal biomechanics and neuroscience problems. Given the difficulty of applying a marker-based infrared optical tracking system to the behavior analysis of primates, two-dimensional (2D) video analysis has been applied. Distinct from a conventional marker-based optical tracking system, a depth image sensor system provides 3D information on movement without any skin markers. The specific aim of this study was to develop a novel algorithm to analyze the behavioral patterns of non-human primates in a home cage using a depth image sensor. The behavioral patterns of nine monkeys in their home cage, including sitting, standing, and pacing, were captured using a depth image sensor. Thereafter, these were analyzed by the observers' manual assessment and the newly written automated program. We confirmed that the measurement results from the observers' manual assessments and the automated program with depth image analysis were statistically identical. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

11 pages, 2266 KiB  
Communication
An Algorithm for Obtaining 3D Egg Models from Visual Images
by Zlatin Zlatev, Mariya Georgieva-Nikolova and Hristo Lukanov
Appl. Sci. 2022, 12(1), 373; https://doi.org/10.3390/app12010373 - 31 Dec 2021
Cited by 2 | Viewed by 2175
Abstract
Mathematical models describing the shape of eggs find application in various fields of practice. This article proposes a method and tools for a detailed study of the shape and peripheral contours of digital images of eggs that are suitable for grouping and sorting. A scheme has been adapted to determine the morphological characteristics of eggs, on the basis of which an algorithm has been created for obtaining their 3D models from color digital image data. The deviation between the major- and minor-axis dimensions measured with a caliper and those obtained by the proposed algorithm is 0.5–1.5 mm. A model of a correction factor has been established by which the three-dimensional shape of eggs can be determined with sufficient accuracy. The results obtained in this work support the assumption that the suitability of egg-shape algorithms strongly depends on the bird species studied. This is confirmed by data for mallard eggs, which have a more elliptical shape and correspondingly lower values of the correction coefficient 'c' (c = 1.55–4.96); in sparrow (c = 9.55–11.19) and quail (c = 11.71–13.11) eggs, the form tends to be ovoid. After testing the obtained model on eggs from three bird species (sparrow, mallard, and quail), the coefficient of determination of the proposed model was R2 = 0.96 and the standard error was SE = 0.08; all results show a p-value of the model less than α = 0.05. The proposed algorithm was applied to create 3D egg shapes that were not used in the previous calculations; the resulting error was up to 9%, i.e., in this test the algorithm had an accuracy of 91%. An advantage of the algorithm proposed here is that a human operator does not need to select points in the image, as is the case with some algorithms developed by other authors. The proposed methods and tools for the three-dimensional transformation of egg images would be applicable not only to the needs of poultry farming, but also to ornithological research when working with differently shaped varieties of eggs. Experimental results show that the proposed algorithm has sufficient accuracy. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
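To make the construction concrete, the sketch below generates a 3D egg point grid by revolving a 2D profile whose asymmetry grows with a correction coefficient c (echoing the reported trend that larger c corresponds to more ovoid eggs). The profile formula and its 0.02 scale factor are illustrative stand-ins rather than the paper's fitted model.

```python
import numpy as np

def egg_surface(L, B, c, n_u=100, n_v=60):
    """Revolve a 2D egg profile around the major axis.
    L: major-axis length, B: maximum breadth, c: correction coefficient
    (larger c -> more ovoid in this hypothetical profile)."""
    x = np.linspace(-L / 2, L / 2, n_u)
    ellipse = (B / 2) * np.sqrt(np.clip(1 - (2 * x / L) ** 2, 0, None))
    r = ellipse * (1 + 0.02 * c * (2 * x / L))      # asymmetric 'ovoid' term
    theta = np.linspace(0, 2 * np.pi, n_v)
    X = np.repeat(x[:, None], n_v, axis=1)
    Y = r[:, None] * np.cos(theta)[None, :]
    Z = r[:, None] * np.sin(theta)[None, :]
    return X, Y, Z                                   # 3D surface point grid
```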

14 pages, 4865 KiB  
Article
A Self-Attention Augmented Graph Convolutional Clustering Networks for Skeleton-Based Video Anomaly Behavior Detection
by Chengming Liu, Ronghua Fu, Yinghao Li, Yufei Gao, Lei Shi and Weiwei Li
Appl. Sci. 2022, 12(1), 4; https://doi.org/10.3390/app12010004 - 21 Dec 2021
Cited by 16 | Viewed by 3778
Abstract
In this paper, we propose a new method for detecting abnormal human behavior based on skeleton features using self-attention augmented graph convolution. Skeleton data have been proven robust to complex backgrounds, illumination changes, and dynamic camera scenes, and are naturally constructed as a graph in non-Euclidean space. In particular, spatial temporal graph convolutional networks (ST-GCN) can effectively learn the spatio-temporal relationships of non-Euclidean structured data. However, ST-GCN only operates on local neighborhood nodes and thereby lacks global information. We propose novel spatial temporal self-attention augmented graph convolutional networks (SAA-Graph) that combine an improved spatial graph convolution operator with a modified transformer self-attention operator to capture both local and global information about the joints. The spatial self-attention augmented module is used to understand the intra-frame relationships between human body parts. To the best of our knowledge, we are the first group to utilize self-attention for video anomaly detection tasks by enhancing spatial temporal graph convolution. Moreover, to validate the proposed model, we performed extensive experiments on two large-scale public standard datasets (the ShanghaiTech Campus and CUHK Avenue datasets), which reveal state-of-the-art performance for our proposed approach when compared to existing skeleton-based methods and graph convolution methods. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
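The core combination, a local graph convolution over the skeleton plus a global self-attention branch, can be sketched as follows in PyTorch. This is a structural illustration only; the paper's exact block design, temporal convolutions, and clustering head are omitted, and A_hat is an assumed pre-normalised adjacency matrix.

```python
import torch
import torch.nn as nn

class SpatialGCN(nn.Module):
    """Minimal spatial graph convolution over skeleton joints with a fixed
    normalised adjacency A_hat of shape (joints, joints)."""
    def __init__(self, in_ch, out_ch, A_hat):
        super().__init__()
        self.register_buffer("A_hat", A_hat)
        self.theta = nn.Linear(in_ch, out_ch)

    def forward(self, x):                    # x: (batch, joints, in_ch)
        return torch.relu(self.theta(self.A_hat @ x))

class SelfAttentionAugment(nn.Module):
    """Global self-attention branch fused residually with the local output,
    echoing the paper's augmentation of graph convolution with attention."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):                    # x: (batch, joints, ch)
        out, _ = self.attn(x, x, x)
        return x + out                       # residual fusion of global info
```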

25 pages, 1242 KiB  
Article
Automated Event Detection and Classification in Soccer: The Potential of Using Multiple Modalities
by Olav Andre Nergård Rongved, Markus Stige, Steven Alexander Hicks, Vajira Lasantha Thambawita, Cise Midoglu, Evi Zouganeli, Dag Johansen, Michael Alexander Riegler and Pål Halvorsen
Mach. Learn. Knowl. Extr. 2021, 3(4), 1030-1054; https://doi.org/10.3390/make3040051 - 16 Dec 2021
Cited by 20 | Viewed by 6067
Abstract
Detecting events in videos is a complex task, and many different approaches, aimed at a large variety of use-cases, have been proposed in the literature. Most approaches, however, are unimodal and only consider the visual information in the videos. This paper presents and evaluates different approaches based on neural networks where we combine visual features with audio features to detect (spot) and classify events in soccer videos. We employ model fusion to combine different modalities such as video and audio, and test these combinations against different state-of-the-art models on the SoccerNet dataset. The results show that a multimodal approach is beneficial. We also analyze how the tolerance for delays in classification and spotting time, and the tolerance for prediction accuracy, influence the results. Our experiments show that using multiple modalities improves event detection performance for certain types of events. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

12 pages, 634 KiB  
Article
Time Classification Algorithm Based on Windowed-Color Histogram Matching
by Hye-Jin Park, Jung-In Jang and Byung-Gyu Kim
Appl. Sci. 2021, 11(24), 11997; https://doi.org/10.3390/app112411997 - 16 Dec 2021
Cited by 2 | Viewed by 2722
Abstract
A web-based search system recommends and returns results such as customized image or video contents using information such as user interests, search time, and place. Time information extracted from images can be used as important metadata in the web search system. We present an efficient algorithm to classify the time period into day, dawn, and night when the input is a single image with a sky region. We employ Mask R-CNN to extract the sky region. Based on the extracted sky region, reference color histograms are generated, which can be considered the ground truth. To compare the histograms effectively, we design windowed-color histograms (for the RGB bands) to compare each time period from the sky region of the reference data with that of the input image. We also use a weighting approach to reflect a more separable feature in the windowed-color histogram. With the proposed windowed-color histogram, we achieve about 91% recognition accuracy on the test data. Compared with existing deep neural network models, we verify that the proposed algorithm achieves better performance on the test dataset. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
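One plausible reading of the windowed-colour histogram comparison is sketched below. The bin count, window size, optional weighting, and histogram intersection as the similarity measure are all assumptions for illustration.

```python
import numpy as np

def windowed_color_histogram(sky_rgb, bins=32, window=3, weights=None):
    """Per-channel histograms smoothed with a sliding window over the bins;
    a sketch of a windowed-colour histogram for an (H, W, 3) sky region."""
    hists = []
    for ch in range(3):
        h, _ = np.histogram(sky_rgb[..., ch], bins=bins, range=(0, 256),
                            density=True)
        kernel = np.ones(window) / window
        hists.append(np.convolve(h, kernel, mode="same"))
    h = np.concatenate(hists)
    return h * weights if weights is not None else h

def classify_time(sky_rgb, references):
    """references: {'day': hist, 'dawn': hist, 'night': hist} built from
    ground-truth sky regions; pick the closest by histogram intersection."""
    h = windowed_color_histogram(sky_rgb)
    scores = {label: np.minimum(h, ref).sum()
              for label, ref in references.items()}
    return max(scores, key=scores.get)
```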

18 pages, 92045 KiB  
Article
Cooperative Visual Augmentation Algorithm of Intelligent Vehicle Based on Inter-Vehicle Image Fusion
by Wei Liu, Yun Ma, Mingqiang Gao, Shuaidong Duan and Longsheng Wei
Appl. Sci. 2021, 11(24), 11917; https://doi.org/10.3390/app112411917 - 15 Dec 2021
Cited by 4 | Viewed by 2406
Abstract
In a connected vehicle environment based on vehicle-to-vehicle (V2V) technology, images from the front and ego vehicles are fused to augment a driver's or autonomous system's visual field, which helps avoid road accidents by eliminating blind spots (objects occluded by vehicles), especially tailgating in urban areas. Multi-view image fusion is a tough problem when the relative location of the two sensors is unknown and the object to be fused is occluded in some views. Therefore, we propose an image geometric projection model and a new fusion method between neighboring vehicles in a cooperative way. Based on a 3D inter-vehicle projection model, selected feature matching points are adopted to estimate the geometric transformation parameters. By adding depth information, our method also designs a new deep-affine transformation to fuse inter-vehicle images. Experimental results on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) datasets validate our algorithm. Compared with previous work, our method improves the IoU index by 2–3 times. This algorithm can effectively enhance the visual perception ability of intelligent vehicles and will help to promote the further development of computer vision technology in the field of cooperative perception. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
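For the geometric-transformation step, matched feature points between the front and ego views can drive a robust affine estimate, as in this OpenCV sketch. The paper's depth-aware deep-affine transform and 3D projection model are not reproduced here; this shows only a standard planar alignment under assumed inputs.

```python
import cv2
import numpy as np

def align_front_to_ego(img_front, pts_front, pts_ego):
    """Warp the front vehicle's image into the ego view using an affine
    transform estimated from matched keypoints (RANSAC rejects outliers).
    pts_front, pts_ego: (N, 2) float32 arrays of matched coordinates."""
    M, inliers = cv2.estimateAffine2D(pts_front, pts_ego, method=cv2.RANSAC)
    h, w = img_front.shape[:2]
    return cv2.warpAffine(img_front, M, (w, h))
```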

19 pages, 4788 KiB  
Article
AI-Based Video Clipping of Soccer Events
by Joakim Olav Valand, Haris Kadragic, Steven Alexander Hicks, Vajira Lasantha Thambawita, Cise Midoglu, Tomas Kupka, Dag Johansen, Michael Alexander Riegler and Pål Halvorsen
Mach. Learn. Knowl. Extr. 2021, 3(4), 990-1008; https://doi.org/10.3390/make3040049 - 8 Dec 2021
Cited by 7 | Viewed by 6159
Abstract
The current gold standard for extracting highlight clips from soccer games is the use of manual annotations and clippings, where human operators define the start and end of an event and trim away the unwanted scenes. This is a tedious, time-consuming, and expensive task, to the extent of being rendered infeasible for use in lower league games. In this paper, we aim to automate the process of highlight generation using logo transition detection, scene boundary detection, and optional scene removal. We experiment with various approaches, using different neural network architectures on different datasets, and present two models that automatically find the appropriate time interval for extracting goal events. These models are evaluated both quantitatively and qualitatively, and the results show that we can detect logo and scene transitions with high accuracy and generate highlight clips that are highly acceptable for viewers. We conclude that there is considerable potential in automating the overall soccer video clipping process. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 5987 KiB  
Article
Memory-Efficient AI Algorithm for Infant Sleeping Death Syndrome Detection in Smart Buildings
by Qian Huang, Chenghung Hsieh, Jiaen Hsieh and Chunchen Liu
AI 2021, 2(4), 705-719; https://doi.org/10.3390/ai2040042 - 8 Dec 2021
Cited by 4 | Viewed by 4719
Abstract
Artificial intelligence (AI) is fundamentally transforming smart buildings by increasing energy efficiency and operational productivity, improving life experience, and providing better healthcare services. Sudden Infant Death Syndrome (SIDS) is an unexpected and unexplained death of infants under one year old. Previous research reports that sleeping on the back can significantly reduce the risk of SIDS. Existing sensor-based wearable or touchable monitors have serious drawbacks such as inconvenience and false alarms, so they are not attractive for monitoring infant sleeping postures. Several recent studies use a camera, portable electronics, and an AI algorithm to monitor the sleep postures of infants. However, two major bottlenecks prevent AI from detecting potential baby sleeping hazards in smart buildings. To overcome these bottlenecks, in this work, we create a complete dataset containing 10,240 day and night vision samples, and use post-training weight quantization to solve the huge memory demand problem. Experimental results verify the effectiveness and benefits of our proposed idea. Compared with the state-of-the-art AI algorithms in the literature, the proposed method reduces the memory footprint by at least 89%, while achieving a similarly high detection accuracy of about 90%. Our proposed AI algorithm only requires 6.4 MB of memory space, while other existing AI algorithms for sleep posture detection require 58.2 MB to 275 MB; that is, the memory is reduced by at least 9 times without sacrificing detection accuracy. Therefore, our proposed memory-efficient AI algorithm has great potential to be deployed and run on edge devices, such as micro-controllers and the Raspberry Pi, which have small memory footprints, limited power budgets, and constrained computing resources. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
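Post-training weight quantization of the kind described is commonly done with TensorFlow Lite. The sketch below is one plausible toolchain, not necessarily the paper's (its framework is unstated), and MobileNetV2 is a placeholder backbone for the sleep-posture classifier.

```python
import tensorflow as tf

# Placeholder backbone; the paper's actual architecture is not specified here.
model = tf.keras.applications.MobileNetV2(weights=None, classes=2)

# Post-training weight quantization: weights are stored in reduced precision,
# shrinking the on-disk/in-memory footprint for edge deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("posture_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```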

16 pages, 616 KiB  
Article
EnCaps: Clothing Image Classification Based on Enhanced Capsule Network
by Feng Yu, Chenghu Du, Ailing Hua, Minghua Jiang, Xiong Wei, Tao Peng and Xinrong Hu
Appl. Sci. 2021, 11(22), 11024; https://doi.org/10.3390/app112211024 - 21 Nov 2021
Cited by 5 | Viewed by 2372
Abstract
Clothing image classification is increasingly important to the development of online clothing shopping. Clothing category marking, clothing commodity retrieval, and similar-clothing recommendation are popular applications in current clothing shopping, all of which rely on accurate clothing image classification. The wide variety and diverse styles of clothing make accurate clothing image classification difficult. Traditional neural networks cannot obtain the spatial structure information of clothing images, which leads to poor classification accuracy. To reach high accuracy, the enhanced capsule (EnCaps) network is proposed to exploit both image features and spatial structure features. First, a spatial structure extraction model is proposed to obtain the clothing structure feature based on the EnCaps network. Second, an enhanced feature extraction model is proposed to extract more robust clothing features based on a deeper network structure and an attention mechanism. Third, parameter optimization based on an inception mechanism is used to reduce the computation in the proposed network. Experimental results indicate that the proposed EnCaps network achieves high performance in terms of classification accuracy and computational efficiency. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

18 pages, 1417 KiB  
Article
Spatio-Temporal Deep Learning-Based Methods for Defect Detection: An Industrial Application Study Case
by Lucas A. da Silva, Eulanda M. dos Santos, Leo Araújo, Natalia S. Freire, Max Vasconcelos, Rafael Giusti, David Ferreira, Anderson S. Jesus, Agemilson Pimentel, Caio F. S. Cruz, Ruan J. S. Belem, André S. Costa and Osmar A. da Silva
Appl. Sci. 2021, 11(22), 10861; https://doi.org/10.3390/app112210861 - 17 Nov 2021
Cited by 2 | Viewed by 2713
Abstract
Data-driven methods—particularly machine learning techniques—are expected to play a key role in the headway of Industry 4.0. One increasingly popular application in this context is when anomaly detection is employed to test manufactured goods in assembly lines. In this work, we compare supervised, semi/weakly-supervised, and unsupervised strategies to detect anomalous sequences in video samples which may be indicative of defective televisions assembled in a factory. We compare 3D autoencoders, convolutional neural networks, and generative adversarial networks (GANs) with data collected in a laboratory. Our methodology to simulate anomalies commonly found in TV devices is discussed in this paper. We also propose an approach to generate anomalous sequences similar to those produced by a defective device as part of our GAN approach. Our results show that autoencoders perform poorly when trained with only non-anomalous data—which is important because class imbalance in industrial applications is typically skewed towards the non-anomalous class. However, we show that fine-tuning the GAN is a feasible approach to overcome this problem, achieving results comparable to those of supervised methods. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

12 pages, 41848 KiB  
Article
Active Sonar Target Classification Method Based on Fisher’s Dictionary Learning
by Tongjing Sun, Jiwei Jin, Tong Liu and Jun Zhang
Appl. Sci. 2021, 11(22), 10635; https://doi.org/10.3390/app112210635 - 11 Nov 2021
Cited by 8 | Viewed by 1796
Abstract
The marine environment is complex and changeable, and the interference of noise and reverberation seriously affects the classification performance of active sonar equipment. In particular, when the targets to be measured have similar characteristics, underwater classification becomes more complex. Therefore, a robust recognition algorithm needs to be developed that can handle targets with similar features in a reverberation environment. This paper combines Fisher's discriminant criterion with a dictionary-learning-based sparse representation classification algorithm, and proposes an active sonar target classification method based on Fisher discriminant dictionary learning (FDDL). Based on the learned dictionaries, the proposed method introduces the Fisher restriction criterion to constrain the sparse coefficients, thereby obtaining a more discriminating dictionary; finally, it determines the category according to the reconstruction errors between the reconstructed signal and the signal to be measured. The classification performance is compared with existing methods, such as SVM (Support Vector Machine), SRC (Sparse Representation based Classification), D-KSVD (Discriminative K-Singular Value Decomposition), and LC-KSVD (Label-Consistent K-SVD), and the experimental results show that FDDL has better classification performance than the existing classification methods. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
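The final classification-by-reconstruction-error step can be illustrated with a bare-bones sparse-representation-style classifier. Least squares stands in for sparse coding here, and the Fisher-regularised dictionary learning itself is omitted, so treat this as a skeleton of FDDL's decision rule only.

```python
import numpy as np

def reconstruction_classify(D_per_class, x):
    """D_per_class: {label: dictionary with atoms as columns (d x k)}.
    Assign x to the class whose dictionary reconstructs it best."""
    errors = {}
    for label, D in D_per_class.items():
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)   # stand-in for sparse coding
        errors[label] = np.linalg.norm(x - D @ coef)   # reconstruction error
    return min(errors, key=errors.get)
```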

15 pages, 4882 KiB  
Article
Assembly Quality Detection Based on Class-Imbalanced Semi-Supervised Learning
by Zichen Lu, Jiabin Jiang, Pin Cao and Yongying Yang
Appl. Sci. 2021, 11(21), 10373; https://doi.org/10.3390/app112110373 - 4 Nov 2021
Cited by 4 | Viewed by 2082
Abstract
Due to the imperfect assembly process, the unqualified assembly of a missing gasket or lead seal will affect the product's performance and possibly cause safety accidents. Machine vision methods based on deep learning have been widely used in quality inspection. Semi-supervised learning (SSL) has been applied in training deep learning models to reduce the burden of data annotation. The dataset obtained from the production line tends to be class-imbalanced because the assemblies are qualified in most cases. However, most SSL methods suffer from lower performance on class-imbalanced datasets. Therefore, we propose a new semi-supervised algorithm that achieves high classification accuracy on the class-imbalanced assembly dataset with limited labeled data. Based on the mean teacher algorithm, the proposed algorithm uses certainty to dynamically select reliable teacher predictions for student learning, and the loss functions are modified to improve the model's robustness against class imbalance. Results show that when only 10% of the total data are labeled and the imbalance rate is 5.3, the proposed method can improve the accuracy from 85.34% to 93.67% compared to supervised learning. When the amount of annotated data accounts for 20%, the accuracy can reach 98.83%. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
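Two ingredients of this approach, the mean-teacher EMA update and certainty-based selection of teacher predictions, look roughly like this in PyTorch. The decay rate and confidence threshold are assumptions, and the paper's class-imbalance loss modifications are omitted.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: teacher weights track an exponential moving
    average of the student's weights."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def certainty_mask(teacher_logits, threshold=0.9):
    """Keep only teacher predictions whose max softmax probability is high,
    echoing the dynamic selection of reliable teacher predictions."""
    probs = torch.softmax(teacher_logits, dim=1)
    conf, pseudo_labels = probs.max(dim=1)
    return pseudo_labels, conf >= threshold   # labels + reliability mask
```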

21 pages, 7862 KiB  
Article
Illuminant Estimation Using Adaptive Neuro-Fuzzy Inference System
by Yunhui Luo, Xingguang Wang, Qing Wang and Yehong Chen
Appl. Sci. 2021, 11(21), 9936; https://doi.org/10.3390/app11219936 - 25 Oct 2021
Cited by 1 | Viewed by 2356
Abstract
Computational color constancy (CCC) is a fundamental prerequisite for many computer vision tasks. The key to CCC is to estimate the illuminant color so that the image of a scene under varying illumination can be normalized to an image under the canonical illumination. As one type of solution, combination algorithms generally try to reach a better illuminant estimate by weighting other unitary algorithms for a given image. However, due to the diversity of image features, applying the same weighting combination strategy to different images might result in unsound illuminant estimates. To address this problem, this study provides an effective option. A two-step strategy is first employed to cluster the training images; then, for each cluster, ANFIS (adaptive neuro-fuzzy inference system) models are trained to map image features to illuminant color. Given a test image, fuzzy weights measuring the degree to which the image belongs to each cluster are calculated, and a reliable illuminant estimate is obtained by weighting all ANFIS predictions. The proposed method allows the illuminant estimate to be a dynamic combination of initial illumination estimates from several unitary algorithms, relying on the powerful learning and reasoning capabilities of ANFIS. Extensive experiments on typical benchmark datasets demonstrate the effectiveness of the proposed approach. Although some learning-based methods outperform even the most carefully designed and tested combinations of statistical and fuzzy inference systems, the proposed method is good practice for illuminant estimation, since fuzzy inference is easy to implement in imaging signal processors with if-then rules and low computational effort. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
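At inference time, the fuzzy-weighted combination reduces to computing FCM-style memberships and blending per-cluster predictions. A sketch under assumed inputs (cluster centres in feature space and one illuminant estimate per cluster model) follows; the ANFIS models themselves are not reproduced.

```python
import numpy as np

def fuzzy_memberships(f, centres, m=2.0):
    """FCM-style membership of feature vector f to each cluster centre;
    m is the usual fuzziness exponent."""
    d = np.linalg.norm(centres - f, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1))
    return inv / inv.sum()                   # memberships sum to 1

def combine_estimates(estimates, memberships):
    """Fuzzy-weighted blend of per-cluster illuminant predictions.
    estimates: (k, 3) illuminant colours, memberships: (k,) weights."""
    rgb = memberships @ estimates
    return rgb / np.linalg.norm(rgb)         # unit-norm illuminant colour
```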

16 pages, 5611 KiB  
Article
How Character-Centric Game Icon Design Affects the Perception of Gameplay
by Xiaoxiao Cao, Makoto Watanabe and Kenta Ono
Appl. Sci. 2021, 11(21), 9911; https://doi.org/10.3390/app11219911 - 23 Oct 2021
Cited by 7 | Viewed by 3840
Abstract
Mobile games are developing rapidly as an important part of the national economy. Gameplay is an important attribute, and a game’s icon sometimes determines the user’s initial impression. Whether the user can accurately perceive gameplay and affective quality through the icon is particularly critical. In this article, a two-stage perceptual matching procedure is used to evaluate the perceptual quality of six categories of games whose icons include characters as elements. First, 60 highly visual matching icons were selected as second-stage objects through classification tasks. Second, through the semantic differential method and correlation analysis, highly visual matching icons’ affective matching quality was measured. Finally, a series of icon samples were determined, and element analysis was carried out. Several methods were proposed for improving the perceptual quality of game icons. Studying the perceptual matching relationship can better enhance the interaction between designers, developers, and users. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 4612 KiB  
Article
Unsupervised Anomaly Approach to Pedestrian Age Classification from Surveillance Cameras Using an Adversarial Model with Skip-Connections
by Husnu Baris Baydargil, Jangsik Park and Ibrahim Furkan Ince
Appl. Sci. 2021, 11(21), 9904; https://doi.org/10.3390/app11219904 - 23 Oct 2021
Cited by 1 | Viewed by 2181
Abstract
Anomaly detection is an active research area within the machine learning and scene understanding fields. Despite the ambiguous definition, anomaly detection is considered outlier detection in given data based on normality constraints. The biggest problem in real-world anomaly detection applications is the high bias of the available data due to class imbalance: only a limited share of all possible anomalous and normal samples is available, which makes supervised learning models difficult to use. This paper introduces an unsupervised, adversarially trained anomaly model with a unique encoder–decoder structure to address this issue. The proposed model distinguishes different age groups of people, namely children, adults, and the elderly, in surveillance camera data from Busan, Republic of Korea. The model has three major parts: a parallel-pipeline encoder, a decoder with partial skip-connections, and a discriminator. The encoder combines a conventional convolutional neural network with a dilated convolutional neural network, and the latent space vectors created at the end of both pipelines are concatenated. While the convolutional pipeline extracts local features, the dilated convolutional pipeline extracts global features from the same input image. The concatenation of these features is sent as the input into the decoder, which has partial skip-connection elements from both pipelines; this, along with the concatenated feature vector, improves feature diversity. The input image is reconstructed from the feature vector through stacked transpose convolution layers. Afterward, both the original input image and the corresponding reconstructed image are sent into the discriminator and are distinguished as real or fake. The image reconstruction loss, its corresponding latent space loss, and the adversarial Wasserstein loss are considered for training the model. Only images of the designated normal class are used during training. The hypothesis is that if the model is trained with normal class images, the reconstruction loss during inference will be minimal; if untrained anomalous class images are input to the model, the reconstruction loss will be very high. This method is applied to distinguish different age clusters of people using unsupervised training. The proposed model outperforms the benchmark models in both qualitative and quantitative measurements. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
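The stated hypothesis, low reconstruction error for the trained normal class and high error for unseen classes, maps directly onto a per-image scoring rule such as the sketch below. The model handle and plain MSE are placeholders for the paper's combined reconstruction, latent, and adversarial criteria.

```python
import torch

@torch.no_grad()
def anomaly_score(model, x):
    """Per-image reconstruction error for a batch x of shape (B, C, H, W);
    images scoring above a threshold calibrated on normal data are flagged
    as anomalous (i.e., an unseen age class)."""
    recon = model(x)
    return torch.mean((x - recon) ** 2, dim=(1, 2, 3))
```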

19 pages, 3776 KiB  
Article
The Influence of Commodity Presentation Mode on Online Shopping Decision Preference Induced by the Serial Position Effect
by Zhiman Zhu, Ningyue Peng, Yafeng Niu, Haiyan Wang and Chengqi Xue
Appl. Sci. 2021, 11(20), 9671; https://doi.org/10.3390/app11209671 - 17 Oct 2021
Viewed by 2793
Abstract
The information cluster that supports the final decision in a decision task is usually presented as a series of information. According to the serial position effect, the decision result is easily affected by the presentation order of the information. In this study, we investigate how the presentation mode of commodities and the informativeness of a shopping website influence online shopping decisions. To this end, we constructed two experiments in a virtual online shopping environment. The first experiment suggests that the serial position effect can induce human-computer interaction decision-making bias: user decisions in separate evaluation mode are more prone to the recency effect, whereas user decisions in joint evaluation mode are more prone to the primacy effect. The second experiment confirms the influence of explicit and implicit details of information on the human-computer interaction decision bias caused by the serial position effect. The results of this research can be applied to the design and development of shopping websites, or further to the interaction design of complex information systems, to alleviate user decision-making biases and induce users to make more rational decisions. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

21 pages, 459 KiB  
Review
A Review on Machine and Deep Learning for Semiconductor Defect Classification in Scanning Electron Microscope Images
by Francisco López de la Rosa, Roberto Sánchez-Reolid, José L. Gómez-Sirvent, Rafael Morales and Antonio Fernández-Caballero
Appl. Sci. 2021, 11(20), 9508; https://doi.org/10.3390/app11209508 - 13 Oct 2021
Cited by 29 | Viewed by 8619
Abstract
Continued advances in machine learning (ML) and deep learning (DL) present new opportunities for use in a wide range of applications. One prominent application of these technologies is defect detection and classification in the manufacturing industry, in order to minimise costs and ensure customer satisfaction. Specifically, this scoping review focuses on inspection operations in the semiconductor manufacturing industry, where different ML and DL techniques and configurations have been used to detect and classify defects in the images obtained with a scanning electron microscope (SEM). Such inspection operations have traditionally been carried out by specialised personnel in charge of visually judging the SEM images. We also include the performance results of the different techniques and configurations described in the articles found. A thorough comparison of these results will help us to find the best solutions for future research related to the subject. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 42126 KiB  
Article
Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection
by Sonain Jamil, MuhibUr Rahman and Amir Haider
Big Data Cogn. Comput. 2021, 5(4), 53; https://doi.org/10.3390/bdcc5040053 - 8 Oct 2021
Cited by 21 | Viewed by 6488
Abstract
Coral reefs are sub-aqueous calcium carbonate structures built by invertebrates known as corals. The charm and beauty of coral reefs attract tourists, and they play a vital role in preserving biodiversity, ceasing coastal erosion, and promoting business trade. Coral reefs also provide compounds used in treating human immunodeficiency virus (HIV) and heart disease. However, they are declining because of over-exploitation, damaging fishery, marine pollution, and global climate changes. The corals of Australia's Great Barrier Reef have started bleaching due to ocean acidification and global warming, which is an alarming threat to the earth's ecosystem. Many techniques have been developed to address such issues; however, each method has limitations due to the low resolution of images, diverse weather conditions, etc. In this paper, we propose a bag of features (BoF) based approach that can detect and localize bleached corals before safety measures are applied. The dataset contains images of bleached and unbleached corals, and various kernels are used with a support vector machine so that the extracted features can be classified. The accuracy of handcrafted descriptors and deep convolutional neural networks is analyzed and reported in detail, with comparison to the current methods. Various handcrafted descriptors, such as the local binary pattern, histogram of oriented gradients, locally encoded transform feature histogram, gray-level co-occurrence matrix, and completed joint-scale local binary pattern, are used for feature extraction, as are deep convolutional neural networks such as AlexNet, GoogLeNet, VGG-19, ResNet-50, Inception v3, and CoralNet. Experimental analysis shows that the proposed technique outperforms the current state-of-the-art methods, achieving 99.08% accuracy with a classification error of 0.92%. A novel bleached coral positioning algorithm is also proposed to locate bleached corals in coral reef images. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
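The BoF pipeline itself follows a standard recipe: cluster local descriptors into a visual vocabulary, histogram each image over the words, and classify with a kernel SVM. A generic sklearn sketch follows; the vocabulary size and kernel are assumptions, and the descriptors could come from any of the handcrafted or CNN extractors the paper compares.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_vocabulary(descriptor_list, k=200):
    """Cluster all local descriptors (one array per image) into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptor_list))

def bof_histogram(vocab, descriptors):
    """Normalised word-frequency histogram for one image's descriptors."""
    words = vocab.predict(descriptors)
    h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return h / (h.sum() + 1e-12)

# usage sketch:
# vocab = build_vocabulary(train_descriptors)
# X = np.array([bof_histogram(vocab, d) for d in train_descriptors])
# clf = SVC(kernel="rbf").fit(X, labels)   # the paper compares several kernels
```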

13 pages, 1794 KiB  
Article
A Shallow–Deep Feature Fusion Method for Pedestrian Detection
by Daxue Liu, Kai Zang and Jifeng Shen
Appl. Sci. 2021, 11(19), 9202; https://doi.org/10.3390/app11199202 - 3 Oct 2021
Cited by 1 | Viewed by 1828
Abstract
In this paper, a shallow–deep feature fusion (SDFF) method is developed for pedestrian detection. Firstly, we propose a shallow feature-based method under the ACF framework of pedestrian detection. More precisely, improved Haar-like templates with Local FDA learning are used to filter the channel maps of ACF such that these Haar-like features improve the discriminative power and therefore enhance the detection performance. The proposed shallow feature, referred to as the weighted subset-Haar-like feature, is efficient in pedestrian detection, with a high recall rate and precise localization. Secondly, the proposed shallow feature-based detection method operates as a region proposal stage. A classifier equipped with ResNet is then used to refine the region proposals, judging whether each region contains a pedestrian or not. Extensive experiments evaluated on the INRIA, Caltech, and TUD-Brussels datasets show that SDFF is an effective and efficient method for pedestrian detection. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

18 pages, 11112 KiB  
Article
Recovery of Natural Scenery Image by Content Using Wiener-Granger Causality: A Self-Organizing Methodology
by Cesar Benavides-Alvarez, Carlos Aviles-Cruz, Eduardo Rodriguez-Martinez, Andrés Ferreyra-Ramírez and Arturo Zúñiga-López
Appl. Sci. 2021, 11(19), 8795; https://doi.org/10.3390/app11198795 - 22 Sep 2021
Viewed by 2124
Abstract
One of the most important applications of data science and data mining is organizing, classifying, and retrieving digital images on the Internet. The current focus of researchers is to develop methods for the content-based exploration of natural scenery images. In this research paper, a self-organizing method for natural scene images using Wiener–Granger causality theory is proposed. It is achieved by carrying out Wiener–Granger causality on features organized in time-series form, introducing a feature extraction stage at random points within the image. Once the causal relationships are obtained, the k-means algorithm is applied to achieve the self-organization of these attributes. Regarding classification, the kNN distance classification algorithm is used to find the most similar images that share the causal relationships between the elements of the scenes. The proposed methodology is validated on three public image databases, obtaining 100% recovery results. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

13 pages, 4515 KiB  
Article
An Attention Enhanced Spatial–Temporal Graph Convolutional LSTM Network for Action Recognition in Karate
by Jianping Guo, Hong Liu, Xi Li, Dahong Xu and Yihan Zhang
Appl. Sci. 2021, 11(18), 8641; https://doi.org/10.3390/app11188641 - 17 Sep 2021
Cited by 19 | Viewed by 2426
Abstract
With the increasing popularity of artificial intelligence applications, artificial intelligence technology has begun to be applied in competitive sports. These applications have promoted the improvement of athletes' competitive ability, as well as the fitness of the masses. Human action recognition technology based on deep learning has gradually been applied to the analysis of the technical actions of competitive sports athletes, as well as the analysis of tactics. In this paper, a new graph convolution model is proposed. Delaunay's partitioning algorithm was used to construct a new spatiotemporal topology that can effectively capture the structural information and spatiotemporal features of athletes' technical actions. At the same time, an attention mechanism was integrated into the model, and different weight coefficients were assigned to the joints, which significantly improved the accuracy of technical action recognition. First, a comparison with the current state-of-the-art methods was undertaken on the general Kinetics and NTU-RGB+D datasets, where the performance of the new model was slightly improved. Then, the performance of our algorithm was compared with spatial temporal graph convolutional networks (ST-GCN) on the karate technique action dataset, where we found that the accuracy of our algorithm was significantly improved. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 11346 KiB  
Article
Spatiotemporal Correlation-Based Accurate 3D Face Imaging Using Speckle Projection and Real-Time Improvement
by Wei Xiong, Hongyu Yang, Pei Zhou, Keren Fu and Jiangping Zhu
Appl. Sci. 2021, 11(18), 8588; https://doi.org/10.3390/app11188588 - 16 Sep 2021
Cited by 7 | Viewed by 3238
Abstract
The reconstruction of 3D face data is widely used in the fields of biometric recognition and virtual reality. However, the rapid acquisition of 3D data with contemporary reconstruction technology is plagued by limited reconstruction accuracy, slow speed, and overly complex scenes. To solve this problem, an accurate 3D face-imaging implementation framework based on coarse-to-fine spatiotemporal correlation is designed, improving the spatiotemporal correlation stereo matching process and accelerating the processing using a spatiotemporal box filter. The reliability of the reconstruction parameters is further verified in order to resolve the contention between measurement accuracy and time cost. A binocular 3D data acquisition device with a rotary speckle projector is used to continuously and synchronously acquire an infrared speckle stereo image sequence for reconstructing an accurate 3D face model. Based on face mask data obtained by a high-precision industrial 3D scanner, the relationship between the number of projected speckle patterns, the matching window size, the reconstruction accuracy, and the time cost is quantitatively analysed. An optimal combination of parameters is used to achieve a balance between reconstruction speed and accuracy. To overcome the long acquisition time caused by switching the rotary speckle pattern, a compact 3D face acquisition device using a fixed three-speckle projector is designed. Using the optimal combination of parameters for the three speckles, a parallel pipeline strategy is adopted in each core processing unit to maximise system resource utilisation and data throughput. The most time-consuming activity, spatiotemporal correlation stereo matching, was accelerated by the graphics processing unit. The results show that the system achieves real-time image acquisition, as well as 3D face reconstruction, while maintaining acceptable systematic precision. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
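The matching cost that the framework accelerates is a correlation computed over a window spanning both space and time. Below is a minimal sketch of such a spatiotemporal zero-mean normalized cross-correlation on a rectified stereo sequence; window sizes are illustrative and the box-filter acceleration described above is omitted for clarity.

```python
# Sketch: spatiotemporal ZNCC between a left and a candidate right location,
# computed over a (time x patch) window. Assumes rectified (T, H, W) sequences.
import numpy as np

def st_zncc(seq_l, seq_r, y, xl, xr, half=2):
    """Returns the correlation in [-1, 1]; higher means a better disparity match."""
    a = seq_l[:, y - half:y + half + 1, xl - half:xl + half + 1].astype(np.float64)
    b = seq_r[:, y - half:y + half + 1, xr - half:xr + half + 1].astype(np.float64)
    a -= a.mean()
    b -= b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12))

# Disparity search: for each left pixel, keep the xr that maximises st_zncc.
```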

12 pages, 4012 KiB  
Article
Image Splicing Location Based on Illumination Maps and Cluster Region Proposal Network
by Ye Zhu, Xiaoqian Shen, Shikun Liu, Xiaoli Zhang and Gang Yan
Appl. Sci. 2021, 11(18), 8437; https://doi.org/10.3390/app11188437 - 11 Sep 2021
Cited by 3 | Viewed by 2106
Abstract
Splicing is the most common operation in image forgery, where the tampered background regions are imported from different images. Illumination maps are an inherent attribute of images and provide significant clues when searching for splicing locations. This paper proposes an end-to-end dual-stream network for splicing location, in which the illumination stream, which includes Grey-Edge (GE) and Inverse-Intensity Chromaticity (IIC), extracts the inconsistent illumination features, and the image stream extracts the global unnatural tampering features. The dual-stream features in our network are fused through a Multiple Feature Pyramid Network (MFPN), which contains richer context information. Finally, a Cluster Region Proposal Network (C-RPN) with spatial attention and an adaptive cluster anchor is proposed to generate potential tampered regions with greater retention of location information. Extensive experiments, evaluated on the NIST16 and CASIA standard datasets, show that our proposed algorithm is superior to several state-of-the-art algorithms: it achieves accurate tampering localization at the pixel level and is highly robust to post-processing operations such as noise, blur and JPEG recompression. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
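Of the two illumination cues named above, Grey-Edge is the simpler to sketch: it estimates the scene illuminant from a Minkowski norm of image derivatives, so locally inconsistent estimates hint at spliced content. The sketch below is a textbook Grey-Edge implementation, not the paper's network input pipeline; p and sigma are illustrative.

```python
# Sketch: Grey-Edge illuminant estimation for one image region.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def grey_edge(img, p=6.0, sigma=1.0):
    """img: (H, W, 3) float RGB -> unit-norm illuminant colour estimate."""
    est = np.zeros(3)
    for c in range(3):
        smoothed = gaussian_filter(img[..., c], sigma)
        grad = np.hypot(sobel(smoothed, axis=0), sobel(smoothed, axis=1))
        est[c] = (grad ** p).mean() ** (1.0 / p)      # Minkowski p-norm of edges
    return est / (np.linalg.norm(est) + 1e-12)

# Comparing grey_edge() over image tiles yields an illumination map whose
# inconsistencies can point to spliced regions.
```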

13 pages, 2288 KiB  
Article
Learning Spatial–Temporal Background-Aware Based Tracking
by Peiting Gu, Peizhong Liu, Jianhua Deng and Zhi Chen
Appl. Sci. 2021, 11(18), 8427; https://doi.org/10.3390/app11188427 - 10 Sep 2021
Cited by 2 | Viewed by 1616
Abstract
Discriminative correlation filter (DCF) based tracking algorithms have obtained prominent speed and accuracy, which has attracted extensive attention and research. However, some unavoidable deficiencies still exist. For example, the circulant shifted sampling process relies on a periodic assumption that causes boundary effects, which degrade the tracker’s discriminative performance and make the target hard to locate under complex appearance changes. In this paper, a spatial–temporal regularization module based on the BACF (background-aware correlation filter) framework is proposed, introducing a temporal regularization that deals effectively with the boundary-effect issue while also improving the accuracy of target recognition. This model can be effectively optimized by the alternating direction method of multipliers (ADMM), where each sub-problem has a closed-form solution. In addition, in terms of feature representation, we linearly combine traditional hand-crafted features with deep convolutional features to enhance the discriminative performance of the filter. Considerable experiments on multiple well-known benchmarks show that the proposed algorithm performs favorably against many state-of-the-art trackers and achieves an AUC score of 64.4% on OTB-100. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
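For readers unfamiliar with the DCF building block this work extends, the sketch below trains a single-channel correlation filter in closed form in the Fourier domain (a MOSSE/ridge-regression-style baseline). The Gaussian label and regularizer follow common DCF practice; the BACF background cropping and the paper's spatial-temporal regularization are not shown.

```python
# Sketch: closed-form correlation filter training and response, Fourier domain.
import numpy as np

def train_dcf(patch, sigma=2.0, lam=1e-2):
    """patch: (H, W) template -> filter in the Fourier domain."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    G = np.fft.fft2(np.fft.ifftshift(g))            # desired Gaussian response
    F = np.fft.fft2(patch)
    return np.conj(F) * G / (np.conj(F) * F + lam)  # ridge-regression solution

def response(filt_hat, patch):
    """Correlation response map; its peak is the predicted target location."""
    return np.real(np.fft.ifft2(filt_hat * np.fft.fft2(patch)))
```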

14 pages, 4120 KiB  
Article
Common Gabor Features for Image Watermarking Identification
by Ismail Taha Ahmed, Baraa Tareq Hammad and Norziana Jamil
Appl. Sci. 2021, 11(18), 8308; https://doi.org/10.3390/app11188308 - 8 Sep 2021
Cited by 15 | Viewed by 2340
Abstract
Image watermarking is one of many methods for preventing unauthorized alterations to digital images. The major goal of the research is to find and identify images that include a watermark, regardless of the method used to add the watermark or the shape of the watermark. To this end, this study advocates using the best Gabor features and classifiers to improve the accuracy of image watermarking identification. Discriminant analysis (DA) and random forests are used as classifiers, taking the mean squared energy feature, the mean amplitude feature, and their combined feature vector as inputs for classification. The performance of the classifiers is evaluated using a variety of feature sets, and the best results are reported. To assess the performance of the proposed method, we use the public VOC2008 database. The findings reveal that the proposed method’s DA classifier with the combined features had the greatest TPR of 93.71 and the lowest FNR of 6.29, which shows that the performance of the proposed approach is consistent. The proposed method has the advantages of being able to find watermarked images in any database and of not requiring a specific watermark type or embedding algorithm. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
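A minimal sketch of the feature side of this pipeline: Gabor responses at a few orientations, summarized by the two statistics named above (mean squared energy and mean amplitude) and fed to the two classifiers. Kernel parameters and orientation count are illustrative, not the paper's tuned settings; assumes OpenCV and scikit-learn.

```python
# Sketch: Gabor energy/amplitude features plus the two classifiers named above.
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier

def gabor_features(gray):
    """gray: (H, W) uint8 image -> per-orientation energy and amplitude stats."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):      # 4 orientations, illustrative
        kern = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0,
                                  ktype=cv2.CV_32F)
        resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
        feats += [float((resp ** 2).mean()),          # mean squared energy
                  float(np.abs(resp).mean())]         # mean amplitude
    return np.asarray(feats)

# X = np.stack([gabor_features(im) for im in images]); y = watermark_labels
# LinearDiscriminantAnalysis().fit(X, y) and RandomForestClassifier().fit(X, y)
```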

17 pages, 2715 KiB  
Article
Automatic System for the Detection of Defects on Olive Fruits in an Oil Mill
by Pablo Cano Marchal, Silvia Satorres Martínez, Juan Gómez Ortega and Javier Gámez García
Appl. Sci. 2021, 11(17), 8167; https://doi.org/10.3390/app11178167 - 3 Sep 2021
Cited by 3 | Viewed by 2091
Abstract
The ripeness and sanitary state of olive fruits are key factors in the final quality of the virgin olive oil (VOO) obtained. Since even a small number of damaged fruits may significantly impact the final quality of the produced VOO, the olive inspection in the oil mill reception area or in the first stages of the productive process is of great interest. This paper proposes and validates an automatic defect detection system that utilizes infrared images, acquired under regular operating conditions of an olive oil mill, for the detection of defects on individual fruits. First, the image processing algorithm extracts the fruits based on the iterative application of the active contour technique assisted with mathematical morphology operations. Second, the defect detection is performed on the segmented olives using a decision tree based on region descriptors. The final assessment of the algorithm suggests that it works effectively with a high detection rate, which makes it suitable for the VOO industry. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

11 pages, 2597 KiB  
Article
A Study on Multiple Factors Affecting the Accuracy of Multiclass Skin Disease Classification
by Jiayi Fan, Jongwook Kim, Insu Jung and Yongkeun Lee
Appl. Sci. 2021, 11(17), 7929; https://doi.org/10.3390/app11177929 - 27 Aug 2021
Cited by 1 | Viewed by 2566
Abstract
Diagnosis of skin diseases by human experts is a laborious task prone to subjective judgment. Aided by computer technology and machine learning, it is possible to improve the efficiency and robustness of skin disease classification. Deep transfer learning using off-the-shelf deep convolutional neural networks (CNNs) has huge potential in the automation of skin disease classification tasks. However, complicated architectures seem to be too heavy for the classification of only a few skin disease classes. In this paper, in order to study potential ways to improve the classification accuracy of skin diseases, multiple factors are investigated. First, two different off-the-shelf architectures, namely AlexNet and ResNet50, are evaluated. Then, approaches using either transfer learning or training from scratch are compared. In order to reduce the complexity of the network, the effects of shortening the depths of the deep CNNs are investigated. Furthermore, different data augmentation techniques based on basic image manipulation are compared. Finally, the choice of mini-batch size is studied. Experiments were carried out on the HAM10000 skin disease dataset. The results show that the ResNet50-based model is more accurate than the AlexNet-based model. The transferred knowledge from the ImageNet database helps to improve the accuracy of the model. The reduction in stages of the ResNet50-based model can reduce complexity while maintaining good accuracy. Additionally, the use of different types of data augmentation techniques and the choice of mini-batch size can also affect the classification accuracy of skin diseases. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
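One of the factors studied, shortening a transferred ResNet50, can be sketched as below: drop the last residual stage and attach a small head for the HAM10000 classes. Which stage to cut and the head design are illustrative assumptions; assumes PyTorch/torchvision.

```python
# Sketch: a "shortened" ResNet50 with transferred ImageNet weights.
import torch.nn as nn
import torchvision.models as models

def shortened_resnet50(num_classes=7):             # HAM10000 has 7 classes
    m = models.resnet50(weights="IMAGENET1K_V1")   # transfer-learning start point
    backbone = nn.Sequential(*list(m.children())[:-3])  # drop layer4, avgpool, fc
    return nn.Sequential(
        backbone,
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(1024, num_classes),              # layer3 outputs 1024 channels
    )
```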

24 pages, 5724 KiB  
Review
Review of Image-Based 3D Reconstruction of Building for Automated Construction Progress Monitoring
by Jingguo Xue, Xueliang Hou and Ying Zeng
Appl. Sci. 2021, 11(17), 7840; https://doi.org/10.3390/app11177840 - 25 Aug 2021
Cited by 38 | Viewed by 7367
Abstract
With the spread of camera-equipped devices, massive images and videos are recorded on construction sites daily, and the ever-increasing volume of digital images has inspired scholars to visually capture the actual status of construction sites from them. Three-dimensional (3D) reconstruction is the key to connecting the Building Information Model and the project schedule to daily construction images, which enables managers to compare the as-planned with the as-built status, detect deviations, and therefore monitor project progress. Many scholars have carried out extensive research and produced a variety of intricate methods. However, few studies comprehensively summarize the existing technologies and describe their commonalities and differences, so researchers cannot clearly identify the relationships between the various methods. Therefore, this paper focuses on the general technical path of the various methods and sorts them into a comprehensive research map, to provide a reference for researchers in the selection of research methods and paths. This is followed by identifying gaps in knowledge and highlighting future research directions. Finally, key findings are summarized. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

24 pages, 8802 KiB  
Article
Enhanced Tone Mapping Using Regional Fused GAN Training with a Gamma-Shift Dataset
by Sung-Woon Jung, Hyuk-Ju Kwon and Sung-Hak Lee
Appl. Sci. 2021, 11(16), 7754; https://doi.org/10.3390/app11167754 - 23 Aug 2021
Cited by 2 | Viewed by 2475
Abstract
High-dynamic-range (HDR) imaging is a digital image processing technique that enhances an image’s visibility by modifying its color and contrast ranges. Generative adversarial networks (GANs) have proven to be potent deep learning models for HDR imaging. However, obtaining a sufficient volume of training image pairs is difficult. This problem has been addressed using CycleGAN, but using CycleGAN to convert a low-dynamic-range (LDR) image to an HDR image produces problematic color distortion, and the intensity of the output image changes only slightly. Therefore, we propose a GAN training optimization model for converting LDR images into HDR images. First, a gamma-shift method is proposed for training the GAN model with an extended luminance range. Next, a weighted loss map trains the GAN model for tone compression in local areas of images. Then, a regional fusion training model is used to balance the training between the regional weight map and the restoring speed of local tone training. Finally, because the generator tends to perform well on bright images, mean gamma tuning is used to evaluate image luminance channels, which are then fed into the modules. Tests were conducted on foggy, dark-surrounding, bright-surrounding, and high-contrast images. The proposed model outperforms conventional models in a comparison test, and it complements the performance of an object detection model even in a real night environment. The model can be used in commercial closed-circuit television surveillance systems and in the security industry. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
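The gamma-shift idea of synthesizing luminance-shifted training pairs can be sketched in a few lines; the specific gamma values below are illustrative assumptions, not the dataset's actual settings.

```python
# Sketch: build gamma-shifted input/target pairs from a single image so the
# GAN sees an extended luminance range.
import numpy as np

def gamma_shift_pairs(img, gammas=(0.4, 0.7, 1.5, 2.5)):
    """img: float image in [0, 1]; yields (gamma-shifted input, original target)."""
    for g in gammas:
        yield np.clip(img ** g, 0.0, 1.0), img
```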

15 pages, 4519 KiB  
Article
Visual MAV Tracker with Adaptive Search Region
by Wooryong Park, Donghee Lee, Junhak Yi and Woochul Nam
Appl. Sci. 2021, 11(16), 7741; https://doi.org/10.3390/app11167741 - 23 Aug 2021
Cited by 2 | Viewed by 1838
Abstract
Tracking a micro aerial vehicle (MAV) is challenging because of its small size and swift motion. A new model was developed by combining a compact and adaptive search region (SR); it can accurately and robustly track MAVs with a fast computation speed. A compact SR, which is slightly larger than a target MAV, is less likely to include a distracting background than a large SR; thus, it can track the MAV accurately. Moreover, the compact SR reduces the computation time because tracking can be conducted with a relatively shallow network. An optimal SR-to-MAV size ratio was obtained in this study. However, this optimal compact SR causes frequent tracking failures under dynamic MAV motion. An adaptive SR is proposed to address this problem; it adaptively changes the location and size of the SR based on the size, location, and velocity of the MAV in the SR. The compact SR without the adaptive strategy tracks the MAV with an accuracy of 0.613 and a robustness of 0.086, whereas the compact and adaptive SR achieves an accuracy of 0.811 and a robustness of 1.0. Moreover, online tracking runs at approximately 400 frames per second, which is significantly faster than real-time. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
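The adaptive SR rule described above, recentring by velocity and rescaling with apparent size, can be sketched as follows; the SR-to-target ratio is illustrative, not the optimal value found in the paper.

```python
# Sketch: adaptive search-region update from the MAV's box and velocity.
def next_search_region(cx, cy, w, h, vx, vy, ratio=1.5):
    """(cx, cy, w, h): current MAV box; (vx, vy): velocity in pixels/frame."""
    sr_w, sr_h = ratio * w, ratio * h    # compact SR: slightly larger than target
    return cx + vx, cy + vy, sr_w, sr_h  # shift SR along the predicted motion
```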

13 pages, 3893 KiB  
Article
The Study on Internal Flow Characteristics of Disc Filter under Different Working Condition
by Yanbing Chi, Peiling Yang, Zixuan Ma, Haiying Wang, Yuxuan Liu, Bingbing Jiang and Zongguang Hu
Appl. Sci. 2021, 11(16), 7715; https://doi.org/10.3390/app11167715 - 22 Aug 2021
Cited by 5 | Viewed by 2324
Abstract
A disc filter (DF) is an important component in a micro irrigation system. However, it has a high head loss and low filtration efficiency, which can lead to the inoperability of micro irrigation systems. To improve the filtration ability and to decrease the pressure loss of the irrigation system, it is necessary to investigate the hydraulic characteristics of DFs. In this study, the filter bed of a DF was divided into three parts, i.e., upper, middle, and lower, which could be wrapped with a transparent film; a wrapped part was completely blocked. The purpose was to analyze the hydraulic characteristics of different clogged conditions in three types of filters under four flow rates. In addition, we attempted to simulate the filter operation process with computational fluid dynamics from two aspects: a macroscopic model and a simplified model. The results showed that the patterns of head loss among all of the DFs were consistent, and the macroscopic model, which treated the filter bed as a porous medium, could reproduce the measured results. The macroscopic model showed that there was a circular flow in the DF and that the flow velocity presented a symmetrical distribution in the horizontal direction. A high-pressure area appeared in the middle of the filter element, which demonstrated the highest head loss and may be the main flow area of the DF; the inner flow characteristics of the DF were consistent under different conditions. The simplified models showed that the main flow area is near the filter bed in the inner DF, and that the flow is tangent to the filter bed at between 45 and 90 degrees in the horizontal direction. The uneven distribution of velocity and pressure on the filter bed may be an important factor affecting filter efficiency. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 3721 KiB  
Article
High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction
by Van-Nhan Tran, Suk-Hwan Lee, Hoanh-Su Le and Ki-Ryong Kwon
Appl. Sci. 2021, 11(16), 7678; https://doi.org/10.3390/app11167678 - 20 Aug 2021
Cited by 22 | Viewed by 5625
Abstract
Deep learning models that can produce and synthesize hyper-realistic videos, known as DeepFakes, have developed rapidly. Moreover, the growth of forgery data has prompted concerns about malevolent usage. Detecting forgery videos is a crucial subject in the field of digital media. Nowadays, most models are based on deep learning neural networks and vision transformers, such as the SOTA model with an EfficientNetB7 backbone. However, due to the usage of excessively large backbones, these models have the intrinsic drawback of being too heavy. In our research, a high-performance DeepFake detection model for manipulated video is proposed, ensuring the accuracy of the model while keeping an appropriate weight. We inherited content from previous research related to distillation methodology, but our proposal takes a different approach, with manual distillation extraction, target-specific region extraction, data augmentation, frame and multi-region ensembles, a CNN-based model, and flexible classification with a dynamic threshold. Our proposal can reduce the overfitting problem, a common and particularly important problem affecting the quality of many models. To analyze the quality of our model, we performed tests on two datasets: on the DeepFake Detection Challenge (DFDC) dataset, our model obtains an AUC of 0.958 and an F1-score of 0.9243, compared with the SOTA model's AUC of 0.972 and F1-score of 0.906; on the smaller Celeb-DF v2 dataset, it obtains an AUC of 0.978 and an F1-score of 0.9628. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

18 pages, 3725 KiB  
Article
Automatic TV Logo Identification for Advertisement Detection without Prior Data
by Pedro Carvalho, Américo Pereira and Paula Viana
Appl. Sci. 2021, 11(16), 7494; https://doi.org/10.3390/app11167494 - 15 Aug 2021
Cited by 4 | Viewed by 2877
Abstract
Advertisements are often inserted in multimedia content, and this is particularly relevant in TV broadcasting, where they have a key financial role. In this context, the flexible and efficient processing of TV content to identify advertisement segments is highly desirable, as it can benefit different actors, including the broadcaster, the contracting company, and the end user. To this end, detecting the presence of the channel logo has been seen in the state-of-the-art as a good indicator. However, the difficulty of this challenging process increases as less prior data is available to help reduce uncertainty. As a result, the literature proposals that achieve the best results typically rely on prior knowledge or pre-existent databases. This paper proposes a flexible method for processing TV broadcasting content aiming at detecting channel logos, and consequently advertising segments, without using prior data about the channel or content. The final goal is to enable stream segmentation identifying advertisement slices. The proposed method was assessed over available state-of-the-art datasets as well as additional and more challenging stream captures. Results show that the proposed method surpasses the state-of-the-art. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
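One prior-free cue for logo detection, under our own assumptions rather than the paper's exact pipeline, is temporal stability: a broadcast logo is a static overlay, so its edges persist across frames while the programme content changes. A minimal sketch:

```python
# Sketch: persistent-edge map over a frame buffer as a prior-free logo cue.
# Edge and persistence thresholds are illustrative.
import numpy as np

def logo_stability_map(frames, edge_thresh=20.0, persistence=0.9):
    """frames: (T, H, W) grayscale -> boolean map of temporally stable edges."""
    gy, gx = np.gradient(frames.astype(np.float32), axis=(1, 2))
    edges = np.hypot(gx, gy) > edge_thresh      # per-frame edge mask
    return edges.mean(axis=0) > persistence     # fraction of frames pixel is edge
```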

14 pages, 1196 KiB  
Article
Deep Feature Fusion Based Dual Branch Network for X-ray Security Inspection Image Classification
by Yingda Xu and Jianming Wei
Appl. Sci. 2021, 11(16), 7485; https://doi.org/10.3390/app11167485 - 14 Aug 2021
Cited by 3 | Viewed by 2285
Abstract
Automatic computer security inspection of X-ray scanned images is an irresistible trend in modern life. Aiming to address the inconvenience of recognizing small-sized prohibited items, and the potential class imbalance within multi-label object classification of X-ray scanned images, this paper proposes a dual-branch network architecture based on deep feature fusion. Firstly, deep feature fusion is a method that fuses features extracted from several model layers; specifically, it upsamples and dimension-reduces these features to identical sizes, then fuses them by element-wise sum. In addition, this paper introduces focal loss to handle class imbalance: to balance the importance of samples of the minority and majority classes, it assigns weights to class predictions, and to distinguish difficult samples from easy samples, it introduces a modulating factor. The dual-branch network adopts the two components above and integrates them into the final loss calculation through a weighted sum. Experimental results illustrate that the proposed method outperforms the baseline and the state-of-the-art by a large margin on various positive/negative ratios of the datasets. These results demonstrate the competitiveness of the proposed method in classification performance and its potential application under actual circumstances. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
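The focal loss described above (class weights for imbalance plus a modulating factor that down-weights easy samples) has a standard multi-label form, sketched here in PyTorch; alpha and gamma are illustrative hyperparameters.

```python
# Sketch: multi-label focal loss with class weighting and a modulating factor.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: (B, C); targets in {0, 1} per label."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()      # modulated, weighted CE
```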

13 pages, 6016 KiB  
Article
Efficient Hair Damage Detection Using SEM Images Based on Convolutional Neural Network
by Qiaoyue Man, Lintong Zhang and Youngim Cho
Appl. Sci. 2021, 11(16), 7333; https://doi.org/10.3390/app11167333 - 9 Aug 2021
Cited by 8 | Viewed by 7364
Abstract
With increasing interest in hairstyles and hair color, bleaching, dyeing, straightening, and curling are widely practiced worldwide, and the chemical and physical treatment of hair is increasing. As a result, hair suffers considerable damage, and the degree of damage has traditionally been assessed only by the naked eye or by touch, leading to serious consequences such as further hair damage and scalp diseases. Although these problems are serious, there is little research on hair damage. With the advancement of technology, people have begun to take an interest in preventing and reversing hair damage, but manual observation methods cannot accurately and quickly identify damaged areas. In recent years, the rise of artificial intelligence technology and its many applications have given researchers new methods. In this project, we created a new hair damage dataset based on SEM (scanning electron microscope) images. Through various physical and chemical analyses, we observed the changes in the hair surface according to the degree of damage and found the relationship between them, then used a convolutional neural network to recognize and confirm the degree of hair damage, categorizing it as weak, moderate, or high. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

17 pages, 26803 KiB  
Article
MFCosface: A Masked-Face Recognition Algorithm Based on Large Margin Cosine Loss
by Hongxia Deng, Zijian Feng, Guanyu Qian, Xindong Lv, Haifang Li and Gang Li
Appl. Sci. 2021, 11(16), 7310; https://doi.org/10.3390/app11167310 - 9 Aug 2021
Cited by 44 | Viewed by 7433
Abstract
The world today is being hit by COVID-19. As opposed to fingerprints and ID cards, facial recognition technology can effectively prevent the spread of viruses in public places because it does not require contact with specific sensors. However, people also need to wear masks when entering public places, and masks greatly affect the accuracy of facial recognition. Accurately performing facial recognition while people wear masks is a great challenge. In order to solve the problem of low facial recognition accuracy with mask wearers during the COVID-19 epidemic, we propose a masked-face recognition algorithm based on large margin cosine loss (MFCosface). Due to insufficient masked-face data for training, we designed a masked-face image generation algorithm based on the detection of key facial features. The face is detected and aligned through a multi-task cascaded convolutional network; we then detect the key features of the face and select the mask template for coverage according to the positional information of the key features. Finally, we generate the corresponding masked-face image. Through analysis of the masked-face images, we found that triplet loss is not applicable to our datasets, because the results of online triplet selection contain fewer mask changes, making it difficult for the model to learn the relationship between mask occlusion and feature mapping. We use a large margin cosine loss as the loss function for training, which can map all the feature samples in a feature space with a smaller intra-class distance and a larger inter-class distance. In order to make the model pay more attention to the area that is not covered by the mask, we designed an Att-inception module that combines the Inception-Resnet module and the convolutional block attention module, which increases the weight of any unoccluded area in the feature map, thereby enlarging the unoccluded area’s contribution to the identification process. Experiments on several masked-face datasets have proved that our algorithm greatly improves the accuracy of masked-face recognition, and can accurately perform facial recognition with masked subjects. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
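The large margin cosine loss at the heart of MFCosface follows the CosFace formulation: L2-normalize features and class weights, subtract a margin m from the target-class cosine, and scale by s. A minimal PyTorch sketch (the s and m values are illustrative):

```python
# Sketch: large margin cosine (CosFace-style) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginCosineLoss(nn.Module):
    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # Cosine similarity between normalized features and class weights.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        margin = F.one_hot(labels, cos.size(1)).float() * self.m
        return F.cross_entropy(self.s * (cos - margin), labels)
```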

24 pages, 13400 KiB  
Article
Continuous Camera-Based Premature-Infant Monitoring Algorithms for NICU
by Ádám Nagy, Péter Földesy, Imre Jánoki, Dániel Terbe, Máté Siket, Miklós Szabó, Judit Varga and Ákos Zarándy
Appl. Sci. 2021, 11(16), 7215; https://doi.org/10.3390/app11167215 - 5 Aug 2021
Cited by 11 | Viewed by 4098
Abstract
Non-contact visual monitoring of vital signs in neonatology has been demonstrated by several recent studies in ideal scenarios where the baby is calm and there is no medical or parental intervention. Similar to contact monitoring methods (e.g., ECG, pulse oximeter), the camera-based solutions suffer from motion artifacts; therefore, during care and the infants’ active periods, calculated values typically differ greatly from the real ones. Thus, our main contribution to existing remote camera-based techniques is to detect and classify such situations with a high level of confidence, so that our algorithms can not only evaluate quiet periods but also provide continuous monitoring. Altogether, our proposed algorithms can measure pulse rate and breathing rate, and can recognize situations such as medical intervention or very active subjects, using only a single camera, while the system does not exceed the computational capabilities of average CPU-GPU-based hardware. The performance of the algorithms was evaluated on our database collected at the Ist Dept. of Neonatology of Pediatrics, Dept of Obstetrics and Gynecology, Semmelweis University, Budapest, Hungary. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
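During quiet periods, camera-based pulse estimation commonly reduces to spectral analysis of a skin region's mean intensity over time. The sketch below shows that generic step, not the paper's algorithm; the 2-4 Hz neonatal pulse band is our assumption.

```python
# Sketch: pulse rate from the mean intensity of a skin ROI over time.
# Assumes fps is high enough to cover the band (Nyquist: fps >= 8 here).
import numpy as np

def pulse_rate_bpm(roi_means, fps):
    """roi_means: (T,) mean ROI intensity per frame -> pulse rate in bpm."""
    x = roi_means - roi_means.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= 2.0) & (freqs <= 4.0)   # assumed neonatal range, 120-240 bpm
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```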

14 pages, 3897 KiB  
Article
Orientation-Encoding CNN for Point Cloud Classification and Segmentation
by Hongbin Lin, Wu Zheng and Xiuping Peng
Mach. Learn. Knowl. Extr. 2021, 3(3), 601-614; https://doi.org/10.3390/make3030031 - 2 Aug 2021
Cited by 7 | Viewed by 3656
Abstract
With the introduction of effective and general deep learning network frameworks, deep learning based methods have achieved remarkable success in various visual tasks. However, applying convolutional neural networks to point clouds still poses tough challenges due to their lack of a regular structure. Therefore, taking the original point clouds as the input data, this paper proposes an orientation-encoding (OE) convolutional module and designs a convolutional neural network for effectively extracting local geometric features of point sets. By searching for the same number of points in 8 directions and arranging them in order, the OE convolution is carried out according to the number of points in each direction, which realizes effective feature learning of the local structure of the point sets. Further experiments on diverse datasets show that the proposed method has competitive performance on classification and segmentation tasks of point sets. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
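The orientation-encoding step can be pictured as sorting a point's neighbours into eight angular sectors and ordering them within each sector. The sketch below does this in the XY plane for brevity; the planar simplification and the per-sector count k are our assumptions.

```python
# Sketch: split a point's neighbours into 8 angular sectors, ordered by distance.
import numpy as np

def orientation_encode(center, neighbors, k=4):
    """neighbors: (M, 3) points around `center` -> list of 8 (<=k, 3) arrays."""
    d = neighbors - center
    angle = np.arctan2(d[:, 1], d[:, 0]) + np.pi        # [0, 2*pi) in the XY plane
    sector = (angle / (2 * np.pi / 8)).astype(int) % 8
    dist = np.linalg.norm(d, axis=1)
    return [neighbors[sector == s][np.argsort(dist[sector == s])][:k]
            for s in range(8)]
```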

13 pages, 2005 KiB  
Article
Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks
by Tao Zhang, Lun He, Xudong Li and Guoqing Feng
Appl. Sci. 2021, 11(15), 6975; https://doi.org/10.3390/app11156975 - 29 Jul 2021
Cited by 7 | Viewed by 2542
Abstract
Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved; existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing-gradient and insufficient-performance defects of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, with accuracy improved by 2.4% on the GRID dataset. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
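A TCN decoder stacks dilated 1D convolution blocks with residual connections; a minimal PyTorch sketch of one such block follows (a non-causal, length-preserving variant; channel and kernel sizes are illustrative).

```python
# Sketch: one dilated temporal convolution block with a residual connection.
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, dilation, k=3):
        super().__init__()
        pad = (k - 1) * dilation // 2          # keep the sequence length
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, k, padding=pad, dilation=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, k, padding=pad, dilation=dilation),
        )
        self.relu = nn.ReLU()

    def forward(self, x):                      # x: (batch, channels, time)
        return self.relu(x + self.net(x))
```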

19 pages, 6247 KiB  
Article
Data Augmentation Methods Applying Grayscale Images for Convolutional Neural Networks in Machine Vision
by Jinyeong Wang and Sanghwan Lee
Appl. Sci. 2021, 11(15), 6721; https://doi.org/10.3390/app11156721 - 22 Jul 2021
Cited by 18 | Viewed by 6961
Abstract
In increasing manufacturing productivity with automated surface inspection in smart factories, the demand for machine vision is rising. Recently, convolutional neural networks (CNNs) have demonstrated outstanding performance and solved many problems in the field of computer vision, and many machine vision systems now apply CNNs to surface defect inspection. In this study, we developed an effective data augmentation method for grayscale images in CNN-based machine vision with mono cameras. Our method can be applied to grayscale industrial images, and we demonstrated outstanding performance in image classification and object detection tasks. The main contributions of this study are as follows: (1) We propose a data augmentation method that can be performed when training CNNs with industrial images taken with mono cameras. (2) We demonstrate that image classification or object detection performance is better when training with industrial image data augmented by the proposed method. Through the proposed method, many machine-vision-related problems using mono cameras can be effectively solved by using CNNs. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

19 pages, 5187 KiB  
Article
An Expert Artificial Intelligence Model for Discriminating Microseismic Events and Mine Blasts
by Dijun Rao, Xiuzhi Shi, Jian Zhou, Zhi Yu, Yonggang Gou, Zezhen Dong and Jinzhong Zhang
Appl. Sci. 2021, 11(14), 6474; https://doi.org/10.3390/app11146474 - 13 Jul 2021
Cited by 6 | Viewed by 2194
Abstract
To reduce the workload and misjudgment of manually discriminating microseismic events and blasts in mines, an artificial intelligence model called PSO-ELM, based on the extreme learning machine (ELM) optimized by the particle swarm optimization (PSO) algorithm, was applied in this study. Firstly, based on the difference between microseismic events and mine blasts and previous research results, 22 seismic parameters were selected as the discrimination feature parameters and their correlation was analyzed. Secondly, 1600 events were randomly selected from the database of the microseismic monitoring system in Fankou Lead-Zinc Mine to form a sample dataset. Then, the optimal discrimination model was established by investigating the model parameters. Finally, the performance of the model was tested using the sample dataset, and it was compared with the performance of the original ELM model and other commonly used intelligent discrimination models. The results indicate that the discrimination performance of PSO-ELM is the best. The values of the six evaluation indicators are close to the optimal value, which shows that PSO-ELM has great potential for discriminating microseismic events and blasts. The research results obtained can provide a new method for discriminating microseismic events and blasts, and it is of great significance to ensure the safe and smooth operation of mines. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
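The ELM core that PSO tunes is simple enough to sketch: random hidden weights, a sigmoid hidden layer, and output weights solved in closed form by pseudo-inverse. The PSO search over the hidden-layer parameters is omitted; sizes are illustrative.

```python
# Sketch: extreme learning machine with pseudo-inverse output weights.
import numpy as np

def train_elm(X, Y, n_hidden=128, seed=0):
    """X: (N, D) features; Y: (N, K) one-hot labels -> (W, b, beta)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random; PSO would tune these
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # sigmoid hidden layer
    beta = np.linalg.pinv(H) @ Y                      # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                   # argmax over columns = class
```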

11 pages, 2315 KiB  
Article
Class Retrieval of Detected Adversarial Attacks
by Jalal Al-afandi and Horváth András
Appl. Sci. 2021, 11(14), 6438; https://doi.org/10.3390/app11146438 - 12 Jul 2021
Cited by 3 | Viewed by 1776
Abstract
Adversarial attacks are a genuine threat compromising the safety of many intelligent systems and curbing the standardization of neural networks in security-critical applications. Since the emergence of adversarial attacks, the research community has worked relentlessly to avert their malicious damage. Here, we present a new, additional and required element to ameliorate adversarial attacks: the recovery of the original class after a detected attack. Recovering the original class of an adversarial sample without taking any precautions is an uncharted concept, which we introduce with our novel class retrieval algorithm. As case studies, we demonstrate the validity of our approach on the MNIST, CIFAR10 and ImageNet datasets, where recovery rates were 72%, 65% and 65%, respectively. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

16 pages, 3997 KiB  
Article
A Transfer Learning Architecture Based on a Support Vector Machine for Histopathology Image Classification
by Jiayi Fan, JangHyeon Lee and YongKeun Lee
Appl. Sci. 2021, 11(14), 6380; https://doi.org/10.3390/app11146380 - 9 Jul 2021
Cited by 49 | Viewed by 4953
Abstract
Recently, digital pathology has become an essential application for clinical practice and medical research. Due to the lack of large annotated datasets, the deep transfer learning technique is often used to classify histopathology images. A softmax classifier is often used to perform classification tasks; in addition, a Support Vector Machine (SVM) classifier is also popularly employed, especially for binary classification problems. Accurately determining the category of histopathology images is vital for the diagnosis of diseases. In this paper, the conventional softmax classifier and the SVM classifier-based transfer learning approaches are evaluated for classifying histopathology cancer images in a binary breast cancer dataset and a multiclass lung and colon cancer dataset. In order to achieve better classification accuracy, a methodology that attaches an SVM classifier to the fully-connected (FC) layer of the softmax-based transfer learning model is proposed. The proposed architecture involves a first step that trains the newly added FC layer on the target dataset using the softmax-based model, and a second step that trains the SVM classifier on the features of the newly trained FC layer. Cross-validation is used to ensure no bias in the evaluation of the performance of the models. Experimental results reveal that the conventional SVM classifier-based model is the least accurate on both the binary and multiclass cancer datasets. The conventional softmax-based model shows moderate classification accuracy, while the proposed synthetic architecture achieves the best classification accuracy. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
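The proposed two-step architecture can be sketched as follows: fine-tune the softmax model with a new FC layer, then train an SVM on that layer's activations. Backbone choice, layer size, and kernel are illustrative assumptions; assumes PyTorch/torchvision and scikit-learn.

```python
# Sketch: step 1 trains a new FC layer with a softmax head (training not shown);
# step 2 trains an SVM on the trained FC activations.
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = nn.Linear(backbone.fc.in_features, 256)   # newly added FC layer
# ... step 1: fine-tune `backbone` plus a softmax classifier on the target data ...

@torch.no_grad()
def fc_features(x):
    """x: (B, 3, 224, 224) normalized batch -> FC-layer activations."""
    backbone.eval()
    return backbone(x).cpu().numpy()

# Step 2: svm = SVC(kernel="linear").fit(fc_features(train_x), train_labels)
```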

15 pages, 9578 KiB  
Article
Generative Adversarial Networks to Improve the Robustness of Visual Defect Segmentation by Semantic Networks in Manufacturing Components
by Fátima A. Saiz, Garazi Alfaro, Iñigo Barandiaran and Manuel Graña
Appl. Sci. 2021, 11(14), 6368; https://doi.org/10.3390/app11146368 - 9 Jul 2021
Cited by 18 | Viewed by 2825
Abstract
This paper describes the application of Semantic Networks for the detection of defects in images of metallic manufactured components in a situation where the number of available samples of defects is small, which is rather common in real practical environments. In order to overcome this shortage of data, the common approach is to use conventional data augmentation techniques. We resort to Generative Adversarial Networks (GANs), which have shown the capability to generate highly convincing samples of a specific class as a result of a game between a discriminator and a generator module. Here, we apply GANs to generate samples of images of metallic manufactured components with specific defects, in order to improve the training of Semantic Networks (specifically DeepLabV3+ and Pyramid Attention Network (PAN) networks) carrying out the defect detection and segmentation. Our process generates defect images using StyleGAN2 with the DiffAugment method, followed by conventional data augmentation over the entire enriched dataset, achieving a large balanced dataset that allows robust training of the Semantic Network. We demonstrate the approach on a private dataset generated for an industrial client, where images are captured by an ad-hoc photometric-stereo image acquisition system, and on a public dataset, the Northeastern University surface defect database (NEU). The proposed approach achieves improvements of 7% and 6% in the intersection over union (IoU) measure of detection performance on the two datasets over conventional data augmentation. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 3580 KiB  
Article
Automatic Reading Algorithm of Substation Dial Gauges Based on Coordinate Positioning
by Dahua Li, Weixuan Li, Xiao Yu, Qiang Gao and Yu Song
Appl. Sci. 2021, 11(13), 6059; https://doi.org/10.3390/app11136059 - 29 Jun 2021
Cited by 11 | Viewed by 2439
Abstract
With the development of science and technology, inspection robots have attracted more and more attention, and research on the automatic reading of pointer instruments through inspection robots has become particularly valuable. Aiming at the problems of uneven illumination, complex dial background and damping fluid interference of the collected instrument images, this paper proposes a dial gauge reading algorithm based on coordinate positioning. First, the multi-scale retinex with color restoration (MSRCR) is applied to improve the uneven illumination of the image. Second, a circle detection algorithm based on the arc-support line segment is proposed to detect the disc to obtain the coordinate of the center and radius of the circle. Then, a pointerless template is used to obtain the pointer, and the concentric circle algorithm is applied to locate the refined pointer. Finally, the automatic reading is calculated using the relative position of the pointer and the zero scale. The experimental results prove that the proposed algorithm can accurately locate the center of the circle and the pointer and obtain readings automatically. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
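The final reading step, converting the located pointer into a value using the circle centre and the zero-scale direction, can be sketched as below; the clockwise-scale assumption and the 270-degree full sweep are illustrative, gauge-specific parameters.

```python
# Sketch: convert a pointer direction into a gauge reading via the circle centre.
import math

def gauge_reading(center, pointer_tip, zero_tip,
                  full_range=1.0, full_angle=1.5 * math.pi):  # e.g., 270-degree dial
    ang = lambda p: math.atan2(p[1] - center[1], p[0] - center[0])
    # In image coordinates (y pointing down), atan2 grows clockwise, so this
    # measures the clockwise sweep from the zero scale to the pointer.
    sweep = (ang(pointer_tip) - ang(zero_tip)) % (2 * math.pi)
    return full_range * sweep / full_angle
```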

19 pages, 5485 KiB  
Article
Occluded Pedestrian Detection Techniques by Deformable Attention-Guided Network (DAGN)
by Han Xie, Wenqi Zheng and Hyunchul Shin
Appl. Sci. 2021, 11(13), 6025; https://doi.org/10.3390/app11136025 - 29 Jun 2021
Cited by 14 | Viewed by 3120
Abstract
Although many deep-learning-based methods have achieved considerable detection performance for pedestrians with high visibility, their overall performances are still far from satisfactory, especially when heavily occluded instances are included. In this research, we have developed a novel pedestrian detector using a deformable attention-guided network (DAGN). Considering that pedestrians may be deformed with occlusions or under diverse poses, we have designed a deformable convolution with an attention module (DCAM) to sample from non-rigid locations, and obtained the attention feature map by aggregating global context information. Furthermore, the loss function was optimized to get accurate detection bounding boxes, by adopting complete-IoU loss for regression, and the distance IoU-NMS was used to refine the predicted boxes. Finally, a preprocessing technique based on tone mapping was applied to cope with the low visibility cases due to poor illumination. Extensive evaluations were conducted on three popular traffic datasets. Our method could decrease the log-average miss rate (MR2) by 12.44% and 7.8%, respectively, for the heavy occlusion and overall cases, when compared to the published state-of-the-art results of the Caltech pedestrian dataset. On the CityPersons and EuroCity Persons datasets, our proposed method outperformed the current best results by about 5% in MR2 for the heavy occlusion cases. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

18 pages, 6998 KiB  
Article
Generation and Annotation of Simulation-Real Ship Images for Convolutional Neural Networks Training and Testing
by Ji’an You, Zhaozheng Hu, Chao Peng and Zhiqiang Wang
Appl. Sci. 2021, 11(13), 5931; https://doi.org/10.3390/app11135931 - 25 Jun 2021
Cited by 4 | Viewed by 2334
Abstract
Large amounts of high-quality image data are the basis and premise of high-accuracy object detection with convolutional neural networks (CNN). However, it is challenging to collect varied, high-quality ship image data in the marine environment. To address this, a novel CNN-based method is proposed to generate a large number of high-quality ship images. We obtained ship images with different perspectives and different sizes by adjusting the ships’ postures and sizes in three-dimensional (3D) simulation software, and then transformed the 3D ship data into 2D ship images according to the principle of pinhole imaging. We selected specific experimental scenes as background images, and the target ships of the 2D ship images were superimposed onto the background images to generate “Simulation–Real” ship images (named SRS images hereafter). Additionally, an image annotation method based on SRS images was designed. Finally, a CNN-based target detection algorithm was used to train and test the generated SRS images. The proposed method can quickly generate a large number of high-quality ship image samples with corresponding annotation data, significantly improving the accuracy of ship detection. The proposed annotation method is also superior to labeling SRS images with the image annotation software Label-me and Label-img. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
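The pinhole-imaging step that turns simulated 3D ship points into 2D image points is standard projective geometry; a minimal sketch with an illustrative intrinsic matrix K follows.

```python
# Sketch: pinhole projection of simulated 3D points into pixel coordinates.
import numpy as np

K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])   # fx, fy, cx, cy of the virtual camera

def project(points_cam):
    """points_cam: (N, 3) points in the camera frame with Z > 0 -> (N, 2) pixels."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]
```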

28 pages, 11790 KiB  
Article
Image Retrieval Method Based on Image Feature Fusion and Discrete Cosine Transform
by DaYou Jiang and Jongweon Kim
Appl. Sci. 2021, 11(12), 5701; https://doi.org/10.3390/app11125701 - 19 Jun 2021
Cited by 21 | Viewed by 3436
Abstract
This paper presents a new content-based image retrieval (CBIR) method based on image feature fusion. The deep features are extracted from object-centric and place-centric deep networks. The discrete cosine transform (DCT) decorrelates the deep features and reduces their dimensionality. The shallow features are extracted from a Quantized Uniform Local Binary Pattern (ULBP), hue-saturation-value (HSV) histogram, and dual-tree complex wavelet transform (DTCWT). Singular value decomposition (SVD) is applied to reduce the dimensions of the ULBP and DTCWT features. The experimental results, tested on the Corel datasets and the Oxford building dataset, show that the proposed method based on shallow feature fusion can significantly improve performance compared to using a single type of shallow feature, while the method based on deep feature fusion can slightly improve performance compared to using a single type of deep feature. This paper also tests various factors that affect image retrieval performance, such as using principal component analysis (PCA) instead of DCT; the DCT can be used for feature dimension reduction without losing much performance. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
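The DCT step that decorrelates and shrinks a deep feature vector amounts to keeping the low-frequency coefficients of its transform; the kept dimension below is an illustrative assumption.

```python
# Sketch: DCT-based reduction of a deep feature vector.
import numpy as np
from scipy.fft import dct

def dct_reduce(feat, keep=256):
    """feat: (D,) deep feature -> first `keep` (low-frequency) DCT coefficients."""
    return dct(feat, norm="ortho")[:keep]
```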

19 pages, 857 KiB  
Article
Regularized Chained Deep Neural Network Classifier for Multiple Annotators
by Julián Gil-González, Andrés Valencia-Duque, Andrés Álvarez-Meza, Álvaro Orozco-Gutiérrez and Andrea García-Moreno
Appl. Sci. 2021, 11(12), 5409; https://doi.org/10.3390/app11125409 - 10 Jun 2021
Cited by 5 | Viewed by 2798
Abstract
The increasing popularity of crowdsourcing platforms, e.g., Amazon Mechanical Turk, is changing how datasets for supervised learning are built. In these cases, instead of datasets labeled by one source (which is supposed to be an expert who provides the absolute gold standard), databases holding multiple annotators are provided. However, most state-of-the-art methods devoted to learning from multiple experts assume that the labeler’s behavior is homogeneous across the input feature space. Besides, independence constraints are imposed on the annotators’ outputs. This paper presents a regularized chained deep neural network to deal with classification tasks from multiple annotators. The introduced method, termed RCDNN, jointly predicts the ground truth label and the annotators’ performance from input space samples. In turn, RCDNN codes interdependencies among the experts by analyzing the layers’ weights and includes l1, l2, and Monte-Carlo Dropout-based regularizers to deal with the over-fitting issue in deep learning models. Obtained results (using both simulated and real-world annotators) demonstrate that RCDNN can deal with multi-labeler scenarios for classification tasks, outperforming state-of-the-art techniques. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 396 KiB  
Article
A Fall Posture Classification and Recognition Method Based on Wavelet Packet Transform and Support Vector Machine
by Qingyun Zhang, Jin Tao, Qinglin Sun, Xianyi Zeng, Matthias Dehmer and Quan Zhou
Appl. Sci. 2021, 11(11), 5030; https://doi.org/10.3390/app11115030 - 29 May 2021
Cited by 1 | Viewed by 2005
Abstract
An accidental fall seriously threatens the health and safety of the elderly, and the injuries caused by a fall are closely related to the posture during the fall. Therefore, recognizing the posture of falling is essential for the rescue and care of the elderly. In this paper, a novel method was proposed to improve the classification and recognition accuracy of fall postures. Firstly, the wavelet packet transform was used to extract multiple features from the sample data. Secondly, random forest was used to evaluate the importance of the extracted features and obtain effective features through screening. Finally, a support vector machine classifier based on the linear kernel function was used to realize falling posture recognition. The experimental results on the “Simulated Falls and Daily Living Activities” dataset show that the proposed method can distinguish different types of fall postures and achieve 99% classification accuracy. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

28 pages, 16561 KiB  
Article
A Hybrid Gearbox Fault Diagnosis Method Based on GWO-VMD and DE-KELM
by Gang Yao, Yunce Wang, Mohamed Benbouzid and Mourad Ait-Ahmed
Appl. Sci. 2021, 11(11), 4996; https://doi.org/10.3390/app11114996 - 28 May 2021
Cited by 19 | Viewed by 3075
Abstract
In this paper, a vibration signal-based hybrid diagnostic method, including vibration signal adaptive decomposition, vibration signal reconstruction, fault feature extraction, and gearbox fault classification, is proposed to realize fault diagnosis of general gearboxes. The main contribution of the proposed method is the combination of signal processing, machine learning, and optimization techniques to effectively eliminate the noise contained in vibration signals and to achieve high diagnostic accuracy. Firstly, in the study of vibration signal preprocessing and fault feature extraction, to reduce the impact of noise and mode-mixing problems on the accuracy of fault classification, Variational Mode Decomposition (VMD) was adopted to realize adaptive signal decomposition, and the Grey Wolf Optimizer (GWO) was applied to optimize the parameters of VMD. The correlation coefficient was subsequently used to select highly correlated Intrinsic Mode Functions (IMFs) to reconstruct the vibration signals. With these reconstructed signals, fault features were extracted by calculating their time-domain parameters, energies, and permutation entropies. Secondly, in the study of fault classification, the Kernel Extreme Learning Machine (KELM) was adopted, and Differential Evolution (DE) was applied to search for its regularization coefficient and kernel parameter to further improve classification accuracy. Finally, gearbox vibration signals in healthy and faulty conditions were obtained, and comparative experiments were conducted to validate the effectiveness of the proposed hybrid fault diagnosis method.
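
To make the classification stage concrete, the following NumPy/SciPy sketch implements a kernel extreme learning machine whose regularization coefficient and RBF kernel parameter are searched by differential evolution, in the spirit of the DE-KELM stage; the GWO-VMD preprocessing is omitted, and all function names are illustrative:

    import numpy as np
    from scipy.optimize import differential_evolution
    from scipy.spatial.distance import cdist

    def rbf_kernel(A, B, gamma):
        return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

    def kelm_fit_predict(X_tr, T_tr, X_te, C, gamma):
        # KELM closed form: beta = (K + I/C)^-1 T; scores = k(X_te, X_tr) @ beta.
        K = rbf_kernel(X_tr, X_tr, gamma)
        beta = np.linalg.solve(K + np.eye(len(X_tr)) / C, T_tr)
        return rbf_kernel(X_te, X_tr, gamma) @ beta

    def de_tune_kelm(X_tr, y_tr, X_val, y_val, n_classes):
        T_tr = np.eye(n_classes)[y_tr]  # one-hot targets
        def objective(theta):           # theta = (log10 C, log10 gamma)
            scores = kelm_fit_predict(X_tr, T_tr, X_val, 10 ** theta[0], 10 ** theta[1])
            return np.mean(scores.argmax(axis=1) != y_val)  # validation error rate
        res = differential_evolution(objective, bounds=[(-2, 4), (-4, 1)], seed=0, maxiter=30)
        return 10 ** res.x[0], 10 ** res.x[1]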
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 2373 KiB  
Article
Single-Core Multiscale Residual Network for the Super Resolution of Liquid Metal Specimen Images
by Keqing Ning, Zhihao Zhang, Kai Han, Siyu Han and Xiqing Zhang
Mach. Learn. Knowl. Extr. 2021, 3(2), 453-466; https://doi.org/10.3390/make3020023 - 27 May 2021
Cited by 1 | Viewed by 2840
Abstract
In a gravity-free or microgravity environment, liquid metals without crystalline nuclei achieve a deeply undercooled state. The resulting melts exhibit unique properties, and research into this phenomenon is critical for exploring new metastable materials. Owing to the rapid crystallization rates of deeply undercooled liquid metal droplets, as well as cost concerns, experimental systems for studying liquid metal specimens usually rely on high-speed cameras with high frame rates but low spatial resolution, which produce low-resolution photographs. To facilitate subsequent studies by materials scientists, it is necessary to use super-resolution techniques to increase the resolution of these photographs. However, existing super-resolution algorithms cannot quickly and accurately restore the details contained in images of deeply undercooled liquid metal specimens. To address this problem, we propose the single-core multiscale residual network (SCMSRN) algorithm for photographic images of liquid metal specimens. In this model, multiple cascaded filters are used to obtain feature information, and the multiscale features are then fused by a residual network. Compared to existing state-of-the-art neural network super-resolution algorithms, such as SRCNN, VDSR, and MSRN, our model achieved higher PSNR and SSIM scores while reducing network size and training time.
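
The abstract does not specify the SCMSRN topology, so the following PyTorch block is only a sketch of the general pattern it describes: cascading one small convolution produces features at progressively larger receptive fields, which are concatenated, fused by a 1x1 convolution, and added back through a residual connection.

    import torch
    import torch.nn as nn

    class MultiscaleResidualBlock(nn.Module):
        # Sketch: cascaded 3x3 convolutions yield multiscale features.
        def __init__(self, channels=64, depth=3):
            super().__init__()
            self.convs = nn.ModuleList(
                [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(depth)])
            self.fuse = nn.Conv2d(channels * depth, channels, kernel_size=1)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            feats, h = [], x
            for conv in self.convs:  # each pass enlarges the receptive field
                h = self.act(conv(h))
                feats.append(h)
            return x + self.fuse(torch.cat(feats, dim=1))  # fusion + residual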
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

17 pages, 5438 KiB  
Article
A New Computational Method for Arabic Calligraphy Style Representation and Classification
by Zineb Kaoudja, Mohammed Lamine Kherfi and Belal Khaldi
Appl. Sci. 2021, 11(11), 4852; https://doi.org/10.3390/app11114852 - 25 May 2021
Cited by 7 | Viewed by 6184
Abstract
Despite the importance of recognizing Arabic calligraphy styles and their potential usefulness for many applications, very few works on Arabic calligraphy style recognition have been established. Thus, we propose a new computational tool for Arabic calligraphy style recognition (ACSR). The present work aims to identify the Arabic calligraphy style (ACS) in images where the text is captured with different tools and from different sources. To this end, we were inspired by the indices used by human experts to distinguish different calligraphy styles. These indices were transformed into a descriptor that defines, for each calligraphy style, a set of specific features. Three scenarios were considered in the experimental part to prove the effectiveness of the proposed tool. The results confirmed the superior performance of both the individual and the combined features encoded by our descriptor. The proposed work demonstrated outstanding performance, even with few training samples, compared to other related works on Arabic calligraphy recognition.
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

14 pages, 7932 KiB  
Article
Semantic Matching Based on Semantic Segmentation and Neighborhood Consensus
by Huaiyuan Xu, Xiaodong Chen, Huaiyu Cai, Yi Wang, Haitao Liang and Haotian Li
Appl. Sci. 2021, 11(10), 4648; https://doi.org/10.3390/app11104648 - 19 May 2021
Cited by 1 | Viewed by 2860
Abstract
Establishing dense correspondences across semantically similar images is a challenging task, due to the large intra-class variation caused by the unconstrained setting of images, which is prone to cause matching errors. To suppress potential matching ambiguity, NCNet explores the neighborhood consensus pattern in the 4D space of all possible correspondences, based on the assumption that correspondences are spatially continuous. We retain the neighborhood consensus constraint while introducing semantic segmentation information into the features, which makes them more distinguishable and reduces matching ambiguity from a feature perspective. Specifically, we combine a semantic segmentation network to extract semantic features with 4D convolutions to explore context consistency in the 4D correspondence space. Experiments demonstrate that our algorithm achieves good semantic matching performance and that semantic segmentation information improves matching accuracy.
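
The 4D correlation volume underlying the neighborhood consensus constraint is compact to express in code. A minimal sketch, assuming channel-normalized feature maps from any backbone (segmentation-aware in the paper); the function name is illustrative:

    import torch
    import torch.nn.functional as F

    def correlation_4d(feat_a, feat_b):
        # feat_a, feat_b: (C, H, W) feature maps of the two images.
        # Returns the (H, W, H, W) tensor of cosine similarities between all
        # position pairs, which 4D convolutions then filter for consensus.
        fa = F.normalize(feat_a, dim=0)  # unit norm along channels
        fb = F.normalize(feat_b, dim=0)
        return torch.einsum("cij,ckl->ijkl", fa, fb)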
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

15 pages, 3609 KiB  
Article
Multi-Step Traffic Speed Prediction Based on Ensemble Learning on an Urban Road Network
by Bin Feng, Jianmin Xu, Yonggang Zhang and Yongjie Lin
Appl. Sci. 2021, 11(10), 4423; https://doi.org/10.3390/app11104423 - 13 May 2021
Cited by 11 | Viewed by 2611
Abstract
Short-term traffic speed prediction plays an important role in the field of Intelligent Transportation Systems (ITS). Traffic speed forecasting can usually be divided into single-step-ahead and multi-step-ahead prediction. Compared with the single-step method, multi-step prediction provides road traffic participants with more information about future traffic conditions to guide decision-making. This paper proposes a multi-step traffic speed forecasting method that combines an ensemble learning model with a traffic speed detrending algorithm. Firstly, a correlation analysis is conducted to determine representative features by considering the spatial and temporal characteristics of traffic speed. Then, the traffic speed time series is split into a trend set and a residual set via the detrending algorithm. Thirdly, a multi-step residual prediction with a direct strategy is formulated by a stacking ensemble learning model integrating a support vector machine (SVM), CatBoost, and K-nearest neighbors (KNN). Finally, the forecast traffic speed is obtained by adding the predicted residual to the trend component. In tests using field data from Zhongshan, China, the experimental results indicate that the proposed model outperforms benchmarks such as SVM, CatBoost, KNN, and bagging.
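
As a sketch of the direct multi-step stacking strategy, one stacked model can be trained per forecast horizon on the detrended residuals. GradientBoostingRegressor stands in for CatBoost to keep the example dependency-free (catboost.CatBoostRegressor would slot in the same way), and all names are illustrative:

    from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.svm import SVR

    def make_direct_models(horizons=3):
        # Direct strategy: an independent stacked model per step ahead.
        def stacked():
            return StackingRegressor(
                estimators=[("svm", SVR()),
                            ("gbdt", GradientBoostingRegressor()),  # CatBoost stand-in
                            ("knn", KNeighborsRegressor(n_neighbors=5))],
                final_estimator=Ridge())
        return [stacked() for _ in range(horizons)]

    # X holds lagged detrended speeds; Y[:, h] is the residual (h + 1) steps
    # ahead. Fit models[h].fit(X, Y[:, h]) per horizon, then add the predicted
    # residual back to the extracted trend to recover the speed forecast.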
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
