AI-Based Image Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 132988

Special Issue Editors


Guest Editor
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Interests: artificial intelligence; Internet of Things; robots

Guest Editor
Institute of Rail Transit, Tongji University, Shanghai 201804, China
Interests: maglev trains; offshore cranes; quay cranes and nonlinear control with applications to mechatronic systems

Guest Editor
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
Interests: GAN-based facial editing; deep cloth animation; deep portrait editing; special effects simulation for motion picture; cloth animation and virtual try-on; traffic simulation and autonomous driving; crowd and group animation; implicit surface modeling and applications; creative modeling; sketch-based modeling; physically-based animation; image processing

Special Issue Information

Dear Colleagues,

With the rapid development of artificial intelligence, a series of artificial intelligence algorithms have been proposed, and intelligent applications have started to play an increasingly important role in industrial production and our social lives. However, the intersections of AI with medical image processing, image processing for intelligent transportation systems, satellite image processing, face recognition, object recognition, and related areas remain challenging.

Therefore, a Special Issue on “AI-Based Image Processing” has been organized, focusing on tackling the most pressing of these problems, specifically within the topics listed below:

  • Image processing algorithms;
  • Image analytics;
  • Medical image processing;
  • Biomedical image analysis;
  • Image generation;
  • Image restoration and enhancement;
  • Image compression;
  • Edge detection;
  • Image segmentation;
  • Semantic segmentation;
  • Image classification;
  • Image inpainting;
  • Image captioning;
  • Feature detection and extraction;
  • Content-based image retrieval;
  • Optical character recognition;
  • Face recognition;
  • Emotion recognition;
  • Gesture recognition;
  • Object recognition and tracking.

Prof. Dr. Xinwei Yao
Dr. Yougang Sun
Prof. Dr. Xiaogang Jin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • feature detection and extraction
  • object recognition and tracking

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (40 papers)


Research


17 pages, 7788 KiB  
Article
Grasp Detection Combining Self-Attention with CNN in Complex Scenes
by Jinxing Niu, Shuo Liu, Hanbing Li, Tao Zhang and Lijun Wang
Appl. Sci. 2023, 13(17), 9655; https://doi.org/10.3390/app13179655 - 25 Aug 2023
Cited by 2 | Viewed by 1832
Abstract
In this paper, we present a novel approach that subtly combines a transformer with a grasping CNN to achieve better grasps in complex real-life situations. The approach comprises two unique designs that effectively improve grasp precision in complex scenes. The first essential design uses self-attention mechanisms to capture contextual information from RGB images, boosting the contrast between key object features and their surroundings. We precisely adjust internal parameters to balance accuracy and computing costs. The second crucial design builds a feature fusion bridge that processes all one-dimensional sequence features at once to create an intuitive visual perception for the detection stage, ensuring a seamless combination of the transformer block and the CNN. These designs suppress noise features in complex backgrounds and emphasize graspable object features, providing valuable semantic data to the subsequent grasping CNN to achieve appropriate grasping. We evaluated the approach on the Cornell and VMRD datasets. According to the experimental results, our method achieves better performance than the original grasping CNN in single-object and multi-object scenarios, exhibiting 97.7% and 72.2% accuracy on the Cornell and VMRD grasp datasets using RGB input, respectively.
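The core fusion idea in the abstract (flatten the CNN feature map into a one-dimensional sequence, apply self-attention, then return a context-enhanced map to the detection stage) can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation; the module name, dimensions, and residual arrangement are assumptions.

import torch
import torch.nn as nn

class SelfAttentionFusion(nn.Module):
    """Apply multi-head self-attention to a CNN feature map, then fold the
    attended sequence back into a spatial map for a detection head."""
    def __init__(self, channels=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, fmap):                         # fmap: (B, C, H, W)
        b, c, h, w = fmap.shape
        seq = fmap.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)            # global context per position
        seq = self.norm(seq + out)                   # residual keeps local CNN cues
        return seq.transpose(1, 2).reshape(b, c, h, w)

features = torch.randn(1, 128, 28, 28)              # backbone output (illustrative)
fused = SelfAttentionFusion()(features)             # same shape, context-enhanced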

18 pages, 6126 KiB  
Article
A Semantic Information-Based Optimized vSLAM in Indoor Dynamic Environments
by Shuangfeng Wei, Shangxing Wang, Hao Li, Guangzu Liu, Tong Yang and Changchang Liu
Appl. Sci. 2023, 13(15), 8790; https://doi.org/10.3390/app13158790 - 29 Jul 2023
Cited by 5 | Viewed by 1579
Abstract
In unknown environments, mobile robots can use visual-based Simultaneous Localization and Mapping (vSLAM) to complete positioning tasks while building sparse feature maps and dense maps. However, traditional vSLAM assumes a static scene and rarely considers the dynamic objects present in real scenes. In addition, it is difficult for a robot to perform high-level semantic tasks because it cannot obtain semantic information from sparse feature maps or dense maps. In order to improve the environment perception and mapping accuracy of mobile robots in dynamic indoor environments, we propose a semantic information-based optimized vSLAM algorithm. The optimized algorithm adds dynamic region detection and semantic segmentation modules to ORB-SLAM2. First, a dynamic region detection module is added to the visual odometry. The dynamic region of the image is detected by combining the homography matrix and a dense optical flow method to improve the accuracy of pose estimation in dynamic environments. Second, semantic segmentation of images is implemented based on the BiSeNet V2 network. For the over-segmentation problem in semantic segmentation, a region-growing algorithm incorporating depth information is proposed to optimize the 3D segmentation. During map building, semantic information and dynamic regions are used to remove dynamic objects and build an indoor map containing semantic information. The system can not only effectively remove the effect of dynamic objects on pose estimation but also use the semantic information of images to build indoor maps containing semantic information. The proposed algorithm is evaluated on the TUM RGB-D dataset and in real dynamic scenes. The results show that the accuracy of our algorithm outperforms that of ORB-SLAM2 and DS-SLAM in dynamic scenarios.
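As a rough illustration of the dynamic region detection step, the OpenCV sketch below estimates camera motion with a homography fitted to ORB matches, compensates for it, and flags pixels with large residual dense optical flow as dynamic. All thresholds and parameters are illustrative assumptions, not values from the paper.

import cv2
import numpy as np

def dynamic_mask(prev_gray, curr_gray, flow_thresh=2.0):
    """prev_gray, curr_gray: consecutive 8-bit grayscale frames.
    Returns a boolean mask of pixels whose motion deviates from the
    camera-induced motion (candidate dynamic regions)."""
    # 1. Estimate camera motion as a homography from ORB feature matches.
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # 2. Warp the previous frame so static content aligns with the current
    #    frame; residual dense optical flow then isolates moving objects.
    h, w = curr_gray.shape
    warped = cv2.warpPerspective(prev_gray, H, (w, h))
    flow = cv2.calcOpticalFlowFarneback(warped, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    residual = np.linalg.norm(flow, axis=2)

    # 3. Large residual motion => dynamic region, excluded from pose
    #    estimation and map building.
    return residual > flow_thresh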

13 pages, 5203 KiB  
Article
Anomaly Detection of Remote Sensing Images Based on the Channel Attention Mechanism and LRX
by Huinan Guo, Hua Wang, Xiaodong Song and Zhongling Ruan
Appl. Sci. 2023, 13(12), 6988; https://doi.org/10.3390/app13126988 - 9 Jun 2023
Cited by 2 | Viewed by 2043
Abstract
Anomaly detection of remote sensing images has gained significant attention in remote sensing image processing due to their rich spectral information. The Local RX (LRX) algorithm, derived from the Reed–Xiaoli (RX) algorithm, is a hyperspectral anomaly detection method that identifies anomalous pixels in hyperspectral images by exploiting local statistics and background modeling. However, it is still susceptible to the noise in Hyperspectral Images (HSIs), which limits its detection performance. To address this problem, a hyperspectral anomaly detection algorithm based on the channel attention mechanism and LRX is proposed in this paper. The HSI is fed into an auto-encoder network that is constrained by the channel attention module to generate a more representative reconstructed image that better captures the characteristics of different land covers and contains less noise. The channel attention module in the auto-encoder network aims to explore the effective spectral bands corresponding to different land covers. Subsequently, the LRX algorithm is utilized for anomaly detection on the reconstructed image obtained from the auto-encoder network with the channel attention mechanism, which avoids the influence of noise on the anomaly detection results and improves the anomaly detection performance. Experiments are conducted on three HSIs to verify the performance of the proposed method. The proposed method achieves higher Area Under Curve (AUC) values of 0.9871, 0.9916 and 0.9642 on the HYDICE urban dataset, the AVIRIS aircraft dataset and the Salinas Valley dataset, respectively, compared with six other methods. The experimental results demonstrate that the proposed algorithm has better anomaly detection performance than LRX and the other algorithms.
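For readers unfamiliar with the detector the paper builds on, the NumPy sketch below shows the core of LRX: a sliding dual window in which each pixel is scored by its Mahalanobis distance from local background statistics. Window sizes and the regularization term are illustrative; in the paper the detector runs on the attention-guided auto-encoder reconstruction.

import numpy as np

def lrx_scores(hsi, r_out=9, r_in=3):
    """hsi: (H, W, B) hyperspectral cube. Scores each pixel by its
    Mahalanobis distance from the statistics of a local background ring
    (outer window minus inner guard window). High score => anomalous."""
    H, W, B = hsi.shape
    ro, ri = r_out // 2, r_in // 2
    mask = np.ones((r_out, r_out), dtype=bool)
    mask[ro - ri:ro + ri + 1, ro - ri:ro + ri + 1] = False   # guard region
    scores = np.zeros((H, W))
    for y in range(ro, H - ro):
        for x in range(ro, W - ro):
            bg = hsi[y - ro:y + ro + 1, x - ro:x + ro + 1][mask]  # (n_bg, B)
            mu = bg.mean(axis=0)
            cov = np.cov(bg, rowvar=False) + 1e-6 * np.eye(B)     # regularized
            d = hsi[y, x] - mu
            scores[y, x] = d @ np.linalg.solve(cov, d)            # Mahalanobis^2
    return scores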

12 pages, 2215 KiB  
Article
A Structured Recognition Method for Invoices Based on StrucTexT Model
by Zhijie Li, Wencan Tian, Changhua Li, Yunpeng Li and Haoqi Shi
Appl. Sci. 2023, 13(12), 6946; https://doi.org/10.3390/app13126946 - 8 Jun 2023
Viewed by 1518
Abstract
Invoice recognition has long been an active research direction in the field of image recognition. Existing invoice recognition methods suffer from a low recognition rate for structured invoices, slow recognition speed, and difficulty in mobile deployment. To address these issues, we propose a structured invoice recognition method based on the StrucTexT model. The method uses knowledge distillation to increase recognition speed and compress the model size without reducing the recognition rate; this is achieved by using the teacher model StrucTexT to guide the student model StrucTexT_slim. The method effectively addresses the slow recognition speed and large model size that make mobile deployment difficult with traditional methods. Experimental results show that the proposed model achieves an accuracy rate of over 94% on the SROIE and FUNSD public datasets and over 95% on a self-built structured invoice dataset. In addition, the method is 30% faster than other models (YOLOv4, LeNet-5, and Tesseract-OCR) in terms of recognition speed, while the model size is compressed by about 20%.
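The teacher-student guidance the abstract describes follows the standard soft-label distillation recipe, sketched below in PyTorch. The temperature and weighting are illustrative defaults, not the values used for StrucTexT/StrucTexT_slim.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft targets from the teacher plus the ordinary hard-label loss.
    T (temperature) and alpha (mixing weight) are illustrative values."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard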

14 pages, 422 KiB  
Article
CFormerFaceNet: Efficient Lightweight Network Merging a CNN and Transformer for Face Recognition
by Lin He, Lile He and Lijun Peng
Appl. Sci. 2023, 13(11), 6506; https://doi.org/10.3390/app13116506 - 26 May 2023
Cited by 7 | Viewed by 3030
Abstract
Most face recognition methods rely on deep convolutional neural networks (CNNs) that construct multiple layers of processing units in a cascaded form and employ convolution operations to fuse local features. However, these methods are not conducive to modeling the global semantic information of the face and lack attention to important facial feature regions and their spatial relationships. In this work, a Group Depth-Wise Transpose Attention (GDTA) block is designed to effectively capture both local and global representations, mitigate the issue of limited receptive fields in CNNs, and establish long-range dependencies among different feature regions. Based on GDTA and CNNs, a novel, efficient, and lightweight face recognition model called CFormerFaceNet, which combines a CNN and a Transformer, is proposed. The model significantly reduces the parameters and computational cost without compromising performance, greatly improving the computational efficiency of deep neural networks in face recognition tasks. It achieves competitive accuracy on multiple challenging benchmark face datasets, including LFW, CPLFW, CALFW, SLLFW, CFP_FF, CFP_FP, and AgeDB-30, while maintaining the lowest computational cost among all other advanced face recognition models. Experimental results on computers and embedded devices also demonstrate that it can meet real-time requirements in practical applications.

13 pages, 3794 KiB  
Article
Use of CNN for Water Stress Identification in Rice Fields Using Thermal Imagery
by Mu-Wei Li, Yung-Kuan Chan and Shyr-Shen Yu
Appl. Sci. 2023, 13(9), 5423; https://doi.org/10.3390/app13095423 - 26 Apr 2023
Viewed by 1459
Abstract
Rice is a staple food in many Asian countries, but its production demands large amounts of water. Moreover, more attention should be paid to the water management of rice due to global climate change and frequent droughts. To address this problem, we propose a rice water stress identification system. Since irrigation usually affects the opening and closing of rice leaf stomata, which directly affects leaf temperature, rice leaf temperature is a suitable index for evaluating rice water stress. The proposed system uses a CNN (convolutional neural network) to identify water stress in thermal images of rice fields and to classify the irrigation situation into three classes: 100%, 90%, and 80% irrigation. The CNN extracts a temperature level score from each thermal image based on the degree of difference between the three irrigation situations, and these scores are then used to classify the water-stress situation. In our experiments, we compare against CNN classification that does not consider the degree of difference between classes; the proposed method considerably improves water stress identification. Since rice leaf temperature is relative to air temperature rather than an absolute value, the background temperature is also important reference information. We combine two different methods for background processing to extract more features and achieve more accurate identification.

16 pages, 696 KiB  
Article
Influence of Hyperparameters in Deep Learning Models for Coffee Rust Detection
by Adrian F. Chavarro, Diego Renza and Dora M. Ballesteros
Appl. Sci. 2023, 13(7), 4565; https://doi.org/10.3390/app13074565 - 4 Apr 2023
Cited by 4 | Viewed by 1900
Abstract
Most of the world’s crops can be attacked by various diseases or pests, affecting their quality and productivity. In recent years, transfer learning with deep learning (DL) models has been used to detect diseases in maize, tomato, rice, and other crops. In the specific case of coffee, some recent works have used fixed hyperparameters to fine-tune pre-trained models on the new dataset and/or applied data augmentation, such as image patching, to improve classifier performance. However, a detailed evaluation of the impact of architecture (e.g., backbone) and training (e.g., optimizer and learning rate) hyperparameters on the performance of coffee rust classification models has not been performed. Therefore, this paper presents a comprehensive study of the impact of five types of hyperparameters on the performance of coffee rust classification models. Specifically, eight pre-trained models are compared, each with four different amounts of transferred layers and three different numbers of neurons in the fully-connected (FC) layer, and the models are fine-tuned with three types of optimizers, each with three learning rate values. Comparing more than 800 models in terms of F1-score and accuracy, the type of backbone is identified as the hyperparameter with the greatest impact (with differences between models of up to 70%), followed by the optimizer (with differences of up to 20%). The study concludes with specific recommendations on the most suitable hyperparameter values for identifying this disease in coffee crops.
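The scale of the study (over 800 models) comes from crossing the hyperparameter axes. A minimal Keras sketch of such a sweep is shown below; the backbones, optimizers, learning rates, and head size listed are examples, not the paper's exact grid.

import itertools
import tensorflow as tf

backbones = {"MobileNetV2": tf.keras.applications.MobileNetV2,
             "ResNet50": tf.keras.applications.ResNet50}
optimizers = {"adam": tf.keras.optimizers.Adam,
              "sgd": tf.keras.optimizers.SGD,
              "rmsprop": tf.keras.optimizers.RMSprop}
learning_rates = [1e-3, 1e-4, 1e-5]

def build(backbone_fn, opt_fn, lr, n_fc=256, n_classes=2):
    base = backbone_fn(include_top=False, weights="imagenet", pooling="avg")
    base.trainable = False                  # transfer learning: freeze the backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(n_fc, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=opt_fn(learning_rate=lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

for b, o, lr in itertools.product(backbones, optimizers, learning_rates):
    model = build(backbones[b], optimizers[o], lr)
    # model.fit(train_ds, validation_data=val_ds, ...) and record F1/accuracy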

14 pages, 4084 KiB  
Article
Neural Network-Based Identification of Cloud Types from Ground-Based Images of Cloud Layers
by Zijun Li, Hoiio Kong and Chan-Seng Wong
Appl. Sci. 2023, 13(7), 4470; https://doi.org/10.3390/app13074470 - 31 Mar 2023
Cited by 4 | Viewed by 3190
Abstract
Clouds are a significant factor in regional climates and play a crucial role in regulating the Earth’s water cycle through the interaction of sunlight and wind. Meteorological agencies around the world must regularly observe and record cloud data, yet current methods for collecting cloud data mainly rely on manual observation. This paper presents a novel approach to identifying ground-based cloud images to aid in the collection of cloud data. Because no publicly available dataset was suitable for this research, we built a dataset of surface-shot images of clouds called the SSC, overseen by the Macao Meteorological Society. Compared to previous datasets, the SSC dataset offers a more balanced distribution of data samples across the various cloud genera and provides a more precise classification of cloud genera. This paper presents a method for identifying cloud genera based on cloud texture, using convolutional neural networks. To extract cloud texture effectively, we apply Gamma Correction to the images. The experiments were conducted on the SSC dataset. The results show that the proposed model performs well in identifying 10 cloud genera, achieving a top-three accuracy of 80%.
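The Gamma Correction preprocessing mentioned in the abstract amounts to a per-pixel power law, conveniently applied through a lookup table. A small OpenCV sketch with an illustrative gamma value:

import cv2
import numpy as np

def gamma_correct(img, gamma=0.6):
    """Power-law intensity mapping; gamma < 1 lifts dark regions, which can
    make cloud texture more separable. The value 0.6 is illustrative."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, lut)        # img: 8-bit image (grayscale or color)

# corrected = gamma_correct(cv2.imread("cloud.jpg"))  # then feed to the CNN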

15 pages, 10345 KiB  
Article
Improved First-Order Motion Model of Image Animation with Enhanced Dense Motion and Repair Ability
by Yu Xu, Feng Xu, Qiang Liu and Jianwen Chen
Appl. Sci. 2023, 13(7), 4137; https://doi.org/10.3390/app13074137 - 24 Mar 2023
Cited by 2 | Viewed by 3143
Abstract
Image animation aims to transfer the posture change of a driving video to the static object of a source image, and it has potential applications in various domains, such as the film and game industries. The essential part of this task is to generate a video that learns the motion from the driving video while preserving the appearance from the source image, producing a new object with the same motion in the animated video. However, this is a significant challenge when the object pose shows large-scale change, and even the most recent methods fail to handle it with good visual effects. To solve the problem of poor visual effects in videos with large-scale pose changes, a novel method based on an improved first-order motion model (FOMM) with enhanced dense motion and repair ability is proposed in this paper. First, when generating optical flow, we propose an attention mechanism that optimizes the feature representation of the image in both channel and spatial domains through maximum pooling. This enables better distortion of the source image into the feature domain of the driving image. Second, we propose a multi-scale occlusion restoration module that generates a multi-resolution occlusion map by upsampling the low-resolution occlusion map. The generator then redraws the occluded part of the reconstruction result across multiple scales through the multi-resolution occlusion map to achieve more accurate and vivid visual effects. In addition, the proposed model can be trained effectively in an unsupervised manner. We evaluated the proposed model on three benchmark datasets. The experimental results show that multiple evaluation indicators were improved by our method and that the visual quality of the animated videos clearly outperformed the FOMM. On the VoxCeleb1 dataset, the pixel error, average keypoint distance, and average Euclidean distance were reduced by 6.5%, 5.1%, and 0.7%, respectively. On the TaiChiHD dataset, the pixel error, average keypoint distance, and missing keypoint rate were reduced by 4.9%, 13.5%, and 25.8%, respectively.

15 pages, 7871 KiB  
Article
A Study on the Super Resolution Combining Spatial Attention and Channel Attention
by Dongwoo Lee, Kyeongseok Jang, Soo Young Cho, Seunghyun Lee and Kwangchul Son
Appl. Sci. 2023, 13(6), 3408; https://doi.org/10.3390/app13063408 - 7 Mar 2023
Cited by 3 | Viewed by 2076
Abstract
Existing CNN-based super resolution methods place little emphasis on high-frequency features, resulting in poor performance on contours and textures. To solve this problem, this paper proposes single image super resolution using an attention mechanism that emphasizes high-frequency features together with a feature extraction process of varying depths. To emphasize the high-frequency features of both channel and space, the network is composed of CSBlocks that combine channel attention and spatial attention, and an attention block of 10 CSBlocks is used for high-frequency feature extraction. In order to extract various features with different degrees of emphasis from limited low-resolution features, features are extracted from structures connected with different numbers of attention blocks. The extracted features are expanded through sub-pixel convolution to create super resolution images, and the network is trained with an L1 loss. Compared to existing deep learning methods, the approach shows improved results on several high-frequency features such as small object outlines and line patterns. In PSNR and SSIM, it shows about 11% to 26% improvement over bicubic interpolation and about 1% to 2% improvement over VDSR and EDSR.
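A block that combines channel attention and spatial attention, in the spirit of the CSBlock described above, can be approximated with a CBAM-style module. The PyTorch sketch below is an assumption about the structure; the paper's exact block may differ.

import torch
import torch.nn as nn

class CSBlock(nn.Module):
    """Channel attention followed by spatial attention (layer sizes illustrative)."""
    def __init__(self, c, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                           # re-weight channels
        avg = x.mean(dim=1, keepdim=True)                 # channel-pooled maps
        mx, _ = x.max(dim=1, keepdim=True)
        return x * self.spatial(torch.cat([avg, mx], 1))  # re-weight positions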

14 pages, 882 KiB  
Article
Filter Pruning via Attention Consistency on Feature Maps
by Huoxiang Yang, Yongsheng Liang, Wei Liu and Fanyang Meng
Appl. Sci. 2023, 13(3), 1964; https://doi.org/10.3390/app13031964 - 2 Feb 2023
Cited by 3 | Viewed by 2007
Abstract
Due to the effective guidance of prior information, feature map-based pruning methods have emerged as promising techniques for model compression. In previous works, the undifferentiated treatment of all information on feature maps amplifies the negative impact of noise and background information. To address this issue, a novel filter pruning strategy called Filter Pruning via Attention Consistency (FPAC) is proposed, together with a simple and effective implementation. FPAC is inspired by the observation that the attention of feature maps in one layer is highly consistent along the spatial dimension, and experiments also show that feature maps with lower consistency are less important. Hence, FPAC measures the importance of filters by evaluating the attention consistency of the feature maps and then prunes the filters corresponding to feature maps with lower consistency. Experiments on various datasets further confirm the effectiveness of FPAC. For instance, applying VGG-16 on CIFAR-10, the classification accuracy even increases from 93.96% to 94.03% with a 58.1% reduction in FLOPs. Furthermore, applying ResNet-50 on ImageNet achieves a 45% reduction in FLOPs with only 0.53% accuracy loss.
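A minimal sketch of the scoring idea, assuming cosine similarity as the consistency measure (the paper's measure may differ): each filter's spatial attention map is compared with the layer's mean attention map, and the lowest-consistency filters become pruning candidates.

import torch

def attention_consistency_scores(fmaps):
    """fmaps: (N, C, H, W) feature maps of one layer over a calibration batch.
    Returns a per-filter consistency score; low scores mark pruning candidates.
    Cosine similarity is an illustrative consistency measure."""
    attn = fmaps.abs().mean(dim=0).flatten(1)             # (C, H*W) spatial attention
    attn = attn / (attn.norm(dim=1, keepdim=True) + 1e-8)
    mean_attn = attn.mean(dim=0, keepdim=True)
    mean_attn = mean_attn / (mean_attn.norm() + 1e-8)
    return (attn * mean_attn).sum(dim=1)                  # (C,) cosine similarity

# scores = attention_consistency_scores(fmaps)
# prune_idx = scores.argsort()[:n_prune]   # filters with lowest consistency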

14 pages, 2668 KiB  
Article
RGCLN: Relational Graph Convolutional Ladder-Shaped Networks for Signed Network Clustering
by Anping Song, Ruyi Ji, Wendong Qi and Chenbei Zhang
Appl. Sci. 2023, 13(3), 1367; https://doi.org/10.3390/app13031367 - 19 Jan 2023
Viewed by 1510
Abstract
Node embeddings are increasingly used in various network analysis tasks due to their excellent dimensional compression and feature representation capabilities. However, most researchers' priority has always been link prediction, which leaves signed network clustering under-explored. Therefore, we propose an asymmetric ladder-shaped architecture called RGCLN based on multi-relational graph convolution that can fuse deep node features to generate node representations with great representational power. RGCLN adopts a deep framework to capture and convey information instead of relying on balance theory, the method commonly used in signed networks. In addition, RGCLN adds a size constraint to the loss function to prevent image-like overfitting during the unsupervised learning process. Based on the node features learned by this end-to-end trained model, RGCLN performs community detection in a large number of real-world and generative networks, and the results indicate that our model has an advantage over state-of-the-art network embedding algorithms.

22 pages, 2763 KiB  
Article
Face Mask Detection on Photo and Real-Time Video Images Using Caffe-MobileNetV2 Transfer Learning
by B. Anil Kumar and Mohan Bansal
Appl. Sci. 2023, 13(2), 935; https://doi.org/10.3390/app13020935 - 10 Jan 2023
Cited by 22 | Viewed by 5913
Abstract
Face detection systems have generally been used primarily for non-masked faces, which include relevant facial characteristics such as the ears, chin, lips, nose, and eyes. Masks are necessary to cover faces in many situations, such as pandemics, crime scenes, medical settings, high pollution, and laboratories. The COVID-19 epidemic has increased the requirement for people to use protective face masks in public places. Analysis of face detection technology is crucial with blocked faces, which typically have visibility only in the periocular area and above. This paper implements a model on complex data, i.e., detecting the faces of people in photos and real-time video images, with and without masks, based on the features around the eyes, ears, nose, and forehead, using the original masked and unmasked images to form a baseline for face detection. The task is performed using the Caffe-MobileNetV2 (CMNV2) model for feature extraction and masked-image classification: the Convolutional Architecture for Fast Feature Embedding (Caffe) model is used as the face detector, and MobileNetV2 is used for mask identification. In this work, five different layers are added to the pre-trained MobileNetV2 architecture for better classification accuracy with fewer training parameters on the given face mask detection data. Experimental results revealed that the proposed methodology performed well, with an accuracy of 99.64% on photo images and good accuracy on real-time video images. Other metrics show that the model outperforms previous models with a precision of 100%, recall of 99.28%, F1-score of 99.64%, and an error rate of 0.36%. Face mask detection was originally a computing application, but it is now widely used in other technological areas such as smartphones and artificial intelligence. Computer-based masked-face detection belongs to the category of biometrics, since it uses a person's unique features to identify them with a mask on.
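The abstract's five added layers on top of a frozen, pre-trained MobileNetV2 can be sketched in Keras as follows. The paper does not spell out the head here, so the specific layers below are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False            # keep pre-trained features, train the head only

model = tf.keras.Sequential([     # five added layers (illustrative head)
    base,
    layers.AveragePooling2D(pool_size=(7, 7)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),   # mask / no-mask
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])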

21 pages, 3063 KiB  
Article
GPSR: Gradient-Prior-Based Network for Image Super-Resolution
by Xiancheng Zhu, Detian Huang, Xiaorui Li, Danlin Cai and Daxin Zhu
Appl. Sci. 2023, 13(2), 833; https://doi.org/10.3390/app13020833 - 7 Jan 2023
Viewed by 1891
Abstract
Deep learning has recently shown great potential in super-resolution (SR) tasks. However, most deep learning-based SR networks are optimized via pixel-level losses (i.e., L1, L2, and MSE), which force the networks to output the average of all possible predictions, leading to blurred details. In SR tasks with large scaling factors (i.e., ×4, ×8), this limitation is further aggravated. To alleviate it, we propose a Gradient-Prior-based Super-Resolution network (GPSR). Specifically, a detail-preserving Gradient Guidance Strategy is proposed to fully exploit the gradient prior to guide the SR process from two aspects. On the one hand, an additional gradient branch is introduced into GPSR to provide critical structural information. On the other hand, a compact gradient-guided loss is proposed to strengthen the constraints on the spatial structure and to prevent the blind restoration of high-frequency details. Moreover, two residual spatial attention adaptive aggregation modules are proposed and incorporated into the SR branch and the gradient branch, respectively, to fully exploit the crucial intermediate features and enhance the feature representation ability. Comprehensive experimental results demonstrate that the proposed GPSR outperforms state-of-the-art methods in both subjective visual quality and objective quantitative metrics in SR tasks with large scaling factors (i.e., ×4 and ×8).
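A compact gradient-guided loss of the kind described can be sketched as a pixel-level L1 term plus an L1 term on Sobel gradient magnitudes. The weighting and the choice of Sobel kernels are illustrative assumptions (PyTorch):

import torch
import torch.nn.functional as F

def gradient_guided_loss(sr, hr, lam=0.1):
    """Pixel L1 plus L1 between Sobel gradient maps, constraining spatial
    structure instead of letting high frequencies average out. lam illustrative."""
    c = sr.shape[1]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    kx, ky = kx.repeat(c, 1, 1, 1), ky.repeat(c, 1, 1, 1)   # depth-wise kernels

    def grad(img):
        gx = F.conv2d(img, kx.to(img), padding=1, groups=c)
        gy = F.conv2d(img, ky.to(img), padding=1, groups=c)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    return F.l1_loss(sr, hr) + lam * F.l1_loss(grad(sr), grad(hr))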

33 pages, 7484 KiB  
Article
DRL-FVRestore: An Adaptive Selection and Restoration Method for Finger Vein Images Based on Deep Reinforcement Learning
by Ruoran Gao, Huimin Lu, Adil Al-Azzawi, Yupeng Li and Chengcheng Zhao
Appl. Sci. 2023, 13(2), 699; https://doi.org/10.3390/app13020699 - 4 Jan 2023
Cited by 4 | Viewed by 2373
Abstract
Finger vein recognition has become a research hotspot in the field of biometrics due to its advantages of non-contact acquisition, unique information, and difficulty of forgery or piracy. However, in real-world applications, extracting image features for this biometric remains a significant challenge when the captured finger vein images suffer from blur, noise, or missing feature information. To address these challenges, we propose a novel deep reinforcement learning-based finger vein image recovery method, DRL-FVRestore, which trains an agent that adaptively selects the appropriate restoration behavior according to the state of the finger vein image, enabling continuous restoration of the image. The restoration behaviors are divided into three tasks: deblurring restoration, defect restoration, and denoising and enhancement restoration. Specifically, a DeblurGAN-v2 based on the Inception-ResNet-v2 backbone is proposed to achieve deblurring restoration of finger vein images, a finger vein feature-guided restoration network is proposed to achieve defect image restoration, and DRL-FVRestore is proposed to deal with images exhibiting multiple problems in complex situations. Extensive experiments are conducted on four publicly accessible datasets. The experimental results show that, for restoration of single-problem images, the EER values of the deblurring network and the defect restoration network are reduced by an average of 4.31% and 1.71%, respectively, compared to other methods. For images with multiple vision problems, the EER value of the proposed DRL-FVRestore is reduced by an average of 3.98%.

14 pages, 521 KiB  
Article
Enhancing Semantic-Consistent Features and Transforming Discriminative Features for Generalized Zero-Shot Classifications
by Guan Yang, Ayou Han, Xiaoming Liu, Yang Liu, Tao Wei and Zhiyuan Zhang
Appl. Sci. 2022, 12(24), 12642; https://doi.org/10.3390/app122412642 - 9 Dec 2022
Cited by 3 | Viewed by 1508
Abstract
Generalized zero-shot learning (GZSL) aims to classify classes that do not appear during training. Recent state-of-the-art approaches rely on generative models, which use correlated semantic embeddings to synthesize the visual features of unseen classes; however, these approaches ignore semantic and visual relevance, and the visual features synthesized by generative models do not represent their semantics well. Although existing GZSL methods based on generative model disentanglement consider consistency between visual and semantic models, they enforce semantic consistency only in the training phase and ignore it in the feature synthesis and classification phases. The absence of such constraints may yield a synthesized visual model that is unrepresentative with respect to semantics, and the visual and semantic features are then poorly aligned across modalities, causing bias between visual and semantic features. Therefore, an approach for GZSL is proposed to enhance semantic-consistent features and transform discriminative features (ESTD-GZSL). The proposed method can enhance semantic-consistent features at all stages of GZSL. A semantic decoder module is first added to the VAE to map synthetic and real features to the corresponding semantic embeddings. This regularization allows the synthesis of more representative visual features for unseen classes, so that synthetic features better represent their semantics. Then, the semantic-consistent features decomposed by the disentanglement module and the features output by the semantic decoder are transformed into enhanced semantic-consistent discriminative features and used in classification to reduce the ambiguity between categories. The experimental results show that our proposed method achieves more competitive results on four benchmark datasets (AWA2, CUB, FLO, and APY) of GZSL.

9 pages, 1286 KiB  
Article
Multi-Feature Fusion Event Argument Entity Recognition Method for Industrial Robot Fault Diagnosis
by Senye Chen, Lianglun Cheng, Jianfeng Deng and Tao Wang
Appl. Sci. 2022, 12(23), 12359; https://doi.org/10.3390/app122312359 - 2 Dec 2022
Cited by 1 | Viewed by 1420
Abstract
The advance of knowledge graphs (KGs) can bring tangible benefits to the fault detection of industrial robots. However, the construction of a KG for industrial robot fault detection is still in its infancy. In this paper, we propose a top-down approach to constructing a knowledge graph from robot fault logs. We define the event argument classes for fault phenomena and fault cause events as well as their relationship, and we then develop the event logic ontology model. To construct the event logic knowledge extraction dataset, the ontology is used to label the entities and relationships of the fault detection event arguments in the corpus. Additionally, because the corpus is small, contains many technical terms, and has sparse entities, a model for recognizing entities in robot fault detection is proposed. The accuracy of the model's entity boundary determination is improved by combining multiple text features and using relationship information. Compared with other methods, this approach significantly improves entity recognition performance on the dataset.

12 pages, 2938 KiB  
Communication
Light-YOLOv5: A Lightweight Algorithm for Improved YOLOv5 in Complex Fire Scenarios
by Hao Xu, Bo Li and Fei Zhong
Appl. Sci. 2022, 12(23), 12312; https://doi.org/10.3390/app122312312 - 1 Dec 2022
Cited by 35 | Viewed by 3506
Abstract
Fire-detection technology is of great importance for successful fire-prevention measures, and image-based fire detection is one effective method. At present, object-detection algorithms fall short in detection speed and accuracy when applied in complex fire scenarios. In this study, a lightweight fire-detection algorithm, Light-YOLOv5 (You Only Look Once version five), is presented. First, a separable vision transformer (SepViT) block is used to replace several Cross Stage Partial Bottleneck with 3 convolutions (C3) modules in the final layer of the backbone network to enhance both the backbone's contact with global information and the extraction of flame and smoke features; second, a light bidirectional feature pyramid network (Light-BiFPN) is designed to lighten the model while improving feature extraction and balancing speed and accuracy during fire detection; third, a global attention mechanism (GAM) is fused into the network to make the model focus more on global dimensional features and further improve detection accuracy; and finally, the Mish activation function and SIoU loss are utilized to simultaneously increase the convergence speed and the accuracy. The experimental results show that, compared to the original algorithm, the mean average precision (mAP) of Light-YOLOv5 increases by 3.3%, the number of parameters decreases by 27.1%, and the floating point operations (FLOPs) decrease by 19.1%. The detection speed reaches 91.1 FPS, allowing targets in complex fire scenarios to be detected in real time.
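Of the components listed, the Mish activation is the simplest to show. A minimal PyTorch definition (recent PyTorch versions also ship this as torch.nn.Mish):

import torch
import torch.nn.functional as F

def mish(x):
    # Mish(x) = x * tanh(softplus(x)): smooth and non-monotonic, which is
    # credited with faster convergence than ReLU-family activations.
    return x * torch.tanh(F.softplus(x))

print(mish(torch.linspace(-3, 3, 7)))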

19 pages, 3065 KiB  
Article
Imitative Reinforcement Learning Fusing Mask R-CNN Perception Algorithms
by Lei He, Jian Ou, Mingyue Ba, Guohong Deng and Echuan Yang
Appl. Sci. 2022, 12(22), 11821; https://doi.org/10.3390/app122211821 - 21 Nov 2022
Cited by 3 | Viewed by 1927
Abstract
Autonomous urban driving navigation is still an open problem with ample room for improvement in unknown complex environments. This paper proposes an end-to-end autonomous driving approach that combines Conditional Imitation Learning (CIL) and Mask R-CNN with DDPG. In the first stage, data are acquired using CARLA, a high-fidelity simulation platform. The data collected with CARLA are used to train the Mask R-CNN network for object detection and segmentation, and the segmented images are fed into the backbone of CIL to perform supervised Imitation Learning (IL). In the second stage, DDPG performs Reinforcement Learning for further training, sharing the learned weights of the pre-trained CIL model. Combining the two methods in this way is innovative: it can speed up training considerably and achieve very high levels of performance, beyond those of humans. We conduct experiments on the CARLA driving benchmark for urban driving. In the final experiments, our algorithm outperforms the original MP by 30%, CIL by 33%, and CIRL by 10% on the most difficult tasks, dynamic navigation tasks, and in new environments and new weather, demonstrating that the proposed two-stage framework shows remarkable generalization capability on navigation tasks in unknown environments.

18 pages, 3450 KiB  
Article
ViT-Cap: A Novel Vision Transformer-Based Capsule Network Model for Finger Vein Recognition
by Yupeng Li, Huimin Lu, Yifan Wang, Ruoran Gao and Chengcheng Zhao
Appl. Sci. 2022, 12(20), 10364; https://doi.org/10.3390/app122010364 - 14 Oct 2022
Cited by 15 | Viewed by 3299
Abstract
Finger vein recognition has been widely studied due to its advantages, such as high security, convenience, and living body recognition. At present, the performance of the most advanced finger vein recognition methods largely depends on the quality of finger vein images. However, when collecting finger vein images, the quality of the captured images is often relatively low due to possible deviations in finger position, ambient lighting, and other factors, which directly affects recognition performance. In this study, we propose a new model for finger vein recognition that combines the vision transformer architecture with the capsule network (ViT-Cap). The model can explore finger vein image information based on global and local attention and selectively focus on the important finger vein feature information. First, we split finger vein images into patches and linearly embed each patch. Second, the resulting vector sequence is fed into a transformer encoder to extract the finger vein features. Third, the feature vectors generated by the vision transformer module are fed into the capsule module for further training. We tested the proposed method on four publicly available finger vein databases. Experimental results showed that the average recognition accuracy of the algorithm based on the proposed model was above 96%, better than the original vision transformer, capsule network, and other advanced finger vein recognition algorithms. Moreover, the equal error rate (EER) of our model achieved state-of-the-art performance, reaching less than 0.3% on the FV-USM dataset, which proves the effectiveness and reliability of the proposed model in finger vein recognition.
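The split-and-embed front end described above is the standard ViT patch embedding, sketched below in PyTorch. The patch size, embedding dimension, and input resolution are illustrative assumptions.

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into patch x patch tiles and linearly embed each one."""
    def __init__(self, patch=16, in_ch=1, dim=256):
        super().__init__()
        # A strided convolution is the usual, equivalent way to do the split
        # plus linear projection in one step.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 1, 224, 224) vein image
        x = self.proj(x)                       # (B, dim, 14, 14): one token per patch
        return x.flatten(2).transpose(1, 2)    # (B, 196, dim) token sequence

tokens = PatchEmbed()(torch.randn(2, 1, 224, 224))   # -> (2, 196, 256)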

17 pages, 4555 KiB  
Article
Nighttime Image Dehazing Based on Point Light Sources
by Xin-Wei Yao, Xinge Zhang, Yuchen Zhang, Weiwei Xing and Xing Zhang
Appl. Sci. 2022, 12(20), 10222; https://doi.org/10.3390/app122010222 - 11 Oct 2022
Cited by 4 | Viewed by 2062
Abstract
Images routinely suffer from quality degradation in fog, mist, and other harsh weather conditions; consequently, image dehazing is an essential and inevitable pre-processing step in computer vision tasks. Image quality enhancement for special scenes, especially nighttime image dehazing, is of great importance for unmanned driving and nighttime surveillance, yet the vast majority of past dehazing algorithms are only applicable to daytime conditions. Observation of a large number of nighttime images shows that artificial light sources take the place of the daytime sun and that the influence of a light source on pixels varies with distance. This paper proposes a novel nighttime dehazing method using a light source influence matrix. The luminosity map expresses the photometric differences around the image's light sources. The light source influence matrix is then calculated to divide the image into a near-light-source region and a non-near-light-source region. Using these two regions, the two initial transmittances obtained by the dark channel prior are fused by edge-preserving filtering. For the atmospheric light term, the initial atmospheric light value is corrected by the light source influence matrix, and the final result is obtained by substituting into the atmospheric light model. Theoretical analysis and comparative experiments verify the performance of the proposed method: in terms of PSNR, SSIM, and UQI, it improves by 9.4%, 11.2%, and 3.3% over the existing nighttime defogging method OSPF. In the future, we will extend this work from static image dehazing to real-time video dehazing and apply it to potential detection applications.
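The dark channel prior used for the initial transmittances can be sketched as follows. The light source influence matrix and the edge-preserving fusion are specific to the paper and omitted here; omega and the patch size are the usual illustrative defaults.

import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the color channels, then a min-filter (erosion)
    over a local patch."""
    mn = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(mn, kernel)

def transmission(img, atmosphere, omega=0.95, patch=15):
    """Initial transmission t(x) = 1 - omega * dark_channel(I / A).
    img: (H, W, 3) hazy image; atmosphere: per-channel atmospheric light A."""
    norm = img.astype(np.float64) / atmosphere
    return 1.0 - omega * dark_channel(norm, patch)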

13 pages, 1349 KiB  
Article
Image-Caption Model Based on Fusion Feature
by Yaogang Geng, Hongyan Mei, Xiaorong Xue and Xing Zhang
Appl. Sci. 2022, 12(19), 9861; https://doi.org/10.3390/app12199861 - 30 Sep 2022
Cited by 1 | Viewed by 2967
Abstract
The encoder–decoder framework is the main frame of image captioning. A convolutional neural network (CNN) is usually used to extract grid-level features of the image, and a graph convolutional network (GCN) is used to extract the image's region-level features. Grid-level features are poor in semantic information, such as the relationships and locations of objects, while regional features lack fine-grained information about images. To address this problem, this paper proposes a fusion-features-based image-captioning model, which includes a fusion-feature encoder and an LSTM decoder. The fusion-feature encoder is divided into a grid-level feature encoder and a region-level feature encoder. The grid-level feature encoder is a convolutional neural network with embedded squeeze-and-excitation operations so that the model can focus on features that are highly correlated with the caption. The region-level encoder employs node-embedding matrices to enable the model to understand different node types and gain richer semantics. The features are then weighted together by an attention mechanism to guide the LSTM decoder in generating an image caption. Our model was trained and tested on the MS COCO 2014 dataset, achieving a BLEU-4 score of 0.399 and a CIDEr score of 1.311. The experimental results indicate that the model can describe images in detail.

20 pages, 931 KiB  
Article
Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs
by Shizhao Chen, Jianbin Fang, Chuanfu Xu and Zheng Wang
Appl. Sci. 2022, 12(19), 9812; https://doi.org/10.3390/app12199812 - 29 Sep 2022
Cited by 4 | Viewed by 2151
Abstract
Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no "one-size-fits-all" format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format for the target matrix and hardware. However, existing hybrid approaches are inadequate at utilizing the SIMD units of modern multi-core CPUs, and it remains unclear how to best mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMD units. Our approach partitions the target sparse matrix into two segments based on the regularity of the memory access pattern, where each segment is stored in a format suitable for its access pattern. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model that automatically determines the partition threshold on a per-matrix basis. Our predictive model is first trained offline, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5%, and 16% on average on KNL, AMD, and Phytium, respectively.
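The machine-learning component can be illustrated with a tiny scikit-learn sketch: cheap structural features of each matrix are mapped to a partition threshold. The features, the model choice, and the placeholder training labels below are all assumptions for illustration, not the paper's feature set.

import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import RandomForestRegressor

def matrix_features(A):
    """Structural features of a CSR matrix describing row-length regularity."""
    row_nnz = np.diff(A.indptr)
    return [A.shape[0], A.nnz, row_nnz.mean(), row_nnz.std(),
            row_nnz.max(), np.percentile(row_nnz, 90)]

# Offline: one feature vector per training matrix, label = best partition
# threshold found by exhaustive timing on the target CPU (placeholders here).
X = np.array([matrix_features(sp.random(1000, 1000, d, format="csr"))
              for d in (0.001, 0.005, 0.01, 0.05)])
y = np.array([8, 16, 32, 64])
model = RandomForestRegressor(n_estimators=100).fit(X, y)

# Online: rows with nnz above the predicted threshold go to the irregular
# format; the regular remainder goes to a SIMD-friendly (e.g. ELL-like) format.
A_new = sp.random(2000, 2000, 0.02, format="csr")
threshold = model.predict([matrix_features(A_new)])[0]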

16 pages, 2946 KiB  
Article
Research on CNN-BiLSTM Fall Detection Algorithm Based on Improved Attention Mechanism
by Congcong Li, Minghao Liu, Xinsheng Yan and Guifa Teng
Appl. Sci. 2022, 12(19), 9671; https://doi.org/10.3390/app12199671 - 26 Sep 2022
Cited by 5 | Viewed by 2265
Abstract
Falls are one of the significant causes of accidental injuries to the elderly. With the rapid growth of the elderly population, fall detection has become a critical issue in the medical and healthcare fields. In this paper, we propose a model based on an improved attention mechanism, CBAM-IAM-CNN-BiLSTM, to detect falls of the elderly accurately and in time. The model includes a convolution layer, a bidirectional LSTM layer, a sampling layer, and a dense layer, and incorporates an improved convolutional block attention module (CBAM) into the network structure, in which a one-dimensional convolution layer replaces the dense layer to aggregate information across channels, allowing the model to accurately extract different behavior characteristics. The acceleration and angular velocity data of the human body, collected by wearable sensors, are input into the convolution layer and bidirectional LSTM layer of the model, respectively, and then classified by softmax after feature fusion. Compared with models such as CNN and CNN-BiLSTM, as well as with different attention mechanisms such as squeeze-and-excitation (SE), efficient channel attention (ECA), and the original convolutional block attention module (CBAM), this model improves accuracy, sensitivity, and specificity to varying degrees. The experimental results show that the accuracy, sensitivity, and specificity of the proposed CBAM-IAM-CNN-BiLSTM model are 97.37%, 97.29%, and 99.56%, respectively, which indicates that the model has good practicability and strong generalization ability. Full article
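A minimal sketch of the two ideas named above, assuming hypothetical layer sizes and a 6-channel input (3-axis accelerometer plus 3-axis gyroscope): channel attention computed by a 1-D convolution rather than a dense layer, feeding a CNN-BiLSTM classifier. This is an illustration in the spirit of the paper, not its network.

```python
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    """Channel attention where a 1-D convolution (not a dense layer)
    aggregates cross-channel information."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                       # x: (B, C, T)
        w = x.mean(dim=2, keepdim=True)         # global average pool -> (B, C, 1)
        w = self.conv(w.transpose(1, 2))        # 1-D conv across the channel axis
        w = torch.sigmoid(w.transpose(1, 2))    # channel weights in (0, 1)
        return x * w

class FallDetector(nn.Module):
    def __init__(self, channels=6, classes=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(channels, 64, 5, padding=2), nn.ReLU())
        self.att = ChannelAttention1D()
        self.bilstm = nn.LSTM(64, 32, batch_first=True, bidirectional=True)
        self.head = nn.Linear(64, classes)

    def forward(self, x):                       # x: (B, channels, T)
        h = self.att(self.conv(x))
        out, _ = self.bilstm(h.transpose(1, 2)) # (B, T, 64)
        return self.head(out[:, -1])            # softmax is applied in the loss
```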
(This article belongs to the Special Issue AI-Based Image Processing)

20 pages, 6924 KiB  
Article
High-Precision Depth Map Estimation from Missing Viewpoints for 360-Degree Digital Holography
by Hakdong Kim, Heonyeong Lim, Minkyu Jee, Yurim Lee, MinSung Yoon and Cheongwon Kim
Appl. Sci. 2022, 12(19), 9432; https://doi.org/10.3390/app12199432 - 20 Sep 2022
Cited by 2 | Viewed by 1986
Abstract
In this paper, we propose a novel model to extract highly precise depth maps from missing viewpoints, especially for generating holographic 3D content. These depth maps are essential elements for phase extraction, which is required for the synthesis of computer-generated holograms (CGHs). The proposed model, called holographic dense depth, estimates depth maps through feature extraction combined with up-sampling. We designed and prepared a total of 9832 multi-view images with a resolution of 640 × 360. We evaluated our model by comparing the estimated depth maps with their ground truths using various metrics. We further compared the CGH patterns created from estimated depth maps with those from ground truths and reconstructed the holographic 3D image scenes from their CGHs. Both quantitative and qualitative results demonstrate the effectiveness of the proposed method. Full article
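The "feature extraction combined with up-sampling" pattern can be illustrated with a minimal encoder–decoder that maps a 640 × 360 RGB view to a one-channel depth map. All layer choices below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DenseDepthSketch(nn.Module):
    """Minimal sketch: convolutional feature extraction followed by
    bilinear up-sampling back to a per-pixel depth map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 360x640 -> 180x320
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> 90x160
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1),                        # 1-channel depth
        )

    def forward(self, rgb):              # rgb: (B, 3, 360, 640)
        return self.decoder(self.encoder(rgb))
```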
(This article belongs to the Special Issue AI-Based Image Processing)

16 pages, 2491 KiB  
Article
AdaCB: An Adaptive Gradient Method with Convergence Range Bound of Learning Rate
by Xuanzhi Liao, Shahnorbanun Sahran, Azizi Abdullah and Syaimak Abdul Shukor
Appl. Sci. 2022, 12(18), 9389; https://doi.org/10.3390/app12189389 - 19 Sep 2022
Viewed by 2299
Abstract
Adaptive gradient descent methods such as Adam, RMSprop, and AdaGrad have achieved great success in training deep learning models. These methods adaptively change the learning rates, resulting in faster convergence. Recent studies have shown that their problems include extreme learning rates, non-convergence issues, and poor generalization. Some enhanced variants have been proposed, such as AMSGrad and AdaBound; however, the performance of these alternatives is controversial and some drawbacks remain. In this work, we propose an optimizer called AdaCB, which limits the learning rates of Adam to a convergence range bound. The bound range is determined by the LR test, and two bound functions, both of which tend toward a constant value, are designed to constrain Adam. To evaluate our method, we carried out experiments on the image classification task: three models, Smallnet, Network in Network, and ResNet, were trained on the CIFAR10 and CIFAR100 datasets. Experimental results show that our method outperforms other optimizers on CIFAR10 and CIFAR100 with accuracies of (82.76%, 53.29%), (86.24%, 60.19%), and (83.24%, 55.04%) on Smallnet, Network in Network, and ResNet, respectively. The results also indicate that our method maintains a fast learning speed, like adaptive gradient methods, in the early stage, and achieves considerable accuracy, like SGD(M), at the end. Full article
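The general mechanism, clipping Adam's element-wise step size between two bound functions that converge to a constant, can be sketched as below. The bound functions here are invented for illustration in the spirit of AdaBound-style bounds; the paper's exact formulas and LR-test-derived constants are not reproduced.

```python
import torch

def adacb_step(p, exp_avg, exp_avg_sq, step, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, lower0=1e-4, upper0=1e-1, final_lr=1e-2):
    """One illustrative update: Adam's per-parameter step size is clamped
    between two bounds that both tend to final_lr as training proceeds."""
    bc1 = 1 - betas[0] ** step                      # bias corrections
    bc2 = 1 - betas[1] ** step
    denom = (exp_avg_sq / bc2).sqrt().add_(eps)
    step_size = lr / bc1 / denom                    # Adam's element-wise step size

    # Hypothetical bound functions: both converge to final_lr over time
    lower = final_lr - (final_lr - lower0) / (1 + 1e-3 * step)
    upper = final_lr + (upper0 - final_lr) / (1 + 1e-3 * step)
    step_size = step_size.clamp(min=lower, max=upper)

    p.data.add_(-step_size * exp_avg)               # apply the bounded update
```

Here `exp_avg` and `exp_avg_sq` are the usual Adam moment estimates, assumed to be maintained by the surrounding optimizer loop.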
(This article belongs to the Special Issue AI-Based Image Processing)

18 pages, 1099 KiB  
Article
Simulation of Intellectual Property Management on Evolution Driving of Regional Economic Growth
by Xiran Yang and Yong Qi
Appl. Sci. 2022, 12(18), 9011; https://doi.org/10.3390/app12189011 - 8 Sep 2022
Cited by 4 | Viewed by 2051
Abstract
The input, application, and transformation of intellectual property can significantly promote the economic development of a region, but the path and operating mechanism by which intellectual property management drives regional economic growth are not yet clear. System dynamics theory was used to analyze the driving forces and resistance of intellectual property management from the macro to the micro level. With the help of system dynamics theory, equations were constructed to simulate the path and forces from intellectual property management to regional economic growth, and sensitivity analysis was used to identify sensitive influencing factors in the system. The following conclusions were drawn: (1) intellectual property affects regional economic growth at the macro level through factors such as intellectual property investment, policies, and the construction of rules and regulations; (2) enterprises, whether in industry, in universities and research institutes, or elsewhere in this system, are the main body that creates innovation benefits and ultimately promotes regional economic growth; (3) the continuous investment of intellectual property resources and the driving force of enterprise innovation are both sensitive factors of this system. The government should give full play to its functions and strengthen intellectual property management in order to enable the regional economy to achieve high-quality development. By studying the cooperation between the subjects involved in intellectual property management activities, the integration and allocation of factors and resources were examined, and the process by which technological innovation activities act on economic growth, together with its dynamic changes, was revealed. Full article
(This article belongs to the Special Issue AI-Based Image Processing)

13 pages, 3058 KiB  
Article
An Unsupervised Depth-Estimation Model for Monocular Images Based on Perceptual Image Error Assessment
by Hyeseung Park and Seungchul Park
Appl. Sci. 2022, 12(17), 8829; https://doi.org/10.3390/app12178829 - 2 Sep 2022
Cited by 2 | Viewed by 1956
Abstract
In this paper, we propose a novel unsupervised learning-based model for estimating the depth of monocular images by integrating a simple ResNet-based auto-encoder and some special loss functions. We use only stereo images obtained from binocular cameras as training data, without using depth ground-truth data. Our model outputs a disparity map that is used to warp an input image into an image corresponding to a different viewpoint. When the input image is warped using the output disparity map, distortions of various patterns inevitably occur in the reconstructed image. During the training process, the occurrence frequency and size of these distortions gradually decrease, while the similarity between the reconstructed and target images increases, indicating that the accuracy of the predicted disparity maps also increases. Therefore, one of the most important factors in this type of training is an efficient loss function that accurately measures the quality difference between the reconstructed and target images and guides the gap to be closed properly and quickly as training progresses. In recent related studies, the photometric difference was calculated through simple methods such as L1 and L2 loss, or by combining one of these with a traditional, hand-coded image-quality assessment algorithm from computer vision, such as SSIM. However, these methods are limited in modeling various distortion patterns at the level of the human visual system. Therefore, the proposed model uses a pre-trained perceptual image-quality assessment model that effectively mimics human-perception mechanisms to measure the quality of distorted images as the image-reconstruction loss. To highlight the performance of the proposed loss functions, a simple ResNet50-based network is adopted in our model. We trained our model using stereo images from the KITTI 2015 driving dataset to measure pixel-level depth for 768 × 384 images. Despite the simplicity of the network structure, thanks to the effectiveness of the proposed image-reconstruction loss, our model outperformed other state-of-the-art unsupervised models on a variety of evaluation metrics. Full article
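A rough sketch of the training signal described above: warp one stereo view with the predicted disparity, then score the reconstruction with a perceptual distance in pretrained feature space. Frozen VGG16 features are used here purely as a stand-in for the paper's pre-trained perceptual image-quality model, and the weighting is an assumption.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen feature extractor standing in for a perceptual IQA model
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def warp(img, disparity):
    """Warp a right-view image toward the left view using predicted disparity."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1).clone()
    base[..., 0] -= 2 * disparity.squeeze(1) / w   # shift x-coordinates by disparity
    return F.grid_sample(img, base, align_corners=False)

def reconstruction_loss(reconstructed, target):
    """Perceptual distance in pretrained feature space plus a small L1 term."""
    return F.l1_loss(vgg(reconstructed), vgg(target)) + 0.1 * F.l1_loss(reconstructed, target)
```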
(This article belongs to the Special Issue AI-Based Image Processing)

17 pages, 3891 KiB  
Article
Exploiting Hierarchical Label Information in an Attention-Embedding, Multi-Task, Multi-Grained, Network for Scene Classification of Remote Sensing Imagery
by Peng Zeng, Shixuan Lin, Hao Sun and Dongbo Zhou
Appl. Sci. 2022, 12(17), 8705; https://doi.org/10.3390/app12178705 - 30 Aug 2022
Cited by 3 | Viewed by 1713
Abstract
Remote sensing scene classification aims to automatically assign proper labels to remote sensing images. Most existing deep learning-based methods consider the interclass and intraclass relationships of the image content for classification. However, these methods rarely consider the hierarchical information of scene labels, even though a scene label may belong to multiple levels of granularity. For example, a remote sensing scene image may belong to the coarse-grained label “transportation land” while also belonging to the fine-grained label “airport”. In this paper, to exploit hierarchical label information, we propose an attention-embedding multi-task multi-grained network (AEMMN) for remote sensing scene classification. In the proposed AEMMN, we add a coarse-grained classifier as the first level and a fine-grained classifier as the second level to perform multi-task learning. Additionally, a gradient control module is utilized to control the gradient propagation of the two classifiers to suppress the negative transfer caused by irrelevant features between tasks. In the feature extraction portion, the model uses a ResNet50 backbone embedded with an ECA module to extract effective features with cross-channel interaction information. Furthermore, an external attention module is exploited to improve the discrimination of fine-grained and coarse-grained features. Experiments were conducted on NWPU-RESISC45 and the Aerial Image Data Set (AID); the overall accuracy of the proposed AEMMN is 92.07% on the NWPU-RESISC45 dataset and 94.96% on the AID. The results indicate that hierarchical label information can effectively improve the performance of scene classification when categorizing remote sensing imagery. Full article
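One plausible way to realize a two-level multi-task head with gradient control is sketched below: a shared ResNet50 trunk feeds coarse and fine classifiers, and the coarse task's gradient into the trunk is scaled down to limit negative transfer. The scaling mechanism and all sizes are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torchvision

class GradScale(torch.autograd.Function):
    """Identity on the forward pass; scales gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, g):
        return g * ctx.scale, None

class MultiGrainedNet(nn.Module):
    def __init__(self, coarse_classes=6, fine_classes=45, grad_scale=0.5):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.coarse_head = nn.Linear(2048, coarse_classes)   # level 1: coarse labels
        self.fine_head = nn.Linear(2048, fine_classes)       # level 2: fine labels
        self.grad_scale = grad_scale

    def forward(self, x):
        f = self.features(x).flatten(1)
        # Damp the coarse task's gradient into the shared backbone
        coarse = self.coarse_head(GradScale.apply(f, self.grad_scale))
        fine = self.fine_head(f)
        return coarse, fine
```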
(This article belongs to the Special Issue AI-Based Image Processing)

14 pages, 2616 KiB  
Article
Image Dehazing Algorithm Based on Deep Learning Coupled Local and Global Features
by Shuping Li, Qianhao Yuan, Yeming Zhang, Baozhan Lv and Feng Wei
Appl. Sci. 2022, 12(17), 8552; https://doi.org/10.3390/app12178552 - 26 Aug 2022
Cited by 13 | Viewed by 5984
Abstract
To address the problems that most convolutional neural network-based image defogging models capture incomplete global feature information and produce incomplete defogging, this paper proposes an end-to-end hybrid image defogging algorithm that combines a convolutional neural network with a vision transformer. First, the shallow features of the hazy image are extracted by a preprocessing module. Then, a symmetric network structure including a convolutional neural network (CNN) branch and a vision transformer branch is used to capture the local and global features of the hazy image, respectively. The mixed features are fused using convolutional layers to cover the global representation while retaining the local features. Finally, the features obtained by the encoder and decoder are fused to obtain richer feature information. The experimental results show that the proposed defogging algorithm achieves better defogging results on both uniform and non-uniform haze datasets, solves the problems of dark and distorted colors after defogging, and recovers images that are more natural in their details. Full article
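The two-branch design can be illustrated with a minimal block, assuming hypothetical channel and head counts: a CNN branch captures local features, a self-attention branch captures global context over flattened spatial tokens, and a 1 × 1 convolution fuses the two. This is a sketch of the pattern, not the paper's network.

```python
import torch
import torch.nn as nn

class HybridDehazeBlock(nn.Module):
    """Sketch: CNN branch for local features, transformer branch for
    global context, fused by a 1x1 convolution."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)   # 1x1 conv merges the two streams

    def forward(self, x):                        # x: (B, dim, H, W)
        b, c, h, w = x.shape
        local = self.cnn(x)
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, dim) for self-attention
        global_ = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, global_], dim=1))
```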
(This article belongs to the Special Issue AI-Based Image Processing)

22 pages, 47381 KiB  
Article
Method for 2D-3D Registration under Inverse Depth and Structural Semantic Constraints for Digital Twin City
by Xiaofei Hu, Yang Zhou and Qunshan Shi
Appl. Sci. 2022, 12(17), 8543; https://doi.org/10.3390/app12178543 - 26 Aug 2022
Cited by 3 | Viewed by 1787
Abstract
A digital twin city maps a virtual three-dimensional (3D) city model to the geographic information system, constructs a virtual world, and integrates real sensor data to achieve the purpose of virtual–real fusion. Focusing on the accuracy problem of vision sensor registration in the virtual digital twin city scene, this study proposes a 2D-3D registration method under inverse depth and structural semantic constraints. First, perspective and inverse depth images of the virtual scene were obtained by using perspective view and inverse-depth nascent technology, and then the structural semantic features were extracted by the two-line minimal solution set method. A simultaneous matching and pose estimation method under inverse depth and structural semantic constraints was proposed to achieve the 2D-3D registration of real images and virtual scenes. The experimental results show that the proposed method can effectively optimize the initial vision sensor pose and achieve high-precision registration in the digital twin scene, and the Z-coordinate error is reduced by 45%. An application experiment of monocular image multi-object spatial positioning was designed, which proved the practicability of this method, and the influence of model data error on registration accuracy was analyzed. Full article
(This article belongs to the Special Issue AI-Based Image Processing)

14 pages, 13605 KiB  
Article
Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting
by Chengbin Zeng, Yi Liu and Chunli Song
Appl. Sci. 2022, 12(17), 8488; https://doi.org/10.3390/app12178488 - 25 Aug 2022
Cited by 1 | Viewed by 1607
Abstract
Scene text spotting has made tremendous progress with the in-depth research on deep convolutional neural networks (DCNNs). Previous approaches mainly focus on spotting arbitrary-shaped scene text, and they find it difficult to achieve satisfactory results on dense scene text containing various instances of bending, occlusion, and uneven lighting. To address this problem, we propose an approach called Rwin-FPN++, which incorporates the long-range dependency merit of the Rwin Transformer into the feature pyramid network (FPN) to effectively enhance the functionality and generalization of the FPN. Specifically, we first propose the rotated-windows-based Transformer (Rwin) to enhance the rotation-invariant performance of self-attention. Then, we attach the Rwin Transformer to each level of our feature pyramid to extract global self-attention contexts for each feature map produced by the FPN. Thirdly, we fuse these feature pyramids by upsampling to predict the score matrix and keypoints matrix of the text regions. Fourthly, a simple post-processing step is adopted to precisely merge the pixels in the score and keypoints matrices and obtain the final segmentation results. Finally, we use a recurrent neural network to recognize each segmented region and thus achieve the final spotting results. To evaluate the performance of our Rwin-FPN++ network, we constructed a dense scene text dataset with various shapes and occlusions from the wiring of the terminal blocks of substation panel cabinets. We trained our Rwin-FPN++ network on public datasets and then evaluated its performance on our dense scene text dataset. Experiments demonstrate that our Rwin-FPN++ network achieves an F-measure of 79% and outperforms all other methods in F-measure by at least 2.8%, owing to its better rotation invariance and long-range dependency modeling. Full article
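The "attach a transformer to each pyramid level, then fuse by upsampling" step can be sketched as follows. A generic self-attention layer stands in for the rotated-windows Rwin attention, which is not reproduced here; dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TransformerFPNLevel(nn.Module):
    """Sketch: a self-attention layer attached to one FPN level so the
    feature map gains global (long-range) context before fusion."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, fmap):                     # fmap: (B, dim, H, W)
        b, c, h, w = fmap.shape
        tokens = fmap.flatten(2).transpose(1, 2) # (B, H*W, dim)
        return self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)

levels = nn.ModuleList(TransformerFPNLevel() for _ in range(4))

def fuse_pyramid(feats):
    """Apply attention per level, upsample all maps to the finest scale, and sum;
    the fused map would feed the score/keypoints prediction heads."""
    outs = [lvl(f) for lvl, f in zip(levels, feats)]
    target = outs[0].shape[-2:]
    up = [nn.functional.interpolate(o, size=target, mode="bilinear",
                                    align_corners=False) for o in outs]
    return torch.stack(up).sum(0)
```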
(This article belongs to the Special Issue AI-Based Image Processing)

15 pages, 3893 KiB  
Article
Research on Rockburst Risk Level Prediction Method Based on LightGBM−TCN−RF
by Li Ma, Jiajun Cai, Xinguan Dai and Ronghao Jia
Appl. Sci. 2022, 12(16), 8226; https://doi.org/10.3390/app12168226 - 17 Aug 2022
Cited by 7 | Viewed by 1437
Abstract
Rockburst hazards pose a severe threat to mine safety. To accurately predict the risk level of rockburst, a LightGBM−TCN−RF prediction model is proposed in this paper. A correlation coefficient heat map combined with the LightGBM feature selection algorithm is used to screen the rockburst characteristic variables and establish the predicted characteristic variables of rockburst. Then, a TCN prediction model, which offers better prediction performance, is selected to predict the rockburst characteristic variables at time t + 1, and an RF classification model with a better classification effect is used to classify the risk level given the rockburst characteristic variables at time t + 1. The comparison experiments show that the screened rockburst characteristic variables allow a more accurate prediction. The overall RMSE and MAE of the TCN prediction model are 0.124 and 0.079, which are better than those of RNN, LSTM, and GRU by about 0.1–2.5%. The accuracy of the RF classification model for the rockburst risk level is 96.17%, about 20% higher than that of KNN and SVM, and the model accuracy is improved by a further 1.62% after parameter tuning with the PSO algorithm. The experimental results show that the LightGBM−TCN−RF model can better classify and predict rockburst risk levels at future moments, which has reference value for rockburst monitoring and early warning. Full article
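The two bookends of this pipeline, LightGBM-based feature screening and RF-based risk-level classification, can be sketched as below; the TCN forecasting stage in between is omitted, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestClassifier

def screen_features(X, y, keep=8):
    """LightGBM importance-based screening of rockburst characteristic variables."""
    model = LGBMRegressor(n_estimators=200).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return order[:keep]              # indices of the retained variables

def classify_risk(X_train, levels_train, X_pred):
    """After a sequence model (the paper's TCN) forecasts the variables at t+1,
    a random forest assigns a rockburst risk level to the forecast vector."""
    rf = RandomForestClassifier(n_estimators=300).fit(X_train, levels_train)
    return rf.predict(X_pred)
```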
(This article belongs to the Special Issue AI-Based Image Processing)

9 pages, 2114 KiB  
Article
Intelligent Target Design Based on Complex Target Simulation
by Jiaxing Hao, Xuetian Wang, Sen Yang, Hongmin Gao, Cuicui Yu and Wentao Xing
Appl. Sci. 2022, 12(16), 8010; https://doi.org/10.3390/app12168010 - 10 Aug 2022
Cited by 1 | Viewed by 1662
Abstract
The emergence and popularization of various fifth-generation fighter jets with supersonic cruise, super-maneuverability, and stealth capabilities have raised higher and more comprehensive challenges for the tactical performance and operational indicators of air defense weapon systems. The training of air defense systems requires simulated targets; however, traditional targets cannot simulate the radar cross-section (RCS) distribution characteristics of fifth-generation fighter aircraft. In addition, existing target aircraft are expensive and cannot be mass-produced. Therefore, in this paper, a corner reflector and a Luneburg ball reflector with the RCS distribution characteristics of a fifth-generation fighter in a certain spatial area are designed for target simulation. Several corner reflectors and Luneburg balls are arranged in an array to realize the simulation. The RCS value and distribution characteristics of the target are combined with fuzzy clustering and a single-chip microcomputer to design an intelligent switching system, which improves the practicability of the intelligent target design proposed in this paper. Full article
(This article belongs to the Special Issue AI-Based Image Processing)

15 pages, 8658 KiB  
Article
Numerical Analysis of Instability Mechanism of a High Slope under Excavation Unloading and Rainfall
by Manli Qu and Faning Dang
Appl. Sci. 2022, 12(16), 7990; https://doi.org/10.3390/app12167990 - 10 Aug 2022
Cited by 7 | Viewed by 2046
Abstract
High slope simulation analysis is an essential means of slope engineering design, construction, and operation management. It is necessary to master slope dynamics, ensure slope safety, analyze slope instability mechanisms, and carry out early warning and prediction of slope stability. Aiming at the landslide phenomenon of the high slope on the left bank of a reservoir project, and considering the influence of stratum lithology, faults, excavation unloading, rainfall, and water storage, this paper establishes a refined finite element model that reflects the internal structure of the slope. A fluid–solid coupling numerical simulation analysis of the high slope is carried out, and on this basis the failure mechanism of the slope under excavation unloading and heavy rainfall is explained. The application to an engineering example shows that, under the combined action of excavation unloading and rainfall infiltration, the saturation within the structural plane formed at the fault at the trailing edge of the excavated slope surface increases, the pore water pressure increases, and a shear strain concentration zone appears at the internal structural surface of the slope. This zone extends along the structural surface to the front and rear edges of the slope, resulting in landslide damage. Full article
(This article belongs to the Special Issue AI-Based Image Processing)

21 pages, 9716 KiB  
Article
STDecoder-CD: How to Decode the Hierarchical Transformer in Change Detection Tasks
by Bo Zhao, Xiaoyan Luo, Panpan Tang, Yang Liu, Haoming Wan and Ninglei Ouyang
Appl. Sci. 2022, 12(15), 7903; https://doi.org/10.3390/app12157903 - 6 Aug 2022
Cited by 3 | Viewed by 2050
Abstract
Change detection (CD) is in high demand in satellite imagery processing. Inspired by the recent success of the combined transformer-CNN (convolutional neural network) model TransCNN, originally designed for image recognition, in this paper we present STDecoder-CD for change detection applications, which is a combination of the Siamese network (“S”), the TransCNN backbone (“T”), and three types of decoders (“Decoder”). The Type I model uses a UNet-like decoder, and the Type II decoder is defined by a combination of three modules: a difference detector, an FPN (feature pyramid network), and an FCN (fully convolutional network). The Type III model updates the change feature map by introducing a transformer decoder. The effectiveness and advantages of the proposed methods over state-of-the-art alternatives were demonstrated on several CD datasets, and the experimental results indicate that: (1) STDecoder-CD has excellent generalization ability and strong robustness to pseudo-changes and noise. (2) An end-to-end CD network architecture cannot be completely free from the influence of the decoding strategy; in our case, the Type I decoder often obtained finer details than Types II and III due to its multi-scale design. (3) Using ablation or replacement strategies to modify the three proposed decoder architectures had a limited impact on the CD performance of STDecoder-CD. To the best of our knowledge, we are the first to investigate the effect of different decoding strategies on CD tasks. Full article
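The Siamese-plus-difference-detector idea behind such CD models can be reduced to a few lines: one shared backbone encodes both acquisition dates, and a decoder operates on the feature difference. The plain convolutional backbone below is a stand-in for the TransCNN backbone, and every size is an assumption.

```python
import torch
import torch.nn as nn

class SiameseDifferenceCD(nn.Module):
    """Sketch: a weight-shared backbone encodes both dates; a difference
    detector feeds a pixel-wise change decoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(dim, 1, 1))    # change-map logits

    def forward(self, img_t1, img_t2):
        f1, f2 = self.backbone(img_t1), self.backbone(img_t2)  # shared weights
        return self.decoder(torch.abs(f1 - f2))                # difference detector
```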
(This article belongs to the Special Issue AI-Based Image Processing)

11 pages, 1943 KiB  
Article
Calibrated Convolution with Gaussian of Difference
by Huoxiang Yang, Chao Li, Yongsheng Liang, Wei Liu and Fanyang Meng
Appl. Sci. 2022, 12(13), 6570; https://doi.org/10.3390/app12136570 - 29 Jun 2022
Viewed by 1567
Abstract
Attention mechanisms are widely used in Convolutional Neural Networks (CNNs) for various visual tasks. Many methods introduce multi-scale information into attention mechanisms to improve their feature transformation performance; however, these methods do not take into account the potential importance of scale invariance. This paper proposes a novel type of convolution, called Calibrated Convolution with Gaussian of Difference (CCGD), that accounts for both attention mechanisms and scale invariance. A simple yet effective scale-invariant attention module that operates within a single convolution adaptively builds powerful scale-invariant features to recalibrate the feature representation. Along with this, a CNN with a heterogeneously grouped structure is used, which enhances the multi-scale representation capability. CCGD can be flexibly deployed in modern CNN architectures without introducing extra parameters. In experiments on various datasets, the method increased ResNet50-based classification accuracy from 76.40% to 77.87% on the ImageNet dataset, and the tests generally confirmed that CCGD outperforms other state-of-the-art attention methods. Full article
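One way to read "Gaussian of Difference" is a difference-of-Gaussians response on the feature map used as an attention signal; the sketch below implements that reading as an assumption, not the paper's exact formulation, with illustrative kernel sizes and sigmas.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class GaussianOfDifferenceAttention(nn.Module):
    """Sketch: a difference-of-Gaussians band-pass response on the feature map
    acts as a roughly scale-invariant signal that recalibrates the features."""
    def __init__(self, sigma_small=1.0, sigma_large=2.0):
        super().__init__()
        self.sigma_small, self.sigma_large = sigma_small, sigma_large

    def forward(self, x):                        # x: (B, C, H, W)
        fine = TF.gaussian_blur(x, kernel_size=5, sigma=self.sigma_small)
        coarse = TF.gaussian_blur(x, kernel_size=9, sigma=self.sigma_large)
        dog = fine - coarse                      # band-pass detail response
        return x * torch.sigmoid(dog)            # recalibrate the features
```

Note that, consistent with the claim above, this module introduces no learnable parameters.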
(This article belongs to the Special Issue AI-Based Image Processing)

19 pages, 6319 KiB  
Article
Single Image Super-Resolution Method Based on an Improved Adversarial Generation Network
by Qiang Wang, Hongbin Zhou, Guangyuan Li and Jiansheng Guo
Appl. Sci. 2022, 12(12), 6067; https://doi.org/10.3390/app12126067 - 15 Jun 2022
Cited by 5 | Viewed by 2325
Abstract
Super-Resolution (SR) techniques for image restoration have recently been gaining attention due to their excellent performance. Owing to their powerful learning abilities, Generative Adversarial Networks (GANs) have achieved great success in this area. In this paper, we propose an Enhanced Generative Adversarial Network (EGAN) to improve performance on real-time Super-Resolution tasks. The main contributions of this paper are as follows: (1) We adopted the Laplacian pyramid framework as a pre-trained module, which is beneficial for providing multiscale features for our input. (2) At each feature block, a convolutional skip-connection network, which may retain latent information, helps the generative model reconstruct a plausible-looking image. (3) Considering that edge details usually play an important role in image generation, a perceptual loss function was defined for training to seek the optimal parameters. Quantitative and qualitative evaluations demonstrated that our algorithm not only takes full advantage of Convolutional Neural Networks (CNNs) to improve image quality, but also outperforms other algorithms in speed and performance on real-time Super-Resolution tasks. Full article
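The Laplacian pyramid framework mentioned in contribution (1) decomposes an image into band-pass detail levels plus a low-frequency residual, giving the generator multiscale inputs. A minimal sketch of the decomposition (illustrative, independent of the paper's exact module):

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):
    """Decompose an image into a Laplacian pyramid: each level keeps the detail
    lost by down-sampling; the last entry is the coarse residual."""
    pyramid = []
    current = img
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(current - up)   # band-pass detail at this scale
        current = down
    pyramid.append(current)            # coarsest low-frequency residual
    return pyramid
```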
(This article belongs to the Special Issue AI-Based Image Processing)

Review


22 pages, 7275 KiB  
Review
Deep Learning Methods in Image Matting: A Survey
by Lingtao Huang, Xipeng Liu, Xuelin Wang, Jiangqi Li and Benying Tan
Appl. Sci. 2023, 13(11), 6512; https://doi.org/10.3390/app13116512 - 26 May 2023
Cited by 3 | Viewed by 4920
Abstract
Image matting is a fundamental technique used to extract a fine foreground from a given image by estimating the opacity value of each pixel. It is one of the key techniques in image processing and has a wide range of applications in practical scenarios, such as image and video editing. Deep learning has demonstrated outstanding performance in various image processing tasks, making it a popular research topic. In recent years, image matting methods based on deep learning have gained significant attention due to their superior performance. Therefore, this article presents a comprehensive overview of the deep learning-based image matting algorithms proposed in recent years. The paper first introduces frequently used datasets and their production methods, along with the basic principles of traditional image matting techniques. We then analyze deep learning-based matting algorithms in detail and introduce commonly used image matting evaluation metrics. Additionally, the paper discusses the application scenarios of image matting, conducts experiments to illustrate the limitations of current image matting methods, and outlines potential future research directions in this field. Overall, this paper can serve as a valuable reference for researchers who are interested in image matting. Full article
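The opacity estimation described above inverts the standard compositing equation I = αF + (1 − α)B, where α is the per-pixel opacity, F the foreground, and B the background. A small NumPy illustration of the forward (compositing) direction, with toy random images:

```python
import numpy as np

def composite(foreground, background, alpha):
    """Compositing equation underlying matting: I = alpha*F + (1 - alpha)*B.
    Matting inverts this: given I, estimate alpha (and F) per pixel."""
    alpha = alpha[..., None]                     # (H, W, 1) broadcasts over RGB
    return alpha * foreground + (1.0 - alpha) * background

# Toy usage
F_img = np.random.rand(360, 640, 3)
B_img = np.random.rand(360, 640, 3)
a = np.random.rand(360, 640)                     # per-pixel opacity in [0, 1]
I = composite(F_img, B_img, a)
```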
(This article belongs to the Special Issue AI-Based Image Processing)

43 pages, 3126 KiB  
Review
Deep Residual Learning for Image Recognition: A Survey
by Muhammad Shafiq and Zhaoquan Gu
Appl. Sci. 2022, 12(18), 8972; https://doi.org/10.3390/app12188972 - 7 Sep 2022
Cited by 327 | Viewed by 36233
Abstract
Deep Residual Networks have recently been shown to significantly improve the performance of neural networks trained on ImageNet, with results beating all previous methods on this dataset by large margins in the image classification task. However, the meaning of these impressive numbers and their implications for future research are not yet fully understood. In this survey, we explain what Deep Residual Networks are, how they achieve their excellent results, and why their successful implementation in practice represents a significant advance over existing techniques. We also discuss some open questions related to residual learning, as well as possible applications of Deep Residual Networks beyond ImageNet. Finally, we discuss some issues that still need to be resolved before deep residual learning can be applied to more complex problems. Full article
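The core construct surveyed here is the residual block, in which the layers learn a residual mapping F(x) and the identity shortcut yields y = F(x) + x. A minimal sketch (channel count is an arbitrary choice for illustration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual learning: the body learns F(x); the identity shortcut
    gives y = F(x) + x, which eases optimization of deep networks."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))

    def forward(self, x):
        return torch.relu(self.body(x) + x)   # identity shortcut
```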
(This article belongs to the Special Issue AI-Based Image Processing)
