Deep Learning for Image Recognition and Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 January 2025) | Viewed by 20083

Special Issue Editors


Guest Editor
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: image and video semantic segmentation; deep learning; industrial process control; industrial intelligence; natural language processing; knowledge graph

Guest Editor
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: machine learning; deep learning; remote sensing image analysis; image segmentation

Guest Editor
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: deep learning; machine learning; image processing

Special Issue Information

Dear Colleagues,

Deep learning has attracted increasing interest for a wide range of computer vision and image analysis tasks, such as image classification, image segmentation and object detection. It can be applied in many domains, including industrial intelligence, remote sensing image analysis and autonomous driving. However, deep learning still faces challenging problems in various applications that limit image recognition and processing performance. Modifying models or individual modules alone cannot address all of these problems, and several of them urgently need attention: for example, how to improve model accuracy with a limited number of samples, or how to build explainable deep learning models that are precise and reliable. This Special Issue therefore aims to cover novel deep learning strategies for image recognition and processing that use reliable, optimized and hybrid deep learning algorithms across a range of applications.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Deep learning image analysis in civil applications (e.g., industrial intelligence, remote sensing, and biological and medical imaging).
  • Novel training strategies for small-sample-size problems.
  • Explainable deep learning models that inform model construction and improve model reliability.
  • Image classification, segmentation and object detection using deep learning algorithms.
  • Transfer learning and knowledge distillation for image processing.

We look forward to receiving your contributions.

Prof. Dr. Jiangyun Li
Dr. Tianxiang Zhang
Dr. Peixian Zhuang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning/deep learning
  • image segmentation
  • image classification
  • object detection
  • remote sensing image analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (10 papers)


Research


18 pages, 8134 KiB  
Article
YOLOv8-WD: Deep Learning-Based Detection of Defects in Automotive Brake Joint Laser Welds
by Jiajun Ren, Haifeng Zhang and Min Yue
Appl. Sci. 2025, 15(3), 1184; https://doi.org/10.3390/app15031184 - 24 Jan 2025
Viewed by 398
Abstract
The rapid advancement of industrial automation in automotive manufacturing has raised the demands on welding quality, particularly for critical components. Traditional manual inspection is inefficient and prone to human error, and its low defect recognition rates fail to meet modern manufacturing standards. To address these challenges, an enhanced YOLOv8-based algorithm for steel defect detection, termed YOLOv8-WD (weld detection), was developed to improve the accuracy and efficiency of identifying defects in steel. We implemented a novel data augmentation strategy with various image transformation techniques to enhance the model’s generalization across different welding scenarios. The Efficient Vision Transformer (EfficientViT) architecture was adopted to optimize feature representation and contextual understanding, improving detection accuracy. Additionally, we integrated the Convolution and Attention Fusion Module (CAFM) to effectively combine local and global features, enhancing the model’s ability to capture diverse feature scales. Dynamic convolution (DyConv) was also employed to generate convolutional kernels conditioned on the input image, increasing model flexibility and efficiency. Through comprehensive optimization and tuning, our method achieved a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 90.5% across multiple datasets, contributing to improved weld defect detection and offering a reliable automated inspection solution for the industry.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
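As background for the mAP@0.5 figure above, the sketch below computes single-class average precision at an IoU threshold of 0.5; mAP@0.5 is the mean of this quantity over defect classes. It is a minimal illustration using greedy, confidence-ordered matching and VOC-style all-point interpolation, not the authors' evaluation code; detection toolkits differ in interpolation details, so values may not match exactly. The function names and the (x1, y1, x2, y2) box format are assumptions.

```python
# Sketch: single-class average precision (AP) at IoU >= 0.5.
# Boxes are assumed to be (x1, y1, x2, y2); all names are illustrative.


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """detections: list of (image_id, score, box); ground_truth: {image_id: [box, ...]}."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    precisions, recalls = [], []
    tp = fp = 0
    # Visit detections from most to least confident; each GT box may be matched once.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gt = {"img0": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [
    ("img0", 0.9, (0, 0, 10, 10)),    # true positive
    ("img0", 0.8, (50, 50, 60, 60)),  # false positive
    ("img0", 0.7, (21, 21, 30, 30)),  # true positive (IoU = 0.81)
]
print(round(average_precision(dets, gt), 4))  # 0.8333
```

The false positive ranked between the two true positives is what pulls AP below 1; ranking it last would give AP = 1.0.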

20 pages, 3280 KiB  
Article
Robust Miner Detection in Challenging Underground Environments: An Improved YOLOv11 Approach
by Yadong Li, Hui Yan, Dan Li and Hongdong Wang
Appl. Sci. 2024, 14(24), 11700; https://doi.org/10.3390/app142411700 - 15 Dec 2024
Viewed by 1318
Abstract
To address the issue of low detection accuracy caused by low illumination and occlusion in underground coal mines, this study proposes an innovative miner detection method. A large dataset encompassing complex environments, such as low-light conditions, partial strong light interference, and occlusion, was constructed. The Efficient Channel Attention (ECA) mechanism was integrated into the YOLOv11 model to enhance the model’s ability to focus on key features, thereby significantly improving detection accuracy. Additionally, a new weighted Complete Intersection over Union (CIoU) and adaptive confidence loss function were proposed to enhance the model’s robustness in low-light and occlusion scenarios. Experimental results demonstrate that the proposed method outperforms various improved algorithms and state-of-the-art detection models in both detection performance and robustness, providing important technical support and reference for coal miner safety assurance and intelligent mine management.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
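The abstract does not specify how the proposed weighted CIoU is weighted, so the sketch below shows only the standard Complete IoU (CIoU) loss it builds on: 1 - (IoU - ρ²/c² - αv), where ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, and v measures aspect-ratio mismatch. It is a plain-Python illustration for one box pair, assuming (x1, y1, x2, y2) boxes; the paper's weighting and its adaptive confidence loss are not reproduced.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss (1 - CIoU) for boxes given as (x1, y1, x2, y2)."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Squared distance between the two box centres (rho^2).
    dx = (pred[0] + pred[2] - target[0] - target[2]) / 2
    dy = (pred[1] + pred[3] - target[1] - target[3]) / 2
    rho2 = dx * dx + dy * dy
    # Squared diagonal of the smallest box enclosing both boxes (c^2).
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw * cw + ch * ch + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)


print(round(ciou_loss((0, 0, 2, 2), (1, 0, 3, 2)), 4))  # 0.7436
```

Unlike plain 1 - IoU, the centre-distance term keeps the loss informative for non-overlapping boxes, which is one reason CIoU variants are popular for occluded or poorly localized targets.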

19 pages, 3375 KiB  
Article
Lightweight Robust Image Classifier Using Non-Overlapping Image Compression Filters
by Mingde Wang and Zhijing Liu
Appl. Sci. 2024, 14(19), 8636; https://doi.org/10.3390/app14198636 - 25 Sep 2024
Viewed by 731
Abstract
Machine learning systems, particularly in image recognition, are susceptible to adversarial perturbations applied to input data. These perturbations, while imperceptible to humans, can easily deceive deep learning classifiers. Current defense methods for image recognition focus on diffusion models and their variants. Because diffusion models are deep and each inference pass requires a large amount of computation, they place very high demands on the device's GPU and storage. To address this problem, we propose a new defense for image recognition classifiers against adversarial attacks based on a non-overlapping image compression filter. The filter is inserted before the classifier so that the classifier's output is invariant to subtle changes in the image. This method does not weaken the adversarial robustness of the model and can reduce the computational cost of training the image classification model. In addition, it can be integrated into existing image classification training frameworks with only minor adjustments. We validate our results through a series of experiments with three convolutional neural network architectures (VGG16, ResNet34, and Inception-ResNet-v2) on two datasets (CIFAR10 and CIFAR100). The experimental results show that with the Inception-ResNet-v2 architecture, our method achieves an average accuracy of up to 81.15% on CIFAR10, demonstrating its effectiveness in mitigating adversarial attacks. In addition, with the WRN-28-10 architecture, our method achieves 91.28% standard accuracy and 76.46% average robust accuracy on CIFAR10. Measurements of model training time show that our defense has an advantage in time cost, indicating that it is a lightweight and efficient defense strategy.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
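The abstract does not describe the internals of the non-overlapping image compression filter. As a hypothetical stand-in, the sketch below replaces each non-overlapping k × k block with its mean (a crude block-wise compression, similar in spirit to block-based codecs) before the image reaches the classifier. It only illustrates how such a filter spreads out a small single-pixel change; it is not the authors' filter, and `block_average_filter` and `k` are illustrative names.

```python
# Hypothetical stand-in for a non-overlapping compression filter:
# every non-overlapping k x k block is replaced by its mean value.


def block_average_filter(image, k=2):
    """image: 2-D list of floats (one channel). Returns a filtered copy."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, k):
        for left in range(0, w, k):
            rows = range(top, min(top + k, h))   # blocks at the border may be smaller
            cols = range(left, min(left + k, w))
            mean = sum(image[y][x] for y in rows for x in cols) / (len(rows) * len(cols))
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out


clean = [[0.0, 2.0, 4.0, 4.0],
         [2.0, 0.0, 4.0, 4.0]]
perturbed = [row[:] for row in clean]
perturbed[0][0] += 0.4  # a small single-pixel "adversarial" change
f_clean = block_average_filter(clean)
f_pert = block_average_filter(perturbed)
print(f_clean)  # [[1.0, 1.0, 4.0, 4.0], [1.0, 1.0, 4.0, 4.0]]
print(max(abs(a - b) for ra, rb in zip(f_clean, f_pert) for a, b in zip(ra, rb)))  # approximately 0.1
```

Block averaging attenuates per-pixel perturbations but does not remove them: in the toy example the 0.4 change becomes a change of about 0.1 on four pixels.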

19 pages, 1263 KiB  
Article
Defense against Adversarial Attacks in Image Recognition Based on Multilayer Filters
by Mingde Wang and Zhijing Liu
Appl. Sci. 2024, 14(18), 8119; https://doi.org/10.3390/app14188119 - 10 Sep 2024
Cited by 1 | Viewed by 1360
Abstract
Security and privacy are pressing issues in building secure and efficient learning-based systems. Recent studies have shown that these systems are susceptible to subtle adversarial perturbations applied to their inputs. Although these perturbations are difficult for humans to detect, they can easily mislead deep learning classifiers. Noise injection can offer a provable defense against adversarial attacks by reducing sensitivity to subtle input changes; however, such methods suffer from computational complexity and limited adaptability. We propose a multilayer filter defense model inspired by filter-based image denoising techniques. The model inserts a filtering layer after the input layer and before the convolutional layers, and incorporates noise injection during training. This substantially enhances the resilience of image classification systems to adversarial attacks. We also investigated the impact of various filter combinations, filter area sizes, standard deviations, and filter layers on the effectiveness of the defense. The experimental results indicate that, across the MNIST, CIFAR10, and CIFAR100 datasets, the multilayer filter defense model achieves the highest average accuracy with a double-layer Gaussian filter (filter area size 3×3, standard deviation 1). Compared with two other filter-based defense models, our method attained an average accuracy of 71.9%, effectively enhancing the robustness of the image recognition classifier against adversarial attacks. The method not only performs well on small-scale datasets but also remains robust on a larger-scale dataset (miniImageNet) and with modern models (EfficientNet and WideResNet).
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
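The sketch below illustrates the configuration the abstract reports as best: two stacked 3×3 Gaussian filters with standard deviation 1 applied to the input before the convolutional layers, plus additive Gaussian noise during training. It is a plain-Python illustration on a single-channel image, assuming edge-replication padding and noise added to the input before filtering; the noise standard deviation, the padding mode, and the function names are hypothetical choices, not details taken from the paper.

```python
import math
import random


def gaussian_kernel(size=3, sigma=1.0):
    """Normalized size x size Gaussian kernel."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def filter2d(image, kernel):
    """Same-size filtering with edge replication (kernel is symmetric, so no flip is needed)."""
    h, w, half = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx] * kernel[dy + half][dx + half]
            out[y][x] = acc
    return out


def multilayer_gaussian(image, layers=2, size=3, sigma=1.0):
    """Apply the same Gaussian filter `layers` times (a double layer by default)."""
    kernel = gaussian_kernel(size, sigma)
    for _ in range(layers):
        image = filter2d(image, kernel)
    return image


def inject_noise(image, std=0.1, rng=None):
    """Training-time additive Gaussian noise (std is a hypothetical setting)."""
    rng = rng or random.Random(0)
    return [[v + rng.gauss(0.0, std) for v in row] for row in image]


impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
smoothed = multilayer_gaussian(impulse)
print(round(sum(map(sum, smoothed)), 6))  # 1.0: the normalized kernel preserves total intensity
train_input = multilayer_gaussian(inject_noise(impulse, std=0.1))
```

At inference time only `multilayer_gaussian` would be applied; the noise is a training-time augmentation in this sketch.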

18 pages, 5456 KiB  
Article
Research on X-ray Diagnosis Model of Musculoskeletal Diseases Based on Deep Learning
by Ganglong Duan, Shaoyang Zhang, Yanying Shang and Weiwei Kong
Appl. Sci. 2024, 14(8), 3451; https://doi.org/10.3390/app14083451 - 19 Apr 2024
Cited by 2 | Viewed by 1511
Abstract
Musculoskeletal diseases affect over 100 million people globally and are a leading cause of severe, prolonged pain and disability. Because musculoskeletal disorders are recognized as clinical emergencies, prompt and accurate diagnosis is crucial: delayed identification puts patients at risk of amputation and, in severe cases, can result in life-threatening conditions such as bone cancer. In this paper, a hybrid model, HRD (Human-Resnet50-Densenet121), which combines deep learning with human participation, is proposed to efficiently identify disease features by classifying X-ray images. The feasibility of the model was tested on the MURA dataset and evaluated with accuracy, recall, F1-score, the ROC curve, Cohen’s kappa, and AUC. Experimental results indicate that the accuracy of the hybrid model, built through a combination strategy, exceeded that of every individual model by more than 4%. The model achieved a peak accuracy of 88.81%, a maximum recall of 94%, and a highest F1-score of 87%, all surpassing those of any single model. The hybrid model demonstrates excellent generalization performance and classification accuracy.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
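The abstract does not state how the ResNet50 and DenseNet121 outputs are combined or how human participation enters the HRD model. The sketch below assumes a simple weighted soft vote over positive-class probabilities and then computes several of the reported metrics (accuracy, precision, recall, F1-score, and Cohen's kappa) for binary labels. The combination rule, the 0.5 decision threshold, and all names are hypothetical.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two models' positive-class probabilities (hypothetical rule)."""
    return [weight_a * pa + (1 - weight_a) * pb for pa, pb in zip(probs_a, probs_b)]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Expected chance agreement from the marginal frequencies of true and predicted labels.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


probs = soft_vote([0.9, 0.2, 0.6, 0.4], [0.7, 0.4, 0.2, 0.8])
y_pred = [1 if p >= 0.5 else 0 for p in probs]
print(y_pred)  # [1, 0, 0, 1]

metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics["kappa"])  # 0.5
```

Kappa is lower than accuracy here (0.5 vs 0.75) because half of the agreement on this balanced toy set would be expected by chance alone.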

19 pages, 3357 KiB  
Article
Integrating Sigmoid Calibration Function into Entropy Thresholding Segmentation for Enhanced Recognition of Potholes Imaged Using a UAV Multispectral Sensor
by Sandisiwe Nomqupu, Athule Sali, Adolph Nyamugama and Naledzani Ndou
Appl. Sci. 2024, 14(7), 2670; https://doi.org/10.3390/app14072670 - 22 Mar 2024
Cited by 1 | Viewed by 1339
Abstract
This study aimed to enhance pothole detection by combining a sigmoid calibration function with entropy thresholding segmentation on UAV multispectral imagery. Imagery was acquired by flying the DJI Matrice 600 (M600) UAV system with the MicaSense RedEdge imaging sensor mounted on its fixed wing. An endmember spectral pixel denoting a pothole feature was selected and used as the base from which the spectral radiance patterns of a pothole were analyzed. A field survey was carried out to measure pothole diameters, from which pothole areas were determined. Entropy thresholding segmentation was employed to classify potholes, and the sigmoid calibration function was used to reconfigure the spectral radiance properties of the UAV spectral bands to pothole features. Descriptive statistics were computed to determine the radiance threshold values used to demarcate potholes in the reconfigured (calibrated) spectral bands. The performance of the sigmoid calibration function was evaluated by analyzing area under the curve (AUC) results generated using the Relative Operating Characteristic (ROC) technique. Spectral radiance pattern analysis of the pothole surface revealed high radiance values in the red channel and low radiance values in the near-infrared (NIR) channel. The sigmoid calibration function radiometrically reconfigured the UAV spectral bands based on a total of 500 sampled pothole-surface pixels obtained from all the spectral channels. After calibration, the reconfigured mean radiance values for the pothole surface were 0.868, 0.886, 0.944, 0.211 and 0.863 for the blue, green, red, NIR and red edge channels, respectively. The AUC results revealed r² values of 0.53, 0.35, 0.71, 0.19 and 0.35 for the blue, green, red, NIR and red edge spectral channels, respectively.
Overestimation of pothole 1 by both the original and calibrated spectral channels was noted and can be attributed to soil adjacent to the pothole. However, the calibrated red channel estimated potholes 2 and 3 accurately, with only a slight area deviation from the measured potholes. The results of this study emphasize the significance of reconfiguring the radiometric properties of UAV imagery for improved recognition of potholes.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
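The abstract does not give the exact form of the sigmoid calibration function or which entropy thresholding variant was used. The sketch below assumes a logistic function centred on a reference radiance (for example, the mean of sampled pothole pixels) and uses Kapur's maximum-entropy method as one common form of entropy thresholding. Values are assumed to lie in [0, 1]; `center`, `slope`, and `bins` are hypothetical parameters.

```python
import math


def sigmoid_calibrate(values, center, slope=10.0):
    """Logistic rescaling of radiance values; center and slope are assumed parameters."""
    return [1.0 / (1.0 + math.exp(-slope * (v - center))) for v in values]


def kapur_threshold(values, bins=64):
    """Kapur's maximum-entropy threshold for values in [0, 1]; returns a cut-off value."""
    hist = [0] * bins
    for v in values:
        hist[min(max(int(v * bins), 0), bins - 1)] += 1
    p = [h / len(values) for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(bins - 1):
        pb, pf = sum(p[: t + 1]), sum(p[t + 1:])
        if pb == 0 or pf == 0:
            continue
        # Sum of the entropies of the normalized lower and upper class histograms.
        h = (-sum(q / pb * math.log(q / pb) for q in p[: t + 1] if q > 0)
             - sum(q / pf * math.log(q / pf) for q in p[t + 1:] if q > 0))
        if h > best_h:
            best_t, best_h = t, h
    return (best_t + 1) / bins  # values >= this cut-off fall in the upper class


# Hypothetical single-band values: background around 0.1-0.25, potholes around 0.75-0.9.
raw = [0.10, 0.15, 0.20, 0.25] * 10 + [0.75, 0.80, 0.85, 0.90] * 10
thr = kapur_threshold(raw)
mask = [v >= thr for v in raw]
calibrated = sigmoid_calibrate(raw, center=0.5)
print(thr)  # 0.265625
```

In a real pipeline one would calibrate a band first and then threshold the calibrated values; the toy example thresholds raw values directly so the numbers are easy to check.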

14 pages, 4292 KiB  
Article
LM-DeeplabV3+: A Lightweight Image Segmentation Algorithm Based on Multi-Scale Feature Interaction
by Xinyu Hou, Peng Chen and Haishuo Gu
Appl. Sci. 2024, 14(4), 1558; https://doi.org/10.3390/app14041558 - 15 Feb 2024
Cited by 3 | Viewed by 2587
Abstract
Street-view images can help us better understand the urban environment and its underlying characteristics. With the development of computer vision and deep learning, semantic segmentation algorithms have become increasingly mature. However, DeeplabV3+, which is commonly used for semantic segmentation, has shortcomings such as a large number of parameters, high computing-resource requirements, and a tendency to lose detailed information. This paper therefore proposes LM-DeeplabV3+, which aims to greatly reduce the parameters and computation of the model while maintaining segmentation accuracy. First, the lightweight network MobileNetV2 is selected as the backbone, and the ECA attention mechanism is introduced after MobileNetV2 extracts shallow features to improve feature representation. Second, the ASPP module is improved, and on this basis the EPSA attention mechanism is introduced to achieve cross-dimensional channel attention and interaction between important features. Third, a loss function named CL loss is designed to balance training across multiple categories and better reflect segmentation quality. Experiments on the Cityscapes dataset show that the mIoU reached 74.9%, an improvement of 3.56% over DeeplabV3+, and the mPA reached 83.01%, an improvement of 2.53% over DeeplabV3+.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
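mIoU and mPA, the two metrics reported for LM-DeeplabV3+, can both be derived from a pixel-level confusion matrix. The sketch below shows one common convention (per-class IoU = TP / (TP + FP + FN), per-class pixel accuracy = TP / (TP + FN), both averaged over classes present in the ground truth); benchmark toolkits may treat absent or ignored classes differently. Labels are assumed to be flattened integer class maps, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i][j] counts pixels of true class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_mpa(cm):
    """Mean IoU and mean per-class pixel accuracy; classes absent from the ground truth are skipped."""
    ious, accs = [], []
    for c in range(len(cm)):
        tp = cm[c][c]
        gt = sum(cm[c])                   # pixels whose true label is c (TP + FN)
        pred = sum(row[c] for row in cm)  # pixels predicted as c (TP + FP)
        if gt == 0:
            continue
        ious.append(tp / (gt + pred - tp))
        accs.append(tp / gt)
    return sum(ious) / len(ious), sum(accs) / len(accs)


y_true = [0, 0, 1, 1, 2, 2]  # flattened toy label map
y_pred = [0, 1, 1, 1, 2, 0]
miou, mpa = miou_mpa(confusion_matrix(y_true, y_pred, 3))
print(round(miou, 4), round(mpa, 4))  # 0.5 0.6667
```

Accumulating a single confusion matrix over the whole validation set, rather than averaging per-image scores, is the usual practice for Cityscapes-style evaluation.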

20 pages, 2965 KiB  
Article
A Novel Road Crack Detection Technology Based on Deep Dictionary Learning and Encoding Networks
by Li Fan and Jiancheng Zou
Appl. Sci. 2023, 13(22), 12299; https://doi.org/10.3390/app132212299 - 14 Nov 2023
Cited by 2 | Viewed by 1802
Abstract
Road crack detection is an important part of road inspection and has considerable practical value. With the rapid development of science and technology, especially computer science, many methods have been applied to crack detection. Traditional detection relies on manual identification, which is inefficient and error-prone. In addition, commonly used image processing methods are affected by many factors, such as illumination and road stains, so their results are unstable. Many studies of pavement crack detection focus mainly on the recognition and classification of cracks and lack analysis of their specific characteristics, so crack feature values cannot be measured. Building on deep learning, this paper proposes a road crack detection technology that relies on a new deep dictionary learning and encoding network, DDLCN, establishes a new activation function, MeLU, and adopts a new differentiable calculation method. The technology is implemented as an improvement of the traditional Mask-RCNN algorithm. In comparisons of evaluation indicators, the recall, precision, and F1-score values show a degree of superiority. Experiments show that the proposed method has good implementability and performance in road crack detection and crack feature measurement.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
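The abstract gives no details of the DDLCN encoder. As a generic illustration of what dictionary-based encoding means, the sketch below computes a sparse code with ISTA (iterative shrinkage-thresholding), the classical algorithm that many learned dictionary-encoding networks unroll into layers. This is not the paper's DDLCN or its MeLU activation; the dictionary, `lam`, and the iteration count are toy assumptions.

```python
import math


def soft_threshold(vec, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage toward zero)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in vec]


def ista_encode(D, x, lam=0.1, n_iter=200):
    """Sparse code z minimizing 0.5 * ||x - D z||^2 + lam * ||z||_1 (D: m rows, k columns)."""
    m, k = len(D), len(D[0])
    # 1 / ||D||_F^2 is a safe step size because ||D||_F^2 >= ||D||_2^2.
    step = 1.0 / sum(d * d for row in D for d in row)
    z = [0.0] * k
    for _ in range(n_iter):
        residual = [x[i] - sum(D[i][j] * z[j] for j in range(k)) for i in range(m)]
        # Gradient step on the reconstruction term, then shrinkage for the L1 term.
        grad_step = [z[j] + step * sum(D[i][j] * residual[i] for i in range(m)) for j in range(k)]
        z = soft_threshold(grad_step, lam * step)
    return z


D = [[1.0, 0.0], [0.0, 1.0]]  # toy dictionary (identity)
print([round(v, 4) for v in ista_encode(D, [3.0, 0.05])])  # [2.9, 0.0]
```

With an identity dictionary the exact solution is simply soft-thresholding of x by `lam`, which makes the result easy to check: the small 0.05 coefficient is driven to zero.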

17 pages, 5136 KiB  
Article
A Novel Hybrid Approach for a Content-Based Image Retrieval Using Feature Fusion
by Shahbaz Sikandar, Rabbia Mahum and AbdulMalik Alsalman
Appl. Sci. 2023, 13(7), 4581; https://doi.org/10.3390/app13074581 - 4 Apr 2023
Cited by 19 | Viewed by 5389
Abstract
Retrieving images similar to a user's query from the multimedia content generated by devices and image processing techniques incurs high computation costs. Traditional annotation-based image retrieval is not coherent, because pixel-wise matching of images brings significant variations in terms of pattern, storage, and angle. Content-Based Image Retrieval (CBIR) is more commonly used in these cases. CBIR efficiently quantifies the similarity between database images and the query image: it extracts useful features from the query image, compares them with the features of the database images, and retrieves the images with similar features from a large database. In this study, we introduce a novel hybrid deep learning and machine learning-based CBIR system that uses transfer learning and is implemented with two pre-trained deep learning models, ResNet50 and VGG16, and one machine learning model, KNN. We use transfer learning to extract features from the images with these two deep learning (DL) models. Image similarity is calculated using the machine learning (ML) model KNN with Euclidean distance. We built a web interface to display the similar images, and precision, used as the performance measure, reached 100%. Our proposed system outperforms other CBIR systems and can be used in many applications that need CBIR, such as digital libraries, historical research, fingerprint identification, and crime prevention.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
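The retrieval step described above (features from pre-trained ResNet50 and VGG16, then KNN with Euclidean distance) can be sketched as follows. The abstract does not say how the two feature vectors are fused, so this sketch assumes each is L2-normalized and the two are concatenated; the toy two-dimensional vectors stand in for real CNN embeddings, and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(feat_a, feat_b):
    """Concatenate two L2-normalized descriptors (e.g. from two CNN backbones)."""
    return l2_normalize(feat_a) + l2_normalize(feat_b)


def knn_retrieve(query, database, k=3):
    """Return ids of the k database entries closest to `query` in Euclidean distance."""
    ranked = sorted((math.dist(query, feat), img_id) for img_id, feat in database)
    return [img_id for _, img_id in ranked[:k]]


# Toy 2-D "features" standing in for ResNet50 / VGG16 embeddings.
database = [
    ("a", fuse_features([1.0, 0.0], [0.0, 1.0])),
    ("b", fuse_features([0.0, 1.0], [1.0, 0.0])),
    ("c", fuse_features([2.0, 0.0], [0.0, 3.0])),
]
query = fuse_features([1.0, 0.1], [0.0, 1.0])
print(knn_retrieve(query, database, k=2))  # ['a', 'c']
```

Normalizing each descriptor before concatenation keeps one backbone's larger feature magnitudes from dominating the distance; "a" and "c" tie here because their normalized features are identical, and the tie is broken by id.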

Review


23 pages, 1377 KiB  
Review
A Survey on Surface Defect Inspection Based on Generative Models in Manufacturing
by Yu He, Shuai Li, Xin Wen and Jing Xu
Appl. Sci. 2024, 14(15), 6774; https://doi.org/10.3390/app14156774 - 2 Aug 2024
Viewed by 1458
Abstract
Surface defect inspection based on deep learning has demonstrated outstanding performance in improving detection accuracy and model generalization. However, the small scale of defect datasets typically limits the application of deep models in industry. Generative models can produce realistic samples at very low cost, which can effectively alleviate this problem, and they have therefore received widespread attention in recent years. This paper provides a comprehensive analysis and summary of surface defect inspection methods proposed between 2022 and 2024. First, according to the generative models used, these methods are classified into four categories: Variational Auto-Encoders (VAEs), Generative Adversarial Networks (GANs), Diffusion Models (DMs), and multi-models. Second, the recent research status of surface defect inspection based on generative models is discussed from four aspects: sample generation, detection objective, inspection task, and learning model. Then, the public datasets and evaluation metrics commonly used for surface defect inspection are discussed, and a comparative evaluation of defect inspection methods based on generative models is provided. Finally, this study discusses the existing challenges for defect inspection methods based on generative models, providing insights for future research.
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)
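Besides sample generation, one common way generative models are used for surface defect inspection is reconstruction-based detection: a VAE, GAN, or diffusion model trained on defect-free images reconstructs the input, and large residuals flag likely defects. The sketch below shows only the residual-and-threshold step, assuming the reconstruction is already available; the threshold, the max-residual image score, and the function names are illustrative choices, not taken from the survey.

```python
def residual_map(image, reconstruction):
    """Per-pixel absolute difference between an image and its reconstruction."""
    return [[abs(a - b) for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reconstruction)]


def defect_mask_and_score(image, reconstruction, threshold):
    """Return a binary defect mask and an image-level anomaly score (maximum residual)."""
    res = residual_map(image, reconstruction)
    mask = [[v > threshold for v in row] for row in res]
    score = max(max(row) for row in res)
    return mask, score


image = [[0.5, 0.5], [0.5, 0.9]]           # bright spot stands in for a defect
reconstruction = [[0.5, 0.5], [0.5, 0.5]]  # what a model trained on defect-free data might output
mask, score = defect_mask_and_score(image, reconstruction, threshold=0.2)
print(mask)  # [[False, False], [False, True]]
```

In practice the threshold is usually chosen on a validation set of defect-free images, for example from a high percentile of their residuals.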
