Application of Machine Learning in Image Processing and Computer Vision

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (25 March 2024) | Viewed by 29434

Special Issue Editors


Dr. Vladimir V. Arlazarov
Guest Editor
Federal Research Center Computer Science and Control of the Russian Academy of Sciences, 119333 Moscow, Russia
Interests: pattern recognition; image processing; computer vision

Dr. Konstantin Bulatov
Guest Editor
Federal Research Center Computer Science and Control of the Russian Academy of Sciences, 119333 Moscow, Russia
Interests: pattern recognition; image processing; computer vision

Special Issue Information

Dear Colleagues,

Image processing and computer vision are immensely broad fields that continue to drive innovation across human civilization, with application areas including fundamental research in astrophysics, materials science, and biology; industrial production and agriculture; medical diagnostics; autonomous transport; social services automation; security, personal identification, biometrics, and fraud prevention; and countless more. The use of machine learning, ranging from shallow models to large-scale deep learning models, has fueled breakthroughs in image processing and computer vision tasks that were thought to be unfeasible, or even impossible, only a few decades ago, and researchers and engineers continue to find new ways of applying machine learning approaches, first in prototypes and proofs of concept, and then in real working systems.

Now that the ‘age of prototyping’ can be considered long gone, there is growing demand, from both academic and engineering standpoints, for a more in-depth analysis and understanding of the properties of machine learning approaches and their impact. This Special Issue is being launched to collect research articles and in-depth reports on the application of various machine learning approaches to real technical problems in the broad field of image processing and computer vision, and to discuss the field’s emergent problems from both a purely mathematical and an engineering perspective. Example topics include convergence and stability; interpretability of results; efficient computational models and their impact on solutions ‘in silico’; the privacy risks, ethical issues, and dangers presented by the use of deep learning models for critical tasks such as personal identification, biometrics, and industrial safety; and the availability of datasets and methodologies both for training models and for the objective comparison of competing methods.

Dr. Vladimir V. Arlazarov
Dr. Konstantin Bulatov
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • machine learning
  • deep learning
  • explainable AI
  • ethical AI
  • open datasets
  • computational models

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research

22 pages, 950 KiB  
Article
4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs
by Anton Trusov, Elena Limonova, Dmitry Nikolaev and Vladimir V. Arlazarov
Mathematics 2024, 12(5), 651; https://doi.org/10.3390/math12050651 - 23 Feb 2024
Viewed by 2067
Abstract
Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks demonstrate quality similar to full-precision models and perfectly fit hardware architectures with one-byte coefficients and thirty-two-bit dot product accumulators. Lower-precision quantizations usually suffer from noticeable quality loss and require specific computational algorithms to outperform eight-bit quantization. In this paper, we propose a novel 4.6-bit quantization scheme that allows for more efficient use of CPU resources. This scheme has more quantization bins than four-bit quantization and is more accurate while preserving the computational efficiency of the latter (it runs only 4% slower). Our multiplication uses a combination of 16- and 32-bit accumulators and avoids the multiplication depth limitation of the previous 4-bit multiplication algorithm. Experiments with different convolutional neural networks on the CIFAR-10 and ImageNet datasets show that 4.6-bit quantized networks are 1.5–1.6 times faster than eight-bit networks on an ARMv8 CPU. Regarding quality, the results of a 4.6-bit quantized network are close to the mean of the four-bit and eight-bit networks of the same architecture. Therefore, 4.6-bit quantization may serve as an intermediate solution between fast but inaccurate low-bit network quantizations and accurate but relatively slow eight-bit ones.
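For readers unfamiliar with low-bit inference, the following minimal NumPy sketch illustrates the general idea behind such schemes: symmetric linear quantization with a configurable number of bins, followed by an integer dot product accumulated at a wider bit width. It is an illustrative toy, not the authors' 4.6-bit scheme, and all names and values are hypothetical.

    import numpy as np

    def quantize_symmetric(x, n_bins):
        # Map floats onto signed integer levels in [-n_bins // 2, n_bins // 2].
        levels = n_bins // 2
        scale = np.abs(x).max() / levels
        q = np.clip(np.round(x / scale), -levels, levels).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal(256).astype(np.float32)  # weights
    a = rng.standard_normal(256).astype(np.float32)  # activations
    qw, sw = quantize_symmetric(w, 16)               # 16 bins, 4-bit style
    qa, sa = quantize_symmetric(a, 16)

    # Integer dot product accumulated in 32 bits, then rescaled back to float.
    acc = np.dot(qw.astype(np.int32), qa.astype(np.int32))
    print(acc * sw * sa, np.dot(w, a))               # quantized vs. full precision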

18 pages, 10242 KiB  
Article
Gaussian Process-Based Transfer Kernel Learning for Unsupervised Domain Adaptation
by Pengfei Ge and Yesen Sun
Mathematics 2023, 11(22), 4695; https://doi.org/10.3390/math11224695 - 19 Nov 2023
Cited by 3 | Viewed by 1405
Abstract
The discriminability and transferability of models are two important factors in the success of domain adaptation methods. Recently, some domain adaptation methods have improved models by adding a discriminant information extraction module. However, these methods need to carefully balance the discriminability and transferability of a model. To address this problem, we propose a new deep domain adaptation method, Gaussian Process-based Transfer Kernel Learning (GPTKL), which can perform domain knowledge transfer and improve the discrimination ability of the model simultaneously. GPTKL uses the kernel similarity between all samples in the source and target domains as prior information to establish a cross-domain Gaussian process. By maximizing its likelihood function, GPTKL reduces the domain discrepancy between the source and target domains, thereby enhancing generalization across domains. At the same time, GPTKL introduces the deep kernel learning strategy into the cross-domain Gaussian process to learn a transfer kernel function based on deep features. Through transfer kernel learning, GPTKL learns a deep feature space with both discriminability and transferability. In addition, GPTKL uses cross-entropy and mutual information to learn a classification model shared by the source and target domains. Experiments on four benchmarks show that GPTKL achieves superior classification performance over state-of-the-art methods.
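As background, a Gaussian process is fit by maximizing a marginal likelihood of the form sketched below. This toy NumPy version uses a plain RBF kernel on feature vectors and random stand-in data, whereas GPTKL learns a transfer kernel over deep features across domains, so treat every name and value here as illustrative.

    import numpy as np

    def rbf_kernel(X, lengthscale=1.0):
        # Pairwise squared distances turned into a Gaussian (RBF) kernel matrix.
        sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
        return np.exp(-0.5 * sq / lengthscale**2)

    def gp_log_marginal_likelihood(X, y, noise=1e-2):
        # log p(y | X) for a zero-mean GP with RBF kernel plus observation noise.
        K = rbf_kernel(X) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
                - 0.5 * len(X) * np.log(2 * np.pi))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 8))  # stand-in for deep features of source + target samples
    y = rng.standard_normal(50)       # stand-in targets
    print(gp_log_marginal_likelihood(X, y))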

15 pages, 5107 KiB  
Article
PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection
by Zheng Zhang, Ruyu Xu and Qing Tian
Mathematics 2023, 11(20), 4277; https://doi.org/10.3390/math11204277 - 13 Oct 2023
Cited by 2 | Viewed by 1621
Abstract
In driverless systems (scenarios such as subways, buses, trucks, etc.), multi-modal data fusion, such as that of light detection and ranging (LiDAR) points and camera images, is essential for accurate 3D object detection. In the fusion process, information interaction between the modalities is challenging due to the different coordinate systems of the various sensors and the significant difference in the density of the collected data. It is necessary to fully consider the consistency and complementarity of multi-modal information, bridge the gap between multi-source data densities, and achieve joint interactive processing of multi-source information. Therefore, this paper proposes PIDFusion, a new Transformer-based multi-modal fusion model for 3D object detection. Firstly, the method uses the results of 2D instance segmentation to generate dense 3D virtual points that enhance the original sparse 3D point clouds. This addresses the issue that points nearest in Euclidean distance in the 2D image space are not necessarily nearest in 3D space. Secondly, a new cross-modal fusion architecture is designed to maintain individual per-modality features and exploit their unique characteristics during 3D object detection. Finally, an instance-level fusion module is proposed to enhance semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion outperforms existing 3D object detection methods, especially for small and long-range objects, with 70.8 mAP and 73.5 NDS on the nuScenes test set.
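Associating LiDAR points (or virtual points) with image pixels relies on projecting 3D points through the camera calibration. The short NumPy sketch below shows this standard pinhole projection with hypothetical intrinsics and extrinsics; it is not PIDFusion's actual pipeline.

    import numpy as np

    def project_points(points_xyz, K, T_cam_from_lidar):
        # Move LiDAR points into the camera frame, then project with intrinsics K.
        pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]   # homogeneous (N, 4)
        cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
        in_front = cam[:, 2] > 0                              # keep points ahead of the camera
        uvw = (K @ cam[in_front].T).T
        return uvw[:, :2] / uvw[:, 2:3], in_front             # pixel coordinates

    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])                           # hypothetical camera intrinsics
    T = np.eye(4)                                             # hypothetical extrinsic calibration
    pts = np.random.default_rng(0).uniform(-10.0, 10.0, (100, 3))
    uv, mask = project_points(pts, K, T)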

17 pages, 2151 KiB  
Article
Alzheimer’s Disease Prediction Using Deep Feature Extraction and Optimization
by Farah Mohammad and Saad Al Ahmadi
Mathematics 2023, 11(17), 3712; https://doi.org/10.3390/math11173712 - 29 Aug 2023
Cited by 3 | Viewed by 2436
Abstract
Alzheimer’s disease (AD) is a prevalent neurodegenerative disorder that affects a substantial proportion of the population. The accurate and timely prediction of AD is of considerable importance for enhancing the diagnostic process and improving treatment. This study provides a thorough examination of AD prediction using the VGG19 deep learning model. The primary objective is to investigate the effectiveness of feature fusion and optimization techniques in enhancing classification accuracy. A comprehensive feature map is generated by fusing features extracted from the fc7 and fc8 layers of VGG19. Several machine learning algorithms are employed to classify the integrated features and recognize AD. The fused feature map achieves a high accuracy of 98% in predicting AD, outperforming existing cutting-edge methodologies. The study also employs the whale optimization algorithm (WOA), a metaheuristic approach, for feature selection; this optimization aims to eliminate redundant features and enhance the discriminatory power of the selected features. Following the optimization procedure, the F-KNN algorithm attained a precision of 99%, surpassing the state-of-the-art (SOTA) results reported in the current literature.
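The fc7/fc8 feature fusion can be reproduced schematically with forward hooks on torchvision's VGG19, where fc7 and fc8 correspond to the second and third fully connected layers (classifier[3] and classifier[6]). The indices and the plain concatenation below are assumptions about the setup, not the paper's exact code.

    import torch
    from torchvision import models

    model = models.vgg19(weights=None).eval()  # weights=None skips the download; use pretrained weights in practice
    feats = {}
    # In torchvision's layout, classifier[3] and classifier[6] play the roles of fc7 and fc8.
    model.classifier[3].register_forward_hook(lambda m, i, o: feats.__setitem__("fc7", o))
    model.classifier[6].register_forward_hook(lambda m, i, o: feats.__setitem__("fc8", o))

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))     # stand-in for a preprocessed brain image

    fused = torch.cat([feats["fc7"], feats["fc8"]], dim=1)  # (1, 4096 + 1000) fused feature map
    print(fused.shape)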

28 pages, 1784 KiB  
Article
Real-Time Detection of Unrecognized Objects in Logistics Warehouses Using Semantic Segmentation
by Serban Vasile Carata, Marian Ghenescu and Roxana Mihaescu
Mathematics 2023, 11(11), 2445; https://doi.org/10.3390/math11112445 - 25 May 2023
Cited by 1 | Viewed by 2072
Abstract
Pallet detection and tracking using computer vision is challenging due to the complexity of the object and its contents, lighting conditions, background clutter, and occlusions in industrial areas. This paper aims to detect pallets in a logistics warehouse using semantic segmentation. The proposed method examines changes in the segmentation from one frame to the next, taking into account the position and stationary behavior of newly introduced objects in the scene. The results indicate that the proposed method can detect pallets despite the complexity of the object and its contents. This demonstrates the utility of semantic segmentation for detecting unrecognized objects in real-world scenarios where a precise definition of the class cannot be given.
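The frame-to-frame comparison can be pictured as follows: pixels whose semantic label differs from a reference (empty-scene) mask accumulate a persistence count, and long-lived, sufficiently large regions are reported as newly introduced stationary objects. The thresholds and logic in this NumPy sketch are illustrative, not the paper's.

    import numpy as np

    def update_persistence(ref_mask, curr_mask, persistence):
        # Pixels whose label differs from the reference scene are candidate new objects;
        # count how many consecutive frames each one persists.
        differs = curr_mask != ref_mask
        return np.where(differs, persistence + 1, 0)

    def stationary_object_mask(persistence, min_frames=30, min_pixels=500):
        # Report a region once it has stayed in place long enough and is large enough.
        mask = persistence >= min_frames
        return mask if mask.sum() >= min_pixels else np.zeros_like(mask)

    # Per-frame usage: persistence = update_persistence(ref, seg, persistence)
    #                  report = stationary_object_mask(persistence)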

17 pages, 2597 KiB  
Article
Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training
by Artem Sher, Anton Trusov, Elena Limonova, Dmitry Nikolaev and Vladimir V. Arlazarov
Mathematics 2023, 11(9), 2112; https://doi.org/10.3390/math11092112 - 29 Apr 2023
Cited by 7 | Viewed by 1992
Abstract
Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks while working several times faster. However, networks with lower quantization levels demonstrate inferior accuracy in comparison to their classical analogs. To solve this issue, a number of quantization-aware training (QAT) approaches have been proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for quantizations of two- to eight-bit width and eliminates significant accuracy drops during training, which results in better accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches for four to eight bits and outperforms some of them for two to three bits while being easier to implement. For example, the proposed approach to three-bit quantization on the CIFAR-10 dataset results in 73.2% accuracy, while the direct and layer-by-layer baselines result in 71.4% and 67.2% accuracy, respectively. The results for two-bit quantization of ResNet18 on the ImageNet dataset are 63.69% for our approach and 61.55% for the direct baseline.
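The straight-through estimator at the core of such QAT schemes is easy to state in code: round during the forward pass, but pretend rounding is the identity when back-propagating. The PyTorch sketch below shows a generic linear fake-quantizer of this kind; the bit width and scaling are illustrative, and the neuron-by-neuron schedule itself is not shown.

    import torch

    class RoundSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)          # quantize in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output             # pass gradients straight through

    def fake_quantize(w, bits=3):
        # Uniform fake quantization applied to weights during training.
        levels = 2 ** (bits - 1) - 1
        scale = w.abs().max() / levels
        return RoundSTE.apply(w / scale).clamp(-levels, levels) * scale

    w = torch.randn(8, requires_grad=True)
    fake_quantize(w, bits=3).pow(2).sum().backward()
    print(w.grad is not None)              # True: gradients reach the float weights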

15 pages, 4919 KiB  
Article
Efficient and Low Color Information Dependency Skin Segmentation Model
by Hojoon You, Kunyoung Lee, Jaemu Oh and Eui Chul Lee
Mathematics 2023, 11(9), 2057; https://doi.org/10.3390/math11092057 - 26 Apr 2023
Cited by 1 | Viewed by 2072
Abstract
Skin segmentation involves segmenting the human skin region in an image. It is a preprocessing technique used in many applications, such as face detection, hand gesture recognition, and remote biosignal measurement. As the performance of skin segmentation directly affects the performance of these applications, precise skin segmentation methods have been studied. However, previous skin segmentation methods are unsuitable for real-world environments because they rely heavily on color information. In addition, deep-learning-based skin segmentation methods incur high computational costs, even though skin segmentation is mainly used for preprocessing. This study proposes a lightweight skin segmentation model with high performance. Additionally, we used data augmentation techniques that modify the hue, saturation, and value, allowing the model to better learn texture and contextual information without relying on color information. Our proposed model requires 1.09 M parameters and 5.04 giga multiply-accumulate operations. Through experiments, we demonstrated that the proposed model shows high performance, with an F-score of 0.9492, and consistent performance even on modified images. Furthermore, it achieves a fast processing speed of approximately 68 fps on 3 × 512 × 512 images with an NVIDIA RTX 2080 Ti GPU (11 GB VRAM).
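The hue/saturation/value augmentation can be approximated with torchvision's ColorJitter, as in the sketch below; the jitter ranges are hypothetical, not the values used in the paper.

    import torch
    from torchvision import transforms

    # Randomize color so the model must rely on texture and context rather than skin color.
    augment = transforms.Compose([
        transforms.ColorJitter(brightness=0.4, saturation=0.6, hue=0.3),
        transforms.RandomHorizontalFlip(),
    ])

    image = torch.rand(3, 512, 512)        # stand-in for a 3 x 512 x 512 input
    augmented = augment(image)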

21 pages, 13135 KiB  
Article
Multi-Task Learning Approach Using Dynamic Hyperparameter for Multi-Exposure Fusion
by Chan-Gi Im, Dong-Min Son, Hyuk-Ju Kwon and Sung-Hak Lee
Mathematics 2023, 11(7), 1620; https://doi.org/10.3390/math11071620 - 27 Mar 2023
Viewed by 1644
Abstract
High-dynamic-range (HDR) image synthesis is a technology developed to accurately reproduce the actual scene of an image on a display by extending the dynamic range of the image. Multi-exposure fusion (MEF) technology, which synthesizes multiple low-dynamic-range (LDR) images to create an HDR image, has been developed in various ways, including pixel-based, patch-based, and deep-learning-based methods. Recently, methods to improve the synthesis quality of images using deep-learning-based algorithms have mainly been studied in the field of MEF. Despite the various advantages of deep learning, deep-learning-based methods have the problem that numerous multi-exposed and ground-truth images are required for training. In this study, we propose a self-supervised learning method that generates and learns reference images based on input images during the training process. In addition, we propose a method to train a deep learning model for MEF with multiple tasks using dynamic hyperparameters on the loss functions. This enables effective network optimization across multiple tasks and high-quality image synthesis while preserving a simple network architecture. Applied to the deep learning model, our learning method shows superior synthesis results compared to other existing deep-learning-based image synthesis algorithms.
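One way to picture ‘dynamic hyperparameters on the loss functions’ is a per-step re-weighting of the task losses, for instance a softmax over current loss magnitudes so that lagging tasks receive more weight. The PyTorch sketch below is one such generic scheme, offered only as an illustration, not the paper's actual rule.

    import torch

    def update_weights(losses, temperature=1.0):
        # Re-weight tasks each step: larger current losses get larger weights.
        l = torch.tensor([loss.item() for loss in losses])
        return torch.softmax(l / temperature, dim=0).tolist()

    def combined_loss(losses, weights):
        # Weighted sum of the per-task losses (e.g., structure, color, detail terms).
        return sum(w * l for w, l in zip(weights, losses))

    losses = [torch.tensor(0.8), torch.tensor(0.2), torch.tensor(0.5)]
    total = combined_loss(losses, update_weights(losses))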

18 pages, 3183 KiB  
Article
Automated Fire Extinguishing System Using a Deep Learning Based Framework
by Senthil Kumar Jagatheesaperumal, Khan Muhammad, Abdul Khader Jilani Saudagar and Joel J. P. C. Rodrigues
Mathematics 2023, 11(3), 608; https://doi.org/10.3390/math11030608 - 26 Jan 2023
Cited by 6 | Viewed by 4273
Abstract
Fire accidents occur in every part of the world and cause a large number of casualties because of the risks involved in manually extinguishing fires. In most cases, humans cannot detect and extinguish fires manually. Fire-extinguishing robots with sophisticated functionalities are being rapidly developed, and most of these systems use fire sensors and detectors. However, they lack mechanisms for the early detection of fire, which can lead to casualties. To detect and prevent such fire accidents in their early stages, a deep-learning-based automatic fire-extinguishing mechanism is introduced in this work. Fire detection and the detection of human presence in fire locations were carried out using convolutional neural networks (CNNs) configured to operate on the chosen fire dataset. For fire detection, a custom learning network was formed by tweaking the layer parameters of a CNN to detect fires with better accuracy. For human detection, the AlexNet architecture was employed to detect the presence of humans in the fire accident zone. We experimented with and analyzed the proposed model using various optimizers, activation functions, and learning rates, based on the accuracy and loss metrics generated for the chosen fire dataset. The best combination of neural network parameters was obtained from the model configured with an Adam optimizer and softmax activation, driven with a learning rate of 0.001, which provided better accuracy for the learning model. Finally, the experiments were tested on a mobile robotic system configured in automatic and wireless control modes. In automatic mode, the robot patrolled, monitoring for fire casualties and fire accidents, and automatically extinguished fires using the learned features triggered through the developed model.
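The reported best configuration (Adam optimizer, softmax output, learning rate 0.001) can be wired up as in the minimal PyTorch sketch below. The layer sizes are illustrative rather than the paper's custom network, and softmax is folded into the cross-entropy loss, as is idiomatic in PyTorch.

    import torch
    from torch import nn

    # A small fire / no-fire classifier with illustrative layer sizes.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 56 * 56, 2),        # logits for two classes
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # settings from the abstract
    criterion = nn.CrossEntropyLoss()      # applies log-softmax internally

    x = torch.randn(4, 3, 224, 224)        # stand-in batch of camera frames
    y = torch.randint(0, 2, (4,))
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()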

26 pages, 7552 KiB  
Article
Categorical Variable Mapping Considerations in Classification Problems: Protein Application
by Gerardo Alfonso Perez and Raquel Castillo
Mathematics 2023, 11(2), 279; https://doi.org/10.3390/math11020279 - 5 Jan 2023
Viewed by 2262
Abstract
The mapping of categorical variables into numerical values is common in machine learning classification problems. This type of mapping is frequently performed in a relatively arbitrary manner. We present a series of four assumptions (tested numerically) regarding these mappings in the context of protein classification using amino acid information. These assumptions concern the mapping of categorical variables in protein classification problems without the need for approaches such as natural language processing (NLP). The first three assumptions relate to equivalent mappings, and the fourth involves a comparable mapping using a proposed eigenvalue-based matrix representation of the amino acid chain. These assumptions were tested across a range of 23 different machine learning algorithms. It is shown that the numerical simulations are consistent with the presented assumptions, such as translation and permutation, and that the eigenvalue approach generates classifications that are statistically not different from the base case, or that have higher mean values, while providing advantages such as a fixed, predetermined dimension regardless of the size of the analyzed protein. This approach achieved an accuracy of 83.25%. An optimization algorithm is also presented that selects an appropriate number of neurons in an artificial neural network applied to the above-mentioned protein classification problem, achieving an accuracy of 85.02%. The model includes a quadratic penalty function to decrease the chances of overfitting.
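The appeal of an eigenvalue-based representation is that its dimension is fixed by the alphabet, not by the sequence length. The paper's exact matrix construction is not reproduced here; the NumPy sketch below shows one plausible variant, a 20 x 20 amino-acid transition matrix whose sorted eigenvalue magnitudes give a 20-number descriptor for a protein of any length. Treat the construction as a hypothetical stand-in.

    import numpy as np

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

    def eigen_descriptor(sequence):
        # Count transitions between consecutive amino acids in a 20 x 20 matrix,
        # normalize, and use the sorted eigenvalue magnitudes as a fixed-size descriptor.
        m = np.zeros((20, 20))
        for a, b in zip(sequence, sequence[1:]):
            m[INDEX[a], INDEX[b]] += 1
        if m.sum() > 0:
            m /= m.sum()
        return np.sort(np.abs(np.linalg.eigvals(m)))[::-1]

    print(eigen_descriptor("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))  # always 20 values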

16 pages, 8046 KiB  
Article
Unsupervised Image Translation Using Multi-Scale Residual GAN
by Yifei Zhang, Weipeng Li, Daling Wang and Shi Feng
Mathematics 2022, 10(22), 4347; https://doi.org/10.3390/math10224347 - 19 Nov 2022
Cited by 1 | Viewed by 2893
Abstract
Image translation is a classic problem in image processing and computer vision: transforming an image from one domain to another by learning the mapping between an input image and an output image. This paper proposes a novel Multi-Scale Residual Generative Adversarial Network (MRGAN) based on unsupervised learning for transforming images between different domains using unpaired data. The model uses a dual-generator architecture to eliminate the dependence on paired training samples and introduces a multi-scale layered residual network into the generators to reduce the semantic loss of images in the process of encoding. The Wasserstein GAN architecture with gradient penalty (WGAN-GP) is employed in the discriminator to optimize the training process and speed up network convergence. Comparative experiments on several image translation tasks covering style transfer and object migration show that the proposed MRGAN outperforms strong baseline models by large margins.
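The WGAN-GP term mentioned above penalizes the discriminator's gradient norm at points interpolated between real and generated samples, pushing it toward 1. A standard PyTorch sketch follows; the penalty weight of 10 in the usage comment is the conventional choice, not a value reported here.

    import torch

    def gradient_penalty(discriminator, real, fake):
        # Evaluate the critic at random interpolates of real and fake samples and
        # penalize deviations of its gradient norm from 1 (1-Lipschitz constraint).
        eps = torch.rand(real.size(0), 1, 1, 1)
        interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
        scores = discriminator(interp)
        grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                    grad_outputs=torch.ones_like(scores),
                                    create_graph=True)[0]
        return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    # Usage: d_loss = fake_scores.mean() - real_scores.mean() + 10 * gradient_penalty(D, real, fake)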

14 pages, 6095 KiB  
Article
Cross-Section Dimension Measurement of Construction Steel Pipe Based on Machine Vision
by Fuxing Yu, Zhihu Qin, Ruina Li and Zhanlin Ji
Mathematics 2022, 10(19), 3535; https://doi.org/10.3390/math10193535 - 28 Sep 2022
Cited by 3 | Viewed by 2193
Abstract
Currently, the on-site measurement of the cross-sectional size of steel pipes for scaffold construction relies on manual measurement tools, which is a time-consuming process with poor accuracy. Therefore, this paper proposes a new method for steel pipe size measurement based on edge extraction and image processing. Our primary aim is to solve the problems of poor accuracy and wasted labor in practical applications of construction steel pipe inspection. The developed method combines a convolutional neural network with image processing technology to find an optimal solution. Our experiments revealed that the edge images produced by existing convolutional neural network techniques are relatively rough and cannot be used to calculate a steel pipe’s cross-sectional size. Thus, the suggested network model optimizes the current technology and combines it with image processing techniques. The results demonstrate that, compared with the richer convolutional features (RCF) network, the optimal dataset scale (ODS) is improved by 3% and the optimal image scale (OIS) is improved by 2.2%. At the same time, the error of the Hough transform can be effectively reduced after improving the Hough algorithm.
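Once a clean edge map is available, the classical Hough transform can recover the circular cross-section and hence the diameter. The OpenCV sketch below illustrates this last step; all parameter values and the pixel-to-millimetre calibration are hypothetical.

    import cv2

    def measure_pipe_diameter(image_path, mm_per_pixel):
        # Detect the circular pipe cross-section and convert its radius to millimetres.
        gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
        gray = cv2.medianBlur(gray, 5)     # suppress speckle noise before the transform
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=100,
                                   param1=120, param2=40, minRadius=20, maxRadius=200)
        if circles is None:
            return None
        x, y, r = circles[0][0]            # strongest detected circle
        return 2 * r * mm_per_pixel        # diameter in millimetres

    # Usage (hypothetical calibration): measure_pipe_diameter("pipe.png", 0.21)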
