Advances in Computer Vision and Machine Learning, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 August 2024) | Viewed by 10291

Special Issue Editors


Guest Editor
College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China.
Interests: cross-domain scene classification; multi-modal image analysis; cross-modal image interpretation

Guest Editor
School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Interests: information and communication engineering; satellite communication and satellite navigation; machine learning; pattern recognition

Special Issue Information

Dear Colleagues,

Computer vision focuses on the theories and practices that give rise to semantically meaningful interpretations of the visual world. Mathematical models and tools provide enormous opportunities for developing intelligent algorithms that extract useful information from visual data, such as a single image, a video sequence, or even a multi-/hyper-spectral image cube. In recent years, a number of emerging machine learning techniques have been applied to visual perception tasks such as camera imaging geometry, camera calibration, image stabilization, multiview geometry, feature learning, image classification, and object recognition and tracking. However, it remains challenging to provide theoretical explanations of the underlying learning processes, especially when deep neural networks are used, where several questions remain open, such as the design principles, the optimal architecture, the number of required layers, the sample complexity, and the choice of optimization algorithms.

This Special Issue focuses on recent advances in computer vision and machine learning. The topics of interest include, but are not limited to, the following:

  • Pattern recognition and machine learning for computer vision;
  • Feature learning for computer vision;
  • Self-supervised/weakly supervised/unsupervised learning;
  • Image processing and analysis;
  • Deep neural networks in computer vision;
  • Graph neural networks;
  • Optimization methods for machine learning;
  • Evolutionary computation and optimization problems;
  • Emerging applications.

Dr. Xiangtao Zheng
Prof. Dr. Jinchang Ren
Prof. Dr. Ling Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • computer vision
  • pattern recognition
  • statistical learning
  • data mining
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

15 pages, 397 KiB  
Article
Local Directional Difference and Relational Descriptor for Texture Classification
by Weidan Yan and Yongsheng Dong
Mathematics 2024, 12(21), 3432; https://doi.org/10.3390/math12213432 - 1 Nov 2024
Viewed by 586
Abstract
The local binary pattern (LBP) has been widely used for extracting texture features. However, the LBP and most of its variants tend to focus on pixel units within small neighborhoods, neglecting differences in direction and relationships among different directions. To alleviate this issue, in this paper, we propose a novel local directional difference and relational descriptor (LDDRD) for texture classification. Our proposed LDDRD utilizes information from multiple pixels along the radial direction. Specifically, a directional difference pattern (DDP) is first extracted by performing binary encoding on the differences between the central pixel and multiple neighboring pixels along the radial direction. Furthermore, by taking the central pixel as a reference, we extract the directional relation pattern (DRP) by comparing binary encodings representing different directions. Finally, we fuse the above DDP and DRP to form the LDDRD feature vector. Experimental results on six texture datasets reveal that our proposed LDDRD is effective and outperforms eight representative methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
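As background for readers unfamiliar with LBP-style encodings, the classic 8-neighbor local binary pattern that the LDDRD builds on can be sketched in a few lines. This is the standard LBP, not the authors' descriptor:

```python
import numpy as np

def lbp_code(patch):
    """Classic 3x3 local binary pattern: threshold the 8 neighbors
    against the central pixel and pack the bits into one byte."""
    center = patch[1, 1]
    # neighbors in clockwise order starting at the top-left
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 8, 3]])
print(lbp_code(patch))  # → 42 (bits set at positions 1, 3, 5)
```

A texture image is described by the histogram of these codes over all pixels; the LDDRD replaces the single-ring comparison with difference and relation patterns along radial directions.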

18 pages, 1861 KiB  
Article
Improving Hybrid Regularized Diffusion Processes with the Triple-Cosine Smoothness Constraint for Re-Ranking
by Miao Du and Jianfeng Cai
Mathematics 2024, 12(19), 3082; https://doi.org/10.3390/math12193082 - 1 Oct 2024
Viewed by 547
Abstract
In the last few decades, diffusion processes have been widely used to solve visual re-ranking problems. The key point of these approaches is that, by diffusing the baseline similarities in the context of other samples, more reliable similarities or dissimilarities can be learned. This was later found to be achieved by solving the optimization problem underlying the framework of the regularized diffusion process. In this paper, the proposed model differs from previous approaches in two aspects. Firstly, by taking the high-order information of the graph into account, a novel smoothness constraint, named the triple-cosine smoothness constraint, is proposed. The triple-cosine smoothness constraint is generated using the cosine of the angle between the vectors in the coordinate system, which is created based on a group of three elements: the queries treated as a whole and two other data points. A hybrid fitting constraint is also introduced into the proposed model. It consists of two types of predefined values, which are, respectively, used to construct two types of terms: the squared L2 norm and the L1 norm. Both the closed-form solution and the iterative solution of the proposed model are provided. Secondly, in the proposed model, the learned contextual dissimilarities can be used to describe “one-to-many” relationships, making it applicable to problems with multiple queries, which cannot be solved by previous methods that only handle “one-to-one” relationships. By taking advantage of these “one-to-many” contextual dissimilarities, an iterative re-ranking process based on the proposed model is further provided. Finally, the proposed algorithms are validated on various databases, and comprehensive experiments demonstrate that retrieval results can be effectively improved using our methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
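The generic regularized diffusion process that this line of work builds on iterates f ← αSf + (1−α)y on a similarity graph S, whose fixed point is the closed-form solution (1−α)(I−αS)⁻¹y. A minimal sketch of that baseline (not the paper's triple-cosine model):

```python
import numpy as np

def diffuse(S, y, alpha=0.9, iters=500):
    """Generic regularized diffusion: iterate f <- alpha*S@f + (1-alpha)*y,
    spreading the query signal y along a row-normalized similarity graph S."""
    f = y.copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# toy 3-node similarity graph, rows normalized to sum to 1
S = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.2],
              [0.3, 0.3, 0.4]])
S = S / S.sum(axis=1, keepdims=True)
y = np.array([1.0, 0.0, 0.0])   # node 0 is the query
f = diffuse(S, y)               # context-aware similarities to the query
```

The diffused scores f are then used in place of the raw pairwise similarities for re-ranking; the paper's contribution replaces the pairwise smoothness term in this objective with a triple-cosine constraint over element triples.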

17 pages, 5052 KiB  
Article
A New Instance Segmentation Model for High-Resolution Remote Sensing Images Based on Edge Processing
by Xiaoying Zhang, Jie Shen, Huaijin Hu and Houqun Yang
Mathematics 2024, 12(18), 2905; https://doi.org/10.3390/math12182905 - 18 Sep 2024
Cited by 1 | Viewed by 581
Abstract
With the goal of addressing the challenges of small, densely packed targets in remote sensing images, we propose a high-resolution instance segmentation model named QuadTransPointRend Net (QTPR-Net). This model significantly enhances instance segmentation performance in remote sensing images. The model consists of two main modules: preliminary edge feature extraction (PEFE) and edge point feature refinement (EPFR). We also created a specific approach and strategy named TransQTA for edge uncertainty point selection and feature processing in high-resolution remote sensing images. Multi-scale feature fusion and transformer technologies are used in QTPR-Net to refine rough masks and fine-grained features for selected edge uncertainty points while balancing model size and accuracy. Based on experiments performed on three public datasets (NWPU VHR-10, SSDD, and iSAID), we demonstrate the superiority of QTPR-Net over existing approaches. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
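The edge-uncertainty-point selection described here follows the PointRend idea of refining a coarse mask only where it is least confident. A generic sketch of that selection step (an illustration of the underlying idea, not the authors' TransQTA implementation):

```python
import numpy as np

def select_uncertain_points(prob_map, k):
    """PointRend-style point selection: return the (row, col) coordinates
    of the k mask points whose foreground probability is closest to 0.5,
    i.e. the points where the coarse mask is least certain."""
    uncertainty = -np.abs(prob_map.ravel() - 0.5)   # higher = less certain
    idx = np.argsort(uncertainty)[-k:]              # k most uncertain points
    ys, xs = np.unravel_index(idx, prob_map.shape)
    return set(zip(ys.tolist(), xs.tolist()))

prob_map = np.array([[0.9, 0.50],
                     [0.1, 0.52]])
print(select_uncertain_points(prob_map, 2))  # the two near-0.5 points
```

Only these selected points are re-predicted from fine-grained features, which is what keeps the refinement cheap at high resolution.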

19 pages, 12242 KiB  
Article
Reconstructing the Colors of Underwater Images Based on the Color Mapping Strategy
by Siyuan Wu, Bangyong Sun, Xiao Yang, Wenjia Han, Jiahai Tan and Xiaomei Gao
Mathematics 2024, 12(13), 1933; https://doi.org/10.3390/math12131933 - 21 Jun 2024
Viewed by 786
Abstract
Underwater imagery plays a vital role in ocean development and conservation efforts. However, underwater images often suffer from chromatic aberration and low contrast due to the attenuation and scattering of visible light in the complex medium of water. To address these issues, we propose an underwater image enhancement network called CM-Net, which utilizes color mapping techniques to remove noise and restore the natural brightness and colors of underwater images. Specifically, CM-Net consists of a three-step solution: adaptive color mapping (ACM), local enhancement (LE), and global generation (GG). Inspired by the principles of color gamut mapping, the ACM enhances the network’s adaptive response to regions with severe color attenuation. ACM enables the correction of the blue-green cast in underwater images by combining color constancy theory with the power of convolutional neural networks. To account for inconsistent attenuation in different channels and spatial regions, we designed a multi-head reinforcement module (MHR) in the LE step. The MHR enhances the network’s attention to channels and spatial regions with more pronounced attenuation, further improving contrast and saturation. Compared to the best candidate models on the EUVP and UIEB datasets, CM-Net improves PSNR by 18.1% and 6.5% and SSIM by 5.9% and 13.3%, respectively. At the same time, CIEDE2000 decreases by 25.6% and 1.3%. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
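The color constancy theory that ACM draws on is classically illustrated by the gray-world assumption: under neutral illumination, the average of all channels should be gray, so a uniform cast can be removed by per-channel gain. A minimal sketch of that classic baseline (not CM-Net's learned mapping):

```python
import numpy as np

def gray_world(img):
    """Gray-world color constancy: rescale each RGB channel so its mean
    equals the global mean, removing a uniform color cast (such as the
    blue-green cast of underwater images). Expects values in [0, 1]."""
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gain = means.mean() / means               # gain that equalizes them
    return np.clip(img * gain, 0.0, 1.0)

# a flat image with a strong blue cast
img = np.full((4, 4, 3), [0.2, 0.3, 0.6])
balanced = gray_world(img)
print(balanced.reshape(-1, 3).mean(axis=0))  # channel means now equal
```

CM-Net replaces this fixed global gain with spatially adaptive, learned color mapping, which is what lets it handle the non-uniform attenuation the abstract describes.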

22 pages, 30026 KiB  
Article
Multi-Camera Multi-Vehicle Tracking Guided by Highway Overlapping FoVs
by Hongkai Zhang, Ruidi Fang, Suqiang Li, Qiqi Miao, Xinggang Fan, Jie Hu and Sixian Chan
Mathematics 2024, 12(10), 1467; https://doi.org/10.3390/math12101467 - 9 May 2024
Viewed by 1468
Abstract
Multi-Camera Multi-Vehicle Tracking (MCMVT) is a critical task in Intelligent Transportation Systems (ITS). Differently to in urban environments, challenges in highway tunnel MCMVT arise from the changing target scales as vehicles traverse the narrow tunnels, intense light exposure within the tunnels, high similarity in vehicle appearances, and overlapping camera fields of view, making highway MCMVT more challenging. This paper presents an MCMVT system tailored for highway tunnel roads incorporating road topology structures and the overlapping camera fields of view. The system integrates a Cascade Multi-Level Multi-Target Tracking strategy (CMLM), a trajectory refinement method (HTCF) based on road topology structures, and a spatio-temporal constraint module (HSTC) considering highway entry–exit flow in overlapping fields of view. The CMLM strategy exploits phased vehicle movements within the camera’s fields of view, addressing such challenges as those presented by fast-moving vehicles and appearance variations in long tunnels. The HTCF method filters static traffic signs in the tunnel, compensating for detector imperfections and mitigating the strong lighting effects caused by the tunnel lighting. The HSTC module incorporates spatio-temporal constraints designed for accurate inter-camera trajectory matching within overlapping fields of view. Experiments on the proposed Highway Surveillance Traffic (HST) dataset and CityFlow dataset validate the system’s effectiveness and robustness, achieving an IDF1 score of 81.20% for the HST dataset. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)

18 pages, 8004 KiB  
Article
Improving Oriented Object Detection by Scene Classification and Task-Aligned Focal Loss
by Xiaoliang Qian, Shaoguan Gao, Wei Deng and Wei Wang
Mathematics 2024, 12(9), 1343; https://doi.org/10.3390/math12091343 - 28 Apr 2024
Viewed by 869
Abstract
Oriented object detection (OOD) can precisely detect objects with arbitrary direction in remote sensing images (RSIs). To date, two-stage OOD methods have attracted more attention because of their high detection accuracy. However, the two-stage methods only rely on the features of each proposal for object recognition, which leads to the misclassification problem because of the intra-class diversity, inter-class similarity and cluttered backgrounds in RSIs. To address the above problem, an OOD model combining scene classification is proposed. Considering the fact that each foreground object has a strong contextual relationship with the scene of the RSI, a scene classification branch is added to the baseline OOD model, and the scene classification result of the input RSI is used to exclude the impossible categories. To focus on the hard instances and enhance the consistency between classification and regression, a task-aligned focal loss (TFL) which combines the classification difficulty with the regression loss is proposed; TFL assigns larger weights to the hard instances and optimizes the classification and regression branches simultaneously. The ablation study proves the effectiveness of the scene classification branch, TFL and their combination. The comparisons with 15 and 14 OOD methods on the DOTA and DIOR-R datasets validate the superiority of our method. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
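The focal-weighting idea that the TFL extends is easiest to see in the standard binary focal loss of Lin et al., where a (1−p_t)^γ factor down-weights easy examples. A sketch of that standard loss (not the authors' task-aligned variant, which additionally couples the weight to the regression loss):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss: the (1 - p_t)**gamma factor shrinks the
    contribution of well-classified examples so training concentrates on
    hard ones. p is the predicted foreground probability, y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# an easy positive (p=0.9) contributes far less than a hard one (p=0.1)
print(focal_loss(0.9, 1))
print(focal_loss(0.1, 1))
```

The TFL replaces the purely classification-based weight with one that also grows with the box-regression error, so that instances that are hard in either sub-task get larger weights.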

14 pages, 6180 KiB  
Article
High-Efficiency and High-Precision Ship Detection Algorithm Based on Improved YOLOv8n
by Kun Lan, Xiaoliang Jiang, Xiaokang Ding, Huan Lin and Sixian Chan
Mathematics 2024, 12(7), 1072; https://doi.org/10.3390/math12071072 - 2 Apr 2024
Cited by 5 | Viewed by 1272
Abstract
With the development of the intelligent vision industry, ship detection and identification technology has gradually become a research hotspot in the field of marine insurance and port logistics. However, due to the interference of rain, haze, waves, light, and other bad weather, the robustness and effectiveness of existing detection algorithms remain a continuous challenge. For this reason, an improved YOLOv8n algorithm is proposed for the detection of ship targets under unforeseen environmental conditions. In the proposed method, the efficient multi-scale attention module (C2f_EMAM) is introduced to integrate the context information of different scales so that the convolutional neural network can generate better pixel-level attention to high-level feature maps. In addition, a fully-concatenate bi-directional feature pyramid network (Concatenate_FBiFPN) is adopted to replace the simple superposition/addition of feature maps, which can better solve the problem of feature propagation and information flow in target detection. An improved spatial pyramid pooling fast structure (SPPF2+1) is also designed to emphasize low-level pooling features and reduce the pooling depth to accommodate the information characteristics of the ship. A comparison experiment was conducted between our proposed algorithm and other mainstream methods. Results showed that our proposed algorithm outperformed other models, achieving 99.4% accuracy, 98.2% precision, 98.5% recall, 99.1% mAP@0.5, and 85.4% mAP@0.5:0.95 on the SeaShips dataset. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)

18 pages, 5379 KiB  
Article
Tensor-Based Sparse Representation for Hyperspectral Image Reconstruction Using RGB Inputs
by Yingtao Duan, Nan Wang, Yifan Zhang and Chao Song
Mathematics 2024, 12(5), 708; https://doi.org/10.3390/math12050708 - 28 Feb 2024
Cited by 2 | Viewed by 1233
Abstract
Hyperspectral image (HSI) reconstruction from RGB input has drawn much attention recently and plays a crucial role in further vision tasks. However, current sparse coding algorithms often take each single pixel as the basic processing unit during the reconstruction process, which ignores the strong similarity and relation between adjacent pixels within an image or scene, leading to an inadequate learning of spectral and spatial features in the target hyperspectral domain. In this paper, a novel tensor-based sparse coding method is proposed to integrate both spectral and spatial information represented in tensor forms, which is capable of taking all the neighboring pixels into account during the spectral super-resolution (SSR) process without breaking the semantic structures, thus improving the accuracy of the final results. Specifically, the proposed method recovers the unknown HSI signals using sparse coding on the learned dictionary pairs. Firstly, the spatial information of pixels is used to constrain the sparse reconstruction process, which effectively improves the spectral reconstruction accuracy of pixels. In addition, the traditional two-dimensional dictionary learning is further extended to the tensor domain, by which the structure of inputs can be processed in a more flexible way, thus enhancing the spatial contextual relations. To this end, a rudimentary HSI estimation acquired in the sparse reconstruction stage is further enhanced by introducing the regression method, aiming to eliminate the spectral distortion to some extent. Extensive experiments are conducted on two public datasets, demonstrating the effectiveness of the proposed framework. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)

20 pages, 4383 KiB  
Article
Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts
by Youngki Park and Youhyun Shin
Mathematics 2023, 11(22), 4585; https://doi.org/10.3390/math11224585 - 9 Nov 2023
Viewed by 1586
Abstract
In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Machine Learning, 2nd Edition)
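The gradual-detection idea — re-running detection with incrementally lowered thresholds to recover missed text, while remembering which threshold produced each region for later filtering — can be sketched generically. The `detect` callable here is a hypothetical stand-in, not the authors' implementation:

```python
def gradual_detect(image, detect, thresholds=(0.9, 0.7, 0.5, 0.3)):
    """Run a text detector repeatedly with decreasing confidence thresholds,
    recording, for each region found, the highest threshold at which it
    appears (used later to filter low-confidence recognitions)."""
    found = {}
    for t in sorted(thresholds, reverse=True):
        for box in detect(image, threshold=t):
            # setdefault keeps the first (highest) threshold per region
            found.setdefault(box, t)
    return found

# toy detector: pretends each region has a fixed confidence score
def toy_detect(image, threshold):
    scores = {"title": 0.95, "axis-label": 0.6, "footnote": 0.35}
    return [box for box, s in scores.items() if s >= threshold]

print(gradual_detect(None, toy_detect))
```

Regions first detected only at a low threshold can then be held to a stricter recognition-confidence bar, which is the dynamic filtering the abstract describes.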
