Computer Vision, Image Processing Technologies and Artificial Intelligence

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 29335

Special Issue Editors


Dr. Honggang Qi
Guest Editor
Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: video coding; computer vision; deep learning

Dr. Yan Liu
Guest Editor
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
Interests: image processing; signal processing; artificial intelligence

Dr. Jun Miao
Guest Editor
School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
Interests: neural networks; machine learning; computer vision and developmental robotics

Special Issue Information

Dear Colleagues,

Computer vision has expanded into a wide range of research fields in which information is extracted from visual data, including images and video. Computer vision technology is pervasive in modern life, with billions of people using applications built on image recognition, image processing, object detection, and related techniques, which demonstrates both the necessity and the potential of research in computer vision and its applications. Advances in artificial intelligence have equipped these techniques to outperform humans on certain tasks. Nevertheless, many valuable open problems remain in the research and application of computer vision, image processing technology, and artificial intelligence.

This Special Issue on “Computer Vision, Image Processing Technologies and Artificial Intelligence” is aimed at gathering a collection of original articles contributing to the progress of theoretical and practical research in the domains of computer vision, image processing, and artificial intelligence, including but not limited to the following aspects and tasks:

  • image augmentation;
  • image restoration;
  • image encoding;
  • image segmentation;
  • image recognition;
  • image classification;
  • image and video retrieval;
  • image and video synthesis;
  • object detection;
  • image depiction;
  • image-to-image translation;
  • image forensics;
  • artificial intelligence applied in information security.

Dr. Honggang Qi
Dr. Yan Liu
Dr. Jun Miao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • artificial intelligence
  • deep learning
  • machine learning
  • neural networks
  • image processing
  • vision information

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (14 papers)

Research

22 pages, 4089 KiB  
Article
General Image Manipulation Detection Using Feature Engineering and a Deep Feed-Forward Neural Network
by Sajjad Ahmed, Byungun Yoon, Sparsh Sharma, Saurabh Singh and Saiful Islam
Mathematics 2023, 11(21), 4537; https://doi.org/10.3390/math11214537 - 3 Nov 2023
Cited by 1 | Viewed by 1458
Abstract
Within digital forensics, a notable emphasis is placed on detecting the application of fundamental image-editing operators, including but not limited to median filters, average filters, contrast enhancement, resampling, and various other closely associated operations. When conducting a historical analysis of an image that has potentially undergone various modifications, it is a logical initial approach to search for alterations made by fundamental operators. This paper presents a deep-learning-based system designed to detect fundamental manipulation operations. The research involved training a multilayer perceptron on a 36-dimensional feature set derived from the gray-level co-occurrence matrix, the gray-level run-length matrix, and the normalized streak area. The system detects median filtering, mean filtering, the introduction of additive white Gaussian noise, and the application of JPEG compression in digital images. Our system achieved an accuracy of 99.46% and outperformed state-of-the-art deep-learning-based solutions, which achieved an accuracy of 97.89%.
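
As a rough illustration of this kind of pipeline, the sketch below pairs GLCM texture statistics with a small feed-forward classifier using scikit-image and scikit-learn. It is not the authors' implementation: the distances, angles, and properties chosen here are assumptions that yield a 16-dimensional vector rather than the paper's 36-dimensional set, and the run-length-matrix and streak-area features are omitted.

    # Sketch: GLCM texture features feeding a feed-forward classifier.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.neural_network import MLPClassifier

    def glcm_features(gray_u8):
        """Texture statistics from gray-level co-occurrence matrices."""
        glcm = graycomatrix(gray_u8, distances=[1, 2], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        props = ["contrast", "homogeneity", "energy", "correlation"]
        return np.hstack([graycoprops(glcm, p).ravel() for p in props])

    # Toy usage with random data; labels 0=pristine, 1=median, 2=mean,
    # 3=AWGN, 4=JPEG are assumed for illustration.
    rng = np.random.default_rng(0)
    X = np.stack([glcm_features(rng.integers(0, 256, (64, 64), dtype=np.uint8))
                  for _ in range(20)])
    y = rng.integers(0, 5, 20)
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, y)
    print(clf.predict(X[:3]))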

19 pages, 5930 KiB  
Article
EHFP-GAN: Edge-Enhanced Hierarchical Feature Pyramid Network for Damaged QR Code Reconstruction
by Jianhua Zheng, Ruolin Zhao, Zhongju Lin, Shuangyin Liu, Rong Zhu, Zihao Zhang, Yusha Fu and Junde Lu
Mathematics 2023, 11(20), 4349; https://doi.org/10.3390/math11204349 - 19 Oct 2023
Cited by 1 | Viewed by 1829
Abstract
In practical usage, QR codes often become difficult to recognize due to damage. Traditional restoration methods exhibit limited effectiveness for severely damaged or densely encoded QR codes, are time-consuming, and struggle with extensive information loss. To tackle these challenges, we propose a two-stage restoration model named the EHFP-GAN, comprising an edge restoration module and a QR code reconstruction module. The edge restoration module guides subsequent restoration by repairing the edge images, resulting in finer edge details. The hierarchical feature pyramid within the QR code reconstruction module enhances the model’s global image perception. Using our custom dataset, we compare the EHFP-GAN against several mainstream image processing models. The results demonstrate the exceptional restoration performance of the EHFP-GAN model: across various levels of contamination, it achieves significant improvements in recognition rate and image quality metrics, surpassing the comparative models. For instance, under mild contamination the EHFP-GAN achieves a recognition rate of 95.35%, and under random contamination it reaches 31.94%, in both cases outperforming the comparative models. In conclusion, the EHFP-GAN model demonstrates remarkable efficacy in the restoration of damaged QR codes.
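
The recognition rate reported above can be reproduced in spirit with a few lines of OpenCV: count how many restored codes decode to the expected payload. The decoder choice is an assumption; the authors' evaluation pipeline may differ.

    # Sketch: recognition rate = share of restored QR images that decode
    # to the expected payload, using OpenCV's built-in detector.
    import cv2

    def recognition_rate(restored_images, expected_payloads):
        detector = cv2.QRCodeDetector()
        hits = 0
        for img, expected in zip(restored_images, expected_payloads):
            payload, points, _ = detector.detectAndDecode(img)
            hits += int(points is not None and payload == expected)
        return hits / len(restored_images)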

22 pages, 63373 KiB  
Article
Plant Image Classification with Nonlinear Motion Deblurring Based on Deep Learning
by Ganbayar Batchuluun, Jin Seong Hong, Abdul Wahid and Kang Ryoung Park
Mathematics 2023, 11(18), 4011; https://doi.org/10.3390/math11184011 - 21 Sep 2023
Cited by 2 | Viewed by 1293
Abstract
Despite the significant number of classification studies conducted using plant images, studies on nonlinear motion blur are limited. In general, motion blur results from movements of the hands of a person holding a camera for capturing plant images, or from the plant moving in the wind while the camera is stationary. When these two cases occur simultaneously, nonlinear motion blur is highly probable. Therefore, a novel deep-learning-based classification method for plant images with various nonlinear motion blurs is proposed. In addition, this study proposes a generative adversarial network-based method to reduce nonlinear motion blur, which is explored as a way of improving classification performance. Experiments are conducted using a self-collected visible-light image dataset. Nonlinear motion deblurring results in a structural similarity index measure (SSIM) of 73.1 and a peak signal-to-noise ratio (PSNR) of 21.55, whereas plant classification results in a top-1 accuracy of 90.09% and an F1-score of 84.84%. Experiments conducted on two open datasets resulted in PSNRs of 20.84 and 21.02 and SSIMs of 72.96 and 72.86, respectively, with the proposed plant classification method reaching top-1 accuracies of 89.79% and 82.21% and F1-scores of 84% and 76.52%, respectively. Thus, the proposed network produces higher accuracies than the existing state-of-the-art methods.
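
For reference, the reported figures correspond to standard metric computations along the following lines. This is a sketch assuming grayscale uint8 images and integer class labels; the macro-averaged F1 is an assumption.

    # Sketch: standard deblurring and classification metrics.
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio
    from sklearn.metrics import accuracy_score, f1_score

    def deblur_and_classification_metrics(restored, reference, y_true, y_pred):
        ssim = structural_similarity(reference, restored, data_range=255)
        psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
        top1 = accuracy_score(y_true, y_pred)           # top-1 accuracy
        f1 = f1_score(y_true, y_pred, average="macro")  # macro F1 (assumed)
        return ssim, psnr, top1, f1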

23 pages, 11979 KiB  
Article
Multi-Focus Image Fusion via PAPCNN and Fractal Dimension in NSST Domain
by Ming Lv, Zhenhong Jia, Liangliang Li and Hongbing Ma
Mathematics 2023, 11(18), 3803; https://doi.org/10.3390/math11183803 - 5 Sep 2023
Cited by 2 | Viewed by 1133
Abstract
Multi-focus image fusion is a popular technique for generating a full-focus image in which all objects in the scene are clear. To achieve a clearer and fully focused fusion effect, this paper develops a multi-focus image fusion method based on the parameter-adaptive pulse-coupled neural network (PAPCNN) and fractal dimension in the nonsubsampled shearlet transform (NSST) domain. The PAPCNN-based fusion rule is used to merge the low-frequency sub-bands, and the fractal-dimension-based fusion rule, via the multi-scale morphological gradient, is used to merge the high-frequency sub-bands. The inverse NSST reconstructs the fused coefficients, generating the final fused multi-focus image. We conducted comprehensive evaluations of our algorithm on the public Lytro dataset, comparing the proposed method with state-of-the-art fusion algorithms, including traditional and deep-learning-based approaches. The quantitative and qualitative evaluations demonstrate that our method outperforms other fusion algorithms on metrics such as QAB/F, QE, QFMI, QG, QNCIE, QP, QMI, QNMI, QY, QAG, QPSNR, and QMSE. These results highlight the clear advantages of the proposed technique in multi-focus image fusion, providing a significant contribution to the field.
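
To make the structure of such transform-domain fusion concrete, here is a deliberately simplified sketch: a single-level wavelet transform stands in for the NSST, plain averaging for the PAPCNN low-frequency rule, and absolute-max selection for the fractal-dimension high-frequency rule. It is an illustration only, not the authors' algorithm.

    # Sketch: transform-domain multi-focus fusion with stand-in rules.
    import numpy as np
    import pywt

    def fuse_pair(img_a, img_b, wavelet="db2"):
        (la, ha), (lb, hb) = pywt.dwt2(img_a, wavelet), pywt.dwt2(img_b, wavelet)
        low = (la + lb) / 2.0                       # low-frequency: average
        high = tuple(np.where(np.abs(ca) >= np.abs(cb), ca, cb)
                     for ca, cb in zip(ha, hb))     # high-frequency: abs-max
        return pywt.idwt2((low, high), wavelet)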

20 pages, 12456 KiB  
Article
ATC-YOLOv5: Fruit Appearance Quality Classification Algorithm Based on the Improved YOLOv5 Model for Passion Fruits
by Changhong Liu, Weiren Lin, Yifeng Feng, Ziqing Guo and Zewen Xie
Mathematics 2023, 11(16), 3615; https://doi.org/10.3390/math11163615 - 21 Aug 2023
Cited by 4 | Viewed by 2538
Abstract
Passion fruit, renowned for its significant nutritional, medicinal, and economic value, is extensively cultivated in subtropical regions such as China, India, and Vietnam. In the production and processing industry, the quality grading of passion fruit plays a crucial role in the supply chain. However, the current process relies heavily on manual labor, resulting in inefficiency and high costs, which underscores the importance of computer vision-based fruit appearance quality classification. Moreover, existing passion fruit detection algorithms mainly focus on real-time detection and overlook quality classification. This paper proposes the ATC-YOLOv5 model, based on deep learning, for passion fruit detection and quality classification. First, an improved Asymptotic Feature Pyramid Network (AFPN) is utilized as the feature-extraction network, modified in this study by adding weighted feature concatenation pathways. This optimization enhances the feature flow between different levels and nodes, allowing the adaptive and asymptotic fusion of richer feature information related to passion fruit quality. Second, the Transformer Cross Stage Partial (TRCSP) layer is constructed by introducing the Multi-Head Self-Attention (MHSA) layer into the Cross Stage Partial (CSP) layer, enabling the network to better model long-range dependencies. In addition, the Coordinate Attention (CA) mechanism is introduced to enhance the network’s capacity to learn both local and non-local information, as well as the fine-grained features of passion fruit. To validate the performance of the proposed model, a self-made passion fruit dataset is constructed to classify passion fruit into four quality grades, with the original YOLOv5 serving as the baseline model. According to the experimental results, the mean average precision (mAP) of ATC-YOLOv5 reaches 95.36% with a mean detection time (mDT) of 3.2 ms, improving the mAP by 4.83% and the detection speed by 11.1% while reducing the number of parameters by 10.54% compared to the baseline, thus maintaining the lightweight characteristics while improving the accuracy. These experimental results validate the high detection efficiency of the proposed model for fruit quality classification, contributing to the realization of intelligent agriculture and fruit industries.
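
Of the modules named above, coordinate attention is the most self-contained. A simplified PyTorch sketch follows; batch normalization and other details of the published CA design are omitted, so treat it as an approximation rather than the paper's module.

    # Sketch: simplified coordinate attention (factorized H/W pooling -> gates).
    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            mid = max(channels // reduction, 8)
            self.conv1 = nn.Conv2d(channels, mid, 1)
            self.act = nn.ReLU(inplace=True)
            self.conv_h = nn.Conv2d(mid, channels, 1)
            self.conv_w = nn.Conv2d(mid, channels, 1)

        def forward(self, x):
            n, c, h, w = x.shape
            pool_h = x.mean(dim=3, keepdim=True)                       # (n,c,h,1)
            pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n,c,w,1)
            y = self.act(self.conv1(torch.cat([pool_h, pool_w], dim=2)))
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                      # (n,c,h,1)
            a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n,c,1,w)
            return x * a_h * a_w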

21 pages, 9369 KiB  
Article
Raindrop-Removal Image Translation Using Target-Mask Network with Attention Module
by Hyuk-Ju Kwon and Sung-Hak Lee
Mathematics 2023, 11(15), 3318; https://doi.org/10.3390/math11153318 - 28 Jul 2023
Cited by 7 | Viewed by 1934
Abstract
Image processing plays a crucial role in improving the performance of models in various fields such as autonomous driving, surveillance cameras, and multimedia. However, capturing ideal images under favorable lighting conditions is not always feasible, particularly in challenging weather such as rain, fog, or snow, which can impede object recognition. This study addresses this issue by generating clean images from raindrop-deteriorated ones. Our proposed model comprises a raindrop-mask network and a raindrop-removal network. The raindrop-mask network is based on the U-Net architecture and learns the location, shape, and brightness of raindrops. The raindrop-removal network is a generative adversarial network, also based on U-Net, comprising two attention modules: the raindrop-mask module and the residual convolution block module. These modules are employed to locate raindrop areas and restore the affected regions. Multiple loss functions are utilized to enhance model performance. The image-quality assessment metrics of the proposed method, namely the SSIM, PSNR, CEIQ, NIQE, FID, and LPIPS scores, are 0.832, 26.165, 3.351, 2.224, 20.837, and 0.059, respectively. Comparative evaluations against state-of-the-art models demonstrate the superiority of the proposed model in both qualitative and quantitative results.
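
The role of the raindrop mask can be illustrated with a toy compositing step: the predicted mask decides where generated content replaces the degraded input, so clean regions pass through largely untouched. This illustrates the guiding idea only, not the paper's attention modules or networks.

    # Sketch: mask-guided compositing of generated content into an image.
    def mask_guided_composite(degraded, generated, mask):
        """mask in [0, 1], 1 = raindrop pixel; all arrays share one shape."""
        return (1.0 - mask) * degraded + mask * generated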

18 pages, 8052 KiB  
Article
Neural Rendering-Based 3D Scene Style Transfer Method via Semantic Understanding Using a Single Style Image
by Jisun Park and Kyungeun Cho
Mathematics 2023, 11(14), 3243; https://doi.org/10.3390/math11143243 - 24 Jul 2023
Cited by 1 | Viewed by 2338
Abstract
In the rapidly emerging era of untact (“contact-free”) technologies, the requirement for three-dimensional (3D) virtual environments utilized in virtual reality (VR)/augmented reality (AR) and the metaverse has grown significantly, owing to their extensive application across various domains. Current research focuses on automatically transferring the style of rendered images within a 3D virtual environment using artificial intelligence, aiming to minimize human intervention. However, prevalent studies on rendering-based 3D environment style transfer have certain inherent limitations. First, training a style transfer network dedicated to 3D virtual environments demands considerable style image data, and these data must align with viewpoints that closely resemble those of the virtual environment. Second, there is noticeable inconsistency within the 3D structures: predominant studies often neglect 3D scene geometry information, relying solely on 2D input image features. Finally, style adaptation fails to accommodate the unique characteristics inherent in each object. To address these issues, we propose a novel approach: a neural rendering-based 3D scene style conversion technique. This methodology employs semantic nearest-neighbor feature matching, thereby facilitating the transfer of style within a 3D scene while considering the distinctive characteristics of each object, even when employing a single style image. The neural radiance field enables the network to comprehend the geometric information of a 3D scene in relation to its viewpoint, and style features are then transferred using the features of a single style image via semantic nearest-neighbor feature matching. Empirically, the proposed semantic 3D scene style transfer method was applied to both interior and exterior environments, tested on the Replica, 3D-FRONT, and Tanks and Temples datasets. The results illustrate that the proposed methodology surpasses existing style transfer techniques in terms of 3D viewpoint consistency, style uniformity, and semantic coherence.
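
The core matching step can be sketched compactly: each content feature is replaced by its most cosine-similar style feature. Feature extraction, the radiance field, and the semantic grouping the paper adds are out of scope here.

    # Sketch: nearest-neighbor feature matching by cosine similarity.
    import torch

    def nn_feature_match(content_feats, style_feats):
        """content_feats: (Nc, D); style_feats: (Ns, D) -> matched (Nc, D)."""
        c = torch.nn.functional.normalize(content_feats, dim=1)
        s = torch.nn.functional.normalize(style_feats, dim=1)
        idx = (c @ s.t()).argmax(dim=1)   # nearest style feature per content
        return style_feats[idx]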

17 pages, 10993 KiB  
Article
An End-to-End Framework Based on Vision-Language Fusion for Remote Sensing Cross-Modal Text-Image Retrieval
by Liu He, Shuyan Liu, Ran An, Yudong Zhuo and Jian Tao
Mathematics 2023, 11(10), 2279; https://doi.org/10.3390/math11102279 - 13 May 2023
Cited by 7 | Viewed by 2153
Abstract
Remote sensing cross-modal text-image retrieval (RSCTIR) has recently attracted extensive attention due to its advantages of fast extraction of remote sensing image information and flexible human–computer interaction. Traditional RSCTIR methods mainly focus on improving the performance of uni-modal feature extraction separately, and most rely on pre-trained object detectors to obtain better local feature representations, which not only lack multi-modal interaction information but also introduce a training gap between the pre-trained object detector and the retrieval task. In this paper, we propose an end-to-end RSCTIR framework based on vision-language fusion (EnVLF), consisting of two uni-modal (vision and language) encoders and a multi-modal encoder that can be optimized by multitask training. Specifically, to achieve an end-to-end training process, we introduce a vision transformer module for image local features instead of a pre-trained object detector. Through semantic alignment of visual and text features, the vision transformer module matches the performance of pre-trained object detectors for image local features. In addition, the trained multi-modal encoder improves the top-one and top-five ranking performances after retrieval processing. Experiments on the common RSICD and RSITMD datasets demonstrate that our EnVLF obtains state-of-the-art retrieval performance.
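
The top-one and top-five ranking performances mentioned above are conventionally measured as recall@K. A sketch, assuming one ground-truth image per text query and a precomputed text-to-image similarity matrix:

    # Sketch: recall@K for cross-modal retrieval.
    import numpy as np

    def recall_at_k(sim, gt_index, k):
        """sim: (num_texts, num_images); gt_index[i] = correct image for text i."""
        topk = np.argsort(-sim, axis=1)[:, :k]
        return float(np.mean([gt_index[i] in topk[i]
                              for i in range(len(gt_index))]))

    sim = np.random.rand(5, 10)
    gt = np.arange(5)
    print(recall_at_k(sim, gt, 1), recall_at_k(sim, gt, 5))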

21 pages, 8810 KiB  
Article
MBDM: Multinational Banknote Detecting Model for Assisting Visually Impaired People
by Chanhum Park and Kang Ryoung Park
Mathematics 2023, 11(6), 1392; https://doi.org/10.3390/math11061392 - 13 Mar 2023
Cited by 2 | Viewed by 1742
Abstract
With the proliferation of smartphones and advancements in deep learning technologies, object recognition using built-in smartphone cameras has become possible. One application of this technology is assisting visually impaired individuals through the detection of banknotes of multiple national currencies. Previous studies have focused on single-nationality banknote detection; in contrast, this study addresses the practical need to detect banknotes of any nationality. To this end, we propose a multinational banknote detection model (MBDM) together with a method for multinational banknote detection based on mosaic data augmentation. The effectiveness of the MBDM is demonstrated through evaluation on a Korean won (KRW) banknote and coin database built using a smartphone camera, a US dollar (USD) and Euro banknote database, and an open Jordanian dinar (JOD) database. The results show that the MBDM achieves an accuracy of 0.8396, a recall value of 0.9334, and an F1 score of 0.8840, outperforming state-of-the-art methods.
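
Mosaic augmentation itself is easy to sketch: four training images are tiled around a random center into one composite. Bounding-box remapping, which a real detection pipeline needs, is omitted, and the canvas size is an assumption.

    # Sketch: mosaic data augmentation (image part only).
    import numpy as np

    def mosaic(imgs, size=416, seed=None):
        """imgs: four uint8 arrays, each at least (size, size, 3)."""
        rng = np.random.default_rng(seed)
        cx, cy = rng.integers(size // 4, 3 * size // 4, size=2)
        canvas = np.zeros((size, size, 3), dtype=np.uint8)
        regions = [(0, cy, 0, cx), (0, cy, cx, size),
                   (cy, size, 0, cx), (cy, size, cx, size)]
        for img, (y0, y1, x0, x1) in zip(imgs, regions):
            h, w = y1 - y0, x1 - x0
            canvas[y0:y1, x0:x1] = img[:h, :w]   # naive crop; resize in practice
        return canvas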

16 pages, 11259 KiB  
Article
A Fuzzy Plug-and-Play Neural Network-Based Convex Shape Image Segmentation Method
by Xuyuan Zhang, Yu Han, Sien Lin and Chen Xu
Mathematics 2023, 11(5), 1101; https://doi.org/10.3390/math11051101 - 22 Feb 2023
Cited by 3 | Viewed by 1476
Abstract
The task of partitioning convex shape objects from images is a hot research topic, since such objects are widely found in natural images. The difficulty of this task lies in the fact that these objects are usually partly obscured by undesired background scenes. To estimate the whole boundaries of these objects, different neural networks have been designed to ensure the convexity of the corresponding image segmentation results. To make use of well-trained neural networks to improve the performance of convex shape image segmentation, this paper proposes a new image segmentation model in the variational framework. In this model, a fuzzy membership function, instead of a classical binary label function, is employed to indicate image regions. To ensure that fuzzy membership functions approximate binary label functions well, an edge-preserving smoothness regularizer is constructed from an off-the-shelf plug-and-play network denoiser, since an image denoising process can also be seen as an edge-preserving smoothing process. Numerical results show that the proposed method generates better segmentation results on real images and that its results are less affected by initialization than those of classical methods.
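
The plug-and-play idea the regularizer builds on can be sketched as alternating a data-fidelity step with an off-the-shelf denoiser. Here TV denoising stands in for the network denoiser, and the fuzzy-membership machinery of the paper is not shown; the step sizes are illustrative assumptions.

    # Sketch: plug-and-play iteration with a stand-in denoiser.
    from skimage.restoration import denoise_tv_chambolle

    def pnp_smooth(observed, steps=20, tau=0.2, weight=0.05):
        u = observed.astype(float).copy()
        for _ in range(steps):
            u = u - tau * (u - observed)                # grad of 0.5||u - f||^2
            u = denoise_tv_chambolle(u, weight=weight)  # plug-and-play denoiser
        return u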

15 pages, 7152 KiB  
Article
COCM: Co-Occurrence-Based Consistency Matching in Domain-Adaptive Segmentation
by Siyu Zhu, Yingjie Tian, Fenfen Zhou, Kunlong Bai and Xiaoyu Song
Mathematics 2022, 10(23), 4468; https://doi.org/10.3390/math10234468 - 26 Nov 2022
Cited by 2 | Viewed by 1273
Abstract
This paper focuses on domain adaptation in a semantic segmentation task. Traditional methods regard the source domain and the target domain as a whole, with image matching determined by random seeds, leading to a low degree of consistency matching between domains and interfering with the reduction of the domain gap. We therefore designed a two-step, three-level cascaded domain consistency matching strategy: co-occurrence-based consistency matching (COCM). In Step 1, we design a matching strategy from the perspective of category existence and filter, from the images of the whole source domain, the sub-image set with the highest degree of matching. In Step 2, from the perspective of spatial existence, we propose a PIOU score to quantitatively evaluate the spatial matching of co-occurring categories in the sub-image set and select the best-matching source image. The three levels refer to the division of categories into head, middle, and tail levels according to the frequency of their co-occurrences between domains, with priority given to matching tail categories so as to raise the importance of low-frequency categories in the matching process. The proposed COCM maximizes the category-level consistency between the domains and has been proven effective in reducing the domain gap while remaining lightweight. Experimental results on general datasets are comparable with those of state-of-the-art (SOTA) methods.
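
The three-level split can be sketched directly: rank categories by cross-domain co-occurrence frequency and divide them into head, middle, and tail tiers. The one-third split points below are an assumption for illustration.

    # Sketch: head/middle/tail tiering by co-occurrence frequency.
    def tier_categories(cooccurrence_counts):
        """cooccurrence_counts: {category: count} -> {category: tier}."""
        cats = sorted(cooccurrence_counts, key=cooccurrence_counts.get,
                      reverse=True)
        n = len(cats)
        tiers = {}
        for rank, cat in enumerate(cats):
            tiers[cat] = ("head" if rank < n // 3 else
                          "middle" if rank < 2 * n // 3 else "tail")
        return tiers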

17 pages, 4546 KiB  
Article
Reprojection-Based Numerical Measure of Robustness for CT Reconstruction Neural Network Algorithms
by Aleksandr Smolin, Andrei Yamaev, Anastasia Ingacheva, Tatyana Shevtsova, Dmitriy Polevoy, Marina Chukalina, Dmitry Nikolaev and Vladimir Arlazarov
Mathematics 2022, 10(22), 4210; https://doi.org/10.3390/math10224210 - 11 Nov 2022
Cited by 2 | Viewed by 2170
Abstract
In computed tomography, state-of-the-art reconstruction is based on neural network (NN) algorithms. However, NN reconstruction algorithms may not be robust to small noise-like perturbations in the input signal, and a non-robust NN algorithm can produce inaccurate reconstructions with plausible artifacts that cannot be detected. Hence, the robustness of NN algorithms should be investigated and evaluated. There have been several attempts to construct numerical metrics of the robustness of NN reconstruction algorithms; however, these metrics estimate only the probability of easily distinguishable artifacts appearing in the reconstruction, and such artifacts cannot lead to misdiagnosis in clinical applications. In this work, we propose a new method for the numerical estimation of the robustness of NN reconstruction algorithms, based on evaluating the probability that the NN forms selected additional structures during reconstruction that may lead to an incorrect diagnosis. The method outputs a numerical score from 0 to 1 that can be used when benchmarking the robustness of different reconstruction algorithms. We employed the proposed method in a comparative study of seven reconstruction algorithms, five NN-based and two classical. The ResUNet network had the best robustness score (0.65) among the investigated NN algorithms, but its score is still lower than that of the classical SIRT algorithm (0.989). The investigated NN models demonstrated a wide range of robustness scores (0.38–0.65), showing that some of the neural algorithms are not robust.
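
The ingredient such a score is built from, sensitivity of the reconstruction to noise-like input perturbations, can be probed generically as below. `reconstruct` is a hypothetical callable (sinogram to image); the authors' actual score additionally checks for diagnostically relevant structures, which is not reproduced here.

    # Generic sketch: reconstruction sensitivity to input perturbations.
    import numpy as np

    def perturbation_sensitivity(reconstruct, sinogram, sigma=0.01,
                                 trials=10, seed=0):
        rng = np.random.default_rng(seed)
        base = reconstruct(sinogram)          # hypothetical reconstructor
        deviations = []
        for _ in range(trials):
            noisy = sinogram + rng.normal(0.0, sigma, sinogram.shape)
            deviations.append(np.abs(reconstruct(noisy) - base).max())
        return float(np.mean(deviations))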

17 pages, 10586 KiB  
Article
Multibranch Attention Mechanism Based on Channel and Spatial Attention Fusion
by Guojun Mao, Guanyi Liao, Hengliang Zhu and Bo Sun
Mathematics 2022, 10(21), 4150; https://doi.org/10.3390/math10214150 - 6 Nov 2022
Cited by 9 | Viewed by 3063
Abstract
Recently, it has been demonstrated that the performance of an object detection network can be improved by embedding an attention module into it. In this work, we propose a lightweight and effective attention mechanism named multibranch attention (M3Att). For the input feature map, our M3Att first uses a grouped convolutional layer with a pyramid structure for feature extraction, then calculates channel attention and spatial attention simultaneously and fuses them to obtain more complementary features. It is a “plug and play” module that can easily be added to an object detection network, significantly improving its performance with only a small increase in parameters. We demonstrate the effectiveness of M3Att on various challenging object detection tasks, including PASCAL VOC2007, PASCAL VOC2012, KITTI, and the Zhanjiang Underwater Robot Competition. The experimental results show that the method dramatically improves object detection, especially on PASCAL VOC2007, where the mAP of the original network increased by 4.93% when M3Att was embedded in the YOLOv4 (You Only Look Once v4) network.
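
The fusion step, computing channel and spatial attention in parallel and merging them, can be sketched in PyTorch as follows. The pyramid of grouped convolutions that M3Att places in front is omitted, and fusing the two branches by addition is an assumption, so this shows the general pattern rather than the published module.

    # Sketch: parallel channel and spatial attention, fused by addition.
    import torch
    import torch.nn as nn

    class ChannelSpatialFusion(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid())
            self.spatial_gate = nn.Sequential(
                nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

        def forward(self, x):
            ca = x * self.channel_gate(x)                  # channel branch
            s = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)
            sa = x * self.spatial_gate(s)                  # spatial branch
            return ca + sa                                 # fuse branches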

15 pages, 4764 KiB  
Article
An Attention and Wavelet Based Spatial-Temporal Graph Neural Network for Traffic Flow and Speed Prediction
by Shihao Zhao, Shuli Xing and Guojun Mao
Mathematics 2022, 10(19), 3507; https://doi.org/10.3390/math10193507 - 26 Sep 2022
Cited by 3 | Viewed by 3002
Abstract
Traffic flow prediction is essential to the intelligent transportation system (ITS). However, owing to the complex spatial-temporal dependence of traffic flow data, previous approaches to road network and traffic flow modeling have been insufficient at extracting local and global spatial-temporal correlations. This paper proposes an attention and wavelet-based spatial-temporal graph neural network for traffic flow and speed prediction (STAGWNN). It integrates attention and graph wavelet neural networks to capture local and global spatial information, and stacks a gated temporal convolutional network (gated TCN) with a temporal attention mechanism to extract time-series information. Experiments were carried out on real public transportation datasets: PEMS-BAY and PEMSD7(M). The comparison results show that the proposed model outperforms baseline networks on these datasets, indicating that STAGWNN can better capture spatial-temporal correlation information.
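
The gated TCN component can be sketched concisely: two parallel dilated 1-D convolutions, one passed through tanh and one through a sigmoid gate, with left-padding for causality. The graph wavelet and attention parts are omitted, and the layer sizes are assumptions.

    # Sketch: gated temporal convolution (causal, dilated).
    import torch
    import torch.nn as nn

    class GatedTCN(nn.Module):
        def __init__(self, channels, kernel_size=3, dilation=1):
            super().__init__()
            pad = (kernel_size - 1) * dilation   # pad, then trim for causality
            self.filter = nn.Conv1d(channels, channels, kernel_size,
                                    dilation=dilation, padding=pad)
            self.gate = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation, padding=pad)

        def forward(self, x):                    # x: (batch, channels, time)
            t = x.size(-1)
            return (torch.tanh(self.filter(x)[..., :t])
                    * torch.sigmoid(self.gate(x)[..., :t]))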
