Advanced Intelligent Imaging Technology Ⅲ

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 July 2022) | Viewed by 54935

Special Issue Editor


Prof. Dr. Joonki Paik
Guest Editor
Department of Image, Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 06974, Korea
Interests: image enhancement and restoration; computational imaging; intelligent surveillance systems

Special Issue Information

Dear Colleagues,

A general pipeline of visual information processing includes: (i) image sensing and acquisition, (ii) pre-processing, (iii) feature detection or metric estimation, and (iv) high-level decision making. State-of-the-art artificial intelligence has produced a quantum leap in performance at each step of this pipeline. In this context, deep learning-based image processing and computer vision algorithms have been developed rapidly in recent years and now lead the visual information processing field.
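
As a concrete sketch of this four-stage pipeline, the following Python snippet chains acquisition, pre-processing, classical feature detection, and a toy decision rule with OpenCV; the synthetic frame, denoising parameters, and corner-count threshold are illustrative assumptions, not taken from any paper in this issue.

```python
# Minimal sketch of the four-stage visual processing pipeline described above.
import cv2
import numpy as np

# (i) image sensing and acquisition: a synthetic frame stands in for a camera
rng = np.random.default_rng(0)
image = (rng.random((240, 320, 3)) * 255).astype(np.uint8)

# (ii) pre-processing: denoise and convert to grayscale
denoised = cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)

# (iii) feature detection: Shi-Tomasi corners as a stand-in for learned features
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=5)

# (iv) high-level decision: a toy rule on feature density
n = 0 if corners is None else len(corners)
print("textured scene" if n > 100 else "flat scene")
```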

Artificial intelligence-based image signal processing (ISP) technology can drastically enhance acquired digital images through demosaicing, denoising, deblurring, super-resolution, and wide-dynamic-range imaging using deep neural networks. Feature detection and image analysis are the most popular application areas of artificial intelligence. An intelligent imaging system can solve various problems that are unsolvable without such intelligence or learning.

The objective of this Special Issue is to highlight innovative developments in intelligent imaging technology related to state-of-the-art image acquisition, preprocessing, feature detection, and image analysis using machine learning and artificial intelligence. Applications that combine two or more intelligent imaging methods are another important research area. Topics include but are not limited to:

  • Computational photography for intelligent imaging;
  • Visual inspection using machine learning and artificial intelligence;
  • Depth estimation and three-dimensional analysis;
  • Image processing and computer vision algorithms for advanced driver assistance systems (ADAS);
  • Wide-area intelligent surveillance systems using multiple-camera networks;
  • Advanced image signal processor (ISP) based on artificial intelligence;
  • Deep neural networks for inverse imaging problems;
  • Multiple camera collaboration based on reinforcement learning;
  • Fusion of hybrid sensors for intelligent imaging systems;
  • Deep learning architectures for intelligent image processing and computer vision;
  • Learning-based multimodal image processing;
  • Remote sensing and UAV image processing.

Prof. Dr. Joonki Paik
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep neural network (DNN)
  • artificial neural network (ANN)
  • artificial intelligence
  • machine learning
  • deep learning
  • image processing
  • computer vision
  • medical imaging
  • intelligent surveillance systems
  • computational photography
  • computational imaging
  • image signal processor (ISP)
  • camera network
  • visual inspection
  • multimodal imaging
  • medical diagnosis
  • railway inspection
  • visual surveillance
  • satellite imaging
  • thermal imaging…

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (18 papers)


Research

17 pages, 4691 KiB  
Article
Dynamic Hand Gesture Recognition for Smart Lifecare Routines via K-Ary Tree Hashing Classifier
by Hira Ansar, Amel Ksibi, Ahmad Jalal, Mohammad Shorfuzzaman, Abdulmajeed Alsufyani, Suliman A. Alsuhibany and Jeongmin Park
Appl. Sci. 2022, 12(13), 6481; https://doi.org/10.3390/app12136481 - 26 Jun 2022
Cited by 10 | Viewed by 2256
Abstract
In the past few years, home appliances have been influenced by the latest technologies and changes in consumer trends. One of the most desired gadgets of this time is a universal remote control for gestures. Hand gestures are the best way to control home appliances. This paper presents a novel method of recognizing hand gestures for smart home appliances using imaging sensors. The proposed model is divided into six steps. First, preprocessing is performed to de-noise the video frames and resize each frame to a specific dimension. Second, the hand is detected using a single-shot detector-based convolutional neural network (SSD-CNN) model. Third, landmarks are localized on the hand using the skeleton method. Fourth, features are extracted based on point-based trajectories, frame differencing, orientation histograms, and 3D point clouds. Fifth, the features are optimized using fuzzy logic, and last, the H-Hash classifier is used to classify the hand gestures. The system was tested on two benchmark datasets, namely, the IPN hand dataset and the Jester dataset. The recognition accuracy is 88.46% on the IPN hand dataset and 87.69% on the Jester dataset. Users can control their smart home appliances, such as a television, radio, air conditioner, or vacuum cleaner, using the proposed system.
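
As a rough illustration of two of the hand-crafted cues named in this abstract, frame differencing and orientation histograms, the sketch below computes both with NumPy; the array sizes, bin count, and random "frames" are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of frame-differencing and orientation-histogram features.
import numpy as np

def frame_difference(prev_frame: np.ndarray, cur_frame: np.ndarray) -> np.ndarray:
    """Absolute grayscale difference highlighting moving (hand) pixels."""
    return np.abs(cur_frame.astype(np.float32) - prev_frame.astype(np.float32))

def orientation_histogram(patch: np.ndarray, bins: int = 9) -> np.ndarray:
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)          # orientations in [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    return hist / (hist.sum() + 1e-8)                  # L1-normalize

# Toy usage on two random 64x64 "frames"
rng = np.random.default_rng(0)
f0, f1 = rng.integers(0, 255, (2, 64, 64))
feature = orientation_histogram(frame_difference(f0, f1))
print(feature.shape)  # (9,)
```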

18 pages, 6739 KiB  
Article
Extrinsic Behavior Prediction of Pedestrians via Maximum Entropy Markov Model and Graph-Based Features Mining
by Yazeed Yasin Ghadi, Israr Akhter, Hanan Aljuaid, Munkhjargal Gochoo, Suliman A. Alsuhibany, Ahmad Jalal and Jeongmin Park
Appl. Sci. 2022, 12(12), 5985; https://doi.org/10.3390/app12125985 - 12 Jun 2022
Cited by 16 | Viewed by 2229
Abstract
With the technological change and innovation of the current era, data retrieval and processing have become more challenging tasks for researchers. In particular, several types of sensors and cameras are used to collect multimedia data from various resources and domains, and these data have been used in different setups and platforms such as education and communication, emergency services, and surveillance systems. In this paper, we propose a robust method to predict human behavior in indoor and outdoor crowd environments. Taking crowd-based data as input, some preprocessing steps for noise reduction are performed first. Then, human silhouettes are extracted, which eventually help in the identification of human beings. After that, crowd analysis and crowd clustering are applied for more accurate and clear predictions. This step is followed by feature extraction, in which the deep flow, force interaction matrix, and force flow features are extracted. Moreover, we applied a graph mining technique for data optimization, while the maximum entropy Markov model is applied for classification and prediction. The evaluation of the proposed system showed a mean accuracy of 87% and an error rate of 13% for the Avenue dataset, a mean accuracy of 89.50% and an error rate of 10.50% for the University of Minnesota (UMN) dataset, and a mean accuracy of 90.50% and an error rate of 9.50% for the A Day on Campus (ADOC) dataset. These results show a better accuracy rate and a lower error rate compared to state-of-the-art methods.
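
The maximum entropy Markov model used for classification here can be sketched, under common assumptions, as a logistic-regression (maximum-entropy) classifier over the previous label and the current observation, decoded step by step; the feature dimensions, toy data, and greedy decoder below are illustrative stand-ins, not the authors' model.

```python
# Minimal maximum-entropy Markov model (MEMM) sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

N_STATES, DIM = 3, 8   # e.g., 3 behavior classes, 8-D per-frame crowd features

def memm_inputs(prev_state: int, obs: np.ndarray) -> np.ndarray:
    onehot = np.eye(N_STATES)[prev_state]            # encode the previous label
    return np.concatenate([onehot, obs])

# Train on (prev_state, obs) -> state pairs from labeled sequences (toy data here)
rng = np.random.default_rng(1)
X = np.array([memm_inputs(rng.integers(N_STATES), rng.normal(size=DIM)) for _ in range(500)])
y = rng.integers(N_STATES, size=500)
clf = LogisticRegression(max_iter=1000).fit(X, y)    # maximum-entropy classifier

def decode(observations: np.ndarray, start_state: int = 0) -> list[int]:
    """Greedy decoding; Viterbi over P(s_t | s_{t-1}, o_t) would be the full version."""
    states, prev = [], start_state
    for obs in observations:
        prev = int(clf.predict(memm_inputs(prev, obs)[None])[0])
        states.append(prev)
    return states

print(decode(rng.normal(size=(10, DIM))))
```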

24 pages, 5856 KiB  
Article
A Graph-Based Approach to Recognizing Complex Human Object Interactions in Sequential Data
by Yazeed Yasin Ghadi, Manahil Waheed, Munkhjargal Gochoo, Suliman A. Alsuhibany, Samia Allaoua Chelloug, Ahmad Jalal and Jeongmin Park
Appl. Sci. 2022, 12(10), 5196; https://doi.org/10.3390/app12105196 - 20 May 2022
Cited by 11 | Viewed by 2303
Abstract
The critical task of recognizing human–object interactions (HOI) finds application in the domains of surveillance, security, healthcare, assisted living, rehabilitation, sports, and online learning. This has led to the development of various HOI recognition systems in the recent past. The purpose of this study is thus to develop a novel graph-based solution for this task. In particular, the proposed system takes sequential data as input and recognizes the HOI being performed in it. First, the system pre-processes the input data by adjusting the contrast and smoothing the incoming image frames. Then, it locates the human and object through image segmentation. Based on this, 12 key body parts are identified from the extracted human silhouette through a graph-based image skeletonization technique called the image foresting transform (IFT). Then, three types of features are extracted: full-body features, point-based features, and scene features. The next step involves optimizing the different features using isometric mapping (ISOMAP). Lastly, the optimized feature vector is fed to a graph convolutional network (GCN), which performs the HOI classification. The performance of the proposed system was validated using three benchmark datasets, namely, Olympic Sports, MSR Daily Activity 3D, and D3D-HOI. The results show that this model outperforms the existing state-of-the-art models by achieving a mean accuracy of 94.1% on the Olympic Sports, 93.2% on the MSR Daily Activity 3D, and 89.6% on the D3D-HOI datasets.
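
The graph convolutional network at the end of this pipeline builds on a propagation rule that can be written in a few lines; the sketch below implements one Kipf-and-Welling-style layer in NumPy, with the 12-node toy graph and feature sizes as illustrative assumptions.

```python
# Minimal graph convolution layer of the kind a GCN classifier stacks.
import numpy as np

def gcn_layer(adjacency: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One propagation step: ReLU(D^-1/2 (A+I) D^-1/2 . H . W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt               # symmetric normalization
    return np.maximum(norm @ features @ weights, 0.0)    # ReLU

# Toy graph: 12 body-part nodes with 16-D features, projected to 8-D
rng = np.random.default_rng(2)
A = (rng.random((12, 12)) > 0.7).astype(float); A = np.maximum(A, A.T)
H = rng.normal(size=(12, 16))
W = rng.normal(size=(16, 8))
print(gcn_layer(A, H, W).shape)  # (12, 8)
```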

16 pages, 3383 KiB  
Article
A Method Based on Multi-Network Feature Fusion and Random Forest for Foreign Objects Detection on Transmission Lines
by Yanzhen Yu, Zhibin Qiu, Haoshuang Liao, Zixiang Wei, Xuan Zhu and Zhibiao Zhou
Appl. Sci. 2022, 12(10), 4982; https://doi.org/10.3390/app12104982 - 14 May 2022
Cited by 14 | Viewed by 2310
Abstract
Foreign objects such as kites, nests, and balloons suspended on transmission lines may shorten the insulation distance and cause short circuits between phases. A detection method for foreign objects on transmission lines is proposed, which combines multi-network feature fusion and a random forest. First, a foreign object image dataset of balloons, kites, nests, and plastic was established. Then, Otsu binarization threshold segmentation and morphological processing were applied to extract the target region of the foreign object. The features of the target region were extracted by five types of convolutional neural networks (CNNs), GoogLeNet, DenseNet-201, EfficientNet-B0, ResNet-101, and AlexNet, and then fused by a concatenation fusion strategy. Furthermore, the fused features in different schemes were used to train and test the random forest; meanwhile, gradient-weighted class activation mapping (Grad-CAM) was used to visualize the decision region of each network, which verifies the effectiveness of the optimal feature fusion scheme. Simulation results indicate that the detection accuracy of the proposed method can reach 95.88%, outperforming any single-network model. This study provides a reference for the detection of foreign objects suspended on transmission lines.
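
The fuse-then-classify idea, deep features concatenated and passed to a random forest, can be sketched as below with two of the five named backbones; the torchvision models, toy images, and labels are placeholder assumptions, not the paper's dataset or its full five-network fusion.

```python
# Sketch of concatenation fusion of pretrained CNN features + random forest.
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

resnet = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
google = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # expose 2048-D penultimate features
google.fc = torch.nn.Identity()   # expose 1024-D penultimate features
resnet.eval(); google.eval()

@torch.no_grad()
def fused_features(batch: torch.Tensor) -> torch.Tensor:
    """Concatenation fusion of per-image deep features: (N, 2048 + 1024)."""
    return torch.cat([resnet(batch), google(batch)], dim=1)

# Toy usage: 8 fake 224x224 images with fake labels
x = torch.randn(8, 3, 224, 224)
feats = fused_features(x).numpy()
labels = [0, 1, 2, 3, 0, 1, 2, 3]   # e.g., balloon / kite / nest / plastic
rf = RandomForestClassifier(n_estimators=100).fit(feats, labels)
print(rf.predict(feats[:2]))
```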

16 pages, 38773 KiB  
Article
Computational Imaging for Simultaneous Image Restoration and Super-Resolution Image Reconstruction of Single-Lens Diffractive Optical System
by Kai Liu, Xiao Yu, Yongsen Xu, Yulei Xu, Yuan Yao, Nan Di, Yefei Wang, Hao Wang and Honghai Shen
Appl. Sci. 2022, 12(9), 4753; https://doi.org/10.3390/app12094753 - 9 May 2022
Cited by 4 | Viewed by 2179
Abstract
Diffractive optical elements (DOEs) are difficult to apply to natural-scene imaging covering the visible spectral bandwidth due to their strong chromatic aberration and reduced diffraction efficiency; advances in computational imaging make such applications possible. In this paper, an image quality degradation model of a DOE in broadband-spectrum imaging is established to quantitatively analyze its degradation process. We design a DDZMR network for a single-lens diffractive computational imaging system, which can simultaneously perform image restoration and super-resolution reconstruction on degraded images. A multimodal loss function was created to evaluate the reconstruction of the diffraction imaging degradation by the DDZMR network. A physical prototype of the single-lens harmonic diffraction computational imaging system (SHDCIS) was built to verify the imaging performance. SHDCIS testing showed that optical chromatic aberration is corrected by computational reconstruction, and that the computational imaging module can interpret an image and restore it at 1.4 times the resolution. We also evaluated the performance of the DDZMR model using the B100 and Urban100 datasets. The mean Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity (SSIM) values were 32.09/0.8975 and 31.82/0.9247, respectively, which indicates that DDZMR performs comparably to state-of-the-art (SOTA) methods. This work can promote the development and application of diffractive imaging systems for broadband-spectrum imaging of natural scenes.
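
For reference, the two reconstruction metrics quoted above can be computed as follows; the random "restored" image is a placeholder for actual network output.

```python
# PSNR from MSE, and SSIM via scikit-image, on a toy image pair.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(3)
ref = rng.integers(0, 256, (128, 128)).astype(np.uint8)
out = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(ref, out):.2f} dB, SSIM = {structural_similarity(ref, out):.4f}")
```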

17 pages, 5757 KiB  
Article
Detection of Transmission Line Insulator Defects Based on an Improved Lightweight YOLOv4 Model
by Zhibin Qiu, Xuan Zhu, Caibo Liao, Dazhai Shi and Wenqian Qu
Appl. Sci. 2022, 12(3), 1207; https://doi.org/10.3390/app12031207 - 24 Jan 2022
Cited by 60 | Viewed by 5774
Abstract
Defective insulators seriously threaten the safe operation of transmission lines. This paper proposes an insulator defect detection method based on an improved YOLOv4 algorithm. An insulator image sample set was established from aerial images of the power grid and public datasets on the Internet, combined with an image augmentation method based on GraphCut. The insulator images were preprocessed by a Laplace sharpening method. To address the excessive parameter count and low detection speed of the YOLOv4 object detection model, the MobileNet lightweight convolutional neural network was used to improve the YOLOv4 model structure. Combined with transfer learning, the insulator image samples were used to train, verify, and test the improved YOLOv4 model. The detection results on transmission line insulator and defect images show that the detection accuracy and speed of the proposed model reach 93.81% and 53 frames per second (FPS), respectively, and the detection accuracy can be further improved to 97.26% after image preprocessing. The overall performance of the proposed lightweight YOLOv4 model is better than that of traditional object detection algorithms. This study provides a reference for intelligent inspection and defect detection of suspension insulators on transmission lines.
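
The parameter savings behind a MobileNet-lightened backbone come from depthwise separable convolutions; the sketch below contrasts one such block with a standard convolution in PyTorch. It illustrates the general technique, not the paper's exact architecture.

```python
# Depthwise separable convolution: the MobileNet building block.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))     # per-channel spatial filtering
        return self.act(self.bn2(self.pointwise(x)))  # cheap cross-channel mixing

block = DepthwiseSeparableConv(32, 64)
std = nn.Conv2d(32, 64, 3, padding=1)
params = lambda m: sum(p.numel() for p in m.parameters())
print(params(block), "vs", params(std), "parameters")  # far fewer in the separable block
```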

17 pages, 2948 KiB  
Article
Detection of Small Size Traffic Signs Using Regressive Anchor Box Selection and DBL Layer Tweaking in YOLOv3
by Yawar Rehman, Hafsa Amanullah, Dost Muhammad Saqib Bhatti, Waqas Tariq Toor, Muhammad Ahmad and Manuel Mazzara
Appl. Sci. 2021, 11(23), 11555; https://doi.org/10.3390/app112311555 - 6 Dec 2021
Cited by 2 | Viewed by 2627
Abstract
Traffic sign recognition is a key module of autonomous cars and driver assistance systems. Traffic sign detection accuracy and inference time are the two most important parameters. Current methods for traffic sign recognition are very accurate; however, they do not meet the requirements for real-time detection. Others are fast enough for real-time traffic sign detection but fall short in accuracy. This paper proposes an accuracy improvement in the YOLOv3 network, which is a very fast detection framework. The proposed method contributes to the accurate detection of traffic signs that are small with respect to the image size, and helps to reduce false positives and miss rates. In addition, we propose an anchor box selection algorithm that helps in achieving the optimal size and scale of the anchor box. The proposed method therefore supports the detection of small traffic signs in real time, ultimately achieving an optimal balance between accuracy and inference time. The proposed network is evaluated on two publicly available datasets, namely the German Traffic Sign Detection Benchmark (GTSDB) and the Swedish Traffic Sign dataset (STS), and its performance shows that the proposed approach achieves a decent balance between mAP and inference time.
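
The paper's regressive anchor selection algorithm is its own contribution; as background, the sketch below shows the standard YOLO-style k-means clustering of ground-truth box sizes under a 1 - IoU distance, which such algorithms refine. The toy box sizes and k = 9 are illustrative assumptions.

```python
# Standard k-means anchor selection with an IoU-based distance (YOLO-family baseline).
import numpy as np

def iou_wh(boxes: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """IoU between (N, 2) width-height boxes and (K, 2) anchor candidates."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes.prod(1)[:, None] + centers.prod(1)[None, :] - inter
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    rng = np.random.default_rng(0)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centers), axis=1)   # distance = 1 - IoU
        centers = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers[np.argsort(centers.prod(1))]              # sort anchors by area

# Toy ground-truth (width, height) pairs in pixels, skewed toward small signs
wh = np.abs(np.random.default_rng(4).normal(24, 12, (500, 2))) + 4
print(kmeans_anchors(wh, k=9).round(1))
```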

31 pages, 8736 KiB  
Article
Ensemble Deep Learning for the Detection of COVID-19 in Unbalanced Chest X-ray Dataset
by Khin Yadanar Win, Noppadol Maneerat, Syna Sreng and Kazuhiko Hamamoto
Appl. Sci. 2021, 11(22), 10528; https://doi.org/10.3390/app112210528 - 9 Nov 2021
Cited by 15 | Viewed by 3653
Abstract
The ongoing COVID-19 pandemic has had devastating effects on humanity worldwide. With practical advantages and wide accessibility, chest X-rays (CXRs) play vital roles in the diagnosis of COVID-19 and the evaluation of the extent of lung damage incurred by the virus. This study aimed to leverage deep-learning-based methods for the automated classification of COVID-19 from normal and viral pneumonia cases on CXRs, and for the identification of indicative regions of COVID-19 biomarkers. Initially, we preprocessed and segmented the lung regions using the DeepLabV3+ method, and subsequently cropped the lung regions. The cropped lung regions were used as inputs to several deep convolutional neural networks (CNNs) for the prediction of COVID-19. The dataset was highly unbalanced: the vast majority were normal images, with small numbers of COVID-19 and pneumonia images. To remedy the unbalanced distribution and to avoid biased classification results, we applied five different approaches: (i) balancing the classes using a weighted loss; (ii) image augmentation to add more images to the minority cases; (iii) undersampling of the majority classes; (iv) oversampling of the minority classes; and (v) a hybrid resampling approach combining oversampling and undersampling. The best-performing methods from each approach were combined in an ensemble classifier using two voting strategies. Finally, we used the saliency maps of the CNNs to identify the indicative regions of COVID-19 biomarkers, which are deemed useful for interpretability. The algorithms were evaluated using the largest publicly available COVID-19 dataset. An ensemble of the top five CNNs with image augmentation achieved the highest accuracy of 99.23% and an area under the curve (AUC) of 99.97%, surpassing the results of previous studies.
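
Ensemble voting strategies of the kind mentioned here are commonly hard (majority-label) and soft (averaged-probability) voting; the sketch below implements both over toy per-model outputs. The class and model counts are illustrative assumptions.

```python
# Hard and soft voting over per-model CNN predictions.
import numpy as np

def hard_vote(model_preds: np.ndarray) -> np.ndarray:
    """model_preds: (n_models, n_samples) integer class labels."""
    n_classes = int(model_preds.max()) + 1
    counts = np.apply_along_axis(np.bincount, 0, model_preds, minlength=n_classes)
    return counts.argmax(axis=0)          # most-voted class per sample

def soft_vote(model_probs: np.ndarray) -> np.ndarray:
    """model_probs: (n_models, n_samples, n_classes) softmax outputs."""
    return model_probs.mean(axis=0).argmax(axis=1)

rng = np.random.default_rng(5)
probs = rng.dirichlet(np.ones(3), size=(5, 4))   # 5 CNNs, 4 CXRs, 3 classes
print(soft_vote(probs), hard_vote(probs.argmax(axis=2)))
```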

14 pages, 4965 KiB  
Article
Hierarchical Visual Place Recognition Based on Semantic-Aggregation
by Baifan Chen, Xiaoting Song, Hongyu Shen and Tao Lu
Appl. Sci. 2021, 11(20), 9540; https://doi.org/10.3390/app11209540 - 14 Oct 2021
Cited by 4 | Viewed by 1954
Abstract
A major challenge in place recognition is to be robust against viewpoint changes and appearance changes caused by self and environmental variations. Humans achieve this by recognizing objects and their relationships in the scene under different conditions. Inspired by this, we propose a hierarchical visual place recognition pipeline based on semantic aggregation and scene understanding of the images. The pipeline contains coarse matching and fine matching. Semantic aggregation takes place in the residual aggregation of visual and semantic information during coarse matching, and in the semantic association of semantic edges during fine matching. Through these two processes, we realize a robust coarse-to-fine pipeline for visual place recognition across viewpoint and condition variations. Experimental results on benchmark datasets show that our method performs better than several state-of-the-art methods, improving robustness against severe viewpoint and appearance changes while maintaining good matching-time performance. Moreover, we show that it is possible for a computer to realize place recognition based on scene understanding.

12 pages, 3216 KiB  
Article
Novel Application of Long Short-Term Memory Network for 3D to 2D Retinal Vessel Segmentation in Adaptive Optics—Optical Coherence Tomography Volumes
by Christopher T. Le, Dongyi Wang, Ricardo Villanueva, Zhuolin Liu, Daniel X. Hammer, Yang Tao and Osamah J. Saeedi
Appl. Sci. 2021, 11(20), 9475; https://doi.org/10.3390/app11209475 - 12 Oct 2021
Cited by 5 | Viewed by 2385
Abstract
Adaptive optics–optical coherence tomography (AO-OCT) is a non-invasive technique for imaging retinal vascular and structural features at cellular-level resolution. Whereas retinal blood vessel density is an important biomarker for ocular diseases, particularly glaucoma, automated blood vessel segmentation tools for AO-OCT have not yet been explored. One reason for this is that AO-OCT allows for variable input axial dimensions, which are not well accommodated by 2D-2D or 3D-3D segmentation tools. We propose a novel bidirectional long short-term memory (LSTM)-based network for 3D-2D segmentation of blood vessels within AO-OCT volumes. This technique incorporates inter-slice connectivity and allows for variable input slice numbers. We compare the proposed model to a standard 2D UNet segmentation network operating on volume projections. Furthermore, we expanded the proposed LSTM-based network with an additional UNet to evaluate how it refines network performance. We trained, validated, and tested these architectures on 177 AO-OCT volumes collected from 18 control and glaucoma subjects. The LSTM-UNet shows statistically significant improvements (p < 0.05) in AUC (0.88) and recall (0.80) compared to the UNet alone (0.83 and 0.70, respectively). The LSTM-based approaches had longer evaluation times than the UNet alone. This study shows that a bidirectional convolutional LSTM module improves standard automated vessel segmentation in AO-OCT volumes, albeit at a higher time cost.

26 pages, 8719 KiB  
Article
Classification of the Trap-Neuter-Return Surgery Images of Stray Animals Using Yolo-Based Deep Learning Integrated with a Majority Voting System
by Yi-Cheng Huang, Ting-Hsueh Chuang and Yeong-Lin Lai
Appl. Sci. 2021, 11(18), 8578; https://doi.org/10.3390/app11188578 - 15 Sep 2021
Cited by 3 | Viewed by 3001
Abstract
Trap-neuter-return (TNR) has become an effective solution to reduce the prevalence of stray animals. Owing to the non-culling policy for stray cats and dogs in effect since 2017, there is great demand for the sterilization of cats and dogs in Taiwan. In 2020, Heart of Taiwan Animal Care (HOTAC) handled more than 32,000 cases of neutered cats and dogs. HOTAC needs to take pictures to record the ears and excised organs of each neutered cat or dog from different veterinary hospitals. The correctness of the archived medical photos, taken at different shooting and imaging angles by different veterinary hospitals, must be carefully reviewed by human professionals. To reduce the cost of manual review, YOLO-based ensemble deep learning with a majority voting system can effectively identify TNR surgical images, saving 80% of the labor force, with a mean average precision (mAP) exceeding 90%. The best feature extractor among the YOLO models is YOLOv4, whose mAP reaches 91.99%, and its result is integrated into the voting classification. Experimental results show that, compared with the previous manual workflow, the workload can be decreased by more than 80%.
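
The majority-voting step over per-photo detector verdicts for a single neutering case might look like the sketch below; the label names and the two-thirds quorum are illustrative assumptions, not HOTAC's actual review rules.

```python
# Majority voting over per-photo detector verdicts for one case.
from collections import Counter

def case_verdict(photo_labels: list[str], quorum: float = 2 / 3) -> str:
    """Accept the majority label only if it wins at least `quorum` of the votes;
    otherwise flag the case for human review."""
    label, votes = Counter(photo_labels).most_common(1)[0]
    return label if votes / len(photo_labels) >= quorum else "needs-review"

print(case_verdict(["ear-tipped", "ear-tipped", "uncertain"]))   # ear-tipped
print(case_verdict(["ear-tipped", "organ", "uncertain"]))        # needs-review
```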

20 pages, 1221 KiB  
Article
Computer-Aided Diagnosis of Alzheimer’s Disease via Deep Learning Models and Radiomics Method
by Yin Dai, Wenhe Bai, Zheng Tang, Zian Xu and Weibing Chen
Appl. Sci. 2021, 11(17), 8104; https://doi.org/10.3390/app11178104 - 31 Aug 2021
Cited by 5 | Viewed by 2877
Abstract
This paper focuses on the diagnosis of Alzheimer's disease via a combination of deep learning and radiomics methods. We propose a classification model for Alzheimer's disease diagnosis based on improved convolutional neural network models and an image fusion method, and compare it with existing network models. We collected 182 patients from the ADNI and PPMI databases to classify Alzheimer's disease, and reached an AUC of 0.906 when training with single-modality images and 0.941 when training with fusion images, which demonstrates the better performance of the fusion images. This research may promote the application of multimodal images in the diagnosis of Alzheimer's disease: a fusion image dataset based on multi-modality images yields higher diagnostic accuracy than a single-modality image dataset, and deep learning methods and radiomics significantly improve the accuracy of Alzheimer's disease diagnosis.

17 pages, 2797 KiB  
Article
Leaf Spot Attention Networks Based on Spot Feature Encoding for Leaf Disease Identification and Detection
by Chang-Hwan Son
Appl. Sci. 2021, 11(17), 7960; https://doi.org/10.3390/app11177960 - 28 Aug 2021
Cited by 7 | Viewed by 2914
Abstract
This study proposes a new attention-enhanced YOLO model that incorporates a leaf spot attention mechanism based on regions-of-interest (ROI) feature extraction into the YOLO framework for leaf disease detection. Inspired by a previous study, which revealed that leaf spot attention based on ROI-aware feature extraction can significantly improve leaf disease recognition accuracy and outperform state-of-the-art deep learning models, this study extends the leaf spot attention model to leaf disease detection. The primary idea is that spot areas indicating leaf diseases appear only on leaves, whereas the background area contains no useful information regarding leaf diseases. To increase the discriminative power of the feature extractor required in the object detection framework, it is essential to extract informative and discriminative features from the spot and leaf areas. To realize this, a new ROI-aware feature extractor, that is, a spot feature extractor, was designed. To divide the leaf image into spot, leaf, and background areas, the leaf segmentation module was first pretrained, and spot feature encoding was then applied to encode spot information. Next, the ROI-aware feature extractor was connected to an ROI-aware feature fusion layer to model the leaf spot attention mechanism, and then joined with the YOLO detection subnetwork. The experimental results confirm that the proposed ROI-aware feature extractor can improve leaf disease detection by boosting the discriminative power of the spot features. In addition, the proposed attention-enhanced YOLO model outperforms conventional state-of-the-art object detection models.

17 pages, 9117 KiB  
Article
Layer Decomposition Learning Based on Gaussian Convolution Model and Residual Deblurring for Inverse Halftoning
by Chang-Hwan Son
Appl. Sci. 2021, 11(15), 7006; https://doi.org/10.3390/app11157006 - 29 Jul 2021
Cited by 3 | Viewed by 2422
Abstract
Layer decomposition to separate an input image into base and detail layers has long been used for image restoration. Existing residual networks based on an additive model require residual layers with a small output range for fast convergence and visual quality improvement. However, in inverse halftoning, homogeneous dot patterns prevent a small output range for the residual layers. Therefore, a new layer decomposition network based on the Gaussian convolution model (GCM) and a structure-aware deblurring strategy is presented to achieve residual learning for both the base and detail layers. For the base layer, a new GCM-based residual subnetwork is presented. The GCM utilizes a statistical distribution in which the image difference between a blurred continuous-tone image and a blurred halftoned image, each filtered with the same Gaussian filter, falls within a narrow output range. The GCM-based residual subnetwork therefore takes a Gaussian-filtered halftoned image as input and outputs the image difference as a residual, thereby generating the base layer, i.e., the Gaussian-blurred continuous-tone image. For the detail layer, a new structure-aware residual deblurring subnetwork (SARDS) is presented. To remove the Gaussian blurring of the base layer, the SARDS uses the predicted base layer as input and outputs the deblurred version. To more effectively restore image structures such as lines and text, a new image structure map predictor is incorporated into the deblurring network to induce structure-adaptive learning. This paper thus provides a method for realizing residual learning of both the base and detail layers based on the GCM and SARDS. In addition, it is verified that the proposed method surpasses state-of-the-art methods based on U-Net, direct deblurring networks, and progressive residual networks.
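
The GCM's key observation, that blurring both the halftone and its continuous-tone reference with the same Gaussian yields a narrow-range residual, is easy to verify numerically; in the sketch below, the toy dithered image and the blur sigma are illustrative assumptions.

```python
# Numerical check of the GCM quantities described above.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(6)
continuous = rng.random((128, 128))                    # stand-in continuous-tone image
halftone = (rng.random((128, 128)) < continuous).astype(np.float64)  # toy dithering

sigma = 2.0
base_input = gaussian_filter(halftone, sigma)          # network input (blurred halftone)
base_target = gaussian_filter(continuous, sigma)       # desired base layer
residual = base_target - base_input                    # narrow-range residual to learn

print(f"residual range: [{residual.min():.3f}, {residual.max():.3f}]")  # much narrower than [0, 1]
```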

17 pages, 3419 KiB  
Article
Revisiting Low-Resolution Images Retrieval with Attention Mechanism and Contrastive Learning
by Thanh-Vu Dang, Gwang-Hyun Yu and Jin-Young Kim
Appl. Sci. 2021, 11(15), 6783; https://doi.org/10.3390/app11156783 - 23 Jul 2021
Cited by 2 | Viewed by 2434
Abstract
Recent empirical work reveals that visual representations learned by deep neural networks can be successfully used as descriptors for image retrieval. A common technique is to leverage pre-trained models to learn visual descriptors through ranking losses and fine-tuning with labeled data. However, a retrieval system's performance decreases significantly when querying with images of lower resolution than the training images. This study considers a contrastive learning framework fine-tuned on features extracted from a pre-trained neural network encoder equipped with an attention mechanism to address low-resolution image retrieval. Our method is simple yet effective, since the contrastive learning framework drives similar samples close to each other in feature space by manipulating variants of their augmentations. To benchmark the proposed framework, we conducted quantitative and qualitative analyses on the CARS196 (mAP = 0.8804), CUB200-2011 (mAP = 0.9379), and Stanford Online Products (mAP = 0.9141) datasets and analyzed their performance.
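
A contrastive objective of the kind such frameworks fine-tune with can be sketched as an NT-Xent/InfoNCE loss over two augmented views per image; the embedding dimensions and temperature below are illustrative assumptions, not the authors' configuration.

```python
# Minimal NT-Xent/InfoNCE-style contrastive loss over paired views.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2N, D), unit norm
    sim = z @ z.t() / tau                             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    n = z1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)              # positive = the other view

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
print(nt_xent(z1, z2).item())
```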

16 pages, 3626 KiB  
Article
Real-Time Surveillance System for Analyzing Abnormal Behavior of Pedestrians
by Dohun Kim, Heegwang Kim, Yeongheon Mok and Joonki Paik
Appl. Sci. 2021, 11(13), 6153; https://doi.org/10.3390/app11136153 - 2 Jul 2021
Cited by 21 | Viewed by 4923
Abstract
In spite of the excellent performance of deep learning-based computer vision algorithms, their very high computational complexity makes them unsuitable for real-time surveillance aimed at detecting abnormal behavior. In this paper, we propose a real-time surveillance system for abnormal behavior analysis in a closed-circuit television (CCTV) environment by constructing an algorithm and system optimized for that environment. The proposed method combines pedestrian detection and tracking to extract pedestrian information in real time, and detects abnormal behaviors such as intrusion, loitering, falling down, and violence. To analyze an abnormal behavior, it first determines intrusion/loitering through the coordinates of an object and then determines fall-down/violence based on the behavior pattern of the object. The performance of the proposed method is evaluated using an intelligent CCTV dataset distributed by the Korea Internet and Security Agency (KISA).
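
The coordinate-based intrusion/loitering logic described here reduces to a point-in-polygon test plus a dwell-time count; in the sketch below, the restricted zone and the frame threshold are illustrative assumptions rather than KISA evaluation settings.

```python
# Coordinate-based intrusion/loitering rules over a tracked pedestrian.
from matplotlib.path import Path

ZONE = Path([(100, 100), (400, 100), (400, 300), (100, 300)])  # restricted area
LOITER_FRAMES = 150   # e.g., 5 seconds at 30 fps

def analyze_track(track: list[tuple[float, float]]) -> str:
    """track: per-frame (x, y) foot points of one tracked pedestrian."""
    inside = ZONE.contains_points(track)
    if not inside.any():
        return "normal"
    longest = run = 0
    for flag in inside:                   # longest continuous stay in the zone
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return "loitering" if longest >= LOITER_FRAMES else "intrusion"

print(analyze_track([(50, 50)] * 10 + [(200, 200)] * 200))  # loitering
```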

10 pages, 2282 KiB  
Article
Action Recognition Network Using Stacked Short-Term Deep Features and Bidirectional Moving Average
by Jinsol Ha, Joongchol Shin, Hasil Park and Joonki Paik
Appl. Sci. 2021, 11(12), 5563; https://doi.org/10.3390/app11125563 - 16 Jun 2021
Cited by 4 | Viewed by 2128
Abstract
Action recognition requires the accurate analysis of action elements in the form of a video clip and a properly ordered sequence of those elements. To solve these two sub-problems, it is necessary to learn both spatio-temporal information and the temporal relationship between different action elements. Existing convolutional neural network (CNN)-based action recognition methods have focused on learning only spatial or temporal information without considering the temporal relation between action elements. In this paper, we create short-term pixel-difference images from the input video and feed the difference images to a bidirectional exponential moving average sub-network to analyze the action elements and their temporal relations. The proposed method consists of: (i) generation of RGB and differential images, (ii) extraction of deep feature maps using an image classification sub-network, (iii) weight assignment to the extracted feature maps using a bidirectional exponential moving average sub-network, and (iv) late fusion with a three-dimensional convolutional (C3D) sub-network to improve the accuracy of action recognition. Experimental results show that the proposed method achieves a higher performance level than existing baseline methods. In addition, the proposed action recognition network takes only 0.075 seconds per action class, which enables various high-speed and real-time applications such as abnormal action classification, human–computer interaction, and intelligent visual surveillance.
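
A bidirectional exponential moving average over per-frame features, the temporal weighting idea described above, can be sketched in a few lines; the smoothing factor, the simple averaging fusion, and the feature shapes are illustrative assumptions, not the paper's exact sub-network.

```python
# Bidirectional exponential moving average over per-frame feature vectors.
import numpy as np

def bidirectional_ema(features: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """features: (T, D) per-frame feature vectors; returns smoothed (T, D)."""
    fwd, bwd = np.empty_like(features), np.empty_like(features)
    fwd[0], bwd[-1] = features[0], features[-1]
    for t in range(1, len(features)):
        fwd[t] = alpha * features[t] + (1 - alpha) * fwd[t - 1]         # past -> future
        bwd[-t - 1] = alpha * features[-t - 1] + (1 - alpha) * bwd[-t]  # future -> past
    return 0.5 * (fwd + bwd)   # fuse both temporal directions

x = np.random.default_rng(7).normal(size=(32, 512))   # 32 frames, 512-D features
print(bidirectional_ema(x).shape)  # (32, 512)
```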

14 pages, 635 KiB  
Article
CF-CNN: Coarse-to-Fine Convolutional Neural Network
by Jinho Park, Heegwang Kim and Joonki Paik
Appl. Sci. 2021, 11(8), 3722; https://doi.org/10.3390/app11083722 - 20 Apr 2021
Cited by 6 | Viewed by 4086
Abstract
In this paper, we present a coarse-to-fine convolutional neural network (CF-CNN) for learning multilabel classes. The basis of the proposed CF-CNN is a disjoint grouping method that first creates class groups with hierarchical associations and then assigns a new label to each class within a group, so that each class acquires multiple labels. CF-CNN consists of one main network and two subnetworks. Each subnetwork performs coarse prediction using the group labels created by the disjoint grouping method. The main network includes a refine convolution layer and performs fine prediction by fusing the feature maps acquired from the subnetworks. The class set generated at the upper level has the same classification boundary as that at the lower level. Since the classes belonging to an upper-level label are classified with a higher priority, parameter optimization becomes easier. In the experimental results, the proposed method is applied to various classification tasks and shows a classification accuracy higher by up to 3%, with a much smaller number of parameters, and without modification of the baseline model.
