Convolutional Neural Networks and Vision Applications, Volume II

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 49666

Special Issue Editors

School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
Interests: computer vision and pattern recognition

Special Issue Information

Dear Colleagues,

Processing speed is critical for visual inspection automation and mobile visual computing applications. Many powerful and sophisticated computer vision algorithms generate accurate results but require high computational power or resources and are not entirely suitable for real-time vision applications. On the other hand, there are vision algorithms and convolutional neural networks that perform at camera frame rates with moderately reduced accuracy, an approach that is arguably better suited to real-time vision applications. This Special Issue is dedicated to research on the design, optimization, and implementation of machine-learning-based vision algorithms and convolutional neural networks that are suitable for real-time vision applications.

General topics covered in this Special Issue include but are not limited to:

  • Optimization of software-based vision algorithms;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Embedded vision systems for applications that require real-time performance;
  • Machine vision applications that require real-time performance.

Prof. Dr. D. J. Lee
Prof. Dr. Dong Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (20 papers)


Research

18 pages, 2275 KiB  
Article
Intelligent Decision Support System for Differential Diagnosis of Chronic Odontogenic Rhinosinusitis Based on U-Net Segmentation
by Victoria Alekseeva, Alina Nechyporenko, Marcus Frohme, Vitaliy Gargin, Ievgen Meniailov and Dmytro Chumachenko
Electronics 2023, 12(5), 1202; https://doi.org/10.3390/electronics12051202 - 2 Mar 2023
Cited by 26 | Viewed by 1958
Abstract
Chronic odontogenic rhinosinusitis accounts for 40% of all cases of chronic rhinosinusitis. Using automated information systems for differential diagnosis will improve the efficiency of doctors' decision-making when diagnosing chronic odontogenic rhinosinusitis. Therefore, this study aimed to develop an intelligent decision support system for the differential diagnosis of chronic odontogenic rhinosinusitis based on computer vision methods. A dataset of 162 multi-spiral computed tomography (MSCT) images was collected and processed, and a deep learning model for image segmentation was developed. A U-Net architecture with 23 convolutional layers was used to segment MSCT data with odontogenic maxillary sinusitis. In the proposed model, each pair of repeated 3 × 3 convolution layers is followed by an Exponential Linear Unit instead of a Rectified Linear Unit as the activation function. The model achieved an accuracy of 90.09%. To build the decision support system, an intelligent chatbot conducts an automated patient survey and collects patient examination data from several doctors of various profiles. The intelligent information system proposed in this study combines the image-processing model with patient interview and examination data, improving the efficiency of physicians' decision-making in the differential diagnosis of chronic odontogenic rhinosinusitis. The proposed solution is the first comprehensive solution in this area. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
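To make the architectural change described in the abstract concrete, the following minimal PyTorch sketch shows a U-Net-style double-convolution block with ELU activations in place of ReLU. The channel counts and the surrounding 23-layer topology are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the modification described above: a U-Net encoder block in
# which each pair of 3x3 convolutions uses ELU instead of ReLU. Layer sizes are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

class DoubleConvELU(nn.Module):
    """Two 3x3 convolutions, each followed by an ELU activation."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Example: one encoder stage on a single-channel CT slice.
x = torch.randn(1, 1, 256, 256)
features = DoubleConvELU(1, 64)(x)      # (1, 64, 256, 256)
pooled = nn.MaxPool2d(2)(features)      # downsample before the next stage
```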

20 pages, 2623 KiB  
Article
HBCA: A Toolchain for High-Accuracy Branch-Fused CNN Accelerator on FPGA with Dual-Decimal-Fused Technique
by Zhengjie Li, Lingli Hou, Xinxuan Tao, Jian Wang and Jinmei Lai
Electronics 2023, 12(1), 192; https://doi.org/10.3390/electronics12010192 - 30 Dec 2022
Cited by 1 | Viewed by 1773
Abstract
The programmability of FPGAs suits the constantly changing convolutional neural network (CNN). However, several challenges arise when previous FPGA-based accelerators update the CNN. First, although the RepVGG model can balance accuracy and speed, it supports only two types of kernels. Meanwhile, the 8-bit integer-only quantization of PyTorch, which can support various CNNs, is seldom supported successfully by FPGA-based accelerators. In addition, Winograd F(4 × 4, 3 × 3) uses fewer multiplications, but its transformation matrix contains irregular decimals, which can lead to accuracy problems. To tackle these issues, this paper proposes the High-accuracy Branch-fused CNN Accelerator (HBCA): a toolchain and a corresponding FPGA-based accelerator. The toolchain proposes an inception-based branch-fused technique, which can support more types of kernels, while the accelerator proposes Winograd-quantization dual decimal-fused techniques to balance accuracy and speed. In addition, the accelerator supports multiple types of kernels and proposes Winograd decomposed-part reuse, multi-mode BRAM & DSP, and data reuse to increase power efficiency. Experiments show that HBCA is capable of supporting seven CNNs with different types of kernels and more branches. The accuracy loss is within 0.1% when compared to the quantized model. Furthermore, the power efficiency (GOPS/W) for Inception, ResNet and VGG reaches up to 226.6, 188.1 and 197.7, respectively, which is better than other FPGA-based CNN accelerators. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

15 pages, 4122 KiB  
Article
A Player-Specific Framework for Cricket Highlights Generation Using Deep Convolutional Neural Networks
by Rabbia Mahum, Aun Irtaza, Saeed Ur Rehman, Talha Meraj and Hafiz Tayyab Rauf
Electronics 2023, 12(1), 65; https://doi.org/10.3390/electronics12010065 - 24 Dec 2022
Cited by 2 | Viewed by 2265
Abstract
Automatic video summarization is a key technique for managing the huge volume of video content available nowadays. The aim of video summaries is to provide viewers with the important information in less time. Some techniques exist for video summarization in the cricket domain; however, to the best of our knowledge, our proposed model is the first to deal successfully with player-specific summaries in cricket videos. In this study, we provide a novel framework and a valuable technique for cricket video summarization and classification. For player-specific summaries, the proposed technique exploits the presence of the Score Caption (SC) in frames. In the first stage, optical character recognition (OCR) is applied to extract a text summary from the SC and find all frames of the specific player, from the Start Frame (SF) to the Last Frame (LF). In the second stage, various frames of cricket videos are used to train the supervised AlexNet classifier with class labels, i.e., positive and negative, for binary classification. A pre-trained network is trained for binary classification of the frames obtained from the first phase that exhibit the performance of the specific player along with some additional scenes. In the third phase, a person identification technique is employed to recognize frames containing the specific player. The frames are then cropped, SIFT features are extracted from the identified person, and the frames are clustered using the fuzzy c-means method. The purpose of the third phase is to further refine the video summaries, as the frames obtained in the second stage also included the partner player's frames. The proposed framework is validated on a cricket video dataset. Additionally, the technique is very efficient and useful for broadcasting cricket video highlights of a specific player. The experimental results show that our proposed method surpasses previously reported results, achieving an overall accuracy of up to 95%. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

17 pages, 5844 KiB  
Article
Computer Vision-Based Kidney’s (HK-2) Damaged Cells Classification with Reconfigurable Hardware Accelerator (FPGA)
by Arfan Ghani, Rawad Hodeify, Chan H. See, Simeon Keates, Dah-Jye Lee and Ahmed Bouridane
Electronics 2022, 11(24), 4234; https://doi.org/10.3390/electronics11244234 - 19 Dec 2022
Cited by 6 | Viewed by 3205
Abstract
In medical and health sciences, the detection of cell injury plays an important role in diagnosis, personal treatment and disease prevention. Despite recent advancements in tools and methods for image classification, it is challenging to classify cell images with higher precision and accuracy. Cell classification based on computer vision offers significant benefits in biomedicine and healthcare. There have been studies reported where cell classification techniques have been complemented by Artificial Intelligence-based classifiers such as Convolutional Neural Networks. These classifiers suffer from the drawback of the scale of computational resources required for training and hence do not offer real-time classification capabilities for an embedded system platform. Field Programmable Gate Arrays (FPGAs) offer the flexibility of hardware reconfiguration and have emerged as a viable platform for algorithm acceleration. Given that the logic resources and on-chip memory available on a single device are still limited, hardware/software co-design is proposed where image pre-processing and network training were performed in software, and trained architectures were mapped onto an FPGA device (Nexys4DDR) for real-time cell classification. This paper demonstrates that the embedded hardware-based cell classifier performs with almost 100% accuracy in detecting different types of damaged kidney cells. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

15 pages, 2623 KiB  
Article
Metric-Based Key Frame Extraction for Gait Recognition
by Tuanjie Wei, Rui Li, Huimin Zhao, Rongjun Chen, Jin Zhan, Huakang Li and Jiwei Wan
Electronics 2022, 11(24), 4177; https://doi.org/10.3390/electronics11244177 - 14 Dec 2022
Viewed by 1496
Abstract
Gait recognition is one of the most promising biometric technologies for identifying individuals at a long distance. From observation, we find that there are differences in the length of the gait cycle and in the quality of each frame in a sequence. In this paper, we propose a novel gait recognition framework to analyze human gait. On the one hand, we design the Multi-scale Temporal Aggregation (MTA) module, which models temporal information and aggregates contextual information at different scales; on the other hand, we introduce the Metric-based Frame Attention Mechanism (MFAM) to re-weight each frame by an importance score, which is calculated using the distance between frame-level features and sequence-level features. We evaluate our model on two of the most popular public datasets, CASIA-B and OU-MVLP. For normal walking, the rank-1 accuracies on the two datasets are 97.6% and 90.1%, respectively. In complex scenarios, the proposed method achieves accuracies of 94.8% and 84.9% on CASIA-B under bag-carrying and coat-wearing walking conditions. The results show that our method is at the top level among state-of-the-art methods. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
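The frame re-weighting idea behind MFAM can be illustrated with a hedged PyTorch sketch in which the sequence-level feature is taken as the mean of the frame-level features and importance scores come from a softmax over negative distances. The paper's exact score definition may differ.

```python
# Sketch of metric-based frame re-weighting: frames whose features lie closer
# to the sequence-level feature receive larger weights.
import torch
import torch.nn.functional as F

def metric_frame_attention(frame_feats: torch.Tensor) -> torch.Tensor:
    """frame_feats: (B, T, C) frame-level features -> (B, C) re-weighted sequence feature."""
    seq_feat = frame_feats.mean(dim=1, keepdim=True)     # (B, 1, C) sequence-level feature
    dist = torch.norm(frame_feats - seq_feat, dim=-1)    # (B, T) distance per frame
    weights = F.softmax(-dist, dim=1).unsqueeze(-1)      # closer frames -> higher importance
    return (weights * frame_feats).sum(dim=1)            # (B, C)

pooled = metric_frame_attention(torch.randn(2, 30, 256))
```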

16 pages, 1871 KiB  
Article
Semi-Supervised Group Emotion Recognition Based on Contrastive Learning
by Jiayi Zhang, Xingzhi Wang, Dong Zhang and Dah-Jye Lee
Electronics 2022, 11(23), 3990; https://doi.org/10.3390/electronics11233990 - 1 Dec 2022
Cited by 6 | Viewed by 1946
Abstract
The performance of all learning-based group emotion recognition (GER) methods depends on the number of labeled samples. Although many group emotion images are available on the Internet, labeling them manually is a labor-intensive and costly process. For this reason, datasets for GER are usually small, which limits GER performance. Since manual labeling is challenging, using a limited number of labeled images together with a large number of unlabeled images in network training is a potential way to improve GER performance. In this work, we propose a semi-supervised group emotion recognition framework based on contrastive learning to learn efficient features from both labeled and unlabeled images. In the proposed method, the unlabeled images are used to pretrain the backbone with a contrastive learning method, and the labeled images are used to fine-tune the network. The unlabeled images are then given pseudo-labels by the fine-tuned network and used for further training. To alleviate the uncertainty of the assigned pseudo-labels, we propose a Weight Cross-Entropy Loss (WCE-Loss) to suppress the influence of samples with unreliable pseudo-labels during training. Experimental results on three prominent benchmark datasets for GER show the effectiveness of the proposed framework and its superiority over other competitive state-of-the-art methods. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
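The pseudo-label weighting idea can be illustrated with a short PyTorch sketch. Here the per-sample weight is simply the fine-tuned network's softmax confidence for the assigned pseudo-label, which is an assumption for illustration; the paper's WCE-Loss may define the weight differently.

```python
# Cross-entropy weighted by pseudo-label reliability: unreliable pseudo-labels
# contribute less to the training signal.
import torch
import torch.nn.functional as F

def weighted_ce_loss(logits: torch.Tensor, pseudo_labels: torch.Tensor,
                     confidence: torch.Tensor) -> torch.Tensor:
    """logits: (N, K); pseudo_labels: (N,); confidence: (N,) in [0, 1]."""
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (confidence * per_sample).mean()

# Typical usage: pseudo-label unlabeled images with the fine-tuned network.
with torch.no_grad():
    probs = F.softmax(torch.randn(8, 3), dim=1)        # stand-in for model outputs
    confidence, pseudo_labels = probs.max(dim=1)
loss = weighted_ce_loss(torch.randn(8, 3, requires_grad=True), pseudo_labels, confidence)
loss.backward()
```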

17 pages, 6277 KiB  
Article
Automatic Knee Injury Identification through Thermal Image Processing and Convolutional Neural Networks
by Omar Trejo-Chavez, Juan P. Amezquita-Sanchez, Jose R. Huerta-Rosales, Luis A. Morales-Hernandez, Irving A. Cruz-Albarran and Martin Valtierra-Rodriguez
Electronics 2022, 11(23), 3987; https://doi.org/10.3390/electronics11233987 - 1 Dec 2022
Cited by 2 | Viewed by 2223
Abstract
Knee injury is a common health problem that affects both people who practice sports and those who do not. The high prevalence of knee injuries has a considerable impact on patients' health-related quality of life. For this reason, it is essential to develop procedures for early diagnosis, allowing patients to receive timely treatment to prevent and correct knee injuries. In this regard, this paper presents, as its main contribution, a methodology based on infrared thermography (IT) and convolutional neural networks (CNNs) to automatically differentiate between a healthy knee and an injured knee, offering an alternative tool to help medical specialists. In general, the methodology consists of three steps: (1) database generation, (2) image processing, and (3) design and validation of a CNN for automatically identifying a patient with an injured knee. In the image-processing stage, grayscale images, equalized images, and thermal images are obtained as inputs for the CNN, with which the proposed method obtains 98.72% accuracy. To test its robustness, different infrared images with changes in rotation angle and different brightness levels (i.e., possible conditions at the time of imaging) are used, obtaining 97.44% accuracy. These results demonstrate the effectiveness and robustness of the proposal for differentiating between a patient with a healthy knee and one with an injured knee, with the advantages of using a fast, low-cost, innocuous, and non-invasive technology. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

21 pages, 1260 KiB  
Article
Multi-Model Inference Accelerator for Binary Convolutional Neural Networks
by André L. de Sousa, Mário P. Véstias and Horácio C. Neto
Electronics 2022, 11(23), 3966; https://doi.org/10.3390/electronics11233966 - 30 Nov 2022
Cited by 3 | Viewed by 1768
Abstract
Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. A further reduction in the complexity of a BCNN model for fast execution can be achieved by reducing the model size, at the cost of network accuracy. In this paper, a multi-model inference technique is proposed to reduce the execution time of the binarized inference process without reducing accuracy. The technique considers a cascade of neural network models with different computation/accuracy ratios. A parameterizable binarized neural network with different trade-offs between complexity and accuracy is used to obtain multiple network models. We also propose a hardware accelerator to run multi-model inference in embedded systems. The multi-model inference accelerator is demonstrated on low-density Zynq-7010 and Zynq-7020 FPGA devices, classifying images from the CIFAR-10 dataset. The proposed accelerator improves the frame rate per number of LUTs by 7.2× over previous solutions on a Zynq-7020 FPGA with similar accuracy. This shows the effectiveness of the multi-model inference technique and the efficiency of the proposed hardware accelerator. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
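The cascade principle behind multi-model inference can be sketched in a few lines of Python/PyTorch: a small binarized model answers first, and only samples it is unsure about are re-evaluated by a larger, more accurate model. The confidence threshold and the models themselves are placeholders, not the paper's FPGA implementation.

```python
# Cascaded multi-model inference: fast model first, accurate model as fallback.
import torch
import torch.nn.functional as F

def cascade_predict(x, small_model, large_model, threshold: float = 0.9):
    probs = F.softmax(small_model(x), dim=1)
    conf, pred = probs.max(dim=1)
    uncertain = conf < threshold                     # only these go to the large model
    if uncertain.any():
        pred[uncertain] = large_model(x[uncertain]).argmax(dim=1)
    return pred
```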

30 pages, 3531 KiB  
Article
Hybrid CNN and XGBoost Model Tuned by Modified Arithmetic Optimization Algorithm for COVID-19 Early Diagnostics from X-ray Images
by Miodrag Zivkovic, Nebojsa Bacanin, Milos Antonijevic, Bosko Nikolic, Goran Kvascev, Marina Marjanovic and Nikola Savanovic
Electronics 2022, 11(22), 3798; https://doi.org/10.3390/electronics11223798 - 18 Nov 2022
Cited by 86 | Viewed by 4637
Abstract
Developing countries have faced numerous obstacles in diagnosing COVID-19 since the emergence of the worldwide pandemic. One of the most important ways to control the spread of this disease is early detection, which allows isolation and treatment to be started. According to recent results, chest X-ray scans provide important information about the onset of the infection, and this information can be evaluated so that diagnosis and treatment begin sooner. This is where artificial intelligence meets skilled clinicians' diagnostic abilities. The goal of the proposed study is to contribute to battling the worldwide epidemic by using a simple convolutional neural network (CNN) model to construct an automated image analysis framework for recognizing COVID-19-affected chest X-ray data. To improve classification accuracy, the fully connected layers of the simple CNN were replaced by the efficient extreme gradient boosting (XGBoost) classifier, which categorizes the features extracted by the convolutional layers. Additionally, a hybrid version of the arithmetic optimization algorithm (AOA), also developed for this research, is used to tune the XGBoost hyperparameters for COVID-19 chest X-ray images. The reported experimental data show that this approach outperforms other state-of-the-art methods, including other cutting-edge metaheuristic algorithms tested in the same framework. For validation purposes, a balanced X-ray image dataset with 12,000 observations, belonging to normal, COVID-19 and viral pneumonia classes, was used. The proposed method, in which XGBoost is tuned by the introduced hybrid AOA, showed superior performance, achieving a classification accuracy of approximately 99.39% and weighted average precision, recall and F1-score of 0.993889, 0.993887 and 0.993887, respectively. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
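The hybrid CNN-plus-XGBoost pipeline can be sketched as follows, assuming Python with PyTorch/torchvision and the xgboost package. A generic pretrained backbone stands in for the paper's simple CNN, and the XGBoost hyperparameters are placeholders (the paper tunes them with a hybrid AOA).

```python
# Convolutional layers act as a feature extractor; XGBoost replaces the
# fully connected classification head.
import numpy as np
import torch
import torchvision.models as models
from xgboost import XGBClassifier

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep convolutional features only
backbone.eval()

def extract_features(images: torch.Tensor) -> np.ndarray:
    with torch.no_grad():
        return backbone(images).numpy()    # (N, 512) feature vectors

# X_train: preprocessed (N, 3, 224, 224) chest X-ray tensors; y_train: class ids.
X_train, y_train = torch.randn(16, 3, 224, 224), np.random.randint(0, 3, 16)
clf = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
clf.fit(extract_features(X_train), y_train)
preds = clf.predict(extract_features(torch.randn(4, 3, 224, 224)))
```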

12 pages, 5132 KiB  
Article
SlowFast Action Recognition Algorithm Based on Faster and More Accurate Detectors
by Wei Zeng, Junjian Huang, Wei Zhang, Hai Nan and Zhenjiang Fu
Electronics 2022, 11(22), 3770; https://doi.org/10.3390/electronics11223770 - 16 Nov 2022
Cited by 3 | Viewed by 2865
Abstract
Object detection algorithms play a crucial role in other vision tasks. This paper finds that FasterRCNN (Faster Region Convolutional Neural Network), the detector used by the action recognition algorithm SlowFast, has disadvantages in terms of both detection accuracy and speed, and that the traditional IOU (Intersection over Union) localization loss makes it difficult for the detection model to converge to a stable minimum. To solve these problems, the article uses YOLOv3 (You Only Look Once), YOLOX, and CascadeRCNN to improve the detection accuracy and speed of SlowFast. This paper also proposes a new localization loss function that adopts the Lance and Williams distance as a new penalty term. The new loss function is more sensitive when the distance difference is smaller, a property that is very suitable for the late convergence stage of the detection model. Experiments were conducted on the VOC (Visual Object Classes) and COCO datasets. In the final video tests, YOLOv3 improved the detection speed by 10.5 s. CascadeRCNN improved AP by 3.1% compared to FasterRCNN on the COCO dataset, and YOLOX's performance on COCO is also mostly better than that of FasterRCNN. The new LIOU (Lance and Williams Distance Intersection over Union) localization loss function performs better than other loss functions on the VOC dataset. These results indicate that improving the detection algorithm of SlowFast is crucial and that the proposed loss function is indeed effective. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
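As a rough illustration of an IoU loss augmented with a Lance and Williams distance penalty, the sketch below (PyTorch) adds the Lance-Williams (Bray-Curtis) distance between predicted and ground-truth box centers to the standard 1 - IoU term. The exact penalty used in the paper's LIOU loss may be formulated differently.

```python
# IoU-based box regression loss with an assumed Lance-Williams center penalty.
# Boxes are (x1, y1, x2, y2).
import torch

def liou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection over Union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Lance-Williams distance between box centers (assumed penalty form):
    # more sensitive than a squared distance when the center gap is small.
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    lw = (cp - ct).abs().sum(dim=1) / ((cp.abs() + ct.abs()).sum(dim=1) + eps)

    return (1 - iou + lw).mean()
```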

16 pages, 4372 KiB  
Article
Research on Defect Detection in Automated Fiber Placement Processes Based on a Multi-Scale Detector
by Yongde Zhang, Wei Wang, Qi Liu, Zhonghua Guo and Yangchun Ji
Electronics 2022, 11(22), 3757; https://doi.org/10.3390/electronics11223757 - 16 Nov 2022
Cited by 5 | Viewed by 2531
Abstract
Various surface defects in automated fiber placement (AFP) processes affect the forming quality of the components. In addition, defect detection usually requires manual observation with the naked eye, which leads to low production efficiency. Therefore, automatic solutions for defect recognition have high economic potential. In this paper, we propose a multi-scale AFP defect detection algorithm, named the spatial pyramid feature fusion YOLOv5 with channel attention (SPFFY-CA). The spatial pyramid feature fusion YOLOv5 (SPFFY) adopts spatial pyramid dilated convolutions (SPDCs) to fuse the feature maps extracted in different receptive fields, thus integrating multi-scale defect information. For the feature maps obtained from a concatenate function, channel attention (CA) can improve the representation ability of the network and generate more effective features. In addition, the sparsity training and pruning (STP) method is utilized to achieve network slimming, thus ensuring the efficiency and accuracy of defect detection. The experimental results of the PASCAL VOC and our AFP defect datasets demonstrate the effectiveness of our scheme, which achieves superior performance. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
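A spatial pyramid of dilated convolutions of the kind described above can be sketched as parallel 3 × 3 convolutions with different dilation rates whose outputs are concatenated and fused by a 1 × 1 convolution. The dilation rates and channel counts below are illustrative, not the paper's SPDC settings.

```python
# Parallel dilated 3x3 convolutions see different receptive fields; their
# outputs are concatenated and fused to integrate multi-scale information.
import torch
import torch.nn as nn

class SpatialPyramidDilatedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = SpatialPyramidDilatedConv(256, 256)(torch.randn(1, 256, 40, 40))
```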

14 pages, 3090 KiB  
Article
Food Recognition and Food Waste Estimation Using Convolutional Neural Network
by Jelena Lubura, Lato Pezo, Mirela Alina Sandu, Viktoria Voronova, Francesco Donsì, Jana Šic Žlabur, Bojan Ribić, Anamarija Peter, Jona Šurić, Ivan Brandić, Marija Klõga, Sanja Ostojić, Gianpiero Pataro, Ana Virsta, Ana Elisabeta Oros (Daraban), Darko Micić, Saša Đurović, Giovanni De Feo, Alessandra Procentese and Neven Voća
Electronics 2022, 11(22), 3746; https://doi.org/10.3390/electronics11223746 - 15 Nov 2022
Cited by 9 | Viewed by 4478
Abstract
In this study, an evaluation of food waste generation was conducted using images taken before and after the daily meals of people aged between 20 and 30 years in Serbia, for the period between 1 January and 31 April 2022. A convolutional neural network (CNN) was employed for the tasks of recognizing food images before the meal and estimating the percentage of food waste from the photographs taken. Keeping in mind the vast variety and types of food available, the recognition and validation of food items in images is generally a very challenging task. Nevertheless, deep learning has recently been shown to be a very potent image recognition procedure, and the CNN represents a state-of-the-art deep learning method. The CNN technique was applied to the food detection and food waste estimation tasks through a parameter optimization procedure. Images of the most frequently encountered food items were collected from the internet to create an image dataset covering 157 food categories, which was used to evaluate recognition performance. Each category included between 50 and 200 images, and the total number of images in the database reached 23,552. The CNN model showed good prediction capabilities, with an accuracy of 0.988 and a loss of 0.102 after the network training cycle. According to the images collected for food waste evaluation, the average food waste per meal in the analyzed sample in Serbia was 21.3%. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

13 pages, 2145 KiB  
Article
On the Optimization of Machine Learning Techniques for Chaotic Time Series Prediction
by Astrid Maritza González-Zapata, Esteban Tlelo-Cuautle and Israel Cruz-Vega
Electronics 2022, 11(21), 3612; https://doi.org/10.3390/electronics11213612 - 5 Nov 2022
Cited by 8 | Viewed by 2323
Abstract
Interest in chaotic time series prediction has grown in recent years due to its many applications in fields such as climate and health. In this work, we summarize the contributions of multiple works that use different machine learning (ML) methods to predict chaotic time series. The challenge is to predict a longer horizon with low error, and for this task most authors use datasets generated by chaotic systems such as Lorenz, Rössler and Mackey–Glass. Among the machine learning methods classified and described, this work takes the Echo State Network (ESN) as a case study to show that its optimization can enhance the prediction horizon of chaotic time series. Different optimization methods applied to different machine learning techniques are reviewed to show that metaheuristics are a good option for optimizing an ESN. Accordingly, an ESN in closed-loop mode is optimized herein by applying Particle Swarm Optimization. The prediction results of the optimized ESN show an increase of about twice the number of steps ahead, highlighting the usefulness of optimizing the hyperparameters of an ML method to increase the prediction horizon. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

17 pages, 2615 KiB  
Article
LRSE-Net: Lightweight Residual Squeeze-and-Excitation Network for Stenosis Detection in X-ray Coronary Angiography
by Emmanuel Ovalle-Magallanes, Juan Gabriel Avina-Cervantes, Ivan Cruz-Aceves and Jose Ruiz-Pinales
Electronics 2022, 11(21), 3570; https://doi.org/10.3390/electronics11213570 - 1 Nov 2022
Cited by 6 | Viewed by 1981
Abstract
Coronary heart disease is the primary cause of death worldwide, and ischemic heart disease and stroke are the most common conditions induced by coronary stenosis. This study presents a Lightweight Residual Squeeze-and-Excitation Network (LRSE-Net) for stenosis classification in X-ray coronary angiography images. The proposed model employs redundant kernel deletion and tensor decomposition via depthwise separable convolutions to reduce the model parameters by up to 48.6× with respect to a vanilla Residual Squeeze-and-Excitation Network. Furthermore, the reduction ratio of each Squeeze-and-Excitation module is optimized individually to improve feature recalibration. Experimental results for stenosis detection on the publicly available Deep Stenosis Detection Dataset and an angiographic dataset demonstrate that the proposed LRSE-Net achieves the best accuracy (0.9549/0.9543), sensitivity (0.6320/0.8792), precision (0.5991/0.8944), and F1-score (0.6103/0.8944), as well as a competitive specificity of 0.9620/0.9733. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
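For readers unfamiliar with squeeze-and-excitation, the block below shows the standard formulation in PyTorch and the role of the reduction ratio that LRSE-Net optimizes per module. It is the generic SE block, not the authors' lightweight variant with depthwise separable convolutions.

```python
# Standard squeeze-and-excitation block: global pooling (squeeze) followed by a
# bottleneck MLP (excitation) that produces per-channel recalibration weights.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze: global spatial context
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                   # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.excite(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

recalibrated = SEBlock(64, reduction=8)(torch.randn(1, 64, 32, 32))
```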

16 pages, 705 KiB  
Article
Quantum Chaotic Honey Badger Algorithm for Feature Selection
by Samah Alshathri, Mohamed Abd Elaziz, Dalia Yousri, Osama Farouk Hassan and Rehab Ali Ibrahim
Electronics 2022, 11(21), 3463; https://doi.org/10.3390/electronics11213463 - 26 Oct 2022
Cited by 8 | Viewed by 2146
Abstract
Determining the most relevant features is a critical pre-processing step in various fields to enhance prediction. To address this issue, a set of feature selection (FS) techniques has been proposed; however, they still have certain limitations. For example, they may focus on nearby points, which lowers classification accuracy because the chosen features may include noisy features. To take advantage of the benefits of quantum-based optimization and the 2D chaotic Hénon map, we provide a modified version of the honey badger algorithm (HBA) called QCHBA. The ability of such strategies to strike a balance between exploitation and exploration while identifying a workable subset of pertinent features is the basis for employing them to enhance the HBA. The effectiveness of QCHBA was evaluated in a series of experiments conducted on eighteen datasets, including comparison with recognized FS techniques. The results indicate the high efficiency of QCHBA across the datasets according to various performance criteria. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

18 pages, 4858 KiB  
Article
Fast 3D Liver Segmentation Using a Trained Deep Chan-Vese Model
by Orhan Akal and Adrian Barbu
Electronics 2022, 11(20), 3323; https://doi.org/10.3390/electronics11203323 - 14 Oct 2022
Cited by 1 | Viewed by 2384
Abstract
This paper introduces an approach for 3D organ segmentation that generalizes in multiple ways the Chan-Vese level set method. Chan-Vese is a segmentation method that simultaneously evolves a level set while fitting locally constant intensity models for the interior and exterior regions. First, its simple length-based regularization is replaced with a learned shape model based on a Fully Convolutional Network (FCN). We show how to train the FCN and introduce data augmentation methods to avoid overfitting. Second, two 3D variants of the method are introduced, one based on a 3D U-Net that makes global shape modifications and one based on a 3D FCN that makes local refinements. These two variants are integrated in a full 3D organ segmentation approach that is capable and efficient in dealing with the large size of the 3D volumes with minimal overfitting. Experiments on liver segmentation on a standard benchmark dataset show that the method obtains 3D segmentation results competitive with the state of the art while being very fast and having a small number of trainable parameters. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

13 pages, 5498 KiB  
Article
Towards Low-Cost Classification for Novel Fine-Grained Datasets
by Abbas Anwar, Hafeez Anwar and Saeed Anwar
Electronics 2022, 11(17), 2701; https://doi.org/10.3390/electronics11172701 - 29 Aug 2022
Cited by 2 | Viewed by 2364
Abstract
Fine-grained categorization is an essential field in classification, a subfield of object recognition that aims to differentiate subordinate classes. Fine-grained image classification concentrates on distinguishing between similar, hard-to-differentiate types or species, for example, flowers, birds, or specific animals such as dogs or cats, and on identifying airplane makes or models. An important step towards fine-grained classification is the acquisition of datasets and baselines; hence, we propose a holistic system and two novel datasets, of reef fish and butterflies, for fine-grained classification. The butterflies and fish can appear at various locations in the image plane, causing image variations due to translation, rotation, and deformation in multiple directions, and, depending on the position of the image acquisition device, at different scales. We evaluate traditional algorithms based on quantized rotation- and scale-invariant local image features, as well as convolutional neural networks (CNNs) whose pre-trained models are used to extract features. The comprehensive evaluation shows that the CNN features computed using the pre-trained models outperform the other image representations. The proposed system can prove instrumental for various purposes, such as education, conservation, and scientific research. The code, models, and datasets are publicly available. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
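The strongest baseline reported above, pre-trained CNN features fed to a simple classifier, can be sketched as follows. The backbone and classifier choices are assumptions for illustration; any pre-trained model and off-the-shelf classifier could be substituted.

```python
# Features from a pre-trained CNN are used as fixed descriptors for a
# lightweight downstream classifier.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the ImageNet classification head
backbone.eval()

def cnn_features(batch: torch.Tensor):
    with torch.no_grad():
        return backbone(batch).numpy()     # (N, 2048) descriptors

# train_imgs / test_imgs: preprocessed (N, 3, 224, 224) tensors of fish or butterflies.
train_imgs, train_labels = torch.randn(32, 3, 224, 224), torch.randint(0, 5, (32,)).numpy()
clf = LogisticRegression(max_iter=1000).fit(cnn_features(train_imgs), train_labels)
pred = clf.predict(cnn_features(torch.randn(8, 3, 224, 224)))
```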

25 pages, 3606 KiB  
Article
Feature Activation through First Power Linear Unit with Sign
by Boxi Duan, Yufei Yang and Xianhua Dai
Electronics 2022, 11(13), 1980; https://doi.org/10.3390/electronics11131980 - 24 Jun 2022
Cited by 2 | Viewed by 1607
Abstract
The activation function is a crucial component in the design of a convolutional neural network (CNN). It enables the efficient extraction of multiple features from visual patterns and introduces systemic non-linearity into data processing. This paper proposes a novel and insightful activation method termed FPLUS, which exploits a mathematical power function with polar signs in its form. It is inspired by common inverse operations while carrying an intuitive bionic meaning. The formulation is derived theoretically under conditions of prior knowledge and anticipated properties. Subsequently, its feasibility is verified through a series of experiments using typical benchmark datasets. The results indicate that our approach is highly competitive among numerous activation functions and shows compatible stability across many CNN architectures. Furthermore, we extend the presented function to a more generalized type called PFPLUS with two parameters that can be fixed or learnable, so as to augment its expressive capacity. The outcomes of identical tests validate this improvement. We therefore believe the work in this paper holds a certain value in enriching the family of activation units. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

14 pages, 12975 KiB  
Article
Learning Facial Motion Representation with a Lightweight Encoder for Identity Verification
by Zheng Sun, Andrew W. Sumsion, Shad A. Torrie and Dah-Jye Lee
Electronics 2022, 11(13), 1946; https://doi.org/10.3390/electronics11131946 - 22 Jun 2022
Cited by 3 | Viewed by 1793
Abstract
Deep learning became an important image classification and object detection technique more than a decade ago. It has since achieved human-like performance for many computer vision tasks. Some of these involve the analysis of the human face for applications such as facial recognition, expression recognition, and facial landmark detection. In recent years, researchers have generated and made publicly available many valuable datasets that allow the development of more accurate and robust models for these important tasks. Exploiting the information contained inside these pretrained deep structures could open the door to many new applications and provide a quick path to their success. This research focuses on a unique application that analyzes short facial motion videos for identity verification. Our proposed solution leverages the rich information in those deep structures to provide accurate face representations for facial motion analysis. We have developed two strategies to employ the information contained in existing models for image-based face analysis to learn facial motion representations for our application. Combined with those pretrained spatial feature extractors for face-related analyses, our customized sequence encoder is capable of generating accurate facial motion embeddings for the identity verification application. The experimental results show that the facial geometry information from those feature extractors is valuable and helps our model achieve an impressive average precision of 98.8% for identity verification using facial motion. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

22 pages, 1583 KiB  
Article
Attentive Part-Based Alignment Network for Vehicle Re-Identification
by Yichu Liu, Haifeng Hu and Dihu Chen
Electronics 2022, 11(10), 1617; https://doi.org/10.3390/electronics11101617 - 19 May 2022
Cited by 3 | Viewed by 1864
Abstract
Vehicle re-identification (Re-ID) has become a research hotspot with the rapid development of video surveillance. Attention mechanisms are utilized in vehicle Re-ID networks but often miss the attention alignment across views. In this paper, we propose a novel Attentive Part-based Alignment Network (APANet) to learn robust, diverse, and discriminative features for vehicle Re-ID. Specifically, to enhance the discrimination of part features, two part-level alignment mechanisms are proposed in APANet, consisting of a Part-level Orthogonality Loss (POL) and a Part-level Attention Alignment Loss (PAAL). POL aims to maximize the diversity of part features via an orthogonal penalty among parts, whilst PAAL learns view-invariant features by realizing attention alignment at the part level. Moreover, we propose a Multi-receptive-field Attention (MA) module that adopts an efficient and cost-effective pyramid structure, which exploits finer-grained and heterogeneous-scale spatial attention information through multi-receptive-field streams. In addition, an improved TriHard loss and an Inter-group Feature Centroid Loss (IFCL) are utilized to optimize both the inter-group and intra-group distances. Extensive experiments demonstrate the superiority of our model over multiple existing state-of-the-art approaches on two popular vehicle Re-ID benchmarks. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
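A part-level orthogonality penalty in the spirit of POL can be sketched by pushing pairwise cosine similarities between part features toward zero, as below (PyTorch). The paper's exact loss may be defined differently.

```python
# Penalize off-diagonal entries of the part-feature Gram matrix so that the
# parts encode diverse, non-redundant information.
import torch
import torch.nn.functional as F

def part_orthogonality_loss(parts: torch.Tensor) -> torch.Tensor:
    """parts: (B, P, C) feature vectors for P vehicle parts per image."""
    normed = F.normalize(parts, dim=-1)
    gram = torch.bmm(normed, normed.transpose(1, 2))          # (B, P, P) cosine similarities
    off_diag = gram - torch.eye(parts.size(1), device=parts.device)
    return off_diag.pow(2).mean()                             # zero when parts are orthogonal

loss = part_orthogonality_loss(torch.randn(4, 6, 256))
```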