Advances in Deep Learning

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 April 2019) | Viewed by 197540

Special Issue Editors


Guest Editor
Department of Electrical Engineering and Information Technologies, University of Naples Federico II, via Claudio, 21, 80125 Napoli, Italy
Interests: deep learning; computer vision; multimedia forensics; medical imaging; biometrics

Guest Editor
Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
Interests: extended reality; HCI; computer graphics; machine learning; serious games

Guest Editor
Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
Interests: speaker and language recognition; pattern recognition; machine learning; statistical models

Guest Editor
Department of Control and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA
Interests: human factors; statistical learning; deep learning

Special Issue Information

Machine-learning-based algorithms are widespread in several aspects of our daily life, from the advertising and logistics systems of corporations to the applications on our smartphones and cameras, with an ever-increasing number of devices including dedicated hardware. This growing deployment of machine-learning-based algorithms would not have been possible if not for the lightning-fast progress of the relevant research.

In recent years, a growing interest in deep learning approaches has been observed among the scientific community. These are a particular class of machine-learning techniques that allow an intelligent system to automatically learn a suitable data representation from the data themselves. This has been even more successful for multimedia applications, such as video and audio classification, due to the ability of deep-learning-based techniques to extract the implicit information of this kind of data. For instance, various deep learning classifiers have reached human performance in medical image classification for the recognition of a large number of diseases, narrowing the gap between the analytic capability of the machine and that of the human brain. Great improvements have also been achieved in the field of natural language processing, with techniques able to analyze and extract information from a text even when it lacks a predetermined form.

An even more interesting research trend focuses on generative models: a novel deep learning approach that has shown the ability to learn a complex statistical distribution from its samples in an unsupervised manner. The aim of this approach is to train a neural network to generate new samples from the learned distribution. Generative models have demonstrated their effectiveness in different fields, from the generation of images and videos that are barely distinguishable from real ones to automatic text and speech translation.

We encourage authors to submit original research articles, reviews, theoretical and critical perspectives, and viewpoint articles, on (but not limited to) the following topics:

- Convolutional neural networks;

- Recurrent neural networks;

- Generative neural network models;

- Comparison of neural networks and other methods;

- Multiscale multimedia analysis;

- Constrained learning approaches for critical applications;

- Predictive analysis;

- Developing new models for multimodal deep learning;

- Combining multiple deep learning models;

- Applications in vision, audio, speech, natural language processing, robotics, neuroscience, or any other field.

Dr. Diego Gragnaniello
Prof. Dr. Andrea Bottino
Dr. Sandro Cumani
Dr. Wonjoon Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep learning
  • Neural networks
  • Generative neural network models
  • Multiscale data representation
  • Constrained optimization
  • Predictive analysis
  • Feature interpretation
  • Deep learning analytics involving linked data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (35 papers)


Editorial


6 pages, 191 KiB  
Editorial
Special Issue on Advances in Deep Learning
by Diego Gragnaniello, Andrea Bottino, Sandro Cumani and Wonjoon Kim
Appl. Sci. 2020, 10(9), 3172; https://doi.org/10.3390/app10093172 - 2 May 2020
Cited by 4 | Viewed by 2001
Abstract
Nowadays, deep learning is the fastest growing research field in machine learning and has a tremendous impact on a plethora of daily life applications, ranging from security and surveillance to autonomous driving, automatic indexing and retrieval of media content, text analysis, speech recognition, automatic translation, and many others [...] Full article
(This article belongs to the Special Issue Advances in Deep Learning)

Research


15 pages, 6023 KiB  
Article
Image-to-Image Translation Using Identical-Pair Adversarial Networks
by Thai Leang Sung and Hyo Jong Lee
Appl. Sci. 2019, 9(13), 2668; https://doi.org/10.3390/app9132668 - 30 Jun 2019
Cited by 10 | Viewed by 7101
Abstract
We propose Identical-pair Adversarial Networks (iPANs) to solve image-to-image translation problems, such as aerial-to-map, edge-to-photo, de-raining, and night-to-daytime. Our iPANs rely mainly on the effectiveness of the adversarial loss function and the network architectures. Our iPANs consist of two main networks, an image transformation network T and a discriminative network D. We use U-NET for the transformation network T and, for network D, a perceptual similarity network that has two streams of VGG16 sharing the same weights. Our proposed adversarial losses play a minimax game against each other based on a real identical-pair and a fake identical-pair distinguished by the discriminative network D; that is, the discriminative network D considers two inputs a real pair only when they are identical, and a fake pair otherwise. Meanwhile, the transformation network T tries to persuade the discriminator network D that the fake pair is a real pair. We experimented on several image-to-image translation problems and achieved results that are comparable to those of some existing approaches, such as pix2pix and PAN. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

13 pages, 1675 KiB  
Article
Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer
by Sanghyun Seo and Juntae Kim
Appl. Sci. 2019, 9(12), 2559; https://doi.org/10.3390/app9122559 - 23 Jun 2019
Cited by 29 | Viewed by 8209
Abstract
Convolutional neural networks (CNN) have achieved excellent results in the field of image recognition that classifies objects in images. A typical CNN consists of a deep architecture that uses a large number of weights and layers to achieve high performance. CNN requires relatively large memory space and computational costs, which not only increase the time to train the model but also limit the real-time application of the trained model. For this reason, various neural network compression methodologies have been studied to efficiently use CNN in small embedded hardware such as mobile and edge devices. In this paper, we propose a kernel density estimation based non-uniform quantization methodology that can perform compression efficiently. The proposed method performs efficient weights quantization using a significantly smaller number of sampled weights than the number of original weights. Four-bit quantization experiments on the classification of the ImageNet dataset with various CNN architectures show that the proposed methodology can perform weights quantization efficiently in terms of computational costs without significant reduction in model performance. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

16 pages, 1751 KiB  
Article
Improving Generative and Discriminative Modelling Performance by Implementing Learning Constraints in Encapsulated Variational Autoencoders
by Wenjun Bai, Changqin Quan and Zhi-Wei Luo
Appl. Sci. 2019, 9(12), 2551; https://doi.org/10.3390/app9122551 - 21 Jun 2019
Cited by 2 | Viewed by 3803
Abstract
Learning latent representations of observed data that can favour both discriminative and generative tasks remains a challenging task in artificial-intelligence (AI) research. Previous attempts that ranged from the convex binding of discriminative and generative models to the semisupervised learning paradigm could hardly yield optimal performance on both generative and discriminative tasks. To this end, in this research, we harness the power of two neuroscience-inspired learning constraints, that is, dependence minimisation and regularisation constraints, to improve generative and discriminative modelling performance of a deep generative model. To demonstrate the usage of these learning constraints, we introduce a novel deep generative model: encapsulated variational autoencoders (EVAEs) to stack two different variational autoencoders together with their learning algorithm. Using the MNIST digits dataset as a demonstration, the generative modelling performance of EVAEs was improved with the imposed dependence-minimisation constraint, encouraging our derived deep generative model to produce various patterns of MNIST-like digits. Using CIFAR-10(4K) as an example, a semisupervised EVAE with an imposed regularisation learning constraint was able to achieve competitive discriminative performance on the classification benchmark, even in the face of state-of-the-art semisupervised learning approaches. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

18 pages, 1713 KiB  
Article
Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features
by Anvarjon Tursunov, Soonil Kwon and Hee-Suk Pang
Appl. Sci. 2019, 9(12), 2470; https://doi.org/10.3390/app9122470 - 17 Jun 2019
Cited by 25 | Viewed by 6058
Abstract
The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are different in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. Support vector machine (SVM) and long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of baseline features and the most relevant timbre acoustic features, which were found by applying SFS on a classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could sufficiently characterize emotions in speech in the valence dimension. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

23 pages, 421 KiB  
Article
A Simple Convolutional Neural Network with Rule Extraction
by Guido Bologna
Appl. Sci. 2019, 9(12), 2411; https://doi.org/10.3390/app9122411 - 13 Jun 2019
Cited by 20 | Viewed by 5057
Abstract
Classification responses provided by Multi Layer Perceptrons (MLPs) can be explained by means of propositional rules. So far, many rule extraction techniques have been proposed for shallow MLPs, but not for Convolutional Neural Networks (CNNs). To fill this gap, this work presents a new rule extraction method applied to a typical CNN architecture used in Sentiment Analysis (SA). We focus on textual data, with the CNN trained on “tweets” of movie reviews. Its architecture includes an input layer representing words by “word embeddings”, a convolutional layer, and a max-pooling layer, followed by a fully connected layer. Rule extraction is performed on the fully connected layer, with the help of the Discretized Interpretable Multi Layer Perceptron (DIMLP). This transparent MLP architecture allows us to generate symbolic rules by precisely locating axis-parallel hyperplanes. Experiments based on cross-validation show that our approach is more accurate than approaches in which SVMs or decision trees are substituted for DIMLPs. Overall, the rules reach high fidelity, and the discriminative n-grams represented in the antecedents explain the classifications adequately. With several test examples, we illustrate the n-grams represented in the activated rules. A particularity of these n-grams is that each contributes to the final classification with a certain intensity. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 1924 KiB  
Article
Disentangled Feature Learning for Noise-Invariant Speech Enhancement
by Soo Hyun Bae, Inkyu Choi and Nam Soo Kim
Appl. Sci. 2019, 9(11), 2289; https://doi.org/10.3390/app9112289 - 3 Jun 2019
Cited by 3 | Viewed by 3914
Abstract
Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architectures as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to achieve better speech enhancement performance. Building on the recent success in learning disentangled representations with neural networks, we explore a framework for disentangling speech and noise, which has not been exploited in conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method that manipulates the latent features to distinguish between the speech and noise features in the intermediate layers using an adversarial training scheme. To compare the performance of the proposed method with other conventional algorithms, we conducted experiments in both matched and mismatched noise conditions using the TIMIT and TSPspeech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but also offers a more robust noise-invariant property than conventional speech enhancement techniques. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

14 pages, 2694 KiB  
Article
Boosting Targeted Black-Box Attacks via Ensemble Substitute Training and Linear Augmentation
by Xianfeng Gao, Yu-an Tan, Hongwei Jiang, Quanxin Zhang and Xiaohui Kuang
Appl. Sci. 2019, 9(11), 2286; https://doi.org/10.3390/app9112286 - 3 Jun 2019
Cited by 26 | Viewed by 3261
Abstract
In recent years, Deep Neural Networks (DNNs) have shown unprecedented performance in many areas. However, recent studies have revealed their vulnerability to small perturbations added to source inputs. The methods used to generate these perturbations are called adversarial attacks, and they fall into two types, black-box and white-box attacks, according to the adversary's access to the target model. To overcome the black-box attacker's lack of access to the internals of the target DNN, researchers have put forward a series of strategies. Previous work includes a method that trains a local substitute model for the target black-box model via Jacobian-based augmentation and then uses the substitute model to craft adversarial examples with white-box methods. In this work, we improve the dataset augmentation to make the substitute models better fit the decision boundary of the target model. Unlike the previous work, which performed only non-targeted attacks, we are the first to generate targeted adversarial examples by training substitute models. Moreover, to boost the targeted attacks, we apply the idea of ensemble attacks to the substitute training. Experiments on MNIST and GTSRB, two common datasets for image classification, demonstrate the effectiveness and efficiency of our approach in boosting targeted black-box attacks: we attack the MNIST and GTSRB classifiers with success rates of 97.7% and 92.8%, respectively. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 872 KiB  
Article
Design and Investigation of Capsule Networks for Sentence Classification
by Haftu Wedajo Fentaw and Tae-Hyong Kim
Appl. Sci. 2019, 9(11), 2200; https://doi.org/10.3390/app9112200 - 29 May 2019
Cited by 19 | Viewed by 4980
Abstract
In recent years, convolutional neural networks (CNNs) have been used as an alternative to recurrent neural networks (RNNs) in text processing with promising results. In this paper, we investigated the newly introduced capsule networks (CapsNets), which are attracting considerable attention because of their performance gains over CNNs in image analysis, for sentence classification and, in some cases, sentiment analysis. The results of our experiment show that the proposed well-tuned CapsNet model can be a good, sometimes better and cheaper, substitute for models based on CNNs and RNNs used in sentence classification. To investigate whether CapsNets can learn the sequential order of words, we performed a number of experiments by reshuffling the test data. Our CapsNet model shows overall better classification performance and better resistance to adversarial attacks than CNN and RNN models. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

14 pages, 622 KiB  
Article
Confidence Measures for Deep Learning in Domain Adaptation
by Simone Bonechi, Paolo Andreini, Monica Bianchini, Akshay Pai and Franco Scarselli
Appl. Sci. 2019, 9(11), 2192; https://doi.org/10.3390/app9112192 - 29 May 2019
Cited by 6 | Viewed by 3590
Abstract
In recent years, Deep Neural Networks (DNNs) have led to impressive results in a wide variety of machine learning tasks, typically relying on the existence of a huge amount of supervised data. However, in many applications (e.g., bio–medical image analysis), gathering large sets of labeled data can be very difficult and costly. Unsupervised domain adaptation exploits data from a source domain, where annotations are available, to train a model able to generalize also to a target domain, where labels are unavailable. Recent research has shown that Generative Adversarial Networks (GANs) can be successfully employed for domain adaptation, although deciding when to stop learning is a major concern for GANs. In this work, we propose some confidence measures that can be used to early stop the GAN training, also showing how such measures can be employed to predict the reliability of the network output. The effectiveness of the proposed approach has been tested in two domain adaptation tasks, with very promising results. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 936 KiB  
Article
Chronic Disease Prediction Using Character-Recurrent Neural Network in The Presence of Missing Information
by Changgyun Kim, Youngdoo Son and Sekyoung Youm
Appl. Sci. 2019, 9(10), 2170; https://doi.org/10.3390/app9102170 - 27 May 2019
Cited by 17 | Viewed by 5059
Abstract
The aim of this study was to predict chronic diseases in individual patients using a character-recurrent neural network (Char-RNN), which is a deep learning model that treats data in each class as a word when a large portion of its input values is missing. An advantage of Char-RNN is that it does not require any additional imputation method because it implicitly infers missing values considering the relationship with nearby data points. We applied Char-RNN to classify cases in the Korea National Health and Nutrition Examination Survey (KNHANES) VI as normal status and five chronic diseases: hypertension, stroke, angina pectoris, myocardial infarction, and diabetes mellitus. We also employed a multilayer perceptron network for the same task for comparison. The results show higher accuracy for Char-RNN than for the conventional multilayer perceptron model. Char-RNN showed remarkable performance in finding patients with hypertension and stroke. The present study utilized the KNHANES VI data to demonstrate a practical approach to predicting and managing chronic diseases with partially observed information. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

13 pages, 1288 KiB  
Article
Layer-Level Knowledge Distillation for Deep Neural Network Learning
by Hao-Ting Li, Shih-Chieh Lin, Cheng-Yeh Chen and Chen-Kuo Chiang
Appl. Sci. 2019, 9(10), 1966; https://doi.org/10.3390/app9101966 - 14 May 2019
Cited by 18 | Viewed by 6167
Abstract
Motivated by the recently developed distillation approaches that aim to obtain small and fast-to-execute models, in this paper a novel Layer Selectivity Learning (LSL) framework is proposed for learning deep models. We firstly use an asymmetric dual-model learning framework, called Auxiliary Structure Learning (ASL), to train a small model with the help of a larger and well-trained model. Then, the intermediate layer selection scheme, called the Layer Selectivity Procedure (LSP), is exploited to determine the corresponding intermediate layers of source and target models. The LSP is achieved by two novel matrices, the layered inter-class Gram matrix and the inter-layered Gram matrix, to evaluate the diversity and discrimination of feature maps. The experimental results, demonstrated using three publicly available datasets, present the superior performance of model training using the LSL deep model learning framework. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

16 pages, 834 KiB  
Article
Heated Metal Mark Attribute Recognition Based on Compressed CNNs Model
by He Yin, Keming Mao, Jianzhe Zhao, Huidong Chang, Dazhi E and Zhenhua Tan
Appl. Sci. 2019, 9(9), 1955; https://doi.org/10.3390/app9091955 - 13 May 2019
Cited by 2 | Viewed by 2640
Abstract
This study considered heated metal mark attribute recognition based on compressed convolutional neural network (CNN) models. Based on our previous work, the heated metal mark image benchmark dataset was further expanded. State-of-the-art lightweight CNN models were selected. Pruning, compression, and weight quantization techniques were introduced and analyzed. Then, a multi-label model training method was devised. Moreover, the proposed models were deployed on Android devices. Finally, comprehensive experiments were conducted. The results show that, with the fine-tuned compressed CNN model, the recognition rates for the attributes metal type, heating mode, heating temperature, heating duration, cooling mode, placing duration and relative humidity were 0.803, 0.837, 0.825, 0.812, 0.883, 0.817 and 0.894, respectively. The best model obtained an overall performance of 0.823. Compared with traditional CNNs, the adopted compressed multi-label model greatly improved the training efficiency and reduced the space occupation, with a relatively small decrease in recognition accuracy. The running time on Android devices was acceptable. The results show that the proposed model is suitable for real-time applications and is convenient to implement in mobile or embedded device scenarios. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

14 pages, 1763 KiB  
Article
The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures
by Pavel Stefanovič, Olga Kurasova and Rokas Štrimaitis
Appl. Sci. 2019, 9(9), 1870; https://doi.org/10.3390/app9091870 - 7 May 2019
Cited by 26 | Viewed by 5612
Abstract
In this paper, a word-level n-gram-based approach is proposed to find similarity between texts. The approach combines two separate and independent techniques: the self-organizing map (SOM) and text similarity measures. What distinguishes the SOM is that the results of data clustering, as well as dimensionality reduction, are presented in a visual form. Four measures have been evaluated: cosine, Dice, extended Jaccard, and overlap. First, texts have to be converted to a numerical representation. For that purpose, the text is split into word-level n-grams, and a bag of n-grams is created. The n-gram frequencies are calculated and the frequency matrix of the dataset is formed. Various filters are used to create the bag of n-grams: stemming algorithms, number and punctuation removers, stop words, etc. All experiments were carried out on a corpus of plagiarized short answers. Full article
(This article belongs to the Special Issue Advances in Deep Learning)
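The four similarity measures named in the abstract above can be illustrated on bags of word-level n-grams. The following is only a minimal sketch, not the authors' implementation: it assumes the common vector-space definitions of the cosine, Dice, extended Jaccard (Tanimoto) and overlap coefficients, and it omits the SOM step and the filters (stemming, stop words, etc.). The function names `word_ngrams` and `similarities` are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n=2):
    """Return the word-level n-grams of a text as a list of tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def similarities(text_a, text_b, n=2):
    """Compare two texts through their n-gram frequency vectors.

    Assumed definitions (x, y are frequency vectors):
      cosine  = x.y / (|x| * |y|)
      dice    = 2 * x.y / (|x|^2 + |y|^2)
      jaccard = x.y / (|x|^2 + |y|^2 - x.y)   (extended Jaccard)
      overlap = x.y / min(|x|^2, |y|^2)
    """
    a = Counter(word_ngrams(text_a, n))
    b = Counter(word_ngrams(text_b, n))
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        # At least one text is too short to contain any n-gram.
        return {"cosine": 0.0, "dice": 0.0, "jaccard": 0.0, "overlap": 0.0}
    return {
        "cosine": dot / (sqrt(sq_a) * sqrt(sq_b)),
        "dice": 2 * dot / (sq_a + sq_b),
        "jaccard": dot / (sq_a + sq_b - dot),
        "overlap": dot / min(sq_a, sq_b),
    }


if __name__ == "__main__":
    print(similarities("the cat sat on the mat", "the cat sat on the rug"))
```

With bigrams, the two example sentences share four of their five n-grams, so all four measures report a high but not perfect similarity.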

18 pages, 10497 KiB  
Article
A Deep Learning Method for Bearing Fault Diagnosis through Stacked Residual Dilated Convolutions
by Zilong Zhuang, Huichun Lv, Jie Xu, Zizhao Huang and Wei Qin
Appl. Sci. 2019, 9(9), 1823; https://doi.org/10.3390/app9091823 - 1 May 2019
Cited by 72 | Viewed by 4685
Abstract
Real-time monitoring and fault diagnosis of bearings are of great significance to improve production safety, prevent major accidents, and reduce production costs. However, there are three primary concerns in the current research, namely real-time performance, effectiveness, and generalization performance. In this paper, a deep learning method based on stacked residual dilated convolutional neural network (SRDCNN) is proposed for real-time bearing fault diagnosis, which is subtly combined by the dilated convolution, the input gate structure of long short-term memory network (LSTM) and the residual network. In the SRDCNN model, the dilated convolution is used to exponentially increase the receptive field of convolution kernel and extract features from the sample with more points, alleviating the influence of randomness. The input gate structure of LSTM could effectively remove noise and control the entry of information contained in the input sample. Meanwhile, the residual network is introduced to overcome the problem of vanishing gradients caused by the deeper structure of the neural network, hence improving the overall classification accuracy. The experimental results indicate that compared with three excellent models, the proposed SRDCNN model has higher denoising ability and better workload adaptability. Full article
(This article belongs to the Special Issue Advances in Deep Learning)
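The claim that dilated convolution increases the receptive field exponentially can be checked with a short calculation. The sketch below is an illustration only and assumes stride-1 one-dimensional convolutions stacked without pooling; the kernel size and dilation rates are example values, not the SRDCNN configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


if __name__ == "__main__":
    # Four ordinary layers versus four layers with doubling dilation rates.
    print(receptive_field(3, [1, 1, 1, 1]))  # 9
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

With the same number of layers and parameters, doubling the dilation at each layer covers 31 input points instead of 9, which is the effect the abstract refers to.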

24 pages, 2026 KiB  
Article
Data-Driven Model-Free Tracking Reinforcement Learning Control with VRFT-based Adaptive Actor-Critic
by Mircea-Bogdan Radac and Radu-Emil Precup
Appl. Sci. 2019, 9(9), 1807; https://doi.org/10.3390/app9091807 - 30 Apr 2019
Cited by 43 | Viewed by 4460
Abstract
This paper proposes a neural network (NN)-based control scheme in an Adaptive Actor-Critic (AAC) learning framework designed for output reference model tracking, as a representative deep-learning application. The control learning scheme is model-free with respect to the process model. AAC designs usually require an initial controller to start the learning process; however, systematic guidelines for choosing the initial controller are not offered in the literature, especially in a model-free manner. Virtual Reference Feedback Tuning (VRFT) is proposed for obtaining an initially stabilizing NN nonlinear state-feedback controller, designed from input-state-output data collected from the process in open-loop setting. The solution offers systematic design guidelines for initial controller design. The resulting suboptimal state-feedback controller is next improved under the AAC learning framework by online adaptation of a critic NN and a controller NN. The mixed VRFT-AAC approach is validated on a multi-input multi-output nonlinear constrained coupled vertical two-tank system. Discussions on the control system behavior are offered together with comparisons with similar approaches. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

18 pages, 3137 KiB  
Article
Deterministic and Probabilistic Wind Power Forecasting Based on Bi-Level Convolutional Neural Network and Particle Swarm Optimization
by Xiyun Yang, Yanfeng Zhang, Yuwei Yang and Wei Lv
Appl. Sci. 2019, 9(9), 1794; https://doi.org/10.3390/app9091794 - 29 Apr 2019
Cited by 30 | Viewed by 3703
Abstract
The intermittency and uncertainty of wind power result in challenges for large-scale wind power integration. Accurate wind power prediction is becoming increasingly important for power system planning and operation. In this paper, a probabilistic interval prediction method for wind power based on deep learning and particle swarm optimization (PSO) is proposed. Variational mode decomposition (VMD) and phase space reconstruction are used to pre-process the original wind power data to obtain additional details and uncover hidden information in the data. Subsequently, a bi-level convolutional neural network is used to learn nonlinear features in the pre-processed wind power data for wind power forecasting. PSO is used to determine the uncertainty of the point-based wind power prediction and to obtain the probabilistic prediction interval of the wind power. Wind power data from a Chinese wind farm and modeled wind power data provided by the United States Renewable Energy Laboratory are used to conduct extensive tests of the proposed method. The results show that the proposed method has competitive advantages for the point-based and probabilistic interval prediction of wind power. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 1106 KiB  
Article
Obtaining Human Experience for Intelligent Dredger Control: A Reinforcement Learning Approach
by Changyun Wei, Fusheng Ni and Xiujing Chen
Appl. Sci. 2019, 9(9), 1769; https://doi.org/10.3390/app9091769 - 28 Apr 2019
Cited by 16 | Viewed by 3429
Abstract
This work presents a reinforcement learning approach for intelligent decision-making of a Cutter Suction Dredger (CSD), which is a special type of vessel for deepening harbors, constructing ports or navigational channels, and reclaiming landfills. Currently, CSDs are usually controlled by human operators, and the production rate is mainly determined by the so-called cutting process (i.e., cutting the underwater soil into fragments). Long-term manual operation is likely to cause driving fatigue, resulting in operational accidents and inefficiencies. To reduce the labor intensity of the operator, we seek an intelligent controller that can manipulate the cutting process in place of human operators. To this end, our proposed reinforcement learning approach consists of two parts. In the first part, we employ a neural network model to construct a virtual environment based on the historical dredging data. In the second part, we develop a reinforcement learning model that can learn the optimal control policy by interacting with the virtual environment to obtain human experience. The results show that the proposed learning approach can successfully imitate the dredging behavior of an experienced human operator. Moreover, the learning approach can outperform the operator by responding quickly to changes in uncertain environments. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

13 pages, 282 KiB  
Article
Abstract Text Summarization with a Convolutional Seq2seq Model
by Yong Zhang, Dan Li, Yuheng Wang, Yang Fang and Weidong Xiao
Appl. Sci. 2019, 9(8), 1665; https://doi.org/10.3390/app9081665 - 23 Apr 2019
Cited by 54 | Viewed by 6869
Abstract
Abstract text summarization aims to offer highly condensed, valuable information that expresses the main ideas of the text. Most previous research focuses on extractive models. In this work, we put forward a new generative model based on a convolutional seq2seq architecture. A hierarchical CNN framework is much more efficient than the conventional RNN seq2seq models. We also equip our model with a copying mechanism to deal with rare or unseen words. Additionally, we incorporate a hierarchical attention mechanism to model the keywords and key sentences simultaneously. Finally, we verify our model on two real-life datasets, GigaWord and the DUC corpus. The experimental results verify the effectiveness of our model, as it outperforms state-of-the-art alternatives consistently and with statistical significance. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

16 pages, 3315 KiB  
Article
Fertility Detection of Hatching Eggs Based on a Convolutional Neural Network
by Lei Geng, Yuzhou Hu, Zhitao Xiao and Jiangtao Xi
Appl. Sci. 2019, 9(7), 1408; https://doi.org/10.3390/app9071408 - 3 Apr 2019
Cited by 13 | Viewed by 4596
Abstract
To detect the fertility of hatching eggs, which are divided into fertile eggs and dead eggs, more accurately and effectively, a novel method combining a convolution neural network (CNN) and a heartbeat signal of the hatching eggs is proposed in this paper. Firstly, we collected heartbeat signals of 9-day-later hatching eggs by the method of PhotoPlethysmoGraphy (PPG), which is a non-invasive method to detect the change of blood volume in living tissues by photoelectric means. Secondly, a sequential convolutional neural network, E-CNN, was designed to analyze the heartbeat sequence of hatching eggs. Thirdly, an end-to-end trainable convolutional neural network, SR-CNN, was designed to process heartbeat waveform images of hatching eggs and improve the classification performance. Key to improving the classification performance of SR-CNN is the SE-Res module, which combines the channel weighting unit “Squeeze-and-Excitation” (SE) block and the residual structure. The experimental results show that the two models trained on our dataset, E-CNN and SR-CNN, achieve fertility detection of the hatching eggs with identification accuracy of up to 99.50% and 99.62%, respectively, on our test set. It is demonstrated that the proposed method is feasible for identifying and classifying the survival of hatching eggs accurately and effectively. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

16 pages, 1976 KiB  
Article
Parts Semantic Segmentation Aware Representation Learning for Person Re-Identification
by Hua Gao, Shengyong Chen and Zhaosheng Zhang
Appl. Sci. 2019, 9(6), 1239; https://doi.org/10.3390/app9061239 - 25 Mar 2019
Cited by 11 | Viewed by 4369
Abstract
Person re-identification is a typical computer vision problem which aims at matching pedestrians across disjoint camera views. It is challenging due to the misalignment of body parts caused by pose variations, background clutter, detection errors, camera point of view variation, different accessories and occlusion. In this paper, we propose a person re-identification network which fuses global and local features, to deal with part misalignment problem. The network is a four-branch convolutional neural network (CNN) which learns global person appearance and local features of three human body parts respectively. Local patches, including the head, torso and lower body, are segmented by using a U_Net semantic segmentation CNN architecture. All four feature maps are then concatenated and fused to represent a person image. We propose a DropParts method to solve the parts missing problem, with which the local features are weighed according to the number of parts found by semantic segmentation. Since three body parts are well aligned, the approach significantly improves person re-identification. Experiments on the standard benchmark datasets, such as Market1501, CUHK03 and DukeMTMC-reID datasets, show the effectiveness of our proposed pipeline. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

15 pages, 634 KiB  
Article
A Spam Filtering Method Based on Multi-Modal Fusion
by Hong Yang, Qihe Liu, Shijie Zhou and Yang Luo
Appl. Sci. 2019, 9(6), 1152; https://doi.org/10.3390/app9061152 - 19 Mar 2019
Cited by 39 | Viewed by 6260
Abstract
In recent years, single-modal spam filtering systems have had a high detection rate for image spamming or text spamming. To avoid detection by single-modal spam filtering systems, spammers inject junk information into the multi-modality part of an email and combine them to reduce the recognition rate of the single-modal spam filtering systems, thereby evading detection. In view of this situation, a new model called multi-modal architecture based on model fusion (MMA-MF) is proposed, which uses a multi-modal fusion method to ensure it can effectively filter spam whether it is hidden in the text or in the image. The model fuses a Convolutional Neural Network (CNN) model and a Long Short-Term Memory (LSTM) model to filter spam. The LSTM model and the CNN model process the text and image parts of an email separately to obtain two classification probability values, which are then fed into a fusion model to identify whether the email is spam or not. For the hyperparameters of the MMA-MF model, we use a grid search optimization method to find the most suitable values, and employ a k-fold cross-validation method to evaluate the performance of this model. Our experimental results show that this model is superior to the traditional spam filtering systems and can achieve accuracies in the range of 92.64–98.48%. Full article
(This article belongs to the Special Issue Advances in Deep Learning)
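The k-fold cross-validation mentioned in the abstract above partitions the dataset into k folds and uses each fold once for validation. The sketch below shows only that index bookkeeping, under the assumption of contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical, and the abstract does not state how the authors built their folds.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs.

    Folds are contiguous and differ in size by at most one sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    splits = []
    start = 0
    for fold in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 3):
        print(len(train), validation)
```

In a grid search, each candidate hyperparameter setting would be scored by averaging a metric over these k validation folds; in practice the samples are usually shuffled first.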

10 pages, 2840 KiB  
Article
Learning Deep CNN Denoiser Priors for Depth Image Inpainting
by Zun Li and Jin Wu
Appl. Sci. 2019, 9(6), 1103; https://doi.org/10.3390/app9061103 - 15 Mar 2019
Cited by 18 | Viewed by 4007
Abstract
Due to the rapid development of RGB-D sensors, increasing attention is being paid to depth image applications. Depth images play an important role in computer vision research. In this paper, we address the problem of inpainting for single depth images without corresponding color images as a guide. Within the framework of model-based optimization methods for depth image inpainting, the split Bregman iteration algorithm was used to transform depth image inpainting into the corresponding denoising subproblem. Then, we trained a set of efficient convolutional neural network (CNN) denoisers to solve this subproblem. Experimental results demonstrate the effectiveness of the proposed algorithm in comparison with three traditional methods in terms of visual quality and objective metrics. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

13 pages, 4754 KiB  
Article
An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation
by Hongbo Qin, Haodi Zhang, Hai Wang, Yujin Yan, Min Zhang and Wei Zhao
Appl. Sci. 2019, 9(6), 1054; https://doi.org/10.3390/app9061054 - 13 Mar 2019
Cited by 11 | Viewed by 3463
Abstract
An outside mutual correction (OMC) algorithm for natural scene text detection using multibox and semantic segmentation was developed. In the OMC algorithm, semantic segmentation and multibox were processed in parallel, and the text detection results were mutually corrected. The mutual correction process was divided into two steps: (1) The semantic segmentation results were employed in the bounding box enhancement module (BEM) to correct the multibox results. (2) The semantic bounding box module (SBM) was used to optimize the adhesion text boundary of the semantic segmentation results. Non-maximum suppression (NMS) was adopted to merge the SBM and BEM results. Our algorithm was evaluated on the ICDAR2013 and SVT datasets. The experimental results show that the developed algorithm had a maximum increase of 13.62% in the F-measure score and the highest F-measure score was 81.38%. Full article
(This article belongs to the Special Issue Advances in Deep Learning)
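Non-maximum suppression (NMS), which the abstract above uses to merge the SBM and BEM results, can be sketched as the standard greedy procedure: keep the highest-scoring box, discard boxes that overlap it too strongly, and repeat. This is a generic illustration, not the OMC algorithm; the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.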

15 pages, 4097 KiB  
Article
An Automatic Modulation Recognition Method with Low Parameter Estimation Dependence Based on Spatial Transformer Networks
by Mingxuan Li, Ou Li, Guangyi Liu and Ce Zhang
Appl. Sci. 2019, 9(5), 1010; https://doi.org/10.3390/app9051010 - 11 Mar 2019
Cited by 13 | Viewed by 4900
Abstract
Recently, automatic modulation recognition has been an important research topic in wireless communication. With the application of deep learning, using convolution neural networks on raw in-phase and quadrature signals has become a promising way to develop automatic modulation recognition methods. However, the errors introduced during signal reception and processing greatly degrade the classification performance, which affects the practical application of such methods. Therefore, we first analyze and quantify the errors introduced by signal detection and isolation in noncooperative communication through a baseline convolution neural network. In response to these errors, we then design a signal spatial transformer module based on the attention model to eliminate errors by a priori learning of signal structure. By cascading a signal spatial transformer module in front of the baseline classification network, we propose a method that can adaptively resample the signal capture to adjust time drift, symbol rate, and clock recovery. In addition, it can automatically add a perturbation on the signal carrier to correct frequency offset. By applying this improved model to automatic modulation recognition, we obtain a significant improvement in classification performance compared with several existing methods. Our method significantly improves the prospects for applying deep-learning-based automatic modulation recognition under nonideal synchronization. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 2520 KiB  
Article
An On-Line and Adaptive Method for Detecting Abnormal Events in Videos Using Spatio-Temporal ConvNet
by Samir Bouindour, Hichem Snoussi, Mohamad Mazen Hittawe, Nacef Tazi and Tian Wang
Appl. Sci. 2019, 9(4), 757; https://doi.org/10.3390/app9040757 - 21 Feb 2019
Cited by 30 | Viewed by 4355
Abstract
In this paper, we address the problem of abnormal event detection in video surveillance. In this context, we use only normal events as training samples. We propose to use a modified version of a pretrained 3D residual convolutional network to extract spatio-temporal features, and we develop a robust classifier based on the selection of vectors of interest. It is able to learn the normal behavior model and detect potentially dangerous abnormal events. This unsupervised method prevents the marginalization of normal events that occur rarely during the training phase, since it minimizes redundant information, and it adapts to the appearance of new normal events that occur during the testing phase. Experimental results on challenging datasets show the superiority of the proposed method compared to the state of the art at both the frame level and the pixel level in the anomaly detection task. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

14 pages, 13637 KiB  
Article
Joint Pedestrian and Body Part Detection via Semantic Relationship Learning
by Junhua Gu, Chuanxin Lan, Wenbai Chen and Hu Han
Appl. Sci. 2019, 9(4), 752; https://doi.org/10.3390/app9040752 - 21 Feb 2019
Cited by 10 | Viewed by 4035
Abstract
While remarkable progress has been made in pedestrian detection in recent years, robust pedestrian detection in the wild, e.g., under surveillance scenarios with occlusions, remains a challenging problem. In this paper, we present a novel approach for joint pedestrian and body part detection via semantic relationship learning under unconstrained scenarios. Specifically, we propose a Body Part Indexed Feature (BPIF) representation to encode the semantic relationship between individual body parts (i.e., head, head-shoulder, upper body, and whole body) and highlight per body part features, providing robustness against partial occlusions to the whole body. We also propose an Adaptive Joint Non-Maximum Suppression (AJ-NMS) to replace the original NMS algorithm widely used in object detection, leading to higher precision and recall for detecting overlapped pedestrians. Experimental results on the public-domain CUHK-SYSU Person Search Dataset show that the proposed approach outperforms the state-of-the-art methods for joint pedestrian and body part detection in the wild. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

13 pages, 395 KiB  
Article
A Deep Temporal Neural Music Recommendation Model Utilizing Music and User Metadata
by Hai-Tao Zheng, Jin-Yuan Chen, Nan Liang, Arun Kumar Sangaiah, Yong Jiang and Cong-Zhi Zhao
Appl. Sci. 2019, 9(4), 703; https://doi.org/10.3390/app9040703 - 18 Feb 2019
Cited by 21 | Viewed by 4358
Abstract
Deep learning shows its superiority in many domains such as computer vision, natural language processing, and speech recognition. In music recommendation, most deep learning-based methods focus on learning users’ temporal preferences using their listening histories. The cold start problem is not addressed, however, and the music characteristics are not fully exploited by these methods. In addition, the music characteristics and the users’ temporal preferences are not combined naturally, which causes the relatively low performance of music recommendation. To address these issues, we propose a Deep Temporal Neural Music Recommendation model (DTNMR) based on music characteristics and the users’ temporal preferences. We encode the music metadata into one-hot vectors and use a Deep Neural Network to project the music vectors into a low-dimensional space and obtain the music characteristics. In addition, Long Short-Term Memory (LSTM) neural networks are used to learn users’ long-term and short-term preferences from their listening histories. DTNMR alleviates the cold start problem on the item side using the music metadata and discovers new users’ preferences immediately after they listen to music. The experimental results show that DTNMR outperforms seven baseline methods in terms of recall, precision, f-measure, MAP, user coverage and AUC. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

15 pages, 3141 KiB  
Article
Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation
by Hao Qu, Lilian Zhang, Xuesong Wu, Xiaofeng He, Xiaoping Hu and Xudong Wen
Appl. Sci. 2019, 9(3), 565; https://doi.org/10.3390/app9030565 - 8 Feb 2019
Cited by 19 | Viewed by 4449
Abstract
The development of object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images. Additionally, the lack of high-quality infrared datasets hinders research into such algorithms. In order to solve these issues, we firstly make a series of modifications based on Faster Region-Convolutional Neural Network (R-CNN). In this paper, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Secondly, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects on different scales. Furthermore, the inception4 module and the position sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to explore richer features of the objects. Thirdly, this paper proposes instance level data augmentation, which takes into account the imbalance between categories while enlarging dataset. In the training stage, the online hard example mining method is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with baseline, our detection method has state-of-the-art performance. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

15 pages, 2087 KiB  
Article
Diverse Decoding for Abstractive Document Summarization
by Xu-Wang Han, Hai-Tao Zheng, Jin-Yuan Chen and Cong-Zhi Zhao
Appl. Sci. 2019, 9(3), 386; https://doi.org/10.3390/app9030386 - 23 Jan 2019
Cited by 8 | Viewed by 3229
Abstract
Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, as neural abstractive summarization research is in a primitive stage, the performance of these models is still far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD). This method augments the standard attentional sequence-to-sequence model in two aspects. First, we introduce a diversity-promoting beam search approach in the decoding process, which alleviates the serious diversity issue caused by standard beam search and hence increases the possibility of generating summary sequences that are more informative. Second, we creatively utilize the attention mechanism combined with the key information of the input document as an estimation of the salient information coverage, which aids in finding the optimal summary sequence. We carry out the experimental evaluation with state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of our proposed method. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

14 pages, 3205 KiB  
Article
Unsupervised Domain Adaptation with Coupled Generative Adversarial Autoencoders
by Xiaoqing Wang and Xiangjun Wang
Appl. Sci. 2018, 8(12), 2529; https://doi.org/10.3390/app8122529 - 7 Dec 2018
Cited by 10 | Viewed by 4069
Abstract
When large-scale annotated data are not available for certain image classification tasks, training a deep convolutional neural network model becomes challenging. Some recent domain adaptation methods try to solve this problem using generative adversarial networks and have achieved promising results. However, these methods are based on a shared latent space assumption and they do not consider the situation when shared high level representations in different domains do not exist or are not ideal as they assumed. To overcome this limitation, we propose a neural network structure called coupled generative adversarial autoencoders (CGAA) that allows a pair of generators to learn the high-level differences between two domains by sharing only part of the high-level layers. Additionally, by introducing a class consistent loss calculated by a stand-alone classifier into the generator optimization, our model is able to generate class invariant style-transferred images suitable for classification tasks in domain adaptation. We apply CGAA to several domain transferred image classification scenarios including several benchmark datasets. Experiment results have shown that our method can achieve state-of-the-art classification results. Full article
(This article belongs to the Special Issue Advances in Deep Learning)

Review

24 pages, 11091 KiB  
Review
A Survey on Deep Learning-Driven Remote Sensing Image Scene Understanding: Scene Classification, Scene Retrieval and Scene-Guided Object Detection
by Yating Gu, Yantian Wang and Yansheng Li
Appl. Sci. 2019, 9(10), 2110; https://doi.org/10.3390/app9102110 - 23 May 2019
Cited by 121 | Viewed by 10344
Abstract
As a fundamental and important task in remote sensing, remote sensing image scene understanding (RSISU) has attracted tremendous research interest in recent years. RSISU includes the following sub-tasks: remote sensing image scene classification, remote sensing image scene retrieval, and scene-driven remote sensing image object detection. Although these sub-tasks have different goals, they share some common cues. Hence, this paper discusses them as a whole. As in other domains (e.g., speech recognition and natural image recognition), deep learning has become the state-of-the-art technique in RSISU. To facilitate the sustainable progress of RSISU, this paper presents a comprehensive review of deep-learning-based RSISU methods and points out some future research directions and potential applications of RSISU.
(This article belongs to the Special Issue Advances in Deep Learning)

29 pages, 10285 KiB  
Review
Review of Artificial Intelligence Adversarial Attack and Defense Technologies
by Shilin Qiu, Qihe Liu, Shijie Zhou and Chunjiang Wu
Appl. Sci. 2019, 9(5), 909; https://doi.org/10.3390/app9050909 - 4 Mar 2019
Cited by 261 | Viewed by 29902
Abstract
In recent years, artificial intelligence (AI) technologies have been widely used in computer vision, natural language processing, autonomous driving, and other fields. However, AI systems are vulnerable to adversarial attacks, which limits the application of AI technologies in key security fields. Therefore, improving the robustness of AI systems against adversarial attacks has played an increasingly important role in the further development of AI. This paper aims to comprehensively summarize the latest research progress on adversarial attack and defense technologies in deep learning. Organized by the stage of the target model at which the attack occurs, this paper describes adversarial attack methods in the training stage and in the testing stage. Then, we survey the applications of adversarial attack technologies in computer vision, natural language processing, cyberspace security, and the physical world. Finally, we describe the existing adversarial defense methods in three main categories, i.e., modifying data, modifying models, and using auxiliary tools.
(This article belongs to the Special Issue Advances in Deep Learning)

Other

9 pages, 2859 KiB  
Letter
Variable Chromosome Genetic Algorithm for Structure Learning in Neural Networks to Imitate Human Brain
by Kang-moon Park, Donghoon Shin and Sung-do Chi
Appl. Sci. 2019, 9(15), 3176; https://doi.org/10.3390/app9153176 - 5 Aug 2019
Cited by 17 | Viewed by 3545
Abstract
This paper proposes the variable chromosome genetic algorithm (VCGA) for structure learning in neural networks. Currently, the structural parameters of neural networks, i.e., number of neurons, coupling relations, number of layers, etc., are mostly designed on the basis of the heuristic knowledge of an artificial intelligence (AI) expert. To overcome this limitation, this study uses an evolutionary approach (EA) to automatically generate proper artificial neural network (ANN) structures. VCGA introduces a new genetic operation called chromosome attachment. By applying the VCGA, the initial ANN structures can be flexibly evolved toward the proper structure. A case study on the typical exclusive or (XOR) problem shows the feasibility of our methodology. Our approach differs from others in that it uses a variable chromosome in the genetic algorithm, which lets the neural network structure vary naturally, both constructively and destructively. The results show that the XOR problem is successfully optimized using a VCGA with chromosome attachment to learn the structure of neural networks. Structure learning for more complex problems is the topic of our future research.
(This article belongs to the Special Issue Advances in Deep Learning)

17 pages, 7106 KiB  
Case Report
Evaluation of Deep Learning Neural Networks for Surface Roughness Prediction Using Vibration Signal Analysis
by Wan-Ju Lin, Shih-Hsuan Lo, Hong-Tsu Young and Che-Lun Hung
Appl. Sci. 2019, 9(7), 1462; https://doi.org/10.3390/app9071462 - 8 Apr 2019
Cited by 87 | Viewed by 8940
Abstract
In-process intelligent monitoring systems that use surface roughness (Ra) to indicate product quality in the milling process are under development. In terms of convenient installation and cost-effectiveness, accelerator vibration signals combined with deep learning predictive models are a potential tool for predicting surface roughness. In this paper, three models, namely, Fast Fourier Transform-Deep Neural Networks (FFT-DNN), Fast Fourier Transform Long Short Term Memory Network (FFT-LSTM), and one-dimensional convolutional neural network (1-D CNN), are used to explore the training and prediction performances. Feature extraction plays an important role in the training and prediction results. FFT and the one-dimensional convolution filter, known as 1-D CNN, are employed to extract features from the raw vibration signal data. The results show the following: (1) the LSTM model, with its temporal modeling ability, achieves good performance at higher Ra values and (2) 1-D CNN, which is better at extracting features, exhibits highly accurate prediction performance at lower Ra ranges. Based on these results, vibration signals combined with a deep learning predictive model could be applied to predict the surface roughness in the milling process, and predicting surface roughness from vibration signals using FFT-LSTM or 1-D CNN is recommended for developing an intelligent system.
(This article belongs to the Special Issue Advances in Deep Learning)
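
As an illustration of the FFT feature-extraction step mentioned in the abstract above, the following minimal sketch turns a raw vibration signal into a small vector of band-averaged spectral magnitudes that could be fed to a downstream model. It assumes NumPy; the function name `fft_magnitude_features`, the eight-band averaging, the 1 kHz sample rate, and the synthetic two-tone signal are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def fft_magnitude_features(signal, sample_rate_hz, n_bins=8):
    """Return frequencies, one-sided magnitudes, and band-averaged features."""
    signal = np.asarray(signal, dtype=float)
    # Remove the DC offset so the zero-frequency bin does not dominate.
    centered = signal - signal.mean()
    # One-sided magnitude spectrum of the real-valued input.
    spectrum = np.abs(np.fft.rfft(centered)) / len(centered)
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate_hz)
    # Average the magnitudes inside n_bins roughly equal-width bands.
    bands = np.array_split(spectrum, n_bins)
    features = np.array([band.mean() for band in bands])
    return freqs, spectrum, features


# Synthetic stand-in for a vibration signal: 50 Hz and 120 Hz tones.
sample_rate_hz = 1000
t = np.arange(0, 1.0, 1.0 / sample_rate_hz)
synthetic = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, spectrum, features = fft_magnitude_features(synthetic, sample_rate_hz)
peak_hz = freqs[np.argmax(spectrum)]
```

Here `peak_hz` recovers the dominant 50 Hz component, and `features` is the fixed-length vector that a model such as a DNN or LSTM would take as input in a pipeline of this kind.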