Deep Learning for Image and Video Understanding

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Evolutionary Algorithms and Machine Learning".

Deadline for manuscript submissions: closed (31 March 2019)

Special Issue Editors


Prof. Adil Mehmood Khan
Guest Editor
Machine Learning & Knowledge Representation Lab, Innopolis University, Innopolis, Russia
Interests: machine learning; data mining; pattern recognition; context-aware computing; intelligent systems; data modeling and analysis

Prof. Adín Ramírez Rivera
Guest Editor
Institute of Computing, Universidade Estadual de Campinas, Brazil
Interests: image processing; computer vision; machine learning

Special Issue Information

Dear Colleagues,

Tasks that seem trivial for humans, such as recognizing a handwritten digit in an image, become dauntingly difficult when we try to automate them with a conventional computer program. Artificial neural networks, and deep learning in particular, offer a way forward: such methods learn to model complex problems as layered representations of simple concepts, directly from data, without requiring hand-crafted features or hard-coded expert knowledge.

Deep learning methods are therefore being employed on a large scale to solve computer vision problems. We invite you to submit your latest research in the area of deep learning and computer vision to this Special Issue, “Deep Learning for Image and Video Understanding.” We are looking for new and innovative deep learning approaches to problems such as object detection, segmentation, recognition, tracking, and action recognition.

High-quality papers are solicited that address both theoretical and practical issues of deep learning algorithms. Submissions are welcome both for traditional computer vision problems and for new applications. Potential topics include, but are not limited to:

  • K-shot learning for image and video understanding
  • Open set recognition for image and video understanding
  • Small-data learning for image and video understanding
  • Hierarchical and ensemble learning for image and video understanding
  • Data augmentation and transfer learning for image and video understanding
  • Semantic segmentation
  • Representation learning, feature detection and description for image and video understanding
  • Scene modeling and reconstruction
  • Scene understanding
  • Object detection, recognition and classification
  • Object pose estimation and tracking for image and video understanding
  • Person detection, tracking and identification for image and video understanding
  • Action and activity recognition for image and video understanding
  • Video annotation

Prof. Adil Mehmood Khan
Prof. Adín Ramírez Rivera
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and are listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and a short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • image processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

13 pages, 4317 KiB  
Article
Refinement of Background-Subtraction Methods Based on Convolutional Neural Network Features for Dynamic Background
by Tianming Yu, Jianhua Yang and Wei Lu
Algorithms 2019, 12(7), 128; https://doi.org/10.3390/a12070128 - 27 Jun 2019
Cited by 5
Abstract
Advancing background-subtraction methods in dynamic scenes remains a timely goal for many researchers. Recently, background-subtraction methods have been developed with deep convolutional features, which have improved their performance. However, most of these deep methods are supervised, are only applicable to a specific scene, and have a high computational cost. In contrast, traditional background-subtraction methods have low computational costs and can be applied to general scenes. Therefore, in this paper, we propose an unsupervised and concise method, based on the features learned by a deep convolutional neural network, to refine traditional background-subtraction methods. In the proposed method, the low-level features of an input image are extracted from the lower layers of a pretrained convolutional neural network, and the main features are retained to establish the dynamic background model. Evaluation on dynamic scenes demonstrates that the proposed method significantly improves the performance of traditional background-subtraction methods.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
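
To make the idea concrete, here is a minimal sketch of how low-level features from an early layer of a pretrained CNN might refine a classical background-subtraction mask. It is an illustration only, not the authors' exact pipeline: the feature extractor (VGG16's first convolutional block), the running-average feature background model, and the update rate and threshold are all assumptions.

```python
# Hypothetical sketch: refining a classical background-subtraction mask with
# low-level features from a pretrained CNN (not the paper's exact method).
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Early layers of a pretrained VGG16 serve as a low-level feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:4].eval()

mog2 = cv2.createBackgroundSubtractorMOG2()   # traditional baseline method
bg_feat = None                                # running feature background model
ALPHA, THRESH = 0.05, 0.5                     # assumed update rate / cut-off

def refine(frame_bgr):
    global bg_feat
    coarse = mog2.apply(frame_bgr)            # coarse mask from classical method
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    x = TF.to_tensor(rgb).unsqueeze(0)
    with torch.no_grad():
        feat = vgg(x)[0]                      # (C, H, W) low-level features
    if bg_feat is None:
        bg_feat = feat.clone()
    # Per-pixel feature distance to the dynamic background model.
    dist = torch.norm(feat - bg_feat, dim=0)
    dist = (dist / (dist.max() + 1e-8)).numpy()
    # Slowly absorb the current frame into the background model.
    bg_feat = (1 - ALPHA) * bg_feat + ALPHA * feat
    fg = cv2.resize(dist, (frame_bgr.shape[1], frame_bgr.shape[0]))
    # Keep coarse detections only where CNN features also indicate change.
    return ((coarse > 0) & (fg > THRESH)).astype(np.uint8) * 255
```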

16 pages, 3842 KiB  
Article
Triplet Loss Network for Unsupervised Domain Adaptation
by Imad Eddine Ibrahim Bekkouch, Youssef Youssry, Rustam Gafarov, Adil Khan and Asad Masood Khattak
Algorithms 2019, 12(5), 96; https://doi.org/10.3390/a12050096 - 8 May 2019
Cited by 12
Abstract
Domain adaptation is a sub-field of transfer learning that aims at bridging the dissimilarity gap between domains by transferring and re-using the knowledge obtained in the source domain in the target domain. Many methods have been proposed to solve this problem using techniques such as generative adversarial networks (GANs), but the complexity of such methods makes them hard to apply across problems, as fine-tuning such networks is usually time-consuming. In this paper, we propose a method for unsupervised domain adaptation that is both simple and effective. Our model (referred to as TripNet) harnesses the idea of a discriminator and linear discriminant analysis (LDA) to push the encoder to generate domain-invariant features that are category-informative. At the same time, pseudo-labelling of the target data is used to train the classifier and to bring the same classes from both domains together. We evaluate TripNet against several existing state-of-the-art methods on three image classification tasks: digit classification (MNIST, SVHN, and USPS datasets), object recognition (Office31 dataset), and traffic sign recognition (GTSRB and Synthetic Signs datasets). Our experimental results demonstrate that (i) TripNet beats almost all existing methods of comparable simplicity on all of these tasks, and (ii) in some cases it even beats models that are significantly more complex (or harder to train). Hence, the results confirm the effectiveness of TripNet for unsupervised domain adaptation in image classification.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
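
The core mechanism, a triplet loss over encoder features combined with pseudo-labelled target samples, can be sketched as below. This is a toy illustration under stated assumptions, not the published TripNet: the discriminator and LDA components are omitted, and the encoder architecture, triplet pairing strategy, and hyperparameters are placeholders.

```python
# Minimal sketch of a TripNet-style training step (simplified; the published
# model also uses a discriminator and LDA, which are omitted here).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                        nn.Linear(256, 64))       # toy encoder for 28x28 digits
classifier = nn.Linear(64, 10)
triplet = nn.TripletMarginLoss(margin=1.0)
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(classifier.parameters()), lr=1e-3)

def train_step(x_src, y_src, x_tgt):
    z_src, z_tgt = encoder(x_src), encoder(x_tgt)
    # Pseudo-label the unlabeled target batch with the current classifier.
    pseudo = classifier(z_tgt).argmax(dim=1)
    loss = ce(classifier(z_src), y_src)
    # Anchor: source sample; positive: target sample sharing its (pseudo)
    # label; negative: source sample with a different label. Naive pairing.
    for i in range(len(y_src)):
        pos = (pseudo == y_src[i]).nonzero(as_tuple=True)[0]
        neg = (y_src != y_src[i]).nonzero(as_tuple=True)[0]
        if len(pos) and len(neg):
            loss = loss + triplet(z_src[i:i+1], z_tgt[pos[:1]], z_src[neg[:1]])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Pulling same-class samples from both domains together while pushing other classes away is what makes the learned features both domain-invariant and category-informative.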

14 pages, 8047 KiB  
Article
Learning an Efficient Convolution Neural Network for Pansharpening
by Yecai Guo, Fei Ye and Hao Gong
Algorithms 2019, 12(1), 16; https://doi.org/10.3390/a12010016 - 8 Jan 2019
Cited by 9
Abstract
Pansharpening is a domain-specific task in satellite imagery processing that aims at fusing a multispectral image with a corresponding panchromatic image to enhance the spatial resolution of the multispectral image. Most existing traditional methods fuse multispectral and panchromatic images in a linear manner, which greatly restricts the fusion accuracy. In this paper, we propose a highly efficient inference network for pansharpening that breaks the linear limitation of traditional methods. In the network, we adopt a dilated multilevel block coupled with a skip connection to perform local and overall compensation. By using the dilated multilevel block, the proposed model can make full use of the extracted features and enlarge the receptive field without introducing extra computational burden. Experimental results reveal that our network achieves competitive or even superior pansharpening performance compared with deeper models. As our network is shallow and trained with several techniques to prevent overfitting, our model is robust to inconsistencies across different satellites.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
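
A hedged sketch of what a dilated multilevel block with a skip connection could look like follows; the channel counts, dilation rates, and block depth here are assumptions, not the paper's exact architecture.

```python
# Illustrative (assumed) dilated multilevel block for pansharpening.
import torch
import torch.nn as nn

class DilatedMultilevelBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Parallel branches with increasing dilation enlarge the receptive
        # field while each branch keeps a cheap 3x3 kernel.
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        out = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(out)   # skip connection: local/overall compensation

class PansharpenNet(nn.Module):
    """Fuses an upsampled multispectral image with a panchromatic band."""
    def __init__(self, ms_bands=4, ch=32):
        super().__init__()
        self.head = nn.Conv2d(ms_bands + 1, ch, 3, padding=1)
        self.body = nn.Sequential(DilatedMultilevelBlock(ch),
                                  DilatedMultilevelBlock(ch))
        self.tail = nn.Conv2d(ch, ms_bands, 3, padding=1)

    def forward(self, ms_up, pan):
        x = torch.relu(self.head(torch.cat([ms_up, pan], dim=1)))
        return ms_up + self.tail(self.body(x))   # predict the spatial detail

# e.g. PansharpenNet()(torch.rand(1, 4, 64, 64), torch.rand(1, 1, 64, 64))
```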

16 pages, 5416 KiB  
Article
A Robust Visual Tracking Algorithm Based on Spatial-Temporal Context Hierarchical Response Fusion
by Wancheng Zhang, Yanmin Luo, Zhi Chen, Yongzhao Du, Daxin Zhu and Peizhong Liu
Algorithms 2019, 12(1), 8; https://doi.org/10.3390/a12010008 - 26 Dec 2018
Cited by 5
Abstract
Discriminative correlation filters (DCFs) have demonstrated superior performance in visual object tracking. However, visual tracking remains challenging when target objects undergo complex scenarios such as occlusion, deformation, scale changes, and illumination changes. In this paper, we utilize the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on the convolutional layers. The translation is then estimated by fusing the response scores of the filters on three convolutional layers. For scale estimation, we learn a discriminative correlation filter to estimate scale from the best-confidence results. Furthermore, we propose a re-detection activation discrimination method to improve the robustness of visual tracking in the case of tracking failure, and an adaptive model-update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on the OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
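
The response-fusion step can be illustrated as follows: correlation response maps from three layers are resized to a common grid, normalized, and combined, and the peak of the fused map gives the translation estimate. The fusion weights and map sizes below are assumptions, not the paper's values.

```python
# Hedged sketch of hierarchical response fusion for DCF tracking.
import torch
import torch.nn.functional as F

def fuse_responses(responses, weights=(0.25, 0.35, 0.4), size=(64, 64)):
    """responses: 2-D correlation maps from shallow to deep CNN layers."""
    fused = torch.zeros(size)
    for r, w in zip(responses, weights):
        # Resize every map to a common grid before combining.
        r = F.interpolate(r[None, None], size=size, mode='bilinear',
                          align_corners=False)[0, 0]
        r = (r - r.min()) / (r.max() - r.min() + 1e-8)   # normalize each map
        fused += w * r
    # Peak of the fused map gives displacement from the patch center.
    peak = torch.argmax(fused)
    dy, dx = divmod(peak.item(), size[1])
    return dy - size[0] // 2, dx - size[1] // 2

# e.g. fuse_responses([torch.rand(64, 64), torch.rand(32, 32),
#                      torch.rand(16, 16)])
```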

15 pages, 11769 KiB  
Article
A Study on Faster R-CNN-Based Subway Pedestrian Detection with ACE Enhancement
by Hongquan Qu, Meihan Wang, Changnian Zhang and Yun Wei
Algorithms 2018, 11(12), 192; https://doi.org/10.3390/a11120192 - 26 Nov 2018
Cited by 6
Abstract
At present, the problem of pedestrian detection has attracted increasing attention in the field of computer vision. Faster regions with convolutional neural network features (Faster R-CNN) is regarded as one of the most important techniques for studying this problem. However, the detection capability of a model trained with Faster R-CNN is susceptible to the diversity of pedestrians' appearance and to the light intensity in specific scenarios, such as a subway, which can lead to a decline in recognition rate and an offset in target selection for pedestrians. In this paper, we propose a modified Faster R-CNN method with automatic color enhancement (ACE), which improves sample contrast by calculating the relative light and dark relationships to correct the final pixel values. In addition, a calibration method based on sample category reduction is presented to accurately locate the detection target. Then, we apply the Faster R-CNN target detection framework to the experimental dataset. Finally, the effectiveness of this method is verified on actual data samples collected from subway passenger monitoring video.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
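
As a rough illustration, the sketch below applies a simplified ACE-style enhancement (each pixel's deviation from its local mean brightness is amplified) before running a stock torchvision Faster R-CNN. The paper's exact ACE formulation and trained detector differ, so treat the kernel size, gain, and score threshold as assumptions.

```python
# Hedged sketch: simplified ACE-style contrast enhancement feeding a
# pretrained Faster R-CNN (not the authors' modified model).
import cv2
import numpy as np
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def ace_like_enhance(img_bgr, ksize=31, gain=1.5):
    img = img_bgr.astype(np.float32)
    local_mean = cv2.blur(img, (ksize, ksize))   # neighborhood brightness
    # Amplify each pixel's deviation from its local mean (relative
    # light/dark relationship) to raise contrast in dim subway footage.
    out = local_mean + gain * (img - local_mean)
    return np.clip(out, 0, 255).astype(np.uint8)

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_pedestrians(frame_bgr, score_thresh=0.7):
    enhanced = ace_like_enhance(frame_bgr)
    rgb = cv2.cvtColor(enhanced, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([x])[0]
    # Keep confident detections of the COCO "person" class (label 1).
    keep = (pred["scores"] > score_thresh) & (pred["labels"] == 1)
    return pred["boxes"][keep]
```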
