Topic Editors

Prof. Dr. Hamad Naeem
School of Computer Science and Technology, Zhoukou Normal University, Zhoukou 466001, China
Prof. Dr. Hong Su
School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
Prof. Dr. Amjad Alsirhani
College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
Prof. Dr. Muhammad Shoaib Bhutta
School of Automobile Engineering, Guilin University of Aerospace Technology, Guilin 541004, China

Research on Deep Neural Networks for Video Motion Recognition

Abstract submission deadline
30 November 2024
Manuscript submission deadline
31 January 2025

Topic Information

Dear Colleagues,

Deep neural networks have been widely used for video motion recognition tasks, such as action recognition, activity recognition, and gesture recognition. This Topic aims to bridge the gap between theoretical research and practical applications in the field of video motion recognition using deep neural networks. The articles are expected to provide insights into the latest trends and advancements in the field and their potential to address real-world problems in various domains. The Topic will also highlight the limitations and open research problems in the area, paving the way for future research directions. The contributions are expected to provide a comprehensive and detailed understanding of the underlying principles and techniques of deep-neural-network-based video motion recognition, facilitating the development of innovative solutions and techniques to overcome the existing challenges in the field. Topics of interest include but are not limited to:

  • Novel deep neural network architectures for video motion recognition;
  • Learning spatiotemporal features for video motion recognition (see the illustrative sketch after this list);
  • Transfer learning and domain adaptation for video motion recognition;
  • Large-scale video datasets and benchmarking for video motion recognition;
  • Applications of deep neural networks for video motion recognition, such as human–computer interaction, surveillance, and sports analysis;
  • Applications of explainable artificial intelligence for video motion recognition.
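
As a toy illustration of the "learning spatiotemporal features" theme above, the sketch below shows a minimal 3D-convolutional classifier over short video clips. It is written in PyTorch purely for exposition; the module names, layer sizes, and clip shape are assumptions for this sketch, not a reference to any submitted work.

```python
# Illustrative only: a minimal 3D-CNN for learning spatiotemporal features
# from short video clips of shape (batch, channels, frames, height, width).
import torch
import torch.nn as nn

class TinySpatioTemporalNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # joint space-time convolution
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # downsample space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                      # global space-time pooling
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, 3, T, H, W) -> logits: (batch, num_classes)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

# Example: 8 clips of 16 RGB frames at 112x112 resolution.
logits = TinySpatioTemporalNet(num_classes=10)(torch.randn(8, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([8, 10])
```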

Prof. Dr. Hamad Naeem
Prof. Dr. Hong Su
Prof. Dr. Amjad Alsirhani
Prof. Dr. Muhammad Shoaib Bhutta
Topic Editors

Keywords

  • deep learning
  • video analysis
  • motion detection
  • computer vision
  • neural networks

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Future Internet | 2.8 | 7.1 | 2009 | 13.1 Days | CHF 1600
Information | 2.4 | 6.9 | 2010 | 14.9 Days | CHF 1600
Journal of Imaging | 2.7 | 5.9 | 2015 | 20.9 Days | CHF 1800
Mathematics | 2.3 | 4.0 | 2013 | 17.1 Days | CHF 2600
Symmetry | 2.2 | 5.4 | 2009 | 16.8 Days | CHF 2400

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of these benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (2 papers)

16 pages, 5429 KiB  
Article
Video WeAther RecoGnition (VARG): An Intensity-Labeled Video Weather Recognition Dataset
by Himanshu Gupta, Oleksandr Kotlyar, Henrik Andreasson and Achim J. Lilienthal
J. Imaging 2024, 10(11), 281; https://doi.org/10.3390/jimaging10110281 - 5 Nov 2024
Viewed by 557
Abstract
Adverse weather (rain, snow, and fog) can negatively impact computer vision tasks by introducing noise in sensor data; therefore, it is essential to recognize weather conditions for building safe and robust autonomous systems in the agricultural and autonomous driving/drone sectors. The performance degradation in computer vision tasks due to adverse weather depends on the type of weather and the intensity, which influences the amount of noise in sensor data. However, existing weather recognition datasets often lack intensity labels, limiting their effectiveness. To address this limitation, we present VARG, a novel video-based weather recognition dataset with weather intensity labels. The dataset comprises a diverse set of short video sequences collected from various social media platforms and videos recorded by the authors, processed into usable clips, and categorized into three major weather categories, rain, fog, and snow, with three intensity classes: absent/no, moderate, and high. The dataset contains 6742 annotated clips from 1079 videos, with the training set containing 5159 clips and the test set containing 1583 clips. Two sets of annotations are provided for training, the first set to train the models as a multi-label weather intensity classifier and the second set to train the models as a multi-class classifier for three weather scenarios. This paper describes the dataset characteristics and presents an evaluation study using several deep learning-based video recognition approaches for weather intensity prediction.
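
The abstract above describes two annotation schemes: multi-label weather-intensity labels (rain, fog, and snow can each carry an intensity) and multi-class weather-scenario labels. The following is a hedged sketch, not the authors' code, of what the two corresponding classifier heads could look like on top of an arbitrary video backbone; the feature dimension, class names, and module names are placeholders chosen for illustration.

```python
# Hedged sketch: two classifier heads matching the two annotation schemes
# described in the VARG abstract. A shared video backbone is assumed; its
# output size (feat_dim) is a placeholder.
import torch
import torch.nn as nn

WEATHERS = ["rain", "fog", "snow"]
INTENSITIES = ["absent", "moderate", "high"]

class WeatherHeads(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Multi-label head: one 3-way intensity prediction per weather type,
        # since several weather conditions may appear in the same clip.
        self.intensity_head = nn.Linear(feat_dim, len(WEATHERS) * len(INTENSITIES))
        # Multi-class head: one of the three weather scenarios per clip.
        self.scenario_head = nn.Linear(feat_dim, len(WEATHERS))

    def forward(self, clip_features: torch.Tensor):
        b = clip_features.shape[0]
        intensity_logits = self.intensity_head(clip_features).view(b, len(WEATHERS), len(INTENSITIES))
        scenario_logits = self.scenario_head(clip_features)
        return intensity_logits, scenario_logits

# Example with random backbone features for a batch of 4 clips.
intensity_logits, scenario_logits = WeatherHeads()(torch.randn(4, 512))
print(intensity_logits.shape, scenario_logits.shape)  # (4, 3, 3) and (4, 3)
```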

16 pages, 1535 KiB  
Article
Temporal–Semantic Aligning and Reasoning Transformer for Audio-Visual Zero-Shot Learning
by Kaiwen Zhang, Kunchen Zhao and Yunong Tian
Mathematics 2024, 12(14), 2200; https://doi.org/10.3390/math12142200 - 13 Jul 2024
Cited by 1 | Viewed by 655
Abstract
Zero-shot learning (ZSL) enables models to recognize categories not encountered during training, which is crucial for categories with limited data. Existing methods overlook efficient temporal modeling in multimodal data. This paper proposes a Temporal–Semantic Aligning and Reasoning Transformer (TSART) for spatio-temporal modeling. TSART uses the pre-trained SeLaVi network to extract audio and visual features and explores the semantic information of these modalities through audio and visual encoders. It incorporates a temporal information reasoning module to enhance the capture of temporal features in audio, and a cross-modal reasoning module to effectively integrate audio and visual information, establishing a robust joint embedding representation. Our experimental results validate the effectiveness of this approach, demonstrating outstanding Generalized Zero-Shot Learning (GZSL) performance on the UCF101 Generalized Zero-Shot Learning (UCF-GZSL), VGGSound-GZSL, and ActivityNet-GZSL datasets, with notable improvements in the Harmonic Mean (HM) evaluation. These results indicate that TSART has great potential in handling complex spatio-temporal information and multimodal fusion.
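
For readers unfamiliar with the Harmonic Mean (HM) evaluation mentioned above: in GZSL it balances accuracy on seen classes (S) and unseen classes (U), HM = 2·S·U/(S + U), so a model cannot score well by excelling on only one of the two. The helper below reflects this standard formula from the GZSL literature, not code from the paper; the example numbers are made up.

```python
# Standard GZSL harmonic-mean metric over seen- and unseen-class accuracies.
def gzsl_harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

print(gzsl_harmonic_mean(0.60, 0.40))  # 0.48 (illustrative values)
```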
