
AI Multimedia Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (15 September 2022) | Viewed by 32148

Special Issue Editors


Prof. Dr. Shih-Chia Huang
Guest Editor
Department of Electronic Engineering, National Taipei University of Technology, Taipei, Taiwan
Interests: intelligent multimedia systems; deep learning and artificial intelligence; image processing and video coding; intelligent video surveillance systems; cloud computing and big data analytics; mobile applications and systems

Dr. Yan-Tsung Peng
Guest Editor
Department of Computer Science, National Chengchi University, Taipei 116011, Taiwan
Interests: image processing; video compression; machine learning and its applications

Prof. Dr. Benjamin C. M. Fung
Guest Editor
Canada Research Chair, School of Information Studies, McGill University, Montreal, Canada
Interests: data mining; artificial intelligence; data privacy; machine learning; cybersecurity

Dr. Cheng Zhang
Guest Editor
Department of Mechanical Systems Engineering, College of Engineering, Ibaraki University, Ibaraki 316-8511, Japan
Interests: communication network engineering; machine control algorithm; Internet of Things; AI robot; reinforcement learning; embedded software and systems

Special Issue Information

Dear Colleagues,

Advances in AI technology are developing the world at a rapid pace. These technologies have created a better environment for people to live in, particularly through multimedia applications and medical signal processing, such as surveillance systems, medical sensor systems, robotics, computer vision systems, image restoration systems, electroencephalogram signal processing, and so on. In addition, these applications involve all kinds of image/vision/camera/acoustic sensors and sensing systems that acquire the data needed to develop and verify such systems. To offer more adaptive approaches to better understand the complex and changing world, we invite researchers to submit papers on AI and multimedia applications. This Special Issue addresses all types of AI-based multimedia sensors designed to help people understand the world and live more easily.

Prof. Dr. Shih-Chia Huang
Dr. Yan-Tsung Peng
Prof. Dr. Benjamin C. M. Fung
Dr. Cheng Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimedia applications
  • surveillance
  • medical sensors
  • robotics
  • computer vision
  • image restoration

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)


Research


31 pages, 23995 KiB  
Article
A Transformer-Based Model for Super-Resolution of Anime Image
by Shizhuo Xu, Vibekananda Dutta, Xin He and Takafumi Matsumaru
Sensors 2022, 22(21), 8126; https://doi.org/10.3390/s22218126 - 24 Oct 2022
Cited by 5 | Viewed by 4824
Abstract
Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely applied in real-world image processing applications, especially medical images, yet has seen relatively little use in anime image production. Furthermore, contemporary ISR tools are often based on convolutional neural networks (CNNs), while few methods attempt to use transformers, which perform well in other advanced vision tasks. We propose an anime image super-resolution (AISR) method based on the Swin Transformer. The work was carried out in several stages. First, a shallow feature extraction approach was employed to capture the input image's low-frequency information, which mainly approximates the distribution of detailed information in a spatial structure (shallow features). Next, we applied deep feature extraction to extract the image's semantic information (deep features). Finally, the image reconstruction stage combines shallow and deep features, upsamples the feature maps, and performs sub-pixel convolution to obtain many feature map channels. The novelty of the proposal is the enhancement of the low-frequency information using a Gaussian filter and the introduction of different window sizes to replace the patch merging operations in the Swin Transformer. A high-quality anime dataset was constructed to curb the effects of model robustness on the online regime. We trained our model on this dataset and tested the model quality, implementing anime image super-resolution tasks at different magnifications (2×, 4×, 8×). The results were compared numerically and graphically with those delivered by conventional CNN-based and transformer-based methods, using the standard peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) measures. The series of experiments and the ablation study show that our proposal outperforms the others.
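
Two of the ingredients named in this abstract, sub-pixel convolution for upsampling and Gaussian filtering of low-frequency content, are easy to illustrate. The PyTorch sketch below is illustrative only and does not reproduce the paper's architecture; the blending weight and kernel size are assumptions.

    import torch
    import torch.nn as nn
    import torchvision.transforms.functional as TF

    class SubPixelUpsampler(nn.Module):
        """Expand channels with a conv, then rearrange them into space."""
        def __init__(self, channels: int, scale: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)  # sub-pixel convolution step

        def forward(self, x):
            return self.shuffle(self.conv(x))

    def enhance_low_freq(img: torch.Tensor, weight: float = 0.3) -> torch.Tensor:
        # Hypothetical low-frequency boost: blend a Gaussian-blurred copy back in.
        return img + weight * TF.gaussian_blur(img, kernel_size=9)

    x = torch.rand(1, 64, 32, 32)                 # a 64-channel feature map
    y = SubPixelUpsampler(64, scale=2)(enhance_low_freq(x))
    print(y.shape)                                # torch.Size([1, 64, 64, 64])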

15 pages, 3465 KiB  
Article
Bed-Exit Behavior Recognition for Real-Time Images within Limited Range
by Cheng-Jian Lin, Ta-Sen Wei, Peng-Ta Liu, Bing-Hong Chen and Chi-Huang Shih
Sensors 2022, 22(15), 5495; https://doi.org/10.3390/s22155495 - 23 Jul 2022
Cited by 1 | Viewed by 1721
Abstract
In the context of behavior recognition, the emerging bed-exit monitoring system demands rapid deployment in the ward to support mobility and personalization. Mobility means the system can be installed and removed as required without construction; personalization indicates that human body tracking is limited to the bed region so that only the target is monitored. To satisfy the above-mentioned requirements, the behavior recognition system aims to: (1) operate on a small device, typically an embedded system; (2) process a series of images with narrow fields of view (NFV) to detect bed-related behaviors. In general, wide-range images are preferred to obtain good recognition performance for diverse behaviors, while NFV images capture only abrupt activities and therefore fit single-purpose applications. This paper develops an NFV-based behavior recognition system with low complexity to realize a bed-exit monitoring application on embedded systems. To achieve effectiveness and low complexity, a queueing-based behavior classification is proposed to keep memories of object tracking information so that a specific behavior can be identified from continuous object movement. The experimental results show that the developed system can recognize three bed behaviors, namely off bed, on bed, and return, for NFV images with accuracy rates of 95–100%.
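
As a rough illustration of the queueing idea, keeping a short memory of tracked positions and reading a behavior off the motion across the bed boundary, consider the Python sketch below. The bed coordinate, queue length, and decision rule are assumptions, not the paper's implementation.

    from collections import deque

    BED_X = 320  # assumed x-coordinate of the bed boundary in the image

    class BedExitClassifier:
        def __init__(self, history: int = 10):
            self.track = deque(maxlen=history)  # memory of recent centroids

        def update(self, centroid):
            self.track.append(centroid)
            if len(self.track) < self.track.maxlen:
                return "unknown"
            first_inside = self.track[0][0] < BED_X
            last_inside = self.track[-1][0] < BED_X
            if first_inside and not last_inside:
                return "off bed"               # moved from bed region outward
            if not first_inside and last_inside:
                return "return"                # moved back into the bed region
            return "on bed" if last_inside else "off bed"

    clf = BedExitClassifier()
    for x in range(260, 360, 10):              # a centroid drifting across the edge
        state = clf.update((x, 240))
    print(state)                               # "off bed"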

15 pages, 8206 KiB  
Article
RCA-LF: Dense Light Field Reconstruction Using Residual Channel Attention Networks
by Ahmed Salem, Hatem Ibrahem and Hyun-Soo Kang
Sensors 2022, 22(14), 5254; https://doi.org/10.3390/s22145254 - 14 Jul 2022
Cited by 2 | Viewed by 1756
Abstract
Dense multi-view image reconstruction has played an active role in research for a long time and interest has recently increased. Multi-view images can solve many problems and enhance the efficiency of many applications. This paper presents a more specific solution for reconstructing high-density light field (LF) images. We present this solution for images captured by Lytro Illum cameras to solve the implicit problem related to the discrepancy between angular and spatial resolution resulting from poor sensor resolution. We introduce the residual channel attention light field (RCA-LF) structure to solve different LF reconstruction tasks. In our approach, view images are grouped in one stack where epipolar information is available. We use 2D convolution layers to process and extract features from the stacked view images. Our method adopts the channel attention mechanism to learn the relation between different views and give higher weight to the most important features, restoring more texture details. Finally, experimental results indicate that the proposed model outperforms earlier state-of-the-art methods for visual and numerical evaluation.
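
The channel attention mechanism referred to here is, in RCAN-style networks, a squeeze-and-excitation block: globally pool each channel, pass the result through a small bottleneck, and reweight the channels. A minimal PyTorch sketch, with the reduction ratio assumed:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Squeeze-and-excitation style channel attention."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)       # global average per channel
            self.fc = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            w = self.fc(self.pool(x))   # per-channel weights in (0, 1)
            return x * w                # important channels are amplified

    x = torch.rand(2, 64, 48, 48)                    # stacked-view feature maps
    print(ChannelAttention(64)(x).shape)             # torch.Size([2, 64, 48, 48])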

15 pages, 1683 KiB  
Article
Sentiment Analysis: An ERNIE-BiLSTM Approach to Bullet Screen Comments
by Yen-Hao Hsieh and Xin-Ping Zeng
Sensors 2022, 22(14), 5223; https://doi.org/10.3390/s22145223 - 13 Jul 2022
Cited by 10 | Viewed by 3229
Abstract
Sentiment analysis is a field of affective computing that detects and evaluates people's psychological states and sentiments through text analysis. It is an important application of text mining technology and is widely used to analyze comments. Bullet screen videos have become a popular way for people to interact and communicate while watching online videos. Existing studies have focused on the form, content, and function of bullet screen comments, but few have examined bullet screen comments using natural language processing. Bullet screen comments are short text messages of varying lengths with ambiguous emotional information, which makes them extremely challenging for natural language processing. Hence, it is important to understand how the characteristics of bullet screen comments and sentiment analysis can be used to understand the sentiments expressed in, and trends of, bullet screen comments. This study poses the following research question: how can one analyze the sentiments expressed in bullet screen comments accurately and effectively? This study proposes an ERNIE-BiLSTM approach for sentiment analysis of bullet screen comments, which provides effective and innovative thinking for this task. The experimental results show that the ERNIE-BiLSTM approach achieves a higher accuracy rate, precision rate, recall rate, and F1-score than other methods.
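
The BiLSTM half of such an approach is straightforward to sketch. In the paper, a pretrained ERNIE encoder supplies the token representations, whereas the sketch below substitutes a plain embedding layer, so all sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class BiLSTMSentiment(nn.Module):
        def __init__(self, vocab_size=30000, embed_dim=128, hidden=64, classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)  # ERNIE stand-in
            self.lstm = nn.LSTM(embed_dim, hidden,
                                batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, classes)  # concat of both directions

        def forward(self, token_ids):
            h, _ = self.lstm(self.embed(token_ids))   # (batch, seq, 2*hidden)
            return self.fc(h[:, -1])                  # classify from the last step

    logits = BiLSTMSentiment()(torch.randint(0, 30000, (4, 20)))
    print(logits.shape)                               # torch.Size([4, 2])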

14 pages, 1764 KiB  
Article
End-to-End Train Horn Detection for Railway Transit Safety
by Van-Thuan Tran, Wei-Ho Tsai, Yury Furletov and Mikhail Gorodnichev
Sensors 2022, 22(12), 4453; https://doi.org/10.3390/s22124453 - 12 Jun 2022
Cited by 4 | Viewed by 2539
Abstract
The train horn sound is an active audible warning signal used for warning commuters and railway employees of oncoming trains, assuring smooth operation and traffic safety, especially at barrier-free crossings. This work studies deep learning-based approaches to develop a system providing early detection of train arrival based on the recognition of train horn sounds from the traffic soundscape. A custom dataset of train horn sounds, car horn sounds, and traffic noises was developed to conduct experiments and analysis. We propose a novel two-stream end-to-end CNN model (THD-RawNet), which combines two approaches to feature extraction from raw audio waveforms, for audio classification in train horn detection (THD). Besides a stream with a sequential one-dimensional CNN (1D-CNN), as in existing sound classification works, we propose to utilize multiple 1D-CNN branches to process raw waves at different temporal resolutions and to extract an image-like representation for the 2D-CNN classification part. Our experimental results and comparative analysis prove the effectiveness of the proposed two-stream network and the method of combining features extracted at multiple temporal resolutions. THD-RawNet obtained better accuracy and robustness than baseline models trained on either raw audio or handcrafted features: at an input size of one second, the network yielded an accuracy of 95.11% for testing data in normal traffic conditions and remained above 93% accuracy for the considerably noisy condition of −10 dB SNR. The proposed THD system can be integrated into smart railway crossing systems, private cars, and self-driving cars to improve railway transit safety.
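
The multi-temporal-resolution idea can be sketched as several 1D-CNN branches with different strides whose outputs are stacked into an image-like tensor for a 2D-CNN. The PyTorch sketch below is a simplified reading of that design; branch counts, kernel sizes, and strides are assumptions.

    import torch
    import torch.nn as nn

    class MultiResFrontEnd(nn.Module):
        def __init__(self, out_len=128, strides=(4, 8, 16)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.Conv1d(1, 16, kernel_size=s * 2, stride=s, padding=s),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool1d(out_len),   # align branch lengths
                )
                for s in strides
            )

        def forward(self, wave):                     # wave: (batch, 1, samples)
            maps = [b(wave) for b in self.branches]  # each: (batch, 16, out_len)
            return torch.stack(maps, dim=1)          # image-like stack of branches

    x = torch.rand(2, 1, 16000)                      # one second of 16 kHz audio
    print(MultiResFrontEnd()(x).shape)               # torch.Size([2, 3, 16, 128])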

18 pages, 17708 KiB  
Article
An Underwater Sequence Image Dataset for Sharpness and Color Analysis
by Miao Yang, Ge Yin, Haiwen Wang, Jinnai Dong, Zhuoran Xie and Bing Zheng
Sensors 2022, 22(9), 3550; https://doi.org/10.3390/s22093550 - 7 May 2022
Cited by 6 | Viewed by 2480
Abstract
The complex underwater environment usually degrades the quality of underwater images, and distortions of sharpness and color are the main factors affecting that quality. This paper discloses an underwater sequence image dataset called TankImage-I, with gradually changing sharpness and color distortion, collected in a pool. TankImage-I contains two plane targets and a total of 78 images. It includes two lighting conditions and three levels of water transparency; the imaging distance was also changed during the photographing process. The paper introduces the relevant details of the photographing process and provides measurements of the sharpness and color distortion of the sequence images. In addition, we verify the performance of 14 image quality assessment methods on TankImage-I and analyze their results in terms of sharpness and color, which provides a reference for the design and improvement of underwater image quality assessment algorithms and underwater imaging systems.
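
For readers who want a starting point for this kind of analysis, two standard reference measurements, variance-of-Laplacian sharpness and Hasler-Süsstrunk colorfulness, can be computed as below. The paper's own measurement protocol may differ, and the file name is hypothetical.

    import cv2
    import numpy as np

    def sharpness(img_bgr: np.ndarray) -> float:
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()  # variance of Laplacian

    def colorfulness(img_bgr: np.ndarray) -> float:
        b, g, r = cv2.split(img_bgr.astype("float32"))
        rg, yb = r - g, 0.5 * (r + g) - b             # opponent color axes
        return float(np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                     + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))

    img = cv2.imread("tank_image_001.png")            # hypothetical dataset frame
    if img is not None:
        print(sharpness(img), colorfulness(img))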

14 pages, 5397 KiB  
Article
End-to-End Residual Network for Light Field Reconstruction on Raw Images and View Image Stacks
by Ahmed Salem, Hatem Ibrahem, Bilel Yagoub and Hyun-Soo Kang
Sensors 2022, 22(9), 3540; https://doi.org/10.3390/s22093540 - 6 May 2022
Cited by 2 | Viewed by 1832
Abstract
Light field (LF) technology has become a focus of great interest (due to its use in many applications), especially since the introduction of the consumer LF camera, which facilitated the acquisition of dense LF images. Obtaining densely sampled LF images is costly due to the trade-off between spatial and angular resolutions. Accordingly, in this research, we suggest a learning-based solution to this challenging problem, reconstructing dense, high-quality LF images. Instead of training our model with several images of the same scene, we used raw LF images (lenslet images). The raw LF format enables the encoding of several images of the same scene into one image. Consequently, it helps the network to understand and simulate the relationship between different images, resulting in higher-quality images. We divided our model into two successive modules: LF reconstruction (LFR) and LF augmentation (LFA). Each module is represented by a convolutional neural network (CNN)-based residual network. We trained our network to lessen the absolute error between the novel and reference views. Experimental findings on real-world datasets show that our suggested method has excellent performance and superiority over state-of-the-art approaches.
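
The building blocks named here, residual CNN modules trained to lessen the absolute (L1) error between novel and reference views, can be sketched minimally in PyTorch; the channel width and block depth are assumptions rather than the paper's configuration.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, channels: int = 64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)   # skip connection carries the input through

    pred = ResBlock()(torch.rand(1, 64, 32, 32))
    ref = torch.rand(1, 64, 32, 32)
    loss = nn.L1Loss()(pred, ref)     # mean absolute error between views
    print(loss.item())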

19 pages, 105796 KiB  
Article
Variational Model for Single-Image Reflection Suppression Based on Multiscale Thresholding
by Pei-Chiang Shao
Sensors 2022, 22(6), 2271; https://doi.org/10.3390/s22062271 - 15 Mar 2022
Viewed by 1946
Abstract
Reflections often degrade the quality of pictures taken through a glass medium, and removing these undesired reflections is becoming increasingly important. For human vision, it produces much more pleasing results for multimedia applications; for machine vision, it benefits various applications such as image segmentation and classification. Reflection removal is a highly ill-posed inverse problem that is very difficult to solve, especially for a single input image. Existing methods mainly rely on various priors and assumptions to alleviate the ill-posedness. In this paper, we design a variational model based on multiscale hard thresholding to suppress image reflections both effectively and efficiently. A direct solver using the discrete cosine transform for implementing the proposed variational model is also provided. Both synthetic and real glass images are used in the numerical experiments to compare the performance of the proposed algorithm with other representative algorithms. The experimental results show the superiority of our algorithm over previous ones.
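
A single-scale sketch of this family of methods: hard-threshold weak gradients (attributed to the reflection layer), then recover the image by solving the resulting Poisson equation with a DCT-based direct solver. The discretization below is simplified and the threshold is an assumption; the paper applies the thresholding across multiple scales.

    import numpy as np
    from scipy.fft import dctn, idctn

    def suppress_reflection(img: np.ndarray, eps: float = 0.05) -> np.ndarray:
        gx, gy = np.gradient(img)
        mag = np.hypot(gx, gy)
        gx[mag < eps] = 0.0                 # hard thresholding of weak gradients
        gy[mag < eps] = 0.0
        div = np.gradient(gx, axis=0) + np.gradient(gy, axis=1)  # divergence

        h, w = img.shape
        yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        denom = 2 * (np.cos(np.pi * yy / h) + np.cos(np.pi * xx / w) - 2)
        denom[0, 0] = 1.0                   # avoid division by zero (DC term)

        coeffs = dctn(div, norm="ortho") / denom      # invert the Laplacian
        coeffs[0, 0] = dctn(img, norm="ortho")[0, 0]  # keep the original mean
        return idctn(coeffs, norm="ortho")

    out = suppress_reflection(np.random.rand(64, 64))
    print(out.shape)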

16 pages, 2847 KiB  
Article
Convolutional Blur Attention Network for Cell Nuclei Segmentation
by Phuong Thi Le, Tuan Pham, Yi-Chiung Hsu and Jia-Ching Wang
Sensors 2022, 22(4), 1586; https://doi.org/10.3390/s22041586 - 18 Feb 2022
Cited by 15 | Viewed by 3232
Abstract
Accurately segmented nuclei are important not only for cancer classification but also for predicting treatment effectiveness and other biomedical applications. However, the diversity of cell types, various external factors, and illumination conditions make nucleus segmentation a challenging task. In this work, we present a new deep learning-based method for cell nucleus segmentation. The proposed convolutional blur attention (CBA) network consists of downsampling and upsampling procedures. A blur attention module and a blur pooling operation are used to retain feature salience and avoid noise generation in the downsampling procedure. A pyramid blur pooling (PBP) module is proposed to capture multi-scale information in the upsampling procedure. The proposed method was compared with several prior segmentation models, namely U-Net, ENet, SegNet, LinkNet, and Mask RCNN, on the 2018 Data Science Bowl (DSB) challenge dataset and the multi-organ nucleus segmentation (MoNuSeg) dataset from MICCAI 2018. The Dice similarity coefficient and several evaluation metrics, such as F1 score, recall, precision, and average Jaccard index (AJI), were used to evaluate the segmentation efficiency of these models. Overall, the proposed method has the best performance, with AJI scores of 0.8429 on the DSB dataset and 0.7985 on MoNuSeg.
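
Blur pooling, the anti-aliased downsampling this network builds on, low-pass filters with a fixed binomial kernel before subsampling, so that small input shifts do not flip the pooled output. A minimal PyTorch sketch, with kernel and stride assumed:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlurPool2d(nn.Module):
        """Anti-aliased downsampling: fixed binomial blur, then stride-2 subsample."""
        def __init__(self, channels: int, stride: int = 2):
            super().__init__()
            k = torch.tensor([1.0, 2.0, 1.0])
            k = torch.outer(k, k)                          # 3x3 binomial kernel
            k = (k / k.sum()).reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1)
            self.register_buffer("kernel", k)              # fixed, not learned
            self.stride = stride
            self.channels = channels

        def forward(self, x):
            return F.conv2d(x, self.kernel, stride=self.stride,
                            padding=1, groups=self.channels)

    x = torch.rand(1, 32, 64, 64)
    print(BlurPool2d(32)(x).shape)  # torch.Size([1, 32, 32, 32])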

20 pages, 7490 KiB  
Article
Detail Preserving Low Illumination Image and Video Enhancement Algorithm Based on Dark Channel Prior
by Lingli Guo, Zhenhong Jia, Jie Yang and Nikola K. Kasabov
Sensors 2022, 22(1), 85; https://doi.org/10.3390/s22010085 - 23 Dec 2021
Cited by 4 | Viewed by 2884
Abstract
In low illumination situations, insufficient light reaching the monitoring device results in poor visibility of effective information, which cannot meet the needs of practical applications. To overcome these problems, a detail-preserving low illumination video image enhancement algorithm based on the dark channel prior is proposed in this paper. First, a dark channel refinement method is proposed, defined by imposing a structure prior on the initial dark channel to improve image brightness. Second, an anisotropic guided filter (AnisGF) is used to refine the transmission, which preserves the edges of the image. Finally, a detail enhancement algorithm is proposed to avoid insufficient detail in the initially enhanced image. To avoid video flicker, subsequent video frames are enhanced based on the brightness of the first enhanced frame. Qualitative and quantitative analysis shows that the proposed algorithm is superior to the comparison algorithms, ranking first in average gradient, edge intensity, contrast, and patch-based contrast quality index. It can be effectively applied to the enhancement of surveillance video images and to wider computer vision applications.
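
The dark-channel route to low-light enhancement rests on a well-known observation: an inverted low-light image resembles a hazy one, so dehazing the inversion and inverting back brightens the scene. The sketch below shows that base pipeline without the paper's refinement, filtering, or flicker-control steps; the constants and file names are assumptions.

    import cv2
    import numpy as np

    def enhance_low_light(img_bgr: np.ndarray, omega: float = 0.8) -> np.ndarray:
        inv = 255.0 - img_bgr.astype(np.float32)               # invert: looks hazy
        dark = cv2.erode(inv.min(axis=2), np.ones((15, 15), np.uint8))  # dark channel
        A = float(np.percentile(inv, 99))                      # atmospheric light
        t = np.clip(1.0 - omega * dark / A, 0.1, 1.0)[..., None]  # transmission
        dehazed = (inv - A) / t + A                            # dehaze the inversion
        return np.clip(255.0 - dehazed, 0, 255).astype(np.uint8)  # invert back

    frame = cv2.imread("night_frame.png")   # hypothetical surveillance frame
    if frame is not None:
        cv2.imwrite("enhanced.png", enhance_low_light(frame))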

Review


65 pages, 9169 KiB  
Review
A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint
by Ubaid Ullah, Jeong-Sik Lee, Chang-Hyeon An, Hyeonjin Lee, Su-Yeong Park, Rock-Hyun Baek and Hyun-Chul Choi
Sensors 2022, 22(18), 6816; https://doi.org/10.3390/s22186816 - 8 Sep 2022
Cited by 5 | Viewed by 4183
Abstract
For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Text and visual data (images and videos) are two distinct data domains with extensive research histories. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural nets has shown a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements have also raised new challenges for researchers. Moreover, the connection between these two domains has mainly relied on GANs, limiting the horizons of this field. This review analyzes Text-to-Image (T2I) synthesis within the broader picture of Text-guided Visual output (T2Vo), with the primary goal of highlighting the gaps by proposing a more comprehensive taxonomy. We broadly categorize text-guided visual output into three main divisions and meaningful subdivisions by critically examining an extensive body of literature from top-tier computer vision venues and closely related fields, such as machine learning and human-computer interaction, covering state-of-the-art models with a comparative analysis. This study follows previous surveys on T2I, adding value by analogously evaluating the diverse range of existing methods, including different generative models and several types of visual output, critically examining various approaches, highlighting their shortcomings, and suggesting future directions of research.
