Advances Techniques in Computer Vision and Multimedia

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Big Data and Augmented Intelligence".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 15311

Special Issue Editor


Prof. Dr. Yang Wang
Guest Editor
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
Interests: pattern recognition; machine learning; multimedia computing

Special Issue Information

Dear Colleagues,

With the popularization of artificial intelligence (AI) technology, computer vision has advanced significantly and achieved great success in areas closely tied to human society, e.g., autonomous driving, virtual reality, mixed reality, and healthcare. As a research topic, computer vision aims to enable computer systems to automatically see, recognize, and understand the visual world by simulating the mechanisms of human vision.

Multimedia has also changed our lifestyles and is becoming an indispensable part of daily life. This research field mainly addresses emerging computing methods for handling the various media (images, text, audio, video, etc.) generated by ubiquitous multimedia sensors and infrastructures, including multimedia data retrieval, multimedia content analysis, deep learning-based methodologies, and practical multimedia applications.

Many researchers have devoted themselves to exploring emerging areas of computer vision and multimedia, e.g., adversarial learning for multimedia, multimodal sentiment analysis, and explainable AI. Meanwhile, numerous advanced technologies in these areas continue to emerge. This Special Issue provides an opportunity to share a timely collection of research updates and will benefit researchers and practitioners engaged in computer vision, media computing, machine learning, and other fields.

Prof. Dr. Yang Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • motion and tracking
  • image and video retrieval
  • detection and localization
  • scene analysis and understanding
  • multimedia systems
  • multimedia for society and health
  • multimedia application and services
  • multimedia security and content protection
  • multimedia communications, networking, and mobility

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available from MDPI.

Published Papers (5 papers)

Editorial

2 pages, 146 KiB  
Editorial
Advances Techniques in Computer Vision and Multimedia
by Yang Wang
Future Internet 2023, 15(9), 294; https://doi.org/10.3390/fi15090294 - 1 Sep 2023
Viewed by 1175
Abstract
Computer vision has experienced significant advancements and great success in areas closely related to human society, which aims to enable computer systems to automatically see, recognize, and understand the visual world by simulating the mechanism of human vision [...]
(This article belongs to the Special Issue Advances Techniques in Computer Vision and Multimedia)

Research

17 pages, 7576 KiB  
Article
Banging Interaction: A Ubimus-Design Strategy for the Musical Internet
by Damián Keller, Azeema Yaseen, Joseph Timoney, Sutirtha Chakraborty and Victor Lazzarini
Future Internet 2023, 15(4), 125; https://doi.org/10.3390/fi15040125 - 27 Mar 2023
Cited by 2 | Viewed by 1688
Abstract
We introduce a new perspective for musical interaction tailored to a specific class of sonic resources: impact sounds. Our work is informed by the field of ubiquitous music (ubimus) and engages with the demands of artistic practices. Through a series of deployments of a low-cost and highly flexible network-based prototype, the Dynamic Drum Collective, we exemplify the limitations and specific contributions of banging interaction. Three components of this new design strategy (adaptive interaction, mid-air techniques and timbre-led design) target the development of creative-action metaphors that make use of resources available in everyday settings. The techniques involving the use of sonic gridworks yielded positive outcomes. The subjects tended to choose sonic materials that, when combined with their actions on the prototype, approached a full rendition of the proposed soundtrack. The results of the study highlighted the subjects’ reliance on visual feedback as a non-exclusive strategy to handle both temporal organization and collaboration. The results show a methodological shift from device-centric and instrumental-centric methods to designs that target the dynamic relational properties of ubimus ecosystems.
(This article belongs to the Special Issue Advances Techniques in Computer Vision and Multimedia)

14 pages, 7919 KiB  
Article
Neural Network-Based Price Tag Data Analysis
by Pavel Laptev, Sergey Litovkin, Sergey Davydenko, Anton Konev, Evgeny Kostyuchenko and Alexander Shelupanov
Future Internet 2022, 14(3), 88; https://doi.org/10.3390/fi14030088 - 13 Mar 2022
Cited by 6 | Viewed by 4048
Abstract
This paper compares neural networks, specifically Unet, MobileNetV2, VGG16 and YOLOv4-tiny, for image segmentation as part of a study aimed at finding an optimal solution for price tag data analysis. The neural networks considered were trained on a custom dataset collected by the authors. Additionally, this paper covers an automatic image text recognition approach using the EasyOCR API. The study found that the optimal network for segmentation is YOLOv4-tiny, with a cross-validation accuracy of 96.92%. The accuracy of EasyOCR was also measured and is 95.22%.
(This article belongs to the Special Issue Advances Techniques in Computer Vision and Multimedia)

23 pages, 3828 KiB  
Article
DA-GAN: Dual Attention Generative Adversarial Network for Cross-Modal Retrieval
by Liewu Cai, Lei Zhu, Hongyan Zhang and Xinghui Zhu
Future Internet 2022, 14(2), 43; https://doi.org/10.3390/fi14020043 - 27 Jan 2022
Cited by 8 | Viewed by 3807
Abstract
Cross-modal retrieval aims to search samples of one modality via queries of other modalities, and is a hot topic in the multimedia community. However, two main challenges, i.e., the heterogeneity gap and semantic interaction across different modalities, have not been solved effectively. Reducing the heterogeneity gap can improve cross-modal similarity measurement. Meanwhile, modeling cross-modal semantic interaction can capture semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework, called Dual Attention Generative Adversarial Network (DA-GAN). This technique is an adversarial semantic representation model with a dual attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention is used to focus on the important semantic features within a modality, while inter-modal attention explores the semantic interaction between different modalities and then represents the high-level semantic correlation more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which can reduce the cross-modal heterogeneity efficiently. Experiments on three commonly used benchmarks show that DA-GAN outperforms competing methods.
(This article belongs to the Special Issue Advances Techniques in Computer Vision and Multimedia)
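As a rough illustration of the retrieval task described in the abstract above, the sketch below ranks items of one modality against a query from another by cosine similarity, assuming both have already been mapped into a shared embedding space. This is not the DA-GAN method itself: the function names, the three-dimensional vectors, and the gallery keys are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k gallery items most similar to the query, best first."""
    scored = [(name, cosine(query, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical embeddings: a text query and three images in the same space.
text_query = [0.9, 0.1, 0.0]
image_gallery = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}

top = retrieve(text_query, image_gallery, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

In a learned model, the embeddings would come from modality-specific encoders trained so that semantically matching text and images land close together; the ranking step itself stays this simple.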

17 pages, 391 KiB  
Article
Graph Representation-Based Deep Multi-View Semantic Similarity Learning Model for Recommendation
by Jiagang Song, Jiayu Song, Xinpan Yuan, Xiao He and Xinghui Zhu
Future Internet 2022, 14(2), 32; https://doi.org/10.3390/fi14020032 - 19 Jan 2022
Cited by 8 | Viewed by 3165
Abstract
With the rapid development of Internet technology, how to mine and analyze massive amounts of network information to provide users with accurate and fast recommendations has become a hot and difficult topic of joint research in industry and academia in recent years. One of the most widely used social network recommendation methods is collaborative filtering. However, traditional social network-based collaborative filtering algorithms encounter problems such as low recommendation performance and cold start due to high data sparsity and uneven distribution. In addition, these collaborative filtering algorithms do not effectively consider the implicit trust relationship between users. To this end, this paper proposes a collaborative filtering recommendation algorithm based on GraphSAGE (GraphSAGE-CF). The algorithm first uses GraphSAGE to learn low-dimensional feature representations of the global and local structures of user nodes in social networks and then calculates the implicit trust relationship between users through the feature representations learned by GraphSAGE. Finally, it combines the scores given to related items by users and by their implicitly trusted users to predict the scores of users on target items. Experimental results on four open standard datasets show that the proposed GraphSAGE-CF algorithm is superior to existing algorithms in RMSE and MAE.
(This article belongs to the Special Issue Advances Techniques in Computer Vision and Multimedia)
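For readers unfamiliar with the two error metrics named in the abstract above, the following is a minimal sketch of how RMSE (root mean squared error) and MAE (mean absolute error) are commonly computed over observed and predicted ratings. The function names and the sample ratings are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average of |actual - predicted|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of the mean squared difference."""
    _check(actual, predicted)
    mean_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_sq)


# Hypothetical ratings on a 1-5 scale (illustrative only).
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]

print("MAE: ", mae(actual_ratings, predicted_ratings))
print("RMSE:", rmse(actual_ratings, predicted_ratings))
```

Lower values are better for both metrics. RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE does.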