Robust Deep Learning Techniques for Multimedia Forensics and Security

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Biometrics, Forensics, and Security".

Deadline for manuscript submissions: closed (16 March 2024) | Viewed by 18889

Special Issue Editors


Dr. Benedetta Tondi
Guest Editor
Department of Information Engineering, University of Siena, Siena, Italy
Interests: adversarial signal processing; adversarial machine learning; multimedia forensics and security; watermarking and data hiding

Dr. Irene Amerini
Guest Editor
Department of Computer, Control, and Management Engineering A. Ruberti, Sapienza University of Rome, 00185 Rome, Italy
Interests: multimedia forensics and security; machine learning; deep learning; computer vision

Dr. Andrea Costanzo
Guest Editor
Department of Information Engineering of the University of Siena, 53100 Siena SI, Italy
Interests: multimedia forensics and security; deep learning; computer vision

Dr. Minoru Kuribayashi
Guest Editor
Center for Data-Driven Science and Artificial Intelligence, Tohoku University, Sendai 980-8577, Japan
Interests: multimedia security; fingerprinting; traitor tracing; signal processing; cryptographic protocol; coding theory; statistical analysis

Special Issue Information

Dear Colleagues,

Despite the tremendous efforts of researchers, the continuous advances in the Artificial Intelligence (AI) field and the new trends in digital media creation and manipulation are posing novel challenges. Adversarial machine learning has shown that it is possible to craft powerful jamming signals, namely adversarial examples, that can undermine the performance of AI-based detectors. Moreover, operations that media are often subject to (e.g., multiple social media sharing, compression, recapturing) can also be regarded as laundering-type attacks and affect the performance of AI-based systems. Furthermore, media are evolving. While a few years ago, images were by far the most manipulated media type, audio, text, and video manipulation are now incredibly common thanks to deepfake technology.

Most of the solutions developed so far by researchers to mitigate the above threats are quite naive: they work only under controlled operating conditions or are designed for a very specific attack setting.

Robust systems should thus be designed by departing from fully data-driven solutions, which rely on features completely self-learned by the network and trained on the whole data under analysis, in favor of more robust structures and architectures and, whenever possible, multi-modal analysis. Focusing on the analysis of semantic attributes can also help prevent the network from relying on confounding factors, a reliance that causes the resulting solutions to lack generality and robustness.

The goal of this Special Issue is to collect new tools with improved robustness, capable of working in modern and real-world scenarios where the presence of intentional and unintentional attacks cannot be neglected, as well as new and powerful adversarial attacks that can threaten such detectors.

Dr. Benedetta Tondi
Dr. Irene Amerini
Dr. Andrea Costanzo
Dr. Minoru Kuribayashi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

16 pages, 1568 KiB  
Article
A Neural-Network-Based Watermarking Method Approximating JPEG Quantization
by Shingo Yamauchi and Masaki Kawamura
J. Imaging 2024, 10(6), 138; https://doi.org/10.3390/jimaging10060138 - 6 Jun 2024
Viewed by 865
Abstract
We propose a neural-network-based watermarking method that introduces a quantized activation function approximating the quantization of JPEG compression. Many neural-network-based watermarking methods have been proposed. Conventional methods have acquired robustness against various attacks by introducing an attack simulation layer between the embedding network and the extraction network. In the attack layer of conventional methods, the quantization process of JPEG compression is replaced by a noise addition process. In this paper, we propose a quantized activation function that can simulate the JPEG quantization standard as it is in order to improve the robustness against JPEG compression. Our quantized activation function consists of several hyperbolic tangent functions and is applied as an activation function for neural networks. Our network was introduced in the attack layer of ReDMark proposed by Ahmadi et al. to compare it with their method. That is, the embedding and extraction networks had the same structure. We compared the usual JPEG compressed images with the images to which the quantized activation function was applied. The results showed that a network with quantized activation functions can approximate JPEG compression with high accuracy. We also compared the bit error rate (BER) of estimated watermarks generated by our network with those generated by ReDMark. We found that our network was able to produce estimated watermarks with lower BERs than those of ReDMark. Therefore, our network outperformed the conventional method with respect to image quality and BER.
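As a rough illustration of the idea in this abstract, the sketch below builds a differentiable "staircase" from a sum of hyperbolic tangents and uses it in place of hard rounding in JPEG-style quantization. It is only a minimal sketch under stated assumptions: the function names (`soft_round`, `soft_jpeg_quantize`, `bit_error_rate`), the slope `alpha`, and the range `max_level` are hypothetical choices for illustration, and the exact form used by the authors may differ.

```python
import numpy as np

def soft_round(x, alpha=20.0, max_level=8):
    """Smooth approximation of round(x) for |x| <= max_level.

    Assumption: one tanh step of height 1 is centred on every
    half-integer threshold; larger alpha gives sharper steps.
    """
    x = np.asarray(x, dtype=float)
    # Half-integer thresholds: -max_level + 0.5, ..., max_level - 0.5
    thresholds = np.arange(-max_level, max_level) + 0.5
    # Each term goes from -0.5 to +0.5 as x crosses its threshold
    return 0.5 * np.tanh(alpha * (x[..., None] - thresholds)).sum(axis=-1)

def soft_jpeg_quantize(coeffs, q_table, alpha=20.0, max_level=8):
    """Quantize then dequantize DCT coefficients with the soft staircase."""
    return soft_round(coeffs / q_table, alpha, max_level) * q_table

def bit_error_rate(sent, received):
    """Fraction of watermark bits that differ between two bit sequences."""
    return float(np.mean(np.asarray(sent) != np.asarray(received)))

if __name__ == "__main__":
    q = np.array([16.0, 11.0])          # illustrative quantization steps
    c = np.array([35.0, -20.0])         # illustrative DCT coefficients
    print(soft_jpeg_quantize(c, q))     # close to [32, -22]
    print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

Because every term is a smooth function, gradients can flow through this layer during training, which hard rounding does not allow. Inputs beyond `max_level` quantization steps saturate, so the range would need to cover the coefficients actually encountered.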
(This article belongs to the Special Issue Robust Deep Learning Techniques for Multimedia Forensics and Security)

18 pages, 5383 KiB  
Article
Reliable Out-of-Distribution Recognition of Synthetic Images
by Anatol Maier and Christian Riess
J. Imaging 2024, 10(5), 110; https://doi.org/10.3390/jimaging10050110 - 1 May 2024
Viewed by 1575
Abstract
Generative adversarial networks (GANs) and diffusion models (DMs) have revolutionized the creation of synthetically generated but realistic-looking images. Distinguishing such generated images from real camera captures is one of the key tasks in current multimedia forensics research. One particular challenge is the generalization to unseen generators or post-processing. This can be viewed as an issue of handling out-of-distribution inputs. Forensic detectors can be hardened by the extensive augmentation of the training data or specifically tailored networks. Nevertheless, such precautions only manage but do not remove the risk of prediction failures on inputs that look reasonable to an analyst but in fact are out of the training distribution of the network. With this work, we aim to close this gap with a Bayesian Neural Network (BNN) that provides an additional uncertainty measure to warn an analyst of difficult decisions. More specifically, the BNN learns the task at hand and also detects potential confusion between post-processing and image generator artifacts. Our experiments show that the BNN achieves on-par performance with the state-of-the-art detectors while producing more reliable predictions on out-of-distribution examples.

20 pages, 7937 KiB  
Article
Harmonizing Image Forgery Detection & Localization: Fusion of Complementary Approaches
by Hannes Mareen, Louis De Neve, Peter Lambert and Glenn Van Wallendael
J. Imaging 2024, 10(1), 4; https://doi.org/10.3390/jimaging10010004 - 25 Dec 2023
Cited by 1 | Viewed by 2664
Abstract
Image manipulation is easier than ever, often facilitated using accessible AI-based tools. This poses significant risks when used to disseminate disinformation, false evidence, or fraud, which highlights the need for image forgery detection and localization methods to combat this issue. While some recent detection methods demonstrate good performance, there is still a significant gap to be closed to consistently and accurately detect image manipulations in the wild. This paper aims to enhance forgery detection and localization by combining existing detection methods that complement each other. First, we analyze these methods’ complementarity, with an objective measurement of complementariness, and calculation of a target performance value using a theoretical oracle fusion. Then, we propose a novel fusion method that combines the existing methods’ outputs. The proposed fusion method is trained using a Generative Adversarial Network architecture. Our experiments demonstrate improved detection and localization performance on a variety of datasets. Although our fusion method is hindered by a lack of generalization, this is a common problem in supervised learning, and hence a motivation for future work. In conclusion, this work deepens our understanding of forgery detection methods’ complementariness and how to harmonize them. As such, we contribute to better protection against image manipulations and the battle against disinformation.

18 pages, 1746 KiB  
Article
A Robust Approach to Multimodal Deepfake Detection
by Davide Salvi, Honggu Liu, Sara Mandelli, Paolo Bestagini, Wenbo Zhou, Weiming Zhang and Stefano Tubaro
J. Imaging 2023, 9(6), 122; https://doi.org/10.3390/jimaging9060122 - 19 Jun 2023
Cited by 18 | Viewed by 7245
Abstract
The widespread use of deep learning techniques for creating realistic synthetic media, commonly known as deepfakes, poses a significant threat to individuals, organizations, and society. As the malicious use of these data could lead to unpleasant situations, it is becoming crucial to distinguish between authentic and fake media. Nonetheless, though deepfake generation systems can create convincing images and audio, they may struggle to maintain consistency across different data modalities, such as producing a realistic video sequence where both visual frames and speech are fake and consistent with each other. Moreover, these systems may not accurately reproduce semantically and temporally accurate aspects. All these elements can be exploited to perform a robust detection of fake content. In this paper, we propose a novel approach for detecting deepfake video sequences by leveraging data multimodality. Our method extracts audio-visual features from the input video over time and analyzes them using time-aware neural networks. We exploit both the video and audio modalities to leverage the inconsistencies between and within them, enhancing the final detection performance. The peculiarity of the proposed method is that we never train on multimodal deepfake data, but on disjoint monomodal datasets which contain visual-only or audio-only deepfakes. This frees us from leveraging multimodal datasets during training, which is desirable given their scarcity in the literature. Moreover, at test time, it allows us to evaluate the robustness of our proposed detector on unseen multimodal deepfakes. We test different fusion techniques between data modalities and investigate which one leads to more robust predictions by the developed detectors. Our results indicate that a multimodal approach is more effective than a monomodal one, even if trained on disjoint monomodal datasets.

19 pages, 776 KiB  
Article
White Box Watermarking for Convolution Layers in Fine-Tuning Model Using the Constant Weight Code
by Minoru Kuribayashi, Tatsuya Yasui and Asad Malik
J. Imaging 2023, 9(6), 117; https://doi.org/10.3390/jimaging9060117 - 9 Jun 2023
Cited by 4 | Viewed by 1797
Abstract
Deep neural network (DNN) watermarking is a potential approach for protecting the intellectual property rights of DNN models. Similar to classical watermarking techniques for multimedia content, the requirements for DNN watermarking include capacity, robustness, transparency, and other factors. Studies have focused on robustness against retraining and fine-tuning. However, less important neurons in the DNN model may be pruned. Moreover, although the encoding approach renders DNN watermarking robust against pruning attacks, the watermark is assumed to be embedded only into the fully connected layer in the fine-tuning model. In this study, we extended the method such that the model can be applied to any convolution layer of the DNN model and designed a watermark detector based on a statistical analysis of the extracted weight parameters to evaluate whether the model is watermarked. Using a nonfungible token mitigates the overwriting of the watermark and enables checking when the DNN model with the watermark was created.

12 pages, 874 KiB  
Article
On the Generalization of Deep Learning Models in Video Deepfake Detection
by Davide Alessandro Coccomini, Roberto Caldelli, Fabrizio Falchi and Claudio Gennaro
J. Imaging 2023, 9(5), 89; https://doi.org/10.3390/jimaging9050089 - 29 Apr 2023
Cited by 9 | Viewed by 3016
Abstract
The increasing use of deep learning techniques to manipulate images and videos, commonly referred to as “deepfakes”, is making it more challenging to differentiate between real and fake content. While various deepfake detection systems have been developed, they often struggle to detect deepfakes in real-world situations. In particular, these methods are often unable to effectively distinguish images or videos when these are modified using novel techniques which have not been used in the training set. In this study, we carry out an analysis of different deep learning architectures in an attempt to understand which is more capable of better generalizing the concept of deepfake. According to our results, Convolutional Neural Networks (CNNs) appear to be more capable of storing specific anomalies and thus excel in cases of datasets with a limited number of elements and manipulation methodologies. The Vision Transformer, conversely, is more effective when trained with more varied datasets, achieving better generalization capabilities than the other methods analysed. Finally, the Swin Transformer appears to be a good alternative for using an attention-based method in a more limited data regime and performs very well in cross-dataset scenarios. All the analysed architectures seem to have a different way to look at deepfakes, but since in a real-world environment the generalization capability is essential, based on the experiments carried out, the attention-based architectures seem to provide superior performance.
