MVACLNet: A Multimodal Virtual Augmentation Contrastive Learning Network for Rumor Detection
Abstract
1. Introduction
- We propose MVACLNet, a Multimodal Virtual Augmentation Contrastive Learning Network, which achieves more effective multimodal rumor detection. It consists of five modules: a Hierarchical Textual Feature Extraction module, a Visual Feature Extraction module, a Multimodal Feature Fusion module, a Virtual Augmentation Contrastive Learning module, and a Rumor Classification module. Each module plays a distinct role, and together they contribute to improved detection performance.
- We design a Hierarchical Textual Feature Extraction (HTFE) module that extracts textual features from multiple perspectives, making more comprehensive use of the text data.
- We utilize a modified cross-attention mechanism, which operates from different perspectives at the feature value level, to obtain richer and more precise multimodal feature representations.
- We devise a Virtual Augmentation Contrastive Learning (VACL) module as an auxiliary training module to improve detection performance. It helps the model learn more robust and generalized multimodal feature representations by enriching the diversity of multimodal samples in the feature space, capturing more of the crucial similarities and differences among multimodal samples, and extracting more content-invariant multimodal features (a hedged sketch of the underlying idea follows this list).
- Experiments on two real-world datasets demonstrate the effectiveness and superiority of MVACLNet in multimodal rumor detection.
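The full VACL formulation is given in Section 3.6; the snippet below is only a minimal, illustrative sketch of the general idea, assuming virtual multimodal representations are generated by linear interpolation of feature/label pairs, that contrastive learning is supervised by ground-truth labels, and that a KL divergence term constrains predictions on the virtual samples. All function names, the interpolation scheme, and the hyperparameters are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def virtual_augment(feats, labels, alpha=0.4, num_classes=2):
    """Illustrative virtual sample generation: interpolate fused multimodal
    representations (N, D) and their one-hot labels in feature space."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(feats.size(0))
    v_feats = lam * feats + (1.0 - lam) * feats[perm]
    v_labels = lam * F.one_hot(labels, num_classes).float() \
        + (1.0 - lam) * F.one_hot(labels[perm], num_classes).float()
    return v_feats, v_labels

def label_aware_contrastive_loss(feats, labels, temperature=0.5):
    """Supervised contrastive loss: samples sharing a ground-truth label
    are treated as positives, all other samples as negatives."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / temperature
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                        # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z), device=z.device)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_count).mean()

def kl_constraint(virtual_logits, virtual_soft_labels):
    """KL divergence between predictions on virtual samples and their
    interpolated (soft) labels."""
    return F.kl_div(F.log_softmax(virtual_logits, dim=1),
                    virtual_soft_labels, reduction="batchmean")
```

How the virtual representations enter the positive/negative sets of the enhanced contrastive learning, and how the individual terms are weighted in the overall loss, is specified in Sections 3.6 and 3.8.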
2. Related Work
2.1. Unimodal Rumor Detection Methods
2.2. Multimodal Rumor Detection Methods
3. Methodology
3.1. Problem Definition
3.2. Overview
3.3. Hierarchical Textual Feature Extraction Module
3.3.1. Local-Level Textual Feature Extraction
3.3.2. Global-Level Textual Feature Extraction
3.3.3. Multi-Perspective Textual Feature Fusion
3.4. Visual Feature Extraction Module
3.5. Multimodal Feature Fusion Module
3.6. Virtual Augmentation Contrastive Learning Module
3.6.1. Ground-Truth Label Introduction
3.6.2. Virtual Sample Generation
3.6.3. Sample Reorganization
3.6.4. Enhanced Contrastive Learning
3.6.5. KL Divergence Constraint
3.7. Rumor Classification Module
3.8. Overall Loss
4. Experiments and Analysis
4.1. Datasets
4.2. Experimental Setup
4.3. Baselines
- VGG-19 [51]: This model is a pre-trained deep convolutional neural network architecture with 19 layers, which is widely employed for image classification tasks and is known for its straightforward yet effective stacked convolutional layer structure. We used it to obtain the visual feature representation, which is input into a fully connected layer followed by a softmax layer to detect rumors.
- BERT [39]: This model is a pre-trained language model based on the Transformer encoder architecture, which captures bidirectional contextual information in text. We used it to obtain the textual feature representation, which is input into a fully connected layer followed by a softmax layer to detect rumors (a minimal sketch of this unimodal pipeline appears after this list).
- att-RNN [25]: This model employs LSTM to learn a joint representation of text and social context and extracts visual features through VGG-19. It then designs a neuron-level attention mechanism to capture correlations between the visual features and the joint textual–social features, obtaining an attention-aggregated visual representation. Finally, it concatenates the representations for rumor detection. For a fair comparison, we removed the part that handles social features in our implementation.
- EANN [20]: This model uses TextCNN [40] and VGG-19 to extract textual and visual features, respectively. Then, it concatenates them as a multimodal feature representation, which is input into an event discriminator and a rumor classifier. The event discriminator guides the model to capture event-invariant multimodal features through an event adversarial mechanism.
- MVAE [21]: This model utilizes a variational autoencoder and a designed multimodal reconstruction loss to learn a shared representation between the textual and visual modalities, where the encoder extracts textual and visual features through a bidirectional LSTM and VGG-19, respectively. Finally, the sampled latent multimodal feature representation is used for rumor detection.
- Spotfake [23]: This model uses BERT to extract textual features and VGG-19 to capture visual features, which are then concatenated for rumor detection.
- SAFE [22]: This model first converts an image to text through a pre-trained image2sentence model. Then, it uses TextCNN to capture textual and visual features, which are concatenated for rumor detection. Meanwhile, it further utilizes the relevance between modalities, quantified as crossmodal similarity, to define an extra detection loss, thereby helping identify rumors.
- MCNN [28]: This model applies BERT and BiGRU to capture textual semantic features and utilizes ResNet50, an attention mechanism, and BiGRU to extract visual semantic features. Meanwhile, it captures visual tampering features through the Error Level Analysis (ELA) algorithm and ResNet50. It then uses a crossmodal weight-sharing layer and the attention mechanism to fuse all the above features together with the semantic features output by ResNet50 for prediction. Afterward, it employs cosine similarity to measure the similarity between the textual and visual semantic features, which defines an extra detection loss that helps detect rumors.
- CAFE [29]: This model utilizes BERT and ResNet34 to extract textual and visual features, respectively. Then, it defines an auxiliary correlation learning task to help achieve crossmodal feature alignment. Following this, it adaptively aggregates unimodal features and crossmodal correlations based on a learned ambiguity score between modalities, where the score is quantified by estimating the KL divergence between distributions of textual and visual features. Finally, the aggregated multimodal feature representation is input into a classifier for rumor detection.
- MRAN [30]: This model first extracts multilevel textual semantic features through different encoding layers of BERT. Then, it further utilizes TextCNN to aggregate these features in layers, thus filtering out some noise while extracting important local information. The visual features are extracted through VGG-19. Afterward, it uses text/image attention blocks and cross-attention blocks to capture intramodal and intermodal associations, thereby obtaining higher-order fusion features between textual and visual modalities. Finally, the fused multimodal feature representation is used for rumor detection.
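The simplest unimodal baselines above (VGG-19 or BERT features fed into a fully connected layer and a softmax layer) follow the same pattern; below is a minimal PyTorch-style sketch of the textual variant. The pre-trained model name, pooling choice, and dimensions are illustrative assumptions, not the exact baseline implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertRumorBaseline(nn.Module):
    """Unimodal text baseline: BERT representation -> FC -> softmax (sketch)."""
    def __init__(self, hidden_dim=768, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] token as the text representation
        # softmax shown for illustration; training would typically use raw logits
        # with a cross-entropy loss instead
        return torch.softmax(self.classifier(cls), dim=-1)

# usage sketch
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["example post text"], return_tensors="pt",
                  padding=True, truncation=True)
probs = BertRumorBaseline()(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

The VGG-19 baseline is analogous, with the image branch's pooled convolutional features in place of the [CLS] vector.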
4.4. Comparative Experiments and Analysis
4.5. Ablation Experiments and Analysis
- “w/o VACL” represents a model without the Virtual Augmentation Contrastive Learning module.
- “w/o KLC” represents a model that only uses the enhanced contrastive learning without performing the KL divergence constraint in VACL.
- “w/o ECL” indicates a model that only implements the KL divergence constraint without employing the enhanced contrastive learning in VACL.
- “w/o KLC+VA” represents a model that only leverages ground-truth labels to enhance contrastive learning in VACL; it neither performs the KL divergence constraint nor uses the additionally generated virtual multimodal feature representations to enhance contrastive learning.
- “w/o VA” represents a model that leverages ground-truth labels to enhance contrastive learning and performs the KL divergence constraint in VACL but does not use the additionally generated virtual multimodal feature representations to enhance contrastive learning.
- “w/o MFF” signifies a model with the Multimodal Feature Fusion module replaced by a two-layer MLP with a ReLU activation function (a minimal sketch of this replacement follows the list).
- “w/o LT” represents a model that removes the local-level textual feature extraction component from HTFE.
- “w/o GTC” signifies a model that removes the global-level textual continuous feature extraction component from HTFE.
- “w/o GTNC” denotes a model that removes the global-level textual non-continuous feature extraction component from HTFE.
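For reference, the two-layer MLP with ReLU that replaces the Multimodal Feature Fusion module in the “w/o MFF” variant could look like the following sketch; the feature dimensions are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class SimpleFusionMLP(nn.Module):
    """Two-layer MLP with ReLU standing in for the MFF module (illustrative)."""
    def __init__(self, text_dim=256, image_dim=256, hidden_dim=256, out_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, text_feat, image_feat):
        # concatenate the unimodal features and fuse them with the MLP
        return self.mlp(torch.cat([text_feat, image_feat], dim=-1))
```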
4.6. Visualization Analysis
5. Limitations and Threats to Validity
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
MVACLNet | Multimodal Virtual Augmentation Contrastive Learning Network
HTFE | Hierarchical Textual Feature Extraction
VACL | Virtual Augmentation Contrastive Learning
KL | Kullback–Leibler
LSTM | Long Short-Term Memory
BERT | Bidirectional Encoder Representations from Transformers
CNN | Convolutional Neural Network
BiGRU | Bidirectional Gated Recurrent Unit
TF-IDF | Term Frequency–Inverse Document Frequency
PPMI | Positive Point-Wise Mutual Information
GCN | Graph Convolutional Network
ResNet | Residual Network
VGG | Visual Geometry Group
att-RNN | Recurrent Neural Network with an attention mechanism
EANN | Event Adversarial Neural Network
MVAE | Multimodal Variational Autoencoder
SAFE | Similarity-Aware FakE news detection method
MCNN | Multimodal Consistency Neural Network
MRAN | Multimodal Relationship-Aware Attention Network
TP | True Positive
TN | True Negative
FP | False Positive
FN | False Negative
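For reference, the evaluation metrics reported in the result tables below follow the standard definitions based on these counts:

$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
$$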
References
- Zhang, X.; Ghorbani, A.A. An overview of online fake news: Characterization, detection, and discussion. Inf. Process. Manag. 2020, 57, 102025. [Google Scholar] [CrossRef]
- Naeem, S.B.; Bhatti, R.; Khan, A. An exploration of how fake news is taking over social media and putting public health at risk. Health Inf. Libr. J. 2021, 38, 143–149. [Google Scholar] [CrossRef] [PubMed]
- Castillo, C.; Mendoza, M.; Poblete, B. Information credibility on twitter. In Proceedings of the 20th International World Wide Web Conference, Hyderabad, India, 28 March–1 April 2011; pp. 675–684. [Google Scholar]
- Zhao, Z.; Resnick, P.; Mei, Q. Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts. In Proceedings of the 24th International World Wide Web Conference, Florence, Italy, 18–22 May 2015; pp. 1395–1405. [Google Scholar]
- Jin, Z.; Cao, J.; Zhang, Y.; Luo, J. News verification by exploiting conflicting social viewpoints in microblogs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2972–2978. [Google Scholar]
- Shao, C.; Ciampaglia, G.L.; Flammini, A.; Menczer, F. Hoaxy: A platform for tracking online misinformation. In Proceedings of the 25th International World Wide Web Conference, Montréal, QC, Canada, 11–15 April 2016; pp. 745–750. [Google Scholar]
- Ma, J.; Gao, W.; Mitra, P.; Kwon, S.; Jansen, B.J.; Wong, K.F.; Cha, M. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 3818–3824. [Google Scholar]
- Yu, F.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. A Convolutional Approach for Misinformation Identification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3901–3907. [Google Scholar]
- Ma, J.; Gao, W.; Wong, K.F. Detect rumor and stance jointly by neural multi-task learning. In Proceedings of the Companion Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 585–593. [Google Scholar]
- Nan, Q.; Cao, J.; Zhu, Y.; Wang, Y.; Li, J. MDFEND: Multi-domain fake news detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 3343–3347. [Google Scholar]
- Wu, L.; Rao, Y.; Zhang, C.; Zhao, Y.; Nazir, A. Category-controlled encoder-decoder for fake news detection. IEEE Trans. Knowl. Data Eng. 2023, 35, 1242–1257. [Google Scholar] [CrossRef]
- Ma, J.; Gao, W.; Wong, K.F. Rumor detection on Twitter with tree-structured recursive neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 1980–1989. [Google Scholar]
- Bian, T.; Xiao, X.; Xu, T.; Zhao, P.; Huang, W.; Rong, Y.; Huang, J. Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 549–556. [Google Scholar]
- Wei, L.; Hu, D.; Zhou, W.; Yue, Z.; Hu, S. Towards propagation uncertainty: Edge-enhanced Bayesian graph convolutional networks for rumor detection. arXiv 2021, arXiv:2107.11934. [Google Scholar]
- Hu, D.; Wei, L.; Zhou, W.; Huai, X.; Han, J.; Hu, S. A rumor detection approach based on multi-relational propagation tree. J. Comput. Res. Dev. 2021, 58, 1395–1411. [Google Scholar]
- Sun, M.; Zhang, X.; Zheng, J.; Ma, G. DDGCN: Dual Dynamic Graph Convolutional Networks for Rumor Detection on Social Media. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, Virtual Event, 22 February–1 March 2022; pp. 4611–4619. [Google Scholar]
- Jin, Z.; Cao, J.; Zhang, Y.; Zhou, J.; Tian, Q. Novel Visual and Statistical Image Features for Microblogs News Verification. IEEE Trans. Multimed. 2017, 19, 598–608. [Google Scholar] [CrossRef]
- Alam, F.; Cresci, S.; Chakraborty, T.; Silvestri, F.; Dimitrov, D.; Martino, G.D.S.; Shaar, S.; Firooz, H.; Nakov, P. A survey on multimodal disinformation detection. arXiv 2021, arXiv:2103.12541. [Google Scholar]
- Silva, A.; Luo, L.; Karunasekera, S.; Leckie, C. Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; pp. 557–565. [Google Scholar]
- Wang, Y.; Ma, F.; Jin, Z.; Yuan, Y.; Xun, G.; Jha, K.; Su, L.; Gao, J. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 849–857. [Google Scholar]
- Khattar, D.; Goud, J.S.; Gupta, M.; Varma, V. MVAE: Multimodal Variational Autoencoder for Fake News Detection. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2915–2921. [Google Scholar]
- Zhou, X.; Wu, J.; Zafarani, R. SAFE: Similarity-Aware Multi-modal Fake News Detection. In Proceedings of the 24th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Singapore, 11–14 May 2020; pp. 354–367. [Google Scholar]
- Singhal, S.; Shah, R.R.; Chakraborty, T.; Kumaraguru, P.; Satoh, S.I. Spotfake: A multi-modal framework for fake news detection. In Proceedings of the 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), Singapore, 11–13 September 2019; pp. 39–47. [Google Scholar]
- Singhal, S.; Kabra, A.; Sharma, M.; Shah, R.R.; Chakraborty, T.; Kumaraguru, P. Spotfake+: A multimodal framework for fake news detection via transfer learning (student abstract). In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13915–13916. [Google Scholar]
- Jin, Z.; Cao, J.; Guo, H.; Zhang, Y.; Luo, J. Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 795–816. [Google Scholar]
- Qian, S.; Wang, J.; Hu, J.; Fang, Q.; Xu, C. Hierarchical multi-modal contextual attention network for fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 153–162. [Google Scholar]
- Wu, Y.; Zhan, P.; Zhang, Y.; Wang, L.; Xu, Z. Multimodal fusion with co-attention networks for fake news detection. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, 1–6 August 2021; pp. 2560–2569. [Google Scholar]
- Xue, J.; Wang, Y.; Tian, Y.; Li, Y.; Shi, L.; Wei, L. Detecting fake news by exploring the consistency of multimodal data. Inf. Process. Manag. 2021, 58, 102610. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Li, D.; Zhang, P.; Sui, J.; Lv, Q.; Tun, L.; Shang, L. Cross-modal Ambiguity Learning for Multimodal Fake News Detection. In Proceedings of the ACM Web Conference 2022, Virtual Event, 25–29 April 2022; pp. 2897–2905. [Google Scholar]
- Yang, H.; Zhang, J.; Zhang, L.; Cheng, X.; Hu, Z. MRAN: Multimodal relationship-aware attention network for fake news detection. Comput. Stand. Interfaces 2024, 89, 103822. [Google Scholar] [CrossRef]
- Qi, P.; Cao, J.; Yang, T.; Guo, J.; Li, J. Exploiting Multi-domain Visual Information for Fake News Detection. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 518–527. [Google Scholar]
- Dai, B.; Lin, D. Contrastive Learning for Image Captioning. Adv. Neural Inf. Process. Syst. 2017, 30, 898–907. [Google Scholar]
- Cai, H.; Chen, H.; Song, Y.; Ding, Z.; Bao, Y.; Yan, W.; Zhao, X. Group-wise Contrastive Learning for Neural Dialogue Generation. arXiv 2020, arXiv:2009.07543. [Google Scholar]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
- Wu, H.; Ma, T.; Wu, L.; Manyumwa, T.; Ji, S. Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning. arXiv 2020, arXiv:2010.01781. [Google Scholar]
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph Contrastive Learning with Adaptive Augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2069–2080. [Google Scholar]
- Sun, T.; Qian, Z.; Dong, S.; Li, P.; Zhu, Q. Rumor Detection on Social Media with Graph Adversarial Contrastive Learning. In Proceedings of the ACM Web Conference 2022, Virtual Event, 25–29 April 2022; pp. 2789–2797. [Google Scholar]
- Chen, J.; Yang, Z.; Yang, D. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. arXiv 2020, arXiv:2004.12239. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882. [Google Scholar]
- Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 7370–7377. [Google Scholar]
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Lu, J.; Batra, D.; Parikh, D.; Lee, S. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Adv. Neural Inf. Process. Syst. 2019, 32, 12–23. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Boididou, C.; Papadopoulos, S.; Zampoglou, M.; Apostolidis, L.; Papadopoulou, O.; Kompatsiaris, Y. Detection and visualization of misleading content on Twitter. Int. J. Multimed. Inf. Retr. 2018, 7, 71–86. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Grave, E.; Bojanowski, P.; Gupta, P.; Joulin, A.; Mikolov, T. Learning word vectors for 157 languages. arXiv 2018, arXiv:1802.06893. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5754–5764. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Method | Accuracy | Rumor Precision | Rumor Recall | Rumor F1 | Non-Rumor Precision | Non-Rumor Recall | Non-Rumor F1
---|---|---|---|---|---|---|---
VGG-19 | 0.596 | 0.695 | 0.518 | 0.593 | 0.524 | 0.700 | 0.599
BERT | 0.706 | 0.648 | 0.540 | 0.589 | 0.715 | 0.636 | 0.673
att-RNN | 0.664 | 0.749 | 0.615 | 0.676 | 0.589 | 0.728 | 0.651
EANN | 0.648 | 0.810 | 0.498 | 0.617 | 0.584 | 0.759 | 0.660
MVAE | 0.745 | 0.801 | 0.719 | 0.758 | 0.689 | 0.777 | 0.730
SAFE | 0.766 | 0.777 | 0.795 | 0.786 | 0.752 | 0.731 | 0.742
Spotfake | 0.771 | 0.784 | 0.744 | 0.764 | 0.769 | 0.807 | 0.787
Spotfake+ | 0.790 | 0.793 | 0.827 | 0.810 | 0.786 | 0.747 | 0.766
MCNN | 0.784 | 0.778 | 0.781 | 0.779 | 0.790 | 0.787 | 0.788
CAFE | 0.806 | 0.807 | 0.799 | 0.803 | 0.805 | 0.813 | 0.809
MRAN | 0.855 | 0.861 | 0.857 | 0.859 | 0.847 | 0.816 | 0.831
MVACLNet | 0.891 | 0.811 | 0.922 | 0.863 | 0.949 | 0.872 | 0.909
Method | Accuracy | Rumor Precision | Rumor Recall | Rumor F1 | Non-Rumor Precision | Non-Rumor Recall | Non-Rumor F1
---|---|---|---|---|---|---|---
VGG-19 | 0.633 | 0.630 | 0.500 | 0.550 | 0.630 | 0.750 | 0.690
BERT | 0.804 | 0.800 | 0.860 | 0.830 | 0.840 | 0.760 | 0.800
att-RNN | 0.772 | 0.854 | 0.656 | 0.742 | 0.720 | 0.889 | 0.795
EANN | 0.782 | 0.827 | 0.697 | 0.756 | 0.752 | 0.863 | 0.804
MVAE | 0.824 | 0.854 | 0.769 | 0.809 | 0.802 | 0.875 | 0.837
SAFE | 0.763 | 0.833 | 0.659 | 0.736 | 0.717 | 0.868 | 0.785
Spotfake | 0.869 | 0.877 | 0.859 | 0.868 | 0.861 | 0.879 | 0.870
Spotfake+ | 0.870 | 0.887 | 0.849 | 0.868 | 0.855 | 0.892 | 0.873
MCNN | 0.846 | 0.809 | 0.857 | 0.832 | 0.879 | 0.837 | 0.858
CAFE | 0.840 | 0.855 | 0.830 | 0.842 | 0.825 | 0.851 | 0.837
MRAN | 0.903 | 0.904 | 0.908 | 0.906 | 0.897 | 0.892 | 0.894
MVACLNet | 0.913 | 0.916 | 0.911 | 0.913 | 0.910 | 0.916 | 0.913
Method | Accuracy | Rumor Precision | Rumor Recall | Rumor F1 | Non-Rumor Precision | Non-Rumor Recall | Non-Rumor F1
---|---|---|---|---|---|---|---
MVACLNet | 0.891 | 0.811 | 0.922 | 0.863 | 0.949 | 0.872 | 0.909
w/o VACL | 0.824 | 0.763 | 0.763 | 0.763 | 0.860 | 0.860 | 0.860
w/o ECL | 0.831 | 0.772 | 0.776 | 0.774 | 0.867 | 0.864 | 0.865
w/o KLC | 0.843 | 0.792 | 0.781 | 0.787 | 0.872 | 0.879 | 0.875
w/o KLC+VA | 0.847 | 0.770 | 0.837 | 0.802 | 0.899 | 0.853 | 0.875
w/o VA | 0.857 | 0.768 | 0.883 | 0.821 | 0.923 | 0.841 | 0.881
w/o MFF | 0.857 | 0.772 | 0.871 | 0.818 | 0.918 | 0.849 | 0.882
w/o LT | 0.833 | 0.746 | 0.831 | 0.786 | 0.894 | 0.834 | 0.863
w/o GTC | 0.841 | 0.797 | 0.767 | 0.781 | 0.866 | 0.885 | 0.875
w/o GTNC | 0.830 | 0.763 | 0.784 | 0.773 | 0.870 | 0.857 | 0.864
Method | Accuracy | Rumor Precision | Rumor Recall | Rumor F1 | Non-Rumor Precision | Non-Rumor Recall | Non-Rumor F1
---|---|---|---|---|---|---|---
MVACLNet | 0.913 | 0.916 | 0.911 | 0.913 | 0.910 | 0.916 | 0.913
w/o VACL | 0.874 | 0.876 | 0.872 | 0.874 | 0.872 | 0.876 | 0.874
w/o ECL | 0.879 | 0.890 | 0.865 | 0.877 | 0.868 | 0.892 | 0.880
w/o KLC | 0.869 | 0.880 | 0.857 | 0.868 | 0.860 | 0.882 | 0.871
w/o KLC+VA | 0.876 | 0.882 | 0.869 | 0.876 | 0.870 | 0.883 | 0.876
w/o VA | 0.878 | 0.890 | 0.862 | 0.876 | 0.866 | 0.893 | 0.879
w/o MFF | 0.874 | 0.900 | 0.842 | 0.870 | 0.851 | 0.906 | 0.878
w/o LT | 0.874 | 0.853 | 0.907 | 0.879 | 0.900 | 0.842 | 0.870
w/o GTC | 0.879 | 0.893 | 0.861 | 0.877 | 0.865 | 0.896 | 0.880
w/o GTNC | 0.883 | 0.868 | 0.903 | 0.885 | 0.899 | 0.862 | 0.880