Cross-Domain Fake News Detection Through Fusion of Evidence from Multiple Social Media Platforms
Abstract
1. Introduction
- A Dempster–Shafer-based fusion method is proposed to combine probabilities from comment-based detectors across multiple social media platforms, incorporating correlation-based consistency to handle uncertainty effectively.
- To trade off between content- and comment-based methods, we join them using a threshold that improves the overall fake news identification accuracy by 7% compared with previously proposed cross-domain methods.
2. Literature Review
2.1. In-Domain Fake News Detection
- Content-based FND: content-based methods (news text) are the most commonly used techniques for fake news detection in in-domain scenarios. These methods currently focus on transformer-based models for feature extraction and classification. Kaliyar et al. [36] used over 8 million tweets about the U.S. general election to develop a bidirectional training approach. This method improves fake news classification by capturing semantic and long-distance dependencies, achieving 98.90% accuracy with a BERT-based model. Ahn et al. [37] fine-tuned BERT for detecting fake news in a Korean dataset, achieving an ROC-AUC score of 83.8%. Safaya et al. [38] proposed a BERT-CNN model, which outperformed five state-of-the-art models in F1-score on Arabic, Greek, and Turkish tweets, suggesting potential improvements for other languages. In addition to transformer-based methods, TF-IDF, part-of-speech tagging, and word embeddings are also common in content-based fake news detection [39,40,41]. Since BERT is a highly powerful semantic feature extractor, it is widely used in numerous studies [42,43]. However, its computational intensity can be a limiting factor in some applications. In addition to BERT-based models, other transformers such as RoBERTa, XLNet, and GPT perform well for fake news detection [44,45]. He et al. [46] introduced a single-layer CNN model integrated with BERT, evaluated on the Airline Travel Information Systems (ATIS) dataset, achieving 98.54% accuracy. They noted the model’s suitability for short sentences and its potential limitations in robustness. Other attention- and sentiment-based methods have also been studied in the literature [30,47,48]. However, these methods may struggle with complex language and context variability.
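Among the lexical techniques mentioned above, TF-IDF weighting is simple enough to sketch directly. The following minimal implementation is illustrative only and is not any of the cited systems; production code would use a library implementation with smoothing and sparse storage.

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Compute dense TF-IDF vectors for a list of tokenized documents.

    A minimal sketch of the classic weighting scheme: term frequency
    normalized by document length, times the log inverse document
    frequency over the corpus.
    """
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] / len(doc) * math.log(n / df[t]) for t in vocab]
        vectors.append(vec)
    return vocab, vectors
```

Terms that appear in every document (here "news") receive zero weight, so the surviving dimensions are the discriminative ones a downstream classifier can use.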
- Social media-based FND: detecting fake news in traditional news media primarily relies on the content of the news itself. However, on social media, additional contextual information such as user profiles, comments, and news propagation patterns can assist in identifying fake news. The role of users is crucial in this context, as both humans and bots can disseminate news. Users provide valuable information for fake news detection, and user-based features reflect the characteristics of those interacting with news on social media. These features are categorized into individual and group levels. Individual-level features assess each user’s credibility and reliability by examining demographics such as account age, number of followers/following, and the volume of tweets authored [49]. On the group level, it is assumed that the communities spreading fake news differ from those spreading real news. Group-level features are typically derived by aggregating individual-level features, such as the percentage of verified users and the average number of followers within a group [50]. Another important feature on social media is user comments. Several approaches have used temporal linguistic features extracted from sequences of user comments for FND. These methods often rely on fusion techniques that incorporate both news article content and user comments to enhance classification accuracy. In early fusion, features from the text and comments are concatenated in the initial stage, allowing the model to learn joint representations, which can be useful when the modalities are complementary. Late fusion processes the text and comments separately and combines the outputs later, typically by averaging or weighting predictions. This approach ensures that each modality retains its distinctive characteristics.
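The early/late fusion distinction described above can be sketched in a few lines. The convex weighting in the late-fusion case is an illustrative choice, not the specific scheme of any cited work:

```python
def early_fusion(text_feats, comment_feats):
    """Early fusion: concatenate modality features into one joint
    representation before a single classifier sees them."""
    return text_feats + comment_feats

def late_fusion(p_text, p_comments, w=0.5):
    """Late fusion: each modality is classified separately and the
    resulting fake probabilities are combined, here by weighting."""
    return w * p_text + (1 - w) * p_comments
```

Early fusion lets the classifier learn cross-modal interactions from the joint vector; late fusion keeps the per-modality predictions independent until the final combination step.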
(Early fusion combines data from multiple modalities at an initial stage, allowing the model to capture complex interactions between modalities, such as text and images, from the beginning. However, this approach may lead to overfitting when the interactions between modalities are weak or nonexistent. Late fusion, on the other hand, processes each modality independently and combines their outputs at a later stage, preserving the uniqueness of each modality and reducing the risk of overfitting. However, it may miss valuable cross-modal dependencies by treating the modalities separately for most of the process. Hybrid fusion combines elements of both early and late fusion, capturing some cross-modal interactions early while preserving the distinct information of each modality later on. While powerful, hybrid fusion can increase model complexity and computational demands.) These methods, which fuse text and comments, enhance detection accuracy compared with relying solely on text. For instance, Ma et al. [51] used recurrent neural networks (RNNs) with gated recurrent units (GRUs) to analyze sequences of user comments, achieving an improvement in accuracy of approximately 15% compared with traditional machine learning (ML) models. Their method was further refined by Zubiaga et al. [52], who classified user comments into categories such as help, reject, question, and comment. They emphasized that the nature of user responses differs depending on the dissemination phase of the news. Similarly, Qian et al. [53] found that fake news tends to elicit more negative responses and questions compared with real news, particularly in the early stages, when users have difficulty assessing its credibility. Recent studies have increasingly focused on the sentiment and emotion analysis of comments as a means to enhance the accuracy of fake news detection models. For instance, Guo et al.
[28] reported an emotion-based framework that incorporates both publisher details and social emotions, improving detection accuracy to a notable 87% by considering emotional signals from the content and user comments. In 2023, Hamed et al. [27] demonstrated the importance of using sentiment and emotion analysis in FND. Their approach achieved 90% accuracy, although they acknowledged the complexity of accurately capturing and interpreting emotional features from diverse social media data. Despite these advancements, explainability remains challenging. Many studies, such as Shu et al. [54] and Sharma et al. [55], offer explanations for their predictions on the basis of particular sentences and user comments. Shu et al.’s model achieved 90% accuracy, and Sharma et al.’s method saw a 2% improvement in accuracy over the previous approach [54]. However, the reliance on high-quality labeled datasets for training makes it difficult to generalize these models across different domains. Only a few systems possess robust explainability features, which are critical for gaining trust and ensuring that the models’ predictions are transparent to users. Additionally, the concept of generated comments, as proposed by Nan et al. [32] in 2024, introduces a novel approach to fake news detection. By leveraging large language models (LLMs) to generate diverse comments, this method aims to enrich the dataset and capture a broader range of user interactions. Nan et al.’s model achieved an accuracy of 89%, but the effectiveness of this approach depends heavily on the quality and representativeness of the generated comments. Although sentiment and emotion analysis are becoming integral to FND, the lack of explainability in many systems poses a barrier to their widespread adoption. The introduction of generated comments represents a promising direction, but further exploration is required to ensure that they can reliably enhance detection capability across different social media platforms.
A limitation of comment-based FND is that it is difficult to locate unbiased comments among the rich information on social media. Because each social media platform is distinct, the same fake news on different platforms will attract unique users and unique comments. To address this problem, leveraging comments from multiple social media platforms could be a potential solution. Comments from different platforms capture varied user viewpoints, actions, and language styles, offering complementary insights, and evaluating the consistency of comments across platforms can further improve detection reliability and accuracy. However, one of the primary reasons this approach remains unexplored in the literature is the lack of available datasets containing comments on the same news across multiple social media platforms. Collecting such datasets is time-consuming, and the platforms often impose API restrictions. This absence of multi-platform datasets highlights a critical gap, underscoring the need for research efforts to develop and utilize such datasets to advance cross-platform fake news detection methodologies.
2.2. Cross-Domain Fake News Detection
- Lack of diverse data: finding comprehensive datasets that span multiple domains represents a substantial challenge. The diversity and breadth of such data are crucial for building models that can accurately detect fake news across a wide range of topics and formats. Without extensive and varied datasets, models may struggle to generalize well beyond their training environments.
- Domain-specific features: adapting features that are specific to one domain for use in another is inherently difficult. Features that are highly indicative of fake news in one context may not be relevant or could even be misleading in another. This necessitates the development of sophisticated algorithms capable of identifying and leveraging transferable features that maintain their significance across various domains.
- Limited transfer learning techniques: although transfer learning offers a promising approach to cross-domain fake news detection, existing methods are still under refinement. Developing techniques that can seamlessly transfer knowledge from one domain to another without significant loss of accuracy or relevance remains a key research focus. Enhancements in this area are critical for creating models that can adapt to new and emerging forms of misinformation with minimal need for retraining.
3. System Architecture
- Initial screening module: Figure 1 describes the initial screening process, where the probability, p, is calculated using a linear regression (LR) classification model. This classifier employs an extensive feature set crafted by merging two main types of features: (i) features generated using part-of-speech (POS) tags; and (ii) features derived directly from the text, emphasizing word-level characteristics. POS tagging can often be ambiguous, and the introduction of word tags aims to mitigate this ambiguity [56]. A list of features used in training the LR classifier is shown in Table 2. New news text, T, is input into the trained model to output p. After the initial screening, the value of p is compared against a threshold (Section 5 sheds more light on how the threshold value is determined); if p is lower than the threshold, the news is deemed to be true with high confidence and the process is terminated. Otherwise, the process moves to the next module, the social media module (SMM).
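The screening decision itself reduces to a simple threshold test. A sketch, taking the classifier's output probability as given (the threshold value is determined experimentally, as discussed in Section 5):

```python
def initial_screening(p_fake, threshold):
    """Initial screening step: news with a content-based fake
    probability below the threshold is deemed true with high
    confidence and the pipeline terminates; otherwise control
    passes to the social media module (SMM)."""
    if p_fake < threshold:
        return "true", False   # terminate the pipeline
    return "uncertain", True   # proceed to comment-based analysis
```

The second return value signals whether the more expensive cross-platform comment analysis should run at all.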
- Social media module: in the social media module, we collect similar news items, along with their comments, from two different social media platforms. Subsequently, we compute the probability of the comments being fake with respect to the news, T. The details are discussed below:
- Collection of similar news articles: here we consider collecting news items similar to T from two online social networks (OSNs). Collecting similar news articles from various OSNs, such as Twitter and Reddit, is an important step in our fake news detection system. Each platform attracts a unique user base with diverse content, opinions, and sources. By gathering news articles from multiple platforms, we ensure comprehensive coverage of the news landscape, capturing different perspectives and reducing the risk of bias. The detailed procedure of collecting similar news from social media is described in Section 4 (Dataset Preparation). In addition to news articles, we collect the comments associated with these articles from social media platforms, because comments offer valuable insights into public sentiment, reactions, and potential biases related to the news items [57,58,59,60]. Analyzing these comments provides a deeper understanding of how users perceive and respond to the news. By considering this user-generated content, we can evaluate the overall credibility and reception of the news among the online community.
- Deriving probability from comments: after collecting the comments, the probability of the news, T, being fake is calculated by Algorithm 1, named ‘Fake news detection from comments (FNDC)’. Some important aspects of this algorithm are given below:
- FNDC analyzes the content and characteristics of each comment using natural language processing (NLP) and machine learning techniques. It examines the language patterns of the comment using BERT sentence embeddings to identify potential indicators of fake information [61]. The 768 features of BERT provide rich contextual representations and fine-grained semantic understanding at the sentence level, allowing FNDC to capture the nuanced meaning of each comment. For example, during the COVID-19 pandemic, comments like ‘This miracle herb can cure COVID-19 overnight, but doctors are keeping it a secret!’ were flagged. BERT’s embeddings helped in detecting sensationalist language, such as the adjectives ‘miracle’ and ‘overnight’ [56,62]. Similarly, in the context of the Russia–Ukraine war, statements like ‘Ukrainian forces have all surrendered, and this is being hidden by Western media!’ were identified. The use of absolute terms like ‘all’, combined with verbs implying secrecy such as ‘hidden’, were key indicators [63,64]. By leveraging these detailed BERT features, FNDC effectively enhances the model’s ability to identify and mitigate unreliable information, ensuring the integrity of the data analyzed. We call this the FNDC model.
- The FNDC model operates under the assumption that, if news is fake, the comments on the corresponding news article are also likely to be fake. Similarly, if news is true, the associated comments are more likely to be true.
The detailed algorithm is given below, where the input is a set of n comments and the output is the probability of each i-th comment being fake:
Algorithm 1 Fake news detection from comments (FNDC)
Require: List of comments from social media
Ensure: List of probabilities of the comments being fake
1: Preprocessing: clean and preprocess the comments
2: Use the NearMiss algorithm to balance the data
3: NLP analysis: extract features from the preprocessed comments using BERT to obtain the encoded comments
4: Train classifier: use labeled data to train an MLP with the encoded comments as input to obtain the trained model
5: Initialize the list of probabilities as empty
6: for each encoded comment do
7:   Compute its fake probability with the trained model
8:   Append the probability to the list
9: end for
10: Return the list of probabilities
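The inference loop of Algorithm 1 can be sketched as follows. Here `encode` stands in for the BERT sentence encoder and `model` for the trained MLP classifier; both are placeholders for the trained components described above, not their actual implementations.

```python
def fndc(comments, encode, model):
    """Fake news detection from comments (FNDC), inference loop:
    encode each comment and score it with the trained classifier."""
    probs = []
    for comment in comments:
        features = encode(comment)     # e.g., a 768-dim BERT embedding
        probs.append(model(features))  # probability the comment is fake
    return probs
```

Any encoder/classifier pair with these call signatures can be plugged in, which is what lets the same loop serve both the Twitter and Reddit comment streams.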
- So, for the Twitter comments, the probabilities of them being fake are indexed from 1 to n, and likewise for the Reddit comments. The calculation of n is as follows: if there are n_T comments from Twitter and n_R comments from Reddit, the number of comments considered in the analysis is the smaller of the two counts, n = min(n_T, n_R), so that the comments can be paired one-to-one across the platforms.
- Fusion module: from the last module, we find the probability of news being fake with respect to each comment. Now, we want to fuse the probability with uncertainty. The fusion module consists of three parts: correlation calculation, uncertainty calculation, and aggregation of the fused decision. The correlation calculation aims to understand the relationship between comments across platforms, while the uncertainty calculation assesses the reliability of the comments. Finally, the aggregation step combines these analyses to arrive at a unified decision about the credibility of the news.
- Correlation calculation: to better understand the relationships between comments on the two social media platforms, we calculated the correlations between them. Specifically, we measured the correlation between the probabilities of news being fake based on comments from each platform, computed using the FNDC module. This statistical measure quantifies how closely the fake news probabilities from one platform align with those from the other. The correlation coefficient, C, ranges from −1 to 1. Positive values (C > 0) indicate that, as the probability of fake news increases on one platform, it tends to increase on the other platform as well, suggesting a similar trend in fake news probabilities across both platforms. Conversely, negative values (C < 0) suggest an inverse relationship, where a higher probability of fake news on one platform corresponds to a lower probability on the other. Understanding these correlations helps us analyze how social media platforms interact and influence the spread of fake news. Table 3 illustrates our correlation calculation with an example involving comments from Twitter and Reddit about a news item detailing a confrontation between Roger Federer and Frances Tiafoe. Each comment on each platform (Twitter and Reddit) was encoded using BERT into a 768-dimensional vector. The encoded vector of the i-th comment on each platform was then processed through the trained machine learning model to calculate a probability for that comment. In Table 3, the original comments, their encoded vectors, and the computed probabilities are presented for both Twitter and Reddit.
For instance, a Twitter comment ‘Doesn’t account schedule Federer chooses play…’ is encoded and the corresponding probability is 0.08, while a Reddit comment ‘It 100 the time I topic /r/tennis’ is encoded and has a probability of 0.91. After calculating all probabilities using Algorithm 1, the correlation between the probabilities is calculated. To calculate this correlation, pairs are formed between Twitter and Reddit comments based on their timestamps: the first comment on Twitter is paired with the first comment on Reddit, and so on. Specifically, the Pearson correlation coefficient (C) is determined to evaluate the relationship between the two sets of probabilities. The correlation value (Table 3) of −0.61 reveals a moderate-to-strong inverse relationship between comments on Twitter and Reddit, highlighting contrasting patterns of user engagement and interpretation across the two platforms. When comments on Twitter align positively with a news item, Reddit comments often challenge or refute it, and vice versa. This divergence reflects the distinct user behaviors and discussion dynamics unique to each platform: Twitter’s real-time and concise communication contrasts with Reddit’s preference for detailed and critical discussions. This negative correlation underscores the variability in how users interact with the same news item across platforms, shaped by their respective functional and cultural characteristics. Such findings reinforce the importance of cross-platform analysis in understanding how information is debated and interpreted, offering valuable insights into the mechanisms of news dissemination and the potential for misinformation to spread.
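The Pearson coefficient over the timestamp-paired probability lists can be computed directly; a self-contained sketch:

```python
import math

def pearson(xs, ys):
    """Pearson correlation C between two paired lists of fake
    probabilities (one entry per matched Twitter/Reddit comment pair)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly opposed lists give C = −1 and perfectly aligned lists give C = 1, matching the interpretation of C described above.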
- Comment analysis with uncertainty: after calculating the correlation, the next step is the uncertainty analysis. This process is divided into three parts. First, we calculate the uncertainty based on the correlation between comments. Next, we compute both the support and non-support probabilities to quantify the likelihood of the news being fake or true in the presence of uncertainty. Finally, we fuse these decisions to arrive at a comprehensive conclusion.
- Uncertainty calculation: analyzing comments from two social media platforms can introduce uncertainty that needs to be addressed. Calculating the uncertainty associated with comment analysis provides a measure of the confidence or reliability of the analysis results. This information aids in interpreting the findings and making informed decisions or drawing accurate conclusions from the analyzed comments in the presence of uncertainty. Here we measure uncertainty, U, as defined by Equation (1). Figure 2 illustrates the relationship between correlation (C) and uncertainty (U), which follows from Equation (1). When there is a strong positive or negative correlation, the level of uncertainty is relatively low. However, as C approaches zero, U increases significantly. When C = 0, the uncertainty reaches its maximum value of 1. The justification of this equation for uncertainty, U, is linked to Shannon’s information theory, which quantifies uncertainty as a measure of unpredictability in probabilistic systems. When the correlation between the two social media platforms is zero (C = 0), the platforms exhibit no relationship, leading to maximum uncertainty (U = 1), consistent with the concept of maximum entropy in the absence of information. As the absolute correlation |C| increases, uncertainty decreases, reflecting stronger consistency in evidence across platforms. A linear term in the equation models this reduction, while the logarithmic components amplify the sensitivity to weak correlations (C close to 0), where the lack of agreement leads to higher uncertainty. A normalization factor ensures U remains bounded and interpretable. This formulation is an adaptation of entropy-based principles commonly applied in communication systems [65], making it suitable for modeling uncertainty in cross-platform fake news detection. We have used the same uncertainty calculation method across the different social media platforms for all comments to ensure fairness and comparability in the analysis.
- Support and non-support probability calculation: assessing the level of support or non-support expressed in comments is another key factor for evaluating the credibility of information or detecting fake news. The support probability is the probability of the news being fake in the presence of uncertainty, and the non-support probability is its inverse. The support and non-support probabilities are computed in the same way for the Twitter and the Reddit comments. After completing this step, we obtain a ‘support’ probability and a ‘non-support’ probability for each comment on both social media platforms. Given that we have n comments from both Twitter and Reddit, after this step we have n support and n non-support probabilities for each platform. Additionally, a common uncertainty value, U, is associated with every comment across all platforms.
- Fuse the support probabilities: afterwards, the pairwise support probabilities of comments from Twitter and Reddit are fused using the Dempster–Shafer (DS) combination rule, which combines the support probabilities from the two platforms. Since there are a total of n comment pairs, the combined values form a list of n fused support probabilities.
- Aggregate the fused decision: after fusing the decisions from comments on the two different platforms using Dempster–Shafer theory, we aggregate the combined support probabilities to obtain the final probability. This is done by first determining the majority class of the support values (whether more values are greater than or equal to 0.5, or less than 0.5) and then averaging the support values that belong to the majority class. This approach ensures a comprehensive and balanced assessment of the information gathered from both Twitter and Reddit comments.
- Hypothetical case study: COVID-19 vaccine misinformation: to illustrate the applicability of our model, consider a hypothetical fake news case claiming that “alcohol cures COVID-19”. Our proposed system would first analyze the content of the news article. If the content is rich, detailed, and linguistically credible, the system assigns a low probability of the news being fake, indicating it is likely true. However, if the content is shallow or lacks depth, the system assigns a high probability of the news being fake. In such cases, the model further analyzes user comments from multiple platforms, such as Twitter and Reddit, to gain additional insights. For this example, comments might display varying degrees of support or disagreement. Supportive comments might include:
- (a) “Finally, an easy cure for COVID! Alcohol every day is the way!”
- (b) “This is amazing! People need to know alcohol can save lives!”
Conversely, non-supportive comments might include:
- (a) “This is fake news. Drinking alcohol won’t cure COVID-19.”
- (b) “Don’t trust this claim—it’s dangerous and unsupported by science!”
If comments across both platforms consistently support or refute the claim, this consistency strengthens the evidence, enhancing the model’s confidence in its prediction. For instance, consistent disagreement across platforms would reinforce the conclusion that the news is fake with lower uncertainty. On the other hand, if comments display conflicting patterns—supportive on one platform but refuting on another—the model incorporates high uncertainty. Correlation is calculated to check the consistency between comments of two social media platforms. By integrating content analysis and cross-platform comment evaluations, our system effectively detects fake news while addressing the nuances of cross-domain fake news. This case highlights the robustness of our approach in leveraging both content and user interactions to assess the credibility of news, even in scenarios involving conflicting or ambiguous signals.
4. Dataset Preparation
Algorithm 2 Similar news article retrieval from OSNs
Require: News article T; maximum number of news items z
Ensure: Similar news article from social media
1: Extract top m keywords from T
2: K ← extracted keywords
3: Generate prioritized queries from K
4: Q ← generated queries
5: Initialize the list of results
6: results ← empty list
7: for each query q in Q do
8:   Append SearchPlatforms(q) to results
9:   if length(results) ≥ z then
10:     break
11:   end if
12: end for
13: Compute BERT features for T
14: f_T ← BERT(T)
15: for each news article r in results do
16:   Compute BERT features for r
17:   f_r ← BERT(r)
18:   Compute similarity between f_T and f_r
19:   sim_r ← similarity(f_T, f_r)
20: end for
21: Sort news articles by similarity in descending order
22: result ← first article in sorted results
23: return result
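The ranking in the second half of Algorithm 2 amounts to a nearest-neighbour search over encoder features. A sketch using cosine similarity (the specific similarity measure is an assumption, and the BERT vectors are taken as precomputed):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(target_vec, candidates):
    """Return the (article, vector) pair whose feature vector is most
    similar to the target article's vector; `candidates` is a list of
    (article, vector) pairs retrieved by the search queries."""
    return max(candidates, key=lambda item: cosine(target_vec, item[1]))
```

Sorting all candidates by this score and keeping the first entry reproduces the descending-order selection in the last steps of the algorithm.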
5. Experimental Setup
6. Experimental Results
Comparison with the Previous Methods
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mayfield, A. What Is Social Media. 2009. Available online: https://online.anyflip.com/fmli/xqvr/mobile/index.html#p=8 (accessed on 23 July 2024).
- Van der Meer, T.G.; Verhoeven, P. Public framing organizational crisis situations: Social media versus news media. Public Relations Rev. 2013, 39, 229–231. [Google Scholar] [CrossRef]
- Boczkowski, P.J.; Mitchelstein, E.; Matassi, M. “News comes across when I’m in a moment of leisure”: Understanding the practices of incidental news consumption on social media. New Media Soc. 2018, 20, 3523–3539. [Google Scholar] [CrossRef]
- De Corniere, A.; Sarvary, M. Social media and news: Content bundling and news quality. Manag. Sci. 2023, 69, 162–178. [Google Scholar] [CrossRef]
- Iida, T.; Song, J.; Estrada, J.L.; Takahashi, Y. Fake news and its electoral consequences: A survey experiment on Mexico. AI Soc. 2024, 39, 1065–1078. [Google Scholar] [CrossRef]
- McKay, S.; Tenove, C. Disinformation as a threat to deliberative democracy. Political Res. Q. 2021, 74, 703–717. [Google Scholar] [CrossRef]
- Nguyen, D.; Hekman, E. The news framing of artificial intelligence: A critical exploration of how media discourses make sense of automation. AI Soc. 2024, 39, 437–451. [Google Scholar] [CrossRef]
- Kogan, S.; Moskowitz, T.J.; Niessner, M. Fake news: Evidence from financial markets. SSRN Electron. J. 2019, 3237763. [Google Scholar] [CrossRef]
- Kogan, S.; Moskowitz, T.J.; Niessner, M. Fake News in Financial Markets; Social Science Research Network (SSRN): Rochester, NY, USA, 2020. [Google Scholar]
- USC Scientists Discover the Real Reason Why Fake News Spreads on Social Media—scitechdaily.com. Available online: https://scitechdaily.com/usc-scientists-discover-the-real-reason-why-fake-news-spreads-on-social-media/ (accessed on 22 July 2023).
- Janicka, M.; Pszona, M.; Wawer, A. Cross-domain failures of fake news detection. Comput. Sist. 2019, 23, 1089–1097. [Google Scholar] [CrossRef]
- Haenlein, M.; Anadol, E.; Farnsworth, T.; Hugo, H.; Hunichen, J.; Welte, D. Navigating the New Era of Influencer Marketing: How to be Successful on Instagram, TikTok, & Co. Calif. Manag. Rev. 2020, 63, 5–25. [Google Scholar]
- Ancu, M. Older adults on Facebook: A survey examination of motives and use of social networking by people 50 and older. Fla. Commun. J. 2012, 40, 1–12. [Google Scholar]
- Parmelee, J.H.; Bichard, S.L. Politics and the Twitter Revolution: How Tweets Influence the Relationship Between Political Leaders and the Public. Political Sci. Q. 2013, 128, 178–180. Available online: http://www.jstor.org/stable/23563384 (accessed on 30 August 2023).
- Nguyen, M. Twitter’s Role In Politics—northwesternbusinessreview.org. Available online: https://northwesternbusinessreview.org/twitters-role-in-politics-b3ed620465c9 (accessed on 30 August 2023).
- Utz, S.; Breuer, J. The relationship between networking, LinkedIn use, and retrieving informational benefits. Cyberpsychol. Behav. Soc. Netw. 2019, 22, 180–185. [Google Scholar] [CrossRef] [PubMed]
- 8 Facts About Americans and Twitter as It Rebrands to X—pewrsr.ch. Available online: https://pewrsr.ch/44HbxcN (accessed on 30 August 2023).
- Takhteyev, Y.; Gruzd, A.; Wellman, B. Geography of Twitter networks. Soc. Netw. 2012, 34, 73–81. [Google Scholar] [CrossRef]
- Donchenko, D.; Ovchar, N.; Sadovnikova, N.; Parygin, D.; Shabalina, O.; Ather, D. Analysis of comments of users of social networks to assess the level of social tension. Procedia Comput. Sci. 2017, 119, 359–367. [Google Scholar] [CrossRef]
- Li, L.; Wen, H.; Zhang, Q. Characterizing the role of Weibo and WeChat in sharing original information in a crisis. J. Contingencies Crisis Manag. 2023, 31, 236–248. [Google Scholar] [CrossRef]
- Japan Social Media Statistics 2023 | Most Popular Social Media Platforms—theglobalstatistics.com. Available online: https://www.theglobalstatistics.com/japan-social-media-statistics/?expand_article=1 (accessed on 30 August 2023).
- Pérez-Rosas, V.; Kleinberg, B.; Lefevre, A.; Mihalcea, R. Automatic detection of fake news. arXiv 2017, arXiv:1708.07104. [Google Scholar]
- Gautam, A.; Jerripothula, K.R. Sgg: Spinbot, grammarly and glove based fake news detection. In Proceedings of the 2020 IEEE Sixth International Conference on Multimedia Big Data (bigMM), New Delhi, India, 24–26 September 2020; pp. 174–182. [Google Scholar]
- Saikh, T.; De, A.; Ekbal, A.; Bhattacharyya, P. A deep learning approach for automatic detection of fake news. arXiv 2020, arXiv:2005.04938. [Google Scholar]
- Goel, P.; Singhal, S.; Aggarwal, S.; Jain, M. Multi domain fake news analysis using transfer learning. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1230–1237. [Google Scholar]
- Foster, C.L. Truth as social practice in a digital era: Iteration as persuasion. AI Soc. 2023, 38, 2009–2023. [Google Scholar] [CrossRef]
- Hamed, S.K.; Ab Aziz, M.J.; Yaakub, M.R. Fake news detection model on social media by leveraging sentiment analysis of news content and emotion analysis of users’ comments. Sensors 2023, 23, 1748. [Google Scholar] [CrossRef]
- Guo, C.; Cao, J.; Zhang, X.; Shu, K.; Yu, M. Exploiting emotions for fake news detection on social media. arXiv 2019, arXiv:1903.01728. [Google Scholar]
- Xu, X.; Li, X.; Wang, T.; Jiang, Y. AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection. In Proceedings of the International Conference on Multimedia Modeling, Nara, Japan, 8–10 January 2025; pp. 86–100. [Google Scholar]
- Alonso, M.A.; Vilares, D.; Gómez-Rodríguez, C.; Vilares, J. Sentiment analysis for fake news detection. Electronics 2021, 10, 1348. [Google Scholar] [CrossRef]
- Yanagi, Y.; Orihara, R.; Sei, Y.; Tahara, Y.; Ohsuga, A. Fake news detection with generated comments for news articles. In Proceedings of the 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES), Reykjavík, Iceland, 8–10 July 2020; pp. 85–90. [Google Scholar]
- Nan, Q.; Sheng, Q.; Cao, J.; Hu, B.; Wang, D.; Li, J. Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models. arXiv 2024, arXiv:2405.16631. [Google Scholar]
- Palacios Barea, M.; Boeren, D.; Ferreira Goncalves, J. At the intersection of humanity and technology: A technofeminist intersectional critical discourse analysis of gender and race biases in the natural language processing model GPT-3. AI Soc. 2023, 1–19. [Google Scholar] [CrossRef]
- Goldstein, S.; Kirk-Giannini, C.D. Language agents reduce the risk of existential catastrophe. AI Soc. 2023, 1–11. [Google Scholar] [CrossRef]
- O’Connor, S.; Liu, H. Gender bias perpetuation and mitigation in AI technologies: Challenges and opportunities. AI Soc. 2023, 39, 2045–2057. [Google Scholar] [CrossRef]
- Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80, 11765–11788. [Google Scholar] [CrossRef]
- Ahn, Y.C.; Jeong, C.S. Natural language contents evaluation system for detecting fake news using deep learning. In Proceedings of the 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), Chonburi, Thailand, 10–12 July 2019; pp. 289–292. [Google Scholar]
- Safaya, A.; Abdullatif, M.; Yuret, D. Kuisail at semeval-2020 task 12: Bert-cnn for offensive speech identification in social media. arXiv 2020, arXiv:2007.13184. [Google Scholar]
- Capuano, N.; Fenza, G.; Loia, V.; Nota, F.D. Content-based fake news detection with machine and deep learning: A systematic review. Neurocomputing 2023, 530, 91–103. [Google Scholar] [CrossRef]
- Pan, J.Z.; Pavlova, S.; Li, C.; Li, N.; Li, Y.; Liu, J. Content based fake news detection using knowledge graphs. In Proceedings of The Semantic Web–ISWC 2018: 17th International Semantic Web Conference, Monterey, CA, USA, 8–12 October 2018; Part I; pp. 669–683. [Google Scholar]
- Wynne, H.E.; Wint, Z.Z. Content based fake news detection using n-gram models. In Proceedings of the 21st International Conference on Information Integration and Web-Based Applications & Services, Munich, Germany, 2–4 December 2019; pp. 669–673. [Google Scholar]
- Szczepański, M.; Pawlicki, M.; Kozik, R.; Choraś, M. New explainability method for BERT-based model in fake news detection. Sci. Rep. 2021, 11, 23705. [Google Scholar] [CrossRef]
- Kula, S.; Choraś, M.; Kozik, R. Application of the bert-based architecture in fake news detection. In Proceedings of the 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020), Burgos, Spain, 16–18 September 2020; pp. 239–249. [Google Scholar]
- Kumar, S.; Kumar, G.; Singh, S.R. Text_Minor at CheckThat!-2022: Fake News Article Detection Using RoBERT. In Proceedings of the CLEF (Working Notes), Bologna, Italy, 5–8 September 2022; pp. 554–563. [Google Scholar]
- Stewart, J.; Lyubashenko, N.; Stefanek, G. The efficacy of detecting AI-generated fake news using transfer learning. Issues Inf. Syst. 2023, 24, 164–177. [Google Scholar]
- He, C.; Chen, S.; Huang, S.; Zhang, J.; Song, X. Using convolutional neural network with BERT for intent determination. In Proceedings of the 2019 International Conference on Asian Language Processing (IALP), Shanghai, China, 15–17 November 2019; pp. 65–70. [Google Scholar]
- Trueman, T.E.; Kumar, A.; Narayanasamy, P.; Vidya, J. Attention-based C-BiLSTM for fake news detection. Appl. Soft Comput. 2021, 110, 107600. [Google Scholar] [CrossRef]
- Fang, Y.; Gao, J.; Huang, C.; Peng, H.; Wu, R. Self multi-head attention-based convolutional neural networks for fake news detection. PLoS ONE 2019, 14, e0222713. [Google Scholar] [CrossRef] [PubMed]
- Arin, K.P.; Mazrekaj, D.; Thum, M. Ability of detecting and willingness to share fake news. Sci. Rep. 2023, 13, 7298. [Google Scholar] [CrossRef] [PubMed]
- Shrestha, A.; Spezzano, F. Characterizing and predicting fake news spreaders in social networks. Int. J. Data Sci. Anal. 2022, 13, 385–398. [Google Scholar] [CrossRef]
- Ma, J.; Gao, W.; Mitra, P.; Kwon, S.; Jansen, B.J.; Wong, K.F.; Cha, M. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the 25th International Joint Conference on Artificial Intelligence: IJCAI, New York, NY, USA, 9–15 July 2016; pp. 3818–3824. [Google Scholar]
- Zubiaga, A.; Aker, A.; Bontcheva, K.; Liakata, M.; Procter, R. Detection and resolution of rumours in social media: A survey. ACM Comput. Surv. 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Qian, F.; Gong, C.; Sharma, K.; Liu, Y. Neural User Response Generator: Fake News Detection with Collective User Intelligence. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; Volume 18, pp. 3834–3840. [Google Scholar]
- Shu, K.; Cui, L.; Wang, S.; Lee, D.; Liu, H. dEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 395–405. [Google Scholar]
- Sharma, D.K.; Sharma, S. Comment filtering based explainable fake news detection. In Proceedings of the Second International Conference on Computing, Communications, and Cyber-Security: IC4S, Delhi, India, 3–4 October 2020; pp. 447–458. [Google Scholar]
- Ferdush, J.; Kamruzzaman, J.; Karmakar, G.; Gondal, I.; Das, R. Identification of Fake News: A Semantic Driven Technique for Transfer Domain. In Proceedings of the International Conference on Neural Information Processing, Virtual Event, 22–26 November 2022; pp. 564–575. [Google Scholar]
- Ziegele, M.; Breiner, T.; Quiring, O. What creates interactivity in online news discussions? An exploratory analysis of discussion factors in user comments on news items. J. Commun. 2014, 64, 1111–1138. [Google Scholar] [CrossRef]
- Ziegele, M.; Weber, M.; Quiring, O.; Breiner, T. The dynamics of online news discussions: Effects of news articles and reader comments on users’ involvement, willingness to participate, and the civility of their contributions. Inf. Commun. Soc. 2018, 21, 1419–1435. [Google Scholar] [CrossRef]
- Sairambay, Y.; Kamza, A.; Kap, Y.; Nurumov, B. Monitoring public electoral sentiment through online comments in the news media: A comparative study of the 2019 and 2022 presidential elections in Kazakhstan. Media Asia 2024, 51, 33–61. [Google Scholar] [CrossRef]
- Raza, S.; Reji, D.J.; Ding, C. Dbias: Detecting biases and ensuring fairness in news articles. Int. J. Data Sci. Anal. 2024, 17, 39–59. [Google Scholar] [CrossRef]
- Hu, B.; Sheng, Q.; Cao, J.; Shi, Y.; Li, Y.; Wang, D.; Qi, P. Bad actor, good advisor: Exploring the role of large language models in fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 22105–22113. [Google Scholar]
- Micallef, N.; He, B.; Kumar, S.; Ahamad, M.; Memon, N. The role of the crowd in countering misinformation: A case study of the COVID-19 infodemic. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 748–757. [Google Scholar]
- Tanchak, P.N. The invisible front: Russia, trolls, and the information war against Ukraine. Revolut. War Contemp. Ukr. Chall. Change 2017, 161, 253. [Google Scholar]
- Ferdush, J.; Kamruzzaman, J.; Karmakar, G.; Gondal, I.; Das, R. Detecting Fake News of Evolving Events using Machine Learning: Case of Russia-Ukraine War. In Proceedings of the 35th Australasian Conference on Information Systems, Wellington, New Zealand, 5–8 December 2023; Available online: https://aisel.aisnet.org/acis2023/122 (accessed on 30 August 2023).
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Shafer, G. Dempster’s rule of combination. Int. J. Approx. Reason. 2016, 79, 26–40. [Google Scholar] [CrossRef]
- Scikit-Learn Developers. sklearn.linear_model.LinearRegression. 2023. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html (accessed on 23 July 2024).
Notation | Description
---|---
T | News text being evaluated
 | Probability of the news text T being fake, as determined by the model
 | Semantically similar news to T found on Twitter
 | Semantically similar news to T found on Reddit
and | Machine learning models
U | Uncertainty
 | Comments collected from Twitter
 | Comments collected from Reddit
 | Encoded representation of the i-th Twitter comment
 | Encoded representation of the i-th Reddit comment
C | Correlation
 | Probability derived from the i-th Twitter comment by the model
 | Probability derived from the i-th Reddit comment by the model
and | Support and non-support probabilities for the i-th Twitter comment
and | Support and non-support probabilities for the i-th Reddit comment
 | Fused probability of the i-th comment
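The Dempster–Shafer fusion named in the contributions can be sketched in a few lines. This is a minimal illustration, assuming each platform's comment model yields a mass on "fake", a mass on "true", and an uncertainty mass U on the whole frame; the paper's exact mass assignment and its correlation-based consistency weighting are not reproduced here.

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination over the frame {fake, true}.

    Each mass function is a dict with keys 'fake', 'true', and 'theta',
    where 'theta' is the mass on the full frame (the uncertainty U).
    """
    # Conflict: mass assigned to contradictory singleton hypotheses
    k = m1['fake'] * m2['true'] + m1['true'] * m2['fake']
    norm = 1.0 - k  # Dempster normalization factor
    fused = {
        'fake': (m1['fake'] * m2['fake']
                 + m1['fake'] * m2['theta']
                 + m1['theta'] * m2['fake']) / norm,
        'true': (m1['true'] * m2['true']
                 + m1['true'] * m2['theta']
                 + m1['theta'] * m2['true']) / norm,
    }
    # Residual uncertainty shrinks as agreeing evidence accumulates
    fused['theta'] = (m1['theta'] * m2['theta']) / norm
    return fused
```

When both platforms lean "fake", the fused mass on "fake" exceeds either input mass, which is the intended reinforcement behavior of the rule.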
Feature type | Features
---|---
Word tags | Word count, character count, average word length, hashtag count, link count, number count, user-mention count
POS tags | CC—Coordinating conjunction, CD—Cardinal number, DT—Determiner, EX—Existential there, FW—Foreign word, IN—Preposition, JJ—Adjective, JJR—Comparative adjective, JJS—Superlative adjective, MD—Modal, NN—Singular noun, NNP—Proper noun singular, NNPS—Proper noun plural, NNS—Plural noun, PDT—Predeterminer, POS—Possessive ending, PRP—Personal pronoun, PRP$—Possessive pronoun, RB—Adverb, RBR—Comparative adverb, RBS—Superlative adverb, RP—Particle, SYM—Symbol, TO—to, UH—Interjection, VB—Verb base form, VBD—Verb past tense, VBG—Verb gerund or present participle, VBN—Verb past participle, VBP—Verb non-3rd person singular present, LS—List item marker, VBZ—Verb 3rd person singular present, WDT—Wh-determiner, WP—Wh-pronoun, WP$—Possessive wh-pronoun, WRB—Wh-adverb, and other symbols
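The word-level features above can be computed with a few lines of standard Python. The sketch below is illustrative: the function name and regular expressions are ours, not the authors' implementation.

```python
import re

def surface_features(text):
    """Count simple surface features of a post: words, characters,
    average word length, hashtags, links, and user mentions."""
    words = text.split()
    return {
        'word_count': len(words),
        'char_count': len(text),
        'avg_word_len': sum(len(w) for w in words) / max(len(words), 1),
        'hashtag_count': len(re.findall(r'#\w+', text)),
        'link_count': len(re.findall(r'https?://\S+', text)),
        'mention_count': len(re.findall(r'@\w+', text)),
    }
```

The POS-tag counts in the second row would typically come from an off-the-shelf tagger emitting the Penn Treebank tags listed above.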
News item: Roger Federer has fist fight with Frances Tiafoe after Miami Open defeat. Things became extremely heated between world class tennis star Roger Federer and American teenager and tennis up and comer Frances Tiafoe after their Miami Open tennis match. The match was not close, but the two players were playing in rainy and windy conditions, which gave Federer an edge with his years of experience over Tiafoe. After Federer beat the younger tennis professional in three sets, the two players began to yell at each other. Tiafoe was angry about several alleged incorrect calls made by Federer in the match. Tiafoe then jumped over the net and attacked Federer with several punches. Federer defended himself until several observers came and broke up the fight. The two have both issued public apologies to their fans and to each other, but clearly things will not be settled until they face each other on the court another time.

Twitter comments | Encoded comments | Probability | Reddit comments | Encoded comments | Probability | Correlation, C
---|---|---|---|---|---|---
Doesn’t account schedule Federer chooses play. I mean didn’t bypass French prioritise grasscourt season. | [−0.51961124, 0.36066872, 1.1010087, …] | 0.08 | It 100th time I topic /r/tennis | [−5.89307308, 6.89936399, 5.71980417, …] | 0.91 | −0.61
Managed clinch important must-win match fired isner. respect man’s name | [−7.77983904, 2.72371382, 7.28409469, …] | 0.99 | Nice. don’t want contribute I think add categorie “Mental Strenght” seeing Federer “Saving BPs” horrendous BP conversion rate doesn’t fit. | [−5.32212555, 3.82354587, 1.10202396, …] | 0.03 |
Goat here, shit man | [5.41974604, 7.15636134, 2.83419457, …] | 0.99 | Needs Nadals bald patch | [1.14744902, 8.40671659, 6.28644377, …] | 0.99 |
… | … | … | … | … | … |
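The correlation C in the table is, in spirit, a Pearson coefficient over the per-comment probabilities from the two platforms. Using only the three example rows shown here yields a different negative value than the paper's −0.61, which is computed over the full comment set; the helper below is a plain-Python sketch, not the authors' code.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Per-comment fake probabilities from the three example rows above
p_twitter = [0.08, 0.99, 0.99]
p_reddit = [0.91, 0.03, 0.99]

C = pearson(p_twitter, p_reddit)  # negative: the platforms largely disagree here
```

A negative C signals inconsistent evidence across platforms, which the fusion step treats as increased uncertainty.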
Dataset | Total News Articles | Avg Words/Article | Avg Words/Sentence | Distribution (Fake/True)
---|---|---|---|---|
Celebrity | 500 | 122 | 24 | 250/250 |
FakeNewsAMT | 480 | 132 | 23 | 240/240 |
MLP configuration: hidden layers = (100, 100, 100, 100, 100), epochs = 500

Threshold | Accuracy | Precision | Recall
---|---|---|---
0.1 | 0.77 | 0.73 | 0.85 |
0.15 | 0.82 | 0.80 | 0.85 |
0.2 | 0.82 | 0.82 | 0.81 |
0.25 | 0.84 | 0.88 | 0.79 |
0.3 | 0.84 | 0.90 | 0.77 |
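The threshold sweep above trades precision against recall when joining the content- and comment-based detectors. The decision rule below is a hypothetical sketch of one way such a threshold can act, namely deferring to comment evidence when the content-based score is too close to 0.5; the paper's exact combination rule may differ.

```python
def combined_prediction(p_content, p_comment, threshold=0.25):
    """Hypothetical joint decision rule.

    Trust the content-based fake probability when it is confidently away
    from 0.5 (by at least `threshold`); otherwise fall back to the fused
    comment-based probability from the social media platforms.
    Returns 1 for fake, 0 for true.
    """
    if abs(p_content - 0.5) >= threshold:
        return int(p_content >= 0.5)  # confident content-based verdict
    return int(p_comment >= 0.5)      # defer to comment evidence
```

Raising the threshold routes more items to the comment-based detector, which matches the table's pattern of rising precision and falling recall.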
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ferdush, J.; Kamruzzaman, J.; Karmakar, G.; Gondal, I.; Das, R. Cross-Domain Fake News Detection Through Fusion of Evidence from Multiple Social Media Platforms. Future Internet 2025, 17, 61. https://doi.org/10.3390/fi17020061