Collecting a Large Scale Dataset for Classifying Fake News Tweets Using Weak Supervision
Round 1
Reviewer 1 Report
This paper utilizes weak supervision to learn a fake news classifier from noisy data.
The contributions of this paper are as follows:
- Collect a large-scale Twitter dataset and weakly label the samples based on the credibility of the source.
- Apply several machine learning methods to this dataset; the models trained on the weakly labeled samples achieve stable performance.
- Collect a list of features based on users, tweet content, and sentiment. This will support further research on feature engineering for fake news detection.
The limitations of this paper are as follows:
- There is not enough information about the tweets. The authors only list statistical information about the dataset. It would be better if the authors could provide more analysis of the dataset, for example the sentiment and topic distributions of credible and non-credible news.
- Some notations in the tables are unclear. In Table 1, what is the meaning of "No News"? Does it mean the tweet was deleted by the user or suspended by Twitter? In Table 2 and Figure 2, which part of the data is being evaluated? Is it part of the training dataset or the validation dataset (from PolitiFact)?
- There is no comparison between a model that only uses the weakly labeled samples and a model trained on a limited number of manually labeled samples. Although the selected models achieve good performance, this may be due to the easiness of the dataset. The authors should add some methods that do not use any weakly labeled samples, or evaluate their approach on fake news benchmark datasets such as FakeNewsNet (KaiDMML/FakeNewsNet: a dataset for fake news detection research, github.com) or LIAR (https://www.cs.ucsb.edu/~william/data/liar_dataset.zip).
- In lines 163-164, did the authors observe this situation in the dataset? This claim is unconvincing to me.
- In lines 172-173, why did the authors pick the 116 tweets that are close to fake news? Since PolitiFact also contains a real news category, the authors could also use the real news from that website.
Author Response
We would like to thank the reviewer for their thorough review.
We have addressed the points of critique as follows:
- We have added word clouds and further statistical diagrams for the training dataset.
- The data description has been made more self-contained.
- The experiment proposed by the reviewer has been conducted, and the results are included and discussed in the paper.
- Yes, it has been observed. We have also added a source to support this claim.
- The rationale behind that setup is to prevent the classifier from learning a topical distinction rather than the actual distinction between fake and real news. This has been clarified in the paper.
Author Response File: Author Response.docx
Reviewer 2 Report
The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. The authors discuss a weakly supervised approach, which automatically collects a large-scale but very noisy training dataset comprising hundreds of thousands of tweets. During collection, tweets are automatically labeled by their source, i.e., trustworthy or untrustworthy, and a classifier is trained on this dataset. The authors then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), the authors show that, despite this unclean and inaccurate dataset, it is possible to detect fake news tweets with an F1 score of up to 0.9. The paper is interesting overall, but the following comments must be addressed.
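To make the labeling step concrete, the following is a minimal sketch of such source-based weak labeling; the source lists, the "source_domain" field, and the label names are hypothetical placeholders for illustration, not the authors' actual collection pipeline.

```python
# Minimal sketch of source-based weak labeling: each tweet inherits a noisy
# label from the credibility of the source it originates from.
from typing import Optional

TRUSTWORTHY_SOURCES = {"reuters.com", "apnews.com"}    # placeholder source list
UNTRUSTWORTHY_SOURCES = {"example-fake-site.com"}      # placeholder source list

def weak_label(tweet: dict) -> Optional[str]:
    """Return a noisy label derived from the tweet's source, or None if the source is unlisted."""
    source = tweet.get("source_domain")
    if source in TRUSTWORTHY_SOURCES:
        return "real"   # noisy: not every tweet from a trustworthy source is accurate
    if source in UNTRUSTWORTHY_SOURCES:
        return "fake"   # noisy: not every tweet from an untrustworthy source is fake news
    return None         # tweets from unlisted sources are discarded

tweets = [
    {"source_domain": "reuters.com", "text": "Election results confirmed by officials."},
    {"source_domain": "example-fake-site.com", "text": "Shocking secret finally revealed!"},
]
weakly_labeled = [(t["text"], weak_label(t)) for t in tweets if weak_label(t) is not None]
print(weakly_labeled)  # both tweets, each paired with its noisy label
```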
Comments:
- The authors should revise the abstract to be more technical; in its current version, the abstract reads rather generally, which is confusing for readers.
- The authors should add a block diagram that shows the proposed approach step by step.
- Were there any time complexity issues during the experiments in the Experiment and Evaluation section? How was this handled? It is not discussed anywhere.
- The authors need to show the model structure of the various machine learning classifiers.
- The experimental setup is missing, and the results are doubtful without a specific experimental environment. Please describe the experimental setup and provide a GitHub link; better still, attach a zip file with the running code and outputs as supplementary material, which would be helpful for readers.
- The fake and real news dataset will be imbalanced; how is the class imbalance tackled?
- The authors need to explain how these features were selected (referring to "2. Tweet-level Features").
- Before the conclusion, please add a table comparing with previous research to show how the approach is better in terms of accuracy.
- Please conclude your manuscript in a more concrete way.
- Regarding Table 2 ("The performances of the learners with a bag-of-words model with 500 terms for unigrams and 250 terms for bigrams"): why are NB and RF better than the other classical ML approaches in the unigram and bigram variants, respectively? (See the bag-of-words sketch after this list.)
Overall, the major contribution of this study looks rather weak; the authors need to explain it in more depth and more technically.
- There are many formatting issues; please fix them.
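For reference, a minimal sketch of how the two bag-of-words variants questioned above could be set up and compared; scikit-learn, the toy texts and labels, and the cross-validated F1 evaluation are assumptions for illustration, not the authors' actual experimental code.

```python
# Hypothetical comparison of the two bag-of-words variants from Table 2:
# 500 unigram terms vs. 250 bigram terms, each fed to NB and RF.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["breaking news shocking claim", "official report released today",
         "you will not believe this secret", "government publishes statistics"]
labels = [1, 0, 1, 0]  # toy placeholder labels: 1 = fake, 0 = real

variants = {
    "unigrams (500 terms)": CountVectorizer(ngram_range=(1, 1), max_features=500),
    "bigrams (250 terms)": CountVectorizer(ngram_range=(2, 2), max_features=250),
}

for variant_name, vectorizer in variants.items():
    for clf in (MultinomialNB(), RandomForestClassifier(n_estimators=100)):
        pipeline = make_pipeline(vectorizer, clf)
        # Cross-validated F1 for each (feature variant, classifier) pair.
        f1 = cross_val_score(pipeline, texts, labels, cv=2, scoring="f1").mean()
        print(f"{variant_name} + {type(clf).__name__}: F1 = {f1:.2f}")
```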
Author Response
We would like to thank the reviewer for their thorough report.
We believe there is a misconception here - it is not our goal to provide a new classification model, but rather a means to create a training dataset for such models.
We have improved the paper based on the suggestions. Improvements include, but are not limited to:
- Provision of a GitHub repository with all the necessary code and data.
- Addition of a block diagram.
- Revision of the abstract and conclusion.
- More elaborate explanation of the feature selection.
- Fixing of formatting issues.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
In this paper, the authors have utilized website credibility as distant supervision to weakly label tweets and have trained a classification model on the weakly labeled data.
However, the authors did not demonstrate the practicality of the proposed weakly labeled dataset. In Section 5.3, the model trained on the limited manually labeled dataset achieved better performance than the one trained on the weak supervision dataset. If possible, researchers could still manually label a small dataset for better results.
In addition, the authors do not evaluate the proposed weakly labeled dataset on fake news benchmark datasets. It would be better if the authors could use the proposed dataset as additional training data and show better performance than when using only the training split of the benchmark dataset.
Author Response
> If possible, researchers could still manually label a small dataset for better results.
We have conducted an additional experiment to showcase the added value of our extended dataset.
> In addition, the authors do not evaluate the proposed weakly labeled dataset on fake news benchmark datasets. It would be better if the authors could use the proposed dataset as additional training data and show better performance than when using only the training split of the benchmark dataset.
We have further strengthened the argument by showing that when the distributional features are computed on the large, noisy training dataset, the results are better than using either weak supervision or manual labeling in isolation.
Our results in the previous submission mixed these two aspects, i.e., the results on the manually labeled data were obtained while still using the distributional features from the large-scale training set, and hence looked biased towards the manual labeling approach. We think that the current version illustrates the trade-off between the two approaches better.
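To illustrate this combination, a hypothetical sketch is given below; TF-IDF stands in for the paper's distributional features, and all names and toy data are placeholders rather than the authors' actual code. The key point is that the feature space is fitted on the large, weakly labeled corpus, while the classifier itself is trained only on the manually labeled sample.

```python
# Hypothetical sketch: distributional features fitted on the large, noisy corpus;
# the classifier is trained and applied on the small, manually labeled sample.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

large_noisy_texts = [            # stands in for hundreds of thousands of weakly labeled tweets
    "huge scandal exposed by insiders", "officials confirm new policy",
    "shocking truth they hide from you", "agency releases annual figures",
]
manual_texts = ["shocking hoax exposed as fake", "officials confirm court ruling",
                "insiders reveal hidden truth", "agency figures released today"]
manual_labels = [1, 0, 1, 0]     # toy manual labels: 1 = fake, 0 = real

# Step 1: learn the feature space from the large corpus (no labels needed here).
vectorizer = TfidfVectorizer().fit(large_noisy_texts)

# Step 2: train on the manually labeled data, represented in that feature space.
clf = LogisticRegression(max_iter=1000).fit(vectorizer.transform(manual_texts), manual_labels)
print(clf.predict(vectorizer.transform(["shocking secret exposed"])))
```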
Reviewer 2 Report
The authors did excellent work and resolved my previous comments, but the paper still needs improvement. The authors should add the following references to the introduction section; this will be good for readers and improve the quality of the paper.
- Jo, E. S., & Gebru, T. (2020, January). Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 306-316).
- Khan, M. A., Karim, M., & Kim, Y. (2018). A two-stage big data analytics framework with real world applications using Spark machine learning and long short-term memory network. Symmetry, 10(10), 485.
- Helmstetter, S., & Paulheim, H. (2018, August). Weakly supervised learning for fake news detection on Twitter. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 274-277). IEEE.
All tables and figures should use the same font and be aligned with the text.
Author Response
> The authors did excellent work and resolved my previous comments, but the paper still needs improvement. The authors should add the following references to the introduction section; this will be good for readers and improve the quality of the paper.
We have carefully re-read the article. Moreover, more recent references were added to the introduction. The references mentioned by the reviewer were incorporated where they seemed fitting; however, the second reference (Khan et al.) is topically unrelated to the paper at hand.
> All tables and figures should use the same font and be aligned with the text.
Where possible, the figures have been changed to use the same font as the text. However, some of them were created with specific tools which do not allow for customizing the font. We still believe that this does not diminish the message of this paper.