Article
Peer-Review Record

Illusion of Truth: Analysing and Classifying COVID-19 Fake News in Brazilian Portuguese Language

Big Data Cogn. Comput. 2022, 6(2), 36; https://doi.org/10.3390/bdcc6020036
by Patricia Takako Endo 1,*, Guto Leoni Santos 2, Maria Eduarda de Lima Xavier 1, Gleyson Rhuan Nascimento Campos 1, Luciana Conceição de Lima 3, Ivanovitch Silva 4, Antonia Egli 5 and Theo Lynn 5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 25 January 2022 / Revised: 6 March 2022 / Accepted: 28 March 2022 / Published: 1 April 2022

Round 1

Reviewer 1 Report

Summary: The paper titled "Illusion of truth: Analyzing and classifying COVID-19 fake news in Brazilian Portuguese language" introduces a COVID-19-related dataset in the Portuguese language generated by the authors, discusses an exploratory data analysis, and presents some machine learning-based classification techniques applied to the introduced dataset.

Strengths:

1- Introducing annotated datasets for detecting misinformation is of the utmost importance, especially considering data-hungry deep learning models and the scarcity of ground truth in this field.

2- The exploratory analysis presented in the paper is very interesting, even though it could be improved in terms of presentation.

3- The data gathering and annotation processes, e.g., fact checking, are explained fairly well.

Weaknesses: 

1- The authors did not adequately justify their motivation for generating a new dataset in the Portuguese language. Why not use neural machine translation (NMT) on existing English datasets to generate Portuguese text? Why not use state-of-the-art language models, e.g., BERT or GPT-3, pretrained on large textual corpora, for emerging events such as COVID-19?

2- The paper does not cover all existing datasets on COVID-19. For instance, on page 5, Section 4.1, it is mentioned that "There is no COVID-19 dataset for Portuguese language". I refer the authors to the following multilingual dataset, which comprises articles in six different languages, i.e., English, Spanish, Portuguese, Hindi, French and Italian.

MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation

3- The dataset is highly imbalanced; the effect of class imbalance is not studied and no solution is suggested. For instance, you can refer to the following paper, where the authors leverage an augmentation technique to tackle the class imbalance problem:

NewsBag: A Multimodal Benchmark Dataset for Fake News Detection

4- The real news set seems to be highly biased towards the G1 outlet, especially considering the classification results reported in the paper. For the fake class, it is only mentioned that the articles were gathered from boatos.org, and the number of domains is not mentioned. One possible reason for the high accuracy of the ML/DL models might be that the models easily learn the language structure of the G1 outlet.

5- In the experimental part, especially for the deep learning techniques, ROC curves are not reported, and it is not clear whether the reported results are overfitted.

6- The exploratory part, especially the examples on page 8, is somewhat confusing. This part needs to be presented better. There are also typos in the text that need to be edited, e.g., on page 3, paragraph 4, the citation is broken.

Author Response

Summary: The paper titled "Illusion of truth: Analyzing and classifying COVID-19 fake news in Brazilian Portuguese language" introduces a COVID-19-related dataset in the Portuguese language generated by the authors, discusses an exploratory data analysis, and presents some machine learning-based classification techniques applied to the introduced dataset.

Strengths:

1- Introducing annotated datasets for detecting misinformation is of the utmost importance, especially considering data-hungry deep learning models and the scarcity of ground truth in this field.

2- The exploratory analysis presented in the paper is very interesting, even though it could be improved in terms of presentation.

3- The data gathering and annotation processes, e.g., fact checking, are explained fairly well.

Answer: Thank you for the comments. We have revised the entire manuscript in order to improve its quality and contribution.

Weaknesses: 

1- The authors did not adequately justify their motivation for generating a new dataset in the Portuguese language. Why not use neural machine translation (NMT) on existing English datasets to generate Portuguese text? Why not use state-of-the-art language models, e.g., BERT or GPT-3, pretrained on large textual corpora, for emerging events such as COVID-19?

Answer: Thank you for your comments. We have updated the text to address this issue specifically. Portuguese is a pluricentric or polycentric language in that it possesses more than one standard (national) variety, e.g., European Portuguese and Brazilian Portuguese, as well as African varieties. Furthermore, Brazilian Portuguese has been characterised as highly diglossic, i.e., it has a formal traditional form of the language, the so-called H-variant, and the Brazilian vernacular, the L-variant, as well as a wide range of dialects (Silva, 2004; da Silva, 2010). The COVID-19 pandemic introduced new terms and new public health concepts to the global linguistic repertoire, which in turn introduced a number of language challenges, not least problems related to the translation and use of multilingual terminology in public health information and medical research from dominant languages (Piller et al., 2020). Consequently, building models based on English-language translation, which does not take into account the specific features of Brazilian Portuguese and the specific language challenges of COVID-19, is likely to be inadequate, thus motivating this work.

2- The paper does not cover all existing datasets on COVID-19. For instance, on page 5, Section 4.1, it is mentioned that "There is no COVID-19 dataset for Portuguese language". I refer the authors to the following multilingual dataset, which comprises articles in six different languages, i.e., English, Spanish, Portuguese, Hindi, French and Italian.

MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation

Answer: In addition to the explanation in Response No. 1 above, it is important to highlight our focus on COVID-19 fake news that circulated specifically in Brazil; other datasets may not only be based on European Portuguese but also lack the specific context of COVID-19 in Brazil, and thus may be inadequate for machine classification. For example, MM-COVID appears to use both European Portuguese and Brazilian Portuguese interchangeably. Notwithstanding this, we have included additional text regarding MM-COVID as well as two additional datasets, FakeWhatsApp.BR and COVID19.BR, in the related works section.

3- The dataset is highly imbalanced; the effect of class imbalance is not studied and no solution is suggested. For instance, you can refer to the following paper, where the authors leverage an augmentation technique to tackle the class imbalance problem:

NewsBag: A Multimodal Benchmark Dataset for Fake News Detection

Answer: We explain in the text: "We applied a random undersampling technique to balance the dataset where the largest class is randomly trimmed until it is the same size as the smallest class. The final dataset used to train and test the models comprises 1,047 fake news items and 1,047 true news items, totaling 2,094 items."
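For illustration, a minimal sketch of this kind of random undersampling is shown below, assuming a pandas DataFrame with hypothetical file and column names ("text", "label"); the actual schema and preprocessing pipeline used in the paper may differ.

```python
import pandas as pd

# Hypothetical file and column names, used only to illustrate the technique.
df = pd.read_csv("covid19_news_ptbr.csv")  # columns: "text", "label" in {"fake", "true"}

fake = df[df["label"] == "fake"]
true = df[df["label"] == "true"]

# Randomly trim the majority (true news) class to the size of the minority class.
true_downsampled = true.sample(n=len(fake), random_state=42)

# Recombine and shuffle; with 1,047 fake items this yields 2,094 items in total.
balanced = pd.concat([fake, true_downsampled]).sample(frac=1, random_state=42)
print(balanced["label"].value_counts())
```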

4- The real news set seems to be highly biased towards the G1 outlet, especially considering the classification results reported in the paper. For the fake class, it is only mentioned that the articles were gathered from boatos.org, and the number of domains is not mentioned. One possible reason for the high accuracy of the ML/DL models might be that the models easily learn the language structure of the G1 outlet.

Answer: While boatos.org does not provide the domains from which the fake news items were collected, all referenced items are confirmed as real articles that circulated widely in Brazil.

5- In the experimental part, especially for the deep learning techniques, ROC curves are not reported, and it is not clear whether the reported results are overfitted.

Answer: We have plotted the ROC curves and AUC results for all deep learning and machine learning models. Please see pages 15 and 16 (Figures 8 and 9).
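As an aside, a minimal sketch of how a ROC curve and AUC can be computed with scikit-learn is shown below; the labels and scores are synthetic stand-ins, not the paper's actual model outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Synthetic stand-ins for ground-truth labels and predicted probabilities of the
# positive (fake news) class; in practice these come from a trained classifier.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=200), 0, 1)

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"model (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```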

6- The exploratory part, especially the examples on page 8, is somewhat confusing. This part needs to be presented better. There are also typos in the text that need to be edited, e.g., on page 3, paragraph 4, the citation is broken.

Answer: Thank you. We have rewritten much of the text and addressed typographical and grammatical errors, as well as broken citation links.

Reviewer 2 Report

The paper provides a dataset of fake news in Portuguese, an exploratory data analysis of Brazilian fake news on COVID-19, and a comparison of machine learning and deep learning models to detect COVID-19 fake news in Brazilian Portuguese.

In my opinion, the quality of the paper needs some improvements. 

First of all, it is necessary to clearly and precisely state what makes this work relevant and original.

The discussion of related works should be improved since it does not allow the degree of originality of the work to be evaluated. The authors should focus on their specific original contribution and improve the discussion by characterizing each work, its benefits and drawbacks, and comparing it with the closest state-of-the-art works in order to justify the necessity of the work proposed in the paper.

In particular, the paper needs to be improved by adding a discussion on available datasets for fake news in Portuguese as well as a discussion on the main characteristics of the available datasets. Some examples are:

  • Martins, A.D., Cabral, L., Mourão, P.J., de Sá, I.C., Monteiro, J.M., & Machado, J.C. (2021). COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages. Anais do III Dataset Showcase Workshop (DSW 2021).
  • Li, Yichuan & Jiang, Bohan & Shu, Kai & Liu, Huan. (2020). MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News.
  • D’Ulizia A, Caschera MC, Ferri F, Grifoni P. 2021. Fake news detection: a survey of evaluation datasets. PeerJ Computer Science 7:e518 https://doi.org/10.7717/peerj-cs.518
  • Taichi Murayama. 2021. Dataset of Fake News Detection and Fact Verification: A Survey. arXiv:2111.03299

In addition, the paper could be improved by adding a comparison with other works on detection of Brazilian fake news, as for example in “M. Paixão, R. Lima and B. Espinasse, "Fake News Classification and Topic Modeling in Brazilian Portuguese," 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2020, pp. 427-432, doi: 10.1109/WIIAT50758.2020.00063.”

Finally, to improve the readability of the paper, the description of the evaluation process should be improved.

Author Response

The paper provides a dataset of fake news in Portuguese, an exploratory data analysis of Brazilian fake news on COVID-19, and a comparison of machine learning and deep learning models to detect COVID-19 fake news in Brazilian Portuguese.

In my opinion, the quality of the paper needs some improvements. 

First of all, it is necessary to clearly and precisely state what makes this work relevant and original.

Answer: This article makes a number of contributions. Firstly, we provide a dataset composed of 11,382 articles in the Portuguese language comprising 10,285 articles labelled 'true news' and 1,047 articles labelled 'fake news' in relation to COVID-19. Secondly, we present an exploratory data analysis on COVID-19 fake news that circulated in Brazil. Thirdly, we propose and compare machine learning and deep learning models to detect COVID-19 fake news in the Portuguese language, and analyse the impact of removing stop-words from the messages.

The discussion of related works should be improved since it does not allow the degree of originality of the work to be evaluated. The authors should focus on their specific original contribution and improve the discussion by characterizing each work, its benefits and drawbacks, and comparing it with the closest state-of-the-art works in order to justify the necessity of the work proposed in the paper.

In particular, the paper needs to be improved by adding a discussion on available datasets for fake news in Portuguese as well as a discussion on the main characteristics of the available datasets. 

Answer: Thank you for your comment. Firstly, we have added supplemental text on the motivation for a new Brazil-specific fake news dataset in the Introduction. Secondly, we have added additional references to other datasets, e.g., MM-COVID, FakeWhatsApp.BR and COVID19.BR, in the Related Works section. These were not available at the initial time of writing. We also discuss the research performed using the FakeWhatsApp.BR and COVID19.BR datasets in Cabral et al. (2021), Martins et al. (2021a) and Martins et al. (2021b). Finally, we highlight the specific differences between our dataset and study and these prior works in the Related Works section.

Some examples are:

Martins, A.D., Cabral, L., Mourão, P.J., de Sá, I.C., Monteiro, J.M., & Machado, J.C. (2021). COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages. Anais do III Dataset Showcase Workshop (DSW 2021).

Answer: See above. This has been added.

Li, Yichuan & Jiang, Bohan & Shu, Kai & Liu, Huan. (2020). MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News.

Answer: See above. This has been added.

D’Ulizia A, Caschera MC, Ferri F, Grifoni P. 2021. Fake news detection: a survey of evaluation datasets. PeerJ Computer Science 7:e518 https://doi.org/10.7717/peerj-cs.518

Answer: D'Ulizia et al. present a survey describing 27 datasets for fake news detection, only one of which contains fake news in Portuguese. Moreover, that dataset is not focused on COVID-19.

Taichi Murayama. 2021. Dataset of Fake News Detection and Fact Verification: A Survey. arXiv:2111.03299

Answer: Murayama presents a survey on fake news, and the Portuguese dataset he describes is exactly the dataset we have made available. See the reference he used:

[73] Patricia Takako Endo, Gleyson Rhuan Nascimento Campos, Maria Eduarda de Lima Xavier, Kayo Henrique Carvalho Monteiro, Maicon Herverton Lino Ferreira da Silva Barros, Ivanovitch Silva, and Breno Santos. 2021. COVID-19RUMOR: a classified data set of COVID-19 related online rumors in Brazilian Portuguese. doi: 10.17632/pz2j957rzc.2.

In addition, the paper could be improved by adding a comparison with other works on detection of Brazilian fake news, as for example in “M. Paixão, R. Lima and B. Espinasse, "Fake News Classification and Topic Modeling in Brazilian Portuguese," 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2020, pp. 427-432, doi: 10.1109/WIIAT50758.2020.00063.”

Answer: The suggested work presents a study of fake news classification, but with a different dataset covering different contexts (Politics, TV Shows, Daily News, Technology, Economy, and Religion), which directly impacts the performance of the models. Thus, it is difficult to make a direct comparison between the two works. For this reason, we reproduced their CNN model as described in their work and tested it with our dataset for discussion purposes. Please see page 14 and Tables 4 and 7.
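For illustration, a generic 1D-convolutional text classifier of the kind reproduced for this comparison might look like the sketch below, assuming Keras; the architecture and hyperparameters of Paixão et al. (2020) are not reproduced here, so all values are illustrative assumptions.

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters only; they are not taken from Paixão et al. (2020).
VOCAB_SIZE, EMBED_DIM = 20_000, 100

cnn = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # n-gram-like filters
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: fake vs. true
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# cnn.fit(X_train, y_train, ...) would then be trained on padded token-ID sequences.
```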

Finally, to improve the readability of the paper, the description of the evaluation process should be improved.

Answer: We describe the metrics used for evaluation in Subsection 5.1 and have improved the description of the experimental methodology in Subsection 5.2.

Reviewer 3 Report

To combat the COVID-19 pandemic, people have expanded their use of the Internet to obtain health information more efficiently, but the resulting spread of misinformation in the vast flood of information has caused confusion among people. Misinformation, disinformation, and misinterpretation of health information can cause harm by obstructing people's attempts to suppress the virus and by delaying or interrupting legitimate medical care.

With this background, in this paper the authors address the task of fake news detection for the Portuguese language. Specifically, the authors construct a dataset of 11,382 message texts written in Portuguese about the COVID-19 pandemic. This constructed text dataset can be an effective resource for detecting fake news related to COVID-19 in Portuguese. The authors also evaluate binary classification with existing machine learning algorithms, which can serve as a guideline for accomplishing similar tasks; the paper thus contains limited but partially useful content.

 

  1. The evaluation using several machine learning algorithms shows significant results. However, there seems to be little originality in the proposed method. Some differences should be emphasized, such as problems specific to the Portuguese language, trends in opinions based on the nationality of the contributors, or the fact that translating and using an English dataset is not a viable approach.
  2. The survey of previous studies is well done, but there are some areas where the literature is not cited. (Line 138)
  3. The period at the end of the sentence is missing (line 147).
  4. In the experiments, SVM, Naive Bayes, Random Forest, RNNs, and bidirectional RNNs are used and their performances are compared. The authors do not give specific reasons why the bi-GRU performs best in the evaluation on raw text. It would also be better to discuss what kinds of fake news are misclassified by showing concrete examples.
  5. Since this task is basically a binary classification, it is not sufficient to simply obtain the feature values of words when classifying the text of fake news alone; it is necessary to examine the interpretability of the classification by using methods such as feature selection and attention weighting. In particular, fake news can be generated by humans or by artificial intelligence technologies such as GPT-3, which can generate sophisticated sentences mechanically. In order to discriminate them effectively, other deep learning models such as TextGAN should be considered. In future work, a more in-depth discussion, citing recent literature on deep learning-based natural language text classification and visualization techniques for discrimination evidence, would make for a better paper.
  6. It would be good to discuss fake news that attracts more people, for example in terms of the degree of diffusion. In particular, if we can predict which fake news people will pay attention to, it would be useful to propose an algorithm that can identify it as fake news in many situations. Simply being able to determine that something seems to be fake news will soon become a useless model once fake news is disguised by AI. If an event that is not actually happening now, but could happen in the future, is described as fake news, will it be identified as fake news? These issues should also be considered as problems to be solved.
  7. The analysis of the data is interesting. However, it should be shown in detail whether there is any difference between Portuguese and English (or any other language) in the tendency of what is generally called fake news. An in-depth analysis of the content, which is not possible with statistical data alone, would be important.
  8. The position of this study, the differences from other studies, the differences in methods, etc., are in many cases easier to understand if they are explained visually using diagrams.

Author Response

To combat the COVID-19 pandemic, people have expanded their use of the Internet to obtain health information more efficiently, but the resulting spread of misinformation in the vast flood of information has caused confusion among people. Misinformation, disinformation, and misinterpretation of health information can cause harm by obstructing people's attempts to suppress the virus and by delaying or interrupting legitimate medical care.

With this background, in this paper the authors address the task of fake news detection for the Portuguese language. Specifically, the authors construct a dataset of 11,382 message texts written in Portuguese about the COVID-19 pandemic. This constructed text dataset can be an effective resource for detecting fake news related to COVID-19 in Portuguese. The authors also evaluate binary classification with existing machine learning algorithms, which can serve as a guideline for accomplishing similar tasks; the paper thus contains limited but partially useful content.

Answer: Thank you for the comments and suggestions to improve our work. We have revised the entire manuscript.

The evaluation using several machine learning algorithms shows significant results. However, there seems to be little originality in the proposed method. Some differences should be emphasized, such as problems specific to the Portuguese language, trends in opinions based on the nationality of the contributors, or the fact that translating and using an English dataset is not a viable approach.

Answer: Thank you for the suggestion. We have added content about it in Section 1.

The survey of previous studies is well done, but there are some areas where the literature is not cited. (Line 138)

Answer:  We have reviewed the text and believe appropriate references have been included throughout.

The period at the end of the sentence is missing (line 147).

Answer: Thank you. Done.

In the experiments, SVM, Naive Bayes, Random Forest, RNNs, and bidirectional RNNs are used and their performances are compared. The authors do not give specific reasons why the bi-GRU performs best in the evaluation on raw text. It would also be better to discuss what kinds of fake news are misclassified by showing concrete examples.

Answer: As our dataset does not classify the types of fake news, we cannot discuss this quantitatively; it is described as future work. Nevertheless, we have added some examples of fake news items that were misclassified.
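For context, a minimal sketch of the kind of bidirectional GRU (bi-GRU) classifier compared in the paper is shown below, assuming Keras; the hyperparameters are illustrative assumptions and not the values used in the experiments.

```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters; the paper's exact settings are not reproduced here.
VOCAB_SIZE, EMBED_DIM = 20_000, 100

bigru = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Bidirectional(layers.GRU(64)),   # reads the token sequence in both directions
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: fake vs. true
])
bigru.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# bigru.fit(X_train, y_train, ...) would then be trained on padded token-ID sequences.
```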

Since this task is basically a binary classification, it is not sufficient to simply obtain the feature values of words when classifying the text of fake news alone; it is necessary to examine the interpretability of the classification by using methods such as feature selection and attention weighting. In particular, fake news can be generated by humans or by artificial intelligence technologies such as GPT-3, which can generate sophisticated sentences mechanically. In order to discriminate them effectively, other deep learning models such as TextGAN should be considered. In future work, a more in-depth discussion, citing recent literature on deep learning-based natural language text classification and visualization techniques for discrimination evidence, would make for a better paper.

Answer: Thank you. We have improved the description of future works by adding your suggestions. "Furthermore, future research should consider the diffusion and impact of fake news and explore both the extent of propagation, the type of engagement, and actors involved including the use of bots. The detection of new types of fake news, particularly in a public health context, can inform public health responses but also optimise platform moderation systems. To this end, research on the use of new transformer-based deep learning architectures such as BERT and GPT-3 may prove fruitful."

It would be good to discuss fake news that attracts more people, for example in terms of the degree of diffusion. In particular, if we can predict which fake news people will pay attention to, it would be useful to propose an algorithm that can identify it as fake news in many situations. Simply being able to determine that something seems to be fake news will soon become a useless model once fake news is disguised by AI. If an event that is not actually happening now, but could happen in the future, is described as fake news, will it be identified as fake news? These issues should also be considered as problems to be solved.

Answer: Thank you for the suggestion. Analysing the degree of message diffusion would be possible if this information were collected, most likely from social networks such as Twitter. In our case, we do not have this information. As we agree it is a relevant issue that should be considered, we have added it to our future work.

The analysis of the data is interesting. However, it should be shown in detail whether there is any difference between Portuguese and English (or any other language) in the tendency of what is generally called fake news. An in-depth analysis of the content, which is not possible with statistical data alone, would be important.

Answer: We agree that identifying differences between Portuguese and English fake news would be interesting, but it is outside our main goal since we do not analyse the language structure of fake news.

The position of this study, the differences from other studies, the differences in methods, etc., are in many cases easier to understand if they are explained visually using diagrams.

Answer: We have revised the related works section.

Reviewer 4 Report

Overall this piece of work explores fake news detection in Brazil, specifically with regards to the ongoing COVID-19 pandemic. The authors correctly identify and describe the challenges that are ongoing in that country with regard to the spread of misinformation (government and other actors involved).

The largest contribution appears to be the curation of a new dataset for fake news analysis from Brazilian/Portuguese sources. It is hoped this dataset will be shared and, as suggested in future work, expanded upon.

The approach taken appears to be similar to that of other researchers in the area (not much novelty or originality in the approach), aside from the introduction of their own dataset and splitting the experiments into two components: one with linguistic pre-processing (stop-word removal, lowercasing, etc.) and one using the text as is (raw).

The analysis and examination/examples of the dataset were useful and interesting. One component that should be added, for comparison purposes, is the analysis of the non-fake news dataset. The tables/figures/descriptives of the data only focused on the fake news components and, thus, it is impossible to see whether there are any cursory or interesting differences between the two datasets (Table 1).

The evaluation metrics were fine and typical for work of this nature. One troubling component was the description of the experiments themselves (page 14, 2nd paragraph in the "Experiments" section). The pre-processing is described in a few steps and then it is stated that a matrix is created for the token counts (I interpreted this as a typical word-by-document matrix in the bag-of-words sense). The last sentence says "Vectorization was limited to preprocessing", which suggests the BoW model is only used for the pre-processed dataset, while the other dataset is used "as is" in order. The BoW vs. "raw" dataset in its original order is itself a variable that should be mentioned (if I am interpreting this correctly) and opens the door to exploring that relationship too. This must be made much clearer and any implications explored.

Outside of this I think it is a topical piece of work that helps expand the knowledge of fake news detection slightly. I am pleased that it was identified and stressed that this work looks at characteristics of fake news rather than an actual analysis of the content (much work in this area does not mention that at all; that is, it is not the truthfulness of the text that is analysed, but the features that tend to occur in articles of this nature).

Minor corrections/typos are highlighted (along with a few comments) in the attached reviewed PDF.

Comments for author File: Comments.pdf

Author Response

Overall this piece of work explores fake news detection in Brazil, specifically with regards to the ongoing COVID-19 pandemic. The authors correctly identify and describe the challenges that are ongoing in that country with regard to the spread of misinformation (government and other actors involved). The largest contribution appears to be the curation of a new dataset for fake news analysis from Brazilian/Portuguese sources. It is hoped this dataset will be shared and, as suggested in future work, expanded upon.

Answer: Thank you for the review and comments to improve the quality of our work. We have revised the manuscript.

The approach taken appears to be similar to that of other researchers in the area (not much novelty or originality in the approach), aside from the introduction of their own dataset and splitting the experiments into two components: one with linguistic pre-processing (stop-word removal, lowercasing, etc.) and one using the text as is (raw).

The analysis and examination/examples of the dataset were useful and interesting. One component that should be added, for comparison purposes, is the analysis of the non-fake news dataset. The tables/figures/descriptives of the data only focused on the fake news components and, thus, it is impossible to see whether there are any cursory or interesting differences between the two datasets (Table 1).

Answer: Firstly, the dataset is novel compared to other Brazilian Portuguese datasets both in the source and type of content and in the time period covered. We have added supplemental text in Related Works to highlight this. Secondly, previous Brazilian Portuguese studies neither perform an exploratory data analysis, which provides context, nor use and compare the same ML or DL techniques, e.g., GRU and both unidirectional and bidirectional DL architectures. We have added supplemental text to highlight this.

The evaluation metrics were fine and typical for work of this nature. One troubling component was the description of the experiments themselves (page 14, 2nd paragraph in the "Experiments" section). The pre-processing is described in a few steps and then it is stated that a matrix is created for the token counts (I interpreted this as a typical word-by-document matrix in the bag-of-words sense). The last sentence says "Vectorization was limited to preprocessing", which suggests the BoW model is only used for the pre-processed dataset, while the other dataset is used "as is" in order. The BoW vs. "raw" dataset in its original order is itself a variable that should be mentioned (if I am interpreting this correctly) and opens the door to exploring that relationship too. This must be made much clearer and any implications explored.

Answer: We apologise for the mistake; there was an error in the text. We applied vectorization in both experiments, as it is necessary to convert the text into a numerical matrix to pass as input to the models. The only difference between the experiments is the removal of stop-words and the conversion of the text to lowercase. We have corrected the text.
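For illustration, a minimal sketch of this two-pipeline setup is shown below, assuming scikit-learn's CountVectorizer and NLTK's Portuguese stop-word list; the library choices, stop-word list, and example sentences are assumptions, not the exact pipeline used in the paper.

```python
import nltk
from sklearn.feature_extraction.text import CountVectorizer

# Portuguese stop-word list (assumed; the paper may use a different list).
nltk.download("stopwords", quiet=True)
stopwords_pt = nltk.corpus.stopwords.words("portuguese")

texts = [
    "A vacina NÃO causa a doença",          # toy examples for illustration only
    "Nova variante detectada em São Paulo",
]

# Experiment 1: pre-processed text (lowercasing + stop-word removal), then vectorized.
vec_pre = CountVectorizer(lowercase=True, stop_words=stopwords_pt)
X_pre = vec_pre.fit_transform(texts)

# Experiment 2: "raw" text, still vectorized into token counts, but without those steps.
vec_raw = CountVectorizer(lowercase=False)
X_raw = vec_raw.fit_transform(texts)

print(X_pre.shape, X_raw.shape)  # both are document-by-token count matrices
```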

Outside of this I think it is a topical piece of work that helps expand the knowledge of fake news detection slightly. I am pleased that it was identified and stressed that this work looks at characteristics of fake news rather than an actual analysis of the content (much work in this area does not mention that at all; that is, it is not the truthfulness of the text that is analysed, but the features that tend to occur in articles of this nature).

Answer: Thank you very much for your comments.

Minor corrections/typos are highlighted (along with a few comments) in the attached reviewed PDF.

Answer: We have fixed all of them.

Round 2

Reviewer 2 Report

The revised version of the paper has been improved according to the suggested comments.

Reviewer 3 Report

The authors have correctly revised the paper in accordance with the comments.
The authors provide several examples of data used in their experiments to more clearly and specifically describe the contribution, novelty, and originality of this study.
The authors have also added some references necessary to keep up with the latest developments.
This has improved the credibility and completeness of the paper in many ways.
Therefore, I agree to accept their revised paper.
