Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. Data Source
3.2. Data Processing and Labeling
3.3. Bidirectional Long Short-Term Memory Networks (Bi-LSTM)
3.4. Bidirectional Encoder Representations from Transformers (BERT)
3.5. Support Vector Machine (SVM) and Naïve Bayes (NB) Classifier
3.6. Metrics for Evaluating Performance
4. Results
5. Discussion
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AUC | Area under the receiver operating characteristic curve |
BERT | Bidirectional encoder representations from transformers |
Bi-LSTM | Bidirectional long short-term memory networks |
ELMo | Embeddings from language models |
GPT | Generative pre-training |
NB | Naïve Bayes |
RNN | Recurrent neural networks |
SVM | Support vector machine |
Learning Rate | Epoch | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|---|
0.00003 | 10 | 77.2% | 26.2% | 74.9% | 38.9% | 84.2% |
0.00003 | 20 | 78.7% | 28.1% | 76.9% | 41.1% | 85.1% |
0.00003 | 30 | 79.8% | 28.7% | 73.3% | 41.2% | 86.3% |
0.00003 | 40 | 82.3% | 31.2% | 68.6% | 42.9% | 87.1% |
0.00003 | 50 | 82.3% | 31.3% | 69.3% | 43.2% | 87.6% |
0.00003 | 60 | 82.7% | 31.8% | 68.6% | 43.4% | 87.7% |
0.00003 | 70 | 81.3% | 30.1% | 70.6% | 42.2% | 87.6% |
0.00003 | 80 | 82.5% | 31.9% | 71.3% | 44.0% | 88.0% |
0.0001 | 10 | 80.7% | 29.9% | 73.9% | 42.6% | 87.2% |
0.0001 | 20 | 80.5% | 29.9% | 75.6% | 42.8% | 88.3% |
0.0001 | 30 | 80.8% | 30.1% | 74.9% | 43.0% | 88.0% |
0.0001 | 40 | 85.8% | 37.0% | 66.3% | 47.5% | 88.2% |
0.0001 | 50 | 89.7% | 47.2% | 53.1% | 50.0% | 86.2% |
0.0001 | 60 | 88.4% | 43.2% | 64.4% | 51.7% | 87.9% |
0.0001 | 70 | 86.1% | 37.4% | 64.0% | 47.2% | 87.3% |
0.0001 | 80 | 88.8% | 43.9% | 54.1% | 48.4% | 85.0% |
0.001 | 10 | 84.5% | 35.2% | 71.3% | 47.1% | 88.1% |
0.001 | 20 | 84.7% | 35.1% | 68.0% | 46.3% | 86.9% |
0.001 | 30 | 90.0% | 47.9% | 41.6% | 44.5% | 83.8% |
0.001 | 40 | 89.1% | 44.6% | 52.1% | 48.1% | 80.4% |
0.001 | 50 | 90.7% | 53.3% | 34.3% | 41.8% | 74.5% |
0.001 | 60 | 89.6% | 46.2% | 42.6% | 44.3% | 78.3% |
0.001 | 70 | 88.6% | 42.1% | 47.2% | 44.5% | 79.4% |
0.001 | 80 | 88.7% | 42.5% | 46.5% | 44.4% | 77.5% |
Learning Rate | Epoch | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|---|
0.00003 | 1 | 91.7% | 92.5% | 98.8% | 95.5% | 90.7% |
0.00003 | 2 | 91.8% | 94.3% | 96.8% | 95.5% | 91.5% |
0.00003 | 3 | 92.2% | 94.6% | 96.8% | 95.7% | 86.5% |
0.00003 | 4 | 92.1% | 94.5% | 96.9% | 95.7% | 83.7% |
0.00003 | 5 | 91.7% | 94.5% | 96.4% | 95.4% | 79.8% |
0.0001 | 1 | 92.1% | 93.5% | 98.0% | 95.7% | 91.0% |
0.0001 | 2 | 92.0% | 94.1% | 97.2% | 95.6% | 91.4% |
0.0001 | 3 | 92.5% | 94.5% | 97.3% | 95.9% | 90.8% |
0.0001 | 4 | 92.1% | 94.5% | 96.9% | 95.7% | 84.6% |
0.0001 | 5 | 92.0% | 94.4% | 96.9% | 95.6% | 82.1% |
Model | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
SVM-linear | 91.7% | 20.5% | 75.6% | 32.2% | 83.9% |
SVM-poly | 90.8% | 9.2% | 66.7% | 16.2% | 78.9% |
SVM-rbf | 91.1% | 12.5% | 74.5% | 21.5% | 83.0% |
SVM-sigmoid | 91.5% | 17.5% | 75.7% | 28.4% | 83.8% |
Complement NB | 88.8% | 25.4% | 38.1% | 30.5% | 65.2% |
Multinomial NB | 90.6% | 5.6% | 68.0% | 10.4% | 79.4% |
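
The accuracy, precision, recall and F1 columns in the table above are all derived from the same four confusion-matrix counts. As a reminder of how they relate, here is a minimal Python sketch. The function name `classification_metrics` and the counts in the usage example are hypothetical illustrations, not values from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp, fp, fn and tn are counts relative to the positive class. Any ratio
    whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (not from the study): 100 positives among 1000 examples.
example = classification_metrics(tp=20, fp=10, fn=80, tn=890)
```

With these made-up counts, accuracy is 0.91 while recall is only 0.20, which illustrates why accuracy is usually read together with the other columns when one class is rare.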
Model | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
Bi-LSTM-128, learning rate = 0.0001, epoch = 50 | 89.8% | 44.0% | 47.2% | 45.5% | 85.8% |
BERT, learning rate = 0.0001, epoch = 3 | 91.6% | 93.4% | 97.6% | 95.5% | 84.7% |
SVM-linear | 92.3% | 19.5% | 78.6% | 31.2% | 85.6% |
Complement NB | 88.8% | 23.0% | 32.8% | 27.1% | 62.7% |
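
Because F1 is the harmonic mean of precision and recall, the F1 column in the table above can be cross-checked against the other two columns. The following is a minimal sketch of that check in Python. The helper name `f1_from_precision_recall` and the tolerance are assumptions made for illustration (the inputs are already rounded to one decimal place, so small differences are expected).

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
rows = {
    "Bi-LSTM-128": (44.0, 47.2, 45.5),
    "BERT": (93.4, 97.6, 95.5),
    "SVM-linear": (19.5, 78.6, 31.2),
    "Complement NB": (23.0, 32.8, 27.1),
}

# Assumed tolerance in percentage points, to absorb rounding of the inputs.
TOLERANCE = 0.15

# True for each model whose recomputed F1 agrees with the reported value.
checks = {
    name: abs(f1_from_precision_recall(p, r) - reported) <= TOLERANCE
    for name, (p, r, reported) in rows.items()
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The same helper can be applied row by row to the other result tables to spot transcription errors.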
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
To, Q.G.; To, K.G.; Huynh, V.-A.N.; Nguyen, N.T.Q.; Ngo, D.T.N.; Alley, S.J.; Tran, A.N.Q.; Tran, A.N.P.; Pham, N.T.T.; Bui, T.X.; et al. Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic. Int. J. Environ. Res. Public Health 2021, 18, 4069. https://doi.org/10.3390/ijerph18084069
APA Style: To, Q. G., To, K. G., Huynh, V.-A. N., Nguyen, N. T. Q., Ngo, D. T. N., Alley, S. J., Tran, A. N. Q., Tran, A. N. P., Pham, N. T. T., Bui, T. X., & Vandelanotte, C. (2021). Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic. International Journal of Environmental Research and Public Health, 18(8), 4069. https://doi.org/10.3390/ijerph18084069