Stance Analysis of Distance Education in the Kingdom of Saudi Arabia during the COVID-19 Pandemic Using Arabic Twitter Data
Abstract
1. Introduction
- to collect and annotate Arabic tweets regarding distance learning in the KSA;
- to train and test several classical and deep-learning models in the detection and classification of stances;
- to evaluate the models and identify the best-performing one; and
- to use the best-performing model to classify the tweets and analyze trends in the public stance towards distance learning in Saudi Arabia during the 2020/2021 academic year in connection with six major events.
2. Related Work
3. Methodology
3.1. Data Collection Stage
3.2. Stance Learning Stage
3.2.1. First Step: Data Labeling
3.2.2. Second Step: Data Pre-Processing
- Unwanted characters were removed from the tweets, such as links, emojis, special characters (#, %, &), Arabic diacritics, punctuation marks, and numbers.
- Tweets written in a language other than Arabic were deleted.
- Text correction was performed using the TextBlob library in Python [31].
- I normalized the Arabic text as follows:
- إ, أ and آ were replaced with ئ; ا was replaced with ى; ا was replaced with ة; ي was replaced with ؤ; ه was replaced with و; and كـ was replaced with ك.
- Duplicate characters were removed, as in جمييييييل, مرررره, and راااااائع.
- Arabic stop words were removed, such as على, في, من, and الى.
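The cleaning steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the author's pipeline: the regular expressions, the threshold of three repeated characters, the unification of hamza-carrying alef forms to a bare alef, and the four-word stop list are all assumptions made for the example. Language filtering and spelling correction are left out.

```python
import re

# Arabic diacritics (tashkeel) plus the tatweel elongation mark.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Links starting with http(s):// or www.
URLS = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not an Arabic letter or whitespace: emojis, digits,
# punctuation, special characters, and Latin text.
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")
# Assumed convention: alef with madda / hamza above / hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# Any character repeated three or more times in a row (assumed threshold).
REPEATS = re.compile(r"(.)\1{2,}")

# Illustrative stop list: only the four words named in the text.
STOP_WORDS = {"على", "في", "من", "الى"}


def preprocess(tweet: str) -> str:
    """Clean one tweet and return the remaining tokens joined by spaces."""
    text = URLS.sub(" ", tweet)
    text = DIACRITICS.sub("", text)       # drop marks without splitting words
    text = NON_ARABIC.sub(" ", text)      # drop everything non-Arabic
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef (assumption)
    text = REPEATS.sub(r"\1", text)       # collapse elongated characters
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("التعليم عن بعد راااااائع https://t.co/abc 123!"))
```

Diacritics are removed before the non-Arabic filter so that a word carrying marks is not split into pieces, and normalization runs before stop-word removal so that spelling variants of a stop word are matched as well.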
3.2.3. Third Step: Feature Extraction
Space Vector Based on TF–IDF
Word Embedding
3.2.4. Fourth Step: Stance-Classification Models
3.2.5. Fifth Step: Model-Performance Evaluation
- A true positive (TP) is the number of real In-favor tweets classified as In-favor.
- A false positive (FP) is the number of real Against tweets classified incorrectly as In-favor.
- A false negative (FN) is the number of real In-favor tweets incorrectly classified as Against.
- A true negative (TN) is the number of Against tweets correctly classified as Against.
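Given these four counts, precision, recall, and the F-measure follow from their standard definitions. The sketch below assumes those textbook formulas with In-favor as the positive class; the class name `ConfusionCounts` and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts with In-favor treated as the positive class."""
    tp: int  # In-favor tweets classified as In-favor
    fp: int  # Against tweets classified as In-favor
    fn: int  # In-favor tweets classified as Against
    tn: int  # Against tweets classified as Against


def precision(c: ConfusionCounts) -> float:
    """Share of predicted In-favor tweets that are truly In-favor."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of truly In-favor tweets that were predicted In-favor."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def f_measure(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    counts = ConfusionCounts(tp=80, fp=20, fn=10, tn=90)  # made-up counts
    print(precision(counts), recall(counts), f_measure(counts))
```

The zero checks only guard against division by zero when a classifier never predicts the positive class.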
3.3. Stance Detection Analysis Stage
4. Experiments and Results
4.1. The Results of Classical Machine-Learning Algorithms
- Different numbers of features, including all generated features, 1500, 2000, 3000, and 5000.
- Different n-gram combinations for the string vectorizer including unigrams (1-1), unigrams + bigrams (1-2), unigrams + bigrams + trigrams (1-3), bigrams (2-2), bigrams + trigrams (2-3), and trigrams (3-3).
- Different values for the document frequency threshold (maxDF), including 0.5, 0.75, and 1.0.
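To give a sense of the size of this search space, the settings above can be enumerated as a simple grid. The sketch below is illustrative only. The maxDF values 0.5, 0.75, and 1.0 are assumed from the settings reported in the results table, `None` stands for "all generated features", and the key names follow the scikit-learn `TfidfVectorizer` parameters `max_features`, `ngram_range`, and `max_df`. The helper `build_grid` is hypothetical.

```python
from itertools import product

# None means "keep all generated features".
MAX_FEATURES = [None, 1500, 2000, 3000, 5000]
# (min_n, max_n) word n-gram ranges, as listed in the text.
NGRAM_RANGES = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
# Assumed from the settings reported in the results table.
MAX_DF = [0.5, 0.75, 1.0]


def build_grid() -> list[dict]:
    """Return every combination of the vectorizer settings as a dict."""
    return [
        {"max_features": features, "ngram_range": ngrams, "max_df": max_df}
        for features, ngrams, max_df in product(MAX_FEATURES, NGRAM_RANGES, MAX_DF)
    ]


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid))  # 5 * 6 * 3 = 90 vectorizer configurations
    print(grid[0])
```

Each dictionary could be unpacked into a vectorizer constructor, so every classifier is evaluated on 90 feature configurations before its own hyperparameters are varied.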
4.2. The Results of Deep Learning
5. Discussion
6. Analyzing Social Media
6.1. Major Events
- A1: Announcement of the switch to remote learning for the Hijri year 1442 for seven weeks. (https://www.spa.gov.sa/viewfullstory.php?lang=ar&newsid=2120893 (accessed on 15 August 2020)).
- A2: Announcing the continuation of distance learning for the remaining weeks of the first semester. (https://www.spa.gov.sa/viewfullstory.php?lang=ar&newsid=2142888 (accessed on 8 October 2020)).
- A3: Adoption of a mechanism for performing the first semester’s final exams for general education students and administering the grades. (https://www.spa.gov.sa/2147922 (accessed on 24 October 2020)).
- A4: Announcing the continuation of distance learning for the second semester until the tenth week of the semester. (https://www.moe.gov.sa/ar/mediacenter/MOEnews/Pages/PE-2547S.aspx (accessed on 13 January 2021)).
- A5: Distance learning will continue until the end of the school year. (https://www.moe.gov.sa/ar/mediacenter/MOEnews/Pages/ERC1442-23.aspx (accessed on 22 February 2021)).
- A6: Bringing forward the exam period for the second semester to start on the first day of Ramadan for primary school students and the sixth day of Ramadan for middle and secondary school students. (https://www.moe.gov.sa/ar/mediacenter/MOEnews/Pages/th1442-89.aspx (accessed on 29 March 2021)).
6.2. Trend Analysis
7. Conclusions, Limitations, and Future Research
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Al Jazirah Waas. Suspension of Studies in All Public, Private, University and Technical Schools. 2020. Available online: https://www.al-jazirah.com/2020/20200309/ln17.htm (accessed on 20 July 2020).
- Minister of Education. IEN. Available online: http://www.ientv.edu.sa/ (accessed on 10 May 2021).
- Minister of Education. Madrasati Platform. Available online: https://schools.madrasati.sa/ (accessed on 10 May 2021).
- Aljaber, A. E-learning policy in Saudi Arabia: Challenges and successes. Res. Comp. Int. Educ. 2018, 13, 176–194.
- Kantar Group and Affiliates. COVID-19 Barometer: Consumer Attitudes, Media Habits and Expectations. 2021. Available online: https://www.kantar.com/Inspiration/Coronavirus/COVID-19-Barometer-Consumer-attitudes-media-habits-and-expectations (accessed on 16 September 2021).
- Global Media Insight. Saudi Arabia Social Media Statistics 2021. 2021. Available online: https://www.globalmediainsight.com/blog/saudi-arabia-social-media-statistics/ (accessed on 16 September 2021).
- Statista Research Department. Leading Countries Based on Number of Twitter Users as of July 2021 (in Millions). 2021. Available online: https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/ (accessed on 15 September 2021).
- Biber, D.; Finegan, E. Adverbial stance types in English. Discourse Process. 1988, 11, 1–34.
- Khouja, J. Stance prediction and claim verification: An Arabic perspective. arXiv 2020, arXiv:2005.10410.
- Darwish, K.; Stefanov, P.; Aupetit, M.; Nakov, P. Unsupervised user stance detection on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Atlanta, GA, USA, 8–11 June 2020; Volume 14, pp. 141–152.
- Wang, R.; Zhou, D.; Jiang, M.; Si, J.; Yang, Y. A survey on opinion mining: From stance to product aspect. IEEE Access 2019, 7, 41101–41124.
- Oueslati, O.; Cambria, E.; HajHmida, M.B.; Ounelli, H. A review of sentiment analysis research in Arabic language. Future Gener. Comput. Syst. 2020, 112, 408–430.
- Al-Ayyoub, M.; Khamaiseh, A.A.; Jararweh, Y.; Al-Kabi, M.N. A comprehensive survey of Arabic sentiment analysis. Inf. Process. Manag. 2019, 56, 320–342.
- Al-Ayyoub, M.; Essa, S.B.; Alsmadi, I. Lexicon-based sentiment analysis of Arabic tweets. Int. J. Soc. Netw. Min. 2015, 2, 101–114.
- Abuaiadh, D. Dataset for Arabic Document Classification. 2011. Available online: http://diab.edublogs.org/dataset-for-arabic-document-classification (accessed on 15 September 2021).
- Aldayel, H.K.; Azmi, A.M. Arabic tweets sentiment analysis—A hybrid scheme. J. Inf. Sci. 2016, 42, 782–797.
- Baccianella, S.; Esuli, A.; Sebastiani, F. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the LREC, Valletta, Malta, 17–23 May 2010; Volume 10, pp. 2200–2204.
- Al-Twairesh, N.; Al-Khalifa, H.; Alsalman, A.; Al-Ohali, Y. Sentiment analysis of Arabic tweets: Feature engineering and a hybrid approach. arXiv 2018, arXiv:1805.08533.
- Al-Twairesh, N.; Al-Khalifa, H.; Al-Salman, A. Arasenti: Large-scale Twitter-specific Arabic sentiment lexicons. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 697–705.
- AL-Rubaiee, H.S.; Qiu, R.; Alomar, K.; Li, D. Sentiment Analysis of Arabic Tweets in e-Learning. 2016. Available online: https://uobrep.openrepository.com/bitstream/handle/10547/623004/jcssp.2016.553.563.pdf?sequence=2&isAllowed=y (accessed on 18 September 2021).
- Aljarah, I.; Habib, M.; Hijazi, N.; Faris, H.; Qaddoura, R.; Hammo, B.; Abushariah, M.; Alfawareh, M. Intelligent detection of hate speech in Arabic social network: A machine learning approach. J. Inf. Sci. 2020, 47.
- Alhajji, M.; Al Khalifah, A.; Aljubran, M.; Alkhalifah, M. Sentiment Analysis of Tweets in Saudi Arabia Regarding Governmental Preventive Measures to Contain COVID-19. Preprints 2020. Available online: https://www.preprints.org/manuscript/202004.0031/v1 (accessed on 6 September 2020).
- Al Sallab, A.; Hajj, H.; Badaro, G.; Baly, R.; El-Hajj, W.; Shaban, K. Deep-learning models for sentiment analysis in Arabic. In Proceedings of the Second Workshop on Arabic Natural Language Processing, Beijing, China, 30 July 2015; pp. 9–17.
- Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic language sentiment analysis on health services. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 114–118.
- Abuzayed, A.; Elsayed, T. Quick and simple approach for detecting hate speech in Arabic tweets. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 12 May 2020; pp. 109–114.
- Mohaouchane, H.; Mourhir, A.; Nikolov, N.S. Detecting offensive language on Arabic social media using deep learning. In Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 22–25 October 2019; pp. 466–471.
- Heikal, M.; Torki, M.; El-Makky, N. Sentiment analysis of Arabic Tweets using deep learning. Procedia Comput. Sci. 2018, 142, 114–122.
- Nabil, M.; Aly, M.; Atiya, A. ASTD: Arabic sentiment tweets dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2515–2519.
- Aljabri, M.; Chrouf, S.M.; Alzahrani, N.A.; Alghamdi, L.; Alfehaid, R.; Alqarawi, R.; Alhuthayfi, J.; Alduhailan, N. Sentiment Analysis of Arabic Tweets Regarding Distance Learning in Saudi Arabia during the COVID-19 Pandemic. Sensors 2021, 21, 5431.
- Cotfas, L.A.; Delcea, C.; Roxin, I.; Ioanăş, C.; Gherai, D.S.; Tajariol, F. The Longest Month: Analyzing COVID-19 Vaccination Opinions Dynamics from tweets in the Month following the First Vaccine Announcement. IEEE Access 2021, 9, 33203–33223.
- Al-Eidan, R.M.B.; Al-Khalifa, H.S.; Al-Salman, A.S. Measuring the credibility of Arabic text content in Twitter. In Proceedings of the 2010 Fifth International Conference on Digital Information Management (ICDIM), Thunder Bay, ON, Canada, 5–8 July 2010; pp. 285–291.
- Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Comput. Sci. 2017, 117, 256–265.
- Cliche, M. BB_twtr at SemEval-2017 Task 4: Twitter sentiment analysis with CNNs and LSTMs. arXiv 2017, arXiv:1704.06125.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- McCallum, A.; Nigam, K. A comparison of event models for naive bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, USA, 26–27 July 1998; Volume 752, pp. 41–48.
- Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Keras Team. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 6 September 2020).
Reference | Dataset Size | Classification Technique(s) | Feature Extraction Technique(s) | Accuracy |
---|---|---|---|---|
[20] | 1121 | SVM, NB | TF–IDF, N-grams | 73% |
[21] | 1633 | SVM, NB, DT, RF | BoW, TF, TF–IDF | 91% |
[22] | 53,000 | NB | unigram | 89% |
[23] | 3795 | RAE, DBN, DAE | BoW | 73% |
[24] | 2026 | CNN, SVM, NB | unigram, bigrams, TF–IDF | 91% |
[25] | 8000 | CNN, LSTM, Bi-LSTM, RF, SVM | AraVec, TF–IDF | 73% |
[26] | 15,050 | CNN, Bi-LSTM, CNN-LSTM | AraVec | 84% |
[27] | 10,000 | CNN-LSTM | AraVec | 64% |
[28] | 10,000 | MND, BNB, SVM | TF–IDF | 69% |
[29] | 14,000 | LR, NB, KNN, XGB, SVM, RF | N-gram, TF–IDF | 89% |
Keyword | English Translation |
---|---|
التعليم عن بعد | Distance Education |
الدراسة عن بعد | Distance Studying |
منصة مدرستي | My School Platform |
نعود بحذر | Back with Caution |
التباعد الاجتماعي | Social Distancing |
الدراسة مستمرة | Continuous Study |
كورونا | Corona |
كوفيد١٩ | COVID 19 |
جائحة | Pandemic |
الفصل الدراسي الاول | First Semester |
الفصل الدراسي الثاني | Second Semester |
الاختبارات | Exams |
Year | Month | Tweets | Cleaned |
---|---|---|---|
2020 | August | 110,439 | 2059 |
2020 | September | 41,785 | 22,028 |
2020 | October | 92,564 | 61,079 |
2020 | November | 162,739 | 119,078 |
2020 | December | 850,739 | 78,465 |
2021 | January | 99,489 | 78,369 |
2021 | February | 58,035 | 34,873 |
2021 | March | 430,495 | 68,459 |
Total | | 1,846,285 | 464,410 |
Class | Total Number | Percentage |
---|---|---|
In-favor | 3063 | 70.45% |
Against | 1312 | 30.17% |
Total | 4348 | 100.00% |
Label | Language | Tweet Example |
---|---|---|
In-favor | AR | الخيار واضح لكل عاقل التعليم عن بعد طبعا سلامة الابناء اهم ووسائل التعليم بالمنزل متاحه الحمد لله |
In-favor | EN | The choice is clear to every sane person: distance education, of course. The safety of the children is the most important. Thank God education at home is available. |
Against | AR | إضافة إلى عدم تطوره الاجتماعي والعاطفي بشكل اعتيادي، يؤثر التعليم عن بعد على مسار الطفل الأكاديمي أيضاً، وهو ما يؤكده الخبراء كونه أمر لا يستهان به |
Against | EN | In addition to his lack of normal social and emotional development, distance education affects the child’s academic path as well, which is confirmed by experts as a matter to be reckoned with. |
Classifier | Settings | Precision | Recall | F-Measure | AUC Score |
---|---|---|---|---|---|
RF | n_estimators = 32, maxDF = 1.0, max_features = 5000, ngram_range = (1, 1) | 0.884 | 0.837 | 0.855 | 0.939 |
SVC | C = 1, kernel = rbf, maxDF = 0.75, max_features = 3000, ngram_range = (1, 1) | 0.892 | 0.838 | 0.859 | 0.951 |
AdaBoost | n_estimators = 32, maxDF = 0.75, max_features = 5000, ngram_range = (1, 3) | 0.837 | 0.803 | 0.817 | 0.916 |
MultinomialNB | maxDF = 0.5, max_features = 5000, ngram_range = (1, 1) | 0.863 | 0.795 | 0.818 | 0.940 |
SVC | AraVec-SG | 0.349 | 0.500 | 0.411 | 0.521 |
SVC | weighted AraVec-SG | 0.862 | 0.840 | 0.850 | 0.929 |
SVC | AraVec-CBoW | 0.349 | 0.500 | 0.411 | 0.505 |
SVC | weighted AraVec-CBoW | 0.836 | 0.795 | 0.811 | 0.911 |
CNN | AraVec-SG | 0.825 | 0.845 | 0.834 | 0.938 |
CNN | weighted AraVec-SG | 0.851 | 0.819 | 0.832 | 0.926 |
CNN | AraVec-CBoW | 0.829 | 0.791 | 0.806 | 0.922 |
CNN | weighted AraVec-CBoW | 0.812 | 0.694 | 0.717 | 0.898 |
LSTM | AraVec-SG | 0.832 | 0.840 | 0.836 | 0.944 |
LSTM | weighted AraVec-SG | 0.865 | 0.842 | 0.853 | 0.940 |
LSTM | AraVec-CBoW | 0.837 | 0.831 | 0.834 | 0.927 |
LSTM | weighted AraVec-CBoW | 0.823 | 0.788 | 0.802 | 0.913 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alqurashi, T. Stance Analysis of Distance Education in the Kingdom of Saudi Arabia during the COVID-19 Pandemic Using Arabic Twitter Data. Sensors 2022, 22, 1006. https://doi.org/10.3390/s22031006