A Data-Driven Exploration of a New Islamic Fatwas Dataset for Arabic NLP Tasks
Abstract
1. Introduction
- Fatwaset is a large and diverse Arabic dataset that covers Islamic text from several Arab countries. This makes it suitable for training Arabic language models and domain-specific language models. For instance, pre-training a language model using Fatwaset can help in building effective systems that detect anti-Islamic or hateful content from social media platforms.
- It can be integrated into a chatbot system to answer questions about Islamic content. For instance, Fatwaset can be used to build a question answering system that responds to queries about Islamic topics, such as “What are the five pillars of Islam?”.
- It is well suited to authorship attribution tasks. It contains a large number of texts from a considerable set of authors, which makes it possible to train authorship attribution models. For instance, Fatwaset allows for training a model that learns the features and patterns of religious scholars’ answers in terms of vocabulary, style, and structure. This model can then be tested by asking it to identify the religious scholar who wrote a new given text. Fatwaset can also be used to evaluate and compare the effect of several features in an authorship attribution task.
- Because it provides a considerable set of metadata for each fatwa text, it can be used in topic identification, clustering, and text classification tasks. For instance, each fatwa in Fatwaset has a title that allows for building models that are able to cluster Islamic texts into groups based on their similarity in terms of topics and vocabulary.
- It contains long texts that can be used in text summarization tasks. For instance, Fatwaset contains a large and diverse collection of answers that supports training models to generate coherent abstractive summaries of religious scholars’ answers.
- It can be used and extended to support other domains, such as philosophy, history, language arts, and social science, due to its strong connection with Islamic spiritual culture.
- To our knowledge, Fatwaset is the first available Arabic dataset for Islamic fatwas.
- Construct Fatwaset, the first public dataset of Islamic fatwas in Arabic, to enable researchers in computational linguistics to conduct studies on Arabic and Islamic-related NLP problems.
- Understand the content of Islamic fatwas by performing an Exploratory Data Analysis on Fatwaset.
2. Background and Related Works
2.1. Islamic Fatwa-Related Studies
- In the literature, the focus has been mainly placed on Quran datasets. In contrast, other types of Islamic content datasets such as Sunnah and fatwa have not received considerable attention [5].
- Currently, most of the available datasets are designed specifically to target question answering tasks. This limitation in design restricts expansion of the pool of Islamic content-related studies. Dataset design should incorporate criteria that facilitate undertaking other tasks.
- It is evident that Arabic is the dominant language across Islamic content datasets. This might be because the original resources are available in Arabic. However, there is a need to address other languages.
- Regarding fatwa datasets, only one has been presented, in [1]. However, that dataset is not publicly available. In contrast, the dataset proposed in this paper, Fatwaset, is publicly available to the academic community.
- The fatwa dataset introduced in [1] only includes fatwa questions, fatwa answers, fatwa topics, and publication dates. It does not contain other metadata about the fatwas. Including other metadata enriches a dataset and increases its usability. Therefore, Fatwaset includes all the metadata that the source website provides for each fatwa (the collected metadata is presented in Section 3.1). The aim is to make the dataset effective and applicable to a range of Natural Language Processing and text mining problems.
2.2. Exploratory Data Analysis (EDA)
3. Materials and Methods
3.1. Fatwaset
- The differing formats of the websites were a major challenge. For instance, each website has its own way of categorizing fatwas. Some websites classify fatwas by Fiqh (jurisprudence) and subject categories, while others use main topics and subtopics. Some websites do not organize fatwas into categories at all; fatwas are simply posted in an unordered list;
- The number of metadata fields attached to a fatwa differs between websites because each one provides a different set of metadata;
- Some websites replace the text of the answer with an audio clip;
- Pages from the same website sometimes follow different page hierarchies.
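One way to cope with the heterogeneity listed above is to map every scraped record onto a single schema in which any metadata field may be absent. The following is a minimal sketch under stated assumptions, not the authors' collection code: the `FatwaRecord` fields, the `FIELD_MAP` keys, and the site names `site_a` and `site_b` are all hypothetical and do not reflect the published Fatwaset columns.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical unified schema; field names are illustrative assumptions,
# not the published Fatwaset column names.
@dataclass
class FatwaRecord:
    source: str
    question: str
    answer: Optional[str]  # None when a site only offers an audio clip
    main_title: Optional[str] = None
    subtitle: Optional[str] = None
    fatwa_number: Optional[str] = None
    publication_date: Optional[str] = None
    mufti_name: Optional[str] = None


# Hypothetical mapping from each site's raw keys to the unified field names.
FIELD_MAP = {
    "site_a": {"topic": "main_title", "subtopic": "subtitle", "no": "fatwa_number"},
    "site_b": {"title": "main_title", "scholar": "mufti_name"},
}


def normalize(source: str, raw: dict) -> FatwaRecord:
    """Convert one raw scraped record into the unified schema.

    Metadata that a site does not provide, or provides as an empty value,
    is left as None instead of being guessed.
    """
    mapping = FIELD_MAP.get(source, {})
    extra = {unified: raw[key] for key, unified in mapping.items() if raw.get(key)}
    return FatwaRecord(
        source=source,
        question=raw.get("question", ""),
        answer=raw.get("answer") or None,
        **extra,
    )
```

Keeping every metadata field optional lets records from sites with only a title sit alongside records from sites that also publish a fatwa number, date, and mufti name, without inventing values for the missing fields.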
3.2. Proposed Pipeline of Exploratory Data Analysis (EDA)
4. Results and Discussion
4.1. Fatwas’ Topics
4.2. Fatwas (Questions and Answers)
4.3. Religious Scholars’ Answers
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Token (In Arabic) | Translation | Token (In Arabic) | Translation |
---|---|---|---|
الله | Allah (God) | العلم | the science |
وسلم | and peace be upon | وهكذا | and so on |
صلى | prayed | الإنسان | the human |
الصلاة | the prayer | يقول | says |
عبد | slave | وعلا | exalted |
النبي | The Prophet | رمضان | Ramadhan |
محمد | Mohammad | جل | Majestic |
إلا | unless | المسلم | The Muslim |
التوفيق | success | الدين | The religion |
فلا | and no | الخير | The good |
وصحبه | his companions | المسلمين | The Muslims |
واله | his family | القرآن | The Quran |
يجوز | Permissible | فضيلة | virtue |
نبينا | Our prophet | عز | Almighty |
وبالله | and with Allah (God) | شك | doubt |
صلاة | prayer | العلماء | The scholars |
يقول | said | ينبغي | should |
المسجد | The mosque | الحفظ | Preservation |
سبحانه | Glorified | الأمور | matters |
وتعالى | exalted | وأيضا | additionally |
شرعا | legally | تقسمه | divide it |
حكم | rule | حينما | when |
ابن | son | حال | status |
الإمام | Imam (leader) | البنك | the bank |
رسول | Messenger | وبركاته | His blessings |
الناس | the people | الشرعية | legitimacy |
سيدنا | Our master | أعلم | Know best |
الحمد | praise | المستعان | The helper |
رواه | Narrated by | البيع | The sale |
وإن | and that | العمل | The work |
خيرا | good | يظهر | shows |
جزاكم | reward you | أبي | Father of |
أهل | people | الحديث | Hadith |
فإذا | and if | المرأة | The woman |
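A frequent-token list such as the one in this appendix can be produced with a simple counting pass over the answers. The sketch below is an assumption about how such a list might be built, not the authors' pipeline: the function name `top_tokens`, the diacritic stripping step, and the optional stop-word set are illustrative choices.

```python
import re
from collections import Counter

# Arabic diacritics (tashkeel, U+064B..U+0652) and the tatweel elongation mark.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Runs of Arabic letters in the basic Arabic block (U+0621..U+064A).
ARABIC_TOKEN = re.compile(r"[\u0621-\u064A]+")


def top_tokens(texts, stopwords=frozenset(), n=10):
    """Return the n most frequent Arabic tokens across texts.

    Diacritics are removed first so that vocalized and unvocalized
    spellings of the same word are counted together.
    """
    counts = Counter()
    for text in texts:
        cleaned = DIACRITICS.sub("", text)
        counts.update(
            token for token in ARABIC_TOKEN.findall(cleaned) if token not in stopwords
        )
    return counts.most_common(n)
```

Called as `top_tokens(answers, stopwords=my_stopwords, n=68)`, where `answers` and `my_stopwords` are hypothetical inputs, this returns `(token, count)` pairs ordered by frequency. Whether and how stop words were filtered for the table above is not stated here, so the filter is left optional.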
References
- Munshi, A.A.; Al-Sabban, W.H.; Farag, A.T.; Rakha, O.E.; Alotaibi, M.; Alotaibi, M. Towards an Automated Islamic Fatwa System: Survey, Dataset and Benchmarks. Int. J. Comput. Sci. Mob. Comput. (IJCSMC) 2021, 10, 118–131.
- Al-Yahya, M. Towards Automated Fiqh School Authorship Attribution. In Computational Linguistics and Intelligent Text Processing; Gelbukh, A., Ed.; Springer International Publishing: Cham, Switzerland, 2018.
- Abdullah, O.; Shaharuddin, A.; Wahid, M.A.; Harun, M.S. The Potential and Challenges of Decision Support Systems for Islamic Banking and Finance. Eur. J. Islam. Financ. 2022, 9, 21–29.
- Khairuldin, W.M.K.F.; Anas, W.N.I.W.N.; Embong, A.H.; Hassan, S.A.; Hanapi, M.S.; Ismail, D. Ethics of Mufti in the Declaration of Fatwa According to Islam. J. Leg. Ethical Regul. Issues 2019, 22.
- Alnefaie, S.; Atwell, E.; Alsalka, M.A. Challenges in the Islamic Question Answering Corpora. Int. J. Islam. Appl. Comput. Sci. Technol. 2022, 10, 1–10.
- Malhas, R.; Elsayed, T. AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2020, 19, 1–21.
- Malhas, R.; Mansour, W.; Elsayed, T. Qur’an QA 2022: Overview of The First Shared Task on Question Answering over the Holy Qur’an. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur’an QA and Fine-Grained Hate Speech Detection; OSACT; European Language Resources Association: Marseille, France, June 2022; pp. 79–87.
- Mohammed, M.; Amin, S.; Aref, M.M. An English Islamic Articles Dataset (EIAD) for developing an IslamBot Question Answering Chatbot. In Proceedings of the 2022 5th International Conference on Computing and Informatics (ICCI), Riyadh, Saudi Arabia, 9–10 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 303–309.
- AlZahrani, F.M.; Al-Yahya, M. A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts. Appl. Sci. 2023, 13, 7255.
- Gartner, R. Metadata: Shaping Knowledge from Antiquity to the Semantic Web; Springer: Cham, Switzerland, 2016; pp. 1–10.
- Riley, J. Understanding Metadata: What Is Metadata, and What Is It For?: A Primer; NISO: Baltimore, MD, USA, 2017; pp. 1–49.
- Sahoo, K.; Samal, A.K.; Pramanik, J.; Pani, S.K. Exploratory Data Analysis Using Python. IJITEE 2019, 8, 4727–4735.
- Endsuy, R.D. Sentiment Analysis between VADER and EDA for the US Presidential Election 2020 on Twitter Datasets. JADS 2021, 2, 8–18.
- Komorowski, M.; Marshall, D.C.; Salciccioli, J.D.; Crutain, Y. Exploratory Data Analysis. In Secondary Analysis of Electronic Health Records; Springer: Cham, Switzerland, 2016; pp. 185–203.
- Kalmukov, Y. Using Word Clouds for Fast Identification of Papers’ Subject Domain and Reviewers’ Competences. arXiv 2021, arXiv:2112.14861.
- Balz, T. Scientometric Full-Text Analysis of Papers Published in Remote Sensing between 2009 and 2021. Remote Sens. 2022, 14, 4285.
- Alfraidi, T.; Abdeen, M.A.; Yatimi, A.; Alluhaibi, R.; Al-Thubaity, A. The Saudi Novel Corpus: Design and Compilation. Appl. Sci. 2022, 12, 6648.
- Albadi, N.; Kurdi, M.; Mishra, S. Are they our brothers? Analysis and detection of religious hate speech in the Arabic twittersphere. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Barcelona, Spain, 28–31 August 2018; IEEE: Piscataway, NJ, USA; pp. 69–76.
- Adebayo, G.O.; Yampolskiy, R.V. Estimating intelligence quotient using stylometry and machine learning techniques: A review. Big Data Min. Anal. 2022, 5, 163–191.
Website | Link | Country |
---|---|---|
Dar Al Ifta in Saudi Arabia | https://www.alifta.gov.sa/ | Saudi Arabia |
Dar Al Ifta in Egypt | https://www.dar-alifta.org | Egypt |
Dar Al Ifta in Jordan | https://aliftaa.jo | Jordan |
Al Shaikh Abdual Aziz Ibn Baz | https://binbaz.org.sa | Saudi Arabia |
Al Shaikh Mohammad Ibn Othaimin | https://binothaimeen.net/site | Saudi Arabia |
Al Shaikh Abdual Aziz Al Ashaikh | https://www.mufti.af.org.sa | Saudi Arabia |
Al Shaikh Saleh Al Fwzan | https://www.alfawzan.af.org.sa | Saudi Arabia |
Al Shaikh Saleh Bin Humaid | https://www.ibnhomaid.af.org.sa/ | Saudi Arabia |
Al Shaikh Abdullah Al Manee | https://al-manee.com | Saudi Arabia |
IslamWeb | https://www.islamweb.com | Qatar |
FatwaPedia | https://fatawapedia.com | Saudi Arabia |
IslamQA | https://islamqa.info | Syria |
IslamOnline | https://islamonline.net | Qatar |
Website | Number of Records | Metadata |
---|---|---|
Dar Al Ifta in Saudi Arabia | 20,000 | Main title, Subtitle, Fatwa number, Mufti name |
Dar Al Ifta in Egypt | 3769 | Main title, Subtitle, Fatwa number, Publication date, Mufti name |
Dar Al Ifta in Jordan | 3146 | Main title, Subtitle, Fatwa number, Publication date, Mufti name |
Al Shaikh Abdual Aziz Ibn Baz | 27,111 | Fighi main title, Subject main title, Subtitle |
Al Shaikh Mohammad Ibn Othaimin | 9125 | Main title, Subtitle, Fatwa number |
Al Shaikh Abdual Aziz Al Ashaikh | 135 | Title |
Al Shaikh Saleh Al Fwzan | 1723 | Title, Answer source |
Al Shaikh Saleh Bin Humaid | 48 | Title |
Al Shaikh Abdullah Al Manee | 233 | Title |
IslamWeb | 11,662 | Main title, Subtitle, Fatwa number, Publication date |
FatwaPedia | 47,098 | Title, Mufti name |
IslamQA | 195 | Title, Answer summary |
IslamOnline | 5937 | Main title, Subtitle |
Total | 130,182 |
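The total in the table above can be checked by summing the per-website record counts. The short sketch below copies the counts from the table; the variable names are illustrative.

```python
# Record counts per website, copied from the table above.
records = {
    "Dar Al Ifta in Saudi Arabia": 20_000,
    "Dar Al Ifta in Egypt": 3_769,
    "Dar Al Ifta in Jordan": 3_146,
    "Al Shaikh Abdual Aziz Ibn Baz": 27_111,
    "Al Shaikh Mohammad Ibn Othaimin": 9_125,
    "Al Shaikh Abdual Aziz Al Ashaikh": 135,
    "Al Shaikh Saleh Al Fwzan": 1_723,
    "Al Shaikh Saleh Bin Humaid": 48,
    "Al Shaikh Abdullah Al Manee": 233,
    "IslamWeb": 11_662,
    "FatwaPedia": 47_098,
    "IslamQA": 195,
    "IslamOnline": 5_937,
}

total = sum(records.values())  # 130182, matching the Total row
# Fraction of the whole dataset contributed by each website.
shares = {site: count / total for site, count in records.items()}
largest = max(records, key=records.get)

print(f"Total records: {total:,}")
print(f"Largest source: {largest} ({shares[largest]:.1%})")
```

The counts are far from balanced: FatwaPedia alone contributes roughly 36% of the records, while several individual scholars' websites contribute fewer than 250 each, which is worth keeping in mind when sampling or splitting the data.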
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alyemny, O.; Al-Khalifa, H.; Mirza, A. A Data-Driven Exploration of a New Islamic Fatwas Dataset for Arabic NLP Tasks. Data 2023, 8, 155. https://doi.org/10.3390/data8100155