Advances in Natural Language Processing and Text Mining


A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: 30 April 2025 | Viewed by 28566

Special Issue Editors

Dr. Zuchao Li

Guest Editor

School of Computer Science, Wuhan University, Wuhan 430072, China
Interests: parsing; information extraction; machine translation; large language models; multi-modal processing; natural language understanding; text mining

Prof. Dr. Min Peng

Guest Editor

School of Computer Science, Wuhan University, Wuhan 430072, China
Interests: text mining; entity linking; knowledge graph; natural language processing

Special Issue Information

Dear Colleagues,

Natural language processing (NLP) and text mining are two rapidly evolving fields with an increasing importance in both academic and industrial research areas. NLP focuses on the interaction between human language and computers, while text mining aims to extract useful insights and knowledge from unstructured textual data. Both fields are essential for handling the vast amounts of text data generated in today's world, which is crucial for various applications, such as information retrieval, sentiment analysis, machine translation, and many others.

With the growing volume and complexity of textual data, new challenges and opportunities arise in NLP and text mining. Recent advancements in machine learning, deep learning, and artificial intelligence have led to significant improvements in these fields. However, there is still much room for innovation and research to tackle the existing challenges.

The aim of this Special Issue is to present the latest research and developments in NLP and text mining, including new methodologies, techniques, and applications. This Special Issue intends to bring together researchers, practitioners, and academics to showcase their work and share their knowledge and expertise in these fields. The scope of this Special Issue aligns with the broader scope of big data and cognitive computing, which focuses on exploring the intersection of big data, cognitive computing, and artificial intelligence. The subject matter of NLP and text mining directly relates to the journal’s scope as these fields contribute significantly to the advancement of artificial intelligence and cognitive computing.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

Natural language understanding: techniques and algorithms for understanding and analyzing natural language, including sentiment analysis, topic modeling, named entity recognition, and entity linking.
Text mining and information retrieval: approaches for mining knowledge and insights from unstructured text data, including information retrieval, text classification, and clustering.
Deep learning for NLP and text mining: deep learning-based techniques for natural language processing and text mining, including neural language models, sequence-to-sequence models, and attention-based models.
Large language model pre-training: techniques for pre-training large language models, including BERT, GPT, and RoBERTa, and their applications in NLP and text mining tasks.
Multimodal NLP: techniques for analyzing and understanding multimodal data, including text, images, and videos.
Text generation: techniques for generating natural language text, including text summarization, question-answering systems, and text-to-speech systems.
Applications of NLP and text mining: practical applications of NLP and text mining in various domains, including healthcare, finance, social media, and e-commerce.
Explainable NLP and text mining: approaches for making NLP models more transparent and interpretable, including model visualization, attention mechanisms, and explainable AI.
Low-resource NLP and text mining: techniques for NLP tasks in low-resource languages or domains, where training data are scarce, including transfer learning, domain adaptation, and few-shot learning.
Multilingual NLP and text mining: techniques for processing and analyzing text data in multiple languages, including multilingual embeddings, cross-lingual transfer learning, and multilingual topic modeling.
NLP and text mining for social good: applications of NLP and text mining for social good, including hate speech detection, cyberbullying prevention, and disaster response.

We look forward to receiving your contributions.

Dr. Zuchao Li
Prof. Dr. Min Peng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

natural language processing
text mining
deep learning
large language models
information retrieval
entity linking
relation extraction
multimodal NLP
low-resource NLP
NLP and text mining for social good

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)

Research

30 pages, 5419 KiB

Open Access Article

Explainable Aspect-Based Sentiment Analysis Using Transformer Models

by Isidoros Perikos and Athanasios Diamantopoulos

Big Data Cogn. Comput. 2024, 8(11), 141; https://doi.org/10.3390/bdcc8110141 - 24 Oct 2024

Viewed by 1199

Abstract

An aspect-based sentiment analysis (ABSA) aims to perform a fine-grained analysis of text to identify sentiments and opinions associated with specific aspects. Recently, transformers and large language models have demonstrated exceptional performance in detecting aspects and determining their associated sentiments within text. However, understanding the decision-making processes of transformers remains a significant challenge, as they often operate as black-box models, making it difficult to interpret how they arrive at specific predictions. In this article, we examine the performance of various transformers on ABSA and we employ explainability techniques to illustrate their inner decision-making processes. Firstly, we fine-tune several pre-trained transformers, including BERT, RoBERTa, DistilBERT, and XLNet, on an extensive set of data composed of MAMS, SemEval, and Naver datasets. These datasets consist of over 16,100 complex sentences, each containing a couple of aspects and corresponding polarities. The models were fine-tuned using optimal hyperparameters and RoBERTa achieved the highest performance, reporting 89.16% accuracy on MAMS and SemEval and 97.62% on Naver. We implemented five explainability techniques, LIME, SHAP, attention weight visualization, integrated gradients, and Grad-CAM, to illustrate how transformers make predictions and highlight influential words. These techniques can reveal how models use specific words and contextual information to make sentiment predictions, which can improve performance, address biases, and enhance model efficiency and robustness. These also point out directions for further focus on the analysis of models’ bias in combination with explainability methods, ensuring that explainability highlights potential biases in predictions. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


19 pages, 714 KiB

Open Access Article

Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews

by Goran Mitrov, Boris Stanoev, Sonja Gievska, Georgina Mirceva and Eftim Zdravevski

Big Data Cogn. Comput. 2024, 8(9), 110; https://doi.org/10.3390/bdcc8090110 - 4 Sep 2024

Viewed by 1438

Abstract

The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats like rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where numerous documents must be reviewed in relation to specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required by researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches like Mistral LLM, we assess the article’s similarities to user-specific inputs and prioritize them according to relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, combining the strengths of the different methodologies. For validation, we employ global metrics such as precision at K, recall at K, average rank, median rank, and pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves optimal performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These findings give promising results in pinpointing the relevant articles and reducing the manual work. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
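
The abstract above names precision at K, recall at K, and median rank as validation metrics. A minimal sketch of how such ranking metrics are commonly computed is given here for orientation; the function names, the toy document IDs, and the exact definitions are assumptions for illustration and may differ from those used in the paper.

```python
from statistics import median


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def median_rank(ranked_ids, relevant_ids):
    """Median 1-based rank of the relevant documents that were retrieved."""
    ranks = [i + 1 for i, doc_id in enumerate(ranked_ids) if doc_id in relevant_ids]
    return median(ranks)


# Hypothetical ranking and relevance judgments (not data from the paper).
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_4 = recall_at_k(ranked, relevant, 4)
med = median_rank(ranked, relevant)
```

Under these definitions, a relevant document that the ranker never returns (here the hypothetical `d9`) lowers recall but does not affect the median rank, which is one reason the two kinds of metric are usually reported together.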


20 pages, 4936 KiB

Open Access Article

Development of Context-Based Sentiment Classification for Intelligent Stock Market Prediction

by Nurmaganbet Smatov, Ruslan Kalashnikov and Amandyk Kartbayev

Big Data Cogn. Comput. 2024, 8(6), 51; https://doi.org/10.3390/bdcc8060051 - 22 May 2024

Cited by 1 | Viewed by 1284

Abstract

This paper presents a novel approach to sentiment analysis specifically customized for predicting stock market movements, bypassing the need for external dictionaries that are often unavailable for many languages. Our methodology directly analyzes textual data, with a particular focus on context-specific sentiment words within neural network models. This specificity ensures that our sentiment analysis is both relevant and accurate in identifying trends in the stock market. We employ sophisticated mathematical modeling techniques to enhance both the precision and interpretability of our models. Through meticulous data handling and advanced machine learning methods, we leverage large datasets from Twitter and financial markets to examine the impact of social media sentiment on financial trends. We achieved an accuracy exceeding 75%, highlighting the effectiveness of our modeling approach, which we further refined into a convolutional neural network model. This achievement contributes valuable insights into sentiment analysis within the financial domain, thereby improving the overall clarity of forecasting in this field. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


26 pages, 12425 KiB

Open Access Article

Topic Modelling: Going beyond Token Outputs

by Lowri Williams, Eirini Anthi, Laura Arman and Pete Burnap

Big Data Cogn. Comput. 2024, 8(5), 44; https://doi.org/10.3390/bdcc8050044 - 25 Apr 2024

Cited by 1 | Viewed by 1676

Abstract

Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often associated with interpreting a topic’s description from such tokens. However, from a human’s perspective, such outputs may not adequately provide enough information to infer the meaning of the topics; thus, their interpretability is often inaccurately understood. Although several studies have attempted to automatically extend topic descriptions as a means of enhancing the interpretation of topic models, they rely on external language sources that may become unavailable, must be kept up to date to generate relevant results, and present privacy issues when training on or processing data. This paper presents a novel approach towards extending the output of traditional topic modelling methods beyond a list of isolated tokens. This approach removes the dependence on external sources by using the textual data themselves by extracting high-scoring keywords and mapping them to the topic model’s token outputs. To compare how the proposed method benchmarks against the state of the art, a comparative analysis against results produced by Large Language Models (LLMs) is presented. Such results report that the proposed method resonates with the thematic coverage found in LLMs and often surpasses such models by bridging the gap between broad thematic elements and granular details. In addition, to demonstrate and reinforce the generalisation of the proposed method, the approach was further evaluated using two other topic modelling methods as the underlying models and when using a heterogeneous unseen dataset. 
To measure the interpretability of the proposed outputs against those of the traditional topic modelling approach, independent annotators manually scored each output based on their quality and usefulness as well as the efficiency of the annotation task. The proposed approach demonstrated higher quality and usefulness, as well as higher efficiency in the annotation task, in comparison to the outputs of a traditional topic modelling method, demonstrating an increase in their interpretability. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


18 pages, 470 KiB

Open Access Article

Comparing Hierarchical Approaches to Enhance Supervised Emotive Text Classification

by Lowri Williams, Eirini Anthi and Pete Burnap

Big Data Cogn. Comput. 2024, 8(4), 38; https://doi.org/10.3390/bdcc8040038 - 29 Mar 2024

Cited by 1 | Viewed by 1762

Abstract

The performance of emotive text classification using affective hierarchical schemes (e.g., WordNet-Affect) is often evaluated using the same traditional measures used to evaluate performance when a finite set of isolated classes is used. However, applying such measures means the full characteristics and structure of the emotive hierarchical scheme are not considered. Thus, the overall performance of emotive text classification using emotion hierarchical schemes is often inaccurately reported and may lead to ineffective information retrieval and decision making. This paper provides a comparative investigation into how methods used in hierarchical classification problems in other domains, which extend traditional evaluation metrics to consider the characteristics of the hierarchical classification scheme, can be applied and subsequently improve the classification of emotive texts. This study investigates the classification performance of three widely used classifiers, Naive Bayes, J48 Decision Tree, and SVM, following the application of the aforementioned methods. The results demonstrated that all the methods improved the emotion classification. However, the most notable improvement was recorded when a depth-based method was applied to both the testing and validation data, where the precision, recall, and F1-score were significantly improved by around 70 percentage points for each classifier. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


26 pages, 6098 KiB

Open Access Article

Unveiling Sentiments: A Comprehensive Analysis of Arabic Hajj-Related Tweets from 2017–2022 Utilizing Advanced AI Models

by Hanan M. Alghamdi

Big Data Cogn. Comput. 2024, 8(1), 5; https://doi.org/10.3390/bdcc8010005 - 2 Jan 2024

Cited by 4 | Viewed by 3057

Abstract

Sentiment analysis plays a crucial role in understanding public opinion and social media trends. It involves analyzing the emotional tone and polarity of a given text. When applied to Arabic text, this task becomes particularly challenging due to the language’s complex morphology, right-to-left script, and intricate nuances in expressing emotions. Social media has emerged as a powerful platform for individuals to express their sentiments, especially regarding religious and cultural events. Consequently, studying sentiment analysis in the context of Hajj has become a captivating subject. This research paper presents a comprehensive sentiment analysis of tweets discussing the annual Hajj pilgrimage over a six-year period. By employing a combination of machine learning and deep learning models, this study successfully conducted sentiment analysis on a sizable dataset consisting of Arabic tweets. The process involves pre-processing, feature extraction, and sentiment classification. The objective was to uncover the prevailing sentiments associated with Hajj over different years, before, during, and after each Hajj event. Importantly, the results presented in this study highlight that BERT, an advanced transformer-based model, outperformed other models in accurately classifying sentiment. This underscores its effectiveness in capturing the complexities inherent in Arabic text. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


12 pages, 2495 KiB

Open Access Article

by Marcos Orellana, Patricio Santiago García, Guillermo Daniel Ramon, Jorge Luis Zambrano-Martinez, Andrés Patiño-León, María Verónica Serrano and Priscila Cedillo

Big Data Cogn. Comput. 2024, 8(1), 3; https://doi.org/10.3390/bdcc8010003 - 29 Dec 2023

Viewed by 2264

Abstract

Health problems in older adults lead to situations where communication with peers, family and caregivers becomes challenging for seniors; therefore, it is necessary to use alternative methods to facilitate communication. In this context, Augmentative and Alternative Communication (AAC) methods are widely used to support this population segment. Moreover, with Artificial Intelligence (AI), and specifically, machine learning algorithms, AAC can be improved. Although there have been several studies in this field, it is interesting to analyze common phrases used by seniors, depending on their context (i.e., slang and everyday expressions typical of their age). This paper proposes a semantic analysis of the common phrases of older adults and their corresponding meanings through Natural Language Processing (NLP) techniques and a pre-trained language model using semantic textual similarity to represent the older adults’ phrases with their corresponding graphic images (pictograms). The results show good scores achieved in the semantic similarity between the phrases of the older adults and the definitions, so the relationship between the phrase and the pictogram has a high degree of probability. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


15 pages, 1210 KiB

Open Access Editor’s Choice Article

Text Classification Based on the Heterogeneous Graph Considering the Relationships between Documents

by Hiromu Nakajima and Minoru Sasaki

Big Data Cogn. Comput. 2023, 7(4), 181; https://doi.org/10.3390/bdcc7040181 - 13 Dec 2023

Cited by 1 | Viewed by 2118

Abstract

Text classification is the task of estimating the genre of a document based on information such as word co-occurrence and frequency of occurrence. Text classification has been studied by various approaches. In this study, we focused on text classification using graph structure data. Conventional graph-based methods express relationships between words and relationships between words and documents as weights between nodes. Then, a graph neural network is used for learning. However, there is a problem that conventional methods are not able to represent the relationship between documents on the graph. In this paper, we propose a graph structure that considers the relationships between documents. In the proposed method, the cosine similarity of document vectors is set as weights between document nodes. This completes a graph that considers the relationship between documents. The graph is then input into a graph convolutional neural network for training. Therefore, the aim of this study is to improve the text classification performance of conventional methods by using this graph that considers the relationships between document nodes. In this study, we conducted evaluation experiments using five different corpora of English documents. The results showed that the proposed method outperformed the performance of the conventional method by up to 1.19%, indicating that the use of relationships between documents is effective. In addition, the proposed method was shown to be particularly effective in classifying long documents. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
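
The abstract above states that the cosine similarity of document vectors is set as the weight between document nodes. The following is a minimal sketch of that single step only, assuming plain Python lists as document vectors; the `threshold` parameter and the function names are hypothetical additions for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_edges(doc_vectors, threshold=0.0):
    """Build weighted document-document edges keyed by (i, j) with i < j.

    `threshold` is an assumption of this sketch: pairs whose similarity
    does not exceed it are left unconnected to keep the graph sparse.
    """
    edges = {}
    for (i, u), (j, v) in combinations(enumerate(doc_vectors), 2):
        weight = cosine_similarity(u, v)
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


# Toy document vectors (hypothetical, for illustration only).
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = document_edges(vectors)
```

In a full pipeline these edges would be merged with the word-word and word-document edges before the graph is passed to a graph convolutional network; that part is outside the scope of this sketch.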


20 pages, 7293 KiB

Open Access Article

Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles

by Deptii Chaudhari and Ambika Vishal Pawar

Big Data Cogn. Comput. 2023, 7(4), 175; https://doi.org/10.3390/bdcc7040175 - 15 Nov 2023

Cited by 5 | Viewed by 2652

Abstract

Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. It becomes challenging to uncover propaganda as it works with the systematic goal of influencing other individuals for the determined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages like Hindi. The spread of propaganda in the Hindi news media has induced our attempt to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created using the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), Bi-LSTM (bidirectional long short-term memory); and four transformer-based models, i.e., multi-lingual BERT, Distil-BERT, Hindi-BERT, and Hindi-TPU-Electra, were experimented with. The experimental outcomes indicate that the multi-lingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


18 pages, 1858 KiB

Open Access Article

Arabic Toxic Tweet Classification: Leveraging the AraBERT Model

by Amr Mohamed El Koshiry, Entesar Hamed I. Eliwa, Tarek Abd El-Hafeez and Ahmed Omar

Big Data Cogn. Comput. 2023, 7(4), 170; https://doi.org/10.3390/bdcc7040170 - 26 Oct 2023

Cited by 7 | Viewed by 3112

Abstract

Social media platforms have become the primary means of communication and information sharing, facilitating interactive exchanges among users. Unfortunately, these platforms also witness the dissemination of inappropriate and toxic content, including hate speech and insults. While significant efforts have been made to classify toxic content in the English language, the same level of attention has not been given to Arabic texts. This study addresses this gap by constructing a standardized Arabic dataset specifically designed for toxic tweet classification. The dataset is annotated automatically using Google’s Perspective API and the expertise of three native Arabic speakers and linguists. To evaluate the performance of different models, we conduct a series of experiments using seven models: long short-term memory (LSTM), bidirectional LSTM, a convolutional neural network, a gated recurrent unit (GRU), bidirectional GRU, multilingual bidirectional encoder representations from transformers, and AraBERT. Additionally, we employ word embedding techniques. Our experimental findings demonstrate that the fine-tuned AraBERT model surpasses the performance of other models, achieving an impressive accuracy of 0.9960. Notably, this accuracy value outperforms similar approaches reported in recent literature. This study represents a significant advancement in Arabic toxic tweet classification, shedding light on the importance of addressing toxicity in social media platforms while considering diverse languages and cultures. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


16 pages, 3082 KiB

Open Access Article

The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters

by Nurgali Kadyrbek, Madina Mansurova, Adai Shomanov and Gaukhar Makharova

Big Data Cogn. Comput. 2023, 7(3), 132; https://doi.org/10.3390/bdcc7030132 - 20 Jul 2023

Cited by 3 | Viewed by 2484

Abstract

This study is devoted to the transcription of human speech in the Kazakh language in dynamically changing conditions. It discusses key aspects related to the phonetic structure of the Kazakh language, technical considerations in collecting the transcribed audio corpus, and the use of deep neural networks for speech modeling. A high-quality decoded audio corpus was collected, containing 554 h of data, giving an idea of the frequencies of letters and syllables, as well as demographic parameters such as the gender, age, and region of residence of native speakers. The corpus contains a universal vocabulary and serves as a valuable resource for the development of modules related to speech. Machine learning experiments were conducted using the DeepSpeech2 model, which includes a sequence-to-sequence architecture with an encoder, decoder, and attention mechanism. To increase the reliability of the model, filters initialized with symbol-level embeddings were introduced to reduce the dependence on accurate positioning on object maps. The training process included simultaneous preparation of convolutional filters for spectrograms and symbolic objects. The proposed approach, using a combination of supervised and unsupervised learning methods, resulted in a 66.7% reduction in the weight of the model while maintaining relative accuracy. The evaluation on the test sample showed a 7.6% lower character error rate (CER) compared to existing models, demonstrating its most modern characteristics. The proposed architecture provides deployment on platforms with limited resources. Overall, this study presents a high-quality audio corpus, an improved speech recognition model, and promising results applicable to speech-related applications and languages beyond Kazakh. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
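
The abstract above reports results in terms of character error rate (CER). As a point of reference, CER is conventionally the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The sketch below follows that conventional definition; it is an assumption that the paper uses the same one, and the function names and sample strings are illustrative only.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion from the reference
                current[j - 1] + 1,      # insertion into the reference
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character in a five-character Kazakh word (toy example).
cer = character_error_rate("сәлем", "салем")
```

Because the denominator is the reference length, CER can exceed 1.0 when the recognizer inserts many spurious characters.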


20 pages, 1788 KiB

Open Access Article

DSpamOnto: An Ontology Modelling for Domain-Specific Social Spammers in Microblogging

by Malak Al-Hassan, Bilal Abu-Salih and Ahmad Al Hwaitat

Big Data Cogn. Comput. 2023, 7(2), 109; https://doi.org/10.3390/bdcc7020109 - 2 Jun 2023

Cited by 4 | Viewed by 1994

Abstract

The lack of regulations and oversight on Online Social Networks (OSNs) has resulted in the rise of social spam, which is the dissemination of unsolicited and low-quality content that aims to deceive and manipulate users. Social spam can cause a range of negative consequences for individuals and businesses, such as the spread of malware, phishing scams, and reputational damage. While machine learning techniques can be used to detect social spammers by analysing patterns in data, they have limitations such as the potential for false positives and false negatives. In contrast, ontologies allow for the explicit modelling and representation of domain knowledge, which can be used to create a set of rules for identifying social spammers. However, the literature exposes a deficiency of ontologies that conceptualize domain-based social spam. This paper aims to address this gap by designing a domain-specific ontology called DSpamOnto to detect social spammers in microblogging that target a specific domain. DSpamOnto can identify social spammers based on their domain-specific behaviour, such as posting repetitive or irrelevant content and using misleading information. The proposed model is compared and benchmarked against well-proven ML models using various evaluation metrics to verify and validate its utility in capturing social spammers. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)


Review

32 pages, 614 KiB

Open Access Review

Automatic Generation of Medical Case-Based Multiple-Choice Questions (MCQs): A Review of Methodologies, Applications, Evaluation, and Future Directions

by Somaiya Al Shuraiqi, Abdulrahman Aal Abdulsalam, Ken Masters, Hamza Zidoum and Adhari AlZaabi

Big Data Cogn. Comput. 2024, 8(10), 139; https://doi.org/10.3390/bdcc8100139 - 17 Oct 2024

Viewed by 1041

Abstract

This paper offers an in-depth review of the latest advancements in the automatic generation of medical case-based multiple-choice questions (MCQs). The automatic creation of educational materials, particularly MCQs, is pivotal in enhancing teaching effectiveness and student engagement in medical education. In this review, we explore various algorithms and techniques that have been developed for generating MCQs from medical case studies. Recent innovations in natural language processing (NLP) and machine learning (ML) for automatic language generation have garnered considerable attention. Our analysis evaluates and categorizes the leading approaches, highlighting their generation capabilities and practical applications. Additionally, this paper synthesizes the existing evidence, detailing the strengths, limitations, and gaps in current practices. By contributing to the broader conversation on how technology can support medical education, this review not only assesses the present state but also suggests future directions for improvement. We advocate for the development of more advanced and adaptable mechanisms to enhance the automatic generation of MCQs, thereby supporting more effective learning experiences in medical education. Full article

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
