This section is organized into two main parts. The first part, Background, provides essential theoretical concepts and discusses the key milestones that have influenced the development of the research. The second part, Methodology, details the research approach and is structured into three phases to illustrate the process comprehensively.
2.1. Background
There has been a lack of consensus among researchers regarding offensive language, leaving the possibility of subjective interpretations open. Therefore, the same linguistic phenomenon can receive different terms; conversely, the same label can be used for different meanings or expressions [11].
In particular, offensive language is intended to offend a person or a specific group through derogatory, hurtful, or obscene expressions [12], which may include insults, toxic comments, threats, profanity, or swearing [13]. The similarities between the approaches proposed in previous works have motivated a typology that distinguishes whether the offensive language is directed at a specific individual or entity or at a generalized group, and whether the abusive content is explicit or implicit [13].
Recently, many researchers have investigated the characterization and taxonomy of offensive language in order to identify abusive content and develop classification systems of different types: aggression identification, cyberbullying detection, hate speech (HS) identification, bullying identification, offensive language detection, and the identification of toxic comments [14].
Hate speech, as a subset of offensive language, is a type of manifestation characterized by hostility, prejudice, discrimination, and verbal aggression towards individuals or groups based on their ethnic origin, religion, gender, sexual orientation, or disability, among other factors [15].
Racism (RS) and misogyny are specific forms of hate speech directed at particular groups. Current research on sexism in social networks focuses on detecting misogyny, or hatred towards women. The Oxford Dictionary defines misogyny as “a feeling of hate or dislike towards women, or a feeling that women are not as good as men” [16]. The dictionary of the Royal Spanish Academy (RAE in Spanish) defines it as “aversion to women” [17]. In contrast, racism refers to “the belief that some races of people are better than others, or a general belief about a whole group of people based only on their race” [18], which leads to discrimination or social persecution.
Generally, a comment is sexist when it discriminates against people based on gender. This discrimination, whose predominant target is women, is a prevalent cultural component based on the presumed superiority of men over women in different sectors of life, such as work, politics, society, and the family [19]. Although both men and women can experience violence and abuse online, women are much more likely to be the victims of harmful actions in severe and repeated forms. Young girls are particularly vulnerable to sexual exploitation and abuse, as well as to bullying by their peers in the digital space [20].
In the context of the Internet and social networks, hate speech not only creates tension between groups of people, but its impact can also affect business or even lead to real-life conflicts [21]. Therefore, to prevent and counter the spread of hate speech on social networks, the European Commission agreed with Facebook, Microsoft, Twitter, and YouTube on a code of conduct to counter illegal hate speech online [7,22,23]. However, controlling and filtering all this content is a challenge. For this reason, researchers have developed various automatic hate speech detection tools within the fields of Natural Language Processing (NLP) and Machine Learning (ML).
In ML, hate speech detection can be modeled as a dichotomous (hate speech or non-hate speech) or multiclass (misogyny, racism, etc.) classification problem, which can be adequately addressed using both classical learning algorithms and Deep Learning (DL) algorithms [24]. NLP techniques such as sentiment analysis and the identification of offensive keywords and linguistic patterns can also be used. Although Recurrent Neural Network (RNN) algorithms have been widely used for processing sequential data, their limitation in the length of the sequences they can handle has led to the increasing popularity of self-attention-based models, such as the Transformer Language Model (TLM).
End-to-end memory networks are based on a recurrent attention mechanism rather than sequence-aligned recurrence and have been shown to perform well in simple-language question answering and language modeling tasks [25]. Nevertheless, the literature reveals that the Transformer is the first transduction model based entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution [26].
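The core of this mechanism, scaled dot-product self-attention, can be illustrated with a minimal NumPy sketch; the dimensions and weight matrices below are illustrative toy values, not taken from any model discussed here:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X of shape (n, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv      # project tokens to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise token affinities, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)    # each row is a distribution over all tokens
    return weights @ V                    # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8                 # 4 toy "tokens" of dimension 8
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                          # one contextualized vector per token
```

Because every token attends to every other token in a single step, dependencies between distant words are captured without the sequential bottleneck of an RNN.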
Regarding work on the automatic detection of hate speech, the vast majority of developments are in English, which predominates over other languages such as Spanish. Since Spanish is the third most used language on the Internet [27], it is essential to research and implement natural language models focused on Spanish, especially on the domain and dialect of Spanish-speaking countries such as Colombia.
Hate speech in Colombia is profoundly influenced by historical, cultural, and socioeconomic factors. The armed conflict and social polarization have created an environment where hate can emerge as a tool of resistance or rejection. The country’s ethnic and regional diversity also means that expressions of hate vary depending on the group and region, while socioeconomic inequalities foster resentments that can manifest as discriminatory speech. Additionally, media and social networks amplify and propagate these discourses, although they can also serve as platforms for denouncement and mobilization against hate. Also, the level of education and social norms impact the prevalence of hate speech, with higher prevalence in areas with lower awareness of tolerance issues. Thus, it is crucial to consider these factors when addressing hate speech and formulating strategies specifically tailored to the Colombian context.
In short, the detection of offensive language is a difficult task. The most critical challenges identified in this research stem from the subjective nature of language as framed by culture, gender, demographics, and the social environment. Many words carry different senses or meanings depending on the region, which leads to significant challenges: the variation of vocabulary from place to place, the variants of Spanish (e.g., Spain and Latin America), comments containing sarcasm, literary figures, the linguistic registers of social networks such as emojis, and even rhetorical figures with an ironic sense such as hyperbole, for instance, “casi me muero del susto” (“I almost died of fright”), “Tu cara parece la Luna con tantos hoyos” (“Your face looks like the Moon with all those craters”), “Eres más lenta que una Tortuga” (“You are slower than a turtle”), or “Te dejé un millón de mensajes en tu celular y nunca me devolviste la llamada” (“I left you a million messages on your phone and you never called me back”).
The detection of hate speech is an active and constantly evolving research topic [28]. Early studies mainly used a combination of feature extraction and ML modeling for detection in social networks [29]. A variation on that approach used Paragraph2vec and Bag of Words (BOW) to detect hate speech in a collection of comments pulled from Yahoo! Finance, comprising 56,280 comments with hate speech and 895,456 comments without. The results showed that Paragraph2vec achieved a higher Area Under the Curve (AUC) than any BOW model [30].
The authors of [31] propose an approach to automatically detect hate speech on Twitter using n-grams and patterns as features to train a Support Vector Machine (SVM) algorithm. The approach achieves a precision of 87.4% for binary classification and 78.4% for multiclass classification.
In [32], the performance of different feature extraction techniques (Term Frequency-Inverse Document Frequency (TF-IDF), Word2vec, and Doc2vec) and ML algorithms (Naïve Bayes, Random Forest, Decision Tree, Logistic Regression, SVM, K-Nearest Neighbors, AdaBoost, and Multilayer Perceptron (MLP)) is compared for detecting hate speech messages. The best-performing combination was the TF-IDF representation with bi-gram features and the SVM algorithm, achieving an overall accuracy of 79%.
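This kind of TF-IDF-plus-SVM pipeline can be sketched with scikit-learn; the four-document corpus and its labels below are purely illustrative toy data, not the dataset used in [32]:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus with illustrative labels (1 = offensive, 0 = non-offensive);
# a real study would use a labeled social-media dataset
texts = [
    "you are all worthless and should disappear",
    "I hate everyone from that group",
    "what a lovely sunny day today",
    "great match, congratulations to the team",
]
labels = [1, 1, 0, 0]

# TF-IDF over word uni- and bi-grams feeding a linear SVM,
# mirroring the best-performing combination reported in [32]
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["what a great day"]))
```

The bi-grams let the classifier pick up on short offensive phrases that individual words alone would miss, which is one plausible reason this combination outperformed the unigram-only variants.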
In addition, text mining features for predicting different forms of online hate speech have been explored, including character and word n-grams, dependency tuples, sentiment scores, and first- and second-person pronoun counts [21].
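Two of these surface features, character n-grams and pronoun counts, are simple enough to sketch directly; the pronoun lists and helper names below are illustrative choices, not the exact feature definitions of [21]:

```python
import re
from collections import Counter

# Illustrative English pronoun sets; a real system would use a fuller inventory
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
SECOND_PERSON = {"you", "your", "yours"}

def char_ngrams(text, n=3):
    """Character n-gram counts, one of the surface features explored in [21]."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def pronoun_counts(text):
    """(first-person, second-person) pronoun counts, another feature in [21]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return (
        sum(t in FIRST_PERSON for t in tokens),
        sum(t in SECOND_PERSON for t in tokens),
    )

feats = pronoun_counts("You never listen to me, you only think of yourself")
print(feats)  # (1, 2): one first-person and two second-person pronouns
```

Second-person pronoun counts are useful precisely because directed abuse tends to address its target ("you") explicitly, which ties back to the typology of targeted versus generalized offensive language.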
Nevertheless, despite the good results of the above approaches, the unstructured nature of human language presents various intrinsic challenges for automated text classification methods [24]. In 2017, researchers introduced a neural network-based method that learned semantic word embeddings to handle this complexity. In experiments on a reference dataset of 16K tweets, these DL methods outperformed state-of-the-art character/word n-gram methods by 18 points in F1 score [33].
In [34], Arabic tweets were classified into five hate speech categories using SVM and four DL models. The results showed that the DL models outperformed the SVM model. Indeed, the Long Short-Term Memory (LSTM) network model with a Convolutional Neural Network (CNN) layer achieved the highest performance, with a precision of 72%, recall of 75%, and F1 score of 73%.
The researchers in [35] propose a hate detection system with a set of RNN and LSTM classifiers that incorporates user-related characteristics, such as a bias towards racism or sexism. Likewise, the work in [29] develops a hate classifier for different social networks using multiple algorithms (Logistic Regression, Naïve Bayes, SVM, XGBoost, and neural networks) together with several feature representations (BOW, TF-IDF, Word2vec, Bidirectional Encoder Representations from Transformers (BERT), and their combination), with the BERT [26] model obtaining the best results when comparing individual features.
The evaluation of pre-trained models based on the TLM for detecting hate speech in Spanish (Castilian) has obtained promising results [36]. In particular, Table 1 compares the performance of pre-trained multilingual models (mBERT [37]) and Cross-lingual Language Models (XLM) [36] with a monolingual Spanish BERT model (BETO) [38] trained on a Spanish-specific corpus.
The results in Table 1 show that the BETO pre-trained model scores better on the F1 evaluation metric than mBERT and XLM. In this context, it can be concluded that training a model in Spanish (Colombian slang) is necessary, since such a system can model the vocabulary more accurately.
Consequently, the DL models generally implemented for detecting hate speech are pre-trained on the union of monolingual corpora from different languages. Although models such as mBERT and XLM provide a larger vocabulary, the greatest coverage for Spanish is observed with the BETO model [38], which could be one of the main reasons why it achieves the best performance on the HatEval [44] and HaterNet [36,39] hate speech datasets.
In particular, the automatic detection of hate speech in Spanish is closely related to participation in Task 5 of SemEval-2019: multilingual detection of hate speech against immigrants and women on Twitter [36]. The task consists of two sub-tasks in English and Spanish: the first detects hate speech, and the second classifies aggressive hate tweets and identifies the affected individual or group.
Table 2 details the statistics of the public Spanish hate speech datasets drawn from social networks. Given the limited availability of publicly accessible data, there are approaches that augment the training data with hate speech sequences automatically generated with BERT and Generative Pre-trained Transformer 2 (GPT-2) [45]. These advances yield significant performance improvements when the training data are increased, with a gain of +73% in recall and, consequently, an increase of +33.1% in F1 score [45]. This makes the close relationship between reduced performance and training a classifier on small amounts of data evident.
On the other hand, for the first SemEval sub-task, the highest-scoring participants for Spanish implemented an SVM model within a combinatorial framework using linguistically motivated techniques and different types of n-grams, whereas for the second sub-task they opted for a multilabel approach using the Random Forest (RF) classifier [42]. However, one of the main limitations of such traditional classifiers is that they are not flexible enough to naturally capture more complex relationships and do not usually work well on large datasets [36].
As illustrated in Table 3, general trends can be observed in the literature on the different NLP techniques used to address the problem of hate speech classification.
After a careful review of previous studies, it becomes clear that a number of recent pre-trained LLMs based on the Transformer mechanism have not yet been tested for detecting HS in Spanish. However, recent work has explored Generative Large Language Models (GLLMs) such as GPT-3, GPT-3.5 (ChatGPT), and GPT-4 for various text classification problems, such as sentiment analysis [49], stance detection [50], intent classification [51], mental health analysis [52], hate speech detection [53], misinformation detection [54], paraphrase detection [55], news classification [56], natural language inference [55], and text classification [56].
In addition, there is no evidence of comparative studies of monolingual and multilingual pre-trained LLMs proving their validity for this language and its dialectal variants.
Finally, this article is based on TLMs, whose main advantage is not requiring a large dataset, which is not always available for languages other than English. Indeed, TLMs can capture long-term dependencies in language and effectively incorporate hierarchical relationships, which is very important in languages such as Spanish due to its syntactic and semantic complexity [36]. Additionally, a comparative study is carried out between the traditional transformer language systems, considered the baselines in this study, and the pre-trained TLM and GPT models on the Colombian slang dataset.
Table 3.
Summarized state of the art.
| Model | Dataset | Contribution |
|---|---|---|
| LR [30] | 951,736 Yahoo! Finance user comments | Applied BOW, TF, TF-IDF, and paragraph2vec embeddings. AUC 0.8007 |
| SVM, NB, kNN [57] | Tweets | Applied uni-grams, TF-IDF, retweets, favourites, page authenticity. F1 score 0.971 |
| LSTM, CNN + LSTM, GRU, CNN + GRU [34] | 11,000 Arabic tweets in five classes: none, religious, racial, sexism, or general hate | SVM achieves an overall recall of 74%; the DL models have an average recall of 75%. However, adding a CNN layer to the LSTM enhances the overall detection performance, with 72% precision, 75% recall, and 73% F1 score. |
| CNN + GRU [45] | 1M hate and non-hate examples produced by BERT and GPT-2 | Significant improvements in the performance of a classification model |
| ELMo, BERT, and CNN [58] | SemEval-2019 Task 5, 13,000 tweets in English | The fusion method performs better than the original methods (accuracy = 0.750 and F1 score = 0.704) |
| Ensemble of BERT models for Spanish (BETO) [36,59] | MeOffendEs IberLEF 2021: Offensive Language Detection in Spanish Variants | External data from hate speech detection and sentiment analysis was used to augment the training set [59]. |
| BETO [36] | HaterNet and HatEval (Spanish) | The results obtained with the BETO LM outperform the other ML models. |
| XLM-RoBERTa [60] | MeOffendEs IberLEF 2021 | The model was trained with both tweets and sentiment analysis data in Spanish [60]. A diversity of configurations was tested; a model pre-trained on tweets and sentiment analysis data obtained the best performance [61]. |
| Bidirectional LSTM + BERT (bert-base-ML) [62] | MeOffendEs IberLEF 2021 | Better results were obtained with the Bi-LSTM model |
| Transformer-based [46] | IberLEF 2021 | The pre-trained transformers for Spanish were very helpful in the modeling process. Most of the top-ranked participants used transformers. The authors believe that more specialized mechanisms could further boost performance when using transformers. |