The role of code words in hate speech, elaborated in the preliminary study, provided strong motivation to develop an automatic hate speech identification method that does not rely on specific terminologies. One shortcoming of traditional approaches in the hate speech domain is the lack of contextual information and the heavy reliance on annotated resources with meta-linguistic information. The advantage of a pattern-based approach over lexicon-based approaches is that it provides linguistic cues that ensure resilience.
The method of extracting patterns in [4] provides a flexible representation of an underlying sentence structure. That work focuses on extracting patterns and performing multi-class emotion classification. Building on this method, we implemented an unsupervised graph approach to identify patterns used in hate speech. The pattern extraction process was a prerequisite for classification. Once the patterns were extracted, each pattern was evaluated with a ranking algorithm that assigns a pattern score. This score is significant because it expresses the pattern's relevance to the target categories of hate speech (HS) or non-hate speech (NHS). The patterns and their scores serve as features for classification. Finally, a hate speech classifier was constructed based on a vector multiplication approach that represents tweets as vectors of the frequency of each pattern set.
4.1. Graph Construction
The proposed methodology requires two data sources, which are then transformed into a graph representation. These two data sources correspond to the opposing target classes of the classification task. For this study, we can think of them as one collection containing hate speech and another containing non-hate-speech expressions, for example, HateComm and TwitterClean, respectively.
Given the normalized datasets, each word in them is considered a token. A list of the weights of each token pair is constructed for each class: $W_{HS}$ for hate speech and $W_{NHS}$ for non-hate speech. Calculating the weights of each token pair is necessary, as it allows the framework to identify the underlying structures in the tweets, capturing those words that are commonly used together. For instance, a post "Build the wall higher!!" results in the following token pairs: (Build, the), (the, wall), (wall, higher), and (higher, !).
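As a concrete illustration, the token-pair construction can be sketched in a few lines of Python. The tokenizer below is a simplifying assumption (regex word/punctuation splitting with repeated marks collapsed); the actual normalization pipeline described above may differ.

```python
import re
from collections import Counter

def tokenize(post: str) -> list[str]:
    # Assumed tokenizer: collapse runs of the same punctuation mark
    # ("!!" -> "!") and split words from punctuation.
    post = re.sub(r"([^\w\s])\1+", r"\1", post)
    return re.findall(r"\w+|[^\w\s]", post)

def token_pairs(post: str) -> list[tuple[str, str]]:
    # Adjacent-token pairs of the post
    tokens = tokenize(post)
    return list(zip(tokens, tokens[1:]))

pair_counts = Counter()
for post in ["Build the wall higher!!"]:
    pair_counts.update(token_pairs(post))
# -> (Build, the), (the, wall), (wall, higher), (higher, !)
```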
Definition 8. (Token Pair Weight) For a token pair $(t_i, t_j)$, its normalized weight $w_{(t_i, t_j)}$ can be computed as shown in Equation (9):

$$w_{(t_i, t_j)} = \frac{f_{(t_i, t_j)}}{\sum_{(t_k, t_l)} f_{(t_k, t_l)}} \qquad (9)$$

where $f_{(t_i, t_j)}$ is the frequency of token pair $(t_i, t_j)$ in the class corpus. A weight aggregation is calculated to identify which of the two classes each token pair highly represents. The goal of this step is to ensure that the weights of the token pairs represent how common they are in the specific pattern class.
Definition 9. Subsequently, new weights for arcs in $W_{HS}$ are assigned based on a pairwise adjustment, as shown in Equation (10):

$$w'_{HS}(t_i, t_j) = w_{HS}(t_i, t_j) - w_{NHS}(t_i, t_j) \qquad (10)$$

A similar calculation, based on a pairwise measurement, was done for $W_{NHS}$, as shown in Equation (11):

$$w'_{NHS}(t_i, t_j) = w_{NHS}(t_i, t_j) - w_{HS}(t_i, t_j) \qquad (11)$$

Arcs with high weights represent token sequences that are more common or relevant in the respective class. Lower weights represent either tokens that are more representative of the opposite class or token sequences that are simply common overall. Furthermore, the weights in $W_{HS}$ and $W_{NHS}$ are pruned based on a threshold $\theta_w$.
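A minimal sketch of Equations (9)-(11), assuming the pairwise adjustment is the subtraction reconstructed above; the toy counters stand in for the per-class corpora and the threshold value is illustrative:

```python
from collections import Counter

# Toy per-class token-pair counts (stand-ins for the real corpora)
pair_counts_hs = Counter({("build", "the"): 3, ("the", "wall"): 5})
pair_counts_nhs = Counter({("the", "wall"): 1, ("the", "park"): 4})

def normalize(pair_counts: Counter) -> dict:
    # Eq. (9): relative frequency of each token pair within its class
    total = sum(pair_counts.values())
    return {pair: freq / total for pair, freq in pair_counts.items()}

def adjust_and_prune(w_a: dict, w_b: dict, theta_w: float) -> dict:
    # Eqs. (10)-(11): subtract the opposite class's weight, then keep
    # only arcs whose adjusted weight exceeds the threshold theta_w
    adjusted = {p: w - w_b.get(p, 0.0) for p, w in w_a.items()}
    return {p: w for p, w in adjusted.items() if w > theta_w}

w_hs, w_nhs = normalize(pair_counts_hs), normalize(pair_counts_nhs)
adj_hs = adjust_and_prune(w_hs, w_nhs, theta_w=0.0)   # W'_HS
adj_nhs = adjust_and_prune(w_nhs, w_hs, theta_w=0.0)  # W'_NHS
```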
With the extracted tokens and their adjusted weights, two weighted directed graphs were constructed: the hate speech graph $G_{HS}$ and the non-hate-speech graph $G_{NHS}$, in which nodes represent tokens and arcs carry the adjusted weights of the corresponding token pairs.
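Continuing from the sketches above, the two graphs can be built with networkx (using networkx is an assumption of this sketch; the paper does not name a graph library):

```python
import networkx as nx

def build_graph(adjusted_weights: dict) -> nx.DiGraph:
    # Nodes are tokens; a directed arc t_i -> t_j carries the adjusted,
    # pruned weight of the token pair (t_i, t_j).
    G = nx.DiGraph()
    for (t_i, t_j), w in adjusted_weights.items():
        G.add_edge(t_i, t_j, weight=w)
    return G

G_hs = build_graph(adj_hs)    # hate speech graph G_HS
G_nhs = build_graph(adj_nhs)  # non-hate-speech graph G_NHS
```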
Two different graph measurements were used to determine connector words and subject words. We believe these two types of words constitute the building blocks of written expression, and each carries out its own important function. They are also related to the broader concepts of syntax and semantics; however, syntactic structure can also convey meaning [24].
Connector words (CW) are those that play an important role in the syntax and structure of a text, similar to the idea of conjunction described by Halliday et al. [25]. The intuition is that these words are central in the graph of a corpus, since they enable numerous connections. Eigenvector centrality was used to rank tokens while avoiding the promotion of merely frequent words. Eigenvector centrality assigns a score to every node in a graph based on the idea that connections to high-scoring nodes contribute more to a node's score than connections to low-scoring nodes. Nodes with an eigenvector centrality score higher than a threshold $\theta_{CW}$ were selected as connector words.
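A sketch of the connector-word selection, assuming the networkx graph built above; the threshold value passed in is hypothetical, and `max_iter` is raised only to help the power iteration converge:

```python
import networkx as nx

def connector_words(G: nx.DiGraph, theta_cw: float) -> set[str]:
    # Eigenvector centrality rewards connections to well-connected
    # nodes rather than raw frequency.
    centrality = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
    return {token for token, score in centrality.items() if score > theta_cw}

cw_hs = connector_words(G_hs, theta_cw=0.1)  # threshold value is illustrative
```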
Subject words (SW) are those that can elicit a concept related to the class of the corpus. This list of words was extracted after the list of connector words had been obtained. Since the graph had already been pruned, we could assume that words highly connected to connector words are likely to represent information related to the topic of the graph. In contrast to connector words, subject words focus on the closeness of a word group. Hence, the clustering coefficient was calculated to select words within a specified range: nodes with a clustering coefficient higher than a threshold $\theta_{SW}$ were selected as subject words.
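Subject-word selection via the clustering coefficient can be sketched the same way; computing the coefficient on the undirected projection of the pruned graph, and the threshold value, are assumptions of this sketch:

```python
import networkx as nx

def subject_words(G: nx.DiGraph, theta_sw: float) -> set[str]:
    # Weighted clustering coefficient: how tightly a token's
    # neighbourhood is interlinked in the pruned graph.
    coeff = nx.clustering(G.to_undirected(), weight="weight")
    return {token for token, score in coeff.items() if score > theta_sw}

sw_hs = subject_words(G_hs, theta_sw=0.3)  # threshold value is illustrative
```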
4.2. Pattern Extraction
The motivation for extracting linguistic patterns, rather than a set of unigrams, was to obtain features that are richer and more representative. To avoid overly long patterns and increased computational effort, we considered patterns of two to three words. To keep the grammatical structure of a statement intact, each extracted pattern must contain at least one word from each category (CW and SW). Two-word pattern candidates follow the templates <cw, sw> and <sw, cw>, while three-word pattern candidates include the combinations <cw, cw, sw>, <sw, cw, cw>, and <cw, sw, cw>. There are cases where a word is marked as both CW and SW; in that case, both representations are produced. In Table 6, examples of the pattern candidate templates and what they capture are presented.
As shown, the SW in the pattern examples were substituted with a wildcard symbol "*". This substitution provides the flexibility of matching other subject words while keeping the underlying structure of the pattern intact. Additionally, it permits the patterns to be applied to other domains. Since our work focused on identifying linguistic cues that can be used to detect hate speech, we were interested in the general pattern that an expression represents.
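The template matching can be sketched as a sliding window over the tokens, emitting every admissible labelling when a word belongs to both CW and SW, as described above. Function names and structure here are illustrative:

```python
from itertools import product

TEMPLATES = {("cw", "sw"), ("sw", "cw"),
             ("cw", "cw", "sw"), ("sw", "cw", "cw"), ("cw", "sw", "cw")}

def extract_patterns(tokens: list[str], cw: set[str], sw: set[str]) -> set[tuple]:
    patterns = set()
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            # A token may be labelled cw, sw, or both.
            labels = [[lbl for lbl, ok in (("cw", t in cw), ("sw", t in sw)) if ok]
                      for t in window]
            if not all(labels):
                continue  # some token is neither CW nor SW
            for combo in product(*labels):
                if combo in TEMPLATES:
                    # Subject words collapse to the wildcard "*".
                    patterns.add(tuple("*" if lbl == "sw" else tok
                                       for tok, lbl in zip(window, combo)))
    return patterns

extract_patterns(["build", "the", "wall"], cw={"the"}, sw={"build", "wall"})
# -> {("*", "the"), ("the", "*")}
```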
4.3. Pattern Ranking
These linguistic patterns act as input features for a learning model. The extracted set contains many patterns that are either too frequent in the class or very infrequent. To ensure that we obtain substantive patterns that provide useful information and truly represent their respective class, pattern ranking is crucial. To conduct this ranking, a customized term frequency-inverse document frequency (TF-IDF) measure proposed in [4] was adopted. This method is composed of three measures: pattern frequency, inverse hate speech frequency, and diversity degree.
Definition 10. (Pattern frequency) The frequency of a pattern p in a collection of social data related to hate speech h. The log-scaled pattern frequency is denoted as:

$$pf_{p,h} = \log(1 + f_{p,h})$$

where $f_{p,h}$ is the frequency of pattern p in hate speech collection h.

Definition 11. (Inverse hate speech frequency) The inverse hate speech frequency measures how common or rare the pattern p is across all hate speech collections H and is computed as:

$$ihf_p = \log\frac{|H|}{|\{h \in H : f_{p,h} > 0\}|}$$

where the denominator counts the hate speech collections in which pattern p appears.

Definition 12. (Diversity Degree) Diversity is based on a pattern capturing unique hate words in a collection through its wildcard. If a pattern captures a wider range of subject words, its diversity ranks higher, indicating that it is a better representation of the kind of linguistic cues used in hate speech.

Let $dd_p$ denote the diversity degree of a pattern p, which is calculated as:

$$dd_p = \log(1 + u_p)$$

where $u_p$ represents the number of unique words across hate speech collections that the pattern p can capture through its wildcard or placeholder "*".

Definition 13. (Hate Degree) Finally, all three measures, pattern frequency ($pf_{p,h}$), inverse hate speech frequency ($ihf_p$), and diversity degree ($dd_p$), were multiplied to form the hate degree ($hd_p$):

$$hd_p = pf_{p,h} \times ihf_p \times dd_p$$

However, the scope of this degree is limited to its own class: it truly represents a pattern's importance within its class but does not take into account how representative the pattern is of the other class. Thus, a degree normalization was executed:

$$\overline{hd}_p = \frac{hd_p}{hd_p + nhd_p}$$
A similar calculation was done for non-hate candidate patterns, with $nhd_p$ denoting the non-hate degree:

$$\overline{nhd}_p = \frac{nhd_p}{hd_p + nhd_p}$$
Patterns were pruned based on a degree threshold $\theta_d$. This pruning ensures that patterns that are not representative of the class are removed; high-ranking patterns represent the class better than low-ranking ones. Patterns are ranked by their normalized degree, since our work focused on generating distinct patterns that truly represent their class. The result of this whole process is two distinct sets of ranked patterns, $R_{HS}$ and $R_{NHS}$, representing the hate and non-hate classes, respectively.
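Putting Definitions 10-13 together, the ranking step might look as follows. The log-scaled forms and the share-of-combined-degree normalization are reconstructions from the definitions above, and the threshold value is illustrative:

```python
import math

def degree(freq: int, n_collections: int, n_with_pattern: int,
           unique_words: int) -> float:
    pf = math.log(1 + freq)                         # Definition 10
    ihf = math.log(n_collections / n_with_pattern)  # Definition 11
    dd = math.log(1 + unique_words)                 # Definition 12
    return pf * ihf * dd                            # Definition 13

def rank_patterns(hd: dict, nhd: dict, theta_d: float) -> list[tuple]:
    # Normalize each pattern's degree against the opposite class and
    # prune patterns that fall below the degree threshold theta_d.
    ranked = []
    for p, d in hd.items():
        other = nhd.get(p, 0.0)
        score = d / (d + other) if (d + other) > 0 else 0.0
        if score > theta_d:
            ranked.append((p, score))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)
```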
4.4. Hate Speech Classification
Given an incoming social post, the patterns contained in it are identified to generate two frequency vectors: $F_{HS}$ for the hate pattern set and $F_{NHS}$ for the non-hate pattern set.

The frequency vector for hate:

$$F_{HS} = [f_{p_1}, f_{p_2}, \ldots, f_{p_n}]$$

where $f_{p_i}$ is the frequency of pattern $p_i \in R_{HS}$ in the post. The frequency vector for non-hate:

$$F_{NHS} = [f_{q_1}, f_{q_2}, \ldots, f_{q_m}]$$

where $f_{q_i}$ is the frequency of pattern $q_i \in R_{NHS}$ in the post.
The classification of the post was computed by multiplying each frequency vector with the degree scores of its pattern set:

$$class(post) = \begin{cases} HS, & \text{if } F_{HS} \cdot \overline{hd} \geq F_{NHS} \cdot \overline{nhd} \\ NHS, & \text{otherwise} \end{cases}$$

The pattern set whose multiplication yields the higher value determines the class of the post.
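A minimal sketch of this final step, interpreting the vector multiplication as a dot product between each frequency vector and the corresponding normalized pattern degrees (an assumption of this sketch; names are illustrative):

```python
from collections import Counter

def classify(post_patterns: Counter, ranked_hs: dict, ranked_nhs: dict) -> str:
    # Dot product of the pattern frequencies in the post with the
    # normalized degree of each ranked pattern; higher score wins.
    score_hs = sum(post_patterns[p] * d for p, d in ranked_hs.items())
    score_nhs = sum(post_patterns[p] * d for p, d in ranked_nhs.items())
    return "HS" if score_hs >= score_nhs else "NHS"
```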