3.1. Architecture Overview
We divided the overall model into three parts according to their functions, namely the Input Module, the Clause Module and the Output Module. In the Input Module, we used a bag of words (BOW) to binarize Chinese texts into one-hot vector form. In the spam review detection task, following previous works, features of users and products were introduced as additional features, which helped the model to detect deliberately fabricated reviews; this was equivalent to adding prior knowledge to the model. Specifically, we converted the user and product features into binary form, and then the text features and additional features were fused together as the final output of the module (see Section 3.2 for details).
As Figure 1 shows, the Clause Module consisted of a series of clauses that evaluated the input vector. The clauses were composed of a series of Tsetlin Automata (TA) teams, where $TA_k$ handled the non-negated literal $x_k$ and $TA'_k$ the negated literal $\neg x_k$. Each automaton decided whether its literal was excluded from or included in the clause, as shown in Equation (1) (see Section 3.3 for details and Section 3.5 for the learning process):

$$C_i^w(X) = f\big(TA_1, \ldots, TA_n, TA'_1, \ldots, TA'_n\big) = \bigwedge_{k \in I_i} x_k \wedge \bigwedge_{k \in \bar{I}_i} \neg x_k, \tag{1}$$

where $C_i^w$ is the clause, the function $f$ is determined by the actions of the $TA_k$ and $TA'_k$ (which yield the include sets $I_i$ and $\bar{I}_i$), $w$ is the polarity of the clause, and $c$ is the number of clauses, which can be configured manually, with $i \in \{1, \ldots, c\}$. Finally, each clause of the Clause Module outputs a score for each category. In the Output Module, all scores are summarized, and the highest one is taken as the final output category.
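To make the three-module flow concrete, the following is a minimal Python sketch (our illustration, not the paper's code); the clause banks stand in for the Clause Module of Section 3.3, and all names are assumptions:

```python
import numpy as np

def predict(x_bin, clause_banks):
    # The Input Module has already binarized the review into x_bin (0/1 vector).
    # Each class owns a bank of clauses; bank(x_bin) returns that class's
    # summed clause score (Clause Module).
    scores = [bank(x_bin) for bank in clause_banks]
    # Output Module: the class with the highest summed score wins.
    return int(np.argmax(scores))
```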
3.2. Input Module
The Input Module provides the Chinese texts' embedding vectors for the subsequent modules. TM requires input features to be binary, so that it can decompose the problem into multiple sub-patterns.
We binarized reviews into one-hot form. As BOW might lead to an overly sparse distribution, we followed a previous work [29] to alleviate this problem. Firstly, we calculated the cosine similarity of word embeddings, then treated words with similar meanings as a whole and marked them together. In addition, we kept only high-frequency words, used the chi-square test to measure the importance of each word, and selected the key words. The bag size was reduced by a factor of three without losing much useful information. For the spam review detection task, we introduced additional features, including user behavior and product-related features, which increased the prior knowledge available to the model. The TM learning process automatically learnt the relationships between these features and the categories, which ultimately improved performance. The statistical features of the users' behaviors and the products' properties used in our framework included the mean, maximum and minimum values of the voting numbers, the lengths of related comments and the percentage of spam reviews. We adopted product features associated with the comments, which were actually movie features from Douban, an influential Chinese online platform for reviews of movies, music and books.
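As a minimal sketch of this vocabulary reduction and binarization (our illustration, not the authors' code: the toy documents, the random stand-in embeddings and the similarity threshold are all assumptions):

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
vocab = ["好", "可爱", "名字", "差", "糟糕", "电影"]
embeddings = rng.normal(size=(len(vocab), 50))  # stand-in word embeddings

# Step 1: merge words whose embeddings are nearly parallel (cosine similarity).
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T
token_of = np.arange(len(vocab))
for i in range(len(vocab)):
    for j in range(i + 1, len(vocab)):
        if sim[i, j] >= 0.8:          # similar meanings: mark together
            token_of[j] = token_of[i]

# Step 2: binarize each document into a one-hot BOW over the merged tokens.
docs = [["好", "可爱", "名字"], ["差", "糟糕", "电影"]]
y = np.array([1, 0])
X = np.zeros((len(docs), len(vocab)), dtype=np.uint8)
for d, doc in enumerate(docs):
    for w in doc:
        X[d, token_of[vocab.index(w)]] = 1

# Step 3: rank features with the chi-square test and keep the top third.
used = X.sum(axis=0) > 0              # drop empty (merged-away) columns
scores, _ = chi2(X[:, used], y)
keep = np.argsort(scores)[::-1][: max(1, used.sum() // 3)]
X_reduced = X[:, np.flatnonzero(used)[keep]]
```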
These additional features also needed to be converted into the binary input form required by TM, so we adopted the following conversion mechanism (a code sketch is given at the end of this subsection):
1. First, for each feature, the unique values were identified and then sorted from smallest to largest;
2. The sorted unique values were treated as thresholds and saved in the feature matrix U;
3. We compared each original feature value with its own thresholds, setting the output to 1 if the feature value was not smaller than the threshold and to 0 otherwise;
4. We repeated the above steps for every feature to obtain the final Boolean vector.
For the additional features, including movie features and user features, we first calculated the distinct values of each column in the feature matrix. For convenience, we retained no more than 25 distinct values for each feature and saved them in the matrix U. Each row in U represented one additional input feature, and the width of that row was the number of distinct values of that feature (no more than 25 in our experiment). For example, the binary conversion could be performed according to the following rule:

$$b_{i,j} = \begin{cases} 1, & \text{if } X_i \geq U_{i,j}, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

In Equation (2), $i$ represents the index of a certain feature $X_i$ from the input (one column of $X$), and $j$ represents the index of that feature's distinct values saved beforehand in $U$ ($j$ is smaller than 25 in our experiment).
For the sake of illustration, we considered one sample's additional feature $x \in \mathbb{R}^d$ as an example ($d$ represents the dimension of the feature). As shown in Figure 2, if one dimension $x_i$ of the feature was smaller than one of the distinct values $U_{i,j}$ previously counted in $U$ (where $j$ is the index of the distinct values of feature $i$, with $j$ smaller than 25), then the corresponding output was marked as 0, and otherwise marked as 1. In this way, any input feature was expanded to the number of its distinct values (at most 25). This was equivalent to a statistical judgment of the relationship between every element of a feature and all elements of that feature.
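The following is a minimal sketch of this threshold expansion (Equation (2)); the function names and the toy matrix are our illustration, not the paper's code:

```python
import numpy as np

def build_thresholds(X, max_values=25):
    """Collect up to `max_values` sorted distinct values per feature;
    together they play the role of the matrix U."""
    return [np.unique(X[:, i])[:max_values] for i in range(X.shape[1])]

def expand(X, U):
    """Equation (2): compare every feature value against its own
    thresholds; output 1 where value >= threshold, else 0."""
    blocks = [(X[:, [i]] >= u[None, :]).astype(np.uint8)
              for i, u in enumerate(U)]
    return np.hstack(blocks)

# Toy usage: three samples, two additional features.
X = np.array([[0.2, 10.0],
              [0.5, 30.0],
              [0.9, 30.0]])
U = build_thresholds(X)
X_bin = expand(X, U)   # each feature expands to its number of distinct values
```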
3.3. Clause Module
The Clause Module was composed of a series of Tsetlin Automata (TA), which can automatically learn the optimal action in an unknown stochastic environment and thereby learn a set of sub-patterns expressed in propositional logic. That meant a feature, or, in other words, a sub-pattern, was captured by a group of TAs, which formed a clause. One TA was only responsible for including or excluding a single literal within a clause. The complex relationships were, therefore, captured jointly by distinct clauses for a certain class.
As shown in Figure 1, the sub-patterns associated with the target classes were captured by conjunctive clauses. The number of clauses, namely $c$, was set manually. We assumed that the dimension of the feature vector produced by the Input Module was $n$. To be clear, $n$ was not fixed; it was determined by the size of the word bag under the current configuration. For each feature $x_k$, we set two two-action TAs, namely $TA_k$ and $TA'_k$, to learn the original state and the opposite state of $x_k$. The opposite one was called $\neg x_k$, as Equation (3) shows:

$$\neg x_k = 1 - x_k. \tag{3}$$
As Figure 3 shows, each $TA_k$ or $TA'_k$ had $N$ states per action and performed the Include or Exclude action based on its current state, which then triggered a reward or a penalty. The Include action represented the inclusion of the current literal, and the Exclude action meant the abandonment of the current literal. $TA_k$ learnt the state changes of the input literal $x_k$, and $TA'_k$ learnt those of the opposite input literal $\neg x_k$. The function $f$ collected the outputs of all $TA_k$ and $TA'_k$, namely the include/exclude decisions for $x_k$ and $\neg x_k$, and formed the clause $C_i^w$, where $k$ was the index of the input dimensions, $i$ was the index of the clauses and $w$ denoted the polarity of the clause ($w = 1$ represented positive and $w = 0$ represented negative). The feedback mechanism affected the choice of each action, based on whether the action was reasonable, by rewarding or punishing. When a TA received a reward, it reinforced the action performed by moving towards the innermost state. However, when a penalty occurred, the TA moved towards the center to reduce the reliance on the current action and weaken it, eventually switching to the other action.
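A minimal Python sketch of such a two-action automaton follows (our illustration; the state count and the convention that low states select Include are assumptions in line with common TM descriptions):

```python
class TsetlinAutomaton:
    """Two-action automaton with N states per action:
    states 1..N choose Include, states N+1..2N choose Exclude."""
    def __init__(self, n_states=100):
        self.n = n_states
        self.state = n_states          # start next to the decision boundary

    def action(self):
        return "include" if self.state <= self.n else "exclude"

    def reward(self):
        # Reinforce the current action: move towards the innermost state.
        if self.action() == "include":
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self):
        # Weaken the current action: move towards the centre and
        # eventually across it, switching to the other action.
        self.state += 1 if self.action() == "include" else -1
```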
TM used multiple clauses to enhance the generalization ability and output a better judgment. Half of the clauses learnt positive polarity, and the other half learnt negative polarity. For convenience, we used odd indices for clauses with positive polarity, $C_i^+$, and even indices for clauses with negative polarity, $C_i^-$.
As an example, let the current literal be $l_k$, the clause's index be $i$, and the polarity be $w$. If $i$ was an odd number, the current clause learnt positive aspects ($w = 1$). The calculation formula of $C_i^{+}(X)$ was given by Equation (4), where $\wedge$ meant multiplication over $\{0, 1\}$ and $c$ was the number of clauses (set manually):

$$C_i^{+}(X) = \bigwedge_{l_k \in I_i} l_k, \qquad i \in \{1, 3, \ldots, c-1\}, \tag{4}$$

where $I_i$ is the set of literals whose TAs chose the Include action. On the contrary, if $i$ was an even number, the current clause learnt a negative aspect ($w = 0$). The calculation of $C_i^{-}(X)$ is given by Equation (5):

$$C_i^{-}(X) = \bigwedge_{l_k \in I_i} l_k, \qquad i \in \{2, 4, \ldots, c\}. \tag{5}$$
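As a sketch of how clause outputs are computed and aggregated into a class score (our illustration of Equations (3)-(5), with illustrative names; in the multi-class setting each class keeps its own pool of clauses):

```python
import numpy as np

def clause_output(literals, include_mask):
    # Conjunction (multiplication over {0,1}) of all included literals.
    # Convention: a clause with no included literals outputs 1.
    if not include_mask.any():
        return 1
    return int(literals[include_mask].min())

def class_score(x, include_masks):
    # literals = [x_1..x_n, ~x_1..~x_n], following Equation (3).
    literals = np.concatenate([x, 1 - x])
    v = 0
    for i, mask in enumerate(include_masks, start=1):
        out = clause_output(literals, mask)
        v += out if i % 2 == 1 else -out   # odd: +, even: - (Eqs. (4)-(5))
    return v
```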
3.5. Interpretative Learning Process
Differing from the convergence method of neural networks, which minimizes the output error, the learning process of TM was adjusted according to four factors: the actual output, the predicted output, and the current state and action of the TAs. Considering a particular situation, we analyzed the inclusion of a literal in clause $C_j^w$, where $j$ represented the index and $w$ represented the polarity, with Inclusion and Exclusion as the two candidate actions. Once the clause produced its output, feedback was triggered.
As shown in Figure 4, when the truth value of the literal was 1 and $C_j$ had positive polarity ($w$ was 1), the TA received Reward feedback with a probability of $\frac{s-1}{s}$. Here, $s$ was a hyperparameter (set manually) that influenced the probabilities of Reward, Inaction and Penalty. Type I Feedback reinforced the true positive output of a clause (the clause output was 1 when it had to be 1) by introducing more literals from the samples. This pushed the clauses' predictions closer to 1 (when the label was 1) and helped the model avoid false negatives (the clause output was 0 when it was supposed to be 1). We further sub-divided Type I Feedback into Type Ia and Type Ib: Type Ia reinforced Include actions, while Type Ib reinforced the Exclude actions of the TAs. Together, Type Ia and Type Ib Feedback forced clauses to output 1. To diversify the clauses, they were targeted for Type I Feedback stochastically as follows:
$$p_j = \frac{T - \max\big(-T,\ \min(T, v)\big)}{2T}. \tag{7}$$

We wanted different sub-patterns to be learnt, which meant it was better for the clauses to be allocated smartly among them. In Equation (7), $T$ was a manually set parameter determining the number of clauses available to learn each sub-pattern in each class. A higher $T$ allocated more clauses to learn each sub-pattern and increased the robustness of learning. The probability of clause $j$ receiving Type I Feedback, $p_j$, was influenced by $T$ and the current vote sum $v$. The decisions for the complete set of clauses to receive Type I Feedback were organized in the vector $(d_1, \ldots, d_c)$, where $d_j = 1$ with probability $p_j$.
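A sketch of this clause selection (Equation (7)) in Python, with illustrative names:

```python
import numpy as np

def select_for_type_i(v, T, num_clauses, rng=None):
    """Sample which clauses receive Type I feedback: each clause is
    chosen with probability (T - clamp(v, -T, T)) / (2T), so updates
    taper off once the vote sum v approaches the target T."""
    rng = rng or np.random.default_rng()
    p = (T - np.clip(v, -T, T)) / (2 * T)
    return rng.random(num_clauses) < p
```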
Once the clauses to receive Type I Feedback had been selected according to Equation (7), the probability of updating an individual TA depended on the clause output value, i.e., 1 for Type Ia feedback and 0 for Type Ib feedback. Specifically, the probability that a TA was updated according to Type Ia feedback when its literal also had value 1 (the $k$th TA of the $j$th clause that received Type Ia Feedback) was given by $p_{j,k}^{\mathrm{Ia}}$, shown in Equation (8):

$$p_{j,k}^{\mathrm{Ia}} = \frac{s-1}{s}. \tag{8}$$

Similarly, the probability for Type Ib Feedback was given by $p_{j,k}^{\mathrm{Ib}}$ in Equation (9):

$$p_{j,k}^{\mathrm{Ib}} = \frac{1}{s}. \tag{9}$$

For Type Ia, the current state moved further towards the Include action direction, while for Type Ib Feedback, the current state moved towards the Exclude action direction to strengthen the Exclude actions of the TAs.
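Reusing the TsetlinAutomaton sketch above, a minimal rendering of Equations (8) and (9) could look as follows (our illustration; the update directions follow the standard TM description):

```python
import numpy as np

def type_i_update(ta, clause_out, literal, s, rng=None):
    rng = rng or np.random.default_rng()
    if clause_out == 1 and literal == 1:
        # Type Ia, Equation (8): with probability (s-1)/s,
        # push the automaton towards the Include action.
        if rng.random() < (s - 1) / s:
            if ta.action() == "include":
                ta.reward()
            else:
                ta.penalize()
    else:
        # Type Ib, Equation (9): with probability 1/s,
        # push the automaton towards the Exclude action.
        if rng.random() < 1 / s:
            if ta.action() == "exclude":
                ta.reward()
            else:
                ta.penalize()
```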
In comparison, the mechanism of Type II Feedback in Figure 5 was simpler. It did not need the adjustment of the hyperparameter $s$, but worked with absolute probabilities instead. For example, when the truth value of the literal was 0 and $C_j$ had positive polarity, the corresponding TA received a Penalty with a probability of 1. To turn the clause output from 1 to 0, a literal with value 0 simply had to be included in the clause. Clauses with positive polarity needed Type II feedback when $y = 0$, and clauses with negative polarity needed it when $y = 1$, as they did not want to vote for the opposite class. Again, using the user-set target $T$, the decision for the $j$th clause was made as follows:

$$q_j = \frac{T + \max\big(-T,\ \min(T, v)\big)}{2T}. \tag{10}$$

The states of the TAs having corresponding literals with value 0 in the selected clauses were then moved towards the Include action direction with a probability of 1, according to Equation (10). Type II Feedback introduced literals that made the clause evaluation false when a false alarm error occurred, which increased the discrimination power of the model and suppressed false alarm errors (the clause output 1 when it had to be 0). It pushed clause predictions closer to 0.
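A sketch of Type II feedback on a selected clause (our illustration; clause selection uses Equation (10) analogously to Equation (7)):

```python
def type_ii_update(tas, literals, clause_out):
    """When a selected clause wrongly outputs 1, push every TA whose
    literal is 0 towards Include (probability 1): once such a literal
    enters the conjunction, the clause output flips from 1 to 0."""
    if clause_out != 1:
        return
    for ta, lit in zip(tas, literals):
        if lit == 0 and ta.action() == "exclude":
            ta.penalize()   # step towards the Include side
```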
To illustrate the learning process of TM, we used an example from a sentiment analysis task: X = 好可爱的名字[可爱], translated as "Such a cute name [cute]", with a positive label y = 1. The input was tokenized and negated: such = 1, cute = 1, name = 1, ¬such = 0, ¬cute = 0, ¬name = 0.
We considered two positive polarity clauses and one negative polarity clause (i.e., $C_1^+$, $C_3^+$ and $C_2^-$) to show the different feedback conditions occurring while learning the propositional rules. As Figure 6 shows, for the first time stamp $t_1$ in the figure, we assumed the clauses were initialized randomly with the literal sets shown there. Since clause $C_1^+$ consisted of non-negated literals and output 1, the condition was $y = 1$ and $C_1^+(X) = 1$. As shown in Equation (8), Type I (a) feedback was used to reinforce the Include action. Therefore, 1-valued literals were included for the second time stamp $t_2$, as Figure 6 shows.
Similarly, for $C_3^+$, the output was 0 because a 0-valued literal was included. Therefore, Type I (b) feedback was triggered, and Exclude actions started under the condition $y = 1$ and $C_3^+(X) = 0$.
Type II feedback (right side of the figure) only occurred when y = 0 (for positive polarity clauses) or when y = 1 (for negative polarity clauses). We took the negative polarity clause to demonstrate the effect of Type II feedback: the model included 0-valued inputs to change the output of the affected clause from 1 to 0.
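To close the loop, a toy run of the example above (the literal sets below are hypothetical, chosen only to reproduce the three feedback conditions; the actual ones are shown in Figure 6):

```python
import numpy as np

# Vocabulary order: [such, cute, name]; literals = [x, 1 - x] (Eq. (3)).
x = np.array([1, 1, 1])                  # such = 1, cute = 1, name = 1
literals = np.concatenate([x, 1 - x])    # negated half is all 0 here

# C1+ includes only non-negated (1-valued) literals -> output 1.
# With y = 1 this is a true positive: Type Ia feedback (Eq. (8)).
c1 = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
print(int(literals[c1].min()))           # 1

# C3+ includes a negated (0-valued) literal -> output 0.
# With y = 1 this is a false negative: Type Ib feedback (Eq. (9)).
c3 = np.array([1, 0, 0, 0, 1, 0], dtype=bool)
print(int(literals[c3].min()))           # 0

# C2- wrongly outputs 1 on a y = 1 sample: Type II feedback includes
# a 0-valued literal so that the clause output flips to 0.
c2 = np.array([0, 1, 0, 0, 0, 0], dtype=bool)
print(int(literals[c2].min()))           # 1 before Type II feedback
```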