This section presents the proposed array-embedded tree algorithm for
TFUIM, called the
ATTFUM algorithm. In the past, a two-phase algorithm [
18] was introduced to solve the temporal fuzzy utility mining problem. This study uses a one-phase method to find
HTFUIs according to the upper bound model [
16] and FP-tree [
19]. First, the data-processing step needs to be processed. The item quantities in all the transactions are transformed into fuzzy sets based on the membership functions. If the
tfuubr value of a linguistic term is not less than the specified threshold, it is treated as an
HTFUUBI. Then, the linguistic terms which are not
HTFUUBIs are deleted from the transformed database. The remaining terms are then sorted in descending counts. The sorted terms in each transaction are used to build the tree. Finally, a mining procedure similar to the FP-Growth in FIM is designed to find out potential candidates and determine whether they are
HTFUIs. The details for the steps of the proposed method with examples are described in the following four subsections.
4.1. Transformed into a Fuzzy Database
In order to execute the
ATTFUM method for mining
HTFUIs, the pre-processing step is first executed. First, each transaction
TID in a database is converted according to its sale time to a specific time period. The start time period of each item in the given database is then recorded. Next, the quantities of all the items in a database are fuzzified into linguistic terms based on the defined membership functions. In the transformation process, the fuzzy value of each transformed term can be obtained as well. For the
TQD in
Table 1, the start time periods (
STPs) of all the items are shown in
Table 3.
Assume the five items in
TQD use the same membership functions in
Figure 1. The final fuzzified results are listed in the last column of
Table 1. Then the upper-bound measure is utilized to generate the downward-closure property. The
mtfu value of each transaction needs to be calculated. Thus, to calculate the upper-bound value of a transaction, the
fu value of each linguistic term in the transformed database (the last column of
Table 1) needs to be obtained. For example, for the linguistic term (
a.L) with the value 0.66 in
T2, its
fu2,a.L is 5.36 (=0.67 × 4 × 2). The
fu values of the other linguistic terms in
T2 are
fu2,a.M (2.64),
fu2,b.L (16.08), and
fu2,a.M (7.92), respectively. Thus, the
mtfu and
tfu values of
T2 are 21.44 (=5.36 + 16.08) and 32 (=5.36 + 2.64 + 16.08 + 7.92). The
mtfu and
tfu values for all the transactions are listed in
Table 4.
Next, the
tfuubr value of each linguistic term is calculated. Take the linguistic term (
a.L) in the last column of
Table 1 as an example. It can be observed that the term (
a.L) happens in
T2,
T3,
T4,
T5, and
T6, and their
mtfu values are shown in
Table 4. Besides, the
STPall derived from
Table 3 is
P2. Assume the defined threshold is 40%. According to definition 10, the
tfuubr value of the linguistic term (
a.L) is first handled. The numerator is 139.56 (=21.44 + 1.98 + 45.42 + 47.36 + 23.36). Then, the denominator is calculated as 128.68 (=1.98 + 53.34 + 47.36 + 26), which depicts the
tfu value from
STPall (
P2) to
LTPall (
P3). The
tfuubr value of the linguistic term (
a.L) is then 108.4% (=139.56/128.68), which is not less than the defined threshold. Therefore, the linguistic term (
a.L) is an
HTFUUBI. The same process is performed for the other terms, and the sorted results based on their occurrence frequency are displayed in
Table 5.
The linguistic terms that are not
HTFUUBIs are then deleted from the transformed database. The remaining linguistic terms in a transaction are sorted by descending counts and formed as a revised database. The resultant revised database is shown in
Table 6.
The rearranged terms in each transaction are then used to be inserted into the tree sequentially. The process is described below.
4.2. Tree Structure Construction
The proposed algorithm begins building a tree structure after forming the fuzzified database. Each transaction is sequentially processed. The linguistic terms in each transaction are inserted to create nodes in the tree. Each node is attached with the total_mtfu field and the array-list data structure for storing mining information. The total_mtfu field stores the accumulated mtfu value. If the inserted linguistic term in a transaction belongs to this node, set total_mtfu = total_mtfu + mtfuy. The array-list data structure includes three subfields: TID, fuzzy value, and utility value.
For example, the linguistic term
c.L of transaction
T1 in
Table 6 results in the first branch of the tree. The initial
total_mtfu value of the node is the
mtfu value of
T1, which is 17.36 in
Table 4. Then, its array-list structure contains the transaction identifier
T1, the fuzzy value of the term
c.L (=0.67) in
Table 6, and its utility value (=2 × 4 = 8) derived from
Table 1 and
Table 2. The first inserted node is shown in
Figure 2.
Next, the transaction
T2 is handled. It consists of three linguistic terms (
a.
L), (
b.
L), and (
b.
M). The
mtfu value of
T2 is 21.44 in
Table 4. The first linguistic term (
a.
L) is added to the tree. The node of the term (
a.L) cannot share the same prefix (
c.L) of the tree with the transaction
T1. Therefore, the second branch is grown to connect the node (
a.L) and the root. The term (
b.L) is then added to the tree as the child of the (
a.L) node. The term (
b.M) is handled in the same way. Thus, in this branch, there are three nodes inserted, all with the
mtfu (21.44) value of
T2 attached. After
T2 is processed, the tree is shown in
Figure 3. The complete tree for all the transactions is depicted in
Figure 4.
4.4. Mining HTFUIs from the Tree
When the tree is built, the candidate
HTFUI itemsets can then be mined using a procedure similar to the FP-Growth approach [
19]. The procedure is, however, more complicated than the FP-Growth due to the complex data kept in the nodes. It will check whether a candidate has its
tfur value satisfying the defined threshold.
For a developed tree such as that in
Figure 4, the mining algorithm finds
HTFUIs as follows. The Header_Table includes the sorted linguistic terms according to the order of descending counts. A conditional pattern tree is grown for each linguistic term in the Header_Table. These terms are processed bottom-up and one by one. The
mtfu values of the generated candidate itemsets, including the processed linguistic term, are then recursively calculated. If their
mtfu values are not less than the defined threshold, they are removed in the conditional pattern tree. In
Section 4.3, each branch in the tree is built using the
mtfu values of the linguistic terms in a transaction. Thus, the candidate fuzzy
k-itemsets obtained by the
total_mtfu field in the node can be found easily directly from the tree. If the
tfur values of the candidate fuzzy
k-itemsets are larger than or equal to the threshold by using the array-list data in the node, the
k-itemsets are
HTFUIs. Below is the mining algorithm.
The mining Algorithm 2:
Algorithm 2: mining HTFUIs from the tree. |
INPUT: | r, the constructed tree; |
| λ, minimum temporal fuzzy utility threshold. |
OUTPUT: | All HTFUI itemsets. |
For each linguistic term x bottom-up in the Header_Table, conduct the following steps:
- Step 1:
From the constructed tree, form a conditional pattern tree of the node x by the link of x in the Header_Table. - Step 2:
Check each prefix node in the conditional pattern tree from the node x for whether the prefix node and x are originated from the same item. If yes, remove the prefix node. - Step 3:
Check the total_mtfu value for each prefix node in the conditional pattern tree from x for whether the value is not less than *λ. If no, remove the prefix node. - Step 4:
Find candidate HTFUIs from the remaining conditional pattern tree from x. - Step 5:
Calculate the tfurz value from the node of each candidate z with the array-list structure using the intersection operation as the fuzzy operator. If tfurz is not less than λ, z is a HTFUI.
|
For example, take the linguistic term (
d.M) in the Header_Table in
Figure 4 as an example. Its conditional pattern tree is shown in
Figure 5a. There are two branches derived from the linguistic term (
d.M), as shown in
Figure 5b.
Along with building the conditional pattern tree for the linguistic term (
d.M), the combinations of the possible fuzzy candidate itemsets with (
d.M) are then derived using the mining algorithm recursively. All the possible fuzzy itemsets are processed for checking whether the
mtfu values of the candidates are not less than the threshold in the temporal fuzzy utility mining. The linguistic terms (
d.L) and (
d.M) are originated from the same item
d, so the linguistic terms (
d.L) in two branches of the conditional pattern tree are removed in
Figure 5b. All the linguistic terms in
Figure 5b are then processed for checking whether their
mtfu values meet the threshold value. The total
mtfu value of each fuzzy itemset, including (
d.M), is then counted from individual branches. In this case, the fuzzy itemset (
e.L,
d.M) exist in two branches of
Figure 5b. Thus, their
mtfu value is 68.78 (=45.42 + 23.36). It is larger than the threshold value (=(1.98 + 53.34 + 47.36 + 26) × 40% = 51.472). As another example, the
mtfu values of the linguistic terms (
c.L) and (
b.L), which are both 45.42 in the left branch, are less than the threshold value, so they are omitted in the two branches. The final result for the conditional pattern tree with the linguistic term (
d.M) is shown in
Figure 5c.
All the fuzzy itemsets derived from the conditional pattern tree can be obtained without rescanning the database. In this example, all the possible itemsets with
d.M are [(
d.
M,
e.
L): 68.78], [(
d.
M,
a.
L): 68.78] and [(
d.
M,
e.
L,
a.
L): 68.78] derived from
Figure 5c. Next, all the fuzzy itemsets are processed to check whether their
tfur values satisfy the threshold. If yes, they are
HTFUIs. Take [(
d.
M,
e.
L)] as an example. According to definition 12, the
tfur value of the fuzzy itemsets [(
d.
M,
e.
L)] needs to be calculated. It can be known that the
LTP[(d.M, e.L)] range is from
P2 to
P4. Therefore, the
tfu values of all the transactions in
LTP[(d.M, e.L)] are summed, and its value is 128.68 (=1.98 + 53.34 + 47.36 + 26), which is the denominator. Next, the fuzzy utility value of [(
d.
M,
e.
L)] is calculated. In
Figure 5c, the array-list structure information of the nodes (
d.
M) and (
e.
L), including identifier transaction, fuzzy value, and utility value, can be obtained. Thus, the fuzzy values of (
d.
M) and (
e.
L) are both 0.33 and 0.67 from their array-list structure in
T4 and
T6. By using the intersection operation in the fuzzy sets, the fuzzy values of the fuzzy itemset [(
d.
M,
e.
L)] in
T4 and
T6 are 0.33 and 0.33, respectively. The fuzzy utility value of [(
d.
M,
e.
L)] is calculated as 0.33 × (8 + 8) + 0.33 × (8 + 8) = 11.88, and its
tfur value is 9.23% (=11.88/128.68), which is less than the defined threshold (40%). It is not an
HTFUI. The remaining itemsets are handled in the same way.