1. Introduction
The growing e-commerce market gives managers a strong incentive to develop technology that helps customers buy the products they need. As the digital economy develops, competition in the business environment becomes more complicated. Therefore, e-commerce companies need support in choosing the right products to draw the attention of online customers [
1]. Customers are often confused when deciding on the product to purchase, as a wide range of products is offered. This purchasing confusion triggers developing a product recommendation system [
2]. Sales transactions are analyzed to understand customers’ visit purposes in terms of product preferences. Association rule mining (ARM) is a popular and powerful method for identifying relationships between purchase preferences. It analyzes historical transactions, that is, the item sets purchased by customers, and creates association rules among items. Based on these associations, product recommendations can be generated to match customer preferences. Association rule mining methods, such as the Apriori algorithm, work by counting how often items are sold together. However, these algorithms ignore the items’ profit, so they cannot reveal more profitable items that have infrequent sales. Hence, traditional recommendation systems offer less potential for profit improvement.
Discovering an association among item sets in many e-commerce sales is not straightforward because of the data explosion in the big data era [
3]. There are many overlapping intersections among datasets, and the boundaries between them are fuzzy. Traditional association rule mining methods cannot overcome the overlapping and boundary problems [
4]. The fuzzy-based methods can manage the uncertainty and produce more accurate solutions [
5]. Fuzzy logic is a sub-domain of artificial intelligence, referred to as multi-valued logic. It presents a robust way of describing the concept of vagueness. Instead of binary-valued transactions (true and false), a continuum of possible truth values exists: every statement has a membership degree between 0 and 1 rather than one of two possible values. Therefore, adapting fuzziness to association rule mining becomes more critical for a recommendation system [
6,
7,
8]. This study handles only the sales amount because of product diversity and missing data about product features. However, some product features could also be considered. For instance, if an e-commerce company sells books, then apart from sales amounts, the book type (horror, adventure, autobiography, travel, etc.) and the publication year (recent, new, old, etc.) can also be used for fuzzy logic. Because one book can belong to more than one type and the newness of a book cannot be described strictly, the fuzzy logic approach was chosen to capture this vagueness.
The primary motivation behind this study is to address the limitations of existing product recommendation methods in e-commerce systems, which consider only sales frequency rather than the products’ profitability. The goal is to provide a more comprehensive and practical solution for e-commerce companies, as profitability is a crucial aspect of any business. The proposed method, called profit-support fuzzy association rule mining (P-FARM), aims to enhance decision-making and maximize company profits by considering both the frequency of sales and profitability. The study seeks to bridge the gap between existing methods and real-life business requirements and to provide a more accurate and valuable product recommendation system for e-commerce companies. From this perspective, this study contributes to the literature in two ways. First, it proposes a novel association rule mining method that uses fuzziness to overcome overlapping and boundary problems in real-life applications. Second, it considers profit rather than only the sales volume used in traditional association rules. P-FARM builds on the relationship between the profit and sales amount of each item. Profit has been largely ignored in previous studies; however, e-commerce companies should also consider their profits while proposing products to visitors. To this end, the profit values used as support inputs are converted into fuzzy numbers so that they better reflect real-world conditions.
Association Rule Mining (ARM), Profit-Support Association Rule Mining (P-ARM), and Profit-Support Fuzzy Association Rule Mining (P-FARM) are three approaches that can be used for generating association rules in the e-commerce domain. ARM is a traditional data mining technique that focuses on identifying frequent itemsets and association rules. It uses support and confidence metrics to identify the most relevant rules. P-ARM is an extension of ARM that considers the items’ profit in addition to the support and confidence metrics. P-ARM uses the Profit-Support metric, which is calculated by dividing the total profit of an item by its support value. P-FARM improves on P-ARM by incorporating fuzzy logic theory. It calculates the support of an item from both the traditional support and the item’s profit, and then uses a potential profit coefficient to calculate the fuzzy profit support of the item. In summary, ARM identifies frequent itemsets and association rules based on support and confidence, P-ARM additionally takes the profit of the items into account, and P-FARM extends the approach further by combining support and profit into a fuzzy profit support.
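The classical support and confidence metrics that ARM relies on can be illustrated with a minimal Python sketch. The function names and the toy baskets are hypothetical and are not taken from the study; the sketch covers only the crisp (non-fuzzy, profit-free) case.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


# Hypothetical toy baskets (not data from the study).
transactions = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B"},
]

support_ab = support({"A", "B"}, transactions)
confidence_a_to_b = confidence({"A"}, {"B"}, transactions)
```

Here the rule A → B is supported by two of the four baskets, and its confidence is the share of baskets containing A that also contain B. Neither value depends on quantities sold or on profit, which is the limitation the profit-based variants address.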
The paper is organized as follows.
Section 2 gives a brief review of recommendation systems in the e-commerce domain.
Section 3 presents the background of association rules and fuzzy association rules.
Section 4 introduces the proposed methodology, profit-support fuzzy association rule mining (P-FARM).
Section 5 shows the implementation of the proposed method to verify its validity.
Section 6 compares the effect of the proposed P-FARM and P-ARM with a numerical experiment. In
Section 7, the study is concluded by giving highlights and limitations.
2. Literature Review
The most popular use of association rules is analyzing customer transaction data to identify relations between purchased products. The main aim of association rule mining is to support sales. Ref. [
9] introduced an effective method to create important association rules between products purchased. Various algorithms were improved to find association rules in large databases, such as the AprioriHybrid algorithm [
10], direct hashing and pruning algorithm [
11], frequent pattern-growth algorithm [
12], cluster-based association rule algorithm [
13], integrating web traversal patterns and association rules [
14], and matrix and interestingness-based association rule mining [
15].
Table 1 summarizes previous studies about association rules.
Since the data type used may affect the data preparation and the methods to be applied for the study, the researchers discovered the association rules with different data types. Some frequent pattern algorithms have been introduced to extract information from streaming data. These algorithms involve significant data mining techniques such as clustering [
16,
17], classification [
18], prediction [
19], frequent pattern mining [
20] and time series analysis [
21]. Studies using stream data frequently developed a new association rule algorithm by handling the problem of the time window. The sliding window is a broadly applied approach for data stream mining because of its emphasis on recent data and its bounded memory requirements. A transactional sliding window aims to retain a fixed-size window over a data stream [
22,
23,
24,
25]. Web log data are another popular data type used in association rule mining. The web log data consists of a series of events where each recording describes the session with particular page navigation [
26,
27,
28,
29].
Various researchers extended traditional association rule mining algorithms with fuzzy theory [
26,
30,
31]. A fuzzy set can compensate for some of the limitations of the association rule methods. This research proposes a generic model to find association rules by fuzzifying transactions, called fuzzy association rules. Computing the support and accuracy of fuzzy association rules is the main difficulty [
8,
32,
33,
34,
35]. Ref. [
36] developed a recommendation system using hybrid fuzzy association rules to identify the significant user navigation pattern from the clustered frequent patterns in the tourism sector via a questionnaire. Ref. [
31] proposed a fuzzy c-means clustering method to create association rules by combining the Apriori algorithm. They focused on customer ratings instead of frequent item sets in the telecom area. In the media sector, Ref. [
26] proposed a fuzzy inference system including a set of rules from the clustered pattern for identifying the significant user navigation pattern. Ref. [
23] integrated fuzzy theory with data streams employing a sliding window approach to analyze association rules. In the e-commerce domain, visitors follow different navigation paths on the website and visit different pages in different order and frequency [
37]. Ref. [
38] designed a personalized recommendation system using fuzzy association rules in the e-commerce domain.
Some researchers utilized clustering approaches to create customer profiles and then created association rules for the clustered customers [
31,
39,
40]. Clustering results were mainly used to provide personalized recommendations [
41,
42], optimize a website structure [
43,
44], and improve a customer-oriented strategy [
45,
46].
Most research in the literature focused on overcoming binary-valued transaction data [
22,
36,
47,
48]. Yet, transaction data in real-world cases mainly include fuzzy and quantitative values. Consequently, some fuzzy-oriented association rules algorithms were introduced [
6,
26,
36,
49,
50]. Ref. [
51] introduced a group recommendation system to achieve suitable membership functions and practical association rules from a database that includes uncertain data. The apriori algorithm was updated with fuzzy theory to get membership functions with more effective results. Ref. [
52] used fuzzy association rules to design a recommendation system. The apriori algorithm was improved with discretization based on a clustering algorithm to express quantitative results in a nominal variable matrix. A fuzzy recommendation algorithm was proposed by combining quantitative association rules and fuzzy rules to predict the product that will be recommended.
Table 1.
Previous studies on association rules.
Study | Data Generator | Method | Fuzzy | Focus | Domain | Explanation |
---|---|---|---|---|---|---|
[53] | Stream data | Novel | - | Time window | N/A | It introduced FP-stream, an effective FP-tree-based model for mining frequent patterns from data streams. |
[47] | Stream data | Novel | - | Time window | E-commerce | It created all recent frequent patterns from a high-speed data stream over a sliding window. |
[54] | Stream data | Novel | - | Time window | N/A | It enabled defining time windows’ number, size and weight. |
[55] | Stream data | Novel | - | Time window | N/A | It presented a novel algorithm with normalized weight over data streams and tree structure that stores compressed crucial information about frequent item sets. |
[24] | Stream data | Novel | - | Time window | E-commerce | It proposed a new algorithm which is suitable for observing recent changes in the set of frequent item sets over data streams. |
[56] | Secondary | FS | + | Item set | Finance | It predicted the level of the stock market after the associations among different parameters were extracted. |
[48] | Web log | Novel | - | Time window | Education | It proposed the usage of a specific density-based algorithm for navigational pattern discovery. |
[28] | Web log | FS | + | Item set | N/A | It used the 2-tuple linguistic description to create association rules at the intersection of fuzzy set boundaries. |
[29] | Web log | FS | + | Item set | 6 Domains | It discovered frequent fuzzy–probabilistic item sets and fuzzy association rules using a novel algorithm. |
[38] | Secondary | FS | + | Item set | E-commerce | It designed a personalized recommendation system using fuzzy association rules. |
[57] | Secondary | ARM | - | Item set | Retail | It calculated new support and confidence values based on “profit” to create interesting patterns. |
[23] | Stream data | FS | + | Item set | Retail, Transportation | It integrated fuzzy theory with data streams, employing sliding window approach, to analyze association rules. |
[22] | Stream data | Novel | - | Item set | Sport | It analyzed frequent patterns from real-time transactions with the sliding window technique. |
[25] | Stream data | Novel | - | Item set | Retail | It proposed a new algorithm that focuses on keeping self-consistency of the discovered item sets. |
[26] | Web log | FS | + | Item set | Media | A fuzzy inference system was generated, which includes a set of rules from the clustered pattern for identifying the significant user navigation pattern. |
[50] | Secondary | CF | - | Item set | Education | It developed a recommendation system for students’ programming skills. |
[36] | Surveyed | CF | - | Ratings | Tourism | A novel hybrid recommendation algorithm (HyRA) was introduced with point of interest and geographical information. |
[31] | Web log | CF, FCM | + | Ratings | Telecom, e-commerce | It proposed the topological representation of tree-structured taxonomy and the statistical properties of the taxonomy. |
[27] | Web log | CF | - | Ratings | Media | It described a new recommendation algorithm based on probability matrix factorization. |
[58] | Secondary | FS | + | Item set | E-commerce | It also considers the sales amount to create association rules. |
Previous studies focused on only a single minimum support threshold. This means that the studies implicitly assume that all items in the database are similar. In other words, items are assumed to have similar frequencies in the database, which is not valid in the real world. If previous association rule mining algorithms are used, two problems arise. First, some rules that make little profit will be generated. Second, in the first iteration of the Apriori algorithm, which yields the 1-item sets, some items that could make higher profits but have lower support are deleted. In terms of profit, even though some items have been sold only a few times (fewer than the predefined minimum support), they can be more important (e.g., much more expensive) than others that have been sold more frequently. For example, in a clothing store, a designer gown may have been sold only a few times but has a significantly higher value than other clothing items sold more frequently. In this case, even though the designer gown may not have reached the predefined minimum number of sales, it is still important to consider it in the analysis because it can contribute more to the store’s profit. For this reason, Ref. [
2] focused on the multiple minimum supports to mine association rules considering the profit impact on the frequencies. They used a synthetic data set with 1000 items and 10,000 transactions. In the same focus, Ref. [
57] proposed profit support and profit confidence by regarding the actual profit and averaging the total profit of each item. They used a sample data set with five transactions, including five products. Although [
2,
57] focused on profit-based association rules, this study extends their models to handle vagueness because transaction data in real life mainly involve fuzzy and quantitative values. Moreover, this research tests the proposed methodology on real-world data comprising 834,047 sales transactions of 339 products made by more than 460,000 customers.
This study stands out from previous related works, which can be categorized into fuzzy association rules models and profit-support association rule models. It enhances fuzzy association rule models by incorporating profit considerations and expands the fuzzy theory to create profit-supported association rule models. The proposed method modifies classical support to include profit information, which determines the significance of items. The study introduces a novel model, P-FARM, for mining profit-supported fuzzy association rules in e-commerce to suggest more profitable products based on visitors’ interests.
4. Proposed Methodology
The proposed method, called Profit-Support Fuzzy Association Rule Mining (P-FARM), extracts hidden patterns from e-commerce sales volume and profit.
Figure 1 shows the proposed methodology. It consists of three stages: ETL (Extract, Transform, Load) process, data analysis, and rules.
Step 1: The formation of the database, the first stage, begins with data collection.
Step 2: The second step is transforming quantitative values into fuzzy numbers. The membership values of an item i can be defined as a set of three membership degrees, one each for the low, medium and high classes, respectively. Equation (8) is used to transform crisp numbers into fuzzy numbers based on the sales amount of item i.
Step 3: The fuzzy profit support (FPS) values are obtained.
Let Z be a set of variables (attributes), each associated with an arbitrary fuzzy set. In Equation (9), an attribute paired with one of its fuzzy sets forms a fuzzy item, and a set of attributes from Z paired with C, the corresponding set of fuzzy intervals, forms a fuzzy item set.
The fuzzy item sets are used to determine the fuzzy support values. Each tuple of the dataset contains a value for each attribute. Hence, the fuzzy support values of a fuzzy item set are calculated by the minimum operator (Equation (10)) or the product operator (Equation (11)).
Step 4: The frequent item sets are found. A frequent item set can be described as a set with fuzzy support values higher than a user-defined minimum support threshold. Because the number of possible item sets is very large, an algorithm is necessary to reduce it. This study applies the Apriori algorithm to create frequent item sets. It is a stepwise algorithm that starts by obtaining the frequent 1-item sets and iteratively creates new candidates from the frequent items discovered in the previous iteration [
61].
The Potential Profit Coefficient (PPC) must be computed by Equation (12) to incorporate the profit parameter into the fuzzy support. It is the ratio of the total profit of item i to the average profit of all items.
FPS values are obtained by multiplying the fuzzy support value by the potential profit coefficient, as given in Equation (13). FPS values are used to decide frequent item sets.
Step 5: All possible combinations of frequent item sets are considered, and their fuzzy confidence values are calculated using Equation (14) before fuzzy association rules are produced.
Step 6: The candidate item sets with higher confidence values than the threshold are put into the association rule repository.
Step 7: The information discovered from the association rules at the end of the six steps can be used to improve an e-commerce company’s profitability by offering more appropriate products.
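The quantities used in Steps 3 to 5 can be sketched compactly as follows. The notation is assumed here for illustration and is not necessarily that of Equations (9) to (14): D is the set of transactions, t[z] is the value of attribute z in transaction t, \mu_{c_z} is the membership function of the fuzzy interval paired with z, P_i is the total profit of item i, and n is the number of items. Normalizing the fuzzy support by |D| is an assumption, and the confidence expression is the usual ratio of fuzzy supports; whether the study's confidence uses profit-weighted support is not assumed.

```latex
\begin{align}
FS_{\min}(\langle Z, C\rangle) &= \frac{1}{|D|}\sum_{t \in D} \min_{z \in Z} \mu_{c_z}(t[z]) \\
FS_{\mathrm{prod}}(\langle Z, C\rangle) &= \frac{1}{|D|}\sum_{t \in D} \prod_{z \in Z} \mu_{c_z}(t[z]) \\
PPC_i &= \frac{P_i}{\frac{1}{n}\sum_{j=1}^{n} P_j} \\
FPS_i &= FS_i \times PPC_i \\
FC(A \Rightarrow B) &= \frac{FS(A \cup B)}{FS(A)}
\end{align}
```

Under this reading, an item with average profit has a PPC of 1 and keeps its fuzzy support unchanged, while an item with above-average profit has its support scaled up and can pass the threshold despite being sold less often.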
Discovering fuzzy association rules is a process of obtaining the antecedents and consequents of a frequent item set. A rule is stated in the form “if antecedent, then consequent”. A fuzzy association rule is considered important when its support and confidence values are higher than predetermined thresholds.
This study does not apply the well-known Fuzzy Association Rule Mining (FARM) method. FARM is a data mining technique that uses fuzzy logic to extract association rules from data. In traditional association rule mining (ARM), items are considered either present or absent in a transaction. In FARM, however, the degree of membership of each item in a transaction is represented by a fuzzy set, which allows more flexibility and expressiveness in modeling uncertain and imprecise data. The proposed methodology improves the traditional FARM technique by adding profit support values after converting them into fuzzy numbers. The extensions are presented step by step in this section. For example, the FPS in Step 3 with Equation (9) and the PPC with Equations (12) and (13) are calculated, both of which are ignored in FARM studies. The novelty of this study is to recommend products that are not only the most frequent and relevant but also the most profitable.
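The fuzzification and fuzzy support steps described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the triangular breakpoints in `FUZZY_SETS`, the clamping value `MAX_QUANTITY`, the function names, and the toy transactions are all hypothetical, and fuzzy support is averaged over transactions by assumption.

```python
from math import prod

# Hypothetical triangular fuzzy numbers (a, b, c) for the sales amount.
# These breakpoints are illustrative assumptions, not the study's values.
FUZZY_SETS = {"L": (0, 0, 4), "M": (0, 4, 8), "H": (4, 8, 8)}
MAX_QUANTITY = 8  # larger quantities are clamped so that "H" saturates at 1


def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(quantity):
    """Map a crisp sales amount to degrees for the L, M and H classes."""
    x = min(quantity, MAX_QUANTITY)
    return {cls: triangular(x, *abc) for cls, abc in FUZZY_SETS.items()}


def fuzzy_support(fuzzy_itemset, transactions, combine=min):
    """Average matching degree of (item, class) pairs over all transactions.

    `combine` is `min` for the minimum operator or `math.prod` for the
    product operator. An item absent from a transaction contributes 0.
    """
    total = 0.0
    for t in transactions:
        degrees = [
            fuzzify(t[item])[cls] if item in t else 0.0
            for item, cls in fuzzy_itemset
        ]
        total += combine(degrees)
    return total / len(transactions)


def fuzzy_confidence(antecedent, consequent, transactions, combine=min):
    """Fuzzy support of the whole rule divided by that of the antecedent."""
    base = fuzzy_support(antecedent, transactions, combine)
    if base == 0:
        return 0.0
    return fuzzy_support(antecedent + consequent, transactions, combine) / base


# Hypothetical transactions: item -> quantity sold.
transactions = [
    {"p1": 1, "p2": 6},
    {"p1": 2, "p2": 8},
    {"p2": 4},
    {"p1": 1},
]

rule_support = fuzzy_support([("p1", "L"), ("p2", "H")], transactions)
rule_confidence = fuzzy_confidence([("p1", "L")], [("p2", "H")], transactions)
```

Passing `combine=prod` switches from the minimum operator to the product operator without changing the rest of the code. With the assumed breakpoints, a quantity of 1 belongs partly to the low class and partly to the medium class, which is the kind of overlap the fuzzy approach is meant to capture.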
6. Experimental Comparison and Discussion
Table 5 shows a small part of sales transactions for experimental comparison of three different methods, ARM, P-ARM, and P-FARM. It includes a total of 17 transactions and shows the corresponding product IDs, quantities sold, unit profit, and total profit for each transaction. Total profit was calculated by multiplying the quantity and unit profit.
Table 6 presents an evaluation of the P-ARM approach created by reformatting
Table 5. It summarizes the products sold in each transaction and identifies products above the minimum threshold, indicated by a green background. It shows each product’s Count, Quantity, and Total Profit, which were obtained from
Table 5. Each product’s Potential Profit Coefficient (PPC) was computed using Equation (
12). For example, the PPC of product 1002 is calculated by dividing the Total Profit obtained from product 1002 by the average profit received in five transactions. Table 6 also presents the Profit Support derived from the P-ARM approach for each product. It considers both the classical support and profit information and is used to identify important items for association rule mining.
Table 7 shows the fuzzy transformation of Table 6 using a fuzzy set consisting of triangular fuzzy numbers. For example, one unit of product 1002 was sold in transaction 401. According to the predefined fuzzy set, a sales volume of ‘1’ can belong to the ‘Low’ and ‘Medium’ sales sets. The crisp number ‘1’ is transformed into fuzzy numbers using Equation (8). Similarly, all sales amounts in the dataset were converted into the corresponding fuzzy numbers. The fuzzy numbers for each class were summed to determine the fuzzy class of an item. The class with the maximum sum defines the Final Fuzzy Class (FFC), and that sum gives the Fuzzy Count. Product 1002 was renamed ‘1002H’ because the maximum Fuzzy Count, 1.75, belonged to the ‘High’ class. Then, the Fuzzy Profit Support (FPS) was calculated by multiplying the Fuzzy Count and PPC to determine frequent item sets. Products over the support threshold were assigned to the frequent 1-item set.
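The selection of the Final Fuzzy Class and the FPS-based filtering described above can be sketched as follows. This is a hedged illustration: the function names, the membership degrees, the profit figures, and the threshold are hypothetical and do not reproduce Tables 5 to 7. The sketch treats a product as frequent only when its FPS is strictly above the threshold, which is an assumption about how ties are handled.

```python
def final_fuzzy_class(memberships):
    """Sum the degrees per class over all sales; the largest sum wins.

    Returns the winning class label (FFC) and its sum (the Fuzzy Count).
    """
    totals = {"L": 0.0, "M": 0.0, "H": 0.0}
    for degrees in memberships:
        for cls, mu in degrees.items():
            totals[cls] += mu
    ffc = max(totals, key=totals.get)
    return ffc, totals[ffc]


def potential_profit_coefficients(total_profit):
    """PPC of an item = its total profit / average total profit of all items."""
    average = sum(total_profit.values()) / len(total_profit)
    return {item: p / average for item, p in total_profit.items()}


def frequent_one_itemsets(sales, total_profit, min_support):
    """Keep products whose FPS (Fuzzy Count x PPC) exceeds `min_support`."""
    ppc = potential_profit_coefficients(total_profit)
    frequent = {}
    for item, memberships in sales.items():
        ffc, fuzzy_count = final_fuzzy_class(memberships)
        fps = fuzzy_count * ppc[item]
        if fps > min_support:  # strict comparison is an assumption
            frequent[item + ffc] = fps  # e.g. "p2" becomes "p2H"
    return frequent


# Hypothetical fuzzified sales: one dict of membership degrees per sale.
sales = {
    "p1": [
        {"L": 0.75, "M": 0.25, "H": 0.0},
        {"L": 0.5, "M": 0.5, "H": 0.0},
    ],
    "p2": [
        {"L": 0.0, "M": 0.5, "H": 0.5},
        {"L": 0.0, "M": 0.0, "H": 1.0},
        {"L": 0.0, "M": 0.25, "H": 0.75},
    ],
}
total_profit = {"p1": 10.0, "p2": 30.0}  # hypothetical currency units

frequent = frequent_one_itemsets(sales, total_profit, min_support=1.0)
```

In this toy input, `p1` has a below-average profit, so its PPC shrinks its Fuzzy Count below the threshold, whereas `p2` has an above-average profit and is kept under the label `p2H`.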
According to the traditional ARM (Count row) in
Table 6, all products, apart from 1002 and 1092, are frequent because the sales amounts are above the minimum support value, 3. On the other hand, because P-ARM changes the frequencies, some crucial changes occurred in the illustrative example. Whereas products 1080 and 1095 are in the ARM frequent 1-itemset, they were left out of P-ARM’s frequent 1-itemset. By chance, no product was included in the P-ARM method. The profit support measure considers both the classical support and profit information and provides a better assessment of the importance of products. The changes in the Count and Profit Support values can impact the results of the association rule mining. In this case, the products above the threshold are critical because they contribute significantly to the frequent item sets and the derived association rules. Hence, the effect of the threshold value will be discussed.
Table 8 gives the details of the comparison. The ARM, P-ARM and P-FARM methods recommended 4, 3 and 2 products, respectively, in the illustrative example. The average profits of the products recommended by ARM, P-ARM and P-FARM are 7.25, 15.33, and 19.5 units of currency, respectively.
The P-FARM algorithm improved the results obtained from the P-ARM approach. Specifically, it changed the frequency of certain products. In this illustrative example, products 1092 and 1093 became more frequent when the P-FARM algorithm was applied, while other products did not. This can be attributed to the fact that the P-FARM algorithm considers both the items’ support value and their profit by adding profit support values after converting them into fuzzy numbers. P-FARM recommended products 1092 and 1093 in this illustrative example. Product 1093 was recommended, as it was by ARM and P-ARM. Because of its high profit, product 1092 was listed in the recommended product list for P-ARM and P-FARM. The critical point is that product 1003 was recommended by ARM and P-ARM but was excluded from the recommendation list when the fuzziness of the sales amount and profitability was considered. P-ARM listed it because of its Profit Support. However, P-FARM left it out once the Fuzzy Profit Support was calculated, which indicated a low level of profitability. Because the sales amounts of product 1003 fall in different Fuzzy Classes, ‘Medium’ and ‘High’, its Fuzzy Count is lower than its crisp count. Overall, the P-FARM algorithm identified items that, although they did not have a high support value, had high profit values, which made them more critical than other items.
The number of recommended products depends on the minimum support threshold.
Figure 2 depicts the effects of minimum support values on the methods. In every case, P-FARM is better than or at least as good as the others. This means the average profit of the products recommended by P-FARM is at least as high as that of the products recommended by ARM and P-ARM. Because ARM and P-FARM were not applicable, the results obtained with a minimum support of five should be ignored.
7. Conclusions and Future Directions
Rapidly growing technologies for the e-commerce domain necessitate developing novel algorithms and methodologies to investigate customer data. Because of customers’ varying demands and companies’ highly competitive environment, even small improvements obtained by advanced methods may offer companies ample opportunities compared with traditional methods. E-commerce companies need to understand users’ visit purposes to gain competitive advantages. One way to learn customers’ visit purposes is to analyze their purchased products and discover associations among them. Traditional approaches such as Association Rule Mining (ARM) and Fuzzy Association Rule Mining (FARM) can produce some rules regarding customer transactions. Whereas ARM focuses on products purchased together, FARM improves ARM by considering purchased amounts. However, previous studies ignore profitability when they create association rules. This study proposes a novel approach called Profit-Support Fuzzy Association Rule Mining (P-FARM) to analyze customer transactions by considering company profit.
This study also compares ARM, P-ARM and the proposed P-FARM methods by a numerical illustration. This comparison indicates that the same products can be regarded as frequent in one technique and infrequent in another. Therefore the number of frequent items and consequently generated association rules are different in each technique. Adding extra inputs, such as sales amount and profit parameter, into the technique also decreases the support and confidence values. On the other hand, these additional inputs provide much information for decision-makers about recommending related products.
Whereas ARM generates higher support and confidence values, P-ARM produces lower values because of the profit input. Support and confidence values are lower still in P-FARM because of fuzziness and the potential profit coefficient. Therefore, fewer rules are generated by the more advanced association rule mining methods. To compare the methods, the focus should be on the rules rather than on confidence values. Although the numbers of rules produced by ARM (84 rules), P-ARM (49 rules) and P-FARM (23 rules) vary, the rules carry more information in P-FARM because they give details about profitability and sales volume. In addition to what ARM provides, FARM gives information about sales volume rather than just binary “sold or not” information.
The study presents interesting results about which products, when sold together, increase profitability the most. For example, the combination of products 1065M and 1199L has the highest fuzzy profit support (FPS). Selling these two products in the “Medium” and “Low” fuzzy classes, respectively, results in a high profit for the company. The company should recommend product 1127, up to four items, to customers who bought products 1165 and 1164, both in the “Medium” fuzzy class. These three products are frequently bought items. Although recommending other relevant items is possible, this recommendation makes the company more profitable.
Further studies can add a “time” perspective to the proposed methodology. Customer demands may change over time. Hence, purchasing time can be considered to improve this research. Although the number of implementations for mining data streams is increasing, there is a lack of association rule mining on stream data that considers profitability. Weblogs can be used for analysis instead of transaction data because transactions indicate only the behavior of customers who made a purchase. It is critical for e-commerce companies to know the visit purposes of customers who visited the webpage and left without buying anything.