Blin: A Multi-Task Sequence Recommendation Based on Bidirectional KL-Divergence and Linear Attention
Abstract
1. Introduction
- Blin adopts the RandomPad method in place of traditional zero-padding, thereby alleviating data sparsity and improving the effective utilization of the input space.
- Blin employs a linear attention mechanism that reduces the computational complexity of the attention dot-product operations from quadratic O(n²) to near-linear O(n), while preserving the accuracy of the attention mechanism on long sequences.
- Blin introduces a bidirectional KL-divergence loss as an auxiliary task for sequential recommendation, regularizing the probability distributions derived from different padded representations of the same sequence. This loss is combined with the sequence recommendation loss to jointly update the model parameters and obtain stronger user preference representations (both mechanisms are sketched in code after this list).
- Experimental evaluations on multiple public datasets demonstrate that Blin outperforms representative baseline methods to varying degrees.
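To make the two core ingredients concrete, below is a minimal PyTorch sketch of (i) kernel-based linear attention and (ii) the bidirectional (symmetric) KL-divergence loss, written from the descriptions above. The function names, the elu(x)+1 feature map (as in Katharopoulos et al. [21]), and all tensor shapes are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """Kernel-based linear attention (non-causal form for brevity).

    Replacing softmax(QK^T)V with phi(Q)(phi(K)^T V) lets the phi(K)^T V
    product be computed first, so the cost is O(n * d^2) rather than
    O(n^2 * d) -- near-linear in the sequence length n.
    q, k, v: tensors of shape (batch, n, d).
    """
    phi_q = F.elu(q) + 1                                # positive feature map
    phi_k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", phi_k, v)         # (d x d) summary, O(n*d^2)
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)


def bidirectional_kl_loss(logits_a, logits_b):
    """Symmetric (bidirectional) KL divergence between the prediction
    distributions produced from two differently padded views of the
    same sequence."""
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(q, p.exp(), reduction="batchmean")  # KL(P || Q)
    kl_qp = F.kl_div(p, q.exp(), reduction="batchmean")  # KL(Q || P)
    return 0.5 * (kl_pq + kl_qp)
```

In training, the auxiliary term would be added to the sequence recommendation loss with a weighting coefficient (e.g., L = L_rec + λ·L_bkl), consistent with the joint parameter update described in the contribution list above.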
2. Related Work
2.1. Sequence-Based Recommendation
2.2. Linear Attention Mechanisms
3. Methodology
3.1. Problem Statement
3.2. RandomPad
3.3. Embedding Layer
3.4. Transformer Layer
3.4.1. Definition of Traditional Dot Product Attention Method
3.4.2. Generalization of Kernel-Based Dot Product Attention
3.4.3. Linear Attention Mechanism
3.5. Bidirectional KL Divergence Loss
3.6. Model Training
4. Experiments
4.1. Datasets
- Amazon: The Amazon dataset is a large-scale collection of user reviews of products on the Amazon website and a classic benchmark for recommender systems. We use two of its category subsets, Beauty and Sports, as separate datasets.
- Yelp: The Yelp dataset is a well-known open dataset of user reviews of businesses, released by the Yelp platform.
- MovieLens-1M (ML-1M): ML-1M is a dense movie recommendation dataset widely used for evaluating recommendation algorithms.
4.2. Evaluation Metrics
- HR@K: HR@K (Hit Ratio at K) measures the proportion of test cases in which the user’s ground-truth item appears among the top K positions of the recommendation list.
- NDCG@K: NDCG@K (Normalized Discounted Cumulative Gain at K) focuses on ranking quality: the recommended items should not only match the user’s true interests, but items the user prefers more strongly should also be ranked higher. Both metrics are sketched in code after this list.
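Under the common leave-one-out protocol with a single ground-truth item per user (an assumption here; the paper's exact protocol is fixed in Section 4.4), both metrics reduce to the simple forms below. `hr_and_ndcg_at_k` is a hypothetical helper name.

```python
import numpy as np


def hr_and_ndcg_at_k(ranked_items, target_item, k=10):
    """HR@K and NDCG@K for one user with a single ground-truth item.

    ranked_items: item ids sorted by predicted score, best first.
    With one relevant item, the ideal DCG is 1, so NDCG@K equals the
    discounted gain of the hit itself.
    """
    topk = list(ranked_items[:k])
    if target_item not in topk:
        return 0.0, 0.0                     # miss: both metrics are zero
    rank = topk.index(target_item)          # 0-based position of the hit
    return 1.0, 1.0 / np.log2(rank + 2)     # HR@K = 1; NDCG@K = 1/log2(rank+2)


# Dataset-level HR@K and NDCG@K are the averages of these per-user values.
```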
4.3. Comparison Methods
- GRU4Rec [3]: An RNN-based method that introduces Gated Recurrent Units (GRUs) to explore the dependencies between items in a sequence.
- Caser [2]: A CNN-based method that uses horizontal and vertical convolution to model users’ dynamic preferences.
- SASRec [5]: The first method to introduce Transformers into sequential recommendation, modeling user behavior sequences based on a unidirectional self-attention mechanism.
- TiSASRec [14]: Builds on SASRec by incorporating absolute positions and the time intervals between items for sequence modeling.
- CL4SRec [15]: Proposes various data augmentation methods to construct contrastive learning tasks, adding a contrastive learning objective to the original SASRec objective.
- LinRec [30]: A Transformer-based sequential recommendation model that proposes L2-normalized linear attention mechanisms to reduce the computational cost of traditional attention.
- MStein [31]: A self-supervised sequential recommendation framework that measures the mutual information between augmented sequences via the Wasserstein discrepancy.
4.4. Experimental Details
4.5. Overall Performance
4.6. Ablation Study
- Blin-ZC: Removes the RandomPad padding method and replaces it with traditional zero-padding, using item cropping as the sequence data augmentation method.
- Blin-D: Replaces the linear attention mechanism used in this paper with a traditional attention mechanism.
- Blin-DWC: Removes the depthwise convolution (DWC) module from the linear attention mechanism.
- Blin-BKL: Removes the auxiliary learning task and relies solely on sequence loss for model training.
4.7. Hyperparameter Study
4.8. Computational Cost Analysis
5. Conclusions and Outlook
5.1. Conclusions
5.2. Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, S.; Hu, L.; Wang, Y.; Cao, L.; Sheng, Q.Z.; Orgun, M. Sequential recommender systems: Challenges, progress and prospects. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; pp. 6332–6338. [Google Scholar]
- Tang, J.; Wang, K. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 5–9 February 2018; pp. 565–573. [Google Scholar]
- Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based Recommendations with Recurrent Neural Networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
- Zhang, S.; Chen, L.; Wang, C.; Li, S.; Xiong, H. Temporal Graph Contrastive Learning for Sequential Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 9359–9367. [Google Scholar]
- Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 197–206. [Google Scholar]
- Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450. [Google Scholar]
- Dang, Y.; Yang, E.; Guo, G.; Jiang, L.; Wang, X.; Xu, X.; Sun, Q.; Liu, H. TiCoSeRec: Augmenting data to uniform sequences by time intervals for effective recommendation. IEEE Trans. Knowl. Data Eng. 2023, 36, 2686–2700. [Google Scholar] [CrossRef]
- Dang, Y.; Yang, E.; Guo, G.; Jiang, L.; Wang, X.; Xu, X.; Sun, Q.; Liu, H. Uniform sequence better: Time interval aware data augmentation for sequential recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4225–4232. [Google Scholar]
- Dang, Y.; Liu, Y.; Yang, E.; Guo, G.; Jiang, L.; Wang, X.; Zhao, J. Repeated Padding as Data Augmentation for Sequential Recommendation. arXiv 2024, arXiv:2403.06372. [Google Scholar]
- Adler, A.; Tang, J.; Polyanskiy, Y. Quantization of random distributions under KL divergence. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Virtual, 12–20 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2762–2767. [Google Scholar]
- Chong, L.; Liu, X.; Zheng, R.; Zhang, L.; Liang, X.; Li, J.; Wu, L.; Zhang, M.; Lin, L. CT4Rec: Simple yet Effective Consistency Training for Sequential Recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 3901–3913. [Google Scholar]
- Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820. [Google Scholar]
- Zhang, T.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Wang, D.; Liu, G.; Zhou, X. Feature-level deeper self-attention network for sequential recommendation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; pp. 4320–4326. [Google Scholar]
- Li, J.; Wang, Y.; McAuley, J. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 322–330. [Google Scholar]
- Xie, X.; Sun, F.; Liu, Z.; Wu, S.; Gao, J.; Zhang, J.; Ding, B.; Cui, B. Contrastive learning for sequential recommendation. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Virtual, 9–12 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1259–1273. [Google Scholar]
- Liu, Z.; Chen, Y.; Li, J.; Yu, P.S.; McAuley, J.; Xiong, C. Contrastive self-supervised sequential recommendation with robust augmentation. arXiv 2021, arXiv:2108.06479. [Google Scholar]
- Qiu, R.; Huang, Z.; Yin, H.; Wang, Z. Contrastive learning for representation degeneration problem in sequential recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Tempe, AZ, USA, 21–25 February 2022; pp. 813–823. [Google Scholar]
- Zhang, Y.; Liu, Y.; Xu, Y.; Xiong, H.; Lei, C.; He, W.; Cui, L.; Miao, C. Enhancing sequential recommendation with graph contrastive learning. arXiv 2022, arXiv:2205.14837. [Google Scholar]
- Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking Attention with Performers. arXiv 2020, arXiv:2009.14794. [Google Scholar]
- Li, R.; Su, J.; Duan, C.; Zheng, S. Linear attention mechanism: An efficient attention for semantic segmentation. arXiv 2020, arXiv:2007.14902. [Google Scholar]
- Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are rnns: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5156–5165. [Google Scholar]
- Shen, Z.; Zhang, M.; Zhao, H.; Yi, S.; Li, H. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3531–3539. [Google Scholar]
- Bolya, D.; Fu, C.Y.; Dai, X.; Zhang, P.; Hoffman, J. Hydra attention: Efficient attention with many heads. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 35–49. [Google Scholar]
- Han, D.; Pan, X.; Han, Y.; Song, S.; Huang, G. Flatten transformer: Vision transformer using focused linear attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 5961–5971. [Google Scholar]
- Guo, J.; Chen, X.; Tang, Y.; Wang, Y. SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization. arXiv 2024, arXiv:2405.11582. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
- Liu, L.; Cai, L.; Zhang, C.; Zhao, X.; Gao, J.; Wang, W.; Lv, Y.; Fan, W.; Wang, Y.; He, M.; et al. Linrec: Linear attention mechanism for long-term sequential recommender systems. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 289–299. [Google Scholar]
- Fan, Z.; Liu, Z.; Peng, H.; Yu, P.S. Mutual wasserstein discrepancy minimization for sequential recommendation. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1375–1385. [Google Scholar]
Statistics of the experimental datasets:

| Dataset | Users | Items | Interactions | Sparsity |
|---|---|---|---|---|
| Beauty | 22,363 | 12,101 | 198,502 | 99.73% |
| Sports | 35,958 | 18,357 | 296,337 | 99.95% |
| Yelp | 30,431 | 20,033 | 316,354 | 99.95% |
| ML-1M | 6041 | 3417 | 999,611 | 95.16% |
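The sparsity column follows the standard definition below (an assumption, as the paper's preprocessing section is not reproduced here); worked for ML-1M:

$$
\text{Sparsity} = 1 - \frac{\#\text{Interactions}}{\#\text{Users} \times \#\text{Items}}, \qquad
1 - \frac{999{,}611}{6041 \times 3417} \approx 95.16\%.
$$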
Overall performance comparison on the four datasets:

| Dataset | Metric | GRU4Rec | Caser | SASRec | TiSASRec | CL4SRec | LinRec | MStein | Blin |
|---|---|---|---|---|---|---|---|---|---|
| Beauty | HR@10 | 0.1601 | 0.1498 | 0.2775 | 0.2788 | 0.2931 | 0.2772 | 0.3184 | 0.3319 |
| Beauty | HR@20 | 0.2172 | 0.2012 | 0.3572 | 0.3606 | 0.3648 | 0.3558 | 0.3876 | 0.4051 |
| Beauty | NDCG@10 | 0.1013 | 0.0966 | 0.1752 | 0.1794 | 0.1906 | 0.1774 | 0.2122 | 0.2218 |
| Beauty | NDCG@20 | 0.1152 | 0.1078 | 0.1961 | 0.1989 | 0.2102 | 0.1978 | 0.2294 | 0.2446 |
| Sports | HR@10 | 0.1526 | 0.1453 | 0.2763 | 0.2782 | 0.2854 | 0.2766 | 0.2988 | 0.3112 |
| Sports | HR@20 | 0.2322 | 0.2221 | 0.3769 | 0.3794 | 0.3780 | 0.3762 | 0.3941 | 0.4102 |
| Sports | NDCG@10 | 0.0824 | 0.0813 | 0.1574 | 0.1581 | 0.1739 | 0.1597 | 0.1824 | 0.1911 |
| Sports | NDCG@20 | 0.1037 | 0.1015 | 0.1836 | 0.1846 | 0.1964 | 0.1859 | 0.2067 | 0.2158 |
| Yelp | HR@10 | 0.2569 | 0.2546 | 0.4243 | 0.4244 | 0.4462 | 0.4239 | 0.4772 | 0.4894 |
| Yelp | HR@20 | 0.4273 | 0.4252 | 0.5988 | 0.5902 | 0.6003 | 0.5978 | 0.6281 | 0.6403 |
| Yelp | NDCG@10 | 0.1253 | 0.1234 | 0.2349 | 0.2448 | 0.2652 | 0.2366 | 0.2846 | 0.2944 |
| Yelp | NDCG@20 | 0.1684 | 0.1666 | 0.2796 | 0.2803 | 0.3048 | 0.2842 | 0.3235 | 0.3369 |
| ML-1M | HR@10 | 0.3322 | 0.3145 | 0.5588 | 0.5609 | 0.5811 | 0.5626 | 0.6068 | 0.5701 |
| ML-1M | HR@20 | 0.4301 | 0.4028 | 0.6747 | 0.6842 | 0.7024 | 0.6807 | 0.7252 | 0.6884 |
| ML-1M | NDCG@10 | 0.1823 | 0.1726 | 0.3022 | 0.3084 | 0.3232 | 0.3068 | 0.3371 | 0.3124 |
| ML-1M | NDCG@20 | 0.2269 | 0.2165 | 0.3846 | 0.3928 | 0.4142 | 0.3892 | 0.4308 | 0.3985 |
Ablation study results (HR@10 and NDCG@10; the Blin row matches the Beauty results above):

| Model | HR@10 | NDCG@10 |
|---|---|---|
| (A) Blin-ZC | 0.3167 | 0.2088 |
| (B) Blin-D | 0.3278 | 0.2196 |
| (C) Blin-DWC | 0.3256 | 0.2178 |
| (D) Blin-BKL | 0.3198 | 0.2114 |
| (E) Blin | 0.3319 | 0.2218 |