Personalized News Recommendation Method with Double-Layer Residual Connections and Double Multi-Head Self-Attention Mechanisms
Abstract
1. Introduction
- The double residual connections and double multi-head attention mechanisms designed in this paper better capture the word-level interaction information between a user’s historically accessed news and candidate news;
- Inner and outer double residual structures are designed to enhance the model’s effectiveness and to prevent overfitting to the word-level interaction information between a user’s historically accessed news and candidate news during model training;
- The proposed method is validated through extensive experiments on the real-world MIND dataset released by Microsoft Research. A series of ablation studies further explores the efficacy of the model.
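To make the idea of stacking two multi-head self-attention layers inside inner and outer residual connections concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the exact wiring (one inner skip connection around each attention layer, one outer skip connection around the whole stack), the function names, and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, num_heads):
    """x: (seq_len, d_model); wq, wk, wv: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(w):
        # Project, then split features into heads: (heads, seq_len, d_head).
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    heads = softmax(scores, axis=-1) @ v                 # (heads, seq, d_head)
    # Concatenate the heads back into (seq_len, d_model).
    return heads.transpose(1, 0, 2).reshape(seq_len, d_model)

def double_attention_with_residuals(x, params, num_heads):
    # ASSUMPTION: each attention layer has its own (inner) skip connection,
    # and the two-layer stack is wrapped by one more (outer) skip connection.
    h1 = x + multi_head_self_attention(x, *params[0], num_heads)    # inner 1
    h2 = h1 + multi_head_self_attention(h1, *params[1], num_heads)  # inner 2
    return x + h2                                                   # outer

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 16, 4   # hypothetical sizes
x = rng.normal(size=(seq_len, d_model))  # stand-in for word embeddings
params = [
    tuple(rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
    for _ in range(2)
]
out = double_attention_with_residuals(x, params, num_heads)
print(out.shape)
```

Because every skip connection adds its input back, the block preserves the input shape and passes the original word representations through even when the attention layers contribute little.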
2. Related Works
3. Methods of DDM
3.1. Candidate News Module
3.2. Historically Accessed News Module
3.3. Click Prediction Module
3.4. Model Training
4. Experiments
4.1. Datasets and the Experimental Setup
4.2. Evaluating Indicator
4.3. Model and Training Details
4.4. Performance Evaluation
- (1) LibFM [6]. A matrix-factorization-based recommendation method that extracts features from both user-accessed and candidate news and concatenates them as input vectors to the model.
- (2) DeepFM [7]. Similar to LibFM, it utilizes factorization machines for recommendations.
- (3) DKN [8]. This approach preprocesses the news using knowledge graphs and learns news representations from news titles using a three-channel convolutional neural network (CNN).
- (4) NPA [2], which applies personalized attention mechanisms to model users’ interests in different contexts and news.
- (5) NAML [24]. A neural news recommendation method with attentive multi-view learning.
- (6) LSTUR [25]. A neural news recommendation method that utilizes GRUs to learn user representations.
- (7) NRMS [11]. A neural news recommendation method that uses single-layer multi-head self-attention to learn user and news representations in the pre-training phase.
- (8) FIM [20], which leverages dilated convolutions for multi-scale text feature extraction and utilizes 3D convolutions with MaxPooling for fine-grained matching between browsed and candidate news at each semantic level.
- (9) KIM [26]. This approach primarily employs a graph co-attention network.
- (10) DDM. The method presented in this paper.
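For readers unfamiliar with the factorization machines behind the LibFM and DeepFM baselines, the sketch below shows the standard second-order factorization machine score on a concatenated feature vector. It illustrates the general technique only, not the baseline configuration used in these experiments; the function names, feature sizes, and random parameters are assumptions.

```python
import numpy as np

def fm_predict(x, w0, w, v):
    """Second-order factorization machine score.

    x: (n,) input features (e.g., user and news features concatenated),
    w0: scalar bias, w: (n,) linear weights, v: (n, k) latent factors.
    """
    linear = w0 + w @ x
    # Pairwise term computed in O(n * k) via the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ].
    xv = x @ v
    x2v2 = (x ** 2) @ (v ** 2)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def fm_predict_naive(x, w0, w, v):
    # Direct O(n^2 * k) evaluation, kept only to check the fast version.
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (v[i] @ v[j]) * x[i] * x[j]
    return total

rng = np.random.default_rng(1)
n_features, n_factors = 5, 3  # hypothetical sizes
x = rng.normal(size=n_features)
w = rng.normal(size=n_features)
v = rng.normal(size=(n_features, n_factors))
print(np.isclose(fm_predict(x, 0.3, w, v), fm_predict_naive(x, 0.3, w, v)))
```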
4.5. Ablation Study
4.5.1. Ablation Experiment with DDM and GRU
4.5.2. Comparison of Residual Structure and Attention Layers
4.5.3. Comparison of Additive Attention Mechanisms
4.6. The Effectiveness of DDM
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
DDM | Double-layer residual connections and double multi-head self-attention mechanisms |
BERT | Pre-training of Deep Bidirectional Transformers for Language Understanding |
LibFM | Factorization machines with libfm |
DeepFM | A Factorization-Machine based Neural Network for CTR Prediction |
DKN | Deep knowledge-aware network for news recommendations |
NAML | Neural news recommendation with attentive multi-view learning |
LSTUR | Neural news recommendation with long- and short-term user representations |
NPA | Neural news recommendation with personalized attention |
NRMS | Neural news recommendation with multi-head self-attention |
FIM | Fine-grained interest matching for neural news recommendations |
KIM | Personalized news recommendations with knowledge-aware interactive matching |
References
- Phelan, O.; McCarthy, K.; Bennett, M.; Smyth, B. Terms of a feather: Content-based news recommendation and discovery using Twitter. In Proceedings of the European Conference on Information Retrieval, Dublin, Ireland, 18–21 April 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 448–459.
- Wu, C.; Wu, F.; An, M.; Huang, J.; Huang, Y.; Xie, X. NPA: Neural news recommendation with personalized attention. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2576–2584.
- Wu, F.; Qiao, Y.; Chen, J.-H.; Wu, C.; Qi, T.; Lian, J.; Liu, D.; Xie, X.; Gao, J.; Wu, W.; et al. MIND: A large-scale dataset for news recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3597–3606.
- Raza, S.; Ding, C. News recommender system: A review of recent progress, challenges, and opportunities. Artif. Intell. Rev. 2022, 55, 749–800.
- Zhang, Q.; Li, J.; Jia, Q.; Wang, C.; Zhu, J.; Wang, Z.; He, X. UNBERT: User-News Matching BERT for News Recommendation. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; Volume 21.
- Rendle, S. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. (TIST) 2012, 3, 1–22.
- Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247.
- Wang, H.; Zhang, F.; Xie, X.; Guo, M. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1835–1844.
- Okura, S.; Tagami, Y.; Ono, S.; Tajima, A. Embedding-based news recommendation for millions of users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1933–1942.
- Wu, C.; Wu, F.; An, M.; Qi, T.; Huang, J.; Huang, Y.; Xie, X. Neural news recommendation with heterogeneous user behavior. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4876–4885.
- Wu, C.; Wu, F.; Ge, S.; Qi, T.; Huang, Y.; Xie, X. Neural news recommendation with multi-head self-attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6390–6395.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Liu, J.; Dolan, P.; Pedersen, E.R. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, Hong Kong, China, 7–10 February 2010; pp. 31–40.
- Capelle, M.; Frasincar, F.; Moerland, M.; Hogenboom, F. Semantics-based news recommendation. In Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, Craiova, Romania, 6–8 June 2012; pp. 1–9.
- Son, J.-W.; Kim, A.-Y.; Park, S.-B. A location-based news article recommendation with explicit localized semantic analysis. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 28 July–1 August 2013; pp. 293–302.
- Garcin, F.; Dimitrakakis, C.; Faltings, B. Personalized news recommendation with context trees. In Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12–16 October 2013; pp. 105–112.
- Pazzani, M.J.; Billsus, D. Content-based recommendation systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Springer: Berlin/Heidelberg, Germany, 2007; pp. 325–341.
- Balabanović, M.; Shoham, Y. Fab: Content-based, collaborative recommendation. Commun. ACM 1997, 40, 66–72.
- Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008.
- Wang, H.; Wu, F.; Liu, Z.; Xie, X. Fine-grained interest matching for neural news recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 836–845.
- Liu, S.; Chen, Z.; Liu, H.; Hu, X. User-video co-attention network for personalized micro-video recommendation. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Pennington, J.; Socher, R.; Manning, C. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Wu, C.; Wu, F.; An, M.; Huang, J.; Huang, Y.; Xie, X. Neural news recommendation with attentive multi-view learning. arXiv 2019, arXiv:1907.05576.
- An, M.; Wu, F.; Wu, C.; Zhang, K.; Liu, Z.; Xie, X. Neural news recommendation with long- and short-term user representations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 336–345.
- Qi, T.; Wu, F.; Wu, C.; Huang, Y. Personalized news recommendation with knowledge-aware interactive matching. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021.
- Tuan, N.M.D.; Minh, P.Q.N. Multimodal Fusion with BERT and Attention Mechanism for Fake News Detection. In Proceedings of the 2021 RIVF International Conference on Computing and Communication Technologies (RIVF), Hanoi, Vietnam, 19–21 August 2021; pp. 1–6.
- Manakul, P.; Gales, M.J.F. Long-Span Summarization via Local Attention and Content Selection. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 22–27 May 2021.
- Huang, J.; Han, Z.; Xu, H.; Liu, H. Adapted transformer network for news recommendation. Neurocomputing 2022, 469, 119–129.
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-Attention with Relative Position Representations. arXiv 2018, arXiv:1803.02155.
- He, P.; Gao, J.; Chen, W. DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv 2021, arXiv:2111.09543.
Statistic | Train | Dev | Test |
---|---|---|---|
User | 162,898 | 47,187 | 88,898 |
News | 76,904 | 53,897 | 57,856 |
Impressions | 199,998 | 50,002 | 100,000 |
Positive samples | 300,357 | 75,183 | 153,963 |
Negative samples | 7,060,083 | 1,779,492 | 3,740,561 |
Method | AUC | MRR | nDCG@5 | nDCG@10 |
---|---|---|---|---|
LibFM | 59.74 | 26.33 | 27.95 | 34.29 |
DeepFM | 59.89 | 26.21 | 27.74 | 34.06 |
DKN | 61.75 | 27.05 | 28.90 | 35.38 |
NPA | 63.21 | 29.11 | 31.70 | 37.81 |
NAML | 65.50 | 30.39 | 33.08 | 39.31 |
LSTUR | 64.38 | 29.46 | 31.89 | 38.17 |
NRMS | 64.83 | 30.01 | 32.52 | 38.92 |
FIM | 65.02 | 30.26 | 32.91 | 39.10 |
KIM | 66.25 | 31.62 | 34.97 | 41.16 |
DDM | 66.65 | 32.05 | 35.32 | 41.58 |
Improv. over KIM | 0.40 | 0.43 | 0.35 | 0.42 |
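The four columns above are ranking metrics computed per impression and then averaged. As a reminder of what each one measures, the sketch below evaluates a single made-up impression using common definitions of AUC, MRR, and nDCG@k. The labels, scores, and function names are illustrative assumptions, and the MRR variant (averaging reciprocal ranks over all clicked items) is one common convention rather than a confirmed detail of the evaluation script used here.

```python
import numpy as np

def auc(labels, scores):
    # Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5.
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def mrr(labels, scores):
    # ASSUMPTION: mean of reciprocal ranks over all clicked items.
    labels = np.asarray(labels)
    ranked = labels[np.argsort(scores)[::-1]]
    reciprocal_ranks = ranked / np.arange(1, ranked.size + 1)
    return float(reciprocal_ranks.sum() / ranked.sum())

def dcg_at_k(labels_in_rank_order, k):
    gains = 2.0 ** np.asarray(labels_in_rank_order[:k], dtype=float) - 1.0
    discounts = np.log2(np.arange(2, gains.size + 2))
    return float(np.sum(gains / discounts))

def ndcg_at_k(labels, scores, k):
    # Requires at least one clicked item, otherwise the ideal DCG is zero.
    labels = np.asarray(labels)
    ranked = labels[np.argsort(scores)[::-1]]
    ideal = np.sort(labels)[::-1]
    return dcg_at_k(ranked, k) / dcg_at_k(ideal, k)

# One hypothetical impression: 1 = clicked, 0 = shown but not clicked.
labels = [1, 0, 0, 1, 0]
scores = [0.2, 0.9, 0.3, 0.8, 0.1]
print(auc(labels, scores), mrr(labels, scores), ndcg_at_k(labels, scores, 5))
```

In this example the two clicked items land at ranks 2 and 4, so the scores are far from perfect on all three metrics; a model that ranked both clicked items first would reach 1.0 on each.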
Variant | AUC | MRR | nDCG@5 | nDCG@10 |
---|---|---|---|---|
Pure multi-head attention | 64.83 | 30.01 | 32.52 | 38.92 |
With incorporation of the GRU module | 65.00 | 30.45 | 33.35 | 39.92 |
DDM module | 66.65 | 32.05 | 35.32 | 41.58 |
Attention mechanism | AUC | MRR | nDCG@5 | nDCG@10 |
---|---|---|---|---|
Self-attention mechanism | 65.05 | 30.79 | 33.73 | 40.21 |
Scaled dot-product attention mechanism | 65.33 | 31.27 | 34.19 | 40.62 |
Local attention mechanism | 65.54 | 30.75 | 33.65 | 40.29 |
Adaptive attention mechanism | 65.67 | 31.39 | 34.42 | 40.83 |
Additive attention mechanism | 66.65 | 32.05 | 35.32 | 41.58 |
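The best-performing row above uses additive attention. For reference, the sketch below shows the standard additive attention pooling form, in which each item is scored by a query vector applied to a tanh projection and the items are then averaged with softmax weights. It is a generic illustration rather than the configuration used in DDM; the parameter names `w`, `b`, `q` and all sizes are assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention_pool(h, w, b, q):
    """Pool a sequence of vectors into a single vector.

    h: (seq_len, d) item representations (e.g., words or clicked news),
    w: (d, d_att) projection, b: (d_att,) bias, q: (d_att,) query vector.
    """
    scores = np.tanh(h @ w + b) @ q  # one scalar score per item
    alpha = softmax(scores)          # attention weights, sum to 1
    return alpha @ h, alpha          # weighted sum of items, plus weights

rng = np.random.default_rng(2)
seq_len, d, d_att = 5, 8, 4  # hypothetical sizes
h = rng.normal(size=(seq_len, d))
w = rng.normal(size=(d, d_att))
b = np.zeros(d_att)
q = rng.normal(size=d_att)
pooled, alpha = additive_attention_pool(h, w, b, q)
print(pooled.shape, alpha.round(3))
```

With a zero query vector every item receives the same score, so the pooling reduces to a plain mean; the learned query is what lets the model weight informative items more heavily.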
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, D.; Zhu, Z.; Wang, Z.; Wang, J.; Xiao, L.; Chen, Y.; Zhao, D. Personalized News Recommendation Method with Double-Layer Residual Connections and Double Multi-Head Self-Attention Mechanisms. Appl. Sci. 2024, 14, 5667. https://doi.org/10.3390/app14135667