Efficient Headline Generation with Hybrid Attention for Long Texts
Abstract
1. Introduction
- (1) We propose a hybrid attention mechanism that combines sliding-window and global attention to capture both local and global semantic dependencies, and design two ways to implement it; a rough mask-construction sketch follows this list.
- (2) In local semantic modeling, various sliding-window sizes are compared for their effectiveness on the HG task, and the optimal window size for local semantic representation is determined.
- (3) We conduct comparative experiments and in-depth analyses to verify the effectiveness of the proposed HG model with the hybrid attention mechanism, in terms of training time, memory overhead, and the accuracy and readability of the final generated headlines.
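Section 3.3 details the two implementations of the hybrid attention. Purely as an illustration of the underlying idea (not the authors' exact formulation), the sketch below builds a combined attention mask in which every token attends to a local sliding window while a few designated positions attend globally; the window size and the choice of global positions are assumed values for illustration.

```python
import numpy as np

def hybrid_attention_mask(seq_len, window=64, global_positions=(0,)):
    """Boolean mask: entry [i, j] is True if token i may attend to token j.

    Combines a sliding window around each position (local context) with a
    small set of global positions that attend to, and are attended by,
    every token (global context).
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Sliding-window (local) attention: each token sees +/- window//2 neighbours.
    half = window // 2
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        mask[i, lo:hi] = True

    # Global attention: chosen positions attend everywhere and are visible to all.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True

    return mask

# Example: a 512-token document, 64-token window, first token treated as global.
m = hybrid_attention_mask(512, window=64, global_positions=(0,))
print(m.shape, int(m.sum()))  # attended pairs grow roughly linearly with length
```

Because the number of attended pairs grows linearly rather than quadratically with sequence length, such a mask is what makes long-text headline generation tractable in memory and time.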
2. Related Work
2.1. Rule-Based and Statistical Methods
2.2. Deep Neural Network Methods
2.3. Transformer-Based Models
2.4. Attention Mechanisms
3. Headline Generation Model with Hybrid Attention
3.1. Motivation for the Hybrid Attention Mechanism
3.2. Transformer-Based Headline Generation Model
3.3. Hybrid Attention
4. Experiments, Results and Discussion
4.1. Dataset
4.2. Performance Indicators
4.2.1. Syntactic Similarity Indicators
4.2.2. Semantic Similarity Indicators
4.2.3. Human Evaluation Indicators
- (1) Readability. Is the headline easy for readers to understand and read?
- (2) Informativeness. Does the headline contain rich and useful information?
- (3) Coherence. Are the content and logic of this headline coherent?
- (4) Conciseness. Is this headline concise in its content while conveying an effective message?
4.3. Comparison of Local Attention Mechanisms
4.4. Comparison of Hybrid Attention Mechanisms
4.5. Comparison of the Training and Prediction Costs
4.6. Comparison of Syntactic Similarity between the Generated and the Reference Headlines
4.7. Comparison of Semantic Similarity between the Generated and the Reference Headlines
4.8. Comparison of Human Evaluation Results between the Generated and the Reference Headlines
4.9. Comparison of the Generated Headline Instances
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lee, S.-H.; Choi, S.-W.; Lee, E.-B. A Question-Answering Model Based on Knowledge Graphs for the General Provisions of Equipment Purchase Orders for Steel Plants Maintenance. Electronics 2023, 12, 2504. [Google Scholar] [CrossRef]
- Ahmad, P.N.; Liu, Y.; Khan, K.; Jiang, T.; Burhan, U. BIR: Biomedical Information Retrieval System for Cancer Treatment in Electronic Health Record Using Transformers. Sensors 2023, 23, 9355. [Google Scholar] [CrossRef] [PubMed]
- Lu, Y.; Liu, Q.; Dai, D.; Xiao, X.; Lin, H.; Han, X.; Sun, L.; Wu, H. Unified Structure Generation for Universal Information Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 5755–5772. [Google Scholar]
- Peng, M.; Gao, B.; Zhu, J.; Huang, J.; Yuan, M.; Li, F. High Quality Information Extraction and Query-Oriented Summarization for Automatic Query-Reply in Social Network. Expert Syst. Appl. 2016, 44, 92. [Google Scholar] [CrossRef]
- Sakurai, T.; Utsumi, A. Query-Based Multidocument Summarization for Information Retrieval. In Proceedings of the NTCIR-4; National Institute of Informatics: Tokyo, Japan, 2004. [Google Scholar]
- Deutsch, D.; Roth, D. Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 575–588. [Google Scholar]
- Panthaplackel, S.; Benton, A.; Dredze, M. Updated Headline Generation: Creating Updated Summaries for Evolving News Stories. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 6438–6461. [Google Scholar]
- Akash, A.U.; Nayeem, M.T.; Shohan, F.T.; Islam, T. Shironaam: Bengali News Headline Generation Using Auxiliary Information. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 52–67. [Google Scholar]
- Liu, H.; Guo, W.; Chen, Y.; Li, X. Contrastive Learning Enhanced Author-Style Headline Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Industry Track, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 5063–5072. [Google Scholar]
- Hayashi, Y.; Yanagimoto, H. Headline Generation with Recurrent Neural Network; Matsuo, T., Mine, T., Hirokawa, S., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 81–96. [Google Scholar]
- Thu, Y.; Pa, W.P. Myanmar News Headline Generation with Sequence-to-Sequence Model. In Proceedings of the 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), Yangon, Myanmar, 5–7 November 2020; pp. 117–122. [Google Scholar]
- Zhuoran, S.; Mingyuan, Z.; Haiyu, Z.; Shuai, Y.; Hongsheng, L. Efficient Attention: Attention with Linear Complexities. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 5–9 January 2021; IEEE: New York, NY, USA, 2021; pp. 3530–3538. [Google Scholar]
- Fan, A.; Grave, E.; Joulin, A. Reducing Transformer Depth on Demand with Structured Dropout. arXiv 2019, arXiv:1909.11556. [Google Scholar]
- Yang, K.; Ackermann, J.; He, Z.; Feng, G.; Zhang, B.; Feng, Y.; Ye, Q.; He, D.; Wang, L. Do Efficient Transformers Really Save Computation? arXiv 2024, arXiv:2402.13934. [Google Scholar]
- Dorr, B.; Zajic, D.; Schwartz, R. Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation. In Proceedings of the HLT-NAACL 03 on Text Summarization Workshop; Association for Computational Linguistics: Stroudsburg, PA, USA, 2003; pp. 1–8. [Google Scholar]
- Banko, M.; Mittal, V.O.; Witbrock, M.J. Headline Generation Based on Statistical Translation. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, Hong Kong, China, 3–6 October 2000; Association for Computational Linguistics: Hong Kong, China, 2000; pp. 318–325. [Google Scholar]
- Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735. [Google Scholar] [CrossRef] [PubMed]
- Bowman, S.R.; Vilnis, L.; Vinyals, O.; Dai, A.; Jozefowicz, R.; Bengio, S. Generating Sentences from a Continuous Space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, 11–12 August 2016; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 10–21. [Google Scholar]
- Lopyrev, K. Generating News Headlines with Recurrent Neural Networks. arXiv 2015, arXiv:1512.01712. [Google Scholar]
- Bengio, Y.; Simard, P.; Frasconi, P. Learning Long-Term Dependencies with Gradient Descent Is Difficult. IEEE Trans. Neural Netw. 1994, 5, 157. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
- Mohamed, A.; Okhonko, D.; Zettlemoyer, L. Transformers with Convolutional Context for ASR. arXiv 2020, arXiv:1904.11660. [Google Scholar]
- Zhang, S.; Chen, H.; Yang, H.; Sun, X.; Yu, P.S.; Xu, G. Graph Masked Autoencoders with Transformers. arXiv 2022, arXiv:2202.08391. [Google Scholar]
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P.J. PEGASUS: Pre-Training with Extracted Gap-Sentences for Abstractive Summarization. In Proceedings of the Thirty-seventh International Conference on Machine Learning, Online, 13–18 July 2020; pp. 11328–11339. [Google Scholar]
- Li, Z.; Wu, J.; Miao, J.; Yu, X. News Headline Generation Based on Improved Decoder from Transformer. Sci. Rep. 2022, 12, 11648. [Google Scholar] [CrossRef] [PubMed]
- Yamada, K.; Hitomi, Y.; Tamori, H.; Sasano, R.; Okazaki, N.; Inui, K.; Takeda, K. Transformer-Based Lexically Constrained Headline Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 4085–4090. [Google Scholar]
- Bukhtiyarov, A.; Gusev, I. Advances of Transformer-Based Models for News Headline Generation. In Proceedings of the Ninth Conference on Artificial Intelligence and Natural Language, Helsinki, Finland, 7–9 October 2020; pp. 54–61. [Google Scholar]
- Tikhonova, M.; Shavrina, T.; Pisarevskaya, D.; Shliazhko, O. Using Generative Pretrained Transformer-3 Models for Russian News Clustering and Title Generation Tasks. In Proceedings of the Conference on Computational Linguistics and Intellectual Technologies, Lviv, Ukraine, 22–23 April 2021; pp. 1214–1223. [Google Scholar]
- Wang, Y.; Zhang, Z.; Zhao, Y.; Zhang, M.; Li, X. Design and Implementation of Automatic Generation System for Chinese Scientific and Technical Paper Titles. Data Anal. Knowl. Discov. 2023, 5, 61–71. [Google Scholar]
- Zhang, X.; Jiang, Y.; Shang, Y.; Cheng, Z.; Zhang, C.; Fan, X.; Xiao, Y.; Long, B. DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-Commerce Title and Review Summarization. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Online, 11–15 July 2021; pp. 2146–2150. [Google Scholar]
- Meng, Q.; Liu, B.; Sun, X.; Yan, H.; Liang, C.; Cao, J.; Lee, R.K.-W.; Bao, X. Attention-Fused Deep Relevancy Matching Network for Clickbait Detection. IEEE Trans. Comput. Soc. Syst. 2023, 10, 3120. [Google Scholar] [CrossRef]
- Cui, Z.; Sun, X.; Pan, L.; Liu, S.; Xu, G. Event-Based Incremental Recommendation via Factors Mixed Hawkes Process. Inf. Sci. 2023, 639, 119007. [Google Scholar] [CrossRef]
- Ma, T.; Pan, Q.; Rong, H.; Qian, Y.; Tian, Y.; Al-Nabhan, N. T-BERTSum: Topic-Aware Text Summarization Based on BERT. IEEE Trans. Comput. Soc. Syst. 2022, 9, 879. [Google Scholar] [CrossRef]
- Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Gao, J.; Zhou, M.; Hon, H. Unified Language Model Pre-Training for Natural Language Understanding and Generation. In Proceedings of the 33rd Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 13042–13054. [Google Scholar]
- Hutchins, D.; Schlag, I.; Wu, Y.; Dyer, E.; Neyshabur, B. Block-Recurrent Transformers. Adv. Neural Inf. Process. Syst. 2022, 35, 33248. [Google Scholar]
- Liang, X.; Tang, Z.; Li, J.; Zhang, M. Open-Ended Long Text Generation via Masked Language Modeling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 223–241. [Google Scholar]
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar]
- Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 21–26 July 2004; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 74–81. [Google Scholar]
- Ng, J.-P.; Abrecht, V. Better Summarization Evaluation with Word Embeddings for ROUGE. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1925–1930. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the 2020 International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Gu, J.; Lu, Z.; Li, H.; Li, V.O.K. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 1631–1640. [Google Scholar]
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2692–2700. [Google Scholar]
- Zhou, Q.; Yang, N.; Wei, F.; Huang, S.; Zhou, M.; Zhao, T. Neural Document Summarization by Jointly Learning to Score and Select Sentences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 654–663. [Google Scholar]
Indicators | Train | Val | Test |
---|---|---|---|
Average number of characters in text | 1232.8 | 1236.2 | 1226.4 |
Average number of sentences in text | 28.2 | 28.1 | 28.1 |
Average number of characters in sentences | 24.5 | 24.6 | 24.5 |
Average number of characters in headline | 20.3 | 20.4 | 20.3 |
Average compression ratio | 60.6 | 60.8 | 60.6 |
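For reference, the compression ratio in the last row is consistent with average text length divided by average headline length (e.g., 1232.8 / 20.3 ≈ 60.7). A minimal sketch of how such corpus statistics could be computed, assuming that definition and a simple punctuation-based sentence split (both assumptions, not the authors' exact preprocessing):

```python
def corpus_stats(pairs):
    """Table-style statistics for a list of (text, headline) string pairs."""
    n = len(pairs)
    texts, heads = zip(*pairs)
    # Assumed sentence boundary: Chinese full stop, exclamation or question mark.
    n_sents = [max(1, sum(t.count(p) for p in "。！？")) for t in texts]
    return {
        "avg_chars_text": sum(len(t) for t in texts) / n,
        "avg_sentences_text": sum(n_sents) / n,
        "avg_chars_sentence": sum(len(t) / s for t, s in zip(texts, n_sents)) / n,
        "avg_chars_headline": sum(len(h) for h in heads) / n,
        "avg_compression_ratio": sum(len(t) / max(len(h), 1) for t, h in pairs) / n,
    }
```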
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Score | Improvement (%) |
---|---|---|---|---|---|
L8 | 0.3705 | 0.2684 | 0.3455 | 0.9845 | 0 |
L16 | 0.3762 | 0.2732 | 0.3507 | 1.0002 | 1.5 |
L32 | 0.3856 | 0.2826 | 0.3598 | 1.0281 | 4.4 |
L64 | 0.3904 | 0.2862 | 0.3644 | 1.0412 | 5.7 |
L128 | 0.3897 | 0.2847 | 0.3626 | 1.0372 | 5.3 |
L256 | 0.3903 | 0.2864 | 0.3647 | 1.0414 | 5.7 |
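In both comparison tables, the Score column matches the sum of ROUGE-1, ROUGE-2 and ROUGE-L (e.g., 0.3705 + 0.2684 + 0.3455 ≈ 0.9845), and the improvement column matches the relative gain of that sum over the baseline row (the L8 window here, the plain L64 window in the next table). Under that reading, the two derived columns can be reproduced as follows:

```python
def composite_score(rouge_1, rouge_2, rouge_l):
    # Sum of the three ROUGE metrics, matching the Score column
    # (0.3705 + 0.2684 + 0.3455 = 0.9844, reported as 0.9845).
    return rouge_1 + rouge_2 + rouge_l

def improvement_pct(score, baseline_score):
    # Relative gain over the baseline row, in percent.
    return 100.0 * (score / baseline_score - 1.0)

baseline = composite_score(0.3705, 0.2684, 0.3455)  # L8 window
l64 = composite_score(0.3904, 0.2862, 0.3644)       # L64 window
print(round(improvement_pct(l64, baseline), 1))      # ~5.8 (reported as 5.7)
```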
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Score | Improvement (%) |
---|---|---|---|---|---|
FTGA+L64 | 0.3930 | 0.2881 | 0.3668 | 1.0479 | 0.6 |
SGA+L64 | 0.3874 | 0.2830 | 0.3614 | 1.0320 | −0.8 |
Transformer | 0.3398 | 0.2409 | 0.3172 | 0.8979 | −13.8 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Score |
---|---|---|---|---|
CopyNetwork [42] | 0.24 | 0.13 | 0.23 | 0.60 |
PointerNetwork [43] | 0.25 | 0.16 | 0.22 | 0.63 |
NeuSum [44] | 0.28 | 0.19 | 0.24 | 0.71 |
Transformer [22] | 0.34 | 0.24 | 0.32 | 0.90 |
FTGA+L64 (ours) | 0.39 | 0.28 | 0.36 | 1.03 |
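The ROUGE scores above are computed with the standard ROUGE toolkit (Lin, 2004). Purely as an illustration of what the ROUGE-L column measures, the sketch below computes a character-level, F1-style ROUGE-L from the longest common subsequence; the character-level granularity and the unweighted F1 are assumptions, not the paper's exact configuration.

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L (plain F1) between two headlines."""
    c, r = list(candidate), list(reference)
    if not c or not r:
        return 0.0
    # Longest common subsequence via dynamic programming.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ci in enumerate(c):
        for j, rj in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ci == rj else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```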
Model | ROUGE-WE | BERTScore | Score * |
---|---|---|---|
CopyNetwork | 0.14 | 0.64 | 0.78 |
PointerNetwork | 0.27 | 0.69 | 0.96 |
NeuSum | 0.29 | 0.70 | 0.99 |
Transformer | 0.33 | 0.72 | 1.05 |
FTGA+L64 (ours) | 0.36 | 0.74 | 1.10 |
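Unlike ROUGE, BERTScore matches candidate and reference headlines in contextual embedding space rather than by surface n-gram overlap. A minimal usage sketch with the open-source bert-score package, applied to the translated example headlines from Section 4.9; the language setting and baseline rescaling are assumptions, since the paper's exact BERTScore configuration is not reproduced here:

```python
# pip install bert-score
from bert_score import score

candidates = ["The contest for the most beautiful hardcore Redline fan is now open!"]
references = ["Finding the best hardcore fans of the Redline platform"]

# lang selects a default pretrained model for that language; rescale_with_baseline
# rescales raw cosine similarities into a more interpretable range.
P, R, F1 = score(candidates, references, lang="en", rescale_with_baseline=True)
print(F1.mean().item())
```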
Model | Readability | Informativeness | Coherence | Conciseness |
---|---|---|---|---|
CopyNetwork | 0.37 | 0.33 | 0.31 | 0.59 |
PointerNetwork | 1.49 | 1.31 | 1.42 | 0.56 |
NeuSum | 1.53 | 1.32 | 1.47 | 0.64 |
Transformer | 1.73 | 1.65 | 1.74 | 1.78 |
FTGA+L64 (ours) | 1.85 | 1.70 | 1.88 | 1.84 |
Original text | It’s the fourth anniversary of the Redline Team. In the past four years, we have held hands with more than 400 couples, and more than 100 couples have entered the marriage hall. Here we have witnessed the most beautiful love and the purest trust. For four years, from the initial attempt to the present, it has always been you who have persevered. Some people say that the best love is companionship. Here at our Redline platform, we have also walked all the way with everyone’s company. Thank you for being with us along the way!...... (628 words) |
Reference headline | Finding the best hardcore fans of the Redline platform |
Model | Generated Title |
CopyNetwork | It’s the fourth anniversary of the Redline team, and four years of holding hands in the halls of love here with us, whether you’re a newcomer or not! |
PointerNetwork | From now on the most beautiful Redline hardcore fan selection activity is about to kick off, no matter you are a newcomer or an old fan, no matter you are a pro-military girl, or a soldier brother, as long as you are enthusiastic enough, sincere enough can become our most beautiful hardcore fans! |
NeuSum | From now on the most beautiful Redline hardcore fan selection activity is about to kick off, no matter you are a newcomer or an old fan, no matter you are a pro-military girl, or a soldier brother, as long as you are enthusiastic enough, sincere enough can become our most beautiful hardcore fans! |
Transformer | The contest for hardcore Redline fan is now open! |
FTGA+L64 (ours) | The contest for the most beautiful hardcore Redline fan is now open! |
Share and Cite
Wan, W.; Zhang, C.; Huang, L. Efficient Headline Generation with Hybrid Attention for Long Texts. Electronics 2024, 13, 3558. https://doi.org/10.3390/electronics13173558