Prompt Learning with Structured Semantic Knowledge Makes Pre-Trained Language Models Better
Abstract
1. Introduction
- (1) We introduce richer and more fine-grained semantic knowledge from the Xinhua Dictionary, together with disease knowledge from MedicalKG, into pre-trained language models, enhancing the models’ understanding of Chinese word semantics and medical knowledge.
- (2) We propose a novel long-answer prompt learning method (KLAPrompt), which offers a practical solution to two main challenges in answer engineering: (a) when there are many classes, how can a suitable answer space be constructed? (b) how can multi-token answers be decoded? (See the illustrative sketch after this list.)
- (3) Extensive experiments on five Chinese NLP datasets and five biomedical datasets demonstrate that the proposed method significantly improves widely adopted pre-trained language models. The empirical studies also confirm that KLAPrompt with the discrete answer strategy is the most effective way to integrate structured semantic knowledge.
- (4) We build a word sense prediction dataset (WSP) from the Xinhua Dictionary, available at https://github.com/Xie-Zuotong/WSP (accessed on 28 April 2023), and a disease and category prediction dataset (DCP) from MedicalKG, available at https://github.com/Xie-Zuotong/DCP (accessed on 28 April 2023).
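To make the long-answer idea in contribution (2) concrete, the sketch below shows one way a masked language model could score multi-token dictionary senses as discrete answers: each candidate sense fills a run of [MASK] slots in a prompt, and its per-token log-probabilities are summed. This is a minimal illustration under our own assumptions (HuggingFace Transformers, bert-base-chinese, an illustrative Chinese template, and made-up candidate glosses); it is not the paper’s exact implementation.

```python
# Minimal sketch (not the authors' code) of scoring multi-token "discrete answers"
# with a masked LM: fill the answer slot with one [MASK] per answer token and sum
# the log-probabilities of the candidate sense's tokens at those positions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-chinese"  # assumption: any Chinese masked LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def score_sense(sentence: str, word: str, sense: str) -> float:
    """Sum of masked-LM log-probabilities of `sense` filling the answer slot."""
    sense_ids = tokenizer(sense, add_special_tokens=False)["input_ids"]
    masks = " ".join([tokenizer.mask_token] * len(sense_ids))  # one [MASK] per answer token
    prompt = f"{word}的意思是{masks}。{sentence}"                 # illustrative template only
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        log_probs = model(**inputs).logits[0].log_softmax(dim=-1)
    return sum(log_probs[p, t].item() for p, t in zip(mask_pos, sense_ids))

# Rank two hypothetical sense glosses for the word "订单" ("order") in context.
sentence, word = "他在网上提交了一个订单。", "订单"
candidates = ["购买货物或服务的凭据", "安排事物的先后次序"]
best = max(candidates, key=lambda s: score_sense(sentence, word, s))
print(best)
```

Summing per-token log-probabilities sidesteps building a verbalizer over a large label set, which is one plausible reading of how a long-answer strategy handles many classes and multi-token answers.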
2. Related Works
2.1. Semantic Knowledge
2.2. Prompt Learning
2.3. Biomedical PLMs
3. Methodology
3.1. Prompt Engineering
3.1.1. Discrete Prompts
3.1.2. Continuous Prompts
3.2. Answer Engineering
3.2.1. Discrete Answers
3.2.2. Continuous Answers
3.2.3. Sentence Similarity
3.3. KLAPrompt’s Application in the Medical Field
4. Experiments
4.1. Datasets
4.1.1. Open Domain
4.1.2. Biomedical Domain
4.2. Implementation Details
4.3. Results on Open Domain
5. Discussion
Results on Biomedical Domain
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5753–5763.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Zellers, R.; Bisk, Y.; Schwartz, R.; Choi, Y. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 93–104.
- Qiu, X.; Sun, T.; Xu, Y.; Shao, Y.; Dai, N.; Huang, X. Pre-trained models for natural language processing: A survey. Sci. China Technol. Sci. 2020, 63, 1872–1897.
- Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Yao, Y.; Zhang, A.; Zhang, L.; et al. Pre-trained models: Past, present and future. AI Open 2021, 2, 225–250.
- Feng, X.; Feng, X.; Qin, B.; Feng, Z.; Liu, T. Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; Volume 1, pp. 4071–4077.
- Zhang, S.; Zhang, Y.; Chen, Y.; Wu, D.; Xu, J.; Liu, J. Exploiting Morpheme and Cross-lingual Knowledge to Enhance Mongolian Named Entity Recognition. Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 94.
- Li, Z.; Ding, N.; Liu, Z.; Zheng, H.; Shen, Y. Chinese relation extraction with multi-grained information and external linguistic knowledge. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 4377–4386.
- Alt, C.; Gabryszak, A.; Hennig, L. Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1534–1545.
- Aharoni, R.; Goldberg, Y. Towards String-To-Tree Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 2: Short Papers, pp. 132–140.
- Yu, Z.; Xian, Y.; Yu, Z.; Huang, Y.; Guo, J. Linguistic feature template integration for Chinese-Vietnamese neural machine translation. Front. Comput. Sci. 2022, 16, 163344.
- Yu, Z.; Yu, Z.; Xian, Y.; Huang, Y.; Guo, J. Improving Chinese-Vietnamese Neural Machine Translation with Linguistic Differences. Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–12.
- Liu, X.; Zheng, Y.; Du, Z.; Ding, M.; Qian, Y.; Yang, Z.; Tang, J. GPT understands, too. arXiv 2021, arXiv:2103.10385.
- Cui, L.; Wu, Y.; Liu, J.; Yang, S.; Zhang, Y. Template-Based Named Entity Recognition Using BART. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 1835–1845.
- Mahabadi, R.K.; Zettlemoyer, L.; Henderson, J.; Saeidi, M.; Mathias, L.; Stoyanov, V.; Yazdani, M. PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 3638–3652.
- Cui, G.; Hu, S.; Ding, N.; Huang, L.; Liu, Z. Prototypical Verbalizer for Prompt-based Few-shot Tuning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 7014–7024.
- Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41.
- Dong, Z.; Dong, Q. HowNet: A hybrid language and knowledge resource. In Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering, Beijing, China, 26–29 October 2003; pp. 820–824.
- Bevilacqua, M.; Pasini, T.; Raganato, A.; Navigli, R. Recent trends in word sense disambiguation: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Montreal, QC, Canada, 19–26 August 2021.
- Strubell, E.; Verga, P.; Andor, D.; Weiss, D.; McCallum, A. Linguistically-Informed Self-Attention for Semantic Role Labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 5027–5038.
- Kong, D.; Li, X.; Wang, S.; Li, J.; Yin, B. Learning visual-and-semantic knowledge embedding for zero-shot image classification. Appl. Intell. 2022, 53, 2250–2264.
- Kiefer, S. CaSE: Explaining text classifications by fusion of local surrogate explanation models with contextual and semantic knowledge. Inf. Fusion 2022, 77, 184–195.
- Grand, G.; Blank, I.A.; Pereira, F.; Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 2022, 6, 975–987.
- Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223.
- Peters, M.E.; Neumann, M.; Logan, R.; Schwartz, R.; Joshi, V.; Singh, S.; Smith, N.A. Knowledge Enhanced Contextual Word Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 43–54.
- Levine, Y.; Lenz, B.; Dagan, O.; Ram, O.; Padnos, D.; Sharir, O.; Shalev-Shwartz, S.; Shashua, A.; Shoham, Y. SenseBERT: Driving Some Sense into BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4656–4667.
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv 2021, arXiv:2107.13586.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
- Peng, Y.; Yan, S.; Lu, Z. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. In Proceedings of the 18th BioNLP Workshop and Shared Task, Florence, Italy, 1 August 2019; pp. 58–65.
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3615–3620.
- Huang, K.; Altosaar, J.; Ranganath, R. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv 2019, arXiv:1904.05342.
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 2021, 3, 2.
- Zhang, N.; Jia, Q.; Yin, K.; Dong, L.; Gao, F.; Hua, N. Conceptualized representation learning for Chinese biomedical text mining. arXiv 2020, arXiv:2008.10813.
- Zhang, T.; Cai, Z.; Wang, C.; Qiu, M.; Yang, B.; He, X. SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1: Long Papers, pp. 5882–5893.
- Hambardzumyan, K.; Khachatrian, H.; May, J. WARP: Word-level Adversarial ReProgramming. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1: Long Papers, pp. 4921–4933.
- Cer, D.; Diab, M.; Agirre, E.; Lopez-Gazpio, I.; Specia, L. SemEval-2017 Task 1: Semantic textual similarity multilingual and cross-lingual focused evaluation. arXiv 2017, arXiv:1708.00055.
- Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-BERT: Enabling language representation with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2901–2908.
- Conneau, A.; Rinott, R.; Lample, G.; Schwenk, H.; Stoyanov, V.; Williams, A.; Bowman, S.R. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2475–2485.
- Xu, L.; Hu, H.; Zhang, X.; Li, L.; Cao, C.; Li, Y.; Xu, Y.; Sun, K.; Yu, D.; Yu, C.; et al. CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 4762–4772.
- Zhang, N.; Chen, M.; Bi, Z.; Liang, X.; Li, L.; Shang, X.; Yin, K.; Tan, C.; Xu, J.; Huang, F.; et al. CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 7888–7915.
- Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting Pre-Trained Models for Chinese Natural Language Processing. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 657–668.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
| Word | Sense | Example Phrases |
| --- | --- | --- |
| order | ID:06029 the way in which people or things are placed or arranged in relation to each other | in alphabetical order; in chronological order; in descending/ascending order |
| order | ID:06030 the state that exists when people obey laws, rules or authority | keep the class in good order; maintain order in the capital; restore public order |
| order | ID:06031 a request for food or drinks in a restaurant; the food or drinks that you ask for | May I take your order?; an order for steak and fries; a side order |
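The table above illustrates the word–sense–phrase structure of dictionary knowledge. A possible record layout for such entries is sketched below; the field names and class are our own assumption for illustration, not the released WSP schema.

```python
# Hypothetical record layout for dictionary sense entries like the "order" example.
from dataclasses import dataclass
from typing import List

@dataclass
class SenseEntry:
    word: str            # headword, e.g. "order"
    sense_id: str        # dictionary sense identifier, e.g. "06029"
    gloss: str           # the sense definition
    phrases: List[str]   # example phrases attested for this sense

entries = [
    SenseEntry("order", "06029",
               "the way in which people or things are placed or arranged in relation to each other",
               ["in alphabetical order", "in chronological order", "in descending/ascending order"]),
    SenseEntry("order", "06030",
               "the state that exists when people obey laws, rules or authority",
               ["keep the class in good order", "maintain order in the capital", "restore public order"]),
]
print(len(entries), entries[0].word)
```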
| Models | STS-B | Book Review | XNLI | Chnsenticorp | IFLYTEK |
| --- | --- | --- | --- | --- | --- |
| BERT | 50.75 | 86.62 | 76.80 | 93.30 | 60.52 |
| BERT + KLAPrompt | 52.92 ↑ | 88.63 ↑ | 78.61 ↑ | 94.82 ↑ | 61.58 ↑ |
| RoBERTa | 48.23 | 89.08 | 78.37 | 94.85 | 60.44 |
| RoBERTa + KLAPrompt | 50.37 ↑ | 91.12 ↑ | 80.69 ↑ | 95.10 ↑ | 61.46 ↑ |
| MacBERT | 52.92 | 88.78 | 79.05 | 94.98 | 60.82 |
| MacBERT + KLAPrompt | 54.67 ↑ | 90.10 ↑ | 81.49 ↑ | 95.79 ↑ | 62.01 ↑ |
| Models | XNLI |
| --- | --- |
| BERT | 76.80 |
| BERT + | 77.34 (+0.54) |
| BERT + Discrete | 77.52 (+0.72) |
| BERT + Continuous | 77.72 (+0.92) |
| BERT + Continuous Prompt + Discrete Answer | 78.61 (+1.81) |
| BERT + Continuous Prompt + Continuous Answer | 78.28 (+1.48) |
| BERT + Continuous Prompt + Sentence Similarity | 78.41 (+1.61) |
| Prompt Type | Template T(x) | STS-B | Book Review | XNLI | Chnsenticorp | IFLYTEK |
| --- | --- | --- | --- | --- | --- | --- |
| Discrete | The meaning of [C] is [Y]. [X] | 52.38 | 86.57 | 77.80 | 94.49 | 60.43 |
| Discrete | The meaning of “[C]” is [Y]. [X] | 51.63 | 86.05 | 76.64 | 94.65 | 60.40 |
| Discrete | Among them, the meaning of [C] is [Y]. [X] | 49.12 | 86.92 | 77.04 | 94.74 | 60.44 |
| Discrete | In this sentence, the meaning of [C] is [Y]. [X] | 51.36 | 86.71 | 77.68 | 94.41 | 60.25 |
| Continuous | [C] [Y]. [X], l = 1 | 52.45 | 88.58 | 78.33 | 94.82 | 60.14 |
| Continuous | [C] [Y]. [X], l = 2 | 50.75 | 88.63 | 77.80 | 94.65 | 60.25 |
| Continuous | [C] [Y]. [X], l = 3 | 49.93 | 88.37 | 77.92 | 94.57 | 59.87 |
| Continuous | [C] [Y]. [X], l = 4 | 51.09 | 88.52 | 77.32 | 94.33 | 60.56 |
| Continuous | [C] [Y]. [X], l = 5 | 51.29 | 88.39 | 78.61 | 94.57 | 60.55 |
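The table contrasts discrete templates (literal text such as "The meaning of [C] is [Y]. [X]") with continuous templates, where l presumably denotes the number of learnable prompt vectors. The sketch below is a minimal illustration of that distinction under our own assumptions (HuggingFace Transformers, bert-base-chinese, pseudo-token placement and initialization chosen by us); it is not the authors' implementation.

```python
# Illustrative contrast between a discrete prompt (plain template text) and a
# continuous prompt (l trainable "pseudo token" vectors fed via inputs_embeds).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-chinese"  # assumption
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

word, gloss, sentence = "订单", "购买货物的凭据", "他下了一个订单。"

# Discrete prompt: the template "The meaning of [C] is [Y]. [X]" rendered as text.
discrete = tokenizer(f"{word}的意思是{gloss}。{sentence}", return_tensors="pt")
discrete_logits = model(**discrete).logits

# Continuous prompt: l trainable vectors spliced between [C] and "[Y]. [X]".
l = 3
pseudo = torch.nn.Parameter(0.02 * torch.randn(1, l, model.config.hidden_size))  # learned jointly during training

embed = model.get_input_embeddings()
left = embed(tokenizer(word, return_tensors="pt")["input_ids"])                     # [CLS] [C] [SEP]
right = embed(tokenizer(f"{gloss}。{sentence}", return_tensors="pt")["input_ids"])  # [CLS] [Y] 。 [X] [SEP]
inputs_embeds = torch.cat([left[:, :-1], pseudo, right[:, 1:]], dim=1)              # drop the inner [SEP]/[CLS]
continuous_logits = model(inputs_embeds=inputs_embeds).logits

print(discrete_logits.shape, continuous_logits.shape)
```

In training, the pseudo-token parameters would simply be added to the optimizer alongside (or instead of) the model weights, which is the usual way continuous prompts are tuned.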
| Models | CMeIE (Relation Extraction; F1, %) | CHIP-CDN (Text Classification; F1, %) | CHIP-CTC (Text Classification; F1, %) | KUAKE-QQR (Sentence Similarity Estimation; Accuracy, %) | KUAKE-QTR (Sentence Similarity Estimation; Accuracy, %) |
| --- | --- | --- | --- | --- | --- |
| BERT | 71.04 | 75.12 | 83.30 | 81.49 | 66.29 |
| BERT + KLAPrompt | 72.12 ↑ | 76.28 ↑ | 84.12 ↑ | 82.99 ↑ | 67.49 ↑ |
| MacBERT | 69.89 | 76.01 | 83.19 | 82.68 | 66.53 |
| MacBERT + KLAPrompt | 72.64 ↑ | 77.23 ↑ | 83.90 ↑ | 82.99 ↑ | 67.77 ↑ |
| MC-BERT | 69.08 | 76.51 | 82.20 | 82.36 | 66.30 |
| MC-BERT + KLAPrompt | 71.81 ↑ | 77.44 ↑ | 83.58 ↑ | 83.05 ↑ | 67.54 ↑ |
| MedBERT | 71.63 | 76.85 | 82.74 | 80.49 | 66.13 |
| MedBERT + KLAPrompt | 72.35 ↑ | 78.53 ↑ | 84.12 ↑ | 82.05 ↑ | 67.20 ↑ |
| SMedBERT | 70.53 | 75.49 | 82.56 | 82.36 | 65.53 |
| SMedBERT + KLAPrompt | 72.65 ↑ | 76.73 ↑ | 84.25 ↑ | 83.55 ↑ | 66.98 ↑ |
| Variants | F1 |
| --- | --- |
| SMedBERT | 70.53 |
| SMedBERT + | 70.97 |
| - Disease | 71.88 |
| - Category | 72.33 |
| SMedBERT + KLAPrompt | 72.65 |