Parameter-Efficient Fine-Tuning Method for Task-Oriented Dialogue Systems
Abstract
1. Introduction
2. Related Work
2.1. Pre-Trained Language Models
2.2. Task-Oriented Dialogue System
2.3. Parameter-Efficient Fine-Tuning Method
2.3.1. PEFT Method without Adding Parameters
2.3.2. PEFT Method with Added Parameters
3. Design
3.1. End-to-End Dialogue Modeling
3.2. The Proposed Model
3.3. Domain Adaptation
4. Evaluation
4.1. Dataset and Evaluation Metrics
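The "Comb." column in the result tables that follow is consistent with the standard MultiWOZ combined score, i.e., the average of Inform and Success plus BLEU. Below is a minimal sanity check of that relationship using the fine-tuning row reported in the next subsection; the helper name is illustrative, not from the paper's code.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """MultiWOZ-style combined score: average of Inform and Success plus BLEU."""
    return (inform + success) / 2 + bleu

# Fine-tuning row from the adapter-type comparison below: Inform 87.8, Success 75.3, BLEU 19.89.
print(round(combined_score(87.8, 75.3, 19.89), 2))  # 101.44
```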
4.2. Adapter Types
Method | Inform | Success | BLEU | Comb. | Params |
---|---|---|---|---|---|
Fine-tuning | 87.8 | 75.3 | 19.89 | 101.44 | 100% |
Fine-tuning (our run) | 83.7 | 75.4 | 19.07 | 98.62 | 100% |
Prefix tuning | 58.5 | 42.7 | 12.28 | 62.88 | 0.30% |
Houlsby Adapter | 82.0 | 71.8 | 17.50 | 94.40 | 1.32% |
Parallel Adapter | 83.4 | 74.0 | 19.14 | 97.84 | 1.32% |
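To make the comparison above concrete, the sketch below shows the two adapter placements in PyTorch: a Houlsby-style serial bottleneck adapter applied after a frozen Transformer sublayer, and a parallel adapter applied to the sublayer input. The module names, bottleneck size, and toy feed-forward block are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-projection -> nonlinearity -> up-projection (residual added by the caller)."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))

class AdaptedSublayer(nn.Module):
    """Wraps a frozen sublayer with either a serial (Houlsby-style) or a parallel adapter."""
    def __init__(self, sublayer: nn.Module, d_model: int, parallel: bool = False):
        super().__init__()
        self.sublayer = sublayer
        self.adapter = BottleneckAdapter(d_model)
        self.parallel = parallel
        for p in self.sublayer.parameters():  # only the adapter remains trainable
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sublayer(x)
        if self.parallel:
            # Parallel adapter: reads the sublayer input; its output is added to the sublayer output.
            return h + self.adapter(x)
        # Serial (Houlsby-style) adapter: applied after the sublayer, with a residual connection.
        return h + self.adapter(h)

# Shape check on a toy feed-forward sublayer.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(2, 10, 512)
print(AdaptedSublayer(ffn, 512, parallel=False)(x).shape)  # torch.Size([2, 10, 512])
print(AdaptedSublayer(ffn, 512, parallel=True)(x).shape)   # torch.Size([2, 10, 512])
```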
4.3. Performance Comparison for the Number of Adapters
Method | Inform | Success | BLEU | Comb. | Params |
---|---|---|---|---|---|
Fine-tuning | 87.8 | 75.3 | 19.89 | 101.44 | 100% |
Fine-tuning (our run) | 83.7 | 75.4 | 19.07 | 98.62 | 100% |
Houlsby Adapter | 82.0 | 71.8 | 17.50 | 94.40 | 1.32% |
Houlsby Adapter (3) | 87.8 | 77.3 | 17.73 | 100.28 | 3.96% |
Houlsby Adapter (5) | 89.4 | 76.9 | 17.58 | 100.73 | 6.60% |
Houlsby Adapter (7) | 85.6 | 77.7 | 17.62 | 99.27 | 9.24% |
Parallel Adapter | 83.4 | 74.0 | 19.14 | 97.84 | 1.32% |
Parallel Adapter (3) | 87.4 | 76.1 | 17.58 | 99.33 | 3.96% |
Parallel Adapter (5) | 86.7 | 76.9 | 19.15 | 100.95 | 6.60% |
Parallel Adapter (7) | 87.0 | 75.4 | 19.61 | 100.81 | 9.24% |
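The "Params" column above scales linearly with the number of inserted adapter blocks, at roughly 1.32% of the model's parameters per block; a quick check against the reported percentages:

```python
per_block = 1.32  # % of backbone parameters per adapter block, from the table above
for n in (1, 3, 5, 7):
    print(n, f"{per_block * n:.2f}%")  # 1.32%, 3.96%, 6.60%, 9.24%
```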
4.4. Prefix-Tuning Performance Comparison
Method | Inform | Success | BLEU | Comb. | Params |
---|---|---|---|---|---|
Fine-tuning | 87.8 | 75.3 | 19.89 | 101.44 | 100% |
Fine-tuning (our run) | 83.7 | 75.4 | 19.07 | 98.62 | 100% |
Houlsby Adapter (3) | 87.8 | 77.3 | 17.73 | 100.28 | 3.96% |
Houlsby Adapter (3) + prefix tuning | 84.5 | 74.1 | 18.38 | 97.68 | 4.27% |
Houlsby Adapter (5) | 89.4 | 76.9 | 17.58 | 100.73 | 6.60% |
Houlsby Adapter (5) + prefix tuning | 88.3 | 77.4 | 18.01 | 100.86 | 6.90% |
Parallel Adapter (3) | 87.4 | 76.1 | 17.58 | 99.33 | 3.96% |
Parallel Adapter (3) + prefix tuning | 88.3 | 78.4 | 19.38 | 102.73 | 4.27% |
Parallel Adapter (5) | 86.7 | 76.9 | 19.15 | 100.95 | 6.60% |
Parallel Adapter (5) + prefix tuning | 86.5 | 75.2 | 18.92 | 99.77 | 6.90% |
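As a rough illustration of the prefix-tuning component combined with the adapters above, the sketch below prepends trainable prefix key/value vectors to a single-head attention layer. The prefix length, initialization, and the absence of a reparameterization network are simplifying assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixAttention(nn.Module):
    """Single-head attention with trainable prefix key/value vectors.
    During parameter-efficient tuning, only the prefix parameters would be updated."""
    def __init__(self, d_model: int, prefix_len: int = 10):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.size(0)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Prepend the learned prefix to every example's keys and values.
        k = torch.cat([self.prefix_k.unsqueeze(0).expand(batch, -1, -1), k], dim=1)
        v = torch.cat([self.prefix_v.unsqueeze(0).expand(batch, -1, -1), v], dim=1)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 12, 512)
print(PrefixAttention(512)(x).shape)  # torch.Size([2, 12, 512])
```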
4.5. Low-Resource Conditions
4.6. Prefix Length
4.7. Efficiency
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
- NLU: Natural Language Understanding
- DST: Dialogue State Tracking
- POL: Dialogue Policy Learning
- NLG: Natural Language Generation
- PEFT: Parameter-Efficient Fine-Tuning
- TOD: Task-Oriented Dialogue
Low-resource results (Section 4.5):

Training Data | Model | Inform | Success | BLEU | Comb. |
---|---|---|---|---|---|
1% | Baseline | 66.5 | 51.1 | 12.05 | 70.85 |
1% | PEFTTOD | 51.3 | 34.7 | 9.64 | 52.64 |
5% | Baseline | 80.0 | 63.1 | 14.82 | 86.37 |
5% | PEFTTOD | 76.6 | 54.3 | 17.03 | 82.48 |
10% | Baseline | 79.5 | 65.6 | 16.73 | 89.28 |
10% | PEFTTOD | 84.5 | 69.7 | 15.98 | 93.08 |
20% | Baseline | 85.4 | 69.0 | 15.77 | 92.97 |
20% | PEFTTOD | 82.9 | 70.9 | 17.17 | 94.07 |
Efficiency comparison (Section 4.7):

Model | Training Time | Storage Space | Trainable Parameters |
---|---|---|---|
Baseline | 1109 s (100%) | 240 MB (100%) | 60.5 M (100%) |
PEFTTOD | 882 s (79.5%) | 10 MB (4.27%) | 2.5 M (4.27%) |
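The trainable-parameter and storage savings above come from updating and saving only the adapter and prefix weights while the backbone stays frozen. A generic way to measure this for any PyTorch model is sketched below; the helper is illustrative, not taken from the paper's code.

```python
import torch.nn as nn

def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that will actually receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

print(trainable_fraction(nn.Linear(10, 10)))  # 1.0 when everything is trainable
# With only adapter and prefix parameters left trainable on a roughly 60 M parameter
# backbone (as in the baseline above), this would report a few percent, in line with
# the 4.27% figure in the table.
```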