Cross-Lingual Named Entity Recognition Based on Attention and Adversarial Training
Abstract
1. Introduction
2. Related Work
2.1. Cross-Lingual Transfer
2.2. Adversarial Training
2.3. Attention Mechanism
3. Model
3.1. Embedding
3.2. Double Attention
3.3. Adversarial Training Layer
3.4. Output Layer
4. Experimental Results and Analysis
4.1. Data Sets
4.2. Experimental Configuration
4.3. Experimental Results and Analysis
4.4. Case Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Conneau, A.; Lample, G. Cross-lingual language model pretraining. In Advances in Neural Information Processing Systems 32; NeurIPS: La Jolla, CA, USA, 2019; Volume 3, pp. 154–196. [Google Scholar]
- Wu, Q.; Lin, Z.; Wang, G.; Chen, H.; Karlsson, B.F.; Huang, B.; Lin, C.-Y. Enhanced meta-learning for cross-lingual named entity recognition with minimal resources. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34. [Google Scholar]
- Patra, B.; Moniz, J.R.A.; Garg, S.; Gormley, M.R.; Neubig, G. Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces. arXiv 2019, arXiv:1908.06625. [Google Scholar]
- Liu, L.; Ding, B.; Bing, L.; Joty, S.; Si, L.; Miao, C. MulDA: A Multilingual Data Augmentation Framework for Low-Resource Cross-Lingual NER. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1. Long Papers. [Google Scholar]
- Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar]
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; PMLR: London, UK, 2017; pp. 2208–2217. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237. [Google Scholar]
- Chen, Y.; Liu, Y.; Cheng, Y.; Li, V.O. A teacher-student framework for zero-resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Toronto, ON, Canada, 2017; pp. 1–8. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 86–96. [Google Scholar]
- Edunov, S.; Ott, M.; Auli, M.; Grangier, D. Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Toronto, ON, Canada, 2018; pp. 1–15. [Google Scholar]
- Rush, A.M.; Chopra, S.; Weston, J. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Toronto, ON, Canada, 2015; pp. 15–35. [Google Scholar]
- Walker, C.; Strassel, S.; Medero, J.; Maeda, K. ACE 2005 Multi-Lingual Training Corpus. Available online: https://catalog.ldc.upenn.edu/LDC2006T06 (accessed on 20 February 2020).
- Mayhew, S.; Chen-Tse, T.; Dan, R. Cheap translation for cross-lingual named entity recognition. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
- Shen, Z.; Zhang, Y.-H.; Han, K.; Nandi, A.K.; Honig, B.; Huang, D.-S. miRNA-Disease Association Prediction with Collaborative Matrix Factorization. Complexity 2017, 2017, 2498957. [Google Scholar] [CrossRef]
- Ruder, S.; Vulić, I.; Søgaard, A. A survey of cross-lingual word embedding models. J. Artif. Intell. Res. 2019, 65, 569–631. [Google Scholar] [CrossRef] [Green Version]
- Tsai, C.T.; Mayhew, S.; Roth, D. Cross-lingual named entity recognition via wikification. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, 11–12 August 2016. [Google Scholar]
- Cotterell, R.; Duh, K. Low-resource named entity recognition with cross-lingual, character-level neural conditional random fields. In Proceedings of the Eighth International Joint Conference on Natural Language Processing, Taipei, Taiwan, 27 November–1 December 2017; Volume 2. Short Papers. [Google Scholar]
- Fang, Y.; Wang, S.; Gan, Z.; Sun, S.; Liu, J. Filter: An enhanced fusion method for cross-lingual language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, Held Virtually, 2–9 February 2021; Volume 35. [Google Scholar]
- Feng, X.; Feng, X.; Qin, B.; Feng, Z.; Liu, T. Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer. IJCAI 2018, 1, 4071–4077. [Google Scholar]
- Bari, M.S.; Joty, S.; Jwalapuram, P. Zero-resource cross-lingual named entity recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34. [Google Scholar]
- Chen, X.; Awadallah, A.H.; Hassan, H.; Wang, W.; Cardie, C. Multi-source cross-lingual model transfer: Learning what to share. arXiv 2018, arXiv:1810.03552. [Google Scholar]
- Jiang, X.; Liang, Y.; Chen, W.; Duan, N. XLM-K: Improving cross-lingual language model pre-training with multilingual knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, Held Virtually, 22 February–1 March 2022; Volume 36. [Google Scholar]
- Peng, N.; Dredze, M. Named entity recognition for Chinese social media with jointly trained embeddings. In Proceedings of the 2015 Conference on Empirical Meth-ods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 548–554. [Google Scholar]
- Artetxe, M.; Labaka, G.; Agirre, E. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1. Long Papers. [Google Scholar]
- Zirikly, A.; Hagiwara, M. Cross-lingual transfer of named entity recognizers without parallel corpora. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 27–31 July 2015; Volume 2. Short Papers. [Google Scholar]
- Darwish, K. Named entity recognition using cross-lingual resources: Arabic as an example. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; Volume 1. Long Papers. [Google Scholar]
- Panchendrarajan, R.; Amaresan, A. Bidirectional LSTM-CRF for named entity recognition. In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, Hong Kong, China, 1–3 December 2018; pp. 531–540. [Google Scholar]
- Peng, N.; Dredze, M. Improving named entity recognition for Chinese social media with word segmentation representation learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Toronto, ON, Canada, 2016; p. 149. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; NeurIPS: La Jolla, CA, USA, 2017; pp. 5998–6008. [Google Scholar]
- Yan, H.; Deng, B.; Li, X.; Qiu, X. TENER: Adapting Transformer Encoder for Named Entity Recognition. arXiv 2019, arXiv:1911.04474. [Google Scholar]
- Luo, L.; Yang, Z.; Yang, P.; Zhang, Y.; Wang, L.; Lin, H.; Wang, J. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 2018, 34, 1381–1388. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems; NeurIPS: La Jolla, CA, USA, 2014; pp. 3320–3328. [Google Scholar]
Dataset | Language | Train | Dev | Test |
---|---|---|---|---|
CoNLL2003 | English | 204,567 | 51,578 (5942) | 46,666 (5648) |
WeiboNER | Chinese | 1350 | 270 | 270 |
PeopleDaily2004 | Chinese | 28,046 | 4636 | 4636 |
Parameters | Value |
---|---|
Epoch | 3 |
Batch size | 20 |
Learning rate | 4 × 10⁻⁵ |
Dropout | 0.5 |
Token embedding dimension | 100 |
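For concreteness, the sketch below shows one way the hyperparameters listed above could be wired into an M-BERT encoder with a BiLSTM tagging head, matching the baseline rows of the result tables. It is a minimal illustration rather than the authors' implementation: the model class, LSTM hidden size, label count, and the omission of the CRF, double-attention, and adversarial components are all assumptions made here for brevity.

```python
# Minimal sketch (not the authors' code): plugging the table's hyperparameters
# into an M-BERT encoder with a BiLSTM tagging head. The CRF, double attention,
# and adversarial training layer are omitted; names and sizes are illustrative.
import torch
from torch import nn
from transformers import AutoModel

EPOCHS = 3            # Epoch
BATCH_SIZE = 20       # Batch size
LEARNING_RATE = 4e-5  # Learning rate
DROPOUT = 0.5         # Dropout
TOKEN_EMB_DIM = 100   # Token embedding dimension (extra token features; not used in this sketch)

class MBertBiLSTMTagger(nn.Module):
    """Hypothetical M-BERT + BiLSTM emission scorer (a CRF would decode these scores)."""
    def __init__(self, num_labels: int, lstm_hidden: int = 128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.dropout = nn.Dropout(DROPOUT)
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Contextual subword representations from multilingual BERT
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(self.dropout(hidden))
        return self.classifier(lstm_out)  # per-token emission scores

model = MBertBiLSTMTagger(num_labels=9)  # e.g., BIO tags for four entity types plus O
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```

In the full model, these emission scores would be decoded by the CRF output layer, with the double-attention and adversarial-training components of Sections 3.2 and 3.3 acting on the encoder representations.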
Model | P | R | F1 |
---|---|---|---|
CRF (Peng and Dredze, 2015 [24]) | 0.5698 | 0.2526 | 0.3500 |
Pipeline Seg.Repr. + NER (Peng and Dredze, 2015) | 0.6422 | 0.3608 | 0.4620 |
Word2vec + Bi-LSTM-CRF | 0.3199 | 0.5600 | 0.3290 |
ELMO + Bi-LSTM-CRF | 0.3870 | 0.4428 | 0.4131 |
M-Bert + Bi-LSTM-CRF | 0.4222 | 0.1843 | 0.4693 |
Model | WeiboNER (F1) | PeopleDaily2004 (F1) |
---|---|---|
M-Bert + Bi-LSTM-CRF | 0.5032 | 0.4693 |
M-Bert + Bi-LSTM-CRF + Word-adv + Att | 0.5217 | 0.5208 |
M-Bert + Bi-LSTM-CRF + Word-adv + Att + xlpos | 0.5192 | 0.5322 |
Model | WeiboNER (F1) | PeopleDaily2004 (F1) |
---|---|---|
Full model (M-Bert + Bi-LSTM-CRF + Word-adv + Att + xlpos) | 0.5192 | 0.5322 |
Without attention | 0.5012 | 0.4834 |
Without position | 0.5107 | 0.5214 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).