Mixup Based Cross-Consistency Training for Named Entity Recognition
Abstract
1. Introduction
- We propose using mixup, originally introduced as a data augmentation method, as the perturbation for consistency regularization.
- We propose a framework that better trains Named Entity Recognition models by exploiting weakly labeled data. The framework is based on semi-supervised learning and combines pseudo-labeling with mixup-based consistency regularization.
- Our experimental results show that directly applying mixup to NER tasks degrades the performance of deep NER models. They also show that the proposed framework outperforms training on only a small amount of strongly labeled data.
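At the core of the framework, mixup interpolates both the representations of two token sequences and their tag distributions, and the interpolated pair serves as the perturbed input for consistency regularization. The sketch below illustrates only this interpolation step; the function name, array shapes, and Beta parameter are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def mixup_ner(emb_a, labels_a, emb_b, labels_b, alpha=0.4, rng=None):
    """Interpolate token representations and tag distributions (mixup).

    emb_*:    (seq_len, hidden) token representations
    labels_*: (seq_len, num_tags) one-hot or soft tag distributions
    Returns the mixed representations, mixed tag distributions, and the
    mixing coefficient lambda drawn from Beta(alpha, alpha).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)                 # mixing coefficient in [0, 1]
    mixed_emb = lam * emb_a + (1 - lam) * emb_b  # interpolate representations
    mixed_labels = lam * labels_a + (1 - lam) * labels_b  # interpolate labels
    return mixed_emb, mixed_labels, lam
```

Because the labels are interpolated as distributions, the mixed targets remain valid probability vectors over the tag set, which is what the consistency loss is computed against.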
2. Preliminaries
2.1. Named Entity Recognition
2.2. Mixup
2.3. Semi-Supervised Learning
2.3.1. Pseudo Labeling
2.3.2. Consistency Regularization
2.3.3. Holistic Methods
3. Method
3.1. Model Architecture
3.2. Stage 1: Weakly Labeling
3.3. Stage 2: Pretraining
3.4. Stage 3: Pseudo-Labeling and Joint Training
3.4.1. Pseudo labeling
3.4.2. Joint Training
4. Experiments
4.1. Datasets
4.2. Experimental Setup
- BERT baseline: A supervised learning baseline; the representative baseline model for NER. It consists of the pretrained multi-layer BERT model with fully connected layers stacked on top; that is, the main classifier is placed on top of the language model.
- Mixup baseline: A baseline to show the effect of mixup on the NER task. The mixup baseline has the same structure as the BERT baseline. However, it is trained with examples generated by mixup as well as strongly labeled data.
- Mixup-CCT: The method proposed in [1]; CCT stands for cross-consistency training. The architecture is the same as the one proposed in this paper, but there is no pseudo-labeling; instead, the weight of the consistency loss terms is gradually increased during training.
- Proposed: Performs consistency regularization on mixed examples generated from pseudo-labeled and weakly labeled data, whereas mixup-CCT mixes strongly labeled and weakly labeled data.
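The joint training described above combines a supervised loss on strongly labeled data with a consistency term computed on mixed examples and their interpolated (pseudo/weak) label distributions. A minimal sketch of such an objective is given below; the cross-entropy form and the fixed consistency weight are illustrative assumptions, and the paper's actual weighting schedule is not reproduced here.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Mean token-level cross-entropy between target distribution p
    and predicted distribution q, both of shape (tokens, num_tags)."""
    return float(-(p * np.log(q + eps)).sum(axis=-1).mean())

def joint_loss(sup_target, sup_pred, mixed_target, mixed_pred, w_consistency=1.0):
    """Supervised loss on strongly labeled tokens plus a consistency term
    that pushes predictions on mixed examples toward the interpolated
    pseudo-label / weak-label distributions."""
    l_sup = cross_entropy(sup_target, sup_pred)      # strongly labeled data
    l_cons = cross_entropy(mixed_target, mixed_pred)  # mixed examples
    return l_sup + w_consistency * l_cons
```

Setting `w_consistency=0` recovers plain supervised training, which corresponds to the BERT baseline in the comparison above.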
4.3. Results
4.4. Low Resource Environments
4.5. Comparison of Embedding Vectors
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
| Dataset | Method | Pre (20%) | Rec (20%) | F1 (20%) | Pre (40%) | Rec (40%) | F1 (40%) | Pre (60%) | Rec (60%) | F1 (60%) | Pre (80%) | Rec (80%) | F1 (80%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BC5CDR-chem | BERT baseline | 87.86 | 89.31 | 88.20 | 89.48 | 90.66 | 89.99 | 89.31 | 91.72 | 90.43 | 91.15 | 92.05 | 91.52 |
| BC5CDR-chem | proposed | 88.91 | 91.31 | 89.60 | 90.77 | 91.55 | 90.74 | 90.65 | 92.36 | 90.98 | 92.27 | 93.08 | 92.25 |
| BC5CDR-disease | BERT baseline | 75.86 | 78.99 | 77.30 | 77.88 | 81.27 | 79.46 | 80.05 | 83.55 | 81.68 | 81.43 | 83.96 | 82.60 |
| BC5CDR-disease | proposed | 79.65 | 80.59 | 79.52 | 81.09 | 82.59 | 81.24 | 83.18 | 84.99 | 83.58 | 83.87 | 85.76 | 84.37 |
| NCBI-disease | BERT baseline | 73.10 | 83.97 | 78.12 | 80.85 | 89.05 | 84.72 | 84.36 | 87.28 | 85.75 | 84.26 | 89.06 | 86.56 |
| NCBI-disease | proposed | 82.80 | 86.20 | 84.23 | 85.66 | 86.85 | 86.03 | 86.03 | 89.62 | 87.63 | 86.22 | 89.35 | 87.56 |
| LaptopReview | BERT baseline | 68.46 | 67.84 | 68.12 | 75.07 | 79.79 | 77.32 | 74.81 | 74.63 | 74.72 | 76.04 | 78.45 | 77.17 |
| LaptopReview | proposed | 80.73 | 80.24 | 80.16 | 83.95 | 82.13 | 82.63 | 82.96 | 83.56 | 82.89 | 81.77 | 82.46 | 81.66 |
References
- Youn, G.; Yoon, B.; Ji, S.; Ko, D.; Rhee, J. MixUp based Cross-Consistency Training for Named Entity Recognition. In Proceedings of the 6th International Conference on Advances in Artificial Intelligence, Birmingham, UK, 21–23 October 2022.
- Danger, R.; Pla, F.; Molina, A.; Rosso, P. Towards a Protein–Protein Interaction information extraction system: Recognizing named entities. Knowl.-Based Syst. 2014, 57, 104–118.
- Mollá, D.; Van Zaanen, M.; Smith, D. Named entity recognition for question answering. In Proceedings of the Australasian Language Technology Workshop 2006, Sydney, Australia, 11 November 2006; pp. 51–58.
- Chen, Y.; Zong, C.; Su, K.Y. A joint model to identify and align bilingual named entities. Comput. Linguist. 2013, 39, 229–266.
- Baralis, E.; Cagliero, L.; Jabeen, S.; Fiori, A.; Shah, S. Multi-document summarization based on the Yago ontology. Expert Syst. Appl. 2013, 40, 6976–6984.
- Nobata, C.; Sekine, S.; Isahara, H.; Grishman, R. Summarization System Integrated with Named Entity Tagging and IE pattern Discovery. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Spain, 28 May–3 June 2002; European Language Resources Association (ELRA): Las Palmas, Spain, 2002.
- Chiu, J.P.; Nichols, E. Named entity recognition with bidirectional LSTM-CNNs. Trans. Assoc. Comput. Linguist. 2016, 4, 357–370.
- Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
- Li, J.; Sun, A.; Han, J.; Li, C. A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 2020, 34, 50–70.
- Fang, Z.; Cao, Y.; Li, T.; Jia, R.; Fang, F.; Shang, Y.; Lu, Y. TEBNER: Domain Specific Named Entity Recognition with Type Expanded Boundary-aware Network. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 198–207.
- Jiang, H.; Zhang, D.; Cao, T.; Yin, B.; Zhao, T. Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 1775–1789.
- Liu, S.; Sun, Y.; Li, B.; Wang, W.; Zhao, X. HAMNER: Headword amplified multi-span distantly supervised method for domain specific named entity recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8401–8408.
- Shang, J.; Liu, L.; Ren, X.; Gu, X.; Ren, T.; Han, J. Learning named entity tagger using domain-specific dictionary. arXiv 2018, arXiv:1809.03599.
- Liang, C.; Yu, Y.; Jiang, H.; Er, S.; Wang, R.; Zhao, T.; Zhang, C. Bond: Bert-assisted open-domain named entity recognition with distant supervision. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1054–1064.
- Ouali, Y.; Hudelot, C.; Tami, M. An overview of deep semi-supervised learning. arXiv 2020, arXiv:2006.05278.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
- Chen, J.; Yang, Z.; Yang, D. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2147–2157.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Chen, J.; Wang, Z.; Tian, R.; Yang, Z.; Yang, D. Local Additivity Based Data Augmentation for Semi-supervised NER. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1241–1251.
- Zhang, R.; Yu, Y.; Zhang, C. SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 8566–8579.
- Pinto, F.; Yang, H.; Lim, S.N.; Torr, P.H.; Dokania, P.K. RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out-of-Distribution Robustness. arXiv 2022, arXiv:2206.14502.
- Xie, Q.; Luong, M.T.; Hovy, E.; Le, Q.V. Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10687–10698.
- Li, Z.Z.; Feng, D.W.; Li, D.S.; Lu, X.C. Learning to select pseudo labels: A semi-supervised method for named entity recognition. Front. Inf. Technol. Electron. Eng. 2020, 21, 903–916.
- Gaur, B.; Saluja, G.S.; Sivakumar, H.B.; Singh, S. Semi-supervised deep learning based named entity recognition model to parse education section of resumes. Neural Comput. Appl. 2021, 33, 5705–5718.
- Chen, H.; Yuan, S.; Zhang, X. ROSE-NER: Robust Semi-supervised Named Entity Recognition on Insufficient Labeled Data. In Proceedings of the 10th International Joint Conference on Knowledge Graphs, Virtual Event, 6–8 December 2021; pp. 38–44.
- French, G.; Laine, S.; Aila, T.; Mackiewicz, M.; Finlayson, G. Semi-supervised semantic segmentation needs strong, varied perturbations. arXiv 2019, arXiv:1906.01916.
- Ouali, Y.; Hudelot, C.; Tami, M. Semi-supervised semantic segmentation with cross-consistency training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12674–12684.
- Clark, K.; Luong, M.T.; Manning, C.D.; Le, Q. Semi-Supervised Sequence Modeling with Cross-View Training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 1914–1925.
- Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. Mixmatch: A holistic approach to semi-supervised learning. Adv. Neural Inf. Process. Syst. 2019, 32, 5050–5060.
- Berthelot, D.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Sohn, K.; Zhang, H.; Raffel, C. Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv 2019, arXiv:1911.09785.
- Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 2020, 33, 596–608.
- Peng, M.; Xing, X.; Zhang, Q.; Fu, J.; Huang, X. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Florence, Italy, 2019; pp. 2409–2419.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Doğan, R.I.; Leaman, R.; Lu, Z. NCBI disease corpus: A resource for disease name recognition and concept normalization. J. Biomed. Inform. 2014, 47, 1–10.
- Li, J.; Sun, Y.; Johnson, R.J.; Sciaky, D.; Wei, C.H.; Leaman, R.; Davis, A.P.; Mattingly, C.J.; Wiegers, T.C.; Lu, Z. BioCreative V CDR task corpus: A resource for chemical disease relation extraction. Database 2016, 2016, baw068.
- Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; Al-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. Semeval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the International Workshop on Semantic Evaluation, San Diego, CA, USA, 16–17 June 2016; pp. 19–30.
- Wang, H.; Lu, Y.; Zhai, C. Latent aspect rating analysis without aspect keyword supervision. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 618–626.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
- Jawahar, G.; Sagot, B.; Seddah, D. What does BERT learn about the structure of language? In Proceedings of the ACL 2019—57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019.
- Lee, H.G.; Park, G.; Kim, H. Effective integration of morphological analysis and named entity recognition based on a recurrent neural network. Pattern Recognit. Lett. 2018, 112, 361–365.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
| Dataset | Train | Dev | Test | Weak |
|---|---|---|---|---|
| BC5CDR-chem | 4560 | 4581 | 4797 | 1 |
| BC5CDR-disease | 4560 | 4581 | 4797 | 1 |
| NCBI-disease | 5424 | 923 | 940 | 1 |
| LaptopReview | 2436 | 609 | 800 | 2 |
| Method | Pre (BC5CDR-chem) | Rec (BC5CDR-chem) | F1 (BC5CDR-chem) | Pre (BC5CDR-disease) | Rec (BC5CDR-disease) | F1 (BC5CDR-disease) | Pre (NCBI-disease) | Rec (NCBI-disease) | F1 (NCBI-disease) | Pre (LaptopReview) | Rec (LaptopReview) | F1 (LaptopReview) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT baseline | 90.88 | 92.08 | 91.38 | 83.44 | 85.40 | 84.32 | 83.69 | 90.91 | 86.86 | 79.76 | 79.74 | 79.72 |
| mixup baseline | 90.95 | 92.49 | 91.21 | 83.65 | 85.53 | 84.11 | 85.94 | 87.76 | 86.60 | 80.56 | 77.49 | 78.54 |
| mixup-CCT | 91.98 | 92.78 | 92.19 | 86.69 | 85.46 | 85.86 | 88.64 | 89.23 | 88.82 | 83.40 | 78.09 | 80.46 |
| proposed | 92.12 | 93.14 | 92.23 | 84.19 | 86.81 | 85.03 | 87.98 | 90.25 | 88.92 | 84.85 | 80.90 | 82.24 |
Share and Cite
Youn, G.; Yoon, B.; Ji, S.; Ko, D.; Rhee, J. Mixup Based Cross-Consistency Training for Named Entity Recognition. Appl. Sci. 2022, 12, 11084. https://doi.org/10.3390/app122111084