A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning
Featured Application
Abstract
1. Introduction
- (1) We propose a simple yet effective NER model that integrates an enhanced lexicon and contrastive learning for the complex pig disease domain, making the model more sensitive to texts in this domain and improving entity prediction. The lexicon-enhanced BERT injects external pig disease lexicon knowledge directly into the BERT layers through a Lexicon Adapter layer (a minimal sketch of this adapter follows the list).
- (2) To enrich the semantic feature representation and improve performance under data scarcity, we propose a lexicon-enhanced contrastive loss layer on top of the BERT encoder. Experimental results in small-sample scenarios and on common public datasets demonstrate that our model outperforms the compared models.
- (3) Given the lack of an annotated corpus for the pig disease domain, we collected and annotated a new Chinese corpus containing 7518 entities. To address the insensitivity of general word segmentation to the specialized terminology of the pig disease domain, we constructed a lexicon for identifying pig disease terms using frequency statistics under the guidance of veterinarians.
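To make contribution (1) concrete, the following is a minimal, illustrative PyTorch sketch of a LEBERT-style lexicon adapter: character hidden states from a BERT layer attend over the embeddings of lexicon words matched at each position, and the weighted word knowledge is added back to the character representation. The class name, tensor shapes, and dimensions are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    """Sketch of a LEBERT-style lexicon adapter (assumed, not the paper's exact code).

    Each character position receives up to `max_words` candidate lexicon words;
    their embeddings are projected into the BERT hidden space, weighted by
    attention against the character's hidden state, and added back in.
    """

    def __init__(self, hidden_size: int = 768, word_dim: int = 200):
        super().__init__()
        self.word_proj = nn.Sequential(
            nn.Linear(word_dim, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.attn = nn.Linear(hidden_size, hidden_size, bias=False)  # bilinear attention weight
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, char_hidden, word_embs, word_mask):
        # char_hidden: (batch, seq_len, hidden)               BERT character states
        # word_embs:   (batch, seq_len, max_words, word_dim)  matched lexicon word embeddings
        # word_mask:   (batch, seq_len, max_words)            1 = real word, 0 = padding
        v = self.word_proj(word_embs)                              # (B, L, W, H)
        scores = torch.einsum("blh,blwh->blw", self.attn(char_hidden), v)
        scores = scores.masked_fill(word_mask == 0, -1e9)
        alpha = torch.softmax(scores, dim=-1)                      # attention over candidate words
        weighted = torch.einsum("blw,blwh->blh", alpha, v)         # fuse word knowledge
        return self.layer_norm(char_hidden + weighted)

# Example: fuse 200-dim lexicon embeddings into a batch of BERT character states.
adapter = LexiconAdapter()
h = torch.randn(2, 16, 768)
w = torch.randn(2, 16, 3, 200)
m = torch.ones(2, 16, 3)
fused = adapter(h, w, m)   # (2, 16, 768)
```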
2. Materials and Methods
2.1. Materials
2.1.1. Corpus Collection and Pre-Processing
2.1.2. Analysis of Corpus
2.1.3. Corpus Annotation
2.1.4. Construction of Lexicon and Pre-Training Word Embedding
2.2. Methods
2.2.1. Char–Word Pair Sequence
2.2.2. Lexicon Adapter
2.2.3. Lexicon-Enhanced BERT
2.2.4. Lexicon-Enhanced Contrastive Learning
2.2.5. Construction of Positive and Negative Pairs
- 1. Positive sample pairs based on tokens sharing the same label
- 2. Negative sample pairs based on entities of different types (a minimal sketch of both pair types and the resulting contrastive loss follows this list)
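The two pair types above can feed a supervised contrastive objective. The sketch below is an assumed, simplified version: token representations with the same label act as positives, tokens of different entity types act as negatives, and an InfoNCE-style loss pulls positives together while pushing negatives apart. The function name and the temperature of 0.1 are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def token_contrastive_loss(reps: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1):
    """Supervised contrastive loss over token representations (illustrative sketch).

    reps:   (N, H) encoder outputs for N labeled tokens in a batch
    labels: (N,)   entity-type ids; tokens sharing a label form positive pairs,
                   tokens of different entity types act as negatives.
    """
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.t() / temperature                      # cosine similarity matrix
    n = reps.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=reps.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other tokens, then average over each token's positives
    sim = sim.masked_fill(self_mask, -1e9)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss_per_token = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    # only tokens that actually have at least one positive contribute
    has_pos = pos_mask.any(dim=1)
    return loss_per_token[has_pos].mean() if has_pos.any() else reps.new_zeros(())

# Example: 6 token vectors with labels (0 = Disease, 1 = Symptom, 2 = Medicine).
reps = torch.randn(6, 768)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
print(token_contrastive_loss(reps, labels))
```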
3. Results
3.1. Evaluation
3.2. Experimental Settings
3.3. Results
3.3.1. Comparison with Baseline Models
3.3.2. The Recognition Effect on Different Entities
3.3.3. The Recognition Effect on Small Sample
3.3.4. The Recognition Effect on Public Datasets
4. Discussion
5. Conclusions
- (1) We aim to enhance the identification of more fine-grained entity types in the pig disease domain, including appearance symptoms and anatomical symptoms.
- (2) We will further optimize the pig disease corpus and construct a more extensive professional vocabulary to enhance NER performance.
- (3) We will explore the application of our model to other animal diseases, such as chicken or cow diseases, and investigate methods for handling nested and discontinuous entities based on our approach.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Data Source Details
No | Type | Example |
---|---|---|
1 | Professional books | Jeffrey J. Zimmerman, Locke A. Karriker, Alejandro Ramirez, et al. (Eds.); Hanchun Yang (chief translator). Diseases of Swine. Northern United Publishing & Media Co., Ltd., Liaoning Science and Technology Publishing House: Beijing, China, 2022. Yousheng Xu. Primary Color Atlas of Scientific Pig Raising and Pig Disease Prevention and Control. China Agricultural Publishing House: Beijing, China, 2017. Changyou Li, Xiaocheng Li. Prevention and Control Technology of Swine Epidemic Disease. China Agricultural Publishing House: Beijing, China, 2015. Jianxin Zhang. Diagnosis and Control of Herd Pig Epidemic Disease. Henan Science and Technology Press: Zhengzhou, China, 2014. Chaoying Luo, Guibo Wang. Prevention and Treatment of Pig Diseases and Safe Medication. Chemical Industry Press: Beijing, China, 2016, etc. |
2 | Standard specification | Technical Specification for Quarantine of Porcine Reproductive and Respiratory Syndrome (SN/T 1247-2022), Diagnostic Techniques for Mycoplasma Pneumonia in Swine (NY/T 1186-2017), Diagnostic Techniques for Infectious Pleuropneumonia in Swine (NY/T 537-2023), Diagnostic Techniques for Swine Dysentery (NY/T 545-2023), Technical Specification for Quarantine of Porcine Rotavirus Infection (SN/T 5196-2020), etc. |
3 | Technological specification | Technical specification for prevention and control of highly pathogenic blue ear disease in pigs, technical specification for prevention and control of foot-and-mouth disease, technical specification for prevention and control of classical swine fever, etc. |
4 | Policy paper | Ministry of Agriculture and Rural Affairs “List of Class I, II and III Animal Diseases”, The Ministry of Agriculture issued the “Guiding Opinions on Prevention and Control of Highly Pathogenic Porcine Blue Ear Disease (2017–2020)”, Notice of National Guiding Opinions on Prevention and Control of Classical Swine Fever (2017–2020), etc. |
5 | Relevant industry website | China Veterinary Website (https://www.cadc.net.cn/sites/MainSite/, 10 December 2023), Big Animal Husbandry Website (https://www.dxumu.com/, 10 December 2023), Huinong Website (https://www.cnhnb.com/, 30 January 2024), etc. |
Category | Category Definition | Examples | Number | Proportion of the Total |
---|---|---|---|---|
Type | Names of different types of pig | 妊娠母猪, 仔猪 (Pregnant sows, piglets) | 735 | 9.78% |
Disease | Names of pig diseases | 猪丹毒, 胸膜炎 (Porcine erysipelas, pleurisy) | 958 | 12.74% |
Body parts | Body positions, organs, and systems of pigs | 心脏, 巨噬细胞 (Heart, macrophages) | 2063 | 27.44% |
Symptom | External manifestations caused by diseases | 气喘, 咳嗽, 水肿 (Asthma, cough, edema) | 2973 | 39.55% |
Medicine | Medications for treating diseases | 替米考星, 克林霉素 (Tilmicosin, clindamycin) | 789 | 10.49% |
Total | | | 7518 | 100% |
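The five categories above translate directly into a sequence-labeling tag set. The snippet below assumes a character-level BIO scheme (the corpus's actual scheme and label strings may differ) and shows how a short annotated fragment could be encoded.

```python
# Illustrative tag set for the five entity categories (assuming a BIO scheme;
# the corpus's actual annotation scheme and label names may differ).
ENTITY_TYPES = ["Type", "Disease", "Body", "Symptom", "Medicine"]
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(LABELS)}

# A toy character-level annotation: "仔猪咳嗽" = "piglets cough"
chars = ["仔", "猪", "咳", "嗽"]
tags = ["B-Type", "I-Type", "B-Symptom", "I-Symptom"]
ids = [label2id[t] for t in tags]
print(list(zip(chars, tags)), ids)
```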
Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
batch_size | 16 | dropout | 0.5 |
learning rate | 0.00001 | optimizer | Adam |
epoch | 20 | maximum sentence length | 256 |
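As a reference point, the hyperparameters in the table map onto a conventional PyTorch training setup as sketched below; the model and data are placeholders, not the PDCNER implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters from the table above; the model and data below are
# stand-ins, not the paper's actual implementation.
config = {"batch_size": 16, "lr": 1e-5, "epochs": 20, "dropout": 0.5, "max_len": 256}

model = torch.nn.Sequential(torch.nn.Dropout(config["dropout"]), torch.nn.Linear(768, 11))
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

# Toy data: 64 token vectors with random labels drawn from 11 tag ids.
data = TensorDataset(torch.randn(64, 768), torch.randint(0, 11, (64,)))
loader = DataLoader(data, batch_size=config["batch_size"], shuffle=True)

for epoch in range(config["epochs"]):
    for x, y in loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```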
Model Category | Model | P (%) | R (%) | F1 (%) |
---|---|---|---|---|
baseline model without pre-training | BiLSTM-CRF | 71.58 | 67.51 | 69.49 |
pre-trained model | BERT-BiLSTM-CRF | 80.29 | 84.73 | 82.45 |
 | BERT-CRF | 81.62 | 84.39 | 82.98 |
 | BERT-CNN-CRF | 82.44 | 80.55 | 81.48 |
 | BERT-WWM-ext-BiLSTM-CRF | 81.73 | 85.47 | 83.56 |
 | RoBERTa-BiLSTM-CRF | 81.64 | 85.31 | 83.43 |
pre-trained model with lexicon | BERT-BiLSTM-CRF-SoftLexicon | 82.99 | 84.73 | 83.85 |
 | LEBERT | 87.18 | 86.45 | 86.81 |
 | PDCNER (ours) | 87.76 | 86.97 | 87.36 |
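The P, R, and F1 values reported in these tables are standard entity-level metrics. A minimal sketch of how they are typically computed from gold and predicted (start, end, type) spans is given below; the authors may instead use an evaluation library such as seqeval.

```python
def prf1(gold_entities, pred_entities):
    """Entity-level precision/recall/F1 over (start, end, type) spans.

    A predicted entity counts as correct only if its boundaries and type
    both match a gold entity (exact match), the usual NER convention.
    """
    gold, pred = set(gold_entities), set(pred_entities)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: two gold entities, one correct prediction and one boundary error.
gold = [(0, 2, "Disease"), (5, 7, "Symptom")]
pred = [(0, 2, "Disease"), (5, 6, "Symptom")]
print(prf1(gold, pred))   # (0.5, 0.5, 0.5)
```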
Model | P (1%) | R (1%) | F1 (1%) | P (10%) | R (10%) | F1 (10%) | P (30%) | R (30%) | F1 (30%) |
---|---|---|---|---|---|---|---|---|---|
BERT-BiLSTM-CRF | 31.79 | 1.80 | 3.41 | 67.92 | 75.84 | 71.66 | 74.59 | 81.44 | 77.87 |
LEBERT | 17.39 | 11.43 | 13.79 | 74.81 | 83.47 | 78.91 | 74.05 | 80.14 | 76.97 |
PDCNER (ours) | 18.18 | 11.43 | 14.04 | 85.00 | 86.44 | 85.71 | 87.22 | 86.17 | 86.69 |

Model | P (50%) | R (50%) | F1 (50%) | P (100%) | R (100%) | F1 (100%) |
---|---|---|---|---|---|---|
BERT-BiLSTM-CRF | 79.18 | 83.45 | 81.26 | 80.29 | 84.73 | 82.45 |
LEBERT | 81.09 | 83.40 | 82.23 | 87.18 | 86.45 | 86.81 |
PDCNER (ours) | 87.68 | 86.71 | 87.19 | 87.76 | 86.97 | 87.36 |
Model | Ontonotes | Resume |  |  |  |  |  |  |  |
---|---|---|---|---|---|---|---|---|---|
 | P | R | F1 | P | R | F1 | P | R | F1 |
BERT-BiLSTM-CRF | 71.29 | 67.10 | 69.13 | 84.11 | 80.20 | 82.11 | 96.56 | 97.23 | 95.89 |
LEBERT | 78.20 | 74.64 | 76.38 | 89.76 | 82.68 | 86.07 | 95.89 | 97.42 | 96.65 |
PDCNER (ours) | 81.42 | 76.56 | 78.91 | 89.68 | 83.43 | 86.44 | 96.07 | 97.36 | 96.71 |