Research on Named Entity Recognition Methods in Chinese Forest Disease Texts
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Construction and Analysis of Data Set
3.2. The Proposed Approach
3.2.1. Multi-Feature Embedding Layer
- (1) The radical features of Chinese characters. An analysis of forest disease texts showed that many disease and fungicide names share characteristic radicals. For example, disease names often contain radicals such as “口”, “疒”, and “艹”, while control agent names typically contain radicals such as “氵”, “雨”, and “刂”. The radical of each character was therefore used as a basic feature.
- (2) Word boundary features. Place names and organization names in general-domain data sets have obvious word boundaries; for example, most place names contain boundary words such as “省” and “市”. In forest disease texts, by contrast, many entities, such as “混灭威” and “百菌清”, have no obvious boundary characteristics. The word boundary was therefore introduced as a feature, and each sentence was automatically labeled with word boundaries.
- (3) Part-of-speech features. Parts of speech carry deep information about words and are a common feature in Chinese natural language processing. An analysis of forest disease texts showed regularities in the part-of-speech distribution of some entities; for example, some disease entities consist of several consecutive nouns, while control agent entities usually appear after verbs. Thus, the result of automatic part-of-speech tagging was used as a basic feature. The tag set covers more than 30 parts of speech, including nouns, verbs, prepositions, and adverbs.
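The three features above can be illustrated with a minimal sketch that expands a segmented, part-of-speech-tagged sentence into one feature tuple per character. Everything named here is an assumption made for illustration: the `RADICALS` lookup is a tiny hypothetical table, the B/M/E/S boundary scheme is one common convention rather than the scheme confirmed by the paper, and the example sentence and its tags are invented.

```python
# Hypothetical radical lookup; a real system would need a full
# character-to-radical dictionary.
RADICALS = {"喷": "口", "洒": "氵", "菌": "艹", "清": "氵"}


def boundary_tags(word):
    """Return B/M/E/S boundary tags for one segmented word (assumed scheme)."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def char_features(tagged_words):
    """Expand (word, pos) pairs into (char, radical, boundary, pos) tuples."""
    rows = []
    for word, pos in tagged_words:
        for ch, tag in zip(word, boundary_tags(word)):
            # Characters missing from the lookup get the placeholder "UNK".
            rows.append((ch, RADICALS.get(ch, "UNK"), tag, pos))
    return rows


# Invented example: a verb followed by a control agent name.
sentence = [("喷洒", "v"), ("百菌清", "n")]
features = char_features(sentence)
for row in features:
    print(row)
```

In a model along the lines described here, each of the four columns would be mapped to its own embedding and the embeddings combined per character; that step is omitted from the sketch.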
3.2.2. Transformer Encoder Layer with Position Information
3.2.3. BiGRU Layer
3.2.4. CRF Layer
3.3. The Existing Methods
3.3.1. BiGRU-CRF Method
3.3.2. BiLSTM-CRF Method
3.3.3. Transformer-BiLSTM-CRF Method
4. Experimental Setup and Results
4.1. Experimental Parameter Setup
4.2. Experimental Results
4.2.1. Results of Multi-feature Embedding
4.2.2. Results of the Methods in Two Different Conditions
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Types | Labels |
---|---|
Forest disease entities (D) | B-D, I-D |
Drug entities (T) | B-T, I-T |
Damage site entities (L) | B-L, I-L |
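As a rough illustration of how the labels in the table above attach to text, the sketch below tags each character of a sentence from a list of entity spans. The `O` label for characters outside any entity, the `(start, end, type)` span format, and the example sentence are assumptions for illustration and are not taken from the paper.

```python
def bio_labels(text, spans):
    """Assign one label per character.

    spans holds (start, end, type) triples with an exclusive end index.
    """
    labels = ["O"] * len(text)  # "O" for non-entity characters is an assumption
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"
    return labels


# Invented example: a drug entity (T) and a forest disease entity (D).
text = "百菌清防治叶斑病"
spans = [(0, 3, "T"), (5, 8, "D")]
labels = bio_labels(text, spans)
for ch, label in zip(text, labels):
    print(ch, label)
```

The sketch assumes spans do not overlap; nested or overlapping entities would need a different labeling scheme.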

Types | Total Quantity | Training Set | Testing Set |
---|---|---|---|
D | 5468 | 4803 | 665 |
T | 7169 | 6285 | 884 |
L | 706 | 625 | 81 |

Parameter | Value |
---|---|
Character embedding size | 230 |
Maximum length of sequence | 100 |
Learning rate | 0.0001 |
Dropout rate | 0.1 |
Batch size | 128 |
Number of epochs | 100 |

Features | P (%) | R (%) | F1 (%) |
---|---|---|---|
Char | 90.70 | 89.06 | 89.87 |
POS | 83.03 | 73.19 | 77.80 |
Boundary | 88.40 | 82.63 | 85.42 |
Radical | 91.68 | 90.66 | 91.17 |
RBP | 93.16 | 92.97 | 93.07 |
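The F1 column can be checked against the P and R columns, assuming F1 is the usual harmonic mean of precision and recall. The sketch below recomputes it for each row of the table above. Because P and R are themselves rounded to two decimals, the recomputed value can differ from the reported one by about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the table above, in percent.
rows = {
    "Char": (90.70, 89.06, 89.87),
    "POS": (83.03, 73.19, 77.80),
    "Boundary": (88.40, 82.63, 85.42),
    "Radical": (91.68, 90.66, 91.17),
    "RBP": (93.16, 92.97, 93.07),
}

for name, (p, r, reported) in rows.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed {f1:.2f}, reported {reported:.2f}")
```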

Models | P (%) | R (%) | F1 (%) |
---|---|---|---|
BiLSTM-CRF | 85.20 | 86.85 | 86.02 |
BiGRU-CRF | 87.42 | 86.55 | 86.98 |
Transformer-BiLSTM-CRF | 88.98 | 88.35 | 88.66 |
Transformer-BiGRU-CRF | 90.70 | 89.06 | 89.87 |
RBP-Transformer-BiGRU-CRF | 93.16 | 92.97 | 93.07 |

Models | P (%) | R (%) | F1 (%) |
---|---|---|---|
RBP-BiLSTM-CRF | 89.79 | 85.64 | 87.67 |
RBP-BiGRU-CRF | 91.74 | 89.16 | 90.43 |
RBP-Transformer-BiLSTM-CRF | 90.11 | 92.37 | 91.20 |
RBP-Transformer-BiGRU-CRF | 93.16 | 92.97 | 93.07 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Q.; Su, X. Research on Named Entity Recognition Methods in Chinese Forest Disease Texts. Appl. Sci. 2022, 12, 3885. https://doi.org/10.3390/app12083885