A New Entity Relationship Extraction Method for Semi-Structured Patent Documents
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Patent Document Ontology Modeling Method Based on Hierarchical Clustering and Association Rules
3.1.1. Concept Acquisition
3.1.2. Inter-Conceptual Relationship Extraction
- (1) Hierarchical relationship extraction
- (2) Non-hierarchical relationship extraction
3.2. Patent Document Entity Identification Method Combining Statistical Learning and Deep Learning
3.2.1. Rule Dictionary
3.2.2. Vector Initialization
3.2.3. Dilated (Hole) Convolution Neural Network
3.2.4. BiGRU Network Layer
3.2.5. CRF Inference Layer
3.3. Patent Document Entity Relationship Extraction Method Integrating Attention Mechanism
- (1) Characteristic reinforcement layer
- (2) Head entity labeling layer
- (3) Head entity feature fusion
- (4) Relationship and tail entity tagger
4. Experiments
4.1. Datasets and Implementation Details
4.2. Comparative and Ablation Experiments
4.3. Validation
5. Results and Analysis
5.1. Experiment Results
5.2. Validity Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dataset | Training Set | Test Set |
---|---|---|
Dataset 1 | 61,530 | 5,541 |
Dataset 2 | 15,229 | 729 |
Models | Dataset 1 Accuracy (%) | Dataset 1 Recall Rate (%) | Dataset 1 F1 (%) | Dataset 2 Accuracy (%) | Dataset 2 Recall Rate (%) | Dataset 2 F1 (%) |
---|---|---|---|---|---|---|
CopyRE | 62.0 | 56.5 | 58.6 | 37.7 | 36.4 | 37.1 |
GraphRel | 63.9 | 60.0 | 61.9 | 44.7 | 41.1 | 42.9 |
CopyRRL | 77.8 | 68.1 | 72.1 | 63.3 | 59.9 | 61.6 |
ETL-Span | 85.3 | 72.3 | 78.0 | 84.3 | 82.0 | 83.1 |
CasRel | 88.7 | 88.2 | 89.5 | 93.4 | 90.1 | 91.8 |
MPreA | 90.2 | 93.1 | 92.1 | 93.8 | 91.9 | 92.8 |
MPreA (BERT) | 91.3 | 91.2 | 91.1 | 93.4 | 91.3 | 92.3 |
MPreA (ALBERT) | 91.9 | 91.7 | 91.5 | 93.2 | 91.5 | 92.4 |
MPreA (ELECTRA) | 92.5 | 93.5 | 92.4 | 93.6 | 92.2 | 92.9 |
Model | Dataset 1 Accuracy (%) | Dataset 1 Recall Rate (%) | Dataset 1 F1 (%) | Dataset 2 Accuracy (%) | Dataset 2 Recall Rate (%) | Dataset 2 F1 (%) |
---|---|---|---|---|---|---|
Model-a | 91.3 | 92.4 | 92.1 | 93.0 | 92.2 | 91.3 |
Model-b | 92.3 | 89.1 | 89.3 | 93.1 | 90.6 | 90.4 |
Model-c | 91.6 | 92.5 | 92.8 | 93.8 | 91.9 | 93.1 |
Models | Dataset 3 Accuracy (%) | Dataset 3 Recall Rate (%) | Dataset 3 F1 (%) |
---|---|---|---|
MPreA | 92.9 | 91.2 | 93.7 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, L.; Sun, X.; Ma, X.; Hu, K. A New Entity Relationship Extraction Method for Semi-Structured Patent Documents. Electronics 2024, 13, 3144. https://doi.org/10.3390/electronics13163144