Technology Keyword Analysis Using Graphical Causal Models
Abstract
1. Introduction
2. Graph and Causal Inference
2.1. Graph Structure
2.2. Causal Inference Models
3. Proposed Method
- (Step 1) Collecting patent document data:
  - (1-1) Selecting target technology;
  - (1-2) Searching patents related to target technology;
  - (1-3) Determining valid patents.
- (Step 2) Preprocessing patent text data:
  - (2-1) Performing tokenization;
  - (2-2) Performing normalization: stemming, lemmatization, lowercasing, deleting stop words, removing punctuation and meaningless characters.
- (Step 3) Building a patent document–keyword matrix:
  - (3-1) Finding unique vocabulary from corpus;
  - (3-2) Constructing a data matrix: rows = patent documents; columns = keywords from the vocabulary; matrix values = frequency values of each keyword in a document.
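The preprocessing and matrix-building steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the two-document corpus, the tiny stop-word list, and the names `STOP_WORDS`, `preprocess`, `build_document_keyword_matrix`, and `patents` are all hypothetical, and stemming and lemmatization are omitted for brevity.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_document_keyword_matrix(documents):
    """Return (vocabulary, matrix): rows = documents, columns = keywords,
    values = frequency of each keyword in each document."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Hypothetical toy corpus standing in for the valid patent documents.
patents = [
    "A sensor network for digital therapeutics.",
    "Machine learning with sensor signal and sensor data.",
]
vocabulary, matrix = build_document_keyword_matrix(patents)
```

Each row of `matrix` corresponds to one patent document and each column to one entry of `vocabulary`, which matches the layout described in (3-2).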
- (Step 1) Building a complete undirected graph based on technology keywords.
- (Step 2) Performing a conditional independence test:
  - (2-1) Determining the significance level;
  - (2-2) Using subsets of adjacency sets related to all pairs of technology keywords.
- (Step 3) Deleting the edges of a pair of nodes with conditional independence.
- (Step 4) Constructing a graphical causal model using the remaining edges.
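A single conditional independence test and the resulting edge deletion might look like the sketch below. It assumes a Fisher z-test on partial correlations, which is one common choice for this kind of procedure but is a Gaussian approximation that may fit keyword counts poorly; the source does not state which test was used. The synthetic columns `kw_a`, `kw_b`, `kw_c` and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(data, i, j, cond):
    """Partial correlation of columns i and j given columns in cond (recursive formula)."""
    if not cond:
        return correlation(data[i], data[j])
    k, rest = cond[0], cond[1:]
    r_ij = partial_correlation(data, i, j, rest)
    r_ik = partial_correlation(data, i, k, rest)
    r_jk = partial_correlation(data, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def ci_p_value(data, i, j, cond):
    """Two-sided p-value of the Fisher z-test for i independent of j given cond."""
    n = len(data[i])
    r = max(min(partial_correlation(data, i, j, cond), 0.999999), -0.999999)
    statistic = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return 2 * (1 - NormalDist().cdf(statistic))


# Synthetic toy data: kw_a and kw_b both depend on kw_c and on nothing else.
kw_c = [0, 0, 0, 0, 2, 2, 2, 2] * 5
noise_a = [0, 0, 2, 2, 0, 0, 2, 2] * 5
noise_b = [0, 2, 0, 2, 0, 2, 0, 2] * 5
data = {
    "kw_a": [c + e for c, e in zip(kw_c, noise_a)],
    "kw_b": [c + e for c, e in zip(kw_c, noise_b)],
    "kw_c": kw_c,
}

alpha = 0.05  # significance level chosen in (2-1)
edges = {frozenset(pair) for pair in combinations(data, 2)}  # complete undirected graph

p_marginal = ci_p_value(data, "kw_a", "kw_b", [])
p_given_c = ci_p_value(data, "kw_a", "kw_b", ["kw_c"])
if p_given_c > alpha:
    # Independence is not rejected, so the edge is deleted (Step 3).
    edges.discard(frozenset(("kw_a", "kw_b")))
```

In this toy example `kw_a` and `kw_b` are marginally correlated but become uncorrelated once `kw_c` is conditioned on, so only the two edges touching `kw_c` remain.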
- (Step 1) Skeleton of directed acyclic graph (DAG):
  - (1-1) Estimating the DAG skeleton;
  - (1-2) Starting with a complete undirected graph.
- (Step 2) Testing the constraint of each edge (between a pair of nodes):
  - (2-1) Defining the conditional set C;
  - (2-2) Deleting the edge between the two nodes if they are conditionally independent given C;
  - (2-3) Building the separation set for the node pair.
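The skeleton search outlined above can be sketched in the style of the PC algorithm. This is a simplified illustration rather than the implementation used in the study: the independence test is passed in as a callable so that any test can be plugged in, and the toy oracle below encodes a hypothetical chain a → b → c. The names `estimate_skeleton`, `is_independent`, and `chain_oracle` are assumptions.

```python
from itertools import combinations


def estimate_skeleton(nodes, is_independent):
    """Estimate an undirected skeleton and the separation sets.

    is_independent(x, y, cond) must return True when x and y are judged
    conditionally independent given the set cond.
    """
    # (1-2) Start with a complete undirected graph.
    adjacency = {v: set(nodes) - {v} for v in nodes}
    separation_sets = {}
    size = 0  # size of the conditional set C, grown one step at a time
    while any(len(adjacency[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adjacency[x]):
                # (2-1) Candidate conditional sets come from the neighbours of x.
                candidates = sorted(adjacency[x] - {y})
                for cond in combinations(candidates, size):
                    if is_independent(x, y, set(cond)):
                        # (2-2) Delete the edge in both directions.
                        adjacency[x].discard(y)
                        adjacency[y].discard(x)
                        # (2-3) Record the separation set for the pair.
                        separation_sets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adjacency, separation_sets


def chain_oracle(x, y, cond):
    """Toy oracle for the chain a -> b -> c: a and c are independent given b."""
    return {x, y} == {"a", "c"} and "b" in cond


adjacency, separation_sets = estimate_skeleton(["a", "b", "c"], chain_oracle)
```

With real data, `is_independent` would wrap a statistical test at the chosen significance level, for example by comparing a p-value against α.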
- (M1) Building the patent–keyword matrix;
- (M2) Graphical causal inference modeling;
- (M3) Poisson regression modeling;
- (M4) Constructing the keyword diagram of cause and effect technologies.
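For module (M3), a Poisson regression of one keyword count on another can be sketched as below. This is a minimal illustration assuming a single predictor, the log link log E[y] = b0 + b1·x, and a Wald test for the slope; the Newton-Raphson fitting routine, the toy counts, and the names `fit_poisson`, `cause_counts`, and `effect_counts` are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def _information(x, mu):
    """Entries of the 2x2 Fisher information matrix and its determinant."""
    h00 = sum(mu)
    h01 = sum(xi * mi for xi, mi in zip(x, mu))
    h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return h00, h01, h11, h00 * h11 - h01 * h01


def fit_poisson(x, y, iterations=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, p_value), where p_value is the two-sided Wald test of b1 = 0.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        h00, h01, h11, det = _information(x, mu)
        # Newton step: beta += inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    h00, _, _, det = _information(x, mu)
    standard_error = math.sqrt(h00 / det)  # sqrt of the (b1, b1) entry of the inverse
    p_value = 2 * (1 - NormalDist().cdf(abs(b1 / standard_error)))
    return b0, b1, p_value


# Hypothetical toy counts: frequency of a "cause" keyword and an "effect" keyword.
cause_counts = [0, 0, 0, 1, 1, 1]
effect_counts = [1, 2, 3, 3, 4, 5]
b0, b1, p_value = fit_poisson(cause_counts, effect_counts)
```

Because the toy predictor is binary, the fitted means equal the two group means (2 and 4), so both coefficients converge to log 2. In practice a statistics package would be used, and one model would be fitted per dependent keyword with all of its parent keywords as predictors.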
4. Experimental Results
4.1. Experimental Data
4.2. Analyzing Technology Keyword Data Using Graphical Causal Modeling
- (Q1) What are the causes within the field of digital therapeutics technology?
- (Q2) What are the effects within the field of digital therapeutics technology?
- (Q3) What is the mediator that connects cause and effect in the field of digital therapeutics?
5. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
Graph | Representation | Patent–Keyword Matrix |
---|---|---|
Node | Variable | Keyword |
Edge | Cause | Technology relationship |
Keyword | In (α = 0.05) | Out (α = 0.05) | In (α = 0.01) | Out (α = 0.01) |
---|---|---|---|---|
analysis | 0 | 3 | 0 | 2 |
compute | 1 | 4 | 1 | 4 |
digit | 4 | 3 | 3 | 3 |
generate | 6 | 1 | 3 | 2 |
intelligent | 2 | 2 | 0 | 2 |
learn | 3 | 1 | 3 | 0 |
machine | 0 | 2 | 0 | 2 |
network | 3 | 1 | 3 | 1 |
sensor | 2 | 2 | 3 | 1 |
signal | 4 | 1 | 5 | 0 |
smart | 0 | 2 | 0 | 2 |
therapeutics | 0 | 3 | 0 | 2 |
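One way to read the in- and out-degree table above is to group keywords by their position in the directed graph. The sketch below copies the degrees from the table and applies a simple heuristic that is an assumption of this example, not necessarily the criterion used in the study: a keyword with no incoming edges is treated as a source (candidate cause), a keyword with no outgoing edges as a sink (candidate effect), and everything else as intermediate. The names `DEGREES` and `classify_by_degree` are hypothetical.

```python
# (in-degree, out-degree) per keyword, copied from the table, keyed by significance level.
DEGREES = {
    0.05: {
        "analysis": (0, 3), "compute": (1, 4), "digit": (4, 3), "generate": (6, 1),
        "intelligent": (2, 2), "learn": (3, 1), "machine": (0, 2), "network": (3, 1),
        "sensor": (2, 2), "signal": (4, 1), "smart": (0, 2), "therapeutics": (0, 3),
    },
    0.01: {
        "analysis": (0, 2), "compute": (1, 4), "digit": (3, 3), "generate": (3, 2),
        "intelligent": (0, 2), "learn": (3, 0), "machine": (0, 2), "network": (3, 1),
        "sensor": (3, 1), "signal": (5, 0), "smart": (0, 2), "therapeutics": (0, 2),
    },
}


def classify_by_degree(degree_table):
    """Group keywords into sources, sinks, and intermediate nodes (heuristic)."""
    roles = {"source": [], "sink": [], "intermediate": []}
    for keyword, (in_degree, out_degree) in degree_table.items():
        if in_degree == 0 and out_degree > 0:
            roles["source"].append(keyword)
        elif out_degree == 0 and in_degree > 0:
            roles["sink"].append(keyword)
        else:
            roles["intermediate"].append(keyword)
    return roles


roles_by_alpha = {alpha: classify_by_degree(table) for alpha, table in DEGREES.items()}
```

Under this heuristic, analysis, machine, smart, and therapeutics are sources at both significance levels, while learn and signal become sinks only at α = 0.01.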
Dependent | Independent | \|β\| | p-Value |
---|---|---|---|
digit | analysis | 0.7398 | <0.0001 |
digit | compute | 0.0395 | 0.0503 |
digit | signal | 0.1952 | <0.0001 |
digit | smart | 0.6709 | <0.0001 |
generate | digit | 0.3594 | <0.0001 |
generate | intelligent | 0.5361 | <0.0001 |
generate | learn | 0.3201 | <0.0001 |
generate | machine | 0.2507 | <0.0001 |
generate | sensor | 0.1344 | <0.0001 |
generate | therapeutics | 0.0110 | 0.6130 |
signal | compute | 0.1045 | <0.0001 |
signal | generate | 0.3181 | <0.0001 |
signal | sensor | 0.1756 | <0.0001 |
signal | therapeutics | 0.1421 | <0.0001 |
Dependent | Independent | \|β\| | p-Value |
---|---|---|---|
signal | compute | 0.0975 | <0.0001 |
signal | digit | 0.2816 | 0.0006 |
signal | generate | 0.3122 | <0.0001 |
signal | sensor | 0.1682 | <0.0001 |
signal | therapeutics | 0.1533 | <0.0001 |
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jun, S. Technology Keyword Analysis Using Graphical Causal Models. Electronics 2024, 13, 3670. https://doi.org/10.3390/electronics13183670