Zero-Inflated Patent Data Analysis Using Compound Poisson Models
Abstract
1. Introduction
2. Related Works
3. Proposed Method
3.1. Preprocessing of Text Data
3.2. Proposed Model
3.3. Model Evaluation and Procedure of Proposed Method
4. Experimental Results
4.1. Simulation Data Analysis
4.2. Patent Data Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
Transformation of tm Package | Function |
---|---|
tm_map(x, tolower) | Convert uppercase to lowercase |
tm_map(x, removeNumbers) | Remove numbers |
tm_map(x, removePunctuation) | Remove punctuation |
tm_map(x, removeWords, stopwords("english")) | Remove stop-words |
tm_map(x, stripWhitespace) | Remove unnecessary spaces |
tm_map(x, removeWords, c(word list)) | Remove user-defined words (word list) |
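The transformations listed in the table above can be illustrated with a minimal Python sketch. This is an analogue written with the standard library only, not the tm package itself; the function name `preprocess`, the short `STOPWORDS` set, and the sample sentence are assumptions made for illustration.

```python
import re
import string

# Illustrative stop-word list (an assumption; a real English stop-word list is much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "is", "to", "in", "with"}

def preprocess(text, user_words=()):
    """Apply the cleaning steps from the table to one document string."""
    text = text.lower()                      # convert uppercase to lowercase
    text = re.sub(r"\d+", "", text)          # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    # Stop-words and user-defined words are dropped in a single pass here.
    drop = STOPWORDS | set(user_words)
    tokens = [t for t in text.split() if t not in drop]
    # Joining on single spaces also removes unnecessary whitespace.
    return " ".join(tokens)

cleaned = preprocess(
    "A Drone with 2 Cameras, and the REMOTE control device.",
    user_words=["device"],
)
print(cleaned)
```

Merging the two word-removal steps into one pass is a design shortcut of this sketch; the table lists them as separate transformations.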
Variable | Min | Q1 | Median | Q3 | Max | Mean | Percentage of Zero (%) |
---|---|---|---|---|---|---|---|
Y | 0.0000 | 0.0000 | 1.0000 | 1.0000 | 7 | 0.7365 | 49.44 |
X1 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 4 | 0.3028 | 74.08 |
X2 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 5 | 0.4224 | 66.27 |
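The columns of the summary table above (minimum, quartiles, maximum, mean, and percentage of zeros) can be computed for any count vector along the following lines. This is a minimal sketch; the helper `summarize` and the vector `y` are hypothetical and do not reproduce the simulated data.

```python
import statistics

def summarize(counts):
    """Return min, quartiles, max, mean, and percentage of zeros for a count vector."""
    # "inclusive" interpolates between observed values, treating the data as a full population.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "min": min(counts),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(counts),
        "mean": statistics.mean(counts),
        "zero_pct": 100.0 * sum(1 for c in counts if c == 0) / len(counts),
    }

y = [0, 0, 0, 0, 1, 1, 2, 7]  # hypothetical counts, not the paper's data
stats = summarize(y)
print(stats)
```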
Explanatory Variable | LM | GLM | ZIP | CP |
---|---|---|---|---|
X1 | 0.5195 (0.0001) | 0.4975 (0.0001) | 0.4453 (0.0001) | 0.4981 (0.0001) |
X2 | 0.2104 (0.0001) | 0.2269 (0.0001) | 0.1712 (0.0001) | 0.2274 (0.0001) |
AIC | 24,586.03 | 21,601.06 | 21,539.37 | 9087.50 |
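The AIC row is the basis for comparing the four models, with a lower value indicating a better trade-off between fit and complexity. As a small sketch, the standard definition AIC = 2k − 2 ln L (k parameters, maximized likelihood L) and a ranking of the values from the table above can be written as follows; the function name `aic` and the dictionary name are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values copied from the simulation table above.
simulation_aic = {"LM": 24586.03, "GLM": 21601.06, "ZIP": 21539.37, "CP": 9087.50}

# Sort model names from lowest (best) to highest AIC.
ranking = sorted(simulation_aic, key=simulation_aic.get)
best = ranking[0]
print(ranking, best)
```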
Keyword | Min | Q1 | Median | Q3 | Max | Mean | Proportion of Zero Values |
---|---|---|---|---|---|---|---|
drone | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 28 | 0.1750 | 0.9518 |
control | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 33 | 1.4280 | 0.5806 |
device | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 24 | 0.9813 | 0.6546 |
flight | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 27 | 0.7134 | 0.7259 |
aircraft | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 26 | 0.6868 | 0.7748 |
wing | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 34 | 0.4832 | 0.8810 |
power | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 26 | 0.4548 | 0.8254 |
data | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 22 | 0.4104 | 0.8610 |
rotate | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 24 | 0.3800 | 0.8432 |
motor | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 19 | 0.3727 | 0.8650 |
drive | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 22 | 0.2996 | 0.8742 |
camera | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 27 | 0.2807 | 0.8928 |
signal | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 21 | 0.2724 | 0.8973 |
detect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 25 | 0.2619 | 0.8913 |
battery | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 26 | 0.2320 | 0.9284 |
sensor | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 16 | 0.2266 | 0.9069 |
shaft | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 20 | 0.2160 | 0.9158 |
propel | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 24 | 0.2072 | 0.9229 |
automat | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 17 | 0.1669 | 0.9049 |
charge | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 28 | 0.1625 | 0.9606 |
remote | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 21 | 0.1512 | 0.9274 |
Explanatory Variable | LM | GLM | ZIP | CP |
---|---|---|---|---|
control | −0.0047 (0.0106) | −0.0242 (0.0001) | 0.0495 (0.0001) | −0.0259 (0.0509) |
device | −0.0076 (0.0005) | −0.0503 (0.0001) | −0.0355 (0.0001) | −0.0565 (0.0007) |
flight | −0.0023 (0.3590) | −0.0118 (0.0590) | 0.0417 (0.0001) | −0.0181 (0.3271) |
aircraft | −0.0262 (0.0001) | −0.4066 (0.0001) | −0.0776 (0.0001) | −0.2913 (0.0001) |
wing | −0.0096 (0.0001) | −1.1221 (0.0001) | −0.0178 (0.1172) | −0.0991 (0.0001) |
power | −0.0122 (0.0001) | −0.1155 (0.0001) | 0.0107 (0.3166) | −0.0887 (0.0011) |
data | 0.0153 (0.0001) | −0.0528 (0.0001) | 0.0011 (0.8683) | 0.0592 (0.0007) |
rotate | −0.0133 (0.0003) | −0.1484 (0.0001) | 0.0160 (0.3388) | −0.1455 (0.0002) |
motor | −0.0166 (0.0001) | −0.1615 (0.0001) | −0.0513 (0.0044) | −0.1289 (0.0002) |
drive | −0.0128 (0.0013) | −0.1493 (0.0001) | −0.0515 (0.0017) | −0.0751 (0.0463) |
camera | −0.0054 (0.1530) | −0.0326 (0.0007) | −0.0031 (0.7927) | −0.0277 (0.3116) |
signal | 0.0154 (0.0001) | 0.0632 (0.0001) | 0.0047 (0.5774) | 0.0816 (0.0002) |
detect | −0.0047 (0.2488) | −0.0313 (0.0015) | −0.0184 (0.1340) | −0.0168 (0.5493) |
battery | −0.0139 (0.0001) | −0.1312 (0.0001) | −0.0220 (0.1689) | −0.1398 (0.0001) |
sensor | 0.0121 (0.0085) | 0.0540 (0.0001) | 0.0543 (0.0001) | 0.0654 (0.0165) |
shaft | −0.0160 (0.0008) | −0.3207 (0.0001) | 0.0184 (0.5301) | −0.3351 (0.0001) |
propel | −0.0013 (0.7699) | 0.0109 (0.3811) | 0.0053 (0.7493) | 0.0085 (0.8046) |
automat | −0.0164 (0.0109) | −0.1042 (0.0001) | 0.0093 (0.6419) | −0.1148 (0.0289) |
charge | 0.0048 (0.2000) | 0.0281 (0.0005) | 0.0354 (0.0001) | 0.0299 (0.2274) |
remote | 0.0071 (0.2329) | 0.0325 (0.0120) | −0.0163 (0.3042) | 0.0419 (0.2808) |
AIC | 172,438.10 | 74,791.42 | 36,524.01 | 35,501.73 |
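To give some intuition for why a compound Poisson model can absorb a large share of zeros, the sketch below simulates a compound Poisson-gamma variable: a sum of N gamma variates with N drawn from a Poisson distribution. It assumes that the CP column refers to a model of this Tweedie type; the parameter values (`lam`, `shape`, `scale`) are hypothetical and are not estimates from the tables. The sum is exactly zero whenever N = 0, which happens with probability exp(−λ). The simulated values are continuous and non-negative, so this illustrates the zero mass rather than the keyword counts themselves.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def compound_poisson_draw(rng, lam, shape, scale):
    """Sum of N gamma variates with N ~ Poisson(lam); exactly 0 when N == 0."""
    n = poisson_draw(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lam = 0.7               # hypothetical Poisson rate
sample = [compound_poisson_draw(rng, lam, shape=2.0, scale=0.5) for _ in range(20000)]

# Share of exact zeros in the sample versus the theoretical point mass exp(-lam).
zero_share = sum(1 for y in sample if y == 0) / len(sample)
print(round(zero_share, 3), round(math.exp(-lam), 3))
```

A smaller `lam` gives a larger point mass at zero, so one distribution covers both the zeros and the positive values without a separate zero-inflation component.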
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, S.; Jun, S. Zero-Inflated Patent Data Analysis Using Compound Poisson Models. Appl. Sci. 2023, 13, 4505. https://doi.org/10.3390/app13074505