Exploring Technology Influencers from Patent Data Using Association Rule Mining and Social Network Analysis
Abstract
1. Introduction
2. Related Works
2.1. Patent Database
2.2. Patent Analysis Reviews
2.3. Summary of Findings and Observations from Related Works
3. Methodology
3.1. K-Means Clustering
- (1) Determine the number of clusters K from the data domain.
- (2) Choose K random data points as the initial centroids.
- (3) Assign each data point to the cluster with the closest centroid.
- (4) Recalculate the centroid of each newly formed cluster.
- (5) Repeat steps (3) and (4) until the centroids no longer change, i.e., every data point stays in its current cluster.
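The five steps above can be sketched as plain Lloyd-style k-means in Python. This is a minimal illustration, not the authors' implementation: the function name `kmeans`, its `seed` and `max_iter` parameters, and the 2-D toy points are assumptions, since this section does not state which numeric features represent each patent.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step (2): pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step (3): assign every point to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step (5): stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (4): move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical 2-D data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # the first three points share one cluster id, the last three the other
```

Because the initial centroids are random, the cluster ids themselves are arbitrary; only the grouping is meaningful, which is why step (2.2) of the framework validates the number of clusters separately.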
3.2. Text Mining
- Tokenizing: the process of breaking text from the document into single words (tokens or terms).
- Filtering out stop words: the process of removing meaningless elements (punctuation marks, special characters, prepositions, articles, pronouns, etc.).
- Transforming cases: the process of transforming all characters into either lowercase or uppercase to avoid confusion between similar words in different cases.
- Stemming: the process of reducing inflected or derived words to their base form (stem).
3.3. Association Rule Mining (ARM)
3.4. Social Network Analysis (SNA)
- (1) Degree Centrality (DC): the number of edges incident to a node. A node that connects to many edges acts as a hub, the center of connectivity, and is the most influential node in a social network. The DC of a vertex v of a graph G = (V, E) is calculated as $C_D(v) = \deg(v)$.
- (2) Betweenness Centrality (BC): the extent to which a node lies on the shortest paths that connect other pairs of nodes, i.e., how often it acts as a bridge between them. A high BC value indicates that the node controls the flow between other nodes and plays an important intermediary role in a social network. The BC of a vertex v of a graph G = (V, E) can be calculated as $C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths from s to t and $\sigma_{st}(v)$ is the number of those paths that pass through v.
- (3) Closeness Centrality (CC): based on the mean distance (average shortest-path length) from a node to every other node in a network; CC is the reciprocal of this mean, so a high CC value indicates that the node reaches the other individuals in a social network through short paths. The CC of a vertex v of a graph G = (V, E) can be calculated as $C_C(v) = \frac{|V| - 1}{\sum_{t \in V, t \neq v} d(v, t)}$, where d(v, t) is the shortest-path distance between v and t.
- (4) Eigenvector Centrality (EC): a relative score assigned to every node in a network, where the score of a node depends on the scores of the nodes it is linked to. A high eigenvector score means that a node is connected to many nodes that themselves have high scores, so EC measures the importance of each node in a network. To find the EC scores of a graph G = (V, E) with |V| vertices, let $B = (b_{v,t})$ be the adjacency matrix, where $b_{v,t} = 1$ if vertex v is linked to vertex t and $b_{v,t} = 0$ otherwise. The relative centrality score $x_v$ of vertex v can be calculated as $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} b_{v,t} x_t$, where M(v) is the set of neighbors of v and λ is a constant; requiring all scores to be positive makes λ the largest eigenvalue of B.
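The four measures can be illustrated with a dependency-free Python sketch. It is not the tool used in the paper: it assumes an unweighted, undirected, connected graph stored as adjacency lists, uses Brandes' algorithm for BC, the reciprocal-mean-distance form of CC, and power iteration for EC, and the graph and function names are hypothetical. BC is left unnormalized and EC is scaled so that the largest score is 1, so absolute values can differ from tools that normalize differently.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    # C_D(v) = deg(v)
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    # (|V| - 1) / sum of distances = reciprocal of the mean distance; assumes a connected graph.
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that counts shortest paths (sigma) and records predecessors
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate pair dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


def eigenvector_centrality(graph, iters=1000, tol=1e-10):
    # Power iteration with B + I (same eigenvectors as B) so bipartite graphs do not oscillate.
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        new = {v: x[v] + sum(x[t] for t in graph[v]) for v in graph}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}  # scale so the largest score is 1
        if max(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x


# Hypothetical undirected network: edges A-B, B-C, B-D, C-D, D-E.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C", "E"], "E": ["D"]}
print(degree_centrality(graph))
print(betweenness_centrality(graph))
print(closeness_centrality(graph))
print(eigenvector_centrality(graph))
```

In this toy graph B and D are the hubs: each has degree 3, lies on three shortest paths between other pairs, and has the highest closeness and eigenvector scores.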
3.5. Conceptual Framework
- Step 1. Data collection and preprocessing
  - (1.1) Extract all IPC codes and patent titles from the EPO's database.
  - (1.2) Combine the multiple datasets.
  - (1.3) Perform data cleaning.
  - (1.4) Transform the datasets into a format suitable for K-means clustering and ARM.
- Step 2. K-means clustering
  - (2.1) Cluster the data to obtain the patent cluster profile.
  - (2.2) Validate the clusters to determine an appropriate number of clusters.
- Step 3. Text mining
  - (3.1) Apply text mining to the patent titles dataset to obtain the technical terms (key terms).
- Step 4. Association rule mining (ARM)
  - (4.1) Apply ARM to the IPC code dataset of each cluster to obtain IPC association rules.
  - (4.2) Apply ARM to the technical terms (key terms) to obtain text association rules.
- Step 5. Social network analysis (SNA)
  - (5.1) Use SNA to calculate the degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and eigenvector centrality (EC) of the nodes in the IPC association rules and the text association rules of each cluster.
  - (5.2) Construct a network graph to visualize the IPC association rules and the text association rules in each cluster.
  - (5.3) Analyze the results to identify the most influential technologies, the connectivity between technologies, and technology priorities.
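Steps (5.1) and (5.2) require turning association rules into a network. One plausible construction, assumed here because this section does not spell out the exact mapping, links each antecedent item to the rule's consequent in an undirected graph; degree centrality then counts how many distinct technologies a node is associated with. The sketch uses the five Chemistry IPC rules listed in the IPC association rule table; `rule_network` is a hypothetical helper.

```python
from collections import defaultdict

# (antecedent items, consequent) for the five Chemistry IPC rules in the IPC rule table.
rules = [
    (("C22C",), "C21D"),
    (("C08L",), "C08K"),
    (("C01B",), "B01J"),
    (("C08G",), "C08L"),
    (("C08G",), "C08K"),
]


def rule_network(rules):
    """Undirected adjacency sets: each antecedent item is linked to the consequent."""
    graph = defaultdict(set)
    for antecedent, consequent in rules:
        for item in antecedent:  # a multi-item antecedent yields one edge per item
            graph[item].add(consequent)
            graph[consequent].add(item)
    return graph


graph = rule_network(rules)
degree = {node: len(nbrs) for node, nbrs in graph.items()}
print(sorted(degree.items(), key=lambda kv: (-kv[1], kv[0])))
# [('C08G', 2), ('C08K', 2), ('C08L', 2), ('B01J', 1), ('C01B', 1), ('C21D', 1), ('C22C', 1)]
```

On this five-rule excerpt C08G, C08K, and C08L tie with degree 2, so the excerpt alone does not single out C08G as the degree-centrality influencer reported for Chemistry; that result presumably comes from the full rule set.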
4. Results and Analysis
4.1. K-Means Clustering
4.2. Text Mining
4.3. Association Rule Mining (ARM)
- The rules with high support values indicate the popularity of technologies and inventions. For example, in the Chemistry cluster the IPC rule (C08L → C08K) and the text rule (wind, turbine → blade) had the highest support values. This suggests that technology C08K was widely developed together with technology C08L, and that inventions related to "blade" were widely developed together with those related to "wind" and "turbine".
- The rules with high confidence values indicate how likely one technology or invention is to accompany another. For example, in the Other Fields cluster the IPC rule (E04C → E04H) and the text rule (turbine, foundation → wind) had the highest confidence values. If technology E04C was developed, then technology E04H was likely to be developed as well. Likewise, if inventions related to "turbine" and "foundation" were developed, an invention related to "wind" was likely to be developed.
- The rules with high lift values indicate a strong relationship between the development of technologies and inventions. The lift values in every cluster were greater than 1, except for E02D → E02B in the Other Fields cluster, whose lift was 1 (no association). A lift above 1 means that the antecedent and the consequent co-occur more often than would be expected if they were independent. The first-listed rule in each cluster had the highest lift, which suggests that these technologies, as well as these inventions, depend on each other and that the rules are potentially useful for predicting the consequent in the future.
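The three measures discussed above can be computed directly from transactions. For a rule X → Y, support is the fraction of transactions that contain both X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). The sketch below scores every single-item rule on hypothetical IPC-code transactions; the function name is illustrative, and it enumerates item pairs by brute force rather than using a level-wise algorithm such as Apriori, which would be preferable for data sets of the size used in this study.

```python
from itertools import permutations


def one_to_one_rules(transactions):
    """Score every single-item rule X -> Y by support, confidence, and lift."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= basket for basket in baskets) / n

    rules = {}
    for x, y in permutations(items, 2):
        s_xy = support({x, y})
        if s_xy == 0:
            continue  # X and Y never co-occur, so there is no rule to score
        confidence = s_xy / support({x})  # estimate of P(Y | X)
        lift = confidence / support({y})  # > 1: co-occur more often than if independent
        rules[(x, y)] = (s_xy, confidence, lift)
    return rules


# Hypothetical IPC-code transactions, one set per patent.
transactions = [
    {"C08L", "C08K"},
    {"C08L", "C08K", "C08G"},
    {"C08G", "C08L"},
    {"C22C", "C21D"},
    {"C08L"},
]
for (x, y), (s, c, l) in sorted(one_to_one_rules(transactions).items()):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

In this toy data C08L → C08K has support 0.40, confidence 0.50, and lift 1.25, while the reverse rule C08K → C08L has the same support and lift but confidence 1.00, which shows why confidence is directional and lift is not.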
4.4. Social Network Analysis (SNA)
4.4.1. Constructing a Network of ARM
4.4.2. Summary of Influential Nodes from SNA
4.4.3. Application of the Results to Patent Management
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Section (1st Level) | Class (2nd Level) | Subclass (3rd Level) | Main Group (4th Level) | Subgroup (5th Level) |
---|---|---|---|---|
A | 43 | B | 5/00 | 5/02 |
Human Necessities | Footwear | Characteristic Features of Footwear | Footwear for Sporting Purposes | Football Boots or Shoes |
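The five levels in this example can be read directly off a full IPC symbol. The following sketch splits a symbol written in the common "A43B 5/02" form into its levels; the regular expression and the function name `ipc_levels` are assumptions for illustration, and the code does not check that a code actually exists in the IPC scheme. The sample dataset stores four-character subclass codes such as C02F, so that representation uses only the first three levels.

```python
import re

# Section letter (A-H), two-digit class, subclass letter, main-group number, subgroup number.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(symbol):
    """Split a full IPC symbol into the five hierarchy levels."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": f"{section}{cls}{subclass} {group}/00",  # main groups end in /00
        "subgroup": f"{section}{cls}{subclass} {group}/{subgroup}",
    }


print(ipc_levels("A43B 5/02"))
# {'section': 'A', 'class': 'A43', 'subclass': 'A43B', 'main_group': 'A43B 5/00', 'subgroup': 'A43B 5/02'}
```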
Patent_ID | IPC_Code | Titles |
---|---|---|
317553806 | C02F | Wave power generator |
267832041 | C02F | Water reclamation system and method |
267805514 | C02F | Device and method for automatic wind-power sewage aeration |
317599314 | C02F | Method and apparatus for water distillation |
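Step (1.4) transforms rows like these into a format suitable for ARM. A common layout, assumed here, is one transaction per patent holding the set of its IPC codes, optionally expanded into a binary patent-by-code matrix. The first four rows below are the sample shown above; the two rows marked hypothetical are invented to show a patent carrying several codes, and the helper names are illustrative.

```python
from collections import defaultdict

# (Patent_ID, IPC_Code, Titles) rows; the last two are hypothetical extra codes.
rows = [
    (317553806, "C02F", "Wave power generator"),
    (267832041, "C02F", "Water reclamation system and method"),
    (267805514, "C02F", "Device and method for automatic wind-power sewage aeration"),
    (317599314, "C02F", "Method and apparatus for water distillation"),
    (317553806, "F03B", "Wave power generator"),                         # hypothetical
    (317599314, "B01D", "Method and apparatus for water distillation"),  # hypothetical
]


def to_transactions(rows):
    """Group IPC codes by patent: one transaction (a set of codes) per Patent_ID."""
    baskets = defaultdict(set)
    for patent_id, ipc_code, _title in rows:
        baskets[patent_id].add(ipc_code)
    return dict(baskets)


def one_hot(transactions):
    """Binary patent-by-code matrix, one possible input format for clustering or ARM tools."""
    codes = sorted(set().union(*transactions.values()))
    matrix = {pid: [int(c in patent_codes) for c in codes]
              for pid, patent_codes in transactions.items()}
    return codes, matrix


transactions = to_transactions(rows)
codes, matrix = one_hot(transactions)
print(codes)              # ['B01D', 'C02F', 'F03B']
print(matrix[317553806])  # [0, 1, 1]
```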
Cluster | Technical Sector | Number of Patents | Percent (%) |
---|---|---|---|
1 | Chemistry | 7909 | 4.84 |
2 | Electrical Engineering | 36,322 | 22.26 |
3 | Instruments | 11,570 | 7.09 |
4 | Mechanical Engineering | 97,270 | 59.63 |
5 | Other Fields | 10,046 | 6.15 |
Cluster | Technical Sector | Technical Fields | IPC Codes |
---|---|---|---|
1 | Chemistry | Organic Fine Chemistry | A61Q, C07B, C07C, C07D, C07F |
1 | Chemistry | Biotechnology | C07G, C07K, C12M, C12N, C12P |
1 | Chemistry | Pharmaceuticals | A61P |
1 | Chemistry | Macromolecular Chemistry, Polymers | C08B, C08C, C08F, C08G, C08H |
1 | Chemistry | Food Chemistry | A01H, A21D, A23B, C12C, C12G |
1 | Chemistry | Basic Materials Chemistry | A01N, A01P, C05B, C05C, C05D |
1 | Chemistry | Materials, Metallurgy | B22C, B22D, B22F, C01B, C01C |
1 | Chemistry | Surface Technology, Coating | B05C, B05D, B32B, C23C, C23D |
1 | Chemistry | Microstructural, Nanotechnology | B81B, B81C, B82B, B82Y |
1 | Chemistry | Chemical Engineering | B08B, C14C, D06B, F25J, H05H |
1 | Chemistry | Environmental Technology | A62C, B09B, C02F, F01N, G01T |
2 | Electrical Engineering | Electrical Machinery, Apparatus, Energy | F21K, F21L, G06C, H01B, H01C |
2 | Electrical Engineering | Audiovisual Technology | G09F, G09G, G11B, H04R, H04S |
2 | Electrical Engineering | Telecommunications | G08C, H01P, H01Q, H04B, H04H |
2 | Electrical Engineering | Digital Communication | H04L, H04W |
2 | Electrical Engineering | Basic Communication Processes | H03B, H03C, H03D, H03F, H03G |
2 | Electrical Engineering | Computer Technology | G06C, G06E, G06F, G06G, G06K |
2 | Electrical Engineering | IT Methods for Management | G06Q |
2 | Electrical Engineering | Semiconductors | H01L |
3 | Instruments | Optics | G02B, G02C, G02F, G03B, H01S |
3 | Instruments | Measurement | G01B, G01C, G01D, G01F, G01G |
3 | Instruments | Control | G05B, G05D, G05F, G07B, G07C |
3 | Instruments | Medical Technology | A61L, A61M, A61N, G16H, H05G |
4 | Mechanical Engineering | Handling | B25J, B65B, B65C, B65D, B65G |
4 | Mechanical Engineering | Machine Tools | A62B, B21B, B21C, B21D, B21F |
4 | Mechanical Engineering | Engines, Pumps, Turbines | F01B, F02C, F03D, G21B, G21C |
4 | Mechanical Engineering | Textile, Paper Machines | A41H, A43D, B41M, C14D, D01B |
4 | Mechanical Engineering | Other Special Machines | A01B, A01C, B28C, C03B, F41A |
4 | Mechanical Engineering | Thermal Processes and Apparatus | F22B, F22D, F22G, F23B, F23C |
4 | Mechanical Engineering | Mechanical Elements | F15B, F15C, F15D, F16B, F16C |
4 | Mechanical Engineering | Transport | B60B, B60C, B60D, B60F, B60G |
5 | Other Fields | Furniture, Games | A47B, A47C, A47F, A47G, A47H |
5 | Other Fields | Other Consumer Goods | A99Z, B42D, D04D, F25D, G10B |
5 | Other Fields | Civil Engineering | E01B, E01C, E01D, E01H, E02B |
Cluster | Technical Sector | Examples of Patent Titles |
---|---|---|
1 | Chemistry | Method for manufacturing resin impregnated multi-orientation composite material. |
1 | Chemistry | Hydrogen supplementation fuel apparatus and method. |
1 | Chemistry | Resin transfer molding process for an article containing a protective member. |
2 | Electrical Engineering | Power storage and power transfer method and apparatus. |
2 | Electrical Engineering | Active power optimizing and distributing method for wind generator unit of wind power station. |
2 | Electrical Engineering | Street lamp with power supply system powered by wind heat energy. |
3 | Instruments | Wind turbine blade load sensor. |
3 | Instruments | Apparatus and method for automatically fabricating tape with threads for visualization of air streams on aerodynamic surfaces. |
3 | Instruments | Method for sensing strain in a component in a wind turbine, optical strain sensing system and uses thereof. |
4 | Mechanical Engineering | Wind turbine comprising a thermal management system. |
4 | Mechanical Engineering | Electrical power generation via the movement of a fluid body. |
4 | Mechanical Engineering | Integrated control apparatus and method for hybrid type wind turbine system. |
5 | Other Fields | Tower for a wind farm with flange piece for connection of segments. |
5 | Other Fields | Waste-receiving device for incontinent persons. |
5 | Other Fields | Hydraulic geofracture energy storage system. |
Cluster | Technical Sector | Examples of Extracted Technical Terms |
---|---|---|
1 | Chemistry | system, composite, wind, generator, device, energy, power, turbine, material, blade, coat, structure, water, manufacture, process, product, compound, apparatus |
2 | Electrical Engineering | generator, wind, device, electric, control, energy, turbine, apparatus, machine, solar, magnet, operator, motor, supply, use, converter, base, plant, grid, storage |
3 | Instruments | wind, device, power, control, turbine, generator, monitor, apparatus, detector, measurement, testing, energy, sensor, electric, blade, base, usage, operator, determinator |
4 | Mechanical Engineering | apparatus, base, control, converter, device, electric, energy, generator, grid, machine, magnet, motor, operator, plant, solar, storage, supply, turbine, use, wind |
5 | Other Fields | wind, tower, system, turbine, structure, power, foundation, device, generator, energy, installer, construct, support, assemble, apparatus, concrete, plant, water, type |
Cluster | Antecedent | Consequent | Support (%) | Confidence (%) | Lift |
---|---|---|---|---|---|
Chemistry | C22C | C21D | 2.2 | 51.3 | 16.3 |
Chemistry | C08L | C08K | 2.5 | 49.1 | 12 |
Chemistry | C01B | B01J | 1.8 | 33.6 | 8.6 |
Chemistry | C08G | C08L | 1.8 | 33.4 | 6.6 |
Chemistry | C08G | C08K | 1.1 | 20.9 | 5.1 |
Electrical Engineering | F21V | F21S | 1.3 | 52.1 | 23.5 |
Electrical Engineering | G06F | G06Q | 1.2 | 13.4 | 2.8 |
Electrical Engineering | H02M | H02J | 2.7 | 44.6 | 1.7 |
Electrical Engineering | H01M | H02J | 1.2 | 37.2 | 1.4 |
Electrical Engineering | H02M | H02P | 1.1 | 18.3 | 1.3 |
Instruments | A61N | A61B | 0.5 | 10.3 | 12 |
Instruments | A61N | A61M | 0.3 | 39.4 | 11.3 |
Instruments | G01K | G01W | 0.3 | 17.1 | 4.9 |
Instruments | A61F | A61M | 0.3 | 17.1 | 4.8 |
Instruments | A61B | A61M | 0.6 | 14 | 4 |
Mechanical Engineering | F01D | F02C | 1.3 | 22.2 | 8.1 |
Mechanical Engineering | B63B | F03D | 1.2 | 44 | 7.1 |
Mechanical Engineering | F16H | F03D | 1.6 | 39.4 | 6.3 |
Mechanical Engineering | F03B | F03D | 3.5 | 38 | 6.1 |
Mechanical Engineering | B29C | F03D | 1.3 | 35.7 | 5.7 |
Other Fields | E04B | E04C | 1.3 | 16.5 | 3.8 |
Other Fields | E04C | E04H | 2.3 | 54.5 | 1.6 |
Other Fields | E04H | E04B | 3.9 | 11.6 | 1.4 |
Other Fields | E04G | E04H | 2 | 37.4 | 1.1 |
Other Fields | E02D | E02B | 3.4 | 14.7 | 1 |
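The Confidence and Lift columns are linked: lift equals confidence divided by the support of the consequent, so the consequent's support can be recovered from any rule. As a worked check, assuming the table values are rounded, the two Chemistry rules that share the consequent C08K give nearly the same estimate:

```latex
\begin{aligned}
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  \;\Longrightarrow\; \mathrm{supp}(Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{lift}(X \Rightarrow Y)} \\
\text{C08L} \Rightarrow \text{C08K}:\quad \mathrm{supp}(\text{C08K}) &\approx \frac{49.1\%}{12} \approx 4.09\% \\
\text{C08G} \Rightarrow \text{C08K}:\quad \mathrm{supp}(\text{C08K}) &\approx \frac{20.9\%}{5.1} \approx 4.10\%
\end{aligned}
```

The two estimates agree to within rounding, as expected, because both rules describe the same consequent within the same cluster.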
Cluster | Antecedent | Consequent | Support (%) | Confidence (%) | Lift |
---|---|---|---|---|---|
Chemistry | fiber | reinforce | 1.4 | 40.4 | 11.7 |
Chemistry | wind, turbine | blade | 3 | 59.9 | 9.4 |
Chemistry | wind, generator | power | 2.2 | 55.2 | 6.3 |
Chemistry | system, power | generator | 1.6 | 53.6 | 5.7 |
Chemistry | fiber | material | 1.1 | 31.5 | 4.2 |
Chemistry | structure | composite | 2.6 | 41.8 | 2.7 |
Chemistry | hydrogen | system | 1.6 | 49.8 | 2.6 |
Chemistry | composite | material | 2.8 | 18.4 | 2.4 |
Electrical Engineering | storage | energy | 2.1 | 70.2 | 5.5 |
Electrical Engineering | electric | machine | 3.3 | 22.6 | 4.4 |
Electrical Engineering | machine | electric | 3.3 | 66.1 | 4.4 |
Electrical Engineering | wind, control | turbine | 1.8 | 36.8 | 3.6 |
Electrical Engineering | plant | wind | 1.8 | 60.5 | 2.3 |
Electrical Engineering | wind, device | generator | 2.2 | 62.7 | 2.3 |
Electrical Engineering | generator, solar | wind | 1.1 | 58.1 | 2.2 |
Electrical Engineering | wind, solar | generator | 1.1 | 55.3 | 2 |
Electrical Engineering | magnet | generator | 1.5 | 36.4 | 1.3 |
Instruments | wind, blade | turbine | 2.5 | 75.1 | 5.9 |
Instruments | wind, power | generator | 3 | 53.4 | 4.4 |
Instruments | wind, turbine | blade | 2.5 | 23.6 | 4.4 |
Instruments | wind, test | turbine | 1.3 | 51.9 | 4 |
Instruments | wind, device | generator | 1.9 | 47.2 | 3.9 |
Instruments | turbine, monitor | wind | 1.4 | 85.8 | 3.8 |
Instruments | turbine | wind | 10.7 | 84 | 3.7 |
Instruments | device, power | generator | 1.3 | 43 | 3.6 |
Mechanical Engineering | method, rotor | blade | 1.1 | 66.8 | 5.2 |
Mechanical Engineering | method, blade | rotor | 1.1 | 31.3 | 5.1 |
Mechanical Engineering | turbine, rotor | blade | 2 | 62.8 | 4.9 |
Mechanical Engineering | turbine, blade | rotor | 2 | 27.7 | 4.5 |
Mechanical Engineering | rotor | blade | 3.4 | 55.5 | 4.3 |
Mechanical Engineering | plant | power | 2.8 | 78.2 | 3.3 |
Mechanical Engineering | method, power | control | 1.2 | 25 | 3.1 |
Mechanical Engineering | driven | generator | 2.8 | 80.7 | 3 |
Mechanical Engineering | power | generator | 14.7 | 63.9 | 2.4 |
Other Fields | wind, power | plant | 1.6 | 22.4 | 5.9 |
Other Fields | wind, structure | support | 1.2 | 33.7 | 5.9 |
Other Fields | wind, plant | power | 1.6 | 60.8 | 4.5 |
Other Fields | turbine, foundation | wind | 3.1 | 98.9 | 3.6 |
Other Fields | turbine, installer | wind | 1.7 | 97.5 | 3.6 |
Other Fields | concrete | tower | 2.4 | 52.8 | 2.6 |
Other Fields | plant | wind | 2.7 | 71.9 | 2.6 |
Other Fields | plant | tower | 1.3 | 34.1 | 1.7 |
Other Fields | structure, support | wind | 1.2 | 44.1 | 1.6 |
Other Fields | wind, system | tower | 1 | 31.9 | 1.6 |
Other Fields | tower, concrete | wind | 1 | 44 | 1.6 |
Technical Sector | Influential Nodes | Degree Centrality | Betweenness Centrality | Closeness Centrality | Eigenvector Centrality |
---|---|---|---|---|---|
Chemistry | IPC code | C08G | C08G | B05D, B32B, B01J, C01B, C21D, C22C | B32B |
Chemistry | Key terms | system | system | system | system |
Electrical Engineering | IPC code | H02J | H02J | G06F, G06Q, F21V, F21S | H02J |
Electrical Engineering | Key terms | generator | generator | generator | generator |
Instruments | IPC code | G01D | G01W | A61M, A61B | G01D |
Instruments | Key terms | wind | wind | wind | wind |
Mechanical Engineering | IPC code | F03D | F03D | F03D | F03D, F01D |
Mechanical Engineering | Key terms | method | turbine | method, generator, power | method |
Other Fields | IPC code | E04H | E04H | E04H | E04H |
Other Fields | Key terms | wind | wind | wind, turbine, system, generator, power, device, method, tower, energy, composite | wind |
Technical Sector | Technology Influencer | Defined Technology |
---|---|---|
Chemistry | C08G | Macromolecular chemistry, polymers; Macromolecular compounds obtained otherwise than by reactions only involving carbon-to-carbon unsaturated bonds. |
Electrical Engineering | H02J | Electrical machinery, apparatus, energy; Circuit arrangements or systems for supplying or distributing electric power; systems for storing electric energy. |
Instruments | G01D | Measurement; Measuring apparatus for two or more variables. |
Mechanical Engineering | F03D | Engines, pumps, turbines; Wind motors. |
Other Fields | E04H | Civil engineering; Buildings or like structures for particular purposes |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ampornphan, P.; Tongngam, S. Exploring Technology Influencers from Patent Data Using Association Rule Mining and Social Network Analysis. Information 2020, 11, 333. https://doi.org/10.3390/info11060333