3. Proposed Model
The proposed model for analyzing trends in applicants' inventions consists of four steps, as shown in Figure 1. The first step is to collect patent data and extract the required information from them. We acquire the raw data through the Open API of the Korea Intellectual Property Rights Information Service (KIPRIS), which provides information on Korean patents, and extract patent metadata and specific fields such as the title, claims, abstract, invention date, and publication date from the patent documents. In the second step, the patent data are clustered by time-series information (date of invention, year of publication) to analyze each applicant's invention trend. In the third step, a representative tree is created for each patent using the tree structure of the patent classification systems, as shown in Table 1. In the fourth step, common patterns are extracted from the experimental data using the IPC and CPC systems, as shown in Table 2. Finally, representative technology classification codes are extracted from each applicant's common patterns, and the invention trends of various applicants, including universities, companies, and research institutes, are analyzed and compared.
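Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Patent` record, its field names, and the application numbers are hypothetical stand-ins for whatever the KIPRIS Open API returns, and clustering is assumed here to mean grouping one applicant's patents by publication year.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Patent:
    # Hypothetical record layout; these field names are illustrative, not the KIPRIS schema.
    application_number: str
    publication_year: int
    ipc_codes: list = field(default_factory=list)
    cpc_codes: list = field(default_factory=list)


def cluster_by_year(patents):
    """Group one applicant's patents by publication year (step 2)."""
    clusters = defaultdict(list)
    for patent in patents:
        clusters[patent.publication_year].append(patent)
    # Sort by year so that earlier clusters come first (useful later for PP/FP pairing).
    return dict(sorted(clusters.items()))


# Hypothetical example data.
patents = [
    Patent("A-001", 2010, ["G06F 17/30"], ["G06F 17/30"]),
    Patent("A-002", 2009, ["G06Q 50/10"], ["G06Q 50/10"]),
    Patent("A-003", 2010, ["H04L 9/32"], ["H04L 9/32"]),
]
by_year = cluster_by_year(patents)
print({year: [p.application_number for p in group] for year, group in by_year.items()})
# {2009: ['A-002'], 2010: ['A-001', 'A-003']}
```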
As shown in Figure 2, a patent can contain codes from different classification systems. Because these codes are hierarchical, they can be integrated and merged into a single representative tree for the patent. The representative trees can then be used to compare the classification codes of different patents and thus extract the applicant's technical invention patterns.
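A minimal sketch of the merge step follows, assuming symbols written as `G06F 17/30` and a five-level hierarchy (section, class, subclass, main group, subgroup). The finer dot-level indentation of subgroups inside a main group is ignored, and the virtual `ROOT` node that joins codes from different sections is an assumption of this sketch rather than part of the paper's model.

```python
import re

# Section, class, subclass, main group and subgroup, e.g. "G06F 17/30".
# Section Y exists only in CPC; sections A-H are shared by IPC and CPC.
CODE_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def hierarchy_path(code):
    """Return [section, class, subclass, main group, subgroup] for one code."""
    m = CODE_RE.match(code.strip())
    if m is None:
        raise ValueError(f"unrecognised classification code: {code!r}")
    section, cls, subclass, main, sub = m.groups()
    subclass_symbol = section + cls + subclass
    path = [section, section + cls, subclass_symbol, f"{subclass_symbol} {main}/00"]
    if sub != "00":  # a "/00" symbol is itself a main group
        path.append(f"{subclass_symbol} {main}/{sub}")
    return path


def representative_tree(codes):
    """Merge the paths of all codes of one patent into one tree (child -> parent map)."""
    parent = {}
    for code in codes:
        path = ["ROOT"] + hierarchy_path(code)
        for upper, lower in zip(path, path[1:]):
            parent[lower] = upper  # shared prefixes collapse into the same nodes
    return parent


tree = representative_tree(["C09K 11/77", "H01L 33/50"])
print(tree["C09K 11/77"], tree["H"], len(tree))  # C09K 11/00 ROOT 10
```

Codes that share a prefix, such as two subgroups of the same main group, reuse the same upper nodes, which is what lets two patents be compared level by level.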
Table 2 shows common patterns that can be extracted from the compared patents using the representative tree derived from each patent.
Based on the common patterns defined in Table 2, the applicants' patents were grouped by year and paired for comparison. A patent invented in a given year was defined as the prior patent (PP), and a patent invented in a subsequent year was defined as the following patent (FP). The proposed model determined whether the applicant had continued to build on specific techniques (persistence) or had invented novel techniques (novelty). Persistence was divided into three patterns, same (S), transition (T), and expansion (E), and novelty was defined as the independent (I) pattern. Persistence implies that some technologies included in the PP reappear in the FP; that is, a technology classification code representing the technology is included in both the PP and the FP. In the same pattern, the FP contains the complete classification code of the PP. In the transition pattern, some technologies of the PP at the subgroup level reappear in the FP, indicating that these technologies have propagated and influenced the invention of the FP. In the expansion pattern, some technologies of the PP at the main-group level appear in the FP together with technologies from other domains; that is, existing component or common technologies are integrated and converged with technologies from other fields. Novelty represents an FP whose technologies belong to a field completely different from that of the PP and is defined by the independent pattern: the classification codes are completely different, or the FP shares only some class-level technologies with the PP.
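The pattern definitions can be turned into a small decision procedure. The rules below are one reading of the text, stated as assumptions: S when the FP contains every code of the PP, T when at least one full (subgroup-level) code is shared, E when only a main group is shared, and I otherwise. Overlap only at the subclass level is treated as I here, a case the text leaves open, and codes are assumed to be consistently formatted as `G06F 17/30`.

```python
def main_group(code):
    """'G06F 17/30' -> 'G06F 17' (assumes the 'subclass main/sub' layout)."""
    return code.strip().split("/")[0]


def classify_pair(pp_codes, fp_codes):
    """Classify a PP/FP pair as S, T, E or I under the assumed rules."""
    pp, fp = set(pp_codes), set(fp_codes)
    if pp and pp <= fp:
        return "S"  # the FP contains the complete classification code(s) of the PP
    if pp & fp:
        return "T"  # some subgroup-level codes reappear in the FP
    if {main_group(c) for c in pp} & {main_group(c) for c in fp}:
        return "E"  # only a shared main group, combined with other fields
    return "I"      # no shared main group: treated as novelty


print(classify_pair(["G06F 17/30"], ["G06F 17/30", "H04L 9/32"]))  # S
print(classify_pair(["G06F 17/30"], ["G06F 17/27", "H04L 9/32"]))  # E
print(classify_pair(["G01N 33/53"], ["H04L 9/32"]))                # I
```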
Figure 3 shows a concrete example of the common patterns. Three patents were invented in different years; one contained one classification code, and the other two each contained two classification codes. In this case, a total of eight comparison pairs were generated, and to compare the patent documents, a tree representing each patent was created from its classification codes. In the first comparison pair, the earlier of the two compared patents is expressed as the PP and the later one as the FP. Because the FP contained the complete classification code of the PP, this pair was determined to be the same pattern (S). In the same manner, the classification codes of the PP and FP in each comparison pair were compared, and the second pair was determined to be the transition pattern (T) and the third the expansion pattern (E), yielding patterns related to the persistence of the patent invention. In the last comparison pair, the PP and FP had no common classification code; therefore, the pair was determined to be independent (I) and counted as a novelty pattern.
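One reading consistent with the count of eight is that every code of an earlier patent is paired with every code of each later patent, so the number of pairs is the sum of the products of the code counts, 1×2 + 1×2 + 2×2 = 8; the sum does not depend on which patent holds the single code. The sketch below assumes that reading, and its years and code labels (`c1` to `c5`) are placeholders, since the actual codes of Figure 3 are not reproduced in the text.

```python
from itertools import combinations


def comparison_pairs(patents):
    """patents: (year, codes) tuples; pair each code of an earlier patent with each code of a later one."""
    ordered = sorted(patents, key=lambda p: p[0])
    pairs = []
    for (year_pp, codes_pp), (year_fp, codes_fp) in combinations(ordered, 2):
        if year_pp == year_fp:
            continue  # PP and FP must come from different years
        pairs.extend((pp, fp) for pp in codes_pp for fp in codes_fp)
    return pairs


# Placeholder years and code labels standing in for the three patents of Figure 3.
example = [(2014, ["c4", "c5"]), (2010, ["c1"]), (2012, ["c2", "c3"])]
pairs = comparison_pairs(example)
print(len(pairs))  # 8
```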
4. Experiment
KIPRIS provides a database of information on domestic and foreign intellectual property rights and offers access through an Open API or bulk files. In this study, we used the Open API to collect each applicant's patent information (application number, publication date, abstract, claims, IPC, and CPC) in order to analyze the applicant's technical invention trends with the proposed model. We selected three applicant domains (universities, companies, and research institutes), assuming that each applicant type would have different characteristics, and chose three applicants from each domain, nine applicants in total. To analyze the applicants' technical invention trends over the last ten years (2007–2016), we collected the relevant information on their inventions and published patents from KIPRIS.
Table 3 shows the tree structure information constructed from the collected applicant-specific patent data. For each applicant, we built a taxonomy tree from the hierarchical structure of each classification system (IPC and CPC) assigned to the patents and report the average depth and width of the tree. The average depth differed little between IPC and CPC, but the average width was larger for CPC than for IPC. This suggests that CPC codes can represent more types of technical fields than IPC codes, which is consistent with the motivation for developing the CPC even though the IPC already existed. The collected data were then refined for comparative analysis according to the proposed model; in particular, only patents with both IPC and CPC codes were retained as experimental data so that the characteristics of the two systems could be compared.
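The depth and width statistics can be computed from the hierarchy paths of an applicant's codes. The text does not define the two measures, so this sketch assumes that depth is the mean root-to-leaf length (counting the section as level 1) and that width is the largest number of nodes on any single level; the example paths are hypothetical.

```python
from collections import Counter
from statistics import mean


def tree_stats(paths):
    """paths: non-empty list of label sequences from section down to the assigned code."""
    # Every prefix of a path is a node; tuples keep identical labels under different parents apart.
    nodes = {tuple(p[:k]) for p in paths for k in range(1, len(p) + 1)}
    parents = {n[:-1] for n in nodes if len(n) > 1}
    leaves = nodes - parents
    depth = mean(len(leaf) for leaf in leaves)
    width = max(Counter(len(n) for n in nodes).values())
    return depth, width


# Hypothetical hierarchy paths for one applicant.
paths = [
    ["G", "G06", "G06F", "G06F 17/00", "G06F 17/30"],
    ["G", "G06", "G06Q", "G06Q 50/00", "G06Q 50/10"],
    ["H", "H04", "H04L", "H04L 9/00", "H04L 9/32"],
    ["H", "H04", "H04L", "H04L 12/00"],
]
print(*tree_stats(paths))  # 4.75 4
```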
Table 4 shows the number of collected and refined data items per applicant. To analyze each applicant's technical invention trend from the refined data, comparison pairs were generated based on the publication year, and the common patterns same (S), transition (T), expansion (E), and independent (I) were extracted.
Table 4 shows the results for the three applicants representing the company domain. They had markedly more expansion (E) patterns than transition (T) patterns, and companies also had more same (S) patterns than the research institute group. This suggests that companies focused more than research institutes on expanding existing technologies and developing related technologies.
Table 5, Table 6, and Table 7 show the number of common patterns extracted for each applicant group with IPC and CPC. In all applicant groups, the independent pattern (I), which represents the novelty of the applicant's inventions, accounted for the largest share compared with S, T, and E, which represent persistence. This is because, fundamentally, an invention must demonstrate novelty and an inventive step to be granted patent rights. Although independent patterns occurred more frequently than the other common patterns, the distribution of persistence patterns differed by applicant type (research institute, company, and university).
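Pattern shares of the kind compared across applicant groups can be computed from the list of pattern labels produced for one applicant or group. This is a sketch with made-up labels, not the paper's data; the combined share of persistence is simply the sum of the S, T, and E shares.

```python
from collections import Counter


def pattern_shares(labels):
    """Return each pattern's share (%) of all comparison pairs, in S, T, E, I order."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {p: 0.0 for p in "STEI"}
    return {p: round(100 * counts[p] / total, 1) for p in "STEI"}


# Made-up labels for 20 comparison pairs.
labels = list("SSTEEE") + ["I"] * 14
shares = pattern_shares(labels)
print(shares)  # {'S': 10.0, 'T': 5.0, 'E': 15.0, 'I': 70.0}
print(shares["S"] + shares["T"] + shares["E"])  # 30.0
```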
Table 5 shows the number of common patterns for the group of research institute applicants with IPC and CPC. The E pattern occurred relatively more often than the S and T patterns, suggesting that these applicants produced inventions by applying and extending their existing patented inventions to different domains.
Table 6 shows the experimental results for the group of company applicants. As with the research institutes, the E pattern had the highest frequency among the persistence patterns, and of the remaining persistence patterns, S was more frequent than T. Company applicants appear to file new patents both to develop and commercialize services based on their technologies and to secure rights to the technologies the company holds. Overall, research institutes focus on expanding and propagating existing technologies when inventing new ones, whereas companies invent to develop their core technologies for specific service areas in response to changing market trends.
Table 7 shows the experimental results for the group of university applicants. For university applicants, the independent pattern (I) accounted for 97.1% of the patterns on average, about 30 percentage points more than for the research institute applicants (67%) and the company applicants (65.8%). In addition, the S, T, and E patterns appeared in the same order as for the research institute applicants, suggesting that universities, like research institutes, focus on R&D, in marked contrast to the company applicants.
Table 8 shows the representative technology classification codes extracted from the persistence patterns (S, T, E) to track changes in the applicants' invention trends in their representative technical fields. For each pattern (S, T, E), the representative technology classification code was selected as the code that appears most frequently in that pattern. As the experimental results in Table 3 show, the IPC and CPC systems have different tree structures, so representative technology codes were extracted separately for each system to compare and analyze their characteristics. Based on the representative technology codes of the applicants in Table 8, the invention trends of all applicants are classified into three types as follows.
- A case in which the representative technology codes belong to the same field in both classification systems and the codes themselves are identical.
- A case in which the representative technology codes of IPC and CPC belong to different fields.
- A case in which the representative technology codes belong to the same field in both IPC and CPC, but the CPC code is more specific.
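The selection rule used for Table 8 (the most frequent code within each persistence pattern, computed separately for IPC and CPC) can be sketched with `collections.Counter`. The `(pattern, code)` records are assumed to be already extracted from the comparison pairs, and the counts in the example are invented; ties are broken by first occurrence, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter, defaultdict


def representative_codes(records):
    """records: (pattern, code) pairs for one classification system; return the top code per pattern."""
    per_pattern = defaultdict(Counter)
    for pattern, code in records:
        if pattern in {"S", "T", "E"}:  # only persistence patterns yield representative codes
            per_pattern[pattern][code] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in per_pattern.items()}


# Invented records for illustration only; run once for IPC codes and once for CPC codes.
ipc_records = [
    ("S", "G06Q 50/10"), ("S", "G06Q 50/10"), ("S", "G06F 17/30"),
    ("T", "G06F 9/44"), ("T", "G06F 9/44"), ("T", "H04L 9/32"),
    ("I", "G01N 33/53"),
]
print(representative_codes(ipc_records))
# {'S': 'G06Q 50/10', 'T': 'G06F 9/44'}
```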
The invention trends based on the representative technology codes derived from each applicant's common patterns are shown in Figure 4, Figure 5, Figure 6, and Figure 7.
Figure 4 shows that one research institute applicant has continued to invent technology in the ‘information retrieval’ field in all of the S, T, and E patterns, with the same representative technology code (‘G06F 17/30’) in IPC and CPC.
Figure 5 shows the second type of invention pattern, for a research institute applicant whose representative technology codes belong to the same field (‘semiconductor’), but the CPC classification system provides more detailed technology codes (‘C09K 11/77’, ‘H01L 33/50’) than the IPC classification system. That is, the CPC-based representative technology code (‘C09K 11/77’) indicates that the applicant had concentrated on technical inventions related to rare earth metals, a level of detail that the IPC-based code does not show.
Figure 6 shows the technology invention trend for one of the university applicants. Figure 6a,b show the representative technology codes under the IPC classification scheme, and Figure 6c,d show those under the CPC classification scheme. Figure 6d shows the technology code for the solid-state drive (SSD), which specifies a more detailed technical field than the ‘H01L 21/02’ (semiconductor) code of the IPC classification scheme. For the T pattern, the representative technology appears to have changed from ‘G01N 33/53’ to ‘H04L 9/32’. This indicates that the applicant's inventions, previously focused on biology (immunoassay), have gradually shifted toward network authorization in recent years. Thus, the CPC-based representative technology codes reveal details of the applicant's technical inventions that the IPC classification system does not show.
Figure 7 shows the representative technology codes of a company applicant and the resulting invention trend, based on the experimental results in Table 8. From the IPC system, ‘G06Q 50/10’ (pattern S) and ‘G06F 9/44’ (pattern T) were derived as representative technology codes. ‘G06Q 50/10’ refers to the service technology field; for this applicant, inventions in this field peaked in 2009 and gradually decreased afterwards. In contrast, the CPC-based representative technology shows that inventions in the ‘G06F 17/30’ information retrieval field have increased continuously since 2009. In other words, the S pattern-based representative technology codes extracted from the two classification systems show that the applicant's technical inventions shifted primarily from service technology to information retrieval technology. Within the service technology field, the more detailed technology inventions were made in the transportation and tourism fields.
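The year-by-year reading of Figure 7 amounts to counting how often a representative code occurs per year and locating its peak. The sketch below assumes `(year, code)` occurrence records for one applicant; the data are invented and only mimic the shape of the trend described above.

```python
from collections import Counter


def yearly_counts(occurrences, code):
    """occurrences: (year, code) pairs; return {year: count} for one code in year order."""
    counts = Counter(year for year, c in occurrences if c == code)
    return dict(sorted(counts.items()))


def peak_year(series):
    """Year with the highest count (the earliest one if several years tie, given a year-sorted dict)."""
    return max(series, key=series.get)


# Invented occurrence records for illustration only.
occurrences = [
    (2008, "G06Q 50/10"), (2009, "G06Q 50/10"), (2009, "G06Q 50/10"),
    (2009, "G06Q 50/10"), (2010, "G06Q 50/10"), (2011, "G06Q 50/10"),
    (2009, "G06F 17/30"), (2010, "G06F 17/30"), (2010, "G06F 17/30"),
    (2011, "G06F 17/30"), (2011, "G06F 17/30"), (2011, "G06F 17/30"),
]
service = yearly_counts(occurrences, "G06Q 50/10")
retrieval = yearly_counts(occurrences, "G06F 17/30")
print(service, peak_year(service))      # {2008: 1, 2009: 3, 2010: 1, 2011: 1} 2009
print(retrieval, peak_year(retrieval))  # {2009: 1, 2010: 2, 2011: 3} 2011
```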