Rating the Dominance of Concepts in Semantic Taxonomies
Abstract
1. Introduction
- Publicly available taxonomies may not sufficiently cover all content aspects;
- Tailor-made taxonomies are not easily extended to other contexts or scientific domains [2];
- Tagging the content and maintaining those tags requires a lot of effort; and
- Client-specific intelligence is not efficiently associated with the global research context (solutions such as TrendMD [4] have therefore emerged to partially address this issue).
- The requirement for tailor-made taxonomies is drastically reduced, owing to the multidisciplinary nature and concept coverage of the DoFoS;
- Compared with the current workflow, the DoFoS can be adopted immediately, saving the time and resources that creating a tailor-made taxonomy would require;
- The adoption of DoFoS can lead to improved content organization and enhanced content discoverability;
- Fragmented scientific knowledge is consolidated, as content is classified under a single taxonomy;
- Publisher-specific taxonomic silos are reduced in both size and number; and
- Large-scale recommendations and analytics across all publishers and disciplines are feasible.
2. Related Work
3. Defining Dominance Metrics
- “Depth”: The depth of the taxonomy, as a positive integer (one or greater);
- “Level”: The level of a concept in the taxonomy, as a non-negative integer ranging from zero, which is assigned to the root node, up to Depth minus one;
- “Descendants (direct and inferred)”: The number of direct descendant concepts of a concept together with its inferred ones (all the descendants of its descendants), as a non-negative integer; and
- “Tagged Objects (direct and inferred)”: The number of tagged objects directly associated with a concept together with the tagged objects of its inferred descendants, as a non-negative integer.
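The four quantities above can be derived directly from a taxonomy's structure. The sketch below makes several assumptions for illustration: the taxonomy has a single root, contains no cycles, and is stored as a parent-to-children mapping. Direct tags are given as sets of object identifiers. Each tagged object is counted once, even when several concepts in the same subtree carry it. The toy data and function names are hypothetical and do not come from the authors' implementation. Depth is read as the number of levels, so levels run from zero at the root up to Depth minus one.

```python
from collections import deque
from functools import lru_cache

# Hypothetical toy taxonomy (parent -> children); assumed acyclic with one root.
CHILDREN = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}

# Hypothetical direct tagging (concept -> IDs of objects tagged with it).
DIRECT_TAGS = {
    "a": {"p1"},
    "b": {"p2", "p3"},
    "a1": {"p1", "p4"},
    "a2": {"p5"},
}


def levels(children, root):
    """Level of every concept: 0 at the root, +1 per edge (breadth-first)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in level:  # keep the shortest distance from the root
                level[child] = level[node] + 1
                queue.append(child)
    return level


def depth(children, root):
    """Depth of the taxonomy: the number of levels, i.e. deepest level + 1."""
    return max(levels(children, root).values()) + 1


def descendants(children):
    """Direct and inferred descendants of every concept."""
    @lru_cache(maxsize=None)
    def collect(node):
        found = set()
        for child in children.get(node, []):
            found.add(child)
            found |= collect(child)  # descendants of descendants
        return frozenset(found)

    return {node: collect(node) for node in children}


def tagged_objects(children, direct_tags):
    """Distinct objects tagged with a concept or with any of its descendants."""
    desc = descendants(children)
    counts = {}
    for node in children:
        objects = set(direct_tags.get(node, set()))
        for d in desc[node]:
            objects |= direct_tags.get(d, set())
        counts[node] = len(objects)
    return counts


print(depth(CHILDREN, "root"))                                # 3
print({n: len(d) for n, d in descendants(CHILDREN).items()})  # {'root': 4, 'a': 2, 'b': 0, 'a1': 0, 'a2': 0}
print(tagged_objects(CHILDREN, DIRECT_TAGS))                  # {'root': 5, 'a': 3, 'b': 2, 'a1': 2, 'a2': 1}
```

Collecting objects into sets rather than summing raw counts keeps a paper tagged with both "a" and "a1" from being counted twice for "a". Whether the authors deduplicate in this way is not stated.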
4. Use Cases
4.1. MAG FoS Taxonomy
4.1.1. Adapting the “Dominance Metric” Methodology
- “UMLS relation”: Indicates whether a concept is related to a UMLS term, as a Boolean value.
- “Source relation”: Indicates whether a concept is linked to external knowledge sources (e.g., Wikipedia), as a Boolean value.
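The combining formula is not spelled out here. However, every Dominance Metric value in the MAG FoS table (Tag IDs ID_01 to ID_13) is reproduced to two decimals by the expression below. This expression is inferred from those rows and is not quoted from the authors:

DM = TaggedObjects / (Descendants + 1) × Depth / (Depth − Level) × w

In this expression, Depth = 6 (MAG levels 0 to 5). The weight w is 1.2 when the UMLS relation holds and 1.1 when the Source relation holds. No row in the table has both flags set, so multiplying the two weights in that case is an assumption. The function name is hypothetical.

```python
def dominance_mag(tagged: int, descendants: int, level: int,
                  umls: bool, source: bool, depth: int = 6) -> float:
    """Dominance Metric of a MAG FoS concept, reconstructed from the table values.

    ASSUMPTIONS (inferred, not quoted from the text): depth=6 (levels 0-5),
    a weight of 1.2 for a UMLS relation and 1.1 for a Source relation, and
    multiplication of the two weights when both relations hold.
    """
    if not 0 <= level < depth:
        raise ValueError("level must lie between 0 and depth - 1")
    base = tagged / (descendants + 1)      # tagged objects per concept of the subtree
    level_boost = depth / (depth - level)  # grows as the concept sits deeper
    weight = (1.2 if umls else 1.0) * (1.1 if source else 1.0)
    return base * level_boost * weight


# Rows ID_01, ID_04 and ID_08 of the MAG FoS table.
print(round(dominance_mag(14709, 0, 3, umls=False, source=True), 2))     # 32359.8
print(round(dominance_mag(970638, 375, 1, umls=False, source=True), 2))  # 3407.56
print(round(dominance_mag(6, 0, 5, umls=True, source=False), 2))         # 43.2
```

Under this reading, dividing by the subtree size favours specific concepts over broad ones. The Depth/(Depth − Level) factor further rewards concepts placed deeper in the hierarchy.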
4.1.2. Generating the DoFoS
4.1.3. Cleansing Process
DoFoS Deduplication
- [0] indicate no agreement;
- (0, 0.2) indicate slight agreement;
- [0.2, 0.4) indicate fair agreement;
- [0.4, 0.6) indicate moderate agreement;
- [0.6, 0.8) indicate substantial agreement; and
- [0.8, 1] indicate nearly perfect agreement.
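The bands above can be applied to a Cohen's kappa value as sketched below. The kappa function uses the standard two-rater formula, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The curator tables add a "+" or "−" suffix to each band, but that suffix is not defined in the list. The rule applied here is an assumption: a value at or above the midpoint of its band gets "+". This rule is consistent with ratings such as 0.62 (Substantial−), 0.79 (Substantial+) and 0.90 (Perfect+). Treating values at or below zero as "no agreement" is likewise an assumption. The band label "Perfect" follows the tables.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


# (lower, midpoint, upper, label); bands are [lower, upper), the last also includes 1.0.
BANDS = [
    (0.0, 0.1, 0.2, "Slight"),
    (0.2, 0.3, 0.4, "Fair"),
    (0.4, 0.5, 0.6, "Moderate"),
    (0.6, 0.7, 0.8, "Substantial"),
    (0.8, 0.9, 1.0, "Perfect"),
]


def rating(kappa):
    """Band label plus '+' (upper half) or '−' (lower half); the suffix rule is assumed."""
    if kappa <= 0.0:
        return "No agreement"  # assumption: negative values are folded in here
    for _lower, mid, upper, label in BANDS:
        if kappa < upper or upper == 1.0:
            return label + ("+" if kappa >= mid else "\u2212")


print(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))  # 0.5
print(rating(0.62), rating(0.79), rating(0.90))  # Substantial− Substantial+ Perfect+
```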
DoFoS Hierarchy Refinement
4.2. MeSH Taxonomy
4.2.1. Adapting the “Dominance Metric” Methodology
- “Registry relation”: Indicates whether a concept is related to a term from an external registry (i.e., CAS, EC, FDA, and NCBI), as a Boolean value.
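The same form of expression also reproduces, to two decimals, every Dominance Metric in the DoMeSH table (Tag IDs ID_01 to ID_07), once the Registry relation replaces the UMLS and Source flags. The expression is DM = TaggedObjects / (Descendants + 1) × Depth / (Depth − Level) × w. The values Depth = 14 and w = 1.2 for a registry relation (1.0 otherwise) are inferred from the table rows. They are not stated in the text. The function name is hypothetical.

```python
def dominance_mesh(tagged: int, descendants: int, level: int,
                   registry: bool, depth: int = 14) -> float:
    """Dominance Metric of a MeSH descriptor, reconstructed from the table values.

    ASSUMPTIONS (inferred, not quoted from the text): depth=14 (levels 0-13)
    and a weight of 1.2 when a registry relation holds, 1.0 otherwise.
    """
    if not 0 <= level < depth:
        raise ValueError("level must lie between 0 and depth - 1")
    base = tagged / (descendants + 1)      # tagged objects per concept of the subtree
    level_boost = depth / (depth - level)  # deeper descriptors are boosted
    weight = 1.2 if registry else 1.0
    return base * level_boost * weight


# Rows ID_01 and ID_06 of the DoMeSH table.
print(round(dominance_mesh(1623587, 1635, 3, registry=True), 2))  # 1515.68
print(round(dominance_mesh(15, 0, 12, registry=False), 2))        # 105.0
```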
4.2.2. Generating the DoMeSH Taxonomy
5. Discussion: Taxonomies in the Scholarly Publishing Domain
- “Small” ranging from one to 100 concepts;
- “Medium” ranging from 100 to 500 concepts;
- “Large” ranging from 500 to 1000 concepts; and
- “Huge” containing more than 1000 concepts.
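The ranges above share their endpoints: 100, 500 and 1000 each appear in two classes. A classifier therefore has to pick a side at each boundary. The sketch below assigns each shared endpoint to the smaller class. This is an assumption and not the authors' stated rule, and the function name is hypothetical.

```python
def size_class(num_concepts: int) -> str:
    """Size class of a taxonomy by concept count.

    ASSUMPTION: shared endpoints (100, 500, 1000) go to the smaller class.
    """
    if num_concepts < 1:
        raise ValueError("a taxonomy has at least one concept")
    if num_concepts <= 100:
        return "Small"
    if num_concepts <= 500:
        return "Medium"
    if num_concepts <= 1000:
        return "Large"
    return "Huge"


print([size_class(n) for n in (40, 100, 101, 750, 5000)])
# ['Small', 'Small', 'Medium', 'Large', 'Huge']
```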
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Carelli, B. TrendMD: Using AI to enhance discovery and achieve publisher goals. Inf. Serv. Use 2020, 39, 335–346.
- Sujatha, R.; Rao, B.R. Taxonomy Construction Techniques-Issues and Challenges. Indian J. Comput. Sci. Eng. 2011, 2, 661–671.
- Shen, J.; Wu, Z.; Lei, D.; Zhang, C.; Ren, X.; Vanni, M.T.; Sadler, B.M.; Han, J. HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18), London, UK, 19–23 August 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 2180–2189.
- Tuan, L.A.; Kim, J.; Kiong, N.S. Taxonomy Construction Using Syntactic Contextual Evidence. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 810–819.
- Sinha, A.; Shen, Z.; Song, Y.; Ma, H.; Eide, D.; Hsu, B.-J.; Wang, K. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion), Florence, Italy, 18–22 May 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 243–246.
- Shen, Z.; Ma, H.; Wang, K. A Web-scale system for scientific knowledge exploration. In Proceedings of the ACL 2018, System Demonstrations, Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 87–92.
- Shen, Z.; Wu, C.-H.; Ma, L.; Chen, C.-P.; Wang, K. SciConceptMiner: A system for large-scale scientific concept discovery. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, Association for Computational Linguistics, online, 1–6 August 2021; pp. 48–54.
- Razis, G.; Anagnostopoulos, I. Semantifying Twitter: The Influence Tracker Ontology. In Proceedings of the 9th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Corfu, Greece, 6–7 November 2014; pp. 98–103.
- Romero, D.M.; Galuba, W.; Asur, S.; Huberman, B.A. Influence and passivity in social media. In Proceedings of the 20th International Conference Companion on World Wide Web (WWW ’11), Hyderabad, India, 28 March–1 April 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 113–114.
- Cha, M.; Haddadi, H.; Benevenuto, F.; Gummadi, P.K. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM ’10), Dublin, Ireland, 4–8 June 2010; The AAAI Press: Palo Alto, CA, USA, 2010.
- Razis, G.; Anagnostopoulos, I.; Zhou, H. Identifying Dominant Nodes in Semantic Taxonomies. In Proceedings of the 16th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), Corfu, Greece, 4–5 November 2021; pp. 1–6.
- Nargundkar, A.; Rao, Y.S. InfluenceRank: A machine learning approach to measure influence of Twitter users. In Proceedings of the International Conference on Recent Trends in Information Technology (ICRTIT ’16), Chennai, India, 8–9 April 2016; pp. 1–6.
- Peng, S.; Yang, A.; Cao, L.; Yu, S.; Xie, D. Social influence modeling using information theory in mobile social networks. Inf. Sci. 2017, 379, 146–159.
- Hutchins, B.I.; Yuan, X.; Anderson, J.M.; Santangelo, G.M. Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level. PLoS Biol. 2016, 14, e1002541.
- Jaitly, V.; Chowriappa, P.; Dua, S. A framework to identify influencers in signed social networks. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, 21–24 September 2016; pp. 2335–2340.
- Hajian, B.; White, T. Modelling Influence in a Social Network: Metrics and Evaluation. In Proceedings of the IEEE 3rd International Conference on Privacy, Security, Risk and Trust and IEEE 3rd International Conference on Social Computing, Boston, MA, USA, 9–11 October 2011; pp. 497–500.
- Almgren, K.; Lee, J. Applying an influence measurement framework to large social network. J. Netw. Technol. 2016, 7, 6–15.
- Li, H.; Gao, G.; Chen, R.; Ge, X.; Guo, S.; Hao, L.-Y. The Influence Ranking for Testers in Bug Tracking Systems. Int. J. Softw. Eng. Knowl. Eng. 2019, 29, 93–113.
- Pal, A.; Ruj, S. CITEX: A new citation index to measure the relative importance of authors and papers in scientific publications. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; pp. 1256–1261.
- Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- Mikolov, T.; Chen, K.; Corrado, G.S.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems-Volume 2 (NIPS ’13), Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 3111–3119.
Art | Biology | Business | Chemistry | Computer Science |
Economics | Engineering | Environmental science | Geography | Geology |
History | Materials science | Mathematics | Medicine | Philosophy |
Physics | Political science | Psychology | Sociology |
Tag ID | Publications | Descendants | Level | UMLS | Source | Dominance Metric |
---|---|---|---|---|---|---|
ID_01 | 14,709 | 0 | 3 | 0 | 1 | 32,359.8 |
ID_02 | 4318 | 0 | 5 | 0 | 1 | 28,498.8 |
ID_03 | 14,656 | 7 | 4 | 0 | 1 | 6045.6 |
ID_04 | 970,638 | 375 | 1 | 0 | 1 | 3407.56 |
ID_05 | 4,052,723 | 3630 | 0 | 0 | 1 | 1227.76 |
ID_06 | 12,989 | 56 | 2 | 0 | 1 | 376 |
ID_07 | 383 | 12 | 3 | 0 | 1 | 64.82 |
ID_08 | 6 | 0 | 5 | 1 | 0 | 43.2 |
ID_09 | 9 | 1 | 4 | 1 | 0 | 16.2 |
ID_10 | 6 | 0 | 3 | 0 | 1 | 13.2 |
ID_11 | 11 | 2 | 3 | 1 | 0 | 8.8 |
ID_12 | 1 | 0 | 5 | 1 | 0 | 7.2 |
ID_13 | 1 | 0 | 2 | 0 | 1 | 1.65 |
Ranking | Curator ID | Cohen’s Kappa | Rating |
---|---|---|---|
1 | C05 | 0.83 | Perfect− |
2 | C04 | 0.79 | Substantial+ |
3 | C01 | 0.76 | Substantial+ |
4 | C03 | 0.62 | Substantial− |
5 | C02 | 0.58 | Moderate+ |
Overall: 0.72 (Substantial+) |
Ranking | Curator ID | Agreement | Rating |
---|---|---|---|
1 | C01 | 0.90 | Perfect+ |
2 | C02 | 0.89 | Perfect− |
3 | C03 | 0.85 | Perfect− |
4 | C04 | 0.84 | Perfect− |
5 | C05 | 0.84 | Perfect− |
Overall: 0.87 (Perfect−) |
Similarity Range | Percentage |
---|---|
[−1.0, −0.2) | 0% |
[−0.2, −0.1) | 0.002% |
[−0.1, 0.0) | 0.44% |
[0.0, 0.1) | 5.28% |
[0.1, 0.2) | 12.03% |
[0.2, 0.3) | 14.09% |
[0.3, 0.4) | 12.36% |
[0.4, 0.5) | 9.15% |
[0.5, 0.6) | 7.31% |
[0.6, 0.7) | 6.55% |
[0.7, 0.8) | 7.07% |
[0.8, 0.9) | 8.56% |
[0.9, 1] | 4.07% |
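The table above reports the share of similarity scores that fall in each range. How these scores are produced is not stated here. One plausible reading, suggested by the word2vec references, is that they are cosine similarities between concept embeddings; the sketch below makes that assumption. It bins scores into the same half-open ranges, with the last range closed at 1. The vectors, pairs and function names are hypothetical.

```python
import math
from bisect import bisect_right

# Left edges of every range after the first; the bins are [-1.0, -0.2), [-0.2, -0.1),
# ..., [0.8, 0.9) and finally the closed range [0.9, 1].
EDGES = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def range_percentages(similarities):
    """Percentage of scores in each of the 13 ranges, in table order."""
    counts = [0] * (len(EDGES) + 1)
    for s in similarities:
        counts[bisect_right(EDGES, s)] += 1  # bin index = number of edges <= s
    return [100.0 * c / len(similarities) for c in counts]


# Hypothetical 2-d embeddings for four concepts.
vectors = {"x": [1.0, 0.0], "y": [1.0, 0.0], "z": [0.0, 1.0], "w": [-1.0, 0.0]}
pairs = [("x", "y"), ("x", "z"), ("x", "w"), ("z", "w")]
scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
print(scores)  # [1.0, 0.0, -1.0, 0.0]
print(range_percentages(scores))
```

Using `bisect_right` makes each range closed on the left and open on the right. A score of exactly 0.9 or 1.0 therefore lands in the final [0.9, 1] bin, as in the table.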
Ranking | Curator ID | Cohen’s Kappa | Rating |
---|---|---|---|
1 | C08 | 0.81 | Perfect− |
2 | C07 | 0.79 | Substantial+ |
3 | C09 | 0.78 | Substantial+ |
4 | C03 | 0.76 | Substantial+ |
5 | C04 | 0.58 | Moderate+ |
6 | C05 | 0.55 | Moderate+ |
7 | C06 | 0.53 | Moderate+ |
8 | C10 | 0.38 | Fair+ |
9 | C11 | 0.36 | Fair+ |
10 | C12 | 0.35 | Fair+ |
11 | C01 | 0.31 | Fair+ |
12 | C02 | 0.29 | Fair− |
Overall: 0.54 (Moderate+) |
Anatomy | Organisms | Diseases | Chemicals and Drugs | Analytical, Diagnostic and Therapeutic Techniques, and Equipment |
Psychiatry and Psychology | Phenomena and Processes | Disciplines and Occupations | Anthropology, Education, Sociology, and Social Phenomena | Technology, Industry, and Agriculture |
Humanities | Information Science | Named Groups | Health Care | Geographicals |
Tag ID | Publications | Descendants | Level | Registry | Dominance Metric |
---|---|---|---|---|---|
ID_01 | 1,623,587 | 1635 | 3 | 1 | 1515.68 |
ID_02 | 135,895 | 568 | 5 | 1 | 445.82 |
ID_03 | 13,588 | 12 | 7 | 1 | 2508.55 |
ID_04 | 2,983,654 | 18,233 | 1 | 1 | 211.46 |
ID_05 | 1,988,774 | 3710 | 1 | 1 | 692.56 |
ID_06 | 15 | 0 | 12 | 0 | 105 |
ID_07 | 15 | 1 | 11 | 0 | 35 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Razis, G.; Anagnostopoulos, I.; Zhou, H. Rating the Dominance of Concepts in Semantic Taxonomies. Computers 2022, 11, 35. https://doi.org/10.3390/computers11030035