Improving SDG Classification Precision Using Combinatorial Fusion
Abstract
:1. Introduction
2. Materials and Methods
2.1. Model A
2.2. Model B
- Identify a keyword in the text: “… beaches estuaries dune systems mangroves MARSHES lagoons swamps reefs, etc., are …”;
- The UNBIS concept extracted from the keyword via its synonym: WETLANDS;
- The path from the extracted concept to the subject tag associated with the SDG entity: WETLANDS -> SURFACE WATERS -> WATER;
- The most relevant goal associated with the subject tag WATER: “06 Ensure availability and sustainable management of water and sanitation for all”.
2.3. Performance Evaluation of Models A and B
2.3.1. Example 1: Low Cognitive Diversity
2.3.2. Example 2: Median Cognitive Diversity
2.3.3. Example 3: High Cognitive Diversity
2.4. Results of Combining Models A and B for Nine Sample Documents
2.5. Comparing the Performance of the Four Models
3. Extension of the Results to a Larger Set of Sample Documents
3.1. Extending by an Additional 30 Randomly Selected Sample Documents
3.2. Combination Results in Terms of Precision @ 1
3.3. Combination Results in Terms of Precision @ 3
3.4. Combination Results in Terms of Precision @ 5
3.5. Combination Results in Terms of Precision @ 8
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, L.; Tang, L.; Dong, W.; Yao, S.; Zhou, W. An Overview of Topic Modeling and Its Current Applications in Bioinformatics. SpringerPlus 2016, 5, 1608. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Blei, D.M. Probabilistic Topic Models. Commun. ACM 2012, 55, 77–84. [Google Scholar] [CrossRef] [Green Version]
- Hsu, D.F.; Chung, Y.-S.; Kristal, B.S. Combinatorial Fusion Analysis: Methods and Practices of Combining Multiple Scoring Systems. In Advanced Data Mining Technologies in Bioinformatics; Hsu, H.-H., Ed.; IGI Global: Hershey, PA, USA, 2006; pp. 32–62. ISBN 978-1-59140-863-5. [Google Scholar]
- Hsu, D.F.; Shapiro, J.; Taksa, I. Methods of Data Fusion in Information Retrieval: Rank vs. Score Combination; Rutgers University: New Brunswick, NJ, USA, 2002. [Google Scholar]
- Lin, K.-L.; Lin, C.-Y.; Huang, C.-D.; Chang, H.-M.; Yang, C.-Y.; Lin, C.-T.; Tang, C.Y.; Hsu, D.F. Feature Selection and Combination Criteria for Improving Accuracy in Protein Structure Prediction. IEEE Trans. Nanobioscience 2007, 6, 186–196. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Schweikert, C.; Brown, S.; Tang, Z.; Smith, P.R.; Hsu, D.F. Combining Multiple ChIP-Seq Peak Detection Systems Using Combinatorial Fusion. BMC Genom. 2012, 13, S12. [Google Scholar] [CrossRef] [Green Version]
- Yang, J.-M.; Chen, Y.-F.; Shen, T.-W.; Kristal, B.S.; Hsu, D.F. Consensus Scoring Criteria for Improving Enrichment in Virtual Screening. J. Chem. Inf. Modeling 2005, 45, 1134–1146. [Google Scholar] [CrossRef]
- Chen, Y.F.; Hsu, K.C.; Lin, P.T.; Hsu, D.F.; Kristal, B.S.; Yang, J.M. LigSeeSVM: Ligand-Based Virtual Screening Using Support Vector Machines and Data Fusion. Int. J. Comput. Biol. Drug Des. 2011, 4, 274. [Google Scholar] [CrossRef] [Green Version]
- Lyons, D.M.; Hsu, D.F. Combining Multiple Scoring Systems for Target Tracking Using Rank–Score Characteristics. Inf. Fusion 2009, 10, 124–136. [Google Scholar] [CrossRef] [Green Version]
- Deng, Y.; Wu, Z.; Chu, C.-H.; Zhang, Q.; Hsu, D.F. Sensor Feature Selection and Combination for Stress Identification Using Combinatorial Fusion. Int. J. Adv. Robot. Syst. 2013, 10, 306. [Google Scholar] [CrossRef]
- Deng, Y.; Hsu, D.F.; Wu, Z.; Chu, C.-H. Combining Multiple Sensor Features for Stress Detection Using Combinatorial Fusion. J. Interconnect. Netw. 2012, 13, 1250008. [Google Scholar] [CrossRef]
- Wang, X.; Ho-Shek, J.; Ondusko, D.; Frank Hsu, D. Improving Portfolio Performance Using Attribute Selection and Combination. In Pervasive Systems, Algorithms and Networks; Esposito, C., Hong, J., Choo, K.-K.R., Eds.; Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 1080, pp. 58–70. ISBN 978-3-030-30142-2. [Google Scholar]
- Batallones, A.; Sanchez, K.; Mott, B.; Coffran, C.; Frank Hsu, D. On the Combination of Two Visual Cognition Systems Using Combinatorial Fusion. Brain Inform. 2015, 2, 21–32. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kustiawan, I.; Liu, C.-Y.; Hsu, D.F. Vertical Handoff Decision Using Fuzzification and Combinatorial Fusion. IEEE Commun. Lett. 2017, 21, 2089–2092. [Google Scholar] [CrossRef]
- Sniatynski, M.J.; Shepherd, J.A.; Ernst, T.; Wilkens, L.R.; Hsu, D.F.; Kristal, B.S. Ranks Underlie Outcome of Combining Classifiers: Quantitative Roles for Diversity and Accuracy. Patterns 2021, 3, 100415. [Google Scholar] [CrossRef]
- Li, Y.; Hsu, D.F.; Chung, S.M. Combination of Multiple Feature Selection Methods for Text Categorization by Using Combinatorial Fusion Analysis And Rank-Score Characteristic. Int. J. Artif. Intell. Tools 2013, 22, 1350001. [Google Scholar] [CrossRef]
- Hsu, D.F.; Kristal, B.S.; Hao, Y.; Schweikert, C. Cognitive Diversity: A Measurement of Dissimilarity Between Multiple Scoring Systems. J. Interconnect. Netw. 2019, 19, 194001–194042. [Google Scholar] [CrossRef]
- Hurley, L.; Kristal, B.S.; Sirimulla, S.; Schweikert, C.; Hsu, D.F. Multi-Layer Combinatorial Fusion Using Cognitive Diversity. IEEE Access 2021, 9, 3919–3935. [Google Scholar] [CrossRef]
- Rosli, N.; Rahman, M.; Balakrishnan, M.; Komeda, T.; Mazlan, S.; Zamzuri, H. Improved Gender Recognition during Stepping Activity for Rehab Application Using the Combinatorial Fusion Approach of EMG and HRV. Appl. Sci. 2017, 7, 348. [Google Scholar] [CrossRef] [Green Version]
- United Nations The 17 Goals. Available online: https://sdgs.un.org/goals (accessed on 22 December 2021).
- LaFleur, M.T. Art Is Long, Life Is Short: An SDG Classification System for DESA Publications; DESA: New York, NY, USA, 2019; Working Paper No. 159. [Google Scholar]
- LaFleur, M.T.; Kim, N. What Does the United Nations “Say” about Global Agenda? An Exploration of Trends Using Natural Language Processing for Machine Learning; Working Paper No. 171; DESA: New York, NY, USA, 2020. [Google Scholar]
- Le Blanc, D.; Freire, C.; Vierros, M. Mapping the Linkages between Oceans and Other Sustainable Development Goals: A Preliminary Exploration; Working Paper No. 149; DESA: New York, NY, USA, 2017. [Google Scholar]
- Le Blanc, D. Towards Integration at Last? The Sustainable Development Goals as a Network of Targets; Working Paper No. 141; DESA: New York, NY, USA, 2015. [Google Scholar]
- UN DESA LinkedSDGs. Available online: https://linkedsdg.officialstatistics.org (accessed on 22 December 2021).
- W3C Semantic Web. Available online: https://www.w3.org/standards/semanticweb (accessed on 22 December 2021).
- Eastman, M.T.; Horrocks, P.; Singh, T.; Kumar, N. Institutional Investing for the SDGs; MSCI and OECD, 2018. Available online: https://www.msci.com/documents/10199/239004/Institutional_Investing_for_the_SDGs.pdf (accessed on 22 December 2021).
- Cocho, G.; Rodríguez, R.F.; Sánchez, S.; Flores, J.; Pineda, C.; Gershenson, C. Rank-Frequency Distribution of Natural Languages: A Difference of Probabilities Approach. Phys. A Stat. Mech. Appl. 2019, 532, 121795. [Google Scholar] [CrossRef] [Green Version]
- Brakman, S.; Garretsen, H.; Van Marrewijk, C.; Van Den Berg, M. The Return of Zipf: Towards a Further Understanding of the Rank-Size Distribution. J. Reg. Sci. 1999, 39, 183–213. [Google Scholar] [CrossRef]
- Orazbek, I.; LaFleur, M.T.; Hsu, D.F. Improving SDG Classification Precision of Topic Models with Combinatorial Fusion Algorithm. In Proceedings of the 2021 IEEE Intl Conference on Cyber Science and Technology Congress (CyberSciTech), Calgary, AB, Canada, 25–28 October 2021. [Google Scholar]
- Tang, Y.; Li, Z.; Nellikkal, M.A.N.; Eramian, H.; Chan, E.M.; Norquist, A.J.; Hsu, D.F.; Schrier, J. Improving Data and Prediction Quality of High-Throughput Perovskite Synthesis with Model Fusion. J. Chem. Inf. Modeling 2021, 61, 1593–1602. [Google Scholar] [CrossRef] [PubMed]
- Hsu, D.F.; Taksa, I. Comparing Rank and Score Combination Methods for Data Fusion in Information Retrieval. Inf. Retr. 2005, 8, 449–480. [Google Scholar] [CrossRef]
- Debnath, P.; Konwar, N.; Radenović, S. Metric Fixed Point Theory: Applications in Science, Engineering and Behavioural Sciences; Forum for Interdisciplinary Mathematics; Springer: Singapore, 2021; ISBN 9789811648953. [Google Scholar]
- Todorčević, V. Harmonic Quasiconformal Mappings and Hyperbolic Type Metrics; Springer International Publishing: Cham, Switzerland, 2019; ISBN 978-3-030-22590-2. [Google Scholar]
Rank | Results | ||
---|---|---|---|
SDG of Model A | SDG of Model B | Human | |
1 | d_3 | d_3 | d_3 |
2 | d_4 | d_5 | d_4 |
3 | d_5 | d_2 | d_5 |
4 | d_17 | d_4 | d_1 |
5 | d_6 | d_16 | d_2 |
6 | d_10 | d_6 | d_6 |
7 | d_2 | d_11 | d_10 |
8 | d_13 | d_17 | d_16 |
precision @ 1 | 1.00 | 1.00 | |
precision @ 3 | 1.00 | 0.67 | |
precision @ 5 | 0.60 | 0.80 | |
precision @ 8 | 0.75 | 0.75 |
Rank | Results | ||
---|---|---|---|
SDG of Model A | SDG of Model B | Human | |
1 | d_4 | d_4 | d_4 |
2 | d_12 | d_3 | d_8 |
3 | d_13 | d_5 | d_5 |
4 | d_15 | d_8 | d_12 |
5 | d_17 | d_16 | d_10 |
6 | d_5 | d_6 | d_3 |
7 | d_8 | d_17 | d_1 |
8 | d_11 | d_11 | d_13 |
precision @ 1 | 1.00 | 1.00 | |
precision @ 3 | 0.33 | 0.67 | |
precision @ 5 | 0.40 | 0.60 | |
precision @ 8 | 0.63 | 0.50 |
Rank | Results | ||
---|---|---|---|
SDG of Model A | SDG of Model B | Human | |
1 | d_17 | d_3 | d_8 |
2 | d_4 | d_2 | d_7 |
3 | d_3 | d_4 | d_3 |
4 | d_12 | d_11 | d_4 |
5 | d_9 | d_5 | d_2 |
6 | d_16 | d_17 | d_9 |
7 | d_7 | d_6 | d_17 |
8 | d_10 | d_8 | d_12 |
precision @ 1 | 0.00 | 0.00 | |
precision @ 3 | 0.33 | 0.33 | |
precision @ 5 | 0.40 | 0.60 | |
precision @ 8 | 0.75 | 0.63 |
Rank | Results | ||
---|---|---|---|
SDG of SC(A, B) | SDG of RC(A, B) | Human | |
1 | d_3 | d_3 | d_3 |
2 | d_5 | d_5 | d_4 |
3 | d_4 | d_4 | d_5 |
4 | d_17 | d_2 | d_1 |
5 | d_2 | d_6 | d_2 |
6 | d_6 | d_17 | d_6 |
7 | d_10 | d_11 | d_10 |
8 | d_11 | d_16 | d_16 |
precision @ 1 | 1.00 | 1.00 | |
precision @ 3 | 1.00 | 1.00 | |
precision @ 5 | 0.80 | 0.80 | |
precision @ 8 | 0.75 | 0.75 |
Rank | Results | ||
---|---|---|---|
SDG of SC(A, B) | SDG of RC(A, B) | Human | |
1 | d_4 | d_4 | d_4 |
2 | d_3 | d_5 | d_8 |
3 | d_5 | d_8 | d_5 |
4 | d_12 | d_17 | d_12 |
5 | d_8 | d_12 | d_10 |
6 | d_17 | d_11 | d_3 |
7 | d_13 | d_6 | d_1 |
8 | d_11 | d_13 | d_13 |
precision @ 1 | 1.00 | 1.00 | |
precision @ 3 | 0.67 | 1.00 | |
precision @ 5 | 0.80 | 0.80 | |
precision @ 8 | 0.75 | 0.63 |
Rank | Results | ||
---|---|---|---|
SDG of SC(A, B) | SDG of RC(A, B) | Human | |
1 | d_13 | d_1 | d_13 |
2 | d_1 | d_9 | d_12 |
3 | d_9 | d_12 | d_9 |
4 | d_12 | d_7 | d_11 |
5 | d_7 | d_13 | d_10 |
6 | d_11 | d_14 | d_1 |
7 | d_8 | d_8 | d_8 |
8 | d_14 | d_11 | d_7 |
precision @ 1 | 1.00 | 0.00 | |
precision @ 3 | 0.67 | 0.67 | |
precision @ 5 | 0.60 | 0.60 | |
precision @ 8 | 0.88 | 0.88 |
Pre@k (A) | Pre@k (B) | Pre@k SC(A, B) | Pre@k RC(A, B) | |
---|---|---|---|---|
Low CD case | ||||
k = 1 | 1.00 | 1.00 | 1.00 | 1.00 |
k = 3 | 1.00 | 0.67 | 1.00 | 1.00 |
k = 5 | 0.60 | 0.80 | 0.80 | 0.80 |
k = 8 | 0.75 | 0.75 | 0.75 | 0.75 |
Median CD case | ||||
k = 1 | 1.00 | 1.00 | 1.00 | 1.00 |
k = 3 | 0.33 | 0.67 | 0.67 | 1.00 |
k = 5 | 0.40 | 0.60 | 0.80 | 0.80 |
k = 8 | 0.63 | 0.50 | 0.75 | 0.63 |
High CD case | ||||
k = 1 | 0.00 | 0.00 | 0.00 | 0.00 |
k = 3 | 0.33 | 0.33 | 0.33 | 0.33 |
k = 5 | 0.40 | 0.60 | 0.60 | 0.60 |
k = 8 | 0.75 | 0.63 | 0.88 | 0.88 |
Cognitive Diversity | Gain in Pre @ 1 | Gain in Pre @ 3 | Gain in Pre @ 5 | Gain in Pre @ 8 |
---|---|---|---|---|
0.08 | 0.00 | 0.00 | 0.00 | 0.00 |
0.11 | 0.00 | 0.00 | 0.20 | 0.00 |
0.13 | 0.00 | 0.33 | 0.00 | 0.00 |
0.45 | −1.00 | 0.00 | −0.20 | 0.00 |
0.46 | 0.00 | 0.00 | 0.20 | 0.125 |
0.46 | 0.00 | 0.00 | 0.40 | 0.00 |
1.40 | −1.00 | 0.33 | 0.00 | 0.00 |
1.67 | 0.00 | 0.00 | 0.00 | 0.125 |
1.78 | 0.00 | 0.33 | 0.00 | 0.125 |
Avg Improvement | −22% | 11% | 6.7% | 4.2% |
Cognitive Diversity | Gain in Pre @ 1 | Gain in Pre @ 3 | Gain in Pre @ 5 | Gain in Pre @ 8 |
---|---|---|---|---|
0.08 | 0.00 | 0.00 | 0.00 | 0.00 |
0.11 | 0.00 | −0.33 | 0.00 | 0.00 |
0.13 | 0.00 | 0.33 | 0.00 | 0.00 |
0.45 | −1.00 | −0.33 | 0.00 | 0.125 |
0.46 | 0.00 | 0.33 | 0.20 | 0.00 |
0.46 | 0.00 | −0.67 | 0.20 | 0.00 |
1.40 | −1.00 | 0.33 | 0.00 | 0.00 |
1.67 | 0.00 | 0.00 | 0.00 | 0.125 |
1.78 | −1.00 | 0.33 | 0.00 | 0.125 |
Avg Improvement | −33% | 0% | 4.4% | 4.2% |
Cognitive Diversity | Gain in Pre @ 1 | Gain in Pre @ 3 | Gain in Pre @ 5 | Gain in Pre @ 8 |
---|---|---|---|---|
0.08 | 0.00 | 0.00 | 0.00 | 0.00 |
0.11 | 0.00 | 0.00 | 0.20 | 0.00 |
0.13 | 0.00 | 0.33 | 0.00 | 0.00 |
0.45 | −1.00 | 0.00 | −0.20 | 0.00 |
0.46 | 0.00 | 0.00 | 0.20 | 0.125 |
0.46 | 0.00 | 0.00 | 0.40 | 0.00 |
1.40 | −1.00 | 0.33 | 0.00 | 0.00 |
1.67 | 0.00 | 0.00 | 0.00 | 0.125 |
1.78 | −1.00 | 0.33 | 0.00 | 0.125 |
Avg Improvement | −33% | 11% | 6.7% | 4.2% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hsu, D.F.; LaFleur, M.T.; Orazbek, I. Improving SDG Classification Precision Using Combinatorial Fusion. Sensors 2022, 22, 1067. https://doi.org/10.3390/s22031067
Hsu DF, LaFleur MT, Orazbek I. Improving SDG Classification Precision Using Combinatorial Fusion. Sensors. 2022; 22(3):1067. https://doi.org/10.3390/s22031067
Chicago/Turabian StyleHsu, D. Frank, Marcelo T. LaFleur, and Ilyas Orazbek. 2022. "Improving SDG Classification Precision Using Combinatorial Fusion" Sensors 22, no. 3: 1067. https://doi.org/10.3390/s22031067
APA StyleHsu, D. F., LaFleur, M. T., & Orazbek, I. (2022). Improving SDG Classification Precision Using Combinatorial Fusion. Sensors, 22(3), 1067. https://doi.org/10.3390/s22031067