Fine-Grained Algorithm for Improving KNN Computational Performance on Clinical Trials Text Classification
Abstract
1. Introduction
2. Materials and Methods
- Position: each statement was labeled according to its position relative to the phrases “inclusion criteria” or “exclusion criteria”, which usually preceded the respective lists. If neither phrase was found, the statement was labeled “Eligible”.
- Negation identification and transformation: negated inclusion criteria starting with “no” were transformed into positive statements and labeled “Not Eligible”.
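The two labeling rules above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_statements`, the regular expressions, and the treatment of any line containing a header phrase as a section header are assumptions made for illustration. It also assumes that statements listed under “exclusion criteria” receive the label “Not Eligible”.

```python
import re

# Assumed patterns; the source only names the phrases and the leading "no".
INCLUSION = re.compile(r"inclusion criteria", re.IGNORECASE)
EXCLUSION = re.compile(r"exclusion criteria", re.IGNORECASE)
LEADING_NO = re.compile(r"^no\s+", re.IGNORECASE)


def label_statements(lines):
    """Return (statement, label) pairs for a list of criteria lines."""
    label = "Eligible"  # default when no header phrase has been seen
    labeled = []
    for raw in lines:
        # Drop surrounding whitespace and simple bullet markers.
        text = raw.strip().lstrip("-* ").strip()
        if not text:
            continue
        # A header line switches the label for the statements that follow.
        if EXCLUSION.search(text):
            label = "Not Eligible"  # assumption: exclusion list => Not Eligible
            continue
        if INCLUSION.search(text):
            label = "Eligible"
            continue
        # Negated inclusion criterion: drop the leading "no" and flip the label.
        if label == "Eligible" and LEADING_NO.match(text):
            positive = LEADING_NO.sub("", text, count=1)
            labeled.append((positive, "Not Eligible"))
        else:
            labeled.append((text, label))
    return labeled


example = [
    "Inclusion Criteria:",
    "- Age over 18 years",
    "- No prior chemotherapy",
    "Exclusion Criteria:",
    "- Pregnant or breastfeeding",
]
for statement, label in label_statements(example):
    print(f"{label:12s} | {statement}")
```

In this sketch the statement “No prior chemotherapy” becomes “prior chemotherapy” with the label “Not Eligible”, which mirrors the negation rule. Other forms of negation are left untouched.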
3. Results and Discussion
4. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Classifier | Precision | Recall | F1 | Cohen’s κ |
---|---|---|---|---|
FastText | 0.88 | 0.86 | 0.87 | 0.75 |
CNN | 0.88 | 0.88 | 0.88 | 0.76 |
SVM | 0.79 | 0.79 | 0.79 | 0.57 |
KNN | 0.92 | 0.92 | 0.92 | 0.83 |
Metric (%) | KNN | SVM |
---|---|---|
Accuracy | 94.5 | 79.1 |
Precision | 93.8 | 80.7 |
Recall | 95.2 | 82.2 |
F1 score | 94.5 | 81.4 |
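
The metrics in the two tables above (accuracy, precision, recall, F1, and Cohen’s kappa) have standard definitions for a binary task such as “Eligible” versus “Not Eligible”. The sketch below computes them from the four cells of a confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for chance agreement.
    # Chance agreement is the probability that prediction and truth match
    # if both were drawn independently from their marginal distributions.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is lower than accuracy whenever some agreement is expected by chance, which is why a classifier with an F1 of 0.92 can still have a kappa of 0.83 in the first table.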
No. | Data Records | KNN (seconds) | KNN + FGA (seconds) |
---|---|---|---|
1 | 100,000 | 39,368 | 32,365 |
2 | 200,000 | 78,132 | 57,729 |
3 | 300,000 | 116,897 | 83,093 |
4 | 400,000 | 155,663 | 108,457 |
5 | 500,000 | 194,427 | 133,821 |
6 | 600,000 | 233,194 | 159,185 |
7 | 700,000 | 271,962 | 184,549 |
8 | 800,000 | 310,731 | 209,913 |
9 | 900,000 | 349,502 | 235,277 |
10 | 1,000,000 | 388,274 | 260,641 |
No. | Data Records | SVM (seconds) | SVM + FGA (seconds) |
---|---|---|---|
1 | 100,000 | 35,294 | 40,244 |
2 | 200,000 | 71,771 | 79,080 |
3 | 300,000 | 108,248 | 117,916 |
4 | 400,000 | 144,725 | 156,752 |
5 | 500,000 | 181,202 | 195,588 |
6 | 600,000 | 217,679 | 234,424 |
7 | 700,000 | 254,156 | 273,260 |
8 | 800,000 | 290,633 | 312,096 |
9 | 900,000 | 327,110 | 350,932 |
10 | 1,000,000 | 363,587 | 389,768 |
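
To make the two timing tables easier to compare, the relative change in running time can be computed from their first and last rows. The sketch below is plain arithmetic on the tabulated values. The helper name `relative_change` is an assumption, and the timings are read here as whole numbers. Because the result is a ratio, it stays the same if the comma in the tables is instead a decimal separator.

```python
def relative_change(baseline, with_fga):
    """Fractional change in running time; negative means FGA was faster."""
    return (with_fga - baseline) / baseline


# (baseline, with FGA) timings copied from the first and last table rows.
timings = {
    "KNN": {100_000: (39_368, 32_365), 1_000_000: (388_274, 260_641)},
    "SVM": {100_000: (35_294, 40_244), 1_000_000: (363_587, 389_768)},
}

for classifier, rows in timings.items():
    for records, (baseline, with_fga) in rows.items():
        change = relative_change(baseline, with_fga)
        print(f"{classifier} at {records:,} records: {change:+.1%}")
```

On these rows, adding FGA reduces the KNN running time by roughly 18% at 100,000 records and roughly 33% at 1,000,000 records. The SVM running time increases by roughly 14% and 7% respectively.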
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jasmir, J.; Nurmaini, S.; Tutuko, B. Fine-Grained Algorithm for Improving KNN Computational Performance on Clinical Trials Text Classification. Big Data Cogn. Comput. 2021, 5, 60. https://doi.org/10.3390/bdcc5040060