Data Integration–Possibilities of Molecular and Clinical Data Fusion on the Example of Thyroid Cancer Diagnostics
Abstract
1. Introduction
2. Results
2.1. Clinical Feature Extraction
Data Dependencies
2.2. Classification Accuracy
2.3. Stability of Feature Selection
3. Discussion
4. Materials and Methods
4.1. Dataset
4.2. Clinical Feature Extraction
4.3. Feature Sets
4.4. Data Fusion
4.5. Feature Selection and Classification
4.6. Stability of Feature Selection
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shah, P.; Kendall, F.; Khozin, S.; Goosen, R.; Hu, J.; Laramie, J.; Ringel, M.; Schork, N. Artificial Intelligence and Machine Learning in Clinical Development: A Translational Perspective. NPJ Digit. Med. 2019, 2, 100.
- Leclercq, M.; Vittrant, B.; Martin-Magniette, M.L.; Scott Boyer, M.P.; Perin, O.; Bergeron, A.; Fradet, Y.; Droit, A. Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data. Front. Genet. 2019, 10, 452.
- Hira, Z.M.; Gillies, D.F. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Available online: https://www.hindawi.com/journals/abi/2015/198363/ (accessed on 5 April 2020).
- Li, G.-Z.; Bu, H.-L.; Yang, M.Q.; Zeng, X.-Q.; Yang, J.Y. Selecting Subsets of Newly Extracted Features from PCA and PLS in Microarray Data Analysis. BMC Genom. 2008, 9, S24.
- Wee, L.J.; Simarmata, D.; Kam, Y.-W.; Ng, L.F.; Tong, J.C. SVM-Based Prediction of Linear B-Cell Epitopes Using Bayes Feature Extraction. BMC Genom. 2010, 11, S21.
- Louie, B.; Mork, P.; Martin-Sanchez, F.; Halevy, A.; Tarczy-Hornoch, P. Data Integration and Genomic Medicine. J. Biomed. Inform. 2007, 40, 5–16.
- Subhani, M.M.; Anjum, A.; Koop, A.; Antonopoulos, N. Clinical and Genomics Data Integration Using Meta-Dimensional Approach. In Proceedings of the 2016 IEEE/ACM 9th International Conference on Utility and Cloud Computing (UCC), Shanghai, China, 6–9 December 2016; pp. 416–421.
- Hamid, J.S.; Hu, P.; Roslin, N.M.; Ling, V.; Greenwood, C.M.T.; Beyene, J. Data Integration in Genetics and Genomics: Methods and Challenges. Hum. Genom. Proteom. 2009, 2009, 869093.
- Tretyakov, K. Methods of Genomic Data Fusion: An Overview. 2006. Available online: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.423.2133&rep=rep1&type=pdf (accessed on 4 April 2020).
- Durrant-Whyte, H. Sensor Models and Multisensor Integration. Int. J. Robot. Res. 1988, 7, 97–113.
- Dasarathy, B.V. Sensor Fusion Potential Exploitation-Innovative Architectures and Illustrative Applications. Proc. IEEE 1997, 85, 24–38.
- Castanedo, F. A Review of Data Fusion Techniques. Sci. World J. 2013, 2013, e704504.
- Student, S.; Płuciennik, A.; Łakomiec, K.; Wilk, A.; Bensz, W.; Fujarewicz, K. Integration Strategies of Cross-Platform Microarray Data Sets in Multiclass Classification Problem. In Proceedings of the Computational Science and Its Applications—ICCSA 2019, Saint Petersburg, Russia, 1–4 July 2019; Misra, S., Gervasi, O., Murgante, B., Stankova, E., Korkhov, V., Torre, C., Rocha, A.M.A.C., Taniar, D., Apduhan, B.O., Tarantino, E., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 602–612.
- Tadist, K.; Najah, S.; Nikolov, N.S.; Mrabti, F.; Zahi, A. Feature Selection Methods and Genomic Big Data: A Systematic Review. J. Big Data 2019, 6, 79.
- Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
- Bomeli, S.R.; LeBeau, S.O.; Ferris, R.L. Evaluation of a Thyroid Nodule. Otolaryngol. Clin. N. Am. 2010, 43, 229–238.
- Cibas, E.S.; Ali, S.Z. The Bethesda System for Reporting Thyroid Cytopathology. Thyroid 2009, 19, 1159–1165.
- Cibas, E.S.; Ali, S.Z. The 2017 Bethesda System for Reporting Thyroid Cytopathology. Thyroid 2017, 27, 1341–1346.
- Wesoła, M.; Jeleń, M. Bethesda System in the Evaluation of Thyroid Nodules: Review. Adv. Clin. Exp. Med. 2017, 26, 177–182.
- Tan, H.; Li, Z.; Li, N.; Qian, J.; Fan, F.; Zhong, H.; Feng, J.; Xu, H.; Li, Z. Thyroid Imaging Reporting and Data System Combined with Bethesda Classification in Qualitative Thyroid Nodule Diagnosis. Medicine 2019, 98, e18320.
- Nikiforova, M.N.; Nikiforov, Y.E. Molecular Diagnostics and Predictors in Thyroid Cancer. Thyroid 2009, 19, 1351–1361.
- Rossi, E.D.; Pantanowitz, L.; Faquin, W.C. The Role of Molecular Testing for the Indeterminate Thyroid FNA. Genes 2019, 10, 736.
- Zhang, M.; Lin, O. Molecular Testing of Thyroid Nodules: A Review of Current Available Tests for Fine-Needle Aspiration Specimens. Arch. Pathol. Lab. Med. 2016, 140, 1338–1344.
- Chudova, D.; Wilde, J.I.; Wang, E.T.; Wang, H.; Rabbee, N.; Egidio, C.M.; Reynolds, J.; Tom, E.; Pagan, M.; Rigl, C.T.; et al. Molecular Classification of Thyroid Nodules Using High-Dimensionality Genomic Data. J. Clin. Endocrinol. Metab. 2010, 95, 5296–5304.
- Fujarewicz, K.; Jarzab, M.; Eszlinger, M.; Krohn, K.; Paschke, R.; Oczko-Wojciechowska, M.; Wiench, M.; Kukulska, A.; Jarzab, B.; Swierniak, A. A Multi-Gene Approach to Differentiate Papillary Thyroid Carcinoma from Benign Lesions: Gene Selection Using Support Vector Machines with Bootstrapping. Endocr. Relat. Cancer 2007, 14, 809–826.
- Kopczyński, J.; Suligowska, A.; Niemyska, K.; Pałyga, I.; Walczyk, A.; Gąsior-Perczak, D.; Kowalik, A.; Hińcza, K.; Mężyk, R.; Góźdź, S.; et al. Did Introducing a New Category of Thyroid Tumors (Non-Invasive Follicular Thyroid Neoplasm with Papillary-like Nuclear Features) Decrease the Risk of Malignancy for the Diagnostic Categories in the Bethesda System for Reporting Thyroid Cytopathology? Endocr. Pathol. 2020, 31, 143–149.
- Oczko-Wojciechowska, M.; Kotecka-Blicharz, A.; Krajewska, J.; Rusinek, D.; Barczyński, M.; Jarząb, B.; Czarniecka, A. European Perspective on the Use of Molecular Tests in the Diagnosis and Therapy of Thyroid Neoplasms. Gland Surg. 2020, 9, S69–S76.
- Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-Based Feature Selection: Introduction and Review. J. Biomed. Inform. 2018, 85, 189–203.
- Kuncheva, L.I. A Stability Index for Feature Selection. In Proceedings of the Artificial Intelligence and Applications, Vancouver, BC, Canada, 22–26 July 2007.
- Khaire, U.M.; Dhanalakshmi, R. Stability of Feature Selection Algorithm: A Review. J. King Saud Univ.—Comput. Inf. Sci. 2019, 34, 1060–1073.
- Nogueira, S.; Sechidis, K.; Brown, G. On the Stability of Feature Selection Algorithms. J. Mach. Learn. Res. 2018, 18, 1–54.
- Bengtsson, H.; Simpson, K.; Bullard, J.; Hansen, K.M. Aroma.affymetrix: A Generic Framework in R for Analyzing Small to Very Large Affymetrix Data Sets in Bounded Memory. 2008. Available online: https://statistics.berkeley.edu/sites/default/files/tech-reports/745.pdf (accessed on 17 September 2020).
- Microarray Lab. Available online: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/ (accessed on 16 April 2020).
- Maslove, D.M.; Podchiyska, T.; Lowe, H.J. Discretization of Continuous Features in Clinical Datasets. J. Am. Med. Inform. Assoc. 2013, 20, 544–553.
- Jarząb, B.; Dedecjus, M.; Handkiewicz-Junak, D.; Lange, D.; Lewiński, A.; Nasierowska-Guttmejer, A.; Ruchała, M.; Słowińska-Klencka, D.; Nauman, J. Diagnostics and Treatment of Thyroid Carcinoma. Endokrynol. Pol. 2016, 67, 74–145.
- Tessler, F.N.; Middleton, W.D.; Grant, E.G.; Hoang, J.K.; Berland, L.L.; Teefey, S.A.; Cronan, J.J.; Beland, M.D.; Desser, T.S.; Frates, M.C.; et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J. Am. Coll. Radiol. 2017, 14, 587–595.
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating Mutual Information. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2004, 69, 066138.
- Sales, G.; Romualdi, C. Parmigene—A Parallel R Package for Mutual Information Estimation and Gene Network Reconstruction. Bioinformatics 2011, 27, 1876–1877.
- Płaczek, A.; Płuciennik, A.; Kotecka-Blicharz, A.; Jarzab, M.; Mrozek, D. Bayesian Assessment of Diagnostic Strategy for a Thyroid Nodule Involving a Combination of Clinical Synthetic Features and Molecular Data. IEEE Access 2020, 8, 175125–175139.
- Scutari, M. Learning Bayesian Networks with the Bnlearn R Package. J. Stat. Softw. 2010, 35, 1–22.
- Alexander, E.K.; Kennedy, G.C.; Baloch, Z.W.; Cibas, E.S.; Chudova, D.; Diggans, J.; Friedman, L.; Kloos, R.T.; Li Volsi, V.A.; Mandel, S.J.; et al. Preoperative Diagnosis of Benign Thyroid Nodules with Indeterminate Cytology. N. Engl. J. Med. 2012, 367, 705–715.
- Fujarewicz, K.; Student, S.; Zielański, T.; Jakubczak, M.; Pieter, J.; Pojda, K.; Świerniak, A. Large-Scale Data Classification System Based on Galaxy Server and Protected from Information Leak. In Proceedings of the Intelligent Information and Database Systems; Nguyen, N.T., Tojo, S., Nguyen, L.M., Trawiński, B., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 765–773.
- Robnik-Sikonja, M.; Savicky, P. CORElearn: Classification, Regression and Feature Evaluation. Available online: https://cran.r-project.org/web/packages/CORElearn/CORElearn.pdf (accessed on 17 April 2021).
- Drotár, P.; Gazda, J.; Smékal, Z. An Experimental Comparison of Feature Selection Methods on Two-Class Biomedical Datasets. Comput. Biol. Med. 2015, 66, 1–10.
Data | Strategy | Feature Selection | Accuracy (Median) | Number of Features | Kuncheva Index |
---|---|---|---|---|---|
Microarray_163 + Malignancy_risk | Early Fusion | ReliefF | 0.942 | 3 | 0.876 |
Microarray_163 + Malignancy_risk | Late Fusion | ReliefF | 0.942 | 3 | 0.753 |
Microarray_163 + Malignancy_risk | Early Fusion | Wilcoxon | 0.939 | 8 | 0.747 |
Microarray_163 + Malignancy_risk | Late Fusion | Wilcoxon | 0.940 | 8 | 0.753 |
Microarray_40 + Malignancy_risk | Early Fusion | ReliefF | 0.931 | 3 | 0.693 |
Microarray_40 + Malignancy_risk | Late Fusion | ReliefF | 0.931 | 3 | 0.564 |
Microarray_40 + Malignancy_risk | Early Fusion | Wilcoxon | 0.932 | 6 | 0.582 |
Microarray_40 + Malignancy_risk | Late Fusion | Wilcoxon | 0.932 | 6 | 0.552 |
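The Kuncheva Index column in the table above measures how consistently the same features are selected across repeated runs. As a minimal sketch, assuming the standard definition from Kuncheva (2007), the index for two equal-size subsets of k features drawn from n, sharing r features, is (r·n − k²) / (k·(n − k)), and overall stability is the mean over all pairs of subsets. The function names and the example selections below are hypothetical and are not taken from the authors' implementation.

```python
from itertools import combinations


def kuncheva_index(subset_a, subset_b, n_features):
    """Kuncheva consistency index between two equal-size feature subsets."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must have the same size")
    if k == 0 or k >= n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # number of features shared by the two subsets
    # Correct the raw overlap r for the overlap expected by chance (k*k/n)
    return (r * n_features - k * k) / (k * (n_features - k))


def stability(subsets, n_features):
    """Mean pairwise Kuncheva index over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# Hypothetical selections from three resampling runs over 164 features
# (illustrative names only; not data from the study).
runs = [
    {"gene_1", "gene_2", "malignancy_risk"},
    {"gene_1", "gene_2", "malignancy_risk"},
    {"gene_1", "gene_3", "malignancy_risk"},
]
print(round(stability(runs, 164), 3))
```

The index equals 1 when two runs select identical subsets and falls toward 0 (or slightly below) as the overlap approaches what random selection would give, so the values in the table can be compared across feature sets of different size.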
Dataset | Feature Set | Number of Features | Characteristics |
---|---|---|---|
Microarray_40 | Expression of genes listed in Fujarewicz et al. | 40 | Binomial distribution of features’ correlation |
Microarray_163 | Expression of genes listed in Alexander et al. | 163 | Normal distribution of features’ correlation |
Malignancy_risk | Feature extracted with the method of Płaczek et al. | 1 | Continuous variable in the range 0–1 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Płuciennik, A.; Płaczek, A.; Wilk, A.; Student, S.; Oczko-Wojciechowska, M.; Fujarewicz, K. Data Integration–Possibilities of Molecular and Clinical Data Fusion on the Example of Thyroid Cancer Diagnostics. Int. J. Mol. Sci. 2022, 23, 11880. https://doi.org/10.3390/ijms231911880