Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Patient Cohort
2.2. Statistical Analysis
2.3. Data Processing
2.4. Feature Selection and Machine Learning
3. Results
3.1. Clinical Characteristics and Clinical Laboratory Data from Patients
3.2. Feature Selection and Sampling Technique Experiment
3.3. Model Evaluation and Comparison
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Feature | Data Type | Missing Data (Fraction) | Missing Data (%) |
---|---|---|---|
A/G Ratio | Continuous | 0.441 | 44.1 |
Albumin | Continuous | 0.224 | 22.4 |
ALP | Continuous | 0.378 | 37.8 |
ALT | Continuous | 0.068 | 6.8 |
AST | Continuous | 0.046 | 4.6 |
BUN | Continuous | 0.018 | 1.8 |
Calcium | Continuous | 0.428 | 42.8 |
Chloride | Continuous | 0.084 | 8.4 |
Creatinine | Continuous | 0.005 | 0.5 |
Direct Bilirubin | Continuous | 0.303 | 30.3 |
Estimated GFR | Continuous | 0.008 | 0.8 |
Glucose AC | Continuous | 0.038 | 3.8 |
Nitrite | Categorical | 0.026 | 2.6 |
Urine occult Blood | Categorical | 0.026 | 2.6 |
pH | Continuous | 0.026 | 2.6 |
Potassium | Continuous | 0.035 | 3.5 |
Sodium | Continuous | 0.037 | 3.7 |
Specific Gravity | Continuous | 0.026 | 2.6 |
Strip WBC | Continuous | 0.16 | 16 |
Total Bilirubin | Continuous | 0.216 | 21.6 |
Total Cholesterol | Continuous | 0.19 | 19 |
Total Protein | Continuous | 0.285 | 28.5 |
Triglyceride | Continuous | 0.204 | 20.4 |
Urine epithelium (UL) | Continuous | 0.43 | 43 |
Urine epithelium count | Continuous | 0.02 | 2 |
Uric acid | Continuous | 0.15 | 15 |
Urine Bilirubin | Categorical | 0.026 | 2.6 |
Urine Glucose | Categorical | 0.16 | 16 |
Urine Ketone | Categorical | 0.16 | 16 |
Urine Protein | Categorical | 0.026 | 2.6 |
Urobilinogen | Categorical | 0.026 | 2.6 |
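The per-feature missing fractions tabulated above could be computed along the following lines. This is a minimal sketch, not the authors' code: the `missing_fraction` helper, the use of `None` to mark a missing value, and the toy records are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One record per patient; a missing lab value is an absent key or None (assumption).
Record = Dict[str, Optional[float]]


def missing_fraction(records: List[Record], feature: str) -> float:
    """Return the share of records in which `feature` is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for record in records if record.get(feature) is None)
    return missing / len(records)


# Toy records with made-up values (not patient data).
toy_records: List[Record] = [
    {"Albumin": 4.1, "Calcium": 9.0},
    {"Albumin": None, "Calcium": 9.3},
    {"Albumin": 3.9},                    # Calcium key absent
    {"Albumin": 4.0, "Calcium": None},
]

for name in ("Albumin", "Calcium"):
    fraction = missing_fraction(toy_records, name)
    # Report both the fraction and the percentage, as the table does.
    print(f"{name}: {fraction:.3f} ({fraction * 100:.1f}%)")
```

Treating an absent key and an explicit `None` alike keeps the helper usable whether the source data is sparse or padded.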
Algorithm Name | Parameter Name | Parameter Value |
---|---|---|
InfoGainAttributeEval | binarizeNumericAttributes | False |
InfoGainAttributeEval | doNotCheckCapabilities | False |
InfoGainAttributeEval | missingMerge | True |
Ranker | generateRanking | True |
Ranker | numToSelect | −1 |
Decision tree | criterion of tree | gini |
Decision tree | depth of tree | 4 |
Random forest | criterion of tree | gini |
Random forest | estimators | 300 |
SVM | kernel | rbf |
SVM | C value | 1000 |
SVM | gamma | 0.000001 |
XGBoost | eta | 0.2 |
XGBoost | depth of tree | 7 |
LightGBM | number of leaves | 100 |
LightGBM | depth of tree | 1 |
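The classifier settings in the table above can be collected into one configuration mapping, sketched below. The table does not say which library or argument names were used, so the keyword names (`max_depth`, `n_estimators`, `C`, `num_leaves`, and so on) are assumptions that follow common scikit-learn, XGBoost, and LightGBM conventions. The `MODEL_PARAMS` and `describe` names are hypothetical.

```python
# Hyperparameter values transcribed from the table above.
# Keyword names are assumptions; the table gives only informal labels
# such as "depth of tree" and "number of leaf".
MODEL_PARAMS = {
    "decision_tree": {"criterion": "gini", "max_depth": 4},
    "random_forest": {"criterion": "gini", "n_estimators": 300},
    "svm": {"kernel": "rbf", "C": 1000, "gamma": 0.000001},
    "xgboost": {"eta": 0.2, "max_depth": 7},
    "lightgbm": {"num_leaves": 100, "max_depth": 1},
}


def describe(model: str) -> str:
    """Format one model's settings as a comma-separated key=value string."""
    params = MODEL_PARAMS[model]
    return ", ".join(f"{key}={value}" for key, value in params.items())


for model_name in MODEL_PARAMS:
    print(f"{model_name}: {describe(model_name)}")
```

Keeping the values in one mapping makes it easy to pass them as keyword arguments to whichever classifier constructors are actually used.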
Prediction | Patients: Bladder Cancer | Patients: Cystitis |
---|---|---|
bladder cancer | true positive, TP | false positive, FP |
cystitis | false negative, FN | true negative, TN |
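The evaluation metrics reported in the results tables (accuracy, precision, sensitivity, specificity, F1 score) follow from the four cells of the confusion matrix above by their standard definitions. The sketch below shows those formulas; the `ConfusionMatrix` class and the toy counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary task with bladder cancer as the positive class."""
    tp: int  # predicted bladder cancer, patient has bladder cancer
    fp: int  # predicted bladder cancer, patient has cystitis
    fn: int  # predicted cystitis, patient has bladder cancer
    tn: int  # predicted cystitis, patient has cystitis

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        # Also called recall or true positive rate.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate.
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


# Toy counts chosen for illustration only.
cm = ConfusionMatrix(tp=45, fp=5, fn=15, tn=35)
print(f"accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} "
      f"sensitivity={cm.sensitivity():.3f} specificity={cm.specificity():.3f} "
      f"f1={cm.f1():.3f}")
```

The sketch does not guard against empty denominators; a matrix with no predicted positives, for example, raises `ZeroDivisionError` in `precision`.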
Characteristic | Cystitis | Kidney Cancer | Prostate Cancer | Bladder Cancer | Uterus Cancer |
---|---|---|---|---|---|
Number of patients | n = 144 | n = 200 | n = 201 | n = 591 | n = 200 |
age | 60.12 ± 11.99 | 63.41 ± 10.45 ** | 71.83 ± 6.42 ** | 66.73 ± 9.4 ** | 60.86 ± 10.26 |
sex | 88 (61.1%) | 138 (69%) | 201 (100%) ** | 386 (65.3%) | 0 ** |
hypertension | 34 (23.7%) | 72 (36%) * | 66 (32.8%) | 173 (29.3%) | 22 (11%) * |
diabetes | 20 (13.9%) | 46 (23%) * | 33 (16.4%) | 93 (15.7%) | 8 (4%) ** |
smoking | 18 (12.5%) | 51 (25.5%) * | 47 (23.4%) * | 138 (23.4%) * | 9 (4.5%) * |
drinking | 23 (16%) | 41 (20.5%) | 64 (31.8%) ** | 118 (20%) | 17 (8.5%) ** |
betel nuts | 2 (1.4%) | 3 (1.5%) | 3 (1.5%) | 20 (3.4%) | 1 (0.5%) |
family history | 1 (0.7%) | 7 (3.5%) | 5 (2.5%) | 12 (2%) | 7 (3.5%) |
A/G Ratio | - | 1.75 ± 0.4 | 1.07 ± 0.46 | 1.61 ± 0.47 | 1.64 ± 0.38 |
Albumin | 3.96 ± 0.64 | 4.27 ± 0.57 ** | 4 ± 0.68 | 4 ± 0.68 | 4.15 ± 0.58 * |
ALP | 71 (55, 91) | 66 (52, 79) ** | 66 (60, 72) ** | 69 (55, 90) ** | 65.5 (53, 79) |
ALT | 30.12 ± 42.66 | 31.46 ± 35.99 | 29.94 ± 32.49 | 27.45 ± 31.9 | 25.65 ± 28.02 |
AST | 28.31 ± 32.07 | 30.56 ± 29.51 | 43.13 ± 74.83 * | 40.65 ± 293.66 | 29.33 ± 53.46 |
BUN | 14 (11, 21) | 16 (12, 21) ** | 17 (13, 22.9) * | 16 (12, 26) ** | 12 (9, 16.85) * |
Calcium | 9 (8.5, 9.4) | 9.3 (8.85, 9.6) | 8.895 (8.3, 9.4) ** | 9 (8.5, 9.4) ** | 9.2 (8.65, 9.65) ** |
Chloride | 105.2 (103, 107.5) | 105 (103, 107) * | 105 (102.075, 107.225) | 105 (102, 108) * | 106 (104, 108) ** |
Creatinine | 0.9 (0.7, 1.2) | 1.1 (0.9, 1.5) ** | 1 (0.9, 1.2) * | 1.1 (0.8, 1.5) ** | 0.7 (0.6, 0.8) ** |
Direct Bilirubin | 0.11 (0.1, 0.2) | 0.1 (0.1, 0.2) ** | 0.2 (0.1, 0.2) | 0.1 (0.1, 0.2) * | 0.1 (0.1, 0.18) |
Estimated GFR | 75.38 ± 33.22 | 62.68 ± 32.54 ** | 71.92 ± 27.16 | 64.13 ± 34 ** | 91.57 ± 28.86 ** |
Glucose AC | 120.65 ± 47.39 | 123.11 ± 41.14 | 129.36 ± 56.6 | 124.83 ± 93.83 | 109.57 ± 26.12 * |
pH | 6.24 ± 0.89 | 6.03 ± 0.74 * | 6.09 ± 0.85 | 6.26 ± 0.86 | 6.12 ± 0.71 |
Potassium | 3.92 ± 0.5 | 4.14 ± 0.52 ** | 4.02 ± 0.48 | 4.12 ± 0.61 ** | 4.06 ± 0.46 ** |
Sodium | 139 (137, 141) | 139 (137, 140.775) | 138 (136, 140) * | 139 (137, 140.8) | 139 (138.5, 141) ** |
Specific Gravity | 1.016 (1.012, 1.021) | 1.017 (1.013, 1.021) | 1.017 (1.011, 1.022) | 1.014 (1.01, 1.0195) * | 1.016 (1.011, 1.021) |
Total Bilirubin | 0.9 ± 0.49 | 0.84 ± 0.31 | 1.08 ± 1.25 | 1.04 ± 1.93 | 0.81 ± 0.68 |
Total Cholesterol | 190.15 ± 49.57 | 182.2 ± 42.36 | 189.88 ± 45.34 | 183.76 ± 44.58 | 200.1 ± 50.31 |
Total Protein | 6.6 (6.0625, 7) | 6.9 (6.55, 7.2) ** | 6.7 (6, 7.1) | 6.7 (6.1, 7.1325) ** | 6.9 (6.3, 7.3) |
Triglyceride | 143.47 ± 88.07 | 141.88 ± 170.99 | 142.3 ± 89.64 | 131.97 ± 91.15 | 133.65 ± 141.94 |
Uric acid | 5.82 ± 1.83 | 6.1 ± 1.64 | 6.36 ± 2.61 | 6.22 ± 1.88 * | 5.4 ± 1.76 |
Urine epithelium (UL) | - | 0 (0, 6) | 2.5 (0, 5.75) | 3 (0, 9) | - |
Urine epithelium count | 0 (0, 2) | 0 (0, 2) | 0 (0, 2) ** | 1 (0, 3) | 2 (0, 3) * |
Nitrite | p = 0.001 | ||||
0 | 127 (88.2%) | 194 (97%) | 188 (93.5%) | 493 (83.4%) | 181 (90.5%) |
1 | 17 (11.8%) | 6 (3%) | 13 (6.5%) | 98 (16.6%) | 19 (9.5%) |
Strip WBC | p = 0.001 | p = 0.003 | |||
0 | 49 (34%) | 140 (70%) | 145 (72.1%) | 298 (50.4%) | 79 (39.5%) |
1 | 24 (16.9%) | 44 (22%) | 34 (17%) | 133 (62.1%) | 79 (39.5%) |
2 | 12 (8.3%) | 12 (6%) | 10 (5%) | 73 (12.4%) | 24 (12%) |
3 | 13 (9%) | 4 (2%) | 12 (6%) | 87 (14.7%) | 18 (9%) |
Urine Bilirubin | |||||
0 | 137 (95.1%) | 190 (95%) | 190 (94.5%) | 549 (94.9%) | 194 (97%) |
1 | 4 (2.8%) | 10 (5%) | 11 (5.5%) | 28 (4.7%) | 4 (2%) |
2 | 2 (1.4%) | 0 | 0 | 7 (1.2%) | 2 (1%) |
3 | 1 (0.7%) | 0 | 0 | 7 (1.2%) | 0 |
Urine Glucose | |||||
0 | 86 (59.7%) | 179 (89.5%) | 183 (91%) | 509 (86.1%) | 185 (92.5%) |
1 | 8 (5.6%) | 12 (6%) | 8 (4%) | 52 (8.8%) | 9 (4.5%) |
2 | 1 (0.7%) | 3 (1.5%) | 2 (1%) | 13 (2.2%) | 3 (1.5%) |
3 | 3 (2.1%) | 6 (3%) | 8 (4%) | 17 (2.9%) | 3 (1.5%) |
Urine Ketone | |||||
0 | 86 (59.7%) | 184 (92%) | 176 (87.6%) | 523 (88.5%) | 156 (78%) |
1 | 10 (7%) | 14 (7%) | 24 (12%) | 58 (9.8%) | 38 (19%) |
2 | 0 | 0 | 0 | 6 (1%) | 5 (2.5%) |
3 | 2 (1.4%) | 2 (1%) | 1 (0.5%) | 4 (0.7%) | 1 (0.5%) |
Urine Protein | p < 0.0001 | ||||
0 | 75 (52.1%) | 124 (62%) | 118 (58.7%) | 279 (47.2%) | 156 (78%) |
Trace | 16 (11.1%) | 25 (12.5%) | 25 (12.4%) | 50 (8.5%) | 15 (7.5%) |
1 | 15 (10.4%) | 19 (9.5%) | 27 (13.4%) | 91 (15.4%) | 9 (4.5%) |
2 | 24 (16.7%) | 19 (9.5%) | 23 (11.4%) | 97 (16.4%) | 12 (6%) |
3 | 14 (9.7%) | 13 (6.5%) | 8 (4%) | 74 (12.5%) | 8 (4%) |
Urobilinogen | p = 0.001 | ||||
0 | 2 (1.4%) | 7 (3.5%) | 6 (3%) | 20 (3.4%) | 1 (0.5%) |
0.1 | 55 (38.2%) | 62 (31%) | 59 (29.4%) | 200 (33.8%) | 124 (62%) |
0.2 | 55 (38.2%) | 84 (42%) | 83 (41.3%) | 261 (44.2%) | 53 (26.5%) |
1 | 29 (20.1%) | 47 (23.5%) | 52 (25.9%) | 106 (17.9%) | 20 (10%) |
2 | 3 (2.1%) | 0 | 1 (0.5%) | 2 (0.3%) | 1 (0.5%) |
4 | 0 | 0 | 0 | 2 (0.3%) | 1 (0.5%) |
Urine occult Blood | p < 0.0001 | p = 0.001 | p = 0.016 | ||
0 | 52 (36.1%) | 119 (59.5%) | 112 (55.7%) | 201 (34%) | 85 (42.5%) |
Trace | 16 (11.1%) | 23 (11.5%) | 22 (10.9%) | 66 (11.2%) | 26 (13%) |
1 | 12 (8.3%) | 25 (12.5%) | 16 (8%) | 53 (9%) | 28 (14%) |
2 | 19 (13.2%) | 13 (6.5%) | 21 (10.4%) | 73 (12.4%) | 29 (14.5%) |
3 | 45 (31.3%) | 20 (10%) | 30 (14.9%) | 198 (33.5%) | 32 (16%) |
Comparison Group 1 | Comparison Group 2 | Top Selected Feature |
---|---|---|
cystitis | kidney cancer | Calcium |
cystitis | prostate cancer | ALP |
cystitis | bladder cancer | Albumin |
cystitis | uterus cancer | Urine Ketone |
kidney cancer | bladder cancer | Urine occult blood |
kidney cancer | prostate cancer | ALP |
kidney cancer | uterus cancer | Calcium |
bladder cancer | prostate cancer | ALP |
bladder cancer | uterus cancer | Creatinine |
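The parameter table names InfoGainAttributeEval and Ranker, which suggests features were ranked by information gain with respect to the class label, that is, H(class) − H(class | feature). The sketch below illustrates that quantity for a discrete feature and a simple descending ranking. It is an illustration under assumptions rather than the authors' pipeline: the function names and toy data are hypothetical, and continuous laboratory values would first need to be discretized, which is left out here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(labels) - H(labels | feature) for a discrete feature."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels within each feature value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain(values, labels)) for name, values in features.items()]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data: four bladder cancer (BC) and four cystitis (CY) cases.
toy_labels = ["BC", "BC", "BC", "BC", "CY", "CY", "CY", "CY"]
toy_features = {
    "uninformative": [0, 1, 0, 1, 0, 1, 0, 1],  # same class mix in both groups
    "perfect": [1, 1, 1, 1, 0, 0, 0, 0],        # separates the classes exactly
}

for name, gain in rank_features(toy_features, toy_labels):
    print(f"{name}: {gain:.3f} bits")
```

The first entry of the ranked list plays the role of the "top selected feature" for a given pair of groups.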
Models | Accuracy (95% CI) | Precision (95% CI) | F1 Score (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
---|---|---|---|---|---|---|
decision tree | 76.2% (71–81.5%) | 77.9% (69.1–86.6%) | 74.6% (69.8–79.5%) | 73.2% (62.5–83.9%) | 78.1% (64.9–91.3%) | 0.775 (0.711–0.839) |
random forest | 83.1% (78.7–87.5%) | 78.2% (71.5–85%) | 81.6% (74.3–88.9%) | 85.5% (76–94.9%) | 79.4% (72–86.8%) | 0.887 (0.826–0.947) |
SVM | 71.7% (63.5–80%) | 81.9% (71.3–92.4%) | 65.5% (51.9–79.1%) | 55.7% (40.6–70.8%) | 86.7% (78.9–94.5%) | 0.736 (0.624–0.849) |
XGBoost | 82.8% (76.7–88.8%) | 84.7% (74.5–94.9%) | 82.7% (76.2–89.2%) | 81.4% (75.1–87.7%) | 83.3% (71–95.7%) | 0.879 (0.819–0.939) |
LightGBM | 87.6% (81–94.1%) | 86.3% (77.9–94.6%) | 87.7% (81.8–93.5%) | 89.5% (84.2–94.9%) | 85.5% (75.2–95.7%) | 0.932 (0.862–1.000) |
Groups | Accuracy (95% CI) | Precision (95% CI) | F1 Score (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
---|---|---|---|---|---|---|
bladder cancer vs. uterus cancer | 86.9% (77.3–96.5%) | 87.1% (76.5–97.7%) | 87.3% (77.5–97%) | 87.8% (77.3–98.3%) | 86.7% (76.8–96.6%) | 0.918 (0.849–0.988) |
bladder cancer vs. prostate cancer | 84.8% (76–93.6%) | 86.6% (76.8–96.4%) | 85.1% (75.3–94.9%) | 84.4% (71.9–96.9%) | 85.1% (76.4–93.8%) | 0.883 (0.823–0.942) |
bladder cancer vs. cystitis | 87.6% (81–94.1%) | 86.3% (77.9–94.6%) | 87.7% (81.8–93.5%) | 89.5% (84.2–94.9%) | 85.5% (75.2–95.7%) | 0.932 (0.862–1.001) |
bladder cancer vs. kidney cancer | 84.5% (78.3–90.6%) | 83% (73.2–92.9%) | 84.5% (78.3–90.7%) | 86.8% (79.9–93.7%) | 82.9% (72.7–93.2%) | 0.928 (0.88–0.977) |
kidney cancer vs. cystitis | 86.2% (78.2–94.2%) | 86.8% (78–95.6%) | 86.9% (79–94.9%) | 88% (76.5–99.6%) | 84.2% (73–95.5%) | 0.903 (0.854–0.952) |
uterus cancer vs. cystitis | 83.8% (77.1–90.5%) | 87% (76.8–97.3%) | 83.7% (76.7–90.7%) | 81.8% (71.2–92.5%) | 86.5% (76.9–96.1%) | 0.915 (0.834–0.995) |
prostate cancer vs. cystitis | 87.6% (82.9–92.2%) | 88.5% (80.3–96.6%) | 87% (81.7–92.4%) | 86.7% (77.5–95.8%) | 88.4% (77.9–98.9%) | 0.944 (0.903–0.986) |
prostate cancer vs. uterus cancer | 84.1% (77–91.3%) | 82.5% (68.2–96.7%) | 82.3% (70.1–94.5%) | 82.7% (70.5–94.9%) | 84.7% (76.4–93.1%) | 0.897 (0.837–0.957) |
kidney cancer vs. uterus cancer | 86.2% (78.4–94%) | 87.1% (76.5–97.7%) | 86.8% (79–94.5%) | 87.1% (79.5–94.6%) | 86.2% (75.6–96.9%) | 0.923 (0.858–0.988) |
kidney cancer vs. prostate cancer | 84.5% (77.7–91.2%) | 82.1% (72.7–91.6%) | 84.9% (79.1–90.7%) | 88.6% (82.5–94.6%) | 81.3% (69.1–93.5%) | 0.91 (0.849–0.972) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tsai, I.-J.; Shen, W.-C.; Lee, C.-L.; Wang, H.-D.; Lin, C.-Y. Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data. Diagnostics 2022, 12, 203. https://doi.org/10.3390/diagnostics12010203