Personalized Data Analysis Approach for Assessing Necessary Hospital Bed-Days Built on Condition Space and Hierarchical Predictor
Abstract
1. Introduction
- A mathematical formalization of the condition space for personalized medical data is developed. It allows target variables to be predicted within a sub-part of the space with higher predictive accuracy.
- A dataset with the personal parameters of 51 patients was collected, which allowed generalization and deeper analysis (https://doi.org/10.6084/m9.figshare.14865411.v1, accessed on 28 June 2021).
- A hierarchical predictor is developed that splits the objects into clusters and makes a prediction for each cluster separately. It achieves 1.47 times greater predictive accuracy than the best weak predictor (a perceptron with 12 units in a single hidden layer).
- Because the collected dataset is small, a specific method based on the hierarchical predictor is proposed for small-dataset analysis. Five-fold cross-validation is also used to validate the results.
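The five-fold cross-validation mentioned in the last point can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `k_fold_indices` and `train_test_splits` and the fixed seed are assumptions made for the example, not the authors' implementation.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 51 patients, as in the collected dataset.
splits = list(train_test_splits(51, k=5))
fold_sizes = [len(test_idx) for _, test_idx in splits]
```

With 51 patients, one fold holds 11 test cases and the other four hold 10, so every patient is used for testing exactly once.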
2. Materials and Methods
3. Results
3.1. The Experimental Setup
- Exploratory data analysis (feature normalization and encoding);
- The condition space development;
- Weak predictors selection;
- The hierarchical predictor development;
- Results evaluation.
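The first step above (feature normalization and encoding) could look roughly like this. The text does not say which normalization was used, so min-max scaling is an assumption, as are the helper names and the toy values; the `prefix_category` column naming simply mirrors feature names such as `diagnos_furunkulois` listed later in the article.

```python
def min_max_normalize(values):
    """Scale numeric values linearly into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(values, prefix):
    """Turn one categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

# Hypothetical toy records, not taken from the dataset.
ages = [20, 35, 50]
diagnoses = ["flegmona", "furunkulois", "flegmona"]

normalized_ages = min_max_normalize(ages)
encoded_diagnoses = one_hot_encode(diagnoses, "diagnos")
```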
3.2. Dataset Description
- Age (time-dependent parameter, performance indicator )—integer;
- Sex (time-independent parameter)—boolean;
- Weight (time-dependent parameter, performance indicator )—categorical;
- Date of admission—date;
- Diagnosis (time-independent parameter, used for choosing protocol )—categorical;
- Related diagnosis (time-dependent parameter, )—categorical;
- Flora (time-dependent parameter, )—categorical;
- Medicament (time-dependent parameter, depends on )—categorical;
- Active substance (time-dependent parameter, depends on )—categorical;
- Time in hospital (time-dependent parameter, bed-days in hospital, target parameter)—integer.
3.3. Condition Space Development
- the most significant features selection;
- the splitting of instances into clusters with similar time-dependent and time-independent parameters.
- diagnos_furunkulois;
- sub_diabet;
- Ceftriaxon;
- sex.
- diagnos_furunkulois;
- sub_diabet;
- diagnos_gidroenit;
- Ceftriaxon;
- diagnos_flegmona;
- sex.
3.4. Weak Predictors Selection
3.5. The Hierarchical Predictor Development
- 1. Fuzzy c-means divides the objects into four clusters (Table 1);
- 2. Linear regression, random forest, SVM with radial basis kernel, and SVM with polynomial kernel are applied to each cluster separately;
- 3. The results obtained are combined by average voting, and the average value is taken as the final prediction.
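The control flow of these three steps can be sketched as below. This is only a skeleton under stated assumptions: `assign_cluster` is a hypothetical threshold rule standing in for the fuzzy c-means assignment, and the constant lambdas stand in for the regressors fitted on each cluster; none of these stand-ins come from the article.

```python
from statistics import mean

def hierarchical_predict(sample, assign_cluster, models_by_cluster):
    """Route a sample to its cluster, then average that cluster's predictions."""
    cluster = assign_cluster(sample)
    predictions = [model(sample) for model in models_by_cluster[cluster]]
    # Average voting: the mean of the per-model predictions is the final value.
    return mean(predictions)

# Hypothetical stand-in for the clustering step (not fuzzy c-means).
def assign_cluster(sample):
    return 0 if sample["age"] < 40 else 1

# Hypothetical stand-ins for the regressors trained on each cluster.
models_by_cluster = {
    0: [lambda s: 5.0, lambda s: 7.0, lambda s: 6.0],
    1: [lambda s: 10.0, lambda s: 12.0, lambda s: 14.0],
}

young = hierarchical_predict({"age": 30}, assign_cluster, models_by_cluster)
older = hierarchical_predict({"age": 60}, assign_cluster, models_by_cluster)
```

Keeping a separate list of models per cluster means each weak predictor only has to fit patients that are similar to one another, which is the motivation given for the hierarchical design.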
4. Discussion
- The authors suggest that the accuracy of the prediction will strongly depend on the number of empty values.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Memberships | 1 | 2 | 3 | Memberships | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
[1] | 0.89150163 | 0.06212372 | 0.046374647 | [26] | 0.31107702 | 0.65580938 | 0.033113601 |
[2] | 0.77796788 | 0.08358509 | 0.138447034 | [27] | 0.15995697 | 0.81908319 | 0.020959847 |
[3] | 0.56614562 | 0.09556459 | 0.338289785 | [28] | 0.07447833 | 0.90846830 | 0.017053375 |
[4] | 0.86018278 | 0.10377855 | 0.036038668 | [29] | 0.07030249 | 0.91359600 | 0.016101503 |
[5] | 0.68808007 | 0.09069853 | 0.221221402 | [30] | 0.05056537 | 0.94033247 | 0.009102165 |
[6] | 0.03321895 | 0.96077754 | 0.006003508 | [31] | 0.07316832 | 0.90999577 | 0.016835911 |
[7] | 0.30215319 | 0.66503031 | 0.032816504 | [32] | 0.07499601 | 0.90979079 | 0.015213199 |
[8] | 0.14472035 | 0.83639243 | 0.018887226 | [33] | 0.05565317 | 0.93171882 | 0.012628002 |
[9] | 0.81184745 | 0.15376100 | 0.034391551 | [34] | 0.47632097 | 0.48284576 | 0.040833272 |
[10] | 0.04098111 | 0.95165817 | 0.007360716 | [35] | 0.03983866 | 0.95303284 | 0.007128497 |
[11] | 0.91012330 | 0.05144097 | 0.038435725 | [36] | 0.90736664 | 0.06052699 | 0.032106373 |
[12] | 0.14787502 | 0.83223998 | 0.019885001 | [37] | 0.08044003 | 0.03597534 | 0.883584625 |
[13] | 0.64440750 | 0.31189553 | 0.043696971 | [38] | 0.89430505 | 0.07879489 | 0.026900054 |
[14] | 0.89189168 | 0.06118803 | 0.046920295 | [39] | 0.55963468 | 0.09392417 | 0.346441148 |
[15] | 0.05546220 | 0.93451510 | 0.010022703 | [40] | 0.90959793 | 0.05132587 | 0.039076209 |
[16] | 0.04969225 | 0.94011428 | 0.010193473 | [41] | 0.18072499 | 0.79517059 | 0.024104421 |
[17] | 0.64818641 | 0.30658830 | 0.045225286 | [42] | 0.78670219 | 0.08077068 | 0.132527121 |
[18] | 0.89320798 | 0.07949093 | 0.027301094 | [43] | 0.09733532 | 0.02808194 | 0.874582736 |
[19] | 0.06356521 | 0.92325893 | 0.013175858 | [44] | 0.06184420 | 0.02651523 | 0.911640561 |
[20] | 0.89348356 | 0.07989919 | 0.026617253 | [45] | 0.48991432 | 0.46830619 | 0.041779495 |
[21] | 0.05293598 | 0.93743658 | 0.009627441 | [46] | 0.85462052 | 0.06862292 | 0.076756557 |
[22] | 0.08066868 | 0.90078226 | 0.018549056 | [47] | 0.31122970 | 0.65498914 | 0.033781156 |
[23] | 0.88746299 | 0.08389872 | 0.028638293 | [48] | 0.88266229 | 0.08734761 | 0.029990095 |
[24] | 0.06510596 | 0.92012271 | 0.014771324 | [49] | 0.89659897 | 0.05858958 | 0.044811455 |
[25] | 0.07236656 | 0.91123006 | 0.016403377 | [50] | 0.17377591 | 0.04574853 | 0.780475563 |
[51] | 0.66297817 | 0.29258686 | 0.044434975 | | | | |
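The membership degrees in the table above can be turned into hard cluster labels by taking the largest membership in each row. The sketch below copies four rows from the table; the helper names `hard_cluster` and `margin` are assumptions introduced for illustration.

```python
# Rows copied from the membership table: patient index -> memberships.
memberships = {
    1: (0.89150163, 0.06212372, 0.046374647),
    6: (0.03321895, 0.96077754, 0.006003508),
    34: (0.47632097, 0.48284576, 0.040833272),
    37: (0.08044003, 0.03597534, 0.883584625),
}

def hard_cluster(row):
    """Return the 1-based column with the highest membership degree."""
    return max(range(len(row)), key=lambda j: row[j]) + 1

def margin(row):
    """Gap between the two largest memberships; small values mean ambiguity."""
    top, second = sorted(row, reverse=True)[:2]
    return top - second

labels = {patient: hard_cluster(row) for patient, row in memberships.items()}
ambiguous = [patient for patient, row in memberships.items() if margin(row) < 0.05]
```

Row 34 shows why the fuzzy output is worth inspecting before hardening it: its two leading memberships differ by less than 0.01, so the patient sits almost exactly between two clusters.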
Model | RMSE | MAPE |
---|---|---|
Linear regression | 6.109753 | 0.4946111 |
Regression tree | 4.970676 | 0.4763212 |
Random forest (500 trees, mtree-3) | 3.560431 | 0.3753441 |
knn | 3.360431 | 0.3753441 |
SVM with radial basis kernel | 3.194611 | 0.2923931 |
SVM with polynomial kernel | 2.262171 | 0.2670496 |
ANN with 12 units in single hidden layer | 2.0972937 | 0.2025539 |
Model Based on Selected Variables | RMSE | MAPE |
---|---|---|
Linear regression | 3.972548 | 0.3240777 |
Regression tree | 4.970676 | 0.4763212 |
Random forest (500 trees, mtree-3) | 3.546447 | 0.2730968 |
knn | 3.346447 | 0.2730968 |
SVM with radial basis kernel | 3.095239 | 0.2386008 |
SVM with polynomial kernel | 2.226906 | 0.1766799 |
ANN with 12 units in single hidden layer | 2.059849 | 0.1513523 |
Model Based on Whole Variables | RMSE | MAPE |
---|---|---|
Hierarchical predictor | 1.401258 | 0.137792 |
Model Based on Selected Variables | RMSE | MAPE |
---|---|---|
Hierarchical predictor | 1.401257 | 0.102961 |
Hierarchical predictor with repeated K-fold cross-validation, 5-fold, repeated 3 times | 1.401125 | 0.102753 |
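The two error measures reported in the tables can be computed as in the sketch below. The MAPE values in the tables look like fractions rather than percentages, so the function returns a fraction; that reading is an assumption, and the toy bed-day values are hypothetical. The last lines check the arithmetic behind the 1.47 figure quoted in the introduction, which appears to correspond to the RMSE ratio between the ANN and the hierarchical predictor on the selected variables.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; assumes no actual value is 0."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)

# Hypothetical bed-day values, for illustration only.
actual_days = [10, 5, 8]
predicted_days = [12, 5, 6]
toy_rmse = rmse(actual_days, predicted_days)
toy_mape = mape(actual_days, predicted_days)

# RMSE values taken from the tables (models based on selected variables).
ann_rmse = 2.059849
hierarchical_rmse = 1.401257
rmse_ratio = ann_rmse / hierarchical_rmse
```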
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Melnykova, N.; Shakhovska, N.; Melnykov, V.; Melnykova, K.; Lishchuk-Yakymovych, K. Personalized Data Analysis Approach for Assessing Necessary Hospital Bed-Days Built on Condition Space and Hierarchical Predictor. Big Data Cogn. Comput. 2021, 5, 37. https://doi.org/10.3390/bdcc5030037