On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility
Abstract
1. Introduction to Survival Analysis
- We propose to adapt the nonparametric mixture cure model approach of [13] to ML methods. We show that all state-of-the-art ML techniques can be used to compute the cure rate after a slight modification of the algorithms.
- An extensive simulation study is conducted, in which we compare the performance of the adapted ML algorithms with that of existing cure models. Furthermore, a real data application is presented, in which we show the gain of using some of the proposed methods in specific situations, as well as some drawbacks to take into account when working with small sample sizes.
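To illustrate what such a modification might look like, the following minimal sketch assumes the idea behind the nonparametric approach of [13]: the cure probability at a covariate value x is read off as the estimated conditional survival function evaluated at the largest uncensored time. The names `cure_rate_from_survival`, `surv_fn`, and `toy_survival` are hypothetical and are not taken from the implementation used in this study.

```python
import numpy as np


def cure_rate_from_survival(surv_fn, times, events, x):
    """Read a cure probability off any fitted survival model.

    surv_fn(t, x) must return the estimated survival probability S(t | x).
    The estimate is S(t_max | x), where t_max is the largest uncensored time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    if not events.any():
        raise ValueError("at least one uncensored observation is required")
    t_max = times[events].max()  # largest observed event time
    return float(surv_fn(t_max, x))


def toy_survival(t, x):
    """Hypothetical 'fitted' model: logistic incidence, exponential latency."""
    p_uncured = 1.0 / (1.0 + np.exp(-x))
    return (1.0 - p_uncured) + p_uncured * np.exp(-t)


times = [0.5, 1.2, 3.0, 8.0, 10.0]
events = [1, 1, 0, 1, 0]  # 1 = event observed, 0 = censored
estimate = cure_rate_from_survival(toy_survival, times, events, x=0.0)
print(f"estimated cure probability at x = 0: {estimate:.4f}")
```

Any model that exposes a survival function can be plugged in as `surv_fn`. The quality of the resulting estimate depends on how well that model captures the plateau of the survival curve.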
2. Cure Models
2.1. Parametric MCM
2.2. Semiparametric MCM
2.3. Nonparametric MCM
3. Machine Learning Techniques Applied to Survival Analysis
- Survival Trees: tree-based approaches were the first machine learning models successfully used to handle censored data. As in other tree-based algorithms, the main idea is to split the data according to the similarity among observations, grouping similar observations in the same branches while keeping dissimilar ones apart. The key component of these models is the splitting criterion, which falls into two categories: criteria that maximize within-node homogeneity [59,60,61,62,63,64] and criteria that maximize between-node heterogeneity [65,66,67]. Within the latter group, the more recent Random Survival Forests [68,69,70] are regarded as the best tree-based survival model to date.
- Support Vector Machines (SVMs): although SVMs were first developed for classification, they were later adapted to regression problems, giving rise to the support vector regression (SVR) model [71]. To handle censored data, ref. [72] considered only the uncensored instances in the SVR, with the disadvantage of ignoring the information carried by the censored observations; ref. [73] added the constraint classification approach to the SVM formulation, increasing the complexity of the algorithm to quadratic scale with respect to the number of observations. Ref. [74] proposed an SVR for censored data (SVRc) that uses an asymmetric loss function to allow right- and left-censored observations to be processed, and refs. [75,76] introduced different SVR-based approaches that combine ranking and regression methods in survival data analysis.
- Bayesian Networks: both naive-Bayes (NB) and Bayesian networks are models that predict the probability of the event of interest based on Bayes' theorem. Ref. [77] modeled censored data by introducing a Bayesian framework to carry out automatic relevance determination (ARD) in feedforward neural networks, and ref. [78] integrated Bayesian methods with an AFT model by adapting the prior probability of the event occurrence for future time points. The main drawback of NB methods is the assumption of independence between all the features, which may not be realistic in survival analysis [58].
- Neural Networks: artificial neural networks (ANNs), originally introduced by [79], are among the most widely used machine learning models in survival analysis. In the literature, there are three main groups, depending on the output considered for the neural network:
- Neural networks based on the discrete-time survival likelihood, where the output is the discrete survival time.
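As a concrete example of a between-node heterogeneity criterion for survival trees, the sketch below scores candidate splits of a single covariate with the two-sample log-rank statistic. The choice of the log-rank statistic is an assumption made for illustration (the text above does not fix a particular criterion), and the function names `logrank_statistic` and `best_split` are hypothetical.

```python
import numpy as np


def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a candidate split."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    in_left = np.asarray(in_left, dtype=bool)
    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(times[events]):  # distinct event times
        at_risk = times >= t
        n = at_risk.sum()
        n_left = (at_risk & in_left).sum()
        died = events & (times == t)
        d = died.sum()
        d_left = (died & in_left).sum()
        # observed minus expected events in the left node
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:  # hypergeometric variance term
            frac = n_left / n
            variance += d * frac * (1.0 - frac) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return obs_minus_exp ** 2 / variance


def best_split(x, times, events):
    """Return the threshold on x with the largest log-rank statistic."""
    x = np.asarray(x, dtype=float)
    values = np.unique(x)
    best_threshold, best_score = None, -np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2.0  # midpoint between neighbours
        score = logrank_statistic(times, events, x <= threshold)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
times = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
events = np.ones(6, dtype=bool)
threshold, score = best_split(x, times, events)
print(threshold, round(score, 3))
```

A full survival tree would apply `best_split` recursively over all covariates and stop on a minimum node size. Random Survival Forests additionally average many such trees grown on bootstrap samples.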
3.1. Neural Networks Based on a Cox PH Model
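Networks in this family typically output one log-risk score per subject and are trained by minimizing the negative Cox partial log-likelihood. The sketch below is a plain NumPy illustration under stated assumptions (risk sets built from sorted times, no tied event times); it is not the loss of any specific method compared in this study, and the function name is hypothetical.

```python
import numpy as np


def neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: network outputs g(x_i), interpreted as log-risk scores.
    Assumes there are no tied event times.
    """
    risk_scores = np.asarray(risk_scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    order = np.argsort(-times)  # descending time: the risk set grows along the array
    scores = risk_scores[order]
    observed = events[order]
    # running log-sum-exp of scores over the risk set {j : t_j >= t_i}
    log_risk_set = np.logaddexp.accumulate(scores)
    log_lik = np.sum((scores - log_risk_set)[observed])
    return -log_lik / max(int(observed.sum()), 1)


times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
flat = neg_partial_log_likelihood([0.0, 0.0, 0.0], times, events)
good = neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events)  # early events get high risk
bad = neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events)   # ordering reversed
print(flat, good, bad)
```

The loss decreases when subjects who fail earlier receive higher risk scores, which is the ranking behaviour the partial likelihood rewards.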
3.2. Neural Networks Based on the Discrete-Time Survival Likelihood
3.3. Selection of Machine Learning Algorithms for This Study
4. Simulation Study
5. Real Data Application
5.1. Application to COVID-19 Patients
5.2. Application to Bone Marrow Transplant Patients
6. Discussion
- There is a large discrepancy between the results obtained by the tested methods in recent classical survival analysis studies [7,9] and the ones obtained here, when cure is a possibility. Previous studies conclude that techniques such as Random Survival Forests, DeepHit, and Deep Survival Machines achieve the best results in terms of the concordance index. However, those results do not carry over to survival scenarios where cure is a possibility. The experiments conducted in this paper indicate that mixture cure models and deep learning Cox-based approaches are the ones that best approximate the cure rate.
- The MCM_semip approach is the best choice whenever the underlying cure rate function follows a semiparametric model. However, this is a strong assumption that is difficult to verify in real-world scenarios.
- Although it is never the best choice, the MCM_np approach obtains close-to-best results. Unfortunately, due to the curse of dimensionality, its implementation is only available for single-covariate contexts.
- The ML_coxtime algorithm is very accurate when dealing with both semiparametric and nonparametric cure rate functions.
- Both ML_coxcc and ML_dcph achieve similar results, proving to be a good choice whenever the cure rate function is not known.
- The ML_coxtime approach has to be treated carefully, as its solution can be unstable if an unsuitable neural network architecture is selected. Our experiments suggest that small networks (fewer than three hidden layers) tend to achieve more consistent results.
- Regardless of the selected deep survival algorithm, the performance is strongly affected in cases where the uncure rate and the latency are influenced by different covariates. In these cases, the MCM_semip achieves the best results, even when dealing with nonparametric scenarios.
- The ML_coxtime estimate in the application to bone marrow transplant patients is clearly far from the solutions provided by the other methods. We advise against using it when the number of study samples is low.
- In addition, we do not recommend using the ML_dcph algorithm when the sample size is small. Figure 11, left, suggests that the algorithm is not able to adjust to the cure rate, providing a completely flat curve.
- The MCM_np is the only algorithm that is able to capture subtle changes in the cure rate estimation. Both machine learning Cox models and the MCM_semip method assume some conditions that cause a high degree of rigidity in their output. Therefore, in order to avoid biased estimations, we recommend the use of MCM_np when the underlying covariate effects cannot be well approximated by any parametric or semiparametric model.
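The flexibility of the nonparametric approach, and its restriction to a single covariate, both come from kernel smoothing. The following sketch, in the spirit of the estimator in [13], computes a kernel-weighted (Beran-type) conditional survival estimate and evaluates it at the largest uncensored time. The Gaussian kernel, the fixed bandwidth, the absence of tied times, and the function name `beran_cure_probability` are assumptions made for illustration; data-driven bandwidth selection is omitted.

```python
import numpy as np


def beran_cure_probability(x0, x, times, events, bandwidth):
    """Kernel-weighted product-limit estimate of P(cured | X = x0).

    Evaluates a Beran-type conditional survival estimator at the largest
    uncensored time. Assumes a single covariate and no tied times.
    """
    x = np.asarray(x, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    if not events.any():
        raise ValueError("at least one uncensored observation is required")
    # Nadaraya-Watson weights around x0 (Gaussian kernel, an assumption)
    kernel = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    weights = kernel / kernel.sum()
    order = np.argsort(times, kind="stable")
    w = weights[order]
    d = events[order]
    # total weight still at risk at each ordered time
    at_risk = np.cumsum(w[::-1])[::-1]
    factors = np.where(d, 1.0 - w / at_risk, 1.0)
    survival = np.cumprod(factors)
    last_event = np.flatnonzero(d).max()  # index of largest uncensored time
    return float(survival[last_event])


x = [0.1, 0.4, 0.5, 0.7, 0.9]
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 0]
# A very large bandwidth gives equal weights, so the estimate reduces to
# the Kaplan-Meier value at the last event time.
km_like = beran_cure_probability(0.5, x, times, events, bandwidth=1e6)
local = beran_cure_probability(0.5, x, times, events, bandwidth=0.2)
print(km_like, local)
```

With several covariates the kernel weights become sparse very quickly, which is one way to see why the curse of dimensionality limits this estimator in practice.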
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AFT | Accelerated Failure Time |
ANN | Artificial Neural Networks |
ARD | Automatic Relevance Determination |
IPCW | Inverse Probability of Censoring Weighting |
MCM | Mixture Cure Models |
ML | Machine Learning |
MSE | Mean Squared Error |
NB | Naive-Bayes |
NMCM | Non-Mixture Cure Models |
PH | Proportional Hazard |
SVM | Support Vector Machine |
SVR | Support Vector Regression |
References
- Leung, K.; Elashoff, R.M.; Afifi, A.A. Censoring issues in Survival Analysis. Annu. Rev. Public Health 1997, 18, 83–104.
- Marubini, E.; Valsecchi, M. Analysing Survival Data from Clinical Trials and Observational Studies; John Wiley & Sons: Chichester, UK, 2004; Volume 15.
- Amico, M.; Van Keilegom, I. Cure models in survival analysis. Ann. Rev. Stat. Appl. 2018, 5, 311–342.
- Pedrosa-Laza, M.; López-Cheda, A.; Cao, R. Cure models to estimate time until hospitalization due to COVID-19. Appl. Intell. 2022, 52, 794–807.
- Peng, Y.; Yu, B. Cure Models. Methods, Applications, and Implementation; Chapman and Hall/CRC Press: Boca Raton, FL, USA, 2021.
- Steele, A.J.; Denaxas, S.C.; Shah, A.D.; Hemingway, H.; Luscombe, N.M. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE 2018, 13, e0202344.
- Kvamme, H.; Borgan, Ø.; Scheel, I. Time-to-Event Prediction with Neural Networks and Cox Regression. J. Mach. Learn Res. 2019, 20, 1–30.
- Spooner, A.; Chen, E.; Sowmya, A.; Sachdev, P.; Kochan, N.A.; Trollor, J.; Brodaty, H. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Sci. Rep. 2020, 10, 20410.
- Nagpal, C.; Li, X.; Dubrawski, A. Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks. IEEE J. Biomed Health Inform. 2021, 25, 3163–3175.
- Jiang, C.; Wang, Z.; Zhao, H. A prediction-driven mixture cure model and its application in credit scoring. Eur. J. Oper. Res. 2019, 277, 20–31.
- Li, P.; Peng, Y.; Jiang, P.; Dong, Q. A support vector machine based semiparametric mixture cure model. Comput. Stat. 2020, 35, 931–945.
- Štěpánek, L.; Habarta, F.; Malá, I.; Štěpánek, L.; Nakládalová, M.; Boriková, A.; Marek, L. Machine Learning at the Service of Survival Analysis: Predictions Using Time-to-Event Decomposition and Classification Applied to a Decrease of Blood Antibodies against COVID-19. Mathematics 2023, 11, 819.
- Xu, J.; Peng, Y. Nonparametric cure rate estimation with covariates. Can. J. Stat. 2014, 42, 1–17.
- Haybittle, J. The estimation of the proportion of patients cured after treatment for cancer of the breast. Brit. J. Radiol. 1959, 32, 725–733.
- Haybittle, J. A two-parameter model for the survival curve of treated cancer patients. J. Am. Stat. Assoc. 1965, 60, 16–26.
- Tsodikov, A.D.; Yakovlev, A.Y.; Asselain, B. Stochastic Models of Tumor Latency and Their Biostatistical Applications; World Scientific: Singapore, 1996; Volume 1.
- Yakovlev, A.Y.; Cantor, A.B.; Shuster, J.J. Parametric versus non-parametric methods for estimating cure rates based on censored survival data. Stat. Med. 1994, 13, 983–986.
- Chen, M.H.; Ibrahim, J.; Sinha, D. A new Bayesian model for survival data with a surviving fraction. J. Am. Stat. Assoc. 1999, 94, 909–919.
- Chen, K.; Jin, Z.; Ying, Z. Semiparametric analysis of transformation models with censored data. Biometrika 2002, 89, 659–668.
- Tsodikov, A. A proportional hazards model taking account of long-term survivors. Biometrics 1998, 54, 1508–1516.
- Tsodikov, A. Semiparametric models: A generalized self-consistency approach. J. R. Stat. Soc. Series B Stat. Methodol. 2003, 65, 759–774.
- Zeng, D.; Yin, G.; Ibrahim, J.G. Semiparametric transformation models for survival data with a cure fraction. J. Am. Stat. Assoc. 2006, 101, 670–684.
- Liu, X.; Xiang, L. Generalized accelerated hazards mixture cure models with interval-censored data. Comput. Stat. Data Anal. 2021, 161, 107248.
- Tsodikov, A. Estimation of survival based on proportional hazards when cure is a possibility. Math. Comput. Model. 2001, 33, 1227–1236.
- Kaplan, E.L.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
- Beran, R. Nonparametric Regression with Randomly Censored Survival Data; University of California: Berkeley, CA, USA, 1981.
- Klein, J.P.; Moeschberger, M.L. Survival Analysis: Techniques for Censored and Truncated Data; Springer: New York, NY, USA, 1997; Volume 1.
- Klein, J.P.; Moeschberger, M.L.; Yan, J. KMsurv: Data Sets from Klein and Moeschberger (1997), Survival Analysis; R package version 0.1-5; 2012. Available online: https://CRAN.R-project.org/package=KMsurv (accessed on 4 September 2023).
- López-Cheda, A.; Cao, R.; Jácome, M.A.; Van Keilegom, I. Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Comput. Stat. Data Anal. 2017, 105, 144–165.
- Boag, J.W. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Stat. Soc. Series B Stat. Methodol. 1949, 11, 15–53.
- Berkson, J.; Gage, R.P. Survival curve for cancer patients following treatment. J. Am. Stat. Assoc. 1952, 47, 501–515.
- Farewell, V.T. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 1982, 38, 1041–1046.
- Yamaguchi, K. Accelerated failure-time regression models with a regression model of surviving fraction: An application to the analysis of “permanent employment” in Japan. J. Am. Stat. Assoc. 1992, 87, 284–292.
- Peng, Y.; Dear, K.B.; Denham, J. A generalized F mixture model for cure rate estimation. Stat. Med. 1998, 17, 813–830.
- Denham, J.; Denham, E.; Dear, K.; Hudson, G. The follicular non-Hodgkin’s Lymphomas—I. The possibility of cure. Eur. J. Cancer 1996, 32, 470–479.
- Wileyto, E.P.; Li, Y.; Chen, J.; Heitjan, D.F. Assessing the fit of parametric cure models. Biostatistics 2012, 14, 340–350.
- de Oliveira, R.P.; de Oliveira Peres, M.V.; Martinez, E.Z.; Alberto Achcar, J. A new cure rate regression framework for bivariate data based on the Chen distribution. Stat. Methods Med. Res. 2022, 31, 2442–2455.
- Müller, U.; Van Keilegom, I. Goodness-of-fit tests for the cure rate in a mixture cure model. Biometrika 2019, 106, 211–227.
- Scolas, S.; El Ghouch, A.; Legrand, C.; Oulhaj, A. Variable selection in a flexible parametric mixture cure model with interval-censored data. Stat. Med. 2016, 35, 1210–1225.
- Geng, Z.; Li, J.; Niu, Y.; Wang, X. Goodness-of-fit test for a parametric mixture cure model with partly interval-censored data. Stat. Med. 2022, 42, 407–421.
- Musta, E.; Patilea, V.; Van Keilegom, I. A presmoothing approach for estimation in the semiparametric Cox mixture cure model. Bernoulli 2022, 28, 2689–2715.
- Li, C.; Taylor, J.M.G. A semi-parametric accelerated failure time cure model. Stat. Med. 2002, 21, 3235–3247.
- Wang, Y.; Zhang, J.; Tang, Y. Semiparametric estimation for accelerated failure time mixture cure model allowing non-curable competing risk. Stat. Theory Relat. Fields 2020, 4, 97–108.
- Wang, Y.; Wang, W.; Tang, Y. A Bayesian semiparametric accelerate failure time mixture cure model. Int. J. Biostat. 2022, 18, 473–485.
- Peng, Y.; Dear, K. A nonparametric mixture model for cure rate estimation. Biometrics 2000, 56, 237–243.
- Lam, K.F.; Fong, D.Y.T.; Tang, O.Y. Estimating the proportion of cured patients in a censored sample. Stat. Med. 2005, 24, 1865–1879.
- Corbière, F.; Commenges, D.; Taylor, J.M.G.; Joly, P. A penalized likelihood approach for mixture cure models. Stat. Med. 2009, 28, 510–524.
- Wang, L.; Du, P.; Liang, H. Two-component mixture cure rate model with spline estimated nonparametric components. Biometrics 2012, 68, 726–735.
- Hu, T.; Xiang, L. Efficient estimation for semiparametric cure models with interval-censored data. J. Multivar. Anal. 2013, 121, 139–151.
- Amico, M.; Van Keilegom, I.; Legrand, C. The single-index/Cox mixture cure model. Biometrics 2019, 75, 452–462.
- Maller, R.A.; Zhou, S. Estimating the proportion of immunes in a censored sample. Biometrika 1992, 79, 731–739.
- Laska, E.M.; Meisner, M.J. Nonparametric estimation and testing in a cure model. Biometrics 1992, 48, 1223–1234.
- Dabrowska, D. Uniform consistency of the kernel conditional Kaplan-Meier estimate. Ann. Stat. 1989, 17, 1157–1167.
- López-Cheda, A.; Jácome, M.A.; Cao, R. Nonparametric latency estimation for mixture cure models. Test 2017, 26, 353–376.
- López-Cheda, A.; Jácome, M.A.; López-de-Ullibarri, I. npcure: An R Package for Nonparametric Inference in Mixture Cure Models. R J. 2021, 13, 21–41.
- López-de-Ullibarri, I.; López-Cheda, A.; Jácome, M.A. Npcure: Nonparametric Estimation in Mixture Cure Models; R package version 0.1-5; 2020. Available online: https://CRAN.R-project.org/package=npcure (accessed on 4 September 2023).
- López-Cheda, A.; Jácome, M.A.; Van Keilegom, I.; Cao, R. Nonparametric covariate hypothesis tests for the cure rate in mixture cure models. Stat. Med. 2020, 39, 2291–2307.
- Wang, P.; Li, Y.; Reddy, C.K. Machine learning for survival analysis: A Survey. ACM Comput. Surv. 2019, 51, 1–36.
- Gordon, L.; Olshen, R.A. Tree-structured survival analysis. Cancer Treat. Rep. 1985, 69, 1065–1069.
- Davis, R.B.; Anderson, J.R. Exponential survival trees. Stat. Med. 1989, 8, 947–961.
- Kwak, L.W.; Halpern, J.; Olshen, R.A.; Horning, S.J. Prognostic significance of actual dose intensity in diffuse large-cell lymphoma: Results of a tree-structured survival analysis. J. Clin. Oncol. 1990, 8, 963–977.
- LeBlanc, M.; Crowley, J. Relative risk trees for censored survival data. Biometrics 1992, 48, 411–425.
- Huang, X.; Soong, S.; McCarthy, W.; Urist, M.; Balch, C. Classification of localized melanoma by the exponential survival trees method. Cancer 1997, 79, 1122–1128.
- Huang, X.; Chen, S.; Soong, S. Piecewise exponential survival trees with time-dependent covariates. Biometrics 1998, 54, 1420–1433.
- Ciampi, A.; Thiffault, J.; Nakache, J.; Asselain, B. Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates. Comput. Stat. Data Anal. 1986, 4, 185–204.
- Ciampi, A.; Chang, C.; Hogg, S.; McKinney, S. Recursive Partition: A Versatile Method for Exploratory-Data Analysis in Biostatistics. In Biostatistics: Advances in Statistical Sciences Festschrift in Honor of Professor V.M. Joshi’s 70th Birthday Volume V; Springer: Dordrecht, The Netherlands, 1987; pp. 23–50.
- Segal, M.R. Regression trees for censored data. Biometrics 1988, 44, 35–47.
- Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random Survival Forests. Ann. Appl. Stat. 2008, 2, 841–860.
- Schmid, M.; Wright, M.; Ziegler, A. On the use of Harrell’s C for clinical risk prediction via random survival forests. Expert Syst. Appl. 2016, 63, 450–459.
- Andrade, J.; Valencia, J. A Fuzzy Random Survival Forest for Predicting Lapses in Insurance Portfolios Containing Imprecise Data. Mathematics 2023, 11, 198.
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015; pp. 67–80.
- Smola, A.J.; Schölkopf, B. Learning with Kernels; Citeseer: New York, NY, USA, 1998; Volume 4.
- Har-Peled, S.; Roth, D.; Zimak, D. Constraint classification: A new approach to multiclass classification. In Proceedings of the International Conference on Algorithmic Learning Theory; Springer: Heidelberg, Germany, 2002; pp. 365–379.
- Khan, F.M.; Zubek, V.B. Support vector regression for censored data (SVRc): A novel tool for survival analysis. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, IEEE, Pisa, Italy, 15–19 December 2008; pp. 863–868.
- Van Belle, V.; Pelckmans, K.; Van Huffel, S.; Suykens, J.A. Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artif. Intell. Med. 2011, 53, 107–118.
- Kiaee, F.; Sheikhzadeh, H.; Mahabadi, S.E. Relevance vector machine for survival analysis. IEEE Trans. Neural. Netw. Learn. Syst. 2015, 27, 648–660.
- Lisboa, P.J.; Wong, H.; Harris, P.; Swindell, R. A Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer. Artif. Intell. Med. 2003, 28, 1–25.
- Fard, M.J.; Wang, P.; Chawla, S.; Reddy, C.K. A bayesian perspective on early stage event prediction in longitudinal data. IEEE Trans. Knowl. Data Eng. 2016, 28, 3126–3139.
- Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386.
- Liestøl, K.; Andersen, P.K.; Andersen, U. Survival analysis and neural nets. Stat. Med. 1994, 13, 1189–1200.
- Mariani, L.; Coradini, D.; Biganzoli, E.; Boracchi, P.; Marubini, E.; Pilotti, S.; Salvadori, B.; Silvestrini, R.; Veronesi, U.; Zucali, R.; et al. Prognostic factors for metachronous contralateral breast cancer: A comparison of the linear Cox regression model and its artificial neural network extension. Breast Cancer Res. Treat. 1997, 44, 167–178.
- Brown, S.F.; Branford, A.J.; Moran, W. On the use of artificial neural networks for the analysis of survival data. IEEE Trans. Neural. Netw. Learn. Syst. 1997, 8, 1071–1077.
- Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Series B Stat. Methodol. 1972, 34, 187–202.
- Cox, D.R. Partial likelihood. Biometrika 1975, 62, 269–276.
- Faraggi, D.; Simon, R. A neural network model for survival data. Stat. Med. 1995, 14, 73–82.
- Yousefi, S.; Amrollahi, F.; Amgad, M.; Dong, C.; Lewis, J.E.; Song, C.; Gutman, D.A.; Halani, S.H.; Vega, J.E.V.; Brat, D.J.; et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep. 2017, 7, 11707.
- Luck, M.; Sylvain, T.; Cardinal, H.; Lodi, A.; Bengio, Y. Deep learning for patient-specific kidney graft survival analysis. arXiv 2017, arXiv:1705.10245.
- Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 1–12.
- Ching, T.; Zhu, X.; Garmire, L.X. Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 2018, 14, e1006076.
- Kvamme, H.; Borgan, Ø. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Anal. 2021, 27, 710–736.
- Beaulac, C.; Rosenthal, J.S.; Pei, Q.; Friedman, D.; Wolden, S.; Hodgson, D. An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial. Appl. Artif. Intell. 2020, 34, 1100–1114.
- Srujana, B.; Verma, D.; Naqvi, S. Machine Learning vs. Survival Analysis Models: A study on right censored heart failure data. Commun. Stat. Simul. Comput. 2022, 0, 1–18.
- Lee, C.; Zame, W.R.; Yoon, J.; Van der Schaar, M. Deephit: A deep learning approach to survival analysis with competing risks. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Gensheimer, M.F.; Narasimhan, B. A scalable discrete-time survival model for neural networks. PeerJ 2019, 7, e6257.
- Xie, Y.; Yu, Z. Mixture cure rate models with neural network estimated nonparametric components. Comput. Stat. 2021, 36, 2467–2489.
- Xie, Y.; Yu, Z. Promotion time cure rate model with a neural network estimated non-parametric component. Stat. Med. 2021, 40, 3516–3532.
- Pal, S.; Aselisewine, W. A semiparametric promotion time cure model with support vector machine. Ann. Appl. Stat. 2023, 17, 2680–2699.
- Kim, D.; Lee, S.; Kwon, S.; Nam, W.; Cha, I.; Kim, H. Deep learning-based survival prediction of oral cancer patients. Sci. Rep. 2019, 9, 6994.
- Antosz, K.; Machado, J.; Mazurkiewicz, D.; Antonelli, D.; Soares, F. Systems Engineering: Availability and Reliability. Appl. Sci. 2022, 12, 2504.
- Martyushev, N.; Malozyomov, B.; Sorokova, S.; Efremenkov, E.; Valuev, D.; Qi, M. Review Models and Methods for Determining and Predicting the Reliability of Technical Systems and Transport. Mathematics 2023, 11, 3317.
- Antolini, L.; Boracchi, P.; Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 2005, 24, 3927–3944.
- Kuk, A.; Chen, C. A mixture model combining logistic regression with proportional hazards regression. Biometrika 1992, 79, 531–541.
- Nagpal, C.; Potosnak, W.; Dubrawski, A. Auton-survival: An open-source package for regression, counterfactual estimation, evaluation and phenotyping with censored time-to-event data. In Proceedings of the Machine Learning for Healthcare Conference, PMLR, Durham, NC, USA, 5–6 August 2022; pp. 585–608.
| Scenario | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| | 0.476 | 0.0476 | 0.476 | 0.0476 | 0.476 | 0.0476 | 0.476 | 0.0476 |
| | 0.358 | −0.2558 | 0.358 | −0.2558 | 0.358 | −0.2558 | 0.358 | −0.2558 |
| | 0 | −0.0027 | 0 | −0.0027 | −0.151 | −0.0027 | −0.151 | −0.0027 |
| | − | 0.0020 | − | 0.0020 | − | 0.0020 | − | 0.0020 |
| | − | 0 | − | 0 | − | 0.207 | − | 0.207 |
Sex | Number of Infected Patients | Number of Censored Data | % Censoring |
---|---|---|---|
Female | 5451 | 5015 | 92.00% |
Male | 3172 | 2808 | 88.52% |
Sex | Number of Patients | Number of Censored Data | % Censoring |
---|---|---|---|
Female | 57 | 21 | 36.84% |
Male | 80 | 33 | 41.25% |
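The censoring percentages in the two tables above follow directly from the reported counts. The short check below reproduces them; the helper name `censoring_percentage` and the dictionary labels are hypothetical, while the counts are copied from the tables.

```python
def censoring_percentage(n_patients, n_censored):
    """Share of censored observations, in percent."""
    return 100.0 * n_censored / n_patients


# (number of patients, number of censored observations) per sex
covid = {"Female": (5451, 5015), "Male": (3172, 2808)}
bone_marrow = {"Female": (57, 21), "Male": (80, 33)}

results = {}
for name, table in (("COVID-19", covid), ("Bone marrow", bone_marrow)):
    for sex, (n_patients, n_censored) in table.items():
        pct = censoring_percentage(n_patients, n_censored)
        results[(name, sex)] = f"{pct:.2f}"
        print(f"{name:12s} {sex:6s} {pct:.2f}%")
```

The contrast between roughly 90% censoring in the COVID-19 data and roughly 40% in the much smaller bone marrow transplant data is worth keeping in mind when reading the small-sample recommendations in the Discussion.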
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ezquerro, A.; Cancela, B.; López-Cheda, A. On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility. Mathematics 2023, 11, 4150. https://doi.org/10.3390/math11194150