A Simple, Test-Based Method to Control the Overestimation Bias in the Analysis of Potential Prognostic Tumour Markers
Simple Summary
Abstract
1. Introduction
2. Adjusting the Hazard Ratio for the Overestimation Bias
3. Statistical Validation of the Proposed Method to Control the Overestimation Bias: Analysing in Silico Data from Two Real Data Sets
3.1. The Data Sets
3.2. The Validation Procedure
3.3. Results of the Statistical Validation on Real Data Sets
4. Statistical Validation Using Simulated Sets of Randomly Generated Data
4.1. The Data Sets
4.2. Results of Analysis of Simulated Data Sets under the Null Hypothesis
4.3. Results of Analysis of Simulated Data Sets under the Alternative Hypothesis
5. Application to a Real Data Set for the Evaluation of Potential Prognostic Markers in Patients with Stage 4S Neuroblastoma
5.1. The Data Set
5.2. Results of the Application of the Proposed Method
6. Validation on an Independent Cohort of the Results Described in the Previous Section
6.1. Patients and Methods
6.2. E2F1 Protein Expression in Primary Neuroblastoma Tissue Samples
7. Discussion
8. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Patient Characteristics | Whole Data (n = 786), N (%) | Training Set (n = 393), N (%) | Test Set (n = 393), N (%) |
---|---|---|---|
Age at diagnosis | |||
<18 months | 449 (57.1) | 228 (58.0) | 221 (56.2) |
≥18 months | 337 (42.9) | 165 (42.0) | 172 (43.8) |
MYCN status | |||
Not amplified | 629 (80.0) | 312 (79.4) | 317 (80.7) |
Amplified | 153 (19.5) | 78 (19.8) | 75 (19.1) |
Missing | 4 (0.5) | 3 (0.8) | 1 (0.2) |
Disease extension | |||
Localised | 373 (47.5) | 174 (44.3) | 199 (50.6) |
Disseminated | 412 (52.4) | 218 (55.5) | 194 (49.4) |
Missing | 1 (0.1) | 1 (0.2) | 0 (0.0) |
INSS Stage | |||
1 | 143 (18.2) | 56 (14.2) | 87 (22.1) |
2 | 125 (15.9) | 67 (17.0) | 58 (14.8) |
3 | 105 (13.4) | 51 (13.0) | 54 (13.7) |
4 | 320 (40.7) | 170 (43.3) | 150 (38.2) |
4S | 92 (11.7) | 48 (12.2) | 44 (11.2) |
Missing | 1 (0.1) | 1 (0.3) | 0 (0.0) |
Deaths | 229 (29.1) | 116 (29.5) | 113 (28.8) |
Patient Characteristics | Whole Data (n = 499), N (%) | Training Set (n = 250), N (%) | Test Set (n = 249), N (%) |
---|---|---|---|
Age at diagnosis | |||
0–3 years | 109 (21.8) | 57 (22.8) | 52 (20.9) |
4–10 years | 320 (64.1) | 163 (65.2) | 157 (63.0) |
11–13 years | 70 (14.0) | 30 (12.0) | 40 (16.1) |
Gender | |||
Males | 323 (64.7) | 148 (59.2) | 175 (70.3) |
Females | 173 (34.7) | 99 (39.6) | 74 (29.7) |
Missing | 3 (0.6) | 3 (1.2) | 0 (0.0) |
Molecular subgroups | |||
WNT | 40 (8.0) | 27 (10.8) | 13 (5.2) |
SHH | 112 (22.4) | 58 (23.2) | 54 (21.7) |
Group 3 | 106 (21.2) | 54 (21.6) | 52 (20.9) |
Group 4 | 241 (48.3) | 111 (44.4) | 130 (52.2) |
Histology | |||
Classic | 279 (55.9) | 137 (54.8) | 142 (57.0) |
Desmoplastic | 65 (13.0) | 30 (12.0) | 35 (14.1) |
Large cell/anaplastic | 57 (11.4) | 27 (10.8) | 30 (12.0) |
Extensive nodularity | 14 (2.8) | 7 (2.8) | 7 (2.8) |
Not available | 84 (16.8) | 49 (19.6) | 35 (14.1) |
Disease extension | |||
Localised | 304 (60.9) | 149 (59.6) | 155 (62.2) |
Disseminated | 148 (29.7) | 78 (31.2) | 70 (28.1) |
Missing | 47 (9.4) | 23 (9.2) | 24 (9.6) |
Deaths | 139 (27.9) | 65 (26.0) | 74 (29.7) |
Sample Size | Type I Error *: Unadjusted HR Estimates | Type I Error *: Adjusted HR Estimates | Type I Error *: HR Estimates on Median Value | Standard Error of ln(HR): Unadjusted HR Estimates | Standard Error of ln(HR): Adjusted HR Estimates | Standard Error of ln(HR): HR Estimates on Median Value |
---|---|---|---|---|---|---|
Event rate = 0.1 u⁻¹ | | | | | | |
20 | 0.127 | 0.048 | 0.042 | 0.842 | 0.659 | 0.591 |
40 | 0.311 | 0.044 | 0.037 | 0.849 | 0.520 | 0.394 |
60 | 0.358 | 0.055 | 0.056 | 0.795 | 0.463 | 0.328 |
80 | 0.408 | 0.049 | 0.046 | 0.754 | 0.432 | 0.279 |
100 | 0.459 | 0.049 | 0.047 | 0.722 | 0.397 | 0.255 |
200 | 0.564 | 0.056 | 0.052 | 0.688 | 0.411 | 0.188 |
Pooled | 0.371 | 0.050 | 0.047 | 0.777 | 0.489 | 0.363 |
Event rate = 0.3 u⁻¹ | | | | | | |
20 | 0.136 | 0.046 | 0.058 | 0.747 | 0.564 | 0.537 |
40 | 0.303 | 0.056 | 0.052 | 0.704 | 0.404 | 0.347 |
60 | 0.388 | 0.046 | 0.053 | 0.651 | 0.338 | 0.277 |
80 | 0.436 | 0.051 | 0.045 | 0.613 | 0.312 | 0.237 |
100 | 0.436 | 0.047 | 0.044 | 0.583 | 0.278 | 0.203 |
200 | 0.554 | 0.062 | 0.055 | 0.544 | 0.261 | 0.147 |
Pooled | 0.376 | 0.051 | 0.051 | 0.644 | 0.374 | 0.317 |
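
The table does not say how its Pooled rows were obtained. As a hedged reading of the numbers only, the sketch below checks the first block (event rate 0.1) against two simple summaries: the arithmetic mean of the six Type I error values, and the root mean square of the six standard errors of ln(HR). Both agree with the Pooled row to three decimals. This is an inference from the tabulated values, not a statement of the authors' procedure, and the function and variable names are illustrative.

```python
import math

# Values copied from the first block of the table (event rate 0.1),
# ordered by sample size: 20, 40, 60, 80, 100, 200.
TYPE_I = {
    "unadjusted": [0.127, 0.311, 0.358, 0.408, 0.459, 0.564],
    "adjusted": [0.048, 0.044, 0.055, 0.049, 0.049, 0.056],
    "median": [0.042, 0.037, 0.056, 0.046, 0.047, 0.052],
}
SE_LN_HR = {
    "unadjusted": [0.842, 0.849, 0.795, 0.754, 0.722, 0.688],
    "adjusted": [0.659, 0.520, 0.463, 0.432, 0.397, 0.411],
    "median": [0.591, 0.394, 0.328, 0.279, 0.255, 0.188],
}
# Pooled row as printed in the table.
POOLED_TYPE_I = {"unadjusted": 0.371, "adjusted": 0.050, "median": 0.047}
POOLED_SE = {"unadjusted": 0.777, "adjusted": 0.489, "median": 0.363}


def arithmetic_mean(values):
    """Plain average of a list of numbers."""
    return sum(values) / len(values)


def root_mean_square(values):
    """Square root of the average squared value (averages variances, not SEs)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


for key in TYPE_I:
    mean_t1 = arithmetic_mean(TYPE_I[key])
    rms_se = root_mean_square(SE_LN_HR[key])
    print(f"{key:10s} type I: {mean_t1:.3f} (table {POOLED_TYPE_I[key]:.3f})  "
          f"SE: {rms_se:.3f} (table {POOLED_SE[key]:.3f})")
```

Under this reading, the pooled standard error averages variances rather than standard errors, which is why it sits slightly above the plain mean of the six values.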
Marker | Data Set | Median | IQR | Optimal Cut-Off |
---|---|---|---|---|
E2F1 | Kocak | 10.2 | 9.3–11.3 | 9.5 |
E2F1 | Oberthuer | −0.250 | −0.354–0.079 | −0.026 |
E2F1 | SEQC | 4.8 | 4.0–5.6 | 4.092 |
E2F2 | Kocak | 11.7 | 10.8–12.7 | 11.0 |
E2F2 | Oberthuer | −0.010 | −0.048–0.109 | 0.099 |
E2F2 | SEQC | 4.2 | 3.5–4.8 | 3.5 |
E2F3 | Kocak | 11.5 | 10.9–12.1 | 11.8 |
E2F3 | Oberthuer | −0.132 | −0.347–0.079 | 0.035 |
E2F3 | SEQC | 4.0 | 3.6–4.5 | 4.1 |
Databases | N/E | Cut-Off on Median Value: HR | Cut-Off on Median Value: 95% CI | Optimal Cut-Off: HR | Optimal Cut-Off: 95% CI | Optimal Cut-Off (Adjusted) *: HR | Optimal Cut-Off (Adjusted) *: 95% CI |
---|---|---|---|---|---|---|---|
Kocak | 56/13 | 1.9 | 0.61–5.7 | 6.0 | 0.79–46.5 | 3.8 | 0.50–29.3 |
Oberthuer | 30/7 | 1.3 | 0.30–6.0 | 4.6 | 1.0–20.8 | 2.6 | 0.58–11.6 |
SEQC | 48/12 | 1.6 | 0.50–5.0 | 5.9 | 0.76–45.4 | 3.7 | 0.48–28.7 |
Combined | 134/32 | 1.6 | 0.80–3.3 | 5.3 | 1.9–15.0 | 3.1 | 1.1–8.9 |
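
The Combined row of the table above can be approximated from the three per-data-set rows. The sketch below is a minimal illustration, assuming fixed-effect inverse-variance pooling on the log scale, with each standard error of ln(HR) recovered from the width of the reported 95% confidence interval. The table itself does not state which pooling method was used, so this is an assumption, and the function names are hypothetical. Applied to the Cut-Off on Median Value columns, it gives a pooled HR of about 1.6 with limits near 0.8 and 3.3, in line with the Combined row up to rounding of the inputs.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def se_from_ci(lower, upper):
    """Standard error of ln(HR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (HR, lower, upper) triples on the log scale.

    Returns the pooled HR with its 95% confidence limits.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    log_hr = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_hr),
        math.exp(log_hr - Z95 * se),
        math.exp(log_hr + Z95 * se),
    )


# Cut-Off on Median Value columns, copied from the table above:
# Kocak, Oberthuer, SEQC.
MEDIAN_CUTOFF = [(1.9, 0.61, 5.7), (1.3, 0.30, 6.0), (1.6, 0.50, 5.0)]

hr, lower, upper = pool_fixed_effect(MEDIAN_CUTOFF)
print(f"Pooled HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the inputs are rounded to two significant figures, agreement to the last printed digit should not be expected. The same function can be applied to the other column pairs of the table.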
Databases | N/E | Cut-Off on Median Value: HR | Cut-Off on Median Value: 95% CI | Optimal Cut-Off: HR | Optimal Cut-Off: 95% CI | Optimal Cut-Off (Adjusted) *: HR | Optimal Cut-Off (Adjusted) *: 95% CI |
---|---|---|---|---|---|---|---|
Kocak | 56/13 | 2.0 | 0.64–6.0 | 6.7 | 0.87–51.8 | 4.5 | 0.61–36.2 |
Oberthuer | 30/7 | 1.4 | 0.31–6.1 | 2.4 | 0.54–10.9 | 1.3 | 0.28–5.7 |
SEQC | 48/12 | 1.0 | 0.34–3.2 | 5.2 | 0.68–40.7 | 3.0 | 0.39–23.3 |
Combined | 134/32 | 1.4 | 0.70–2.9 | 3.9 | 1.4–10.9 | 2.2 | 0.79–6.3 |
Databases | N/E | Cut-Off on Median Value: HR | Cut-Off on Median Value: 95% CI | Optimal Cut-Off: HR | Optimal Cut-Off: 95% CI | Optimal Cut-Off (Adjusted) *: HR | Optimal Cut-Off (Adjusted) *: 95% CI |
---|---|---|---|---|---|---|---|
Kocak | 56/13 | 3.8 | 1.0–13.7 | 4.8 | 1.5–15.0 | 3.0 | 0.97–9.5 |
Oberthuer | 30/7 | 6.8 | 0.81–56.2 | 10.0 | 1.9–52.5 | 7.9 | 1.5–41.7 |
SEQC | 48/12 | 3.3 | 0.89–2.5 | 4.9 | 1.3–18.2 | 3.4 | 0.91–12.6 |
Combined | 134/32 | 3.9 | 1.7–9.1 | 5.7 | 2.6–12.1 | 3.8 | 1.8–8.2 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ognibene, M.; Pezzolo, A.; Cavanna, R.; Cangelosi, D.; Sorrentino, S.; Parodi, S. A Simple, Test-Based Method to Control the Overestimation Bias in the Analysis of Potential Prognostic Tumour Markers. Cancers 2023, 15, 1188. https://doi.org/10.3390/cancers15041188