Prediction of Incident Cancers in the Lifelines Population-Based Cohort
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Participants
2.3. Patient Outcome
2.4. Predictive Variables
2.5. Statistical Analysis
2.6. Model Development and Validation
3. Results
3.1. General Model
3.2. Breast Cancer Model
3.3. Gastrointestinal Cancer Model
3.4. Prostate Cancer Model
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Variable | Cancer in Follow-Up (n = 4232) | Controls (n = 4232) | Without Any History of Cancer (n = 116,188) |
---|---|---|---|
Baseline age (SD) | 52.53 (13.12) | 52.53 (13.12) | 43.62 (12.68) |
Sex | | | |
Females (%) | 2581 (61.0%) | 2581 (61.0%) | 67,679 (58.2%) |
Education level | | | |
Low (%) | 1715 (40.5%) | 1658 (39.2%) | 34,678 (29.8%) |
Medium (%) | 1432 (33.8%) | 1418 (33.5%) | 46,238 (39.8%) |
High (%) | 1085 (25.6%) | 1156 (27.3%) | 35,272 (30.4%) |
Baseline body mass index (SD) | 26.62 (4.22) | 26.31 (4.30) | 25.97 (4.30) |
Baseline alcohol intake grams/day (SD) | 8.12 (9.69) | 7.15 (8.67) | 7.20 (8.93) |
Smoking packages/year (SD) | 10.20 (13.89) | 7.92 (11.83) | 5.90 (9.57) |
Baseline physical activity h/week (SD) | 4.51 (5.49) | 4.46 (5.14) | 4.14 (4.80) |
Baseline sedentary behavior TV h/day (SD) | 2.70 (1.53) | 2.66 (1.53) | 2.46 (1.48) |
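
The percentages in the baseline table can be re-derived from the reported counts and group sizes. The following is a minimal sketch of such a consistency check, not part of the authors' analysis; the names `GROUP_SIZES`, `EDUCATION_COUNTS`, `percent`, and `education_percentages` are hypothetical and introduced only for illustration.

```python
# Group sizes and education-level counts copied from the baseline table.
# All identifiers here are illustrative; none come from the original study.
GROUP_SIZES = {"cases": 4232, "controls": 4232, "no_cancer_history": 116188}

EDUCATION_COUNTS = {
    "cases": {"low": 1715, "medium": 1432, "high": 1085},
    "controls": {"low": 1658, "medium": 1418, "high": 1156},
    "no_cancer_history": {"low": 34678, "medium": 46238, "high": 35272},
}


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def education_percentages(group):
    """Recompute the education-level percentages for one column of the table."""
    total = GROUP_SIZES[group]
    counts = EDUCATION_COUNTS[group]
    # The three education levels should account for every participant.
    if sum(counts.values()) != total:
        raise ValueError(f"education counts for {group!r} do not sum to {total}")
    return {level: percent(n, total) for level, n in counts.items()}


if __name__ == "__main__":
    for group in GROUP_SIZES:
        print(group, education_percentages(group))
```

Checking that the three education levels sum to the group size is a quick way to catch transcription errors before comparing the recomputed percentages with the printed ones.
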
Variable | Breast Cancer in Follow-Up (n = 977) | Breast Cancer Random Controls (n = 977) | Breast Cancer Controls (n = 977) | Prostate Cancer in Follow-Up (n = 508) | Prostate Cancer Random Controls (n = 508) | Prostate Cancer Controls (n = 508) | GI Cancer in Follow-Up (n = 609) | GI Cancer Random Controls (n = 609) | GI Cancer Controls (n = 609) |
---|---|---|---|---|---|---|---|---|---|
Baseline age (SD) | 50.76 (10.60) | 43.74 (12.83) | 50.76 (10.60) | 62.75 (7.48) | 45.08 (12.70) | 62.75 (7.48) | 57.22 (10.73) | 44.68 (12.01) | 57.22 (10.73) |
Sex | | | | | | | | | |
Females (%) | 977 (100%) | 977 (100%) | 977 (100%) | - | - | - | 249 (40.9%) | 361 (59.2%) | 249 (40.9%) |
Education level | | | | | | | | | |
Low (%) | 390 (39.9%) | 305 (31.2%) | 364 (37.3%) | 222 (43.7%) | 166 (32.7%) | 228 (44.9%) | 274 (45.0%) | 177 (29.1%) | 274 (45.0%) |
Medium (%) | 345 (35.3%) | 392 (40.1%) | 365 (37.4%) | 134 (26.4%) | 198 (39.0%) | 131 (25.8%) | 189 (31.0%) | 258 (42.4%) | 171 (28.1%) |
High (%) | 242 (24.8%) | 280 (28.7%) | 248 (25.4%) | 152 (29.9%) | 144 (28.3%) | 149 (29.3%) | 146 (24.0%) | 174 (28.5%) | 164 (26.9%) |
Body mass index (SD) | 26.47 (4.53) | 25.74 (4.81) | 26.10 (4.57) | 26.72 (3.19) | 26.18 (3.18) | 27.04 (3.53) | 27.47 (4.15) | 25.91 (4.06) | 26.51 (3.82) |
Alcohol intake grams/day (SD) | 5.97 (7.45) | 4.75 (5.88) | 5.10 (6.41) | 10.93 (9.74) | 10.39 (11.35) | 10.47 (10.16) | 9.58 (10.95) | 7.17 (8.89) | 8.19 (8.89) |
Smoking packages/year (SD) | 6.46 (9.66) | 5.36 (9.03) | 6.53 (9.46) | 11.66 (13.87) | 7.35 (10.44) | 12.52 (14.97) | 13.14 (16.60) | 5.84 (9.15) | 9.19 (12.97) |
Physical activity h/week (SD) | 4.26 (4.90) | 4.30 (5.01) | 4.35 (5.05) | 5.45 (6.46) | 4.40 (5.20) | 5.32 (6.27) | 4.73 (5.92) | 4.65 (5.80) | 5.06 (6.10) |
Sedentary behavior TV h/day (SD) | 2.66 (1.41) | 2.42 (1.32) | 2.63 (1.33) | 2.80 (1.70) | 2.42 (1.23) | 2.78 (1.37) | 2.84 (1.48) | 2.44 (2.09) | 2.67 (1.31) |
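
In the table above, the "Controls" columns show the same baseline age as the cases, whereas the "Random Controls" columns are markedly younger. One common way to express such a gap is the standardized mean difference (SMD). The sketch below is an assumption-laden illustration: the SMD is not a statistic reported in the source, the equal-weight pooled standard deviation is one of several conventions, and the names `standardized_mean_difference` and `BREAST_AGE` are hypothetical.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD using the equal-weight pooled SD: (m_a - m_b) / sqrt((s_a^2 + s_b^2) / 2)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Baseline age as (mean, SD), copied from the breast cancer columns of the table.
# The dictionary name and keys are illustrative only.
BREAST_AGE = {
    "cases": (50.76, 10.60),
    "random_controls": (43.74, 12.83),
    "controls": (50.76, 10.60),
}

if __name__ == "__main__":
    cases = BREAST_AGE["cases"]
    for group in ("random_controls", "controls"):
        smd = standardized_mean_difference(*cases, *BREAST_AGE[group])
        print(f"age SMD, cases vs {group}: {smd:.2f}")
```

An SMD near zero indicates that the two groups are balanced on that variable; larger absolute values indicate a larger imbalance relative to the spread of the data.
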
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cortés-Ibañez, F.O.; Nagaraj, S.B.; Cornelissen, L.; Navis, G.J.; van der Vegt, B.; Sidorenkov, G.; de Bock, G.H. Prediction of Incident Cancers in the Lifelines Population-Based Cohort. Cancers 2021, 13, 2133. https://doi.org/10.3390/cancers13092133