Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey

Banegas-Luna, Antonio Jesús; Peña-García, Jorge; Iftene, Adrian; Guadagni, Fiorella; Ferroni, Patrizia; Scarpato, Noemi; Zanzotto, Fabio Massimo; Bueno-Crespo, Andrés; Pérez-Sánchez, Horacio

doi:10.3390/ijms22094394

Open Access Review


by Antonio Jesús Banegas-Luna ¹,*, Jorge Peña-García ¹, Adrian Iftene ², Fiorella Guadagni ³,⁴, Patrizia Ferroni ³,⁴, Noemi Scarpato ⁴, Fabio Massimo Zanzotto ⁵, Andrés Bueno-Crespo ¹ and Horacio Pérez-Sánchez ¹,*

¹ Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain

² Faculty of Computer Science, Universitatea Alexandru Ioan Cuza (UAIC), 700505 Jashi, Romania

³ Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy

⁴ Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy

⁵ Dipartimento di Ingegneria dell’Impresa “Mario Lucertini”, University of Rome Tor Vergata, 00133 Rome, Italy

* Authors to whom correspondence should be addressed.

Int. J. Mol. Sci. 2021, 22(9), 4394; https://doi.org/10.3390/ijms22094394

Submission received: 30 March 2021 / Revised: 16 April 2021 / Accepted: 20 April 2021 / Published: 22 April 2021

(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)


Abstract

Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in-silico tools as applied to medicine—specifically, to cancer research—and we discuss their interpretability, performance and the input data they are fed with. Artificial neural networks (ANN), logistic regression (LR) and support vector machines (SVM) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphic processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions so that doctors can understand them, trust them and gain useful insights for the clinical practice is still rarely considered, which is a factor that needs to be improved to enhance doctors’ predictive capacity and achieve individualised therapies in the near future.

Keywords:

drug repurposing; machine learning; personalised therapy; cancer treatment; deep learning; high performance computing

1. Introduction

Cancer has become one of the most common human diseases and causes of death [1,2,3]. Among other factors, its occurrence is mainly growing because of aging [4]. Even though cancer is a disease that affects men as well as women, there seems to be a clear relationship between gender and incidence. Thus, lung, prostate, colorectal, stomach and liver cancer are predominant among men, while breast, colorectal, lung, cervical and thyroid are the most common cancers in women (https://www.who.int/health-topics/cancer, accessed on 29 March 2021). Figure 1 depicts the number of estimated deaths in 2020 by cancer type collected from the Surveillance, Epidemiology and End Results (SEER) database.

A diverse range of therapies, including chemotherapy, radiotherapy, surgery and irradiation, is used in cancer patients depending on tumour type and stage. Unfortunately, the success of these treatments is limited because they attack normal and tumoral cells equally, which may result in toxicity and make the tumoral cells drug-resistant. In this scenario, early detection is a crucial factor for the successful application of therapies, for limiting associated side effects and, consequently, increasing the chance of survival [5,6]. For this reason, providing the physicians with appropriate tools for accurate diagnosis and prognosis remains a major challenge in cancer research.

Colorectal cancer (CRC) is the third most common type of cancer worldwide, representing 10% of all diagnosed cases and the fourth in the number of deaths it causes [7,8]. Furthermore, these figures are not very promising because the number of CRC cases is expected to increase by around 60% in the forthcoming decade [9].

As regards the reasons for such disheartening data, bad dietary habits are suspected to be behind the growing number of CRC cases reported in recent years, but other factors, such as lack of exercise, obesity and smoking, are also suspected of causing tumours [10]. Moreover, familial and hereditary antecedents have been shown to influence the incidence of this cancer [11]. In an attempt to identify reasons, beyond the biological, for the evolution of CRC worldwide, Arnold et al. [9] published a study correlating the human development index with the incidence and high mortality of CRC, which resulted in the classification of countries into three groups with well-defined characteristics. In short, a number of factors in our daily lives promote the emergence of colorectal tumours and, although there is no clear numerical estimation of how much these factors contribute to the appearance of CRC, it seems to be in our hands to change the trend. From a more medical point of view, the high morbidity and mortality rates could be explained by the fact that malignant CRC tumours are considered to be especially complex biologically [12].

Much effort has been put into predicting CRC or, at least, into predicting the manner in which the tumour is likely to progress. Genetic information plays a key role for detecting tumoral cells and tissues that can help identify cancer disease at an early stage. The role of genetic mutations in CRC has been extensively analysed and several publications are available in the literature on this topic [13,14,15,16]. Other authors have focused on identifying biomarkers with the aim of finding the subset with the highest predictive power [17,18,19,20]. Early identification could increase the likelihood of survival and dramatically reduce the mortality rate. Unfortunately, a full understanding of cancer cell behaviour is still beyond our grasp, making this a major challenge in medicine.

When prevention has failed, the application of individualised therapies is the ideal scenario for the treatment of cancer patients. Personalising therapies implies finding the most suitable set of drugs and their exact dose for a given patient, based on the available input parameters, such as cancer type, tumour size and whether metastasis is present or not. The idea behind this individualisation of therapies is to maximize the effect of drugs, limit their side effects, shorten the time necessary to cure the disease and reduce costs. The idea that individualised therapies are more cost-effective than generic ones seems credible because the same treatment is obviously not suitable for every patient since not all cases are similar. Several publications have discussed the direction that medicine is taking in this respect [21,22,23,24] and its popularity has grown in recent years. Although all these authors agree that personalised treatment will increase the effectiveness of existing drugs, to the best of our knowledge, there has been no attempt to put it into practice in the case of cancer treatment, making this goal a priority in cancer research.

In this move towards individual therapies, computing sciences have become a close ally of health and life sciences and medicinal chemistry. The rapid development of high-performance computing (HPC) platforms, such as parallel and distributed computing, has found a natural application in chemical and biological problems. It is well known that HPC infrastructures are extensively used to carry out complex scientific calculations [25,26,27] and their computing power can drastically speed up the resolution of a problem [28,29,30]. However, this is not enough: firstly, because the amount of medical and pharmacological data available is overwhelming and huge computing power is needed to analyse it all; and, secondly, because the analysis methods necessary to transform such data into real, understandable knowledge are very challenging. While HPC can help overcome the first difficulty, the application of artificial intelligence (AI), and more specifically machine learning (ML), is necessary for the second. Only if HPC and ML work together will they be capable of screening the vast chemical space and predicting the most cost-effective therapy for individual patients [31,32].

Machine learning experts know that with the right data very efficient predictions can be made, as has been demonstrated in several fields such as sports results, injuries, stock market movements, text-based emotions, etc. The field of medicine has not been left behind in this respect and such technology is already used to diagnose or predict diseases such as cancer [33], making it clear that ML, complemented by HPC, represents the future of anti-cancer medicine. Already, ML algorithms are very helpful in many cancer-related tasks, such as the prediction and diagnosis of the disease, predicting its progression, the search for new drug synergies, predicting therapy outcomes and estimating survivability. It is the potential for analysing historical data, learning from the analysis and making predictions for future cases that makes them suitable for application in cancer research. It might even be claimed that ML is the aid that doctors need to increase the accuracy of their predictions and decision making, due to its ability to extract knowledge from previous cases. Evidently, the output of ML systems has to be transformed to make it understandable by healthcare staff; otherwise, we would be wasting an important opportunity.

This critical review highlights the role of ML in each of the main steps of anti-cancer medicine. Section 2 focuses on the needs of doctors, attempting to answer questions like “What kind of ML do doctors need?” and “Does ML output need to be adapted to medical doctors?”. Section 3 presents a review of the typical ML algorithms used in each stage, each subsection describing the most frequently used approaches, which are condensed into a table to facilitate their readability. The most relevant findings observed in Section 3 are discussed in Section 4. Finally, the main conclusions reached and the future of ML in cancer research are summarized.

2. What Kind of ML Is Important in Medicine/Cancer Prediction and Treatment

In this section, we focus on the basic features of an ML system that medical doctors and medical/biological researchers are seeking beyond the output that a trained ML system already provides.

The advantages of ML systems stem from the fact that they can draw on thousands of features to produce decisions in a very short time. It is important to note that the training stage can be expensive in terms of computing power, while the prediction stage is, in comparison, fast and computationally cheap. The correlations that the algorithm finds between the samples are similar to those found by experienced doctors, who have seen hundreds of patients and begun to notice repetitive symptoms or similar values in their detailed medical tests, which helps them to make decisions.

However, no matter how accurate ML systems are, how many lives they can save in principle or whether they are based on the doctor’s entire medical knowledge, they will achieve little if medical/biological researchers do not understand the underlying models and their inferences. If ML systems cannot be explained, they will not be a game changer in medicine, nor will medical/biological researchers use them to make everyday decisions, condemning the whole approach to failure. To achieve any success, ML systems need to gain the trust of medical/biological researchers.

Consequently, our aim was to define three factors that should contribute to the success of ML systems in the medical domain: (i) output interpretability, (ii) linking the predictions to the original cases used to produce outputs and (iii) low data hungriness. In this survey, we analyse existing approaches with respect to these factors because only if substantial attention is paid to all of them will a novel ML approach or system be a game changer in a specific clinical situation. Only if the answer to the question “Do doctors need to know about and learn ML in the future?” is negative can ML add real value to clinical practice.

2.1. Factor One: Output Interpretability

Interpretability in Machine Learning or, in AI in general (XAI), is a hot topic, especially when it is applied to medicine. AI systems tend to return raw results that are hard to understand, which complicates their interpretation by non-expert users, including doctors. Thus, to make AI more attractive to healthcare professionals, we should answer the question “What do doctors need to easily interpret AI predictions?” Interpretability often appears as a desideratum, but it is poorly defined [34]. Hence, a clear understanding of the term interpretability is essential in order to classify existing ML approaches. In general, there are two approaches to interpretability: model interpretability and inference interpretability [35]. Model interpretability relates to understanding how a model behaves in general, whereas inference interpretability aims to describe how systems decide on each instance. Hence, these are two facets of the same problem. However, in both cases, interpretability may be obtained by showing symbols (e.g., natural language or structured languages such as logical forms) to explain models or inferences.
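The distinction between model interpretability and inference interpretability can be illustrated with a minimal sketch. The logistic model below, its feature names and its weights are hypothetical values chosen only for illustration and are not taken from any of the surveyed studies: ranking the weights describes how the model behaves in general, whereas the per-feature contributions describe how it decides for one particular patient.

```python
import math

# Hypothetical, hand-picked weights of a logistic risk model.
# Feature names and values are illustrative only, not taken from any study.
WEIGHTS = {"age": 0.8, "bmi": 0.3, "smoker": 1.2}
BIAS = -1.5


def predict_proba(patient):
    """Probability of the positive class for one patient (dict of features)."""
    z = BIAS + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def model_explanation():
    """Model interpretability: features ranked by absolute weight."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


def inference_explanation(patient):
    """Inference interpretability: each feature's contribution to this patient's logit."""
    return {name: WEIGHTS[name] * patient[name] for name in WEIGHTS}


# One hypothetical patient with features already scaled to [0, 1].
patient = {"age": 0.5, "bmi": 1.0, "smoker": 0.0}
global_ranking = model_explanation()
local_contributions = inference_explanation(patient)
```

For this patient, smoking ranks first in the model-level explanation but contributes nothing to the individual prediction, which is why the two views are complementary rather than interchangeable.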

Since the first AI systems, authors have outlined their stages of inference. For example, Swartout et al. [36] deal with explanations for expert systems, Johnson [37] presents agents that learn to explain themselves and Lacave and Diez [38] discuss interpretation methods for Bayesian networks (BN). In recent years, there has been a strong emphasis on revealing what happens behind the black box that uses AI algorithms [39]. This is necessary if doctors are to trust the results provided by these algorithms and so use them in their daily activities (diagnosis, deciding on the most appropriate treatment, etc.). In comparison with other domains, medicine deals with the uncertain, probabilistic, unknown, incomplete, imbalanced, heterogeneous, noisy, dirty, erroneous, inaccurate and missing data sets in arbitrary high-dimensional spaces [40,41].

Explainable artificial intelligence (XAI) has received much attention in recent years [42]. Two aspects of unsupervised learning models are relevant in the context of interpretability [39]. First, the representations learned by these models may reveal similarities between the data in a class; one such case is word embedding, which can signal semantic similarity between words [43]. Second, these models can generate instances that allow us to study the differences between data within a class. This is useful in medicine, where the discovery and analysis of disease-related abnormalities are important [44].

Trustworthiness in AI is the ability to evaluate the validity and reliability of an ML system across many different input configurations and application environments. This factor is very important in the medical environment, particularly in cancer prediction, where it is necessary to evaluate exactly the limitations of an ML system and, consequently, to interpret accurately and apply with confidence the outputs of ML prediction systems.

Bærøe et al. [45] underline the growing importance of AI and the relative need for trustworthiness in AI systems, especially in the medical environment. In the same work, the authors analyse the report: “Ethical guidelines for trustworthy artificial intelligence” published by the European Commission in 2019 (https://ec.europa.eu/futurium/en/ai-alliance-consultation, accessed on 29 March 2021) and highlight the need for “globalising” the guidelines at both European and international level.

2.2. Factor Two: Linking to Original Cases to Produce Outputs

AI systems often focus on the outputs but do not explain how much each input participates in the result. In a medical context, this correlation between inputs and outputs may be necessary to identify the reasons leading to a given decision.

Assignment methods try to link a certain output of the deep neural network with input variables [39]. In another paper [46], the authors analyse how the output gradients change as the input variables change. In this way, the authors base the explanation of a result on the data that were used as the input of the algorithm and try to link these data to the result obtained. However, in the medical field, although we can still explain the results obtained and see a link with similar cases that formed the basis of a decision formulated by the AI algorithms, there will always be the possibility of making a mistake and exposing the patient to certain risks [47].
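The idea of linking an output to its inputs through gradients can be sketched in a few lines. This is only an illustration under stated assumptions: the function `model` is a hypothetical stand-in for a trained network, and the gradient is estimated by central finite differences rather than by the backpropagation-based methods used in the cited works.

```python
import math


def model(x):
    """Hypothetical stand-in for a trained network: a tiny fixed nonlinear scorer."""
    hidden = math.tanh(2.0 * x[0] - 1.0 * x[1])
    z = 1.5 * hidden + 0.0 * x[2]  # the third input is deliberately ignored
    return 1.0 / (1.0 + math.exp(-z))


def input_sensitivity(f, x, eps=1e-5):
    """Central finite-difference estimate of the output gradient for each input."""
    gradients = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        gradients.append((f(up) - f(down)) / (2 * eps))
    return gradients


# Sensitivity of the prediction to each input of one hypothetical case.
sensitivities = input_sensitivity(model, [0.2, 0.4, 0.9])
```

A large absolute value marks an input whose small change would move the prediction most, while an input the model ignores receives a sensitivity of zero.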

2.3. Factor Three: Data Hungriness

With the widespread application of computer technology in the medical field, the amount of medical data available has increased dramatically and analysis methods are already in use for the intelligent assessment of medical health. In the coming years, we expect the volume of medical data to increase even more, ranging from terabytes to petabytes and even yottabytes [48,49,50].

However, due to the mixed format of medical data, incomplete records and the noise present in them, it is still difficult to analyse large amounts of medical data [51]. Because traditional ML methods cannot efficiently extract a rich body of information from large medical databases, Deep Learning (DL) methods arise to build more complex models based on an idea similar to the way that neurons set interconnections in the human brain. Increasingly, DL models use large medical databases, from which they select and optimize parameters and automatically learn the process of pathological analysis of doctors [52]. Based on these models, the disease in question is identified in an intelligent way and an early diagnosis can be made. Thus, the pressure on the activities of doctors is considerably reduced and the efficiency of their work can be improved.

3. Application of ML Approaches in Cancer Cases

In this section, a number of cases will be discussed to illustrate how ML can help doctors in the different stages of cancer evolution, from its diagnosis to the prediction of survival chances. Each section focuses on one of the main steps targeted by ML in healthcare contexts. Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 summarize a detailed collection of works related to the topic of discussion. The datasets column describes the original source of the data, which may be a specific dataset, a full database, a citation, a project or the institution that collected the samples. The column entitled “Exp?” indicates whether or not the paper considers the interpretability of the results by non-experts. Other relevant information, such as the AI approaches and the software tools used, is also reported. To facilitate the readability of the examples, we present the works in a short table per section.

3.1. Predict the Possibility of Cancer

Currently, most of the studies performed for predicting the possibility of cancer are based on the analysis of genetic data and mutations. Kaminker et al. [54] developed CanPredict software to identify and predict whether certain mutations are associated with tumours or not. The software combines the Sorting Intolerant From Tolerant (SIFT), LogR.E-value score and Gene Ontology Similarity Score (GOSS) methods by applying an advanced Random Forest (RF) classification scheme. Capriotti and Altman [55] used support vector machines (SVM) to analyse different databases, each created with an equal number of cancer driver Single Amino Acid Polymorphisms (SAPs) and neutral SAPs. Using this technique, it is possible to predict whether a given missense SAP is neutral or is involved in cancer appearance. In their study, the authors achieved an effectiveness greater than or equal to 90% in the overall predictions.

Taninaga et al. [61] describe how a set of characteristics related to gastric cancer can be processed using extreme gradient boosting (XGBoost) decision tree algorithms or logistic regression (LR) methods to predict whether a patient is at risk of developing the disease over the next 122 months. In this study, 10 models were developed. For the first five, the authors used XGBoost: the first model only took into account Helicobacter infections, the second added data on chronic atrophic gastritis, the third added endoscopic findings, the fourth added biological background factors and the fifth also included blood tests. The other five models were identical but applied linear logistic regression instead of XGBoost. The performance of each model was measured using the area under the curve (AUC) value. As a result of the research, the most influential characteristics in the development of gastric cancer were seen to be the mean corpuscular volume, the proportion of lymphocytes, age, body mass index (BMI) and postgastrectomy. Finally, AUC values of 0.899 and 0.874 were obtained with the 5th and 10th models, respectively, the authors concluding that with these models it is possible to predict whether a patient might suffer from cancer.
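Since the AUC is the performance measure used throughout this section, a minimal sketch of how it can be computed may help. The function below uses the rank interpretation of the AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one); the labels and scores are invented for illustration and do not come from any of the studies discussed.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = developed the disease, 0 = did not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
example_auc = auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported in the text.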

According to the American Cancer Society, 3.3 million people are diagnosed with skin cancer annually. A prediction of the risk of suffering Non-Melanoma Skin Cancer (NMSC) was made [62] using 13 items of personal patient data that can easily be obtained from an Electronic Medical Record (EMR): gender, age, BMI, diabetic status, smoking status, emphysema, asthma, race, Hispanic ethnicity, hypertension, heart diseases, vigorous exercise habits and history of stroke. These input parameters were first normalized to values between 0 and 1 and an artificial neural network (ANN) model was developed based on one input layer with 13 nodes, two hidden layers with 13 nodes and one output node. The authors used 462,630 cases, taking 70% of the cases for training and the remaining 30% for validation, and obtained an AUC value of 0.81. The study concluded that including the two most important factors in skin cancer, i.e., radiation and personal history, would very likely improve the risk predictions of the model.
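The preprocessing described above (scaling every input to the range 0 to 1 and splitting the cases 70/30) can be sketched as follows. This is a minimal illustration, not the authors' code: the three-column records (age, BMI, smoking status) are invented, the shuffling seed is an arbitrary assumption, and the ANN itself is not reproduced.

```python
import random


def min_max_normalize(rows):
    """Rescale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_validation_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows reproducibly and split them into training and validation sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    return [rows[i] for i in indices[:cut]], [rows[i] for i in indices[cut:]]


# Invented records: [age, BMI, smoker (0/1)].
records = [
    [34, 22.1, 0], [51, 27.4, 1], [46, 30.2, 0], [63, 24.8, 1], [29, 19.5, 0],
    [72, 28.9, 1], [58, 33.0, 0], [40, 25.3, 0], [67, 21.7, 1], [55, 26.6, 0],
]
normalized = min_max_normalize(records)
train, validation = train_validation_split(normalized)
```

In practice the minimum and maximum would usually be estimated on the training portion only and then applied to the validation portion; the simpler order is used here to mirror the description in the text.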

Martínez-Más et al. [63] combined different ML techniques with features obtained by Fourier transform (FT) to classify ovarian tumours as benign or malignant, using ultrasound images. After extracting 187 features from the ultrasound images using FT, they were used as input features for k-nearest neighbours (KNN), linear discriminant analysis (LDA), SVM and extreme learning machine (ELM). For this, different kernels were analyzed to obtain the optimal configuration and it was seen that the combinations of FT with LDA, SVM or ELM are good classifiers for biomedical images, providing an accuracy of more than 85%.

Breast cancer (BC) is one of the most common types of cancer in women. For prediction purposes, a regular analysis of mammographic images is required. Approaches for estimating the probability of malignancy of the tumour fall into three categories: prognostic models, computer-aided detection and computer-aided diagnosis. Ayer et al. [58] proposed a method for accurately predicting BC using ANNs, with particular emphasis on calibration, assessed by means of the Hosmer–Lemeshow goodness-of-fit test. The network topology has three layers: the first with 36 input nodes (mammographic descriptors, demographic factors and BI-RADS), a hidden layer with 1000 nodes and an output layer with 1 node. They trained the network using a cross-validation method on 62,219 records and then compared the results obtained through their model with the predictions of eight radiologists. The fact that the ANN obtained an AUC value of 0.965 and the radiologists a value of 0.939 demonstrates the good predictive capabilities of ANNs, which can, therefore, be considered a reliable support tool.
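Calibration asks whether predicted probabilities match observed event rates, which is a different question from discrimination as measured by the AUC. The sketch below computes the standard Hosmer–Lemeshow statistic over equal-sized risk groups; the grouping strategy, the number of groups and the toy data are assumptions made for illustration, and the p-value (a chi-square tail probability with groups minus two degrees of freedom) is not computed here.

```python
def hosmer_lemeshow(labels, probabilities, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups."""
    pairs = sorted(zip(probabilities, labels))  # order cases by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / len(chunk)
        variance = len(chunk) * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data: ten cases predicted at 0.2 and ten at 0.8.
probabilities = [0.2] * 10 + [0.8] * 10
calibrated_labels = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
miscalibrated_labels = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5

good_fit = hosmer_lemeshow(calibrated_labels, probabilities, groups=2)
poor_fit = hosmer_lemeshow(miscalibrated_labels, probabilities, groups=2)
```

A statistic close to zero indicates that observed and expected event counts agree in every risk group, while larger values signal miscalibration.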

Predictions of the risk of developing BC in the short term can be made by comparing the distribution of volumetric breast density of both breasts based on mammographic image analysis [59]. The authors proposed a model based on a Convolutional Neural Network (CNN), which converts an image into a characteristics vector, then applied a Locality Preserving Projection (LPP) algorithm to reduce the features obtained by the network, finally obtaining a vector with 44 characteristics. Classification was then carried out, comparing two classification methods, SVM and KNN. The model was trained through a cross-validation using 500 mammographic images, which provided an AUC value of 0.62 for SVM and 0.60 for KNN. In order to further optimize the accuracy of the model, the AUC values were calculated for each of the 44 characteristics and then sorted according to these values. Subsequently, the least relevant characteristics were eliminated, by testing the model based on a range of 2 to 10 characteristics. With 10 features and using KNN, an AUC value equivalent to 0.64 was obtained, which was better than when using 44 features. The best configuration was achieved using LPP-KNN, reducing the regenerated features to four. This gave an AUC value of 0.68 for the short-term prediction of BC (less than 5 years).
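The feature-ranking step described above (scoring each characteristic by its individual AUC and keeping only the best ones) can be sketched as follows. This is an illustrative reading of that step rather than the authors' implementation: the CNN and LPP stages are not reproduced, the feature matrix is invented, and features are ranked by raw AUC, which is an assumption.

```python
def auc(labels, values):
    """AUC of a single score: probability that a positive outranks a negative."""
    positives = [v for y, v in zip(labels, values) if y == 1]
    negatives = [v for y, v in zip(labels, values) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def top_k_features(rows, labels, k):
    """Rank feature columns by their individual AUC and return the k best indices."""
    scored = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scored.append((auc(labels, column), j))
    # Highest AUC first; ties broken by column index for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


# Invented feature matrix: four cases, three features.
rows = [
    [0.9, 0.5, 0.1],
    [0.8, 0.4, 0.7],
    [0.2, 0.6, 0.3],
    [0.1, 0.5, 0.9],
]
labels = [1, 1, 0, 0]
selected = top_k_features(rows, labels, k=2)
```

The reduced columns would then be passed to the classifier (KNN or SVM in the study) in place of the full feature vector.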

The risk of developing BC can be predicted through the identification of Single Nucleotide Polymorphisms (SNPs) in DNA that contribute most to its development [60]. To identify them, a three-stage protocol is implemented: (i) the SNPs are selected using a gradient boosting classification technique, XGBoost; (ii) based on the XGBoost output data, an adaptive iterative search for SNPs is made, sorting the results downwards according to their scores; the M best-scored results and the M worst-scored ones are selected and are separately ordered from lowest to highest; this process is repeated, increasing the size of M until both lists overlap; (iii) the top SNPs are chosen and classified with SVM, representing an optimal group that can potentially predict the risk of BC. The protocol is implemented in Python with the libraries sklearn and xgboost, among others, and can be downloaded from GitHub.

DNA methylation is known to play a major role in tumorigenesis. BIGGIOCL [56] is a tool that can be used to analyse hundreds of thousands of individual data in a few hours. Although it was designed to analyse DNA and CpG Islands, the authors specify that it could be adapted to other fields. The tool, developed in Java and based on the MLlib learning library, allows work to be parallelized across multiple machines. One of the reasons for implementing RF when developing the software was its parallelization capability, which allows a tree of the forest to be executed in each node and the information to be sent to the master node. As it is based on MLlib, it can be used in a Yet Another Resource Negotiator (YARN) environment. In the publication, the authors analysed data from HumanMethylation450 to check its relationship with BC and obtained a direct relationship with the genes RP53, PIK3CA, BRCA1, BRCA2 and BDNF, results that match those previously published by other authors.

Another type of cancer that is frequent in both men and women is CRC. Myte et al. [57] carried out the first study relating the one-carbon metabolism (1CM) pathway to cancer risk in humans by applying a BN. The work was motivated by the observed relationship between compounds of 1CM and CRC and by the lack of empirical studies proving the impact of 1CM and SNPs on CRC. The study collects data from blood samples, one per patient, and uses a BN to relate population-based data, SNPs and the metabolic pathways involved in 1CM. The authors suggested that the most important factors in colorectal tumorigenesis are the associations between folate, vitamin B6 and vitamin B2 and concluded that these compounds should be taken into account in future studies of 1CM and the development of CRC.

Lifestyle is important for disease prevention. In the case of lung cancer particularly, there are certain habits or external factors that can increase the risk of contracting the disease. In the study of Chen and Wu [53] a set of data concerning demographics, disease, radiation, behaviour, environment and smoking was analysed in a group of adult patients. The authors used a CNN to identify which of these factors are the most important in the development of this type of cancer. The study divided the samples into four groups: (i) men over 64 years, (ii) women over 64 years, (iii) all those over 64 years and (iv) all those over 17 years. The four sets of data were then converted into Hierarchical Data Format 5 (HDF5), which is designed to store and organize large amounts of data and is used by Caffe, a Deep learning framework, to import the data into their CNNs. After training the model with a cross-validation, it achieved an AUC prediction value of 0.913 and, of all the risk factors for lung cancer examined in those over 64 years of age, smoking was the most important.

In Martínez-Mas et al. [64], the authors propose a novel method for the early detection of cervical cancer, which is one of those with high mortality in women. Frequently, the automatic classification of medical images does not pre-clean the images to remove overlaps, which does not reflect the reality of the images obtained directly from the medical samples. To overcome this issue, the authors implemented an artificial cell merger approach to improve the efficiency and realism of the classification model using CNN and without ruling out blurred, overlapping cells, etc. This approach showed a classification accuracy of 88.8%, obtaining a sensitivity and specificity of 0.92 and 0.83, respectively.
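Accuracy, sensitivity and specificity, which are reported here and in several of the studies below, are all derived from the same four confusion-matrix counts. The following minimal sketch shows the relationship; the labels and predictions are invented and unrelated to the cervical cancer data.

```python
def classification_metrics(labels, predictions):
    """Accuracy, sensitivity and specificity from binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives correctly rejected
    }


# Invented example: 1 = abnormal, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(labels, predictions)
```

Because accuracy mixes both error types, a model can reach high accuracy on imbalanced data while missing many positive cases, which is why sensitivity and specificity are usually reported alongside it.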

3.2. Predict Cancer Recurrence

Once the cancer is diagnosed, one of the main concerns is the possibility of recurrence or metastasis. In this line, Exarchos et al. [71] used a data set comprising clinical, image and genomic data to provide a multiparametric system to detect recurrence in squamous cell carcinoma using BN, ANN, SVM, decision trees (DT) and RF classifier algorithms and ROC curve assessments. The best results in terms of accuracy were obtained for the BN classifier (78.6% accuracy for clinical data, 82.8% for images and 91.7% for genomic data). Kim et al. [134] studied the recurrence of BC over 5 years using SVMs, ANNs and regression analysis; in this case, the SVM model gave the best results in terms of accuracy (89%). It should be noted that, in the same study, the characteristics of the models were selected on the basis of the mutual information provided by the input characteristics. In the same line of detecting recurrent BC, Park et al. [70] used genetic information to create a graphical model based on semi-supervised learning (SSL) through gene pairs that indicate strong biological interactions, in this case for both breast and colon cancer. This graphical model proved to be quite accurate in predicting the recurrence of breast and colon cancer (80.7% and 76.7%, respectively). The SSL technique was seen to be very interesting when very few labelled samples are available, which is a fairly common problem for this type of data set.

In Ahmad et al. [68], three ML methods (DT, ANN and SVM) were compared for predicting BC recurrence by analysing sensitivity, specificity and accuracy. The C4.5 algorithm was used for the DT. Accuracies of 0.936, 0.947 and 0.957, respectively, were obtained. This work showed that SVM had the lowest error rate and the highest accuracy for predicting the recurrence of BC. In Tseng et al. [72], SVM, DT and extreme learning machines (ELM) were used to predict the recurrence of cervical cancer. Of these three methods, DT obtained the best results, especially when using the C5.0 algorithm (92.44% accuracy). The following were analysed in the study: Pathologic Stage, Pathologic T, Cell Type and RT Target Summary.
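Since sensitivity, specificity and accuracy are used throughout this section to compare classifiers, a short sketch of how they follow from the confusion matrix may be helpful. The counts below are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recurrences correctly detected
        "specificity": tn / (tn + fp),   # non-recurrences correctly detected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a test set of 100 patients (assumption).
metrics = classification_metrics(tp=45, fp=4, tn=46, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when recurrence is rare, which is one reason why the studies above report sensitivity and specificity alongside it.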

Another way of approaching cancer prediction is to make individual predictions for each patient. Ferroni et al. [69] studied this approach using SVM and Random Optimization (RO) to predict BC in individual patients. In addition to prediction, the model allowed patients with low and high risk of cancer progression to be differentiated. The authors concluded that the use of ML algorithms (specifically SVM) with RO allows the creation of an efficient model for customization in the prediction and recurrence of BC.

Two studies by Lu et al. and Xu et al. [65,66] worked on the early identification of CRC recurrence. In the first paper, several treatments were analysed and good results were observed in patients who were sensitive to FOLFOX (5-FU, leucovorin and oxaliplatin). The authors used ML algorithms (more specifically KNN, SVM, gradient boosting machines (GBM), ANN, DT and RF) to identify the differences in genes between patients who respond to FOLFOX and those who do not respond in cases of CRC recurrence. They concluded that SVM and RF are the most effective ML methods for predicting FOLFOX response. In the second paper, ML techniques (LR, DT, Light GBM, GBM) were likewise used to study the impact of treatments once CRC had been detected. Light GBM and GBM were found to be the most efficient for detecting the reappearance of CRC, and the factors that most influenced the reappearance of tumours were chemotherapy, age, carcinoembryonic antigen and anaesthesia time.

3.3. Predicting Cancer Progression

Tumours can change over time, getting bigger, becoming malignant or undergoing metastasis [135] in an evolutionary process that involves cancerous cells [136]. Tumours evolve in different ways in different patients. The REVOLVER (Repeated EVOLution in cancER) method [76] applies the so-called Transfer Learning (TL) approach to forecasting cancer progression. While the standard procedure infers uncorrelated models for each individual patient depicted by phylogenetic trees containing noisy data, REVOLVER uses TL to correlate models obtained from different patients and identify similarities in those tumours that evolve in a similar manner. The idea behind TL is to store the knowledge obtained while solving one problem and to apply this knowledge, when possible, in the resolution of a similar task. Thus, the knowledge extracted from one sample is transferred to another. As input, REVOLVER uses a set of Cancer Cell Fractions (CCF) or any other genetic alteration that can be represented in binary format. It then follows a two-step process: (i) it calculates a set of correlated evolutionary trees, which are numerically scored, describing the evolution of each patient’s tumour; and (ii) it computes the evolutionary trajectories for each group of input alterations depicted in a tree that shows the number of times an alteration occurs among other values. This method was used to analyse a collection of datasets for lung, breast, renal and colorectal cancer based on 768 samples and identified interesting genomic trajectories that were judged to merit further study (e.g., CDKNA → TP53 → TERT, TP53 → PIK3CA → −8p → +8q).

An alternative to TL for studying mutation timelines is the Long Short-Term Memory (LSTM) network, a type of recurrent neural network (RNN) with the ability to learn long-term dependencies from a sequence of events. LSTM takes advantage of the temporal nature of mutation trajectories. With this type of algorithm, mutations can be sorted by occurrence time to provide an explanation of tumour evolution [77]. The authors trained an LSTM with 5 hidden layers to predict the number of mutations present in each tumour, the so-called mutational load. The model was trained on two datasets containing CRC and lung cancer samples. In fewer than 100 epochs it reached an AUC of 0.95. It is also possible to predict the genes that are present in such mutations and identify a set present in both types of cancer (e.g., titin, mucin-16, nesprin-1). Finally, the authors reported that the last 20 mutations are highly correlated with the mutational load. To validate their model, they implemented an SVM model that exhibited lower performance than the LSTM, probably because the relationship between mutations is non-linear.

The state of a BC usually depends on several factors, such as the tumour size and cellularity. The presence of tumoral cells in the lymph nodes is the most reliable marker, and the expression of the S100A4 and nm23 genes is the most effective predictor of their status. In order to investigate the predictive power of these genes, together with tumour size and grade, a set of 15 ANNs was trained on 16 BC samples and tested against another 16 [79]. The results confirmed the expression of the S100A4 and nm23 genes as the most effective predictor and indicated that the inclusion of other markers (e.g., ER/PgR expression) could improve the accuracy.

Simpler ML approaches, such as LR, can also help in predicting cancer progression [80]. The method works in the knowledge that Transforming Growth Factor beta (TGF-β) is involved in the acquisition of heterogeneity by tumours [137]. This fact means that TGF-β is responsible for promoting tumour evolution, thus, complicating cancer prognosis. The activation of TGF-β signalling contributes to the acquisition of malignant properties by head and neck squamous cell carcinoma (HNSCC). However, the effects of TGF-β on lipid metabolism remain unclear. In this context, the authors aimed to develop an ML-based algorithm to detect intratumoural TGF-β-stimulated areas in clinical HNSCC tissue without recourse to a conventional immunohistological examination. For this purpose, Logistic Regression of the mass spectra of HNSCC-stimulated and non-stimulated human cells was carried out on the public datasets GSE57441 and GSE9844. The LR algorithm accurately segregated stimulated and non-stimulated cells reaching a classification accuracy of up to 98%. This finding demonstrates that simple ML approaches, despite their limitations, can also be helpful in predicting cancer progression.

Metastatic Skin Cutaneous Melanoma (SKCM) has been demonstrated to arise from factors such as the expression of mRNAs and miRNAs and aberrations in methylation patterns [138,139]. To understand how skin melanoma progresses, a combination of feature selection methods and ML classifiers has been used [81]. The data, including mRNA, miRNA and methylation expressions from The Cancer Genome Atlas (TCGA) database, were split into 80% for training and 20% for testing, giving training datasets of 371, 354 and 371 samples, respectively. First, three feature selection methods, namely Weka-FCBF, SVM with L1 regularization (SVM-L1) and Principal Component Analysis (PCA), were applied to reduce the number of input features so that subsequent analysis could focus on the most discriminative characteristics. The Jaccard index was calculated to select the best method. In this step, SVM-L1 outperformed the other methods by selecting the 17 features that were used in the next stage. Secondly, six classification models were developed, of which support vector classification with weight (SVC-W) performed best, obtaining 0.95 AUC and 89.4% accuracy in an external validation test. The other classifiers were ExtraTrees, KNN, RF, LR and Ridge classifier. The models were assessed using different metrics, including AUC, the Matthews coefficient, sensitivity, specificity and accuracy. In conclusion, the authors reported a collection of genes that could be considered relevant markers of cutaneous melanoma metastasis (e.g., ESM1, NFATC3, C7orf4).
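Two of the quantities mentioned above, the Jaccard index and the Matthews coefficient, are simple to compute. The sketch below shows both under stated assumptions: how exactly the Jaccard index was applied in [81] is not described here, so comparing the feature sets selected in two runs is only one plausible use, and the gene names GENE_A, GENE_B and GENE_C as well as the confusion-matrix counts are placeholders.

```python
from math import sqrt


def jaccard_index(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def matthews_coefficient(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical feature sets selected in two resampling runs (assumption).
run_1 = {"ESM1", "NFATC3", "GENE_A", "GENE_B"}
run_2 = {"ESM1", "NFATC3", "GENE_A", "GENE_C"}
stability = jaccard_index(run_1, run_2)

# Hypothetical confusion-matrix counts (assumption).
mcc = matthews_coefficient(tp=45, fp=4, tn=46, fn=5)
print(f"Jaccard index: {stability:.2f}, Matthews coefficient: {mcc:.3f}")
```

A Jaccard index close to 1 indicates that two selections largely agree, while the Matthews coefficient ranges from −1 to 1 and remains informative on imbalanced classes.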

3.4. Calculating Drug Doses or Drug Combinations

It is commonly accepted that the administration of drug combinations rather than monotherapy can increase treatment efficacy [140]. This approach is nowadays limited by the huge size of the chemical space, which makes the identification of novel drugs very difficult and, consequently, complicates the choice of effective drug combinations. In order to perform a cost-effective screening of this chemical space, DL methods are gaining in importance. For example, the DeepSynergy tool [89] aims to predict the most efficacious anti-cancer multi-drug treatments by means of DL. DeepSynergy provides an ANN, which is implemented with the TensorFlow framework and outperformed other ML methods, such as GBM, RF, SVM and Elastic Nets, in a benchmark on the largest synergy dataset. However, the performance of all these methods decreased when exploring new datasets of different sizes and data distributions, a typical problem of ML approaches that remains a challenge today. In the same line, Celebi [86] published a study to identify functional anti-cancer dual therapies, an approach whereby two single-target drugs work in synergy to cure a disease. The authors evaluated five ML methods (LR, Lasso, SVM, RF and GBM) implemented with the sklearn and xgboost Python libraries. All the models were trained on a novel dataset released by AstraZeneca and the Dialogue for Reverse Engineering Assessments and Methods consortium [141]. The assessment showed that GBM outperformed the other methods in synergy identification. It is interesting to mention that the study included a variant of LR, the so-called Lasso [142], which is an L1-regularized version of LR that reduces overfitting in the model.
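The way an L1 penalty such as the one used by Lasso limits overfitting can be illustrated with the soft-thresholding operator, which is the shrinkage step applied to each coefficient in standard coordinate-descent and proximal-gradient solvers. This is a generic sketch, not the implementation used in [86]; the coefficient values and the penalty strength are made up.

```python
def soft_threshold(weight, penalty):
    """Shrink a coefficient towards zero; small coefficients become exactly zero."""
    if weight > penalty:
        return weight - penalty
    if weight < -penalty:
        return weight + penalty
    return 0.0


# Hypothetical unregularised coefficients and penalty strength (assumption).
coefficients = [0.9, -0.05, 0.3, 0.02, -1.2]
penalty = 0.1
shrunk = [soft_threshold(w, penalty) for w in coefficients]
selected = [i for i, w in enumerate(shrunk) if w != 0.0]
print(shrunk, selected)
```

Coefficients whose magnitude does not exceed the penalty are set to zero, so the corresponding features drop out of the model, which yields a sparser and usually more interpretable predictor.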

In addition to deciding on the drug combination to be administered, identifying the exact dose is crucial for creating personalized cancer therapies. However, despite the importance of these points, research into them lags behind that on estimating cancer risk or predicting therapy outcome. EON software [85], a component-based decision support system (DSS) that was developed to build healthcare protocols at a high level of abstraction, represented a first attempt to use AI to build reusable software capable of helping doctors. Its modular design makes it easy to add and replace components, and the graphical interface means that it is accessible to any user, even those lacking advanced computer skills. A major advantage of EON is that, once designed, the protocols can be reused for any disease with minimal adaptations; for example, different types of cancer or AIDS might share the same protocol. With regard to drug dose estimation and the optimal application time, EON includes the Chronus temporal query system, which implements a specific algebra for writing temporal queries and can be extended with the Catenation operator. This operator is able to identify adjacent periods and merge them into a single one, making it possible to know when and for how long a patient was given a certain drug combination. This information, along with the therapy outcomes for the same periods, can help analyse the effectiveness of a drug synergy, providing useful information for future cases.
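The behaviour described for the Catenation operator, merging adjacent periods into a single one, can be sketched in a few lines. This is not the Chronus implementation or its query syntax: the function name, the inclusive end dates and the one-day adjacency tolerance are assumptions made for illustration, and the treatment periods are invented.

```python
from datetime import date, timedelta


def catenate(periods, max_gap=timedelta(days=1)):
    """Merge (start, end) periods that overlap or lie at most max_gap apart."""
    merged = []
    for start, end in sorted(periods):
        if merged and start - merged[-1][1] <= max_gap:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical periods during which one drug combination was administered.
periods = [
    (date(2020, 1, 1), date(2020, 1, 14)),
    (date(2020, 1, 15), date(2020, 1, 28)),  # starts the day after the first ends
    (date(2020, 3, 1), date(2020, 3, 14)),   # separated by a long gap
]
merged = catenate(periods)
for start, end in merged:
    print(start, end, (end - start).days + 1, "days")
```

The first two periods are joined into one continuous 28-day exposure, while the third remains separate, which is the kind of information needed to relate treatment duration to therapy outcome.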

A recently published work [143] summarizes the main advances of AI for treating head and neck cancer patients. A key factor when planning treatments for this cancer is the intensity modulated radiotherapy (IMRT) dose prediction. The manuscript describes the way ANN [82,83], CNN [84,91,92] and tree-based methods [90] are currently applied to resolve classification problems from a collection of images. The aim of this sort of protocol is to identify the most effective dose for each patient. Tree-based methods try to mimic the thinking of an expert clinician who, looking at a set of images of a new patient, identifies the past patient with the most similar images and maps the dose distribution administered to that past patient in order to assess the optimal treatment for the new patient. To do this, a collection of features is extracted from the images to build a dataset of structured data that can be handled by most ML algorithms. This approach reached 78.68% and 86.83% accuracy in breast and prostate cancer, respectively, when the Gamma metric was used. The main drawback of tree-based algorithms that work in this way is that their accuracy is closely coupled to their core steps: extracting descriptive features from the source images, identifying a similar patient on the basis of such descriptive features and adapting the past dose to the new patient. The alternatives to the tree-based methods used in the above work are fully connected ANNs with two layers, which are easy to train but which do not conserve memory and may suffer overfitting. Whatever the case, the prediction error reported was lower than 10% [83]. Fortunately, CNNs are very good at predicting volumetric information, the most suitable types being Tiramisu and Dilated CNNs (DCNN). Tiramisu models work in two steps: (i) encoding the input image to extract the most descriptive features; and (ii) decoding the information to restore it to the initial size. When the dose volumes are consistent with respect to the anatomy (e.g., in prostate cancer), Tiramisu models are the preferred option [84]; otherwise (e.g., head and neck cancer), DCNNs are preferable.

Frequently, gene mutations are detected in cancer patients, and discovering the relationship between these genetic variations and drug responses has led to the ability to identify which patients might profit most from certain drug synergies. However, the results of clinical trials in their advanced stages must exhibit a significant improvement over standard therapy. Thus, clearly defining groups of patients in which a novel drug may be more effective than the existing ones could help lead to individualised therapies and, as a consequence, this has become a target of ML. An unsupervised learning approach based on multivariate analysis (MVA) of undirected graphs [87] was performed to classify patients into well-defined subpopulations. The statistical methods were implemented with R packages and the input datasets were collected from the GDSC (https://www.cancerrxgene.org/, accessed on 29 March 2021), CCLE (https://portals.broadinstitute.org/ccle, accessed on 29 March 2021) and CTRP (https://portals.broadinstitute.org/ctrp/, accessed on 29 March 2021) databases. As a result of this work, the SEABED (Segmentation and Biomarker Enrichment of Differential Treatment Response) platform was developed and used in several examples, in one of which the authors aimed to assess the response to a combination of two drugs, namely A and B. To accomplish this, they segmented patients into subpopulations depending on their response to the therapies, considering AUC and IC50 as metrics. They also provided a graphical representation of the results in a tree whereby the identified subpopulations were coloured depending on whether they exhibited sensitivity to both drugs, to A, to B or to neither, which is important for facilitating interpretation of the results. Then, the authors chose a BRAF and a MEK inhibitor and discovered that the subpopulation sensitive to A was enriched for BRAF mutations and the one sensitive to B was enriched for MEK mutations. 
This approach is generic enough to be used for the analysis of any type of cancer sample, independently of its particular characteristics and can also be of great use for predicting tumour progress.

As can be inferred from Table 4, image processing is a key procedure when estimating drug doses and finding effective drug combinations. To satisfy the need for powerful image processing algorithms, CNNs have shown themselves to be a capable alternative to traditional ANNs. In parallel, new frameworks (e.g., TensorFlow, PyTorch) have been developed to exploit all the computing power of graphical processing units (GPUs) and accelerate image analysis. When there are no images available or their inspection is not suitable, other statistical methods and classifiers (e.g., LR, RF, MVA) can be fed with a diverse collection of data types. According to Table 4, the interpretability of the results is not the main concern of scientists. Very few of the works try to adapt the output of their models to make it understandable by doctors or use easily interpretable models (e.g., DT, BN). Whatever the case, the extensive use of image processing with CNN makes some models easier to understand than raw numerical results.

3.5. Predict Treatment Outcome

In the move towards personalized therapies, the prediction of therapy outcome is essential. Although there are several works in which AI is used to estimate a tumour’s evolution after therapy for colorectal [95,101], breast [102,103,104], blood [106], renal [107], ovary [108] or oesophageal [109] cancer, this topic remains a major challenge for scientists.

Classification, regression and clustering algorithms have frequently been used to resolve this sort of issue. As an example of the classification approach, a DT was implemented using the SPSS statistical package to diagnose and predict therapy outcome for bladder cancer patients [105]. The work showed how nearly 950 patients could be classified into three groups with different recurrence-free and overall survival probabilities. DTs have the advantage of being very intuitive and easy for medical doctors to interpret, which is one of the main aims of health-related ML. A similar statistical analysis for classification purposes was carried out with a BN implemented with Number Cruncher Statistical Systems (NCSS) on a dataset of CRC patients [96]. In this case, the positive prediction rate ranged from 78% to 84% when estimating recurrence for the training dataset extracted from the ACTUR database. The main limitation of this work is data reliability and consistency due to the military nature of some institutions feeding the data source, which lack approved programs for cancer treatments. RF is another widely used classification algorithm that has already been used to predict the response to FOLFOX (5-FU, leucovorin and oxaliplatin) therapy [95]. The model was able to correctly predict 69.2% of cases in the test set. The relationship between genomic alterations and drug responses is a factor that could lead to enhanced individual therapies. Although both genomic features and chemical properties have been computationally analysed, there is still a lack of works studying both factors together. To shed some light on this topic, ANNs and RF were used to predict therapy outcomes [97]. The core of this work was the implementation of a three-layer ANN. The inputs were 608 cell lines and 111 drugs; between 1 and 30 hidden nodes were tested to find the best performing architecture, and the predicted IC50 value was the only output. Note that the IC50 value is normalized in the range [0,1] by the sigmoid function added in the output layer. Based on the R² performance metric, the model obtained 0.64 on the test dataset extracted from the GDSC portal and 0.61 on an external validation dataset. Then, an RF implemented in R was developed to ascertain whether the ANN model could be improved, but it resulted in an R² of 0.59 on the blind test dataset, which is a slightly lower value than that achieved by the ANN model. Although the results look promising, the model has some limitations that could be overcome by adding more cell lines, epigenetics data and gene expression data as inputs. Classification algorithms could also help in identifying potential biomarkers, which is another topic that has received increasing attention in recent years. An R-implemented RF [99] for this task achieved 81% accuracy in the validation dataset. A feature selection step is carried out in this study before the classification. Reducing the dimension of the input makes the classifier faster and facilitates interpretation of the results by clinicians.
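The two technical ingredients mentioned for the ANN in [97], a sigmoid output that keeps predictions in [0,1] and the R² metric used to score them, can be sketched as follows. This is only an illustration of the formulas under assumed toy values; it does not reproduce the network or the data of the cited study.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function mapping any real value into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalised IC50 targets and raw output-layer activations (assumption).
normalised_ic50 = [0.2, 0.4, 0.6, 0.8]
raw_outputs = [-1.1, -0.6, 0.6, 1.1]
predictions = [sigmoid(z) for z in raw_outputs]
score = r_squared(normalised_ic50, predictions)
print([round(p, 3) for p in predictions], round(score, 3))
```

An R² of 1 corresponds to perfect predictions, and a value of 0 means the model does no better than always predicting the mean of the observed values.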

The diversity of classification and regression algorithms makes scientists wonder about the best choice for building new models and benchmarking their own. To fairly assess some of the most typical classifiers, an extensive study was carried out [121] with a set of algorithms. Six classifiers were evaluated on twelve datasets related to different cancers (lung, head, neck, meningioma and laryngeal), using the AUC as the measure of how well each one can be expected to perform on future data. Although none of the algorithms stood out over the others, RF and Elastic Net Logistic Regression (ENLR) exhibited a higher discriminative power in chemo and radiotherapy outcome. Therefore, it is suggested that they might be the first choice when building classification models. The authors also claim that RF and ENLR should be the preferred option against which custom models should be compared.
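Because the AUC is the yardstick in this comparison, a small sketch of what it measures may help. The function below uses the pairwise (Mann–Whitney) formulation: the AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counting as one half. The labels and scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = responder) and classifier scores (assumption).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric comparable across datasets with different class balances.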

Many other supervised learning approaches can be found in the literature. Most of them exploit datasets from the National Center for Biotechnology Information (NCBI) or collected from local institutions. SVMs are commonly adopted to predict tumour progress after therapy and are especially helpful when predicting FOLFOX therapy results in CRC patients because this type of algorithm usually works with images. When working alone, SVM reached a positive prediction rate of 85.4% [98], which is similar to that obtained by RF. However, SVM can also be combined with LR to provide a novel scoring method to measure the tumour size response to therapy, as it outperforms the traditional WHO and RECIST measurements [100].

Recent studies assessed a variety of ML methods in CRC prediction scenarios. Lu [65] compared six models implemented with R packages in a FOLFOX response prediction task. The models represented the following approaches: RF, SVM, ANN, DT, KNN and GBM. The experimental tests showed that RF and SVM were the most accurate methods when predicting FOLFOX outcome. Unfortunately, their performance fell off when predicting other therapies such as FOLFIRI (5-FU, leucovorin and irinotecan); therefore, their application to future patients is limited. The reason for this reduction in performance when using alternative therapies seems to be related to the aforementioned use of unexplored datasets with different characteristics, which would indicate a close relationship between the model and the training data. The third best-ranked classifier was the ANN model, whose accuracy was close to that of RF and SVM but was more consistent when confronted with other therapies. This result suggests that ANNs constitute a powerful predictive tool for future CRC studies. In another work [66], the authors assessed four ML methods (LR, DT, GBM and Light GBM) and found GBM and Light GBM to be more accurate than the others. This evidence leads us to think that GBM will probably gain in importance in the near future. Finally, the rapid development of ANN and its variants (e.g., recurrent neural networks, convolutional neural networks, adversarial neural networks) has encouraged scientists to develop enhanced and more powerful networks capable of profiting from HPC architectures. As a result of that evolution, several libraries (e.g., TensorFlow) are widely used nowadays. Tensor-based networks are especially useful for image processing due to their ability to exploit all the computing power of GPUs to analyse images in a parallel manner. This novel ML paradigm has been used to build a CNN model that anticipates the outcome after resection based on a dataset of 12 million images [94].

The poor interpretability of the results is a challenge that needs to be faced. Raw estimations or complicated charts might be unintelligible to doctors and may render any ML algorithm worthless for practical reasons. The data types feeding ML systems intended to predict therapy outcomes are very different, ranging from binary data to well-structured records (e.g., Excel, CSV, database records). In this step, the application of image processing through CNN is not so frequent, as explained in the previous section, but still constitutes the preferred approach when manipulating images, as can be seen in Table 5.

3.6. Predicting Survival Likelihood

Once cancer has been diagnosed, classified and treated, the next questions are how the tumour will evolve and how likely the patient is to survive. The former was already answered in Section 3.3, so this section will focus on the available ML methods for the latter. Note that the works introduced in Section 3.5 do not necessarily predict the survival chances in months, for example, but are more likely to focus on how the treatment will reduce the tumour size. The prognostication of a patient’s survivability is not easy and depends on many factors, such as the type of cancer and the stage. Fortunately, ML can help doctors evaluate survival chances by analysing several biomarkers in a systematic manner. With the aim of answering this question, Zhu [144] summarizes an extensive collection of works concerning the use of DL in cancer prognosis, including some that estimate the survival likelihood and even the survival time.

According to a recent review [33], SVMs provide the most accurate predictions of cancer survival. Although all the analysed models were trained on small datasets, they are able to reach up to 98% and 97% accuracy in oral [128] and breast [123] cancer, respectively. Other approaches such as ANNs [72] and BNs [125] are showing good results as well, attaining more than 83% accuracy, and both are expected to gain in importance in coming years. On the other hand, SSL, which only works with a few labelled samples, has emerged as a feasible alternative to the classic supervised and unsupervised learning paradigms but, as its results show (71% and 76% accuracy reported by [126] and [67]), the predictive capacity of this approach still has to be improved. Nevertheless, another study on lung cancer that used similar ML techniques yielded different results [130]. The authors evaluated linear regression, DT, SVM, GBM and a custom ensemble, finding that GBM was the most accurate model in terms of root mean square error (RMSE). All the models were implemented in the R language and trained on the SEER database. In recent decades, cancer has been one of the preferred fields for the assessment of ML models to predict survival likelihood. An analysis of survivability in prostate cancer patients [133] was carried out using three non-linear statistical methods: DT, BN and Cox [145]. This work represents a case study that aims to demonstrate that ML classifiers are useful for estimating a patient’s survival chances, a process that is receiving increasing attention from ML experts. The authors conclude that ML statistical models could be helpful in the near future for predicting survival and other issues such as the probability of recurrence in cancer patients.

The new wave of ML is dominated by ANNs and their subtypes such as convolutional, recurrent or adversarial neural networks, among others. CRC can also profit from ANNs to predict survival chances, especially when the input datasets are image collections and the use of CNNs is advantageous. A recent work [131] described the training of a DL system, built on convolutional and recurrent neural networks, to classify tumour images. Such classifications of tumour images are a frequent way of predicting tumour evolution and, consequently, evaluating survival chances. It is worth mentioning that the classifier used by these authors ran on a GPU to accelerate the processing and deliver the results in a short time. GPUs can speed up CNN calculations dramatically, which is a huge advantage due to the large number of samples that CNNs usually deal with and the high number of layers they have. Other cancer types also take advantage of CNNs and exploit GPU computing power. Such is the case with brain cancer, for which condition patient survival can be estimated by means of the recently published classifier DeepSurvNet [132]. DeepSurvNet builds CNN models implemented with Keras and TensorFlow libraries, which are trained with a dataset from the TCGA Program [146]. The models classify the patients into four groups, each with an estimated overall survival.

The use of ML approaches whose output can be graphically represented, such as BN, DT and CNN, facilitates the interpretation of survival chances by healthcare professionals. The easy interpretation of results should always be taken as a requirement when ML is to be applied in a context outside computer sciences. It is also worth noting that medical records extracted from public databases are a common input [147,148] when evaluating survivability, which indicates that long-term well-structured data are the most useful data source to predict survival chances.

4. Software and Datasets

In this section, we will summarize the most relevant technical details extracted from the above-mentioned works, such as the software tools created, the availability of the source code, the use of HPC platforms and the main features of the datasets. Figure 2 summarizes the approaches applied at every stage. It can be observed that ANN, LR and SVM are the most common methods in cancer research. RF, BN, DT, KNN, GBM and CNN are also used frequently but are not reported in all the tasks.

4.1. Software Tools

In Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 we have enumerated a number of libraries and frameworks frequently used for developing ML models. There is a clear trend to implement models in the R and Python languages. R is a good choice for rapid development due to the diverse collection of packages it provides (e.g., caret, e1071, Bioconductor) and the many possibilities it offers to create different models, including SVM, RF and DT, among others. Therefore, its simplicity and flexibility make it an attractive alternative for many scientists. The other preferred option is Python and, in particular, frameworks like TensorFlow and PyTorch. Tensor-based frameworks have gained in importance in recent years, supported by the rapid development of GPUs, which are a very suitable hardware solution for tensor calculations. While the use of R implementations has been mentioned for several years, publications reporting works in Python-based frameworks tend to date from 2017 onwards. This supports the intuition that the development of GPUs and, more generally, HPC will be closely connected with the advances achieved in the performance of ML algorithms in the near future.

Other statistical tools are less frequently used. This group is composed of tools such as Matlab, SPSS, Caffe and Weka. Although they are not as powerful as programming languages, they offer many statistical features that allow the rapid development of models, including LR, SVM, ANN and BN. Furthermore, these tools are well established in the academic world, so many scientists are familiar with them and their reliability has been extensively proved.

Despite being well known and a very stable language, Java is barely used in this context. Only the Encog and MLlib libraries are reported in the works. There may be many reasons to explain this, but the main ones are probably that Java is usually considered slower than other languages and that the users do not have the programming skills required by this tool.

Few authors share the source code of their models with the community (see Table 7). Sometimes they prefer to develop and release a novel tool providing the obtained models through a web interface [81,89]. While this is an understandable decision, it hinders understanding of the models by external users. However, other researchers freely share their code, usually on GitHub, and allow others to study and analyse how it was developed. From an objective point of view, this is the preferred solution because it allows existing code to be better understood, improved and optimized, as well as the development of new models from a solid base.

4.2. HPC Infrastructures

While HPC platforms are rarely reported in the analyzed papers, the use of GPUs has been increasingly mentioned in recent years (e.g., Bychkov et al., 2018; Zadeh Shirazi et al., 2020). The recent development of tensor-based frameworks and libraries for ML (e.g., TensorFlow, Keras, PyTorch) has promoted the use of GPUs for programming ML algorithms [66,84,89,91,94,129,131,132]. The rapid integration of GPU computing in ML strongly suggests that faster ML algorithms will emerge in the coming years, resulting in the ability to handle even larger training datasets.

Please note that, although no references to other HPC paradigms were found in this review, it is quite possible that other authors have leveraged HPC platforms (e.g., parallel computing) in their work.

4.3. Datasets

We can broadly classify the input datasets into two major groups: (i) those obtained from publicly available databases; and (ii) those collected from institutions (e.g., hospitals or universities). Although both online and custom approaches are valid, public datasets facilitate the reproducibility of the experiments. SEER and TCGA databases are typically used in cancer research.

Leaving their source aside, we have focused on two properties of the datasets: the data types they contain and the size of the training dataset. The data types vary widely between works and include text, images, medical records and binary data. Numerical values are the preferred option for feeding ML algorithms because these algorithms operate mostly on numerical calculations. As can be observed in Tables 1–6, when public or private institutions are responsible for collecting data, they usually work with numerical data. In addition, text inputs are widely used, probably because they can be easily translated into numerical values. Images are typically used to feed CNNs because this type of network can apply sequential filters to images and extract patterns. Images are also a frequent option because many hospitals and universities have easy access to historical images from scans, tomography or mammography.
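The two conversions mentioned above, text into numerical values and filters applied to images, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the pipeline of any surveyed work. The function names (`bag_of_words`, `convolve2d`), the example notes and the toy image are all assumptions made for this sketch.

```python
from collections import Counter


def bag_of_words(documents):
    """Map each text to a vector of word counts over a shared vocabulary."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenised for word in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def convolve2d(image, kernel):
    """Slide a small kernel over a 2-D image (no padding, stride 1).

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical free-text notes, invented for illustration only.
notes = ["tumour size large", "tumour size small", "no tumour"]
vocabulary, vectors = bag_of_words(notes)

# Toy 4x4 "image" with a vertical edge between a dark and a bright half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Simple vertical-edge kernel: responds where intensity rises left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, edge_kernel)

print(vocabulary)
print(vectors)
print(feature_map)
```

In a real CNN the kernel weights are learned during training rather than fixed by hand, and several such filters are stacked in sequence. The word-count encoding is only one of many ways to turn text into numbers.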

Structured information is more suitable for ML than unstructured information. Usually, ML algorithms receive a set of well-defined inputs, which they evaluate and weight to make predictions. Therefore, when the inputs are clearly defined, the models can be easily developed (e.g., BN, ANN, LR, DT). This is the case for databases such as ACTUR, SEER and TCGA, and for other datasets in which the fields are unambiguously separated.

The second key feature of the analysed datasets is their size, which ranges from tens to millions of samples. In general terms, neural networks (ANN, CNN and RNN) handle the largest datasets (e.g., 200,000, 463,080, 235,673 and 12 × 10⁶ samples). Although a large number of samples may seem an advantage, their sheer numbers can slow the system down during training. Thus, a good balance between dataset size and learning capability must be found. By contrast, the simplest approaches seem to need less data to learn, as can be inferred from the fact that the smallest datasets (fewer than 100 training samples) are used by traditional methods such as RF and SVM. Figure 3 shows the reported dataset sizes used in ML algorithms.
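One common way to explore the balance between dataset size and learning capability is a learning curve: the same model is trained on increasingly large samples and evaluated on a fixed test set. The sketch below is a toy example under stated assumptions, not an analysis of any surveyed dataset. The data are synthetic (two overlapping one-dimensional Gaussian classes), the classifier is a simple nearest-centroid threshold, and all function names are hypothetical.

```python
import random


def make_samples(n, rng):
    """Draw n labelled points from two overlapping 1-D Gaussian classes."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 2.0
        samples.append((rng.gauss(centre, 1.0), label))
    return samples


def fit_threshold(train):
    """Nearest-centroid rule in 1-D: threshold halfway between class means."""
    means = []
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        # Fall back to a fixed guess if a class is absent from a tiny sample.
        means.append(sum(values) / len(values) if values else float(label))
    return (means[0] + means[1]) / 2.0


def accuracy(threshold, test):
    """Fraction of test points for which 'x > threshold' matches class 1."""
    correct = sum(1 for x, y in test if (x > threshold) == (y == 1))
    return correct / len(test)


def learning_curve(sizes, n_test=2000, seed=0):
    """Train on samples of increasing size and score on one fixed test set."""
    rng = random.Random(seed)
    test = make_samples(n_test, rng)
    curve = []
    for size in sizes:
        train = make_samples(size, rng)
        curve.append((size, accuracy(fit_threshold(train), test)))
    return curve


curve = learning_curve([5, 50, 500, 5000])
for size, acc in curve:
    print(f"{size:>5} training samples -> test accuracy {acc:.3f}")
```

Plotting such a curve for a real model can help to judge whether collecting more samples is likely to be worth the extra training time. The point at which the curve flattens depends on the model and the data, so the numbers produced by this toy example should not be generalised.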

5. Conclusions and Outlook

Decision-making is one of the main challenges in modern medicine and is exercised at every stage of a disease’s lifecycle, from diagnosis to the prediction of recurrence. Traditionally, doctors have trusted their experience to choose the best option for individual patients. However, they cannot be expected to recall all the details of all the patients they have treated in the past, which clouds their ability to recognize patterns in similar situations. This is where computational help is required.

In recent years, AI and, more specifically, machine and deep learning have turned their attention to medical decision-making. In this context, anti-cancer medicine has become a favourite playground because of the high mortality rate of the disease, the increasing number of cases expected in the forthcoming years and the vast amount of data available in the databases of hospitals, universities and research centres. The diversity of existing cancer types encourages experiments with different ML algorithms aimed at the same target. In this review, we have analysed the general use of ML in cancer research, always with CRC in mind.

CRC is the fourth leading cause of cancer mortality worldwide, and the number of cases expected to appear in the next decade is not promising, making it a suitable target for ML. Any ML algorithm can be applied to CRC research, ranging from the simplest (e.g., LR, SVM, KNN) to the most complex ones (e.g., CNN, DNN), but it has been observed that ANN, LR and SVM are frequently reported in any task related to decision-making (risk prediction, recurrence prediction, tumour progression, estimation of drug synergy, therapy outcomes and survival time estimates). Moreover, RF, GBM, BN, DT, KNN and CNN are often applied in many cases.

There is no clear relationship between the selected approach and the type of data feeding the system. However, CNN is clearly the preferred option when manipulating medical images. It is clear that well-structured records with text or numerical fields are the simplest and favourite options when available. The dataset size is another key factor when training ML or DL systems. If the dataset is too small, the ML system will face difficulties related to learning and generalizing, whereas excessively large datasets may slow down the training phase. Thus, finding the optimal dataset size remains a challenge. As regards performance in terms of computing time, this key concern for scientists has resulted in the emergence of libraries and frameworks specifically focused on profiting from HPC facilities, such as GPUs. GPUs are the preferred architecture for running CNN calculations, and NVIDIA has bet on this technology, becoming the world’s leading manufacturer.

Interpretability has been identified as the third key concern, although it is no less important than the typical accuracy and performance metrics. The importance of interpretability stems from the fact that ML is increasingly used in a medical context, where users are often inexperienced in interpreting AI metrics and results. Consequently, output must be translated into a language that physicians can understand. We observed that interpretability is still barely considered in most of the works analysed, suggesting that it is a factor that can be improved in order to “democratize” AI in many other areas. To improve the interpretability of systems, feature selection methods are sometimes applied before classification. This technique helps to reduce the input size, leading to faster classification and providing a more interpretable output. Some ML algorithms, such as BN and DT, are especially appropriate for this purpose because they return labelled directed graphs, which are very easy to read and interpret.
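The combination described above, feature selection followed by a readable tree-like model, can be illustrated with a small sketch. It rests on several assumptions. The six "patients" and their three features are invented for illustration. The selection step is a simple correlation filter. The classifier is a one-level decision tree (a decision stump) that only tests rules of the form "feature > threshold". None of this reproduces the method of any surveyed paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def select_features(columns, labels, k):
    """Filter-style selection: keep the k features most correlated with the label."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    return ranked[:k]


def fit_stump(columns, labels, features):
    """One-level decision tree: best single 'feature > threshold' rule."""
    best = None  # (training errors, feature name, threshold)
    for name in features:
        values = sorted(set(columns[name]))
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum(
                (x > threshold) != (y == 1)
                for x, y in zip(columns[name], labels)
            )
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    return best


# Hypothetical data: three features for six patients and a binary label.
columns = {
    "age": [50, 60, 55, 58, 62, 52],
    "bmi": [22, 30, 26, 24, 28, 27],
    "marker": [1, 2, 1, 8, 9, 7],
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_features(columns, labels, k=2)
errors, feature, threshold = fit_stump(columns, labels, selected)
rule = f"IF {feature} > {threshold} THEN class 1 ELSE class 0"

print(selected)  # ['marker', 'age']
print(rule)      # IF marker > 4.5 THEN class 1 ELSE class 0
```

The resulting rule can be read without any knowledge of ML, which is the property that makes tree-based outputs attractive to clinicians. A full decision tree repeats the same search recursively on each branch, producing a graph of such rules.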

In short, we predict a bright future for ML and DL in medical decision-making, but the results must be more explainable in this or any other context. Identifying the optimal training dataset size is another factor that deserves further study. Fortunately, the rapid development of HPC will make ML systems more efficient and enable them to transform the overwhelming quantity of historical data stored in public and private databases into real, reliable and valuable knowledge.

Author Contributions

All the authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been funded by grants from the European Project Horizon 2020 SC1-BHC-02-2019 [REVERT, ID:848098]; Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia [Project 20988/PI/18]; and Spanish Ministry of Economy and Competitiveness [CTQ2017-87974-R].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

This work has been funded by grants from the European Project Horizon 2020 SC1-BHC-02-2019 [REVERT, ID:848098]; Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia [Project 20988/PI/18]; and Spanish Ministry of Economy and Competitiveness [CTQ2017-87974-R]. Powered@NLHPC: This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

1CM	One-carbon metabolism
AI	Artificial Intelligence
ANN	Artificial Neural Network
AUC	Area Under the Curve
BC	Breast Cancer
BioBIM	InterInstitutional Multidisciplinary Biobank
BMI	Body Mass Index
BN	Bayesian Network
CCF	Cancer Cell Fraction
CNN	Convolutional Neural Network
CRC	Colorectal Cancer
DCNN	Dilated Convolutional Neural Network
DL	Deep Learning
DSS	Decision Support System
DT	Decision Tree
ELM	Extreme Learning Machine
EMR	Electronic Medical Record
ENLR	Elastic Net Logistic Regression
FOLFIRI	5-FU leucovorin and irinotecan
FOLFOX	5-FU leucovorin and oxaliplatin
FT	Fourier Transform
GBM	Gradient Boosting Machine
GEO	Gene Expression Omnibus
GOSS	Genetic Ontology Similarity Score
GPU	Graphics Processing Unit
HDF5	Hierarchical Data Format 5
HNSCC	Head and Neck Squamous Cell Carcinoma
HPC	High Performance Computing
ICBC	Iranian Centre for Breast Cancer
IMRT	Intensity Modulated Radiotherapy
KNN	K-Nearest Neighbours
LDA	Linear Discriminant Analysis
LPP	Locality Preserving Projection
LR	Logistic Regression
LSTM	Long Short-Term Memory
ML	Machine Learning
MVA	Multivariate analysis
NCBI	National Center for Biotechnology Information
NCSS	Number Cruncher Statistical Systems
NMSC	Non-Melanoma Skin Cancer
PCA	Principal Component Analysis
RECIST	Response Evaluation Criteria In Solid Tumors
REVOLVER	Repeated EVOLution in cancER
RF	Random Forest
RMSE	Root Mean Square Error
RNN	Recurrent Neural Network
RO	Random Optimization
ROC	Receiver Operating Characteristic
SAP	Single Amino Acid Polymorphism
SEABED	Segmentation and Biomarker Enrichment of Differential Treatment Response
SEER	Surveillance Epidemiology and End Results
SIFT	Sorting Intolerant From Tolerant
SKCM	Skin Cutaneous Melanoma
SNP	Single Nucleotide Polymorphism
SSL	Semi-Supervised Learning
SVC-W	Support Vector Classification with Weight
SVM	Support Vector Machine
SVM-L1	Support Vector Machine with L1 Regularization
TCGA	The Cancer Genome Atlas
TGF-β	Transforming Growth Factor beta
TL	Transfer Learning
WEKA-FCBF	Waikato Environment of Knowledge Analysis—Fast Correlation Based Filter
WHO	World Health Organization
XAI	Explainable Artificial Intelligence
YARN	Yet Another Resource Negotiator

References

Cronin, K.A.; Lake, A.J.; Scott, S.; Sherman, R.L.; Noone, A.M.; Howlader, N.; Henley, S.J.; Anderson, R.N.; Firth, A.U.; Ma, J.; et al. Annual report to the nation on the status of cancer, part I: National cancer statistics. Cancer 2018, 124, 2785–2800.
Culp, M.B.B.; Soerjomataram, I.; Efstathiou, J.A.; Bray, F.; Jemal, A. Recent global patterns in prostate cancer incidence and mortality rates. Eur. Urol. 2020, 77, 38–52.
Ferlay, J.; Soerjomataram, I.; Dikshit, R.; Eser, S.; Mathers, C.; Rebelo, M.; Parkin, D.M.; Forman, D.; Bray, F. Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 2015, 136, E359–E386.
Chiavenna, S.; Jaworski, J.P.; Vendrell, A. State of the art in anti-cancer mAbs. J. Biomed. Sci. 2017, 24.
Loud, J.T.; Murphy, J. Cancer screening and early detection in the 21st century. Semin. Oncol. Nurs. 2017, 33, 121–128.
Coleman, C. Early detection and screening for breast cancer. Semin. Oncol. Nurs. 2017, 33, 141–155.
Araghi, M.; Soerjomataram, I.; Jenkins, M.; Brierley, J.; Morris, E.; Bray, F.; Arnold, M. Global trends in colorectal cancer mortality: Projections to the year 2035. Int. J. Cancer 2019, 144, 2992–3000.
Dekker, E.; Tanis, P.J.; Vleugels, J.L.A.; Kasi, P.M.; Wallace, M.B. Colorectal cancer. Lancet 2019, 394, 1467–1480.
Arnold, M.; Sierra, M.S.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global patterns and trends in colorectal cancer incidence and mortality. Gut 2017, 66, 683–691.
Kuipers, E.J.; Grady, W.M.; Lieberman, D.; Seufferlein, T.; Sung, J.J.; Boelens, P.G.; Van De Velde, C.J.H.; Watanabe, T. Colorectal cancer. Nat. Rev. Dis. Prim. 2015, 1.
Weinberg, B.A.; Marshall, J.L.; Salem, M.E. The growing challenge of young adults with colorectal cancer. Oncology 2017, 31, 381–389.
García-Figueiras, R.; Baleato-González, S.; Padhani, A.R.; Luna-Alcalá, A.; Marhuenda, A.; Vilanova, J.C.; Osorio-Vázquez, I.; Martínez-de-Alegría, A.; Gómez-Caamaño, A. Advanced imaging techniques in evaluation of colorectal cancer. Radiographics 2018, 38, 740–765.
Valle, L.; Vilar, E.; Tavtigian, S.V.; Stoffel, E.M. Genetic predisposition to colorectal cancer: Syndromes, genes, classification of genetic variants and implications for precision medicine. J. Pathol. 2019, 247, 574–588.
Huang, D.; Sun, W.; Zhou, Y.; Li, P.; Chen, F.; Chen, H.; Xia, D.; Xu, E.; Lai, M.; Wu, Y.; et al. Mutations of key driver genes in colorectal cancer progression and metastasis. Cancer Metastasis Rev. 2018, 37, 173–187.
Oh, M.; McBride, A.; Yun, S.; Bhattacharjee, S.; Slack, M.; Martin, J.R.; Jeter, J.; Abraham, I. BRCA1 and BRCA2 gene mutations and colorectal cancer risk: Systematic review and meta-analysis. J. Natl. Cancer Inst. 2018, 110, 1178–1189.
Ruiz-López, L.; Blancas, I.; Garrido, J.M.; Mut-Salud, N.; Moya-Jódar, M.; Osuna, A.; Rodríguez-Serrano, F. The role of exosomes on colorectal cancer: A review. J. Gastroenterol. Hepatol. 2018, 33, 792–799.
Yiu, A.J.; Yiu, C.Y. Biomarkers in colorectal cancer. Anticancer Res. 2016, 36, 1093–1102.
Lech, G.; Słotwiński, R.; Słodkowski, M.; Krasnodębski, I.W. Colorectal cancer tumour markers and biomarkers: Recent therapeutic advances. World J. Gastroenterol. 2016, 22, 1745–1755.
Ding, D.; Han, S.; Zhang, H.; He, Y.; Li, Y. Predictive biomarkers of colorectal cancer. Comput. Biol. Chem. 2019, 83.
Kather, J.N.; Halama, N.; Jaeger, D. Genomics and emerging biomarkers for immunotherapy of colorectal cancer. Semin. Cancer Biol. 2018, 52, 189–197.
Jain, K.K. Personalised medicine for cancer: From drug development into clinical practice. Expert Opin. Pharmacother. 2005, 6, 1463–1476.
Jackson, S.E.; Chester, J.D. Personalised cancer medicine. Int. J. Cancer 2015, 137, 262–266.
Usher-Smith, J.A.; Silarova, B.; Lophatananon, A.; Duschinsky, R.; Campbell, J.; Warcaba, J.; Muir, K. Responses to provision of personalised cancer risk information: A qualitative interview study with members of the public. BMC Public Health 2017, 17.
Olin, R.L. Delivering intensive therapies to older adults with hematologic malignancies: Strategies to personalize care. Blood 2019, 134, 2013–2021.
Upton, A.; Trelles, O.; Cornejo-García, J.A.; Perkins, J.R. Review: High-performance computing to detect epistasis in genome scale data sets. Brief. Bioinform. 2016, 17, 368–379.
Schmidt, B.; Hildebrandt, A. Next-generation sequencing: Big data meets high performance computing. Drug Discov. Today 2017, 22, 712–717.
Chen, S.; He, Z.; Han, X.; He, X.; Li, R.; Zhu, H.; Zhao, D.; Dai, C.; Zhang, Y.; Lu, Z.; et al. How big data and high-performance computing drive brain science. Genom. Proteom. Bioinf. 2019, 17, 381–392.
Wang, H.; Ma, Y.; Pratx, G.; Xing, L. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure. Phys. Med. Biol. 2011, 56, N175–N181.
Garg, V.; Arora, S.; Gupta, C. Cloud computing approaches to accelerate drug discovery value chain. Comb. Chem. High Throughput Screen. 2011, 14, 861–871.
Nobile, M.S.; Cazzaniga, P.; Tangherloni, A.; Besozzi, D. Graphics processing units in bioinformatics, computational biology and systems biology. Brief. Bioinf. 2017, 18, 870–885.
Dilsizian, S.E.; Siegel, E.L. Artificial intelligence in medicine and cardiac imaging: Harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr. Cardiol. Rep. 2014, 16.
Pérez-Sianes, J.; Pérez-Sánchez, H.; Díaz, F. Virtual Screening Meets Deep Learning. Curr. Comput. Aided. Drug Des. 2019, 15, 6–28.
Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
Lipton, Z.C. The Mythos of Model Interpretability. arXiv 2018, arXiv:1606.03490.
Jacovi, A.; Sar Shalom, O.; Goldberg, Y. Understanding convolutional neural networks for text classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 56–65.
Swartout, W.; Paris, C.; Moore, J. Explanations in knowledge systems: Design for explainable expert systems. IEEE Exp. 1991, 6, 54–58.
Johnson, W.L. Agents that learn to explain themselves. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI’ 94), Seattle, WA, USA, 31 July–4 August 1994; pp. 1257–1263.
Lacave, C.; Díez, F.J. A review of explanation methods for Bayesian networks. Knowl. Eng. Rev. 2002, 17, 107–127.
Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312.
Holzinger, A.; Dehmer, M.; Jurisica, I. Knowledge discovery and interactive data mining in bioinformatics—State-of-the-art, future challenges and research directions. BMC Bioinf. 2014, 15, I1.
Lee, S.; Holzinger, A. Knowledge discovery from complex high dimensional data. In Solving Large Scale Learning Tasks. Challenges and Algorithms, Lecture Notes in Artificial Intelligence; Michaelis, S., Piatkowski, N., Eds.; Springer: Heidelberg, Germany, 2016; pp. 148–167.
Gunning, D. Explainable artificial intelligence (XAI). AI Mag. 2019, 40, 44–58.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In Proceedings of the International Conference on Information Processing in Medical Imaging, Boone, NC, USA, 25–30 June 2017; pp. 146–157.
Bærøe, K.; Miyata-Sturm, A.; Henden, E. How to achieve trustworthy artificial intelligence for health. Bull. World Health Organ. 2020, 98, 257–262.
Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3319–3328.
Oakden-Rayner, L.; Palmer, L.J. Artificial intelligence in medicine: Validation and study design: Opportunities, applications and risks. In Artificial Intelligence in Medical Imaging; Springer: Cham, Switzerland, 2019; pp. 83–104.
Hermon, R.; Williams, P.A. Big data in healthcare: What is it used for? In Proceedings of the Australian Ehealth Informatics and Security Conference, Perth, Australia, 1–3 December 2014; pp. 40–49.
Archenaa, J.; Anita, E.M. A survey of big data analytics in healthcare and government. Procedia Comput. Sci. 2015, 50, 408–413.
Ristevski, B.; Chen, M. Big Data Analytics in Medicine and Healthcare. J. Integr. Bioinform. 2018, 15, 20170030.
Sun, H.; Liu, Z.; Wang, G.; Lian, W.; Ma, J. Intelligent analysis of medical big data based on deep learning. IEEE Access 2019, 7, 142022–142037.
Hassan, A.K.; Hassan, Y.F.; Kholief, M.H. A deep classification system for medical big data analysis. J. Med. Imag. Health Inf. 2018, 8, 250–256.
Chen, S.; Wu, S. Identifying lung cancer risk factors in the elderly using deep neural networks: Quantitative analysis of web-based survey data. J. Med. Internet Res. 2020, 22, e17695.
Kaminker, J.S.; Zhang, Y.; Watanabe, C.; Zhang, Z. CanPredict: A computational tool for predicting cancer-associated missense mutations. Nucleic Acids Res. 2007, 35.
Capriotti, E.; Altman, R.B. A new disease-specific machine learning approach for the prediction of cancer-causing missense variants. Genomics 2011, 98, 310–317.
Celli, F.; Cumbo, F.; Weitschek, E. Classification of large DNA methylation datasets for identifying cancer drivers. Big Data Res. 2018, 13, 21–28.
Myte, R.; Gylling, B.; Häggström, J.; Schneede, J.; Magne Ueland, P.; Hallmans, G.; Johansson, I.; Palmqvist, R.; Van Guelpen, B. Untangling the role of one-carbon metabolism in colorectal cancer risk: A comprehensive Bayesian network analysis. Sci. Rep. 2017, 7.
Ayer, T.; Alagoz, O.; Chhatwal, J.; Shavlik, J.W.; Kahn, C.E.; Burnside, E.S. Breast cancer risk estimation with artificial neural networks revisited: Discrimination and calibration. Cancer 2010, 116, 3310–3321.
Heidari, M.; Khuzani, A.Z.; Hollingsworth, A.B.; Danala, G.; Mirniaharikandehei, S.; Qiu, Y.; Liu, H.; Zheng, B. Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm. Phys. Med. Biol. 2018, 63.
Behravan, H.; Hartikainen, J.M.; Tengström, M.; Pylkäs, K.; Winqvist, R.; Kosma, V.-M.; Mannermaa, A. Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls. Sci. Rep. 2018, 8.
Taninaga, J.; Nishiyama, Y.; Fujibayashi, K.; Gunji, T.; Sasabe, N.; Iijima, K.; Naito, T. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci. Rep. 2019, 9.
Roffman, D.; Hart, G.; Girardi, M.; Ko, C.J.; Deng, J. Predicting non-melanoma skin cancer via a multi-parameterized artificial neural network. Sci. Rep. 2018, 8.
Martínez-Más, J.; Bueno-Crespo, A.; Khazendar, S.; Remezal-Solano, M.; Martínez-Cendán, J.P.; Jassim, S.; Du, H.; Assam, H.A.; Bourne, T.; Timmerman, D. Evaluation of machine learning methods with Fourier Transform features for classifying ovarian tumors based on ultrasound images. PLoS ONE 2019, 14, e0219388.
Martínez-Más, J.; Bueno-Crespo, A.; Martínez-España, R.; Remezal-Solano, M.; Ortiz-González, A.; Ortiz-Reina, S.; Martínez-Cendán, J.P. Classifying Papanicolaou cervical smears through a cell merger approach by deep learning technique. Exp. Syst. Appl. 2020, 160, 113707.
Lu, W.; Fu, D.; Kong, X.; Huang, Z.; Hwang, M.; Zhu, Y.; Chen, L.; Jiang, K.; Li, X.; Wu, Y.; et al. FOLFOX treatment response prediction in metastatic or recurrent colorectal cancer patients via machine learning algorithms. Cancer Med. 2020, 9, 1419–1429.
Xu, Y.; Ju, L.; Tong, J.; Zhou, C.M.; Yang, J.J. Machine Learning Algorithms for Predicting the Recurrence of Stage IV Colorectal Cancer After Tumor Resection. Sci. Rep. 2020, 10, 1–9.
Kim, J.; Shin, H. Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data. J. Am. Med. Inf. Assoc. 2013, 20, 613–618.
Ahmad, L.; Eshlaghy, A.; Poorebrahimi, A.; Ebrahimi, M.; AR, R. Using three machine learning techniques for predicting breast cancer recurrence. J. Health Med. Inf. 2013, 4.
Ferroni, P.; Zanzotto, F.M.; Riondino, S.; Scarpato, N.; Guadagni, F.; Roselli, M. Breast cancer prognosis using a machine learning approach. Cancers 2019, 11, 328.
Park, C.; Ahn, J.; Kim, H.; Park, S. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning. PLoS ONE 2014, 9.
Exarchos, K.P.; Goletsis, Y.; Fotiadis, D.I. Multiparametric decision support system for the prediction of oral cancer reoccurrence. IEEE Trans. Inf. Technol. Biomed. 2012, 16, 1127–1134.
Tseng, C.J.; Lu, C.J.; Chang, C.C.; Chen, G.D. Application of machine learning to predict the recurrence-proneness for cervical cancer. Neural Comput. Appl. 2014, 24, 1311–1316.
Dercle, L.; Lu, L.; Schwartz, L.H.; Qian, M.; Tejpar, S.; Eggleton, P.; Zhao, B.; Piessevaux, H. Radiomics response signature for identification of metastatic colorectal cancer sensitive to therapies targeting EGFR pathway. J. Natl. Cancer Inst. 2020, 112, 902–912.
Yates, L.R.; Gerstung, M.; Knappskog, S.; Desmedt, C.; Gundem, G.; Van Loo, P.; Aas, T.; Alexandrov, L.B.; Larsimont, D. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 2015, 21, 751–759.
Gerlinger, M.; Horswell, S.; Larkin, J.; Rowan, A.J.; Salm, M.P.; Varela, I.; Fisher, R.; McGranahan, N.; Matthews, N.; Santos, C.R.; et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat. Genet. 2014, 46, 225–233.
Caravagna, G.; Giarratano, Y.; Ramazzotti, D.; Tomlinson, I.; Graham, T.A.; Sanguinetti, G.; Sottoriva, A. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nat. Methods 2018, 15, 707–714.
Auslander, N.; Wolf, Y.I.; Koonin, E.V. In silico learning of tumor evolution through mutational time series. Proc. Natl. Acad. Sci. USA 2019, 116, 9501–9510.
Albertazzi, E.; Cajone, F.; Leone, B.E.; Naguib, R.N.; Lakshmi, M.S.; Sherbet, G.V. Expression of metastasis-associated genes h-mts1 (S100A4) and nm23 in carcinoma of breast is related to disease progression. DNA Cell Biol. 1998, 17, 335–342.
Grey, S.R.; Dlay, S.S.; Leone, B.E.; Cajone, F.; Sherbet, G.V. Prediction of nodal spread of breast cancer by using artificial neural network-based analyses of S100A4, nm23 and steroid receptor expression. Clin. Exp. Metastasis 2003, 20, 507–514.
Ishii, H.; Saitoh, M.; Sakamoto, K.; Sakamoto, K.; Saigusa, D.; Kasai, H.; Ashizawa, K.; Miyazawa, K.; Takeda, S.; Masuyama, K.; et al. Lipidome-based rapid diagnosis with machine learning for detection of TGF-β signalling activated area in head and neck cancer. Br. J. Cancer 2020, 122, 995–1004.
Bhalla, S.; Kaur, H.; Dhall, A.; Raghava, G.P.S. Prediction and analysis of skin cancer progression using genomics profiles of patients. Sci. Rep. 2019, 9.
Shiraishi, S.; Tan, J.; Olsen, L.A.; Moore, K.L. Knowledge-based prediction of plan quality metrics in intracranial stereotactic radiosurgery. Med. Phys. 2015, 42, 908.
Shiraishi, S.; Moore, K.L. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy. Med. Phys. 2016, 43, 378–387.
Nguyen, D.; Long, T.; Jia, X.; Lu, W.; Gu, X.; Iqbal, Z.; Jiang, S. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning. Sci. Rep. 2019, 9.
Musen, M.A.; Tu, S.W.; Das, A.K.; Shahar, Y. EON: A component-based approach to automation of protocol-directed therapy. J. Am. Med. Inf. Assoc. 1996, 3, 367–388.
Celebi, R.; Movva, R.; Alpsoy, S.; Dumontier, M. In-silico prediction of synergistic anti-cancer drug combinations using multi-omics data. Sci. Rep. 2019, 9.
Keshava, N.; Toh, T.S.; Yuan, H.; Yang, B.; Menden, M.P.; Wang, D. Defining subpopulations of differential drug response to reveal novel target populations. NPJ Syst. Biol. Appl. 2019, 5.
O’Neil, J.; Benita, Y.; Feldman, I.; Chenard, M.; Roberts, B.; Liu, Y.; Li, J.; Kral, A.; Lejnine, S.; Loboda, A.; et al. An unbiased oncology compound screen to identify novel combination strategies. Mol. Cancer Ther. 2016, 15, 1155–1162.
Preuer, K.; Lewis, R.P.I.; Hochreiter, S.; Bender, A.; Bulusu, K.C.; Klambauer, G. DeepSynergy: Predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 2018, 34, 1538–1546.
McIntosh, C.; Purdie, T.G. Contextual atlas regression forests: Multiple-atlas-based automated dose prediction in radiation therapy. IEEE Trans. Med. Imag. 2016, 35, 1000–1012.
Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
Weinstein, J.N.; Kohn, K.W.; Grever, M.R.; Viswanadhan, V.N.; Rubinstein, L.V.; Monks, A.P.; Scudiero, D.A. Neural computing in cancer drug development: Predicting mechanism of action. Science 1992, 258, 447–451.
Skrede, O.J.; De Raedt, S.; Kleppe, A.; Hveem, T.S.; Liestøl, K.; Maddison, J.; Askautrud, H.A.; Pradhan, M.; Nesheim, J.A.; Albregtsen, F.; et al. Deep learning for prediction of colorectal cancer outcome: A discovery and validation study. Lancet 2020, 395, 350–360.
Tsuji, S.; Midorikawa, Y.; Takahashi, T.; Yagi, K.; Takayama, T.; Yoshida, K.; Sugiyama, Y.; Aburatani, H. Potential responders to FOLFOX therapy for colorectal cancer by Random Forests analysis. Br. J. Cancer 2012, 106, 126–132.
Steele, S.R.; Bilchik, A.; Johnson, E.K.; Nissan, A.; Peoples, G.E.; Berhardt, J.S.; Kalina, P.; Petersen, B.; Brücher, B.; Protic, M.; et al. Time-dependent estimates of recurrence and survival in colon cancer: Clinical decision support system tool development for adjuvant therapy and oncological outcome assessment. Am. Surg. 2014, 80, 441–453.
Menden, M.P.; Iorio, F.; Garnett, M.; McDermott, U.; Benes, C.H.; Ballester, P.J.; Saez-Rodriguez, J. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE 2013, 8.
Lin, H.; Qiu, X.; Zhang, B.; Zhang, J. Identification of the predictive genes for the response of colorectal cancer patients to FOLFOX therapy. OncoTargets Ther. 2018, 11, 5943–5955.
Gan, Z.; Zou, Q.; Lin, Y.; Xu, Z.; Huang, Z.; Chen, Z.; Lv, Y. Identification of a 13-gene-based classifier as a potential biomarker to predict the effects of fluorouracil-based chemotherapy in colorectal cancer. Oncol. Lett. 2019, 17, 5057–5063.
Land, W.H.; Margolis, D.; Gottlieb, R.; Yang, J.Y.; Krupinski, E.A. Improving CT prediction of treatment response in patients with metastatic colorectal carcinoma using statistical learning. Int. J. Comput. Biol. Drug Des. 2010, 3, 15–18.
Del Rio, M.; Molina, F.; Bascoul-Mollevi, C.; Copois, V.; Bibeau, F.; Chalbos, P.; Bareil, C.; Kramar, A.; Salvetat, N.; Fraslon, C.; et al. Gene expression signature in advanced colorectal cancer patients select drugs and response for the use of leucovorin, fluorouracil, and irinotecan. J. Clin. Oncol. 2007, 25, 773–780.
Hess, K.R.; Anderson, K.; Symmans, W.F.; Valero, V.; Ibrahim, N.; Mejia, J.A.; Booser, D.; Theriault, R.L.; Buzdar, A.U.; Dempsey, P.J.; et al. Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J. Clin. Oncol. 2006, 24, 4236–4244.
Thuerigen, O.; Schneeweiss, A.; Toedt, G.; Warnat, P.; Hahn, M.; Kramer, H.; Brors, B.; Rudlowski, C.; Benner, A.; Schuetz, F.; et al. Gene expression signature predicting pathologic complete response with gemcitabine, epirubicin, and docetaxel in primary breast cancer. J. Clin. Oncol. 2006, 24, 1839–1845.
Harris, L.N.; You, F.; Schnitt, S.J.; Witkiewicz, A.; Lu, X.; Sgroi, D.; Ryan, P.D.; Come, S.E.; Burstein, H.J.; Lesnikoski, B.A.; et al. Predictors of resistance to preoperative trastuzumab and vinorelbine for HER2-positive early breast cancer. Clin. Cancer Res. 2007, 13, 1198–1207.
Mitra, A.P.; Skinner, E.C.; Miranda, G.; Daneshmand, S. A precystectomy decision model to predict pathological upstaging and oncological outcomes in clinical stage T2 bladder cancer. BJU Int. 2013, 111, 240–248.
Talby, L.; Chambost, H.; Roubaud, M.-C.; N’Guyen, C.; Milili, M.; Loriod, B.; Fossat, C.; Picard, C.; Gabert, J.; Chiappetta, P.; et al. The chemosensitivity to therapy of childhood early B acute lymphoblastic leukemia could be determined by the combined expression of CD34, SPI-B and BCR genes. Leuk. Res. 2006, 30, 665–676.
Huang, C.C.; Gadd, S.; Breslow, N.; Cutcliffe, C.; Sredni, S.T.; Helenowski, I.B.; Dome, J.S.; Grundy, P.E.; Green, D.M.; Fritsch, M.K.; et al. Predicting relapse in favorable histology Wilms tumor using gene expression analysis: A report from the Renal Tumor Committee of the Children’s Oncology Group. Clin. Cancer Res. 2009, 15, 1770–1778.
Dressman, H.K.; Berchuck, A.; Chan, G.; Zhai, J.; Bild, A.; Sayer, R.; Cragun, J.; Clarke, J.; Whitaker, R.S.; Li, L.H.; et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J. Clin. Oncol. 2007, 25, 517–525.
Duong, C.; Greenawalt, D.M.; Kowalczyk, A.; Ciavarella, M.L.; Raskutti, G.; Murray, W.K.; Phillips, W.A.; Thomas, R.J.S. Pretreatment gene expression profiles can be used to predict response to neoadjuvant chemoradiotherapy in esophageal cancer. Ann. Surg. Oncol. 2007, 14, 3602–3609.
Belderbos, J.; Heemsbergen, W.; Hoogeman, M.; Pengel, K.; Rossi, M.; Lebesque, J. Acute esophageal toxicity in non-small cell lung cancer patients after high dose conformal radiotherapy. Radiother. Oncol. 2005, 75, 157–164.
Bots, W.T.C.; van den Bosch, S.; Zwijnenburg, E.M.; Dijkema, T.; van den Broek, G.B.; Weijs, W.L.J.; Verhoef, L.C.G.; Kaanders, J.H.A.M. Reirradiation of head and neck cancer: Long-term disease control and toxicity. Head Neck 2017, 39, 1122–1130.
Carvalho, S.; Troost, E.G.C.; Bons, J.; Menheere, P.; Lambin, P.; Oberije, C. Prognostic value of blood-biomarkers related to hypoxia, inflammation, immune response and tumour load in non-small cell lung cancer—A survival model with external validation. Radiother. Oncol. 2016, 119, 487–494.
Janssens, G.O.; Rademakers, S.E.; Terhaard, C.H.; Doornaert, P.A.; Bijl, H.P.; Van Ende, P.D.; Chin, A.; Marres, H.A.; De Bree, R.; Van Der Kogel, A.J.; et al. Accelerated radiotherapy with carbogen and nicotinamide for laryngeal cancer: Results of a phase III randomized trial. J. Clin. Oncol. 2012, 30, 1777–1783.
Jochems, A.; Deist, T.M.; El Naqa, I.; Kessler, M.; Mayo, C.; Reeves, J.; Jolly, S.; Matuszak, M.; Ten Haken, R.; van Soest, J.; et al. Developing and validating a survival prediction model for NSCLC patients through distributed learning across 3 countries. Int. J. Radiat. Oncol. Biol. Phys. 2017, 99, 344–352.
Kwint, M.; Uijterlinde, W.I.; Nijkamp, J.; Chen, C.; de Bois, J.; Sonke, J.J.; van Herk, M.; van den Heuvel, M.M.; Belderbos, J. Acute esophagus toxicity in lung cancer patients after Intensity Modulated Radiotherapy and concurrent chemotherapy. Int. J. Radiat. Oncol. Biol. Phys. 2012, 84, 223–228.
Lustberg, T.; Bailey, M.; Thwaites, D.I.; Miller, A.; Carolan, M.; Holloway, L.; Velazquez, E.R.; Hoebers, F.; Dekker, A. Implementation of a rapid learning platform: Predicting 2-year survival in laryngeal carcinoma patients in a clinical setting. Oncotarget 2016, 7, 37288–37296.
Oberije, C.; De Ruysscher, D.; Houben, R.; Van De Heuvel, M.; Uyterlinde, W.; Deasy, J.O.; Belderbos, J.; Dingemans, A.M.C.; Rimner, A.; Din, S.; et al. A validated prediction model for overall survival from stage III non-small cell lung cancer: Toward survival prediction for individual patients. Int. J. Radiat. Oncol. Biol. Phys. 2015, 92, 935–944.
Olling, K.; Nyeng, D.W.; Wee, L. Predicting acute odynophagia during lung cancer radiotherapy using observations derived from patient-centred nursing care. Tech. Innov. Patient Support Radiat. Oncol. 2018, 5, 16–20.
Wijsman, R.; Dankers, F.J.W.M.; Troost, E.G.C.; Hoffmann, A.L.; van der Heijden, E.H.F.M.; de Geus-Oei, L.-F.; Bussink, J. Multivariable normal-tissue complication modeling of acute esophageal toxicity in advanced stage non-small cell lung cancer patients treated with intensity-modulated (chemo-)radiotherapy. Radiother. Oncol. 2015, 117, 49–54.
Wijsman, R.; Dankers, F.J.W.M.; Troost, E.G.C.; Hoffmann, A.L.; van der Heijden, E.H.F.M.; de Geus-Oei, L.F.; Bussink, J. Inclusion of incidental radiation dose to the cardiac atria and ventricles does not improve the prediction of radiation pneumonitis in advanced-stage non-small cell lung cancer patients treated with intensity modulated radiation therapy. Int. J. Radiat. Oncol. Biol. Phys. 2017, 99, 434–441.
Deist, T.M.; Dankers, F.J.W.M.; Valdes, G.; Wijsman, R.; Hsu, I.C.; Oberije, C.; Lustberg, T.; van Soest, J.; Hoebers, F.; Jochems, A.; et al. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers. Med. Phys. 2018, 45, 3449–3459.
Van de Vijver, M.J.; He, Y.D.; van’t Veer, L.J.; Dai, H.; Hart, A.A.M.; Voskuil, D.W.; Schreiber, G.J.; Peterse, J.L.; Roberts, C. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 2002, 347, 1999–2009.
Xu, X.; Zhang, Y.; Zou, L.; Wang, M.; Li, A. A gene signature for breast cancer prognosis using support vector machine. In Proceedings of the 5th International Conference on Biomedical Engineering and Informatics—BMEI 2012, Chongqing, China, 16–18 October 2012; pp. 928–931.
Van’t Veer, L.J.; Dai, H.; Van de Vijver, M.J.; He, Y.D.; Hart, A.A.M.; Mao, M.; Peterse, H.L.; Van Der Kooy, K.; Marton, M.J.; Witteveen, A.T.; et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415, 530–536.
Gevaert, O.; De Smet, F.; Timmerman, D.; Moreau, Y.; De Moor, B. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 2006, 22, e184–e190.
Park, K.; Ali, A.; Kim, D.; An, Y.; Kim, M.; Shin, H. Robust predictive model for evaluating breast cancer survivability. Eng. Appl. Artif. Intell. 2013, 26, 2194–2205.
Delen, D.; Walker, G.; Kadam, A. Predicting breast cancer survivability: A comparison of three data mining methods. Artif. Intell. Med. 2005, 34, 113–127.
Rosado, P.; Lequerica-Fernandez, P.; Villallain, L.; Peña, I.; Sánchez-Lasheras, F.; De Vicente, J.C. Survival model in oral squamous cell carcinoma based on clinicopathological parameters, molecular markers and support vector machines. Expert Syst. Appl. 2013, 40, 4770–4776.
Chen, Y.-C.; Ke, W.-C.; Chiu, H.-W. Risk classification of cancer survival using ANN with gene expression data from multiple laboratories. Comput. Biol. Med. 2014, 48, 1–7.
Lynch, C.M.; Abdollahi, B.; Fuqua, J.D.; de Carlo, A.R.; Bartholomai, J.A.; Balgemann, R.N.; van Berkel, V.H.; Frieboes, H.B. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int. J. Med. Inform. 2017, 108, 1–8.
Bychkov, D.; Linder, N.; Turkki, R.; Nordling, S.; Kovanen, P.E.; Verrill, C.; Walliander, M.; Lundin, M.; Haglund, C.; Lundin, J. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 2018, 8, 1–11.
Zadeh Shirazi, A.; Fornaciari, E.; Bagherian, N.S.; Ebert, L.M.; Koszyca, B.; Gomez, G.A. DeepSurvNet: Deep survival convolutional network for brain cancer survival rate classification based on histopathological images. Med. Biol. Eng. Comput. 2020.
Zupan, B.; Demšar, J.; Kattan, M.W.; Beck, J.R.; Bratko, I. Machine learning for survival analysis: A case study on recurrence of prostate cancer. Artif. Intell. Med. 2000, 20, 59–75.
Kim, W.; Kim, K.S.; Lee, J.E.; Noh, D.Y.; Kim, S.W.; Jung, Y.S.; Park, M.Y.; Park, R.W. Development of novel breast cancer recurrence prediction model using support vector machine. J. Breast Cancer 2012, 15, 230–238.
McGranahan, N.; Swanton, C. Clonal heterogeneity and tumor evolution: Past, present, and the future. Cell 2017, 168, 613–628.
Greaves, M.; Maley, C.C. Clonal evolution in cancer. Nature 2012, 481, 306–313.
Hall, A.; Massagué, J. Cell regulation. Curr. Opin. Cell Biol. 2008, 20, 117–118.
Greenberg, E.S.; Chong, K.K.; Huynh, K.T.; Tanaka, R.; Hoon, D.S.B. Epigenetic biomarkers in skin cancer. Cancer Lett. 2012, 342, 170–177.
Mazar, J.; Khaitan, D.; DeBlasio, D.; Zhong, C.; Govindarajan, S.S.; Kopanathi, S.; Zhang, S.; Ray, A.; Perera, R.J. Epigenetic regulation of MicroRNA genes and the role of miR-34b in cell invasion and motility in human melanoma. PLoS ONE 2011, 6, e24922.
Mokhtari, R.B.; Homayouni, T.S.; Baluch, N.; Morgatskaya, E.; Kumar, S.; Das, B.; Yeger, H. Combination therapy in combating cancer. Oncotarget 2017, 8, 38022–38043.
Menden, M.P.; Wang, D.; Guan, Y.; Mason, M.J.; Szalai, B.; Bulusu, K.C.; Yu, T.; Kang, J.; Jeon, M.; Wolfinger, R.; et al. A cancer pharmacogenomic screen powering crowd-sourced advancement of drug combination prediction. bioRxiv 2017.
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
Kearney, V.; Chan, J.W.; Valdes, G.; Solberg, T.D.; Yom, S.S. The application of artificial intelligence in the IMRT planning process for head and neck cancer. Oral Oncol. 2018, 87, 111–116.
Zhu, W.; Xie, L.; Han, J.; Guo, X. The application of deep learning in cancer prognosis prediction. Cancers 2020, 12, 603.
Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220.
Weinstein, J.N.; Collison, E.A.; Mills, G.B.; Shaw, K.R.M.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 2013, 45, 1113–1120.
SEER. SEER Research Data 1975–2017—Surveillance, Epidemiology, and End Results (SEER) Program. 2019. Available online: www.seer.cancer.gov (accessed on 29 March 2021).
Hutter, C.; Zenklusen, J.C. The Cancer Genome Atlas: Creating lasting value beyond its data. Cell 2018, 173, 283–285.

Figure 1. Estimated deaths in the USA in 2020 by cancer type and gender. Source: SEER database.

Figure 2. Graphical summary of the ML methods applied to cancer research tasks. Superscripts in the central figure indicate the number of tasks in which each approach is reported; no superscript means that the approach is reported in all tasks.

Figure 3. Reported dataset size by algorithm.

Table 1. Review of publications whose main topic is Machine Learning and cancer risk prediction.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
Lung	CNN ¹	BRFSS	Caffe	235,673	Text	Yes	[53]
Any	RF ²	COSMIC, dbSNP	R, HMMER, Dojo	200, 800	Text	No	[54]
Any	SVM ³	COSMIC, SwissVar, Swiss-Prot	Libsvm	6326	Text	No	[55]
Breast, Thyroid, Kidney	RF	TCGA:BRCA, TCGA:THCA, TCGA:KIRP	Java, Weka, YARN, MLlib	897, 571, 321	Text	No	[56]
	DT ⁴	TCGA:BRCA	unknown	897	Text	No
	SVM	TCGA:BRCA	unknown	897	Text	No
	BN	TCGA:BRCA	unknown	897	Text	No
CRC	BN	NSHDS	R, Visualizations with Cytoscape	1676	Text	Yes	[57]
Breast	ANN ⁵	Private	Matlab	62,219	Images, Text	No	[58]
	CNN, SVM	unknown	R	500	Images, Text	No	[59]
	CNN, KNN ⁶	unknown	R	500	Images, Text	No	[59]
	GBM ⁷, SVM	KBCP, OBCS	XGBoost, Sklearn, esyN, Matplotlib, Python	696, 923	Text	Yes	[60]
Gastric	GBM	Private	XGBoost	1431	Text	No	[61]
Gastric	LR ⁸	Private	unknown	1431	Text	No	[61]
Skin	ANN	NHIS	unknown	462,630	Text	No	[62]
Ovarian	KNN, LDA ⁹, SVM, ELM ¹⁰	IOTA tumor images database	Matlab	348	Images	No	[63]
Cervical	CNN	Private	Caffe	20,000	Images	No	[64]

¹ CNN, Convolutional Neural Network; ² RF, Random Forest; ³ SVM, Support Vector Machines; ⁴ DT, Decision Tree; ⁵ ANN, Artificial Neural Network; ⁶ KNN, K-Nearest Neighbours; ⁷ GBM, Gradient Boosting Machines; ⁸ LR, Logistic Regression; ⁹ LDA, Linear Discriminant Analysis; ¹⁰ ELM, Extreme Learning Machines.

Table 2. Summary of studies analysed in Section 3.2 about cancer recurrence.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
CRC	KNN, SVM, GBM, ANN, DT, RF	GEO, ArrayExpress	R	50	Text	Yes	[65]
CRC	LR, DT, GBM	BioStudies database	Python, R	800	Text	Yes	[66]
Breast	SVM, ANN, Regression	unknown	SPSS, R	733	Text	No	[67]
	SVM, ANN, DT	ICBC	Weka	1189	Text	No	[68]
	SVM, RO ¹	BioBIM	Java	318	Text	Yes	[69]
Breast	SSL ²	GEO, I2D	C++	194,988	Text	Yes	[70]
CRC	SSL ²	GEO, I2D	C++	194,988	Text	Yes	[70]
Oral	BN, ANN, SVM, DT, RF	unknown	unknown	86	Text, Images	Yes	[71]
Cervical	SVM, DT, ELM	Chung Shan Medical University Hospital Tumor Registry	unknown	168	Text	Yes	[72]

¹ RO, Random Optimization; ² SSL, Semi-Supervised Learning.

Table 3. Works applying ML to forecast cancer progression.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
Lung	RF	Multicenter Clinical Trials	Matlab2016, SPSS23	72, 32, 31	Images	No	[73]
Lung, Breast, Renal, CRC	TL ¹	TRACERx, [74,75]	ClonEvol	768	CCF, binary data	Yes	[76]
Lung	RNN	TCGA	Matlab	506, 253	Numbers	No	[77]
CRC	RNN	TCGA	Matlab	506, 253	Numbers	No	[77]
Breast	ANN	[78]	unknown	16	Numbers	No	[79]
Head and Neck	LR	GSE57441, GSE9844	GraphPad Prism	330	Mass spectra	No	[80]
Skin	Weka-FCBF ², SVM, PCA ³, ExtraTrees, KNN, RF, LR, Ridge	TCGA	caret, scikit, OmicsMarkeR, Rtsne, scatterplot3d	371, 354, 371	Numbers	No	[81]

¹ TL, Transfer Learning; ² Weka-FCBF, Waikato Environment of Knowledge Analysis—Fast Correlation Based Filter; ³ PCA, Principal Component Analysis.

Table 4. Manuscripts applying ML to estimate drug doses or finding drug combinations for cancer therapies.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
Prostate	ANN	UCSD #140520 study	unknown	66	Text, Images	unknown	[82]
	ANN	UCSD #140520 study	unknown	66	Text, Images	No	[83]
	CNN	unknown	Keras, TensorFlow	72	Images	No	[84]
Breast	DSS ¹	Local database	unknown	unknown	DB-stored medical records	Yes	[85]
Any	LR, SVM, RF, GBM	AstraZeneca, DREAM consortium	sklearn, xgboost	2790	Numbers	Yes	[86]
	MVA ² on Undirected Graphs	GDSC, CCLE, CTRP	R, Matplotlib, Graphviz	700	CSV, Text	Yes	[87]
	ANN	[88]	TensorFlow	23,062	Compounds, Cell lines	Yes	[89]
	RF	Princess Margaret Cancer Centre	unknown	383	Images	No	[90]
	CNN	PASCAL VOC 2012	TensorFlow	1464	Images	No	[91]
	CNN	PASCAL VOC 2012	Caffe, TensorFlow	1464	Images	No	[92]
	ANN	NCI database	unknown	141	Text	Yes	[93]

¹ DSS, Decision Support Systems; ² MVA, Multivariate Analysis.

Table 5. List of works presented in Section 3.5 about the prediction of therapy outcome in cancer patients.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
CRC	CNN	Akershus University Hospital, Aker University Hospital, Gloucester Colorectal Cancer Study, VICTOR trial	TensorFlow	12 × 10⁶	Images	No	[94]
	RF	Teikyo University Hospital, Gifu University Hospital	unknown	54	Medical Records	No	[95]
	RF, SVM, ANN, DT, KNN, GBM	GSE19860, GSE28702, GSE72970	caret, class, e1071, gbm, tree, randomForest, RSNNS	50	Raw data	No	[65]
	LR, DT, GBM	BioStudies database	Scikit-learn, R	800	Excel	No	[66]
	BN	ACTUR database	NCSS	5301	DB-stored medical records	Yes	[96]
	RF, ANN	Genomics of Drug Sensitivity in Cancer portal	Encog, randomForest	38,930	Raw data	No	[97]
	SVM	GSE19860, GSE28702, GSE72970	e1071	144	Raw data	No	[98]
	RF	GSE52735, GSE62080, GSE69657	limma, glmnet, Boruta, randomForest, pROC	58	Raw data	No	[99]
	SVM, LR	unknown	Orange	38	unknown	No	[100]
	SVM	Val d’Aurelle Regional Cancer Center	MAS 5.0	5 to 19	Numbers	No	[101]
Breast	Diagonal LDA, KNN	Nellie B. Connally Breast Center, M.D. Anderson Cancer Center, Instituto Nacional de Enfermedades Neoplásicas de Lima	dCHIP	133	Text, Numbers	No	[102]
	SVM, Recursive Feature Elimination	University of Heidelberg	e1071, ROC	52, 48	Numbers	No	[103]
	LR	unknown	unknown	84	Numbers	No	[104]
Bladder	DT	University of Southern California	SPSS	948	Numbers	No	[105]
Blood	LDA	FRALLE93 protocol	unknown	32	Numbers	No	[106]
Renal	SVM	National Wilms Tumor Study-5	e1071	250	Numbers	No	[107]
Ovary	Binary LR, Stochastic Regression	Duke University Medical Center, H. Lee Moffitt Cancer Center and Research Institute	Bioconductor	83	Numbers	No	[108]
Esophageal	SVM	unknown	unknown	46	Text, Numbers	No	[109]
Lung, Head and Neck, Meningioma, Laryngeal	DT, RF, ANN, SVM, LR, GBM	[110,111,112,113,114,115,116], Morin (forthcoming), [117,118,119,120]	caret	156, 137, 363, 179, 327, 139, 922, 257, 548, 131, 149, 188	Text	Yes	[121]

Table 6. Summary of works about ML and the likelihood of survival.

Cancer Type	AI Approach	Datasets	Software	Training Dataset Size	Data Types	Exp?	Reference
Breast	SVM	[122]	unknown	295	Numbers	No	[123]
	BN	[124]	unknown	97	Numbers	Yes	[125]
	SSL	SEER database	unknown	162,500	DB-stored medical records	No	[126]
	SSL Co-training	SEER database	unknown	162,500	DB-stored medical records	No	[67]
	ANN, LR, DT	SEER database	unknown	200,000	DB-stored medical records	Yes	[127]
Oral	SVM	unknown	unknown	69	unknown	No	[128]
Any	ANN	unknown	unknown	440	unknown	No	[129]
Lung	Linear Regression, DT, SVM, GBM, Custom ¹	SEER database	R	7830	DB-stored medical records	Yes	[130]
CRC	CNN, RNN	Helsinki University Central Hospital	Keras	420	Images	Yes	[131]
Brain	CNN	TCGA, South Australian public hospital system	Keras, TensorFlow	679	Images	Yes	[132]
Prostate	DT, BN, Cox	The Methodist Hospital	S-PLUS	1050	Text	Yes	[133]

¹ A custom ensemble of methods.

Table 7. List of code repositories or servers listed in the manuscript.

Task	Ref.	Code Availability ¹
Predict cancer risk	[60]	https://github.com/hambeh/breast-cancer-risk-prediction
Predict cancer risk	[56]	https://github.com/fcproj/BIGBIOCL
Predict progression	[77]	https://github.com/noamaus/LSTM-Mutational-series
Predict progression	[81]	https://webs.iiitd.edu.in/raghava/cancerspp/
Predict recurrence	[134]	http://ami.ajou.ac.kr/bcr/
Estimate drug synergy	[85]	https://protege.stanford.edu/
	[86]	https://github.com/rcelebi/dream-drugcombo https://www.synapse.org/#!Synapse:syn5605365/wiki/394725
	[87]	https://github.com/szen95/SEABED
	[89]	http://www.bioinf.jku.at/software/DeepSynergy/
	[91]	https://github.com/tensorflow/models/tree/master/research/deeplab
	[92]	http://liangchiehchen.com/projects/DeepLab.html
Predict therapy outcome	[121]	https://github.com/timodeist/classifier_selection_code
Predict survival	[126]	http://embio.yonsei.ac.kr/Park/ssl.php

¹ Access date: 29 March 2021.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
