1. Introduction
Colorectal cancer (CRC) is one of the leading causes of cancer-related mortality worldwide [
1]. The survival of patients is closely related to the tumor stage at the time of diagnosis, as five-year survival rates range from approximately 80% for early-stage disease to 63% for late-stage disease [
2]. To limit the potential side effects and unnecessary health care expenditure, the most common treatment for low-risk stage I and stage II colon cancer (CC) patients is curative surgery alone, and adjuvant therapy is only recommended to those with high-risk stage II and stage III–IV tumors [
3,
4]. However, the prognosis of low-risk patients who have undergone curative surgery is variable, since relapse is known to occur in a small fraction of these patients [
5,
6]. Therefore, identifying these relapse-prone patients could greatly contribute to the optimization of treatment selection. Furthermore, chemotherapy is only effective in a few patients, while some patients respond initially but will become resistant later. Therefore, there is an urgent need to find overall survival (OS) and recurrence-free survival (RFS) biomarkers to increase the mortality rate. Finding sensitive diagnostic and prognostic biomarkers is an urgent need for CRC patients. The prognosis for CRC is affected by a variety of features that are present at the initial diagnosis. While colonoscopy screening remains the most reliable diagnostic tool for CRC patients, it is necessary to develop a noninvasive biomarker for the early diagnosis of CRC patients before colonoscopy. The treatment of CRC fails if the tumors are aggressive at the time of diagnosis. Therefore, until now, it has been extremely important to find and improve the variables that define patient prognosis in the early stages of the disease. On the other hand, the availability of noninvasive, cost-effective, and highly accurate biomarkers is urgently needed to increase patient participation in affordable screening programs and facilitate the detection of early-stage CRCs. In contrast, recent reports have shown that noninvasive screening strategies such as cell-free circulating biomarkers are acceptable and feasible, as high-throughput molecular technology has rapidly developed in the past decade. An accurate and robust gene signature can be of great help for a more precise diagnosis of the disease in clinical practice.
Brain-derived neurotrophic factor (BDNF) is a protein belonging to neurotrophins, a family of growth factors that mediate neuronal migration, development, and differentiation via its tyrosine kinase receptors tropomyosin receptors TrkA, TrkB, and TrkC [
7]. The participation of BDNF and its TrkB receptor in oncogenesis has been the subject of many studies in recent years [
8,
9]. BDNF has many different carcinogenic effects via its receptor TrkB, inducing epithelial–mesenchymal transition-like transformation, anoikis resistance, and metastasis [
10]. It has also been shown in head and neck cancer that BDNF induces RAS activation via TrkB [
11]. In our recent finding, we have shown that BDNF also has an important role in CRC in a mice xenograft model and that BDNF mRNA and protein expression can be used as a prognostic marker for CRC patients [
12,
13].
Building upon this evidence, we have performed an unbiased, systematic, and comprehensive genome-wide discovery to identify a novel and robust gene signature (gene selection based on the significant correlation with BDNF and significant association with cancer pathology) that can predict tumor recurrence in patients with stage I, II, and III CRC. By analyzing multiple clinical cohorts that included a total of 1 987 CRC patients, we demonstrate that this selected gene signature overall and recurrence classifier has superior predictive power over clinicopathologic risk determinants and currently available commercial assays, and this combined with its robust performance even in FFPE tissues makes it attractive for relatively immediate clinical translation. Furthermore, we performed a comprehensive analysis of transcriptome profiling data and identified a novel signature for the early diagnosis of CRC patients.
3. Discussion
In our quest to develop a robust CRC prognostic signature, we successfully developed a five-gene signature that achieved excellent predictive values in OS and tumor recurrence in patients with stage I, II, and III CRC, which were validated in one tissue-based independent clinical cohort. Furthermore, when compared with the other clinicopathologic risk factors, our five-gene classifier remained the strongest prognostic indicator. To further highlight the clinical significance of our findings, our five genes significantly stratified patients from all clinical cohorts into high- and low-risk subgroups robustly.
The current TNM staging system based on The American Joint Committee on Cancer: the 7th edition (AJCC 7th edition) is closely associated with patient prognosis [
5]. The univariable and multivariable Cox regression analyses in our study consistently showed that tumor stage was a significant prognostic factor in both the training and validation cohorts. Therefore, stratification analysis was performed to investigate whether these five-gene signatures were independent of tumor stage. The results showed that it could also discriminate the high-risk patients from the stratified groups in the in silico discovery, training, validation, and clinical validation cohorts. One important question should be mentioned here: the ethnicity of the in silico and clinical cohorts differed in this study. Successful validation indicated that our gene signature was not only across populations but was also independent of tumor stage, as our signature was independent of tumor stage. Thus, the conclusions of our analyses were convincing.
Recent studies have shown that right- and left-sided CRCs have different epidemiologic and histological characteristics as well as underlying biological mechanisms [
14,
15]. However, when we stratified the patients by tumor location in our clinical cohort, we found that our five-gene signature could not discriminate high-risk patients from the subgroup of rectal carcinoma. This result indicated that our signature may apply to left-sided or right-sided CCs. Notably, there were very few patients with rectal carcinoma, so bias might have affected the stratification analysis. It is necessary to enlarge the sample size to generate more reliable results. Previously, a 13-gene-based classifier was reported to predict tumor recurrence in patients with stage II CRC [
16]; however, in this study, our five-gene classifier performed significantly better and illustrated its ability to predict OS and recurrence not only in patients with stage I and II CRC but also in patients with stage III CRC.
Although we did not perform a direct comparison, the OS and recurrence prediction values for our five-gene signature were superior to gene expression-based signatures offered by the ColoPrint and Oncotype DX assays [
17]. It would also be interesting to validate our markers or stratify them based on CDX2 as well as recently published immune scores in the future. An ideal prognostic classifier for CRC risk prediction should be robust, reproducible, and most importantly, potentially feasible in FFPE materials, which would eliminate the need to plan and invest methodologies to collect and preserve fresh-frozen tumor specimens. Our five-gene classifier successfully overcomes these barriers, as evidenced by its superior performance and independent validation in a cohort of FFPE specimens. The availability of ideal prognostic and predictive biomarkers is essential for achieving clinical goals in refining therapeutic decisions and thereby improving the survival and quality of life of patients with CRC.
ROC analysis showed that our five-gene signature was superior to tumor stage for prognostic evaluation. To further improve the prognostic prediction ability, we combined the five-gene risk model with TNM staging and lymph node metastasis. There was very little difference between the combined model and our gene signature, indicating that our five-gene signature could yield results alone or in combination with TNM staging and lymph node metastasis information. As a result of poor reproducibility, most established signatures have not been used clinically for prognostic prediction in CRC. The reasons for poor reproducibility are manifold. In early studies, small sample series and lack of validation in independent samples limited the strength of the conclusions. In addition, some gene signatures use too many genes for the construction of a model, which inhibits their clinical utility. Importantly, most studies of gene signatures are retrospective; good reproducibility is still hampered by the lack of validation in prospective multicenter studies [
18,
19].
As a new diagnostic test for determining the likelihood of recurrence in stage II CRC patients after surgical resection, the OncotypeDX colon cancer assay has been commercially available worldwide since 2010 [
17,
20]. The results indicated that this five-gene signature might be a useful tool for the management of CRC patients. As our study was retrospective, its reliability still needs further validation in a large prospective study.
The innovation of our research rests on the following aspects. First, the AUC of our five-gene signature was fairly large (>0.75), indicating good prognostic ability. Second, our study is a relatively systematic examination of prognostic gene signatures in CRC. Third, our study has several strengths related to the study design and analytical methods. The five-gene classifier was validated in independent in silico datasets as well as one independent population and IHC-based clinical cohort. As we developed a “risk prediction model” using our five-gene signature, the scores can be readily applied to independent, future prospective cohorts. Our assay also demonstrated effectiveness in FFPE tissues. Our signature is derived from a common cancer-related gene that plays a major role in cancer cell development, apoptosis, and metastasis; thus, this signature is closely related to cancer cell growth, apoptosis, and metastasis and should be suitable for prognostic assessment.
Although we did not have access to blood specimens in this study for prognostic assessment, we feel encouraged that given the stability and relative abundance of five- genes in circulation, it is very likely that our five-gene signature may eventually be translated into a blood-based, predictive, surveillance assay as OS and RFS. Furthermore, in this study, we proposed an mRNA biomarker panel for the detection of CRC in human plasma by measuring four mRNAs retrieved from bioinformatics analysis and comparing their sensitivities and specificities in one RNAseq and plasma-based GEO microarray cohort. We first investigated the upregulated four genes (
BDNF,
PTGS2,
CTNNB1, and
GSK3B) in the tumor sample expression profile in CC patient tissue using RNAseq data. In our recent study, we showed that
BDNF with
CYSLTR1 and
CD66B together or individually identified the high-risk group or were used as strong prognostic markers in CRC patients, and these markers had a significant role on CC progression in the mice xenograft model [
13,
21]. Whereas, another earlier study from our group showed that non-canonical WNT5A signaling in CC cells have an opposite effect than canonical β-catenin signaling on 15-PGDH (
HPGD) expression due to its ability to inhibit β-catenin (
CTNNB1) in CC cells [
22]. As shown in another study, even patients who were defined as having a poor prognosis had high nuclear β-catenin and COX-2 expression and low 15-PGDH expression [
23]. In our earlier studies, we provided first evidence that LTC
4, via CysLT
2R signaling, can induce 15-PGDH expression through the activation of the JNK/AP-1 pathway, and this activation in turn prompts the differentiation of CC cells [
24]. In a follow-up study, we revealed that the induction of 15-PGDH by LTC
4/CysLT
2R signaling axis in CC cells downregulated GLI1 and induced differentiation in CC cells [
25].
From a clinical standpoint, early detection is a vitally important factor for therapeutic decision making, especially in CRC, due to the lower mortality rate and poor symptoms in the early stage. If patients can be diagnosed in the early stage of cancer, before lymph node metastasis, then the surgical resection and design of the treatment strategy could be improved. An ideal biomarker should be noninvasive, accurate, inexpensive, specific, sensitive, reliable, and reproducible [
26,
27]. Consistent with this idea, body fluids are commonly used materials in the diagnosis of multiple human diseases [
26,
28]. In particular, plasma has significant advantages in reflecting gastrointestinal diseases [
29,
30]. To date, microRNAs (miRNAs) have been widely reported to be biomarkers in body fluids [
31]. Our data indicated that the expression patterns of all differentially upregulated four-gene signatures in patient plasma samples of clinical cohorts were consistent with the results of discovery and training analysis. The relative expression levels of four selected mRNAs in CRC plasma samples were higher than those in healthy control plasma samples for the plasma-based clinical cohort. CRC plasma mRNA levels were also associated with major clinicopathological factors. All these results strongly suggested that the selected mRNAs may be novel potential CC-associated biomarkers. Due to the low sensitivity or specificity of the existing blood biomarkers of CRC, a panel approach of combinations will allow us to pursue a more precise diagnosis. As shown in
Figure 5G, we evaluated the combination of our four gene indices using a logistic model. In the plasma from the CRC patients cohort, the AUC of the combined use of our four genes was up to 0.83, with sensitivity = 59.63% and specificity = 96% (
Figure 5G). These results mean that the diagnosis of the selected novel genes signature will have satisfactory specificity and sensitivity. Nevertheless, to confirm the specificity of this method, a larger cohort of patients should be analyzed in the future.
4. Materials and Methods
4.1. Patient Cohorts
This study included multiple clinical cohorts with a total of 1850 patients. These cohorts include patients from the publicly available dataset from GSE44076 (N = 246), GSE44861 (N = 92), GSE41258 (N = 240), GSE37364 (N = 65), GSE110224 (N = 34), The Cancer Genome Atlas (TCGA-COADREAD; N = 404), and GSE39582 (N = 585) as well as clinical validation tissue-based cohorts of Malmö-CC (N = 144), Malmö-CC RNA-seq (N = 12), and plasma from Linköping-CRC cohort (N = 28) (
Table 1). The first clinical cohort (cohort 1 as the prognostic validation cohort) comprised formalin-fixed, paraffin-embedded tissues (matched normal and tumor samples) from 72 patients who were enrolled at Malmö Hospital during 1990 and were followed up after 120 months [
32]. The second clinical cohort (cohort 2 as the diagnostic cohort) included plasma from 19 CRC patients and 9 healthy control plasma samples [
33]. The studies were conducted in accordance with the Declaration of Helsinki, and the respective institutional review boards approved the study.
4.2. Identification of the mRNA Signature from Microarray and Genome-Wide RNA Sequencing Data
Four public datasets were analyzed in the discovery phase
for marker selection: CRC microarray data from GSE44076 (N = 246), GSE44861 (N = 92), GSE41258 (N = 240), and GSE37364 (N = 65) from the Gene Expression Omnibus (GEO). One RNA sequencing dataset from TCGA (N = 404) and microarray dataset GSE39582 (N = 585) from GEO were used as the in silico training and validation cohort, respectively. The discovery phase datasets were chosen according to the availability of the sample sizes and types of tissues. The GSE444076 microarray dataset contains mRNA expression of fresh frozen (FF) tissue from healthy control mucosa (HCM, N = 50), matched normal colon mucosa (NCM, N = 98), and matched primary colon tumor (PCT, N = 98), which contained a satisfactory number of sample size in stage II and III CRC in each group (
Table 1). Whereas, the GSE41258 dataset contains a significant amount of formalin-fixed paraffin-embedded tissue with stage I, II, III, and IV samples. We also used two other datasets (GSE44861 and GSE37364), which contain a sufficient number of matched NCM and matched PCT, but other clinical information was missing in both the datasets. The TCGA-COADREAD (RNA-seq) and GSE39582 (microarray) datasets were used as in silico training and validation cohorts because both the datasets contain a significant number of stage I, II, and III samples with all the clinical information (including OS and RFS information,
Table 1). A total of five cancer-related genes were selected based on the Gene Ontology and cancer hallmark database (
Figure S3). The in silico discovery cohort was used to identify differentially regulated cancer-related gene signatures between adjacent normal and tumor samples in CRC patients. The mRNA expression levels, measured by reads per million mRNA mapped (RPM), were first log2 transformed. We checked the expression of five previously selected genes that reached significance (
p = 0.05) and log2-fold change > ±1. The differentially regulated genes were represented as upregulated and downregulated in the volcano plot for all the datasets (
Figure S4). We used these five genes to build the model using the backward elimination Wald test to check the feasibility of combining the five genes in the prognostic model. Differential mRNA expression analysis was subsequently performed between normal and tumor tissues at five-year survival using the Wilcoxon signed-rank test. For the in silico validation of identified mRNAs, we analyzed one additional independent cohort, GSE39582 (death = 112, alive = 361), and the TCGA cohort (death = 75, alive = 249). The mRNA expression profiles were normalized using the robust multiarray average (RMA) algorithm in R. We downloaded preprocessed data from GEO using the Bioconductor package “GEOquery.” Using multivariate Cox regression analysis, we calculated risk scores and assessed the prognostic performance of the mRNA signature-based survival analysis using Youden’s J Statistic index association value of the predicted risk scores in each dataset as the cutoff.
4.3. Immunohistochemistry
The paraffin-embedded samples were cut into 4-µm sections, deparaffinized, and incubated in 10 mM citrate buffer (pH 6.0, S1699; Dako, Glostrup, Denmark) for 20 min in a microwave oven. Blocking buffer (Dako) with 10% fetal bovine serum in PBS was added to the slides for 10–20 min, and then, a specific serum-free protein block (Dako) was added for 30 min at RT. After washing, the sections were stained using Dako Autostainer Plus18 with the following primary antibodies: rabbit monoclonal antibody against BDNF (dilution 1:1000, pH 6. RT, 1 h; ab108319; Abcam, Cambridge, UK), rabbit polyclonal antibody against 15-PGDH (dilution 1:500, pH 6. RT, 1 h; Novus Biologicals, Cambridge, UK), mouse monoclonal antibody against β-catenin (dilution 1:500, pH 6. RT, 1 h; BD Transduction Laboratories, Becton, NJ, USA), rabbit polyclonal antibody against GSK-3β (dilution 1:50, pH 6. RT, 1 h; Cell Signaling Technology, Danvers, MA, USA), and rabbit polyclonal anti-COX-2 (dilution 1:150, pH 6. RT, 1 h; ab52237; Abcam, Cambridge, UK). The secondary antibody Envision+ System-HRP-labeled polymeric anti-rat/anti-rabbit antibody (Dako) was visualized using 3,3′-diaminobenzidine (DAB) substrate (Vector Laboratories Inc., Burlingame, CA, USA) and counterstained with hematoxylin.
Four genes (BDNF, PTGS2, GSK3B, and CTNNB1) were significantly upregulated, and one gene (HPGD) was significantly downregulated in primary tumor tissues compared with adjacent normal tissues in all five in silico datasets. A correlation matrix and heatmap were generated for the selected markers, and we found a significant correlation between the mRNA expression of BDNF and 4 other genes, CTNNB1, PTGS2, GSK3B (β-catenin, COX-2, and GSK-3β respectively), and HPGD (15-PGDH) in primary tumor tissues compared to adjusted colon mucosa and colon mucosa from healthy donors.
We compared gene expression between cancer and normal tissue from a training TCGA dataset (N = 324) and built a prognostic model and calculated the risk score of selected genes in stage I, II, and III CRC patient populations. We performed OS and RFS prognostic analyses based on the in silico data and used an mRNA-based prognostic model to predict the risk score for CRC patients using univariate and multivariate analyses after including the significant clinical factors in the in silico training and validation cohorts. By combining all these results, we extracted the five-gene signature and suggested it as an independent predictor panel with the best prognostic assessment.
For validation, the selected prognostic gene signatures in our clinical cohort from Malmö Hospital (a TMA with 58 CC patients from both normal and tumor areas, stage I–III) were analyzed using IHC. Human tumor tissues and their normal matched pair colon tissues were stained against BDNF, β-catenin, COX-2, GSK-3β, and 15-PGDH, and the results from the evaluation of the protein levels were consistent with in silico data.
4.4. RNA Isolation and mRNA Expression Analysis from Tissue and Plasma Samples
Six colon adenoma patients and six matched normal tissues were collected from the Malmö Cancer Hospital. Those adjacent normal tissues were confirmed after performing histopathology by a trained pathologist. Plasma samples from 19 CC patients and 9 normal healthy controls were collected for validation as diagnostic markers. Anti-coagulated blood was collected in EDTA tubes and centrifuged at 1600× g for 10 min at 4 °C. The clear upper layer plasma was collected and recentrifuged at 16,000× g for 10 min at 4 °C to remove residual cell pellets. After that, cell-free plasma was collected and preserved in 1 mL Qiazol Reagent (Qiagen, Germantown, MD, USA) before storage at −80 °C.
Total RNA was extracted from the colon tumor and adjacent normal samples, which was followed by quantification using a bioanalyzer (Agilent Technology, Santa Clara, CA, USA). The extracted RNA was sent to the Lund University sequencing facility for transcriptome analysis. A sequencing library was prepared using a TruSeq Targeted RNA Expression Kit (Illumina, San Diego, CA, USA), and data were filtered using the Trimmomatic package. The mRNA expression levels, measured by reads per million mRNA mapped (RPM), were first log2 transformed.
Total RNA from the plasma samples was isolated using an RNeasy mini kit (QIAGEN, Germantown, MD, USA). mRNA expression analysis was performed using an Agilent real-time PCR system (Agilent Technology, Waltham, MA, USA). Specific TaqMan probes, rather than the more commonly used gene-specific primers, of selected genes were purchased from Thermo Scientific. qRT–PCR assays were conducted using an RNA reverse transcription kit (Applied Biosystems, Waltham, MA, USA) using a Thermosphere probe mix kit. The relative expression of mRNA was determined by the 2-Δct method using β-actin as a normalizer, as described previously. All results are expressed as the mean ± SD of three independent experiments.
To evaluate whether our suggested panel can be considered an accurate indicator of early diagnosis in CRC patients, we conducted a study in which we analyzed the mRNA level of the selected markers in plasma from a plasma-based CRC patient cohort (N = 28). We built a logistic model and calculated the risk score for the four-gene signature (only the upregulated genes were selected due to the lower concentration of mRNA in plasma samples) for use as an early diagnosis marker panel using tissues and plasma samples from CRC patients, after comparing the gene expression between healthy mucosa and primary CC tumors and healthy controls plasma with cancer patient plasma samples. The risk score will be validated in a plasma-based clinical cohort using RT-PCR methods for the four-panel gene signature.
4.5. Statistical Analysis
Statistical analyses were performed using IBM SPSS version 20 (Chicago, IL, USA), MedCalc version 18 (Ostend, Belgium), GraphPad Prism version 8.0 (La Jolla, CA, USA), and R 3.2.4. Statistical differences between mRNAs and various clinicopathologic factors were determined by the χ2 test. The Benjamini–Hochberg method was used to correct for multiple hypothesis testing wherever applicable. All statistical tests were two-sided, and a p-value of <0.05 was considered significant. OS was defined from the day of surgery to death or the end of follow-up and was analyzed by the log-rank test. We performed receiver operating characteristic (ROC) curve analysis to evaluate the predictive power of the selected gene signature. All five mRNA expression values derived from the transcriptome datasets were used to build an overall survival classifier (OSC) using Cox proportional hazard regression. The risk scores derived from the five-gene OSC Cox model were used to plot the area under the curves (AUC). The risk scores were calculated using the formula derived from the Cox model. The risk score was validated using the immune reactivity score from IHC slides in a tissue-based cohort for OS prognostic assessment. To evaluate the association of protein expression in CC tissue with OS, univariate and multivariate Cox proportional hazards regression models were applied, and hazard ratios (HRs) together with 95% confidence intervals (CIs) were calculated to determine the risk of death or cancer recurrence. The multivariate model was adjusted for established prognostic factors such as age, sex, lymph node metastasis (LNM), tumor-node-metastasis (TNM) stage, and tumor size. All patients with incomplete or missing cores were excluded from the analysis. To plot the Kaplan–Meier curves, we dichotomized the patients into low- or high-risk groups based on Youden index-derived cutoff values (X-tile software 3.6.1, Yale School of Medicine). Additionally, we performed univariate and multivariate Cox proportional hazard regression models using clinicopathologic variables and OSC to calculate estimated hazard ratios (HRs). Only the significant variables in the univariate model were used to perform the multivariate analysis. The differences in mRNA levels between plasma of CRC patients and healthy control plasma samples were assessed using the t-test for paired data. The correlations between mRNA levels and clinicopathological factors were further analyzed by one-way analysis of variance (ANOVA). We performed receiver operating characteristic (ROC) curve analysis to evaluate its diagnostic value. All four mRNA expression values derived from RT-PCR were used to build an mRNA signature early diagnosis classifier using a logistic regression model. The risk scores derived from the four-mRNA signature logistic model were used to plot the area under the curves (AUCs). Additionally, we performed univariate and multivariate logistic regression models using clinicopathologic variables and mRNA signatures. Only the significant variables in the univariate model were used to perform the multivariate analysis. Experimental reproducibility was determined by the Pearson correlation test.
In this study, the limitations concerning clinical sample size can be ignored because the Swedish ethnic group has a very small population size, and that the study was well backed up with a large number of CRC samples from different public databases. Another limitation is that the study contains only one ethnic population for independent validation for the prognostic and diagnostic markers. However, the fact that our study is one of the first to perform that the same genes could be used as prognostic as well as diagnostic markers and significantly separate the high-risk group from the low-risk group and provide insights for early diagnosis of CRC patients might make it very useful.