1. Introduction
Copy number alterations, both gains and losses, are common molecular lesions in cancer and complement mutations and epigenetic changes in the development of neoplasia [
1]. Amplifications of a specific chromosomal locus usually containing several genes are observed in specific cancer types but not in others, implying that the molecular genomic environment promotes and favors stabilization of these loci possibly due to the resident oncogene content. One of the best-known cancer amplicons is observed around the
ERBB2 locus, at chromosome 17q12, in a subset of breast cancers which become sensitive to treatments specifically targeting the over-expressed HER2 (Human EGFR family Receptor 2) receptor. Amplifications of 17q12 are focal, and these cancers rarely have polysomy of the whole chromosome [
2]. In addition, segments surrounding the amplified locus may be amplified independently or lost, a fact well illustrated by the gene encoding topoisomerase IIα (TOP2A) at 17q21, which is commonly amplified or lost in
ERBB2-amplified breast cancers [
3]. The centromeric region of chromosome 17 is often co-amplified with
ERBB2, but even in these cases there is no associated polysomy of the whole chromosome 17 [
4,
5]. Amplifications of the locus surrounding
ERBB2 are specific for breast cancer and are rarely seen in other cancer types besides a subset of gastric adenocarcinomas [
6]. Genes in close proximity to
ERBB2, including
GRB7 (Growth Factor Receptor-Bound protein 7), coding for an adapter protein binding to EGFR (Epidermal Growth Factor Receptor) family tyrosine kinase receptors, and
STARD3 (Steroidogenic Acute Regulatory-related lipid transfer Domain containing 3), encoding for a protein involved in lipid trafficking, are most commonly co-amplified with
ERBB2 [
7]. A different amplicon based at locus 11q13 contains the gene
CCND1, which encodes cyclin D1, a protein implicated in hormonal therapy resistance in breast cancer and targeted therapeutically by inhibitors of cyclin-dependent kinases [
8]. Other genes such as those encoding for Fas-Associated Death Domain (
FADD) or for the cytoskeleton scaffold protein cortactin are also oncogenes frequently co-amplified with
CCND1 at 11q13 or independently in some cases [
8].
This paper investigates another amplicon at locus 8p11.23, commonly observed in breast cancers, using publicly available genomic data and other freely available resources informing on the amplicon genes and their products and putative clinical significance. Although several possible oncogenes have been implicated as critical in this locus, there is no clear consensus on the dominant gene or genes that are critical for the establishment and fixation of the amplicon in breast cancer or other cancer types [
9,
10]. The focus of this investigation is the frequency of the amplicon in different primary cancer types, the most common gene content of the amplified region in breast cancers, expression of the genes commonly amplified and possible prognostic implications. In addition, available data are examined to derive information for the most critical oncogenes residing in the amplicon with a focus on breast cancer.
2. Methods
The frequency of amplifications in genes at 8p11 was determined in different studies from The Cancer Genome Atlas (TCGA,
www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) with data listed in the cBio Cancer Genomics Portal (cBioportal,
http://www.cbioportal.org). cBioportal is a site that enables interrogation of genomic data from publicly available studies [
11]. cBioportal associates genomic data in studies from TCGA and other sources with patient clinical characteristics and survival outcomes [
12,
13,
14]. Analysis of copy number alterations in TCGA is performed with the GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm, in which a score of 2 or above denotes putative amplification of a gene. Aneuploidy Score (AS) is calculated as the sum of the number of chromosome arms in each sample that have copy number alterations (gains or losses). A chromosome arm is considered copy number altered, gained or lost, if there is a somatic copy number alteration in more than 80% of the length of the arm, as calculated by the ABSOLUTE algorithm from Affymetrix 6.0 SNP arrays [
15]. Chromosomal arms with somatic copy number alterations in 20% to 80% of the arm length are not called and chromosomal arms with somatic copy number alterations in less than 20% of the arm length are considered not altered. mRNA expression grids in cBioportal are constructed and normalized using the RSEM algorithm [
16]. The z score used in the mRNA expression comparisons denotes the number of standard deviations by which the expression of a gene in a sample differs from the mean expression of the same gene in samples that are copy neutral for that gene. In addition to the TCGA cohort, analysis for breast cancer was performed with publicly available data from the METABRIC cohort [
17]. Two genomic studies of metastatic breast cancer patients, the French INSERM study and the metastatic breast cancer project study, included in cBioportal, were evaluated for determination of the frequency of amplifications of the genes at chromosome 8p11.23 or any additional lesions developing in these genes as breast cancers progress [
18].
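The arm-level calling rule, the Aneuploidy Score and the expression z score described above can be illustrated with a minimal sketch. The function names, the toy arm fractions and the toy expression values are hypothetical, and the use of the sample standard deviation is an assumption; this is not the code used by TCGA or cBioportal.

```python
from statistics import mean, stdev


def call_arm(altered_fraction):
    """Classify one chromosome arm from the fraction of its length that
    carries a somatic copy number alteration (thresholds from the text)."""
    if altered_fraction > 0.8:
        return "altered"
    if altered_fraction < 0.2:
        return "not_altered"
    return "not_called"  # 20% to 80% of the arm length


def aneuploidy_score(arm_fractions):
    """Count the arms called as altered (gained or lost) in one sample."""
    return sum(1 for f in arm_fractions.values() if call_arm(f) == "altered")


def expression_z_score(sample_value, copy_neutral_values):
    """Z score of one sample relative to the samples that are copy neutral
    for the gene (sample standard deviation is an assumption here)."""
    mu = mean(copy_neutral_values)
    sd = stdev(copy_neutral_values)
    return (sample_value - mu) / sd


# Hypothetical sample: fraction of each arm altered (illustrative values only).
sample_arms = {"1q": 0.95, "8p": 0.85, "8q": 0.50, "17q": 0.10}
score = aneuploidy_score(sample_arms)

# Hypothetical expression values for one gene.
z = expression_z_score(14.0, [8.0, 10.0, 12.0])
```

In this toy sample only 1q and 8p exceed the 80% threshold, so they are the only arms that contribute to the score; 8q falls in the uncalled range and 17q is considered not altered.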
Expression of proteins of interest from chromosome location 8p11.23 in breast cancer was evaluated using data from the Human Protein Atlas (
www.proteinatlas.org), a publicly available database of protein expressions in human normal and neoplastic tissues [
19]. The database contains a semi-quantitative immunohistochemistry-based evaluation of the expression of proteins of interest.
Promoter sequences of genes at 8p11.23 as listed in the Eukaryotic Promoter Database (EPD-
www.epd.epfl.ch) were interrogated for the presence of binding sites of the transcription factors ERα, ERβ, E2F1, FOXA1 and GATA3 as listed in the 8th release of the JASPAR open-access database of transcription factor binding profiles [
20].
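As an illustration of how a promoter sequence can be searched for putative binding sites with a JASPAR-style count matrix, the following is a minimal sketch of a position weight matrix scan. The motif, the sequence, the pseudocount and the score threshold (a fraction of the maximum attainable score) are all hypothetical choices made for this example; they do not reproduce the search tool or the parameters used in this study.

```python
import math


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert a position frequency matrix (base -> counts per position)
    into a log2-odds position weight matrix."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, min_fraction=0.8):
    """Return (position, strand, score) for every window whose score is at
    least min_fraction of the maximum attainable score. Minus-strand
    positions are given in reverse-complement coordinates. The sequence is
    assumed to contain only uppercase A, C, G and T."""
    width = len(pwm)
    max_score = sum(max(col.values()) for col in pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(s) - width + 1):
            window = s[start:start + width]
            window_score = sum(pwm[i][base] for i, base in enumerate(window))
            if window_score >= min_fraction * max_score:
                hits.append((start, strand, window_score))
    return hits


# Hypothetical 4-bp motif (consensus TGAC) and hypothetical promoter fragment.
toy_counts = {
    "A": [0, 0, 10, 0],
    "C": [0, 0, 0, 10],
    "G": [0, 10, 0, 0],
    "T": [10, 0, 0, 0],
}
toy_promoter = "AATGACCGTCAAA"
hits = scan(toy_promoter, log_odds_matrix(toy_counts))
```

Published tools usually express the threshold as a p-value or a relative score between the minimum and maximum matrix scores, so the simple fraction used here should be read as a simplification.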
The effect of the mRNA expression level of each of the amplicon genes on survival of breast cancer patients was examined with data derived from the online publicly available platform Kaplan Meier Plotter (
www.kmplot.com) [
21]. The cut-off separating high-expression from low-expression samples for each gene, used as a proxy for amplified and non-amplified status, was set at the upper quartile of expression, which is the closest cut-off provided by the platform to the percentage of breast cancer cases with the 8p11 amplicon.
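The upper-quartile dichotomization described above can be sketched as follows. The sample identifiers and expression values are hypothetical, and the quantile interpolation method is an assumption, since the exact method used by the Kaplan Meier Plotter platform is not stated here.

```python
from statistics import quantiles


def split_at_upper_quartile(expression):
    """Split samples into a high group (expression above the upper quartile)
    and a low group (the three lower quartiles)."""
    _, _, q3 = quantiles(list(expression.values()), n=4, method="inclusive")
    high = sorted(s for s, v in expression.items() if v > q3)
    low = sorted(s for s, v in expression.items() if v <= q3)
    return q3, high, low


# Hypothetical expression values for one gene in eight samples.
toy_expression = {f"S{i}": float(i) for i in range(1, 9)}
q3, high_group, low_group = split_at_upper_quartile(toy_expression)
```

The two groups would then be compared with a survival test; that step is not shown here.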
Categorical data were compared with Fisher’s exact test or the χ² test, and continuous data with the t-test. Kaplan–Meier survival curves were compared using the log-rank test. All statistical comparisons were considered significant if p < 0.05, except for the survival analysis according to mRNA expression levels, which was considered significant at the p < 0.0005 level to account for multiple comparisons.
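For readers who wish to reproduce this type of categorical comparison, a Pearson χ² test on a 2 × 2 table can be sketched as below. The counts are hypothetical and chosen only for illustration; the absence of a continuity correction is an assumption, and the function is not the software used in this study.

```python
import math

ALPHA = 0.05              # general significance threshold stated in the text
ALPHA_SURVIVAL = 0.0005   # stricter threshold used for the survival analyses


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = amplified / non-amplified cases,
# columns = feature present / feature absent.
stat, p_value = chi_square_2x2(30, 70, 10, 90)
significant = p_value < ALPHA
```

When any expected cell count is small, Fisher's exact test is the usual alternative, as noted in the text.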
3. Results
The previously reported amplified chromosomal area in breast cancer at the short arm of chromosome 8 locus p11 spans an area that contains several coding genes (
Table 1). A survey of several TCGA studies shows that the genes at 8p11.23 are amplified as a block in 12% to 17.5% of squamous lung carcinomas, 11% to 12% of breast cancer samples and a slightly lower percentage (7–9%) in bladder cancer samples (
Table 2). In several other cancers the amplicon is present in 1% to 4% of cases (
Figure 1) and, in these cancers, genes of the locus are co-deleted in a similar percentage of cases (not shown). Co-deletions are also observed in about 3% of bladder cancer cases.
In breast cancer, the genes at the most telomeric part of the amplicon, ERLIN2 and ZNF703, display the highest frequency of amplification (11.8% and 11.7% of breast cancer cases, respectively), while the frequency of amplification decreases in the more centromeric genes of the 8p11.23 locus (
Table 3). The most centromeric gene of locus 8p11.23 is FGFR1, which is amplified in 10.9% of breast samples in TCGA. The frequency of amplification decreases further centromeric to 8p11.23, in genes of loci 8p11.22 and 8p11.21. Those genes are amplified in less than 75% of ZNF703-amplified cases and their expression is not increased in those cases (not shown). The area telomeric to 8p11.23 is a desert of transcribed genes, and the next genes have a much lower amplification frequency than ERLIN2 and ZNF703. Similar but slightly higher amplification frequencies, from 13% to 14.3% of the total breast cancer cohort, are observed in the METABRIC study, with the most telomeric genes again showing the highest prevalence. In contrast, in squamous carcinomas of the lung, the highest frequency of amplification is observed in the most centromeric genes NSD3, LETM2 and FGFR1 (
Table 2).
The frequency of co-amplification of each of the other genes at 8p11.23 in the 124 samples of the TCGA breast cancer study that have ZNF703 amplifications is shown in
Figure 2. In samples without ZNF703 amplifications, the rest of the genes of the amplicon are amplified in 0.2% to 1.7% of cases (not shown). Regarding clinicopathologic characteristics, ER+/HER2-/high proliferation cancers were over-represented in the ZNF703-amplified group (as representative of the presence of the whole amplicon) (
Table 4). Cancers with positive ER, negative PR and of the luminal B phenotype in the PAM50 classification were also over-represented in the ZNF703-amplified group (
Table 4).
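The co-amplification frequencies relative to ZNF703 can be derived from a per-gene, per-sample table of GISTIC calls along the following lines. The data structure, the sample identifiers and the calls are hypothetical; only the amplification threshold (a GISTIC score of 2 or above) is taken from the Methods.

```python
AMPLIFIED = 2  # GISTIC score at or above which a gene is called amplified


def coamplification_frequency(gistic, index_gene="ZNF703"):
    """For every other gene, return the fraction of index-gene-amplified
    samples in which that gene is also amplified."""
    index_samples = [s for s, call in gistic[index_gene].items()
                     if call >= AMPLIFIED]
    if not index_samples:
        return {}
    frequencies = {}
    for gene, calls in gistic.items():
        if gene == index_gene:
            continue
        co_amplified = sum(1 for s in index_samples
                           if calls.get(s, 0) >= AMPLIFIED)
        frequencies[gene] = co_amplified / len(index_samples)
    return frequencies


# Hypothetical GISTIC calls (gene -> sample -> score), for illustration only.
toy_calls = {
    "ZNF703": {"S1": 2, "S2": 2, "S3": 0, "S4": 2},
    "ERLIN2": {"S1": 2, "S2": 2, "S3": 0, "S4": 2},
    "FGFR1": {"S1": 2, "S2": 1, "S3": 2, "S4": 2},
}
frequencies = coamplification_frequency(toy_calls)
```

Restricting the denominator to the index-gene-amplified samples treats ZNF703 as the representative of the whole amplicon, as is done in the text.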
The distribution of Aneuploidy Scores (ASs) was not significantly different in cancers with ZNF703 amplifications compared with ZNF703 non-amplified breast cancers (
Figure 3). The distribution of Tumor Mutation Burden (TMB) was also similar in cancers with ZNF703 amplifications compared with ZNF703 non-amplified cancers, besides the subset with TMB above 13 where non-amplified tumors were more commonly observed, in the METABRIC cohort (
Figure 4). However, the frequency of high TMB is low overall in breast cancer. Frequencies of copy number alterations in specific chromosomal arms differ significantly between ZNF703-amplified and non-amplified cases. Most interestingly, gain of the 8p arm is observed in only one case (0.8%) in the ZNF703-amplified cohort and in 10.1% of cases in the non-amplified cohort (χ² p < 0.001) (
Figure 5). In contrast, losses of chromosome arm 8p are present in 68.5% of ZNF703-amplified cases and in only 38.2% of ZNF703 non-amplified cases (χ² p < 0.001) (
Figure 6). Other chromosome arms with significant differences in the frequencies of putative copy number alterations between the ZNF703-amplified and non-amplified cohorts in the breast cancer TCGA study include 17q and 1q (χ² p = 0.006 and 0.004, respectively) (
Figure 5).
Evaluation of the median mRNA expression of genes of the amplicon in the TCGA breast cancer study discloses that a few genes (GOT1L1, ADRB3, STAR and LETM2) have very low or no expression in breast cancers. Among these, ADRB3 and STAR, as well as ADGRA2, show no expression at the protein level in the Human Protein Atlas and thus are unlikely to be important in breast carcinogenesis. Moreover, for these genes, as well as for ZNF703, RAB11FIP1, EIF4EBP1 and FGFR1, amplification correlates poorly with increased mRNA expression in TCGA (
Figure 7). In contrast, ERLIN2, PLPBP, BRF2, ASH2L, BAG4 and NSD3 are most often upregulated at the mRNA level in cases with amplifications of the 8p11.23 amplicon. Among the 150 ER+/HER2-/Proliferation high breast cancers in the ZNF703-amplified group of the METABRIC study cohort, putative increased mRNA expression (z score above 2) was most commonly observed in BRF2 (46% of cases), ERLIN2 (42.7% of cases), NSD3 (40.7% of cases), ASH2L (39.3% of cases), RAB11FIP1 (38.7% of cases) and PLPBP (38.7% of cases). Other genes of the amplicon show lower over-expression frequencies in ER+/HER2-/Proliferation high breast cancers (
Table 5).
A survey of promoter binding sequences disclosed that all genes in the amplicon possess several putative binding sequences in their promoters for ERα, ERβ and the transcription factor E2F1 which is a target of activation by the cyclin D/CDK4 cascade in breast cancer with high proliferation fraction (
Table 6). In addition, several but not all genes possess promoter binding sites for the breast cancer pioneer factors FOXA1 and GATA3 and the transcription factor NFE2L2, a master regulator of detoxification programs which has been proposed to co-operate with BRF2 in resetting the cell oxidative stress tolerance limit (
Table 6) [
22].
Expression of the protein products of 8p11.23 genes in breast cancer, using immunohistochemistry with commercially available monoclonal and polyclonal antibodies, is shown in
Table 7. This evaluation includes cases with and without 8p11.23 amplifications. Expression of a protein in breast cancers independently of sub-type and amplicon presence implies that the protein has the potential to be a pathogenic player. Most proteins are moderately to highly expressed in several breast cancer cases with at least one of the antibodies checked, the exceptions being ADGRA2, ADRB3 and STAR, whose genes are also not over-expressed at the mRNA level.
Next, the evolution of the 8p11.23 amplifications in metastatic breast cancer studies was assessed. In two studies that included metastatic breast cancer patients, a slight increase in the frequency of amplifications was observed compared with non-metastatic studies (
Table 8). In one of the studies, the metastatic breast cancer project study, with 180 metastatic breast cancer patients, the increase in ZNF703 amplifications was more pronounced: they were observed in 20.6% of patients, including six samples with isolated amplification of the gene without amplification of the neighboring genes. However, no concomitant increase in ZNF703 mRNA expression was observed in these samples. In addition, mutations in any of the amplicon genes remain rare in metastatic breast cancer, with a frequency of 1.4% or lower (range of mutated samples in each amplicon gene in either study = 0 to 4). In the French INSERM study, ZNF703, ERLIN2 and PLPBP were the most frequently amplified genes, amplified in 34 of 216 patients (15.7%) [
18].
Another commonly amplified chromosomal locus in breast cancer is found at 11q13 and has been reported to be commonly co-amplified with 8p11 amplicon [
23]. Interrogation of the TCGA breast cancer cohort confirmed that the 11q13 amplification (as captured by amplification of the CCND1 gene) is observed in 34.7% of cases with the 8p11.23 amplicon (as represented by amplification of the ZNF703 gene). In contrast, the 11q13 amplification is observed in only 12.5% of cases without 8p11.23 amplifications (χ² p < 0.0001). Among the 43 cases in TCGA with 8p11 and 11q13 co-amplifications, 51.2% were of the luminal B subtype and 41.9% were of the luminal A subtype. These sub-types represent about 15% and 50%, respectively, of the total number of cases in the TCGA breast cancer study. The length of the 8p11.23 amplicon, as determined by the number of amplified genes, does not differ between samples with and without 11q13 co-amplification.
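The arithmetic behind these percentages can be checked with a short worked example. All inputs are taken from the text (124 ZNF703-amplified samples, 43 co-amplified cases, and the reported sub-type percentages); the variable names are illustrative, and the enrichment ratios are rough estimates because the cohort-wide sub-type frequencies are given only approximately.

```python
znf703_amplified = 124   # ZNF703-amplified TCGA breast cancer samples
co_amplified = 43        # cases with both 8p11 and 11q13 amplification

# Fraction of 8p11.23-amplified cases that also carry the 11q13 amplification.
co_amp_fraction = co_amplified / znf703_amplified

# Case counts implied by the reported sub-type percentages.
luminal_b_cases = round(0.512 * co_amplified)
luminal_a_cases = round(0.419 * co_amplified)

# Representation relative to the approximate cohort-wide sub-type frequencies.
luminal_b_enrichment = 0.512 / 0.15
luminal_a_enrichment = 0.419 / 0.50
```

Under these figures, luminal B cancers are over-represented roughly 3.4-fold among the co-amplified cases, whereas luminal A cancers are slightly under-represented (ratio of about 0.84).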
Prognosis of breast cancer patients with mRNA expression of each amplicon gene in the upper quartile was compared with that of counterparts with mRNA expression in the three lower quartiles. Among cohorts of patients with all sub-types of breast cancer, patients with higher expression (the upper quartile of the cohorts) of EIF4EBP1 and LSM1 mRNA had worse Relapse-Free Survival (RFS) than patients with lower expression of either gene, suggesting that the two genes act as oncogenes (
Figure 8A and
Figure 9A). When examined according to breast cancer sub-type, worse RFS of high expressers was observed in ER (Estrogen Receptor) positive cancers (
Figure 8B and
Figure 9B) but not in ER negative cases (
Figure 8C and
Figure 9C). The frequency of mRNA over-expression (z score > 2) of EIF4EBP1 and LSM1 in the different sub-groups of luminal breast cancers, classified according to the 3-gene classifier, was higher in the ER+/HER2-/proliferation high group than in the ER+/HER2-/proliferation low group, suggesting that, for both genes, increased gene dosage in luminal B cancers translates into higher mRNA production and possibly increased protein levels that could lead to inferior RFS outcomes (
Figure 10).
RFS of cohorts of breast cancer patients categorized according to mRNA expression of the other genes of the amplicon showed no statistically significant difference between the upper quartile and the lower quartiles. For ERLIN2, NSD3, LETM2, FGFR1 and TACC1, cases with higher mRNA expression even had better survival than the cohorts with lower expression, although the difference did not reach statistical significance at the pre-set 0.0005 level.
4. Discussion
Copy number alterations are more common than mutations in breast cancer and often happen in clusters encompassing several genes. One of the most common clusters of amplification is observed at locus 8p11.23 in about 10% to 15% of the total breast cancer cases. Nineteen genes are located on the most frequently amplified segment. In the great majority of amplified cases the amplification encompasses the whole segment with all the genes amplified, and more rarely only parts of the segment, most commonly including the more telomeric area.
The 8p11.23 amplicon or areas close to it at 8p have been previously reported to play an oncogenic role in breast cancer and putative driver oncogenes among the genes located in the amplicon have been proposed. These include ZNF703, FGFR1, EIF4EBP1, LSM1, BAG4 and PLPP5 [
24]. ZNF703 amplifications, for example, are associated with PR negativity among ER positive breast cancers [
25]. ER positive/PR negative/HER2 negative cancers segregate in the luminal B genomic phenotype, are commonly endocrine therapy resistant and have poor prognosis [
26]. A survey of genome gains and losses in luminal and basal breast cancers using array comparative genomic hybridization (aCGH) microarrays disclosed that ZNF703 was the most significant candidate oncogene in luminal cancers [
27]. ZNF703 mRNA over-expression correlates better with amplification of the gene in luminal B cancers and was associated with worse overall survival [
28]. Breast cancer cell lines expressing ZNF703 were resistant to tamoxifen treatment, while down-regulation of ZNF703 mRNA through miRNA synergized with tamoxifen in cell killing [
29]. ZNF703 is an ER-responsive gene and has a negative effect on expression of ER by suppressing its promoter in a negative feedback loop [
30]. In addition, ZNF703 up-regulates transcription factor E2F1, having, hence, a role in promoting proliferation [
30,
31]. ZNF703 also suppresses TGFβ signaling in breast cancer cells, neutralizing TGFβ-mediated anti-proliferative transduction signals [
32]. These data place ZNF703 as a strong candidate oncogene in breast cancers with amplifications of its gene locus.
FGFR1 overexpression has been reported to promote endocrine therapy resistance and to decrease DMFS (Distant Metastasis Free Survival) in ER positive cancers. Moreover, it is associated with higher Ki67 and decreased PR expression, both characteristics of the luminal B subtype [
33]. Amplification of EIF4EBP1 may favor the common co-occurrence of 8p11 amplification with amplifications of 11q13 which have been previously reported and are confirmed in the current study. The co-occurrence of 11q13 and 8p11 amplifications leads, among other genes, to amplification of gene RPS6KB2, encoding for kinase S6K2, on 11q13, which is, in common with EIF4EBP1, a target of mTOR (mechanistic Target of Rapamycin) complex and co-operates with it in cell programs of protein synthesis for cell growth [
34]. ER positive/HER2 negative breast cancers that were resistant to short term letrozole neo-adjuvant therapy and remained highly proliferative as measured by Ki67 expression were amplified for 8p11 and 11q13 [
35]. Cyclin D amplification from 11q13 leads to up-regulation of function of transcription factor E2F1, which then promotes transcription of ZNF703 and FGFR1 genes of the 8p11.23 amplicon [
23]. ZNF703 induction completes a positive feedback loop, given that ZNF703 is an inducer of E2F1 [
30,
36].
The current descriptive investigation, based on published series, details the characteristics of the 8p11.23 amplicon in breast cancer and elucidates possible implications based on gene expression and regulation. Amplifications of 8p11.23 are not unique to breast cancer but are also observed in squamous lung carcinomas and urothelial cancers. In contrast, they are more rarely observed in other cancers, including lung adenocarcinomas and non-lung squamous carcinomas. This implies that one or more amplified genes in the area are under positive pressure for over-expression in some cancer environments but not others. This notion is reinforced by the fact that in breast cancer there is a higher prevalence of the amplicon in aggressive luminal cancers compared with other sub-types. The gene or genes that drive the aggressive pathophysiology of the amplified cases likely act through direct signaling effects and not through a global influence on the tumor mutation burden or ploidy status, as there are no significant differences in TMB and AS between 8p11.23 amplified and non-amplified breast cancers.
A higher probability that the amplification of a gene of the amplicon is a driver event would be expected if the amplification is associated with higher expression of the gene products at the mRNA and protein levels, and if higher expression is associated with adverse patient prognosis. Among the genes of the amplicon, mRNA expression is imperfectly associated with amplification, with the highest number of cases with mRNA over-expression observed for ERLIN2, PLPBP, BRF2, RAB11FIP1, ASH2L, LSM1, DDHD2 and NSD3 in cancers with the amplicon. An adverse prognosis for higher mRNA expression was only present for EIF4EBP1 and LSM1. The caveat of this evaluation is that the comparison was made between groups expressing the respective mRNAs above or below the upper quartile and, as a result, several of the cases included in the high group would be non-amplified for the respective genes. The protein products of the amplicon genes are expressed in several breast cancers, a fact that maintains them in the list of candidate drivers. This evaluation cannot inform on the functional status of the proteins or on the presence and function of the different isoforms that exist.
mRNA expression and eventual protein expression of the amplified genes may still depend on the presence of transcription factors and programs that are critical for their transcription in cells without the amplifications. In luminal cancers, the ER-dependent programs are influenced by pioneer factors FOXA1 and GATA3 and thus genes whose promoters contain binding sites or clusters of sites for these factors may be expected to be dependent on ER programs [
37,
38]. However, in luminal B cancers, where the amplicon is more often present, E2F1 programs, dependent on CDK4/cyclin D activation, may substitute for ER programs, which have lower activity in these cancers [
39]. The presence of binding sites for both the ER axis and E2F1 in many promoters of amplicon genes is consistent with the potential for regulation that could switch from ER-dominant programs in luminal A cancers to E2F1 programs in luminal B cancers, or upon progression, when hormonal resistance develops. In the evaluation of protein expression of the genes in 8p11.23, most products, with the exception of ADGRA2, ADRB3 and STAR, are confirmed to be present. Variability in expression between the different monoclonal and polyclonal antibodies used in the Human Protein Atlas probably results from the antigen specificity of each antibody, variability of expression between cases and, possibly, the presence of homologous proteins in the human genome that could cross-react.
A significant percentage (almost 70%) of 8p11.23 amplified breast cancer cases in TCGA display a global loss of the 8p arm denoting that the 8p11.23 locus is amplified amidst extensive chromosomal material losses elsewhere in the 8p arm. 8p losses are less common (less than 40% of cases) in 8p11.23 non-amplified breast cancers. This suggests a mechanism of acquisition of the amplicon in which it arises in cases with random extensive 8p arm breaks, that lead mostly to chromosomal material loss, and is favored and becomes fixed due to promotion of survival and proliferation in the cancer cells harboring it. It would be worth investigating whether the ratio of amplification compared to loss of surrounding loci suggests the presence of an oncogene more globally and could serve for the establishment of criteria of oncogene discovery.
Regarding therapy of breast cancers harboring the 8p11.23 amplicon, FGFR inhibitors have been investigated and could be a rational targeted therapy for cancers overexpressing FGFR1. However, the clinical experience with FGFR inhibitors shows that cancers with FGFR mutations or fusions are more sensitive to inhibition compared to cancers that harbor amplifications [
40,
41]. It is possible that, akin to HER2 inhibitors, the level of amplification of FGFR1 and the level of protein expression will need to be taken into consideration in the targeted development of FGFR inhibitors for the therapy of 8p11.23 amplified breast cancers and other types of cancer with the amplicon.
EIF4EBP1 is a negative regulator of protein translation and is a target of kinase mTOR which inhibits EIF4EBP1, thereby promoting protein translation that is critical in proliferating cells [
42]. In breast cancers with the 8p11.23 amplicon, the need to inhibit the amplified EIF4EBP1 may increase the dependency of the cancer cells on the activity of mTOR in order to secure active protein production, and thus may make these cells sensitive to inhibition of mTOR by drugs such as everolimus. Interestingly, a protein produced by a gene in the frequently co-amplified 11q13 amplicon, the kinase S6K2, is also an mTOR target in a cell growth pathway, possibly increasing the dependence of co-amplified cells on mTOR and their sensitivity to mTOR inhibitors. The value of the 8p11.23 and/or 11q13 amplicons as predictive biomarkers of mTOR inhibitor efficacy remains to be determined in clinical studies. Other therapeutic opportunities relying on putative tumor cell dependencies on products of 8p11.23 amplified genes may exist but would require further studies. For example, inhibition of the transcription initiation factor BRF2 or the histone methyltransferase NSD3 could be a targeted approach of interest, should clinical-grade inhibitors become available for development.